ETL (Extract, Transform, Load)

What exactly is ETL?
ETL stands for Extract, Transform and Load. Data is extracted from one or more data sources and then written to a target system in a formatted form. A target system could be a data warehouse or an ERP system, for example. In the past, the data was mainly stored in a data warehouses. Today, it is typical that data is distributed across systems or stored in a data lake in a less structured way.

The goal of transformation is to prepare the data from the source system for further processing in the target system. A clearly defined and structured procedure is followed for the ETL.

ETL-Structure

The database schemas of the systems used often differ fundamentally. Furthermore, the data structures of systems of the same type often differ. A common example from sales is the individual structure of a CRM system. When data is transferred from one system to another, a transformation of the data and its structures is usually necessary.

ETL in sales and marketing

Systems are more and more integrated and the stored sales and marketing data increases significantly. The breaking up of database silos and Big Data are important issues, especially in sales and marketing.

In sales, systems are integrated to model processes across systems and to automate them as far as possible. The ETL process must therefore be integrated in real-time processes. An entire industry has been born for this requirement, and we are one of them.

ETL and Distributed Platforms

Using iPaaS solutions, systems can be integrated very easily and without significant programming effort. The data is provided via the APIs of the target and source systems. The exchange of data between the two systems can be managed via API middleware applications. The HubEngine is an iPaaS solution that performs exactly this task, whereby it can be configured flexibly and easily via a no-coding interface.

Some iPaaS solutions also provide transformation capabilities, such as resolving entities. The problem here, however, is that these solutions are usually difficult to configure and use.

Simplest Case of a Distributed Platform

We use separate applications for the transformation of database schemas, which we switch flexibly and quite simply between the systems. The DataEngine offers suitable applications for this. Via the DataEngine any database models can be mapped and transformed via workflows. Thus, if different database schemas are to be synchronized or integrated, the schema of a distributed platform in the simplest case looks like this.

Exemplary ETL Diagram
System A (Source) > HubEngine (Extract) > DataEngine (Transform) > HubEngine (Load) > System B (Target)

More complex Case of a Distributed Platform

The data can thus also be synchronized from system B to system A, in other words bidirectionally. It is also possible to connect additional systems and thus integrate all relevant data across all systems in use in real time. The following diagram shows a more complex distributed platform. Here, two modules in system A are processed to four modules in system B. Of course, one could include other systems as well (C, D, E, etc.).

Advanced ETL Example Diagram
From the left system two modules are bidirectionally synchronized with four modules from the right system. The relation resolution takes place in the DataEngine with Transformation Workflows.

The Enterprise Data Platform with its components HubEngine and DataEngine can be used to build solutions of any size. In the solutions you will find many examples of how such a solution could look like.

Further articles