What is Extract Transform Load (ETL)? #
ETL stands for extract, transform, load. These are the three processes that enable organisations to collect data from multiple sources and then push that data into a target system or database.
This data integration process enables organisations to collate information in one system or database allowing for data synchronisation, business intelligence and data analytics.
An ETL Tool is used to maintain ETL data consistency and simplify database integration and synchronisation development tasks.
ETL is used by organisations to:
- Extract data from a database, eCommerce store, legacy system etc.
- Transform and cleanse the data to improve data quality and establish consistency
- Load the data in the correct format to a target application or database
Extract #
The data extraction function reads and extracts data from either a specific source or multiple data sources. The data can be structured or unstructured from a variety of sources, including:
- SQL or NoSQL databases
- CRM and ERP systems
- Flat files
- Emails
- Web pages
The extracted data is then placed into a staging area for the transformation process.
Transform #
The transformation process is possibly the most important aspect of ETL and the one that brings the greatest value to its deployment due to its ability to manipulate the data.
The transformation work in ETL is carried out by a specialised ETL tool or engine, with the data usually held in staging tables whilst the transformation process is in progress.
As data can be extracted from multiple sources, it needs to be consolidated and transformed into the correct format. For example, the data may need converting into a different format or database structure in order to become compatible with the target system or database.
Data quality is of paramount importance, therefore this process also provides the opportunity to cleanse and authenticate the data, such as eliminating any duplicate information, filtering out any unwanted data or validating the data with records.
The process can also be used to convert currencies or other units of measurement, ensure compliance procedures, as well as standardising data or changing row and column headers for uniformity.
Load #
Once all the data is validated and converted into the correct format in the staging area, the data loading function can write the data into the targeted database or application location.
In an ETL scenario, this will usually involve an initial complete load of data. Data can then be added incrementally and periodically to reflect any data changes.
The other common type of loading is when an organisation decides on a full refresh – erasing the existing data and replacing it with fresh data.
The majority of organisations that use ETL tend to require batch processing, hence it is best to set this up to run during off peak hours, i.e. overnight, when traffic and demand on resources is at its lowest.