What is an Extract-Transform-Load (ETL) process?

The order matters...

2022-12-01 10:04:27
ETL stands for extract, transform, load. It is a process for moving data from one or more sources into a destination system, such as a data warehouse. The ETL process consists of three main stages:



  1. Extract: Data is read from the source systems. This may involve accessing databases, files, or other data sources and copying the data into a staging area.


  2. Transform: The data is cleaned and restructured to fit the needs of the destination system. This may involve tasks such as filtering, sorting, merging, and aggregating data.


  3. Load: The transformed data is written to the destination system. This may involve inserting the data into tables, updating existing records, or deleting obsolete records.
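
The three stages can be sketched in a few lines of Python. This is only a minimal illustration, assuming an in-memory CSV as the source and an SQLite database standing in for the data warehouse; the order data, the `customer_totals` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a database, file, or API.
SOURCE_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.50
"""


def extract(text):
    """Extract: read raw rows from the source into a staging list."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, then aggregate the amount per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # filtering: skip rows with a missing amount
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, totals):
    """Load: insert or update one row per customer in the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals (customer, total) VALUES (?, ?)",
        totals,
    )
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the stages in order: extract, then transform, then load."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT customer, total FROM customer_totals").fetchall())
```

Note that only the cleaned, aggregated result reaches the destination here; the raw source rows are discarded once the transform stage has run.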



The ETL process is an important part of data warehousing and helps ensure that data in the data warehouse is accurate, consistent, and up to date. It allows businesses to integrate data from multiple sources and cleanse it so that it is ready for analysis. ETL processes can be automated using specialized tools and run on a regular schedule to keep the data warehouse updated with the latest data.

A variation on ETL is ELT, in other words Extract-Load-Transform. We tend to prefer this approach because it is typically preferable to extract data from the source exactly as it is, capture it as a snapshot/version, and only then perform any transformations. Considerations such as data volumes and requirements for recoverability can determine whether you choose ETL or ELT.
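
A minimal sketch of the ELT ordering, again assuming SQLite as a stand-in destination and hypothetical table, column, and function names: the extracted rows are loaded untouched and tagged with a snapshot identifier, and the transformation runs afterwards inside the destination as SQL.

```python
import sqlite3

# Hypothetical raw rows exactly as extracted from the source (all values kept as text).
RAW_ROWS = [("1", "alice", "10.50"), ("2", "bob", ""), ("3", "alice", "4.50")]


def load_raw(conn, rows, snapshot_id):
    """Load: store the extracted rows unchanged, tagged with a snapshot identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(snapshot_id TEXT, order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [(snapshot_id, *row) for row in rows],
    )
    conn.commit()


def transform_in_place(conn, snapshot_id):
    """Transform: derive cleaned totals from one snapshot; raw rows stay intact."""
    return conn.execute(
        "SELECT customer, SUM(CAST(amount AS REAL)) FROM raw_orders "
        "WHERE snapshot_id = ? AND amount <> '' "
        "GROUP BY customer ORDER BY customer",
        (snapshot_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ROWS, "snapshot-1")
    print(transform_in_place(conn, "snapshot-1"))
```

Because the raw snapshot survives in the destination, the transformation can be corrected and rerun without going back to the source, which is where the recoverability benefit comes from.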

If you're looking for data warehouse software to help you perform common data warehouse tasks such as extract, transform, load (ETL) and data migration, have a look at our product, Bragi, which is purpose-built for these scenarios.