ETL Process (Extraction, Transformation, Loading): MCQ Practice Set

Q.1 What does ETL stand for in Data Warehousing?

Extract, Transform, Load
Evaluate, Transfer, Log
Edit, Test, Launch
Encode, Translate, Locate
Explanation - ETL refers to the process of Extracting data from sources, Transforming it into suitable formats, and Loading it into a data warehouse.
Correct answer is: Extract, Transform, Load
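The three phases can be sketched in a few lines. This is a minimal illustration, assuming an in-memory list of records stands in for a source system and another list stands in for the warehouse (all field names are hypothetical):

```python
def extract(source):
    """Extract: fetch raw records from the source."""
    return list(source)

def transform(records):
    """Transform: clean and reshape each record into the target format."""
    return [{"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in records]

def load(records, target):
    """Load: append the transformed records to the target store."""
    target.extend(records)
    return target

source = [{"name": "  alice ", "amount": "10.5"}, {"name": "BOB", "amount": "3"}]
warehouse = []
load(transform(extract(source)), warehouse)
```

Real pipelines replace the lists with database connections and file readers, but the three-phase shape stays the same.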

Q.2 Which step in ETL involves cleaning and formatting data?

Extraction
Transformation
Loading
Replication
Explanation - Transformation includes tasks like data cleaning, formatting, and applying business rules before loading.
Correct answer is: Transformation
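Typical cleaning and formatting inside the transformation step looks like the sketch below, assuming records arrive with inconsistent casing, stray whitespace, and a non-standard date format (the field names are hypothetical):

```python
from datetime import datetime

def clean_record(rec):
    return {
        "email": rec["email"].strip().lower(),      # normalize casing/whitespace
        "country": rec["country"].strip().upper(),  # standardize country codes
        # convert DD/MM/YYYY into ISO 8601 before loading
        "signup": datetime.strptime(rec["signup"], "%d/%m/%Y").date().isoformat(),
    }

raw = {"email": " Ada@Example.COM ", "country": "uk ", "signup": "05/03/2024"}
cleaned = clean_record(raw)
```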

Q.3 In ETL, the 'Load' phase is primarily responsible for:

Fetching data from sources
Converting data to meaningful formats
Storing processed data into target databases
Removing duplicate data
Explanation - The load step inserts or updates the transformed data into the final data warehouse or target system.
Correct answer is: Storing processed data into target databases

Q.4 Which of the following is an example of a data source for ETL?

ERP system
Data warehouse
BI dashboard
Report generator
Explanation - Data sources include ERP systems, CRM applications, and flat files where raw data resides before ETL.
Correct answer is: ERP system

Q.5 Which tool is commonly used for ETL processes?

Hadoop
Informatica
Tableau
Jupyter Notebook
Explanation - Informatica is a widely used ETL tool, whereas Tableau is a visualization tool and Hadoop is a framework for distributed storage and processing.
Correct answer is: Informatica

Q.6 During extraction, which type of data is typically collected?

Only structured data
Only unstructured data
Both structured and unstructured data
Metadata only
Explanation - Extraction can involve structured (databases) and unstructured (logs, documents) data sources.
Correct answer is: Both structured and unstructured data

Q.7 Which phase of ETL ensures data consistency and accuracy?

Extraction
Transformation
Loading
Archiving
Explanation - Transformation ensures consistency by cleaning, normalizing, and validating data before loading.
Correct answer is: Transformation

Q.8 What is a staging area in ETL?

Final data warehouse
Temporary storage for extracted data
Visualization dashboard
Backup server
Explanation - The staging area temporarily holds raw data before transformation and loading into the warehouse.
Correct answer is: Temporary storage for extracted data
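The staging idea can be sketched with a temporary CSV file standing in for a staging table (a simplified assumption; real systems typically stage into dedicated database schemas):

```python
import csv, tempfile, os

raw_rows = [["id", "value"], ["1", " 42 "], ["2", " 7 "]]

# 1. Extract: park the untouched rows in the staging file.
staging = tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False)
csv.writer(staging).writerows(raw_rows)
staging.close()

# 2. Transform reads from staging, not from the live source system.
with open(staging.name, newline="") as f:
    reader = csv.reader(f)
    next(reader)                      # skip header
    transformed = [(int(i), int(v.strip())) for i, v in reader]

os.unlink(staging.name)               # staging data is disposable after the run
```

Staging decouples extraction from transformation, so a failed transform can be retried without hitting the source system again.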

Q.9 In ETL, incremental load refers to:

Loading all data from scratch
Loading only new or updated records
Archiving old data
Extracting metadata only
Explanation - Incremental loading optimizes performance by updating only new/changed records instead of full reloads.
Correct answer is: Loading only new or updated records
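An incremental load is usually driven by a watermark. The sketch below assumes a hypothetical `updated_at` column and a recorded timestamp of the last successful run:

```python
last_run = "2024-05-01"               # watermark from the previous run

source = [
    {"id": 1, "updated_at": "2024-04-15"},   # unchanged, skipped
    {"id": 2, "updated_at": "2024-05-03"},   # updated since last run, loaded
    {"id": 3, "updated_at": "2024-05-07"},   # new, loaded
]

# ISO dates compare correctly as strings, so a plain filter works here
delta = [r for r in source if r["updated_at"] > last_run]
```

Only `delta` is pushed to the target, which is why incremental loads scale far better than full reloads.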

Q.10 Which transformation technique removes redundant data entries?

Data cleansing
Deduplication
Normalization
Data integration
Explanation - Deduplication eliminates duplicate records, ensuring data integrity in the warehouse.
Correct answer is: Deduplication
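A first-occurrence-wins deduplication pass can be sketched like this, using a hypothetical customer email as the business key:

```python
def deduplicate(records, key):
    """Keep the first record seen for each value of `key`; drop the rest."""
    seen, unique = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

rows = [{"email": "a@x.com"}, {"email": "b@x.com"}, {"email": "a@x.com"}]
deduped = deduplicate(rows, "email")
```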

Q.11 Which of the following is NOT a transformation operation?

Data cleansing
Data aggregation
Data extraction
Data normalization
Explanation - Extraction is a separate ETL phase, while cleansing, aggregation, and normalization are transformation operations.
Correct answer is: Data extraction

Q.12 What is the primary goal of the 'Load' step?

Data validation
Data migration to the target system
Data replication
Data archiving
Explanation - The load step moves transformed data into the target system for analysis.
Correct answer is: Data migration to the target system

Q.13 Which data format is commonly handled in ETL processes?

XML
JSON
CSV
All of the above
Explanation - ETL systems work with multiple formats like XML, JSON, and CSV from diverse data sources.
Correct answer is: All of the above
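Python's standard library alone can parse all three formats, which is one reason ETL extractors handle them so routinely (the sample payloads below are made up):

```python
import csv, io, json
import xml.etree.ElementTree as ET

csv_rows = list(csv.DictReader(io.StringIO("id,name\n1,Ada\n")))
json_obj = json.loads('{"id": 2, "name": "Grace"}')
xml_name = ET.fromstring("<user><name>Edsger</name></user>").findtext("name")
```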

Q.14 Which ETL process involves mapping fields between source and target?

Transformation
Extraction
Loading
Scheduling
Explanation - Mapping is part of transformation, aligning source fields to the correct target fields.
Correct answer is: Transformation
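Field mapping is often expressed as a simple lookup table from source column names to target column names. A sketch, with hypothetical column names:

```python
# source column -> target column
FIELD_MAP = {"cust_nm": "customer_name", "ord_amt": "order_amount"}

def apply_mapping(row, field_map):
    """Rename each source field to its mapped target field."""
    return {target: row[source] for source, target in field_map.items()}

source_row = {"cust_nm": "Alice", "ord_amt": 99.5}
target_row = apply_mapping(source_row, FIELD_MAP)
```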

Q.15 What is a full load in ETL?

Loading only new data
Loading entire dataset into target
Loading data only once
Loading test data
Explanation - Full load refreshes the target system with the entire dataset, replacing previous data.
Correct answer is: Loading entire dataset into target
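The truncate-and-reload pattern behind a full load can be sketched against an in-memory target (a stand-in for a warehouse table):

```python
target = [{"id": 1, "qty": 5}]          # stale data from the previous run

source_snapshot = [{"id": 1, "qty": 8}, {"id": 2, "qty": 3}]

target.clear()                          # "truncate" the target table
target.extend(source_snapshot)          # reload the entire dataset from source
```

This guarantees the target exactly mirrors the source, at the cost of reprocessing everything on each run.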

Q.16 Which of these is a cloud-based ETL tool?

Talend
Apache Spark
AWS Glue
Oracle SQL
Explanation - AWS Glue is a fully managed cloud ETL service by Amazon.
Correct answer is: AWS Glue

Q.17 What is the main advantage of using ETL tools?

They visualize reports
They automate data integration
They store big data
They replace databases
Explanation - ETL tools automate extraction, transformation, and loading, reducing manual effort and errors.
Correct answer is: They automate data integration

Q.18 In ETL, surrogate keys are used to:

Encrypt data
Replace natural keys
Track data errors
Speed up transformation
Explanation - Surrogate keys are artificial unique identifiers used instead of natural keys for consistency.
Correct answer is: Replace natural keys
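Surrogate-key assignment can be sketched as handing out small sequential integers per natural key (here a hypothetical email address), so the warehouse never depends on the source's identifiers:

```python
def assign_surrogate_keys(records, natural_key):
    key_map, out = {}, []
    for rec in records:
        nk = rec[natural_key]
        if nk not in key_map:
            key_map[nk] = len(key_map) + 1     # next surrogate id
        out.append({**rec, "sk": key_map[nk]})
    return out

rows = [{"email": "a@x.com"}, {"email": "b@x.com"}, {"email": "a@x.com"}]
keyed = assign_surrogate_keys(rows, "email")
```

The same natural key always maps to the same surrogate, which keeps dimension tables stable even if source identifiers change format.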

Q.19 Which step of ETL often requires the most processing power?

Extraction
Transformation
Loading
Scheduling
Explanation - Transformation involves complex operations like joins, aggregations, and validation, making it typically the most compute-intensive phase.
Correct answer is: Transformation

Q.20 Which type of load may temporarily lock target tables?

Batch load
Incremental load
Real-time load
Slow load
Explanation - Batch loading large data volumes can lock target tables until the operation completes.
Correct answer is: Batch load

Q.21 What ensures data quality during ETL?

Data profiling
Data visualization
Data replication
Data archiving
Explanation - Data profiling analyzes data for accuracy, completeness, and consistency during ETL.
Correct answer is: Data profiling
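A basic profiling pass computes per-column quality metrics before the data is trusted. A minimal sketch with made-up sample data:

```python
def profile_column(rows, col):
    """Report row count, null count, and distinct count for one column."""
    values = [r.get(col) for r in rows]
    return {
        "rows": len(values),
        "nulls": sum(v is None for v in values),
        "distinct": len({v for v in values if v is not None}),
    }

data = [{"city": "Pune"}, {"city": None}, {"city": "Pune"}, {"city": "Delhi"}]
stats = profile_column(data, "city")
```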

Q.22 Which ETL phase may include API calls to gather data?

Extraction
Transformation
Loading
Validation
Explanation - Extraction can use APIs to fetch data from web services or external systems.
Correct answer is: Extraction

Q.23 In ETL, what is meant by data lineage?

Flow of data from source to target
History of database versions
Security of data
Backup process
Explanation - Data lineage traces the movement and transformations of data throughout ETL processes.
Correct answer is: Flow of data from source to target
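Lineage can be captured by having each pipeline step record its input and output, so any final value can be traced back to its origin. A toy sketch (the step names and values are made up):

```python
lineage = []

def step(name, func, value):
    """Run one pipeline step and record it in the lineage log."""
    result = func(value)
    lineage.append({"step": name, "in": value, "out": result})
    return result

v = step("extract", lambda _: " 42 ", None)   # fetch the raw value
v = step("transform", str.strip, v)           # clean it
v = step("load", int, v)                      # cast for the target schema
```

Production tools store this metadata centrally, but the principle is the same: every hop from source to target is recorded.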

Q.24 Which of the following is a real-time ETL tool?

Apache Kafka
Excel
Tableau
Power BI
Explanation - Apache Kafka enables real-time data streaming and ETL pipelines.
Correct answer is: Apache Kafka

Q.25 Which scheduling tool is often integrated with ETL pipelines?

Apache Airflow
MS Excel
Notepad++
R Studio
Explanation - Apache Airflow is a popular workflow scheduler for orchestrating ETL jobs.
Correct answer is: Apache Airflow
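The core idea behind an Airflow DAG is running tasks in dependency order. The standard-library `graphlib` module can demonstrate the same topological ordering without installing Airflow (the task names below are illustrative):

```python
from graphlib import TopologicalSorter

# each task maps to the set of tasks it depends on:
# transform waits for extract, load waits for transform
dag = {"transform": {"extract"}, "load": {"transform"}}

order = list(TopologicalSorter(dag).static_order())
```

An actual Airflow DAG expresses the same dependencies with operators and `>>` chaining, plus scheduling, retries, and monitoring on top.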