Introduction to Data Mining and Data Warehousing: MCQ Practice Set

Q.1 What is Data Mining primarily concerned with?

Data storage
Knowledge discovery
Network design
Software development
Explanation - Data Mining is the process of discovering useful patterns and knowledge from large sets of data.
Correct answer is: Knowledge discovery

Q.2 Which of the following best describes Data Warehousing?

Real-time transaction processing
Storage of raw data without processing
Subject-oriented, integrated, time-variant, and non-volatile collection of data
Random data storage
Explanation - A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data drawn from multiple sources to support decision-making.
Correct answer is: Subject-oriented, integrated, time-variant, and non-volatile collection of data

Q.3 ETL in data warehousing stands for?

Extract, Transform, Load
Enter, Transfer, Log
Execute, Test, Learn
Extract, Transfer, List
Explanation - ETL is the process of extracting data from sources, transforming it into the required format, and loading it into the warehouse.
Correct answer is: Extract, Transform, Load
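
The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the sample CSV text, the `orders` table, and the function names `extract`, `transform`, and `load` are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (made-up data).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy customer names, convert amounts, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Only the two rows with a numeric amount reach the target table; the third is rejected during the transform stage.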

Q.4 Which of these is NOT a step in the Knowledge Discovery in Databases (KDD) process?

Data cleaning
Data integration
Data mining
Software debugging
Explanation - KDD involves data cleaning, integration, selection, transformation, mining, and pattern evaluation; software debugging is not one of its steps.
Correct answer is: Software debugging

Q.5 Which type of learning is most commonly associated with Data Mining?

Supervised and Unsupervised learning
Reinforcement learning only
Transfer learning only
Machine teaching
Explanation - Data mining tasks are often divided into supervised learning (e.g., classification) and unsupervised learning (e.g., clustering).
Correct answer is: Supervised and Unsupervised learning

Q.6 What does OLAP stand for?

Online Analytical Processing
Offline Analytical Processing
Online Application Processing
Offline Application Protocol
Explanation - OLAP tools allow fast analysis of multidimensional data from multiple perspectives.
Correct answer is: Online Analytical Processing

Q.7 Which is a classification algorithm in Data Mining?

K-means
Decision Tree
Apriori
DBSCAN
Explanation - Decision Trees are supervised classification algorithms, whereas K-means and DBSCAN are clustering algorithms and Apriori is an association rule mining algorithm.
Correct answer is: Decision Tree

Q.8 The star schema is used in:

Database normalization
Data warehousing
Transaction processing
Operating system design
Explanation - The star schema is a common modeling technique in data warehouses for organizing fact and dimension tables.
Correct answer is: Data warehousing

Q.9 Which data mining task finds relationships among variables?

Classification
Clustering
Association rule mining
Regression
Explanation - Association rule mining identifies interesting co-occurrence relationships among variables, such as items frequently bought together in market basket analysis.
Correct answer is: Association rule mining

Q.10 Which of these is an unsupervised learning task in data mining?

Classification
Regression
Clustering
Prediction
Explanation - Clustering is unsupervised as it groups similar data points without predefined labels.
Correct answer is: Clustering

Q.11 Data in a data warehouse is typically:

Volatile
Normalized
Time-variant and non-volatile
Temporary
Explanation - Data warehouses keep historical data tied to time periods (time-variant), and once loaded the data is read for analysis rather than frequently updated (non-volatile).
Correct answer is: Time-variant and non-volatile

Q.12 Which technique is commonly used for market basket analysis?

Regression
Apriori Algorithm
Naive Bayes
K-means clustering
Explanation - The Apriori algorithm discovers frequent itemsets and association rules useful in basket analysis.
Correct answer is: Apriori Algorithm
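
The level-wise idea behind Apriori can be sketched as follows. This is a simplified illustration under stated assumptions: the transactions are made up, `frequent_itemsets` is a hypothetical helper name, and the code favors readability over the optimizations a real implementation would use. It finds frequent itemsets only and does not generate rules.

```python
from itertools import combinations

# Made-up transactions for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]  # 1-item candidates
    result = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        frequent = {c: support(c) for c in current if support(c) >= min_support}
        result.update(frequent)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result

found = frequent_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what distinguishes this from brute force: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.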

Q.13 Which of these is NOT a data warehouse characteristic?

Subject-oriented
Volatile
Integrated
Time-variant
Explanation - A data warehouse is non-volatile, meaning data is stable and primarily used for analysis.
Correct answer is: Volatile

Q.14 The process of cleaning and preparing data before mining is known as:

Data enrichment
Data preprocessing
Data normalization
Data annotation
Explanation - Preprocessing involves cleaning, transforming, and reducing data before applying mining techniques.
Correct answer is: Data preprocessing
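
The cleaning, reducing, and transforming steps mentioned in the explanation can be sketched on a tiny example. The raw values and the helper name `preprocess` are assumptions for illustration; dropping invalid entries, removing duplicates, and min-max scaling are just one possible choice for each step, and the sketch assumes at least two distinct valid values remain.

```python
# Made-up raw field values with whitespace, a blank, a placeholder, and a duplicate.
raw = [" 42", "17", "", "n/a", "17", "88 "]

def preprocess(values):
    # Cleaning: trim whitespace and keep only entries that are whole numbers.
    cleaned = []
    for v in values:
        v = v.strip()
        if v.isdigit():
            cleaned.append(int(v))
    # Reduction: remove duplicate values.
    unique = sorted(set(cleaned))
    # Transformation: min-max scale to the range [0, 1].
    lo, hi = unique[0], unique[-1]
    return [(x - lo) / (hi - lo) for x in unique]

scaled = preprocess(raw)
print(scaled)
```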

Q.15 In a star schema, dimension tables are usually:

Normalized
Denormalized
Temporal
Volatile
Explanation - Dimension tables are often denormalized to improve query performance in star schemas.
Correct answer is: Denormalized

Q.16 Which of these algorithms is used for clustering?

Apriori
K-means
Naive Bayes
Decision Tree
Explanation - K-means is a popular clustering algorithm that partitions data into k groups.
Correct answer is: K-means
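
To make the partitioning concrete, here is a minimal sketch of the usual assign-then-update loop on one-dimensional data. The function name `kmeans_1d`, the sample points, and the fixed starting centroids are assumptions chosen so the run is deterministic; practical implementations work in many dimensions and typically use randomized initialization.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Assign/update loop on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
print(centroids, clusters)
```

Here k is 2 because two starting centroids are supplied; the loop stops once an update leaves every centroid unchanged.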

Q.17 The snowflake schema is a variation of:

Entity Relationship Model
Star schema
Network model
Hierarchical model
Explanation - The snowflake schema normalizes the dimension tables of a star schema into multiple related tables.
Correct answer is: Star schema

Q.18 Which measure indicates the strength of an association rule?

Support
Confidence
Entropy
Variance
Explanation - Confidence measures how often items in Y appear in transactions that contain X in the rule X → Y. Support, by contrast, measures how frequently X and Y appear together across all transactions.
Correct answer is: Confidence
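
A short worked example shows how the two measures are computed for a rule X → Y. The transactions are made up, and the helper names `support` and `confidence` are illustrative assumptions.

```python
# Made-up transactions for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y, transactions) / support(x, transactions)

# Rule: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Bread and milk appear together in 2 of 4 transactions (support 0.5), and 2 of the 3 transactions containing bread also contain milk (confidence about 0.67).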

Q.19 Which process ensures data is free from errors before entering the warehouse?

Data normalization
Data cleaning
Data transformation
Data modeling
Explanation - Data cleaning removes inaccuracies and inconsistencies to ensure reliable warehouse data.
Correct answer is: Data cleaning

Q.20 Regression in data mining is used for:

Finding clusters
Predicting continuous values
Discovering frequent patterns
Classification
Explanation - Regression techniques model and predict continuous numerical outcomes.
Correct answer is: Predicting continuous values
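
As one concrete instance, simple linear regression fits a line by ordinary least squares and uses it to predict a numeric value. The sketch below assumes a single input variable and made-up data lying exactly on y = 2x + 1; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up data on the line y = 2x + 1
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # predict y for an unseen x = 5
print(slope, intercept, predicted)
```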

Q.21 What does dimensionality reduction help with?

Increasing number of variables
Reducing irrelevant features
Increasing warehouse storage
Generating duplicate data
Explanation - Dimensionality reduction can improve efficiency, and often accuracy, by removing redundant or irrelevant variables.
Correct answer is: Reducing irrelevant features

Q.22 A cube in OLAP represents:

Single data point
Multidimensional data model
Flat file storage
Hierarchical structure
Explanation - OLAP cubes allow multidimensional analysis of data with measures and dimensions.
Correct answer is: Multidimensional data model
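
The idea of measures viewed along dimensions can be sketched with plain dictionaries. The sales rows, the dimension names, and the helper `roll_up` are assumptions for illustration; a real OLAP engine would precompute and index such aggregates rather than scan the facts each time.

```python
from collections import defaultdict

# Made-up sales facts: (region, product, quarter, amount).
facts = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Phone", "Q1", 50),
    ("South", "Laptop", "Q1", 70),
    ("South", "Laptop", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Sum the amount over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```

The same facts answer different questions depending on which dimensions are kept, which is the sense in which a cube supports analysis from multiple perspectives.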

Q.23 Which of these is a supervised learning algorithm?

K-means
Decision Tree
Hierarchical clustering
DBSCAN
Explanation - Decision trees use labeled training data, making them supervised learning algorithms.
Correct answer is: Decision Tree

Q.24 The fact table in a star schema contains:

Detailed transactions
Aggregated measures and keys to dimension tables
Normalized attributes
Metadata only
Explanation - Fact tables store quantitative measures, recorded at a chosen level of detail, along with foreign keys linking to the dimension tables.
Correct answer is: Aggregated measures and keys to dimension tables
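
A small star schema can be sketched with SQLite to show how a fact table's keys and measures relate to its dimensions. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are all made up for illustration, and an in-memory database stands in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- The fact table holds measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 2, 2000.0), (1, 2, 1, 1000.0), (2, 1, 3, 600.0);
""")

# A typical star-schema query: join facts to a dimension, then aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```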

Q.25 Which term refers to discovering previously unknown but useful information from data?

Data preprocessing
Data cleaning
Data mining
Data transformation
Explanation - Data mining uncovers patterns and knowledge hidden in large datasets.
Correct answer is: Data mining