Introduction to Data Mining and Data Warehousing: MCQ Practice Set

Q.1 What is Data Mining primarily concerned with?

Data storage

Knowledge discovery

Network design

Software development

Explanation - Data Mining is the process of discovering useful patterns and knowledge from large sets of data.

Correct answer is: Knowledge discovery

Q.2 Which of the following best describes Data Warehousing?

Real-time transaction processing

Storage of raw data without processing

Subject-oriented, integrated, time-variant, and non-volatile collection of data

Random data storage

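Explanation - A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data: it combines data from multiple sources, keeps a historical record, and supports decision-making.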

Correct answer is: Subject-oriented, integrated, time-variant, and non-volatile collection of data

Q.3 What does ETL stand for in data warehousing?

Extract, Transform, Load

Enter, Transfer, Log

Execute, Test, Learn

Extract, Transfer, List

Explanation - ETL is the process of extracting data from sources, transforming it into the required format, and loading it into the warehouse.

Correct answer is: Extract, Transform, Load
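
The three ETL stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the column names, and the `orders` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, parse amounts, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(rows, conn):
    """Load: write the transformed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Only the two rows with a numeric amount reach the table; the third is rejected during the transform step.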

Q.4 Which of these is NOT a step in the Knowledge Discovery in Databases (KDD) process?

Data cleaning

Data integration

Data mining

Software debugging

Explanation - KDD involves data cleaning, integration, selection, transformation, mining, pattern evaluation, and knowledge presentation. Software debugging is not part of the process.

Correct answer is: Software debugging

Q.5 Which type of learning is most commonly associated with Data Mining?

Supervised and Unsupervised learning

Reinforcement learning only

Transfer learning only

Machine teaching

Explanation - Data mining tasks are often divided into supervised learning (for example, classification and regression) and unsupervised learning (for example, clustering and association rule mining).

Correct answer is: Supervised and Unsupervised learning

Q.6 What does OLAP stand for?

Online Analytical Processing

Offline Analytical Processing

Online Application Processing

Offline Application Protocol

Explanation - OLAP tools allow fast analysis of multidimensional data from multiple perspectives.

Correct answer is: Online Analytical Processing

Q.7 Which is a classification algorithm in Data Mining?

K-means

Decision Tree

Apriori

DBSCAN

Explanation - Decision Trees are supervised classification algorithms, whereas K-means and DBSCAN are clustering algorithms and Apriori is an association rule mining algorithm.

Correct answer is: Decision Tree

Q.8 The star schema is used in:

Database normalization

Data warehousing

Transaction processing

Operating system design

Explanation - The star schema is a common modeling technique in data warehouses for organizing fact and dimension tables.

Correct answer is: Data warehousing

Q.9 Which data mining task finds relationships among variables?

Classification

Clustering

Association rule mining

Regression

Explanation - Association rule mining identifies interesting relationships among variables, as in market basket analysis.

Correct answer is: Association rule mining

Q.10 Which of these is an unsupervised learning task in data mining?

Classification

Regression

Clustering

Prediction

Explanation - Clustering is unsupervised as it groups similar data points without predefined labels.

Correct answer is: Clustering

Q.11 Data in a data warehouse is typically:

Volatile

Normalized

Time-variant and non-volatile

Temporary

Explanation - Data warehouses store historical data tied to time periods (time-variant) that, once loaded, is read for analysis rather than frequently updated (non-volatile).

Correct answer is: Time-variant and non-volatile

Q.12 Which technique is commonly used for market basket analysis?

Regression

Apriori Algorithm

Naive Bayes

K-means clustering

Explanation - The Apriori algorithm discovers frequent itemsets and association rules useful in basket analysis.

Correct answer is: Apriori Algorithm
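
A simplified sketch of the level-wise idea behind Apriori is shown here, assuming a tiny hypothetical set of baskets and a hypothetical `min_support` threshold. It finds frequent itemsets only and omits the rule-generation step and the optimizations a real implementation would use.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate of size k.
        level = {}
        for cand in current:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With these baskets, the pair {milk, butter} appears in only one of four transactions, so it is discarded, and the three-item candidate is pruned without being counted.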

Q.13 Which of these is NOT a data warehouse characteristic?

Subject-oriented

Volatile

Integrated

Time-variant

Explanation - A data warehouse is non-volatile, meaning data is stable and primarily used for analysis.

Correct answer is: Volatile

Q.14 The process of cleaning and preparing data before mining is known as:

Data enrichment

Data preprocessing

Data normalization

Data annotation

Explanation - Preprocessing involves cleaning, transforming, and reducing data before applying mining techniques.

Correct answer is: Data preprocessing
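
As a small illustration of two common preprocessing steps, the sketch below fills missing values with the mean (cleaning) and rescales the result to the range 0 to 1 (transformation). The function name, the choice of mean imputation, and the sample values are assumptions made for the example; data reduction is not shown.

```python
def preprocess(values):
    """Fill missing entries with the mean, then min-max scale to [0, 1]."""
    # Cleaning: replace None with the mean of the observed values.
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]

    # Transformation: min-max scaling; a constant column maps to all zeros.
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]

scaled = preprocess([10.0, None, 30.0, 20.0])
print(scaled)  # [0.0, 0.5, 1.0, 0.5]
```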

Q.15 In a star schema, dimension tables are usually:

Normalized

Denormalized

Temporal

Volatile

Explanation - Dimension tables are often denormalized to improve query performance in star schemas.

Correct answer is: Denormalized

Q.16 Which of these algorithms is used for clustering?

Apriori

K-means

Naive Bayes

Decision Tree

Explanation - K-means is a popular clustering algorithm that partitions data into k groups.

Correct answer is: K-means
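
The partitioning idea can be sketched for one-dimensional data as follows. This is a minimal version under simplifying assumptions: the starting centroids are supplied by hand rather than chosen randomly, the number of iterations is fixed, and the sample points are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

Here k = 2, and no labels are supplied at any point, which is what makes the method unsupervised.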

Q.17 The snowflake schema is a variation of:

Entity Relationship Model

Star schema

Network model

Hierarchical model

Explanation - The snowflake schema normalizes the dimension tables of a star schema into multiple related tables.

Correct answer is: Star schema

Q.18 Which measure indicates the strength of an association rule?

Support

Confidence

Entropy

Variance

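Explanation - For the rule X → Y, confidence is the fraction of transactions containing X that also contain Y, that is, support(X ∪ Y) / support(X). Support, by contrast, measures how often the itemset occurs at all.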

Correct answer is: Confidence
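
A short worked example may help separate the two measures. The baskets below are hypothetical, and the helper names `support` and `confidence` are chosen for this sketch only.

```python
# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X) for the rule X -> Y."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

sup = support({"bread", "milk"}, transactions)        # 2 of 4 baskets
conf = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
print(sup, round(conf, 3))
```

For the rule bread → milk, support is 0.5 while confidence is about 0.667, because milk appears in two of the three baskets that contain bread.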

Q.19 Which process ensures data is free from errors before entering the warehouse?

Data normalization

Data cleaning

Data transformation

Data modeling

Explanation - Data cleaning removes inaccuracies and inconsistencies to ensure reliable warehouse data.

Correct answer is: Data cleaning

Q.20 Regression in data mining is used for:

Finding clusters

Predicting continuous values

Discovering frequent patterns

Classification

Explanation - Regression techniques model and predict continuous numerical outcomes.

Correct answer is: Predicting continuous values
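
One simple regression technique is a least-squares straight line. The sketch below fits such a line to a handful of hypothetical points and uses it to predict a new value; it is one illustrative method, not the only form regression takes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict the continuous value at x = 5
print(slope, intercept, prediction)
```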

Q.21 What does dimensionality reduction help with?

Increasing number of variables

Reducing irrelevant features

Increasing warehouse storage

Generating duplicate data

Explanation - Dimensionality reduction removes irrelevant or redundant variables, which can improve efficiency and often accuracy.

Correct answer is: Reducing irrelevant features

Q.22 A cube in OLAP represents:

Single data point

Multidimensional data model

Flat file storage

Hierarchical structure

Explanation - OLAP cubes allow multidimensional analysis of data with measures and dimensions.

Correct answer is: Multidimensional data model
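
To make the idea of measures and dimensions concrete, the sketch below models a tiny cube as a dictionary keyed by dimension values, with sales as the measure. The dimension names, the data, and the helper functions `roll_up` and `slice_cube` are all hypothetical; real OLAP engines store and index cubes very differently.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical cube cells: (region, product, quarter) -> sales measure.
cells = {
    ("East", "Pen", "Q1"): 100,
    ("East", "Pen", "Q2"): 120,
    ("East", "Lamp", "Q1"): 80,
    ("West", "Pen", "Q1"): 90,
}

def roll_up(cells, keep):
    """Aggregate the measure over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)

def slice_cube(cells, dimension, value):
    """Keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cells.items() if k[i] == value}

by_region = roll_up(cells, keep=("region",))
q1_only = slice_cube(cells, "quarter", "Q1")
print(by_region)  # {('East',): 300, ('West',): 90}
print(len(q1_only))  # 3
```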

Q.23 Which of these is a supervised learning algorithm?

K-means

Decision Tree

Hierarchical clustering

DBSCAN

Explanation - Decision trees use labeled training data, making them supervised learning algorithms.

Correct answer is: Decision Tree

Q.24 The fact table in a star schema contains:

Detailed transactions

Aggregated measures and keys to dimension tables

Normalized attributes

Metadata only

Explanation - Fact tables store quantitative measures and foreign keys linking to dimensions.

Correct answer is: Aggregated measures and keys to dimension tables
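
A minimal star schema can be sketched with SQLite from Python. The table names, columns, and rows below are hypothetical; the point is only that the fact table holds measures plus foreign keys, and that queries join it to the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 1, 25.0);
""")

# Typical analytical query: join the fact table to a dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Home', 25.0), ('Stationery', 16.0)]
```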

Q.25 Which term refers to discovering previously unknown but useful information from data?

Data preprocessing

Data cleaning

Data mining

Data transformation

Explanation - Data mining uncovers patterns and knowledge hidden in large datasets.

Correct answer is: Data mining
