Q.1 What is Data Mining primarily concerned with?
Data storage
Knowledge discovery
Network design
Software development
Explanation - Data Mining is the process of discovering useful patterns and knowledge from large sets of data.
Correct answer is: Knowledge discovery
Q.2 Which of the following best describes Data Warehousing?
Real-time transaction processing
Storage of raw data without processing
Subject-oriented, integrated, time-variant, and non-volatile collection of data
Random data storage
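Explanation - A data warehouse is organized around business subjects, integrates data from multiple sources, keeps historical (time-variant) data, and is not updated in place (non-volatile), all in support of decision-making.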
Correct answer is: Subject-oriented, integrated, time-variant, and non-volatile collection of data
Q.3 ETL in data warehousing stands for?
Extract, Transform, Load
Enter, Transfer, Log
Execute, Test, Learn
Extract, Transfer, List
Explanation - ETL is the process of extracting data from sources, transforming it into the required format, and loading it into the warehouse.
Correct answer is: Extract, Transform, Load
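Example - A minimal Python sketch of the three ETL steps, using only the standard library. The CSV content, the table name `orders`, and the function names are hypothetical and chosen for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent name casing.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,BOB,20.00
3,Alice,5.25
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize customer names and convert fields to numbers."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(conn, records):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

# An in-memory database stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

After the transform step, "alice" and "Alice" are treated as the same customer, which is why the query groups them together.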
Q.4 Which of these is NOT a step in the Knowledge Discovery in Databases (KDD) process?
Data cleaning
Data integration
Data mining
Software debugging
Explanation - KDD involves data cleaning, integration, selection, transformation, mining, and pattern evaluation. Software debugging is not one of its steps.
Correct answer is: Software debugging
Q.5 Which type of learning is most commonly associated with Data Mining?
Supervised and Unsupervised learning
Reinforcement learning only
Transfer learning only
Machine teaching
Explanation - Data mining tasks are often divided into supervised learning (for example, classification and regression) and unsupervised learning (for example, clustering and association rule mining).
Correct answer is: Supervised and Unsupervised learning
Q.6 What does OLAP stand for?
Online Analytical Processing
Offline Analytical Processing
Online Application Processing
Offline Application Protocol
Explanation - OLAP tools allow fast analysis of multidimensional data from multiple perspectives.
Correct answer is: Online Analytical Processing
Q.7 Which is a classification algorithm in Data Mining?
K-means
Decision Tree
Apriori
DBSCAN
Explanation - Decision Trees are supervised classification algorithms, whereas K-means and DBSCAN are clustering algorithms and Apriori is used for association rule mining.
Correct answer is: Decision Tree
Q.8 The star schema is used in:
Database normalization
Data warehousing
Transaction processing
Operating system design
Explanation - The star schema is a common modeling technique in data warehouses for organizing fact and dimension tables.
Correct answer is: Data warehousing
Q.9 Which data mining task finds relationships among variables?
Classification
Clustering
Association rule mining
Regression
Explanation - Association rule mining identifies interesting co-occurrence relationships among items or variables, as in market basket analysis.
Correct answer is: Association rule mining
Q.10 Which of these is an unsupervised learning task in data mining?
Classification
Regression
Clustering
Prediction
Explanation - Clustering is unsupervised as it groups similar data points without predefined labels.
Correct answer is: Clustering
Q.11 Data in a data warehouse is typically:
Volatile
Normalized
Time-variant and non-volatile
Temporary
Explanation - Data warehouses store stable, historical data that is not frequently updated.
Correct answer is: Time-variant and non-volatile
Q.12 Which technique is commonly used for market basket analysis?
Regression
Apriori Algorithm
Naive Bayes
K-means clustering
Explanation - The Apriori algorithm discovers frequent itemsets and association rules useful in basket analysis.
Correct answer is: Apriori Algorithm
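Example - A minimal Python sketch of the level-wise frequent-itemset search at the heart of Apriori. The function name `apriori`, the sample baskets, and the 0.5 support threshold are assumptions for illustration; the sketch stops at frequent itemsets and does not generate rules.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: {bread, milk, butter} is never counted because its subset {milk, butter} is already known to be infrequent.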
Q.13 Which of these is NOT a data warehouse characteristic?
Subject-oriented
Volatile
Integrated
Time-variant
Explanation - A data warehouse is non-volatile, meaning data is stable and primarily used for analysis.
Correct answer is: Volatile
Q.14 The process of cleaning and preparing data before mining is known as:
Data enrichment
Data preprocessing
Data normalization
Data annotation
Explanation - Preprocessing involves cleaning, transforming, and reducing data before applying mining techniques.
Correct answer is: Data preprocessing
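Example - A minimal Python sketch of two common preprocessing steps: filling missing values and rescaling. The function name `preprocess`, mean imputation, and min-max scaling are illustrative choices, not the only options.

```python
def preprocess(values):
    """Fill missing values (None) with the mean, then min-max scale to [0, 1]."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    # Cleaning: replace each missing entry with the mean of the known values.
    filled = [mean if v is None else v for v in values]
    # Transformation: rescale so the smallest value is 0 and the largest is 1.
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]

# Hypothetical attribute with one missing value.
ages = [20, None, 40, 60]
scaled = preprocess(ages)
print(scaled)
```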
Q.15 In a star schema, dimension tables are usually:
Normalized
Denormalized
Temporal
Volatile
Explanation - Dimension tables are often denormalized to improve query performance in star schemas.
Correct answer is: Denormalized
Q.16 Which of these algorithms is used for clustering?
Apriori
K-means
Naive Bayes
Decision Tree
Explanation - K-means is a popular clustering algorithm that partitions data into k groups.
Correct answer is: K-means
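Example - A minimal Python sketch of the K-means loop on one-dimensional data. The function name `kmeans_1d` is hypothetical, and the starting centroids are supplied by the caller to keep the run deterministic; practical implementations usually pick them randomly.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Partition 1-D points into len(centroids) groups (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data with two obvious groups; k = 2.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```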
Q.17 The snowflake schema is a variation of:
Entity Relationship Model
Star schema
Network model
Hierarchical model
Explanation - Snowflake schema normalizes dimension tables of a star schema into multiple related tables.
Correct answer is: Star schema
Q.18 Which measure indicates the strength of an association rule?
Support
Confidence
Entropy
Variance
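Explanation - Confidence measures how often items in Y appear in transactions that contain X in the rule X → Y, that is, support(X ∪ Y) / support(X). Support, by contrast, measures how frequently the itemset occurs in the data overall.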
Correct answer is: Confidence
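Example - A minimal Python sketch that computes support and confidence for a rule X → Y. The function names and the sample baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
butter_to_bread = confidence(baskets, {"butter"}, {"bread"})
bread_to_butter = confidence(baskets, {"bread"}, {"butter"})
print(butter_to_bread, bread_to_butter)
```

Confidence is not symmetric: every basket with butter also has bread, but only two of the three baskets with bread also have butter.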
Q.19 Which process ensures data is free from errors before entering the warehouse?
Data normalization
Data cleaning
Data transformation
Data modeling
Explanation - Data cleaning removes inaccuracies and inconsistencies to ensure reliable warehouse data.
Correct answer is: Data cleaning
Q.20 Regression in data mining is used for:
Finding clusters
Predicting continuous values
Discovering frequent patterns
Classification
Explanation - Regression techniques model and predict continuous numerical outcomes.
Correct answer is: Predicting continuous values
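Example - A minimal Python sketch of simple linear regression fitted by ordinary least squares. The function name `fit_line` and the sample data are hypothetical; the data lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict a continuous value for an unseen input.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```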
Q.21 What does dimensionality reduction help with?
Increasing number of variables
Reducing irrelevant features
Increasing warehouse storage
Generating duplicate data
Explanation - Dimensionality reduction removes redundant or irrelevant variables, which improves efficiency and can also improve model accuracy.
Correct answer is: Reducing irrelevant features
Q.22 A cube in OLAP represents:
Single data point
Multidimensional data model
Flat file storage
Hierarchical structure
Explanation - OLAP cubes allow multidimensional analysis of data with measures and dimensions.
Correct answer is: Multidimensional data model
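Example - A minimal Python sketch of two cube operations, roll-up and slice, over a small set of fact rows. The dimension names, the sample sales figures, and the function names `roll_up` and `slice_cube` are hypothetical; real OLAP engines precompute and index these aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
DIMENSIONS = ("region", "product", "quarter")
facts = [
    ("North", "Widget", "Q1", 100),
    ("North", "Gadget", "Q1", 150),
    ("South", "Widget", "Q1", 80),
    ("North", "Widget", "Q2", 120),
]

def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_cube(rows, dimension, value):
    """Fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]

by_region = roll_up(facts, ["region"])
q1_by_product = roll_up(slice_cube(facts, "quarter", "Q1"), ["product"])
print(by_region)
print(q1_by_product)
```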
Q.23 Which of these is a supervised learning algorithm?
K-means
Decision Tree
Hierarchical clustering
DBSCAN
Explanation - Decision trees use labeled training data, making them supervised learning algorithms.
Correct answer is: Decision Tree
Q.24 The fact table in a star schema contains:
Detailed transactions
Numeric measures and foreign keys to dimension tables
Normalized attributes
Metadata only
Explanation - Fact tables store quantitative measures, recorded at a chosen level of detail (the grain), together with foreign keys linking to the dimension tables.
Correct answer is: Numeric measures and foreign keys to dimension tables
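Example - A minimal Python sketch of a star schema in an in-memory SQLite database: one fact table holding measures and foreign keys, joined to two dimension tables. The table names, column names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_date VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 10, 100.0), (2, 1, 5, 75.0), (1, 2, 12, 120.0);
""")

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
SELECT p.name, d.quarter, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.name, d.quarter
ORDER BY p.name, d.quarter
""").fetchall()
print(rows)
```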
Q.25 Which term refers to discovering previously unknown but useful information from data?
Data preprocessing
Data cleaning
Data mining
Data transformation
Explanation - Data mining uncovers patterns and knowledge hidden in large datasets.
Correct answer is: Data mining
