Q.1 What is an outlier in data mining?
A data point that fits perfectly into a cluster
A value that significantly deviates from other observations
A missing value in the dataset
A duplicate entry in the dataset
Explanation - An outlier is a data point that is different from the majority of the data and may indicate anomalies or rare events.
Correct answer is: A value that significantly deviates from other observations
Q.2 Which technique is most commonly used for detecting outliers?
Clustering
Classification
Regression
Distance-based methods
Explanation - Distance-based methods are widely used to detect outliers: a point is flagged when it lies far from its nearest neighbors or has too few neighbors within a given radius.
Correct answer is: Distance-based methods
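One way to make the distance-based idea concrete is to score each point by the distance to its k-th nearest neighbor. The following is a minimal sketch, not a reference implementation; the function name `knn_distance_scores`, the sample points, and the choice k=2 are assumptions made for illustration.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor (hypothetical helper)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Illustrative data: four nearby points and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
most_isolated = max(range(len(points)), key=scores.__getitem__)
print(points[most_isolated])  # (8.0, 8.0)
```

A larger score means the point is more isolated; where to put the cutoff is a separate decision that depends on the data.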
Q.3 Which algorithm is often used for clustering-based outlier detection?
K-Means
Naïve Bayes
Decision Trees
Apriori
Explanation - K-Means can identify outliers as data points that are far from their cluster centroids.
Correct answer is: K-Means
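The centroid-distance idea can be sketched with a small K-Means loop written in plain Python. Everything specific here is an assumption for illustration: the helper name `kmeans`, the fixed starting centroids, the sample points, and the rule of flagging points farther than three times the median centroid distance.

```python
import math
from statistics import median

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations from the given starting centroids (hypothetical helper)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids

points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 10), (10, 9), (10, 10), (5, 16)]
centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

# Distance of every point to its closest centroid.
distances = [min(math.dist(p, c) for c in centroids) for p in points]
cutoff = 3 * median(distances)  # assumed threshold, not a standard value
outliers = [p for p, d in zip(points, distances) if d > cutoff]
print(outliers)  # [(5, 16)]
```

Note that the far point also drags its centroid toward itself, which is one reason K-Means on its own is sensitive to outliers.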
Q.4 In pattern analysis, frequent pattern mining is mainly concerned with?
Finding rare items
Finding common itemsets
Eliminating duplicates
Finding anomalies
Explanation - Frequent pattern mining focuses on identifying items or sets of items that appear frequently in the dataset.
Correct answer is: Finding common itemsets
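To illustrate what "frequent" means, the sketch below counts every itemset of size one or two and keeps those whose support (the fraction of transactions containing the itemset) meets a threshold. It is a brute-force count, not the Apriori algorithm; the function name, the transactions, and the 0.6 threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for small itemsets (hypothetical helper)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # de-duplicate and fix an order within the transaction
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
]
frequent = frequent_itemsets(transactions, min_support=0.6)
print(frequent)
# {('bread',): 0.8, ('milk',): 0.8, ('bread', 'milk'): 0.6}
```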
Q.5 Which of the following is a density-based outlier detection method?
DBSCAN
KNN
Decision Trees
Linear Regression
Explanation - DBSCAN is a density-based clustering algorithm that can also identify outliers as points in low-density regions.
Correct answer is: DBSCAN
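The way DBSCAN labels noise can be sketched without building the clusters themselves: a point is noise when it is not a core point and no core point lies within `eps` of it. This is a simplified illustration, not a full DBSCAN; the function name `dbscan_noise`, the sample points, and the `eps` and `min_pts` values are assumptions.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return the points DBSCAN would label as noise (cluster assignment omitted)."""
    # Neighborhood of each point: indices within eps, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # Core points have at least min_pts points in their neighborhood.
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    return [
        points[i]
        for i, nb in enumerate(neighbors)
        if i not in core and not any(j in core for j in nb)
    ]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (3, 10)]
noise = dbscan_noise(points, eps=1.5, min_pts=3)
print(noise)  # [(3, 10)]
```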
Q.6 Which measure is commonly used in distance-based outlier detection?
Manhattan distance
Cosine similarity
Euclidean distance
Hamming distance
Explanation - Euclidean distance is widely used to measure the distance of points from one another for outlier detection.
Correct answer is: Euclidean distance
Q.7 What is the role of z-score in outlier detection?
It identifies missing values
It measures how far a point is from the mean
It clusters data points
It builds classification models
Explanation - The z-score standardizes data and helps detect outliers by showing how many standard deviations a value lies from the mean; values with an absolute z-score above about 3 are commonly treated as outliers.
Correct answer is: It measures how far a point is from the mean
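A short sketch of z-score screening follows. The sample values and the cutoff of 3 are assumptions for illustration, and the population standard deviation is used for simplicity.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12, 13, 11, 10, 12, 40]
scores = z_scores(values)
outliers = [v for v, z in zip(values, scores) if abs(z) > 3]
print(outliers)  # [40]
```

Because the extreme value itself inflates the standard deviation, very small samples can never reach a z-score of 3, so the cutoff may need adjusting for short series.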
Q.8 Which statistical method can be used for outlier detection?
Mean shift
Boxplot analysis
Gradient descent
Principal component analysis
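Explanation - A boxplot draws values beyond the whiskers, conventionally more than 1.5 times the interquartile range below the first quartile or above the third, as individual points that are potential outliers.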
Correct answer is: Boxplot analysis
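The whisker rule behind a boxplot can be applied numerically. The sketch below assumes the conventional 1.5 x IQR fences and uses illustrative data; the function name `iqr_outliers` is hypothetical, and the exact quartile values depend on the interpolation method (here the default of `statistics.quantiles`).

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
outliers = iqr_outliers(values)
print(outliers)  # [40]
```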
Q.9 Which type of outliers occur due to system errors?
Contextual outliers
Collective outliers
Erroneous outliers
Global outliers
Explanation - Erroneous outliers occur due to errors such as incorrect data entry or faulty sensors.
Correct answer is: Erroneous outliers
Q.10 Contextual outliers are also known as?
Conditional outliers
Global outliers
Noise points
Extreme anomalies
Explanation - Contextual outliers are data points considered normal in one context but abnormal in another, hence called conditional outliers.
Correct answer is: Conditional outliers
Q.11 Which method finds outliers by examining data distribution?
Histogram analysis
Decision tree splitting
Hashing
Apriori algorithm
Explanation - A histogram shows how values are distributed across bins; isolated, sparsely populated bins separated by gaps from the bulk of the data point to potential outliers.
Correct answer is: Histogram analysis
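As a rough sketch of histogram-based screening, the code below places values into equal-width bins and reports the values that fall into sparsely populated bins. The bin width, the minimum count, the data, and the function name are all assumptions for illustration; real use would require choosing the bin width with care.

```python
from collections import Counter

def sparse_bin_values(values, bin_width, min_count=2):
    """Return values whose histogram bin holds fewer than min_count values."""
    bins = Counter(int(v // bin_width) for v in values)
    return [v for v in values if bins[int(v // bin_width)] < min_count]

values = [21, 23, 22, 25, 24, 26, 28, 27, 22, 24, 95]
flagged = sparse_bin_values(values, bin_width=10)
print(flagged)  # [95]
```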
Q.12 What type of data is more prone to outliers?
Categorical data
Time-series data
Binary data
Nominal data
Explanation - Time-series data often exhibits outliers due to unexpected events or anomalies in trends.
Correct answer is: Time-series data
Q.13 Which of the following is an application of outlier detection?
Spam detection
Sorting data
Data normalization
Clustering
Explanation - Outlier detection techniques are often applied in spam filtering to identify unusual email patterns.
Correct answer is: Spam detection
Q.14 Which algorithm is suitable for high-dimensional outlier detection?
LOF (Local Outlier Factor)
Naïve Bayes
Apriori
Linear Regression
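Explanation - Of the options listed, LOF is the only outlier detection method; it scores each point by how much its local density deviates from that of its neighbors. Because it relies on distances, its results can still degrade as dimensionality grows.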
Correct answer is: LOF (Local Outlier Factor)
Q.15 In clustering, what do outliers usually represent?
Points with highest density
Points that do not belong to any cluster
Centroid points
Boundary points of clusters
Explanation - Outliers in clustering are data points that do not clearly fit into any of the defined clusters.
Correct answer is: Points that do not belong to any cluster
Q.16 Which distance measure is better for categorical data outlier detection?
Euclidean distance
Hamming distance
Cosine similarity
Jaccard similarity
Explanation - Hamming distance is suitable for categorical or binary data, counting mismatches between attributes.
Correct answer is: Hamming distance
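A small sketch shows how Hamming distance can rank categorical records by how unusual they are. The records, the helper name `hamming`, and the use of the average distance to all other records as a score are assumptions for illustration.

```python
def hamming(a, b):
    """Count the attribute positions where two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))

records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
]

# Average Hamming distance from each record to all the others.
avg_distance = [
    sum(hamming(r, other) for j, other in enumerate(records) if j != i) / (len(records) - 1)
    for i, r in enumerate(records)
]
most_unusual = records[max(range(len(records)), key=avg_distance.__getitem__)]
print(most_unusual)  # ('blue', 'large', 'square')
```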
Q.17 Which of the following is NOT a type of outlier?
Global
Contextual
Collective
Cumulative
Explanation - The three major types of outliers are global, contextual, and collective. Cumulative is not recognized as a type.
Correct answer is: Cumulative
Q.18 Which of the following is a real-world example of contextual outlier?
A very tall student in a class
A cold day in summer
Duplicate customer entries
System log errors
Explanation - A cold day is not unusual in winter, but in summer it is an outlier due to the contextual season factor.
Correct answer is: A cold day in summer
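The season example can be sketched by computing z-scores within each context rather than over the whole dataset. The temperatures, the cutoff of 2, and the function name `contextual_outliers` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def contextual_outliers(readings, threshold=2.0):
    """Flag (context, value) pairs that are unusual within their own context."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)
    # Mean and standard deviation computed separately for each context.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_context.items()}
    return [
        (context, value)
        for context, value in readings
        if abs(value - stats[context][0]) / stats[context][1] > threshold
    ]

summer = [29, 31, 30, 28, 32, 30, 12]  # degrees Celsius, illustrative
winter = [5, 8, 12, 7, 10, 12, 9]
readings = [("summer", t) for t in summer] + [("winter", t) for t in winter]
flagged = contextual_outliers(readings)
print(flagged)  # [('summer', 12)]
```

The same reading of 12 appears in both seasons, but only the summer occurrence is flagged, which is what makes the outlier contextual rather than global.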
Q.19 Which factor is crucial in density-based outlier detection?
Average mean
Standard deviation
Neighborhood density
Median
Explanation - Density-based methods compare the density of neighborhoods to identify points with relatively low density as outliers.
Correct answer is: Neighborhood density
Q.20 What does the Local Outlier Factor (LOF) measure?
Relative density deviation of a data point
Global distance of a point
Z-score value
Mean and variance
Explanation - LOF compares the local density of a point with its neighbors to determine its degree of being an outlier.
Correct answer is: Relative density deviation of a data point
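The LOF computation can be sketched in plain Python. This is a simplified version under stated assumptions: each point takes exactly k neighbors (ties broken by index), there are no duplicate points, and the function name, data, and k=3 are chosen for illustration. Scores near 1 indicate density similar to the neighbors; scores well above 1 indicate a point in a sparser region.

```python
import math

def local_outlier_factor(points, k=3):
    """Simplified LOF: ratio of the neighbors' local density to the point's own."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point.
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[o], dist[i][o]) for o in knn[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[o] for o in knn[i]) / (k * lrds[i]) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (6, 6)]
scores = local_outlier_factor(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

In this data the five clustered points score close to 1, while the isolated point (6, 6) scores far higher.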
Q.21 Which approach combines clustering and outlier detection?
Hybrid approach
Ensemble clustering
Boosting
Decision trees
Explanation - Hybrid approaches integrate clustering with outlier detection, for example by clustering the data first and then scoring each point by how poorly it fits its assigned cluster.
Correct answer is: Hybrid approach
Q.22 Why is outlier detection important in fraud detection?
Fraudulent transactions often follow common patterns
Fraudulent transactions usually appear as anomalies
Fraud detection requires clustering
Fraud detection uses classification only
Explanation - Fraudulent activities are uncommon and deviate from normal behavior, making them detectable as outliers.
Correct answer is: Fraudulent transactions usually appear as anomalies
Q.23 Which is a limitation of distance-based outlier detection?
Requires labeled data
Not suitable for low-dimensional data
Sensitive to data scale and dimensionality
Cannot be used for numeric data
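Explanation - Distance-based methods depend on the scale of each attribute, so a feature with a large numeric range can dominate the distance unless the data is normalized. They also suffer in high-dimensional data due to the curse of dimensionality, where distances between points become nearly indistinguishable.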
Correct answer is: Sensitive to data scale and dimensionality
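The scale sensitivity is easy to demonstrate: changing the unit of one attribute can change which point counts as the nearest neighbor. The records (income, age) and the helper name `nearest` are assumptions for illustration.

```python
import math

def nearest(index, rows):
    """Index of the row closest (Euclidean) to rows[index], excluding itself."""
    return min(
        (j for j in range(len(rows)) if j != index),
        key=lambda j: math.dist(rows[index], rows[j]),
    )

# (income, age) with income in dollars: income differences dominate the distance.
dollars = [(50_000, 25), (50_500, 60), (52_000, 26)]
# The same records with income expressed in thousands of dollars.
thousands = [(income / 1000, age) for income, age in dollars]

print(nearest(0, dollars))    # 1 (the 60-year-old with a similar income)
print(nearest(0, thousands))  # 2 (the 26-year-old with a slightly higher income)
```

Normalizing or standardizing each attribute before computing distances is the usual way to keep one unit choice from deciding the result.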
Q.24 Which visualization is most helpful for outlier detection?
Line chart
Boxplot
Pie chart
Histogram
Explanation - Boxplots are widely used to easily visualize and detect potential outliers.
Correct answer is: Boxplot
Q.25 In pattern analysis, what is sequential pattern mining?
Finding patterns in unordered sets
Finding patterns in ordered sequences
Detecting duplicate items
Clustering time-series
Explanation - Sequential pattern mining identifies frequent patterns where order of items is important, e.g., purchase sequences.
Correct answer is: Finding patterns in ordered sequences
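The role of order can be shown by measuring the support of a pattern as an ordered subsequence: the pattern's items must appear in the same order, though not necessarily next to each other. The purchase sequences and the function names are assumptions for illustration; this only counts support for a given pattern and does not search for patterns.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator up to the match, preserving order.
    return all(item in it for item in pattern)

def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as an ordered subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

purchases = [
    ["phone", "case", "charger"],
    ["phone", "charger", "case"],
    ["case", "phone"],
    ["phone", "headphones", "case"],
]
print(support(["phone", "case"], purchases))  # 0.75
print(support(["case", "phone"], purchases))  # 0.25
```

The two patterns contain the same items, yet their supports differ, which is exactly what distinguishes sequential pattern mining from frequent itemset mining.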
