Pattern Analysis and Outlier Detection: MCQs Practice Set

Q.1 What is an outlier in data mining?

A data point that fits perfectly into a cluster

A value that significantly deviates from other observations

A missing value in the dataset

A duplicate entry in the dataset

Explanation - An outlier is a data point that is different from the majority of the data and may indicate anomalies or rare events.

Correct answer is: A value that significantly deviates from other observations

Q.2 Which technique is most commonly used for detecting outliers?

Clustering

Classification

Regression

Distance-based methods

Explanation - Distance-based methods are widely used to detect outliers by measuring how far data points deviate from others.

Correct answer is: Distance-based methods
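One common distance-based scheme scores each point by its Euclidean distance to its k-th nearest neighbor: the farther that neighbor, the more isolated the point. This is a minimal sketch assuming a hypothetical k=2 and made-up 2-D data; the function name knn_distance_scores is illustrative, not a library API.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point is farther from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: a tight group plus one distant point.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_distance_scores(data, k=2)
most_isolated = max(range(len(data)), key=scores.__getitem__)
print(data[most_isolated])  # (8.0, 8.0)
```

In practice the top-n scores, or the scores above a chosen cutoff, are reported as outliers; k controls how large a group must be before its members stop looking isolated.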

Q.3 Which algorithm is often used for clustering-based outlier detection?

K-Means

Naïve Bayes

Decision Trees

Apriori

Explanation - K-Means can identify outliers as data points that are far from their cluster centroids.

Correct answer is: K-Means
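A minimal sketch of the centroid-distance idea, assuming plain Lloyd's k-means with the first k points as initial centroids and a hypothetical rule that flags points lying more than twice the average point-to-centroid distance from their own centroid. Note that K-Means assigns every point, including an outlier, to some cluster, and the outlier pulls that cluster's centroid toward itself.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]

    def assign():
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iterations):
        labels = assign()
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assign()

def centroid_outliers(points, k, factor=2.0):
    """Flag points farther from their centroid than factor times the mean such distance."""
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = factor * sum(dists) / len(dists)
    return [p for p, d in zip(points, dists) if d > threshold]

# Hypothetical data: two compact clusters and one stray point.
data = [(1, 1), (10, 10), (1, 2), (2, 1), (2, 2), (10, 11), (11, 10), (11, 11), (6, 18)]
print(centroid_outliers(data, k=2))  # [(6, 18)]
```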

Q.4 In pattern analysis, frequent pattern mining is mainly concerned with?

Finding rare items

Finding common itemsets

Eliminating duplicates

Finding anomalies

Explanation - Frequent pattern mining focuses on identifying items or sets of items that appear frequently in the dataset.

Correct answer is: Finding common itemsets
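Frequent pattern mining rests on support, the fraction of transactions that contain an itemset. The sketch below counts support by brute force for itemsets of up to a hypothetical maximum size of two. It is not Apriori or FP-Growth, which exist precisely to prune this enumeration, and the basket data and min_support threshold are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Brute-force enumeration for illustration only; Apriori and FP-Growth
    avoid generating every candidate.
    """
    counts = Counter()
    for basket in transactions:
        # Sort items so each itemset is counted under one canonical tuple.
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
for itemset, support in sorted(frequent_itemsets(baskets, min_support=0.5).items()):
    print(itemset, support)
```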

Q.5 Which of the following is a density-based outlier detection method?

DBSCAN

KNN

Decision Trees

Linear Regression

Explanation - DBSCAN is a density-based clustering algorithm that labels points in low-density regions, which belong to no cluster, as noise; these noise points are treated as outliers.

Correct answer is: DBSCAN
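In DBSCAN, a core point has at least min_pts points (itself included) within radius eps; a non-core point within eps of a core point is a border point, and every other point is noise. The sketch below computes only that noise labeling, not the full cluster expansion, and the eps and min_pts values and data are hypothetical.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return the points DBSCAN would label as noise (no cluster expansion)."""
    # Core points: at least min_pts points (the point itself included) within eps.
    core = [
        p for p in points
        if sum(1 for q in points if math.dist(p, q) <= eps) >= min_pts
    ]
    # Noise: no core point within eps. A core point is within eps of itself,
    # so this also excludes core points; border points are excluded as well.
    return [p for p in points if not any(math.dist(p, c) <= eps for c in core)]

# Hypothetical data: a dense square, a border point (1.8, 1.0) next to the
# core point (1, 1), and an isolated point (5, 5).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1.8, 1.0), (5, 5)]
print(dbscan_noise(data, eps=1.0, min_pts=4))  # [(5, 5)]
```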

Q.6 Which measure is commonly used in distance-based outlier detection?

Manhattan distance

Cosine similarity

Euclidean distance

Hamming distance

Explanation - Euclidean distance is widely used to measure the distance of points from one another for outlier detection.

Correct answer is: Euclidean distance

Q.7 What is the role of z-score in outlier detection?

It identifies missing values

It measures how far a point is from the mean

It clusters data points

It builds classification models

Explanation - The z-score standardizes data and helps detect outliers by showing how many standard deviations a value is away from the mean.

Correct answer is: It measures how far a point is from the mean
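The z-score of a value x is z = (x - mean) / standard deviation. A minimal sketch follows, using the population standard deviation and a cutoff on |z|. A cutoff of 3 is a common convention; the example passes 2.0 because with only nine values no point can reach |z| = 3 (with the population standard deviation, |z| cannot exceed sqrt(n - 1)). The readings are made up.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical sensor readings with one suspicious spike.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 40]
print(zscore_outliers(readings, threshold=2.0))  # [40]
```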

Q.8 Which statistical method can be used for outlier detection?

Mean shift

Boxplot analysis

Gradient descent

Principal component analysis

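Explanation - Boxplots mark values beyond the whiskers, conventionally more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile, as potential outliers.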

Correct answer is: Boxplot analysis
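The boxplot rule can be applied without drawing anything: values beyond 1.5 × IQR from the quartiles (Tukey's fences) are flagged. This sketch relies on Python's statistics.quantiles with its default "exclusive" method; other quartile conventions can shift the fences slightly, and the data are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements; Q1 = 8, Q3 = 11.5, so the upper fence is 16.75.
data = [7, 8, 8, 9, 10, 10, 11, 12, 30]
print(iqr_outliers(data))  # [30]
```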

Q.9 Which type of outlier occurs due to system errors?

Contextual outliers

Collective outliers

Erroneous outliers

Global outliers

Explanation - Erroneous outliers occur due to errors such as incorrect data entry or faulty sensors.

Correct answer is: Erroneous outliers

Q.10 Contextual outliers are also known as?

Conditional outliers

Global outliers

Noise points

Extreme anomalies

Explanation - Contextual outliers are data points that are normal in one context but abnormal in another; because their status depends on a context (condition), they are also called conditional outliers.

Correct answer is: Conditional outliers

Q.11 Which method finds outliers by examining data distribution?

Histogram analysis

Decision tree splitting

Hashing

Apriori algorithm

Explanation - Histograms show the shape of the distribution, so isolated bars in the tails or values separated from the bulk of the data by empty bins point to potential outliers.

Correct answer is: Histogram analysis
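A histogram-based check can be sketched by binning values into equal-width bins and flagging values that fall in nearly empty bins. The bin width and the min_count cutoff are hypothetical parameters, and the result depends strongly on them: bins that are too wide can hide an outlier, while bins that are too narrow make ordinary values look isolated.

```python
from collections import Counter

def histogram_outliers(values, bin_width, min_count=2):
    """Flag values that land in equal-width bins holding fewer than min_count values."""
    # Histogram: bin index -> number of values in that bin.
    counts = Counter(v // bin_width for v in values)
    return [v for v in values if counts[v // bin_width] < min_count]

# Hypothetical participant ages; one entry sits far from the rest.
ages = [21, 22, 22, 23, 23, 23, 24, 24, 25, 58]
print(histogram_outliers(ages, bin_width=10))  # [58]
```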

Q.12 Which type of data is especially prone to outliers?

Categorical data

Time-series data

Binary data

Nominal data

Explanation - Time-series data often exhibits outliers due to unexpected events or anomalies in trends.

Correct answer is: Time-series data

Q.13 Which of the following is an application of outlier detection?

Spam detection

Sorting data

Data normalization

Clustering

Explanation - Outlier detection techniques are often applied in spam filtering to identify unusual email patterns.

Correct answer is: Spam detection

Q.14 Which algorithm is suitable for high-dimensional outlier detection?

LOF (Local Outlier Factor)

Naïve Bayes

Apriori

Linear Regression

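Explanation - Of the listed options, LOF is the only algorithm designed specifically for outlier detection; it compares the local density of each point with that of its neighbors, so it can find outliers in regions of differing density. Like other distance-based methods, however, it can lose effectiveness as dimensionality grows, so dimensionality reduction or feature selection often helps on very high-dimensional data.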

Correct answer is: LOF (Local Outlier Factor)

Q.15 In clustering, what do outliers usually represent?

Points with highest density

Points that do not belong to any cluster

Centroid points

Boundary points of clusters

Explanation - Outliers in clustering are data points that do not clearly fit into any of the defined clusters.

Correct answer is: Points that do not belong to any cluster

Q.16 Which distance measure is better for categorical data outlier detection?

Euclidean distance

Hamming distance

Cosine similarity

Jaccard similarity

Explanation - Hamming distance is suitable for categorical or binary data, counting mismatches between attributes.

Correct answer is: Hamming distance
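Hamming distance counts the attributes on which two records disagree. A minimal sketch, assuming equal-length tuples of categorical values and a hypothetical outlier score equal to a record's average Hamming distance to all other records; the product records are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical product records: (color, size, material).
records = [
    ("red", "small", "cotton"),
    ("red", "small", "wool"),
    ("red", "medium", "cotton"),
    ("blue", "large", "silk"),
]

# Average Hamming distance from each record to every other record.
scores = [
    sum(hamming(r, s) for j, s in enumerate(records) if j != i) / (len(records) - 1)
    for i, r in enumerate(records)
]
print(records[scores.index(max(scores))])  # ('blue', 'large', 'silk')
```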

Q.17 Which of the following is NOT a type of outlier?

Global

Contextual

Collective

Cumulative

Explanation - The three major types of outliers are global, contextual, and collective. Cumulative is not recognized as a type.

Correct answer is: Cumulative

Q.18 Which of the following is a real-world example of contextual outlier?

A very tall student in a class

A cold day in summer

Duplicate customer entries

System log errors

Explanation - A cold day is not unusual in winter, but in summer it is an outlier because the season provides the context.

Correct answer is: A cold day in summer
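The cold-summer-day example can be made concrete by grouping readings by a context attribute (the season) and comparing each value only with the other values from the same season. This sketch uses a leave-one-out z-score so that a single extreme value cannot inflate the spread it is measured against; the threshold of 3 and the temperature data are hypothetical. The same 5 °C reading is normal in winter but flagged in summer, and it is not flagged when the season is ignored.

```python
import statistics
from collections import defaultdict

def contextual_outliers(readings, threshold=3.0):
    """Flag (context, value) pairs that are extreme relative to their own context.

    Each value is compared with the mean and population standard deviation of
    the other values that share its context (leave-one-out).
    """
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)

    flagged = []
    for context, value in readings:
        others = list(by_context[context])
        others.remove(value)  # exclude this reading (one occurrence)
        if len(others) < 2:
            continue  # too little context to judge
        mean = statistics.mean(others)
        std = statistics.pstdev(others)
        if std > 0 and abs(value - mean) / std > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical daily temperatures in degrees Celsius, tagged with the season.
temps = [
    ("summer", 30), ("summer", 32), ("summer", 31), ("summer", 29), ("summer", 5),
    ("winter", 4), ("winter", 6), ("winter", 5), ("winter", 3),
]
print(contextual_outliers(temps))                           # [('summer', 5)]
print(contextual_outliers([("all", t) for _, t in temps]))  # []
```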

Q.19 Which factor is crucial in density-based outlier detection?

Mean

Standard deviation

Neighborhood density

Median

Explanation - Density-based methods compare the density of neighborhoods to identify points with relatively low density as outliers.

Correct answer is: Neighborhood density

Q.20 What does the Local Outlier Factor (LOF) measure?

Relative density deviation of a data point

Global distance of a point

Z-score value

Mean and variance

Explanation - LOF compares the local density of a point with its neighbors to determine its degree of being an outlier.

Correct answer is: Relative density deviation of a data point
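The LOF of a point is the average ratio between its neighbors' local reachability density and its own: values near 1 mean the point is about as dense as its neighborhood, while values well above 1 suggest an outlier. This sketch follows the standard definition with two simplifications: it uses exactly k neighbors (ties broken arbitrarily instead of including every point at the k-distance), and it assumes no duplicate points. The choice k=2 and the data are hypothetical.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor for each point (simplified: exactly k neighbors,
    no duplicate points allowed)."""
    n = len(points)
    neighbors, k_distance = [], []
    for i in range(n):
        # (distance, index) pairs to all other points, nearest first.
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in order[:k]])
        k_distance.append(order[k - 1][0])  # distance to the k-th nearest neighbor

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a unit square plus one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for point, score in zip(data, lof_scores(data, k=2)):
    print(point, round(score, 2))
```

Here the four corners of the square each score 1.0, while (5, 5) scores about 6, so it is far less dense than its neighbors.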

Q.21 Which approach combines clustering and outlier detection?

Hybrid approach

Ensemble clustering

Boosting

Decision trees

Explanation - Hybrid approaches integrate clustering methods with outlier detection to improve accuracy.

Correct answer is: Hybrid approach

Q.22 Why is outlier detection important in fraud detection?

Fraudulent transactions often follow common patterns

Fraudulent transactions usually appear as anomalies

Fraud detection requires clustering

Fraud detection uses classification only

Explanation - Fraudulent activities are uncommon and deviate from normal behavior, making them detectable as outliers.

Correct answer is: Fraudulent transactions usually appear as anomalies

Q.23 Which is a limitation of distance-based outlier detection?

Requires labeled data

Not suitable for low-dimensional data

Sensitive to data scale and dimensionality

Cannot be used for numeric data

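Explanation - Distance-based methods are sensitive to scale: unless features are normalized, one with a large numeric range dominates the distance. They also degrade in high-dimensional data because of the curse of dimensionality, as distances between points become increasingly similar.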

Correct answer is: Sensitive to data scale and dimensionality
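The scale problem is easy to reproduce. In the hypothetical customers below, income is in dollars and age in years, so raw Euclidean distance is dominated by income and a nearest-neighbor check picks an ordinary customer as most isolated. After min-max scaling each feature to [0, 1], the 90-year-old customer becomes the most isolated point. Min-max scaling is one choice among several; z-score standardization is another.

```python
import math

def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]

def most_isolated(rows):
    """Index of the row with the largest Euclidean nearest-neighbor distance."""
    nn = [min(math.dist(p, q) for j, q in enumerate(rows) if j != i) for i, p in enumerate(rows)]
    return max(range(len(rows)), key=nn.__getitem__)

# Hypothetical customers: (age in years, annual income in dollars).
customers = [(25, 50_000), (27, 51_000), (26, 49_000), (90, 50_500)]
print(customers[most_isolated(customers)])                 # (26, 49000)
print(customers[most_isolated(min_max_scale(customers))])  # (90, 50500)
```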

Q.24 Which visualization is most helpful for outlier detection?

Line chart

Boxplot

Pie chart

Histogram

Explanation - Boxplots are widely used because they draw values beyond the whiskers as individual points, so potential outliers are easy to spot at a glance.

Correct answer is: Boxplot

Q.25 In pattern analysis, what is sequential pattern mining?

Finding patterns in unordered sets

Finding patterns in ordered sequences

Detecting duplicate items

Clustering time-series

Explanation - Sequential pattern mining identifies frequent patterns in which the order of items matters, e.g., purchase sequences.

Correct answer is: Finding patterns in ordered sequences
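The role of order shows up in support counting for sequences: a sequence supports a pattern if the pattern's items appear in it in the same order, with gaps allowed. This sketch simplifies each sequence element to a single item, whereas algorithms such as GSP or PrefixSpan also handle elements that are itemsets; the purchase histories are made up.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each 'in' test consumes the iterator up to the match, which enforces order.
    return all(item in remaining for item in pattern)

def sequence_support(sequences, pattern):
    """Fraction of sequences that contain pattern as an ordered subsequence."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)

# Hypothetical purchase histories, each in time order.
purchases = [
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["mouse", "laptop"],
    ["phone", "laptop", "case", "bag"],
]
print(sequence_support(purchases, ["laptop", "bag"]))  # 0.75
print(sequence_support(purchases, ["bag", "laptop"]))  # 0.0
```

The same two items give very different support depending on their order, which is exactly what distinguishes sequential patterns from unordered itemsets.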