Stream Data Mining: MCQs Practice Set

Q.1 What is the primary challenge in stream data mining compared to traditional data mining?

Data is static and small
Data arrives continuously and rapidly
Data is always clean and structured
There are no storage limitations
Explanation - Stream data mining deals with continuous, potentially unbounded data, so algorithms must process each item in real time, typically in a single pass, without storing the whole stream.
Correct answer is: Data arrives continuously and rapidly

Q.2 Which of the following is a common approach in stream data mining to handle unbounded data?

Storing all historical data
Windowing techniques
Ignoring old data
Manual analysis of batches
Explanation - Windowing techniques allow algorithms to focus on the most recent data, making real-time analysis feasible without storing the entire stream.
Correct answer is: Windowing techniques

Q.3 What is a sliding window in stream data mining?

A technique to visualize data streams
A fixed-size subset of the most recent data items
A method to delete data permanently
A type of database index
Explanation - A sliding window captures a fixed number of recent data points and updates as new data arrives, allowing for real-time analysis.
Correct answer is: A fixed-size subset of the most recent data items
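A count-based sliding window can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `SlidingWindowMean` and the choice of a running mean as the statistic are assumptions made for the example.

```python
from collections import deque


class SlidingWindowMean:
    """Hypothetical helper: mean of the most recent `size` items."""

    def __init__(self, size):
        # A deque with maxlen drops its oldest item automatically on append.
        self.window = deque(maxlen=size)
        self.total = 0.0

    def add(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # this item is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


win = SlidingWindowMean(size=3)
means = [win.add(x) for x in [1, 2, 3, 4, 5]]
print(means)  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Keeping a running total means each update costs constant time regardless of the window size.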

Q.4 Which algorithm is widely used for frequent pattern mining in streams?

Apriori
FP-Stream
K-Means
Decision Tree
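Explanation - FP-Stream is designed for mining frequent patterns over streaming data: it maintains a pattern tree whose nodes hold tilted-time windows, so recent frequencies are kept at fine granularity and older ones at coarser granularity, and it updates the tree incrementally as batches arrive.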
Correct answer is: FP-Stream

Q.5 In stream clustering, which algorithm adapts to changing data distribution over time?

DBSCAN
CluStream
K-Means on static data
Naive Bayes
Explanation - CluStream maintains micro-clusters and updates them over time, allowing clustering to adapt to evolving data streams.
Correct answer is: CluStream

Q.6 What is concept drift in stream data mining?

Data becomes clean over time
Data distribution changes over time
Data stops arriving
All data points become identical
Explanation - Concept drift refers to changes in the underlying patterns of the stream, requiring adaptive models to maintain accuracy.
Correct answer is: Data distribution changes over time

Q.7 Which technique is commonly used to detect concept drift?

Decision tree pruning
Statistical tests and monitoring error rates
Storing all historical data
Data normalization
Explanation - Monitoring model performance and using statistical tests can indicate when the data distribution has shifted, signaling concept drift.
Correct answer is: Statistical tests and monitoring error rates
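The idea of monitoring error rates can be sketched as below. This is a deliberately simplified monitor, not a published detector such as DDM or ADWIN: the class name, the window length, and the fixed threshold are all assumptions chosen for illustration, and the comparison is a plain difference rather than a formal statistical test.

```python
from collections import deque


class ErrorRateDriftMonitor:
    """Hypothetical monitor: flags drift when the recent error rate
    exceeds the long-run error rate by more than `threshold`."""

    def __init__(self, window=50, threshold=0.2):
        self.recent = deque(maxlen=window)  # outcomes of the latest predictions
        self.n = 0                          # predictions seen so far
        self.errors = 0                     # errors seen so far
        self.threshold = threshold

    def update(self, is_error):
        """Record one prediction outcome; return True if drift is suspected."""
        self.n += 1
        self.errors += int(is_error)
        self.recent.append(int(is_error))
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent evidence yet
        overall_rate = self.errors / self.n
        recent_rate = sum(self.recent) / len(self.recent)
        return recent_rate - overall_rate > self.threshold


monitor = ErrorRateDriftMonitor()
outcomes = [0] * 200 + [1] * 50   # model is right 200 times, then starts failing
flags = [monitor.update(o) for o in outcomes]
print(any(flags[:200]), any(flags[200:]))  # False True
```

In practice the alarm would trigger a model reset or retraining on recent data.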

Q.8 Why are traditional batch learning algorithms often unsuitable for stream data mining?

They cannot handle labeled data
They require multiple passes over all data
They are too fast
They work only on numeric data
Explanation - Stream data is continuous and unbounded, so algorithms that need multiple passes over the entire dataset are impractical.
Correct answer is: They require multiple passes over all data

Q.9 Which type of data summarization is commonly used in stream mining?

Exact storage of all events
Synopsis data structures like sketches and histograms
Manual logging
Storing only the first data point
Explanation - Synopsis structures provide compact summaries of large data streams, allowing approximate answers while reducing memory usage.
Correct answer is: Synopsis data structures like sketches and histograms

Q.10 What is the main advantage of online learning algorithms in stream mining?

They store the full dataset
They update the model incrementally with each new data point
They need multiple passes over data
They ignore new data
Explanation - Online learning algorithms continuously update their model with incoming data, which is essential for real-time stream analysis.
Correct answer is: They update the model incrementally with each new data point
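As one illustration of incremental updating, the sketch below uses a perceptron, a classic online learner. The class name and the `learn_one` / `predict` method names are assumptions for this example, not the API of any particular library.

```python
class OnlinePerceptron:
    """Hypothetical online learner for labels in {-1, +1}."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def learn_one(self, x, y):
        """Update the model from a single example; the example is not stored."""
        if self.predict(x) != y:
            # Classic perceptron rule: move the boundary toward the mistake.
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


model = OnlinePerceptron(n_features=2)
for x, y in [((2.0, 0.0), 1), ((0.0, 2.0), -1)] * 5:
    model.learn_one(x, y)
print(model.predict((2.0, 0.0)), model.predict((0.0, 2.0)))  # 1 -1
```

Memory use is constant in the length of the stream, since only the weight vector is retained.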

Q.11 In stream classification, which approach helps maintain accuracy in the presence of concept drift?

Static decision trees
Ensemble learning and adaptive classifiers
Storing all historical data
Ignoring old errors
Explanation - Adaptive classifiers and ensembles can adjust to changes in data distribution, improving robustness against concept drift.
Correct answer is: Ensemble learning and adaptive classifiers

Q.12 What is the difference between a landmark window and a sliding window in stream mining?

Landmark window covers all data since a fixed starting point, sliding window covers only the most recent data
Sliding window covers all data since a fixed starting point, landmark window covers only the most recent data
Both are identical
Neither stores any data
Explanation - A landmark window covers all data from a fixed point (the landmark) up to the present, so it keeps growing; a sliding window has a bounded size and drops the oldest items as new ones arrive.
Correct answer is: Landmark window covers all data since a fixed starting point, sliding window covers only the most recent data
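The contrast between the two window types can be shown side by side. In this sketch the function name `window_sums` and its parameters are assumptions; a running sum stands in for whatever statistic is being maintained.

```python
from collections import deque


def window_sums(stream, landmark, size):
    """Yield (landmark_sum, sliding_sum) after each item.

    landmark: index at which the landmark window starts accumulating.
    size:     number of most recent items kept by the sliding window.
    """
    landmark_sum = 0
    recent = deque(maxlen=size)
    for i, x in enumerate(stream):
        if i >= landmark:
            landmark_sum += x   # the landmark window only ever grows
        recent.append(x)        # the sliding window evicts its oldest item
        yield landmark_sum, sum(recent)


results = list(window_sums([1, 2, 3, 4, 5], landmark=1, size=2))
print(results)  # [(0, 1), (2, 3), (5, 5), (9, 7), (14, 9)]
```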

Q.13 Which of the following is NOT a common stream data mining task?

Classification
Clustering
Frequent pattern mining
Manual spreadsheet editing
Explanation - Stream data mining focuses on automated tasks like classification, clustering, and pattern mining rather than manual operations.
Correct answer is: Manual spreadsheet editing

Q.14 What is micro-clustering in the context of stream clustering?

Creating clusters of very large datasets
Maintaining summary statistics of small clusters over time
Clustering historical data only
Clustering only numeric attributes
Explanation - Micro-clustering keeps compact summaries of data points, which can later be merged or analyzed to form macro clusters.
Correct answer is: Maintaining summary statistics of small clusters over time
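A micro-cluster summary can be sketched as a count, a per-dimension linear sum, and a per-dimension sum of squares. The class below is an illustrative assumption; it omits the timestamp sums that CluStream also tracks and shows only why such summaries are cheap to update and merge.

```python
import math


class MicroCluster:
    """Hypothetical summary of a group of points: count, linear sum,
    and sum of squares for each dimension."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum per dimension
        self.ss = [0.0] * dim   # sum of squares per dimension

    def insert(self, point):
        self.n += 1
        for d, v in enumerate(point):
            self.ls[d] += v
            self.ss[d] += v * v

    def merge(self, other):
        # The summaries are additive, so merging never revisits raw points.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


mc = MicroCluster(dim=2)
mc.insert((0.0, 0.0))
mc.insert((2.0, 0.0))
print(mc.centroid(), mc.radius())  # [1.0, 0.0] 1.0
```

A macro-clustering step can then run an ordinary clustering algorithm over the centroids instead of the raw stream.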

Q.15 Which stream mining algorithm is suitable for detecting rare events?

Hoeffding Tree
SWIM (Sliding Window Interestingness Mining)
K-Means
Apriori
Explanation - SWIM focuses on detecting unusual or rare patterns in a sliding window of stream data, which standard algorithms may miss.
Correct answer is: SWIM (Sliding Window Interestingness Mining)

Q.16 Why is memory management critical in stream data mining?

Streams are always small
Streams are unbounded and cannot be fully stored
Streams do not require processing
Memory has no effect on algorithm speed
Explanation - Because data streams are potentially infinite, algorithms must summarize or selectively store data to operate within memory limits.
Correct answer is: Streams are unbounded and cannot be fully stored

Q.17 Which of the following is a challenge unique to stream data mining?

Data cleaning
Real-time processing of evolving data
SQL query execution
Data normalization
Explanation - Unlike traditional mining, stream mining must handle data that evolves over time and requires real-time analysis.
Correct answer is: Real-time processing of evolving data

Q.18 Which evaluation metric is commonly used to measure stream classification performance?

Mean squared error
Accuracy over time with incremental updates
Pearson correlation
Disk space usage
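Explanation - Stream classifiers are usually evaluated prequentially (test-then-train): each arriving example is first used to test the model and then to update it, and metrics such as accuracy or F1-score are tracked over time, often over a sliding window, to monitor performance as the model adapts.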
Correct answer is: Accuracy over time with incremental updates
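One common way to track accuracy over time is test-then-train evaluation, sketched below. The `MajorityClass` model and the `predict` / `learn_one` method names are assumptions made so the example is self-contained; any incremental model with the same two methods could be substituted.

```python
from collections import Counter, deque


class MajorityClass:
    """Hypothetical trivial model: predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model, window=100):
    """Return accuracy over the last `window` predictions after each item."""
    hits = deque(maxlen=window)
    history = []
    for x, y in stream:
        hits.append(int(model.predict(x) == y))  # test on the item first
        model.learn_one(x, y)                    # then train on the same item
        history.append(sum(hits) / len(hits))
    return history


stream = [(None, "a")] * 3 + [(None, "b")] * 5
curve = prequential_accuracy(stream, MajorityClass(), window=4)
```

Here `curve` starts at 0.0 (no label has been seen yet), rises while the label stays "a", and drops once the stream switches to "b", which is the kind of signal used to spot a model falling behind.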

Q.19 What is the role of synopsis data structures like count-min sketch in stream mining?

Exact storage of all data
Compact approximation of frequencies or aggregates
Visualizing clusters
Sorting incoming data
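Explanation - These structures provide memory-efficient approximations for queries like frequency counts, suitable for high-speed streams. A count-min sketch, for example, may overestimate a count because of hash collisions but never underestimates it.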
Correct answer is: Compact approximation of frequencies or aggregates
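A count-min sketch is small enough to sketch directly. The version below is a minimal illustration: the default `width` and `depth` and the use of salted BLAKE2 hashes as the per-row hash functions are assumptions, not requirements of the technique.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available estimate and never an underestimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


cms = CountMinSketch()
for item in ["a"] * 5 + ["b"] * 2:
    cms.add(item)
print(cms.estimate("a") >= 5, cms.estimate("b") >= 2)  # True True
```

Memory is fixed at `width * depth` counters no matter how many distinct items the stream contains.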

Q.20 In stream association rule mining, why are exact counts often impractical?

Data streams are small
Streams are unbounded and memory is limited
Rules are always irrelevant
No algorithm exists for counting
Explanation - Since data streams are potentially infinite, approximate counting using windowing or synopsis structures is necessary.
Correct answer is: Streams are unbounded and memory is limited
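Approximate counting under a fixed memory budget can be illustrated with the Misra-Gries summary. It counts single items rather than full itemsets, so it is a simplified stand-in for the counting step in association rule mining; the function name and parameter `k` are choices made for this example.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n
    is guaranteed to appear in the result; stored counts may undercount
    the true frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


summary = misra_gries("aaabacadae", k=3)
print(summary)  # {'a': 4}
```

The item "a" really occurs 6 times; the stored count of 4 is within the allowed error of 10 / 3.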

Q.21 What is the Hoeffding bound used for in stream classification?

To determine confidence in model updates with limited data
To compute exact cluster centers
To visualize sliding windows
To sort streaming data
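Explanation - The Hoeffding bound states that, with probability at least 1 - δ, the mean observed over n independent samples of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean. This guarantees that a decision made from a limited number of examples, such as choosing a split attribute, is likely to match the one made with all the data, which enables incremental learning.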
Correct answer is: To determine confidence in model updates with limited data
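The bound is easy to compute. In the sketch below, `hoeffding_bound` evaluates ε = sqrt(R² ln(1/δ) / (2n)), and `can_split` is a hypothetical helper showing how a Hoeffding tree would use it: split once the gap between the two best attributes exceeds ε. The parameter names are assumptions, and the tie-breaking rule used by practical implementations is left out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the observed
    mean of n samples is within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Hypothetical split test: is the lead of the best attribute
    larger than the Hoeffding bound after n examples?"""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


print(round(hoeffding_bound(1.0, 0.05, 100), 4))   # 0.1224
print(can_split(0.30, 0.10, 1.0, 0.05, 10))        # False: too few examples
print(can_split(0.30, 0.10, 1.0, 0.05, 100))       # True
```

Because ε shrinks as n grows, waiting for more examples always makes the decision safer.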

Q.22 Which of the following is an advantage of incremental algorithms in stream mining?

They discard new data
They process each data point once and update models
They require full batch processing
They ignore changes in data
Explanation - Incremental algorithms are suitable for streams because they update the model on the fly without multiple passes or full storage.
Correct answer is: They process each data point once and update models

Q.23 What is the main purpose of data aging in stream mining?

To store all historical data permanently
To gradually reduce the influence of old data
To ignore new incoming data
To normalize data
Explanation - Data aging ensures that recent trends have more impact on the model than outdated information, improving adaptability.
Correct answer is: To gradually reduce the influence of old data
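Exponential decay is one simple way to implement data aging. The sketch below keeps a decayed mean; the class name and the default decay factor are assumptions chosen for illustration.

```python
class DecayedMean:
    """Hypothetical aged average: each older item's weight is multiplied
    by `decay` every time a new item arrives."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.weighted_sum = 0.0
        self.weight = 0.0

    def add(self, value):
        # Fade everything seen so far, then add the new item at full weight.
        self.weighted_sum = self.decay * self.weighted_sum + value
        self.weight = self.decay * self.weight + 1.0
        return self.weighted_sum / self.weight


aged = DecayedMean(decay=0.5)
for _ in range(10):
    aged.add(0.0)
latest = aged.add(10.0)
print(latest > 10.0 / 11)  # True: the aged mean sits well above the plain mean
```

A decay factor close to 1 forgets slowly; a smaller factor reacts faster to recent values but is noisier.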

Q.24 Which scenario is ideal for applying stream data mining?

Analyzing historical census data
Monitoring network traffic in real time
Batch processing of financial reports
Data entry in spreadsheets
Explanation - Stream data mining is suitable for real-time applications where data arrives continuously and timely insights are critical.
Correct answer is: Monitoring network traffic in real time

Q.25 Which property distinguishes data streams from static datasets?

Finite and fixed
Continuous and potentially unbounded
Always numeric
Never evolving
Explanation - Data streams are characterized by their continuous flow and potentially infinite size, unlike static datasets which are finite.
Correct answer is: Continuous and potentially unbounded