Web Mining and Text Mining: MCQs Practice Set

Q.1 What is the primary goal of web mining?

To extract knowledge from databases
To extract useful information from the web
To perform data cleaning
To visualize data in graphs
Explanation - Web mining applies data mining techniques to web data (page content, hyperlink structure, and usage records such as server logs) to discover useful patterns and information.
Correct answer is: To extract useful information from the web

Q.2 Which type of web mining focuses on analyzing the content of web pages?

Web structure mining
Web content mining
Web usage mining
Link analysis
Explanation - Web content mining extracts useful information from the contents of web pages, including text, images, and videos.
Correct answer is: Web content mining

Q.3 Web usage mining primarily analyzes:

The structure of web links
User navigation patterns and behaviors
The content of web pages
The server performance
Explanation - Web usage mining studies how users interact with websites, typically from logged requests, to discover navigation patterns, popular pages, and user behavior.
Correct answer is: User navigation patterns and behaviors
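
Navigation patterns are usually mined per visit, so log entries are first grouped into sessions. The sketch below assumes each entry has already been reduced to a hypothetical (user_id, timestamp_in_seconds, url) tuple and uses a 30-minute inactivity timeout, which is a common heuristic rather than a fixed rule.

```python
from collections import defaultdict

# Assumed inactivity gap (in seconds) that ends a session; 30 minutes is a common heuristic
SESSION_TIMEOUT = 30 * 60


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Group hypothetical (user_id, timestamp, url) tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, url in entries:
        by_user[user].append((ts, url))

    sessions = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])  # order each user's requests by time
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:  # long gap: close this session, start a new one
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


clicks = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u1", 4000, "/home"),  # more than 30 minutes after the previous request
    ("u2", 10, "/about"),
]
for user, path in sessionize(clicks):
    print(user, " -> ".join(path))
# u1 /home -> /products
# u1 /home
# u2 /about
```

Each printed line is one session, i.e. one navigation path that later pattern-mining steps can analyze.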

Q.4 Which of the following is a technique used in text mining?

Clustering
Normalization
Indexing
All of the above
Explanation - Text mining involves techniques such as clustering, indexing, and normalization to process and extract meaningful patterns from text data.
Correct answer is: All of the above

Q.5 Which representation is commonly used for text mining?

Bag of Words
Decision Tree
Neural Network
Adjacency Matrix
Explanation - Bag of Words (BoW) is a popular text representation in which each document is represented by the frequency counts of its words, ignoring grammar and word order.
Correct answer is: Bag of Words
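
A minimal Bag of Words sketch, assuming naive whitespace tokenization and raw counts with no weighting. Each document becomes a vector of term counts over a shared, sorted vocabulary, and word order is lost.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the shared vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenization
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # word order is discarded here
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```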

Q.6 Which of the following is NOT a task of web structure mining?

PageRank computation
Link analysis
User clickstream analysis
Finding hubs and authorities
Explanation - User clickstream analysis is part of web usage mining, not web structure mining, which focuses on analyzing hyperlinks and page connectivity.
Correct answer is: User clickstream analysis

Q.7 Named Entity Recognition (NER) in text mining is used to:

Identify key topics in documents
Extract specific entities like names, locations, and organizations
Cluster similar documents
Summarize text automatically
Explanation - NER locates named entities in text (mostly proper nouns) and classifies them into predefined categories such as person names, organizations, and locations; many systems also tag dates, times, and monetary values.
Correct answer is: Extract specific entities like names, locations, and organizations

Q.8 Which of the following is a common challenge in web mining?

High dimensionality of data
Noisy and unstructured data
Dynamic changes in web content
All of the above
Explanation - Web mining faces challenges such as high dimensionality, noisy and unstructured data, and continuously changing web content.
Correct answer is: All of the above

Q.9 Text mining is also referred to as:

Web crawling
Knowledge discovery in text
Data warehousing
Link mining
Explanation - Text mining is often called Knowledge Discovery in Text (KDT), as it extracts useful knowledge from unstructured text.
Correct answer is: Knowledge discovery in text

Q.10 In web content mining, which technique is used to extract frequent patterns?

Association rule mining
Regression analysis
Decision trees
Neural networks
Explanation - Association rule mining can identify frequently co-occurring patterns in web content, such as common keywords or tags.
Correct answer is: Association rule mining
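
To make the support and confidence of a rule concrete, here is a brute-force sketch over hypothetical keyword sets from four pages. It only checks one-item-to-one-item rules by enumerating every pair, so it is not the Apriori algorithm itself; the thresholds min_support=0.5 and min_confidence=0.6 are arbitrary choices for the example.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Brute-force X -> Y rules between single items (no Apriori pruning)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def support(itemset):
        # Fraction of transactions that contain every item in itemset
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})  # P(y on page | x on page)
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


pages = [  # hypothetical keyword sets extracted from four web pages
    {"python", "tutorial", "pandas"},
    {"python", "pandas"},
    {"python", "tutorial"},
    {"java", "tutorial"},
]
for x, y, s, c in pair_rules(pages):
    print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```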

Q.11 Tokenization in text mining refers to:

Converting text into tokens or words
Removing stop words
Stemming words
Parsing HTML content
Explanation - Tokenization splits text into meaningful units like words or phrases, which are then used for analysis in text mining.
Correct answer is: Converting text into tokens or words
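
A regular-expression tokenizer is enough to show the idea. The token pattern here (letters and digits, optionally followed by one apostrophe suffix) is an assumption; real tokenizers handle hyphens, URLs, decimal numbers, and other languages far more carefully.

```python
import re

# Assumed token definition: a run of letters/digits, optionally with an apostrophe suffix
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(text)]


print(tokenize("Web mining isn't hard, is it? 3 examples follow."))
# ['web', 'mining', "isn't", 'hard', 'is', 'it', '3', 'examples', 'follow']
```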

Q.12 Which of the following can be used to measure similarity between documents?

Euclidean distance
Cosine similarity
Jaccard index
All of the above
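Explanation - Euclidean distance, cosine similarity, and the Jaccard index can all quantify how similar two documents are. Cosine similarity compares the direction of term vectors, so document length matters less, and it is the most common choice for text; Euclidean distance measures dissimilarity, so smaller values mean more similar documents; the Jaccard index compares the sets of terms the documents contain.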
Correct answer is: All of the above
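
The three measures can be computed directly on term-count vectors and term sets. In this plain-Python sketch, d1 and d2 are the term counts of "the cat sat" and "the cat saw the dog" over the vocabulary ['cat', 'dog', 'sat', 'saw', 'the'].

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors; assumes neither is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between term vectors; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_index(a, b):
    """Overlap of two term sets: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)


d1 = [1, 0, 1, 0, 1]
d2 = [1, 1, 0, 1, 2]
print(round(cosine_similarity(d1, d2), 3))   # 0.655
print(round(euclidean_distance(d1, d2), 3))  # 2.0
print(jaccard_index({"the", "cat", "sat"}, {"the", "cat", "saw", "dog"}))  # 0.4
```

A cosine similarity of 1.0 means identical term proportions, while a larger Euclidean distance means less similar documents.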

Q.13 In web usage mining, the primary data source is:

Web server logs
Web content
Hyperlinks
Database schemas
Explanation - Web usage mining mainly relies on web server logs to understand user behavior, page visits, and navigation patterns.
Correct answer is: Web server logs
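
Server logs are line-oriented text, so the first step is parsing each line into fields. The sketch below assumes the Apache/NCSA Common Log Format (host, ident, user, [time], "request", status, bytes); sites using the Combined format or a custom format would need a different pattern. The sample line is made up and uses a documentation IP address.

```python
import re

# Common Log Format: host ident authuser [time] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])  # "-" means no body
    return entry


line = '192.0.2.7 - - [10/Oct/2023:13:55:36 +0000] "GET /products/42 HTTP/1.1" 200 5120'
entry = parse_log_line(line)
print(entry["host"], entry["path"], entry["status"])  # 192.0.2.7 /products/42 200
```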

Q.14 Which of the following is a preprocessing step in text mining?

Stemming
Lemmatization
Stop word removal
All of the above
Explanation - Text preprocessing involves stemming, lemmatization, and removing stop words to clean and normalize text for mining.
Correct answer is: All of the above
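
Stop word removal and stemming can be sketched in a few lines. Both the stop-word list and the suffix rules below are small hypothetical stand-ins: this crude suffix stripper is not the Porter stemmer, and lemmatization is not shown because it needs a dictionary of word forms (for example WordNet).

```python
# Small hypothetical stop-word list; real systems use much larger curated lists
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
# Crude suffix-stripping rules, tried in this order (NOT the Porter algorithm)
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tokens):
    """Lowercase, drop stop words, then stem the remaining tokens."""
    return [crude_stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]


print(preprocess(["The", "crawlers", "are", "indexing", "linked", "pages"]))
# ['crawler', 'index', 'link', 'pag']
```

The output "pag" for "pages" shows how naive suffix stripping can over-stem; a lemmatizer would return the dictionary form "page".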

Q.15 Which of the following best describes sentiment analysis in text mining?

Classifying documents by length
Analyzing user emotions and opinions in text
Extracting named entities
Indexing web pages
Explanation - Sentiment analysis determines the polarity (positive, negative, neutral) of text, often used for reviews and social media data.
Correct answer is: Analyzing user emotions and opinions in text
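
The simplest form of sentiment analysis is lexicon-based: sum the polarity scores of known words and read off the sign. The tiny lexicon here is hypothetical, and the sketch ignores negation ("not good") and sarcasm, which practical systems or trained classifiers must handle.

```python
# Tiny hypothetical sentiment lexicon: word -> polarity score
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "slow": -1}


def polarity(text):
    """Label text positive, negative, or neutral by summing lexicon scores."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = sum(LEXICON.get(w, 0) for w in words)  # unknown words score 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Great product, I love it."))            # positive
print(polarity("Terrible support and slow delivery."))  # negative
print(polarity("It arrived on Tuesday."))               # neutral
```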

Q.16 Which of these is a clustering algorithm used in text mining?

K-means
Apriori
Decision Tree
Naive Bayes
Explanation - K-means is widely used for clustering documents in text mining based on similarity between feature vectors.
Correct answer is: K-means
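
A compact k-means (Lloyd's algorithm) sketch over term-count vectors. The vocabulary, documents, first-k initialization, and fixed iteration count are assumptions for illustration; libraries typically use smarter initialization such as k-means++, and for text it is common to normalize vectors (or use cosine distance) first so that document length does not dominate.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means with the first k points as initial centroids (needs Python 3.8+)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical term counts over the vocabulary ["goal", "match", "vote", "election"]
docs = [
    [3, 2, 0, 0],  # sports-like
    [0, 0, 2, 3],  # politics-like
    [2, 3, 0, 1],
    [0, 1, 3, 2],
]
labels, centroids = kmeans(docs, k=2)
print(labels)  # [0, 1, 0, 1]
```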

Q.17 Which technique is used to reduce dimensionality in text mining?

Principal Component Analysis (PCA)
Stemming
Tokenization
Web crawling
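Explanation - PCA reduces the number of features (dimensionality) in text data while preserving most of the variance. For large sparse term-document matrices, the closely related truncated SVD (as used in latent semantic analysis) is often applied instead.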
Correct answer is: Principal Component Analysis (PCA)

Q.18 What is the purpose of web crawling?

To collect web pages for indexing
To analyze server performance
To remove duplicate pages
To cluster documents
Explanation - Web crawlers systematically browse the web to collect pages for search engines and further mining tasks.
Correct answer is: To collect web pages for indexing
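
A crawler is essentially a breadth-first traversal of the link graph, driven by a frontier queue and a set of already-seen URLs. To keep the sketch self-contained, the hypothetical WEB dictionary stands in for fetching pages and extracting links; a real crawler would also respect robots.txt and rate-limit its requests.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> outgoing links. A real crawler would
# fetch each URL over HTTP and extract the link targets from the HTML instead.
WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl from seed; returns URLs in the order they were visited."""
    frontier = deque([seed])
    seen = {seed}  # never enqueue the same URL twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would store the page for indexing here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("https://example.com/", lambda url: WEB.get(url, []))
print(order)
```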

Q.19 Which of the following is a semantic analysis task in text mining?

Part-of-speech tagging
Topic modeling
Tokenization
Stop word removal
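Explanation - Topic modeling is a semantic analysis technique that identifies latent topics in large text corpora. Part-of-speech tagging, by contrast, is a syntactic task, while tokenization and stop word removal are preprocessing steps.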
Correct answer is: Topic modeling

Q.20 Link analysis in web structure mining is used to:

Determine relationships between web pages
Cluster similar documents
Perform sentiment analysis
Extract entities from text
Explanation - Link analysis studies the connectivity and relationships between web pages, often used in ranking algorithms like PageRank.
Correct answer is: Determine relationships between web pages
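
PageRank is the classic link-analysis ranking. This power-iteration sketch uses the commonly cited damping factor 0.85 on a made-up four-page graph, and spreads the rank of any page with no out-links evenly over all pages, which is one of several conventions for such dangling pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for page in pages:
            targets = links[page]
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:  # dangling page: spread its rank evenly over all pages
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

C ranks highest because three of the four pages link to it; D has no in-links, so it keeps only the random-jump share.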

Q.21 Which of the following is a challenge specific to web usage mining?

Handling server log noise and incomplete data
Extracting keywords from documents
Clustering web pages by content
Applying PCA on text data
Explanation - Web usage mining often deals with noisy and incomplete log data, making preprocessing and cleaning a critical step.
Correct answer is: Handling server log noise and incomplete data
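
Typical cleaning rules drop incomplete records, requests for embedded resources, failed requests, and self-identified robots. The rules and sample records below are hypothetical; robots that do not announce themselves in the user-agent string need other heuristics.

```python
# Hypothetical cleaning rules; real pipelines tune these to the site being mined
IGNORED_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")


def clean_log(entries):
    """Keep successful page views by (apparently) human visitors."""
    cleaned = []
    for e in entries:
        path = e.get("path")
        status = e.get("status")
        agent = (e.get("agent") or "").lower()
        if path is None or status is None:
            continue  # incomplete record
        if path.lower().endswith(IGNORED_EXTENSIONS):
            continue  # embedded resource, not a page view
        if not 200 <= status < 300:
            continue  # keep only successful (2xx) responses
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # self-identified robot
        cleaned.append(e)
    return cleaned


raw = [
    {"path": "/home", "status": 200, "agent": "Mozilla/5.0"},
    {"path": "/logo.png", "status": 200, "agent": "Mozilla/5.0"},
    {"path": "/missing", "status": 404, "agent": "Mozilla/5.0"},
    {"path": "/home", "status": 200, "agent": "ExampleBot/1.0"},
    {"path": "/cart", "status": None, "agent": "Mozilla/5.0"},
    {"path": "/checkout", "status": 200, "agent": "Mozilla/5.0"},
]
print([e["path"] for e in clean_log(raw)])  # ['/home', '/checkout']
```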

Q.22 Which of the following is a common application of text mining?

Spam email detection
Customer feedback analysis
Recommendation systems
All of the above
Explanation - Text mining is used in many applications, including spam detection, sentiment analysis of customer feedback, and recommendation engines that analyze item descriptions or reviews.
Correct answer is: All of the above

Q.23 Which representation technique captures semantic meaning in text mining?

TF-IDF
Word embeddings
Bag of Words
One-hot encoding
Explanation - Word embeddings like Word2Vec and GloVe represent words in a vector space, capturing semantic relationships and meaning.
Correct answer is: Word embeddings

Q.24 Which type of mining discovers patterns from hyperlinks in web pages?

Web content mining
Web usage mining
Web structure mining
Text clustering
Explanation - Web structure mining analyzes hyperlinks and the relationships between pages to discover patterns such as hubs and authorities.
Correct answer is: Web structure mining
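
Hubs (pages that link to many good authorities) and authorities (pages linked to by many good hubs) are computed by Kleinberg's HITS algorithm. A minimal iterative sketch on a made-up graph, normalizing both score vectors after each round:

```python
import math


def hits(links, iterations=20):
    """HITS sketch: returns (hub, authority) scores; links maps page -> pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize each vector to unit length so scores stay bounded
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


graph = {"H1": ["A1", "A2"], "H2": ["A1", "A2"], "H3": ["A1"]}
hub, auth = hits(graph)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # A1 H1
```

A1 is the strongest authority because all three hubs point to it, while H1 and H2 tie as the strongest hubs because each links to both authorities.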

Q.25 Which of the following is NOT a typical preprocessing step in text mining?

Stop word removal
Tokenization
Indexing web pages
Stemming
Explanation - Indexing web pages is a search-engine task performed on crawled pages to support retrieval; it is not a text preprocessing step like tokenization, stop word removal, or stemming.
Correct answer is: Indexing web pages