Q.1 Which of the following best defines Big Data?
A small dataset that fits in a single computer's memory
Data that is structured and stored in relational databases only
Extremely large datasets that cannot be processed by traditional methods
Data stored in spreadsheets for analysis
Explanation - Big Data refers to datasets so large and complex that traditional data processing tools are inadequate to handle them.
Correct answer is: Extremely large datasets that cannot be processed by traditional methods
Q.2 What are the three V's of Big Data?
Volume, Velocity, Variety
Value, Validation, Visibility
Volume, Value, Visualization
Velocity, Verification, Variety
Explanation - The three V's of Big Data describe its characteristics: Volume (size), Velocity (speed of generation), and Variety (different forms).
Correct answer is: Volume, Velocity, Variety
Q.3 Which of the following is a popular Big Data processing framework?
Hadoop
MySQL
Oracle Database
SQLite
Explanation - Hadoop is an open-source framework used for distributed storage and processing of large datasets using clusters of computers.
Correct answer is: Hadoop
Q.4 What is the main purpose of a data warehouse?
To store transactional data for daily operations
To store and manage large amounts of historical and analytical data
To replace relational databases
To store only unstructured data
Explanation - Data warehouses are designed to store historical data and support business intelligence and analytics rather than daily transactions.
Correct answer is: To store and manage large amounts of historical and analytical data
Q.5 Which type of data is primarily stored in a data warehouse?
Transactional data
Historical and aggregated data
Temporary session data
Encrypted passwords
Explanation - Data warehouses focus on storing historical and summarized data to support analysis and decision-making.
Correct answer is: Historical and aggregated data
Q.6 ETL in data warehousing stands for:
Extract, Transform, Load
Encode, Transfer, Link
Extract, Transmit, Log
Encrypt, Transform, Load
Explanation - ETL is the process of extracting data from sources, transforming it into a suitable format, and loading it into a data warehouse.
Correct answer is: Extract, Transform, Load
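The three ETL phases can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline; the record fields ("name", "amount") are illustrative assumptions.

```python
# Minimal ETL sketch: extract raw records, transform them into a clean
# format, and load them into a target store.

def extract():
    # Extract: pull raw records from a source (here, a hard-coded list
    # standing in for a database query or file read).
    return [{"name": " alice ", "amount": "100"},
            {"name": "bob", "amount": "250"}]

def transform(rows):
    # Transform: trim and normalize names, convert amounts to integers.
    return [{"name": r["name"].strip().title(), "amount": int(r["amount"])}
            for r in rows]

def load(rows, warehouse):
    # Load: append the transformed records to the target store.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```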
Q.7 Which of the following is NOT a Big Data characteristic?
Volume
Variety
Velocity
Validation
Explanation - Volume, Velocity, and Variety are the key characteristics of Big Data; Validation is not one of the traditional three V's.
Correct answer is: Validation
Q.8 MapReduce is primarily used for:
Visualizing data
Distributed data processing
Storing data in relational databases
Querying small datasets
Explanation - MapReduce is a programming model used for processing large datasets in parallel across distributed systems.
Correct answer is: Distributed data processing
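The classic word-count example shows the MapReduce model: map emits (key, value) pairs, a shuffle groups pairs by key, and reduce aggregates each group. In Hadoop these phases run in parallel across many machines; in this sketch they run sequentially for illustration only.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values (here, sum the counts).
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big ideas", "big clusters"]
counts = reduce_phase(shuffle(map_phase(docs)))
```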
Q.9 Which type of data model is commonly used in a data warehouse?
Relational OLTP model
Star and Snowflake schemas
Network model
Hierarchical model
Explanation - Star and Snowflake schemas are widely used for organizing data in data warehouses to facilitate analysis.
Correct answer is: Star and Snowflake schemas
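A star schema puts one central fact table at the "center" with foreign keys into denormalized dimension tables. The sketch below uses SQLite purely as a stand-in so it runs anywhere; the table and column names are illustrative assumptions, not from any specific warehouse.

```python
import sqlite3

# Star schema sketch: fact_sales references two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                              name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY,
                           year INTEGER, month INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Tools');
    INSERT INTO dim_date VALUES (10, 2024, 1);
    INSERT INTO fact_sales VALUES (1, 10, 99.5), (1, 10, 0.5);
""")

# Typical analytical query: aggregate facts, sliced by a dimension attribute.
row = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
""").fetchone()
```

A snowflake schema would go one step further and normalize `dim_product` itself, e.g. splitting `category` into its own table.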
Q.10 Which Big Data storage system allows storing unstructured and semi-structured data?
HDFS
MySQL
PostgreSQL
SQLite
Explanation - HDFS (Hadoop Distributed File System) is designed to store massive amounts of unstructured or semi-structured data across distributed clusters.
Correct answer is: HDFS
Q.11 OLAP in data warehousing stands for:
Online Analytical Processing
Online Linear Access Protocol
Offline Analytical Processing
Optimized Linear Analysis Procedure
Explanation - OLAP is used in data warehouses to enable multidimensional analytical queries for business intelligence purposes.
Correct answer is: Online Analytical Processing
Q.12 Which of the following is an advantage of using a data warehouse?
Improves daily transaction speed
Supports complex queries and analysis
Reduces the size of operational databases
Eliminates the need for backups
Explanation - Data warehouses are optimized for analytical queries, not transactional processing, providing insights for decision-making.
Correct answer is: Supports complex queries and analysis
Q.13 Which of the following tools is commonly used for Big Data analytics?
Apache Hive
Microsoft Word
Adobe Photoshop
Oracle Forms
Explanation - Apache Hive is used for querying and analyzing large datasets stored in Hadoop.
Correct answer is: Apache Hive
Q.14 Which term describes the speed at which data is generated and processed in Big Data?
Velocity
Volume
Variety
Validity
Explanation - Velocity refers to the rate at which new data is generated and the speed of its processing in Big Data environments.
Correct answer is: Velocity
Q.15 Data marts are:
Smaller, focused subsets of a data warehouse
Transactional databases
Unstructured data repositories
Temporary files used in Hadoop
Explanation - Data marts are specialized subsets of data warehouses that focus on specific business areas or departments.
Correct answer is: Smaller, focused subsets of a data warehouse
Q.16 Which of the following is a NoSQL database suitable for Big Data?
MongoDB
Oracle
MySQL
Microsoft Access
Explanation - MongoDB is a NoSQL database designed to handle large volumes of unstructured and semi-structured data efficiently.
Correct answer is: MongoDB
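MongoDB stores JSON-like documents whose fields can vary from record to record. The sketch below uses plain Python dicts to illustrate that flexible document model without needing a MongoDB server; the documents and the filter are illustrative assumptions.

```python
# Document-model sketch: unlike relational rows, documents in the same
# collection need not share a fixed schema.
collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["vip", "beta"],
     "address": {"city": "Pune"}},   # nested and extra fields are fine
]

# A simple filter, analogous to collection.find({"tags": "vip"}) in MongoDB.
vips = [doc for doc in collection if "vip" in doc.get("tags", [])]
```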
Q.17 In Hadoop, the NameNode is responsible for:
Storing the actual data blocks
Managing the metadata and directory structure
Processing MapReduce jobs
Generating reports
Explanation - The NameNode in Hadoop manages the metadata of the file system and tracks where data blocks are stored in DataNodes.
Correct answer is: Managing the metadata and directory structure
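The division of labor in HDFS can be sketched as two small lookup tables: the NameNode holds only metadata (which blocks make up each file, and which DataNodes hold each block), never the block contents themselves. The paths, block IDs, and node names below are illustrative assumptions.

```python
# NameNode metadata sketch: file -> blocks, and block -> replica locations.
namenode_files = {
    "/logs/2024/app.log": ["blk_1", "blk_2"],
}
block_locations = {
    "blk_1": ["datanode-a", "datanode-b", "datanode-c"],  # 3-way replication
    "blk_2": ["datanode-b", "datanode-c", "datanode-d"],
}

def locate(path):
    # A client asks the NameNode where a file's blocks live, then reads
    # the actual bytes directly from those DataNodes.
    return [block_locations[block] for block in namenode_files[path]]

replicas = locate("/logs/2024/app.log")
```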
Q.18 Which is an example of structured data?
Customer names and phone numbers in a relational table
Emails and social media posts
Images and videos
Sensor data in raw text files
Explanation - Structured data is organized and stored in a fixed schema, like tables in relational databases.
Correct answer is: Customer names and phone numbers in a relational table
Q.19 Which Hadoop component is used for querying large datasets?
Hive
Spark Streaming
HBase
Oozie
Explanation - Hive provides an SQL-like interface to query and analyze large datasets stored in Hadoop.
Correct answer is: Hive
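HiveQL closely resembles standard SQL, which is what makes Hive approachable for analysts. The aggregation below is valid in both HiveQL and SQLite; SQLite is used here only as a stand-in so the example runs without a Hadoop cluster, and the table and column names are illustrative assumptions.

```python
import sqlite3

# A Hive-style analytical query, executed against SQLite for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (url TEXT, views INTEGER);
    INSERT INTO page_views VALUES ('/home', 120), ('/docs', 80), ('/home', 30);
""")
top = conn.execute("""
    SELECT url, SUM(views) AS total
    FROM page_views
    GROUP BY url
    ORDER BY total DESC
""").fetchall()
```

In Hive, the same query would be compiled into distributed jobs over files in HDFS rather than executed against a local database.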
Q.20 Which process ensures that data in a data warehouse is accurate and consistent?
Data cleaning and transformation
Data replication
Data deletion
Data encryption
Explanation - Data cleaning and transformation in ETL ensures the accuracy, consistency, and quality of data loaded into the warehouse.
Correct answer is: Data cleaning and transformation
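The cleaning step of ETL typically standardizes formats, removes duplicates, and rejects invalid records before loading. A minimal sketch, with illustrative field names:

```python
# Data-cleaning sketch: keep one row per id, drop records with invalid
# numbers, and normalize country names.
raw = [
    {"id": 1, "country": " usa ", "revenue": "100.0"},
    {"id": 1, "country": "USA", "revenue": "100.0"},   # duplicate id
    {"id": 2, "country": "India", "revenue": "bad"},   # invalid number
    {"id": 3, "country": "india", "revenue": "42"},
]

clean, seen = [], set()
for rec in raw:
    if rec["id"] in seen:
        continue                      # consistency: one row per id
    try:
        revenue = float(rec["revenue"])
    except ValueError:
        continue                      # accuracy: drop unparseable values
    seen.add(rec["id"])
    clean.append({"id": rec["id"],
                  "country": rec["country"].strip().upper(),
                  "revenue": revenue})
```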
Q.21 Which term describes the diversity of data types in Big Data?
Variety
Volume
Velocity
Validity
Explanation - Variety refers to the different types of data (structured, unstructured, semi-structured) that Big Data encompasses.
Correct answer is: Variety
Q.22 Which of the following is an advantage of using Hadoop?
Scalable storage and processing for large datasets
Automatic report generation
Built-in transactional support
Faster local disk performance
Explanation - Hadoop allows distributed storage and processing, enabling scalability across clusters for Big Data workloads.
Correct answer is: Scalable storage and processing for large datasets
Q.23 Data warehouse schemas that normalize dimensions into multiple related tables are called:
Snowflake schemas
Star schemas
Fact schemas
Flat schemas
Explanation - Snowflake schemas normalize dimension tables into multiple related tables to reduce redundancy in data warehouses.
Correct answer is: Snowflake schemas
Q.24 Which Big Data technology supports in-memory distributed processing for faster analytics?
Apache Spark
HDFS
Hive
Cassandra
Explanation - Apache Spark performs distributed in-memory computations, making it faster than traditional MapReduce for iterative analytics.
Correct answer is: Apache Spark
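The speed advantage for iterative workloads comes from avoiding repeated disk reads: MapReduce re-reads its input from disk on every pass, while Spark can cache the dataset in memory once and reuse it (roughly what `RDD.cache()` does). The sketch below simulates disk access with a counter; it is an analogy, not Spark code.

```python
# Simulated comparison: per-iteration disk reads vs. read-once-and-cache.
disk_reads = 0

def read_from_disk():
    global disk_reads
    disk_reads += 1
    return list(range(1000))

# MapReduce-style: each of 3 iterations re-reads the input from disk.
for _ in range(3):
    data = read_from_disk()
    total = sum(data)
mapreduce_reads = disk_reads

# Spark-style: read once, keep the dataset in memory, iterate over it.
disk_reads = 0
cached = read_from_disk()
for _ in range(3):
    total = sum(cached)
spark_reads = disk_reads
```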
