Biological Databases and Management: MCQs Practice Set

Q.1 What is the primary purpose of the GenBank database?

To store protein sequences
To store nucleotide sequences
To store metabolic pathways
To store structural data
Explanation - GenBank is a public repository managed by NCBI that contains DNA and RNA sequence records.
Correct answer is: To store nucleotide sequences

Q.2 Which file format is commonly used to store raw DNA sequencing reads?

FASTA
FASTQ
GenBank
PDB
Explanation - FASTQ files include the nucleotide sequence and the per-base quality scores, making them the standard for raw sequencing data.
Correct answer is: FASTQ

Q.3 The Protein Data Bank (PDB) primarily contains data about?

DNA sequences
Protein and nucleic acid 3D structures
Gene expression levels
Protein‑protein interaction networks
Explanation - PDB stores experimentally determined 3‑D structures of biomolecules, mainly proteins and nucleic acids.
Correct answer is: Protein and nucleic acid 3D structures

Q.4 Which database contains detailed information on metabolic pathways?

NCBI
KEGG
UniProt
PubMed
Explanation - KEGG (Kyoto Encyclopedia of Genes and Genomes) provides curated pathway maps linking genes to metabolic and signaling pathways.
Correct answer is: KEGG

Q.5 Which of the following is NOT a typical function of a biological database?

Data storage
Data retrieval
Data generation
Data annotation
Explanation - Biological databases store, retrieve, and annotate existing data; they do not generate new experimental data.
Correct answer is: Data generation

Q.6 What is the main advantage of using a relational database for biological data?

Easy to visualize data
Supports complex queries across tables
Handles unstructured text better
Requires no indexing
Explanation - Relational databases allow joins and structured queries that enable efficient retrieval of related data across multiple tables.
Correct answer is: Supports complex queries across tables

Q.7 Which of the following resources describes protein families and their domains?

PFAM
FASTA
GenBank
PDB
Explanation - PFAM is a database of protein families and domains, each represented by sequence alignments and hidden Markov models; the other options are sequence or structure file formats.
Correct answer is: PFAM

Q.8 What does the acronym EMBL stand for in the context of biological databases?

European Molecular Biology Laboratory
Encyclopedia of Molecular Bioinformatics Lists
Electronic Metadata Base Library
European Metagenomics Biological Log
Explanation - EMBL is the European organization that maintains a nucleotide sequence database similar to GenBank.
Correct answer is: European Molecular Biology Laboratory

Q.9 Which database is the primary source for functional annotations of genes in *Arabidopsis thaliana*?

TAIR
UniProt
Ensembl
PDB
Explanation - TAIR (The Arabidopsis Information Resource) specializes in gene information for this model plant.
Correct answer is: TAIR

Q.10 What is a key feature of the UniProtKB/Swiss‑Prot subset?

Only contains bacterial proteins
Provides manually curated protein annotations
Stores raw sequencing reads
Focuses on structural data only
Explanation - Swiss‑Prot is the manually annotated, reviewed portion of UniProtKB, ensuring high-quality protein information.
Correct answer is: Provides manually curated protein annotations

Q.11 Which query language is commonly used to retrieve data from RDF-based biological databases?

SQL
SPARQL
XQuery
CQL
Explanation - SPARQL is the standard query language for RDF (Resource Description Framework) data models used in semantic web databases.
Correct answer is: SPARQL

Q.12 In the FASTA file format, how are individual sequences identified?

By a header line starting with a ‘>’ character
By a header line starting with a ‘#’ character
By a header line starting with a ‘@’ character
By a header line starting with a ‘*’ character
Explanation - Each FASTA record begins with a ‘>’ line that contains the sequence identifier and optional description.
Correct answer is: By a header line starting with a ‘>’ character
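
The header rule described above can be sketched as a small parser. This is a minimal illustration, not a replacement for an established library; the function names and the toy records are made up for the example.

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new '>' line closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    # The identifier is the first word; the rest is an optional description.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


# Toy input (hypothetical records).
example = """>seq1 toy nucleotide record
ACGTAC
GTA
>seq2
MKV
"""
records = parse_fasta(example)
```

Sequence lines are joined until the next header appears, which is why FASTA needs no explicit end-of-record marker.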

Q.13 Which database contains information on protein–protein interaction networks?

STRING
PDB
UniProt
GenBank
Explanation - STRING provides predicted and experimentally verified protein‑protein interaction data across many organisms.
Correct answer is: STRING

Q.14 What does the GISAID database specialize in?

Genomic variants of SARS‑CoV‑2
Protein structural data
Metabolomic datasets
Microarray gene expression data
Explanation - GISAID is a global initiative that shares influenza and SARS‑CoV‑2 sequence data with researchers.
Correct answer is: Genomic variants of SARS‑CoV‑2

Q.15 Which of the following best describes the term 'ontology' in bioinformatics?

A database of DNA sequences
A hierarchical classification of biological terms
A software for sequence alignment
An algorithm for phylogenetic tree construction
Explanation - Ontologies define controlled vocabularies and relationships between terms, facilitating standardized annotation.
Correct answer is: A hierarchical classification of biological terms

Q.16 The Ensembl database provides genomic data primarily for which type of organisms?

Bacterial strains
Model eukaryotes and vertebrates
Plants only
Viral genomes only
Explanation - Ensembl hosts annotated genomes for a wide range of eukaryotic species, including many model organisms.
Correct answer is: Model eukaryotes and vertebrates

Q.17 Which of the following is a commonly used tool for aligning short sequencing reads to a reference genome?

BLAST
BWA
Clustal Omega
MUSCLE
Explanation - BWA (Burrows–Wheeler Aligner) is designed for fast alignment of short reads against a reference genome.
Correct answer is: BWA

Q.18 What is the role of the NCBI Taxonomy database?

Store protein structures
Provide a hierarchical classification of organisms
Host gene expression datasets
Track publication metrics
Explanation - The Taxonomy database assigns a unique taxonomic ID and provides a tree-like classification for organisms represented in the public sequence databases.
Correct answer is: Provide a hierarchical classification of organisms

Q.19 Which file format is typically used to represent phylogenetic trees?

NEWICK
FASTA
JSON
XML
Explanation - The Newick format encodes tree topology as nested parentheses, commonly used in phylogenetics.
Correct answer is: NEWICK
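
To make the nested-parentheses idea concrete, here is a minimal sketch of a Newick reader. It assumes a simple, well-formed tree with optional branch lengths and does not handle quoted labels or comments; all names are illustrative.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick strings end with ';'")
    node, pos = _parse_node(text, 0)
    if text[pos] != ";":
        raise ValueError("unexpected text after tree")
    return node


def _parse_node(text, pos):
    children = []
    if text[pos] == "(":
        pos += 1  # step past '('
        while True:
            child, pos = _parse_node(text, pos)
            children.append(child)
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"unexpected character at position {pos}")
    # Read the optional 'name:length' label that follows the node.
    start = pos
    while text[pos] not in ",();":
        pos += 1
    name, _, length = text[start:pos].partition(":")
    node = {
        "name": name,
        "length": float(length) if length else None,
        "children": children,
    }
    return node, pos


def leaf_names(node):
    """Collect leaf labels from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
leaves = leaf_names(tree)
```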

Q.20 In a relational database, what is an 'index' used for?

Store backup copies
Speed up data retrieval
Compress data
Validate data integrity
Explanation - Indexes create a data structure that allows the database engine to find rows faster.
Correct answer is: Speed up data retrieval
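
The effect of an index can be observed with Python's built-in sqlite3 module. The sketch below uses a hypothetical `gene` table and compares the query plan before and after creating an index; the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO gene (symbol, description) VALUES (?, ?)",
    [(f"TOY{i}", "hypothetical gene") for i in range(1000)],
)


def query_plan(connection):
    """Return the planner's description of a lookup by symbol."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM gene WHERE symbol = ?", ("TOY42",)
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


plan_before = query_plan(conn)  # full table scan expected
conn.execute("CREATE INDEX idx_gene_symbol ON gene (symbol)")
plan_after = query_plan(conn)   # lookup through the index expected
```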

Q.21 Which of the following best describes the FASTQ quality score encoding method Sanger?

ASCII 33 offset
ASCII 64 offset
Binary encoding
Hexadecimal encoding
Explanation - Sanger quality scores use ASCII characters starting at 33, representing quality from 0 to 93.
Correct answer is: ASCII 33 offset
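
As a small illustration of the offset, the sketch below decodes a Sanger-encoded quality string and converts a Phred score Q to an error probability using P = 10^(-Q/10). The function names and the sample string are chosen for the example.

```python
def decode_phred33(quality_string):
    """Convert a Sanger (Phred+33) quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# '!' is ASCII 33 (Q0) and '~' is ASCII 126 (Q93), the two ends of the range.
scores = decode_phred33("!I5~")
```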

Q.22 What does the term 'RNA‑seq' refer to?

Sequencing of ribosomal DNA
Sequencing of messenger RNA
Sequencing of all genomic DNA
Sequencing of protein structures
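Explanation - RNA‑seq is a high‑throughput sequencing technique used to profile and quantify RNA transcripts in a sample; messenger RNA is the most common target, although other RNA classes can be sequenced as well.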
Correct answer is: Sequencing of messenger RNA

Q.23 Which of the following is a primary advantage of using a NoSQL database for genomic data?

Strict schema enforcement
Easy to perform joins
Scalable storage for large, unstructured datasets
Built-in relational integrity
Explanation - NoSQL databases allow flexible schema and horizontal scaling, suitable for big genomic datasets.
Correct answer is: Scalable storage for large, unstructured datasets

Q.24 Which type of metadata is essential for a sequencing dataset in a public repository?

Author’s favorite color
Sample source and experimental conditions
Personal contact information
Stock market data
Explanation - Accurate metadata ensures that other researchers can understand and reuse the dataset.
Correct answer is: Sample source and experimental conditions

Q.25 What is the main purpose of the BioProject record in NCBI?

To catalog individual protein structures
To group related biological datasets for a single research project
To provide a list of all known genes
To host user forums
Explanation - BioProject serves as an umbrella for all data (sequences, annotations, etc.) generated in a particular study.
Correct answer is: To group related biological datasets for a single research project

Q.26 Which of the following describes a 'hit' in BLAST results?

A complete match to the query sequence
A statistically significant alignment between query and subject
A random occurrence
A predicted structure
Explanation - BLAST reports high‑scoring segment pairs (HSPs) that have a low probability of occurring by chance.
Correct answer is: A statistically significant alignment between query and subject

Q.27 Which database contains curated information on protein functional families?

PFAM
PDB
KEGG
GenBank
Explanation - PFAM catalogs protein families and domains, providing sequence alignments and hidden Markov models.
Correct answer is: PFAM

Q.28 What is a 'sequence identifier' in GenBank?

A unique accession number assigned to each entry
The length of the DNA sequence
The file format of the entry
The organism name only
Explanation - Accession numbers serve as stable references for GenBank records.
Correct answer is: A unique accession number assigned to each entry

Q.29 Which of the following best represents a 'gene ontology (GO)' term?

A unique protein ID
A structured vocabulary describing biological processes, cellular components, and molecular functions
A DNA sequence
A metabolic pathway diagram
Explanation - GO provides standardized terms to annotate gene products across species.
Correct answer is: A structured vocabulary describing biological processes, cellular components, and molecular functions

Q.30 What is the primary function of the Sequence Read Archive (SRA)?

Store raw sequencing reads
Provide protein tertiary structures
Archive research publications
Manage grant applications
Explanation - SRA is a public repository that holds raw sequencing data from high-throughput platforms.
Correct answer is: Store raw sequencing reads

Q.31 Which of the following is NOT a typical format for representing protein families?

Pfam HMM
Clustal alignment
MCL graph
FASTA file
Explanation - MCL (Markov Cluster) is a graph-clustering algorithm; its graph describes similarity relationships used for clustering and is not a format for representing a protein family.
Correct answer is: MCL graph

Q.32 The 'Ensembl' gene annotation system is most closely associated with which scientific discipline?

Structural biology
Ecology
Genomics
Pharmacology
Explanation - Ensembl provides high‑quality genome annotations for a wide range of eukaryotic species.
Correct answer is: Genomics

Q.33 Which of the following statements best describes a 'metadata schema' in the context of biological databases?

A file format for raw data
A blueprint defining the structure of metadata records
An algorithm for sequence alignment
A type of sequencing machine
Explanation - A metadata schema specifies the fields, data types, and relationships for cataloging datasets.
Correct answer is: A blueprint defining the structure of metadata records

Q.34 Which database is a primary source for curated, reviewed protein sequences?

UniProtKB/Swiss‑Prot
GenBank
PDB
KEGG
Explanation - Swiss‑Prot contains manually curated, high‑quality protein entries.
Correct answer is: UniProtKB/Swiss‑Prot

Q.35 What does 'SRA' stand for in bioinformatics?

Sequence Read Archive
Sequence Research Array
Sequence Reference Atlas
Sequence Retrieval Algorithm
Explanation - SRA is NCBI's repository for raw sequencing data.
Correct answer is: Sequence Read Archive

Q.36 Which of the following best describes the purpose of the Gene Ontology Consortium?

To produce new sequencing technologies
To create a shared vocabulary for gene product attributes
To store 3‑D protein structures
To manage grant funding
Explanation - The Gene Ontology provides controlled terms for biological processes, cellular components, and molecular functions.
Correct answer is: To create a shared vocabulary for gene product attributes

Q.37 Which database contains curated information about drug–target interactions?

DrugBank
PDB
KEGG
GenBank
Explanation - DrugBank catalogs detailed drug information along with target proteins and mechanisms of action.
Correct answer is: DrugBank

Q.38 In a relational database, what does 'normalization' primarily aim to achieve?

Increase query speed at the cost of data redundancy
Reduce data redundancy and prevent update anomalies
Enable real‑time analytics
Create backup copies
Explanation - Normalization structures tables to minimize duplication and maintain data integrity.
Correct answer is: Reduce data redundancy and prevent update anomalies

Q.39 Which of the following is a common tool for visualizing phylogenetic trees?

MEGA
BLAST
BWA
SAMtools
Explanation - MEGA (Molecular Evolutionary Genetics Analysis) provides tools for constructing and viewing phylogenetic trees.
Correct answer is: MEGA

Q.40 Which of the following best describes the term 'ortholog'?

Two genes within the same species that have similar functions
Genes in different species that evolved from a common ancestral gene
A protein that binds to DNA
A type of RNA molecule
Explanation - Orthologs are homologous genes in different species that originated from a single gene in the last common ancestor.
Correct answer is: Genes in different species that evolved from a common ancestral gene

Q.41 Which database is known for providing 2‑D and 3‑D representations of metabolic pathways?

KEGG
PDB
GenBank
Ensembl
Explanation - KEGG includes pathway maps with both 2‑D diagrams and linked 3‑D structures.
Correct answer is: KEGG

Q.42 What does the term 'FASTA format' refer to?

A binary format for protein sequences
A plain text format for nucleotide or protein sequences with header lines
A compressed archive format
An image format for genomic data
Explanation - FASTA uses ‘>’ header lines followed by sequence lines, widely used for storing sequences.
Correct answer is: A plain text format for nucleotide or protein sequences with header lines

Q.43 Which of the following is a standard identifier used to refer to a specific protein in the UniProt database?

Accession number
Gene name
Chromosome position
RNA‑seq count
Explanation - UniProt accession numbers uniquely identify each protein entry.
Correct answer is: Accession number

Q.44 In the context of databases, what does 'REST API' stand for?

Representational State Transfer Application Programming Interface
Randomized Sequence Transfer Algorithmic Protocol Interface
Reliable Storage Transactional Encrypted Protocol Interface
Resource Secure Transfer Access Protocol Interface
Explanation - REST APIs allow programmatic access to database services via standard HTTP methods.
Correct answer is: Representational State Transfer Application Programming Interface
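
As a hedged sketch of what programmatic access looks like, the code below only builds a request URL for the NCBI E-utilities `efetch` endpoint and does not send it. The accession is used purely as an example, and the helper name is made up; consult the E-utilities documentation for usage policies before issuing real requests.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build (but do not send) an HTTP GET URL for one sequence record."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    )
    return f"{EFETCH_URL}?{query}"


url = build_efetch_url("NC_045512.2")
```

The resource is addressed by a URL and retrieved with a standard HTTP GET, which is the core idea behind a REST-style interface.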

Q.45 What is the primary use of the BioCyc database collection?

To store raw sequencing reads
To host curated metabolic pathway databases for multiple organisms
To archive protein structures
To provide genome assembly tools
Explanation - BioCyc contains detailed, organism‑specific metabolic pathways and associated data.
Correct answer is: To host curated metabolic pathway databases for multiple organisms

Q.46 Which of the following describes a 'feature' in a GenBank flat file?

A file extension for compressed data
A section detailing specific genomic annotations like genes, exons, or regulatory elements
The file's checksum
The number of sequences in the file
Explanation - Features in GenBank files provide precise positions and functional annotations within the sequence.
Correct answer is: A section detailing specific genomic annotations like genes, exons, or regulatory elements

Q.47 Which of the following best defines the term 'sequence alignment'?

Combining multiple sequences to create a consensus sequence
Sorting sequences alphabetically
Determining the best match between two or more sequences
Compressing sequences for storage
Explanation - Sequence alignment finds regions of similarity that may indicate functional, structural, or evolutionary relationships.
Correct answer is: Determining the best match between two or more sequences

Q.48 What does the 'BLAST' program primarily use to assess the similarity between sequences?

Random guessing
Exact matches of all nucleotides
Statistical scoring matrices and gap penalties
Manual curation
Explanation - BLAST uses scoring matrices (e.g., BLOSUM) and gap penalties to evaluate alignments.
Correct answer is: Statistical scoring matrices and gap penalties

Q.49 Which database provides detailed gene expression data from microarray experiments?

GEO (Gene Expression Omnibus)
PDB
KEGG
GenBank
Explanation - GEO archives high‑throughput gene expression and other functional genomics data.
Correct answer is: GEO (Gene Expression Omnibus)

Q.50 In a relational database, a 'foreign key' is used to:

Create an index for faster queries
Ensure uniqueness of a column
Enforce a relationship between two tables
Store binary data
Explanation - A foreign key links a column in one table to a primary key in another, maintaining referential integrity.
Correct answer is: Enforce a relationship between two tables
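
Referential integrity can be demonstrated with Python's built-in sqlite3 module. The schema and gene symbols below are hypothetical; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    "CREATE TABLE organism (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE gene (
           gene_id  INTEGER PRIMARY KEY,
           symbol   TEXT NOT NULL,
           taxon_id INTEGER NOT NULL REFERENCES organism (taxon_id)
       )"""
)
conn.execute("INSERT INTO organism VALUES (3702, 'Arabidopsis thaliana')")
conn.execute("INSERT INTO gene VALUES (1, 'toy_gene_1', 3702)")

# A gene pointing at an organism that does not exist should be rejected.
try:
    conn.execute("INSERT INTO gene VALUES (2, 'toy_gene_2', 9999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The same key supports a join across the two tables.
joined = conn.execute(
    "SELECT g.symbol, o.name FROM gene AS g JOIN organism AS o USING (taxon_id)"
).fetchall()
```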

Q.51 Which of the following tools is commonly used for visualizing high‑dimensional omics data?

cBioPortal
BLAST
BWA
SAMtools
Explanation - cBioPortal provides interactive visualizations of cancer genomics, including high‑dimensional data.
Correct answer is: cBioPortal

Q.52 Which of the following databases focuses on non‑coding RNA sequences and their functions?

Rfam
PDB
GenBank
KEGG
Explanation - Rfam catalogs families of non‑coding RNA and their consensus alignments.
Correct answer is: Rfam

Q.53 What is the main purpose of the 'Sequence Ontology' (SO) in genomics?

To provide a standardized vocabulary for genomic sequence features
To store raw sequencing data
To design sequencing primers
To predict protein secondary structure
Explanation - SO defines terms such as exon, intron, and variant type for consistent annotation.
Correct answer is: To provide a standardized vocabulary for genomic sequence features

Q.54 Which of the following best describes a 'feature table' in a GenBank file?

A list of all files in the database
A table describing the start, end, and annotation of genomic features
A summary of user access logs
A list of protein 3‑D structures
Explanation - The feature table provides precise genomic coordinates and functional descriptors.
Correct answer is: A table describing the start, end, and annotation of genomic features

Q.55 Which of the following is a key advantage of using cloud storage for genomic datasets?

Increased physical security
Unlimited local access without internet
Scalable storage and computing resources
Mandatory encryption of all data
Explanation - Cloud platforms provide elastic storage and computational power suitable for large‑scale bioinformatics.
Correct answer is: Scalable storage and computing resources

Q.56 What does the 'E‑value' in BLAST represent?

The number of errors in the alignment
The expected number of alignments of similar or better score occurring by chance
The length of the aligned region
The number of matching nucleotides
Explanation - The E‑value is an expected count rather than a probability, so it can exceed 1; a lower E‑value indicates a more statistically significant match.
Correct answer is: The expected number of alignments of similar or better score occurring by chance
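
For reference, the E-value is usually described with Karlin-Altschul statistics as an expected count of chance alignments, E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the raw score. The sketch below evaluates that formula; the lambda and K values are illustrative placeholders, not parameters from a real search.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of at least one such chance alignment: 1 - exp(-E)."""
    return -math.expm1(-e)


# Placeholder statistical parameters (assumptions for the example only).
LAMBDA, K = 0.267, 0.041

weak = e_value(30, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
strong = e_value(120, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
```

For very small values E and the corresponding probability are nearly equal, which is why the two are often conflated; for weak hits E can be far greater than 1 while the probability stays at or below 1.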

Q.57 Which record type is used to describe the secondary structure of proteins in a PDB file?

SEQRES
HELIX
MODEL
REMARK
Explanation - The HELIX records in a PDB file describe alpha‑helices and their start/end residues.
Correct answer is: HELIX

Q.58 Which database is specifically dedicated to storing curated information on enzyme‑catalyzed reactions?

BRENDA
PDB
KEGG
GenBank
Explanation - BRENDA is a comprehensive enzyme information system providing data on reactions, substrates, and conditions.
Correct answer is: BRENDA

Q.59 Which of the following is NOT a typical component of a 'FASTA header' line?

Sequence identifier
Description of the sequence
A ‘>’ character at the beginning
The full DNA sequence itself
Explanation - The header line contains metadata; the sequence follows on subsequent lines.
Correct answer is: The full DNA sequence itself

Q.60 What does the acronym 'NCBI' stand for?

National Center for Biotechnology Information
National Center for Bioinformatics Integration
Nucleotide Collection Bioinformatics Index
None of the above
Explanation - NCBI manages major biological resources such as GenBank and PubMed, and provides the BLAST search service.
Correct answer is: National Center for Biotechnology Information

Q.61 Which of the following best describes the use of a 'hash index' in database systems?

To sort data alphabetically
To enable quick retrieval by key value using a hash function
To compress large datasets
To enforce relational constraints
Explanation - A hash index maps key values to locations, providing O(1) average lookup time.
Correct answer is: To enable quick retrieval by key value using a hash function
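
A Python dict is itself a hash table, so it can serve as a toy model of a hash index. The sketch below maps hypothetical accession strings to record positions so that a lookup does not have to scan the whole list.

```python
# Toy record store: (accession, sequence) pairs with made-up accessions.
records = [
    ("ACC0001", "ACGTACGT"),
    ("ACC0002", "TTGACCA"),
    ("ACC0003", "GGGCCCAA"),
]

# Build the index once: key -> position in the record list.
hash_index = {accession: pos for pos, (accession, _) in enumerate(records)}


def fetch(accession):
    """Average O(1) lookup through the hash index; None if the key is absent."""
    pos = hash_index.get(accession)
    return None if pos is None else records[pos]


hit = fetch("ACC0002")
miss = fetch("ACC9999")
```

A hash index suits exact-match lookups; range queries such as "all accessions between X and Y" generally need an ordered structure instead.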

Q.62 Which of the following is a commonly used tool for assembling short sequencing reads into longer contigs?

SPAdes
BLAST
SAMtools
ClustalW
Explanation - SPAdes is a genome assembler designed for single‑cell and bacterial genome projects.
Correct answer is: SPAdes

Q.63 What does the 'GeneID' field in NCBI refer to?

A unique identifier for a gene
The length of a gene in base pairs
The number of exons in a gene
The chromosomal position of a gene
Explanation - GeneID is a stable numeric identifier assigned to each gene in the NCBI Gene database.
Correct answer is: A unique identifier for a gene

Q.64 Which of the following best represents a 'primary database' in bioinformatics?

A database that stores raw experimental data directly from instruments
A database that aggregates data from multiple primary databases
A database that only contains annotations
A database that provides analytical tools
Explanation - Primary databases collect original data; secondary databases integrate and annotate it.
Correct answer is: A database that stores raw experimental data directly from instruments

Q.65 Which of the following is NOT a typical component of a 'GenBank flat file'?

LOCUS line
ORIGIN line
FEATURES line
IMAGE line
Explanation - GenBank files contain LOCUS, FEATURES, ORIGIN, and other structured lines; no IMAGE line exists.
Correct answer is: IMAGE line

Q.66 What is the purpose of the 'BioMart' portal?

To visualize protein structures
To provide a flexible, web‑based interface for querying biological datasets
To host raw sequencing data
To perform sequence alignment
Explanation - BioMart allows users to retrieve customized data from large databases like Ensembl.
Correct answer is: To provide a flexible, web‑based interface for querying biological datasets

Q.67 Which of the following is a key challenge in maintaining biological databases?

Ensuring consistent naming conventions across species
Producing new sequencing machines
Disseminating research papers
Designing laboratory protocols
Explanation - Standardized nomenclature is essential for reliable data integration and retrieval.
Correct answer is: Ensuring consistent naming conventions across species

Q.68 What does the 'GTF' file format represent in genomics?

Genetic Transfer File
Gene Transfer Format
Genome Transcript Format
Gene Table File
Explanation - GTF (Gene Transfer Format) is used for storing gene annotations and transcript information.
Correct answer is: Gene Transfer Format

Q.69 Which of the following best describes a 'secondary structure' prediction for RNA?

Predicting the 3‑D tertiary fold
Identifying the arrangement of base pairs (e.g., stems, loops)
Determining the gene’s chromosomal location
Mapping the RNA to protein domains
Explanation - Secondary structure prediction focuses on base pairing patterns rather than full 3‑D conformation.
Correct answer is: Identifying the arrangement of base pairs (e.g., stems, loops)

Q.70 Which of the following is a major feature of the 'Sequence Read Archive (SRA)'?

It stores only assembled genomes
It hosts raw sequencing reads from diverse platforms
It provides only protein annotations
It is used exclusively for microbiome studies
Explanation - SRA archives raw data from Illumina, PacBio, Oxford Nanopore, and more.
Correct answer is: It hosts raw sequencing reads from diverse platforms

Q.71 In the context of bioinformatics databases, what is a 'controlled vocabulary'?

A list of random words
A predefined set of terms with defined relationships
A dictionary for translating between languages
A set of user‑generated tags
Explanation - Controlled vocabularies ensure consistency in data annotation and retrieval.
Correct answer is: A predefined set of terms with defined relationships

Q.72 Which of the following databases primarily provides curated information on genetic variants?

ClinVar
PDB
GenBank
KEGG
Explanation - ClinVar aggregates clinically relevant genetic variation data with interpretation of pathogenicity.
Correct answer is: ClinVar

Q.73 Which of the following best describes 'sequence clustering' in bioinformatics?

Separating sequences by length
Grouping similar sequences to reduce redundancy
Aligning sequences to a reference genome
Converting sequences into protein structures
Explanation - Clustering reduces dataset size and highlights representative sequences.
Correct answer is: Grouping similar sequences to reduce redundancy

Q.74 What does the 'SAM' file format store?

Sequencing reads before alignment
Alignment information of sequencing reads to a reference
Protein tertiary structures
Metabolic pathway maps
Explanation - SAM (Sequence Alignment/Map) records the mapping of reads to reference sequences.
Correct answer is: Alignment information of sequencing reads to a reference
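
Each SAM alignment line has eleven mandatory tab-separated fields followed by optional tags. The sketch below splits one made-up line into those fields and reads two bits of the FLAG value (0x4 for unmapped, 0x10 for reverse strand); the function name is illustrative.

```python
SAM_FIELDS = (
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
)
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}


def parse_sam_line(line):
    """Parse one alignment line (not a '@' header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    record = {
        name: int(value) if name in INT_FIELDS else value
        for name, value in zip(SAM_FIELDS, cols)
    }
    record["tags"] = cols[len(SAM_FIELDS):]
    record["is_unmapped"] = bool(record["flag"] & 0x4)
    record["is_reverse"] = bool(record["flag"] & 0x10)
    return record


# Hypothetical read aligned to the reverse strand of 'chr1' at position 100.
line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
aln = parse_sam_line(line)
```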

Q.75 Which of the following databases is a primary source for curated, high‑quality enzyme classification?

KEGG
BRENDA
UniProt
GenBank
Explanation - BRENDA contains detailed enzyme data, including EC numbers and reaction conditions.
Correct answer is: BRENDA

Q.76 In a relational database, a 'view' is:

A physical copy of a table
A virtual table generated from a query
An index for faster searches
A backup of the database
Explanation - Views present data from one or more tables as a single table without storing data themselves.
Correct answer is: A virtual table generated from a query

Q.77 Which of the following is a key advantage of using version control for biological sequence databases?

It eliminates the need for backups
It allows tracking of changes and ensures reproducibility
It speeds up sequence alignment
It provides automated annotation
Explanation - Version control systems log every edit, facilitating audit trails and reproducibility.
Correct answer is: It allows tracking of changes and ensures reproducibility

Q.78 Which database includes curated information on small non‑coding RNAs such as miRNA and siRNA?

miRBase
KEGG
PDB
GenBank
Explanation - miRBase catalogs known microRNA sequences and annotation information.
Correct answer is: miRBase

Q.79 What is a 'metadata field' in the context of a biological database?

A field that stores the raw data
A field that stores additional descriptive information about the data
A field for storing image files
A field that contains the file size
Explanation - Metadata provides context such as source, method, and conditions for the primary data.
Correct answer is: A field that stores additional descriptive information about the data

Q.80 Which of the following best describes a 'circular genome'?

A genome that can be rearranged in any order
A genome that contains no linear ends and forms a loop
A genome that is only present in eukaryotes
A genome with multiple chromosomes
Explanation - Circular genomes, typical of many bacteria and mitochondria, form closed loops.
Correct answer is: A genome that contains no linear ends and forms a loop

Q.81 Which of the following database systems uses SQL (Structured Query Language) as its primary query language?

MySQL
MongoDB
Cassandra
Neo4j
Explanation - MySQL is a relational database system that uses SQL for querying and manipulation.
Correct answer is: MySQL

Q.82 Which of the following best defines a 'substitution matrix' used in sequence alignment?

A matrix that assigns scores to matches and mismatches between residues
A matrix that determines the location of sequences in a database
A matrix that stores quality scores for sequencing reads
A matrix that represents 3‑D coordinates of proteins
Explanation - Substitution matrices (e.g., BLOSUM) guide alignment scoring by providing match/mismatch penalties.
Correct answer is: A matrix that assigns scores to matches and mismatches between residues
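
To show how such a matrix is used, the sketch below scores an ungapped nucleotide alignment with a toy scheme that is an assumption for this example only: +2 for a match, -1 for a transition, and -2 for a transversion. Real protein matrices such as BLOSUM are looked up in the same way but hold empirically derived values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_score(a, b):
    """Toy substitution score for two nucleotides (illustrative values)."""
    if a == b:
        return 2
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return -1 if same_class else -2  # transition vs. transversion


# Materialise the scheme as an explicit lookup table.
BASES = "ACGT"
MATRIX = {(a, b): pair_score(a, b) for a in BASES for b in BASES}


def score_ungapped(seq1, seq2):
    """Sum the matrix entries over aligned positions (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs equal-length sequences")
    return sum(MATRIX[(a, b)] for a, b in zip(seq1, seq2))


total = score_ungapped("ACGT", "ACAT")  # three matches and one G/A transition
```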

Q.83 What does the 'FASTA format' use to indicate the end of a sequence record?

A blank line
The next header line starting with ‘>’
A special end marker ‘END’
A line of dashes ‘----’
Explanation - FASTA records are separated by new header lines; the sequence continues until the next header.
Correct answer is: The next header line starting with ‘>’

Q.84 Which database provides information about protein–protein interactions specific to humans?

STRING
BioGRID
KEGG
PDB
Explanation - BioGRID catalogs experimentally validated interactions, including many human proteins.
Correct answer is: BioGRID

Q.85 What is a 'flat file' in the context of biological databases?

A single, unstructured text file containing records
A database with multiple tables
An image file of a chromosome
A compressed archive of sequences
Explanation - Flat files (e.g., GenBank flat file) store data in plain text without relational structure.
Correct answer is: A single, unstructured text file containing records

Q.86 Which of the following is a characteristic of a 'structured query language' (SQL)?

It supports only insert operations
It requires manual parsing of text
It allows declarative queries using SELECT, FROM, WHERE clauses
It is used exclusively for graph databases
Explanation - SQL enables users to specify the data they want rather than how to retrieve it.
Correct answer is: It allows declarative queries using SELECT, FROM, WHERE clauses

Q.87 Which of the following best describes the purpose of the 'Gene Expression Omnibus (GEO)'?

Storing raw sequencing reads
Storing gene expression and related functional genomics data
Providing protein structural data
Listing chemical compounds
Explanation - GEO archives microarray, RNA‑seq, and other expression datasets.
Correct answer is: Storing gene expression and related functional genomics data

Q.88 Which of the following is a common challenge when integrating data from multiple biological databases?

Uniform naming conventions
Limited internet bandwidth
Inconsistent data formats and annotations
Low data volume
Explanation - Differences in how data is formatted and annotated hinder seamless integration.
Correct answer is: Inconsistent data formats and annotations

Q.89 What does the 'Accession Number' in a GenBank record signify?

The version of the database
The unique identifier for the record
The number of sequences in the record
The publication year
Explanation - Each GenBank record receives a unique accession number for reference.
Correct answer is: The unique identifier for the record

Q.90 Which of the following best describes a 'clustering algorithm' in genomics?

An algorithm that aligns sequences to a reference
An algorithm that groups sequences based on similarity to reduce redundancy
An algorithm that predicts gene function
An algorithm that converts RNA to DNA
Explanation - Clustering reduces dataset size by grouping similar sequences and selecting representatives.
Correct answer is: An algorithm that groups sequences based on similarity to reduce redundancy
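
One simple way to realise this idea is greedy incremental clustering: each sequence joins the first cluster whose representative it resembles closely enough, otherwise it starts a new cluster. The sketch below is a simplified illustration that assumes equal-length sequences and position-wise identity; production tools use alignment-based similarity and much faster heuristics.

```python
def identity(a, b):
    """Fraction of identical positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold=0.8):
    """Return a list of (representative, members) pairs."""
    clusters = []
    # Longest sequences are processed first so they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Toy input: two near-identical sequences, one exact duplicate, one outlier.
toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTAC"]
clusters = greedy_cluster(toy)
representatives = [rep for rep, _ in clusters]
```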

Q.91 Which of the following databases is dedicated to storing information on genetic variations linked to disease?

ClinVar
PDB
KEGG
GenBank
Explanation - ClinVar catalogs clinically relevant variants and their interpretations.
Correct answer is: ClinVar

Q.92 Which file format is used to store genome annotations, such as gene and exon coordinates, for eukaryotic genomes?

GFF3
FASTA
FASTQ
PDB
Explanation - GFF3 (Generic Feature Format version 3) is used for detailed genomic annotations.
Correct answer is: GFF3

Q.93 What does the 'NCBI Entrez' system provide?

A search and retrieval system for integrated NCBI databases
A tool for aligning sequences
A graphical interface for 3‑D structures
A platform for data compression
Explanation - Entrez allows querying across NCBI databases such as GenBank (Nucleotide), Protein, and PubMed.
Correct answer is: A search and retrieval system for integrated NCBI databases

Q.94 Which of the following is a type of 'structured data' in a biological database?

A text file with random data
An XML file containing organized data with tags
A binary image
A handwritten note
Explanation - Structured data follows a schema, enabling easy parsing and retrieval.
Correct answer is: An XML file containing organized data with tags
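Example - A small sketch of why tagged data is easy to parse, using Python's standard xml.etree.ElementTree module. The element names, attribute names, and values are made up for illustration and do not follow any real database schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tags act as a simple schema.
xml_text = """<records>
  <record accession="ACC0001"><organism>Homo sapiens</organism><length>1200</length></record>
  <record accession="ACC0002"><organism>Mus musculus</organism><length>800</length></record>
</records>"""

root = ET.fromstring(xml_text)

# Because every record has the same structure, fields can be pulled out by name.
lengths = {
    rec.get("accession"): int(rec.findtext("length"))
    for rec in root.findall("record")
}
print(lengths)
```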

Q.95 Which of the following best describes the use of a 'hash table' in a database?

Storing large sequences in compressed form
Providing O(1) average time lookup by key
Storing relational tables only
Indexing for full‑text search
Explanation - Hash tables use a hash function to map keys to array indices, allowing fast access.
Correct answer is: Providing O(1) average time lookup by key
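Example - A toy hash table, sketched to show how a hash function maps keys to array indices. The class name, bucket count, and accession keys are assumptions for illustration; in practice Python's built-in dict already provides this behaviour.

```python
class ToyHashTable:
    """Hash table with separate chaining (each bucket is a small list)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The hash function turns the key into a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is searched, so lookups are O(1) on average.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ToyHashTable()
table.put("ACC0001", "ATGGCC")
table.put("ACC0002", "TTGACA")
print(table.get("ACC0002"))
```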

Q.96 Which of the following is NOT typically included in a 'GenBank feature table'?

Gene
CDS (Coding Sequence)
Protein
Transposable Element
Explanation - The feature table lists genomic features; proteins are represented indirectly via CDS entries.
Correct answer is: Protein

Q.97 What is the function of the 'BioPython' library?

Providing a platform for database management
Facilitating bioinformatics computations and file parsing in Python
Storing large genomic datasets
Visualizing protein structures
Explanation - BioPython supplies modules for sequence manipulation, alignment, and parsing of biological formats.
Correct answer is: Facilitating bioinformatics computations and file parsing in Python

Q.98 Which of the following best describes a 'data repository'?

A physical storage facility for lab equipment
A place where raw or processed biological data is stored and made available to the community
A software for sequence alignment
A type of database index
Explanation - Data repositories archive datasets for preservation and public access.
Correct answer is: A place where raw or processed biological data is stored and made available to the community

Q.99 Which database provides a catalog of microbial genomes?

NCBI RefSeq
PDB
KEGG
GenBank
Explanation - RefSeq offers curated, non-redundant reference sequences, including reference genomes for bacteria and other microbes.
Correct answer is: NCBI RefSeq

Q.100 Which of the following is a typical format used to represent gene ontology annotations?

GFF3
GAF
FASTA
FASTQ
Explanation - GAF (Gene Ontology Annotation File) encodes annotations in a tab‑delimited format.
Correct answer is: GAF

Q.101 Which of the following databases focuses on the structural biology of macromolecules?

PDB
GenBank
KEGG
ClinVar
Explanation - The Protein Data Bank catalogs experimentally determined macromolecular structures.
Correct answer is: PDB

Q.102 What does the 'E‑value' in a BLAST output indicate?

The expected number of random alignments with equal or better score
The exact number of mismatches
The length of the alignment
The number of sequences in the database
Explanation - A low E‑value means the match is unlikely due to chance.
Correct answer is: The expected number of random alignments with equal or better score
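Example - A rough sketch of how an E-value relates to a bit score, assuming the commonly quoted approximation E = m × n × 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. The function name is hypothetical, and real BLAST output uses corrected "effective" lengths, so its numbers will differ.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score (ignores effective-length corrections)."""
    return query_len * db_len * 2.0 ** (-bit_score)


# A higher bit score gives a smaller E-value for the same search space.
weak = expected_hits(bit_score=20, query_len=300, db_len=1_000_000)
strong = expected_hits(bit_score=60, query_len=300, db_len=1_000_000)
print(weak, strong)
```

The same score yields a larger E-value in a larger database, which is why E-values from searches against different databases are not directly comparable.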

Q.103 Which database contains curated information on protein families and domains?

Pfam
PDB
KEGG
GenBank
Explanation - Pfam catalogs protein families and domains with hidden Markov models.
Correct answer is: Pfam

Q.104 Which of the following best describes a 'public database' in bioinformatics?

A database that requires a paid subscription
A database that is freely accessible to the research community
A private database for personal use only
A database that only stores images
Explanation - Public databases provide open access to biological data for everyone.
Correct answer is: A database that is freely accessible to the research community

Q.105 Which of the following is an example of a 'secondary database'?

GenBank
RefSeq
BioMart
PDB
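Explanation - BioMart aggregates and integrates data derived from multiple primary sources. Note that RefSeq, which is curated from GenBank records, is also often classified as a secondary database.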
Correct answer is: BioMart

Q.106 Which of the following best describes the 'FASTA' format header line?

It starts with a ‘>’ symbol followed by an identifier and optional description
It starts with a ‘#’ symbol and contains the file size
It ends with a ‘*’ symbol
It contains the full sequence directly
Explanation - The header line begins with ‘>’ and provides metadata about the sequence.
Correct answer is: It starts with a ‘>’ symbol followed by an identifier and optional description
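Example - A minimal sketch of splitting a FASTA header into its identifier and optional description. The function name and the sample headers are hypothetical; the only rule taken from the format is that the line starts with '>' and the identifier ends at the first whitespace.

```python
def parse_fasta_header(line):
    """Return (identifier, description) from a FASTA header line."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines must start with '>'")
    # Split once on whitespace: first token is the identifier, the rest is free text.
    parts = line[1:].strip().split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(parse_fasta_header(">seq1 hypothetical protein"))
print(parse_fasta_header(">seq2"))
```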

Q.107 Which of the following database systems is best suited for storing and querying large genomic datasets with flexible schemas?

MySQL
MongoDB
SQLite
Oracle
Explanation - MongoDB is a NoSQL document store that handles large, flexible datasets efficiently.
Correct answer is: MongoDB

Q.108 Which of the following is a key feature of the 'Sequence Read Archive (SRA)'?

Only stores assembled genomes
Holds raw sequencing reads from high‑throughput platforms
Provides protein tertiary structures
Stores only human DNA sequences
Explanation - SRA archives the original reads before assembly or analysis.
Correct answer is: Holds raw sequencing reads from high‑throughput platforms

Q.109 In a relational database, what does 'normalization' primarily aim to achieve?

Increase query speed at the cost of redundancy
Reduce data redundancy and avoid anomalies
Create backup copies of tables
Provide graphical user interface
Explanation - Normalization organizes tables to minimize duplication and maintain consistency.
Correct answer is: Reduce data redundancy and avoid anomalies
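Example - A small sketch of a normalized design using Python's built-in sqlite3 module. The table names, columns, and values are hypothetical. The organism name is stored once and referenced by key, so a correction is made in a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    length INTEGER
);
INSERT INTO organism VALUES (1, 'Homo sapiens'), (2, 'Escherichia coli');
INSERT INTO sequence VALUES
    ('ACC0001', 1, 1200), ('ACC0002', 1, 300), ('ACC0003', 2, 950);
""")

# Update the name in one place; every linked sequence row sees the change,
# which avoids the update anomalies of a table that repeats the name per row.
conn.execute("UPDATE organism SET name = 'Homo sapiens (human)' WHERE organism_id = 1")

rows = conn.execute("""
    SELECT s.accession, o.name
    FROM sequence AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    ORDER BY s.accession
""").fetchall()
print(rows)
conn.close()
```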

Q.110 What does the acronym 'GTF' stand for?

Genetic Transfer Format
Gene Transfer Format
Genomic Text File
Gene Translation File
Explanation - GTF is a file format that records gene and transcript annotations.
Correct answer is: Gene Transfer Format

Q.111 Which database contains curated information on enzymes and their reactions?

BRENDA
KEGG
PDB
GenBank
Explanation - BRENDA is a comprehensive enzyme information system that covers enzymes, their reactions, and reaction conditions.
Correct answer is: BRENDA

Q.112 Which of the following is a common challenge in biological database management?

Ensuring data quality and consistency across different data sources
Building a physical laboratory
Sequencing DNA in a single step
Printing large images
Explanation - Data heterogeneity often leads to integration issues and requires curation.
Correct answer is: Ensuring data quality and consistency across different data sources

Q.113 Which of the following best describes a 'sequence alignment'?

Merging two sequences into one
Determining the similarity between two or more sequences
Converting a DNA sequence to RNA
Counting the number of nucleotides
Explanation - Alignment arranges sequences so that similar residues line up, revealing conserved regions and suggesting evolutionary relationships.
Correct answer is: Determining the similarity between two or more sequences

Q.114 Which database provides curated data on genomic variation and its clinical significance?

ClinVar
PDB
KEGG
GenBank
Explanation - ClinVar collects clinical interpretations of genetic variants.
Correct answer is: ClinVar

Q.115 Which of the following database types stores unstructured data like raw sequencing reads?

Relational database
NoSQL document store
Graph database
XML database
Explanation - NoSQL stores can handle large unstructured data efficiently.
Correct answer is: NoSQL document store

Q.116 What is the main purpose of the 'BioMart' tool?

To provide an interface for querying biological datasets across multiple databases
To sequence DNA directly from samples
To visualize 3‑D protein structures
To perform statistical analyses on clinical trials
Explanation - BioMart allows flexible, web‑based queries over integrated data sources.
Correct answer is: To provide an interface for querying biological datasets across multiple databases

Q.117 Which of the following best describes a 'metadata field'?

A field containing the primary sequence data
A field that holds descriptive information about the data (e.g., source, method)
A field that stores images only
A field that indicates the size of the file
Explanation - Metadata provides context for the primary data, facilitating reuse.
Correct answer is: A field that holds descriptive information about the data (e.g., source, method)

Q.118 Which database is primarily used for storing 3‑D structural models of proteins?

PDB
GenBank
KEGG
ClinVar
Explanation - The Protein Data Bank contains experimentally determined 3‑D structures.
Correct answer is: PDB

Q.119 What is a 'substitution matrix' used for in sequence alignment?

To store the sequence data itself
To assign scores to matches and mismatches (substitutions) between residues
To keep track of file formats
To control the database connection
Explanation - Substitution matrices (e.g., BLOSUM, PAM) score each aligned pair of residues; gaps are scored separately with gap penalties.
Correct answer is: To assign scores to matches and mismatches (substitutions) between residues
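Example - A toy sketch of scoring a pairwise alignment. The substitution scores and the gap penalty below are invented for illustration and are not taken from BLOSUM or any published matrix. Residue pairs are scored by the substitution function, while gap positions use a separate penalty.

```python
PURINES = {"A", "G"}
GAP_PENALTY = -3  # hypothetical linear gap penalty, kept separate from the matrix


def substitution_score(a, b):
    """Toy nucleotide scores: +2 match, -1 transition, -2 transversion."""
    if a == b:
        return 2
    same_class = (a in PURINES) == (b in PURINES)
    return -1 if same_class else -2


def score_alignment(row1, row2):
    """Sum the scores over the columns of an already aligned pair of sequences."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += GAP_PENALTY
        else:
            total += substitution_score(a, b)
    return total


print(score_alignment("ACGT", "AC-T"))
```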

Q.120 Which of the following best describes a 'relational database'?

A database that stores data in flat files
A database that uses tables and defines relationships among them
A database that only stores images
A database that does not support querying
Explanation - Relational databases structure data in tables linked by keys, enabling complex queries.
Correct answer is: A database that uses tables and defines relationships among them

Q.121 Which file format is commonly used to store raw sequencing read data along with per‑base quality scores?

FASTA
FASTQ
GenBank
PDB
Explanation - FASTQ files contain both the nucleotide sequence and quality information.
Correct answer is: FASTQ

Q.122 Which of the following is NOT a common database for storing gene expression data?

GEO
ArrayExpress
PDB
Expression Atlas
Explanation - PDB stores protein structures, not expression data.
Correct answer is: PDB

Q.123 What does the acronym 'NCBI' stand for?

National Center for Biotechnology Information
National Council for Bioinformatics Innovation
New Catalogue of Biological Inferences
None of the above
Explanation - NCBI manages major biological databases and resources.
Correct answer is: National Center for Biotechnology Information

Q.124 Which database contains curated information on miRNA sequences?

miRBase
PDB
KEGG
GenBank
Explanation - miRBase is dedicated to microRNA sequences and annotations.
Correct answer is: miRBase

Q.125 Which of the following best describes the purpose of the 'Sequence Ontology' (SO)?

To provide a standard set of terms for describing genomic sequence features
To store raw sequencing reads
To predict protein folding
To manage laboratory equipment
Explanation - SO defines terms such as exon, intron, and variant types for consistent annotation.
Correct answer is: To provide a standard set of terms for describing genomic sequence features

Q.126 What is the main benefit of using a 'hash index' in a database?

It speeds up lookup operations
It compresses data
It creates redundant copies
It enforces foreign key constraints
Explanation - Hash indexes provide constant‑time average access to records.
Correct answer is: It speeds up lookup operations

Q.127 Which of the following best describes the 'FASTA' file header?

A line that starts with ‘>’ and contains an identifier
A line that starts with ‘#’ and contains metadata
A line that ends with ‘$’ and contains the sequence
A line that starts with ‘@’ and contains quality scores
Explanation - FASTA headers begin with ‘>’ and provide sequence identifiers and optional descriptions.
Correct answer is: A line that starts with ‘>’ and contains an identifier

Q.128 Which of the following is a key feature of the 'Ensembl' database?

It only stores bacterial genomes
It provides high‑quality, annotated eukaryotic genomes
It offers only protein structural data
It focuses exclusively on viral genomes
Explanation - Ensembl hosts curated genomes for many eukaryotes, including humans.
Correct answer is: It provides high‑quality, annotated eukaryotic genomes

Q.129 Which of the following file formats is used to store annotated genomic features?

GFF3
FASTA
FASTQ
PDB
Explanation - GFF3 (Generic Feature Format version 3) describes gene locations, exons, and other features.
Correct answer is: GFF3
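Example - A minimal sketch of reading one GFF3 feature line, based on the format's nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The function name and the sample line are hypothetical, and the sketch does not handle URL-escaped characters or multi-valued attributes.

```python
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse a single GFF3 feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError("a GFF3 feature line needs 9 tab-separated columns")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # coordinates are 1-based, inclusive
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    record["attributes"] = attributes
    return record


example = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(example)
print(feature["type"], feature["end"] - feature["start"] + 1, feature["attributes"]["ID"])
```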

Q.130 What is the main purpose of a 'bioinformatics pipeline'?

To sequence DNA in a single step
To automate a series of computational analyses on biological data
To store raw data only
To provide a graphical interface for manual data entry
Explanation - Pipelines chain tools like alignment, assembly, and annotation into reproducible workflows.
Correct answer is: To automate a series of computational analyses on biological data

Q.131 Which of the following is a typical use of the 'BLAST' tool?

To assemble genomes from short reads
To compare a query sequence to a database and find similar sequences
To store raw sequencing data
To visualize 3‑D protein structures
Explanation - BLAST performs rapid similarity searches between a query and database sequences.
Correct answer is: To compare a query sequence to a database and find similar sequences

Q.132 Which database contains information on metabolic pathways across multiple species?

KEGG
PDB
GenBank
ClinVar
Explanation - KEGG maps genes and enzymes to metabolic and signaling pathways.
Correct answer is: KEGG

Q.133 What is a 'metadata schema' used for?

To define how data is stored physically
To describe the structure and meaning of metadata fields
To compress data files
To generate random sequences
Explanation - A schema specifies data types, relationships, and constraints for metadata.
Correct answer is: To describe the structure and meaning of metadata fields

Q.134 Which of the following best describes a 'data curation' process?

Creating new data from scratch
Cleaning, validating, and annotating existing data for reliability
Deleting outdated data
Transmitting data over a network
Explanation - Curators ensure datasets are accurate, complete, and consistently annotated.
Correct answer is: Cleaning, validating, and annotating existing data for reliability

Q.135 What does the 'GenBank' flat file feature table contain?

Only the sequence itself
Metadata about the sequence and annotations such as genes, CDS, and regulatory elements
3‑D structure coordinates
Only the accession number
Explanation - Feature tables detail the location and function of genomic elements.
Correct answer is: Metadata about the sequence and annotations such as genes, CDS, and regulatory elements

Q.136 Which of the following is a key advantage of using a graph database for biological data?

Efficient representation of complex relationships between entities
Only stores tabular data
Requires rigid schemas
Limited to small datasets
Explanation - Graph databases model entities as nodes and relationships as edges, ideal for interaction networks.
Correct answer is: Efficient representation of complex relationships between entities
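Example - A small in-memory sketch of the node-and-edge model, using a plain adjacency dictionary rather than a real graph database. The protein names and interactions are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical undirected interaction edges between proteins.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5")]

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def neighbors_within(graph, start, max_hops):
    """Return all nodes reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return {n for n, d in distance.items() if d > 0}


print(sorted(neighbors_within(graph, "P1", 2)))
```

A traversal like this follows edges directly, whereas the equivalent relational query would need one self-join per hop.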

Q.137 Which of the following file formats is used for storing annotated gene structures and transcript information?

GFF3
FASTA
GenBank
PDB
Explanation - GFF3 encodes features such as exons, introns, and transcripts with coordinates.
Correct answer is: GFF3

Q.138 Which of the following best describes the 'PDB' file header record 'HEADER'?

Provides a short description of the macromolecule
Stores the sequence data directly
Indicates the file type only
Contains the raw sequencing reads
Explanation - The HEADER record contains the molecule's classification, the deposition date, and the entry's ID code; the descriptive title is given in the separate TITLE record.
Correct answer is: Provides a short description of the macromolecule

Q.139 Which database provides curated information on protein‑protein interactions?

BioGRID
KEGG
GenBank
PDB
Explanation - BioGRID catalogs experimentally verified protein‑protein interactions across species.
Correct answer is: BioGRID

Q.140 What is the primary purpose of the 'Sequence Read Archive (SRA)'?

To store raw sequencing reads and metadata
To provide protein tertiary structures
To archive published research articles
To host databases of metabolic pathways
Explanation - SRA preserves raw reads from high‑throughput sequencing experiments.
Correct answer is: To store raw sequencing reads and metadata

Q.141 Which of the following best describes the 'GAF' file format?

An image format for protein structures
A tab‑delimited format for gene ontology annotations
A compressed binary file format
A text file for raw sequences
Explanation - GAF contains GO annotations in a structured, machine‑readable format.
Correct answer is: A tab‑delimited format for gene ontology annotations
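Example - A minimal sketch of reading one GAF annotation line, assuming the 17-column GAF 2.x layout. The dictionary keys are informal labels chosen here, and the sample record (database name, object ID, reference) is entirely hypothetical.

```python
GAF_COLUMNS = (
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
)


def parse_gaf_line(line):
    """Parse one GAF line; header/comment lines starting with '!' return None."""
    if line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError("expected 17 tab-separated columns")
    return dict(zip(GAF_COLUMNS, fields))


# Hypothetical annotation; empty strings stand for optional columns left blank.
example = "\t".join([
    "ExampleDB", "X0001", "demoA", "enables", "GO:0005524",
    "REF:0000001", "IDA", "", "F", "demo protein A", "", "protein",
    "taxon:9606", "20200101", "ExampleDB", "", "",
])
annotation = parse_gaf_line(example)
print(annotation["db_object_symbol"], annotation["go_id"], annotation["evidence_code"])
```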

Q.142 Which of the following best describes a 'primary database' in bioinformatics?

A database that stores original experimental data submitted directly by researchers
A database that aggregates curated data from multiple sources
A database that only stores images
A private database for personal use
Explanation - Primary databases collect the original data, while secondary databases provide curated views.
Correct answer is: A database that stores original experimental data submitted directly by researchers

Q.143 Which of the following is a commonly used format for representing 3‑D protein structures?

PDB
FASTA
FASTQ
GenBank
Explanation - PDB files store atomic coordinates and related information for macromolecules.
Correct answer is: PDB

Q.144 What does the acronym 'SRA' stand for in bioinformatics?

Sequence Read Archive
Standard Read Application
Sequence Repository Access
Statistical Reference Analysis
Explanation - The SRA holds raw sequencing reads from high‑throughput platforms.
Correct answer is: Sequence Read Archive

Q.145 Which of the following is NOT a typical database entry type for GenBank?

gene
CDS
protein
chromosome
Explanation - GenBank records describe genomic DNA or RNA, not individual proteins.
Correct answer is: protein

Q.146 Which of the following is a primary benefit of using a relational database for a biological database?

It allows for flexible schema changes on the fly
It supports complex joins across multiple tables for integrated queries
It can only store text data
It is slower than NoSQL for large datasets
Explanation - Relational databases excel at relational data and complex queries.
Correct answer is: It supports complex joins across multiple tables for integrated queries

Q.147 Which of the following best describes 'data provenance' in bioinformatics?

The storage location of raw data files
The record of how data was generated, processed, and curated
The size of the dataset
The format of the data file
Explanation - Provenance tracks the history of a dataset to ensure reproducibility.
Correct answer is: The record of how data was generated, processed, and curated

Q.148 Which of the following databases stores curated protein sequences and annotations?

UniProtKB
PDB
KEGG
GenBank
Explanation - UniProtKB is the main repository for protein sequences, including annotations.
Correct answer is: UniProtKB

Q.149 Which of the following best describes a 'FASTQ quality score' character?

It represents a nucleotide base
It encodes the confidence of each base call on a logarithmic scale
It indicates the sequence length
It is a binary flag
Explanation - FASTQ quality scores map to Phred scores indicating the probability of error.
Correct answer is: It encodes the confidence of each base call on a logarithmic scale
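Example - A short sketch of decoding quality characters, assuming the Phred+33 (Sanger / Illumina 1.8+) encoding, where Q = ord(character) - 33 and the error probability is P = 10^(-Q/10). The function names are hypothetical; older Illumina files used an offset of 64.

```python
def phred_quality(char, offset=33):
    """Convert one FASTQ quality character to its Phred score."""
    return ord(char) - offset


def error_probability(quality):
    """Probability that the base call is wrong, given a Phred score."""
    return 10 ** (-quality / 10)


for char in "!5I":
    q = phred_quality(char)
    print(char, q, error_probability(q))
```

Each increase of 10 in the Phred score corresponds to a tenfold drop in the estimated error probability, which is the logarithmic scale the answer refers to.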

Q.150 Which of the following databases provides a comprehensive view of human metabolic pathways?

KEGG
PDB
GenBank
ClinVar
Explanation - KEGG maps genes and enzymes to metabolic pathways, including human pathways.
Correct answer is: KEGG

Q.151 Which of the following best describes the 'BioCyc' database collection?

A set of databases focused on microbial genomes only
A collection of curated metabolic pathway databases for multiple organisms
A database of protein tertiary structures
A database of clinical trials
Explanation - BioCyc contains organism‑specific pathway databases with detailed annotations.
Correct answer is: A collection of curated metabolic pathway databases for multiple organisms

Q.152 What is the main use of the 'SAMtools' software suite?

To align raw sequencing reads to a reference genome
To manipulate SAM/BAM files (sorting, indexing, filtering)
To visualize protein structures
To store raw sequencing reads in a database
Explanation - SAMtools provides utilities for working with alignment files in SAM/BAM format.
Correct answer is: To manipulate SAM/BAM files (sorting, indexing, filtering)