Biological Databases and Management: MCQs Practice Set

Q.1 What is the primary purpose of the GenBank database?

To store protein sequences

To store nucleotide sequences

To store metabolic pathways

To store structural data

Explanation - GenBank is a public repository managed by NCBI that contains DNA and RNA sequence records.

Correct answer is: To store nucleotide sequences

Q.2 Which file format is commonly used to store raw DNA sequencing reads?

FASTA

FASTQ

GenBank

PDB

Explanation - FASTQ files include the nucleotide sequence and the per-base quality scores, making them the standard for raw sequencing data.

Correct answer is: FASTQ

Q.3 The Protein Data Bank (PDB) primarily contains data about?

DNA sequences

Protein and nucleic acid 3D structures

Gene expression levels

Protein‑protein interaction networks

Explanation - PDB stores experimentally determined 3‑D structures of biomolecules, mainly proteins and nucleic acids.

Correct answer is: Protein and nucleic acid 3D structures

Q.4 Which database contains detailed information on metabolic pathways?

NCBI

KEGG

UniProt

PubMed

Explanation - KEGG (Kyoto Encyclopedia of Genes and Genomes) provides curated pathway maps linking genes to metabolic and signaling pathways.

Correct answer is: KEGG

Q.5 Which of the following is NOT a typical function of a biological database?

Data storage

Data retrieval

Data generation

Data annotation

Explanation - Biological databases store, retrieve, and annotate existing data; they do not generate new experimental data.

Correct answer is: Data generation

Q.6 What is the main advantage of using a relational database for biological data?

Easy to visualize data

Supports complex queries across tables

Handles unstructured text better

Requires no indexing

Explanation - Relational databases allow joins and structured queries that enable efficient retrieval of related data across multiple tables.

Correct answer is: Supports complex queries across tables

Q.7 Which of the following is used to describe protein families and domains?

PFAM

FASTA

GenBank

PDB

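Explanation - PFAM (usually written Pfam) is a database of protein families and domains; each family is described by a multiple sequence alignment and a profile hidden Markov model (HMM). The other options are sequence or structure file formats.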

Correct answer is: PFAM

Q.8 What does the acronym EMBL stand for in the context of biological databases?

European Molecular Biology Laboratory

Encyclopedia of Molecular Bioinformatics Lists

Electronic Metadata Base Library

European Metagenomics Biological Log

Explanation - EMBL is the European organization that maintains a nucleotide sequence database similar to GenBank.

Correct answer is: European Molecular Biology Laboratory

Q.9 Which database is the primary source for functional annotations of genes in Arabidopsis thaliana?

TAIR

UniProt

Ensembl

PDB

Explanation - TAIR (The Arabidopsis Information Resource) specializes in gene information for this model plant.

Correct answer is: TAIR

Q.10 What is a key feature of the UniProtKB/Swiss‑Prot subset?

Only contains bacterial proteins

Provides manually curated protein annotations

Stores raw sequencing reads

Focuses on structural data only

Explanation - Swiss‑Prot is the manually annotated, reviewed portion of UniProtKB, ensuring high-quality protein information.

Correct answer is: Provides manually curated protein annotations

Q.11 Which query language is commonly used to retrieve data from RDF-based biological databases?

SQL

SPARQL

XQuery

CQL

Explanation - SPARQL is the standard query language for RDF (Resource Description Framework) data models used in semantic web databases.

Correct answer is: SPARQL

Q.12 In the FASTA file format, how are individual sequences identified?

By a header line starting with a ‘>’ character

By a header line starting with a ‘#’ character

By a header line starting with a ‘@’ character

By a header line starting with a ‘*’ character

Explanation - Each FASTA record begins with a ‘>’ line that contains the sequence identifier and optional description.

Correct answer is: By a header line starting with a ‘>’ character
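The FASTA layout described in Q.12 can be illustrated with a minimal parsing sketch. The function name `parse_fasta` and the two example records are hypothetical, and the sketch assumes a well-formed file in which each header carries at least an identifier.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            tokens = line[1:].split(None, 1)
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            records[current_id] += line
    return records


# Hypothetical two-record example.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2
GGCC
""".splitlines()

fasta_records = parse_fasta(example)
```

Note that a record ends only when the next '>' header (or the end of the file) is reached, so a sequence may span any number of lines.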

Q.13 Which database contains information on protein–protein interaction networks?

STRING

PDB

UniProt

GenBank

Explanation - STRING provides predicted and experimentally verified protein‑protein interaction data across many organisms.

Correct answer is: STRING

Q.14 What does the GISAID database specialize in?

Genomic variants of SARS‑CoV‑2

Protein structural data

Metabolomic datasets

Microarray gene expression data

Explanation - GISAID is a global initiative that shares influenza and SARS‑CoV‑2 sequence data with researchers.

Correct answer is: Genomic variants of SARS‑CoV‑2

Q.15 Which of the following best describes the term 'ontology' in bioinformatics?

A database of DNA sequences

A hierarchical classification of biological terms

A software for sequence alignment

An algorithm for phylogenetic tree construction

Explanation - Ontologies define controlled vocabularies and relationships between terms, facilitating standardized annotation.

Correct answer is: A hierarchical classification of biological terms

Q.16 The Ensembl database provides genomic data primarily for which type of organisms?

Bacterial strains

Model eukaryotes and vertebrates

Plants only

Viral genomes only

Explanation - Ensembl hosts annotated genomes for a wide range of eukaryotic species, including many model organisms.

Correct answer is: Model eukaryotes and vertebrates

Q.17 Which of the following is a commonly used tool for aligning short sequencing reads to a reference genome?

BLAST

BWA

Clustal Omega

MUSCLE

Explanation - BWA (Burrows–Wheeler Aligner) is designed for fast alignment of short reads against a reference genome.

Correct answer is: BWA

Q.18 What is the role of the NCBI Taxonomy database?

Store protein structures

Provide a hierarchical classification of organisms

Host gene expression datasets

Track publication metrics

Explanation - The Taxonomy database assigns a unique taxonomic ID and provides a tree-like classification for the organisms represented in the public sequence databases.

Correct answer is: Provide a hierarchical classification of organisms

Q.19 Which file format is typically used to represent phylogenetic trees?

NEWICK

FASTA

JSON

XML

Explanation - The Newick format encodes tree topology as nested parentheses, commonly used in phylogenetics.

Correct answer is: NEWICK
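To make the nested-parentheses idea in Q.19 concrete, here is a rough sketch of a recursive parser. It assumes a simplified Newick string with plain names only (no branch lengths, quoted labels, or comments), and the names `parse_newick` and `leaf_names` are hypothetical.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested dicts.

    Assumption: names only, no branch lengths, quotes, or comments.
    """
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ')'
        # An optional node name follows the children (or stands alone for a leaf).
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return {"name": text[start:pos], "children": children}

    tree = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaf_names(node):
    """Collect leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A,B),C);")
leaves = leaf_names(tree)
```

Here A and B share an unnamed internal node, and C attaches at the root. For real data, an established phylogenetics library is the safer choice.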

Q.20 In a relational database, what is an 'index' used for?

Store backup copies

Speed up data retrieval

Compress data

Validate data integrity

Explanation - Indexes create a data structure that allows the database engine to find rows faster.

Correct answer is: Speed up data retrieval
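The effect of an index (Q.20) can be sketched with Python's built-in `sqlite3` module by comparing the query plan before and after the index is created. The table `gene`, the index `idx_gene_symbol`, and the rows are hypothetical; the exact plan wording varies between SQLite versions, so the sketch only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, description TEXT)"
)
# Hypothetical rows, used only to populate the table.
conn.executemany(
    "INSERT INTO gene (symbol, description) VALUES (?, ?)",
    [(f"GENE{i}", f"hypothetical gene {i}") for i in range(1000)],
)


def query_plan(connection):
    """Return SQLite's plan description for a lookup by symbol."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT description FROM gene WHERE symbol = ?",
        ("GENE500",),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


plan_before = query_plan(conn)  # full table scan expected
conn.execute("CREATE INDEX idx_gene_symbol ON gene (symbol)")
plan_after = query_plan(conn)   # should now mention the index
```

Without the index the engine has to scan every row; with it, the lookup can go through the index structure instead.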

Q.21 Which of the following best describes the Sanger encoding of FASTQ quality scores?

ASCII 33 offset

ASCII 64 offset

Binary encoding

Hexadecimal encoding

Explanation - Sanger quality scores use ASCII characters starting at 33, representing quality from 0 to 93.

Correct answer is: ASCII 33 offset
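The ASCII 33 offset from Q.21 can be shown as a short decoding sketch. The function names are hypothetical, and the quality string is made up for illustration. The conversion from a Phred score Q to an error probability uses the standard relation P = 10^(-Q/10).

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a Sanger (Phred+33) quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Made-up quality string: '!' is ASCII 33 (Q0) and 'I' is ASCII 73 (Q40).
scores = decode_phred33("!+5?I")
```

A score of 20 therefore corresponds to roughly a 1 in 100 chance that the base call is wrong.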

Q.22 What does the acronym 'RNA‑seq' refer to?

Sequencing of ribosomal DNA

Sequencing of messenger RNA

Sequencing of all genomic DNA

Sequencing of protein structures

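Explanation - RNA‑seq is a high‑throughput sequencing technique that sequences cDNA derived from RNA to quantify the transcripts in a sample. Messenger RNA is the most common target, although other RNA classes can also be profiled.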

Correct answer is: Sequencing of messenger RNA

Q.23 Which of the following is a primary advantage of using a NoSQL database for genomic data?

Strict schema enforcement

Easy to perform joins

Scalable storage for large, unstructured datasets

Built-in relational integrity

Explanation - NoSQL databases allow flexible schema and horizontal scaling, suitable for big genomic datasets.

Correct answer is: Scalable storage for large, unstructured datasets

Q.24 Which type of metadata is essential for a sequencing dataset in a public repository?

Author’s favorite color

Sample source and experimental conditions

Personal contact information

Stock market data

Explanation - Accurate metadata ensures that other researchers can understand and reuse the dataset.

Correct answer is: Sample source and experimental conditions

Q.25 What is the main purpose of the BioProject record in NCBI?

To catalog individual protein structures

To group related biological datasets for a single research project

To provide a list of all known genes

To host user forums

Explanation - BioProject serves as an umbrella for all data (sequences, annotations, etc.) generated in a particular study.

Correct answer is: To group related biological datasets for a single research project

Q.26 Which of the following describes a 'blast hit' in BLAST results?

A complete match to the query sequence

A statistically significant alignment between query and subject

A random occurrence

A predicted structure

Explanation - BLAST reports high‑scoring segment pairs (HSPs) that have a low probability of occurring by chance.

Correct answer is: A statistically significant alignment between query and subject

Q.27 Which database contains curated information on protein functional families?

PFAM

PDB

KEGG

GenBank

Explanation - PFAM catalogs protein families and domains, providing sequence alignments and hidden Markov models.

Correct answer is: PFAM

Q.28 What is a 'sequence identifier' in GenBank?

A unique accession number assigned to each entry

The length of the DNA sequence

The file format of the entry

The organism name only

Explanation - Accession numbers serve as stable references for GenBank records.

Correct answer is: A unique accession number assigned to each entry

Q.29 Which of the following best represents a 'gene ontology (GO)' term?

A unique protein ID

A structured vocabulary describing biological processes, cellular components, and molecular functions

A DNA sequence

A metabolic pathway diagram

Explanation - GO provides standardized terms to annotate gene products across species.

Correct answer is: A structured vocabulary describing biological processes, cellular components, and molecular functions

Q.30 What is the primary function of the Sequence Read Archive (SRA)?

Store raw sequencing reads

Provide protein tertiary structures

Archive research publications

Manage grant applications

Explanation - SRA is a public repository that holds raw sequencing data from high-throughput platforms.

Correct answer is: Store raw sequencing reads

Q.31 Which of the following is NOT a typical format for representing protein families?

Pfam HMM

Clustal alignment

MCL graph

FASTA file

Explanation - MCL (Markov Cluster algorithm) is a graph clustering algorithm; the graph it operates on captures similarity relationships used to detect families and is not itself a format for representing them.

Correct answer is: MCL graph

Q.32 The 'Ensembl' gene annotation system is most closely associated with which scientific discipline?

Structural biology

Ecology

Genomics

Pharmacology

Explanation - Ensembl provides high‑quality genome annotations for a wide range of eukaryotic species.

Correct answer is: Genomics

Q.33 Which of the following statements best describes a 'metadata schema' in the context of biological databases?

A file format for raw data

A blueprint defining the structure of metadata records

An algorithm for sequence alignment

A type of sequencing machine

Explanation - A metadata schema specifies the fields, data types, and relationships for cataloging datasets.

Correct answer is: A blueprint defining the structure of metadata records

Q.34 Which database is a primary source for curated, reviewed protein sequences?

UniProtKB/Swiss‑Prot

GenBank

PDB

KEGG

Explanation - Swiss‑Prot contains manually curated, high‑quality protein entries.

Correct answer is: UniProtKB/Swiss‑Prot

Q.35 What does 'SRA' stand for in bioinformatics?

Sequence Read Archive

Sequence Research Array

Sequence Reference Atlas

Sequence Retrieval Algorithm

Explanation - SRA is NCBI's repository for raw sequencing data.

Correct answer is: Sequence Read Archive

Q.36 Which of the following best describes the purpose of the Gene Ontology Consortium?

To produce new sequencing technologies

To create a shared vocabulary for gene product attributes

To store 3‑D protein structures

To manage grant funding

Explanation - The Gene Ontology provides controlled terms for biological processes, cellular components, and molecular functions.

Correct answer is: To create a shared vocabulary for gene product attributes

Q.37 Which database contains curated information about drug–target interactions?

DrugBank

PDB

KEGG

GenBank

Explanation - DrugBank catalogs detailed drug information along with target proteins and mechanisms of action.

Correct answer is: DrugBank

Q.38 In a relational database, what does 'normalization' primarily aim to achieve?

Increase query speed at the cost of data redundancy

Reduce data redundancy and prevent update anomalies

Enable real‑time analytics

Create backup copies

Explanation - Normalization structures tables to minimize duplication and maintain data integrity.

Correct answer is: Reduce data redundancy and prevent update anomalies

Q.39 Which of the following is a common tool for visualizing phylogenetic trees?

MEGA

BLAST

BWA

SAMtools

Explanation - MEGA (Molecular Evolutionary Genetics Analysis) provides tools for constructing and viewing phylogenetic trees.

Correct answer is: MEGA

Q.40 Which of the following best describes the term 'ortholog'?

Two genes within the same species that have similar functions

Genes in different species that evolved from a common ancestral gene

A protein that binds to DNA

A type of RNA molecule

Explanation - Orthologs are homologous genes in different species that originated from a single gene in the last common ancestor.

Correct answer is: Genes in different species that evolved from a common ancestral gene

Q.41 Which database is known for providing 2‑D and 3‑D representations of metabolic pathways?

KEGG

PDB

GenBank

Ensembl

Explanation - KEGG pathway maps are manually drawn 2‑D diagrams; individual entries can link out to 3‑D structure records in resources such as PDB.

Correct answer is: KEGG

Q.42 What does the term 'FASTA format' refer to?

A binary format for protein sequences

A plain text format for nucleotide or protein sequences with header lines

A compressed archive format

An image format for genomic data

Explanation - FASTA uses ‘>’ header lines followed by sequence lines, widely used for storing sequences.

Correct answer is: A plain text format for nucleotide or protein sequences with header lines

Q.43 Which of the following is a standard identifier used to refer to a specific protein in the UniProt database?

Accession number

Gene name

Chromosome position

RNA‑seq count

Explanation - UniProt accession numbers uniquely identify each protein entry.

Correct answer is: Accession number

Q.44 In the context of databases, what does 'REST API' stand for?

Representational State Transfer Application Programming Interface

Randomized Sequence Transfer Algorithmic Protocol Interface

Reliable Storage Transactional Encrypted Protocol Interface

Resource Secure Transfer Access Protocol Interface

Explanation - REST APIs allow programmatic access to database services via standard HTTP methods.

Correct answer is: Representational State Transfer Application Programming Interface
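As a small illustration of programmatic access over HTTP (Q.44), the sketch below builds a request URL for the NCBI E-utilities `efetch` endpoint without sending it. The function name `build_efetch_url` and the identifier `EXAMPLE_ACCESSION` are placeholders; a real request needs a valid record identifier and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build (but do not send) an efetch request URL."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# EXAMPLE_ACCESSION is a placeholder, not a real record identifier.
url = build_efetch_url("nuccore", "EXAMPLE_ACCESSION")
```

Fetching the resulting URL with an HTTP GET would return the record in the requested format; the request is expressed entirely through the URL and standard HTTP, which is the core idea of a REST interface.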

Q.45 What is the primary use of the BioCyc database collection?

To store raw sequencing reads

To host curated metabolic pathway databases for multiple organisms

To archive protein structures

To provide genome assembly tools

Explanation - BioCyc contains detailed, organism‑specific metabolic pathways and associated data.

Correct answer is: To host curated metabolic pathway databases for multiple organisms

Q.46 Which of the following describes a 'feature' in a GenBank flat file?

A file extension for compressed data

A section detailing specific genomic annotations like genes, exons, or regulatory elements

The file's checksum

The number of sequences in the file

Explanation - Features in GenBank files provide precise positions and functional annotations within the sequence.

Correct answer is: A section detailing specific genomic annotations like genes, exons, or regulatory elements

Q.47 Which of the following best defines the term 'sequence alignment'?

Combining multiple sequences to create a consensus sequence

Sorting sequences alphabetically

Determining the best match between two or more sequences

Compressing sequences for storage

Explanation - Sequence alignment finds regions of similarity that may indicate functional, structural, or evolutionary relationships.

Correct answer is: Determining the best match between two or more sequences
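One way to make "best match" in Q.47 concrete is a dynamic-programming score for global alignment (the Needleman-Wunsch approach). This is a minimal sketch that returns only the optimal score; the match, mismatch, and gap values are arbitrary illustrative choices, not the defaults of any particular tool.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            up = score[i - 1][j] + gap      # gap in sequence b
            left = score[i][j - 1] + gap    # gap in sequence a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


identical = global_alignment_score("ACGT", "ACGT")
one_deletion = global_alignment_score("ACGT", "AGT")
```

With these parameters, the second call aligns three matching residues and one gap, so a single deletion lowers the score from 4 to 1.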

Q.48 What does the 'BLAST' program primarily use to assess the similarity between sequences?

Random guessing

Exact matches of all nucleotides

Statistical scoring matrices and gap penalties

Manual curation

Explanation - BLAST uses scoring matrices (e.g., BLOSUM) and gap penalties to evaluate alignments.

Correct answer is: Statistical scoring matrices and gap penalties

Q.49 Which database provides detailed gene expression data from microarray experiments?

GEO (Gene Expression Omnibus)

PDB

KEGG

GenBank

Explanation - GEO archives high‑throughput gene expression and other functional genomics data.

Correct answer is: GEO (Gene Expression Omnibus)

Q.50 In a relational database, a 'foreign key' is used to:

Create an index for faster queries

Ensure uniqueness of a column

Enforce a relationship between two tables

Store binary data

Explanation - A foreign key links a column in one table to a primary key in another, maintaining referential integrity.

Correct answer is: Enforce a relationship between two tables
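The referential integrity described in Q.50 can be sketched with Python's built-in `sqlite3` module. The tables `organism` and `gene`, the local gene IDs, and the symbol `GENE_X` are hypothetical; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute(
    "CREATE TABLE organism (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE gene ("
    " gene_id INTEGER PRIMARY KEY,"
    " symbol TEXT NOT NULL,"
    " taxon_id INTEGER NOT NULL REFERENCES organism(taxon_id))"
)
conn.execute("INSERT INTO organism VALUES (9606, 'Homo sapiens')")
conn.execute("INSERT INTO gene VALUES (1, 'TP53', 9606)")

# A gene pointing at a taxon_id that is absent from organism is rejected.
try:
    conn.execute("INSERT INTO gene VALUES (2, 'GENE_X', 12345)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The same key also drives the join between the two tables.
rows = conn.execute(
    "SELECT gene.symbol, organism.name"
    " FROM gene JOIN organism ON gene.taxon_id = organism.taxon_id"
).fetchall()
```

The failed insert shows the constraint at work: every gene row must refer to an organism row that actually exists.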

Q.51 Which of the following tools is commonly used for visualizing high‑dimensional omics data?

cBioPortal

BLAST

BWA

SAMtools

Explanation - cBioPortal provides interactive visualizations of cancer genomics, including high‑dimensional data.

Correct answer is: cBioPortal

Q.52 Which of the following databases focuses on non‑coding RNA sequences and their functions?

Rfam

PDB

GenBank

KEGG

Explanation - Rfam catalogs families of non‑coding RNA and their consensus alignments.

Correct answer is: Rfam

Q.53 What is the main purpose of the 'Sequence Ontology' (SO) in genomics?

To provide a standardized vocabulary for genomic sequence features

To store raw sequencing data

To design sequencing primers

To predict protein secondary structure

Explanation - SO defines terms such as exon, intron, and variant type for consistent annotation.

Correct answer is: To provide a standardized vocabulary for genomic sequence features

Q.54 Which of the following best describes a 'feature table' in a GenBank file?

A list of all files in the database

A table describing the start, end, and annotation of genomic features

A summary of user access logs

A list of protein 3‑D structures

Explanation - The feature table provides precise genomic coordinates and functional descriptors.

Correct answer is: A table describing the start, end, and annotation of genomic features

Q.55 Which of the following is a key advantage of using cloud storage for genomic datasets?

Increased physical security

Unlimited local access without internet

Scalable storage and computing resources

Mandatory encryption of all data

Explanation - Cloud platforms provide elastic storage and computational power suitable for large‑scale bioinformatics.

Correct answer is: Scalable storage and computing resources

Q.56 What does the 'E‑value' in BLAST represent?

The number of errors in the alignment

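The expected number of alignments of similar or better quality occurring by chance

The length of the aligned region

The number of matching nucleotides

Explanation - The E‑value is the expected number of chance alignments that score at least as well in a database of the searched size; it is not strictly a probability, although the two are nearly equal for small values. A lower E‑value indicates a more statistically significant match.

Correct answer is: The expected number of alignments of similar or better quality occurring by chance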
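For Q.56, the E-value is commonly expressed with the Karlin-Altschul relation E = K·m·n·e^(−λS), where m and n are the query and database lengths and S is the raw alignment score. The sketch below applies that formula directly; the values chosen for λ and K are illustrative assumptions, and the sketch ignores the length corrections that BLAST applies in practice.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters (assumed for this sketch, not read from a BLAST run).
LAMBDA, K = 0.267, 0.041
m, n = 300, 5_000_000

e_weak = expected_hits(40, m, n, LAMBDA, K)
e_strong = expected_hits(120, m, n, LAMBDA, K)
```

Because the score sits in a negative exponent, a higher-scoring alignment yields a much smaller expected number of chance hits, which is why low E-values indicate significance.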

Q.57 Which record type in a PDB file describes helical secondary structure?

SEQRES

HELIX

MODEL

REMARK

Explanation - The HELIX records in a PDB file describe alpha‑helices and their start/end residues.

Correct answer is: HELIX

Q.58 Which database is specifically dedicated to storing curated information on enzyme‑catalyzed reactions?

BRENDA

PDB

KEGG

GenBank

Explanation - BRENDA is a comprehensive enzyme information system providing data on reactions, substrates, and conditions.

Correct answer is: BRENDA

Q.59 Which of the following is NOT a typical component of a 'FASTA header' line?

Sequence identifier

Description of the sequence

A ‘>’ character at the beginning

The full DNA sequence itself

Explanation - The header line contains metadata; the sequence follows on subsequent lines.

Correct answer is: The full DNA sequence itself

Q.60 What does the acronym 'NCBI' stand for?

National Center for Biotechnology Information

National Center for Bioinformatics Integration

Nucleotide Collection Bioinformatics Index

None of the above

Explanation - NCBI maintains major resources such as the GenBank and PubMed databases, along with analysis tools such as BLAST.

Correct answer is: National Center for Biotechnology Information

Q.61 Which of the following best describes the use of a 'hash index' in database systems?

To sort data alphabetically

To enable quick retrieval by key value using a hash function

To compress large datasets

To enforce relational constraints

Explanation - A hash index maps key values to locations, providing O(1) average lookup time.

Correct answer is: To enable quick retrieval by key value using a hash function
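The lookup behaviour described in Q.61 can be sketched with a Python `dict`, which is itself a hash table. The accessions and sequences below are hypothetical placeholders, and a list stands in for the stored records.

```python
# Hypothetical records standing in for rows stored on disk.
records = [
    ("ACC0001", "ACGT"),
    ("ACC0002", "GGCC"),
    ("ACC0003", "TTAA"),
]

# The index maps each key to the position of its record.
index = {accession: pos for pos, (accession, _seq) in enumerate(records)}


def lookup(accession):
    """Find a sequence by key without scanning every record."""
    pos = index.get(accession)
    return None if pos is None else records[pos][1]


found = lookup("ACC0002")
missing = lookup("ACC9999")
```

A hash index serves exact-match lookups well but does not keep keys in sorted order, so range queries usually rely on a tree-based index instead.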

Q.62 Which of the following is a commonly used tool for assembling short sequencing reads into longer contigs?

SPAdes

BLAST

SAMtools

ClustalW

Explanation - SPAdes is a genome assembler designed for single‑cell and bacterial genome projects.

Correct answer is: SPAdes

Q.63 What does the 'GeneID' field in NCBI refer to?

A unique identifier for a gene

The length of a gene in base pairs

The number of exons in a gene

The chromosomal position of a gene

Explanation - GeneID is a stable numeric identifier assigned to each gene in the NCBI Gene database.

Correct answer is: A unique identifier for a gene

Q.64 Which of the following best represents a 'primary database' in bioinformatics?

A database that stores raw experimental data directly from instruments

A database that aggregates data from multiple primary databases

A database that only contains annotations

A database that provides analytical tools

Explanation - Primary databases collect original data; secondary databases integrate and annotate it.

Correct answer is: A database that stores raw experimental data directly from instruments

Q.65 Which of the following is NOT a typical component of a 'GenBank flat file'?

LOCUS line

ORIGIN line

FEATURES line

IMAGE line

Explanation - GenBank files contain LOCUS, FEATURES, ORIGIN, and other structured lines; no IMAGE line exists.

Correct answer is: IMAGE line

Q.66 What is the purpose of the 'BioMart' portal?

To visualize protein structures

To provide a flexible, web‑based interface for querying biological datasets

To host raw sequencing data

To perform sequence alignment

Explanation - BioMart allows users to retrieve customized data from large databases like Ensembl.

Correct answer is: To provide a flexible, web‑based interface for querying biological datasets

Q.67 Which of the following is a key challenge in maintaining biological databases?

Ensuring consistent naming conventions across species

Producing new sequencing machines

Disseminating research papers

Designing laboratory protocols

Explanation - Standardized nomenclature is essential for reliable data integration and retrieval.

Correct answer is: Ensuring consistent naming conventions across species

Q.68 What does the 'GTF' file format represent in genomics?

Genetic Transfer File

Gene Transfer Format

Genome Transcript Format

Gene Table File

Explanation - GTF (Gene Transfer Format) is used for storing gene annotations and transcript information.

Correct answer is: Gene Transfer Format

Q.69 Which of the following best describes a 'secondary structure' prediction for RNA?

Predicting the 3‑D tertiary fold

Identifying the arrangement of base pairs (e.g., stems, loops)

Determining the gene’s chromosomal location

Mapping the RNA to protein domains

Explanation - Secondary structure prediction focuses on base pairing patterns rather than full 3‑D conformation.

Correct answer is: Identifying the arrangement of base pairs (e.g., stems, loops)

Q.70 Which of the following is a major feature of the 'Sequence Read Archive (SRA)'?

It stores only assembled genomes

It hosts raw sequencing reads from diverse platforms

It provides only protein annotations

It is used exclusively for microbiome studies

Explanation - SRA archives raw data from Illumina, PacBio, Oxford Nanopore, and more.

Correct answer is: It hosts raw sequencing reads from diverse platforms

Q.71 In the context of bioinformatics databases, what is a 'controlled vocabulary'?

A list of random words

A predefined set of terms with defined relationships

A dictionary for translating between languages

A set of user‑generated tags

Explanation - Controlled vocabularies ensure consistency in data annotation and retrieval.

Correct answer is: A predefined set of terms with defined relationships

Q.72 Which of the following databases primarily provides curated information on genetic variants?

ClinVar

PDB

GenBank

KEGG

Explanation - ClinVar aggregates clinically relevant genetic variation data with interpretation of pathogenicity.

Correct answer is: ClinVar

Q.73 Which of the following best describes 'sequence clustering' in bioinformatics?

Separating sequences by length

Grouping similar sequences to reduce redundancy

Aligning sequences to a reference genome

Converting sequences into protein structures

Explanation - Clustering reduces dataset size and highlights representative sequences.

Correct answer is: Grouping similar sequences to reduce redundancy

Q.74 What does the 'SAM' file format store?

Sequencing reads before alignment

Alignment information of sequencing reads to a reference

Protein tertiary structures

Metabolic pathway maps

Explanation - SAM (Sequence Alignment/Map) records the mapping of reads to reference sequences.

Correct answer is: Alignment information of sequencing reads to a reference
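The alignment information mentioned in Q.74 sits in eleven mandatory tab-separated fields per SAM line. The sketch below parses a subset of them and decodes two bits of the FLAG field; the alignment line and the names `SamRecord` and `parse_sam_line` are made up for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment operations
    seq: str     # read sequence


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[9],
    )


# Made-up alignment line: FLAG 16 marks a read mapped to the reverse strand.
line = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(line)
is_reverse = bool(rec.flag & 0x10)
is_unmapped = bool(rec.flag & 0x4)
```

Header lines, which start with '@', are not handled here and would need to be skipped before parsing.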

Q.75 Which of the following databases is a primary source for curated, high‑quality enzyme classification?

KEGG

BRENDA

UniProt

GenBank

Explanation - BRENDA contains detailed enzyme data, including EC numbers and reaction conditions.

Correct answer is: BRENDA

Q.76 In a relational database, a 'view' is:

A physical copy of a table

A virtual table generated from a query

An index for faster searches

A backup of the database

Explanation - Views present data from one or more tables as a single table without storing data themselves.

Correct answer is: A virtual table generated from a query
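The "virtual table" idea in Q.76 can be demonstrated with `sqlite3`: the view below stores no rows of its own, so it reflects new data as soon as the underlying table changes. The table `variant`, the view `pathogenic_variant`, and the gene labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant (id INTEGER PRIMARY KEY, gene TEXT, significance TEXT)"
)
conn.executemany(
    "INSERT INTO variant (gene, significance) VALUES (?, ?)",
    [("GENE_A", "pathogenic"), ("GENE_B", "benign"), ("GENE_A", "benign")],
)

# The view is just a stored query over the base table.
conn.execute(
    "CREATE VIEW pathogenic_variant AS"
    " SELECT id, gene FROM variant WHERE significance = 'pathogenic'"
)

before = conn.execute("SELECT COUNT(*) FROM pathogenic_variant").fetchone()[0]
conn.execute(
    "INSERT INTO variant (gene, significance) VALUES ('GENE_C', 'pathogenic')"
)
after = conn.execute("SELECT COUNT(*) FROM pathogenic_variant").fetchone()[0]
```

The count rises without any change to the view itself, because the view's query is re-evaluated each time it is read.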

Q.77 Which of the following is a key advantage of using version control for biological sequence databases?

It eliminates the need for backups

It allows tracking of changes and ensures reproducibility

It speeds up sequence alignment

It provides automated annotation

Explanation - Version control systems log every edit, facilitating audit trails and reproducibility.

Correct answer is: It allows tracking of changes and ensures reproducibility

Q.78 Which database includes curated information on small non‑coding RNAs such as miRNA and siRNA?

miRBase

KEGG

PDB

GenBank

Explanation - miRBase catalogs known microRNA sequences and annotation information.

Correct answer is: miRBase

Q.79 What is a 'metadata field' in the context of a biological database?

A field that stores the raw data

A field that stores additional descriptive information about the data

A field for storing image files

A field that contains the file size

Explanation - Metadata provides context such as source, method, and conditions for the primary data.

Correct answer is: A field that stores additional descriptive information about the data

Q.80 Which of the following best describes a 'circular genome'?

A genome that can be rearranged in any order

A genome that contains no linear ends and forms a loop

A genome that is only present in eukaryotes

A genome with multiple chromosomes

Explanation - Circular genomes, typical of many bacteria and mitochondria, form closed loops.

Correct answer is: A genome that contains no linear ends and forms a loop

Q.81 Which of the following database systems uses SQL (Structured Query Language) as its primary query language?

MySQL

MongoDB

Cassandra

Neo4j

Explanation - MySQL is a relational database system that uses SQL for querying and manipulation.

Correct answer is: MySQL

Q.82 Which of the following best defines a 'substitution matrix' used in sequence alignment?

A matrix that assigns scores to matches and mismatches between residues

A matrix that determines the location of sequences in a database

A matrix that stores quality scores for sequencing reads

A matrix that represents 3‑D coordinates of proteins

Explanation - Substitution matrices (e.g., BLOSUM) guide alignment scoring by assigning a score to every possible pair of aligned residues, rewarding likely substitutions and penalizing unlikely ones.

Correct answer is: A matrix that assigns scores to matches and mismatches between residues
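To illustrate how a substitution matrix is used (Q.82), the sketch below scores an ungapped alignment with a toy nucleotide matrix. The matrix is a deliberately simple assumption (+2 for a match, -1 for a mismatch) and is not BLOSUM or any published matrix.

```python
def make_toy_matrix(alphabet="ACGT", match=2, mismatch=-1):
    """Build a toy substitution matrix as a dict keyed by residue pairs."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in alphabet
        for b in alphabet
    }


def score_ungapped(seq1, seq2, matrix):
    """Sum the matrix scores over each aligned pair of residues."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs equal-length sequences")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


toy_matrix = make_toy_matrix()
total = score_ungapped("ACGT", "ACCT", toy_matrix)
```

Published protein matrices follow the same lookup pattern but assign a different score to each residue pair, based on how often the substitution is observed.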

Q.83 What does the 'FASTA format' use to indicate the end of a sequence record?

A blank line

The next header line starting with ‘>’

A special end marker ‘END’

A line of dashes ‘----’

Explanation - FASTA records are separated by new header lines; the sequence continues until the next header.

Correct answer is: The next header line starting with ‘>’

Q.84 Which database provides information about protein–protein interactions specific to humans?

STRING

BioGRID

KEGG

PDB

Explanation - BioGRID catalogs experimentally validated interactions, including many human proteins.

Correct answer is: BioGRID

Q.85 What is a 'flat file' in the context of biological databases?

A single, unstructured text file containing records

A database with multiple tables

An image file of a chromosome

A compressed archive of sequences

Explanation - Flat files (e.g., GenBank flat file) store data in plain text without relational structure.

Correct answer is: A single, unstructured text file containing records

Q.86 Which of the following is a characteristic of a 'structured query language' (SQL)?

It supports only insert operations

It requires manual parsing of text

It allows declarative queries using SELECT, FROM, WHERE clauses

It is used exclusively for graph databases

Explanation - SQL enables users to specify the data they want rather than how to retrieve it.

Correct answer is: It allows declarative queries using SELECT, FROM, WHERE clauses

Q.87 Which of the following best describes the purpose of the 'Gene Expression Omnibus (GEO)'?

Storing raw sequencing reads

Storing gene expression and related functional genomics data

Providing protein structural data

Listing chemical compounds

Explanation - GEO archives microarray, RNA‑seq, and other expression datasets.

Correct answer is: Storing gene expression and related functional genomics data

Q.88 Which of the following is a common challenge when integrating data from multiple biological databases?

Uniform naming conventions

Limited internet bandwidth

Inconsistent data formats and annotations

Low data volume

Explanation - Differences in how data is formatted and annotated hinder seamless integration.

Correct answer is: Inconsistent data formats and annotations

Q.89 What does the 'Accession Number' in a GenBank record signify?

The version of the database

The unique identifier for the record

The number of sequences in the record

The publication year

Explanation - Each GenBank record receives a unique accession number for reference.

Correct answer is: The unique identifier for the record

Q.90 Which of the following best describes a 'clustering algorithm' in genomics?

An algorithm that aligns sequences to a reference

An algorithm that groups sequences based on similarity to reduce redundancy

An algorithm that predicts gene function

An algorithm that converts RNA to DNA

Explanation - Clustering reduces dataset size by grouping similar sequences and selecting representatives.

Correct answer is: An algorithm that groups sequences based on similarity to reduce redundancy
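
The idea can be sketched as a greedy procedure: each sequence joins the first representative it resembles closely enough, otherwise it becomes a new representative. This is a simplified illustration, not the method of any particular tool; the position-by-position identity measure (no alignment) and the 0.8 threshold are assumptions made for brevity.

```python
def identity(a, b):
    """Fraction of matching positions, compared position by position (no alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.8):
    """Group sequences around representatives, processing the longest first."""
    clusters = {}  # representative sequence -> list of member sequences
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough, so start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy sequences.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTAC"]
clusters = greedy_cluster(seqs)
representatives = list(clusters)
print(representatives)
```

Keeping only the representatives gives the reduced, less redundant dataset.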

Q.91 Which of the following databases is dedicated to storing information on genetic variations linked to disease?

ClinVar

PDB

KEGG

GenBank

Explanation - ClinVar catalogs clinically relevant variants and their interpretations.

Correct answer is: ClinVar

Q.92 Which file format is used to store genomic feature annotations, such as those for eukaryotic genomes?

GFF3

FASTA

FASTQ

PDB

Explanation - GFF3 (General Feature Format) is used for detailed genomic annotations.

Correct answer is: GFF3

Q.93 What does the 'NCBI Entrez' system provide?

A search and retrieval system for integrated NCBI databases

A tool for aligning sequences

A graphical interface for 3‑D structures

A platform for data compression

Explanation - Entrez allows querying across NCBI databases such as Nucleotide (which includes GenBank), PubMed, and Protein.

Correct answer is: A search and retrieval system for integrated NCBI databases

Q.94 Which of the following is a type of 'structured data' in a biological database?

A text file with random data

An XML file containing organized data with tags

A binary image

A handwritten note

Explanation - Structured data follows a schema, enabling easy parsing and retrieval.

Correct answer is: An XML file containing organized data with tags

Q.95 Which of the following best describes the use of a 'hash table' in a database?

Storing large sequences in compressed form

Providing O(1) average time lookup by key

Storing relational tables only

Indexing for full‑text search

Explanation - Hash tables use a hash function to map keys to array indices, allowing fast access.

Correct answer is: Providing O(1) average time lookup by key
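
A small sketch of the mechanism, assuming separate chaining for collisions and a fixed number of buckets. The class name and the stored records are hypothetical; Python's built-in dict already provides this behavior in practice.

```python
class HashTable:
    """Toy hash table with separate chaining (illustration only)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The hash function maps a key to an array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is searched, which is what makes lookup fast on average.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("SEQ001", "kinase mRNA")
table.put("SEQ002", "16S rRNA gene")
print(table.get("SEQ002"))
```

Lookup stays fast on average as long as keys spread evenly over the buckets; many collisions in one bucket degrade it toward a linear scan.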

Q.96 Which of the following is NOT typically included in a 'GenBank feature table'?

Gene

CDS (Coding Sequence)

Protein

Transposable Element

Explanation - The feature table lists genomic features; proteins are represented indirectly via CDS entries.

Correct answer is: Protein

Q.97 What is the function of the 'BioPython' library?

Providing a platform for database management

Facilitating bioinformatics computations and file parsing in Python

Storing large genomic datasets

Visualizing protein structures

Explanation - BioPython supplies modules for sequence manipulation, alignment, and parsing of biological formats.

Correct answer is: Facilitating bioinformatics computations and file parsing in Python

Q.98 Which of the following best describes a 'data repository'?

A physical storage facility for lab equipment

A place where raw or processed biological data is stored and made available to the community

A software for sequence alignment

A type of database index

Explanation - Data repositories archive datasets for preservation and public access.

Correct answer is: A place where raw or processed biological data is stored and made available to the community

Q.99 Which database provides a catalog of microbial genomes?

NCBI RefSeq

PDB

KEGG

GenBank

Explanation - RefSeq provides curated, non‑redundant reference sequences, including a large collection of microbial genomes.

Correct answer is: NCBI RefSeq

Q.100 Which of the following is a typical format used to represent gene ontology annotations?

GFF3

GAF

FASTA

FASTQ

Explanation - GAF (Gene Ontology Annotation File) encodes annotations in a tab‑delimited format.

Correct answer is: GAF

Q.101 Which of the following databases focuses on the structural biology of macromolecules?

PDB

GenBank

KEGG

ClinVar

Explanation - The Protein Data Bank catalogs experimentally determined macromolecular structures.

Correct answer is: PDB

Q.102 What does the 'E‑value' in a BLAST output indicate?

The expected number of random alignments with equal or better score

The exact number of mismatches

The length of the alignment

The number of sequences in the database

Explanation - A low E‑value means the match is unlikely to have arisen by chance.

Correct answer is: The expected number of random alignments with equal or better score
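
As a rough illustration, the E‑value can be approximated from a bit score S' as E ≈ m × n × 2^(−S'), where m is the query length and n is the total database length. The sketch below assumes that simplified form and hypothetical lengths; BLAST itself uses effective lengths, so real values differ somewhat.

```python
def expected_hits(query_length, database_length, bit_score):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search space: a 100-residue query against 1,000 residues.
weak = expected_hits(100, 1000, 10)
strong = expected_hits(100, 1000, 40)
print(weak, strong)
```

A higher bit score lowers the E‑value exponentially, while a larger database raises it proportionally.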

Q.103 Which database contains curated information on protein domains and families?

Pfam

PDB

KEGG

GenBank

Explanation - Pfam catalogs protein families and domains with hidden Markov models.

Correct answer is: Pfam

Q.104 Which of the following best describes a 'public database' in bioinformatics?

A database that requires a paid subscription

A database that is freely accessible to the research community

A private database for personal use only

A database that only stores images

Explanation - Public databases provide open access to biological data for everyone.

Correct answer is: A database that is freely accessible to the research community

Q.105 Which of the following is an example of a 'secondary database'?

GenBank

RefSeq

BioMart

PDB

Explanation - BioMart aggregates and integrates data from multiple primary sources.

Correct answer is: BioMart

Q.106 Which of the following best describes the 'FASTA' format header line?

It starts with a ‘>’ symbol followed by an identifier and optional description

It starts with a ‘#’ symbol and contains the file size

It ends with a ‘*’ symbol

It contains the full sequence directly

Explanation - The header line begins with ‘>’ and provides metadata about the sequence.

Correct answer is: It starts with a ‘>’ symbol followed by an identifier and optional description
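
A short sketch of how such a header line could be split into its identifier and optional description. The function name and the example headers are hypothetical.

```python
def parse_fasta_header(line):
    """Split a FASTA header into (identifier, description)."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    parts = line[1:].strip().split(maxsplit=1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(parse_fasta_header(">seq1 hypothetical protein"))
print(parse_fasta_header(">seq2"))
```

Everything up to the first whitespace is treated as the identifier; the rest of the line is free-text description.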

Q.107 Which of the following database systems is best suited for storing and querying large genomic datasets with flexible schemas?

MySQL

MongoDB

SQLite

Oracle

Explanation - MongoDB is a NoSQL document store that handles large, flexible datasets efficiently.

Correct answer is: MongoDB

Q.108 Which of the following is a key feature of the 'Sequence Read Archive (SRA)'?

Only stores assembled genomes

Holds raw sequencing reads from high‑throughput platforms

Provides protein tertiary structures

Stores only human DNA sequences

Explanation - SRA archives the original reads before assembly or analysis.

Correct answer is: Holds raw sequencing reads from high‑throughput platforms

Q.109 In a relational database, what does 'normalization' primarily aim to achieve?

Increase query speed at the cost of redundancy

Reduce data redundancy and avoid anomalies

Create backup copies of tables

Provide graphical user interface

Explanation - Normalization organizes tables to minimize duplication and maintain consistency.

Correct answer is: Reduce data redundancy and avoid anomalies
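
A minimal sketch of a normalized layout, using Python's sqlite3 module. The organism name is stored once in its own table and referenced by key, so a correction is made in a single place. Table names, columns, and records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE organisms (
        organism_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organisms(organism_id),
        length INTEGER
    );
    INSERT INTO organisms VALUES (1, 'Homo sapiens'), (2, 'Mus musculus');
    INSERT INTO sequences VALUES
        ('SEQ001', 1, 1200), ('SEQ002', 2, 850), ('SEQ003', 1, 430);
    """
)

# One update fixes the name for every sequence that refers to it.
conn.execute("UPDATE organisms SET name = 'Homo sapiens (human)' WHERE organism_id = 1")

# A join reassembles the combined view when it is needed.
joined = conn.execute(
    "SELECT s.accession, o.name FROM sequences AS s "
    "JOIN organisms AS o ON o.organism_id = s.organism_id "
    "ORDER BY s.accession"
).fetchall()
print(joined)
conn.close()
```

Had the organism name been repeated in every sequence row, the same correction would need to touch each row, which is where update anomalies come from.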

Q.110 What does the acronym 'GTF' stand for?

Genetic Transfer Format

Gene Transfer Format

Genomic Text File

Gene Translation File

Explanation - GTF is a file format that records gene and transcript annotations.

Correct answer is: Gene Transfer Format

Q.111 Which database contains curated information on enzymes and their reactions?

BRENDA

KEGG

PDB

GenBank

Explanation - BRENDA is the comprehensive enzyme database, including reaction conditions.

Correct answer is: BRENDA

Q.112 Which of the following is a common challenge in biological database management?

Ensuring data quality and consistency across different data sources

Building a physical laboratory

Sequencing DNA in a single step

Printing large images

Explanation - Data heterogeneity often leads to integration issues and requires curation.

Correct answer is: Ensuring data quality and consistency across different data sources

Q.113 Which of the following best describes a 'sequence alignment'?

Merging two sequences into one

Determining the similarity between two or more sequences

Converting a DNA sequence to RNA

Counting the number of nucleotides

Explanation - Alignment arranges sequences so that similar residues line up, which helps identify conserved regions and infer evolutionary relationships.

Correct answer is: Determining the similarity between two or more sequences

Q.114 Which database provides curated data on genomic variation and its clinical significance?

ClinVar

PDB

KEGG

GenBank

Explanation - ClinVar collects clinical interpretations of genetic variants.

Correct answer is: ClinVar

Q.115 Which of the following database types stores unstructured data like raw sequencing reads?

Relational database

NoSQL document store

Graph database

XML database

Explanation - NoSQL stores can handle large unstructured data efficiently.

Correct answer is: NoSQL document store

Q.116 What is the main purpose of the 'BioMart' tool?

To provide an interface for querying biological datasets across multiple databases

To sequence DNA directly from samples

To visualize 3‑D protein structures

To perform statistical analyses on clinical trials

Explanation - BioMart allows flexible, web‑based queries over integrated data sources.

Correct answer is: To provide an interface for querying biological datasets across multiple databases

Q.117 Which of the following best describes a 'metadata field'?

A field containing the primary sequence data

A field that holds descriptive information about the data (e.g., source, method)

A field that stores images only

A field that indicates the size of the file

Explanation - Metadata provides context for the primary data, facilitating reuse.

Correct answer is: A field that holds descriptive information about the data (e.g., source, method)

Q.118 Which database is primarily used for storing 3‑D structural models of proteins?

PDB

GenBank

KEGG

ClinVar

Explanation - The Protein Data Bank contains experimentally determined 3‑D structures.

Correct answer is: PDB

Q.119 What is a 'substitution matrix' used for in sequence alignment?

To store the sequence data itself

To assign scores to matches and mismatches between residues

To keep track of file formats

To control the database connection

Explanation - Substitution matrices (e.g., BLOSUM) score each pair of aligned residues; gaps are scored separately through gap penalties.

Correct answer is: To assign scores to matches and mismatches between residues

Q.120 Which of the following best describes a 'relational database'?

A database that stores data in flat files

A database that uses tables and defines relationships among them

A database that only stores images

A database that does not support querying

Explanation - Relational databases structure data in tables linked by keys, enabling complex queries.

Correct answer is: A database that uses tables and defines relationships among them

Q.121 Which file format is commonly used to store raw sequencing read data along with per‑base quality scores?

FASTA

FASTQ

GenBank

PDB

Explanation - FASTQ files contain both the nucleotide sequence and quality information.

Correct answer is: FASTQ

Q.122 Which of the following is NOT a common database for storing gene expression data?

GEO

ArrayExpress

PDB

KEGG

Explanation - PDB stores protein structures, not expression data.

Correct answer is: PDB

Q.123 What does the acronym 'NCBI' stand for?

National Center for Biotechnology Information

National Council for Bioinformatics Innovation

New Catalogue of Biological Inferences

None of the above

Explanation - NCBI manages major biological databases and resources.

Correct answer is: National Center for Biotechnology Information

Q.124 Which database contains curated information on miRNA sequences?

miRBase

PDB

KEGG

GenBank

Explanation - miRBase is dedicated to microRNA sequences and annotations.

Correct answer is: miRBase

Q.125 Which of the following best describes the purpose of the 'Sequence Ontology' (SO)?

To provide a standard set of terms for describing genomic sequence features

To store raw sequencing reads

To predict protein folding

To manage laboratory equipment

Explanation - SO defines terms such as exon, intron, and variant types for consistent annotation.

Correct answer is: To provide a standard set of terms for describing genomic sequence features

Q.126 What is the main benefit of using a 'hash index' in a database?

It speeds up lookup operations

It compresses data

It creates redundant copies

It enforces foreign key constraints

Explanation - Hash indexes provide constant‑time average access to records.

Correct answer is: It speeds up lookup operations

Q.127 Which of the following best describes the 'FASTA' file header?

A line that starts with ‘>’ and contains an identifier

A line that starts with ‘#’ and contains metadata

A line that ends with ‘$’ and contains the sequence

A line that starts with ‘@’ and contains quality scores

Explanation - FASTA headers begin with ‘>’ and provide sequence identifiers and optional descriptions.

Correct answer is: A line that starts with ‘>’ and contains an identifier

Q.128 Which of the following is a key feature of the 'Ensembl' database?

It only stores bacterial genomes

It provides high‑quality, annotated eukaryotic genomes

It offers only protein structural data

It focuses exclusively on viral genomes

Explanation - Ensembl hosts curated genomes for many eukaryotes, including humans.

Correct answer is: It provides high‑quality, annotated eukaryotic genomes

Q.129 Which of the following file formats is used to store annotated genomic features?

GFF3

FASTA

FASTQ

PDB

Explanation - GFF3 (General Feature Format) describes gene locations, exons, and other features.

Correct answer is: GFF3
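
A brief sketch of reading one GFF3 feature line. It relies on the nine tab‑separated columns of the format (seqid, source, type, start, end, score, strand, phase, attributes); the function name and the example line are hypothetical, and escaping rules are ignored for brevity.

```python
GFF3_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_gff3_line(line):
    """Parse a single GFF3 feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # coordinates are 1-based, inclusive
    record["end"] = int(record["end"])
    # Attributes are semicolon-separated tag=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1) for pair in record["attributes"].split(";") if pair
    )
    return record


# Hypothetical feature line.
line = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(line)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```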

Q.130 What is the main purpose of a 'bioinformatics pipeline'?

To sequence DNA in a single step

To automate a series of computational analyses on biological data

To store raw data only

To provide a graphical interface for manual data entry

Explanation - Pipelines chain tools like alignment, assembly, and annotation into reproducible workflows.

Correct answer is: To automate a series of computational analyses on biological data

Q.131 Which of the following is a typical use of the 'BLAST' tool?

To assemble genomes from short reads

To compare a query sequence to a database and find similar sequences

To store raw sequencing data

To visualize 3‑D protein structures

Explanation - BLAST performs rapid similarity searches between a query and database sequences.

Correct answer is: To compare a query sequence to a database and find similar sequences

Q.132 Which database contains information on metabolic pathways across multiple species?

KEGG

PDB

GenBank

ClinVar

Explanation - KEGG maps genes and enzymes to metabolic and signaling pathways.

Correct answer is: KEGG

Q.133 What is a 'metadata schema' used for?

To define how data is stored physically

To describe the structure and meaning of metadata fields

To compress data files

To generate random sequences

Explanation - A schema specifies data types, relationships, and constraints for metadata.

Correct answer is: To describe the structure and meaning of metadata fields

Q.134 Which of the following best describes a 'data curation' process?

Creating new data from scratch

Cleaning, validating, and annotating existing data for reliability

Deleting outdated data

Transmitting data over a network

Explanation - Curators ensure datasets are accurate, complete, and consistently annotated.

Correct answer is: Cleaning, validating, and annotating existing data for reliability

Q.135 What does the 'GenBank' flat file feature table contain?

Only the sequence itself

Metadata about the sequence and annotations such as genes, CDS, and regulatory elements

3‑D structure coordinates

Only the accession number

Explanation - Feature tables detail the location and function of genomic elements.

Correct answer is: Metadata about the sequence and annotations such as genes, CDS, and regulatory elements

Q.136 Which of the following is a key advantage of using a graph database for biological data?

Efficient representation of complex relationships between entities

Only stores tabular data

Requires rigid schemas

Limited to small datasets

Explanation - Graph databases model entities as nodes and relationships as edges, ideal for interaction networks.

Correct answer is: Efficient representation of complex relationships between entities
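
The node-and-edge model can be sketched with a plain adjacency structure. This is not a graph database, only an illustration of why relationship queries such as "everything within two interactions of P1" are natural in that model. The protein names and interactions are hypothetical.

```python
from collections import defaultdict

# Hypothetical undirected interactions between proteins.
edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def neighbors_within(graph, start, max_hops):
    """Return all nodes reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop, keeping only nodes not visited before.
        frontier = {n for node in frontier for n in graph.get(node, set())} - seen
        seen |= frontier
    return seen - {start}


print(sorted(neighbors_within(graph, "P1", 2)))
```

In a relational layout the same question would need one self-join per hop, which is the main contrast with the graph model.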

Q.137 Which of the following file formats is used for storing annotated gene structures and transcript information?

GFF3

FASTA

GenBank

PDB

Explanation - GFF3 encodes features such as exons, introns, and transcripts with coordinates.

Correct answer is: GFF3

Q.138 Which of the following best describes the 'PDB' file header record 'HEADER'?

Provides a short description of the macromolecule

Stores the sequence data directly

Indicates the file type only

Contains the raw sequencing reads

Explanation - The HEADER record holds the molecule's classification, the deposition date, and the entry's ID code.

Correct answer is: Provides a short description of the macromolecule

Q.139 Which database provides curated information on protein‑protein interactions?

BioGRID

KEGG

GenBank

PDB

Explanation - BioGRID catalogs experimentally verified protein‑protein interactions across species.

Correct answer is: BioGRID

Q.140 What is the primary purpose of the 'Sequence Read Archive (SRA)'?

To store raw sequencing reads and metadata

To provide protein tertiary structures

To archive published research articles

To host databases of metabolic pathways

Explanation - SRA preserves raw reads from high‑throughput sequencing experiments.

Correct answer is: To store raw sequencing reads and metadata

Q.141 Which of the following best describes the 'GAF' file format?

An image format for protein structures

A tab‑delimited format for gene ontology annotations

A compressed binary file format

A text file for raw sequences

Explanation - GAF contains GO annotations in a structured, machine‑readable format.

Correct answer is: A tab‑delimited format for gene ontology annotations
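
A small sketch of reading one GAF annotation line. It assumes the usual column order of the format, in which column 1 is the source database, column 2 the object ID, column 3 the symbol, column 5 the GO ID, column 7 the evidence code, and column 9 the aspect. The record below is hypothetical, as is the function name.

```python
def parse_gaf_line(line):
    """Extract a few commonly used columns from a GAF annotation line."""
    if line.startswith("!"):
        return None  # lines starting with '!' are comments or header lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 15:
        raise ValueError("expected at least 15 tab-separated columns")
    return {
        "db": fields[0],
        "object_id": fields[1],
        "symbol": fields[2],
        "go_id": fields[4],
        "evidence": fields[6],
        "aspect": fields[8],  # F, P, or C
    }


# Hypothetical 17-column record built field by field for readability.
example = "\t".join([
    "ExampleDB", "EX0001", "demoA", "", "GO:0005524", "ExampleDB:ref1",
    "IEA", "", "F", "demo protein", "", "protein", "taxon:9606",
    "20240101", "ExampleDB", "", "",
])
annotation = parse_gaf_line(example)
print(annotation["symbol"], annotation["go_id"], annotation["evidence"])
```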

Q.142 Which of the following best describes a 'primary database' in bioinformatics?

A database that stores raw experimental data directly from instruments

A database that aggregates curated data from multiple sources

A database that only stores images

A private database for personal use

Explanation - Primary databases collect the original data, while secondary databases provide curated views.

Correct answer is: A database that stores raw experimental data directly from instruments

Q.143 Which of the following is a commonly used format for representing 3‑D protein structures?

PDB

FASTA

FASTQ

GenBank

Explanation - PDB files store atomic coordinates and related information for macromolecules.

Correct answer is: PDB

Q.144 What does the acronym 'SRA' stand for in bioinformatics?

Sequence Read Archive

Standard Read Application

Sequence Repository Access

Statistical Reference Analysis

Explanation - The SRA holds raw sequencing reads from high‑throughput platforms.

Correct answer is: Sequence Read Archive

Q.145 Which of the following is NOT a typical database entry type for GenBank?

gene

CDS

protein

chromosome

Explanation - GenBank records describe genomic DNA or RNA, not individual proteins.

Correct answer is: protein

Q.146 Which of the following is a primary benefit of using a relational database for a biological database?

It allows for flexible schema changes on the fly

It supports complex joins across multiple tables for integrated queries

It can only store text data

It is slower than NoSQL for large datasets

Explanation - Relational databases excel at relational data and complex queries.

Correct answer is: It supports complex joins across multiple tables for integrated queries

Q.147 Which of the following best describes 'data provenance' in bioinformatics?

The storage location of raw data files

The record of how data was generated, processed, and curated

The size of the dataset

The format of the data file

Explanation - Provenance tracks the history of a dataset to ensure reproducibility.

Correct answer is: The record of how data was generated, processed, and curated

Q.148 Which of the following databases stores curated protein sequences and annotations?

UniProtKB

PDB

KEGG

GenBank

Explanation - UniProtKB is the main repository for protein sequences, including annotations.

Correct answer is: UniProtKB

Q.149 Which of the following best describes a 'FASTQ quality score' character?

It represents a nucleotide base

It encodes the confidence of each base call on a logarithmic scale

It indicates the sequence length

It is a binary flag

Explanation - FASTQ quality scores map to Phred scores indicating the probability of error.

Correct answer is: It encodes the confidence of each base call on a logarithmic scale
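
The mapping can be sketched in a few lines, assuming the Phred+33 (Sanger) encoding, where the score is the character's ASCII code minus 33 and the error probability is 10^(−Q/10). The function names and the quality string are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(phred_score):
    """Convert a Phred score Q into the base-call error probability 10**(-Q/10)."""
    return 10 ** (-phred_score / 10)


scores = phred_scores("I!5")
print(scores)
print([error_probability(q) for q in scores])
```

Each step of 10 in the score corresponds to a tenfold drop in the estimated error probability, which is what "logarithmic scale" means here.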

Q.150 Which of the following databases provides a comprehensive view of human metabolic pathways?

KEGG

PDB

GenBank

ClinVar

Explanation - KEGG maps genes and enzymes to metabolic pathways, including human pathways.

Correct answer is: KEGG

Q.151 Which of the following best describes the 'BioCyc' database collection?

A set of databases focused on microbial genomes only

A collection of curated metabolic pathway databases for multiple organisms

A database of protein tertiary structures

A database of clinical trials

Explanation - BioCyc contains organism‑specific pathway databases with detailed annotations.

Correct answer is: A collection of curated metabolic pathway databases for multiple organisms

Q.152 What is the main use of the 'SAMtools' software suite?

To align raw sequencing reads to a reference genome

To manipulate SAM/BAM files (sorting, indexing, filtering)

To visualize protein structures

To store raw sequencing reads in a database

Explanation - SAMtools provides utilities for working with alignment files in SAM/BAM format.

Correct answer is: To manipulate SAM/BAM files (sorting, indexing, filtering)
