Algorithms in Bioinformatics: MCQ Practice Set

Q.1 What is the time complexity of the naive substring search algorithm?

O(n)
O(n log n)
O(n^2)
O(n^3)
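Explanation - The naive algorithm tries every starting position in a text of length n and compares up to m pattern characters at each, giving O(n·m) comparisons in the worst case, which is quadratic when the pattern length is proportional to the text length.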
Correct answer is: O(n^2)
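
The nested loop behind that bound can be sketched as follows. This is a minimal illustration; the function name `naive_search` is an assumption, not something defined in this question set.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):      # every candidate start position
        j = 0
        # Compare up to m characters at this position.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                  # all m characters matched
            hits.append(i)
    return hits


print(naive_search("GATTACAGATTACA", "TTA"))  # [2, 9]
```

The outer loop runs about n times and the inner loop up to m times, which is where the O(n·m) worst case comes from.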

Q.2 Which of the following is NOT a typical file format for storing raw sequencing reads?

FASTA
FASTQ
BAM
SAM
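Explanation - BAM is a binary format for aligned reads. Raw reads are usually stored in FASTQ, or in FASTA when quality scores are dropped. SAM, the text counterpart of BAM, is likewise an alignment format rather than a raw-read format.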
Correct answer is: BAM

Q.3 The Needleman-Wunsch algorithm is used for which type of sequence alignment?

Local alignment
Global alignment
Protein structure alignment
Multiple sequence alignment
Explanation - Needleman-Wunsch performs optimal global alignment between two sequences.
Correct answer is: Global alignment
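
A minimal sketch of the Needleman-Wunsch recurrence, computing only the optimal global score. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]                    # bottom-right cell: both sequences fully aligned


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
```

Because the score is read from the bottom-right cell, both sequences are aligned end to end, which is what makes the alignment global.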

Q.4 Which scoring matrix is commonly used for aligning protein sequences?

PAM250
BLOSUM62
Identity
Gap penalty
Explanation - BLOSUM62 is a widely used substitution matrix for protein sequence alignment.
Correct answer is: BLOSUM62

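Q.5 In a FASTA file, what does the line that starts with '>' contain?

Compressed sequence data
Per-base quality scores
The sequence identifier and an optional description
Alignment coordinates
Explanation - Each FASTA record begins with a header (definition) line that starts with '>' and carries the sequence identifier and an optional free-text description; the sequence itself follows on the next lines.
Correct answer is: The sequence identifier and an optional description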

Q.6 In Hidden Markov Models (HMM) for gene prediction, what does the 'state' represent?

A specific nucleotide
A type of gene feature (e.g., exon, intron)
The quality of the sequencing data
A particular DNA sequence motif
Explanation - HMM states model functional genomic features such as exons and introns.
Correct answer is: A type of gene feature (e.g., exon, intron)

Q.7 Which of the following best describes a suffix tree?

A binary search tree for DNA bases
A data structure that stores all suffixes of a string for efficient substring queries
A tree used for phylogenetic analysis
A hierarchical clustering tool
Explanation - Suffix trees allow fast pattern matching and are useful in genomic sequence analysis.
Correct answer is: A data structure that stores all suffixes of a string for efficient substring queries

Q.8 What is the purpose of a 'gap penalty' in sequence alignment?

To reward matches
To penalize insertions/deletions
To normalize scores
To select the best alignment algorithm
Explanation - Gap penalties discourage excessive gaps in alignments, balancing matches and gaps.
Correct answer is: To penalize insertions/deletions

Q.9 In a microarray experiment, what does normalization aim to achieve?

Increase signal intensity
Remove technical variations between arrays
Add background noise
Simplify the data layout
Explanation - Normalization corrects for systematic biases, allowing meaningful comparisons.
Correct answer is: Remove technical variations between arrays

Q.10 Which algorithm is used to reconstruct a genome from short sequencing reads?

Dynamic programming
Eulerian path (de Bruijn graph)
Smith-Waterman
Needleman-Wunsch
Explanation - Short-read assemblers commonly break reads into k-mers, build a de Bruijn graph from them, and recover contigs by finding an Eulerian path through the graph.
Correct answer is: Eulerian path (de Bruijn graph)

Q.11 What does the acronym 'BLAST' stand for?

Basic Local Alignment Search Tool
Biological Language Analysis System Tool
Binary Linear Array Search Tool
Base Level Alignment Sequence Tool
Explanation - BLAST is a popular algorithm for quick sequence similarity searching.
Correct answer is: Basic Local Alignment Search Tool

Q.12 Which of the following is a common application of Principal Component Analysis (PCA) in bioinformatics?

Phylogenetic tree construction
Gene expression data dimensionality reduction
Protein folding simulation
DNA sequencing error correction
Explanation - PCA reduces dimensionality of high‑dimensional expression datasets.
Correct answer is: Gene expression data dimensionality reduction

Q.13 In the context of Next‑Generation Sequencing (NGS), what does a 'paired‑end read' refer to?

Two reads from the same DNA fragment sequenced from both ends
Two reads from two different fragments
One read sequenced twice
A read that includes a pair of identical sequences
Explanation - Paired‑end sequencing generates two reads per fragment, improving alignment accuracy.
Correct answer is: Two reads from the same DNA fragment sequenced from both ends

Q.14 Which of these algorithms is NOT used for clustering genes based on expression patterns?

k‑means
Hierarchical clustering
Smith‑Waterman
DBSCAN
Explanation - Smith‑Waterman is a local alignment algorithm, not a clustering method.
Correct answer is: Smith‑Waterman

Q.15 What is a key advantage of using a Hidden Markov Model over a simple Markov Chain for sequence modeling?

It can model variable-length sequences
It requires fewer parameters
It always has linear time complexity
It does not need training data
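Explanation - An HMM adds hidden states that emit the observed symbols, so it can capture unobserved structure; profile HMMs, for example, use insert and delete states to accommodate sequences of different lengths.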
Correct answer is: It can model variable-length sequences

Q.16 In a BLAST search, what does an 'E‑value' of 1e-10 indicate?

A highly significant match
A random match
A low‑quality alignment
The sequence length in base pairs
Explanation - Low E‑values mean the match is unlikely by chance, indicating significance.
Correct answer is: A highly significant match

Q.17 Which data structure is most efficient for storing and querying k‑mer frequencies in large genomes?

Linked list
Binary search tree
Hash table
Stack
Explanation - Hash tables provide constant‑time access to k‑mer counts, crucial for large datasets.
Correct answer is: Hash table
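
As a small illustration of hash-based counting, Python's `collections.Counter` (a dictionary subclass, so backed by a hash table) can tally k-mers directly. The helper name `count_kmers` is an assumption for this sketch.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    # Each lookup/update in the Counter takes expected constant time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ATATAT", 2)
print(counts["AT"], counts["TA"])  # 3 2
```

For whole genomes, dedicated k-mer counters use more compact structures, but the access pattern is the same.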

Q.18 In phylogenetics, what is the purpose of a 'bootstrapping' analysis?

To generate synthetic sequences
To evaluate the support for branches in a tree
To calculate the evolutionary rate
To align sequences
Explanation - Bootstrapping resamples data to assess the robustness of phylogenetic tree branches.
Correct answer is: To evaluate the support for branches in a tree

Q.19 Which of the following is a feature of the FastQC software tool?

Aligns sequencing reads to a reference genome
Detects structural variants
Provides quality metrics for raw sequencing data
Assembles genomes from reads
Explanation - FastQC generates reports on read quality, GC content, etc., for raw data.
Correct answer is: Provides quality metrics for raw sequencing data

Q.20 What does the 'U' in UTR stand for in genomic annotation?

Untranslated
Upstream
Ubiquitous
Unknown
Explanation - UTR means Untranslated Region, which is not translated into protein.
Correct answer is: Untranslated

Q.21 Which algorithm would you use to find the longest common subsequence between two strings?

Dijkstra's algorithm
Levenshtein distance
Dynamic programming with a 2‑D table
QuickSort
Explanation - The LCS problem is solved by a dynamic programming matrix.
Correct answer is: Dynamic programming with a 2‑D table
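
The 2-D table can be sketched as below. This version returns only the LCS length; the function name is illustrative.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1   # extend a common subsequence
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 (e.g. "GTAB")
```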

Q.22 What is the primary goal of a 'de‑novo' genome assembly?

To assemble a genome using a reference sequence
To predict gene functions
To assemble a genome without a reference
To annotate the genome
Explanation - De‑novo assembly reconstructs genomes solely from reads.
Correct answer is: To assemble a genome without a reference

Q.23 In the context of sequencing, what does 'coverage' refer to?

The depth of sequencing reads over the genome
The number of different sequencing instruments used
The error rate in reads
The length of each read
Explanation - Coverage indicates how many times each base is sequenced on average.
Correct answer is: The depth of sequencing reads over the genome

Q.24 Which of these tools is commonly used for protein structure prediction based on homology?

BLAST
MODELLER
Bowtie
SAMtools
Explanation - MODELLER builds 3D protein models using homology modeling.
Correct answer is: MODELLER

Q.25 What is the main advantage of using a 'suffix array' over a 'suffix tree'?

Lower time complexity
Smaller memory footprint
Faster construction time
Supports dynamic updates
Explanation - Suffix arrays are more space‑efficient while retaining many search capabilities.
Correct answer is: Smaller memory footprint
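
To make the space argument concrete, a suffix array is just a list of n integers, and pattern lookup is a binary search over it. The sketch below builds the array naively by sorting suffixes (far slower than production algorithms, and chosen here only for clarity); both function names are assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All positions where pattern occurs, via two binary searches on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                                   # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                                   # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                    # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```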

Q.26 Which type of mutation results in a codon change that still codes for the same amino acid?

Synonymous
Non‑synonymous
Frameshift
Nonsense
Explanation - Synonymous mutations alter codons without changing the encoded amino acid.
Correct answer is: Synonymous

Q.27 In a Hidden Markov Model used for gene prediction, which algorithm finds the most probable sequence of states?

Viterbi
Forward
Baum-Welch
Gradient Descent
Explanation - The Viterbi algorithm computes the most likely state path.
Correct answer is: Viterbi
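
A minimal Viterbi sketch on a toy two-state model. The states ("E" for a GC-rich exon-like state, "I" for an AT-rich intron-like state) and every probability below are made-up assumptions for illustration, not parameters of any real gene finder.

```python
import math

# Hypothetical toy HMM: all values are illustrative assumptions.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {"E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}


def viterbi(obs: str) -> str:
    """Most probable hidden-state path for obs, computed in log space."""
    if not obs:
        return ""
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []                                  # one back-pointer table per later position
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            # Best predecessor state for ending in s at this position.
            p = max(STATES, key=lambda q: prev[q] + math.log(TRANS[q][s]))
            best[s] = prev[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][symbol])
            pointers[s] = p
        back.append(pointers)
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):            # trace back from the last position
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GGCCAATT"))  # EEEEIIII
```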

Q.28 Which of the following is NOT a standard step in RNA‑seq data processing?

Read trimming
Alignment to a reference genome
Protein structure modeling
Differential expression analysis
Explanation - RNA‑seq focuses on transcript quantification, not protein modeling.
Correct answer is: Protein structure modeling

Q.29 What is the purpose of a 'quality score' in FASTQ files?

To indicate the read length
To quantify the confidence of each base call
To specify the sequencing machine used
To encode the read’s mapping position
Explanation - Phred quality scores encode the estimated probability p that a base call is wrong as Q = -10·log10(p), so Q30 corresponds to a 1-in-1000 chance of error.
Correct answer is: To quantify the confidence of each base call
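
A short sketch of decoding a FASTQ quality string, assuming the common Phred+33 ASCII encoding (older files may use a different offset). The function names are illustrative.

```python
def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert an ASCII quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


print(decode_quality("I5!"))      # [40, 20, 0]
print(phred_to_error_prob(20))
```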

Q.30 Which algorithm is used to quickly align sequencing reads to a reference genome?

Smith‑Waterman
Burrows‑Wheeler Transform (BWT)
Levenshtein distance
QuickSort
Explanation - BWT‑based aligners (e.g., BWA, Bowtie) are fast and memory‑efficient.
Correct answer is: Burrows‑Wheeler Transform (BWT)
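
For intuition, the transform itself can be computed by sorting all rotations of the text and reading off the last column. This naive sketch is quadratic in memory and is only an illustration; real aligners build the index with far more efficient methods. The sentinel character `$` and the function name are assumptions.

```python
def bwt(text: str, end: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if end in text:
        raise ValueError("sentinel character must not occur in the text")
    s = text + end                       # sentinel sorts before A, C, G, T and letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)   # last column of the sorted matrix


print(bwt("banana"))  # annb$aa
```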

Q.31 In a gene expression microarray, what does a 'probe' represent?

A DNA sequence complementary to a target RNA
A protein of interest
An RNA‑binding protein
A fluorescent dye
Explanation - Probes hybridize to specific RNA transcripts, indicating expression levels.
Correct answer is: A DNA sequence complementary to a target RNA

Q.32 Which of the following is a key feature of the 'Smith‑Waterman' algorithm?

Global alignment
Local alignment
Multiple sequence alignment
Phylogenetic tree construction
Explanation - Smith‑Waterman finds optimal local alignments between sequence segments.
Correct answer is: Local alignment
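
A minimal sketch of the Smith-Waterman recurrence, returning only the best local score. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between any substrings of a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Clamping at 0 lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)       # the optimum may end at any cell
        prev = cur
    return best


print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))  # 6, from the shared "ACG"
```

The two differences from global alignment are the floor at zero and taking the maximum over all cells rather than the bottom-right corner.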

Q.33 In a phylogenetic tree, what does the length of a branch typically represent?

Sequence length
Number of species
Genetic distance
Time of divergence
Explanation - Branch lengths often reflect the amount of evolutionary change.
Correct answer is: Genetic distance

Q.34 Which of the following is NOT a typical use of the Bioconductor project?

Genomic data analysis in R
Statistical modeling of biological data
Protein structure simulation
Visualization of high‑throughput data
Explanation - Bioconductor focuses on analysis of high‑throughput sequencing and expression data.
Correct answer is: Protein structure simulation

Q.35 Which term describes a genomic region that is transcribed but not translated into protein?

Coding sequence
UTR
Non‑coding RNA
Exon
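Explanation - Non‑coding RNA genes are transcribed, but their transcripts are never translated. UTRs are also transcribed and untranslated, but they are parts of protein‑coding mRNAs rather than independent transcripts.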
Correct answer is: Non‑coding RNA

Q.36 Which algorithm is used to find the shortest path in a weighted graph?

Dijkstra's algorithm
Prim's algorithm
Kruskal's algorithm
Bellman–Ford algorithm
Explanation - Dijkstra’s finds shortest paths from a single source in non‑negative weighted graphs.
Correct answer is: Dijkstra's algorithm
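
A compact sketch of Dijkstra's algorithm using a binary heap. The graph representation (a dictionary mapping each node to its neighbours and non-negative edge weights) is an assumption made for this example.

```python
import heapq


def dijkstra(graph: dict, source) -> dict:
    """Shortest distances from source; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, already improved
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```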

Q.37 In sequence alignment, what does a 'gap opening penalty' represent?

The cost to start a new gap
The cost to extend an existing gap
The score for a match
The penalty for a mismatch
Explanation - Gap opening penalty discourages the initiation of new gaps.
Correct answer is: The cost to start a new gap

Q.38 Which of these metrics is used to evaluate the quality of a multiple sequence alignment?

Silhouette score
Sum of Pairs (SP) score
Jensen‑Shannon divergence
Entropy rate
Explanation - The SP score sums, over every column of the alignment, the substitution scores of all pairs of sequences, so it rewards columns whose residues are mutually consistent.
Correct answer is: Sum of Pairs (SP) score

Q.39 Which file format is typically used to store compressed alignments?

FASTA
FASTQ
BAM
SAM
Explanation - BAM is the binary, compressed version of the SAM alignment format.
Correct answer is: BAM

Q.40 In the context of RNA‑seq, what does 'FPKM' stand for?

Fragments Per Kilobase of transcript per Million mapped reads
Fragments per Kilo base per Mapped reads
Full Position Kmer Matching
Frequency per Kilo of Microarray
Explanation - FPKM normalizes read counts by transcript length and sequencing depth.
Correct answer is: Fragments Per Kilobase of transcript per Million mapped reads

Q.41 Which of the following is NOT a typical function of a genetic variant caller?

Identify SNPs from sequencing data
Call structural variants
Predict protein tertiary structure
Annotate variants
Explanation - Variant callers focus on identifying genomic differences, not structure prediction.
Correct answer is: Predict protein tertiary structure

Q.42 What is the role of the 'forward algorithm' in an HMM?

To find the most probable state sequence
To compute the probability of an observation sequence
To train the HMM parameters
To decode the best path
Explanation - The forward algorithm sums over all possible state paths to compute sequence likelihood.
Correct answer is: To compute the probability of an observation sequence
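
A minimal sketch of the forward recursion on a toy two-state model. The state names and all probabilities are made-up assumptions for illustration only.

```python
# Hypothetical toy HMM: all values are illustrative assumptions.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {"E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}


def forward(obs: str) -> float:
    """P(obs) under the model, summed over all hidden-state paths."""
    if not obs:
        return 1.0
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {s: EMIT[s][symbol] * sum(alpha[q] * TRANS[q][s] for q in STATES)
                 for s in STATES}
    return sum(alpha.values())


print(forward("G"))  # 0.25 = 0.5*0.4 + 0.5*0.1
```

The only change from Viterbi is that the maximum over predecessor states is replaced by a sum. For long sequences, a practical implementation would work in log space or rescale alpha to avoid underflow.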

Q.43 Which algorithm is used for constructing a minimum spanning tree?

Prim's algorithm
Dijkstra's algorithm
Bellman–Ford algorithm
Viterbi algorithm
Explanation - Prim's algorithm grows a minimum spanning tree of a weighted, connected graph by repeatedly adding the cheapest edge that reaches a new vertex.
Correct answer is: Prim's algorithm

Q.44 In a de Bruijn graph used for assembly, what does an edge typically represent?

A read
A k‑mer
An overlap between k‑mers
A contig
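Explanation - In the formulation where nodes are k‑mers, an edge connects two k‑mers that share a (k‑1)-mer overlap, enabling path traversal. In the equivalent formulation where nodes are (k‑1)-mers, each edge corresponds to a k‑mer.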
Correct answer is: An overlap between k‑mers
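
A small sketch of the node-centric construction, in which nodes are k-mers and a directed edge joins two k-mers whose (k-1)-character suffix and prefix coincide. The function name is an assumption, and real assemblers add error handling, reverse complements, and graph simplification that are omitted here.

```python
from collections import defaultdict


def de_bruijn_edges(reads: list[str], k: int) -> list[tuple[str, str]]:
    """Directed edges (u, v) between k-mers where u[1:] == v[:-1]."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    by_prefix = defaultdict(list)
    for km in kmers:
        by_prefix[km[:-1]].append(km)        # index k-mers by their (k-1)-mer prefix
    return sorted((u, v) for u in kmers for v in by_prefix.get(u[1:], []))


print(de_bruijn_edges(["ACGTA"], 3))  # [('ACG', 'CGT'), ('CGT', 'GTA')]
```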

Q.45 Which of the following is a common method for correcting sequencing errors in high‑throughput data?

PCR amplification
Error‑correcting codes (e.g., Hamming code)
Read trimming
Multiple sequence alignment
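Explanation - Error‑correcting codes such as Hamming codes are used mainly in sample barcode (index) design, where they allow a small number of base‑call errors in the barcode to be detected and corrected.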
Correct answer is: Error‑correcting codes (e.g., Hamming code)

Q.46 What does a 'false discovery rate (FDR)' control in statistical analyses?

The probability of a type I error per test
The proportion of false positives among all significant results
The probability of a type II error
The overall error rate in sequencing
Explanation - FDR limits the expected proportion of false positives when many tests are performed.
Correct answer is: The proportion of false positives among all significant results
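
One widely used procedure for controlling the FDR is the Benjamini-Hochberg adjustment (named here as an example; the question itself does not specify a method). A minimal sketch, with an illustrative function name:

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Tests whose adjusted p-value falls below a chosen threshold (for example 0.05) are called significant, with the expected fraction of false positives among them held at or below that threshold.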

Q.47 In the context of DNA microarrays, what is a 'spot'?

A region of the chip containing a specific probe
A fluorescent dye
A data point in the analysis
A type of sequencing error
Explanation - Each spot holds identical copies of a probe that hybridizes to target DNA.
Correct answer is: A region of the chip containing a specific probe

Q.48 Which of the following best describes the 'Read‑1' in paired‑end sequencing?

The first read from one DNA fragment
The second read from one DNA fragment
A read from the reverse strand
A duplicate of Read‑2
Explanation - Read‑1 is the first of two reads sequenced from opposite ends of a fragment.
Correct answer is: The first read from one DNA fragment

Q.49 Which of the following is a key assumption of the standard model for phylogenetic tree inference?

All mutations are independent and identically distributed
Sequences are of equal length
Sequences are perfectly aligned
All branch lengths are equal
Explanation - Phylogenetic models often assume i.i.d. evolution across sites.
Correct answer is: All mutations are independent and identically distributed

Q.50 What does the acronym 'SAM' stand for in genomics?

Sequence Alignment/Map
Simple Alignment Method
Sequence Analysis Model
Standard Alignment Matrix
Explanation - SAM (Sequence Alignment/Map) is a tab-delimited text format for storing read alignments against a reference.
Correct answer is: Sequence Alignment/Map

Q.51 Which type of sequencing library preparation results in reads from both ends of the original DNA fragment?

Paired‑end library
Mate‑pair library
Single‑cell library
Targeted sequencing library
Explanation - Paired‑end libraries are designed to sequence both ends of fragments.
Correct answer is: Paired‑end library

Q.52 In a k‑mer counting task, which algorithmic approach reduces memory usage by hashing?

Suffix tree traversal
Bloom filter
Hash table
Depth‑first search
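Explanation - A Bloom filter hashes each k‑mer into a compact bit array, so k‑mers seen only once (often sequencing errors) never need an entry in the full hash table of counts. A plain hash table counts k‑mers efficiently but does not by itself reduce memory.
Correct answer is: Bloom filter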

Q.53 Which algorithm is commonly used for aligning sequencing reads to a reference genome with high speed?

Smith‑Waterman
Bowtie
Needleman‑Wunsch
Levenshtein
Explanation - Bowtie uses the Burrows‑Wheeler transform for fast read alignment.
Correct answer is: Bowtie

Q.54 Which of the following best describes the 'k‑means' algorithm?

A supervised learning method
A method for aligning sequences
An unsupervised clustering algorithm
A phylogenetic tree reconstruction method
Explanation - k‑means partitions data into k clusters based on feature similarity.
Correct answer is: An unsupervised clustering algorithm

Q.55 Which data structure is used by the popular 'BWA' aligner?

Suffix array
Suffix tree
Trie
Binary heap
Explanation - BWA uses a compressed suffix array (FM‑index) for efficient alignment.
Correct answer is: Suffix array

Q.56 Which of the following metrics is used to evaluate the significance of a BLAST hit?

GC content
E‑value
Coverage
Identity
Explanation - The E‑value is the number of hits with an equal or better score expected by chance in a database of that size; the smaller it is, the more significant the hit.
Correct answer is: E‑value

Q.57 What does a 'phylogenetic tree' illustrate?

The sequence alignment of DNA fragments
The evolutionary relationships between species
The gene expression levels of a single organism
The structure of a protein complex
Explanation - Phylogenetic trees depict shared ancestry and divergence.
Correct answer is: The evolutionary relationships between species

Q.58 Which of the following best describes the 'FASTA' format?

A binary file for alignments
A compressed text format for raw reads
A simple text format for nucleotide or protein sequences
A database schema for genomic data
Explanation - FASTA stores sequences with a header line starting with '>'.
Correct answer is: A simple text format for nucleotide or protein sequences

Q.59 In the context of microarray data, what does 'log₂ fold change' measure?

The ratio of expression levels between two conditions
The absolute difference in expression levels
The background fluorescence intensity
The sequencing coverage
Explanation - Log₂ fold change quantifies up‑ or down‑regulation between samples.
Correct answer is: The ratio of expression levels between two conditions

Q.60 Which algorithm is used for detecting structural variants in genomic data?

SAMtools mpileup
GATK HaplotypeCaller
BreakDancer
Bowtie
Explanation - BreakDancer detects structural variants such as large deletions, insertions, inversions, and translocations from anomalously mapped read pairs.
Correct answer is: BreakDancer

Q.61 Which of the following is NOT a component of a typical sequencing workflow?

Library preparation
Read alignment
Protein folding prediction
Variant calling
Explanation - Protein folding prediction is unrelated to sequencing pipelines.
Correct answer is: Protein folding prediction

Q.62 What does the 'forward' algorithm compute in an HMM?

The probability of the most likely state path
The total probability of observing the sequence
The posterior probabilities of states
The error rate of the model
Explanation - The forward algorithm sums probabilities over all paths to get likelihood.
Correct answer is: The total probability of observing the sequence

Q.63 Which of the following is a common measure of gene expression derived from RNA‑seq?

RPKM
TPM
FPKM
All of the above
Explanation - RPKM, TPM, and FPKM are all normalization metrics for RNA‑seq data.
Correct answer is: All of the above
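
The difference between these metrics is easiest to see in code. The sketch below assumes the total mapped fragments equal the sum of the counts passed in, and that lengths are given in base pairs; the function names are illustrative.

```python
def fpkm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)                      # assumed library size
    # count / (length in kb) / (total in millions) = count * 1e9 / (total * length)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: normalise by length first, then by the sum."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r * 1e6 / total for r in rates]


counts, lengths = [100, 300], [1000, 3000]
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because TPM divides by the sum of length-normalised rates, TPM values in a sample always add up to one million, which is not true of FPKM or RPKM.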

Q.64 In the context of sequence assembly, what is a 'contig'?

A short read from sequencing
A contiguous stretch of assembled sequence
A type of sequencing error
A data compression algorithm
Explanation - Contigs are longer sequences formed by merging overlapping reads.
Correct answer is: A contiguous stretch of assembled sequence

Q.65 Which of the following is NOT a step in the basic RNA‑seq analysis pipeline?

Read quality control
Alignment to a reference genome
Protein tertiary structure modeling
Differential expression analysis
Explanation - RNA‑seq focuses on transcript quantification, not protein modeling.
Correct answer is: Protein tertiary structure modeling

Q.66 What does the 'E‑value' in BLAST represent?

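The number of equally good or better hits expected by chance
The alignment score
The number of mismatches
The length of the query sequence
Explanation - The E‑value is an expected count rather than a probability: it estimates how many hits with at least this score would arise by chance in a database of the same size.
Correct answer is: The number of equally good or better hits expected by chance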

Q.67 Which of the following tools is commonly used for de‑novo assembly of short reads?

SPAdes
Bowtie
SAMtools
MAFFT
Explanation - SPAdes is a popular assembler for short‑read sequencing data.
Correct answer is: SPAdes

Q.68 In a Hidden Markov Model for gene prediction, what does a 'transition probability' describe?

The likelihood of observing a particular base
The chance of moving from one state to another
The quality score of a read
The alignment score
Explanation - Transition probabilities govern state changes in an HMM.
Correct answer is: The chance of moving from one state to another

Q.69 Which of the following best describes a 'k‑mer'?

A protein motif
A substring of length k
A genomic region of length k
A type of sequencing error
Explanation - k‑mers are all possible substrings of fixed length k in a sequence.
Correct answer is: A substring of length k

Q.70 What is the purpose of a 'quality trimming' step in RNA‑seq preprocessing?

To remove low‑quality bases from reads
To align reads to a reference genome
To predict gene functions
To assemble the genome
Explanation - Quality trimming eliminates unreliable base calls to improve downstream analysis.
Correct answer is: To remove low‑quality bases from reads

Q.71 Which of the following best describes a 'de Bruijn graph' used in genome assembly?

A graph where nodes are k‑mers and edges represent overlaps
A graph of all possible alignments
A representation of phylogenetic relationships
A data structure for storing suffixes
Explanation - de Bruijn graphs encode k‑mer overlaps to reconstruct sequences.
Correct answer is: A graph where nodes are k‑mers and edges represent overlaps

Q.72 Which of the following is a typical metric used to assess differential gene expression significance?

P‑value
Fold change
Both A and B
Alignment score
Explanation - Significance combines statistical p‑values with biological fold changes.
Correct answer is: Both A and B

Q.73 Which algorithm is often used to reconstruct phylogenetic trees from distance matrices?

NJ (Neighbor‑Joining)
Smith‑Waterman
Viterbi
Dijkstra
Explanation - Neighbor‑Joining constructs trees from pairwise distance data.
Correct answer is: NJ (Neighbor‑Joining)

Q.74 In the context of genomics, what does 'GC content' refer to?

The proportion of guanine and cytosine bases
The number of genes in a genome
The coverage depth
The error rate
Explanation - GC content is the percentage of G and C nucleotides in a sequence.
Correct answer is: The proportion of guanine and cytosine bases

Q.75 What is the main benefit of using a 'paired‑end' library over a 'single‑end' library?

Higher read length
Improved mapping accuracy
Lower cost
Faster sequencing
Explanation - Paired‑end reads provide positional information that aids alignment.
Correct answer is: Improved mapping accuracy

Q.76 Which of the following is a common error type in next‑generation sequencing?

Insertion
Deletion
Substitution
All of the above
Explanation - NGS platforms can produce insertions, deletions, and substitutions.
Correct answer is: All of the above

Q.77 What does 'Read depth' refer to in sequencing?

The average number of times a base is sequenced
The maximum read length
The number of reads in a library
The error rate
Explanation - Read depth (coverage) indicates redundancy of sequencing data.
Correct answer is: The average number of times a base is sequenced

Q.78 Which algorithm is used to compute the most probable alignment path in an HMM?

Forward
Viterbi
Baum‑Welch
HMM‑Tagger
Explanation - The Viterbi algorithm finds the highest‑probability state sequence.
Correct answer is: Viterbi

Q.79 Which of the following is NOT a common step in variant annotation?

Predicting functional impact
Assigning gene symbols
Visualizing alignments
Calling SNPs from raw reads
Explanation - Variant calling precedes annotation, which interprets known variants.
Correct answer is: Calling SNPs from raw reads

Q.80 In a FASTQ file, which line corresponds to the quality scores?

The line starting with '>'
The second line of each four‑line record
The third line of each four‑line record
The fourth line of each four‑line record
Explanation - The fourth line encodes ASCII‑encoded quality values.
Correct answer is: The fourth line of each four‑line record

Q.81 What is the purpose of a 'k‑mer spectrum plot' in genome assembly?

To estimate genome size
To visualize sequencing error rates
To determine optimal k‑mer length
All of the above
Explanation - k‑mer spectra help assess repeat structure, coverage, and errors.
Correct answer is: All of the above

Q.82 Which of the following is a key feature of the 'BWA‑MEM' algorithm?

It uses suffix arrays for alignment
It is designed for short reads only
It reports split alignments for structural variants
It performs local alignment only
Explanation - BWA‑MEM handles longer reads and outputs split alignments.
Correct answer is: It reports split alignments for structural variants

Q.83 What is the main difference between a 'reference genome' and a 'pangenome'?

A reference genome is a single consensus sequence; a pangenome includes multiple genomes
A pangenome is a smaller subset of a reference genome
A reference genome is always human; a pangenome can be any species
There is no difference
Explanation - Pangenomes capture genetic diversity across multiple individuals or strains.
Correct answer is: A reference genome is a single consensus sequence; a pangenome includes multiple genomes

Q.84 Which algorithm is typically used to detect SNPs in sequencing data?

Bowtie
SAMtools mpileup
BLAST
MAFFT
Explanation - mpileup aggregates read data to identify variants.
Correct answer is: SAMtools mpileup

Q.85 Which of the following best describes 'short‑read sequencing'?

Sequencing of long DNA fragments over 50 kb
Sequencing of DNA fragments typically < 300 bp
Sequencing of RNA molecules only
Sequencing of entire genomes in one read
Explanation - Short‑read sequencing platforms produce reads of a few hundred bases.
Correct answer is: Sequencing of DNA fragments typically < 300 bp

Q.86 In the context of DNA methylation analysis, what does the 'bisulfite treatment' do?

Converts unmethylated cytosines to uracil
Adds methyl groups to all cytosines
Cuts DNA at methylated sites
Fluorescently labels methylated bases
Explanation - Bisulfite converts C to U (read as T) if unmethylated, enabling detection.
Correct answer is: Converts unmethylated cytosines to uracil

Q.87 Which of the following is a commonly used clustering method for gene expression data?

Hierarchical clustering
K‑means clustering
Both A and B
None of the above
Explanation - Both hierarchical and k‑means clustering are standard approaches.
Correct answer is: Both A and B

Q.88 What does the 'CIGAR' string in a SAM/BAM file describe?

The read’s quality scores
The alignment operations and lengths
The read identifier
The reference sequence name
Explanation - CIGAR encodes matches, mismatches, insertions, deletions, etc.
Correct answer is: The alignment operations and lengths
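
A small sketch of parsing a CIGAR string into (length, operation) pairs and computing how many reference bases the alignment spans. The operation letters follow the SAM specification; the function names are assumptions for this example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '3S10M2I5M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered (ops M, D, N, =, X consume reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


print(parse_cigar("3S10M2I5M1D4M"))
print(reference_span("3S10M2I5M1D4M"))  # 20
```

Note that `M` means an aligned position that may be either a match or a mismatch; `=` and `X` distinguish the two when an aligner emits them.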

Q.89 Which of the following best describes 'mismatch penalty' in alignment algorithms?

The score given for a base mismatch
The penalty for starting a gap
The reward for matching bases
The cost of aligning to a reference
Explanation - Mismatch penalties penalize mismatched base pairs during alignment.
Correct answer is: The score given for a base mismatch

Q.90 Which of the following is NOT a typical output of a BLAST search?

Alignment score
E‑value
Coverage statistics
Protein tertiary structure
Explanation - BLAST reports alignment metrics, not 3D structures.
Correct answer is: Protein tertiary structure

Q.91 What is the purpose of a 'masking' step in genome assembly?

To remove repetitive sequences
To compress the assembly output
To annotate genes
To improve read quality
Explanation - Masking reduces assembly complexity by hiding repeats.
Correct answer is: To remove repetitive sequences

Q.92 Which of the following best describes 'next‑generation sequencing (NGS)'?

Sequencing based on Sanger methodology
High‑throughput, parallel sequencing technologies
Sequencing of proteins
A theoretical concept with no practical applications
Explanation - NGS refers to modern high‑throughput sequencing platforms.
Correct answer is: High‑throughput, parallel sequencing technologies

Q.93 Which of the following is a common method for visualizing phylogenetic trees?

Cladogram
Heat map
Scatter plot
Bar chart
Explanation - Cladograms are tree diagrams showing evolutionary relationships.
Correct answer is: Cladogram

Q.94 In the context of gene prediction, what does a 'gene model' typically include?

Exon positions, intron lengths, and coding sequence
Only the gene name
Protein tertiary structure
DNA methylation patterns
Explanation - Gene models predict the structure of a gene, including exons and introns.
Correct answer is: Exon positions, intron lengths, and coding sequence

Q.95 Which of the following best describes 'k‑mer hashing'?

Storing k‑mers in a hash table to count occurrences
Generating random k‑mers for simulation
Hashing the entire genome for compression
Mapping k‑mers to protein domains
Explanation - Hashing allows efficient counting of k‑mers in large datasets.
Correct answer is: Storing k‑mers in a hash table to count occurrences

Q.96 Which of the following is a typical metric used to evaluate alignment quality?

Identity percentage
GC content
Read length
Sequencing cost
Explanation - Alignment identity indicates the proportion of matching bases.
Correct answer is: Identity percentage

Q.97 In RNA‑seq, what does the term 'coverage uniformity' refer to?

Even distribution of reads across the genome
The consistency of sequencing costs
The ratio of coding to non‑coding regions
The error rate of sequencing
Explanation - Uniform coverage ensures reliable quantification of transcripts.
Correct answer is: Even distribution of reads across the genome

Q.98 Which of the following best describes 'de‑novo assembly'?

Assembly using a reference genome
Assembly without any reference information
Assembly of RNA transcripts only
Assembly of protein sequences only
Explanation - De‑novo assembly reconstructs a genome solely from sequencing reads.
Correct answer is: Assembly without any reference information

Q.99 Which of the following is a common tool for phylogenetic tree visualization?

FigTree
BLAST
SAMtools
MAFFT
Explanation - FigTree is widely used to view and edit phylogenetic trees.
Correct answer is: FigTree

Q.100 What is the main purpose of 'adapter trimming' in sequencing data preprocessing?

To remove sequencing adapters that were ligated during library preparation
To trim low‑quality bases at read ends
To align reads to a reference
To compress the data file
Explanation - Adapter trimming removes artificial sequences that interfere with alignment.
Correct answer is: To remove sequencing adapters that were ligated during library preparation

Q.101 Which algorithm is used to solve the shortest superstring problem in genome assembly?

Greedy algorithm
Smith‑Waterman
Viterbi
Dijkstra
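Explanation - The shortest superstring problem is NP-hard, so assemblers use a greedy heuristic that repeatedly merges the pair of reads with the largest overlap; the result is an approximate, not necessarily optimal, superstring.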
Correct answer is: Greedy algorithm
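Example - A minimal Python sketch of the greedy overlap-merge heuristic. It assumes error-free reads on a single strand; the function names and the toy reads are illustrative, and the quadratic pair search is for clarity only.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(reads):
    """Repeatedly merge the pair of strings with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


print(greedy_superstring(["ATGC", "GCTA", "TACG"]))  # ATGCTACG
```

The output always contains every input read, but greedy merging does not guarantee the shortest possible superstring.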

Q.102 Which of the following is an example of a 'gene ontology (GO)' term?

DNA repair
Signal transduction
All of the above
None of the above
Explanation - DNA repair and signal transduction are both GO biological process terms; GO classifies gene products by biological process, molecular function, and cellular component.
Correct answer is: All of the above

Q.103 What does the 'GC skew' measure in a genomic sequence?

The difference between G and C content
The ratio of G to C bases
The absolute amount of G and C bases
The variation of GC content across the genome
Explanation - GC skew is calculated as (G - C) / (G + C) to assess strand bias.
Correct answer is: The difference between G and C content
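Example - A short Python sketch of the formula (G - C) / (G + C), computed for a whole sequence and for non-overlapping windows. The function names and window size are illustrative assumptions.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C), or 0.0 when the sequence has no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(sequence, window):
    """GC skew for consecutive non-overlapping windows of a fixed size."""
    return [
        gc_skew(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, window)
    ]


print(gc_skew("GGGC"))                    # 0.5
print(windowed_gc_skew("GGCCGGGG", 4))    # [0.0, 1.0]
```

Plotting the windowed (or cumulative) skew along a bacterial chromosome is a common way to look for the strand bias mentioned above.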

Q.104 Which of the following best describes the 'E‑value' threshold of 1e-5 in a BLAST search?

A highly significant match
A moderately significant match
A random match
No match
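Explanation - The E‑value is the number of hits with an equal or better score expected by chance in a database of that size; 1e-5 is a commonly used cutoff that indicates a credible but not extremely strong match.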
Correct answer is: A moderately significant match

Q.105 Which of the following is a key component of a 'variant annotation pipeline'?

Variant calling
Functional impact prediction
Both A and B
Data compression
Explanation - Annotation pipelines take called variants and predict their biological effects.
Correct answer is: Both A and B

Q.106 In RNA‑seq, what does the term 'TPM' stand for?

Transcripts Per Million
Transcripts Per Microarray
Transcripts Per Metagene
Transcripts Per Match
Explanation - TPM normalizes read counts for transcript length and library size.
Correct answer is: Transcripts Per Million
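Example - A minimal Python sketch of the TPM calculation: divide counts by transcript length, then rescale so the values sum to one million. The function name and the toy counts and lengths are assumptions; real pipelines often use effective rather than annotated lengths.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts Per Million."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    # Step 1: length normalization (reads per kilobase of transcript).
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Step 2: library-size normalization (scale so values sum to 1e6).
    return [rate / total * 1e6 for rate in rates]


# Toy data: counts scale with length, so all three transcripts get equal TPM.
values = tpm([10, 20, 30], [1000, 2000, 3000])
print([round(v, 1) for v in values])
```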

Q.107 Which of the following describes a 'de Bruijn graph' edge?

A nucleotide
A k‑mer
An overlap between two k‑mers
A read fragment
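Explanation - In the node-centric convention used here, vertices are k‑mers and an edge joins two k‑mers that overlap by k‑1 bases. In the alternative edge-centric convention, vertices are (k‑1)-mers and each edge is a k‑mer.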
Correct answer is: An overlap between two k‑mers
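Example - A small Python sketch of a node-centric de Bruijn graph, in which each vertex is a k‑mer and an edge links two k‑mers whose suffix and prefix of length k‑1 agree. The function name and the toy read are illustrative assumptions.

```python
def de_bruijn_node_centric(reads, k):
    """Map each k-mer to the sorted list of k-mers it overlaps by k-1 bases."""
    if k < 2:
        raise ValueError("k must be at least 2")
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    # Index k-mers by their (k-1)-base prefix for fast successor lookup.
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    # An edge u -> v exists when the (k-1)-suffix of u equals the (k-1)-prefix of v.
    return {kmer: sorted(by_prefix.get(kmer[1:], [])) for kmer in sorted(kmers)}


graph = de_bruijn_node_centric(["ATGCA"], 3)
print(graph)  # {'ATG': ['TGC'], 'GCA': [], 'TGC': ['GCA']}
```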

Q.108 Which of the following is a common method for correcting sequencing errors in Illumina data?

Error‑correcting HMMs
Quasi‑Monte Carlo simulation
PCR amplification
Optical readout
Explanation - HMMs can be trained to distinguish true variants from errors.
Correct answer is: Error‑correcting HMMs

Q.109 What is the role of a 'phred quality score' in sequencing data?

Indicates the probability of a base call error
Measures the length of the read
Denotes the GC content
Shows the alignment score
Explanation - A Phred score Q encodes the estimated error probability P of a base call as Q = -10 log10(P), so Q30 corresponds to a 1 in 1000 chance that the base is wrong.
Correct answer is: Indicates the probability of a base call error
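Example - A short Python sketch of the Phred relationship Q = -10 log10(P) and of decoding a FASTQ quality string. It assumes the Phred+33 ASCII offset used by current Illumina output; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string, offset=33):
    """Convert FASTQ quality characters to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


print(decode_fastq_quality("I5!"))        # [40, 20, 0]
print(round(phred_to_error_prob(30), 6))  # 0.001
```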

Q.110 Which of the following best describes the 'BLASTN' program?

Protein‑protein alignment
DNA‑DNA local alignment
RNA‑RNA alignment
Protein‑DNA alignment
Explanation - BLASTN aligns nucleotide sequences locally.
Correct answer is: DNA‑DNA local alignment

Q.111 What is the typical length range of a 'read' in Illumina NovaSeq sequencing?

50–100 bp
150–300 bp
500–1000 bp
10–20 kb
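Explanation - Illumina NovaSeq runs are typically paired‑end with reads of about 150 bp each (up to 250 bp on some configurations), far shorter than the kilobase-scale reads of long-read platforms.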
Correct answer is: 150–300 bp

Q.112 Which of the following is NOT a feature of a 'cDNA library'?

Derived from mRNA
Contains full‑length transcripts
Includes non‑coding RNAs
Requires reverse transcription
Explanation - Traditional cDNA libraries capture only coding mRNA, not non‑coding RNA.
Correct answer is: Includes non‑coding RNAs

Q.113 In a de Bruijn graph, what is the significance of 'bubbles'?

Represent sequencing errors
Indicate alternative paths due to polymorphisms
Mark start and end of reads
Signal GC‑rich regions
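Explanation - Bubbles arise when two k‑mer paths diverge and then rejoin. They often reflect SNPs or small indels between haplotypes, although sequencing errors can produce similar, usually low-coverage, bubbles.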
Correct answer is: Indicate alternative paths due to polymorphisms

Q.114 Which of the following is a standard method for normalizing microarray data?

Quantile normalization
Standard deviation scaling
Z‑score transformation
All of the above
Explanation - Quantile normalization ensures that the distribution of probe intensities is identical across arrays.
Correct answer is: Quantile normalization
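Example - A minimal pure-Python sketch of quantile normalization for a small matrix with probes as rows and arrays as columns. It assumes no missing values and breaks ties by position rather than averaging them, which established packages handle more carefully. The function name is hypothetical.

```python
def quantile_normalize(matrix):
    """Give every column (array) the same distribution of values."""
    if not matrix:
        return []
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays of the values at each rank.
    rank_means = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices ordered from smallest to largest value in this column.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


print(quantile_normalize([[2, 4], [5, 14], [4, 8]]))
# [[3.0, 3.0], [9.5, 9.5], [6.0, 6.0]]
```

After normalization every column contains the same set of values, which is what "identical distribution across arrays" means in practice.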

Q.115 In the context of genome annotation, what does 'gene prediction' involve?

Identifying potential coding regions
Assigning functional annotations
Both A and B
None of the above
Explanation - Gene prediction identifies gene models and may annotate their functions.
Correct answer is: Both A and B

Q.116 Which algorithm is commonly used for aligning long reads to a reference genome?

BLAST
BWA‑MEM
Bowtie
MAFFT
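Explanation - Among the listed tools, BWA‑MEM is the one designed to handle longer reads and split alignments; dedicated long-read aligners such as minimap2 are also widely used.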
Correct answer is: BWA‑MEM

Q.117 What does 'NGS' stand for?

Next Generation Sequencing
New Genome System
Non‑Gaseous Sequencing
Nucleotide Grafting Sequence
Explanation - NGS refers to modern high‑throughput sequencing technologies.
Correct answer is: Next Generation Sequencing

Q.118 Which of the following is a measure of the similarity between two sequences?

Alignment score
GC content
Read length
Sequencing depth
Explanation - Alignment score quantifies how similar two sequences are after alignment.
Correct answer is: Alignment score

Q.119 Which of the following best describes 'base‑calling' in sequencing?

Converting raw signal data into nucleotide sequences
Aligning reads to a reference genome
Annotating genes
Assembling contigs
Explanation - Base‑calling interprets detector signals into DNA bases.
Correct answer is: Converting raw signal data into nucleotide sequences

Q.120 In RNA‑seq data analysis, what is the purpose of 'normalizing for library size'?

To correct for varying sequencing depths between samples
To reduce read length
To remove low‑quality reads
To adjust GC content
Explanation - Library size normalization ensures comparability of expression across samples.
Correct answer is: To correct for varying sequencing depths between samples

Q.121 Which of the following is an example of a 'sequence alignment tool'?

SAMtools
BLAST
MAFFT
All of the above
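Explanation - BLAST performs pairwise local alignment and MAFFT performs multiple sequence alignment. SAMtools does not align reads itself but processes alignment files (SAM/BAM), so it counts here only as an alignment-related tool.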
Correct answer is: All of the above

Q.122 Which of the following best describes a 'gene ontology (GO)' annotation?

A classification of genes into biological processes, cellular components, and molecular functions
A measure of gene expression levels
A method for sequencing DNA
An alignment scoring matrix
Explanation - GO provides standardized vocabularies for gene functions.
Correct answer is: A classification of genes into biological processes, cellular components, and molecular functions

Q.123 What is the purpose of a 'reference genome' in sequencing?

To provide a template for read alignment
To serve as a database for gene functions
To replace the need for sequencing
To generate random reads
Explanation - The reference genome guides mapping of sequencing reads.
Correct answer is: To provide a template for read alignment

Q.124 Which of the following best describes the 'Smith‑Waterman' algorithm?

A global alignment algorithm
A local alignment algorithm
A multiple sequence alignment algorithm
A phylogenetic tree construction algorithm
Explanation - Smith‑Waterman performs optimal local alignment between sequence segments.
Correct answer is: A local alignment algorithm
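Example - A compact Python sketch of the Smith‑Waterman recurrence that returns only the best local alignment score (no traceback). The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not defaults of any particular tool.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere: this is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 6 (shared segment "ACG")
```

Removing the clamp at 0 and initializing the borders with gap penalties would turn this into the global Needleman‑Wunsch recurrence.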

Q.125 Which of the following is a common type of sequencing error?

Insertion
Deletion
Substitution
All of the above
Explanation - NGS platforms can produce insertions, deletions, and substitutions.
Correct answer is: All of the above

Q.126 In the context of DNA sequencing, what is a 'read'?

A short DNA fragment sequenced by a platform
A long contiguous sequence assembled from reads
A reference genome
An alignment score
Explanation - Reads are the output units of sequencing machines.
Correct answer is: A short DNA fragment sequenced by a platform

Q.127 Which of the following is a widely used tool for de‑novo transcriptome assembly?

Trinity
SPAdes
BWA
SAMtools
Explanation - Trinity assembles RNA‑seq reads into transcript contigs.
Correct answer is: Trinity

Q.128 What is the main purpose of 'phasing' in genomics?

To determine the haplotype of a sample
To align reads to a reference genome
To trim adapters
To compress data
Explanation - Phasing assigns variants to specific haplotypes.
Correct answer is: To determine the haplotype of a sample

Q.129 Which of the following is a method for detecting differential methylation?

Bisulfite sequencing
RNA‑seq
ChIP‑seq
ATAC‑seq
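Explanation - Bisulfite treatment converts unmethylated cytosines to uracil (read as thymine) while methylated cytosines are protected, so methylation can be read out base by base and compared between samples.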
Correct answer is: Bisulfite sequencing

Q.130 What does the 'C' in 'CIGAR' stand for?

Complementary
Count
Match
Compact
Explanation - CIGAR stands for Compact Idiosyncratic Gapped Alignment Report; the string encodes alignment operations such as matches, mismatches, insertions, and deletions.
Correct answer is: Compact
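Example - A short Python sketch that parses a CIGAR string into (length, operation) pairs and sums the bases consumed on the reference and on the read. The function names are hypothetical; the operation letters and which of them consume reference or query follow the SAM specification. The placeholder CIGAR "*" is rejected here for simplicity.

```python
import re

CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '10M2I5M' into [(10, 'M'), (2, 'I'), (5, 'M')]."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_PATTERN.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered (operations M, D, N, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of read bases described (operations M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


cigar = "3S10M2I5M1D4M"
print(parse_cigar(cigar))
print(reference_span(cigar), query_length(cigar))  # 20 24
```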

Q.131 Which of the following is a characteristic of a 'reference‑based assembly'?

Uses a known genome to guide assembly
Does not require a reference
Only assembles short reads
Requires a complete de‑novo approach
Explanation - Reference‑based assembly maps reads onto an existing genome sequence.
Correct answer is: Uses a known genome to guide assembly

Q.132 Which of the following is NOT a type of 'mRNA sequencing' method?

Poly‑A capture
Ribosomal RNA depletion
Whole‑genome sequencing
Stranded library preparation
Explanation - Whole‑genome sequencing is not focused on mRNA.
Correct answer is: Whole‑genome sequencing

Q.133 In a multiple sequence alignment, what does a 'conserved region' indicate?

High variability across sequences
Low sequence similarity
A region that is highly similar across sequences
An error in alignment
Explanation - Conserved regions suggest functional or evolutionary importance.
Correct answer is: A region that is highly similar across sequences

Q.134 Which of the following best describes a 'de Bruijn graph' vertex?

A single base
A k‑mer
An overlap of k‑mers
A read fragment
Explanation - In the node-centric formulation, each vertex is a k‑mer and edges connect k‑mers that overlap by k‑1 bases.
Correct answer is: A k‑mer

Q.135 Which of the following is a typical output of a differential expression analysis?

A list of genes with adjusted p‑values and fold changes
A heat map of sequencing reads
A phylogenetic tree
A DNA sequence alignment
Explanation - Differential expression outputs statistical metrics for each gene.
Correct answer is: A list of genes with adjusted p‑values and fold changes

Q.136 In genome assembly, what is a 'contig'?

A single read
A sequence of overlapping reads assembled into a longer sequence
A type of alignment score
A type of base quality score
Explanation - Contigs are contiguous sequences produced from read assembly.
Correct answer is: A sequence of overlapping reads assembled into a longer sequence

Q.137 Which of the following best describes 'k‑means clustering'?

A method for aligning sequences
A supervised learning algorithm
An unsupervised clustering technique that partitions data into k groups
A phylogenetic tree reconstruction method
Explanation - k‑means groups data points into clusters based on similarity.
Correct answer is: An unsupervised clustering technique that partitions data into k groups

Q.138 What does the 'SAM' format store?

Sequence alignments in plain text
Compressed genomic sequences
Raw sequencing reads
Protein structures
Explanation - SAM is a text format for storing read alignments.
Correct answer is: Sequence alignments in plain text

Q.139 Which of the following is a common method for visualizing read coverage?

Heat map
Bar chart
Line plot
All of the above
Explanation - Coverage is often displayed as a line, heat map, or bar chart.
Correct answer is: All of the above

Q.140 Which of the following best describes an 'alignment score' in sequence alignment?

A measure of the number of mismatches
A measure of similarity between two sequences after alignment
The number of gaps in the alignment
The length of the alignment
Explanation - Alignment scores reflect how well two sequences align.
Correct answer is: A measure of similarity between two sequences after alignment

Q.141 Which of the following is a characteristic of a 'gene expression matrix'?

Rows are genes, columns are samples, and values are expression levels
Rows are samples, columns are genes, and values are base quality scores
Rows are chromosomes, columns are positions, and values are GC content
Rows are reads, columns are bases, and values are read lengths
Explanation - Gene expression matrices store quantified expression values.
Correct answer is: Rows are genes, columns are samples, and values are expression levels

Q.142 Which of the following best describes a 'phylogenetic tree'?

A diagram showing evolutionary relationships between organisms
A list of genes with functional annotations
A sequence alignment of proteins
A statistical summary of sequencing quality
Explanation - Phylogenetic trees represent evolutionary histories.
Correct answer is: A diagram showing evolutionary relationships between organisms

Q.143 What is the main purpose of 'adapter trimming' in sequencing data preprocessing?

To remove artificial sequences that were ligated during library preparation
To trim low‑quality bases at read ends
To align reads to a reference genome
To compress the data file
Explanation - Adapter trimming eliminates artificial sequences that interfere with alignment.
Correct answer is: To remove artificial sequences that were ligated during library preparation

Q.144 Which of the following is a common approach to correct sequencing errors in Illumina data?

Error‑correcting HMMs
Quasi‑Monte Carlo simulation
PCR amplification
Optical readout
Explanation - HMMs can be trained to distinguish true variants from errors.
Correct answer is: Error‑correcting HMMs

Q.145 In the context of RNA‑seq, what does the term 'TPM' stand for?

Transcripts Per Million
Transcripts Per Microarray
Transcripts Per Metagene
Transcripts Per Match
Explanation - TPM normalizes read counts for transcript length and library size.
Correct answer is: Transcripts Per Million

Q.146 Which of the following best describes a 'gene ontology (GO)' annotation?

A classification of genes into biological processes, cellular components, and molecular functions
A measure of gene expression levels
A method for sequencing DNA
An alignment scoring matrix
Explanation - GO provides standardized vocabularies for gene functions.
Correct answer is: A classification of genes into biological processes, cellular components, and molecular functions

Q.147 Which of the following is NOT a common type of sequencing error?

Insertion
Deletion
Substitution
Amplification
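Explanation - Amplification is a laboratory step (for example, PCR), not a class of base-level sequencing error; insertions, deletions, and substitutions are the error types.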
Correct answer is: Amplification

Q.148 What does the term 'de‑novo assembly' refer to in genomics?

Assembly using a reference genome
Assembly without any reference information
Assembly of RNA transcripts only
Assembly of protein sequences only
Explanation - De‑novo assembly reconstructs a genome solely from sequencing reads.
Correct answer is: Assembly without any reference information

Q.149 Which of the following is NOT a typical output of a BLAST search?

Alignment score
E‑value
Coverage statistics
Protein tertiary structure
Explanation - BLAST reports alignment metrics, not 3D structures.
Correct answer is: Protein tertiary structure