Algorithms in Bioinformatics: MCQ Practice Set

Q.1 What is the time complexity of the naive substring search algorithm?

O(n)

O(n log n)

O(n^2)

O(n^3)

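Explanation - The naive algorithm tries the pattern at every possible starting position and may compare up to m characters at each one, giving O(nm) comparisons in the worst case. This is quadratic when the pattern length m grows in proportion to the text length n.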

Correct answer is: O(n^2)
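
A minimal Python sketch of the naive search described above. The function name and the example strings are illustrative, not taken from any library.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try each of the n - m + 1 possible alignments of the pattern.
    for i in range(n - m + 1):
        # The slice comparison inspects up to m characters, hence O(nm) overall.
        if text[i:i + m] == pattern:
            hits.append(i)
    return hits


print(naive_search("GATTACAGATTACA", "TTA"))  # [2, 9]
```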

Q.2 Which of the following is NOT a typical file format for storing raw sequencing reads?

FASTA

FASTQ

BAM

SAM

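Explanation - BAM is a binary format designed for aligned reads; raw reads are usually stored in FASTQ, or in FASTA when quality scores are not needed. SAM, the text counterpart of BAM, is likewise an alignment format; BAM is the intended answer among these choices.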

Correct answer is: BAM

Q.3 The Needleman-Wunsch algorithm is used for which type of sequence alignment?

Local alignment

Global alignment

Protein structure alignment

Multiple sequence alignment

Explanation - Needleman-Wunsch performs optimal global alignment between two sequences.

Correct answer is: Global alignment
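
The global alignment recurrence can be sketched in a few lines of Python. This is a score-only illustration with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not values from the question.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # Global alignment: the answer is in the bottom-right cell.
    return score[-1][-1]


print(needleman_wunsch("GATTACA", "GATTACA"))  # 7
```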

Q.4 Which scoring matrix is the most common default for aligning protein sequences?

PAM250

BLOSUM62

Identity

Gap penalty

Explanation - BLOSUM62 is a widely used substitution matrix for protein sequence alignment.

Correct answer is: BLOSUM62

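Q.5 In a FASTA file, what does the line beginning with '>' contain?

Compressed sequence data

The contig sequence itself

Per-base quality scores

A comment (description) of the sequence that follows

Explanation - Each FASTA record starts with a header line beginning with '>' that holds the sequence identifier and an optional description; this line is sometimes called the comment or description line.

Correct answer is: A comment (description) of the sequence that follows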

Q.6 In Hidden Markov Models (HMM) for gene prediction, what does the 'state' represent?

A specific nucleotide

A type of gene feature (e.g., exon, intron)

The quality of the sequencing data

A particular DNA sequence motif

Explanation - HMM states model functional genomic features such as exons and introns.

Correct answer is: A type of gene feature (e.g., exon, intron)

Q.7 Which of the following best describes a suffix tree?

A binary search tree for DNA bases

A data structure that stores all suffixes of a string for efficient substring queries

A tree used for phylogenetic analysis

A hierarchical clustering tool

Explanation - Suffix trees allow fast pattern matching and are useful in genomic sequence analysis.

Correct answer is: A data structure that stores all suffixes of a string for efficient substring queries

Q.8 What is the purpose of a 'gap penalty' in sequence alignment?

To reward matches

To penalize insertions/deletions

To normalize scores

To select the best alignment algorithm

Explanation - Gap penalties discourage excessive gaps in alignments, balancing matches and gaps.

Correct answer is: To penalize insertions/deletions

Q.9 In a microarray experiment, what does normalization aim to achieve?

Increase signal intensity

Remove technical variations between arrays

Add background noise

Simplify the data layout

Explanation - Normalization corrects for systematic biases, allowing meaningful comparisons.

Correct answer is: Remove technical variations between arrays

Q.10 Which algorithm is used to reconstruct a genome from short sequencing reads?

Dynamic programming

Eulerian path (de Bruijn graph)

Smith-Waterman

Needleman-Wunsch

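Explanation - Genome assembly often breaks reads into k‑mers, represents each k‑mer as an edge between (k‑1)-mer nodes in a de Bruijn graph, and reconstructs the sequence as an Eulerian path through the graph.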

Correct answer is: Eulerian path (de Bruijn graph)

Q.11 What does the acronym 'BLAST' stand for?

Basic Local Alignment Search Tool

Biological Language Analysis System Tool

Binary Linear Array Search Tool

Base Level Alignment Sequence Tool

Explanation - BLAST is a popular algorithm for quick sequence similarity searching.

Correct answer is: Basic Local Alignment Search Tool

Q.12 Which of the following is a common application of Principal Component Analysis (PCA) in bioinformatics?

Phylogenetic tree construction

Gene expression data dimensionality reduction

Protein folding simulation

DNA sequencing error correction

Explanation - PCA reduces dimensionality of high‑dimensional expression datasets.

Correct answer is: Gene expression data dimensionality reduction

Q.13 In the context of Next‑Generation Sequencing (NGS), what does a 'paired‑end read' refer to?

Two reads from the same DNA fragment sequenced from both ends

Two reads from two different fragments

One read sequenced twice

A read that includes a pair of identical sequences

Explanation - Paired‑end sequencing generates two reads per fragment, improving alignment accuracy.

Correct answer is: Two reads from the same DNA fragment sequenced from both ends

Q.14 Which of these algorithms is NOT used for clustering genes based on expression patterns?

k‑means

Hierarchical clustering

Smith‑Waterman

DBSCAN

Explanation - Smith‑Waterman is a local alignment algorithm, not a clustering method.

Correct answer is: Smith‑Waterman

Q.15 What is a key advantage of using a Hidden Markov Model over a simple Markov Chain for sequence modeling?

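It can model hidden structure that is not directly observed in the sequence

It requires fewer parameters

It always has linear time complexity

It does not need training data

Explanation - An HMM separates hidden states (e.g., exon or intron) from the symbols they emit, so it can capture underlying structure that a plain Markov chain over the observed symbols cannot represent.

Correct answer is: It can model hidden structure that is not directly observed in the sequence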

Q.16 In a BLAST search, what does an 'E‑value' of 1e-10 indicate?

A highly significant match

A random match

A low‑quality alignment

The sequence length in base pairs

Explanation - Low E‑values mean the match is unlikely by chance, indicating significance.

Correct answer is: A highly significant match

Q.17 Which data structure is most efficient for storing and querying k‑mer frequencies in large genomes?

Linked list

Binary search tree

Hash table

Stack

Explanation - Hash tables provide constant‑time access to k‑mer counts, crucial for large datasets.

Correct answer is: Hash table
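
As an illustration, Python's `collections.Counter` is a hash table keyed by k‑mer. The sketch below is suitable for small inputs only; real k‑mer counters use far more compact encodings.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    # Each lookup and update in the Counter is expected O(1).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ATATAT", 2)
print(counts["AT"], counts["TA"])  # 3 2
```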

Q.18 In phylogenetics, what is the purpose of a 'bootstrapping' analysis?

To generate synthetic sequences

To evaluate the support for branches in a tree

To calculate the evolutionary rate

To align sequences

Explanation - Bootstrapping resamples data to assess the robustness of phylogenetic tree branches.

Correct answer is: To evaluate the support for branches in a tree

Q.19 Which of the following is a feature of the FastQC software tool?

Aligns sequencing reads to a reference genome

Detects structural variants

Provides quality metrics for raw sequencing data

Assembles genomes from reads

Explanation - FastQC generates reports on read quality, GC content, etc., for raw data.

Correct answer is: Provides quality metrics for raw sequencing data

Q.20 What does the 'U' in UTR stand for in genomic annotation?

Untranslated

Upstream

Ubiquitous

Unknown

Explanation - UTR means Untranslated Region, which is not translated into protein.

Correct answer is: Untranslated

Q.21 Which algorithm would you use to find the longest common subsequence between two strings?

Dijkstra's algorithm

Levenshtein distance

Dynamic programming with a 2‑D table

QuickSort

Explanation - The LCS problem is solved by a dynamic programming matrix.

Correct answer is: Dynamic programming with a 2‑D table
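
A minimal sketch of the 2‑D dynamic programming table for LCS. It returns only the length; recovering the subsequence itself would require a traceback, which is omitted here.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b."""
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching characters extend the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either string.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4
```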

Q.22 What is the primary goal of a 'de‑novo' genome assembly?

To assemble a genome using a reference sequence

To predict gene functions

To assemble a genome without a reference

To annotate the genome

Explanation - De‑novo assembly reconstructs genomes solely from reads.

Correct answer is: To assemble a genome without a reference

Q.23 In the context of sequencing, what does 'coverage' refer to?

The depth of sequencing reads over the genome

The number of different sequencing instruments used

The error rate in reads

The length of each read

Explanation - Coverage indicates how many times each base is sequenced on average.

Correct answer is: The depth of sequencing reads over the genome
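
Average coverage is commonly estimated as total sequenced bases divided by genome length. The sketch below shows that arithmetic; the read count, read length, and genome size are made-up example values.

```python
def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = (number of reads * read length) / genome size."""
    return num_reads * read_length / genome_size


# 1 million reads of 150 bp over a 5 Mb genome
print(expected_coverage(1_000_000, 150, 5_000_000))  # 30.0
```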

Q.24 Which of these tools is commonly used for protein structure prediction based on homology?

BLAST

MODELLER

Bowtie

SAMtools

Explanation - MODELLER builds 3D protein models using homology modeling.

Correct answer is: MODELLER

Q.25 What is the main advantage of using a 'suffix array' over a 'suffix tree'?

Lower time complexity

Smaller memory footprint

Faster construction time

Supports dynamic updates

Explanation - Suffix arrays are more space‑efficient while retaining many search capabilities.

Correct answer is: Smaller memory footprint
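
To make the idea concrete, here is a small sketch of a suffix array and a binary search over it. The construction by sorting suffixes is deliberately naive and only suitable for short strings; production tools use linear-time or near-linear construction algorithms.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, in lexicographic order of the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search the suffix array for every occurrence of pattern."""
    m = len(pattern)
    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                     # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

The array holds one integer per text position, which is where the memory saving over a pointer-based suffix tree comes from.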

Q.26 Which type of mutation results in a codon change that still codes for the same amino acid?

Synonymous

Non‑synonymous

Frameshift

Nonsense

Explanation - Synonymous mutations alter codons without changing the encoded amino acid.

Correct answer is: Synonymous

Q.27 In a Hidden Markov Model used for gene prediction, which algorithm finds the most probable sequence of states?

Viterbi

Forward

Baum-Welch

Gradient Descent

Explanation - The Viterbi algorithm computes the most likely state path.

Correct answer is: Viterbi
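
A compact Viterbi sketch in log space. The two-state model below (H for GC-rich, L for AT-rich) and all of its probabilities are toy values invented for illustration; they are not a real gene-prediction model.

```python
import math

# Hypothetical toy model: H = GC-rich region, L = AT-rich region.
STATES = "HL"
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(obs: str, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT) -> str:
    """Return the most probable hidden-state path for a non-empty observation string."""
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = scores[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


print(viterbi("GGCACTGAA"))  # HHHLLLLLL
```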

Q.28 Which of the following is NOT a standard step in RNA‑seq data processing?

Read trimming

Alignment to a reference genome

Protein structure modeling

Differential expression analysis

Explanation - RNA‑seq focuses on transcript quantification, not protein modeling.

Correct answer is: Protein structure modeling

Q.29 What is the purpose of a 'quality score' in FASTQ files?

To indicate the read length

To quantify the confidence of each base call

To specify the sequencing machine used

To encode the read’s mapping position

Explanation - Quality scores represent the probability of a base call error.

Correct answer is: To quantify the confidence of each base call
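
A short sketch of how a quality string maps to error probabilities, assuming the common Phred+33 ASCII encoding (older data may use a different offset). The relation is Q = -10 log10(p), so p = 10^(-Q/10).

```python
def phred_to_error_probs(quality_line: str, offset: int = 33) -> list[float]:
    """Convert a FASTQ quality string into per-base error probabilities."""
    # Each character encodes Q = ord(char) - offset.
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality_line]


# '!' is Q0, '+' is Q10, '5' is Q20, 'I' is Q40
print(phred_to_error_probs("!+5I"))
```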

Q.30 Which algorithm is used to quickly align sequencing reads to a reference genome?

Smith‑Waterman

Burrows‑Wheeler Transform (BWT)

Levenshtein distance

QuickSort

Explanation - BWT‑based aligners (e.g., BWA, Bowtie) are fast and memory‑efficient.

Correct answer is: Burrows‑Wheeler Transform (BWT)
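
For intuition, the transform itself can be computed by sorting all rotations of the text. This naive sketch uses quadratic memory and is for illustration only; real aligners build the BWT from a suffix array and pair it with an FM-index.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text, using '$' as the end marker."""
    text += "$"  # '$' sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("ACAACG"))  # GC$AAAC
```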

Q.31 In a gene expression microarray, what does a 'probe' represent?

A DNA sequence complementary to a target RNA

A protein of interest

An RNA‑binding protein

A fluorescent dye

Explanation - Probes hybridize to specific RNA transcripts, indicating expression levels.

Correct answer is: A DNA sequence complementary to a target RNA

Q.32 Which of the following is a key feature of the 'Smith‑Waterman' algorithm?

Global alignment

Local alignment

Multiple sequence alignment

Phylogenetic tree construction

Explanation - Smith‑Waterman finds optimal local alignments between sequence segments.

Correct answer is: Local alignment
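
A score-only sketch of the local alignment recurrence. The key difference from global alignment is that cell values are floored at zero and the best score may occur anywhere in the table. The scoring values are assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart at any position.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))  # 6 (the shared "ACG")
```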

Q.33 In a phylogenetic tree, what does the length of a branch typically represent?

Sequence length

Number of species

Genetic distance

Time of divergence

Explanation - Branch lengths often reflect the amount of evolutionary change.

Correct answer is: Genetic distance

Q.34 Which of the following is NOT a typical use of the Bioconductor project?

Genomic data analysis in R

Statistical modeling of biological data

Protein structure simulation

Visualization of high‑throughput data

Explanation - Bioconductor focuses on analysis of high‑throughput sequencing and expression data.

Correct answer is: Protein structure simulation

Q.35 Which term describes an RNA that is transcribed from the genome but is not translated into protein?

Coding sequence

UTR

Non‑coding RNA

Exon

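Explanation - Non‑coding RNAs are transcribed but never translated. UTRs are also untranslated, but they are segments of a protein-coding mRNA rather than a separate class of transcript.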

Correct answer is: Non‑coding RNA

Q.36 Which algorithm is the standard efficient choice for single-source shortest paths in a graph with non‑negative edge weights?

Dijkstra's algorithm

Prim's algorithm

Kruskal's algorithm

Bellman–Ford algorithm

Explanation - Dijkstra’s finds shortest paths from a single source in non‑negative weighted graphs.

Correct answer is: Dijkstra's algorithm

Q.37 In sequence alignment, what does a 'gap opening penalty' represent?

The cost to start a new gap

The cost to extend an existing gap

The score for a match

The penalty for a mismatch

Explanation - Gap opening penalty discourages the initiation of new gaps.

Correct answer is: The cost to start a new gap

Q.38 Which of these metrics is used to evaluate the quality of a multiple sequence alignment?

Silhouette score

Sum of Pairs (SP) score

Jensen‑Shannon divergence

Entropy rate

Explanation - The SP score sums the pairwise alignment scores over every pair of sequences in the multiple alignment, so higher values indicate that the sequences are mutually well aligned.

Correct answer is: Sum of Pairs (SP) score

Q.39 Which file format is typically used to store compressed alignments?

FASTA

FASTQ

BAM

SAM

Explanation - BAM is the binary, compressed version of the SAM alignment format.

Correct answer is: BAM

Q.40 In the context of RNA‑seq, what does 'FPKM' stand for?

Fragments Per Kilobase of transcript per Million mapped reads

Fragments per Kilo base per Mapped reads

Full Position Kmer Matching

Frequency per Kilo of Microarray

Explanation - FPKM normalizes read counts by transcript length and sequencing depth.

Correct answer is: Fragments Per Kilobase of transcript per Million mapped reads
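
The definition translates directly into arithmetic. The sketch below uses invented example numbers; the function name is illustrative.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```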

Q.41 Which of the following is NOT a typical function of a genetic variant caller?

Identify SNPs from sequencing data

Call structural variants

Predict protein tertiary structure

Annotate variants

Explanation - Variant callers focus on identifying genomic differences, not structure prediction.

Correct answer is: Predict protein tertiary structure

Q.42 What is the role of the 'forward algorithm' in an HMM?

To find the most probable state sequence

To compute the probability of an observation sequence

To train the HMM parameters

To decode the best path

Explanation - The forward algorithm sums over all possible state paths to compute sequence likelihood.

Correct answer is: To compute the probability of an observation sequence
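
A minimal sketch of the forward recursion. The two-state model and its probabilities are toy values invented for illustration. For long sequences a real implementation would scale the values or work in log space to avoid underflow.

```python
# Hypothetical toy model: H = GC-rich region, L = AT-rich region.
STATES = "HL"
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(obs: str, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT) -> float:
    """Return P(obs) under the model, summed over all hidden-state paths."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


print(forward("GG"))
```

Where Viterbi takes a maximum over predecessor states, the forward algorithm takes a sum; otherwise the two recursions have the same shape.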

Q.43 Which algorithm is used for constructing a minimum spanning tree?

Prim's algorithm

Dijkstra's algorithm

Bellman–Ford algorithm

Viterbi algorithm

Explanation - Prim's algorithm builds a minimum spanning tree of a weighted graph by repeatedly adding the cheapest edge that connects a new vertex to the tree.

Correct answer is: Prim's algorithm

Q.44 In a de Bruijn graph used for assembly, what does an edge typically represent?

A read

A k‑mer

An overlap between k‑mers

A contig

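Explanation - In the node-centric convention used here, nodes are k‑mers and an edge connects two k‑mers that share a (k‑1)-mer overlap, enabling path traversal. In the alternative edge-centric convention, nodes are (k‑1)-mers and each edge is itself a k‑mer.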

Correct answer is: An overlap between k‑mers

Q.45 Which of the following is a common method for correcting sequencing errors in high‑throughput data?

PCR amplification

Error‑correcting codes (e.g., Hamming code)

Read trimming

Multiple sequence alignment

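Explanation - Error‑correcting codes such as Hamming codes are built into sample barcodes so that certain sequencing errors in the barcode can be detected and corrected. Errors within the reads themselves are more often corrected with k‑mer-based methods.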

Correct answer is: Error‑correcting codes (e.g., Hamming code)

Q.46 What does a 'false discovery rate (FDR)' control in statistical analyses?

The probability of a type I error per test

The proportion of false positives among all significant results

The probability of a type II error

The overall error rate in sequencing

Explanation - FDR limits the expected proportion of false positives when many tests are performed.

Correct answer is: The proportion of false positives among all significant results
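
One widely used way to control the FDR is the Benjamini-Hochberg procedure, which the question itself does not name. The sketch below computes BH-adjusted p-values under that assumption; the input p-values are made up for the example.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Tests whose adjusted value falls below the chosen threshold (for example 0.05) are declared significant at that FDR level.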

Q.47 In the context of DNA microarrays, what is a 'spot'?

A region of the chip containing a specific probe

A fluorescent dye

A data point in the analysis

A type of sequencing error

Explanation - Each spot holds identical copies of a probe that hybridizes to target DNA.

Correct answer is: A region of the chip containing a specific probe

Q.48 Which of the following best describes the 'Read‑1' in paired‑end sequencing?

The first read from one DNA fragment

The second read from one DNA fragment

A read from the reverse strand

A duplicate of Read‑2

Explanation - Read‑1 is the first of two reads sequenced from opposite ends of a fragment.

Correct answer is: The first read from one DNA fragment

Q.49 Which of the following is a key assumption of the standard model for phylogenetic tree inference?

All mutations are independent and identically distributed

Sequences are of equal length

Sequences are perfectly aligned

All branch lengths are equal

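Explanation - Most likelihood-based phylogenetic models assume that alignment sites evolve independently of one another and under the same substitution process (i.i.d. across sites).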

Correct answer is: All mutations are independent and identically distributed

Q.50 What does the acronym 'SAM' stand for in genomics?

Sequence Alignment/Map

Simple Alignment Method

Sequence Analysis Model

Standard Alignment Matrix

Explanation - SAM (Sequence Alignment/Map) is a tab-delimited text format for storing sequence alignment information.

Correct answer is: Sequence Alignment/Map

Q.51 Which type of sequencing library preparation results in reads from both ends of the original DNA fragment?

Paired‑end library

Mate‑pair library

Single‑cell library

Targeted sequencing library

Explanation - Paired‑end libraries are designed to sequence both ends of fragments.

Correct answer is: Paired‑end library

Q.52 In a k‑mer counting task, which data structure uses hashing to store each k‑mer together with its exact count?

Suffix tree traversal

Bloom filter

Hash table

Depth‑first search

Explanation - Hash tables store k‑mers and their counts efficiently.

Correct answer is: Hash table

Q.53 Which tool is commonly used for fast alignment of sequencing reads to a reference genome?

Smith‑Waterman

Bowtie

Needleman‑Wunsch

Levenshtein

Explanation - Bowtie uses the Burrows‑Wheeler transform for fast read alignment.

Correct answer is: Bowtie

Q.54 Which of the following best describes the 'k‑means' algorithm?

A supervised learning method

A method for aligning sequences

An unsupervised clustering algorithm

A phylogenetic tree reconstruction method

Explanation - k‑means partitions data into k clusters based on feature similarity.

Correct answer is: An unsupervised clustering algorithm

Q.55 Which data structure is used by the popular 'BWA' aligner?

Suffix array

Suffix tree

Trie

Binary heap

Explanation - BWA uses a compressed suffix array (FM‑index) for efficient alignment.

Correct answer is: Suffix array

Q.56 Which of the following metrics is used to evaluate the significance of a BLAST hit?

GC content

E‑value

Coverage

Identity

Explanation - The E‑value estimates how many hits with an equal or better score would be expected by chance, so smaller values indicate more significant matches.

Correct answer is: E‑value

Q.57 What does a 'phylogenetic tree' illustrate?

The sequence alignment of DNA fragments

The evolutionary relationships between species

The gene expression levels of a single organism

The structure of a protein complex

Explanation - Phylogenetic trees depict shared ancestry and divergence.

Correct answer is: The evolutionary relationships between species

Q.58 Which of the following best describes the 'FASTA' format?

A binary file for alignments

A compressed text format for raw reads

A simple text format for nucleotide or protein sequences

A database schema for genomic data

Explanation - FASTA stores sequences with a header line starting with '>'.

Correct answer is: A simple text format for nucleotide or protein sequences

Q.59 In the context of microarray data, what does 'log₂ fold change' measure?

The ratio of expression levels between two conditions

The absolute difference in expression levels

The background fluorescence intensity

The sequencing coverage

Explanation - Log₂ fold change quantifies up‑ or down‑regulation between samples.

Correct answer is: The ratio of expression levels between two conditions
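
A small sketch of the calculation. The pseudocount, added to both values to avoid division by zero, is an assumption of this example rather than part of the definition.

```python
import math


def log2_fold_change(control: float, treatment: float, pseudocount: float = 1.0) -> float:
    """log2 of the treatment/control expression ratio."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


print(log2_fold_change(100, 400, pseudocount=0))  # 2.0 (four-fold up)
print(log2_fold_change(400, 100, pseudocount=0))  # -2.0 (four-fold down)
```

On the log2 scale, up- and down-regulation of the same magnitude are symmetric around zero.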

Q.60 Which tool is designed for detecting structural variants in genomic data?

SAMtools mpileup

GATK HaplotypeCaller

BreakDancer

Bowtie

Explanation - BreakDancer identifies structural variants such as large deletions, insertions, inversions, and translocations from anomalously mapped paired‑end reads.

Correct answer is: BreakDancer

Q.61 Which of the following is NOT a component of a typical sequencing workflow?

Library preparation

Read alignment

Protein folding prediction

Variant calling

Explanation - Protein folding prediction is unrelated to sequencing pipelines.

Correct answer is: Protein folding prediction

Q.62 What does the 'forward' algorithm compute in an HMM?

The probability of the most likely state path

The total probability of observing the sequence

The posterior probabilities of states

The error rate of the model

Explanation - The forward algorithm sums probabilities over all paths to get likelihood.

Correct answer is: The total probability of observing the sequence

Q.63 Which of the following is a common measure of gene expression derived from RNA‑seq?

RPKM

TPM

FPKM

All of the above

Explanation - RPKM, TPM, and FPKM are all normalization metrics for RNA‑seq data.

Correct answer is: All of the above

Q.64 In the context of sequence assembly, what is a 'contig'?

A short read from sequencing

A contiguous stretch of assembled sequence

A type of sequencing error

A data compression algorithm

Explanation - Contigs are longer sequences formed by merging overlapping reads.

Correct answer is: A contiguous stretch of assembled sequence

Q.65 Which of the following is NOT a step in the basic RNA‑seq analysis pipeline?

Read quality control

Alignment to a reference genome

Protein tertiary structure modeling

Differential expression analysis

Explanation - RNA‑seq focuses on transcript quantification, not protein modeling.

Correct answer is: Protein tertiary structure modeling

Q.66 What does the 'E‑value' in BLAST represent?

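The expected number of chance hits with an equal or better score

The alignment score

The number of mismatches

The length of the query sequence

Explanation - The E‑value is the number of alignments with a score at least this good that would be expected by chance in a database of the same size. For small values it is close to the probability of seeing such a hit at random.

Correct answer is: The expected number of chance hits with an equal or better score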

Q.67 Which of the following tools is commonly used for de‑novo assembly of short reads?

SPAdes

Bowtie

SAMtools

MAFFT

Explanation - SPAdes is a popular assembler for short‑read sequencing data.

Correct answer is: SPAdes

Q.68 In a Hidden Markov Model for gene prediction, what does a 'transition probability' describe?

The likelihood of observing a particular base

The chance of moving from one state to another

The quality score of a read

The alignment score

Explanation - Transition probabilities govern state changes in an HMM.

Correct answer is: The chance of moving from one state to another

Q.69 Which of the following best describes a 'k‑mer'?

A protein motif

A substring of length k

A genomic region of length k

A type of sequencing error

Explanation - k‑mers are all possible substrings of fixed length k in a sequence.

Correct answer is: A substring of length k

Q.70 What is the purpose of a 'quality trimming' step in RNA‑seq preprocessing?

To remove low‑quality bases from reads

To align reads to a reference genome

To predict gene functions

To assemble the genome

Explanation - Quality trimming eliminates unreliable base calls to improve downstream analysis.

Correct answer is: To remove low‑quality bases from reads

Q.71 Which of the following best describes a 'de Bruijn graph' used in genome assembly?

A graph where nodes are k‑mers and edges represent overlaps

A graph of all possible alignments

A representation of phylogenetic relationships

A data structure for storing suffixes

Explanation - de Bruijn graphs encode k‑mer overlaps to reconstruct sequences.

Correct answer is: A graph where nodes are k‑mers and edges represent overlaps
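
A minimal sketch of the node-centric construction described in this question, in which nodes are k‑mers and an edge joins two k‑mers whose (k‑1)-mer suffix and prefix match. The function name and the adjacency-dictionary output format are choices made for this example.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each k-mer to the k-mers that overlap it by k-1 characters."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    # Index k-mers by their (k-1)-mer prefix for fast successor lookup.
    by_prefix = defaultdict(set)
    for kmer in kmers:
        by_prefix[kmer[:-1]].add(kmer)
    # A successor is any k-mer whose prefix equals this k-mer's suffix.
    return {kmer: sorted(by_prefix.get(kmer[1:], ())) for kmer in sorted(kmers)}


print(de_bruijn(["ACGT", "CGTA"], 3))
```

Walking a path through this graph and appending the last character of each successive k‑mer spells out an assembled sequence.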

Q.72 Which of the following is a typical metric used to assess differential gene expression significance?

P‑value

Fold change

Both P‑value and fold change

Alignment score

Explanation - Significance combines statistical p‑values with biological fold changes.

Correct answer is: Both P‑value and fold change

Q.73 Which algorithm is often used to reconstruct phylogenetic trees from distance matrices?

NJ (Neighbor‑Joining)

Smith‑Waterman

Viterbi

Dijkstra

Explanation - Neighbor‑Joining constructs trees from pairwise distance data.

Correct answer is: NJ (Neighbor‑Joining)

Q.74 In the context of genomics, what does 'GC content' refer to?

The proportion of guanine and cytosine bases

The number of genes in a genome

The coverage depth

The error rate

Explanation - GC content is the percentage of G and C nucleotides in a sequence.

Correct answer is: The proportion of guanine and cytosine bases

Q.75 What is the main benefit of using a 'paired‑end' library over a 'single‑end' library?

Higher read length

Improved mapping accuracy

Lower cost

Faster sequencing

Explanation - Paired‑end reads provide positional information that aids alignment.

Correct answer is: Improved mapping accuracy

Q.76 Which of the following is a common error type in next‑generation sequencing?

Insertion

Deletion

Substitution

All of the above

Explanation - NGS platforms can produce insertions, deletions, and substitutions.

Correct answer is: All of the above

Q.77 What does 'Read depth' refer to in sequencing?

The average number of times a base is sequenced

The maximum read length

The number of reads in a library

The error rate

Explanation - Read depth (coverage) indicates redundancy of sequencing data.

Correct answer is: The average number of times a base is sequenced

Q.78 Which algorithm is used to compute the most probable alignment path in an HMM?

Forward

Viterbi

Baum‑Welch

HMM‑Tagger

Explanation - The Viterbi algorithm finds the highest‑probability state sequence.

Correct answer is: Viterbi

Q.79 Which of the following is NOT a common step in variant annotation?

Predicting functional impact

Assigning gene symbols

Visualizing alignments

Calling SNPs from raw reads

Explanation - Variant calling precedes annotation, which interprets known variants.

Correct answer is: Calling SNPs from raw reads

Q.80 In a FASTQ file, which line corresponds to the quality scores?

The line starting with '>'

The second line of each four‑line record

The third line of each four‑line record

The fourth line of each four‑line record

Explanation - The fourth line encodes ASCII‑encoded quality values.

Correct answer is: The fourth line of each four‑line record
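
A sketch of reading four-line FASTQ records in Python. It assumes the simple layout in which sequence and quality each occupy exactly one line, and it performs no validation.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines."""
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(iterator, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, sequence, _plus, quality = record
        # Line 1: '@' + id, line 2: bases, line 3: '+', line 4: qualities
        yield header[1:], sequence, quality


example = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GG\n", "+\n", "!!\n"]
for read_id, sequence, quality in read_fastq(example):
    print(read_id, sequence, quality)
```

The same function works on an open file handle, since a file object is also an iterable of lines.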

Q.81 What is the purpose of a 'k‑mer spectrum plot' in genome assembly?

To estimate genome size

To visualize sequencing error rates

To determine optimal k‑mer length

All of the above

Explanation - k‑mer spectra help assess repeat structure, coverage, and errors.

Correct answer is: All of the above

Q.82 Which of the following is a key feature of the 'BWA‑MEM' algorithm?

It uses suffix arrays for alignment

It is designed for short reads only

It reports split alignments for structural variants

It performs local alignment only

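Explanation - BWA‑MEM supports reads from roughly 70 bp up to much longer sequences and can report split alignments, in which different parts of one read map to different locations; this is useful for detecting structural variants.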

Correct answer is: It reports split alignments for structural variants

Q.83 What is the main difference between a 'reference genome' and a 'pangenome'?

A reference genome is a single consensus sequence; a pangenome includes multiple genomes

A pangenome is a smaller subset of a reference genome

A reference genome is always human; a pangenome can be any species

There is no difference

Explanation - Pangenomes capture genetic diversity across multiple individuals or strains.

Correct answer is: A reference genome is a single consensus sequence; a pangenome includes multiple genomes

Q.84 Which tool is typically used to detect SNPs in sequencing data?

Bowtie

SAMtools mpileup

BLAST

MAFFT

Explanation - mpileup summarizes the aligned bases at each reference position; a caller such as bcftools then uses these pileups to identify variants.

Correct answer is: SAMtools mpileup

Q.85 Which of the following best describes 'short‑read sequencing'?

Sequencing of long DNA fragments over 50 kb

Sequencing of DNA fragments typically < 300 bp

Sequencing of RNA molecules only

Sequencing of entire genomes in one read

Explanation - Short‑read sequencing platforms produce reads of a few hundred bases.

Correct answer is: Sequencing of DNA fragments typically < 300 bp

Q.86 In the context of DNA methylation analysis, what does the 'bisulfite treatment' do?

Converts unmethylated cytosines to uracil

Adds methyl groups to all cytosines

Cuts DNA at methylated sites

Fluorescently labels methylated bases

Explanation - Bisulfite converts C to U (read as T) if unmethylated, enabling detection.

Correct answer is: Converts unmethylated cytosines to uracil

Q.87 Which of the following is a commonly used clustering method for gene expression data?

Hierarchical clustering

K‑means clustering

Both hierarchical and k‑means clustering

None of the above

Explanation - Both hierarchical and k‑means clustering are standard approaches.

Correct answer is: Both hierarchical and k‑means clustering

Q.88 What does the 'CIGAR' string in a SAM/BAM file describe?

The read’s quality scores

The alignment operations and lengths

The read identifier

The reference sequence name

Explanation - CIGAR encodes matches, mismatches, insertions, deletions, etc.

Correct answer is: The alignment operations and lengths
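
A small sketch of parsing a CIGAR string with a regular expression. The operation letters follow the SAM format; the helper names are illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    # M, D, N, = and X consume reference positions; I, S, H and P do not.
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


print(parse_cigar("3M1I2D4M"))     # [(3, 'M'), (1, 'I'), (2, 'D'), (4, 'M')]
print(reference_span("3M1I2D4M"))  # 9
```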

Q.89 Which of the following best describes 'mismatch penalty' in alignment algorithms?

The score given for a base mismatch

The penalty for starting a gap

The reward for matching bases

The cost of aligning to a reference

Explanation - Mismatch penalties penalize mismatched base pairs during alignment.

Correct answer is: The score given for a base mismatch

Q.90 Which of the following is NOT a typical output of a BLAST search?

Alignment score

E‑value

Coverage statistics

Protein tertiary structure

Explanation - BLAST reports alignment metrics, not 3D structures.

Correct answer is: Protein tertiary structure

Q.91 What is the purpose of a 'masking' step in genome assembly?

To remove repetitive sequences

To compress the assembly output

To annotate genes

To improve read quality

Explanation - Masking reduces assembly complexity by hiding repeats.

Correct answer is: To remove repetitive sequences

Q.92 Which of the following best describes 'next‑generation sequencing (NGS)'?

Sequencing based on Sanger methodology

High‑throughput, parallel sequencing technologies

Sequencing of proteins

A theoretical concept with no practical applications

Explanation - NGS refers to modern high‑throughput sequencing platforms.

Correct answer is: High‑throughput, parallel sequencing technologies

Q.93 Which of the following is a common method for visualizing phylogenetic trees?

Cladogram

Heat map

Scatter plot

Bar chart

Explanation - Cladograms are tree diagrams showing evolutionary relationships.

Correct answer is: Cladogram

Q.94 In the context of gene prediction, what does a 'gene model' typically include?

Exon positions, intron lengths, and coding sequence

Only the gene name

Protein tertiary structure

DNA methylation patterns

Explanation - Gene models predict the structure of a gene, including exons and introns.

Correct answer is: Exon positions, intron lengths, and coding sequence

Q.95 Which of the following best describes 'k‑mer hashing'?

Storing k‑mers in a hash table to count occurrences

Generating random k‑mers for simulation

Hashing the entire genome for compression

Mapping k‑mers to protein domains

Explanation - Hashing allows efficient counting of k‑mers in large datasets.

Correct answer is: Storing k‑mers in a hash table to count occurrences

Q.96 Which of the following is a typical metric used to evaluate alignment quality?

Identity percentage

GC content

Read length

Sequencing cost

Explanation - Alignment identity indicates the proportion of matching bases.

Correct answer is: Identity percentage

Q.97 In RNA‑seq, what does the term 'coverage uniformity' refer to?

Even distribution of reads along the length of each transcript

The consistency of sequencing costs

The ratio of coding to non‑coding regions

The error rate of sequencing

Explanation - Uniform coverage along transcripts, without strong 5' or 3' bias, supports reliable quantification.

Correct answer is: Even distribution of reads along the length of each transcript

Q.98 Which of the following best describes 'de‑novo assembly'?

Assembly using a reference genome

Assembly without any reference information

Assembly of RNA transcripts only

Assembly of protein sequences only

Explanation - De‑novo assembly reconstructs a genome solely from sequencing reads.

Correct answer is: Assembly without any reference information

Q.99 Which of the following is a common tool for phylogenetic tree visualization?

FigTree

BLAST

SAMtools

MAFFT

Explanation - FigTree is widely used to view and edit phylogenetic trees.

Correct answer is: FigTree

Q.100 What is the main purpose of 'adapter trimming' in sequencing data preprocessing?

To remove sequencing adapters that were ligated during library preparation

To trim low‑quality bases at read ends

To align reads to a reference

To compress the data file

Explanation - Adapter trimming removes artificial sequences that interfere with alignment.

Correct answer is: To remove sequencing adapters that were ligated during library preparation
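The idea behind adapter trimming in Q.100 can be sketched as follows. This is a simplified illustration that uses exact string matching only; real trimming tools also tolerate mismatches and use quality scores. The function name trim_adapter, the min_overlap parameter, and the example adapter prefix are assumptions made for the example.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching only."""
    # Case 1: the complete adapter occurs somewhere inside the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a partial adapter (a prefix of it).
    longest = min(len(adapter) - 1, len(read))
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    # No adapter evidence found: return the read unchanged.
    return read


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, used here for illustration
print(trim_adapter("ACGTACGTAGATCGGAAGAGC", ADAPTER))
print(trim_adapter("ACGTACGTAGATC", ADAPTER))
```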

Q.101 Which algorithm is used to solve the shortest superstring problem in genome assembly?

Greedy algorithm

Smith‑Waterman

Viterbi

Dijkstra

Explanation - The shortest superstring problem is NP‑hard, so it is usually approximated with a greedy heuristic that repeatedly merges the pair of reads with the largest overlap.

Correct answer is: Greedy algorithm
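A compact sketch of the greedy heuristic from Q.101 is given below. It is an approximation, not an exact solver, and it ignores sequencing errors and reverse complements; the function names overlap and greedy_superstring are illustrative.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Repeatedly merge the two reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


example_reads = ["ATGC", "GCAT", "CATT"]
superstring = greedy_superstring(example_reads)
print(superstring)
```

The greedy choice is not guaranteed to give the shortest possible superstring, which is why it is described as a heuristic.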

Q.102 Which of the following is an example of a 'gene ontology (GO)' term?

DNA repair

Signal transduction

All of the above

None of the above

Explanation - GO terms classify genes into biological processes, molecular functions, and cellular components.

Correct answer is: All of the above

Q.103 What does the 'GC skew' measure in a genomic sequence?

The difference between G and C content

The ratio of G to C bases

The absolute amount of G and C bases

The variation of GC content across the genome

Explanation - GC skew is calculated as (G - C) / (G + C) to assess strand bias.

Correct answer is: The difference between G and C content
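The formula (G - C) / (G + C) from Q.103 translates directly into code. The sketch below also computes the skew in fixed, non-overlapping windows, a common way to plot it along a genome; the function names and the choice to return 0.0 for windows without G or C are assumptions.

```python
def gc_skew(sequence: str) -> float:
    """Compute (G - C) / (G + C) for one sequence."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0  # assumption: report 0.0 when no G or C is present
    return (g - c) / (g + c)


def windowed_gc_skew(sequence: str, window: int) -> list[float]:
    """GC skew for consecutive non-overlapping windows of size `window`."""
    return [
        gc_skew(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, window)
    ]


print(gc_skew("GGGC"))
print(windowed_gc_skew("GGCCGGGC", 4))
```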

Q.104 Which of the following best describes the 'E‑value' threshold of 1e-5 in a BLAST search?

A highly significant match

A moderately significant match

A random match

No match

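Explanation - The E‑value is the number of hits with an equal or better score expected by chance in a database of the searched size. A value of 1e-5 is a commonly used cutoff that indicates a reasonably good, but not extremely significant, alignment.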

Correct answer is: A moderately significant match

Q.105 Which of the following is a key component of a 'variant annotation pipeline'?

Variant calling

Functional impact prediction

Both A and B

Data compression

Explanation - Annotation pipelines take called variants and predict their biological effects.

Correct answer is: Both A and B

Q.106 In RNA‑seq, what does the term 'TPM' stand for?

Transcripts Per Million

Transcripts Per Microarray

Transcripts Per Metagene

Transcripts Per Match

Explanation - TPM normalizes read counts for transcript length and library size.

Correct answer is: Transcripts Per Million
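To make the normalization in Q.106 concrete, here is a minimal TPM calculation. It assumes one read count and one transcript length (in bp) per transcript; the function name tpm and the toy numbers are illustrative.

```python
def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts Per Million from read counts and transcript lengths."""
    # Step 1: normalize for transcript length (reads per kilobase).
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: normalize for library size so that values sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


tpm_values = tpm([10, 20, 30], [1000, 2000, 3000])
print([round(v, 1) for v in tpm_values])
```

In this toy case the three transcripts have the same read density per kilobase, so they receive the same TPM value even though their raw counts differ.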

Q.107 Which of the following describes a 'de Bruijn graph' edge?

A nucleotide

A k‑mer

An overlap between two k‑mers

A read fragment

Explanation - Edges represent (k‑1)-mer overlaps between adjacent k‑mers.

Correct answer is: An overlap between two k‑mers
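The sketch below follows the convention used in this question set, where vertices are k‑mers and an edge joins two k‑mers that overlap by k‑1 bases. Other texts use (k‑1)-mers as vertices and k‑mers as edges. The function name build_de_bruijn_graph is hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the k-mers that follow it in some read."""
    edges: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        # Consecutive k-mers in a read share k-1 bases, which defines an edge.
        for left, right in zip(kmers, kmers[1:]):
            edges[left].add(right)
    return edges


graph = build_de_bruijn_graph(["ATGCA", "TGCAT"], 3)
for source in sorted(graph):
    print(source, "->", sorted(graph[source]))
```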

Q.108 Which of the following is a common method for correcting sequencing errors in Illumina data?

Error‑correcting HMMs

Quasi‑Monte Carlo simulation

PCR amplification

Optical readout

Explanation - HMMs can be trained to distinguish true variants from errors.

Correct answer is: Error‑correcting HMMs

Q.109 What is the role of a 'phred quality score' in sequencing data?

Indicates the probability of a base call error

Measures the length of the read

Denotes the GC content

Shows the alignment score

Explanation - A Phred score Q encodes the estimated error probability P of a base call as Q = -10 log10(P), so Q30 corresponds to a 1 in 1000 chance that the base is wrong.

Correct answer is: Indicates the probability of a base call error
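The relationship between Phred scores and error probabilities in Q.109 can be sketched as below. The decoding helper assumes the Phred+33 ASCII offset used in current FASTQ files; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred score Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


print(phred_to_error_prob(30))
print(decode_fastq_quality("I5!"))
```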

Q.110 Which of the following best describes the 'BLASTN' program?

Protein‑protein alignment

DNA‑DNA local alignment

RNA‑RNA alignment

Protein‑DNA alignment

Explanation - BLASTN aligns nucleotide sequences locally.

Correct answer is: DNA‑DNA local alignment

Q.111 What is the typical length range of a 'read' in Illumina NovaSeq sequencing?

50–100 bp

150–300 bp

500–1000 bp

10–20 kb

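Explanation - Illumina NovaSeq runs typically produce paired‑end reads of about 150 bp each, with up to 250 bp available on some configurations, which makes 150–300 bp the closest range.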

Correct answer is: 150–300 bp

Q.112 Which of the following is NOT a feature of a 'cDNA library'?

Derived from mRNA

Contains full‑length transcripts

Includes non‑coding RNAs

Requires reverse transcription

Explanation - Traditional cDNA libraries capture only coding mRNA, not non‑coding RNA.

Correct answer is: Includes non‑coding RNAs

Q.113 In a de Bruijn graph, what is the significance of 'bubbles'?

Represent sequencing errors

Indicate alternative paths due to polymorphisms

Mark start and end of reads

Signal GC‑rich regions

Explanation - Bubbles arise from divergent k‑mer paths, often reflecting SNPs or indels.

Correct answer is: Indicate alternative paths due to polymorphisms

Q.114 Which of the following is a standard method for normalizing microarray data?

Quantile normalization

Standard deviation scaling

Z‑score transformation

All of the above

Explanation - Quantile normalization ensures that the distribution of probe intensities is identical across arrays.

Correct answer is: Quantile normalization
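A small sketch of quantile normalization as described in Q.114 follows. It assumes a matrix given as a list of rows (probes) with one column per array, and it breaks ties by row order rather than averaging tied ranks, which is a simplification. The function name quantile_normalize is illustrative.

```python
def quantile_normalize(matrix: list[list[float]]) -> list[list[float]]:
    """Force every column (array) to share the same value distribution."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the values at each rank across arrays.
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        # Replace each value by the reference value of its rank.
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


raw = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
normalized = quantile_normalize(raw)
for row in normalized:
    print([round(v, 2) for v in row])
```

After normalization, sorting any column yields the same list of values, which is the property the explanation refers to.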

Q.115 In the context of genome annotation, what does 'gene prediction' involve?

Identifying potential coding regions

Assigning functional annotations

Both A and B

None of the above

Explanation - Gene prediction identifies gene models and may annotate their functions.

Correct answer is: Both A and B

Q.116 Which algorithm is commonly used for aligning long reads to a reference genome?

BLAST

BWA‑MEM

Bowtie

MAFFT

Explanation - BWA‑MEM efficiently handles long reads and split alignments.

Correct answer is: BWA‑MEM

Q.117 What does 'NGS' stand for?

Next Generation Sequencing

New Genome System

Non‑Gaseous Sequencing

Nucleotide Grafting Sequence

Explanation - NGS refers to modern high‑throughput sequencing technologies.

Correct answer is: Next Generation Sequencing

Q.118 Which of the following is a measure of the similarity between two sequences?

Alignment score

GC content

Read length

Sequencing depth

Explanation - Alignment score quantifies how similar two sequences are after alignment.

Correct answer is: Alignment score

Q.119 Which of the following best describes 'base‑calling' in sequencing?

Converting raw signal data into nucleotide sequences

Aligning reads to a reference genome

Annotating genes

Assembling contigs

Explanation - Base‑calling interprets detector signals into DNA bases.

Correct answer is: Converting raw signal data into nucleotide sequences

Q.120 In RNA‑seq data analysis, what is the purpose of 'normalizing for library size'?

To correct for varying sequencing depths between samples

To reduce read length

To remove low‑quality reads

To adjust GC content

Explanation - Library size normalization ensures comparability of expression across samples.

Correct answer is: To correct for varying sequencing depths between samples
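One simple form of library size normalization from Q.120 is counts per million (CPM). The sketch below is illustrative only; differential expression packages typically use more robust scaling factors. The function name counts_per_million and the sample values are assumptions.

```python
def counts_per_million(counts: list[float]) -> list[float]:
    """Scale raw counts of one sample so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total * 1e6 for c in counts]


# Two samples with the same expression profile but different depths.
sample_a = counts_per_million([100, 300, 600])
sample_b = counts_per_million([200, 600, 1200])
print(sample_a)
print(sample_b)
```

Sample b was sequenced twice as deeply as sample a, yet both give the same CPM values, which is the comparability the explanation describes.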

Q.121 Which of the following is an example of a 'sequence alignment tool'?

SAMtools

BLAST

MAFFT

All of the above

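Explanation - BLAST performs local alignment searches and MAFFT performs multiple sequence alignment. SAMtools does not align reads itself but processes alignment files (SAM/BAM), so it counts here as a related tool.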

Correct answer is: All of the above

Q.122 Which of the following best describes a 'gene ontology (GO)' annotation?

A classification of genes into biological processes, cellular components, and molecular functions

A measure of gene expression levels

A method for sequencing DNA

An alignment scoring matrix

Explanation - GO provides standardized vocabularies for gene functions.

Correct answer is: A classification of genes into biological processes, cellular components, and molecular functions

Q.123 What is the purpose of a 'reference genome' in sequencing?

To provide a template for read alignment

To serve as a database for gene functions

To replace the need for sequencing

To generate random reads

Explanation - The reference genome guides mapping of sequencing reads.

Correct answer is: To provide a template for read alignment

Q.124 Which of the following best describes the 'Smith‑Waterman' algorithm?

A global alignment algorithm

A local alignment algorithm

A multiple sequence alignment algorithm

A phylogenetic tree construction algorithm

Explanation - Smith‑Waterman performs optimal local alignment between sequence segments.

Correct answer is: A local alignment algorithm
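The local alignment recurrence behind Q.124 can be sketched as a score-only dynamic program. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary choices for illustration, and the function name smith_waterman_score is hypothetical; traceback is omitted to keep the example short.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # The floor at 0 lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

The two example sequences share only the segment ACG, so the best local score comes from aligning those three bases.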

Q.125 Which of the following is a common type of sequencing error?

Insertion

Deletion

Substitution

All of the above

Explanation - NGS platforms can produce insertions, deletions, and substitutions.

Correct answer is: All of the above

Q.126 In the context of DNA sequencing, what is a 'read'?

A short DNA fragment sequenced by a platform

A long contiguous sequence assembled from reads

A reference genome

An alignment score

Explanation - Reads are the output units of sequencing machines.

Correct answer is: A short DNA fragment sequenced by a platform

Q.127 Which of the following is a widely used tool for de‑novo transcriptome assembly?

Trinity

SPAdes

BWA

SAMtools

Explanation - Trinity assembles RNA‑seq reads into transcript contigs.

Correct answer is: Trinity

Q.128 What is the main purpose of 'phasing' in genomics?

To determine the haplotype of a sample

To align reads to a reference genome

To trim adapters

To compress data

Explanation - Phasing assigns variants to specific haplotypes.

Correct answer is: To determine the haplotype of a sample

Q.129 Which of the following is a method for detecting differential methylation?

Bisulfite sequencing

RNA‑seq

ChIP‑seq

ATAC‑seq

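Explanation - Bisulfite treatment converts unmethylated cytosines to uracil (read as thymine) while methylated cytosines are protected, so sequencing reveals the methylation status of each cytosine.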

Correct answer is: Bisulfite sequencing

Q.130 What does the 'C' in 'CIGAR' stand for?

Complementary

Count

Compact

Operations

Explanation - CIGAR stands for Compact Idiosyncratic Gapped Alignment Report. The string encodes alignment operations such as matches, mismatches, insertions, and deletions.

Correct answer is: Compact
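A CIGAR string such as 3M1I2D4M is a run-length list of alignment operations. The sketch below parses one and computes how many reference bases the alignment spans, using the operation letters defined in the SAM specification; the function names parse_cigar and reference_span are illustrative, and the unavailable value '*' is not handled.

```python
import re

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than valid tokens.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in tokens]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered (operations M, D, N, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


print(parse_cigar("3M1I2D4M"))
print(reference_span("3M1I2D4M"))
```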

Q.131 Which of the following is a characteristic of a 'reference‑based assembly'?

Uses a known genome to guide assembly

Does not require a reference

Only assembles short reads

Requires a complete de‑novo approach

Explanation - Reference‑based assembly maps reads onto an existing genome sequence.

Correct answer is: Uses a known genome to guide assembly

Q.132 Which of the following is NOT a type of 'mRNA sequencing' method?

Poly‑A capture

Ribosomal RNA depletion

Whole‑genome sequencing

Stranded library preparation

Explanation - Whole‑genome sequencing is not focused on mRNA.

Correct answer is: Whole‑genome sequencing

Q.133 In a multiple sequence alignment, what does a 'conserved region' indicate?

High variability across sequences

Low sequence similarity

A region that is highly similar across sequences

An error in alignment

Explanation - Conserved regions suggest functional or evolutionary importance.

Correct answer is: A region that is highly similar across sequences

Q.134 Which of the following best describes a 'de Bruijn graph' vertex?

A single base

A k‑mer

An overlap of k‑mers

A read fragment

Explanation - Vertices represent k‑mers in the graph.

Correct answer is: A k‑mer

Q.135 Which of the following is a typical output of a differential expression analysis?

A list of genes with adjusted p‑values and fold changes

A heat map of sequencing reads

A phylogenetic tree

A DNA sequence alignment

Explanation - Differential expression outputs statistical metrics for each gene.

Correct answer is: A list of genes with adjusted p‑values and fold changes

Q.136 In genome assembly, what is a 'contig'?

A single read

A sequence of overlapping reads assembled into a longer sequence

A type of alignment score

A type of base quality score

Explanation - Contigs are contiguous sequences produced from read assembly.

Correct answer is: A sequence of overlapping reads assembled into a longer sequence

Q.137 Which of the following best describes 'k‑means clustering'?

A method for aligning sequences

A supervised learning algorithm

An unsupervised clustering technique that partitions data into k groups

A phylogenetic tree reconstruction method

Explanation - k‑means groups data points into clusters based on similarity.

Correct answer is: An unsupervised clustering technique that partitions data into k groups
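The partitioning described in Q.137 can be sketched with a plain-Python version of Lloyd's iteration. The starting centroids are passed in explicitly so the example is deterministic; real implementations usually choose them randomly or with k-means++. The function name kmeans and the toy points are assumptions.

```python
Point = tuple[float, ...]


def kmeans(points: list[Point], centroids: list[Point],
           iterations: int = 10) -> tuple[list[Point], list[list[Point]]]:
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters: list[list[Point]] = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
```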

Q.138 What does the 'SAM' format store?

Sequence alignments in plain text

Compressed genomic sequences

Raw sequencing reads

Protein structures

Explanation - SAM is a text format for storing read alignments.

Correct answer is: Sequence alignments in plain text

Q.139 Which of the following is a common method for visualizing read coverage?

Heat map

Bar chart

Line plot

All of the above

Explanation - Coverage is often displayed as a line, heat map, or bar chart.

Correct answer is: All of the above

Q.140 Which of the following best describes an 'alignment score' in sequence alignment?

A measure of the number of mismatches

A measure of similarity between two sequences after alignment

The number of gaps in the alignment

The length of the alignment

Explanation - Alignment scores reflect how well two sequences align.

Correct answer is: A measure of similarity between two sequences after alignment

Q.141 Which of the following is a characteristic of a 'gene expression matrix'?

Rows are genes, columns are samples, and values are expression levels

Rows are samples, columns are genes, and values are base quality scores

Rows are chromosomes, columns are positions, and values are GC content

Rows are reads, columns are bases, and values are read lengths

Explanation - Gene expression matrices store quantified expression values.

Correct answer is: Rows are genes, columns are samples, and values are expression levels

Q.142 Which of the following best describes a 'phylogenetic tree'?

A diagram showing evolutionary relationships between organisms

A list of genes with functional annotations

A sequence alignment of proteins

A statistical summary of sequencing quality

Explanation - Phylogenetic trees represent evolutionary histories.

Correct answer is: A diagram showing evolutionary relationships between organisms

Q.143 What is the main purpose of 'adapter trimming' in sequencing data preprocessing?

To remove artificial sequences that were ligated during library preparation

To trim low‑quality bases at read ends

To align reads to a reference genome

To compress the data file

Explanation - Adapter trimming eliminates artificial sequences that interfere with alignment.

Correct answer is: To remove artificial sequences that were ligated during library preparation

Q.144 Which of the following is a common approach to correct sequencing errors in Illumina data?

Error‑correcting HMMs

Quasi‑Monte Carlo simulation

PCR amplification

Optical readout

Explanation - HMMs can be trained to distinguish true variants from errors.

Correct answer is: Error‑correcting HMMs

Q.145 In the context of RNA‑seq, what does the term 'TPM' stand for?

Transcripts Per Million

Transcripts Per Microarray

Transcripts Per Metagene

Transcripts Per Match

Explanation - TPM normalizes read counts for transcript length and library size.

Correct answer is: Transcripts Per Million

Q.146 Which of the following best describes a 'gene ontology (GO)' annotation?

A classification of genes into biological processes, cellular components, and molecular functions

A measure of gene expression levels

A method for sequencing DNA

An alignment scoring matrix

Explanation - GO provides standardized vocabularies for gene functions.

Correct answer is: A classification of genes into biological processes, cellular components, and molecular functions

Q.147 Which of the following is NOT a common type of sequencing error?

Insertion

Deletion

Substitution

Amplification

Explanation - Amplification is a laboratory step (for example PCR), not a type of base‑call error.

Correct answer is: Amplification

Q.148 What does the term 'de‑novo assembly' refer to in genomics?

Assembly using a reference genome

Assembly without any reference information

Assembly of RNA transcripts only

Assembly of protein sequences only

Explanation - De‑novo assembly reconstructs a genome solely from sequencing reads.

Correct answer is: Assembly without any reference information

Q.149 Which of the following is NOT a typical output of a BLAST search?

Alignment score

E‑value

Coverage statistics

Protein tertiary structure

Explanation - BLAST reports alignment metrics, not 3D structures.

Correct answer is: Protein tertiary structure