Genome Assembly and Annotation: MCQs Practice Set

Q.1 What is a genome?

All genes in an organism
The complete set of DNA in an organism
Only the protein-coding genes
The set of proteins in a cell
Explanation - A genome comprises all genetic material, including coding, non‑coding, regulatory, and repetitive sequences, present in an organism.
Correct answer is: The complete set of DNA in an organism

Q.2 Which of the following is a genome?

The DNA of a cell
The RNA of a cell
The protein of a cell
The fat in a cell
Explanation - The genome refers specifically to the complete DNA sequence within a cell, not RNA, proteins, or lipids.
Correct answer is: The DNA of a cell

Q.3 Which organism has a small genome?

Human
E. coli
Mouse
Fruit fly
Explanation - E. coli has a genome of about 4.6 Mb, which is much smaller than the genomes of humans (~3 Gb) or mice (~2.7 Gb).
Correct answer is: E. coli

Q.4 What is DNA sequencing?

Reading the order of DNA letters
Measuring cell size
Counting proteins
Identifying blood type
Explanation - DNA sequencing determines the exact sequence of nucleotides (A, T, C, G) in a DNA molecule.
Correct answer is: Reading the order of DNA letters

Q.5 Which method uses tiny pieces of DNA to read the genome?

PCR
Sanger sequencing
Next‑Generation Sequencing (NGS)
Microscopy
Explanation - NGS technologies sequence millions of small DNA fragments simultaneously, enabling rapid genome sequencing.
Correct answer is: Next‑Generation Sequencing (NGS)

Q.6 What is a base pair?

Two proteins that pair
Two nucleotides that pair
Two cells that pair
Two RNA strands
Explanation - In DNA, bases pair through hydrogen bonds: A pairs with T and C pairs with G, forming a base pair.
Correct answer is: Two nucleotides that pair
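The pairing rule can be applied mechanically to derive the opposite strand. The following is a minimal sketch, not taken from the question set; the name `reverse_complement` is hypothetical and the code assumes the sequence contains only A, C, G, and T.

```python
# Watson-Crick pairing: A<->T and C<->G (assumes only A/C/G/T, either case).
_PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_PAIRS)[::-1]
```

For example, `reverse_complement("ATGC")` gives `GCAT`, and applying the function twice returns the original sequence.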

Q.7 What does the letter 'A' stand for in DNA?

Adenine
Adenosine
Amine
Adenoid
Explanation - In DNA, the four bases are adenine (A), thymine (T), cytosine (C), and guanine (G).
Correct answer is: Adenine

Q.8 What is a chromosome?

A segment of protein
A threadlike structure that carries DNA
A type of cell
A type of RNA
Explanation - Chromosomes are long DNA molecules complexed with proteins, organized within the cell nucleus.
Correct answer is: A threadlike structure that carries DNA

Q.9 What does a genome include?

Only protein‑coding genes
Only non‑coding RNA
All genes and non‑coding DNA
Only mitochondrial DNA
Explanation - The genome comprises coding genes, non‑coding genes, regulatory elements, repeats, and intergenic regions.
Correct answer is: All genes and non‑coding DNA

Q.10 Which of the following is NOT part of a genome?

DNA sequence
Proteins
Regulatory elements
Gene repeats
Explanation - Proteins are produced from the genome but are not part of the DNA sequence itself.
Correct answer is: Proteins

Q.11 Which of the following is a typical read length for Illumina sequencing?

100-150 bp
1-2 kb
50-100 kb
10-20 kb
Explanation - Illumina platforms usually produce reads around 100–150 bp in length, though longer options exist.
Correct answer is: 100-150 bp

Q.12 What does 'paired‑end sequencing' mean?

Two identical reads per fragment
Reads from both ends of a DNA fragment
Sequencing two samples at once
Sequencing pairs of nucleotides
Explanation - Paired‑end sequencing generates two reads, one from each end of the same DNA fragment, aiding in assembly and structural variant detection.
Correct answer is: Reads from both ends of a DNA fragment

Q.13 Why are repeats problematic for genome assembly?

They create extra genes
They cause sequencing errors
They confuse read placement
They have no effect
Explanation - Repetitive sequences make it difficult to determine the exact location of reads during assembly.
Correct answer is: They confuse read placement

Q.14 What does coverage refer to in sequencing?

Number of sequencing machines
Average number of times a base is read
Length of the read
Speed of sequencing
Explanation - Coverage (or depth) indicates how many reads overlap a particular base; higher coverage increases confidence.
Correct answer is: Average number of times a base is read
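Average coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below illustrates that arithmetic; the function names are hypothetical, and the estimate assumes reads are spread evenly and all map to the genome.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average depth (rounded up)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)
```

With a 4.6 Mb genome and 150 bp reads, `reads_needed(30, 150, 4_600_000)` gives 920,000 reads for 30x average depth.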

Q.15 Which tool is used for repeat masking?

BLAST
RepeatMasker
SAMtools
GATK
Explanation - RepeatMasker identifies and masks known repetitive elements in assembled sequences, typically before gene prediction and annotation.
Correct answer is: RepeatMasker

Q.16 What is a k‑mer?

A nucleotide pair
A sequence of k nucleotides
A type of read
A type of assembly graph
Explanation - A k‑mer is a substring of length k extracted from a DNA sequence, used in de Bruijn graph assembly.
Correct answer is: A sequence of k nucleotides
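As an illustration, k‑mers can be extracted with a sliding window of width k. This is a small sketch with hypothetical function names; it treats each strand as given and does not merge a k‑mer with its reverse complement, which real k‑mer counters often do.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return every substring of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 k-mers (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(seq, k))
```

For instance, `kmers("ATGCG", 3)` returns `["ATG", "TGC", "GCG"]`.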

Q.17 What is the main advantage of de Bruijn graph assembly?

Handles long reads well
Efficient for short reads
Requires no computational resources
Gives perfect assembly
Explanation - De Bruijn graphs assemble short reads by linking overlapping k‑mers, which avoids computing all pairwise read overlaps and so reduces computational demands.
Correct answer is: Efficient for short reads

Q.18 Which metric describes the length at which 50% of the genome is in contigs of that length or longer?

N50
L50
N90
L90
Explanation - N50 is a common contiguity metric: the contig length L such that contigs of length ≥ L together contain at least 50% of the total assembly length.
Correct answer is: N50
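The two related metrics among the options can be computed in one pass over the sorted contig lengths. The sketch below uses a hypothetical function name, `n50_l50`, and measures against the total assembly length rather than an estimated genome size.

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downward, first reaches half the assembly length.
    L50: how many contigs were needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: running total always reaches half")
```

For contig lengths 80, 70, 50, 40, 30, and 20 (total 290), the two longest contigs sum to 150, which passes the halfway mark of 145, so N50 is 70 and L50 is 2.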

Q.19 What is scaffolding in genome assembly?

Building protein models
Ordering contigs using mate‑pair information
Cutting DNA into pieces
Sequencing repeats
Explanation - Scaffolding arranges contigs into larger sequences (scaffolds) using long‑range linkage data.
Correct answer is: Ordering contigs using mate‑pair information

Q.20 Which of these is NOT a step in a typical genome annotation pipeline?

Repeat masking
Gene prediction
Protein folding
Functional annotation
Explanation - Annotation pipelines predict genes and annotate function; protein folding is a separate structural biology task.
Correct answer is: Protein folding

Q.21 What does 'de novo' assembly mean?

Using a reference genome
Assembling without a reference
Sequencing only known genes
Mapping to a transcriptome
Explanation - De novo assembly reconstructs a genome from reads without aligning to a known reference.
Correct answer is: Assembling without a reference

Q.22 Which assembler is optimized for long‑read data?

SPAdes
Velvet
Flye
SOAPdenovo
Explanation - Flye is designed to assemble genomes using long noisy reads such as those from PacBio or Nanopore.
Correct answer is: Flye

Q.23 What type of error is most common in Illumina sequencing?

Substitutions
Insertions
Deletions
All equal
Explanation - Illumina reads have a low error rate dominated by base substitutions, not indels.
Correct answer is: Substitutions

Q.24 Which alignment format is commonly used for mapping sequencing reads to a reference?

FASTQ
FASTA
SAM
BED
Explanation - SAM (Sequence Alignment/Map) is the standard format for storing read alignments to a reference.
Correct answer is: SAM
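Each SAM alignment line has 11 mandatory tab‑separated fields. As a rough sketch, they can be parsed as follows; the class and function names are hypothetical, optional tag fields are ignored, and header lines (which start with `@`) must be skipped by the caller.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment operations
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory fields of one alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    q, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(q, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)    # flag bit 0x4: segment unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)   # flag bit 0x10: reverse-complemented
```

In practice a library such as pysam or the `samtools` command line would normally be used instead of hand parsing.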

Q.25 What does the 'GATK' toolkit primarily do?

Assemble genomes
Call variants
Predict gene structures
Visualize reads
Explanation - GATK (Genome Analysis Toolkit) is widely used for variant discovery and genotyping from aligned reads.
Correct answer is: Call variants

Q.26 What is the purpose of a 'masking' step before annotation?

Remove low‑quality reads
Identify repetitive DNA
Convert DNA to RNA
Add tags to reads
Explanation - Masking flags repetitive elements to prevent erroneous gene predictions in repetitive regions.
Correct answer is: Identify repetitive DNA

Q.27 Which gene prediction method relies on hidden Markov models?

Exonerate
AUGUSTUS
BLAST
BLAT
Explanation - AUGUSTUS uses HMMs to model gene structure and predict coding sequences in eukaryotic genomes.
Correct answer is: AUGUSTUS

Q.28 What is a 'transcriptome'?

The entire DNA content
All expressed RNA
All proteins
All metabolites
Explanation - A transcriptome represents the full set of RNA molecules transcribed from a genome under specific conditions.
Correct answer is: All expressed RNA

Q.29 Which database is commonly used to annotate gene functions?

NCBI nr
UniProt
RefSeq
All of the above
Explanation - NCBI nr, UniProt, and RefSeq all provide protein sequences with functional annotations, curated to varying degrees, for use in annotation pipelines.
Correct answer is: All of the above

Q.30 What is the main purpose of functional annotation?

Find gene locations
Predict protein structure
Assign biological roles to genes
Sequence genomes faster
Explanation - Functional annotation links genes to pathways, processes, and molecular functions, often using GO terms.
Correct answer is: Assign biological roles to genes

Q.31 Which algorithmic approach does Velvet use for assembly?

Overlap‑layout‑consensus
de Bruijn graph
MapReduce
Hidden Markov model
Explanation - Velvet constructs a de Bruijn graph from short reads and resolves contigs through graph simplification.
Correct answer is: de Bruijn graph

Q.32 What is the impact of increasing k‑mer size in a de Bruijn graph assembly?

Reduces graph complexity
Increases error sensitivity
Both of the above
No effect
Explanation - Larger k‑mers simplify the graph but also make it more sensitive to sequencing errors and low coverage.
Correct answer is: Both of the above

Q.33 Which metric would you use to assess assembly quality regarding contiguity?

GC content
N50
Read length
Coverage depth
Explanation - N50 quantifies assembly contiguity; a higher N50 indicates longer contiguous sequences.
Correct answer is: N50

Q.34 Why might a genome assembly contain gaps?

Repetitive regions
Low coverage
Sequencing errors
All of the above
Explanation - Repeats, insufficient depth, and errors all can prevent assembly of continuous sequences, leaving gaps.
Correct answer is: All of the above

Q.35 What does the term 'contig' refer to?

A contiguous stretch of assembled sequence
A read from the sequencer
A type of library
A computational algorithm
Explanation - Contigs are continuous sequences produced by the assembly process without gaps.
Correct answer is: A contiguous stretch of assembled sequence

Q.36 Which of the following is an advantage of using PacBio HiFi reads for assembly?

Ultra‑long read lengths
Low error rates
Cheap cost
High throughput
Explanation - HiFi reads combine long read lengths with high accuracy, improving assembly quality.
Correct answer is: Low error rates

Q.37 What is the primary challenge in assembling highly heterozygous genomes?

High GC bias
Distinguishing alleles
Lack of reference
Low coverage
Explanation - Heterozygosity creates two divergent haplotypes that can be mistaken for separate loci during assembly.
Correct answer is: Distinguishing alleles

Q.38 Which tool is used for genome polishing after assembly with long reads?

Pilon
BWA
SAMtools
Kraken
Explanation - Pilon corrects base errors and small indels using high‑accuracy short reads after long‑read assembly.
Correct answer is: Pilon

Q.39 What is 'scaffold N50'?

N50 of scaffolds
N50 of contigs
N50 of reads
None
Explanation - Scaffold N50 measures the length where 50% of the assembly is contained in scaffolds of that length or longer.
Correct answer is: N50 of scaffolds

Q.40 In the context of assembly, what does 'coverage uniformity' refer to?

Even distribution of reads across the genome
Even read lengths
Even base quality
None
Explanation - Uniform coverage ensures no large low‑coverage regions, reducing assembly gaps and errors.
Correct answer is: Even distribution of reads across the genome

Q.41 Which assembly strategy is suitable for complex metagenomic samples?

Single‑cell assembly
Co‑assembly
de Bruijn graph
Overlap‑layout‑consensus
Explanation - Co‑assembly pools reads from multiple related samples, improving coverage for low‑abundance genomes.
Correct answer is: Co‑assembly

Q.42 What is the purpose of using mate‑pair libraries?

Increase read depth
Provide long‑range linking information
Reduce errors
None
Explanation - Mate‑pair libraries generate reads separated by long fragments, aiding scaffolding and structural variant detection.
Correct answer is: Provide long‑range linking information

Q.43 Which software is commonly used for polishing de novo assemblies with Illumina reads?

Racon
Pilon
Canu
BWA
Explanation - Pilon uses short high‑accuracy reads to correct errors in long‑read assemblies.
Correct answer is: Pilon

Q.44 What does a high GC content region imply for sequencing?

Easier to sequence
More likely to form secondary structures
No effect
Lower error rates
Explanation - GC‑rich regions form stable secondary structures that can impede polymerases, which tends to reduce coverage and accuracy in those regions.
Correct answer is: More likely to form secondary structures

Q.45 What is the difference between a 'scaffold' and a 'contig'?

Scaffold is longer
Scaffold may contain gaps
Contig includes gaps
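Scaffold is longer and may contain gaps
Explanation - Scaffolds are built from ordered, oriented contigs joined by gaps of unknown sequence (usually written as runs of N), so they are typically longer than the contigs they contain.
Correct answer is: Scaffold is longer and may contain gaps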
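To make the distinction concrete, here is a small sketch that joins ordered contigs into one scaffold, filling each estimated gap with `N`. The function name and the gap sizes are hypothetical; real scaffolders also decide contig orientation, which this sketch assumes has already been done.

```python
def build_scaffold(contigs: list[str], gap_sizes: list[int], gap_char: str = "N") -> str:
    """Join ordered, oriented contigs, inserting gap_char runs between them."""
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append(gap_char * gap)  # unknown bases between the contigs
        parts.append(contig)
    return "".join(parts)
```

For example, `build_scaffold(["ACGT", "GGCC"], [3])` returns `ACGTNNNGGCC`: the contigs themselves contain no gaps, while the scaffold does.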

Q.46 Which method can resolve repeat‑induced misassemblies?

Increased coverage
Long‑read sequencing
Using a reference genome
All of the above
Explanation - All three strategies help differentiate repeats and correctly join adjacent unique regions.
Correct answer is: All of the above

Q.47 What is a 'pseudo‑reference'?

An assembled genome used as reference
A simulated genome
A reference from a related species
None
Explanation - A pseudo‑reference is an assembly that serves as a reference for read mapping when a true reference is unavailable.
Correct answer is: An assembled genome used as reference

Q.48 In genome annotation, what is an 'ORF'?

Open reading frame
Outlined region fragment
Ordered repeat fragment
None
Explanation - An ORF is a stretch of DNA that could encode a protein, starting with a start codon and ending with a stop codon.
Correct answer is: Open reading frame
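A naive ORF scan following that definition can be sketched as below. The function name is hypothetical, only the three forward‑strand frames are scanned, and the standard stop codons TAA, TAG, and TGA are assumed; real gene finders also scan the reverse strand and apply length and scoring filters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon (ATG included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ORF in this frame
    return orfs
```

For the toy sequence `CCATGAAATAGGG`, the scan reports one ORF, `ATGAAATAG`, at positions 2 to 11.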

Q.49 Which of these tools is used for structural variant detection using long reads?

DELLY
SAMtools
Bowtie
RSEM
Explanation - DELLY can detect insertions, deletions, inversions, and translocations from paired‑end and long‑read data.
Correct answer is: DELLY

Q.50 What does BUSCO assess in genome assemblies?

Assembly speed
Presence of universal single‑copy orthologs
Read quality
Coverage
Explanation - BUSCO evaluates completeness by checking for highly conserved, single‑copy genes expected in a taxonomic group.
Correct answer is: Presence of universal single‑copy orthologs

Q.51 Which assembler is designed for hybrid assembly combining short and long reads?

SPAdes
Flye
MaSuRCA
Velvet
Explanation - MaSuRCA integrates both short‑read accuracy and long‑read continuity for hybrid genome assembly.
Correct answer is: MaSuRCA

Q.52 What is the main advantage of using Hi‑C data in genome assembly?

Provides long‑range chromatin contact information for scaffolding
Improves base‑calling accuracy
Reduces sequencing cost
None
Explanation - Hi‑C captures physical proximity of DNA segments, enabling chromosome‑level scaffolding.
Correct answer is: Provides long‑range chromatin contact information for scaffolding

Q.53 Which of the following best describes the 'error profile' of Oxford Nanopore sequencing?

Mostly substitutions
Mostly indels
Balanced errors
No errors
Explanation - Nanopore sequencing tends to produce indel errors more frequently than substitution errors.
Correct answer is: Mostly indels

Q.54 In a de Bruijn graph, what does a node represent?

k‑mers
Reads
Contigs
Scaffolds
Explanation - Each node corresponds to a unique k‑mer; edges connect overlapping k‑mers.
Correct answer is: k‑mers

Q.55 Which pipeline is widely used for eukaryotic genome annotation?

MAKER
Prokka
BWA
SAMtools
Explanation - MAKER integrates evidence from RNA‑seq, proteins, and ab initio predictions to produce high‑quality annotations.
Correct answer is: MAKER

Q.56 What is 'phasing' in the context of diploid genome assembly?

Sequencing both strands
Assigning alleles to haplotypes
Removing duplicates
None
Explanation - Phasing determines which variants co‑occur on the same chromosome copy, creating haplotype‑resolved assemblies.
Correct answer is: Assigning alleles to haplotypes

Q.57 Which tool can be used for aligning long reads to a reference genome?

BWA‑MEM
Minimap2
BLAST
Bowtie2
Explanation - Minimap2 is optimized for fast alignment of long noisy reads to a reference sequence.
Correct answer is: Minimap2

Q.58 What does the term 'contamination' refer to in genome sequencing?

Presence of foreign DNA sequences
High GC bias
Sequencing errors
None
Explanation - Contamination indicates DNA from other organisms or sources, which can mislead assembly and annotation.
Correct answer is: Presence of foreign DNA sequences

Q.59 Which sequencing platform is known for producing the longest reads?

Illumina
PacBio Sequel II
Ion Torrent
Roche 454
Explanation - PacBio Sequel II can generate continuous long reads up to 30 kb and beyond, surpassing other platforms.
Correct answer is: PacBio Sequel II

Q.60 In genome assembly, what is the purpose of 'error correction' of reads?

Remove adapter sequences
Correct base errors before assembly
Increase read length
None
Explanation - Error correction improves read quality, reducing misassemblies caused by sequencing errors.
Correct answer is: Correct base errors before assembly

Q.61 What does a high 'L50' value indicate?

Many large contigs
Many small contigs
High coverage
Low GC content
Explanation - L50 is the smallest number of contigs whose lengths together sum to at least 50% of the assembly; a high L50 means the assembly is fragmented into many small contigs.
Correct answer is: Many small contigs

Q.62 Which of the following best describes 'scaffolding errors'?

Incorrect contig ordering
Wrong base calling
Duplicate contigs
None
Explanation - Scaffolding errors occur when contigs are incorrectly ordered or oriented, leading to misrepresentations of genome structure.
Correct answer is: Incorrect contig ordering

Q.63 Which software is commonly used for repeat annotation in eukaryotic genomes?

RepeatMasker
Prokka
RAxML
ClustalW
Explanation - RepeatMasker identifies and masks known repetitive elements to prevent false gene predictions.
Correct answer is: RepeatMasker

Q.64 What is the purpose of 'gene ontology' (GO) terms in annotation?

Classify gene functions
Align reads
Assemble contigs
None
Explanation - GO provides a standardized vocabulary to describe gene product attributes across species.
Correct answer is: Classify gene functions

Q.65 Which assembly metric is most sensitive to misassemblies?

N50
L50
Number of contigs
All
Explanation - N50 can remain high, or even be inflated, when sequences are joined incorrectly; breaking contigs at misjoins raises the contig count, so it reflects the resulting fragmentation more directly.
Correct answer is: Number of contigs

Q.66 What does the 'coverage depth' of 30x imply for variant calling?

Each base is sequenced once
Each base is sequenced 30 times
30% of the genome is covered
None
Explanation - 30x depth means that, on average, each base has been read 30 times, improving variant confidence.
Correct answer is: Each base is sequenced 30 times

Q.67 Which tool is used to predict protein‑coding genes in bacterial genomes?

Prokka
MAKER
AUGUSTUS
Flye
Explanation - Prokka is tailored for rapid bacterial genome annotation, integrating gene prediction and functional annotation.
Correct answer is: Prokka

Q.68 What is the role of 'splice site prediction' in annotation?

Identify intron‑exon boundaries
Find repeats
Align reads
None
Explanation - Predicting splice sites helps delineate exonic and intronic regions in eukaryotic gene models.
Correct answer is: Identify intron‑exon boundaries

Q.69 Which of the following describes a 'de novo transcriptome assembly'?

Assembling RNA‑Seq reads without reference
Aligning reads to a reference transcriptome
Annotating genes
None
Explanation - De novo transcriptome assembly reconstructs transcript sequences directly from RNA‑Seq data.
Correct answer is: Assembling RNA‑Seq reads without reference

Q.70 What is the purpose of using 'RNA‑Seq data' in genome annotation?

Validate gene models
Estimate GC content
Increase read length
None
Explanation - RNA‑Seq provides transcript evidence to confirm and refine predicted gene structures.
Correct answer is: Validate gene models

Q.71 Which computational approach can resolve haplotypes in a highly heterozygous genome?

Trio binning
de Bruijn graph
Overlap‑layout‑consensus
BLAST
Explanation - Trio binning separates reads by parental origin, enabling phased diploid assembly.
Correct answer is: Trio binning

Q.72 What is the main benefit of using trio binning in diploid assembly?

Assign reads to parental haplotypes
Reduce computational load
Increase coverage
None
Explanation - By separating reads into haplotype‑specific bins, trio binning simplifies assembly of each parental genome.
Correct answer is: Assign reads to parental haplotypes

Q.73 Which algorithm is employed by the tool 'Canu' for long‑read assembly?

Overlap‑layout‑consensus
de Bruijn graph
Hidden Markov model
MapReduce
Explanation - Canu uses OLC, leveraging overlap information between long reads to assemble genomes.
Correct answer is: Overlap‑layout‑consensus

Q.74 Which metric assesses completeness of gene content using conserved single‑copy orthologs?

BUSCO
N50
GC%
L50
Explanation - BUSCO checks for expected universal single‑copy genes, giving a completeness score.
Correct answer is: BUSCO

Q.75 What is the main purpose of a 'variant effect predictor' (VEP)?

Annotate the functional impact of variants
Call variants
Assemble genomes
Align reads
Explanation - VEP predicts how genomic variants may affect gene function, such as missense or nonsense changes.
Correct answer is: Annotate the functional impact of variants

Q.76 Which sequencing technology provides 99% consensus accuracy after multiple passes?

Illumina
PacBio HiFi
Oxford Nanopore
Ion Torrent
Explanation - HiFi reads are consensus sequences built from multiple passes over the same circularized DNA molecule, yielding accuracy of 99% (Q20) or better.
Correct answer is: PacBio HiFi

Q.77 What is 'HiFi' in PacBio sequencing?

High‑fidelity long reads
High‑throughput Illumina
Hybrid assembly
None
Explanation - HiFi refers to long reads with high base‑calling accuracy, achieved by multiple sub‑read passes.
Correct answer is: High‑fidelity long reads

Q.78 Which of the following tools is used for structural variant detection using long reads?

Sniffles
GATK
RSEM
SAMtools
Explanation - Sniffles identifies large structural variants from long‑read alignments.
Correct answer is: Sniffles

Q.79 In genome assembly, what does 'polishing' refer to?

Refining consensus sequence
Removing contaminants
Increasing read length
None
Explanation - Polishing corrects base errors and small indels in an assembled sequence using high‑accuracy data.
Correct answer is: Refining consensus sequence

Q.80 Which method can detect copy number variations (CNVs) from sequencing data?

Depth‑of‑coverage analysis
de Bruijn graph
Read mapping
Repeat masking
Explanation - CNVs manifest as changes in read depth relative to the genome average.
Correct answer is: Depth‑of‑coverage analysis
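The idea can be sketched as a log2 ratio of each window's depth against the sample median. Everything here is illustrative: the function name and the thresholds (0.5 and -0.8) are hypothetical, and the input is assumed to be mean depths for fixed windows that have already been corrected for GC and mappability bias.

```python
import math
from statistics import median


def call_cnv_windows(depths: list[float], gain: float = 0.5, loss: float = -0.8) -> list[str]:
    """Label each window as 'gain', 'loss', or 'neutral' from its depth."""
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in depths:
        # log2 ratio: 0 means typical depth, +1 means doubled, -1 means halved.
        ratio = math.log2(depth / baseline) if depth > 0 else float("-inf")
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls
```

With window depths of 30, 31, 29, 60, 30, 14, and 30, the median is 30, so the window at depth 60 is labelled a gain and the window at depth 14 a loss.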

Q.81 What is 'chromosome conformation capture' (Hi‑C) used for?

Determine 3D genome organization
Sequence DNA
Predict gene expression
None
Explanation - Hi‑C measures physical interactions between chromosomal regions, aiding scaffold construction and studying 3D structure.
Correct answer is: Determine 3D genome organization

Q.82 Which assembly evaluation tool uses reference alignment to compute misassemblies?

QUAST
BLAST
SAMtools
Bowtie2
Explanation - QUAST compares an assembly to a reference, reporting misassemblies, gaps, and other metrics.
Correct answer is: QUAST

Q.83 What does 'karyotype' describe?

Chromosomal number and structure
Genome sequence
Read depth
None
Explanation - A karyotype shows the number, size, and shape of chromosomes in a species.
Correct answer is: Chromosomal number and structure

Q.84 Which approach is used to identify conserved non‑coding elements?

PhastCons
BLAST
Bowtie
SAMtools
Explanation - PhastCons uses phylogenetic models to detect conserved non‑coding DNA across multiple species.
Correct answer is: PhastCons

Q.85 What is the purpose of a 'gene prediction consensus model'?

Combine predictions from multiple tools
Align reads
Assemble contigs
None
Explanation - Consensus models integrate results from several gene predictors to improve accuracy.
Correct answer is: Combine predictions from multiple tools

Q.86 Which of the following is NOT a step in manual curation of genome annotation?

Reviewing predicted gene models
Checking functional annotations
Running BUSCO
Comparing with literature
Explanation - Manual curation involves inspecting predictions, not automated completeness checks like BUSCO.
Correct answer is: Running BUSCO

Q.87 What does the 'FASTA' file format contain?

Raw sequencing reads
Aligned reads
Sequence data with headers
Variant calls
Explanation - FASTA stores nucleotide or protein sequences preceded by header lines beginning with '>'.
Correct answer is: Sequence data with headers
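A minimal FASTA reader based on that layout might look like the sketch below. The function name is hypothetical; it keeps only the first word of each header as the record name and joins wrapped sequence lines. For real work a library such as Biopython is the usual choice.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record name (first word after '>') to its full sequence."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            header = parts[0] if parts else ""
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)  # sequences may wrap over many lines
    return {name: "".join(chunks) for name, chunks in records.items()}
```

For example, the text `>contig1 len=8`, `ACGT`, `TTGA`, `>contig2`, `GG` (one item per line) parses to `contig1` with sequence `ACGTTTGA` and `contig2` with sequence `GG`.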

Q.88 Which metric best indicates scaffold continuity?

GC%
Contig N50
Scaffold N50
Coverage depth
Explanation - Scaffold N50 reflects the continuity of assembled scaffolds, indicating long-range assembly quality.
Correct answer is: Scaffold N50

Q.89 In the context of metagenomics, what is a 'binning' approach?

Grouping contigs by taxonomy
Assigning reads to reference genomes
Sorting by GC%
None
Explanation - Binning clusters assembled contigs into bins that represent individual species or taxa.
Correct answer is: Grouping contigs by taxonomy

Q.90 Which pipeline integrates transcriptome evidence for improved gene annotation?

MAKER
Prokka
QUAST
RSEM
Explanation - MAKER incorporates RNA‑seq data to refine gene predictions and functional annotation.
Correct answer is: MAKER

Q.91 What is the benefit of using a phased assembly in population genomics?

Identifies haplotype‑specific variants
Improves read depth
Reduces errors
None
Explanation - Phased assemblies distinguish variants on each chromosome copy, aiding allele‑specific analyses.
Correct answer is: Identifies haplotype‑specific variants

Q.92 Which tool can be used for genome‑wide phylogenetic placement of a novel strain?

Mash
BLAST
Bowtie
SAMtools
Explanation - Mash rapidly estimates genomic distances using MinHash, enabling phylogenetic placement.
Correct answer is: Mash
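The MinHash idea can be illustrated with a bottom‑s sketch: keep the s smallest k‑mer hashes of each genome and estimate the Jaccard similarity from the merged sketches. This is a simplified sketch, not Mash's implementation: the function names are hypothetical, SHA‑1 stands in for the hash function, strands are not canonicalized, and the tiny default `k` and `size` values are only suitable for toy input.

```python
import hashlib


def sketch(seq: str, k: int = 5, size: int = 16) -> list[int]:
    """Return the `size` smallest distinct k-mer hashes of a sequence."""
    hashes = {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a: list[int], b: list[int], size: int = 16) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    # Take the smallest hashes of the union, then count those present in both.
    union_bottom = sorted(set(a) | set(b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

Identical sequences give an estimate of 1.0 and sequences with no k‑mers in common give 0.0; Mash converts such an estimate into a distance that can be used to place a strain among known genomes.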

Q.93 What is the purpose of a 'repeat library' in RepeatMasker?

Provide known repeat sequences for masking
Store gene annotations
Sequence reads
None
Explanation - The library contains consensus sequences of repeats to identify and mask them during annotation.
Correct answer is: Provide known repeat sequences for masking

Q.94 In a de Bruijn graph, what does an edge represent?

Overlap between k‑mers
Read
Contig
Scaffold
Explanation - Edges connect k‑mers that overlap by k‑1 bases, forming the graph structure.
Correct answer is: Overlap between k‑mers
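The node and edge definitions can be shown with a toy graph builder. This is a sketch under the convention used in these questions (nodes are k‑mers, edges join k‑mers overlapping by k‑1 bases); the function name is hypothetical, and only k‑mers that are adjacent within a read are linked. Some formulations instead use (k‑1)‑mers as nodes and k‑mers as edges.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of k-mers that follow it in some read."""
    if k <= 0:
        raise ValueError("k must be positive")
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            left = read[i:i + k]           # k-mer starting at position i
            right = read[i + 1:i + k + 1]  # next k-mer, sharing k-1 bases
            graph[left].add(right)
            graph.setdefault(right, set())  # make sure end nodes appear too
    return dict(graph)
```

For the single read `ATGGC` with k = 3, the graph is the path ATG → TGG → GGC.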

Q.95 Which of the following is an example of a structural variant?

SNP
Insertion
Indel
All of the above
Explanation - Structural variants are large changes (by convention about 50 bp or more) such as insertions, deletions, and inversions, whereas SNPs and small indels affect one or a few bases.
Correct answer is: Insertion

Q.96 What does 'GC skew' measure?

Difference in G and C distribution across a strand
GC content
Number of repeats
None
Explanation - GC skew quantifies the imbalance of G versus C bases, often revealing replication origins.
Correct answer is: Difference in G and C distribution across a strand
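GC skew is usually computed as (G − C) / (G + C), often in windows along the sequence. The sketch below shows that calculation; the function names are hypothetical, and windows without any G or C are assigned a skew of 0.0 by assumption.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; 0.0 if there is no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq: str, window: int) -> list[float]:
    """GC skew for consecutive non-overlapping windows (partial tail dropped)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]
```

For example, `windowed_gc_skew("GGGCCCCG", 4)` returns `[0.5, -0.5]`; in bacterial chromosomes a sign change of this kind along the genome is what points to the replication origin or terminus.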

Q.97 Which sequencing platform is best suited for detecting methylation directly?

Illumina
PacBio
Oxford Nanopore
Ion Torrent
Explanation - Nanopore sequencing senses base modifications, allowing direct methylation detection.
Correct answer is: Oxford Nanopore

Q.98 What is 'mate‑pair sequencing' used for?

Provide long‑range link information
Sequence short fragments
Reduce errors
None
Explanation - Mate‑pair libraries generate reads from the ends of long fragments, aiding scaffolding and structural analysis.
Correct answer is: Provide long‑range link information

Q.99 Which tool is used for comparative genome analysis across multiple species?

Mauve
BLAST
Bowtie
SAMtools
Explanation - Mauve aligns whole genomes and identifies large-scale rearrangements among species.
Correct answer is: Mauve

Q.100 What is the main goal of the 'Gene Ontology (GO)' consortium?

Provide a standardized vocabulary for gene function
Sequence genomes
Assemble genomes
None
Explanation - GO defines terms for biological processes, cellular components, and molecular functions across species.
Correct answer is: Provide a standardized vocabulary for gene function