Genome Assembly and Annotation: MCQs Practice Set

Q.1 What is a genome?

All genes in an organism

The complete set of DNA in an organism

Only the protein-coding genes

The set of proteins in a cell

Explanation - A genome comprises all genetic material, including coding, non‑coding, regulatory, and repetitive sequences, present in an organism.

Correct answer is: The complete set of DNA in an organism

Q.2 Which of the following makes up a cell's genome?

The DNA of a cell

The RNA of a cell

The protein of a cell

The fat in a cell

Explanation - The genome refers specifically to the complete DNA sequence within a cell, not RNA, proteins, or lipids.

Correct answer is: The DNA of a cell

Q.3 Which of the following organisms has the smallest genome?

Human

E. coli

Mouse

Fruit fly

Explanation - E. coli has a genome of about 4.6 Mb, far smaller than those of fruit flies (on the order of 100–200 Mb), mice (~2.7 Gb), or humans (~3 Gb).

Correct answer is: E. coli

Q.4 What is DNA sequencing?

Reading the order of DNA letters

Measuring cell size

Counting proteins

Identifying blood type

Explanation - DNA sequencing determines the exact sequence of nucleotides (A, T, C, G) in a DNA molecule.

Correct answer is: Reading the order of DNA letters

Q.5 Which method reads a genome by sequencing very large numbers of short DNA fragments in parallel?

PCR

Sanger sequencing

Next‑Generation Sequencing (NGS)

Microscopy

Explanation - NGS technologies sequence millions of small DNA fragments simultaneously, enabling rapid genome sequencing.

Correct answer is: Next‑Generation Sequencing (NGS)

Q.6 What is a base pair?

Two proteins that pair

Two nucleotides that pair

Two cells that pair

Two RNA strands

Explanation - In DNA, bases pair through hydrogen bonds: A pairs with T and C pairs with G, forming a base pair.

Correct answer is: Two nucleotides that pair
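
Worked example - The pairing rule in the explanation above (A with T, C with G) can be sketched in a few lines of Python. The names COMPLEMENT, complement, and reverse_complement are illustrative choices, and the sketch assumes the input contains only the four unambiguous bases.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Return the base that pairs with each position of seq."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    strand = "ATGCCG"
    print(complement(strand))          # TACGGC
    print(reverse_complement(strand))  # CGGCAT
```

Each position of the input and the corresponding position of the complement form one base pair, so a 6-base strand like the one above corresponds to 6 bp of double-stranded DNA.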

Q.7 What does the letter 'A' stand for in DNA?

Adenine

Adenosine

Amine

Adenoid

Explanation - In DNA, the four bases are adenine (A), thymine (T), cytosine (C), and guanine (G).

Correct answer is: Adenine

Q.8 What is a chromosome?

A segment of protein

A threadlike structure that carries DNA

A type of cell

A type of RNA

Explanation - Chromosomes are long DNA molecules complexed with proteins, organized within the cell nucleus.

Correct answer is: A threadlike structure that carries DNA

Q.9 What does a genome include?

Only protein‑coding genes

Only non‑coding RNA

All genes and non‑coding DNA

Only mitochondrial DNA

Explanation - The genome comprises coding genes, non‑coding genes, regulatory elements, repeats, and intergenic regions.

Correct answer is: All genes and non‑coding DNA

Q.10 Which of the following is NOT part of a genome?

DNA sequence

Proteins

Regulatory elements

Gene repeats

Explanation - Proteins are produced from the genome but are not part of the DNA sequence itself.

Correct answer is: Proteins

Q.11 Which of the following is a typical read length for Illumina sequencing?

100-150 bp

1-2 kb

50-100 kb

10-20 kb

Explanation - Illumina platforms usually produce reads around 100–150 bp in length, though longer options exist.

Correct answer is: 100-150 bp

Q.12 What does 'paired‑end sequencing' mean?

Two identical reads per fragment

Reads from both ends of a DNA fragment

Sequencing two samples at once

Sequencing pairs of nucleotides

Explanation - Paired‑end sequencing generates one read from each end of the same DNA fragment; the approximately known distance between the two reads aids assembly and structural variant detection.

Correct answer is: Reads from both ends of a DNA fragment

Q.13 Why are repeats problematic for genome assembly?

They create extra genes

They cause sequencing errors

They confuse read placement

They have no effect

Explanation - Repetitive sequences make it difficult to determine the exact location of reads during assembly.

Correct answer is: They confuse read placement

Q.14 What does coverage refer to in sequencing?

Number of sequencing machines

Average number of times a base is read

Length of the read

Speed of sequencing

Explanation - Coverage (or depth) indicates how many reads overlap a particular base; higher coverage increases confidence.

Correct answer is: Average number of times a base is read
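
Worked example - Average coverage is commonly estimated as total sequenced bases divided by genome size. The sketch below assumes uniform read length and ignores unmapped or duplicate reads; the function names are illustrative, and the inputs reuse the E. coli genome size (about 4.6 Mb) and Illumina read length (150 bp) quoted elsewhere in this set.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: int, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches target_depth on average."""
    total_bases = target_depth * genome_size
    return math.ceil(total_bases / read_length)


if __name__ == "__main__":
    genome = 4_600_000  # approximate E. coli genome size in bp
    print(reads_needed(30, 150, genome))        # 920000
    print(mean_coverage(920_000, 150, genome))  # 30.0
```

This is only an average: individual bases can sit well above or below it, which is why coverage uniformity matters as well as mean depth.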

Q.15 Which tool is used for repeat masking?

BLAST

RepeatMasker

SAMtools

GATK

Explanation - RepeatMasker identifies and masks known repetitive elements before annotation or assembly.

Correct answer is: RepeatMasker

Q.16 What is a k‑mer?

A nucleotide pair

A sequence of k nucleotides

A type of read

A type of assembly graph

Explanation - A k‑mer is a substring of length k extracted from a DNA sequence, used in de Bruijn graph assembly.

Correct answer is: A sequence of k nucleotides
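
Worked example - A minimal sketch of k‑mer extraction, assuming a plain string input and a sliding window that advances one base at a time. The function name kmers is an illustrative choice.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list:
    """Return every substring of length k, sliding one base at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(kmers("ATGCGA", 3))  # ['ATG', 'TGC', 'GCG', 'CGA']
    # Counting k-mers shows how repeats inflate the multiplicity of a k-mer.
    counts = Counter(kmers("ATATAT", 2))
    print(counts["AT"], counts["TA"])  # 3 2
```

A sequence of length n yields n - k + 1 k‑mers, so a 6-base sequence gives four 3‑mers.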

Q.17 What is the main advantage of de Bruijn graph assembly?

Handles long reads well

Efficient for short reads

Requires no computational resources

Gives perfect assembly

Explanation - De Bruijn graphs efficiently assemble short reads by overlapping k‑mers, reducing computational demands.

Correct answer is: Efficient for short reads

Q.18 Which metric describes the length at which 50% of the genome is in contigs of that length or longer?

N50

L50

N90

L90

Explanation - N50 is a common contiguity metric: the contig length such that contigs of that length or longer together contain at least 50% of the total assembly length.

Correct answer is: N50
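
Worked example - N50 and its companion L50 (another option listed in this question) can both be read off one pass over the contig lengths sorted from longest to shortest. This is a minimal sketch, assuming the total assembly length is used as the denominator; the function name n50_l50 and the example lengths are illustrative.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downward, first reaches half the assembly length.
    L50: how many contigs were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20, 10]  # total length 300
    print(n50_l50(contigs))  # (70, 2)
```

Here the two longest contigs (80 + 70 = 150) already cover half of the 300-base assembly, so N50 is 70 and L50 is 2.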

Q.19 What is scaffolding in genome assembly?

Building protein models

Ordering contigs using mate‑pair information

Cutting DNA into pieces

Sequencing repeats

Explanation - Scaffolding arranges contigs into larger sequences (scaffolds) using long‑range linkage data.

Correct answer is: Ordering contigs using mate‑pair information

Q.20 Which of these is NOT a step in a typical genome annotation pipeline?

Repeat masking

Gene prediction

Protein folding

Functional annotation

Explanation - Annotation pipelines predict genes and annotate function; protein folding is a separate structural biology task.

Correct answer is: Protein folding

Q.21 What does 'de novo' assembly mean?

Using a reference genome

Assembling without a reference

Sequencing only known genes

Mapping to a transcriptome

Explanation - De novo assembly reconstructs a genome from reads without aligning to a known reference.

Correct answer is: Assembling without a reference

Q.22 Which assembler is optimized for long‑read data?

SPAdes

Velvet

Flye

SOAPdenovo

Explanation - Flye is designed to assemble genomes using long noisy reads such as those from PacBio or Nanopore.

Correct answer is: Flye

Q.23 What type of error is most common in Illumina sequencing?

Substitutions

Insertions

Deletions

All equal

Explanation - Illumina reads have a low error rate dominated by base substitutions, not indels.

Correct answer is: Substitutions

Q.24 Which alignment format is commonly used for mapping sequencing reads to a reference?

FASTQ

FASTA

SAM

BED

Explanation - SAM (Sequence Alignment/Map) is the standard format for storing read alignments to a reference.

Correct answer is: SAM

Q.25 What does the 'GATK' toolkit primarily do?

Assemble genomes

Call variants

Predict gene structures

Visualize reads

Explanation - GATK (Genome Analysis Toolkit) is widely used for variant discovery and genotyping from aligned reads.

Correct answer is: Call variants

Q.26 What is the purpose of a 'masking' step before annotation?

Remove low‑quality reads

Identify repetitive DNA

Convert DNA to RNA

Add tags to reads

Explanation - Masking flags repetitive elements to prevent erroneous gene predictions in repetitive regions.

Correct answer is: Identify repetitive DNA

Q.27 Which gene prediction method relies on hidden Markov models?

Exonerate

AUGUSTUS

BLAST

BLAT

Explanation - AUGUSTUS uses HMMs to model gene structure and predict coding sequences in eukaryotic genomes.

Correct answer is: AUGUSTUS

Q.28 What is a 'transcriptome'?

The entire DNA content

All expressed RNA

All proteins

All metabolites

Explanation - A transcriptome represents the full set of RNA molecules transcribed from a genome under specific conditions.

Correct answer is: All expressed RNA

Q.29 Which database is commonly used to annotate gene functions?

NCBI nr

UniProt

RefSeq

All of the above

Explanation - NCBI nr, UniProt, and RefSeq all provide curated protein sequences and functional annotations for use in annotation pipelines.

Correct answer is: All of the above

Q.30 What is the main purpose of functional annotation?

Find gene locations

Predict protein structure

Assign biological roles to genes

Sequence genomes faster

Explanation - Functional annotation links genes to pathways, processes, and molecular functions, often using GO terms.

Correct answer is: Assign biological roles to genes

Q.31 Which algorithmic approach does Velvet use for assembly?

Overlap‑layout‑consensus

de Bruijn graph

MapReduce

Hidden Markov model

Explanation - Velvet constructs a de Bruijn graph from short reads and resolves contigs through graph simplification.

Correct answer is: de Bruijn graph

Q.32 What is the impact of increasing k‑mer size in a de Bruijn graph assembly?

Reduces graph complexity

Increases error sensitivity

Both a and b

No effect

Explanation - Larger k‑mers simplify the graph but also make it more sensitive to sequencing errors and low coverage.

Correct answer is: Both a and b

Q.33 Which metric would you use to assess assembly quality regarding contiguity?

GC content

N50

Read length

Coverage depth

Explanation - N50 quantifies assembly contiguity; a higher N50 indicates longer contiguous sequences.

Correct answer is: N50

Q.34 Why might a genome assembly contain gaps?

Repetitive regions

Low coverage

Sequencing errors

All of the above

Explanation - Repeats, insufficient depth, and errors all can prevent assembly of continuous sequences, leaving gaps.

Correct answer is: All of the above

Q.35 What does the term 'contig' refer to?

A contiguous stretch of assembled sequence

A read from the sequencer

A type of library

A computational algorithm

Explanation - Contigs are continuous sequences produced by the assembly process without gaps.

Correct answer is: A contiguous stretch of assembled sequence

Q.36 Which of the following is an advantage of using PacBio HiFi reads for assembly?

Ultra‑long read lengths

Low error rates

Cheap cost

High throughput

Explanation - HiFi reads combine long read lengths with high accuracy, improving assembly quality.

Correct answer is: Low error rates

Q.37 What is the primary challenge in assembling highly heterozygous genomes?

High GC bias

Distinguishing alleles

Lack of reference

Low coverage

Explanation - Heterozygosity creates two divergent haplotypes that can be mistaken for separate loci during assembly.

Correct answer is: Distinguishing alleles

Q.38 Which tool is used for genome polishing after assembly with long reads?

Pilon

BWA

SAMtools

Kraken

Explanation - Pilon corrects base errors and small indels using high‑accuracy short reads after long‑read assembly.

Correct answer is: Pilon

Q.39 What is 'scaffold N50'?

N50 of scaffolds

N50 of contigs

N50 of reads

None

Explanation - Scaffold N50 measures the length where 50% of the assembly is contained in scaffolds of that length or longer.

Correct answer is: N50 of scaffolds

Q.40 In the context of assembly, what does 'coverage uniformity' refer to?

Even distribution of reads across the genome

Even read lengths

Even base quality

None

Explanation - Uniform coverage ensures no large low‑coverage regions, reducing assembly gaps and errors.

Correct answer is: Even distribution of reads across the genome

Q.41 Which assembly strategy is suitable for complex metagenomic samples?

Single‑cell assembly

Co‑assembly

de Bruijn graph

Overlap‑layout‑consensus

Explanation - Co‑assembly pools reads from multiple related samples, improving coverage for low‑abundance genomes.

Correct answer is: Co‑assembly

Q.42 What is the purpose of using mate‑pair libraries?

Increase read depth

Provide long‑range linking information

Reduce errors

None

Explanation - Mate‑pair libraries generate reads separated by long fragments, aiding scaffolding and structural variant detection.

Correct answer is: Provide long‑range linking information

Q.43 Which software is commonly used for polishing de novo assemblies with Illumina reads?

Racon

Pilon

Canu

BWA

Explanation - Pilon uses short high‑accuracy reads to correct errors in long‑read assemblies.

Correct answer is: Pilon

Q.44 What does a high GC content region imply for sequencing?

Easier to sequence

More likely to form secondary structures

No effect

Lower error rates

Explanation - High GC regions form very stable duplexes and secondary structures that are hard to denature and amplify, which often leads to reduced or uneven coverage.

Correct answer is: More likely to form secondary structures

Q.45 What is the difference between a 'scaffold' and a 'contig'?

Scaffold is longer

Scaffold may contain gaps

Contig includes gaps

Both a and b

Explanation - Scaffolds are constructed from ordered contigs and may contain unknown bases (gaps), making them longer.

Correct answer is: Both a and b
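
Worked example - The relationship between scaffolds and contigs can be illustrated by cutting a scaffold back into its contigs. This sketch assumes gaps are written as runs of N, which is the usual convention in FASTA files; the function name split_scaffold and the min_gap parameter are illustrative.

```python
import re


def split_scaffold(scaffold: str, min_gap: int = 1) -> list:
    """Split a scaffold into contigs at runs of at least min_gap N characters."""
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in gap.split(scaffold) if piece]


if __name__ == "__main__":
    scaffold = "ACGTNNNNNGGCCNNTTAA"
    print(split_scaffold(scaffold))             # ['ACGT', 'GGCC', 'TTAA']
    print(split_scaffold(scaffold, min_gap=3))  # ['ACGT', 'GGCCNNTTAA']
```

The scaffold is longer than any contig inside it because it includes the gap characters as well as the contigs themselves.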

Q.46 Which method can resolve repeat‑induced misassemblies?

Increased coverage

Long‑read sequencing

Using a reference genome

All of the above

Explanation - All three strategies help differentiate repeats and correctly join adjacent unique regions.

Correct answer is: All of the above

Q.47 What is a 'pseudo‑reference'?

An assembled genome used as reference

A simulated genome

A reference from a related species

None

Explanation - A pseudo‑reference is an assembly that serves as a reference for read mapping when a true reference is unavailable.

Correct answer is: An assembled genome used as reference

Q.48 In genome annotation, what is an 'ORF'?

Open reading frame

Outlined region fragment

Ordered repeat fragment

None

Explanation - An ORF is a stretch of DNA that could encode a protein, starting with a start codon and ending with a stop codon.

Correct answer is: Open reading frame
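
Worked example - A minimal ORF scan following the definition above: start at ATG, read in steps of three, stop at the first in-frame stop codon. The sketch assumes the standard stop codons, searches only the three forward frames (a fuller scan would also check the reverse complement), and uses an illustrative function name and min_codons threshold.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (start, end, sequence) for ORFs on the forward strand.

    start is 0-based, end is exclusive, and min_codons counts the
    start and stop codons as part of the ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))  # [(2, 11, 'ATGAAATAG')]
```

Real gene finders add far more evidence than this (codon usage, splice sites, homology), so an ORF found this way is only a candidate coding region.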

Q.49 Which of these tools is used for structural variant detection using long reads?

DELLY

SAMtools

Bowtie

RSEM

Explanation - DELLY can detect insertions, deletions, inversions, and translocations from paired‑end and long‑read data.

Correct answer is: DELLY

Q.50 What does BUSCO assess in genome assemblies?

Assembly speed

Presence of universal single‑copy orthologs

Read quality

Coverage

Explanation - BUSCO evaluates completeness by checking for highly conserved, single‑copy genes expected in a taxonomic group.

Correct answer is: Presence of universal single‑copy orthologs

Q.51 Which assembler is designed for hybrid assembly combining short and long reads?

SPAdes

Flye

MaSuRCA

Velvet

Explanation - MaSuRCA integrates both short‑read accuracy and long‑read continuity for hybrid genome assembly.

Correct answer is: MaSuRCA

Q.52 What is the main advantage of using Hi‑C data in genome assembly?

Provides long‑range chromatin contact information for scaffolding

Improves base‑calling accuracy

Reduces sequencing cost

None

Explanation - Hi‑C captures physical proximity of DNA segments, enabling chromosome‑level scaffolding.

Correct answer is: Provides long‑range chromatin contact information for scaffolding

Q.53 Which of the following best describes the 'error profile' of Oxford Nanopore sequencing?

Mostly substitutions

Mostly indels

Balanced errors

No errors

Explanation - Nanopore sequencing tends to produce indel errors more frequently than substitution errors.

Correct answer is: Mostly indels

Q.54 In a de Bruijn graph, what does a node represent?

k‑mers

Reads

Contigs

Scaffolds

Explanation - Each node corresponds to a unique k‑mer; edges connect overlapping k‑mers.

Correct answer is: k‑mers

Q.55 Which pipeline is widely used for eukaryotic genome annotation?

MAKER

Prokka

BWA

SAMtools

Explanation - MAKER integrates evidence from RNA‑seq, proteins, and ab initio predictions to produce high‑quality annotations.

Correct answer is: MAKER

Q.56 What is 'phasing' in the context of diploid genome assembly?

Sequencing both strands

Assigning alleles to haplotypes

Removing duplicates

None

Explanation - Phasing determines which variants co‑occur on the same chromosome copy, creating haplotype‑resolved assemblies.

Correct answer is: Assigning alleles to haplotypes

Q.57 Which tool can be used for aligning long reads to a reference genome?

BWA‑MEM

Minimap2

BLAST

Bowtie2

Explanation - Minimap2 is optimized for fast alignment of long noisy reads to a reference sequence.

Correct answer is: Minimap2

Q.58 What does the term 'contamination' refer to in genome sequencing?

Presence of foreign DNA sequences

High GC bias

Sequencing errors

None

Explanation - Contamination indicates DNA from other organisms or sources, which can mislead assembly and annotation.

Correct answer is: Presence of foreign DNA sequences

Q.59 Which sequencing platform is known for producing the longest reads?

Illumina

PacBio Sequel II

Ion Torrent

Roche 454

Explanation - PacBio Sequel II can generate continuous long reads of 30 kb and beyond, far longer than reads from the other platforms listed here.

Correct answer is: PacBio Sequel II

Q.60 In genome assembly, what is the purpose of 'error correction' of reads?

Remove adapter sequences

Correct base errors before assembly

Increase read length

None

Explanation - Error correction improves read quality, reducing misassemblies caused by sequencing errors.

Correct answer is: Correct base errors before assembly

Q.61 What does a high 'L50' value indicate?

Many large contigs

Many small contigs

High coverage

Low GC content

Explanation - L50 is the smallest number of contigs whose lengths together sum to at least 50% of the assembly; a high L50 means the assembly is fragmented into many small contigs.

Correct answer is: Many small contigs

Q.62 Which of the following best describes 'scaffolding errors'?

Incorrect contig ordering

Wrong base calling

Duplicate contigs

None

Explanation - Scaffolding errors occur when contigs are incorrectly ordered or oriented, leading to misrepresentations of genome structure.

Correct answer is: Incorrect contig ordering

Q.63 Which software is commonly used for repeat annotation in eukaryotic genomes?

RepeatMasker

Prokka

RAxML

ClustalW

Explanation - RepeatMasker identifies and masks known repetitive elements to prevent false gene predictions.

Correct answer is: RepeatMasker

Q.64 What is the purpose of 'gene ontology' (GO) terms in annotation?

Classify gene functions

Align reads

Assemble contigs

None

Explanation - GO provides a standardized vocabulary to describe gene product attributes across species.

Correct answer is: Classify gene functions

Q.65 Which assembly metric is most sensitive to misassemblies?

N50

L50

Number of contigs

All

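Explanation - Contig count tracks fragmentation directly, whereas N50 can stay high, or even rise, when contigs are joined incorrectly; none of these metrics detects misassemblies on its own, so reference‑based checks such as QUAST are still needed.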

Correct answer is: Number of contigs

Q.66 What does the 'coverage depth' of 30x imply for variant calling?

Each base is sequenced once

Each base is sequenced 30 times

30% of the genome is covered

None

Explanation - 30x depth means that, on average, each base has been read 30 times, improving variant confidence.

Correct answer is: Each base is sequenced 30 times

Q.67 Which tool is used to predict protein‑coding genes in bacterial genomes?

Prokka

MAKER

AUGUSTUS

Flye

Explanation - Prokka is tailored for rapid bacterial genome annotation, integrating gene prediction and functional annotation.

Correct answer is: Prokka

Q.68 What is the role of 'splice site prediction' in annotation?

Identify intron‑exon boundaries

Find repeats

Align reads

None

Explanation - Predicting splice sites helps delineate exonic and intronic regions in eukaryotic gene models.

Correct answer is: Identify intron‑exon boundaries

Q.69 Which of the following describes a 'de novo transcriptome assembly'?

Assembling RNA‑Seq reads without reference

Aligning reads to a reference transcriptome

Annotating genes

None

Explanation - De novo transcriptome assembly reconstructs transcript sequences directly from RNA‑Seq data.

Correct answer is: Assembling RNA‑Seq reads without reference

Q.70 What is the purpose of using 'RNA‑Seq data' in genome annotation?

Validate gene models

Estimate GC content

Increase read length

None

Explanation - RNA‑Seq provides transcript evidence to confirm and refine predicted gene structures.

Correct answer is: Validate gene models

Q.71 Which computational approach can resolve haplotypes in a highly heterozygous genome?

Trio binning

de Bruijn graph

Overlap‑layout‑consensus

BLAST

Explanation - Trio binning separates reads by parental origin, enabling phased diploid assembly.

Correct answer is: Trio binning

Q.72 What is the main benefit of using trio binning in diploid assembly?

Assign reads to parental haplotypes

Reduce computational load

Increase coverage

None

Explanation - By separating reads into haplotype‑specific bins, trio binning simplifies assembly of each parental genome.

Correct answer is: Assign reads to parental haplotypes

Q.73 Which algorithm is employed by the tool 'Canu' for long‑read assembly?

Overlap‑layout‑consensus

de Bruijn graph

Hidden Markov model

MapReduce

Explanation - Canu uses OLC, leveraging overlap information between long reads to assemble genomes.

Correct answer is: Overlap‑layout‑consensus

Q.74 Which metric assesses completeness of gene content using conserved single‑copy orthologs?

BUSCO

N50

GC%

L50

Explanation - BUSCO checks for expected universal single‑copy genes, giving a completeness score.

Correct answer is: BUSCO

Q.75 What is the main purpose of a 'variant effect predictor' (VEP)?

Annotate the functional impact of variants

Call variants

Assemble genomes

Align reads

Explanation - VEP predicts how genomic variants may affect gene function, such as missense or nonsense changes.

Correct answer is: Annotate the functional impact of variants

Q.76 Which sequencing technology reaches at least 99% consensus accuracy by making multiple passes over the same molecule?

Illumina

PacBio HiFi

Oxford Nanopore

Ion Torrent

Explanation - HiFi reads are generated by multiple passes of the same DNA molecule, yielding high consensus accuracy.

Correct answer is: PacBio HiFi

Q.77 What is 'HiFi' in PacBio sequencing?

High‑fidelity long reads

High‑throughput Illumina

Hybrid assembly

None

Explanation - HiFi refers to long reads with high base‑calling accuracy, achieved by multiple sub‑read passes.

Correct answer is: High‑fidelity long reads

Q.78 Which of the following tools is used for structural variant detection using long reads?

Sniffles

GATK

RSEM

SAMtools

Explanation - Sniffles identifies large structural variants from long‑read alignments.

Correct answer is: Sniffles

Q.79 In genome assembly, what does 'polishing' refer to?

Refining consensus sequence

Removing contaminants

Increasing read length

None

Explanation - Polishing corrects base errors and small indels in an assembled sequence using high‑accuracy data.

Correct answer is: Refining consensus sequence

Q.80 Which method can detect copy number variations (CNVs) from sequencing data?

Depth‑of‑coverage analysis

de Bruijn graph

Read mapping

Repeat masking

Explanation - CNVs manifest as changes in read depth relative to the genome average.

Correct answer is: Depth‑of‑coverage analysis
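
Worked example - A minimal depth‑of‑coverage sketch: compare each window's mean depth with a genome-wide baseline and flag large deviations. Several choices here are assumptions for illustration only: the median is used as the baseline instead of the plain average because it is less affected by the CNVs themselves, the thresholds 1.5 and 0.5 correspond to a single-copy gain or loss in a diploid genome, and the function name and depth values are made up.

```python
from statistics import median


def flag_cnv_windows(depths, gain=1.5, loss=0.5):
    """Return (window_index, call, ratio) for windows with unusual depth.

    depths: mean read depth per fixed-size genomic window.
    ratio:  window depth divided by the median depth over all windows.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    calls = []
    for index, depth in enumerate(depths):
        ratio = depth / baseline
        if ratio >= gain:
            calls.append((index, "gain", ratio))
        elif ratio <= loss:
            calls.append((index, "loss", ratio))
    return calls


if __name__ == "__main__":
    window_depths = [30, 31, 29, 60, 30, 14, 30]
    for index, call, ratio in flag_cnv_windows(window_depths):
        print(index, call, round(ratio, 2))
    # 3 gain 2.0
    # 5 loss 0.47
```

Dedicated CNV callers also correct for GC bias and mappability and merge adjacent windows into segments, which this sketch does not attempt.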

Q.81 What is 'chromosome conformation capture' (Hi‑C) used for?

Determine 3D genome organization

Sequence DNA

Predict gene expression

None

Explanation - Hi‑C measures physical interactions between chromosomal regions, aiding scaffold construction and studying 3D structure.

Correct answer is: Determine 3D genome organization

Q.82 Which assembly evaluation tool uses reference alignment to compute misassemblies?

QUAST

BLAST

SAMtools

Bowtie2

Explanation - QUAST compares an assembly to a reference, reporting misassemblies, gaps, and other metrics.

Correct answer is: QUAST

Q.83 What does 'karyotype' describe?

Chromosomal number and structure

Genome sequence

Read depth

None

Explanation - A karyotype shows the number, size, and shape of chromosomes in a species.

Correct answer is: Chromosomal number and structure

Q.84 Which approach is used to identify conserved non‑coding elements?

PhastCons

BLAST

Bowtie

SAMtools

Explanation - PhastCons uses phylogenetic models to detect conserved non‑coding DNA across multiple species.

Correct answer is: PhastCons

Q.85 What is the purpose of a 'gene prediction consensus model'?

Combine predictions from multiple tools

Align reads

Assemble contigs

None

Explanation - Consensus models integrate results from several gene predictors to improve accuracy.

Correct answer is: Combine predictions from multiple tools

Q.86 Which of the following is NOT a step in manual curation of genome annotation?

Reviewing predicted gene models

Checking functional annotations

Running BUSCO

Comparing with literature

Explanation - Manual curation involves inspecting predictions, not automated completeness checks like BUSCO.

Correct answer is: Running BUSCO

Q.87 What does the 'FASTA' file format contain?

Raw sequencing reads

Aligned reads

Sequence data with headers

Variant calls

Explanation - FASTA stores nucleotide or protein sequences preceded by header lines beginning with '>'.

Correct answer is: Sequence data with headers
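
Worked example - A minimal FASTA reader, assuming well-formed input in which each record starts with a '>' header line and the sequence may be wrapped over several lines. The function name parse_fasta and the record names are illustrative.

```python
def parse_fasta(text: str) -> dict:
    """Map each header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">contig_1 length=8\nACGT\nACGT\n>contig_2\nGGCC\n"
    print(parse_fasta(example))
    # {'contig_1 length=8': 'ACGTACGT', 'contig_2': 'GGCC'}
```

Unlike FASTQ, FASTA carries no per-base quality values, which is why it is used for references and assemblies, not for raw reads.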

Q.88 Which metric best indicates scaffold continuity?

GC%

Contig N50

Scaffold N50

Coverage depth

Explanation - Scaffold N50 reflects the continuity of assembled scaffolds, indicating long-range assembly quality.

Correct answer is: Scaffold N50

Q.89 In the context of metagenomics, what is a 'binning' approach?

Grouping contigs by taxonomy

Assigning reads to reference genomes

Sorting by GC%

None

Explanation - Binning clusters assembled contigs into bins that represent individual species or taxa.

Correct answer is: Grouping contigs by taxonomy

Q.90 Which pipeline integrates transcriptome evidence for improved gene annotation?

MAKER

Prokka

QUAST

RSEM

Explanation - MAKER incorporates RNA‑seq data to refine gene predictions and functional annotation.

Correct answer is: MAKER

Q.91 What is the benefit of using a phased assembly in population genomics?

Identifies haplotype‑specific variants

Improves read depth

Reduces errors

None

Explanation - Phased assemblies distinguish variants on each chromosome copy, aiding allele‑specific analyses.

Correct answer is: Identifies haplotype‑specific variants

Q.92 Which tool can be used for genome‑wide phylogenetic placement of a novel strain?

Mash

BLAST

Bowtie

SAMtools

Explanation - Mash rapidly estimates genomic distances using MinHash, enabling phylogenetic placement.

Correct answer is: Mash
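
Worked example - The MinHash idea behind Mash can be sketched as follows: hash every k‑mer, keep only the smallest hash values as a sketch, and estimate the Jaccard similarity of two genomes from their sketches. This is a simplified illustration and not Mash's implementation: it uses SHA‑1 from the standard library as a stand-in hash, ignores reverse-complement (canonical) k‑mers, and all function names and parameter defaults are assumptions. The distance formula is the one published for Mash, D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def sketch(seq: str, k: int = 5, size: int = 16) -> list:
    """Bottom-`size` sketch: the smallest hash values among all k-mers."""
    kmer_set = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmer_set
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a: list, b: list, size: int = 16) -> float:
    """Fraction of the smallest hashes of the merged sketches found in both."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard: float, k: int) -> float:
    """Convert a Jaccard estimate j into a Mash-style distance."""
    if jaccard <= 0:
        return 1.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k


if __name__ == "__main__":
    genome_a = "ATGCGTACGTTAGCATGCATCGATCGATGCA"
    genome_b = "ATGCGTACGTTAGCATGCATCGTTCGATGCA"  # one substitution
    j = jaccard_estimate(sketch(genome_a), sketch(genome_b))
    print(j, mash_distance(j, k=5))
```

Identical inputs give a Jaccard estimate of 1.0 and a distance of 0, while sequences with no shared k‑mers give an estimate of 0.0 and the capped distance of 1.0.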

Q.93 What is the purpose of a 'repeat library' in RepeatMasker?

Provide known repeat sequences for masking

Store gene annotations

Sequence reads

None

Explanation - The library contains consensus sequences of repeats to identify and mask them during annotation.

Correct answer is: Provide known repeat sequences for masking

Q.94 In a de Bruijn graph, what does an edge represent?

Overlap between k‑mers

Read

Contig

Scaffold

Explanation - Edges connect k‑mers that overlap by k‑1 bases, forming the graph structure.

Correct answer is: Overlap between k‑mers
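
Worked example - A minimal sketch of the graph described above, with k‑mers as nodes and an edge wherever two k‑mers that are adjacent in a read overlap by k‑1 bases. It assumes error-free reads and a single strand; the function name de_bruijn and the example read are illustrative.

```python
def de_bruijn(reads, k: int) -> dict:
    """Build a node-centric de Bruijn graph as {k-mer: set of successor k-mers}."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k):
            left = read[i:i + k]           # k-mer starting at position i
            right = read[i + 1:i + k + 1]  # next k-mer, shifted by one base
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set())  # make sure the last k-mer is a node
    return graph


if __name__ == "__main__":
    graph = de_bruijn(["ATGGCG"], k=3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
    # ATG -> ['TGG']
    # GCG -> []
    # GGC -> ['GCG']
    # TGG -> ['GGC']
```

Following the edges from ATG spells out ATG, TGG, GGC, GCG, which reconstructs the original read ATGGCG by appending the last base of each successive k‑mer.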

Q.95 Which of the following is an example of a structural variant?

SNP

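Large insertion (50 bp or more)

Small indel (a few bases)

All of the above

Explanation - Structural variants are large changes, conventionally about 50 bp or more, such as insertions, deletions, and inversions, whereas SNPs and small indels affect only one or a few bases.

Correct answer is: Large insertion (50 bp or more)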

Q.96 What does 'GC skew' measure?

Difference in G and C distribution across a strand

GC content

Number of repeats

None

Explanation - GC skew quantifies the imbalance of G versus C bases, often revealing replication origins.

Correct answer is: Difference in G and C distribution across a strand
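
Worked example - GC skew is usually computed per window as (G - C) / (G + C). The sketch below assumes non-overlapping windows and returns 0.0 for a window with no G or C; the function name gc_skew and the example sequence are illustrative.

```python
def gc_skew(seq: str, window: int) -> list:
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    if window <= 0:
        raise ValueError("window must be positive")
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


if __name__ == "__main__":
    print(gc_skew("GGGCATCCCG", window=5))  # [0.5, -0.5]
```

A change of sign along the sequence, as between the two windows here, is the kind of signal that is followed along a bacterial chromosome when looking for a replication origin.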

Q.97 Which sequencing platform is best suited for detecting methylation directly?

Illumina

PacBio

Oxford Nanopore

Ion Torrent

Explanation - Nanopore sequencing senses base modifications, allowing direct methylation detection.

Correct answer is: Oxford Nanopore

Q.98 What is 'mate‑pair sequencing' used for?

Provide long‑range link information

Sequence short fragments

Reduce errors

None

Explanation - Mate‑pair libraries generate reads from the ends of long fragments, aiding scaffolding and structural analysis.

Correct answer is: Provide long‑range link information

Q.99 Which tool is used for comparative genome analysis across multiple species?

Mauve

BLAST

Bowtie

SAMtools

Explanation - Mauve aligns whole genomes and identifies large-scale rearrangements among species.

Correct answer is: Mauve

Q.100 What is the main goal of the 'Gene Ontology (GO)' consortium?

Provide a standardized vocabulary for gene function

Sequence genomes

Assemble genomes

None

Explanation - GO defines terms for biological processes, cellular components, and molecular functions across species.

Correct answer is: Provide a standardized vocabulary for gene function