Next Generation Sequencing Data Analysis: MCQs Practice Set

Q.1 What is the main goal of Next Generation Sequencing (NGS)?

To sequence a single DNA fragment manually
To rapidly and massively generate DNA sequence data
To produce images of DNA molecules
To read RNA transcripts directly
Explanation - NGS technologies enable the parallel sequencing of millions of DNA fragments, producing large datasets in a short time.
Correct answer is: To rapidly and massively generate DNA sequence data

Q.2 Which of the following is a common NGS platform?

Sanger sequencer
Illumina NovaSeq
X-ray crystallography
Mass spectrometer
Explanation - Illumina NovaSeq is a widely used NGS platform that uses sequencing-by-synthesis chemistry.
Correct answer is: Illumina NovaSeq

Q.3 During an NGS run, what does 'base calling' refer to?

Determining the position of fragments on a slide
Assigning a nucleotide to each signal read by the detector
Calculating the sequence length
Cleaning the sample before sequencing
Explanation - Base calling is the process of translating the raw fluorescence or electrical signals into A, T, C, or G nucleotides.
Correct answer is: Assigning a nucleotide to each signal read by the detector

Q.4 Which term describes the basic unit of sequence data that an NGS instrument outputs and that is mapped to a genome?

Genome
Contig
Read
SNP
Explanation - An NGS read is a short DNA sequence fragment output by the sequencer, typically 50–300 base pairs long.
Correct answer is: Read

Q.5 What is 'coverage' (or depth) in sequencing?

The number of sequencing machines used
The number of times a nucleotide position is read
The length of a read
The amount of raw data generated
Explanation - Coverage indicates how many times a particular base is sequenced, affecting confidence in variant calls.
Correct answer is: The number of times a nucleotide position is read
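
Worked example - A minimal Python sketch of the usual back-of-the-envelope estimate of mean coverage: total sequenced bases divided by genome size. The function name and the run figures are illustrative assumptions, not values taken from a real experiment.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 bp over a 3 Gb genome.
print(mean_coverage(600_000_000, 150, 3_000_000_000))  # 30.0
```

This gives only the average; the depth at any single position can be much higher or lower.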

Q.6 Which of the following best describes a 'paired-end' library?

Both ends of a DNA fragment are sequenced
Only one end of the fragment is sequenced
The fragment is sequenced twice from the same end
The fragment is sequenced in both directions during PCR
Explanation - Paired-end sequencing generates two reads, one from each end of a DNA fragment, improving mapping accuracy.
Correct answer is: Both ends of a DNA fragment are sequenced

Q.7 What is the primary role of a bioinformatics pipeline in NGS data analysis?

To design primers for PCR
To sequence DNA
To process raw data into biological insights
To store data in a database
Explanation - A pipeline performs quality control, alignment, variant calling, and annotation to convert raw reads into usable results.
Correct answer is: To process raw data into biological insights

Q.8 Which file format is commonly used to store aligned sequencing reads?

FASTA
SAM
VCF
PDF
Explanation - SAM (Sequence Alignment/Map) is a text-based format for storing aligned reads; BAM is its compressed binary equivalent.
Correct answer is: SAM

Q.9 Which algorithm is widely used for aligning short reads to a reference genome?

BLAST
BWA
Dijkstra
Newton-Raphson
Explanation - BWA (Burrows-Wheeler Aligner) is optimized for mapping millions of short reads efficiently.
Correct answer is: BWA

Q.10 What does a Variant Call Format (VCF) file primarily contain?

Raw sequencing reads
Gene expression levels
Identified genetic variants
Protein structures
Explanation - VCF files record positions, reference and alternate alleles, genotype information, and annotations of variants.
Correct answer is: Identified genetic variants
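
Worked example - A minimal sketch of how the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) can be read from one tab-separated data line. The function name and the sample line are made up for illustration; real projects would normally use a dedicated VCF library, and per-sample genotype columns are ignored here.

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": {},
    }
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        # INFO entries without "=" are flags and are stored as True
        record["info"][key] = value if value else True
    return record


# Hypothetical record, for illustration only.
example = "chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=30;DB"
print(parse_vcf_line(example))
```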

Q.11 Which metric helps assess the quality of NGS reads before alignment?

Read length
GC content
Phred quality score
DNA melting temperature
Explanation - Phred scores encode the estimated probability that a base was called incorrectly (higher scores mean fewer expected errors), guiding filtering and trimming steps.
Correct answer is: Phred quality score

Q.12 What is a 'contig' in genome assembly?

A single base pair
A continuous sequence of DNA built from overlapping reads
A type of sequencing machine
A software tool for alignment
Explanation - Contigs are contiguous sequences produced during de novo assembly by merging overlapping reads.
Correct answer is: A continuous sequence of DNA built from overlapping reads

Q.13 Why is DNA fragmentation necessary before library preparation?

To reduce the DNA's molecular weight
To increase the sequence length
To prevent contamination
To create smaller pieces suitable for sequencing
Explanation - Sequencing platforms read short fragments; fragmentation ensures DNA fits the required size range.
Correct answer is: To create smaller pieces suitable for sequencing

Q.14 What does the acronym 'PCR' stand for?

Protein Chain Reaction
Polymerase Chain Reaction
Phosphate Cycle Reaction
Phenol Chloroform Reaction
Explanation - PCR is a technique to amplify small DNA fragments using DNA polymerase.
Correct answer is: Polymerase Chain Reaction

Q.15 Which of the following best describes the Illumina sequencing chemistry?

Sequencing by synthesis with reversible terminators
Sequencing by ligation
Sequencing by Sanger chain termination
Sequencing by mass spectrometry
Explanation - Illumina uses reversible terminator nucleotides that are added one at a time during synthesis.
Correct answer is: Sequencing by synthesis with reversible terminators

Q.16 In NGS, what is a 'barcode' or 'index' used for?

To label DNA fragments for multiplexing multiple samples
To increase read length
To clean up the library
To calibrate the sequencer
Explanation - Barcodes allow pooling of multiple libraries in one run and later demultiplexing by sequence.
Correct answer is: To label DNA fragments for multiplexing multiple samples

Q.17 What is the main difference between a 'single-end' and 'paired-end' sequencing run?

Single-end reads longer than 10 kb
Paired-end reads are duplicated
Single-end reads sequence only one end of a fragment, paired-end sequences both ends
Paired-end reads do not require adapters
Explanation - Paired-end sequencing provides two reads per fragment, offering better mapping and structural variant detection.
Correct answer is: Single-end reads sequence only one end of a fragment, paired-end sequences both ends

Q.18 Which of the following is NOT typically a step in an NGS data processing pipeline?

Quality control
Alignment
Variant calling
Protein folding prediction
Explanation - Protein folding is not part of standard NGS data processing; the pipeline focuses on reads, alignments, and variants.
Correct answer is: Protein folding prediction

Q.19 Which sequencing strategy is best suited for transcriptome analysis?

Whole-genome sequencing
RNA-Seq
ChIP-Seq
Metagenomics
Explanation - RNA-Seq sequences cDNA to quantify gene expression and discover novel transcripts.
Correct answer is: RNA-Seq

Q.20 What does the 'Phred quality score' Q represent?

The number of bases sequenced
The probability of a sequencing error: Q = -10 log10(error probability)
The length of a read in kilobases
The GC content percentage
Explanation - Higher Q scores mean lower error probability; Q30 corresponds to a 1 in 1000 chance of error.
Correct answer is: The probability of a sequencing error: Q = -10 log10(error probability)
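
Worked example - The relation Q = -10 log10(error probability) can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the ASCII offset of 33 assumes the common Phred+33 encoding of FASTQ quality strings.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred score."""
    return -10 * math.log10(p)


def decode_quality_string(qual: str, offset: int = 33) -> list:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))

print(decode_quality_string("I5!"))  # [40, 20, 0]
```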

Q.21 Which of these metrics is used to evaluate a genome assembly?

N50
GC%
Read length
Variant density
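Explanation - N50 measures assembly contiguity: it is the contig length such that contigs of that length or longer contain at least half of the assembled bases. A higher N50 indicates a more contiguous assembly, though not necessarily a more complete or correct one.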
Correct answer is: N50
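
Worked example - A minimal sketch of the N50 calculation: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name and contig lengths are illustrative.

```python
def n50(contig_lengths) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length


# Toy assembly of 290 bases: 80 + 70 = 150 is the first running total >= 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```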

Q.22 What is the purpose of a 'reference genome' in alignment?

To serve as a template for PCR primers
To provide a coordinate system for mapping reads
To generate sequencing reads
To determine sequencing depth
Explanation - Aligners map reads to the reference to identify positions and potential variants relative to known sequence.
Correct answer is: To provide a coordinate system for mapping reads

Q.23 Which tool is commonly used for variant annotation in human genomics?

BWA
SAMtools
ANNOVAR
FastQC
Explanation - ANNOVAR adds functional information (e.g., gene, effect, clinical relevance) to VCF variants.
Correct answer is: ANNOVAR

Q.24 In a paired-end library, what does 'insert size' refer to?

The length of one read
The distance between the two reads' starts on the reference
The total number of reads per run
The length of the adapter
Explanation - Insert size refers to the length of the original DNA fragment, including both reads and the unsequenced middle.
Correct answer is: The distance between the two reads' starts on the reference

Q.25 Which of the following best describes a 'de novo assembly'?

Building a genome using a reference sequence
Assembling reads into contigs without a reference
Sequencing a genome for the first time
Annotating variants after alignment
Explanation - De novo assembly reconstructs a genome solely from overlapping reads, useful for novel species.
Correct answer is: Assembling reads into contigs without a reference

Q.26 What is the typical read length produced by an Illumina NovaSeq 6000 run?

50 bp
150 bp
300 bp
1000 bp
Explanation - NovaSeq 6000 runs most commonly use 2 x 150 bp paired-end reads; shorter configurations (such as 2 x 50 or 2 x 100 bp) and, on some flow cells, 2 x 250 bp are also available.
Correct answer is: 150 bp

Q.27 Which of the following is an advantage of single-molecule real-time (SMRT) sequencing?

Very short reads
Very high error rate that cannot be corrected
Long reads that can span repetitive regions
Requires a reference genome
Explanation - SMRT sequencing (PacBio) provides reads >10 kb, enabling resolution of structural variants.
Correct answer is: Long reads that can span repetitive regions

Q.28 In the context of variant calling, what does a 'heterozygous' SNP mean?

The individual has two different alleles at that position
Both alleles are identical
The variant is only present in males
The variant is present in all cells
Explanation - Heterozygous means one allele is reference, the other is alternate.
Correct answer is: The individual has two different alleles at that position

Q.29 Which quality metric indicates that a read alignment is likely to be reliable?

Low mapping quality score
High mapping quality score
Short read length
High GC content
Explanation - Mapping quality scores reflect confidence in a read's placement; high scores indicate unique mapping.
Correct answer is: High mapping quality score

Q.30 What is the main function of 'FastQC' in NGS workflow?

Align reads to a reference genome
Identify genetic variants
Assess raw read quality
Generate sequencing data
Explanation - FastQC produces reports on base quality, GC content, duplication levels, etc., before downstream analysis.
Correct answer is: Assess raw read quality

Q.31 Which of these steps is NOT typically part of the library preparation for Illumina sequencing?

Fragmentation
End repair
Poly-A selection
Adaptor ligation
Explanation - Poly-A selection enriches mRNA; standard DNA library prep does not use it.
Correct answer is: Poly-A selection

Q.32 Which term best describes a 'low-complexity' sequence region?

Region with many repetitive sequences
Region with high GC content
Region with many unique sequences
Region with no genes
Explanation - Low-complexity refers to sequences that lack diversity and are often repeats, making mapping difficult.
Correct answer is: Region with many repetitive sequences

Q.33 What is the purpose of 'duplicate marking' in BAM files?

To highlight errors in reads
To flag likely PCR or optical duplicates so they can be ignored in variant calling
To increase coverage
To merge multiple BAM files
Explanation - Duplicate reads can bias variant frequency; marking sets a flag on them (it does not delete them) so downstream tools can ignore or filter them.
Correct answer is: To flag likely PCR or optical duplicates so they can be ignored in variant calling

Q.34 Which of the following is a common file format for storing raw sequencing data?

FASTQ
BED
GTF
VCF
Explanation - FASTQ stores both sequence and per-base quality scores for each read.
Correct answer is: FASTQ
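
Worked example - A minimal sketch of reading FASTQ records, assuming the common four-line layout (header starting with "@", sequence, separator starting with "+", quality string) with no line wrapping. The generator name and the two toy reads are illustrative.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:          # end of file (or a blank line): stop
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


# Toy input held in memory instead of a file on disk.
toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
for name, seq, qual in read_fastq(toy):
    print(name, seq, qual)
```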

Q.35 Why is it important to use a 'reference genome' from the same species when aligning reads?

Because cross-species alignment is faster
Because mismatches are interpreted as variants
Because it ensures that the reference contains the same genes
Because reference genomes are only for humans
Explanation - Aligning to a different species would treat all differences as variants, leading to false positives.
Correct answer is: Because mismatches are interpreted as variants

Q.36 Which of these is a key advantage of using 'Hi-C' data with NGS?

Detecting gene expression levels
Mapping chromatin interactions
Sequencing mitochondrial DNA
Improving base calling accuracy
Explanation - Hi-C captures physical proximity of genomic loci, informing 3D genome organization.
Correct answer is: Mapping chromatin interactions

Q.37 In variant annotation, what does the term 'CADD score' refer to?

A measure of read depth
A computational prediction of variant deleteriousness
A quality metric for mapping
The cost of sequencing per base
Explanation - CADD integrates multiple annotations to score how likely a variant is to be harmful.
Correct answer is: A computational prediction of variant deleteriousness

Q.38 What does 'GC skew' indicate in a DNA sequence?

Strand asymmetry in the abundance of G relative to C
The absolute GC content percentage
The ratio of G to C nucleotides
The number of GC-rich repeats
Explanation - GC skew is calculated as (G-C)/(G+C) along one strand, usually in sliding windows; sign changes in the skew can reveal replication origins and termini.
Correct answer is: Strand asymmetry in the abundance of G relative to C
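
Worked example - A minimal sketch of the (G-C)/(G+C) formula, applied to a whole sequence and to non-overlapping windows. The function names and the window layout are illustrative choices; published analyses often use overlapping windows or a cumulative skew instead.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one sequence; 0.0 if it contains no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int) -> list:
    """GC skew in consecutive non-overlapping windows (trailing partial window dropped)."""
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


print(gc_skew("GGGC"))                    # 0.5
print(windowed_gc_skew("GGCCGGGG", 4))    # [0.0, 1.0]
```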

Q.39 Which of these tools is used for visualizing alignment data in a genome browser?

SAMtools
IGV
BWA
FastQC
Explanation - IGV (Integrative Genomics Viewer) lets users view BAM, VCF, and other genomic data interactively.
Correct answer is: IGV

Q.40 Which sequencing chemistry uses 'nanopore' technology?

Illumina
PacBio
Oxford Nanopore Technologies
Sanger
Explanation - Nanopore sequencing measures changes in electrical current as DNA strands pass through a pore.
Correct answer is: Oxford Nanopore Technologies

Q.41 What does 'base composition bias' refer to in sequencing data?

Unequal representation of A/T/C/G nucleotides across reads
The length distribution of reads
The number of duplicate reads
The GC content of the genome
Explanation - Bias can affect downstream analysis by skewing frequency estimates.
Correct answer is: Unequal representation of A/T/C/G nucleotides across reads

Q.42 Which of the following is a hallmark of a 'Hardy-Weinberg equilibrium' test in population genetics?

Testing for uniform read coverage
Assessing genotype frequencies against expected frequencies
Evaluating sequencing error rates
Checking for GC bias
Explanation - The HWE test checks whether observed genotype frequencies match those expected from the allele frequencies under random mating (p², 2pq, q²).
Correct answer is: Assessing genotype frequencies against expected frequencies
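
Worked example - A minimal sketch of a chi-square comparison between observed genotype counts and Hardy-Weinberg expectations for a biallelic site. The function name and the counts are illustrative; exact tests are often preferred when counts are small.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Return (chi-square statistic, expected counts) for genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, expected


# Counts that match expectation exactly, then counts with no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.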

Q.43 Which type of sequencing is most suitable for detecting large structural variants?

Short-read Illumina sequencing
Long-read PacBio or Oxford Nanopore sequencing
ChIP-Seq
Sanger sequencing
Explanation - Long reads span breakpoints, making structural variant detection easier.
Correct answer is: Long-read PacBio or Oxford Nanopore sequencing

Q.44 In a BAM file, what information is indicated by the flag '0x10'?

Read is mapped to the reverse strand
Read is unmapped
Read is paired
Read is duplicate
Explanation - SAM flag 0x10 indicates the read aligns to the reverse DNA strand.
Correct answer is: Read is mapped to the reverse strand
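
Worked example - A minimal sketch of decoding the bitwise SAM FLAG field. The bit values follow the SAM specification; the short label strings and the function name are illustrative.

```python
# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """List the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(16))   # ['reverse_strand']
print(decode_flag(99))   # ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
```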

Q.45 Which of the following best explains 'PCR bias' in library preparation?

Uneven amplification leading to overrepresentation of some fragments
Complete removal of all PCR duplicates
Amplification of only the reverse strand
A technique to improve sequencing accuracy
Explanation - Certain sequences amplify more efficiently, skewing representation in the final library.
Correct answer is: Uneven amplification leading to overrepresentation of some fragments

Q.46 What is the main advantage of 'targeted sequencing' over whole-genome sequencing?

Higher coverage depth for regions of interest
Lower cost and data volume
Ability to sequence entire genomes rapidly
No need for a reference genome
Explanation - Targeted panels focus on specific loci, enabling deep coverage at lower cost.
Correct answer is: Higher coverage depth for regions of interest

Q.47 Which of the following is a characteristic of a 'variant allele frequency' (VAF) of 0.5?

The variant is homozygous
The variant is present in all cells
The variant is heterozygous in a diploid genome
The variant is a sequencing error
Explanation - In a diploid genome, a VAF near 0.5 is consistent with one of the two copies carrying the variant allele.
Correct answer is: The variant is heterozygous in a diploid genome
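
Worked example - A minimal sketch of computing VAF from read counts and making a crude diploid genotype guess. The function names and the 0.2 / 0.8 cut-offs are hypothetical values chosen for illustration only; real variant callers use statistical genotype likelihoods rather than fixed thresholds.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def naive_genotype(vaf: float, low: float = 0.2, high: float = 0.8) -> str:
    """Crude genotype guess from VAF; thresholds are illustrative assumptions."""
    if vaf < low:
        return "homozygous reference"
    if vaf > high:
        return "homozygous alternate"
    return "heterozygous"


vaf = variant_allele_frequency(15, 30)
print(vaf, naive_genotype(vaf))  # 0.5 heterozygous
```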

Q.48 How does 'mate-pair' sequencing differ from 'paired-end' sequencing in terms of insert size?

Mate-pair uses shorter inserts
Mate-pair uses longer inserts spanning several kilobases
Mate-pair sequences only one read
Mate-pair is a type of PCR
Explanation - Mate-pair libraries typically have 2–5 kb inserts, helping resolve structural variation.
Correct answer is: Mate-pair uses longer inserts spanning several kilobases

Q.49 Which software is commonly used for de novo assembly of Illumina short reads?

SPAdes
BWA
BLAST
SAMtools
Explanation - SPAdes is optimized for assembling short-read sequencing data.
Correct answer is: SPAdes

Q.50 In variant annotation, what does the term 'transcript consequence' refer to?

The effect of a variant on an mRNA transcript (e.g., missense, nonsense)
The number of transcripts in a gene
The location of the variant in the genome
The reference allele used
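Explanation - Transcript consequence describes the predicted effect of the variant on a given transcript and its protein product, such as synonymous, missense, nonsense, frameshift, or splice-site changes.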
Correct answer is: The effect of a variant on an mRNA transcript (e.g., missense, nonsense)

Q.51 What is a 'barcode collision' in multiplexed sequencing?

Two distinct samples sharing the same index sequence
A read with incorrect base quality
A duplicated read from PCR
An error in adapter ligation
Explanation - Barcodes must be unique; collisions cause sample misassignment.
Correct answer is: Two distinct samples sharing the same index sequence

Q.52 Which of the following best describes the 'read mapping quality' (MAPQ) metric?

Confidence that the read is correctly mapped to a unique location
The length of the read
The GC content of the read
The number of mismatches in the read
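Explanation - MAPQ is a Phred-scaled estimate of the probability that the reported mapping position is wrong (MAPQ = -10 log10 of that probability); aligners derive it from how much better the best alignment scores than the alternatives.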
Correct answer is: Confidence that the read is correctly mapped to a unique location

Q.53 What does the 'Duplication Rate' in FastQC reports indicate?

The proportion of duplicate reads
The average read length
The amount of adapter contamination
The GC bias
Explanation - High duplication suggests PCR overamplification or low library complexity.
Correct answer is: The proportion of duplicate reads

Q.54 In the context of RNA-Seq, what is 'FPKM' used to measure?

Genome assembly quality
Gene expression levels
Variant allele frequencies
Sequencing error rates
Explanation - FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) normalizes fragment counts by gene length and sequencing depth.
Correct answer is: Gene expression levels

Q.55 Which of the following best defines 'insert size distribution' in a sequencing library?

The average number of reads per sample
The distribution of DNA fragment lengths after library prep
The GC content of the reads
The coverage depth across the genome
Explanation - Insert size distribution impacts library complexity and downstream mapping.
Correct answer is: The distribution of DNA fragment lengths after library prep

Q.56 Which of the following best describes a 'soft clip' in an alignment?

A portion of the read that does not align to the reference
A read that aligns perfectly
A read that is marked as duplicate
A read that is unmapped
Explanation - Soft clipping indicates that part of the read is excluded from alignment but retained.
Correct answer is: A portion of the read that does not align to the reference
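
Worked example - Soft clips appear as "S" operations in the CIGAR string of an alignment. A minimal sketch of counting them follows; the function names and the example CIGAR are illustrative.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def soft_clipped_bases(cigar: str) -> int:
    """Number of read bases that are present in the record but not aligned."""
    return sum(n for n, op in parse_cigar(cigar) if op == "S")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered (operations M, D, N, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


# A 100 bp read with 5 clipped bases at each end.
print(soft_clipped_bases("5S90M5S"), reference_span("5S90M5S"))  # 10 90
```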

Q.57 What is the main purpose of performing a 'joint variant calling' across multiple samples?

To reduce computational load
To increase the accuracy of variant detection by leveraging shared information
To align reads more quickly
To generate de novo assemblies
Explanation - Joint calling considers allele frequencies across samples, improving sensitivity.
Correct answer is: To increase the accuracy of variant detection by leveraging shared information

Q.58 Which of these is a common source of systematic error in Illumina sequencing?

Methylation of DNA
Phasing errors due to overlapping reads
Low GC content
Adapter dimer formation
Explanation - Adapter dimers can dominate low-complexity libraries, leading to poor data quality.
Correct answer is: Adapter dimer formation

Q.59 In the context of variant annotation, what does the 'CADD' score indicate?

Read depth at a variant site
The predicted deleteriousness of a variant
The frequency of the variant in the population
The alignment quality of the read
Explanation - CADD integrates multiple annotations to assess potential functional impact.
Correct answer is: The predicted deleteriousness of a variant

Q.60 Which of the following best explains the 'paired-end mapping' approach?

Mapping one end of the read only
Mapping both ends and using the insert size for validation
Mapping reads without a reference
Mapping reads after PCR amplification
Explanation - Paired-end mapping checks consistency between read pairs to improve alignment confidence.
Correct answer is: Mapping both ends and using the insert size for validation

Q.61 What does 'rRNA depletion' accomplish in RNA-Seq library prep?

Enriches for messenger RNA by removing ribosomal RNA
Adds adapters to all RNA molecules
Amplifies rRNA for sequencing
Measures ribosomal protein expression
Explanation - rRNA constitutes ~80% of total RNA; its removal increases coverage of coding transcripts.
Correct answer is: Enriches for messenger RNA by removing ribosomal RNA

Q.62 Which of these describes a 'haplotype block'?

A contiguous set of alleles inherited together
A set of identical reads
A region with no variants
A collection of sequencing adapters
Explanation - Haplotype blocks reflect linkage disequilibrium among neighboring SNPs.
Correct answer is: A contiguous set of alleles inherited together

Q.63 What is the main function of 'GATK' in variant discovery?

Align reads to a reference genome
Call variants and perform local realignment and base quality recalibration
Generate FASTQ files
Visualize alignments
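Explanation - GATK includes tools such as HaplotypeCaller and BaseRecalibrator; the standalone IndelRealigner belongs to GATK 3 and was dropped in GATK 4, where HaplotypeCaller performs local reassembly itself.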
Correct answer is: Call variants and perform local realignment and base quality recalibration

Q.64 Which of the following best describes the 'coverage uniformity' metric?

The evenness of sequencing depth across the genome
The total number of reads sequenced
The GC content of the genome
The percentage of duplicate reads
Explanation - Uniform coverage ensures reliable variant calling across all genomic regions.
Correct answer is: The evenness of sequencing depth across the genome

Q.65 What is the role of a 'reference allele' in a VCF file?

It indicates the alternate allele present in the sample
It is the allele from the reference genome used as baseline
It represents the sequencing error
It is the read that mapped to this position
Explanation - The REF field in VCF lists the nucleotide found in the reference at that locus.
Correct answer is: It is the allele from the reference genome used as baseline

Q.66 What does a 'high duplication rate' in a sequencing library suggest?

High library complexity
Low PCR duplication or high coverage
Potential overamplification or low diversity
Increased sequencing accuracy
Explanation - High duplication indicates many reads come from the same original molecule.
Correct answer is: Potential overamplification or low diversity

Q.67 Which of these is a typical output of the 'Picard MarkDuplicates' tool?

A BAM file with duplicates marked
A FASTQ file of raw reads
A list of variant calls
An alignment score report
Explanation - Picard tags duplicate reads with the 'duplicate' flag for downstream analysis.
Correct answer is: A BAM file with duplicates marked

Q.68 In RNA-Seq analysis, which method normalizes read counts accounting for gene length and sequencing depth?

TPM
RPKM
FPKM
All of the above
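Explanation - TPM, RPKM, and FPKM all normalize RNA-Seq counts for gene length and sequencing depth; RPKM counts reads, FPKM counts fragments, and TPM applies the length correction first so that values sum to one million per sample.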
Correct answer is: All of the above
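
Worked example - A minimal sketch of the two calculations, assuming one count and one length (in bp) per gene. The function names and toy numbers are illustrative. RPKM uses the same formula as FPKM, with reads counted instead of fragments.

```python
def fpkm(counts, lengths_bp) -> list:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp) -> list:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r / rate_sum * 1e6 for r in rates]


# Three hypothetical genes.
counts = [100, 200, 700]
lengths = [1000, 2000, 1000]
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

The first two genes have different counts but equal values after normalization, because the second gene is twice as long.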

Q.69 What is the primary challenge when aligning short reads to highly repetitive genomic regions?

Read quality drops dramatically
Aligners cannot detect any variants
Ambiguous mapping leads to multi-mapped reads
The read length becomes too short
Explanation - Repetitive sequences cause reads to align to multiple loci, reducing confidence.
Correct answer is: Ambiguous mapping leads to multi-mapped reads

Q.70 Which of the following is an advantage of using a 'dual-indexed' library?

Higher read length
Reduced barcode collision risk
No need for adapters
Lower sequencing cost
Explanation - Dual indexing uses two distinct barcodes, minimizing sample misassignment.
Correct answer is: Reduced barcode collision risk

Q.71 What does the 'pileup' operation provide in variant calling?

Alignment of reads to the genome
Summary of read bases at each genomic position
Visualization of coverage
Mapping quality statistics
Explanation - Pileup shows how many reads support each allele at a locus.
Correct answer is: Summary of read bases at each genomic position
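
Worked example - A toy pileup in Python, assuming ungapped alignments given as (0-based start position, read sequence). Real pileup tools also handle CIGAR operations, strands, and base and mapping qualities; the function name and reads are illustrative.

```python
from collections import Counter, defaultdict


def simple_pileup(alignments) -> dict:
    """Count the bases observed at each reference position (ungapped reads only)."""
    pile = defaultdict(Counter)
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pile[start + offset][base] += 1
    return pile


reads = [(0, "ACGT"), (2, "GTTA"), (2, "CTTA")]
pile = simple_pileup(reads)
for pos in sorted(pile):
    print(pos, dict(pile[pos]))
```

At position 2 the counts are two G and one C, which is the kind of per-allele support a variant caller weighs.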

Q.72 In the context of sequencing, what does 'sequencing depth' refer to?

The length of sequencing reads
The number of times a base is read on average
The number of different samples sequenced
The cost of the sequencing run
Explanation - Depth, or coverage, indicates redundancy and confidence in variant calls.
Correct answer is: The number of times a base is read on average

Q.73 Which of these steps is performed after raw FASTQ files are generated but before variant calling?

Read alignment to a reference genome
Variant annotation
Gene expression quantification
PCR amplification
Explanation - Alignment is essential to map reads before calling variants.
Correct answer is: Read alignment to a reference genome

Q.74 What does the 'Read Group' (RG) tag in BAM files represent?

The sequencing instrument ID and sample information
The quality of individual reads
The number of duplicate reads
The alignment score
Explanation - RG tags provide metadata for grouping reads from the same sample.
Correct answer is: The sequencing instrument ID and sample information

Q.75 Which of the following is NOT a typical output of a 'variant calling' pipeline?

VCF file with identified variants
Alignment file (BAM)
Quality control report
Phylogenetic tree
Explanation - Variant calling pipelines focus on variant discovery, not phylogenetics.
Correct answer is: Phylogenetic tree

Q.76 Which of the following best describes the 'FASTQ format'?

A binary format for aligned reads
A text format storing sequence and quality scores
A database for variant annotations
A format for genome assemblies
Explanation - FASTQ holds each read's nucleotide sequence and its corresponding quality values.
Correct answer is: A text format storing sequence and quality scores

Q.77 In the context of NGS, what does 'UMI' stand for, and what is its purpose?

Unique Molecular Identifier; it tags individual DNA molecules to collapse duplicates
Unique Mapping Index; it maps reads uniquely
Universal Multiple Interrogation; it interrogates multiple loci
Unmatched Marker Index; it marks unmatched reads
Explanation - UMIs are short random sequences added before amplification; reads that share a UMI and a mapping position are treated as PCR duplicates of the same original DNA fragment.
Correct answer is: Unique Molecular Identifier; it tags individual DNA molecules to collapse duplicates
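
Worked example - A minimal sketch of UMI-based duplicate collapsing: reads that share chromosome, position, strand, and UMI are treated as copies of one molecule and only the first is kept. The tuple layout and function name are assumptions for illustration; dedicated tools also tolerate sequencing errors within the UMI.

```python
def collapse_by_umi(reads) -> list:
    """Keep the first read name seen for each (chrom, pos, strand, umi) key."""
    seen = set()
    kept = []
    for chrom, pos, strand, umi, name in reads:
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


# Hypothetical reads: r1 and r2 are duplicates; r3 differs only by UMI.
reads = [
    ("chr1", 100, "+", "ACGTAC", "r1"),
    ("chr1", 100, "+", "ACGTAC", "r2"),
    ("chr1", 100, "+", "TTGCAA", "r3"),
    ("chr1", 250, "-", "ACGTAC", "r4"),
]
print(collapse_by_umi(reads))  # ['r1', 'r3', 'r4']
```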

Q.78 Which of the following metrics indicates the presence of adapter contamination in a FASTQ file?

High duplication rate
Low read length
High proportion of reads with adapter sequence
High GC content
Explanation - Adapter sequences appear in reads when library fragments are shorter than read length.
Correct answer is: High proportion of reads with adapter sequence

Q.79 What is a 'k-mer' in the context of genome assembly?

A short sequence of 'k' nucleotides used for assembly graphs
A variant of the base-calling algorithm
A type of sequencing error
A measure of GC content
Explanation - K-mers represent overlapping subsequences that build de Bruijn graphs in assembly.
Correct answer is: A short sequence of 'k' nucleotides used for assembly graphs
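
Worked example - A minimal sketch of extracting k-mers and turning them into de Bruijn graph edges, where each k-mer links its (k-1)-base prefix to its (k-1)-base suffix. The function names and reads are illustrative; real assemblers add error correction and graph simplification.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> list:
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(reads, k: int) -> dict:
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    edges = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            edges[kmer[:-1]].add(kmer[1:])
    return edges


print(kmers("ACGTA", 3))  # ['ACG', 'CGT', 'GTA']
graph = de_bruijn_edges(["ACGT", "CGTA"], 3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```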

Q.80 Which of these best describes the purpose of 'base quality recalibration' in GATK?

Adjust base quality scores based on known variant positions
Remove duplicate reads
Align reads to the reference genome
Identify structural variants
Explanation - Recalibration refines quality scores by modeling systematic errors.
Correct answer is: Adjust base quality scores based on known variant positions

Q.81 In a sequencing experiment, what does a 'high Q30 rate' indicate?

A high proportion of bases with Q30 or higher quality
A low proportion of reads containing adapters
A high duplication rate
A high GC content
Explanation - Q30 means a 1 in 1000 chance of error; a high Q30 rate suggests good data quality.
Correct answer is: A high proportion of bases with Q30 or higher quality

Q.82 What is a key benefit of using 'RNA-Seq' over microarray for gene expression profiling?

It can detect novel transcripts and splice variants
It is cheaper to perform
It requires less computational analysis
It provides absolute quantification of proteins
Explanation - RNA-Seq does not depend on predefined probes, so it can reveal new isoforms and splice junctions in addition to measuring expression levels.
Correct answer is: It can detect novel transcripts and splice variants

Q.83 Which of the following best describes the 'coverage bias' issue in NGS?

Unequal sequencing depth across different genomic regions
The tendency to overcall variants
The use of incorrect adapters
The presence of low-quality bases
Explanation - Coverage bias leads to uneven representation, affecting variant detection sensitivity.
Correct answer is: Unequal sequencing depth across different genomic regions

Q.84 Which of these tools is commonly used for visualizing the distribution of GC content across a genome?

IGV
Circos
BLAST
FastQC
Explanation - Circos plots enable circular representation of GC and other genomic metrics.
Correct answer is: Circos

Q.85 What does the 'SNP' abbreviation stand for?

Single Nucleotide Polymorphism
Sequence Nucleic Polymerase
Single Nucleotide Protein
Serial Nucleotide Pattern
Explanation - SNPs are single-base variations in the genome between individuals.
Correct answer is: Single Nucleotide Polymorphism

Q.86 In NGS data analysis, what is the function of a 'reference genome'?

To provide a template for mapping reads
To generate sequencing reads
To act as a storage medium
To identify sequencing errors
Explanation - Reads are aligned to a reference to locate their positions in the genome.
Correct answer is: To provide a template for mapping reads

Q.87 Which of these best describes a 'de novo assembly' approach?

Assembling sequences without a reference genome
Aligning reads to a known reference genome
Identifying variants against a reference
Annotating genes in a reference genome
Explanation - De novo assembly reconstructs genomes from scratch using overlapping reads.
Correct answer is: Assembling sequences without a reference genome

Q.88 Which of the following is a major advantage of long-read sequencing platforms like PacBio?

Short read length
High per-base error rate
Ability to span repetitive regions
Lower cost per base
Explanation - Long reads can resolve complex structural variants and repeat elements.
Correct answer is: Ability to span repetitive regions

Q.89 What does the 'VCF' file format contain?

FASTQ reads and quality scores
Variant calls and annotations
Reference genome sequences
Alignment metrics
Explanation - VCF lists positions, reference and alternate alleles, genotype data, and optional annotations.
Correct answer is: Variant calls and annotations

Q.90 Which of these best explains what a 'soft clip' indicates in a BAM alignment?

Read was removed from the alignment
Part of the read did not align to reference
The read is duplicated
The read aligns to the reverse strand
Explanation - Soft clipping marks unaligned portions while retaining the read in the file.
Correct answer is: Part of the read did not align to reference
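
Worked example - In SAM/BAM records, soft-clipped bases appear as the 'S' operation in the CIGAR string. The Python sketch below counts them with a regular expression; the helper name soft_clipped_bases is an illustrative assumption, and real pipelines would normally read the CIGAR through a BAM library instead.

```python
import re

# One CIGAR element is a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Count the bases marked as soft-clipped ('S') in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_RE.findall(cigar) if op == "S")


print(soft_clipped_bases("5S95M"))      # 5 clipped bases at the start
print(soft_clipped_bases("10S80M10S"))  # clipping at both ends
```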

Q.91 What is the purpose of 'adapter trimming' in NGS data preprocessing?

To remove low-quality bases
To eliminate residual sequencing adapters from reads
To increase read length
To annotate variants
Explanation - Adapters can interfere with mapping if not removed from read ends.
Correct answer is: To eliminate residual sequencing adapters from reads

Q.92 In an NGS experiment, what does the term 'multiplexing' refer to?

Sequencing multiple samples in a single run using unique barcodes
Sequencing the same sample multiple times
Using multiple sequencers in parallel
Adding multiple adapters to a single fragment
Explanation - Multiplexing reduces cost and increases throughput by pooling samples.
Correct answer is: Sequencing multiple samples in a single run using unique barcodes

Q.93 Which of the following best describes a 'genotype' in variant analysis?

The nucleotide sequence of a gene
The combination of alleles at a particular locus in an individual
The physical location of a gene on a chromosome
The number of reads covering a position
Explanation - Genotype denotes whether an individual is homozygous or heterozygous for a variant.
Correct answer is: The combination of alleles at a particular locus in an individual

Q.94 Which of these tools can be used to identify structural variants from NGS data?

GATK HaplotypeCaller
SAMtools mpileup
BreakDancer
FastQC
Explanation - BreakDancer detects structural variations such as deletions, insertions, inversions, and translocations.
Correct answer is: BreakDancer

Q.95 What is the 'FASTQ' file format commonly used for?

Storing aligned reads
Storing raw sequencing reads and quality scores
Storing variant annotations
Storing gene expression matrices
Explanation - FASTQ contains both the nucleotide sequence and its per-base quality information.
Correct answer is: Storing raw sequencing reads and quality scores
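
Worked example - A FASTQ record spans four lines: a header starting with '@', the sequence, a separator starting with '+', and the quality string. The Python sketch below parses such records from a string. It assumes Phred+33 quality encoding and unwrapped four-line records; the function name parse_fastq is illustrative.

```python
def parse_fastq(text):
    """Parse four-line FASTQ records into (name, sequence, qualities) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Assumption: Phred+33 encoding (ASCII code minus 33).
        quals = [ord(c) - 33 for c in qual]
        records.append((header[1:], seq, quals))
    return records


print(parse_fastq("@r1\nACGT\n+\nII5!\n"))
```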

Q.96 Which of the following is NOT a standard step in an NGS data analysis pipeline?

Quality control of raw reads
Read alignment to a reference genome
Protein folding prediction
Variant calling and annotation
Explanation - Protein folding is outside typical NGS workflows, which focus on sequence data.
Correct answer is: Protein folding prediction

Q.97 Which of the following best describes the purpose of the 'Picard CollectInsertSizeMetrics' tool?

Calculate the insert size distribution of a sequencing library
Detect structural variants
Align reads to a reference genome
Perform quality trimming
Explanation - This tool summarizes insert size statistics to assess library preparation quality.
Correct answer is: Calculate the insert size distribution of a sequencing library

Q.98 In a BAM file, what does the flag '0x100' indicate?

Read is properly paired
Read is mapped to reverse strand
Read is not primary alignment (secondary)
Read is a duplicate
Explanation - Flag 0x100 (decimal 256) marks a secondary alignment, i.e. an alternative mapping of a read that also has a primary alignment.
Correct answer is: Read is not primary alignment (secondary)
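
Worked example - The SAM FLAG field is a bitwise combination of the values defined in the SAM specification. The Python sketch below decodes a flag into readable labels; the label strings and the function name decode_flag are illustrative choices.

```python
# Bit values follow the SAM specification; the label text is our own.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(0x100))
print(decode_flag(99))
```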

Q.99 Which of these metrics is commonly used to evaluate the uniformity of read coverage across a target region?

Mean coverage
Coverage breadth
Coefficient of variation
GC bias score
Explanation - The coefficient of variation measures the spread of coverage, indicating uniformity.
Correct answer is: Coefficient of variation
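
Worked example - The coefficient of variation is the standard deviation of per-base depth divided by the mean depth, so lower values mean more uniform coverage. The Python sketch below computes it from a list of depths; using the population standard deviation and the name coverage_cv are assumptions made for illustration.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation (std / mean) of per-base coverage depths."""
    mu = mean(depths)
    if mu == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(depths) / mu


print(coverage_cv([30, 30, 30, 30]))  # perfectly uniform coverage
print(coverage_cv([10, 50]))          # uneven coverage
```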

Q.100 What is the primary benefit of using 'UMIs' (Unique Molecular Identifiers) in NGS library prep?

To identify and collapse PCR duplicates
To increase sequencing depth
To reduce read length
To add adapters
Explanation - UMIs tag each original DNA molecule, enabling accurate duplicate removal.
Correct answer is: To identify and collapse PCR duplicates
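
Worked example - Reads that share both a mapping position and a UMI most likely derive from the same original molecule. The Python sketch below keeps one read per (chromosome, position, UMI) key. The dictionary layout of a read and the name collapse_by_umi are hypothetical, and real tools also tolerate sequencing errors within the UMI, which this sketch does not.

```python
def collapse_by_umi(reads):
    """Keep the first read seen for each (chrom, pos, umi) combination."""
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = [
    {"chrom": "chr1", "pos": 100, "umi": "AAC"},
    {"chrom": "chr1", "pos": 100, "umi": "AAC"},  # PCR duplicate
    {"chrom": "chr1", "pos": 100, "umi": "GGT"},  # distinct molecule
]
print(len(collapse_by_umi(reads)))
```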

Q.101 Which of these best describes a 'variant allele frequency' (VAF) of 0.1 in a diploid sample?

Variant present in 10% of sequencing reads, likely subclonal or low allele fraction
Variant present in 10% of cells in the population
Variant is homozygous
Variant is a sequencing error
Explanation - A VAF of 0.1 means 10% of reads support the alternate allele, indicating low abundance.
Correct answer is: Variant present in 10% of sequencing reads, likely subclonal or low allele fraction

Q.102 Which of the following is a common quality control step after alignment but before variant calling?

Duplicate marking
Adapter trimming
Read length filtering
Variant annotation
Explanation - Marking PCR duplicates lets variant callers ignore redundant reads, which gives more accurate allele frequency estimates.
Correct answer is: Duplicate marking

Q.103 Which of these tools is used for de novo assembly of short reads?

SPAdes
BWA-MEM
SAMtools
Picard
Explanation - SPAdes constructs contigs from short reads using a de Bruijn graph approach.
Correct answer is: SPAdes

Q.104 What does the 'Read Group' (RG) tag in a BAM file provide?

Sequencer and sample information for each read
Base quality scores for each read
The number of duplicates in each read
The mapping quality of each read
Explanation - RG tags identify the sequencing run, library, and sample, aiding downstream analysis.
Correct answer is: Sequencer and sample information for each read

Q.105 Which of the following best explains the concept of 'genomic coverage' in sequencing?

The number of unique reads sequenced
The proportion of the genome successfully sequenced
The length of sequencing reads
The number of sequencing cycles performed
Explanation - Coverage refers to how much of the target region or genome has been captured by reads.
Correct answer is: The proportion of the genome successfully sequenced

Q.106 Which of the following best describes the 'Illumina sequencing-by-synthesis' chemistry?

Adding nucleotides one at a time and detecting fluorescence
Detecting changes in electrical current as DNA passes a pore
Using Sanger chain termination
Sequencing DNA by mass spectrometry
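Explanation - Illumina uses fluorescently labeled reversible terminators; after each single-base incorporation the clusters are imaged to identify the base, then the label and terminator are removed.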
Correct answer is: Adding nucleotides one at a time and detecting fluorescence

Q.107 Which of these steps is NOT typically performed during library preparation for Illumina sequencing?

Fragmentation
End-repair
Poly-A selection
Adapter ligation
Explanation - Poly-A selection is used in mRNA enrichment for RNA-Seq, not standard DNA library prep.
Correct answer is: Poly-A selection

Q.108 Which of the following is a typical input file for variant annotation tools?

FASTQ
BAM
VCF
SAM
Explanation - Annotation tools enrich VCF files with functional information about each variant.
Correct answer is: VCF

Q.109 In a sequencing experiment, what does 'duplication rate' measure?

The proportion of identical reads that likely came from PCR amplification
The fraction of reads mapping to the reference genome
The percentage of low-quality bases
The amount of adapter contamination
Explanation - A high duplication rate can indicate overamplification or low library complexity.
Correct answer is: The proportion of identical reads that likely came from PCR amplification

Q.110 Which of the following best describes a 'haplotype block'?

A region with high recombination
A set of alleles that are inherited together due to linkage disequilibrium
A group of duplicated genes
A set of sequencing adapters
Explanation - Haplotype blocks are contiguous segments where variants tend to co-occur.
Correct answer is: A set of alleles that are inherited together due to linkage disequilibrium

Q.111 What does the 'Q' score in FASTQ represent?

Read length
GC content
Base quality (error probability)
Alignment score
Explanation - The Phred Q score quantifies the confidence in a base call.
Correct answer is: Base quality (error probability)

Q.112 Which of the following is a common use of 'NGS' data in clinical diagnostics?

Determining blood type
Diagnosing genetic diseases by identifying pathogenic variants
Measuring blood pressure
Testing for infectious disease by antibody titers
Explanation - NGS is employed to find causative mutations in hereditary disorders.
Correct answer is: Diagnosing genetic diseases by identifying pathogenic variants

Q.113 Which of the following best describes the 'Insert Size' of a sequencing library?

Length of the adapter sequence
Distance between the two read starts on the reference genome
The number of fragments in the library
The number of duplicates in the library
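Explanation - Insert size is the length of the original DNA fragment between the adapters, covering both reads and any unsequenced middle; for a properly aligned pair it equals the distance spanned by the outer ends of the two reads.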
Correct answer is: Distance between the two read starts on the reference genome

Q.114 What is the primary difference between a 'single-end' and a 'paired-end' sequencing read?

Single-end reads are longer
Paired-end reads come from both ends of a DNA fragment
Paired-end reads are single-stranded
Single-end reads do not require adapters
Explanation - Paired-end sequencing generates two reads per fragment, improving mapping.
Correct answer is: Paired-end reads come from both ends of a DNA fragment

Q.115 Which of the following is a common tool for visualizing BAM files and variant calls?

IGV
SAMtools
Picard
GATK
Explanation - IGV provides an interactive graphical interface for viewing alignments and variants.
Correct answer is: IGV

Q.116 Which of the following best describes an 'indel' in genetic variation?

Insertion or deletion of one or more nucleotides
A point mutation that changes a single base
A large chromosomal rearrangement
A substitution of a base pair
Explanation - Indels are small insertions or deletions that alter the DNA sequence.
Correct answer is: Insertion or deletion of one or more nucleotides

Q.117 What does the 'BAM' file format store?

Raw sequencing reads
Aligned sequencing reads with metadata
Variant call information
Gene expression matrices
Explanation - BAM is the binary compressed form of the SAM alignment format.
Correct answer is: Aligned sequencing reads with metadata

Q.118 Which of the following best describes the 'reference allele' in a VCF file?

The allele found in the reference genome at that position
The allele that is mutated in the sample
The most common allele in the population
The allele that is sequenced with highest quality
Explanation - The REF field shows the base present in the reference genome.
Correct answer is: The allele found in the reference genome at that position

Q.119 Which of these best explains why 'GC bias' is problematic in NGS data?

It leads to preferential sequencing of GC-rich or AT-rich regions
It causes reads to map to the wrong chromosome
It increases the number of duplicates
It reduces overall read length
Explanation - GC bias creates uneven coverage, affecting variant calling and quantification.
Correct answer is: It leads to preferential sequencing of GC-rich or AT-rich regions

Q.120 What is the main function of a 'k-mer' in de novo assembly?

To represent short, overlapping sequences for graph construction
To denote the size of an adapter
To describe the quality score of a base
To count the number of variants
Explanation - K-mers are the building blocks of de Bruijn graphs used in assembly algorithms.
Correct answer is: To represent short, overlapping sequences for graph construction

Q.121 Which of the following best describes a 'phred score' of Q30?

1 in 30 chance of error
1 in 30% error
1 in 1000 chance of error
30% base calling accuracy
Explanation - Q30 = -10 log10(1/1000) = 30, meaning 0.1% error probability.
Correct answer is: 1 in 1000 chance of error
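
Worked example - The Phred relationship Q = -10 log10(P) can be checked numerically. The Python sketch below converts in both directions; the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Each step of 10 in Q lowers the error probability tenfold, which is why Q20 corresponds to 1 in 100 and Q30 to 1 in 1000.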

Q.122 What does 'N' represent in a FASTQ sequence?

Unknown or ambiguous nucleotide
A nucleotide with high quality
A nucleotide that is not part of the reference genome
The start of a new read
Explanation - N indicates that the base cannot be determined or is missing.
Correct answer is: Unknown or ambiguous nucleotide

Q.123 Which of the following best describes a 'variant allele frequency' (VAF) of 0.5?

Variant is homozygous
Variant is heterozygous in a diploid genome
Variant is absent
Variant is present in 5% of reads
Explanation - In diploids, a VAF of 0.5 indicates one allele differs from the other.
Correct answer is: Variant is heterozygous in a diploid genome

Q.124 Which of these best describes a 'paired-end' alignment?

Alignment of reads from both ends of a fragment to the reference genome
Alignment of a single read only
Alignment performed without considering pairs
Alignment of reads to a de novo assembly
Explanation - Paired-end alignment uses information from both reads to improve mapping accuracy.
Correct answer is: Alignment of reads from both ends of a fragment to the reference genome

Q.125 What is the purpose of using 'adapter trimming' tools like Trimmomatic?

To remove sequencing adapters from the ends of reads before alignment
To merge paired-end reads
To call variants from raw data
To assemble genomes de novo
Explanation - Adapters can cause misalignment and false variant calls if not trimmed.
Correct answer is: To remove sequencing adapters from the ends of reads before alignment

Q.126 In a variant calling pipeline, which tool is commonly used for local realignment around indels?

SAMtools
Picard
GATK IndelRealigner
FastQC
Explanation - IndelRealigner reduces misalignment errors near insertions or deletions. It belongs to GATK 3; GATK 4 dropped it because HaplotypeCaller performs local reassembly itself.
Correct answer is: GATK IndelRealigner

Q.127 What is the primary difference between 'FASTQ' and 'SAM' files?

FASTQ stores raw reads; SAM stores alignments to a reference genome
SAM stores raw reads; FASTQ stores alignments
Both are the same format but different names
FASTQ contains quality scores; SAM does not
Explanation - FASTQ is pre-alignment, while SAM/BAM are post-alignment files.
Correct answer is: FASTQ stores raw reads; SAM stores alignments to a reference genome

Q.128 What is a 'base quality score' used for?

Determining read length
Assessing confidence in a base call
Calculating GC content
Aligning reads to a reference
Explanation - Higher quality scores indicate lower probability of error.
Correct answer is: Assessing confidence in a base call

Q.129 Which of these is a typical output of a FastQC quality control report?

Per-base sequence quality histogram
Gene expression matrix
Variant call file (VCF)
Alignment file (BAM)
Explanation - FastQC reports include plots of sequence quality, GC content, and duplication levels.
Correct answer is: Per-base sequence quality histogram

Q.130 What is the purpose of 'dephasing' correction in Illumina sequencing?

To calibrate base quality scores
To account for errors introduced by incomplete base incorporation
To remove adapter sequences
To align reads to a reference genome
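Explanation - Dephasing is the loss of synchronization among the strands of a cluster over sequencing cycles, when some strands fall behind or run ahead; base callers estimate and correct for it.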
Correct answer is: To account for errors introduced by incomplete base incorporation

Q.131 What does a 'high duplication rate' in a sequencing experiment typically indicate?

High library complexity
PCR overamplification or low complexity of the library
High read quality
Low sequencing depth
Explanation - Duplicate reads often arise from repeated amplification of the same fragment.
Correct answer is: PCR overamplification or low complexity of the library

Q.132 Which of the following best describes the 'FASTA' file format?

Stores raw sequencing reads and quality scores
Stores aligned reads
Stores nucleotide sequences with optional headers
Stores variant annotations
Explanation - FASTA contains sequences preceded by a header line starting with '>'.
Correct answer is: Stores nucleotide sequences with optional headers

Q.133 What does the 'GATK HaplotypeCaller' tool do?

Assembles genomes de novo
Calls single nucleotide variants and small indels from aligned reads
Aligns reads to a reference
Calculates GC bias
Explanation - HaplotypeCaller uses local assembly to improve variant calling accuracy.
Correct answer is: Calls single nucleotide variants and small indels from aligned reads

Q.134 Which of the following best describes a 'structural variant' in genetics?

A single base change
An insertion or deletion of more than 50 base pairs
A substitution of a single base
A change in methylation status
Explanation - Structural variants include large insertions, deletions, inversions, and translocations.
Correct answer is: An insertion or deletion of more than 50 base pairs

Q.135 Which of the following best describes a 'phasing' problem in NGS data analysis?

Determining which variant alleles lie together on the same chromosome copy
Mapping reads to the reference genome
Removing duplicate reads
Trimming adapter sequences
Explanation - Phasing resolves which alleles co-occur on the same haplotype.
Correct answer is: Determining which variant alleles lie together on the same chromosome copy

Q.136 What does the 'coverage breadth' metric measure?

Average depth of coverage across all bases
Proportion of target bases that receive at least one read
Number of duplicate reads
Percentage of bases with high quality scores
Explanation - Coverage breadth indicates how much of the target region is represented in the data.
Correct answer is: Proportion of target bases that receive at least one read
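
Worked example - Coverage breadth is the number of target bases at or above a depth threshold divided by the total number of target bases. The Python sketch below computes it from per-base depths; the min_depth parameter and the name coverage_breadth are illustrative assumptions.

```python
def coverage_breadth(depths, min_depth=1):
    """Fraction of positions whose depth is at least `min_depth`."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


print(coverage_breadth([0, 3, 5, 0]))                  # breadth at >= 1x
print(coverage_breadth([0, 3, 5, 0], min_depth=5))     # breadth at >= 5x
```

Raising min_depth gives stricter figures such as breadth at 10x or 20x, which are often reported alongside the 1x value.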

Q.137 Which of the following best explains a 'soft clip' in a sequencing alignment?

The entire read is removed from the alignment
A portion of the read is excluded from the alignment but retained in the file
The read is marked as duplicate
The read is aligned to the reverse strand
Explanation - Soft clipping marks an unaligned portion at one or both ends of a read; the clipped bases are excluded from the alignment but remain in the record's sequence.
Correct answer is: A portion of the read is excluded from the alignment but retained in the file

Q.138 Which of these best describes the term 'mapping quality' (MAPQ)?

Confidence that a read maps to a unique location
Base quality of a read
Read depth at a position
GC content of a read
Explanation - MAPQ is a Phred-scaled estimate of the probability that the reported mapping position is wrong, so higher values indicate greater mapping certainty.
Correct answer is: Confidence that a read maps to a unique location

Q.139 Which of these best explains what a 'k-mer' is in computational genomics?

A type of adapter sequence
A short contiguous subsequence of length k nucleotides
A measure of base quality
A type of variant annotation
Explanation - K-mers are the fundamental units used in de Bruijn graph-based assembly.
Correct answer is: A short contiguous subsequence of length k nucleotides
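
Worked example - A sequence of length L contains L - k + 1 overlapping k-mers. The Python sketch below lists them; the function name kmers is illustrative.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmers("ACGTA", 3))
```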

Q.140 What is the main advantage of 'long-read' sequencing for detecting structural variants?

Higher base accuracy
Longer read lengths that can span complex regions
Lower cost
Higher throughput
Explanation - Long reads can directly cover breakpoints of large deletions, insertions, and rearrangements.
Correct answer is: Longer read lengths that can span complex regions

Q.141 Which of these metrics is used to evaluate the performance of a variant caller?

Sensitivity and specificity
Read length
GC bias
Duplication rate
Explanation - Sensitivity measures true positive rate; specificity measures true negative rate.
Correct answer is: Sensitivity and specificity
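
Worked example - Both metrics follow directly from counts obtained by comparing calls against a truth set. The Python sketch below shows the two formulas; the counts in the usage lines are made-up illustrative numbers, not results from any real caller.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only.
print(sensitivity(tp=90, fn=10))
print(specificity(tn=990, fp=10))
```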

Q.142 Which of the following best describes 'base quality score recalibration' in GATK?

Adjusting base quality scores based on systematic sequencing errors
Removing duplicate reads
Aligning reads to the reference genome
Calculating GC content
Explanation - Recalibration models error rates and updates quality scores accordingly.
Correct answer is: Adjusting base quality scores based on systematic sequencing errors

Q.143 What does the 'BWA-MEM' algorithm do?

Aligns short reads to a reference genome
Calls variants from aligned reads
Trims adapters from reads
Assembles genomes de novo
Explanation - BWA-MEM is a widely used alignment tool for short read mapping.
Correct answer is: Aligns short reads to a reference genome

Q.144 Which of the following best describes a 'gene fusion' event?

Two genes located on the same chromosome
A recombination event that joins exons from two separate genes
A single nucleotide variant in a gene
A copy number gain of a gene
Explanation - Gene fusions can create novel proteins and are common in cancers.
Correct answer is: A recombination event that joins exons from two separate genes

Q.145 Which of the following best explains 'coverage uniformity' in NGS?

Equal read length across all reads
Even distribution of read depth across the target region
Uniform base quality scores
Uniform GC content across reads
Explanation - Uniformity ensures reliable variant detection across the genome or target.
Correct answer is: Even distribution of read depth across the target region

Q.146 What is a 'soft clip' in a BAM file alignment?

A read that does not align
A read that aligns to the reverse strand
A portion of a read that does not align but is kept in the record (CIGAR operation 'S')
A read that is a duplicate
Explanation - Soft clipping indicates that part of the read is unaligned but still kept in the file.
Correct answer is: A portion of a read that does not align but is kept in the record (CIGAR operation 'S')

Q.147 Which of these tools is used to identify structural variants from paired-end read data?

BreakDancer
BWA-MEM
SAMtools
FastQC
Explanation - BreakDancer analyzes discordant read pairs to call large structural changes.
Correct answer is: BreakDancer

Q.148 What does 'SAM' stand for in the SAM file format?

Sequence Alignment/Map
Sequence Analysis Module
Single-Read Alignment Model
Sample Annotation Matrix
Explanation - SAM is the text-based format that stores read alignments to a reference.
Correct answer is: Sequence Alignment/Map

Q.149 Which of these best describes the 'Q' quality score in FASTQ files?

The quality of the sequencing machine
The probability of an error for a specific base call
The length of a read
The mapping quality of a read
Explanation - Phred quality scores quantify base call accuracy.
Correct answer is: The probability of an error for a specific base call

Q.150 What is a 'variant allele frequency' (VAF) of 0.1 typically indicative of in a diploid sample?

A homozygous variant
A heterozygous variant
A subclonal variant carried by only a fraction of cells
A sequencing error
Explanation - A VAF of 0.1 means only ~10% of reads support the alternate allele; for a heterozygous variant in a diploid region this corresponds to roughly 20% of cells carrying it.
Correct answer is: A subclonal variant carried by only a fraction of cells

Q.151 Which of the following is a typical output of a 'variant calling' step?

FASTA file
BAM file
VCF file
FASTQ file
Explanation - VCF stores identified variants, genotype information, and optional annotations.
Correct answer is: VCF file

Q.152 Which of the following best explains 'base quality score recalibration'?

Adjusting quality scores to account for systematic errors
Removing duplicate reads from the data
Aligning reads to a reference genome
Trimming adapter sequences
Explanation - Recalibration corrects for biases in base calling quality scores.
Correct answer is: Adjusting quality scores to account for systematic errors

Q.153 What does a 'high Q30 rate' indicate about a sequencing run?

High proportion of low-quality bases
High proportion of bases with a quality score of at least 30
Low read length
High duplication rate
Explanation - A Q30 rate >80% suggests good sequencing quality.
Correct answer is: High proportion of bases with a quality score of at least 30

Q.154 In RNA-Seq, what is the purpose of 'poly-A selection'?

Enrich for messenger RNA by removing ribosomal RNA
Add adapters to RNA fragments
Sequence DNA instead of RNA
Degrade all RNA species
Explanation - Poly-A selection captures the polyadenylated tail of mRNAs, reducing rRNA contamination.
Correct answer is: Enrich for messenger RNA by removing ribosomal RNA

Q.155 Which of the following best describes an 'insertion' variant?

Deletion of one or more nucleotides
Addition of one or more nucleotides
Substitution of a single base
Duplication of an entire chromosome
Explanation - Insertions add nucleotides relative to the reference sequence.
Correct answer is: Addition of one or more nucleotides

Q.156 What is the main purpose of a 'variant annotation' step?

To remove duplicate reads
To assign functional context to variants (e.g., gene impact)
To align reads to a reference genome
To trim adapters
Explanation - Annotation tools link variants to genes, proteins, and known disease associations.
Correct answer is: To assign functional context to variants (e.g., gene impact)

Q.157 Which of the following best describes 'GC bias' in NGS?

Uneven coverage of genomic regions with high or low GC content
Error in GC content calculation
Overrepresentation of GC-rich fragments in the library
Underrepresentation of AT-rich fragments
Explanation - GC bias leads to coverage dropouts in extreme GC regions.
Correct answer is: Uneven coverage of genomic regions with high or low GC content

Q.158 Which of the following best describes a 'paired-end' sequencing run?

Sequencing both ends of each DNA fragment
Sequencing a single end of each DNA fragment
Sequencing two different fragments independently
Sequencing one end of each fragment twice
Explanation - Paired-end reads provide two sequences per fragment, improving mapping.
Correct answer is: Sequencing both ends of each DNA fragment

Q.159 Which of these best explains a 'k-mer' in computational genomics?

A short contiguous subsequence of length k nucleotides
A type of adapter sequence
A quality score for a base
A variant annotation
Explanation - K-mers are used in de Bruijn graph assembly algorithms.
Correct answer is: A short contiguous subsequence of length k nucleotides

Q.160 What does the 'coverage breadth' metric assess in a sequencing experiment?

The average depth across all positions
The fraction of targeted bases that receive at least one read
The total number of reads sequenced
The proportion of duplicates
Explanation - Coverage breadth indicates how completely a target region is sequenced.
Correct answer is: The fraction of targeted bases that receive at least one read

Q.161 Which of these best describes the term 'variant allele frequency' (VAF)?

The percentage of reads supporting the alternative allele
The depth of coverage at a variant site
The number of duplicates at a variant site
The quality score of a variant call
Explanation - VAF is calculated as alternate reads divided by total reads at that locus.
Correct answer is: The percentage of reads supporting the alternative allele
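
Worked example - VAF is the number of reads supporting the alternate allele divided by the total reads at the locus. The Python sketch below applies that definition; the function name vaf and the read counts are illustrative.

```python
def vaf(alt_reads, total_reads):
    """Variant allele frequency: alternate-supporting reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


print(vaf(10, 100))  # low-fraction variant
print(vaf(50, 100))  # consistent with a heterozygous call in a diploid genome
```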

Q.162 Which of the following is NOT a typical output of the 'FastQC' tool?

Per-base sequence quality plot
GC content histogram
Alignment file (BAM)
Duplication level plot
Explanation - FastQC provides quality reports; it does not produce aligned reads.
Correct answer is: Alignment file (BAM)

Q.163 Which of these tools is commonly used for detecting copy number variations from NGS data?

GATK
CNVnator
BLAST
FastQC
Explanation - CNVnator uses read depth to call copy number variations.
Correct answer is: CNVnator

Q.164 What is the main purpose of performing a 'duplicate marking' step after alignment?

Flag PCR duplicates to avoid bias in variant calling
Trim adapter sequences
Call variants
Align reads to reference
Explanation - Duplicate marking flags redundant reads so that variant callers can ignore them.
Correct answer is: Flag PCR duplicates to avoid bias in variant calling

Q.165 Which of the following best describes 'Read Length' in sequencing?

The length of the DNA fragment in the library
The number of bases in a single sequenced read
The average coverage depth
The number of reads sequenced
Explanation - Read length is the sequence length produced by the instrument per read.
Correct answer is: The number of bases in a single sequenced read

Q.166 What does the 'FASTQ' file format store?

Raw sequencing reads and per-base quality scores
Aligned reads with mapping information
Variant calls
Genome assembly
Explanation - FASTQ includes both sequence and quality information for each read.
Correct answer is: Raw sequencing reads and per-base quality scores

Q.167 Which of these steps is part of the standard NGS workflow before variant calling?

Read trimming and quality filtering
PCR amplification
Genome assembly
Protein structure prediction
Explanation - Trimming removes low-quality bases and adapters before alignment.
Correct answer is: Read trimming and quality filtering

Q.168 Which of the following best describes an 'assembly graph' used in de novo assembly?

A graph of read alignment positions
A graph where nodes represent k-mers and edges represent overlaps
A graph of variant positions
A graph of gene expression levels
Explanation - De Bruijn graphs represent the relationships between k-mers for assembly.
Correct answer is: A graph where nodes represent k-mers and edges represent overlaps
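
Worked example - The Python sketch below builds a small graph in the form the answer describes: each node is a k-mer, and an edge joins two k-mers when the last k-1 bases of one match the first k-1 bases of the other. This is a simplified illustration with a hypothetical function name; many assemblers instead use (k-1)-mers as nodes and k-mers as edges, and they also handle sequencing errors and reverse complements.

```python
def build_kmer_graph(reads, k):
    """Map each k-mer to the set of k-mers that overlap it by k-1 bases."""
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])

    graph = {node: set() for node in nodes}
    for node in nodes:
        # A successor shares the node's last k-1 bases and adds one new base.
        for base in "ACGT":
            candidate = node[1:] + base
            if candidate in nodes:
                graph[node].add(candidate)
    return graph


graph = build_kmer_graph(["ACGTA"], 3)
for node in sorted(graph):
    print(node, sorted(graph[node]))
```

Walking the edges from ACG to CGT to GTA spells out the original sequence ACGTA, which is the idea behind reconstructing contigs from such a graph.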