Transcriptomics MCQs Practice Set

Q.1 What does the RNA-Seq technique primarily measure in a cell?

DNA methylation patterns
mRNA abundance
Protein–protein interactions
Cell membrane potential
Explanation - RNA-Seq quantifies the transcriptome by sequencing cDNA, providing counts of mRNA molecules.
Correct answer is: mRNA abundance

Q.2 Which library preparation step removes ribosomal RNA from total RNA?

Poly(A) enrichment
Ribo-Zero depletion
Fragmentation
Adapter ligation
Explanation - Ribo-Zero uses probes that hybridize to rRNA so it can be removed, leaving the other RNAs to be sequenced.
Correct answer is: Ribo-Zero depletion

Q.3 What is the main purpose of aligning RNA-Seq reads to a reference genome?

To assemble new genes
To identify splice junctions
To estimate DNA copy number
To measure protein levels
Explanation - Alignment places reads on the genome and reveals where exons join, indicating splicing events.
Correct answer is: To identify splice junctions

Q.4 Which statistic is commonly used to test differential expression between two conditions?

Pearson correlation
Log fold change
p-value from DESeq2
Euclidean distance
Explanation - DESeq2 models count data and provides p-values (often adjusted) to identify differentially expressed genes.
Correct answer is: p-value from DESeq2

Q.5 What does the FPKM metric represent?

Fragments per kilobase of transcript per million mapped reads
Fragments per kilobase of gene per million reads
Full-length per kilobase per million reads
Functional protein per kilobase per million reads
Explanation - FPKM normalizes read counts by transcript length and sequencing depth.
Correct answer is: Fragments per kilobase of transcript per million mapped reads
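
Worked example - The normalization described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code of any particular quantification tool; the function name `fpkm` and the input numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_fragments / 1_000_000
    return fragments / (length_kb * mapped_millions)


# Hypothetical transcript: 500 fragments, 2,000 bp long, 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
print(example_fpkm)
```

Dividing by length in kilobases corrects for longer transcripts yielding more fragments; dividing by millions of mapped fragments corrects for sequencing depth.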

Q.6 Which algorithm is specifically designed for de novo transcriptome assembly?

SPAdes
Trinity
BLAST
GATK
Explanation - Trinity constructs transcripts from short reads without a reference genome, ideal for de novo assembly.
Correct answer is: Trinity

Q.7 In single-cell RNA-Seq, what is the primary source of technical noise?

Ambient RNA contamination
Cell cycle variation
Sequencing error rate
PCR amplification bias
Explanation - Amplification can skew representation of transcripts, creating variability unrelated to biology.
Correct answer is: PCR amplification bias

Q.8 What does the 'UMI' stand for in scRNA-Seq protocols?

Unique Molecular Identifier
Universal Marker Index
Unpaired mRNA Index
Unstructured Metagenomic Index
Explanation - UMIs tag each RNA molecule before amplification, allowing accurate quantification by distinguishing duplicates.
Correct answer is: Unique Molecular Identifier

Q.9 Which of the following best describes a 'splicing event' detected by RNA-Seq?

A gene duplication
A transcription start site change
The removal of intronic sequences
A protein-protein interaction
Explanation - Splicing joins exons, removing introns, creating mature mRNA variants.
Correct answer is: The removal of intronic sequences

Q.10 What does a negative log2 fold change in a differential expression analysis indicate?

Higher expression in the control sample
Higher expression in the experimental sample
No change in expression
An error in the analysis
Explanation - A negative log2 fold change means the gene is expressed more in the reference (control) condition.
Correct answer is: Higher expression in the control sample
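
Worked example - A short Python sketch of the sign convention, assuming the ratio is taken as experimental over control. The pseudocount of 1 is an assumption added here to avoid division by zero; tools such as DESeq2 estimate fold changes with a statistical model rather than this simple ratio.

```python
import math


def log2_fold_change(experimental, control, pseudocount=1.0):
    """log2 of (experimental / control), with a pseudocount on both values."""
    return math.log2((experimental + pseudocount) / (control + pseudocount))


# Hypothetical normalized counts for one gene.
down_in_experimental = log2_fold_change(experimental=24, control=99)
up_in_experimental = log2_fold_change(experimental=99, control=24)
print(down_in_experimental, up_in_experimental)
```

A negative value means the denominator (control) has the higher expression; swapping the two conditions flips the sign.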

Q.11 Which sequencing platform is known for producing the longest read lengths?

Illumina HiSeq
PacBio Sequel II
Oxford Nanopore MinION
SOLiD 5500
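Explanation - Nanopore read length is limited mainly by the length of the input molecule, and ultra-long reads exceeding 1 Mb have been reported. PacBio SMRT reads routinely exceed 10 kb but are generally shorter; both platforms can capture full-length transcripts.
Correct answer is: Oxford Nanopore MinION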

Q.12 What is the primary advantage of using a paired-end RNA-Seq protocol?

Higher sequencing throughput
Improved mapping accuracy
Reduced library preparation time
Lower sequencing error rate
Explanation - Paired-end reads provide information from both ends of a fragment, improving alignment, especially across splice junctions.
Correct answer is: Improved mapping accuracy

Q.13 Which of the following is NOT a common step in RNA-Seq data preprocessing?

Quality trimming
Adapter removal
Read alignment
Chromatin immunoprecipitation
Explanation - ChIP is unrelated to RNA-Seq; preprocessing focuses on cleaning reads and aligning them.
Correct answer is: Chromatin immunoprecipitation

Q.14 What does the term 'coverage' refer to in the context of RNA-Seq?

Number of genes expressed
Depth of sequencing across the transcriptome
Percentage of the genome covered by reads
Frequency of splice junctions
Explanation - Coverage indicates how many reads map to a region, influencing quantification accuracy.
Correct answer is: Depth of sequencing across the transcriptome

Q.15 Which statistical correction is commonly applied to control the false discovery rate in DEGs analysis?

Bonferroni correction
Benjamini–Hochberg procedure
Tukey's HSD
Friedman test
Explanation - BH adjusts p-values to control the expected proportion of false positives among significant genes.
Correct answer is: Benjamini–Hochberg procedure
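
Worked example - The adjustment can be sketched in plain Python as below. This is a minimal illustration of the step-up procedure, with hypothetical p-values; in practice the adjusted values are usually taken from the differential expression tool's output.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Genes whose adjusted p-value falls below the chosen threshold (for example 0.05) are reported as significant at that false discovery rate.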

Q.16 In bulk RNA-Seq, what does 'biological replicates' refer to?

Repeated sequencing of the same sample
Independent samples from the same experimental group
Technical duplicates within a single sample
Synthetic RNA spike-ins
Explanation - Biological replicates capture natural variation between organisms or cells, essential for robust statistics.
Correct answer is: Independent samples from the same experimental group

Q.17 Which of the following is a key difference between microarrays and RNA-Seq?

RNA-Seq requires pre-designed probes
Microarrays can quantify novel transcripts
RNA-Seq is more sensitive to low-abundance transcripts
Microarrays generate sequence data
Explanation - RNA-Seq's digital counts and wide dynamic range allow detection of transcripts present in low amounts, whereas microarrays are limited by probe design and background hybridization.
Correct answer is: RNA-Seq is more sensitive to low-abundance transcripts

Q.18 What does the 'gene body coverage' metric assess?

Uniformity of read distribution across a gene
Presence of splice variants
GC content bias
Sequencing error rate
Explanation - Gene body coverage shows how evenly reads are distributed from the 5' to the 3' end of transcripts; a skew toward one end points to RNA degradation or library-preparation bias.
Correct answer is: Uniformity of read distribution across a gene

Q.19 Which tool is commonly used to produce raw gene-level read counts from RNA-Seq read alignments?

Cufflinks
HTSeq-count
STAR
BWA
Explanation - HTSeq-count tallies reads overlapping annotated exons to produce raw counts per gene.
Correct answer is: HTSeq-count

Q.20 What is a 'barcode' in the context of multiplexed RNA-Seq libraries?

A unique sequence identifying each sample
A fluorescent tag for detection
A DNA sequence that fragments RNA
A computational filter for low-quality reads
Explanation - Barcodes allow pooling multiple samples in one sequencing run, enabling demultiplexing after sequencing.
Correct answer is: A unique sequence identifying each sample
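
Worked example - Demultiplexing can be sketched in Python as below. This is a simplified illustration that assumes each read arrives paired with its index (barcode) sequence and that only exact matches are accepted; the barcodes, sample names, and function name are hypothetical, and production demultiplexers typically also tolerate a small number of mismatches.

```python
def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using exact barcode matches.

    reads: iterable of (index_sequence, read_sequence) pairs.
    barcode_to_sample: dict mapping barcode sequence to sample name.
    """
    by_sample = {}
    for index_sequence, read_sequence in reads:
        # Reads with an unknown barcode are kept in a separate bin.
        sample = barcode_to_sample.get(index_sequence, "undetermined")
        by_sample.setdefault(sample, []).append(read_sequence)
    return by_sample


barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled = [
    ("ACGT", "GGGTTT"),
    ("TGCA", "CCCAAA"),
    ("ACGT", "TTTGGG"),
    ("AAAA", "NNNNNN"),
]
demultiplexed = demultiplex(pooled, barcodes)
print(demultiplexed)
```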

Q.21 Why is strandedness important in RNA-Seq library preparation?

It determines the direction of transcription
It reduces sequencing errors
It increases read length
It simplifies alignment
Explanation - Stranded libraries retain information about which DNA strand the RNA originated from, aiding annotation and detection of overlapping genes.
Correct answer is: It determines the direction of transcription

Q.22 What does the term 'dropout' refer to in single-cell RNA-Seq data?

Loss of a cell during sample preparation
Failure to sequence a read
Zero counts for a gene in a cell due to low capture efficiency
Removal of low-quality reads
Explanation - Dropouts arise when a transcript is not captured or amplified, leading to false zeros.
Correct answer is: Zero counts for a gene in a cell due to low capture efficiency

Q.23 Which of the following best describes a 'batch effect'?

Systematic technical differences between sequencing runs
Biological variation between samples
Random sequencing errors
A computational artifact from alignment
Explanation - Batch effects arise from varying protocols, reagents, or machines, confounding true biological signals.
Correct answer is: Systematic technical differences between sequencing runs

Q.24 What is the purpose of the 'spike-in' controls in RNA-Seq?

To calibrate sequencing machine fluorescence
To provide an external reference for normalization
To sequence DNA instead of RNA
To remove ribosomal RNA
Explanation - Synthetic RNAs of known quantity allow assessment of library preparation efficiency and normalization.
Correct answer is: To provide an external reference for normalization

Q.25 Which type of alternative splicing results in the inclusion of a cassette exon?

Exon skipping
Alternative 5' splice site
Alternative 3' splice site
Mutually exclusive exons
Explanation - Exon skipping is the most common form where an exon is included in some isoforms and omitted in others.
Correct answer is: Exon skipping

Q.26 In the context of RNA-Seq, what does 'pseudoalignment' refer to?

Aligning reads to a reference genome without base-level alignment
Aligning reads to a genome after removing adapters
A method for aligning reads to a reference transcriptome quickly
A technique for aligning reads to a synthetic reference
Explanation - Tools like Kallisto or Salmon use pseudoalignment to estimate transcript abundance efficiently.
Correct answer is: A method for aligning reads to a reference transcriptome quickly

Q.27 Which metric is used to evaluate the precision of a differential expression analysis?

False discovery rate
Sensitivity
Specificity
Positive predictive value
Explanation - PPV measures the proportion of identified DEGs that are truly differentially expressed.
Correct answer is: Positive predictive value

Q.28 What is the main challenge when comparing expression across species using RNA-Seq?

Different sequencing depths
Ortholog mapping ambiguity
Variable ribosomal RNA content
Differences in GC content
Explanation - Accurate cross-species comparisons require reliable identification of orthologous genes, which can be ambiguous.
Correct answer is: Ortholog mapping ambiguity

Q.29 Which computational method is commonly used to cluster single-cell RNA-Seq data?

k-means clustering
Hierarchical clustering
PCA followed by graph-based clustering
All of the above
Explanation - Multiple clustering strategies, often combined with dimensionality reduction, are used to identify cell types.
Correct answer is: All of the above

Q.30 Which type of RNA is primarily captured by 3' end counting methods in single-cell sequencing?

Full-length mRNA
Polyadenylated fragments
Non-coding RNAs
MicroRNAs
Explanation - 3' end methods like Drop‑seq capture the 3' UTR of polyadenylated transcripts for counting.
Correct answer is: Polyadenylated fragments

Q.31 What is the effect of high GC content on RNA-Seq library preparation?

Increased read length
Increased PCR bias
Increased sequencing error
Improved mapping accuracy
Explanation - GC-rich regions amplify less efficiently during PCR, so they are more prone to amplification bias and dropouts, leading to uneven coverage.
Correct answer is: Increased PCR bias

Q.32 Which of the following best defines a 'transcript isoform'?

A different gene
A variant of mRNA produced by alternative splicing
A non-coding RNA
A protein variant
Explanation - Isoforms are distinct mRNA sequences derived from the same gene locus via alternative splicing or promoter usage.
Correct answer is: A variant of mRNA produced by alternative splicing

Q.33 Why is rRNA depletion more suitable than poly(A) selection for prokaryotic species in RNA-Seq?

They lack poly(A) tails
They have a higher proportion of rRNA
They express fewer mRNAs
They have shorter transcripts
Explanation - Prokaryotic mRNAs generally lack the stable poly(A) tails found on eukaryotic mRNAs, making poly(A) selection ineffective.
Correct answer is: They lack poly(A) tails

Q.34 Which of these is a downstream analysis after identifying differential expression?

Gene ontology enrichment
Clustering of samples
Pathway analysis
All of the above
Explanation - All listed analyses help interpret biological meaning of DEGs.
Correct answer is: All of the above

Q.35 What is the primary source of variability in bulk RNA-Seq data?

Biological differences between samples
Sequencing machine errors
Library preparation kit brand
All of the above
Explanation - While technical factors exist, true biological variation is the dominant source in properly controlled experiments.
Correct answer is: Biological differences between samples

Q.36 Which of the following is a common artifact in low-input RNA-Seq libraries?

High duplication rate
Excessive adapter dimers
Uniform coverage across transcripts
Accurate quantification of rare transcripts
Explanation - Low amounts of starting material lead to overamplification and duplicate reads.
Correct answer is: High duplication rate

Q.37 Which database contains annotated transcriptomes for multiple species?

RefSeq
Ensembl
UCSC Genome Browser
All of the above
Explanation - All these resources provide curated transcript annotations across species.
Correct answer is: All of the above

Q.38 Which factor primarily determines the depth of coverage in an RNA-Seq experiment?

Read length
Sequencing throughput
Library complexity
All of the above
Explanation - Depth depends on read length, total output, and how many unique fragments are represented.
Correct answer is: All of the above

Q.39 In bulk RNA-Seq, what does the 'fold change' represent?

The ratio of two expression levels
The difference in read counts
The standard deviation of counts
The logarithm of the expression level
Explanation - Fold change quantifies how many times a gene's expression differs between conditions.
Correct answer is: The ratio of two expression levels

Q.40 Which of the following best describes a 'non-annotated transcript'?

A transcript lacking a known gene symbol
A transcript with unknown function
A transcript found only in cancer
A transcript not expressed in the sample
Explanation - Non-annotated transcripts are novel or poorly characterized RNAs not yet in reference databases.
Correct answer is: A transcript lacking a known gene symbol

Q.41 What is the main purpose of a 'quantification bias' correction?

To adjust for GC bias in read counts
To remove duplicate reads
To align reads more accurately
To increase sequencing depth
Explanation - Bias correction improves quantification by accounting for sequence-dependent coverage differences.
Correct answer is: To adjust for GC bias in read counts

Q.42 Which of the following describes the '3' bias' observed in RNA-Seq libraries?

Enrichment of reads toward the 3' end of transcripts
Preferential amplification of 3' UTRs
Loss of reads at the 3' end during sequencing
Increased error rates at the 3' end
Explanation - Certain library preparations favor sequencing near the poly(A) tail, creating a 3' bias.
Correct answer is: Enrichment of reads toward the 3' end of transcripts

Q.43 What is a 'spike-in normalization' method used for?

Adjusting for differences in sequencing depth
Removing adapter contamination
Correcting mapping errors
Identifying splice junctions
Explanation - Known quantities of spike-ins allow normalization across samples independent of endogenous RNA levels.
Correct answer is: Adjusting for differences in sequencing depth

Q.44 Which of the following is NOT a typical output of RNA-Seq analysis?

Differentially expressed genes
Gene co‑expression modules
Protein‑protein interaction networks
Splicing patterns
Explanation - PPINs are derived from proteomics data; RNA-Seq informs gene expression and splicing.
Correct answer is: Protein‑protein interaction networks

Q.45 What does the term 'read duplication rate' indicate?

Proportion of identical reads indicating PCR duplicates
Number of reads mapping to the same gene
Rate of adapter removal
Sequencing error frequency
Explanation - High duplication rates can indicate overamplification or low library complexity.
Correct answer is: Proportion of identical reads indicating PCR duplicates

Q.46 Which of the following best explains 'ambient RNA contamination' in droplet‑based scRNA-Seq?

Free RNA from lysed cells present in the solution
RNA trapped in droplets during library prep
Sequencing error in cell barcodes
Cross‑talk between fluorescent channels
Explanation - Ambient RNA can be captured in droplets, artificially inflating expression of some genes.
Correct answer is: Free RNA from lysed cells present in the solution

Q.47 Why is it important to include a 'negative control' in RNA-Seq experiments?

To assess library preparation efficiency
To monitor background contamination
To calibrate the sequencer
To provide a reference for normalization
Explanation - Negative controls help identify contaminating sequences not derived from biological samples.
Correct answer is: To monitor background contamination

Q.48 Which of the following is a limitation of 2'-O-methyl RNA probes in hybrid capture?

Lower binding affinity
Increased off‑target capture
Higher cost per probe
Incompatibility with Illumina platforms
Explanation - 2'-O-methyl probes can cross‑hybridize to similar sequences, reducing specificity.
Correct answer is: Increased off‑target capture

Q.49 What does the 'UMI deduplication' step achieve in scRNA-Seq data processing?

Removes PCR duplicates using unique molecule identifiers
Merges reads with identical barcodes
Identifies doublets
Normalizes expression across cells
Explanation - UMIs allow distinguishing unique molecules from amplified copies.
Correct answer is: Removes PCR duplicates using unique molecule identifiers
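
Worked example - The idea can be sketched in Python by counting distinct UMIs per cell and gene. This is a minimal illustration that assumes each read has already been assigned a cell barcode, a gene, and a UMI, and that UMIs are compared by exact match only (no correction for sequencing errors in the UMI); the data and function name are hypothetical.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count unique UMIs for every (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        # Reads sharing cell, gene, and UMI are treated as PCR copies.
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("cell1", "GAPDH", "AACC"),
    ("cell1", "GAPDH", "AACC"),  # PCR duplicate of the read above
    ("cell1", "GAPDH", "GGTT"),
    ("cell2", "GAPDH", "AACC"),
]
counts = umi_counts(reads)
print(counts)
```

Four reads collapse to three molecules: two for GAPDH in cell1 and one in cell2.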

Q.50 Which statistical model is commonly used by DESeq2 to estimate variance?

Negative binomial distribution
Poisson distribution
Gaussian distribution
Binomial distribution
Explanation - RNA‑Seq counts follow a NB distribution, capturing overdispersion beyond Poisson.
Correct answer is: Negative binomial distribution
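
Worked example - Overdispersion can be illustrated with the mean-variance relationship of the negative binomial, variance = mean + dispersion × mean², which is the parameterization DESeq2 uses. The sketch below only evaluates that formula; the function name and the numbers are hypothetical.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


# With zero dispersion the variance equals the mean, as for a Poisson.
poisson_like = nb_variance(100.0, 0.0)
# A dispersion of 0.1 inflates the variance well beyond the mean.
overdispersed = nb_variance(100.0, 0.1)
print(poisson_like, overdispersed)
```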

Q.51 What is the primary difference between 'bulk' and 'single‑cell' RNA‑Seq?

Sequencing platform
Library complexity
Resolution of expression measurement
Cost per sample
Explanation - Bulk averages across many cells; scRNA‑Seq resolves heterogeneity at single‑cell level.
Correct answer is: Resolution of expression measurement

Q.52 Which of the following best describes 'transcriptome assembly quality' metrics?

Number of contigs longer than 500 bp
N50 value of assembled transcripts
Coverage uniformity
All of the above
Explanation - These metrics assess continuity, completeness, and uniformity of the assembled transcriptome.
Correct answer is: All of the above
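
Worked example - N50 can be computed with a few lines of Python: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The contig lengths below are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig or transcript lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest contig until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


contig_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(contig_lengths)
print(example_n50)
```

For transcriptome assemblies N50 should be read with care, since many short but genuine transcripts can lower it without indicating a poor assembly.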

Q.53 What is 'read trimming' in the context of RNA‑Seq preprocessing?

Removing low‑quality bases and adapters
Cutting reads to a fixed length
Removing duplicate reads
Aligning reads to a reference
Explanation - Trimming ensures that only high‑confidence bases are used for mapping.
Correct answer is: Removing low‑quality bases and adapters

Q.54 Which of the following tools is designed for visualizing gene expression heatmaps?

Heatmaply
Cytoscape
IGV
FastQC
Explanation - Heatmaply generates interactive heatmaps from gene expression matrices.
Correct answer is: Heatmaply

Q.55 Why is it essential to keep RNA samples on ice during library preparation?

To prevent RNase activity
To maintain enzyme activity
To avoid denaturation of adapters
All of the above
Explanation - RNases are active at room temperature, degrading RNA if samples warm.
Correct answer is: To prevent RNase activity

Q.56 Which of the following best defines a 'read alignment score'?

Number of mismatches in a read
Probability that a read belongs to a given transcript
Sum of alignment penalties and bonuses
The length of the longest match
Explanation - Alignment score aggregates match/mismatch and gap penalties to rank alignments.
Correct answer is: Sum of alignment penalties and bonuses
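
Worked example - A toy scoring function in Python, assuming the read and reference are already aligned to equal length with "-" marking gaps. The scoring values (match +1, mismatch -1, gap -2) are hypothetical, and real aligners usually apply affine gap penalties rather than this fixed per-position penalty.

```python
def alignment_score(read, reference, match=1, mismatch=-1, gap=-2):
    """Sum per-column bonuses and penalties for a gapped pairwise alignment."""
    if len(read) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(read, reference):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


perfect = alignment_score("ACGT", "ACGT")
one_mismatch = alignment_score("ACGT", "ACCT")
one_gap = alignment_score("ACG-T", "ACGTT")
print(perfect, one_mismatch, one_gap)
```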

Q.57 What is the primary benefit of using 'poly(A) selection' over rRNA depletion?

Enrichment of non‑coding RNAs
Higher coverage of low‑abundance transcripts
Simplicity and lower cost
Inclusion of ribosomal RNA sequences
Explanation - Poly(A) selection is straightforward and inexpensive for mRNA enrichment in eukaryotes.
Correct answer is: Simplicity and lower cost

Q.58 What does the term 'sequencing depth' refer to in an RNA-Seq experiment?

Number of bases sequenced per sample
Total number of reads generated
Average read length
Coverage of the genome
Explanation - Depth indicates how many reads are produced, influencing statistical power.
Correct answer is: Total number of reads generated

Q.59 Which of the following best explains the concept of 'allele‑specific expression'?

Preferential expression of one allele over the other in a heterozygote
Expression of both alleles equally
Expression of multiple alleles
Expression of only the minor allele
Explanation - ASE studies detect imbalanced expression between the maternal and paternal alleles; expression of only one allele (monoallelic expression) is the extreme case.
Correct answer is: Preferential expression of one allele over the other in a heterozygote

Q.60 Which step is critical to avoid batch effects when processing multiple RNA-Seq libraries?

Randomizing sample processing order
Using the same reagent lot for all samples
Sequencing all libraries in one run
All of the above
Explanation - Standardizing protocols and randomizing helps reduce systematic technical variation.
Correct answer is: All of the above

Q.61 What does the 'coverage uniformity' metric assess in RNA‑Seq data?

Consistency of read depth across different genes
Evenness of read distribution across a single transcript
Coverage of intergenic regions
Coverage of the mitochondrial genome
Explanation - Uniform coverage indicates balanced representation of all transcript regions.
Correct answer is: Evenness of read distribution across a single transcript

Q.62 Which of the following is a key advantage of using a stranded library for RNA‑Seq?

Better detection of antisense transcripts
Reduced sequencing cost
Shorter read lengths
Elimination of ribosomal RNA
Explanation - Stranded libraries preserve directionality, allowing discrimination between sense and antisense RNAs.
Correct answer is: Better detection of antisense transcripts

Q.63 In RNA-Seq analysis, what is the purpose of a 'gene set enrichment analysis (GSEA)'?

To identify differentially expressed genes
To assess whether predefined gene sets show statistically significant differences
To normalize read counts
To map reads to the genome
Explanation - GSEA evaluates pathway-level changes rather than single‑gene tests.
Correct answer is: To assess whether predefined gene sets show statistically significant differences

Q.64 What does the 'GC bias' in RNA‑Seq refer to?

Preference for GC-rich sequences during sequencing
Underrepresentation of GC‑rich transcripts after PCR amplification
Increased sequencing errors in GC‑rich regions
All of the above
Explanation - GC bias leads to uneven coverage, impacting quantification.
Correct answer is: Underrepresentation of GC‑rich transcripts after PCR amplification

Q.65 Which of the following is a typical quality metric reported by FastQC for RNA‑Seq data?

Per‑base sequence quality
Adapter content
Overrepresented sequences
All of the above
Explanation - FastQC assesses multiple facets of raw read quality.
Correct answer is: All of the above

Q.66 What is the primary reason for targeting poly(A) tails (poly(A) selection) in mRNA library preparation?

To enrich for non-coding RNAs
To capture only transcripts with polyadenylation
To avoid ribosomal RNA contamination
To improve read length
Explanation - Poly(A) selection isolates mature, polyadenylated mRNA while excluding rRNA and other RNAs that lack a poly(A) tail.
Correct answer is: To capture only transcripts with polyadenylation

Q.67 Which of the following best describes the 'fragment size distribution' in a paired‑end library?

Length of reads produced
Range of DNA fragment lengths before sequencing
Number of fragments per sample
GC content of fragments
Explanation - Fragment size distribution informs library complexity and sequencing strategy.
Correct answer is: Range of DNA fragment lengths before sequencing

Q.68 What does the 'median absolute deviation (MAD)' measure in read count data?

Average read length
Variance of gene expression
Robust measure of spread around the median
Correlation between samples
Explanation - MAD is less influenced by outliers compared to standard deviation.
Correct answer is: Robust measure of spread around the median
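
Worked example - MAD can be computed with the Python standard library as sketched below. The counts are hypothetical and include one extreme value to show the robustness to outliers; no scaling constant is applied.

```python
from statistics import median, stdev


def mad(values):
    """Median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical counts for one gene across five samples, one an outlier.
counts = [10, 12, 11, 13, 500]
example_mad = mad(counts)
example_sd = stdev(counts)
print(example_mad, example_sd)
```

The single outlier leaves the MAD small while inflating the standard deviation by orders of magnitude.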

Q.69 Which of the following best explains why 'UMI counts' are used instead of raw read counts?

To reduce the effect of PCR duplicates
To increase sequencing depth
To correct for GC bias
To simplify downstream analysis
Explanation - UMIs enable counting unique RNA molecules regardless of amplification duplicates.
Correct answer is: To reduce the effect of PCR duplicates

Q.70 What is the purpose of 'batch correction' in scRNA‑Seq data integration?

To merge data from different donors
To remove technical variability across batches
To identify rare cell types
To increase read depth
Explanation - Batch correction aligns datasets onto a common space, improving clustering.
Correct answer is: To remove technical variability across batches

Q.71 Which of these is a potential source of contamination in RNA‑Seq library prep?

RNase-free water
Contaminated pipette tips
UV‑cross‑linked adapters
Heat‑stable enzymes
Explanation - Contaminants from pipette tips can introduce exogenous RNA or DNA.
Correct answer is: Contaminated pipette tips

Q.72 What does the 'Cohen's kappa' statistic evaluate in transcript annotation?

Agreement between two annotators
Variance of read counts
Proportion of reads mapping to exons
Accuracy of splice junction detection
Explanation - Cohen’s kappa measures inter‑rater reliability for categorical annotations.
Correct answer is: Agreement between two annotators

Q.73 Which of the following best describes 'gene fusions' detected by RNA-Seq?

Transcriptional read-through events
Fusion of two separate genes into one transcript
Alternative splicing within a single gene
Mitochondrial gene recombination
Explanation - Gene fusions produce chimeric RNAs combining exons from different loci, often oncogenic.
Correct answer is: Fusion of two separate genes into one transcript

Q.74 Which of the following is a key feature of the 'Salmon' quantification tool?

Alignment‑free pseudo‑alignment
Built‑in differential expression analysis
Requires full‑length cDNA libraries
Designed for microarray data
Explanation - Salmon maps reads directly to a transcriptome index without producing full genome alignments, giving fast, bias-aware transcript quantification.
Correct answer is: Alignment‑free pseudo‑alignment

Q.75 Why is it important to keep RNA samples at −80 °C after extraction?

To prevent enzymatic degradation
To preserve RNA integrity
Both A and B
To reduce sequencing costs
Explanation - Low temperatures halt RNase activity and maintain high‑quality RNA.
Correct answer is: Both A and B

Q.76 Which of the following best describes a 'UMI collision'?

Two distinct molecules receiving the same UMI by chance
A UMI being lost during sequencing
A UMI not matching any read
A UMI being too long
Explanation - Collisions can lead to undercounting if distinct molecules share a UMI.
Correct answer is: Two distinct molecules receiving the same UMI by chance
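
Worked example - The chance of at least one collision can be estimated in Python with a birthday-problem calculation. This sketch assumes all 4^L UMI sequences of length L are equally likely and independently assigned, which is a simplification; the function name and inputs are hypothetical.

```python
def collision_probability(n_molecules, umi_length):
    """Probability that at least two molecules share a UMI by chance."""
    n_umis = 4 ** umi_length
    if n_molecules > n_umis:
        return 1.0  # more molecules than UMIs guarantees a collision
    p_all_unique = 1.0
    for i in range(n_molecules):
        # The (i + 1)-th molecule must avoid the i UMIs already in use.
        p_all_unique *= (n_umis - i) / n_umis
    return 1.0 - p_all_unique


# Hypothetical case: molecules of one gene in one cell, 10-base UMIs.
few = collision_probability(10, 10)
many = collision_probability(1_000, 10)
print(few, many)
```

The probability rises quickly with the number of molecules sharing a cell and gene, which is why longer UMIs are preferred for highly expressed genes.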

Q.77 What is the main goal of a 'transcript‑level differential expression' analysis?

To identify differences at the gene level
To assess expression changes between isoforms
To estimate read alignment accuracy
To evaluate sequencing depth
Explanation - Transcript‑level analysis captures isoform‑specific regulation.
Correct answer is: To assess expression changes between isoforms

Q.78 Which of the following is NOT a commonly used RNA-Seq normalization method?

TPM (Transcripts Per Million)
FPKM (Fragments Per Kilobase Million)
RPKM (Reads Per Kilobase Million)
UCSC (Uniform Count Scaling)
Explanation - UCSC is a genome browser; UCSC scaling is not a standard RNA‑Seq normalization method.
Correct answer is: UCSC (Uniform Count Scaling)

Q.79 Which of these is a common method for visualizing differential expression results?

MA plot
Venn diagram
Heatmap
All of the above
Explanation - MA plots show log fold change vs mean expression; Venn diagrams compare gene lists; heatmaps display expression patterns.
Correct answer is: All of the above

Q.80 What is a 'splicing junction read'?

A read that spans exon–exon boundaries
A read that maps only to intronic regions
A read that covers the poly(A) tail
A read that aligns to the promoter
Explanation - These reads provide evidence for specific splicing events.
Correct answer is: A read that spans exon–exon boundaries

Q.81 Why might 'adapter dimers' appear in an RNA‑Seq library?

Excessive ligation of adapters
Short fragments that are adapters only
High sequencing depth
Low PCR cycles
Explanation - Adapter dimers are products of adapter–adapter ligation and must be removed.
Correct answer is: Short fragments that are adapters only

Q.82 Which of the following metrics assesses the proportion of reads mapping to exons?

Mapping efficiency
Exon coverage
Exonic rate
Splicing efficiency
Explanation - Exonic rate indicates library enrichment for coding regions.
Correct answer is: Exonic rate

Q.83 In RNA‑Seq, what does a 'low depth' of coverage typically lead to?

Improved detection of rare transcripts
Higher false‑negative rate
Increased sequencing cost
Reduced duplication rate
Explanation - Insufficient reads may miss low‑abundance genes.
Correct answer is: Higher false‑negative rate

Q.84 Which of the following best explains the 'U‑shape' distribution in RNA‑Seq fragment size?

Bias towards short fragments
Uniform distribution across sizes
Preference for both short and long fragments
Exclusion of medium-sized fragments
Explanation - U‑shape arises from suboptimal fragmentation or PCR bias.
Correct answer is: Preference for both short and long fragments

Q.85 What does the 'library complexity' metric represent?

Number of unique fragments in a library
Depth of sequencing
Quality of adapter ligation
Length of reads
Explanation - High library complexity indicates diverse fragments, reducing redundancy.
Correct answer is: Number of unique fragments in a library

Q.86 Which of the following is a feature of the 'featureCounts' program?

Counts reads overlapping genomic features
Aligns reads to a reference genome
Visualizes expression heatmaps
Normalizes read counts
Explanation - featureCounts efficiently assigns reads to exons, genes, or other features.
Correct answer is: Counts reads overlapping genomic features

Q.87 Why is it important to include a 'sequencing spike‑in' of known concentration?

To assess library prep efficiency
To calibrate the sequencing instrument
To generate a mock transcriptome
To increase sequencing depth
Explanation - Spike‑ins provide a reference for normalization and quality control.
Correct answer is: To assess library prep efficiency

Q.88 Which of the following best describes 'isoform switching'?

Change in overall gene expression
Switching between different splice variants
Switching from nuclear to cytoplasmic localization
Switching of transcription start sites
Explanation - Isoform switching refers to differential usage of transcript isoforms across conditions.
Correct answer is: Switching between different splice variants

Q.89 What does the 'poly(A) tail length' measurement inform us about?

mRNA stability
Translation efficiency
Both A and B
Neither A nor B
Explanation - Longer poly(A) tails often correlate with increased stability and translation.
Correct answer is: Both A and B

Q.90 Which of the following best defines a 'transcriptional burst'?

A sudden increase in transcriptional output
A sudden decrease in mRNA degradation
A spike in sequencing error rate
A transient increase in ribosomal RNA
Explanation - Bursting refers to stochastic periods of high transcriptional activity.
Correct answer is: A sudden increase in transcriptional output

Q.91 Why are 'poly(A)+ RNA' libraries often preferred for coding transcript discovery?

They exclude non‑coding RNA
They include all RNA species
They enrich for mature mRNA
They reduce sequencing cost
Explanation - Poly(A)+ selection captures mature, polyadenylated mRNAs, facilitating coding transcript detection.
Correct answer is: They enrich for mature mRNA

Q.92 What is the primary purpose of performing a 'gene set enrichment analysis' on DEGs?

To identify individual genes of interest
To determine enriched biological pathways
To normalize raw counts
To assess read quality
Explanation - Enrichment analysis tests whether predefined gene sets, such as pathways, are overrepresented among the DEGs, shifting the focus from single genes to functional categories.
Correct answer is: To determine enriched biological pathways

Q.93 Which of the following best describes a 'transcriptome reference'?

The set of known (annotated) transcript sequences for a species
A set of synthetic RNA controls
The DNA sequence of the genome
An annotation of protein structures
Explanation - A transcriptome reference contains the annotated transcripts, including exon boundaries and splice variants, against which reads are aligned or quantified.
Correct answer is: The set of known (annotated) transcript sequences for a species

Q.94 Why is 'read length' a factor in detecting splice junctions?

Short reads are less likely to span exon–exon boundaries
Longer reads increase sequencing errors
Read length does not matter
Short reads are always better
Explanation - Longer reads are more likely to bridge splice sites with enough sequence on both sides, improving junction detection.
Correct answer is: Short reads are less likely to span exon–exon boundaries
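
The effect of read length can be made concrete with a small calculation. If a read must place at least a minimum number of bases (an overhang) on each side of a junction, then the number of read start positions that satisfy this is read length minus twice the overhang, plus one. The function name and the overhang of 8 bases are assumptions for illustration.

```python
def junction_spanning_starts(read_len, min_overhang):
    """Number of start positions at which a read covers a junction
    with at least min_overhang bases on each side."""
    return max(0, read_len - 2 * min_overhang + 1)


# Assumed minimum overhang of 8 bases on each side of the junction.
short_read = junction_spanning_starts(50, 8)
long_read = junction_spanning_starts(150, 8)
too_short = junction_spanning_starts(12, 8)
```

With these assumed numbers, a 150-base read has almost four times as many usable positions as a 50-base read.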

Q.95 Which of the following best explains 'UMI diversity'?

Number of unique UMIs in the library
Length of each UMI
Frequency of adapter contamination
Percentage of reads with barcodes
Explanation - High UMI diversity reduces collisions and improves molecule counting.
Correct answer is: Number of unique UMIs in the library
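
The link between UMI diversity and collisions can be sketched numerically. Assuming UMIs are drawn uniformly at random from all sequences of a given length (a simplification, since real UMI usage is not perfectly uniform), the expected number of distinct UMIs seen among n molecules is m * (1 - (1 - 1/m)^n), where m = 4^length. Function names are hypothetical.

```python
def umi_space(length):
    """Number of possible UMI sequences of the given length (A, C, G, T)."""
    return 4 ** length


def expected_distinct_umis(n_molecules, length):
    """Expected number of distinct UMIs among n molecules,
    assuming uniform random UMI assignment."""
    m = umi_space(length)
    return m * (1 - (1 - 1 / m) ** n_molecules)


space_10 = umi_space(10)
distinct_10 = expected_distinct_umis(1000, 10)  # large UMI space, few collisions
distinct_4 = expected_distinct_umis(1000, 4)    # only 256 UMIs, heavy collisions
```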

Q.96 Which of the following is NOT an advantage of using 'paired‑end sequencing' over single‑end?

Improved mapping accuracy
Reduced cost
Better detection of insert size
More sequence coverage per fragment
Explanation - Paired‑end sequencing typically costs more but offers mapping and insert size benefits.
Correct answer is: Reduced cost

Q.97 In a bulk RNA‑Seq experiment, why might you choose a 'shallow' sequencing depth?

To detect high‑abundance transcripts only
To reduce costs for large sample numbers
To increase statistical power
To avoid sequencing errors
Explanation - Shallow depth saves money when only broad expression patterns are needed.
Correct answer is: To reduce costs for large sample numbers

Q.98 What is the main advantage of using 'salmon' for transcript quantification?

Requires no alignment
Provides variant calling
Detects splice junctions directly
Only works with Illumina data
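Explanation - Salmon avoids a separate genome alignment step by mapping reads directly to the transcriptome with lightweight mapping (quasi-mapping or selective alignment), which makes quantification much faster.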
Correct answer is: Requires no alignment

Q.99 Which of the following best describes a 'doublet' in scRNA‑Seq data?

Two cells captured in the same droplet
A read that maps to two loci
A cell that expresses two distinct lineages
An artifact from PCR amplification
Explanation - Doublets confound cell type identification by combining signals from two cells.
Correct answer is: Two cells captured in the same droplet

Q.100 Which of the following is a typical source of 'ambient RNA contamination'?

RNA from lysed cells in the suspension
RNA from the library preparation reagents
RNA from the sequencing machine
RNA from the barcode oligos
Explanation - Ambient RNA can be inadvertently captured in droplets, creating background signal.
Correct answer is: RNA from lysed cells in the suspension

Q.101 What does the 'sequencing error rate' influence in RNA‑Seq data?

Mapping accuracy
Duplication rate
Library complexity
Read length
Explanation - Higher error rates reduce correct alignments and inflate mismatches.
Correct answer is: Mapping accuracy

Q.102 Why are 'negative controls' important in RNA‑Seq experiments?

To detect cross‑contamination
To provide a baseline for normalization
To evaluate library size
All of the above
Explanation - Negative controls help identify technical artifacts and set baselines.
Correct answer is: All of the above

Q.103 Which of the following is a benefit of using a 'single‑cell' approach over bulk RNA‑Seq?

Higher throughput
Reduced cost
Detection of cellular heterogeneity
Simplified data analysis
Explanation - scRNA‑Seq resolves differences among individual cells, revealing subpopulations.
Correct answer is: Detection of cellular heterogeneity

Q.104 What is the purpose of 'gene ontology (GO) analysis' after differential expression?

To identify enriched biological functions
To normalize expression levels
To align reads to the genome
To quantify isoforms
Explanation - GO analysis groups genes into functional categories to interpret DEGs.
Correct answer is: To identify enriched biological functions
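
One widely used test for enrichment of a category among DEGs is the one-sided hypergeometric test. The sketch below is a minimal version with hypothetical parameter names and invented numbers; it ignores multiple-testing correction, which a real analysis would need.

```python
from math import comb


def hypergeom_pvalue(n_background, n_in_category, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, of which n_in_category belong to the category."""
    upper = min(n_in_category, n_degs)
    favourable = sum(
        comb(n_in_category, i) * comb(n_background - n_in_category, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_degs)


# Invented example: 10 background genes, 5 in the category, 5 DEGs, all 5 in the category.
p_all = hypergeom_pvalue(10, 5, 5, 5)
# Requiring an overlap of at least 0 is always satisfied.
p_none = hypergeom_pvalue(10, 5, 5, 0)
```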

Q.105 Which of the following best explains a 'technical replicate'?

Repetition of the entire experiment with a new sample
Repeated library preparation from the same sample
A second sequencing run of the same library
A duplicate RNA extraction
Explanation - Technical replicates assess consistency of library prep and sequencing.
Correct answer is: Repeated library preparation from the same sample

Q.106 Why is it important to use a 'unique barcode' for each cell in droplet‑based scRNA‑Seq?

To identify which cell a read originated from
To increase read length
To reduce sequencing errors
To prevent adapter dimers
Explanation - Barcodes allow demultiplexing of reads to their cell of origin.
Correct answer is: To identify which cell a read originated from
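
Barcode-based demultiplexing can be sketched as follows. The layout assumed here (a 16-base cell barcode followed by a 12-base UMI at the start of read 1) and the whitelist are illustrative assumptions; actual layouts depend on the chemistry used, and real pipelines also correct barcode errors.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed cell barcode length
UMI_LEN = 12      # assumed UMI length


def split_read1(seq):
    """Split read 1 into (cell barcode, UMI) under the assumed layout."""
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def group_by_cell(read_pairs, whitelist):
    """Group (UMI, read 2) by cell barcode, keeping only whitelisted barcodes."""
    cells = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode, umi = split_read1(read1)
        if barcode in whitelist:
            cells[barcode].append((umi, read2))
    return dict(cells)


# Hypothetical barcodes and reads.
cell_a, cell_c = "A" * 16, "C" * 16
whitelist = {cell_a, cell_c}
read_pairs = [
    (cell_a + "G" * 12, "read_a1"),
    (cell_a + "T" * 12, "read_a2"),
    (cell_c + "G" * 12, "read_c1"),
    ("T" * 16 + "G" * 12, "read_unknown"),  # barcode not in whitelist
]
cells = group_by_cell(read_pairs, whitelist)
```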

Q.107 Which of the following metrics assesses the proportion of reads that map uniquely to the genome?

Mapping rate
Duplicate rate
Insert size distribution
Read length
Explanation - Mapping rate reports the fraction of reads that align to the reference; the uniquely mapped fraction counts only reads that align to a single location.
Correct answer is: Mapping rate

Q.108 What is the key difference between 'short‑read' and 'long‑read' RNA sequencing?

Short reads are cheaper
Long reads can span entire transcripts
Short reads provide higher accuracy
Long reads cannot be used for splicing analysis
Explanation - Long reads capture full-length isoforms, facilitating isoform discovery.
Correct answer is: Long reads can span entire transcripts

Q.109 Which of the following best describes 'allele‑specific expression (ASE) analysis'?

Comparison of expression between two alleles within the same individual
Measurement of gene expression across individuals
Identification of rare variants
Normalization of read counts
Explanation - ASE assesses whether the maternal and paternal alleles of a gene are expressed at different levels.
Correct answer is: Comparison of expression between two alleles within the same individual
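
A simple way to test for allelic imbalance at a heterozygous site is an exact binomial test against an expected 50:50 ratio. The sketch below is a minimal version with a hypothetical function name; it ignores reference mapping bias and overdispersion, which dedicated ASE methods account for.

```python
from math import comb


def ase_binomial_pvalue(ref_count, alt_count):
    """Two-sided exact binomial p-value for a 50:50 allelic ratio."""
    n = ref_count + alt_count
    k = min(ref_count, alt_count)
    # Probability of a count at least as extreme as k in the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5, so double the tail and cap at 1.
    return min(1.0, 2 * lower_tail)


balanced = ase_binomial_pvalue(5, 5)   # no evidence of imbalance
skewed = ase_binomial_pvalue(10, 0)    # all reads from one allele
```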

Q.110 In RNA‑Seq, why is it useful to keep a 'duplicate removal' step after alignment?

To reduce computational load
To correct for PCR duplicates
To improve read quality
To increase mapping rate
Explanation - Removing PCR duplicates, most reliably identified with UMIs, makes counts reflect original molecules rather than amplification artifacts.
Correct answer is: To correct for PCR duplicates
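
Duplicate removal can be sketched as keeping one alignment per unique combination of chromosome, position, strand, and UMI. The tuple layout and function name are assumptions for illustration; established tools also handle UMI sequencing errors and paired-end reads.

```python
def deduplicate(alignments):
    """Keep the first alignment for each (chrom, pos, strand, umi) key.

    alignments: list of (chrom, pos, strand, umi, read_name) tuples (assumed layout).
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = aln[:4]
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


# Hypothetical alignments.
alignments = [
    ("chr1", 100, "+", "AACG", "r1"),
    ("chr1", 100, "+", "AACG", "r2"),  # same position and UMI: PCR duplicate
    ("chr1", 100, "+", "TTGC", "r3"),  # same position, different UMI: kept
    ("chr1", 250, "-", "AACG", "r4"),
]
unique = deduplicate(alignments)
```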

Q.111 Which of the following best defines the 'knee plot' in scRNA‑Seq?

Plot of cell viability vs time
Plot of UMI or read counts per barcode, ranked from highest to lowest, used to choose a cell-calling cutoff
Plot of gene expression vs gene length
Plot of sequencing error vs read length
Explanation - The knee plot helps select a threshold that separates barcodes from real cells from those of empty droplets.
Correct answer is: Plot of UMI or read counts per barcode, ranked from highest to lowest, used to choose a cell-calling cutoff
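
The thresholding idea can be sketched with a crude heuristic: rank barcodes by count and cut at the largest drop in log10 count between neighbours. This is an assumption made for illustration and is not the method used by any particular cell-calling tool; names and counts are invented.

```python
import math


def ranked_counts(counts_per_barcode):
    """Counts sorted from highest to lowest, as on the knee plot's y-axis."""
    return sorted(counts_per_barcode.values(), reverse=True)


def knee_cutoff(ranked):
    """Number of top-ranked barcodes to keep, chosen at the largest
    drop in log10(count + 1) between consecutive ranks.
    Requires at least two barcodes."""
    logs = [math.log10(c + 1) for c in ranked]
    drops = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    best = max(range(len(drops)), key=drops.__getitem__)
    return best + 1


# Hypothetical counts: three cells and three empty droplets.
counts = {"c1": 5000, "c2": 4500, "c3": 4000, "e1": 20, "e2": 15, "e3": 10}
ranked = ranked_counts(counts)
n_cells = knee_cutoff(ranked)
```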

Q.112 Which of the following tools is used to identify fusion transcripts from RNA‑Seq data?

STAR-Fusion
DESeq2
featureCounts
FastQC
Explanation - STAR-Fusion uses STAR alignments to detect chimeric junctions indicative of fusions.
Correct answer is: STAR-Fusion

Q.113 What is a 'transcriptome assembly'?

A set of assembled protein structures
A collection of genomic contigs
A reconstruction of transcripts from sequencing reads
A list of ribosomal RNA genes
Explanation - Assembly reconstructs transcripts from reads, either de novo (without a reference genome) or guided by a reference.
Correct answer is: A reconstruction of transcripts from sequencing reads

Q.114 Which of the following best explains the 'UMI count matrix'?

Matrix of unique barcodes per cell
Matrix of UMI counts per gene per cell
Matrix of read lengths
Matrix of adapter sequences
Explanation - This matrix is used for downstream scRNA‑Seq analysis, reflecting unique mRNA molecules.
Correct answer is: Matrix of UMI counts per gene per cell
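
Building such a matrix amounts to counting distinct UMIs for each gene and cell. The sketch below stores the matrix sparsely as a dictionary keyed by (gene, cell); the record layout and function name are assumptions for illustration.

```python
from collections import defaultdict


def umi_count_matrix(records):
    """Count distinct UMIs per (gene, cell).

    records: iterable of (cell_barcode, gene, umi) tuples (assumed layout).
    Returns a sparse matrix as {(gene, cell_barcode): n_unique_umis}.
    """
    umis = defaultdict(set)
    for cell, gene, umi in records:
        umis[(gene, cell)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical records; the repeated UMI is counted once.
records = [
    ("cell1", "GAPDH", "AAAA"),
    ("cell1", "GAPDH", "AAAA"),
    ("cell1", "GAPDH", "CCCC"),
    ("cell2", "GAPDH", "AAAA"),
    ("cell2", "ACTB", "GGGG"),
]
matrix = umi_count_matrix(records)
```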

Q.115 Which of the following best characterizes 'poly(A) tail length variation' in RNA‑Seq?

Uniform across all transcripts
Randomly distributed
Correlated with mRNA stability
Irrelevant to gene expression
Explanation - Longer poly(A) tails often indicate more stable transcripts.
Correct answer is: Correlated with mRNA stability

Q.116 Which of the following is a common approach to mitigate 'batch effects' in RNA‑Seq data?

Randomizing sample processing order
Using the same sequencing platform for all samples
Applying batch correction algorithms like ComBat
All of the above
Explanation - Standardization and computational correction together reduce batch-induced bias.
Correct answer is: All of the above
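
The idea of computational batch correction can be illustrated with per-batch mean centering, a much simpler stand-in for ComBat (which additionally models batch-specific variance and shrinks its estimates). The function name and values are hypothetical. Note that this approach also removes real biological differences if they are confounded with batch.

```python
def center_by_batch(values, batches):
    """Shift each batch so its mean equals the overall mean.

    values:  log-expression of one gene across samples
    batches: batch label for each sample, in the same order
    """
    grand_mean = sum(values) / len(values)
    totals, sizes = {}, {}
    for value, batch in zip(values, batches):
        totals[batch] = totals.get(batch, 0.0) + value
        sizes[batch] = sizes.get(batch, 0) + 1
    batch_means = {b: totals[b] / sizes[b] for b in totals}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical log-expression values with a strong batch shift.
values = [1.0, 3.0, 11.0, 13.0]
batches = ["a", "a", "b", "b"]
corrected = center_by_batch(values, batches)
```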