Programming for Bioinformatics: MCQs Practice Set

Q.1 Which Python library is commonly used to parse FASTA files in bioinformatics?

NumPy
Biopython
Pandas
Matplotlib
Explanation - Biopython provides tools such as SeqIO for reading and writing FASTA files.
Correct answer is: Biopython

Q.2 What does the BLAST algorithm primarily compare?

Protein structures
DNA sequences
Gene expression levels
Metabolic pathways
Explanation - BLAST (Basic Local Alignment Search Tool) compares nucleotide or protein sequences to find regions of local similarity.
Correct answer is: DNA sequences

Q.3 In a phylogenetic tree, what does a longer branch typically indicate?

Higher mutation rate
Shorter evolutionary distance
Lower sequence similarity
More recent common ancestor
Explanation - Longer branches represent greater evolutionary changes, indicating a higher mutation rate or longer time.
Correct answer is: Higher mutation rate

Q.4 Which of the following is NOT a commonly used sequence alignment metric?

Percent identity
E-value
GC content
Alignment score
Explanation - GC content refers to the proportion of G and C bases in a sequence, not a metric of alignment quality.
Correct answer is: GC content

Q.5 In Python, which function is used to shuffle a list randomly?

random.shuffle()
random.sample()
list.shuffle()
shuffle()
Explanation - The random.shuffle() function shuffles a list in place.
Correct answer is: random.shuffle()
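
A minimal sketch of the in-place behaviour described above. The read names and the seed are made up for illustration; a private random.Random instance is used only so the example does not disturb the global random state.

```python
import random

# Hypothetical read identifiers, used only for illustration.
reads = ["read1", "read2", "read3", "read4"]

rng = random.Random(42)           # seeded generator for repeatability
shuffled = reads[:]               # copy, so the original order is kept
result = rng.shuffle(shuffled)    # shuffles the list in place

print(result)     # None: shuffle() does not return the shuffled list
print(shuffled)   # same elements as 'reads', in a shuffled order
```

A common slip is writing 'reads = random.shuffle(reads)', which replaces the list with None.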

Q.6 What type of data structure is a FASTA file?

Binary tree
Linked list
Text file format
Matrix
Explanation - FASTA is a plain text format with header lines starting with '>' followed by sequence lines.
Correct answer is: Text file format
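
The layout described above (a '>' header line followed by one or more sequence lines) can be read with a few lines of plain Python. This is a sketch only, not a replacement for Bio.SeqIO; the function name parse_fasta and the two-record example text are assumptions made for illustration.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # start a new record
        else:
            chunks.append(line)            # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)      # emit the last record


# Hypothetical FASTA text standing in for a real file.
example = StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nGGCC\n")
records = list(parse_fasta(example))
print(records)
```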

Q.7 Which Python package is ideal for manipulating biological data frames?

NumPy
SciPy
Pandas
Plotly
Explanation - Pandas provides DataFrame structures useful for handling tabular bioinformatics data.
Correct answer is: Pandas

Q.8 The 'E-value' in BLAST indicates:

The evolutionary distance between sequences
The number of mismatches
The expected number of hits by chance
The alignment length
Explanation - The E-value estimates how many hits with an equal or better score would be expected by chance in a database of that size; lower values indicate more significant hits.
Correct answer is: The expected number of hits by chance

Q.9 Which command in the Linux shell lists all files, including hidden ones?

ls -a
ls -l
ls -h
ls -R
Explanation - The '-a' flag includes hidden files starting with a dot.
Correct answer is: ls -a

Q.10 What is the main purpose of using a 'quality score' in next-generation sequencing data?

To determine the read length
To indicate the confidence of each base call
To assign a color code to sequences
To measure GC content
Explanation - Quality scores reflect the probability that a base call is incorrect.
Correct answer is: To indicate the confidence of each base call
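
The relationship between a quality score and the error probability can be sketched as follows, using the Phred definition P = 10^(-Q/10). The sketch assumes the Phred+33 ASCII encoding (an assumption about the input file; older Illumina files used an offset of 64), and the function names and the quality string are illustrative.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability."""
    return 10 ** (-q / 10)


def decode_quality(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 assumes Phred+33 encoding.
    """
    return [ord(char) - offset for char in qual]


scores = decode_quality("II5+")   # hypothetical quality string
print(scores)
print([phred_to_error_prob(q) for q in scores])
```

A score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1000.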

Q.11 Which Python library would you use for machine learning in genomics?

Scikit-learn
OpenCV
PyTorch
TensorFlow
Explanation - Scikit-learn offers a variety of ML algorithms suitable for bioinformatics tasks.
Correct answer is: Scikit-learn

Q.12 In a multiple sequence alignment, a column with no gaps or mismatches is called a:

Consensus column
Anchor column
Gap column
Polymorphic column
Explanation - A consensus column shows identical residues across all sequences.
Correct answer is: Consensus column

Q.13 Which data type in R is used to store a sequence of DNA nucleotides?

integer
factor
DNAString
matrix
Explanation - DNAString from the Biostrings package represents nucleotide sequences in R.
Correct answer is: DNAString

Q.14 What does the acronym 'RNA-Seq' stand for?

Random Nucleotide Analysis Sequencing
Rapid Nucleotide Amplification Sequencing
RNA Sequencing
Rescue Nucleic Acid Sequencing
Explanation - RNA-Seq refers to high-throughput sequencing of RNA transcripts.
Correct answer is: RNA Sequencing

Q.15 Which algorithm is most suitable for constructing phylogenies based on distance matrices?

Maximum likelihood
Neighbor-Joining
BLAST
Hidden Markov Models
Explanation - Neighbor-Joining is a distance‑based method for phylogenetic tree reconstruction.
Correct answer is: Neighbor-Joining

Q.16 In the context of gene expression analysis, what is a 'heatmap' used for?

To display sequence alignment
To visualize differential expression across samples
To plot GC content
To show phylogenetic trees
Explanation - Heatmaps represent expression levels with color gradients, aiding in pattern recognition.
Correct answer is: To visualize differential expression across samples

Q.17 Which command is used to convert a FASTQ file to FASTA format using seqtk?

seqtk seq -a input.fastq > output.fasta
seqtk convert -f input.fastq -t fasta
seqtk fq2fa input.fastq output.fasta
seqtk format -fasta input.fastq
Explanation - The '-a' flag tells seqtk to output in FASTA format.
Correct answer is: seqtk seq -a input.fastq > output.fasta

Q.18 Which of the following best describes the 'p-value' in differential gene expression?

Probability of observing the data if the null hypothesis is true
Proportion of reads mapped to a gene
Length of a transcript
Number of genes expressed
Explanation - The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis (no differential expression) is true.
Correct answer is: Probability of observing the data if the null hypothesis is true

Q.19 Which programming language is NOT typically used in bioinformatics pipelines?

Python
Java
Bash
MATLAB
Explanation - While MATLAB can be used, it is less common than Python, Java, or Bash in bioinformatics.
Correct answer is: MATLAB

Q.20 What does 'GC content' refer to in a DNA sequence?

The number of G and C bases divided by total bases
The number of G bases only
The number of C bases only
The ratio of A to T bases
Explanation - GC content is calculated as (G+C)/total nucleotides, expressed as a percentage.
Correct answer is: The number of G and C bases divided by total bases
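
The (G+C)/total formula above can be sketched in Python as below. The function name gc_content is illustrative, and counting every character (including ambiguous bases such as N) in the denominator is a design assumption, not a universal convention.

```python
def gc_content(seq):
    """Return GC content of a sequence as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        return 0.0                      # avoid division by zero
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(round(gc_content("ATGCGC"), 2))
```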

Q.21 Which algorithm is commonly used for motif discovery in DNA sequences?

K-means clustering
MEME
Gaussian Mixture Models
Random Forest
Explanation - MEME (Multiple EM for Motif Elicitation) identifies statistically significant motifs.
Correct answer is: MEME

Q.22 Which of the following is a typical input for a Hidden Markov Model in protein family classification?

RNA structure files
Protein sequence alignments
DNA methylation data
Chromosome conformation capture data
Explanation - HMMs model sequence patterns in alignments to predict protein families.
Correct answer is: Protein sequence alignments

Q.23 What is the primary function of the 'samtools view' command?

Compress BAM files
Convert SAM to BAM
Filter alignments by mapping quality
Generate coverage plots
Explanation - samtools view prints and converts alignments (SAM/BAM/CRAM) and can filter reads by flags, mapping quality, region, and other criteria.
Correct answer is: Filter alignments by mapping quality

Q.24 In R, which function from the Bioconductor package 'DESeq2' is used to normalize count data?

estimateSizeFactors
normalizeCounts
preprocessInput
scaleData
Explanation - estimateSizeFactors() calculates normalization factors for sequencing depth.
Correct answer is: estimateSizeFactors

Q.25 Which of the following best describes a 'single‑cell RNA‑seq' experiment?

Sequencing DNA from a single organism
Sequencing RNA from individual cells
Sequencing proteins in a bulk sample
Sequencing a single gene
Explanation - Single‑cell RNA‑seq captures transcriptomes at the resolution of single cells.
Correct answer is: Sequencing RNA from individual cells

Q.26 In Python, how do you open a file for reading?

open('file.txt', 'w')
open('file.txt', 'r')
open('file.txt', 'x')
open('file.txt', 'a')
Explanation - The 'r' mode opens a file for reading.
Correct answer is: open('file.txt', 'r')

Q.27 What does 'ORF' stand for in genetics?

Open Reading Frame
Oligonucleotide Receptor Factor
Overall Ribonucleotide Frequency
Optimized Reverse Function
Explanation - ORF refers to a continuous sequence of codons that could encode a protein.
Correct answer is: Open Reading Frame

Q.28 Which of the following is an example of a k‑mer?

AGTC
ATGCG
GATC
TAA
Explanation - A k‑mer is a substring of length k; AGTC is a 4‑mer. Strictly speaking, every option listed is a k‑mer for some k (ATGCG is a 5‑mer, TAA a 3‑mer).
Correct answer is: AGTC
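
To make the definition concrete, the sketch below enumerates and counts all overlapping k-mers of a sequence. The function name count_kmers and the example sequence are assumptions for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("AGTCAGTC", 4)
print(counts)
```

A sequence of length n contains n - k + 1 overlapping k-mers, so the 8-base example yields five 4-mers, with AGTC seen twice.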

Q.29 In a phylogenetic tree, what does a bootstrap value represent?

Confidence level of a branch
Number of species in the tree
Length of the branch
Mutation rate
Explanation - Bootstrap values estimate statistical support for tree branches.
Correct answer is: Confidence level of a branch

Q.30 Which command is used to extract reads mapped to chromosome 12 from a BAM file?

samtools view -b input.bam chr12 > chr12.bam
samtools index input.bam chr12
samtools filter -r 12 input.bam
samtools extract -c 12 input.bam
Explanation - The 'view' command with a region name selects the reads overlapping that region; '-b' writes BAM output, and the input BAM must be coordinate-sorted and indexed.
Correct answer is: samtools view -b input.bam chr12 > chr12.bam

Q.31 What is the purpose of the 'gzip' command in a bioinformatics pipeline?

To create a backup archive
To compress files for storage
To decompress FASTQ files only
To convert text to binary
Explanation - gzip reduces file size, commonly used for large sequencing files.
Correct answer is: To compress files for storage

Q.32 Which of the following is NOT a type of variant called by GATK?

SNP
Insertion
Deletion
Chromosome translocation
Explanation - GATK's short-variant callers detect SNPs and indels, but not large structural variants such as translocations.
Correct answer is: Chromosome translocation

Q.33 In Python, which library provides tools for working with genomic intervals?

Biopython
pandas
pysam
pybedtools
Explanation - pybedtools interfaces with BEDTools for genomic interval operations.
Correct answer is: pybedtools

Q.34 Which of these metrics is commonly used to assess clustering quality in unsupervised gene expression analysis?

Silhouette score
p-value
GC content
E-value
Explanation - Silhouette score measures how similar an object is to its own cluster compared to other clusters.
Correct answer is: Silhouette score

Q.35 What does the 'MAFFT' program primarily do?

Align multiple protein or nucleotide sequences
Perform phylogenetic tree reconstruction
Predict secondary structure
Cluster gene expression data
Explanation - MAFFT is a multiple sequence alignment tool.
Correct answer is: Align multiple protein or nucleotide sequences

Q.36 Which of the following best describes a 'contig' in genome assembly?

A single DNA fragment from a plasmid
An assembled sequence from overlapping reads
A region of low coverage
A gap between scaffolds
Explanation - Contigs are contiguous sequences assembled from overlapping reads.
Correct answer is: An assembled sequence from overlapping reads

Q.37 In R, what function from the 'ggplot2' package is used to create a scatter plot?

geom_bar()
geom_line()
geom_point()
geom_histogram()
Explanation - geom_point() plots points for scatter plots.
Correct answer is: geom_point()

Q.38 Which type of filter is commonly used to remove low‑quality reads based on quality scores?

Median filter
Low‑pass filter
Quality score filter
High‑pass filter
Explanation - Reads are filtered by minimum per‑base or average quality thresholds.
Correct answer is: Quality score filter

Q.39 What does the 'trim_galore' tool do?

Trims adapters and low‑quality ends from sequencing reads
Aligns reads to a reference genome
Compresses FASTQ files
Converts FASTQ to BAM
Explanation - Trim Galore is a wrapper around Cutadapt for adapter trimming.
Correct answer is: Trims adapters and low‑quality ends from sequencing reads

Q.40 Which of the following is a property of a 'protein motif'?

A specific DNA sequence
A conserved pattern of amino acids
A gene regulatory network
A chromatin state
Explanation - Protein motifs are short, conserved sequences that often indicate functional domains.
Correct answer is: A conserved pattern of amino acids

Q.41 What is the main function of the 'cutadapt' program?

Assemble genomes
Trim adapters from sequencing reads
Align reads to a reference
Perform differential expression analysis
Explanation - Cutadapt removes adapter sequences and low‑quality bases.
Correct answer is: Trim adapters from sequencing reads

Q.42 Which of these is a commonly used file format for storing gene annotations?

FASTA
SAM
GTF
BED
Explanation - GTF (Gene Transfer Format) contains gene feature annotations.
Correct answer is: GTF

Q.43 Which command is used to count the number of reads in a FASTQ file using awk?

awk '{print NR}' file.fastq | wc -l
awk 'NR % 4 == 0' file.fastq | wc -l
awk '{print $1}' file.fastq | wc -l
awk '/^@/{print $0}' file.fastq | wc -l
Explanation - Each FASTQ record consists of 4 lines; counting lines divisible by 4 gives read count.
Correct answer is: awk 'NR % 4 == 0' file.fastq | wc -l
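
The same count can be sketched in Python. The sketch assumes unwrapped FASTQ (exactly four lines per record); the function name and the two-read example are illustrative. The example deliberately includes a quality line starting with '@' to show why counting '@' lines is unreliable.

```python
from io import StringIO


def count_fastq_reads(handle):
    """Count reads in an unwrapped FASTQ handle (4 lines per record)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4


# Hypothetical data: the first quality string begins with '@'.
fastq_text = "@r1\nACGT\n+\n@III\n@r2\nGGCC\n+\nIIII\n"
n_reads = count_fastq_reads(StringIO(fastq_text))
n_at_lines = sum(line.startswith("@") for line in StringIO(fastq_text))
print(n_reads, n_at_lines)
```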

Q.44 What is the purpose of a 'phylogenetic bootstrap analysis'?

To estimate mutation rates
To test the robustness of tree branches
To align sequences
To find conserved motifs
Explanation - Bootstrapping resamples data to assess confidence in tree topology.
Correct answer is: To test the robustness of tree branches

Q.45 Which Python package is useful for visualizing genomic data tracks?

Matplotlib
pyGenomeTracks
NumPy
SciPy
Explanation - pyGenomeTracks renders genome browser‑style tracks programmatically.
Correct answer is: pyGenomeTracks

Q.46 In the context of next‑generation sequencing, what does 'paired‑end' refer to?

Two independent samples
Sequencing reads from both ends of a DNA fragment
Two types of base calling
Read pairing with adapter sequences
Explanation - Paired‑end sequencing generates two reads per fragment, one from each end.
Correct answer is: Sequencing reads from both ends of a DNA fragment

Q.47 Which of the following commands converts a SAM file to BAM and sorts it?

samtools sort -o output.bam input.sam
samtools view -bS input.sam | samtools sort -o output.bam
samtools convert input.sam output.bam
samtools index input.sam output.bam
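Explanation - This pipeline first converts SAM to BAM and then sorts the BAM file. Recent samtools releases detect the input format automatically, so '-S' is no longer needed and 'samtools sort' can also read a SAM file directly.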
Correct answer is: samtools view -bS input.sam | samtools sort -o output.bam

Q.48 Which R function is used to read a FASTQ file into a Biostrings object?

readDNAStringSet()
readFastq()
readDNAFile()
readSequence()
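Explanation - readDNAStringSet() from Biostrings, called with format="fastq", imports FASTQ reads as a DNAStringSet. readFastq() belongs to the ShortRead package and returns a ShortReadQ object instead.
Correct answer is: readDNAStringSet()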

Q.49 In a variant call format (VCF) file, which field stores the genotype of an individual?

REF
ALT
INFO
FORMAT
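Explanation - The FORMAT column declares the per-sample keys, such as GT, AD, and DP; the genotype values themselves are stored in the sample columns that follow it.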
Correct answer is: FORMAT

Q.50 Which command is used to generate a de novo assembly using SPAdes?

spades.py -1 reads_1.fq -2 reads_2.fq -o assembly
spades -assemble -reads reads.fq -output assembly
spades --assemble -i reads.fq -o assembly
spades assembly -i reads.fq -o assembly
Explanation - SPAdes takes paired‑end input via '-1' and '-2' and writes its results to the directory given with '-o'.
Correct answer is: spades.py -1 reads_1.fq -2 reads_2.fq -o assembly

Q.51 Which of the following best describes a 'coverage depth' metric?

Number of unique sequences in a dataset
Average number of times a base is read
Length of the longest read
Percentage of reads mapping to the reference
Explanation - Coverage depth is the mean read depth across a genomic region.
Correct answer is: Average number of times a base is read

Q.52 What is the purpose of using 'indel realignment' during variant calling?

To remove duplicate reads
To correct mis‑aligned reads around insertions/deletions
To convert BAM to FASTQ
To filter by quality score
Explanation - Indel realignment reduces false SNV calls near indels.
Correct answer is: To correct mis‑aligned reads around insertions/deletions

Q.53 Which Python library is used for working with graph data structures in bioinformatics?

NetworkX
Pillow
OpenCV
PyTorch
Explanation - NetworkX provides graph algorithms useful for network biology.
Correct answer is: NetworkX

Q.54 Which command extracts the header lines from a FASTQ file?

awk '/^@/{print}' file.fastq
grep '^@' file.fastq
awk 'NR%4==1' file.fastq
All of the above
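Explanation - Only awk 'NR%4==1' is reliable: it selects the first line of each 4-line record. The '^@' patterns also match quality lines that happen to begin with '@', which is a valid quality character.
Correct answer is: awk 'NR%4==1' file.fastq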

Q.55 What does the 'MACS2' software do in ChIP‑seq data analysis?

Call peaks of enriched DNA regions
Align reads to reference genome
Normalize read counts
Predict transcription factor binding sites
Explanation - MACS2 identifies statistically significant enrichment peaks.
Correct answer is: Call peaks of enriched DNA regions

Q.56 In Python, which method of a pandas DataFrame returns the mean of numeric columns?

sum()
mean()
average()
count()
Explanation - DataFrame.mean() computes column‑wise arithmetic mean.
Correct answer is: mean()

Q.57 Which of the following is NOT a base found in an RNA molecule?

A
C
G
T
Explanation - RNA uses uracil (U) instead of thymine (T).
Correct answer is: T

Q.58 What is the primary advantage of using 'single‑molecule real‑time (SMRT)' sequencing?

Short read length
Long read length
Lower cost
Higher error rate only
Explanation - SMRT sequencing generates reads exceeding 10 kb.
Correct answer is: Long read length

Q.59 Which R package provides tools for differential expression analysis of RNA‑seq data?

DESeq2
ggplot2
dplyr
tidyr
Explanation - DESeq2 models count data to test for differential expression.
Correct answer is: DESeq2

Q.60 What does the 'samtools flagstat' command output?

Alignment quality scores
Statistics of reads (mapped, unmapped)
Base composition
Coverage histogram
Explanation - Flagstat provides a quick summary of alignment statistics.
Correct answer is: Statistics of reads (mapped, unmapped)

Q.61 Which of the following describes a 'motif discovery algorithm' in DNA sequences?

A tool that predicts gene structure
A method to find statistically over‑represented patterns
A software that aligns reads
A pipeline for assembly
Explanation - Motif discovery seeks common motifs within a set of sequences.
Correct answer is: A method to find statistically over‑represented patterns

Q.62 What is the role of 'hash tables' in bioinformatics data processing?

Storing large matrices efficiently
Facilitating quick look‑ups of sequence identifiers
Plotting gene expression heatmaps
Compressing genomic data
Explanation - Hash tables provide constant‑time access to keys like sequence IDs.
Correct answer is: Facilitating quick look‑ups of sequence identifiers
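
In Python the built-in dict is a hash table, so the look-up pattern above can be sketched as follows. The records and the helper name build_index are made up for illustration.

```python
def build_index(records):
    """Map each sequence ID to its sequence for average O(1) look-up."""
    index = {}
    for seq_id, seq in records:
        if seq_id in index:
            raise ValueError(f"duplicate identifier: {seq_id}")
        index[seq_id] = seq
    return index


# Hypothetical (identifier, sequence) pairs.
records = [("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")]
index = build_index(records)

print(index["seq2"])            # direct look-up by key
print(index.get("seq9"))        # None: missing key, no exception
```

Scanning a list for an ID takes time proportional to its length, whereas the dict look-up does not depend on how many records are stored (on average).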

Q.63 Which of these file extensions is commonly used for compressed FASTQ files?

.fq.gz
.sam.gz
.bam.gz
.vcf.gz
Explanation - Compressed FASTQ files use the .fq.gz extension.
Correct answer is: .fq.gz

Q.64 In a VCF file, what does the 'AF' field represent?

Allele frequency in the sample population
Alignment score
Average coverage
Alternate allele count
Explanation - AF stands for allele frequency, indicating variant prevalence.
Correct answer is: Allele frequency in the sample population

Q.65 Which command line utility is used to merge multiple BAM files?

samtools merge
samtools cat
samtools combine
samtools concat
Explanation - samtools merge combines several sorted BAM files into a single sorted BAM.
Correct answer is: samtools merge

Q.66 What is a 'kmer count table' used for in metagenomics?

Estimating genome size
Comparing read quality
Building phylogenetic trees
Storing gene annotations
Explanation - Kmer frequency distributions help infer genome size and complexity.
Correct answer is: Estimating genome size

Q.67 Which Python module provides a 'deque' data structure useful for sliding windows?

collections
numpy
pandas
os
Explanation - collections.deque is a double‑ended queue ideal for windowed operations.
Correct answer is: collections
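
A sketch of a windowed operation built on collections.deque: a running GC fraction over a fixed-size window. The function name windowed_gc and the test sequence are assumptions for illustration; deque(maxlen=size) drops the oldest base automatically when a new one is appended.

```python
from collections import deque


def windowed_gc(seq, size):
    """Yield the GC fraction of every full window of 'size' bases."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    window = deque(maxlen=size)
    gc = 0
    for base in seq.upper():
        # If the window is full, its leftmost base is about to be evicted.
        if len(window) == size and window[0] in "GC":
            gc -= 1
        window.append(base)
        if base in "GC":
            gc += 1
        if len(window) == size:
            yield gc / size


fractions = list(windowed_gc("GGAT", 2))
print(fractions)
```

Keeping a running count means each step costs constant time instead of recounting the whole window.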

Q.68 What does the 'fastqc' tool evaluate in sequencing data?

Assembly quality
Read quality metrics such as per‑base sequence quality
Variant calling accuracy
Phylogenetic tree reliability
Explanation - FastQC reports on many metrics, including per‑base quality and GC bias.
Correct answer is: Read quality metrics such as per‑base sequence quality

Q.69 Which of the following is NOT a type of alignment score in sequence alignment?

Bit score
E-value
Percent identity
Alignment length
Explanation - Alignment length is a parameter, not a score metric.
Correct answer is: Alignment length

Q.70 In a gene‑expression heatmap, what does a darker color typically represent?

Low expression
High expression
Average expression
No expression
Explanation - Heatmaps often use a color gradient where darker shades indicate higher values.
Correct answer is: High expression

Q.71 What is the main output of the 'bwa mem' command?

FASTA file
SAM file
VCF file
BAM file
Explanation - BWA mem produces a SAM alignment file, which can be converted to BAM.
Correct answer is: SAM file

Q.72 Which R function is used to perform a principal component analysis (PCA) on expression data?

prcomp()
pca()
princomp()
pca_analysis()
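Explanation - prcomp() is the generally preferred base R (stats) function for PCA; princomp() also exists in base R but uses a different, less numerically stable computation.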
Correct answer is: prcomp()

Q.73 What does a 'strand‑specific' RNA‑seq library preserve?

Sense strands
Both sense and antisense strands equally
Only antisense strands
No strand information
Explanation - Strand‑specific protocols retain the original transcript strand information.
Correct answer is: Sense strands

Q.74 Which of these is a key metric for evaluating a de novo assembly?

GC content
N50 value
E-value
Alignment score
Explanation - N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length.
Correct answer is: N50 value
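
The N50 definition can be sketched directly: sort contigs from longest to shortest and accumulate lengths until half of the total is reached. The function name n50 and the contig lengths are illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:    # reached half of the assembly
            return length


# Hypothetical contig lengths (total 32).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Here the two longest contigs (8 + 8 = 16) already cover half of the 32 bases, so the N50 is 8.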

Q.75 Which Python function calculates the Levenshtein distance between two strings?

difflib.SequenceMatcher.distance()
Levenshtein.distance()
distance()
levenshtein()
Explanation - The python-Levenshtein package, imported as the module Levenshtein, provides an efficient distance() function.
Correct answer is: Levenshtein.distance()
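
For reference, the same quantity can be computed without any third-party package. The sketch below uses the standard dynamic-programming recurrence with two rows; the function name levenshtein is illustrative and this version is far slower than a compiled library on long strings.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))      # distances from "" to b[:j]
    for i, char_a in enumerate(a, start=1):
        current = [i]                       # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,            # deletion
                current[j - 1] + 1,         # insertion
                previous[j - 1] + cost,     # match or substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("ACGT", "AGT"))
```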

Q.76 In a phylogenetic tree, what does a 'polytomy' indicate?

A node with multiple descendant branches
A node with no branches
A node with exactly two branches
A node with a single descendant
Explanation - A polytomy represents unresolved branching order.
Correct answer is: A node with multiple descendant branches

Q.77 Which command in Linux lists the number of lines in a file?

ls -l
wc -l
cat file | wc -l
Both b and c
Explanation - wc -l counts lines; cat file | wc -l is a common pipeline.
Correct answer is: Both b and c

Q.78 Which of these is NOT a common output of a metagenomic assembler?

Scaffold sequences
Contig sequences
Reference genomes
Binning assignments
Explanation - Assemblers produce contigs/scaffolds; reference genomes are not directly output.
Correct answer is: Reference genomes

Q.79 In Python, which data type is most suitable for storing a sequence of nucleotides?

list
tuple
str
dict
Explanation - Strings efficiently hold DNA/RNA sequences.
Correct answer is: str

Q.80 What does the 'awk NF' command do when applied to a FASTQ file?

Prints only lines with a field count greater than zero
Prints header lines
Prints lines containing 'N'
Prints lines with even number of fields
Explanation - NF is the number of fields; awk NF prints non‑empty lines.
Correct answer is: Prints only lines with a field count greater than zero

Q.81 Which R function is used to write a VCF file from a VariantAnnotation object?

writeVcf()
vcfWrite()
saveVcf()
exportVcf()
Explanation - writeVcf() writes VariantAnnotation objects to disk.
Correct answer is: writeVcf()

Q.82 What is the primary purpose of a 'reference genome' in alignment?

To serve as a target for mapping reads
To generate sequencing adapters
To store variant calls
To provide annotation data
Explanation - Reads are aligned against a reference genome to determine their genomic positions.
Correct answer is: To serve as a target for mapping reads

Q.83 Which of the following commands removes duplicate reads in a BAM file using Picard?

picard MarkDuplicates I=input.bam O=dedup.bam
picard MarkDuplicates I=input.bam O=dedup.bam REMOVE_DUPLICATES=true
picard MarkDuplicates INPUT=input.bam OUTPUT=dedup.bam
All of the above
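Explanation - Only the command with REMOVE_DUPLICATES=true actually removes duplicates; without it, MarkDuplicates only flags them in the output. In practice Picard also expects a metrics file argument (for example M=metrics.txt).
Correct answer is: picard MarkDuplicates I=input.bam O=dedup.bam REMOVE_DUPLICATES=true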

Q.84 In bioinformatics, what does 'GC skew' measure?

The ratio of GC to AT content
The imbalance of G versus C along the genome
The GC content across different species
The GC content in a single read
Explanation - GC skew (G-C)/(G+C) indicates strand asymmetry.
Correct answer is: The imbalance of G versus C along the genome
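
The (G-C)/(G+C) formula can be sketched per window as below. The function name gc_skew, the use of non-overlapping windows, and the convention of returning 0.0 for a window with no G or C are all assumptions made for illustration.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for each non-overlapping window."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without G or C has undefined skew; report 0.0 here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


print(gc_skew("GGGCATATCCCG", 4))
```

Positive values mean G outnumbers C in that window, negative values the reverse.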

Q.85 Which of the following is a common step in a RNA‑seq differential expression pipeline?

Read trimming
Quality filtering
Alignment to reference
All of the above
Explanation - RNA‑seq workflows typically involve trimming, quality filtering, and alignment.
Correct answer is: All of the above

Q.86 What is the output format of the 'samtools mpileup' command?

VCF
SAM
BAM
BED
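Explanation - By default mpileup writes a text pileup summarizing the base calls at each position; older samtools releases could also emit VCF/BCF (a role now taken by bcftools mpileup), so VCF is the closest of the listed options.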
Correct answer is: VCF

Q.87 Which algorithm is used by HMMER to detect protein domains?

Hidden Markov Models
Dynamic programming
Smith‑Waterman
BLASTP
Explanation - HMMER employs HMMs to model sequence families.
Correct answer is: Hidden Markov Models

Q.88 In Python, how do you open a file and read all lines into a list?

open('file.txt').readlines()
open('file.txt', 'r').readlines()
readlines('file.txt')
both a and b
Explanation - Both syntaxes read all lines; specifying 'r' is optional.
Correct answer is: both a and b
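
Both one-liners leave the file handle open until it is garbage collected. A sketch of the same read using a 'with' block, which closes the file as soon as the block ends; the temporary file and its contents are made up so the example is self-contained.

```python
import os
import tempfile

# Create a small hypothetical input file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(">seq1\nACGT\n")
    path = tmp.name

# 'r' (text read mode) is the default, so it can be omitted.
with open(path) as handle:
    lines = handle.readlines()      # list of lines, newlines included

os.remove(path)                     # clean up the temporary file
print(lines)
```

For large sequencing files, iterating over the handle line by line avoids loading everything into memory at once.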

Q.89 Which of the following best describes a 'pseudogene'?

An active gene producing functional proteins
A non‑coding RNA gene
A gene that has lost its function due to mutations
A gene with multiple splice variants
Explanation - Pseudogenes are remnants of genes that no longer produce functional products.
Correct answer is: A gene that has lost its function due to mutations

Q.90 What is a 'transcriptome'?

The complete set of proteins in a cell
The complete set of DNA in a cell
The complete set of RNA transcripts in a cell
The set of all metabolites
Explanation - Transcriptome refers to all RNA molecules transcribed from the genome.
Correct answer is: The complete set of RNA transcripts in a cell

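Q.91 Which of these commands extracts reads with a MAPQ score of at least 30?

samtools view -b -q 30 input.bam > highq.bam
samtools view -b -q 30 input.bam | samtools sort -o highq.bam
samtools view -h input.bam | awk '/^@/ || $5>=30' | samtools view -b -o highq.bam -
All of the above
Explanation - All three methods keep reads with MAPQ >= 30; '-b' writes BAM output, and the awk version passes header lines through before converting back to BAM.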
Correct answer is: All of the above

Q.92 What is the purpose of using 'multi‑qc' in a sequencing pipeline?

Generate a single QC report from multiple samples
Compress data files
Align reads to multiple references
Call variants
Explanation - multi‑qc aggregates QC metrics from FastQC and other tools into one report.
Correct answer is: Generate a single QC report from multiple samples

Q.93 Which R package is commonly used to plot genomic tracks like coverage or SNP density?

ggplot2
Gviz
tidyverse
data.table
Explanation - Gviz creates genome browsers‑style plots in R.
Correct answer is: Gviz

Q.94 In a FASTQ file, what does the '+' line signify?

Quality string delimiter
Sequence identifier repeat
Start of next record
End of file
Explanation - The '+' line separates sequence from its quality string.
Correct answer is: Quality string delimiter

Q.95 Which command extracts only unique reads from a BAM file?

samtools rmdup
samtools markdup -r
samtools dedup
samtools dedupe
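Explanation - samtools rmdup removes duplicate reads based on alignment coordinates. It is obsolete in current samtools releases, where 'samtools markdup -r' is the recommended replacement.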
Correct answer is: samtools rmdup

Q.96 Which of the following best describes a 'gene ontology (GO)' term?

A type of DNA sequencing technology
A standardized description of gene functions
A tool for sequence alignment
A file format for variant calls
Explanation - GO terms categorize gene functions into biological processes, molecular functions, and cellular components.
Correct answer is: A standardized description of gene functions

Q.97 In Python, what does the 'pandas.read_csv()' function return?

A list
A DataFrame
A Series
A dictionary
Explanation - read_csv() reads tabular data into a pandas DataFrame.
Correct answer is: A DataFrame

Q.98 Which of these is a common method for normalizing RNA‑seq read counts?

RPKM
TPM
FPKM
All of the above
Explanation - All are normalization methods adjusting for gene length and sequencing depth.
Correct answer is: All of the above
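
As an illustration of one of these measures, TPM can be sketched in two steps: divide each count by the gene length, then rescale so the values sum to one million. The function name tpm, the counts, and the gene lengths are made-up values for illustration.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and gene lengths (kb)."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths must have the same size")
    # Step 1: length-normalized rate per gene (reads per kilobase).
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: scale so that all values sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical genes: counts scale with length, so TPM comes out equal.
values = tpm([10, 20, 30], [1.0, 2.0, 3.0])
print(values)
```

Because the values always sum to one million within a sample, TPM is easier to compare across samples than RPKM or FPKM.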

Q.99 What does the 'GATK HaplotypeCaller' do?

Calls SNPs and indels from aligned reads
Aligns reads to the reference genome
Creates a reference assembly
Generates phylogenetic trees
Explanation - HaplotypeCaller performs local re‑assembly for accurate variant calling.
Correct answer is: Calls SNPs and indels from aligned reads

Q.100 Which of the following best describes 'phasing' in genetics?

Determining the sequence of nucleotides
Assigning variants to their parental origin
Calculating GC content
Sorting reads by quality
Explanation - Phasing reconstructs which variants co‑occur on the same chromosome.
Correct answer is: Assigning variants to their parental origin

Q.101 In a BLAST search, which parameter directly affects the length of the query region used in the alignment?

E-value
Word size
Gap penalty
Scoring matrix
Explanation - Word size defines the length of exact matches that seed alignments.
Correct answer is: Word size

Q.102 Which of the following commands would you use to convert a SAM file to a sorted BAM file using samtools?

samtools view -bS input.sam | samtools sort -o sorted.bam
samtools view input.sam | samtools sort -o sorted.bam
samtools sort input.sam -o sorted.bam
samtools convert -b input.sam -o sorted.bam
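Explanation - The pipeline first converts to BAM and then sorts it. Recent samtools releases detect the input format automatically, so '-S' is no longer needed and 'samtools sort' can also read a SAM file directly.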
Correct answer is: samtools view -bS input.sam | samtools sort -o sorted.bam

Q.103 What is the purpose of a 'reference panel' in population genetics?

To provide a set of known variants for imputation
To store RNA‑seq reads
To align sequencing data
To visualize phylogenies
Explanation - Reference panels contain variant data used for genotype imputation.
Correct answer is: To provide a set of known variants for imputation

Q.104 Which R package is used for clustering analysis of gene expression data?

cluster
clusterProfiler
stats
gplots
Explanation - The cluster package provides hierarchical clustering utilities.
Correct answer is: cluster

Q.105 Which of the following best describes a 'scoring matrix' in sequence alignment?

A file containing base frequencies
A table assigning scores to residue pairs
A list of alignment scores
A graphical representation of alignments
Explanation - Scoring matrices like BLOSUM or PAM assign scores to substitutions.
Correct answer is: A table assigning scores to residue pairs

Q.106 In Python, how would you import the 'pandas' library?

import pandas
include pandas
use pandas
require pandas
Explanation - The import statement loads the pandas module.
Correct answer is: import pandas

Q.107 Which command calculates the GC skew across a genome in a sliding window?

skewfinder -g genome.fasta
skew -g genome.fasta
skewfinder genome.fasta
skew genome.fasta
Explanation - skewfinder is a tool that computes GC and AT skew in windows.
Correct answer is: skewfinder -g genome.fasta

Q.108 What does the 'GFF3' file format contain?

Gene expression data
Genomic feature annotations
Variant calls
Sequencing quality scores
Explanation - GFF3 files list genomic features such as exons, transcripts, and genes.
Correct answer is: Genomic feature annotations

Q.109 Which of the following best describes a 'transposon'?

A protein-coding gene
A mobile genetic element
A type of RNA polymerase
A DNA methylation marker
Explanation - Transposons can move within the genome, affecting structure and function.
Correct answer is: A mobile genetic element

Q.110 Which command counts the number of occurrences of a pattern in a file using grep?

grep -c 'pattern' file.txt
grep 'pattern' file.txt | wc -l
both a and b
none of the above
Explanation - Both commands return the count of matching lines.
Correct answer is: both a and b

Q.111 In a phylogenetic tree, what is a 'branch length' typically proportional to?

Mutation rate
Genome size
Sequence length
Number of taxa
Explanation - Branch length reflects the amount of evolutionary change (for example, substitutions per site), which grows with mutation rate and elapsed time.
Correct answer is: Mutation rate

Q.112 What does the 'BWA MEM' algorithm use for alignment?

Suffix arrays
Burrows–Wheeler transform
Huffman coding
Dynamic programming
Explanation - BWA MEM uses the BWT index for efficient read mapping.
Correct answer is: Burrows–Wheeler transform

Q.113 Which of the following is a key component of a 'genome annotation pipeline'?

Read mapping
Protein structure prediction
Variant calling
Metabolic modeling
Explanation - Annotation pipelines often begin with aligning reads to a reference.
Correct answer is: Read mapping

Q.114 Which R function extracts the mean expression for each gene across samples?

rowMeans()
colMeans()
mean()
median()
Explanation - rowMeans() computes the mean of each row (gene) in a matrix.
Correct answer is: rowMeans()

Q.115 Which command is used to sort a BAM file by coordinate using samtools?

samtools sort -o sorted.bam input.bam
samtools order input.bam -o sorted.bam
samtools coordinate input.bam -o sorted.bam
samtools index input.bam
Explanation - samtools sort orders reads by genomic coordinates.
Correct answer is: samtools sort -o sorted.bam input.bam

Q.116 In a 'de novo' assembly, what is the 'k' in a 'k‑mer' strategy?

Number of contigs produced
Size of the k‑mer subsequence
Length of reads
Number of iterations
Explanation - The 'k' denotes the length of substrings used for overlap detection.
Correct answer is: Size of the k‑mer subsequence
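
As a small illustration of what k means in practice, the sketch below lists and counts every substring of length k in a sequence. The function name count_kmers is an illustrative choice; real assemblers use far more compact data structures.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    # A sequence of length L contains L - k + 1 overlapping k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

For example, count_kmers("ATGATG", 3) counts ATG twice and TGA and GAT once each.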

Q.117 Which command removes adapter sequences from paired‑end reads using Trimmomatic?

trimmomatic PE input_R1.fastq input_R2.fastq output_forward_paired.fq output_forward_unpaired.fq output_reverse_paired.fq output_reverse_unpaired.fq ILLUMINACLIP:adapters.fa:2:30:10
trim_galore --paired input_R1.fastq input_R2.fastq
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGTT
both a and b
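Explanation - Only the first command invokes Trimmomatic; its ILLUMINACLIP step removes adapters from paired‑end reads. Trim Galore and cutadapt also trim adapters, but they are separate tools.
Correct answer is: trimmomatic PE input_R1.fastq input_R2.fastq output_forward_paired.fq output_forward_unpaired.fq output_reverse_paired.fq output_reverse_unpaired.fq ILLUMINACLIP:adapters.fa:2:30:10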

Q.118 Which of the following best describes the 'Read Depth' metric?

Average read length
Number of reads covering a region
Error rate of sequencing
GC content variation
Explanation - Read depth indicates how many reads map to a genomic position.
Correct answer is: Number of reads covering a region

Q.119 What is the main output of the 'FastQC' tool?

Alignment files
Quality reports in HTML and text
Variant call files
Phylogenetic trees
Explanation - FastQC generates a multi‑panel HTML report summarizing read quality.
Correct answer is: Quality reports in HTML and text

Q.120 Which of the following commands converts a VCF file to a BED file containing only variant positions?

vcftools --vcf input.vcf --positions --bed output.bed
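awk -v OFS='\t' '!/^#/ {print $1, $2-1, $2}' input.vcf > output.bed
sed -n '2,$p' input.vcf | cut -f1,4 > output.bed
All of the above
Explanation - Only the awk command works: it skips header lines and writes CHROM plus the 1-based POS converted to a 0-based, half-open BED interval. In vcftools, --positions and --bed are site filters that expect input files, and cut -f1,4 extracts the REF allele rather than the position.
Correct answer is: awk -v OFS='\t' '!/^#/ {print $1, $2-1, $2}' input.vcf > output.bed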
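
The coordinate conversion behind a VCF-to-BED step can be sketched in Python. This assumes a standard tab-separated VCF in which column 1 is CHROM, column 2 is the 1-based POS and column 4 is REF; the function name vcf_to_bed is hypothetical.

```python
def vcf_to_bed(vcf_lines):
    """Yield BED lines (chrom, 0-based start, end) for each VCF record."""
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1          # VCF is 1-based, BED is 0-based
        end = start + len(ref)   # BED end is exclusive; span the REF allele
        yield f"{chrom}\t{start}\t{end}"
```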

Q.121 Which of the following is an advantage of long‑read sequencing?

Higher per‑base accuracy
Lower error rates
Better assembly of repetitive regions
Smaller library preparation time
Explanation - Long reads span repeats, improving assembly contiguity.
Correct answer is: Better assembly of repetitive regions

Q.122 In Python, which function generates a random integer between 1 and 10, inclusive?

random.randint(1,10)
random.random(1,10)
random.randrange(1,10)
both a and c
Explanation - random.randint(1,10) includes both endpoints. random.randrange(1,10) excludes the stop value, so it never returns 10, and random.random() takes no arguments.
Correct answer is: random.randint(1,10)
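
The endpoint behaviour of the two integer functions is easy to check empirically. The sketch below draws many values from each and records which values ever appear; the seed and the number of draws are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed so repeated runs behave the same way

# Collect the distinct values each function can return over many draws.
inclusive = {random.randint(1, 10) for _ in range(10_000)}
half_open = {random.randrange(1, 10) for _ in range(10_000)}

print(sorted(inclusive))  # randint(1, 10) can return 10
print(sorted(half_open))  # randrange(1, 10) stops at 9
```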

Q.123 Which command extracts the first 100 lines of a file?

head -n 100 file.txt
head 100 file.txt
tail -n 100 file.txt
sed -n '1,100p' file.txt
Explanation - head -n 100 prints the first 100 lines and is the idiomatic choice; sed -n '1,100p' file.txt produces the same output.
Correct answer is: head -n 100 file.txt

Q.124 Which of these is a typical input for the 'MAFFT' alignment program?

Protein FASTA files
RNA‑seq FASTQ files
Variant call files
Chromosome conformation capture data
Explanation - MAFFT aligns protein or nucleotide sequences given in FASTA format.
Correct answer is: Protein FASTA files

Q.125 In a BAM file, which flag value indicates a properly paired read?

0x1
0x2
0x4
0x8
Explanation - 0x2 means the read is properly paired.
Correct answer is: 0x2
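
SAM flags are bit fields, so each property is tested with a bitwise AND. The sketch below decodes only the five lowest bits defined in the SAM specification; the label strings and the function name decode_flag are illustrative.

```python
# Lowest five flag bits from the SAM specification, with illustrative labels.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
}


def decode_flag(flag):
    """Return the labels of all bits in FLAG_BITS that are set in flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

For example, a flag of 99 has bits 0x1 and 0x2 set (plus higher bits not listed here), so it decodes to paired and proper_pair.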

Q.126 Which R function calculates the variance of a numeric vector?

var()
sd()
mean()
median()
Explanation - var() returns the sample variance of a numeric vector.
Correct answer is: var()

Q.127 What is the output format of the 'samtools mpileup' command when used with the '--vcf' flag?

VCF
SAM
BAM
BED
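Explanation - mpileup's VCF output option (-v/--VCF in older samtools releases) writes genotype likelihoods as VCF; in recent releases this role has moved to bcftools mpileup.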
Correct answer is: VCF

Q.128 Which command in Linux creates a compressed version of a file?

tar -czvf archive.tar.gz folder/
gzip file.txt
compress file.txt
both a and b
Explanation - Both tar with gzip and gzip directly compress files.
Correct answer is: both a and b

Q.129 Which of the following best describes a 'transcript isoform'?

A different gene variant
An alternative splicing product of the same gene
A type of DNA methylation
A protein domain
Explanation - Isoforms result from alternative splicing producing distinct transcripts.
Correct answer is: An alternative splicing product of the same gene

Q.130 In Python, how do you iterate over the keys of a dictionary?

for key in dict:
for key in dict.keys():
both a and b
foreach key in dict
Explanation - Both syntaxes iterate over dictionary keys.
Correct answer is: both a and b

Q.131 Which command creates a FASTQ file containing only reads with a Phred quality score above 30?

awk 'NR%4==0 && $1>=30' input.fastq > highq.fastq
seqtk seq -q 30 input.fastq > highq.fastq
sed -n '4~4p' input.fastq | awk '$1>=30' > highq.fastq
All of the above
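Explanation - Only the seqtk command works on decoded base qualities: '-q 30' masks bases below Q30 rather than discarding whole reads. The awk and sed options compare the ASCII-encoded quality string with a number and print quality lines only, so they do not produce a valid FASTQ file.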
Correct answer is: seqtk seq -q 30 input.fastq > highq.fastq
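
To drop whole reads by quality, the quality string first has to be decoded. The following sketch keeps records whose mean Phred score is at least a threshold. It assumes Phred+33 encoding and strict 4-line records; the function names are hypothetical, and using the mean (rather than the minimum) is one possible criterion.

```python
def mean_phred(quality_line, offset=33):
    """Decode an ASCII quality string (Phred+33 assumed) and return its mean."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def filter_fastq(lines, min_mean_q=30):
    """Yield 4-line FASTQ records whose mean quality is at least min_mean_q."""
    lines = [line.rstrip("\n") for line in lines]
    # Step through the input four lines at a time: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        # Skip records with an empty quality line instead of dividing by zero.
        if record[3] and mean_phred(record[3]) >= min_mean_q:
            yield record
```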

Q.132 Which of the following is a typical output of the 'RNA‑seq differential expression' analysis?

Variant call files
Differentially expressed gene list with fold change
Phylogenetic tree
Metabolic network diagram
Explanation - DE analyses produce gene lists with statistics and fold changes.
Correct answer is: Differentially expressed gene list with fold change

Q.133 What does the 'CIGAR' string in a BAM file describe?

Read length
Alignment operations (match/mismatch/indel)
Sequencing platform
Quality scores
Explanation - CIGAR encodes how reads align to the reference genome.
Correct answer is: Alignment operations (match/mismatch/indel)
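
A CIGAR string is a run of length/operation pairs, which makes it straightforward to parse. This sketch uses the operation letters from the SAM specification; the function names parse_cigar and reference_span are illustrative.

```python
import re

# One or more digits followed by a single CIGAR operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    # M, D, N, = and X consume reference bases; I, S, H and P do not.
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
```

For example, 5M2I3D10M parses into four operations and spans 18 reference bases, because the 2-base insertion does not consume the reference.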

Q.134 Which of the following commands performs a de novo assembly using SPAdes for single‑end reads?

spades.py -s reads.fq -o assembly
spades -single reads.fq -output assembly
spades -s reads.fq --output assembly
All of the above
Explanation - spades.py accepts single‑end input with '-s' flag.
Correct answer is: spades.py -s reads.fq -o assembly

Q.135 Which command in R calculates the Pearson correlation between two vectors?

cor(x, y, method='pearson')
pearson(x, y)
correlation(x, y)
corr(x, y, type='pearson')
Explanation - The cor() function with method='pearson' computes Pearson correlation.
Correct answer is: cor(x, y, method='pearson')

Q.136 Which of the following best describes a 'methylome'?

The set of all genes in a genome
The set of all DNA methylation marks across the genome
The set of all RNA transcripts
The set of all protein domains
Explanation - Methylome refers to genome‑wide DNA methylation patterns.
Correct answer is: The set of all DNA methylation marks across the genome

Q.137 In a variant call, what does 'QUAL' represent?

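Phred‑scaled quality score of the variant call
Number of supporting reads
Allele frequency
Position of the variant
Explanation - 'QUAL' is the Phred‑scaled quality of the variant call at the site; per-sample genotype quality is reported separately in the GQ field.
Correct answer is: Phred‑scaled quality score of the variant call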

Q.138 Which of the following commands displays the first 10 lines of a file in reverse order?

head -n 10 file.txt | tac
tac file.txt | head -n 10
tail -n 10 file.txt | rev
both a and b
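Explanation - head -n 10 file.txt | tac takes the first 10 lines and then reverses their order. tac file.txt | head -n 10 reverses the whole file first, so it shows the last 10 lines instead.
Correct answer is: head -n 10 file.txt | tac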

Q.139 Which of the following best describes an 'indel'?

Insertion or deletion of nucleotides
Substitution of a single nucleotide
A translocation event
A chromosomal inversion
Explanation - Indels are insertions or deletions relative to the reference.
Correct answer is: Insertion or deletion of nucleotides

Q.140 Which R function plots a heatmap of a gene expression matrix?

heatmap()
plot()
ggplot()
corrplot()
Explanation - heatmap() from base R generates a simple heatmap.
Correct answer is: heatmap()

Q.141 What does 'FDR' stand for in differential expression analysis?

False Discovery Rate
Fold Difference Ratio
Full Data Range
Fast Distribution Ratio
Explanation - FDR is the expected proportion of false positives among significant results.
Correct answer is: False Discovery Rate
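
One widely used way to control the FDR is the Benjamini-Hochberg procedure. The sketch below computes BH-adjusted p-values in plain Python; it is an illustration of that procedure, not the implementation of any specific differential expression package, and the function name is an illustrative choice.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below the chosen threshold (0.05 is a common choice) are reported as significant.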

Q.142 Which of these tools is used for rapid read alignment against a reference genome?

BWA MEM
MAFFT
HMMER
BLASTP
Explanation - BWA MEM is designed for fast mapping of short reads.
Correct answer is: BWA MEM

Q.143 In Python, which library is best for plotting genomic tracks similar to UCSC Genome Browser?

pyGenomeTracks
Matplotlib
Plotly
Bokeh
Explanation - pyGenomeTracks generates genome browser‑style plots programmatically.
Correct answer is: pyGenomeTracks

Q.144 Which command extracts the mean depth of coverage from a depth file generated by samtools depth?

awk '{sum+=$3} END{print sum/NR}' depth.txt
awk '{print $3}' depth.txt | paste -sd+ - | bc / NR
both a and b
none of the above
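Explanation - The awk one-liner sums column 3 and divides by the number of records. The second command is not valid: bc reads its expression from standard input, and NR is an awk variable that the shell does not define.
Correct answer is: awk '{sum+=$3} END{print sum/NR}' depth.txt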
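
The same calculation can be sketched in Python, assuming the three-column layout written by samtools depth (reference name, position, depth). The function name mean_depth is hypothetical.

```python
def mean_depth(lines):
    """Average the third column (depth) over all well-formed lines."""
    total = 0
    count = 0
    for line in lines:
        fields = line.split()
        # Ignore blank or truncated lines rather than failing on them.
        if len(fields) < 3:
            continue
        total += int(fields[2])
        count += 1
    return total / count if count else 0.0
```

Passing an open file handle, for example mean_depth(open("depth.txt")), works because the function only iterates over lines.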

Q.145 What is the purpose of a 'barcode' in multiplexed sequencing libraries?

To identify sample origin within a pooled run
To increase read length
To mark quality of reads
To sort reads by GC content
Explanation - Barcodes tag reads from different samples, enabling demultiplexing.
Correct answer is: To identify sample origin within a pooled run

Q.146 Which of these commands creates an index for a BAM file?

samtools index input.bam
samtools mkindex input.bam
samtools index -b input.bam
samtools makeindex input.bam
Explanation - samtools index builds a coordinate index for efficient access.
Correct answer is: samtools index input.bam

Q.147 Which of the following best describes a 'de novo' assembly?

Assembly using a known reference genome
Assembly without a reference, using reads alone
Assembly of protein sequences
Assembly of transcriptomes only
Explanation - De novo assembly constructs sequences from scratch.
Correct answer is: Assembly without a reference, using reads alone

Q.148 Which base R function is used to write a CSV file from a data frame?

write.csv()
write_csv()
csv.write()
write.table()
Explanation - write.csv() from base R writes a data frame to a CSV file; write_csv() is the readr counterpart.
Correct answer is: write.csv()

Q.149 What does the command 'wget https://example.com/file.fasta' do?

Uploads file.fasta to the server
Downloads file.fasta from the URL
Deletes file.fasta from the server
Copies file.fasta to local directory
Explanation - wget retrieves files from the web via HTTP/FTP.
Correct answer is: Downloads file.fasta from the URL

Q.150 Which of the following best describes a 'gene ontology (GO) enrichment' analysis?

Assessing over‑representation of GO terms in a gene set
Mapping genes to their chromosomal positions
Identifying sequence motifs in promoters
Predicting 3D protein structures
Explanation - GO enrichment identifies biological functions over‑represented in a list.
Correct answer is: Assessing over‑representation of GO terms in a gene set

Q.151 In a FASTQ file, what does the '+' line contain when it is followed by an identical header?

Quality string placeholder
Sequence identifier repeat
Sequence itself
No data
Explanation - The '+' line separates the sequence from the quality string and may optionally repeat the sequence identifier from the header line.
Correct answer is: Sequence identifier repeat

Q.152 What does the 'FastQC' tool highlight in its per‑sequence GC content plot?

GC content distribution across reads
Read length distribution
Quality score trends
Adapter contamination
Explanation - This plot shows the distribution of mean GC content across all reads; deviations from the expected distribution can indicate contamination or bias.
Correct answer is: GC content distribution across reads

Q.153 Which command in Linux counts the total number of characters in a file?

wc -c file.txt
cat file.txt | wc -c
both a and b
none of the above
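Explanation - wc -c reports the byte count, which equals the character count for single-byte encodings such as ASCII; reading from a pipe gives the same number. Use wc -m to count multibyte characters.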
Correct answer is: both a and b

Q.154 In the context of gene regulation, what is a 'promoter'?

A coding region of a gene
A regulatory DNA sequence upstream of a gene
A type of RNA polymerase
A protein domain
Explanation - Promoters initiate transcription by binding transcription factors.
Correct answer is: A regulatory DNA sequence upstream of a gene

Q.155 Which R function returns the standard deviation of a vector?

sd()
var()
mean()
sum()
Explanation - sd() computes the sample standard deviation.
Correct answer is: sd()

Q.156 What does the 'SAM' format store that the 'BAM' format does not?

Sequence data
Alignment data
Quality scores
Nothing; SAM is a text format and BAM stores the same data in compressed binary form
Explanation - BAM is a compressed binary representation of SAM.
Correct answer is: Nothing; SAM is a text format and BAM stores the same data in compressed binary form

Q.157 Which of the following commands is used to generate a FASTA file containing only coding sequences from a GTF annotation?

gffread -g genome.fa -x coding.fasta annotation.gtf
awk '$3==CDS' annotation.gtf > coding.gtf
sed -n '/CDS/p' annotation.gtf > coding.gtf
both a and b
Explanation - gffread's -x option writes the spliced CDS of each transcript as FASTA (-y would write the protein translations instead). The awk and sed commands only filter GTF lines and produce no sequences.
Correct answer is: gffread -g genome.fa -x coding.fasta annotation.gtf

Q.158 In a gene‑expression heatmap, what is the typical purpose of a dendrogram?

Shows the phylogenetic tree
Groups similar expression profiles
Indicates GC content
Displays sequence alignment
Explanation - Dendrograms cluster genes or samples with similar patterns.
Correct answer is: Groups similar expression profiles

Q.159 Which of the following commands calculates the mean of a column in a tabular file using awk?

awk '{sum+=$2} END{print sum/NR}' file.txt
awk '{print $2}' file.txt | paste -sd+ - | bc / NR
both a and b
none of the above
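Explanation - The awk one-liner sums the second column and divides by the number of records. The second command is not valid: bc reads its expression from standard input, and NR is an awk variable that the shell does not define.
Correct answer is: awk '{sum+=$2} END{print sum/NR}' file.txt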

Q.160 What does a 'coverage plot' display?

Number of reads per base across the genome
Expression levels across samples
GC content variation
Phylogenetic distances
Explanation - Coverage plots show read depth across genomic coordinates.
Correct answer is: Number of reads per base across the genome

Q.161 In a FASTQ file, how many lines correspond to one read?

1
2
3
4
Explanation - A FASTQ record consists of 4 lines: header, sequence, '+', and quality.
Correct answer is: 4

Q.162 Which R package is commonly used for functional enrichment analysis of gene sets?

clusterProfiler
ggplot2
data.table
dplyr
Explanation - clusterProfiler performs GO and pathway enrichment analyses.
Correct answer is: clusterProfiler

Q.163 Which of the following commands creates a gzipped FASTQ file from an uncompressed FASTQ?

gzip input.fastq
bgzip input.fastq
pigz input.fastq
All of the above
Explanation - All three tools can compress FASTQ files to .gz.
Correct answer is: All of the above

Q.164 What is the purpose of a 'masker' in genome annotation?

To identify and annotate repeats
To filter low‑quality reads
To compress the genome
To predict transcription factor binding sites
Explanation - RepeatMasker identifies repetitive elements in the genome.
Correct answer is: To identify and annotate repeats

Q.165 Which of the following best describes a 'haplotype'?

A set of DNA bases forming a gene
A combination of alleles at multiple loci on the same chromosome
A type of protein
A statistical measure of read depth
Explanation - Haplotypes represent linked genetic variants on one chromosome.
Correct answer is: A combination of alleles at multiple loci on the same chromosome

Q.166 In Python, how do you split a string by commas?

string.split(',')
string.split(';')
string.split()
string.join(',')
Explanation - The split() method splits on the delimiter provided.
Correct answer is: string.split(',')

Q.167 Which of the following commands calculates the GC content of a FASTA file using awk?

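awk '!/^>/ {len+=length($0); g+=gsub(/[GCgc]/,"")} END{print (g/len)*100}' file.fasta
awk '/[GC]/{g++} END{print g/NR}' file.fasta
both a and b
none of the above
Explanation - The first command skips header lines, adds up the sequence length, and uses gsub's return value to count G and C bases before printing the percentage. The second only counts lines that contain a G or C, which is not GC content.
Correct answer is: awk '!/^>/ {len+=length($0); g+=gsub(/[GCgc]/,"")} END{print (g/len)*100}' file.fasta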
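
GC content is the number of G and C bases divided by the total number of bases, ignoring header lines. A minimal Python sketch of that calculation follows; the function name gc_content is an illustrative choice, and ambiguous bases such as N are counted in the total.

```python
def gc_content(fasta_lines):
    """Return GC content as a percentage over all sequence lines."""
    gc = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if not line or line.startswith(">"):
            continue
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return 100.0 * gc / total if total else 0.0
```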

Q.168 What does the 'SAM flag 0x10' indicate?

Read is mapped in the forward direction
Read is mapped in the reverse direction
Read is unmapped
Read is part of a paired‑end alignment
Explanation - 0x10 denotes the reverse complement strand mapping.
Correct answer is: Read is mapped in the reverse direction

Q.169 Which command lists the contents of a directory sorted by modification time?

ls -t
ls -l
ls -h
ls -s
Explanation - The '-t' flag sorts by modification time, newest first.
Correct answer is: ls -t

Q.170 Which of the following best describes 'metagenomics'?

Sequencing of individual genomes
Sequencing of mixed microbial communities
Sequencing of the human transcriptome
Sequencing of the human genome only
Explanation - Metagenomics analyzes DNA from environmental samples containing many species.
Correct answer is: Sequencing of mixed microbial communities

Q.171 What does the 'RPKM' normalization formula stand for?

Reads Per Kilobase per Million mapped reads
Reads per Kilobase of mRNA
RNA per Kilobase of genome
Random Per Kilobase per Million
Explanation - RPKM corrects for gene length and sequencing depth.
Correct answer is: Reads Per Kilobase per Million mapped reads
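
Written out, RPKM = reads mapped to the gene x 10^9 / (gene length in bp x total mapped reads). The small function below applies that formula; the function and parameter names are illustrative.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, 500 reads on a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 25.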

Q.172 Which of the following commands generates a FASTA file containing only sequences longer than 1000 bases?

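awk '/^>/ {if (length(seq)>1000) print hdr "\n" seq; hdr=$0; seq=""; next} {seq=seq $0} END {if (length(seq)>1000) print hdr "\n" seq}' file.fasta > long.fasta
seqtk subseq file.fasta -m 1000 > long.fasta
sed -n '/^>/p' file.fasta > headers.txt && grep -A1 -B1 '^>.*$' file.fasta | awk 'NF>1000' > long.fasta
All of the above
Explanation - Only the awk command works: it joins the lines of each record and prints the record when the full sequence exceeds 1000 bases. seqtk subseq extracts named sequences or regions rather than filtering by length, and the third command tests the number of fields per line, not the sequence length.
Correct answer is: awk '/^>/ {if (length(seq)>1000) print hdr "\n" seq; hdr=$0; seq=""; next} {seq=seq $0} END {if (length(seq)>1000) print hdr "\n" seq}' file.fasta > long.fasta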
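
Length filtering of FASTA needs care because a sequence may be wrapped over several lines. The sketch below first joins each record and then filters on the full length; the function names read_fasta and longer_than are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longer_than(lines, min_len=1000):
    """Keep records whose full sequence is strictly longer than min_len."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) > min_len]
```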

Q.173 What does the 'GC skew' plot help identify in a bacterial genome?

Origin and terminus of replication
Gene expression levels
Phylogenetic relationships
Methylation patterns
Explanation - GC skew changes sign near the origin and terminus of replication.
Correct answer is: Origin and terminus of replication

Q.174 Which of the following is a typical input for the 'BWA aln' algorithm?

Paired‑end reads
Single‑end reads
Protein sequences
RNA‑seq FASTQ files
Explanation - bwa aln aligns short reads one FASTQ file at a time; paired‑end data are aligned per file and then combined with bwa sampe.
Correct answer is: Single‑end reads

Q.175 Which R function merges two data frames by a common column?

merge()
cbind()
rbind()
join()
Explanation - merge() combines data frames on key columns.
Correct answer is: merge()

Q.176 Which of the following best describes a 'pseudogene'?

An active protein‑coding gene
A non‑coding RNA gene
A gene that has lost its function due to mutations
A gene with multiple splice variants
Explanation - Pseudogenes are non‑functional remnants of once‑active genes.
Correct answer is: A gene that has lost its function due to mutations

Q.177 What does a 'scaffold' represent in genome assembly?

A single continuous contig
A set of contigs linked with estimated distances
A read from the sequencing library
A protein domain annotation
Explanation - Scaffolds arrange contigs with gap estimates.
Correct answer is: A set of contigs linked with estimated distances

Q.178 Which command displays the number of sequences in a FASTA file?

awk '/^>/ {count++} END{print count}' file.fasta
grep -c '^>' file.fasta
both a and b
none of the above
Explanation - Both count the number of header lines, indicating sequences.
Correct answer is: both a and b

Q.179 In a phylogenetic tree, what is the 'root'?

The most recent common ancestor of all taxa
The oldest species in the tree
The node with the longest branch
The leaf node
Explanation - The root represents the ancestral point from which all branches diverge.
Correct answer is: The most recent common ancestor of all taxa

Q.180 Which of the following is a valid Python list comprehension that squares numbers 1–5?

[x**2 for x in range(1,6)]
[x*x for x in 1..5]
(x**2 for x in range(1,6))
[x**2 for x in range(6)]
Explanation - This comprehension correctly iterates 1–5 and squares each element.
Correct answer is: [x**2 for x in range(1,6)]

Q.181 What does the command 'samtools flagstat input.bam' output?

The number of mapped and unmapped reads
The base composition of the reference genome
The alignment score distribution
The GC content of the reads
Explanation - flagstat provides a quick summary of mapping statistics.
Correct answer is: The number of mapped and unmapped reads

Q.182 Which R function calculates the median of a numeric vector?

median()
mean()
mode()
var()
Explanation - median() returns the middle value of a sorted vector.
Correct answer is: median()

Q.183 What does the 'N' character represent in a DNA sequence?

A single nucleotide
An ambiguous nucleotide (any base)
A gap
A stop codon
Explanation - N indicates any base (A, T, C, or G).
Correct answer is: An ambiguous nucleotide (any base)

Q.184 Which command in R merges two data frames on a shared key using dplyr?

left_join(df1, df2, by='id')
merge(df1, df2, by='id')
join(df1, df2)
all_join(df1, df2)
Explanation - left_join() from dplyr performs a left merge.
Correct answer is: left_join(df1, df2, by='id')

Q.185 Which of the following commands performs a de novo assembly using Canu for long reads?

canu -p asm -d out genomeSize=3g -pacbio-raw reads.fq
canu -assembly asm -input reads.fq
canu -run asm -reads reads.fq
canu -p asm -d out -genomeSize 3g reads.fq
Explanation - This syntax specifies output directory, genome size, and PacBio raw reads.
Correct answer is: canu -p asm -d out genomeSize=3g -pacbio-raw reads.fq

Q.186 Which of the following best describes a 'pseudogene'?

A gene that has lost its function due to mutations
A non‑coding RNA gene
A gene with multiple splice variants
An active protein‑coding gene
Explanation - Pseudogenes are non‑functional remnants of once‑active genes.
Correct answer is: A gene that has lost its function due to mutations

Q.187 Which command lists all files and directories in the current directory, including hidden ones, sorted alphabetically?

ls -a
ls -al
ls -alh
All of the above
Explanation - All three commands include -a, which shows hidden entries, and ls sorts alphabetically by default; -l and -h only add detail and human-readable sizes.
Correct answer is: All of the above

Q.188 Which R expression computes the log2 fold change between two conditions?

log2(x/y)
log2(x)/log2(y)
log2(x)+log2(y)
log2(x*y)
Explanation - Log2 fold change is the log base 2 of the ratio of expression values.
Correct answer is: log2(x/y)
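
The same ratio-then-log calculation can be sketched in Python. The optional pseudocount parameter is an assumption added here for illustration: adding a small constant to both values is a common workaround for zero counts, and the default of 0.0 reproduces log2(x/y) exactly.

```python
import math


def log2_fold_change(treated, control, pseudocount=0.0):
    """Return log2 of the ratio treated / control."""
    # With pseudocount=0.0 this is exactly log2(treated / control).
    return math.log2((treated + pseudocount) / (control + pseudocount))
```

A value of 2 means a fourfold increase, and a value of -2 a fourfold decrease.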