Explanation of sequencing depth and coverage, and their importance in experiments.
Sequencing depth and coverage are key metrics in next-generation sequencing (NGS) experiments. Together they determine how reliably variants and other features can be called from the data.
Sequencing Depth (or Read Depth)
Sequencing depth refers to the number of reads that cover a particular nucleotide position. It is usually reported as an average over the genome or target region and written as "X-fold" (e.g., 30x), meaning that each base is sequenced 30 times on average.
- Higher depth: Provides more independent observations of each base, which improves confidence in detecting true variants (SNPs, indels).
- Lower depth: Increases the chance of missing or incorrectly calling variants due to insufficient data to reliably differentiate between sequencing errors and real variations.
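As a rough illustration of how average depth relates to sequencing output, the sketch below uses the standard estimate depth = (number of reads × read length) / genome size. The function names and the example figures (150 bp reads, a 3.1 Gb genome) are assumptions chosen for illustration, and the estimate ignores unmapped, duplicate, and low-quality reads.

```python
import math


def expected_mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate average depth as (reads * read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Estimate how many reads are required to reach a target average depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


# Hypothetical example: 620 million 150 bp reads on a ~3.1 Gb genome.
depth = expected_mean_depth(620_000_000, 150, 3_100_000_000)  # 30.0
needed = reads_needed(30, 150, 3_100_000_000)                 # 620000000
```

In practice the achieved depth is usually somewhat lower than this estimate, because not every read maps uniquely or passes filtering.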
Coverage
Coverage describes how much of the genome (or region of interest) has been sequenced, and how deeply. The term is used in two senses:
- Breadth of Coverage: The percentage of the targeted region or genome that is covered by at least one read, or by at least a stated minimum depth (e.g., the fraction of bases at 20x or more).
- Depth of Coverage: The average number of times each base in the genome or target region is sequenced, as explained earlier.
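The two definitions above can be computed from a list of per-base read depths. The following is a minimal sketch, assuming the depths are already available as a plain Python list (for example, parsed from the output of a depth-reporting tool); the function names and the toy values are hypothetical.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Depth of coverage: average number of reads per position."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Breadth of coverage: fraction of positions with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy 10-base region; three positions have no reads at all.
per_base_depth = [0, 0, 12, 30, 31, 29, 45, 8, 0, 25]

avg = mean_depth(per_base_depth)                            # 18.0
breadth_1x = breadth_of_coverage(per_base_depth)            # 0.7
breadth_20x = breadth_of_coverage(per_base_depth, 20)       # 0.5
```

The toy region shows why both numbers matter: an average depth of 18x hides the fact that 30% of the positions are not covered at all.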
Importance in Experiments
- Variant Detection: High depth increases the probability of detecting true variants, especially in heterogeneous samples (e.g., tumors, mixed populations).
- Sensitivity and Accuracy: Higher sequencing depth generally improves sensitivity (the ability to detect true variants) and accuracy (fewer false positives and false negatives).
- Cost vs. Efficiency: High depth improves data quality but also increases costs. Balancing the desired depth with budget is a key experimental design consideration.
- Different Experiment Types:
  - Whole Genome Sequencing (WGS): Typically uses a lower average depth (~30x) than targeted sequencing, but aims for high breadth of coverage across the whole genome.
  - Targeted Sequencing: Often requires higher depth (100x or more) to detect variants with high sensitivity in specific regions.
  - RNA Sequencing: The required depth depends on transcript abundance; highly expressed genes need fewer total reads to be sampled adequately than lowly expressed genes.
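To make the link between depth and variant detection concrete, the sketch below uses a simple binomial model: if a variant is present at a given allele fraction, the number of reads carrying it at a site of fixed depth is treated as binomially distributed. This is an illustrative simplification, not a description of how any particular variant caller works; it ignores sequencing errors, mapping bias, and depth variation, and the function name and the threshold of three supporting reads are assumptions.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability of seeing at least min_alt_reads variant-supporting reads.

    Assumes the count of variant reads follows Binomial(depth, allele_fraction).
    """
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    # Sum the probabilities of observing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare a germline heterozygous variant (fraction 0.5) with a low-fraction
# variant (0.05, e.g., a tumor subclone) at two depths.
for depth in (30, 100):
    for fraction in (0.5, 0.05):
        p = detection_probability(depth, fraction, min_alt_reads=3)
        print(f"depth={depth}x fraction={fraction}: P(detect)={p:.3f}")
```

Under this model, 30x is ample for a heterozygous germline variant, while a variant at 5% allele fraction is much more likely to be seen at 100x than at 30x, which is consistent with targeted and tumor sequencing favoring higher depth.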