What Is Genome Sequencing?
Whole genome sequencing, in the de novo sense used here, determines the genome of a species whose genome sequence is unknown. The species is sequenced and analyzed without relying on any reference sequence: the reads are assembled with bioinformatics methods to obtain the genome sequence map of the species, which then supports a series of subsequent analyses such as genome structure annotation, functional annotation, and comparative genomics analysis.
Genomic DNA is extracted and randomly fragmented, and DNA fragments of the desired length (0.2 to 5 kb) are recovered by electrophoresis, ligated to adapters, and used to prepare DNA clusters. The inserts are sequenced using the Paired-End (Solexa) or Mate-Pair (SOLiD) method. The resulting reads are assembled into contigs, which are linked into scaffolds using the paired-end distance information and then anchored into chromosomes.
Figure 1. The process of genome sequencing.
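The read-to-contig step described above can be illustrated with a toy greedy overlap merger. This is only a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats); production assemblers use overlap-layout-consensus or de Bruijn graph methods. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical error-free reads drawn from the sequence ATGGCGTGCAAT
reads = ["ATGGCG", "GCGTGC", "GTGCAAT"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAAT']
```

Reads that share no sufficiently long overlap stay as separate contigs, which is why paired-end distance information is needed afterwards to order and orient contigs into scaffolds.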
- Sequencing depth
- The ratio of the total number of bases (bp) obtained by sequencing to the genome size.
- Sequencing depth is positively correlated with genome coverage, and the sequencing error rate and the number of false positive results decrease as sequencing depth increases.
- For an individual genome sequenced with a paired-end or Mate-Pair scheme, a sequencing depth of 50X to 100X or more ensures adequate genome coverage and control of the sequencing error rate, and makes the subsequent assembly of sequences into chromosomes easier and more accurate.
- Sequencing coverage
- The proportion of the genome that is covered by the bases obtained by sequencing
- Sequencing coverage is one of the indicators reflecting the randomness of sequencing
- When the depth reaches 5X, about 99.3% of the genome is expected to be covered, assuming reads are sampled uniformly at random (expected coverage is 1 - e^(-depth))
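The two quantities defined above can be computed directly. The sketch below assumes the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random on the genome; real libraries have GC and repeat bias, so actual coverage is usually somewhat lower. The function names and the example run size are hypothetical.

```python
import math


def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Depth = total sequenced bases (bp) / genome size (bp)."""
    return total_bases / genome_size


def expected_coverage(depth: float) -> float:
    """Expected fraction of the genome covered at least once.

    Under the Poisson model a base is missed with probability e^(-depth).
    """
    return 1.0 - math.exp(-depth)


# Hypothetical run: 150 Gb of sequence from a 3 Gb genome
depth = sequencing_depth(150_000_000_000, 3_000_000_000)
print(f"depth = {depth:.0f}X")

for c in (1, 5, 30):
    print(f"{c}X -> {expected_coverage(c):.4%} of the genome expected to be covered")
```

At 5X the model gives roughly 99.3% coverage. Depths of 50X or more are therefore not about reaching additional bases but about having enough independent reads over each base to correct errors and to assemble reliably.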
With years of professional experience in this field, Creative Biogene can provide you with affordable, high-quality sequencing services.
Our Genome Sequencing Methods
We can use the following methods for genome sequencing.
- Paired-end sequencing
The paired-end sequencing method reads both ends of each DNA fragment, producing a pair of reads separated by a specified insert size, and can accommodate long inserts of up to several kb in length.
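One use of the known insert size is estimating the gap between two contigs during scaffolding. The following is a minimal sketch, assuming each linking pair has its forward read on contig A and its mate on contig B, and that all fragments share one nominal insert size. The function name, the coordinate convention, and the numbers are hypothetical.

```python
from statistics import median


def estimate_gap(insert_size: int, len_a: int, links) -> float:
    """Estimate the gap (bp) between contig A and the following contig B.

    links: list of (start_on_a, end_on_b) tuples, where
      start_on_a = 0-based start of the forward read on contig A
      end_on_b   = number of bases of contig B spanned by the fragment
    For each pair: insert = (len_a - start_on_a) + gap + end_on_b.
    The median is used so that a few outlier pairs do not dominate.
    """
    gaps = [insert_size - (len_a - start) - end for start, end in links]
    return median(gaps)


# Hypothetical 3 kb library linking a 10 kb contig A to contig B
links = [(9000, 1500), (9500, 2000), (9100, 1650)]
print(estimate_gap(3000, 10_000, links))  # 500
```

A negative estimate would suggest that the two contigs actually overlap rather than being separated by unsequenced bases.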
- Nanopore sequencing
As single-stranded DNA molecules pass through the nanopore, a different current signal is obtained for each nucleotide. The ionic current variations for each well are recorded and converted into base sequences using hidden Markov model or recurrent neural network approaches. In addition, ultra-long reads (ULRs) are another important feature of the Oxford Nanopore Technologies (ONT) platform and have the potential to facilitate large genome assembly.
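The conversion of current signals into bases with a hidden Markov model can be sketched with a toy Viterbi decoder. This is an illustration only, not how any ONT basecaller is implemented: the current levels, noise, and transition probability below are invented, each hidden state is a single base (real signals depend on several neighbouring bases at once), and the input is assumed to be a non-empty list of current samples.

```python
import math

BASES = "ACGT"
# Invented mean current (pA) per base; assumption for illustration only.
MEAN_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}
SIGMA = 4.0    # assumed Gaussian noise (pA)
P_STAY = 0.7   # assumed probability that the next sample is the same base


def log_emission(x: float, base: str) -> float:
    """Gaussian log-density, up to a constant shared by all states."""
    z = (x - MEAN_PA[base]) / SIGMA
    return -0.5 * z * z


def log_transition(prev: str, cur: str) -> float:
    return math.log(P_STAY) if prev == cur else math.log((1 - P_STAY) / 3)


def viterbi(samples):
    """Most likely base state for every current sample."""
    score = {b: math.log(0.25) + log_emission(samples[0], b) for b in BASES}
    back = []
    for x in samples[1:]:
        new_score, pointers = {}, {}
        for cur in BASES:
            prev = max(BASES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[prev] + log_transition(prev, cur)
                              + log_emission(x, cur))
            pointers[cur] = prev
        score = new_score
        back.append(pointers)
    state = max(BASES, key=score.get)
    path = [state]
    for pointers in reversed(back):  # trace back from the last sample
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


def collapse(path: str) -> str:
    """Merge consecutive samples assigned to the same base into one call."""
    return "".join(b for i, b in enumerate(path) if i == 0 or b != path[i - 1])


samples = [81, 79, 96, 94, 95, 124, 126, 111, 109]
print(collapse(viterbi(samples)))  # ACTG
```

Because runs of identical states are collapsed, this toy cannot tell "AA" from "A", which mirrors the difficulty that signal-based methods have with homopolymers.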
- Gene prediction and annotation
- Coding gene prediction
- Repetitive sequence annotation and transposable element classification
- Non-coding RNA annotation
- Pseudogene annotation, etc.
- Biological problem solving
- Comparative genomics studies
- Gene family clustering
- Construction of phylogenetic trees
- Analysis of gene family expansion and contraction
- Estimation of species divergence times
- Estimation of LTR insertion times
- Whole-genome duplication events
- Analysis of selection pressure
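As an example of one analysis in this list, the insertion time of an LTR retrotransposon is commonly estimated from the divergence between its two LTRs, which are identical at insertion, as T = K / (2r). The sketch below assumes the two LTRs are already aligned without gaps, uses the Jukes-Cantor correction for K, and takes the substitution rate r as an input that must be chosen for the species studied. The function names and the rate in the example are hypothetical.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected substitutions per site between aligned LTRs."""
    if len(ltr5) != len(ltr3) or not ltr5:
        raise ValueError("LTRs must be aligned, non-empty, and of equal length")
    p = sum(a != b for a, b in zip(ltr5, ltr3)) / len(ltr5)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time(distance: float, rate: float) -> float:
    """T = K / (2r); the factor 2 reflects both LTRs accumulating changes."""
    return distance / (2 * rate)


# Hypothetical pair of aligned 100 bp LTRs differing at 2 sites
ltr5 = "ACGT" * 25
ltr3 = "TCGA" + "ACGT" * 24
k = jukes_cantor_distance(ltr5, ltr3)
# 1.3e-8 substitutions per site per year is an assumed rate, not a universal one
print(f"K = {k:.4f}, T = {insertion_time(k, 1.3e-8):,.0f} years")
```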