At a glance:
The PacBio sequencing platform is a single-molecule real-time (SMRT) sequencing technology developed by Pacific Biosciences. PacBio sequencing features long read lengths, high accuracy, uniform genome coverage, and detection of base modifications during sequencing.
PacBio sequencing is based on the sequencing-by-synthesis principle, with read lengths of up to 30 kb and throughput of up to 20 Gb. It captures sequence information while the target DNA molecule is being replicated. The template, called a SMRTbell, is a closed, single-stranded circular DNA created by ligating hairpin adaptors to both ends of a target double-stranded DNA (dsDNA) molecule.
Workflow of PacBio Sequencing (Hon T et al. 2020)
Single Molecule, Real-Time sequencing, which we abbreviate as SMRT sequencing, is a technology introduced by Pacific Biosciences of California, Inc. (PacBio).
SMRT sequencing uses dNTPs labeled with four different fluorescent dyes and zero-mode waveguide (ZMW) wells to sequence a single DNA molecule. In each ZMW well, a single DNA template is bound to a primer and a DNA polymerase, and the complex is immobilized at the bottom of the well. When the labeled dNTPs are added and DNA synthesis begins, a dNTP that base-pairs with the template stays at the bottom of the ZMW longer than unbound nucleotides. On excitation it emits the fluorescent signal of its dye, which is recorded as a characteristic pulse. Because the fluorophore is attached to the phosphate group of the dNTP, it is cleaved off when the nucleotide is incorporated, so it does not interfere with the next incorporation and detection can continue without interruption. Combined with a high-resolution optical detection system, this allows synthesis to be followed in real time at a rate of about 3 bases per second.
HiFi reads, short for High Fidelity long reads, are produced in Circular Consensus Sequencing (CCS) mode and combine long read length (10-20 kb) with high accuracy (>99%).
PacBio HiFi reads are currently regarded as a benchmark data type for a variety of genomic applications. In this sequencing mode, the polymerase read length is typically greater than the insert length, so the polymerase travels around the circular template repeatedly (rolling-circle replication) and the insert is sequenced multiple times. Because the errors made in any single pass are random, the consensus algorithm can correct them, resulting in highly accurate HiFi reads.
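The consensus idea described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: errors are modeled as random substitutions at a hypothetical 12% rate, all passes are already aligned, and a plain per-position majority vote stands in for the consensus step. The actual CCS algorithm handles insertions and deletions and uses a statistical model, so none of the names or numbers below come from PacBio software.

```python
import random
from collections import Counter

BASES = "ACGT"

def simulate_pass(template, error_rate, rng):
    """Return one noisy copy of the template with random substitution errors."""
    noisy = []
    for base in template:
        if rng.random() < error_rate:
            # Replace the true base with one of the three other bases.
            noisy.append(rng.choice([b for b in BASES if b != base]))
        else:
            noisy.append(base)
    return "".join(noisy)

def majority_consensus(passes):
    """Per-position majority vote across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

def accuracy(seq, truth):
    """Fraction of positions at which seq matches truth."""
    return sum(a == b for a, b in zip(seq, truth)) / len(truth)

rng = random.Random(0)  # fixed seed so the run is reproducible
template = "".join(rng.choice(BASES) for _ in range(2000))

# Nine passes over the same insert, each with an assumed 12% error rate.
passes = [simulate_pass(template, 0.12, rng) for _ in range(9)]

single_acc = accuracy(passes[0], template)
consensus_acc = accuracy(majority_consensus(passes), template)
print(f"single pass accuracy: {single_acc:.4f}")
print(f"9-pass consensus accuracy: {consensus_acc:.4f}")
```

Because the simulated errors are independent between passes, a wrong base rarely wins the vote at any given position, which is why the consensus accuracy ends up far above the single-pass accuracy.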
Although both are long-read sequencing technologies, Nanopore and PacBio differ in many respects. Nanopore reads are much longer than PacBio reads: they can reach 330 kbp, and one report describes reads exceeding 2 Mb. Yield per cell is 245 Gb. Nanopore sequencing can be used for both DNA and RNA (without reverse transcription), and it can read methylated bases and other modifications directly.
|Feature||PacBio||Nanopore|
|Principle of sequencing||Sequencing by synthesis/DNA polymerase||Electronic signal sequencing/exonuclease|
|Read length||10-15 kb, up to 20 kb||10-100 kb, up to 4 Mb|
|Read accuracy (%)||88–90/99.9 (CCS)||96–99|
|Bases per sample||20-30 Gb||6 Gb|
|Advantages||Long average read length; no amplification of sequencing fragments; more accurate in isoform discovery||No amplification of sequencing fragments; powerful in expression level quantification|
Length distributions and mappability of reads (Cui J et al. 2020)
PacBio sequencing confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization.
HiFi reads remove the need to choose between read length and accuracy. They are already widely used for full-length 16S amplicon sequencing and for assembling metagenome contigs.
This enables better assembly of metagenomic data, and DNA modification information can be detected directly. During base synthesis, each of the four dye-labeled dNTPs releases a specific fluorescent signal that is recorded as a specific pulse. When a base carries a modification, the time interval between the pulses of two adjacent bases (interpulse duration, IPD) changes, and the type of base modification can be inferred from the IPD value.
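One simple way to picture IPD-based detection is to compare the mean IPD at each template position in a native sample with the mean IPD at the same position in an unmodified control. The sketch below does exactly that. The function names, the threshold of 2.0, and the IPD values are all made up for illustration; production pipelines rely on statistical models rather than a fixed cutoff.

```python
from statistics import mean

def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean native IPD to mean control IPD.

    Each argument is a list with one entry per template position; each entry
    is a list of IPD measurements (in seconds) observed at that position.
    """
    return [mean(n) / mean(c) for n, c in zip(native_ipds, control_ipds)]

def flag_modified(ratios, threshold=2.0):
    """Return positions whose IPD ratio meets the (hypothetical) threshold."""
    return [pos for pos, ratio in enumerate(ratios) if ratio >= threshold]

# Illustrative values only: position 1 shows a much longer pause in the
# native sample than in the control, as a modified base might.
native = [[0.21, 0.19, 0.20], [0.95, 1.10, 1.05], [0.18, 0.22, 0.20]]
control = [[0.20, 0.20, 0.20], [0.21, 0.19, 0.20], [0.19, 0.21, 0.20]]

ratios = ipd_ratios(native, control)
flagged = flag_modified(ratios)
print(flagged)  # [1]
```

A ratio near 1 means the polymerase paused about as long as it does on unmodified DNA, while a clearly elevated ratio marks a candidate modified position.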
When the Sequel system is used to sequence a human genome, 30 kb fragment libraries are built and run for 10 h. The average read length is about 10-18 kb, more than half of the reads are longer than 20 kb, and the longest can reach 60 kb, which is sufficient to span most gene structures and to achieve high-quality assembly.
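Read-length figures like these are easier to compare when it is clear which statistic is meant, since the mean, the share of reads above a cutoff, and the N50 (the length at which reads that long or longer hold at least half of all bases) can differ considerably for the same run. The following sketch computes all three for a small list of invented read lengths; the helper names are hypothetical and the numbers are not measurements from any instrument.

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

def fraction_at_least(lengths, cutoff):
    """Fraction of reads whose length is at least `cutoff`."""
    return sum(length >= cutoff for length in lengths) / len(lengths)

# Invented read lengths in bases, for illustration only.
reads = [5_000, 12_000, 18_000, 22_000, 25_000, 31_000, 60_000]

mean_length = sum(reads) / len(reads)
print(f"mean length: {mean_length:.0f}")
print(f"N50: {n50(reads)}")  # 31000
print(f"fraction >= 20 kb: {fraction_at_least(reads, 20_000):.3f}")
```

In this toy set a single 60 kb read pulls the N50 well above the mean, which shows why a long tail of reads matters more for assembly than the average suggests.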
Errors in PacBio sequencing are random and independent of sequence length and sequence composition. Because of this randomness, single-molecule sequencing is only about 87.5% accurate at 1X, but as shown in the figure, the consensus accuracy increases with sequencing depth. At a depth of 30X the accuracy reaches Q50 (99.999%), and very accurate results can be achieved at 80X. By contrast, each reaction in first- and next-generation sequencing already reports the averaged signal of N molecules reacting simultaneously.
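The Q values quoted here follow the standard Phred scale, where Q = -10 · log10(error probability). A short sketch of the conversion, using function names chosen for this example, shows that Q50 corresponds to 99.999% accuracy and that 87.5% single-pass accuracy is roughly Q9.

```python
import math

def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)

def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy."""
    return 1 - 10 ** (-q / 10)

q_single = accuracy_to_phred(0.875)       # single-pass accuracy from the text
q_consensus = accuracy_to_phred(0.99999)  # consensus accuracy from the text

print(f"87.5% accuracy   = Q{q_single:.1f}")
print(f"99.999% accuracy = Q{q_consensus:.1f}")
print(f"Q50 accuracy     = {phred_to_accuracy(50):.5%}")
```

Each 10-point step on the Phred scale is a tenfold drop in error rate, so moving from about Q9 to Q50 means roughly four orders of magnitude fewer errors.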
The complementary strengths of PacBio HiFi and ONT ultra-long reads have improved the contiguity and accuracy of genome assembly, laying a solid foundation for subsequent research on genome evolution, structural variation and gene function.
In particular, telomere-to-telomere (T2T) genome assemblies have been completed for more and more species, heralding a coming surge of T2T genomes. The newly completed regions include all centromeric and telomeric sequences, and for the first time these complex regions of the genome can be used for variation and function studies.
Reliable data covering large portions of the genome, obtained by PacBio HiFi sequencing, facilitates deep mining of genomic variants such as structural variants (SVs, genomic variants of 50 to 1,000 bp), as well as their classification and resolution. It also supports the development of algorithms for genotyping SVs from short-read data and helps correct errors in the existing understanding of SVs.
Built on single-molecule real-time (SMRT) sequencing, Iso-Seq uses ultra-long reads to span the complete transcript from the 5' end to the 3' poly(A) tail, yielding high-quality full-length transcript sequences without fragmenting the RNA molecules. Iso-Seq has been found to offer significant advantages in identifying novel transcripts, alternative splicing events and fusion genes.
Another outstanding advantage of third-generation sequencing is the ability to read out base modifications (e.g. methylation) directly. The PacBio sequencing platform now supports direct detection of 5mC methylation at CpG sites in DNA samples by HiFi sequencing. As a result, both accurate sequence and methylation information can be obtained from the genome, and a genome-wide methylation map can be constructed without additional experimental processing.
Most current techniques for microbiome strain typing, such as 16S rRNA sequencing or short-read sequencing, often provide insufficient resolution: a microbe may only be classified as part of a broader genetic family rather than identified as a distinct species. PacBio SMRT sequencing, which captures ultra-long reads while also directly detecting base modifications, can help scientists identify microbial strains. The base-modification information supports high-resolution microbiome analysis of individual species and strains.