At a glance:
A pan-genome is the sum of all genomic information within a species. As genomic technology has developed, researchers have found that a single reference genome can no longer meet the needs of genomic data analysis, and pan-genomes are being constructed for more and more species, including humans, in place of a single reference genome.
Pan-genomes reflect structural variation (SV) and polymorphisms in the genome, allowing in-depth comparisons of variation at the species level or at higher taxonomic levels. Pan-genomes have potential applications in crop improvement, evolution and biodiversity research. To fully exploit the value of pan-genomes, a broader range of information such as phenotypic, environmental and expression data needs to be integrated to provide insight into the role of variable regions in the genome.
There is extensive genomic diversity within species, and a pan-genome needs to capture this diversity while removing redundancies to generate a single integrated genome.
Map-to-pan starts with de novo assembly of each individual, aligns each individual's assembled sequences to the reference genome to find the sequences that do not match, and then collects all of the unmatched sequences to build the pan-genome.
Iterative mapping and assembly starts from a single reference genome and then complements it with non-redundant sequences from other individuals to build a pan-genome.
De novo assembly requires the individual genomes to be assembled separately, followed by whole genome comparison.
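The iterative idea of starting from one reference and adding only non-redundant sequence can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the use of shared k-mers as a stand-in for alignment, and the `max_shared` threshold. Real pipelines rely on read mapping or whole-genome alignment rather than exact k-mer matching.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_pan_genome(reference, individuals, k=5, max_shared=0.5):
    """Toy iterative construction (hypothetical helper, not a real tool).

    Starts from the reference and appends a contig only when at most
    `max_shared` of its k-mers are already present in the pan-genome.
    """
    pan = [reference]
    seen = kmers(reference, k)
    for contigs in individuals:
        for contig in contigs:
            contig_kmers = kmers(contig, k)
            if not contig_kmers:
                continue  # contig shorter than k, nothing to compare
            shared = len(contig_kmers & seen) / len(contig_kmers)
            if shared <= max_shared:
                pan.append(contig)    # novel sequence: add it
                seen |= contig_kmers  # later contigs are compared against it too
    return pan


reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
individuals = [
    ["TAGCCGATAGCTAG", "GGGTTTCCCAAAGGGTTT"],       # first contig matches the reference
    ["GGGTTTCCCAAAGGG", "TTTTTGGGGGCCCCCAAAAA"],    # first contig is redundant with individual 1
]
pan = build_pan_genome(reference, individuals)
print(len(pan))  # reference plus two novel contigs
```

Because the set of seen k-mers grows as contigs are added, a sequence contributed by one individual is not added again when it reappears in a later one, which is the redundancy removal the text describes.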
Pan-genomic analysis clusters genes by their presence or absence in each individual and usually divides them into three categories: core genes, present in all individuals (plant or animal strains); dispensable genes, present in more than one but not all strains; and private genes, present in only one strain. The core part is therefore shared by every individual, while the dispensable part is found in only a subset of individuals.
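The classification can be sketched from a simple presence/absence table. This is a minimal illustration, assuming that "dispensable" means present in more than one but not all individuals; the function name, gene names and sample names are hypothetical.

```python
def classify_genes(presence, individuals):
    """Label each gene as core, dispensable or private.

    presence:    dict mapping gene name -> set of individuals carrying it
    individuals: collection of all individuals in the pan-genome
    """
    total = len(set(individuals))
    classes = {}
    for gene, carriers in presence.items():
        if len(carriers) == total:
            classes[gene] = "core"         # present in every individual
        elif len(carriers) == 1:
            classes[gene] = "private"      # present in exactly one individual
        else:
            classes[gene] = "dispensable"  # present in some, but not all
    return classes


individuals = ["s1", "s2", "s3"]
presence = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s3"},
    "geneC": {"s2"},
}
result = classify_genes(presence, individuals)
print(result)
```

In this example `geneA` is core, `geneB` is dispensable and `geneC` is private.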
Pan-genomic analysis helps to understand the characteristics of species, while the complex genomic variation provided by pan-genome mapping helps to resolve the diversity of crop phenotypes and agronomic traits.
Application of pan-genome in crop improvement (Della Coletta R et al., 2021)
The reduced cost of Illumina sequencing and improvements in assembly algorithms have facilitated the use of low-cost short-read data to assemble crop genomes (e.g., maize, rice and soybean). While this approach has generated highly complete and contiguous assemblies of low-copy gene regions, the more repetitive, TE-rich regions of the genome have proven difficult to assemble with short reads, resulting in large gaps and partial assemblies in these regions. Recently, the maturation of long-read sequencing technologies, especially PacBio HiFi sequencing, has facilitated more contiguous and complete assemblies of crop genomes and, in some cases, multiple long-read-based assemblies within a single species. Advances in PacBio pangenome sequencing technologies are described below.
Impact of sequencing technology on polyploid assembly (Della Coletta R et al., 2021)
Nowadays, pan-genome construction generally uses third-generation long-read sequencing to assemble multiple samples of a population de novo. The two third-generation platforms in common use are PacBio's HiFi sequencing and ONT's Nanopore sequencing. HiFi sequencing combines long read length with very high accuracy, which makes it well suited to de novo genome assembly.
The higher accuracy of HiFi reads allows assembly algorithms to extend contigs with high confidence through more repeats, even at shorter read lengths, and into centromere-flanking regions, improving the completeness of centromeric and telomeric regions.
High-quality genome assembly in polyploid species has been difficult to achieve because these genomes contain multiple closely related subgenomes, which makes it hard to distinguish homeologous sequences and to build non-mosaic subgenome scaffolds. Long-read sequencing with low error rates (e.g., PacBio HiFi reads) has enabled high-quality polyploid genome assembly, with recent assemblies containing fewer gaps and resolved homeologous scaffolds. As polyploid pangenomes of more species are revealed, more novel structural variants and markers are likely to be observed.