Description
This track shows short variants (up to 50 bp) identified by aligning the
HPRC genomes to the hg38 reference assembly. The alignment was made with the
Minigraph-Cactus approach described in the references below.
There are three subtracks in this superTrack:
- All short variants up to 50 bp, without any further length filter
- All short variants <= 3 bp long
- All short variants > 3 bp long
VCF Decomposition, from the HPRC Pangenome Resources GitHub:
"The Raw VCF files contain a site for each bubble in the graph. Nested bubbles will result in
overlapping sites. The nesting relationships are denoted with the PS (parent snarl), LV (level) and
AT (allele traversal) tags and need to be taken into account when interpreting the VCF.
Alternatively, you can use the 'Decomposed VCFs' which have been normalized by using
vcfbub to 'pop' bubbles with alleles larger than 100k and vcfwave to realign each alt
(script). Note that in order to reproduce the PanGenie analyses from the papers, you should instead
use the PanGenie HPRC Workflow. This workflow has a CHM13 branch to use when working with that
reference.
The exact tools and commands used to produce the VCFs are given
here."
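The nesting tags mentioned in the quote above can be read directly from the INFO column of a raw VCF record. The following is a minimal sketch, not the HPRC tooling: the helper names and the two example records are hypothetical, and it assumes that LV=0 (or a missing LV tag) marks a top-level site and that PS holds the ID of the enclosing site.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'LV=1;PS=>1>9' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        # Split on the first '=' only, because PS values contain '>' and '<'.
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def is_top_level(info: str) -> bool:
    """Assumption: a site is top-level when LV is 0 or absent."""
    return int(parse_info(info).get("LV", 0)) == 0


# Hypothetical records: (site ID, INFO column)
records = [
    (">1>9", "AC=2;LV=0"),
    (">3>6", "AC=1;LV=1;PS=>1>9"),
]

top_level = [site for site, info in records if is_top_level(info)]
# Map each nested site to its parent snarl.
nested = {
    site: parse_info(info)["PS"]
    for site, info in records
    if not is_top_level(info)
}
```

Keeping only top-level sites is one simple way to avoid counting overlapping nested sites twice; the decomposed VCFs shown in this track have already been flattened by vcfbub, so this step matters mainly when working with the raw files.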
Display Conventions and Configuration
The Name of each item is the pair of node labels that denote the site's location
in the graph, with '>' and '<' denoting the forward and reverse
orientation of the node, respectively. Mouseover on items in "squish" and "pack" modes shows the
item's Name and Genotypes. Mouseover on items in "full" mode shows the Alleles.
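As an illustration of the naming convention described above, an item Name can be split into its two oriented node labels. This is a minimal sketch under the assumption that node labels are numeric IDs; the function name and the example names are hypothetical and not taken from the track data.

```python
import re

# One oriented node label: '>' (forward) or '<' (reverse) followed by a node ID.
STEP = re.compile(r"([<>])(\d+)")


def parse_site_name(name: str):
    """Return [(node_id, orientation), (node_id, orientation)] for a site name."""
    steps = STEP.findall(name)
    # Require exactly two labels that together account for the whole name.
    if len(steps) != 2 or "".join(o + n for o, n in steps) != name:
        raise ValueError(f"not a pair of oriented node labels: {name!r}")
    return [(int(n), "forward" if o == ">" else "reverse") for o, n in steps]


start, end = parse_site_name(">12>15")
```

For a name such as "<20>18", the first node would be reported in reverse orientation and the second in forward orientation.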
Methods
The Minigraph-Cactus HPRC v1.0 graph was converted to VCF using vg deconstruct.
This result was further postprocessed using vcfbub to flatten nested sites, then
vcfwave to normalize by realigning alt alleles to the reference. All steps are
described in Hickey et al. 2023. The postprocessing command lines and data can be found on
GitHub.
Finally, the resulting VCF was split by variant length into two VCFs, using a cutoff of 3 bp.
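The length split can be sketched as follows. This page does not state how variant length was measured, so the sketch assumes it is the longer of the REF and ALT alleles; the function names and example alleles are hypothetical, and the actual filtering commands may differ.

```python
def variant_length(ref: str, alt: str) -> int:
    """Assumption: length is the longer of the REF and ALT allele strings."""
    return max(len(ref), len(alt))


def subtrack_for(ref: str, alt: str, cutoff: int = 3) -> str:
    """Assign a variant to the '<= 3 bp' or '> 3 bp' subtrack."""
    return "<= 3 bp" if variant_length(ref, alt) <= cutoff else "> 3 bp"


# Hypothetical alleles: a SNV, a 2 bp deletion, and a 4 bp insertion.
examples = [("A", "G"), ("ACG", "A"), ("A", "ATTTT")]
assignments = [subtrack_for(ref, alt) for ref, alt in examples]
```

Under this assumption, a variant whose longer allele is exactly 3 bp falls in the "<= 3 bp" subtrack.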
Credits
Thanks to Glenn Hickey for providing the HAL file from the HPRC project and for making these VCFs from it.
References
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q,
Xie D, Feng S, Stiller J et al.
Progressive Cactus is a multiple-genome aligner for the thousand-genome era.
Nature. 2020 Nov;587(7833):246-251.
PMID: 33177663;
PMC: PMC7673649;
DOI: 10.1038/s41586-020-2871-y
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y,
Human Pangenome Reference Consortium, Marschall T, Li H, Paten B.
Pangenome graph construction from genome alignments with Minigraph-Cactus.
Nat Biotechnol. 2023 May 10.
PMID: 37165083;
DOI: 10.1038/s41587-023-01793-w
Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D.
Cactus: Algorithms for genome multiple sequence alignment.
Genome Res. 2011 Sep;21(9):1512-28.
PMID: 21665927;
PMC: PMC3166836;
DOI: 10.1101/gr.123356.111
Liao WW, Asri M, Ebler J et al., Li H, Paten B.
A draft human pangenome reference.
Nature. 2023 May;617(7960):312-324.
PMID: 37165242;
PMC: PMC1017212;
DOI: 10.1038/s41586-023-05896-x