Description
This track shows the alignment of three different mRNA vaccine sequences
to the SARS-CoV-2 genome:
- The BioNTech/Pfizer BNT-162b2 sequence as published by the World Health Organization
- The reconstructed BioNTech/Pfizer BNT-162b2 RNA as sequenced by the
Andrew Fire lab, Stanford University School of Medicine
- The Moderna mRNA-1273 sequence as sequenced by the
Andrew Fire lab, Stanford University School of Medicine
Note that the actual vaccines are synthesized with N1-methyl-pseudouridine
(Ψ) in place of uridine. See the paper by Hubert in References for
a discussion.
Display Conventions and Configuration
The psl output from blat was converted to a bigPsl
format file for display in this track. Depending on how much of the
genome is in view, the track draws black where
nucleotides are identical between the vaccine sequence and the SARS-CoV-2
sequence, and red lines where the nucleotides differ. When a smaller
section of the genome is in view, setting
Color track by codons or bases: to different mRNA bases
shows the nucleotides in the vaccine that differ from the
SARS-CoV-2 sequence.
Methods
The mRNA sequences were obtained from the MS Word documents
mentioned in the References below. The
Andrew Fire lab
GitHub repository supplied the
FASTA sequencing results for
the BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273 samples.
The PSL alignment file was obtained via the UCSC Genome Browser
blat service with parameters -t=dnax -q=rnax. The result was filtered
with a minimum score of 1000 to exclude the polyA match:
gfClient -maxIntron=10 -t=dnax -q=rnax <host> <port> \
/gbdb/wuhCor1 threeVaccines.fa stdout \
| pslFilter -minScore=1000 stdin wuhCor1.vaccines.psl
pslScore wuhCor1.vaccines.psl
#tName tStart tEnd qName:qStart-qEnd score percentIdent
NC_045512v2 21559 25384 ModernaMrna1273:54-3879 1419 68.60
NC_045512v2 21559 25384 ReconstructedBNT162b2:51-3876 1701 72.30
NC_045512v2 21559 25384 WHO_BNT162b2:51-3876 1701 72.30
faCount threeVaccines.fa | tawk '{print $1,"1.."$2+1}' \
| head -4 | tail -3 > threeVaccines.cds
pslToBigPsl -cds=threeVaccines.cds -fa=threeVaccines.fa wuhCor1.vaccines.psl stdout \
| sort -k1,1 -k2,2n > wuhCor1.vaccines.bigPsl
bedToBigBed -type=bed12+13 -tab -as=$HOME/kent/src/hg/lib/bigPsl.as \
wuhCor1.vaccines.bigPsl wuhCor1.chrom.sizes wuhCor1.vaccines.bb
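The threeVaccines.cds step above can be illustrated with plain awk. This is only a minimal sketch: it assumes that tawk behaves like awk with tab as the input and output separator, and that faCount prints a header line, one row per sequence, and a final total row. The helper name make_cds and the sequence names and lengths in the sample input are placeholders, not the real values.

```shell
# Sketch of the CDS-range step, using plain awk in place of tawk
# (assumption: tawk is awk with tab as the input and output separator).
make_cds() {
  # Read faCount-style rows on stdin: column 1 = name, column 2 = length.
  # Emit "name<TAB>1..(length+1)", then keep lines 2-4, which drops the
  # header line and the trailing "total" row when there are three sequences.
  awk -F'\t' -v OFS='\t' '{print $1, "1.." $2+1}' | head -4 | tail -3
}

# Placeholder input; the names and lengths are hypothetical.
printf '#seq\tlen\nseqA\t100\nseqB\t200\nseqC\t300\ntotal\t600\n' | make_cds
```

Each output line pairs a sequence name with a range that starts at 1 and ends at the sequence length plus one, which is the form passed to pslToBigPsl through -cds.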
Data Access
The fasta file sequences and psl alignment file can be obtained from
our download server at:
https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/vaccines/.
The bigPsl alignment file used for the display of this track
in the genome browser can be accessed from
https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb.
The file can be read with the kent command-line tool bigBedToBed,
which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and
binaries can be found
here.
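As a usage sketch, bigBedToBed can read the bigPsl file from the URL given above and write tab-separated text. The function names below are hypothetical, the first four columns are assumed to be the standard BED fields (chrom, chromStart, chromEnd, name), and the download step only runs if bigBedToBed is found on the PATH.

```shell
BB_URL=https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb

# Hypothetical helper: dump the bigPsl records as tab-separated text.
fetch_vaccine_alignments() {
  bigBedToBed "$BB_URL" stdout
}

# Hypothetical helper: print name, chromosome, start and end for each record.
summarize_alignments() {
  awk -F'\t' -v OFS='\t' '{print $4, $1, $2, $3}'
}

# Only attempt the download when the kent tool is installed.
if command -v bigBedToBed >/dev/null 2>&1; then
  fetch_vaccine_alignments | summarize_alignments
fi
```

The remaining columns of each record carry the bigPsl-specific fields and are left untouched here.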
The protein encoded by the three sequences has two amino acid substitutions
compared to the SARS-CoV-2 S glycoprotein: S:K986P and S:V987P.
See also:
The tiny tweak behind COVID-19 vaccines.
>BNT162b2
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFD
NPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY
SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRV
QPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGS
NVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGF
NFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAG
TITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV
DFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYTZZ
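The two substitutions can be checked directly against the protein sequence above. This is a minimal sketch: the helper name residues_at and the file name BNT162b2.protein.fa are hypothetical, and a single-record FASTA file is assumed.

```shell
# Hypothetical helper: print LENGTH residues starting at 1-based position
# START from a single-record FASTA read on stdin.
residues_at() {
  awk -v start="$1" -v len="$2" '
    !/^>/ { seq = seq $0 }                   # join sequence lines, skip the header
    END   { print substr(seq, start, len) }  # awk substr is 1-based
  '
}

# Hypothetical file holding the >BNT162b2 record shown above.
if [ -f BNT162b2.protein.fa ]; then
  residues_at 986 2 < BNT162b2.protein.fa
fi
```

For the sequence listed above, positions 986 and 987 are both P, which matches S:K986P and S:V987P.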
References
Dae Eun Jeong, Matthew McCoy, Karen Artiles, Orkan Ilbay, Andrew Fire, Kari Nadeau, Helen Park, Brooke Betts, Scott Boyd, Ramona Hoh, and Massa Shoura
Assemblies of putative SARS-CoV2-spike-encoding mRNA sequences for vaccines BNT-162b2 and mRNA-1273
obtained from GitHub
Bert Hubert
Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine
25 Dec 2020
Wikipedia
Pfizer-BioNTech COVID-19 vaccine
World Health Organization MedNet
Messenger RNA encoding the full-length SARS-CoV-2 spike glycoprotein,
Sept. 2020, document 11889
Cyril Le Nouën, Peter L. Collins, and Ursula J. Buchholz
Attenuation of Human Respiratory Viruses by Synonymous Genome Recoding,
Frontiers in Immunology 2019; 10: 1250. PMID: 31231383
Ryan Cross
The tiny tweak behind COVID-19 vaccines,
Chemical & Engineering News 29 September 2020 Vol 98, issue 38
Credits
Thank you to the Andrew Fire lab, Stanford University School of Medicine
for providing the sequencing data of these vaccines.
The presentation of this track was prepared by Hiram Clawson (hclawson@ucsc.edu).