Description
This track shows deletions that have been found in the sequences uploaded to the GISAID database as of June 6, 2020.
Three confidence levels of deletion calls are shown:
- deletions found in at least 1 GISAID sequence
- deletions found in at least 2 GISAID sequences
- deletions found in at least 2 GISAID sequences that could be validated with raw reads
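The three levels are nested: any deletion in a stricter level also meets the looser ones. A minimal sketch of that rule follows, assuming each deletion call carries a count of supporting GISAID sequences and a flag for raw-read validation. The function name, argument names, and tier labels are hypothetical and are not taken from the track's data files.

```python
def confidence_tiers(n_sequences, validated_with_reads):
    """Return every confidence level a deletion call satisfies.

    n_sequences: number of GISAID sequences carrying the deletion (assumed field).
    validated_with_reads: True if raw reads confirmed the deletion (assumed field).
    """
    tiers = []
    if n_sequences >= 1:
        tiers.append("at_least_1_sequence")
    if n_sequences >= 2:
        tiers.append("at_least_2_sequences")
        # The strictest level requires both recurrence and raw-read support.
        if validated_with_reads:
            tiers.append("at_least_2_sequences_validated")
    return tiers
```

Under this sketch, a deletion seen in a single sequence appears only in the first level, even if raw reads support it.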
Methods
We accessed all GISAID SARS-CoV-2 sequences on June 6, 2020. We filtered to
high-coverage sequences spanning the entire SARS-CoV-2 genome (>= 29,000 bp),
leaving 12,403 sequences.
We aligned these sequences using MAFFT.
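The length filter described above can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes the sequences are available as FASTA text, the helper names are hypothetical, and it does not reproduce the high-coverage criterion, which depends on GISAID metadata not described here.

```python
MIN_GENOME_LENGTH = 29000  # threshold stated in the Methods text


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_full_length(records, min_length=MIN_GENOME_LENGTH):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

The retained records would then be written back to a FASTA file and passed to MAFFT for multiple alignment. The MAFFT options used for this track are not stated in the text.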
Verification
We validated several deletions with the raw reads from NCBI's SRA Run browser.
Additionally, NYU Langone Health provided us with the aligned reads for many of
their sequences.
Data Access
The raw data can be explored interactively with the Table Browser, combined
with other datasets in the Data Integrator tool, or downloaded directly as
"microdel.txt.gz" from the download server.
Please refer to our mailing list archives for questions, or our Data Access FAQ
for more information.
Credits
We thank all of the labs that submitted their sequences to the GISAID database.
The full acknowledgement table can be found at
https://github.com/briannachrisman/SARS-CoV-2_Microdeletions/blob/master/acknowledgments.pdf.
We thank the public health laboratories VIDRL and MDU-PHL at The Peter Doherty Institute for
Infection and Immunity for providing over 1000 high-quality raw reads to NCBI.
We thank Matthew T Maurano, Matija Snuderl, and Adriana Heguy of the NYU Langone SARS-CoV2
Sequencing Team for providing many of their raw reads.
References
Chrisman, Brianna Sierra, Kelley Paskov, Nate Stockham, Kevin Tabatabaei, Jae-Yoon Jung, Peter Washington, Maya Varma, Min Woo Sun, Sepideh Maleki, and Dennis P. Wall. "Indels in SARS-CoV-2 occur at template-switching hotspots." BioData Mining 14, no. 1 (2021): 1-16. https://doi.org/10.1186/s13040-021-00251-0