OR39 De novo assembly of the major histocompatibility complex using single-molecule real-time sequencing of large contiguous DNA fragments captured by targeted region specific extraction
Journal article, peer reviewed

Peter M. Clark, Mark Kunkel, Hilary Mehler and Dimitri Monos
Human Immunology, vol. 76, p. 32
Oct 2015

Abstract

We aimed to use our region-specific extraction (RSE) targeted DNA capture methodology to generate large, contiguous DNA fragments (5–60 kbp) from the MHC for sequencing on the PacBio RS II single-molecule real-time (SMRT) sequencing platform, producing long sequenced reads (10–15 kbp) for de novo assembly and characterization of the MHC. This combination of technologies produces long sequenced reads (up to 60 kbp) that may eventually enable the construction of large, phased haplotype blocks and haplotype-resolved de novo assembly of the MHC.

Genomic DNA from the homozygous cell line COX, which has a fully characterized MHC haplotype, was enriched for 4 Mbp of the MHC (chr6:29618227–33618227) using the RSE DNA capture methodology [1]. DNA fragment lengths were measured on a Bioanalyzer before sequencing. SMRTbell DNA libraries were constructed according to the PacBio standard protocol "20kb Template Preparation Using BluePippin Size-selection system" and sequenced on the PacBio RS II instrument with P6-C4 chemistry. Contigs were assembled with the HGAP 2 de novo assembly algorithm in PacBio SMRT Portal and evaluated with QUAST, using the COX haplotype sequence as the reference.

The captured MHC DNA fragments averaged ∼12 kbp (range ∼5–60 kbp). After PacBio RS II sequencing, the average read length was ∼3.5 kbp, with some reads as long as 60 kbp. We were able to de novo assemble 91% of the targeted region with 99.99% accuracy. The assembly N50 and NG50 were 33,234 bp and 92,824 bp, respectively, and the largest contig aligned to the COX reference was ∼200 kbp.

Our targeted resequencing and de novo assembly approach provides a comprehensive method for characterizing 4 Mbp of the human MHC.
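The N50 and NG50 values above differ only in the total they are measured against: N50 uses half of the summed contig length, whereas NG50 uses half of the reference length (in this study, the COX haplotype sequence). A minimal Python sketch of both definitions follows; the function names are illustrative, and the sketch ignores any contig filtering or alignment that a tool such as QUAST may apply, so it is not a re-implementation of the evaluation used here.

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], threshold: float) -> Optional[int]:
    """Length L such that contigs of length >= L sum to at least `threshold`.

    Returns None when all contigs together never reach the threshold.
    """
    covered = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        covered += length
        if covered >= threshold:
            return length
    return None


def n50(lengths: Sequence[int]) -> Optional[int]:
    # N50: threshold is half of the total assembly length.
    return nx(lengths, 0.5 * sum(lengths))


def ng50(lengths: Sequence[int], reference_length: int) -> Optional[int]:
    # NG50: threshold is half of the reference length instead.
    return nx(lengths, 0.5 * reference_length)


if __name__ == "__main__":
    contigs = [8, 6, 4, 2]  # toy values, not data from this study
    print(n50(contigs))       # 6
    print(ng50(contigs, 30))  # 4
    print(ng50(contigs, 12))  # 8
```

With the toy contig lengths [8, 6, 4, 2] (total 20), N50 is 6, while NG50 is 4 for a reference of length 30 and 8 for a reference of length 12. In general, NG50 can exceed N50 only when the summed contig length is larger than the reference length.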
We demonstrate de novo assembly and full characterization of 91% of the targeted MHC in the homozygous cell line COX, with 99.9% accuracy compared with the annotated COX haplotype reference sequence.
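The 91% figure is the share of the 4 Mbp target covered by assembled sequence. The abstract does not state exactly how this coverage was computed; one plausible approach, similar in spirit to QUAST's genome-fraction metric, is to merge the reference intervals hit by contig alignments and divide by the target length. The sketch below assumes alignments are already available as half-open (start, end) chr6 coordinates and treats chr6:29618227–33618227 as the half-open interval [29618227, 33618227), exactly 4,000,000 bp; the coordinate convention, the constants, and the function name `genome_fraction` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: the target is the half-open interval [start, end) on chr6.
TARGET_START = 29_618_227
TARGET_END = 33_618_227  # TARGET_END - TARGET_START == 4_000_000


def genome_fraction(alignments: List[Tuple[int, int]],
                    start: int = TARGET_START,
                    end: int = TARGET_END) -> float:
    """Fraction of [start, end) covered by at least one half-open alignment."""
    # Clip every alignment to the target and drop those falling outside it.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in alignments
        if min(e, end) > max(s, start)
    )
    covered = 0
    run_start: Optional[int] = None  # current run of merged intervals
    run_end: Optional[int] = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Gap before this interval: close the previous run, open a new one.
            if run_end is not None and run_start is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            # Overlapping or adjacent interval: extend the current run.
            run_end = max(run_end, e)
    if run_end is not None and run_start is not None:
        covered += run_end - run_start
    return covered / (end - start)


if __name__ == "__main__":
    # Hypothetical alignments: two overlapping blocks covering 3 Mbp in total.
    hits = [(TARGET_START, TARGET_START + 2_000_000),
            (TARGET_START + 1_000_000, TARGET_START + 3_000_000)]
    print(genome_fraction(hits))  # 0.75
```

This measures coverage only; the reported accuracy against the COX reference would require base-level comparison of the aligned sequence, which the sketch does not attempt.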

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Immunology