CleanDeepSeq – Software To Minimize Error in Next-Gen Sequencing
St. Jude Children’s Research Hospital researchers have developed software that cuts the error rate in next-generation sequencing data by as much as 100-fold, an improvement that could speed early detection of cancer relapse and other threats.
The researchers examined next-generation DNA sequencing datasets from St. Jude and four other institutions to identify and suppress common sources of sequencing errors. Using the new procedure, they reported, the error rate for DNA base substitutions fell from 0.1 percent (1 in 1,000) to between 0.01 percent (1 in 10,000) and 0.001 percent (1 in 100,000).
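The percentages above can be checked with a little arithmetic. The sketch below is a minimal illustration that treats the quoted rates as per-base error probabilities and assumes a hypothetical depth of 10,000 reads at a single position. It converts the rates to fractions, computes the fold reduction, and estimates how many reads at one position would show a wrong base purely by error. The constant and function names are illustrative, not part of CleanDeepSeq.

```python
# Error rates quoted in the article, converted from percent to fractions.
BEFORE = 0.1 / 100       # 0.1 percent   -> 1 in 1,000
AFTER_HIGH = 0.01 / 100  # 0.01 percent  -> 1 in 10,000
AFTER_LOW = 0.001 / 100  # 0.001 percent -> 1 in 100,000


def fold_reduction(before: float, after: float) -> float:
    """How many times smaller the new error rate is than the old one."""
    return before / after


def expected_error_reads(depth: int, error_rate: float) -> float:
    """Expected number of reads at one position carrying a wrong base by error alone."""
    return depth * error_rate


if __name__ == "__main__":
    print(f"fold reduction: {fold_reduction(BEFORE, AFTER_HIGH):.0f}x "
          f"to {fold_reduction(BEFORE, AFTER_LOW):.0f}x")
    depth = 10_000  # hypothetical sequencing depth at one position
    for label, rate in [("before", BEFORE), ("after (0.001%)", AFTER_LOW)]:
        print(f"{label}: ~{expected_error_reads(depth, rate):.1f} "
              f"erroneous reads per position")
```

At the old rate, about ten erroneous reads per position would be expected at this depth, as many as a true variant present in 0.1 percent of the sampled DNA would contribute. At 0.001 percent, the expected count drops to about 0.1.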
The researchers hope to give patients a head start on cures by making it easier to distinguish signal from noise, in this case, a true mutation from a sequencing error.
“Early detection of cancer or cancer relapse really is like finding a needle in a haystack,” said co-first and corresponding author Xiaotu Ma, Ph.D., an assistant member of the St. Jude Department of Computational Biology, noting that at an early stage cancer cells are vastly outnumbered by normal cells. “This method, which we have named CleanDeepSeq, will help remove the hay to make it easier to find the needle.”
Sequencing the human genome involves determining the specific order of the 3 billion chemical bases or letters that make up the genome.
Interest in reducing errors and improving data quality has grown as next-generation sequencing costs have fallen. Cancer-driving genes can now be sequenced thousands or even hundreds of thousands of times over to find signs of cancer cells before overt disease appears.
Jinghui Zhang, Ph.D., corresponding and senior author and chair of the St. Jude Department of Computational Biology, said that sequencing errors are a roadblock to discovering the low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing. Zhang added that this study provides the first comprehensive analysis of the origins of such sequencing errors and offers new strategies for improving accuracy.
The study focused on identifying the types and sources of substitution errors in next-generation sequencing data and on developing a mathematical strategy to suppress them. The investigators used a variety of techniques to determine the lowest frequency at which a genuine mutation could be distinguished from a sequencing error. The research involved analyzing datasets from St. Jude, the HudsonAlpha Institute for Biotechnology, the Broad Institute, Baylor College of Medicine, and WuXiNextCODE in China.
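The article does not say which statistics CleanDeepSeq uses to set this detection limit. As a hypothetical illustration only, one common way to frame the question is to model the number of error reads at a position as Binomial(depth, error_rate), assuming each read errs independently, and then ask how many alternate reads are needed before seeing that many by chance becomes less likely than a chosen threshold `alpha`. The function names and the `alpha` default are assumptions made for this sketch.

```python
import math
from typing import Optional


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def min_detectable_alt_reads(depth: int, error_rate: float,
                             alpha: float = 1e-6) -> Optional[int]:
    """Smallest alternate-read count k with P(errors alone give >= k reads) < alpha.

    Hypothetical helper: assumes every read at the position independently
    shows a wrong base with probability `error_rate`.
    """
    pmf = [math.exp(binom_log_pmf(i, depth, error_rate)) for i in range(depth + 1)]
    # Suffix sums: tail[k] = P(X >= k), accumulated from the smallest terms up.
    tail = [0.0] * (depth + 2)
    for i in range(depth, -1, -1):
        tail[i] = tail[i + 1] + pmf[i]
    for k in range(depth + 1):
        if tail[k] < alpha:
            return k
    return None  # depth too low to reach alpha at this error rate


if __name__ == "__main__":
    depth = 10_000  # hypothetical sequencing depth
    for rate in (1e-3, 1e-5):  # 0.1 percent and 0.001 percent
        k = min_detectable_alt_reads(depth, rate)
        print(f"error rate {rate:g}: need >= {k} alt reads "
              f"(~{k / depth:.4%} allele fraction)")
```

Under these assumptions, the lower error rate sharply reduces the number of alternate reads needed, and with it the smallest allele fraction that can be called. Real pipelines also have to handle effects this toy model ignores, such as errors that are not independent across reads.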
The analysis revealed several sources of errors, including the handling and storage of patient samples, the enzymes used to amplify them, and the sequencing process itself. This profiling led Ma and his colleagues to home in on recognizing and suppressing errors related to poor sequencing quality or to difficulty mapping the sequences, that is, aligning reads from the patient genome to a reference genome.
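The article does not detail how CleanDeepSeq recognizes low-quality or poorly mapped reads. A minimal sketch of the general idea, assuming Phred-scaled base and mapping qualities (where Q = -10·log10 of the error probability) and hypothetical thresholds, is to drop low-confidence evidence before counting reference and alternate reads at a position. `BaseCall`, `count_alleles`, and the threshold values are illustrative names and numbers, not CleanDeepSeq's.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class BaseCall:
    """One read's base at a genomic position (hypothetical minimal record)."""
    base: str
    base_quality: int     # Phred-scaled confidence in the base call
    mapping_quality: int  # Phred-scaled confidence in the read's alignment


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def count_alleles(calls: Iterable[BaseCall], ref_base: str,
                  min_base_q: int = 30, min_map_q: int = 40) -> Tuple[int, int]:
    """Count (ref, alt) support, skipping poorly sequenced or poorly mapped reads.

    The default thresholds are illustrative assumptions only.
    """
    ref = alt = 0
    for call in calls:
        if call.base_quality < min_base_q or call.mapping_quality < min_map_q:
            continue  # discard low-confidence evidence rather than count it
        if call.base == ref_base:
            ref += 1
        else:
            alt += 1
    return ref, alt


if __name__ == "__main__":
    calls = [
        BaseCall("A", 35, 60),  # confident reference read
        BaseCall("T", 12, 60),  # alt base with low base quality -> skipped
        BaseCall("T", 38, 10),  # alt base from a poorly mapped read -> skipped
        BaseCall("T", 37, 60),  # confident alternate read
    ]
    print(count_alleles(calls, "A"))  # prints (1, 1)
```

Filtering before counting means a wrong base from a low-quality call or a misaligned read never reaches the variant count, which is the kind of noise the paragraph above describes.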
The researchers are working to bring CleanDeepSeq into clinical practice for monitoring relapse and possibly for early diagnosis, particularly in high-risk patients. Ma said the method may also help scientists studying infectious diseases such as influenza and HIV, or any setting where drug resistance is a concern.
The study was published in a BMC journal under the title “Analysis of error profiles in deep next-generation sequencing data.”