
HEREDITAS (Beijing) ›› 2014, Vol. 36 ›› Issue (6): 618-624. doi: 10.3724/SP.J.1005.2014.0618

• Technique and Method •

Automatic analysis pipeline of next-generation sequencing data

Wenke Li1, Fengyu Li1, 2, Siyao Zhang1, Bin Cai1, Na Zheng1, Yu Nie1, Dao Zhou2, Qian Zhao1   

  1. State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Disease, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China;
  2. College of Biomedical Engineering, South-Central University for Nationalities, Wuhan 430074, China
  • Received: 2013-09-07 Revised: 2014-01-20 Online: 2014-06-20 Published: 2014-05-28

Abstract:

The development of next-generation sequencing has created a high demand for data processing and analysis. Although many software tools exist for analyzing next-generation sequencing data, most are designed for a single function (e.g., alignment, variant calling, or annotation). It is therefore necessary to combine them for data analysis and to generate results that biologists can interpret. We designed a pipeline for processing Illumina sequencing data, based on the Perl programming language and the SGE (Sun Grid Engine) system. The pipeline takes raw sequence data (FASTQ format) as input, calls standard data processing software (e.g., BWA, Samtools, GATK, and Annovar), and outputs a list of annotated variants that researchers can analyze further. Through automation and parallel computation, the pipeline reduces manual operation and improves efficiency. Users can run the pipeline by editing a configuration file or by using the graphical interface. This work will facilitate research projects that use sequencing technology.

Key words: next-generation sequencing, automatic data analysis, pipeline, variation detection
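
The stage chaining described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the published pipeline is Perl-based, and it is not the authors' implementation. The stage names, file names, `humandb/` path, `hg19` build, and the specific tool invocations (`bwa mem`, GATK4-style `HaplotypeCaller`, `table_annovar.pl`) are all assumptions chosen for illustration; the abstract does not state which subcommands or versions were used. The sketch only builds command strings: each stage becomes an SGE job that is held until its predecessor finishes (`qsub -hold_jid`), and because job names include the sample name, different samples can run in parallel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    name: str                      # SGE job name, also used to express dependencies
    command: str                   # shell command written into <name>.sh (hypothetical layout)
    depends_on: List[str] = field(default_factory=list)


def build_stages(sample: str, fq1: str, fq2: str, ref: str) -> List[Stage]:
    """Return the per-sample stages in execution order (illustrative commands only)."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        # Alignment; the read group is added because GATK requires one.
        Stage(f"align_{sample}",
              f"bwa mem -R '@RG\\tID:{sample}\\tSM:{sample}' {ref} {fq1} {fq2} > {sam}"),
        # Coordinate sorting and indexing.
        Stage(f"sort_{sample}",
              f"samtools sort -o {bam} {sam} && samtools index {bam}",
              [f"align_{sample}"]),
        # Variant calling (GATK4-style syntax, an assumption).
        Stage(f"call_{sample}",
              f"gatk HaplotypeCaller -R {ref} -I {bam} -O {vcf}",
              [f"sort_{sample}"]),
        # Annotation (database path and genome build are assumptions).
        Stage(f"annot_{sample}",
              f"table_annovar.pl {vcf} humandb/ -buildver hg19 -out {sample} "
              f"-protocol refGene -operation g -vcfinput",
              [f"call_{sample}"]),
    ]


def to_qsub(stage: Stage) -> str:
    """Build the qsub line that submits <name>.sh, held until its dependencies finish."""
    hold = f" -hold_jid {','.join(stage.depends_on)}" if stage.depends_on else ""
    return f"qsub -N {stage.name}{hold} -cwd {stage.name}.sh"


if __name__ == "__main__":
    for stage in build_stages("sampleA", "sampleA_1.fastq", "sampleA_2.fastq", "ref.fa"):
        print(to_qsub(stage))
```

Steps that a production pipeline would normally include, such as duplicate marking and base quality recalibration, are omitted here to keep the dependency structure visible.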