Hereditas (Beijing) (遗传), 2018, Vol. 40, Issue 11: 1044-1047. doi: 10.16288/j.yczz.18-178

• Resources and Platforms •

GSA: Genome Sequence Archive, a repository for raw omics data

Sisi Zhang (1,2), Tingting Chen (1,2), Junwei Zhu (1,2), Qing Zhou (1,3), Xu Chen (1,2), Yanqing Wang (1,2), Wenming Zhao (1,2,3)

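  1. BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  2. CAS Key Laboratory of Genomics and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China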
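  • Received: 2018-06-29 Revised: 2018-09-06 Online: 2018-11-20 Published: 2018-10-09
  • Corresponding author: Wenming Zhao, E-mail: zhaowm@big.ac.cn
  • About the authors: Sisi Zhang, PhD, engineer; research interest: bioinformatics. E-mail: zhangss@big.ac.cn | Tingting Chen, MSc, engineer; research interest: bioinformatics. E-mail: chentt@big.ac.cn. Sisi Zhang and Tingting Chen are co-first authors.
  • Supported by:
    the National Key R&D Program of China, "National Bioinformatics Platform Supporting Technology" and "Precision Medicine" projects (2017YFC1201200, 2016YFC0901603); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040500, XDA08020102); the National Natural Science Foundation of China (91731304); the Key Technology Talent Program of the Chinese Academy of Sciences; and the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (XXH13505-05)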

GSA: Genome Sequence Archive

Sisi Zhang (1,2), Tingting Chen (1,2), Junwei Zhu (1,2), Qing Zhou (1,3), Xu Chen (1,2), Yanqing Wang (1,2), Wenming Zhao (1,2,3)

  1. BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  2. CAS Key Laboratory of Genomics and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2018-06-29 Revised: 2018-09-06 Online: 2018-11-20 Published: 2018-10-09
  • Contact: Wenming Zhao, E-mail: zhaowm@big.ac.cn
  • Supported by:
    the National Key R&D Program of China (2017YFC1201200, 2016YFC0901603); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040500, XDA08020102); the National Natural Science Foundation of China (91731304); the Key Technology Talent Program of the Chinese Academy of Sciences; and the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (XXH13505-05)

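Abstract (Chinese version):

The life sciences have entered the era of omics big data, yet China has so far not established a public database system for storing such data. To fill this gap, the Genome Sequence Archive (GSA, http://bigd.big.ac.cn/gsa) follows the database construction standards of the International Nucleotide Sequence Database Collaboration (INSDC) and broadly collects raw omics data of all kinds. Since going online at the end of 2015, GSA has been recognized by more than 30 domestic and international journals, including Cell, Nature, PNAS, and GPB; the volume of archived data has grown markedly, and its data services have been well received by researchers in China and abroad. GSA has effectively eased the difficulty of submitting, storing, and sharing omics data in China and has laid a solid foundation for building China's National Bioinformatics Center. This paper describes in detail GSA's current mechanisms for data submission, review, release, and management, so that users can better understand GSA's functions and be served more efficiently.

Keywords (Chinese version): Genome Sequence Archive (GSA), omics big data, data submission, data sharing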

Abstract:

The Genome Sequence Archive (GSA), a new data repository for raw sequence reads in China, has been developed in compliance with the International Nucleotide Sequence Database Collaboration (INSDC) standards. It supports data generated on a variety of sequencing platforms, ranging from Sanger sequencing to single-cell sequencing, and provides free data storage and sharing services to scientific communities worldwide. Since it went online in late 2015, GSA has archived more than 500 TB of data and has been acknowledged by many high-profile journals, including Cell, Nature, PNAS, and GPB. Focusing on omics data submission, storage, and sharing, primarily for Chinese users, GSA supports the initiative to establish the National Bioinformatics Center of China. This paper describes the specifics of GSA, including data collection, curation, management, and exchange, to help users understand and use the database.

Key words: Genome Sequence Archive (GSA), omics data, data submission, data sharing