Hereditas (Beijing) ›› 2008, Vol. 30 ›› Issue (12): 1640-1646. doi: 10.3724/SP.J.1005.2008.01640

• Techniques and Methods •

Comparison of statistical methods for detecting differential expression in microarray data

SHAN Wen-Juan; TONG Chun-Fa; SHI Ji-Sen

  1. The Key Laboratory of Forest Genetics and Gene Engineering of the State Forestry Administration and Jiangsu Province, Nanjing Forestry University, Nanjing 210037, China
  • Received: 2008-03-19 Revised: 2008-08-31 Online: 2008-12-10 Published: 2008-12-10
  • Contact: SHI Ji-Sen

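Summary:

Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared in order to assess how well each method performs on microarray data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithm, SAM and the Wilcoxon rank sum test performed better; in terms of data distribution, detection was good for the normal distribution and poor for the chi-square and exponential distributions. Analysis of the poplar cDNA microarray showed that SAM, Samroc, and the regression model method gave similar results, whereas the Wilcoxon rank sum test differed substantially from them.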

Keywords: poplar (Populus), microarray, differential expression

Abstract:

DNA microarrays are a new tool in biotechnology that allows the expression of thousands of genes in cells to be monitored simultaneously. The goal of differential gene expression analysis is to detect genes whose expression levels change significantly in response to experimental conditions. Although various statistical methods have been proposed for identifying differential gene expression, only a few studies have compared their performance. This paper compares statistical methods for finding differentially expressed genes (DEGs) in microarray data. Using simulated and real datasets (Populus cDNA microarray data), we compared eight methods of identifying differential gene expression. The simulated datasets covered four different distributions (normal, uniform, χ², and exponential). Analysis of the simulated datasets showed that the eight methods performed better on uniformly distributed microarray data than on normally distributed data, and performed poorly on data following the χ² and exponential distributions. Of the eight methods, SAM (Significance Analysis of Microarrays) and the Wilcoxon rank sum test performed well in most cases. Analysis of the real Populus cDNA microarray data showed that SAM, Samroc, and the regression modeling approach gave largely similar results, whereas the Wilcoxon rank sum test differed from them. Among the eight methods, Samroc and the regression modeling approach were similar to each other. For both simulated and real datasets, SAM, Samroc, and the regression modeling approach performed better than the other methods.
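The kind of simulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it covers only one of the eight methods (the Wilcoxon rank sum test, here with a normal approximation and no tie correction), and every numeric setting (number of genes, replicates per group, size of the expression shift, 3 degrees of freedom for the χ² case, the 0.05 threshold) is an assumption chosen for illustration. The function names `rank_sum_z`, `two_sided_p`, and `simulate` are hypothetical.

```python
import math
import random

def rank_sum_z(x, y):
    """Wilcoxon rank sum z statistic for group x versus group y.

    Uses midranks for ties and the normal approximation; the variance
    is NOT tie-corrected (a simplifying assumption of this sketch).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd

def two_sided_p(z):
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# The four distributions named in the abstract; all parameters are assumed.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(-1.0, 1.0),
    # chi-square with 3 degrees of freedom = Gamma(shape=1.5, scale=2)
    "chi-square": lambda rng: rng.gammavariate(1.5, 2.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}

def simulate(dist, n_genes=200, n_deg=20, n_rep=8, shift=2.0,
             alpha=0.05, seed=0):
    """Return (detection rate among true DEGs, false positive rate).

    The first n_deg genes are treated as differentially expressed:
    their second group is shifted upward by `shift`.
    """
    rng = random.Random(seed)
    draw = SAMPLERS[dist]
    hits = false_pos = 0
    for gene in range(n_genes):
        is_deg = gene < n_deg
        a = [draw(rng) for _ in range(n_rep)]
        b = [draw(rng) + (shift if is_deg else 0.0) for _ in range(n_rep)]
        if two_sided_p(rank_sum_z(a, b)) < alpha:
            if is_deg:
                hits += 1
            else:
                false_pos += 1
    return hits / n_deg, false_pos / (n_genes - n_deg)

if __name__ == "__main__":
    for name in SAMPLERS:
        power, fpr = simulate(name)
        print(f"{name:12s} detection rate={power:.2f} false positives={fpr:.2f}")
```

Because the shift is fixed in absolute units while the four distributions have different spreads and shapes, the detection rates from this sketch are not directly comparable to the paper's results; it only shows the general shape of such a comparison.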