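Computational biologists at Carnegie Mellon University have developed an analytical technique for detecting the multiple genetic variations that contribute to complex disease syndromes such as diabetes, asthma and cancer, which are characterized by many clinical and molecular traits.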
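Unlike most conventional methods, which search one at a time for a genetic variation that causes a particular symptom or trait, the Carnegie Mellon scientists use a statistical approach that lets them uncover the genomic variations underlying entire networks of gene regulation or traits that give rise to complex diseases.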
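In a paper published online today in the journal PLoS Genetics, Prof. Eric P. Xing and postdoctoral researcher Seyoung Kim report that their graph-guided fused lasso (GFlasso) method outperforms other methods at detecting genetic variations associated with complex syndromes. In one test, GFlasso successfully detected a genetic variation already known to be implicated in severe asthma, along with two additional variations not previously linked to the disease. Xing and Kim say that further study of these two variations is needed to confirm the association.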
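"We know that some of the most common and most serious diseases afflicting humans are caused not by a single genetic mutation but by a combination of many genetic and environmental factors," said Xing, an associate professor of machine learning, language technologies and computer science. "Complicating matters, most complex diseases have a large number of clinical traits, such as various symptoms, physical characteristics and family history, and genome-wide gene expression profiling can reveal tens of thousands of disease-related molecular traits."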
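Many of these traits are often correlated. High blood pressure and high body weight, for example, may share some of the same genetic factors. Xing said that if one tests each genetic variation against each trait one pair at a time, as conventional methods do, the number of tests becomes extremely large, and information about the genetic factors shared by correlated traits is not properly used, which costs statistical power. "So we are unlikely to uncover the root causes of diseases such as cancer, diabetes and asthma one gene and one trait at a time," he said. "Instead, we need tools such as GFlasso that let us look for correlations between gene networks and clinical traits."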
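Severe asthma, for example, has more than 50 clinical traits, some related to environment or activity level, some to wheezing and chest tightness, and others to lung physiology. In the PLoS Genetics paper, Xing and Kim note that some of these traits are highly correlated with one another, which suggests a shared genetic basis. Their technique exploits these highly correlated traits by analyzing them jointly. The approach also helps detect genetic variations that might otherwise be missed because their effect on any single trait is relatively subtle, yet that matter because they contribute to several related traits.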
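"This approach will provide a more comprehensive genetic and molecular view of complex diseases," Xing said. "We can then discover the genes underlying the disease process, understand the role genes play in determining the severity of disease, and develop improved means of diagnosis."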
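Xing is a member of Carnegie Mellon's Ray and Stephanie Lane Center for Computational Biology. As part of an ongoing study supported by the National Institutes of Health, he is working with colleagues at the University of Pittsburgh School of Medicine and Harvard Medical School to study severe asthma using GFlasso. (Bioon.com)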
Original source recommended by Bioon:
PLoS Genet 5(8): e1000587. doi:10.1371/journal.pgen.1000587
Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network
Seyoung Kim, Eric P. Xing*
School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
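The abstract's description of a multivariate regression with a network-encoded regularizer can be made concrete with a small sketch. The code below is illustrative only and is not the authors' implementation: it builds a trait network by thresholding pairwise correlations, then evaluates a GFlasso-style objective made of a squared-error loss, a lasso penalty, and a fusion penalty that pulls a marker's coefficients for two connected traits toward each other. The function names, the correlation threshold, the |r| edge weighting, and the parameters `lam` and `gamma` are all assumptions chosen for the example, and no optimizer is included.

```python
import numpy as np


def trait_network(Y, threshold=0.5):
    """Build a hypothetical quantitative trait network (QTN).

    Y is an (individuals x traits) matrix. Returns a list of edges
    (m, l, r_ml) for every trait pair whose absolute Pearson
    correlation exceeds `threshold` (an assumed cutoff).
    """
    R = np.corrcoef(Y, rowvar=False)
    K = R.shape[0]
    return [
        (m, l, R[m, l])
        for m in range(K)
        for l in range(m + 1, K)
        if abs(R[m, l]) > threshold
    ]


def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a GFlasso-style objective for coefficient matrix B.

    X: (individuals x markers) genotypes
    Y: (individuals x traits) phenotypes
    B: (markers x traits) regression coefficients
    """
    # Squared-error loss of the multivariate regression Y ~ X B.
    loss = np.sum((Y - X @ B) ** 2)
    # Lasso term: encourages most marker-trait effects to be zero.
    sparsity = lam * np.sum(np.abs(B))
    # Fusion term: for each edge, penalize differences between a
    # marker's effects on the two traits (sign-flipped when the
    # traits are negatively correlated), weighted here by |r|.
    fusion = gamma * sum(
        abs(r) * np.sum(np.abs(B[:, m] - np.sign(r) * B[:, l]))
        for m, l, r in edges
    )
    return loss + sparsity + fusion
```

In this sketch, a marker that affects two correlated traits equally incurs no fusion penalty, whereas a marker that affects only one of them does, which is one way to read the abstract's statement that the method borrows information across correlated phenotypes.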