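A research team led by Hakon Hakonarson at The Children's Hospital of Philadelphia has applied a machine-learning program to genetic markers from genome-wide association studies and predicted the risk of type 1 diabetes more accurately than conventional risk-assessment methods. The technique may extend to other complex, multigenic diseases and could aid the development of treatments tailored to a patient's genetic profile. The study was published online on October 9 in PLoS Genetics.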
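Genome-wide association studies (GWAS) use automated genotyping tools to search the human genome for disease-causing gene variants. The aim is to let doctors accurately predict an individual's likelihood of developing a given disease, so that prevention and treatment can begin early.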
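According to the authors, the main causal genes for many diseases remain undiscovered, and some studies have examined only a small, selected set of gene variants, which severely limits their results. Recent studies typically report predictive performance as the area under the ROC curve (AUC). Values have generally fallen between 0.55 and 0.6, where 0.5 corresponds to chance, so their clinical value has been limited.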
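To make the AUC figures concrete, the sketch below computes AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name `auc` and the scores are hypothetical and chosen for illustration only; they are not data from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1.0 when the case scores higher than the control
    and 0.5 when the two scores are tied.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
print(auc(cases, controls))  # 13 of 16 pairs ranked correctly -> 0.8125
```

A model that ranks cases and controls no better than a coin toss gives a value near 0.5, which is why results in the 0.55 to 0.6 range offer little help in the clinic.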
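Hakonarson's group broadened the set of variants under study, collecting a large number of disease markers, including many not yet confirmed, that pass a statistical threshold for association with the disease. This approach cannot rule out false positives, but overall it improves the accuracy of prediction.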
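The effect of the statistical threshold on the size of the marker set can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a plain allelic chi-square test with one degree of freedom, and the SNP names, allele counts, and the relaxed cutoff of `1e-3` are hypothetical. The value `5e-8` is the conventional genome-wide significance level.

```python
import math


def allelic_chi2_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P-value of a Pearson chi-square test (1 df) on a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case, row_ctrl = case_alt + case_ref, ctrl_alt + ctrl_ref
    col_alt, col_ref = case_alt + ctrl_alt, case_ref + ctrl_ref
    if 0 in (row_case, row_ctrl, col_alt, col_ref):
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def select_markers(counts, p_threshold):
    """Return the names of markers whose association p-value is below the threshold."""
    return [name for name, table in counts.items()
            if allelic_chi2_p(*table) < p_threshold]


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
counts = {
    "snp_a": (330, 670, 200, 800),  # strong association
    "snp_b": (290, 710, 220, 780),  # suggestive association
    "snp_c": (250, 750, 250, 750),  # no association
}

strict = select_markers(counts, 5e-8)   # confirmed-level markers only
relaxed = select_markers(counts, 1e-3)  # larger set; may include false positives
print(strict, relaxed)
```

Relaxing the cutoff admits `snp_b` alongside `snp_a`. On real data the larger set would contain some false positives, which is the trade-off the paragraph above describes.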
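The researchers applied the program to a type 1 diabetes GWAS dataset and built a risk-assessment model. Compared with models limited to known susceptibility loci, the model was markedly more accurate, with an AUC above 0.8. The researchers also stress that choosing a suitable disease matters: type 1 diabetes is highly heritable, and many of its susceptibility genes lie in the major histocompatibility complex region. Moreover, such a risk-assessment model is not suited to large-scale genetic screening; it is suited only to assessing risk for a particular class of diseases. (Bioon.com)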
Original source recommended by Bioon:
PLoS Genet 5(10): e1000678. doi:10.1371/journal.pgen.1000678
From Disease Association to Risk Assessment: An Optimistic View from Genome-Wide Association Studies on Type 1 Diabetes
Zhi Wei1#, Kai Wang2#, Hui-Qi Qu3, Haitao Zhang2, Jonathan Bradfield2, Cecilia Kim2, Edward Frackleton2, Cuiping Hou2, Joseph T. Glessner2, Rosetta Chiavacci2, Charles Stanley4, Dimitri Monos5, Struan F. A. Grant2,6, Constantin Polychronakos3, Hakon Hakonarson2,6*
1 Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America, 2 Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America, 3 Departments of Pediatrics and Human Genetics, McGill University, Montreal, Québec, Canada, 4 Division of Endocrinology, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America, 5 Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America, 6 Division of Genetics, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ~0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.