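The genome of an organism is composed of four nucleotides: A, T, G and C. Scientists have found that the frequencies of fixed-length nucleotide strings are largely consistent across different regions of a genome (for dinucleotides, for example, the 16 possible strings reduce to 10 once each string is combined with its reverse complement). Recently, Dr. Fengfeng Zhou, a research scientist at the Systems Biology Laboratory of the University of Georgia, found that mapping the frequencies of all fixed-length nucleotide strings in different regions of a genome to colors in an image displays this feature intuitively and effectively (see the figure below).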
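This feature is called the barcode of a genome (or of a chromosome). Further analysis shows that the barcodes of different chromosomes of the same species are quite similar to one another, whereas the barcodes of genomes (or chromosomes) of different species differ to some degree. The barcodes of eukaryotes, prokaryotes, chloroplasts and mitochondria can be separated very clearly. These properties can substantially improve classification in metagenomics. Within the barcode of a genome (or chromosome) there may be regions whose barcodes differ from the rest. Studies suggest that these regions may have been acquired from other species through mechanisms such as horizontal transfer.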
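The work was published in BMC Bioinformatics and was accessed more than 1,100 times within less than half a month of publication. Further applied studies will be released soon; for details, see the author's home page: http://csbl.bmb.uga.edu/~ffzhou/ (Bioon.com)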
Original source recommended by Bioon:
BMC Bioinformatics 2008, 9:546 doi:10.1186/1471-2105-9-546
Barcodes for genomes and applications
Fengfeng Zhou, Victor Olman and Ying Xu
Background
Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1<k<6. The collection of these k-mer frequency distributions is unique to each genome and termed the genome's barcode.
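The definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`canonical_kmers`, `fragment_profile`, `barcode`), the use of non-overlapping fragments, and the defaults `k=4` and `fragment_len=1000` are assumptions made for illustration. Each row of the result holds, for one fragment, the combined relative frequency of every k-mer and its reverse complement.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Sorted k-mers, one representative per reverse-complement pair."""
    return sorted({min(km, reverse_complement(km))
                   for km in map("".join, product("ACGT", repeat=k))})


def fragment_profile(fragment, k):
    """Combined relative frequency of each k-mer and its reverse complement."""
    counts = dict.fromkeys(canonical_kmers(k), 0)
    total = 0
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        key = min(kmer, reverse_complement(kmer))
        if key in counts:  # windows with N or other symbols are skipped
            counts[key] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in counts]


def barcode(sequence, k=4, fragment_len=1000):
    """One frequency row per non-overlapping fragment (assumed layout)."""
    sequence = sequence.upper()
    return [fragment_profile(sequence[s:s + fragment_len], k)
            for s in range(0, len(sequence) - fragment_len + 1, fragment_len)]
```

For example, `barcode(genome_sequence, k=4)` would yield one row of 136 values per 1000 bp fragment, and rendering those rows as a gray-scale or color image gives a picture of the kind the article describes. Here `genome_sequence` is a placeholder for a DNA string loaded by the reader.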
Results
We found that for each genome, the majority of its short sequence fragments have highly similar barcodes, while sequence fragments with different barcodes typically correspond to genes that are horizontally transferred or highly expressed. This observation has led to new and more effective ways of solving two challenging problems: the metagenome binning problem and the identification of horizontally transferred genes. Our barcode-based metagenome binning algorithm substantially improves the state of the art in terms of both binning accuracy and scope of applicability. Other attractive properties of genome barcodes include (a) the barcodes have different and identifiable characteristics for different classes of genomes such as prokaryotes, eukaryotes, mitochondria and plastids, and (b) barcode similarities are generally proportional to the genomes' phylogenetic closeness.
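The observation that most fragments share a barcode while a few deviate suggests a simple outlier screen, sketched below in Python. This is not the method used in the paper: the column-wise median profile, the Euclidean distance, the median-absolute-deviation cutoff, and the names `typical_profile`, `atypical_fragments` and `n_mads` are all assumptions chosen for illustration. The input is a list of per-fragment k-mer frequency rows, all of the same length.

```python
from math import dist
from statistics import median


def typical_profile(rows):
    """Column-wise median: a robust stand-in for the genome-wide barcode."""
    return [median(col) for col in zip(*rows)]


def atypical_fragments(rows, n_mads=5.0):
    """Indices of fragments whose profile lies far from the typical one.

    A fragment is flagged when its Euclidean distance to the median
    profile exceeds median distance + n_mads * MAD (assumed cutoff rule).
    """
    center = typical_profile(rows)
    distances = [dist(row, center) for row in rows]
    mid = median(distances)
    mad = median(abs(d - mid) for d in distances)
    cutoff = mid + n_mads * mad
    return [i for i, d in enumerate(distances) if d > cutoff]
```

Flagged fragments would only be candidates. According to the abstract, deviating fragments may correspond to horizontally transferred genes or to highly expressed ones, so a further test would be needed to tell the two apart.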
Conclusions
These and other properties of genome barcodes make them a new and powerful tool for studying numerous genome and metagenome analysis problems.