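Not long ago, people were still arguing over whether humans have 40,000 to 60,000 genes or more than 100,000. Now the finished sequencing and bioinformatic analysis of chromosome 6 has produced many new findings and revealed a great deal of information that had not been seen before. More discoveries may well follow on the other chromosomes. The paper describing the finished sequence of chromosome 6 was published in Nature 425, 805-811 (23 October 2003).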
Nature 425, 775-776 (23 October 2003); doi:10.1038/425775a
Genomics: Six is seventh
JANE GRIMWOOD AND JEREMY SCHMUTZ
Jane Grimwood and Jeremy Schmutz are at the Stanford Human Genome Center, 975 California Avenue, Palo Alto, California 94304, USA.
e-mail: jane@shgc.stanford.edu; jeremy@shgc.stanford.edu
The finished sequence of human chromosome 6 reveals an abundance of biological information previously buried within the draft of the human genome, and illustrates the increasing power of comparative genomics.
Monday 14 April 2003 was a momentous day. It was the culmination of a project that had consumed the professional and personal lives of an international group of researchers for as long as most of them could remember. The Human Genome Project had surpassed its goals with the early completion of the 'finished' human genome: 2,825 million base pairs of DNA sequence, finished to exceptionally high quality and accuracy. The human genome is divided into 24 distinct chromosomes (22 autosomes plus the X and Y; Fig. 1), each unique in its composition and in its contribution to the biological basis of being human. With the sequencing phase complete, each chromosome is now being annotated, a process that involves in-depth manual review and curation of its gene content. Six chromosomes (22, 21, 20, 14, Y and 7) have already been fully analysed and the results published. On page 805 of this issue (ref. 1), Mungall et al. describe the DNA sequence and analysis of human chromosome 6, the seventh chromosome to be completed and fully analysed.
Figure 1 The big picture: the paired chromosome complement of a normal male, with 22 pairs of autosomes (non-sex chromosomes) plus one X and one Y chromosome.
This publication marks the end of a mapping, sequencing and finishing effort that began systematically in September 1996 at the Sanger Institute in Cambridge, UK. Finishing turns a rough draft sequence into a highly accurate one: every gap that current methods can close is closed, and the error rate is defined. At 166,880,988 base pairs (more than 166 megabases), chromosome 6 is the largest human chromosome to have been completely analysed and published so far. The finished sequence, a little less than 6% of the total human genome, is in nine pieces separated by small gaps, or missing segments. All but two of these gaps lie in regions of highly repetitive DNA near the centromere or at the telomeres, the very tips of the chromosome arms. Mungall et al. estimate that they have captured 99.5% of the chromosome's euchromatin, the portion of the chromosome that excludes these repetitive regions.
What have we learned about the landscape of chromosome 6? Mungall et al. describe 2,190 gene structures, with evidence that 1,557 of them are functional genes and 633 are pseudogenes (inactive genes). Of the 1,557 that are predicted to be active or functional, only 772 have been previously described — a proportion comparable to that of the other published human chromosomes.
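The figures quoted in the two paragraphs above can be cross-checked with a few lines of arithmetic. The Python below is a minimal sketch that uses only numbers given in the text. It assumes that the 2,825-million-base-pair finished genome is the right denominator, and its variable names are illustrative.

```python
# Figures quoted for human chromosome 6 (Mungall et al., 2003).
CHR6_BP = 166_880_988        # finished sequence length, base pairs
GENOME_BP = 2_825_000_000    # finished human genome (2,825 million bp), assumed denominator

FUNCTIONAL = 1_557           # gene structures with evidence of function
PSEUDOGENES = 633            # inactive gene copies
PREVIOUSLY_KNOWN = 772       # functional genes described before this study


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total_structures = FUNCTIONAL + PSEUDOGENES
genome_share = percent(CHR6_BP, GENOME_BP)
known_share = percent(PREVIOUSLY_KNOWN, FUNCTIONAL)
not_previously_described = FUNCTIONAL - PREVIOUSLY_KNOWN

print(f"gene structures: {total_structures}")
print(f"share of genome: {genome_share:.2f}%")
print(f"functional genes already described: {known_share:.1f}%")
print(f"functional genes not previously described: {not_previously_described}")
```

Run as a script, it confirms that 1,557 + 633 = 2,190 gene structures and gives a genome share of about 5.91%, which matches "a little less than 6%". It also shows that roughly half (49.6%) of the functional genes had been described before.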
Many of the statistics for chromosome 6 are typical of the genome as a whole (ref. 2). For example, its average gene density is 9.2 genes per megabase (the genome average is about 10), and 43.95% of its sequence is repetitive (genome average 44.8%). Other attributes of chromosome 6, however, make it stand out from the crowd. It has the largest known cluster of transfer RNA genes (157 of them). Transfer RNAs carry amino acids to the ribosome as messenger RNA is translated into protein. The chromosome also carries the most polymorphic gene yet discovered, HLA-B, which varies between individuals more than any other known gene.
Chromosome 6 is best known as the home of the major histocompatibility complex (MHC), a 3.6-megabase region that plays an essential role in the immune system. Because the MHC is important and is associated with many common diseases, the complete DNA sequence of this region was published back in 1999 (ref. 3). But the MHC should not overshadow the medical importance of several other genes on the chromosome. Mutations in HFE cause hereditary haemochromatosis, a condition that affects 1 in 400 people and leads to multi-organ dysfunction through excess iron deposition (ref. 4). Mutations in EPM2A cause Lafora disease, a progressive myoclonus epilepsy. Point mutations or deletions in PARK2 are responsible for a juvenile-onset form of Parkinson's disease. Gene abnormalities on chromosome 6 are also implicated, although not yet confirmed, as contributing causes of several genetically complex diseases, including schizophrenia, epilepsy, cancer and heart disease.
Adding the base pairs of chromosome 6 to those of the six chromosomes already finished and analysed gives a total of 500 megabases of fully analysed human sequence. That is only 17% of the 2.8 billion base pairs in the whole genome. What is the status of the remaining 17 chromosomes? One severely underestimated part of the process is the difficulty of fully annotating a finished chromosome (ref. 5). Various groups are still striving to deliver the most complete snapshot of each chromosome that current gene-prediction methods allow. In addition, the Human Genome Project exceeded its original goals by finishing 99% of the euchromatic sequence, but work is continuing on 'exceptional duplicated regions' (ref. 6). These are chromosome segments that were duplicated relatively recently in evolution and are thought to be a birthplace of new genes. Classical chromosome-mapping techniques have proved ineffective at teasing the copies apart.
Full analyses of the remaining 17 chromosomes should nonetheless be well worth the wait. As their description of chromosome 6 shows, Mungall et al. (ref. 1) are the first to have fully applied the cross-species comparative power now on offer as the DNA sequences of other organisms become available (Fig. 2). Assembled draft genomes of the mouse, rat, Tetraodon (spotted green pufferfish), Fugu (tiger pufferfish) and zebrafish are in hand. They can now be searched for regions of similarity, or conservation, between organisms. Functional sequence tends to be conserved through evolution while non-functional sequence drifts. This comparative approach therefore refines predictions of which stretches of DNA are actually genes and supports a more sophisticated interpretation of the underlying genomic data. Its power will grow as the genome sequences of the chicken, chimpanzee, frog, dog and cow, already in the production queue, become available.
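As a rough illustration of what "searching for regions of conservation" means, the sketch below slides a fixed-size window along two sequences. It assumes the two sequences are already aligned column by column. It keeps windows whose percent identity passes a threshold and merges overlapping windows into candidate conserved regions. Real cross-species analyses rely on dedicated whole-genome aligners and statistical models of conservation. The function names, window size, threshold and toy sequences here are all hypothetical.

```python
from typing import List, Tuple


def conserved_windows(seq_a: str, seq_b: str, window: int,
                      min_identity: float) -> List[Tuple[int, float]]:
    """Return (start, identity) for every window at or above min_identity.

    Assumes seq_a and seq_b are already aligned (equal length, '-' = gap);
    gap columns count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


def merge_windows(hits: List[Tuple[int, float]], window: int) -> List[Tuple[int, int]]:
    """Merge overlapping or touching hit windows into half-open [start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for start, _ in hits:
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy aligned fragments (invented for illustration, not real genome data).
human = "ATGCGTACGTTAGC"
mouse = "ATGCGTACCATCGG"

hits = conserved_windows(human, mouse, window=6, min_identity=0.8)
regions = merge_windows(hits, window=6)
print(hits)
print(regions)  # [(0, 9)]
```

Positions 0 to 8 of the toy pair form one candidate conserved block. In practice, the window size and identity threshold trade sensitivity against false positives.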
Figure 2 Comparative benefits.
Several other large-scale projects also aim to understand how life functions at the biomolecular level. The newest, ENCODE (the 'Encyclopedia of DNA Elements'; ref. 7), is examining 30 million bases of the human genome (about 1%) in fine detail. Its aim is to identify every functional element in that sample and to create a kind of Peterson Field Guide to DNA. This electronic guide will contain descriptions, examples and identifying characteristics of each known type of DNA element. These can then be used to scan the rest of the human genome, and other genomes, for stretches of sequence that look similar and might have a similar function. ENCODE covers protein-coding elements and also non-coding functional elements. These do not code for protein themselves but regulate the production or expression of proteins from nearby genes.
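A position weight matrix (PWM) is one widely used way to turn the "identifying characteristics" of a class of DNA element into something that can scan a genome. It scores each base at each position of a candidate site. The sketch below illustrates that general technique and does not represent ENCODE's actual method. Its example sites are invented, it assumes a uniform background base composition, and it scans only the forward strand.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Build a log2-odds position weight matrix from equal-length example sites.

    Assumes a uniform background (each base 25%) and adds a pseudocount so
    that bases never seen at a position get a finite, negative score.
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Score every window of the forward strand; keep those at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Invented example sites for a hypothetical element (not from ENCODE).
sites = ["TATAAA", "TATAAA", "TATAAT", "TATATA"]
pwm = build_pwm(sites)

hits = scan("GCGTATAAAGGCTATATACC", pwm, threshold=6.0)
print([start for start, _ in hits])  # [3, 12]
```

Both hits begin with the TATA core shared by every example site. A real scan would also check the reverse complement, use a background model matched to the genome, and set the threshold from the score distribution rather than by hand.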
Another project, the MGC ('Mammalian Gene Collection'; ref. 8), is attempting to amass at least one full-length messenger RNA transcript for each gene in the human and mouse genomes. Zebrafish and frog mRNAs are now being added to the collection as well. These transcript sequences are vital for the continuing annotation of the human genome, because they show the complete structure of each gene's product after its introns have been spliced out. Moreover, cDNA clones of these mRNAs can be introduced into a host cell and made to produce a specific protein, which can then be tested for functional activity. Similar projects are under way worldwide, providing resources from a variety of organisms. All will help to unravel the mysteries still buried in the DNA sequence of the human genome.
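A small example shows why full-length transcripts help annotation. For a gene on the forward strand, the processed mRNA consists (apart from its poly(A) tail) of the exons joined end to end. A predicted exon model can therefore be checked by splicing it in silico and comparing the result with the transcript. The sketch below is simplified and hypothetical: the coordinate convention, function names and toy sequences are assumptions. Real pipelines use spliced-alignment tools that also handle the reverse strand, sequencing errors and alternative splicing.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open [start, end) on the forward strand


def splice(genomic: str, exons: List[Exon]) -> str:
    """Join exon sequences in genomic order, dropping the introns between them."""
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    return "".join(genomic[start:end] for start, end in ordered)


def model_matches_transcript(genomic: str, exons: List[Exon], mrna: str) -> bool:
    """True if a predicted exon model reproduces a full-length transcript exactly."""
    return splice(genomic, exons) == mrna


# Toy gene (invented): exon 1, an intron beginning GT and ending AG, exon 2.
GENOMIC = "ATGGCC" + "GTAAGTTTCAG" + "AAATAA"
MRNA = "ATGGCCAAATAA"  # hypothetical full-length transcript

good_model = [(0, 6), (17, 23)]
bad_model = [(0, 6), (16, 23)]   # second exon starts one base too early

print(model_matches_transcript(GENOMIC, good_model, MRNA))  # True
print(model_matches_transcript(GENOMIC, bad_model, MRNA))   # False
```

A one-base error at an exon boundary is enough to make the model disagree with the transcript. This is the kind of discrepancy that full-length mRNA sequences let annotators catch.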
References
1. Mungall, A. J. et al. Nature 425, 805-811 (2003).
2. The International Human Genome Sequencing Consortium. Nature 409, 860-921 (2001).
3. The MHC Sequencing Consortium. Nature 401, 921-923 (1999).
4. Feder, J. N. et al. Nature Genet. 13, 399-408 (1996).
5. Ashurst, J. L. & Collins, J. E. Annu. Rev. Genomics Hum. Genet. 4, 69-88 (2003).
6. Eichler, E. E. Genome Res. 11, 653-656 (2001).
7. http://www.genome.gov/10005107
8. http://mgc.nci.nih.gov