By Kevin E. Noonan --
As reported earlier by a variety of news sources, the J. Craig Venter Institute today published the sequence of a complete diploid human genomic complement, appropriately that of the eponymous Dr. Venter, on the Public Library of Science (PLoS) website. This sequence differs in several ways from previously published "complete" human genomic DNA sequences. First, it was produced from a single individual, whereas the earlier sequences were from several (Dr. Venter's own earlier work included sequences from five individuals, including himself). Second, although this isn't the first individual genomic DNA complement sequenced (that honor, fittingly, went to James Watson), it is the first diploid sequence reported (that is, one where there was an effort to sequence both copies of each chromosome in an individual). This approach made it possible for the Venter group (and their collaborators) to compare DNA sequences from the chromosomes Dr. Venter inherited from his mother with those he inherited from his father, and some of the results were unexpected.
When compared with the human reference sequence maintained by the National Center for Biotechnology Information (NCBI), Dr. Venter's DNA revealed more than 4 million DNA variants, over a million of which (30%) were novel (indicating a great deal of sequence variation in an individual). These variants included single nucleotide polymorphisms (SNPs, more than 3.2 million), substitutions of 2-206 base pairs (bp) (more than 50,000), heterozygous insertions or deletions of 1-571 bp (almost 300,000), homozygous insertions or deletions of 1-82,711 bp (more than 559,000) and 90 inverted sequences. Although the genetic variation that was not related to SNPs accounted for only 22% of the "events," the non-SNP variation comprised 74% of the variant bases. Confirming earlier genetic work by Richard Lewontin and others, 44% of the gene (protein-coding) sequences detected were heterozygous for one or more variants.
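As a rough check on the 22% figure, the event counts quoted above can be tallied directly. This is only a sketch: the counts are the rounded values given in the text ("more than", "almost"), not the paper's exact totals, and the dictionary keys and function name are illustrative.

```python
# Rounded variant counts as quoted in the text; these are approximations,
# not the exact totals reported in the paper.
variant_counts = {
    "snp": 3_200_000,
    "block_substitution": 50_000,
    "heterozygous_indel": 300_000,
    "homozygous_indel": 559_000,
    "inversion": 90,
}


def non_snp_event_share(counts):
    """Return the fraction of variant events that are not SNPs."""
    total = sum(counts.values())
    non_snp = total - counts["snp"]
    return non_snp / total


total_events = sum(variant_counts.values())
share = non_snp_event_share(variant_counts)

print(f"total events (rounded inputs): {total_events:,}")
print(f"non-SNP share of events: {share:.1%}")
```

With these rounded inputs the non-SNP share of events comes out at about 22%, consistent with the figure in the text. The 74% share of variant bases cannot be reproduced this way, since it depends on the length of every individual variant.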
The authors took advantage of improvements in the technology, especially in the open source Celera comparison and alignment algorithms used to assemble the sequence. The sequence itself was produced using Venter's random shotgun cloning/sequencing techniques, which helped his team compete with the much larger effort of the Human Genome Project to produce the first human genomic sequence several years ago. These results provided about 120 megabase pairs (Mbp) of sequence used to align earlier versions of the human genome or to fill in gaps in those sequences; 14 Mbp of these sequences were previously unreported.
When the frequency and position of the SNPs and insertion/deletion (indel) events were mapped, 42% of SNPs and 91% of indels were found to have been eliminated in coding regions, presumably by natural selection. This is particularly to be expected for the indels, most of which result in a frameshift in the coding sequence that would produce a non-functional protein. When the genetic diversity of the alleles from the two parental chromosomes was compared, 11,718 heterozygous and 9,434 homozygous coding SNPs were found, as were 236 heterozygous and 627 homozygous coding indels. The authors calculated that at least 17% (4,107/23,224) of Dr. Venter's genes contained nonsynonymous SNPs that would produce different proteins, and 44% of genes (10,208/23,224), or almost half, have at least one heterozygous variant in either the untranslated (UTR) or coding regions that would affect the amino acid sequence of the encoded protein, or gene expression due to differences in elements contained in the UTRs. This is likely to be an underestimate, since it does not include variation in non-coding regions involved in gene regulation (for example, promoters, enhancers, and other regulatory elements).
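The gene fractions and the frameshift reasoning above can be sketched in a few lines. The function names are hypothetical, and the frameshift rule is the standard textbook simplification (an indel whose length is not a multiple of three shifts the reading frame), not a method taken from the paper; it ignores cases such as indels that span splice sites.

```python
TOTAL_GENES = 23_224  # denominator used for both fractions in the text


def gene_fraction(n_genes, total=TOTAL_GENES):
    """Return the fraction of genes affected, given a count of genes."""
    return n_genes / total


def causes_frameshift(indel_length_bp):
    """Simplified rule: codons are 3 bp, so an indel shifts the reading
    frame unless its length is a multiple of 3."""
    return indel_length_bp % 3 != 0


nonsynonymous = gene_fraction(4_107)   # genes with nonsynonymous SNPs
heterozygous = gene_fraction(10_208)   # genes with a heterozygous variant

print(f"genes with nonsynonymous SNPs: {nonsynonymous:.1%}")
print(f"genes with a heterozygous variant: {heterozygous:.1%}")
print("1 bp indel shifts frame:", causes_frameshift(1))
print("3 bp indel shifts frame:", causes_frameshift(3))
```

The two ratios work out to roughly 17.7% and 44.0%, in line with the "at least 17%" and "44%" quoted above. Under the simplified rule, two of every three possible indel lengths shift the frame, which fits the strong depletion of indels in coding regions.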
The authors estimate that there is a minimum of 0.5% variation between different diploid genomes (that is, between different individual humans). The significance of this result becomes clear when one considers that there is only an estimated 2% difference between human and chimpanzee genomic DNA sequences. This amount of human genetic variability is remarkable, all the more so because the authors admit that the amount of variability in heterochromatic regions "largely escaped analysis in this study."
These results are exciting for the prospect of better understanding human genetic variability, and in particular for distinguishing variability important for phylogeny from that important for intraspecies variability. However, this work also shows how far there is still to go before the long-awaited "practical" benefits of the explosion of human genetic information obtained in the past decade (such as personalized medicine) are realized. It has been a hallmark of biotechnology that the ability to envision the possible applications of the technology has exceeded the capacity to exploit those insights and develop the applications expeditiously. It will likely be many years before the insights gleaned from Dr. Venter's genomic DNA sequence will be truly useful; for now we can but marvel at the accomplishment and the underlying genetic complexity of one individual human.