Biology

The completion of the Human Genome Project will unlock the genetic basis of most diseases and enable personalized medicine in the near future.

Now we know:

The completed genome revealed far greater complexity than anticipated. Most diseases are polygenic and environmentally influenced. The 'one gene, one disease' model was vastly oversimplified. Much of what was called 'junk DNA', the roughly 98% of the genome that does not code for protein, turned out to contain regulatory and other functional elements.

Disproven 2003

What changed?

On the morning of June 26, 2000, Francis Collins of the National Human Genome Research Institute and Craig Venter of Celera Genomics stood in the East Room of the White House beside President Clinton to announce a working draft of the complete human genome sequence. Clinton called it "the most important, most wondrous map ever produced by humankind." Tony Blair joined by satellite link from Downing Street. The occasion had been building for a decade: the Human Genome Project had formally launched in 1990 with the expectation that, once the full sequence was in hand, the mechanisms of disease, inheritance, and human variation would come clear. The genome was the book of life, and the book was nearly read.

The completed sequence, officially declared finished on April 14, 2003, confirmed that the human genome contains approximately 3.2 billion base pairs of DNA and somewhere between 20,000 and 25,000 protein-coding genes, a number that startled researchers who had predicted between 80,000 and 140,000. The draft-sequence paper published in Nature by the International Human Genome Sequencing Consortium in February 2001, with Eric Lander, then of the Whitehead Institute, as first author, showed that coding sequences amounted to less than two percent of the total genome. The other ninety-eight percent, long dismissed in the scientific literature as "junk DNA," a term coined by geneticist Susumu Ohno in 1972, sat there unexplained.

The term junk DNA was not merely dismissive; it encoded an assumption. If these sequences didn't code for proteins, they presumably did nothing. The one-gene, one-enzyme hypothesis, formalized through the work of Beadle and Tatum in the 1940s, later generalized to one gene, one protein, and refined through the central dogma of the 1950s and 1960s, had built the intellectual scaffold on which the genome project's promise rested. Sequence the genes; understand the proteins; cure the diseases. The logic seemed airtight.

What emerged instead was a picture of extraordinary complexity. Most human diseases turned out to be polygenic, influenced not by a single causative gene but by dozens or hundreds of variants, each contributing a small increment of risk and interacting with environmental exposures in ways that varied across lifetimes. For type 2 diabetes, schizophrenia, coronary artery disease, and most common cancers, the hoped-for single-gene explanations never arrived; they materialized for only a small number of conditions. The genome-wide association studies that followed the project's completion found hundreds of disease-associated variants, nearly all of them of individually tiny effect, scattered across regions that often had no known function.

The "junk" was the story. In work that began in 2003 and culminated in the ENCODE Project Consortium's landmark 2012 publications, some thirty coordinated papers in Nature and other journals from roughly 440 researchers, investigators reported that approximately 80 percent of the genome is biochemically active in at least one cell type. The non-coding regions contain regulatory switches (enhancers and silencers), RNA genes, binding sites for transcription factors, and structural elements that determine how the genome folds in three dimensions inside the nucleus. Mutations in these regulatory regions turned out to contribute to disease, just as mutations in coding genes do. The genome was not a parts list. It was a wiring diagram, and the wiring turned out to be almost the whole thing.
