By Robert Hazen, George Mason University
The Human Genome Project is an epic, ongoing task. It’s still really in its infancy. We’ve only been doing this for a few decades, and only about half of the human genes are even recognized at this point. Early in the 21st century, though, at least part of this great scientific challenge may finally have been met. After many years of effort, scientists will at last have determined the full record of human genetic information, that is, the human genome.
The Human Genome Project
Every human being has what is now estimated to be about 80,000 genes. Each of those genes is a genetic instruction; it carries coded information to manufacture a protein. We have, in every cell of our body, the potential information to manufacture 80,000 different kinds of protein. One of the greatest challenges in modern biology is to identify each of those genes, and then to understand the structure and the function of the protein associated with each of them.
Yet we don’t know with absolute certainty whether the number is really 80,000, or 100,000, or something else. Only a handful of the known genes are fully described, and that gap prompted the need for the Human Genome Project. The Human Genome Project was coordinated from a base at the National Institutes of Health, in Bethesda, Maryland. It addressed this massive problem in molecular biology, and in information technology as well. The project had two main objectives: mapping and sequencing.
This is a transcript from the video series The Joy of Science.
Mapping the Human Genome
The first of these goals was to provide a detailed map of the human genome. This is the distribution of genes on each of the 23 chromosomes in every human cell. One can imagine this as trying to come up with a road map, if you will.
Think of each chromosome as a long strand with one base pair after another, and along that strand, we have segments—sometimes broken, sometimes fragmented, but different segments that correspond to each of the different genes. It’s like plotting out a long interstate, and recording all the towns and villages that are along that interstate, and we do that for each of the 23 chromosomes. That’s mapping in the Human Genome Project.
The second goal of the Human Genome Project was to determine all three billion letters of the human genetic message. That’s called sequencing. It’s an automated process now, and new technologies have sped up sequencing, allowing the Human Genome Project to be completed far ahead of schedule and under budget!
The first complete human genome is going to be an eclectic collage of sequences from many different people. It was originally thought that maybe the first human genome should be James Watson’s genome, but it was decided that making it the genome of just one white male was really not such a good idea. So people from all races and all parts of the world are participating, and different segments of different chromosomes are identified from these different people. That will be the first complete human genome.
Of course, eventually we’re going to have tens and then hundreds and then thousands: countless complete human genomes of many people, from many different parts of the world. We’re going to start getting more and more information, because everyone’s genome differs slightly. We’re probably 99.9 percent the same, but there’s that slight difference, and it’ll be fascinating to see what those differences are.
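To get a feel for what that slight difference means, here is a rough back-of-the-envelope calculation in Python. It assumes the three-billion-base-pair genome length and the 99.9 percent figure quoted in this article, and it treats every difference as a single differing base pair, which is a simplification. The function name is made up for this illustration.

```python
# Figures quoted in the article; both are approximations.
GENOME_LENGTH = 3_000_000_000  # base pairs in one human genome
SHARED_FRACTION = 0.999        # "99.9 percent the same"


def expected_differences(genome_length: int, shared_fraction: float) -> int:
    """Approximate number of base pairs at which two genomes differ.

    Simplifying assumption: every difference is a single base pair.
    """
    return round(genome_length * (1 - shared_fraction))


differing = expected_differences(GENOME_LENGTH, SHARED_FRACTION)
print(f"About {differing:,} differing base pairs between two people")
```

Under these assumptions, a 0.1 percent difference still leaves on the order of three million positions to compare between any two people.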
Mapping the human genome, the road map aspect, relies initially on the identification of distinctive short DNA sequences, typically 200 to 500 base pairs long, that serve as markers. These sequences are similar, if not identical, in everybody’s genome. The markers can be used to find our way around; they’re like mile markers on an interstate. This is an ongoing effort that requires the combined skills of molecular biologists, who manipulate and analyze the DNA itself, and of information experts, computer people, because one has to collect and collate huge amounts of information: three billion base pairs in one genome alone.
We have to search that data for recurrent patterns and look for genes. The function of a gene is also often related to certain common sequences that are shared among many different genes. Can you imagine finding two similar genes in a three-billion-base-pair genome? That’s sort of like trying to find two similar sentences in an entire encyclopedia! It’s a huge job, but with computers, we can do it. And we did.
We have already established tens of thousands of these road markers, or ‘sequence tags’ as they’re called. We can use this reference system to identify locations along all 23 human chromosomes. This has been done, and we’re at the first stage. There were about 30,000 sequence tags by mid-1997; as many as 100,000 of these tags are eventually going to be produced, and then it will be very easy for researchers to locate specific segments of specific chromosomes and continue with the work in a more systematic way.
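To make the idea of searching for shared patterns concrete, here is a minimal Python sketch. It is not the method used by the Human Genome Project; it simply indexes every short substring of length k in one sequence and looks up the substrings of a second sequence in that index. The function names and the toy sequences are invented for this illustration, and exact matching is a simplification, since real genes are usually similar rather than identical.

```python
from collections import defaultdict


def index_kmers(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of `sequence` to its start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def shared_kmers(seq_a: str, seq_b: str, k: int) -> list[tuple[int, int, str]]:
    """Return (position in seq_a, position in seq_b, substring) for exact matches."""
    index_a = index_kmers(seq_a, k)
    matches = []
    for start_b in range(len(seq_b) - k + 1):
        kmer = seq_b[start_b:start_b + k]
        for start_a in index_a.get(kmer, []):
            matches.append((start_a, start_b, kmer))
    return matches


# Toy sequences, far shorter than any real gene.
gene_a = "ATGCGTACCTGA"
gene_b = "TTGCGTACAAGG"
print(shared_kmers(gene_a, gene_b, k=6))
```

Building the index once means each lookup is a fast dictionary access, so the sequences never have to be compared position against position. That is the kind of shortcut that makes a search across billions of letters feasible at all.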
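The mile-marker idea can be sketched in a few lines of Python. This is only an illustration under simplified assumptions: the chromosome is a short made-up string, the tags are six letters long rather than 200 to 500 base pairs, each tag is assumed to occur exactly once, and the tag names and function names are hypothetical.

```python
def locate_tags(chromosome: str, tags: dict[str, str]) -> dict[str, int]:
    """Return the start position of each tag that occurs in the chromosome."""
    positions = {}
    for name, tag_sequence in tags.items():
        position = chromosome.find(tag_sequence)  # -1 means the tag is absent
        if position != -1:
            positions[name] = position
    return positions


def nearest_tag(positions: dict[str, int], query: int) -> str:
    """Return the name of the tag whose start lies closest to `query`."""
    return min(positions, key=lambda name: abs(positions[name] - query))


# Made-up data for illustration only.
chromosome = "GGATCCTTAGCAAACCGTTGACCTAGGTTACGATCGGA"
tags = {"tag_A": "GATCCT", "tag_B": "CCGTTG", "tag_C": "TTACGA"}

positions = locate_tags(chromosome, tags)
print(positions)
print(nearest_tag(positions, query=20))
```

Once the positions of the tags are known, any stretch of the chromosome can be described relative to its nearest tag, much as a town can be described by the nearest mile marker.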