By Robert Hazen, George Mason University
The human genome is vastly complex. However, we can get a better sense of the human genome and how it works by looking at a much simpler genome, that of a virus. A virus is little more than a strand of genetic material, either DNA or RNA (retroviruses, for example, carry RNA), surrounded by a coating of protein.

What Is a Virus?
Viruses are not composed of cells, and they can’t survive independent of a host cell. That gives us a hint that viruses perhaps shouldn’t be counted as living things in quite the way cells are. Nevertheless, in the right environment, viruses can reproduce, and they actually evolve.
They also have some distinct properties that help define what they are. First of all, a virus always requires a host cell to perform virtually all of its functions. Without a cell, the virus can’t do anything, including reproduce. Viruses are not capable of independent metabolism; they require energy from a cell. Viruses are much smaller than the smallest cell, so they can easily enter a cell or inject their genetic material into it. Viruses possess either DNA or RNA; they have to have some kind of genetic material, and that carries what is called the viral genome.
In retroviruses, the genome is a strand of RNA, and that RNA strand is converted to DNA by a special enzyme called reverse transcriptase. This enzyme, which takes RNA and makes DNA out of it, assumes great significance when we talk about genetic engineering.
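The RNA-to-DNA step can be pictured as a simple base-pairing rule. The following is a minimal sketch in Python, not a model of the real enzyme: the function name `reverse_transcribe` and the six-base input are made up for illustration, and the sketch assumes the RNA is given as a string of A, U, G, and C.

```python
# Base-pairing rules for copying an RNA template into DNA:
# each RNA base is paired with its complementary DNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    # The new strand runs antiparallel to the template, so we walk
    # the RNA backwards while pairing bases.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


# Hypothetical six-base RNA fragment (not from any real virus).
print(reverse_transcribe("AUGGCU"))  # prints AGCCAT
```

The only point of the sketch is the direction of information flow: RNA in, complementary DNA out, the reverse of what a cell normally does.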
This is a transcript from the video series The Joy of Science. Watch it now, on Wondrium.
The Workings of a Virus
When taken into a host cell, the virus disintegrates. The protein coat just dissolves and leaves that genetic material exposed. That viral nucleic acid then takes control of the cell’s genetic machinery, and it starts making hundreds of new viruses before the cell dies.
Viruses essentially work by fooling the host cell into thinking that they’re food. A receptor on the cell sees the virus’s protein coat and identifies it as food. When the receptor opens the door to that protein, however, the protein is attached to a complete virus, and the viral nucleic acid gets in. The cell’s polymerase then replicates the viral DNA over and over again, making hundreds of copies; at the same time, the cell’s machinery starts making copies of the proteins that form the viral coats.
Essentially, the cell is co-opted, making more and more and more copies of the virus, until eventually the cell bursts and dies, flooding the surrounding cells with new copies of the virus.
Simian Virus 40

Looking at a typical viral genome, we can see how the genetic code works. Consider simian virus 40, or SV40, which gives monkeys a kind of 24-hour flu. Its genome consists of 5,243 base pairs in a closed loop of DNA. This closed loop of DNA is very typical of small cells, such as bacteria, and also of viruses.
That SV40 genome contains exactly five genes. Three of them form the protein coat, and the other two co-opt the cell’s machinery and instruct it to make more copies of the virus. The DNA loop has a starting signal to indicate where to start reading; it looks like TATA, a combination of A’s and T’s.
The SV40 genome has a number of features that are also very typical of how genes are distributed on chromosomes, and they are really quite surprising. First, different genes are read in different directions around the loop: clockwise or counterclockwise, if you will. A second trait: one of the genes is split into two segments, in this case separated by about 500 base pairs. One has to start reading that gene, and then skip over 500 base pairs and continue reading it. That’s very typical, something called an intron: an extra piece of DNA that is skipped over and doesn’t code for any part of the protein. Introns are common in the genes of plants, animals, and other organisms whose cells have nuclei, though they are rare in bacteria.
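Both traits can be sketched in a few lines of Python. Reading a gene in the opposite direction amounts to reading the other strand of the double helix, which is the reverse complement of the first; skipping an intron amounts to joining the two coding segments. The function names, positions, and sequences below are made up for illustration, and the toy intron is 6 bases long rather than about 500.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def splice(gene: str, intron_start: int, intron_end: int) -> str:
    """Join the two coding segments, skipping gene[intron_start:intron_end]."""
    return gene[:intron_start] + gene[intron_end:]


# A gene read "counterclockwise" is read from the complementary strand.
print(reverse_complement("ATGC"))  # prints GCAT

# Hypothetical split gene: segment 1, a 6-base intron, segment 2.
toy_gene = "ATGGCC" + "GTAAGT" + "TTTTAA"
print(splice(toy_gene, 6, 12))  # prints ATGGCCTTTTAA
```

In a cell, the skipping is done on the RNA copy of the gene rather than on the DNA itself; the sketch only shows which bases end up being read.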
In addition, one can see that substantial segments of several of the genes overlap each other. That is, the genetic information to make proteins 1, 2, and 3 may overlap in large segments. That means that there are large stretches of amino acids that are identical in three different proteins.
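One way such overlap can arise is for two genes to share a stretch of DNA and a stopping point but begin at different starting points. The sketch below, in Python, translates a made-up 18-base sequence from two start positions using the standard genetic code. The sequence and the helper name `translate` are hypothetical, and the sketch assumes both genes are read three bases at a time in the same frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate three bases at a time from start until a stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i : i + 3]]
        if amino_acid == "*":  # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequence with two start codons (ATG) in the same reading frame.
toy_dna = "ATGGCCATGAAATGGTAA"
long_protein = translate(toy_dna, 0)
short_protein = translate(toy_dna, 6)
print(long_protein, short_protein)  # prints MAMKW MKW
```

Here the shorter protein is identical to the tail of the longer one, which is the kind of shared stretch of amino acids described above.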
Viral DNA
There are numerous viral diseases that affect humans, and they’ve been the subject of intensive research. That’s one of the reasons why viral genomes are so well known. But unlike the DNA in our cells, viral DNA has no correction mechanism. You see, in every cell in our body, DNA is constantly being damaged, but the cell knows how to correct that. In a virus, there’s no mechanism to correct damage. If the DNA is damaged in a virus, it just stays damaged.
Therefore, viruses mutate extremely quickly, and viral diseases are constantly changing from year to year. There are more than 100 different strains, for example, of rhinovirus, which causes the common cold. Immunity to any one strain, therefore, isn’t necessarily effective against other strains, so one may get colds year after year.
Viral Diseases
Humans don’t develop lasting immunity to the common cold, because new types of colds are continually coming into vogue, if you will. Similarly, new strains of the influenza virus appear every year. That’s why physicians try to anticipate the mutations of the flu virus and give you a flu shot based on the type of virus that’s thought to be coming into prominence in that particular flu season; but sometimes they’re wrong.
Then there’s HIV, the retrovirus that causes AIDS. It attacks the human immune system, so that the body actually loses its ability to defend itself against other infections. Ultimately, the cause of death in AIDS is not so much HIV itself as it is the other diseases that exploit the body’s weaknesses.