By Jonny Lupsha, Wondrium Staff Writer
A DNA database technique used to solve crimes may help identify unknown soldiers. For decades, the U.S. Department of Defense (DoD) has worked to put names to the remains of thousands of American troops killed in World War II who could not be identified at the time.

Since the beginning of World War II, efforts have been made to bring closure to the families of soldiers who never returned home to the United States. Most troops killed in action were identified by their dog tags, by matching their remains to dental records on file, or by other means. However, according to the official Defense POW/MIA Accounting Agency website, “Today, more than 72,000 Americans remain unaccounted for from WWII.”
Historically, family members of missing soldiers were asked to come forward and provide DNA samples that could be matched to unidentified remains. Now, the DoD is considering approaching the problem from the opposite direction: uploading DNA from the remains of buried, unknown soldiers to a public genealogy database and searching for matches. The same approach has been used to solve cold cases, most famously to identify the Golden State Killer.
In his video series Understanding Genetics: DNA, Genes, and Their Real-World Applications, Dr. David Sadava, Adjunct Professor of Cancer Cell Biology at the City of Hope Medical Center in Duarte, California, explained how genetic testing is used in forensics.
Genotype Analysis for Identification
DNA exists in nearly every cell of the body, and except for identical twins, each person's DNA is unique. A DNA molecule resembles a twisting ladder, in which each leg is a backbone of alternating sugar and phosphate groups. Each rung of the ladder is a pair of chemical bases drawn from four types: adenine, cytosine, guanine, and thymine, abbreviated A, C, G, and T.
Each of these bases pairs with one specific partner: adenine bonds with thymine, and cytosine bonds with guanine. All of us have these pairings (A to T, T to A, C to G, and G to C) repeated countless times in our DNA, but the order in which they occur differs from person to person.
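Because each base has exactly one partner, knowing one leg of the ladder tells you the other. The short Python sketch below applies the pairing rules just described, position by position. It is only an illustration: it ignores the fact that the two legs of real DNA run in opposite directions, and the function name `partner_strand` is hypothetical.

```python
# Pairing rules described above: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand: str) -> str:
    """Return the base on the opposite leg of the ladder at each position.

    Simplification: real strands run antiparallel, so biologists usually
    read the partner strand in reverse; this sketch does not.
    """
    return "".join(PAIRS[base] for base in strand.upper())

print(partner_strand("TCATTCAT"))  # AGTAAGTA
```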
“Human genome sequencing has revealed that the genome contains short sequences, about two to 10 base pairs long, that are repeated many times in tandem,” Dr. Sadava said. “These are appropriately called short tandem repeats. Looking through the whole genome, there are these different short tandem repeats, and the repeat numbers are inherited.”
In one example Dr. Sadava gave, we might find the sequence TCAT repeated five times in a row at a particular location on one copy of a chromosome, and seven times in a row at the same location on the other copy. Because we inherit one copy of each chromosome from each parent, the run of five TCATs came from one parent, while the run of seven came from the other.
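To make the idea of a repeat count concrete, here is a toy Python sketch that finds the longest back-to-back run of a motif such as TCAT in a made-up sequence. The flanking bases, the sequences, and the helper name `max_tandem_repeats` are all hypothetical, and this is not how forensic labs actually measure repeat lengths; it only shows what "five TCATs in a row" versus "seven TCATs in a row" means.

```python
import re

def max_tandem_repeats(sequence: str, motif: str = "TCAT") -> int:
    """Number of motif copies in the longest back-to-back run of the motif."""
    # (?:TCAT)+ matches one or more consecutive copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical reads of the same location on the two inherited chromosome copies.
copy_from_one_parent = "GGA" + "TCAT" * 5 + "CCT"
copy_from_other_parent = "GGA" + "TCAT" * 7 + "CCT"

genotype = (max_tandem_repeats(copy_from_one_parent),
            max_tandem_repeats(copy_from_other_parent))
print(genotype)  # (5, 7)
```

The pair `(5, 7)` is what geneticists would record for this person at this one location: one number from each parent.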
“Suppose five [TCATs in a row] is present 50% of the time in the population and seven [TCATs in a row] is present 50% of the time in the population of the United States of America,” Dr. Sadava said.
To estimate how many people in the United States carry both versions (five TCATs in a row and seven TCATs in a row), you multiply their frequencies. One half times one half is one quarter, or 25%: the chance of inheriting five repeats from one particular parent and seven from the other. Because either parent could have passed on either version, the standard population-genetics estimate doubles this to 50%. Either way, a single location does not narrow down the population much, but DNA identification uses 13 short tandem repeat locations, and many of the repeat versions found at them are far rarer than 50%.
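The arithmetic for one location can be written out as a short Python sketch. It assumes Hardy-Weinberg proportions (random mating and no population substructure), a standard simplification that the lecture does not state explicitly. Multiplying the two frequencies gives the chance of one specific parent-to-child arrangement; doubling accounts for either parent supplying either version. The function name and dictionary layout are hypothetical.

```python
def genotype_frequency(freqs: dict[int, float], a: int, b: int) -> float:
    """Expected population share of the genotype (a, b) at one STR location.

    freqs maps a repeat count (e.g. 5 TCATs) to its frequency in the population.
    Assumes Hardy-Weinberg proportions: random mating, no population substructure.
    """
    p, q = freqs[a], freqs[b]
    if a == b:
        return p * p      # both parents passed on the same version
    return 2 * p * q      # either parent could have passed on either version

# Dr. Sadava's hypothetical: 5 and 7 repeats each at 50% of the population.
tcat_freqs = {5: 0.5, 7: 0.5}
print(tcat_freqs[5] * tcat_freqs[7])         # 0.25: five from a specific parent, seven from the other
print(genotype_frequency(tcat_freqs, 5, 7))  # 0.5: either arrangement
```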
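Because the frequencies at all 13 locations multiply together, the chance that two unrelated people share the same full profile becomes vanishingly small, so in practice a complete profile points to a single person (identical twins aside). This may soon help thousands of families of unaccounted-for World War II soldiers find closure.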
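Combining locations follows the same multiplication, as in this minimal Python sketch. It assumes the locations are inherited independently of one another, the usual simplification, and the per-location values below are hypothetical placeholders, not real population frequencies. The function names are likewise illustrative.

```python
import math

def profile_frequency(per_location: list[float]) -> float:
    """Chance that a random, unrelated person shares the whole profile.

    Multiplies the genotype frequency at each location, which assumes the
    locations are inherited independently of one another.
    """
    return math.prod(per_location)

def expected_sharers(per_location: list[float], population: int) -> float:
    """Expected number of unrelated people in a population with the same profile."""
    return profile_frequency(per_location) * population

# Hypothetical: even if all 13 locations were as common as the 50% example...
common = [0.5] * 13
print(round(1 / profile_frequency(common)))  # 8192, i.e. 1 in 8,192 people

# Hypothetical: if each location's genotype were carried by 1 in 10 people.
rarer = [0.1] * 13
print(f"1 in {1 / profile_frequency(rarer):.3g}")
```

The sketch shows the shape of the arithmetic: each extra location multiplies in another fraction, so the combined figure shrinks very quickly. It ignores relatives, who share more DNA than unrelated people and would need separate treatment.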