It’s been nearly 15 years since the completion of the Human Genome Project. In the years that have passed, we’ve seen a windfall of scientific discovery and innovation. For decades, scientists have envisioned diverse applications for DNA and the technology we use to study it. And now, with the cost of DNA sequencing at an all-time low, we’re beginning to see some of these applications become reality. On April 25 of this year, otherwise known as DNA Day, we highlighted a few exciting trends in DNA, ranging from large-scale data sets to DNA-based storage. As we look to the future, it’ll be exciting to see how these trends evolve:
- Large, diverse data sets
- Combining data to bring human genetics into focus
- Sequencing ancient DNA to learn about history
- Early cancer identification using DNA sequencing technology
- DNA as a tool in the fight against climate change
- DNA—a powerful digital storage material
Large, diverse data sets
Genome-wide association studies (GWAS) have proven to be a powerful tool in genetics. This type of study looks for links between a person’s DNA sequence and their traits, such as hair color, sleep preferences, and disease development. More specifically, GWAS look for these links in groups of people who show a range of different traits (like no hair, thinning hair, or a thick head of hair). Based on these patterns, scientists can start to understand which segments of DNA contribute to making you, you.
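To make the idea of "looking for links" a little more concrete, here is a minimal sketch of one simple kind of association test: comparing how often a variant shows up in people with a trait versus people without it. The variant names and counts are made up for illustration, and real GWAS use far more careful statistics (for example, correcting for ancestry and for testing many variants at once).

```python
import math

def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are people with the trait (cases) and without it (controls);
    columns are counts of the variant allele and the reference allele.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if the variant were unrelated to the trait.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref).
variants = {
    "variant_A": (300, 700, 200, 800),  # more common in cases
    "variant_B": (250, 750, 248, 752),  # about equally common in both groups
}

results = {name: allele_association(*counts) for name, counts in variants.items()}
for name, (chi2, p_value) in results.items():
    print(f"{name}: chi2={chi2:.2f}, p={p_value:.3g}")
```

In this toy data, only `variant_A` differs clearly between the two groups. A real study repeats a test like this across hundreds of thousands or millions of variants, which is one reason cohort size matters so much.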
As we’ve developed more advanced technology, we’ve gained the ability to sequence DNA faster and to gather broader data sets. In just the past few years, we’ve seen GWAS performed on considerably larger cohorts, some involving hundreds of thousands of people. The more people studied in a GWAS, the more power researchers have to find variants with subtle effects. Databases like the UK Biobank and the China Kadoorie Biobank make this possible by accumulating genetic and phenotypic data from hundreds of thousands of people, and then making that data available to researchers. This trend is only continuing to grow as these large-scale studies begin to gather multimodal data: the aggregation of information about a person’s traits, their genetics, and other data including features of their environment or medical history. Using multimodal data, researchers have an opportunity to learn about how a person’s environment may combine with their genetics to give rise to specific traits.
Combining data to bring human genetics into focus
Another exciting trend is the use of polygenic scores to estimate how likely it is that a person will develop complex traits, like male pattern baldness or breast cancer. In short, a polygenic score (PGS) is simply an algorithm, one that adds up the impact of multiple variants. A PGS can analyze just a few variants, or it can consider millions. The key is that the PGS analyzes multiple variants in aggregate. We used to think of genetics as though one gene determined one trait. But that’s not entirely accurate. It’s true that some traits depend largely on variants in a single gene, like whether you can taste a specific bitter compound called PROP or whether you have cystic fibrosis. But the vast majority of traits are actually influenced by many variants, and are therefore best estimated with this type of approach.
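As a rough illustration of "adding up the impact of multiple variants," the sketch below computes a score as a weighted sum. It assumes the common additive form, where each variant contributes its weight multiplied by how many copies of it a person carries (0, 1, or 2). The variant names and weights are hypothetical, and real scores are built and validated with far more care.

```python
def polygenic_score(genotype, weights):
    """Weighted sum of variant copies.

    genotype: maps variant name -> copies carried (0, 1, or 2).
    weights:  maps variant name -> estimated effect of each copy.
    Variants missing from the genotype are treated as 0 copies.
    """
    return sum(weight * genotype.get(variant, 0)
               for variant, weight in weights.items())

# Hypothetical per-variant weights (positive raises the score, negative lowers it).
weights = {"variant_1": 0.25, "variant_2": -0.5, "variant_3": 0.125}

# Hypothetical person: two copies of variant_1, none of variant_2, one of variant_3.
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_score(person, weights)
print(score)  # 0.25*2 + (-0.5)*0 + 0.125*1 = 0.625
```

A score like this only means something relative to other people’s scores, so in practice it is usually compared against a reference population rather than read as an absolute risk.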
So why should you care about polygenic scores? PGSs are becoming more common and are changing the way we think about disease risk. Previously, if a woman didn’t have a variant in known breast cancer genes, there wasn’t much that could be reported about her breast cancer risk, other than estimates based on her family history. However, if a polygenic score is available, every woman can receive risk information based on the aggregate impact of multiple variants in the form of a PGS. This approach may allow everyone to have a better sense of their own genetic risk for certain traits or diseases. The continued development of polygenic scores will likely change how we talk about genetics and disease.
Sequencing ancient DNA to learn about history
You can learn a lot about history by studying the DNA of modern humans, but imagine what you can learn from the DNA of ancient humans! In 2010, a lock of hair from a 4,000-year-old Greenlander was used to generate the first-ever complete genomic sequence of an ancient human [1].
Previously, scientists relied on finding and studying artifacts (pottery, tools, burial grounds) from ancient humans to learn about their society and migration patterns. As we learned more about the human genome, scientists figured out how to use patterns in our DNA to study the course of human evolution and migration throughout history. Naturally, there was a desire to analyze ancient DNA as well, but technological limitations prevented researchers from gathering enough material to actually sequence the entire genome. Instead, they were limited to studying small segments of DNA and the mitochondrial DNA. Thanks to some critical advances in our ability to amplify small amounts of DNA and improvements in sequencing technology, the field of ancient genomics, the study of DNA from ancient humans, has taken off. As a result, we’re likely to gain an unprecedented view into the past [2].
Since 2010, more than 1,000 ancient DNA samples have been analyzed. Results from these studies help us identify migration patterns, which can be used to understand how languages and technology may have passed from one culture to another. This field of study shows no sign of slowing down, either. In 2010, a total of 5 ancient genomes were sequenced. In 2015, 287 ancient genomes were studied. And in 2017, 686 ancient human genomes were analyzed. It will be fascinating to see what we learn about human history, and what this information can tell us about ourselves today [2].
Early cancer identification using DNA sequencing technology
Early detection of cancer is top of mind for many researchers and companies. Detecting cancer early can improve a patient’s chances of survival, perhaps by as much as 5 to 10 times compared with a late-stage diagnosis. In practice, however, it’s extremely challenging to determine if a person has a medically relevant tumor until it’s advanced to a larger and more threatening stage. Thanks to advances in DNA sequencing technology, there is a new and powerful tool that may be able to identify patients with early-stage cancer and help direct therapeutic strategies [3].
Cancer is a complex disease that involves the transformation of a normal cell into a cancerous cell. Through a series of mostly random steps, cancer arises when mutations in the DNA cause a cell to start growing uncontrollably: one cell becomes two, two become four, and so on. As they develop, tumor cells accumulate more and more mutations in their DNA, which causes this DNA to look distinct from the DNA in a normal cell. This is important because the DNA of a tumor cell diverges from that of a normal cell before the cancer has grown to a detectable size. If we can detect this divergence while the tumor is still very small, we may have a chance to detect cancer at a very early stage and prevent its growth [3].
Tumor cell DNA can be found circulating in the blood. It’s been suggested that this happens because some tumor cells die during cancer development, and the DNA from these dead cells is washed into the bloodstream. Scientists are seizing on this fact as a way to identify developing tumors. By collecting DNA from the blood and using next-generation sequencing (NGS), researchers are able to sequence the circulating DNA and build a picture of the cell that originally had that DNA. Scientists at Grail are piloting a study involving 10,000 participants in an effort to identify what circulating tumor DNA looks like in comparison to normal DNA that may be circulating in the blood. Using this information, the researchers will build and test models for accurately differentiating patients with cancer from those without. Efforts like this may ultimately lead to the establishment of a sensitive, cost-effective, and life-saving method for early cancer identification [3].
DNA as a tool in the fight against climate change
Having the ability to sequence DNA in a rapid and cost-effective manner has opened up new possibilities for replacing petro-based materials like oil, plastics, paints, and adhesives. There is a significant global effort to identify and create alternative sources that are more environmentally friendly than petroleum. Oil, or petroleum, is a natural material that consists of hydrocarbon molecules. Chemically, there is a significant amount of energy stored in these kinds of molecules, which is why we use them as fuels. It turns out that many organisms, ranging from bacteria to plants to humans, are capable of producing hydrocarbons that are similarly capable of storing high amounts of energy. For decades, researchers have been looking for ways to turn single-celled organisms like yeast and microalgae into microscopic fuel factories [4].
As promising as it is, this technique has significant challenges to overcome before it is a viable option. One of the big hurdles has been identifying how microorganisms produce the desired fuel molecules, and figuring out how to manipulate their biology to make it a cost-effective process. We know that an organism’s DNA codes for specific enzymes that work together to build these energetic molecules, and exactly which molecule is made depends on the enzymes that are present. Therefore, in order to push microbes to produce the desired molecular targets, we have to figure out the right combination of enzymes required for their production. To do this, scientists are turning to next-generation sequencing. Advances in DNA sequencing technology have made it possible for researchers to sequence the DNA of many organisms [4].
Using this information, they can identify the DNA sequences that code for a set of biofuel-producing enzymes, oftentimes taking newly sequenced enzymes from different plant species and combining them in the microbial production host to give it new production abilities. This kind of research may help researchers optimize energy production and make biofuels economically competitive with oil [4].
DNA—a powerful digital storage material
Your genome is made up of 3.2 billion base pairs, which code for more than 20,000 different genes. That’s a lot of information, and all of it is packed into the microscopic space of a cell nucleus, where two copies of it are stored! This incredible information density has led computer scientists, geneticists, biochemists, and others to come together to explore ways of using DNA as a storage medium [5].
There is a significant amount of data flying around the internet. It’s estimated that by 2020, there could be as much as 44 trillion gigabytes of data in existence. All of that needs to be stored in some physical location. Currently, most of this data resides on magnetic storage and silicon, but DNA could offer an entirely new level of storage density [5].
Scientists are currently refining techniques for encoding information with DNA nucleotides. Prototype systems have assigned binary values to the nucleotides so that each one represents either a 1 or a 0 (some systems have also tested a base-three system). With this method, it could be possible to store all of the world’s digital data using about a kilogram of DNA! Scientists have already successfully encoded, and later retrieved, various written texts and images (including a picture of a cat) [5].
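To give a feel for what "assigning binary values to nucleotides" could look like, here is a toy sketch. It assumes, purely for illustration, that A and C both stand for 0 and that G and T both stand for 1, and it picks whichever letter differs from the previous one so the same base never appears twice in a row. These choices are assumptions rather than a description of any particular prototype, and real systems also need error correction and a way to split data across many short strands.

```python
ZERO_BASES = "AC"  # assumption: either of these letters means bit 0
ONE_BASES = "GT"   # assumption: either of these letters means bit 1

def encode(data: bytes) -> str:
    """Turn bytes into a DNA-like string, one base per bit."""
    bases = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            options = ONE_BASES if bit else ZERO_BASES
            # Use the first option unless it would repeat the previous base.
            if bases and bases[-1] == options[0]:
                bases.append(options[1])
            else:
                bases.append(options[0])
    return "".join(bases)

def decode(strand: str) -> bytes:
    """Turn a string produced by encode() back into bytes."""
    bits = [1 if base in ONE_BASES else 0 for base in strand]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

message = b"cat"
strand = encode(message)
print(strand)          # 24 bases, one per bit
print(decode(strand))  # b'cat'
```

Storing one bit per base wastes half of DNA’s theoretical capacity, since four letters could carry two bits each. The trade-off in this sketch is that having two letters per bit leaves room to avoid repeated bases.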
DNA is an exciting storage material for multiple reasons. Aside from its high information density, it also has significant potential as a long-term information storage material: DNA is incredibly stable, as evidenced by our ability to sequence DNA from animals that are hundreds of thousands of years old. There are still challenges to overcome before DNA can be practically applied in this way, but it’s promising. Scientists will have to figure out how to reproducibly embed information without errors, how to speed up the process of writing code with DNA, and how to do it all in a cost-effective manner [5].
[1] Rasmussen, Morten, et al. “Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo.” Nature, vol. 463, no. 7282, 2010, pp. 757–762.
[2] Callaway, Ewen. “Divided by DNA: The Uneasy Relationship between Archaeology and Ancient Genomics.” Nature, vol. 555, no. 7698, 2018, pp. 573–576, doi:10.1038/d41586-018-03773-6.
[3] Aravanis, Alexander M., et al. “Next-Generation Sequencing of Circulating Tumor DNA for Early Cancer Detection.” Cell, vol. 168, no. 4, 2017, pp. 571–574, doi:10.1016/j.cell.2017.01.030.
[4] Liao, James C., et al. “Fuelling the Future: Microbial Engineering for the Production of Sustainable Biofuels.” Nature Reviews Microbiology, vol. 14, no. 5, 2016, pp. 288–304, doi:10.1038/nrmicro.2016.32.
[5] Extance, Andy. “How DNA Could Store All the World’s Data.” Nature, vol. 537, no. 7618, 2016, pp. 22–24, doi:10.1038/537022a.