Guest post by Daniel Jepson and Chris Wood, writers for “Casey Extraordinary Technology”
In last week's article on epigenetics at Casey Research, we began with a brief discussion of the enormous expectations placed on the Human Genome Project (for example, that its results would lead to the end of disease) and of how those expectations ultimately went unfulfilled because, of course, things are never that simple. More importantly, in this case, genes are only part of the story.
To quote briefly from that article:
Little did the community know at the time that the project [i.e., the Human Genome Project] would only uncover a small portion of what's really going on in our genome. They were only scratching the surface. What the architects of that project once dismissed literally as junk surrounding our genes is proving far more vital than anyone ever expected—in fact, it may hold the very keys to understanding evolution itself.
When scientists began the Human Genome Project, they were expecting to find approximately 100,000 protein-coding genes to account for the complexity of our species. What they found instead was that humans only have about 25,000, about the same number as fish and mice. In fact, according to biologist Dr. Michael Skinner, "the human genome is probably not as complex and doesn't have as many genes as plants do."
That's sort of a problem, because if we humans are supposed to be the complex species we hold ourselves out to be, then why don't we have as many genes as an oak tree? Maybe because genes are only part of the story.
That article went on to discuss how our epigenome (the second layer of structure above the genome, composed of methyl groups and histones, which changes throughout our lives) can turn our genes on and off and control the degree to which they are expressed. Cool stuff, and a very important budding area of science.
But today we'd like to bring the focus back to the genome itself, more specifically to a study of the genome called ENCODE.
When the Human Genome Project was finished, all scientists really had was a linear sequence of three billion DNA base pairs—in essence, just a set of boring letters consisting of As, Gs, Cs, and Ts. What was needed was something to bring those letters to life and translate them into an instruction manual for actually building a person; then we'd be better able to understand the roots of disease and generate treatments.
It happened on September 5, 2012. That was the day when one of the most ambitious international science projects you may never have heard of revealed the fruits of its labor: a collection of 30 papers published simultaneously in the journals Nature, Genome Research, and Genome Biology. Taken together, they provided the results of a multiyear research endeavor, involving over 400 scientists from 32 labs around the world, known as the ENCODE Project.
ENCODE, or the "Encyclopedia of DNA Elements," was designed to pick up where the Human Genome Project left off. It sought to annotate the specific regions of the genome that are used in the various cells of the human body and to catalogue the biochemical products of this activity.
A key takeaway from the ENCODE project is that even though our genes account for only approximately 2% of our genome, the bulk of the rest of our DNA (which used to be called "junk DNA" because it was thought to serve no real purpose) actually performs crucial regulatory functions. Think of these regulatory regions as switches, each attached to a particular gene and determining whether or not it will be expressed.
Scientists have long been aware of such DNA configurations, but thought their number was on par with the number of genes. It turns out, however, that there are millions of such regions throughout the genome, linked to each other (and to the protein-coding genes) in an extremely complicated hierarchical network. (The metaphor of a "hairball of wires" was offered by one ENCODE scientist.)
But the goodies from ENCODE don't stop there.
"It was one of those too-good-to-be-true moments."
That's what Ewan Birney, a biologist and leading scientist from the ENCODE project consortium, had to say about one of the insights gleaned by the efforts of his team.
Back to the Human Genome Project for a moment. Much of the excitement that followed the project's completion a decade ago had to do with the notion that since we now knew how the genome was "supposed" to look, we could identify the genes whose mutations were responsible for certain diseases and devise an appropriate remedy.
As noted earlier, however, things aren't that simple. Genes are only part of the story. We know that from the results of studies that were designed to correlate genetic mutations with specific diseases (known as Genome-Wide Association Studies, or GWAS). In the majority of cases, it was found that disease-correlated DNA variants lay in the vast noncoding regions of the genome, rather than in the genes themselves. With limited understanding of the actual functional processes performed by this DNA, science has been largely unable to come up with an appropriate remedy in situations where the original DNA message has been altered.
But thanks to ENCODE, we may be on the way to overcoming this obstacle. A key finding from the project—the one that caught Birney's attention—was that many of the mutations associated with disease are located in DNA regions to which the ENCODE project was able to assign a specific functionality. In particular, many mutations were found to be located in areas of our DNA known as "promoter" and "enhancer" regions—sequences that, while not coding for protein themselves, are responsible for turning genes on and off within a cell. "[This] is a really big deal," said Bradley Bernstein, an ENCODE scientist. "I don't think anyone predicted that [this] would be the case."
So now a whole host of new possibilities for gene therapy will begin to open up. When we can identify the biological processes in the cell that result from a mutation, it becomes much more likely that we can formulate an effective treatment. ENCODE has already identified several hundred regions of DNA that should be of interest to researchers studying specific diseases, and this number will only increase over the next few years as the huge amounts of data generated by the project continue to be analyzed.
The project has also identified the function of many noncoding RNA molecules (i.e., RNA molecules other than messenger RNAs, which are an intermediate step in the creation of a protein). Casey Extraordinary Technology subscribers need no introduction to RNAi, an extremely exciting therapeutic technology that's based on a particular type of noncoding RNA known as small interfering RNA (siRNA).
But you may not have heard of a new approach that's appeared on the scene in recent years: microRNA therapeutics. MicroRNA (miRNA) is a close cousin of siRNA, and its implications for the biotechnology landscape are no less significant. Since their discovery little more than a decade ago, these little molecules have already been widely implicated in the development of several types of cancer: some miRNAs are overexpressed in cancer cells, while others are missing entirely. Not surprisingly, there has been a widespread effort to leverage this insight into therapeutic remedies, and some miRNA-based products have already entered Phase II trials.
As biotech investors, we must remember that tomorrow's breakthroughs will result from events taking place around us today. In order to stay ahead of the market, we must be vigilant in identifying these causes before their effects have been fully brought to light. The ENCODE project, with its "too good to be true" moments, provides a good starting point. While it has received considerably less public fanfare than the Human Genome Project, for the alert investor it points the way toward a whole host of potential new breakthroughs.
* * * *
Chris Wood (right) is the senior analyst for Casey Extraordinary Technology, a contributor to The Casey Report, and an editor of The Technology Investor.
Daniel Jepson (left) is a researcher at the Harvard University Department of Immunology and Infectious Diseases.
This post first appeared in the Casey Daily Dispatch.