Biologist John Mattick on Junk DNA, ENCODE, and Intelligent Design

John Mattick is a very well known professor of molecular biology at the University of Queensland in Brisbane, Australia (you can find his lab webpage here). As of January 2012, he is also the executive director of the Garvan Institute of Medical Research.

Mattick is best known for his work on elucidating the functions of non-coding RNA, an area in which he has published widely (for a list of some of these publications, go here).

You can see him in an interview on the importance and role of non-coding RNAs here.

Mattick recently published a paper in the HUGO Journal responding to critics of last year’s ENCODE results (Mattick and Dinger, 2013). In the paper, he contests the argument of Graur et al. (2013) that a lack of evolutionary conservation implies non-function. In fact, there are many known functional sequences that show no discernible evidence of sequence conservation (e.g., Vakhrusheva et al., 2013; Pang et al., 2006).

Mattick also briefly discusses the C-value enigma, referring to the absence of correlation among eukaryotes between biological complexity and genome size, with differences in genome size being due largely to differences in the amount of non-coding DNA. Some argue that this provides evidence for pervasive non-functionality in the human genome (e.g., Doolittle, 2013; Eddy, 2012).

Some protozoans in particular possess very large genomes. For example, Amoeba dubia has a genome size of around 670 billion base pairs, and Amoeba proteus has a genome size of around 290 billion base pairs. The human genome, by contrast, is only about 3 billion base pairs. The number of genes also bears little relationship to an organism’s complexity. Humans (Homo sapiens), for instance, possess approximately 22,000 genes. Rice (Oryza sativa) has around 41,000 genes. The roundworm Caenorhabditis elegans possesses about 19,500 genes.
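To put those figures side by side, here is a minimal Python sketch that expresses each genome size and gene count as a multiple of the human value. The numbers are simply the approximate ones quoted above; the dictionary and function names are made up for illustration.

```python
# Approximate figures quoted in the text (not authoritative measurements).
GENOME_SIZE_BP = {
    "Amoeba dubia": 670e9,
    "Amoeba proteus": 290e9,
    "Homo sapiens": 3e9,
}

GENE_COUNT = {
    "Homo sapiens": 22_000,
    "Oryza sativa": 41_000,
    "Caenorhabditis elegans": 19_500,
}


def relative_to_human(table):
    """Return each value in `table` divided by the Homo sapiens value."""
    human = table["Homo sapiens"]
    return {species: value / human for species, value in table.items()}


if __name__ == "__main__":
    for species, ratio in relative_to_human(GENOME_SIZE_BP).items():
        print(f"{species}: genome is {ratio:.1f}x the human genome")
    for species, ratio in relative_to_human(GENE_COUNT).items():
        print(f"{species}: gene count is {ratio:.2f}x the human count")
```

On these figures, the Amoeba dubia genome is a couple of hundred times larger than the human genome, while rice carries nearly twice as many genes as humans do.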

There are a number of ways in which the C-value paradox could be resolved, however, and the answer likely involves a combination of factors. First, we now know that through alternative RNA splicing after transcription, a single gene can produce more than one protein. Humans, for instance, produce around 100,000 proteins from 22,000 structural genes, or roughly four to five proteins per gene. Thus, an organism’s complexity cannot be viewed as a simple function of the number of genes it possesses.

Second, there is in fact a relationship between biological complexity and the extent of gene regulation. Roughly 9% of genes in Homo sapiens encode transcription factors. In Drosophila melanogaster, only about 5.5% of genes code for transcription factors; 4.2% of the genes of Caenorhabditis elegans code for transcription factors; only 3.4% of genes code for transcription factors in the budding yeast Saccharomyces cerevisiae (see Table 2 of Messina et al., 2004). When coupled with an increased network of transcriptional enhancers and promoters, such a difference could result in a much larger set of gene expression patterns. This could lead to a non-linear increase in organismal complexity (e.g. see Levine and Tjian, 2003).
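The percentages above can be compared directly with a few lines of Python. This is only a rough sketch: the fractions are the ones quoted from Messina et al. (2004), and the estimated transcription factor counts multiply them by the approximate gene totals mentioned earlier in this article, which may not be the exact gene totals that paper used. The names are hypothetical.

```python
# Fraction of genes encoding transcription factors, as quoted in the text.
TF_FRACTION = {
    "Homo sapiens": 0.09,
    "Drosophila melanogaster": 0.055,
    "Caenorhabditis elegans": 0.042,
    "Saccharomyces cerevisiae": 0.034,
}

# Approximate gene totals quoted earlier in this article (assumption: these
# are used here only for a back-of-the-envelope estimate).
GENE_COUNT = {
    "Homo sapiens": 22_000,
    "Caenorhabditis elegans": 19_500,
}


def fraction_vs(reference):
    """Return each species' TF fraction as a multiple of the reference's."""
    base = TF_FRACTION[reference]
    return {species: frac / base for species, frac in TF_FRACTION.items()}


def estimated_tf_genes(species):
    """Rough TF gene count: quoted fraction times quoted gene total."""
    return round(TF_FRACTION[species] * GENE_COUNT[species])


if __name__ == "__main__":
    for species, ratio in fraction_vs("Saccharomyces cerevisiae").items():
        print(f"{species}: {ratio:.2f}x the yeast TF fraction")
    for species in GENE_COUNT:
        print(f"{species}: ~{estimated_tf_genes(species)} TF genes")
```

On these numbers, humans and C. elegans have similar gene totals, yet the estimated number of transcription factor genes in humans is more than double that of the worm.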

There are other factors to consider as well. For example, organisms with larger cell volumes (such as amoebas) tend to produce repetitive DNA, which serves structural purposes. As Thomas Cavalier-Smith explains, when cell size increases, “there is positive selection for a corresponding increase in nuclear volume; it is generally easier to achieve this by increasing the amount of DNA rather than by altering its folding parameters” (Cavalier-Smith, 2005). Another thing to consider is that the time taken to transcribe long stretches of non-coding DNA such as introns can be of functional consequence (e.g. see Swinburne and Silver, 2008). There are thus so many different factors to take into account that it is difficult to make a watertight argument for junk DNA based on the C-value paradox.

Toward the end of his paper, Mattick weighs in on a potential source of motivation in the debate regarding the extent to which the human genome is functional. He writes,

“There may also be another factor motivating the Graur et al. and related articles (van Bakel et al. 2010; Scanlan 2012), which is suggested by the sources and selection of quotations used at the beginning of the article, as well as in the use of the phrase “evolution-free gospel” in its title (Graur et al. 2013): the argument of a largely non-functional genome is invoked by some evolutionary theorists in the debate against the proposition of intelligent design of life on earth, particularly with respect to the origin of humanity. In essence, the argument posits that the presence of non-protein-coding or so-called ‘junk DNA’ that comprises >90% of the human genome is evidence for the accumulation of evolutionary debris by blind Darwinian evolution, and argues against intelligent design, as an intelligent designer would presumably not fill the human genetic instruction set with meaningless information (Dawkins 1986; Collins 2006). This argument is threatened in the face of growing functional indices of noncoding regions of the genome, with the latter reciprocally used in support of the notion of intelligent design and to challenge the conception that natural selection accounts for the existence of complex organisms (Behe 2003; Wells 2011).”

Of course, Mattick goes on to state that “This case is, moreover, entirely consistent with the broad tenets of evolution by natural selection, although it may not be easily reconcilable with current population theory and current ideas of evolutionary neutrality.”

Mattick himself is no proponent of intelligent design. But his willingness to state upfront that a common argument against ID has been “threatened in the face of growing functional indices of noncoding regions of the genome” (even providing a citation to Jonathan Wells’s book The Myth of Junk DNA, and an open letter to Nature by Michael Behe) deserves commendation.

Mattick continues,

“In any case, that our understanding of the remarkably complex processes underlying the molecular evolution of life, including the likely evolution of evolvability (Mattick 2009c), is incomplete should not be surprising. With the emergence of transformative technologies, such as massively parallel sequencing, which provide tools to view the inner molecular workings of the genome that were inconceivable less than a decade ago, it is as important as ever that we scientists remain open to observations that challenge even the most fundamental paradigms that exist within biology today.”

It is not every day that we encounter such a humble attitude from a scientist who is prepared to candidly acknowledge the substantial incompleteness in our understanding of the mechanics of evolution. Mattick’s scholarly attitude is one that up-and-coming scientists would do well to emulate.


Cavalier-Smith, T. 2005. Economy, Speed and Size Matter: Evolutionary Forces Driving Nuclear Genome Miniaturization and Expansion. Annals of Botany 95:147-175.

Doolittle, W.F. 2013. Is junk DNA bunk? A critique of ENCODE. Proceedings of the National Academy of Sciences 110(14):5294-5300.

Eddy, S.R. 2012. The C-value paradox, junk DNA and ENCODE. Current Biology 22(21):R898-R899.

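Graur, D. et al. 2013. On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5(3):578-590.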

Levine, M. and Tjian, R. 2003. Transcription regulation and animal diversity. Nature 424:147-151.

Mattick, J.S. and Dinger, M.E. 2013. The extent of functionality in the human genome. The HUGO Journal 7:2.

Messina, D.N., et al. 2004. An ORFeome-based Analysis of Human Transcription Factor Genes and the Construction of a Microarray to Interrogate Their Expression. Genome Research 14:2041-2047.

Pang, K.C., Frith, M.C. and Mattick, J.S. 2006. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends in Genetics 22(1):1-5.

Swinburne, I.A. and Silver, P.A. 2008. Intron Delays and Transcriptional Timing During Development. Developmental Cell 14(3):324-330.

Vakhrusheva, O.A., Bazykin, G.A., and Kondrashov, A.S. 2013. Genome-Level Analysis of Selective Constraint Without Apparent Sequence Conservation. Genome Biology and Evolution 5(3):532-541.