"Gene" Definition(s) for Synthetic Biology

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean – neither more nor less.” “The question is,” said Alice, “whether you can make words mean so many different things.” “The question is,” said Humpty Dumpty, “which is to be master—that’s all.”

Lewis Carroll (1832-1898) [pseudonym of Charles Lutwidge Dodgson] Through the Looking-Glass [1871]



1. A dictionary definition – “The basic biological unit of heredity; a segment of deoxyribonucleic acid (DNA) needed to contribute to a function.” [http://www.ehsc.orst.edu/outreach/glossary.html]

2. A Darwinian definition –

“A gene is defined as any portion of chromosomal material that potentially lasts for enough generations to serve as a unit of natural selection.” Richard Dawkins, The Selfish Gene [1976]

3. A post-genome definition – “A DNA segment that contributes to phenotype/function. In the absence of a demonstrated function a gene may be characterized by sequence, transcription or homology.”

H.M. Wain, E.A. Bruford, R.C. Lovering, M.J. Lush, M.W. Wright, and S. Povey (2002) “Guidelines for human gene nomenclature,” Genomics 79, 464-470. [HUGO Gene Nomenclature Committee]

4. A post-ENCODE definition –

“A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”

Mark B. Gerstein, Can Bruce, Joel S. Rozowsky, Deyou Zheng, Jiang Du, Jan O. Korbel, Olof Emanuelsson, Zhengdong D. Zhang, Sherman Weissman, and Michael Snyder (2007) “What is a gene, post-ENCODE? History and updated definition,” Genome Research 17, 669-681.

____________________________________________________________________________________________________________

The preceding definitions bear out the often-acknowledged fact that the word “gene” has no universally accepted definition – despite the efforts of countless textbooks to assert one. The paper by Gerstein, et al., cited above, represents a detailed and sophisticated analysis of this issue, including a documented history starting from the first use of the word in 1909.


The “dictionary” definition given above (1) captures the essential concept of a gene in simple language: a segment of DNA that encodes a function. It is unitary in the sense that a molecule is unitary: water is H2O – take away one of the H atoms and it no longer behaves as water. A gene consists of a string of nucleotides that together specify its functional role – take away one or more nucleotides and you can alter or abolish the function. Note that a gene’s encoded function may be only one part of a more complex entity (such as a single subunit of a multi-subunit protein). Of primary importance, of course, is that a gene gets passed from one generation to the next as a “unit of heredity.” Note also that the dictionary definition, by specifying DNA, excludes the genes of RNA viruses such as influenza and HIV. Those RNA sequences clearly conform to our notion of genes, even though laypeople mostly recognize only DNA, and not RNA, as genetic material.
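The “take away a nucleotide” point can be made concrete with a small sketch. It assumes the standard genetic code and uses a made-up 18-nucleotide toy sequence (not a real gene); the names translate and delete_base are hypothetical helpers invented for this illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding strand codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the single nucleotide at `index` removed."""
    return dna[:index] + dna[index + 1:]


# Toy sequence (an assumption for illustration, not a real gene).
GENE = "ATGGCTAAAGGTTGGTAA"

intact = translate(GENE)                   # "MAKGW"
mutant = translate(delete_base(GENE, 4))   # "MVKVG"
print(intact, mutant)
```

Deleting a single nucleotide shifts the reading frame, so every codon downstream of the deletion is read differently and the original stop codon is lost. Not every change is this drastic in a real gene, but the sketch shows why the string of nucleotides has to be treated as a unit.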


The “Darwinian” or evolutionary definition (2) fits the widely accepted concept of evolution acting at the level of the gene, rather than at the traditionally assumed level of the organism (or the species). Richard Dawkins’ classic book, The Selfish Gene, elegantly lays out this idea (and should be read by anyone interested in a broad grasp of genetics and evolution!).


The sequencing of whole genomes (including the human genome) has revealed nature in all its magnificent (and maddening!) complexity (3), and the recently completed pilot phase of the ENCODE project, which set out to examine thoroughly and annotate completely selected parts of the human genome (amounting to 1% of the total), pushes that perspective to its ultimate limit (4). The bottom line is that no simple definition adequately describes all genes. Many genes come in pieces separated by segments called introns, which get removed from the RNA transcript before the gene’s function is expressed. Some genes are situated in the introns of other genes. Many genes overlap each other, either on the same or on the opposite DNA strand. Some genes even consist of two parts that reside on different chromosomes!
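For readers who think in terms of coordinates, here is a minimal sketch of three of the ideas above: removing introns, taking the “union of genomic sequences” in definition (4), and testing whether two genes overlap. The sequence, the coordinates, and the function names (splice, merge_intervals, overlaps) are all invented for illustration. Intervals are 0-based and end-exclusive, and strand is ignored for simplicity.

```python
def splice(sequence, exons):
    """Join the exon segments in genomic order, dropping everything between them."""
    return "".join(sequence[start:end] for start, end in sorted(exons))


def merge_intervals(intervals):
    """Return the union of intervals as sorted, non-overlapping (start, end) spans.

    Overlapping or directly adjacent intervals are merged into one span.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a, b):
    """True if two (start, end) intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


# Toy locus: exons in upper case, one intron in lower case (made-up sequence).
LOCUS = "ATGGCC" + "gtaagtttcag" + "GGTTGGTAA"
EXONS_A = [(0, 6), (17, 26)]   # hypothetical transcript A
EXONS_B = [(2, 8), (20, 26)]   # hypothetical transcript B from the same locus

print(splice(LOCUS, EXONS_A))               # intron removed
print(merge_intervals(EXONS_A + EXONS_B))   # union of both transcripts' exons
print(overlaps((100, 500), (450, 900)))     # two genes sharing 50 positions
```

In this toy case the union of the two transcripts’ exons covers positions 0-8 and 17-26, which is one way of reading “a union of genomic sequences” in coordinate terms. A real annotation would also have to track strand and decide which products count as a “coherent set,” which this sketch does not attempt.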


It’s important to recognize that the term “gene” is not limited to the DNA (or RNA) sequences that encode protein sequences (open reading frames, ORFs, or coding sequences, CDSs). Stable, functional RNA molecules such as transfer RNA, ribosomal RNA, ribozyme RNAs, etc., also derive from genes, as do the plethora of newly discovered regulatory RNA molecules (such as the microRNAs and small interfering RNAs that mediate RNA interference, or RNAi). In fact, some DNA sequences that never get transcribed into RNA, but encode structural features important for the activity of the genome (for example, the telomeres that protect the ends of eukaryotic chromosomes), also qualify as “genes.”


Perhaps, like Humpty Dumpty, researchers have to tailor the meaning of the word “gene” to fit the problem at hand and not worry too much about other people’s problems. In synthetic biology, however, the effort to simplify and understand genetic systems focuses on clearly defining one discrete functional element of the genetic material (a “part”) and then combining it with others to create devices and systems that function in a pre-designed fashion inside living cells. One approach synthetic biologists have begun to take is to remove as much as possible of the sequence in a natural genetic system that has no apparent function, in order to streamline the system and make it amenable to further engineered changes. The unitary functional elements left behind by this process constitute “genes,” though in synthetic biology we usually call them “parts.”
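The streamlining idea can be sketched as a simple data-handling exercise: annotate the functional elements of a natural sequence, lift them out as named parts, and join them in a chosen order. Everything here is hypothetical, including the Part class, the functions extract_parts and compose, the element names, and the toy sequence; it is not the format of any real parts registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """An annotated functional element: a name plus 0-based, end-exclusive coordinates."""
    name: str
    start: int
    end: int


def extract_parts(sequence, annotations):
    """Map each part name to the subsequence it annotates."""
    return {p.name: sequence[p.start:p.end] for p in annotations}


def compose(parts, order):
    """Concatenate the named parts in the given order to form a 'device' sequence."""
    return "".join(parts[name] for name in order)


# Toy "natural" sequence: two annotated elements embedded in sequence
# with no apparent function (all made up for illustration).
NATURAL = "AAAA" + "TTGACA" + "CC" + "ATGAAATAA" + "GGG"
ANNOTATIONS = [
    Part("control_element", 4, 10),
    Part("coding_element", 12, 21),
]

parts = extract_parts(NATURAL, ANNOTATIONS)
device = compose(parts, ["control_element", "coding_element"])
print(device)  # 15 of the original 24 nucleotides remain
```

The sketch only rearranges strings. Whether a streamlined system still works in a living cell is an experimental question, since sequence with “no apparent function” may turn out to have one.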


Note that the need of synthetic biologists to work with well-defined parts forces us to revisit the definition of “gene” and emphasize its basis in function. We also see that the tendency of biological systems to overlap and merge functions – evident at the macroscopic level in anatomical features – also confronts us on the molecular scale. – SCMohr (3/13/08)