No known molecule, including DNA, RNA, and proteins, can replicate itself; dozens of specific proteins are required to replicate a small bacterial genome. DNA, however long it is and however many genes it can encode, is nothing without the molecular machineries to decode (that is, to transcribe and translate) its encoded genes. Amazingly, each of the three domains of life (bacteria, archaea, and eukaryotes) has its own unique way of replicating its genome, defining whether a piece of DNA is a gene or not, whether an RNA is protein-coding or not, and where transcription or translation should start and end. This creates unbridgeable gaps in between bacteria, archaea, and eukaryotes and, thus, challenges the popular belief that life came from non-life naturally and that all organisms are connected via a big evolutionary tree of life.
Keywords: abiogenesis, origin of life, information coding and decoding, DNA replication, gene transcription, gene translation
As I mentioned in #1 and #2 of this series on Facts Cannot be Ignored When Considering the Origin of Life (Tan 2022a, 2022b), an astonishing discovery of molecular biology is that biological information coding and decoding systems are organism-specific. This discovery is what prompted me to begin inquiring into the origin of eukaryotes and the origin of life sixteen years ago. Biological information coding and decoding refers to DNA replication (the process by which two copies of genomic DNA are made from one copy), gene transcription (making RNA from DNA templates), and gene translation (making proteins whose amino acid sequences are determined by the nucleotide sequences of RNA). It is impossible to fully address this discovery in one short article. This essay is merely a foretaste of what I hope to detail in the future. Specifically, a brief comparison of bacterial and eukaryotic DNA replication initiation, transcription initiation, and translation initiation will be presented to illustrate how the same vital tasks of replicating genomic DNA and transcribing and translating genes are implemented differently in different domains of life using proteins that are mostly unrelated in amino acid sequence.
A Comparison of Bacterial and Eukaryotic DNA Replication Initiation
DNA replication is indispensable for the survival and reproduction of every organism, without exception. To compare and contrast bacterial and eukaryotic DNA replication initiation, I determined the distribution of homologs of proteins involved in the initiation of bacterial and eukaryotic DNA replication. A homolog of a protein of interest is commonly regarded as a protein that shares a common ancestral gene with the protein of interest. Here, however, it is defined as a protein with some sequence similarity to the protein of interest, regardless of its origin.
The results are shown in Fig. 1. Each row represents one specific gene, with its total homolog number next to the name of the gene. Each column represents a group of organisms, with the number of species analyzed in that group listed underneath the name of that group. The number at the intersection of a row and a column is the number of species in the group of that column that contain homologs to the gene of that row. The percentage of species with homologs for a specific gene in a group is color-coded (red: 0%, no species in the group contains a homolog for that gene; green: 100%, every species in the group contains a homolog for that gene; different shades of mixed red and green: frequency between 0 and 100%, the higher the frequency, the greener the color). Orphan genes, that is, genes with no homologs in any of the organisms compared (other than the reference organism itself), are marked with red stars. Nearly orphan genes, that is, genes with homologs in 1–5 of the species compared (other than the reference organism), are marked with blue stars.
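The bookkeeping behind such a figure can be sketched in a few lines of Python. This is only a minimal illustration of the counting, color-coding, and starring rules described above, not the pipeline actually used to produce Fig. 1. The function names, the species names, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_gene(species_with_homolog: Set[str], reference: str) -> str:
    """Label a gene by how many non-reference species carry a homolog."""
    n_other = len(species_with_homolog - {reference})
    if n_other == 0:
        return "orphan"          # marked with a red star
    if n_other <= 5:
        return "nearly orphan"   # marked with a blue star
    return "shared"


def group_counts(species_with_homolog: Set[str],
                 groups: Dict[str, Set[str]]) -> Dict[str, Tuple[int, float]]:
    """For each group: (number of species with a homolog, fraction of the group)."""
    out = {}
    for name, members in groups.items():
        n = len(species_with_homolog & members)
        out[name] = (n, n / len(members) if members else 0.0)
    return out


def red_green(fraction: float) -> Tuple[float, float, float]:
    """Map a frequency in [0, 1] to an RGB mix: 0 is pure red, 1 is pure green."""
    return (1.0 - fraction, fraction, 0.0)


# Hypothetical toy data (not taken from Fig. 1); the reference organism is "bac1".
groups = {
    "eukaryotes": {"euk1", "euk2"},
    "archaea": {"arc1"},
    "bacteria": {"bac1", "bac2", "bac3"},
}
hits = {
    "geneA": {"bac1", "bac2", "bac3"},  # homologs in every bacterium, none elsewhere
    "geneB": {"bac1"},                  # found only in the reference organism
}

for gene, species in hits.items():
    print(gene, classify_gene(species, "bac1"), group_counts(species, groups))
```

In this toy input, geneA would appear fully green in the bacterial column and fully red in the eukaryotic column, which is the pattern discussed for Fig. 1.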
Fig. 1 shows that a gene (row) that is colored green on one side of the figure (bacteria are on the right, eukaryotes on the left, and archaea in the middle) is generally red on the other side. This demonstrates that most proteins used by bacteria to replicate their genomes do not have homologs in eukaryotes, and vice versa. Thus, bacteria use mostly bacteria-unique genes to replicate their genomes, and eukaryotes use mostly eukaryote-unique genes to replicate their genomes.
This is evidence that the same task is implemented differently by the three fundamental cell types.
This is not what one would expect if bacteria and eukaryotes had shared a common ancestor, since DNA replication is essential for the survival and reproduction of each and every known organism. In other words, these bacteria-unique or eukaryote-unique genes, which are needed for DNA replication in their specific domains of life, challenge the belief that eukaryotes evolved from bacteria and the belief that all life forms are connected via an evolutionary tree of life.
A Comparison of Bacterial and Eukaryotic Gene Transcription Initiation
Like DNA replication, gene transcription is indispensable for the survival and reproduction of every organism. Bacteria use one type of RNA polymerase, an enzyme that synthesizes RNA using DNA as a template, that is made of five different proteins to transcribe all genes encoded in their genomes. In contrast, eukaryotes use at least three RNA polymerases made of 12–17 different proteins to transcribe their genes, each polymerase responsible for a specific set of genes (fig. 2).
Strikingly, the bacterial RNA polymerase core needs to form a complex with only one other protein to locate gene promoters, while the eukaryotic RNA polymerases, which are made of two to three times more proteins, need to form complexes with multiple general transcription factors to do so (fig. 3). Two of the Pol II transcription factors, TFIID and TFIIH, are made of 14 and 10 different proteins, respectively. TFIIB is the only Pol II general transcription factor that is made of a single protein.
To ensure that each gene in a eukaryotic cell is expressed when needed, and only when needed, in the cells needed, at the levels needed, gene-specific transcription factors that bind DNA motifs in gene enhancers are needed. However, most gene-specific transcription factors are unable to directly bind, and thus regulate, the basal transcription machinery made of the RNA polymerase and the general transcription factors. This problem is solved in eukaryotic cells using a protein complex called mediator. The mediator does not bind DNA, but it interacts with the transcription factors that can bind DNA. In addition, it interacts, directly, with the basal transcription machinery.
The mediator alone is composed of more than twenty different proteins (the exact number of proteins varies in different organisms). The structures of the Pol II pre-initiation complex (PIC) with mediator (except its kinase domain) from human and mouse have been solved (Chen, Yin et al. 2021; Zhao et al. 2021). The structure of the human complex, which contains 68 unique proteins, is shown in fig. 4. With the inclusion of the mediator and the general transcription factors in the transcription machinery, Pol II itself appears small and insignificant.
In short, bacterial and eukaryotic genes are defined and recognized differently. This is like Chinese and English: they use totally different alphabets, words, and grammars and need to be read differently.
Again, same task, but entirely different implementations.
This creates another challenge to the idea that life came from non-life, that complicated life evolved from simpler ones, and that all life forms are connected via an evolutionary tree of life.
A Comparison of Bacterial and Eukaryotic Gene Translation Initiation
Like DNA replication and gene transcription, gene translation is indispensable for the survival and reproduction of every organism. Gene translation occurs in ribosomes in all organisms. Bacterial ribosomes are made of fifty or so ribosomal proteins and three ribosomal RNAs, while eukaryotic ribosomes are made of about 80 ribosomal proteins and four or more ribosomal RNAs. Some of the bacterial ribosomal proteins are similar in amino acid sequence to eukaryotic ribosomal proteins and some are unique to bacteria.
Strikingly, the three domains of life—bacteria, archaea, and eukaryotes—each has its own peculiar way of determining whether a piece of RNA is protein-coding and, if it is protein-coding, where to start and to stop translation. Fig. 5 provides a comparison of bacterial and eukaryotic translation initiation.
Note that we only need three fingers to fully count the bacterial translation initiation factors (IFs) (fig. 6). To number the eukaryotic translation initiation factors (eIFs), we will need a six-fingered hand. Furthermore, the fingers need to have branches because we need to count the letters A, B, C, etc. at the same time. Note that initiation factors whose names contain the same number but different letters are unrelated in protein composition or function, and that each bacterial initiation factor is made of one single protein, while many eukaryotic initiation factors are made of multiple subunits. For example, eukaryotic initiation factor 3, which is represented with an unbranched finger, is made of 6 to 13 different proteins (the number varies depending on the organism).
As with DNA replication and gene transcription, the number of proteins involved is not the key issue. The key issue is the identities of the proteins. Both bacteria and eukaryotes require their organism-specific proteins to translate their genes.
Once more, same task, but different implementations.
This creates yet another challenge to the idea that life came from non-life, that complicated life evolved from simpler ones, and that all life forms are connected via an evolutionary tree of life.
A Comparison of Archaeal and Eukaryotic Gene Transcription
It has often been said that archaea can function as intermediates between bacteria and eukaryotes (for example, Gribaldo et al. 2010; Koonin 2015; Rochette, Brochier-Armanet, and Gouy 2014; Vesteg and Krajcovic 2011; Williams et al. 2013). However, whether eukaryotes are more similar to archaea or more similar to bacteria depends on which genes are compared. For instance, the eukaryotic information processing system is more similar to that of archaea than to that of bacteria, but the eukaryotic metabolic system is more similar to that of bacteria than to that of archaea (Tan and Tomkins 2015a; Tan 2017).
I examined the archaeal and eukaryotic information processing systems and found that they are distinct and unexchangeable (Tan 2017; Tan and Tomkins 2015a). For example, the largest subunit, known as Rpb1, of eukaryotic RNA polymerase II, has homologs in archaea (fig. 7). However, the eukaryotic protein has a unique C-terminal tail (fig. 7, red box) that has no archaeal counterpart. This C-terminal tail is necessary for eukaryotic gene transcription initiation, elongation, and termination. In other words, without the eukaryote-specific C-terminal tail, eukaryotic life would be impossible since its protein-coding genes would not be transcribed, and, consequently, its proteins could not be generated and its genomic DNA whose replication requires proteins could not be replicated.
Organism-Specific Cryptographic Keys
To borrow the language of cryptography, the inheritable genetic information that determines life or death of all living beings is encrypted. Furthermore, the information is encrypted in different ways in different organisms, one way for bacteria, another way for archaea, and yet another way for eukaryotes. These organisms have to use their own unique cryptographic keys to decipher their genomes. These organisms differ in whether they (not we human intellectuals!) regard a segment of DNA as a gene or not, whether a gene is protein-coding or non-protein-coding, where a transcription should start and end, and where a translation should start and end. Their cryptographic keys are their own RNAs and proteins present in their own cells and those that they can make themselves using their own molecular machineries.
For example, even if the same RNA transcript does encode a protein in both cell types, it may be translated into totally unrelated proteins by a bacterial cell, which uses the Shine-Dalgarno sequence to identify its translation start site, and by a eukaryotic cell, which uses a scanning mechanism to do so (fig. 8).
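The point can be illustrated with a deliberately simplified Python sketch. It assumes that the bacterial rule is "use the first AUG a short distance downstream of a Shine-Dalgarno motif" and that the eukaryotic rule is "use the first AUG encountered when scanning from the 5′ end." The motif AGGAGG, the spacing window, the function names, and the toy transcript are all assumptions made for illustration. Real initiation involves far more than this (for example, rRNA base pairing, alternative start codons, the 5′ cap, and the sequence context around the AUG), none of which is modeled here.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built codon by codon in UCAG order ("*" marks a stop codon).
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def translate(rna, start):
    """Translate from `start` until a stop codon or the end of the RNA."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def bacterial_start(rna, sd="AGGAGG", min_gap=4, max_gap=12):
    """Toy bacterial rule: first AUG within a window downstream of the SD motif."""
    sd_pos = rna.find(sd)
    if sd_pos < 0:
        return None
    sd_end = sd_pos + len(sd)
    for start in range(sd_end + min_gap, sd_end + max_gap + 1):
        if rna[start:start + 3] == "AUG":
            return start
    return None


def eukaryotic_start(rna):
    """Toy scanning rule: first AUG from the 5' end."""
    pos = rna.find("AUG")
    return pos if pos >= 0 else None


# Hypothetical transcript: an early AUG, an SD-like motif, then a second AUG.
rna = "GC" + "AUGGCUAGGAGGUAA" + "CC" + "AUGAAAUUUUAA" + "GG"

print(translate(rna, eukaryotic_start(rna)))  # MARR
print(translate(rna, bacterial_start(rna)))   # MKF
```

The same string of nucleotides yields two unrelated peptides simply because the two rules pick different start sites.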
Using an analogy to cryptography, the encrypted string of letters “NOT TO BE NOT” can mean quite different, even opposite, things depending on which cryptographic key is used (fig. 9).
Mutation and Natural Selection
Could accumulated mutation and natural selection account for the vast number of organism-specific genes necessary for the survival and reproduction of organisms, especially those for their DNA replication, transcription, and translation?
With this question in mind, I have investigated what has been discovered, experimentally, about mutation, natural selection, and the formation of new genes. The answer to the question is a clear “NO.” Some results of my early investigations were reported in Tan (2015).
A Logical Conclusion
A logical conclusion from analyses of the genes involved in biological information coding and decoding and of the genes necessary for organism survival and reproduction is that not all organisms can be linked via a big evolutionary tree of life. Organisms on earth are better represented as a forest of trees of life, as I proposed a few years ago (Tan 2016). What many believe and teach about the origin of life and the origin of biodiversity does not agree with what the genes are showing us.
Unfortunately, this conclusion is not one that many would feel comfortable with.
An Experimental Demonstration
The organism-specificity and non-interchangeability, at least at the domain level of life, of biological information coding and decoding systems (Tan and Tomkins 2015a, 2015b), the necessity of many organism-specific essential genes (Tan 2015), and the inability of mutation and natural selection to generate a single essential gene are demonstrated by molecular cloning experiments. A notable example is Craig Venter’s creation of “the first self-replicating species we’ve had on the planet whose parent is a computer” (Gibson et al. 2010; Tan 2016; Venter 2010). Whether the parent of the synthesized cell is a computer is controversial (Matuscak and Tan 2016), and it was not Venter’s intention to provide such a demonstration.
Briefly, Craig Venter and colleagues synthesized the entire one-megabase (Mb) genome of Mycoplasma mycoides in yeast, a eukaryotic cell (Gibson et al. 2010) (fig. 10). However, the yeast cells could not create M. mycoides cells using the cloned bacterial genome. The genes encoded in the cloned genome need to be transcribed and translated using the molecular machines from Mycoplasma capricolum, a cell that shares more than 99% identity for the 79 core proteins involved in gene translation, as well as their ribosomal DNA, with the genome donor M. mycoides (Labroussaa et al. 2016).
The inability of a yeast cell to decode the bacterial M. mycoides genome is a consequence of the domain-specific information processing systems, including DNA replication, transcription, and translation (Tan and Tomkins 2015a, 2015b). As mentioned earlier, what is striking is not so much that the numbers of proteins involved are different (as important as that is) but that the identities of these proteins are different. The proteins used for bacterial DNA replication, transcription, and translation are mostly bacteria-specific; they do not have known homologs in eukaryotes. Likewise, the proteins used for eukaryotic DNA replication, transcription, and translation are mostly eukaryote-specific; they do not have known homologs in bacteria.
To overcome the barrier between bacterial and eukaryotic DNA replication machinery, a yeast origin of replication had to be artificially incorporated into the bacterial genome before the bacterial genome could be cloned in yeast (fig. 10, the leftmost arrow on the top). Furthermore, at each step in which the cloned bacterial DNA needs to be amplified in both E. coli and yeast, an E. coli-yeast shuttle vector that contains both an E. coli origin of replication and a yeast origin of replication (fig. 11) had to be used.
To overcome the barriers between bacterial and eukaryotic transcription and translation machineries, the bacterial selectable genes (usually antibiotic-resistance genes) were placed under the control of a bacterial gene promoter and the yeast selectable genes (usually nutrient-selectable genes) were placed under the control of a yeast gene promoter, so that the selectable marker genes can be recognized by the transcription and translation machinery of the corresponding host organisms and be transcribed and translated. The selectable genes are necessary for the identification and isolation of the cloned DNA.
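The logic of the shuttle vector can be captured in a toy Python model. It assumes only what is described above: a piece of DNA is replicated in a host only if it carries an origin of replication that the host recognizes, and it can be selected for in a host only if it carries a marker gene under a promoter that the host recognizes. The class names, method names, and example vectors are hypothetical and are not taken from Gibson et al. (2010).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class MarkerCassette:
    marker: str          # e.g., an antibiotic-resistance or nutrient-selectable gene
    promoter_host: str   # host whose transcription machinery recognizes the promoter


@dataclass
class Vector:
    origins: Set[str]    # hosts that recognize an origin of replication on this DNA
    cassettes: List[MarkerCassette] = field(default_factory=list)

    def replicates_in(self, host: str) -> bool:
        return host in self.origins

    def selectable_in(self, host: str) -> bool:
        return any(c.promoter_host == host for c in self.cassettes)

    def can_be_maintained_in(self, host: str) -> bool:
        # Both conditions are required in this toy model.
        return self.replicates_in(host) and self.selectable_in(host)


# Hypothetical vectors for illustration.
shuttle = Vector(
    origins={"E. coli", "yeast"},
    cassettes=[MarkerCassette("antibiotic resistance", "E. coli"),
               MarkerCassette("nutrient marker", "yeast")],
)
bacterial_only = Vector(
    origins={"E. coli"},
    cassettes=[MarkerCassette("antibiotic resistance", "E. coli")],
)

for host in ("E. coli", "yeast"):
    print(host, shuttle.can_be_maintained_in(host),
          bacterial_only.can_be_maintained_in(host))
```

In this model, the bacterial-only construct cannot be maintained in yeast until a yeast origin and a yeast-promoter marker are added to it, which mirrors the engineering steps described above.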
Also, because of the barriers between bacterial and eukaryotic transcription and translation machineries and difference in coding and decoding strategies, the cloned bacterial genome that was fully assembled and amplified in yeast had to be transferred into a bacterial host cell to be “activated.”
Even after all this, the initial cloned genome was not able to generate a self-replicating synthetic bacterial cell because one base pair in one of M. mycoides’ hundreds of essential genes was missed at the beginning of the cloning process and the mistake escaped detection along the way. The missed base pair had to be added back manually to enable the synthetic cell to survive and reproduce. For a more detailed description of what Venter’s team did and the obstacles they had to overcome during the creation of their synthetic bacterium, readers are referred to Tan and Stadler (2020).
The inheritable genetic information that determines life or death of all living beings is not only encrypted but is encrypted in an organism-specific manner. To decipher their genomes, different organisms have to use their own unique cryptographic keys, which are made of their own RNAs and proteins present in their own cells and those that they can make themselves using their own molecular machineries. This creates discontinuities between bacteria, archaea, and eukaryotes. The discontinuities can be overcome during molecular cloning using specifically engineered cloning vectors, that is, by human intelligence, but not by the organisms whose DNA is cloned. Therefore, the organism-specific genetic information coding and decoding systems challenge the popular belief that life came from non-life naturally and that all organisms are connected via a big evolutionary tree of life.
Chen, Xizi, Yilun Qi, Zihan Wu, Xinxin Wang, Jiabei Li, Dan Zhao, Haifeng Hou et al. 2021. “Structural Insights into Preinitiation Complex Assembly on Core Promoters.” Science 372, no. 6541 (30 April). https://www.science.org/doi/10.1126/science.aba8490.
Chen, Xizi, Xiaotong Yin, Jiabei Li, Zihan Wu, Yilun Qi, Xinxin Wang, Weida Liu, and Yanhui Xu. 2021. “Structures of the Human Mediator and Mediator-Bound Preinitiation Complex.” Science 372, no. 6546 (6 May). https://www.science.org/doi/epdf/10.1126/science.abg0635.
Gibson, Daniel G., John I. Glass, Carole Lartigue, Vladimir N. Noskov, Ray-Yuan Chuang, Mikkel A. Algire, Gwynedd A. Benders et al. 2010. “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome.” Science 329, no. 5987 (20 May): 52–56.
Girbig, Mathias, Agata D. Misiaszek, Matthias K. Vorländer, Aleix Lafita, Helga Grötsch, Florence Baudin, Alex Bateman, and Christoph W. Müller. 2021. “Cryo-EM Structures of Human RNA Polymerase III in its Unbound and Transcribing States.” Nature Structural and Molecular Biology 28, no. 2 (February): 210–219.
Gribaldo, Simonetta, Anthony M. Poole, Vincent Daubin, Patrick Forterre, and Céline Brochier-Armanet. 2010. “The Origin of Eukaryotes and Their Relationship with the Archaea: Are We at a Phylogenomic Impasse?” Nature Reviews Microbiology 8, no. 10 (16 September): 743–752.
Koonin, Eugene V. 2015. “Origin of Eukaryotes from within Archaea, Archaeal Eukaryome and Bursts of Gene Gain: Eukaryogenesis Just Made Easier?” Philosophical Transactions of the Royal Society B-Biological Sciences 370, no. 1678 (26 September). https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2014.0333.
Labroussaa, Fabien, Anne Lebaudy, Vincent Baby, Geraldine Gourgues, Dominick Matteau, Sanjay Vashee, Pascal Sirand-Pugnet, Sébastien Rodrigue, and Carole Lartigue. 2016. “Impact of Donor-Recipient Phylogenetic Distance on Bacterial Genome Transplantation.” Nucleic Acids Research 44, no. 17 (August): 8501–8511.
Liu, Bin, Chuan Hong, Rick K. Huang, Zhiheng Yu, and Thomas A. Steitz. 2017. “Structural Basis of Bacterial Transcription Activation.” Science 358, no. 6365 (November 17): 947–951.
Matuscak, Scott T., and Change Laura Tan. 2016. “Who are the Parents of Mycoplasma Mycoides JCVI-syn1.0?” BIO-Complexity 2016, no. 2: 1–5. https://bio-complexity.org/ojs/index.php/main/article/view/BIO-C.2016.2/BIO-C.2016.2.
Nevers, Yannis, Arnaud Kress, Audrey Defosset, Raymond Ripp, Benjamin Linard, Julie D. Thompson, Olivier Poch, and Odile Lecompte. 2019. “OrthoInspector 3.0: Open Portal for Comparative Genomics.” Nucleic Acids Research 47, no. D1: D411–D418. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323921/.
Rochette, Nicolas C., Céline Brochier-Armanet, and Manolo Gouy. 2014. “Phylogenomic Test of the Hypotheses for the Evolutionary Origin of Eukaryotes.” Molecular Biology and Evolution 31, no. 4 (April): 832–845.
Sadian, Yashar, Florence Baudin, Lucas Tafur, Brice Murciano, Rene Wetzel, Felix Weis, and Christoph W. Müller. 2019. “Molecular Insight into RNA Polymerase I Promoter Recognition and Promoter Melting.” Nature Communications 10, no. 1 (5 December): 5543.
Schmitt, Emmanuelle, Pierre-Damien Coureux, Ramy Kazan, Gabrielle Bourgeois, Christine Lazennec-Schurdevin, and Yves Mechulam. 2020. “Recent Advances in Archaeal Translation Initiation.” Frontiers in Microbiology 11 (September): 584152. https://www.frontiersin.org/articles/10.3389/fmicb.2020.584152/full.
Sehnal, David, Alexander S. Rose, Jaroslav Koča, Stephen K. Burley, and Sameer Velankar. 2018. “Mol*: Towards a Common Library and Tools for Web Molecular Graphics.” MolVA ‘18 Proceedings of the Workshop on Molecular Graphics and Visual Analysis of Molecular Data: 29–33. https://doi.org/10.2312/molva.20181103.
Tan, Change Laura. 2015. “Using Taxonomically Restricted Essential Genes to Determine Whether Two Organisms Can Belong to the Same Family Tree.” Answers Research Journal 8 (November 4): 413–435. https://assets.answersingenesis.org/doc/articles/pdf-versions/arj/v8/taxonomically_restricted_genes_family_tree.pdf.
Tan, Change Laura. 2017. “Holistic Study of Whole Genomes.” Journal of Genome 1, no. 1: 1000e102. https://www.omicsonline.org/open-access/holistic-study-of-whole-genomes.pdf.
Tan, Change. 2016. “Big Gaps and Short Bridges: A Model for Solving the Discontinuity Problem.” Answers Research Journal 9 (6 July): 149–162. www.answersingenesis.org/arj/v9/discontinuity_problem.pdf.
Tan, Change Laura. 2022a. “Facts Cannot be Ignored When Considering the Origin of Life #1: The Necessity of Bio-monomers Not to Self-Link for the Existence of Living Organisms.” Answers Research Journal 15 (9 March): 25–29. https://answersresearchjournal.org/genetics/necessity-of-bio-monomers/.
Tan, Change Laura. 2022b. “Facts Cannot be Ignored When Considering the Origin of Life #2: Challenges in Generating the First Gene-encoding Template DNA or RNA.” Answers Research Journal 15:31–48.
Tan, Change L., and Rob Stadler. 2020. The Stairway To Life: An Origin-Of-Life Reality Check. Evorevo Books.
Tan, Change, and Jeffrey P. Tomkins. 2015a. “Information Processing Differences Between Archaea and Eukaraya— Implications for Homologs and the Myth of Eukaryogenesis.” Answers Research Journal 8 (18 March): 121–141. https://answersingenesis.org/biology/microbiology/information-processing-differences-between-archaea-and-eukarya/.
Tan, Change, and Jeffrey P. Tomkins. 2015b. “Information Processing Differences Between Bacteria and Eukarya— Implications for the Myth of Eukaryogenesis.” Answers Research Journal 8 (25 March): 143–162. https://answersingenesis.org/biology/microbiology/information-processing-differences-between-bacteria-and-eukarya/.
Venter, Craig. 2010. “Watch me unveil ‘synthetic life’.” TED (Technology, Entertainment and Design). http://www.ted.com/talks/craig_venter_unveils_synthetic_life.
Vesteg, Matej, and Juraj Krajcovic. 2011. “The Falsifiability of the Models for the Origin of Eukaryotes.” Current Genetics 57, no. 6 (December): 367–390.
Williams, Tom A., Peter G. Foster, Cymon J. Cox, and T. Martin Embley. 2013. “An Archaeal Origin of Eukaryotes Supports Only Two Primary Domains of Life.” Nature 504, no. 7479 (12 December): 231–236.
Zhao, Haiyan, Natalie Young, Jens Kalchschmidt, Jenna Lieberman, Laila El Khattabi, Rafael Casellas, and Francisco J. Asturias. 2021. “Structure of Mammalian Mediator Complex Reveals Tail Module Architecture and Interaction With a Conserved Core.” Nature Communications 12, no. 1 (1 March): 1355.