Research Highlights of the Aravind group: Damage, conflict, repair and the early history of nucleic acid polymerases

The accumulated wisdom of sequence analysis and structural biology over the past three decades has led to the realization that the catalytic domains of all extant nucleic acid polymerases belong to just four great superfamilies, each of which had an independent origin. The most widespread is the RRM (RNA-recognition motif)-like fold of the “palm” domains seen in DNA polymerases of superfamilies A, B and Y, reverse transcriptases, viral RNA-dependent RNA polymerases, DNA-dependent RNA polymerases of mitochondria and certain viruses (e.g. phage T7), archaeo-eukaryotic type primases, and the tRNA repair enzyme Thg1. The second most prevalent fold, the pol-β fold, is displayed by superfamily X DNA polymerases (e.g. pol β), bacterial PolIII-type DNA polymerases and various template-independent RNA and DNA polymerases, such as the CCA-adding enzymes, the polyA polymerases and the terminal transferases. Notably, both these folds also feature multiple independent innovations of synthetases that generate cyclic nucleotide or oligonucleotide signals. A further innovation of polymerase activity is seen in the TOPRIM domain shared by DnaG-type primases and the majority of topoisomerases and gyrases. Last, two copies of the double-psi-β-barrel fold are displayed by the DNA- or RNA-templated RNA polymerases: all cellular transcription enzymes, certain viral and plasmid RNA polymerases, and the small-RNA-amplifying enzymes involved in the RNAi process.

Two questions raised by these observations are: 1) What do the structures of the catalytic domains of these polymerases tell us about the early protein-nucleic acid world? 2) What are the implications of the repeated innovation of cyclic nucleotide signaling among the nucleic acid polymerases? First, in at least three of the above folds, in addition to the nucleic acid polymerases, we also see ancient non-metal-binding, non-catalytic versions that are likely to have simply bound RNA. This suggests that nucleotidyltransferase catalysis probably arose in the context of a more general RNA-binding activity. Thus, the proteins, which were probably at first “protective” or scaffolding partners of the ribozymes, eventually displaced the RNAs as catalysts. The presence of both template-dependent and template-independent activities in at least three of these folds suggests that, as in Thg1 or the CCA-adding enzymes, their earliest activities were probably relatively generic, without major participation of the protein in template recognition.

The presence of any type of code, in the simplest case in the form of a complementary template, or more elaborately in the form of the multi-layered “reading” that emerged in the translation apparatus, meant that: 1) there had to be safeguards for the code against environmental insults such as chemical and radiation damage; 2) the code became an excellent invariant target for attack by rival replicators, which selected for a wide range of nucleic-acid- and ribosome-targeting effectors as mediators of a conflict that continues to this day at all levels of biological organization; and 3) the previous two factors imposed strong selection for multiple nucleic-acid-repair systems. We posit that it was this process that favored the multiple originations of nucleic-acid-repair activities in several structurally distinct RNA-binding folds via the emergence of key metal-coordinating residues (see Burroughs and Aravind, 2016). On one hand, selection channeled some of these into bona fide replication and transcription enzymes, while on the other hand some of their paralogs remained in a simpler state closer to the ancestral enzymes, as is seen today in the catalytic domains of RNA repair enzymes like Thg1 and the CCA-adding enzymes.

Attacks on nucleic acids, repair and the provenance of nucleotide signaling

Finally, we posit that cyclic nucleotides or oligonucleotides such as oligo 2’-5’A were a natural byproduct of the activities of at least some of these nucleotidyltransferases. Because ongoing repair activity generated these molecules in the context of attacks on nucleic acids, they were probably selected to function as signaling molecules for both biological conflict and environmental stress. Thus, we posit that the emergence of these enzymes contributed not only directly to the structure of the core of biology, i.e. information flow in “the central dogma”, but also to the less-conserved “periphery” in the form of signaling systems. Consistent with this proposal, our studies have obtained strong evidence for a major role for nucleotide signals as mediators of counter-invader attack systems. Further support emerges from the fact that the synthetases involved in nucleotide generation in several such systems, like the CRISPR systems and several cyclic-dinucleotide and 2’-5’A-centered systems (in bacteria and animal interferon signaling), share a common origin with either the CCA-adding enzyme-like clade of pol-β family nucleotidyltransferases or the Thg1-like RNA repair enzymes. For more discussion of related matters, read our latest papers [6,12].

References

1 Iyer LM, Koonin EV, Leipe DD, Aravind L. 2005. Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic acids research 33: 3875-96.

2 Iyer LM, Abhiman S, Aravind L. 2008. A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases. Biology direct 3: 39.

3 Anantharaman V, Iyer LM, Aravind L. 2010. Presence of a classical RRM-fold palm domain in Thg1-type 3'-5' nucleic acid polymerases and the origin of the GGDEF and CRISPR polymerase domains. Biology direct 5: 43.

4 Lamers MH, Georgescu RE, Lee SG, O'Donnell M, et al. 2006. Crystal structure of the catalytic alpha subunit of E. coli replicative DNA polymerase III. Cell 126: 881-92.

5 Aravind L, Koonin EV. 1999. DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic acids research 27: 1609-18.

6 Burroughs AM, Zhang D, Schaffer DE, Iyer LM, et al. 2015. Comparative genomic analyses reveal a vast, novel network of nucleotide-centric systems in biological conflicts, immunity and signaling. Nucleic acids research 43: 10633-54.

7 Aravind L, Leipe DD, Koonin EV. 1998. Toprim--a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic acids research 26: 4205-13.

8 Iyer LM, Koonin EV, Aravind L. 2003. Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC structural biology 3: 1.

9 Burroughs AM, Zhang D, Aravind L. 2015. The eukaryotic translation initiation regulator CDC123 defines a divergent clade of ATP-grasp enzymes with a predicted role in novel protein modifications. Biology direct 10: 21.

10 Zhang D, de Souza RF, Anantharaman V, Iyer LM, et al. 2012. Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics. Biology direct 7: 18.

11 Iyer LM, Zhang D, Rogozin IB, Aravind L. 2011. Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic acids research 39: 9473-97.