The accumulated wisdom of sequence analysis and structural biology
over the past three decades has led to the realization that the catalytic domains
of all extant nucleic acid polymerases belong to just four great superfamilies
which have had independent origins. The most widespread is the RRM
(RNA-recognition motif)-like fold of the “palm” domains seen in DNA polymerases
of superfamilies A, B and Y, reverse transcriptases, viral RNA-dependent RNA
polymerases, DNA-dependent RNA polymerases of mitochondria and certain viruses
(e.g. phage T7), archaeo-eukaryotic-type primases, and the tRNA repair enzyme Thg1. The
second most prevalent fold, the pol-β fold, is that displayed by
superfamily X DNA polymerases (e.g. pol β), bacterial
PolIII-type DNA polymerases and various template-independent RNA- and
DNA-polymerases, such as the CCA-adding enzymes, the polyA polymerases and the
terminal transferases. Notably, both these folds also feature multiple
independent innovations of synthetases that generate cyclic nucleotides or
oligonucleotides for signaling. A further innovation of
polymerase activity is seen in the TOPRIM domain shared by
DnaG-type primases and the majority of topoisomerases and gyrases. Last, all cellular
transcription enzymes, certain viral and plasmid RNA polymerases, and the
small-RNA-amplifying enzymes involved in the RNAi process, whose RNA polymerase
activity is templated by DNA or RNA, display two copies of the double-psi-β-barrel fold.
Two questions raised by these observations are: 1) What do the
structures of the catalytic domains of these polymerases tell us about the
early protein-nucleic acid world? 2) What are the implications of the repeated
innovation of cyclic nucleotide signaling among the nucleic acid polymerases?
First, in the case of at least three of the above folds, in addition to nucleic
acid polymerase activity, we also see ancient non-metal-binding, non-catalytic
versions that are likely to have just bound RNA. This suggests that
nucleotidyltransferase catalysis probably arose in the context of a more general
RNA-binding activity. Thus, the proteins, which were probably at first
“protective” or scaffolding partners of the ribozymes, displaced the RNAs in
terms of catalysis. The presence of both template-dependent and
template-independent activities in at least three of these folds suggests that,
like those of Thg1 or the CCA-adding enzymes, their earliest activities were probably
relatively generic without major participation of the protein in template
recognition.
The presence of any type of code, in the simplest case in the form
of a complementary template, or more elaborately in the form of the multi-layered
“reading” that emerged in the translation apparatus, had three consequences. 1) There had to
be safeguards for the code against environmental insults such as chemical and
radiation damage. 2) The code became an excellent invariant target for attack by
rival replicators. This selected for a wide range
of nucleic-acid- and ribosome-targeting effectors as mediators of this conflict,
which continues to this day at all levels of biological organization. 3)
The previous two factors meant that there was strong selection for multiple
nucleic-acid-repair systems. We posit that it was this process that favored the
multiple originations of nucleic-acid-repair activities in several structurally
distinct RNA-binding folds via emergence of key metal-coordinating residues (see Burroughs and Aravind,
2016). On the one hand, selection channeled some of these into bona fide replication and transcription enzymes,
while on the other, some of their paralogs remained in a simpler state
closer to the ancestral enzymes, as is seen today in the catalytic domains of
RNA repair enzymes like Thg1 and the CCA-adding enzymes.