Research Highlights of the Aravind group: On the origins of the bacterial transcription apparatus

Many years ago, a little after the first RNA polymerase structures were solved, we obtained several remarkable insights into the core transcription apparatus of life. We were the first to show that the RNA polymerase subunits that are cognates of the bacterial beta and beta-prime subunits contain recognizable, evolutionarily conserved domains, and that each of these subunits contributes a double-psi beta barrel domain to the active site. We also showed that the polymerase subunits accreted several other domains in a lineage-specific manner; these differ between the archaeo-eukaryotic and the bacterial subunits, and even among the bacterial versions. Our study also established the common origin of the RNA-dependent RNA polymerase involved in RNAi and the cellular DNA-dependent RNA polymerases (see [Reference 1] and [Reference 2]).

Recently, we conducted a reanalysis of the bacterial transcription apparatus, and several new insights emerged from this study that have refined or redefined our thinking on the origins of the transcriptional apparatus [Click to read]. Some of these new findings were discussed at greater length with a leading researcher in the field of transcription, and part of the correspondence is reproduced below as questions and answers.

Question: One of the new points uncovered in this study is the shared evolutionary ancestry of the archaeo-eukaryotic TFIIB and the bacterial sigma factor, based on the structural similarity of their cognate HTH domains, which interact with similar sites on the archaeo-eukaryotic and bacterial RNAPs, respectively. Is the homology in any way reflected at the sequence level?

Answer: The simple answer to the question is yes. Using different sequence-profile methods, we can detect statistically significant sequence similarity between the TFIIB and sigma HTHs. Hence, there is no doubt about their evolutionary relatedness and descent from a common ancestor. For example, a profile-profile comparison (e.g., with HHpred) of the HMM of archaeal TFIIB orthologs against the sigma70-like superfamily gives p = 1.8e-6 with a probability of 86%, and there are many more such lines of support.

Question: Could the similarity between the transcription factor-RNAP interactions in the bacterial holo-RNAP and the RNAPII-TFIIB / RNAP-TFB complexes be a case of convergent evolution?

Answer: Several aspects of the interactions of these factors with the bacterial and archaeo-eukaryotic RNAPs are very likely to be convergent, and we have no counter-argument in this regard. Our main point is the orthology of sigma and TFIIB despite their being only distantly related, which now seems likely to us.

Question: The current consensus in the field is that there are no real sigma homologs in archaea, or eukaryotes. It is argued that the LUCA RNAP could have initiated in a transcription factor-independent manner, and that the sigma and TFIIB/TFB-related factors emerged in evolution following the split of the bacterial and archaeo-eukaryotic lineages.

Answer (with a bit of history from LA): Many moons ago, in our early days of sequence analysis, we studied the HTHs in considerable depth. One thing that became clear was that all these HTHs, whether sigma or TFIIB, certainly shared a common origin (a view articulated in papers that we wrote several years later, pmid: 10556324, and in another from 2005, pmid: 15808743). As a result of these investigations it became clear that TFIIB (of course including TFB)/cyclin/RB and sigma are *real homologs*, but throughout that period the question remained as to whether they were *real orthologs*. This was because many other basal HTHs also show significant similarity to each of them. Of course, we could rule out domains like the TFIIE wHTH and the MBF-like 4-helical HTHs from contending for orthology with sigma, because they belong to different lineages of HTHs that have their own clear-cut bacterial cognates. But the status of sigma remained unclear. In the course of the above-mentioned papers, I took the stance that sigma and TFIIB, while being genuine homologs, were independent recruitments as basal TFs that interacted with the RNAP. Since 2005, however, we have had the opportunity to understand RNA polymerase evolution better, building on our earlier studies of these proteins (pmid: 12553882, 15194191) and aided by the various RNAP versions from diverse selfish elements that offered potential evolutionary intermediates. From this it became clear that the RNAPs began as enzymes that could have initiated transcription independently of TFs, especially given that these versions lacked any special adaptation to interact with TFs, or instead had built-in HTH domains that might have substituted for the TF. But the beta cognates of the RNAPs of cellular life are unified by one striking synapomorphy, the insertion of the SBHM within the catalytic DPBB domain, which could not have arisen by convergence.
The appearance of this insert would indicate the emergence of interactions with a DNA-bound TF, since the SBHM plays this role in all three superkingdoms of life and is absent from the RdRP-like RNA polymerases (e.g., YonO) and from the RNAPs of selfish elements such as the NCgl1702-type RNAPs. This, taken together with the homology of sigma and TFIIB and the fact that both have double HTHs, made us reconsider our former position and accept the simpler explanation that sigma and TFIIB are orthologs, albeit distant in sequence. Of course, this sequence divergence is not surprising given the many independent innovations happening around them, such as the emergence of TBP in the archaeo-eukaryotic lineage.

Question: If the primordial ribozyme RNAP evolved into the extant multisubunit RNAP by recruiting a dimeric DPBB protein cofactor that usurped the active site, and over time increased its subunit complexity, where does that leave the single-subunit enzymes? Did they emerge later, earlier, or at the same time? Extant single-subunit nucleic acid polymerases collectively display all these activities (RNA polymerase, DNA polymerase, reverse transcriptase, etc.). Assuming that they predated multisubunit RNAPs, when did the changing of the guard occur, and for what functional reasons or selective advantages?

Answer: Currently we can list the following major independent inventions of RNA polymerase activity:
1) Within the RRM-like fold (the classical palm-containing polymerases): 1.1) the RNA viral RdRPs; 1.2) the THG1 (5'->3')-CRISPR-like RNA polymerases (at least some are RdRPs); and 1.3) the phage T7-like RNAPs.
2) Within the RRM-like fold with a flange: the archaeo-eukaryotic-type primases.
3) Within the TOPRIM fold: the DNAG-like primases.
4) Within the pol-beta fold: the CCA-adding enzyme/poly(A) polymerase-like enzymes.
5) Within the DPBB fold: the double-psi beta barrel RdRPs and DdRPs.

While there were many inventions of RNA polymerases, the following observations seem to hold. The RNAPs in group 1.1 are the main replicative enzymes that replicate RNA in independent replicons. While the double-psi beta barrel RdRPs replicate small RNAs in the eukaryotic RNAi system, there is currently no evidence that they are dominant replicative enzymes of large replicons. The RdRPs in group 1.1 are further closely related to the replicative reverse transcriptases, which appear to have a single origin. On the other hand, representatives from groups 1.1, 1.3, 2, 3 and 5 can be associated with replication in the context of synthesizing the RNA primer for DNA replication. Additionally, the primpols from group 2 can replicate DNA after initiating it with an RNA polymerase activity for priming. We are of the opinion that RNA was indeed more likely the primary nucleic acid (supported by: 1) its catalytic and replicative capacity; 2) its association with polypeptide templating; and 3) the priming problem, which makes DNA a difficult starting genome). This conclusion, combined with the above observations regarding the RNAPs of group 1 and their relationship to RTs, leads us to propose that the polymerases of group 1.1 were the first to emerge. As they radiated, they enabled the rise of DNA genomes through the origin of reverse-transcribing ability. The emergence of DNA in turn offered a new niche for RNA polymerases because of the priming problem. This selective force appears to have resulted in the emergence of multiple RNA primer-synthesizing enzymes (early representatives of groups 1.3, 2, 3 and 5), as evidenced by the above observations. Even at this stage it is possible that there was a reverse-transcribing intermediate in replication, which also helped solve the transcription problem for DNA replicons. The rise of large DNA replicons appears to have created the pressure for transcription-specific RNAPs.
This unique niche appears to have favored two major groups of RNA polymerases, 1.3 and 5, but in the lineage leading to the cellular replicons group 5 seems to have dominated. We suspect that elements of the architecture of the double-psi beta barrel polymerases made them more effective transcription enzymes owing to: 1) their ability to initiate transcription at internal sites, independently of the replication-origin signal for which the other enzymes were optimized; and 2) the interfaces they offer for regulation, in particular the distinctive bihelical extension preceded by two extended segments forming a standalone hairpin in beta-prime. The latest analysis of the evolution of double-psi beta barrel RNAPs suggests that they began as a fusion of two DPBBs in a single polypeptide, followed by a split prior to LUCA.

Question: Could these accretions have been responsible for improved regulatory potential or higher fidelity? In that context it is noteworthy that no single-subunit RNAP can 'backtrack' and undergo transcript cleavage.

Answer: The addition of subunits, basal TFs, SBHMs and other domains clearly points in the direction of continuous evolution favoring higher fidelity and regulatory potential. In particular, it might have helped provide robustness to this central cellular system in the face of mutational "attack" (over-engineering). The last point of the question is noteworthy and might have been a selective force in the later evolution of the RNAPs.

Question: Since RNAPs are predicted to have originated as ribozymes and to have passed through an RNA-protein stage, why has the ribosome apparently been slower to lose its RNA components than the nucleic acid polymerases?

Answer: First, regarding the ribosome, where the RNA plays a role in peptidyl transfer: we have recently studied extensively the emergence of peptide bond-forming activity in protein enzymes (pmid: 20023723, 20678224). There were at least 11 independent inventions of peptide ligase activity, but an examination of each of these suggests that they are unable to handle the reaction in an amino acid-independent manner. This inability of the protein peptide ligases might have allowed the RNA to persist. Further, a look at the other ancient ribozyme, RNase P, suggests that shape-selective recognition of nucleic acid structures, a feature it shares with the ribosomal RNAs, might be a key capability that proteins cannot entirely reproduce. In such cases the ribozymes would certainly persist. Moreover, RNA is a better scaffold than protein in certain contexts, and it continues to be used as such, for example in the eukaryotic Polycomb RNAs and HOTAIR. So we do not see a need for RNA to be displaced in every case. Our original ribozyme displacement hypothesis was based on observations such as: 1) several of the ancient enzymes are homologs of non-enzymatic ancient domains that bind RNA; and 2) in cases like RNase P, the protein component increases the catalytic rate of the ribozyme, potentially by increasing the local affinity for metal ions. This offers a pre-adaptation for the protein to acquire metal-binding-dependent catalysis. Now, given the new information on the evolution of the double-psi beta barrel RNAPs, it appears that the RdRP activity might be a secondary innovation. Hence, it is conceivable that the DPBB domains were merely nucleic acid-binding cofactors in an already protein-dominated world, and that their associated nucleic acid might not have had any catalytic activity. It is becoming increasingly likely that an RNA-only world (i.e., one independent of proteins) never existed, and that early RNAs in the RNA world had at best restricted catalytic capabilities. It is even possible that, right from the beginning, the basic reciprocal catalytic cycle involved early RNAs catalyzing peptide-bond formation and protein synthesis (the precursor of the ribosome), and proteins in turn catalyzing phosphodiester-bond formation and RNA synthesis.

Question: What about the evolutionary origins of the TBP fold, and of TBP itself? The fold itself can be found in RNase HIII and DNA glycosylases, but it has not been shown to mediate any direct interactions with DNA; such interactions appear to have emerged later, with TBP in the archaeo-eukaryotic lineage. What happened before that: did the LUCA RNAP initiate transcription in a TFIIB/sigma-dependent manner?

Answer: TBP belongs to the larger helix-grip fold (pmid: 11276083), which includes proteins with various binding capabilities. When we first showed the relationship between TBP and the RNase HIII N-terminal domain in 2001 (pmid: 11582786), that domain was the closest relative of TBP within the helix-grip fold. Since then, however, we have found another member of the fold, CCTBP, which is as closely related to TBP as the RNase HIII domain is (pmid: 19089947). Both of these are much closer to TBP than the version in the DNA glycosylases. Hence, the evolution of TBP should be understood in the context of these related domains. Of these, CCTBP is involved in sulfotransfer along with ubiquitin-like proteins. The evidence does suggest that the RNase HIII TBP-like domain might interact with DNA-RNA hybrid molecules. Hence, it appears that the TBP family acquired very distinct activities during its radiation, but the version associated with primer degradation or RNA-based DNA restriction is a more likely candidate for the precursor of TBP, the basal TF, than CCTBP, which is associated with distinct metabolic activities. However, this might change if a nucleic acid-binding activity is demonstrated for the CCTBP domain.

Saturday, January 21, 2012
