Research Highlights of the Aravind group: PIWI domain evolution

A good deal of manuscript ink has been spilled in the study of PIWI proteins, the core catalytic engine of the RNA interference (RNAi) pathway. Among its many linked functional roles, this pathway is perhaps best known for triggering post-transcriptional gene silencing in eukaryotes: PIWI proteins bind small RNAs, which in turn bind reverse-complementary stretches in target mRNAs. Despite the copious knowledge gained about almost every minute aspect of PIWI interaction with small RNA and target RNA, the evolution of PIWI, and how it came to occupy the central role in such a well-studied pathway, remains relatively murky. In a recent review published by our group along with Dr. Yoshinari Ando at Johns Hopkins, we sought to clear some of the fog surrounding the natural history of the PIWI proteins (see our recent review).

PIWI PROTEIN ANATOMY
Much of the confusion surrounding the origin of PIWI stems from a poor understanding of the domain architecture of the PIWI protein, of the individual domains that make it up, and of the extent to which this architecture is conserved in prokaryotic PIWI proteins. The core conserved architecture of eukaryotic PIWI proteins is, in order from N- to C-terminus: 1) A dyad of PIWI-N-terminal domains (PNTD1 and PNTD2). These two domains arose through duplication, followed by a circular permutation at the N-terminus of one of the copies, from an ancestral domain with four strands and two helices (see the review for more details). The boundaries of these two domains have been inaccurately established in several studies, resulting in two inappropriately defined segments termed the N-terminal and Linker-1 (L1) domains. 2) PAZ, an RNA-binding domain adopting an SH3-like fold, which plays an important role in recognition of the 3’ end of the guide strand. 3) A conserved “linker” region (typically termed Linker-2 or L2). 4) The α/β sandwich MID domain with a Rossmannoid topology, which specifically binds the 5’ end of the guide strand. 5) The PIWI catalytic domain itself, belonging to the RNase H fold, which binds the target strand and, if active, uses its metal-dependent RNase H active site to cleave target and passenger strands.

As recently recognized by the Tomari laboratory at the University of Tokyo, this core eukaryotic architecture is observed in some PIWI proteins in prokaryotes [click for ref]. However, in contrast to the strict adherence to this core architecture observed in eukaryotes, prokaryotic PIWI (pPIWI) proteins are more diverse in their architectural construction. One form of elaboration is the fusion of a Sirtuin fold nuclease to the N-terminus of the standard eukaryotic architecture. Another is the potential uncharacterized N-terminal module found in the newly discovered pPIWI-RE family [click for ref] in lieu of the PNTD1/PNTD2/PAZ/L2 domains. More strikingly, pPIWI proteins commonly consist of only the L2+MID+PIWI domains. In a wide range of prokaryotic lineages, the gene encoding such a protein is adjacent to another gene encoding a conserved region. This region is related to one previously claimed to be a novel domain, termed the APAZ (Analog of PAZ) domain, that is fused to several prokaryotic PIWI domains; the authors of this prediction reasoned that the apparently novel domain was displacing the PAZ domain and was therefore likely functionally equivalent to PAZ [click for ref]. However, in our recently published work, we determined that the assignment of a novel domain to the so-called “APAZ” region was in error. In fact, the region consists of rather standard versions of the PNTD1 and PNTD2 domains and likely a C-terminal PAZ domain. The PAZ assignment is less certain because a defining characteristic of the PAZ domain, like other members of the SH3 barrel fold, is its tendency to diverge rapidly, which can prevent homology detection using even the most sensitive methods.
Therefore, setting aside the possibly distinct N-terminal module of the pPIWI-RE family, we have determined that the core domain architecture established in eukaryotes is largely observed across all PIWI proteins. In many prokaryotes, however, this architecture is sundered into two distinct polypeptides, the first containing the PNTD1+PNTD2+PAZ domains and the second containing the L2+MID+PIWI domains, each in that order. Within these split versions, mirroring the Sirtuin fusion to the complete core architecture mentioned above, the PNTD1+PNTD2+PAZ protein is often further fused at the N-terminus to nuclease domains derived from several distinct folds, including the Restriction Endonuclease (REase) fold, the TIR fold, and the Sirtuin fold.
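
The relationship between the single-polypeptide and split architectures can be made concrete with a small sketch. The Python below is only an illustration, assuming each protein is reduced to an ordered list of domain labels taken from the text; the names CORE, SPLIT_N, SPLIT_C, strip_nuclease, and matches_core are hypothetical helpers of ours, and no sequence analysis is involved.

```python
# Illustrative sketch only: proteins are reduced to ordered lists of domain
# labels from the text. All names here are hypothetical helpers.

# Core architecture of eukaryotic PIWI proteins, N- to C-terminus.
CORE = ["PNTD1", "PNTD2", "PAZ", "L2", "MID", "PIWI"]

# Split prokaryotic form: two polypeptides encoded by adjacent genes.
SPLIT_N = ["PNTD1", "PNTD2", "PAZ"]
SPLIT_C = ["L2", "MID", "PIWI"]

# N-terminal nuclease fusions mentioned in the text.
NUCLEASE_FOLDS = {"REase", "TIR", "Sirtuin"}


def strip_nuclease(domains):
    """Drop a leading nuclease domain, if one is fused at the N-terminus."""
    if domains and domains[0] in NUCLEASE_FOLDS:
        return list(domains[1:])
    return list(domains)


def matches_core(*polypeptides):
    """Return True if the polypeptides, read in order, rebuild the core."""
    combined = []
    for polypeptide in polypeptides:
        combined.extend(strip_nuclease(polypeptide))
    return combined == CORE


print(matches_core(CORE))                            # single polypeptide
print(matches_core(["Sirtuin"] + SPLIT_N, SPLIT_C))  # split form with fusion
print(matches_core(SPLIT_C))                         # L2+MID+PIWI alone
```

In this toy representation the first two calls return True and the third returns False, mirroring the point that the L2+MID+PIWI protein recovers the full core architecture only when paired with its PNTD1+PNTD2+PAZ partner.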

Delineation of the PNTD1/2 domain duplication event, together with establishment of its deep prokaryotic roots and early adoption into the core PIWI protein architecture, clarifies functional roles recently attributed to the N-terminal region of PIWI proteins: namely, the melting of double-stranded RNA duplexes formed during PIWI loading and after target binding, and the prevention of duplex propagation. Introduction of the duplicated PNTD1/2 domains into the core PIWI architecture assisted in the formation of an extended channel, shaping an inbuilt and ancestral switch that allows the RNase H domain of the PIWI proteins to catalyze cleavage only when the PNTD1/2 domains establish an appropriate interface with the bound nucleic acids. Together with the MID domain, which recognizes the opposite terminus of the small RNA, PNTD1/2 appears to have been the primary evolutionary constraint on the characteristic modal length of the small RNAs deployed in RNAi.

With the preceding information in hand, we can begin to trace the evolutionary trajectory of PIWI domain architecture. The RNase H-like PIWI domain is most closely related to the UvrC/Endonuclease V clade of RNase H domains, with UvrC highly conserved across bacteria and EndoV conserved across eukaryotes and archaea. This suggests that at least a single copy of this RNase H clade was present in the Last Universal Common Ancestor (LUCA). Both the UvrC and EndoV domains are endoDNases and not RNases. The relatively sporadic and limited distribution of known pPIWI domains suggests they likely emerged from one of these two more broadly distributed lineages later in prokaryotic evolution, followed by dispersal across a diverse range of prokaryotes via horizontal gene transfer. This process appears to have resulted in a shift from DNA duplex to RNA-DNA hybrid duplex specificity. Emergence of the RNase H PIWI domain likely coincided with direct association with the MID domain, which descended from an unknown Rossmannoid fold precursor. This core pairing is observed in the pPIWI-RE family, which likely represents the most ancestral extant version of the PIWI domain. As the nature of the N-terminal domains fused to pPIWI-RE remains opaque, the exact timing of the association with the PNTD1/PNTD2/PAZ module (as well as the L2 domain) remains unclear; it possibly occurred with the emergence of pPIWI-RE or prior to the divergence of the class I and class II divisions of the classical pPIWI proteins. The eukaryotic PIWI protein was thus necessarily inherited from the class II division, given the core domain architecture shared between eukaryotes and class II in contrast to the sundered architecture in class I.

FUNCTIONAL SHIFTS IN PIWI EVOLUTION
Applying genome contextual information, in the form of conserved operon associations, to the above evolutionary framework sheds considerable light on the functional shifts that occurred during PIWI evolution. The most basal pPIWI lineage, the pPIWI-RE family, is contained within a three-gene island that additionally encodes both a helicase and a REase DNase, strongly suggesting that the pPIWI-RE family functions as a plasmid/phage defense system (see post below for more details on pPIWI-RE). In our work, we find evidence supporting similar functional roles for classic pPIWI protein families in the form of strong, family-specific genome linkages to endoDNases of various distinct folds. These associations were observed in all branches of the class I division and at least two branches in the class II division. Recent small RNA profiling in the bacterium Rhodobacter sphaeroides identified hybrid RNA-DNA duplexes that associate with pPIWI and play a role in plasmid silencing. This R. sphaeroides pPIWI protein belongs to a class II family associating with a DNA REase domain, further supporting a role for many classical pPIWI protein families in phage/plasmid restriction and drawing a straight line from the predicted function in the pPIWI-RE family to the classical pPIWI families. Thus, ancestrally the pPIWI domains appear to have functioned in the context of RNA-guided restriction of invasive DNA by endoDNases.
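
The genome-context reasoning above can be sketched as a simple neighbor count. The snippet is a toy illustration under stated assumptions: gene neighborhoods are given as ordered lists of family labels, the loci below are invented placeholders rather than real genome data, and neighbor_counts is a hypothetical helper, not the method used in the review.

```python
from collections import Counter


def neighbor_counts(neighborhoods, query, window=1):
    """Count gene families found within `window` genes of `query`."""
    counts = Counter()
    for genes in neighborhoods:
        for i, family in enumerate(genes):
            if family != query:
                continue
            start = max(0, i - window)
            stop = min(len(genes), i + window + 1)
            # Each neighboring family is counted once per query occurrence.
            neighbors = set(genes[start:i] + genes[i + 1:stop])
            counts.update(neighbors)
    return counts


# Placeholder loci, invented for illustration only (not real genomes).
toy_loci = [
    ["helicase", "pPIWI", "REase"],
    ["REase", "pPIWI", "transporter"],
    ["kinase", "pPIWI", "REase"],
]

counts = neighbor_counts(toy_loci, "pPIWI")
print(counts.most_common())
```

On these placeholder loci the REase label is the only neighbor that recurs in every locus, which is the kind of family-specific linkage the argument relies on. A real analysis would also need to account for phylogenetic redundancy among the genomes sampled.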

We also observed additional contextual associations in the class II division: 1) At least two families have been recruited to previously unrecognized/uncharacterized CRISPR systems. The CRISPR moniker refers to a collection of phage restriction systems that follow a similar mode of action: incorporating fragments of phage genomes into genomic loci, transcribing these fragments, and using the transcripts as guide RNA to attack the DNA (and in some cases, the RNA) of infecting agents. Despite functional similarities, the protein components comprising these systems are astonishingly diverse, incorporating several distinct nucleases and RNA-binding domains [http://www.biologydirect.com/content/6/1/38]. Our review is the first to link CRISPR-like systems with pPIWI; these systems are notable for their lack of any known processing RNase, suggesting that the pPIWI domain functions in processing and utilizing CRISPR RNAs during the phage targeting step. 2) One pPIWI family associates with an endoRNase HEPN domain [click for ref]. 3) One family conspicuously lacks any conserved association with other domains. Strikingly, the pPIWI proteins in this family share the strongest sequence affinity with the eukaryotic PIWI proteins. As the earliest eukaryotic PIWI proteins were clearly recruited to RNA-targeting systems, it appears possible that the shift from DNA targeting to RNA targeting may have actually occurred first in prokaryotes and, given the HEPN connection, possibly on multiple independent occasions.

ADOPTION OF PIWI AS THE CENTRAL COMPONENT OF EUKARYOTIC RNAi AMIDST THE RNA MILIEU OF EARLY EUKARYOTES
As part of our review, we compared small RNA data across diverse eukaryotic lineages and identified three sources of small RNA potentially utilized by the earliest-emerging iterations of eukaryotic RNAi systems: small RNA derived from 1) overlapping sites of sense-antisense transcription, 2) genomically encoded, independently transcribed hairpin sequences, and 3) double-stranded sections from larger, non-coding RNA entities (including snoRNA, tRNA, etc.). Surprisingly, the most broadly distributed and ancestral of these three sources appears to be sense-antisense transcriptional sites. Thus, it appears possible that the earliest PIWI-centered RNAi systems in eukaryotes acquired their substrates from sense-antisense transcription. This dovetails nicely with recent research on RNA expression indicating that bacteria are engulfed in a transcriptional landscape consisting of such sense-antisense RNA transcriptional products [click for ref], a condition likely mirrored in the eukaryotic stem lineage.

While the above neatly explains both the architectural inheritance and functional shifts taking place during PIWI evolution, it fails to address the logic behind selection of the PIWI domain as the central catalytic component of eukaryotic RNAi. After all, prokaryotes possess their own widespread, well-elaborated RNA-based interference/restriction system: the aforementioned CRISPR/Cas system, in addition to the less frequently observed pPIWI-dependent systems. Why rebuild an RNAi system from scratch, in the process selecting a central component from a relatively infrequently utilized restriction system? A possible answer to this question lies in the loss of several other multigene defense systems during the prokaryote-eukaryote transition, such as the classic restriction-modification (R-M) systems, the Pgl system, and toxin–antitoxin systems. All of these systems are themselves mobile, selfish elements that appear to depend on strong genomic linkage (i.e. existence of operons) for the physical assembly of their products and for neutralization of their toxic components via the linkage of transcription and translation in prokaryotes. The emergence of the nucleus in eukaryotes, with the resulting breakdown of transcription–translation coupling, rendered such systems incapable of survival owing to the potential danger of the toxic restriction components to the cell. Indeed, expression of CRISPR/Cas systems (e.g., Type II systems) in eukaryotes with appropriate RNA guides introduces double-strand breaks in DNA with serious mutagenic consequences. The eukaryotic RNAi system therefore appears to have been rebuilt by elaboration around a core formed by the simpler prokaryotic pPIWI-based systems, specifically those that did not have strong operonic linkages with DNA-targeting components.

The Cas9-containing CRISPR systems, which are thematically similar in combining an RNase H domain with a restriction system-like HNH domain inserted into the former, have recently proven to be raging successes as biotechnological reagents of gene disruption [click for ref]. In light of this, it might be useful to explore the diverse range of pPIWI-guided restriction systems as potential biotechnological reagents for similar purposes.

Tuesday, December 17, 2013
