Wednesday, January 4, 2017

Who is responsible for bad science: researchers, peer reviewers or editors?

The following is partly a case study of how research can take a wrong turn in modern molecular biology/biochemistry and partly a reflection on the sociology of the science.

In the current scientific culture it is considered a big deal to publish in particular “high profile venues”, such as the magazines Science and Nature. We can attest, based on personal experience as well as reports from our peers, that publication in these venues often entails enormous difficulty for the researchers involved on account of the peer review practices at these venues. Thus, pushing a paper through to these venues can earn one the virtual badge of a survivor or even hero of a great battle. This badge confers tangible benefits in the modern scientific system: 1) The tenure decisions of scientists at many institutions are favorably influenced by such magazine-publications. 2) The scientific productivity reviews of researchers at many institutions often receive a great boost from such publications. 3) A scientist stands to get press, awards, and higher visibility (as measured by citations), or more generally “fame”, from such publications. 4) Last and perhaps most importantly in academia, such publications could be a big factor in earning grant money for continuing research.

Unfortunately, this system of incentives makes the magazine-publication an end in itself, ahead of the actual science. In the case below we illustrate how this, together with the system at the magazines, can engender bad science.

Recently a paper was published in the Science magazine entitled: “A nuclease that mediates cell death induced by DNA damage and poly(ADP-ribose) polymerase-1”. Given our interest in novel biochemistries and our long-standing investigations of self-inflicted nucleic acid damage in programmed cell death, our interest was piqued by this paper. A closer look soon revealed that the paper had problems. Briefly:

MIF, a protein with previously reported tautomerase activity, is a member of the tautomerase superfamily which does not feature nucleases. The authors suggest that MIF is a DNase by claiming a structural relationship to nucleases of the Restriction endonuclease (REase) fold, which frequently but not always contain a PD-(D/E)XK motif. They claim that MIF contains three copies of this motif implying that it contains three copies of the REase fold. However, none of this is supported by structure or sequence evidence: 1) A DALI search with the structure of MIF does not recover any REase fold structures with Z-scores suggestive of genuine relationships (Z>3); as expected it recovers several tautomerase superfamily structures. 2) The REase fold is topologically unrelated to the tautomerase fold, which emerged from an internal duplication of a simple two-strand-helix unit. 3) REase fold catalysis requires residues (not conserved in MIF) beyond the metal-coordinating acidic residues of the PD-(D/E)XK. Moreover, the aspartates and glutamates identified by the authors are often on opposite ends of different structural elements and not proximal to coordinate a metal ion. 4) These motifs, unlike the catalytic prolines of the tautomerases, are not well-conserved even among animal orthologs.

Further, the authors claim glutamate 22 to be the catalytic residue equivalent to that of the so-called Exonuclease-Endonuclease-Phosphatase (synaptojanin-like) domain (Pfam: PF03372). This fold is unrelated to both REases (so-called PD-(D/E)XK) and tautomerases. Their claim that MIF contains a “CxxCxxHx(n)C Zinc finger” (the Rad18 Zn-finger present in FAN1/KIAA1018) is also untenable as the side chains of these residues are nowhere proximal to coordinate a Zn-ion in MIF. Hence, this proposal of DNase function in MIF is based on a flawed structural hypothesis and should be viewed with utmost caution. We have explained the above in great detail in this preprint available from bioRxiv.
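To make the motif argument above concrete, here is a minimal Python sketch of what a purely sequence-level scan for a PD-(D/E)XK-like pattern amounts to. The spacer range between "PD" and "(D/E)XK", the regular expression, and the toy sequence are all illustrative assumptions of ours; none of them comes from the paper or the preprint. The point is only that such a short, degenerate pattern is easy to match, so a hit by itself says nothing about the fold, the proximity of the residues in three dimensions, or their conservation.

```python
import re

# Hypothetical, deliberately loose pattern for a PD-(D/E)XK-like motif:
# "PD", an ASSUMED spacer of 5-30 residues, then D or E, any residue, then K.
# Real REase active sites are defined by structure, not by this regex.
MOTIF = re.compile(r"PD.{5,30}?[DE].K")


def find_motif_hits(seq):
    """Return (start_index, matched_text) for non-overlapping pattern matches."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq.upper())]


# Made-up toy sequence (not MIF, not any real protein).
toy = "MAAPDGSGSGSGEAKLL"
for start, text in find_motif_hits(toy):
    # A hit only shows that the letters occur in this order; checks such as a
    # DALI structure search and conservation across orthologs are still needed.
    print(start, text)
```

A hit from such a scan would be, at best, a starting point for the structural and conservation checks listed in points 1 to 4 above.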

Given the serious problems with this published paper we decided to resort to the standard process of post-facto scientific engagement. Within 10 days of reading the said paper we submitted a technical comment along the lines of the above-linked preprint to the Science Magazine (10/26/2016) detailing why the paper is problematic. We believe that this was important because the extraordinary claims made in the article flew in the face of the foundations of biology, i.e. the evolutionary theory, and we felt this should be made apparent to the research community. After more than a month (12/6/2016) we heard back from the editor at the Science Magazine who had handled the original article. Despite all that time spent, we received no peer reviews of our technical note. Rather, the editor decided that even though “your discussion of our recent paper is interesting” it was not worth publishing. However, the editor suggested that we submit a short summary of our note as an “eLetter” (which allows only 500 words; similar to the summary we provide above). These are non-peer-reviewed comments that are posted below the article on the Science Magazine website at the discretion of the editor. They are neither visible in any obvious way with the article nor available on PubMed, which is the standard resource used by researchers to find literature of relevance. We followed the suggestion of the editor by posting an “eLetter” (12/9/2016); eventually, the editors decided to post our comment on the magazine website (12/20/2016). Thus, almost 2 months after the original submission some form of dissenting commentary appeared on the magazine's website.

So what are the lessons to learn from this story regarding bad science? We see that there are three parties, each of which has to be blamed for different even if somewhat overlapping reasons:

1) The authors. They are to blame because they have utterly disregarded the foundations of biology in planning their experiments. In any other science, like physics, a researcher is unlikely to have any chance of being accepted as a serious scientific player in the community if s/he were unaware of the basic foundations of the science, like say classical mechanics, let alone publish in the Science Magazine. However, unfortunately, in biology several researchers can spend their publishing career without having any more than a sketchy grasp of the foundations of biology, i.e. the evolutionary theory, and how it is applied to study the functions of biomolecules. This is exactly what we see with this paper. The authors blithely disregard very basic principles of protein structure and sequence evolution to form their starting conjecture, which they then go on to support with wet lab experiments. Now, if the wet lab experiments supported their conjecture then we have reason to be suspicious of the way they were done: contamination, poor controls, or even worse, some kind of unscrupulous practice. There are indeed suspicious features regarding the experimental result of nuclease activity as pointed out in our preprint.

If this were not enough, the authors responded to our eLetter on 12/24/2016 as can be seen on the Science website. This response betrays not just a lack of understanding of the evolutionary theory but even more fundamental issues. Though the authors aligned a monomer to claim the presence of the so-called “PD-(D/E)XK” motif in MIF in their original paper, now they claim that the REase fold is seen only in the trimer of the Tautomerase fold! This is not merely an about-face regarding their original inference but indicates an even deeper lack of an idea of what comprises a protein fold. In conclusion, based on this all we can say is that they are either visually challenged or lack discernment regarding elementary geometric issues such as symmetry.

2) The reviewers. The Science Magazine helpfully provides the following details regarding the review of this paper: It was handled by a single editor who had it reviewed by three full reviewers and one advisor by the single blind method. The whole process took from 11/02/2015 to 8/22/2016 with two rounds of review prior to acceptance. This is an extraordinary overkill both in terms of time spent and number of reviewers for an article that should have been rejected outright in the first round of review itself or by the advisor if such were consulted by the editor before formal review. Why did this not happen? Given the point we make above, we fear that sadly the reviewers too, like the authors, lacked the basic qualification, i.e. sound knowledge of the implications of the evolutionary theory as applied to biomolecules. Instead, as is typical of such magazine venues, it appears that the reviewers sent the authors around for almost 10 months on a wild-goose chase of doing more experiments which are utterly worthless given that the starting premise itself is flawed. What this also shows is that at magazine venues reviewers are wont to give trouble to authors for irrelevant things while not really focusing on the key scientific issues in the paper. It is a certain mentality, sadly not uncommon among wet lab scientists, where technical issues take center-stage before asking whether the science could be meaningful (and useful) or not.

3) The Editors. Given the vastness of the field and literature and the technical expertise needed for things like sequence/structure analysis we would not blame the editor overly for missing the bad science in the original submission. Nevertheless, at venues like the magazines and high-profile journals the editors are typically former scientists themselves. Hence, we would expect that at the least they would have some quick intuition for good versus bad science. From personal experience we can say that editors at such venues often fail to see merit in genuinely good and interesting science submitted to them, while taking bad pieces like this one. Thus, we may say that the intuition for discriminating between submissions might be weak among the editors. However, even more damaging is their failure to get proper peer reviews as doorkeepers of these magazines which, as noted above, play an important role in the career of researchers. Finally, we believe that the editor’s unwillingness to properly publish a technical comment that reveals why the article amounts to bad science does even further damage to science. Although the editors did finally post an eLetter from us, as noted above, this is not visible via the regular channels of literature search. Hence, no corrective is available to go hand in hand with the original article. Thus, this move on the part of the editor strikes against the much valued “self-correcting process” that exists in science.

In conclusion, while we are not involved in any kind of design of science policy, we still have a few recommendations to make. 
  • The first relates to biology education. Modern biology education necessarily needs to go hand-in-hand with proper teaching of the evolutionary theory as it applies to biomolecules, along with the accompanying biochemistry that is needed to properly understand it. Technical skills with handling laboratory equipment and experimentation, however important, cannot be privileged over such education in the above-stated fundamentals.
  • Second, the scientific status-measuring apparatus needs to go slow on emphasizing publications in magazine-like venues as a “badge of honor”. Publications at such venues are short, thereby giving little space for detailed scientific description that helps develop a foundational argument properly and spot flawed ideas. Their peer-review system is aimed more at “causing sufficient trouble” rather than providing honest peer comments on the science.
  • Third, the members of the editorial system should come out of the echo-chambers fostered by certain “big-name researchers” or artificially constructed “hot science” and pay independent attention to the literature from diverse journals in their respective fields.
Finally, in case a reader were to think that this is an example of us making much ado about nothing or that we are sensationally drumming up an isolated case, then all we’ll say is that this is just the tip of the proverbial iceberg. Right in this venue we hope to briefly bring up more examples as and when time permits. One might also look at an earlier paper we had written on this topic. Apparently, things have not entirely changed in all those years.

Wednesday, December 7, 2016

Origins of cyclic- and oligo-nucleotides in biological conflicts from microbes to vertebrate interferons: The “CRISPR polymerase” finds a function

Recent research has pointed to fundamental ties between the emergence of conflict and the fixation of diverse polymerase activities from distinct protein folds (see earlier post). One byproduct of the emergence of several stable polymerase-based replicative systems, which depend on nucleotide transfer during nucleic acid polymer elongation, was a concomitant increase in the diversity of homologous nucleotidyltransferases (NTases) which function in the production of cyclic or oligonucleotides. The appearance of these enzymes in biological conflict contexts likely led to their selection as core components of pathways involved in conflict, with their nucleotide products contributing as signaling molecules during conflict and environmental stress response.

A recent study from our group identified a wealth of previously-unidentified systems, distributed widely across a broad swath of prokaryotic phyla, centering on just such secondary messenger nucleotide-generating NTase enzymes and their corresponding nucleotide sensor domains (click to read). In addition to this NTase-sensor pair, these conflict systems invariably contain an effector domain which is predicted to either attack a non-self-entity or initiate cell suicide. The most frequently-observed NTase embedded in these systems is a representative of the SMODS family, which is typically coupled to one of two novel sensor domains, the SAVED or AGS-C domains. 

The SMODS family belongs to the DNA polβ fold, which includes the experimentally-characterized Vibrio cholerae DncV protein which is known to generate cyclic GMP-AMP (cGAMP). Within the DNA polβ fold, the SMODS family forms a higher-order assemblage with both the eukaryotic cGAS (cGAMP synthase) and OAS (2’-5’ oligoadenylate synthase) enzymes. While the SMODS NTase was likely the “founder” NTase for these newly-identified, nucleotide-dependent conflict systems, several systems display clear evidence of displacement of the core nucleotide-generating NTase component.

In a subset of systems, we identified a previously-unidentified enzyme occupying the typical SMODS position, suggesting a displacement by a novel, uncharacterized NTase domain. Careful analysis of this domain identified it as a new member of the RRM-like fold containing a “palm” domain, with a surprising, close relationship to the catalytic domain of the CRISPR polymerases (frequently referred to as Cmr2 or Cas10). As this novel enzyme conserved the structure and sequence features necessary for NTase activity, yet lacked the N-terminal fusion to the HD phosphoesterase domain observed in the CRISPR polymerase domains, we named it the mCpol (minimal CRISPR polymerase) domain (click to read). Both the CRISPR polymerase NTase domain and the mCpol together form a higher-order assemblage of palm domain NTases to the exclusion of all other families with the GGDEF family of cyclic di-GMP synthetases.

Strikingly, mCpol domains were consistently linked in nucleotide-dependent conflict system contexts to the CARF sensor domain. This represents a further parallel between the CRISPR polymerase and mCpol domains; previous research from our group has described enrichment of CARF domains specifically in the so-called “Type III” CRISPR systems which harbor active CRISPR polymerase and HD domain fusion proteins (click to read). mCpol- and CRISPR polymerase-centered systems can thus each be conceptually thought of as containing three core components: 1) the nucleotide synthetase component, 2) the CARF sensor component, and 3) effector domain components. In mCpol systems, effectors include various pore-forming domains and the HEPN RNase domain. In CRISPR systems, the effector takes the form of the HEPN RNase domain found C-terminally fused to the CARF domain, and might also thematically extend to include other CRISPR effectors including interference caused by the cascade complexes.
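The three-component view described above can be written down as a small data structure. The following Python sketch is only a bookkeeping aid under our own naming assumptions (the `ConflictSystem` class, its field names and the `shared_parts` helper are hypothetical); the component lists are taken from the paragraph above and are not exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConflictSystem:
    """Hypothetical record for a nucleotide-dependent conflict system."""
    name: str
    synthetase: str       # 1) nucleotide synthetase component
    sensor: str           # 2) sensor component
    effectors: frozenset  # 3) effector domain components


# Components as named in the text; the labels themselves are our shorthand.
MCPOL = ConflictSystem(
    name="mCpol system",
    synthetase="mCpol",
    sensor="CARF",
    effectors=frozenset({"pore-forming domain", "HEPN RNase"}),
)
CRISPR = ConflictSystem(
    name="CRISPR polymerase system",
    synthetase="CRISPR polymerase (Cmr2/Cas10)",
    sensor="CARF",
    effectors=frozenset({"HEPN RNase"}),
)


def shared_parts(a, b):
    """Report which sensor and effector components two systems have in common."""
    return {
        "sensor": a.sensor if a.sensor == b.sensor else None,
        "effectors": sorted(a.effectors & b.effectors),
    }


print(shared_parts(MCPOL, CRISPR))
```

Laid out this way, the parallel is that the two kinds of systems differ in the synthetase slot while sharing the CARF sensor and a HEPN effector.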

The discovery of the mCpol domain and its placement within a larger context of nucleotide-dependent conflict systems therefore offers substantive insight into the evolution and function of CRISPR systems containing the CRISPR polymerase. In evolutionary terms, it appears likely that certain CRISPR systems, including the classical Type I and III systems, emerged through combination of the more minimal mCpol-CARF (and potentially HEPN) units with other mobile elements including the RAMPs and Cas1-Cas2 dyad.

In functional terms, the CRISPR polymerase itself has long remained an enigmatic domain with regard to its possible role in the CRISPR systems, with speculation at various points ranging from roles as a crRNA-amplifying polymerase to a template-independent terminal transferase or a cyclic nucleotide synthetase. Based on its relationship to the GGDEF synthetases and now with the evolutionary parallels to our newly-described, nucleotide intermediate-dependent conflict systems, we can say that these polymerases are likely generating (cyclic) nucleotides which are in turn sensed by accompanying CARF domains. Again by parallel to the many other described nucleotide-dependent conflict systems (click to read), this nucleotide signal is likely to activate the HEPN effector in these CRISPR systems. In light of this, the HD-phosphoesterase domain N-terminally fused to the CRISPR polymerase domain could provide a means of terminating this effector-activating signal by hydrolyzing the nucleotide, an action comparable to that of the cNMP phosphodiesterases with HD domains in classical cNMP signaling.


The discovery of these evolutionary connections and their resulting functional inferences will undoubtedly deepen experimental understanding of the endogenous regulation of different classes of CRISPR systems. Additionally, there is potential scope for these discoveries to bring improvements to biotechnological application of CRISPR systems in the lab.

Monday, December 5, 2016

Damage, conflict, repair and the early history of nucleic acid polymerases

The accumulated wisdom of sequence analysis and structural biology over the past three decades has led to the realization that the catalytic domains of all extant nucleic acid polymerases belong to just four great superfamilies which have had independent origins. The most widespread is the RRM (RNA-recognition motif)-like fold of the “palm” domains seen in DNA polymerases of superfamilies A, B and Y, reverse transcriptases, viral RNA-dependent RNA polymerases, DNA-dependent RNA polymerases of mitochondria and certain viruses (e.g. phage T7), archaeo-eukaryotic type primases, and the tRNA repair enzyme Thg1. The second most prevalent fold, the pol-β fold, is that displayed by the superfamily X (e.g. pol β), bacterial PolIII-type DNA polymerases and various template-independent RNA- and DNA-polymerases, such as the CCA-adding enzymes, the polyA polymerases and the terminal transferases. Notably, both these folds also feature multiple independent innovations of synthetases that generate signaling cyclic nucleotides or oligonucleotides. A further innovation of polymerase activity is seen in the TOPRIM domain shared by DnaG-type primases and the majority of topoisomerases and gyrases. Last, the DNA- or RNA-templated RNA polymerase activity of all cellular transcription enzymes, certain viral and plasmid RNA polymerases, and the small RNA-amplifying enzymes involved in the RNAi process resides in catalytic domains that display two copies of the double-psi-β-barrel fold.

Two questions raised by these observations are: 1) What do the structures of the catalytic domains of these polymerases tell us about the early protein-nucleic acid world? 2) What are the implications of the repeated innovation of cyclic nucleotide signaling among the nucleic acid polymerases? First, in the case of at least three of the above folds, in addition to nucleic acid polymerase activity, we also see ancient non-metal-binding, non-catalytic versions that are likely to have just bound RNA. This suggests that the nucleotidyltransferase catalysis probably arose in the context of a more general RNA-binding activity. Thus, the proteins, which were probably at first “protective” or scaffolding partners of the ribozymes, displaced the RNAs in terms of catalysis. The presence of both template-dependent and template-independent activities in at least three of these folds suggests that, like Thg1 or the CCA-adding enzymes, their earliest activities were probably relatively generic without major participation of the protein in template recognition.

The presence of any type of code, in the simplest case in the form of a complementary template, or more elaborately in the form of multi-layered “reading” that emerged in the translation apparatus meant that: 1) there had to be safe-guards for the code against environmental insults such as chemical and radiation damage. 2) The code becomes an excellent invariant for attack by competing rival replicators. This selected a wide range of nucleic-acid- and ribosome-targeting effectors as mediators of this conflict that continues to this date at all levels of biological organization. 3) The previous two factors meant that there was strong selection for multiple nucleic-acid-repair systems. We posit that it was this process that favored the multiple originations of nucleic-acid-repair activities in several structurally distinct RNA-binding folds via emergence of key metal-coordinating residues (see Burroughs and Aravind, 2016). On one hand selection channeled some of these into bona fide replication and transcription enzymes, while on the other hand some of their paralogs remained in a simpler state closer to the ancestral enzymes, as is seen today in the catalytic domains of RNA repair enzymes like Thg1 and the CCA-adding enzymes.

Attacks on Nucleic acids, repair and the provenance of nucleotide signaling
Finally, we posit that a natural byproduct of the activities of at least some of these nucleotidyltransferases were cyclic nucleotides or oligonucleotides like oligo 2’-5’A. The generation of these in the context of attacks on nucleic acids, due to the ongoing repair activity, probably selected for them functioning as signaling molecules for both biological conflict and environmental stress. Thus, we posit that the emergence of these enzymes contributed not only directly to the structure of the core of biology, i.e. information flow in “the central dogma”, but also to the less-conserved “periphery” in the form of signaling systems. Consistent with this proposal, our studies have obtained strong evidence for a major role for nucleotide-signals as mediators of counter-invader attack systems. Further support emerges from the fact that the synthetases involved in nucleotide-generation in several such systems, like the CRISPR systems and several cyclic-dinucleotide and 2’-5’A-centered systems (in bacteria and animal interferon signaling), share a common origin with either the CCA-adding enzyme-like clade of pol-β family nucleotidyltransferases or Thg1-like RNA repair enzymes. For more discussions on related matters, read our latest papers [6,12].


Thursday, November 3, 2016

Quod erat demonstrandum? No restriction endonuclease fold in MIF

Recently a seriously flawed article was published in the Science magazine. We present an analysis pointing to the flaws in the paper. Click the following link to read more.

Monday, January 4, 2016

DNA adenine methylation in eukaryotes

A video abstract of our latest study on Adenine methylation in eukaryotes published in Bioessays follows.  Click here to access the full paper.

Tuesday, December 17, 2013

PIWI domain evolution

A good deal of manuscript ink has been spilled in the study of PIWI proteins, the core catalytic engine of the RNA interference (RNAi) pathway which, among many linked functional roles, is perhaps best known for triggering post-transcriptional gene silencing in eukaryotes via binding to small RNAs, which in turn bind reverse-complementary homologous stretches in target mRNAs. Despite copious knowledge gained into almost every minute aspect of PIWI interaction with small RNA and target RNA, the evolution of PIWI and how it came to functionally occupy the central role in such a well-studied pathway remains shrouded in relative murkiness. In a recent review published by our group along with Dr. Yoshinari Ando at Johns Hopkins, we sought to clear some of the fog surrounding the natural history of the PIWI proteins (see our recent review).

Much of this confusion surrounding the origin of PIWI stems from a profound lack of understanding of the domain architecture and the individual domains comprising the PIWI protein, and of the extent to which this is conserved in prokaryotic PIWI proteins. The core conserved architecture of eukaryotic PIWI proteins is, in order from N- to C-terminus: 1) A dyad of PIWI-N-terminal domains (PNTD1 and PNTD2). These two domains have arisen through duplication followed by a circular permutation at the N-terminus of one of the copies from an ancestral domain with 4 strands and two helices (see the review for more details). The boundaries of these two domains have been inaccurately established in several studies, resulting in two inappropriately-defined segments termed the N-terminal and Linker-1 (L1) domains. 2) These domains are followed by PAZ, an RNA-binding domain adopting a SH3-like fold which plays an important role in recognition of the 3’ end of the guide strand. 3) A conserved “linker” region (typically termed Linker-2 or L2). 4) The α/β sandwich MID domain with a Rossmannoid topology that specifically binds the 5’ end of the guide strand. 5) The PIWI catalytic domain itself, belonging to the RNase H fold, which binds the target strand and, if active, uses its metal-dependent RNase H active site to cleave target and passenger strands.

As recently recognized by the Tomari laboratory at the University of Tokyo, this core eukaryotic architecture is observed in some PIWI proteins in prokaryotes [click for ref]. However, in contrast to the strict adherence to this core architecture observed in eukaryotes, prokaryotic PIWI (pPIWI) proteins are more diverse in their architectural construction. One form of elaboration is seen in the fusion of a Sirtuin fold nuclease to the N-terminus of the standard eukaryotic architecture. Another is the potential uncharacterized N-terminal module in the newly-discovered pPIWI-RE family [click for ref] in lieu of the PNTD1/PNTD2/PAZ/L2 domains. More strikingly, pPIWI proteins are commonly composed of only the L2+MID+PIWI domains. The gene encoding this protein is adjacent to another gene encoding a conserved region in a wide range of prokaryotic lineages. This conserved region is related to a segment that was previously claimed to be a novel domain, referred to as the APAZ (Analog of PAZ) domain, fused to several prokaryotic PIWI domains; the authors of this prediction reasoned that this apparently novel domain was displacing the PAZ domain and was therefore likely functionally equivalent to PAZ [click for ref]. However, in our recently published work, we determined that this assignment of a novel domain to the so-called “APAZ” region was in error; in fact, the region consists of rather standard versions of the PNTD1 and PNTD2 domains and likely a C-terminal PAZ domain, although one defining characteristic of the PAZ domain, like other members of the SH3 barrel fold, is its tendency to diverge rapidly, preventing homology detection using even the most sensitive of methods.
Therefore we have determined, outside the possible distinct N-terminal module of the pPIWI-RE family of PIWI proteins, that the core domain architecture established in eukaryotes is largely observed across all PIWI proteins, although in many prokaryotes this architecture is sundered into two distinct polypeptides, the first containing the PNTD1+PNTD2+PAZ domain order and the second containing the L2+MID+PIWI domain order. Within these split versions, mirroring the Sirtuin fusion to the complete core architecture mentioned above, the PNTD1+PNTD2+PAZ protein is often further fused at the N-terminus to nuclease domains derived from several distinct folds including the Restriction Endonuclease (REase) fold, the TIR fold, and the Sirtuin fold.
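The relationship between the complete and the split architectures described above can be sketched as a simple comparison of ordered domain lists. This Python snippet is an illustrative toy under our own assumptions: the list representation, the names `EUKARYOTIC_CORE`, `N_TERMINAL_NUCLEASES`, `strip_n_terminal_fusions` and `matches_core` are hypothetical, and real domain assignment requires sensitive sequence and structure analysis rather than string labels.

```python
# Core architecture, N- to C-terminus, using the PIWI-N-terminal domain
# names (PNTD1, PNTD2) followed by PAZ, L2, MID and PIWI.
EUKARYOTIC_CORE = ["PNTD1", "PNTD2", "PAZ", "L2", "MID", "PIWI"]

# Nuclease folds reported in the text as N-terminal fusions.
N_TERMINAL_NUCLEASES = {"REase", "TIR", "Sirtuin"}


def strip_n_terminal_fusions(domains):
    """Drop leading nuclease domains so only the PIWI-related part remains."""
    i = 0
    while i < len(domains) and domains[i] in N_TERMINAL_NUCLEASES:
        i += 1
    return domains[i:]


def matches_core(polypeptides):
    """True if the polypeptides, read in order, reconstitute the core architecture."""
    joined = [d for p in polypeptides for d in strip_n_terminal_fusions(p)]
    return joined == EUKARYOTIC_CORE


# A split prokaryotic system: two polypeptides, the first with an
# N-terminal REase fusion (one of the fusions mentioned in the text).
split_system = [["REase", "PNTD1", "PNTD2", "PAZ"], ["L2", "MID", "PIWI"]]
print(matches_core(split_system))
```

In this representation the sundered prokaryotic versions and the single-chain eukaryotic version reduce to the same ordered list once the N-terminal nuclease fusions are set aside.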

Delineation of the PNTD1/2 domain duplication event and establishment of its deep prokaryotic roots and early adoption into the core PIWI protein architecture clarifies recent functional roles attributed to the N-terminal region of PIWI proteins: namely, implication in the melting of double-stranded RNA duplexes formed during PIWI loading and after target binding and also in prevention of duplex propagation. Introduction of the duplicated PNTD1/2 domains into the core PIWI architecture assisted in the formation of an extended channel, shaping an inbuilt and ancestral switch allowing the RNaseH domain of the PIWI proteins to catalyze cleavage only when the former domains establish an appropriate interface with the binding nucleic acids. Together with the MID domain which recognizes the opposite small RNA terminus, PNTD1/2 appears to have been the primary evolutionary constraint for the characteristic modal length of the small RNAs deployed in RNAi.

With the preceding information in hand, we can begin to trace the evolutionary trajectory of PIWI domain architecture. The RNase H-like PIWI domain is most closely related to the UvrC/Endonuclease V clade of RNase H domains, with UvrC highly conserved across bacteria and EndoV conserved across eukaryotes and archaea. This suggests at least a single copy of this RNase H clade was present in the Last Universal Common Ancestor (LUCA). Both the UvrC and EndoV domains are endoDNases and not RNases. The relatively sporadic and limited distribution of known pPIWI domains suggests they likely emerged from one of these two more broadly distributed lineages later in prokaryotic evolution, followed by subsequent dispersal across a diverse range of prokaryotes via horizontal gene transfer. This process appears to have resulted in a shift from DNA duplex to RNA-DNA hybrid duplex specificity. Emergence of the RNase H PIWI domain likely coincided with direct association with the MID domain, which descended from an unknown Rossmannoid fold precursor. This core pairing is observed in the pPIWI-RE family, which likely represents the most ancestral extant version of the PIWI domain. As the nature of the N-terminal domains fused to pPIWI-RE remains opaque, the exact timing of the association with the PNTD1/PNTD2/PAZ module (as well as the L2 domain) remains unclear, possibly occurring with the emergence of pPIWI-RE or prior to the divergence of the class I and class II divisions of the classical pPIWI proteins. The eukaryotic PIWI protein was thus necessarily inherited from the class II division, given the core domain architecture shared between eukaryotes and class II in contrast to the sundered architecture in class I.

Applying genome contextual information in the form of conserved operon associations onto the above evolutionary framework throws considerable light on the functional shifts that occurred during PIWI evolution. The most basal pPIWI lineage, the pPIWI-RE family, is contained within a three-gene island additionally encoding both a helicase and a REase DNase, strongly suggesting the pPIWI-RE family functions as a plasmid/phage defense system (see post below for more details on pPIWI-RE). In our work, we find evidence supporting similar functional roles for classic pPIWI protein families in the form of strong, family-specific genome linkages to endoDNases of various distinct folds. These associations were observed in all branches of the class I division and at least two branches in the class II division. Recent small RNA profiling in the bacterium Rhodobacter sphaeroides observed hybrid RNA-DNA duplexes associating with pPIWI playing a role in plasmid silencing. This R. sphaeroides pPIWI protein belongs to a class II family associating with a DNA REase domain, further supporting a role for many classical pPIWI protein families in phage/plasmid restriction and drawing a straight line from the predicted function in the pPIWI-RE family to the classical pPIWI families. Thus, ancestrally the pPIWI domains appear to have functioned in the context of RNA-guided restriction of invasive DNA by endoDNases.

We also observed additional contextual associations in the class II division: 1) At least two families have been recruited to previously unrecognized/uncharacterized CRISPR systems. The CRISPR moniker refers to a collection of phage restriction systems following a similar mode of action: incorporation of fragments of phage genomes into genomic loci, transcribing these fragments, and using the fragments as guide RNA to attack the DNA (and in some cases, the RNA) of infecting agents. Despite functional similarities, the protein components comprising these systems are astonishingly diverse, incorporating several distinct nucleases and RNA-binding domains. Our review is the first to link CRISPR-like systems with pPIWI; these systems are notable for their lack of any known processing RNase, suggesting the pPIWI domain functions in processing and utilizing CRISPR RNAs during the phage targeting step. 2) One pPIWI family associates with an endoRNase HEPN domain [click for ref]. 3) One family conspicuously lacks any conserved association with other domains. Strikingly, the pPIWI proteins in this family share the strongest sequence affinity with the eukaryotic PIWI proteins. As the earliest eukaryotic PIWI proteins were clearly recruited to RNA-targeting systems, it appears possible that the shift from DNA targeting to RNA targeting may have actually occurred first in prokaryotes and, given the HEPN connection, possibly on multiple independent occasions.

As part of our review, we compared small RNA data across diverse eukaryotic phylogenies and identified three sources of small RNA potentially utilized by the earliest-emerging iterations of eukaryotic RNAi systems: small RNA derived from 1) overlapping sites of sense-antisense transcription, 2) genomically-encoded, independently-transcribed hairpin sequences, and 3) double-stranded sections from larger, non-coding RNA entities (including snoRNA, tRNA, etc.). Surprisingly, the most broadly-distributed and ancestral of these three sources appears to be sense-antisense transcriptional sites. Thus, it appears possible that the earliest PIWI-centered RNAi systems in eukaryotes may have acquired substrates from sense-antisense transcription. This dovetails nicely with recent research on RNA expression indicating bacteria are engulfed in a transcriptional landscape consisting of such sense-antisense RNA transcriptional products [click for ref], a condition likely mirrored in the eukaryotic stem lineage.

While the above neatly explains both the architectural inheritance and functional shifts taking place during PIWI evolution, it fails to address the logic behind selection of the PIWI domain as the central catalytic component of eukaryotic RNAi. After all, prokaryotes possess their own widespread, well-elaborated RNA-based interference/restriction system: the aforementioned CRISPR/Cas system, in addition to the less frequently observed pPIWI-dependent systems. Why rebuild an RNAi system from scratch, in the process selecting a central component from a relatively infrequently utilized restriction system? A possible answer to this question is observed in the loss of several other multigene defense systems during the prokaryote-eukaryote transition, such as the classic restriction-modification (R-M) systems, the Pgl system, and toxin–antitoxin systems. All of these systems are themselves mobile, selfish elements that appear to depend on strong genomic linkage (i.e. existence of operons) for the physical assembly of their products and neutralization of their toxic components via the linkage of transcription and translation in prokaryotes. The emergence of the nucleus in eukaryotes, with the resulting breakdown of transcription–translation coupling, rendered such systems incapable of survival owing to the potential danger of the toxic restriction components to the cell. Indeed, expression of CRISPR/Cas systems (e.g., Type II systems) in eukaryotes with appropriate RNA guides introduces double-strand breaks in DNA with serious mutagenic consequences. The eukaryotic RNAi system therefore appears to have been rebuilt by elaboration around a core formed by the simpler prokaryotic pPIWI-based systems, specifically those that did not have strong operonic linkages with DNA targeting components.

The Cas9-containing CRISPR systems, which are thematically similar in combining an RNase H domain with a restriction system-like HNH domain inserted into the former, have recently proven to be raging successes as biotechnological reagents for gene disruption [click for ref]. In light of this, it might be useful to explore the diverse range of pPIWI-guided restriction systems as potential biotechnological reagents for similar purposes.

Tuesday, July 16, 2013

Expanding the PIWI repertoire

The PIWI module directly binds a small RNA transcript which in turn targets a reverse complementary substrate, a remarkable form of RNA-based regulation in the cell which has been linked to a continually expanding list of pathways including transcript silencing, splicing, chromatin dynamics, DNA break repair, and viral defense, to name a few. The PIWI module was identified well over a decade ago but, until now, the classical PIWI family found in PIWI and Argonaute proteins has remained the only known family. In a very recently published paper from our group, we characterize two novel families of PIWI domains, one found in bacteria and the other in eukaryotes. The bacterial version, dubbed the pPIWI_RE family (in part overlapping with what used to be called the domain of unknown function DUF3893), is predicted to function in a defense system against invasive phages or plasmids. The eukaryotic version, the medPIWI family, is the defining domain of the human Med13 protein and its eukaryotic orthologs, which are crucial regulators of the Mediator complex, a complex required for transcriptional initiation of most eukaryotic genes and one of the primary discoveries behind the awarding of the 2006 Nobel Prize for Chemistry.

Perhaps the overriding question after discovery of these new families was whether they could bind small RNAs to effect function similar to the classical PIWI module. The PIWI module as defined in Pfam actually consists of two distinct domains: an N-terminal Rossmannoid domain which utilizes a unique constellation of conserved residues to bind the 5’ end of the small RNA, and a C-terminal domain belonging to the RNase H nuclease fold which, while often nuclease-inactive, contributes conserved residues primarily interacting with the 4th and 5th nucleotides measured from the 5’ end of the bound small RNA. Careful comparison of both new families with the classical family revealed conservation of amino acids at positions crucial for small RNA binding. Perhaps most notably, the 5’ end-binding constellation of residues was conserved, indicating the new PIWI modules could bind either processed RNAs with exposed 5’ ends or the 5’ transcribed end of a nascent RNA transcript. (Note: The final PDF version of our paper appears to have been produced with low-resolution figures, so we recommend that the reader directly download the author-supplied images from the HTML version of the open-access paper.)

If these novel PIWI modules are capable of binding small RNAs, what are they binding? After considering several lines of evidence, we hypothesized that the bacterial pPIWI_RE domain is likely binding the 5’ end of the RNA component of the R-loops (RNA-DNA hybrids) characteristic of replicating invasive plasmids/phages. The reasoning: first, we were unable to detect any conserved, genomically-encoded small RNA transcripts around the pPIWI_RE-encoding gene or its operonic neighbors. Second, the pPIWI_RE domain is tightly linked in an operon with the DinG helicase, a helicase which has been shown in distinct contexts to specifically interact with R-loops. Finally, such a target enforces selectivity on a defense system weaponized with a potentially lethal Restriction Endonuclease fold endoDNase which appears to lack any other method for distinguishing “self” vs. “non-self”. This situation might be compared with a subset of the “Type U” CRISPR/Cas systems which similarly have a DinG helicase combined with Cas7 and Cas5 clade RAMPs.

The Med13 protein is a crucial component of a subcomplex regulating the Mediator complex. This subcomplex transiently associates with essentially all promoters, but only associates strongly at a promoter following activation of an as-yet undetermined physical switch which enacts a conformational change. We postulate that medPIWI binding to a small RNA constitutes this switch, with the most likely source of this small RNA being cis-generated promoter-derived small RNA transcripts. Recent research has indicated that small RNAs are generated from divergent transcription (transcription on the forward and reverse strands) at and around transcriptional start sites (TSSs). The quantity of these small RNAs at any TSS is roughly proportional to the strength of expression of a gene, dovetailing nicely with the observation that the Med13-containing subcomplex associates most strongly with highly expressed promoters.

Several questions remain to be answered, but these discoveries potentially open up exciting new avenues of research. The pPIWI_RE module appears to represent the second RNA-dependent restriction system in prokaryotes after the CRISPR system. It could potentially be exploited as a method for cleaving target DNA using an RNA guide as is being exploited in recent studies using certain types of CRISPR systems. The medPIWI module could provide insight into both the mechanism by which Med13 and its allied proteins modulate Mediator transcriptional activation and the function of small RNA generated near promoter regions.

Tuesday, February 5, 2013

A common thread in amyotrophic lateral sclerosis, Fronto-temporal dementia, Birt-Hogg-Dubé syndrome, vesicular trafficking and bacterial cell polarity

"I tired mid-season. I don't know why, but I just couldn't get going again" -Lou Gehrig (1938). One of the greatest American baseball first basemen, he was diagnosed with amyotrophic lateral sclerosis (ALS) in June 1939 and died two years later.

Amyotrophic lateral sclerosis, or Lou Gehrig’s disease, is a debilitating motor neuron disease characterized by rapidly progressive muscular degenerations resulting in fatal paralysis. ALS is often known to occur in individuals with no family history. Studies on the inherited form, familial ALS, have shown that mutations in any of 19 genes can cause ALS, and analysis of the gene list does not reveal an obvious common thread. The neuropathological features include degeneration of corticospinal tracts and loss of lower motor neurons, and several distinct cell types in the primary motor cortex, and gliosis in the motor cortex and spinal cord. Like other neural diseases such as Parkinsonism or Alzheimer’s disease, the degenerating neurons show inclusion bodies of insoluble proteins, or proteins in complex with RNA. Several proteins have been reported in these inclusion bodies including ubiquitin, superoxide dismutase, peripherin, Dorfin, intermediate filament proteins and cystatin-C. Thus, although it has been a while since people recognized ALS, the many distinct causes and varying pathologies have posed great challenges with respect to diagnosis, treatment and understanding the mechanisms of the disease. It is in this regard that recent work from our group, while clarifying the origins of a particular type of ALS, adds new wrinkles to the story and opens up new doors for other poorly characterized proteins, some of which are implicated in human disease.

In 2011, two independent groups (DeJesus-Hernandez et al., Renton et al.) found that a mutation in the human gene C9orf72 is strongly associated with ALS and fronto-temporal dementia (FTD). This was the first gene that linked both these conditions. More precisely, the mutation involves an expansion of the hexanucleotide GGGGCC in the first intron of C9orf72. The absence of a protein defect led researchers to propose that the pathology of this disease may result from RNA-dominant toxicity or haploinsufficiency, supported by the presence of inclusion bodies with the RNA-binding protein TDP-43. As of today, over 150 studies have been published regarding the role of C9orf72 mutations in ALS, and also in many other neurodegenerative conditions such as Alzheimer's disease and mild cognitive impairment.
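To make the notion of a hexanucleotide expansion concrete, here is a minimal sketch that counts the longest uninterrupted run of GGGGCC units in a DNA string. The function name and the toy sequences are our own inventions for illustration; they are not real C9orf72 intron sequence, and no clinical repeat threshold is implied.

```python
import re


def longest_repeat_run(seq: str, unit: str = "GGGGCC") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is one uninterrupted run; its length divided by the unit
    # length gives the number of copies in that run.
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


# Toy alleles (hypothetical flanking sequence, invented for illustration)
short_allele = "ATCA" + "GGGGCC" * 3 + "TTAG"
expanded_allele = "ATCA" + "GGGGCC" * 40 + "TTAG"

print(longest_repeat_run(short_allele))
print(longest_repeat_run(expanded_allele))
```

Real repeat sizing is considerably harder than this, since long GC-rich repeats are difficult to sequence through; the sketch only illustrates what is being counted.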

Using sensitive sequence and structure analyses, we unified C9orf72 with a well-known family of GDP-GTP exchange factors (GEFs) for Rab GTPases known as the DENN module, divergent versions of which were also recently identified in Folliculin (Nookala et al.). Mutations in Folliculin cause the Birt-Hogg-Dubé syndrome. Additionally, we showed that the Folliculin-interacting proteins FNIP1/2, the nitrogen permease regulators 2 and 3 (Npr2 and Npr3), and the SMCR8 protein encoded by a gene in the Smith-Magenis syndrome candidate region also contain DENN modules. Unification of these proteins with a module that partners with the Rab GTPases connects them to intracellular vesicular trafficking events. This opens a new angle on ALS pathology, i.e. the possibility of a vesicular trafficking defect in individuals with the C9orf72 mutation. Defects in vesicular trafficking proteins have been previously implicated in phenotypically comparable neurological diseases. For example, mutations in ALS2, which has been proposed to function as a GEF for Rab5, result in an infantile-onset motor neuron disease similar to ALS from C9orf72 mutations. Likewise, an adult-onset atypical ALS ensues from mutations in VAPB (ALS8), which is a vesicular trafficking protein. The pathology caused by a mutation in the dynactin gene, responsible for distal hereditary motor neuronopathy type VIIB (HMN7B; distal spinal and bulbar muscular atrophy or dSBMA), might also result from defects in vesicular trafficking on microtubule tracks by the dynein motor (Laird et al.). Impairment of intracellular trafficking is also a commonly observed theme in several neural diseases such as Huntington’s, Parkinson’s, Niemann-Pick Type C, and Alzheimer’s disease.

The DENN module..... to be continued.

For now you can read our paper (Zhang et al.) and also a news story related to our work.

Wednesday, September 5, 2012

Origin of multicellularity – the bacterial connection

From Dayel et al.

Recently there has been some interest regarding work on the choanoflagellate Salpingoeca rosetta and its transition to multicellularity induced by the sulfonolipid produced by its prey, the bacteroidetes Algoriphagus (click to refer). This is interesting because it is consonant with a concept we have been articulating in print over the last 13 years: genetic material encoding particular protein domains, horizontally transferred from bacteria, was directly responsible for the origin of multicellularity in eukaryotes. We were first alerted to this possibility when we discovered the first caspases, AP-ATPases and TIR (Toll-interleukin) domains in bacteria (click to refer). These molecules were just then emerging as key mediators of apoptosis in metazoans. This led to the idea that apoptosis, which is a key manifestation of multicellularity, emerged directly on account of molecules acquired through lateral transfer from bacteria. We further developed this concept in a detailed sequence analysis of apoptosis mediators that became available as a consequence of various genome projects and described this in a paper concomitant with the announcement of the human genome (click to refer). Subsequently, in another article we pointed out that many key aspects of multicellularity, both in terms of signaling and organization, have had their ultimate origin in bacteria (click to refer). In terms of signaling, we were able to show that some major metazoan pathways, such as the Notch pathway, which is involved in asymmetric cell division, apoptotic pathways, and cell-cell signaling pathways, e.g. the nitric oxide signaling pathway, have crystallized on account of components whose origins lay in lateral transfers from bacteria. For example, in the Notch pathway the Swi2/Snf2 ATPase protein Strawberry notch has emerged from bacterial DNA-modification systems related to restriction-modification systems.
On the other hand, we showed that the nitric oxide/ carbon monoxide receptor domains emerged from comparable bacterial signaling domains (click to refer). On the organizational side, we were able to show that the origin of key cell-cell adhesion mediating domains (click to refer) also lay in bacteria – in particular we showed that the cadherin, Ig, FNIII and TIG domains emerged from various bacterial proteins with roles in cell-cell adhesion in bacteria, probably in the context of bacterial multicellularity and biofilm formation (click to refer). For a summary of our views one might refer to our paper on the origin of multicellularity (click to refer).

Our recent studies on 2-oxoglutarate and iron dependent (2OGFeDO) and Jumonji-related dioxygenases provide insights into the origins of a quintessential animal molecule, collagen, and the enzyme required for its biogenesis, the prolyl hydroxylase (click to refer). We uncovered several operons in bacteria that combine genes for one or more distinct 2OGFeDOs, namely amino acid beta-hydroxylase, phytanoyl CoA and AlkB-like hydroxylases, with distinct versions of methyltransferase and sulfotransferase domain-containing proteins. These operons might also encode phosphoadenosine phosphosulfate synthetases, acetyltransferases, and either or both of two types of non-enzymatic proteins: (i) a member of the bacteriophage tail–collar family prototyped by the phage T4 short tail–fiber protein; (ii) secreted glycine-rich peptides, some of which have a similar pattern of tripeptide repeats as seen in animal collagen. These operonic contexts suggest that the bacteria possessing them might produce collagen-like proteins, which are modified by hydroxylation just like their animal counterparts. Indeed, this suggests that a collagen precursor and its modifying enzymes, acquired from a bacterial source through the lateral transfer of such an operon, played a role in the origin of animals by furnishing a major component of animal extracellular matrices. Interestingly, the presence of sulfotransferases and phosphoadenosine phosphosulfate synthetases points to sulfate modifications, which are also an essential feature of the animal extracellular matrices. On a more general note, we observed that related sulfotransferases are fused to Jumonji-related extracellular dioxygenases of the FIH1 family in the choanoflagellate Monosiga (in most organisms they are intracellular and even nuclear proteins). This is particularly interesting in the context of the multiply hydroxylated sulfonolipid reported as being the multicellularity-inducing agent secreted by Algoriphagus.
Indeed the phytanoyl CoA hydroxylase-like, FIH1-like and sulfotransferase enzymes found in these operons can potentially participate in the synthesis of such metabolites. Therefore, we already have potential candidates for the biosynthesis of the multicellularity inducing agent and also evidence that genes for the synthesis of such molecules have been laterally transferred from bacteria to choanoflagellates.
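The "collagen-like pattern of tripeptide repeats" mentioned above refers to the Gly-X-Y periodicity of collagen, with a glycine at every third position. As a rough illustration only, and not the method used in our analysis, the following sketch reports the longest run of consecutive Gly-X-Y triplets in a peptide. The function name and the toy peptides are hypothetical.

```python
import re


def longest_gxy_run(protein: str) -> int:
    """Longest run of consecutive triplets that each start with glycine (G)."""
    # Matches are taken left to right and do not overlap, so a longer run
    # starting in a different frame inside a glycine-rich stretch can be missed.
    matches = re.finditer(r"(?:G[A-Z]{2})+", protein.upper())
    runs = [len(m.group(0)) // 3 for m in matches]
    return max(runs, default=0)


# Toy peptides, invented for illustration (not real protein sequences)
collagen_like = "MKTL" + "GPP" * 5 + "GAPGEK" + "WLS"
unrelated = "MKTLAVWLS"

print(longest_gxy_run(collagen_like))
print(longest_gxy_run(unrelated))
```

This is a crude filter: any glycine-rich peptide will also score, so a long run is at best a hint to look more closely at the sequence and its operonic context.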

In more recent times we have been particularly interested in protein toxins and other effectors deployed in intra- and inter-genomic and organismal conflict across life. These studies have also yielded several key clues regarding the bacterial contributions to the emergence of multicellularity among eukaryotes, including metazoans. Several such contributions have been described in our recent monograph on polymorphic toxin systems (click to refer) and will be outlined in a future post.

Tuesday, August 7, 2012

New direction on the function of GRAS proteins in gibberellin signaling

Gibberellin GA-1
Gibberellins (GAs) are key plant hormones that regulate various aspects of growth and development of land plants and have been at the center of the “green revolution”. In angiosperms practically every aspect of plant life, including seed germination, elongation growth, and flowering, is influenced by the action of GAs. In at least some angiosperms, they have a key role in surviving certain stress conditions such as saline environments and cold. In ferns certain GAs (e.g. GA1) or related compounds (e.g. antheridic acid) induce male gametophyte development, might repress the female gametophytes and also play a role in spore germination. Commercially, GAs are used in a wide range of applications, such as to promote growth of fruit crops, to increase sugar yield in sugar cane, and to stimulate malting of barley during beer production. Did you know that almost all seedless grapes and sweet Bing cherries are treated with GA derivatives to increase their size? GAs are also used in “fruit cosmetics” to prevent the undesirable russeting in apples. The next time you visit a plant nursery, note that GA inhibitors are often used to retard growth of nursery plants. As you can imagine given its remarkable uses, the GA pathway is one of the most intensely studied in plant biology and agriculture. It is in this regard that a new story has emerged from research in our group.

Russeting in apples
Research in the past 20 years or so has shown that some of the key players in the GA response pathway are members of the GRAS family of proteins, which includes proteins such as GA insensitive (GAI), Repressor of GA1-3 (RGA1/DELLA), Short-root, Scarecrow (SCR), and Nodulation Signal Pathway 1 and 2 (NSP1 and NSP2). For a variety of historical reasons, including incorrect domain prediction, GRAS proteins were generally believed to function as conventional transcription factors. The original reports that described them as transcription factors were based on flawed recognition of features such as coiled-coil regions, resulting in comparisons between them and bZIP proteins. This was compounded by another erroneous piece of sequence analysis which led to the idea that they might be plant equivalents of the STAT transcription factors (which have P53/cytochrome f fold DNA-binding domains) known from animals and amoebozoans. Consequently, almost all efforts to understand this family have been spent in testing hypotheses arising from this perspective. However, the evidence that GRAS proteins bind DNA is not very rigorous, with several GRAS proteins failing to display the purported DNA-binding activity despite sharing a conserved structure, raising questions about their mode of action and function.

Our recent studies help clarify the situation. We showed that the GRAS family actually belongs to the Rossmann-fold methyltransferase superfamily. We established that the GRAS family first emerged in bacteria and that the plant versions represent a case of lateral gene transfer prior to the radiation of land plants. We further showed that all bacterial and a subset of plant GRAS proteins are likely to function as small-molecule methylases, but the remaining plant members have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. Thus, based on sequence-structure analysis combined with functional evidence, we predict that GRAS proteins might either modify or bind small molecules, which might include GAs or their derivatives.

Our results have thus falsified the previously published relationships that were proposed for the GRAS proteins and, more importantly, put a completely new spin on their mode of action in the context of GA binding or modification. One delicious possibility is that the active versions function as methylases that might modify certain GAs or their derivatives, whereas inactive versions act as GA-binding proteins (experimentalists take note). While a GA receptor belonging to the alpha/beta hydrolase superfamily has been described previously, the functional evidence suggests that not all aspects of GA signaling are channeled via that receptor. Hence, the possibility of direct interaction between a GA or its modified derivative and the GRAS methylase domain remains open and is a potentially important avenue for signaling. In addition, very little is known of the fate and prevalence of GA methylation, which is a mechanism of GA deactivation in angiosperms. The currently characterized GA methylases (GAMT1 and GAMT2) are also Rossmann-fold methylases, belonging to a radiation of plant methylases of ultimately bacterial origin that includes enzymes that methylate carboxy, hydroxyl and amino groups in the synthesis of plant metabolites like caffeine, theobromine, methyl salicylate, and methyl jasmonate, among others. In Arabidopsis, these are primarily expressed in the siliques (fruits), including the seeds, and are believed to deactivate GAs via methylation and subsequent degradation during the maturation of seeds. One possibility is that such methylation-dependent control of GAs also occurs in other parts and other developmental processes via the action of GRAS family methylases. The possibility of the inactive versions of the GRAS proteins binding methylated or otherwise modified GAs is also an avenue for possible functional studies.

It should be noted that our phylogenetic analysis (see figure above) suggests that the GRAS superfamily was delivered to plants via a single lateral transfer from bacteria prior to the diversification of land plants. This ancestral plant GRAS protein underwent a lineage-specific expansion into 13 distinct well-supported clades that contained at least one representative from bryophytes, lycopodiophytes and angiosperms. At face value, assuming a direct GA-related role for the GRAS family, this would suggest that GA-like molecules were already functional in the early history of land plants. This clearly runs contrary to certain suggestions of plant evolutionists that GA-like molecules were absent in bryophytes like Physcomitrella, but supports recent experimental results suggesting a role for GA-like molecules in caulonema formation, growth direction of protonemata, and spore germination in these mosses (Hayashi et al.). Our findings suggest that the predicted small-molecule binding/modifying activity would extend to the base of land plants and could have bearing on the enigma of the role of GA-like molecules in basal land plants. For more details, you can read our paper here.

Wednesday, February 22, 2012

NAD, ARTs and ARGs: new players and biochemistries

Our studies have been steadily revealing the deep evolutionary connections between systems involved in cofactor, amino acid and secondary metabolite biosynthesis, and those involved in modifications of proteins and nucleic acids. For example, the origin of several eukaryotic enzymes that add or remove a methyl group on lysines and arginines in histones and other proteins can be directly traced to bacterial pathways involved in synthesizing peptide-derived antibiotics and siderophores (Click on numbers to read various papers : [1] [2]). In a similar vein, multiple components of the peptide ligation and deubiquitination pathways in the eukaryotic ubiquitin system show evolutionary relationships to enzymes involved in diverse bacterial biosynthetic systems for cofactors (thiamine and molybdopterin), siderophores, antibiotics and the amino acid cysteine (click numbers to read papers : [3] [4] [5]). Enzymes catalyzing other major forms of peptide tagging of proteins in eukaryotes, e.g. protein polyglutamylation, polyglycination and tyrosinylation also display evolutionary connections to peptide ligases involved in diverse prokaryotic pathways for the biosynthesis of various antibiotics, the amino acid lysine and cofactors like peptidylated tetrahydrosarcinapterin (a folate-like pterin derivative) and F420 (a flavin-like molecule) (Click to access paper). The generality of this theme is further reinforced by the evolutionary links between enzymes catalyzing other forms of peptide tagging of proteins, such as pupylation and protein arginylation/leucylation, and enzymes mediating peptide-bond formation, respectively, in the synthesis of the peptide cofactor glutathione, and a variety of compounds, such as peptidoglycan and peptide-modified lipids (Click numbers to access papers: [7] [8]). 
Thus, the ultimate origin of numerous enzymes involved in covalent modifications of proteins and nucleic acids, particularly in eukaryotic regulatory systems, can be linked to enzymes catalyzing similar reactions in bacterial biosynthetic systems specializing in the production of cofactors, amino acids and metabolites such as antibiotics, siderophores and cell-cell communication molecules.

We now consider the links between the biosynthetic and regulatory pathways centered on the ancient and ubiquitous metabolite, nicotinamide adenine dinucleotide (NAD) or its phosphorylated derivative NADP. NAD fits particularly well into the above-discussed patterns because it is both a cofactor for numerous enzymes as well as substrate for numerous protein- and nucleic acid-modifying reactions. As a cofactor it functions as one of the central redox molecules or hydrogen-carriers in the cell for reactions catalyzed by several diverse oxidoreductases, usually of the Rossmann fold. As a substrate in protein and nucleic acid modification it supplies the ADP ribose moiety for modification of side chains of amino acids such as glutamate, glutamine, lysine, asparagine, cysteine and diphthamide (a modified histidine) and arginine and guanine in DNA. The most common superfamily of enzymes that catalyze such reactions unites the ADP ribosyltransferases (ARTs), which catalyze the transfer of a single ADP ribose moiety to a target molecule, and polyADP ribose polymerases/polyADP ribose transferases (PARPs/PARTs) that transfer multiple such moieties to form branched or straight chain ADP ribose polymers. A nucleic acid-modifying ART is the RNA 2’phosphotransferase KptA/Tpt1, a RNA-repair enzyme that transfers the 2’ phosphate, which is generated as a result of tRNA splicing and RNA ligase action, to NAD, resulting in the generation of ADP-ribose 1”-2” cyclic diphosphate (Appr>p) and release of nicotinamide. The rifamycin ART, which is related to above RNA-processing enzyme, instead inactivates the antibiotic by ADP ribosylation of a hydroxyl group on its carbon.

In recent years there has been tremendous progress in terms of structural and biochemical understanding of ARTs, PARTs, sirtuins, MACROs and several NAD biosynthesis enzymes. There have also been several efforts in terms of sequence analysis leading to the discovery of novel ART superfamily enzymes and tremendous interest in the connections between NAD metabolism and the dynamics of heterochromatin formation, especially in the context of organismal aging. Our comparative genomic and sequence analyses of NAD-utilizing and synthesizing enzymes have led to the identification of a novel enzymatic fold that appears to have supplied multiple distinct families of proteins implicated in NAD/ADP ribose metabolism in diverse contexts. Using contextual analysis we show that some of these proteins potentially act in the context of RNA repair, where NAD is used to remove 2'-3' cyclic phosphodiester linkages. Likewise, we uncover novel NAD-dependent protein ADP-ribosylation systems involving novel ADP-ribosyltransferases. Some of these are type-II toxin-antitoxin-like systems with ART and different ribosylglycohydrolase enzymes analogous to the DraG-DraT system. We present evidence that some of these TA-like systems are likely to regulate certain restriction-modification enzymes in bacteria. We also show that eukaryotic relatives of such ARTs constitute a novel family typified by NEURL4. This leads to a key prediction that ADP-ribosylation of specific proteins in conjunction with ubiquitination might be a critical step in centrosomal assembly. Other ARTs represent a novel group of bacterial polymorphic toxins deployed by contact, T6SS and T7SS/Esx. The ADP-ribosyltransferases found in these bacterial polymorphic toxin systems and in host-directed toxin systems of bacteria such as Waddlia also throw light on the evolution of this fold and the origin of eukaryotic polyADP-ribosyltransferases.
We also infer a novel biosynthetic pathway that might be involved in the synthesis of a nicotinate-derived compound in conjunction with an asparagine synthetase and AMPylating peptide ligase. This work has also yielded some additional novel domains involved in NAD metabolism. To read the paper, click here.

Monday, February 13, 2012

How are nucleosomes differentially repositioned?

A recent discovery of ours has helped identify a common denominator that defines the structural basis for nucleosomal repositioning by the ISWI clade of SWI2/SNF2 ATPases.

One feature that sets eukaryotes apart from other forms of life is the presence of multiple essential SWI2/SNF2 ATPases at the center of several functionally distinct chromatin remodeling complexes. Our earlier studies had suggested that the SWI2/SNF2 ATPases were probably introduced to eukaryotes from a restriction-modification system of bacterial provenance, wherein they probably facilitated access to target sites by restriction/modification enzymes (click here to read). We also established that a spectacular radiation of SWI2/SNF2 ATPases, which happened in the period between the first eukaryotic common ancestor and the last eukaryotic common ancestor, spawned the clades of most major chromatin remodeling SWI2/SNF2 ATPases (click here to read).

In functional terms the characterized chromatin remodeling SWI2/SNF2 ATPases can be divided into three broad classes: 1) Those utilizing actin-like proteins. This class might be further divided into those which associate with the Reptin/pontin AAA+ ATPases, i.e. the INO80-like class, and those which associate with SWIRM domain-containing subunits, i.e. the Brahma-like class. 2) The CHD/MI-2 like remodelers. 3) ISWI remodelers. All these classes can be traced to the last eukaryotic common ancestor. Beyond these there are the Rad54-like, Rad5-like and Strawberry notch-like versions, which are much less understood (see this for a detailed classification of the SWI2/SNF2 ATPases). Of these the Brahma-like remodelers may slide or eject nucleosomes from chromatin. The INO80-like remodelers include versions that facilitate exchange of canonical nucleosomes with those containing H2A.Z, promoting transcriptional activation by facilitating transcription start site exposure. The CHD/MI-2 like remodelers also tend to slide or eject nucleosomes in both repressive and activating contexts. The ISWI-like remodelers are unique in regulating nucleosome spacing – they might either optimize it (e.g. ACF and CHRAC) to facilitate repression or randomize it to facilitate transcriptional activation. They are the focus of this post.

Several effects have been attributed to the ISWI-like complexes. In Drosophila the loss of dACF1 reduces nucleosome spacing periodicity and shortens the length of DNA per nucleosome. Loss of ISWI in Drosophila results in major decondensation of the male X chromosome and, to some degree, also of the polytene chromosomes. The WICH complex, which combines an ISWI ATPase with the WAC domain tyrosine kinase-containing WSTF protein, phosphorylates tyrosine 142 of H2A.X in the course of nucleosome repositioning during DNA repair. In vertebrates several distinct ISWI-like complexes have been identified: 1) ACF; 2) CHRAC; 3) WICH; 4) NoRC; 5) WCRF; 6) CECR2-embryonic stem cell/germline; 7) CECR2-somatic cell; 8) NURF. Of these the first six, including CECR2-embryonic stem cell/germline, have SNF2H as the ISWI ATPase, whereas NURF and CECR2-somatic cell have SNF2L as their ATPase subunit. These complexes have been shown to exert their biological roles by mediating different nucleosomal repositioning events. Prior experiments have demonstrated that their accessory subunits have a role in sensing linker DNA and thereby possibly regulating nucleosomal spacing (Click here to read). However, it remained unknown how exactly this was achieved.

It was in this context that we were able to use sequence analysis and comparisons with known structures to develop a unified mechanism (Click here to access the paper). First, using sequence profile searches we were able to unify all the large accessory subunits of ISWI ATPases across eukaryotes, such as hACF1, WSTF, RSF1, TIP5, WCRF180, BPTF, yeast Itc1, Ioc3 and Esc8, and the plant HB1 and MBD9, as having a common conserved module. This module is largely alpha helical and is characterized by four conserved motifs. The first of these motifs maps to the previously identified DDT motif (however, previously not known from Ioc3); the remaining three motifs are termed the WHIM motifs 1 to 3. Recently, a remarkable structural study by the Richmond group revealed that Ioc3 interacts with the C-terminus of the ISWI ATPases, which is characterized by HAND, SANT and SLIDE domains. These interact with nucleosomal linker DNA and Ioc3. Ioc3 in turn also interacts with nucleosomal linker DNA and, together with the C-terminal region of the ISWI protein, constitutes a protein ruler that measures out the spacing between two adjacent nucleosomes in a dinucleosome (Click here to read). What our sequence- and structure-based unification did was to generalize the findings developed from Ioc3 across all large accessory subunits of ISWI ATPases. As a result we were able to show that the DDT and the WHIM1 and WHIM2 motifs tightly pack with each other to form a binding pocket for the trihelical tip of the SLIDE domain in the ISWI ATPase. Based on this mapping, the highly conserved basic residue in WHIM1 is identified as a key feature involved in packing with the DDT motif, and the acidic residue from the GxD signature of WHIM2 emerges as a major determinant of the interaction between the ISWI and its WHIM motif partners. WHIM3, on the other hand, along with the N-terminal portion of WHIM2, constitutes the inter-nucleosomal linker DNA binding site, which contacts the DNA in the major groove. This is the major recognition unit for the outer or external linker DNA element of the dinucleosome. The helix-turn-helix SANT domain from the ISWI ATPase makes a similar DNA contact with the inner linker DNA element in the dinucleosome. Thus, the principle of the protein ruler is a common feature of all ISWI large accessory subunits that is determined by the DDT and WHIM motifs.

Second, most of these proteins have multiple domains for the recognition of histone H3 N-terminal peptides (PHD finger), acetylated histone peptides (bromodomains), monoubiquitinated peptides (the “little finger” type Ub-binding Zn-ribbon), phosphorylated peptides (SJA/FYR) and methylated peptides (AGENET, BMB/PWWP and AUX-RF, a novel Chromo-like domain). Additionally, others like HB1 and MBD9 in plants, BPTF, BAZ2A/B and CECR2 in animals, and previously uncharacterized proteins in chlorophytes and stramenopiles contain DNA-binding domains such as the HARE-HTH, histone H1, CENB-HTH, TAM(MBD), homeo, HMG, BRIGHT, CXXC and AT-hooks. Of these the TAM(MBD) domain in the plant MBD9 proteins is predicted to specifically bind methylated CpG dinucleotides, whereas that in the animal BAZ2 proteins is unlikely to have specific methylated CpG recognition capabilities. The CXXC domain also recognizes the CpG sequence, though most versions prefer unmethylated targets. We have also proposed that the HARE-HTH has a possible role in discriminating modified DNA. Thus, it appears that a common theme in the WHIM motif proteins is the coupling of the measurement of inter-nucleosomal distance with diverse domains involved in discriminating or catalyzing epigenetic modifications of histones, in recognition of specific DNA features such as inter-nucleosomal linker regions and distorted DNA (e.g., histone H1, HMG, BRIGHT domains and AT-hooks), or in discrimination of modified DNA marks (CXXC, TAM/MBD and HARE-HTH). One group of WHIM motif proteins from certain chlorophyte, rhodophyte and stramenopile algae combines the WHIM motifs with an RFD module, which is found at the N-termini of the DNMT1 methyltransferase. The RFD module consists of a circularly permuted version of the Sm domain fused to an HTH domain and has been demonstrated to be a key player in heterochromatinization by recruiting repressive proteins such as HDAC2. This suggests that these WHIM motif proteins might couple ISWI-dependent nucleosomal positioning with heterochromatin formation. Another interesting architecture seen in oomycetes combines the WHIM motifs with a Werner’s syndrome type DNA repair nuclease with 3'-5' exonuclease and HRDC domains, suggesting that in these organisms the ISWI-catalyzed chromatin repositioning might be directly combined with DNA repair.

In evolutionary terms the DDT-WHIM proteins and ISWI ATPases can be considered a synapomorphy of eukaryotes, suggesting that guided nucleosome positioning was a phenomenon that was already present in the last eukaryotic common ancestor. On the whole, the independent diversity of the domain architectures of paralogous ISWI accessory large subunits in several distinct eukaryotic lineages points to an important role for distinct nucleosome positioning patterns in facilitating different sets of biological processes. In particular, it would be of great interest to investigate the role of the lineage-specific expansion of the DDT-WHIM motif proteins in ciliates. Unlike animals or plants, which also show a multiplicity of DDT-WHIM motif proteins, these unicellular eukaryotes do not have differentiated tissues. But they show two functionally distinct types of nuclei – the transcriptionally active macronucleus being derived from the micronucleus following their sexual cycle. The macronucleus is characterized by drastic genomic rearrangements and lack of mitotic chromosome condensation and segregation. We suspect that the lineage-specific expansion of WHIM-DDT proteins in ciliates relates directly to the need for ISWI-dependent maintenance of particular nucleosomal positions in the macronucleus (Click here to access the paper). Our extensive supplement can also be accessed here.