"I tired mid-season. I don't know why, but I just couldn't get going again" -Lou Gehrig (1938). Gehrig, one of the greatest first basemen in American baseball, was diagnosed with amyotrophic lateral sclerosis (ALS) in June 1939 and died two years later.
Amyotrophic lateral sclerosis, or Lou Gehrig’s disease, is a debilitating motor neuron disease characterized by rapidly progressive muscular degeneration resulting in fatal paralysis. ALS often occurs in individuals with no family history. Studies on the inherited form, familial ALS, have shown that mutations in any of 19 genes can cause ALS, and analysis of the gene list does not reveal an obvious common thread. The neuropathological features include degeneration of the corticospinal tracts, loss of lower motor neurons and of several distinct cell types in the primary motor cortex, and gliosis in the motor cortex and spinal cord. As in other neural diseases such as Parkinsonism or Alzheimer’s disease, the degenerating neurons show inclusion bodies of insoluble proteins, or of proteins in complex with RNA. Several proteins have been reported in these inclusion bodies, including ubiquitin, superoxide dismutase, peripherin, Dorfin, intermediate filament proteins and cystatin-C. Thus, although ALS has long been recognized, its many distinct causes and varying pathologies have posed great challenges with respect to diagnosis, treatment and understanding the mechanisms of the disease. It is in this regard that recent work from our group, while clarifying the origins of a particular type of ALS, adds new wrinkles to the story and opens new doors for other poorly characterized proteins, some of which are implicated in human disease.
In 2011, two independent groups (DeJesus-Hernandez et al., Renton et al.) found that a mutation in the human gene C9orf72 is strongly associated with ALS and fronto-temporal dementia (FTD). This was the first gene that linked both these conditions. More precisely, the mutation involves an expansion of the hexanucleotide GGGGCC in the first intron of C9orf72. The absence of a protein defect led researchers to propose that the pathology of this disease may result from RNA-dominant toxicity or haploinsufficiency, supported by the presence of inclusion bodies with the RNA-binding protein TDP-43. As of today, over 150 studies have been published regarding the role of C9orf72 mutations in ALS and in many other neurodegenerative conditions such as Alzheimer's disease and mild cognitive impairment. Using sensitive sequence and structure analyses, we unified C9orf72 with a well-known family of GDP-GTP exchange factors (GEFs) for Rab GTPases known as the DENN module, divergent versions of which were also recently identified in Folliculin (Nookala et al.). Mutations in Folliculin cause the Birt-Hogg-Dubé syndrome. Additionally, we showed that the Folliculin-interacting proteins FNIP1/2, the nitrogen permease regulators 2 and 3 (Npr2 and Npr3), and the SMCR8 protein encoded by a gene in the Smith-Magenis syndrome candidate region also contain DENN modules. Unification of these proteins with a module that partners with the Rab GTPases connects them to intracellular vesicular trafficking events. This opens a new angle for ALS pathology, i.e., the possibility of a vesicular trafficking defect in individuals with the C9orf72 mutation. Defects in vesicular trafficking proteins have been previously implicated in phenotypically comparable neurological diseases. For example, mutations in ALS2, which has been proposed to function as a GEF for Rab5, result in an infantile-onset motor neuron disease similar to the ALS arising from C9orf72 mutations.
Likewise, an adult-onset atypical ALS ensues from mutations in VAPB (ALS8), which is a vesicular trafficking protein. A mutation in the dynactin gene is responsible for distal hereditary motor neuronopathy type VIIB (HMN7B; distal spinal and bulbar muscular atrophy or dSBMA), which might also result from defects in vesicular trafficking on microtubule tracks by the dynein motor (Laird et al.). Impairment of intracellular trafficking is also a commonly observed theme in several neural diseases such as Huntington’s, Parkinson’s, Niemann-Pick Type C, and Alzheimer’s disease. The DENN module..... to be continued. For now you can read our paper (Zhang et al.) and also a news story related to our work.
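The hexanucleotide expansion described above is, at the sequence level, just a long uninterrupted run of GGGGCC units. As a purely illustrative sketch (not the method used in the cited studies), the longest such run in a DNA string can be counted as below. The function name `longest_repeat_run` and the toy sequences are invented for this example, and the repeat counts are arbitrary rather than clinical thresholds.

```python
import re


def longest_repeat_run(seq: str, unit: str = "GGGGCC") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    # A lookahead is used so that a run is measured from every possible
    # start position; this keeps the count correct for any repeat unit.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    runs = [len(m.group(1)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


# Toy sequences invented for illustration; not real C9orf72 intron data.
short_allele = "ATCA" + "GGGGCC" * 3 + "TTAG"
expanded_allele = "ATCA" + "GGGGCC" * 40 + "TTAG"

print(longest_repeat_run(short_allele))     # 3 tandem copies
print(longest_repeat_run(expanded_allele))  # 40 tandem copies
```

In practice, very long expansions of this kind are hard to read through with short sequencing reads, so a string scan like this only applies to an already assembled sequence.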
Research Highlights of the Aravind group
Tuesday, February 5, 2013
A common thread in amyotrophic lateral sclerosis, Fronto-temporal dementia, Birt-Hogg-Dubé syndrome, vesicular trafficking and bacterial cell polarity
Wednesday, September 5, 2012
Origin of multicellularity – the bacterial connection
![]() |
| From Dayel et al. |
Recently there has been some interest regarding work on the choanoflagellate Salpingoeca rosetta and its transition to multicellularity induced by the sulfonolipid produced by its prey, the bacteroidetes Algoriphagus (click to refer). This is interesting because it is consonant with a concept we have been articulating in print over the last 13 years: genetic material encoding particular protein domains, which was horizontally transferred from bacteria, was directly responsible for the origin of multicellularity in eukaryotes. We were first alerted to this possibility when we discovered the first caspases, AP-ATPases and TIR (Toll-interleukin) domains in bacteria (click to refer). These molecules were just then emerging as key mediators of apoptosis in metazoans. This led to the idea that apoptosis, which is a key manifestation of multicellularity, emerged directly on account of molecules acquired through lateral transfer from bacteria. We further developed this concept in a detailed sequence analysis of apoptosis mediators that became available as a consequence of various genome projects, and described this in a paper concomitant with the announcement of the human genome (click to refer). Subsequently, in another article we pointed out that many key aspects of multicellularity, both in terms of signaling and organization, have had their ultimate origin in bacteria (click to refer). In terms of signaling, we were able to show that some major metazoan pathways, such as the Notch pathway, which is involved in asymmetric cell-division, apoptotic pathways, and cell-cell signaling pathways, e.g. the nitric oxide signaling pathway, have crystallized on account of components whose origins lay in lateral transfers from bacteria. For example, in the Notch pathway the Swi2/Snf2 ATPase protein, Strawberry notch, has emerged from bacterial DNA-modification systems related to restriction-modification systems.
On the other hand, we showed that the nitric oxide/carbon monoxide receptor domains emerged from comparable bacterial signaling domains (click to refer). On the organizational side, we were able to show that the origin of key cell-cell adhesion mediating domains (click to refer) also lay in bacteria – in particular we showed that the cadherin, Ig, FNIII and TIG domains emerged from various bacterial proteins with roles in cell-cell adhesion, probably in the context of bacterial multicellularity and biofilm formation (click to refer). For a summary of our views one might refer to our paper on the origin of multicellularity (click to refer).
Our recent studies on 2-oxoglutarate and iron dependent (2OGFeDO) and Jumonji-related dioxygenases provide insights into the origins of a quintessential animal molecule, collagen, and the enzyme required for its biogenesis – the prolyl hydroxylase (click to refer). We uncovered several operons in bacteria that combine genes for one or more distinct 2OGFeDOs, namely amino acid beta-hydroxylase, phytanoyl CoA and AlkB-like hydroxylases, with distinct versions of methyltransferase and sulfotransferase domain-containing proteins. These operons might also encode phosphoadenosine phosphosulfate synthetases, acetyltransferases, and either or both of two types of non-enzymatic proteins: (i) a member of the bacteriophage tail–collar family prototyped by the phage T4 short tail–fiber protein; (ii) secreted glycine-rich peptides, some of which have a similar pattern of tripeptide repeats as seen in animal collagen. These operonic contexts suggest that the bacteria possessing them might produce collagen-like proteins, which are modified by hydroxylation just like their animal counterparts. Indeed, this suggests that a collagen precursor and its modifying enzymes, acquired from a bacterial source through the lateral transfer of such an operon, played a role in the origin of animals by furnishing a major component of animal extracellular matrices. Interestingly, the presence of sulfotransferase and phosphoadenosine phosphosulfate synthetases points to sulfate modification, which is also an essential feature of the animal extracellular matrices. On a more general note, we observed that related sulfotransferases are fused to Jumonji-related extracellular dioxygenases of the FIH1 family in the choanoflagellate Monosiga (in most organisms they are intracellular and even nuclear proteins). This is particularly interesting in the context of the multiply hydroxylated sulfonolipid reported as being the multicellularity inducing agent secreted by Algoriphagus.
Indeed the phytanoyl CoA hydroxylase-like, FIH1-like and sulfotransferase enzymes found in these operons can potentially participate in the synthesis of such metabolites. Therefore, we already have potential candidates for the biosynthesis of the multicellularity inducing agent and also evidence that genes for the synthesis of such molecules have been laterally transferred from bacteria to choanoflagellates.
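The collagen-like tripeptide repeat pattern mentioned above for the secreted glycine-rich peptides lends itself to a simple scan. The sketch below is only an illustration and not the sequence-analysis procedure used in our paper: it assumes the repeat of interest is the collagen-style Gly-X-Y triplet (glycine at every third position), and the function name `longest_gxy_run` and the toy peptides are invented for the example.

```python
import re


def longest_gxy_run(peptide: str) -> int:
    """Return the longest run of consecutive Gly-X-Y triplets in a peptide."""
    # The lookahead measures a run from every start position, so a short
    # run in one frame cannot mask a longer run in a neighbouring frame.
    pattern = re.compile(r"(?=((?:G[A-Z][A-Z])+))")
    runs = [len(m.group(1)) // 3 for m in pattern.finditer(peptide.upper())]
    return max(runs, default=0)


# Toy peptides invented for illustration; not real bacterial sequences.
collagen_like = "MKT" + "GPP" * 5 + "AAA"
frame_shifted = "GAGPPGPPGPP"

print(longest_gxy_run(collagen_like))  # 5 triplets in a row
print(longest_gxy_run(frame_shifted))  # 3 triplets, starting at the second G
```

A real screen would also need to account for composition bias and signal peptides; a long Gly-X-Y run alone is only a first hint of a collagen-like protein.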
In more recent times we have been particularly interested in protein toxins and other effectors deployed in intra- and inter-genomic and organismal conflict across life. These studies have also yielded several key clues regarding the bacterial contributions to the emergence of multicellularity among eukaryotes, including metazoans. Several such contributions have been described in our recent monograph on polymorphic toxin systems (click to refer) and will be outlined in a future post.
Tuesday, August 7, 2012
New direction on the function of GRAS proteins in gibberellin signaling
![]() |
| Gibberellin GA-1 |
![]() |
| Russeting in apples |
Our recent studies help clarify the situation. We showed that the GRAS family actually belongs to the Rossmann-fold methyltransferase superfamily. We established that the GRAS family first emerged in bacteria and that the plant versions represent a case of lateral gene transfer prior to the radiation of land plants. We further showed that all bacterial GRAS proteins, and a subset of the plant ones, are likely to function as small-molecule methylases, but the remaining plant members have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. Thus, based on sequence-structure analysis, combined with functional evidence, we predict that GRAS proteins might either modify or bind small molecules, which might include GAs or their derivatives.
Our results have thus falsified the previously published relationships that were proposed for the GRAS proteins, and more importantly throw a completely new spin on their mode of action in the context of GA binding or modification. One delicious possibility is that the active versions function as methylases that might modify certain GAs or their derivatives, whereas inactive versions act as GA-binding proteins (experimentalists take note). While a GA receptor belonging to the alpha/beta hydrolase superfamily has been described previously, the functional evidence suggests that not all aspects of GA signaling are channeled via that receptor. Hence, the possibility of direct interaction between a GA or its modified derivative and the GRAS methylase domain remains open and is a potentially important avenue for signaling. In addition, very little is known of the fate and prevalence of GA methylation, which is a mechanism of GA deactivation in angiosperms. The currently characterized GA methylases (GAMT1 and GAMT2) are also Rossmann-fold methylases. They belong to a radiation of plant methylases of ultimately bacterial origin, which includes enzymes that methylate carboxy, hydroxyl and amino groups in the synthesis of plant metabolites like caffeine, theobromine, methyl salicylate, and methyl jasmonate, among others. In Arabidopsis, these are primarily expressed in the siliques (fruits), including the seeds, and are believed to deactivate GAs via methylation and subsequent degradation during the maturation of seeds. One possibility is that such methylation-dependent control of GAs also occurs in other parts and other developmental processes via the action of GRAS family methylases. The possibility of the inactive versions of the GRAS proteins binding methylated or other modified GAs is also an avenue for possible functional studies.
It should be noted that our phylogenetic analysis (see figure above) suggests that the GRAS superfamily was delivered to plants via a single lateral transfer from bacteria prior to the diversification of land plants -- this ancestral plant GRAS protein underwent a lineage-specific expansion into 13 distinct well-supported clades that contained at least one representative from bryophytes, lycopodiophytes and angiosperms. At face value, assuming a direct GA-related role for the GRAS family, this would suggest that GA-like molecules were already functional in the early history of land plants. This clearly runs contrary to certain suggestions of plant evolutionists that GA-like molecules were absent in bryophytes like Physcomitrella, but supports recent experimental results suggesting a role for GA-like molecules in caulonema formation, growth direction of protonemata, and spore germination in these mosses (Hayashi et al.). Our findings suggest that the predicted small-molecule binding/modifying activity would extend to the base of land plants and could have bearing on the enigma of the role of GA-like molecules in basal land plants. For more details, you can read our paper here.
Wednesday, February 22, 2012
NAD, ARTs and ARGs: new players and biochemistries
Our studies have been steadily revealing the deep evolutionary connections between systems involved in cofactor, amino acid and secondary metabolite biosynthesis, and those involved in modifications of proteins and nucleic acids. For example, the origin of several eukaryotic enzymes that add or remove a methyl group on lysines and arginines in histones and other proteins can be directly traced to bacterial pathways involved in synthesizing peptide-derived antibiotics and siderophores (Click on numbers to read various papers : [1] [2]). In a similar vein, multiple components of the peptide ligation and deubiquitination pathways in the eukaryotic ubiquitin system show evolutionary relationships to enzymes involved in diverse bacterial biosynthetic systems for cofactors (thiamine and molybdopterin), siderophores, antibiotics and the amino acid cysteine (click numbers to read papers : [3] [4] [5]). Enzymes catalyzing other major forms of peptide tagging of proteins in eukaryotes, e.g. protein polyglutamylation, polyglycination and tyrosinylation also display evolutionary connections to peptide ligases involved in diverse prokaryotic pathways for the biosynthesis of various antibiotics, the amino acid lysine and cofactors like peptidylated tetrahydrosarcinapterin (a folate-like pterin derivative) and F420 (a flavin-like molecule) (Click to access paper). The generality of this theme is further reinforced by the evolutionary links between enzymes catalyzing other forms of peptide tagging of proteins, such as pupylation and protein arginylation/leucylation, and enzymes mediating peptide-bond formation, respectively, in the synthesis of the peptide cofactor glutathione, and a variety of compounds, such as peptidoglycan and peptide-modified lipids (Click numbers to access papers: [7] [8]). 
Thus, the ultimate origin of numerous enzymes involved in covalent modifications of proteins and nucleic acids, particularly in eukaryotic regulatory systems, can be linked to enzymes catalyzing similar reactions in bacterial biosynthetic systems specializing in the production of cofactors, amino acids and metabolites such as antibiotics, siderophores and cell-cell communication molecules.
We now consider the links between the biosynthetic and regulatory pathways centered on the ancient and ubiquitous metabolite, nicotinamide adenine dinucleotide (NAD) or its phosphorylated derivative NADP. NAD fits particularly well into the above-discussed patterns because it is both a cofactor for numerous enzymes and a substrate for numerous protein- and nucleic acid-modifying reactions. As a cofactor it functions as one of the central redox molecules or hydrogen-carriers in the cell for reactions catalyzed by several diverse oxidoreductases, usually of the Rossmann fold. As a substrate in protein and nucleic acid modification it supplies the ADP ribose moiety for modification of the side chains of amino acids such as glutamate, glutamine, lysine, asparagine, cysteine, arginine and diphthamide (a modified histidine), and of guanine in DNA. The most common superfamily of enzymes that catalyze such reactions unites the ADP ribosyltransferases (ARTs), which catalyze the transfer of a single ADP ribose moiety to a target molecule, and polyADP ribose polymerases/polyADP ribose transferases (PARPs/PARTs) that transfer multiple such moieties to form branched or straight chain ADP ribose polymers. A nucleic acid-modifying ART is the RNA 2’ phosphotransferase KptA/Tpt1, an RNA-repair enzyme that transfers the 2’ phosphate, which is generated as a result of tRNA splicing and RNA ligase action, to NAD, resulting in the generation of ADP-ribose 1”-2” cyclic phosphate (Appr>p) and release of nicotinamide. The rifamycin ART, which is related to the above RNA-processing enzyme, instead inactivates the antibiotic by ADP ribosylation of a hydroxyl group on its carbon.
In recent years there has been tremendous progress in terms of structural and biochemical understanding of ARTs, PARTs, sirtuins, MACROs and several NAD biosynthesis enzymes. There have also been several efforts in terms of sequence analysis leading to the discovery of novel ART superfamily enzymes, and tremendous interest in the connections between NAD metabolism and the dynamics of heterochromatin formation, especially in the context of organismal aging. Our comparative genomic and sequence analyses of NAD-utilizing and synthesizing enzymes have led to the identification of a novel enzymatic fold that appears to have supplied multiple distinct families of proteins implicated in NAD/ADP ribose metabolism in diverse contexts. Using contextual analysis we show that some of these proteins potentially act in the context of RNA repair, where NAD is used to remove 2'-3' cyclic phosphodiester linkages. Likewise, we uncover novel NAD-dependent protein ADP-ribosylation systems involving novel ADP-ribosyltransferases. Some of these are type-II toxin-antitoxin-like systems with ART and different ribosylglycohydrolase enzymes, analogous to the DraG-DraT system. We present evidence that some of these TA-like systems are likely to regulate certain restriction-modification enzymes in bacteria. We also show that eukaryotic relatives of such ARTs constitute a novel family typified by NEURL4. This leads to a key prediction that ADP-ribosylation of specific proteins in conjunction with ubiquitination might be a critical step in centrosomal assembly. Other ARTs represent a novel group of bacterial polymorphic toxins deployed by contact, T6SS and T7SS/Esx. The ADP-ribosyltransferases found in these bacterial polymorphic toxin systems and in the host-directed toxin systems of bacteria such as Waddlia also throw light on the evolution of this fold and the origin of eukaryotic polyADP-ribosyltransferases.
We also infer a novel biosynthetic pathway that might be involved in the synthesis of a nicotinate-derived compound in conjunction with an asparagine synthetase and AMPylating peptide ligase. This work has also yielded some additional novel domains involved in NAD metabolism. To read the paper, click here.
Monday, February 13, 2012
How are nucleosomes differentially repositioned?
A recent discovery by us has helped identify a common denominator that defines the structural basis for nucleosomal repositioning by the ISWI clade of SWI2/SNF2 ATPases.
One feature that sets eukaryotes apart from other forms of life is the presence of multiple essential SWI2/SNF2 ATPases that are at the center of several functionally distinct chromatin remodeling complexes. Our earlier studies had suggested that the SWI2/SNF2 ATPases were probably introduced to eukaryotes from a restriction-modification system of bacterial provenance, wherein they probably facilitated the access of target sites by restriction/modification enzymes (click here to read). We also established that a spectacular radiation of SWI2/SNF2 ATPases, which happened in the period between the first eukaryotic common ancestor and the last eukaryotic common ancestor, spawned the clades of most major chromatin remodeling SWI2/SNF2 ATPases (click here to read).
In functional terms the characterized chromatin remodeling SWI2/SNF2 ATPases can be divided into three broad classes: 1) Those utilizing actin-like proteins. This class might be further divided into those which associate with the Reptin/pontin AAA+ ATPases, i.e. the INO80-like class, and those which associate with SWIRM domain-containing subunits, i.e. the Brahma-like class. 2) The CHD/MI-2 like remodelers. 3) ISWI remodelers. All these classes can be traced to the last eukaryotic common ancestor. Of course, beyond these there are the Rad54-like, Rad5-like and Strawberry notch-like versions, which are much less understood (see this for a detailed classification of the SWI2/SNF2 ATPases). Of these, the Brahma-like remodelers may slide or eject nucleosomes from chromatin. The INO80-like remodelers include versions that facilitate exchange of canonical nucleosomes with those containing H2A.Z, promoting transcriptional activation by facilitating transcription start site exposure. The CHD/MI-2 like remodelers also tend to slide or eject nucleosomes in both repressive and activating contexts. The ISWI-like remodelers, which are the focus of this post, are unique in regulating nucleosome spacing – they might either optimize it (e.g. ACF and CHRAC) to facilitate repression or randomize it to facilitate transcriptional activation.
Several effects have been attributed to the ISWI-like complexes: In Drosophila the loss of dACF1 reduces nucleosome spacing periodicity and shortens the length of DNA per nucleosome. Loss of ISWI in Drosophila results in major decondensation of the male X chromosome and to some degree also the polytene chromosomes. The WICH complex, which combines an ISWI ATPase with the WSTF protein containing the WAC domain tyrosine kinase, phosphorylates tyrosine 142 of H2A.X in the course of nucleosome repositioning during DNA repair. In vertebrates several distinct ISWI-like complexes have been identified: 1) ACF; 2) CHRAC; 3) WICH; 4) NoRC; 5) WCRF; 6) CECR2-embryonic stem cell/germline; 7) CECR2-somatic cell; 8) NURF. Of these, the first six, including the CECR2-embryonic stem cell/germline complex, have SNF2H as the ISWI ATPase, whereas NURF and the CECR2-somatic cell complex have SNF2L as their ATPase subunit. These complexes have been shown to have biological roles by mediating different nucleosomal repositioning events. Prior experiments have demonstrated that their accessory subunits have a role in sensing linker DNA and thereby possibly regulating nucleosomal spacing (Click here to read). However, it remained unknown how exactly this was achieved.
It was in this context that we were able to use sequence analysis and comparisons with known structures to develop a unified mechanism (Click here to access the paper). First, using sequence profile searches we were able to unify all the large accessory subunits of ISWI ATPases across eukaryotes, such as hACF1, WSTF, RSF1, TIP5, WCRF180, BPTF, yeast Itc1, Ioc3 and Esc8, and the plant HB1 and MBD9, as having a common conserved module. This module is largely alpha helical and is characterized by four conserved motifs. The first of these motifs maps to the previously identified DDT motif (which, however, was previously not known from Ioc3); the remaining three motifs are termed the WHIM motifs 1 to 3. Recently, a remarkable structural study by the Richmond group revealed that Ioc3 interacts with the C-terminus of the ISWI ATPases, which is characterized by a HAND, a SANT and a SLIDE domain. These interact with nucleosomal linker DNA and Ioc3. Ioc3 in turn also interacts with nucleosomal linker DNA and, together with the C-terminal region of the ISWI protein, constitutes a protein ruler that measures out the spacing between two adjacent nucleosomes in a dinucleosome (Click here to read). What our sequence- and structure-based unification did was to generalize the findings developed from Ioc3 across all large accessory subunits of ISWI ATPases. As a result we were able to show that the DDT and the WHIM1 and WHIM2 motifs tightly pack with each other to form a binding pocket for the trihelical tip of the SLIDE domain in the ISWI ATPase. Based on this mapping, the highly conserved basic residue in WHIM1 is identified as a key feature involved in packing with the DDT motif, and the acidic residue from the GxD signature of WHIM2 emerges as a major determinant of the interaction between the ISWI and its WHIM motif partners. WHIM3, on the other hand, along with the N-terminal portion of WHIM2, constitutes the inter-nucleosomal linker DNA binding site, which contacts the DNA in the major groove.
This is the major recognition unit for the outer or the external linker DNA element of the dinucleosome. The helix-turn-helix SANT domain from the ISWI ATPase makes a similar DNA contact with the inner linker DNA element in the dinucleosome. Thus, the principle of the protein ruler is a common feature of all ISWI large accessory subunits that is determined by the DDT and WHIM motifs.
Second, most of these proteins have multiple domains for the recognition of histone H3 N-terminal peptides (PHD finger), acetylated histone peptides (bromodomains), monoubiquitinated peptides (the “little finger” type Ub-binding Zn-ribbon), phosphorylated peptides (SJA/FYR) and methylated peptides (AGENET, BMB/PWWP and AUX-RF, a novel Chromo-like domain). Additionally, others like HB1 and MBD9 in plants, BPTF, BAZ2A/B, CECR2 in animals, and previously uncharacterized proteins in chlorophytes and stramenopiles contain DNA-binding domains such as the HARE-HTH, histone H1, CENB-HTH, TAM(MBD), homeo, HMG, BRIGHT, CXXC and AT-hooks. Of these the TAM(MBD) domain in the plant MBD9 proteins is predicted to specifically bind methylated CpG dinucleotides, whereas that in the animal BAZ2 proteins is unlikely to have specific methylated CpG recognition capabilities. The CXXC domain also recognizes the CpG sequence, though most versions prefer unmethylated targets. We have also proposed that the HARE-HTH has a possible role in discriminating modified DNA. Thus, it appears that a common theme in the WHIM motif proteins is their coupling of the measuring out of inter-nucleosomal distance with diverse domains involved in discriminating or catalyzing epigenetic modifications of histones, recognition of specific DNA features such as inter-nucleosomal linker regions and distorted DNA (e.g., histone H1, HMG, BRIGHT domains and AT-hooks), or discrimination of modified DNA marks (CXXC, TAM/MBD and HARE-HTH). One group of WHIM motif proteins from certain chlorophyte, rhodophyte and stramenopile algae combines the WHIM motifs with an RFD module, which is found at the N-termini of the DNMT1 methyltransferase.
The RFD module consists of a circularly permuted version of the Sm domain fused to a HTH domain and has been demonstrated to be a key player in heterochromatinization by recruiting repressive proteins such as HDAC2. This suggests that these WHIM motif proteins might couple ISWI-dependent nucleosomal positioning with heterochromatin formation. Another interesting architecture seen in oomycetes combines the WHIM motifs with a Werner’s syndrome type DNA repair nuclease with 3'-5' exonuclease and HRDC domains, suggesting that in these organisms the ISWI-catalyzed chromatin repositioning might be directly combined with DNA repair.
In evolutionary terms the DDT-WHIM proteins and ISWI ATPases can be considered a synapomorphy of eukaryotes, suggesting that guided nucleosome positioning was a phenomenon that was already present in the last eukaryotic common ancestor. On the whole, the independent diversity of the domain architectures of paralogous ISWI accessory large subunits in several distinct eukaryotic lineages points to an important role for distinct nucleosome positioning patterns in facilitating different sets of biological processes. In particular, it would be of great interest to investigate the role of the lineage-specific expansion of the DDT-WHIM motif proteins in ciliates. Unlike animals or plants, which also show a multiplicity of DDT-WHIM motif proteins, these unicellular eukaryotes do not have differentiated tissues. But they show two functionally distinct types of nuclei – the transcriptionally active macronucleus being derived from the micronucleus following their sexual cycle. The macronucleus is characterized by drastic genomic rearrangements and lack of mitotic chromosome condensation and segregation. We suspect that the lineage-specific expansion of WHIM-DDT proteins in ciliates relates directly to the need for ISWI-dependent maintenance of particular nucleosomal positions in the macronucleus (Click here to access the paper). Our extensive supplement can also be accessed here.
Saturday, January 21, 2012
On the origins of the bacterial transcription apparatus
Many years ago, a little after the first RNA polymerase structures were solved, we obtained several remarkable insights into the core transcription apparatus of life. We were the first to show that the RNA polymerase subunits, cognates of the bacterial beta and beta-prime subunits, contain recognizable, evolutionarily conserved domains and that each of these subunits contributes a double-psi beta barrel domain to the active site. We also showed that the polymerase subunits accreted several other domains in a lineage-specific manner, which differ between the archaeo-eukaryotic and the bacterial subunits, and even within the bacterial versions. Our study also established the common origin of the RNA-dependent RNA polymerase involved in RNAi and the cellular DNA-dependent RNA polymerases (Click to access [Reference1] [Reference 2]).
Recently, we conducted a reanalysis of the bacterial transcription apparatus and from this study emerged several new insights that have refined or redefined our thinking on the origins of the transcriptional apparatus [Click to read]. Some of these new findings were discussed at greater length with a leading researcher in the field of transcription and a part of the correspondence is reproduced below as questions and answers.
Question: One of the new points uncovered in this study is the shared evolutionary ancestry of the archaeo-eukaryotic TFIIB and the bacterial sigma factor, based on structural similarity of the cognate HTH domains that interact with similar sites on the archaeo-eukaryotic and bacterial RNAP, respectively. Is the homology in any way reflected on the sequence level?
Answer: The simple answer to the question is yes -- using different sequence profile methods we can detect statistically significant sequence similarity between the TFIIB and Sigma HTHs. For example, a profile-profile comparison (e.g. with HHpred) of the HMMs of archaeal TFIIB orthologs with the Sigma70-like superfamily gives p=1.8e-6 and a probability of 86%, and there are many more such lines of support. In conclusion, there is no doubt about their evolutionary relatedness and descent from a common ancestor.
Question: Could the similarity between the transcription factor-RNAP interactions in the bacterial holo-RNAP and the RNAPII-TFIIB / RNAP-TFB complexes be a case of convergent evolution?
Answer: Several aspects of the interactions of the bacterial and archaeo-eukaryotic RNAPs are very likely to be convergent and we have no counter-argument in this regard. The main point is that sigma and TFIIB are orthologs despite being distantly related (which now seems likely to us).
Question: The current consensus in the field is that there are no real sigma homologs in archaea, or eukaryotes. It is argued that the LUCA RNAP could have initiated in a transcription factor-independent manner, and that the sigma and TFIIB/TFB-related factors emerged in evolution following the split of the bacterial and archaeo-eukaryotic lineages.
Answer (with a bit of history from LA): Many moons ago, in our early days of sequence analysis, we studied the HTHs in considerable depth. One thing that became clear was that all these HTHs, be it sigma or TFIIB, certainly shared a common origin (a view articulated in papers that we wrote several years later: pmid: 10556324 [Click to access] and another in 2005, pmid: 15808743 [Click to access]). As a result of these investigations it became clear that TFIIB (of course including TFB)/cyclin/RB and sigma are *real homologs*, but throughout that period the issue remained as to whether they were *real orthologs*, the reason being that many other basal HTHs also show significant similarity to each of them. Of course, we could rule out things like the TFIIE wHTH and MBF-like 4-helical HTHs from contending for ortholog-hood with sigma because they belong to different lineages of HTHs that have their own clear-cut bacterial cognates. But sigma remained unclear. In the course of the above-mentioned papers, I took the stance that sigma and TFIIB, while being genuine homologs, were independent recruitments as basal TFs which interacted with the RNAP. But since 2005 we have had an opportunity to understand RNA polymerase evolution better, using the template of our earlier studies on these proteins (12553882 [Click to access], 15194191 [Click to access]) aided by the various versions from diverse selfish elements that offered potential evolutionary intermediates. It became clear that they began as RNAPs that could have initiated transcription factor-independently, especially given that they lacked any special adaptation to interact with TFs or had inbuilt HTH domains that might have substituted for the TF. But the beta cognates of the RNAPs of cellular life were unified by one striking synapomorphy in the form of the insertion of the SBHM within the catalytic DPBB domain, which could not have arisen by convergence.
The emergence of this insert would indicate the emergence of interactions with a DNA-bound TF, as it plays this role in all three superkingdoms of life and is absent in the RdRP-like RNA polymerases (e.g. YonO) and RNAPs of selfish elements such as the NCgl1702-type RNAPs. This, taken together with the homology of sigma and TFIIB, and the fact that they have double HTHs, made us reconsider our former position and accept the simpler explanation of sigma and TFIIB being orthologs, albeit distant in sequence. Of course this divergence in sequence is not surprising with a lot of independent action happening around them, such as the emergence of TBP in the archaeo-eukaryotic lineage.
Question: If the primordial ribozyme RNAP evolved into the extant multisubunit RNAP by recruiting a dimeric DPBB protein cofactor which usurped the active site, and over time increased the subunit complexity to result in the extant multisubunit RNAP, where does that leave the single subunit enzymes? Did they emerge later, earlier, or at the same time? Different members of extant single subunit nucleic acid polymerases have all these activities (RNAPs, DNAPs, RT etc.). Assuming that they would have predated multisubunit RNAPs, when did the change of guard occur, and for what functional reasons/selective advantages?
Answer: Currently we can list the following major independent inventions of RNA polymerase activity:
- Within the RRM-like fold or the classical palm-containing polymerases: 1.1) the RNA viral RdRPs; 1.2) the THG1 (5'->3')-CRISPR-like RNA polymerases (at least some are RdRPs); and 1.3) the phage T7-like RNAPs.
- Within the RRM-like fold with a flange: 2) archaeo-eukaryotic type primases.
- Within the TOPRIM fold: 3) DnaG-like primases.
- Within the pol-beta fold: 4) CCA-adding enzyme-poly A polymerase-like.
- Within the DPBB fold: 5) the double barrel RdRPs and DdRPs.
While there were many inventions of RNA polymerases, the following observations seem to hold: The RNAPs in group 1.1 are the main replicative enzymes that replicate RNA in independent replicons. While the double-psi barrel RdRPs replicate small RNAs in the eukaryotic RNAi system, there is currently no evidence for them being dominant replicative enzymes of large replicons. The RdRPs in group 1.1 are further closely related to the replicative reverse transcriptases, which appear to have a single origin. On the other hand, representatives from 1.1, 1.3, 2, 3 and 5 can be associated with replication in the context of the synthesis of the RNA primer for DNA replication. Additionally, the primpols from group 2 can replicate DNA after initiating it with an RNA polymerase activity for priming. We are of the opinion that RNA was indeed more likely the primary nucleic acid (supported by: 1) its catalytic and replicative capacity; 2) its association with polypeptide templating; and 3) the priming problem making DNA a difficult starting genome). This conclusion, combined with the above observations regarding the RNAPs of group 1 and the relationship to RTs, leads us to propose that the polymerases from group 1.1 were the first to emerge. They enabled the rise of DNA genomes with the origin of reverse transcribing ability as they radiated. The emergence of DNA in turn offered a new niche for RNA polymerases due to the priming problem. This selective force appears to have resulted in the emergence of multiple RNA primer synthesizing enzymes (early representatives of 1.3, 2, 3 and 5), as evidenced by the above observations. Even at this stage it is possible that there was a reverse transcribing intermediate in replication, which also helped solve the transcription problem for DNA replicons. The rise of large DNA replicons appears to have placed the pressure for transcription-specific RNAPs.
This unique niche appears to have favored two major groups of RNA polymerases, 1.3 and 5, but in the lineage leading to the cellular replicons 5 seems to have dominated. We suspect that elements of the architecture of the double-psi beta barrel polymerases allowed them to be more effective transcription enzymes due to: 1) their ability to initiate transcription at internal sites independently of a replication origin signal, for which the other enzymes were optimized; 2) their offering interfaces for regulation, in particular the distinctive bihelical extension preceded by two extended segments forming a standalone hairpin in beta-prime. The latest analysis of the evolution of double-psi beta barrel RNAPs suggests that they too began as a fusion of two DPBBs in a single polypeptide followed by a split prior to LUCA.
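As a compact summary of the answer above, the listed inventions of RNA polymerase activity can be held in a small lookup table. This is only a bookkeeping sketch: the dictionary layout, field names and helper function are illustrative assumptions, while the group labels, folds and group memberships are taken from the text.

```python
# Independent inventions of RNA polymerase activity, keyed by the group
# labels used in the answer above. Field names are illustrative only.
RNAP_GROUPS = {
    "1.1": {"fold": "RRM-like (classical palm)", "members": "RNA viral RdRPs"},
    "1.2": {"fold": "RRM-like (classical palm)", "members": "THG1/CRISPR-like RNA polymerases"},
    "1.3": {"fold": "RRM-like (classical palm)", "members": "phage T7-like RNAPs"},
    "2": {"fold": "RRM-like with a flange", "members": "archaeo-eukaryotic type primases"},
    "3": {"fold": "TOPRIM", "members": "DnaG-like primases"},
    "4": {"fold": "pol-beta", "members": "CCA-adding enzyme/poly A polymerase-like"},
    "5": {"fold": "DPBB", "members": "double barrel RdRPs and DdRPs"},
}

# Groups the text associates with synthesis of the RNA primer for DNA replication.
PRIMER_ASSOCIATED = {"1.1", "1.3", "2", "3", "5"}

# Groups the text describes as favored for transcription of large DNA replicons.
TRANSCRIPTION_FAVORED = {"1.3", "5"}


def groups_by_fold(groups):
    """Return a mapping from fold name to the list of group labels using it."""
    by_fold = {}
    for label, info in groups.items():
        by_fold.setdefault(info["fold"], []).append(label)
    return by_fold


if __name__ == "__main__":
    for fold, labels in groups_by_fold(RNAP_GROUPS).items():
        print(fold, "->", ", ".join(labels))
```

Laid out this way, it is easy to see that both groups favored for transcription are also among those linked to primer synthesis, which is consistent with the scenario in which transcription-specific RNAPs arose from priming enzymes.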
Question: Could these accretions have been responsible for improved regulatory potential or higher fidelity? In that context it is noteworthy that no single subunit RNAP can 'backtrack' and undergo transcript cleavage.
Answer: The addition of subunits, basal TFs, SBHMs and other domains does clearly point in the direction of continuous evolution favoring higher fidelity and regulatory potential. In particular it might have helped provide robustness to this central cellular system in the face of mutational "attack" (over-engineering). The last point of the question is of note and might have been a selective force in the later evolution of the RNAPs.
Question: Since the RNAPs are predicted to have their origins as ribozymes and to have gone through an RNA-protein stage, why is the ribosome apparently slower in losing its RNA components, as compared to nucleic acid polymerases?
Answer: First, regarding the ribosome, where the RNA plays a role in peptidyl transfer: We have recently extensively studied the emergence of peptide bond forming activity in protein enzymes (pmid: 20023723 [Click to access], 20678224 [Click to access]). There were at least 11 independent inventions of peptide ligase activity, but an examination of each of these suggests that they are unable to handle the reaction in an amino acid independent manner. This inability of the protein peptide ligases might have allowed the RNA to persist. Further, a look at the other ancient ribozyme, RNase P, suggests that shape-selective recognition of nucleic acid structures, a feature it shares with the ribosomal RNAs, might be a key factor that cannot be entirely reproduced by proteins. In these cases the ribozymes certainly would persist. Further, RNA is also a better scaffold than proteins in certain contexts and it continues to be used as such in contexts like the eukaryotic Polycomb RNAs and HOTAIR. So, we do not see a need for RNA to be displaced in every case. Our original ribozyme displacement hypothesis was based on observations like: 1) several of the ancient enzymes are homologs of non-enzymatic ancient domains that bind RNA, and 2) in cases like RNase P, the protein component increases the catalytic rate of the ribozyme by potentially increasing local affinity for the metal ion. This offers a pre-adaptation for the protein acquiring metal-binding dependent catalysis. Now, given the new information on the evolution of the double-psi beta barrel RNAPs, it appears that the RdRP activity might be a secondary innovation. Hence, it is conceivable that the DPBB domains were merely nucleic acid binding cofactors in an already protein-dominant world and that their associated nucleic acid might not have had any catalytic activity. It is becoming increasingly likely that an RNA-only world (i.e. independent of proteins) was probably never there and that early RNAs at best had restricted catalytic capabilities in the RNA world. It is even possible that right from the beginning the basic reciprocal catalytic cycle involved early RNAs catalyzing peptide-bond formation and protein synthesis (precursor of the ribosome) and the proteins in turn catalyzing the formation of the phosphodiester bond and RNA synthesis.
Question: What about the evolutionary origins of the TBP fold, and of TBP itself? The single fold itself can be found in RNaseHIII and DNA glycosylases but it has not been demonstrated to mediate any direct interactions with DNA or RNA; that emerged later, with TBP in the archaeo-eukaryotic lineage. What happened before that: did the LUCA RNAP initiate in a TFIIB/sigma-dependent manner?
Answer: TBP belongs to the larger helix-grip fold (pmid: 11276083 [Click to access]) that includes proteins with various binding capabilities. When we first showed the relationship between TBP and the RNAseHIII N-terminal domain in 2001 (pmid:11582786 [Click to access]), it was the closest to TBP within the helix grip fold. However, since then we found another member of the fold, CCTBP that is as related as the one in RNAseHIII to TBP (PMID: 19089947). Both these are much closer to TBP than the version in the DNA glycosylases. Hence, the evolution of TBP is to be understood in the context of these related domains. Of these the CCTBP is involved in sulfotransfer along with ubiquitin like proteins. The evidence does suggest that the RNAseHIII TBP domain might interact with DNA-RNA hybrid molecules. Hence, it appears that during the radiation of the TBP family it acquired very distinct activities, but the one associated with primer degradation or RNA-based DNA restriction is a more likely candidate for precursor of TBP the basal TF than CCTBP, which is associated with distinct metabolic activities. However, this might change if a nucleic acid binding activity is demonstrated for the CCTBP domain.
Saturday, October 29, 2011
A mystery pathway in prokaryotes
Computational studies of proteins have greatly contributed to our understanding of the biology of a species or a system. In many instances, computational analyses have solved tricky biochemical problems (e.g. the biochemistry of pupylation), or have uncovered unexpected systems or pathways (e.g. the prokaryotic cognates of the eukaryotic ubiquitin pathway), or solved long-standing mysteries (e.g. the principal transcription factors of apicomplexa), or clarified difficult evolutionary problems (e.g. the extent of lateral transfer between prokaryotes, the evolutionary origins of the AID/APOBEC deaminases). Yet there are instances when the biochemistry of most parts of a system is easily identifiable, but the biology remains an unsolved puzzle. Recently, we uncovered one such widespread system present in most lineages of proteobacteria, actinobacteria, spirochaetes, cyanobacteria, chlamydiae and chloroflexi and also some crenarchaea. As the system is present in Mycobacterium tuberculosis, we shall use the Mycobacterial gene names as representative identifiers. The basic system consists of:
- Rv2410c (DUF403 in Pfam 25): An alpha-helical protein, called Alpha-E, that contains an internal duplication with each repeat possessing conserved ER motifs. Click here to access a multiple alignment.
- Rv2411c (split as DUF404+DUF407 in Pfam 25): A circularly permuted peptide ligase of the ATP-grasp fold.
- Rv2409c, Rv2569c: Transglutaminases that could serve either as a peptidase or a classical transglutaminase.
- Rv2568c (DUF2248 in Pfam 25): A metallopeptidase-family peptidase.
- Rv2567: An inactive circularly permuted ATP-grasp fused to the Alpha-E domain.
- Rv2566 (Transglut+DUF2126 in Pfam 25): A transglutaminase fused to a circularly permuted peptide ligase of the COOH-NH2 ligase superfamily.
- Some species (not Mycobacterium) additionally contain in the gene neighborhoods an NTN hydrolase related to the proteasomal peptidase (called Anbu in one study) and amidotransferases of the GAT-I family. Click here to access all operons.
Thus, these systems together include two active peptide ligases, 5 distinct types of peptidase-like proteins (2 transglutaminases, a Zincin-like metallopeptidase, the GAT-I domain and an NTN peptidase), the mystery Alpha-E protein and an inactive peptide ligase that may be fused to the mystery Alpha-E domain. In any case all systems minimally contain at least one peptide ligase, the Alpha-E protein and one peptidase-like domain. The only evidence for its biological context comes from experiments in Pseudomonas putida where the transglutaminase is highly expressed upon nitrogen starvation. Several protein/peptide conjugation systems contain peptide ligases (e.g. the ubiquitin transferring enzymes, the Pup ligases) as well as deconjugating enzymes (e.g. JAB deubiquitinase and Dop depupylase) in the same gene context (for a comprehensive set of examples, read our paper on amidoligases).
However, assembling the pieces of the puzzle together, we can be sure of a few things:
- This is not involved in amino acid or glutathione biosynthesis. The species containing this system typically have intact pathways for glutathione or amino acid biosynthesis. Also there are no other genes suggestive of metabolic function in the neighborhood.
- It is not involved in the biosynthesis of a distinctive secondary metabolite such as an antibiotic or siderophore, for it lacks characteristic associations seen in these systems (see examples in our study of such systems).
- There is no evidence of a small protein that is conjugated to a target as in ubiquitination or pupylation.
![]() |
| Gene neighborhoods of the novel system described in this post |
Thus the system appears to be a novel peptide transfer/peptidase system with the Alpha-E protein playing a central role. We postulate that the ATP-grasp and COOH-NH2 ligase in this system catalyze two distinct peptide bond formations. It is tempting to speculate that the Alpha-E protein with the highly conserved ER motifs serves as a substrate for elongation of a peptide via the gamma-carboxylate of its glutamate side chain. This proposal is consistent with the use of glutamate side chains as substrates in eukaryotic proteins such as tubulin by peptide tagging ATP-grasp enzymes. The presence of two peptidase genes in most of these operons suggests that two successive peptidase reactions are necessary for removal of the peptide product.
Alternatively, the transglutaminase superfamily protein might indeed function in cross-linking the peptide to lysine side chains or other amino groups. Thus, the weight of the contextual evidence supports a role for this widespread conserved gene-neighborhood in peptide synthesis; the resulting peptide could be added as a tag to the unique Alpha-E protein in this system. Such a tag could either regulate the assembly of complexes of the Alpha-E domain protein via cross-linking or its interactions (e.g. as in tubulin) or serve as an amino acid storage mechanism. Yet, as you can see, certain details of this interesting pathway are in need of further investigation, but its widespread presence suggests that an important and exciting piece of biology awaits creative experimentalists...
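The minimal composition stated in this post (at least one peptide ligase, the Alpha-E protein and one peptidase-like domain) can be expressed as a simple membership check over domain annotations. The sketch below is illustrative only: the domain label strings, the helper names and the two example gene sets are assumptions made for demonstration, not operon boundaries taken from our data, and the inactive ATP-grasp of Rv2567 is deliberately not counted as a ligase.

```python
# Domain annotations per gene, transcribed from the list in this post.
# The label strings are illustrative, not database identifiers.
GENE_DOMAINS = {
    "Rv2410c": {"Alpha-E"},
    "Rv2411c": {"ATP-grasp"},
    "Rv2409c": {"transglutaminase"},
    "Rv2569c": {"transglutaminase"},
    "Rv2568c": {"metallopeptidase"},
    "Rv2567": {"inactive ATP-grasp", "Alpha-E"},
    "Rv2566": {"transglutaminase", "COOH-NH2 ligase"},
}

# Only catalytically active ligases count toward the minimal system.
ACTIVE_LIGASES = {"ATP-grasp", "COOH-NH2 ligase"}
PEPTIDASE_LIKE = {"transglutaminase", "metallopeptidase", "GAT-I", "NTN peptidase"}


def domains_of(genes):
    """Collect the union of domain labels over a list of gene names."""
    found = set()
    for gene in genes:
        found |= GENE_DOMAINS[gene]
    return found


def is_minimal_system(genes):
    """True if the genes supply a ligase, Alpha-E and a peptidase-like domain."""
    domains = domains_of(genes)
    has_ligase = bool(domains & ACTIVE_LIGASES)
    has_alpha_e = "Alpha-E" in domains
    has_peptidase = bool(domains & PEPTIDASE_LIKE)
    return has_ligase and has_alpha_e and has_peptidase


if __name__ == "__main__":
    # Hypothetical groupings chosen only to exercise the check.
    print(is_minimal_system(["Rv2409c", "Rv2410c", "Rv2411c"]))
    print(is_minimal_system(["Rv2567", "Rv2568c"]))
```

The second gene set fails the check because its only ATP-grasp is the inactive version fused to Alpha-E, which illustrates why the active and inactive ligases need to be annotated separately.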
at 7:17 PM
Labels: Alpha-E, ATP-grasp, COOH-NH2 ligase, peptidase, peptide tagging, pupylation, Ubiquitin
Bacterial O-antigens, capsules, and cell-surface polysaccharides: not just all-sugar
You have probably heard of Escherichia coli O104:H4, which caused a devastating outbreak of an enterohemorrhagic disease in many European countries this year. Did you ever wonder what the O and H in the name represent? In the pre-genome sequence era, enterobacteria were usually distinguished based on the type of their polymorphic surface antigens by a process called serotyping. In this, antibodies that specifically recognized a distinct type of surface antigen were used to identify the bacterial serotype. This was an extraordinarily successful tool in epidemiological studies. In enterobacteria, the polymorphic surface molecules are typically a surface lipopolysaccharide (O-antigen), flagellar proteins (H-antigen) and/or the capsular polysaccharide (K-antigen). Thus the O104:H4 in the E. coli strain name refers to the type numbers of the O and H antigens respectively. E. coli has about 700 serotypes combined from some 180 O-antigens, 70 K-antigens and 54 H-antigens. Salmonella has about 2500 serotypes! Below we highlight a new twist to the O-antigen structure that we recently uncovered in our study on peptide ligases.
Let us study the lipopolysaccharide (LPS), of which the O-antigen is a component, in some more detail (see figure below). The LPS is comprised of four components: 1) Lipid A, a lipid anchor that forms the outer monolayer of the outer membrane and anchors the LPS; 2) an inner core composed of characteristic sugars such as Kdo (3-deoxy-D-manno-oct-2-ulosonic acid) and a heptose; 3) an outer core typically containing hexose sugars; and finally 4) the O-antigen repeats that exhibit variations in the type and arrangement of the sugar residues within the O-unit of LPS (see figure below). Some O-antigens have repeats of 3-5 sugar units, others are branched with 4-6 sugar units. Also present are unusual sugars only seen in these surface antigens. The number of such repeats also varies greatly (see the O-antigen database). Estimates suggest that there are about a million LPS molecules sticking out from the outer membrane per E. coli cell. The variations are a means for the bacterium to escape the surveillance of the host immune system and function as a virulence factor. Additionally, the antigens might vary to avoid bacteriophages that target the O-antigen for attaching and invading the bacterial cell. The genes involved in the biosynthesis of the O-antigen are present in a large gene cluster and not unexpectedly show great variations between various O-antigen types. Many of these are involved in the biosynthesis and export of the sugar units in the LPS.
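The naming convention just described is regular enough to read mechanically. The following is a minimal sketch, assuming only the simple form in which numbered O, K and H antigen types are joined by colons; the regular expression and function name are hypothetical, and real serotype designations include variants that this pattern does not cover.

```python
import re

# Simple serotype names: an O type, then optional K and H types,
# each written as a letter followed by a number and separated by colons.
SEROTYPE_RE = re.compile(r"^O(?P<O>\d+)(?::K(?P<K>\d+))?(?::H(?P<H>\d+))?$")


def parse_serotype(name):
    """Split a serotype such as 'O104:H4' into its antigen type numbers."""
    match = SEROTYPE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized serotype name: {name!r}")
    # Keep only the antigens that are actually present in the name.
    return {antigen: int(number)
            for antigen, number in match.groupdict().items()
            if number is not None}


if __name__ == "__main__":
    print(parse_serotype("O104:H4"))
```

For the outbreak strain this yields O type 104 and H type 4, with no K type recorded because none is given in the name.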
![]() |
| O-antigen structure (from Raetz and Whitfield) |
In a recent study, we noticed a somewhat unexpected presence in these gene neighborhoods: peptide ligases. The proteins encoded by the E. coli/Shigella wfdG and wfdR O-antigen cluster genes (incorrectly labeled as glycosyltransferases) are members of the ATP-grasp superfamily of peptide ligases. Members of this family are present widely across bacteria, e.g. firmicutes, actinobacteria, proteobacteria, spirochaetes, bacteroidetes, fusobacteria and cyanobacteria. Interestingly, they are also present in the capsular biosynthesis locus of Streptococcus pneumoniae (e.g. wcyv). In general, this family of peptide ligases is combined with genes that encode proteins involved in biosynthesis of cell surface polysaccharides. In some instances members of this family are fused to other domains such as glycosyltransferases and the capsular biosynthesis-type PP2A-fold phosphatases. Often these neighborhoods encode multiple paralogous copies of ATP-grasps (access the operons here). Pioneering studies in Proteus and Providencia (e.g. Kocharova et al. and Kondakova et al.) have shown that sugars of the cell surface O-antigen are further aminoacylated by D- and L-aspartic acid residues. Given the presence of ATP-grasp genes in these operons, we predict that they would catalyze the ligation of amino acids to sugar moieties in these polymers, as observed in these studies.
One other cell surface polysaccharide with known sugar-amino acid conjugates is teichuronopeptide, a highly acidic copolymer of glucuronic acid and amino acids such as glutamate that contributes to alkaliphily of organisms such as Bacillus halodurans. Experimental studies by Aono had implicated the TupA gene in the biosynthesis of this product, but the mode of action was not understood until we unified TupA to the same family of ATP-grasps (TupA-like) present in the O-antigen and capsule biosynthesis loci. We predict that this is the ligase required for synthesis of the polyglutamate portion of the teichuronopeptide. The teichuronopeptide synthesis locus additionally contains three paralogous ATP-grasp genes (see operons here). A comparable combination of gene neighborhoods is also seen in alkali-resistant bacteria such as Dethiobacter alkaliphilus and Oceanobacillus, and the polycyclic aromatic hydrocarbon degrading Mycobacterium sp. JLS. This suggests that the teichuronopeptide-like polymer might have been an important solution to the problem of high alkaline or salt conditions. The lateral transfer of this neighborhood might have been important in the emergence of alkali resistance in various distantly related bacteria.
![]() |
| Teichuronopeptide unit |
at 1:23 AM
Saturday, September 3, 2011
The remarkable story of the mutagenic AID/APOBECs and other deaminases
The nucleotide and nucleic acid deaminases such as CDD1, ADAR, TadA/Tad2, AID, APOBEC and DYW catalyze deamination of nucleotide or nucleic acid bases in a wide range of contexts. Of these, some of the most remarkable are the nucleic acid deaminases such as AID/APOBEC, ADAR and DYW, which modify bases in contexts such as organellar RNA editing, hypermutation of viruses and the generation of hypervariability in proteins involved in adaptive immunity. One characteristic of these that was most mystifying was the limited phyletic distribution of some families and their rapid evolution. In a comprehensive sequence-structure analysis of the deaminase superfamily, we now uncover several aspects that were previously unclear, including the overall history of the fold with respect to protein superfamilies such as the JAB peptidases. We report several new families, of which the most remarkable are deaminases that serve as toxins in bacterial polymorphic toxin systems. Several new and interesting candidates in eukaryotes are also identified. Watch this space for key highlights. For now you can read the paper. Click here to access the paper. There is an extensive supplement available here.
Monday, July 4, 2011
Do bacterial pathogens secrete protein methylases to modify eukaryotic chromatin?
Figure: Hartmannella grasping a Legionella (source: Microbe World).
In eukaryotes members of the DOT1 family rather exclusively modify H3K79, and processively methylate it to give rise to mono-, di-, and trimethylated forms. DOT1-catalyzed methylation is rather distinctive in that it is a modification that targets a residue right within the globular histone fold, rather than lysines in low-complexity tails. Unlike the histone methylations catalyzed by the PRMT family, methylation catalyzed by the DOT1 family appears to have a predominantly negative effect on gene expression across eukaryotes. Studies in mammals indicate that DOT1 is part of a large protein complex, including two pairs of paralogous proteins, all of which give rise to fusion proteins arising from chromosomal translocations in mixed lineage leukemia (MLL): (1) ENL and AF9/MLLT3, both similar to TAF14, with an N-terminal YEATS domain and a C-terminal BrC domain and (2) AF17/MLLT6 and AF10/MLLT10, both with two N-terminal PHD fingers and C-terminal AT-hook motifs.
Studies in Saccharomyces cerevisiae, supported by studies in other eukaryotes, suggest distinct roles for the di- and trimethylated forms of H3 generated by Dot1, which occur on largely mutually exclusive sets of genes. H3K79me3 occurs predominantly within the gene body (i.e., protein-coding sequence), and is largely absent in promoters and intergenic regions. This form has been associated with genes that are transcriptionally less active, and is explicitly excluded from the nucleosomes associated with the most highly expressed genes: 50% of the genes generating just 1–4 mRNAs per hour are enriched in nucleosomes showing this modification, in contrast to just 2% of the genes giving rise to >50 mRNAs per hour. The increased processivity of DOT1 in catalyzing trimethylation appears to depend on prior monoubiquitination of histone H2BK123 by the Rad6/Bre1 ubiquitinating complex. Unlike H3K79me3 levels, which do not vary greatly over the cell cycle, H3K79me2 levels change significantly with the cell-cycle, being lowest in G1 and elevated during the G2/M progression. Further, H3K79me2 is not restricted to the gene bodies and is also seen in intergenic regions, including promoters. Moreover, the genes associated with this modification tend to be transcriptionally inactive during the G2/M phase, when its levels are elevated. In trypanosomes, which possess three DOT1 paralogs, two have been functionally characterized. The first, DOT1A, mainly catalyzes H3K79me2 formation in a cell-cycle dependent manner, whereas the other paralog DOT1B appears to be involved in subtelomeric gene-silencing associated with antigenic variation in trypanosomes. In mammals, DOT1 appears to regulate heterochromatin formation at telomeric and centromeric regions, consistent with the observations in yeast and trypanosomes. 
In addition to its role in silencing and heterochromatin organization, other observations suggest that DOT1 methylation regulates multiple aspects of DNA repair, such as base excision repair, Rad9-mediated checkpoint function, and negative regulation of the action of the translesion repair polymerases.
A big question to us has been when and how this big player in chromatin modification emerged in eukaryotes. With the exception of the basal eukaryotes Trichomonas and Giardia, DOT1 orthologs are present in all other major eukaryotic lineages for which genome sequences are available. However, within those lineages there are certain notable instances of gene loss—while the basal plant lineages such as the chlorophyte algae and lycopodiophytes have one or more DOT1 paralogs, they have been completely lost in the crown-group land plants such as angiosperms. Within animals and fungi, typically only a single DOT1 paralog is seen and they display a largely vertical pattern of evolution. However, in the caenorhabditiform nematodes there has been a notable lineage-specific expansion (LSE) of DOT1, with at least five paralogs in Caenorhabditis elegans. It seems important to study the potential functional compartmentalization of these newly emergent DOT1 versions in this organism. Phylogenetic analysis also suggests that the precursor of DOT1A and DOT1B in trypanosomes appears to have been acquired via lateral transfer from the animal lineage (click here to access a tree from our Supplementary material). Following this transfer, it appears to have acquired an N-terminal Zn-chelating domain with four conserved cysteines, and was then duplicated to yield two functionally distinct paralogs. In microbial eukaryotes such as chlorophyte algae, stramenopiles, apicomplexans, ciliates, and trypanosomes, there appear to have been multiple lateral transfer events that have disseminated DOT1 paralogs between distantly related lineages. Consequently, some of these eukaryotes have multiple DOT1 paralogs, with particularly notable complements of three or more paralogs seen in certain stramenopiles and trypanosomes (the third trypanosome DOT1 paralog is distinct from the previously studied DOT1A and DOT1B). 
This presence of multiple DOT1 paralogs is rather different from the situation seen in most animals and fungi, raising the possibility that some of them might have evolved distinct substrate specificities or may regulate H3K79 methylation in alternative signaling or developmental contexts.
Thus our studies raise two key issues:
1) While DOT1 seems to perform immensely important roles across the model eukaryotes, it was probably not present in the earliest branches of eukarya, and was acquired only after the separation of the parabasalids and diplomonads from other eukaryotes.
2) In many eukaryotes the multiplicity of DOT1s suggests that histone methylations catalyzed by the multiple paralogous forms might have a much richer contextual “meaning” than what is seen in model systems.
This leads to the question: So, after all where did DOT1 come from?
We discovered that DOT1 is nested within a vast radiation of bacterial methylases that are involved in the synthesis of secondary metabolites such as mycolic acids in mycobacteria (including the mycolic acid cyclopropane synthases), polyether antibiotics such as nigericin (e.g., NigE of Streptomyces sp. DSM4137) and as yet uncharacterized compounds in Micromonospora (gi: 288794127; in the same operon as a SnoaL-like polyketide cyclase). Further, gene neighborhood analysis suggests that several members in this bacterial radiation (e.g., gi: 294507034 from Salinibacter ruber) are specified by a conserved operon along with an amino acid transporter. It is conceivable that these versions are involved in the utilization of particular amino acids, or metabolites derived from them (Click here to access the operons). Thus, it appears the DOT1-like group arose as part of the radiation of methylases involved in generating diversity among secondary metabolites by adding of specific methyl groups to these metabolites – a common strategy used in the arms race between antibiotic producers and their intended victims. Alternative some of them were used by bacteria probably as a strategy to utilize amines or amino acids by methylating them. Some of these bacterial forms, such as those seen in Legionella, myxobacteria, and Protochlamydia, are particular close to the eukaryotic forms and share conserved sequence motifs in both the N-terminal extended element and in the loop between strand-6 and strand-7 (e.g., the conserved aromatic residue, click here [svg] or here [txt] to access an alignment of Dot1 and its homologs). Most interestingly, of the bacterial versions closest to the eukaryotic forms, some are encoded by intracellular pathogens or endosymbionts: These include the causative agent of Legionnaires' disease and Pontiac fever, i.e. Legionella. 
In addition to infecting animals, Legionella is a very versatile endoparasite that infects amoebozoans like Hartmannella and Acanthamoeba, heteroloboseans like Naegleria, and ciliates like Tetrahymena. DOT1 homologs are also seen in Protochlamydia, a dedicated bacterial endosymbiont of Acanthamoeba. That DOT1 was laterally exchanged between distantly related endosymbionts/parasites points to its importance for this mode of life.
This raises the issue of whether DOT1 is used to regulate the eukaryotic host’s behavior by modifying its histones. In support of this contention, we first found that these versions have signal peptides that are likely to allow their secretion into the host cells. Second, unlike the other bacterial versions, they lack operonic associations with secondary metabolite biosynthesis. Hence, it would be of great interest for experimental work to test our prediction that these bacterial DOT1s play a role in regulating host behavior via histone methylation comparable to that of the endogenous DOT1. Importantly, this observation also suggests that DOT1 was originally acquired by eukaryotes from their intracellular bacterial symbionts/parasites. The other bacteria with such DOT1 homologs are the myxobacteria; we suggest that these homologs might enable them to play in the “big league”, i.e. to compete with environmental eukaryotes by deploying this secreted DOT1, among other proteins, as a potential toxin. You can read a detailed account of protein methylases here. Also feel free to browse the extensive supplement.
Saturday, June 25, 2011
Free SAMPylation Musings
SAMPylation was recently introduced by the Maupin-Furlow group at the University of Florida as a novel protein conjugation system in archaea with parallels to the ubiquitin conjugation system. SAMPylation has gotten quite a bit of attention. Attachment of SAMP1 and SAMP2 (members of the ThiS/MoaD clade, which is the basal lineage of the larger ubiquitin-like clade of beta-grasp domains) to a target protein via an E1-like ligase might be interpreted as representing a rudimentary form of the classical eukaryotic Ubiquitin (Ub) and Ubiquitin-like (Ubl) conjugation system, which attaches Ub/Ubl domains to targets via a tri-ligase enzyme cascade initiated by an E1-like ligase and completed by the E2 and E3 ligases. In forthcoming articles, we review origins of SAMPylation (among other issues) in the context of other prokaryotic protein tagging systems and the eukaryotic ubiquitin system. Given the burgeoning interest in SAMPylation, however, we’ll preview these articles with a set of extended thoughts specific to the origins of SAMPylation.
1) Phyletic distributions and phylogenetic affinities of SAMP1/SAMP2
The SAMP2 protein is clearly a member of a small branch of the ThiS clade, which is restricted to euryarchaeota, with substantial representation in the haloarchaea as well as some methanoarchaea. The SAMP1 protein, on the other hand, is clearly a member of the MoaD clade; however, the phylogenetic relationships of the MoaD clade are complicated by the repeated occurrence of gene duplications and horizontal gene transfer (HGT) events. Recently, phylogenetic relationships between bacterial genomes have been described in terms of a “thicket” rather than a linear tree due to rampant genome duplication and exchange through HGT. Analogously, one can think of the MoaD gene family as the MoaD gene “thicket”. This leads to a poor picture of the divisions among the multiple functional roles ascribed to the MoaD family at large, including the incursion of a protein tagging role in SAMP1.
2) Multiple functional roles for the MoaD homolog thicket and the ThiS clade
Despite difficulties in assigning function to uncharacterized members of the MoaD family, a wide range of functional roles have been identified for MoaD-like proteins. In fact, it is no longer appropriate to view the MoaD and ThiS families exclusively as sulfur carriers for molybdenum and thiamine cofactor biosynthesis, respectively. In addition to their roles in SAMPylation, both families are involved in sulfur transfer during siderophore-like compound and cysteine biosynthesis and in tRNA thiolation. Our group predicts additional functions for these families in tungsten cofactor biosynthesis and perhaps a tagging role in recruitment to the ClpAP complex via association with a ClpS domain. As an added twist, some MoaD/ThiS domains have the demonstrated ability to carry out more than a single functional role. The second study on SAMPylation from the Maupin-Furlow group demonstrates that in Haloferax, the SAMP1 protein is indispensable for MoCo/WCo biosynthesis, to the point that the primary role for SAMP1 may indeed be akin to that of the classical MoaDs to which it is closely related. Meanwhile, SAMP2 appears to be important in tRNA thiolation as well as conjugation (comparable to Urm1). Notably, all of these functional roles are thought to be performed in conjunction with an E1-like ligase domain.
The takeaway from these observations is that the ThiS/MoaD-E1 enzyme combination has proven to be an extremely adaptable platform for acquisition of novel functional roles, primarily in the context of sulfur incorporation but also for protein tagging (as seen now in SAMPylation, Urmylation, and Ubiquitination).
3) Independent emergence of multiple protein tagging systems in prokaryotes
SAMPylation is the latest in a string of newly discovered prokaryotic tagging systems, of which perhaps the best known is the co-translational tmRNA-based peptide tag. More recently the PUPylation system has been discovered. PUPylation involves attachment of the Pup protein to target proteins via the PafA ligase, which is unrelated to the SAMP/Ub tag/ligase counterparts. Most relevant to this discussion is the phyletic distribution of PUPylation, detailed in our past work. PUPylation-like systems are sporadically present in a range of bacterial lineages; however, these systems very clearly emerged first in actinobacteria before spreading to other lineages via HGT. This limited distribution closely resembles the distribution of SAMP2ylation.
In addition to PUPylation, several other tagging systems are known in prokaryotes: 1) ubiquitin-like systems containing the Ub tri-ligase (E1, E2, and E3-like ligase) complement, which our group pioneered the identification of in bacteria, while another group recently identified a similar system after sequencing the archaeon Candidatus Caldiarchaeum subterraneum. 2) N-end rule arginyl/leucyl ligation. 3) Convergently emergent protein tagging systems in diverse prokaryotes, predicted in a separate paper from our group. 4) Predicted conjugation of certain members of the YukD family. Given this list, it appears tagging systems have emerged multiple times in prokaryotes. Many of these tagging systems also share a common mode of emergence, having frequently arisen first in a particular lineage of prokaryotes, only to be transferred later to sporadic representatives of diverse lineages via HGT.
4) SAMPylation is condition-specific
To this point, SAMP2 has been demonstrated to form conjugates with substrate proteins (and itself) only in response to two conditions: nitrogen and oxygen depletion. SAMP1 conjugation is largely restricted to substantial nitrogen depletion.
Summary and outstanding questions
The above observations begin to clarify the relationship between SAMPylation and other prokaryotic tagging systems. Phyletic distributions and peripheral or condition-specific functional roles suggest that prokaryotic tagging systems in general have emerged secondarily within differentiated lineages, with the ThiS/MoaD/Ub-E1 functional connection seemingly particularly suited for acquisition of tagging roles. This description is particularly apt for the SAMP2ylation system, which appears to be a genuine, condition-specific adaptation restricted to a small group of archaea. One key consequence of this observation: it is highly unlikely that SAMP2ylation represents a precursor for classical Ubiquitination. The greater mystery, however, lies with SAMP1ylation. At one extreme, SAMP1ylation could occur in all MoaD family members across bacteria and archaea, directing tagged proteins for proteasomal degradation. Given the broad distribution of MoaD and the relatively restricted distribution of core proteasomal subunits, this seems questionable, particularly for bacterial MoaD-like domains. At the other extreme, MoaD family members are only conjugated in genomes capable of performing SAMP2ylation, a scenario implying that conjugation of SAMP1 under certain conditions was adapted from the SAMP2ylation apparatus. Addressing the key points outlined below surrounding SAMP1ylation would help immensely in clarifying the Ubiquitination/SAMPylation relationship.
Q1) What is the extent to which SAMP1 is used as a modifier?
The MoaD clade thicket (of which SAMP1 is a member) has been adapted to a vast and varied range of functional roles. Further analysis of SAMP1ylation in the MoaD clade is needed to define the precise conditions and functional contexts in which SAMP1ylation, if any, is observed amongst the set of all MoaD clade proteins.
Q2) Is there a relationship between the extreme conditions required to initiate SAMPylation and the likely functional roles for SAMP1 (and SAMP2)?
SAMP1, and not SAMP2, is linked to proteasomal degradation due to the accumulation of SAMP1ylated substrates in proteasomal subunit mutant strains. Why does extreme nitrogen depletion signal for sudden degradation of massive amounts of cellular protein? As an alternative theory, could extreme nitrogen depletion cause the UbaA protein to become more “promiscuous” in recognition of ThiS/MoaD-like substrates for conjugation? In a similar vein, the function of SAMP2ylation is still not entirely clear, although it has been linked to conjugation of proteins predicted to be involved in tRNA thiolation. Is SAMP2ylation acting as a negative feedback loop regulating the primary sulfur-carrier functional role for SAMP2 in tRNA thiolation? It remains to be seen if these conjugates are subject to deSAMPylation by some of the JAB domain proteins encoded in these genomes. Knowing this might help clarify the above questions.
Q3) What happens when there is overlap between SAMP1ylation and other potential tagging systems, i.e. in bacteria?
If all MoaD-like proteins are capable, to some degree, of being conjugated to other proteins under the right conditions, SAMPylation would overlap with other protein tagging systems, including some specifically involved in targeting proteins for proteasome-based destruction. This is most notable in the actinobacteria. There, PUPylation and the prokaryotic ubiquitin-like conjugation systems appear to occupy distinct functional niches (discussed in another post here), whereas SAMP1ylation would directly overlap in function with PUPylation.
Final Thoughts
The recent characterization of diverse tagging systems strongly suggests protein ligation emerged several times in prokaryotic evolution. Several of these systems are known or predicted to be coupled to proteasomal degradation, specifically suggesting multiple, distinct inventions of protein tagging-directed degradation. Many tagging systems appear to have emerged later in prokaryotic evolution within certain terminal branches of the bacterial tree, followed by wider dissemination to additional lineages, in varying degrees, via HGT. In particular, the generality of SAMP1ylation should be investigated in greater detail across bacteria and archaea. In the absence of support for its generality, and given the lineage-specific nature of SAMP2, the current balance of evidence cannot rule out, and might in fact support, a scenario where SAMPylation via E1-only conjugation systems represents a separate development. This type of conjugation, especially SAMP2ylation, might have emerged in parallel to the elaboration of more complete prokaryotic ubiquitination systems that contain just E2 or both E2 and E3 ligases, rather than being a precursor of these systems.
On the other hand, evidence from eukaryotes for conjugation of Urm1 suggests that a protein conjugation capability might indeed have been latent right from the ancestral Ub-like beta-grasp domains that participated in ancient biosynthetic sulfur insertion functions. It is possible that this capability was an “unintended” consequence of the chemistry of sulfotransfer, primarily coming to the fore under certain special conditions, not necessarily in a functional sense; i.e., it might be viewed as a Gouldian spandrel. However, in certain organisms it was exapted to different degrees and functionally “channelized” as a protein tag. Indeed, protein tags are a necessary prerequisite for targeted protein degradation and provide an additional advantage as a regulatory mechanism. These provide “niches” into which the Ubl-E1 dependent conjugation systems could diversify as protein tags. We had earlier discussed how distinct protein tagging systems for targeting proteins for degradation are basal and nearly universal across two of the three superkingdoms of life: the tmRNA-based system in bacteria and the Ub-based system in eukaryotes. Only the archaea, which ancestrally share the proteasome with eukaryotes, seemingly lack such a system. From this viewpoint, evidence for cognates of SAMPylation across archaea, based on the more widely distributed MoaD clade, certainly needs to be further investigated.
Additional Reading: Click on the pubmed ids to access the references
- Experimental characterization of SAMPylation in the Maupin-Furlow lab 20054389 21368171
- Discovery of PUPylation in the Darwin lab 18832610
- Evolutionary history of the PUPylation system 18980670
- Evolution of Ubl pathways in prokaryotes 16859499 21547297 21169198
- Higher-order relationships between five-stranded beta-grasp domain-containing protein families including ThiS, MoaD, Urm1, Ub/Ubl 17605815
- Functional linkages between E1 and ThiS/MoaD, Ub/Ubl 19089947