Saturday, October 29, 2011

A mystery pathway in prokaryotes

Computational studies of proteins have greatly contributed to our understanding of the biology of a species or a system. In many instances, computational analyses have solved tricky biochemical problems (e.g. the biochemistry of pupylation), uncovered unexpected systems or pathways (e.g. the prokaryotic cognates of the eukaryotic ubiquitin pathway), solved long-standing mysteries (e.g. the principal transcription factors of apicomplexa), or clarified difficult evolutionary problems (e.g. the extent of lateral transfer between prokaryotes, the evolutionary origins of the AID/APOBEC deaminases). Yet there are instances where the biochemistry of most parts of a system is easily identifiable, but the biology remains an unsolved puzzle. Recently, we uncovered one such widespread system present in most lineages of proteobacteria, actinobacteria, spirochaetes, cyanobacteria, chlamydiae and chloroflexi, and also in some crenarchaea. As the system is present in Mycobacterium tuberculosis, we shall use the mycobacterial gene names as representative identifiers. The basic system consists of:
  1. Rv2410c (DUF403 in Pfam 25): An alpha-helical protein, called Alpha-E, that contains an internal duplication, with each repeat possessing conserved ER motifs. Click here to access a multiple alignment.
  2. Rv2411c (split as DUF404+DUF407 in Pfam 25): A circularly permuted peptide ligase of the ATP-grasp fold.
  3. Rv2409c and Rv2569c: Transglutaminases that could serve either as peptidases or as classical transglutaminases.
  4. Rv2568c (DUF2248 in Pfam 25): A metallopeptidase-family peptidase.
  5. Rv2567: An inactive circularly permuted ATP-grasp fused to the Alpha-E domain.
  6. Rv2566 (Transglut+DUF2126 in Pfam 25): A transglutaminase fused to a circularly permuted peptide ligase of the COOH-NH2 ligase superfamily.
  7. Some species additionally contain an NTN hydrolase related to the proteasomal peptidase (called Anbu in one study) in the gene neighborhoods (not Mycobacterium) and amidotransferases of the GAT-I family. Click here to access all operons.
Thus, these systems together include two active peptide ligases, five distinct types of peptidase-like proteins (two transglutaminases, a Zincin-like metallopeptidase, the GAT-I domain and an NTN peptidase), the mystery Alpha-E protein, and an inactive peptide ligase that may be fused to the Alpha-E domain. In any case, all systems contain at least one peptide ligase, the Alpha-E protein and one peptidase-like domain. The only evidence for its biological context comes from experiments in Pseudomonas putida, where the transglutaminase is highly expressed upon nitrogen starvation. Several protein/peptide conjugation systems contain peptide ligases (e.g. the ubiquitin transferring enzymes, the Pup ligases) as well as deconjugating enzymes (e.g. JAB deubiquitinase and Dop depupylase) in the same gene context (for a comprehensive set of examples, read our paper on amidoligases).
However, assembling the pieces of the puzzle together, we can be sure of a few things:
  1. This is not involved in amino acid or glutathione biosynthesis. The species containing this system typically have intact pathways for glutathione or amino acid biosynthesis. Also there are no other genes suggestive of metabolic function in the neighborhood.
  2. It is not involved in the biosynthesis of a distinctive secondary metabolite such as an antibiotic or siderophore, for it lacks characteristic associations seen in these systems (see examples in our study of such systems).
  3. There is no evidence of a small protein that is conjugated to a target as in ubiquitination or pupylation.
Gene neighborhoods of the novel system described in this post
Thus the system appears to be a novel peptide transfer/peptidase system with the Alpha-E protein playing a central role. We postulate that the ATP-grasp and COOH-NH2 ligase in this system catalyze two distinct peptide bond formations. It is tempting to speculate that the Alpha-E protein, with its highly conserved ER motifs, serves as a substrate for elongation of a peptide via the gamma-carboxylate of a glutamate side chain. This proposal is consistent with the use of glutamate side chains as substrates in eukaryotic proteins such as tubulin by peptide-tagging ATP-grasp enzymes. The presence of two peptidase genes in most of these operons suggests that two successive peptidase reactions are necessary for removal of the peptide product.
Alternatively, the transglutaminase superfamily protein might indeed function in cross-linking the peptide to lysine side chains or other amino groups. Thus, the weight of the contextual evidence supports a role for this widespread conserved gene neighborhood in peptide synthesis; the resulting peptide could be added as a tag to the unique Alpha-E protein in this system. Such a tag could either regulate the assembly of complexes of the Alpha-E domain protein via cross-linking, or its interactions (e.g. as in tubulin), or serve as an amino acid storage mechanism. As you can see, certain details of this interesting pathway still need further investigation, but its widespread presence suggests that an important and exciting piece of biology awaits creative experimentalists...
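The minimal composition noted earlier in this post (at least one peptide ligase, the Alpha-E protein and one peptidase-like domain) can be sketched as a simple check over domain annotations. The Python below is only an illustration: the domain labels, the `has_minimal_system` helper and the grouping of the M. tuberculosis genes into two neighborhoods by locus-tag proximity are assumptions made for the example, not the procedure used in the study.

```python
# Illustrative domain labels (assumed names, paraphrasing the component list above).
ACTIVE_LIGASES = {"ATP-grasp", "COOH-NH2 ligase"}
PEPTIDASE_LIKE = {
    "transglutaminase",
    "zincin-like metallopeptidase",
    "GAT-I amidotransferase",
    "NTN hydrolase",
}
ALPHA_E = "Alpha-E"


def has_minimal_system(neighborhood):
    """Check a gene -> [domain labels] mapping for the minimal composition.

    The inactive ATP-grasp carries its own label, so it is not counted
    as an active peptide ligase.
    """
    domains = {d for labels in neighborhood.values() for d in labels}
    return (
        bool(domains & ACTIVE_LIGASES)
        and ALPHA_E in domains
        and bool(domains & PEPTIDASE_LIKE)
    )


# Gene-to-domain assignments follow the list in this post; splitting them into
# two neighborhoods by locus-tag proximity is an assumption of this sketch.
neighborhood_1 = {
    "Rv2409c": ["transglutaminase"],
    "Rv2410c": ["Alpha-E"],
    "Rv2411c": ["ATP-grasp"],
}
neighborhood_2 = {
    "Rv2566": ["transglutaminase", "COOH-NH2 ligase"],
    "Rv2567": ["inactive ATP-grasp", "Alpha-E"],
    "Rv2568c": ["zincin-like metallopeptidase"],
    "Rv2569c": ["transglutaminase"],
}

for name, nb in [("neighborhood_1", neighborhood_1), ("neighborhood_2", neighborhood_2)]:
    print(name, has_minimal_system(nb))
```

Under these assumptions both groups satisfy the minimal composition, whereas a neighborhood with only the inactive ATP-grasp as its ligase would not.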

Bacterial O-antigens, capsules, and cell-surface polysaccharides: not just all-sugar


You have probably heard of Escherichia coli O104:H4, which caused a devastating outbreak of an enterohemorrhagic disease in many European countries this year. Did you ever wonder what the O and H in the name represent? In the pre-genome sequence era, enterobacteria were usually distinguished based on the type of their polymorphic surface antigens by a process called serotyping. In this process, antibodies that specifically recognize a distinct type of surface antigen are used to identify the bacterial serotype. This was an extraordinarily successful tool in epidemiological studies. In enterobacteria, the polymorphic surface molecules are typically a surface lipopolysaccharide (O-antigen), flagellar proteins (H-antigen) and/or the capsular polysaccharide (K-antigen). Thus the O104:H4 in the E. coli strain name refers to the type numbers of the O and H antigens respectively. E. coli has about 700 serotypes combined from some 180 O-antigens, 70 K-antigens and 54 H-antigens. Salmonella has about 2500 serotypes! Below we highlight a new twist to the O-antigen structure that we recently uncovered in our study on peptide ligases.
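As an aside on the naming convention, a designation such as O104:H4 is easy to split into its antigen types programmatically. The Python below is a minimal sketch; the `parse_serotype` helper is hypothetical and assumes a simplified format of colon-separated O, K or H fields, each followed by a type number, which will not cover every designation used in practice.

```python
import re

# One antigen field: a letter (O, K or H) followed by its type number.
_ANTIGEN_FIELD = re.compile(r"^([OKH])(\d+)$")


def parse_serotype(name):
    """Split a designation such as 'O104:H4' into {'O': 104, 'H': 4}."""
    antigens = {}
    for field in name.split(":"):
        match = _ANTIGEN_FIELD.match(field.strip())
        if match is None:
            raise ValueError(f"unrecognized antigen field: {field!r}")
        letter, number = match.group(1), int(match.group(2))
        if letter in antigens:
            raise ValueError(f"antigen {letter} given more than once")
        antigens[letter] = number
    return antigens


print(parse_serotype("O104:H4"))  # {'O': 104, 'H': 4}
```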

Let us study the lipopolysaccharide (LPS), of which the O-antigen is a component, in some more detail (see figure below). The LPS comprises four components: (1) Lipid A, a lipid anchor that forms the outer monolayer of the outer membrane and anchors the LPS; (2) an inner core composed of characteristic sugars such as Kdo (3-deoxy-D-manno-oct-2-ulosonic acid) and a heptose; (3) an outer core typically containing hexose sugars; and finally (4) the O-antigen repeats, which exhibit variations in the type and arrangement of the sugar residues within the O-unit of LPS. Some O-antigens have repeats of 3-5 sugar units, others are branched with 4-6 sugar units. Also present are unusual sugars only seen in these surface antigens. The number of such repeats also varies greatly (see the O-antigen database). Estimates suggest that there are about a million LPS molecules sticking out from the outer membrane per E. coli cell. The variations are a means for the bacterium to escape the surveillance of the host immune system and function as a virulence factor. Additionally, the antigens might vary to avoid bacteriophages that target the O-antigen for attaching to and invading the bacterial cell. The genes involved in the biosynthesis of the O-antigen are present in a large gene cluster and, not unexpectedly, show great variations between various O-antigen types. Many of these are involved in the biosynthesis and export of the sugar units in the LPS.
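The four-part layout described above can be written down as a small data structure. This is a rough sketch only: the `LPS` class, its field names and the placeholder sugars in the O-unit are hypothetical, and a flat list ignores the branching seen in some O-antigens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LPS:
    """Linear, simplified model of one LPS molecule, membrane anchor first."""

    lipid_a: str = "Lipid A"
    inner_core: List[str] = field(default_factory=lambda: ["Kdo", "heptose"])
    outer_core: List[str] = field(default_factory=lambda: ["hexose"])
    o_unit: List[str] = field(default_factory=list)  # sugars of one O-unit
    n_repeats: int = 0  # number of O-unit repeats

    def chain(self) -> List[str]:
        """Return the residues in order from the membrane outward."""
        return (
            [self.lipid_a]
            + self.inner_core
            + self.outer_core
            + self.o_unit * self.n_repeats
        )


# Placeholder sugar names; real O-units differ between serotypes.
example = LPS(o_unit=["sugar_A", "sugar_B", "sugar_C"], n_repeats=2)
print(" - ".join(example.chain()))
```

Changing `o_unit` or `n_repeats` while leaving the first three fields alone mirrors the point made in the text: the variation between serotypes is concentrated in the O-antigen repeats.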


O-antigen structure (from Raetz and Whitfield)
In a recent study, we noticed a somewhat unexpected presence in these gene neighborhoods: peptide ligases. The proteins encoded by the E. coli/Shigella wfdG and wfdR O-antigen cluster genes (incorrectly labeled as glycosyltransferases) are members of the ATP-grasp superfamily of peptide ligases. Members of this family are widely present across bacteria, e.g. firmicutes, actinobacteria, proteobacteria, spirochaetes, bacteroidetes, fusobacteria and cyanobacteria. Interestingly, they are also present in the capsular biosynthesis locus of Streptococcus pneumoniae (e.g. wcyv). In general, this family of peptide ligases is combined with genes that encode proteins involved in the biosynthesis of cell surface polysaccharides. In some instances, members of this family are fused to other domains such as glycosyltransferases and the capsular biosynthesis-type PP2A-fold phosphatases. Often these neighborhoods encode multiple paralogous copies of ATP-grasps (access the operons here). Pioneering studies in Proteus and Providencia (e.g. Kocharova et al. and Kondakova et al.) have shown that sugars of the cell surface O-antigen are further aminoacylated by D- and L-aspartic acid residues. Given the presence of ATP-grasp genes in these operons, we predict that they catalyze the ligation of amino acids to sugar moieties in these polymers, as observed in these studies.


One other cell surface polysaccharide with known sugar-amino acid conjugates is teichuronopeptide, a highly acidic copolymer of glucuronic acid and amino acids such as glutamate that contributes to the alkaliphily of organisms such as Bacillus halodurans. Experimental studies by Aono had implicated the TupA gene in the biosynthesis of this product, but the mode of action was not understood until we unified TupA with the same family of ATP-grasps (TupA-like) present in the O-antigen and capsule biosynthesis loci. We predict that this is the ligase required for synthesis of the polyglutamate portion of the teichuronopeptide. The teichuronopeptide synthesis locus additionally contains three paralogous ATP-grasp genes (see operons here). A comparable combination of gene neighborhoods is also seen in alkali-resistant bacteria such as Dethiobacter alkaliphilus and Oceanobacillus, and in the polycyclic aromatic hydrocarbon-degrading Mycobacterium sp. JLS. This suggests that the teichuronopeptide-like polymer might have been an important solution to the problem of highly alkaline or high-salt conditions. The lateral transfer of this neighborhood might have been important in the emergence of alkali resistance in various distantly related bacteria.
Teichuronopeptide unit
The wide phyletic distribution of this ATP-grasp-centered operon and related operons suggests that sugar/sugar acid and amino acid conjugates are a common feature of the capsules and other distinctive cell surface polymers of a large number of bacteria. The presence of up to four ATP-grasp genes in some of these operons suggests that peptide chains with complexity comparable to the peptide linkages in peptidoglycan might be present in some of these polymers. This adds an exciting twist to the composition of the cell surface polysaccharides of bacteria. The nature and type of amino acids in these various species would definitely be of great interest and importance to bacteriology and epidemiology. You can access our paper here and browse the extensive supplement here.