The report of a stable hydroxymethylcytosine modification in animals and the discovery of the enzymes that catalyze it represent one of the great triumphs of collaborative research. It has veritably opened a new angle on epigenetic regulation, and our pioneering paper in the field has gathered over 200 citations in just under two years. Scientific papers often conceal the history, drama and effort that go into discovery, and this study too had its fair share.
Our interest in this family was kindled by the discovery, by Dr. Piet Borst's group, of the enzymes that catalyze an unusual base modification in kinetoplastids. Euglenozoans and their parasitic relatives the kinetoplastids contain a distinct modified thymidine derivative in their DNA: base J. Unlike in several phages, where modified nucleotides are incorporated during replication, base J is generated on the DNA itself, and its synthesis was shown to involve two steps. First, thymidine is hydroxylated to hydroxymethyluracil; this is then glucosylated by a glucosyltransferase to give base J.
Two proteins, JBP1 and JBP2, were shown to be important for base J synthesis. Threading analysis revealed in both of them a common hydroxylase domain of the 2-oxoglutarate- and iron-dependent dioxygenase (2OGFeDO) superfamily, which suggested that a 2OGFeDO-like mechanism catalyzes the first step. You can read about the discovery of base J here.
Given our long-term interest in this superfamily (see an early study), we investigated this domain using sensitive sequence analysis methods and noticed that it was also present in bacteria, phages, fungi, Naegleria and the animal TET proteins (e.g. CxxC6), mutations in which were known to cause various myeloid cancers. Interestingly, in animals the domain was fused to the CXXC domain, which is known to discriminate methylcytosine-containing nucleotides in DNA. This gave us a clue that the TET proteins modify DNA in situ like the JBP1/2 proteins, perhaps at cytidine instead of thymidine. Given the domain's hydroxylase function, we speculated in a 2007 paper in the International Journal for Parasitology that it might be involved in a DNA demethylation pathway through a hydroxylated intermediate. However, there were too many unanswered questions, and a hydroxylation-based C5-demethylation of cytosine was unprecedented in vivo (although it can be achieved in vitro), in contrast to N-6 demethylation through a hydroxylated intermediate (e.g. as in the AlkB reaction). Experimental investigations were thus key to further understanding the biochemistry and functional role of this domain in animals.
This was when Dr. Anjana Rao's group at Harvard joined the effort to unravel the function of the animal homologs containing the JBP1/2-like dioxygenase domain. Dealing with these proteins is no joke, as they are gigantic, ranging in size from 1600 to 2300 amino acids. Further, TET1, TET2 and TET3 contain a distinct cysteine-rich domain inserted into the N-terminal part of the 2OGFeDO domain, and TET1 and TET3 also contain an N-terminal CXXC domain, which is separated from the cysteine-rich and 2OGFeDO domains by a long low-complexity region (Click here for a text alignment or here for a colored version). The heroic efforts of Mamta Tahiliani resulted in the purification of these proteins and the development of an in vitro system to assay their activity. Overexpressing TET1, or just a minimal version with the 2OGFeDO domain, led to a decrease in 5mC levels and the appearance of a distinct TLC band. This was also reproducible in the in vitro assay. In collaboration with Dr. David Liu's group, the distinct nucleotide species was shown to be hydroxymethylcytosine. Hydroxymethylcytosine research was prominent in the heyday of phage biology; it now had a second birth.
The study was published in two parts that were released almost simultaneously. A comprehensive theoretical analysis of the superfamily, its phyletic distribution, evolutionary history and contextual associations was published in Cell Cycle, and the experimental studies were published in Science. TET proteins are now implicated in embryonic stem cell reprogramming, cell lineage specification, reprogramming the paternal genome, fine-tuning transcription and opposing aberrant methylation, and the list is only going to grow over time. Our analysis provided the first biochemical leads towards understanding the basis of the myeloid cancers caused by mutations in the TET proteins. While the animal proteins have received much press, there are several gems in waiting in other eukaryotes. More on this in the next post.