Obesity is a major societal issue contributing to increased morbidity and mortality, as well as rising health care costs. In 2003–2004, 66% of the human population in the United States was classified as overweight (body mass index [BMI] ≥ 25 kg/m²), and 32% was classified as obese (BMI ≥ 30 kg/m²) (Ogden et al. 2006). Excessive weight is often associated with an increased risk of several life-threatening diseases, including cancer, heart disease, and type 2 diabetes mellitus (Frayling et al. 2007). Unfortunately, the number of obese people continues to increase every day, probably as a result of a modified lifestyle (more food and less exercise). An improved understanding of the genetic basis, and the associated risk factors, is necessary if society is to proactively address this epidemic. Recently, several studies have demonstrated an association between the FTO gene locus and early onset and severe obesity in both children and adults (Dina et al. 2007; Field 2007; Frayling et al. 2007; Frayling 2007; Groop 2007; Scott et al. 2007; Scuteri et al. 2007). FTO, also known as FATSO, was originally identified as one of the six genes deleted in the fused toe (Ft) mutant mouse (van der Hoeven et al. 1994). Heterozygous animals showed fused toes on their limbs and thymic hyperplasia, whereas homozygous embryos exhibited a lethal malformation of the developing brain, lost genetic control of left-right asymmetry, and died around the tenth day of embryonic development (Peters et al. 2002). The Ft deletion spans several genes, quite a few of which remain functionally uncharacterized. Peters and coworkers (1999) showed that one of these genes, FTO (FATSO), which is completely deleted in the Ft mutation, is expressed throughout embryonic development and at a high level in most organs in wild-type mice. In the mouse, this novel gene spans at least 250 kb and encodes a protein of 502 amino acid residues of unknown function.
It is still not known whether loss of FTO is a causal factor for the phenotype observed in Ft mutant mice. Furthermore, no deviations in BMI have been reported in Ft mutant mice. In humans, however, the association between the FTO locus and BMI is strongly supported, unlike the associations with BMI initially reported for GAD, ENPP1, and INSIG2, which have not been reproduced consistently. Frayling and co-workers (2007) studied almost 40,000 Europeans for variants of the FTO gene and identified an obesity risk allele. Depending on the presence of specific single nucleotide polymorphisms (SNPs) in the first intron of FTO, individuals weighed 1.2 to 3 kg more and had a 1.67-fold higher rate of obesity than those lacking the risk allele. Similar findings were reported by Dina et al. (2007), who studied 2,900 individuals of European ancestry, and potential type 2 diabetes susceptibility has been correlated with another FTO intron 1 SNP (Scott et al. 2007).

Until recently, homology searches using the mouse FTO gene as a query only recovered sequences from vertebrates. However, with the complete genome sequencing of several marine algae, these results have been dramatically altered. While no clear homologue is found in invertebrate animals, fungi, plants, heterotrophic protists, bacteria, or archaea, we identified FTO homologues in the genomes of a diverse array of eukaryotic marine algae, ranging from unicellular photosynthetic picoplankton to a multicellular seaweed (Fig. 1). Specifically, FTO homologues were retrieved from three species within the Prasinophyceae (Micromonas pusilla, Ostreococcus tauri, and Ostreococcus lucimarinus) and two diatom species (Phaeodactylum tricornutum and Thalassiosira pseudonana), all of which are unicellular and which represent the only completely sequenced members of their respective lineages. Two copies of the FTO homologue were identified in the multicellular brown alga, Ectocarpus siliculosus. Furthermore, we scanned the Global Ocean Survey (GOS) dataset (Rusch et al. 2007) and recovered two additional FTO genes. These two sequences appear to be derived from marine prasinophytes, given their high similarity to FTO homologues in the prasinophyte genomes and the presence of Ostreococcus and Micromonas 18S rRNA gene sequences in the same GOS sample. Strikingly, all the algae found to harbor FTO homologues live in marine environments; no FTO homologues were recovered from freshwater algae. We performed additional searches for FTO in freshwater algae using the Chlamydomonas reinhardtii genome sequence (Merchant et al. 2007), but to no avail. We also performed additional searches of the finished genome sequence of the red alga Cyanidioschyzon merolae, which thrives in acidic hot springs (Matsuzaki et al. 2004; Nozaki et al. 2007).
Moreover, we performed these searches iteratively, using the newly discovered marine FTO sequences as queries, and still detected no homologues in invertebrate animals, fungi, plants, heterotrophic protists, bacteria, or archaea, confirming our initial findings.

Fig. 1

Maximum likelihood tree showing the distribution of the FTO gene. Three major clades can be discerned: the previously described FTO genes in the vertebrates, the newly detected genes in diatoms and the brown alga, and those of the chlorophytes and GOS sequences. All nodes have high bootstrap support (>70%) except two (indicated by a black circle; 50% < BS < 70%). See Supplementary Materials for more information.

As mentioned above, the function of FTO is still not known. Dina et al. (2007) detected FTO expression in 11 of 11 human tissue types tested, with the highest expression levels in the hypothalamus, pituitary, and adrenal glands. These findings have promoted the hypothesis that FTO plays a role in body weight regulation through the hypothalamic-pituitary-adrenal axis. FTO is also expressed in rat and mouse. EST data indicate that the marine FTO homologues in the diatom P. tricornutum and the prasinophyte M. pusilla are expressed under standard growth conditions. Although the biological roles of the algal FTO homologues are still unknown, these genes can be used, together with the vertebrate sequences, to explore basic protein features. Based on primary sequence characteristics, FTO proteins are unlikely to be targeted to either membranes or organelles but, rather, are predicted to be globular, cytosolic proteins with mixed α/β structures. The number of conserved positions drops from 195 among animal sequences to only 44 across all sequences, likely pinpointing the functionally essential residues. Across this full set of widely divergent FTO sequences, three amino acid residues (W, Y, and H) are strikingly overrepresented among the 44 absolutely conserved positions (see Supplementary Fig. 1). In silico predictions indicate that these residues are more likely to be located at an active site than at a protein-protein interface or among surface-interacting residues (Ma et al. 2003). This suggests that FTO may have an enzymatic function rather than being involved in protein-protein interactions. Three of the conserved positions have high prediction scores for phosphorylated residues, indicating a potential role for phosphorylation in the regulation of FTO.
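
The conservation count described above can be illustrated with a minimal Python sketch. It assumes a gap-padded multiple alignment held as equal-length strings and treats a column as absolutely conserved only when every sequence carries the same non-gap residue. The toy sequences and function names are hypothetical and are not the FTO alignment or the procedure actually used here.

```python
from collections import Counter


def conserved_columns(alignment):
    """Return (index, residue) for every column in which all sequences
    share the same non-gap residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # One distinct symbol that is not a gap means absolute conservation.
        if len(column) == 1 and "-" not in column:
            conserved.append((i, alignment[0][i]))
    return conserved


def residue_counts(conserved):
    """Count how often each residue type occurs among conserved columns."""
    return Counter(residue for _, residue in conserved)


# Hypothetical toy alignment: three "animal" and two "algal" sequences.
animal = ["MWYHAG", "MWYHSG", "MWYHAG"]
all_sequences = animal + ["LWYH-G", "IWYHTG"]

animal_conserved = conserved_columns(animal)
overall_conserved = conserved_columns(all_sequences)
print(len(animal_conserved), len(overall_conserved))
print(residue_counts(overall_conserved))
```

Adding more divergent sequences can only shrink the conserved set, which is why the positions that survive across all lineages are natural candidates for functionally essential residues.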

Our findings do not negate the association between FTO intron 1 SNPs and obesity. While identification of risk factors has advanced tremendously, for the most part, the functional ramifications of these genetic variations remain uncharacterized. In the case of FTO, Frayling and colleagues (2007) raised the alternative hypothesis that the intron 1 SNP might serve to alter regulation of another gene, as opposed to having a specific effect on the product encoded by FTO. While risk factors carry value in preventive medicine, it is mechanistic knowledge that fosters therapeutic innovation. Why marine algae harbor and express FTO is unclear, as is the link with obesity in humans. However, previous studies have demonstrated that algal research can be applied to the investigation of vertebrate gene function. For instance, Chlamydomonas is often referred to as “the green yeast” because it is an easy-to-work-with eukaryotic model organism that also performs photosynthesis (see Li et al. 2004). None of the highly developed but easy-to-use (i.e., not involving animal work) model organisms (e.g., Chlamydomonas, Arabidopsis, yeast, Drosophila, and C. elegans) possesses an FTO gene. Thus, here we identify alternative systems for functional studies, such as the genetically tractable diatom Phaeodactylum (Siaut et al. 2007). These in turn will shed light on FTO function and, should that function be relevant to vertebrate homologues, thereby streamline research on genetic factors contributing to human obesity.

Methods

We initially scanned all publicly available nonredundant databases, as well as our in-house data, for homologues of the mouse FTO gene, using BLASTP (Altschul et al. 1997). Because there was a very clear drop-off in E-value between homologues and nonhomologues (significant values, from E−82 to E−27, then dropping to nonsignificant E-values of ≥0.71), selection of FTO homologues was straightforward. No distantly related or partially homologous genes could be identified. Next, HMMER (Eddy 1998) was used to generate a specific hidden Markov model profile of the FTO gene family from all available sequences, and we searched NCBI EST and genome databases using TBLASTN. However, no new candidate FTO genes were detected.
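
The selection step can be sketched as follows. This is a minimal Python illustration, not the procedure actually used: it assumes hits are already parsed into (name, E-value) pairs, and the cutoff of 1e-5 is an illustrative assumption, since the text reports only the observed range of E-values. The hit names are hypothetical, while the E-values 1e-82, 1e-27, and 0.71 echo the range reported above.

```python
import math


def split_hits(hits, cutoff=1e-5):
    """Sort hits by E-value and split them at an assumed significance cutoff."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    accepted = [hit for hit in ranked if hit[1] <= cutoff]
    rejected = [hit for hit in ranked if hit[1] > cutoff]
    return accepted, rejected


def gap_in_orders_of_magnitude(accepted, rejected, floor=1e-300):
    """Distance in log10 units between the weakest accepted hit and the
    strongest rejected hit; None if either group is empty."""
    if not accepted or not rejected:
        return None
    weakest_accepted = max(accepted[-1][1], floor)  # guard against E = 0
    strongest_rejected = rejected[0][1]
    return math.log10(strongest_rejected) - math.log10(weakest_accepted)


# Hypothetical hit table; names are placeholders.
hits = [
    ("hit_a", 1e-82),
    ("hit_b", 3e-60),
    ("hit_c", 1e-27),
    ("hit_d", 0.71),
    ("hit_e", 2.4),
]

accepted, rejected = split_hits(hits)
gap = gap_in_orders_of_magnitude(accepted, rejected)
print([name for name, _ in accepted], round(gap, 1))
```

With a gap of more than 20 orders of magnitude between the two groups, any cutoff chosen inside that gap yields the same set of homologues, which is what makes the selection straightforward.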

Annotation of the FTO gene sequences was manually checked and, when necessary, corrected using Artemis (Rutherford et al. 2000). Protein sequences were aligned with ClustalW (Thompson et al. 1994), and after manual improvement of the alignments using BioEdit (Hall 1999), only 266 well-aligned positions were taken into account for tree construction. Pairwise distance trees were constructed using TREECON (Van de Peer and Wachter 1994), based on Poisson-corrected distances, while PhyML 2.4.4 (Guindon and Gascuel 2003) was used to compute the maximum likelihood tree. Bootstrap analyses with 500 replicates were performed to test the significance of the nodes. Both methods gave identical tree topologies and similar bootstrap support.
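
For readers unfamiliar with the distance measure, the Poisson correction converts the observed fraction of differing sites p into an estimated number of substitutions per site, d = −ln(1 − p). The Python sketch below illustrates this, together with the column resampling that underlies a bootstrap replicate. It is not the TREECON or PhyML implementation. Ignoring gapped sites pairwise is an assumption made here, and the sequences and function names are hypothetical.

```python
import math
import random


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over sites where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p); infinite when p reaches 1."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


# Hypothetical toy alignment; 2 of 8 sites differ between the first two rows.
alignment = ["MWYHAGKL", "MWYHSGKV", "LWYHTGRV"]
print(round(poisson_distance(alignment[0], alignment[1]), 4))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicate = bootstrap_replicate(alignment, rng)
print(replicate)
```

In a full analysis, a tree would be rebuilt from each of the 500 resampled alignments, and the support for a node would be the percentage of replicate trees that contain it.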

Data

Accession numbers are as follows: Ostreococcus lucimarinus, XP_001420808; Ostreococcus tauri, CAL57236; Thalassiosira pseudonana, jgi|Thaps3|261481|thaps1_ua_kg.chr_2000305 (http://genome.jgi-psf.org/Thaps3/Thaps3.home.html); Phaeodactylum tricornutum, jgi|Phatr2|41429|fgenesh1_pg.C_chr_30000044 (http://genome.jgi-psf.org/Phatr2/Phatr2.home.html); and Micromonas pusilla, EU293868. The FTO sequences from Ectocarpus siliculosus can be obtained from the authors upon request.