Introduction

Recently, we identified a new eukaryotic protein family, the TPPPs (Vincze et al. 2006; Ovádi and Orosz 2009; Orosz et al. 2009). Its first member, the tubulin polymerization-promoting protein TPPP/p25, was originally described as a brain-specific protein of unknown function, p25alpha (Takahashi et al. 1991); this older name reflects the approximate molecular mass of the protein (about 25 kDa). It is mainly expressed in differentiated oligodendrocytes (Takahashi et al. 1993; Skjoerringe et al. 2006; Lehotzky et al. 2008, 2010). We have shown that this small, basic, unstructured protein promotes the polymerization of tubulin into normal and double-walled microtubules and induces their bundling (Hlavanda et al. 2002; Tirián et al. 2003; Hlavanda et al. 2007). It exhibits microtubule-associated protein (MAP)-like functions by stabilizing the microtubular network (Lehotzky et al. 2004; Hlavanda et al. 2007; Tőkési et al. 2010). Under pathological conditions, TPPP/p25 is enriched in the glial and neuronal inclusions of synucleinopathies such as Parkinson’s disease and multiple system atrophy (Kovács et al. 2004; Orosz et al. 2004). It has also been suggested recently that TPPP/p25 may protect cells against the damaging effects of the accumulation of abnormal forms of prion protein (Zhou et al. 2011).

TPPP/p25 (or TPPP1) has two paralogues in the human (and other mammalian) genomes, denoted TPPP2/p18 and TPPP3/p20 (TPPP2 and TPPP3 for short), the names reflecting their lower molecular masses. Neither of them possesses the N-terminal tail of about 50 amino acids found in TPPP1 (Vincze et al. 2006) (cf. Fig. 1). TPPP3, but not TPPP2, shares the MAP-like features of TPPP1. The common C-terminal part (residues 55–219 in TPPP1) is denoted the p25alpha domain (Pfam05517, IPR008907); it corresponds practically to the whole sequence of the two shorter paralogues.

Fig. 1

Multiple sequence alignment of several members of the TPPP family by ClustalW. The alignment was refined manually. Residues that are identical or similar in all but at most two species are shown on black and grey backgrounds, respectively. The p25alpha (Pfam05517, IPR008907) domain comprises the whole protein except the N-terminal tail, i.e. the first line of the amino acid sequences. The first two lines represent the first coding exon; the third and fourth lines correspond to the second and third exons, respectively. An asterisk marks the amino acids encoded by the last two nucleotides of the first exon and the first nucleotide of the second exon. Proteins and ESTs (*) are: HsTPPP1, HsTPPP2, HsTPPP3: Homo sapiens NP_008961, NP_776245, NP_057048; GgTPPP1, GgTPPP2, GgTPPP3: Gallus gallus XP_001231864, XP_424853, CR385779*; AcTPPP1, AcTPPP2, AcTPPP3: Anolis carolinensis ENSACAP00000009224, FG794076*, FG677375*; TnTPPP1, TnTPPP4: Tetraodon nigroviridis CAG11971, CAF95233; CpTPPP2: Cynops pyrrhogaster FS292027*; XtTPPP3: Xenopus tropicalis NP_001096466; DrTPPP3, DrTPPP4: Danio rerio EH455616*, XP_687926; GaTPPP4: Gasterosteus aculeatus DN725593*; ScTPPP4: Sebastes caurinus GE803880*. For comparison, TPPPs of C. intestinalis (XP_002124388) and B. floridae (XP_002215000) are also shown.

Recently, my analysis revealed that TPPP proteins occur in two main types, long and short, whose distribution is characteristic of the phylogenomic supergroups (Orosz 2009). The typical length of short- and long-type TPPPs is about 150 and 170–180 amino acids, respectively. (In vertebrates, TPPP1 and its orthologues are even longer owing to their additional N-terminal tail.) The short-type proteins lack a highly conserved sequence of 30–31 amino acids in the C-terminal half of the protein, which is probably responsible for the microtubule-binding capability.

Short- and long-type TPPPs were clearly separated by phylogenetic analysis; the separation was unambiguous in trees obtained by either maximum likelihood (ML) or Bayesian methods (Orosz 2009). I concluded that short- and long-type TPPPs are paralogous proteins encoded by different genes. The duplication may have resulted from an event in the common ancestor of eukaryotes.

The most important difference in their phylogenetic distribution is that Opisthokonta (Metazoa and Fungi, as well as Choanomonada and some other unicellular organisms) contain exclusively the long-type TPPP. It is present in all known metazoan genomes except that of Trichoplax adhaerens, which instead contains a partial p25alpha domain as part of a new chimeric protein, apicortin (Orosz 2009). As mammals have three paralogous (long-type) Tppp genes (Vincze et al. 2006), the possibility arose (Orosz 2009) that the multiplication of the TPPP gene resulted from the two rounds of large-scale (whole-genome) gene duplication that occurred in the vertebrate lineage after the separation of the amphioxus and craniate ancestors (Ohno 1970; Pébusque et al. 1998; Panopoulou et al. 2003; Dehal and Boore 2005), and that one of the four resulting paralogues was lost.

In this article, I refine this picture by raising the possibility that a fourth paralogue, tppp4, exists exclusively in teleost fishes, and by suggesting a scenario for the two-round duplication of the TPPP gene. Alternatively, the new group can be considered the teleost fish orthologue of TPPP2.

Methods

Database Homology Search

Accession numbers of protein and EST sequences refer to the NCBI RefSeq and GenBank databases, respectively, unless otherwise stated. The protein/EST sequences used in this study are available in supplementary file 1.

The database search was started with an NCBI BLAST search using the sequences of human TPPP proteins (NP_008961; NP_776245; NP_057048) as queries. BLASTP or TBLASTN analysis (Altschul et al. 1997) was performed on complete genome sequences and EST collections available at the NCBI website (http://www.ncbi.nlm.nih.gov/BLAST/). A similar search was carried out on various fish databases: http://www.fugu-sg.org/; http://www.sanger.ac.uk/Projects/D_rerio/; http://www.ensembl.org/Tetraodon_nigroviridis/; http://dolphin.lab.nig.ac.jp/medaka/. The homepage of the Ensembl project (Flicek et al. 2011), http://www.ensembl.org/, was also checked for orthologues. Nucleotide sequences identified in BLASTN searches were translated in the reading frames indicated by the BLASTN hit, taking frame shifts and introns of genomic sequences into account.
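Translating a nucleotide hit in a given reading frame, as described above, can be sketched with the standard genetic code. This is a minimal illustration, not the procedure actually used: the helper names are hypothetical, frame shifts and introns are not handled (those corrections were made case by case), and negative frames are assumed to start at offset 0, 1 or 2 of the reverse complement, mirroring frames +1 to +3.

```python
# Standard genetic code written in TCAG order (first, second, third codon position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate seq in frame +1..+3 or -1..-3 (assumed sign convention).

    Codons that are not in the table (e.g. containing N) become 'X';
    a trailing partial codon is ignored.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be +/-1, +/-2 or +/-3")
    s = seq.upper() if frame > 0 else reverse_complement(seq)
    start = abs(frame) - 1
    codons = (s[i:i + 3] for i in range(start, len(s) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For instance, `translate_frame("GGCCAT", -1)` reverse-complements the hit to `ATGGCC` and returns `"MA"`.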

Phylogenetic Analysis

Multiple alignments of sequences were done with the ClustalW program (Larkin et al. 2007). Bayesian analysis was performed using MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003) with default priors. The Poisson model (Bishop and Friday 1987) was used, assuming equal rates across sites. Two independent analyses were run, each with three heated chains and one cold chain (temperature parameter 0.2), for 5 × 10⁵ (2 × 10⁵) generations, with a sampling frequency of 0.01 and with 2 × 10⁵ (1.4 × 10⁵) generations discarded as burn-in. The runs converged in all cases.
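Two of the MCMC settings above can be made concrete. The sketch assumes MrBayes' incremental heating scheme, in which chain i (counting the cold chain as 0) has heat β = 1/(1 + iT), with T the temperature parameter, and it reads the "sampling frequency of 0.01" as one sample every 100 generations. Both readings are assumptions, and the helper names are hypothetical.

```python
def chain_heats(n_chains: int, temp: float) -> list[float]:
    """Heat of each chain under incremental heating; index 0 is the cold chain (beta = 1)."""
    return [1.0 / (1.0 + i * temp) for i in range(n_chains)]


def retained_samples(ngen: int, burnin_gen: int, samplefreq: int) -> int:
    """Samples kept after discarding the burn-in generations (the generation-0 sample is ignored)."""
    return (ngen - burnin_gen) // samplefreq


# One cold and three heated chains with T = 0.2, as stated in the text.
heats = chain_heats(4, 0.2)  # approximately [1.0, 0.833, 0.714, 0.625]
# Under the 100-generation sampling assumption:
kept_long = retained_samples(500_000, 200_000, 100)
kept_short = retained_samples(200_000, 140_000, 100)
```

Under these assumptions each run of the longer analysis contributes 3000 post-burn-in samples and each run of the shorter one 600.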

The Phylip package version 3.68 (Felsenstein 2008) was used to build the ML phylogenetic trees with bootstrap values. One hundred datasets were generated from the original data, i.e. the ClustalW multiple alignment, using the program Seqboot. (The multiple sequence alignment used for constructing phylogenetic trees is shown in supplementary file 2.) The program Proml (protein ML) was then run on each dataset using the Jones–Taylor–Thornton (JTT) model (Jones et al. 1992), and a consensus tree of all 100 trees was generated using the program Consense. All trees were drawn using the program Drawgram.
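The bootstrap procedure above (Seqboot, then Proml on each replicate, then Consense) can be sketched in its two generic steps: resampling alignment columns with replacement, and counting how often each clade recurs across replicate trees. Tree inference itself is omitted, trees are represented, hypothetically, as sets of clades (frozensets of taxon names), and the consensus keeps only clades found in more than half of the trees, which is simpler than Consense's extended majority rule.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: resample alignment columns with replacement.

    The same column indices are used for every sequence, so columns stay intact.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def majority_rule(trees: list[set[frozenset[str]]], threshold: float = 0.5) -> dict[frozenset[str], float]:
    """Clades present in more than `threshold` of the trees, with their support in percent."""
    counts = Counter(clade for clades in trees for clade in clades)
    n = len(trees)
    return {clade: 100.0 * k / n for clade, k in counts.items() if k / n > threshold}
```

In the workflow described above, `bootstrap_alignment` would be called 100 times, a tree inferred from each replicate, and the resulting clade sets passed to `majority_rule`; the percentages correspond to bootstrap values on the consensus tree.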

Likelihoods of alternative tree topologies were computed using Tree-Puzzle 5.3.rc7 (Schmidt et al. 2002; Schmidt 2009) by inputting all possible tree topologies in ‘user-defined trees’ mode and assuming the JTT + I + Γ8 model (gamma shape parameter α = 0.93). In the alternative topologies, I avoided breaking up monophyletic groups that were consistently recovered in all phylogenetic analyses (TPPP1 (TPPP1tetrapod, TPPP1fish), TPPP2, TPPP3tetrapod, TPPP3fish, TPPP4; Outgroup1, Outgroup2). Phylogenetic relationships within these groups were constrained according to the generally accepted phylogenetic relationships of the species concerned. Statistical tests for the evaluation of alternative tree topologies were performed using CONSEL (Shimodaira and Hasegawa 2001).

Alternatively, Boolean analysis using BOOL-AN v6.56 (Jakó et al. 2009) was applied to nucleotide sequences (cf. supplementary file 3). In BOOL-AN, the positional information and the order of sequence sites are also taken into account (Jakó et al. 2009; Ari et al. 2012). The Euclidean method was used for distance calculation (Podani 2000), and the neighbour-joining method (Saitou and Nei 1987) for reconstructing phylogenetic trees.
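The distance-based part of this analysis can be illustrated by computing Euclidean distances between binary-coded aligned sequences and building a neighbour-joining topology. This is a sketch, not BOOL-AN: the one-hot nucleotide coding is a hypothetical stand-in for BOOL-AN's own Boolean representation, the function names are illustrative, and branch lengths are omitted.

```python
import math

# Hypothetical stand-in coding: one 4-bit vector per site; gaps and ambiguity codes become zeros.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
GAP = (0, 0, 0, 0)


def encode(seq: str) -> list[int]:
    return [bit for base in seq.upper() for bit in ONE_HOT.get(base, GAP)]


def euclidean(u: list[int], v: list[int]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs: dict[str, str]) -> dict[str, dict[str, float]]:
    vecs = {name: encode(s) for name, s in seqs.items()}
    return {a: {b: euclidean(vecs[a], vecs[b]) for b in vecs} for a in vecs}


def neighbour_joining(dist: dict[str, dict[str, float]]) -> tuple:
    """Saitou-Nei neighbour joining; returns the unrooted topology as nested tuples.

    The three nodes left at the end form the central trifurcation.
    """
    nodes = list(dist)
    d = {(a, b): dist[a][b] for a in nodes for b in nodes}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]],
        )
        new = (i, j)
        d[new, new] = 0.0
        for k in nodes:
            if k != i and k != j:
                # Distance from the new internal node to each remaining node.
                d[new, k] = d[k, new] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)
```

For four toy sequences in which A and B differ at one site, as do C and D, the result groups A with B (equivalently C with D) in the unrooted tree.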

Synteny Analysis

Paralogous genes on the human chromosomes carrying TPPP genes were identified by BLAST searches using the reciprocal best-hit method (Tatusov et al. 1997; Bork et al. 1998). For the large-scale investigation of synteny among TPPP loci, Genomicus (http://www.dyogen.ens.fr/genomicus-66.01/cgi-bin/search.pl) (Muffato et al. 2010) and the Synteny Database (http://teleost.cs.uoregon.edu/acos/synteny_db/) (Catchen et al. 2009) were used.

Results

A New Group of TPPP Proteins

BLASTP or TBLASTN analysis (Altschul et al. 1997) was performed on complete genome sequences and EST collections available at the NCBI and some fish genome websites, using the sequences of human TPPP proteins as queries. A standard and simple method for determining orthology is the reciprocal best-hit approach (Tatusov et al. 1997; Bork et al. 1998), which can reveal 1:1 orthology even at relatively large phylogenetic distances. In vertebrates, however, the similarity of the genes/proteins is rather high: I found that sequence identity was always higher than 50 %.
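The reciprocal best-hit criterion can be sketched in a few lines. This is a minimal illustration rather than the BLAST-based procedure used here: it assumes each search has already been reduced to one score per (query, subject) pair, ties are resolved in favour of the first hit seen, and all gene names and scores are made up.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query, the subject with the highest score."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy scores (hypothetical): fish_X is most similar to TPPP3, but TPPP3 prefers fish_tppp3.
human_vs_fish = {
    ("TPPP1", "fish_tppp1"): 300, ("TPPP1", "fish_X"): 200,
    ("TPPP3", "fish_tppp3"): 280, ("TPPP3", "fish_X"): 250,
    ("TPPP2", "fish_X"): 150, ("TPPP2", "fish_tppp3"): 140,
}
fish_vs_human = {
    ("fish_tppp1", "TPPP1"): 300, ("fish_tppp3", "TPPP3"): 280,
    ("fish_X", "TPPP3"): 250, ("fish_X", "TPPP1"): 200,
}
rbh = reciprocal_best_hits(human_vs_fish, fish_vs_human)
```

In this toy case, fish_X is the best hit of TPPP2 but not a reciprocal best hit of any human query, which is the kind of pattern that leaves a gene without a human reciprocal best hit.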

On the basis of the reciprocal best-hit method, the orthologues of TPPP1, TPPP2 and TPPP3 could be identified unambiguously in most cases. Moreover, the N-terminal tail is present only in TPPP1s, and certain residues are characteristic of each paralogue, which aids their classification. All three paralogous TPPP genes can generally be found in mammals, birds and reptiles (cf. Fig. 1). Far fewer genomes are available for amphibians, which may be why generally only the TPPP3 orthologue has been identified in this class; notably, various Xenopus species also seem to contain only TPPP3. No amphibian TPPP1 orthologues have been found yet, and only one TPPP2, as an EST in Cynops pyrrhogaster (GenBank IDs: FS292922 and FS292027), is known. However, class-specific gene loss cannot be excluded. TPPP1 and TPPP3 are widely distributed in teleost fishes. Interestingly, some recently sequenced ESTs from this class show high identity with the sequences of human TPPP family members, especially TPPP3, but are not reciprocal best hits of any of them (Table 1). All of them show the highest sequence identity with human TPPP3, followed by TPPP1 and then TPPP2. The same order holds when the comparison is made with the three chicken (Gallus gallus) TPPPs. (In some fish, only one TPPP has been identified so far; some of these are highly similar to members of the new group. They are not listed in the table.)

Table 1 Fish TPPPs without human reciprocal best hit

There are at least two possible explanations. First, these TPPPs may represent the ‘lost’ fourth member of the family (cf. Introduction), retained (or found so far) only in teleost fishes. Alternatively, they may be a consequence of the well-known whole-genome duplication of teleost fishes (‘3R’) (Aparicio et al. 2002; Christoffels et al. 2004; Jaillon et al. 2004; Kasahara et al. 2007). Accordingly, they could be named TPPP4 or TPPP3A in the first and second case, respectively (Fig. 2a, b).

Fig. 2

Schematic representation of the alternative trees of TPPPs. a (AB)(CD) evolutionary topology resulting from two rounds of whole-genome duplication; TPPP4s are the result of 2R. b Emergence of the TPPP3A subfamily as a consequence of the fish-specific 3R. c–e The members of the new group are the fish orthologues of TPPP2. The dotted line and italics indicate that TPPP4 was lost in scenarios b–e. The new TPPP group is shown in bold.

However, when we look for the best hits of human (or other mammalian) TPPP2 in various teleost fish species, the best hit is always this ‘new’ gene/protein (where it exists), not TPPP1 or TPPP3. This raises a further possibility, namely that the newly identified ESTs are the fish orthologues of TPPP2, which would be the simplest phylogenetic scenario compared with the two mentioned above (Fig. 2c, d, e).

Phylogenetic Analysis

The phylogenetic tree of vertebrate TPPPs obtained by Bayesian analysis seems to give an unambiguous answer to this question (Fig. 3). It shows an (AB)(CD) evolutionary topology with maximal support at all basal nodes: TPPP1s + TPPP2s form the sister group of the clade of TPPP3s + TPPP4s, and within these clades TPPP1s and TPPP2s, and TPPP3s and TPPP4s, respectively, are sister groups of each other. Within the TPPP3 branch, fish TPPP3s form the sister group of tetrapod TPPP3s. Thus, with maximum posterior probabilities at the main branches, the tree supports the possibility that the new group of fish TPPPs represents TPPP4s, resulting from a duplication of the common ancestor of the TPPP3 and TPPP4 genes (Fig. 2a). If the 3R scenario held, the new group would be the sister group of fish TPPP3s only, not of the whole TPPP3 group (Fig. 2b).
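The contrast between the 2R and 3R scenarios comes down to which groups are sisters. A minimal sketch, assuming rooted trees written as nested tuples of group labels, checks whether two given groups are the two children of the same node. The encodings of Fig. 2a and 2b below are hypothetical simplifications, with "new" standing for the new fish group.

```python
from typing import Union

# Rooted tree as nested tuples; leaves are group labels (hypothetical encoding).
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def are_sisters(tree: Tree, group_a: set, group_b: set) -> bool:
    """True if some bifurcating node has exactly group_a and group_b as its two children."""
    if isinstance(tree, str):
        return False
    if len(tree) == 2:
        pair = {leaves(tree[0]), leaves(tree[1])}
        if pair == {frozenset(group_a), frozenset(group_b)}:
            return True
    return any(are_sisters(child, group_a, group_b) for child in tree)


# Simplified encodings of the 2R (Fig. 2a) and 3R (Fig. 2b) scenarios.
FIG_2A = (("TPPP1", "TPPP2"), ("TPPP3", "new"))
FIG_2B = (("TPPP1", "TPPP2"), ("TPPP3tetrapod", ("TPPP3fish", "new")))
```

Under the 2R encoding, `are_sisters(FIG_2A, {"TPPP3"}, {"new"})` is true; under the 3R encoding the new group is sister to fish TPPP3s only, so `are_sisters(FIG_2B, {"TPPP3tetrapod", "TPPP3fish"}, {"new"})` is false.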

Fig. 3

Phylogenetic tree of vertebrate TPPPs constructed by Bayesian analysis. TPPPs of C. intestinalis and B. floridae are included as outgroups. Numbers above internal branches indicate Bayesian posterior probabilities (shown as percentages). Values within classes are not shown. Branches that received maximum support are indicated by full circles.

In contrast, the ML tree showed very poor or no bootstrap support (<50 %) for the various positions. The main difference between the two trees is that TPPP2s, instead of being the sister group of TPPP1s, are inserted into the TPPP4 clade (Fig. 4). However, TPPP3s and TPPP4s are sisters in both cases. It should be added that the topology of the ML tree depended on the outgroup(s) used (TPPPs from Branchiostoma floridae, Ciona intestinalis, Nematostella vectensis, Strongylocentrotus purpuratus and their combinations) (data not shown). For example, depending on the outgroup, TPPP1 was found as sister to the clade of all other groups, to TPPP2 + TPPP4, or to TPPP3. The situation was the same when the long N-terminal tail of TPPP1s was excluded from the alignment and only the p25alpha domain (cf. Introduction) was used for phylogenetic analyses. However, TPPP2 and TPPP4 were always sisters, with bootstrap values between 34 and 50, suggesting that TPPP4 may be the teleost fish orthologue of TPPP2 (cf. Fig. 2d). It is difficult to reconcile the different results of the two methods; the discrepancy is probably related to rate heterogeneity in amino acid sequence evolution. The possibility that the new group resulted from the teleost fish-specific whole-genome duplication (the TPPP3A case) was not supported by the ML trees either, as this group was never placed within the TPPP3s.

Fig. 4

Phylogenetic tree of vertebrate TPPPs constructed by ML analysis. TPPPs of C. intestinalis and B. floridae are included as outgroups. Numbers above internal branches indicate bootstrap values. Values within classes are not shown. Branches that received maximum support are indicated by full circles.

However, statistical analysis of the ML trees could not exclude even this possibility. I conducted an exhaustive ML analysis, as suggested by Feiner et al. (2009), with several ‘operational taxonomic units’, i.e. monophyletic groups that were consistently recovered in all phylogenetic analyses (TPPP1 (TPPP1tetrapod, TPPP1fish), TPPP2, TPPP3tetrapod, TPPP3fish, TPPP4; Outgroup1, Outgroup2). Phylogenetic relationships within these groups were constrained according to the generally accepted phylogenetic relationships of the species concerned.

Table 2 lists the topologies with the highest support. Of all 220 possible tree topologies, 18 were within 0.5 S.E. of the log-likelihood difference. Multiple tree topologies, including those represented in Fig. 2 (models a–e discussed above), are supported by similar log-likelihood values, so none of these hypotheses can be ruled out. Only the trees shown in Fig. 2 are compatible with (at least) two rounds of whole-genome duplication. The new TPPP group can be placed on the trees in four different positions with almost equal probabilities. However, in more than half of the cases (10 of 18), it is sister to TPPP2, which may mean that the new group is the fish orthologue of TPPP2.
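The 0.5 S.E. filter can be illustrated with the usual Kishino–Hasegawa-style estimate of the standard error of a log-likelihood difference, computed from per-site log-likelihoods. This is a sketch under assumptions: the input format and helper names are hypothetical, and reading "within 0.5 S.E." as Δln L ≤ 0.5 × S.E. relative to the best tree is an interpretation; CONSEL's AU and related tests rely on multiscale bootstrap and are not reproduced here.

```python
import math


def delta_lnl_and_se(site_lnl_best: list[float], site_lnl_alt: list[float]) -> tuple[float, float]:
    """Log-likelihood difference (best minus alternative) and its standard error.

    The variance is estimated from the per-site differences: n/(n-1) * sum((d_i - mean)^2).
    """
    n = len(site_lnl_best)
    if n < 2 or n != len(site_lnl_alt):
        raise ValueError("need two equally long site vectors with at least 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_best, site_lnl_alt)]
    delta = sum(diffs)
    mean = delta / n
    var = n / (n - 1) * sum((x - mean) ** 2 for x in diffs)
    return delta, math.sqrt(var)


def within_se(topologies: dict[str, list[float]], factor: float = 0.5) -> list[str]:
    """Names of topologies whose deficit from the best tree is at most factor * S.E."""
    best = max(topologies, key=lambda name: sum(topologies[name]))
    kept = []
    for name, sites in topologies.items():
        delta, se = delta_lnl_and_se(topologies[best], sites)
        if name == best or delta <= factor * se:
            kept.append(name)
    return kept
```

With toy site log-likelihoods, an alternative tree that is uniformly slightly worse (small S.E.) is rejected, whereas one with the same total deficit but spread unevenly across sites (large S.E.) is retained.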

Table 2 Statistical supports for alternative tree topologies for relationships among TPPP genes
Table 3 Number of genes in paralogons which include TPPP orthologs

TPPP Genes Involved in Large-scale (Whole Genome) Duplication

The presence of four paralogues is consistent with the two rounds of large-scale (whole-genome) gene duplication that occurred in the early vertebrate lineage (Ohno 1970; Pébusque et al. 1998; Panopoulou et al. 2003; Dehal and Boore 2005). Such paralogues are termed ‘ohnologues’ in honour of Ohno’s work (Wolfe 2000). Several authors have suggested that the evidence for these duplication events is the presence of paralogons in vertebrate genomes, i.e. a special physical organization of several sets of genes within a genome. Paralogons are paralogous chromosomal regions derived from a common ancestral region; they therefore contain the same set of gene pairs at different chromosomal locations in the same genome (Panopoulou et al. 2003; Dehal and Boore 2005). (In practice, at most 100 unduplicated genes are allowed between them (Dehal and Boore 2005).) Indeed, I found that in the human genome TPPP1 and TPPP3 are located in paralogons containing several genes duplicated together, on chromosomes Hsa5 and Hsa16, respectively. The situation is similar in mouse, where Tppp1 and Tppp3 are located in paralogons on Mmu13 and Mmu8, respectively (data not shown).
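The paralogon criterion (duplicated gene pairs on two chromosomes, with at most a fixed number of unduplicated genes between successive pairs) can be sketched as a simple chaining procedure. This is a simplification, not the method of Dehal and Boore (2005): it assumes gene orders and paralogue pairs are already known, chains the pairs sorted by position on the first chromosome, and tolerates inverted order on the second chromosome by using absolute distances. The function name and toy data are hypothetical.

```python
def paralogon_blocks(chrom_a: list[str], chrom_b: list[str],
                     pairs: list[tuple[str, str]],
                     max_gap: int = 100, min_pairs: int = 2) -> list[list[tuple[str, str]]]:
    """Group paralogous pairs into blocks separated by at most max_gap unduplicated genes."""
    pos_a = {gene: i for i, gene in enumerate(chrom_a)}
    pos_b = {gene: i for i, gene in enumerate(chrom_b)}
    hits = sorted((pos_a[a], pos_b[b], a, b) for a, b in pairs if a in pos_a and b in pos_b)
    blocks, current = [], []
    for hit in hits:
        if current:
            prev = current[-1]
            gap_a = hit[0] - prev[0] - 1          # genes skipped on chromosome A
            gap_b = abs(hit[1] - prev[1]) - 1     # genes skipped on chromosome B (either direction)
            if gap_a > max_gap or gap_b > max_gap:
                blocks.append(current)
                current = []
        current.append(hit)
    if current:
        blocks.append(current)
    return [[(a, b) for _, _, a, b in block] for block in blocks if len(block) >= min_pairs]
```

With a deliberately small `max_gap`, a distant third pair is split off and, being a singleton, discarded; with the default of 100 all three toy pairs form one block.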

Figure 5 and supplementary data 4 show that 24 of the 40 genes located at the end of the p arm of Hsa5 (5p15.33 region), including TPPP1, have paralogues on Hsa16, in the q12–q22 regions, that are reciprocal best hits not only on the corresponding chromosomes but also across the whole human genome. (Dehal and Boore (2005) previously identified these paralogons but found only 5 paralogues among 35 and 142 genes of Hsa5 and Hsa16, respectively.) The TPPP1 gene belongs to a cluster whose genes are found in the same order in the 16q22 region. Some other genes of the 5p15.33 region, including three IRX paralogues, are found in inverse order and orientation in the 16q12 region. It is worth noting that, in contrast to the human case, the irx genes are found in the neighbourhood of tppp3 in non-inverted orientation on Dre7 of Danio rerio (zebrafish) (data not shown); this suggests that the inversion occurred only after the Actinopterygii–Sarcopterygii split.

Fig. 5

Paralogons of Hsa5 and Hsa16 including TPPP1 and TPPP3. The version of the Synteny DB including datasets from Ensembl version 61 was used to obtain the figure using a 100-gene sliding window (Cluster #48121). Conserved and non-conserved genes are represented by black and white squares, respectively

Although tri- and tetra-paralogons can be found in many cases in the human genome, I found no further paralogons for the regions mentioned above. The TPPP2 gene is located on Hsa14; however, its immediate neighbourhood shows no similarity to the corresponding regions of Hsa5 and Hsa16.

Shared synteny, i.e. the preserved co-localization of genes on the chromosomes of different species, can be indicative of orthology (Jun et al. 2009). Human and teleost fishes share synteny in the case of TPPP1 and TPPP3 (Fig. 6 and Supplementary data 4). For example, regions on Hsa5 and on Ola16 of Oryzias latipes (medaka) include TPPP1 and its neighbour CEP72, and regions on Hsa16 and Ola3 include TPPP3 and its neighbour ZDHHC1, besides other orthologues. (Medaka is used as an example because not only the existence but also the locations of all its TPPP genes are known.) ZDHHC genes are the immediate neighbours of TPPP genes in almost every vertebrate: ZDHHC1 and its paralogue ZDHHC11 are located beside TPPP3 and TPPP1, respectively. A ZDHHC gene is also the immediate neighbour of the sea lamprey (Petromyzon marinus) tppp (ENSPMA00000005581). No synteny occurs between human TPPP2 on Hsa14 and medaka tppp4 on Ola1. (The relevant region of Hsa14 shares synteny with Ola17 and Ola18, whilst the region on Ola1 shares synteny with part of Hsa17.) Neither the position of tppp4 nor that of TPPP2 is stable. Comparing the positions of human and chicken TPPP2, I found no paralogous genes using a 100-gene window. Likewise, comparing the 30-gene neighbourhood of tppp4 in medaka or stickleback (Gasterosteus aculeatus) with that in pufferfish (Tetraodon nigroviridis) and zebrafish, there are no and only one paralogous gene, respectively. Synteny data show no signs of a 3R duplication of TPPPs: 3R paralogues would be expected on Ola11 (the paralogon of Ola16, which includes tppp1) and on Ola6 (the paralogon of Ola3, which includes tppp3), but none exist. Other tentative paralogons cannot be identified, as Ola1, where tppp4 is situated, has paralogons on Ola15, Ola8, Ola10 and Ola18.
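Shared synteny around a focal gene can be quantified by counting its neighbours whose orthologues are also neighbours of the focal orthologue in the other species. A minimal sketch, assuming gene orders and a one-to-one orthologue map are given; reading a "30-gene neighbourhood" as 15 genes on each side is an assumption, and all gene names in the check are placeholders rather than real loci.

```python
def shared_neighbours(order_a: list[str], order_b: list[str],
                      orthologues: dict[str, str],
                      focal_a: str, focal_b: str, window: int = 15) -> list[str]:
    """Neighbours of focal_a (within `window` genes) whose orthologue lies within
    `window` genes of focal_b in the other genome."""
    ia, ib = order_a.index(focal_a), order_b.index(focal_b)
    near_a = order_a[max(0, ia - window): ia + window + 1]
    near_b = set(order_b[max(0, ib - window): ib + window + 1])
    return sorted(g for g in near_a if g != focal_a and orthologues.get(g) in near_b)
```

A result of zero or one shared neighbour, as reported above for tppp4 in some fish comparisons, would indicate that the local gene order around the focal gene has not been retained.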

Fig. 6

Paralogons including TPPP1 of various species. The version of the Synteny DB including datasets from Ensembl version 61 was used to obtain the data for the figure using a 25-gene sliding window. Conserved genes on Hsa5 are represented by black squares and on teleost fish chromosomes by grey ones except TPPP1 which is black. Non-conserved genes are white

Tppp Gene in Cyclostomes

Further insight could be expected if we knew more about early vertebrate evolution and the exact timing of the two rounds of whole-genome duplication. At present, it is considered that 1R and 2R happened before and after, respectively, the split of cyclostomes (hagfish, lampreys) and gnathostomes (jawed vertebrates) (Kasahara 2007); alternatively, both 1R and 2R took place before the split (Kuraku 2008; Kuraku et al. 2009). The increasing genomic information on Chondrichthyes (e.g. sharks) and Petromyzontidae (e.g. lampreys) will surely help to decide between these possibilities. At present, only insufficient data are available: for tppp genes, two complete sequences from lampreys and several EST sequences from Callorhinchus milii (elephant shark), Lethenteron japonicum (lamprey) and P. marinus, corresponding to various tppp exons, are available. (The exon–intron boundaries are well conserved and identical in vertebrate Tppps. The lengths of the first and third exons vary somewhat with the species and the Tppp subfamily, whereas the length of the second exon is almost absolutely conserved; cf. Fig. 1.) The first exon is known for several tppp genes of both lampreys and elephant shark, so I compared them with the first exons of fish and several other vertebrate tppp genes. Figure 7 shows the tree based on the nucleotide sequences of the first exons. The topology of the tppp subfamilies is as follows: tppp4 and fish tppp3 are sisters; together they are sister to the other tppp3s, including two lamprey exons. These clades are sister to tppp2, and finally all of them are sister to tppp1. This scenario is in accordance with the 3R whole-genome duplication. The clades of Euteleostei and Otocephala (Lavoué et al. 2005) are clearly separated within all tppp subfamilies. One of the lamprey exons and one of the elephant shark exons cluster with tppp1 exons. The other elephant shark exon and two other lamprey exons are grouped with tppp3 exons. None of them occur in the tppp2 or tppp4 clades. These data are consistent with, but do not prove, the view that only 1R, which resulted in the tppp1–tppp3 split, took place before the cyclostome/gnathostome separation. Obviously, further sequence data from these species could strengthen or change this view.

Fig. 7

Boolean analysis of the first exons of fish and several tetrapod TPPPs. TPPP of B. floridae is included as outgroup; exons found in lamprey (L. japonicum) and sea lamprey (P. marinus) are also included.

Discussion

TPPP1 is a microtubule-stabilizing protein that may play an important role in various neurodegenerative diseases. Until now, two further TPPP paralogues have been known in vertebrates, TPPP2 and TPPP3 (Vincze et al. 2006). Thus, the possibility arose that the multiplication of the TPPP gene was due to the two rounds of large-scale (whole-genome) gene duplication (Orosz 2009) known to have occurred in the early vertebrate lineage (Ohno 1970; Pébusque et al. 1998; Panopoulou et al. 2003; Dehal and Boore 2005), and that one of the four paralogues was lost.

Some recently sequenced ESTs from teleost fishes show high identity with the sequences of human TPPPs but are not reciprocal best hits of any of them. All of them show the highest sequence identity with human TPPP3, followed by TPPP1 and then TPPP2. There are several possible explanations (cf. Fig. 2). First, these genes may represent the ‘lost’ fourth member of the family, retained only in teleost fishes, and can be named TPPP4. Second, they may result from the whole-genome duplication in the ancestor of teleost fishes (‘3R’) (Aparicio et al. 2002; Christoffels et al. 2004; Jaillon et al. 2004; Kasahara et al. 2007). Finally, a simple but surprising possibility is that the newly identified ESTs are the orthologues of TPPP2.

Bayesian analysis indicates, with maximum posterior probabilities at the main branches, that the new group of fish TPPPs can be considered TPPP4s, resulting from a duplication of the common ancestor of the TPPP3 and TPPP4 genes. However, the ML tree, which shows very poor or no bootstrap support for the various positions, weakly supports the last scenario. TPPP3s and TPPP4s are sisters in both cases, and TPPP1s are always outgroup to TPPP3s + TPPP4s. The only difference between the trees is the position of the (tetrapod) TPPP2s. Are they sister to TPPP1s (Bayesian tree, Fig. 3) or to the new fish-specific group, TPPP4s, which would mean that TPPP2s and TPPP4s are orthologues (ML tree, Fig. 4)? In other words, is this a case of reciprocal lineage-specific ‘ohnologues gone missing’ (loss of TPPP2 in the teleost fish lineage and loss of TPPP4 in the tetrapod lineage) (Postlethwait 2007), or is simply one ohnologue (a sister of TPPP1) missing everywhere? On the basis of statistical analysis of likelihood values, neither of these cases nor the ‘3R’ scenario (Fig. 2b) can be excluded. The new TPPP group can be placed on the phylogenetic tree in different positions with almost equal probabilities. However, in most of the supported trees it is sister to TPPP2, which may mean that the new group is the fish orthologue of TPPP2 (Table 2).

Fig. 8

Paralogons including TPPP2 of various species. Genomicus version 66.01 was used to obtain the data for the figure. Conserved genes are represented by filled squares as labelled on the figure. Non-conserved genes are white. ARHGEF40 is a PLEKHG4B paralogue

These results may indicate that there is insufficient variation to support the nodes that would reveal the relationships among the various TPPP groups. A possible reason for the discrepancy between the phylogenetic trees is that the TPPP2 subfamily seems to evolve significantly faster than TPPP1s and TPPP3s (Table 4): the similarity among TPPP2s is much lower than among TPPP1s and especially among TPPP3s. [As this author’s group has shown earlier (Hlavanda et al. 2007), the somewhat greater difference among TPPP1s than among TPPP3s can be attributed to the unstructured N-terminal tail of TPPP1s, as unstructured regions usually evolve faster than structured ones (Chen et al. 2006).] The amino acid compositions of TPPP2s are the most divergent among the TPPPs, which may contribute to the uncertain position of this group. Faster evolution of some paralogues is not extraordinary; e.g. germ-line-specific genes generally evolve faster than other genes (Koonin and Wolf 2010). As TPPP2s evolved significantly faster than TPPP3s, this could explain why the new fish TPPPs are more similar to mammalian TPPP3s than to TPPP2s.
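Within-subfamily similarity of the kind summarized in Table 4 can be computed as the mean pairwise identity of aligned sequences. The sketch assumes identity is counted over alignment columns where neither sequence has a gap; the exact definition behind Table 4 is not stated, so values from this sketch need not match it, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percent_identity(s1: str, s2: str) -> float:
    """Percentage of identical residues over aligned columns where neither sequence has a gap."""
    columns = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no overlapping positions")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def mean_within_group_identity(aligned: list[str]) -> float:
    """Mean identity over all pairs of aligned sequences of one subfamily."""
    return mean(percent_identity(x, y) for x, y in combinations(aligned, 2))
```

A subfamily that evolves faster, such as TPPP2 according to the argument above, would be expected to yield a lower mean within-group identity than subfamilies of comparable age.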

Table 4 Percentage identity of various TPPPs

Investigation of conserved synteny may provide a clear picture when phylogenetic trees lack the power to resolve the history of a gene family (Postlethwait 2007). The human genome possesses three TPPPs (TPPP1, TPPP2 and TPPP3). TPPP1 and TPPP3, but not TPPP2, are involved in paralogons, on Hsa5 and Hsa16, respectively. However, not all duplicated genes can be expected to form paralogons, because of chromosomal rearrangements that occurred after the duplication event. Of course, paralogous chromosomal regions cannot be expected at all in the case of single-gene duplications. Shared synteny, i.e. the preserved co-localization of genes on the chromosomes of different species, can be indicative of orthology (Jun et al. 2009). Indeed, human and teleost fishes share synteny in the case of TPPP1 and TPPP3 (cf. Table 3); e.g. in medaka the corresponding regions are found on Ola16 and Ola3, respectively. No synteny occurs between human TPPP2 on Hsa14 and medaka tppp4 on Ola1. The situation is the same when mouse or chicken TPPPs are considered: synteny with fish chromosomes occurs only for TPPP1 and TPPP3 (data not shown). Synteny, had it been found, would have supported the orthology of TPPP2 and TPPP4; its absence does not exclude this possibility, although it argues against it.

The locations of the paralogons that include TPPP1 and TPPP3 follow those of other 1R/2R paralogons (Table 5) as defined by Nakatani et al. (2007). The original TPPP gene was located on proto-chromosome B of the vertebrate ancestor. According to their chromosome reconstruction, chromosome B corresponds to six proto-chromosomes, B0–B5, in the gnathostome (jawed vertebrate) ancestor. The split of B into the ancestor of B0 and B1 and the ancestor of B2, B3, B4 and B5 happened in 1R, whilst the further splits happened in 2R. TPPP1 and TPPP3 were located on proto-chromosomes B0 and B4, respectively.

Table 5 Conserved vertebrate linkage blocks in proto-chromosomes of the vertebrate ancestor

Human and chicken TPPP2 are located on Hsa14 and GgaZ, respectively; medaka tppp4 is on Ola1. According to the reconstruction by Nakatani et al. (2007), neither GgaZ nor Ola1 contains genes originating from proto-chromosome B of the vertebrate ancestor, whilst Hsa14 contains a small fragment of it through gnathostome proto-chromosome B5. This suggests that the first split, in 1R, occurred between the ancestor of TPPP1 and the ancestor of TPPP3/TPPP2. Considering the position of human TPPP2, it follows that TPPP3 and TPPP2 were split in 2R (B4–B5 split), whilst the product of the second duplication of TPPP1 (B0–B1 split) was lost. (This situation corresponds to what is shown in Fig. 4 and Fig. 2d.) If I consider the position of chicken Tppp2 or medaka tppp4, I cannot draw any definite conclusion; i.e. the above scenario can be neither verified nor rejected.

Following the logic of Nakatani’s reconstruction, genes on proto-chromosome B0 should have their 2R ‘ohnologues’ on proto-chromosome B1. The corresponding human, chicken and medaka chromosomes are Hsa20, Gga20, and Ola5 and Ola7, respectively. In general, there is strong synteny among Hsa20, Gga20, Ola5 and Ola7 (Nakatani et al. 2007). Thus, the ‘lost’ TPPP would have been located on these chromosomes. The same should hold for the other genes of the paralogons containing TPPP1. I therefore checked in Genomicus whether ohnologues of the seven genes conservatively included in these paralogons, TPPP1 and its immediate neighbours (cf. Fig. 6), are present on these chromosomes. The ohnologue of only one of the seven genes, PLEKHG4B, namely KIAA1755, is present on all four chromosomes, in the paralogous blocks (Table 5); the other six are missing in each case. This suggests that the other six, including a TPPP paralogue, were lost shortly after 2R, in the gnathostome or in the osteichthyan (bony vertebrate) ancestor. This is, at least, a more parsimonious explanation than their independent loss in each lineage.
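The parsimony argument can be made concrete by counting loss events. The sketch below is an illustration only and not part of the original analysis: it assumes a simplified tree ((human, chicken), medaka), adds a hypothetical tip `post2R_ancestor` that carries each gene in order to fix the ancestral state, treats the medaka lineage as a single tip (ignoring the separate 3R copies on Ola5 and Ola7), and scores each of the six missing genes independently with the standard Fitch small-parsimony algorithm. All names in the code are illustrative.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip label or a pair of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small parsimony for one character: returns (state set, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No common state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Assumed topology; the hypothetical post-2R ancestor fixes the ancestral state "present".
TREE: Tree = ("post2R_ancestor", (("human", "chicken"), "medaka"))
# 1 = ohnologue present, 0 = absent (missing from Hsa20, Gga20 and Ola5/Ola7).
STATES = {"post2R_ancestor": 1, "human": 0, "chicken": 0, "medaka": 0}
N_MISSING_GENES = 6  # the six genes, including a TPPP paralogue, absent in every lineage

# Shared-loss scenario: minimum number of changes found by Fitch parsimony.
_, shared_loss_per_gene = fitch(TREE, STATES)
# Independent-loss scenario: one loss on every lineage that lacks the gene.
independent_loss_per_gene = sum(
    1 for tip, state in STATES.items() if tip != "post2R_ancestor" and state == 0
)

shared_total = N_MISSING_GENES * shared_loss_per_gene
independent_total = N_MISSING_GENES * independent_loss_per_gene

if __name__ == "__main__":
    print(shared_total, independent_total)  # prints: 6 18
```

Under these assumptions, one shared loss per gene costs 6 events in total versus 18 for independent losses; counting Ola5 and Ola7 as separate medaka copies would only widen the gap.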

Investigating the neighbours of human TPPP2 helps to confirm its history. Its immediate neighbour is NDRG2. This gene, together with TPPP2 and two other genes (ARHGEF40, which is a PLEKHG4B paralogue, and ZNF219), is present in paralogons not only in mammals but also in the lizard, Anolis carolinensis (Fig. 8). These genes, except Tppp2, are located in the neighbourhood of each other in frog and in teleost fishes as well. None of the three genes can be found in chicken at all, indicating that they were lost together with most of the genes of the B5 gnathostome proto-chromosome (cf. Nakatani et al. 2007). Tppp2 was retained, but in an unusual position on chromosome Z, which does not share synteny with other TPPP2-containing chromosomes. It is worth mentioning that the NDRG genes are nice examples of ohnologues (Table 5). They follow the positions of other genes of 1R/2R paralogons (Table 5) as defined by Nakatani et al. (2007) and are an example of reciprocal ‘ohnologues gone missing’ (Postlethwait 2007) after the 1R and 2R genome duplications: loss of Ndrg2 in the avian lineage and loss of ndrg4 in medaka. These facts suggest that human TPPP2 is located in its ‘right’ place on Hsa14, which is a ‘descendant’ of the B5 gnathostome proto-chromosome. Thus, TPPP2 on Hsa14 is the sister of TPPP3 on Hsa16, which originates from the B4 gnathostome proto-chromosome. Consequently, the right name for TPPP2 ought to be TPPP4, whilst the ‘real’ TPPP2 of the B1 gnathostome proto-chromosome was lost (cf. Figs. 4 and 2d).

Finally, synteny data alone do not allow me to say anything about the ‘fish-specific’ tppp; the lack of shared synteny makes it impossible to give an answer. I suspect that tppp4 genes are not located in their original position. Taking medaka as an example, tppp1 and tppp3 are located in the ‘right’ places on Ola16 and Ola3, respectively. Their 3R paralogues ought to be on Ola11 and Ola6, respectively. The ‘lost’ TPPP and its 3R paralogue ought to be on Ola5 and Ola7, whilst the orthologue of tetrapod TPPP2 and its 3R paralogue ought to be on Ola18 and Ola17, respectively. However, the ‘fish-specific’ TPPP paralogue is on Ola1.
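The expected medaka locations listed above can be written as a simple lookup, which makes the mismatch of tppp4 explicit. This is a hypothetical encoding that uses only the chromosome assignments stated in the preceding paragraph; the block labels `lost_tppp` and `tppp2_orthologue`, and the helper `compatible_blocks`, are illustrative names, not gene symbols or an existing tool.

```python
from typing import Dict, List, Set

# Expected medaka (Ola) chromosomes for each TPPP lineage and its 3R paralogue,
# as listed in the text; labels are illustrative, not official gene symbols.
EXPECTED_BLOCKS: Dict[str, Set[str]] = {
    "tppp1": {"Ola16", "Ola11"},
    "tppp3": {"Ola3", "Ola6"},
    "lost_tppp": {"Ola5", "Ola7"},
    "tppp2_orthologue": {"Ola18", "Ola17"},
}

# Observed locations of the medaka genes discussed in the text.
OBSERVED: Dict[str, str] = {"tppp1": "Ola16", "tppp3": "Ola3", "tppp4": "Ola1"}


def compatible_blocks(chromosome: str) -> List[str]:
    """Return the lineages whose expected blocks include this chromosome."""
    return sorted(
        lineage for lineage, chroms in EXPECTED_BLOCKS.items() if chromosome in chroms
    )


if __name__ == "__main__":
    for gene, chromosome in OBSERVED.items():
        matches = compatible_blocks(chromosome)
        print(gene, chromosome, matches if matches else "no expected block")
```

Here tppp1 and tppp3 each fall into their own expected block, whereas Ola1 matches none of them, which is the observation behind the suspicion that tppp4 genes are not in their original position.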

Thus, the probable history of the quadruplication of the single non-vertebrate tppp gene is that the diversification of TPPP1 and the precursor of TPPP2/TPPP3 occurred in the first round of whole-genome duplication. It was followed by two further splits, TPPP1/lost and TPPP3/TPPP2, in the second round. The phylogenetic analysis does not give a clear answer; however, the synteny analysis supports only this possibility. The remaining question is the position of the new fish-specific group of TPPPs (tppp4). All the phylogenetic analyses show unequivocally that tppp4s are not sister to tppp1s. Thus, within the framework of whole-genome duplications, the two remaining possibilities are that they are the fish orthologues of tetrapod TPPP2s or the 3R paralogues of fish tppp3. In the latter case, a subsequent translocation of the tppp4 gene must have occurred. ML analysis allows both possibilities; Boolean analysis based on the first exons supports the 3R case (Fig. 7), whilst synteny analysis does not help to resolve the problem. Alternatively, a single duplication of a tppp gene other than tppp1, an event specific to teleost fishes, might have occurred.

The fate of duplicated genes, in the majority of cases, is the degeneration of one of the copies. This seems to hold for the genes quadruplicated by 2R (Dehal and Boore 2005) and octuplicated by 3R as well (Sato et al. 2009). Only a minority of the copies is retained, owing to sub- or neofunctionalization. However, if they ‘survive’, the most prevalent case (about 25 %) is that all four paralogues are kept (Dehal and Boore 2005). For TPPPs, this scenario might also be valid, with differential losses such that fishes lack tppp2 and tetrapods lack Tppp4. As I have shown, however, the general loss of the fourth paralogue seems more probable. It also cannot be excluded that these absences are due to incomplete sequencing rather than gene loss.

What can we tell about the sub/neofunctionalization of TPPP genes? It has been suggested previously, on the basis of phylogenomic analysis, that there is a tight parallel between the phylogenetic distribution of TPPP proteins and that of the eukaryotic cilium/flagellum (Orosz and Ovádi 2008). TPPP orthologues can be found in all kinds of ciliated organisms; thus, their role may be connected to a basic function of the cilium. The axoneme of the cilium consists of nine doublet microtubules; the motile cilium (or flagellum) contains an additional central pair of microtubules. The microtubule-stabilizing (MAP-like) effect of TPPP1/p25 and TPPP3/p20 (Vincze et al. 2006; Hlavanda et al. 2007) is in accordance with the suggestion that TPPP orthologues also play a role in the stabilization of axonemal microtubules. However, it remains an open question which of the vertebrate TPPP paralogue(s) retained the ciliary function and whether the paralogues really share the roles: are TPPP1 and TPPP3 present in non-motile and motile cilia, respectively, as suggested by proteomic data (Orosz and Ovádi 2008)?

TPPP1 and TPPP3 have very similar localization and functions in the brain, stabilizing axonal microtubules (Vincze et al. 2006; Lehotzky et al. 2010; Tőkési et al. 2010). However, whilst TPPP1 seems to be brain-specific and is enriched in glial and neuronal inclusions in synucleinopathies, TPPP3 has been found in other tissues as well (Staverosky et al. 2009; Zhou et al. 2010a, b) and does not seem to play any role in neurodegeneration. We showed (Hlavanda et al. 2007) that TPPP1s are phosphoproteins whose function may be regulated by phosphorylation/dephosphorylation, a feature unique among TPPPs. The phosphorylation sites are located mostly on their unstructured N-terminal tail, which is missing in the other members of the protein family, including invertebrates. These sites are much more conserved than the other amino acids of the tail (Hlavanda et al. 2007). Besides these, there is an additional, absolutely conserved site (SPT), phosphorylated in vivo, near the beginning of the third exon, just before the microtubule-binding sequence; it is found exclusively and without exception in TPPP1s.

At present, we know almost nothing about the possible special functions of either the TPPP2 or the TPPP4 subfamily. Human TPPP2/p18 was found at the mRNA level in foetal but not in adult brain (Zhang et al. 2002). It was also expressed in the liver and pancreas and had a moderate expression level in the heart, skeletal muscle, and kidney. Importantly, it has much less affinity for microtubules than TPPP1 and TPPP3, at least in vitro (Vincze et al. 2006). There are also data on the abundance of this paralogue in germ-line-specific tissues: according to the available data, its occurrence is restricted or biased towards testis in human, mouse, dog and chicken (http://www.ncbi.nlm.nih.gov/unigene?term=tppp2), which is in accordance with its faster evolution. The teleost fish-specific new TPPPs have not been isolated; thus, their function is unknown. Purification of these proteins is needed to get comprehensive information about the whole TPPP family.

Conclusions

TPPP1, a microtubule-stabilizing protein, may have an important role in various neurodegenerative diseases. Until now, two further TPPP paralogues have been known in vertebrates. A scenario of two-fold duplication of the TPPP gene, as part of the two-round genome duplication that occurred in the vertebrate lineage, is proposed. In this article, I show that TPPP1 and TPPP3, but not TPPP2, are involved in paralogons, on human chromosomes Hsa5 and Hsa16, respectively. The locations of these paralogons follow those of other 1R/2R paralogons not only in human but in all vertebrates. By contrast, only mammalian and lizard Tppp2s preserved their genomic locations. I suggest that the single non-vertebrate tppp gene diversified into tppp1 and the precursor of tppp2/tppp3 in the first round of whole-genome duplication. The existence of a fish-specific fourth paralogue, tppp4, has been raised, but it is not supported by synteny analysis. Alternatively, the new group can be considered as the fish orthologue of TPPP2. The possibility that the new group is a consequence of the teleost fish-specific whole-genome duplication (3R) cannot be excluded. Functional studies using purified proteins are needed to get comprehensive information about the whole TPPP family.