Introduction

Amino acids and auxins function as nitrogen sources, neurotransmitters, and hormones and are essential for the growth and development of multicellular organisms (Bennett et al. 1996; Clark and Amara 2005; Tegeder and Ward 2012). The amino acid/auxin permeases (AAAP) are enzymes that mediate the transport of a broad range of amino acids or auxins into and out of cells (Saier et al. 2009). Studying the AAAP gene family therefore helps us to understand the molecular mechanisms by which amino acids and auxins are transferred and distributed in organisms. Previous studies have shown that sequence similarity among the AAAP transporters is low, but they all recognize certain amino acids or closely related compounds and have 10–12 predicted trans-membrane (TM) α-helical spanners (Young et al. 1999; Chen and Bush 1997). Additionally, AAAP proteins with highly divergent sequences have various substrate specificities, catalyzing the symport of auxin (indole-3-acetic acid), single amino acids, or multiple amino acids across lipid membranes (Young et al. 1999; Saier et al. 2009; Tegeder et al. 2007).

Since the first mammalian AAAP transporter, the vesicular GABA transporter (VGAT), was identified, three subfamilies of AAAP transporters have been described in mammals: the VGAT subfamily (McIntire et al. 1997; Sagné et al. 1997), the system A/N subfamily (Chaudhry et al. 1999; Sugawara et al. 2000), and the proton/amino acid transporter (PAT) subfamily (Sagné et al. 2001; Boll et al. 2002). In plants, some Arabidopsis amino acid transporters were identified by complementation of yeast mutants defective in the uptake of amino acids (Chen and Bush 1997; Fischer et al. 1995). Amino acid transporters from other yeast and plant species (Mai and Lipp 1994; Tegeder and Rentsch 2010; Lu et al. 2012) were also isolated and functionally characterized. Plant members of the AAAP gene family have been divided into several subfamilies, including the amino acid permease (AAP) (Frommer et al. 1993; Fischer et al. 1995, 2002; Okumoto et al. 2002; Lee et al. 2007), the lysine–histidine transporter (LHT) (Chen and Bush 1997; Hirner et al. 2006), the auxin transporter-like protein (AUX1) (Bennett et al. 1996; Young et al. 1999), and the proline transporter (ProT) (Breitkreuz et al. 1999; Grallath et al. 2005). A series of studies strongly suggested that the AAAP gene family participates in the regulation of vital biological processes exclusive to plants (Frommer et al. 1993, 1995; Hsu et al. 1993; Rentsch et al. 1996; Grallath et al. 2005; Lee et al. 2007; Tegeder and Ward 2012). Therefore, a phylogenetic analysis of this gene family in representative species of different taxonomic groups is pertinent. Recent analysis of the Arabidopsis genome revealed that at least 43 putative amino acid carriers were present at the plasma membrane and tonoplast (Ward 2001; Wipf et al. 2002).

Brassica rapa (2n = 2x = 20, AA genome) is a diploid species that belongs to the Cruciferae family and is one of the major oilseed crops, cultivated in many areas as a source of vegetable oil and animal feed protein. B. rapa has a variety of advantages over oilseed rape (Brassica napus L., 2n = 4x = 38), including high yield potential, early maturity, and stable male sterile lines (Niemelä et al. 2006). In addition, B. rapa is a good genetic model for investigating duplicated gene fates, gene origin and expansion, gene dosage effects, and gene rearrangement after paleopolyploidizations (Cheng et al. 2011, 2012; Tang et al. 2012). Ideally, one locus in the Arabidopsis genome is orthologous to three counterparts in the triplicated B. rapa genome due to a whole genome triplication event approximately 11–12 MYA (Lysak et al. 2007; Mun et al. 2009). Recently, the draft genome of B. rapa offered an opportunity to investigate the AAAP gene family in this species (Wang et al. 2011). Here, we first computationally identified the putative AAAP genes in B. rapa. These genes were then classified according to their phylogenetic relationships with Arabidopsis orthologs. The chromosomal locations and duplication patterns of these genes were also studied. Finally, we examined selective signatures and functional divergence of these genes. The results provide new insights into the evolutionary history of this family and useful clues for further functional analysis of each group of AAAP proteins in B. rapa.

Materials and Methods

Retrieval and Identification of Putative AAAP Genes

First, the known AAAP genes in Arabidopsis thaliana were downloaded from the TAIR database (http://www.arabidopsis.org/browse/genefamily/AAAP.jsp). We used these genes as queries to search the BRAD database (B. rapa ssp. pekinensis cv. Chiifu genome V1.0, http://Brassicadb.org) with the Basic Local Alignment Search Tool (BLAST) and identify AAAP candidate genes in B. rapa. Second, the HMM profile of the Aa_trans domain (PF01490) from the Pfam database (http://pfam.sanger.ac.uk/search) was reconstructed using HMMER 2.3.2 software (Eddy 1998), which was also used to search the BRAD database for AAAP candidate genes. Third, we used the AAAP candidate genes identified by the above methods as queries to further search the BRAD and National Center for Biotechnology Information databases. A collection of putative AAAP genes was obtained, and redundant sequences were removed. Finally, the Pfam tool (Punta et al. 2012) was used to screen the genes for Aa_trans domains (PF01490) to confirm their identity as AAAP family members. If there was more than one allele at the same locus, only one allele was chosen as representative.

Chromosomal Mapping and Gene Expansion Patterns

The predicted AAAP genes were used as queries in BLASTN searches of the BRAD database to identify their genomic positions. Subsequently, the AAAP genes were manually mapped onto chromosomes. Two major mechanisms of gene amplification, segmental and tandem duplication, were considered. To elucidate the expansion patterns of this gene family, duplicate genes generated by segmental duplication were retrieved from the PGDD database for B. rapa and Arabidopsis. Furthermore, if two or more members of the AAAP gene family were located within a chromosomal region of less than 50 kb, they were considered tandem duplicates (Riechmann et al. 2000).
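
The 50 kb criterion for tandem duplicates can be illustrated with a short sketch. This is not the procedure used in this study (the genes were mapped manually); the gene identifiers, coordinates, and the function name find_tandem_pairs are hypothetical, and the distance is assumed here to be measured between the start coordinates of neighboring family members on the same chromosome.

```python
from collections import defaultdict

MAX_DISTANCE_BP = 50_000  # 50 kb window, following the criterion described above


def find_tandem_pairs(genes, max_distance=MAX_DISTANCE_BP):
    """Return pairs of neighboring family members on the same chromosome
    whose start coordinates lie less than max_distance apart.

    genes: iterable of (gene_id, chromosome, start_bp) tuples.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    pairs = []
    for members in by_chrom.values():
        members.sort()  # order family members by start coordinate
        # Compare each gene with its nearest downstream family member.
        for (start_a, id_a), (start_b, id_b) in zip(members, members[1:]):
            if start_b - start_a < max_distance:
                pairs.append((id_a, id_b))
    return pairs


# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", "A01", 100_000),
    ("geneB", "A01", 130_000),  # 30 kb from geneA: tandem candidate
    ("geneC", "A01", 900_000),  # far from geneB
    ("geneD", "A02", 110_000),  # different chromosome
]
tandem_pairs = find_tandem_pairs(example)
print(tandem_pairs)
```

Only adjacent family members are compared, so a run of three closely spaced genes would be reported as two overlapping pairs.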

Phylogenetic Analysis and Sequence Features

We aligned the full-length sequences of the AAAP proteins from Arabidopsis and B. rapa using Clustal X with default parameters (Thompson et al. 1997). The phylogenetic tree of AAAP genes from B. rapa and Arabidopsis was constructed using the neighbor-joining method (Saitou and Nei 1987) with the p distance and pairwise deletion options. The phylogenetic trees were displayed in different layouts using MEGA (Tamura et al. 2007) and iTOL (Letunic and Bork 2011). Gene structures were illustrated using the Gene Structure Display Server (GSDS; Guo et al. 2007) by comparing the corresponding coding and genomic sequences of the B. rapa AAAP genes. MEME version 2.2 (Bailey et al. 2006) was used to identify the conserved motif patterns with the following parameters: number of repetitions, any; maximum number of motifs, 20; and optimum motif width, 6–200.
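
The p distance with pairwise deletion that feeds the neighbor-joining method can be sketched as follows. This is a minimal illustration rather than the MEGA implementation; the toy sequences and the function name p_distance are hypothetical, and tree construction itself is not shown.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is skipped only when one of THESE two
    sequences has a gap there, not when a gap occurs in another sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # site not comparable for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


# Toy alignment (hypothetical sequences).
s1 = "MKT-LLA"
s2 = "MKTALLV"
s3 = "MRT-LIA"
d12 = p_distance(s1, s2)  # 1 difference over 6 comparable sites
d13 = p_distance(s1, s3)  # 2 differences over 6 comparable sites
d23 = p_distance(s2, s3)  # 3 differences over 6 comparable sites
print(d12, d13, d23)
```

With complete deletion, by contrast, every column containing a gap in any sequence would be dropped before any pair is compared.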

Tests for Selective Pressure and Functional Divergence

To improve the accuracy and power of selective pressure analysis, raw sequences were filtered according to the following criteria: (1) 83 BrAAAP protein sequences were clustered using BLASTCLUST (http://toolkit.tuebingen.mpg.de/blastclust/) with 90 % coverage and 30 % identity. Only clusters containing four or more sequences were included in further analysis. (2) Multiple codon alignments of 42 filtered BrAAAP genes were generated using PAL2NAL (Suyama et al. 2006). (3) Gaps and highly divergent regions were automatically removed.

To detect the selective pressures acting on each amino acid site of AAAP proteins, two codon-based likelihood methods, the site and branch site models (Yang and Nielsen 2002; Zhang et al. 2005; Yang 2007), were applied using the CODEML program of PAML version 4.0. For site models, M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (beta), and M8 (beta and ω) were used to calculate the related parameters and likelihood scores, and a likelihood ratio test (LRT) was then performed by comparing M0 with M3, M1a with M2a, and M7 with M8. Finally, Bayes empirical Bayes (BEB) was used to infer posterior probabilities for all amino acid sites under the M3, M2a, and M8 models. For the branch site model, foreground clades were labeled, and positive selection sites in these clades were identified by test 2. The LRT in test 2 compared the null model MA1 (model = 2, NSsites = 2, ω = 1) with the alternative model MA (model = 2, NSsites = 2, estimated ω; Yang 2007). The result of the LRT was used to assess statistical significance, and the BEB method was used to infer posterior probabilities.
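
The likelihood ratio test used to compare nested models can be written out explicitly. The sketch below is an illustration only, not part of PAML: twice the log-likelihood difference is compared with a chi-square distribution. The function names and the log-likelihood values are hypothetical. It assumes 1 degree of freedom for the branch site test, 2 for the M1a/M2a and M7/M8 comparisons, and 4 for M0/M3, and it uses closed-form chi-square tail probabilities so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df)."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k  # (x/2)^k / k!
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta_lnL, p-value) for a nested model comparison."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a branch site comparison (df = 1).
stat, p_value = likelihood_ratio_test(lnl_null=-1000.00, lnl_alt=-996.08, df=1)
print(round(stat, 2), p_value)
```

A small p-value indicates that the alternative model fits significantly better than the null model; it does not by itself identify which sites are under selection, which is the role of the BEB posterior probabilities.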

Functional divergence between different groups of the AAAP protein family in B. rapa was assessed with DIVERGE2 (Gu 1999). The coefficient of type I functional divergence (θI) was estimated (Gu 1999; Gu and Vander-Velden 2002). A value of θI significantly greater than 0 indicates a site-specific shift in evolutionary rate between groups (Gu 2003).

Prediction of Trans-Membrane Domains and Protein Structures

TMPred (http://www.ch.embnet.org/software/TMPRED_form.html) was used to predict the trans-membrane domains of the BrAAAP1 protein, which were illustrated with TOPO2 (http://www.sacs.ucsf.edu/TOPO2/). The homology models for the BrAAAP2 and BrAAAP11 proteins were automatically generated by the SWISS-MODEL web server (Arnold et al. 2006), and their three-dimensional structures were visualized with PDBpaint (Fournier and Andrade-Navarro 2011). The critical amino acid sites showing functional divergence between groups or under positive selection were mapped on the representative TM and protein structures.

Results

Identification and Phylogenetic Relationships of AAAP Genes in Arabidopsis and B. rapa

The highly conserved Aa_trans domain allows members of the AAAP protein family to be identified readily. Using a combination of BLAST, HMMER, and synteny mapping, a total of 83 AAAP genes were identified in the sequenced B. rapa genome. All AAAP proteins in B. rapa except for six (BrAAAP8/43/47/61/69/78) contained at least one Aa_trans domain. The proteins without an Aa_trans domain were also considered members of the AAAP family, mainly because of the high degree of collinearity between them and well-known AAAP genes. Detailed information on all genes is listed in Electronic Supplementary Material (ESM) Table S1.

To investigate the phylogenetic relationships of AAAP genes between B. rapa and Arabidopsis, a phylogenetic tree of the AAAP genes was constructed using the neighbor-joining method with the p distance and pairwise deletion options (Fig. 1). Based on the phylogenetic analysis, the relationships of AAAP genes between Arabidopsis and B. rapa can be classified into three types of orthology: 1:3, 1:2, and 1:1. Of the 83 AAAP genes identified in B. rapa, 30 (36.1 %), 28 (33.7 %), and 11 (13.3 %) genes are 1:3, 1:2, and 1:1 orthologs, respectively. Additionally, the orthologous counterparts of BrAAAP22/81, BrAAAP5/59, BrAAAP30/72, and BrAAAP65 were absent from the Arabidopsis genome, suggesting that their orthologous relationships to Arabidopsis counterparts may have been broken by rapid evolution. BrAAAP26 is orthologous to two Arabidopsis loci (AT3G09330 and AT3G09340) that are tandemly duplicated within a 5.7 kb region of chromosome 3.
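
The tally of 1:3, 1:2, and 1:1 relationships can be expressed as a simple count over an ortholog table. The sketch below is illustrative only; the locus and gene identifiers and the function name classify_orthologs are hypothetical, and the table is assumed to map each Arabidopsis locus to its B. rapa co-orthologs as read from the tree.

```python
from collections import Counter


def classify_orthologs(orthologs):
    """Count B. rapa genes in each 1:n orthology class.

    orthologs: dict mapping one Arabidopsis locus to a list of its
    B. rapa co-orthologs.
    """
    counts = Counter()
    for br_genes in orthologs.values():
        n = len(set(br_genes))  # number of distinct B. rapa copies
        if n:
            counts[f"1:{n}"] += n  # tally genes, not loci
    return counts


# Hypothetical ortholog table, for illustration only.
table = {
    "AtLocus1": ["BrGeneA", "BrGeneB", "BrGeneC"],
    "AtLocus2": ["BrGeneD", "BrGeneE"],
    "AtLocus3": ["BrGeneF"],
}
class_counts = classify_orthologs(table)
print(dict(class_counts))
```

Because genes rather than loci are counted, a class such as 1:3 always contains a multiple of three B. rapa genes.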

Fig. 1

Phylogenetic relationships of AAAP family proteins in Arabidopsis and B. rapa. The circular tree based on the full-length sequences of proteins was viewed using iTOL. Blue, red, and green clades indicate 1:3, 1:2, and 1:1 orthologous relationships, respectively, between Arabidopsis and B. rapa AAAP proteins. Each shape–color combination represents one independent domain. The red vertical hexagon represents the Aa_trans (PF01490) domain. The remaining shape–color combinations represent 27 distinct domains.

Gene Structures and Conserved Motifs of AAAP Genes in B. rapa

To clarify the sequence–phylogeny relationships between the AAAP genes in B. rapa, a phylogenetic tree was constructed using the neighbor-joining method (Saitou and Nei 1987) implemented in MEGA software (Tamura et al. 2007). Additionally, the gene structures and conserved motifs were identified using the GSDS (Guo et al. 2007) and MEME (Bailey et al. 2006) tools and were illustrated on the right of the phylogenetic tree (Fig. 2).

Fig. 2

Phylogenetic relationships and sequence features of AAAP genes in B. rapa. The phylogenetic tree was constructed based on the full-length sequences of 83 AAAP proteins. Black diamonds on the left highlight the different clades. In the middle, rectangles with different numbers and colors show distinct conserved motifs. On the right, exons are represented by green boxes, introns by black lines, and UTR regions by thick blue lines; 0, 1, and 2 indicate the intron phases.

Based on our phylogenetic analysis, the AAAP genes in B. rapa could be divided into six major clades: A, B, C, D, E, and F (Fig. 2). Analyses of gene structures showed that the coding sequences of all BrAAAP genes were disrupted by introns except for BrAAAP13/54/57 in clade C and BrAAAP25/52 in clade D. According to the number of introns, the intron-containing BrAAAP genes were divided into three groups, with 1–3, 4–7, and 8–11 introns. We found that 52 (62.7 %) genes had between four and seven introns; this most common structure was found in every clade of the phylogenetic tree. Of the BrAAAP genes with between one and three introns, BrAAAP16 in clade D and BrAAAP61 in clade E had one intron; BrAAAP62 in clade A, BrAAAP9/14/27/46/51 in clade D, and BrAAAP8/47/69/78 in clade E had two introns; and BrAAAP5/73 had a three-intron structure that was specific to clade C. Of the BrAAAP genes with between 8 and 11 introns, BrAAAP6 in clade A and BrAAAP35 in clade F had 8 introns; BrAAAP18/26/37/71 in clade D and BrAAAP20 in clade E had 9 introns; BrAAAP23/34/38/80 had a 10-intron structure that was specific to clade D; and BrAAAP21 had an 11-intron structure that was specific to clade F.

The number, order, and position of motifs within BrAAAP proteins are presented graphically (Fig. 2). Motif 4/8/10, motif 6/5/3/10, motif 6/11/10, motif 6/7/10, motif 6/1/2, and motif 9/8/10 constituted the most common motif architectures in clades A, B, C, D, E, and F, respectively. In clade A, motif 4 was absent in BrAAAP4/62, and motif 10 was absent in BrAAAP1. However, motif 6 was inserted into BrAAAP11, and an additional motif 10 was observed in BrAAAP53. All proteins in clade B had the same motif organization pattern. In clade C, motif 6 was added in BrAAAP15/50 and motif 10 was deleted in BrAAAP54. In clade D, a gain of motif 5 in BrAAAP71 and of motif 11 in BrAAAP27 was found, but a loss of motif 10 in BrAAAP41 and of motif 6 in BrAAAP52 was also observed. BrAAAP9 had a distinct motif pattern with motifs 4 and 7. Interestingly, BrAAAP47/61 in clade E did not contain any motifs, and BrAAAP8/69/78 contained only motif 4, whereas BrAAAP77 had a distinct motif pattern with motifs 1 and 10. In clade F, BrAAAP7 contained motif 6/8, BrAAAP32/64 contained motif 6/8/10, BrAAAP21/30/39/72 contained motif 9/8/10, and BrAAAP35/40 contained motif 9/8/10/6.

Physical Mapping and Duplication Patterns of AAAP Genes in B. rapa

Physical mapping showed that 81 BrAAAP genes were unevenly distributed across the 10 chromosomes of B. rapa, while the chromosomal positions of the BrAAAP82 and BrAAAP83 genes remained unknown because they mapped to scaffolds (Scaffold000123 and Scaffold000217) that were not associated with a specific B. rapa chromosome. The number of AAAP genes per chromosome ranged from 4 to 13. Gene mapping revealed that 4, 13, 10, 8, 10, 5, 5, 7, 12, and 5 AAAP genes mapped to chromosomes A01, A02, A03, A04, A05, A06, A07, A08, A09, and A10, respectively.

Based on the phylogenetic results (Fig. 1), 27 pairs of duplicated co-paralogs were identified in BrAAAP genes. Twenty protein-coding genes flanking each pair of duplicated co-paralogs were retrieved from the B. rapa sequencing database using the PGDD tool. Novel co-paralogs were found in protein-coding genes near each pair of 23 duplicated co-paralogs (Fig. 3), suggesting that these 23 duplicated co-paralogs could be segmental duplicates. In addition, four pairs of B. rapa AAAP genes (BrAAAP21/22, BrAAAP24/25, BrAAAP39/40 and BrAAAP75/76) occurred within a distance of 50 kb, and thus they were considered to be the results of tandem duplications according to the criterion reported by Riechmann et al. (2000). Other duplicated co-paralogs might be explained by different duplication events, such as random duplication and/or insertion events.

Fig. 3

Chromosomal localization and expansion pattern of B. rapa AAAP genes. Eighty-one AAAP genes were mapped to 10 chromosomes in B. rapa. Four tandem-duplicated gene pairs are indicated by rectangles. Twenty-three segmental duplicated gene pairs are marked by a broad range of symbols. The start and stop positions of the chromosomes are abbreviated as STT and STP, respectively

Testing for Selection in AAAP Proteins

Site-specific models assume variable selective pressures among sites. M0 (one ratio, ω), M1a (nearly neutral, ω0 < 1 and ω1 = 1), and M7 (beta, p, q) were the null models, which do not allow for any codons with ω > 1. In contrast, M2a (positive selection; ω0 < 1, ω1 = 1, and ω2 > 1), M3 (discrete; ω0, ω1, ω2), and M8 (beta and ω; p, q, ωs > 1) were the alternative models, which allow a class of sites with ω > 1. An LRT was performed by comparing M0 with M3, M1a with M2a, and M7 with M8. The combined analysis of LRT and BEB was used to investigate positively selected sites. The results of the selective pressure analysis of AAAP proteins are listed in Table 1. For the M1a/M2a and M7/M8 comparisons, the LRT revealed no sites under positive selection. According to the LRT for M0 and M3 (2Δl = 771.155, df = 4, P < 0.01), amino acid sites were under heterogeneous selective pressures, but the M0/M3 comparison tests for rate heterogeneity among sites rather than for positive selection and is therefore not usually adopted as evidence of positive selection. Together, these results indicated that AAAP genes were under strong purifying selection across amino acid sites. Additionally, no site under positive selection was identified via Bayes empirical Bayes inference using a probability criterion of 95 %.

Table 1 Tests for selection among codons of B. rapa AAAP proteins using site models included in Codeml program implemented in PAML 4

Some amino acid sites in AAAP proteins from specific clades of the phylogenetic tree (ESM Fig. S1) might nevertheless be under positive selection. Therefore, we searched for such sites using the branch site model. Model A (MA, model = 2, NSsites = 2, estimated ω) was the alternative model, and its corresponding null model (MA1, model = 2, NSsites = 2, ω = 1) fixed ω2 = 1 for the foreground branch (Yang and Nielsen 2002; Zhang et al. 2005). The LRT of MA against MA1 was used to identify positively selected sites in specific clades. The LRT statistics for the comparison between MA and MA1 in clade 1c (2Δl = 4.26, df = 1, P < 0.05), clade 3a (2Δl = 7.84, df = 1, P < 0.01), and clade 2b (2Δl = 10.24, df = 1, P < 0.01) suggested that MA fitted significantly better than MA1 for these clades. Furthermore, regardless of posterior probability, the BEB analysis identified two positively selected sites in clade 1c (46Y, 201L), one in clade 2b (133W), and one in clade 3a (39V). However, only one positively selected site (133W), identified from eight AAAP proteins in clade 2b, was supported at a significant level (Table 2).

Table 2 Parameters estimation and likelihood ratio tests for the branch-site models

Analysis of Functional Divergence

To identify key amino acids involved in functional divergence between protein subfamilies, AAAP proteins in B. rapa were divided into five groups (ESM Fig. S2). Pairwise comparisons between the AAAP groups (AAAP I, AAAP II, AAAP III, AAAP IV, and AAAP V) were carried out using the DIVERGE program (Gu and Vander-Velden 2002). The coefficients of type I functional divergence with standard errors, significance levels, and critical amino acids are shown in Table 3. The type I coefficients of functional divergence in all comparisons between groups, apart from the AAAP II/IV, AAAP II/V, AAAP III/IV, and AAAP IV/V comparisons, were significantly greater than zero (P < 0.05). Furthermore, the site-specific profile generated by posterior analysis was used to predict the critical amino acid residues, with Qk > 0.85 as a cutoff. Although θI in the AAAP I/IV, AAAP I/V, and AAAP II/III comparisons was significantly greater than zero, no amino acid sites with Qk > 0.85 were identified. However, in the AAAP I/II, AAAP I/III, and AAAP III/IV comparisons, we detected 37, 200, and 1 amino acid positions, respectively, using Qk > 0.85 as a cutoff (Fig. 4). Taking BrAAAP1 as an example, we identified the critical amino acid positions of the AAAP I/II comparison using Qk > 0.95 as a cutoff, which included six amino acid sites: Gly76, Leu212, Ser264, Phe345, Ser357, and Val417. Moreover, we also identified 10 amino acid sites, including Gly76, Ala103, Trp110, Leu159, Leu212, Trp226, Ala302, Ser357, Ser392, and Tyr409, which were critical sites determining functional divergence (Table 3).
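
Applying a posterior probability cutoff to a site-specific profile amounts to a simple filter, sketched below. The posterior values and the function name critical_sites are hypothetical and are not taken from the DIVERGE output of this study; positions are assumed to be 1-based alignment columns.

```python
def critical_sites(posteriors, cutoff=0.85):
    """Return 1-based alignment positions whose posterior probability
    (Qk) of type I functional divergence exceeds the cutoff."""
    return [pos for pos, qk in enumerate(posteriors, start=1) if qk > cutoff]


# Hypothetical site-specific profile, for illustration only.
profile = [0.10, 0.90, 0.86, 0.40, 0.97]
sites_085 = critical_sites(profile)               # cutoff 0.85
sites_095 = critical_sites(profile, cutoff=0.95)  # stricter cutoff
print(sites_085, sites_095)
```

Raising the cutoff from 0.85 to 0.95 can only shrink the set of reported sites, which is why the stricter threshold yields the short lists of residues given above.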

Table 3 Estimation of the coefficient of type I functional divergence (θI) from pairwise comparisons between AAAP groups in B. rapa
Fig. 4

The estimation of type I functional divergence between subgroups of AAAP proteins in B. rapa. The positions with gaps involved in each pairwise comparison were retained. The line indicates a posterior probability cutoff of 0.85. a The distribution of the posterior probability values of each AAAP comparison is plotted side by side, showing the minimum, the 1st percentile, the 25th percentile, the median, the mean, the 75th percentile, the 99th percentile, and the maximum values. b and c Posterior probability values in AAAP I/II and AAAP I/III comparisons, respectively.

Mapping of Critical Amino Acids on the Representative TM and Three-Dimensional Protein Structures

To gain insight into the potential effects of positive selection and type I functional divergence sites on AAAP protein structures and/or functions, we located these sites on the representative TM and three-dimensional structures. The results showed that most of the type I functional divergence sites with posterior probability values >95 % were found in the TM helices. For example, 83.7 % of type I functional divergence sites between AAAP I and AAAP III were located in the TM helices, and the sites of this type between AAAP I and AAAP II were also found in the TM region (Fig. 5a). BrAAAP2 and BrAAAP11 were chosen as representatives of the rapidly evolving clades 1c and 2b of the phylogenetic tree (ESM Fig. S1), and their protein structures were constructed using SWISS-MODEL based on the template protein (PDB ID: 3ob6). Four positively selected sites were found via the branch site model and mapped on the corresponding protein structures. The positively selected site Trp241 in clade 2b was mapped to a coil structure inside the cytoplasm of the BrAAAP11 protein (Fig. 5b), whereas the positively selected sites Val51 and Leu162 in clade 1c were mapped to α-helix structures of the BrAAAP2 protein (Fig. 5c). In addition, Cys263 in clade 3a (P = 0.737) was found in the region inside the cytoplasm of the BrAAAP2 protein (Fig. 5c).

Fig. 5

Localization of critical amino acid sites on the representative TM and 3D structures of AAAP proteins in B. rapa. Type I functional divergence sites and positive selection sites mapped on BrAAAP1 TM structures (a) are labeled by red circles and blue diamonds, respectively. For BrAAAP11 (b) and BrAAAP2 (c) proteins, white, red, and blue regions show TM–helix structures, inside residues, and outside residues, respectively, and purple amino acid sites included in the yellow circle were under positive selection

Discussion

B. rapa, one of the major vegetable and oil crops in Asia and Europe, has a simple diploid genome that has been fully sequenced and is a potential model species for the Brassicaceae family. Extensive studies have shown that the Brassica and Arabidopsis species shared three whole-genome duplication events, named At-α (3R), At-β (2R), and At-γ (1R) (Lagercrantz and Lydiate 1996; Lan et al. 2000; Lukens et al. 2003; Babula et al. 2003; Lysak et al. 2005; Yang et al. 2006; Cheung et al. 2009; Beilstein et al. 2010; Cheng et al. 2011). Moreover, after the split of the Brassica and Arabidopsis species approximately 10–20 Ma (Yang et al. 1999), the Brassica species experienced an additional whole-genome triplication named 4R (Lan et al. 2000; Babula et al. 2003; Lukens et al. 2003; Mun et al. 2009; Tang et al. 2012; Cheng et al. 2012). Therefore, it was clear that lineage-specific expansion occurred in the B. rapa genome (Johnston et al. 2005), and theoretically, three paralogous genes in B. rapa should be co-orthologous to one Arabidopsis counterpart. In our study, we identified 83 members of the AAAP gene family in the B. rapa genome. They were unevenly distributed on 10 chromosomes. The nearly twofold increase in the number of AAAP genes in B. rapa might result from the lineage-specific expansion described above. Based on the circular phylogenetic tree, there were three types of orthologous relationships between Arabidopsis and B. rapa AAAP genes: 1:3, 1:2, and 1:1 (Fig. 1). These complex relationships might reflect preferential gene retention and loss after triplication, caused by chromosomal rearrangements (Yang et al. 2005, 2006; Cheung et al. 2009; Beilstein et al. 2010; Cheng et al. 2011).

It is well-known that three duplication patterns, including polyploidy (whole genome duplication), tandem duplication, and transposition events were responsible for the lineage-specific expansion (Kong et al. 2007). To investigate the evolutionary relationships of the AAAP genes in B. rapa, we further analyzed the syntenic relationships of paralog pairs at the terminal nodes of the phylogenetic tree using the PGDD tool. The results revealed that 23 and 4 paralog pairs resulted from segmental and tandem duplications, respectively, suggesting that segmental and tandem duplication might have contributed greatly to the expansion of this family in B. rapa.

Correct classification is important for an in-depth understanding of the structural, functional, and evolutionary features of the AAAP genes from B. rapa. Unlike the Arabidopsis AAAP gene family, which was divided into four major clades, the B. rapa AAAP genes were divided into six clades. Comparative analysis showed that AAAP genes in clades A, B, E, and F of the phylogenetic tree (Fig. 2) corresponded to the annotated LHT, AAP, AUX1, and ProT genes in Arabidopsis, respectively (Rentsch et al. 2007; Tegeder and Rentsch 2010; Tegeder and Ward 2012). However, AAAP genes in clades C and D corresponded to un-annotated AAAP genes in Arabidopsis, so little is known about their functions. Homology analysis demonstrated that all members of the AAAP family shared approximately 8 % sequence similarity, and members of the AAAP family in clades A, B, C, D, E, and F shared approximately 31, 48, 40, 22, 26, and 28 % sequence similarity at the protein level, respectively. These results indicated that the AAAP gene family might have multiple cellular functions and be involved in several biological processes (Frommer et al. 1993, 1995; Rentsch et al. 1996). According to the orthologous relationship between BrAAAP proteins in clade B and AtAAP1–8, we predicted that the motif 6/5/3/10 architecture was closely related to the general transport of neutral and acidic amino acids with moderate affinity (Okumoto et al. 2002). BrAAAP proteins in clade E, corresponding to AUX1, did not contain motif 10, which was shared by most AAAP proteins in B. rapa. This finding suggested that the presence of motif 10 might depress the activity of auxin transporters (Rentsch et al. 1996). Six BrAAAP genes (BrAAAP8/43/47/61/69/78) shared high collinearity with well-known AAAP genes, but their proteins had no Aa_trans domain. This implied that domain loss occurred during the evolution of the BrAAAP gene family.

To detect selective pressure on the AAAP family, the site-specific models in PAML were used to test for positive selection across the phylogenetic tree. No positively selected sites were found using the BEB method, which suggested that AAAP proteins evolved mainly under purifying selection. The branch site model has advantages over the site models because it can detect positive selection acting on particular sites in specific clades (Yang and Nielsen 2002; Zhang et al. 2005). Here, we identified one site (Trp241), shared by eight proteins in clade 2b, as being under positive selection. These proteins were orthologous to LHT based on phylogenetic relationships. Therefore, we predicted that LHT had undergone rapid evolution, which was associated with an increase in the number of LHT genes and was closely related to physiological functions beyond that of a lysine- and histidine-selective transporter (Foster et al. 2008; Tegeder and Rentsch 2010; Tegeder and Ward 2012). Moreover, some sites of AAAP proteins (e.g., BrAAAP2, BrAAAP32, and BrAAAP64) had a value of ω > 1 (ESM Fig. S1). These AAAP proteins were orthologous to Arabidopsis ProT proteins, and thus we predicted that ProT proteins in B. rapa were under relaxed functional constraints (Rentsch et al. 1996).

LRT analysis of functional divergence showed that type I functional divergence occurred in six group comparisons (Fig. 4 and Table 3). Interestingly, the members of the AAAP I group, including BrAAAP30, BrAAAP32, BrAAAP39, BrAAAP40, BrAAAP64, and BrAAAP72, were highly divergent from the other groups (ESM Fig. S2). These genes in the AAAP I group were orthologous to ProT genes in Arabidopsis based on phylogenetic relationships. Previous studies showed that the ProT proteins functioned mainly as low-affinity transporters for glycine betaine, proline, and quaternary ammonium compounds (Rentsch et al. 1996, 2007; Young et al. 1999; Breitkreuz et al. 1999; Grallath et al. 2005; Tegeder and Rentsch 2010). Therefore, we predicted that ProT genes had undergone rapid evolution and developed distinct functions. This observation was consistent with the fact that many amino acid sites under positive selection or relaxed purifying selection were identified in clade 3. Collectively, these data provided new insights into the evolutionary history and diverse functions of the AAAP gene family.

Conclusion

The AAAP gene family members in B. rapa were divided into six clades. The AAAP genes were unevenly distributed on 10 chromosomes, and their diverse sequence features provided potential evidence for diversifying functions. Syntenic analysis suggested that segmental and tandem duplications contributed to the expansion of the AAAP gene family in the B. rapa genome. Most BrAAAP proteins evolved under purifying selection, but some sites of the AAAP proteins from clades 1c, 2b, and 3a were subject to positive or relaxed purifying selection. Interestingly, a total of 238 type I functional divergence sites (Qk > 0.85) were identified in the AAAP I/II, AAAP I/III, and AAAP III/IV comparisons, implying that these sites were responsible for functional divergence of AAAP proteins in B. rapa.