Introduction

Wild species present an especially rich source of desirable genetic diversity, providing an excellent model system for genetic studies (Tomato Genome Consortium 2012; Steinhauser et al. 2010). Wild tomato species accumulating two- to threefold more fruit sugars than cultivars are valuable as a source of high-sugar loci that can be used in breeding programs for broadening the genetic base of the current cultivars and as a model to study sucrose-hydrolyzing enzymes and their roles in fruit development (reviewed by Beckles 2012).

Higher plants use two routes of phloem unloading: apoplastic and symplastic. Sucrose transported to sink tissues from photosynthetic organs via the symplast or apoplast is cleaved either by sucrose synthases in the presence of uridine diphosphate (sucrose + UDP ↔ fructose + UDP-glucose) or by invertases (glycoside hydrolase family 32 β-d-fructofuranosidases, EC 3.2.1.26). Invertases catalyze the irreversible hydrolysis of sucrose (sucrose + H2O → d-glucose + d-fructose) and thus yield twice as many free hexoses as sucrose synthases (Koch 2004). The resulting hexose-triggered signals can affect the expression of many genes, including those encoding sucrose synthases and invertases (Koch 2004).

Based on biochemical properties, invertases can be classified into acid and neutral (or alkaline) isoforms, which exist in different subcellular compartments (Sturm 1996, 1999). Among them, acid invertases are known to play an important role in plant development, stress response, and plant-pathogen interactions, most likely by controlling sugar composition and metabolic fluxes (Obata-Sasamoto and Thorpe 1983; Tang et al. 1999; Shimon-Kerner et al. 2000; Maiti et al. 2011; Proels and Hückelhoven 2014). Acid invertases are considered to be the key enzymes in sucrose unloading and the maintenance of source/sink balance within the plant (Godt and Roitsch 1997; Tang et al. 1999) by supplying carbohydrates to sink tissues (Sturm 1996).

Acid invertases are divided into two groups: isoenzymes ionically bound to the cell wall (CWINs) and soluble vacuolar isoenzymes (VINs) (Sturm 1999; reviewed by Koch 2004). VINs catalyze the primary pathway of sucrose cleavage and play a key role during the initiation and expansion of diverse storage organs such as fruits and tubers (Koch 2004; Wang et al. 2015). Extracellular CWINs control sucrose partitioning to the cell via the apoplastic route during continued sink initiation and expansion. This is especially important when intercellular plasmodesmatal connections are absent, as in developing seeds and pollen grains (Koch 2004).

In tomato, the CWIN subfamily comprises four enzymes encoded by genes located on chromosomes 9 (LIN5 and LIN7) and 10 (LIN6 and LIN8) (Godt and Roitsch 1997; Fridman and Zamir 2003). Each gene pair is arranged as a direct tandem, and the two pairs represent a segmental duplication between the chromosomes. Interestingly, the genes encoding apoplastic invertase isoforms appear to be regulated in an organ-specific manner. The LIN genes on chromosome 9 are expressed in flowers and fruits, whereas those on chromosome 10 are expressed primarily in vegetative tissues (Fridman and Zamir 2003), suggesting distinct functions for each invertase in sink metabolism. Among the LIN family members, LIN7 has the highest expression and has been detected only in large flower buds and flowers of tomato (Godt and Roitsch 1997). Histochemical analysis of β-glucuronidase (GUS) activity in the corresponding transgenic plants revealed LIN7 expression in pollen and pollen tubes (Proels et al. 2006). In wheat (Triticum aestivum), stress-induced male sterility is preceded by a breakdown of invertase activity (Dorion et al. 1996). This finding supports the view that extracellular invertases determine plant fertility by supplying carbohydrates to anthers (Godt and Roitsch 1997).

Many genes and cDNAs encoding CW invertases have been cloned from various plant species (Tymowska-Lalanne and Kreis 1998; Godt and Roitsch 1997; Ji et al. 2005). In tomato, however, LIN7 has been isolated and characterized only in S. lycopersicum (cv. Heinz 1706 and Moneymaker) (Fridman and Zamir 2003; Proels et al. 2003). Twelve wild tomato species are currently recognized in Solanum section Lycopersicon (Peralta and Spooner 2001). Even so, the LIN7 sequence has been determined (but not characterized) in only one wild species, S. pennellii (genomic, HG975448, and mRNA, XM_015230964). For another wild species, S. chmielewskii, LIN7 expression was analyzed (Zhang et al. 2013), but the gene was not sequenced. No information on the LIN7 gene from other wild tomato species has been reported. In this study, we identified genomic sequences of LIN7 homologous genes in 9 of the 12 currently known wild tomato species, which differ considerably in physiology and mating systems (Peralta and Spooner 2001; Beckles 2012). In total, we examined one cultivated and 11 wild tomato accessions of Solanum section Lycopersicon for LIN7 genomic and protein sequence variability and expression patterns. We also showed, for the first time, that the LIN7 gene can be used in phylogenetic analysis of Solanum section Lycopersicon species. Our data should further the understanding of possible links between CW invertase variability and the physiological diversity of tomato species. They also suggest that diversity in the LIN7 sequence may represent a resource of valuable traits for improving existing cultivars (Rao and Hodgkin 2002; Gerszberg et al. 2015).

Materials and Methods

Plant Material and Growth Conditions

Seeds of wild tomato (Solanum section Lycopersicon) accessions (Table 1) were obtained from the N. I. Vavilov Institute of Plant Genetic Resources (St. Petersburg, Russia). They were sterilized with 0.5% sodium hypochlorite, rinsed with water, and plated on MS (basal salt) medium (Duchefa Biochemie B.V., Haarlem, the Netherlands) supplemented with 2% (w/v) sucrose and 0.8% (w/v) agar (pH 5.6). Germinated seedlings were grown under greenhouse conditions until fruit ripening (16/8-h light/dark cycle, light intensity 300–400 μmol m−2 s−1, 28 °C during the day, 23 °C at night). During plant development, samples of roots, stems, leaves, buds (two types: small young and large mature), flowers, and fruits (at mature green and ripe stages) were collected individually. The samples were frozen in liquid nitrogen, homogenized, and stored at −80 °C. Small young buds were closed buds 0.3–0.5 cm long (from the receptacle to the corolla tip, depending on the species) with green petals of the same length as the sepals. Large mature buds were closed buds 0.8–1.1 cm long (measured in the same way) with yellow petals and sepals that had begun to open. In red-fruited tomatoes, the mature green stage was defined as green fruit of final (maximal) size and the ripe stage as red fruit of final size. In green-fruited tomatoes, the mature green and ripe stages corresponded to hard and soft fruits of final size, respectively (Tanksley 2004).

Table 1 List of tomato accessions used in this work

Identification, Structural Characterization, and Phylogeny of LIN7 Homologous Genes

Total genomic DNA was isolated from young leaves of individual plants using the ZR-96 Plant/Seed DNA kit (Zymo Research, Irvine, CA, USA) and used as a template for LIN7 gene amplification with gene-specific primers (Online Resource 1: Table S1) and LongAmp® Hot Start Taq DNA Polymerase (New England Biolabs, Ipswich, MA, USA). Thermal cycling conditions were as follows: initial denaturation for 10 min at 94 °C, 35 cycles of denaturation (30 s at 94 °C), annealing (30 s at a specific Ta listed in Online Resource 1: Table S1) and extension (4.5 min at 65 °C), and final extension for 10 min at 65 °C. PCR products of the expected size were isolated from agarose gels using the QIAEX® II Gel Extraction kit (QIAGEN, Hilden, Germany), cloned into the Escherichia coli plasmid pGEM®-T Easy (pGEM®-T Easy Vector System I, Promega, Madison, WI, USA), and sequenced on an ABI Prism 377 DNA Sequencer (Applied Biosystems, Waltham, MA, USA) using BigDye-terminator v3.1 chemistry and specific primers (Online Resource 1: Table S1).

DNA sequences were edited, translated, aligned, and analyzed for polymorphism and phylogenetic relationships using MEGA 6.0 (Tamura et al. 2013). Phylogenetic trees based on nucleotide and amino acid sequences were constructed with three methods: neighbor-joining, minimum evolution, and maximum likelihood. Branch support was assessed by bootstrap analysis with 1000 replicates.
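MEGA performs the tree building internally. For readers unfamiliar with neighbor-joining, the Python sketch below shows the core of the method (Saitou–Nei clustering on a distance matrix) plus the generation of one bootstrap pseudo-replicate. It is illustrative only. It assumes uncorrected p-distances with pairwise gap deletion, but the distance model and gap handling used in MEGA for this study are not specified here. All function names are hypothetical.

```python
import random
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected p-distance; columns with a gap in either sequence are skipped."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap pseudo-replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def neighbor_joining(names: list[str], dist: dict[frozenset, float]) -> str:
    """Unrooted Newick tree from pairwise distances keyed by frozenset({x, y})."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r_i: sum of distances from node i to all other current nodes.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
        # (ties go to the first pair in iteration order).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length i -> new node
        lj = dij - li                                 # branch length j -> new node
        u = f"({i}:{li:g},{j}:{lj:g})"                # new node labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: the remaining star tree has an exact solution.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

Consider the five-taxon matrix often used to illustrate the method, with d(a,b) = 5, d(a,c) = 9, d(a,d) = 9, d(a,e) = 8, d(b,c) = 10, d(b,d) = 10, d(b,e) = 9, d(c,d) = 8, d(c,e) = 7, and d(d,e) = 3. For this matrix the sketch returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. Bootstrap support values would come from three further steps: repeat `bootstrap_columns` 1000 times, recompute distances and a tree for each replicate, and count how often each clade recurs. That clade-counting step is omitted for brevity.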

The isoelectric point (pI) was calculated using Protein Calculator v3.4 (http://protcalc.sourceforge.net), which assumes that all residues have pKa values equal to those of the free amino acids. The possible impact of amino acid substitutions or insertions/deletions (indels) on protein structure and function was analyzed using PROVEAN (protein variation effect analyzer) (Choi et al. 2012). Three-dimensional (3D) protein structures were predicted using PHYRE2 (protein homology/analogy recognition engine) (Kelley et al. 2015). PHYRE2 uses advanced remote homology detection methods to detect putative ligand-binding sites and to analyze the effects of amino acid variants arising from non-synonymous single nucleotide polymorphisms (SNPs). Functional analysis of the predicted 3D structures was performed using ProFunc (Laskowski et al. 2005).
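The Python sketch below gives a rough illustration of how a pI value of this kind is obtained. It applies the Henderson–Hasselbalch equation to each ionizable group independently and then uses bisection to find the pH at which the net charge is zero. The pKa values in the dictionaries are illustrative values chosen as an assumption. They are not necessarily the table used by Protein Calculator v3.4, so pI values from this sketch may differ slightly from those in Table 2. Function names are hypothetical.

```python
# Illustrative pKa values (assumption; not necessarily those used by Protein Calculator v3.4).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (independent ionizable groups)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    # Fraction protonated for basic groups, fraction deprotonated for acidic groups.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; the net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

For a single glycine, only the two termini ionize, so the sketch converges to (8.6 + 3.6) / 2 = 6.1. Counting every cysteine as ionizable is itself a simplification, because disulfide-bonded cysteines do not ionize.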

Expression Pattern Analysis of LIN7 Homologous Genes

Total RNA was extracted from individual samples of roots, leaves, young buds (<3 mm), mature buds (5–7 mm), opened flowers, mature green fruits, and ripe fruits using the RNeasy Plant Mini kit (QIAGEN, Hilden, Germany) and evaluated for concentration using a Qubit® Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and for quality by gel electrophoresis. First-strand cDNA was synthesized using the Reverse Transcription System (Promega, Madison, WI, USA) with an oligo-dT primer, and its concentration was determined using the Qubit® Fluorometer.

Gene-specific primers (Online Resource 1: Table S1) were designed to amplify partial coding sequences. The forward and reverse primers were separated by at least one large intron to exclude amplification from contaminating genomic DNA. Quantitative RT-PCR (qRT-PCR) was performed using the SYBR Green and ROX RT-PCR mixture (Syntol, Moscow, Russia) with 2.5 ng of cDNA as template. The cycling program was 95 °C for 5 min, followed by 35 cycles of 95 °C for 15 s and 62 °C for 50 s. In each sample, relative LIN7 mRNA levels were estimated by normalizing LIN7 expression to that of the tomato CAC gene (SGN-U314153) encoding the clathrin adaptor complex medium subunit (Expósito-Rodríguez et al. 2008). We used CAC as the single reference gene for two reasons. First, CAC alone has served as the reference in several qRT-PCR studies of fruit tissues at different developmental stages (e.g., Shima et al. 2014; Slugina et al. 2017). Second, the earlier expression data used here for comparison were normalized to a single reference gene, Actin (Miron et al. 2002). Statistical analysis of qRT-PCR data was performed in GraphPad Prism version 7.02 (San Diego, CA, USA; https://www.graphpad.com/scientific-software/prism/). Six values (three technical replicates of each of two biological replicates) were used to calculate the SD, and error bars show the mean ± SD. Differences between species within the same tissue were assessed with Welch's unequal-variances t test of the null hypothesis that the two population means are equal. A difference was considered significant if P < 0.05. Results were additionally subjected to Bonferroni correction: the null hypothesis was rejected (i.e., the difference between samples was considered significant) only if p ≤ 0.05/m, where m is the number of t tests in the comparison set.
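The normalization and testing steps above can be sketched in Python as follows. The 2^−ΔCt formula (ΔCt = Ct of LIN7 minus Ct of CAC) is an assumption, because the exact normalization formula is not stated here, and it presumes near-100% amplification efficiency for both genes. The Welch statistic and the Welch–Satterthwaite degrees of freedom use only the standard library. The p value itself would come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available. Function names are hypothetical.

```python
import math
import statistics


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-dCt with dCt = Ct(target) - Ct(reference); assumes ~100% efficiency for both genes."""
    return 2.0 ** -(ct_target - ct_reference)


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0 for a test only if its p value is <= alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

For example, with three comparisons the Bonferroni threshold is 0.05/3 ≈ 0.0167. Raw p values of 0.01, 0.02, and 0.004 would then be declared significant, not significant, and significant, respectively.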

Results

Cloning and Sequencing of LIN7 Homologous Genes in Cultivated and Wild Tomato Accessions

The NCBI genome database contained LIN7 genomic sequences for S. lycopersicum cv. Heinz and S. tuberosum, which were used to design gene-specific primers (Online Resource 1: Table S1) for the amplification of full-length LIN7 invertase genes.

In total, 12 full-length LIN7 homologous genes were amplified from the analyzed tomato accessions (Table 1), cloned, sequenced with specific primers (Online Resource 1: Table S1), and deposited in NCBI GenBank (Table 2).

Table 2 Characteristics of newly identified tomato genes encoding LIN7 apoplastic invertase orthologues

All LIN7 homologous genes consisted of six exons and five introns (Fig. 1). Comparative analysis of LIN7 homologs revealed 746 SNPs, 226 of which were located in exons. Relative to the reference LIN7 gene of S. lycopersicum cv. Heinz, wild tomato accessions carried the most substitutions (1.44% in the red-fruited group and 15.27% in the green-fruited group). Interspecific polymorphism levels were 21.9%, 7.89%, and > 30% for complete genomic sequences, exons, and introns, respectively (Online Resource 2: Fig. S1, S2). Introns contained many indels; the longest indel (300 bp), in intron III, was likely the result of a tandem duplication. The LIN7 homologous genes of S. habrochaites and S. chmielewskii were the most divergent among the analyzed species.
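Polymorphism percentages of this kind are proportions of variable alignment columns. The Python sketch below shows a minimal version of such a count for a gapped multiple alignment. It tallies columns that contain a gap separately, as indel columns, rather than as SNPs; this is an assumption, because the gap treatment used here is not specified. `polymorphism_stats` is a hypothetical helper. Restricting `columns` to exon or intron coordinates gives region-specific values.

```python
def polymorphism_stats(alignment: dict[str, str], columns=None) -> dict[str, float]:
    """Count SNP (variable, gap-free) and gap-containing columns in an alignment."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    cols = range(length) if columns is None else columns
    sites = snp = gap = 0
    for i in cols:
        site = [s[i] for s in seqs]
        sites += 1
        if "-" in site:
            gap += 1   # indel column: excluded from SNP counting
        elif len(set(site)) > 1:
            snp += 1   # at least two different bases and no gaps
    if sites == 0:
        raise ValueError("no columns selected")
    return {"sites": sites, "snp_sites": snp, "gap_columns": gap,
            "percent_snp": 100.0 * snp / sites}
```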

Fig. 1

Exon-intron structures of 12 LIN7 homologous genes in the genomes of different tomato species. Exons are indicated by numbered boxes; introns are shown as lines

Amino Acid Variability and 3D Structures of LIN7 Orthologs in Wild Tomato Species

The coding sequences of the LIN7 genes were translated and aligned. All isolated LIN7 genes encoded proteins of 583 amino acids. The analyzed individual plant of S. pimpinellifolium var. racemigerum was heterozygous and carried one allele with a premature stop codon at position 262. Pairwise comparison showed that the deduced LIN7 sequence of S. lycopersicum cv. Silvestre recordo shared 99% identity with those of S. lycopersicum var. humboldtii, S. pimpinellifolium var. racemigerum, S. cheesmaniae, S. galapagense, and S. arcanum. It shared 98% identity with those of S. chmielewskii, S. chilense, S. corneliomulleri, S. peruvianum, S. peruvianum 3966, and S. habrochaites.
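Pairwise identity values such as these can be computed from an alignment as the fraction of identical residues. The Python sketch below follows one common convention, assumed here: columns that are gaps in both sequences are ignored, and a residue aligned to a gap counts as a mismatch. Other tools normalize differently (for example, by the length of the shorter sequence), so their values can shift slightly. The helper names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq1: str, seq2: str) -> float:
    """Identity (%) between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" and b == "-":
            continue  # shared gap: not a comparable position
        compared += 1
        identical += int(a == b)  # residue vs. gap counts as a mismatch
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


def pairwise_identity(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of sequences in an alignment."""
    return {(x, y): percent_identity(alignment[x], alignment[y])
            for x, y in combinations(alignment, 2)}
```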

Putative LIN7 proteins showed a wide range of calculated pI values (6.89–8.18; Table 2). LIN7 from S. chilense and S. pimpinellifolium var. racemigerum had a near-neutral pI, whereas the other analyzed LIN7 proteins had basic pI values.

Among the SNPs detected in LIN7 genes, 46 exonic SNPs caused non-synonymous substitutions (Fig. 2; Online Resource 2: Fig. S3). Of these, 39 were found in individual sequences, while 7 were shared by all analyzed proteins. Exon III contained the highest number of SNPs. The average protein identity of LIN7 orthologs was about 92%, and the sequences from S. habrochaites, S. peruvianum 3966, and S. chilense were the most divergent.
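Whether an exonic SNP is synonymous or non-synonymous depends only on the codon in which it falls. The Python sketch below classifies a single substitution in an in-frame coding sequence (exons already concatenated) using the standard genetic code. It labels the result in the reference residue/position/new residue style used in this paper (e.g., Y338N). `classify_snp` is a hypothetical helper. Positions are 0-based within the CDS, and ambiguous bases are not handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> tuple[str, str]:
    """Classify a substitution at 0-based CDS position `pos` to base `alt`."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    start = pos - pos % 3                     # first base of the affected codon
    offset = pos - start
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}"  # 1-based residue number
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"                     # premature stop codon
    else:
        kind = "non-synonymous"
    return kind, label
```

For the toy CDS `ATGGAATAA`, changing position 5 to G turns GAA into GAG and is classified as synonymous (E2E). Changing position 4 to C gives GCA and is classified as non-synonymous (E2A).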

Fig. 2

Amino acid substitutions in the analyzed tomato LIN7 sequences compared to S. lycopersicum cv. Heinz CW invertase (NP_001234701.1). Red-colored residues are potentially deleterious according to PROVEAN analysis

The identified putative LIN7 invertases contained highly conserved sites (Fig. 3). Among them, we detected the fructosidase motif NDPNA (66–70) partially encoded by the shortest exon II, the RDP motif (191–193) highly conserved among the members of the glycosyl hydrolase family 32, and the cysteine catalytic site WECPD (246–250) important for proper enzyme conformation and catalytic activity (Goetz and Roitsch 1999).
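Locating such motifs and reporting 1-based residue ranges, as in the coordinates above, only requires an exact-match scan. The Python sketch below assumes exact motif matches on the full translated precursor sequence, with no degenerate positions. `find_motifs` is a hypothetical helper.

```python
import re


def find_motifs(protein: str, motifs=("NDPNA", "RDP", "WECPD")) -> dict[str, list[tuple[int, int]]]:
    """Return 1-based inclusive (start, end) ranges of exact motif matches."""
    protein = protein.upper()
    hits = {}
    for motif in motifs:
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[motif] = [(m.start() + 1, m.start() + len(motif))
                       for m in pattern.finditer(protein)]
    return hits
```

Residue numbering depends on whether the signal peptide and pro-peptide are included, so coordinates should always be reported relative to a stated reference sequence.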

Fig. 3

Alignment of the N-terminal domain (1–255 aa) of the identified LIN7 orthologs from cultivated and wild tomato accessions. 1 S. lycopersicum cv. Heinz (NP_001234701.1), 2 S. lycopersicum cv. Silvestre recordo, 3 S. lycopersicum var. humboldtii, 4 S. pimpinellifolium var. racemigerum, 5 S. cheesmaniae, 6 S. galapagense, 7 S. chmielewskii, 8 S. chilense, 9 S. corneliomulleri, 10 S. peruvianum, 11 S. peruvianum 3966, 12 S. arcanum, 13 S. habrochaites. Active catalytic sites are underlined

Amino acid variability between LIN7 from S. lycopersicum cv. Heinz and the red-fruited tomato accessions (S. lycopersicum cv. Silvestre recordo, S. lycopersicum var. humboldtii, S. pimpinellifolium var. racemigerum, S. cheesmaniae, and S. galapagense) was 2%. For the green-fruited wild tomato species (S. chmielewskii, S. chilense, S. corneliomulleri, S. peruvianum, S. peruvianum 3966, S. arcanum, and S. habrochaites), it reached 6.34%. In the red-fruited group, the level of exonic SNPs was 1.25%, and 54.5% of these SNPs were non-synonymous, leading to amino acid substitutions. In green-fruited species, the level of exonic SNPs was 7.48%, but only 28.2% of them were non-synonymous.

The functional effects of amino acid substitutions in the analyzed tomato accessions were predicted with PROVEAN (Choi et al. 2012) (Fig. 2). Compared with S. lycopersicum cv. Heinz, red-fruited self-compatible tomato accessions had 2–3 amino acid substitutions in total, of which 0–2 were predicted to be potentially deleterious. Green-fruited self-incompatible plants had 5–9 total and 1–4 potentially deleterious substitutions. The green-fruited self-compatible species S. chmielewskii had six substitutions in total, one of them potentially deleterious. Not all functionally important invertase sites may have been identified to date. Therefore, the non-synonymous substitutions identified in LIN7 could also affect LIN7 conformation and function.

None of the non-synonymous substitutions, whether neutral or potentially deleterious, was shared specifically by all members of the self-incompatible (or self-compatible) cluster (Fig. 2). Moreover, the LIN7 proteins of the three S. lycopersicum accessions shared no common substitutions. Likewise, the LIN7 proteins of the two S. peruvianum accessions shared none, except for the neutral K555, which was common to most analyzed tomatoes. Therefore, none of the LIN7 substitutions could be associated with the mating system of tomato species.

Next, we modeled the 3D structures of tomato LIN7 orthologs with Phyre2 (Kelley et al. 2015). The models were based on closely matching crystal structures of glycoside hydrolase homologs (50–56% identity) available in the fold library: A. thaliana cell-wall invertase 1, AtcwINV1 (PDB reference: 2ac1, r2ac1sf; Verhaest et al. 2006). Overall, 90–91% (approx. 528 residues) of each analyzed sequence was modeled with 100.0% confidence using a single highest-scoring template. The predicted 3D models of the LIN7 orthologs showed the structure typical of plant CW acid invertases (Fig. 4). The glycosyl hydrolase family 32 N-terminal domain folded into a five-bladed (I–V) β-propeller module (residues 57–375). The C-terminal β-sandwich domain (residues 387–576) was formed by two antiparallel six-stranded β-sheets. The main active sites of the LIN7 orthologs were located inside the β-propeller module and corresponded to the conserved invertase motifs: blade I-specific NDPNA (66–70; invariant D residue), blade III-specific RDP (191–193; invariant D residue), and blade IV-specific WECPD (246–250; invariant E and C residues). Other predicted pockets representing putative binding and catalytic sites (positions 64–67, 83, 91, 126–127, 278, 280–282, 314–318, 336, 338–339, 343, 347, 353–354, 437, 502, and 504–506) were conserved in the analyzed proteins. There were two exceptions relative to S. lycopersicum cv. Silvestre recordo: a neutral substitution, Y338N, in all analyzed wild accessions, and a potentially deleterious substitution, G315S, in the N-terminal domain of S. chilense LIN7.

Fig. 4

Structural in silico analysis of S. lycopersicum cv. Silvestre recordo LIN7 invertase. a Tertiary protein structure modeled on the A. thaliana AtcwINV1 CW invertase template, comprising the N-terminal β-propeller domain (blades numbered I–V) and the C-terminal β-sandwich domain. b Active sites (NDPNA, RDP, and WECPD) shown as yellow sticks. c Chain representation of the protein structure, including β-strands and α-helices. d Structural topology of S. lycopersicum cv. Silvestre recordo LIN7 invertase

Phylogenetic Analysis of Tomato LIN7 Acid Invertases

To better understand the evolutionary relationships among LIN7 invertase genes of wild and cultivated tomatoes, we constructed six phylogenetic trees based on genomic and protein sequences, with the orthologous potato invertase (NW_006238947.1) as the outgroup. The three methods used (neighbor-joining, minimum evolution, and maximum likelihood) produced the same topology for the three gene-based dendrograms. These differed from the protein-based trees in the green-fruited species cluster (Fig. 5; Online Resource 3: Fig. S4–S8). In all six trees, within the cluster of orange/red-fruited tomatoes, S. galapagense grouped with S. pimpinellifolium var. racemigerum, and S. cheesmaniae grouped with S. lycopersicum var. humboldtii. The protein-based red-fruited cluster contained one more S. lycopersicum accession than the gene-based trees, but this did not substantially affect its topology. The green-fruited S. chmielewskii occupied a transitional position between the red-fruited and green-fruited clusters. In the green-fruited group of the gene-based trees, S. chilense grouped with S. peruvianum 3966, and S. peruvianum grouped with S. corneliomulleri and S. arcanum. In the protein-based trees, by contrast, S. corneliomulleri was at the base of the cluster, the two S. peruvianum accessions grouped together, and S. chilense grouped with S. arcanum. Judging by bootstrap values across all six trees, the gene-based trees were better supported than the protein-based trees. LIN7 genomic sequences gave the best resolution with the neighbor-joining method. This tree had significant bootstrap values and divided the analyzed tomato accessions into two main clusters corresponding to self-compatible and self-incompatible species (Fig. 5).

Fig. 5

Evolutionary taxonomic relationship among tomato species inferred based on genomic LIN7 sequences using the neighbor-joining method

Differential Expression Patterns of the LIN7 Gene

To characterize the newly identified genes functionally, we used qRT-PCR to analyze relative LIN7 mRNA levels in leaves, roots, buds (young and mature), flowers, and mature green and red ripe fruits. The analysis focused on interspecific diversity in expression patterns. Accessions for expression analysis were selected based on differences in mating system and fruit color.

The LIN7 transcription pattern in cultivated tomato S. lycopersicum cv. Silvestre recordo was fully consistent with that previously reported (Godt and Roitsch 1997), confirming the specificity of LIN7 expression to buds and flowers.

To compare LIN7 expression in cultivated and wild tomatoes, we used five wild species: S. cheesmaniae, S. chmielewskii, S. arcanum, S. peruvianum, and S. habrochaites (Fig. 6). In all selected accessions, LIN7 was expressed in mature buds and flowers (including reproductive organs) with similar dynamics; transcription was upregulated during bud-to-flower development in all species except S. habrochaites. However, expression levels differed significantly between species. In mature buds and flowers, the highest LIN7 transcription was observed in S. cheesmaniae and S. lycopersicum, neither of which expressed LIN7 in fruit. In S. arcanum flowers, the LIN7 expression level was high and comparable with that in self-compatible species. In S. arcanum large buds, however, it was very low, as in the other self-incompatible species and unlike the self-compatible ones.

Fig. 6

Relative LIN7 expression in different tissues of cultivated S. lycopersicum cv. Silvestre recordo and wild tomato species. SB small (young) bud, LB large (mature) bud, F flower, MG mature green fruit, RF ripe fruit, L leaf, R root (scale bar = 1 cm). A red star next to an organ of a green-fruited species indicates a significant difference in LIN7 expression from the same organ of red-fruited species. Conversely, a green star next to an organ of a red-fruited species indicates a significant difference in LIN7 expression from the same organ of green-fruited species. A blue star next to an organ of a self-incompatible species indicates a difference in LIN7 expression from the same organ of self-compatible species

The other four wild species showed some LIN7 expression in mature green fruit, and S. arcanum and S. peruvianum also expressed LIN7 in ripe fruit. In S. arcanum, LIN7 expression decreased significantly from mature green to ripe fruit, whereas S. peruvianum showed the opposite trend. In S. habrochaites, LIN7 mRNA levels were very low in all analyzed tissues, even in mature buds and flowers.

These data indicate significant differences in LIN7 expression profiles among wild representatives of Solanum section Lycopersicon.

Discussion

In tomato, as in other plants, apoplastic carbohydrate supply mediated by CW invertases including LIN7 plays a critical role in sustaining pollen germination and pollen tube growth (Goetz et al. 2017). CWIN downregulation in Nicotiana tabacum, A. thaliana, and Brassica napus resulted in male sterile plants (Goetz et al. 2001; Hirsche et al. 2009; Engelke et al. 2010), underlining the role of CWINs in fertilization.

The current data on the genes encoding CWIN LIN7 orthologs in different tomato species are limited in two ways. Sequence data cover only genomic and cDNA sequences of S. lycopersicum (cultivated tomato) and S. pennellii (Fridman and Zamir 2003; Proels et al. 2003). Expression data cover only S. chmielewskii (Zhang et al. 2013) and S. lycopersicum (Godt and Roitsch 1997; Fridman and Zamir 2003; Proels et al. 2006; Zhang et al. 2013). Therefore, the identification and analysis of LIN7 sequences in other wild tomato species should further our understanding of phylogenetic and functional relationships among apoplastic CW invertases in tomatoes. In this study, we identified, cloned, and characterized complete genomic sequences encoding LIN7 family invertases in 11 wild and 1 cultivated tomato accessions.

Genomic Structure of LIN7 Homologous Genes in Wild Tomatoes

Genes encoding plant invertases typically share a similar structure of six to eight exons. This structure includes the smallest known functional exon in the plant kingdom (the 9-bp exon II), which is absent only from the carrot CW invertase gene InvDC1 (Sturm 1996; Fotopoulos 2005). Consistent with these data, the 12 tomato LIN7 homologs identified in this study had six exons, including the highly conserved 9-bp exon II, and five introns (Fotopoulos 2005). The presence of 226 exonic SNPs (7.89%) and the extremely high variability of intronic sequences (over 30%) indicate high interspecific divergence among the examined tomato accessions. The observed LIN7 polymorphism is comparable with that of other tomato genes as revealed by The 100 Tomato Genome Sequencing Consortium (2014). That study sequenced the whole genomes of 84 S. lycopersicum accessions (54 cultivars and 30 wild accessions), together with S. arcanum, S. habrochaites, S. pennellii, and other wild species. It found 20 times fewer species- and accession-specific SNPs in red-fruited tomatoes than in green-fruited wild species. This agrees with our finding that LIN7 sequences of green-fruited wild species contained 10 times more SNPs than those of red-fruited accessions. Furthermore, in nine Russian cultivars, we found extremely low variability in LIN7 genes, even in intronic regions, compared with S. lycopersicum cv. Silvestre recordo and Heinz (data not shown). This also agrees with the data of The 100 Tomato Genome Sequencing Consortium (2014) and supports the notion that cultivated tomatoes have much lower genetic diversity than wild species.

Our results are also consistent with a previous finding on the GBSSI gene, which encodes granule-bound starch synthase I. GBSSI showed much lower interspecific variability in red-fruited self-compatible tomato species (S. lycopersicum, S. pimpinellifolium, and S. cheesmaniae) than in green-fruited self-incompatible species (S. peruvianum, S. arcanum, and S. habrochaites) (Peralta and Spooner 2001).

Relative to the annotated genome of cv. Heinz, the genomes of accessions analyzed by shallow sequencing (The 100 Tomato Genome Sequencing Consortium 2014) showed a significantly higher SNP frequency in intergenic regions than in genic regions. Genes contain only about 10% of all polymorphisms (7.55 ± 2.19% in introns and 2.33 ± 0.68% in exons), and 55% of the exonic polymorphisms are synonymous. In our study, 180 of the 226 SNPs detected in exons of the analyzed LIN7 homologous genes were synonymous, i.e., about 80% (180/226 ≈ 79.6%) of the exonic polymorphisms.

Protein Structure and Amino Acid Variability in LIN7 Invertases from Wild Tomato Species

A typical translated LIN7 sequence represents a pre-pro-protein containing a signal peptide providing invertase entry into the secretory pathway and an N-terminal pro-peptide responsible for protein delivery to the cell wall (Unger et al. 1994). In our study, we observed a similar structure in the deduced sequences of LIN7 CW invertases. In addition, all identified LIN7 orthologs share the N-terminal sucrose-binding box containing the highly conserved WECPD, RDP, and beta-fructosidase NDPNA motifs, which define the enzyme conformation and catalytic activity (Goetz and Roitsch 1999; Chen et al. 2009).

Sequence and 3D structure analyses revealed strong similarities among the identified LIN7 orthologs. The 3D structural model indicates that the active site located in the N-terminal β-propeller module includes all three major conserved motifs (Fig. 4). The β-propeller fold is typical of members of glycosyl hydrolase family 32 (Pons et al. 1998), including acid invertases (Verhaest et al. 2006; Yao et al. 2014). Tertiary structures and active catalytic sites are highly similar in cultivated and wild tomato species. This suggests that the putatively deleterious substitutions found in these invertases are unlikely to affect LIN7 catalytic activity. Given the stereochemical mechanisms underlying invertase catalysis, these data suggest functional conservation of LIN7 CW invertases among diverse tomato accessions.

Compared with LIN7 from S. lycopersicum cv. Heinz, red-fruited tomatoes (S. lycopersicum cv. Silvestre recordo, S. lycopersicum var. humboldtii, S. pimpinellifolium var. racemigerum, S. cheesmaniae, and S. galapagense) show a relatively high level of LIN7 amino acid variability (2%). Variability is even higher (6.34%) in green-fruited accessions (S. chmielewskii, S. chilense, S. corneliomulleri, S. peruvianum, S. peruvianum 3966, S. arcanum, and S. habrochaites). This pattern reflects the evolutionary contribution to interspecific diversity in tomatoes. The rate of evolution appears to be higher in green-fruited plants than in the red-fruited group. The red-fruited group consists mainly of crop cultivars and their wild varieties, particularly with regard to fruit characteristics.

At the same time, although green-fruited species had six times more exonic SNPs than red-fruited species, only 28.2% of these polymorphisms resulted in amino acid substitutions. In red-fruited tomatoes, the proportion of non-synonymous SNPs among exonic SNPs was almost twice as high (54.5%). These results agree with genome-wide sequencing data (The 100 Tomato Genome Sequencing Consortium 2014). Those data indicate that in crops, non-synonymous SNPs outnumber synonymous ones, whereas wild species generally show the opposite trend. This difference may be associated with artificial selection pressure in cultivated tomatoes.

There was no evident correlation between LIN7 amino acid variability and crossing ability in the analyzed tomato species, whereas a link between fruit color/fruit sugar composition and the number of amino acid substitutions was clearly seen (Fig. 2); this relationship requires further study.

Phylogenetic Analysis of Tomato LIN7 Orthologous Genes and Encoded Acid Invertases

Solanum section Lycopersicon (Solanaceae) comprises the cultivated tomato S. lycopersicum and its wild relatives, 10 of which are endemic to western South America from Ecuador to northern Bolivia and Chile, and two to the Galápagos Islands (Peralta et al. 2008). In the present study, we analyzed 2 S. lycopersicum accessions (cultivated and wild) and 10 accessions of 9 wild tomato species (Table 1).

In plant species-level systematics, comparison of low-copy nuclear genes is considered to be the most informative (Marshall et al. 2001; Peralta and Spooner 2001; Peralta et al. 2008; Zuriaga et al. 2009; Gramzow and Theißen 2015; Zhang et al. 2014; Techen et al. 2017; Slugina et al. 2017). The high variability of both intronic and exonic LIN7 sequences, including parsimony-informative sites, indicated that LIN7 sequences are suitable for phylogenetic analysis. Indeed, the LIN7 gene-based tree topology is broadly consistent with the current tomato phylogeny (Fig. 5).
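A site is parsimony-informative when at least two different character states each occur in at least two of the aligned sequences; only such sites can discriminate among tree topologies under maximum parsimony. The hypothetical helper below is a minimal Python sketch of this standard definition, assuming an alignment supplied as equal-length strings and treating gaps (`-`) and ambiguous characters (`N`, `?`) as missing data. It illustrates the criterion and is not the pipeline used in this study.

```python
from collections import Counter

MISSING = set("-N?")  # characters treated as missing data (an assumption)


def parsimony_informative_sites(alignment: list[str]) -> list[int]:
    """Return the 0-based column indices that are parsimony-informative."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    informative = []
    for col in range(length):
        states = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() not in MISSING
        )
        # Informative only if at least two states are each shared by >= 2 sequences
        if sum(1 for count in states.values() if count >= 2) >= 2:
            informative.append(col)
    return informative


# Hypothetical four-sequence toy alignment; column 4 (A, A, A, G) is a singleton site
toy = ["AAGTA", "AAGCA", "ACTTA", "ACTCG"]
print(parsimony_informative_sites(toy))  # [1, 2, 3]
```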

The constructed phylogenetic tree has two main clusters corresponding to self-compatible (S. chmielewskii, S. lycopersicum, S. pimpinellifolium var. racemigerum, S. cheesmaniae, and S. galapagense) and self-incompatible (S. chilense, S. peruvianum, S. arcanum, S. peruvianum 3966, and S. corneliomulleri) species. This coincides with the previously suggested division of tomato species into the Esculentum (self-compatible) and Peruvianum (self-incompatible) complexes based on crossing relationships (Peralta et al. 2008; Rick 1960, 1979). In addition, our clustering corresponds to the fruit color of the analyzed tomato accessions: green-fruited species cluster together, while the red-fruited group also includes a yellow-to-orange-fruited species. This classification matches the separation into yellow-to-red-fruited and green-fruited lineages (Peralta et al. 2008) and the division of Lycopersicon into the subgenera Eulycopersicon and Eriopersicon (Müller 1940a; Luckwill 1943a; Davies and Hobson 1981). LIN7 sequences from two species, S. habrochaites and S. pennellii, form the most basal branches, suggesting that the encoded proteins are the closest relatives of the common LIN7 ancestor. Green-fruited S. chmielewskii forms a separate branch within the self-compatible cluster, which otherwise comprises plants with orange-to-red fruits. The basal position of S. chmielewskii within the self-compatible cluster is consistent with the hypothesized direction of tomato evolution from green-fruited to red-fruited and from self-incompatible to self-compatible species (Igic et al. 2008; Miller and Kostyun 2011).

Despite the correct clustering of species by mating system and fruit color, the species resolution within each cluster was not fully consistent with the current tomato classification. Namely, the S. lycopersicum LIN7 sequences did not group together, and the close relationships between orange-fruited S. cheesmaniae and S. galapagense, as well as between green-fruited S. arcanum and S. chmielewskii, previously revealed by The 100 Tomato Genome Sequencing Consortium (2014) and Pease et al. (2016), were not recovered. Therefore, the identified LIN7 genes appear suitable only for reconstructing tomato phylogeny at the level of mating system and fruit color groups.

Differential Expression Patterns of LIN7 Homologous Genes

Species belonging to the section Lycopersicon grow under diverse climatic conditions and, accordingly, differ in fruit biochemistry, morphology, and stress resistance, traits that are associated with carbohydrate metabolism (Beckles 2012; Beckles et al. 2012; Peralta et al. 2008; Proels and Hückelhoven 2014). To further elucidate the role of LIN7 invertase in sucrose hydrolysis and tomato growth, we profiled LIN7 mRNA expression in the organs of cultivated and wild tomato species.

According to our data, S. lycopersicum LIN7 is transcribed only in buds and flowers (Fig. 6). This is consistent with earlier reports of LIN7 expression in mature flower buds and flowers (weak in petals, moderate in gynoecia, and high in stamens) (Godt and Roitsch 1997), in stamens and anthers (Fridman and Zamir 2003), and in the tapetal cell layer of flower buds, pollen grains of open flowers, and pollen tubes (Proels et al. 2003, 2006). Zhang et al. (2013) reported that LIN7 expression in S. lycopersicum cv. Micro-Tom was six times lower than in S. chmielewskii, in which some expression was also detected in leaves. Here, we did not detect LIN7 mRNA in S. chmielewskii leaves, and its levels in buds and flowers were lower than those in another S. lycopersicum cultivar, Silvestre recordo. The discrepancy may be attributed to differences in RT-PCR primer specificity: we designed primers based on the identified S. chmielewskii LIN7 genomic sequence, whereas Zhang et al. (2013) used primers specific for the S. lycopersicum cv. Micro-Tom LIN7 sequence.

The expression of all analyzed LIN7 genes in wild tomato buds and flowers, which contain the reproductive organs, suggests that the identified invertase orthologs may be involved in supplying carbohydrates to developing pollen grains and tubes, as shown by Goetz et al. (2017). In our study, the similarity in LIN7 expression patterns between S. cheesmaniae and cultivated S. lycopersicum (Fig. 6) parallels their similarity in fruit color and self-compatibility (Table 1). In other species, LIN7 mRNA was also detected in mature green fruit and even in ripe fruit (S. arcanum and S. peruvianum), which coincided with self-incompatibility, green fruit color, and sucrose as the storage sugar. Interestingly, in S. habrochaites, LIN7 expression was very low not only in mature green fruit but also in mature buds and flowers, which may reflect the ancestral position of S. habrochaites relative to the other cultivated and wild accessions analyzed.

Elliott et al. (1993) demonstrated that sequence variability in introns and upstream of the start codon can contribute to differences in the temporal regulation of vacuolar invertase gene expression in tomato. We therefore suggest that variability in LIN7 regulatory sequences may be responsible for the differences in LIN7 expression between self-compatible and self-incompatible species. Further studies are needed to characterize LIN7 promoter variability and the functional significance of the multiple polymorphic sites in LIN7 introns.

Typically, red-fruited tomatoes (S. cheesmaniae and S. lycopersicum in our study) convert sucrose into glucose and fructose (Beckles 2012), while green-fruited tomatoes (S. chmielewskii, S. arcanum, S. habrochaites, and S. peruvianum in our study) mainly contain sucrose (Davies 1966; Beckles et al. 2012; Elliott et al. 1993). For example, green ripe fruit of S. habrochaites accumulates eight times more sucrose than red ripe fruit of S. lycopersicum cultivars (Miron and Schaffer 1991). Although data on storage sugars for the accessions used in the expression analysis are limited, a certain correlation was observed between fruit color/sugar composition and LIN7 expression. The absence of LIN7 transcription in S. lycopersicum and S. cheesmaniae fruit suggests that other glycosyl hydrolases (such as vacuolar and/or neutral invertases, sucrose synthase, or the CW invertase LIN5) may cleave sucrose into fructose and glucose. Indeed, mRNA of LIN5 (another known tomato CW invertase) was detected in mature green and red ripe fruits (Godt and Roitsch 1997), and expression of TAI (tomato vacuolar invertase), at levels that varied with fruit sugar content, was observed in mature green and ripe fruits of various tomato species (Slugina et al. 2017). At the same time, the relatively low levels of LIN7 expression in mature green fruits of S. peruvianum, S. chmielewskii, and S. habrochaites, which contain sucrose as the main storage sugar, may indicate the absence of intracellular glycosyl hydrolase activity and sucrose-mediated induction of LIN7 expression, similar to what was previously shown for sucrose synthase and invertase genes, whose expression can be affected by hexose-triggered signals (Koch 2004).

The significant differences in LIN7 transcription observed in flower buds and flowers across species may be linked to the gametophytic self-incompatibility of tomato species, which is mediated by suppression of pollen tube growth (Franklin-Tong and Franklin 2003). In self-incompatible Solanaceae plants, an incompatible pollen grain germinates on the stigma, but pollen tube growth is arrested after the tube has grown about one third of the way through the stylar transmitting tract toward the ovary, where fertilization occurs (Franklin-Tong and Franklin 2003; Zhao et al. 2006). The stigma and style provide nutritional support for pollen grain germination and pollen tube growth. The importance of a CW invertase for viable pollen development was demonstrated in tobacco, in which repression of a CW invertase gene resulted in male sterility (Goetz et al. 2001). In addition, carbohydrates were shown to play a critical role in pollen tube growth (Clement et al. 1996). Zhao et al. (2006) investigated the style metabolome of S. chilense (self-incompatible, SI) and S. pimpinellifolium (self-compatible, SC) and found that the SI style contained more sucrose and fructose/glucose than the SC style. After self-pollination, the sucrose content decreased in SC and increased in SI styles, and the reverse occurred for fructose/glucose content, which may be one of the factors preventing pollen tube growth in SI species (Zhao et al. 2006).

LIN7 has been proposed to regulate sugar content in pollen grains and tubes and, thus, to contribute to proper male gametophyte development and the production of viable pollen (Proels et al. 2006). LIN7 expression in gynoecia (Godt and Roitsch 1997) suggests its involvement in style carbohydrate metabolism. Considering the above, the LIN7 expression level may be linked to the suppression of pollen tube growth. Its expression in young and mature buds corresponds to the stages of pollen formation and development, whereas its expression in open flowers corresponds to the period of pollen germination and pollen tube growth.

Generally, LIN7 mRNA levels are higher in the self-compatible species S. lycopersicum, S. cheesmaniae, and S. chmielewskii than in the self-incompatible S. peruvianum and S. habrochaites. In open flowers, where pollen tube growth occurs, LIN7 expression in SI S. peruvianum and S. habrochaites is much lower than in red-fruited SC species and also lower than in green-fruited SC S. chmielewskii. This may indicate reduced apoplastic sucrose hydrolysis in growing pollen tubes and the gynoecium style of SI tomato species. LIN7 expression in green-fruited SC S. chmielewskii differs from that in red-fruited SC S. lycopersicum but is similar to that in red-fruited SC S. cheesmaniae, which may reflect the intermediate evolutionary position of S. chmielewskii. However, in self-incompatible S. arcanum, LIN7 expression in flowers is comparable to that in self-compatible tomatoes, possibly because self-incompatibility in this species acts through impaired ovule fertilization and/or embryo development (Stone and Goring 2001). In addition, the similarity of LIN7 amino acid sequences between S. lycopersicum and S. arcanum is higher than that between S. lycopersicum and the other green-fruited self-incompatible tomatoes, which, together with the differences in pI, could influence LIN7 substrate specificity and interaction with common partners.

Conclusion

This study showed, for the first time, that the LIN7 gene could be used for the phylogenetic analysis and classification of tomato species, varieties, and cultivars. We identified and characterized LIN7 homologous genes in wild and cultivated tomato accessions of Solanum section Lycopersicon in terms of nucleotide and amino acid sequence variability and expression patterns in different plant organs. The observed gene and amino acid polymorphisms, as well as species-specific differences in LIN7 expression profiles, may indicate functional diversity among the enzymes, which, given the role of invertases in carbohydrate supply to sink tissues, could translate into differences in tomato crossing ability and fruit ripening. Therefore, comprehensive characterization of CWINs from representatives of the section Lycopersicon may help evaluate the genetic potential of invertases for tomato breeding programs. Our data should help further the understanding of the association between CWIN variability and physiological diversity among tomato species.