Introduction

An aspartic acid proteinase inhibitor (SQAPI) has been characterized from the phloem exudate of squash (Cucurbita maxima Duchesne) (Christeller et al. 1998; Farley et al. 2002). The protein has no sequence homology to the other four families of proteinaceous aspartic proteinase inhibitors (PIs) that have been identified and cloned to date: the potato plant Kunitz inhibitors (Mares et al. 1989), the Ascaris inhibitors (Martzen et al. 1990), the yeast inhibitor IA3 (Schu et al. 1991), and the pig serpin inhibitor (Mathialagan and Hansen 1996). SQAPI also has very different properties from the wheat inhibitor, which has only been partially biochemically characterized (Galleschi et al. 1993). The five cloned aspartic acid PIs show no homology to each other, although the Kunitz and serpin aspartic PIs belong to already established families of PIs (Rawlings et al. 2004; http://merops.sanger.ac.uk/).

Several PIs that have been exclusively identified in plants are often highly restricted in their distribution to a single family (Christeller and Laing 2005). The squash serine PI family appears to be limited to Cucurbitaceae, the PI II family to Solanaceae, the trypsin/α-amylase inhibitor family to Graminaceae, and the mustard seed inhibitor family to Brassicaceae. Such a limited distribution among plant families may indicate a recent evolutionary history. Two inhibitor families found only in plants have a broader distribution (Christeller and Laing 2005), with the plant Kunitz and the Bowman-Birk inhibitors being found in legumes and cereals and possibly other species. Only the PI I family, serpins and cystatin inhibitors, have homologues outside the plant kingdom and are widely distributed in plants (Christeller and Laing 2005).

However, the distribution and evolutionary origin of SQAPI are unknown and its phylogenetic relationship to other genes is unreported. Within the order Cucurbitales, there is a preponderance of taxa belonging to the Cucurbitaceae family, many of which are economically important. However, there are several other families within Cucurbitales, including Begoniaceae, Corynocarpaceae, and Coriariaceae. Molecular phylogenetic relationships have been established for Cucurbitales (Zhang et al. 2006).

Two additional widespread features of PIs are the presence of small gene families of each inhibitor within most genomes and hypervariation within the active site contact regions between orthologous inhibitors (Hill and Hastie 1987; Laskowski et al. 1987a,b). These features are considered to be due to the function of these inhibitors as resistance factors against secreted proteinases of pests, parasites, and pathogens and the intense evolutionary pressure generated by these interorganism protein-protein interactions (Christeller 2005; Creighton and Darby 1989). We previously identified two isoinhibitors of SQAPI from squash cDNAs (Christeller et al. 1998), but the extent of the gene family (paralogues) remains undetermined and hence the possibility of hypervariability within this inhibitor family is unknown.

In this paper, we investigate the evolution of this novel inhibitor and present evidence that indicates that SQAPI evolved not from other aspartic PIs but from a phytocystatin cysteine PI.

Materials and Methods

Plant Material

Squash, zucchini (Cucurbita pepo), cucumber (Cucumis sativus), watermelon (Citrullus lanatus), green nutmeg melon (Cucumis melo), bitter melon (Momordica cochinchinensis), and large bottle gourd (Lagenaria siceraria) were obtained as seeds from commercial sources or as gifts. White bryony (Bryonia dioica) was obtained as tubers from the New Zealand Department of Conservation from an infestation in Maungaweka. These materials were grown in a glasshouse at Palmerston North and harvested as required. Tissues from the southern cucurbit (Sicyos australis) were a gift from AgriGenesis Ltd., Auckland, New Zealand, those collected from tutu (Coriaria arborea) and karaka (Corynocarpus laevigatus) were growing in their natural environment near Palmerston North, and begonia (Begonia rex) plants were purchased from a local nursery. Material used in this study is listed in Table 1.

Table 1 Plant material used in this study from the order Cucurbitales

Prediction of Secondary Structure

Predictions of SQAPI secondary structure were based on tools provided by SwissProt using an NMR structure of a rice cystatin (1EQK [Nagata et al. 2000]) as template (Guex and Peitsch 1997; Peitsch 1995; Schwede et al. 2003).

Fluorescence Spectroscopy

Fluorescence spectra were collected on a Perkin Elmer LS 50B spectrophotometer at room temperature, with a scan speed of 50 nm min−1, slit widths of 2.5 nm, excitation at 295 nm (for specific excitation of tryptophan), and emission from 300 to 400 nm. All data are the average of nine scans after subtraction of the fluorescence caused by the solvent alone. Recombinant SQAPI (HDVA isoform) was prepared as described previously (Farley et al. 2002). L-Tryptophan and porcine pepsin A were from Sigma Chemical Co. (St. Louis, MO, USA).
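
The averaging and solvent subtraction described above can be sketched as follows. This is a minimal illustration, not the instrument software used in the study: it assumes each scan is a list of intensities sampled on one shared wavelength grid, and the function names and the toy numbers are hypothetical.

```python
def average_scans(scans):
    """Point-by-point mean of several scans recorded on the same wavelength grid."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def corrected_spectrum(sample_scans, solvent_scans):
    """Average the sample scans and subtract the averaged solvent-only signal."""
    sample = average_scans(sample_scans)
    solvent = average_scans(solvent_scans)
    return [s - b for s, b in zip(sample, solvent)]


def emission_maximum(wavelengths, intensities):
    """Wavelength at which the corrected intensity is largest."""
    peak_index = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths[peak_index]


# Hypothetical toy data (not measurements from this study).
wavelengths = [340, 345, 350, 355, 360]
sample_scans = [
    [12.0, 18.0, 21.0, 19.0, 14.0],
    [12.2, 17.8, 21.4, 18.8, 14.2],
]
solvent_scans = [[2.0, 2.0, 2.0, 2.0, 2.0]]

spectrum = corrected_spectrum(sample_scans, solvent_scans)
peak = emission_maximum(wavelengths, spectrum)
```

In the study nine scans were averaged per sample; the same functions apply unchanged to nine lists per sample.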

Identification of Squash SQAPI Gene Family Paralogues

Genomic DNA was extracted (Doyle 1990) from leaves of glasshouse-grown C. maxima var Supermarket Hybrid squash plants. The DNA was quantified fluorometrically using the Picogreen DNA kit (Molecular Probes Ltd., Eugene, OR, USA) at excitation and emission wavelengths of 480 and 520 nm, respectively. Clones were then obtained by PCR using nested primers (Siebert et al. 1995) as follows: SP1, 5′-TGACCTGCGTCTACACCAGCCAAGAT-3′; and SP2, 5′-CATGCTGTTTCAGTGCGAACTCTGCT-3′.

To confirm the approximate number of clones, Southern hybridization was performed (Sambrook et al. 1989) using 10 μg of squash genomic DNA. Aliquots of DNA were digested separately with 15 units of XhoI and BamHI and electrophoresed overnight in a 0.8% TAE agarose gel. Capillary transfer of DNA to Hybond N+ nylon membranes was done under alkaline conditions and the DNA fixed by UV cross-linking. The probe comprising the SQAPI coding sequence amplified from squash genomic DNA was randomly labeled with 32P α-dCTP using the Rediprime II system (GE Healthcare, CT, USA) according to the manufacturer’s instructions. Blots were hybridized at 62°C, then washed twice in 3× SSC and once in 0.5× SSC at 65°C.

Identification of Cucurbitales SQAPI Gene Orthologues

Genomic DNA from the leaves of each plant was extracted using the Nucleon extraction and purification kit (GE Healthcare) following their standard protocol for extraction from 100-mg samples. This DNA was then used as a PCR template with forward and reverse nested primers based on the 5′ and 3′ ends of SQAPI (Christeller et al. 1998). PCR was carried out using Platinum Taq (Invitrogen) for 30 cycles with an annealing temperature of 48°C. The four primers used were as follows (5′ to 3′):

  • F1: ATG GTT GAT TTT CCA CAC ATG

  • F2: CCA GCC ATC GGT GAA GTG ATA

  • R2: GAG CTT CAG TGA ATT ATC TGA A

  • R1: AGC TTA GAA AAG AGG AAC GAA AG
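
As a rough check on the primers listed above, their length, GC content, and an approximate melting temperature can be computed as sketched below. The Wallace rule (2°C per A/T, 4°C per G/C) is a common rule of thumb for short oligonucleotides; it is an assumption here and is not stated to be how the annealing temperature in this study was chosen. The function names are hypothetical.

```python
# Primer sequences as listed in the text (5' to 3').
PRIMERS = {
    "F1": "ATG GTT GAT TTT CCA CAC ATG",
    "F2": "CCA GCC ATC GGT GAA GTG ATA",
    "R2": "GAG CTT CAG TGA ATT ATC TGA A",
    "R1": "AGC TTA GAA AAG AGG AAC GAA AG",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(primer):
    """Remove the codon-style spacing and normalise to upper case."""
    return primer.replace(" ", "").upper()


def gc_fraction(primer):
    """Fraction of bases that are G or C."""
    seq = clean(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C (Wallace rule; an assumption)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(primer):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(primer)))


for name, primer in PRIMERS.items():
    print(name, len(clean(primer)), round(gc_fraction(primer), 2), wallace_tm(primer))
```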

PCR products were gel purified (QIAgen GmbH) and sequenced using the Capillary ABI3730 Genetic Analyser (Applied Biosystems Inc.). Unique predicted peptide sequence orthologues (Fig. 4) were used for subsequent phylogenetic analyses.

Phylogenetic Analyses

Alignment of SQAPI sequences was conducted using ClustalX (Thompson et al. 1997). Phylogenetic trees were generated using programs implemented within PHYLIP (Felsenstein 2002) and trees were subsequently rendered by Treeview (Page 1996).

Preparation of Anti-SQAPI Antibody

Polyclonal antibodies were raised in a New Zealand white rabbit by multiple subcutaneous injections of purified recombinant SQAPI (HDVA isoform) prepared as described previously (Farley et al. 2002). The initial immunization was by injection of 215 μg of His-tagged recombinant SQAPI emulsified in complete Freund’s adjuvant. Booster injections of 108 μg of His-tagged recombinant SQAPI and 75 μg recombinant SQAPI were administered on days 27 and 49, respectively. Serum was collected on day 59 and partially purified by ammonium sulfate precipitation before use.

Identification of SQAPI Protein

Phloem exudate was collected as previously described (Murray and Christeller 1995) and mixed with SDS-PAGE sample buffer. Plant tissues were ground in 50 mM Tris-HCl, pH 7.5, containing 20 μM E-64 and 5% PVPP (Sigma Chemical Co.). After centrifugation, extracts were dialyzed against three changes of 5 mM Tris-HCl, pH 7.5, and freeze-dried. The residue was suspended in a minimum amount of water and SDS-PAGE sample buffer. Samples were heated at 70°C for 10 min, then centrifuged and run on a NuPAGE 10% gel (Invitrogen, Carlsbad, CA, USA) in MES-Tris-EDTA-SDS buffer (pH 7.3). The gel was blotted onto an Immobilon-P membrane (Millipore Corp., Milford, MA, USA) in 0.5% disodium tetraborate/20% methanol and incubated with rabbit anti-SQAPI polyclonal antibody in PBS-T. SQAPI was detected using goat anti-rabbit IgG-alkaline phosphatase and NBT/BCIP tablets (Sigma-Aldrich, WI, USA). In the case of Begonia rex, leaves were extracted in ∼2 vol of 0.75% lactic acid with 0.02% Tween 20, then centrifuged, the pH was adjusted to 7.6 using Tris base, the mixture was concentrated to 0.2 vol using a centrifugal concentrator, and 1 ml was applied to a G75 Superdex column (GE Healthcare) equilibrated with 0.1 M Tris-HCl, pH 8.0. Fractions were assayed for pepsin inhibitory activity and active fractions combined, concentrated 10-fold, and assayed (see below).

Cloning of SQAPI Variants and Assay of Pepsin Inhibitory Activity

SQAPI was recloned into pET30 (Laing et al. 2004) from four selected clones isolated as above. SQAPI from expressed clones and from phloem extracts was assayed as described previously (Christeller et al. 1998; Farley et al. 2002).

Results

Identification of Phytocystatins as the Nearest Homologues to SQAPI and Prediction of the Tertiary Structure of SQAPI

BLAST (BLASTP, TBLASTN) (Altschul et al. 1997) searches in GenBank (including EST sequences) using SQAPI sequences indicated that the nearest match to the SQAPI sequence was the phytocystatin family. For example, the closest translated gene sequence to SQAPI was Prunus persica (peach) BAC clone 82I18 (GenBank accession AC154901), the translation of which gave 33% identical sequence over the full length of SQAPI and 51% similar sequence. BLAST of the translation of this particular Prunus clone back into the GenBank protein sequence database hit only cystatin clones from a wide range of species, and the Prunus sequence showed the diagnostic cystatin motifs (Turk and Bode 1991). A functionally verified cystatin (Rassam and Laing 2004) gave an amino acid identity value of 21% and a similarity of 43%.

Pairwise comparisons of SQAPI with Arabidopsis thaliana putative PI peptide sequences using the Smith-Waterman local alignment program gave similar identities to those described above over the full length of SQAPI to Arabidopsis cystatins, but only gave alignments over 11 to 24 residues with other Arabidopsis putative PIs (e.g., serpins, Bowman-Birk, proteinase inhibitor 2, and mustard seed families). One exception was a putative serpin inhibitor (AT3G45220; 393 residues) which had 20% identity over 73 residues. Thus there is a strong contrast between cystatins, which align over nearly their full length with SQAPI, and other PIs, which not only are not detected by BLAST searches, but also show little global alignment to SQAPI. In addition, SQAPI and cystatins are of a similar number of residues in length, in contrast to most of the other PIs compared.
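
The identity and similarity percentages quoted for these alignments can be illustrated with a short sketch. It assumes two sequences that are already aligned (gaps written as "-") and uses a simple, hypothetical grouping of conservative substitutions; BLAST and Smith-Waterman programs score similarity with substitution matrices, so their values will differ. The second toy sequence is invented for illustration.

```python
# Hypothetical conservative-substitution groups (an assumption, not a BLAST matrix).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def are_similar(a, b):
    """True if two different residues fall in the same conservative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(seq_a, seq_b):
    """Percent identity and similarity over the alignment length, gaps included."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # a gap column counts toward the length only
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif are_similar(a, b):
            similar += 1
    length = len(seq_a)
    return 100 * identical / length, 100 * similar / length


# Toy example: the predicted SQAPI helix segment against an invented partner.
identity, similarity = identity_and_similarity("IAEFALKQHA", "LAKYAI-QHN")
```

Counting gap columns in the denominator mirrors the "including gaps" convention used for some of the comparisons in this paper; excluding them would raise both percentages.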

We used the structure of a rice phytocystatin (GenBank accession number and Protein data bank ID 1EQK), which had been determined in solution using NMR, as a template on which to model SQAPI using Swissmodel (Schwede et al. 2003). These two proteins show 26% identity and 44% similarity over 104 residues when gaps are unrestricted, with lower percentages when gaps are restricted for modeling purposes (23%/44% over 100 residues; Fig. 1). The identical and similar residues were evenly distributed along the full sequence. The model fitted well to cystatin (Fig. 2), predicting the same two loops and the α-helix region found in cystatin (Turk and Bode 1991). The sequences of the SQAPI loops were quite distinct from the cystatin loops having a tryptophan in the center of loop 1 and no tryptophan (found in cystatin) in loop 2. However, the consensus sequence for the α-helix found in the cystatin, (LVI)-(AGT)-(RKE)-(FY)-(AS)-(VI)-X-(EDQV)-(HYFQ)-N (Margis et al. 1998) was also consistent with the SQAPI sequence at the predicted α-helix (IAEFALKQHA) in eight of nine residues (equating I with L in position 6). The two gaps in the modeled alignment (Fig. 2) occurred in the predicted noncontact loop between β-sheet 2 and β-sheet 3 (missing larger loop in SQAPI due to two extra residues in the SQAPI sequence compared to the rice cystatin; Fig. 2) and in the relatively unstructured sequence leading up to the α-helix (where SQAPI has one less residue; not visible in Fig. 2). While some structural discrepancies were reported by Whatcheck (Hooft et al. 1996), these were probably a function of the original rice cystatin template. For example, the Ramachandran Z-score was −6.163 for the rice cystatin and −5.4 for the modeled structure. These discrepancies were not taken further at this stage.

Fig. 1

Amino acid alignments of SQAPI (AAT67163) with the rice cystatin used for three-dimensional structural determination (1EQK_A). The symbols under the sequence refer to an identical residue in both proteins (*; shaded), a functionally similar residue (:) and a similar residue (.). The identified conserved motifs in cystatin proteins are labeled under the alignment.

Fig. 2

Modeling of SQAPI (GenBank AAC39473) onto the rice oryzacystatin structure. The solution ribbon structure of rice cysteine PI, 1EQK (Guex and Peitsch) (left) and the threaded fit of SQAPI to this structure (right) modeled using Swiss-PdbViewer v3.7b2. Missing sequence in SQAPI in the turn between β-sheet 2 and β-sheet 3 was not modeled by the program. We have evidence that the tryptophan shown on the modeled SQAPI is at a binding site of SQAPI (Fig. 3), at an analogous position to binding residues in cystatins.

Fig. 3

Fluorescence emission difference spectra. Pepsin/inhibitor complexes were prepared by preincubating pepsin (6 μM) and an equimolar amount of either recombinant SQAPI (HDVA isoform) or pepstatin in 90 mM lactate-BisTris buffer, pH 3.5, at room temperature for 15 min. An equimolar mixture of porcine pepsin A and tryptophan (6 μM) and a sample of pepsin only (6 μM) in the same buffer were treated similarly. Fluorescence spectra were collected for each sample (excitation, 295 nm; emission, 300–400 nm). The fluorescence of recombinant SQAPI alone (6 μM) is shown in line A. The fluorescence, after subtraction of the pepsin fluorescence, of the pepsin/tryptophan mixture (line B), the pepsin/SQAPI complex (line C), and the pepsin/pepstatin complex (line D) are also graphed.

Fluorescence Spectroscopy of the SQAPI-Pepsin Complex

The predicted tertiary structure for SQAPI places the only tryptophan found in SQAPI in the first active site turn, suggesting that this tryptophan may be intimately involved in binding with the target pepsin. The emission wavelength maximum for the single tryptophan residue (residue 53) in recombinant SQAPI (HDVA isoform) was 350 nm in 50 mM acetic acid (pH 3.2) and 351 nm in 50 mM ammonium bicarbonate (pH 8.0), respectively (Fig. 3). For comparison, the emission wavelength maximum for the amino acid tryptophan in solution was 354 nm at pH 3.2 and 356 nm at pH 8.0, respectively. The fluorescence of the SQAPI tryptophan residue was almost completely quenched following formation of a complex with pepsin. The difference spectrum for the complex was essentially indistinguishable from the difference spectrum for the pepstatin-pepsin complex, whereas the fluorescence of an equivalent concentration of tryptophan was easily observed.

SQAPI Is A Small Gene Family in Squash

In order to determine the size of the SQAPI gene family in squash, we searched for homologues in this species. The existence of a small squash gene family was confirmed by two lines of evidence: first, by isolation of different genes by PCR, from both cDNA and genomic DNA, and, second, by Southern blotting of genomic DNA. Two variants had previously been obtained by PCR from cDNA of squash tissue (GenBank AAC39473 and AAC39474) (Christeller et al. 1998). Additionally, seven more complete clones were identified using a reverse primer based on the 3′ terminal sequence of the above isolated cDNAs (Siebert et al. 1995). Translations of these clones are listed in Fig. 4. In total, 9 distinct variants have been cloned from C. maxima var Supermarket Hybrid, showing changes in 15 residues of 103.

Fig. 4

Amino acid alignments of Cucurbitaceae SQAPI paralogues and orthologues. Sequences were aligned using Clustal X as described under Materials and Methods. GenBank numbers are as follows: B. dioica, DQ286436; B. rex 1, DQ286437; B. rex 2, DQ286438; C. arborea, DQ286454; C. laevigatus 1, DQ286439; C. laevigatus 2, DQ286440; C. laevigatus 3, DQ286441; C. lanatus, DQ286442; C. maxima 1, AAC39474; C. maxima 2, AAT72725; C. maxima 3, AAT67162; C. maxima 4, DQ286443; C. maxima 5, AAT67163; C. maxima 6, DQ286444; C. maxima 7, DQ286445; C. maxima 8, DQ287856; C. maxima 9, AAT67163; C. melo, DQ286446; C. moschata, DQ286447; C. pepo 1, DQ286448; C. pepo 2, DQ286449; C. sativus, DQ286450; L. aegyptica, DQ286451; L. siceraria, DQ286452; S. angulatus, DQ286453. All primer sequences were removed from the DNA sequences before translation.

The nonconservative residue changes are mainly toward the N-terminal half of the SQAPI sequences (Fig. 4). For example, C. maxima 6 and 7 differ from the other C. maxima SQAPI sequences in having a proline at residue 12 instead of an alanine found in the others (P12A). Similarly C. maxima 2 and 5 differ from the other SQAPIs by having a proline at residue 6 (P6A) and a glycine at residue 15 (G15E), while C. maxima 2, 5 and 9 have an N at residue 3 and other SQAPIs have a D (N3D). Conservative changes also occur further along the SQAPI sequence. Five squash clones have the sequence HWD at residues 59–61, while four have DWN. In another distinctive change, C. maxima 8 has an extra amino acid insertion at position 24 but is otherwise identical to C. maxima 3.

Southern blots provided supporting evidence for a small SQAPI gene family (Fig. 5). The data show five distinct bands when the DNA is restricted with XhoI and over six bands when restricted with BamHI. The increase in band numbers is expected with BamHI because the SQAPI sequence has a BamHI site in the middle, whereas XhoI cuts only outside the sequences we have cloned. The results suggest that between 5 and 10 loci for SQAPI alleles occur in squash.
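
The reasoning about band numbers can be illustrated with a small sketch that counts how many restriction fragments of a single locus overlap the probe. The locus sequence below is invented, and fragment boundaries are placed at the start of each recognition site, which is a simplification of where the enzymes actually cut; only the recognition sequences (CTCGAG for XhoI, GGATCC for BamHI) are real.

```python
RECOGNITION_SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}


def site_positions(sequence, site):
    """Start index of every occurrence of a recognition site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def hybridizing_fragments(sequence, probe_start, probe_end, enzyme):
    """Number of digestion fragments that overlap the probed region."""
    cuts = site_positions(sequence, RECOGNITION_SITES[enzyme])
    boundaries = [0] + cuts + [len(sequence)]
    count = 0
    for left, right in zip(boundaries, boundaries[1:]):
        if left < probe_end and right > probe_start:
            count += 1
    return count


# Hypothetical locus: XhoI sites flank the gene, one BamHI site lies inside it.
locus = "AAAA" + "CTCGAG" + "ACGTACGT" + "GGATCC" + "TGCATGCA" + "CTCGAG" + "AAAA"
gene_start, gene_end = 10, 32  # probed region between the two XhoI sites

xhoi_bands = hybridizing_fragments(locus, gene_start, gene_end, "XhoI")
bamhi_bands = hybridizing_fragments(locus, gene_start, gene_end, "BamHI")
```

Under these assumptions each locus contributes one band with XhoI and two with BamHI, which is why a BamHI digest is expected to show more bands than an XhoI digest of the same DNA.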

Fig. 5

Southern blot analysis of C. maxima cv. Supermarket Hybrid DNA using a nucleotide probe against SQAPI (see Materials and Methods). From left to right the lanes are (1) XhoI-cut squash gDNA, (2) standards of 15, 8, 5, and 1.6 kb, and (3) BamHI-cut squash gDNA.

Identification of other Cucurbitales SQAPI Genes

We then searched for SQAPI homologues in other plant species. SQAPI genes were identified in all the Cucurbitales species tested (Table 1) by PCR, using either single or nested primer pairs, and subsequent sequencing of the PCR products. Our survey covered representative species from four of seven families within the order Cucurbitales, these families representing over 99% of the species in the order. The data show that SQAPI is present beyond the family Cucurbitaceae, indicating that evolution of SQAPI preceded speciation within the order Cucurbitales. Furthermore, sequencing revealed at least one major product from each species, and in several instances, two or three different genes were identified. Thus the data show evidence for small gene families of this inhibitor to be widely distributed in the order. BLASTP searches of GenBank failed to discover any non-Cucurbitales SQAPI clones with significant homology (E < 0.018) from either the All GenBank database (excludes ESTs) or the GenBank EST database including the HortResearch apple EST database that did not have cystatin signature motifs (Margis et al. 1998).

SQAPI Expression as a Protein in Phloem

The presence of highly homologous SQAPI genes across the Cucurbitales does not necessarily show that all these genes are transcribed, translated, or active. The presence of SQAPI protein in the phloem of related cucurbits should be detectable by Western blotting and cross-reaction with a SQAPI polyclonal antibody, as the predicted protein sequences (Fig. 4) are very similar. It proved extremely difficult to obtain phloem exudate from the majority of species. However, SQAPI was identified in some Cucurbitaceae by Western blot (data not shown). Bands were observed of varying intensity in the 10-kDa region for phloem exudate collected from the fruit of C. maxima, C. moschata, C. pepo, C. sativus, L. aegyptica, and L. siceraria. The molecular masses of the L. siceraria and C. moschata proteins were slightly greater than those of the other species (data not shown). We could detect the presence of SQAPI using pepsin inhibition assays of crude phloem extracts in L. aegyptica, C. sativus, L. siceraria, C. moschata, Citrullus lanatus, and C. maxima extracts (Table 2). While we could not detect inhibitory activity in C. pepo or C. moschata, this was probably because little to no protein was detected in these extracts (Table 2). In addition, a partially purified aspartic PI activity could be detected in Begonia rex (Table 2), with a molecular mass of ∼11 kDa (data not shown). Three C. maxima variants of SQAPI (GenBank AAC39473, AAT67162, and AAC39474) have been cloned, expressed, purified, and evaluated for inhibitory activity as described in Materials and Methods. All showed strong inhibitory activity (Table 2).

Table 2 Properties of SQAPI extracted from various cucurbit species or expressed in E. coli. Binding constants were measured as described in the methods using pepsin as the target enzyme.

Phylogenetic Analysis of Squash Paralogues

To further understand the relationships between the isolated members of the small gene family of SQAPI, we carried out a phylogenetic analysis of the sequences in Fig. 4 beginning at residue Gly18 in the alignment. Using the neighbor-joining method based on Jones-Taylor-Thornton protein distances, a phylogeny was produced that reveals at least four paralogous clusters of genes. Phylogenies based on DNA sequences using parsimony and distance methods revealed similar groupings; however, the relations among the groups and the position of B. rex 2 were not stable across the methods (data not shown). Groups II, III, and IV contain a member from a family other than the Cucurbitaceae, while group I only contains Cucurbitaceae members. Cucurbitaceae (19 genes) has members in all groups, whereas Corynocarpaceae (3 genes) has members in two groups. Begoniaceae (2 genes) and Coriariaceae (1 gene) are each represented in a single group. Because it is probable that these plant families each possess small gene families of SQAPI that we have not detected, the distribution of genes in Fig. 6 is certainly incomplete, although the number of groups may be more accurate and correlates with the suggested number of loci.

Fig. 6

Phylogeny of sequenced Cucurbitales SQAPI and cystatins from Arabidopsis and rice. The neighbor-joining tree was generated based on Jones-Taylor-Thornton protein distances using a ClustalX generated alignment (Thompson et al. 1997). The phylogram is rooted using the cystatin AC154901. Bootstrap values are indicated on branches with >50% support as generated from 1000 bootstrap replicates. Proposed groupings of SQAPI sequences are indicated by roman numerals (I–IV).

The plant cystatins, selected either from Arabidopsis or from other cystatins referred to in this paper, showed much greater difference between different members compared to SQAPI members, although this may be biased by the length difference between SQAPI and cystatins.

Hypervariability in Transcribed Amino Acid Residues in SQAPI Genes

PIs generally exhibit hypervariability at their contact residues due to coevolutionary pressures from their cognate proteinases (Christeller 2005; Creighton and Darby 1989). Cystatins are proposed to have three contact points with papain, their model target proteinase: a glycine near the N terminal and the two loops discussed above (Turk and Bode 1991). We identified these positions on SQAPI aligned with 1EQK_A (Figs. 1 and 2) and examined the sequences of these postulated contact points for variability compared with other sections of SQAPI. In the N-terminus there are four glycine residues in SQAPI, three of which are invariant, with several other residues (12, 13, 15, and 17) showing variability between SQAPI clones (Fig. 4). The second contact region occurs at residues 57–61, where again there is significant variability around an invariant W. We have suggested that this W is possibly involved in inhibitor activity. The third region occurs at residues 87–92 and again includes variant residues.

These three regions of SQAPI were analyzed for hypervariability as described by Creighton and Darby (1989), whereby the numbers of amino acid replacements per amino acid per gene analyzed are compared for variable and nonvariable regions and hypervariability is detected with ratios significantly >1. We used this method as we wished to count the maximum possible changes as observed in our data, rather than the extent of change as would be measured by amino acid diversity. The three variable regions chosen contain five or six amino acids, compatible with cystatin inhibitor loop sizes (Turk and Bode 1991). Residues 1–5 were excluded from the analysis because we found that these residues were commonly cleaved in mature molecules and do not affect protease binding (Christeller et al. 1998). The data in Table 3 show that hypervariability is established for these three regions since the function divergence ratios (FDR) are much >1 and are similar to FDR values observed in other protease inhibitors (Creighton and Darby 1989). In addition, the predicted α-helix region of SQAPI showed a reduced FDR value compared with the three identified regions or the rest of SQAPI (Table 3).
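
The comparison of replacement rates between regions can be sketched as follows. This is only an illustration of the idea: the rule used here for counting replacements in a column (number of distinct residues minus one) is an assumption and may differ from the counting used by Creighton and Darby (1989), and the toy alignment is invented rather than taken from Fig. 4.

```python
def replacements_in_column(column):
    """Minimum number of replacements needed to explain the residues in a column."""
    residues = {r for r in column if r != "-"}
    return max(len(residues) - 1, 0)


def replacement_rate(alignment, columns):
    """Replacements per amino acid position per gene over the given columns."""
    total = sum(
        replacements_in_column([seq[i] for seq in alignment]) for i in columns
    )
    return total / (len(columns) * len(alignment))


def divergence_ratio(alignment, region_columns):
    """Rate inside the candidate contact region divided by the rate elsewhere."""
    region = sorted(set(region_columns))
    rest = [i for i in range(len(alignment[0])) if i not in set(region)]
    background = replacement_rate(alignment, rest)
    if background == 0:
        raise ValueError("no replacements outside the region; ratio undefined")
    return replacement_rate(alignment, region) / background


# Hypothetical toy alignment of four sequences, eight columns each.
toy_alignment = ["GAHWDKLE", "GADWNKLE", "GSHWDKLE", "GADWNRLE"]
ratio = divergence_ratio(toy_alignment, region_columns=[2, 3, 4])
```

A ratio well above 1 in such a calculation would point to a region that changes faster than the rest of the protein, which is the pattern reported in Table 3 for the three putative contact regions.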

Table 3 Hypervariability among SQAPI genes

While there is evidence for hypervariability in the three regions of SQAPI thought to be involved in contact with the protease, there is no evidence for hypervariability over the whole SQAPI protein as estimated by the ratio of amino acid replacements per replacement site (Ka) over the number of silent changes per silent site (Ks) (Ka/Ks). The overall value of Ka/Ks for the SQAPI dataset is 0.355. A Ka/Ks <1 is generally indicative of a protein being under functional constraint, presumably in the case of SQAPI to maintain its structure and PI activity.
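
For orientation, a Ka/Ks ratio can be computed from counts of differences and sites as sketched below. The method used to obtain the value reported above is not stated in this section, so the Jukes-Cantor correction and the counts shown here are assumptions chosen for illustration; counting replacement and silent sites from codon alignments is a separate step that is not shown.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(replacement_diffs, replacement_sites, silent_diffs, silent_sites):
    """Ratio of corrected replacement changes to corrected silent changes."""
    ka = jukes_cantor(replacement_diffs / replacement_sites)
    ks = jukes_cantor(silent_diffs / silent_sites)
    if ks == 0:
        raise ValueError("no silent changes; ratio undefined")
    return ka / ks


# Hypothetical counts for one pair of sequences (not data from this study).
ratio = ka_ks(
    replacement_diffs=12,
    replacement_sites=240,
    silent_diffs=10,
    silent_sites=70,
)
```

With these invented counts the ratio falls below 1, the situation interpreted above as functional constraint; a ratio above 1 would instead suggest positive selection across the region analyzed.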

Discussion

The Predicted Three-Dimensional Structure of SQAPI, Hypervariability, Binding Loop Identification, and Homology with Cystatin

Hypervariability is well documented in many PI families as positive Darwinian selection at regions of the molecules where interaction with proteinases occurs (Christeller 2005; Creighton and Darby 1989). The involvement of the two interacting molecules from two different organisms in a pathogenic, parasitic, or food source relationship is central to this phenomenon and represents a case of evolutionary warfare (Creighton and Darby 1989). Table 3 shows the higher variability between different SQAPIs in predicted proteinase contact regions compared with other regions with predicted backbone structure function. It has been found that regions of hypervariability within a PI indicate the presence of external loops that are involved in proteinase binding (Hill and Hastie 1987; Laskowski et al. 1987a,b). That our predicted contact regions are hypervariable supports the concept that these regions of SQAPI are also involved in these protein-protein interactions despite no direct structural information being available. This is also supported by the fact that these hypervariable regions in SQAPI aligned with predicted papain contact regions of cystatins. SQAPI shows 23%–33% identity to phytocystatins and 45%–54% similarity. Interestingly, high homology is detected (30% identity and 47% similarity over 101 residues including gaps) to a Cucurbitaceae cystatin, the C. sativus cystatin (GenBank BAA28867). This level of homology is a very good indication of related proteins, e.g., the homology and similarity between the cucurbit cystatin and cucurbit SQAPI is similar to that between apple and pear cystatins (GenBank accession numbers AAO18638 and AAB71505; 30%/55%; alignment not shown). These homologies indicate that SQAPI may have evolved from a phytocystatin, the latter being widespread throughout Eukaryota, at a late stage in Angiosperm evolution.

Tryptophan Fluorescence of SQAPI

The invariant single tryptophan in SQAPI aligns and models with the contact loop sequence QVVSG on the oryzacystatin both in the threaded structure and in BLAST alignments with other phytocystatins. Consistent with a relatively exposed location for the single tryptophan residue, the emission wavelength maximum of the SQAPI tryptophan fluorescence was found to be almost identical to that of free tryptophan but fluorescence emission difference spectrum analysis shows complete quenching on binding of SQAPI to pepsin (Fig. 3). Furthermore, kinetic studies with the 60W/A mutant and SQAPI in which the single tryptophan residue was specifically oxidized with N-bromosuccinimide (data not shown) have also implicated this invariant tryptophan as a contact residue. Together, these three independent observations strongly suggest that we have correctly identified a binding loop of SQAPI. Structural and sequence similarity also supports the identification of the N-terminal region and the second identified loop as important in inhibitor binding as predicted by hypervariability.

The Cucurbita maxima SQAPI Gene Family

Southern blotting of C. maxima genomic DNA with restriction enzymes XhoI and BamHI produced estimates of a gene family of at least five genes (Fig. 5). However, about two bands in each blot were broad and dense, suggesting that more than a single species might be present, although matching the probe to the loci might also affect this interpretation. Because BamHI cuts in the middle of the SQAPI gene sequence, it would very probably produce multiple bands of very similar size if the gene family has evolved by gene duplication to produce tandem or higher-level contiguous genes. Gene duplication as an evolutionary mechanism is very common in many inhibitor gene families and has been well reviewed (Laskowski and Kato 1980; Rawlings et al. 2004). The most extreme examples are genes in the PI II family and cystatin family that have been characterized with varying numbers of domains, from one through eight (Choi et al. 2000; Walsh and Strickland 1993). Cloning from cDNA and gDNA, while not necessarily exhaustive, suggests that at least nine distinct orthologous genes are present. Thus although the two approaches give different but similar answers, neither is authoritative or can provide an absolute value for the size of the gene family. The consistency in the data is further underlined by noting that cloning identifies individual genes and Southern blots identify gene loci and alleles if the loci are polymorphic. We can reasonably deduce from the data that we are dealing with a gene family of about nine genes (although we cannot necessarily distinguish alleles from loci), with at least two orthologous variants in each cluster (Fig. 6). This result indicates that evolution of orthologues has continued during Cucurbitaceae evolution.

Phylogeny

We were able to clone SQAPI gene homologues from all Cucurbitales specimens examined. This includes specimens from four families within the order representing over 99% of the species in the order. These data indicate that SQAPI evolved before the evolutionary separation of families. The phylogenetic tree (Fig. 6) provides evidence that the evolution of orthologues had also occurred in the Cucurbitales ancestor. The tree shows the existence of four paralogue clusters, all of which contain Cucurbitaceae genes and three of which contain non-Cucurbitaceae genes.

While cluster I only contains Cucurbitaceae sequences, we cannot be sure that any clusters of paralogues are uniquely Cucurbitaceae because it is probable that small gene families exist in all these species and we have merely isolated representative genes. This conjecture is supported by the isolation of multiple genes (two or three) from three specimens. However, the appearance of Corynocarpus genes in two highly separated clusters indicates that at least gene duplication had occurred in the Cucurbitales ancestor. The tree also shows the clear-cut separation of SQAPI from the cystatins.

As the angiosperm phylogeny becomes increasingly certain, it may be possible to trace the appearance of relatively newly evolved genes like SQAPI by identifying their presence or absence in sister orders. The current status of angiosperm phylogeny suggests that Rosales and Fagales are the two sister orders to Cucurbitales (Zhang et al. 2006). We believe it is likely that SQAPI evolved comparatively recently because aspartic protease inhibitors are rare in nature, and the SQAPI gene has not been identified in any fully sequenced genomes, including plant genomes. Neither has it been found in any order other than the Cucurbitales in genes deposited in GenBank, including EST collections, or in the HortResearch apple (Rosales) EST database of over 150,000 ESTs (Newcomb et al. 2006). In addition, we have been unable to detect pepsin inhibitory activity in apple tissue samples (unpublished data).

Our data suggest that not only has SQAPI evolved recently from the older widely distributed cystatin family, but it has also utilized the cystatin inhibitory mechanism. The cystatin mechanism, which relies on steric hindrance by insertion of its inhibitory contact residues into the active site crevice of cysteine proteinases, does not directly interact with the proteinase nucleophilic sulfhydryl. It remains to be seen exactly how SQAPI interacts with aspartic proteases. The protein-protein interactions for two protein aspartic inhibitors, PI-3 and IA3, have been characterized (Li et al. 2000; Ng et al. 2000) and are quite distinct from that proposed for SQAPI.

Nevertheless, evolution of a protease inhibitor of one family from that of a different family is not without precedent. It appears to be an evolutionary mechanism occurring frequently. Examples include serine PIs of the serpin family recruited to cysteine proteinase inhibition (Komiyama et al. 1994) and to aspartic proteinase inhibition (Mathialagan and Hansen 1996), serine PIs of the seed Kunitz family recruited to cysteine proteinase inhibition (Krizaj et al. 1993) and to aspartic proteinase inhibition (Mares et al. 1989), and cysteine PIs of the thyropin family, recruited to aspartic proteinase inhibition (Lenarcic and Turk 1999).