Introduction

Lupin is an herbaceous plant of the leguminous family belonging to the genus Lupinus. Lupin flour and seeds are widely used in bread, cookies, pastry, pasta, sauces, as well as in beverages as a substitute for milk or soy, and are widely used in gluten-free foods. Lupin flour is a good emulsifying, rich protein source and has many advantageous nutritional properties being a good controller of cholesterolemia as well as glucose hematic levels [1]. However, in response to the increasing number of severe cases of lupin allergies reported during the last decade, in December 2008, lupin was added to the list of substances requiring mandatory advisory labeling on foodstuffs sold in the European Union [1, 2]. Thus, all products containing even trace amounts of lupin must be labeled correctly [3], and the International Union of Immunological Societies (IUIS) allergen nomenclature subcommittee recently designated β-conglutin as the Lup an 1 allergen [4]. There are several techniques used to detect lupin allergens, mainly based on antibodies used in enzyme linked immunosorbent assay (ELISA) kits, although some PCR-based kits are also available. Currently available commercial ELISAs exploit polyclonal antibodies that are not specific to each of the conglutin subunits, and reports in the literature only detail monoclonal IgG antibodies against α-conglutin and IgM antibodies against β-conglutin [5, 6]. There are no reports or commercial ELISAs for the specific detection of β-conglutin.

DNA or RNA aptamers have garnered much attention as the next generation antibody-like molecules for use as molecular recognition tools for diagnostic use. Aptamers are synthetic nucleic acids that specifically bind a wide range of targets, from small molecules to whole cells [710], and are developed through the use of Systematic Evolution of Ligands by Exponential Enrichment (SELEX) [1113]. Recently, in our group, aptamers specifically binding to the Lup an 1 allergen, which as said is the β-conglutin subunit, have been selected showing dissociation kinetic constants (K D) in the range of 3.6–5.2 × 10−7 M [14].

A full-length aptamer usually has three functional regions. Firstly, the region that has the role of interacting directly with the target is normally approximately 10–15 nucleotides long and has secondary structures such as hairpin loops, G-quartet loops, bulges, or pseudoknots [11]. A second region contains nucleotides that do not directly contact the target but play an important role in supporting the interactions between the contacting nucleotides and the target, whilst the third region comprises the nucleotides that do not bind to the target nor support the binding of the contacting nucleotides to the target [15], and these are regarded as nonessential nucleotides. It is always desirable to truncate the aptamer to eliminate the nonessential nucleotides after the upstream aptamer selection process (i.e., downstream truncation). There are several reasons for carrying out downstream truncation. The elimination of nonessential nucleotides has been observed to increase the binding functionality of the aptamer presumably due to reduced steric hindrance [16], and furthermore, although the chemical synthesis of an oligonucleotide is a well-established technique, the synthesis of aptamers longer than 60 nucleotides is always more expensive and difficult to perform; the shorter the aptamer, the more cost-effective it is. It is thus highly important to understand how selected aptamers interact with their targets, in order to elucidate the redundant motifs in the aptamer sequence.

On the other hand, truncation may result in the removal of the non-binding domain and may actually interfere with the interaction between the aptamer and target protein by formation of complex secondary structures, and could eventually prevent the binding domain from folding into the desired conformation for binding to the target. Therefore, identifying the high binding affinity domains in the post-screened aptamer is an equally important key step for producing potent aptamers of higher affinity/specificity [17]. Nuclear magnetic resonance (NMR) spectroscopy has been successfully applied to solve the structures of a range of aptamer or ligand–aptamer complexes. Using NMR spectroscopy, the structure and dynamics of aptamers can be predicted [18, 19], and moreover, the characterization of aptamer conformational transitions and the structural changes due to the addition of cofactors, such as ions, small molecules, or even proteins could give an insight into the biomolecular reaction mechanisms [20, 21]. Rational truncation by computational structural modeling [22], partial fragmentation [23], enzymatic footprinting [24], or microarray-based binding sequence determination [25] are other alternatives to predict short high-affinity binding sequences. Alternatively, online available software algorithms can be used to deduce the consensus motif which is responsible for the binding of the aptamer to its target and can be used to predict candidate bases for truncation [17, 26, 27].

In the work reported here, a combination of approaches were used in order to optimize the size of the aptamers selected against the β-conglutin [14], which we refer to as the S2 and S40 β-conglutin binding aptamers. In the first interaction, the primers were removed, and the degree of interference with the interaction between the aptamer and its target was probed using surface plasmon resonance (SPR). A second truncation approach was then carried out based on the analysis of simulated structures, as the use of algorithms to theoretically predict the secondary structures of aptamers can assist in determining which segments of the selected aptamer are essential sequences for binding [28, 29]. Current algorithms predict the structures of nucleic acids in the absence of target molecules whereas it is possible for some, if not all, aptamers to undergo conformational changes upon binding to the target molecules [30]. Following prediction of the structure using m-fold and GQRS Mapper, the presumed G-quadruplex structure was studied using one-dimensional NMR. Finally, binding studies of truncated sequences were used for determination of the specificity and affinity of these truncated aptamers using SPR, and the K D of the final truncated 11-mer aptamer was determined.

Materials and methods

Materials

All oligonucleotides were HPLC purified and lyophilized by Ella Biotech GmbH (Germany). Proteins from Lupinus albus seeds were extracted, purified, and characterized as previously described [31], obtaining a pure isolate of α-, β-, γ-, and δ-conglutins. The binding buffer (BB) was phosphate-buffered saline (10 mM phosphate, 138 mM NaCl, 2.7 mM KCl, pH 7.4) containing 1.5 mM of MgCl2, pH 7.4. All solutions were prepared in high-purity water obtained from a Milli-Q RG system (Spain) and passed through a 0.45-nm filter.

Truncation strategy

The first iteration of truncated sequences was generated by removing the primer regions at either the 3′ extreme or the 5′ extreme, or at both extremes (Table 1). A second truncation iteration was carried out based on the analysis of simulated structures. To prepare this second generation of truncated sequences, full-length sequences S2 and S40 were loaded to the m-fold program, and bases which formed different loops of predicted secondary structure were removed, resulting in different sequences for evaluation (Table 2).

Table 1 Full-length aptamer sequences (S2, S40) and truncated species obtained after removal of the 3′ primer (S2.3, S40.3) or the 5′ primer (S2.2, S40.2), or both primers (S2.1, S40.1)
Table 2 Truncated aptamer species modeled according to the secondary structure of the aptamer sequences S2 and S40, short G-quadruplex (SGQ), and DNA controls: mutated SGQ (SGQm) and thrombin-binding aptamer (TBA)

Secondary structure prediction

The secondary structure model of the sequences obtained was predicted using m-fold at 22 °C in 0.138 M [Na+] and 1.5 mM [Mg2+] and the GQRS Mapper, for predicting G-quadruplexes in nucleotide sequences [28, 29].

Surface plasmon resonance analysis

A BIAcore 3000 (Biacore Inc.) with BIAevalution software was used for the SPR experiments. Proteins of interest (i.e., β-conglutin and potential interferents α-, γ-, and δ-conglutin and gliadin) were immobilized, via amine coupling, on separate channels of a CM5 sensor chip. First, the chip was activated using EDC/NHS followed by an injection of the protein (5 μL/min for 10 min). After immobilization of the protein, any unreacted NHS esters were deactivated by injecting an excess of ethanolamine hydrochloride, followed by 75 mM NaOH to remove any nonspecific adsorption. The various truncated sequences under evaluation were diluted in 10 mM phosphate, 138 mM NaCl, 2.7 mM KCl, 1.5 mM MgCl2, pH 7.4, and injected for 6 min at a flow rate of 5 μL/min followed by 3 min stabilization and 10 min dissociation. The binding of DNA sequences was analyzed through corresponding changes in the refractive index of optical signals and expressed as resonance units (RU). All sensograms were corrected for nonspecific binding and refractive index changes by subtracting the signals of an equivalent injection across the underivatized flow cell. All reagents and buffers were previously filtered and dissolved in Milli-Q water. All SPR measurements were performed in triplicate.

NMR experiments

1-H-NMR spectra were acquired at 500 MHz using a Varian Unity Inova-500 NMR spectrometer. The HPLC-purified and lyophilized 11-mer β-conglutin aptamer (“short G-quadruplex containing” (SGQ)) was diluted in 90 % v/v H2O/10 % v/v D2O to 2 mM. One-dimensional spectra were acquired by double pulsed field gradient selective excitation for H2O suppression at 22 °C before/after subsequent additions of 10, 20, 30, 40, and 50 mM KCl.

Results and discussion

The first step for the truncation of the S2 and S40 β-conglutin aptamers was to remove the primer sequences at either the 3′ extreme (2.3, 40.3) or the 5′ extreme (2.2, 40.2), or at both extremes (2.1, 40.1). In the first instance, β-conglutin was immobilized on a CM5 chip, and sequences corresponding to the truncated S2 and S40 aptamers detailed in Table 1 were passed over the chip surface. As can be clearly seen in Fig. 1a, the sequences present at the 3′ extreme (S2.2 and S40.2) do not appear to participate in target binding, whilst sequences present in the 5′ extreme are obviously essential for binding (S2.3 and S40.3).

Fig. 1
figure 1

SPR experiments showing the interaction between truncated aptamer sequences obtained after removal of primer regions (A) or based on the analysis of simulated structures (B) to β-conglutin protein immobilized on the surface of the CM5 Biacore chip. The binding of sequences is represented in resonance units (RU) (i) and normalized by the molecular weight of each sequence (ii)

The procedure used for the second truncation approach was based on the analysis of simulated structures. The use of algorithms to theoretically predict the secondary structures of aptamers can help to determine which segments of the aptamers are essential sequences for binding. Firstly, the S2 and S40 β-conglutin aptamers [14] were compared, and 11 sequential nucleotides with 100 % homology were found in both sequences; we refer to this 11-mer as the SGQ β-conglutin aptamer (Tables 1 and 2). To generate information on the composition and distribution of putative quadruplex-forming G-rich sequences in the SGQ β-conglutin aptamer, the QGRS Mapper was used. The G-quadruplex structure, also known as a G-quartet, is composed of stacked G-tetrads, which are square coplanar arrays of four guanine bases each. These interesting structures may be formed by repeated folding of a single nucleic acid molecule or by interaction of two or four strands and are generally very stable due to cyclic Hoogsteen hydrogen bonding between the four guanines within each tetrad [28]. The SGQ β-conglutin aptamer was compared with the thrombin-binding aptamer (TBA) [3234] using the GQRS Mapper. The G-score is the statistical parameter used by the GQR Mapper software for predicting the probability of finding a G-quadruplex structure motif; the higher the G-score, the higher the probability is. The TBA, which has a well-known and studied G-quadruplex structure, achieves a GQ-score of 20, and the same range was obtained for the 11-mer truncated aptamer (G-score 21). This preliminary analysis thus suggested the high probability of the presence of G-quadruplex structures in the motif.

The identification of this 11-mer G-quadruplex can also explain the results obtained in Fig. 1a for the 2.1, 40.1, 2.2, and 40.2 truncated sequences; only seven bases of the identified 11-mer are present, with GGTG of the 11-mer having been truncated, thus preventing the G-quadruplex from forming and consequently hindering target binding, whilst in the 2.3 and 40.3 sequences, the 11-mer is kept intact.

Thus, maintaining the GQ motif, the secondary structure of the aptamer sequence was modeled with the m-fold, and the different loops of the secondary structure predicted were removed, resulting in different sequences: S2a, S2b, S2c, S2d, S40a, S40b, S40c, and S40d as outlined in Table 2b, and are depicted in the Electronic Supplementary Material (Fig. S1). The binding characteristics of these sequences were analyzed together with the main SGQ motif and its non-truncated counterparts (S2, S40) using SPR with β-conglutin immobilized on the surface of the CM5 Biacore chip (Fig. 1b). According to the technique employed, the RU obtained are proportional to the molecular weight of the molecule, and to make the responses comparable, the RU were normalized with the corresponding mass of the sequence. As can clearly be seen in Fig. 1b, the highest signal was obtained for the G-quadruplex structure of 11-mer (SGQ), and none of the structures predicted using m-fold resulted in aptamers of higher affinity.

The effect of truncation on the binding affinity of the SGQ aptamer was investigated, and the dissociation constant of the truncated 11-mer aptamer was observed to be markedly improved (Electronic Supplementary Material Fig. S2). The K D of the truncated GQ motif SGQ β-conglutin aptamer was determined using a Langmuir model, and a two orders of magnitude improvement in K D from 3.60 × 10−7 to 1.7 × 10−9 M was obtained. The good fit to the model was demonstrated by very low χ 2 values of 1.9. [Note: The χ 2 value is a standard statistical measure of the closeness of fit of data to the model used for elucidation of the K D, where for good fitting to ideal data, χ 2 is of the same order of magnitude as the noise in RU, typically <2].

To address the specificity of the truncated SGQ aptamer to β-conglutin binding, control proteins were selected to compare binding. The signal observed in the control channels with immobilized α-, γ-, and δ-conglutins and gliadin was negligible, demonstrating the high specificity of the SGQ aptamer (Fig. 2a). A mutated form of the SGQ (SGQm) was used as a negative control to demonstrate that the observed binding is related to the specific G-quadruplex structure of the 11-mer. Furthermore, the thrombin-binding aptamer (TBA) was used as another negative control, as an example of an aptamer containing a G-quadruplex structure. As can be seen in Fig. 2b, the signal observed for both negative controls was negligible, highlighting the high specificity of the SGQ aptamer for β-conglutin detection.

Fig. 2
figure 2

(a) SPR sensogram obtained for the SGQ 11-mer aptamer passed over the CM5 Biacore chip surfaces functionalised with the β-conglutin protein and negative controls (α-, γ-, δ-conglutins and gliadin). (b) SPR sensogram obtained with passing the SGQ sequence, the thrombin-binding aptamer, and a scrambled 11-mer sequence over a CM5 Biacore chip surface functionalised with ß-conglutin

To probe the presence of a G-quadruplex structure in the SGQ β-conglutin aptamer, a double pulsed field gradient selective excitation (DPFGSE) spectrometry for water suppression was carried out at 22 °C by KCl titration at 0, 10, 20, 30, 40, and 50 mM. The structure was stabilized at 40 mM of cations dissolved in ultrapure water. The experiment was repeated in the presence of 0, 20, and 40 mM KCl at 22 °C (Fig. 3) To analyze the effect of K+ on the G-quadruplex formation, the imino region of the 1H 1D NMR spectra was studied. At the lowest KCl concentration, at least two and possibly three different forms of quadruplex are co-populated (there are about 18 discernible peaks); as a consequence, a broad signal around 11.0 ppm is obtained (Fig. 3a). At 40 mM KCl, only a single form of quadruplex is populated, while the population of the alternative quadruplex forms decreases at higher concentrations of KCl, and the broad signal at 11.0 ppm becomes sharp (Fig. 3b, c). The six imino proton chemical shifts between 11.1 and 12.2 ppm resonate in the region characteristic of guanine imino protons involved in GGGG tetrad formation [35]. The effect of MgCl2 was also carried out using titration studies, but no particular effects were observed.

Fig. 3
figure 3

NMR spectrum of the 11-mer SGQ β-conglutin aptamer with sequence: GGTGGGGGTGG. 1H-NMR spectra were acquired at 500 MHz from HPLC-purified and lyophilized GQ sequence diluted in 90:10 % v/v H2O/D2O. The spectra were acquired at 22 °C in the absence (a) or in the presence of KCl at 20 mM (b) and 40 mM (c)

Conclusions

In conclusion, the identification of the high-affinity binding domain of the β-conglutin aptamer by truncating primers and identifying the stem-loop structure was carried out. Through comparing the predicted secondary structures of the aptamers, a hairpin structure with a G-rich loop was determined to be the binding motif of these aptamers. The highest binding affinity was observed with a truncation resulting in an 11-mer sequence showing an improvement of two orders of magnitude in the binding affinity for β-conglutin. This truncated aptamer also exhibited high specificity to β-conglutin with negligible binding affinity to the other lupin conglutins (α-, γ-, and δ-conglutins) and gliadin. The NMR results clearly indicate a G-quadruplex, and future work will probe the role of the β-conglutin target as a chaperone for quadruplex formation and development of analytical tools for the detection of the Lup an 1 allergen in foodstuffs, including biosensors based on fluorescent and electrochemical detection.