Introduction

Pathogenic bacteria such as Escherichia coli O157:H7, Yersinia enterocolitica and Salmonella typhimurium are the most common causes of food-borne illnesses, and their early identification is therefore required. The Centers for Disease Control and Prevention estimate that every year in the USA alone about 76 million people become ill from pathogens in food and about 5,000 people die [1–3].

Although several immunological or genetic methods such as enzyme-linked immunosorbent assay and polymerase chain reaction have already been developed for the detection of bacteria, rapid and reliable analytical methods for screening and identification of both known and unknown microorganisms are still needed. These methods should differentiate pathogenic from non-pathogenic bacteria and possibly identify biomarkers of pathogenicity.

Soft impact mass spectrometry (MS) techniques have emerged as powerful tools for detection of bacteria and discovery of biomarkers since the mid-1970s [4–12]. Nowadays, matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS represents the most widely used MS technique for rapid and specific detection of whole bacteria thanks to its ability to simultaneously analyse bacterial peptides and proteins in a broad mass range, generating a protein fingerprint of the target bacterium [13–15]. Moreover, bacteria-specific proteins can be identified by searching proteomics databases on the basis of their molecular weight or by MALDI-TOF/TOF analysis. A novel bioinformatics-based approach for identification of bacteria, based on matching protein molecular masses in the spectrum with those predicted from the genomic sequences, has recently been developed, but its application is limited to those microorganisms whose genomes have been sequenced [16, 17]. As an alternative, a bottom-up proteomics approach based on the analysis of proteolytic peptides from their proteins by MALDI or by tandem MS could be used for identification of microorganisms [18, 19]. Other research groups have developed alternative methods for the identification of bacteria which do not require the use of proteomics databases. For example, the whole mass spectrum can be considered as a bacterial protein fingerprint, thus allowing the comparison of such a spectrum with a library of spectra from known bacterial strains and the identification of unknown bacteria [20, 21]. However, the comparison between reference and unknown spectra requires a high level of spectral reproducibility, which is related both to experimental conditions and to instrumental setup and performance [22–26]. In particular, it was demonstrated that different growth conditions, such as the composition of the medium [27] or the culture growth time [28], can strongly change the bacterial protein expression profile.
For this reason constant culture conditions are pivotal. Moreover, factors such as signal-suppression effects, the choice of the matrix, the calibration standards used for the analysis or differences in instrumental performance and operators can induce variability in the MALDI mass spectra [26]. Recently Wunschel et al. [29] performed an interlaboratory comparison in order to evaluate the degree of reproducibility of MALDI mass spectra obtained from the analysis of the same microorganism using different commercial mass spectrometers and a standard experimental protocol. The results obtained in this study confirmed the feasibility of MALDI MS as a rapid technique for the analysis of bacteria but underlined the importance of the standardization of the experimental conditions to improve spectral reproducibility and allow identification of bacteria. Several algorithms for the comparison of MALDI MS spectra of whole bacterial cells have been developed, and these differ in the procedures used to compare and estimate the similarity between two spectra [13, 30–32]. In this work we propose an alternative approach for identification and classification of bacteria based on direct MALDI-TOF MS analysis and the use of chemometric techniques for the extraction of the useful information from a data set with many variables such as MALDI spectra [33]. Data handling in MS is complicated by the large number of variables (m/z), which usually exceeds the number of objects (spectra). Multivariate analysis was performed for the discrimination of two food-borne pathogenic bacteria, E. coli O157:H7 and Y. enterocolitica, and the non-pathogenic E. coli MC1061. To evaluate spectral variations due to culture conditions, the first step was the MALDI-TOF MS analysis of bacteria grown in different media and at different growth times. As a first attempt, principal component analysis (PCA) was used for the visualization of the spectra of the three bacterial strains.
It was observed that the MALDI mass spectra were strongly dependent on the growth conditions of the bacteria and on the characteristics of each strain.

Subsequently, linear discriminant analysis (LDA) as a classification technique and soft independent modelling of class analogy (SIMCA) as a class-modelling tool were used for the classification of the bacterial strains on the basis of the growth conditions and for building statistical models for the discrimination among different bacterial strains.

A feature-selection technique was used before applying both LDA and SIMCA to discard the variables without discriminant information and to maintain only those variables which really improve the classification ability.

To better establish the analytical performance of this approach, a combination of different chemometric techniques and MALDI-TOF MS was also used for the discrimination of six different wild-type strains of E. coli O157:H7. To perform a severe validation of the classification method, the Kennard-Stone algorithm was applied for the selection of the bacteria samples to be used as a representative test set [34].

The applicability of the MALDI-TOF MS method for the detection of bacteria in real samples was tested by spiking bovine meat samples with Y. enterocolitica and comparing the results with those obtained with a multiplexed sandwich chemiluminescent enzyme immunoassay (CL-EIA) previously developed by our research group [35].

Experimental

Bacteria growth

The bacteria used in this work were E. coli O157:H7 (ATCC 35150) and Y. enterocolitica (ATCC 23716), obtained from the American Type Culture Collection, Rockville, MD, USA, and E. coli MC1061 (pT0011), kindly provided by Marco Virta, Department of Biochemistry and Food Chemistry, University of Turku, Finland. E. coli MC1061 is a non-pathogenic strain of E. coli genetically modified to resist the antibiotic kanamycin. Bacterial cells were transferred from solid agar selective for the isolation of gram-negative bacteria (Herellea agar; Biolife, Milan, Italy) to 5–7 mL of Luria-Bertani (LB) broth (Bacto® LB broth, LENNOX; DIFCO Laboratories; Detroit, MI, USA), and were grown for 15–18 h at 37 °C. An aliquot of 50 μL of each culture was inoculated in 50 mL of LB broth or modified tryptic soy broth (mTSB) and bacteria were grown at 37 °C with shaking at 280 rpm for 24 h.

Six different E. coli O157:H7 wild-type strains (E. coli O157 E-D 416, E. coli O157 E-D 521, E. coli O157 E-D 412, E. coli O157 E-D 390, E. coli O157 E-D 268, E. coli O157:H7 2) were also studied. These bacteria were kindly provided by the Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “G. Caporale” Teramo, Italy.

Sample preparation for MALDI-TOF MS analysis

The cellular suspension was collected at different incubation times (4, 6, 9, 15, 18, 24 h), centrifuged at 3,000 rpm for 10 min, decanted and washed with 3 mL of 2% ammonium chloride. These steps were repeated three times. The purified suspension of bacteria was diluted with 2% ammonium chloride in order to achieve a concentration of approximately 10⁶ cells/μL and was analysed by MALDI-TOF MS.

To evaluate the effect of the culture conditions on spectral variations, E. coli O157:H7 and E. coli MC1061 were grown in different media: mTSB and LB broth for E. coli O157:H7 and LB broth with or without kanamycin for E. coli MC1061.

Spiked bovine meat samples were used to evaluate the analytical performance and the applicability of the method developed for the detection of bacteria.

Bovine meat samples (purchased from local food stores) were first assayed with conventional procedures to verify the absence of Y. enterocolitica. They were then divided into two portions: one portion was used as a negative control and one was artificially contaminated with Y. enterocolitica by using previously reported methods [36, 37]. The samples (25 g) were inoculated with the appropriate amount of bacteria (final concentration ranging from 0 to 10 CFU/g), homogenized in a stomacher bag, and incubated with 225 mL of buffered peptone water for 2 h at 37 °C. Subsequently, 1 mL of this solution was removed and incubated with 9 mL of Yersinia enrichment broth (Sigma Chemical Co., St. Louis, MO, USA) for 6 h at 37 °C under shaking, then assayed using the MALDI-TOF method developed. For each sample the concentration of bacteria was also verified by using a conventional microbiological reference method and a previously published method [35].

MALDI-TOF MS analysis

Positive-ion spectra were recorded in linear mode using a MALDI Voyager DE Pro (Applied Biosystems, Foster City, CA, USA) TOF instrument, equipped with a pulsed-N2 laser operating at 337 nm. On the MALDI target plate, 1 μL of bacterial suspension was mixed with 1 μL of a solution of 2% (w/v) ferulic acid (Sigma Chemical Co., St. Louis, MO, USA) in 50% acetonitrile (Carlo Erba, Milan, Italy), 20% formic acid (Merck, Darmstadt, Germany) and 30% water (Milli-Q Plus, Millipore, Bedford, MA, USA). Mass spectra were acquired in linear mode over an m/z range from 4,000 to 17,000 using a 25,000-V accelerating voltage, a 23,500-V grid voltage and a delay extraction time of 400 ns. Mass spectra were externally calibrated using cytochrome c (Sigma Chemical Co.) singly and doubly charged ions. Each spectrum was the average of 120 consecutive laser shots. The total number of spectra recorded was 935. In particular, 366 spectra were acquired for E. coli MC1061 (174 for bacteria grown in LB broth and 192 for bacteria grown in LB broth containing kanamycin), 379 spectra for E. coli O157:H7 (189 for bacteria grown in LB broth and 190 for bacteria grown in mTSB) and 190 spectra for Y. enterocolitica.

Data analysis

MALDI spectra were exported in ASCII format. Because of the high resolution of the TOF analyser, the number of m/z variables in the selected mass range exceeded 20,000; therefore, spectra were subjected to a preprocessing procedure in order to reduce the number of variables and to eliminate errors in the original data (e.g. negative intensity values). For this purpose a calculation module was developed to reduce the variables to a number set by the operator; in this study the number of variables in the output file was set to 500. The module divided the mass range into adjacent intervals overlapping by 50% and calculated the average intensity within each interval. Moreover, the module identified categories, that is, groups of spectra of bacteria sharing an identical fixed parameter (e.g. the growth time). The output file was a data matrix composed of as many rows as spectra (935) and as many columns as variables (500).
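The variable-reduction step can be illustrated with a minimal sketch. It is not the authors' calculation module: the function name bin_spectrum, the choice of clipping negative intensities to zero and the exact placement of the windows are assumptions made only for illustration.

```python
def bin_spectrum(mz, intensity, n_bins, mz_min, mz_max):
    """Average intensities over n_bins windows that overlap by 50%.

    Hypothetical helper: window k spans
    [mz_min + k * step, mz_min + (k + 2) * step), so consecutive windows
    share half of their width and the last window ends at mz_max.
    """
    step = (mz_max - mz_min) / (n_bins + 1)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for m, y in zip(mz, intensity):
        if not (mz_min <= m <= mz_max):
            continue
        y = max(y, 0.0)  # assumption: negative intensities are clipped to zero
        j = min(int((m - mz_min) / step), n_bins)
        # Each point lies in at most two overlapping windows: j - 1 and j.
        for k in (j - 1, j):
            if 0 <= k < n_bins:
                sums[k] += y
                counts[k] += 1
    # Empty windows are reported as zero intensity.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


if __name__ == "__main__":
    reduced = bin_spectrum([0.5, 1.5, 2.5, 3.5], [1.0, 3.0, 5.0, -7.0],
                           n_bins=3, mz_min=0.0, mz_max=4.0)
    print(reduced)
```

Applied to each exported spectrum with n_bins=500 over the m/z range 4,000 to 17,000, a function of this kind would yield one 500-value row of the data matrix per spectrum.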

All the data were subjected to a row profile pretreatment to correct the instrumental variability [38], and multivariate data analysis using the V-PARVUS program was performed [39].
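As a rough illustration of a row profile pretreatment, the sketch below divides every intensity by the sum of its own spectrum, so that spectra differing only in overall signal level become identical. This is the common definition of a row profile and is assumed here; the exact scaling used by V-PARVUS (for example, whether values are expressed as percentages) is not stated in the text, and the function name row_profile is hypothetical.

```python
def row_profile(matrix):
    """Scale each row (spectrum) so that its values sum to 1."""
    profiled = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # A spectrum with no signal cannot be scaled; keep it as zeros.
            profiled.append([0.0] * len(row))
        else:
            profiled.append([value / total for value in row])
    return profiled


if __name__ == "__main__":
    # Two spectra with the same shape but a tenfold difference in intensity.
    print(row_profile([[1, 1, 2], [10, 10, 20]]))
```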

As a display technique, PCA [40] was performed on the centred data to extract and visualize the useful information from the data set. Afterwards LDA [41] was applied as a classification method, after a variable selection performed using SELECT [42], a feature-selection technique based on the stepwise decorrelation of the variables implemented in the V-PARVUS program [39].
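To make the PCA display step concrete, the following sketch centres the columns of a small data matrix and extracts the first principal component by power iteration, together with the fraction of total variance it explains. It is a simplified stand-in for the V-PARVUS implementation: the function name, the starting vector and the number of iterations are assumptions, and only the first component is computed.

```python
import math


def first_principal_component(X, n_iter=200):
    """Return (loadings, scores, explained_fraction) for PC1 of centred data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]

    # Deterministic, non-uniform starting vector (an arbitrary choice).
    v = [1.0 / (j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        # w = Xc^T (Xc v), i.e. one multiplication by the scatter matrix.
        w = [sum(scores[i] * Xc[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in the data
        v = [x / norm for x in w]

    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    total = sum(x * x for row in Xc for x in row)
    explained = sum(s * s for s in scores) / total if total else 0.0
    return v, scores, explained


if __name__ == "__main__":
    loadings, scores, explained = first_principal_component(
        [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
    print(loadings, scores, explained)
```

Plotting the scores on the first component against those on the second (obtained after removing the first from the data) gives a score plot of the kind used for visualization.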

The prediction ability was computed by the cross-validation method using five cancellation groups: the data matrix was divided into training and prediction sets, and the classification rule was computed five times, each time with the objects of four cancellation groups. The classification rate was computed as the percentage of training set objects correctly classified, while the prediction rate was computed as the percentage of the prediction set objects correctly classified by the rule previously developed. Five cancellation groups were selected because this allows a more realistic and not too optimistic evaluation of the prediction rate [43]. The application of both LDA and SELECT chemometric techniques to the analysis of MALDI mass spectra and the validation procedures have already been discussed in a previous paper from our research group [33].
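The validation scheme with five cancellation groups can be sketched as follows. Two points are assumptions made for illustration only: objects are assigned to groups by their index modulo five, and a nearest-centroid rule is used as a simple stand-in for LDA. All function names are hypothetical.

```python
def fit_centroids(X, y):
    """Stand-in classifier (not LDA): store the mean vector of each class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_class.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def cross_validate(X, y, n_groups=5):
    """Return (classification rate %, prediction rate %)."""
    train_hits = train_total = pred_hits = 0
    for g in range(n_groups):
        # Assumption: object i belongs to cancellation group i % n_groups.
        train = [i for i in range(len(X)) if i % n_groups != g]
        left_out = [i for i in range(len(X)) if i % n_groups == g]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        train_hits += sum(predict(model, X[i]) == y[i] for i in train)
        train_total += len(train)
        pred_hits += sum(predict(model, X[i]) == y[i] for i in left_out)
    return 100.0 * train_hits / train_total, 100.0 * pred_hits / len(X)


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
    y = ["a"] * 10 + ["b"] * 10
    print(cross_validate(X, y))
```

Each object is left out exactly once, so the prediction rate is based on objects that did not contribute to the rule that classified them.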

Finally, to fix the boundaries of the “class space” for the three bacteria species analysed, SIMCA was applied as a class-modelling technique [44, 45]. Class-modelling techniques compute a “class model” and for each object a “distance from the model”. Statistical or empirical rules are used to define a maximum permitted distance from the model. An object is accepted by the class model (i.e. it can belong to the class) when its distance from the class model is less than the maximum permitted distance.

SIMCA builds each class model on the basis of the principal components of that category, which are generally computed after separate category autoscaling.

The sensitivity of the model of class A is the fraction of the objects belonging to class A accepted by the model, while the total specificity of the model of class A is the fraction of objects belonging to other classes (not class A) rejected by the model.
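These two definitions translate directly into a short calculation. In the sketch below, the function name and the input format (a list of true class labels and a parallel list stating whether the model of the target class accepted each object) are assumptions for illustration.

```python
def sensitivity_specificity(true_labels, accepted, target):
    """Sensitivity and total specificity of the model of class `target`.

    accepted[i] is True when object i is accepted by the model of `target`.
    """
    own = [a for t, a in zip(true_labels, accepted) if t == target]
    others = [a for t, a in zip(true_labels, accepted) if t != target]
    # Fraction of the class's own objects that the model accepts.
    sensitivity = sum(own) / len(own)
    # Fraction of objects from other classes that the model rejects.
    specificity = sum(not a for a in others) / len(others)
    return sensitivity, specificity


if __name__ == "__main__":
    labels = ["A", "A", "A", "A", "B", "B", "C", "C"]
    accepted_by_A = [True, True, True, False, False, True, False, False]
    print(sensitivity_specificity(labels, accepted_by_A, "A"))
```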

The model of SIMCA is a parallelepiped in the space of the first components, delimited by the range of the scores. The results presented in this work were obtained using three principal components and were displayed using Coomans plots [46]. In these plots the coordinates of each object are the distances from the models of two classes (their ratio to the critical distance corresponding to the boundary of the class space). The critical distances (95%) are drawn in the plot so that it is divided into four parts. The objects in the lower-left square are accepted by the models of the two classes because they have small distances from both models, whereas the objects rejected by both models are in the upper-right square. The objects accepted by only one model fall in the other two parts (rectangles) of the Coomans plot.
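The reading of a Coomans plot described above can be expressed as a small helper that converts two distances into plot coordinates and names the region in which the object falls. The function name and the returned labels are hypothetical; the distances and critical distances are assumed to come from previously fitted class models.

```python
def coomans_region(d1, d2, crit1, crit2):
    """Return (x, y, region) for an object in a Coomans plot.

    d1, d2: distances of the object from the models of classes 1 and 2.
    crit1, crit2: critical distances of the two models (e.g. at 95%).
    """
    # Coordinates are ratios to the critical distance, so the class
    # boundaries sit at x = 1 and y = 1.
    x, y = d1 / crit1, d2 / crit2
    in_class_1, in_class_2 = x < 1.0, y < 1.0
    if in_class_1 and in_class_2:
        region = "accepted by both"        # lower-left square
    elif in_class_1:
        region = "accepted by class 1 only"
    elif in_class_2:
        region = "accepted by class 2 only"
    else:
        region = "rejected by both"        # upper-right square
    return x, y, region


if __name__ == "__main__":
    print(coomans_region(0.5, 3.0, 1.0, 2.0))
```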

Results and discussion

Factors such as growth conditions or sample preparation can affect the reproducibility of MALDI spectra, which could compromise the identification of microorganisms [27, 28].

Since keeping the culture conditions constant is a challenge, we applied chemometric techniques able to minimize spectral variation due to culture conditions while maximizing differences among bacterial species.

Subsequently LDA and SIMCA were performed to discriminate between the three different bacterial species despite the spectral variability due to the culture conditions.

Effect of culture growth time on spectral variations

To evaluate the effect of growth time on the bacterial protein expression profiles, whole-cell suspensions of E. coli O157:H7, E. coli MC1061 and Y. enterocolitica collected at different incubation times (4, 6, 9, 15, 18, 24 h) were analysed by MALDI-TOF MS (an example of the mass spectra acquired by analysing suspensions of E. coli MC1061 grown in LB medium containing the antibiotic kanamycin is shown in Fig. S1).

The comparison between mass spectra acquired at different growth times shows differences mainly in the relative intensities of the ions, particularly at early growth times. However, there is a series of peaks that reproducibly appears in all the mass spectra. As mentioned also by Arnold et al. [28], the variations observed could be partially ascribed to the modification of the composition of the medium, such as consumption of nutrients and oxygen during the bacterial growth cycle. To better appreciate such differences, a chemometric approach was used and PCA was applied as a display method. Figure 1a shows the score plot obtained by PCA of the centred data for E. coli MC1061: the first two principal components explain 60.3% of the total variance. A partial discrimination among spectra acquired at different growth times was observed also for E. coli O157:H7 and for Y. enterocolitica (data not shown). For all three bacterial species analysed the most important variability (explained by the first principal component) was due to the culture time effect.

Fig. 1 Score plot of the first two principal components (a) and the first versus the second canonical variable of linear discriminant analysis (LDA) (b) of Escherichia coli MC1061 mass spectra recorded at different growth times. Bacteria were grown in Luria-Bertani broth containing kanamycin. The culture growth times are given on the right

Furthermore, after selection of variables performed using the SELECT algorithm, for each bacterial species LDA was performed, always retaining the 15 variables with the largest classification weights. Selection of features is an important aim of chemometrics: its goal is to discard variables without discriminant information and to maintain only those variables which really improve classification ability. A graphical display of the LDA results obtained for E. coli MC1061 is shown in Fig. 1b.

The LDA method allowed the discrimination between mass spectra recorded at different growth times for all three bacterial species considered. In particular, a better discrimination was observed among spectra recorded at early times (4, 6, 9 h) of culture growth. The smaller differences among the spectra acquired at 15, 18 and 24 h after inoculation could be due to the bacterial growth reaching the stationary phase. This is a steady-state equilibrium where the rate of cell growth is exactly balanced by the rate of cell death: the signals observed in the MALDI mass spectra recorded at these culture times exhibit much smaller differences since they probably represent an average of the proteins expressed by cells during both division and death. The percentages of correct classification and internal prediction for these bacterial species analysed at different growth times (calculated using five cancellation groups) were, respectively, 99.6 and 98.5% (see Table S1 for the classification and prediction rates for all the bacterial species and all the growth times).

Effect of the composition of the medium on spectral variation

The effect of the composition of the medium on spectral variations was studied by comparing spectra of E. coli O157:H7 grown in LB broth or in mTSB and spectra of E. coli MC1061 grown in LB broth or LB broth containing kanamycin (representative MALDI spectra for each microorganism grown in the different culture media are shown in Fig. S2). LDA was performed on the acquired mass spectra after the selection of ten features. A clear discrimination between bacteria grown in different media was observed for both E. coli O157:H7 and E. coli MC1061: in the case of E. coli MC1061, 99.7% of samples were correctly classified and correctly predicted by the cross-validation procedure with five cancellation groups, while for E. coli O157:H7 100.0% of samples were correctly classified and predicted.

In both cases the observed spectral variation can be ascribed to the different protein expression as a consequence of changes in bacterial metabolism induced by the different culture media.

On the basis of these results it is possible to conclude that MALDI mass spectra of bacteria change with growth time and with the composition of the medium. Our results are in agreement with those of other research groups [27, 28]. In addition, we expect that other parameters linked to bacterial growth (e.g. temperature, growth medium volume, pH) can affect the protein profiles and therefore the spectra.

Discrimination between different bacterial species

The spectral variations observed when analysing bacteria grown under different conditions can hinder the identification of microorganisms. To discriminate among E. coli O157:H7, E. coli MC1061 and Y. enterocolitica despite the spectral variations, all the spectra acquired in the previously described experiments were used to create a new data set and both LDA and SIMCA were performed as classification tools on the new data set after selection of variables. LDA classification and prediction abilities obtained using ten variables retained by SELECT were both 99.9%. The graphical display of the LDA results (Fig. 2) shows a complete separation among the bacterial species.

Fig. 2 The first versus the second canonical variable of LDA for mass spectra of E. coli MC1061, E. coli O157:H7 and Yersinia enterocolitica. The three bacterial species are indicated by their class symbol: MC (E. coli MC1061), H7 (E. coli O157:H7) and Y (Y. enterocolitica)

Furthermore, SIMCA was applied to the ten variables selected by SELECT to build class models useful for predicting the type of bacteria; each class was suitably modelled by using three principal components and a 95.0% confidence level. The results obtained were validated using five cancellation groups.

Internal prediction rate, sensitivity (the non-error rate for each category) and specificity (the percentage of objects of the other classes rejected by the class model under study) are shown in Table 1. The classification and prediction abilities were always very high: on average, 99.0% of samples were correctly classified and a total internal prediction of 99.0% (five cancellation groups) was achieved. The class models built using SIMCA showed a high specificity, in particular the model of class 3 (Y. enterocolitica).

Table 1 Correct classification and prediction ability, sensitivity and specificity of soft independent modelling of class analogy (SIMCA) for Escherichia coli MC1061, E. coli O157:H7 and Yersinia enterocolitica

Coomans plots were used to display the results of SIMCA (Fig. 3), but when all three categories were displayed on the same Coomans plot, the results were confusing. Thus, to make the visualization clearer, the categories were displayed in pairs and three different plots were drawn. Figure 3a shows the discrimination between classes E. coli MC1061 and E. coli O157:H7, Fig. 3b shows that between E. coli MC1061 and Y. enterocolitica and Fig. 3c shows that between E. coli O157:H7 and Y. enterocolitica.

Fig. 3 Coomans plots of the results of soft independent modelling of class analogy. a Discrimination between E. coli MC1061 (1) and E. coli O157:H7 (2); b discrimination between E. coli MC1061 (1) and Y. enterocolitica (3); c discrimination between E. coli O157:H7 (2) and Y. enterocolitica (3)

Discrimination between different E. coli O157:H7 wild-type strains

Finally, the possibility of discriminating among different wild-type strains of bacteria was evaluated. For this purpose six different E. coli O157:H7 wild-type strains (E. coli O157 E-D 416, E. coli O157 E-D 521, E. coli O157 E-D 412, E. coli O157 E-D 390, E. coli O157 E-D 268, E. coli O157:H7 2), grown in LB broth for 6 h, were analysed by MALDI-TOF MS. The data set (composed of 95 spectra and 500 variables) was divided into two subsets using the Kennard-Stone algorithm [34]. This algorithm allows one to select objects in a multivariate space under the criterion that the sampled points are the best possible representation of the population of all points. The objects having the largest distance in the multivariate space were assigned to the test set in order to have a severe validation of the classification rule. The first data set (63 spectra) was used as a training set in order to build the chemometric rule, while the second set (32 spectra) was used as an external test set in order to validate it. To visualize and rationalize the Kennard-Stone algorithm results, PCA was performed on the autoscaled data.
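The selection principle of the Kennard-Stone algorithm can be sketched in its standard textbook form: start from the two most distant objects, then repeatedly add the object whose smallest distance to the already selected objects is largest. The function name and the use of Euclidean distance are assumptions; the code returns the selected and the remaining indices and leaves open which subset serves as test set (in the study described here, the selected distant objects formed the 32-spectrum test set).

```python
import math


def kennard_stone(X, n_select):
    """Return (selected, remaining) object indices chosen by maximin distance."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of objects")
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Start with the pair of objects that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]

    while len(selected) < n_select:
        # Pick the object farthest from its nearest already-selected object.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


if __name__ == "__main__":
    print(kennard_stone([[0.0], [1.0], [2.0], [10.0]], 3))
```

Because the selected objects span the extremes of the data, using them as the test set is a demanding check: the rule is validated on the spectra that differ most from one another.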

Twelve variables were selected using the SELECT algorithm and LDA was applied on the reduced matrix. A graphical display of the LDA results is shown in Fig. 4, where the objects are projected on the first two canonical variables. The percentages of correct classification and of total internal prediction were, respectively, 100 and 98.4% (using five cancellation groups). The results obtained demonstrated that it is possible to discriminate among different E. coli O157:H7 wild-type strains on the basis of their mass spectra.

Fig. 4 The first versus the second canonical variable of LDA for mass spectra of different E. coli O157:H7 wild-type strains (E. coli O157 E-D 416, E. coli O157 E-D 521, E. coli O157 E-D 412, E. coli O157 E-D 390, E. coli O157 E-D 268, E. coli O157:H7 2)

Method applicability

The applicability of the method was evaluated by using spiked meat samples. A total of ten artificially contaminated samples were analysed for the presence of Y. enterocolitica with both the MALDI-TOF method developed and a previously published CL-EIA method. No false-negative or false-positive results were obtained and for all samples the bacterial strains detected using the MALDI-TOF method corresponded to those revealed by the CL-EIA method.

Conclusions

The chemometric approach developed, combined with MALDI-TOF MS analysis of whole bacteria, proved suitable for a complete and specific classification of the two pathogenic bacteria E. coli O157:H7 and Y. enterocolitica and the non-pathogenic E. coli MC1061.

The use of the SELECT variable-selection algorithm minimizes the variability associated with growth time and experimental parameters, permitting for each bacterium the evaluation of a pattern of proteins common to all growth conditions. These bacteria-specific protein fingerprints allow the identification of the bacteria by multivariate analysis approaches such as PCA, LDA and SIMCA independently of these parameters.

The analysis of bovine meat samples spiked with Y. enterocolitica and the comparison of the results with those obtained with a multiplexed sandwich CL-EIA demonstrated the applicability of the method for the identification of bacteria in real samples.

The combination of the proposed chemometric approach and the automation of MALDI-TOF MS analysis is well suited for rapid routine safety controls. In particular, despite the pre-enrichment step normally needed for the analysis of bacteria present in real samples, it is faster and cheaper than immunometric methods since no antibodies are used, and the risk of false-positives due to cross-reactivity is drastically reduced because a protein fingerprint is used instead of a single epitope for the identification of each bacterium. The specificity of the identification could be further increased by the use of MALDI-TOF/TOF platforms.

Furthermore, the MALDI-TOF MS profiling approach developed also allowed us to discriminate among different strains of the same bacterium, as shown for E. coli O157:H7 wild-type strains. These results demonstrate the advantage of this method over immunoassays, which cannot distinguish among wild-type strains.

The overall results suggest that direct MALDI MS analysis of whole bacteria needs to be combined with a chemometric approach which allows their identification independently of experimental growth conditions, thus rendering this method suitable for application to real samples.