Introduction

Winemaking entails a complex interaction between microorganisms and grapes. Grape berries teem with filamentous fungi, yeasts and bacteria. Research has deepened our understanding of some of their physiological activities, such as the alcoholic fermentation conducted primarily by strains of Saccharomyces cerevisiae. However, the use of non-Saccharomyces yeasts in winemaking is nascent, as their sensory contributions are only now being discovered. The great majority of the grape flora do not survive the alcoholic fermentation, the primary exception being certain lactic acid bacteria (LAB). LAB can substantially affect the final products of fermentation, both positively and negatively. Winemakers aim to control such influences and guide fermentations toward a desired final product. However, winemakers currently lack tools that provide rapid, specific identification of LAB, which hinders their ability to make informed decisions. Here, we investigate Raman spectroscopy and chemometrics as a potential solution to this problem.

Lactic acid bacteria are gram-positive bacteria found in dairy products and decaying plant material, and as part of the human microflora [2]. They are extensively used in fermentations including cheese, yogurt, processed meats, pickled vegetables, beer, and wine [2]. Representatives of four genera of LAB are found in wineries: Lactobacillus, Pediococcus, Leuconostoc and Oenococcus. Some of these bacteria may be deliberately introduced into wineries by inoculation, but all of them can inadvertently contaminate wineries because they are found on grapes [10]. The proportion of each LAB species found in the vineyard is influenced by grape variety, climate, and season-to-season variation [4]. Most LAB in grape juice do not survive the alcoholic fermentation [14]. Oenococcus oeni is the major survivor, although Lactobacillus and Pediococcus species are also found after the alcoholic fermentation [13, 34].

In the so-called malolactic fermentation (MLF), O. oeni and species of Lactobacillus and Pediococcus decarboxylate L-malic acid to the softer-tasting L-lactic acid [5]. O. oeni is the preferred MLF agent for reducing the acidity of excessively acidic wines because of its ethanol and pH tolerance and its more desirable sensory products. Winemakers typically induce MLF by inoculating with freeze-dried preparations of O. oeni after the alcoholic fermentation, although simultaneous inoculation is also practiced [18]. Many O. oeni strains are available commercially. Winemakers also encourage MLF to stabilize wines, that is, to reduce the possibility of an unintended MLF occurring in the bottle, which can result in unwanted effervescence, haze and off-flavors.

In addition to the acidity reduction and increased microbial stability of an intended MLF, inoculation with a preferred strain is desirable for the sensory ‘complexity’ it adds to a wine [23]. The most important compounds produced during the growth of LAB in wine are diacetyl, which adds a buttery or nutty note, and acetic acid, which adds to complexity at low levels [33]. LAB are potentially rich sources of glycosidase enzymes that function at wine pH. Glycosidase activity is important in flavor enhancement because many of the fruity, flowery aroma compounds derived from grapes, especially monoterpenes and norisoprenoids, are flavorless unless the bound sugar moiety is removed [31].

The growth of LAB in wine may also negatively affect wine quality. In low-acid wines, reduction of acidity by MLF may be detrimental, both on a sensory level and by encouraging the growth of Pediococcus and Lactobacillus spp. that prefer a higher wine pH. Diacetyl may be undesirable in some wine styles, and acetic acid at high concentrations adversely affects wine quality. The so-called “ferocious” Lactobacillus kunkeei [8] can produce as much as 4–5 g/L acetic acid in juice, not only imparting a vinegary note to the resulting wine but also potentially causing a stuck or sluggish alcoholic fermentation [9]. LAB, particularly L. hilgardii, can produce one or more of the acetyltetrahydropyridines responsible for the ‘mousy’ off-flavor in wine [27]. P. parvulus strains are the major culprits in the development of ‘ropiness’, an unappealing, slimy texture caused by the synthesis of a high molecular weight β-glucan [7]. L. plantarum strains have been shown to produce the volatile phenols 4-vinylphenol and 4-ethylphenol, which are associated with Brettanomyces spoilage [11].

The growth of LAB in wine can also have health implications. Biogenic amines, such as histamine, putrescine, cadaverine, and tyramine, can be synthesized by LAB [15]. L. hilgardii and P. parvulus are the major histamine producers in wine [18]. These compounds are of concern because they can cause physiological effects such as headaches, respiratory difficulty, and severe allergic disorders [26, 29]. Mycotoxins, many of which are carcinogenic, are secondary metabolites produced by molds. One of these, ochratoxin, has been detected in grapes and wine [3], and P. parvulus is able to degrade ochratoxin in grape must [1]. Citrulline, a breakdown product of arginine, has been shown to be a precursor of ethyl carbamate, a carcinogen found in wine [17]. Strains of L. hilgardii, L. buchneri, L. brevis, and O. oeni have been found to produce citrulline by degrading arginine [6].

Given the great metabolic diversity of wine LAB, and their consequently myriad effects on wine, precise knowledge of the types of LAB present is critical for the winemaker’s control of the finished product. Most wineries check for LAB only to confirm that their concentration is low enough to meet bottling standards, i.e. to avoid plugging the filter. The demands of molecular genetic techniques, i.e. expensive reagents, time-consuming sample preparation, and highly skilled personnel, limit a winery’s ability to identify LAB in its wines, even at the species level.

The aim of this study was to develop a simple method that wineries could use to identify lactic acid bacteria in wine. Raman spectroscopy is a highly sensitive method for discriminating, classifying and identifying bacteria down to the strain level [30]. Raman spectra provide information about the biochemical composition of cells that can be used to classify species and strains. In addition, Raman spectroscopy has proven useful for monitoring many chemical processes, such as vinegar fermentation [32], rice wine fermentation [35], and yogurt production [22].

A Raman spectrum contains two basic regions. In the fingerprint (FP) region, 1500–400 cm−1, every organic compound in a sample produces a unique pattern or “fingerprint”; this region is valuable but complex because of its many interacting vibrations. Bands in the group frequency (GF) region, 3500–1500 cm−1, indicate the presence of specific functional groups, such as COOH or NH. An exact Raman shift cannot be assigned to a bond, because the frequency at which that bond vibrates depends on its chemical environment. The range for a bond, e.g. C–H stretching at 3000–2800 cm−1, is obtained by observing the bond in many different molecules that contain it. In spectra of cells or other complex biological mixtures, individual bands may be obscured by broad bands from other bonds. The complexity of such spectra makes quantitative and qualitative interpretation difficult, hence the need for multivariate analysis techniques.

Although much research has employed Raman spectroscopy to identify medically important bacteria, less has been done with food- and beverage-related bacteria [25]. Raman spectroscopy has been combined with various multivariate analytical tools, including support vector machines (SVMs), to classify LAB found in yogurt: Lactobacillus acidophilus, L. delbrueckii, and Streptococcus thermophilus [12]. Kefir LAB (L. kefir, L. parakefir, and L. brevis) were discriminated by Raman spectroscopy using principal component analysis and partial least squares discriminant analysis [20]. In this study, we develop a Raman and SVM-based method for the rapid discrimination of three groups of lactic acid bacteria found in wine: Pediococcus, Lactobacillus, and O. oeni.

Materials and methods

Bacteria and culture conditions

The bacterial strains used in this study were obtained from various culture collections and from commercial liquid or freeze-dried preparations (Table 1). Bacteria were stored at −20 °C in Microbank™ vials (Pro-Lab Diagnostics, Austin, TX, USA) containing cryoprotectant. Strains were grown from a Microbank™ bead at 30 °C on Difco™ UBA plates (Becton–Dickinson, Sparks, MD, USA) supplemented with 0.5 g/L cysteine-HCl and 1 mL/L Tween 80. Twenty-four subcultures of each strain were prepared from the bead plates and incubated at 30 °C for 4 days (Lactobacillus and Pediococcus strains) or 5 days (O. oeni strains, which required the extra day to reach the same level of growth).

Table 1 Bacterial strains used in this study

Raman measurements

A loopful of cell mass from a subculture plate was suspended in 1.5 mL filtered PBS (pH 7.4; Santa Cruz Biotechnology, Santa Cruz, CA, USA) in a 1.7 mL microcentrifuge tube and centrifuged at 6708 × g for 3 min. The cell pellet was resuspended in 1.5 mL PBS; the turbidity of the suspensions was not adjusted. One mL of suspension was pipetted into a glass cuvette (VWR shell vial, Radnor, PA, USA). Cuvettes were placed in a DeltaNu Advantage 532 Raman spectrometer (DeltaNu, Laramie, WY, USA) equipped with a frequency-doubled Nd:YAG excitation laser emitting at 532 nm with a spot diameter of 35 µm, operated at medium power (30 mW). The instrument was calibrated daily with a polystyrene standard before samples were run, and cyclohexane was measured beforehand to check the baseline and peak positions. After the cuvette was inserted into the cell holder, the sample holder was covered with optical cloth to exclude extraneous light. Spectra were acquired over a Stokes Raman shift range of 3400–200 cm−1 at 15 cm−1 resolution; this low-resolution setting was chosen to optimize the signal-to-noise ratio. For each of the 24 subcultures of each strain, ten spectra with a 5 s integration time were collected and averaged, giving a total of 456 averaged spectra (19 strains × 24 subcultures).
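Because the spectrometer reports Raman shifts in wavenumbers (cm−1) rather than absolute wavelengths, it can help to see where the acquisition range falls in terms of scattered light. The short Python sketch below (the function name is our own, for illustration only) converts a Stokes Raman shift into the absolute wavelength of the scattered light for a given excitation line, using shift = 10^7/λ_excitation − 10^7/λ_scattered with wavelengths in nm.

```python
def raman_shift_to_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Absolute wavelength (nm) of Stokes-scattered light for a given Raman shift.

    The Raman shift is the difference between the excitation and scattered
    wavenumbers: shift = 1e7/excitation_nm - 1e7/scattered_nm (1 nm = 1e-7 cm).
    """
    excitation_cm1 = 1e7 / excitation_nm        # laser line expressed as a wavenumber
    scattered_cm1 = excitation_cm1 - shift_cm1  # Stokes light has lower energy than the laser
    return 1e7 / scattered_cm1


if __name__ == "__main__":
    # Ends of the acquisition range, with 532 nm excitation
    for shift in (200, 3400):
        print(shift, round(raman_shift_to_wavelength_nm(532, shift), 1))
```

With 532 nm excitation, the 200 and 3400 cm−1 ends of the range correspond to scattered light at roughly 537.7 and 649.5 nm.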

Statistical analysis

Data were preprocessed and analyzed using the statistical computing language R [21]. For transparency and reproducibility, the analysis code and raw spectral data are freely available on the Open Science Framework (https://osf.io/9sx2e/). Three preprocessing procedures were applied before classification, following earlier work on the classification of yeast strains via Raman spectroscopy [24]. First, background fluorescence due to the biological nature of the samples was removed via a polynomial subtraction procedure [16]. In this procedure, a fifth-order polynomial was repeatedly fit to each spectrum. On each iteration, a new data curve was formed by taking the pointwise minimum of the polynomial and the previous data curve. The process terminated when the data curve no longer changed from one iteration to the next, and the final polynomial was subtracted from the original data curve to produce the fluorescence-corrected spectrum. Second, the data were normalized by a standard normal variate (SNV) transform, which scaled each wavenumber to a mean of 0 and a standard deviation of 1. Finally, multivariate outliers were removed via a principal component analysis (PCA) based approach. For every sample, standardized scores were calculated on each principal component with an eigenvalue greater than 1. Samples whose Mahalanobis distance over these scores exceeded the mean distance by more than three standard deviations were eliminated, resulting in the rejection of nine samples.
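The three preprocessing steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' R code (which is available at the OSF link above). The convergence tolerance and iteration cap in `modpoly_baseline` are hypothetical choices. Note that `snv` implements the conventional SNV transform, which centres and scales each spectrum (row) across its wavenumbers; `pca_outliers` assumes the eigenvalue > 1 rule is applied to the correlation matrix (i.e. after standardizing each wavenumber), and flags samples whose distance exceeds the mean distance by three standard deviations.

```python
import numpy as np


def modpoly_baseline(shift, intensity, degree=5, max_iter=500, tol=1e-6):
    """Iterative polynomial fluorescence baseline.

    Repeatedly fits a polynomial, replaces the working curve by the pointwise
    minimum of curve and fit, and stops once the curve is (nearly) unchanged.
    Returns (corrected spectrum, final polynomial baseline).
    """
    y = np.asarray(intensity, dtype=float)
    curve = y.copy()
    scale = (y.max() - y.min()) or 1.0
    for _ in range(max_iter):
        # Polynomial.fit rescales x internally, keeping a degree-5 fit well conditioned
        baseline = np.polynomial.Polynomial.fit(shift, curve, degree)(shift)
        clipped = np.minimum(curve, baseline)
        if np.max(np.abs(clipped - curve)) <= tol * scale:
            break  # the curve was no longer adjusted
        curve = clipped
    return y - baseline, baseline


def snv(spectra):
    """Conventional SNV: centre and scale each spectrum (row) to mean 0, SD 1."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / sd


def pca_outliers(X, n_sd=3.0):
    """Flag rows whose Mahalanobis distance over the retained PCs
    (eigenvalue > 1 on the correlation matrix) exceeds mean + n_sd * SD."""
    X = np.asarray(X, dtype=float)
    col_sd = X.std(axis=0, ddof=1)
    col_sd[col_sd == 0] = 1.0              # guard against constant columns
    Z = (X - X.mean(axis=0)) / col_sd      # standardize each wavenumber
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)      # eigenvalues of the correlation matrix
    keep = eigvals > 1.0
    # Standardized scores: PC scores (U * s) divided by sqrt(eigenvalue)
    z_scores = (U[:, keep] * s[keep]) / np.sqrt(eigvals[keep])
    dist = np.sqrt((z_scores**2).sum(axis=1))
    return dist > dist.mean() + n_sd * dist.std(ddof=1)
```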

Strains were classified using a one-against-one multiclass linear SVM from the LIBSVM implementation in R. SVM classifiers have previously been used to successfully classify wine spoilage yeast and lactic acid bacteria [12]. Because SVMs maximize the margin between classes, they are well suited to high-dimensional data such as spectra and are comparatively resistant to overfitting. Strain labels were used as the basis for the primary classification analysis, yielding a 19-class analysis. Full leave-one-out cross-validation was used to assess the generalizable accuracy of the model.
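The one-against-one scheme and the leave-one-out loop can be sketched as follows. To keep the example self-contained, a small linear SVM trained by Pegasos-style subgradient descent stands in for the LIBSVM solver used in the study; the regularization strength `lam`, the number of epochs, and all function names are illustrative assumptions. One binary classifier is trained for every pair of classes (171 pairs for 19 strains), and each held-out spectrum is assigned to the class that wins the most pairwise votes.

```python
import itertools

import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Binary linear SVM (labels +1/-1) trained by Pegasos subgradient descent.

    A constant feature is appended so the bias is learned (and, unlike LIBSVM,
    also regularized).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1  # hinge loss is active for this sample
            w *= 1.0 - eta * lam               # shrinkage step from the L2 penalty
            if violates:
                w += eta * y[i] * Xb[i]
    return w


def fit_one_vs_one(X, y):
    """Train one binary SVM per pair of classes."""
    classes = np.unique(y)
    models = {}
    for a, b in itertools.combinations(classes, 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = train_linear_svm(X[mask], np.where(y[mask] == a, 1.0, -1.0))
    return classes, models


def predict_one_vs_one(classes, models, x):
    """Each pairwise model votes; the class with most votes wins (ties: first class)."""
    votes = {c: 0 for c in classes}
    xb = np.append(x, 1.0)
    for (a, b), w in models.items():
        votes[a if xb @ w >= 0 else b] += 1
    return max(classes, key=lambda c: votes[c])


def leave_one_out_accuracy(X, y):
    """Hold out each sample in turn, retrain on the rest, and score the prediction."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        classes, models = fit_one_vs_one(X[train], y[train])
        correct += predict_one_vs_one(classes, models, X[i]) == y[i]
    return correct / len(y)
```

In practice a LIBSVM interface (in R, or scikit-learn's `SVC`, which wraps LIBSVM and uses one-vs-one for multiclass problems) would replace `train_linear_svm`.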

The statistical significance of the overall classification accuracy was assessed using an approximate permutation test. On each iteration of this procedure, the strain labels were randomized with respect to the Raman data. The classification was repeated 1000 times, each with a different randomization of the labels, yielding an empirical null distribution of classification accuracies. The accuracy of the real model was then compared with this distribution to calculate a p value for the observed accuracy (the null hypothesis being chance accuracy, 1/19 = 5.3%). Permutation testing relies on considerably fewer distributional assumptions than equivalent parametric statistical tests. Note that, for computational tractability, the procedure was conducted using split-half rather than leave-one-out cross-validation for both the real and the randomized classifications. This makes it a conservative estimate of the significance of the model, since split-half accuracy will typically be lower than leave-one-out accuracy owing to the smaller amount of training data.
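A minimal sketch of the permutation test in Python, using only the standard library. `score_fn` is a hypothetical placeholder for any routine that trains and scores a classifier (for example with split-half cross-validation) given data and labels; the +1 correction in `permutation_p_value` is one common convention and is an assumption here, not necessarily the exact formula used in the study.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """One-sided p value with the +1 correction: (b + 1) / (m + 1), where b counts
    null scores >= observed and m is the number of permutations."""
    b = sum(1 for s in null_scores if s >= observed)
    return (b + 1) / (len(null_scores) + 1)


def null_distribution(score_fn: Callable[[object, list], float], data, labels: list,
                      n_perm: int = 1000, seed: int = 0) -> list:
    """Re-score the classifier n_perm times with labels shuffled relative to the data."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = list(labels)  # copy so the original labels stay intact
        rng.shuffle(shuffled)
        scores.append(score_fn(data, shuffled))
    return scores
```

Under this convention, if none of the 1000 permuted accuracies reaches the observed accuracy, p = 1/1001 ≈ 0.000999, which is the smallest value 1000 permutations can produce and is consistent with reporting p < 0.001.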

In addition to the primary SVM classification, a further set of classification analyses was undertaken to determine which wavenumbers could accurately classify the different genera of bacteria, and the species/strains within each genus. To this end, SVM classification with leave-one-out cross-validation was performed for each wavenumber in the Raman spectrum, for each of four (sub)sets of the data. The first dataset consisted of the full data, classified using genus labels rather than strain labels. The other three subsets each consisted of the samples from one of the three genera, classified with respect to the strain labels within that genus. The classification accuracy for each wavenumber in these four analyses should therefore reflect how well that wavenumber discriminates between the three bacterial genera, or between the strains within a genus.
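The per-wavenumber analysis amounts to looping over the columns of the spectral matrix and running a separate leave-one-out classification on each single column. The sketch below shows that loop structure in plain Python; to keep it short and dependency-free, a nearest-class-mean rule replaces the SVM used in the study, and the function names are illustrative. The same function would be called four times: once on all samples with genus labels, and once per genus with strain labels.

```python
from statistics import mean


def loo_nearest_mean_accuracy(values, labels):
    """Leave-one-out accuracy of a 1-D nearest-class-mean rule (stand-in for the SVM)."""
    classes = sorted(set(labels))
    correct = 0
    for i, (v, true_label) in enumerate(zip(values, labels)):
        means = {}
        for c in classes:
            train = [x for j, (x, lab) in enumerate(zip(values, labels))
                     if j != i and lab == c]
            if train:  # a class may have no samples left after holding one out
                means[c] = mean(train)
        # Ties go to the first class in sorted order
        predicted = min(means, key=lambda c: abs(v - means[c]))
        correct += predicted == true_label
    return correct / len(values)


def accuracy_by_wavenumber(spectra, labels):
    """spectra: list of rows (samples), each a list of intensities (one per wavenumber)."""
    n_wavenumbers = len(spectra[0])
    return [loo_nearest_mean_accuracy([row[k] for row in spectra], labels)
            for k in range(n_wavenumbers)]
```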

Results and discussion

Classification accuracy

The present study aimed to classify strains of common wine lactic acid bacteria based on their Raman spectra. Classification of the bacterial strains via SVM using the entire spectrum, 3400–200 cm−1, proved highly accurate. With respect to the strain labels provided to the classifier, overall accuracy was 86.8%. Chance accuracy for the 19-way strain classification was 5.3%, and the approximate permutation test confirmed that the observed accuracy was unlikely to occur by chance under the null hypothesis (p < 0.001) (Fig. 1). At the genus level, accuracy was noticeably higher: 93.7%. This increase reflects, in part, the similarity between species and strains within the same genus, which produced a high proportion of within-genus (vs. between-genus) misclassifications: 52.5% of strain misclassifications were within-genus, whereas only 29.8% would be expected by chance. The full cross-validation confusion matrix is provided in Table 2.
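The genus-level figure can be obtained by mapping each true and predicted strain to its genus before scoring, and the chance rate of within-genus errors follows from counting class pairs. The Python sketch below assumes (our assumption) that misclassifications fall uniformly on the other k − 1 classes and that classes are equally represented; under that assumption the expected within-group share is Σ n_g(n_g − 1) / (k(k − 1)), where n_g is the number of strains in genus g and k = 19. The function names are illustrative.

```python
def collapsed_accuracy(true_strains, predicted_strains, genus_of):
    """Accuracy after mapping each true and predicted strain label to its genus."""
    hits = sum(genus_of[t] == genus_of[p] for t, p in zip(true_strains, predicted_strains))
    return hits / len(true_strains)


def chance_within_group_share(group_sizes):
    """Expected share of errors that stay within the true class's group, assuming
    equally represented classes and errors spread uniformly over the other k - 1 classes."""
    k = sum(group_sizes)
    return sum(n * (n - 1) for n in group_sizes) / (k * (k - 1))
```

As an arithmetic check on the reported figures, assuming genus accuracy was computed by collapsing strain predictions: 456 − 9 = 447 spectra were retained, 86.8% of 447 is about 388 correct, leaving 59 errors; 52.5% of 59 is about 31 within-genus errors, and (388 + 31)/447 ≈ 0.937, matching the reported 93.7%.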

Fig. 1

Permutation test on classification accuracy. The dotted line indicates the split-half accuracy of the primary SVM classifier, computed at the species/strain level. The solid line indicates chance performance for the classifier. The grey histogram represents the empirical null distribution derived from repeating the classification analysis with randomized labels. The clear separation between this null distribution and actual performance indicates that the observed results are unlikely to occur under the null hypothesis.

Table 2 Validation confusion matrix from SVM classification

Sensitivity and positive predictive value (PPV) for each genus, species, and strain are reported in Table 3. Sensitivity is the probability that a sample belonging to a particular strain is classified as that strain. PPV is the probability that a sample classified as a particular strain actually belongs to that strain. Given the highly multi-class nature of the analysis, these two measures best characterize performance for each class separately. Other classification measures, such as specificity, are dominated by the large number of true negatives in a 19-class problem and thus provide little additional information. All three genera showed comparable sensitivities and PPVs. Sensitivity at the strain level within O. oeni and P. damnosus was similar to that observed for strains of three wine yeast species in a similar study: six strains of Saccharomyces cerevisiae, Zygosaccharomyces bailii, and Brettanomyces bruxellensis, with sensitivities of 98.6, 93.8 and 92.3% [24]. However, a wide range of classification performance was observed at the species level within both Lactobacillus and Pediococcus. At one end of the range, P. acidilactici NRRLB-14958 was perfectly classified, while at the other, P. parvulus ATCC 19371 was classified with sensitivity and PPV both below 70%. In Lactobacillus, L. plantarum was classified with the highest PPV (100%) and tied with several other strains for the highest sensitivity (95.8%). Meanwhile, the classifier achieved its worst performance in Lactobacillus for L. brevis, with a sensitivity of 84.2% and a PPV of 72.7%.
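Per-class sensitivity and PPV come directly from the confusion matrix: sensitivity divides the diagonal count by its row total, PPV by its column total. The plain-Python sketch below assumes rows hold true classes and columns hold predicted classes (the orientation of Table 2 is not restated here), and the function name is illustrative.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j]: number of samples of true class i predicted as class j."""
    k = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        actual = sum(confusion[i])                          # row total: samples truly in class i
        predicted = sum(confusion[r][i] for r in range(k))  # column total: samples predicted as i
        metrics[label] = {
            "sensitivity": tp / actual if actual else float("nan"),
            "ppv": tp / predicted if predicted else float("nan"),
        }
    return metrics
```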

Table 3 Sensitivity and positive predictive values for SVM classification

Analysis of spectral bands

All bacteria share basic structures, such as cell walls and cell membranes, but the composition and kinds of lipids, proteins, carbohydrates and nucleic acids vary between species and even strains. This unique cell composition is what produces a ‘whole-organism’ fingerprint with Raman spectroscopy. However, because a cell contains a complex mixture of biomolecules, its spectrum consists of broad bands formed by many overlapping peaks. Examination of the bands capable of accurately discriminating between the three genera of these gram-positive bacteria, and between strains within each genus, yielded diverse results (Fig. 2). Many individual wavenumbers proved capable of accurately classifying samples across or within genera, but the degree of accuracy differed substantially across spectral bands and sets of organisms. Such results provide a nuanced view of the molecular bonds responsible for differentiating LAB, despite the great chemical complexity of the cells assayed.

Fig. 2

Classification accuracy by wavenumber. Separate SVM classifiers were trained and tested with leave-one-out cross-validation for each wavenumber in the spectrum. This analysis was repeated at the level of genus labels (a), O. oeni strains (b), and within two genera: Lactobacillus (c) and Pediococcus (d). Black points indicate actual classifier accuracy at each wavenumber. The grey line is a LOESS curve fit to these points for clearer visualization. The dashed line indicates chance performance for each classifier. Colored bands mark the vibrational modes associated with different families of molecules: band 1 lipids (CH2, CH3 stretch), band 2 lipids (C=O stretch), band 3 protein (amide I), band 4 protein (amide III), band 5 nucleic acids (PO2 asymmetric stretch), band 6 nucleic acids (PO2 symmetric stretch), band 7 carbohydrates (CO and CC stretch), band 8 protein (symmetric CNC stretch), band 9 nucleic acids (PO backbone), band 10 lipids (CH2 rocking) (color figure online)

Proteins make up 40–50% of a bacterial cell [19]. The amide I band of proteins (1700–1600 cm−1) and the amide III band (1350–1200 cm−1) contributed substantially to the accurate discrimination of Lactobacillus and Pediococcus species, but little to the discrimination of genera or O. oeni strains. The amide II vibrational mode gives only a weak signal in Raman spectra [28]. The region of the symmetric CNC stretching vibration of proteins (900–800 cm−1), however, did contribute to O. oeni discrimination as well as to Lactobacillus and Pediococcus discrimination. Polysaccharides make up 10–20% of bacterial cells [19]. Many of their signatures, including the C–O and C–C stretching vibrations, lie in the 1190–945 cm−1 region, which contributed substantially to the accurate discrimination of Lactobacillus species and O. oeni strains. Lipids make up 10–15% of bacterial cells [19]. The lipid, phospholipid and membrane signature region of the CH2 asymmetric (~2930 cm−1) and symmetric (~2850 cm−1) stretching bands, the C=O stretching vibration of lipid esters (1750–1730 cm−1), and the CH2 rocking vibration (730–715 cm−1) all contributed substantially to the accurate discrimination of Lactobacillus species. Bacterial cells contain 2–4% DNA and 5–15% RNA [19]. The PO2 symmetric stretching (~1090 cm−1) and PO2 asymmetric stretching (~1230 cm−1) bands contributed to the accurate discrimination of O. oeni strains as well as Lactobacillus and Pediococcus species. Vibrations of the phosphate-sugar backbone of nucleic acids at 820–780 cm−1 contributed to Lactobacillus and Pediococcus species discrimination. The bands giving the highest accuracy for genus discrimination were the amide I band, the polysaccharide region, and the CH2 rocking vibration.
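When reading a per-wavenumber accuracy profile, it can help to tag the most accurate wavenumbers with the band assignments listed above. The Python sketch below encodes those ranges as a simple lookup table; the ±10 cm−1 window placed around each single-valued (~) band is a hypothetical choice, as are the function names. Because several ranges overlap (for example, the PO2 asymmetric stretch near 1230 cm−1 lies inside the amide III range), one wavenumber can carry more than one assignment, which is worth keeping in mind when attributing discrimination to a single molecular class.

```python
# (assignment, low, high) in cm^-1, taken from the ranges in the text.
# Single "~" values are widened by a hypothetical +/-10 cm^-1 window.
BANDS = [
    ("lipid CH2 asymmetric stretch", 2920, 2940),
    ("lipid CH2 symmetric stretch", 2840, 2860),
    ("lipid ester C=O stretch", 1730, 1750),
    ("protein amide I", 1600, 1700),
    ("protein amide III", 1200, 1350),
    ("nucleic acid PO2 asymmetric stretch", 1220, 1240),
    ("polysaccharide C-O and C-C stretch", 945, 1190),
    ("nucleic acid PO2 symmetric stretch", 1080, 1100),
    ("protein symmetric CNC stretch", 800, 900),
    ("nucleic acid phosphate-sugar backbone", 780, 820),
    ("lipid CH2 rocking", 715, 730),
]


def assign_bands(wavenumber):
    """All band assignments whose range contains the given wavenumber (may be several)."""
    return [name for name, lo, hi in BANDS if lo <= wavenumber <= hi]


def annotate_top(wavenumbers, accuracies, n=5):
    """The n most accurate wavenumbers, each with its accuracy and band assignments."""
    ranked = sorted(zip(accuracies, wavenumbers), reverse=True)[:n]
    return [(wn, acc, assign_bands(wn)) for acc, wn in ranked]
```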

Dried yeast products for the wine industry are advertised as having positive attributes, such as the ability to ferment under difficult conditions or to produce or preserve attractive aromas. Winemakers can now use Raman spectroscopy to confirm that the yeast they purchase is the strain that conducts the fermentation [24]. The impact that different strains of LAB can have on wine flavor, aroma and texture is becoming increasingly evident in winemaking. O. oeni strains are now advertised similarly to wine yeasts, i.e. for their specific properties (e.g. cinnamoyl esterase negative), not solely as malolactic conversion agents. Thus, winemakers will want to confirm the identity of malolactic strains to ascertain that the strain they chose is responsible for the MLF, or at least is a major strain in a mixture of indigenous and inoculated strains. Additionally, knowledge of the bacterial species present in a wine is valuable to winemakers because it allows them to take precautionary measures early enough to inhibit or encourage these bacteria. Many wineries employ in-house microscopy to visualize the types of microorganisms present in a wine, but this does not identify species or strains. To obtain this level of detail, wineries must currently rely on often prohibitively expensive molecular tests, which presently give limited results for O. oeni strains, do not differentiate Pediococcus species, and group together related Lactobacillus species. Whereas PCR-based assays require significant sample preparation, technical expertise, a clean environment, and days to obtain results, the method developed in this study takes approximately 10 min from picking a colony on an agar plate to predicting the identity of that colony.

Raman spectroscopy is a comprehensive method because it captures, and allows comparison of, signals from all the components of a bacterial cell. The Lactobacillus and Pediococcus spp. and O. oeni strains in this study differed sufficiently to generate unique Raman fingerprints, which allowed a highly accurate classification at the species and strain level using an SVM classifier. This Raman classification method would allow wineries or wine laboratories to identify these bacteria at the strain level for a fraction of the cost and half the response time of molecular tests. Such information would give winemakers more control over the quality and style of their wines.