Introduction

Beer is the world’s most appreciated low-alcohol beverage and can be classified into different types and styles, according to the choice of raw materials and the brewing process [1]. The four essential ingredients are water, malted barley, hops, and yeast. The various combinations of raw materials and brewing processes result in a great diversity of beers [2]. For example, lagers (bottom fermentation) are fermented by active yeasts at low temperatures (9–15 °C), while higher temperatures (15–25 °C) are used for ales (top fermentation) [3]. In Brazil, lager accounts for about 98% of beer consumption, with Standard American Lager being a general representative of this style [4]. The predominant tropical climate in a large part of the country contributes to the choice of lighter and more, refreshing beers by consumers. This style can be characterized as having a light coloration, low alcohol content, and pH in the range 3.7–4.1 [5].

The Brazilian beer sector currently includes 889 legally registered breweries distributed throughout the country. However, the geographical distribution of breweries is concentrated in the south and southeast regions, with 384 and 342 installations, respectively. The States with the highest numbers of breweries are Rio Grande do Sul (186) and São Paulo (165) [6]. Although the number of breweries in the country is increasing, the beer market is dominated by only three large companies: Ambev, Heineken and Grupo Petrópolis. The brands of beers produced by these companies are slightly different in terms of flavor, aroma, and price. A wide variety of chemical components can directly influence the organoleptic properties of the final product. For example, dextrins are polymers of glucose molecules produced during the hydrolysis of starch in the mashing process that are not metabolized by the yeast which contribute sweetness to the beverage [7], while an acidic taste is associated with the presence of organic acids, such as lactic, acetic, malic, succinic and pyruvic acids, which can be very important in term of beer quality [8]. For this reason, quality control is a fundamental tool for ensuring the success of large breweries in this extremely competitive market. Moreover, quality control also plays a very important role in consumer choice of a particular brand [9]. Considering the data provided by police record cases of beer brands, counterfeiting has increased at a significant rate in Brazil. In most cases of fraud, the labels and caps of bottles of lower value brands are replaced by higher value brands [10]. This type of fraud is very difficult to detect, although government agencies have been hard at work in this regard, it is clear that there is a need to develop new methods for discriminating and classifying beers by type, style and brand.

The classification of beers has been investigated in studies using different spectroscopic techniques combined with a chemometric approach [11,12,13,14]. Recently, Pereira et al. [15] developed an approach based on paper spray mass spectrometry combined with multivariate analysis of data to differentiate Standard American Lager beers. Engel et al. [16] investigated combination of FT-IR ATR spectroscopy and approach chemometric as a powerful tool for identification of Trappist beers. For this same purpose, Mannina et al. [17] used 1H NMR spectroscopy to classify Trappist beers using soft independent modeling of class analogies (SIMCA). These analytical techniques normally demand any form of sample pretreatment. 1H NMR is reproducible, non-destructive and capable of giving information about a range of chemical compounds in beer, in a single analysis and without any significant sample preparation. It is important to emphasize that the majority of these methods have aimed at discriminating and classifying beers of different types and styles.

The Brazilian beer market includes a wide variety of brands manufactured by different companies in the brewing sector. For this reason, fraudulent practices are increasingly common in the country. Moreover, there is a lack of information concerning the development of models for discrimination and classification of lager beers, especially for the widely consumed brands. To the best of our knowledge, there have been no previous studies concerning the differentiation and classification of Brazilian beers of the same style using 1H NMR spectroscopy.

Materials and methods

Beer samples

A total of 126 canned lager beers (350 ml size) produced by three different manufacturers (each represented by three brands) were purchased in cities of the interior of the States of São Paulo and Minas Gerais. For each brand, two cans were obtained from each of seven different batches. The samples were separated into three classes: Ambev, Heineken and Grupo Petrópolis. This design was proposed in this way such that sample set was correctly representative and reliable.

Sample preparation

The samples were degassed in an ultrasonic bath at room temperature for 10 min. Subsequently, 540 μL of the beer sample, 60 μL of D2O, and 0.05% 3-(trimethylsilyl)-2,2,3,3-tetradeuteropropionic acid (TMSP-d4, as a chemical shift reference) were placed into a 5-mm NMR tube [18, 19].

NMR measurements

1H NMR spectra of the beer samples were acquired using a Bruker Avance III HD 600 spectrometer operating at 14.1 T (600.13 MHz for 1H). The instrument was equipped with a Triple Inverse TCI Cryo-probehead, automated tuning, a BCU (Bruker Cooler Unit I) temperature control accessory, and a Sample Express autochanger. Icon NMR software (Bruker) was used to control the entire process of tuning the equipment and acquiring the data. Samples were analyzed at 303.1 K after allowing 5 min for temperature equilibration. The spectra were automatically phased and baseline corrected using the TopSpin 3.5 software package (Bruker Biospin, Rheinstetten, Germany). The analysis time for each sample was 15 min. The experimental method used in the present study was the same as described in our previous work [19].

Spectral alignment

Spectral misalignment can be caused by small pH variations and intermolecular interactions between the chemical compounds. The main objective of the alignment procedure is to help to ensure correct interpretation of the results of multivariate analyses by correlation between a chosen reference spectrum and the signal sets to be aligned. In this study, the recursive segment-wise peak alignment (RSPA) method was used to align all the segments of the spectra. All signals were recognised through the zero crossing of the first-order derivative calculated using a third-order polynomial according to the Savitsky–Golay filtering method. Alignment of the test segment was accepted, subject to an advance in the alignment quality among peaks [20]. Spectra were then aligned using a single reference spectrum. Details about the method are published elsewhere [21, 22].

Chemometric methods

Chemometric procedures for classification of the samples of lager beers produced by the different breweries were performed using unsupervised and supervised methods. Principal component analysis (PCA) was used as an exploratory tool for visualizing the dataset. This is an unsupervised method, because no previous information about the sample classes is required. PLS-DA and SIMCA were used as discriminatory classification and class modeling techniques [23]. All analyses were performed using the Pirouette v. 4.5 (rev. 1) software package (Infometrix, Bothell, WA, USA).

Principal component analysis (PCA)

PCA provides an approximation of a data matrix, X, in terms of the product of two smaller matrices, T and P. These matrices, T (scores) and P (loadings), extract the essential information from X, so that its dimensionality is decreased [24]. The beer sample 1H NMR spectra were organized in a matrix X (126 × 10304), where the lines and columns corresponded to the samples and the 1H NMR chemical shifts, respectively. The signals for ethanol (at δ 1.17, 3.65 ppm), residual water (at δ 4.76 ppm) and C13 satellites of the CH3-group were removed. This dataset was preprocessed and mathematically transformed by mean centering, normalization, and smoothing.

Partial least squares discriminant analysis (PLS-DA)

A multi-class PLS-DA model was built by correlating the matrix Χ of class variables (breweries) with a vector y of dependent variables (1H NMR chemical shifts) [25, 26]. The classes were labeled as 1 (Ambev), 2 (Heineken), and 3 (Grupo Petrópolis). The numbers of samples in the classes were the same. For each class, 42 samples were assigned, split into 29 and 13 samples for the calibration and prediction sets, respectively, using the Kennard–Stone algorithm [27]. This procedure provided the correct selection of samples representing the entire dataset.

Soft independent modeling of class analogy (SIMCA)

This method performs PCA for all classes individually. The similarity between the samples belonging to the corresponding class is obtained, followed by determination of the number of PCs capable of explaining the variability of the dataset, hence defining the so-called internal space. This space contains the most relevant information for the correct interpretation of the model. To validate the pattern recognition analyses, the same procedure was used in PLS-DA [25, 28, 29].

Performance parameters

The quality of classification and discrimination is evaluated using the figures of merit (sensitivity, specificity, accuracy and inconclusive ratio). The sensitivity, specificity and accuracy were determined as follows:

$${\text{Sensitivity}}\;{ = }\;\frac{\text{TP}}{{{\text{TP}}\;{ + }\;{\text{FN}}}},$$
(1)
$${\text{Specificity}} = \frac{\text{TN}}{{{\text{TN}} + {\text{FN}}}},$$
(2)
$${\text{Accuracy}} = \frac{{{\text{TN}} + {\text{TP}}}}{{{\text{TN}} + {\text{TP}} + {\text{FN}} + {\text{FP}}}},$$
(3)

where TP, FN, TN, and FP indicate the numbers of true positives, false negatives, true negatives, and false positives, respectively [25, 30]. Inconclusive ratio (IR) shows the percentage of samples that cannot be assuredly assigned to a class, and thus recognises no assignation to any class:

$${\text{IR = }} \frac{\text{NA}}{\text{number of samples}},$$
(4)

where NA indicates unassigned samples and n° samples indicates the total number of samples that belong to a class [25].

Results and discussion

NMR spectroscopy allows evaluating a wide spectral range containing information on hundreds of chemical compounds in a single experiment. Moreover, it may be considered one of the best nontarget methods for quality control of beverages and foods without a priori knowledge about the sample composition [31, 32]. The 1H NMR spectrum is shown in Fig. 1, where the lower part provides the full spectrum δ 0–9 ppm and the upper part shows detail of the aliphatic region δ 0–3 ppm, highlighting the major organic acids are displayed in Table 1. First, different regions of the spectrum were analyzed, as well as the full spectrum. The results obtained can be found in Figs. S1, S2 and S3 of the Supplementary material. However, use of the aliphatic region alone was more promising, with clear separation between the clusters. To obtain these results, a spectral alignment procedure was required for reduction of small variations in the peak positions of the organic compounds, avoiding misinterpretations. The alignment procedure chosen was the RSPA method, which aims to rearrange the spectroscopic signals of identical chemical compounds so that they occupy the same resonance positions in multiple spectra, based on a reference spectrum [21, 22]. Figure 2 shows the 1H NMR spectra obtained before and after the alignment procedure, for the resonance position (δ 1.34 ppm) corresponding to lactic acid.

Fig. 1
figure 1

1H NMR spectrum of a lager beer sample. The upper part of the figure highlights the aliphatic region (0–3 ppm). (1) higher alcohols; (a) C13-satellites of the CH3-group; (2) lactic acid; (3) alanine; (4) acetic acid; (5) pyruvic acid; (6) succinic acid

Table 1 Assignment of 1H NMR signals of lager beer
Fig. 2
figure 2

Demonstration of results before and after the 1H NMR spectral alignment procedure, showing the signal at δ 1.34 ppm, a misaligned and b aligned

PCA was the basis for the pattern recognition technique. The scores plot for the beer samples is shown in Fig. 3a, with the first two principal components explaining 81.2% of the total variance. Figure 3b shows the loadings for the first two PCs, highlighting five resonance positions corresponding to the most relevant organic acid signals responsible for separation of the clusters. Cluster A1, corresponding to the Ambev samples, was in the positive region in PC1 and in the negative region in PC2. This was probably due to the presence of acetic acid (position 3) and pyruvic acid (position 4) in higher amounts, relative to the other beers. Clusters B1 and B2 corresponded to the Heineken beers. Despite being manufactured by the same company, there was clearly a difference between the brands chosen in this study. The observed difference may probably be related to some variations in the brewing process between the manufacturing units responsible for producing the same brand. Cluster B1 was in the positive regions in PC1 and PC2, while B2 was in the negative region in PC1 and next to the origin in PC2. The slight difference between the brands could be mainly attributed to the relative contents of succinic acid (position 5) and higher alcohols (position 1) in the B1 samples, and acetic acid (position 3) in the B2 samples. Although no information concerning the type of yeast or details of the manufacturing process was made available to consumers, minor changes in the manufacturing process or substitution of the yeast strains probably resulted in variations in the final product [33]. Clusters C1 and C2 were both located in the negative region in PC1, essentially due to lactic acid (position 2). However, C2 differed slightly from C1, due to greater amounts of higher alcohols (position 1) and succinic acid (position 5). It is important to note that lactic acid is used to determine the total acidity of beer. It is produced during the fermentation process and may also be added to the malt during the mashing process. Some breweries employ a more elaborate process in which the malt is spontaneously acidified by organic acids produced by the natural microflora of malted barley [34]. This process step is essential not only to control the pH of the wort, but also to optimize conversion of the starch to fermentable sugars, which directly influence the organoleptic characteristics of the final product. The organic acids contained in the beer give relevant information about the quality of the beer, allowing a good estimation of organoleptic properties and assess the performance of the fermentation process [35]. 1H NMR spectroscopy is not limited to qualitative analysis but also to quantitative purposes proving satisfactory results compared to different analytical reference methods such as high-performance liquid chromatography (HPLC), capillary electrophoresis (CE) and enzymatic assays [36, 37].

Fig. 3
figure 3

PCA plot for the aliphatic region δ 0.8–3.0 ppm. a PC1 × PC2 scores of the lager beer samples. Blue circles: Ambev (cluster A1); green circles: Heineken (clusters B1 and B2); red circles: Grupo Petrópolis (clusters C1 and C2). b Loadings for PC1 (black line) and PC2 (red line). Compound positions: (1) higher alcohols, (2) lactic acid, (3) acetic acid, (4) pyruvic acid, (5) succinic acid

Overall, there was a clear trend of separation among clusters A1, B2, C1, and C2. This was confirmed by constructing supervised models (PLS-DA and SIMCA) that enabled discrimination and classification of the beer samples according to the company responsible for the manufacture of each of the brands analyzed. The class assignments obtained with the PLS-DA and SIMCA models are summarized in Table 2. Only 2 of the 29 training set samples were not assigned to any class providing an inconclusive ratio of 6.89%, while all prediction set samples were assigned correctly. For the results obtained with PLS-DA, three incorrect assignments in the prediction set were obtained for Grupo Petrópolis, based on large Hotelling T2 and Q residues none of them was considered as an anomalous sample. The PLS-DA model was constructed with 3 classes (Ambev, Heineken, and Grupo Petrópolis) and was validated using the leave-one-out procedure. The number of latent variables was chosen considering the smallest cross-validation classification errors, with preprocessing and mathematical transformation by mean-centering, normalization, and smoothing.

Table 2 Classification performance parameters for PLS-DA and SIMCA models

The PLS-DA results revealed accuracy of 94.2% for the calibration set and 93.2% for the prediction set. For the SIMCA model, the same multi-class strategy employed for PLS-DA was adopted, together with the preprocessing and mathematical transformations, and the number of principal components was selected based on the lowest value of the root mean square error of cross-validation are presented in Table 3. The SIMCA results revealed that the Ambev class was strongly discriminated from the other classes, with values of 100% for the prediction set, while the overall accuracy were 88.5% for the calibration set and 97.4% for the prediction set. The classification results for these two methods in terms of sensitivity and specificity are summarized in Table 4. SIMCA presented the best predictions since both sensitivity and specificity of the training set were higher than 92%. As for the PLS-DA results, they cannot be considered satisfactory, especially regarding sensitivity, since a significant percentage of Grupo Petrópolis samples were predicted as belonging to another brewery, this may be associated with geographic factors, raw material and brewing processes.

Table 3 The correct classification rate obtained with PLS-DA and SIMCA
Table 4 Sensitivity and specificity for classification models

PLS-DA and SIMCA are widely employed in terms of authenticity of beverages and food. The results are reliable, although it is difficult to compare them because the aims of the models are completely different. SIMCA is a classification method able to describe a target object class and then predicts whether a new object looks like this class or not. On the other hand, PLS-DA allows two or more sets of objects to represent predefined classes and later evaluates whether a new object fits at least one of these classes [38].

Conclusions

The results obtained using the multi-class PLS-DA and SIMCA models clearly showed that 1H NMR spectroscopy can be effectively employed to improve understanding of the correlations among the raw material, the brewing process, and the main characteristics of the final product. The results demonstrated the potential of the chemometric approach as a robust multivariate method to discriminate and classify samples of lager beers manufactured by the three largest Brazilian beer companies. This type of approach can be used as a powerful tool to assist the brewing sector and government agencies in terms of quality control, allowing the identification of suspected samples of counterfeit through direct, fast and reliable analyses. Given the success of the methodology, future research will focus on geographic classification of the same beer brands manufactured by the three largest breweries.