Introduction

Pattern recognition and classification are tasks that human beings carry out naturally, owing to the evolution and adaptation of our central nervous system, more specifically the cerebral cortex, over thousands of years. However, solving classification problems is in most cases extremely complex, especially if the patterns are described by a large number of independent variables. When this occurs, automated systems must be used [1, 2]. There are several traditional and well-tested statistical methodologies for classifying and recognizing patterns. One of the most traditional multivariate methods is principal component analysis (PCA). In this technique, high-dimensional data are reduced to a low-dimensional system of linear coordinates in order to facilitate the visualization of trends or clusters [3]. However, some disadvantages of the method are well known [4]. First, it is assumed that the data can be described by linear combinations; consequently, non-linear systems will not be well represented. Second, the quality of the result can be influenced by discrepant samples (outliers). In addition, the number of significant components may still be high after the transformation, which makes it difficult to extract useful information from the data [2].

Artificial neural networks (ANN) are a set of techniques based on rather unconventional statistical principles that have been gaining ground in pattern recognition and classification [1]. The origin of artificial neural network theory dates back to the mathematical and engineering models of biological neurons [5]. The neuron is the basic cell unit of the brain, which is the center of the human nervous system [2, 6]. The capacity for knowledge adaptation and storage in an adult brain can be attributed to two mechanisms: the creation of new synaptic connections among the neurons and the modification of existing synapses. In this context, knowledge is defined as stored information or models used by a person or machine to interpret, predict and respond appropriately to the outside world [2].

In traditional computer algorithms, knowledge is built in explicitly in the form of rules defined a priori. Artificial neural networks are computational techniques based on a mathematical model inspired by the structure and functioning of natural neural networks. They are systems that can generalize: after learning from training data, they generate their own rules to associate the input and output variables [7]. This is made possible by a learning algorithm that changes the weights of the connections among the neurons as a function of the responses obtained from the input variables [2].

Artificial neural networks seek to model, albeit primitively, the logical operations by which the brain executes the most varied tasks [8, 9]. To this end, processing units are created with functions similar to those of the biological neuron. Each artificial neuron can receive several input stimuli and generate several output stimuli. This flow of stimuli is propagated through a network of connections, as in the biological model. The intensity with which a stimulus propagates along a given connection is set by a weight [2, 6].

Because neural networks admit different forms of representation, from the inputs to the internal parameters, developing a satisfactory solution is a real design challenge. The convergence of the method is closely related to the topology of the network and to the parameters adopted in the training algorithm. Depending on these choices, learning can become trapped in a local minimum or converge slowly [7]. Furthermore, in systems that work in real time, it is important to reduce the size of the training sample by removing redundant data in order to increase the processing speed [10].

A pre-established set of well-defined rules for the solution of a learning problem is called a learning algorithm. As can be expected, there is no single learning algorithm for the design of neural networks. Instead, there is a set of tools represented by a variety of learning algorithms, each offering specific advantages. Basically, the learning algorithms differ in the way in which the adjustment of a neuron's synaptic weight is formulated [2]. Because artificial neural networks can solve problems of a general nature, such as approximation, classification, categorization and prediction [11], this set of techniques has been applied in a vast range of areas, especially process control, satellite navigation, weather forecasting, signal processing, speech recognition, waste treatment, ceramic engineering, geographic origin determination, fire detection, the financial market and pattern recognition [6, 9, 12–15].

The multilayer perceptron (MLP) type of artificial neural network is widely used for pattern classification [8]. The architecture of this type of network consists of an input layer with one neuron for each input variable; one or two intermediate (hidden) layers, which separate the patterns by forming decision boundaries and contain a number of neurons to be defined; and an output layer, which constructs linear combinations of the decision boundaries formed by the hidden neurons and whose number of neurons depends on how many classes of product will be distinguished and how they will be represented [2]. Each neuron of the MLP network computes a sum of the signals from the neurons of the previous layer, weighted by the synaptic weights. This sum, called the induced local field, is applied to a non-linear activation function that produces the output of the neuron [2].
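The computation carried out by each neuron can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the bias term, the choice of the logistic function as the activation and all numerical values are assumptions made only for illustration.

```python
import math

def logistic(v):
    """Logistic sigmoid, used here as an example of a non-linear activation."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the incoming signals (induced local field),
    followed by the activation function."""
    induced_local_field = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(induced_local_field)

def layer_output(inputs, weight_rows, biases):
    """Outputs of all neurons of one layer; each row holds one neuron's weights."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical toy network: 3 inputs, 2 hidden neurons, 1 output neuron.
x = [0.2, 0.7, 0.5]
hidden = layer_output(x, [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]], [0.0, 0.1])
output = layer_output(hidden, [[1.0, -1.0]], [0.0])
```

Each layer receives the outputs of the previous one, so a network with a single hidden layer amounts to two successive calls of `layer_output`.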

ANN are extremely versatile in mapping complex and non-linear relationships among multiple input and output variables. The disadvantages of the method include the need for a large quantity of training data [1], the difficulty in choosing the training parameters and the type of network most suitable for the problem, and the variability of the results due to initialization and sampling [16]. However, several studies have presented promising proposals to solve or diminish the disadvantages associated with ANN [17–19], and new software packages implement ANN in an automated manner, without the need for the user to choose the training parameters, thus pointing to a better network for the proposed problem [20].

The importance of coffee in the world economy cannot be overstated. Coffee-related business is extremely important for world trade and is second in value only to petroleum [14]. Furthermore, coffee is the raw material of one of the most widely accepted beverages in the world [21]. The beverage produced from roasted and ground coffee has a much appreciated aroma and flavor [22]. An interesting characteristic of the coffee beverage is that it has no relevant nutritional value and is consumed basically for its physiological and psychological effects, related to the presence of caffeine, and especially for the pleasure and satisfaction that its aroma and flavor provide [23].

Coffee quality is directly related to the various physical and chemical constituents responsible for the aroma and characteristics of the beverages. There have been several attempts to associate the composition of green coffee with beverage quality. Green coffee does not have the aroma and flavor typical of the coffee beverage, and thus, roasting is essential to produce the compounds that confer the characteristics that everyone understands as coffee [24].

Among the compounds that confer the sensory attributes of coffee, the sugars stand out. Their content in the coffee beans varies according to the stage of ripeness [24], and they react with the proteins, which are responsible for the formation of most of the aromatic compounds of roasted coffee [25]. Phenolic compounds present in the coffee bean range from 8 to 9% depending on the degree of ripeness of the cherries and coffee beverages. Chlorogenic acids, currently regarded as compounds beneficial to health, are found in variable quantities among cultivars and are more highly concentrated in completely unripe beans [26].

Caffeine is the main alkaloid in coffee, and the stimulating activity of coffee is attributed to it. Unripe beans are metabolically more active in caffeine synthesis than ripe beans. The caffeine concentration ranges from 0.8 to 1.9% m/m in the Coffea arabica species, while in the C. robusta species it is higher, ranging from 1.2 to 4% m/m [24].

Generally, all ripe Arabica coffee beans still on the plant have the ideal composition for obtaining a beverage of excellent quality. The beans almost always depreciate during the drying process, in which biochemical reactions resulting from the action of enzymes and microorganisms can alter the initial aroma and flavor characteristics [26]. Climatic conditions confer special attributes of beverage body, acidity and aroma on the coffee from each region. Parana has a diversity of climate and soil, so that a wide variety of coffee types can be produced. When quality can be associated with production location, the added value of the coffee is increased. However, this process requires mechanisms that reliably prove the geographic origin of the coffee [26].

Considering the commercial importance of coffee and the efficacy of the ANN technique, it was proposed to apply and adapt the artificial neural network methodology to identify coffees produced in different cities and regions of Parana.

Materials and methods

Coffee samples

Samples were collected from 16 cities distributed across the coffee-growing region of the state of Parana, including coffee plantations from regions with clay and sandy soils. About 3 kg of coffee was collected from each location and taken to Londrina on the day of collection. At the Paraná Agronomic Institute-Londrina (IAPAR), the samples were immediately placed in wooden boxes with a mesh bottom and turned eight times a day until the beans reached 11–12% moisture, after which the samples underwent beneficiation (removal of hull and parchment). For the physicochemical analysis, the coffee beans were frozen with liquid nitrogen and ground in a disk mill (Perten model 3600) to a final particle size of 0.6 mm. The samples were kept in a freezer and thawed for analysis.

Coffee bean size

After beneficiation, the bean size of the samples was determined with a size 16 sieve appropriate for coffee, and the result was expressed as the percentage of beans retained on this sieve [27].

Physicochemical analysis

Moisture was determined by heating in an oven with air circulation at 105 °C for 3 h. Caffeine was extracted with magnesium oxide and determined by the spectrophotometric method [28], and total chlorogenic acids were evaluated according to the methodology proposed by Clifford and Wight [29]. Total tannins were determined with the Folin–Ciocalteu reagent using gallic acid as the standard [30]. Total sugars, sucrose and reducing sugars were extracted with water at 70–80 °C and determined with the Somogyi and Nelson reagent [31]. Protein, titratable acidity and total lipids were determined by the respective methods proposed by the AOAC [30].

Mineral analysis

The minerals P, Ca, Mg, Cu, Zn, B, Mn, Fe and S were determined by inductively coupled plasma optical emission spectrometry (ICP-OES) in a Thermo Jarrel spectrometer after nitroperchloric digestion. Potassium was determined by flame photometry in a Micronal photometer. The mineral standards were prepared by dilution of stock solutions [32].

Sensory analysis

The coffee was roasted in a roaster (Rod Bel) at 210–200 °C for 8–10 min. The end of roasting was determined by the weight loss and visual color of the coffee beans. An infusion of ground coffee (0.6 mm) in water (97–99 °C) in the proportion of 70 g L−1 was prepared in glass cups coded with three digits and served to the tasters. Eight professional coffee cup-tasters, each with 3 years or more of experience in coffee evaluation, were trained to differentiate and quantify the quality of the beverage in the main sensory attributes (aroma intensity, aroma quality, acidity, bitterness, astringency and body). In training, the tasters received samples with extreme quality attributes in order to learn to differentiate the intensity of the attributes. In each session, five samples were evaluated, and the intensity of each attribute was marked on an unstructured 10 cm scale, whose extremes represent the minimum and maximum intensities of the attribute.

Artificial neural networks

The automatic and manual modules of the Statistica® software version 9.0 [20] were used, with a multilayer perceptron (MLP) neural network containing only one hidden layer of neurons.

The values for moisture content, proteins, chlorogenic acids, total tannins, titratable acidity, total lipids, caffeine, and total and reducing sugars (in percentage); phosphorus, potassium, calcium, magnesium and sulfur (in g kg−1 of sample); and copper, zinc, boron, manganese and iron (in mg kg−1) were tabulated and presented to the artificial neural networks.

Databank

The databank of analyses made available by the Parana Agronomic Institute (IAPAR) consisted of a total of 172 samples from 16 different cities from Parana State. The quantity of samples was not the same for all the cities; therefore, there were cities with insufficient samples for the training, testing and validation.

Results and discussion

A multilayer perceptron network (MLP) was used, consisting of an input layer with one neuron for each one of the 20 input variables and a single intermediate layer that separated the patterns.

The 16 cities were grouped into three regions, A, B and C, generating two cases, classification by region and by city. The distance, mean temperature and rainfall index of the cities were considered in this classification, that is, the similar climatic characteristics among the cities (Table 1).

Table 1 Division of the samples of the cities and regions

For the first case studied, classification by region, the artificial neural network used had eight neurons in the hidden layer and was trained for 120 cycles. The learning process was finished when the synaptic weights and the mean squared error converged to a minimum value.

Before starting the process, all the variables were scaled to the range from 0 to 1 (min-max normalization). A logistic-type sigmoid activation function was used, which, according to Basheer and Hajmeer [33], is one of the most used functions in classification by artificial neural networks. The output layer contained three neurons, one for each region studied. The network was trained with the BFGS algorithm, proposed independently by Broyden, Fletcher, Goldfarb and Shanno [34–37], a quasi-Newton method with fast convergence that is very efficient in unconstrained optimization [38]. The order of the training samples was randomized in each cycle, and the synaptic weights were corrected on the basis of the sum of the squared errors, with the correction carried out after each sample was presented.
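The scaling of the input variables and the representation of the three regions in the output layer can be sketched as follows. This is only an illustrative sketch: the moisture values are hypothetical, and the one-hot encoding of the output is an assumption, since the encoding used by the software is not described here.

```python
def min_max_scale(values):
    """Rescale a list of values linearly to the range from 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(label, classes):
    """Target vector with one output neuron per class (assumed encoding)."""
    return [1.0 if label == c else 0.0 for c in classes]

# Hypothetical moisture contents (%) of four samples.
moisture = [10.8, 11.5, 12.1, 11.0]
scaled = min_max_scale(moisture)

# Target for a sample from region B, with output neurons ordered A, B, C.
target = one_hot("B", ["A", "B", "C"])
```

Scaling every variable to the same range keeps variables expressed in large units, such as the minerals in mg kg−1, from dominating the weighted sums of the hidden neurons.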

The data set was divided into three parts: one, containing 80% of the samples, was used for training the network, another with 10% was used for the test stage and the remaining 10% for validation. The samples included in each part were chosen randomly by the Statistica® 9.0 software [20]. The validation step aims to verify the capacity of the trained network to generalize, since artificial neural networks learn a rule from the training samples.
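A random 80/10/10 division of this kind can be sketched in a few lines of Python. The function name, the fixed seed and the rounding rule are assumptions made for illustration; the procedure used internally by the software may differ.

```python
import random

def split_samples(samples, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle the samples and split them into training, test and
    validation subsets; the validation subset receives the remainder."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_test = round(test_frac * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

# 172 sample indices, as in the databank described above.
train, test, validation = split_samples(range(172))
print(len(train), len(test), len(validation))  # 138 17 17
```

Under these rounding assumptions, the 172 samples give 138 for training and 17 each for the test and validation stages.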

An order of importance of the variables was obtained from the trained network. The moisture content was the variable identified by the artificial neural networks as the most important in the classification, followed by the sulfur and magnesium contents, reducing sugars, lipids and proteins. Figure 1 shows the moisture content distribution by region studied.

Fig. 1 Moisture content (g 100 g−1 sample) of the 172 coffee samples grouped in three producing regions in Parana

The total sugars, zinc and caffeine contents were the variables that least influenced the classification due to the smaller variability in the data per region studied, compared to the other parameters, as can be observed in Fig. 2.

Fig. 2 Caffeine content (g 100 g−1 sample) of the 172 coffee samples grouped in three producing regions in Parana

Figure 3 shows the mean moisture, sulfur and magnesium contents, variables considered the most important by the network and the total sugars, caffeine and zinc contents considered the least important for classifying the samples from regions A, B and C.

Fig. 3 Mean values of the moisture (g 100 g−1 sample), sulfur (g 100 g−1 sample), magnesium (mg 100 g−1 sample), total sugars (g 100 g−1 sample), caffeine (g 100 g−1 sample) and zinc (mg 100 g−1 sample) contents of the samples from regions A, B and C

In each region formed according to the conditions of climate and soil, samples of coffee had different physicochemical and mineral compositions.

The Tukey test applied to the mean moisture contents showed no significant difference at the 5% level, except between the samples from regions B and C, which differed significantly at the same level. The contents of total sugars and caffeine showed no significant difference between the means of the regions studied. Among the minerals studied, a significant difference in sulfur content was found between regions B and C at the same level of significance, whereas no significant difference in magnesium content was found among the three regions. For the zinc content, the Tukey test indicated a significant difference only between the samples from regions A and B. However, the network did not take the mean value of any parameter into consideration. It was trained, using a data set, to identify the characteristics of the samples and thus classify them [20].

The trained artificial neural network gave an excellent response, with 99.28% accuracy in training and 100% accuracy at the test and validation stages. That is, of the 172 samples used for classification, the trained network misclassified only one.
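The reported training accuracy is consistent with a single error in the training subset, as the short check below suggests. It assumes that the 80% training fraction of the 172 samples was rounded to 138; this figure is not stated explicitly and is used here only to illustrate the arithmetic.

```python
def accuracy_percent(n_samples, n_errors):
    """Percentage of correctly classified samples."""
    return 100.0 * (n_samples - n_errors) / n_samples

n_total = 172
n_train = round(0.8 * n_total)  # 138, assuming rounding to the nearest integer

train_accuracy = accuracy_percent(n_train, 1)
print(f"{train_accuracy:.2f}%")  # 99.28%
```

With no errors in the remaining samples, the same function gives 100% for the test and validation subsets.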

For the second case, classification by city, the automatic module of the Statistica® software [20] was used, in which a multilayer perceptron type of artificial neural network is built from the desired input and output variables, in this case the cities. The automatic module searches for the best artificial neural network for the problem and determines the learning algorithm, activation function, number of neurons and number of cycles. For this, it was necessary to set limits on the number of neurons in the hidden layer, thus decreasing the convergence time of the network. The lower and upper limits on the number of neurons were 10 and 40, respectively, and the division of the sample followed the first case: 80% for training, 10% for the test and 10% for validation.

The learning process in the automatic module is similar to that in the manual module, but it sweeps through the activation functions available in the software over the specified interval of the number of neurons, which results in a longer time to obtain the network. For more complicated cases, with a greater number of outputs, this methodology gives a good approximation to the ideal network, which can later be generated in the manual module with a shorter convergence time, since the parameters are then restricted.
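The idea of sweeping the activation functions over an interval of hidden-layer sizes can be sketched as an exhaustive search. The sketch below is not the algorithm of the software, whose search strategy is not documented here: the `sweep` function, the `evaluate` callback and the placeholder scoring function are all hypothetical, and no network is actually trained.

```python
def sweep(evaluate, activations, min_neurons=10, max_neurons=40):
    """Try every activation function and every hidden-layer size in the
    given interval and keep the configuration with the highest score."""
    best = None
    for activation in activations:
        for n_hidden in range(min_neurons, max_neurons + 1):
            score = evaluate(n_hidden, activation)
            if best is None or score > best[0]:
                best = (score, n_hidden, activation)
    return best

def toy_evaluate(n_hidden, activation):
    """Placeholder score, NOT a trained network: in practice this would
    train an MLP and return its accuracy on held-out samples."""
    bonus = 0.05 if activation == "logistic" else 0.0
    return 1.0 - abs(n_hidden - 25) / 100.0 + bonus

best_score, best_n, best_activation = sweep(toy_evaluate, ["tanh", "logistic"])
```

The number of candidate networks grows with the width of the interval and the number of activation functions, which is why limiting the interval shortens the search.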

For this case, 147 samples were used, from the cities of Abatiá, Brasilândia do Sul, Francisco Alves, Guapirama, Jacarezinho, Joaquim Távora, Londrina, Nova Esperança, Santa Mariana, São Jorge do Patrocínio and Uraí. The artificial neural network had 11 outputs because the samples from five cities, whose number was insufficient for training, testing and validation, were removed.

By the automatic module, five artificial neural networks were obtained and the best network, presenting 99% accuracy in training and 100% in the test and validation stages, was obtained with the BFGS algorithm, 18 neurons in the hidden layer, logistic function as activation function and 150 training cycles.

The input variable most important for classification in the trained artificial neural network was the quantity of sulfur in the sample, followed by the moisture content, phosphorus and calcium quantities. The quantity of iron was the least important in the classification, followed by the boron quantity and titratable acidity.

To confirm that the results of the artificial neural network obtained with the automatic module were repeatable, an artificial neural network was trained with the manual module, using as parameters those obtained with the automatic module. The accuracy of the network generated manually was the same as that obtained with the automatic module, and the same order of importance of the input variables was also observed.

Table 2 shows the average value, the number of samples, standard deviation and significance level of the parameters that most influenced the network used in the geographical classification of the green coffees produced in the cities of Parana–Brazil.

Table 2 Average parameters of the artificial neural network used in the classification by regions and cities

The input parameters that most influenced the coffee aroma and flavor were the sugar, protein and chlorogenic acid contents [39]. The quality of coffee as a beverage is highly dependent on its aroma and flavor, characteristics that determine its commercial value [39]. The aroma and flavor of the coffee beverage are highly complex, resulting from the combined presence of several volatile and non-volatile chemical constituents, especially the acids, aldehydes, ketones, sugars, proteins, amino acids, fatty acids, phenolic compounds and also the enzyme actions in some of these constituents that give reaction products whose compounds interfere in the beverage flavor. Among these compounds, 29 were identified as mainly accounting for the characteristic aroma of roasted and ground coffee [40].

Complex biochemical mechanisms are involved in the production of the characteristics of coffee color, aroma and flavor during roasting such as the Maillard and Strecker reactions, sugar caramelization, chlorogenic acid degradation, protein and polysaccharide degradation [41].

Proteins, the sources of most of the aroma and flavor characteristics of coffee, are denatured when roasted at temperatures below that of pyrolysis, and hydrolysis of the peptide bonds of the protein molecules occurs [42].

Sugars released by breakdown during roasting react with free amino acids and with the amino groups of proteins (Maillard reaction) to form aminoketones and aminoaldoses, which after further reactions give rise to numerous volatile and odorous compounds. As roasting continues, other reactions occur that involve degradation by interaction and breakdown of carbohydrates, nitrogen compounds, chlorogenic acids and other acids [43]. Sucrose, fructose and glucose are the main sugars found in coffee, and their concentration is related to the maturity stage of the bean. Sucrose increases as the bean ripens, while glucose and fructose decrease at the end of the cycle [24, 44, 45]. The sugars are among the most important precursors of the aromatic compounds formed during roasting [43] and were assessed in the sensory analysis of the coffee beverage.

Table 3 shows the average value, the number of samples, standard deviation and significance level of the sensory attributes of coffee beans analyzed.

Table 3 Main parameters used by the artificial neural network classification by regions and cities

The sensory attributes of the coffee analyzed were of little importance for the classification of the regions or cities by the neural network, which can be attributed to the small variability in the results (Table 3). However, some physicochemical parameters interfere with the formation of these attributes. By applying response surface analysis, it is possible to understand how these attributes behave as these parameters vary.

Figure 4 shows the response surface relating the coffee aroma to the protein and total sugar contents regardless of the producing region. In this figure, the best response for aroma was found when the protein content was at its highest level and the total sugar content was at an intermediate level (8% by mass). However, the same response was found at the upper limit for total sugars and intermediate protein levels.

Fig. 4 Response surface showing the influence of the protein and total sugar contents on coffee aroma regardless of the regions studied

The presence of phenolic compounds in coffee is responsible for astringency and interferes with the flavor. A large number of phenolic compounds have been identified in roasted coffee, and some of them originate from the chlorogenic acids and account in part for the aroma and bitterness of the beverage [46, 47].

Figure 5 shows the response surface that relates the beverage aroma to the total sugars and chlorogenic acids regardless of the producing region and highlights the contribution of the chlorogenic acids to aroma formation. High coffee aroma and flavor values are found in the presence of high chlorogenic acid concentrations and high total sugar values. Under the roasting conditions applied, the concentration of compounds formed by the degradation of total sugars may have been sufficiently high to mask the compounds resulting from chlorogenic acid degradation, which are responsible for the acidity and bitterness of the beverage.

Fig. 5 Response surface showing the influence of the total sugar and chlorogenic acid contents on coffee aroma regardless of the regions studied

Figure 6 shows that the presence of chlorogenic acid gives bitterness to the beverage in situations with low total sugar content. The presence of this sensorial attribute in the beverage is associated mainly with the degradation compounds of the chlorogenic acids formed during roasting [48, 49].

Fig. 6 Response surface showing the influence of the total sugars and chlorogenic acids on the bitterness of coffee regardless of the regions studied

The presence of phenolic compounds, represented by the chlorogenic acids, exerted a strong influence on the global quality of the beverage. Although high aroma and acidity scores were found in the presence of high values of chlorogenic acids, in the global assessment of the beverage, coffees with low chlorogenic acid contents and high total sugar values received the best assessments (Fig. 7). This combination of compounds is found in ripe beans and confirms the importance of the stage of ripeness for the quality of the coffee beverage [50].

Fig. 7 Response surface showing the influence of the total sugar and chlorogenic acid content on the aromatic quality of the coffee regardless of the regions studied

Conclusions

The multilayer perceptron type artificial neural networks proved to be useful tools for identifying coffee samples, since they correctly identified 100% of the samples at the test and validation stages for the division by planting region. For the databank containing information per city, the artificial neural network obtained with the automatic module presented 99% accuracy in training and 100% accuracy in the test and validation stages.

Using artificial neural networks, we can efficiently classify coffees from regions and cities in Parana and with this, identify the geographic origin of the coffee and associate quality with the producing location. Furthermore, we can assess the importance of some substances present in coffee in the formation of the sensorial attributes of the beverage.