Introduction

Coffee is the second most important commodity after petroleum, with an annual value of up to 14 billion USD worldwide. It is also one of the most widely consumed beverages in the world, although consumption varies widely with geographical location [1]. Its pleasant taste, aroma, stimulant effect, and health benefits are the main drivers of its consumption [2]. Moreover, coffee is used as a raw material in various industries, including the food and beverage industries. Currently, more than 121 countries are involved in the production, export, and processing of coffee worldwide (http://www.ecea.org.et/uk/).

Ethiopia is the birthplace of Arabica coffee (Coffea arabica L.), which is the only coffee species grown in the country [3, 4]. Coffee plays a vital role in the Ethiopian economy and is a major source of foreign exchange earnings. Moreover, the livelihood of about 25% of the country’s population depends directly or indirectly on activities associated with the production, processing, and transportation of the crop [5]. Although Arabica is the only species grown in Ethiopia, different varieties with distinct and impressive flavor profiles are produced in different regions of the country. The taste of these varieties depends on their geographical origin [6], as climate, altitude, and soil type all contribute to their unique characteristics. In addition, coffees cultivated in different parts of the country are genetically diverse [7, 8].

In Ethiopia, coffee is mainly produced by smallholder farmers distributed throughout the country. The quantity and quality of the coffee produced, however, vary significantly with the cultivation region. The regional coffee varieties also fetch different market prices depending on their production region and quality [7]. Owing to their reputation for quality, some of the regional coffees cultivated in the southern and eastern parts of the country are currently marketed under trademarks registered in various countries across the world [7, 8].

Amhara Region, located in northwestern Ethiopia, is endowed with suitable agro-ecology and fertile soil for the production of Arabica coffee (http://newbusinessethiopia.com/index.php/resources/29-agriculture/648/648). Consequently, coffee is produced in different parts of the region, with the major production areas located in East Gojjam, West Gojjam, and Awi Zones. Several tons of coffee are harvested from these areas every year, and the majority of the produce is used for local consumption. According to local perception, coffees produced in different areas of the region differ in flavor and stimulant effect. However, coffees produced in this region have not yet been characterized or included in the country’s coffee export classification list. As a result, the coffee varieties cultivated in Amhara Region remain underutilized, even though the country is striving to boost its market share by supplying coffee of high quantity and quality to the international market.

To be competitive in the international market, protocols are needed that help trace the sources of green coffee beans at different levels, including continental, national, and regional levels. Traditionally, paper-based traceability and cup-quality verification by a panel of trained cup tasters are used to characterize coffees and their growing areas. However, paper records can easily be counterfeited and may make it difficult to differentiate coffees of different origins [8]. Therefore, the development of analytical methods capable of verifying the regional sources of green coffee beans is crucial to protect the market share and reputation of coffee-growing countries, including Ethiopia.

In this regard, the fatty acid composition of green coffee beans has been used, for instance, to characterize Colombian coffees of different varieties grown in different regions [9]. Similarly, Villarreal et al. [10] have shown the effectiveness of fatty acid, elemental, and chlorogenic acid compositions in distinguishing Colombian coffees according to growing region. Mehari et al. [7, 8] have also developed chemometric models, based on phenolic and elemental contents, to characterize the geographical origins of commercially important coffee varieties grown in the southern, eastern, and western parts of Ethiopia. According to these authors, the chemical constituents of green coffee beans vary significantly with growing region and were found to be suitable markers of the geographical origin of the coffee varieties.

In addition to chlorogenic acids, alkaloids, and major elements, the determination of trace elements is important for characterizing the geographical origin of agricultural commodities as well as for evaluating their dietary importance and possible harmful effects [11]. In this regard, the determination of trace metals such as iron, copper, zinc, cobalt, and manganese is necessary, as they play important roles in biological systems. According to Liang et al. [12], even essential metals can have harmful effects when their intake significantly exceeds the required quantities. On the other hand, the determination of chromium, lead, and cadmium in food products is crucial, as they can be toxic even at trace levels. It is, therefore, necessary to quantify the contribution of coffee to the daily intake of minerals, especially for individuals who regularly consume coffee beverages.

The chemical constituents in general, and the mineral constituents in particular, of coffee beans produced in Amhara Region have not been investigated or compared with those of other local and overseas coffees. The aim of this study was therefore to evaluate the effectiveness of multi-element analysis combined with multivariate data analysis for differentiating green Arabica coffee beans grown in Amhara Region. This will support ongoing efforts toward certifying the region’s coffees with respect to geographical indication, as well as their identification, protection, and increased demand in the international market.

Materials and Methods

Description of the Study Area

The study was conducted on Arabica coffee grown by smallholder farmers in East Gojjam, West Gojjam, Awi, and Bahir Dar Especial Zones of Amhara Region, Ethiopia (Fig. 1). Coffees produced in these areas are mainly sold at local markets, and the income helps the farmers improve their livelihoods.

Fig. 1 Location map of sampling sites in Amhara Region

Equipment and Chemicals

An inductively coupled plasma optical emission spectrometer (ICP-OES; Perkin Elmer Optima 8000), a multi-block Kjeldahl digester (DK-20; Velp Scientifica), an electric blender (FW100 High Speed Universal Disintegrator), and a nylon sieve (200 μm mesh size) were used in the study. Reagents included perchloric acid (72%), nitric acid (69%), and standard solutions of K, Ca, Na, Si, Fe, Mg, Mn, Zn, Ni, Cu, Pb, Cr, Co, and Cd.

Coffee Sample Collection and Preparation

A total of 120 sun-dried coffee cherry samples, each weighing 500 g, were collected directly from coffee-growing farmers in various districts of East Gojjam (48 samples), West Gojjam (39 samples), Bahir Dar Especial (12 samples), and Awi (21 samples) Zones. All samples were collected during the 2017/2018 crop season (December 2017 to February 2018) (Table 1). Sampling districts in each zone were selected purposively based on their experience in coffee production, whereas individual farms within each district were selected randomly.

Table 1 List of sampling districts and number of samples analyzed

The husks of the coffee cherries were removed, and the resulting green beans were ground into powder using an electric blender. The coffee powders were then passed through a 200 μm mesh sieve.

Standard Solution Preparation and Calibration

Working standard solutions were prepared by serial dilution of stock solutions (1000 mg L−1) of each element. The wavelength and calibration conditions for each element are provided in Table 2.

Table 2 The wavelengths, calibration equations, and correlation coefficients corresponding to each of the elements determined
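Table 2 lists a calibration equation and correlation coefficient for each element. As a minimal sketch of how such a line is typically obtained and used, the Python below prepares working-standard concentrations by serial dilution, fits an ordinary least-squares line of emission intensity against concentration, reports Pearson's r, and inverts the line to read a digest concentration. The function names and the standard and intensity values in the example are hypothetical, not the data behind Table 2, and the ICP-OES software's own fitting options (for example weighting or a forced zero intercept) may differ.

```python
from math import sqrt


def serial_dilution(stock_mg_per_l, factor, steps):
    """Concentrations obtained by diluting a stock `steps` times by `factor` each time."""
    return [stock_mg_per_l / factor ** k for k in range(1, steps + 1)]


def fit_calibration(conc, signal):
    """Ordinary least-squares line signal = slope * conc + intercept, plus Pearson r."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    syy = sum((y - mean_y) ** 2 for y in signal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def concentration_from_signal(signal, slope, intercept):
    """Invert the calibration line to get the concentration in the measured solution."""
    return (signal - intercept) / slope


# Hypothetical working standards (mg/L) diluted from a 1000 mg/L stock,
# and hypothetical emission intensities (arbitrary units).
standards = serial_dilution(1000.0, 10.0, 3)
intensities = [100010.0, 10010.0, 1010.0]
slope, intercept, r = fit_calibration(standards, intensities)
print(slope, intercept, r)
print(concentration_from_signal(5010.0, slope, intercept))
```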

Digestion

The samples were digested by placing 0.5 g of powdered green coffee beans into the tubes of a Kjeldahl digester (Velp Scientifica DK 20; 20 positions with 250 mL tubes, Ø 42 mm) after optimization of the digestion procedure. A mixture of 4 mL HNO3 and 4 mL HClO4 with a stepwise temperature program (60 °C for 10 min, 80 °C for 10 min, 100 °C for 20 min, 120 °C for 20 min, and 160 °C for 30 min) was found to be optimal for coffee sample digestion (Table S1, Supplementary material). All digests were diluted to 50 mL with deionized water, filtered through filter paper, and analyzed by ICP-OES. The ICP-OES operating parameters were RF power, 1500 W; plasma flow, 12 L min−1; auxiliary argon flow, 1 L min−1; and nebulizer flow, 0.8 L min−1.
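Because each 0.5 g portion was digested and made up to 50 mL, a concentration read from the calibration line (mg L−1 in the digest) corresponds to a concentration in the dry beans of C_digest × 0.050 L / 0.0005 kg, that is, a factor of 100. The short Python sketch below encodes this conversion. The function name is hypothetical, and the optional reagent-blank subtraction is an assumption, since the text does not state whether blanks were subtracted.

```python
def digest_to_sample(c_digest_mg_per_l, volume_ml=50.0, mass_g=0.5, c_blank_mg_per_l=0.0):
    """Convert a digest concentration (mg/L) to a sample concentration (mg/kg dry weight).

    volume_ml and mass_g default to the 50 mL final volume and 0.5 g portion used here;
    c_blank_mg_per_l is an optional reagent-blank correction (an assumption, not stated
    in the text).
    """
    volume_l = volume_ml / 1000.0
    mass_kg = mass_g / 1000.0
    return (c_digest_mg_per_l - c_blank_mg_per_l) * volume_l / mass_kg


# A digest reading of 170.94 mg/L K corresponds to about 17,094 mg/kg in the beans.
print(digest_to_sample(170.94))
```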

Accuracy of the Analytical Method

To assess the accuracy of the analytical method, 0.5 g of powdered green coffee sample was fortified with a mixed standard solution at a concentration corresponding to the average measured value of each element and then subjected to the digestion and analysis procedure. The fortification was performed in triplicate, and the percentage recovery of each element was calculated.
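The recovery formula is not spelled out in the text; a common definition, assumed here, is the increase in measured concentration after fortification divided by the amount added. The Python sketch below applies it to triplicate fortifications and reports mean ± standard deviation, the form in which recoveries are given in the Results. The function names and the Mn numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def percent_recovery(c_fortified, c_unfortified, c_added):
    """Recovery (%) = (found in fortified - found in unfortified) / added * 100."""
    return (c_fortified - c_unfortified) / c_added * 100.0


def recovery_summary(fortified_replicates, c_unfortified, c_added):
    """Mean and sample standard deviation of recovery over replicate fortifications."""
    values = [percent_recovery(c, c_unfortified, c_added) for c in fortified_replicates]
    return mean(values), stdev(values)


# Hypothetical Mn data (mg/kg): unfortified sample 17.2, spiked with 17.2, triplicate results.
m, s = recovery_summary([35.0, 35.3, 34.4], c_unfortified=17.2, c_added=17.2)
print(f"Mn recovery: {m:.0f} ± {s:.0f}%")
```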

Statistical Analysis

Chemometric data analysis was performed using the statistical software packages SIMCA 13 (Umetrics, Sweden) and SPSS 20 (IBM Corp., USA). Each data set consisted of a matrix in which rows represented individual green coffee bean samples and columns represented element concentrations. A preliminary principal component analysis (PCA) was applied to visualize sample trends and evaluate the discriminatory characteristics of the determined elements. One-way analysis of variance (ANOVA) was used to test for significant differences in mean metal concentrations among the regional types of green coffee beans. Differences were considered significant at p < 0.05. In addition, linear discriminant analysis (LDA) was applied to construct classification models for predicting the geographical origin of Arabica coffee produced in Amhara Region.
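For readers who want to reproduce the one-way ANOVA outside SPSS, the sketch below computes the F statistic for one element from its per-zone concentrations. It is written in plain Python for transparency; the p-value then comes from the F distribution with (k − 1, N − k) degrees of freedom, which in SciPy is available as `scipy.stats.f.sf` (or in one step via `scipy.stats.f_oneway`). The function name and the K values in the example are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of concentrations per zone.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of zone means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of samples around their own zone mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical K concentrations (mg/kg) for three samples from each of two zones.
print(one_way_anova_f([[16300, 16400, 16500], [18000, 18100, 18200]]))
```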

Results and Discussion

Analytical Characteristics of the Method

The detection limit of the method was determined by digesting and analyzing blank samples: the standard deviation of triplicate blanks was multiplied by three to obtain the method detection limit (MDL) for each element. The results are presented in Table 3. The MDL ranged from 0.03 mg kg−1 for Cd to 100 mg kg−1 for K. The accuracy of the method was evaluated by the recovery experiment, which gave the following recoveries: K 110 ± 9%, Ca 110 ± 7%, Cu 90 ± 2%, Fe 105 ± 7%, Mg 96 ± 6%, Mn 105 ± 3%, Cr 95 ± 1%, Si 105 ± 8%, Ni 95 ± 2%, Pb 90 ± 2%, Cd 90 ± 5%, Na 106 ± 7%, Co 95 ± 3%, and Zn 110 ± 8%. These recoveries, all within 90 to 110%, indicate that the method was accurate.

Table 3 Method detection limits of each element (mg kg−1)

MDL method detection limit
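The detection-limit rule described above (three times the standard deviation of replicate blanks) is simple enough to state as code. The sketch assumes the blank results have already been converted to mg kg−1 with the same 50 mL / 0.5 g factor as the samples; the function name and the Cd blank values are hypothetical.

```python
from statistics import stdev


def method_detection_limit(blank_results, k=3.0):
    """MDL = k * sample standard deviation of replicate blank results (k = 3 here)."""
    if len(blank_results) < 2:
        raise ValueError("at least two blank replicates are needed for a standard deviation")
    return k * stdev(blank_results)


# Hypothetical Cd blank results (mg/kg equivalents) from three digested blanks.
print(method_detection_limit([0.010, 0.020, 0.030]))
```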

Concentration of Elements in Coffee Samples

The mean concentrations of the studied metals in the samples from each zone are given in Table 4. In all samples, the major constituents were K, Mg, Ca, Na, Si, and Fe. These elements are considered macroelements based on their concentrations in various samples [13, 14]. Their mean concentrations in the studied samples were 17,094, 1702, 972, 200, 63.4, and 52.6 mg kg−1, respectively, in the order K >> Mg > Ca > Na > Si > Fe. These concentrations are within the range reported for Arabica green coffee beans from various countries [7, 13, 15]. The results are generally in agreement with those of other studies in which K was the most abundant element, followed by Mg and Ca [7, 9, 13, 14, 16, 17].

Table 4 Concentrations (mg kg−1 dry weight) determined in Arabica green coffee beans grown in various zones of Amhara Region, Ethiopia

Green coffee beans collected from West Gojjam Zone had the highest mean concentrations of K (18,127 mg kg−1), Mg (1978 mg kg−1), Ca (1130 mg kg−1), Na (230 mg kg−1), and Si (83.0 mg kg−1), followed by samples collected from Bahir Dar Especial Zone (17,364 mg kg−1 K, 1745 mg kg−1 Mg, 943 mg kg−1 Ca, 170 mg kg−1 Na, and 71.2 mg kg−1 Si) (Table 4). K and Mg were lowest in samples collected from East Gojjam Zone, with mean concentrations of 16,403 and 1425 mg kg−1, respectively, while the concentrations of Ca (870 mg kg−1) and Na (170 mg kg−1) were lowest in coffee samples collected from West Gojjam Zone.

One-way ANOVA (α = 0.05) revealed that the concentrations of K, Mg, Ca, Na, and Si differed significantly among all zones, except between East Gojjam and Awi Zones for K and between East Gojjam and Bahir Dar Especial Zones for Ca. The concentrations of the microelements (Cu, Zn, Co, Ni, Mn, and Cr) are in good agreement with results reported for Ethiopian coffees and coffees from other countries [7, 13, 14, 16,17,18]. However, the concentration of Mg recorded in the present study was almost 10 times higher than that reported by Bertrand et al. [9] for Colombian coffee, which might be due to differences in soil chemistry and other factors between the study areas.

Coffee is regarded as a possible source of micro- and trace elements that are essential for the normal functioning of the human body [11, 13]. The presence of elements such as Fe, Zn, Mn, Cu, Co, Ni, and Cr in coffee is widely reported, although their concentrations vary with coffee type [13]. In the coffees studied here, these elements followed the order Fe (52.6 mg kg−1) > Cu (23.8 mg kg−1) > Mn (17.2 mg kg−1) > Zn (16 mg kg−1) > Co (5.2 mg kg−1) > Ni (2.79 mg kg−1) > Cr (1.33 mg kg−1).

Coffee samples collected from Bahir Dar Especial Zone contained the highest concentrations of microelements such as Fe, Mn, and Co, while the lowest concentrations of Fe, Mn, and Co were found in coffees from East Gojjam, Awi, and West Gojjam, respectively (Table 4). The concentration of Fe differed significantly among sampling zones. Similarly, the concentration of Co differed significantly among sampling zones, except between West Gojjam and Awi Zones and between East Gojjam and Bahir Dar Especial Zones. A wide variation in Zn concentration was also observed among the sampling zones, with the lowest mean concentration in coffee samples from East Gojjam Zone (4 mg kg−1) and the highest in samples from Awi Zone (22 mg kg−1). Such variations may arise from differences in the physicochemical properties of the soils of the sampling areas and in coffee genotypes.

The concentrations of Fe, Cu, and Mn found in the present study agree with the data reported by Mehari et al. [7] and Habte et al. [13] for other Ethiopian coffee types. However, the concentration of Zn found in the present study was about five times higher than that reported for other Ethiopian coffee types collected from the eastern, southern, and western parts of the country [13]. Compared with the results for Brazilian, Colombian, and Hawaiian coffee varieties, the results of the present study show no consistent trend (Table 5). While the concentrations of Fe and Mn in the coffee samples are similar to literature values, the concentrations of Zn, Cu, and Co varied widely among samples from different countries (Table 5).

Table 5 Comparison of the elemental composition (mg kg−1 dry weight) of Arabica green coffee beans reported in the literature with results from the present study

The mean concentration of Cr ranged from 0.921 to 1.84 mg kg−1 across zones and did not differ significantly (P > 0.05) among the coffee beans. Nevertheless, the concentrations of Cr reported for other coffee varieties by Habte et al. [13] and Krivan et al. (1993) were about five times lower than those found in the present study. Lead was detected in all coffee samples of the present study, with an overall average concentration of 0.38 mg kg−1. Cadmium, on the other hand, was detected only in samples collected from West Gojjam and Awi Zones, with an average concentration of 0.012 mg kg−1. The determined concentrations of Pb and Cd are comparable with values reported for some coffee types in the literature (Table 5). In contrast, Pb and Cd concentrations about 100 times higher have been reported in Brazilian coffee [14].

Overall, the results of the present study indicate that the metal content of coffee beans varies considerably with geographical origin, which can be associated with soil type, environmental conditions, fertilizer use, and other factors; this is consistent with the findings of other researchers [17].

Table 4 abbreviations: SD standard deviation, ND not detected, Min minimum, Max maximum

Table 5 abbreviations: NR not reported, ND not detected

Principal Component Analysis

The determined metal concentrations were subjected to PCA to explore trends or patterns in the distribution of minerals among the coffee bean samples from the four zones. PCA is an unsupervised technique: it projects the samples onto directions of maximum variance without using their class labels, and hence avoids bias toward predefined groups [20, 21]. PCA provides immediate insight into the data structure, revealing possible clustering of the observations and any outliers in the data set. Furthermore, PCA makes it possible to identify variables that contribute similar information to the model and those that provide unique information about the samples under investigation [21].

The results of PCA are interpreted in terms of scores and loadings. Scores are the coordinates of the samples in the new principal component space, and loadings are the weights by which the original variables are combined to obtain the scores [8, 21]. The scores plot (Fig. 2) shows the distribution of the studied coffee samples in the space defined by the scores of the first two principal components.

Fig. 2 Scores plot of the first two principal components showing the distribution of coffee samples from the four zones of Amhara Region

A four-component model was developed, with the largest share of variation along PC1 (72.2%). The cumulative variation modeled by the four PCs was 96% (R2X(cum) = 0.96), where R2X is the proportion of the variation in the data set explained by the model. PC1 captured the majority of the variation among samples (72.2%), whereas PC2 explained only 11.5%, suggesting that variation among samples within the same zone was much smaller than variation between zones. The cross-validation parameter Q2(cum), the proportion of the variation in the data set that can be predicted by the model, was 66.5%. This indicates good predictive ability, in line with Atlabachew et al. [20].

The elements best fitted by the model, each with more than 72% of its variability explained, were Ca, Mg, K, Na, Si, and Fe. The scores plot (Fig. 2) shows that the coffee samples tended to cluster according to their geographical origin. Coffee samples from West Gojjam Zone differed in mineral composition from coffees from the other zones, and most of them were separated along PC1 from those collected from East Gojjam, Awi, and Bahir Dar Especial Zones.

The loading plot for the first two principal components is presented in Fig. 3. It shows how the individual elements correlate with each other and contribute to the model: elements far from the origin have a strong impact on the model, whereas those close to the center have a weaker influence [8]. The plot shows that K, Mg, Ca, and Na play the largest role in discriminating green coffee beans from the various zones of Amhara Region. The first component is strongly influenced by K and Mg, whereas Ca and Na contribute to separation along the second component.

Fig. 3 Loadings plot for the first two principal components of the PCA model constructed from the concentration of metals quantified in coffee of Amhara Region
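The quantities reported for the PCA model (scores, loadings, and the fraction of variance R2X per component) can be illustrated with a short NumPy sketch based on the singular value decomposition. It assumes autoscaled data (each element mean-centred and divided by its standard deviation), a common choice when variables span very different ranges, as K and Cd do here; the scaling actually applied in SIMCA for this study is not stated. Cross-validated Q2 is not computed, the function name is hypothetical, and the data in the example are synthetic.

```python
import numpy as np


def pca_autoscaled(X, n_components):
    """PCA of an autoscaled samples-by-elements matrix via SVD.

    Returns scores (samples x PCs), loadings (elements x PCs),
    per-component R2X and cumulative R2X.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale: centre each element and scale it to unit variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Xs = U @ diag(S) @ Vt; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Xs @ loadings  # equivalently U[:, :n_components] * S[:n_components]
    r2x = S ** 2 / np.sum(S ** 2)  # fraction of total variance per component
    return scores, loadings, r2x[:n_components], np.cumsum(r2x)[:n_components]


# Synthetic example: three strongly correlated "elements" plus one unrelated one.
rng = np.random.default_rng(0)
t = rng.normal(size=60)
X = np.column_stack([
    t + 0.1 * rng.normal(size=60),
    2 * t + 0.1 * rng.normal(size=60),
    -t + 0.1 * rng.normal(size=60),
    rng.normal(size=60),
])
scores, loadings, r2x, r2x_cum = pca_autoscaled(X, 2)
print("R2X per PC:", np.round(r2x, 3), "cumulative:", np.round(r2x_cum, 3))
```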

Linear Discriminant Analysis

Based on the concentrations of the elements, an attempt was made to develop a chemometric method for authenticating the production region of the coffee beans, and LDA was applied for that purpose. First, to assess the potential of individual elements for constructing an LDA model, a test of equality of group means was performed; this test measures each element’s discriminating potential before the model is created. It was carried out by applying one-way ANOVA to each element with sample region as the factor. According to the significance values (Table S2, Supplementary material), every element except Pb and Cr was significant (p < 0.05) for the discriminant model. Wilks’ λ is another measure of a variable’s potential; smaller values indicate that the variable is better at discriminating between regions. The results (Table S2, Supplementary material) suggest that K was the best in this regard. Consequently, all of the elements except Pb and Cr were used simultaneously to construct the LDA model, and all four groups of samples were assigned equal prior probabilities.
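For a single variable, Wilks’ λ in the test of equality of group means is the within-group sum of squares divided by the total sum of squares, so it equals 1 when the zone means coincide and approaches 0 when the zones are well separated. The Python sketch below computes it per element and ranks elements from most to least discriminating. The function names and concentrations are hypothetical, and SPSS's full output (which also reports the associated F and significance) is not reproduced.

```python
def univariate_wilks_lambda(groups):
    """Wilks' lambda for one element: SS_within / SS_total across zone groups."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return ss_within / ss_total


def rank_elements(data):
    """Sort element names by ascending Wilks' lambda (best discriminator first)."""
    return sorted(data, key=lambda element: univariate_wilks_lambda(data[element]))


# Hypothetical concentrations (mg/kg) in two zones, three samples each.
data = {
    "Pb": [[0.30, 0.40, 0.50], [0.35, 0.45, 0.38]],
    "K": [[16300, 16400, 16500], [18000, 18100, 18200]],
}
print(rank_elements(data))
```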

Three discriminant functions were computed: the first explained 71.6% of the variation, the second 18.2%, and the third 10.3%. The significance of the three functions in discriminating among regions was assessed with Wilks’ λ (p < 0.05), which tests the hypothesis that the means of the functions are equal across groups. The significance values (Table S3, Supplementary material) showed that all three functions contributed significantly to the model. Wilks’ λ also gives the proportion of the total variance in the discriminant scores that is not explained by differences among the groups. The value of Wilks’ λ for the three functions combined was 0.035, indicating that only about 3.5% of the variance in the discriminant scores was left unexplained by differences among the zones.
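The percentages of variance and the overall Wilks’ λ reported for the discriminant functions both follow from the eigenvalues of W⁻¹B, where W and B are the within-group and between-group scatter matrices: each function's share is its eigenvalue divided by the sum of the eigenvalues, and Wilks’ λ is the product of 1/(1 + eigenvalue) over the functions. With four zones there are at most three functions, as in the model above. The NumPy sketch below computes these quantities; the function name is hypothetical, the data are synthetic, and it is an illustration rather than a reimplementation of SPSS's discriminant procedure.

```python
import numpy as np


def discriminant_eigen(X, y):
    """Eigenvalues of inv(W) @ B, % variance per function, and overall Wilks' lambda.

    X: samples x elements, y: zone label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centred = Xc - mean_c
        W += centred.T @ centred
        d = (mean_c - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)
    # At most min(p, g - 1) discriminant functions have non-zero eigenvalues.
    n_functions = min(p, len(classes) - 1)
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
    eigvals = np.clip(np.sort(eigvals)[::-1][:n_functions], 0.0, None)
    pct_variance = 100.0 * eigvals / eigvals.sum()
    wilks = float(np.prod(1.0 / (1.0 + eigvals)))
    return eigvals, pct_variance, wilks


# Synthetic example: four separated groups of 30 samples, three variables.
rng = np.random.default_rng(1)
centres = np.array([[0, 0, 0], [3, 0, 0], [0, 2, 0], [0, 0, 1]], dtype=float)
X = np.vstack([c + rng.normal(size=(30, 3)) for c in centres])
y = np.repeat(["A", "B", "C", "D"], 30)
eigvals, pct, wilks = discriminant_eigen(X, y)
print(np.round(pct, 1), round(wilks, 3))
```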

The contribution of each element to the LDA model was assessed from the computed discriminant function coefficients (Table S4, Supplementary material), which indicate the correlation between each element and the discriminant functions. Elements with the highest absolute coefficients on the first discriminant function are the strongest discriminators among the groups of samples. Potassium was the largest contributor to the first function, followed by Na and Mg.

Application of LDA to the elemental concentrations resulted in an overall correct classification rate of 94.2%. The model correctly classified 113 of the 120 samples (Table S5, Supplementary material), while 7 East Gojjam samples were misclassified as Awi (3 samples) or Bahir Dar Especial (4 samples). The distribution of the coffee samples on the plane formed by the first two discriminant function scores is shown in Fig. 4.
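The overall classification rate follows directly from the confusion matrix. The Python sketch below rebuilds that matrix from the counts stated in the text (48 East Gojjam, 39 West Gojjam, 12 Bahir Dar Especial, and 21 Awi samples, with 3 East Gojjam samples assigned to Awi and 4 to Bahir Dar Especial and all others correct) and recomputes the overall and per-zone rates. The function names are hypothetical, and the per-zone figures are derived here rather than reported in the text.

```python
ZONES = ["East Gojjam", "West Gojjam", "Bahir Dar Especial", "Awi"]

# Rows: actual zone; columns: predicted zone (same order as ZONES).
CONFUSION = [
    [41, 0, 4, 3],  # East Gojjam: 7 of 48 misclassified
    [0, 39, 0, 0],  # West Gojjam
    [0, 0, 12, 0],  # Bahir Dar Especial
    [0, 0, 0, 21],  # Awi
]


def overall_accuracy(cm):
    """Percentage of all samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


def per_zone_accuracy(cm, labels):
    """Percentage of each actual zone's samples assigned to that zone."""
    return {label: 100.0 * cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


print(f"overall: {overall_accuracy(CONFUSION):.1f}%")  # 113 of 120 samples
for zone, acc in per_zone_accuracy(CONFUSION, ZONES).items():
    print(f"{zone}: {acc:.1f}%")
```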

Fig. 4 Scatter plot showing the distribution of green coffee bean samples on the space created by the scores of the first two discriminant functions

The first function effectively differentiated East Gojjam coffees from West Gojjam coffees. This function explained 71.6% of the variation in the data set and was strongly associated with K (Table S4, Supplementary material). Hence, the grouping of West Gojjam coffees on the positive side of this function can be attributed mainly to their higher K content relative to East Gojjam coffees. The second function, on the other hand, effectively differentiated Bahir Dar Especial coffees from Awi coffees. This function was strongly and positively associated with Mn and negatively associated with Cu (Table S4, Supplementary material). Hence, the grouping of Bahir Dar Especial coffees on the positive side of the second function can be attributed mainly to their higher Mn and lower Cu contents relative to Awi coffees.

To check the reliability of the LDA model, its recognition and prediction abilities were computed using separate training and validation sets: the full sample set was divided into a training set (80 samples) and a validation set (40 samples). The prediction ability of the model, the percentage of validation-set samples correctly classified by the model built on the training set, was 93.4%. Similarly, the recognition ability, the percentage of training-set samples correctly classified into their respective groups, was 94.1%.
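The text does not say how the 80/40 split was made. One possibility, assumed here, is a random split stratified by zone with two thirds of each zone in the training set; with the zone sizes given in the sampling section (48, 39, 12, and 21 samples) this yields exactly 80 training and 40 validation samples. The Python sketch below implements that split and the percentage used for both recognition ability (training set) and prediction ability (validation set). The function names and seed are hypothetical, and the classifier itself is not shown; `percent_correct` applies to predictions from any fitted model.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into training and validation sets, zone by zone.

    Each zone contributes round(train_fraction * n_zone) samples to training.
    """
    rng = random.Random(seed)
    by_zone = {}
    for index, label in enumerate(labels):
        by_zone.setdefault(label, []).append(index)
    train, valid = [], []
    for indices in by_zone.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train.extend(shuffled[:n_train])
        valid.extend(shuffled[n_train:])
    return sorted(train), sorted(valid)


def percent_correct(actual, predicted):
    """Recognition ability (training set) or prediction ability (validation set), in %."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


labels = (["East Gojjam"] * 48 + ["West Gojjam"] * 39
          + ["Bahir Dar Especial"] * 12 + ["Awi"] * 21)
train_idx, valid_idx = stratified_split(labels, 2 / 3)
print(len(train_idx), len(valid_idx))
```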

Conclusion

The multi-element composition of coffee samples collected from four major coffee-producing zones of Amhara Region, northwestern Ethiopia, was analyzed by ICP-OES, and the data were used to discriminate the coffees according to their zones of origin. K, Mg, Ca, and Na were the most abundant elements in all samples studied. Among the studied elements, K, Mg, and Na were found to be important chemical predictors of the production zone. The LDA model constructed from the elemental compositions effectively classified the coffee samples into their respective production zones, with an overall accuracy of 94.2% and a prediction ability of 93.4%. These results could provide empirical support for ongoing efforts to brand the coffee produced in Amhara Region, as has been done for other well-known coffee-producing regions of Ethiopia.