Introduction

Coffee is one of the major commercial products in world trade after petroleum products (Nair 2010). Given this importance in world markets, quality control of coffee beans matters, because coffee prices vary depending on the type and quality of the coffee (Zhang et al. 2018; Janandi and Cenggoro 2020).

In some food industries, manual techniques are often applied to identify the types of coffee beans and to classify them according to their physical state. These techniques rely on visual examination by a trained expert. This somewhat subjective process can be tedious and time-consuming. In addition, the expert’s mood swings, mental state, and emotions such as stress may negatively affect the classification process (Arboleda 2019). Therefore, computer vision applications stand out as a better option than manual techniques for classifying products and separating them from unwanted substances. Computer vision applications can automatically extract useful information from a particular object in an image and analyze it. In general, a computer vision algorithm for defect or class detection consists of two main stages. First, the objects are accurately segmented from the background. Second, diverse physical features of the objects are extracted and, by using classical machine learning methods, these features are classified or the defective objects are detected.

Coffee is grown in more than 60 tropical countries around the world and constitutes a large part of the exports of the countries where it is grown. Coffee beans should be classified so that their quality can be determined before export. The variety and quality of coffee are determined according to size, shape, color, acidity, aroma, and taste. The shape of coffee can vary depending on where it is grown. However, due to the similarity in taste, color, and shape, it is quite difficult for the consumer to identify the type of coffee (Waliyansyah and Hasbullah 2021). Counterfeiting is increasing in coffee commodities. In imitation, the appearance of the coffee is altered and beautified in order to make a large profit. Coffee beans are usually classified according to expert opinion and tradition. Since these assessments may be inconsistent, the classification of coffee via digital images is of great importance. Compared with human assessment, classifications made on digital images can be more precise, objective, and sensitive (Vogt 2020; Adiwijaya et al. 2022a). They also have advantages over manual methods, which are tedious, time-consuming, expensive, and less efficient. Because of these problems, image processing and classification techniques have started to be used for coffee bean classification and determination. However, studies in this area are limited. Therefore, it is necessary to develop this field through further work and to integrate the results into production facilities (Ansar et al. 2021).

An examination of the literature shows that studies have been carried out with different types of coffee beans and that classification and defect detection have been performed with different machine learning methods. These studies used Artificial Neural Network (ANN) and Naive Bayes (NB) algorithms to classify beans according to CIE L*a*b* color values (Oliveira et al. 2016; Ropelewska et al. 2022), ANN and K-Nearest Neighbor (KNN) algorithms with various morphological features (Arboleda et al. 2018), 22 different machine learning algorithms (Arboleda 2019), the Multi-Layer Perceptron (MLP) algorithm with shape, size, and color features (Pizzaia et al. 2018), and Support Vector Machine (SVM), Deep Neural Network (DNN), rule inference (Koklu et al. 2014), and Random Forest (RF) algorithms (Santos et al. 2020).

In addition to all these studies, deep learning methods make it possible to extract features from images and to perform classification and defect detection based on these features. Among the studies in the literature that use deep learning methods, several utilize Convolutional Neural Networks (CNN) and their architectures (Janandi and Cenggoro 2020; Pinto et al. 2017; Huang et al. 2019, 2020; Rivalto et al. 2020; Fuentes et al. 2020; Kuo et al. 2019). Moreover, there are studies that extract features from images of coffee beans with the help of CNN and compare the results obtained by classifying these features with machine learning methods. In the next section, the details of the studies on coffee beans in the literature are given.

The objectives of the study can be listed as follows:

  • To determine which classification technique is more successful.

  • To examine the distinguishability of coffee varieties with images of coffee beans.

  • To study to what extent coffee beans are classified.

  • To examine the availability of models developed for the classification of coffee beans in automated systems.

The contributions of this study to the literature are given below.

  1. A total of 1554 images of 3 different coffee types, which are Espresso, Kenya, and Starbucks Pike Place coffee beans, were obtained through a specially developed mechanism, and a dataset was created.

  2. In order to distinguish coffee beans, 4 different CNN-based models were used.

  3. The coffee beans were classified with the models created, and different performance metrics were obtained for each model and each class. The average classification success of each model was compared, and the model having the highest success was determined.

  4. It is foreseen that the separation of coffee beans will be ensured by the methods performed in this study.

This study consists of five sections. In Section "Introduction", the developed work is contextualized. In Section "Related Works", the coffee classification process is described. In Section "Material and Methods", the developed context is explained. In Section "Experimental Results", the results are presented. Lastly, conclusions and perspectives for future studies take place in Section "Conclusion".

Related Works

In the literature, there are studies conducted with different coffee beans and in which classification processes are carried out with different methods. Researchers, in these studies, have used different machine learning algorithms and different deep learning models for classification processes. Some of the studies conducted on coffee beans in the literature are given below.

de Oliveira et al. (2016) used Artificial Neural Networks (ANN) and NB classifier to classify green coffee beans according to CIE L*a*b* color values. Coffee beans were classified into 4 groups as whitish, red green, green, and bluish green. A classification accuracy of 100% was obtained with NB (de Oliveira et al. 2016).

Pinto et al. (2017) utilized Convolutional Neural Network to classify beans according to 6 different defect types (black, sour, fade, peaberry, damage, and normal) by using coffee bean images. As a result of the classification, accuracy values of 72% and above were obtained according to the defect types (Pinto et al. 2017).

In the study of Arboleda et al. (2018), various morphological features were obtained from 195 training images and 60 test images for the automatic classification of coffee beans. Classification was performed with Artificial Neural Networks (ANN) and K-Nearest Neighbor (KNN). The highest classification accuracy, 96.66%, was obtained with ANN (Arboleda et al. 2018).

Pizzaia et al. (2018) used Multi-Layer Perceptron (MLP) algorithm by extracting shape, size, and color features in order to classify arabica coffee beans as good and bad. As a result of the study, 94.10% classification accuracy was obtained (Pizzaia et al. 2018).

Fukai et al. (2018) aimed to develop an automatic coffee bean classification system for coffee bean producers by using state-of-the-art machine learning techniques and Raspberry Pi in their studies. As the first step of the system development, they performed classification processes with approximately 13 thousand images of 5 different types of coffee beans through deep Convolutional Neural Networks (CNN) and Support Vector Machines (SVM). Classification results were compared, and it was stated that the results obtained with CNN are higher than those obtained with SVM. Additionally, Raspberry Pi camera module and CNN results were compared, and it was concluded that higher results were obtained with CNN. The classification accuracies obtained through CNN for coffee beans varied between the range of 75% and 95% (Fukai et al. 2018).

Arboleda (2019) extracted 4 morphological features of Liberica, Robusta, and Excelsa green coffee bean types. These features were classified by 22 different machine learning algorithms, and their performances were compared. The results of the study showed that the highest accuracy rate in the classification of green coffee beans was achieved in the Coarse Tree Algorithm with 94.10% (Arboleda 2019).

Huang et al. (2019), in their study aiming to separate imperfect and perfect coffee beans with Convolutional Neural Network, obtained 1000 coffee bean images in the imperfect class and 1000 coffee bean images in the perfect class. They augmented the data by using rotation and flip methods in the obtained data set. At the end of this process, the network was trained on 72 thousand images, 36 thousand of which are imperfect and 36 thousand are perfect. The authors stated that they obtained an accuracy value of 93.34% as a result of the classification (Huang et al. 2019).

Kuo et al. (2019) proposed a control scheme to detect defects in coffee beans, named Hough Circle-Assisted Deep-Network Scheme. This scheme using a deep learning-based YOLOv3 network targets small circular objects and can accurately detect imperfect beans. With the proposed scheme, high control accuracy and precision were achieved in the detection of defects in coffee beans (Kuo et al. 2019).

Rivalto et al. (2020) used a deep learning-based Convolutional Neural Network to identify the types of coffee beans. Their study was carried out on 617 images of 4 different coffee beans grown in Indonesia. As a result of the training, it was stated that coffee types were classified with an accuracy of 74.26% (Rivalto et al. 2020).

Gope and Fukai (2020) aimed to distinguish the Peaberry, a bean that is relatively rounder and has a different flavor than flat coffee beans, from the others. For this purpose, classification was performed by using deep Convolutional Neural Networks (CNN) and Support Vector Machine (SVM). In the study, images of 1900 normal and 1438 Peaberry beans were used. In addition, classification was carried out with 4 different image sizes (32 × 32, 64 × 64, 128 × 128, and 256 × 256) in the training. As a result, it was stated that higher accuracy values (97% and above) were achieved with CNN compared to SVM, and that the image sizes did not affect the classification results (Gope and Fukai 2020).

Santos et al. (2020) aimed to analyze the importance of coffee beans’ shape and color characteristics via different machine learning techniques such as Support Vector Machine (SVM), Deep Neural Network (DNN), and Random Forest (RF) to evaluate the defects of coffee beans. For this purpose, an algorithm written in Python was used in order to extract shape and color features from images of coffee beans. Among the variables used, gmean from RGB (Red, Green, and Blue) color space and Vmean from HSV (Hue, Saturation, and Value) color space stand out as the most relevant features for classification models. Moreover, it was stated that accuracy values of 88% and above were obtained as a result of the classification processes of coffee beans performed with SVM, DNN, and RF machine learning techniques (Santos et al. 2020).

Janandi and Cenggoro (2020) developed a deep learning-based mobile application to automatically classify the quality of coffee beans via a mobile phone camera. One hundred sixty coffee bean images were collected to be utilized in the models created for the study in which they performed classification processes by using ResNet-152 and VGG16 deep learning architectures. At the end of the study, with the ResNet-152 architecture, the highest accuracy value of 73% was achieved (Janandi and Cenggoro 2020).

Huang et al. (2020) used CNN to classify coffee beans according to their quality. They obtained a total of 2000 images, 1000 each of coffee beans labeled as good and bad. Considering that these images may be insufficient for training the CNN model, rotation and flip procedures were used, and the data were augmented 36 times. In conclusion, an accuracy value of 94.63% was achieved in the classification of coffee beans as good and bad (Huang et al. 2020).

In their study, Fuentes et al. (2020) aimed to classify the coffee fruit as ripe or unripe according to their color and shape by using deep learning algorithms. Within the scope of the study, the algorithm was trained with a total of 196 coffee fruit images, 108 ripe and 88 unripe. As a result, it was stated that the classification process was performed with 97.6% accuracy (Fuentes et al. 2020).

Waliyansyah and Hasbullah (2021) used NB, Tree, SVM, and Logistic Regression machine learning methods to classify two different coffee beans. With these methods, digital image processing was performed on a total of 58 coffee bean images from two types to obtain various features and perform classification operations. As a result of the study, they stated that coffee beans were successfully classified and that the most successful method was SVM (Waliyansyah and Hasbullah 2021).

Suyoto et al. (2021) performed texture analysis and feature extraction on grayscale versions of 120 coffee bean images to determine the diversity of coffee beans. Using the obtained attributes with the SVM algorithm, they achieved an accuracy rate of 87.27% in detecting coffee bean varieties (Suyoto et al. 2021).

Jumarlis et al. (2022) used the GLCM (Gray Level Co-Occurrence Matrix) and KNN methods to detect the defects of coffee beans. In addition, they have designed a website where farmers can perform defect detection using images of coffee beans. As a result of the study, they noted that the proposed system for detecting defects in coffee beans has achieved an accuracy of 90% (Jumarlis et al. 2022).

In their study, Lee and Jeong (2022) developed a CNN-based model to predict normal and defective beans from two-class coffee bean images. In the experimental results, they stated that they obtained an average classification accuracy of 90.44% (Lee and Jeong 2022).

Adiwijaya et al. (2022a) used the KNN machine learning method to classify coffee bean quality. They focused on the color features of coffee bean images for use in classification. In the study conducted using 90 coffee beans in three different classes, they stated that they achieved a classification accuracy of 83% (Adiwijaya et al. 2022b).

Material and Methods

In this section, information about the coffee dataset, machine learning models, and performance metrics used in the study will be given.

Coffee Dataset

The coffee plant grows only in certain parts of the world. Even for the same type of coffee, the aroma can change according to the soil, climate, precipitation, and the way the coffee is harvested. The three types of coffee beans used in the study are grown in different regions. The origin of the Espresso coffee bean in the study is Ethiopia, the origin of the Kenya coffee bean is Kenya, and the origin of the Starbucks Pike Place coffee bean is Mexico, Costa Rica, and Colombia (Seninde and Chambers 2020). In Fig. 1, the countries of origin of the coffee beans are shown on the map.

Fig. 1 Origin countries of coffee beans used in the study

In this study, images of three different coffee beans were used with the aim of recognizing the coffee type via deep learning. In order to acquire the dataset, images of espresso, Kenya, and Starbucks Pike Place coffee beans were collected through a specially created mechanism. The setup created for the study has a camera, a box where the images will be captured, a computer, and a Computer Vision System (CVS) that allows the images to be saved in the desired color and resolution. The system used to take images of coffee beans is shown in Fig. 2.

Fig. 2 Coffee image acquisition system

A total of 1554 coffee bean images were obtained with the created image acquisition mechanism. Five hundred thirty of these images are espresso coffee bean, 502 of them are Kenya coffee bean, and 522 of them are Starbucks Pike Place bean images. Images are RGB and have a resolution of 400 × 400 pixels. Figure 3 gives the sample coffee bean images obtained.

Fig. 3 Sample images of coffee beans used in the study

The distribution of images in the coffee bean image dataset by classes is shown in Fig. 4.

Fig. 4 Distribution of images in the dataset by classes

Convolutional Neural Network

CNN is a deep learning method that includes many layers with a complex structure and is frequently used in solving image processing problems (Albawi et al. 2017). The CNN method can work as an end-to-end classifier: its layers extract features from the input data, and learning and classification are performed with the extracted features (Koklu et al. 2021).

To perform these processes, a CNN contains a convolution layer, where various features are obtained by applying filters to the image step by step; a pooling layer, where the large-scale data coming from the convolution layer is simplified in order to facilitate learning; an activation layer, which prevents values from falling outside the applicable data range; a fully connected layer, which is the artificial neural network layer that performs the classification; and lastly a softmax layer, where the classes are separated (Guo et al. 2016).

In the convolution layer, image features are extracted by applying various filters step by step to regions of the image. The number of features obtained can be increased or decreased by setting the step size (stride) and the number of filters used in this layer. However, this number should be set at an optimum level since a large number of features will make learning difficult for the network (Kandel and Castelli 2020).
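The sliding-filter operation described above can be illustrated with a minimal sketch in plain Python. The function name `conv2d`, the toy 4 × 4 image, and the vertical-edge filter are illustrative assumptions and are not taken from the models used in the study; padding and multiple channels are omitted for brevity.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation of one grayscale image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter with the image region it currently covers and sum.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical toy image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2 x 2 filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge, stride=1)    # 3 x 3 output
coarse_map = conv2d(image, vertical_edge, stride=2)     # 2 x 2 output
```

With a stride of 1 the filter responds only in the middle column, where the edge lies. With a stride of 2 the filter skips over the edge, which shows how the step size changes both the number and the content of the extracted features.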

The pooling layer reduces the amount and complexity of the data coming from the convolution layer. It also ensures that the image is transferred to the next layer at a reduced size without losing its essential properties. Several pooling types, such as maximum pooling and average pooling, are described in the literature. In this layer too, the settings should be chosen so that classification is not adversely affected (Singh et al. 2022).
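As an illustration of how pooling shrinks a feature map, the following sketch implements maximum pooling in plain Python. The function name `max_pool2d`, the 2 × 2 window, and the input values are hypothetical choices made for this example only.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of a 2D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4 x 4 feature map coming from a convolution layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)  # 2 x 2 output, one value per window
```

The 16 input values are reduced to 4 while the strongest response in each region is preserved.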

After the required layers, an activation layer is added to keep the data within certain intervals. Then, in the fully connected layer, which serves as the classification layer, the extracted features are passed to a conventional neural network and learning is performed on them. At the end of this process, the Softmax activation function is utilized to separate the classes. In this layer, an output is obtained by performing the labeling process (Guo et al. 2016).
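The role of the Softmax function in the final labeling step can be sketched as follows. The raw scores (`logits`) below are invented for illustration; only the three class names come from the study.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Espresso", "Kenya", "Starbucks Pike Place"]
logits = [2.0, 1.0, 0.1]  # hypothetical outputs of the fully connected layer

probabilities = softmax(logits)
predicted_class = classes[probabilities.index(max(probabilities))]
```

The class with the highest probability is taken as the predicted label.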

Transfer learning approaches are influenced by the human learning model. In the learning process, in order to solve a problem they have never encountered, people benefit from the solution of problems they experienced before (Ying et al. 2018). Within the scope of this study, the CNN network is trained with the transfer learning method by using previously trained models (Deepak and Ameer 2019). SqueezeNet, Inception V3, VGG16, and VGG19 are the transfer learning models used in the study.

SqueezeNet

SqueezeNet is a smaller CNN architecture that uses fewer parameters than other CNN models. SqueezeNet has fifteen layers of five different types: two convolution layers, three maximum pooling layers, eight fire layers, one global average pooling layer, and one softmax output layer. Fire layers, the core of SqueezeNet, are convolution layers compressed with only 1 × 1 filters. Fire layers perform compression and expansion between convolution layers (Ucar and Korkmaz 2020; Lee et al. 2019; Taspinar et al. 2021a).
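To give a rough sense of why compression with 1 × 1 filters saves parameters, the sketch below compares a fire layer with a plain 3 × 3 convolution producing the same number of output channels. It assumes the fire layer design of the original SqueezeNet architecture (a 1 × 1 squeeze convolution followed by parallel 1 × 1 and 3 × 3 expand convolutions); the channel counts are illustrative and are not taken from the models trained in this study.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def fire_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Parameters of a fire layer: squeeze (1x1), then parallel 1x1 and 3x3 expands."""
    return (
        conv_params(in_channels, squeeze, 1)
        + conv_params(squeeze, expand_1x1, 1)
        + conv_params(squeeze, expand_3x3, 3)
    )


# Assumed example: 96 input channels squeezed to 16, expanded to 64 + 64 = 128.
fire_layer = fire_params(96, squeeze=16, expand_1x1=64, expand_3x3=64)
# Plain 3 x 3 convolution mapping the same 96 channels to 128 channels.
plain_conv = conv_params(96, 128, 3)
```

Under these assumptions the fire layer needs 11,920 parameters, while the plain convolution needs 110,720, roughly nine times as many.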

Inception V3

Inception V3 was trained to recognize 1000 different objects for the ImageNet 2014 competition. This network structure which consists of 48 layers has an image input size of 299 × 299. Inception V3 consists of symmetric and asymmetric building blocks, including convolution layers, maximum pooling layers, average pooling layers, dropout layers, and fully connected layers (Demir et al. 2019; Sinha and Clarke 2017).

VGG16

This network structure was proposed by Zisserman and Simonyan in 2014. The network, based on the AlexNet deep network, is more successful in image recognition and classification problems when the data set is defined correctly. It contains 13 convolutional layers (Bicakci et al. 2020; Koklu et al. 2022a).

VGG19

The main layers of the VGG19 architecture consist of 16 convolutional, five pooling, and three fully connected layers, for a total of 24 main layers. Since VGG19 is a deep network, small filters are used in the convolutional layers to limit the number of parameters. The filter size selected in this architecture is 3 × 3 pixels. The VGG19 architecture contains approximately 144 million parameters (Mateen et al. 2019; Taspinar et al. 2021a).
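The parameter counts of the two VGG variants can be checked with a short calculation. The sketch below assumes the standard ImageNet configuration of these networks (224 × 224 RGB input, 3 × 3 convolutions with 64, 128, 256, 512, and 512 channels per block, two fully connected layers of 4096 units, and 1000 output classes); the classification head used for the three coffee classes in this study would differ.

```python
def vgg_param_count(convs_per_block, num_classes=1000):
    """Count weights and biases of a VGG network in its standard ImageNet form."""
    block_channels = [64, 128, 256, 512, 512]
    in_channels = 3  # RGB input
    total = 0
    for n_convs, out_channels in zip(convs_per_block, block_channels):
        for _ in range(n_convs):
            total += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
    # Five 2 x 2 poolings reduce a 224 x 224 input to 7 x 7 before the dense layers.
    fc_inputs = 7 * 7 * 512
    for fc_outputs in (4096, 4096, num_classes):
        total += fc_inputs * fc_outputs + fc_outputs
        fc_inputs = fc_outputs
    return total


vgg16_params = vgg_param_count([2, 2, 3, 3, 3])  # 13 convolutional layers
vgg19_params = vgg_param_count([2, 2, 4, 4, 4])  # 16 convolutional layers
```

Under these assumptions the count is about 138 million for VGG16 and about 144 million for VGG19, with most of the parameters located in the fully connected layers.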

Confusion Matrix and Evaluation Metrics

A three-class confusion matrix was used to evaluate the performance of the classification models used in the study. Table 1 shows an example of a three-class confusion matrix.

Table 1 Coffee beans confusion matrix

Accuracy, Precision, Recall, and F-1 Score values which are frequently used performance measures were obtained from the confusion matrix. In order to obtain these values, there are 4 values in the confusion matrix: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP and TN indicate the number of correctly predicted positive and negative samples, while FP and FN indicate the number of incorrectly predicted positive and negative samples. Performance metrics are calculated with the formulas given in Table 2 by using these 4 values. Table 3 shows the calculation of TP, TN, FP, FN values for each class (Taspinar et al. 2021b; Koklu et al. 2022b; Koklu and Taspinar 2021).

Table 2 Formulas for performance metrics
Table 3 TP, TN, FP, and FN values by classes
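The derivation of TP, TN, FP, and FN for each class, and of the metrics built on them, can be sketched as follows. The confusion matrix values and the class labels "A", "B", and "C" are hypothetical, and the convention that rows are actual classes and columns are predicted classes is an assumption. The metric definitions used are the standard ones: Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F-1 Score = 2 × Precision × Recall / (Precision + Recall).

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest metrics for each class of a square confusion matrix.

    Assumes matrix[actual][predicted] holds image counts.
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # class k images predicted as another class
        fp = sum(row[k] for row in matrix) - tp   # other images predicted as class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {
            "tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Hypothetical three-class confusion matrix (not the study's results).
labels = ["A", "B", "C"]
matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 1, 7],
]
metrics = per_class_metrics(matrix, labels)
```

For class "A" in this toy matrix, TP = 8, FN = 2, FP = 2, and TN = 18, so Precision, Recall, and F-1 Score are all 0.8.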

Cross-Validation

Cross-validation is a method used for objective measurement of a classification model’s accuracy. In this method, the dataset is divided into k equal parts. One part (1/k of the dataset) is allocated for testing, and the remaining k − 1 parts are allocated for training. This process is repeated k times, until each part of the dataset has been used as the test segment. The overall classification success of the model is calculated by taking the arithmetic average of the classification successes obtained in these repetitions (Arlot and Celisse 2010; Koklu and Tutuncu 2017). In our study, the k value was determined as 10. Figure 5 shows how the cross-validation method works.

Fig. 5 Cross-validation process
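The splitting procedure can be sketched in plain Python as shown below. The function name `k_fold_indices`, the fixed random seed, and the simple shuffled (non-stratified) split are assumptions made for illustration; the exact splitting routine used in the study is not stated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1554 images and k = 10, as in the study.
splits = list(k_fold_indices(1554, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

Since 1554 is not divisible by 10, four of the test folds contain 156 images and six contain 155. Because every image appears in exactly one test fold, the per-class totals in a confusion matrix pooled over all folds equal the class sizes of the dataset.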

Area Under the Receiver Operating Characteristic Curve (AUC-ROC)

The AUC-ROC curve is used as a performance measure for classification problems. ROC is a probability curve, while AUC represents the degree or measure of separability and shows how well the model can distinguish between classes (Sam et al. 2019).
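One way to read the AUC value is as the probability that a randomly chosen image of a given class receives a higher score for that class than a randomly chosen image of another class. The sketch below computes the AUC in this pairwise form for a single class against the rest; the scores and labels are invented for illustration.

```python
def auc_one_vs_rest(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical model scores for one class and whether each image truly belongs to it.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
is_positive = [True, True, True, False, False, False]

auc = auc_one_vs_rest(scores, is_positive)  # 8 of 9 pairs are ranked correctly
```

An AUC of 1.0 means the two groups are perfectly separated, while 0.5 corresponds to random ranking.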

Experimental Results

The obtained images of coffee beans were classified by using four different pre-trained CNN models. The models were trained with the Keras library in the Colab environment, on a virtual server provided by Colab with a 2-core Intel Xeon CPU, 12 GB of RAM, and a 16 GB Nvidia Tesla T4 GPU. In Fig. 6, the flowchart showing the processes of obtaining experimental results is given.

Fig. 6 Flow diagram of classification process

The networks used for transfer learning were chosen from among networks trained on ImageNet: SqueezeNet, Inception V3, VGG16, and VGG19. In this study, the weights of these previously trained networks were transferred and used in the classification of coffee beans.

In Table 4, the parameters for the pre-trained deep convolutional neural networks used in the study are given.

Table 4 Model parameters

The success of the models was measured with the accuracy, Precision, Recall, and F-1 Score metrics that are calculated by using the values in the confusion matrix of each CNN model. The cross-validation method was utilized to ensure the reliability of the test results of all models. In Table 5, the confusion matrix obtained from the SqueezeNet model is given.

Table 5 Confusion matrix of SqueezeNet model

According to Table 5, the SqueezeNet model classified 489 of the images of Espresso coffee beans correctly and misclassified 41 of them. While it correctly classified 449 images and incorrectly classified 53 images of Kenya coffee beans, 418 of the Starbucks Pike Place coffee bean images were classified correctly, and 104 of them were classified incorrectly. Starbucks Pike Place is the class in which the largest number of coffee bean images is misclassified. Table 6 shows the performance metrics calculated by using the confusion matrix data of the SqueezeNet model.

Table 6 Performance metrics of the SqueezeNet model (%)

According to Table 6, SqueezeNet model achieved the highest classification success in the Espresso class. The class with the highest precision value is Kenya, while the Espresso class has the highest recall and F-1 score values.

The Inception V3 model was trained by using the images of coffee beans under the same conditions as the other models, and the resulting model was then tested. Table 7 shows the confusion matrix obtained from the tests of the Inception V3 model.

Table 7 Confusion matrix of Inception V3 model

According to Table 7, the Inception V3 model classified 457 images correctly and 73 images incorrectly in the Espresso class. With the same model, in the Kenya class, 436 images were classified correctly and 66 images incorrectly, while 372 images of the Starbucks Pike Place class were classified correctly and 150 incorrectly. As a result, the highest number of misclassifications belongs to the Starbucks Pike Place class. Table 8 shows the performance metrics calculated by using the confusion matrix data of the Inception V3 model.

Table 8 Performance metrics of the Inception V3 model (%)

According to Table 8, the Inception V3 model achieved the highest classification success in Kenya class. Again, with Inception V3, the highest Precision, Recall, and F-1 Score values were obtained in Kenya class.

The confusion matrix data in Table 9 were obtained from the tests of the VGG16 model. According to Table 9, 445 images were correctly classified and 85 images were incorrectly classified in the Espresso class, while 409 images were correctly classified and 93 images were incorrectly classified in the Kenya class. In the Starbucks Pike Place class, 362 images were correctly classified and 160 images were incorrectly classified. So, the highest number of misclassifications belonged to the Starbucks Pike Place class. Table 10 shows the performance metrics calculated by using the confusion matrix data of the VGG16 model.

Table 9 Confusion matrix of VGG16 model
Table 10 Performance metrics of the VGG16 model (%)

According to Table 10, the VGG16 model has the highest classification success and Precision value in the Kenya class. The same model has the highest Recall and F-1 Score values in the Espresso class.

The confusion matrix data given in Table 11 was obtained as a result of testing the VGG19 model. According to Table 11, the VGG19 model correctly classified 397 of the images in the Espresso class and incorrectly classified 133 of them, while the number of images it correctly classified is 418 and the number of the images it incorrectly classified is 84 in Kenya class. With the same model, 312 of the images in the Starbucks Pike Place class were classified correctly, and 210 of them were classified incorrectly. As with other models, the highest number of misclassifications belongs to the Starbucks Pike Place class. Table 12 shows the performance metrics calculated by using the confusion matrix data of the VGG19 model.

Table 11 Confusion matrix of VGG19 model
Table 12 Performance metrics of the VGG19 model (%)

According to Table 12, the model VGG19 has the highest classification success, Precision value, Recall value, and F-1 Score value in Kenya class. In Table 13, the average classification success, Precision, Recall, and F-1 Score values of SqueezeNet, Inception V3, VGG16, and VGG19 CNN models used in the study are given.

Table 13 Average performance metrics of all models (%)
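As a consistency check, the average classification success of each model can be recomputed from the numbers of correctly classified images reported in the text for Tables 5, 7, 9, and 11, together with the class sizes of the dataset. The sketch assumes that the average classification success is the share of all 1554 images that were classified correctly.

```python
# Class sizes from the dataset description.
class_sizes = {"Espresso": 530, "Kenya": 502, "Starbucks Pike Place": 522}

# Correctly classified images per class (Espresso, Kenya, Starbucks Pike Place).
correct_counts = {
    "SqueezeNet": (489, 449, 418),
    "Inception V3": (457, 436, 372),
    "VGG16": (445, 409, 362),
    "VGG19": (397, 418, 312),
}

total_images = sum(class_sizes.values())  # 1554
overall_accuracy = {
    model: round(100 * sum(counts) / total_images, 1)
    for model, counts in correct_counts.items()
}
```

Under this assumption the calculation gives 87.3% for SqueezeNet, 81.4% for Inception V3, 78.2% for VGG16, and 72.5% for VGG19.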

Comparison of performance metrics of all models used for classification in the study is shown in Fig. 7.

Fig. 7 Comparison of the performances of the models

According to Table 13, the SqueezeNet model has the highest value in all performance metrics. ROC curves provide information about the distinctiveness of the models. Figure 8 gives the ROC curves of all models.

Fig. 8 ROC curves of all models ((a): SqueezeNet, (b): Inception V3, (c): VGG16, (d): VGG19)

Conclusion

Within the scope of this study, a total of 1554 coffee bean images were used, obtained with an image acquisition mechanism created specifically for the study. These images were of three different types of coffee beans: Espresso, Kenya, and Starbucks Pike Place. Using the images of coffee beans, four different CNN models (SqueezeNet, Inception V3, VGG16, and VGG19) were trained with the transfer learning method and then tested. In order to evaluate the performances of the models more objectively, the tests were performed by the cross-validation method with a k value of 10. The highest classification success rates were obtained with the SqueezeNet model: 93.8% for the Espresso class, 93.4% for the Kenya class, and 87.4% for the Starbucks Pike Place class. The average classification successes of the models are 87.3%, 81.4%, 78.2%, and 72.5% for SqueezeNet, Inception V3, VGG16, and VGG19, respectively. It is concluded that SqueezeNet is the model with the highest classification success, while the model with the lowest classification success is VGG19. Furthermore, SqueezeNet is also the model with the highest average Precision, Recall, and F-1 Score values. When the ROC curves are examined, it is seen that the SqueezeNet model has the highest distinctiveness.

When the obtained coffee bean images are examined, it is seen that Espresso and Starbucks Pike Place coffee beans look very similar. This can also be seen from the values in the confusion matrices. Due to the similarity of their appearances, these two beans sometimes cannot be distinguished.

Within the scope of this study, different types of coffee beans were classified and distinguished. The use of such applications in the production, packaging, and trade stages of coffee will make it possible to distinguish coffee beans. They are expected to reduce processing time and labor cost to a minimum and to improve the quality-based export of coffee beans. These models can help ensure that all products meet the same standard and will facilitate quality control. Decision-making errors caused by the mental and physical condition of the specialist, such as fatigue and impaired vision, will be prevented. Such systems will also allow different companies to conduct fair quality control without bias.