1 Introduction

Wine has always been an essential part of the dining culture in Western countries. From a manufacturer's point of view, understanding wine quality and maintaining steady production are important goals for the industry. However, assessing wine quality is a complex and multifaceted task. Wine quality is evaluated in terms of subtlety and complexity, ageing potential, stylistic purity, varietal expression, expert rankings, or consumer acceptance. Setting aside the objectively measurable factors, expert judgments remain highly subjective, yet they arguably exert the greatest influence on both winemakers and consumers [1].

Recording each step of the wine production procedure preserves the quality and knowledge of the whole winemaking process, and the collected information is the best tool to guarantee wine quality. The wine industry has established the protected designation of origin (PDO) system [2], supported by analytical chemistry and chemometric tools, to obtain information related to a specific wine. With improvements in both software and hardware, winemakers have started to use the collected data to refine their winemaking techniques. Due to high costs and a lack of technological resources, it used to be difficult for most wineries to classify wines based on their chemical components. Machine learning algorithms for assessing wine quality have therefore gained much attention in the industry as a way to determine which attributes make a "good" wine that satisfies consumers. For instance, Yeo et al. focused on predicting wine prices with machine learning techniques trained on historical price data [3]. For wine production, Ribeiro et al. utilized linear regression, neural networks, and decision trees to predict wine vinification [4].

In 2009, Cortez et al. collected a wine quality dataset with a significantly larger number of instances [6]. Three machine learning models, namely multiple regression, the support vector machine (SVM), and the neural network (NN), were trained on the collected data. The results showed that SVM outperforms the other two methods and highlighted the importance of correctly setting the hyperparameters. Over the years, this wine dataset has been adopted in several studies using various methods, such as SVM [7,8,9,10], random forest (RF) [11,12,13,14], decision-tree-based algorithms [12,14], and NN [4,7,8], to predict wine quality from the physicochemical characteristics of the wine.

However, past literature has mostly focused on using or comparing different machine learning models to find the one that provides the best prediction result for a specific dataset. To obtain a more effective classifier, this paper proposes a hybrid model that consists of at least two classifiers, e.g., the random forest and the support vector machine, for wine quality prediction. To evaluate the performance of the proposed hybrid model, experiments are also conducted on the wine dataset to show its merits.

2 Background Knowledge

Over the years, several machine learning models have been used to predict wine quality. The literature suggests that RF and SVM provide better results than other models. In this section, these two commonly used classifiers are described.

2.1 The SVM Classifier

The support vector machine (SVM) [19] is a supervised machine learning model for solving classification problems. The central concept of SVM is to utilize a kernel function to find a hyperplane that separates instances into categories. As mentioned earlier, SVM has proven to be an effective classifier for wine quality prediction [7,8,9,10]. SVM has three hyperparameters: the penalty factor C, the kernel coefficient \(\gamma \) (gamma), and the kernel function. The parameter C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error. The gamma parameter defines how far the influence of a single training example reaches. The final hyperparameter is the kernel; there are three main types of kernels (linear, polynomial, and RBF), and different kernels may fit different datasets best. Hyperparameter tuning relies more on experimental results than on theory, and therefore the best way to determine the optimal settings is trial and error.
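
For concreteness, a minimal sketch of such an SVM classifier in scikit-learn is shown below; the particular C, gamma, and kernel values are illustrative placeholders, not the tuned settings derived in Sect. 3.

```python
# Minimal SVM sketch with the three hyperparameters discussed above.
# The concrete values are illustrative, not the paper's tuned settings.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Feature scaling matters for SVM because the kernel is distance-based.
svm_clf = make_pipeline(
    StandardScaler(),
    SVC(C=1000,         # penalty factor: margin vs. training-error trade-off
        gamma=0.001,    # reach of a single training example (rbf/poly kernels)
        kernel="rbf"),  # kernel type: "linear", "poly", or "rbf"
)
```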

2.2 Random Forest

Random forest (RF) [20] is a supervised learning algorithm that can be applied to both classification and regression problems; in this study, it is used for classification. As the name suggests, an RF is constructed from multiple decision trees, and, in general, more trees yield a more robust forest. The RF algorithm creates different decision trees on random data samples and obtains a prediction from each tree; a voting technique is then used to select the best solution. Several studies have shown that random forests can provide good prediction accuracy, and one study reported an extremely high accuracy rate [14]. RF has an advantage over other methods because it reduces over-fitting by averaging the results. In this study, we tune the RF model to provide the best result. The random forest has six hyperparameters: (1) number of estimators, (2) maximum features, (3) maximum depth, (4) minimum samples split, (5) minimum samples leaf, and (6) bootstrap; an illustrative mapping to scikit-learn arguments is sketched below. Tuning the hyperparameters can improve the accuracy of the model; however, evaluating the model only on the training set can lead to overfitting.
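
The six hyperparameters map onto scikit-learn's RandomForestClassifier arguments as follows; the candidate values below are placeholder ranges for the trial-and-error search, not the paper's final settings.

```python
# The six RF hyperparameters expressed as a scikit-learn search space.
# The candidate values are illustrative trial-and-error ranges.
from sklearn.ensemble import RandomForestClassifier

rf_param_ranges = {
    "n_estimators":      [100, 500, 1000],        # (1) number of trees
    "max_features":      ["sqrt", "log2", None],  # (2) features tried per split
    "max_depth":         [None, 10, 30],          # (3) maximum tree depth
    "min_samples_split": [2, 5, 10],              # (4) min samples to split a node
    "min_samples_leaf":  [1, 2, 4],               # (5) min samples at a leaf
    "bootstrap":         [True, False],           # (6) sample with replacement
}

rf_clf = RandomForestClassifier(random_state=42)
```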

3 Proposed Hybrid Wine Classification Model

In this paper, the goal is to predict wine quality using a hybrid wine classification model composed of multiple classifiers. The flowchart of the proposed model is shown in Fig. 1.

Fig. 1. Flowchart of the hybrid wine classification model

As shown in Fig. 1, the model first selects the classifiers from a pool of machine learning models. The hyperparameters of the selected classifiers are then determined by the randomized search method. The selected models are gathered into a hybrid classification model. Then, the input red and white wine datasets are used to train and test the model. This paper attempts to provide a hybrid wine classification model that produces the best performance. The pseudo-code of the proposed model is illustrated in Table 1 (Algorithm 1).

Table 1. Pseudo-code of the hybrid wine classification model.

In Table 1, the algorithm first selects at least two models (line 2). When the selection process is complete, the initial ranges of the hyperparameters associated with each model are set (line 3). For example, for the SVM model (M0), the hyperparameters are initially set as pm = {C: [1, 100, 10000], gamma: [0.1, 0.01, 0.0001], kernel: ['rbf', 'linear', 'poly']}. The hyperparameters differ from model to model, and there is no specific appropriate range of values for any particular model. Therefore, the algorithm uses a trial-and-error strategy to find the range of values that yields better model performance. The hyperparameters are fitted to the model, and the model is then evaluated using the predefined criteria (mainly accuracy). Based on the performance, the algorithm decides whether values should be added to or removed from the initial ranges (line 6). After the modification process, the performance of the new setting is compared against past settings (line 7). If the new setting indeed finds an "optimal" solution, the trial-and-error process can be interrupted; otherwise, it continues until the required number of iterations is reached (lines 4–9). Continuing with SVM as an example, the hyperparameter setting after the trial-and-error process is pm = {C: [500, 1000, 10000], gamma: [0.01, 0.001, 0.0001], kernel: ['rbf']}. This set of hyperparameter ranges is passed to the next procedure, which finds the best hyperparameter set for SVM using the randomized search method (line 11). After the tuning procedure, the selected models are merged to form a hybrid model Mnew (line 16). The model Mnew is then trained and tested n times with different training and testing data in each iteration (line 18). The criteria used to evaluate the models are accuracy, precision, recall, and F1-score. Since past studies show that techniques like SVM and RF are commonly used and achieve better results, this paper uses those two models as the selected models and the dataset collected by [6] as the testing dataset. Experimental results are shown in the next section.
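
A condensed sketch of the tuning and merging steps of Algorithm 1 in scikit-learn is given below. The paper does not spell out how the tuned models are merged, so combining them with a majority-vote VotingClassifier is our assumption; the helper names (tune, build_hybrid) and the small search budget are likewise illustrative.

```python
# Sketch of Algorithm 1's tuning and merging steps (lines 11 and 16).
# Assumption: the hybrid model M_new combines the tuned classifiers by
# majority voting; the paper does not specify the merging rule.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Ranges refined by trial and error, as in the SVM example above.
svm_ranges = {"C": [500, 1000, 10000],
              "gamma": [0.01, 0.001, 0.0001],
              "kernel": ["rbf"]}
rf_ranges = {"n_estimators": [100, 500, 1000],
             "max_depth": [None, 10, 30]}

def tune(model, param_ranges, X_train, y_train):
    """Randomized search over the refined ranges (line 11)."""
    search = RandomizedSearchCV(model, param_ranges, n_iter=5,
                                scoring="accuracy", cv=5, random_state=42)
    search.fit(X_train, y_train)
    return search.best_estimator_

def build_hybrid(X_train, y_train):
    svm_best = tune(SVC(), svm_ranges, X_train, y_train)
    rf_best = tune(RandomForestClassifier(), rf_ranges, X_train, y_train)
    # Merge the tuned models into the hybrid model M_new (line 16).
    m_new = VotingClassifier([("svm", svm_best), ("rf", rf_best)],
                             voting="hard")
    return m_new.fit(X_train, y_train)
```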

4 Experimental Evaluation

4.1 Dataset Description

The wine dataset from the UCI repository [6], which consists of two sets of wine data (red and white), is used in this paper. The red wine set contains 1599 instances, and the white wine set contains 4898 instances. Both datasets contain 11 physicochemical input variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. The output (sensory) variable is a quality rating from 0 (very bad) to 10 (excellent). We carefully examined the data to make sure no anomalies exist. There are 24 duplicate instances in the red wine data and 937 in the white wine data; however, these are not classified as anomalies because all feature and label values are exactly the same.
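
The dataset and the duplicate check can be reproduced with a few lines of pandas, assuming the standard UCI repository file paths (both files are semicolon-separated):

```python
# Load the two UCI wine-quality files and check the duplicates discussed
# above. The BASE URL assumes the standard UCI repository layout.
import pandas as pd

BASE = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "wine-quality/")
red = pd.read_csv(BASE + "winequality-red.csv", sep=";")
white = pd.read_csv(BASE + "winequality-white.csv", sep=";")

print(red.shape, white.shape)   # 11 features + 1 quality column each
# Rows where all 11 features and the quality label are identical:
print("red duplicates:", red.duplicated().sum())
print("white duplicates:", white.duplicated().sum())
```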

4.2 Performance Measure Metrics

Four criteria, namely accuracy, precision, recall, and F1-score, are used to evaluate the performance of the model; they are described in the following. Accuracy measures how often the classifier correctly classifies instances. Its formulation is defined as follows:

$$\mathit{accuracy} = \frac{TP+TN}{TP+TN+FP+FN},$$
(1)

where TP (true positive) denotes the samples correctly classified as belonging to the expected class, TN (true negative) denotes the samples correctly classified as not belonging to the expected class, FP (false positive) denotes the samples incorrectly classified as belonging to the expected class, and FN (false negative) denotes the samples incorrectly classified as not belonging to the expected class.

In a multi-class setting, the micro-averaged accuracy, precision, and recall are always identical. Moreover, past works make it clear that the wine data is imbalanced, so using accuracy alone may not provide a clear picture. Therefore, the macro-averaged measurements of precision (macro-precision) and recall (macro-recall) are employed for a more detailed comparison.

Precision measures the percentage of predicted positives that are relevant; a precision of one means the algorithm's positive predictions are perfect. Macro-precision will be lower than the micro-averaged precision when the model performs poorly on the rare classes, even if it performs well on the majority classes. Since this measurement can tell a different story, it serves as a complementary metric. Macro-averaged precision is computed by first computing the precision of each class and then taking the average of all per-class precisions:

$$\mathit{MP}_{\mathit{classifier}} = \frac{\sum_{c} P_{c}}{\text{number of classes}}.$$
(2)

Recall measures the percentage of all truly relevant results that are correctly classified by the algorithm; a recall of one means that all truly positive samples were predicted as the positive class. Similar to macro-precision, the value will be lower if the rare classes perform poorly:

$$\mathit{MR}_{\mathit{classifier}} = \frac{\sum_{c} R_{c}}{\text{number of classes}}.$$
(3)

Accuracy is useful when the class distribution in the dataset is even, but the F1-score is a better metric when the dataset has imbalanced classes. Hence, we also use the macro-averaged F1-score as a criterion to evaluate the performance of the model; it is defined as follows:

$$\mathit{F1}_{\mathit{classifier}} = \frac{\sum_{c} \mathit{F1}_{c}}{\text{number of classes}}.$$
(4)
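
These four criteria map directly onto scikit-learn's metric functions; a minimal sketch is given below, where average="macro" realizes the per-class averaging of Eqs. (2)–(4).

```python
# Computing the four evaluation criteria with scikit-learn.
# average="macro" gives the per-class averages of Eqs. (2)-(4).
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    return {
        "accuracy":        accuracy_score(y_true, y_pred),
        "macro_precision": precision_score(y_true, y_pred, average="macro",
                                           zero_division=0),
        "macro_recall":    recall_score(y_true, y_pred, average="macro",
                                        zero_division=0),
        "macro_f1":        f1_score(y_true, y_pred, average="macro"),
    }
```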

4.3 Experimental Analysis

Since most past works focus mainly on accuracy, we compared the accuracy of the proposed model against others. In addition, because most works set the training/testing split to 80/20, we used the same ratio for comparison. We included the works of Cortez et al. [6] and Apalasamy et al. [17] in the performance comparison. The accuracy of each model is shown in Table 2.

Table 2. Comparison of different models in terms of accuracy.

In Table 2, the accuracies of the proposed model for red and white wine are 0.66 and 0.67, respectively. The results reveal that the proposed model performed slightly better than the other models. Also, as with most models, the performance on white wine is slightly higher than on red wine. Experiments were then conducted to further examine the performance of the proposed model under different training/testing data ratios. The results for red and white wine at different testing-set sizes are shown in Table 3.

Table 3. Results for different testing data ratios

For red wine, the accuracy was highest when the testing-set ratio was set at 20%. The macro-precision and macro-recall are low across the different ratios, and their closeness means the number of false positives is very close or equal to the number of false negatives. The F1-score for red wine gradually decreases as the ratio increases. White wine shows a different pattern, where precision is always higher than recall. When the ratio was set at 10%, the accuracy and F1-score were at their highest, and the macro-precision and macro-recall were at their closest. The low macro-F1 scores for both red and white wine indicate that the data is highly skewed towards some classes.
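
For reference, the ratio sweep behind Table 3 can be reproduced along the following lines, reusing the red dataframe and the build_hybrid and evaluate helpers sketched earlier; the exact list of testing-set ratios is our assumption.

```python
# Sketch of the Table 3 experiment: evaluate the hybrid model at several
# testing-set sizes. Reuses `red`, `build_hybrid`, and `evaluate` from
# the earlier sketches; the ratio list is illustrative.
from sklearn.model_selection import train_test_split

X, y = red.drop(columns="quality"), red["quality"]
for test_size in (0.1, 0.2, 0.3, 0.4):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=42)
    model = build_hybrid(X_tr, y_tr)
    print(test_size, evaluate(y_te, model.predict(X_te)))
```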

5 Conclusions and Future Work

Unlike most past works, which focus on finding the single machine learning model that best predicts wine quality, this paper has proposed a hybrid wine classification model for quality prediction. The proposed algorithm first selects n models from a given model pool. The hyperparameters are then searched by the randomized search method, and the models with acceptable performance are merged into the hybrid model. Experiments were conducted on a real dataset containing red and white wines, and the results indicate that the proposed hybrid model is effective in terms of accuracy compared with other existing approaches. In the future, we will continue to design an algorithm based on evolutionary algorithms that can obtain both the hybrid models and the hyperparameters for any wine dataset.