Introduction

According to the OIV, global wine consumption in 2020 was estimated at 234 million hectolitres (Mhl), a 3% (7 Mhl) decrease compared with 2019. Consumption has now fallen for the third year in a row and is at its lowest point since 2002 (Karlsson 2020). The USA, France, and Italy are the top three wine-consuming countries, while Portugal, Italy, and France are the three countries with the highest per capita wine consumption (Karlsson 2020).

Diet strongly influences health: a diet high in low-quality foods raises the risk of chronic diseases, whereas a diet high in high-quality foods is protective (Renee 2010). Certain types of cancer are even influenced by diet; according to the World Health Organization, a nutritious diet reduces the risk of malignancies such as colon, breast, and kidney cancer (World Health Organization 2003). Every year, an estimated 600 million individuals, about one in every ten people in the world, become unwell after eating contaminated food, and 420,000 die, resulting in the loss of 33 million healthy life years (DALYs) (World. Food safety 2020).

Alcohol use carries risks of its own. Alcohol use at the age of 15 predicted weekly alcohol consumption, and alcohol intake exceeding the recommended level, four years later. The increased alcohol intake of young teenagers was not a passing fad; it was a pattern that continued into young adulthood, putting the teenagers at a higher risk of becoming long-term, heavy consumers (Andersen et al. 2003). At the age of 19, at least 80% drank alcohol monthly, and 24% of men and 11% of women used alcohol in excess of the recommended national limits, i.e., 21 weekly units of alcohol for men and 14 for women. Use of alcoholic drinks at the age of 15 increased the likelihood of weekly alcohol consumption at the age of 19 (odds ratio [OR] values ranging from 1.11 to 3.53). Drunkenness among 15-year-old boys and spirit use among 15-year-old girls were the strongest predictors of excessive consumption at age 19 (OR = 2.44, confidence interval [CI]: 1.38–4.29 and OR = 1.97, CI: 1.15–3.38, respectively) (Andersen et al. 2003). Excessive alcohol intake is associated with a number of undesirable outcomes, being a risk factor for disease and other health effects, criminality, and traffic accidents, and, in some cases, leading to alcohol dependence. Each year, 2.8 million people around the world die prematurely due to alcohol use (Ritchie and Roser 2018).

For hundreds of years, red wine has been a component of social, religious, and cultural gatherings. Monasteries in the Middle Ages believed that their monks lived longer because they drank wine regularly and in moderation. According to a report published in 2018 (Golan et al. 2019), drinking red wine in moderation has been linked with benefits relating to cardiovascular disease, atherosclerosis, hypertension, certain types of cancer, type 2 diabetes, neurological disorders, and metabolic syndrome, although there are no official guidelines around these advantages. Red wine, which is made from crushed black grapes, is a good source of resveratrol, a natural antioxidant found in grape skin (Abu-Amero et al. 2016). Antioxidants help the body fight oxidative stress, which has been linked to many diseases, including cancer and heart disease. Fruits, nuts, and vegetables are just a few of the antioxidant-rich foods available.
Whole grapes and berries contain more resveratrol than red wine, and because of the health hazards associated with alcohol consumption, obtaining antioxidants from food is likely to be healthier than drinking wine; to obtain enough resveratrol, people may need to drink a lot of red wine, which may cause more harm than good (Smith 2020). Among alcoholic beverages, however, red wine may be more beneficial than others.

The rest of the paper is organised as follows. Section 2 presents the literature review, in which we discuss various research works and the viability and performance of different algorithms related to wine quality prediction. Section 3 explains the machine learning algorithms used. Section 4 describes the proposed framework in detail, including model selection, parameter settings, the experimental setup, and the proposed methodology. Section 5 presents the performance metrics and compares the proposed framework with existing machine learning (ML) models and with the existing literature, along with the corresponding results. Section 6 contains the conclusion and future scope.

Literature Review

This section reviews five previous research works, how they approached the problem, and their methodologies. As noted above, an estimated 600 million people worldwide fall ill each year after eating contaminated food, and 420,000 die, resulting in the loss of 33 million healthy life years (DALYs) (World. Food safety 2020). Researchers have therefore proposed a variety of approaches to food quality assessment; a few of them are discussed below. In Kumar et al. (2020a), the authors applied algorithms such as random forest, support vector machine (SVM), and Naive Bayes, and reported training accuracy alongside testing accuracy (Table 1).

Table 1 Comparison of existing approaches for wine quality prediction

Machine learning algorithms have revolutionised how data analytics and data mining work. Since the dataset was made available, many researchers have used robust models and different metrics to achieve better results. Cortez et al. (2009) used multiple regression, support vector machines, and neural networks. Er and Atasoy (2016) used four different techniques with the same set of models, which included support vector machine, random forest, and k-nearest neighbours. The first technique was cross-validation, followed by percentage split, cross-validation after principal component analysis (PCA), and percentage split after PCA. Cross-validation after PCA gave the highest accuracy among all the methods used, which influenced our research work. Gupta (2018) experimented by selecting some features and discarding others based on the correlation among the variables, which produced a more robust model. Kumar et al. (2020b) used three models, SVM, Naive Bayes, and random forest, while taking all features into account. Ahammed and Abedin (2018) used linear discriminant analysis on red wines and obtained considerably high precision and recall values. Lee et al. (2015) saw the potential of the decision tree as a base learner for bagging. Wie (2012) reported results based on ROC-AUC scores, in a study based solely on decision trees. Our study advances red wine quality prediction by additionally accounting for the skewness and standardisation of the data.

Our Contribution

  • The proposed framework consists of stacking-based ensemble learning, which adds diversity to the classifier.

  • Skewness (non-Gaussian distributions) and class imbalance in the data are addressed.

  • Hyperparameter tuning is used to select the best parameters for ML model training.

  • The performance of the proposed framework is compared with the existing literature on the basis of accuracy, precision, sensitivity (recall), and F1 score.

Background and Preliminaries

This section explains the various machine learning classification methods used in the proposed framework. Before the final ensembling of the top-performing models, other classifier models were evaluated: ten different classifiers were trained on the training dataset, and after this initial training, four models were selected based on their accuracy.

A. Random Forest

The random forest classifier is made up of a collection of tree classifiers, each constructed using a random vector sampled independently from the input data, and each tree casts a unit vote for the most popular class when classifying an input vector (Breiman 1999). To grow a tree, the random forest classifier used in this study selects a random feature or combination of features at each node. For each chosen feature combination, bagging, a method of generating a training dataset by randomly drawing N samples with replacement, where N is the size of the original training set (Breiman 1996), is employed. Each example is then classified by taking the class with the highest number of votes across all tree predictors in the forest (Breiman 1999).
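
As an illustration, a minimal scikit-learn sketch of such a forest is given below; the built-in wine dataset and the parameter values are stand-ins, not the configuration used in this study.

```python
# Illustrative sketch only: bagging plus random feature selection at each split,
# with the final class decided by a vote across the trees.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(
    n_estimators=200,      # number of tree classifiers in the forest
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # bagging: draw N training samples with replacement
    random_state=42,
)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```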

B. K-Nearest Neighbour (KNN)

The k-nearest-neighbours (kNN) approach is a simple but effective non-parametric classification method (Hand et al. 2001). To classify a data record t, its k nearest neighbours are retrieved, forming a neighbourhood of t. A majority vote among the data records in the neighbourhood, with or without distance-based weighting, is commonly used to determine the class of t. To use kNN, however, we must first select an appropriate value for k, and the classifier's success depends heavily on this choice.
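
A minimal sketch of this sensitivity to k is shown below, again on scikit-learn's built-in wine dataset as a stand-in; the candidate k values are arbitrary.

```python
# Illustrative sketch: score a few candidate k values with 5-fold cross-validation.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # stand-in dataset
for k in (3, 5, 7, 11):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    print(f"k={k}: mean CV accuracy = {cross_val_score(knn, X, y, cv=5).mean():.3f}")
```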

C. Support Vector Classifier

SVMs are based on statistical learning theory and aim to determine the position of decision boundaries that produce the best class separation (Vapnik 1999). In a two-class pattern recognition task with linearly separable classes, an SVM chooses the linear decision boundary that leaves the largest margin between the two classes. The margin is defined as the sum of the distances from the hyperplane to the nearest points of the two classes (Vapnik 1999). This margin-maximisation problem can be solved using standard quadratic programming (QP) optimisation techniques.
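
The sketch below illustrates the soft-margin tradeoff through the C parameter; the kernel, C values, and dataset are illustrative assumptions, not the settings used in the proposed framework.

```python
# Illustrative sketch: larger C tolerates fewer margin violations (a narrower margin).
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for C in (0.1, 1.0, 10.0):
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    svm.fit(X_train, y_train)
    print(f"C={C}: test accuracy = {svm.score(X_test, y_test):.3f}")
```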

D. Naive Bayes Classifiers (Leung 2007)

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the likelihood that a given sample belongs to a specific class. The Bayesian classifier is based on Bayes' theorem. Naive Bayes classifiers assume that the influence of an attribute value on a given class is independent of the values of the other attributes; this is known as class conditional independence. The assumption is made to simplify computation and is why the method is termed "naive."
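
A minimal Gaussian Naive Bayes sketch is given below; the dataset is a stand-in, and the point of interest is the class-membership probabilities returned by predict_proba.

```python
# Illustrative sketch: class-conditional independence plus Bayes' theorem.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
print("P(class | x) for the first test sample:", nb.predict_proba(X_test[:1]).round(3))
```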

E. XGBoost (Smith 2020)

XGBoost is a scalable gradient boosting system that focuses on speed and performance. Intelligent tree penalisation, proportional shrinking of leaf nodes, and additional randomisation settings set it apart from traditional gradient boosting algorithms.
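
A minimal sketch is shown below (the xgboost package is assumed to be installed; the parameter values are illustrative, not the tuned values of Table 2); the regularisation and subsampling arguments correspond to the penalisation and randomisation settings mentioned above.

```python
# Illustrative sketch of XGBoost's regularisation and randomisation knobs.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

xgb = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,        # L2 penalty on leaf weights ("tree penalisation")
    gamma=0.1,             # minimum loss reduction required to make a split
    subsample=0.8,         # row subsampling adds randomisation
    colsample_bytree=0.8,  # column subsampling per tree
    random_state=42,
)
xgb.fit(X_train, y_train)
print("test accuracy:", xgb.score(X_test, y_test))
```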

F. Ensemble Learning (Lappalainen and Miskin 2000)

Ensemble learning trains a "base learner" multiple times. The final prediction is obtained by combining the base hypotheses, and in stacking the combination weights are learned by a "meta model." Common ensemble techniques include bagging and boosting.
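
The sketch below contrasts the two techniques using scikit-learn's default tree-based base learners; the estimator counts and dataset are illustrative.

```python
# Illustrative sketch: bagging votes over trees grown on bootstrap samples,
# while boosting fits learners sequentially, focusing on previous mistakes.
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)  # stand-in dataset

bagging = BaggingClassifier(n_estimators=100, random_state=42)    # bags decision trees by default
boosting = AdaBoostClassifier(n_estimators=100, random_state=42)  # boosts decision stumps by default

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, "5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```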

G. SMOTEENN (Prati et al. 2004)

This technique is very helpful in addressing class imbalance: it generates synthetic minority-class samples with SMOTE and then cleans the resampled data with the edited nearest neighbours (ENN) algorithm. Synthetic data points are quite different from simple duplicates.
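
A minimal sketch using the imbalanced-learn package is given below; the synthetic imbalanced dataset is only for illustration.

```python
# Illustrative sketch: SMOTE oversamples minority classes with synthetic points,
# then Edited Nearest Neighbours (ENN) removes noisy samples from the result.
from collections import Counter

from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.8, 0.15, 0.05], random_state=42)
print("class counts before:", Counter(y))

X_res, y_res = SMOTEENN(random_state=42).fit_resample(X, y)
print("class counts after :", Counter(y_res))
```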

H. Gradient Boosting Algorithm (Friedman 2001)

The gradient boosting approach can predict both continuous and categorical target variables. Mean squared error (MSE) is the cost function when it is used as a regressor, and log loss is the cost function when it is used as a classifier.
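
A minimal classifier sketch follows; scikit-learn's implementation is assumed, whose default objectives match the text (log loss for classification, squared error for regression), and the parameter values are illustrative.

```python
# Illustrative sketch: gradient boosting as a classifier (default objective: log loss).
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)  # stand-in dataset

gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                                random_state=42)
print("5-fold CV accuracy:", cross_val_score(gb, X, y, cv=5).mean().round(3))
```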

Proposed Framework

In this section, we explain the model selection criteria and the parameter settings of the different algorithms used in building the framework. The experimental setup and the proposed methodology are explained afterwards.

Model Selection

Figure 1 depicts the proposed framework for red wine quality prediction. First, we take data from the UCI Machine Learning Repository (red wine data only), which is described in the Experimental Setup section. We removed outliers after thoroughly analysing the data and examining the correlations among the parameters. We then split the data into two partitions: training data consisting of 80% of the instances and testing data with the remaining 20%. Because Ye et al. (2020) obtained their highest accuracy with XGBoost and LightGBM, we included these models after considering every candidate model, and they helped increase the overall accuracy. In the red wine quality prediction literature, most of the authors who achieved considerable accuracy (Kumar et al. 2020b; Cortez et al. 2009; Er and Atasoy 2016; Gupta 2018) used SVM, so SVM was naturally included in the model selection. Various bagging and boosting algorithms have been shown to increase accuracy considerably, as seen in Ye et al. (2020), which is why, apart from LightGBM and XGBoost, other algorithms including the gradient boosting algorithm, decision tree, and random forest were considered while selecting models for the stacked ensemble-based classifier.

Fig. 1
figure 1

The proposed ensemble framework for wine quality prediction

Apart from selecting our models from the literature survey, hyperparameter tuning is also performed to maximise accuracy, as discussed in Section 4.2. After the candidate models were built, we used a stacked classifier to perform ensemble learning, which applies a meta-classifier on top of the base learners we specified. The aim is to bring together a diverse set of learners: among the candidate classifiers, those with the highest accuracy are used as base learners.
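
A minimal sketch of this stacked ensemble is given below; it assumes scikit-learn's StackingClassifier as the stacked classifier class, uses default (untuned) base learners and a logistic regression meta-classifier chosen for illustration, with the built-in wine dataset as a stand-in.

```python
# Illustrative sketch: four base learners stacked under a meta-classifier.
from sklearn.datasets import load_wine
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_learners = [
    ("xgb", XGBClassifier(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
print("stacked test accuracy:", stack.score(X_test, y_test))
```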

Parameter Setting

In this section, we discuss the factors used to improve the accuracy of our stacked ensemble model. XGBClassifier, random forest, SVM, and the gradient boosting classifier had the highest accuracy, so these four were chosen as base learners for the ensemble model. Hyperparameter tuning was then performed on these models to further improve their accuracy, which, as shown in Table 2, can in turn improve the accuracy of the stacked classifier. The random state is fixed at 42 throughout.
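
The sketch below shows the kind of grid search used for tuning one base learner; the grid values are illustrative and do not reproduce Table 2.

```python
# Illustrative sketch: grid search over a random forest, with the random state fixed at 42.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 6, 10],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```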

Table 2 Hyperparameters used

Experimental Setup

Dataset

The data was retrieved from the UCI Machine Learning Repository (Cortez et al. 2009). It contains 11 input variables based on physicochemical tests: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulphur dioxide, total sulphur dioxide, density, pH, sulphates, and alcohol. The output variable, quality, ranges from 3 (lowest quality) to 8 (good quality), making this a multiclass classification problem. Table 3 gives a detailed description of the attributes in the dataset.

Table 3 Description of nominal attributes

First, class imbalance in the dataset is analysed; since this is a multiclass classification problem and we predict values for each class, it is essential to address the imbalance, which is done with SMOTEENN. Further, a few features were highly skewed, which could have biased the results; columns whose skewness was greater than 0.75 were therefore corrected using the power transformer. Apart from quality, which contains discrete values, every other column contains continuous values.
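
A minimal preprocessing sketch is given below; the file name follows the UCI repository's red wine file and is an assumption about the local path, and the 0.75 threshold is the one stated above.

```python
# Illustrative sketch: correct only the columns whose skewness exceeds 0.75.
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# winequality-red.csv as distributed by the UCI repository (semicolon-separated)
df = pd.read_csv("winequality-red.csv", sep=";").drop_duplicates()

features = df.drop(columns="quality")
skewed = features.skew().loc[lambda s: s.abs() > 0.75].index.tolist()
print("skewed columns:", skewed)

pt = PowerTransformer()                  # Yeo-Johnson transform by default
features[skewed] = pt.fit_transform(features[skewed])
print(features[skewed].skew().round(2))  # skewness after transformation
```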

Data Visualisation

Multicollinearity can drastically affect the predictions of a machine learning algorithm. It reduces the precision of the estimated coefficients, lowering the statistical power of a regression model, and p values may no longer reliably identify statistically significant independent variables (Frost 2017). Figure 2 shows the correlations between the attributes; as we can see, this dataset is largely free from multicollinearity. There is, however, a heavy class imbalance, as shown in Fig. 3, which we corrected using SMOTEENN. As shown in Fig. 4, a few attributes are highly skewed, which can distort predictions; we identified the columns whose skewness was greater than 0.75 and corrected them using the power transformer. The class distribution across the various attributes can be examined in more detail in Fig. 5.
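
The correlation check behind Fig. 2 can be reproduced along the lines of the sketch below (same assumed file as above); absolute correlations close to 1 would signal multicollinearity.

```python
# Illustrative sketch: pairwise correlations between the input attributes.
import pandas as pd

df = pd.read_csv("winequality-red.csv", sep=";")  # assumed local copy of the UCI file
corr = df.drop(columns="quality").corr()

# List the most strongly correlated attribute pairs (each pair appears twice).
pairs = (corr.abs()
             .where(lambda m: m < 1.0)   # mask the diagonal
             .stack()
             .sort_values(ascending=False))
print(pairs.head(6))
```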

Fig. 2
figure 2

Multicollinearity of attributes

Fig. 3
figure 3

Class imbalance (pie chart represents distribution of each class, bar chart—X-axis: total count of target variable; Y-axis: target variable)

Fig. 4
figure 4

Skewed attributes (X-axis: total occurrence of an attribute; Y-axis: attribute)

Fig. 5
figure 5

Analysing each column using barplot with quality

Proposed Methodology

Stacking is a type of ensemble learning in which the predictions of several base learners, built from different algorithms, are combined by a meta-learner. In the proposed framework, the top-performing models among all candidate algorithms are selected and combined to give even higher accuracy. The data is taken from the UCI Machine Learning Repository; class imbalance is corrected with SMOTEENN and skewness with the power transformer, after which the data is split into 80% training and 20% testing sets.

The skewness of the following attributes was corrected because it exceeded 0.75: chlorides, total sulphur dioxide, residual sugar, free sulphur dioxide, sulphates, and volatile acidity. Figure 6 shows the proposed methodology as a flow diagram.

Fig. 6
figure 6

Proposed methodology

Algorithm 1 below describes our proposed methodology which is divided into 3 different phases namely:

  1. Phase I: Preprocessing phase

  2. Phase II: Training phase

  3. Phase III: Testing phase

The preprocessing phase is responsible for data preparation: data visualisation is used to derive insights from the data and to make the required changes; for example, duplicate rows are removed, and data skewness and class imbalance are corrected in this phase. The training phase is responsible for finding the best-performing models based on accuracy, ensembling them, and learning all the weights and parameters required for the ensembled framework to make accurate predictions. The final phase, the testing phase, evaluates the learned parameters and weights on unseen data to assess the proposed ensembled framework. The symbols used in Algorithm 1 are listed in Table 4 below.

Algorithm 1
figure a

This algorithm describes the phases to design proposed ensemble model

Table 4 Symbols used in Algorithm 1

The final outcome of the algorithm stated above classifies the red wine into six quality categories.

Categories = {3,4,5,6,7,8}.

Phase 1: Preprocessing Phase

The dataset, used in the form of a matrix, is highly skewed and has an imbalanced class distribution, which can lead to inaccurate predictions and hence low accuracy and precision. The matrix also contains many duplicate rows, which can further bias the predictions. First, duplicate rows are removed from the matrix. To address class imbalance, we use the SMOTEENN technique, which generates synthetic data points, and the power transformer to correct the skewness of the attributes. After preprocessing, the dataset is divided into an 80% training and a 20% testing split.

Pseudo Code: Phase I

figure b

Training Phase

After duplicates are removed, class imbalance is addressed and the skewness of the dataset is corrected. We then run different machine learning algorithms on the 80% training split and judge them using accuracy as the metric. The top-performing ML algorithms are selected and stacked together.

Pseudo Code: Phase II

figure c

Testing Phase

In this phase, the stacked model is evaluated on the 20% testing dataset. First, the accuracy is measured; then, a classification report is generated. The accuracy of the proposed algorithm is 98.36%, which outperforms previous work in the literature.

Pseudo Code: Phase III

figure d

Results and Discussion

In this section, we discuss the results and analysis of the proposed framework. Different performance metrics are used to evaluate the algorithms, and the proposed model is compared with existing models with respect to accuracy, precision, sensitivity (recall), F1 score, ROC, and MCC. We also compare the proposed model with the different algorithms and models covered in Section 2.

Performance Metrics

The following four parameters are used to assess the performance of the proposed framework; a minimal computation sketch follows the list:

  • 1. Accuracy: The value obtained when the sum of True Positive and True Negative is divided by the sum of True Positive, False Positive, False Negative, and True Negative values of the confusion matrix.

    $$Accuracy=\frac{(True\;Positive+True\;Negative)}{(True\;Positive+False\;Positive+False\;Negative+True\;Negative)}$$
    (1)
  • 2. Precision: The value obtained when True Positive is divided by the sum of True Positive and False Positive values of a confusion matrix.

    $$Precision=\frac{True\;Positive}{(True\;Positive+False\;Positive)}$$
    (2)
  • 3. Recall: Also known as sensitivity, recall is the value obtained when True Positive is divided by the sum of True Positive and False Negative values of a confusion matrix.

    $$Recall=\frac{True\;Positive}{(True\;Positive+False\;Negative)}$$
    (3)
  • 4. F-Measure: The F1 score is twice the product of recall and precision divided by their sum.

    $$F1\;score=\frac{2\times Recall\times Precision}{(Recall+Precision)}$$
    (4)
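
A minimal sketch computing these four metrics with scikit-learn is given below; the label vectors are hypothetical, and macro averaging is used because the problem is multiclass.

```python
# Illustrative sketch: the four metrics above, macro-averaged over the quality classes.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [3, 4, 5, 5, 6, 6, 7, 8]  # hypothetical quality labels
y_pred = [3, 5, 5, 5, 6, 7, 7, 8]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```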

Comparison with ML Models

We created several baseline models, and those with the highest accuracy were chosen for stacking, since ensemble modelling adds diversity to the predictions. We initially chose accuracy as the metric for judging the models; the accuracy of all models used is shown in Table 5. The four best-performing models, XGBClassifier, random forest, SVM, and the gradient boosting classifier, are stacked together to give even better accuracy. We also evaluated the proposed algorithm on several measures, including accuracy, precision, recall, and F1 score. Figure 7 graphically compares the different machine learning algorithms with our proposed algorithm. As can be seen, the proposed algorithm outperforms the existing algorithms as well as previous work in the literature, as shown in Fig. 8; hence, our work is an advance in red wine classification. Stacking these classifiers yields an accuracy of 98.36%. As this is a multiclass classification problem, we obtained an average precision of 98.0% and an average recall of 98%. As shown in Table 6, besides accuracy we calculated precision, recall, and F1 score for comparison with other algorithms.

Table 5 Comparison of ML algorithm and their respective accuracies
Fig. 7
figure 7

Comparison of accuracy of proposed framework with different ML models

Fig. 8
figure 8

ROC-AUC comparison for each class

Table 6 Comparison of proposed framework with existing ML models

Additionally, to analyse our models, we plotted the ROC (receiver operating characteristic) curve, because it shows the tradeoff between specificity and sensitivity across decision thresholds. As can be seen in Fig. 9, the ROC curve of our proposed algorithm is nearly perfect; the better the model, the closer the area under the ROC curve is to 1. As our problem is multiclass classification, the ROC curve uses the macro average. Figure 8 depicts the multiclass setting more comprehensively by plotting the curve for each class under the proposed framework. The ensembled model adds diversity and multiplicity, and stacking-based models add further variety: if an individual model makes a wrong prediction for a certain sample, another model in the stacked ensemble has a chance to classify it correctly.
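
A minimal sketch of the macro-averaged one-vs-rest ROC-AUC computation is shown below; the classifier and stand-in dataset are illustrative, not the proposed framework itself.

```python
# Illustrative sketch: macro-averaged one-vs-rest ROC-AUC for a multiclass classifier.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)  # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)
auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print("macro OvR ROC-AUC:", round(auc, 3))
```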

Fig. 9
figure 9

ROC curve comparison

Our work contributes substantially to food/wine analytics, as we are able to classify wine from the worst to the best quality while outperforming the existing literature. This can benefit future research aimed at predicting the quality of food items almost perfectly.

Comparison with Existing Literature

Our proposed algorithm shows a nearly perfect ROC curve and good accuracy, and it can be considered an advance in red wine quality prediction and, by extension, in classifying the quality of other food items. Ye et al. (2020) used XGBoost, which influenced our work, and SVM and random forest were used by most previous authors, which also shaped our choices. As shown in Fig. 10, the proposed methodology outperforms previous work on this dataset. It can be applied further in biomedical research relating to food and water quality prediction. Applying stacking-based ensembling can benefit future research, as it adds diversity to the classifiers and improves metrics such as accuracy and precision.

Fig. 10
figure 10

Comparing proposed model with existing literature

Conclusion and Future Work

In this paper, we offer a machine learning-based computational framework for predicting red wine quality. The proposed framework successfully sorts red wines into their respective quality classes. Its key contribution is the handling of skewed and imbalanced data using a power transformer and the SMOTEENN technique. Furthermore, ensemble learning increases the variety among the base learners, which improves prediction accuracy. Accuracy, precision, recall (sensitivity), and F1 score are used to evaluate the performance of all approaches, and all algorithms are trained and tested on a single benchmark dataset. Most existing strategies do not take the imbalanced and skewed nature of the data into account when creating red wine quality prediction tools; the proposed framework addresses both of these challenges.