1 Introduction

Thermal comfort is the result of a well-balanced mix of building systems tailored to the building’s location and to the activities carried out within the building or room. The design of an energy-efficient building envelope is a good place to start. The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) Standard 55 defines thermal comfort as the condition of mind that expresses satisfaction with the thermal environment. Thermal comfort significantly impacts humans both physiologically and psychologically, and it varies across geographic locations and climatic zones. People living in the same building and exposed to a similar atmosphere can have different opinions on the level of thermal comfort [1]. Buildings also play an essential role in defining a thermally comfortable environment. Specifically, two types of buildings exist: heating, ventilation, and air conditioning (HVAC) buildings and naturally ventilated (NV) buildings. HVAC buildings, with automated heating and air conditioning systems, try to provide a near-optimal thermally comfortable environment. NV buildings, on the other hand, rely on manually operated heating and cooling to accommodate occupants. In this research, our focus is to propose a thermal comfort model using data collected from HVAC buildings, covering both summer and winter.

With increasing age or illness, the ability to sense temperature decreases. Perceiving a sudden temperature change is more difficult for aged persons than for younger people. Due to this delayed perception, they are highly vulnerable to severe conditions such as hypothermia [2] or hyperthermia [3], which in some cases can cause death. Hypothermia occurs when the heat lost by the human body exceeds the heat it produces. In the human body, the metabolic rate is the total energy produced by the body, balanced against the energy the body loses. Sensitivity to cold and hot surfaces usually declines with aging, mainly because of the slowed response of the body systems responsible for maintaining the body’s core temperature at its ideal value.

Several models have been proposed using machine learning algorithms for human thermal comfort [4,5,6,7,8]; however, identifying the most relevant features that can accurately predict thermal comfort while considering both psychological and physical factors remains an open research area.

We summarize the main contributions of this research work as follows:

  • Propose a novel human thermal comfort sensation model using a machine learning algorithm that:
    – Selects an optimal feature set that can potentially determine thermal comfort.
    – Balances the data to improve the representation of classes with fewer samples.
    – Performs classification to predict the thermal sensation.
    – Effectively enhances the detection rate of thermal comfort with consistent performance in comparison with existing studies.

The rest of the paper is organized as follows: Sect. 2 reviews the related work on human thermal comfort approaches. Section 3 presents the proposed methodology for the thermal comfort sensation model. Section 4 presents the experimental analysis and describes the results in comparison with state-of-the-art studies. Finally, Sect. 5 concludes the paper and outlines future work.

2 Literature Review

Many models in the literature predict human thermal sensation using features related to physiological, environmental, and psychological measures. As output, these models predict the mean value of the votes given by a group of occupants on a thermal sensation scale, also called the predicted mean vote (PMV), with varying accuracy. Thermal equilibrium is obtained when the heat loss and an occupant’s internal heat production are equal. We divide the literature into two parts: (1) Fanger’s model, the widely adopted heat balance approach, and (2) machine learning models that predict thermal comfort sensation. Below we explain the key research work in both domains.

2.1 Heat Balance Approach (Fanger’s Model)

Fanger’s model is a widely adopted approach based on the heat balance equation, also known as Fanger’s comfort equation. Fanger’s model [9] relates the metabolic heat production to the mechanical work performed by the body and the total heat lost to the environment. The equation of Fanger’s model was derived from data collected from subjects in different environments, such as offices and schools. For this experiment, thermal data were collected for 3 h in a thermally controlled environment, during which subjects filled out a questionnaire giving feedback on their thermal sensation. Occupants’ data were collected using the survey feedback form and saved in a database at different time intervals. In this model, environmental and physiological parameters are mapped onto the ASHRAE seven-point thermal scale, also called the predicted mean vote (PMV). The ASHRAE scale ranges from −3 to +3 in steps of 1, where −3 denotes cold and +3 denotes hot, as shown in Table 1.

Table 1 Seven-point thermal sensation scale

PMV is used to estimate the number of unsatisfied occupants among all occupants. The number of unsatisfied persons can be reduced by adjusting the thermal settings, which can also be derived using the heat balance equation. The accuracy reported for the ASHRAE RP-884 dataset is based on data from inside buildings. The American Society of Heating, Refrigerating, and Air Conditioning Engineers (ASHRAE) reported that their model satisfied around 80% of the occupants inside buildings, although many researchers have found inconsistencies between PMV and the actual mean vote (AMV). In the heat balance equation, the metabolic rate equals the mechanical work plus the total heat loss. Total heat loss can be due to convection, radiation, evaporation, or conduction. Fanger’s heat balance equation (Eq. 1) [9] is explained below:

$$\begin{aligned} M = W + Q \end{aligned}$$
(1)

where \(Q = C + R + E + H\). Here, M represents the metabolic rate, W the mechanical work, Q the total heat loss, C the heat loss/gain by convection, R the heat loss/gain by radiation, E the heat loss/gain by evaporation, and H the heat loss/gain by conduction.
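To make the balance concrete, the following minimal Python sketch evaluates Eq. 1 for one set of values; the numbers are hypothetical and chosen purely for illustration, not taken from the dataset:

```python
# Minimal sketch of Fanger's heat balance (Eq. 1); all values are
# hypothetical, in W/m^2, chosen purely for illustration.
M = 70.0                             # metabolic rate
W = 0.0                              # mechanical work
C, R, E, H = 35.0, 25.0, 8.0, 2.0    # convection, radiation, evaporation, conduction
Q = C + R + E + H                    # total heat loss

residual = M - (W + Q)               # zero at thermal equilibrium
print(f"heat balance residual: {residual} W/m^2")
```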

The purpose of the article [10] is to report the findings of an experiment in which wristwatch sensors were used to estimate occupants’ thermal comfort under various ambient circumstances. This study aims to determine the measurement accuracy of smartwatches when employed as thermal comfort sensors in HVAC control loops. The researchers in [11] used machine learning to build a personalized overall thermal sensation (OS) model for evaluating the psychological effect of local radiant heating and simulating the OS of electric vehicle (EV) passengers. The data were collected from a real electric vehicle driven through a cold environmental chamber and analyzed using the random forest method. The article [12] tackles data insufficiency concerns by utilizing transfer learning to transfer previously acquired information from a source domain (similar climate zones) to a target domain (different climate zones) with sparse modeling data. More precisely, a transfer learning-based convolutional neural network–long short-term memory neural network (TL CNN-LSTM) is created for effective thermal comfort modeling that exploits the spatiotemporal relationships inherent in the thermal comfort data. The Chi-squared test is used to determine the significant modeling parameters for TL CNN-LSTM. The researchers in [13] propose the design of a thermal comfort system that enhances occupant comfort while reducing energy consumption, even when outlier groups such as persons of varying ages are included. The sensing-stage data were gathered from the ASHRAE RP-884 database; various machine learning methods were then used to predict the thermal sensation vote (TSV) of the thermal comfort model.

There are some problems with Fanger’s model that cannot be neglected. The experiments were conducted mostly in office settings and covered only a specific age group. Further, the impact of age, geographical location, and outdoor weather conditions was not investigated in the analysis. Moreover, the proposed model is static and thus does not adapt to changes in the environment.

2.2 Machine Learning Approach

Machine learning approaches consider physiological, psychological, and behavioral features for thermal comfort. The basic idea of machine learning comfort models is that outdoor temperature influences indoor temperature and that, similarly, humans adapt to different temperatures across seasons. The authors in [4] described that every person has their own thermal comfort zone and tried to map each occupant to their desired zone. Machine learning approaches are mainly based on datasets gathered from field studies. Their predictions are based on a large-scale publicly available dataset and the LIBSVM [14] classifier, predicting the PMV value with an accuracy of 76.7%, almost two times greater than the widely adopted heat balance equation model. Their approach takes a feature set as input and predicts a thermal sensation class as output: neutral, cold, or hot. It can predict a person’s thermal comfort in real time and suggest appropriate action according to the predicted class. They used tenfold cross-validation to evaluate the results.

In modern building design, occupants have multiple options for interacting with heating and cooling equipment, including operable windows. An artificial neural network (ANN) is used in [15] to predict thermal sensation. ANNs have excellent potential to predict thermal comfort votes if sufficient data are given as input to train the model. The authors used a feed-forward neural network, an ANN type in which information is only passed forward, trained it with supervised learning, and used the seven-point thermal scale as output. Neural networks can learn highly nonlinear relationships given additional hidden layers and sufficient data.

The authors in [8] used a Bayesian network to predict the thermal sensation vote. Their model can adapt to a user’s preferred temperature. For changing climate conditions, they combined adaptive and static models to obtain good results. They evaluated their model using data from 10 cities comprising 553 participants. They used the 7-point ASHRAE scale, and their results outperformed existing approaches by 17.5–23.5% after a short initial learning phase.

The authors in [7] proposed an integrated indoor environment control framework to evaluate criteria for thermal comfort models. Their thermal comfort model consists of the following steps. The first is data collection, which describes how to collect data and which data are essential for the learning algorithms. In the second step, the raw data are preprocessed into a form acceptable to the learning algorithms. The third step is to select a model according to the problem and the collected data. In the fourth step, the performance of the current model is compared with existing ones. The last step is continuous learning, to make sure the model performs well on new and unseen data.

Energy efficiency is a significant issue; almost 12% of all energy is consumed in domestic heating [16]. A thermal comfort system should be able to turn HVAC systems on and off autonomously; by reducing interaction with users, it can be more energy efficient. Thermal models should be able to learn about the thermal environment and the user’s schedule. A novel thermal model that learns user preferences is introduced in [16]. This model is based on the user’s thermal comfort scores and provides 13.2–25.8% more accurate results than previous models. According to their suggested thermal scale, the model increased comfort and reduced discomfort by 13.5%.

Table 2 Detailed analysis of thermal comfort approaches and dataset used in some related work

Implementing a thermal comfort model in large buildings (e.g., a stadium) is very tough because of the different zones and populations [17]. The authors divided a stadium into four different zones, each of which changes temperature rapidly due to occupancy and other factors. Their method can predict the thermal environment of every zone using six factors that affect thermal comfort: three variable factors (indoor air temperature, mean radiant temperature, and clothing) and three fixed factors (air velocity, metabolic rate, and relative humidity). The evaluation criteria differ here because of the different zones, and they used their own sensor-generated dataset for evaluation. Vellei et al. [18] described that many factors influence thermal comfort models; however, relative humidity plays a significant role in finding the best thermal comfort for individuals. They evaluated their model with and without humidity and report 30% higher results with relative humidity than without it.

In Table 2, we explore the literature on machine learning algorithms used to predict thermal comfort sensation in detail. Various studies used different machine learning techniques to predict the human thermal sensation class, such as support vector machines [4, 19], artificial neural networks [15, 16, 17, 20], Bayesian networks [7, 8, 21], AdaBoost [4], fuzzy logic [22], regression [7], logistic regression and tree-based methods [18, 19], and random forests [4, 20, 22]. Most of these studies use the publicly available ASHRAE RP-884 dataset [23] for their experiments, while some created their own datasets by instrumenting buildings with sensors in a controlled environment and compared results on both datasets.

In summary, the accuracies reported in past papers are between 13.2% and 30% higher than earlier work, which achieved almost 38%. The accuracy reported so far is low and needs improvement. Our proposed approach overcomes the limitations of past studies and provides overall promising results for predicting thermal comfort sensation votes.

In the next section, we describe how we preprocess the raw data and extract meaningful features. Then, the dataset is balanced for the classification algorithms. Finally, we discuss the results of our implemented methodology.

3 Proposed Approach

The proposed approach consists of four major steps: data preprocessing, feature selection, data balancing, and classification of thermal sensation vote class. First, we extract features that can potentially determine thermal comfort from the sensing dataset. Next, feature selection is performed to acquire a feature subset that correlates with thermal comfort classes. Next, data balancing is performed to increase the number of instances in the class with fewer samples to improve the representation of the minority class. Finally, classification is performed to predict the thermal comfort classes. Figure 1 summarizes our approach.

Fig. 1
figure 1

Block diagram of the proposed approach

3.1 Feature Extraction

We extract several potential features from the dataset described as follows:

  • Age: The perception of feeling cold and hot becomes delayed with age; thus, age is an important feature to be part of the feature set.

  • Time: The human body’s metabolic rate produces more heat when people eat or walk; for example, occupants usually produce more heat when they arrive at work in the morning. The time feature therefore plays a major role in affecting thermal comfort.

  • Metabolic rate: This is the rate at which the human body burns calories; it usually increases when gaining weight or performing work that consumes energy. When the metabolic rate is high, the body produces more energy than it dissipates.

  • Clothing insulation: For predicting thermal sensation, it is important to know what occupants are wearing, since the amount of clothing insulation a person has determines how they experience a given temperature.

  • Chair clothing insulation: How much insulation a chair has directly affects body temperature and thermal comfort. Chairs with cloth or leather upholstery feel less cold than bare chairs without insulation.

  • Average indoor air temperature: Indoor air temperature is important; it is averaged over the high, medium, and low measurement levels.

  • Mean radiant temperature: Mean radiant temperature is calculated from the person’s distance from, and angle to, heating or cooling sources, so this information is important for predicting thermal comfort.

  • Air velocity: Air speed is important because it affects the temperature of the human body: high air velocity leads to a lower perceived temperature and vice versa.

  • Average indoor relative humidity: The amount of water vapor in the air matters for a person to feel thermally comfortable, so we include relative humidity as one of the input features.

  • Average outdoor air temperature: Outdoor temperature affects the indoor temperature, so we take the average of the low, medium, and high outdoor temperature readings.

  • Average outdoor relative humidity: Outdoor relative humidity directly affects human thermal sensation, which is why this feature is also added to the initial feature set.

3.2 Data Preprocessing and Normalization

Preprocessing and normalization are essential in building a high-performance machine learning model; omitting them can lead to high misclassification. In our study, the dataset contains missing values (NaN) and some false entries. For example, the age feature contains some “0” values; since 0 cannot be the age of an occupant, we treated these as missing values. In the gender (sex) feature, where zero represents males and one represents females, some values are minus one (−1), which we considered a typo and replaced with one. To handle missing data, data imputation is an important step.

Figure 2 shows features with up to 30% missing values. Several studies use separate data files consisting of occupants’ physiological conditions and buildings’ environmental variables. One data file contains data for multiple buildings, and within one building there are multiple occupants. Each instance of the dataset consists of an occupant’s physiological conditions (age, gender, clothing insulation, metabolic rate) and environmental variable measurements (temperature, air speed, air turbulence, air velocity). Some feature data are missing from these files. A significant part of the data is missing because the main project consists of multiple small individual projects: each researcher submitted a small dataset, and these were merged into one combined dataset called RP-884.

Fig. 2
figure 2

Missing values in dataset

To handle these limitations, we first divide the data into target classes. We predict the missing values of each class separately and then combine all three classes to make a complete dataset. Using a machine learning model, we apply various data imputation techniques and pick out the best technique based on accuracy. Below we explain the data imputation techniques:

3.2.1 Imputation Using Median

First, we separate the data class-wise; since we have three classes, we separate the data for each class. Then, we calculate the median of each class’s data and impute the missing values accordingly. After that, we combine all three classes, which are used later for further processing.

3.2.2 Imputation Using Mean

We calculate the mean for each class and impute the missing values accordingly. After that, we combine all three classes, which are used later for further processing. We impute the data class-wise to minimize bias.
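A minimal pandas sketch of the class-wise mean/median imputation described above; the helper function and default column name are ours, for illustration only:

```python
import pandas as pd

def impute_classwise(df: pd.DataFrame, target: str = "ash",
                     strategy: str = "mean") -> pd.DataFrame:
    """Fill missing feature values using the per-class mean or median."""
    features = df.columns.drop(target)

    def fill(col: pd.Series) -> pd.Series:
        stat = col.median() if strategy == "median" else col.mean()
        return col.fillna(stat)

    out = df.copy()
    # groupby(target) splits the data class-wise; transform re-combines it
    out[features] = out.groupby(target)[features].transform(fill)
    return out
```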

3.2.3 Imputation Using kNN

Missing values can also be filled using k nearest neighbors (kNN), which uses the k nearest neighbors of the instance with the missing value. In this method, we separate the data class-wise and apply kNN to find the k nearest neighbors. After imputing the data, we combine all three classes to make a complete dataset repository.
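A sketch of the class-wise kNN imputation using scikit-learn’s KNNImputer; the wrapper function is ours, and k = 30 matches the setting reported in Sect. 4.3:

```python
import pandas as pd
from sklearn.impute import KNNImputer

def impute_knn_classwise(df: pd.DataFrame, target: str = "ash",
                         k: int = 30) -> pd.DataFrame:
    """Impute each class separately with kNN, then re-combine the classes."""
    features = df.columns.drop(target)
    parts = []
    for label, group in df.groupby(target):
        imputer = KNNImputer(n_neighbors=k)
        filled = pd.DataFrame(imputer.fit_transform(group[features]),
                              columns=features, index=group.index)
        filled[target] = label
        parts.append(filled)
    return pd.concat(parts).sort_index()
```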

3.3 Feature Selection

Feature selection or attribute selection [23] is the process of selecting a subset of the most relevant features that can represent the whole feature matrix for the machine learning classification model. It is a core concept of machine learning that affects the performance of the model. In some cases, the feature matrix contains features that do not provide useful information for improving the model’s performance. Thus, we use multiple feature selection techniques to select the best contributing features from the feature matrix. Below are the details of the feature selection techniques.

3.3.1 Information Gain (IG)

IG works by calculating the entropy of, and the information provided by, each feature [24]. A feature is more important if it provides more information gain and less entropy. The entropy of a random variable is essentially an impurity measure. Entropy is calculated using Eq. 2:

$$\begin{aligned} H(\mathrm{Feature}) = -\ \sum _{i}P(f_i)\times \log _2(P(f_i)) \end{aligned}$$
(2)

where \(P(f_i)\) is the probability of value \(f_i\) in the dataset and \(\log _2\) is the base-2 logarithm. H(Feature) is the entropy, which measures the degree of “impurity”: the closer it is to 0, the less impurity there is in the dataset. A good feature provides the most information gain and reduces the entropy the most.

$$\begin{aligned} \mathrm{IG}(F_i) = H(C) - H(C|F_i) \end{aligned}$$
(3)

The information gain of a feature \(F_i\) is then calculated using the formula shown in Eq. 3, where C represents the different classification classes and \(F_i\) are the different features in the dataset.
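A small NumPy sketch of Eqs. 2 and 3 on discretized features (continuous features would first be binned, e.g., with np.digitize); the function names are ours:

```python
import numpy as np

def entropy(values: np.ndarray) -> float:
    """H = -sum P(f_i) * log2 P(f_i), Eq. 2."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature: np.ndarray, classes: np.ndarray) -> float:
    """IG(F) = H(C) - H(C|F), Eq. 3, for a discrete feature."""
    h_c_given_f = sum(
        (feature == v).mean() * entropy(classes[feature == v])
        for v in np.unique(feature)
    )
    return entropy(classes) - h_c_given_f
```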

3.3.2 Chi-Square (Chi2)

Chi2 measures the association between two variables, reflecting their real association. This feature selection method works based on similarities and a transformation of the higher-dimensional data by projecting it to a lower dimension [25]. The Chi2 method calculates the Chi2 statistic of each feature with respect to the target and selects the subset of features with the highest Chi2 scores, as shown in Eq. 4:

$$\begin{aligned} {\displaystyle X^2 = \sum {\frac{(\mathrm{OF} - \mathrm{EF})^2}{\mathrm{EF}}}} \end{aligned}$$
(4)

where OF is the observed frequency of class observations and EF is the expected frequency of class observations, used to determine whether there is a relationship between the target and the feature.
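In practice, the Chi2 scores can be computed with scikit-learn; the sketch below keeps the eight highest-scoring features, matching the subset size reported in Sect. 4.3 (X and y denote the feature matrix and class labels):

```python
from sklearn.feature_selection import SelectKBest, chi2

# chi2 requires non-negative feature values, so shift/scale features first
# if necessary (e.g., with MinMaxScaler).
selector = SelectKBest(score_func=chi2, k=8)
X_chi2 = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```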

3.3.3 Joint Mutual Information Maximization (JMI)

JMI works by calculating the mutual information that two features share. Information theory defines the mutual information and the entropy between random variables; the entropy of a random variable is essentially an impurity measure [26]. We use mutual information since it quantifies the reduction in uncertainty about a variable C: if the variables are statistically independent, the mutual information is zero. Equations 5 and 6 illustrate how JMI calculates mutual information:

$$\begin{aligned} I(X;C|Y)= & {} H(X|Y) - H(X|C,Y)\end{aligned}$$
(5)
$$\begin{aligned} I(X,Y;C)= & {} I(X;C|Y) +I(Y;C) \end{aligned}$$
(6)

The interaction between variables can be described as the amount of mutual information shared by those random variables. Hence, the interaction information is defined as in Eq. 7:

$$\begin{aligned} I(X;Y;C) = I(X,Y;C) -I(X;C)-I(Y;C) \end{aligned}$$
(7)

This algorithm is beneficial when used with the "maximum of minimum" approach.
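A compact greedy-JMI sketch on discretized features, assuming the identity \(I(X,Y;C) = H(X,Y) + H(C) - H(X,Y,C)\); the helper names are ours, and continuous features would first be binned:

```python
import numpy as np

def entropy(values: np.ndarray) -> float:
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_mi(x: np.ndarray, y: np.ndarray, c: np.ndarray) -> float:
    """I(X,Y;C) = H(X,Y) + H(C) - H(X,Y,C) on discrete variables."""
    xy = np.char.add(np.char.add(x.astype(str), "|"), y.astype(str))
    xyc = np.char.add(np.char.add(xy, "|"), c.astype(str))
    return entropy(xy) + entropy(c) - entropy(xyc)

def jmi_select(X: np.ndarray, y: np.ndarray, n_features: int) -> list:
    """Greedily add the feature f maximizing sum over selected s of I(f,s;C)."""
    remaining = list(range(X.shape[1]))
    # joint_mi(f, f, C) reduces to I(f;C), so it seeds the selection
    selected = [max(remaining, key=lambda f: joint_mi(X[:, f], X[:, f], y))]
    remaining.remove(selected[0])
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: sum(
            joint_mi(X[:, f], X[:, s], y) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```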

We evaluate the features chosen by the IG, Chi2, and JMI methods by investigating the classification accuracy obtained with the feature subsets retrieved by these three methods. Specifically, the classification results are obtained using the classifiers described in Sect. 3.5 with tenfold cross-validation.

3.4 Data Balancing

Data balancing is a technique that adjusts the class samples of the majority or minority class to improve the representation of a less represented class. Wold et al. [27] and He and Garcia [28] used different techniques to address class imbalance in datasets. Details of the techniques used in this work are described below:

3.4.1 Under Sampling (Resample)

Undersampling is the process of reducing the number of instances of the majority class to improve the representation of the minority class [28]. This helps the minority classes participate more fully in the classification process.

3.4.2 Bootstrap Oversampling

This filter produces a random subset of the dataset. The sample can be produced with or without replacement of the actual instances [27, 28]. It is also known as random oversampling, which replicates minority instances. Producing sub-samples with replacement can duplicate instances, and the resulting estimates carry additional sampling uncertainty.

3.4.3 Minority Over-Sampling (SMOTE)

SMOTE improves the representation of minority classes to improve the overall classification accuracy. It works by creating new synthetic instances of the minority class: for each minority sample, it takes the difference between the sample and one of its nearest neighbors, multiplies it by a random number between 0 and 1, and adds the result to the sample.

Suppose a sample (1,2) and let (3,4) be its nearest neighbor.

(1,2) is the sample for which k-nearest neighbors are being identified.

(3,4) is one of its k-nearest neighbors.

Let:

$$\begin{aligned} s1_1= & {} 1,\quad s2_1 = 3,\quad s2_1 - s1_1 = 2\\ s1_2= & {} 2,\quad s2_2 = 4,\quad s2_2 - s1_2 = 2 \end{aligned}$$

where \(s1_j\) and \(s2_j\) denote the jth components of the sample and its neighbor. The new sample will be generated as:

$$\begin{aligned} (s1',s2') = (1,2) + \mathrm{rand}(0\text{--}1)\times (2,2) \end{aligned}$$

where rand(0–1) generates a random number between 0 and 1.
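The same interpolation is implemented by the imbalanced-learn library; a minimal usage sketch follows (X and y are the feature matrix and labels, and the parameters shown are library defaults, not tuned values from this study):

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

smote = SMOTE(k_neighbors=5, random_state=0)
X_res, y_res = smote.fit_resample(X, y)   # synthesizes minority samples
print(Counter(y), "->", Counter(y_res))   # class counts before and after
```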

Occupants can be accommodated in two ways: individually or in groups. Predicting the individual thermal comfort of a person or of a group with machine learning algorithms in real time is tough. Machine learning algorithms depend on the dataset: they are trained on it to predict decisions, and the data are split into training and testing sets used to predict PMV.

The following are some general machine learning algorithms whose functional distinctions help in selecting a suitable algorithm for personal comfort models.

3.5 Machine Learning Algorithms

Below, we give an overview of the working of algorithms for thermal sensation vote prediction.

3.5.1 Support Vector Machine

A support vector machine is a discriminative algorithm that works by finding a separating hyperplane. The input to the SVM is represented as a set of points in space, mapped so that the points belonging to different classes are separated efficiently by a gap between them. The SVM relies on a kernel function that helps learn the boundary between data points. Below, we discuss the kernels of the SVM.

Kernel algorithms are based on patterns or nonlinear relationships mapped onto the input data [29]. The radial basis function (RBF), support vector machine (SVM), linear discriminant analysis (LDA), and Gaussian process (GP) are examples of kernel algorithms. They efficiently model complex problems like human thermal comfort and are highly robust against over-fitting, although they are computationally expensive, especially on large datasets, and can be used for both categorical and numerical predictions. Assessment of the model is based on the accuracy of prediction and on identifying the features that need improvement. Equations 8 and 9 describe the optimization:

$$\begin{aligned} \begin{aligned}&\underset{\alpha }{\hbox {max}} \sum _{i=1}^{n} \alpha _i - \frac{1}{2} \sum _{i=1}^{n}\sum _{j=1}^{n} y_iy_jK(x_i,x_j)\alpha _i\alpha _j, \\&\quad \hbox {subject to}: \\&\quad 0\le \alpha _i \le C, \quad \hbox {for} \; i=1,2,\ldots ,n, \\&\quad \sum _{i=1}^{n} y_i\alpha _i=0 \end{aligned} \end{aligned}$$
(8)

where C is an SVM hyper-parameter and \(K(x_i, x_j)\) is the kernel function, both supplied by the user; the variables \(\alpha _i\) are Lagrange multipliers. In sequential minimal optimization (SMO), two multipliers are solved at a time, so the SVM constraint \(0\le \alpha _i \le C\) changes to the following:

$$\begin{aligned}&0\le \alpha _1,\alpha _2 \le C \nonumber \\&y_1\alpha _1 + y_2\alpha _2=k \end{aligned}$$
(9)
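A minimal scikit-learn sketch of training an RBF-kernel SVM; the hyper-parameter C below bounds the Lagrange multipliers \(\alpha _i\) in Eq. 8 (its value here is an illustrative default, not the tuned setting from this study):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardize features first: kernel distances are scale-sensitive.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm_clf.fit(X_train, y_train)
print(svm_clf.score(X_test, y_test))
```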

3.5.2 Random Forest

Random forest is one of the most powerful supervised machine learning algorithms, capable of performing both regression and classification. RF builds an ensemble of decision trees; the more robust the individual decisions, the higher the accuracy. By aggregating the prediction results, overfitting can be reduced in advanced tree ensembles such as gradient boosted trees or random forests. The decision is made using Eqs. 10 and 11:

$$\begin{aligned} \hbox {Entropy} = H(T)=I_{E}(k_{1},k_{2},\ldots ,k_{J})=-\sum _{i=1}^{J}k_{i}\log _{2}k_{i} \end{aligned}$$
(10)

where \(k_i\) is the probability of class i and \(\log _2\) is the base-2 logarithm. H(T) is the entropy. A good feature has high information gain and low entropy.

$$\begin{aligned} \mathrm{IG}(F_i) = H(C) - H(C|F_i) \end{aligned}$$
(11)

Then, the information gain of a feature \(F_i\) is measured with the formula given above, where C denotes the classification classes and \(F_i\) are the features in the dataset.
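A short sketch of a random forest with entropy-based splits, mirroring Eqs. 10 and 11; the number of trees is an assumed setting, not a value reported in this study:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                            random_state=0)
rf.fit(X_train, y_train)
# Impurity-based importances indicate which features drive the splits.
print(rf.feature_importances_)
```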

3.5.3 Naive Bayes (NB)

NB predicts the decision based on the events with the highest prior probability [30]. Bayesian networks and the Naïve Bayes algorithm are examples of Bayesian algorithms, which are based on Bayes’ theorem. These algorithms assume all input features are independent of each other, even though such independence rarely occurs in real life. They can handle large datasets, make efficient predictions, and can be used for both categorical and numerical predictions.

$$\begin{aligned} \begin{aligned} {\displaystyle p(C_{k}\mid {\mathbf {x}} )={\frac{p(C_{k})\ p(\mathbf {x} \mid C_{k})}{p({\mathbf {x}} )}}} \end{aligned} \end{aligned}$$
(12)

Given a discrete feature vector x as shown in Eq. 12, \(P(C_k \mid \mathbf {x})\) represents the posterior probability of class \(C_k\) (target) given predictor x (attributes). \(P(C_k)\) is the prior probability of the class, \(P(\mathbf {x} \mid C_k)\) is the likelihood, i.e., the probability of the predictor given the class, and \(P(\mathbf {x})\) is the prior probability of the predictor.
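A minimal Gaussian Naive Bayes sketch; predict_proba returns the posterior \(p(C_k \mid \mathbf {x})\) of Eq. 12 under the feature-independence assumption:

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
posteriors = nb.predict_proba(X_test)  # p(C_k | x) for each class C_k
predictions = nb.predict(X_test)       # argmax over the posteriors
```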

3.5.4 Multilayer Neural Network

Multilayer neural networks solve classification problems using hidden layers for nonlinearly separable sets. The capacity of a network can be increased by adding hidden layers, which also enhances its separation capability [15]. A feed-forward artificial neural network is an example of a multilayer neural network.

Below are two common activation functions, the hyperbolic tangent and the sigmoid [14], as shown in Eq. 13:

$$\begin{aligned} \begin{aligned} y(v_i) = \tanh (v_i) ~~ \text {and} ~~ y(v_i) = (1+e^{-v_i})^{-1} \end{aligned} \end{aligned}$$
(13)

The node weights are adjusted based on corrections that minimize the error in the entire output, given by Eq. 14:

$$\begin{aligned} \begin{aligned} \mathcal {E}(n)=\frac{1}{2}\sum _j e_j^2(n) \end{aligned} \end{aligned}$$
(14)

and the weights can be updated using gradient descent, described by Eq. 15:

$$\begin{aligned} \begin{aligned} \varDelta w_{ji} (n) = -\eta \frac{\partial \mathcal {E}(n)}{\partial v_j(n)} y_i(n) \end{aligned} \end{aligned}$$
(15)

where \(y_i\) is the output of the previous neuron and \(\eta \) is the learning rate.
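The toy NumPy sketch below walks through Eqs. 13–15 for a single sigmoid neuron; all values and variable names are ours, for illustration only:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))     # y(v) = (1 + e^{-v})^{-1}, Eq. 13

eta = 0.1                               # learning rate
x = np.array([0.5, -1.2, 0.3])          # outputs y_i of the previous layer
w = np.zeros(3)                         # node weights
target = 1.0

y = sigmoid(w @ x)                      # neuron output
e = target - y                          # output error e_j(n)
E = 0.5 * e ** 2                        # total error, Eq. 14

# Gradient descent step (Eq. 15): -dE/dv = e * y'(v), with y'(v) = y(1 - y)
w += eta * e * y * (1 - y) * x
```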

We used supervised learning methods for classification, consisting of a base classifier and a sub-classifier. We picked decision tree [31], Naive Bayes [30], SVM [29], and the multilayer perceptron (MLP) [14] for prediction. Using tenfold cross-validation, we achieved the highest accuracy using joint mutual information for feature selection, upsampling for data balancing, and a support vector machine as the classifier. With a 70% training and 30% testing split, we obtained the best results with Chi2 feature selection, upsampling for data balancing, and random forest as the classifier.
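The full chain of feature selection, oversampling, and classification can be sketched with an imbalanced-learn pipeline, which applies the resampling only to the training folds of the tenfold cross-validation; the components shown (Chi2, SMOTE, RF) match one of our best configurations, but the parameters are illustrative:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ("select", SelectKBest(chi2, k=8)),           # feature selection
    ("balance", SMOTE(random_state=0)),           # applied to training folds only
    ("clf", RandomForestClassifier(random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=10, scoring="f1_macro")
print(scores.mean())
```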

4 Experimentation and Results

This section discusses the details of the experimental setup, implementation, preprocessing, feature selection, and the classifiers with their parameter selection. We then discuss the detailed implementation results for different combinations of data imputation methods and feature selection methods with machine learning classifiers.

4.1 Dataset

Initially, data are recorded using sensors and occupants’ feedback questionnaires in an Excel file. A common repository is maintained by ASHRAE and is publicly available [23], containing multiple files from different studies by multiple researchers. The data files are collected from different climate zones at different geographical locations. We only consider HVAC buildings for this research work. Let \(\mathrm{TS} = \{\mathrm{TS}_1, \mathrm{TS}_2, \mathrm{TS}_3\}\) represent the thermal sensations of individuals, i.e., uncomfortably warm, neutral, and uncomfortably cold. Let \(S=\{s_1, s_2,\ldots , s_n\}\) represent the data recorded for the various physiological, psychological, and environment-related factors. Each instance \(T_i\) in the dataset can be represented as \(T_i = \{s_1, s_2,\ldots , s_n, A_i, \mathrm{TS}_i\}\), where \(\mathrm{TS}_i\) is the response variable.

The dataset consists of multiple files gathered by various researchers over 60 months in different climate zones, using sensor readings of indoor and outdoor environmental variables. Feedback is also collected from occupants on a questionnaire form, which an occupant fills in after residing in the room for at least 15 min. ASHRAE collected all the researchers’ data files and published the repository online as ASHRAE RP-884.

ASHRAE RP-884 consists of basic identifiers (building code (blcode), subject (sub), age (age), gender (sex), year (year), day of reading (day), and time of day (time), etc.), a thermal questionnaire filled out by occupants (ASHRAE thermal sensation scale (ash), comfort level (comf), metabolic rate (met), and clothing insulation (clo), etc.), indoor climate physical observations (air temperature at the high level (ta_h), air turbulence at the low level (turb_l), etc.), indices calculated from existing data (average air temperature (taav), average radiant temperature (trav), operative temperature (top), air speed averaged over three heights (velav), maximum air speed over three heights (velmax), turbulence averaged over three heights (tuav), air pressure (pa), relative humidity (rh), new standard effective temperature index (set), two-node disc index (disc), predicted mean vote (pmv), predicted percentage dissatisfied (ppd), etc.), personal environmental control (perceived control over the thermal environment (PCC), and PCED from 1 to 7, etc.), and outdoor meteorological data (outdoor 15:00 (max) air temperature (day15_ta), outdoor 06:00 (min) air temperature (day06_ta), outdoor average of min/max air temperature (dayav_ta), outdoor 15:00 (min) relative humidity on the day of the survey (day15_rh), outdoor 06:00 (max) relative humidity (day06_rh), outdoor average of min/max relative humidity (dayav_rh), and effective temperatures (day15_et, day06_et, and dayav_et)). The data files were combined to explore which features have insufficient data. Some files did not record data for certain features due to sensor availability or relevance to that particular study.

For the building code (blcode), we built a dictionary because the original files use the same code for different buildings, so we could not tell whether building data came from study A or study B. We therefore introduced a new blcode so that every building is distinguished from the others. The new building codes combine five digits: the first digit is 1 for HVAC buildings, the next two digits are the original file number used in the public repository, and the final two digits are the building code used in that research project. For example, the new blcode 10102 denotes an HVAC building from file 01 with building number 02 in that study.
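A small sketch of this recoding scheme; the helper name and the non-HVAC prefix are our assumptions (the study itself only uses HVAC buildings):

```python
def new_blcode(file_no: int, building_no: int, hvac: bool = True) -> int:
    """Five digits: 1 for HVAC, two-digit file number, two-digit building."""
    return int(f"{1 if hvac else 0}{file_no:02d}{building_no:02d}")

assert new_blcode(1, 2) == 10102  # HVAC building, file 01, building 02
```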

Table 3 lists some characteristics of the dataset extracted during experimentation and analysis.

Table 3 Characteristics of the dataset

4.2 Performance Evaluation Metrics

Evaluation measures are a vital part of assessing the performance of a classifier model, and almost all evaluation measures depend on the nature of the classification model. We use four evaluation metrics, i.e., precision, recall, F1-score, and accuracy, as shown in Eqs. 16, 17, 18, and 19. We use tenfold cross-validation for evaluation.

The proportion of positively classified instances that are actually correct is known as precision:

$$\begin{aligned} \hbox {Precision} = \frac{\hbox {TP}}{\hbox {TP}+\hbox {FP}}\ \end{aligned}$$
(16)

Recall is the proportion of actual positives (true positives plus false negatives) that are correctly classified:

$$\begin{aligned} \hbox {Recall} = \frac{\hbox {TP}}{\hbox {TP}+\hbox {FN}} \end{aligned}$$
(17)

Accuracy is the proportion of all examples, correct and incorrect, that are correctly classified:

$$\begin{aligned} \hbox {Accuracy} = \frac{\hbox {TP}}{N} \end{aligned}$$
(18)

The F-score is the harmonic mean of precision and recall. Arithmetic and geometric means can give misleading readings, so the F-score is a better summary of accuracy. The F measure combines precision and recall:

$$\begin{aligned} \hbox {F-score} = 2 \times \ \frac{\hbox {Precision}\times \hbox {Recall}}{\hbox {Precision}+\hbox {Recall}} \end{aligned}$$
(19)
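All four metrics can be obtained at once from scikit-learn; a minimal sketch follows (the class names passed to target_names are illustrative placeholders for the three thermal sensation classes):

```python
from sklearn.metrics import classification_report

print(classification_report(
    y_true, y_pred,
    target_names=["uncomfortably cold", "neutral", "uncomfortably warm"],
))  # per-class precision, recall, F1 (Eqs. 16-19) plus overall accuracy
```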

4.3 Results

The results in Table 4 are obtained using the mean data imputation method and the downsampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the RF classifier achieved 2% and 8% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with the RF classifier. Combining Chi2 with the SVM classifier achieved 4% and 8% higher F1-scores than JMI and IG with SVM. Combining Chi2 with the NB classifier achieved 2% and 4% higher F1-scores than JMI and IG with NB. Combining Chi2 with the ANN classifier achieved 5% and 10% higher F1-scores than JMI and IG with ANN.

The joint mutual information (JMI) feature selection method selects these important features: rh, clo, trav, age, dayav_ta, taav, velav, and time. The Chi2 method selects time, age, clo, rh, trav, taav, velav, and met, and the IG method selects velav, time, upholst, trav, clo, taav, age, and met; these subsets were used for the experimentation. The analysis of missing-data imputation shows that applying mean imputation with the combination of Chi2 feature selection and the SVM classifier, while using the downsampling class balancing technique, gives almost 1%, 3%, and 17% better F1-scores than RF, ANN, and NB, respectively, with an average F1-score of 78.7%.

Table 4 Classification of thermal sensation score using mean imputation along with feature selection methods and downsampling technique

The results in Table 5 are obtained using the mean data imputation method with tenfold cross-validation and the oversampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the SVM classifier achieved 4% and 9% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with SVM. Combining Chi2 with the RF classifier achieved 2% and 8% higher F1-scores than JMI and IG with RF. Combining Chi2 with the NB classifier achieved 2% and 4% higher F1-scores than JMI and IG with NB. Combining Chi2 with the ANN classifier achieved 5% and 11% higher F1-scores than JMI and IG with ANN.

The selected feature subsets are the same as above: JMI selects rh, clo, trav, age, dayav_ta, taav, velav, and time; Chi2 selects time, age, clo, rh, trav, taav, velav, and met; and IG selects velav, time, upholst, trav, clo, taav, age, and met. The analysis shows that applying mean imputation with the combination of Chi2 feature selection and the SVM classifier, while using the oversampling class balancing technique, gives almost 1%, 3%, and 17% better F1-scores than RF, ANN, and NB, respectively, with an average F1-score of 82.5%.

Table 5 Classification of thermal sensation scores using mean imputation along with feature selection methods and oversampling technique

The results in Table 6 are obtained using the median data imputation method with tenfold cross-validation and the downsampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the SVM classifier achieved 2% and 6% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with SVM. Combining Chi2 with the RF classifier achieved 2% and 5% higher F1-scores than JMI and IG with RF. Combining Chi2 with the ANN classifier achieved 2% and 5% higher F1-scores than JMI and IG with ANN. Combining Chi2 with the NB classifier achieved 1% and 3% higher F1-scores than JMI and IG with NB.

The analysis shows that applying median imputation with the combination of Chi2 feature selection and the SVM classifier, while using the downsampling class balancing technique, gives almost 1%, 2%, and 12% better F1-scores than RF, ANN, and NB, respectively, with an average F1-score of 73.4%.

Table 6 Classification of thermal sensation scores using median imputation, feature selection methods and classifiers using downsampling technique

The results in Table 7 are obtained using the median data imputation method with tenfold cross-validation and the oversampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the SVM classifier achieved 2% and 6% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with SVM. Combining Chi2 with the RF classifier achieved 2% and 6% higher F1-scores than JMI and IG with RF. Combining Chi2 with the ANN classifier achieved 3% and 5% higher F1-scores than JMI and IG with ANN. Combining Chi2 with the NB classifier achieved 2% and 3% higher F1-scores than JMI and IG with NB.

The selected feature subsets are as before: JMI selects rh, clo, trav, age, dayav_ta, taav, velav, and time; Chi2 selects time, age, clo, rh, trav, taav, velav, and met; and IG selects velav, time, upholst, trav, clo, taav, age, and met. The analysis shows that applying median imputation with the combination of Chi2 feature selection and the SVM classifier, while using the oversampling class balancing technique, gives almost 1%, 3%, and 14% better F1-scores than RF, ANN, and NB, respectively, with an average F1-score of 76.3%.

Table 7 Classification of thermal sensation scores using median imputation along with feature selection methods and oversampling technique
Table 8 Classification of thermal sensation scores using kNN imputation along with feature selection methods and downsampling technique
Table 9 Classification of thermal sensation scores using kNN imputation along with feature selection methods and oversampling technique

The results in Table 8 are obtained using the kNN data imputation method (with \(k=30\)) with tenfold cross-validation and the downsampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the SVM classifier achieved 8% and 11% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with SVM. Combining Chi2 with the RF classifier achieved 6% and 10% higher F1-scores than JMI and IG with RF. Combining Chi2 with the ANN classifier achieved 5% and 10% higher F1-scores than JMI and IG with ANN. Combining Chi2 with the NB classifier achieved 4% and 6% higher F1-scores than JMI and IG with NB.

The selected feature subsets are as before: JMI selects rh, clo, trav, age, dayav_ta, taav, velav, and time; Chi2 selects time, age, clo, rh, trav, taav, velav, and met; and IG selects velav, time, upholst, trav, clo, taav, age, and met. The analysis shows that applying kNN imputation with the combination of Chi2 feature selection and the SVM classifier, while using the downsampling class balancing technique, gives almost 2%, 5%, and 19% better F1-scores than RF, ANN, and NB, respectively, with an average F1-score of 81.3%.

Table 10 Performance comparison of the proposed approach with existing approaches

The results in Table 9 are obtained using the kNN data imputation method (with \(k=30\)) with tenfold cross-validation and the oversampling technique for imbalanced data. They show that combining the Chi2 feature selection method with the SVM classifier achieved 5% and 10% higher F1-scores than combining the JMI and IG feature selection methods, respectively, with SVM. Combining Chi2 with the RF classifier achieved 6% and 12% higher F1-scores than JMI and IG with RF. Combining Chi2 with the ANN classifier achieved 5% and 11% higher F1-scores than JMI and IG with ANN. Combining Chi2 with the NB classifier achieved 4% and 6% higher F1-scores than JMI and IG with NB.

The selected feature subsets are as before: JMI selects rh, clo, trav, age, dayav_ta, taav, velav, and time; Chi2 selects time, age, clo, rh, trav, taav, velav, and met; and IG selects velav, time, upholst, trav, clo, taav, age, and met. The analysis shows that applying kNN imputation with the combination of Chi2 feature selection and the RF classifier, while using the oversampling class balancing technique, gives almost 2%, 5%, and 19% better F1-scores than SVM, ANN, and NB, respectively, with an average F1-score of 86.1%.

Fig. 3
figure 3

Graphical analysis of proposed approach performance with baseline approaches

4.4 Discussion

Our proposed approach demonstrated an overall improvement in performance. Specifically, to improve the overall accuracy, we preprocess the raw data in several steps. We first extract features that can potentially determine thermal comfort from the sensing dataset. Next, feature selection is performed to acquire a feature subset that correlates with the thermal comfort classes. Data balancing is then performed to increase the number of instances in classes with fewer samples, improving the representation of the minority class. Finally, classification is performed to predict the thermal comfort classes. A limitation of our approach is that we cannot provide thermal comfort by geographical location, because we do not include features for predicting thermal comfort across different climate zones; this will be addressed in our future work. Table 10 presents a comparison of the suggested method with current research. The authors in [32] achieved an accuracy of 69%, which is 15% less than our approach; the authors in [4] achieved an accuracy of 76.7%, which is 8% less than our approach. This comparison strongly supports using our approach for thermal sensation vote prediction. Figure 3 depicts a graphical analysis of the accuracy of the proposed approach against the baseline studies.

5 Conclusion

This paper proposes a novel machine learning technique that predicts an individual’s thermal sensation with good accuracy. Our approach significantly improves the thermal comfort of individuals, whose health and productivity depend directly on it. The model selects the best feature set for estimating thermal comfort, balances the data to enhance the representation of classes with fewer samples, and classifies the thermal sensation in order to predict it. Specifically, this approach preprocesses the raw dataset, imputes the missing values class-wise, balances the imbalanced classes, identifies the best set of features, and trains a classifier that takes a feature vector as input and outputs the corresponding thermal sensation class. The approach is evaluated using the publicly available large-scale ASHRAE RP-884 dataset. Results demonstrate that, when using the RF classifier, the accuracy of our approach is 86.08% versus 35.4% for Fanger’s model. Hypothesis tests also demonstrate that our approach outperforms Fanger’s model with strong statistical significance. Finally, our feature selection study indicated that relative humidity, which is not included in Fanger’s model, plays a vital role in thermal comfort, a finding interesting in its own right. Furthermore, our results outperform the base paper’s accuracy of 76.7%. In the future, we intend to broaden the scope of this research to region-wise thermal comfort and naturally ventilated buildings.