1 Introduction

Electricity is closely tied to everyday human life, and the demand for it keeps increasing. Buildings account for 30–45% of global electricity consumption. Users should know how much energy they consume each month in order to use the available energy effectively. According to recent World Bank studies, the urban population has risen by about 3.9% and electricity consumption has risen by 1100 kWh over the last ten years. Forecasting electricity consumption is therefore important for consumers and energy planners, since it helps determine how much electricity consumers will use. Several models for forecasting the electricity consumption of buildings are based on machine learning. Machine learning has two inherent phases: training and inference. The available data is divided into training, validation and test sets, and the accuracy achieved on the training set is validated on the test set. The input data is used to train a machine learning algorithm, and the output obtained from the trained model is called inference. Based on the input given, training is categorized into supervised learning, unsupervised learning and reinforcement learning [10, 11].

Classification algorithms assign data to their respective categories. Common examples are speech recognition, handwriting recognition, face recognition and email spam detection. Types of classification algorithm include linear classifiers, support vector machines, k-nearest neighbors, decision trees and random forests. A regression model predicts a target value from independent variables; it relates a dependent variable (y) to an independent variable (x). Types of regression algorithm include linear regression, logistic regression, polynomial regression and stepwise regression.

In unsupervised learning, datasets are given as input but the goal or intended value is not. It is used for dimensionality reduction, clustering and density estimation. Reinforcement learning lies between supervised and unsupervised learning. Real-life examples of machine learning include virtual assistants such as Siri, Alexa and Google, video surveillance, social media services, email spam and malware filtering, recommendations and online customer support. In one approach, an existing ANN model is trained on Ontario electricity market data for prediction, demonstrating its high potential [1, 23].

Functional regression methods are compared on the original dataset [29]. An advanced self-adaptive radial basis function (RBF) neural network is proposed and trained by fuzzy c-means [8]. To examine customer electricity consumption behaviors, a flexible mass-feature K-means-affinity propagation (AP) clustering algorithm is used [14]. To reduce the arithmetic complexity of functional principal component analysis (FPCA), a recursive dynamic factor analysis (RDFA) algorithm is used, which tracks and predicts recursively using a Kalman filter (KF) [31]. An accurate temperature forecast used by a refinement model can reduce the prediction error of electricity prices remarkably [22, 30]. To improve the accuracy of experimental results, the best prediction is selected from among several predictions [34]. Models are constructed and tested on randomly selected commercial buildings using monthly landlord utility bills. A radial basis function (RBF) kernel is selected by a stepwise searching method, and C, γ and ε are the parameters used to investigate the performance of the SVM; the result is a low coefficient of variance with a low percentage error [17, 26]. The extreme learning machine (ELM) and the non-dominated sorting genetic algorithm are combined in an ELM-based forecasting method with optimization, which provides reliability and sharpness [3]. To solve the problems of a single network structure and hyperparameter selection, a different LSTM model is used, which divides the analyzed data based on parameters and performs optimization [37].

A novel non-parametric approach for modelling, based on a manifold learning methodology, is proposed for the analysis of electricity price curves [4]. Operating a smart home requires real-time electricity scheduling; this improves the utilization of renewable energy and minimizes the cost of payment [22]. Another method is proposed using a smart meter dataset; it deals with utilities, usage, needs and suitability for different programs [20, 28]. Random forest, or random decision forest, is an algorithm that integrates several machine learning models into a single predictive model. A multitude of decision trees is built at training time, and the outputs of the individual trees are combined for classification or regression. Random forest is a supervised machine learning algorithm based on ensemble learning, which employs multiple learning algorithms to obtain better predictive performance. It is a powerful algorithm that integrates the outcomes of various learners, the trees of the forest. This ensemble method is used here for predicting the consumption of home appliances; it integrates machine learning and statistical models for highly accurate prediction of electricity consumption.

The motivation behind this work is the forecast that global electricity demand will increase by 5% in the coming years due to the launch of electric vehicle production hubs around the globe by various manufacturers. Governments therefore need to forecast electric power consumption accurately in order to decide how much electricity to generate to meet customer demand.

1.1 Significance of the proposed work

Since the use of appliances, including electric cars, increases day by day, electricity usage is increasing drastically. Hence accurate prediction and forecasting are needed so that sufficient electric power is generated for distribution.

The main contribution of this paper is an efficient method for accurately predicting and forecasting monthly, weekly and daily electric power consumption using machine learning techniques, to meet the requirements of customers' modern lifestyles. This forecast will in turn reduce the gap between demand and supply, thereby improving customer satisfaction. The forecast accuracy of this method is superior to that of the existing methods in the literature.

The major contributions of this paper are as follows:

  • An efficient supervised learning model is applied to predict and forecast energy consumption, improving the satisfaction of customers with modern lifestyles.

  • It handles both two-class and multi-class prediction problems.

  • It supports a mixture of categorical and continuous variables and predicts well on data with many variables.

  • It handles missing data effectively.

  • It can handle thousands of input variables without variable selection.

  • It can automatically balance datasets in which one class is rarer than the others.

  • The prediction accuracy of the chosen model is superior.

The rest of the paper is organized as follows. Section 2 presents related work from the literature. Section 3 describes the proposed methods and materials. Section 4 presents the experiments and results, followed by the concluding remarks.

2 Related work

This section reviews the research on forecasting electric power demand in the literature. We conclude that the proposed method has a higher prediction rate than the existing methods.

The authors of [23] propose ideas for energy saving and load balancing. A smart grid is installed that reports the quantity of electricity. Two types of MEP models are used to analyse and determine electricity quantities and pricing. The suitable MEP method is chosen according to the situation, and it checks for vulnerability and duplicate data. This model is limited to the user's charge.

Recently, renewable energy for households has come into great demand, and determining and optimizing electricity usage has become a complicated task. In [29], the authors employ statistical methods such as the Gaussian distribution and the Kullback-Leibler divergence. The model makes it possible to find similarities between patterns. A large dataset covering the consumption of 500 houses is used, and the work deals with two concepts: electricity production and consumption.

In [8], the electricity market liberalization process is treated as a key driver. The main challenge of the Nigerian electricity sector is to provide an explanation to key. In [14], a new model for the Nigerian industry is used that addresses the current challenges and creates a new structure for Nigeria to form a secure energy future. The analysis of data is carried out in two stages. In the first stage, the hourly total electricity consumption yields the hourly weather-related and illumination-related electricity consumption; subtracting these two components gives the residual consumption. In the second stage, an agent-based analytical tool is used, which performs many operations including optimization. A set of patterns is used to minimize the negative effects of high peak demand.

The work in [31] deals with two major business challenges: electricity cannot be stored economically, and it requires an instant response. Two mathematical equations are used to compute capacity and energy charges; they can also be applied to find yearly average incentives and penalties. The use of the proposed equations is associated with unaccountable fluctuation. The primary aim is to provide benefits to the customer.

In [30], a smart grid is used to detect abnormal electricity consumption behaviors and to cluster similar users located in the same area. The clustering is based on electricity consumption, and the consumption of similar users is obtained with density clustering. The matching degrees are calculated from historical data. Finally, abnormal electricity consumption is found using the support degree. This method can effectively identify abnormal consumption.

Unsupervised learning of abnormal electricity consumption behavior is proposed in [34]. The original dataset is constructed by a brainstorming method. The optimal feature set is selected based on variance and the similarities between features. Unsupervised clustering is used to detect abnormal electricity consumption behavior. For evaluation, label information on abnormal behavior is obtained by integrating the actual electricity consumption. This method sets a benchmark with good and effective results [19]. It uses evolution-based characteristics of smart meter data, removing irrelevant data and features. A visualization is required to predict the number of clusters, and the K-means algorithm is used for segmentation. The method is applied to Guangdong Province, China. This new clustering approach provides a good segmentation of the data. This paper can be used in the field of data science [5]. It deals with minimizing electricity consumption and maintenance costs. Here, the use of virtual machines (VMs) and cloud data centers is necessary. Many VMs are deployed and their processing costs are noted. A material-based fatigue model calculates the maintenance and electricity costs, and a data center algorithm is used. This algorithm focuses on load balancing and energy consumption. It is able to accomplish peak load shifting and decrease the bill by around 12% on a typical day.

In [26], short-range electricity price forecasting is proposed. The paper uses the Levenberg-Marquardt BP (LMBP) method as an alternative to the regular back propagation (BP) method. It increases the convergence speed when training the ANN model in MATLAB and provides high performance and capability in forecasting short-range electricity prices.
The work in [3] integrates the extreme learning machine (ELM) and the non-dominated sorting genetic algorithm in an ELM-based forecasting method with optimization. This approach provides reliability and sharpness, with an accuracy between 80% and 90%.

The method in [36] has been verified on data provided by the Australian electricity market. It forecasts electricity price spikes using data mining and gives the occurrence of such prices. Since no method for predicting the occurrence of spikes had yet been established, the paper uses a spike value prediction technique and a comprehensive tool for price spike forecasting, tested on market data. It uses a forecasting strategy based on high-resolution data, with hourly data collected from the available market. The method proposed in [6] is able to detect price spikes and several price variations. To obtain accurate updates, it uses an intra-hourly rolling framework. Ontario's electricity market data is used to evaluate performance. The method is applicable to small-scale storage systems: the predicted result is passed to an optimization platform for scheduling the operation of a battery energy system.

The method in [37] addresses the single-network problem with a heterogeneous-structure LSTM. It divides the analyzed data based on parameters, performs optimization, and finally integrates and forecasts the output. Sequence model-based optimization verifies the decomposed and reconstructed electricity price data. This method provides accuracy and stability.

Electricity price time series exhibit a switching nature, with discrete changes in competition strategies that can be represented by a dynamic Markov chain model. A hidden Markov model [12] is used to analyse and forecast the electricity price. The inputs under different scenarios are identified and the most relevant are characterized. The conditional probability transition matrix gives the probabilities of remaining in the existing state. The method achieves good accuracy and has been tested on the Spanish electricity market.

The work in [4] uses a novel non-parametric method for modelling, with a manifold learning methodology for the analysis of electricity price curves. LLE is shown experimentally to be an efficient way of extracting the intrinsic dimensional structure of electricity price curves. The method fails for long-period predictions. Its accuracy is verified on data taken from the eastern US.

The paper [27] forecasts the use of electricity tariffs. The method provides a baseline consumption and the deviation from the anticipated baseline. First, a cost game is induced by a single tariff and the cost is minimized. A polynomial-time algorithm is used to compute and validate the approach. The method gives good results on large-scale datasets and improves performance.

In [22], real-time electricity scheduling is proposed to operate a smart home. This improves the utilization of renewable energy and minimizes the cost of payment. The optimization problem is solved by a genetic algorithm, and the approach improves the performance of home scheduling.

Electricity prices vary over time. In [24], a scheduling problem that is naturally formulated as a Markov decision process is proposed. It provides the data in numerical form. The method is tested with real price data and provides economic advantages to the consumer. It gives good results for short tasks.

The method in [20] is proposed using a smart meter dataset. It deals with utilities, usage, needs and suitability for different programs. Defining and describing different customer segments furnishes decision makers with information, not only for pricing and program marketing but also for resource allocation and program development. Customers are separated into groups according to their lifestyle, as established from their electricity data. Finally, segmentation is carried out based on the energy program.

A chaotic sequence is introduced into the support vector regression (SVR) algorithm and an evolutionary algorithm, which not only improves forecasting accuracy but also avoids premature convergence. The electric load is subject to change due to cyclic economic activities or its seasonal nature. The chaotic genetic algorithm is applied to improve the forecast and the genetic algorithm is applied to avoid premature convergence. Both algorithms are used to determine the parameters of the SVR algorithm [16]. The artificial bee colony algorithm with a seasonal recurrent support vector regression model is applied to electric load forecasting; it improves forecasting performance and functional optimization and overcomes premature local optima [15]. A hybrid model combines support vector regression (SVR), empirical mode decomposition (EMD), the krill herd (KH) algorithm and chaotic mapping functions. EMD decomposes the input data series, and SVR forecasts each component separately. KH selects the parameters for SVR, and chaotic mapping prevents premature convergence and improves the accuracy of the whole model [35]. The SVR model is combined with differential empirical mode decomposition (DEMD) and auto regression (AR) for electrical load forecasting. The differential EMD decomposes the series into several detailed parts with high frequencies and an approximate part with low frequencies. The results illustrate accurate and interpretable forecasting [9]. A ship motion time series (SMTS) exhibits the effects of periodic waves and strong nonlinearity owing to wind, ocean currents and the load of the ship itself, which makes accurate forecasting difficult. Because of the strong nonlinearity, the LSSVR model is used for forecasting, and the chaotic cloud particle swarm optimization (CCPSO) algorithm is introduced to optimize the parameters of the LSSVR model [21].

A quantum computing mechanism is used to quantize dragonfly behaviour to enhance the search performance of the dragonfly algorithm, giving the QDA. Data pre-processing is performed by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which helps to improve forecasting accuracy. Thus a new electric load forecasting model, the CEEMDAN-SVRQDA model, which combines CEEMDAN and hybridizes the QDA with an SVR model, is proposed to provide more accurate forecasts [32]. A novel hybrid algorithm, cuckoo search and differential evolution (CSDE), is used to solve constrained engineering problems. CS has a powerful global search ability and few control parameters, but suffers from premature convergence and lowers the population density. DE specializes in local search and has good robustness. Both give satisfactory results. The population is divided into two groups to which CS and DE are applied independently, and the groups exchange information. This overcomes premature convergence, stabilizes the solution quality and the computation cost, and yields satisfactory global optima [33].

Deep learning is now used in many fields, such as traffic crowd analysis, image processing and speech recognition. Machine learning can handle complex data but learns from that data, whereas deep learning can make its own decisions from the data [13]. A convolutional neural network is used to predict traffic flow accurately [2]. This prediction method can be applied to other factors such as weather, social factors and electricity. A tabular sketch of the literature review is shown in Table 1.

Table 1 Tabular sketch of the literature review

The review reveals the need for an efficient algorithm for predicting energy consumption that will help the government plan the generation of electric power. In this paper, a random forest learning model is proposed, which is superior to the methods proposed in the earlier literature in terms of accurately predicting customer demand. Random forest is usually much faster than a non-linear SVM. SVM works well with specific datasets and is not suited to large data. Random forest is suitable for multi-class problems and combines many decision trees, so its accuracy is high compared with SVM. ANNs are more complex because their weights must be adjusted.

3 Methods and materials

The proposed framework involves data preparation, which loads the raw data for processing, and data preprocessing, which eliminates redundant data, fills missing values, and so on. These steps are commonly required by the machine learning algorithms ANN, SVM and random forest, which analyze customers' past usage data to predict future requirements. Since the random forest algorithm builds a number of decision trees and bases the final output on majority voting, it gives better prediction results than the other two methods.

3.1 Data preparation

Missing values in the collected data may lead to inconsistency, so the electricity consumption data must be preprocessed to improve the performance of the algorithm. The notable attributes are meter id, appliance usage, bill amount and units consumed. Missing data is filled using the interpolate() function. Because it depends on the relationships among attributes, data preprocessing is the most time-consuming process in data mining. At the outset, the attributes that are important for predicting electricity units and amount are found using an attribute evaluator with a ranker as the search method.

Fig. 1 shows the missing values in the television (TV) and air conditioner (AC) data, which are subsequently recovered.

Fig. 1 Identifying empty values in dataset

In Fig. 1 the white portions show the missing data for the TV and AC. A value is missing when the entry is blank, duplicated or none (NaN). The missing values are identified using the missingno library package.

In Fig. 2 the missing values are filled by the interpolate function. It takes the two neighbouring values in the dataset and fills the gap with their average.

Fig. 2 Filling the missing values in dataset
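The gap-filling step can be illustrated with a minimal sketch. It assumes that the interpolate() function mentioned above is the pandas method, whose default behaviour is linear interpolation; the helper name interpolate_linear and the sample readings are hypothetical and are written in plain Python rather than taken from the authors' code. For a single missing value, linear interpolation reduces to the average of the two neighbouring readings.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Leading and trailing gaps are left untouched because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over consecutive pairs of known readings and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Hypothetical hourly TV readings; None marks a missing value.
tv_units = [1.0, None, 3.0, None, None, 6.0]
print(interpolate_linear(tv_units))
```

Longer gaps are filled with evenly spaced values between the two known readings on either side.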

3.2 Data preprocessing

Validation techniques are used in machine learning to obtain an error rate that is close to the true error rate of the dataset. Validation may not be needed if the volume of data is large enough to represent the population, but in real-world scenarios the data rarely represents the population fully. While tuning the model hyperparameters, a held-out data sample provides an unbiased evaluation of the model.

The datasets are loaded with the library packages to inspect the data shape and data types and to estimate the missing and duplicate values. The proposed model can then be evaluated, making the best use of the validation and test datasets. Data cleaning and preparation, including renaming of the given dataset, precede the uni-variate, bi-variate and multi-variate analysis. The procedures and techniques for cleaning the data differ depending on the dataset. The primary goal of data cleaning and validation is to detect and remove errors and abnormalities to improve the value of the data in analytics and decision making. The dataset collected for forecasting electricity units and price is split into a training set and a test set in a 7:3 ratio. The data model is generated using the random forest algorithm.

3.2.1 Training the dataset

As an example, the Iris dataset, which is predefined in the sklearn module and contains information about different varieties of the flower, is imported in the first line. The dataset is loaded into the data_dataset variable by the load_data() function, and the program uses train_test_split from the sklearn package together with numpy. The dataset is then divided into training data and test data in a 70:30 ratio using the train_test_split method. The X prefix of a variable denotes feature values and the y prefix denotes target values. The algorithm is then instantiated and fitted to the training data, so that the computer is trained on this data. At this point, the training part is complete.
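The 70:30 split can be sketched in plain Python as follows. This is an illustrative stand-in for sklearn's train_test_split(X, y, test_size=0.3), not the authors' code; the function name split_70_30, the fixed seed and the sample records are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def split_70_30(X: Sequence, y: Sequence, seed: int = 0) -> Tuple[List, List, List, List]:
    """Shuffle the records and split them into 70% training and 30% test data."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed gives a reproducible split
    cut = int(round(0.7 * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical records: [AC units, fridge units] with total units as the target.
X = [[float(i), float(2 * i)] for i in range(10)]
y = [row[0] + row[1] for row in X]
X_train, X_test, y_train, y_test = split_70_30(X, y)
print(len(X_train), len(X_test))
```

Shuffling before the cut keeps each feature row paired with its target while avoiding any ordering effect in the original data.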

3.2.2 Testing the dataset

The dimension features are used to predict the species with the predict method, which takes the feature data as input and returns the forecasted target value as output.

In this example, the forecasted target value is zero. The test score is the ratio of the number of correct predictions to the total number of predictions, and the accuracy score is found by comparing the actual values of the test set with the predicted values.

3.3 Proposed method

In the proposed method for predicting the electricity consumption of appliances, appliance and weather data are integrated within the classification framework of machine learning. The ultimate aim of the classification is to forecast electricity consumption and its price from the data of seven appliances, as shown in Fig. 3.

Fig. 3 Architecture of proposed method

The diagram shows that the electricity consumed by users is given in the dataset, after which data preprocessing is performed, i.e., filling the missing data, removing unwanted data and cleaning the data. After preprocessing, the machine learning algorithm (in our case the random forest algorithm) is applied: the machine is trained, its predictions are tested and the results are validated. If the accuracy of the algorithm is below the expected level, the machine is trained again. Figure 4 shows the entire classification technique, from preprocessing to decision making or classifying.

Fig. 4 Classification technique

3.3.1 Support vector machine

SVM is a supervised machine learning method. It was initially designed for the classification of linear data and was later extended to multidimensional data classification. It can also solve regression problems, in which case it is called support vector regression; this is based on support vectors and uses the function shown in Eq. 1.

$$ x=f(y)={U}^R\phi (y)+a $$
(1)

where a is a constant and ϕ is a nonlinear function that, with the parameters x, y, U and R, maps the inputs to the output.

The hyperplane is a decision boundary that separates the classes of data. Figure 5 shows the hyperplane, which is expressed as w · x + b = 0, where w is the weight vector, x is the input feature vector and b is the bias.

Fig. 5 Support vector machine
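A minimal sketch of how a separating hyperplane classifies a point, assuming the weight vector w and bias b have already been learned. The numeric values and the function names decision_value and classify are hypothetical; training the SVM itself is not shown.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Return w . x + b, the signed score of x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Assign +1 to points on or above the hyperplane and -1 to points below it."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical learned parameters for a two-feature problem.
w = [2.0, -1.0]
b = -1.0
print(classify(w, [2.0, 1.0], b))  # score is 2.0, so the class is +1
print(classify(w, [0.0, 1.0], b))  # score is -2.0, so the class is -1
```

The sign of the score decides the class, and its magnitude grows with the distance of the point from the hyperplane.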

3.3.2 Artificial neural network

ANN is another machine learning technique that is able to solve complex problems, including nonlinear classification and regression problems. Even though many advanced ANN variants such as the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) exist in the literature, we focus on the basic feedforward neural network, since the advanced techniques are more complex and require more computational power. A feedforward network has a minimum of three layers: the input layer, the hidden layer and the output layer. The input layer receives the input data and prepares it to be fed to the hidden layers. The hidden layer processes the data fed by the input layer and passes it to the next hidden layer or to the output layer. Finally, the output layer combines the results received from the hidden layer to produce the final desired output, as shown in Fig. 6. The output O is given as a function of the inputs I, as shown in Eq. 2.

$$ {O}_i=f\left({W}_{i,j}\ast {I}_j+{W}_{i,k}\ast {I}_k+{W}_{i,l}\ast {I}_l\right) $$
(2)

where f is a threshold or activation function that produces the output from the inputs I weighted by W.

Fig. 6 Artificial neural network
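Eq. 2 can be sketched for a single output neuron as follows. The sigmoid is used here as one common choice for the activation function f; the paper does not state which activation is used, so this choice, the weights and the inputs are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """A common activation function, assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(
    weights: Sequence[float],
    inputs: Sequence[float],
    activation: Callable[[float], float] = sigmoid,
) -> float:
    """Compute O_i = f(W_ij * I_j + W_ik * I_k + W_il * I_l) for one neuron."""
    weighted_sum = sum(w * i for w, i in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical weights and inputs for a neuron with three incoming connections.
weights = [0.5, -0.5, 1.0]
inputs = [1.0, 1.0, 0.0]
print(neuron_output(weights, inputs))  # weighted sum is 0.0, so the sigmoid gives 0.5
```

A full feedforward layer applies the same computation once per neuron, each with its own row of weights.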

3.3.3 Random Forest

Random forest, or random decision forest, is an algorithm that integrates several machine learning models into a single predictive model. It uses the ensemble learning technique and is based on the bagging algorithm: it combines the outputs of all the trees, each of which is built on a subset of the data. This reduces the overfitting of individual decision trees, decreases the variance and therefore increases the accuracy. A multitude of decision trees is built at training time, and the outputs of the individual trees are combined for classification or regression. It is a powerful algorithm that integrates the outcomes of various learners, the trees of the forest.

Random forest builds a number of decision trees, and the final output is based on majority voting, as shown in Fig. 7. The regression predictor with R trees is shown in Eq. 3.

$$ f(y)=\frac{1}{R}\sum \limits_{n=1}^R{T}_{dt}(y) $$
(3)
$$ Y=\left\{{y}_1,{y}_2,{y}_3,\dots, {y}_n\right\} $$

where y is the n-dimensional vector of inputs and Tdt(y) refers to the decision trees.

Fig. 7 Random forest

The basic steps involved in executing the random forest algorithm are:

  1. Step 1:

    N records are chosen at random from the electricity consumption data.

  2. Step 2:

    A decision tree is built based on these records.

  3. Step 3:

    The number of trees required by the algorithm is chosen, and steps 1 and 2 are repeated.

  4. Step 4:

    For a regression problem, the consumption is predicted by each tree in the forest.

  5. Step 5:

    The average of all the values produced by the decision trees is taken as the final predicted value.

  6. Step 6:

    For a classification problem, every tree in the forest predicts the class to which a new record belongs, and the record is assigned to the class that wins the majority vote.
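The steps above can be sketched in plain Python. To keep the example short, each "tree" is a one-split regression stump rather than a full decision tree, which is a deliberate simplification; the function names, the seed and the sample records are hypothetical. In practice a library implementation such as scikit-learn's RandomForestRegressor would be used.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap(records, rng):
    """Step 1: draw N records at random, with replacement."""
    return [rng.choice(records) for _ in records]


def fit_stump(sample):
    """Step 2: build a one-split regression tree on (feature, target) pairs.

    A single split at the mean feature value stands in for a full decision tree.
    """
    threshold = mean(x for x, _ in sample)
    overall = mean(t for _, t in sample)
    left = [t for x, t in sample if x <= threshold]
    right = [t for x, t in sample if x > threshold]
    left_value = mean(left) if left else overall
    right_value = mean(right) if right else overall
    return lambda x: left_value if x <= threshold else right_value


def fit_forest(records, n_trees, seed=0):
    """Step 3: repeat steps 1 and 2 for the chosen number of trees."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(records, rng)) for _ in range(n_trees)]


def predict_regression(forest, x):
    """Steps 4 and 5: every tree predicts, and the predictions are averaged."""
    return mean(tree(x) for tree in forest)


def predict_class(votes):
    """Step 6: return the class that receives the majority of the tree votes."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical records: (hours an appliance runs, units consumed).
records = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (8.0, 80.0), (9.0, 90.0), (10.0, 100.0)]
forest = fit_forest(records, n_trees=25)
print(predict_regression(forest, 2.0), predict_regression(forest, 9.0))
print(predict_class(["high", "low", "high"]))
```

Because every tree sees a different bootstrap sample, the individual predictions differ, and averaging them is what reduces the variance of the final forecast.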

4 Experiment and result

4.1 Experimental setup

The sample dataset consists of the meter id and the consumption of different appliances, namely AC, light, cellar, washing machine, oven, laptops and fridge, which customers use in day-to-day activities, as shown in Table 2.

Table 2 Power consumption data

Figure 8 shows the mean power consumption. In the calculation that follows, an electrical appliance (such as an AC) that runs continuously is taken to consume 1 unit per minute.

Fig. 8 Mean consumption

$$ 1\ast 60\ \min =60\ \mathrm{unit}\ \mathrm{per}\ \mathrm{hour} $$

Its daily consumption is then 60 * 24 h = 1440 units.

A week contains 7 days, so the weekly consumption is 1440 * 7 = 10,080 units.

Taking a month as 4 weeks, the monthly consumption is 10,080 * 4 = 40,320 units, as shown in Fig. 9.

Fig. 9 Monthly consumption
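The consumption arithmetic above can be reproduced with a short sketch. It assumes, as the calculation does, a rate of 1 unit per minute of continuous running and a month of 4 weeks; the function name consumption_totals is hypothetical.

```python
def consumption_totals(units_per_minute: int = 1) -> dict:
    """Scale a per-minute consumption rate up to hourly, daily, weekly and monthly totals."""
    hourly = units_per_minute * 60  # 60 minutes per hour
    daily = hourly * 24             # 24 hours per day
    weekly = daily * 7              # 7 days per week
    monthly = weekly * 4            # a month is taken as 4 weeks
    return {"hourly": hourly, "daily": daily, "weekly": weekly, "monthly": monthly}


print(consumption_totals())
```

With the default rate this gives 60, 1440, 10,080 and 40,320 units, matching the figures in the text.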

Accuracy is the fraction of all predictions that are correct, i.e., how often the model correctly forecasts payers and non-payers overall.

Accuracy calculation

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives).

False Positives (FP): cases wrongly forecast as the positive class.

False Negatives (FN): cases wrongly forecast as the negative class.

True Positives (TP): cases correctly forecast as the positive class.

True Negatives (TN): cases correctly forecast as the negative class.

Accuracy is the ratio of correctly forecasted observations to the total observations. A model with high accuracy appears to be the best, but accuracy is a reliable measure only for symmetric datasets, in which the numbers of false positives and false negatives are almost equal, as is the case for our data.

Precision

It is the ratio of rightly forecasted positive scrutiny to the complete forecasted positive observations.

Precision = True Positives / (True Positives + False Positives).

Higher precision corresponds to a lower false positive rate. A precision of 0.788 is obtained, which is good.

Recall

Recall is the ratio of correctly forecasted positive observations to all observations in the actual positive class.

Recall = True Positives / (True Positives + False Negatives).

The harmonic mean of recall and precision is termed the F1 score. Accordingly, the F1 score takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is often more useful than accuracy when the class distribution is uneven. If false negatives and false positives have similar costs, accuracy works best. If their costs differ, both recall and precision have to be considered.

General formula

F-Measure = 2 × True Positives / (2 × True Positives + False Positives + False Negatives).

F1 Score Formula:

$$ \mathrm{F}1\ \mathrm{Score}=2\ast \left(\mathrm{Precision}\ast \mathrm{Recall}\right)/\left(\mathrm{Precision}+\mathrm{Recall}\right). $$
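The relationship between the two F1 expressions can be checked numerically. The sketch below uses hypothetical counts (not results from this work) and a hypothetical helper `precision_recall_f1`; it shows that the count-based general formula and the precision/recall form give the same value.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not results from the study
tp, fp, fn = 40, 10, 20
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Count-based general formula for the F-measure
f_measure = 2 * tp / (2 * tp + fp + fn)

print(round(precision, 3), round(recall, 3), round(f1, 3))
print(math.isclose(f1, f_measure))
```

The sketch does not guard against zero denominators, so it assumes at least one true positive.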

4.2 Comparative analysis

Random Forest is an ensemble learning technique based on the bagging algorithm. It combines the outputs of all the trees, each of which is built on a subset of the data, and thereby reduces the overfitting problem of individual decision trees. This in turn decreases the variance and increases the accuracy.
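The data-subsetting part of bagging can be sketched as follows: each tree receives a bootstrap sample, drawn with replacement from the training records. The function name, the seed, and the toy records are assumptions for illustration. Tree construction and the random feature selection that Random Forest adds on top of bagging are not shown.

```python
import random


def bootstrap_samples(records, n_trees, seed=0):
    """Draw one bootstrap sample (with replacement) per tree."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(records)
    return [[rng.choice(records) for _ in range(n)] for _ in range(n_trees)]


# Hypothetical training records, identified here by index only
records = list(range(10))
samples = bootstrap_samples(records, n_trees=3)

for i, sample in enumerate(samples):
    # Each sample has the original size but usually repeats some records
    print(i, sorted(sample))
```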

Support vector machines are primarily used for classification and natively handle only two classes. Random Forest, in contrast, handles both classification and regression problems and is intrinsically suited to multiclass (~10 classes) problems. A support vector machine also uses more memory and has a high computational cost. Unlike SVM, Random Forest balances the error for unbalanced datasets. It uses a rule-based approach instead of distance calculations, so feature scaling is not required. An artificial neural network has a higher computational cost and works better for large volumes of data.

Unlike curve-based algorithms, the performance of Random Forest is not affected by nonlinear parameters. Random Forest may therefore perform better than curve-based algorithms, as it does not require feature scaling and is not affected by non-linearity.

Random Forest is a classical machine learning technique, while neural networks belong to deep learning. Random Forest is easy to parallelize, and its training speed is faster. A recurrent neural network, on the other hand, demands much more data than a single person's data to become accurate. Employing neural networks is also more tedious, since the number of layers, the number of neurons in each layer, the activation functions, and the initialization must all be chosen, whereas Random Forest requires less pre-processing and its training process is easier. Random Forest is robust to outliers, and it can automatically assess the importance of each feature and the interactions between different features. The Random Forest algorithm is also very stable: introducing a new data point into the dataset does not affect the overall algorithm much, since the new data may influence one tree but is very unlikely to influence all the trees.

The study [18] is concerned with the comparison of neural networks and Random Forest for predicting building energy consumption, which is a numerical forecasting problem rather than a classification problem. According to the study, Random Forest performed slightly more efficiently than the neural networks, as it handles missing values effectively and can forecast accurately even when some of the input values are missing. It is less affected by noise and is clearly the best classifier, as it achieves the best classification results. The results, in which the neural network performs worse on average, are shown in Fig. 10 [8].

Fig. 10 Accuracy of random forest against SVM and ANN

The goal of the SVM algorithm is to draw a decision boundary that separates the n-dimensional space into classes so that new data points can be correctly categorized; this decision boundary is known as a hyperplane. The prediction accuracy of SVM is 78.54%. Since Random Forest combines many decision trees to make its prediction, the accuracy of this algorithm is comparatively high.
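To make the idea of a separating hyperplane concrete, the following sketch classifies points by the side of a linear boundary w·x + b = 0 on which they fall. The weights and bias are hypothetical values chosen by hand. In an actual SVM they would be learned from the training data, which is not shown here.

```python
def classify(point, weights, bias):
    """Assign +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in a two-dimensional feature space
weights = [1.0, -1.0]
bias = 0.0

label_a = classify([3.0, 1.0], weights, bias)  # score =  2.0
label_b = classify([1.0, 3.0], weights, bias)  # score = -2.0
print(label_a, label_b)
```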

The comparative results of the three algorithms with respect to precision and recall are illustrated in Fig. 11. Precision is the ratio of correctly forecasted positive observations to the total forecasted positive observations, and Random Forest shows 0.2% to 0.3% higher precision than SVM and ANN. Similarly, it shows 0.1% higher recall than the other two, where recall is the ratio of correctly forecasted positive observations to all observations in the actual class.

Fig. 11 Precision and recall of random forest against SVM and ANN

5 Conclusion

The spread of smart meters has made information available about how consumers use electrical energy across the country during different seasons. This work forecasts the consumption of electricity units and analyses the peak demand using an efficient machine learning algorithm on a smart meter dataset. The proposed method uses the Random Forest classification technique to forecast the units and price for different time intervals for various home appliances. Among the classification models considered, our approach outperforms the other algorithms on a large smart meter dataset, with an accuracy of 95.67% and improved precision and recall. Since the number of electric vehicles is going to increase in the near future, their electricity consumption can be included as an additional parameter in electric load forecasting; furthermore, evolutionary preprocessing tools and algorithms can be employed to improve the prediction accuracy.