1 Introduction

In a residential energy management (REM) system, smart grid productivity increases through the interaction of multiple energy carriers, which improves reliability and reduces planning and operation costs [1]. Energy management in a multicarrier (i.e., electricity and gas) multimedia system requires interaction among several resources and consumer demands to control the energy consumption level without compromising consumer comfort [29]. Because electricity prices are affected by natural gas prices, the demand response program (DRP) requires energy reduction for both gas and electricity in a multicarrier REM system [22]. Residential consumers are a major contributor to energy demand and therefore need an effective multicarrier energy system. Several past research works have addressed the energy management issue of REM and presented a range of solutions; however, most of them have focused on reducing energy costs [6], decreasing greenhouse gas emissions, reducing energy load subject to consumer preferences, and smoothing the load profile in single-carrier energy systems [7, 19, 27, 28].

Most of the studies use mathematical methods, specifically linear/non-linear mixed-integer programming (MIP) [1]. Due to data dimensionality and computational issues, heuristic approaches based on machine learning models have become popular in recent years. For instance, Pezeshki et al. [21] compared an Artificial Neural Network (ANN) with a combined fuzzy and neural network (NN) approach for REM, and Reynolds et al. [26] combined an ANN with a Genetic Algorithm (GA) for energy management of residential houses. Wu et al. [14] developed a fuzzy-logic approach, and Gutierrez-Martinez et al. [7] applied a GA to energy management issues. Sharifi et al. [28] presented an integrated REM approach that combines a GA with an MIP method.

Most of the existing approaches use deterministic rules or abstract methods, which have certain disadvantages: they fail to guarantee optimality and can cause financial losses; abstract models can be unrealistic in an actual multimedia system and rely heavily on the skill of the smart grid administrator; and MIP/game-theoretic optimization approaches suffer from scalability issues. Motivated by these challenges, Reinforcement Learning (RL) is a prominent solution that plays a crucial role in optimal decision-making for realistic problems. It is a subset of Artificial Intelligence (AI) whose effectiveness has been demonstrated by the AlphaGo and AlphaGo Zero breakthroughs [15].

For REM, several RL-based approaches such as Deep Q-Network (DQN) and Q-learning have been adopted by researchers worldwide to handle energy management for electric vehicles (EVs), electric home appliances, etc. [8, 14]. Day-ahead pricing mechanisms have also been used with RL-based approaches to manage energy load in REM systems [10, 18, 22, 25]. Nevertheless, most existing approaches use Q-learning and model the problem as a Markov Decision Process (MDP). Several limitations remain: (i) most studies focus only on electricity and do not consider gas consumption, even though RL algorithms behave differently when working with different energy carriers, and (ii) REM has been studied on a daily basis, although hourly DRP shows better potential to balance REM under dynamic constraints.

Moreover, the results produced by the REM agent must be securely accessible to all stakeholders, such as smart grid administrators, utility suppliers, and consumers, for real-time decision-making. Several RL-based methods exist for REM, yet they have not been exploited to their full potential with data security during multimedia communication. Numerous other challenges such as consumer trust, single-point failure, security, and privacy also exist. To cope with these challenges, blockchain is a viable solution that offers security, transparency, and trust to the stakeholders [10].

Blockchain is a decentralized database and Distributed Ledger Technology (DLT) that stores data in chained blocks, which protects against single-point failure and improves trust, privacy, and security [11]. It has been embraced in smart grid systems for efficient and secure energy management [12]; for example, Blom et al. [5] describe the potential of blockchain in a smart grid system, and Li et al. [13] proposed an optimal energy pricing scheme using a Stackelberg game to foster efficient energy trading. Blockchain has also been adopted in REM [2, 9, 12]. However, the prevailing blockchain-based REM approaches have several limitations, such as limited energy data storage capacity, high energy data storage cost on Ethereum, and high network bandwidth due to multimedia data redundancy.

1.1 Motivation and research contributions

Motivated by the aforementioned issues and by the fact that electricity prices are affected by natural gas prices, the DRP requires energy consumption reduction for both gas and electricity in a multicarrier REM system in order to improve smart grid reliability and Quality of Service (QoS). In this paper, we propose Q-MSEM, a Q-learning-based multiagent and secure energy management scheme for multimedia communication in a smart grid system. The traditional blockchain-based DRP system stores energy transactions on a blockchain, which is costly and inefficient in a real-time scenario. In contrast, the proposed Q-MSEM scheme stores energy data on an off-chain storage system, i.e., the InterPlanetary File System (IPFS), which provides low data storage cost, system scalability, and high throughput during access to multimedia energy data [12, 23].

The following are major research contributions of this paper.

  • A REM scheme, i.e., Q-MSEM, is proposed using an RL methodology that considers electricity and gas consumption together with the consumer's dissatisfaction cost. Energy load reduction is addressed through this dissatisfaction cost.

  • An optimal energy load curtailment strategy is formulated as a finite discrete MDP, whose tractability is achieved by means of the Q-learning process.

  • A unique IPFS-based energy smart contract (ESC) is designed to access electricity and gas consumption multimedia data securely in real time, attaining high system throughput through scalability and low storage cost during access.

  • The performance of the proposed Q-MSEM scheme is evaluated against existing approaches to show its effectiveness in terms of different evaluation parameters such as load reduction, energy cost, and data storage cost.

1.2 Organizations

The rest of the paper is organized as follows. Section 2 highlights the related work and presents a comparative analysis of the proposed Q-MSEM scheme with existing approaches. Section 3 describes the system model of the proposed Q-MSEM scheme, and Section 4 discusses its problem formulation. Section 5 presents the complete workflow of the Q-MSEM scheme, and experimental results and performance evaluation are highlighted in Section 6. Finally, the paper is concluded in Section 7 with future work. Nomenclature and abbreviations are listed in Table 1.

Table 1 Nomenclatures and abbreviations

2 Related work

Several RL-based approaches have been adopted by researchers worldwide to handle energy management issues for electric home appliances, EVs, and others. Most of the existing approaches use DQN, Q-learning, etc., to handle energy management in REM systems [8, 14]. In [18, 25], a day-ahead pricing mechanism is used with an RL-based approach to manage energy load in the REM system. Moreover, although the existing approaches use Q-learning with an MDP, several limitations remain; for instance, most studies focus only on electricity and do not consider gas consumption reduction. Furthermore, RL algorithms perform differently when working with different energy carriers in REM systems. Energy consumption reduction has mostly been studied on a daily basis, although hourly consumption reduction shows better potential to balance the DRP under dynamic constraints in REM systems.

Lu et al. [15] presented a REM system that balances the demand response gap using RL and an ANN, accelerating stability and efficiency in the smart grid; an energy cost reduction of 7.3% was achieved. Xu et al. [31] proposed a data-driven mechanism for the REM system using multiagent RL. Ahrarinouri et al. [1] presented a multiagent RL approach for energy management in REM systems and reduced the energy cost by 12%. Rastegar et al. [24] proposed a mechanism to reduce energy consumption in the REM system.

The results produced by the REM agent must be securely accessible to all stakeholders for real-time decision-making. Several RL-based approaches exist for REM, yet they have not been exploited to their full potential for a multicarrier system with data security. Other challenges such as end-consumer trust, single-point failure, security, and privacy also exist. To handle these challenges, blockchain is a prominent solution [10]. It has been embraced in smart grid systems for efficient and secure energy management [12]. Blom et al. [5] describe the potential of blockchain in a smart grid system, and Li et al. [13] proposed an optimal energy pricing scheme using the Stackelberg game for efficient energy trading. Blockchain has also been adopted in REM [2, 9, 12], but blockchain-based REM approaches have several limitations, such as limited energy data storage capacity, high energy data storage cost on Ethereum, and high network bandwidth. Hence, this paper proposes the Q-MSEM scheme for multimedia communication in a smart grid system using Q-learning and the Ethereum blockchain (EB). The proposed Q-MSEM scheme stores energy data on IPFS (an off-chain storage system), which provides low data storage cost, system scalability, and high throughput during access to multimedia energy data [12]. A comparative analysis of the proposed Q-MSEM scheme with other existing approaches is presented in Table 2.

Table 2 A comparative analysis of the proposed Q-MSEM scheme with existing approaches

3 System model

Figure 1 represents the proposed Q-MSEM scheme, i.e., multiagent-based energy load reduction using a Q-learning mechanism. Here, the consumption of multiple energy carriers, i.e., electricity (ECξ) and gas (ECg), is reduced during peak hours and increased during non-peak hours, which reduces the load burden on the smart grid SG and improves the QoS and reliability of the grid and the utility suppliers US, ∀US ∈ {US1,US2,US3,...,USm}, each of which must be registered with SG. We construct the dynamic energy load based on an MDP for a particular consumer C, where C ∈ {C1,C2,C3,⋯ ,Cn}, in a stochastic energy consumption environment with a maximum of n consumers.

Fig. 1

System model of the proposed Q-MSEM scheme

In the multicarrier REM system, electricity and natural gas consumption by home appliances (e.g., fridge, water heater, lights, and others) represents the environment, which is dynamic in nature, and a specific REM agent is associated with it. Initially, we formulate the optimal energy consumption as a finite MDP for a particular consumer C in a stochastic environment. Energy consumption is categorized into two major groups, i.e., gas consumption and electricity consumption. Electricity demand is either regulated through control signals or consumed throughout the day without regulation. The first category is controllable consumption, which can be curtailed based on the consumer dissatisfaction cost. The second is unchangeable and identified as critical load that requires electricity 24 hours a day, for instance, an alarm system or a fridge. We then develop an efficient Q-learning-based, model-free consumption reduction approach that does not require knowledge of the system uncertainties. Consumers C participate in the proposed Q-MSEM scheme to reduce their energy cost and balance energy supply and demand. The US decides hourly energy prices, which are administered and monitored by SG. The REM agents for electricity and gas maximize their profit by taking actions and collecting rewards.

The optimal energy consumption for consumer C at each timeslot t is published on the EB along with the hourly energy cost based on prices. The EB is incorporated with IPFS, a distributed, off-chain storage system, for secure and efficient data storage at low cost with high scalability. The entire energy dataset is securely accessible by all stakeholders, i.e., {SG, US, C}, as required. Optimal energy consumption enables load reduction and efficient energy management at the consumer end, improves QoS, and maximizes the SG's profit.

4 Problem formulation

We consider a REM system in which houses are equipped with two energy carriers, i.e., electricity and gas, and consume energy through various kinds of home appliances (HA), with the need to optimize energy consumption as shown in Fig. 1. The REM is connected to the US and SG through a bi-directional communication network, which enables the exchange of multimedia energy consumption information and energy prices. The REM agents then manage electricity and gas consumption in response to energy prices. In the REM system, HAs are categorized as critical appliances and controllable/non-critical appliances based on their priorities and characteristics [15, 17]. The mathematical formulation of critical appliances is described as follows.

$$ \begin{array}{@{}rcl@{}} \xi_{{C},\omega,t}^{crit} = \mathbb{E}_{\omega,t}^{crit},\end{array} $$
(1)

where ω ∈{1,2,3,⋯ ,α} indexes the HAs, α is the maximum number of critical appliances, and the timeslot of a day is represented as t ∈{1,2,3,⋯ ,24}. Hence, the critical load for a specific consumer C at timeslot t can be represented as

$$ \begin{array}{@{}rcl@{}} \xi_{{C},t}^{crit} = \sum\limits_{\omega=1}^{\alpha} \xi_{{C},\omega,t}^{crit}. \end{array} $$
(2)

Moreover, the other type of appliance has a flexible energy load between a minimum and a maximum, for example, washing machine, lights, Air Conditioner (AC), etc., which is calculated as follows.

$$ \begin{array}{@{}rcl@{}} \xi_{{C},\nu,t}^{Ncrit} = \mathbb{E}_{\nu,t}^{Ncrit}, \end{array} $$
(3)

where ν ∈{1,2,3,⋯ ,β} and β is the maximum number of non-critical appliances. Hence, the non-critical load for a specific consumer C at timeslot t can be represented as

$$ \begin{array}{@{}rcl@{}} \xi_{{C},t}^{Ncrit} = \sum\limits_{\nu=1}^{\beta} \xi_{{C},\nu,t}^{Ncrit}, \end{array} $$
(4)
$$ \begin{array}{@{}rcl@{}} min(\xi_{\beta,t}) \le \mathbb{E}_{{C},t}^{Ncrit} \le max(\xi_{\beta,t}) \end{array} $$
(5)

Thus, the total electricity consumption is calculated using (2) and (4) as follows.

$$ \begin{array}{@{}rcl@{}} \xi_{{C},t} = \xi_{{C},t}^{Ncrit} + \xi_{{C},t}^{crit} \end{array} $$
(6)
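As a small illustration of (2), (4), and (6), the per-appliance loads can be aggregated per timeslot. The appliance names and kWh values below are hypothetical, chosen only to show the bookkeeping:

```python
# Hypothetical per-appliance electricity loads (kWh) in one timeslot t
crit = {"fridge": 0.15, "alarm_system": 0.01}               # critical HAs, eq. (1)
ncrit = {"ac": 1.2, "lights": 0.3, "washing_machine": 0.5}  # non-critical HAs, eq. (3)

xi_crit = sum(crit.values())    # eq. (2): critical load of consumer C at t
xi_ncrit = sum(ncrit.values())  # eq. (4): non-critical load of consumer C at t
xi_total = xi_crit + xi_ncrit   # eq. (6): total electricity consumption
```

Only `xi_ncrit` is available for curtailment in the DRP; `xi_crit` must be served in every timeslot.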

Here, only non-critical consumption can be reduced, since critical appliances consume electricity 24 hours a day. However, a reduction in electricity consumption causes dissatisfaction to the consumer. The dissatisfaction cost function for consumer C in terms of electricity consumption reduction is therefore defined as follows [32].

$$ \begin{array}{@{}rcl@{}} \phi_{{C},t,\xi} = \frac{\gamma_{{C}}}{2} \Big \{\xi_{{C},t}^{Ncrit} - \xi_{{C},t}^{{\varOmega}_{Ncrit}} \Big\}^{2} + \delta \cdot \Big \{\xi_{{C},t}^{Ncrit} - \xi_{{C},t}^{{\varOmega}_{Ncrit}} \Big\}, \quad \gamma_{C}, \delta > 0, \end{array} $$
(7)

where γC is the consumer's preference parameter for reducing energy load, which is of greater importance in dynamic load reduction. For a high value of γC, the dissatisfaction cost to the consumer from load reduction will be high [32]. Here, \(\xi _{{C},t}^{{\varOmega }_{Ncrit}}\) represents the dynamic energy load, and the range of load reduction is bounded by \(min_{coef}\cdot \xi _{{C},t}^{Ncrit}\) and \(max_{coef}\cdot \xi _{{C},t}^{Ncrit}\) when energy pricing is in effect, where mincoef, maxcoef, and δ are system-dependent parameters.

Furthermore, since the main aim is to reduce energy consumption for both energy carriers ECξ and ECg, the dissatisfaction cost for gas consumption reduction is calculated as follows (considering gas consumption for room heating as critical consumption).

$$ \begin{array}{@{}rcl@{}} \phi_{{C},t,g} = \frac{\gamma_{{C}}}{2} \Big \{g_{{C},t}^{Ncrit} - g_{{C},t}^{{\varOmega}_{Ncrit}} \Big\}^{2} + \delta \cdot \Big \{g_{{C},t}^{Ncrit} - g_{{C},t}^{{\varOmega}_{Ncrit}} \Big\}, \quad \gamma_{C}, \delta > 0, \end{array} $$
(8)
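The penalty of (7) and (8) has the same quadratic-plus-linear form for both carriers, so a single helper suffices. The function name and the example parameters are ours, not from the paper:

```python
def dissatisfaction_cost(load_ncrit, load_dynamic, gamma_c, delta):
    """Quadratic-plus-linear dissatisfaction penalty of eqs. (7)/(8).

    load_ncrit   -- non-critical demand before curtailment
    load_dynamic -- demand after the dynamic-load action
    gamma_c, delta -- positive preference / system parameters
    """
    reduction = load_ncrit - load_dynamic
    return (gamma_c / 2.0) * reduction ** 2 + delta * reduction
```

For example, curtailing the load from 5.0 to 4.0 kWh with γC = 0.5 and δ = 0.1 yields a cost of 0.25 + 0.1 = 0.35; doubling the curtailment more than doubles the penalty, which is what discourages aggressive reductions for comfort-sensitive consumers (high γC).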

4.1 Objective function

The proposed Q-MSEM scheme focuses on reducing energy consumption and the energy cost CostC,t to consumer C by employing a dynamic energy consumption policy for efficient energy management. The electricity cost can therefore be mathematically formulated using (6) as follows.

$$ \begin{array}{@{}rcl@{}} Cost_{{C},t,\xi} = {\xi_{Price^{t}}} * \xi_{{C},t}, \quad \gamma = \alpha + \beta, \end{array} $$
(9)

where \({\xi _{Price^{t}}}\) represents the electricity price at time t and γ denotes the total number of HAs. Similarly, the gas consumption cost can be calculated as follows.

$$ \begin{array}{@{}rcl@{}} Cost_{{C},t,g} = {g_{Price^{t}}} * g_{{C},t}, \quad \gamma = \alpha + \beta, \end{array} $$
(10)

where \({g_{Price^{t}}}\) represents the gas price at time t and γ denotes the total number of HAs consuming gas at time t. Both electricity and gas prices are set hourly by the US, vary with time, and are monitored by the SG to handle any discrepancies. Their contributions to the consumer objective are stated in the following relations using (7), (8), (9), and (10):

$$ \begin{array}{@{}rcl@{}} \theta_{\xi} = {\xi_{Price^{t}}} * \xi_{{C},t}^{{\varOmega}_{Ncrit}} + \phi_{{C},t,\xi} \end{array} $$
(11)
$$ \begin{array}{@{}rcl@{}} \theta_{g} = {g_{Price^{t}}} * g_{{C},t}^{{\varOmega}_{Ncrit}} + \phi_{{C},t,g} \end{array} $$
(12)

Hence, the objective function is defined using (11) and (12) as follows.

$$ \begin{array}{@{}rcl@{}} \theta = \theta_{\xi} + \theta_{g} \end{array} $$
(13)
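Equations (11)–(13) combine the hourly bill under the curtailed (dynamic) load with the dissatisfaction penalty for each carrier. A minimal sketch, with hypothetical prices, loads, and penalties plugged in:

```python
def carrier_objective(price, dynamic_load, dissatisfaction):
    # eqs. (11)/(12): hourly energy bill under the curtailed load
    # plus the dissatisfaction penalty for that curtailment
    return price * dynamic_load + dissatisfaction

# eq. (13): total objective over both carriers (all values hypothetical)
theta_xi = carrier_objective(0.15, 4.0, 0.35)   # electricity: price, load (kWh), phi
theta_g = carrier_objective(0.03, 50.0, 0.20)   # gas: price, load (kWh), phi
theta = theta_xi + theta_g
```

The Q-learning agent's reward in Section 5 is derived from this θ, so actions that cut the bill but inflate dissatisfaction (or vice versa) are traded off automatically.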

Furthermore, the entire energy consumption multimedia data, such as the energy cost to the consumer, electricity consumption (in kWh), electricity price, gas consumption (in kWh), and gas price for a particular consumer C at timeslot t, must be accessible to all stakeholders ∈{SG,UP,C}. The entire energy data (ED) is stored on IPFS, and a hash-key (hkey) is generated and published on the EB at timeslot t for real-time access; the data EDsecure is securely accessible by {SG,UP,C} using their hash-keys (SGipfs−>hkey, UPipfs−>hkey, Cipfs−>hkey) generated at the EB. Since the storage cost of EDsecure is high on the EB, a unique access mechanism, i.e., IPFS, is used to achieve a low data storage cost \({ED}_{secure}^{\mathbb {S}\mathbb {C}}\) and to make the proposed scheme scalable \({ED}_{secure}^{\mathbb {S}\mathbb {C}\mathbb {A}\mathbb {L}}\).

$$ \begin{array}{@{}rcl@{}} {ED}_{secure} \leftarrow \{\mathfrak{C}, \xi_{{C},t}, Cost_{{C},t,\xi},g_{{C},t}, Cost_{{C},t,g}\}. \end{array} $$
(14)

Therefore, the objective of the proposed Q-MSEM scheme is defined using (13) as follows.

$$ \begin{array}{@{}rcl@{}} \mathbb{O} &=& min\{\theta, {ED}_{secure}^{\mathbb{S}\mathbb{C}}\} + max\{{ED}_{secure}, {ED}_{secure}^{\mathbb{S}\mathbb{C}\mathbb{A}\mathbb{L}}\} \\ &=& min\{{\xi_{Price^{t}}} * \xi_{{C},t}^{{\varOmega}_{Ncrit}} + \phi_{{C},t,\xi} + {g_{Price^{t}}} * g_{{C},t}^{{\varOmega}_{Ncrit}} + \phi_{{C},t,g}, {ED}_{secure}^{\mathbb{S}\mathbb{C}}\} \\ && +~ max\{{ED}_{secure}, {ED}_{secure}^{\mathbb{S}\mathbb{C}\mathbb{A}\mathbb{L}}\}, \\ && Subject \ to \ the \ following \ constraints, \\ &C1&: \xi_{{C},t}, g_{{C},t} \ne 0, \forall {C} \\ &C2&: t \ne 0 \\ &C3&: {ED}_{secure} \Leftrightarrow EB_{IPFS} \\ &C4&: \{ipfs->hkey\} = valid_{hkey}, \forall \{SG,UP,C\} \end{array} $$
(15)

where the first term represents the minimization of energy consumption and of the energy multimedia data storage cost on the EB. The second term represents the maximization of the security of the ED, i.e., EDsecure, and of the REM system's scalability in the proposed scheme. The real-time availability of EDsecure over the EB makes the approach efficient and mitigates the single-point failure issue. Constraint C1 states that the energy consumptions ξC,t and gC,t must be non-zero for all consumers C, and constraint C2 states that timeslots start from 1. Constraint C3 states that \({ED}_{secure}^{\mathbb {S}\mathbb {C}}\) and \({ED}_{secure}^{\mathbb {S}\mathbb {C}\mathbb {A}\mathbb {L}}\) are calculated when EDsecure is stored on the EB with the IPFS mechanism, and constraint C4 requires a valid {ipfs−>hkey} for all stakeholders.

5 Q-MSEM: The proposed scheme

This section presents the workflow of the proposed Q-MSEM scheme using the RL methodology. Figure 2 illustrates the workflow of Q-MSEM, where the US comprises two REM agents (one for gas and one for electricity) for energy consumption reduction and DRP decision-making. Energy consumption in residential houses creates a stochastic environment, which is dynamic in nature. Q-MSEM starts its execution at the beginning of a day and initializes various parameters, such as the Q-values to 0, the timeslot t to 1, and the episode counter i to 1. The other simulation parameters are also set at the beginning.

Algorithm 1
Fig. 2

Workflow of the proposed Q-MSEM scheme

Initially, \(\xi _{{C},t}^{crit}\), \(\xi _{{C},t}^{Ncrit}\), \(g_{{C},t}^{crit}\), and \(g_{{C},t}^{Ncrit}\) are calculated based on the HAs used in a residential building. The dynamic energy load is then articulated as a discrete, finite-step MDP in a stochastic environment. Here, the energy demand and reward depend on the dynamic loads \(\xi _{{C},t}^{{\varOmega }_{Ncrit}}\) and \(g_{{C},t}^{{\varOmega }_{Ncrit}}\) at the specific timeslot t. The MDP consists of various elements: actions \(\mathbb {A}\), discrete timeslots t, states \(\mathbb {S}(\xi_{{C},t}^{Ncrit})\) for electricity and \(\mathbb {S}(g_{{C},t}^{Ncrit})\) for gas, and rewards \(\mathbb {R}\). The dynamic loads \(\{\xi _{{C},t}^{{\varOmega }_{Ncrit}}, g_{{C},t}^{{\varOmega }_{Ncrit}}\}\) are executed as actions \(\mathbb {A}\). The energy load demand before applying the dynamic load is \(\xi _{{C},t}^{{Ncrit}}\) and after is \(\xi _{{C},t}^{{\varOmega }_{Ncrit}}\); the reward \(\mathbb {R}\) at timeslot t is obtained from the objective function using (13).

To adopt optimal action decisions, the proposed scheme considers a discrete finite-horizon MDP that satisfies the Markov property: state transitions depend solely on the current state and action, independently of all previous states and actions taken by the agent. The policy ρ mapping states to actions is defined as follows.

$$ \begin{array}{@{}rcl@{}} {\rho} : \mathbb{A}_{k,t} = \rho(\mathbb{S}_{k,t}) \end{array} $$
(16)

Further, the main aim is to discover an optimal policy ρ for each state \(\mathbb {S}_{k,t}\), so that the selected action \(\mathbb {A}_{k,t}\) maximizes the reward \(\mathbb {R}(\mathbb {S}_{k,t}, \mathbb {A}_{k,t})\) [15]. The reward maximization under the optimal policy ρ can be expressed using the Bellman equation as follows:

$$ \begin{array}{@{}rcl@{}} \mathbb{Q}^{*}_{\rho}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) = \mathbb{R}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) + \zeta \cdot max \mathbb{Q}(\mathbb{S}_{k,t+1},\mathbb{A}_{k,t+1}) \end{array} $$
(17)

where ζ is the convergence parameter. Algorithm 1 is used for Q-value convergence: after executing a number of episodes, the Q-values converge to their maximum. A trial-and-error mechanism is used by the REM agents to store and update Q-values. The REM agent performs an action every hour, and the Q-value in the Q-table is updated using (17) as follows.

$$ \begin{array}{@{}rcl@{}} \mathbb{Q}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) \leftarrow \mathbb{Q}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) + \chi \cdot \Big[\mathbb{R}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) \\ + \zeta \cdot max \mathbb{Q}(\mathbb{S}_{k,t+1},\mathbb{A}_{k,t+1}) - \mathbb{Q}(\mathbb{S}_{k,t},\mathbb{A}_{k,t}) \Big] \end{array} $$
(18)

where χ ∈ [0,1] is the learning rate and ζ is the future reward discount factor, which is set to 0.95 to accumulate high rewards. The Q-value converges to the maximum Q-value under the optimal policy, formulated as follows:

$$ \begin{array}{@{}rcl@{}} \rho = argmax(\mathbb{Q}(\mathbb{S}_{k,t},\mathbb{A}_{k,t})). \end{array} $$
(19)

Furthermore, an 𝜖-greedy approach is applied within the dynamic energy consumption boundaries (maximum and minimum consumption), so the agent occasionally selects actions at random. The reward is then calculated, and the process is repeated until the end of the day. The REM agent converges to the maximum Q-value using the convergence parameter ζ with the terminating condition |Q(i+1) − Q(i)| ≤ ς, where ς is a system-dependent parameter.
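The 𝜖-greedy selection and the tabular update of (18) can be sketched as follows. This is a generic Q-learning skeleton under the paper's notation, not the authors' implementation; the function names are ours, and states/actions are left abstract (here, discretized load levels would play both roles):

```python
import random

def choose_action(Q, state, actions, eps):
    """Epsilon-greedy selection within the dynamic-load action set."""
    if random.random() < eps:
        return random.choice(actions)                          # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit

def q_update(Q, s, a, r, s_next, actions, chi=0.1, zeta=0.95):
    """Temporal-difference update of eq. (18).

    chi  -- learning rate in [0, 1]
    zeta -- discount factor (0.95 in the paper)
    """
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + chi * (r + zeta * best_next - old)
```

An hourly episode loop would call `choose_action`, observe the reward from (13), call `q_update`, and stop once the change in Q-values falls below ς.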

Once the Q-values converge, the details of the optimal energy consumption and the corresponding energy costs for a specific consumer C at timeslot t must be accessible in real time by all stakeholders ∈{C,SG,UP}. The ESC and Algorithm 2 are therefore proposed to maximize the security of EDsecure compared to existing approaches such as Li et al. [13] and Jindal et al. [9], and the energy multimedia data is published on the EB, which is incorporated with the IPFS mechanism. Payment of the energy cost CostC,t depends on the consumer and can be made by various means, such as the EB, online payments, or others.

The energy data storage cost \({ED}_{secure}^{\mathbb {S}\mathbb {C}}\) is high on the EB, so a unique access mechanism, i.e., the IPFS protocol, is needed to lower \({ED}_{secure}^{\mathbb {S}\mathbb {C}}\) and make the proposed scheme more scalable \(ED_{secure}^{\mathbb {S}\mathbb {C}\mathbb {A}\mathbb {L}}\). In a conventional EB, one word contains 256 bits, i.e., 2^5 (32) bytes. A word is stored on the EB using Ethereum gas (EG) and the SSTORE operation, where SSTORE costs 20000 gas, i.e., EBsstore = 20000 gas [30].

$$ \begin{array}{@{}rcl@{}} 1 word = 20000 gas,\\ 1 KB = (2^{10})/(2^{5}) \times 20000 gas = 640000 gas. \end{array} $$

The EG price EGprice (in Gwei) and the Ethereum cryptocurrency (Ether) price ETHprice (in USD) are dynamic in nature. EGprice was 10.928 Gwei as of 12th January 2020.

$$ \begin{array}{@{}rcl@{}} EG_{price} = \mathbb{X} Gwei , \\ EG_{price} = \mathbb{X} / 10^{9} Ether. \end{array} $$
(20)

where \(\mathbb {X}\) changes as per the cryptocurrency market. The storage cost of one word on the EB is therefore calculated as follows.

$$ \begin{array}{@{}rcl@{}} Storage_{cost}^{1} = EG_{price} * 1word , \\ \end{array} $$
(21)

Next, the storage cost \(Storage_{cost}^{\mathbb {W}}\) for \(\mathbb {W}\) words is calculated using (20) and (21) as follows [30].

$$ \begin{array}{@{}rcl@{}} Storage_{cost}^{\mathbb{W}} = (\mathbb{X} * \mathbb{W} * 1word) / 10^{9} Ether. \end{array} $$
(22)

Based on ETHprice in USD, the storage cost is converted to USD as follows.

$$ \begin{array}{@{}rcl@{}} {Storage_{cost}^{\mathbb{W}}}^{USD} = USD \{{ Storage_{cost}^{\mathbb{W}} \times ETH_{price}} \}, \\ \Rightarrow \{{ (EG_{price} * \mathbb{W} * 1word) \times ETH_{price}}\}, \\ \Rightarrow \{{ (\mathbb{X} * \mathbb{W} * 1word)/10^{9} \times ETH_{price}}\}. \end{array} $$
(23)

Therefore, the energy data storage cost \({ED}_{secure}^{\mathbb {S}\mathbb {C}}\) for \(\mathbb {W}\) words is calculated in USD using (23).

$$ \begin{array}{@{}rcl@{}} {ED}_{secure}^{\mathbb{S}\mathbb{C}} = {Storage_{cost}^{\mathbb{W}}}^{USD}. \end{array} $$
(24)
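The cost chain (20)–(24) can be sketched as follows. The function name is ours, and the ETH price plugged into the example is an assumption for illustration; only the 10.928 Gwei gas price and the 20000-gas SSTORE cost come from the paper:

```python
SSTORE_GAS = 20000  # gas to store one 256-bit word on the EB [30]

def word_storage_cost_usd(n_words, gas_price_gwei, eth_price_usd):
    """Eqs. (20)-(24): USD cost of storing n_words on the Ethereum blockchain."""
    gas = n_words * SSTORE_GAS          # eqs. (21)/(22): total gas consumed
    ether = gas * gas_price_gwei / 1e9  # Gwei -> Ether conversion, eq. (20)
    return ether * eth_price_usd        # eq. (23): convert to USD
```

With the paper's gas price of 10.928 Gwei and an assumed ETH price of 150 USD, one word costs about 0.033 USD; a full day of multi-field energy tuples therefore becomes expensive on-chain, whereas the IPFS-based scheme pays this price for a single 32-byte hash-key only.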

The proposed Q-MSEM scheme comprises an IPFS-based EB, which involves only the hash-key storage cost in place of storing the complete energy data. IPFS is an open-source, distributed storage system that is immutable and free. IPFS receives the energy data {C_ID,ξC,t,gC,t,CostC,t,ξ,CostC,t,g} from the US and generates its hash-key. IPFS splits the EDsecure file into chunks, which are encrypted with a random encryption key and satisfy the ESC conditions; the chunks remain on IPFS while the corresponding hash-key is stored on the EB. The hash-key size is 256 bits for SHA-256 or 160 bits for SHA-1, hence a single tuple occupies one word in the proposed scheme. The off-chain storage of EDsecure improves the security and scalability of the scheme by allowing more energy data to be added at low cost. All transactions in Q-MSEM are authenticated by the stakeholders owing to EB characteristics such as immutability and distribution, which ensure that the energy data cannot be forged.
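A minimal sketch of the hash-key idea: the full energy tuple of (14) is serialized off-chain and only its 256-bit digest (one EB word) is published on-chain. The field values below are hypothetical, and real IPFS uses multihash-encoded content identifiers rather than a raw SHA-256 hex digest; this only illustrates why on-chain storage shrinks to a single word:

```python
import hashlib
import json

# Hypothetical energy tuple of eq. (14) destined for IPFS
record = {"C_ID": "C1", "xi_C_t": 4.2, "g_C_t": 50.1,
          "cost_xi": 0.63, "cost_g": 1.50}

# Canonical serialization so every stakeholder derives the same digest
payload = json.dumps(record, sort_keys=True).encode()
hkey = hashlib.sha256(payload).hexdigest()  # 256-bit digest -> one word on the EB
```

Any stakeholder holding the off-chain payload can recompute the digest and compare it with the on-chain hash-key, which is how immutability of the energy record is checked without storing the record itself on the EB.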

figure b

6 Experimental results

In this section, we present the results obtained from the experiments and evaluate the performance of the proposed Q-MSEM scheme.

6.1 Experimental setup and dataset

The proposed Q-MSEM scheme is implemented in the Python programming language on a Windows operating system with an Intel(R) Core(TM) CPU @ 2.60 GHz and 8 GB RAM. The Open Energy Information (OpenEI) dataset [16], which contains electricity and gas consumption data for various HAs, is used and pre-processed with the scikit-learn library. The critical load data (for instance, refrigerator) is extracted from Pecan Street [20]. Hourly electricity prices are taken from the PJM Data Miner as of 12th January 2020 [22], and residential gas prices are taken from OpenEI [4], from which an hourly gas price dataset is created.

First, the collected energy (electricity and gas) data is cleaned. Unwanted observations, such as irrelevant information and duplicate values, are removed. Next, we filter out outliers observed in both the electricity and gas consumption data. We then address zero and null values by employing a linear interpolation mechanism: zero and null values are replaced by interpolated values, producing the cleaned energy data. The main goals of cleaning are to remove errors, handle redundancy, increase the reliability of the energy data for optimal analysis, and make efficient use of memory resources.
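The zero/null handling described above can be sketched with a small, stdlib-only linear interpolation; the function name is ours, and it assumes (as a cleaned hourly series would) that the first and last readings are valid:

```python
def interpolate_missing(values):
    """Replace zero/None readings with linearly interpolated values."""
    vals = [None if v in (None, 0) else float(v) for v in values]
    known = [(i, v) for i, v in enumerate(vals) if v is not None]
    out = []
    for i, v in enumerate(vals):
        if v is not None:
            out.append(v)
            continue
        # nearest valid neighbours on each side (tuples sort by index first)
        left = max(p for p in known if p[0] < i)
        right = min(p for p in known if p[0] > i)
        frac = (i - left[0]) / (right[0] - left[0])
        out.append(left[1] + frac * (right[1] - left[1]))
    return out
```

For example, `interpolate_missing([1.2, 0, None, 1.8])` fills the two missing hours with 1.4 and 1.6 kWh, keeping the consumption series smooth for the Q-learning agent.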

6.2 Results and comparative analysis of Q-MSEM

For the simulation, a residential house is considered with a critical HA (refrigerator) and non-critical/controllable HAs (AC, basic facilities, lights, and others). For gas consumption, the room heater is considered the critical HA and the rest as non-critical HAs. Initially, all simulation parameters are set as illustrated in Table 3. Algorithm 1 is then used for energy consumption reduction, and the Q-value converges based on the dynamic consumption of electricity and gas. The proposed scheme is easily extensible to a larger number of HAs owing to the flexible and scalable architecture of the Q-MSEM scheme.

Table 3 Simulation parameter settings

Figure 3a and b show the accumulated critical and non-critical electricity and gas consumption of a specific consumer, which makes evident that non-critical HAs are the major contributors to energy consumption throughout the day. Hence, only the non-critical energy load can be adjusted in the DRP of the REM system, as critical loads require energy throughout the day, as discussed in detail in Section 4.

Fig. 3

Hourly energy consumption

Figures 4 and 5 describe the reduction in a consumer's electricity and gas consumption based on the dynamic energy load associated with Q-learning. The figures show that energy consumption is reduced during peak hours (as prices increase, the consumer's energy demand decreases), whereas non-peak hours have higher consumption to balance the consumer's dissatisfaction cost in terms of energy demand. Figure 6 shows the total reduction in a day for electricity and gas. After applying the Q-MSEM scheme, electricity consumption drops from 106.8 kWh to 103.67 kWh, and gas consumption drops from 1200.27 kWh to 1179.95 kWh.

Fig. 4

Gas consumption reduction using proposed Q-MSEM scheme

Fig. 5

Electricity reduction using proposed Q-MSEM scheme

Fig. 6

Total Energy reduction using proposed Q-MSEM scheme

Figure 7a shows the Q-value convergence: initially the Q-values change rapidly, then the agent learns through trial and error and converges to the maximum Q-values; the convergence is compared with existing approaches, Baseline 1 (Lu et al. [15]) and Baseline 2 (Xu et al. [31]). Figure 7b illustrates the hourly electricity cost reduction of the proposed approach. The graph shows the effectiveness of the proposed scheme compared with other existing approaches: Baseline 1 (Lu et al. [15]), Baseline 2 (Ahrarinouri et al. [1]), and Baseline 3 (Rastegar et al. [24]). The total electricity cost reduction of the proposed Q-MSEM scheme is 15.82%, which is considerably higher than Baseline 1 (7.3%), Baseline 2 (12%), and Baseline 3 (10%). The total gas consumption cost reduction achieved by the proposed Q-MSEM scheme is 1.72%.

Fig. 7
figure 7

Comparative analysis of the proposed Q-MSEM scheme and existing approaches: (a) Convergence of Q-value and (b) Cost comparison

6.3 Ethereum smart contract

In the proposed Q-MSEM scheme, after performing the DRP activity using the optimal energy load, the details of electricity and gas consumption and the hourly energy prices need to be shared in real time with all stakeholders using Algorithm 2. The ESC is developed in the Remix IDE and deployed using the Truffle suite to perform block verification on the EB.

Figure 8 shows the Q-MSEM interface, which uses the ESC for real-time electricity and gas data access and decision making. It supports several real-time functionalities related to publishing energy data on the EB with high data security. Designing the ESC is a critical task, as it cannot be altered after deployment. Hence, we verified the ESC of the proposed Q-MSEM scheme for security vulnerabilities, such as timestamp dependence, tx.origin usage, re-entrancy, and excessive execution cost, using the open-source tool Mythril [3]. Figure 9 shows the successful security verification of the ESC on Mythril.

Fig. 8
figure 8

Smart contract interface

Fig. 9
figure 9

Security verification on Mythril

Figure 10a shows a comparative analysis between a conventional EB-based system and the proposed Q-MSEM scheme in terms of data storage cost. In a conventional EB-based system, the energy data storage cost is calculated using (24), whereas the proposed scheme incorporates the IPFS mechanism, which is distributed in nature and offers low-cost storage compared to conventional EB approaches such as Li et al. [13] and Jindal et al. [9]. Figure 10a makes evident that as the Ether price surged (11 January 2020 to 20 January 2020), the energy storage cost of the proposed scheme was only slightly affected, owing to the off-chain storage facility, whereas the conventional EB system is heavily affected by the price change.
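The cost gap can be illustrated with a back-of-the-envelope calculation. This is a sketch only: the 20,000-gas figure is the standard Ethereum SSTORE cost for writing a new non-zero 32-byte storage word, while the record size, gas price, and Ether prices are placeholder values, not the paper's experimental settings:

```python
GAS_PER_WORD = 20_000   # Ethereum SSTORE gas cost for a new non-zero 32-byte word
WORD_BYTES = 32

def onchain_storage_cost_usd(data_bytes, gas_price_gwei, eth_price_usd):
    """USD cost of writing `data_bytes` of data directly into contract storage."""
    words = -(-data_bytes // WORD_BYTES)             # ceiling division
    gas = words * GAS_PER_WORD
    return gas * gas_price_gwei * 1e-9 * eth_price_usd

# Conventional EB: a full 1 MB multimedia energy record goes on chain
full_record = onchain_storage_cost_usd(1_000_000, gas_price_gwei=20, eth_price_usd=150)

# Q-MSEM: only the 32-byte content hash goes on chain; the record itself sits on IPFS
hash_only = onchain_storage_cost_usd(32, gas_price_gwei=20, eth_price_usd=150)

# An Ether price surge raises the conventional cost by a large absolute amount,
# but the hash-only cost by a negligible one
full_surged = onchain_storage_cost_usd(1_000_000, gas_price_gwei=20, eth_price_usd=170)
hash_surged = onchain_storage_cost_usd(32, gas_price_gwei=20, eth_price_usd=170)
```

Under these placeholder prices, the full-record cost moves by hundreds of dollars when Ether surges, while the hash-only cost moves by less than a cent, which is the qualitative behaviour visible in Figure 10a.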

Fig. 10
figure 10

Comparative analysis of conventional EB and the proposed Q-MSEM scheme: (a) Storage cost comparison and (b) Scalability comparison

Figure 10b illustrates the scalability of the proposed Q-MSEM scheme based on the transaction time and the blocks mined during execution of the ESC. In the proposed scheme, only the hash-key of the energy data is sent to the EB for storage, as the complete multimedia data is stored in the IPFS off-chain database. The hash-key size is 256 bits (SHA-256) or 160 bits (SHA-1), which is considerably smaller than the actual size of a transaction. As the number of end-consumers increases, more blocks need to be mined to serve the increased requests, which raises the transaction time. Hence, the Q-MSEM scheme facilitates accommodating more transactions on the EB and servicing more end-consumers at the smart grid.
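The size argument can be made concrete: whatever the size of the multimedia energy record stored on IPFS, the SHA-256 digest pushed to the EB is always 32 bytes (256 bits). A small illustration using Python's hashlib, with dummy record content as a placeholder:

```python
import hashlib

# Dummy multimedia energy record (placeholder content, e.g. one day of readings)
record = b"hour,elec_kwh,gas_kwh\n" + b"0,1.2,45.1\n" * 10_000

digest = hashlib.sha256(record).digest()   # what Q-MSEM would push to the EB
record_size = len(record)                  # full record, stored off-chain on IPFS
digest_size = len(digest)                  # 32 bytes = 256 bits, stored on-chain
```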

The proposed Q-MSEM scheme benefits multicarrier REM systems with data security; however, it only works in energy consumption environments with finite and discrete state and action spaces. To extend the proposed Q-MSEM scheme to richer environments, function approximators need to be applied instead of storing the full state-action table, which is often infeasible. Therefore, an extension of this research work will employ deep RL to handle uncertain load patterns.
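The infeasibility of the tabular approach in richer environments is easy to see: the number of table entries grows exponentially with the number of jointly scheduled appliances. An illustrative count (the discretisation levels and appliance count here are arbitrary placeholders):

```python
# Tabular Q-learning stores one value per (state, action) pair.
levels = 10        # discretisation levels per appliance's consumption state
appliances = 20    # jointly scheduled household appliances
actions = 3        # discrete load-adjustment actions per state

table_entries = (levels ** appliances) * actions   # 3 * 10**20 entries
# Far beyond any realistic memory budget, hence the move to function
# approximation (deep RL), which replaces the table with a parametric model.
```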

7 Conclusion

In this paper, we proposed the Q-MSEM scheme, a multiagent-based secure REM approach for multimedia grid communication using Q-learning, with the aim of reducing energy consumption at only a small cost in consumer discomfort under DRP. Since electricity prices are impacted by natural gas prices, DRP requires energy consumption reduction for both gas and electricity in multicarrier REM systems, which improves smart grid reliability and reduces the burden of energy generation and supply on the grid while maintaining QoS. We therefore adopted an RL-based approach using Q-learning to incorporate dynamic energy consumption and reduce the load. The empirical results validated the effectiveness of the proposed approach compared to state-of-the-art approaches.

In the future, real-time energy prices, the working times of HAs, and fault detection under the effect of Current Transformer (CT) saturation will be explored as extensions of this research work, using deep RL to handle uncertain load patterns.