1 Introduction

In a wireless rechargeable sensor network (WRSN), energy is supplied to sensor nodes by a wireless charging vehicle (WCV). Because the WCV is subject to charging capability constraints such as its battery capacity, determining an efficient order in which sensor nodes should be charged is a challenging problem (Ma et al. 2018). Existing charging schemes can be classified into two kinds: periodic charging and on-demand charging.

In periodic charging schemes, the WCV replenishes the energy of sensor nodes while traveling along a predetermined charging path. In on-demand schemes, by contrast, sensor nodes actively monitor their own energy state and send charging requests (CRs) to the base station (BS) or the WCV once their energy falls below a predefined threshold (Cheng and Yu 2020). An on-demand scheme therefore does not need a fixed charging schedule and can cope with uneven and dynamically changing energy consumption rates of nodes, so much research has been carried out in this field.

A WCV charges sensor nodes in one of two ways: single-node charging or multi-node charging. In single-node charging, only one sensor node can be recharged at a time. In multi-node charging, several sensors that fall within the charging range can be recharged simultaneously, which reduces the waiting time of each sensor and improves charging efficiency. In addition, the choice between single-WCV and multi-WCV schemes depends on the size of the monitoring area or the density of sensor nodes. In the single-WCV scheme, only one WCV is responsible for energy provisioning. In the multi-WCV scheme, two or more WCVs charge the sensor nodes, which is more favorable for prolonging the network lifetime. In particular, cooperative charging between WCVs can maximize the energy usage efficiency (Zhang et al. 2015; Lin et al. 2018a, 2016a; Madhja et al. 2016).

So far, various charging scheduling schemes using single-WCV and multi-WCV with single-node and multi-node charging methods have been studied in WRSNs, each with its own advantages and disadvantages. The common disadvantage of the charging schemes other than the single WCV-single node ones is that they do not adopt a proactive charging scheme, in which potential-to-be-bottlenecked nodes (pBNs) are charged preferentially even though they have not yet issued CRs, as long as the WCV has redundant charging capability in terms of battery power and time. Recently, intelligent on-demand charging schemes that make charging scheduling decisions by jointly considering multiple criteria characterizing the CR nodes have been proposed (Tomar et al. 2019, 2021; Tomar and Jana 2021; Nguyen 2021; Mangun et al. 2023). The multi-criteria decision making (MCDM)-based on-demand charging scheduling schemes proposed so far include fuzzy logic-based schemes (Tomar et al. 2019, 2021), an integrated AHP-TOPSIS scheme that combines the analytic hierarchy process (AHP) with the technique for order preference by similarity to ideal solution (TOPSIS) (Tomar and Jana 2021), Fuzzy Q-charging, which uses fuzzy logic and Q-Learning (Nguyen 2021), and an integrated fuzzy AHP (FAHP)-variable weight analysis (VWA)-TOPSIS scheme (Mangun et al. 2023).

Meanwhile, Cheng and Yu (2020) proposed a proactive charging scheme for the single WCV-single node setting. They convert time- and distance-based charging scheduling into purely distance-based charging scheduling by adopting a bottleneck prediction and removal (BP&R) mechanism, which improves network performance. However, the deadline estimation and BP&R mechanisms have the following disadvantages. First, in the deadline estimation, each sensor node uses a fixed, predefined deadline threshold. In reality, if the energy consumption rate of a node changes because of an accidental event or a similar cause, the frequency with which CRs are issued also changes. This means that the deadline threshold, i.e., the allowable error range threshold, should be changed dynamically according to the CR issuing frequency. Second, this scheme makes the WCV charge pBNs that are randomly selected within the bottleneck window. As a result, special nodes that occupy important locations (such as roads or battlefields) or play important roles in the network (i.e., backbone nodes such as cluster head nodes) may be excluded by this random selection. To resolve these problems, the authors recently proposed a novel approach that applies an integrated FAHP-VWA-TOPSIS throughout the overall charging scheduling, in which SoC scheduling is performed when the WCV has redundant capability (Mangun et al. 2023). This approach does not use the deadline estimation and BP&R mechanisms of Cheng and Yu (2020) at all. However, the use of intelligent algorithms such as Q-Learning has not been considered at any stage of this charging scheduling, which leaves room for applying such algorithms.

The goal of this work is to explore ways to improve the proactive charging performance of a semi-on-demand scheme that uses single WCV-single node full charging, and to develop the corresponding algorithms. The main contributions of our work are as follows:

  • To the best of the authors' knowledge, we are the first to conceive a design method that makes the best use of FAHP-VWA and Q-Learning in SoC scheduling.

  • We propose a method that uses FAHP-VWA to determine the weights of the multiple criteria characterizing the pBNs and, based on these weights, predicts the pBNs corresponding to the WCV's charging capability.

  • We propose a methodology for designing the reward function used to update the Q-value, based on the criteria weights obtained by FAHP-VWA. On this basis, we propose a method that uses Q-Learning to select the most suitable pBNs, i.e., the proactive charging nodes to be included in a charging round.

  • Extensive simulations are conducted, and they demonstrate that the proposed algorithm outperforms other existing algorithms.

The remainder of this article is organized as follows: Sect. 2 gives a brief survey of prior works. Section 3 presents preliminaries, including the problem description, and Sect. 4 presents the proposed algorithm. Extensive experimental results for the proposed scheme and their analysis are given in Sect. 5. Finally, this work is concluded in Sect. 6.

2 Related Works

Various algorithms for charging scheduling in WRSNs have been developed. Here, we give a brief overview of on-demand charging scheduling schemes, organized by the number of WCVs in the network and the number of nodes charged at a time.

Charging scheduling methods focusing on the single WCV-single node scheme have been studied in Lin et al. (2018b), He et al. (2015), Fu et al. (2016), Kaswan et al. (2018), Lin et al. (2016b), Shu et al. (2015). This scheme covers the majority of the charging scheduling methods reported so far. Lin et al. (2016b) developed a time–space priority scheduling method, and Lin et al. (2018b) proposed a time–space charging method that aims to find the optimal charging path minimizing the number of dead nodes. He et al. (2015) proposed a preemptive on-demand charging scheme that first recharges the node in the service queue nearest to the WCV. Fu et al. (2016) found an approximate solution by using the concept of the smallest enclosing disk and determined optimal charging locations for the WCV. Kaswan et al. (2018) presented a linear programming formulation of the charging scheduling problem in the single WCV-multi node scheme and proposed an on-demand charging method based on the gravitational search algorithm (GSA).

To address the low energy usage efficiency of the single WCV-single node scheme, some works (Ma et al. 2018; Tomar et al. 2019; Nguyen 2021; Xie et al. 2015; Khelladi et al. 2017) studied single WCV-multi node charging schemes. Ma et al. (2018) developed a multi-node charging scheme using a single WCV that schedules sensor nodes according to a charging utility gain that relies only on the residual energy of each node. Khelladi et al. (2017) considered multi-node charging for energy replenishment of the CR nodes, but did not achieve the goal of minimizing the charging latency. Xie et al. (2015) developed a formal optimization framework that jointly optimizes the traveling path, flow routing, and charging time at each cell. Tomar et al. (2019) developed a fuzzy logic-based charging algorithm that blends network parameters such as residual energy, distance to the WCV and critical node density to determine the next node to be charged. Tomar and Jana (2021) proposed a scheduling scheme based on two multi-criteria decision making (MCDM) methods, namely AHP and TOPSIS, which chooses the most suitable node for charging by evaluating several network criteria. Nguyen (2021) developed Fuzzy Q-Charging, in which fuzzy logic determines the partial charging time corresponding to the safe energy level at each sojourn point, while the next sojourn point of the WCV is selected with Q-Learning, whose reward function is designed from three factors: energy severity, node priority and target monitoring. The authors developed an approach that applies an integrated FAHP-VWA-TOPSIS throughout the overall charging scheduling (Mangun et al. 2023). In that approach, SoC scheduling is performed when the WCV has redundant charging capability. In this SoC scheduling, the non-CR nodes that may become bottlenecks in the future are predicted using the relative weights assigned by FAHP-VWA, and the most suitable proactive charging nodes among the predicted nodes are selected by TOPSIS.

Multi WCV-single node charging scheduling problems have also been studied in Xu et al. (2016), Gharaei et al. (2020), Hu et al. (2018), Jiang et al. (2014), Liang et al. (2016), Mo et al. (2019). Xu et al. (2016) solved the multi WCV-single node charging scheduling problem by minimizing the total traveling distance in a large-scale WRSN with densely deployed nodes. Gharaei et al. (2020) developed a route optimization algorithm that determines the optimal charging path of the WCV through the sensor nodes so that balanced energy exhaustion times of the nodes can be attained. In Hu et al. (2018), a gap-based periodic charging scheduling algorithm and a charging path planning algorithm are proposed, while in Jiang et al. (2014), three heuristics are proposed for the on-demand charging scheduling problem that maximizes the coverage of event monitoring. Liang et al. (2016) formulated the problem of minimizing the number of WCVs that charge sensor nodes and developed an approximation algorithm for it. Mo et al. (2019) solved the adjustment problem of multiple WCVs with the goal of minimizing the total energy consumption of the WCVs by adjusting their traveling and charging times.

On the other hand, multi WCV-multi node charging schemes, which aim to increase the energy usage efficiency of multiple WCVs in a multi-WCV operating environment, have been studied in Tomar et al. (2021), Xu et al. (2021), Han et al. (2019), Rault (2019). Xu et al. (2021) formulated a new charging scheduling problem that minimizes the longest delay and obtained a closed charging path for each WCV while preventing two or more WCVs from charging one sensor node at the same time. Tomar et al. (2021) proposed a new on-demand charging scheme based on fuzzy logic that blends multiple network criteria, including residual energy of nodes, distance to the WCV, crucial node density, and energy consumption rate, to balance the network traffic load evenly, so that each WCV finds the next charging node location within its domain and simultaneously charges the nodes within its charging range. Han et al. (2019) proposed a charging scheme that first groups the sensor nodes into unequal clusters and then draws up the charging schedule for the WCVs. Rault (2019) proposed balancing the charging load evenly between multiple WCVs by dividing the energy CRs of nodes. Priyadarshani et al. (2021) studied the problem of drawing up an optimal charging schedule for multiple WCVs in heterogeneous WRSNs and integrated a popular MCDM method (AHP-TOPSIS) with the non-dominated sorting genetic algorithm (NSGA-II), which yields Pareto-optimal solutions.

A common disadvantage of the many charging schemes categorized above by the number of WCVs used in the WRSN and the number of sensor nodes that one WCV charges at a time is that they do not take proactive charging into account, in which nodes that could potentially cause a charging bottleneck are charged preferentially, even though they have not yet issued CRs, as long as the WCV has redundant capability. Against this background, the authors focused on on-demand charging scheduling and designed a novel scheme that can be accompanied by a SoC scheme (Mangun et al. 2023). In this article, we focus on SoC scheduling and further enrich the SoC by choosing the most suitable pBNs among the predicted pBNs with Q-Learning, instead of the TOPSIS used in Mangun et al. (2023), while still relying on the criteria weights assigned by FAHP-VWA as in Mangun et al. (2023). Comparisons between the existing charging scheduling schemes considered above and the proposed scheme are presented in Table 1.

Table 1 Summary of the existing on-demand charging schemes

3 Preliminaries

3.1 Symbols and Definitions

The main symbols used in this article are shown in Table 2.

Table 2 Symbols and their definitions

Also, some terms and criteria to describe the proposed algorithm are defined below.

Definition 1

Node location importance degree: This criterion reflects the importance of the Voronoi region in which each node is located when the whole monitoring area is divided into \(m \times n\) discrete grids, \(D = \{ g_{ij} \}_{m \times n}\), each with its own impact factor (Xiao et al. 2018). The importance of each grid is defined as the frequency with which the monitored object appears within the grid and can mostly be obtained from prior knowledge. The location importance degree of each node is therefore directly related to the importance of the grids in its region. The location importance degree of node i, \(NLID_{i} (t)\), is given by:

$$NLID_{i} (t) = \min \left\{ {C \times w_{i} (t) \times N/\varphi ,\,1} \right\}$$
(1)

where \(N\) denotes the number of sensor nodes, \(w_{i}\) is the weight of the Voronoi region \(\Xi_{i}\) of node i, obtained from the grids \(g_{ij}\) it contains, \(\varphi\) is the sum of the maximum monitoring efficiencies of all grids, and C (\(C \in [0,10]\)) is the perspective factor that takes into account the influence of environmental changes such as topology changes and node failures, as well as wrong prior knowledge about each grid. They are expressed as follows:

$$w_{i} (t) = \sum\limits_{{g_{ij} \in \Xi_{i} }} {\phi_{ij} (t)}$$
(2)
$$\varphi = \sum\limits_{{g_{ij} \in D}} {\max (\phi_{ij} (t)} )$$
(3)
$$C = 10 \times \left( \frac{h}{N} \right)^{4}$$
(4)

In the above equations, \(\phi_{ij} (t) = 1 - e^{{ - a_{ij} }} [1 - \phi_{ij} (t - 1)]\) is the surveillance efficiency of the grid \(g_{ij}\), where \(a_{ij}\) is the importance degree of that grid, and \(h\) is the frequency detected during t.
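As an illustration of Eqs. (1)–(4), the following minimal Python sketch evaluates the location importance degree of one node. The function names are hypothetical, and \(\varphi\) is passed in as a precomputed number because the text does not specify over which range the maximum in Eq. (3) is taken.

```python
import math

def update_surveillance(prev_phi, a):
    """Surveillance efficiency update: phi(t) = 1 - exp(-a) * (1 - phi(t-1))."""
    return 1.0 - math.exp(-a) * (1.0 - prev_phi)

def perspective_factor(h, n_nodes):
    """Eq. (4): C = 10 * (h / N)^4."""
    return 10.0 * (h / n_nodes) ** 4

def nlid(region_phi, phi_total, h, n_nodes):
    """Eqs. (1)-(2): NLID_i = min(C * w_i * N / phi, 1).

    region_phi: surveillance efficiencies of the grids in the node's region.
    phi_total:  the quantity phi of Eq. (3), supplied by the caller (assumption).
    """
    w_i = sum(region_phi)                 # Eq. (2)
    c = perspective_factor(h, n_nodes)    # Eq. (4)
    return min(c * w_i * n_nodes / phi_total, 1.0)
```

For example, `nlid([0.2], 10.0, 5, 10)` gives \(C = 0.625\) and an importance degree of 0.125, while large region weights are capped at 1 by the `min` in Eq. (1).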

Definition 2

Node role importance degree: This criterion reflects the importance of the role that each node plays in the network. The role of each node is evaluated by the traffic load that it forwards. Using the concept of edge betweenness (EB), this criterion is calculated as follows (Cuzzocrea et al. 2012):

$$NRID = EB(y) = \sum\limits_{x \ne y \ne z} {\frac{{\sigma_{xz} (y)}}{{\sigma_{xz} }}}$$
(5)

where \(EB(y)\) is the EB of network edge \(y\), \(\sigma_{xz} (y)\) is the number of shortest paths between node \(x\) and node \(z\) that pass through y, and \(\sigma_{xz}\) is the total number of shortest paths between node \(x\) and node \(z\). This criterion is the same as the concept of node centrality in Zhong et al. (2018).
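The sum in Eq. (5) can be sketched as follows for a small unweighted topology. This is an assumption-laden illustration rather than the authors' implementation: y is treated as a node (in line with the node centrality reading), hop count is used as path length, and the sum runs over ordered pairs \((x, z)\), so each unordered pair is counted twice. The names `bfs_counts` and `betweenness` are hypothetical.

```python
from collections import deque

def bfs_counts(adj, src):
    """Return (dist, sigma): hop distances and shortest-path counts from src."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj, y):
    """Eq. (5): sum over ordered pairs (x, z), x != y != z, of sigma_xz(y) / sigma_xz."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_y, sig_y = info[y]
    total = 0.0
    for x in adj:
        for z in adj:
            if x == z or y in (x, z):
                continue
            dist_x, sig_x = info[x]
            if z not in dist_x or y not in dist_x or z not in dist_y:
                continue                   # unreachable pair contributes nothing
            if dist_x[y] + dist_y[z] == dist_x[z]:
                # paths x -> z through y = (paths x -> y) * (paths y -> z)
                total += sig_x[y] * sig_y[z] / sig_x[z]
    return total
```

On the three-node chain a–b–c, the middle node b lies on the only shortest path in both directions, so its value is 2, whereas the end nodes score 0.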

Definition 3

Proactive charging nodes: These are the pBNs selected for inclusion in a charging round from among the pBNs corresponding to the WCV's charging capability.

Definition 4

Charging capability of WCV: We define this criterion as the number of CR nodes that a single WCV with limited energy can charge within a charging round. In this article, we take the average number of CR nodes that a WCV can serve, \(n_{average}\), as the WCV's charging capability. The sum of the energy consumed by the WCV for travelling, including the return to the BS from the final charging node, and the energy consumed for charging the \(n_{average}\) request nodes should not exceed the WCV's capacity. If all sensor nodes have the same capacity \(E^{cap}\) and the same CR threshold \(E^{cr\_thres}\) during one charging round, the feasibility condition for the WCV is formulated as:

$$(n_{average} + 1) \times d_{average} \times ECR_{MC} + n_{average} \times \left( E^{cap} - E^{cr\_thres} + \frac{1}{n_{average}}\sum\limits_{i = 1}^{n_{average}} ECR_{i} \times \frac{1}{n_{average}}\sum\limits_{i = 1}^{n_{average}} t_{i}^{complete} \right) \le E_{MC}^{cap}$$
(6)

The average distance between any two sensor nodes, \(d_{average}\), and the time from when request node \(i\) issues its CR to when its charging service is completed, \(t_{i}^{complete}\), are calculated as follows:

$$d_{average} = \frac{1}{N \times (N + 1)}\sum\limits_{i = 1}^{N + 1} {\sum\limits_{j = 1,i \ne j}^{N + 1} {d_{i,j} } }$$
(7)
$$t_{i}^{complete} = t_{i}^{arrival} - t_{i}^{request} + t_{i}^{charge}$$
(8)
$$t_{i}^{arrival} = \frac{d_{BS,1} + \sum\nolimits_{j = 1}^{i} d_{j,j + 1}}{v_{MC}} + \sum\limits_{j = 1}^{i - 1} t_{j}^{charge}$$
(9)

In the above equations, \(N\) is the total number of nodes in the network, \(N + 1\) is the total number of nodes including the BS, and \(d_{i,j}\) is the Euclidean distance between nodes \(i\) and \(j\). The first and second terms in Eq. (9) denote the travel time the WCV needs to reach node \(i\) and the total time taken to charge the preceding (\(i - 1\)) request nodes, respectively.

In this article, we evaluate the WCV's charging capability \(n_{average}\) in a simplified way by assuming that each request node has run out of energy by the time the WCV arrives. Under this assumption, Eq. (6) becomes:

$$(n_{average} + 1) \times d_{average} \times ECR_{MC} + n_{average} \times E^{cap} \le E_{MC}^{cap}$$
(10)

Note that \(n_{average}\) obtained from Eq. (6) can vary between \(n_{\max }\) and \(n_{\min }\), which are calculated for a square area of \(L \times L\) [m\(^{2}\)] by using \(D_{\min } = L\sqrt {2(n - 2)} + 4L\) (Wang et al. 2016) and \(d_{\max } = \sqrt 2 L\), respectively, where \(D_{\min }\) is the shortest charging path length and \(d_{\max }\) is the maximum distance between any two nodes in the network.
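Solving Eq. (10) for \(n_{average}\) gives \(n_{average} \le (E_{MC}^{cap} - d_{average} ECR_{MC}) / (d_{average} ECR_{MC} + E^{cap})\). The sketch below evaluates this bound together with Eq. (7). It assumes that \(ECR_{MC}\) is the WCV's travel energy per unit distance, which is what makes the terms of Eq. (10) dimensionally consistent; the function names are hypothetical.

```python
import math

def average_distance(points):
    """Eq. (7): mean Euclidean distance over ordered pairs of the N+1 points
    (the N sensor nodes plus the BS)."""
    count = len(points)                    # N + 1
    total = 0.0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j:
                total += math.dist(p, q)
    return total / (count * (count - 1))   # N * (N + 1) ordered pairs

def charging_capability(d_avg, ecr_mc, e_cap, e_mc_cap):
    """Largest integer n_average that satisfies Eq. (10)."""
    travel = d_avg * ecr_mc                # energy for one average hop (assumed units)
    n = math.floor((e_mc_cap - travel) / (travel + e_cap))
    return max(n, 0)
```

With \(d_{average} = 5\), \(ECR_{MC} = 2\), \(E^{cap} = 10\) and \(E_{MC}^{cap} = 100\), the bound evaluates to 4.5, so four nodes can be served: the left-hand side of Eq. (10) is 90 for four nodes and 110 for five.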

Definition 5

SoC: This refers to proactive charging, carried out within on-demand charging, of nodes that have not generated CRs but may become pBNs. If the number of CR nodes generated does not exceed the WCV's charging capability, introducing SoC can further increase the energy usage efficiency and extend the network lifetime.

3.2 System Model

A WRSN consists of three main components: a WCV, a fixed base station (BS) and sensor nodes. Sensor nodes with rechargeable batteries are randomly deployed in a two-dimensional monitoring region. The BS is fixed at the center of the square surveillance area. It can collect sensed data, communicate with the WCV directly, and replace the WCV's battery in negligible time. It is assumed that the BS knows the exact location of every sensor node in the network. We also assume that no sensor node is isolated and that connectivity between sensor nodes always exists. Sensor nodes with energy capacity \(E_{i}^{cap}\) consume energy for data sensing, data transmission, and data reception, and have different energy consumption rates due to different traffic loads. The WCV with energy capacity \(E_{MC}^{cap}\) moves to the location of a CR node and charges nodes in a single-node scheme. Using the charging capability of the WCV from Definition 4, the maximum allowable latency of CR nodes, \(T_{upper}\), is set approximately to the time needed to finish the current charging round containing \(n_{\max }\) charging tasks. Using \(T_{upper}\), the CR threshold \(E_{i}^{cr\_thres}\) is expressed as follows, where \(ECR_{i}\) denotes the energy consumption rate of node i.

$$E_{i}^{cr\_thres} = ECR_{i} \times T_{upper}$$
(11)

The main operation of the system is as follows. The BS maintains the CRs of sensor nodes in its service queue. When the residual energy of a sensor node reaches the threshold given by Eq. (11), the node sends a CR message of the form \(\langle ID_{i}, E_{i}^{res}, NLID_{i}, NRID_{i}, TS_{i} \rangle\) to the BS through a single hop or multiple hops, where \(ID_{i}\), \(E_{i}^{res}\), \(NLID_{i}\), \(NRID_{i}\) and \(TS_{i}\) denote the identifier, residual energy, node location importance degree, node role importance degree and time stamp of CR node i, respectively. Criteria other than those listed may also be used to characterize nodes. For example, when a routing and data gathering algorithm that does not consider the cross-coverage problem (Xuegang et al. 2017) is adopted, a criterion named node cross-coverage degree can be introduced to distinguish member nodes lying within the cross-coverage formed by two or more clusters from other member nodes, and a node forwarding degree, which characterizes the traffic load on each node as the number of child nodes that relay their data through it, may also be specified. We also assume that sensor nodes that have not yet generated a CR periodically send their sensed data, together with the values of these criteria, to the BS. The BS draws up an on-demand charging schedule if the number of CR nodes reaches the WCV's charging capability \(n_{average}\). However, when the number of CRs does not exceed the WCV's charging capability, scheduling is performed in a SoC scheme. The BS then passes the resulting schedule to the WCV. The WCV leaves the BS, travels to the CR nodes, and charges the CR nodes within its charging radius. After recharging \(n_{average}\) CR nodes, the WCV returns to the BS, replenishes its energy, and awaits the next charging round. The BS also starts SoC when the waiting time reaches one charging round interval, even if the number of CRs in the service queue has not reached \(n_{average}\). When more CRs are generated and exceed the WCV's charging capability, the BS allows those sensor nodes whose residual energy reaches the threshold determined by the increased maximum allowable waiting time to generate CRs. The system uses a charging model similar to that of Zhu et al. (2018).

4 Proposed Scheme

The main process of the proposed scheme is shown in Fig. 1. The BS assigns relative weights to the multiple criteria used in the charging scheduling of both CR nodes and sensor nodes that have not generated CRs. For this purpose, FAHP-VWA is employed to assign weights to the criteria used in the SoC scheme, namely residual energy, energy consumption rate, node location importance degree, and node role importance degree. Based on these weights, the BS predicts in advance the pBNs corresponding to the WCV's charging capability and ranks them with Q-Learning. When the number of CRs, \(n_{on - demand}\), exceeds the WCV's charging capability, charging scheduling is performed with the same scheme as proposed by the authors in Mangun et al. (2023). If the number of CRs generated does not exceed the WCV's charging capability, that is, \(n_{average} > n_{on - demand} \ge 0\), the charging schedule is drawn up in a SoC scheme. The BS first includes the \(n_{on - demand}\) CR nodes in the charging round and then selects the \(n_{average} - n_{on - demand}\) most suitable pBNs. The \(n_{average}\) chosen charging tasks are scheduled by using the NJNP charging scheduling algorithm (Xie et al. 2015).
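The composition of a charging round described above can be sketched as follows. This is only an illustration under stated assumptions: `plan_round` is a hypothetical name, the pBNs are assumed to arrive already ranked (best first), and the pure on-demand case is simply handed back without ordering, since that case follows the scheme of Mangun et al. (2023).

```python
def plan_round(cr_nodes, ranked_pbns, n_average):
    """Return (mode, nodes) for one charging round.

    cr_nodes:    identifiers of nodes that issued CRs (n_on_demand of them).
    ranked_pbns: pBN identifiers, most suitable first (assumed precomputed).
    n_average:   the WCV's charging capability.
    """
    n_on_demand = len(cr_nodes)
    if n_on_demand >= n_average:
        # Enough CRs for a full round: handled by the on-demand scheme.
        return "on-demand", list(cr_nodes)
    # SoC: CR nodes go in first, the free slots are filled with the best pBNs.
    free_slots = n_average - n_on_demand
    chosen = [p for p in ranked_pbns if p not in cr_nodes][:free_slots]
    return "soc", list(cr_nodes) + chosen
```

For instance, with one CR node, three ranked pBNs and \(n_{average} = 3\), the round contains the CR node followed by the two best pBNs.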

Fig. 1 The main operating flow diagram of the system

NJNP takes only a distance factor into account and thus provides a short charging path for the whole charging schedule. That is, after the \(n_{on - demand}\) request nodes and the (\(n_{average} - n_{on - demand}\)) pBNs have been selected, all \(n_{average}\) nodes are prioritized at once, and the charging schedule is made by considering only the distance between the WCV and these request nodes and pBNs. The BS transfers the completed schedule to the WCV. The WCV receives the charging schedule from the BS, visits the target node locations in a nearest-job-first manner based on distance only, charges the sensor nodes in the single-node charging scheme, and returns to the BS to immediately replace or recharge its battery for the next charging round.
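The distance-only ordering can be illustrated by a greedy nearest-first visit, sketched below. This is a simplified stand-in for NJNP (preemption by newly arriving requests is not modelled), and the function name and the coordinate dictionary are assumptions made for the example.

```python
import math

def nearest_first_order(start, targets):
    """Visit order that always moves to the closest remaining target.

    start:   (x, y) position of the BS, where the WCV begins.
    targets: dict mapping node id -> (x, y) position.
    """
    remaining = dict(targets)
    position = start
    order = []
    while remaining:
        # Pick the remaining node closest to the WCV's current position.
        nxt = min(remaining, key=lambda n: math.dist(position, remaining[n]))
        order.append(nxt)
        position = remaining.pop(nxt)      # WCV is now at that node
    return order
```

Starting from the origin with nodes at distances 5, 1 and 2 along one axis, the WCV visits them in the order of distance 1, then 2, then 5.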

4.1 Overview of FAHP-VWA and Q-Learning

4.1.1 Weighting Multi-Criteria with FAHP

A weight is allotted to each criterion from the paired comparison of the importance evaluation of the criteria using a triangular fuzzy number (Metaxas et al. 2016; Calabrese et al. 2016).

$$\tilde{a}_{ij} = (l_{ij} ,m_{ij} ,u_{ij} ),\,\,\frac{1}{9} \le m_{ij} \le 9$$
(12)

where \(l_{ij}\) and \(u_{ij}\) are calculated for \(m_{ij} \ge 1\) as follows:

$$l_{ij} = \begin{cases} m_{ij} - \frac{d}{2}, & m_{ij} - \frac{d}{2} \ge 1 \\ 1, & m_{ij} - \frac{d}{2} < 1 \end{cases},\quad 0 \le d \le 8$$
(13)
$$u_{ij} = \begin{cases} m_{ij} + \frac{d}{2}, & m_{ij} + \frac{d}{2} \le 9 \\ 9, & m_{ij} + \frac{d}{2} > 9 \end{cases},\quad 0 \le d \le 8$$
(14)
$$\tilde{a}_{ji} = \frac{1}{{\tilde{a}_{ij} }} = (\frac{1}{{u_{ij} }},\,\frac{1}{{m_{ij} }},\,\frac{1}{{l_{ij} }})$$
(15)
$$\tilde{a}_{ij} = (1,\,1,\,1),\,\,for\,\,i = j$$
(16)

\(m_{ij}\) is an adjustable value that measures how significant criterion i is relative to criterion j. The triangular fuzzy number associated with \(m_{ij}\) has an adjustable width d, a dispersion value that measures the lack of confidence in the value assigned to \(m_{ij}\). The smaller the width d, the higher the level of certainty in the value assigned to \(m_{ij}\). To make consistency easier to preserve, we use a single value of d for each comparison matrix. To guarantee the consistency of the assigned values, the following condition must be satisfied:

$$m_{ij} = m_{i(j - 1)} \times m_{(j - 1)j} ,\,\,i \ge 1;\,\,j \ge i + 2$$
(17)
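A minimal sketch of Eqs. (12)–(17) is given below: the modal values next to the diagonal are supplied, the remaining ones follow from Eq. (17), and each is widened into a triangular fuzzy number. It assumes, for simplicity, that the criteria are ordered so that every supplied \(m_{i(i+1)} \ge 1\); the function names are hypothetical.

```python
def tfn(m, d):
    """Eqs. (12)-(14): triangular fuzzy number (l, m, u) for m >= 1, clipped to [1, 9]."""
    if not 1 <= m <= 9:
        raise ValueError("modal value must lie in [1, 9]")
    return (max(m - d / 2, 1.0), float(m), min(m + d / 2, 9.0))

def reciprocal(t):
    """Eq. (15): reciprocal of a triangular fuzzy number."""
    l, m, u = t
    return (1 / u, 1 / m, 1 / l)

def build_fuzzy_matrix(adjacent, d):
    """adjacent[k] is m_{k,k+1}; all other modal values follow from Eq. (17)."""
    size = len(adjacent) + 1
    modal = [[1.0] * size for _ in range(size)]
    for i in range(size - 1):
        modal[i][i + 1] = float(adjacent[i])
        for j in range(i + 2, size):
            # Eq. (17): m_ij = m_i(j-1) * m_(j-1)j
            modal[i][j] = modal[i][j - 1] * adjacent[j - 1]
    fuzzy = [[(1.0, 1.0, 1.0)] * size for _ in range(size)]   # Eq. (16) on the diagonal
    for i in range(size):
        for j in range(i + 1, size):
            fuzzy[i][j] = tfn(modal[i][j], d)
            fuzzy[j][i] = reciprocal(fuzzy[i][j])
    return fuzzy
```

For three criteria with \(m_{12} = 2\), \(m_{23} = 3\) and \(d = 2\), Eq. (17) yields \(m_{13} = 6\), so \(\tilde{a}_{13} = (5, 6, 7)\) and \(\tilde{a}_{31} = (1/7, 1/6, 1/5)\).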

If Eq. (17) is used to assign the fuzzy values, a separate consistency verification does not need to be performed. In this way, a comparison matrix is obtained by pairwise comparison of the criteria. From the comparison of the fuzzy values, we obtain a fuzzy weight for each criterion. The crisp weight of each criterion is then computed from the pairwise comparison matrix and normalized. The crisp weight values are represented by the column vector below:

$$w = (w_{1} ,w_{2} , \cdots ,w_{M} )^{T}$$
(18)
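The text does not state which procedure turns the fuzzy comparison matrix into the crisp vector of Eq. (18). The sketch below therefore uses one common choice as an explicit assumption: a component-wise geometric mean of each row, centroid defuzzification, and normalization. It should be read as an illustration of the idea, not as the procedure used in this article; `crisp_weights` is a hypothetical name.

```python
import math

def crisp_weights(fuzzy_matrix):
    """Normalized crisp weights from a matrix of triangular fuzzy numbers.

    Assumed procedure (not taken from the text): row-wise geometric mean of
    the l, m and u components, centroid (l + m + u) / 3, then normalization.
    """
    size = len(fuzzy_matrix)
    crisp = []
    for row in fuzzy_matrix:
        # geometric mean of each component over the row
        g = [math.prod(t[k] for t in row) ** (1 / size) for k in range(3)]
        crisp.append(sum(g) / 3)           # centroid defuzzification
    total = sum(crisp)
    return [c / total for c in crisp]
```

For a two-criterion matrix in which the first criterion is exactly three times as important as the second (zero fuzzy width), this returns weights of approximately 0.75 and 0.25.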

The decision matrix X, normalized in the same way as in Chang et al. (2017), is composed as below and is used as input to VWA and Q-Learning.

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{bmatrix}$$
(19)

4.1.2 Weight Compensation by VWA

The weights of criteria represented as Eq. (18) by FAHP are compensated as follows (Li and Li 2004; Zeng et al. 2016):

$$w^{\prime}_{j} = \frac{s(x_{j}) w_{j}}{\sum\nolimits_{j = 1}^{M} s(x_{j}) w_{j}}$$
(20)

where \(s(x_{j})\) is the exponential-type state variable weight with penalty for criterion \(j\) and \(w_{j}\) is the weight calculated by Eq. (18). This state variable weight is given by:

$$s(x_{j} ) = e^{{\alpha \frac{{\sigma_{j} }}{{\left| {\overline{x}_{j} } \right|}}}} ,\,\,\alpha \ge 0$$
(21)

where \(\alpha\) is the variable level of the weights, \(\sigma_{j}\) is the standard deviation of the values of criterion \(j\), and \(\overline{x}_{j}\) is their mean. If \(\alpha = 0\), no weight compensation is performed, that is, \(w^{\prime}_{j} = w_{j}\).
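Eqs. (20) and (21) can be sketched in a few lines of Python. The example assumes that each criterion's values are available as one column of the normalized decision matrix and that the population standard deviation is meant; the text does not say whether the population or the sample form is used, and `vwa_weights` is a hypothetical name.

```python
import math
from statistics import mean, pstdev

def vwa_weights(weights, columns, alpha):
    """Compensated weights w'_j of Eq. (20).

    weights: FAHP weights w_j, one per criterion.
    columns: columns[j] holds the normalized values of criterion j for all nodes.
    alpha:   variable level of the weights (alpha >= 0).
    """
    # Eq. (21): s(x_j) = exp(alpha * sigma_j / |mean_j|), population sigma assumed
    state = [math.exp(alpha * pstdev(col) / abs(mean(col))) for col in columns]
    raw = [s * w for s, w in zip(state, weights)]
    total = sum(raw)
    return [r / total for r in raw]
```

With \(\alpha = 0\) every state variable weight equals 1 and the FAHP weights are returned unchanged; for \(\alpha > 0\), criteria whose values are more dispersed relative to their mean receive a larger share.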

4.1.3 Overview of Q-Learning

Q-learning (Barto and Sutton 1999) is a widely applied model-free reinforcement learning (RL) technique that finds the optimal policy. Moreover, it requires comparatively light computational resources, so it is appropriate for energy-limited networks such as WRSNs. More formally, it learns Q-values, which express the quality of any state-action combination. We define Q: S × A → R as the Q function, where S, A and R denote the sets of all possible states, all possible actions and all possible rewards, respectively. Before learning, the Q function returns fixed initial values chosen by the designer, which correspond to an initial policy π. In the learning process, the agent chooses an action \(a_{t}\) in a given state \(s_{t}\) at each time t. It then observes the new state \(s_{t+1}\) and the reward \(r_{t}\) obtained, and updates the Q-value based on these observations. After a number of iterations, the agent finds an optimal policy π∗. This policy gives the agent the knowledge to select the optimal action in a given state to achieve its goal. The update rule for Q-learning is as follows:

$$\begin{aligned} Q(s_{t}, a_{t}) &\leftarrow (1 - \alpha) Q(s_{t}, a_{t}) + \alpha \left[ r_{t} + \gamma \max\limits_{a} Q(s_{t + 1}, a) \right] \\ &= Q(s_{t}, a_{t}) + \alpha \left[ r_{t} + \gamma \max\limits_{a} Q(s_{t + 1}, a) - Q(s_{t}, a_{t}) \right] \end{aligned}$$
(22)

where α is the learning rate and γ is the discount factor of future rewards, with α, γ ∈ [0, 1]. \(Q(s_{t}, a_{t})\) is the Q-value when the action \(a_{t}\) is chosen in a given state \(s_{t}\), and \(r_{t}\) is the reward obtained for performing that action in the state \(s_{t}\). Furthermore, \(\max\limits_{a} Q(s_{t + 1}, a)\) is the maximal Q-value in the next state \(s_{t+1}\) over all possible actions a.
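A single application of the update rule in Eq. (22) can be written as the short function below. The nested-dictionary Q-table and the name `q_update` are choices made for this illustration only.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply Eq. (22) once and return the new Q(state, action).

    q is a dict of dicts: q[state][action] -> Q-value.
    """
    # Largest Q-value reachable from the next state (0 if it has no actions).
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]
```

For example, starting from \(Q(s_0, a) = 0\) with reward 1, \(\alpha = 0.5\), \(\gamma = 0.9\) and a best next-state value of 2, the update gives \(0 + 0.5 \times (1 + 0.9 \times 2 - 0) = 1.4\).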

4.2 SoC Scheduling Using FAHP-VWA and Q-Learning

4.2.1 Weight Determination of Four Multi-Criteria

Residual energy (RE), energy consumption rate (ECR), node location importance degree (NLID), and node role importance degree (NRID) are used as the four criteria in the current charging round. These criteria are weighted using FAHP-VWA. First, the triangular fuzzy numbers shown in Table 3 are assigned to the criteria by FAHP as relative weights. For all pairwise comparisons, \(d = 2\) is chosen.

Table 3 Pairwise comparison matrix between evaluation criteria

The weight of each evaluation criterion is calculated as shown in Table 4.

Table 4 Weight of each criterion

VWA compensates the criteria weights obtained by FAHP. The value of \(\alpha\) used is 0.2. The compensated weights are shown in Table 5.

Table 5 Compensated weights for evaluation criteria

4.2.2 Selection of Proactive Charging Nodes among the pBNs by Q-Learning

When performing the SoC scheduling, the BS first includes the \(n_{on - demand}\) CR nodes in the charging schedule to be generated and then selects the \(n_{average} - n_{on - demand}\) most suitable pBNs with Q-learning.

In our WRSN using Q-learning, the network and the WCV are regarded as the environment and the agent, respectively. A state corresponds to the current charging location of the WCV, and an action is defined as a movement to the next charging location. The WCV maintains its Q-table, which is constructed as a 2D array. Each row and each column represent a state and an action, respectively. An item Q(j, i) in the jth row and ith column denotes the Q-function value corresponding to the action in which the WCV travels from the current charging location j to the next location i. The Q-value decision matrix is constructed as follows:

$$Q = \begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1m} \\ q_{21} & q_{22} & \cdots & q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ q_{m1} & q_{m2} & \cdots & q_{mm} \end{bmatrix}$$
(23)

where \(q_{ij}\), i.e., \(Q(i,j)\), is the Q-value when the WCV moves from pBN \(i\) \((i = \overline{1,m})\) to pBN \(j\) \((j = \overline{1,m})\). It is updated as follows:

$$\begin{aligned} q_{ij} = Q(i,j) &\leftarrow (1 - \alpha) Q(i,j) + \alpha \left[ r(j) + \gamma \max_{1 \le k \le m} Q(j,k) \right] \\ &= Q(i,j) + \alpha \left[ r(j) + \gamma \max_{1 \le k \le m} Q(j,k) - Q(i,j) \right] \end{aligned}$$
(24)

where \(i\) and \(j\) denote the current pBN and the next one, respectively, \(Q(i,j)\) and \(r(j)\) denote the Q-value and the reward value when the WCV travels from the current pBN to the next one, respectively, and \(\alpha\) and \(\gamma\) are the learning rate and the future reward discount factor, respectively, each set to a value between 0 and 1. From Eq. (24), we can see that the current Q-value is updated by the temporal difference, i.e., the difference between the estimated target Q-value (\(r(j) + \gamma \mathop {\max }\limits_{1 \le k \le m} Q(j,k)\)) and the current Q-value (\(Q(i,j)\)), scaled by \(\alpha\). In the target Q-value, \(\mathop {\max }\limits_{1 \le k \le m} Q(j,k)\) represents the largest of the Q-values over all pBNs \(k\) that can follow pBN \(j\). In short, the new Q-value is computed from the current Q-value, the reward value, and the estimated maximum Q-value.
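The update of Eq. (24) can be illustrated with a minimal Python sketch. The function name `update_q`, the list-of-lists Q-table, and the example values are assumptions made for illustration only; the defaults \(\alpha = \gamma = 0.5\) follow the simulation settings reported in Sect. 5.1.

```python
def update_q(Q, i, j, reward, alpha=0.5, gamma=0.5):
    """Apply Eq. (24) in place for a move from pBN i to pBN j.

    Q is an m x m list of lists (hypothetical representation of the Q-table).
    """
    # Target value: immediate reward plus discounted best Q-value reachable from j.
    target = reward + gamma * max(Q[j])
    # Move the current estimate toward the target by a fraction alpha.
    Q[i][j] += alpha * (target - Q[i][j])
    return Q[i][j]


# Illustrative 3 x 3 Q-table (made-up values, not taken from the paper).
Q = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0]]
new_value = update_q(Q, i=0, j=1, reward=1.0)
```

In this example the target is 1.0 + 0.5 × 0.4 = 1.2, so the entry Q(0, 1) moves halfway from 0 to 1.2.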

The reward value for updating the Q-value is calculated as follows. First, the data of the \(m\) pBNs characterized by the four criteria are normalized in the same way as in Chang et al. (2017), which yields the decision matrix \(X = [x_{ij} \mid i = \overline{1,m};\; j = \overline{1,4}]\). Using the criteria weights in Table 5 obtained from FAHP-VWA and Eq. (23), the weighted decision matrix \(Y\) is obtained as follows:

$$Y = \begin{bmatrix} y_{11} & y_{12} & y_{13} & y_{14} \\ y_{21} & y_{22} & y_{23} & y_{24} \\ \vdots & \vdots & \vdots & \vdots \\ y_{m1} & y_{m2} & y_{m3} & y_{m4} \end{bmatrix}$$
(25)

where \(y_{ij} = w_{j} \times x_{ij} ,\,\,i = \overline{1,m} ;\,\,\,\,j = \overline{1,4}\).

Using the weighted decision matrix \(Y\), the reward value \(r_{1} (j)\)\((j = \overline{1,m} )\) when moving from the current pBN to the next one is calculated as follows:

$$r_{1}(j) = \frac{1 - y_{j1}}{\sum\limits_{i = 1}^{m} y_{i1}} + \frac{y_{j2}}{\sum\limits_{i = 1}^{m} y_{i2}} + \frac{y_{j3}}{\sum\limits_{i = 1}^{m} y_{i3}} + \frac{y_{j4}}{\sum\limits_{i = 1}^{m} y_{i4}}, \quad j = \overline{1,m}$$
(26)

The obtained reward value is then renormalized to the interval [0, 1]:

$$r(j) = \frac{{r_{1} (j)}}{{\mathop {\max }\limits_{j \in BN} r_{1} (j)}},\,j = \overline{1,m}$$
(27)

where \(y_{ji}\) \((i = \overline{1,4})\) are the weighted normalized decision values of pBN \(j\) for the four criteria RE, ECR, NLID, and NRID, respectively. The term \((1 - y_{j1})\) is used for RE, a criterion that should give higher priority when it has a smaller value. From Eq. (26), we can see that the lower the residual energy, the higher the energy consumption rate, the higher the node location importance degree, and the higher the node role importance degree of a pBN, the higher its reward value.
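As a rough illustration of Eqs. (25) to (27), the following Python sketch computes the renormalized rewards from a normalized decision matrix. The function name `rewards`, the example matrix, and the weights are hypothetical (the values of Table 5 are not reproduced here); the sketch takes all four numerators from the row of pBN \(j\) and assumes every column sum of \(Y\) is positive.

```python
def rewards(X, w):
    """Return r(j) for each pBN j from a normalized m x 4 matrix X and 4 weights w.

    Column order is assumed to be RE, ECR, NLID, NRID.
    """
    m = len(X)
    # Eq. (25): weighted decision matrix, y_ij = w_j * x_ij.
    Y = [[w[c] * X[j][c] for c in range(4)] for j in range(m)]
    col_sum = [sum(Y[j][c] for j in range(m)) for c in range(4)]
    r1 = []
    for j in range(m):
        # Eq. (26): RE enters as (1 - y_j1) so that lower residual energy scores higher.
        value = (1.0 - Y[j][0]) / col_sum[0]
        value += sum(Y[j][c] / col_sum[c] for c in (1, 2, 3))
        r1.append(value)
    # Eq. (27): renormalize by the largest raw reward.
    peak = max(r1)
    return [v / peak for v in r1]


# Two made-up pBNs: the first has less residual energy and a higher ECR.
example_X = [[0.2, 0.8, 0.5, 0.5],
             [0.8, 0.2, 0.5, 0.5]]
example_w = [0.4, 0.3, 0.2, 0.1]  # illustrative weights, not Table 5
example_r = rewards(example_X, example_w)
```

With these inputs the first pBN receives the maximum reward of 1, and the second, which has more residual energy and a lower consumption rate, receives a smaller one.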

Using this reward value, charging prioritization proceeds as follows. When the pBN of a row in the Q-value decision matrix of Eq. (23) is the current one, the BS selects the pBN of the column with the largest Q-value in that row as the next one. This selection is repeated until the number of selected pBNs reaches \(n_{average} - n_{on - demand}\). Table 6 shows an example of the Q-value decision matrix for the 6 proactive charging nodes selected from the predicted pBNs when the charging capability of the WCV, \(n_{average}\), is 10 and the number of CR nodes is 4. Assume that request node 4 (RN4) is the last CR node. The BS first includes in the charging round pBN5, which has the highest Q-value among the 6 pBNs in the RN4 row of Table 6, and sets the Q-value at the intersection of the pBN5 row and the pBN5 column to 0. It then moves to the pBN5 row and repeats this step until \(n_{average} - n_{on - demand}\) pBNs are included in the charging round. The resulting selection order of the 6 pBNs is pBN5 → pBN6 → pBN2 → pBN1 → pBN3 → pBN4. In the end, the charging schedule contains the 4 CR nodes first, followed by the 6 pBNs selected with Q-learning.
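The row-by-row selection described above can be sketched in Python as follows. This is a simplified illustration, not the authors' Algorithm 1: the function name `select_proactive_nodes` and the Q-values are made up, and already selected pBNs are excluded explicitly, which is an assumption standing in for the zeroing step.

```python
def select_proactive_nodes(start_row, Q, count):
    """Greedily pick `count` pBNs by following the largest Q-value in each row.

    start_row: Q-values from the last CR node to each pBN (hypothetical input).
    Q: m x m table of Q-values between pBNs.
    """
    selected = []
    row = start_row
    while len(selected) < count:
        # Assumption of this sketch: a pBN already in the schedule is skipped.
        candidates = [k for k in range(len(row)) if k not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda k: row[k])
        selected.append(best)
        # Continue from the row of the pBN that was just selected.
        row = Q[best]
    return selected


# Made-up Q-values for three pBNs (indices 0, 1, 2), not the values of Table 6.
start_row = [0.2, 0.9, 0.5]
Q = [[0.0, 0.3, 0.6],
     [0.7, 0.0, 0.4],
     [0.8, 0.1, 0.0]]
order = select_proactive_nodes(start_row, Q, count=3)
```

Here the walk starts at pBN 1 (largest value in the starting row), moves to pBN 0 (largest remaining value in row 1), and finishes with pBN 2.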

Table 6 Q-values of 6 pBNs

The BS estimates the energy consumption rate in the same way as in Zhu et al. (2018) and updates the Q-value decision matrix using the received or measured criteria information for the pBNs, including this estimate.

The pseudocode of the SoC scheme using FAHP-VWA and Q-Learning described above is shown in Algorithm 1.

Algorithm 1 SoC scheduling using FAHP-VWA and Q-Learning

5 Performance Evaluation

In this section, we present and analyze the results of an extensive simulation of the proposed scheme. We compare its performance with that of the following four schemes, including BP&R proposed in Cheng and Yu (2020).

  • BP&R: a scheme that uses a fixed deadline threshold-based pBNs prediction method and selects the proactive charging nodes among the predicted pBNs randomly

  • FL-based: a scheme that uses the same pBNs prediction method as the BP&R and selects the proactive charging nodes among the predicted pBNs with fuzzy logic (Tomar et al. 2019)

  • AHP&TOPSIS: a scheme that assigns weights to multi-criteria with AHP, predicts the pBNs among non-CR nodes using these weights and selects the proactive charging nodes among the predicted pBNs with TOPSIS (Tomar and Jana 2021)

  • FAHP-VWA-TOPSIS: a scheme that first assigns weights to multi-criteria with FAHP, then compensates the weights by VWA, predicts the pBNs among non-CR nodes using the weights assigned by FAHP-VWA, and selects the proactive charging nodes among the predicted pBNs with TOPSIS (Mangun et al. 2023)

To avoid an unfair comparison, the compared schemes are partially modified. That is, the same four criteria as in the proposed scheme are used in the prediction and selection of the pBNs for AHP&TOPSIS and FAHP-VWA-TOPSIS, and in the selection of the pBNs for the FL-based scheme.

Performance metrics include energy usage efficiency, density of high efficiency nodes, proactive charging rate of backbone nodes, received packet rate, and network lifetime. Since the performance of SoC schemes is strongly related to CR frequency, we only evaluate the impact of the CR frequency \(f\) on the above performance metrics although several parameters such as the number of sensor nodes, simulation time, and moving speed of WCV affect charging and network performance.

5.1 Simulation Environment

Simulation is conducted in MATLAB R2016a on an HP 6360t with 4 GB RAM and an Intel Core i5 processor. The simulation environment is a 1000 m × 1000 m area in which 2000 nodes are deployed uniformly at random. The BS is located in the bottom left corner at coordinates (0, 0), and the other parameters are set as described in Table 7 with reference to Cheng and Yu (2020). The battery capacities of each sensor node and the WCV are 100 J and 200000 J, respectively. We set the maximum charging capability of the WCV, \(n_{\max }\), to 30. The upper bound of the CR issuing frequency that NJNP can safely control then becomes 0.012 for the 1000 m × 1000 m area. Namely, the WCV can charge the nodes in a SoC scheme if and only if the CR frequency is less than 0.012. In the experiments, the CR frequency is expressed as the number of CRs issued per 1000 s to avoid confusion and misinterpretation, so the upper bound of the CR issuing frequency becomes 12. The WCV moves at a fixed speed of 5 m/s with a moving energy consumption rate of 10 J/m. Each node has a standard energy consumption rate of \(10^{-3}\) J/s for normal data sensing, processing, transmitting, and receiving, which varies randomly between \(10^{-4}\) J/s and \(10^{-2}\) J/s to account for the occurrence of incidents.

Each polygon defined by the Voronoi diagram denotes a grid, i.e., the actual area monitored by one sensor node. With the whole monitoring area denoted as D, Fig. 2 shows the important grids within D as green points. These green points, denoted G, indicate important locations such as roads or battlefields. In our simulation, assuming that sensing must be performed at intervals of \(t_{1} = 2.5\) s for grids within G and \(t_{2} = 5\) s for the other grids, we compute the importance degree of each grid as follows:

$$a_{ij} = \begin{cases} 1/t_{1}, & g_{ij} \in G \\ 1/t_{2}, & g_{ij} \in D - G \end{cases}$$
(28)
Table 7 Simulation parameters
Fig. 2 Simulation network with nodes’ Voronoi diagram and important locations shown as green points

From this, \(\phi_{ij} (t)\) and \(w_{i} (t)\) can be calculated. The monitoring objects appear in G twice as frequently as in other locations.
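For concreteness, Eq. (28) with the simulation values \(t_{1} = 2.5\) s and \(t_{2} = 5\) s can be written as a small Python helper. The function name `grid_importance` and the boolean membership flag are illustrative assumptions; how membership of a grid in G is determined is not shown.

```python
T1 = 2.5  # sensing interval (s) for grids inside the important region G
T2 = 5.0  # sensing interval (s) for all other grids in D


def grid_importance(in_important_region, t1=T1, t2=T2):
    """Eq. (28): importance degree is the reciprocal of the sensing interval."""
    return 1.0 / t1 if in_important_region else 1.0 / t2


a_important = grid_importance(True)
a_ordinary = grid_importance(False)
ratio = a_important / a_ordinary
```

With these intervals the importance degrees are 0.4 and 0.2, so grids in G weigh twice as much as the others, matching the factor of two stated above.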

For the heterogeneous traffic load of nodes in sensing data collection or target tracking, we used the equal hierarchical cluster-based method in Man Gun Ri Aug. (2022). In the selection of the proactive charging nodes among the pBNs by Q-Learning, the learning rate \(\alpha\) and the discount factor of future reward \(\gamma\) are both set to 0.5. For fairness of comparison, each plotted simulation result is the average over 20 random cases.

5.2 Simulation Results and Analysis

5.2.1 Energy Usage Efficiency

Energy usage efficiency is defined as the ratio of the energy obtained by the sensor nodes to the total energy transferred from the BS to the WCV (Zhu et al. 2018). Figure 3 shows the simulation results. The proposed scheme and FAHP-VWA-TOPSIS have higher energy usage efficiency than the other three schemes. The BP&R scheme has the lowest energy usage efficiency among the five schemes, followed by the FL-based scheme. This is because, in the BP&R and FL-based schemes, a pBN is predicted while it still has more energy left, since the deadline is always calculated using the largest fixed threshold regardless of the increase in CR frequency. Consequently, for the same energy consumed by the travel of the WCV, the charging energy transferred to the proactive charging nodes is less than in the proposed scheme, FAHP-VWA-TOPSIS, and AHP&TOPSIS, which predict the pBNs within a charging round. More energy is therefore left in the WCV, which leads to lower energy usage efficiency. The AHP&TOPSIS scheme neither uses fuzzy numbers nor compensates the assigned weights. In contrast, the proposed scheme and FAHP-VWA-TOPSIS predict the pBNs corresponding to the WCV’s charging capability more accurately, based on criteria weights assigned with FAHP and compensated by VWA, so they always have higher energy usage efficiency than the AHP&TOPSIS scheme.

Fig. 3 Energy usage efficiency in terms of f

Meanwhile, the pBN selection algorithm also influences energy usage efficiency. Among the five compared schemes, only the BP&R scheme selects pBNs randomly, so it performs worst on this metric. The performance of the proposed scheme depends on its proactive charging node selection method, that is, on the optimality of the selection by Q-Learning. Since the proposed scheme considers the four criteria jointly to select \(n_{proactive} = n_{average} - n_{on - demand}\) pBNs as proactive charging nodes, this metric increases progressively with increasing f. The energy usage efficiency of the FAHP-VWA-TOPSIS scheme is almost equal to, or slightly better than, that of the proposed scheme. When the CR frequency increases, more CRs are issued to the BS. The average traveling distance can then be further reduced by a scheduling method that considers only the distance factor, which reduces the average energy consumed by WCV movement between nodes and leads to better energy usage efficiency. On the other hand, as the CR frequency increases, the number of proactive charging nodes decreases, so the improvement in energy usage efficiency also diminishes.

5.2.1.1 Density of High Efficient Nodes

The density of high-efficiency nodes (Zhang et al. 2015) is defined as the density of nodes whose energy is higher than a certain threshold. The proposed scheme and FAHP-VWA-TOPSIS achieve the highest density of high-efficiency nodes, followed by the AHP&TOPSIS and FL-based schemes in that order, and the BP&R scheme achieves the lowest. As with the previous metric, the simulation results show that the pBN prediction and selection algorithms greatly influence performance.

In contrast to the simulation results for the case in which the energy consumption rates of the nodes are constant, the proposed scheme and FAHP-VWA-TOPSIS outperform the BP&R scheme on this metric by 5% on average over the whole frequency range used in the simulation, as shown in Fig. 4. In other words, the proposed scheme has 100 more nodes with an energy level higher than the specified threshold. The FAHP-VWA-TOPSIS scheme performs almost the same as the proposed scheme. Thus, more nodes are at a higher energy level in the proposed scheme and FAHP-VWA-TOPSIS, indicating that their SoC scheduling algorithms maintain the energy charging of nodes well, so that the WCV can operate in equilibrium for longer periods of time. For the proposed scheme and FAHP-VWA-TOPSIS, this metric behaves roughly opposite to energy usage efficiency over the frequency range in which proactive charging is possible in the simulation. Namely, as the CR frequency increases, which reflects an increase in the number of on-demand charging nodes, the number of proactive charging nodes, which always have energy higher than the threshold, decreases, so the density of high-efficiency nodes also decreases.

Fig. 4 Density of high efficient nodes in terms of f

5.2.1.2 Proactive Charging Rate of Backbone Nodes

This metric is the ratio of the number of successfully proactive-charged backbone nodes to the total number of backbone nodes predicted as potential bottlenecks, and it is one of the important metrics for evaluating the proactive charging characteristics of a SoC scheme. The results in Fig. 5 show that the BP&R and FL-based schemes maintain a proactive charging rate of 100% up to CR issuing frequencies of 14 and 15, respectively, both beyond the upper bound frequency of 12, whereas the proposed, FAHP-VWA-TOPSIS, and AHP&TOPSIS schemes maintain 100% up to a frequency of 16.

Fig. 5 Proactive charging rate of backbone nodes in terms of f

This is because the proposed, FAHP-VWA-TOPSIS, and AHP&TOPSIS schemes can preferentially charge the backbone nodes by taking into account the node location importance degree (NLID) and the node role importance degree (NRID), even when the number of proactive charging nodes decreases with increasing frequency f. NLID prioritizes the pBNs located in parts of the monitoring area where the target occurs more frequently, and NRID prioritizes the pBNs with heavier traffic load, such as cluster head nodes. As a result, the proactive charging rate of backbone nodes is significantly improved in comparison with the BP&R scheme. The proposed scheme and FAHP-VWA-TOPSIS perform almost the same on this metric. The FL-based scheme also uses the NLID and NRID criteria to select the pBNs, and thus it has a higher proactive charging rate of backbone nodes than the BP&R scheme, which selects the pBNs randomly.

5.2.1.3 Received Packet Rate

This metric is defined as the ratio of the number of packets received by the BS to the total number of packets generated by the sensor nodes in the network. The simulation results in Fig. 6 show that the received packet rates of the proposed scheme and FAHP-VWA-TOPSIS are higher than those of the other compared schemes. In the proposed scheme and FAHP-VWA-TOPSIS, fuzzy numbers are introduced in assigning the weights to the criteria and weight compensation is performed, so that a received packet rate of more than 80% can be achieved even at a CR issuing frequency of 18, where the schemes operate in pure on-demand charging. Naturally, since the BP&R scheme uses the fixed deadline threshold-based estimation method and a random pBN selection algorithm, it has the lowest received packet rate among the five schemes.

Fig. 6 Received packet rate in terms of f

When the CR issuing frequency reaches 18, the AHP&TOPSIS and FL-based schemes provide received packet rates of 73% and 70%, respectively. Although these two schemes use the same four criteria as the proposed scheme and FAHP-VWA-TOPSIS in selecting the pBNs, they predict the pBNs with AHP and with a fixed deadline threshold, respectively, both of which are inferior to the prediction method of the proposed and FAHP-VWA-TOPSIS schemes. Thus, their received packet rates are lower than those of the proposed scheme and FAHP-VWA-TOPSIS.

5.2.1.4 Network Lifetime

Figure 7 shows the simulation results for the network lifetime, defined as the time until the first sensor node dies. Among the five compared schemes, the proposed scheme and FAHP-VWA-TOPSIS improve the network lifetime of the WRSN the most, since both predict the pBNs accurately with FAHP-VWA and use Q-Learning and TOPSIS, respectively, to select the optimal proactive charging nodes. This is because they remarkably improve all charging performance metrics, such as energy usage efficiency, density of high-efficiency nodes, and proactive charging rate of backbone nodes, as shown in the preceding simulation results. The network lifetime of the FAHP-VWA-TOPSIS scheme becomes slightly higher than that of the proposed scheme from a CR issuing frequency of 17 onward. This indicates that, when operating as a pure on-demand scheme, TOPSIS is superior to Q-Learning in selecting the CR nodes. Next in order of network lifetime is the AHP&TOPSIS scheme, which predicts the pBNs with AHP and selects the proactive charging nodes among the predicted pBNs with TOPSIS, thus achieving a longer network lifetime than the FL-based and BP&R schemes. In the BP&R and FL-based schemes, dead nodes begin to occur at a CR issuing frequency of 15, versus 16 in the proposed and AHP&TOPSIS schemes. Unlike the BP&R and FL-based schemes, which use only a deadline criterion with a fixed deadline threshold, the proposed and AHP&TOPSIS schemes use the NLID and NRID criteria as well as the deadline criterion. They thereby account for the dynamic change in energy consumption rate caused by accidental events and for the heterogeneous traffic load of the pBNs without the deadline estimation mechanism, which extends the network lifetime.

Fig. 7 Network lifetime in terms of f

Also, unlike the BP&R scheme, which randomly selects the proactive charging nodes among the predicted pBNs, the proposed, FAHP-VWA-TOPSIS, AHP&TOPSIS, and FL-based schemes use four criteria including NLID and NRID to select suitable pBNs, including cluster head nodes, as the proactive charging nodes with Q-Learning, TOPSIS, and fuzzy logic, respectively. In particular, the proposed scheme accurately assigns the weights of the four criteria with FAHP-VWA and, based on them, computes Q-values that reflect the state of each pBN with Q-Learning, thereby selecting the most suitable pBNs as the proactive charging nodes, which further extends the network lifetime.

So far, we have mainly considered the scenario in which the number of CRs does not exceed the WCV’s charging capability. In a practical scenario in which the number of CRs exceeds the WCV’s charging capability, a single WCV-based method is naturally not scalable. In that case, the entire network can be divided by using k-means or fuzzy c-means (FCM) to balance the CR workload, and one WCV may be assigned to each sub-area. A charging schedule is then made by using an on-demand charging scheme or the SoC scheme proposed in this article, according to the CR workload in each sub-area and the WCV’s charging capability.
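As a rough sketch of the partitioning idea, the following Python code applies plain k-means (Lloyd's algorithm) to the coordinates of CR nodes, so that each resulting sub-area could be served by one WCV. The function name `partition_by_kmeans`, the naive initialization from the first k points, and the coordinates are illustrative assumptions. Plain k-means groups nodes by distance only and does not by itself balance the CR workload.

```python
import math


def partition_by_kmeans(points, k, iterations=20):
    """Cluster 2D points into k groups; returns (labels, centers).

    Assumes len(points) >= k. Centers start at the first k points
    (a naive, deterministic initialization chosen for this sketch).
    """
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels, centers


# Made-up CR node coordinates (m) in a 1000 m x 1000 m area.
requests = [(10.0, 10.0), (20.0, 15.0), (900.0, 950.0), (950.0, 900.0)]
labels, centers = partition_by_kmeans(requests, k=2)
```

In this example the two nodes near the origin end up in one sub-area and the two nodes near the opposite corner in the other.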

6 Conclusion

The key to SoC scheduling is the accurate prediction of the pBNs according to the CR issuing frequency in the network and the optimal selection, among the predicted pBNs, of the proactive charging nodes to be included in a charging round. The SoC scheme using FAHP-VWA and Q-Learning proposed in this article allows the BS to accurately predict the pBNs with FAHP-VWA according to the CR issuing frequency, and its Q-Learning-based selection method selects the proactive charging nodes by jointly using multiple criteria: residual energy, energy consumption rate, node location importance degree, and node role importance degree. Simulation results show that the proposed scheme greatly improves network performance compared to previous methods in terms of energy usage efficiency, density of high-efficiency nodes, proactive charging rate of backbone nodes, received packet rate, and network lifetime. The proposed scheme assigns weights to the criteria by weight-compensated FAHP. However, since FAHP uses a fuzzy paired ratio scale, it may still exaggerate the weights of the criteria. In addition, a comprehensive comparison of which method is best for selecting the proactive charging nodes among the predicted pBNs, covering several MCDMs such as TOPSIS, VIKOR, ELECTRE, and PROMETHEE as well as Q-Learning, has not yet been carried out. Moreover, this article focuses only on SoC scheduling.

In the future, we will first extend the proposed design idea to the entire process of on-demand charging scheduling using an integrated FAHP-VWA&Q-Learning, and then combine an MCDM that adopts a fuzzy paired interval scale with the best method among the above-mentioned MCDMs.