1 Introduction

Due to the development of the Internet of Things (IoT), we now face a wide range of applications in the commercial [1], industrial [2, 3], military [4], monitoring [5], smart home [6], smart city [7,8,9], underwater [10] and medical [11] fields [12]. The Internet of Medical Things (IoMT) is one of the important applications; it uses a combination of biosensors and IoT devices to track human physiological activity and reactions. Because of the importance of medical applications, a great deal of research has recently been conducted in the field of IoMT. However, most of the existing methods focus on only one aspect of the requirements of IoMT. Since IoMT applications generate large amounts of data from various sensors, they require QoS-aware computations for real-time processing. To address these needs, a multi-layer architecture has recently been proposed [13, 14]. While this architecture uses three layers, it fails to consider the challenges faced in each layer and develops its method only in the first layer. To bridge this gap, we have developed a three-layer architecture that coordinates and integrates all layers for medical use. Therefore, the main motivation of this research is to propose a three-tier architecture for data collection, routing, processing, and storage that improves network effectiveness.

In the first layer, the biosensors can be lightweight, small-size, and low-power and can be placed inside/outside the body. They sense various physiological data of the body and the patient’s vital signs, such as temperature, heart rate, blood pressure, electrocardiogram (ECG) and electroencephalogram (EEG), and then send data to the coordinator node [15]. Then, the coordinator nodes send data to medical centers through intermediary nodes using wireless communication. The coordinator nodes are in charge of preprocessing and data aggregation in addition to routing. The connections between the coordinator nodes and the sink are defined in the second tier. After initial processing on an edge device, data are transmitted to a remote server or cloud environment for storage or processing. In the third tier, a general practitioner or staff member of a hospital or medical center evaluates the detected parameters remotely and makes the best decision for the patient. Furthermore, gathered data on the cloud can be analyzed by using machine learning techniques to predict, categorize, or identify diseases.

According to the proposed architecture, it is essential to develop a centralized scheduling method for the first layer and an appropriate routing protocol for the second layer in order to address issues such as high node density, varying data rates, network mobility, resource limitations, and reasonable quality of service [16, 17]. Routing protocols designed for Wireless Sensor Networks (WSN) or for general IoT cannot be used in IoMT networks [18]. Therefore, to take the characteristics of IoMT networks into account, we present a centralized scheduling approach and a machine learning-based routing protocol in this paper. Although different routing protocols have been proposed, they could not address all the mentioned challenges. For this reason, we propose a routing schema, EQRSRL, that addresses most of these challenges. The aim of EQRSRL is to meet all IoMT requirements simultaneously. To do this, we focus on the following main points of interest in designing EQRSRL:

  • QoS IoMT data are of different types because they deal with human vital indicators. Decisions concerning factors that threaten the patient’s life should be made as soon as possible. Furthermore, data from patients with severe symptoms that require prompt treatment should be delivered as soon as possible and with high reliability. Quality of service (QoS) is a primary concern in IoMT, with the aim of maintaining a secure connection, high availability, and minimal transmission delay and packet loss.

  • Energy awareness It is difficult to charge or replace sensors on IoT devices. Therefore, power consumption is an important concern in IoMT and it requires energy-efficient routing methods.

  • Mobility The network architecture defines how various sensor nodes interact with one another. IoMT requires the use of a routing algorithm that takes network mobility into account.

  • Data rates The data rate is the speed at which data are transferred between source and destination. In the medical field, data transfer rates and the importance of data vary between sensors. It is crucial to develop routing protocols that can deal with the variety of data rates in IoMT.

Since human life is at stake, providing the mentioned parameters in an IoMT environment is essential. Hence, EQRSRL considers these parameters to increase both QoS and energy efficiency in IoMT.

The organization of this paper is as follows. In Sect. 2, we review some related work in IoMT. Sect. 3 describes the architecture and model of EQRSRL. In Sect. 4, we experimentally evaluate EQRSRL and explain the simulation results. Finally, Sect. 5 concludes the paper.

2 Related work

In this section, we review and describe some earlier research work in the scope of IoMT. We categorize routing protocols into four groups: QoS-aware, Temperature-aware, Cluster-based, Cost or Energy-aware [15]. Each group covers a different aspect of IoMT’s limitations. In the following, we will examine each of them separately.

2.1 QoS-aware routing protocols

This group of protocols considers different metrics to provide QoS. They aim to provide higher reliability, lower end-to-end delay, and a higher packet delivery ratio [19,20,21]. In [22], the authors proposed a local multipurpose routing algorithm. In their method, data traffic is categorized into four classes with respect to the desired QoS: normal, reliability-sensitive, delay-sensitive, and critical traffic. To do this, they considered four different modules, each of which is responsible for routing a specific class. In IoMT networks, mobility is unavoidable; nonetheless, the authors paid no attention to node mobility. DMQoS [23] is a geographical routing protocol in which data packets are classified into four categories (normal, critical, latency, and reliability packets). Moreover, it consists of five modules to route each of the mentioned categories. However, using the location data of sensor nodes in a Wireless Body Area Network (WBAN) is challenging and is not practical in most circumstances. Additionally, DMQoS increases the traffic load, which causes collisions and decreases reliability. In [24], the authors proposed a QoS-aware routing protocol for delay-sensitive data named QPRD, in which data are split into normal and delay-sensitive groups. The best path is selected for both groups of data according to the QoS requirements. However, it does not provide any mechanism for ensuring transmission reliability. ZEQoS [20] is another QoS-aware routing protocol that considers energy requirements, end-to-end delay, and reliability in order to improve QoS. It uses more power than the QPRD protocol and does not perform reasonably in dynamic networks: although ZEQoS considers QoS requirements, it pays no attention to node movement and the changing network topology. A reinforcement Q-learning routing method for IoMT is presented in [25] to identify optimum routing rules based on QoS. While this method uses reward and penalty in dynamic environments and adapts to the condition of the network, it is not scalable and does not take energy into account. Ahmed et al. proposed an improved quality-aware routing protocol (IM-QRP) for remote health monitoring of the elderly or chronically ill in hospitals and residential environments [26]. The proposed protocol shows a significant improvement in energy consumption and quality of service criteria compared to similar routing protocols. Although IM-QRP follows a three-layer architecture, routing is performed only in the first layer. Memon et al. presented a QoS-aware routing protocol (TLD-RP) that considers temperature and QoS requirements simultaneously [27]. QoS metrics such as link stability, reliability, and delay are taken into account in TLD-RP. According to its simulation results, TLD-RP improves WBSN performance in terms of throughput, packet delivery, network overhead, and link stability.

Although QoS-based protocols pay attention to the criteria of delay and reliability, in most of these methods, routing is only done in the first layer (from biosensors to coordinator nodes). To bridge this gap, we considered QoS metrics in the second layer, where the most delay and packet loss occurs.

2.2 Thermal-aware routing protocols

These protocols use the temperature of sensor nodes as route selection criteria [9]. Thermal-Aware Routing Algorithm (TARA) [28] is one of the first of these protocols in which data routes from the source to the sink by avoiding hotspot nodes. In TARA, some important factors such as reliability are not guaranteed. Bag et al. proposed LTR to select the nodes with the lowest temperature as neighbours. LTR uses a fixed number of hops count to send packets to the sink node [29]. In another work, they proposed ALTR [30] as the extension of LTR. The difference between ALTR and LTR is that when the hops count exceeds the threshold, ALTR sends the packet to the sink through a path with the lower number of hops. LTRT [31] is a combination of LTR and shortest path routing that selects a node with the lowest temperature as the next-hop. LTRT requires precise information about the node temperature in the network. This policy is not efficient, because it consumes a great deal of energy. In [32], the authors proposed a routing mechanism in which the nodes adjust their energy level based on the distance to the neighbouring nodes. In their method, the energy value of each node is calculated from the value of the Received Signal Strength Index (RSSI) of its neighbors. By choosing a path that is close to the maximum RSSI value, less energy is required, leading to energy efficiency and minimal heat production. EOCC-TARA [33] uses advanced multi-objective spider monkey optimization in order to provide a temperature-aware routing algorithm in WBAN. EOCC-TARA simultaneously considers several metrics, including energy, link reliability, path loss, and queue length. However, EOCC-TARA needs to gather information from the whole of the network in SDN to find the best routes, therefore it has a high routing overhead.

Most temperature-aware routing protocols apply single or combined routing criteria such as temperature, hop count, or energy. Some studies use the sleep-wake mechanism to prevent skin damage: when the temperature of a sensor exceeds a threshold, it goes into sleep mode, so vital information is ignored until the sensor cools [34, 35]. Consequently, some of these methods do not consider QoS and are not suitable for real-time applications.

2.3 Cluster-based routing protocols

These types of protocols divide the entire network into several clusters. Each cluster is composed of a collection of members and a cluster head (CH). Many techniques have been proposed to select the cluster head. The CH’s task is to gather, aggregate, and transmit data from member nodes to the sink. The main goal of these protocols is to decrease direct communications between the sources and the sink node [36]. In [37], the authors use a fuzzy logic system to select the CHs. Their proposed protocol is energy-efficient and uses direct transmission between the source and the base station node, depending on the location of the sensor node. In [38], the authors introduced a self-organizing routing protocol that divides the network into clusters in which the CHs are randomly chosen, which leads to higher energy consumption. The solution presented in [39] connects multiple WBANs. It uses high-reliability CHs to transfer data packets from source to sink while minimizing collisions and improving QoS. This protocol increases network stability and extends the network lifetime, but it increases delay. Recently, heuristic and meta-heuristic algorithms have been used to cluster nodes in different fields of IoT applications. In [40], the malicious nodes in the network are detected before clusters are created. Then, a multi-objective firefly algorithm is used to choose the CH for each cluster. Different types of data are gathered by a number of body sensor nodes and transferred to the CH. The CH then sends the data it has gathered to the sink and provides the information to the system via the gateway. Nazari et al. introduce a clustering strategy for the Internet of Things (IoT) based on software-defined networking (SDN) and genetic algorithms. Their method determines the necessary number of clusters and makes sure that CHs are distributed throughout the environment. After clustering, CHs use greedy distance-based routing to send data to the sink; this multi-hop routing prolongs the network lifetime [6]. DECR presents an energy-efficient two-hop-based clustering and routing protocol [41]. DECR utilizes a modified grey-wolf optimization algorithm to select the CH and optimize routing, taking into account node connectivity and residual energy. DECR determines the optimal number of clusters by considering intra-cluster and inter-cluster transmission distances. A routing algorithm is also proposed to ensure energy-efficient packet delivery from CH to sink. Simulation results show that DECR significantly outperforms existing clustering and routing protocols in various performance metrics.

Although clustering has been successful in reducing energy consumption in many IoT applications, using this technique in IoMT is very inefficient due to the delay of the cluster-based routing process.

Table 1 Comparison between related researches

2.4 Cost-based protocols

Considering energy as a cost is one of the most important aspects of designing routing protocols for IoMT. Cost-based protocols attempt to estimate the cost of a link regarding the suitableness of that link in the routing process [42]. LAEEA [43] is a routing protocol that selects the least count of hops to reduce energy consumption in the routing process for the transmission of data packets toward the sink node. Nodes with the most energy and the shortest distance from the sink are selected as intermediate nodes in this protocol. In [45], an efficient and reliable energy routing scheme (ERRS) is proposed to increase reliability. ERRS uses a technique called adaptive static clustering to improve reliability and prolong the lifetime of the network. However, it does not provide any assurance of QoS and interference protection. IM-SIMPLE [44] is a cost-based protocol that includes three phases: Setup, Next-Hop Selection, and Scheduling. In IM-SIMPLE, the cost function is calculated based on the remaining energy of the nodes and their distance to the sink. At the setup phase, the sink and other sensor nodes broadcast a packet including information about their position, identifier, and remaining energy. Then, each node investigates the information obtained from its neighbor nodes, and selects the node with the lowest cost as a next-hop. Finally, at the scheduling phase, the sending node allocates a time slot using TDMA for the next-hop node. To decrease energy consumption, IM-SIMPLE employs a linear mathematical model. In SMORP [46], energy consumption is distributed equally among all nodes using a spider monkey optimization algorithm. Energy distribution serves to preserve network connectivity and stability. SMORP chooses nodes with the highest residual energy, the lowest traffic load, and the closest proximity to the sink, as a next-hop.

Cost-based methods deal with different aspects of network requirements. Creating a trade-off between different requirements is the main challenge in these protocols. It should be noted that most of the previous research has addressed the routing issue only in the first layer. Paying attention to all layers therefore seems necessary, yet it is ignored in most of these works.

In order to compare the above research work in terms of some influential parameters in IoMT such as mobility, energy, latency, reliability and temperature, we summarize them in Table 1. As shown in this table, some research works focus only on temperature or energy consumption. However, none of them have been specifically designed to provide better reliability and QoS along with low power consumption and high mobility. In this paper, we aim to achieve better performance in terms of overall latency, reliability, power consumption, and mobility by providing a routing protocol based on reinforcement learning.

3 EQRSRL architecture

In EQRSRL, we assume that the network is based on a three-tier architecture. In the first layer, patients are connected to several biosensors, each of which is in charge of gathering particular data. The sensors collect data and transfer it to the coordinator node according to a proposed scheduling algorithm. At the second layer, coordinators perform data preprocessing and forward data toward a sink node by using hop-by-hop communication. The sink node is connected to an edge device, and each patient’s information flow is processed separately on this node. The processed data are aggregated, summarized, and finally delivered to the third layer for further processing or storage. Various techniques, such as machine learning algorithms, can be implemented in the third layer for data analysis such as disease detection, drug prescription effects, and so forth. Therefore, experts can view these results in an application and use them to analyze the disease recovery process. The overall architecture of the system is shown in Fig. 1.

Fig. 1 System Model. (Tier 1: patient information is collected through biosensors and transferred to the coordinator node according to the proposed scheduling algorithm. Tier 2: Coordinators perform data preprocessing and send data to the sink node using hop-by-hop communication according to the proposed routing algorithm. Tier 3: All the data are stored in the cloud, then machine learning algorithms are applied to analyze data such as disease diagnosis, drug administration effects, etc.)

As shown in Fig. 1, after the biosensors collect and send data, the coordinator node divides the data into three categories: normal, high priority, and real-time. Energy-sensitive, delay-sensitive and service-sensitive paths are used to send normal, high priority and real-time data, respectively. As mentioned in Sect. 1, in order to calculate appropriate routes, energy, location, latency and packet delivery ratio (PDR) should be taken into account. It is worth noting that considering all of these metrics in a single tier is difficult and in most cases does not lead to desirable results. Hence, we implement a cross-layer solution in which the physical and MAC layer functionalities are deployed in the first tier and the routing functionalities are deployed in the second tier. As depicted in Fig. 1, the main goal of the first and second tiers is to gather and deliver the data to the third tier (cloud tier) for further analysis. In this paper, we do not address the third tier; we propose EQRSRL to handle the QoS issues in the first two tiers for IoMT applications.

3.1 First tier: physical parameters

As it is shown in Fig. 1, the functionality of the first tier is to sense the environment (Human body) and send the sensed data in a hop-by-hop fashion toward the coordinator. In this tier, in order to make EQRSRL more realistic, we attempt to model energy consumption, delay and PDR similar to real world communication models. We describe each of them as follows.

We assume that in EQRSRL, biosensors and coordinator nodes use different power consumption models. Hence, we use the energy consumption model for biosensor nodes similar to [33] which is described as Eq. (1):

$$\begin{aligned} \begin{aligned} E_N = E_{Sens,N} +E_{TX,N} +E_{RX,N} +E_{Proc,N} +E_{Trans,N} \end{aligned} \end{aligned}$$
(1)

Where \(E_{Sens}\) is the energy required to sense and collect data from the physical environment, and \(E_{TX}\) and \(E_{RX}\) are the energy required for transmission and reception, respectively. \(E_{Proc}\) is the energy required for processing the data and \(E_{Trans}\) is the energy used to switch between active, idle, and sleep modes. Details of how \(E_{Sens}\), \(E_{TX}\), \(E_{RX}\), \(E_{Proc}\) and \(E_{Trans}\) are calculated are given in [33, 47].

To model the energy consumption for the coordinator nodes in EQRSRL, we use Eq. (2). It is worth mentioning that, coordinators do not have the ability to sense the environment and can only send, receive and process the data. Hence, the estimated model for a coordinator can be expressed as follows.

$$\begin{aligned} \begin{aligned} E_C = E_{TX,C} +E_{RX,C} +E_{Proc,C} +E_{Trans,C} \end{aligned} \end{aligned}$$
(2)

According to Eq. (1), the energy consumption during a given period for each sensor node can be estimated by taking summation of \(E_{Sens}\), \(E_{TX}\), \(E_{Proc}\). Moreover, since a sensor during a given period only receives one packet from its coordinator and only one time switches from sleep to idle mode, we can estimate the total energy consumption for that period using Eq. (3) as follows.

$$\begin{aligned} \begin{aligned} E_N = \sum _{Sam-rate} E_{Sens} + \sum _{Sam-rate} E_{Proc} + \\ \sum _{Tran-rate} E_{TX} + E_{RX} + E_{Trans} \end{aligned} \end{aligned}$$
(3)

The energy for each coordinator node in a given period can be calculated by Eq. (4).

$$\begin{aligned} \begin{aligned} E_C = \sum _{m=1}^{Members} (E_{RX} +E_{Proc}) + \sum _{p=1}^{ Packets} (E_{TX}) + \\ \sum _{k\in Neighbours} (E_{RX} +E_{Proc} + E_{TX}) + \sum _{t\in T} E_{Trans} \end{aligned} \end{aligned}$$
(4)

The first part of Eq. (4) denotes the energy consumed for receiving and processing packets from members of a coordinator inside the same cluster. The second part is the energy consumed for sending the data to the next hop, which can be an intermediate node or even a sink node. If the current coordinator node is selected as an intermediate node, it should receive, process and then forward packets to the next hop as calculated in the third part of Eq. (4). Since a coordinator node may switch from sleep to active mode several times during a given period, the fourth part of Eq. (4) is considered to estimate the energy for these transitions.
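
The per-period models of Eqs. (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EnergyCosts` container, the function names, and the idea of a fixed per-event cost are hypothetical and are not taken from the EQRSRL implementation; the actual per-event energies would come from [33, 47].

```python
from dataclasses import dataclass


@dataclass
class EnergyCosts:
    """Per-event energy costs (hypothetical container, illustrative units)."""
    sense: float  # E_Sens, one sample
    proc: float   # E_Proc, one processing step
    tx: float     # E_TX, one transmitted packet
    rx: float     # E_RX, one received packet
    trans: float  # E_Trans, one mode switch


def sensor_energy(c, samples, transmissions):
    """Eq. (3): sensing and processing per sample, TX per transmission,
    plus a single reception and a single mode switch per period."""
    return samples * (c.sense + c.proc) + transmissions * c.tx + c.rx + c.trans


def coordinator_energy(c, member_packets, sent_packets, relayed_packets, mode_switches):
    """Eq. (4): the four terms in the order they appear in the equation."""
    return (member_packets * (c.rx + c.proc)               # packets from members
            + sent_packets * c.tx                          # forwarding to the next hop
            + relayed_packets * (c.rx + c.proc + c.tx)     # acting as an intermediate node
            + mode_switches * c.trans)                     # sleep/active transitions
```

Because a coordinator pays for reception, processing, and transmission on every relayed packet, the third term tends to dominate for nodes that sit on many paths toward the sink.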

In the jargon of computer networks, nodal delay includes transmission, propagation, processing and queuing delay which is expressed as follows.

$$\begin{aligned} \begin{aligned} D_n = D_{Trans}+D_{Prop} +D_{Proc} +D_{Queu} \end{aligned} \end{aligned}$$
(5)

According to Eq. (5), end-to-end delay for a path between a source and destination node consisting of m hops is estimated by summation of the nodal delay in each hop, as shown in Eq. (6).

$$\begin{aligned} \begin{aligned} D_{e2e} = \sum _{i=1}^{m} D_i \end{aligned} \end{aligned}$$
(6)

The power density of an electromagnetic signal decreases as it propagates through space; this effect is known as path loss. Different path loss models are introduced in the literature, such as [48, 49]. For EQRSRL, we use the Friis free-space propagation model, which is calculated as follows.

$$\begin{aligned} \begin{aligned} Pr = Pt + Gt + Gr + 20*log(\lambda )-20*log(4*\pi *d)-10*log(L) \end{aligned} \end{aligned}$$
(7)

Where Pr is the power of the received signal as a function of the distance (d meters) between the transmitter and receiver, Pt is the power of the transmitted signal, Gt and Gr determine the gain of the transmitter and receiver antennas, \(\lambda\) is the wavelength of the carrier signal in meters and L represents other losses.
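
As a small worked example of Eq. (7), the following sketch evaluates the received power. It assumes that the logarithms are base 10, that powers are expressed in dBm and antenna gains in dBi, and that L is a linear loss factor (L = 1 meaning no extra loss); the function name is hypothetical.

```python
import math


def friis_received_power(pt_dbm, gt_dbi, gr_dbi, wavelength_m, distance_m, loss=1.0):
    """Eq. (7) in logarithmic form (assumed units: dBm, dBi, meters)."""
    return (pt_dbm + gt_dbi + gr_dbi
            + 20 * math.log10(wavelength_m)
            - 20 * math.log10(4 * math.pi * distance_m)
            - 10 * math.log10(loss))
```

Under this model, doubling the distance lowers the received power by 20·log10(2), roughly 6 dB, regardless of the other parameters.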

3.2 First tier: scheduling

In EQRSRL, we propose a centralized scheduling schema to manage the shared communication channel. The proposed schema uses the TSCH approach [50,51,52] to assign time slots to the biosensors. To do this, it divides the frequency band into sub-channels, each of which consists of fixed-length time slots. The scheduling schema is performed by the coordinator, which assigns a time slot to each biosensor. Each sensor can send its data only in its assigned time slot. In EQRSRL, to improve QoS and save energy, the coordinator calculates the time slots for each sensor according to its transmission rate. Table 2 lists the features of some popular biosensors in terms of physical layer characteristics. As shown in this table, sensors such as ECG need to send 1 pps, which means assigning more consecutive time slots to this type of sensor. Nevertheless, even in critical applications such as IoMT, it is not necessary to send all the data sensed by biosensors. On the one hand, the function of the body’s organs changes much more slowly than the sampling rate, and on the other hand, sending data per sample increases energy consumption and network traffic. For this reason, in EQRSRL, we consider an adaptable transmission rate depending on the type of each sensor.

Table 2 Caption text

In this paper, we assume that the topology is a star network and all sensors are directly connected to the coordinator node as shown in Fig. 2. In this network, it is important to consider the following constraints and assumptions when dealing with the scheduling issue.

Fig. 2 Topology of first layer

In EQRSRL, similar to TSCH, if each slot-frame has 101 time slots and 16 channels, and the transmission rate per sensor is the same as in Table 2, the number of required cells is 7. In EQRSRL, the control packets are sent in time slot zero. Furthermore, the first time slot can be randomly chosen from [1 to 96]. Each frequency sub-channel is also randomly selected from the 16 sub-channels. For example, as shown in Fig. 3, the scheduling schema for the topology of Fig. 2 consists of 5 sub-channels, each of which has 7 time slots. In this example, we assume that the transmission rates of sensors B and C are twice those of the other sensors.
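
A minimal sketch of such a coordinator-side allocation is given below. It is not the authors' scheduler: the function `build_schedule`, the retry limit, and the rule that no two sensors share a time slot (because a single coordinator radio listens in a star topology) are assumptions made for illustration. Slot 0 is left free for control packets, and each sensor receives consecutive cells on one randomly chosen sub-channel.

```python
import random


def build_schedule(demand, n_slots=101, n_channels=16, seed=None, max_tries=1000):
    """demand maps a sensor name to its number of cells per slot-frame.
    Returns {sensor: [(time_slot, channel), ...]}."""
    rng = random.Random(seed)
    busy = set()        # time slots already taken (assumption: one sender per slot)
    schedule = {}
    for sensor, cells in demand.items():
        for _ in range(max_tries):
            start = rng.randint(1, n_slots - cells)   # slot 0 is reserved for control
            slots = range(start, start + cells)
            if busy.isdisjoint(slots):
                break
        else:
            raise RuntimeError("no free run of time slots for " + sensor)
        channel = rng.randrange(n_channels)           # random sub-channel
        busy.update(slots)
        schedule[sensor] = [(slot, channel) for slot in slots]
    return schedule


# Five sensors; B and C transmit twice as often, so 7 cells are needed in total.
example = build_schedule({"A": 1, "B": 2, "C": 2, "D": 1, "E": 1}, seed=1)
```

Random starting slots keep neighboring patients' schedules from lining up by construction, at the cost of an occasional retry when a drawn run of slots is already occupied.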

Fig. 3 Scheduling example

3.3 Second tier: routing protocol

The second tier in the EQRSRL architecture implements the routing functionalities in two phases: setup and routing. In the setup phase, each node identifies its position and those of the neighbors through which it has a path to the sink. After the setup phase, the network environment is divided into multiple zones, as depicted in Fig. 4(b). In a nutshell, the following activities are performed during the setup phase:

  1. First, the sink sends a setup message to the neighbors, which includes its energy, ID, and zone number. The initial value of the zone number is zero where the sink is located.

  2. The nodes that are close to the sink node receive the setup message and extract the zone number from it. They then increment this number by one and set their zone value to that number. Subsequently, they send a new message to their neighboring nodes, containing information about their energy level, ID, and zone number.

  3. If a node receives a message from several nodes, it sets its zone number based on the smallest received value. Moreover, it updates its neighbor table, which is used in the routing phase.

  4. The above process continues until all the nodes in the network identify their zone, as depicted in Fig. 4(b).
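
The setup steps above amount to a breadth-first flood from the sink, which can be sketched as follows. The adjacency-list representation and the function names are assumptions for illustration; message contents such as energy and ID are omitted.

```python
from collections import deque


def assign_zones(neighbors, sink):
    """Flood zone numbers outward from the sink (zone 0).
    neighbors maps each node to the list of nodes in its radio range."""
    zones = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in zones:               # first message carries the smallest zone
                zones[nb] = zones[node] + 1
                queue.append(nb)
    return zones


def next_hop_candidates(neighbors, zones, node):
    """Neighbors that lie in a lower zone, i.e. closer to the sink."""
    return [nb for nb in neighbors[node] if zones[nb] < zones[node]]


topology = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
zones = assign_zones(topology, "sink")
```

Because the flood is breadth-first, the first zone number a node hears is already the smallest one it will ever receive, which matches the rule in step 3.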

Fig. 4 Zoning phase

Upon finishing the setup phase, a reinforcement learning-based routing process is initiated. Generally, reinforcement learning enables devices to learn how to utilize local information gathered from their neighbors to select an appropriate next node. This policy not only reduces energy consumption, but also provides better QoS for various IoMT applications.

As aforementioned, some important information such as neighbor position, zone and energy is obtained during the setup phase. Moreover, the delay of each link can be calculated regarding the position information. As a result, in reinforcement learning, each node calculates the initial Q-value for all neighbors whose zone is less than its current zone (i.e., the neighbor closer to the sink). Since we assume that the traffic is divided into three different types, it is needed to identify three different types of paths. Therefore, for all three types of traffic, the Q values are calculated and updated separately. The initial values for energy-sensitive traffic, delay-sensitive traffic, and real-time traffic are given in Eqs. (8)–(10), respectively.

$$\begin{aligned} \begin{aligned} r_{Q_1}^{t_0}=\alpha * E_r/E_0 + \beta * P_{Succ}/(P_{Succ}+P_{Loss} ) \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned} \begin{aligned} r_{Q_2}^{t_0}=\alpha * E_r/E_0 + \beta * P_{Succ}/(P_{Succ}+P_{Loss} ) + \gamma * 1/d \end{aligned} \end{aligned}$$
(9)
$$\begin{aligned} \begin{aligned} r_{Q_3}^{t_0}=\alpha * P_{Succ}/(P_{Succ}+P_{Loss}) + \beta *1/delay + \gamma * 1/d \end{aligned} \end{aligned}$$
(10)
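
To make Eqs. (8)–(10) concrete, the sketch below computes the three initial values for one neighbor. The function names and the handling of a neighbor without any delivery history are assumptions; the weights \(\alpha\), \(\beta\), \(\gamma\) are passed in because their values are not fixed by the equations themselves.

```python
def delivery_ratio(p_succ, p_loss):
    """P_Succ / (P_Succ + P_Loss); returns 1.0 when there is no history (assumption)."""
    total = p_succ + p_loss
    return p_succ / total if total else 1.0


def initial_q_eq8(e_r, e_0, p_succ, p_loss, alpha, beta):
    """Eq. (8): residual-energy ratio plus delivery ratio."""
    return alpha * e_r / e_0 + beta * delivery_ratio(p_succ, p_loss)


def initial_q_eq9(e_r, e_0, p_succ, p_loss, d, alpha, beta, gamma):
    """Eq. (9): Eq. (8) plus an inverse-distance term."""
    return initial_q_eq8(e_r, e_0, p_succ, p_loss, alpha, beta) + gamma / d


def initial_q_eq10(p_succ, p_loss, delay, d, alpha, beta, gamma):
    """Eq. (10): delivery ratio, inverse delay, and inverse distance."""
    return alpha * delivery_ratio(p_succ, p_loss) + beta / delay + gamma / d
```

For instance, with half of the initial energy left, 9 delivered and 1 lost packet, d = 2, delay = 4, and weights (0.5, 0.25, 0.25), the three values are 0.475, 0.6, and 0.6375.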

To learn the real cost of an action, we have to compute the action-value function, which defines whether an action is good to be performed from a given state based on a policy \(\pi\). The action-value function is computed as follows.

$$\begin{aligned} \begin{aligned} Q_\pi (s, a)=E[G_t \mid S_t=s,A_t=a],\quad G_t=\sum _{k=0}^{\infty } \gamma ^k * R_{t+k+1} \end{aligned} \end{aligned}$$
(11)

Generally, the Q-learning reinforcement method is a model-free approach that does not require an environment model. The action-value approximate function depends on the policy pursued by the agent [53]. Therefore, the optimal policy is the best choice policy in any situation. The sender selects a neighbor with the highest Q-value, denoted by a maximum (Q (s, a)) as calculated in Eq. (12).

$$\begin{aligned} \begin{aligned} Q_{\mu ^*}=Q^*(s,a),V^*(s,a)=max(Q(s,a)) \end{aligned} \end{aligned}$$
(12)

Once an agent performs an action, it receives a reward and uses it to update the Q-value using Eq. (13).

$$\begin{aligned} \begin{aligned} Q_{t+1}(s,a)=(1-\alpha ) * Q_t (s,a)+\alpha * (r_{t+1} (s,a)+\gamma * \max (Q(s,a))) \end{aligned} \end{aligned}$$
(13)

Here, \(\alpha\) is the learning rate and \(r_{t+1}(s,a)\) is the immediate reward, which is calculated using Eq. (14) for each type of traffic separately. The discount factor \(\gamma\) varies between 0 and 1.

$$\begin{aligned} r_{Q_i}^{t+1} = \left\{ \begin{array}{ll} r_{Q_i}^{t}, &\quad E_r>0 \text { and packet is received} \\ r_{Q_i}^{t}- \gamma * P_{Loss}/(P_{Succ}+P_{Loss} ), &\quad E_r>0 \text { and packet is lost}\\ -100, &\quad E_r \le 0 \end{array} \right. \end{aligned}$$
(14)
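
The reward rule of Eq. (14) and the update of Eq. (13) can be sketched as below. This is an illustration rather than the authors' MATLAB code: the function names are hypothetical, and the `best_q` argument stands for the maximum Q-value term of Eq. (13), which is assumed here to be supplied by the caller.

```python
def reward(prev_reward, e_r, delivered, p_succ, p_loss, gamma):
    """Eq. (14): keep the reward on success, penalize by the loss ratio on failure,
    and return -100 once the neighbor has no residual energy."""
    if e_r <= 0:
        return -100.0
    if delivered:
        return prev_reward
    total = p_succ + p_loss
    loss_ratio = p_loss / total if total else 0.0   # guard for empty history (assumption)
    return prev_reward - gamma * loss_ratio


def q_update(q_old, r, best_q, alpha, gamma):
    """Eq. (13): blend the old estimate with the reward plus discounted best Q-value."""
    return (1 - alpha) * q_old + alpha * (r + gamma * best_q)


def choose_next_hop(q_values):
    """Pick the neighbor with the highest Q-value, as in Eq. (12)."""
    return max(q_values, key=q_values.get)
```

In use, a sender would keep one such Q-table per traffic type, so that a neighbor can rank well for energy-sensitive traffic while ranking poorly for real-time traffic.
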
Fig. 5 Management of dynamic topology changes

3.4 Second tier: topology management

Since patients and sensor nodes might move from one place to another, topology changes in IoMT are inevitable. For EQRSRL, we assume this type of movement follows the random walk model [54]. We attempt to manage these topology changes with the least number of packets; that is, the nodes exchange a minimum number of packets to inform each other about the movement of other nodes. In a nutshell, when a node moves slightly to another position, it performs the following steps.

  1. It sends a message to its neighbors requesting their zone number and energy.

  2. If there is a sink among the respondents, the node sets its zone number to one (Fig. 5(a)).

  3. If the node receives two different zone numbers, it adopts the zone number from which it has received the most messages (Fig. 5(b)).

  4. If the node receives more than two different zone numbers (three), it sets its zone number to the average of the received zone numbers (Fig. 5(c)).

  5. If the node receives responses only from nodes in one zone, it increases its zone number by one and forms a new layer (Fig. 5(d)).
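The steps above can be sketched as a single decision function. This is only one possible reading: it assumes that the majority rule applies when exactly two zones respond, that the average is taken over the distinct zone numbers and rounded, and that a node hearing a single zone joins the next zone outward. The name `update_zone` and the `(zone, is_sink)` response format are hypothetical.

```python
from collections import Counter


def update_zone(responses):
    """Choose a new zone number for a node that has just moved.

    responses: list of (zone_number, is_sink) tuples, one per replying neighbor.
    """
    if not responses:
        raise ValueError("no neighbor responded")
    # Step 2: a sink within range puts the node in zone one
    if any(is_sink for _, is_sink in responses):
        return 1
    counts = Counter(zone for zone, _ in responses)
    # Step 5: only one zone heard, so the node forms the next layer
    if len(counts) == 1:
        (zone,) = counts
        return zone + 1
    # Step 3: two zones heard, take the one with the most replies
    # (a tie resolves to the zone that replied first)
    if len(counts) == 2:
        return counts.most_common(1)[0][0]
    # Step 4: three or more zones heard, take the rounded average
    return round(sum(counts) / len(counts))


print(update_zone([(2, False), (2, False), (3, False)]))
print(update_zone([(1, False), (2, False), (3, False)]))
```

Only the replies of direct neighbors are needed, which matches the goal of handling movement without network-wide broadcasts.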

To model the mobility, a stochastic differential equation is used to compute the velocity of patient i at time t [54].

$$\begin{aligned} \begin{aligned} dv_i^t= - \begin{bmatrix} -\log (\rho _1) &{} \theta \\ -\theta &{} -\log (\rho _2) \end{bmatrix} *[v_i^t- \mu ]*dt+JdB_t \end{aligned} \end{aligned}$$
(15)

where \(\rho _1\) and \(\rho _2\) are the auto-correlation parameters in the first and second coordinates, and \(\mu\) and \(\theta\) are the mean velocity vector and the mean rotation angle, respectively. In addition, J is a 2 \(\times\) 2 lower triangular matrix with positive diagonal elements that determines the covariance of the velocity changes, and \(B_t\) represents standard Brownian motion at time t. The patient’s current location is calculated using Eq. (16).

$$\begin{aligned} \begin{aligned} L_i^t= L_i^{t-1}+v_i^t \end{aligned} \end{aligned}$$
(16)
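One way to simulate this mobility model numerically is an Euler-Maruyama discretization, sketched below. The sketch reads Eq. (15) as an equation for the velocity increment over a step `dt` and then applies Eq. (16) to the location. The function names, the parameter values, and the choice of discretization are assumptions for illustration and are not taken from the EQRSRL implementation.

```python
import math
import random


def step_velocity(v, mu, rho1, rho2, theta, J, dt, rng):
    """Advance the 2-D velocity by one Euler-Maruyama step of Eq. (15)."""
    # Drift matrix [[-log(rho1), theta], [-theta, -log(rho2)]]
    a11, a12 = -math.log(rho1), theta
    a21, a22 = -theta, -math.log(rho2)
    dx, dy = v[0] - mu[0], v[1] - mu[1]
    # Brownian increments: independent N(0, dt) draws per coordinate
    sd = math.sqrt(dt)
    db0, db1 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
    dv0 = -(a11 * dx + a12 * dy) * dt + J[0][0] * db0 + J[0][1] * db1
    dv1 = -(a21 * dx + a22 * dy) * dt + J[1][0] * db0 + J[1][1] * db1
    return (v[0] + dv0, v[1] + dv1)


def step_location(loc, v):
    """Eq. (16): the new location is the previous location plus the velocity."""
    return (loc[0] + v[0], loc[1] + v[1])


rng = random.Random(1)           # fixed seed so runs are repeatable
v, loc = (0.0, 0.0), (100.0, 100.0)
J = [[0.3, 0.0], [0.1, 0.3]]     # lower triangular, positive diagonal
for _ in range(5):
    v = step_velocity(v, (0.5, 0.0), 0.8, 0.8, 0.1, J, dt=1.0, rng=rng)
    loc = step_location(loc, v)
print(loc)
```

With auto-correlation parameters between 0 and 1, the drift term pulls the velocity back toward the mean velocity, while the noise term keeps the trajectory random.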
Table 3 Simulation parameters

4 Performance evaluation

In this section, we carry out a comprehensive performance evaluation. We implemented EQRSRL in MATLAB, and the source code is available on GitHub [55]. To make a fair comparison, we evaluate EQRSRL and the benchmarks in different scenarios. All experiments are executed on a computer with an Intel Core i7-4700MQ 2.4 GHz CPU and 16 GB of memory. In our simulations, the sink node is located at the center of the topology, while the other IoT devices are randomly deployed in a 200 m \(\times\) 200 m area. It is assumed that each patient is equipped with 5 biosensors and one coordinator. Table 3 shows the simulation parameters. Since EOCC-TARA [33], SMORP [46], THE [34], mobTHE [35] and ISDNC [6] are similar to EQRSRL, we use these methods as benchmarks and compare the performance of EQRSRL with them. We consider the following important QoS metrics for the performance evaluation.

  • Total Energy Consumption: The total energy consumed by IoT devices for sensing, processing, sending/receiving packets to/from coordinator nodes, switching from active to sleep mode, and routing packets to the edge node. To make a more accurate comparison, we also report the energy consumed by routing alone, i.e., the average energy consumption for routing all packets.

  • Average End-to-End Delay: The average time that a packet takes to travel from an IoMT device to the sink node. The delay of each link is calculated based on the equations stated in Sect. 3.1, and the delay of a route is the sum of the delays of its links.

  • Packet Delivery Ratio: The ratio between the number of packets successfully delivered to the sink node and the number of all packets originated by the nodes.
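The three metrics can be computed from a simulation log along the following lines. The record layout (`sent_at` and `received_at` in seconds, with `None` for a lost packet) and the per-node energy dictionary in joules are hypothetical and serve only to make the definitions concrete.

```python
def summarize(packets, energy_per_node):
    """Compute total energy, average end-to-end delay and PDR.

    packets         : list of dicts with 'sent_at' and 'received_at'
                      ('received_at' is None when the packet was lost)
    energy_per_node : dict mapping a node id to the energy it consumed
    """
    delivered = [p for p in packets if p["received_at"] is not None]
    # Packet delivery ratio: delivered packets over originated packets
    pdr = len(delivered) / len(packets) if packets else 0.0
    # Average end-to-end delay over delivered packets only
    if delivered:
        delays = [p["received_at"] - p["sent_at"] for p in delivered]
        avg_delay = sum(delays) / len(delays)
    else:
        avg_delay = float("nan")
    total_energy = sum(energy_per_node.values())
    return {"total_energy": total_energy, "avg_delay": avg_delay, "pdr": pdr}


log = [
    {"sent_at": 0.0, "received_at": 0.2},
    {"sent_at": 1.0, "received_at": 1.4},
    {"sent_at": 2.0, "received_at": None},   # lost packet
    {"sent_at": 3.0, "received_at": 3.3},
]
print(summarize(log, {"coordinator": 1.5, "sensor": 2.5}))
```

Lost packets are excluded from the delay average but still count in the denominator of the delivery ratio.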

Energy consumption is one of the most important criteria in the Internet of Things. Figure 6 shows the average energy consumption of each patient at the end of the simulation for the different algorithms.

Fig. 6 Energy consumption

Figure 6 illustrates that EQRSRL outperforms the other techniques in terms of energy consumption, for two reasons. First, EQRSRL manages the topology locally and avoids sending broadcast messages. Second, the use of layering and the avoidance of direct data transmission lead to lower energy consumption. Moreover, PDR is an important metric that affects energy consumption: once a packet is dropped in the network, it must be resent, which consumes more energy. ISDNCR is less efficient than EQRSRL because it is based solely on energy consumption, the distance from the source to the next hop, and the distance from the next hop to the sink. SMORP only pays attention to the energy criterion and does not consider the distance from the source node to the next hop. Similarly, EOCC-TARA only takes the reliability of the link into consideration and does not consider the energy level of the next hop. Both THE and mobTHE use routing in the first tier, in which each biosensor must be aware of the temperature/link quality of other sensors to choose the best next hop based on the neighbors’ status. Thus, updating the neighbor list requires sending additional packets, which consumes a lot of energy. In addition, they send data packets to the sink directly, which requires a lot of energy because energy consumption grows with distance. According to Fig. 6, EQRSRL is also more scalable than the other methods: as the number of nodes increases, its energy consumption increases only slightly.

To investigate the routing overhead of EQRSRL and the benchmarks, we measure the energy consumed by the routing process alone, as shown in Fig. 7. This figure clearly shows that EQRSRL is significantly more energy efficient in the routing process than the benchmarks.

Fig. 7 Energy consumption

Figure 8 shows that EQRSRL performs well in terms of average end-to-end delay in different scenarios. The reason is that EQRSRL calculates the cost of each link based on QoS metrics such as the delay of that link, which leads to paths with minimum delay. In contrast, to find the optimal path in EOCC-TARA and SMORP, global network information must be collected at the sink node, because both methods leverage the SDN architecture. Hence, sending packets, making decisions on the SDN controller, and then sending the optimal paths back to the nodes leads to a higher delay in the network. In the second layer, THE and mobTHE send data packets directly to the sink. As a result, the network becomes congested, and it takes longer in the MAC layer for a device to capture the channel in order to forward the received packet. As shown in Fig. 8, ISDNCR results in a long end-to-end delay because it only considers the distance from the next hop to the sink node and ignores the distance between the current node and the next hop.

Fig. 8 Delay

Fig. 9 Delay for real-time traffic

Fig. 10 Delay for delay-sensitive packets

Fig. 11 Delay for normal packets

As mentioned in previous sections, the importance of delay depends on the type of traffic. Generally, delay is much more significant for real-time traffic than for other types of traffic. Figures 9, 10 and 11 show the delay for real-time, delay-sensitive and normal traffic, respectively. As shown in all of these figures, EQRSRL outperforms the benchmarks in terms of delay in scenarios with different numbers of patients. This is because, during the routing process, EQRSRL calculates the paths for each type of traffic by considering the most critical QoS metrics, such as PDR and delay. Therefore, the delay for all types of packets is expected to be acceptable, and in the worst case it is about 400 ms.

Figure 12 shows the packet delivery ratio for different scenarios. According to the propagation model described in Sect. 3.1, as the distance between the transmitter and receiver increases, the strength of the received signal decreases and, as a consequence, the probability of packet loss increases. In EQRSRL, dividing the network into multiple zones leads to multi-hop routing. This avoids sending packets directly and reduces both the energy consumption and the packet loss ratio. Furthermore, EQRSRL takes QoS metrics into account, which leads to paths that are appropriate in terms of packet loss.

Fig. 12 Packet delivery ratio

THE and mobTHE use a sleep mechanism: if the temperature of a sensor rises above a threshold, the sensor turns off until the temperature drops. Consequently, many packets are lost during this period.

EOCC-TARA and SMORP select the best route using an evolutionary algorithm, whose time complexity is O(\(N_{iter}\)* \(N_{pop}\)* \(N_{nodes}\)), where \(N_{iter}\) is the number of iterations of the evolutionary algorithm, \(N_{pop}\) is the population size, and \(N_{nodes}\) is the number of nodes. THE calculates a utility value for each node based on data transmission rate, distance, energy, and temperature, and then selects the best next hop in O(N) based on the utility. Similarly, mobTHE computes the link quality based on the Received Signal Strength Indicator (RSSI), which has a time complexity of O(N). If the length of the route is M, then the complexity of THE and mobTHE is O(N*M). EQRSRL leverages Q-learning, in which the Q-values are initialized and updated in O(N) and the next hop is selected in O(N); thus, similar to THE and mobTHE, EQRSRL has a time complexity of O(N*M).
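The O(N*M) bound can be illustrated with a short sketch that follows the best neighbor hop by hop and counts how many Q-values are inspected. The nested-dictionary Q-table, the function name `build_route`, and the `max_hops` guard are hypothetical and only serve to make the counting argument concrete.

```python
def build_route(q_table, source, sink, max_hops=100):
    """Follow the highest Q-value neighbor from source until the sink.

    q_table: dict mapping a node to a dict of {neighbor: Q-value}.
    Returns the route and the number of Q-values inspected on the way.
    """
    route, inspected = [source], 0
    current = source
    # max_hops stops the walk if greedy choices ever form a loop
    while current != sink and len(route) <= max_hops:
        neighbors = q_table[current]
        inspected += len(neighbors)              # O(N) scan at each hop
        current = max(neighbors, key=neighbors.get)
        route.append(current)
    return route, inspected


table = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"sink": 1.0, "c": 0.2},
    "c": {"sink": 0.5},
}
print(build_route(table, "a", "sink"))
```

Each hop scans at most N neighbor entries and a route has M hops, so the number of inspected Q-values is bounded by N*M.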

5 Conclusion

In this paper, we proposed EQRSRL, a routing scheme that improves QoS and reduces energy consumption for IoMT applications. One of the benefits of EQRSRL is that it treats each type of network traffic according to its QoS requirements. To achieve this goal, we implemented EQRSRL in a cross-layer fashion: we apply a scheduling mechanism similar to the TSCH protocol in the MAC layer, as well as reinforcement learning-based routing in the network layer. This policy provides a high level of QoS for real-time medical applications in which human life is at stake. In fact, QoS provisioning helps in the transmission of real-time medical data from biosensors to the cloud, as well as in the transmission of control commands from the cloud to the IoMT coordinator nodes. Moreover, EQRSRL reduces the energy consumption of biosensors and prolongs the network lifetime. To make the evaluation more realistic, we modeled the physical layer communications using well-known energy consumption and propagation models. Simulation results show that EQRSRL significantly reduces energy consumption and improves QoS in terms of end-to-end delay and packet delivery ratio.