1 Introduction

The Internet of Things (IoT) is an emerging technology that promises to connect the physical objects of our daily lives. The main idea of the IoT is to equip objects with embedded devices that turn them into smart objects, each uniquely connected to the Internet and able to communicate with the others. The purpose of the IoT is collaboration between devices, which allows them to perform complicated tasks. To this end, the devices are able to collect, process and transmit information. This is possible only through the integration of existing technologies such as intelligent sensor networks, Radio Frequency Identification (RFID), Near Field Communication (NFC), mobile technology and the Internet. Integrating these technologies into a single system creates an IoT made up of a wide variety of devices, from powerful and complicated servers to RFID tags. Notably, because the IoT environment is heterogeneous, some devices are expected to have low efficiency and performance. Because of limited processing power, energy, memory and bandwidth, such devices give the network low throughput and high delay [1].

Considering these limitations, traditional routing solutions, which require a significant amount of resources and energy, do not work properly on such networks. Much research has been done in recent years to improve on these limited traditional solutions. For the IoT to be used in people's ordinary lives, its routing algorithms must be improved and the related challenges addressed. Routing data from source to destination in large-scale networks is a major challenge. In the IoT, communication between devices with various network standards creates new challenges and limitations that previous routing protocols did not consider, so routing must dynamically handle constantly changing network topologies. In this paper, we propose a routing protocol for the IoT that is able to select the optimal route. In the suggested method, intelligent agents select routes based on battery power, distance to the sink, available bandwidth and the path stability rate.

The intelligent method used to train the nodes to choose optimized paths is reinforcement learning combined with fuzzy logic. The efficiency of the suggested protocol is compared with the IEEE 802.15.4 standard and with a routing protocol based on fuzzy logic alone; throughput, end-to-end delay and packet delivery ratio are among the criteria used in the comparison. The suggested protocol is evaluated with the OPNET simulator, version 11.5.

2 Related Works

With the development of the IoT, the number of interconnected devices with diverse computational capabilities has increased. For multicast multimedia streaming on the IoT, a fixed quality of service cannot suit heterogeneous devices, and a device may not be able to receive information from all other devices. To meet the requirements of device heterogeneity, the authors of [2] proposed a distributed algorithm that optimizes the cross-layer flow problem. The algorithm is based on network coding for broadcasting and streaming multimedia over the IoT. The authors used primal decomposition and primal-dual techniques, and applied Lyapunov theory to prove the convergence and overall stability of the proposed algorithm. Wireless sensor networks (WSNs) are widely used as a convenient medium for bridging the physical world and the information world of the IoT. A WSN must keep its power consumption down even as it needs more communication. Multicast is a common operation in which the base station sends data to a group of recipients. This problem can be reduced to the well-known Steiner tree problem, which has been proved to be NP-complete.

In [3], a new multicast routing protocol is suggested for solving this problem under quality of service constraints. Link estimates are considered to be random because of information aggregation or sudden changes in traffic load. Moreover, because the link criteria cannot be aggregated, deterministic algorithms for searching for the multicast tree are not applicable. The random criteria are therefore converted into deterministic link descriptors that can be given as inputs to deterministic or traditional algorithms.

In [4], the authors examined the use of information-centric networks to support IoT deployments and provided a flexible architecture that allows the IoT to operate optimally within an information-centric network. The proposed architecture pays special attention to naming requirements, internal operations, security and energy efficiency.

In that work, the authors also evaluated the communication overhead created by security mechanisms in IoT devices.

In [5], the authors proposed a Greedy Method with Small World model properties (GMSW) for heterogeneous sensor networks within the IoT. They first introduced two greedy criteria that GMSW uses to distinguish the importance of different nodes in the network. Based on these criteria, they introduced the concept of the local significance of a node. They then introduced an algorithm that modifies the network so that it takes on the characteristics of a small-world model, by adding shortcuts between specific nodes chosen according to their local significance.

In [6], the authors, inspired by the ant clustering algorithm, proposed a content-aware method for clustering sensor nodes into semantic sensor networks (SSONs), in which sensor nodes with similar content information belong to the same cluster.

First, the sensors are grouped according to their type to create SSONs. Then the proposed algorithm, called ANTCLUST, clusters the nodes based on their content. In addition, effective changes are made to reduce the cost of searching for sensor nodes, and an adaptive mode is proposed to maintain network performance under dynamic changes in the IoT environment.

In [6], the authors proposed a modified framework for monitoring and sensing the environment, which can monitor parameters such as air temperature, humidity and light in order to automate a building. In the developed system, data are sent through a relay node to the destination node using a modified method. The data received at the receiver node are monitored and stored on a personal computer. These tasks are performed through a GUI created in LabVIEW.

In [7], the authors proposed a new routing method for real-time response in the IoT, based on global information decisions, that can improve the efficiency of secure transfer and real-time response. In particular, the authors devised a delay iterative method (DIM), based on delay estimates, to solve the problem of valid routes being ignored. In addition, a new forwarding method called REPC is proposed that balances the load inside the network by focusing on the remaining energy of the nodes.

In [8], the authors analyzed IoT routing and Medium Access Control (MAC) protocols, focusing on the IEEE 802.15.4 standard and the Internet Engineering Task Force (IETF) IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL). The results show that existing protocols do not consider cooperation between the routing and MAC layers, and new criteria are suggested based on this fact.

These criteria make it possible to tune the routing and MAC protocols more effectively and to change their parameters according to the limitations of the specific environment.

In [9], the authors integrated the quality of service requirements of the IoT with methods for optimizing quality of service criteria. They simplified the complex computational model of quality of service, presenting each model with its own computation method. By combining quality of service techniques for composite services, the authors modified the algorithm to find near-optimal services at a lower cost while satisfying the quality constraints. The suggested algorithm finds services by their computed quality of service faster than other algorithms in large-scale IoT settings.

3 Suggested Method

This method does not need a model of the dynamic environment. At the beginning, the agents are assumed to know nothing about the network details; the reward is calculated from the start of the simulation, and the reward function is initialized to zero for all routes. The network nodes are equipped with the Global Positioning System (GPS), which is used to determine each node's position.

After learning in every possible state of the environment, the intelligent agents are able to select the best possible action for any state.

Three different scenarios are considered for the suggested protocol. In the first, the route stability rate is estimated from the distance to the sink, the available bandwidth and the battery level; the nodes are trained using reinforcement learning with fuzzy logic, through reward and punishment.

In the second scenario, fuzzy logic alone is used to train the nodes. In the third scenario, routing is performed using the IEEE 802.15.4 protocol [10]. This protocol has recently been used as a low-power, low-cost communication standard for WSNs. It is highly flexible, its parameters can be configured, and its Guaranteed Time Slot (GTS) mechanism supports real-time applications, a feature that suits time-sensitive cases in WSNs. We consider a network of 50 nodes and assume the same connectivity for all of the compared algorithms. In the following, we describe the suggested method according to the three scenarios.

States: the WSN nodes in the network.

Actions: select the most stable route for sending data.

Reward for transitions between states: the total reflects the quality of the path (the sum of the stability parameters of each path): \(V^{\pi}(s_{t}) = r_{t} + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \gamma^{3} r_{t+3} + \cdots = r_{t} + \gamma V^{\pi}(s_{t+1}).\)

We also use fuzzy logic to rate the criteria. The mechanism suggested in this scenario for IoT routing is a controller based on fuzzy logic, regarded as an adaptive fuzzy controller. To design a fuzzy controller, we need to discover the implicit and explicit relationships in the system using intelligent agents. Accordingly, we use fuzzy control rules along with the knowledge base to estimate the rating of the evaluation criteria (distance to sink, remaining energy, available bandwidth). The block diagram of this controller is shown in Fig. 1.

Fig. 1

A fuzzy system has four components, described below: the rule base, the fuzzy inference engine, the fuzzifier for the inputs and the defuzzifier.

1. The rule base

A rule base contains a suitable number of rules that together cover the entire input space. The number of rules is chosen so that the rule base is complete, that is, every point in the input space is covered by at least one rule. The rule base of a fuzzy system is usually built from empirical knowledge obtained from qualified experts. It is expressed as if-then rules, which form the basis of the fuzzy system.

In this research, we use fuzzy rules to calculate the path stability rate from the values of the considered parameters: distance to sink, available bandwidth and node battery energy.

Each fuzzy rule consists of two parts: the antecedent, for example “If the energy level of the battery is low and the distance to the sink is high and the bandwidth is low,” and the consequent, “then the path stability is very low.”

For each of the three input parameters, we define two fuzzy sets (Low and High), which produce eight fuzzy rules. These eight rules are defined in Table 1.
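The count of eight rules follows from combining two fuzzy sets over three inputs. A minimal Python sketch that enumerates the antecedents is given here; the variable names are illustrative, and only the one consequent quoted in the text is filled in, since the remaining consequents are those of Table 1.

```python
from itertools import product

# Input variables and their two fuzzy sets, as described in the text.
# The identifier names are illustrative, not taken from the paper.
INPUTS = ("battery_energy", "distance_to_sink", "bandwidth")
LEVELS = ("Low", "High")


def rule_antecedents():
    """Return every Low/High combination of the three inputs (2**3 = 8)."""
    return [dict(zip(INPUTS, combo))
            for combo in product(LEVELS, repeat=len(INPUTS))]


# The only rule whose consequent is quoted in the text:
# low energy, high distance, low bandwidth -> very low (VL) path stability.
EXAMPLE_RULE = (
    {"battery_energy": "Low", "distance_to_sink": "High", "bandwidth": "Low"},
    "VL",
)

if __name__ == "__main__":
    for antecedent in rule_antecedents():
        print(antecedent)
```

Enumerating the antecedents this way also gives a simple completeness check: every Low/High combination of the inputs appears exactly once.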

Table 1 Fuzzy rules base

2. Fuzzy inference engine

Fuzzy inference is the process of mapping an input to an output using fuzzy logic, which provides a basis for making decisions or identifying decision patterns. In the inference step, we use the fuzzy rules to calculate the stability rate of the path from the values of the considered parameters.

3. Fuzzification of the inputs

In this step, fuzzy sets are defined for the input and output variables. Fuzzification converts the actual (crisp) inputs into fuzzy sets suitable for the inference engine. In other words, fuzzification is the interface between the real inputs and the inference engine.

For each input variable, we define two fuzzy sets with trapezoidal membership functions, H for the upper range and L for the lower range, shown in Figs. 2, 3 and 4. These functions were chosen for their high accuracy. For the output, path stability, five fuzzy sets with triangular membership functions are defined (VH for very high, H for high, M for medium, L for low and VL for very low), shown in Fig. 5.

Fig. 2 Membership functions for the distance-to-sink input variable

Fig. 3 Membership functions for the bandwidth input variable

Fig. 4 Membership functions for the battery energy level input variable

Fig. 5 Membership functions for the path stability output variable
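As an illustration of the two membership function shapes used here, a minimal Python sketch follows. The breakpoints of the `low` and `high` sets are hypothetical values on an input normalized to [0, 1]; the actual breakpoints are those drawn in Figs. 2 to 5.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def triangle(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical Low/High sets for an input normalized to [0, 1].
def low(x):
    return trapezoid(x, 0.0, 0.0, 0.3, 0.6)


def high(x):
    return trapezoid(x, 0.4, 0.7, 1.0, 1.0)
```

With these assumed breakpoints, a normalized input between 0.4 and 0.6 belongs partly to both sets, which is what lets more than one rule fire for the same reading.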

4. Defuzzification mechanism

Defuzzification translates the fuzzy output into a numerical value. The input of the defuzzification process is a fuzzy set (the aggregation of the fuzzy output sets), and although fuzziness helps in evaluating the rules in the intermediate stages, the final output for each variable must be a single number. Because the aggregated fuzzy set covers a range of output values, it must be defuzzified to obtain a single output number.

In the suggested method, the center-average defuzzifier is used, calculated using relation (1):

$${\text{Path}}\,{\text{Stability}} = \frac{\sum\nolimits_{l = 1}^{m} \bar{y}^{l} \prod\nolimits_{i = 1}^{n} \mu_{A_{i}^{l}}(X_{i})}{\sum\nolimits_{l = 1}^{m} \prod\nolimits_{i = 1}^{n} \mu_{A_{i}^{l}}(X_{i})} .$$
(1)

The parameters of this formula are: l, the rule index; m, the number of fuzzy rules (here 8, as in Table 1); i, the input variable index; n, the number of input variables (here 3); \(\mu_{A_{i}^{l}}(X_{i})\), the membership value of input \(X_{i}\) in the fuzzy set \(A_{i}^{l}\) of rule l; and \(\bar{y}^{l}\), the center of the output fuzzy set of rule l.
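Relation (1) can be read as a weighted average of the output centers, where each rule's weight is the product of its input memberships. A minimal Python sketch under that reading is shown below; the function and argument names are illustrative and the numeric values in the usage note are made up.

```python
from math import prod


def path_stability(memberships, centers):
    """Center-average defuzzification in the form of relation (1).

    memberships[l][i] is the membership degree of input i in the fuzzy set
    used by rule l; centers[l] is the center of rule l's output fuzzy set.
    """
    # Firing strength of each rule: product of its input memberships.
    strengths = [prod(rule) for rule in memberships]
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fires for this input")
    return sum(c * w for c, w in zip(centers, strengths)) / total
```

For example, two rules firing with equal strength and output centers 0.0 and 1.0 give a path stability of 0.5, halfway between the two centers.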

In the suggested method, a reinforcement learning system is used to recognize whether a path is suitable for sending information to the sink. If a suitable route is detected, the node extracts and stores the path record, which in turn increases the stability of the routes.

This process has two phases: route discovery and route keeping.

In the following, we explain how the proposed protocol works with an example.

3.1 Route Discovery Phase

A sensor network can be described as a graph of nodes (the nodes include sensors and sinks). If two machines can communicate directly through their radios, we say that the two nodes are connected (in the network graph, there is an edge between them). Because one of the two nodes may have a more powerful transmitter than the other, communication from A to B may be possible but not the reverse; for simplicity, however, we assume that all links are two-way and symmetric. Also, two nodes being within radio range of each other does not guarantee that they can communicate: buildings, hills or other obstacles between them may prevent communication even though they are in range. We also assume that the nodes are connected to the Global Positioning System (GPS) and that each node’s physical location can be obtained through GPS.

To describe how the proposed protocol works, consider a graph in which a process in sensor node “A” sends a packet to a sink node “S”. In the suggested protocol each node has a table keyed by the sink address; each record in this table contains the sink information and the details of the delivery destination.

Source node “A” checks its routing table for the destination; if it finds a route, it sends the data along that path. Assume that “A” has searched its routing table and found no entry for “S”, so it must discover a path to “S”. To facilitate data routing, the algorithm uses a direct virtual backbone (DVB) made up of sensor nodes and rooted at the sink. The DVB is built in a centralized way, starting from the sink. The sink broadcasts a route request (RREQ) message, containing the sink ID and the sink’s physical position, within its communication range. Each sensor node that receives this message records the sink as its parent, updates the message and rebroadcasts it within its own radio range. Each sensor node that receives the rebroadcast message stores it in a list, records the sending nodes as its parents, updates the message again and broadcasts it within its communication range. This continues until every sensor knows its parents and has one or more paths to the sink.
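The backbone construction can be sketched as a breadth-first flood from the sink. The following Python sketch is one possible reading, not the authors' implementation: it assumes symmetric links and treats as a parent any neighbor that is one hop closer to the sink; the `neighbors` adjacency map and the node names are hypothetical.

```python
from collections import deque


def build_backbone(neighbors, sink):
    """Flood an RREQ from the sink and record each node's parent(s).

    neighbors maps a node ID to the IDs within its radio range (symmetric
    links assumed). Returns {node: [parents]}, where a parent is taken to be
    any neighbor one hop closer to the sink (an interpretation of the text).
    """
    hops = {sink: 0}
    parents = {sink: []}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node not in hops:                   # first copy of the RREQ
                hops[node] = hops[sender] + 1
                parents[node] = [sender]
                queue.append(node)                 # node rebroadcasts once
            elif hops[node] == hops[sender] + 1:   # another equally close parent
                parents[node].append(sender)
    return parents
```

A node with two parents, such as a node reachable through two different first-hop neighbors of the sink, ends up with two candidate paths, which matches the statement that each node has one or more paths to the sink.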

To answer the route request, every sensor node calculates its distance to the sink according to relation (2). The node’s physical location, identification number, available bandwidth, distance to the sink and remaining battery energy are inserted in the path response packet, which the node broadcasts within its own range.

$$D_{is} = \sqrt{\left( x_{i} - x_{s} \right)^{2} + \left( y_{i} - y_{s} \right)^{2} + \left( z_{i} - z_{s} \right)^{2}}$$
(2)

where \(D_{is}\) is the distance between sensor node i and the sink s, \((x_{i}, y_{i}, z_{i})\) is the location of sensor node i and \((x_{s}, y_{s}, z_{s})\) is the location of the sink node.
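Relation (2) is the ordinary Euclidean distance in three dimensions. A minimal Python sketch, with illustrative names and coordinates given as (x, y, z) tuples in meters, is:

```python
import math


def distance_to_sink(node, sink):
    """Euclidean distance D_is between a sensor node and the sink, as in relation (2)."""
    (xi, yi, zi), (xs, ys, zs) = node, sink
    return math.sqrt((xi - xs) ** 2 + (yi - ys) ** 2 + (zi - zs) ** 2)
```

For instance, a node at the origin and a sink at (3, 4, 12) are 13 m apart, since 9 + 16 + 144 = 169.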

When the path response packet arrives at an intermediate sensor node, it is processed as follows:

First, the packet’s identifying characteristics (the identification number of the source sensor and the request ID) are looked up in a local table that holds the records of such packets, to determine whether the packet has already been received and processed. If the packet is a duplicate, it is deleted and processing ends. Otherwise, its characteristics are entered in the record table so that it will not be processed again, which prevents routing loops, and processing continues. For example, the node checks whether it has enough battery power to forward the data: if it has enough energy (in the High interval), fuzzy logic assigns a positive point to encourage the node; otherwise a negative point is assigned. The same applies to the other parameters, such as speed, available bandwidth and distance to the sink. The points are then totaled by fuzzy logic to determine the reward of the node according to Eq. (3), and the result is added to the reward field in the path request packet.
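The duplicate check and the point assignment can be sketched in a few lines of Python. This is a simplified illustration, not the protocol itself: the packet is modeled as a dictionary with hypothetical field names, each criterion contributes +1 or -1 (an assumed scoring), and a plain sum stands in for the fuzzy-logic totaling.

```python
def process_packet(seen, packet, node_levels):
    """Return the node's point total, or None if the packet is a duplicate.

    seen is this node's local record table, a set of (source_id, request_id).
    node_levels maps each criterion to True when the node lies in the
    favorable interval (for example "High" energy). The +1/-1 scoring and
    the field names are assumptions made for illustration.
    """
    key = (packet["source_id"], packet["request_id"])
    if key in seen:
        return None              # duplicate: drop it, which also prevents loops
    seen.add(key)                # remember it so it is never processed again
    points = sum(1 if ok else -1 for ok in node_levels.values())
    packet["reward"] += points   # accumulate into the packet's reward field
    return points
```

Calling the function twice with the same packet returns a point total the first time and `None` the second time, leaving the reward field unchanged on the duplicate.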

$$V^{\pi}(s_{t}) = r_{t} + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \gamma^{3} r_{t+3} + \cdots = r_{t} + \gamma V^{\pi}(s_{t+1})$$
(3)

where \(V^{\pi}(s_{t})\) is the reward value for each sensor node, \(r_{t}\) is the reward at time t, and \(\gamma\) is the discount rate, which is set to 0.9 in our algorithm.
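The recursive form of Eq. (3) suggests evaluating the value backwards along a finite path. A minimal Python sketch with the stated discount rate of 0.9 follows; the reward sequences used with it are illustrative only.

```python
GAMMA = 0.9  # discount rate stated in the text


def discounted_value(rewards, gamma=GAMMA):
    """Evaluate V(s_t) = r_t + gamma * V(s_{t+1}) over a finite reward sequence."""
    value = 0.0
    for reward in reversed(rewards):   # start from the last hop and work back
        value = reward + gamma * value
    return value
```

For three consecutive rewards of 1, this gives 1 + 0.9 + 0.81 = 2.71, so rewards collected on later hops count for less than those collected early on the path.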

The intermediate node then increments the hop count field and forwards the path request packet, with the stability parameters entered, to its parent nodes in the DVB. It also extracts the data inside the packet and stores it as a new entry in its reverse-path table. This information is used to create a reverse path through which the response to the request can later be returned.

As soon as a new reverse path is created, a timer is set for it; if the timer expires before the response to the path request packet returns, the created path is deleted. This process continues until the path request packet arrives at the sink. Based on the total reward field, the sink chooses the best path (the one with the highest reward), which will be a stable path. The sink then produces a path declaration packet to notify all sensor nodes of the most stable path, so the sensor nodes on the reverse path become aware of the path to “S” (one benefit of path discovery by “A” is that the intermediate nodes also learn the path). For nodes that forwarded the path request packet but are not on the reverse path, the reverse path to “A” is removed once the timer expires.
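The sink's selection rule and the timer-based cleanup of reverse paths can be sketched as follows. This is a minimal Python illustration under assumed data structures: candidate paths are (path, total reward) pairs, and each reverse-path entry carries a hypothetical `expires_at` timestamp.

```python
def choose_stable_path(candidates):
    """Pick the candidate path with the highest accumulated reward.

    candidates is a list of (path, total_reward) pairs collected at the sink.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]


def purge_expired(reverse_paths, now):
    """Keep only reverse-path entries whose timer has not yet run out."""
    return {dest: entry for dest, entry in reverse_paths.items()
            if entry["expires_at"] > now}
```

A node would call `purge_expired` periodically, so that entries created while forwarding a request disappear if no response arrives in time.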

3.2 Route Keeping Phase

Because sensor nodes in this type of network may turn off when their battery level drops, the network structure changes from time to time. The protocol must be able to handle this problem.

Each node in the network periodically broadcasts a “Hello” message containing its identifier and location information to its neighbors. The neighbors are expected to respond to this message; if a neighbor does not respond, the sender concludes that the neighbor is out of range and that the connection no longer exists.

This information can be used to clear the paths which are no longer working.
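The Hello-based liveness check can be sketched as a simple timeout test. In this Python sketch the timeout value and the `last_reply` bookkeeping (neighbor ID mapped to the time of its last response) are assumptions; the text does not specify them.

```python
def lost_neighbors(last_reply, now, timeout):
    """Return the neighbors whose last Hello reply is older than timeout seconds."""
    return sorted(n for n, t in last_reply.items() if now - t > timeout)
```

The neighbors returned by this check are the ones whose paths should be cleared from the routing table.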

Each node, such as “I”, keeps, for each destination, a list of the neighbors that have sent a packet to that destination through “I” in the past ΔT seconds; this list is called the “active neighbors” of “I”. Node “I” has a table keyed by the sink address, whose entries also hold the next node on the way to the sink and the active neighbors list.

Whenever one of the neighbors of “I” goes out of reach, node “I” checks its routing table to find the nodes that were using the lost node in their paths. All of these nodes are notified of the invalid path; the active neighbors in turn inform their own active neighbors, and the process continues until all dependent paths have been removed from every routing table.
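The cascading invalidation can be sketched as a queue of notifications that follows the active-neighbor lists. The Python sketch below assumes a hypothetical table layout in which each node maps a sink address to its next hop and its set of active neighbors; it illustrates the propagation idea rather than the exact message format.

```python
from collections import deque


def invalidate_routes(tables, start, lost):
    """Propagate a route error from `start` after its neighbor `lost` vanished.

    tables[node] maps a sink address to
    {"next_hop": node_id, "active": set_of_neighbor_ids}.
    Returns the set of nodes that removed at least one route.
    """
    affected = set()
    queue = deque([(start, lost)])
    while queue:
        node, broken_hop = queue.popleft()
        table = tables[node]
        broken = [s for s, e in table.items() if e["next_hop"] == broken_hop]
        for sink in broken:
            entry = table.pop(sink)
            affected.add(node)
            # Each active neighbor was routing through `node`, so notify it.
            queue.extend((nbr, node) for nbr in entry["active"])
    return affected
```

Because an entry is removed as soon as it is found broken, a node that is notified twice finds nothing left to delete, so the propagation terminates.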

4 Simulation of Suggested Method

4.1 Simulation Environment

In this research, the OPNET simulator version 11.5 [11] is used to simulate the suggested method and compare it with the IEEE 802.15.4 protocol.

The simulation environment is a 1000 × 1000 × 1000 m³ volume containing a sink node and 50 sensor nodes distributed randomly. The simulation time is 200 s and the assumed radio transmission range is 250 m.
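A comparable random topology can be generated outside OPNET for quick experiments. The Python sketch below uses the stated dimensions, node count and radio range; placing the sink at a random position and using a fixed random seed are assumptions made here for illustration, not details taken from the simulation setup.

```python
import math
import random


def random_topology(n_nodes=50, side=1000.0, radio_range=250.0, seed=0):
    """Place n_nodes sensors plus one sink uniformly in a cube and link nodes in range.

    Returns (positions, links): positions maps a node ID to (x, y, z) in
    meters, links maps a node ID to the set of node IDs within radio_range.
    """
    rng = random.Random(seed)
    positions = {"sink": tuple(rng.uniform(0, side) for _ in range(3))}
    for i in range(n_nodes):
        positions[i] = tuple(rng.uniform(0, side) for _ in range(3))
    links = {node: set() for node in positions}
    nodes = list(positions)
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            if math.dist(positions[a], positions[b]) <= radio_range:
                links[a].add(b)   # links are symmetric, as assumed in Sect. 3.1
                links[b].add(a)
    return positions, links
```

The resulting `links` map has the same shape as the adjacency information a route discovery procedure needs, so it can serve as test input for routing sketches.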

Figure 6 shows an overview of the network topology, Table 2 lists the simulation parameters and Fig. 7 shows the characteristics defined for each sensor node.

Fig. 6 View of a network topology with 50 nodes

Table 2 Simulation parameters

Fig. 7 Characteristics of each sensor node

5 Simulation Results

Figure 8 compares the average energy consumption of the network for the random broadcast scenario with the IEEE 802.15.4 standard, the fuzzy logic scenario and the proposed reinforcement learning scenario.

Fig. 8 Average network energy consumption

The Ox axis is the simulation time and the Oy axis is the rate of energy consumption. As expected, the random broadcast mode with IEEE 802.15.4 has higher energy consumption than the other two methods. The reason is that the network nodes act without awareness of their state: they send data directly to the sink node without regard to their remaining energy, whereas in the suggested method a node with low remaining energy is not selected as an intermediate node, which saves energy.

Figure 9 compares the end-to-end delay for the random broadcast scenario with the IEEE 802.15.4 standard, the fuzzy logic scenario and the reinforcement learning scenario.

Fig. 9 End-to-end delay

The Ox axis is the simulation time and the Oy axis is the end-to-end delay. As shown, the delay increases in random broadcast mode with the IEEE 802.15.4 standard, because some network nodes may send part of the data and then lack the energy to continue sending and complete the transfer.

In the proposed method, all data are transferred through high-bandwidth nodes that are close to the sink, and the selected node is the one with the highest energy level. Because this node has more energy than the other nodes in the cluster, it is not expected to turn off before the data transfer ends, so the transmission delay is reduced.

Figure 10 compares the medium access delay for the random broadcast scenario with the IEEE 802.15.4 standard, the fuzzy logic scenario and the reinforcement learning scenario.

Fig. 10 Medium access delay

The Ox axis is the simulation time and the Oy axis is the medium access delay.

As shown, the delay increases in random broadcast mode with the IEEE 802.15.4 standard, because some network nodes may send part of the data and then lack the energy to continue sending, or because reduced bandwidth exposes the transmission to noise and data are lost.

Figure 11 compares the probability of successfully sending data to the sink node for the random broadcast scenario with IEEE 802.15.4, the fuzzy logic scenario and the suggested method. The Ox axis shows the simulation time and the Oy axis shows the probability of successful delivery to the sink. As shown, in random broadcast mode the percentage of successful transmissions is low and the network throughput is reduced, because some network nodes may turn off during the data transfer, leaving the transfer incomplete. In the suggested protocol, after the routing operation, the nodes on the path with the highest energy are selected. Once the stable path has been discovered, it can be relied on to exist until the end of the data transmission phase, and the energy of the nodes on the chosen path will not run out early. The stable path therefore does not change until the end of the data transmission phase, so more packets are delivered to the sink node in the proposed method.

Fig. 11 Probability of success in sending data to the sink node

Figure 12 compares the signal-to-noise ratio for the random broadcast scenario with IEEE 802.15.4, the fuzzy logic scenario and the reinforcement learning scenario. The Ox axis shows the simulation time and the Oy axis shows the signal-to-noise ratio. According to Fig. 12, the IEEE 802.15.4 protocol has a lower signal-to-noise ratio than the proposed protocol. Because the IEEE 802.15.4 protocol may use unstable paths, the number of erroneous bits increases during transfer and the signal-to-noise ratio drops. The signals transmitted by the IEEE 802.15.4 protocol may also be lost through congestion and interference, and the likelihood of noise increases. The quality of the sent data, both multimedia and text, is therefore reduced.

Fig. 12 Signal-to-noise ratio

Figure 13 shows the data packet error rate.

Fig. 13 Error rate of data packets

The Ox axis shows the simulation time and the Oy axis shows the error rate in the data packets. According to Fig. 13, the IEEE 802.15.4 protocol has a higher error rate than the suggested method, because it may use nodes with less energy for data transfer; these nodes may be exhausted and shut down during transmission, so the packets cannot reach the sink node and more bits are lost than in the suggested method.

6 Conclusion

This paper considered three approaches to the routing problem in IoT networks, represented by three scenarios. In one scenario, all nodes are trained using fuzzy logic so that they can determine the stability of each link in the network and, finally, the path stability rate, from stability input variables such as distance to the sink, battery energy level and available bandwidth. In another scenario, fuzzy logic and reinforcement learning are used jointly to train the nodes to estimate optimal paths. The remaining scenario uses sensor nodes randomly distributed in a WSN according to the IEEE 802.15.4 protocol, a standard and highly flexible protocol with a low data rate, low energy consumption and low construction cost. This protocol is suitable for real-time applications [12]. We conducted several experiments to verify the performance of our algorithms. The simulation results indicate that the suggested method performs better than the fuzzy logic approach and the IEEE 802.15.4 protocol in terms of end-to-end delay, rate of data delivered to the sink, bit error rate in data packets, signal-to-noise ratio and energy consumed.