1 Introduction

The Internet of Things (IoT) has emerged as a powerful technology that is changing modern society. IoT enlarges the horizon of possible services and applications by connecting the cyber-physical infrastructure with complex wireless systems. This connection enables remote data processing, adding more intelligence to data collection. Moreover, the growth of IoT research brings more confidence to developers, contributing to the success of IoT in many different areas, such as Industry 4.0, Smart Cities, Smart Homes, and Smart Agriculture [1,2,3,4].

IoT devices have hardware constraints and well-known battery limitations, making energy consumption a considerable concern [5]. Communication, however, dominates energy consumption, even when considering the various processing tasks that IoT devices must execute. With multi-hop communications, this becomes even more critical, as the traffic generated by any node requires multiple wireless medium accesses until it finally reaches the destination. Thus, saving energy is key to extending the expected lifetime of participating devices.

Following the typical IoT architecture, data collection and transmission between devices or between devices and a central entity represent a fundamental task. Hence, communication is intrinsic to IoT, and energy-saving alternatives must take multi-hop transmission into account. In this direction, the mobile agent (MA) paradigm emerges as an efficient data collection and transmission approach in IoT [6,7,8,9,10,11,12].

A mobile agent can be defined as a computer code dispatched by a central entity, which could be the network gateway, to the network. Network devices use this code to perform specific tasks, e.g., data fusion and aggregation. Using MAs, we can change the behavior of devices by temporarily installing different codes on them.

Thus, rather than performing data aggregation only at the gateway, the MA can save network energy by compressing raw data at each device before forwarding it [13,14,15]. This even allows the MA to discard irrelevant data, preventing its transmission through the wireless network to the gateway for remote processing. To complete the assigned tasks, the MA must visit different network devices following a predetermined itinerary computed using an optimized algorithm (itinerary planning) [16].

The MA itinerary is composed of source and intermediate nodes. We call source nodes those holding the desired information that needs to be collected and processed by the MA. Intermediate nodes, in contrast, are those used to forward the MA through the multi-hop network. The MA itinerary can be computed using either static or dynamic approaches. On the one hand, in the static approach, the central entity is in charge of the itinerary computation.

Once established, the itinerary, composed of source and intermediate nodes, is registered by the gateway in the MA's code before the agent is dispatched to the network, and the MA then uses source routing at the application level [17,18,19].

On the other hand, in the dynamic approach, the MA determines the itinerary of IoT devices hop-by-hop based on the current network status. Hence, the computation of the MA itinerary reveals a tradeoff between maintaining a global network view at the central entity and requiring additional processing power at the devices. Our proposal focuses on static itineraries to keep the network devices simple, in line with their typically constrained design. Even though a MA can provide significant energy savings, the solution has some restrictions related to delays in completing the proposed itineraries. This can be a point of attention for delay-sensitive applications, which we circumvent by assuming cache deployment at the network edges.

This paper proposes Agent-Knap, a network architecture that combines MAs with opportunistic data collection and cache deployment at the network edge to reduce the energy consumption of network devices. Our proposal relies on static itineraries composed of source and intermediate nodes, and on a selection mechanism that prioritizes data acquisition based on a pre-computed data utility. The Traditional MA (TMA) migrates from the current source node to the next, ignoring the data available on intermediate nodes. These intermediate devices, however, may contain relevant data that network clients may request soon. Thus, unlike previous proposals [20,21,22,23], in which a static itinerary is computed for data collection at source nodes only, our proposal also considers opportunistic data collection on intermediate nodes within the MA itinerary. This proactive approach aims to save energy at IoT devices by reducing the number of MA dispatching rounds. In addition, Agent-Knap models the prioritization of opportunistic data as a 0-1 knapsack problem. This model computes the knapsack reward from weights inversely proportional to the freshness of the data stored in the cache located at the network gateway. Agent-Knap considers the MA payload size as the maximum knapsack capacity.

We compare Agent-Knap with the TMA, i.e., a MA without opportunistic data collection at intermediate nodes. Also, we improve a previous version of Agent-Knap by introducing the possibility of data aggregation [24]. We observe that the energy consumed for MA transmission also becomes significant beyond a specific payload size. Simulation results reveal a substantial reduction in energy consumption and network traffic. We summarize our main contributions as follows:

  • We propose Agent-Knap, an approach to collect data from IoT networks based on mobile agents (MAs).

  • We propose the use of opportunistic data gathering to reduce energy consumption by proactively collecting data at nodes located on the MA itinerary.

  • We propose a data aggregation approach to further improve energy savings by reducing the MA size and, consequently, the energy consumed for each MA transmission.

This paper is organized as follows. Section 2 overviews the related work. Section 3 presents the proposed data collection mechanism, Agent-Knap, and its data aggregation improvement. In the following, Sect. 4 describes the simulation environment and the results achieved. Finally, Sect. 5 concludes this work and presents future directions.

2 Related work

Efficient data collection in sensor and IoT networks has been the subject of many recent works [7, 8, 11, 21, 25], in which authors seek new approaches to reduce energy consumption and network traffic. Gavalas et al. [26] propose a static itinerary mechanism for multiple mobile agents. They primarily consider energy efficiency influenced by the increase in MA size as it moves forward and aggregates data from network nodes. Unlike Agent-Knap, Gavalas et al. do not evaluate data collection on intermediate devices while the MA moves along the itinerary. As far as we know, our work advances the state of the art by using opportunistic data collection with MAs.

2.1 Network edge caching

The use of caches in IoT systems is a possible strategy to reduce the energy consumption of network devices [27,28,29,30,31,32]. Caches introduce more memory resources at the network edges and can reduce the response time, as they enable temporary storage of collected data. For example, an IoT gateway can store the data collected by local sensors. In addition to preventing new requests from being directly sent to sensor nodes, this strategy allows quicker response times for Internet clients. It is worth noting that the data gathered by devices may not vary drastically in the short term. Real IoT deployments may work well with data governed by different expiration times, which can be minutes, hours, or even days, depending on the application [33].

Zhou et al. [34] propose to store the collected data at the gateway cache. The main goal is to use the cached data to reply to upcoming requests for the same data, avoiding energy consumption at the sensor network and, at the same time, reducing the response time. The work also considers that some sub-regions of the area of interest (AoI) are more likely to have requested information than others. Thus, the authors develop a mechanism to proactively collect data from popular sub-regions. The authors identify that frequent requests can impose high communication costs that may exceed the network capacity. Thus, the authors argue that the most popular content, among all recently requested, is more likely to be requested again soon. The proposal collects the most popular data by periodically sending requests from the gateway to the sensors. From simulations, the authors show that the proposed mechanism performs well, mainly considering the number of cache hits and the energy consumed in the network.

The fundamental difference between our proposal and that of Zhou et al. [34] is that we consider opportunistic data collection. In our proposal, the MA collects unsolicited data from the network that may shortly become relevant. Zhou et al. [34] deploy proactive data collection that requires the transmission of additional messages throughout the network, which leads to increased network traffic and energy consumption. Our proposal, instead, conducts opportunistic data updates from the sensor network using MAs, achieving proactive data collection without additional request messages.

Other proposals rely on attributes like content popularity to estimate the request probability in the future. The goal is to anticipate which content should be stored in a centralized cache. Wei et al. [31] consider a caching scenario at the IoT network edge. The network comprises mobile devices and content servers positioned at the network edge. These servers aim to make content available to IoT devices with lower latency and minimize the traffic sent from devices to the cloud infrastructure. The work proposes a caching policy called SAPoC that considers, in addition to popularity, the concept of content similarity with other content previously requested by users. The main goal is to reduce the slow-start phenomenon incurred by existing history-based caching strategies. The proposal focuses on dynamic systems where devices arrive and leave the network over time. When a device requests new content, its popularity tends to be high if it holds high similarity with other cached popular content. This strategy helps predict future content popularity, contributing to proactive caching. The authors obtained high performance when comparing the proposal with other cache policies.

Our proposal follows a different strategy that does not rely on popularity estimation, since it requires high CPU processing at the edge nodes. Instead, Agent-Knap adopts a simple priority computation based on content utility for its cache strategy.

3 Data gathering using Agent-Knap

This paper proposes Agent-Knap, which can operate in two modes: with and without data aggregation. We consider a network composed of one gateway and multiple IoT devices randomly deployed in an area of interest (AoI). Each device has a set of sensors, and each sensor collects a specific data type, or content, from the environment (e.g., temperature, pressure, or vibration). Also, each device stores only the last sample collected by each of its sensors. IoT devices can be source nodes, providing the requested data, or intermediate nodes, interconnecting consecutive source nodes on the MA itinerary. The proposed architecture is centralized at the gateway, our central entity, which runs itinerary planning and cache management functions besides processing all requests from and responses to network clients.

The network operation relies on data requests from external clients sent to the gateway. Each request has information about the desired content and the desired AoI. Depending on the data availability and freshness in the cache, the gateway decides whether a new data-gathering round is needed, i.e., if it needs to dispatch a new MA.

At system initialization, each device must register with the gateway and inform the list of contents it offers, i.e., the services it provides and the size in bytes of each collected data type. After the system initialization, we assume that the gateway has information about the network topology, including the geographical position of all devices, which is essential to compute the MA itinerary. We also assume that nodes exchange vicinity discovery messages at the system bootstrap.

If, however, we add dynamics during network operation, the needed information for MA itinerary computation and node vicinity could be maintained with typical updates provided by wireless routing protocols. Assuming that the devices do not move and their corresponding sensors do not change after the system initialization, topology-control information from proactive routing protocols, e.g., Optimized Link State Routing (OLSR) and Routing Protocol for Low Power and Lossy Networks (RPL), would be enough. Hence, during network operation, we would not add any extra overhead.

In summary, upon receiving a data request from an external client, the gateway decides whether a new data collection round is needed. If this is the case, the gateway dispatches a MA that follows a predetermined itinerary containing all sensors of interest, named source nodes. These nodes provide the requested data and may not be neighbors in the network topology. The MA can either concatenate the data collected from the multiple source nodes or execute a data aggregation mechanism using the data collected hop-by-hop as input. Hence, for example, assuming that a client requests the temperature of an AoI and this data is not fully available in the cache, the gateway dispatches a MA to collect the missing data. The MA can either concatenate all temperature readings collected at source nodes along the itinerary or merge all readings into one representative temperature of the entire AoI. Figure 1 depicts a traditional data collection using MAs without data aggregation. The gateway dispatches a MA in red that concatenates all the data collected at the predetermined source nodes, also in red, along the computed itinerary. Note that the MA size grows as more data is put together while it moves. At the end, in Fig. 1, the MA has a payload of size 4, which is the number of source nodes on the itinerary. The following sections detail the main features of Agent-Knap, including the proposed opportunistic data collection.

Fig. 1 Traditional data collection using MA without aggregation

3.1 Source node selection

When the gateway receives a client request, it first conducts a cache lookup for updated data. The idea is to provide a fast and complete response to the client. We assume that a complete response comprises unexpired content of a specific type from multiple devices covering the entire AoI. If the cache does not provide a complete response, the gateway starts a data collection by dispatching a MA. The gateway must collect fresh data from IoT devices to complete the non-expired information in the cache. These particular devices are referred to as source nodes, colored in red in Fig. 1.

In each round, the selection of source nodes determines the group of devices the MA must visit to complete the data in the cache. The group of source nodes is selected, taking into account the best possible AoI coverage, which guarantees a complete response regarding the data requested by the client. Hence, considering Fig. 1, the selection of source nodes determines that the devices in red are enough to provide a complete view of the entire AoI.

We model the source node selection mechanism as a classic coverage problem, the Weighted Set Cover. In our model, each sensor \(n_i\in \mathcal {N}\) is associated with a coverage subarea. A weight \(w_{n_i}\) is also associated with each sensor to assign priority in the selection process. The freshness related to each content stored at the gateway cache determines \(w_{n_i}\). The older the data of a given sensor stored in the cache, the higher the priority in the Weighted Set Cover problem.

In terms of complexity, the Weighted Set Cover problem is NP-hard. The greedy heuristic used in our proposal runs in polynomial time and achieves an \(O(\log n)\) approximation ratio.
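To make this concrete, the sketch below shows one possible greedy implementation of the weighted selection, assuming each sensor's coverage subarea is modeled as a set of AoI grid cells and the age of each sensor's cached data acts as its priority (older data, higher priority); names and data structures are illustrative, not the exact gateway implementation.

def select_source_nodes(coverage, cache_age, aoi_cells):
    """Greedy Weighted Set Cover: pick sensors until the AoI is covered.

    coverage[n]  -- set of AoI grid cells covered by sensor n (illustrative model)
    cache_age[n] -- age of n's data in the gateway cache; older data => higher priority
    aoi_cells    -- set of all grid cells forming the area of interest
    """
    uncovered = set(aoi_cells)
    selected = []
    while uncovered:
        best, best_score = None, 0.0
        for n, cells in coverage.items():
            if n in selected:
                continue
            gain = len(cells & uncovered)          # newly covered cells
            if gain == 0:
                continue
            # Weight favors sensors whose cached data is older (stale data first).
            score = gain * cache_age[n]
            if score > best_score:
                best, best_score = n, score
        if best is None:                           # remaining cells cannot be covered
            break
        selected.append(best)
        uncovered -= coverage[best]
    return selected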

After selecting the source nodes, the gateway can determine the correct itinerary \(\mathcal {I}\) for the MA.

3.2 Itinerary planning

The gateway must compute the itinerary, the sequence of source nodes to be visited, and those connecting them (intermediate nodes), before sending the MA to the network. The itinerary forms a closed loop starting at the gateway. The loop must visit all source nodes the gateway selects, as described in Sect. 3.1.

We assume that each device has a unique identifier on the local WSN and that the identifiers of all devices on the MA itinerary are inserted into the MA's code structure. Each device discovers its directly connected neighbors at system bootstrap and saves their identifiers. In summary, the MA forwarding process is carried out by performing the following steps (a sketch follows the list):

  • Upon receiving a MA, each device verifies the next node in the MA itinerary before forwarding it. The next node can be either a source or an intermediate node.

  • Before forwarding the MA, the device processes its code to perform the programmed tasks (data concatenation or aggregation in our case).

  • When transmitting the MA over lossy links, the transport layer manages the transmission reliability. We assume this can be done using TCP, for instance.
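The sketch below illustrates these steps from the point of view of a single device; it assumes the MA is a plain in-memory object whose process() method encapsulates the embedded code (concatenation or aggregation) and that a send_reliable() primitive provides the hop-by-hop transport reliability mentioned above (e.g., over TCP). All names are illustrative assumptions.

def handle_mobile_agent(device, ma):
    """Executed by a device upon receiving the mobile agent (MA)."""
    # 1. Look up this device in the itinerary and identify the next hop,
    #    which can be either a source or an intermediate node.
    pos = ma.itinerary.index(device.node_id)
    if pos == len(ma.itinerary) - 1:
        return                       # closed loop finished: the MA is back at the gateway
    next_id = ma.itinerary[pos + 1]

    # 2. Before forwarding, run the MA's embedded code (concatenation or aggregation).
    ma.process(device)

    # 3. Forward the MA; reliability is handled hop-by-hop at the transport layer.
    device.send_reliable(next_id, ma)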

Let \(\mathcal {N}\) be the set of IoT devices. The gateway starts the itinerary computation using the set of source nodes needed for the next gathering process, i.e., the itinerary computation uses \(\mathcal {N}_{tsp}\subseteq \mathcal {N}\) as input. In this step, we use both the Christofides and Dijkstra algorithms. The first, the Christofides algorithm, is a heuristic for the Traveling Salesman Problem (TSP), which is the problem tackled for MA itinerary computation. Its output is the sequence of source nodes that must be visited. Because these nodes may not be directly connected, we use the Dijkstra algorithm to determine the complete itinerary by computing the shortest path between consecutive source nodes.

The Christofides algorithm is a TSP heuristic that, in the worst case, guarantees a solution at most 3/2 times the cost of the optimal one [35]. The heuristic used in our proposal has time complexity \(O(n^3)\). Thus, the complete itinerary is composed of two sets: \(\mathcal {N}_{tsp}\), with all needed source nodes, and \(\mathcal {N}_{sp}\subseteq \mathcal {N}\), containing all intermediate nodes included by the Dijkstra algorithm. Note that \(\mathcal {N}_{tsp}\cap \mathcal {N}_{sp}=\emptyset \).
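A minimal sketch of this two-stage computation, written with the NetworkX package used in our simulator (Sect. 4) and assuming a recent version that exposes the christofides routine, is shown below; it is illustrative rather than the exact gateway code.

import networkx as nx
from networkx.algorithms.approximation import christofides


def plan_itinerary(G, gateway, source_nodes):
    """Compute the MA itinerary: Christofides over the source nodes (plus the
    gateway), then Dijkstra shortest paths between consecutive tour nodes."""
    stops = [gateway] + list(source_nodes)

    # Build the complete "metric" graph over the stops, weighted by
    # shortest-path distances in the real topology.
    K = nx.Graph()
    for i, u in enumerate(stops):
        for v in stops[i + 1:]:
            K.add_edge(u, v, weight=nx.shortest_path_length(G, u, v, weight="weight"))

    tour = christofides(K, weight="weight")       # 3/2-approximate TSP tour
    if tour[0] != tour[-1]:
        tour.append(tour[0])                      # make sure the loop is closed

    # Rotate the tour so that it starts (and ends) at the gateway.
    start = tour.index(gateway)
    tour = tour[start:-1] + tour[:start] + [gateway]

    # Expand each leg with Dijkstra, inserting the intermediate nodes.
    itinerary = [gateway]
    for u, v in zip(tour, tour[1:]):
        itinerary += nx.shortest_path(G, u, v, weight="weight")[1:]
    return itinerary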

3.3 Opportunistic data gathering

After computing the itinerary, the gateway must decide on the payload, filling it with data obtained only from source nodes or with data from source and intermediate nodes. Hence, the main challenge is to manage the MA payload.

The MA packet format is divided into four main parts: the MA ID, for unique identification of the mobile agent; the itinerary; the processing code, for data manipulation; and the payload, reserved for the data collected and possibly processed by the MA. The payload has a maximum size (P) in bytes and is further divided into two parts: a fixed size reserved for the data collected from source nodes (G) and a size (C) used to carry the data collected from intermediate nodes. The size of C depends on the number of intermediate nodes visited, the size of each collected data, and whether data aggregation is enabled. Even though C is variable, it is upper bounded by a maximum value determined by the gateway.

The fixed size G carries the guaranteed data from the source nodes, whereas the size C carries all sorts of data from intermediate nodes visited in the itinerary. Hence, we call this data collection opportunistic because it may be used to collect important data that was not requested at the current MA round but can be requested soon.

Each IoT device in the set \(\mathcal {N}\) contains distinct contents its sensors collect. Hence, each IoT device, \(n_i\in \mathcal {N}\), has a subset of all contents provided in the network. Let \(\mathcal {K}\) be the set of contents available in the network, then each content type \(k_j\in \mathcal {K}\) has an associated size \(s_j\) in bytes. We denote the content \(k_j\) collected at node \(n_i\) as \(k^i_j\), where \(i,j\in \mathbb {N}\) are the indexes of the devices and the available content types, respectively. Figure 2 details the format proposed for the payload. Considering the example, G is determined by the content \(k_1\) collected at four different devices, with IDs 2, 3, 8, and 12. Hence, G contains \(k^2_1\), \(k^3_1\), \(k^8_1\), and \(k^{12}_1\). C, on the other hand, is determined by four different data types, \(k_3\), \(k_5\), \(k_2\), and \(k_4\) opportunistically collected at four different intermediate devices, with IDs 1, 7, 8, and 12, i.e., \(k^1_3\), \(k^7_5\), \(k^8_2\), and \(k^{12}_4\).

Note that the guaranteed data must be of the same content, \(k_1\) in Fig. 2, requested by the client node. This is a consequence of our assumption that clients can only request data of one desirable content at each round. The opportunistic data, however, can be of any type. Also, we assume that the same data type has the same size and format.
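The sketch below captures this packet structure as a small Python data type; the field names and the keying of payload entries by (device ID, content type) are illustrative assumptions rather than the actual wire format.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MobileAgent:
    """Illustrative in-memory view of the MA packet format."""
    ma_id: int                                   # unique MA identification
    itinerary: List[int]                         # source and intermediate node IDs
    code: Callable                               # processing code (concatenate/aggregate)
    max_payload: int                             # P, in bytes
    guaranteed_budget: int                       # G, reserved for source-node data
    # Payload entries keyed by (device ID, content type), e.g. (8, "k1") -> value.
    guaranteed: Dict[Tuple[int, str], float] = field(default_factory=dict)
    opportunistic: Dict[Tuple[int, str], float] = field(default_factory=dict)

    @property
    def opportunistic_budget(self) -> int:
        """C is upper bounded by P minus the room reserved for guaranteed data."""
        return self.max_payload - self.guaranteed_budget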

Fig. 2 MA payload P with guaranteed (G) and opportunistic data (C)

To fill C with data from intermediate nodes in \(\mathcal {N}_{sp}\), the gateway models the problem as a knapsack problem. The knapsack problem solved by the gateway must consider the size C as the “backpack” capacity used to select the intermediate nodes that have the data opportunistically collected. The MA adds all the data collected to the payload without violating the maximum C size. Nevertheless, filling the payload with opportunistic data depends on whether the MA aggregates data.

The Knapsack problem is NP-complete [36]. Hence, the algorithm used in our proposal is an approximate solution with pseudo-polynomial time complexity \(O(|\mathcal {L}| \cdot C)\), where \(|\mathcal {L}|\) is the number of candidate data items and \(C\) is the opportunistic payload capacity.

The following section (Sect. 3.4) explains data aggregation in more detail.

Fig. 3 Data aggregation using the proposed Agent-Knap

3.4 Data aggregation

  In Agent-Knap, when data aggregation is enabled, all data samples of the same type are aggregated. Data samples of different types can also be collected, but instead of being aggregated, they are concatenated at the payload. For example, if a temperature request is received and a MA is dispatched to collect samples, pressure and humidity data samples can be opportunistically collected at the same round. The MA payload concatenates the aggregated temperature with the aggregated pressure and the aggregated humidity data. For the sake of simplicity, our proposal considers data aggregation with a factor of 1. Thus, same-type data aggregation does not increase the payload size used by the MA.

Figure 3a depicts an example of MA collection with data aggregation. Data samples of the same type are always aggregated in the payload, whereas aggregated data of different types are concatenated. The knapsack problem solved at the gateway must consider the data aggregation process. Hence, if the MA uses data aggregation, its initial size does not change if the samples collected are of the same type. Figure 3b illustrates the proposed data-gathering process without aggregation. Four same-type data samples collected at different source nodes in G and four different-type data samples opportunistically collected at intermediate nodes in C fill the MA payload. In this case, the payload size increases with every data sample collected at network devices.

In our implementation, the gateway must indicate whether data aggregation is active using a flag. Hence, devices on the itinerary can proceed with data aggregation or not. If data aggregation is enabled, the gateway considers the different data types available at the intermediate nodes as items for the knapsack problem. With data aggregation disabled, all data samples available at intermediate nodes in the itinerary are considered items for the knapsack problem.
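The sketch below shows how a visited device could add one sample to the payload under these rules, assuming an aggregation factor of 1 implemented as a running mean per content type; the function and entry layout are hypothetical.

def add_sample(payload, content_type, value, size, aggregate):
    """Add one collected sample to the MA payload (a list of entries).

    Each entry is [content_type, value, size_bytes, n_samples_merged].
    """
    if aggregate:
        for entry in payload:
            if entry[0] == content_type:
                # Same-type data is merged (running mean): payload size unchanged.
                n = entry[3]
                entry[1] = (entry[1] * n + value) / (n + 1)
                entry[3] = n + 1
                return
    # Aggregation disabled, or first sample of this type: concatenate a new entry.
    payload.append([content_type, value, size, 1])


def payload_size(payload):
    """Current number of bytes occupied by the collected data."""
    return sum(entry[2] for entry in payload)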

Regarding the system model, the main goal of our proposal is to save energy of IoT devices when a multi-hop MA is used to collect data. The data collected at each device is stored in the gateway cache and remains usable while its expiration timer is valid. Each stored data item has a timer and, while valid, avoids a new data collection round. Thus, we save network resources by reducing the number of requests sent to the network. Opportunistic data gathering plays a vital role in this process, as it brings more data back to the gateway, increasing data freshness.

3.5 Payload computation

Let \(A_{cache}^{k_j}\) be the corresponding area covered with non-expired data stored in the cache for content \(k_j\). The proposed Agent-Knap solves a 0-1 knapsack problem in which the solution provides the data contents the MA must collect on each intermediate device along the itinerary. The parameters to solve the knapsack problem are the following:

  • The list \(\mathcal {L}\) of all data contents that are available at the intermediate nodes of the computed itinerary

  • The priority \(p_{k_j}\) computed for the content \(k_j\)

  • The size \(s_j\) in bytes of each content type

  • The MA opportunistic content C size in bytes

For the knapsack problem, the priority computed for a specific content is numerically proportional to the contribution it provides to complete \(A_{cache}^{k_j}\) in the gateway. Thus, the lower the intersection between \(A_{cache}^{k_j}\) and the total AoI, the higher the priority for \(k_j\) in the current gathering round. Equation 1 presents the priority \(p_{k_j}\) computation for a given content \(k_j\).

$$\begin{aligned} p_{k_j} = AoI - \left( AoI \cap A_{cache}^{k_j}\right) \end{aligned}$$
(1)

The payload computation procedure determines the opportunistic content to be collected and the corresponding devices along the itinerary, i.e., the devices providing that content. This ensures that the MA payload is populated with the most valuable content provided by the devices on the itinerary \(\mathcal {I}\) without violating C.
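A minimal sketch of this payload computation is given below: the priority follows Eq. 1, expressed over illustrative area values, and a standard 0-1 knapsack dynamic program selects the opportunistic items that fit within C; the item representation is an assumption made for the example.

def priority(aoi_area, cached_area_overlap):
    """Eq. 1: p_kj = AoI - (AoI intersected with A_cache^kj), expressed as areas."""
    return aoi_area - cached_area_overlap


def select_opportunistic(items, capacity_c):
    """0-1 knapsack over the candidate items available at intermediate nodes.

    items      -- list of (device_id, content_type, size_bytes, priority)
    capacity_c -- C, the payload room reserved for opportunistic data (bytes)
    Returns the chosen items, maximizing total priority without exceeding C.
    """
    n = len(items)
    # best[i][c]: maximum priority using the first i items with capacity c.
    best = [[0.0] * (capacity_c + 1) for _ in range(n + 1)]
    for i, (_, _, size, prio) in enumerate(items, start=1):
        for c in range(capacity_c + 1):
            best[i][c] = best[i - 1][c]
            if size <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - size] + prio)

    # Backtrack to recover the selected items.
    chosen, c = [], capacity_c
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(items[i - 1])
            c -= items[i - 1][2]
    return chosen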

3.6 Cache update

In Agent-Knap, the gateway cache is always updated with data collected by the MA. The cache has enough memory to store the data produced by all sensors from all devices within the AoI. Nevertheless, when data aggregation is enabled, the MA merges same-type data from different sensors to calculate a single value, which can be the mean, maximum, or smallest value, depending on the aggregation approach used by the MA. In this case, the gateway loses individual measurements as the MA does not transfer them along the itinerary.

Hence, when aggregation is enabled, the data stored in the gateway's cache for each sensor does not match the real value measured by the device. Instead, it is the aggregated value of all nodes in the itinerary. Thus, there is an error between the real value and the value stored in the cache when data aggregation is on. Nevertheless, when data aggregation is disabled, the MA maintains the exact value measured by each sensor, keeping it in the gateway's cache. There is then a clear tradeoff between MA size and measurement precision when using data aggregation.

All timing information is provided at the gateway to avoid synchronization issues between the gateway and the IoT devices. Thus, Agent-Knap does not require synchronization, as caching updates are exclusively handled by the gateway. Therefore, the gateway updates the cache and the corresponding timing information upon the MA arrival for each collected sample. Timing information could also be produced at the devices, providing even more precision (or accuracy) to the system. This, however, would require a more complex system design for time synchronization across the entire network, which is not our goal in this paper.
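The sketch below illustrates this gateway-side cache handling, assuming per-entry timestamps taken when the MA returns and a single expiration time shared by all contents; class and field names are illustrative.

import time


class GatewayCache:
    """Gateway cache: timing is handled only at the gateway, so no network-wide
    time synchronization is required."""

    def __init__(self, expiration_time):
        self.expiration_time = expiration_time
        self.entries = {}                      # (device_id, content_type) -> (value, t)

    def update(self, device_id, content_type, value, now=None):
        """Called for each collected sample when the MA arrives back at the gateway."""
        self.entries[(device_id, content_type)] = (value, now or time.time())

    def lookup(self, device_id, content_type, now=None):
        """Return the cached value if it has not expired, otherwise None."""
        entry = self.entries.get((device_id, content_type))
        if entry is None:
            return None
        value, stamp = entry
        if (now or time.time()) - stamp > self.expiration_time:
            return None                        # expired: a new gathering round is needed
        return value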

4 Simulation results and discussions

In this section, we evaluate the performance of the proposed Agent-Knap using simulations. We have implemented a simulator using Python and the NetworkX package. The algorithms used in our proposal to solve Knapsack, Weighted Set Cover, and Traveling Salesman problems are polynomial-time approximate algorithms.

4.1 Simulation setup

The topology consists of fixed devices randomly positioned in a geographical area. The number of devices is always enough to guarantee complete coverage of our AoI. Each device has a fixed number of sensors and, consequently, can collect different contents from the AoI.

We simplify the analysis by considering a few assumptions. First, we consider that all devices have the same initial battery level. We also assume that the different content types have the same size, that all network devices have four different sensors (one sensor for each type of content), and that the number of different contents available in the network is four. Finally, the sensing range of each sensor is the same for the entire network.

In this simulation, we represent a remote monitoring application that can benefit from analyzing historical data collected from sensing devices. Considering wide and flat areas, this strategy would fit scenarios such as agriculture, where data sensing is useful for irrigation, soil, and nutrient management. This scenario requires devices spread across greenhouses or plantations to collect data such as humidity, temperature, light, and soil characteristics [4].

Our application considers Internet clients sending request messages to the gateway. Such requests arrive according to a Poisson process with rate \(\lambda =5\), i.e., with exponentially distributed inter-arrival times of mean \(1/\lambda \). In our plots, the expiration time is a multiple of \(1/\lambda \): we start with an expiration time 30 times larger than \(1/\lambda \) and increase it until it becomes 180 times greater than \(1/\lambda \).
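The snippet below sketches how this workload can be reproduced, assuming Python's random.expovariate to draw exponential inter-arrival times (a Poisson arrival process with rate lambda = 5) and expiration times expressed as multiples of 1/lambda; the step of 30 between multiples is an illustrative choice, not necessarily the exact set of values simulated.

import random

LAMBDA = 5.0                                   # request arrival rate
EXPIRATION_MULTIPLES = range(30, 181, 30)      # illustrative steps from 30/lambda to 180/lambda


def generate_requests(sim_duration, lam=LAMBDA):
    """Poisson arrivals: exponential inter-arrival times with mean 1/lambda."""
    t, arrivals = 0.0, []
    while True:
        t += random.expovariate(lam)
        if t > sim_duration:
            return arrivals
        arrivals.append(t)


# Example usage: one simulation run of total duration 400/lambda (Sect. 4.1).
requests = generate_requests(400 / LAMBDA)
expiration_times = [m / LAMBDA for m in EXPIRATION_MULTIPLES]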

In addition, the link cost between devices is proportional to the Euclidean distance between them. Each device can communicate with neighbors inside its communication range, and the network gateway has a fixed position at the center of the AoI. Table 1 shows the parameters used in all simulations. The energy consumption values are the same as those adopted in [37] to evaluate Tmote Sky sensors.

Fig. 4 Average remaining energy for each device. Experiments consider a 180-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

In addition, the required MA size calls for networking technologies that support larger packet sizes; in this work, we assume packets of up to 1,024 bytes. Note that the selected MA size is based on the related literature [15, 26].

Table 1 Simulation parameters

In all simulations, we compare our proposal with the Traditional MA (TMA) and the Client–Server multi-hop approach. TMA collects data only from the subset of source nodes needed to cover the AoI completely, i.e., the source nodes whose data is no longer valid at the gateway cache. Conversely, Agent-Knap also opportunistically collects the available data of different content types at intermediate devices. In the Client–Server approach, the MA collects data from sensor nodes in a traditional multi-hop routing fashion: in every round, the gateway dispatches a MA to a single source node and waits for the requested data, repeating the procedure until all source nodes for that round are covered. The simulations compute three different metrics: energy consumption, cache hits, and the accuracy of the collected data. This last metric is critical to evaluate the tradeoff of aggregating the collected data: if we aggregate the content, we save energy at the cost of data accuracy.

Fig. 5 Average remaining energy for each device. Experiments consider a 300-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

Fig. 6 Proportion of cache hits. Experiments consider a 180-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

All simulations are performed for a total period of \(400/\lambda \) and consider two different aggregation factors for the MA, zero and one. Thus, the MA either executes complete data aggregation of the same content type along the itinerary (factor one) or simple content concatenation (factor zero). When the MA conducts complete aggregation, there is no increase in the payload size occupied by collected data of the same type. We consider that same-type data can always be aggregated.

For each simulation round, we perform ten runs with a confidence level of 95%. We additionally analyze the impact of different MA payload sizes, i.e., two different P sizes of 60 and 200 bytes.

4.2 Energy consumption

In this simulation, we compute the average remaining energy of all nodes after each round. Both data-collecting approaches, with and without aggregation, are considered for TMA and Agent-Knap. Figure 4 shows the relationship between the average device remaining energy and the expiration time of data stored in the cache (expressed as a multiple of \(1/\lambda \)), when the network has 180 devices. Similarly, Fig. 5 shows the same result when the network has 300 devices. In all cases, the remaining energy increases with data persistence at the cache. Moreover, it is clear that data aggregation is an essential feature of Agent-Knap. The average remaining energy is lower for 180 nodes than for 300 nodes because each device receives more MA visits. Even though P has a subtle impact, it is possible to note that the remaining energy is higher for 200 bytes, as fewer MA rounds are needed when the data of interest is more persistent in the cache. Our results show that this is enough for Agent-Knap to consume less energy for larger values of P compared with the Client–Server approach and TMA, even without data aggregation.

In all cases, we also observe that data gathering with the multi-hop Client–Server approach had the lowest remaining energy compared with a MA dispatched through a pre-defined itinerary. We note that the Client–Server approach incurs, on average, more hops for a dispatched MA to reach all source nodes and return with the updated data to the gateway.

Data aggregation brings more gains because the MA size grows at a lower rate as the MA moves through the itinerary. When the aggregation is enabled, the MA size grows only when data of different content types are collected.

4.3 Cache hits

The cache-hit evaluation aims to verify the performance of the cache infrastructure fixed in the gateway, including scenarios where data aggregation is enabled or disabled. A cache hit happens when a data request is replied to by the gateway with the information available in its cache, i.e., without dispatching a MA. For example, considering a request for the average temperature on a specific AoI, one cache-hit is computed if the non-expired data in the cache is enough to cover all the AoI. In this case, the gateway can send a complete response to the client. Similarly to the previous section, we conduct analysis using 60- and 200-byte payload sizes.

Fig. 7 Proportion of cache hits. Experiments consider a 300-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

Figure 6 presents the results for 180 nodes. With Agent-Knap, a larger payload results in more cache hits, especially when data aggregation is enabled. Thus, Agent-Knap performs better with a larger payload (P).

When data aggregation is enabled, the MA can retrieve more data in each round than when the MA does not use data aggregation. This happens because the knapsack algorithm has a maximum payload size to determine which nodes belonging to the itinerary will be selected for the data-gathering process. Using data aggregation, more nodes can have their data collected by the MA because aggregated data does not increase the payload size.

Agent-Knap with enabled data aggregation and 200-byte payload size has the best result. This combination achieves 227% more cache hits than TMA with data aggregation. Figure 7 shows that the proportion of cache hits increases with more network devices. This occurs because with more nodes, more redundant data is available.

4.4 Data accuracy

We compute the root mean square deviation (RMSD) of the data gathered by the MA to evaluate the impact of data freshness at the cache and the tradeoff between data aggregation and data accuracy. We compare both data-gathering methods, TMA and Agent-Knap.

The RMSD is calculated per round after the MA returns to the gateway with fresh data, according to Eq. 2.

$$\begin{aligned} RMSD = \sqrt{\frac{1}{M}\sum _{i=1}^{M}\delta _{i}^{2}}. \end{aligned}$$
(2)

We compare valid cached data, i.e., non-expired data at the gateway's cache, with the real values registered by all sensors in the AoI, \(X_{i}\). In our proposal, the gateway always replies to an Internet client request with an aggregation of valid data. For example, if the client requests the average temperature in an AoI, the gateway first verifies whether the valid cached data covers the entire AoI. If so, the gateway immediately sends the response to the client with the calculated average of the valid cached data, \(\overline{x_{i}}\). If not, the gateway completes the information with fresh data collected by the MA and then sends the response.

The RMSD computed in every round compares the real values registered by the network sensors for a specific content in the field, \(X_{i}\), with the aggregated value computed and sent by the gateway to the client as a response. Thus, in Eq. 2, \(M\) is the number of valid cached data items in the gateway for content \(k_j\), and \(\delta _{i} = X_{i} - \overline{x_{i}}\).
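For reference, the per-round RMSD of Eq. 2 can be computed as in the short function below, where real_values holds the \(X_{i}\) of the M valid cached items and cached_response is the aggregated value the gateway returns to the client; this is a sketch consistent with the definitions above.

import math


def rmsd(real_values, cached_response):
    """Eq. 2: root mean square deviation between the real sensor readings X_i
    and the aggregated value the gateway sends to the client."""
    m = len(real_values)
    return math.sqrt(sum((x - cached_response) ** 2 for x in real_values) / m)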

Data accuracy results for simulations with 180 devices are displayed in Fig. 8 for payload sizes of 60 and 200 bytes, with and without data aggregation. Figure 9 shows the same results for 300 devices. In the simulation setup, each of the four sensors in each device measures a fixed value. These simulated values were randomly generated between 1 and 30 at system start-up.

Fig. 8 Root mean square deviation (RMSD). Experiments consider a 180-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

Fig. 9 Root mean square deviation (RMSD). Experiments consider a 300-device network, enabled and disabled data aggregation, and 60- and 200-byte payload (P) sizes

In all scenarios, Agent-Knap maintains a higher data accuracy. This is because opportunistic data gathering improves data freshness at the gateway cache. The higher number of valid samples in the cache results in a lower error in the answer sent to the client, compared with the TMA method.

Fig. 10 Average remaining energy for each device. Experiments consider a 300-device network in a lossy scenario and 200-byte payload (P) sizes

Also, an error reduction is expected when data aggregation is disabled since data aggregation has an intrinsic error.

We observed that Agent-Knap reduces the RMSD when data aggregation is disabled. For both network sizes (180 and 300 nodes) and payload sizes (60 and 200 bytes), the error reduction is evident when we use Agent-Knap with aggregation. This reduction indicates that the data freshness provided by Agent-Knap also contributes to data accuracy at the gateway cache. The results using Agent-Knap reveal a tradeoff between data accuracy and energy consumption. Hence, enabling data aggregation depends on the IoT application, i.e., on whether it tolerates lower accuracy in exchange for lower energy consumption.

4.5 Lossy network scenario

In this simulation, we compute the average remaining energy of all nodes after each round in a lossy network. We introduced a uniform packet error rate of \(10\%\) on all network links.

Hence, if a MA loss occurs, the recovery is conducted on a hop-by-hop basis at the transport level. Note that the application inserts the MA itinerary into the MA code, as described in Sect. 3.2.

We compare the average remaining energy of TMA and Agent-Knap on both lossy and lossless scenarios. All simulations were performed on the 300 devices network topology with MA carrying 200-byte payload (P) sizes. Figure 10 shows the robustness of the Agent-Knap solution even in the presence of transmission failures. The best performance of Agent-Knap appears for the lowest expiration time, which corresponds to the scenario with the largest number of MA transmissions on the network.

5 Conclusion

We proposed Agent-Knap, a new mechanism for data collection using mobile agents with static itineraries in IoT networks. Agent-Knap improves network communication efficiency by reducing the number of requests sent to the network. To accomplish that, we proposed using opportunistic data gathering for proactive caching updates. Our simulation results have shown the impact of the proposed data collection mechanism in reducing the energy consumption of network devices. In addition, Agent-Knap improves data accuracy, especially when data aggregation is disabled.

The association of the knapsack algorithm with the device selection mechanism allowed the implementation of an intelligent data-gathering process, prioritizing the items of interest. Data aggregation improves energy savings but shows a tradeoff concerning data accuracy for Agent-Knap that must be evaluated depending on the application.

In future work, we plan to dynamically change the payload size reserved for opportunistic data and include a selection process for source nodes that considers upper-layer QoS requirements. We also plan to implement the Agent-Knap with multiple agents with dynamic itinerary planning and evaluate its performance in a low-power and lossy scenario. Finally, we would like to introduce security to our current design by assuming encrypted payloads or secure link-layer strategies.