1 Introduction

The recent advances and developments in the field of micro-electro mechanical system and wireless communications have paved a way for wireless sensor networks (WSN). WSN have come to be known as one of the significant research areas because of their abilities to change the way of interaction between human and physical world. In the last few years, WSN have turned out to be a subject matter of immense interest. WSN is a simple network with little infrastructure, consisting of a large number of tiny sensor nodes with limited energy and computational capability. In WSN, sensor nodes are densely deployed in varied environmental conditions and are used for monitoring factors like temperature, humidity, pressure etc. [1, 2].

Dense and distributed systems capable of performing more complex tasks and inferences are replacing the traditional centralized architectures at a prodigious rates. A distributed sensor network (DSN) is a collection of large number of homogenous or heterogeneous sensor nodes distributed logically, spatially or geographically over a region of interest and connected through a network [3]. The sensors continuously collect data from their environments, processes it and then transmit it through the network. The information gathered from different parts of the network is then fused together to get the final inferences [4]. Since the sensors are randomly distributed in hazardous and unreliable environments, therefore, human intervention is not much required. Sensors are self-aware, self-configurable and autonomous entities collecting data and transmitting it by themselves. Some of the important applications of WSN include: military applications, remote environment monitoring, health monitoring, home automation, and habitat monitoring etc. [1].

The need for using distributed computing in WSN can be justified with the following two points: firstly, sending of all the raw sensed data to the centralized base station consumes a lot of energy and is quite impractical for larger networks. Secondly, since sensor nodes are embedded devices, so using them merely for collecting data is not really an optimum utilization of resources.

Sensor nodes are battery powered devices having constrained computational capability and restricted sensing range, so, collaboration (i.e. correlation among spatially located nodes to reduce the total data transmitted to the processing centre), is required so that the nodes can make up for one another’s shortcomings to control the redundant flow of data in the network, thus, making it energy efficient. In order to facilitate implementation of collaborative information processing in WSN, distributed computing paradigms are required [5, 6].

Client server paradigm is one of the most popular and widespread model used for distributed and collaborative computing. In this model, server offers services, resources and other information for the execution of various services, and client sends request to server for execution of service. On receiving the request, server executes the service with the help of the corresponding information contained in it. However, it faces some disadvantages like single point of failure, high bandwidth consumption, huge network traffic and reduced network lifetime.

To overcome these problems, mobile agent (MA) paradigm [7, 8] was proposed for WSN. In this model, in order to perform the allotted task, MAs migrate from node to node using the resources of the nodes. MA based model has many advantages like reduction in network load, overcoming the problem of network latency, robustness and fault tolerance. But at the same time some disadvantages like network overhead, energy issues in case of low density networks are also there [9]. An MA selectively migrates among sensor nodes by moving the processing code among the target nodes, performs local processing by efficiently utilizing resources available at the local nodes rather than transmitting entire redundant data to a central processor (sink), and incrementally fuses local information on each sensor node to reach a progressively accurate global consensus [10].

In this paper, we propose MA based data aggregation approach, in which the route is decided on the fly. We have also included remaining energy of the node as an important parameter in the cost function so that the energy consumption is balanced among the nodes. Thus, our approach is also able to balance the energy consumption among the nodes that increases the overall life time of the sensor network.

Our approach provides the flexibility in terms of deciding the impact of individual parameters included in the cost function. Thus depending upon the application’s requirement one can adjust the weight factors accordingly giving the higher or lower priority to the parameters based on the scenario. Also, if required we can adjust the weight factors of the cost function for changing environments to offer more flexibility in terms of impact of individual parameters in deciding the route of MA migration.

The paper proceeds as follows: Section 2 reviews various MA based approaches. Section 3 gives the network preliminaries. Section 4 explains the proposed approach for data dissemination. Section 5 presents the simulation results and discussions. Section 6 is devoted to the conclusion of the paper.

2 Related Work

Discovering an optimal itinerary for MA traversal is a significant research issue for the effective and efficient data collection from numerous sensor nodes in mobile agent based sensor network [11]. The sequence of nodes in an itinerary has a crucial impact on the quality and accuracy of data fusion which eventually influences the major purpose of WSN, in applications like target tracking or environment monitoring. Itineraries can be categorized in two ways: firstly, whether they are static or dynamic; secondly, whether they use a single agent or multiple agents. Itineraries can be planned either statically or dynamically [11]. Static itineraries are calculated at the processing centre before MA is dispatched while dynamic itineraries are calculated on the fly after MA has been dispatched. Even though dynamic itineraries are quite flexible to changing topologies and are best suited for varying environmental conditions, calculating them result in added computational time and energy consumption as compared to their static counterparts. Itinerary planning can use either a single agent or multiple agents for the network traversal. In single agent based itinerary planning, only one MA traverses the entire network and concludes the task of data fusion; whereas in multi-agent itinerary planning, multiple MAs are dispatched, each MA is assigned a group of source nodes and has an itinerary for traversal to complete the task of data fusion.

In [12], authors have presented two approaches namely: Local Closest First (LCF) in which the MA looks up for the next node having the least distance from the current node; and Global Closest First (GCF), where the MA searches for the next node having the smallest distance from the PE. An approach quiet similar to LCF has been proposed in [13] called Mobile Agent-based Directed Diffusion (MADD). MADD differs from LCF only in the selection of the first source node; rather than choosing the nearest node from the PE it starts by selecting the farthest node as the initial one. However, all these approaches take into consideration only the spatial locations of the nodes and thus are not energy efficient. A Genetic Algorithm (GA) [14] based approach does not require finding of any stating node for the algorithm to execute; instead it chooses any active node as the initial node. As, each node has to report its status for maintaining global information at the PE, so GA incurs a lot of communication overhead. Two energy efficient approaches Itinerary Energy Minimum for First-source-selection (IEMF) and Itinerary Energy Minimum Algorithm (IEMA) are presented in [15]. IEMF selects the first source node as the one whose subsequent itinerary has lowest estimated energy cost amongst other itineraries, and then it uses LCF approach to plan the remaining itinerary. IEMA iterates IEMF k times to further enhance the energy efficiency.

3 Network Preliminaries

3.1 Basic Assumptions

Following assumptions have been taken about the sensors and sensor networks in our proposed algorithm for the mobile agent migration:

  • Sensor nodes are localized and their location is known in form of Cartesian coordinates.

  • Each node has at least two neighbor nodes and deployment is dense.

  • Clocks of all sensor nodes are locally synchronized.

  • We have assumed 2D plane for the simplicity.

  • Energy level at the time of deployment of each and every node is same and known.

3.2 Network Model

In our proposed methodology we have modelled the network as a graph G(V, E) where V is the set of static sensor nodes, \(V = \{ v1,v2, \ldots vn\} ,\) and E is the set of wireless links \(eij\) between nodes, \(E = \{ eij = \{ vi,vj\} |vi,vj \in v,i \ne j\}\). The network consists of N sensor nodes, scattered in a rectangular area A having Gaussian distribution. The PE denoted by \(v_{0}\) is the start and end points of MA migration route. It is assumed that all sensor nodes are resource constrained especially in terms of energy and bandwidth except for the PE. The sensor nodes are aware of their geographical location in the form of (x, y) coordinates and also aware of their remaining energy. Each node has maximum transmission range R, where they can recognize their neighbor nodes presence in every time intervals. Each sensor node broadcasts a beacon frame including the amount of remaining energy, the geographical location, and the number of times it was visited by MA. It is supposed that the current source node being met by MA is denoted by \(v_{i}\). The candidates of next source node are denoted by \(v_{j}\) as one of them will be selected as the next hop among the source nodes. The MA only passes through intermediate nodes between current and next source nodes.

3.3 Energy Model

We use the same energy model as stated in [15]. The communication energy at a node is computed by:

$$e(S_{rx,} S_{tx} ) = m_{rx} \cdot S_{rx} + m_{tx} \cdot S_{tx} + e_{ctrl}$$

where Srx and Stx are the size of received packet and transmitted packet respectively. ectrl be the energy consumed in exchange of control messages. mrx and mtx be the energy used in transmitting and receiving a data bit.

Let Src k−1ip and Src kip be two nodes and hop distance between them is calculated by \(H_{k - 1}^{k} = \frac{d(k - 1,k)}{R}\) ·S k−1ma is the size of MA at Src k−1ip . The energy consumed in MA traversal from one node to another node is computed as:

$$E_{k - 1}^{k} (S_{ma}^{k - 1} ) = m_{p} \cdot S_{data} + e(0,S_{ma}^{k - 1} ) + H_{k - 1}^{k} \cdot e(S_{ma}^{k - 1} ,S_{ma}^{k - 1} ) + e(S_{ma}^{k - 1} ,0)$$

where, mp · Src k−1ip be the data processing energy at Src k−1ip . e(0,S k−1ma ) is the energy required by Src k−1ip to transmit MA. \({\text{H}}_{k - 1}^{k} \cdot e(S_{ma}^{k - 1} ,S_{ma}^{k - 1} )\) is the energy consumed in intermediate sensor nodes between Src k−1ip and Src kip · e(S kma ,0) is the energy required by Src kip to receive MA.

So, the total itinerary energy is computed as:

$$E_{itinerary} = H(t,Src_{ip}^{1} ) \cdot e(S_{ma}^{0} ,S_{ma}^{0} ) + \sum\limits_{k = 2}^{n} {E_{k - 1}^{k} (S_{ma}^{k - 1} ) + } H(t,Src_{ip}^{n} ) \cdot e(S_{ma}^{n} ,S_{ma}^{n} )$$

4 Proposed Approach

Once MA is dispatched from PE it migrates to the nearest node with minimum cost calculated as per the cost function equation. After reaching at first source node, MA aggregates the data at that node. After that MA finds the next candidate for source node.

Firstly the entire one-hop neighbors will be checked using the cost function to designate the least costly candidate node. Second the difference of information gain for this candidate node with the current value of information gain is calculated. If difference is greater than the predefined threshold value, then MA will migrate to that node and perform the data aggregation. Otherwise MA will migrate to the nearest node two hops away from the current node for which data is aggregated.

Because MA has no prior knowledge regarding the network topology so that it first migrates to nearest neighbor at one hop distance that has not been selected; after that MA moves to next closest neighbor decided according to cost function. After arriving at node two hop away, the threshold for information gain is checked. If difference does not satisfies the threshold value MA will migrate to nearest node four hops away from the current node. This process of searching is continued in all 2n hops nodes until a most informative node is found as the next source node. Upon finding the informative node MA again starts to aggregate the data and finds next node in all one hop neighbors. Finally MA will migrate to PE, when it reaches to a desired level of accuracy or visits all the desired nodes.

Decision of selecting the best candidate node for the next hope is done by cost function Cij.Cost function represents the cost of migration of MA from current source node vi to next candidate source node vj. Cost function basically have the tradeoff between increasing benefits and reducing the losses. Motive of cost function is to decrease the energy consumption and increase information gain and network life time. Therefore the cost function tries to increase the probability of selecting the candidate node with the lowest cost among all the candidate nodes for the next hop migration for MA. Cost function Cij for migrating an MA from vi to vj have been defined as:

$$C_{ij} = a\left( {1 - \frac{{I_{j} (x,y)}}{{I_{max} }}} \right) + b(N_{visit} + 1)\left( {\frac{{e_{j} }}{{e_{max} }}} \right) + c\frac{{E_{ij} }}{{E_{max} }}\quad 0 \le a,b,c \le 1,N_{visit} \ge 0$$

where,

Ij(x,y): information gain of candidate node vj for MA migration.

Imax: maximum information gain of the nodes.

Eij: amount of energy consumed in migration of MA from vi to vj.

Emax: maximum energy required for migrating the MA from one node to another.

ej: remaining energy level of the node vj.

emax: maximum initial energy level of the nodes at the time of deployment.

Nvisit: number of visits of node vj by an MA. Number of times that a node can be visited by MA has been kept limited, because frequent selection of node vj as the source node may cause more energy loss of a single node and hence may reduce network life time.

a, b, c are the constant weight factors given to the parameters information gain, remaining energy and migration energy respectively.

figure a

5 Results and Discussion

In the simulation of our proposed scheme we have considered all ways of energy consumption including both communication and computational costs. First, impact weight factor ‘a’ of information gain used in the cost function on the performance of our approach have been evaluated. Then we have calculated Aggregation Ratio and End-to-End Delay. And finally we have compared our scheme with the model proposed by Xu–Qi’s [7] by varying number of nodes from 100 to 400. Comparison has been done based on the parameters average remaining energy and average end-to-end delay (Table 1).

Table 1 Simulation parameters

The metrics used for the purpose of evaluation are as following:

  • Aggregation ratio: is the precision of aggregated data by MA. If MA visits all the nodes in the network, precision will be 1.

  • End-to-end delay: refers to the time taken by MA from its dispatch by PE to the time it finally returns.

  • Residual energy: A node having the highest residual energy among the other nodes in its neighborhood will be chosen. In order to prolong the network lifetime, a node with lower residual energy will not be selected as the subsequent node for MA migration.

  • Migration cost: It is the cost of migration of MA from current node to the subsequent node. This cost function is a tradeoff between decrease in energy consumption and increase in information and network life time.

  • Information gain: If a source node senses the data all the nodes in that locality may sense the same data with nominal differences. So, instead of migrating the MA to adjacent nodes, migrating it to a far away node helps to accomplish the higher degree of information gain. Thus information gain is directly associated with the nodes distance.

Figure 1 shows the impact of hop count on migration time. It can be seen that with the increase in migration time, the hop count increases i.e. MA migrates in the network for more number of hops and its distance from the MA dispatching unit is more. For 5 hop count, migration time of MA is around 25 ms, which means that MA has been migrating in the network from 25 ms and has traversed 5 hops away from the dispatching unit. Similarly it goes for the other values also. The migration goes on until the desired information gain has been achieved or the energy of the network gets exhausted. However, in extreme cases, the migration time can lowered to 50 ms (hop count between 10 and 15) if the event has occurred in a close proximity of the dispatching unit. Further to evaluate the proposed MA itinerary planning algorithm: aggregation ratio, energy usage and delay have been used as performance metrics.

Fig. 1
figure 1

Migration time versus hop count

Figure 2 shows the impact of weight factor on aggregation ratio. In this experiment, we have varied the value of weight factor ‘a’ in the range 0–1 and analysed its effect on the aggregation ratio. It can be observed from the figure that there is a direct relationship between the information gain and the weight factor ‘a’ used in the cost function.

Fig. 2
figure 2

Aggregation ratio versus weight factor ‘a’

Therefore on increasing the value of weight factor ‘a’, the effectiveness of information gain on the cost function is enhanced and consequently the data aggregation is performed with higher precision.

Figure 3 shows the impact of weight factor on end-to-end delay. In this experiment, we vary the value of weight factor ‘a’ from 0 to 1 and analyse its effect on end-to-end delay. There are basically two related points: firstly information is gained by increasing weight factor ‘a’; second, more information is gained by visiting farther nodes from current source node. Considering these points the MA migrates to farther nodes by increasing the value of weight factor ‘a’. Consequently, end-to-end delay will be increased.

Fig. 3
figure 3

Average end-to-end delay versus weight factor ‘a’

Figure 4 shows the comparative analysis of average percentage of remaining energy. In the next phase of simulation we have compared our scheme with the Xu–Qi’s model [7] based on the average remaining energy percentage. In our simulation we vary the number of nodes from 100 to 400. Since our proposed algorithm selects the next node for MA migration which consumes comparatively less power for transmission of MA. Thus our algorithm balances the energy consumption among the source nodes. Also the desired accuracy is achieved by visiting limited number of source nodes, hence our algorithm can save approximately 40 % energy as compared to existing scheme.

Fig. 4
figure 4

Remaining energy percent

Figure 5 shows the comparative analysis of end-to-end delay. It can be seen that on increasing the number of nodes in the network the end-to-end delay will be increased. But our proposed algorithm has lower delay as compared to existing scheme. The reason for this enhancement is that our algorithm is able to find most informative route by traversing comparatively less number of nodes consequently MA takes less time to return to PE. Our scheme is able to reduce the average end-to-end delay by 12 %.

Fig. 5
figure 5

End to end delay

Thus simulation results verify the practicability of the proposed algorithm. It is clear from the results shown above that our proposed scheme improves the model proposed by Xu–Qi in terms of energy consumption, end-to-end delay and aggregation precision ratio while balancing energy consumption among nodes thus increasing the overall life time of the WSN.

6 Conclusion

In this paper, the authors’ have presented an energy efficient data aggregation approach using MA for WSN. Unlike the previously proposed schemes where, source node selection process is based upon optimum value of cost function (trade-off between energy consumption and information gain), in our approach the cost function has been improved such that highest possible information gain can be achieved while keeping the energy consumption minimum, thereby increasing the accuracy of aggregated data. Simulation results verify the viability of our approach. It is clear from the results that our proposed approach performs better than earlier schemes in terms of energy consumption, end-to-end delay, aggregation precision ratio and network lifetime.

Future work includes extending the proposed work to support the multiple cooperative agents to further enhance aggregation precision and reduce delay in MA migration. Also, by considering the security and reliability parameters in the cost function, data dissemination process could be made more secure and reliable.