1 Introduction

Data centres, server farms and clouds are distributed systems consisting of myriad computing resources that are interconnected via a network and coordinate their actions, transparently to users, to accomplish various tasks [1]. Such systems are difficult to manage – e.g. software updates, failed component replacements – and downtimes can cost companies in the order of thousands of dollars per minute [2]. Autonomic Computing [2, 3] drew inspiration from nature and proposed to enable computing systems to self-manage, minimising expensive and error-prone human intervention. Notably, self-healing allows systems to recover and pursue their tasks despite failures [4, 5].

The proposed demonstration presents a multi-agent simulator for exploring decentralised self-healing functions and evaluating robustness in distributed systems. Within this simulator, we model and experiment with failure-prone agents that cooperate to achieve collective data-management tasks, such as data collection from uncharted terrains [6] and data synchronisation across complex networks [7]. We evaluated different agent exploration algorithms, e.g. based on random movement, swarm intelligence and Lévy walks. In uncharted terrain environments, results show that a pheromone-based exploration approach ensures the fastest task completion, and hence greater robustness to agent failures. In complex network environments, the same pheromone-based algorithm performs best for most network topologies (e.g. Random, Community or Small World), yet random exploration performs better in topologies with large hubs – i.e. with large values of the standard deviation of the betweenness centrality of their nodes (e.g. some Scale Free or Hub & Spoke topologies).
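
As an illustration of one of these strategies, the sketch below shows a minimal Lévy-walk step generator for a bi-dimensional terrain. The class, parameter names and values (LevyWalk, mu, minStep) are assumptions made for illustration only, not the exact algorithm or settings used in the reported experiments.

```java
import java.util.Random;

// Minimal sketch of a Lévy-walk step generator on a 2D terrain.
// Parameter names and values are illustrative, not the experimental settings.
class LevyWalk {
    private final Random random = new Random();
    private final double mu = 2.0;        // power-law exponent, typically 1 < mu <= 3
    private final double minStep = 1.0;   // minimum step length

    // Draws a step length from a heavy-tailed (Pareto) distribution
    // via inverse-CDF sampling, and a uniformly random direction.
    double[] nextDisplacement() {
        double u = random.nextDouble();                                  // u in [0, 1)
        double length = minStep * Math.pow(1.0 - u, -1.0 / (mu - 1.0)); // heavy-tailed length
        double angle = 2.0 * Math.PI * random.nextDouble();             // uniform direction
        return new double[] { length * Math.cos(angle), length * Math.sin(angle) };
    }
}
```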

The present work proposes a self-healing function based on local agent replication. In short, each distributed node keeps track of agents departing for neighbouring nodes. Upon arrival at a new node, agents send a confirmation message back to the node they departed from, which consequently stops tracking them. When a node does not receive a confirmation message from a departed agent within a time-out interval, it creates a new agent and injects its local state (i.e. local data) into it. If a confirmation message arrives late (i.e. after the time-out, once a replica has already been created), the node removes the next agent that arrives at the node (after copying its data) and updates its local time-out (i.e. learning). Details and results are available in the accompanying paper.
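
The following Java sketch illustrates the node-side replication logic just described, under simplifying assumptions; all names (Node, AgentFactory) and the concrete time-out adjustment factor are hypothetical and do not reflect the simulator's actual API or the exact learning rule.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the node-side self-healing logic described above.
// All names and constants are illustrative, not the simulator's actual API.
class Node {
    private final Map<String, Long> pendingDepartures = new HashMap<>(); // agentId -> departure time
    private long timeoutMillis = 5_000;   // adaptive time-out
    private int surplusReplicas = 0;      // replicas created before a late confirmation arrived
    private Object localData = new Object(); // placeholder for the node's local state

    // Called when an agent leaves this node for a neighbouring node.
    void onAgentDeparture(String agentId) {
        pendingDepartures.put(agentId, System.currentTimeMillis());
    }

    // Called when the neighbouring node confirms the agent's arrival.
    void onConfirmation(String agentId) {
        Long departedAt = pendingDepartures.remove(agentId);
        if (departedAt == null) {
            // Late confirmation: a replica was already created, so mark one future
            // arrival as surplus and enlarge the time-out (a simple learning rule).
            surplusReplicas++;
            timeoutMillis = (long) (timeoutMillis * 1.5);
        }
    }

    // Called periodically; creates a replica for every agent whose confirmation is overdue.
    void checkTimeouts(AgentFactory factory) {
        long now = System.currentTimeMillis();
        pendingDepartures.entrySet().removeIf(entry -> {
            if (now - entry.getValue() > timeoutMillis) {
                factory.createAgentWithState(localData); // inject the node's local data
                return true; // stop tracking the presumably failed agent
            }
            return false;
        });
    }

    // Called when any agent arrives: absorb and remove it if a surplus replica is owed.
    boolean shouldRemoveArrivingAgent(Object arrivingAgentData) {
        if (surplusReplicas > 0) {
            surplusReplicas--;
            mergeData(arrivingAgentData); // copy its data before removal
            return true;
        }
        return false;
    }

    private void mergeData(Object data) { /* merge into localData */ }

    interface AgentFactory {
        void createAgentWithState(Object localState);
    }
}
```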

The simulator provides results on task success rates, completion speed and replication overheads (e.g. extra memory and communication). We believe that these findings, together with the platform itself, can support experimentation with various multi-agent solutions for a wide variety of data-intensive distributed systems.

2 Platform Purpose and Implementation

The presented simulation platform supports the development of various multi-agent data-management solutions with self-healing capabilities, and the evaluation of their performance and robustness in different distributed environments. The simulator is implemented in Java, based on the multi-agent platform in [8] – with agents implemented via a family of classes and running in separate threads. In the demonstrated scenarios, the agents are specified as in [7] in terms of exploration algorithms, data management and inter-agent exchanges. The environment is defined as another extensible family of classes that allows agents to interact (e.g. a bi-dimensional terrain or a complex network).
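
The following sketch illustrates this agent/environment split; class and method names (Agent, Environment, explore, manageLocalData) are illustrative assumptions rather than the simulator's actual classes.

```java
// Illustrative sketch of the agent/environment abstraction described above;
// names are assumptions, not the simulator's actual API.
abstract class Agent implements Runnable {
    protected final Environment environment;
    protected Object carriedData; // the agent's internal state / collected data

    Agent(Environment environment) { this.environment = environment; }

    @Override
    public void run() {          // each agent runs in its own thread
        while (!isDone()) {
            explore();           // strategy-specific: random, pheromone-based, Lévy walk, ...
            manageLocalData();   // collect or synchronise data at the current location
        }
    }

    protected abstract void explore();
    protected abstract void manageLocalData();
    protected abstract boolean isDone();
}

// Environments (e.g. a bi-dimensional terrain or a complex network) expose the
// operations agents need to move around and to meet other agents.
interface Environment {
    Iterable<Object> neighboursOf(Object location);
    Iterable<Agent> agentsAt(Object location);
    void moveAgent(Agent agent, Object from, Object to);
}
```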

Simulation metrics are defined using the Observer design pattern, which decouples simulations from the generated metric reports. These reports provide various statistics (e.g. box-plots and histograms), including the number of steps required for task completion, the number of message exchanges, task success rates, or the evolution of agent numbers over time. The simulator’s statistics module can also be extended and modified to develop custom metrics.
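
A minimal sketch of this Observer-based metric collection is given below, assuming hypothetical interface and class names (SimulationObserver, LiveAgentsMetric) rather than the simulator's actual ones.

```java
import java.util.ArrayList;
import java.util.List;

// Observers are notified of simulation events and accumulate metric data,
// keeping the simulation loop independent of the generated reports.
interface SimulationObserver {
    void onRoundCompleted(int round, int liveAgents, int messagesExchanged);
    void onTaskCompleted(int round);
}

class Simulation {
    private final List<SimulationObserver> observers = new ArrayList<>();

    void addObserver(SimulationObserver observer) { observers.add(observer); }

    void runRound(int round) {
        // ... advance all agents by one step ...
        int liveAgents = 0, messages = 0; // would be measured from the actual run
        for (SimulationObserver o : observers) {
            o.onRoundCompleted(round, liveAgents, messages);
        }
    }
}

// A custom metric only needs to implement the observer interface, e.g. tracking
// the evolution of live agent numbers over time for later plotting.
class LiveAgentsMetric implements SimulationObserver {
    private final List<Integer> liveAgentsPerRound = new ArrayList<>();

    @Override
    public void onRoundCompleted(int round, int liveAgents, int messagesExchanged) {
        liveAgentsPerRound.add(liveAgents);
    }

    @Override
    public void onTaskCompleted(int round) {
        System.out.println("Task completed at round " + round);
    }
}
```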

Fig. 1. Different simulations generated

3 Demonstration

The demonstration shows different types of simulations developed using the proposed platform. Firstly, as in Fig. 1a, we provide a simulation of failure-prone agents with different strategies for exploring a bi-dimensional terrain [6]. In Fig. 1a, the upper part shows the agents’ terrain coverage (purple traces), the middle part shows the terrain information collected (yellow marks), and the bottom part shows graphs plotting the number of live agents (which fail with a certain probability) against the simulation round number. This simulation allowed us to determine which exploration strategies are more robust to agent failures, faster in terms of simulation rounds, and lighter in terms of resource overheads.

Secondly, as in Fig. 1b, we present a simulation of agents (in yellow) collecting and synchronising data within various complex networks [7]. Locations explored by agents are in blue and locations not yet explored in red. Implemented topologies include Small World, Scale Free and Community (using JUNG [9]), as well as simpler ones such as Hub & Spoke, Lattice, Line and Circle (for testing extreme conditions). This allows us to profile the performance and dependability of different agent exploration strategies against each network topology, for different agent failure rates. Results show a correlation between these evaluation metrics and the standard deviation of the node betweenness centrality – intuitively, pheromone-based exploration techniques are hindered by topologies featuring large hubs and few alternative routes, since hubs get pheromone-marked and become temporarily inaccessible to subsequently passing agents.
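
The following sketch illustrates a generic pheromone-based next-node choice of the kind discussed above (not the simulator's exact algorithm): agents prefer neighbours without fresh pheromone marks, which explains why a heavily marked hub can temporarily block all routes passing through it. All names (PheromoneExploration, PheromoneMap) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative pheromone-based exploration step: prefer unmarked neighbours,
// mark the chosen node, and fall back to a random neighbour if all are marked.
class PheromoneExploration {
    private final Random random = new Random();

    Object chooseNext(List<Object> neighbours, PheromoneMap pheromones, long now) {
        List<Object> unmarked = new ArrayList<>();
        for (Object node : neighbours) {
            if (!pheromones.isFresh(node, now)) {
                unmarked.add(node);
            }
        }
        // If every neighbour is marked (e.g. a single large hub), the agent has no
        // unmarked route left and must fall back to a possibly marked neighbour.
        List<Object> candidates = unmarked.isEmpty() ? neighbours : unmarked;
        Object next = candidates.get(random.nextInt(candidates.size()));
        pheromones.mark(next, now);
        return next;
    }

    interface PheromoneMap {
        boolean isFresh(Object node, long now); // marked within the evaporation window?
        void mark(Object node, long now);
    }
}
```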

Thirdly, we extend the previous simulation by endowing agents with self-healing capabilities. In this case, results show that agents can successfully complete the collective task even in the presence of high failure rates (which was not the case without self-healing), while inducing limited local overheads.

4 Conclusions and Future Work

This demonstration shows an agent-based simulator for modelling distributed tasks. Agents are modelled to carry internal states, to explore their environments (either continuous surfaces or complex networks), to perform local data-management tasks, and to communicate with each other when they meet.

The main contribution of this simulator is to help design and evaluate different decentralised data-management solutions, applicable to various distributed environments, with different characteristics (e.g. diverse tasks, resource constraints, performance requirements, or agent failure rates).

The simulator collects metrics that enable statistical analyses, which are critical for profiling new agent designs. So far, this has allowed us to determine the best agent exploration strategy for performing a distributed task in different types of terrains and network topologies, with different agent failure rates.

Future work will model and simulate new strategies for recovering from node failures and corrupt data collection. Our objective is to provide a theoretical and experimental base for developing real applications for different distributed environments – e.g. data collection and replication in clouds, clusters and the Internet of Things. The source code and results obtained are available at http://www.alife.unal.edu.co/%7Eaerodriguezp/networksim/.