
1 Introduction

The typical monolithic enterprise systems we build today are difficult to scale, difficult to understand and difficult to maintain. Written in a monolithic way, these systems tend to have strong coupling between the components within a service and between services. A system whose services are tangled and interdependent is harder to write, understand, test, evolve, upgrade and operate independently. Strong coupling can also lead to cascading failures: one failing service can take down the entire system, instead of allowing the failure to be dealt with in isolation. Popular application servers (e.g., WebLogic, WebSphere, JBoss or Tomcat) encourage this monolithic model.

Microservices-based architecture is free of these problems [2, 16, 18, 24, 28]. It advocates building a system from a collection of small, isolated services, each of which owns its data and is isolated, independently scalable and resilient to failure. Services integrate with other services in order to form a cohesive system that is far more flexible than a typical monolithic system. One of the key principles of a microservices-based architecture is the decomposition of the system into discrete isolated subsystems communicating over well-defined asynchronous protocols and decoupled in time (allowing concurrency) and space (allowing distribution and mobility, i.e., the ability to move services around).

Some developers and researchers believe that the concept of microservices is a specific pattern of implementation of service-oriented architecture (SOA). However, the microservice pattern has the following unique characteristics: microservices use lightweight HTTP mechanisms for communication, they are independently deployable by fully automated machinery, and there is only a bare minimum of centralized management [24]. An enterprise service bus (ESB) is a typical software model used for designing and implementing communication between mutually interacting software applications in SOA. The ESB provides all of the routing and data transformation required to get the parts of an application talking to each other. In a microservices-based architecture there is no central unit like an ESB to do the routing. The accidental complexity is shifted from the inside of a monolithic application into the infrastructure. This is possible because we now have many more ways to manage that complexity: programmable infrastructure, infrastructure automation, and the movement to the cloud [28].

Today we have a much more refined foundation for isolation of services, using virtualization, Linux Containers (LXC), Docker, and Unikernels [15, 19]. This has made it possible to treat isolation as a necessity for resilience, scalability, continuous delivery and operational efficiency. It has also paved the way for the rising interest in microservices-based architectures, allowing you to slice up the monolith and develop, deploy, run, scale and manage the services independently of each other. The value of microservices and containers lies in how they enable smaller, faster, more frequent change [5, 14]. While cloud computing changed how we manage “machines,” it did not change the basic things we manage. Containers, on the other hand, promise a world that transcends our attachment to traditional servers, applications and application components. One might claim that they represent the fruition of the object-oriented, component-based vision for application architecture.

So how do you build a smart system from a data center filled with dumb servers? This is where tools like Google's Kubernetes [4] and the open source Apache Mesos [13] data center operating system come in. Also of note is Docker's own platform, with its Machine, Swarm and Compose tools [26]. The role of orchestration and scheduling within these container platforms is to match applications to resources. Google developed Kubernetes for managing large numbers of containers. Instead of assigning each container to a host machine, Kubernetes groups containers into pods. For instance, a multi-tier application, with a database in one container and the application logic in another container, can be grouped into a single pod. The administrator only needs to move a single pod from one compute resource to another, rather than worrying about dozens of individual containers. Apache Mesos is a cluster manager that helps the administrator schedule workloads on a cluster of servers. Mesos excels at handling very large workloads, such as an implementation of the Spark or Hadoop data processing platforms. Docker Swarm is a clustering and scheduling tool that automatically optimizes a distributed application's infrastructure based on the application's lifecycle stage, container usage and performance needs. All these container orchestration systems are monolithic applications running as daemons on dedicated nodes of the cloud. They orchestrate containers in a centralized fashion.

Decentralized orchestration systems usually offer performance improvements. For example, decentralized orchestration of composite web services yields increased throughput, better scalability, and lower response time [6]. In this paper a decentralized system for load balancing of containerized microservices is proposed. In Sect. 2 the internals of a virtualization container are analyzed. In addition to the cloud application, a container can run an extra process implementing the mobile agent intelligence. In Sect. 3 the swarm-like algorithm of container migration in the cloud is introduced. The number of containers on each host plays a role analogous to a pheromone in insect colonies or to simple transceivers mounted on autonomous robots. In Sect. 4 some preliminary experimental results for a simple cloud consisting of 18 hosts are presented. We finish with a summary and brief remarks in Sect. 5.

2 Container Internals

The startup time for a container is around a second, whereas public cloud virtual machines take from tens of seconds to several minutes, because they boot a full operating system every time. Thus the cloud industry has recently been moving beyond self-contained, isolated, and monolithic virtual machine images in favor of container-type virtualization [17, 25]. Containers introduce autonomy for applications by packaging apps with the libraries and other binaries on which they depend. This avoids conflicts between apps that otherwise rely on key components of the underlying host operating system. Containers do not contain an operating system kernel, which makes them faster and more agile than virtual machines. Container-type virtualization is the ability to run multiple isolated sets of processes, one set per application, under a single kernel instance. Having such isolation opens the possibility to save the complete state of a container (in other words, to checkpoint it) and later to restart it. This feature allows one to checkpoint the state of a running container and restart it later on the same or a different host, in a way transparent to running applications and network connections [11, 17].
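As an illustration of the checkpoint/restart capability, a container's state can be dumped and resumed from the host with the standard OpenVZ tooling. The container ID and dump file path below are placeholders, and the exact options may vary between vzctl versions, so this is only a sketch:

```python
import subprocess

CTID = "101"                       # placeholder container ID
DUMP = "/var/tmp/ct101.dump"       # placeholder checkpoint file

# Freeze the container and dump its complete state (memory, network
# connections, file descriptors) to a file ...
subprocess.run(["vzctl", "chkpnt", CTID, "--dumpfile", DUMP], check=True)

# ... and resume it later, on the same host or on another host after the
# dump file and the container's private area have been copied over.
subprocess.run(["vzctl", "restore", CTID, "--dumpfile", DUMP], check=True)
```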

In this paper container-type virtualization is used to build a swarm of tasks in a cloud. Each container, in addition to the application and its libraries, contains a separate process representing the mobile agent [1, 12, 27]. It deals with sensing the neighboring containers and initiating live migration of its container to another host. A typical modern server can run only about 10 virtual machines, but about 100 containers. Therefore the density of container-based mobile agents in a cloud can be 10 times higher. The typical migration time of a virtual machine is about 10 s, whereas a container can be migrated in a time of the order of 1 s [29]. Thus container migration is about 10 times faster. Containers can call system functions of the operating system kernel running on the server. Therefore, in principle, they can initiate container migration without the help of a separate daemon process running on the host server.

The years from 2002 to 2010 represented a time of experimentation, when two projects in particular moved the needle on virtualization containers in Linux. The VServer project patched the Linux kernel in order to split things up into virtual servers, an early version of what today we would call containers. The second project was OpenVZ, which transformed the Linux kernel so that containers could be run in production. Despite its success, OpenVZ never managed to get its containerization technology merged into the stock Linux kernel and always required a custom patch. Later, control groups and namespaces [22] were introduced, making LXC containers a functionality available within the stock Linux kernel. It thus became possible to use something that looked like a container without patching the kernel. At the time, Solomon Hykes was leading dotCloud, an infrastructure platform as a service (PaaS) company committed to applying standards in the deployment of distributed architecture for applications. They spent three years running a cloud platform in production using LXC, so they had a lot of operational experience. They learned that this technology was not practical enough, so they wrote a tool that was more stable and allowed them to deliver container-based application deployment for large-scale hybrid cloud environments. In this way the popular Docker technology, based on the libcontainer format, was born.

Docker containers cannot be live migrated between hosts; they can only be snapshotted and restored on the same or another host. The generally accepted method for managing Docker container data is to have stateless containers running in the production environment that store no data on their own and are purely transactional. Stateless containers store processed data on the outside, beyond the realm of their container space, preferably in a dedicated storage service backed by a reliable and available persistence backend. Another class of container instances are those that host storage services; a common pattern here is to use data containers. The runtime engines of these stateful services get linked at runtime with the data containers. In practical terms, this means having a database engine that runs in one container but uses a “data container” mounted as a volume to store the state. Therefore, to run a cloud hosting environment, it is important to have a distributed storage solution, like Gluster or Ceph, to provide shared mount points. This is useful if the container instances move around the cloud based on availability.

Parallels® Virtuozzo is another widely deployed container-based virtualization software for Linux and Windows operating systems. As opposed to Docker, Virtuozzo allows for live migration of containers. The results presented in this paper were obtained using an open source version of this software called OpenVZ [17]. OpenVZ is available for the Linux operating system only and runs on a custom kernel. There have been several studies on various optimizations of container migration algorithms [17]. The two best known examples are lazy migration and iterative migration. Lazy migration is the migration of memory after the actual migration of the container, i.e., memory pages are transferred from the source server to the destination on demand. In the case of iterative migration, iterative migration of memory happens before the actual migration of the container. In our experiments with stripped-down OpenVZ containers with a size of 50 MB, in a test system consisting of two nodes connected by a 100 Megabit Ethernet network, we measured the migration time seen by the host \(T = 6.61\) s and the migration time seen by the container \(\tau = 2.25\) s. The latter is three times smaller than the former due to the optimizations described above.

We have altered the OpenVZ kernel by adding a system function allowing a container to ask the host to migrate it to another host. By calling this function the container is placed in a queue in the kernel; a dedicated daemon reads this queue and migrates all the containers waiting in it. Thus containers can leave the host only sequentially. Our studies indicate that parallel migration is possible, but the performance gain is negligible: the migration speed increases only by 8 %. In addition, as shown in our previous paper dealing with two hosts only, sequential migration helps to stabilize the swarm algorithm [23].
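A minimal sketch of how the agent process inside a container might invoke such a custom system call from Python via ctypes; the syscall number and its argument convention are assumptions, since they depend on the actual kernel patch:

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)

# Hypothetical number of the custom "migrate me" syscall added to the
# modified OpenVZ kernel; the real value depends on the kernel patch.
NR_MIGRATE_REQUEST = 351

def request_migration(dest_ip):
    """Ask the host kernel to put this container into the migration queue."""
    ret = libc.syscall(NR_MIGRATE_REQUEST, dest_ip.encode())
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```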

Processes running inside an OpenVZ container have their own disk with partitions and file systems. In reality this is a virtual disk whose image is stored as a file on the physical disk of the host. This solution makes the migration of the container's data to another host very easy: only a single file needs to be copied between servers. The network of the container is isolated in a way that allows the container to have its own IP address on the network. This is not the IP address of the host, but it can be reached from the other containers and hosts. Each container maintains its own state: network connections, file descriptors, memory usage, etc. Containers share only the kernel with the host operating system. Thanks to this state isolation from other containers, a container can be migrated to another system and resumed.

When our container is launched for the first time it does not know on which host it was started. However, it can use the ICMP echo/reply mechanism to detect the IP address of the host. Each ICMP packet has a TTL (Time-to-Live) value. When the packet passes through a router this value is decreased. When it reaches 0 the packet is destroyed and an ICMP error packet is sent back. This ICMP error packet carries the IP address of the last router. Thus, to detect the IP address of the host, our container can send an ICMP echo packet with a TTL value of 1 to some arbitrary external IP address. The host system acts as a router for the container's network. When this special packet is sent by the container it will never reach its destination, but the host system will send back an ICMP error packet with its own IP address.
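A sketch of this TTL trick using the scapy library; the probe target 8.8.8.8 is an arbitrary example, and the code requires raw-socket privileges inside the container:

```python
from scapy.all import IP, ICMP, sr1   # requires scapy and raw-socket privileges

def discover_host_ip(probe_target="8.8.8.8", timeout=2.0):
    """Send an ICMP echo request with TTL=1; the host, acting as the first
    router, answers with an ICMP time-exceeded error carrying its own IP."""
    reply = sr1(IP(dst=probe_target, ttl=1) / ICMP(),
                timeout=timeout, verbose=False)
    if reply is None:
        raise RuntimeError("no ICMP reply received")
    return reply[IP].src   # source of the time-exceeded packet = host IP

print(discover_host_ip())
```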

The host system keeps all the containers' file systems mounted on its local file system. Each container's file system is visible as a folder located in /var/lib/vz/root/CID, where CID is a unique container identification number. This location can be exported through a network file system like NFS, and our container will mount it locally. To do this it needs to know the export path /var/lib/vz/root and the IP address of its host system. By counting the number of entries in this folder it can detect the other containers running on the same host and count their number N. By calling a custom system function written by us it can also check the number Q of containers queued for migration in the kernel. A container can also log in via ssh to another host and ask for these parameters there. Each container knows how many hosts H we have and knows their IP addresses. It also knows how many other containers C there are in the cloud. These numbers can be updated dynamically at runtime by probing other containers and hosts using the ICMP echo/reply protocol, either by the container itself or by the host.
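A minimal sketch of how the agent could read N and Q once the host's /var/lib/vz/root export is mounted inside the container; the local mount point and the syscall number used to read the queue length are assumptions:

```python
import ctypes
import os

HOST_ROOT_MOUNT = "/mnt/host-vz-root"   # assumed local mount point of the
                                        # host's exported /var/lib/vz/root

# Hypothetical number of the custom syscall reporting how many containers
# are waiting in the host kernel's migration queue.
NR_READ_MIGRATION_QUEUE = 352

libc = ctypes.CDLL(None, use_errno=True)

def count_containers():
    """N: one sub-directory per container in the host's root area."""
    return len(os.listdir(HOST_ROOT_MOUNT))

def count_queued():
    """Q: containers already queued for migration on this host."""
    q = libc.syscall(NR_READ_MIGRATION_QUEUE)
    if q < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return q
```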

3 Swarm Algorithm

There are many examples in biology of how complex global behaviors can arise from simple interactions between large numbers of relatively unintelligent agents. Examples of self-organized processes of natural aggregation are nest construction, foraging, brood sorting, hunting, navigation, and emigration. All involve only local interactions between individuals and between individuals and their environment. For example, ants rely only on physical contact and pheromone communication, but simple individual ant behaviors result in group behaviors that are thought to be optimal for the entire colony. Emerging technologies are making it possible to cheaply manufacture small robots with sensors, actuators and computation. Swarm approaches to robotics, involving large numbers of simple robots rather than a small number of sophisticated robots, have many advantages with respect to robustness and efficiency. Such systems can typically absorb many types of failures and unplanned behavior at the individual agent level, without sacrificing task completion [3, 7–9, 20, 21]. These properties make swarm intelligence an attractive solution also for other problem domains. In this paper we use this approach for task scheduling in a complex distributed system: the cloud.

Let us now propose a swarm-like decentralized algorithm for container migration inspired by pheromone robots [20, 21]. The proposed approach treats the containers as mobile agents and is also capable of automatic self-repair; the system can quickly recover from most patterns of agent death and can receive an influx of new agents at any location without blocking problems. Each host is described by a pheromone p which can be either repulsive \(0< p < 1\) or attractive \(p < 0\). The complete algorithm executed by a dedicated process running inside each container reads as follows:

[Algorithm listing (figure a): the migration loop executed by the agent process inside each container.]
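A minimal Python reconstruction of this loop, with p and n computed as in Eqs. (1) and (2) below; the sensing and migration helpers are passed in as callbacks (cf. the sketches in Sect. 2), and the choice of a uniformly random destination host is an assumption:

```python
import random
import time

def migration_agent(my_host_ip, host_ips,
                    count_containers,       # () -> N on this host (Sect. 2)
                    count_queued,           # () -> Q queued on this host (Sect. 2)
                    count_all_containers,   # () -> C in the whole cloud
                    request_migration,      # (dest_ip) -> None, custom syscall
                    period=5.0):
    """Migration loop run by the agent process inside each container
    (a reconstruction from the text; the callbacks are assumptions)."""
    H = len(host_ips)                       # number of hosts in the cloud
    while True:
        n = count_all_containers() / H      # equilibrium population, Eq. (2)
        N = count_containers()              # containers on this host
        Q = count_queued()                  # containers already queued here
        if N - Q > 0:
            p = (N - Q - n) / (N - Q)       # pheromone of this host, Eq. (1)
            if p > 0 and random.random() < p:    # repulsive pheromone: jump
                dest = random.choice(
                    [h for h in host_ips if h != my_host_ip])
                request_migration(dest)
        time.sleep(period)
```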

Thus the pheromone p can be viewed as the migration probability of a container. The simplest choice for p is the relative excess of the number of containers on a host:

$$\begin{aligned} p = \frac{N - Q - n}{N - Q} \end{aligned}$$
(1)

above the equilibrium value at which the containers are equally distributed between the hosts:

$$\begin{aligned} n = \frac{C}{H} \end{aligned}$$
(2)

In this case the tasks are migrated between the nodes until the number of tasks on each node is the same and equal to n. Using an analogy with physics, the nodes can be imagined to be gas containers, and the tasks running on them to be gas molecules. The network links between the nodes are tubes connecting the containers with each other. The pressure of the gas equilibrates until it is the same in each container. A similar analogy is used in a self-repairing formation of mobile agents [8]. Note that the subtraction of Q in Eq. (1) is necessary in order to avoid oscillations of the containers [23].

In realistic cloud environments, tasks often differ regarding CPU and I/O load. Hence, optimization towards an equal distribution of tasks across the available hosts is not optimal in every case. Instead of using Eq. (2), the desired number of containers of a given type n could be computed on each server separately using the Dominant Resource Fairness (DRF) algorithm [10]. For example, consider a server with 9 CPU cores and 18 GB of RAM. A container of type A needs 1 CPU core and 4 GB of RAM, and a container of type B needs 3 CPU cores and 1 GB of RAM. DRF gives \(n_A = 3\) and \(n_B = 2\), as illustrated by the sketch below. The pheromone value p from Eq. (1) then needs to be calculated separately for each container type, and there are separate migration queues Q for containers of different types.
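A minimal sketch of the progressive-filling form of DRF applied to this example; the resource capacities and per-container demands are taken from the text above:

```python
# Progressive-filling DRF: repeatedly grant one task to the user with the
# lowest dominant share until no further task fits on the server.
def drf_allocate(capacity, demands):
    allocation = {user: 0 for user in demands}    # tasks granted per user
    used = [0.0] * len(capacity)                  # resources consumed so far
    while True:
        def dominant_share(user):
            # dominant share = max over resources of (used by user / capacity)
            return max(allocation[user] * d / c
                       for d, c in zip(demands[user], capacity))
        # try users in order of increasing dominant share
        for user in sorted(demands, key=dominant_share):
            fits = all(used[i] + demands[user][i] <= capacity[i]
                       for i in range(len(capacity)))
            if fits:
                allocation[user] += 1
                for i in range(len(capacity)):
                    used[i] += demands[user][i]
                break
        else:
            return allocation                     # nothing fits any more

capacity = (9, 18)                                # CPU cores, GB of RAM
demands = {"A": (1, 4), "B": (3, 1)}              # per-container demand
print(drf_allocate(capacity, demands))            # {'A': 3, 'B': 2}
```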

4 Experimental Results

The experiments were performed on \(H = 18\) servers, each equipped with an Intel® i5-3570 Quad-Core CPU and 8 GB of RAM, connected by a dedicated 100 Megabit Ethernet network. All servers were running the Debian GNU/Linux operating system with the OpenVZ software installed. The Linux kernel was modified by adding new system functions as described in Sect. 2. At the initial time \(C = 18\cdot 17 = 306\) identical containers were launched on the first 17 hosts, while the last one was empty:

$$\begin{aligned} N_i = 18, \quad i = 1, \ldots, 17, \qquad N_{18} = 0 \end{aligned}$$
(3)

Each container has a size of about 100 MB and its migration to another host takes \(T = 16\) s. Each container runs a Python script implementing the algorithm from Sect. 3. The script starts by scanning the network using nmap to find the number of containers C, the number of hosts \(H = 18\) and their IP addresses. Then the mean number of containers \(n = 17\) is calculated and the script enters a loop in which it periodically checks the pheromone value p and decides with probability p whether to migrate to another host. In addition, on each host server a monitor program was started which periodically (with a period of 5 s) checked the number of containers N in the file system and the number of containers Q queued for migration in the kernel queue (access to this data from a user process was possible through a custom system function added to the kernel).
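A sketch of such a host-side monitor; the OpenVZ root directory is the standard one, while the syscall number used to read the queue length is an assumption (the text only states that a custom system function exposes it):

```python
import ctypes
import os
import time

VZ_ROOT = "/var/lib/vz/root"           # one sub-directory per running container
NR_READ_MIGRATION_QUEUE = 352          # hypothetical custom syscall number
libc = ctypes.CDLL(None, use_errno=True)

t0 = time.time()
while True:
    N = len(os.listdir(VZ_ROOT))                 # containers on this host
    Q = libc.syscall(NR_READ_MIGRATION_QUEUE)    # containers queued for migration
    print(f"t={time.time() - t0:7.1f} s  N={N:3d}  Q={Q:3d}", flush=True)
    time.sleep(5)                                # 5 s sampling period from the text
```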

Fig. 1. Number of containers on each host versus time.

In Fig. 1 the numbers of containers N on each host are plotted versus time t. It is seen from inspection of this plot that the containers can arrive at the destination host in parallel, thus network bandwidth was apparently not a problem during this experiment. Notice that around \(t \simeq 4\, T\) the yellow line drops below the equilibrium value of \(N = 17\); this happens because the migration process is inherently probabilistic. The migration probability is \(p = 1/18\), but sometimes more than \(p\, N = 1\) container can decide to jump to another host. Also, the containers do not move independently but interact with each other: if more than one excess container asks the host for migration, then one of them must wait in a queue until the first one leaves the host. The system reaches equilibrium and migration stops around \(t \simeq 9\, T\):

$$\begin{aligned} N_i = 17, \quad i = 1, \ldots, 18 \end{aligned}$$
(4)

Thus on average two containers arrive at the destination host during time T.

Fig. 2. Number of containers waiting for migration on each host versus time. (Color figure online)

The startup of the containers is not instantaneous. The experiment was arranged in such a way that the migration agent processes inside the containers were started in a loop, so some were started later than others. In Fig. 2 the numbers of containers Q waiting for migration on each host are plotted versus time t. Indeed, we see a small delay in entering the queue. The first container leaves the migration queue around \(t \simeq 1.5\, T\) (red line) but is deleted from the file system with some delay, only after \(t > 2\, T\) (cf. Fig. 1).

Fig. 3. Histogram of times needed to reach equilibrium.

To investigate the container self-organization process further, Fig. 3 shows a histogram of the times \(t_0\) needed to reach equilibrium, obtained from 300 runs of the algorithm. It is seen that the case discussed above is a fairly typical one.

5 Summary

In summary, the OpenVZ containerization software was used to implement a swarm of tasks executing in a cloud. Each task includes a mobile agent process which governs its migration to other nodes of the cloud. A variant of the Contained Gas Model known from self-repairing formations of autonomous robots is used. The tasks running on the nodes of the cloud self-organize to maintain a constant load among the servers. The system automatically adapts to the creation and destruction of tasks as well as to the extension of the cloud by new servers. It can easily be adapted to react to server failures: a failing server can produce an artificial pheromone by creating entries in the /var/lib/vz/root directory of its file system. This will cause all the tasks running on it to migrate away from the pheromone. The performance of the swarm-like algorithm proposed to control the containers was experimentally tested on a simple “cloud” consisting of 18 nodes.