Keywords

1 Feedback Loops in Computing Systems

1.1 Adaptive and Reconfigurable Computing Systems

Computing systems are becoming more and more dynamically reconfigurable or adaptive. The motivations for this are that, on the one hand, these systems should dynamically react to changes on their environment or in their execution platform, in order to improve performance and/or energy efficiency. On the other hand, complex systems are too large to continue being administrated manually and must be automated, in order to avoid error-prone or slow decisions and manipulations.

This trend can be observed at very diverse levels of services and application software, middle-ware and virtual machines, operating systems, and hardware architectures. The automation of such dynamical adaptation manages various aspects such as computing and communication resources, quality of service, fault tolerance. It can concern small embedded systems like sensors networks, up to large-scale systems such as data-centers and the Cloud. For example, data-centers infrastructures have administration loops managing their computing resources, typically with energy-aware objectives in mind, and possibly involving management of the cooling system. At a lower level, FPGA-based architectures (Field-Programmable Gate Arrays) are hardware circuits that can be configured at run-time with the logics they should implement: they can be reconfigured dynamically and partially (i.e. on part of the reconfigurable surface) in response to environment or application events; such reconfiguration decisions are taken based on monitoring the system’s and its environment’s features.

1.2 Autonomic Computing

Autonomic computing [50] proposes a general feedback loop structure to take adaptive and reconfigurable computing into account. In this closed loop, systems are instrumented with monitors of sensors, and with reconfiguration actions or actuators; these two have to be related by a control and decision component, which implements the dynamic adaptation policy or strategy. The loop can be defined as shown in Fig. 1 with the MAPE-K approach, with sub-components for Analysis of Monitored data, Planning response actions, Execution of these actions, all of them based on a Knowledge representation of the system under administration. Autonomic computing has now gained a large audience [54].

Fig. 1.
figure 1

The MAPE-K autonomic manager for administration loop.

Such autonomic loops can be designed and developed in many different ways, relying on techniques from e.g. Artificial Intelligence. However, an important issues remains in that it is generally difficult to master the behavior of the automated closed-looped systems with precision.

1.3 Need for Control

We are therefore particularly interested in an approach where this MAPE-K loop is considered as a case of a control loop. Then, techniques stemming from control theory can be used to design efficient, safe, and predictable controllers. Control theory provides designers with a framework of methods and techniques to build automated systems with well-mastered behavior. A control loop involves sensors and actuators that are connected to the process or “plant” i.e., the system to be controlled. A model of the dynamical behavior of the process is built, and a specification is given for the control objective, and on these bases the control is derived. Although there are approaches to the formal derivation of software from specifications, such as the B method [3], this methodology is not usual in Computer Science, where often a solution is designed directly, and only then it is analyzed and verified formally, and the distinction between the process and its controller is not made systematically.

We observe that this approach of using Control Theory methods and techniques for computing systems, although well identified [44, 77], is still only emerging. Works are scattered in very separate and dispersed efforts, in different areas of the field of reconfigurable or adaptive computing, be it at software or architecture level, in communities not always communicating with each other. Some surveys are beginning to be offered [35], some offering a classification [66], or concentrating on Real-Time computing systems [7, 22] but a synthesis of all these efforts is still lacking. The community begins to structure itself, notably around workshops focused specifically on the topic e.g., Feedback Computing [21].

There exist related works in the community on Software Engineering for self-adaptive systems presenting approaches to integrate Control Theory, typically with continuous models [34]: here we focus on the MAPE-K loop as a target for Control techniques. Also, some works exist considering different approaches to discrete control, in the sense of considering events and states (see Sect. 3.3, [56]) related to planning techniques from Artificial Intelligence e.g., [17, 28, 70] or reactive synthesis in Formal Methods or Game Theory e.g., [13, 29]. A wide overview is given in another chapter of this book [56]. In this paper we make the choice to focus in more detail on approaches from the Control Theory community, based on continuous models and on the supervisory control of Discrete Event Systems [18, 69, 73].

Our point in this paper is to contribute to relating on the one hand, the general notion of control loop in Autonomic Computing, and on the other hand, design models and techniques stemming specifically from Control Theory. In doing so, we perform choices and do not pretend to exhaustivity. In particular, we do not include in our scope techniques from different approaches and communities like, e.g., Artificial Intelligence or Formal Methods, even though they may have features not covered here.

Indeed several layers and different flavors of control must be combined to fully handle the complexity of real systems. This already lead a long time ago to hierarchical control, e.g. [5], where low level (i.e. close to the hardware) fast control layers are further coordinated by slower higher level management layers. Besides the different bandwidth between layers, it also happens that different control technologies must be combined. For example, it is often observed in robot control architectures that low level control laws, designed in the realm of continuous control, are managed by a control actions scheduler based on discrete events systems, such as in [2]. Both the continuous and discrete time control designs described in the next sections are example of building blocks to be further used in hierarchical control for Autonomic Computing.

1.4 Outline

This paper proposes interpretations of the MAPE-K loop from Autonomic Computing in terms of models and techniques from Control Theory. We first consider this from the point of view of continuous control in Sect. 2, starting with the classical PID up to more elaborate nonlinear and event-based control.

Discrete control is then considered in Sect. 3, where discrete event systems are modeled by transition systems e.g., Petri nets or labelled automata.

Then some illustrative case studies are presented in Sect. 4, showing how these concepts can be put into practice in the framework of real-world computing systems.

Finally, Sect. 5 provides discussions and perspectives.

2 Continuous Control for Autonomic Computing

2.1 Brief Basics of Continuous Control

The basic paradigm in control is feedback, or closed-loop control, as depicted in Fig. 2. The underlying idea is that control signals are continuously computed from the error signal, i.e. the difference between the actual behavior of the plant (measured by its outputs) and its desired behavior (specified by a reference signal). The loop is closed through the controlled plant, and control actions are computed indefinitely at a rate fast enough with respect to the process dynamics [8, 9].

Fig. 2.
figure 2

The control loop for continuous control.

The behavior of the controlled process, whatever its nature (e.g. electro-mechanical or chemical for physical devices, but also digital for schedulers, databases and other numerical components) is never perfectly known. It is known only through a dynamic model which aims to capture the main characteristics of the process. It is most often given as of a set of difference equations where the continuous time is sampled (usually periodically) at instants \(t_k, t_{k+1}, ...\):

$$\begin{aligned} \begin{array}{ccl} \mathbf{x}_{k+1} &{} = &{} f(\mathbf{x}_k,\mathbf{u}_k),\qquad x(t=0) = x_0 \\ \mathbf{y}_k &{} = &{} g(\mathbf{x}_k) \end{array} \end{aligned}$$
(1)

Here x is the state vector of the process able to describe its behavior over time, \(x_0\) is its value at the initialization of the system. y is a vector of outputs measured on the process via sensors and f(.) and g(.) are respectively the state evolution function and the output function of the plant model. u is the vector of control signals sent to the actuators, it is repetitively computed by a controller, e.g. by a state feedback as:

$$\begin{aligned} \mathbf{u}_k = K(\mathbf{\hat{x}}_k, \mathbf{r}_k) \end{aligned}$$
(2)

where \(\mathbf{\hat{x}}\) is an estimate of the state vector, r is a set of reference signals to be tracked and K(.) is a function (which can be static or dynamic) of the state (or outputs) and of the reference signals.

For example, considering the control model of a web server, the system state may gather the number of admitted requests and an estimate of the CPU load, the observed output vector may gather a measure of the CPU activity filtered over some time window and the observed response time, and the control input can be an admission controller. The control objective can be the result of the minimization of a Quality of Service criterion gathering the rejection rate and the response time of the server under constraint of CPU saturation avoidance. Note that most often the variables of interest are not directly available from simple measurements, and that the system state must be reconstructed and smoothed using signal processing and filtering from the raw sensors outputs.

Remember that (1) is a model of the plant, i.e. an abstraction of reality where the structure is simplified and the value of the parameters is uncertain. Therefore, an open-loop control, using the inverse of the model to compute control inputs from the desired outputs, inevitably drifts away and fails after some time. Compared with open-loop control (which would rely on an utopian perfect knowledge of the plant), the closed-loop structure brings up several distinctive and attractive properties:  

Stability.:

Briefly speaking, a system is said stable if its output and state remain bounded provided that its inputs and disturbances remain bounded (formal definitions, such as BIBO stability, Lyapunov stability and others, can be easily found thorough the control literature). It is a crucial property of a control system, ensuring that the trajectory of the controlled plant is kept close enough to the specified behavior to satisfy the end-user’s requirements. A distinctive property of feedback controllers is that, if they are well designed and tuned, they are able to improve the stability of marginally stable plants, and even to stabilize naturally unstable systems, e.g., modern aircrafts. Anyway the stability of a control system must be carefully assessed, as poor design or tuning can drive the system to unstability;

Robustness.:

The control actions are repeated at a rate which is fast compared with the system dynamics, hence the increments of tracking errors due to imperfect modeling are small, and they are corrected at every sample. Indeed, besides the main directions of the dynamics, the model does not need to capture the details of the process dynamics. Most often, poorly known high frequencies components are discarded, and exact values of the parameters are not needed. Therefore, feedback is able to provide precise and effective control for systems made of uncertain components.

Tracking and performance shaping.:

As the controller amplifies and adjusts the tracking error before feeding the actuators, it is able to shape the performance of the controlled plant. For example, the closed-loop system can be tuned for a response faster than the open-loop behavior, and disturbances can be rejected in specified frequency bands. Thanks to robustness, tracking objectives can be successfully performed over a large range of input trajectories without need for on-line re-tuning;

 

2.2 The MAPE-K Loop as a Continuous Control Loop

The MAPE-K description corresponding to the model (1) with control (2) is as shown in Fig. 3. The Monitor phase of the MAPE-K loop corresponds to the sampling of the system, typically, it defines the frequency at which the data must be acquired. It is usually related to sampling theory (Shannon theorem). The Analyse phase is in our description represented by the hat over x. This notation is commonly used to denote that the exact value of x is not known, either because of noise that requires a filtering action or because it can not be measured directly from the system. It is for instance the case of energy consumption that must be estimated using other variables like CPU or disk usage. The analyse phase would in that case include the signal estimation/reconstruction or more simply the filtering. The Plan phase corresponds to the computation of the control law using the Knowledge of the system held in the model. Finally, the Execute phase consist in changing the value of the actuator at a frequency which is most often identical to the sampling frequency of the monitor phase.

Fig. 3.
figure 3

The continuous control loop as a MAPEK diagram

2.3 Continuous Feedback Computing

Let us now examine how feedback can be applied to computing system administration, resource management or network management. Although feedback control was first developed to control physical devices like steam engines, factory lines or various kind of vehicles [9], the closed-loop concept has been adapted for the control of digital devices such database servers, e.g. [44, 60], or real-time schedulers as in [62]. However, compared with usual control applications, the nature of the controlled process deeply differs in the case of control for computing devices, and the usual components of the control loops must be adapted for the particular technology.

 

Models.:

Usual models for the control of continuous process are given as a set of differential equations, which are further discretized as difference equations for numerical control purpose. In contrast, at a detailed level, digital objects can be often described by large FSMs which are not well suited for closed-loop control. However, thanks to robustness, a feedback-control compliant model only needs to capture the essential of the plant dynamics. For example, computing devices can be approached by “fluid modeling” where, e.g., flow equations describe flows of input requests and levels in tanks represent the state of message queues [44]. Using such abstractions leads to quite simple difference models, where for example queues behave as integrators and provide the basic dynamics in the model. Besides metrics related to performance, as computation loads or control bandwidth, some relevant properties of a software are related to reliability. For example, the failure probabilities of software components may lead to a reliability formal model given as a Discrete Time Markov Chain, further used for the design of a predictive feedback control law [33]. Some control related modeling issues are detailed in another chapter of this book [56].

Sensors.:

Sensors provide raw measurements from the controlled process, they are provided by hardware or software probes in the operating system or in the application code. Basic measurements record the system activity such as the CPU use, deadlines misses or response times of individual components. Raw measurements are gathered and pre-processed to provide compound records and quality of service clues to the controller. Note that the CPU use is always of interest, as CPU overloading is a permanent constraint. However, it is only meaningful when estimated over windows of time: the size of these measurement windows and associated damping filters provide a second main source of dynamics in the plant model.

Actuators.:

In software control, the actuators are provided by function calls issued by the operating system, or by other software components with enough authority. Scheduling parameters such as clock rates, deadline assignments and threads priorities can be used to manage multitasking activities. Admission control can be used to handle unpredictable input request flows. The frequency scaling capability of modern chips is also an effective tool to optimize the energy consumption due to high speed calculations in mobile devices.

Controllers.:

Potentially all the control algorithms found in the existing control toolbox can be adapted and used for the control of digital devices [66]. Most often, thanks to the usually simple dynamic models considered for software components, simple and cheap controllers can be effective as detailed in Sect. 2.4. Anyway some more complex control algorithms have been worked out to better handle the complexity of QoS control for computing systems (Sect. 2.5).

 

2.4 Basic Control

PID (Proportional, Integral, Derivative) control algorithms are widely used, as they are able to control many single input/single output (SISO) systems through a very simple modeling and tuning effort. In that case the control input u is written as a function of the error signal -that is the difference between a desired output and its measure on the real system- \(e(t) = r(t) - y(t)\) as:

$$\begin{aligned} u(t) = K e(t) + \frac{K}{T_i} \int _0^t e(\tau ) d\tau + K T_d \frac{d}{dt}e(t) \end{aligned}$$
(3)

Here the proportional term \(K\cdot e(t)\) controls the bandwidth and rising time of the control loop, the derivative term \(K T_d \frac{d}{dt}e(t) \) damps the oscillations and overshoots and the integral term \(\frac{K}{T_i} \int _0^t e(\tau ) d\tau \) nullifies the static errors, that is the value of the error e(t) when t goes to the infinite.

Indeed, the ideal continuous PID must be discretized for implementation purpose. For example, using a backward difference method (with \(T_s\) the sampling period) yields

$$\begin{aligned} u_k = u_{k-1} + K[e_k - e_{k-1}] + \frac{K \cdot T_s}{T_i} e_k + \frac{K \cdot T_d}{T_s} [e_k - 2 e_{k-1} + e_{k-2}] \end{aligned}$$
(4)

The MAPEK diagram of the PID controller then follows as in Fig. 4. Tuning the PID is made using the knowledge of the system. This knowledge can take the form of a state space or transfer function model but can also reside in an empirical tuning of the parameter of the PID controller.

Fig. 4.
figure 4

The PID continuous control loop as a MAPEK diagram

2.5 Advanced Modeling and Control

Besides PID controllers which have been used in pioneering works (e.g. [59]), other simple/linear control schemes were also implemented in the context of computer science. [44] is an emblematic example of a black-box modeling in order to derive a controller using classical linear control theory aiming to maximize the efficiency of Lotus Notes. Other linear approaches were also implemented for periods rescaling [19] or to control elasticity of distributed storage in cloud environment [65]. All linear systems share the same superposition property and can be analyzed and assessed using well established mathematical tools. Unfortunately, their use to real systems that are most of the time non-linear is possible only on a limited range of the state space for which linearization is meaningful.

Indeed, many classical non-linearities of computer systems (limited range for variables, limited bandwidth, etc.) can hardly be taken into account only with linear tools [34]. In particular, optimizing a computing system operation may need to intentionally load the actuators (i.e. the CPUs) until saturation, rather than avoiding the actuators limits as in the linear control approach. Therefore, in addition to the linear control theory, the control toolbox now contains a rich set of advanced control algorithms, which have been developed over years to supplement the shortage of simple linear control in specific cases.

For example, early models of servers were simple linear fluid models and the corresponding linear controller as [44]. However, handling trashing in servers needs to model the overhead due to parallel operations: the resources needed by a server to serve requests is not proportional to the number of requests. Non-linear models and control are needed in that case detailed in Sect. 4.2 [62].

In another case study [67], handling the static input and output non-linearities of a software reservation system is made by the combination of linear and non-linear blocks in a Hammerstein-Wiener block structure. Then, the corresponding QoS controller is designed in the predictive control framework. Note that even when these more elaborated non-linear models are considered, the resulting controllers remain simple with a very small run-time overhead and a moderate programming effort.

Other non-linear, switched, hybrid, hierarchical and cascaded schemes were implemented on various computing systems (see for instance [76, 78] and the references therein).

Indeed it appears that, considering the quite simple dynamics of the controlled system, the time devoted to modeling is by far larger than the time devoted to control design. Models well suited for control purpose must be simple enough to allow for the synthesis of a control algorithm, while being able to capture the essence of the system behavior. Typically, modeling for the control of autonomic computing systems needs to consider trade-offs between the control objectives and the cost needed to reach them through the execution of parallel activities running on shared hardware and software resources [55].

For example, it has been shown in [61] that a game theoretic framework allows for the decoupling between the resource assignment and the quality setting, finally leading to a resource manager with a linear time complexity in the number of applications running in parallel. Once the resources and concurrent activities has been suitably modeled, the control decisions can be implemented as a hierarchy of layered controllers ranging from the control of elementary components up-to QoS optimization, e.g., as in [57].

Finally, the execution cost of the controller itself must be considered. Traditionally control systems are time triggered, allowing for a quite simple stability analysis in the framework of periodic sampling. However, the choice of the triggering period is an open issue, as reactivity needs fast sampling leading to a high computing cost. However, fast reactions are not always necessary, for example in case of slowly varying workloads. To avoid wasting the computing resource, the event-based control paradigm has been proposed (e.g. [75]). With this approach, the controller is activated only when some significant event acting on the system triggers an event detector.

3 Discrete Control for Autonomic Computing

3.1 Brief Basics of Supervisory Control of Discrete Event Systems

Amongst the different approaches to discrete control (see Sects. 1.3 and 3.3, [56]) this Section focuses on and technically details the supervisory control of Discrete Event Systems [18, 69]. Figure 5 shows a control loop for the case of discrete control with a memorized state, a transition function, and a supervisory controller obtained by discrete controller synthesis.

Fig. 5.
figure 5

The control loop for discrete control.

The characterization of Discrete Event Systems [18] is given by the nature of the state space of the considered system: when it can be described by a set of discrete values, like integers, or vectors of Booleans, and state changes are observed only at discrete points in time, then such transitions between states are associated with events. In this section we very briefly and informally summarize some essential notions.

The modeling of sequences of such events, like the sequence of values, at time k, of \(y_k\) and \(u_k\) in Fig. 5, can be approached by formal languages. They enable to specify structure in the sequences of events, representing possible behaviors of a system, or desired behaviors in the interaction with a system. Operations on languages can help composing them, or making computations on the represented behaviors.

Automata are formal devices that are capable of representing languages, in the graphical and intuitive form of state and transition graphs, also called transition systems, or Finite State Machines (FSM). As shown in Fig. 5, they involve two main features. On the one hand, there is a memorizing of a state, the current value \(x_k\) resulting from the previous transition at \(k-1\) (with an initial value xi at time 0). On the other hand, is a transition function T computing the next value of the state \(x'_k\) in function of the current observed value \(y_k\) (we do not yet distinguish controllable variables c) and current state \(x_k\). It can also compute values \(u_k\) that can be used to send commands to the controlled system. Figure 5 also shows possible pre-processing between raw data and \(y_k\), or post-processing between \(u_k\) and concrete actions, e.g. corresponding to implementation-specific filters.

$$\begin{aligned} \begin{array}{ccc} (\mathbf{x'}_k, \mathbf{u}_k) &{} = &{} T( \mathbf{y}_k, \mathbf{x}_k ) \\ \mathbf{x}_k &{} = &{} \mathbf{x'}_{k-1} \\ \mathbf{x}_0 &{} = &{} \mathbf{xi} \\ \end{array} \end{aligned}$$
(5)

Such automata can be associated with properties pertaining to their general behavior e.g., determinism, reactivity, or more specific like reachability of a state, and manipulated with operations e.g., parallel or hierarchical composition. The transitions are labelled by the events which are recognized when they are taken. Such automata-based models are precisely the basic semantic formalism underlying reactive systems and languages [11, 43]. Related models in terms of transitions systems also include Petri nets, where the transitions are connecting places which are associated with tokens: the marking of the set of places by present tokens defines the state of the Petri net. The transitions can be labelled by events and their sequences define languages. The relationship with automata is given by the graph of reachable markings of the net. Analysis of such transition systems is made possible by algorithmic techniques exploring the reachable states graph in order to check typically for safety properties (e.g., using model checking techniques and tools), or concerning diagnosis of the occurrence of unobservable events from the observations on the behavior of a system.

Control of transition systems has then been defined as the problem of restricting the uncontrolled behaviors of a system. The latter can be described by an automaton G, control restricts its behavior so that it remains in a subset of the language of G, defined by a control objective, describing the desired behavior. The notion of supervisory control of discrete event systems has been introduced [69], which defines a supervisor that can inhibit some transitions of G, called controllable (controllability of a system can be partial), in such a way that, whatever the sequences of uncontrollable events, these controllable transitions can be taken in order for the desired behavior to be enforced, and the undesirable behavior avoided. Typical desired behaviors, or control objectives, in the supervisory control approach are safety properties: deadlock avoidance, or invariance of a subset of the state space (considered good). A specially interesting property of the supervisor is it should be restricting only the behaviors violating the desired objectives, or in other terms it should be maximally permissive. It can be noted that this important property is possible in this approach, whereas it is not considered or defined in approached dealing with more expressive goals such as liveness. As shown in Fig. 5, the resulting synthesized controller C gives values to controllable variables c, which are part of the parameters of the transition function T:

$$\begin{aligned} \begin{array}{ccc} (\mathbf{x'}_k, \mathbf{u}_k) &{} = &{} T( \mathbf{y}_k, \mathbf{c}_k, \mathbf{x}_k ) \\ \mathbf{c}_k &{} = &{} C( \mathbf{y}_k, \mathbf{x}_k ) \\ \mathbf{x}_k &{} = &{} \mathbf{x'}_{k-1} \\ \mathbf{x}_0 &{} = &{} \mathbf{xi} \\ \end{array} \end{aligned}$$
(6)

Tools available to users who wish to apply such automated controller synthesis techniques, adopting the approach of supervisory control for Discrete event Systems, include: TCT, based on languages models and theory [74]; Supremica, related to the manufacturing languages of the IEC standard [4]; SMACS, which achieves Controller Synthesis for Symbolic Transition Systems with partial information [47]; Sigali, which is integrated in the synchronous reactive programming environments [63], and in the compiler of the BZR language [23]. A new tool, ReaX, extends the expressivity to Discrete Controller Synthesis for infinite state systems, and treats logic-numeric properties [12].

Fig. 6.
figure 6

The discrete control loop as a MAPEK diagram. (a): simple automaton-based manager; (b): exhibiting observability and controllability.

3.2 The MAPE-K Loop as a Discrete Supervisory Control Loop

In the general framework for autonomic computing shown in Fig. 1, discrete control can be integrated as shown in Fig. 6(a): it instantiates the general autonomic loop with knowledge on possible behaviors represented as a formal state machine, and planning and execution as the automaton transition function, with outputs triggering the actuator. As evoked in previous Section, the models used in supervisory control of DES enable to address properties on the order of events or the reachability of states, with tool-equipped techniques for verification (e.g. model checking) and especially Discrete Controller Synthesis (DCS). The latter is automated and constructive, hence we use it for the logic control of autonomic systems, encapsulated in a design process for users experts of systems, not of formalisms.

In the autonomic framework, in order to support coordination of several autonomic managers by an upper layer, some additional observability can be obtained by having additional outputs, as shown by dashed arrows in Fig. 6(b) for a FSM autonomic manager, exhibiting (some) of the knowledge and sensor information (raw, or analyzed); this can feature state information on the autonomic manager itself or of managed elements below. At that level, additional inputs can provide for controllability by an external coordinator.

3.3 Discrete Feedback Computing

As was noted by other authors, while classical control theory has been readily applied to computing systems [44], applying Discrete Control Theory to computing systems is more recent. One of the earliest works deals with controlling workflow scheduling [71]. Some focus on the use of Petri nets [45, 46, 58] or finite state automata [68].

In the area of fault-tolerant systems, some works [15, 53] present notions similar to control synthesis, not explicitly considering uncontrollables. In that sense, it resembles more open-loop control, considering the internals of a computing system, to which we prefer closed-loop control, taking into account events from its environment.

A whole line of work focuses on the computing systems problem of deadlock avoidance in shared-memory multi-threaded programs. These work rely on the literature in Discrete Control Theory concerning deadlock avoidance, which was originally motivated by instances of the problem in manufacturing systems. [73] is a programming language-level approach, that and relies upon Petri net formal models, where control logic is synthesized, in the form of additional control places in the Petri nets, in order to inhibit behaviors leading to interlocking. The Gadara project elaborates on these topics [72]. They apply Discrete Control internally to the compilation, only for deadlock avoidance, in a way independent of the application. Other works also target deadlock avoidance in computing systems with multi-thread code [10, 30].

Another kind of software problem is attacked by [37, 38]: they consider run-time exceptions raised by programs and not handled by the code. Supervisory control is used to modify programs in such a way that the un-handled exceptions will be inhibited. In terms of autonomic computing, this corresponds to a form of self-healing of the system. Applications of the Ramadge and Wonham framework to computing systems can also be found concerning component-based systems reconfiguration control, enforcing structural as well as behavioral properties [51], and more generally adaptive systems, as one of the decision techniques in a multi-tier architecture [28].

In an approach related to reactive systems and synchronous programming, discrete controller synthesis, as defined and implemented in the tool Sigali, is integrated in a programming language compiler. [27] describes “how” compilation works, with modular DCS computations, performing invariance control. This language treats expression of objectives as a first class programming language feature. The programming language, called BZR, is used in works concerning component-based software [16]. It is extended to handle logico-numeric properties, by replacing, in the modular architecture of the compiler, Sigali with the new tool ReaX [12]. Other previous work related to the synchronous languages involved some separate and partial aspects of the problem, testing the idea in the framework of a more modest specialized language [25], and particular methods and manual application of the techniques [36], and elaborating on the articulation between reactive programs and DCS [6, 24, 64], as well as application to fault-tolerance [31, 39].

As noted above, some other related work can be found in computer science and Formal Methods, in the notions of program synthesis. It consists in translating a property on inputs and outputs of a system, expressed in temporal logics, into a lower-level model, typically in terms of transition systems. For example, it is proposed as form of liberated programming [41] in a UML-related framework, with the synthesis of StateChart from Live Sequence Charts [42, 52]. Other approaches concern angelic non-determinism [14], where a non-deterministic operator is at the basis of refinement-based programming. These program synthesis approaches do not seem to have been aware of Discrete Control Theory, or reciprocally: however there seems to be a relationship between them, as well as with game theory, but it is out of the scope of this paper.

Also, interface synthesis [20] is related to Discrete Controller Synthesis. It consists in the generation of interfacing wrappers for components, to adapt them for the composition into given component assemblies w.r.t. the communication protocols between them.

4 Case Studies

4.1 Video Decoding and DVFS

Energy availability is one of the main limiting factors for mobile platforms powered by batteries. Dynamic Voltage and Frequency Scaling (DVFS) is a very effective way to decrease the energy consumption of a chip, by reducing both the clock frequency and the supply voltage of the CPUs when high computation speeds are not necessary. Many chips used in embedded or mobile systems are now fitted with such capabilities, and the computing speed is adapted on-the-fly thanks to some estimated computing performance requirement.

Using feedback loops is an effective way to robustly adapt the chip computing speed even if the incoming computation load is hard to predict, as in the example described in [32]. The problem is to minimize the energy consumption of a H.264 video decoder by using the lowest possible computing speed able to decode the frames with respect to the output display rate (25 frames/sec).

The computing speed is adapted thanks to the control architecture depicted in (Fig. 7a). At low level, a computing speed controller -integrated in silicon- drives the DVFS hardware with frequency and Vdd set points (see [32] for details). It is driven from estimates of the needed computation load (i.e. the number of CPU cycles) and decoding deadline for the incoming frame. These estimates are themselves computed by an outer frame control loop.

Measurements of decoding execution times (Fig. 7b) show that, between noisy and almost flat segments, the decoding times exhibit sharp and unpredictable isolated peaks when switching between plans. Therefore, rather than trying to compute any prediction, the estimation of the next frame computation load \(\hat{\varOmega }_{i+1}\) can be simply taken equal to the last one, i.e. \(\varOmega _i\), recorded by the instrumentation inserted in the H.264 decoder. Even better, it can be provided by smoothed past values through a low pass filter:

$$\begin{aligned} \hat{\varOmega }_{i+1} = \alpha \hat{\varOmega }_{i-1} + (1 - \alpha )\varOmega _i \end{aligned}$$
(7)

where \(0 \le \alpha < 1\) controls the filter damping.

Fig. 7.
figure 7

(a) Control architecture – (b) Frames computation times

This rough estimate is used by the frame controller to compute the ideal deadline for the incoming frame using a simple proportional controller:

$$\begin{aligned} \varDelta _{r_{i+1}} = \tau _{i+1} + \beta \, \delta _i, \quad 0 < \beta \le 1 \end{aligned}$$
(8)

where \(\delta _i\) is the observed overshoot for the last decoded frame. Indeed this controller aims at driving the end-of-computation of frame \(i+1\) towards \(\tau _{i+1}\), which is the theoretic timing of the periodic video rate.

Despite the apparently overly simple computing load model (7), the very simple and low cost frame controller (8) is able to regulate the decoding timing overshoot to very small values (Fig. 8a), thus keeping an fluid display rate. Computing a penalty function based on the viewing quality (Fig. 8b) shows that using these elementary feedback loops allow both for a better viewing quality and up-to 24% energy saving compared with the uncontrolled decoding case.

Fig. 8.
figure 8

(a) Deadline with control – (b) Video viewing penalty function

Hence, this example show that even very simple control loops with negligible computation overheads, if carefully designed, may have a very positive impact on an embedded system adaptiveness and robustness against a poorly modelled environment.

4.2 Server Provisioning

A classical technique used to prevent servers from thrashing when the workload increases consists in limiting client concurrency on servers using admission control. Admission control has a direct impact on server performance, availability, and quality of service (QoS). Modeling of servers and feedback control of their QoS has been one of the first application domain targeted by feedback scheduling, first using linear models [44, 60]. However, it appears that to handle trashing, the model must accurately capture the dynamics and the nonlinear behavior of server systems, while being simple enough to be deployed on existing systems.

Based on numerous experiments and identification, a nonlinear continuous-time control theory based on fluid approximations has been designed in [62]. It is both simple to use and able to capture the overhead due to the parallel processing of requests responsible for trashing (Fig. 9a).

Fig. 9.
figure 9

(a) Fluid model – (b) Rejection rate

The request queue is considered as a fluid tank receiving client request flows M and N and emitting a served requested flow with latency L for the served requests. The system state is defined by the number of concurrently admitted requests \(N_e\), the server throughput \(T_0\) and the rejection rate \(\alpha \). The modelling effort leads to the following model for the input/output latency:

$$\begin{aligned} L(N_e,M,t) = a(M,t)N_e^2+b(M,t)N_e+c(M,t) \end{aligned}$$
(9)

where the latency L is a non-linear function of the number of \(N_e\), of the server mix load M and of continuous time t. The rejection rate is given by

$$\begin{aligned} \dot{\alpha }(t)=-\frac{1}{\varDelta }\left( \alpha (t) - \frac{N_e(t)}{AC(t)} \cdot \left( 1-\frac{T_o(t)}{T_i(t)}\right) \right) \end{aligned}$$
(10)

with \(\varDelta \) the sampling rate, \(T_i\) the input flow and AC the admission control value.

Then two control laws could be derived for different control objectives:

  • \(AC=\frac{N_e}{1+ \gamma _{_L} (L-L_{\max }) }\) maximizes the availability of the server, i.e. minimizes the rejection rate;

  • \(AC=\frac{\alpha N_e}{\alpha - \gamma _{_\alpha }(\alpha -\alpha _{\max }) }\) maximizes the performance, i.e. minimizes the latency for the admitted requests.

These simple control laws are cost effective and easy to tune, as they both use a single tuning parameter \(\gamma _{_L}\) or \(\gamma _{_\alpha }\).

Fig. 10.
figure 10

(a) Latency control – (b) Rejection rate control

Despite their simplicity, using these simple controllers allows for an efficient on-line management of an Apache web server. For example, Fig. 10 show that the rejection rate can be kept close to a desired goal, or that the latency of the served requested can be regulated around the requested value.

Even more important, these controllers –with negligible computing cost– turns the nature of the system into a safer behavior. Indeed, trashing is automatically avoided even in the case of huge overloads with no need for an operator to manually re-tune the AC parameters.

4.3 Coordination of Multiple Autonomic Administration Loops

Real autonomic systems require multiple management loops, each complex to design, and possibly of different kinds (quantitative, synchronization, involving learning, ...). However their uncoordinated co-existence leads to inconsistency or redundancy of action. Therefore we apply discrete control for the interactions of managers [40]. We validate this method on a multiple-loop multi-tier system.

Controllable managers as seen in Fig. 6(b) can be assembled in composites, where the coordination is performed in a hierarchical framework, using the possibilities offered by each of them, through control interfaces, in order to enforce a coordination policy or strategy. We base our approach on the hierarchical structure in Fig. 11: the top-level AM coordinates lower-level ones.

Fig. 11.
figure 11

Autonomic coordination for multiple administration loops.

We consider the case study of the coordination of two administration loops for the management of a replicated servers system based on the load balancing scheme. Self-sizing addresses resource optimization, and dynamically adapts the degree of replication depending on the CPU load of the machines hosting the active servers. Self-repair addresses server recovery upon fail-stop failure of a machine hosting a single or replicated server. Co-existence problems occur when failures trigger incoherent decisions by self-sizing: The failure of the load balancer can cause an under-load of the replicated servers since the latter do not receive requests until the load balancer is repaired. The failure of a replicated server can cause an overload of the remaining servers because they receive more requests due to the load balancing. A strategy to achieve an efficient resource optimization could be to (1) avoid removing a replicated server when the load balancer fails, and (2) avoid adding a replicated server when one fails.

Fig. 12.
figure 12

Managers models: self-sizing (left) and self-repair (right).

Figure 12 shows the automata modelling the behaviors of the managers, in the BZR language [26], abstracted to the relevant activity information. In the right of the Figure, the self-sizing manager is composed of three sub-automata. In brief, the two external ones model the control of the adding (resp. removal) of servers, with disU (resp. disD), which, when true, prevent transitions where output add (resp. rem) triggers operations. The center one models the behaviors in reaction to load variation, for which all detail is available elsewhere [40]. In the left of the Figure are self-repair managers for the load balancer (LB) and the replicated servers (S). The right automaton concerns servers, and is initially in OkS. When failS is true, it emits repair order rS and goes to the RepS state, where repS is true. It returns back to OkS after repair termination (Sr is true). Repair of the LB is similar. The automata in Fig. 12 are composed in order to have the global behavior model, and a contract specifies the coordination policy. The policies (1) and (2) in Sect. 4.3 are enforced by making invariant, by control upon the controllable variables Cu, Cd, the subset of states where the predicate holds: ((repLB => disD) and (repS => disU))

This controller was validated experimentally on a multi-tier system based on Apache servers for load balancing (with a self-repair manager) and replicated Tomcat servers (with both self-sizing and self-repair managers), with injection of workloads and failures to which the system responded properly, without overreacting, according to the objective.

5 Conclusions and Perspectives

We propose a discussion of the problem of controlling autonomic computing systems, which is gaining importance due to the fact that computing systems are becoming more and more dynamically reconfigurable or adaptive, to be flexible w.r.t. their environment and to automate their administration. We observe that one approach consists of using Control Theory methods and techniques for computing systems: although it is well identified [44], it is still only emerging, and works are scattered in separate areas and communities of Computer Science.

We aim at conveying to Computer Scientists the interest and advantages of adopting a Control Theory perspective for the efficient and predictable design of autonomic systems. Compared with open-loop, closed-loop control provides adaptability and robustness, allowing for the design of fault-tolerant systems against varying and uncertain operating conditions. However, there still is a deep need for research in the problems of mapping from high-level objectives in terms of Quality of Service (QoS) or Service Level Objectives (SLO) and abstract models towards lower-levels effective actions on the managed systems. In the area of Computing Systems research, there is an important topic in the design of architectures so that they are made controllable [48], as self-aware software (adaptive, reflective, configuring, repairing...) needs explicitly built-in sensing and acting capabilities [49]. On the other side, the kind of models usual in Control Theory must be totally reworked to be useful for computing systems, and this is a research challenge for the Control community. Also, an important issue is that complex systems will involve multiple control loops, and their well-mastered composition and coordination to avoid interferences is a difficult and hardly tackled question [1].

One lesson learned in this work is that the open problems are concerning both Control Theory and Computer science, and that solutions going beyond simple cases require active cooperation between both fields. As noted by other authors e.g., [33], this bi-disciplinary field is only beginning, and the problems involved require competences in Control, as well as expertise in the computing systems. There is a costly investment in time to build common vocabulary and understanding, for which the MAPE-K loop offers a common ground, and this investment opens the way for better controlled autonomic computing systems, safer and optimized.