Introduction

Facing international competition, delivering the right product at the right time for the right production cost is becoming ever more crucial for industrial manufacturers, especially those producing high added-value products (transportation industry, high-tech...), which justifies spending significant effort on process and manufacturing innovation. Meanwhile, several factors make this harder and harder. First, the environment changes constantly, driven by the emergence of new needs, new technologies and new opportunities. Second, products, and especially high added-value manufactured goods, are becoming more integrated, multi-technology and customized, which increases the variety of manufactured products and the complexity of keeping their manufacturing process effective and efficient (Kuehnle 2007). For example, time lost to changeovers and setups, or assembly errors due to the combinatorial explosion of possibilities, constitute real risks for these manufacturers. Last, uncertainties coming from outside the manufacturing system (volatility of customers' needs, dynamic energy pricing...) and from inside it (curative maintenance, supply shortages, quality issues...) render the whole manufacturing problem hard to solve not only from a predictive point of view (projected planning or scheduling for the next period to be implemented) but also from a reactive point of view (dynamic scheduling defined in reaction to unpredicted events). As a consequence, it is nowadays important to provide manufacturers with models and methods that not only deliver efficient overall production performance, but are also adaptable and reactive in the face of a growing set of unpredicted events, including a clear estimation of the current and possible future states of their production systems, as well as correct product traceability throughout the whole supply chain. Lean manufacturing offers powerful technological solutions when some requirements can be met (production organized in flow lines, possible sequencing of tasks using takt times...), but its solutions are often local and short-term, and need refinement to take their effects on the global system into account. Thus, even in lean-organized enterprises (typically car manufacturers), reactivity and agility are sought as major mid- and long-term challenges to remain competitive. Industrialists now want control systems that provide satisfactory, adaptable and robust solutions rather than optimal solutions requiring several hard assumptions to be met (Thomas et al. 2012); in a single word: agile.

Several scientific approaches can be identified to position the contributions of the literature with respect to these industrial needs. Classical (historical) approaches consist in using a centralized scheduling (and/or planning) system (typically, an Enterprise Resource Planning, ERP) loosely coupled with a control system such as a Manufacturing Execution System (MES). The scheduling models are often based on a mathematical representation of the production and decision system, from which an optimization or heuristic algorithm is designed and computed in a centralized way to propose a predictive schedule for the next period to be implemented. This approach determines or approximates the optimal sequence of tasks to be executed in the system in order to maximize one or several criteria related to productivity or customer satisfaction. The result of the calculation is then used by the MES for the Production Activity Control of the production system (Berry et al. 1991). This approach is effective as long as the model of the production system is realistic but also deterministic. In such an approach, parameters and models are simplified in order to speed up the calculations. Consequently, if stochastic variations of parameters are significant (e.g., durations of manual operations, breakdowns or failures), executing the schedule in the production system gives results that are generally far from optimal, or even inapplicable (Cardin et al. 2013). The easiest implementation of rescheduling is to halt the system whenever a disruption is detected during execution and let the scheduler decide how to react to this change, possibly generating a new schedule. If the rescheduling phase is long or if disruptions happen frequently, its duration may lead to a drastic reduction of the overall performance. As a consequence, this centralized approach, despite having been widely used for years in many industries, can no longer be considered sufficiently efficient, since reactivity issues grow more and more important. In recent years, a research field dealing with proactive scheduling has emerged. The main idea is to propose alternatives to the lack of robustness of the centralized schedule and, as a consequence, to limit the "nervousness" of the scheduling/rescheduling iterations, see for example (Chaari et al. 2011). These techniques typically use redundancy (temporal or resource-oriented), probabilistic methods, contingent methods (that is, they design several possible schedules that can be switched from one to another according to real-time events), and, last, objective functions that integrate robustness criteria evaluating the risk of not respecting a candidate schedule given possible perturbations. A growing activity from operations research has emerged in the last few years in that field (Ghezail et al. 2010). Thus, it is remarkable that a predictive (and centralized) approach has little ambition to be robust, while a proactive (and centralized) approach tries to solve the robustness problem but always faces computational time limits.

Reactive approaches, on the other hand, consider every event in real time and provide ways or tools to react to particular events occurring on the shop floor and perturbing the production schedule. Whereas predictive centralized approaches generally lead to prohibitive response delays, reactive approaches consider events without anticipation and lead to the design of real-time feedback control of the production. Several approaches can be identified depending on whether the control architecture is centralized or distributed. In centralized ones, priority rules (e.g., heuristics-based) are defined and used on the fly, that is, whenever a decision must be taken; the choice of the rule to apply can also be decided dynamically (Shahzad and Mebarki 2012). In distributed ones, control decisions are distributed among a set of cooperative control entities, typically agents or holons in the literature, with or without hierarchical relationships among them. Distributed control approaches were studied massively by researchers in the 90's, see for example (Prabhu and Duffie 1996), one of the historical references in this field. These approaches are known to generate applicable solutions, since decisions are taken locally according to the real state of the production system. Despite this, their performance is also known to degrade rapidly over time compared to purely centralized (or predictive) approaches if no perturbation occurs (that is, when all data are known and certain from the beginning).

Due to the limitations of these two historical approaches with respect to current industrial needs, researchers increasingly consider a third kind of approach: integrated scheduling and control architectures that combine local distributed-reactive mechanisms, implemented in product/resource control holons/agents, with global centralized scheduling mechanisms, robust or not. Such architectures, named hybrid scheduling and control architectures (in short, HCA for Hybrid Control Architectures in the remainder of this document), are intended to capitalize on the advantages of reactive and predictive/proactive approaches while limiting their drawbacks (Pach et al. 2014; Thomas et al. 2009).

In such HCA, the fundamental decision for control holons/agents facing a perturbation is whether or not to keep following the predictive/proactive, but centralized, schedule, leading to the definition of two basic modes for the holons/agents: a centralized mode and a distributed mode. If the decision is not to follow the centralized (predictive/proactive) schedule, a switch down to a distributed mode occurs, where decisions are handled in real time, overriding scheduled ones, with the intention of switching back to the centralized mode as soon as possible. The main issue for researchers is then to provide accurate mechanisms to define the best switching dates (and/or the best switching decision-making levels) for control holons/agents so that the whole behavior of the HCA stays globally optimized despite disturbances. This centralized-distributed coupling issue is not easy to solve: for example, if a broken machine can be repaired quickly, it may not be necessary for its control holon/agent to switch to the distributed mode, since some slack in the original schedule may keep the pre-determined schedule realizable. Another issue is the possible nervousness of an architecture that switches too often from one mode to another (Barbosa et al. 2012).

In detail, this global issue can be broken down into the following three scientific challenges:

  • First, it is necessary to provide tools that enable the estimation of future performances, including disturbance detection, diagnosis and prognosis mechanisms (i.e., evaluation of the impact of a disturbance on the global performances).

  • Second, based on these estimators and when a reaction is envisaged, it is necessary to design pertinent switching indicators that decide if and when to switch down to the distributed mode, and then to design efficient synchronization mechanisms between scheduling and control and the real state of the system, leading typically to a proper indicator that determines if and when it is pertinent to switch back to the centralized mode.

  • Third, efficient switching strategies based on these synchronization mechanisms must be designed. These strategies must lead to a fair use of reactive modes (sufficient to absorb uncertainties, but as little as possible to avoid decreasing the performance).

The first section of this paper introduces a literature review based on papers proposing HCA in the last decade. The paper is then structured along the three challenges outlined above in the context of HCA, each section drawing conclusions about the contributions and limits identified in the literature review and giving one or several leads for future research in this field.

State of the art on hybrid control architectures

This section focuses on recent works that introduced more flexibility in the architecture in order to adapt more easily to cases where disruption levels are globally unknown or variable. Auto-adaptive architectures have mainly been studied under three concepts: multi-agent systems (MAS), holonic manufacturing systems (HMS) and product-driven systems (PDS). All these works show various characteristics, which were classified by (Pach et al. 2014) as:

  • static or dynamic structure-based, which expresses the fact that, during production, the structure of the control hierarchy may change in order to adapt to unexpected events. An example can be found in HMS, where holarchies are subject to dynamic reconfiguration when a resource fails;

  • with heterogeneous or homogeneous control among the entities of the system. The homogeneity of control concerns the behavior of each entity of the control system. For example, if a disruption occurs on a machine of a large-scale flexible manufacturing system, a choice could be to temporarily switch the behavior of the products and machines impacted by this failure to a more reactive mode. In a heterogeneous control system, it is possible to switch only these impacted entities and to leave the rest of the system untouched: various behaviors then co-exist in the system, as sketched just after this list.
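To make this selective switching concrete, here is a minimal, hypothetical Python sketch (product names, routings and the modes dictionary are invented for illustration, not taken from any reviewed architecture): only the products whose route crosses the failed machine change behavior, while the rest of the system keeps its current mode.

    def switch_impacted(modes, routes, failed_machine):
        """Heterogeneous switching sketch: only products routed through the
        failed machine drop to a reactive mode; the others are untouched."""
        for product, route in routes.items():
            if failed_machine in route:
                modes[product] = "reactive"
        return modes

    # Toy shop floor: two products cross machine M2, one does not.
    routes = {"P1": ["M1", "M2"], "P2": ["M2", "M3"], "P3": ["M1", "M3"]}
    modes = {p: "centralized" for p in routes}
    print(switch_impacted(modes, routes, "M2"))
    # {'P1': 'reactive', 'P2': 'reactive', 'P3': 'centralized'}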

Table 1 lists the references under study in this paper. Almost 60 % of them deal with HMS, a paradigm that fits these kinds of organizations particularly well. About 67 % of the references developed a dynamic architecture, but only 75 % of those tested heterogeneous control inside these architectures.

Table 1 Flexible architecture structures sorted according to the classification of (Pach et al. 2014)
Fig. 1 General behavior of state-of-the-art adaptive control architectures

Table 2 Disturbance detection and evaluation: solutions and limitations in the literature
Table 3 Switching mechanisms: solutions and limitations in the literature

Figure 1 introduces a graphical synthesis of the behavior of the architectures listed in the state of the art, expressed in the form of an algorithm. This algorithm illustrates, through iterative mechanisms, the way these control architectures face disturbances. It is important to note that the introduced references may or may not implement some of these mechanisms, depending for example on design choices, but they do not use other mechanisms instead. Also, for clarity, the hypothesis made here is that the control is centralized, but a rather similar algorithm could be designed for embedding into the autonomous entities of a distributed control. From an initial control strategy (indexed 1 in the figure), the production is both launched and monitored. When a disruption is detected, the first task is to evaluate whether the disturbance has a significant impact on the performance of the system. Then, it must be evaluated whether to change the control strategy in order to cope with the disturbance. If so, the new control strategy, together with its integration into the control architecture, needs to be designed and evaluated. This new control strategy (denoted i) is then applied to the system. From this behavioral algorithm, several key issues emerge that have a significant impact on the performance of the adaptation, denoted Challenges in the figure, which each of the state-of-the-art architectures must address when facing disturbances. The next paragraphs describe how the studied architectures answer these challenges, while the following sections provide a detailed discussion and leads for future research on these specific points.
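To make this loop concrete, the following minimal Python sketch mimics the structure of the Fig. 1 algorithm. The random draws are mere placeholders for the real detection and impact-evaluation models, and all names are illustrative, not taken from any reviewed architecture.

    import random

    def run_adaptive_control(horizon=10, seed=1):
        """Toy stand-in for the Fig. 1 loop; every model is a placeholder."""
        random.seed(seed)
        strategy = 1                      # initial control strategy (index 1)
        for step in range(horizon):       # production launched and monitored
            if random.random() > 0.3:     # no disturbance detected this step
                continue
            impact = random.random()      # evaluate the impact (challenge 1)
            if impact < 0.5:              # absorbed by slack: keep strategy
                continue
            strategy += 1                 # design and apply strategy i (2, 3)
            print(f"step {step}: impact {impact:.2f} -> strategy {strategy}")
        return strategy

    run_adaptive_control()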

Table 2 has been constructed according to the introduced algorithm. It puts in perspective the contributions of each work to the disturbance detection and evaluation functionalities. The table is organized as a review of the proposition of each reference together with a comment on the limitation induced by that proposition. Most of the papers focus their evaluation on the immediate situation, without taking the future evolution of the disturbance into account. This induces a bias, as it then becomes impossible to know precisely which manufacturing entities are actually impacted by the disturbance, and hence to define an efficient heterogeneous control. Several leads are given in this field for designing efficient detection mechanisms, but no generic answer was given for the evaluation issue, apart from the use of ant colonies. The major limitation of that tool is that the ants have to explore all the possible future schedules of the resources. This restricts their use to a relatively short term, which might be too short considering the delay needed to reschedule.

Second, Table 3 presents, in the same way, the contributions on the switching mechanisms inside the architectures. Three classes of papers appear. First, those where rescheduling is meant to be so short that the optimal switch is found immediately are subject to performance degradation when the problem complexity rises. Second, some papers do not give any framework for defining the KPIs or objective functions that evaluate the actual need to switch. Finally, many papers evaluate the need to switch down, but the proposed HCA only switch back when the shop floor gets empty, i.e., when all work in progress is completed. This solution is neither optimal nor applicable to systems where production orders constantly arrive "on the fly". From a general point of view, some leads are given on evaluating the pertinence of switching, but no effective solutions are suggested for the switch-back functionality.

Finally, Table 4 sums up the contributions on alternative control and architecture design. In this field, some architectures could be considered ideal, but their coupling with the previous issues is not addressed, so many choices still have to be envisaged and justified. Among others, a simplifying hypothesis was often adopted by reducing the possible number of different configurations. This hypothesis is very limiting, as it removes the opportunity to optimize the coupling between predictive-reactive/centralized-distributed possibilities, although a fine tuning of these characteristics is obviously necessary when dealing with large-scale industrial systems.

Table 4 Alternative control strategies: solutions and limitations in the literature

As a conclusion, this overview shows that no reference paper has yet been published presenting an HCA and dynamics that would meet all the expected challenges of predictive-reactive control architectures. Nevertheless, many ideas have been suggested and shall be pursued. For that purpose, and in line with the three introduced challenges, the next sections put in perspective what the authors consider the major issues to be solved to reach real industrial applications of HCA.

First challenge: estimation of future performances

Challenge description

One fundamental obstacle to the implementation of predictive/proactive-reactive coupling in HCA is the difficulty, for researchers, of designing models that estimate the future performances of the production system. This difficulty is directly correlated with the observation of the real current state (e.g., locating products and their state) and the extrapolation of possible evolution scenarios in the near future. This capability is nevertheless mandatory to address the related implementation topics of HCA, which can be decomposed into the following three issues.

First, it is necessary to detect the moment at which the control should switch down from a centralized mode to a reactive and local one. This detection can only be based on prediction models, split into two classes in the literature: analytic models, rapidly limited by the size of the considered systems because of their algorithmic complexity, and discrete-event simulation models, able to handle large systems but extremely time-consuming. This last characteristic often limits their use in the context of real-time decision making.

Second, a diagnoser should be designed, able to evaluate the impact of a deviation from the expected state of the system on its global behavior (Zaytoon and Lafortune 2013). For example, while it is obviously necessary to detect a delay in the execution of a task of the predictive schedule, some of these delays might not be critical for the behavior of the system, either because they are very short or thanks to the available free margin.
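As a trivial numeric illustration of the decision such a diagnoser must make (a sketch assuming per-task slack values are known from the predictive schedule; task names and durations are invented):

    def classify_delays(tasks):
        """A delay is only critical when it exceeds the free margin (slack)
        left after the task in the predictive schedule (values in minutes)."""
        return {task: ("critical" if delay > slack else "absorbable")
                for task, (delay, slack) in tasks.items()}

    print(classify_delays({"drilling": (4, 10), "assembly": (12, 5)}))
    # {'drilling': 'absorbable', 'assembly': 'critical'}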

Finally, it is necessary to foresee the state of the system at the switch-back point, i.e., at the date when the predictive mode will be in charge again. This foreseen state is meant to be used by the following challenges, namely to calculate the optimal switch-back time and state and to determine the initial state for the calculation of the future control strategy.

Insights and limits of the literature facing this challenge

The works previously introduced showed several interesting leads for tackling the issue of disturbance detection. Centralized approaches, such as KPI calculations or expert systems, could be considered alone or together with distributed approaches, in which any entity is able to monitor its state and trigger detection events communicated to the rest of the entities of the architecture. The ant colony approach is also promising, as it enables both the calculation of the nominal state of the system and the calculation of its possible evolution in the near future.

Fig. 2 Concept and implementation of a discrete-event observer

Many modelling formalisms are classically used to build diagnosers, including automata (Sampath et al. 1995) and their timed and probabilistic extensions, Petri nets (Basile et al. 2009; Cabasino et al. 2010; Dotoli et al. 2011), and statecharts and hierarchical state machines (Idghamishi and Hashtrudi Zad 2004; Paoli and Lafortune 2008). In the HCA literature, no proposition was made to address this specific issue, as stated previously; nor are any tools defined for future state forecasting.

Suggestions to address this challenge in the near future

To solve the issue of disturbance detection, one proposal would be to implement an observer able to detect abnormal behavior (differences between the theoretical expected behavior and the observed behavior, i.e., state-reconstructor abilities). Using, among others, the concept of simulation-based observer developed in (Cardin and Castagna 2009) and (Cardin and Castagna 2011), it should be possible to integrate into the HCA a discrete-event observer built with discrete-event simulation modeling tools and software, in order to benefit from their modeling power. The idea of this observer is to mimic the behavior of the system, stay synchronized with the real system and put its entire state at the disposal of the decision support system. This state is considered the most accurate and up-to-date image of the actual state of the system and might be used inside an automated control loop. Each synchronization might thus be considered an indication of deviation of the actual behavior from the expected one, represented by the state of the observer.
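A minimal sketch of such an observer might look as follows. This is a toy Python rendering of the concept, not the implementation of (Cardin and Castagna 2009); the event names and the deviation test are simplified assumptions.

    class DiscreteEventObserver:
        """Replays the expected event list from the schedule and flags any
        deviation between observed and expected events, while keeping the
        most up-to-date image of the system state."""
        def __init__(self, expected_events):
            self.expected = list(expected_events)   # [(due_time, event), ...]
            self.state = {}                          # reconstructed system state

        def synchronize(self, time, event):
            due_time, expected_event = self.expected.pop(0)
            self.state[event] = time                 # update the state image
            if event != expected_event or time > due_time:
                return ("deviation", expected_event, due_time)
            return ("nominal", event, time)

    obs = DiscreteEventObserver([(5, "load_P1"), (9, "mill_P1")])
    print(obs.synchronize(5, "load_P1"))    # ('nominal', 'load_P1', 5)
    print(obs.synchronize(14, "mill_P1"))   # ('deviation', 'mill_P1', 9)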

It is also possible to couple disturbance impact evaluation and state forecasting in one single tool. A dynamic model of the system needs to be designed in order to forecast its behavior. This model could be used in disturbed mode to evaluate the impact of the disturbance on the whole system or to foresee the state of the system at the switch-back point. In this case, the previously mentioned observer might enable a simulation-based evaluation of the future behavior of the system, possibly implemented using online simulation concepts, which are efficient but generally hard-to-implement forecasting tools. These tools are usually dedicated to the dimensioning phase (offline), but their benefits would definitely be increased by making them actual system control tools included in the control loop, i.e., online (Fig. 2). However, integrating these tools in-the-loop in the architecture requires tackling basic implementation problems, such as the methodology for choosing the simulation horizon and its impact on, for example, the maximum scheduling calculation duration.
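A correspondingly minimal forecasting sketch could reuse the observer state as the initial state and truncate at the simulation horizon, whose choice is precisely the tuning problem mentioned above (operation names and durations are invented):

    def forecast(observer_state, pending_ops, horizon):
        """Plays the pending operations forward from the observer state to
        estimate completion times; operations falling beyond the simulation
        horizon are simply not forecast."""
        clock = max(observer_state.values(), default=0)
        completion = {}
        for op, duration in pending_ops:
            clock += duration
            if clock > horizon:
                break
            completion[op] = clock
        return completion

    print(forecast({"load_P1": 5}, [("mill_P1", 7), ("pack_P1", 3)], horizon=20))
    # {'mill_P1': 12, 'pack_P1': 15}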

Second challenge: designing efficient synchronization mechanisms

Challenge description

To clearly set out the stakes related to this challenge, a case study inspired by a real industrial situation is considered. This case concerns a supplier to the automotive industry that produces turbochargers. In this company, the shop floor is organized in manufacturing cells, each of them dedicated to an automotive brand. The cell organization model is described in Fig. 3.

Fig. 3 Cell organization model

The company's ERP master planning function proposes a weekly predictive (centralized) schedule for the cells. The launching of manufacturing orders is redefined each day in a centralized way (one decision center, with all resources concerned). Obviously, failures and unexpected events are inevitable in all systems, including manufacturing systems. In this case, reactive rescheduling decisions are traditionally requested from the ERP system. A typical scenario occurs, for example, at the decoupling point when a disturbance arises on assembly line B during the manufacturing of a shop order composed of several lots. The operators have to decide what to do with the remaining lots of the shop order. One possible solution could be to split those remaining lots between the two other lines A and C. The operators then have to ask for an ERP reschedule (the decision is made in a centralized way). In such an organization, it is very difficult to find an optimal solution in a very short time, and a lot of working time is lost reporting information, estimating the current and future possible states, generating new scenarios, choosing one of them and finally launching the new schedule. As a consequence, the whole system is often in a disturbed mode, which leads to low levels of key performance indicators.

Let us first assume that products and resources are intelligent holons or agents. In this context, the decisional system has two functioning modes: centralized (e.g., using the ERP) and distributed (via a product-driven system). To react to a disturbing event, the product has to decide autonomously whether the decisional system must be in centralized or distributed mode. It then has to decide among three choices:

  1. Do nothing and wait for recovery,

  2. Decide autonomously to do something (switch down to distributed mode for a local re-scheduling decision), or

  3. Decide that the decisional system stays in, or switches back to, centralized mode and ask for rescheduling at a higher decision level.

This example clearly illustrates the stakes of both switch-down and switch-back synchronization mechanisms, with the insertion of the remaining lots into the ongoing production of the other lines when a disturbance happens on one of them.

Insights and limits of the literature facing this challenge

As seen in the literature review, the "switch down" mechanism is already addressed in the literature (event-driven or threshold-driven switches). An issue remains, however, in that researchers pay little attention to the actual need to "switch down", as illustrated in the previous example. In addition, the "switch back" mechanism, which concerns the way the centralized mode is reused after and instead of the distributed mode, is rarely addressed or even mentioned. All these switching decisions should be taken according to the global performance objectives targeted by the production manager.

Suggestions to address this challenge in the near future

Figure 4 shows a possible implementation of an HCA for the case study. A simulation model can be built to estimate the states of the shop floor, and an initial (predictive) schedule can be computed as traditionally done by the ERP. In case of disturbances, because of the nature of the data, some fuzzy criteria (alpha and beta in the figure) may be useful. They would lead the in-progress product to choose one of the three choices presented above.
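As a purely illustrative reading of such criteria (a sketch in which crisp thresholds stand in for the actual fuzzy memberships; the drift measure and the values of alpha and beta are assumptions, not taken from Fig. 4):

    def choose_reaction(drift, alpha=0.2, beta=0.6):
        """Crisp reading of the two criteria: below alpha the drift is
        waited out (choice 1), between alpha and beta it is handled locally
        in distributed mode (choice 2), above beta a centralized
        rescheduling is requested (choice 3)."""
        if drift < alpha:
            return 1
        if drift < beta:
            return 2
        return 3

    print([choose_reaction(d) for d in (0.1, 0.4, 0.9)])   # [1, 2, 3]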

Fig. 4 Criteria-based switching policies

To correctly address this challenge, two questions relevant to the synchronization of the two modes (centralized and distributed) must first be answered:

  i) What are the most pertinent criteria to switch down or back?

  ii) How to reinsert the concerned in-progress products into the remainder of the material flow (switch-down case), or how to synchronize the new re-optimized schedule with the state of the manufacturing system once this optimized schedule is obtained (switch-back case)?

The first question deals with a system of performance indicators making it possible to estimate when it is pertinent to switch down or back according to the circumstances (i.e., the physical context: flexible manufacturing system, shop floor, constraints, management rules, etc.). Objectives and performance indicators must obviously be determined according to the industrial context, and it seems difficult to design generic indicators. These indicators probably have to be designed according to the physical context, or at least to a class of industrial systems, and built on a learning system. One possible research direction is to use multi-criteria optimization based on methods such as Choquet integrals to establish switching limits according to measured drifts and situations. This approach is close to the one proposed by (Chan et al. 2000), where the authors presented an integrated approach for the automatic design of flexible manufacturing systems using simulation and multi-criteria decision-making techniques. The selection of the most suitable design, based on a multi-criteria decision-making technique (the Analytic Hierarchy Process, AHP), is employed to analyze the output of the flexible manufacturing system simulation models. Intelligent tools such as expert systems, fuzzy systems and neural networks were developed to support the design process of the flexible manufacturing system.
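To fix ideas, here is a minimal sketch of the discrete Choquet integral aggregation mentioned above; the criteria names and capacity values are illustrative assumptions, not taken from the cited works.

    def choquet(scores, capacity):
        """Discrete Choquet integral: scores maps criterion -> value in
        [0, 1]; capacity maps a frozenset of criteria -> weight in [0, 1],
        with capacity(all criteria) = 1. Non-additive capacities let the
        aggregation capture interactions between criteria."""
        items = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
        total, previous = 0.0, 0.0
        for i, (_, value) in enumerate(items):
            coalition = frozenset(name for name, _ in items[i:])
            total += (value - previous) * capacity[coalition]
            previous = value
        return total

    # Toy capacities expressing that tardiness and machine load interact.
    capacity = {frozenset({"tardiness", "load"}): 1.0,
                frozenset({"tardiness"}): 0.7,
                frozenset({"load"}): 0.5}
    print(choquet({"tardiness": 0.8, "load": 0.3}, capacity))
    # 0.3 * 1.0 + 0.5 * 0.7, i.e. approximately 0.65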

(Muhl et al. 2003) proposed, for the automotive industry, a way to optimize the schedule of a car assembly line in a centralized way according to a unique performance indicator, with the pertinent parameters periodically recalculated to ensure the best synchronization between the real shop-floor state and the new schedule. Another way to design this indicator system could be to use learning mechanisms such as neural networks, fuzzy approaches or Choquet integrals (Thomas and Thomas 2011).

(Herrera et al. 2011) first merged the centralized and distributed approaches, applied to a similar industrial case (Thomas et al. 2009). They proposed a multi-level parametric model to solve the re-scheduling problem. However, the performance indicator leading to the switch decision was chosen empirically, and the distributed decisions were limited to: i) doing nothing (choice 1) or ii) reinserting the remaining parts into the existing predictive-centralized schedule through a simple splitting decision.

Another research work focusing on the synchronization problem was done by (El Haouzi et al. 2009). The authors proposed an original architecture to control manufacturing flows on two assembly lines. In case of disturbances, products can arrive early or late at the synchronization point between the main assembly line and its feeders. The architecture was composed of an ERP and a distributed decision system, with on-line information provided by Auto-ID technologies.

The second question deals with proposing an optimization model able to insert the remaining parts into the rest of the schedule in the case of switching down with an autonomous decision (choice 2), or to re-schedule the whole set of manufacturing orders and remaining parts according to the real state of the physical system in the case of switching back to predictive mode (choice 3). The optimization model has to be supported by a quantitative framework. The review of the research papers shows a lack of methods or frameworks for an efficient synchronization mechanism between the two modes. However, a first attempt (concerning the case of Fig. 4) has been investigated using a fuzzy logic approach (Li et al. 2015). The authors propose a novel approach to deal with all parts and lots concerned by a breakdown. First, a dynamic switching function, taking into account a forecasted duration of the breakdown and a variable threshold depending on the setup time of the processed product, decides whether it is necessary to switch to the centralized mode for a global re-scheduling. Then, a local decision-making method based on fuzzy logic manages the remaining products when the decision to stay in the current distributed mode was selected at the previous stage. Finally, a classical dynamic re-scheduling approach re-arranges the remaining products, taking the setup time into account.
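A minimal sketch of such a dynamic switching function, in the spirit of (Li et al. 2015) but not their actual formulation (the scaling factor k is an invented tuning parameter):

    def should_reschedule_globally(forecast_downtime, setup_time, k=3.0):
        """Request a switch to the centralized mode for global rescheduling
        only when the forecasted breakdown duration exceeds a threshold
        that scales with the setup time of the processed product."""
        return forecast_downtime > k * setup_time

    print(should_reschedule_globally(45, setup_time=10))   # True
    print(should_reschedule_globally(20, setup_time=10))   # False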

Third challenge: designing efficient switching strategies integrated into a hybrid control architecture

Challenge description

(Dilts et al. 1991) have shown that a strong link exists between the architecture of an intelligent manufacturing system and the efficiency of its scheduling and control, be they reactive or predictive. They identified key design decisions that are affected by each type of control structure. The question of measuring the relevance of key decisions in "intelligent manufacturing systems" is thus crucial, and it is the major issue addressed in this challenge, which has to be tackled once the two previously introduced ones have been solved. It consists in integrating the introduced observer and the switching mechanisms (both "down" and "back") into an HCA. This integration requires defining the exact role of each entity of the HCA (holons or agents, for example), and mechanisms ensuring the global consistency of the whole HCA must be designed. The HCA must also interoperate with existing databases and manufacturing systems, as well as with human supervisors.

Insights and limits of the literature facing this challenge

Focusing on this challenge, one can observe significant activity at the European level. Several European projects have addressed the design of distributed/hybrid control architectures for so-called "smart factories". PABADIS (Lüder et al. 2004) and PABADIS PROMISE (Ferrarini et al. 2006) are among the first EU projects in that direction. More recently, let us mention the GRACE, SMARTPRODUCT and ARUM projects. The GRACE project (Matthias et al. 2013) is in line with the current need to build modular, intelligent and distributed manufacturing control systems, and studied more precisely the impact of manufacturing operations on quality; its distributed control architecture is interfaced with a Manufacturing Execution System (MES). The SMARTPRODUCT project (Miche et al. 2012) focused on embedding "proactive knowledge" into smart products: "proactive" smart products "talk", "guide" and "assist" the designers, workers and consumers dealing with them. Some proactive knowledge is co-constructed with the product, while other parts are gathered during the product lifecycle using embedded sensing and communication. Neither GRACE nor SMARTPRODUCT addressed the optimization of the control architecture, hybrid or not. More recently, an interesting initiative, the ARUM project (Leitao et al. 2013; Stellingwerff and Pazienza 2014), aimed at designing a holonic multi-agent system combined with a service architecture intended to improve performance and scalability beyond the state of the art. The proposed solution integrates multiple layers of sensors, legacy systems and agent-based tools for beneficial services such as learning, quality, and risk and cost management, including ecological footprint aspects. The ARUM solution runs in two modes, predictive/centralized and real-time/distributed simulation, but it is clearly aircraft-industry oriented, which may lead to application-specific developments; the objective here is rather to define solutions that are as application-independent as possible.

Fig. 5 ORCA: Optimized reactive control architecture

In these projects, the main idea is to take advantage of two basic structuring mechanisms: hierarchical mechanisms (vertical relationships, towards prediction and the centralization of information and decisions) and heterarchical mechanisms (horizontal relationships, towards reaction and the distribution of information and decisions). By combining them, it is expected to avoid their respective drawbacks (typically, lack of reactivity for hierarchies and myopia for heterarchies). Thus, the hierarchical part of the architecture is usually responsible for predictive, centralized and global optimization, while the heterarchical part allows reactivity and local optimization. Famous flagship HCA are ADACOR (Leitao et al. 2005), PROSA (Van Brussel et al. 1998) or, more recently, D-MAS (Verstraete et al. 2008). Such HCA are composed of cooperative decisional control entities, typically modeled as holons or agents.

Many other projects are dedicated to the interoperability of systems in an open manufacturing context (e.g., LinkedDesign, Linked Knowledge in Manufacturing, Engineering and Design for Next-Generation Production (Kiritsis et al. 2013)). This transverse issue is crucial for industrial implementation and is dealt with from a global point of view in parallel with the previous ones. In this context, the definition of ontologies is a widespread approach to facilitate data formalization and exchange, ensuring an efficient level of interoperability with existing industrial information systems.

Suggestions to address this challenge in the near future

Dynamic HCA (cf. Table 1) are very promising since they provide the (self-*) mechanisms needed to improve the agility of the control system, such as self-adaptation (Barbosa et al. 2012). In such architectures, switching mechanisms to/from predictive/reactive modes dynamically adapt the structure of the control architecture to production uncertainties while maintaining performance. Of course, more generally, there may be different intermediary levels and modes between a fully predictive and a fully reactive mode. Some first ideas have been proposed by (Pach et al. 2014) in the ORCA architecture (Fig. 5).

ORCA is a dynamic architecture with two functioning modes: normal mode and disrupted mode. An entity (composed of a local optimizer and a physical part, e.g., a robot or a conveying subsystem) in normal mode is controlled hierarchically: the global optimizer optimizes the system behavior and transmits its orders to each local optimizer, and each local optimizer manages the behavior of its own entity on the basis of these orders. If a local optimizer detects a perturbation, it switches to disrupted mode, in which it completely controls its entity's behavior and is responsible for the optimization, which is now local and reactive. Since the functioning mode in ORCA is defined locally in each local optimizer, the two modes (i.e., normal and disrupted) can exist simultaneously in the system.
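A minimal sketch of this locally decided switching might look as follows (class and method names are hypothetical; the actual ORCA design is considerably richer, and the switch back, which ORCA performs only at the end of the order set, is deliberately not sketched):

    class LocalOptimizer:
        """ORCA-like entity controller: follows the global optimizer's
        orders in normal mode and takes over locally in disrupted mode;
        each entity decides its mode on its own, so both modes can
        coexist in the system."""
        def __init__(self, entity):
            self.entity = entity
            self.mode = "normal"

        def step(self, global_order, perturbation_detected):
            if perturbation_detected:
                self.mode = "disrupted"          # switch down, locally decided
            if self.mode == "normal":
                return global_order              # execute the global schedule
            return f"reactive decision by {self.entity}"  # local optimization

    robot = LocalOptimizer("robot_1")
    print(robot.step("process lot 42", perturbation_detected=False))
    print(robot.step("process lot 43", perturbation_detected=True))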

ORCA is a first step, and further research and formalization are needed. For example, in ORCA, the production order set was assumed to be provided as a whole at the start of a new production run, in a static manner, with no "on the fly" orders. In addition, the switch back was made only at the end of the production of the whole order set.

This challenge is complex to address, and despite the growing number of HCA proposed in the literature, the way prediction and reaction are coupled is neither optimized nor even clearly justified. This clearly contributes to the lack of applications of such contributions in real industrial situations, even though they respond to a real industrial need. As an illustration, to the best of our knowledge, only P2000+ (Bussmann and Schild 2001) was applied, at Daimler, and it failed because of issues related to the proposed research topic (among other issues, such as global cost).

Special attention has to be paid to the design of effective switching (down and back) mechanisms using online simulation, intelligent products and optimization tools, leading to homogeneous or heterogeneous types of hybrid structures. A first step would probably be to address the homogeneous type, where all control holons/agents switch down/back at the same time (temporal switch), which implies the use of a re-scheduling optimization model. More complex, heterogeneous types then have to be studied. Defining ontologies will help integrate ORCA with existing industrial information systems, especially manufacturing execution systems (MES) and enterprise resource planning (ERP), to make them interoperate.

Conclusion

The design of robust control architectures is an active scientific area, recently reinforced by the definition of highly flexible and scalable control paradigms. The implementation of HCA able to deal with heavy disturbances during the execution of the production scenario faces three major challenges, namely disturbance detection and impact evaluation, the definition of control strategy switching mechanisms, and HCA design. Research in manufacturing scheduling and control is constantly growing, leading to an increasing number of innovative hybrid architecture solutions, each characterized by specific assumptions and potential advantages. This paper introduced a review of these architectures in order to identify the solutions given to these three major challenges and the reasons limiting their industrial implementation.

The paper also introduced research topics and possible leads aiming at a dynamic and homogeneous hybrid scheduling and control architecture in which the coupling of reactive-distributed and predictive/proactive-centralized mechanisms is optimized. This includes decision support for control holons/agents to help them in their switching strategy from/to the different modes. More precisely, the idea is to provide these agents/holons with information and mechanisms that would help them decide online their best behavior facing expected and unexpected events (e.g., stay in predictive mode, switch to reactive mode, switch back to predictive mode, switch to an intermediary constrained mode...). Even though the HCA exhibited in the state of the art show promising performances on academic examples, three main challenges are still to be investigated from the authors' perspective. Several leads are given to orient future research activities in this field, with the objective of making these concepts applicable on industrial shop floors in the next few years.