1 Introduction

The engineering and manufacturing domain is moving toward a new era of digitized production, in which traditional industrial engineering methods are combined with IT and internet technologies, such as cyber-physical systems, sensor networks, big data analytics, and semantic data integration. In different regions, initiatives in this direction are known under different names, such as industrie du futur in France, industrial internet in the US, or Industrie 4.0 in Germany. A core vision of these initiatives is to make manufacturing and production more flexible, more efficient, and less error-prone by shifting more ‘intelligence’ to the edge. This is to be achieved by enabling sensors, devices, machines, and storage and transport equipment to communicate directly with each other. To realize this Industry 4.0 vision, a wide variety of areas related to manufacturing, security, and machine communication need to interoperate by aligning their information models using domain-specific standards.

The Automation Markup Language (AutomationML or AML) for exchanging plant engineering information as specified by IEC 62714 [4, 9, 17, 21] is one of the core standards of Industry 4.0. AutomationML can describe plant components and their sub-components from different perspectives, e.g., mechanical or electrical. A key challenge in such settings is intra-standard interoperability, i.e., the consistent integration of multiple pieces of information described in AutomationML. To overcome this challenge, we present Alligator, a deductive approach to integrate AutomationML specifications, and potentially similar document types.

We define an RDF-based representation of AutomationML input documents, aiming to resolve structural semantic inconsistencies, such as granularity of representations, schematic differences, and groupings and aggregations. Based on this semantic representation, we define a set of Datalog rules for identifying conflicts that generate structural semantic inconsistencies. A deductive engine is used to compute the conflicts from the Datalog representations. Conflict resolution is utilized to merge the input documents and produce an integrated AutomationML document.

By automating a crucial part of the engineering and modeling processes, Alligator addresses a key pillar of the Industry 4.0 vision. To the best of our knowledge, Alligator is the first comprehensive approach for automatically resolving the semantic ambiguity of AutomationML. As a result, the Alligator approach enhances scalability, efficiency, and coherence of models for Industry 4.0 manufacturing environments. Although our initial implementation and evaluation of the approach focus on AutomationML, the approach is easily transferable to other Industry 4.0 standardization initiatives. We empirically evaluated the quality of Alligator against a benchmark of AutomationML documents. The evaluation results suggest that Alligator accurately identifies various types of conflicts between AutomationML documents.

In summary, this work makes the following contributions:

  1. Alligator, a deductive approach, that combines Deductive Database and Semantic Web technologies for the integration of Industry 4.0 Standards.

  2. A set of Datalog rules to characterize semantic heterogeneity types among AutomationML documents.

  3. An empirical evaluation that reveals the effectiveness of Alligator during the integration of AutomationML documents.

The remainder of this paper is structured as follows. Section 2 motivates the problem with a concrete example. Section 3 gives an overview of the background and introduces the terminology relevant to our approach. Section 4 presents the Alligator approach, and Sect. 5 describes its rule-based representation of semantic heterogeneity in AutomationML. Section 6 reports the empirical evaluation. Section 7 reviews related work. Section 8 concludes and gives an outlook on future work.

Fig. 1. Motivating example. Results of an engineering process where a motor engine is modeled from different views: a mechanical and an electrical view. Identical elements of the motor engine are defined as different elements in the views, resulting in conflicts between the views.

2 Motivating Example

A typical scenario in the mechatronic domain is data exchange between engineering tools during the modeling process. Engineering tools are utilized in different disciplines, such as mechanical and electrical engineering, or systems control. Figure 1 illustrates the results of an engineering modeling process where a motor engine is modeled from mechanical and electrical viewpoints. Mechanical engineers design the motor engine from the mechanical point of view, whereas electrical engineers model the electrical wiring topology inside the motor engine. AutomationML is utilized in both views to semantically describe the engine. However, because physical structures in these views are modeled with different properties, conflicts might arise when integrating these designs, thus inducing structural semantic inconsistencies.

Figure 2 details the mechanical and electrical views of the motor engine given in Fig. 1. The motor engine is identified as 0173-1#01-AKE162#012 DC Engine according to the eCl@ss product classification standard (footnote 1). This reference enables the semantic description of the mechatronic component by pointing to the standard definition of a motor engine in eCl@ss. The AML document Motor-Engine-Mec.aml (cf. Fig. 2a) specifies the motor in terms of its construction form as a DC Engine (mechanical view). The AML element RoleClassLib (lines 2–23) comprises two AML elements RoleClass. The first RoleClass (lines 4–14) contains AML attributes with references to eCl@ss that semantically describe the engine according to the standard definition of version, classification in the eCl@ss catalog, and the International Registration Data Identifier (IRDI). The second RoleClass (lines 15–20) is composed of an AML attribute that defines the construction form of the DC Engine; RefSemantic (line 18) refers to the eCl@ss standard definition of this AML attribute (0173-1#02-BAE069#007).

Figure 2b depicts an AML document that aims at defining the same engine from the electrical viewpoint. As in the mechanical view, the first RoleClass (lines 4–14) semantically describes the engine using eCl@ss, while the second RoleClass (lines 15–20) defines not the engine as a whole, but a data cable in the engine. The Attribute in line 16 specifies the data cable and includes the semantic reference to eCl@ss (line 18).

Although the structural definition of the DC Engine differs between the two AML documents, the specification of AutomationML and its eCl@ss integration [19] imply that both descriptions are semantically equivalent. The references to eCl@ss indicate that the AML elements between lines 4 and 14 in the two views correspond to the same element in the real world. For example, the specification of AutomationML states that two RoleClass elements are semantically equivalent whenever they share the same eCl@ss references for the AML attributes eClassVersion, eClassClassification, and eClassIRDI [19]. However, these views describe different real-world objects, and they should not be defined using RoleClass elements in the mechanical and electrical views that are considered semantically identical according to AML. Therefore, these elements are in conflict. Accordingly, there are five pairs of conflicting AML elements in this simplified example; each pair needs to be merged into one AML element when the two views are integrated.

Currently, this integration is performed manually by experts, negatively affecting engineering processes. We present Alligator, a deductive framework that exploits the features of logic programming and the RDF data model for representing AML documents, as well as for detecting conflicts whenever AML documents are integrated.

Fig. 2. Example of AML Documents. A motor engine is semantically described in terms of the eCl@ss standard. Role classes (highlighted in red) model the engine in terms of (a) a construction form and (b) a data cable. Elements of the same type (highlighted in yellow) correspond to conflicts between the views. (Color figure online)

3 Background

AutomationML. AutomationML (Automation Markup Language, IEC 62714) is a standard for exchanging engineering information between tools from disciplines such as mechanical plant engineering, electrical design, and robot control. AutomationML provides an XML Schema, incorporating three different standards for describing real plant components [20]. At the top level, the CAEX (IEC 62424) format describes the plant topology, storing hierarchical object information, properties, and libraries [8]. Secondly, the geometry (mechanical drawings) and kinematics (physical properties, such as force, speed, or torsion) are implemented with COLLADA [3]. Finally, the logic (sequencing, behavior, and control information) is implemented with PLCopen XML (IEC 61131).

AutomationML is built upon four main CAEX concepts: RoleClassLibrary, SystemUnitClassLibrary, InterfaceClassLibrary, and InstanceHierarchy. RoleClassLibrary specifies vendor independent requirements for the specification of system equipment objects; a RoleClassLibrary may comprise several RoleClasses, which provide role descriptions of a given class. Such descriptions aim at representing a physical or logical object, e.g., a motor or a robot. The InterfaceClassLibrary defines a set of interfaces to describe a plant model. First, it can define relations between the objects of a plant topology. Secondly, it can reference external information, e.g., a 3D description of a motor. The InstanceHierarchy describes the plant topology, and defines specific equipment for actual projects. Further, Attributes are used to define properties, e.g., length or size, of AML objects, e.g., RoleClasses or Internal Elements. In this paper, we focus on modeling topology information by means of the CAEX format.

Semantic Heterogeneity in AutomationML. Biffl et al. [4] and Kovalenko and Euzenat [11] have characterized mappings to deal with semantic heterogeneity in the engineering domain, and specifically in AutomationML. The authors have identified the following types of semantic heterogeneity: (M1) Value processing: the same properties are not modeled equally, e.g., using different datatypes; (M2) Granularity: the same objects are modeled at different levels of detail; (M3) Schematic differences: differences in the way semantics is represented for the same object; (M4) Conditional mappings: relations between entities exist only if certain conditions occur; (M5) Bidirectional mappings: relations between entities have to be defined bidirectionally; (M6) Grouping and aggregation: different semantic modeling criteria are applied to group elements for the same object; and (M7) Restrictions on values: mandatory values for properties in the object that have to be handled in the mapping process. As a proof of concept, we focus on three of these types: granularity (M2), schematic differences (M3), and grouping and aggregation (M6). We selected these types because they present major semantic structural differences when describing similar objects. Additionally, they characterize semantic mappings between two AML elements that can be performed in two ways:

  1. Direct identification considers two elements to refer to the same entity if the same identifier is used.

  2. Indirect identification considers two elements to refer to the same entity if both refer to the same identity-providing elements from an external catalog, e.g., RoleClass or Attributes. For more complex structures such as RoleClasses, it is assumed that if the combination of the eCl@ss IRDI, classification level, and version are equal, then the RoleClasses are considered to be the same.

AutomationML Vocabulary. Several approaches exist for adding semantics to the AutomationML language by means of ontologies [1, 2, 5, 6, 12, 15]. With the exception of the AutomationML ontology (footnote 2), designed for the AutomationML Analyzer [16], none of the aforementioned ontologies covers all concepts given in the AutomationML schema. Additionally, they are not available on the web for consulting or querying. Crucial information for Alligator, such as the mapping to eCl@ss concepts, is not included in the AutomationML Analyzer vocabulary. Therefore, we have developed an RDFS vocabulary describing the main concepts of the AutomationML language (footnote 3). We have also included concepts related to the integration with the eCl@ss standard.

4 Our Approach: Alligator

In this section, we present a formalization of AML documents, as well as the integration problems and proposed solution addressed by the Alligator approach. Finally, the architecture of Alligator is described in detail.

4.1 Alligator Representation of AML Documents

Definition 1

(Alligator Document). An Alligator document is a tuple \(\varGamma =\langle \theta , V, F\rangle \) such that \(\theta \) is a set of URIs that identify AML elements, V is a set of properties in the AML vocabulary and F is an RDF graph composed of triples in \(\theta \times V \times (\theta \cup L)\) where L is a set of literals.

An Alligator document \(\varGamma =\langle \theta , V, F\rangle \) can represent information from one or several AML documents \(D_i\), where \(\theta \) is the set of URIs that identify the AML elements in \(D_i\), and the RDF graph F describes the relationships between the AML elements in \(D_i\). In general, V can refer to different vocabularies, e.g., for other standards than AML such as OPC UA, but in this work, we focus on the AML vocabulary.
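To make Definition 1 concrete, the following Python sketch models an Alligator document as a tuple of sets and checks that every triple of F lies in \(\theta \times V \times (\theta \cup L)\). The class name `AlligatorDocument`, the helper `is_well_formed`, and the `ex:` and `aml:` identifiers are illustrative assumptions; they are not taken from the Alligator implementation or from the AML vocabulary.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class AlligatorDocument(NamedTuple):
    """Gamma = <theta, V, F> as in Definition 1 (field names are illustrative)."""
    theta: FrozenSet[str]       # URIs that identify AML elements
    vocabulary: FrozenSet[str]  # properties V of the vocabulary
    graph: FrozenSet[Triple]    # RDF graph F as (subject, property, object)


def is_well_formed(doc: AlligatorDocument, literals: Set[str]) -> bool:
    """Check that F is a subset of theta x V x (theta ∪ L)."""
    return all(
        s in doc.theta and p in doc.vocabulary and (o in doc.theta or o in literals)
        for (s, p, o) in doc.graph
    )


# Hypothetical two-triple document; the property names are assumptions.
doc = AlligatorDocument(
    theta=frozenset({"ex:roleClass1", "ex:attribute1"}),
    vocabulary=frozenset({"aml:hasAttribute", "aml:hasValue"}),
    graph=frozenset({
        ("ex:roleClass1", "aml:hasAttribute", "ex:attribute1"),
        ("ex:attribute1", "aml:hasValue", "0173-1#01-AKE162#012"),
    }),
)
```

Representing F as a set of string triples keeps the sketch independent of any particular RDF library; a real implementation would more likely rely on an RDF store.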

Example 1

Consider the RDF graph \(F_1\) in Fig. 3. This graph comprises RDF resources representing the AutomationML elements in the mechanical and electrical views shown in Fig. 2; the AutomationML RDF vocabulary is used to describe these resources. An Alligator document \(\varGamma _1=\langle \theta _1, V, F_1\rangle \) formally describes this RDF representation of the two views, where \(\theta _1\) is the set of the resources in \(F_1\), and V is the AutomationML RDF vocabulary.

Fig. 3. RDF graph of an Alligator document. An RDF graph representing AML elements in the union of the mechanical and electrical views in Fig. 2.

Fig. 4. Ideal conflict-free Alligator document. (a) An RDF graph where there is only one RDF resource for the conflicting resources in the mechanical and electrical views of Fig. 2. (b) A homomorphism \(\sigma \) maps conflicting resources in the RDF graph in Fig. 3 to the same resource in the ideal RDF graph.

Definition 2

(Ideal Alligator Document). Given an Alligator document \(\varGamma =\langle \theta , V, F\rangle \), there is an ideal Alligator document \(\varGamma ^*=\langle \theta ^*, V, F^*\rangle \) such that \(\varGamma ^*\) comprises only conflict-free AML elements. Additionally, there is a homomorphism \(\sigma :\theta \rightarrow \theta ^*\). The ideal RDF graph \(F^*\) is defined as follows:

$$\begin{aligned} F^*=\{(\sigma (s), \; p, \; \sigma (o)) \mid (s,\; p, \; o) \in F\} \end{aligned}$$

Example 2

Consider the RDF graph in Fig. 4a. The Alligator document \(\varGamma ^*=\langle \theta ^*, V, F^*\rangle \) describes this RDF graph, where \(\theta ^*\) is the set of RDF resources in the graph, V is the AutomationML RDF vocabulary, and \(F^*\) is this RDF graph. \(\varGamma ^*\) represents the ideal conflict-free Alligator document of \(\varGamma _1\). Figure 4b shows a homomorphism \(\sigma \) that maps two conflicting resources in the RDF graph in Fig. 3 to the same resource in Fig. 4a.
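The construction of \(F^*\) in Definition 2 can be sketched in a few lines of Python. The function name `apply_homomorphism` and all `mec:`, `ele:`, `ex:`, and `aml:` identifiers are hypothetical. The sketch also assumes that terms outside the domain of \(\sigma \), in particular literals, are mapped to themselves, which the definition leaves implicit.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def apply_homomorphism(graph: Set[Triple], sigma: Dict[str, str]) -> Set[Triple]:
    """Build F* = {(sigma(s), p, sigma(o)) | (s, p, o) in F}.

    Assumption: terms that sigma does not mention (e.g., literals) stay unchanged.
    """
    return {(sigma.get(s, s), p, sigma.get(o, o)) for (s, p, o) in graph}


# Hypothetical fragment of the two views: both describe the same IRDI attribute.
F = {
    ("mec:roleClass", "aml:hasAttribute", "mec:irdi"),
    ("ele:roleClass", "aml:hasAttribute", "ele:irdi"),
    ("mec:irdi", "aml:hasValue", "0173-1#01-AKE162#012"),
    ("ele:irdi", "aml:hasValue", "0173-1#01-AKE162#012"),
}
sigma = {
    "mec:roleClass": "ex:roleClass",
    "ele:roleClass": "ex:roleClass",
    "mec:irdi": "ex:irdi",
    "ele:irdi": "ex:irdi",
}
F_star = apply_homomorphism(F, sigma)
```

Because the result is a set, triples that become identical under \(\sigma \) collapse automatically; here the four input triples yield two triples in the ideal graph.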

Definition 3

Consider an Alligator document \(\varGamma =\langle \theta , V, F\rangle \), an ideal conflict-free Alligator document \(\varGamma ^*=\langle \theta ^*, V, F^*\rangle \), and a homomorphism \(\sigma :\theta \rightarrow \theta ^*\). A set of conflicts in \(\varGamma \) with respect to \(\varGamma ^*\) and \(\sigma \), conflicts(\(\varGamma \mid \varGamma ^*,\sigma \)), corresponds to the set of AML element pairs \((E_i,E_j)\) in \(\theta \times \theta \) such that \(E_i\) and \(E_j\) are different but that \(\sigma \) maps to the same target AML element in \(\theta ^*\):

$$\begin{aligned} \text {conflicts(}\varGamma \mid \varGamma ^*,\sigma \text {)=}\{(E_i,E_j) \mid E_i, E_j \in \theta \text { and } E_i \ne E_j \text { and } \sigma (E_i)=\sigma (E_j)\} \end{aligned}$$

Example 3

Consider the Alligator documents \(\varGamma _1\) and \(\varGamma ^*\) from Examples 1 and 2, and the homomorphism \(\sigma \) in Fig. 4b. The set conflicts(\(\varGamma _1 \mid \varGamma ^*, \sigma \)) corresponds to the set of pairs of RDF resources in the RDF graph of Fig. 3 that \(\sigma \) maps to the same resource in the ideal RDF graph (Fig. 4b).
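Definition 3 translates directly into a set comprehension. The sketch below is only an illustration under the assumption that \(\sigma \) is available as a dictionary covering every element of \(\theta \); the element names are hypothetical.

```python
from typing import Dict, Set, Tuple


def conflicts(theta: Set[str], sigma: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs (E_i, E_j) of distinct elements that sigma maps to the same target."""
    return {
        (e_i, e_j)
        for e_i in theta
        for e_j in theta
        if e_i != e_j and sigma[e_i] == sigma[e_j]
    }


# Hypothetical elements: two role classes collapse, the data cable stays apart.
theta = {"mec:roleClass", "ele:roleClass", "ele:dataCable"}
sigma = {
    "mec:roleClass": "ex:roleClass",
    "ele:roleClass": "ex:roleClass",
    "ele:dataCable": "ex:dataCable",
}
found = conflicts(theta, sigma)
```

Since the definition ranges over \(\theta \times \theta \), the set contains ordered pairs, so each conflict appears in both orientations.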

4.2 Problem Definition and Proposed Solution

Given an Alligator document \(\varGamma =\langle \theta , V, F\rangle \), the AML Conflict Identification problem determines if a pair \((E_k,E_l)\) of AML elements in \(\theta \) is conflicting.

Definition 4

Consider an Alligator document \(\varGamma =\langle \theta , V, F\rangle \), an ideal conflict-free Alligator document \(\varGamma ^*=\langle \theta ^*, V, F^*\rangle \), and a homomorphism \(\sigma :\theta \rightarrow \theta ^*\). The AML Conflict Identification problem corresponds to the problem of deciding if \((E_k,E_l) \in \theta \times \theta \) belongs to conflicts(\(\varGamma \mid \varGamma ^*,\sigma \)).

Solving the AML Conflict Identification problem requires the existence of the ideal conflict-free AML document \(\varGamma ^*\) and the homomorphism \(\sigma \). However, in practice neither \(\varGamma ^*\) nor \(\sigma \) is known, so Alligator computes an approximate solution to the problem. We use SC(\(\varGamma \)) to refer to the set of pairs \((E_k,E_l)\) that correspond to the solutions of this problem. Once a set SC(\(\varGamma \)) of conflicting AML elements in F is identified as the solution of the AML Conflict Identification problem, the problem of AML Conflict Resolution corresponds to the problem of creating an Alligator document where conflicts in SC(\(\varGamma \)) are solved.

Definition 5

Consider an Alligator document \(\varGamma =\langle \theta , V, F\rangle \) and a set SC(\(\varGamma \)) of pairs of conflicting AML elements in F. The problem of AML Conflict Resolution corresponds to the problem of creating an Alligator document \(\varGamma '=\langle \theta ', V, F'\rangle \) and a homomorphism \(\sigma ':\theta \rightarrow \theta '\), such that:

  • For each \((E_i,E_j)\) in SC(\(\varGamma \)), there is an AML element \(E_m\) in \(\theta '\) such that \(\sigma '(E_i)=\sigma '(E_j)=E_m\).

  • \(F'=\{(\sigma '(s), \; p, \; \sigma '(o)) \mid (s,\; p, \; o) \in F\}\).

\(\varGamma '\) represents the Alligator document where pairs of AML elements in SC(\(\varGamma \)) are represented as one RDF AML element.
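One possible way to obtain a \(\sigma '\) that satisfies both conditions of Definition 5 is to merge the pairs in SC(\(\varGamma \)) with a union-find structure and then rewrite F. The sketch below is an assumption-laden illustration, not the procedure implemented in Alligator: the function name `resolve_conflicts` is hypothetical, the representative of each merged group is arbitrarily chosen as its lexicographically smallest element, and literals are left untouched.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def resolve_conflicts(
    theta: Set[str], graph: Set[Triple], sc: Set[Tuple[str, str]]
) -> Tuple[Set[str], Set[Triple], Dict[str, str]]:
    """Return (theta', F', sigma') such that sigma'(E_i) == sigma'(E_j) for all pairs in SC."""
    parent = {element: element for element in theta}

    def find(element: str) -> str:
        # Follow parent links to the group representative, halving paths on the way.
        while parent[element] != element:
            parent[element] = parent[parent[element]]
            element = parent[element]
        return element

    for e_i, e_j in sc:
        root_i, root_j = find(e_i), find(e_j)
        if root_i != root_j:
            smaller, larger = sorted((root_i, root_j))
            parent[larger] = smaller  # assumption: smallest name represents the group

    sigma = {element: find(element) for element in theta}
    merged_theta = set(sigma.values())
    # F' = {(sigma'(s), p, sigma'(o)) | (s, p, o) in F}; literals map to themselves.
    merged_graph = {(sigma.get(s, s), p, sigma.get(o, o)) for (s, p, o) in graph}
    return merged_theta, merged_graph, sigma
```

Union-find also merges elements that are only transitively related through SC(\(\varGamma \)), which is required because \(\sigma '\) is a function: if \((E_1,E_2)\) and \((E_2,E_3)\) are both in SC(\(\varGamma \)), all three elements must share one image.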

We developed Alligator, an integration tool that relies on deductive database techniques for solving the problems of AML Conflict Identification and AML Conflict Resolution. Figure 5 depicts the architectural components of Alligator. Given a set of AML documents, the Alligator Data Model Creation component generates an Alligator document \(\varGamma =\langle \theta ,V,F\rangle \) that formally describes the union of these input AML documents. Additionally, a set of Datalog extensional facts (EDB) representing the triples in the RDF graph F is created. The Deductive System Engine relies on the set of Datalog intensional rules (IDB) to compute the set SC(\(\varGamma \)) from the Datalog representation of \(\varGamma \). The set of Datalog intensional rules (IDB) defines different types of semantic heterogeneity that can occur among AML documents that correspond to views of the same mechatronic object definition. SC(\(\varGamma \)) is computed as the least fixpoint of the Datalog rules in IDB and the facts in EDB. Further, SC(\(\varGamma \)) is utilized by the Integrated AML Document Creation component to solve the AML Conflict Resolution problem, and to produce an integrated AML document where RDF AML elements in SC(\(\varGamma \)) are integrated as one AML element.
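The step from RDF triples to extensional facts can be illustrated with a small Python sketch. It is only an assumption about what such a translation might look like: the helper names `local_name` and `to_datalog_fact`, the choice of deriving the predicate from the property's local name, and the Prolog-style single-quoted atoms are all hypothetical and may differ from the translation used in Alligator.

```python
import re
from typing import Tuple

Triple = Tuple[str, str, str]


def local_name(uri: str) -> str:
    """Return the part of a URI or prefixed name after the last '#', '/', or ':'."""
    name = re.split(r"[#/:]", uri)[-1]
    if not name:
        raise ValueError(f"cannot derive a predicate name from {uri!r}")
    return name


def quote(term: str) -> str:
    """Wrap a term in single quotes, escaping backslashes and quotes."""
    return "'" + term.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_datalog_fact(triple: Triple) -> str:
    """Render (s, p, o) as the binary fact p(s, o)."""
    subject, prop, obj = triple
    name = local_name(prop)
    predicate = name[0].lower() + name[1:]  # predicate names start in lower case
    return f"{predicate}({quote(subject)}, {quote(obj)})."


# Hypothetical triple; prints: hasAttribute('ex:roleClass1', 'ex:attribute1').
print(to_datalog_fact(("ex:roleClass1", "aml:hasAttribute", "ex:attribute1")))
```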

Fig. 5. The Alligator Architecture. Alligator receives AML documents and creates an integrated AML document. AML documents are represented as RDF graphs and Datalog predicates (EDB); Datalog intensional rules (IDB) characterize semantic heterogeneity types. A bottom-up evaluation of the Datalog program identifies conflicts between AML documents.

4.3 Alligator Data Model and Deductive System Engine

Alligator represents AML documents as RDF graphs. AML documents are translated to RDF using Krextor [13], an XSLT-based framework for converting XML to RDF. The RDF AML vocabulary is used to describe AML elements and relations. Further, AML documents are modeled as facts in an extensional database (EDB) of a Datalog program P; for each type of AML element in the AutomationML standard, there exists an extensional Datalog predicate in P. Rules in the intensional database (IDB) of the Datalog program P characterize types of semantic heterogeneity. Intensional Datalog predicates represent conflicts that can exist between the different AML elements according to the types of semantic heterogeneity. The Alligator Deductive System Engine performs a bottom-up evaluation of P following a semi-naïve algorithm that stops when the least fixpoint is reached [7]. The intensional predicates inferred in the evaluation of P correspond to the pairs of conflicts in the set \(SC(\varGamma )\).
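The idea behind semi-naïve bottom-up evaluation can be illustrated on a deliberately small program. The Python sketch below hard-codes two illustrative rules, symmetry and transitivity of an `equivalent` relation, and in each round only joins against the facts derived in the previous round (the delta). It is not the Prolog meta-interpreter used in Alligator, and the two rules are assumptions chosen for brevity rather than rules from the Alligator rule set.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def semi_naive_equivalence(base: Set[Pair]) -> Set[Pair]:
    """Least fixpoint of:
         equivalent(X, Y) :- base(X, Y).
         equivalent(Y, X) :- equivalent(X, Y).
         equivalent(X, Z) :- equivalent(X, Y), equivalent(Y, Z).
    """
    total = set(base)  # all facts derived so far
    delta = set(base)  # facts that were new in the previous round
    while delta:
        derived = {(y, x) for (x, y) in delta}
        # Each rule instance must use at least one delta fact, so facts that
        # depend only on older facts are never recomputed.
        derived |= {(x, z) for (x, y) in delta for (y2, z) in total if y == y2}
        derived |= {(x, z) for (x, y) in total for (y2, z) in delta if y == y2}
        delta = derived - total
        total |= delta
    return total


closure = semi_naive_equivalence({("a", "b"), ("b", "c")})
```

The loop stops as soon as a round derives no new fact, which is exactly the point at which the least fixpoint is reached. Note that symmetry together with transitivity also derives reflexive pairs such as `("a", "a")`; a conflict relation in the sense of Definition 3 would exclude them.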

5 Alligator Rule-Based Representation of AutomationML Semantic Heterogeneity

One of the key innovations of Alligator is the use of Datalog rules to effectively resolve types of semantic heterogeneity. We have developed a set of rules covering the main characteristics of AML. Regarding the attributes, if two attributes refer to the same eCl@ss value, i.e., the same eCl@ss IRDI, it can be assumed that their semantic meaning is the same. In detail, the AML element RefSemantic refers to the eCl@ss IRDI using CorrespondingAttributePath (cf. Fig. 2, line 18). Thereby, even if two attributes are defined with different names, e.g., Length and StrictLength, they can still be semantically equivalent whenever they are linked to the same IRDI reference. Note that these rules have been defined taking into account the properties of the AML vocabulary. Based on this, the rule in Listing 1 states when two attributes are semantically equivalent.

Listing 1. Datalog rule stating when two attributes are semantically equivalent.
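As a rough illustration of the attribute rule described above, and not a reproduction of the Datalog rule in Listing 1, the following Python sketch derives pairs of semantically equivalent attributes from a set of triples. The property names `aml:hasRefSemantic` and `aml:hasCorrespondingAttributePath` and all `mec:` and `ele:` identifiers are assumptions; the sketch further assumes that each attribute has at most one RefSemantic element.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property names; the actual AML vocabulary may use different ones.
REF_SEMANTIC = "aml:hasRefSemantic"
ATTRIBUTE_PATH = "aml:hasCorrespondingAttributePath"


def same_attribute(graph: Set[Triple]) -> Set[Tuple[str, str]]:
    """Pairs of distinct attributes whose RefSemantic points to the same IRDI."""
    ref_of = {s: o for (s, p, o) in graph if p == REF_SEMANTIC}
    irdi_of = {s: o for (s, p, o) in graph if p == ATTRIBUTE_PATH}

    attributes_by_irdi: Dict[str, Set[str]] = defaultdict(set)
    for attribute, ref in ref_of.items():
        if ref in irdi_of:
            attributes_by_irdi[irdi_of[ref]].add(attribute)

    return {
        (a, b)
        for group in attributes_by_irdi.values()
        for a in group
        for b in group
        if a != b
    }


# Length and StrictLength share one IRDI; Width refers to a different (made-up) one.
graph = {
    ("mec:Length", REF_SEMANTIC, "mec:ref1"),
    ("mec:ref1", ATTRIBUTE_PATH, "0173-1#02-BAE069#007"),
    ("ele:StrictLength", REF_SEMANTIC, "ele:ref1"),
    ("ele:ref1", ATTRIBUTE_PATH, "0173-1#02-BAE069#007"),
    ("ele:Width", REF_SEMANTIC, "ele:ref2"),
    ("ele:ref2", ATTRIBUTE_PATH, "hypothetical-irdi"),
}
equivalent_attributes = same_attribute(graph)
```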

To determine that two RoleClasses are semantically equivalent according to their reference to eCl@ss, they have to contain the same version, classification, and IRDI. Based on these three conditions, Rule 2 (cf. Listing 2) defines two semantically equivalent RoleClasses.

Listing 2. Rule 2: Datalog rule defining two semantically equivalent RoleClasses.

Rule 2 relies on simpler rules such as Rule 3 (cf. Listing 3), which defines the equivalence of two eClassIRDI attributes. Similarly, we have defined rules to decide if two values of eClassVersion and eClassClassification are the same.

Listing 3. Rule 3: Datalog rule defining the equivalence of two eClassIRDI attributes.
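The interplay of Rules 2 and 3 can be paraphrased in Python as follows. This is a sketch of the stated conditions (same version, classification, and IRDI), not the Datalog code of Listings 2 and 3. The flat fact layout `(predicate, role class, value)`, the function name `same_role_class`, and the concrete version and classification values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical EDB layout: (predicate, role class, value).
Fact = Tuple[str, str, str]

KEY_PREDICATES = ("eClassVersion", "eClassClassification", "eClassIRDI")


def same_role_class(facts: Iterable[Fact]) -> Set[Tuple[str, str]]:
    """Pairs of distinct RoleClasses that agree on version, classification, and IRDI."""
    values: Dict[str, Dict[str, str]] = defaultdict(dict)
    for predicate, role_class, value in facts:
        if predicate in KEY_PREDICATES:
            values[role_class][predicate] = value

    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for role_class, attributes in values.items():
        if len(attributes) == len(KEY_PREDICATES):  # all three references are present
            key = tuple(attributes[p] for p in KEY_PREDICATES)
            groups[key].add(role_class)

    return {(a, b) for group in groups.values() for a in group for b in group if a != b}


facts = [
    ("eClassVersion", "mec:rc1", "v-hypothetical"),
    ("eClassClassification", "mec:rc1", "c-hypothetical"),
    ("eClassIRDI", "mec:rc1", "0173-1#01-AKE162#012"),
    ("eClassVersion", "ele:rc1", "v-hypothetical"),
    ("eClassClassification", "ele:rc1", "c-hypothetical"),
    ("eClassIRDI", "ele:rc1", "0173-1#01-AKE162#012"),
    ("eClassIRDI", "ele:rc2", "0173-1#01-AKE162#012"),  # version and classification missing
]
equivalent_role_classes = same_role_class(facts)
```

Grouping by the full key triple means that a RoleClass with an incomplete eCl@ss description, such as `ele:rc2` above, is never reported as equivalent to another one.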

These three rules are only examples of the type of rules implemented in Alligator; the complete set of rules is given on GitHub (footnote 4).

6 Empirical Evaluation

We studied the effectiveness of Alligator in the solution of the problems of AML Conflict Identification and AML Conflict Resolution. In particular, we assessed the following research questions: (RQ1) Is Alligator able to identify pairs of conflicting AML elements in AML documents?; (RQ2) Does Alligator exhibit equal behavior whenever different types of semantic heterogeneity occur during the integration of AML documents? The experimental configuration to evaluate these research questions was as follows:

Testbeds. Testbeds were based on the semantic mapping types M2 (granularity), M3 (schematic differences), and M6 (grouping and aggregation), with ten testbeds for each type. First, a seed (AML document) was manually created for each testbed according to the type of semantic mapping. Next, we automatically generated two AML documents derived from this seed containing a random number of conflicting AML elements (footnote 5). The generation was performed following a uniform distribution. Testbeds corresponded to pairs of AML documents, and thirty testbeds were evaluated in the study (footnote 6).

Gold Standard. To compile a Gold Standard, we relied on the generated testbeds. Formally, the Gold Standard corresponds to an ideal conflict-free Alligator document \(\varGamma ^*=\langle \theta ^*, V, F^*\rangle \), for each pair of the AML documents in the testbeds. The creation of the conflict-free document as well as the computation of the conflicting elements and different elements was performed manually.

Metrics. We measured the behavior of Alligator in terms of the following metrics:

  (a) Precision is the fraction of the conflicts identified by Alligator (i.e., \(SC(\varGamma )\)) that are conflicts in an AML document (i.e., conflicts(\(\varGamma \mid \varGamma ^*,\sigma \))).

    $$\begin{aligned} Precision = \frac{|SC(\varGamma ) \cap conflicts (\varGamma \mid \varGamma ^*,\sigma )|}{|SC(\varGamma )|} \end{aligned}$$

  (b) Recall is the fraction of the conflicts in an AML document (i.e., conflicts(\(\varGamma \mid \varGamma ^*,\sigma \))) that are identified by Alligator (i.e., \(SC(\varGamma )\)).

    $$\begin{aligned} Recall = \frac{|SC(\varGamma ) \cap conflicts (\varGamma \mid \varGamma ^*,\sigma )|}{| conflicts (\varGamma \mid \varGamma ^*,\sigma )|} \end{aligned}$$

  (c) F-measure is the harmonic mean of Precision and Recall.
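The three metrics can be computed from the two sets of pairs as sketched below. The function name `evaluate` and the example pairs are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the paper.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def evaluate(sc: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of the identified conflicts SC against the gold conflicts."""
    true_positives = len(sc & gold)
    precision = true_positives / len(sc) if sc else 0.0      # assumed convention for empty SC
    recall = true_positives / len(gold) if gold else 0.0     # assumed convention for empty gold
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


# Hypothetical testbed: five gold conflicts, four of them identified, none spurious.
gold = {("m1", "e1"), ("m2", "e2"), ("m3", "e3"), ("m4", "e4"), ("m5", "e5")}
sc = gold - {("m5", "e5")}
precision, recall, f_measure = evaluate(sc, gold)
```

In this made-up case precision is 1.0 because every reported pair is a gold conflict, while recall drops to 0.8 because one gold conflict is missed.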

Implementation. Experiments were run on a Windows 8 machine with an Intel i7-4710HQ 2.5 GHz CPU and 8 GB 1333 MHz DDR3 RAM. We implemented the Deductive System Engine as a meta-interpreter in Prolog that follows the semi-naïve bottom-up evaluation of Datalog programs [7]; we utilized SWI-Prolog version 7.2.3 and the Prolog Development Tool (PDT, footnote 7). An AML extraction module was developed as a part of Krextor to transform AML documents into RDF graphs. This module comprises a set of mapping rules (footnote 8) that are executed in Krextor to create RDF graphs using the AML vocabulary. Further, the transformation of the RDF files into Datalog extensional predicates was implemented in Java 1.8. The Alligator framework, the testbed generator, and the testbeds evaluated in this experiment are publicly available on GitHub (footnote 9).

Fig. 6. Size of Integrated AML Documents. Per type of semantic heterogeneity: Granularity (M2), Schematic (M3), and Grouping (M6), the size of the integrated AML documents was reported in terms of the number of conflicts solved (light grey bars), and the different AML elements in the document (dark grey bars). In all the evaluated testbeds, the solved conflicts comprised at least 25 % of the total number of AML elements in the AML document, showing the heterogeneity of the evaluated testbeds.

Size of the Integrated AML Documents. The goal of this evaluation was to analyze the size of the integrated AML documents with respect to conflicting and different elements. For each type of semantic heterogeneity and testbed of that type, we computed the number of conflicts solved by Alligator. Further, the number of different AML elements was measured; a different AML element is an element that appears in one of the AML documents in the testbed and does not conflict with any other AML element. For example, the AML elements in line 15 of the two views in Figs. 2a and 2b are different elements. Consequently, both should be included in the integrated AML document. On the other hand, the AML elements in lines 2, 4, 6, 9, and 12 in both views are pairwise conflicting AML elements, and each pair should be integrated into only one AML element. Figure 6 reports the number of conflicting and different AML elements. We observed that a large number of AML elements in the integrated AML documents result from solving the AML Conflict Resolution problem; these elements account for at least 25 % of the total elements in the integrated documents. These results illustrate the complexity of the evaluated testbeds and show the benefit that Alligator provides during the integration of AML documents.

Effectiveness of Alligator. The goal of this experiment was to answer our research questions RQ1 and RQ2. Alligator was run on each of the 30 testbeds to create \(SC(\varGamma )\), and precision, recall, and F-measure were computed according to the Gold Standard (\(\textit{conflicts}(\varGamma \mid \varGamma ^*,\sigma )\)). Table 1 reports the values of these metrics for each type of semantic heterogeneity, i.e., M2, M3, and M6. We observed that for these semantic heterogeneity types, the value for precision is 1.0, i.e., every pair that Alligator reported in \(SC(\varGamma )\) is a conflict in \(\textit{conflicts}(\varGamma \mid \varGamma ^*,\sigma )\). Further, recall and F-measure are also 1.0 in the testbeds of semantic heterogeneity M2. These results suggest that Alligator rules capture the knowledge required to accurately solve the AML Conflict Identification problem. For the semantic heterogeneity types M3 and M6, Alligator rules do not cover all possible conflicts generated between nested structures composed of conflicting AML elements. Thus, Alligator missed at most two conflicts in five of the 20 testbeds of types M3 and M6. These results allowed us to positively answer research questions RQ1 and RQ2.

Table 1. Effectiveness of Alligator. Per semantic heterogeneity type, the effectiveness of Alligator is reported. In all the testbeds, precision is 1.0. Alligator exhibits the highest performance in the testbeds of type M2 (F-measure is always 1.0), while in M3 and M6, the F-measure values are at least 0.8

7 Related Work

In the literature, many different approaches are proposed for integrating CAEX documents. In [18], a tool to map two CAEX files is presented. It allows integrating the AutomationML documents, their respective descriptions, and the modified parts of one file into the other. Further, a mapping algorithm for CAEX files is presented. Nevertheless, the process of mapping is performed manually. Himmler [10] presents a framework to create standardized application interfaces in plant engineering based on AutomationML. The work provides a function-based standardization framework for the plant engineering domain. Persson et al. [14] utilize an RDF-based approach to integrate robotized production information modeled with AutomationML. Kovalenko et al. [12] explore how AutomationML can be represented by means of Model-Driven Engineering and the Semantic Web. A small part of an AutomationML ontology is developed, based on the main concepts of the language. Also, the use of rules for consistency checking is proposed, using the Semantic Web Rule Language (SWRL), but no explicit definition of the role of Semantic Web technologies in the integration problem is presented. The AutomationML Analyzer [16] is an online tool to browse, query, and analyze different AML data by means of Semantic Web technologies; a conceptual design to overcome integration problems in AML is described. All these approaches have the potential to solve specific integration problems for AML. However, they solve rather isolated problems, and a general method capable of automatically integrating AML information from different perspectives is not provided. In contrast, Alligator combines deductive databases and Semantic Web technologies to effectively integrate documents specified using Industry 4.0 Standards like AML.

8 Conclusions and Future Work

This paper presented Alligator, a deductive framework for the integration of AML documents. Alligator relies on Datalog and RDF to accurately represent the knowledge that characterizes different types of semantic heterogeneity in AML documents. The results of the empirical evaluation indicate that Alligator is able to effectively solve the problems of AML Conflict Identification and AML Conflict Resolution, and exhibits similar behavior for the three studied semantic heterogeneity types, i.e., granularity (M2), schematic (M3), and grouping (M6). In the future, we will extend the Alligator Deductive System Engine with the expressiveness of Datalog with negation and built-in predicates. Alligator will then be able to represent other types of semantic heterogeneity in AML, e.g., value processing (M1) and conditional mappings (M4). Further, we plan to extend Alligator to integrate documents of other Industry 4.0 Standards, such as the OPC UA machine-to-machine communication protocol.