1 Introduction

Augmented reality (AR) plays an important role in the ongoing convergence of the physical and the digital world [28]. At its core, augmented reality enhances the user’s perception by superimposing visual information such as images, videos, or three-dimensional (3D) visualizations onto real-world environments in real time [2, 39]. It uses computer vision techniques to align objects in the virtual and physical worlds and displays the virtual information using see-through displays or screens, e.g., on smartphones or head-mounted displays [32]. AR relies on markers or detectors of real-world objects to determine their location and orientation in three-dimensional space and to accurately map visual information onto them. For realizing complex AR workflows in practical work scenarios, additional concepts become necessary, such as the integration of external data sources in combination with triggers, conditions, and actions to process this data.

Recent technological advances have made augmented reality affordable through its availability on standard smartphones and tablets [38]. In addition, the open W3C WebXR Device API is being developed as a future web standard for accessing AR devices on the web across a wide variety of hardware form factors [18]. In terms of industrial applications, market research by Gartner [27] and PwC [7] indicates that AR is a highly promising technology that allows for broad usage in industrial scenarios such as maintenance tasks or training [15].

Creating augmented reality applications today requires advanced programming skills, e.g., for platforms and APIs such as Vuforia, ARKit, Google ARCore, or MRTK. To ease the creation of AR applications, several proposals have been made in model-driven engineering (MDE) and conceptual modeling. These include, for example, XML and JSON schemas for describing AR scenes in generic, platform-independent formats [21, 30] or with a focus on learning experiences [37]; domain-specific languages for creating AR model editors using Vuforia, ARKit, or MRTK [6, 29, 33]; and a BPMN extension for representing process information in AR using the Unity platform [15]. In addition, commercial low-code and no-code tools such as UniteAR or Adobe Aero aim to empower non-technical users to create AR applications. However, these tools are mostly designed for creating a single AR scene or very simple workflows.

What is missing so far is a visual modeling approach that can represent complex AR workflows for diverse application scenarios, that can be easily adapted to new requirements, and that is based on open standards. To facilitate the creation of AR applications that take advantage of the accessibility, portability, interoperability, and openness of the web, we propose a domain-specific modeling language (DSML) based on models conforming to the W3C WebXR Device API recommendation, thereby enabling the definition of different scenarios such as assembly processes, maintenance tasks, or learning experiences. The development of the language follows guidelines for DSML development proposed by Frank [13]. The DSML has been implemented on the ADOxx metamodeling platform and applied to a furniture assembly use case [11]. For a first evaluation, we conduct a feature comparison with similar languages in the area of augmented reality [34].

The remainder of the paper is organized as follows. Section 2 describes fundamental concepts in AR and the most important development platforms to establish a common understanding. In Sect. 3, we analyze previous related work in MDE and conceptual modeling in the context of AR. From these insights, we derive generic and specific requirements for a domain-specific visual modeling language for AR applications and present its specification and implementation in Sect. 4. This is followed by a use case in Sect. 5. In Sect. 6, we evaluate the language through a feature comparison. Finally, in Sect. 7, we conclude the paper and point to future work.

2 Foundations

As augmented reality relies on a range of specific techniques from computer vision to achieve the intended user experience, we briefly explain the most important concepts in the following to ensure a common understanding.

2.1 Augmented Reality

Augmented reality is a technology that allows computer-generated virtual images to be embedded in the real environment [39], thereby creating a three-dimensional alignment between virtual and real objects that allows for interaction in real-time [2].

Augmented reality relies on three core concepts from the field of computer vision [32]: (1) Detectables/Trackables, (2) Coordinate Mappings, and (3) Augmentations. First, for determining the location and orientation of the real-world environment, computer vision algorithms estimate position and orientation based on two-dimensional (2D) or 3D sensor information, e.g., from a camera stream or a LiDAR scanner [9, 31]. This detection can rely on detectables in the form of natural features or on markers such as QR codes, which act as surrogates that simplify detection and tracking [32]. Coordinate mappings are then needed to align objects in the real and the virtual world with each other. Thereby, a real-world origin reference position, e.g., stemming from global positioning system (GPS) coordinates, must be mapped to the global coordinate system of the virtual environment. Further, local coordinate systems are used for every real-world or virtual object. These permit the definition of reference points for placing virtual objects relative to other objects, independent of the current global coordinates. Finally, virtual information is superimposed on the real world through so-called augmentations. These can be animations, 2D images, videos, audio, text labels, 3D objects, hyperlinks, checklists, or forms. By defining anchors, augmentations can be fixed at a particular position in real space.
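To make the coordinate-mapping idea concrete, the following minimal sketch uses three.js (the rendering library employed later in Sect. 4.4); the pose values and object names are illustrative assumptions, not part of our language:

```js
import * as THREE from 'three';

// A group acting as the real-world origin of the virtual scene. Its pose
// would be set by the tracking layer, e.g., from a detected marker.
const scene = new THREE.Scene();
const worldOrigin = new THREE.Group();
scene.add(worldOrigin);

// Placeholder pose estimated for the origin (meters, camera space).
worldOrigin.position.set(0.4, 0, -1.2);
worldOrigin.quaternion.set(0, 0, 0, 1);

// The augmentation is expressed in the origin's local coordinate system,
// so it keeps its position relative to the real object as the user moves.
const label = new THREE.Mesh(
  new THREE.PlaneGeometry(0.2, 0.1),
  new THREE.MeshBasicMaterial({ color: 0xffcc00 })
);
label.position.set(0, 0.15, 0); // anchored 15 cm above the origin
worldOrigin.add(label);
```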

For more complex AR scenarios, further concepts are necessary. This includes in particular the integration and processing of additional data that is acquired throughout the life-cycle of an AR scenario via sensors or user interactions. To enable dynamic changes in the AR environment, at least basic workflow concepts such as triggers, conditions, and actions need to be provided [37]. Triggers include click, detection, sensor, or timer events; voice commands; entry into or exit from defined spatial areas; and gestures. Conditions specify the branching into different process flows, and actions refer to any change applied to the virtual objects, such as the appearance and disappearance of objects or transformations, i.e., rotation, scaling, and positioning.
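As a minimal illustration of these workflow concepts (independent of the modeling language defined later), a single step could be encoded as a trigger-condition-action triple; all identifiers below are hypothetical:

```js
// One workflow step as a trigger-condition-action triple (illustrative).
const step = {
  trigger: { type: 'detection', target: 'marker-7' }, // could also be a click, timer, gesture, ...
  condition: (state) => state.currentTask === 'attach-leg-1',
  actions: [
    { type: 'show', target: 'leg-overlay' },
    { type: 'transform', target: 'leg-overlay', rotation: [0, 90, 0] }
  ]
};

// A runtime would evaluate the step on every incoming event.
function onEvent(event, state, apply) {
  const { trigger, condition, actions } = step;
  if (event.type === trigger.type &&
      event.target === trigger.target &&
      condition(state)) {
    actions.forEach(apply);
  }
}
```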

2.2 Implementation Platforms

For creating AR applications, several development platforms and software development kits (SDKs) are available. Most of them require significant programming skills and are either commercial or closed-source. Examples include the Unity runtime and development environment, Apple’s ARKit, Wikitude, Vuforia, Kudan, Unreal Engine, and Adobe Aero. In addition, open-source platforms and SDKs are available, such as Google ARCore, ARToolKit+, OpenXR, or Holokit.

An alternative to the above platforms and SDKs is the WebXR Device API [18]. It specifies a web Application Programming Interface (API) that provides browser-based access to handheld or head-mounted augmented reality and virtual reality devices, including sensors. This allows AR content to be rendered by any compatible WebXR-enabled browser without the need to install additional software or use SDKs. As of today, WebXR is supported, for example, by Chromium-based browsers on the Android operating system, including handheld smartphones and tablets, as well as head-mounted displays, e.g., the Microsoft HoloLens 2. Further, WebXR is already included in the WebKit engine used by iOS Safari and will be supported by the Apple Vision Pro. WebXR does not by itself simplify the technical development of applications, but applications developed with it are more accessible.
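The following sketch shows how such a browser-based AR session is requested via the WebXR Device API; the feature list and the rendering details are simplified assumptions:

```js
// Minimal WebXR bootstrap: request an AR session and a per-frame pose.
async function startAR(canvas) {
  if (!navigator.xr || !(await navigator.xr.isSessionSupported('immersive-ar'))) {
    throw new Error('immersive-ar is not supported by this browser/device');
  }
  const session = await navigator.xr.requestSession('immersive-ar', {
    requiredFeatures: ['local'] // local reference space for world tracking
  });
  const gl = canvas.getContext('webgl', { xrCompatible: true });
  session.updateRenderState({ baseLayer: new XRWebGLLayer(session, gl) });

  const refSpace = await session.requestReferenceSpace('local');
  session.requestAnimationFrame(function onFrame(time, frame) {
    const pose = frame.getViewerPose(refSpace);
    if (pose) {
      // Render augmentations for each view in pose.views here.
    }
    session.requestAnimationFrame(onFrame);
  });
}
```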

3 Related Work

Several approaches have explored the application of conceptual modeling and model-driven engineering for augmented reality applications. In a comprehensive literature analysis, we previously identified 201 relevant papers at the intersection of conceptual modeling and virtual reality/augmented reality and derived the major research streams in these areas [26]. From the results of this study, we selected the most important contributions in the area of model-driven engineering and conceptual modeling for AR which are related to our approach. These will be briefly characterized in the following.

Ruminski and Walczak [30] describe a text-based declarative language for modeling dynamic, contextual augmented reality environments called CARL. They claim that CARL can simplify the creation of AR experiences by allowing developers to create reusable, modular components. Their development approach is based on textual modeling and does not include a visual representation.

Wild et al. [37] focus on data exchange formats for AR experiences in manufacturing workplaces. They propose two textual modeling languages for the definition of learning activities (activityML) and the definition of workplaces (workplaceML). Based on this work, a new IEEE standard for Augmented Reality Learning Experience Models has been developed [36], which includes a reference implementation. It enables the direct definition of learning workflows within an AR context. However, the textual models for these workflows are stored only at runtime, precluding their definition outside the tool.

A similar approach has been developed by Lechner [21]. He proposes the XML-based Augmented Reality Markup Language (ARML 2.0) for describing virtual objects, their appearance, and anchors in an AR scene in relation to the real world. ARML 2.0 has been included in a standard issued by the Open Geospatial Consortium in the form of an XML grammar.

Ruiz-Rube et al. [29] proposed a model-driven development approach for creating AR-based model editors, aiming at more efficient means of creating and editing conceptual models in AR; the generated applications thus target modeling itself. They demonstrate their approach with a tool called ARE4DSL. However, it only allows for the definition of AR-based modeling applications, not of other types of AR applications.

Seiger et al. [33] presented Holoflows, a modeling approach for creating Internet of Things (IoT) processes in augmented reality environments. The approach includes an interface allowing non-experts to design IoT processes without process or modeling knowledge. The approach is specific to the IoT domain and modeling is only possible within the provided AR application.

Grambow et al. [15] introduced an approach called BPMN-CARX, a solution that integrates context awareness, visual AR support, and BPMN-based modeling of Industrial Internet of Things (IIoT) processes. The approach allows business process management software to be extended with AR and IIoT capabilities. Further, it supports the modeling of context-aware and AR-enabled business processes. BPMN-CARX extends BPMN with new elements, including a graphical notation. The approach is specific to business process modeling and does not seem applicable to other scenarios.

Campos-Lopez et al. [6] and Brunschwig et al. [5] proposed an automated approach for constructing AR-based interfaces for information systems using model-driven and software language engineering principles, without the need for coding knowledge. In their approach, the interface is automatically generated from a high-level domain metamodel of the system and includes AR features such as augmentations, a mechanism for anchors based on real-world position, and the recognition of barcodes and quick response (QR) codes. Additionally, it is possible to define API calls to be performed upon certain user interactions, e.g., the creation of objects. The approach is mainly designed for modeling systems that use AR; however, there is no possibility to define states or executable workflows. They demonstrate the feasibility of their approach through a prototypical iOS app called AlteR that is based on Apple’s ARKit.

In summary, approaches exist for (1) generating specific AR applications based on models and schemata, (2) generating AR-based modeling tools based on MDE, and (3) AR modeling applications based on conceptual modeling languages. However, to the best of our knowledge, no visual modeling approach is available so far that represents executable AR workflows for diverse application scenarios and is based on open AR standards. In the next section, we therefore define the requirements for such a modeling language, describe its implementation, and present an exemplary use case.

4 Derivation of the Visual Modeling Language

Domain-specific languages in general provide constructs that are tailored to a specific field of application with the goal of gaining expressiveness and ease of use to increase productivity [22]. In the area of model-driven software development, typically languages with a visual notation are proposed, which we will denote in the following as domain-specific visual modeling languages, cf. [13, 19]. A related trend in today’s industrial software development is the rise of low-code and no-code approaches, which aim at empowering users to develop software with little or no programming expertise [3, 8]. We will thus derive a domain-specific visual modeling language for creating augmented reality applications.

4.1 Methodology

Several guidelines and methodologies have been proposed for the development of domain-specific languages, cf. [13, 17, 20, 35]. We mainly follow the macro process proposed by Frank [13], who describes seven phases including details for each phase (see Fig. 1). For the language specification and the creation of the modeling tool, we further considered the methodology by Visic et al. [35], which focuses on the interplay between a modeling language and algorithms and on the deployment of the modeling tool.

Fig. 1. Seven phases for domain-specific language development [13, p. 8]

In terms of scope and purpose, we aim for a language that permits users with no programming expertise to create augmented reality applications that include complex workflows and run in a web browser without further plugins or software components on a broad range of devices.

4.2 Requirements

Frank distinguishes between generic and specific requirements that need to be analyzed prior to the language specification [13]. As Gulden and Yu point out, these requirements have to be carefully balanced to account for trade-offs between different design alternatives [16], especially in terms of simplicity, comprehensibility, and convenience of use of the language [13].

Thus, we defined the following seven generic requirements (GR\(_{1-7}\)) for our language as proposed by Frank [13] and, in a similar fashion, by Karsai et al. [20] as well as Jannaber et al. [17]: GR\(_1\): The language should allow the specification of AR applications of various types without programming skills, making AR application development more intuitive and user-friendly than traditional approaches. GR\(_2\): The modeling language shall use concepts that a potential user is familiar with, i.e., concepts that are either common in everyday life or related to AR environments. GR\(_3\): The modeling language shall contain special constructs that are tailored to the domain of augmented reality. These terms need to be understood in the same way in all situations and by all users. GR\(_4\): The constructs of the language should allow modeling at a level of detail sufficient for all foreseeable AR applications. GR\(_5\): The language shall provide different levels of abstraction to avoid overloading and thus compromising the proper interpretation of a model. GR\(_6\): There shall be a clear association between the language constructs and the constructs of the relevant target representations in the AR application. GR\(_7\): Finally, Frank describes the requirement of choosing an appropriate metamodeling language that is consistent with the generic requirements described above; we will consider this later for the language specification.

Further, we added twelve specific requirements SR\(_{1-12}\) that originate from: (a) our analysis of the domain of augmented reality in the form of fundamental concepts and existing software platforms and approaches – see Sect. 2, (b) previously identified academic approaches in the area of model-driven engineering for AR [26], and (c) requirements concerning the implementation of the language in terms of satisfying the purpose of platform-independent execution using WebXR [18]. The specific requirements have been further grouped into three categories: Domain, Abstraction, and Implementation.

The category Domain refers to specific requirements that emerge from the domain of augmented reality applications. SR\(_1\): Superimposing virtual objects on the real world (Augmentation) is the main functionality of augmented reality applications [6, 15, 21, 29, 30, 33, 37]. The domain-specific modeling language must allow the user to represent virtual augmentations in various forms such as images, text labels, animations, or 3D objects. SR\(_2\): To create a realistic AR experience, the digital augmentations superimposed on the physical world must align with the real world [6, 29, 37]. A virtual augmentation placed on a real object should remain in its original position relative to the real object, even as the user moves around. Therefore, the modeling language must provide a concept for creating a local real-world origin that serves as a reference point at application runtime (World Origin Reference). SR\(_3\): It must be possible to specify the location of virtual augmentations in relation to other objects or the world origin in real or virtual space during model specification (Reference Point) [6, 21, 37]. SR\(_4\): It must be possible to specify real-world objects that can be tracked during application runtime (Detectable/Trackable) [6, 15, 21, 29, 30, 37]. Therefore, a concept is required to create such detectable objects during modeling. These detectables should not only specify the existence of a real-world object, but also provide data to recognize these objects at runtime, for example using images or 3D object data. SR\(_5\): Specifying the modification of different objects based on different actions is a critical functionality of AR applications [21, 29, 30, 33, 37]. Thus, the modeling language should permit defining transitions to subsequent actions and directly manipulating and transforming augmentations. SR\(_6\): For realizing complex AR workflows [15, 33, 37], triggers and conditions are required to enable dynamic branching in AR applications [6, 15, 29, 30, 33, 37].

The category Abstraction refers to a general aspect for creating an AR modeling language and contains only one specific requirement, which details the generic requirement of different abstraction levels (GR\(_5\)). SR\(_7\): To reduce complexity and to separate the different roles required during the specification of AR scenarios, the modeling language shall include concepts for abstraction, e.g., model decomposition, and separation of concerns to allow task sharing among stakeholders with different responsibilities [21, 29, 30, 37]. For example, a designer could work on visualizing augmentations, while a domain expert could specify the application workflow.

The final category, Implementation, considers the requirements that must be supported in terms of language specification and implementation. SR\(_8\): Due to the nature of modeling languages, an abstract and a concrete syntax in textual notation need to be provided [13, 20], also for easing future interoperability with previous approaches [6, 15, 21, 29, 30, 33, 37]. In addition, as visual notations are more intuitive and user-friendly than text-based notations, a two-dimensional graphical notation needs to be specified [15]. Finally, since the AR domain relies largely on 3D content, specifying models directly in a 3D environment is useful to facilitate spatial imagination [6, 33, 37]. Thus, a domain-specific modeling language should consider concepts for text-based, 2D visual, and 3D spatial modeling. SR\(_9\): To allow for an easy and rapid adaptation of the language as requirements change, the modeling language shall be based on metamodeling [6, 13, 29]. SR\(_{10}\): It should be possible to directly feed the model into an AR application for the execution of the modeled AR scenario [15, 29, 30, 37]. Thus, a domain-specific modeling language for AR applications shall provide a data format that can be processed by an AR engine at runtime [10] or generate code for creating the AR application itself from the models [15]. SR\(_{11}\): AR applications are often built using commercial SDKs such as Apple ARKit, Wikitude, or Vuforia, most of which depend on the closed-source Unity development platform. To make the modeling language widely applicable on a large range of devices and to enable non-commercial long-term research, both the modeling language (specification) and the code generated from it (execution) shall be based on open standards, such as the WebXR Device API [18]. SR\(_{12}\): To ensure reproducibility and accessibility, the implementation of the domain-specific modeling language shall be made openly available [29, 33, 37].

4.3 Language Specification

According to Frank, the phase of language specification contains several parts [13]. The first step is to create a glossary containing all the concepts that are considered relevant to the domain of discourse. These terms were derived from the requirements shown above, e.g., augmentation, detectable, or condition. Next, for each concept in the glossary, it has to be decided whether it shall be part of the modeling language and how it will be expressed with the language during instantiation. Further, it needs to be decided which metamodeling language or meta\(^2\) model shall be used. Subsequent to the language specification, Frank foresees a separate phase for the design of the graphical notation. In the following, we first present an overview of the language concepts and the abstract syntax in the form of a metamodel; thereafter, we show the graphical notation and details on the semantics of the constructs.

Fig. 2. Metamodel of the DSML for augmented reality applications with the three modeltypes ObjectSpace, Statechange, and FlowScene, as well as a legend.

For the definition of the modeling language, we used the metamodeling language of ADOxx [11]. ADOxx was chosen due to its wide usage within projects of the OMiLAB network [14] and the availability of an open platform for the implementation of model editors. The main metamodeling concepts in ADOxx are [11, 12]: ModelType, Class, Relationclass, and Attribute. Modeltypes contain one or more classes, which may be connected by relationclasses. Modeltypes, classes, and relationclasses may have attributes. Instances of classes and relationclasses can only be contained in one particular instance of a modeltype. Special reference attributes act as pointers to other class instances or model instances. In the metamodel introduced in the following, each concept is marked with an icon indicating the corresponding meta\(^2\)-concept.

Figure 2 shows the metamodel of the new domain-specific modeling language. The modeling language is divided into three separate ModelTypes: ObjectSpace, Statechange, and FlowScene. This results from requirements GR\(_2\), GR\(_5\), and SR\(_7\). An ObjectSpace defines the real world of an AR environment. It contains the two classes Augmentation and Detectable as defined by requirements SR\(_1\) and SR\(_4\). Further, augmentations can include other augmentations, indicated by the child relationclass, and they may be connected to Detectables via anchored relations (SR\(_3\)). A Detectable has an attribute is_origin, specifying whether it references the world origin (SR\(_2\)).

Statechanges are described in the separate ModelType Statechange (SR\(_5\), SR\(_7\)). Within such models, Augmentations from the ObjectSpace model are referenced (Reference), and changes to their attributes, e.g., a rotation transformation, are expressed via the attribute statechange_list.

The FlowScene ModelType defines the workflow of the AR application and how it reacts to different environmental conditions (GR\(_4\), SR\(_6\)). Every FlowScene contains exactly one Start and one End instance (SR\(_6\)). Each FlowScene contains an ObjectSpace instance, which references an instance of the ObjectSpace ModelType. Inside this ObjectSpace class instance, the FlowScene model defines an Origin, one or multiple Statechanges, Conditions, and Resolves (SR\(_2\), SR\(_6\)). They are linked to the ObjectSpace with the is_inside relationclass, specifying that these concepts belong to one specific ObjectSpace. The Origin is used to define the world origin of the AR environment; thus, it references a Detectable in the ObjectSpace model. Conditions define requirements that are necessary to trigger the subsequent Statechanges, or to trigger Resolves if there are no consecutive Statechanges (SR\(_6\)). Thus, Statechanges and Resolves are connected to Conditions by the triggers relationclass. Conditions, in turn, follow an Origin or Statechange via the has_condition relationclass. Furthermore, Conditions can be associated with an Observer using the has_observer relationclass. Observers can be used to monitor sensor data or APIs (SR\(_6\)).
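To make the abstract syntax more tangible, the following sketch expresses instances of the three ModelTypes as plain JavaScript object structures; the property names are our own illustrative mapping, not the ADOxx-internal representation:

```js
// Illustrative instances of the three ModelTypes and their relations.
const objectSpace = {
  detectables: [{ id: 'marker-1', is_origin: true, data: 'marker-1.png' }],
  augmentations: [
    { id: 'plate', source: 'plate.gltf',
      anchored: 'marker-1',          // anchored relation to a Detectable
      children: ['plate-label'] },   // child relation to another Augmentation
    { id: 'plate-label', source: 'label.png' }
  ]
};

const initPlate = {                  // a Statechange model
  references: ['plate'],             // Reference into the ObjectSpace model
  statechange_list: [{ target: 'plate', rotation: [0, 90, 0], visible: true }]
};

const flowScene = {
  objectSpace: objectSpace,          // referenced ObjectSpace instance
  origin: { detectable: 'marker-1' },
  statechanges: { 'init-plate': initPlate },
  conditions: [{
    follows: 'origin',               // has_condition relation
    observer: null,                  // optional has_observer relation
    triggers: ['init-plate']         // triggers relation
  }],
  start: 'start', end: 'end'
};
```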

Table 1. Semantics and notation of the modeling language. For each ModelType, the semantic definition of the contained constructs is explained and the visual notation is shown.

For each of the classes and relationclasses, we added a graphical notation and details about the meaning of each construct in the form of a semantic definition, as shown in Table 1. Thereby, we considered principles of graphical notation design by Moody as far as possible [23]. In particular, we aimed for Semiotic Clarity, Perceptual Discriminability, Semantic Transparency, Complexity Management, Cognitive Integration, Visual Expressiveness, Dual Coding, Graphic Economy, and Cognitive Fit. The further development of the graphical notation, including more advanced methods such as those recently described by Bork and Roelens [4], is planned for the future.

4.4 Implementation and Execution

Subsequently, the modeling language has been implemented using the freely available and open ADOxx metamodeling platform and will be made available via Zenodo [25]. The platform allows the easy definition and adaptation of metamodels based on the ADOxx meta\(^2\) model and the creation of model instances in automatically generated model editors (SR\(_9\)). ADOxx provides several text-based formats for defining metamodels and models, as well as a DSL for the graphical notation (SR\(_8\)). In this way, models can be exported manually or programmatically in XML format for processing in other applications.

The ADOxx XML interface has been chosen as the basis for enabling the execution of the modeling language (SR\(_{10}\)). For this purpose, a software component has been designed in the form of an AR engine that interprets the models. The engine is implemented as a platform-independent web application using the 3D JavaScript library three.js and the VR/AR immersive web standard WebXR [18]. The application can be accessed through a WebXR-compatible web browser on any mobile device, such as smartphones or head-mounted displays, in line with requirement SR\(_{11}\). To start an AR experience, the engine processes the models selected by the user and monitors the user’s environment for potentially relevant changes. Based on these environmental changes and user interactions, the application adapts the environment according to the workflows specified through triggers, conditions, and actions (SR\(_6\)).
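As an illustration of the model intake, the sketch below fetches an exported model file and indexes its instances in the browser; the element and attribute names are assumptions chosen for illustration and do not reproduce the actual ADOxx export schema:

```js
// Parse an exported model file and index its instances (illustrative).
async function loadModel(url) {
  const xmlText = await (await fetch(url)).text();
  const doc = new DOMParser().parseFromString(xmlText, 'application/xml');
  const instances = {};
  for (const el of doc.querySelectorAll('INSTANCE')) {
    instances[el.getAttribute('name')] = {
      class: el.getAttribute('class'),
      attributes: Object.fromEntries(
        [...el.querySelectorAll('ATTRIBUTE')].map(
          (a) => [a.getAttribute('name'), a.textContent.trim()]
        )
      )
    };
  }
  return instances; // handed to the interpreter of the FlowScene workflow
}
```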

5 Use Case

To demonstrate the use of the modeling language and showcase a practical application, we have developed a use case involving augmented reality-assisted assembly of a bedside table. The goal of this use case is to guide a user through the assembly of a bedside table using an augmented reality application instead of traditional 2D instructions on paper. Figure 3 shows a screenshot of the implementation in ADOxx. It includes an excerpt of a FlowScene model (1), the referenced ObjectSpace model (2), and two Statechange models (3, 4).

Fig. 3. Screenshot of the ADOxx implementation showing model excerpts for supporting an assembly process in augmented reality: 1) FlowScene model of the assembly process. 2) ObjectSpace model of the necessary augmentations and detectables using markers. 3) and 4) Two exemplary Statechange models.

In the upper part of Fig. 3, the excerpt of the FlowScene model shows how the process for assembling the piece of furniture is defined step by step. This includes steps such as turning the pieces into the correct position and attaching them one by one. It is important to note that no static flows are defined here, but rather trigger-condition-action sequences. The FlowScene model references one ObjectSpace model (2) and several Statechange models (3 & 4).

In the lower left part of Fig. 3, the ObjectSpace model is shown (2). It includes ten Detectables that contain images of markers which are well-suited for computer vision detection algorithms. These act as surrogates for more advanced 3D object recognition algorithms that would permit the direct detection of physical objects. Further, the model includes Augmentation instances for each part of the furniture piece, e.g., “TopPlate 1”. These Augmentations are provided as GLTF files, a common format for 3D objects and their textures. The Augmentations are connected by is_child relations to facilitate positioning, and Detectables can be assigned to them as reference points via anchored relations. The Augmentations and Detectables defined in the ObjectSpace model are then referenced in the FlowScene model.
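For illustration, loading such a GLTF augmentation and attaching it to its parent could look as follows in three.js; the file name follows the use case, while the code itself is a simplified sketch rather than the engine implementation:

```js
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';

// Group of a parent augmentation; children are positioned relative to it.
const parentAugmentation = new THREE.Group();

// Load the "TopPlate 1" augmentation and attach it as a child.
new GLTFLoader().load('TopPlate1.gltf', (gltf) => {
  // Offsets are expressed in the parent's local coordinate system,
  // mirroring the is_child relation used to facilitate positioning.
  gltf.scene.position.set(0, 0.02, 0);
  parentAugmentation.add(gltf.scene);
});
```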

Furthermore, the FlowScene model (1) includes Statechange instances, e.g., “Init MiddlePlate”, which reference Statechange models. In the lower right of Fig. 3, two exemplary Statechange models, “Init MiddlePlate” (3) and “Leg 1 Positioned” (4), are shown. They reference one or more Augmentations from the ObjectSpace model and define the state of the position, rotation, and visibility parameters during the execution of the FlowScene model. These parameters are also displayed as a table. A detailed description of the semantics and notation of each language concept is available in Table 1.

Fig. 4. Illustration of the assembly process of a bedside table – cf. IKEA [1] (a–c), and the support through AR based on the visual models (d–f).

The execution of the models of the use case is shown in Fig. 4 using parts from an IKEA table [1]. Subfigures (a)–(c) illustrate the traditional 2D assembly instructions for (a) “attaching Leg 1”, (b) “turning MiddlePlate 90\(^\circ \) counterclockwise”, and (c) “attaching Leg 2”. Subfigures (d)–(f) illustrate the same steps of the instructions in augmented reality using the aforementioned models [25] and the WebXR AR engine. The screenshots were taken while using the WebXR AR engine in the Chrome browser on a Samsung Galaxy Tab S7 tablet. Subfigure (d) shows the Statechange “Leg 1 Positioned”. It superimposes an image of Leg 1 on top of the real MiddlePlate, whose existence, position, and orientation are detected via a marker (Detectable 10). Subfigure (e) shows the Statechange “Rotate MiddlePlate”, where the virtual object is rotated according to the desired position for further assembly of the table. Subfigure (f) shows the Statechange “Leg 2 Positioned”; the augmentation shows where the next leg shall be attached. As can be seen in subfigures (d), (e), and (f), several colored markers are placed on the real object at strategic points according to the ObjectSpace model. Once a marker is detected, the current state of the workflow defined by the FlowScene model determines whether the detection triggers an action. If an action is triggered, the workflow moves on and waits until the next detectable (marker) in line is detected. The flexible structure of the DSML allows multiple workflow paths to be active at the same time by checking for multiple detectables simultaneously; detectables are also tracked when they are not part of the FlowScene. To avoid making the use case unnecessarily complex, the concepts of Resolves and Observers were not used.
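A condensed sketch of this marker-driven execution logic, with all identifiers invented for illustration, could look as follows:

```js
// Several workflow paths can be active at once, each waiting for its
// next detectable (illustrative structures, not the engine internals).
const activePaths = [
  { next: { detectable: 'marker-3', statechange: 'Leg 1 Positioned' } },
  { next: { detectable: 'marker-5', statechange: 'Leg 2 Positioned' } }
];

function onMarkerDetected(markerId, applyStatechange, advance) {
  for (const path of activePaths) {
    // A detection only triggers an action if it matches the current
    // state of this path; other markers are tracked but ignored here.
    if (path.next && path.next.detectable === markerId) {
      applyStatechange(path.next.statechange);
      path.next = advance(path); // wait for the next detectable in line
    }
  }
}
```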

Table 2. Feature comparison of the new domain-specific visual modeling language ARWFML based on twelve specific requirements SR\(_{1-12}\). (Y): Requirement met. (N): Requirement not met. (-): Not specified.

6 Evaluation

Several techniques can be chosen to evaluate the new modeling language, including feature comparisons, theoretical and conceptual investigations, and empirical evaluations [34]. We opted for a feature comparison with previous approaches along the specific requirements that we had formulated. The approaches considered were those of Ruminski and Walczak [30], Grambow et al. [15], Seiger et al. [33], Lechner [21], Campos-Lopez et al. [6], Ruiz-Rube et al. [29], and Wild et al. [37].

For each specific requirement, we conducted a detailed comparison using multiple dimensions, as shown in Table 2. This provides a detailed overview of the features supported by previous approaches and our new modeling language in terms of augmented reality concepts, levels of abstraction, user interaction, metamodeling capabilities, model execution, support for open standards, and the availability of corresponding implementations. Thereby, we can show that our new modeling language, denoted ARWFML (AR Workflow Modeling Language), currently supports 26 out of 33 requirement dimensions, whereas the next best approach supports only 21.

With regard to Augmentations (SR\(_1\)), features such as animations, links, checklists, and forms are not yet supported by our language. However, this is more a technical than a conceptual issue and will be addressed in future versions; the same holds true for area triggers (SR\(_6\)). Concerning User Interaction (SR\(_8\)), the current implementation of our language only supports text-based and 2D visual modeling due to limitations of the ADOxx platform, which is also not yet available as open source. 3D spatial modeling, such as in a 3D-capable modeling tool or directly in AR, is not yet supported. Enabling 3D spatial modeling would require adapting current metamodeling platforms, e.g., to directly support open 3D standards such as WebXR [18] (SR\(_{11}\)). This would certainly ease the specification of models, as 3D modeling greatly supports spatial imagination.

7 Conclusion and Outlook

In this paper, we presented a domain-specific visual modeling language that is capable of representing complex augmented reality workflows for diverse application scenarios and that can be executed using the open WebXR standard. The modeling language allows designers to specify three different types of visual models: (1) the AR environment, (2) the AR workflow, and (3) the different statechanges within this workflow. Thus, the language emphasizes a high level of abstraction and separation of concerns. This abstraction compensates for potentially missing knowledge about the technical implementation of AR environments and allows the user to focus on the content and functionality of AR applications. The technical feasibility was demonstrated by implementing the modeling language on the ADOxx platform together with a prototypical web application for executing the models. A first evaluation has been conducted through a feature comparison with previous approaches and indicated a high coverage of the defined requirements.

In future research, we plan a further evaluation of the DSML and the AR application by means of a user study, which will allow us to identify bottlenecks or blind spots of the DSML. Furthermore, the 2D modeling approach presented here has some limitations that stem from modeling 3D environments in 2D modeling tools. For example, specifying the position of the table legs in the use case described above requires a good understanding of three-dimensional space; it is almost impossible to define position and rotation vectors in 3D space without visualizing them in 3D. Therefore, a new metamodeling platform is currently being developed that incorporates the third dimension during visual modeling, enabling 3D modeling in three-dimensional space [24]. Once the approach has gained further maturity, it will be possible to evaluate it empirically.