1 Introduction

Extended reality (XR) is increasingly used by companies and institutions, which introduce XR systems to improve their core operations, such as employee training, merchandising, product design, marketing, preservation of cultural heritage, and education. XR environments are built of multiple 3D objects. While an XR environment is in use, its 3D objects change their properties (states, appearance, and geometry) in response to user interactions, mutual interactions with other objects, and autonomous actions. Such interactions and actions constitute the XR behavior and express the knowledge of the domain for which the environment has been developed. However, users can currently acquire this knowledge mostly by immersing themselves in the environment and directly interacting with its 3D content.

Actions and interactions within XR environments could also be subject to exploration based on semantic queries and automated reasoning. Such exploration has the potential to enrich various application domains, especially collaborative network- and web-based XR, by providing additional information about users' behavior, e.g., past, current, and possible future actions.

Furthermore, such exploration can serve as a method for analyzing and improving the user experience. Exploring content behavior increases the understanding of the relationships between the objects in the scene, the user, and their interactions. Exploration can provide an efficient tool for measuring the user experience, for example, determining which interactions in the scene were the most difficult and time-consuming and how interaction differed between objects. Such information can be used to evaluate the system and improve the user experience. Additionally, semantic queries can be used to predict activity in the virtual scene, so that the system can be adjusted to the user to improve the quality of use.

To enable this kind of exploration, XR environments need an appropriate representation of interactive 3D content behavior. Such a representation can be created using knowledge representation technologies such as the Semantic Web standards (the Resource Description Framework (RDF) [19], RDF Schema (RDFS) [20], the Web Ontology Language (OWL) [18], and the SPARQL query language [17]) and ontologies, which are gaining increasing attention in 3D modelling and multimedia description [14, 15].

The main contribution of this paper is a knowledge-based model for 3D content behavior representation. The primary goal of the model is to enable the expression of behavior semantics in any domain, regardless of the specific 3D graphics and animation technologies used by the XR environment. The proposed model consists of two sub-models responsible for representing different types of behavior using different knowledge representation technologies: an ontology-based component and a rule-based component. The model uses the Semantic Web standards RDF, RDFS, and OWL.

The proposed model has been used to represent the behavior of the virtual environment in an industrial XR training system developed for Amica S.A. (Poland's leading household appliance manufacturer) to train employees in operating specialized industrial devices.

The remainder of the paper is structured as follows: Sect. 2 provides an overview of the current state of modelling 3D content behavior. Then, Sect. 3 explains the proposed semantic model of 3D content behavior. Section 4 provides an example and overview of the generated knowledge base for the XR training system. Finally, Sect. 5 concludes the paper and indicates possible future research.

2 Related Work

The area of semantic representation of 3D content behavior has gained little attention in the scientific literature and is not addressed in available XR environments. So far, ontologies have been used mostly for the representation of the structure, geometry, appearance, and animation of 3D content, as summarized in [6].

A few approaches have been developed to model the behavior of 3D content using ontologies and semantic web standards. One such approach, proposed in [9, 10, 12], provides temporal operators and allows for the expression of both primitive and complex behavior. Based on this approach, a graphical tool has been implemented to model complex behavior using diagrams, with the ability to encode the specified behavior in X3D scenes [11]. In another approach, presented in [5], primitive actions such as move, turn, and rotate are combined to represent complex behavior in a manner that is easily understandable to end users, without requiring knowledge of 3D graphics and animation.

In [8], a tool that utilizes semantic concepts, services, and hybrid automata to describe the behavior of 3D content elements is presented. The client is based on a 3D content presentation tool, such as an XML3D browser, while the server is composed of various services that facilitate the selection and configuration of 3D content. Additionally, a separate module is responsible for managing intelligent avatars, including their perception of the scene.

In [1], XSD-based semantic metadata schemes were proposed for describing the interactivity of 3D objects using events, conditions, and actions. Ontologies presented in [3, 4] provide a means of representing multi-user virtual environments and avatars. These ontologies define the geometry, space, animation, and behavior of 3D content, with concepts that are semantic equivalents to those used in commonly used 3D content formats such as VRML and X3D. Environmental objects, the main entities of 3D content, are described using translation, rotation, and scale. Avatars, on the other hand, are described using names, statuses, and user interfaces (UIs), and their behavior is described using code bases.

In the academic literature, researchers have proposed several approaches for spatiotemporal reasoning on semantic descriptions of evolving human embryos and 3D molecular models. For instance, in [13], the authors proposed an approach that leverages RDF, OWL, and SPARQL, along with an ontology that describes stages, periods, and processes. In a similar vein, [16] proposed an approach that combines different input and output modalities to enable presentation and interaction tailored to specific types of content and tasks. While there has been more research on the semantic representation of animations than interactions, the existing approaches in both areas remain preliminary and face several challenges, particularly in terms of content exploration.

Another recent work focuses on humans and their interactions in virtual reality, creating an ontology for experimentation on human-building interaction in virtual reality (VHBIEO) [2]. The proposed solution aims at a standardized approach to virtual human-building interaction experimentation. To achieve this, an ontology has been developed that builds on existing ontologies and semantic models, such as EXPO, STED, DNAs, ifcOWL, SSN, SUR, and UO. The DOGMA methodology has been employed to establish the internal structure of the ontology. In a similar vein, another ontology has been introduced in [7]. This ontology focuses on the human-centred analysis and design of virtual reality conferencing, with goals that include enhancing user experience, facilitating research on VR conferencing (especially psychological and behavioral), and enabling the sharing of research findings. Both ontologies are, however, still in the early stages of development.

3 Semantic Behavior Model

The presented knowledge-based model is a formal representation of 3D objects' behavior, including their interactions (with users and other objects) and autonomous actions. Knowledge exploration is made possible through the use of complex domain terminology tailored to a specific application domain. The model can be used with any domain ontology.

The model allows for the representation of the behavior of XR objects and environments. The behavior of an XR environment encompasses the actions of its users and objects. The model consists of two main components based on distinct knowledge representation technologies, which differ in expressiveness.

  1. The ontology-based behavior representation component provides a foundation for representing the behaviors of explorable XR environments. This component encompasses classes and properties defined in ontologies based on Semantic Web standards that implement description logic. The component consists of three main elements:

     (a) Domain ontologies, which are formal specifications of the conceptualization of individual application domains for which explorable XR environments are created.

     (b) A fluent ontology, which defines domain-independent classes and properties of temporal entities and the relationships between them. The ontology also specifies entities for the visual representation of the behaviors of XR components and environments. Additionally, it enables visualization associated with knowledge exploration.

     (c) Visual semantic logs, which represent knowledge about the behavior of users and objects demonstrated during sessions of using explorable XR environments.

  2. The rule-based component for behavior representation provides complex relationships between temporal entities defined in the fluent ontology. In particular, this component enables transitions between events and states that are essential for describing the behavior of XR components and environments. Entities and relationships are specified in a fluent rule set, which is defined using logic programming and allows for describing relationships between the properties of individual objects.

The model utilizes knowledge representation technologies to provide axioms that allow for the representation of the behaviors and features of XR components and environments. An XR component is a reusable module, i.e., a collection of classes with methods, that can be combined with other XR components in an XR environment.

3.1 Ontology-Based Component for Behavior Representation

This component is the part of the behavior model used for representing the features as well as the past and current behaviors of XR environments in the form of behavior logs, which encompass the actions of users and objects described using temporal statements. The component is based on the Semantic Web standards: RDF, RDFS, and OWL.

States and Events. The fundamental units representing the behavior of explorable XR environments are states. A state is a representation of a fragment of the XR environment, specifically a set of concepts describing users and objects. States are described using terms, while their occurrence can be evaluated using predicates. In the presented method, we assume that an Instantaneous State is a state that occurs at a specific point in time, while an Interval State is a state that occurs over a time interval.

States are initiated and terminated by events. Like states, events are denoted using terms, while their occurrence can be evaluated using predicates.

  1. The begins predicate is used to indicate that an event initiates a state. This predicate is true for a given event and state if and only if the event starts the state. It is denoted as begins(event, state), for example, begins(startOfRun, run).

  2. The finishes predicate is used to indicate that an event terminates a state. This predicate is true for a given event and state if and only if the event ends the state. It is denoted as finishes(event, state), for example, finishes(endOfRun, run).
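
For illustration, the following minimal sketch, written in Python with the rdflib library, encodes the two example statements as RDF triples. The namespace and the exact property IRIs (ex:begins, ex:finishes) are assumptions made for this example, not the original ontology:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/xr#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# The run state and the two events that delimit it.
g.add((EX.run, RDF.type, EX.IntervalState))
g.add((EX.startOfRun, RDF.type, EX.Event))
g.add((EX.endOfRun, RDF.type, EX.Event))

# begins(startOfRun, run) and finishes(endOfRun, run) as object properties.
g.add((EX.startOfRun, EX.begins, EX.run))
g.add((EX.endOfRun, EX.finishes, EX.run))

print(g.serialize(format="turtle"))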

Fluent Ontology. The fluent ontology defines fundamental temporal entities (classes and properties) that allow for the representation of users' and objects' behaviors. The fluent ontology extends domain ontologies with temporal terminology, which can be used in conjunction with domain-specific classes and properties. The fluent ontology is an invariant part of the approach, common to all explorable XR environments, regardless of individual application domains.

  1. The predicate start is used to indicate that a temporal entity begins at a specific point in time. This predicate is true for a given temporal entity and time point if and only if the temporal entity starts at that time point. It is denoted as start(temporalEntity, tp).

  2. The predicate end is used to indicate that a temporal entity terminates at a specific point in time. This predicate is true for a given temporal entity and time point if and only if the temporal entity ends at that time point. It is denoted as end(temporalEntity, tp).

The fluent ontology defines a variety of distinct entities, such as:

  1. State: An OWL class encompassing all states.

  2. InstantState: A subclass of State representing all instantaneous states.

  3. IntervalState: A subclass of State representing all interval states, which is disjoint from InstantState.

  4. Event: An OWL class that includes all events.

  5. TemporalEntity: An RDFS class that covers all temporal entities.

  6. TimePoint: A subclass of both RDFS datatype and TemporalEntity, representing all time points. The use of RDFS datatype allows for specifying time domains according to the requirements of individual applications.

  7. TimeInterval: A subclass of TemporalEntity (among others) that comprises all time intervals. TimeInterval is described by two properties, start and end, which indicate the time points at which the interval begins and ends, respectively.

  8. TimeSlice: An OWL class representing all time slices, each of which possesses two obligatory properties determined by qualified cardinality restrictions in OWL.

  9. VisualDescriptor: An OWL class encompassing all visual descriptors.
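
To make these entities concrete, the following minimal sketch (Python with the rdflib library) declares a few of them. The namespace IRI is hypothetical, and the declarations are simplified with respect to the actual fluent ontology:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

FLUENT = Namespace("http://example.org/fluent#")  # hypothetical namespace
g = Graph()
g.bind("fluent", FLUENT)

# Core classes: states, events, and temporal entities.
g.add((FLUENT.State, RDF.type, OWL.Class))
g.add((FLUENT.InstantState, RDFS.subClassOf, FLUENT.State))
g.add((FLUENT.IntervalState, RDFS.subClassOf, FLUENT.State))
g.add((FLUENT.InstantState, OWL.disjointWith, FLUENT.IntervalState))
g.add((FLUENT.Event, RDF.type, OWL.Class))
g.add((FLUENT.TemporalEntity, RDF.type, RDFS.Class))
g.add((FLUENT.TimePoint, RDFS.subClassOf, FLUENT.TemporalEntity))
g.add((FLUENT.TimeInterval, RDFS.subClassOf, FLUENT.TemporalEntity))

# start/end link a time interval to the time points delimiting it.
g.add((FLUENT.start, RDFS.domain, FLUENT.TimeInterval))
g.add((FLUENT.start, RDFS.range, FLUENT.TimePoint))
g.add((FLUENT.end, RDFS.domain, FLUENT.TimeInterval))
g.add((FLUENT.end, RDFS.range, FLUENT.TimePoint))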

The domain ontologies and the fluent ontology are used to create visual semantic behavior logs consisting of temporal statements. The concept of temporal statements is illustrated in Fig. 1. In the figure, every pair of nodes connected by a predicate represents a single RDF statement.

Fig. 1. Time statements (yellow) of the visual semantic behavior log. Entities that are related but not part of the time statements are highlighted in blue. (Color figure online)

Temporal Reasoning. We also define predicates for time points and intervals based on event calculus predicates and the Time Ontology. Such predicates allow for temporal reasoning on time points and intervals.

  • The predicate in is true for a given time point and time interval if and only if the time point is greater than or equal to the time point that starts the time interval and less than or equal to the time point that ends the time interval.

As examples of relationships between time intervals, we define three predicates based on the Time Ontology proposed by Allen and Ferguson [18]: before, after, and starts.

  • The predicate before is true for a given time interval ti1 and time interval ti2 if and only if ti1 ends before ti2 begins. before(ti1, ti2) is equivalent to after(ti2, ti1).

  • The predicate starts is true for a given time interval ti1 and time interval ti2 if and only if ti1 and ti2 begin at the same time and ti1 ends before ti2 concludes.
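
Under these definitions, the predicates reduce to simple comparisons of time points. The following minimal Python sketch illustrates them, assuming numeric time points and intervals represented as (start, end) pairs; the function in_interval stands for the in predicate, since in is a reserved word in Python:

# Time intervals as (start, end) pairs of numeric time points (an assumption;
# the model allows arbitrary time domains via RDFS datatypes).

def in_interval(tp, ti):
    # in(tp, ti): tp lies within ti, boundaries included.
    start, end = ti
    return start <= tp <= end

def before(ti1, ti2):
    # before(ti1, ti2): ti1 ends before ti2 begins; after(ti2, ti1) is its inverse.
    return ti1[1] < ti2[0]

def starts(ti1, ti2):
    # starts(ti1, ti2): both intervals begin together and ti1 ends first.
    return ti1[0] == ti2[0] and ti1[1] < ti2[1]

assert in_interval(5, (3, 8)) and before((1, 2), (3, 4)) and starts((3, 5), (3, 8))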

3.2 Rule-Based Component for Behavior Representation

The rule-based component extends the formal representation of states and events provided by the ontology-based component and defines the relationships between them.

The compound term begin signifies an event that initiates a state. It is denoted as begin(state), for example, begin(run) = startOfRun. Additionally, begins(begin(state), state) holds.

The compound term finish signifies an event that concludes a state. It is denoted as finish(state), for example, finish(run) = endOfRun. Moreover, finishes(finish(state), state) holds.

Every state is initiated by an event and concluded by an event. Therefore, from the occurrence of states, we can infer the occurrence of the events associated with them. In particular:

  1. An event initiating an interval state occurs at the time point that starts the time interval of that state.

  2. An event concluding an interval state occurs at the time point that ends the time interval of that state.

  3. The event that initiates an instantaneous state is equal to the event that concludes the state, and it occurs at the time point when the state appears.

The eventStartEnv atom represents the event that initiates the XR environment. Therefore, it can be stated that no event can occur earlier than eventStartEnv.

The eventStopEnv atom represents the event that concludes the XR environment. Consequently, it can be said that no event can occur later than eventStopEnv.

Events signify the beginning and end of states, thereby determining their duration. The duration of a state is the difference between the time point of the event that concludes the state and the time point of the event that initiates the state. It is defined in the domain of time points and denoted by a compound term.

The duration can be used to calculate the length of time a particular state lasts in a selected time domain, since each state is initiated and concluded by events that determine its duration.

In the case of an instantaneous state, the events that initiate and conclude the state are equal, which means the duration of an instantaneous state is zero. On the other hand, for an interval state, the events that initiate and conclude the state are different. As a result, the duration of an interval state is greater than zero.
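
As an illustration, the following Python sketch computes durations from hypothetical begin/finish events and time points; all names and values are invented for the example:

# time maps each event to its time point; begin/finish name the events that
# delimit a state (all names are hypothetical).
time = {"startOfRun": 10.0, "endOfRun": 42.5, "flash": 17.0}
begin = {"run": "startOfRun", "flash": "flash"}   # instantaneous: same event
finish = {"run": "endOfRun", "flash": "flash"}

def duration(state):
    # duration(state) = time(finish(state)) - time(begin(state))
    return time[finish[state]] - time[begin[state]]

print(duration("run"))    # 32.5  (interval state: duration > 0)
print(duration("flash"))  # 0.0   (instantaneous state: duration = 0)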

Behavior. In practical applications, there is a need for direct relationships between states and the temporal entities in which these states occur. The concepts of event calculus can be used to achieve this goal.

The immediate evaluation predicate (IEP) is true for a given state or event and a timepoint if and only if the state or event occurs at that timepoint. In particular, IEP predicates include holdsAt and time. holdsAt(state, tp) assesses whether a state occurs at a timepoint tp. This predicate can be applied to both instantaneous and interval states, as each state can be evaluated at a timepoint.

To evaluate whether a state is initiated or terminated at a given timepoint, the IEP can be used. This evaluation can be performed using the time predicate along with the begin and finish terms, which represent events. The predicates time(begin(state), tp) and time(finish(state), tp) assess whether a state is initiated and terminated, respectively, at a timepoint tp.

Similarly, we define a predicate that evaluates the occurrence of states in time intervals (ETOI). It is true for a given interval state and a time interval if and only if the interval state occurs within the time interval. The ETOI predicate is holds(state, ti).

The temporal evaluation predicate (TEP) is either IEP or ETOI and can be used to evaluate the occurrence of states in timepoints or time intervals.
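
The following minimal Python sketch illustrates the evaluation predicates under the same assumption of numeric time points; holds_at stands for holdsAt (an IEP) and holds for the ETOI predicate, over invented example data:

# holds_in maps each interval state to its time interval (hypothetical data).
holds_in = {"run": (10.0, 42.5)}

def holds(state, ti):
    # ETOI: holds(state, ti) - the interval state occurs within the interval ti.
    return holds_in.get(state) == ti

def holds_at(state, tp):
    # IEP: holdsAt(state, tp) - the state occurs at the time point tp.
    start, end = holds_in[state]
    return start <= tp <= end

assert holds("run", (10.0, 42.5)) and holds_at("run", 20.0)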

State Transitions. States and events can trigger other states and events. For instance, the XR environment is active (state) from the moment it begins (event). In a factory, a stamping press initiates operation (state) when the start button is pressed (event) and proceeds to shape the product (state) as a result of contact with the metal material (event). Moreover, states can depend on other states, such as a house being fully illuminated if each room has been illuminated, regardless of the order in which lights were turned on in individual rooms. This allows for the creation of arbitrary cause-and-effect chains described by transitions that are consistent with the event-condition-action model. There are two types of transitions: event-based transitions and state-based transitions. An event-based transition is a Horn clause that consists of:

  1. Body, which is a conjunction of a statement based on the IEP predicate that evaluates the occurrence of an event, and any number of statements based on predicates that are not fluent,

  2. Head, which is a statement based on the TEP predicate.

The assert predicate is true for a given statement if and only if the statement exists in the rule set or it is possible to add the statement to the rule set. It is denoted as assert(statement). In examples, the predicate is often used to associate time points with time intervals in transitions. We assume that the assert predicate is satisfied unless stated otherwise.
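
As an illustration, the following Python sketch mimics a single event-based transition as a forward-chaining step rather than as an actual Horn clause; the event and state names (pressButton, working) and the side condition are invented for the example:

# Known facts: event occurrences at time points (IEP: time(event, tp)).
event_times = {"pressButton": 12.0}
# Derived facts: states holding from a time point (head of the transition).
state_starts = {}

def apply_transition():
    # Event-based transition: if time(pressButton, tp) holds and the press is
    # loaded (a non-fluent side condition), assert time(begin(working), tp).
    tp = event_times.get("pressButton")
    press_is_loaded = True  # placeholder side condition
    if tp is not None and press_is_loaded:
        state_starts["working"] = tp  # assert(...) adds the statement

apply_transition()
print(state_starts)  # {'working': 12.0}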

4 Example

4.1 General Information

The introduced semantic behavior model was used to represent the virtual environment of the industrial worker training XR system, which allows trainees to learn how to act safely in an industrial setting.

The system has been developed in the Unity game engine and uses an Oculus Quest 2 VR headset. Users can interact with objects using their own hands, which was achieved by using the Oculus Integration plug-in together with supplementary plug-ins to ensure a high level of immersion.

The training scenario implemented in the system focuses on safe work with the industrial press. It was developed using resources from Amica S.A., a major producer of household equipment in Poland. The prepared virtual scene in which the training takes place is shown in Fig. 2.

The scenario consists of several steps:

  1. The worker pulls the metal sheet out of the container.

  2. The worker visually inspects the physical condition of the metal sheet to be processed, looking for flaws.

  3. The worker places the metal sheet in the press.

  4. The worker lubricates/sprays the metal sheet parts with oil.

  5. The worker starts the press using the button.

  6. The worker takes the finished product from the press.

Fig. 2. An industrial press in a factory hall.

4.2 Knowledge Base Example

In the scene, there are three main objects used in the scenario: the metal sheet, the industrial press, and the button for turning on the industrial press. The system generates a knowledge base that describes these objects. An example of such a knowledge base is presented in Listing 1.1. The objects are described in lines 1–15. Every object has a name, an id, and possible states; for example, Object2 (lines 6–9), the industrial press, has id 2 and three possible states: empty press, press with inserted metal sheet, and working press. The first two states are instant states, while the last one is an interval state. Additionally, Object3 (the press button) is part of the industrial press, which is described in line 13 by the object property isObjectOf. Lines 17–47 describe the remaining possible states, also by their names and ids.

[Listing 1.1: the generated knowledge base describing the scene objects]
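
As the original listing is not reproduced here, the following minimal sketch, written in Python with the rdflib library, illustrates how such object descriptions could look. The namespace, the property names (ex:hasPossibleState, ex:isObjectOf), and all identifiers are assumptions based on the description above, not the actual generated knowledge base:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Object2: the industrial press, with its id and possible states.
g.add((EX.Object2, RDF.type, EX.SceneObject))
g.add((EX.Object2, EX.name, Literal("industrial press")))
g.add((EX.Object2, EX.id, Literal(2, datatype=XSD.integer)))
for state in (EX.pressState6, EX.pressState7, EX.pressState10):
    g.add((EX.Object2, EX.hasPossibleState, state))

# Object3: the press button, part of the industrial press (isObjectOf).
g.add((EX.Object3, RDF.type, EX.SceneObject))
g.add((EX.Object3, EX.name, Literal("press button")))
g.add((EX.Object3, EX.isObjectOf, EX.Object2))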

Every action that happens in the scene is represented in the knowledge base as an event. Each described event has its own name and indicates which states it begins and finishes. Moreover, each event has an assigned time slice, which indicates when the event took place.

Listing 1.2 presents the knowledge base fragment generated by the system when the user inserts a metal sheet into the industrial press (Fig. 3). The system generates an event (lines 1–7) that finishes two states, pressState6 (empty press) and SheetState3 (visually checked), and begins two states, pressState7 (press with inserted metal sheet) and SheetState4 (metal sheet inside press). The event has a TimeSlice object assigned (lines 9–12). Since the event happened over a period of time, a time interval (TimeInterval13) was additionally assigned; it indicates when the event started and ended (lines 14–17).

[Listing 1.2: the knowledge base fragment generated when the metal sheet is inserted into the press]
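
A corresponding hypothetical sketch of the insertion event follows, again in Python with rdflib. The state names match the description above, while the event and time slice identifiers and the numeric time values are invented:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# The insertion event begins/finishes the states named in the text.
g.add((EX.Event4, RDF.type, EX.Event))
g.add((EX.Event4, EX.name, Literal("inserting metal sheet into press")))
g.add((EX.Event4, EX.finishes, EX.pressState6))
g.add((EX.Event4, EX.finishes, EX.SheetState3))
g.add((EX.Event4, EX.begins, EX.pressState7))
g.add((EX.Event4, EX.begins, EX.SheetState4))

# The time slice carries a time interval, as the event lasted over a period
# of time; the numeric values are placeholders.
g.add((EX.Event4, EX.hasTimeSlice, EX.TimeSlice12))
g.add((EX.TimeSlice12, EX.hasTimeInterval, EX.TimeInterval13))
g.add((EX.TimeInterval13, EX.start, Literal("31.2", datatype=XSD.decimal)))
g.add((EX.TimeInterval13, EX.end, Literal("34.8", datatype=XSD.decimal)))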

Fig. 3. User inserts the metal sheet into the industrial press.

Another fragment of the generated knowledge base is presented in Listing 1.3. When the user pushes the button (Fig. 4) to activate the industrial press, an event is generated (lines 1–7). This event finishes two states, pressState7 (press with inserted metal sheet) and ButtonState8 (button unpressed), and begins two states, pressState10 (working press) and ButtonState9 (button pressed). The event has an assigned time slice (lines 9–12), whose data property time point indicates the exact time at which the event happened (line 12). After that, another event (stopping the operation of the press) is generated, which describes when and what happens in the scene when the press stops working (lines 14–24).

[Listing 1.3: the knowledge base fragment generated when the press button is pushed]
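
A corresponding hypothetical sketch of the button-press event, whose time slice carries a single time point rather than an interval; identifiers and values are again invented:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

g.add((EX.Event5, RDF.type, EX.Event))
g.add((EX.Event5, EX.name, Literal("pressing the turn on the button of the press")))
g.add((EX.Event5, EX.finishes, EX.pressState7))
g.add((EX.Event5, EX.finishes, EX.ButtonState8))
g.add((EX.Event5, EX.begins, EX.pressState10))
g.add((EX.Event5, EX.begins, EX.ButtonState9))

# The time slice carries a single time point (placeholder value).
g.add((EX.Event5, EX.hasTimeSlice, EX.TimeSlice14))
g.add((EX.TimeSlice14, EX.timePoint, Literal("36.1", datatype=XSD.decimal)))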

Fig. 4. User pushes the button to activate the press.

4.3 Exploration Queries

After generating the knowledge base, it is possible to explore it using semantic queries. This gives the possibility to gain a broad range of information about the progress of the training, for example, which events happened during the training process, when they happened, and how the states of objects in the scene changed.

The exploration is possible due to the use of queries encoded in SPARQL, which is the main query language for RDF-based ontologies and knowledge bases. Examples of such queries and their results are presented below.

Query 1. Which events happened before the press button was pushed?

[Listing: SPARQL encoding of query 1]

Fig. 5. Results of query 1.

The first query provides information about what happened in the scene before the trainee pushed the press button. The query searches for events, together with their assigned time slices, that happened before the event named “pressing the turn on the button of the press”. The result consists of event names ordered by time, as shown in Fig. 5.
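
As the original query listing is not reproduced, the following sketch shows how such a query could be phrased and executed with rdflib over a toy graph. The vocabulary (ex:name, ex:hasTimeSlice, ex:timePoint) follows the hypothetical sketches above, and for simplicity only events described by time points are compared; interval-based events would be compared via their interval ends:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
# Two toy events with time points (placeholder data).
for ev, name, tp in [("Event4", "inserting metal sheet into press", "31.2"),
                     ("Event5", "pressing the turn on the button of the press", "36.1")]:
    g.add((EX[ev], EX.name, Literal(name)))
    g.add((EX[ev], EX.hasTimeSlice, EX[ev + "Slice"]))
    g.add((EX[ev + "Slice"], EX.timePoint, Literal(tp, datatype=XSD.decimal)))

q1 = """
PREFIX ex: <http://example.org/training#>
SELECT ?name ?tp WHERE {
  ?button ex:name "pressing the turn on the button of the press" ;
          ex:hasTimeSlice/ex:timePoint ?buttonTp .
  ?event ex:name ?name ;
         ex:hasTimeSlice/ex:timePoint ?tp .
  FILTER (?tp < ?buttonTp)
}
ORDER BY ?tp
"""
for row in g.query(q1):
    print(row.name, row.tp)  # events before the button press, ordered by time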

Query 2. How long was the metal sheet inspected?

[Listing: SPARQL encoding of query 2]

Fig. 6. Results of query 2.

The second query gives information about the duration of the visual inspection of the metal sheet. The query searches for the time slice of the event named “visually controlling the metal sheet” and then, using the time interval object, accesses the information about when the event started and ended. The result of the query is shown in Fig. 6.
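
A possible SPARQL formulation, again over a toy graph with the hypothetical vocabulary from the sketches above; the duration is computed directly in the SELECT clause:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
# Toy data for the inspection event (placeholder identifiers and values).
g.add((EX.Event3, EX.name, Literal("visually controlling the metal sheet")))
g.add((EX.Event3, EX.hasTimeSlice, EX.TimeSlice8))
g.add((EX.TimeSlice8, EX.hasTimeInterval, EX.TimeInterval9))
g.add((EX.TimeInterval9, EX.start, Literal("20.0", datatype=XSD.decimal)))
g.add((EX.TimeInterval9, EX.end, Literal("27.5", datatype=XSD.decimal)))

q2 = """
PREFIX ex: <http://example.org/training#>
SELECT ((?end - ?start) AS ?duration) WHERE {
  ?event ex:name "visually controlling the metal sheet" ;
         ex:hasTimeSlice/ex:hasTimeInterval ?interval .
  ?interval ex:start ?start ;
            ex:end ?end .
}
"""
for row in g.query(q2):
    print(row.duration)  # 7.5 with the placeholder data above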

Query 3. When, and into what states, did the sheet metal transition?

[Listing: SPARQL encoding of query 3]

Fig. 7. Results of query 3.

The last query provides information about what happened sequentially to the metal sheet. The query searches for events that changed the states of the scene object named “metal sheet”. Then, using the corresponding time slices, the query determines the times by which the states are ordered. The results consist of the names and IDs of the states and the times when they began. The query result is shown in Fig. 7.
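
A possible SPARQL formulation of the last query, once more over a toy graph with hypothetical identifiers and placeholder values:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/training#")  # hypothetical namespace
g = Graph()
# Toy data: the metal sheet and one state transition (placeholder values).
g.add((EX.Object1, EX.name, Literal("metal sheet")))
g.add((EX.Object1, EX.hasPossibleState, EX.SheetState4))
g.add((EX.SheetState4, EX.name, Literal("metal sheet inside press")))
g.add((EX.SheetState4, EX.id, Literal(14, datatype=XSD.integer)))
g.add((EX.Event4, EX.begins, EX.SheetState4))
g.add((EX.Event4, EX.hasTimeSlice, EX.TimeSlice12))
g.add((EX.TimeSlice12, EX.timePoint, Literal("34.8", datatype=XSD.decimal)))

q3 = """
PREFIX ex: <http://example.org/training#>
SELECT ?stateName ?stateId ?tp WHERE {
  ?sheet ex:name "metal sheet" ;
         ex:hasPossibleState ?state .
  ?event ex:begins ?state ;
         ex:hasTimeSlice/ex:timePoint ?tp .
  ?state ex:name ?stateName ;
         ex:id ?stateId .
}
ORDER BY ?tp
"""
for row in g.query(q3):
    print(row.stateName, row.stateId, row.tp)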

Such queries can provide information about the user experience with the training system, for example, how many mistakes the user made, how long the interaction with different objects took, and how complex and difficult the activities were for the user. This information can then be used to evaluate the system and the training scenario in order to increase the functionality and user-friendliness of the system and boost the efficiency of the virtual training.

5 Conclusions and Future Work

The use of knowledge-based representation for interactive 3D content, especially in the context of collaborative network- and web-based VR/AR systems, has the potential to benefit various application domains by providing additional information about users' and objects' behavior. The paper presents a new approach to semantically representing the behavior of XR environments, including users' and objects' interactions and autonomous actions. The proposed model allows for the representation of temporal characteristics of XR environments with domain-specific terminology, which can be further explored through queries and reasoning. This approach could lead to a more effective and widespread distribution of domain knowledge using XR. Furthermore, the information acquired through semantic queries can improve the user experience by measuring the performance of users inside the virtual scene, which supports evaluating the system and the training scenario in order to enhance the usability of the training system.

Additionally, an example and overview of the knowledge base generated for the VR training system have been presented, covering a real training scenario based on the requirements of the training process of a household appliance manufacturer. Semantic queries were performed on the generated knowledge base, and their results were presented and discussed.

Possible future research directions can focus on the development of semantic repositories with domain knowledge collected using the proposed model as well as the development of machine learning methods for analyzing and classifying the collected data. It could be especially useful to discover patterns in users’ behavior in training sessions, e.g., common problems and mistakes, to improve the overall training performance. In addition, we plan to develop graphical tools supporting the execution of queries against training ontologies and knowledge bases, which would facilitate the use of our approach by non-IT-specialists. Moreover, formal responses to semantic queries can be enriched by the accompanying visualization recorded during the training session.