1 Introduction

Conceptual modeling is a powerful tool for reducing the complexity in a domain under study, enabling stakeholders with different interests to better understand and communicate the underlying concepts and mechanisms of the respective domain. Consequently, the development of dedicated modeling languages and their implementation in the form of a complete modeling method have gained increasing attention in research and industry. The available modeling languages vary in their degree of specificity, thus forming the basis for the separation between general-purpose and domain-specific languages. However, more extensive classifications based on specificity and purpose have been proposed [4].

Domain-specific modeling methods offer advanced functionalities that enhance the value of models beyond visual representations by leveraging underlying domain semantics. Recent research efforts have explored how such functionalities can be leveraged by incorporating certain aspects of the Semantic Web vision. Among these efforts are the serialization of diagrammatic models according to Semantic Web standards [3], their linking in the manner advocated by the linked data vision [15], and their deployment as a hybrid knowledge base [14].

In this paper, we aim to match modeling methods with the intent of expected users by building upon existing research in the field. With this goal in mind, we elaborate on how modeling methods can be matched with suitable Semantic Web knowledge graphs based on the respective method purpose. Through this matching, models can be enriched with domain-specific information from suitable graphs. The presented ideas are finally applied in an experimental smart city implementation scenario of a domain-specific modeling method.

The remainder of the contribution is structured as follows: To begin with, the broader theoretical background on conceptual modeling and knowledge graphs is presented before more detailed elaborations on related work follow. Afterwards, the Semantic Matching Model is presented and showcased in the context of a specific smart city implementation scenario. Finally, we conclude with a summary and evaluate the idea of this study based on a SWOT analysis.

2 Theoretical Background and Related Work

The following chapter introduces the theoretical background relevant to this study before related works on existing semantic approaches and their current use within conceptual modeling are discussed.

2.1 Conceptual Modeling and Domain-Specific Modeling Languages

Conceptual modeling is a long-standing topic in scientific literature and was defined early on as the process of formally describing aspects of the physical and social world with the goal of improving understanding and communication [25]. The conceptual modeling approaches that have emerged since then have been widely applied in the field of computer science to simplify complex information systems, for instance by representing their underlying concepts and structures as diagrammatic models. In order to achieve such a reduction in complexity and foster comprehension among the involved stakeholders, abstraction is commonly employed [21]. Accordingly, the essence of conceptual modeling comprises the “use of abstraction to reduce complexity for a specific purpose” [30, p. 244].

Compared to so-called general-purpose modeling languages, Domain-Specific Modeling Languages (DSMLs) allow for customizations that aim at achieving a desired level of simplification [2]. As a result, such specified languages have several benefits over their general-purpose counterparts [9], leading to DSMLs being widely adopted across various disciplines [17, 18]. To create a DSML, the corresponding notation, syntax, and semantics need to be specified. This underlying structure is captured in the form of a metamodel, which itself is created using a modeling language, referred to as the metamodeling language [16].

The customization of modeling languages for a specific domain extends the value of models created with them beyond comprehensible representations [2, 9]. Consequently, DSMLs are commonly implemented as complete methods, including advanced functionalities and specifications on how to apply the respective language. The resulting components are captured in the Generic Modeling Method Framework (GMMF), in which the modeling language is supported through a corresponding modeling procedure and through mechanisms and algorithms that enable generic, specific, or hybrid functionalities [16].

Beyond the functionality-based value, the process of creating model value through co-creation needs to be considered in the context of modeling methods. The co-creation of model value is based on the interaction of two key roles: the modeler designing model artifacts, and the method engineer creating the modeling method [2, 30]. This study focuses on how functionalities implemented by the method engineer can create value for the modeler. Future works are intended to derive additional value through the co-creation process by utilizing the created models to derive further functionalities during the iterative process of the Agile Modeling Method Engineering (AMME) life cycle [13].

2.2 Knowledge Graphs, RDF and Linked Open Data

In today’s research, knowledge graphs are commonly understood as graph-based data structures that capture knowledge by explicitly representing concepts and the relationships between them. At the same time, the term remains only loosely defined [8]. It gained much of its popularity after Google announced the launch of its knowledge graph in 2012, and the corresponding blog entry is regularly cited in the literature as a seemingly valid definition of the term.

Graph-based data representation gained traction in the early days of the Semantic Web vision, prompting the development of the Resource Description Framework (RDF) by the W3C as a standard to represent web data in a machine-processable way [32]. Further, graph-based data structures ensure a seamless extension and linking to other data sources. Some authors argue that knowledge graphs not only consist of the represented data but are also characterized by an integrated reasoning engine used to make inferences [8].

Finally, the prevalence of Linked Open Data (LOD) is emphasized. LOD can be described as the last of five data openness levels that were proposed by Tim Berners-Lee [1]. According to these openness levels, RDF data representations already fulfill the requirements of machine-readability, a non-proprietary format, and the use of an open standard. If such data representations are enriched by contextual links to other resources, they fulfill all requirements to be considered LOD.

2.3 Related Work

The related work section briefly delimits the semantic approaches used in the literature before comparable applications in the field of conceptual modeling are discussed.

Delimitation of Semantic Approaches. Various semantic approaches related to this study are often used interchangeably in the literature. Namely, Semantic Matching, Semantic Alignment, and Semantic Enrichment are discussed in the following due to their relevance for this research.

Semantic Matching is concerned with comparing concepts from different graph-like data sources to align them based on shared semantic properties [11]. This process of identifying related concepts typically involves ontology matching, which employs various semantic similarity measures to determine concepts from different database schemas, such as ontologies, that can be matched [12].

Semantic Alignment, on the other hand, originates from linguistics and is an important concept when comparing how the grammatical structures of languages differ. Within computer science, it is prominently employed for text or image generation [19], natural language processing [26], and other areas that rely on the alignment of words or phrases with corresponding concepts.

Further, Semantic Enrichment builds upon the extensible graph-based data structure of RDF and has the goal of adding contextual semantics to existing data by integrating links to other LOD sources. This approach is increasingly discussed in the literature, encompassing the domains of cultural heritage [7], bibliographic records [31], and information retrieval [28]. Among the resulting benefits are advanced data integration and reasoning mechanisms.

In the context of this research, we deemed the term semantic matching to be most suitable for describing the identification of relevant knowledge graphs that can be matched with concepts of a DSML based on shared semantic meaning, while semantic alignment forms one specific approach to how such matching mechanisms can be performed. Finally, semantic enrichment is understood as any effort to integrate links to or information from LOD sources.

Semantic Approaches Within Conceptual Modeling. For several years, researchers have investigated how the benefits of conceptual modeling and the Semantic Web vision can be combined. As a result, mechanisms that enable an automated RDF-based serialization of conceptual diagrammatic models have been established. One such example is the RDFizer, which is designed as a method-independent functionality for the ADOxx metamodeling platform [3]. The representation as RDF triples enables advanced querying and inferencing procedures, as was shown in a corresponding proof-of-concept study [5] and through the deployment of translated models as a knowledge base on GraphDB [14].

In addition, the transformation of conceptual models into knowledge graphs based on a generic cloud platform has been discussed in the literature [29]. The implementation of the platform was tested on a set of 5,000 UML models, thus demonstrating the capability of processing large volumes of models.

Nevertheless, the use of semantic approaches within conceptual modeling remains in its infancy. First efforts have contributed to making the semantics contained in conceptual diagrammatic models accessible through linked data enrichment by building upon model transformations that conform to Semantic Web standards [3, 6].

Further, the notion of Linked Open Models has been brought forward, which requires models to be matched and aligned with resources outside of the respective modeling environment [15]. For this purpose, an RDF serialization of models is utilized to ensure their conformance with the linked data paradigm.

3 Semantic Matching Between Domain-Specific Modeling Methods and Linked Open Data

This contribution aims at providing an approach for matching semantically relevant knowledge graphs from the LOD cloud with domain-specific modeling methods to enable advanced functionalities and increase the value of created models. For this purpose, we propose a Semantic Matching Model, which is later instantiated to a concrete implementation scenario in the smart city context.

3.1 Semantic Matching Model

In the Semantic Matching Model displayed in Fig. 1, we differentiate between the generic version and a LOD-based instance of it. The Generic Semantic Matching Model emphasizes the fundamental requirement of matching the purpose of a modeling method with the anticipated user intent in a semantically relevant manner. The respective method purpose and user intent, as well as the resulting process of semantic matching, are specific to each instance.

The LOD-based instance of the Semantic Matching Model is closely related to the underlying principles of model value co-creation (cf. section 2.1), as both put an emphasis on the interaction between modeler and method engineer. This interaction is guided by the semantic matching between the method’s purpose and the anticipated user intent with the goal of increasing model value. At this point, we have to emphasize that our understanding of semantic matching differs from the common approach of ontology matching used in this context. While ontology matching is concerned with comparing and aligning entire schemas or concepts, our semantic matching process is focused on identifying knowledge graphs that can be matched with concepts of a DSML based on shared semantic meaning.

In Fig. 1, this matching of purpose and anticipated intent is performed by the method engineer who defines a domain structure during the creation of a Domain-Specific Modeling Method (DSMM). The domain structure refers to a collection of knowledge graphs from the LOD cloud that have been selected based on their semantic relevance to the domain under study. Subsequently, this selection is used to establish permanent access between the DSMM and the chosen graphs, ensuring seamless integration of relevant data. In order to ultimately increase the value of created models, the established access has to be utilized in such a way that advanced functionalities supporting the anticipated user intent can be offered within the modeling environment. The actual intent is determined by the modeler’s final decision of which subset of the domain structure to retrieve during the use of the respective DSMM.
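The roles described above can be pictured as a small data structure. The following is only a minimal sketch under our own assumptions: the class and attribute names are hypothetical and not part of the Semantic Matching Model itself, and the two example graphs merely stand in for whatever the method engineer selects. The method engineer fills the domain structure once, and the modeler later picks the subset that reflects the actual intent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KnowledgeGraph:
    """A LOD cloud instance reachable through a SPARQL endpoint."""
    name: str
    sparql_endpoint: str


@dataclass
class DomainStructure:
    """Knowledge graphs the method engineer selected for a DSMM (hypothetical type)."""
    method_purpose: str
    anticipated_intent: str
    graphs: list[KnowledgeGraph] = field(default_factory=list)

    def select(self, chosen_names: set[str]) -> list[KnowledgeGraph]:
        """Return the subset the modeler decides to retrieve from (the actual intent)."""
        unknown = chosen_names - {graph.name for graph in self.graphs}
        if unknown:
            raise ValueError(f"not part of the domain structure: {sorted(unknown)}")
        return [graph for graph in self.graphs if graph.name in chosen_names]


# Defined once by the method engineer while creating the DSMM.
structure = DomainStructure(
    method_purpose="design city tours using POIs",
    anticipated_intent="enrich POIs with relevant information",
    graphs=[
        KnowledgeGraph("DBpedia", "https://dbpedia.org/sparql"),
        KnowledgeGraph("Wikidata", "https://query.wikidata.org/sparql"),
    ],
)

# Decided by the modeler while using the DSMM.
actual_intent = structure.select({"Wikidata"})
```

Rejecting names outside the domain structure mirrors the idea that the modeler can only choose among the graphs the method engineer made accessible.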

3.2 A Smart City Implementation Scenario

Finally, the described Semantic Matching Model is applied in the context of an implementation scenario, which is based on the tour guide case presented as part of our previous work on a citizen development approach for smart cities [24]. The scenario is used to showcase how smart city tour models can be matched with a selection of knowledge graphs to retrieve semantically relevant information.

Fig. 1. Semantic Matching Model (LOD cloud diagram taken from [22]).

Tour Guide Case in the Vienna Smart City Setting. The tour guide scenario presented in [24] is located in the smart city of Vienna and builds upon the idea of citizen development. According to this idea, a person with little to no skills in software engineering or coding can still be enabled to develop their own applications by utilizing dedicated low-code or no-code platforms [27]. This principle was applied in the smart city context to provide citizens with an easy way to develop their own services through conceptual modeling.

In the specific scenario, a citizen of Vienna makes use of a dedicated platform that provides microservices relevant to the design of a city tour. Besides selecting and integrating suitable services (e.g., routing, payment, etc.), the citizen also has to design the actual city tour using a conceptual modeling environment. The work at hand aims to support this specific aspect of the city tour design process that is enabled through a dedicated modeling method.

Matching Smart City Tour Models with LOD. The matching of knowledge graphs with smart city tour models is assessed using the setting described previously. The corresponding city tour modeling method allows users to create simple city tours centered around points of interest (POIs) they wish to include.

During the creation of the DSMM, the method engineer defines the domain structure containing several knowledge graphs that are determined by the matching between modeling method purpose and anticipated user intent (cf. section 3.1). The resulting domain structure is represented as a structure of touristic information in Fig. 2. Within the context of the implementation scenario, the modeling method purpose is to enable the design of city tours using POIs, while the anticipated user intent is to enrich these POIs with relevant information.

In order to identify suitable knowledge graphs that can be matched with the POI concept of the modeling method, the instances registered on the LOD cloud [22] were limited to those containing the keywords travel, tourism, Wikipedia, or Points_of_interest. Of the 20 results, those with a working SPARQL API were selected, resulting in 8 instances, of which only two are in English. These two, DBpedia and Wikidata, constitute the result of the semantic matching process between the modeling method purpose and the anticipated user intent, and are thus selected as the most suitable knowledge graph domain structure.
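The selection procedure can be sketched as a keyword filter followed by an endpoint check. The snippet is an illustration under stated assumptions, not the procedure used in the study: the record layout (`name`, `keywords`, `sparql`) is hypothetical, the catalog entries are invented, and the endpoint check is passed in as a function so that the sketch runs without network access. In practice, such a check could send a trivial query to each endpoint.

```python
from typing import Callable, Iterable

# Keywords taken from the text; compared case-insensitively.
PURPOSE_KEYWORDS = {"travel", "tourism", "wikipedia", "points_of_interest"}


def matches_purpose(dataset: dict) -> bool:
    """True if the dataset carries at least one of the purpose keywords."""
    tags = {keyword.lower() for keyword in dataset.get("keywords", [])}
    return not tags.isdisjoint(PURPOSE_KEYWORDS)


def select_domain_structure(
    datasets: Iterable[dict],
    endpoint_works: Callable[[str], bool],
) -> list[str]:
    """Keep datasets that match the purpose and expose a working SPARQL endpoint."""
    selected = []
    for dataset in datasets:
        endpoint = dataset.get("sparql")
        if matches_purpose(dataset) and endpoint and endpoint_works(endpoint):
            selected.append(dataset["name"])
    return selected


# Invented records for illustration only; they do not describe real LOD cloud entries.
catalog = [
    {"name": "graph-a", "keywords": ["Tourism", "geo"], "sparql": "https://a.example.org/sparql"},
    {"name": "graph-b", "keywords": ["travel"], "sparql": None},
    {"name": "graph-c", "keywords": ["genomics"], "sparql": "https://c.example.org/sparql"},
    {"name": "graph-d", "keywords": ["wikipedia"], "sparql": "https://d.example.org/sparql"},
]
reachable = {"https://a.example.org/sparql", "https://c.example.org/sparql"}

selected = select_domain_structure(catalog, lambda url: url in reachable)
```

The additional restriction to English-language graphs mentioned in the text is left out here; it could be added as a further predicate on a language field if the catalog provides one.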

After the creation of the DSMM, the modeler creates a city tour by placing POIs and connecting them using the dedicated relation (see Step 1 in Fig. 2). Each POI requires a name attribute that corresponds to a real-world POI, which is then used to align the POI with a corresponding LOD concept.

The second step of the LOD alignment involves matching the name attribute of a POI to a LOD concept (see Step 2 in Fig. 2). For this purpose, we used the DBpedia Spotlight text annotation API, which links unstructured information sources to DBpedia concepts [23]. This process retrieves a DBpedia URI for the respective POI and displays it in the POI notebook. This step of the alignment process is guided by the user intent that determines which domain-specific information to retrieve.
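A call to the annotation service for a POI name could look roughly as follows. This is a sketch under assumptions rather than the ADOxx implementation: it targets the public DBpedia Spotlight endpoint with its `text` and `confidence` parameters, the confidence value of 0.5 is an arbitrary choice, the helper names are hypothetical, and simply taking the first returned resource is a simplification. The sample payload is invented to show the expected shape of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(poi_name: str, confidence: float = 0.5) -> Request:
    """Prepare an annotation request for the name attribute of a POI."""
    query = urlencode({"text": poi_name, "confidence": confidence})
    return Request(f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"})


def first_uri(payload: dict):
    """Return the DBpedia URI of the first annotated resource, or None."""
    resources = payload.get("Resources", [])
    return resources[0]["@URI"] if resources else None


def annotate(poi_name: str):
    """Send the request and extract the URI (needs network access)."""
    with urlopen(build_request(poi_name), timeout=10) as response:
        return first_uri(json.load(response))


# Invented response fragment, used here instead of a live call.
sample = {
    "Resources": [
        {
            "@URI": "http://dbpedia.org/resource/Schönbrunn_Palace",
            "@surfaceForm": "Schönbrunn Palace",
        }
    ]
}
poi_uri = first_uri(sample)
```

A real integration would likely let the modeler confirm the suggested URI, since short POI names can be ambiguous.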

Fig. 2. Execution of information retrieval within the smart city implementation scenario of the Semantic Matching Model implemented on the ADOxx platform.

In the third step, the user verifies which information should ultimately be retrieved. In our scenario, the decision is between the description available on DBpedia or Wikidata, as these two knowledge graphs were selected as the domain structure (see Step 3 in Fig. 2). For the retrieval of the DBpedia description, the URI from the previous step is combined with a SPARQL query to request the abstract of the POI from DBpedia. Retrieving the description from Wikidata requires two separate API calls. First, the Wikidata URI of the POI is retrieved through the “owl:sameAs” predicate on DBpedia, which links the two LOD instances. The received URI can then be used to request the description of the POI from Wikidata. In both cases, the received description is imported as the value of the “POIDescription” attribute in the POI notebook of the model.
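The three requests of this step can be sketched as SPARQL queries assembled in Python. Several points are assumptions on our side: that both graphs are queried through their public SPARQL endpoints, that the abstract is read from `dbo:abstract` and the Wikidata description from `schema:description`, that English is the requested language, and all function names. The result parsing follows the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def abstract_query(dbpedia_uri: str, lang: str = "en") -> str:
    """Query for the DBpedia abstract of a POI in one language."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?value WHERE {\n"
        f"  <{dbpedia_uri}> dbo:abstract ?value .\n"
        f'  FILTER (lang(?value) = "{lang}")\n'
        "}"
    )


def same_as_query(dbpedia_uri: str) -> str:
    """Query for the Wikidata URI linked to a DBpedia resource via owl:sameAs."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?value WHERE {\n"
        f"  <{dbpedia_uri}> owl:sameAs ?value .\n"
        '  FILTER (STRSTARTS(STR(?value), "http://www.wikidata.org/entity/"))\n'
        "}"
    )


def description_query(wikidata_uri: str, lang: str = "en") -> str:
    """Query for the Wikidata description of an entity in one language."""
    return (
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?value WHERE {\n"
        f"  <{wikidata_uri}> schema:description ?value .\n"
        f'  FILTER (lang(?value) = "{lang}")\n'
        "}"
    )


def first_value(results: dict):
    """Extract the first ?value binding from SPARQL JSON results, or None."""
    bindings = results["results"]["bindings"]
    return bindings[0]["value"]["value"] if bindings else None


def run_query(endpoint: str, query: str):
    """Execute a query over HTTP GET (needs network access)."""
    params = urlencode({"query": query})
    request = Request(
        f"{endpoint}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "poi-enrichment-sketch/0.1",  # hypothetical client name
        },
    )
    with urlopen(request, timeout=10) as response:
        return first_value(json.load(response))


def describe(dbpedia_uri: str, source: str):
    """Return the POI description from the graph the modeler chose."""
    if source == "DBpedia":
        return run_query(DBPEDIA_ENDPOINT, abstract_query(dbpedia_uri))
    # Wikidata takes two calls: resolve the URI on DBpedia, then fetch the description.
    wikidata_uri = run_query(DBPEDIA_ENDPOINT, same_as_query(dbpedia_uri))
    if wikidata_uri is None:
        return None
    return run_query(WIKIDATA_ENDPOINT, description_query(wikidata_uri))
```

The value returned by `describe` corresponds to what would be written into the “POIDescription” attribute; how the import into the POI notebook is carried out on ADOxx is not covered by this sketch.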

To summarize, we call the output of the Semantic Matching Model instance a linked model artifact, which is aligned with and enriched by resources from the Semantic Web. Accordingly, such model artifacts follow the notion of linked and open models (cf. [15], or Sect. 2.3) while also enabling their RDF-based serialization. Building upon this notion, we propose the utilization of open government data to further leverage the capabilities of the Semantic Matching Model.

Utilizing Open Government Data for Semantic Matching. Combining RDF-based conceptual model serializations with LOD and Open Government Data (OGD) opens up interesting future research opportunities, especially in the smart city context. While LOD can be defined as RDF data sources linked to each other, OGD describes structured but not linked data made publicly available by governmental institutions on a country or city level [20].

Currently, we are investigating an interconnected process that utilizes the benefits resulting from the combination of conceptual model serializations, LOD, and OGD. This process requires the identification of OGD relevant for the domain under study, which in our case is the “Sehenswürdigkeiten Standorte Wien” data set containing information about more than 2700 POIs in Vienna. Through the semantic approaches discussed in this paper (cf. sections 2.3 and 3.2), it is possible to first align the OGD (after its translation into RDF) with links to corresponding LOD resources and then match these links with POIs contained in city tours.

Through the utilization of this approach, a first exploratory implementation has been set up to enrich the tour model from Fig. 2 with additional semantic information provided by the city of Vienna. Figure 3a displays an excerpt from the enriched tour model serialized as RDF, with the prefix “ns1:” indicating which information is taken from the transformed and linked government data set.
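The idea of translating OGD records into RDF and joining them with tour POIs via shared DBpedia links can be sketched with plain N-Triples strings. Everything specific in this snippet is an assumption made for illustration: the namespace standing in for “ns1:”, the use of owl:sameAs for the link, the field names, and the sample values, none of which are taken from the actual Vienna data set or the exploratory implementation.

```python
OGD = "http://example.org/ogd/"  # hypothetical namespace standing in for "ns1:"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(value: str) -> str:
    """Render a plain N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def ogd_to_ntriples(record: dict) -> list[str]:
    """Translate one OGD record into N-Triples, including its LOD link if present."""
    subject = f"<{OGD}poi/{record['id']}>"
    triples = [
        f"{subject} <{OGD}{key}> {literal(str(value))} ."
        for key, value in record.items()
        if key not in ("id", "dbpedia")
    ]
    if record.get("dbpedia"):
        triples.append(f"{subject} <{OWL_SAME_AS}> <{record['dbpedia']}> .")
    return triples


def enrich(tour_pois: list[dict], ogd_records: list[dict]) -> dict:
    """Match tour POIs with OGD records that share the same DBpedia URI."""
    by_uri = {r["dbpedia"]: r for r in ogd_records if r.get("dbpedia")}
    return {poi["name"]: by_uri.get(poi["dbpedia"]) for poi in tour_pois}


# Invented sample values for illustration only.
record = {
    "id": "1",
    "name": "Schloss Schönbrunn",
    "dbpedia": "http://dbpedia.org/resource/Schönbrunn_Palace",
}
tour = [{"name": "Schönbrunn Palace", "dbpedia": "http://dbpedia.org/resource/Schönbrunn_Palace"}]

triples = ogd_to_ntriples(record)
matches = enrich(tour, [record])
```

Joining on the DBpedia URI rather than on names avoids having to reconcile German OGD labels with the English POI names entered by the modeler.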

Following the existing research on conceptual model transformations and their deployment as a knowledge base [14], both the linked government data set and the linked model artifact in the form of a city tour were imported to GraphDB within this exploratory implementation setting. Fig. 3b displays an excerpt from the created repository, which highlights the connection between the city tour and the linked OGD through the retrieved DBpedia links.

Fig. 3. Semantic Matching of open government data and city tour models.

4 Conclusion and Evaluation

4.1 Summary

The main contribution of this paper is the Semantic Matching Model, which adds to the research efforts investigating how to combine the benefits of conceptual modeling and the Semantic Web vision. Within this model, the purpose of designing a DSMM guides the selection of a suitable graph-based domain structure. The resulting matching process determines the permanent access established between the modeling method and the selected knowledge graphs that constitute the domain structure. Subsequently, the modeler designing a modeling artifact verifies the information to retrieve from the selected domain structure. The information retrieval is guided by the modeler’s intent, which together with the modeling method purpose forms the major influence within the proposed Semantic Matching Model. As a proof of concept, an implementation scenario within a smart city setting was provided as an instance of the Semantic Matching Model.

4.2 SWOT Evaluation

To conclude, a SWOT analysis is presented in order to evaluate the proposed Semantic Matching Model while incorporating future research possibilities.

Strengths. Even though the realization of the Semantic Web vision as a whole is still a long way off, the presented Semantic Matching Model already benefits from the wide range of data available through the LOD cloud. As future extensions to the LOD cloud can be expected, the prospect of matching a wider range of relevant knowledge graphs due to their increased availability forms the main strength. Nevertheless, if such extensions fail to materialize in the future, this strength can transform into a threat, as will be addressed subsequently.

Weaknesses. The current focus on the process guided by the modeling method purpose within the implementation scenario forms the main weakness of the contribution. Future efforts must be invested so that the finalized DSMM in the form of a packaged tool can be applied to a greater extent. Only under these conditions can a collection of modeling artifacts be accumulated to form the basis for iterative improvements within the AMME life cycle [13]. Further, it must be acknowledged that the domains covered in the LOD cloud are often too specific to be utilized across the variety of available DSMLs [17, 18].

Opportunities. Future research directions include exploring the use of RDF-based serialization of conceptual diagrammatic models with both LOD and OGD, as discussed in Sect. 3.2. The resulting representation of these components as an interconnected knowledge graph bears the potential to become a LOD cloud instance (e.g., see [10]) and could thereby serve as a domain-specific structure for future touristic applications in the city of Vienna. Additionally, future research efforts are aimed at identifying suitable machine learning techniques that can be utilized to analyze the collection of city tour models and derive semantically relevant recommendations, as soon as such a collection has been accumulated.

Threats. As future extensions to the LOD cloud are expected, but not certain, the prospect of matching a wider range of relevant knowledge graphs may not materialize, thus limiting further advancements of the Semantic Matching Model. One indication of this threat is the lack of working APIs from the knowledge graphs identified during the definition of the domain structure (cf. section 3.2). Although 20 relevant instances were identified during this step, more than half had to be excluded because no working SPARQL API was available. Further, the reliance on semantic matching through knowledge graphs may not be sufficient to meet the dynamic demands of tourists, and integration with multiple web services may be necessary in the future to ensure adaptability.