1 Introduction

Developing ontologies is not a straightforward task. This claim is implicitly demonstrated by the number of ontology development processes proposed in the last 30 years, which have their roots in the Knowledge Engineering and Software Engineering domains. Moreover, choosing the right development process to follow is a delicate task, since it may vary according to a large number of variables, such as the intrinsic complexity of the domain to be modelled, the context in which the model will be used (enterprise, social community, high-profile academic/industrial project, private needs, etc.), the amount of time available for the development, and the technological aversion and feeling of unfruitfulness that the final customers may show towards both the model developed and the process adopted for developing it.

In the past twenty years, the Software Engineering domain has seen the proposal of new agile methodologies for software development, in contrast with the highly-disciplined processes that have characterised the discipline since its beginning. Following this trend, agile development methodologies have recently been proposed in the field of Ontology Engineering as well (e.g. [3, 7, 13]). Such methodologies are preferable when the ontology to develop is composed of a limited number of ontological entities – while the use of highly-structured and strongly-founded methodologies remains valid and, perhaps, mandatory for modelling highly complex enterprise projects.

One of the main characteristics that ontology development methodologies usually share is the use of exemplar data during the development process so as to:

  • avoid inconsistencies – a common mistake when developing a model is to produce a TBox that is consistent when considered alone, but becomes inconsistent once an ABox is defined for it, even if all the classes and properties are completely satisfiable. Using real-world data, as exemplars of a particular scenario of the domain we are modelling, can prevent this problem;

  • have self-explanatory and easily understandable models – trying to implement a particular, significant real-world scenario related to a model by using real data allows one to better understand whether each TBox entity has a meaningful name that clearly describes the intent and the usage of the entity itself. This allows users to understand a model without spending a lot of effort in reading entity comments and the related documentation. The use of real data as part of the ontology development obliges ontology engineers and developers to think about the possible ways users will understand and use the ontology they are developing, in particular the very first time they look at it;

  • provide examples of usage – producing data within the development process means having a set of exemplars that describe the usage of the model in real-world scenarios. This kind of documentation implicitly allows users to apply a learn-by-example approach [1] in understanding the model and during their initial skill acquisition phase.

As already mentioned, several methodologies already propose the use of data during the development. However, the current ontology engineering processes that deal with the development of small-/medium-size ontologies usually do not include other aspects that, according to our experience, are crucial for guaranteeing a correct and quick outcome. In particular, it would be important:

  • to take advantage of existing agile methodologies from the Software Engineering domain, by considering important features such as adaptive planning, evolutionary development, early delivery, continuous improvement, and rapid and flexible response to change;

  • not to require pair programming – from our personal experience, the development of small ontologies usually involves only one ontology engineer;

  • to provide a precise definition of different kinds of tests that the ontology must pass at each stage of the development, and that can be used for documenting the ontology as well.

In order to address all the aforementioned desiderata, in this paper we introduce SAMOD (Simplified Agile Methodology for Ontology Development), a novel agile methodology for the development of ontologies, partially inspired by the Test-Driven Development process in Software Engineering [2] and by existing agile ontology development methodologies such as eXtreme Design (XD) [13]. In particular, SAMOD is organised in three simple steps within an iterative process that focuses on creating well-developed and documented models by using significant exemplars of data, so as to produce ontologies that are always ready to be used and easily understandable by humans (i.e. the potential customers) without a lot of effort.

SAMOD is the result of our work on the development of ontologies over the past six years. While the first draft of the methodology was proposed in 2010 as a starting point for the development of the Semantic Publishing and Referencing OntologiesFootnote 1 [10], it has been revised several times so as to arrive at the current version presented in this paper – which has already been used for developing several ontologies, such as the Vagueness OntologyFootnote 2, the F Entry OntologyFootnote 3, the OA Entry OntologyFootnote 4, and the Imperial Data OntologyFootnote 5. While a full introduction to SAMOD is provided in [11], in this paper we provide a summary of it and discuss some outcomes of a user-based evaluation we have conducted in the past months.

The rest of the paper is organised as follows. In Sect. 2 we introduce the entities involved in the methodology. In Sect. 3 we present all the steps of SAMOD, providing details for each of them. In Sect. 4 we discuss the outcomes of an experiment where we asked subjects with limited knowledge of Semantic Web technologies and Ontology Engineering to use SAMOD for developing an ontology. In Sect. 5 we present some of the most relevant related works in the area. Finally, in Sect. 6 we conclude the paper by sketching out some future works.

2 Preliminaries

The kinds of people involved in SAMOD are domain experts and ontology engineers. A domain expert, or DE, is a professional with expertise in the domain to be described by the ontology, and she is mainly responsible for defining, often in natural language, a detailed description of the domain in consideration. An ontology engineer, or OE, is a person who constructs meaningful and useful ontologies by using a particular formal language (such as OWL 2Footnote 6), starting from an informal but precise description of a particular problem or domain provided by DEs.

A motivating scenario (MS) [17] is a small story problem that provides a short description and a set of informal and intuitive examples about it. In SAMOD, a motivating scenario is composed of a name that characterises it, a natural language description that presents a problem to address, and one or more examples according to the description.

An informal competency question (CQ) [17] is a natural language question that represents an informal requirement within a particular domain. In SAMOD, each informal competency question is composed of a unique identifier, a natural language question, the kind of outcome expected as answer, some exemplar answers considering the examples provided in the related motivating scenarioFootnote 7, and a list of identifiers referring to higher-level informal competency questions that the question in consideration requires, if any.

A glossary of terms (GoT) [5] is a list of term-definition pairs related to terms that are commonly used for talking about the domain in consideration. The term in each pair may be composed of one or more words or verbs, or even of a brief sentence, while the related definition is a natural language explanation of the meaning of such a term. The terminology used for naming terms and for describing them must be as close as possible to the domain language.

As anticipated in the introduction, SAMOD prescribes an iterative process which aims at building the final model through a series of small steps. At the end of each iteration a particular preliminary version of the final model is released. Within iteration n, the current model is the version of the final model released at the end of iteration n-1. In contrast, a modelet is a stand-alone model describing a particular aspect of the domain in consideration, which is used to provide a first conceptualisation of a motivating scenario without caring about the current model available after the previous iteration of the process – it is similar to a microtheory as introduced in Cyc [15]. By definition, a modelet does not include entities from other models and is not included in other models.

A test case Tn, produced in the nth iteration of the process, is a sextuple including a motivating scenario MSn, a list of scenario-related informal competency questions CQn, a glossary of terms GoTn for the domain addressed by the motivating scenario, a TBoxn of the ontology implementing the description introduced in the motivating scenario, an exemplar ABoxn implementing all the examples described in the motivating scenario according to the TBoxn, and a set of SPARQLFootnote 8 queries SQn formalising the informal competency questions. A bag of test cases (BoT) is a set of test cases.
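The following minimal sketch illustrates one possible way of representing a test case and the bag of test cases in code; all names and types are our own illustrative assumptions, not part of SAMOD.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TestCase:
    motivating_scenario: str           # MS_n, in natural language
    competency_questions: List[str]    # CQ_n, informal questions
    glossary_of_terms: Dict[str, str]  # GoT_n, term -> definition
    tbox: str                          # TBox_n, e.g. the path to a Turtle/OWL file
    abox: str                          # ABox_n, exemplar data for the scenario
    sparql_queries: List[str]          # SQ_n, one query per competency question

# The bag of test cases grows by one test case per iteration of the process.
bag_of_test_cases: List[TestCase] = []
```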

Given MSn, TBoxn and GoTn as input, a model test aims at checking the validity of TBoxn against specific requirements (a minimal sketch of how the formal check might be automated follows the list):

  • [formal requirement] understanding (even by using appropriate unit tests [19]) whether TBoxn is consistent;

  • [rhetorical requirement] understanding whether TBoxn covers MSn and whether the vocabulary used by TBoxn is appropriate.
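As a rough sketch – assuming the TBox has been serialised to a hypothetical file tbox-n.owl and that the owlready2 Python library (with a Java runtime for its bundled HermiT reasoner) is available – the formal requirement could be checked as follows:

```python
import os
from owlready2 import get_ontology, sync_reasoner, OwlReadyInconsistentOntologyError

# "tbox-n.owl" is a hypothetical serialisation of TBox_n.
onto = get_ontology("file://" + os.path.abspath("tbox-n.owl")).load()
try:
    with onto:
        sync_reasoner()  # invokes the bundled HermiT reasoner
    print("model test (formal requirement) passed: TBox_n is consistent")
except OwlReadyInconsistentOntologyError:
    print("model test failed: TBox_n is inconsistent")
```

The formal requirement of the data test below can be checked in the same way, once the ABox axioms are loaded alongside the TBox; the rhetorical requirements, instead, necessarily involve human judgement.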

Given as input MSn, TBoxn, and an ABoxn built according to TBoxn and considering the examples described in MSn, a data test aims at checking the validity of the model and the dataset against specific requirements:

  • [formal requirement] understanding whether the TBoxn is still consistent when considering the ABoxn;

  • [rhetorical requirement] understanding whether the ABoxn describes all the examples accompanying the motivating scenario completely.

Given as input TBoxn, ABoxn, CQn, and SQn, a query test aims at checking the validity of TBoxn, ABoxn, and each query in SQn against specific requirements:

  • [formal requirement] understanding whether each query in SQn is well-formed and can correctly run on TBoxn + ABoxn;

  • [rhetorical requirement] understanding whether each informal competency question in CQn is mapped into an appropriate query in SQn and whether, running each of them on TBoxn + ABoxn, the result conforms to the expected outcome detailed in the related competency question (see the sketch after this list).
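A minimal sketch of a query test, using the rdflib Python library and a hypothetical TBox/ABox fragment and competency-question query, could look as follows:

```python
from rdflib import Graph

g = Graph()
# A hypothetical TBox_n + ABox_n fragment covering one example of MS_n.
g.parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    ex:Person a <http://www.w3.org/2002/07/owl#Class> .
    ex:john a ex:Person .
""")

# A hypothetical query from SQ_n formalising one informal competency question in CQ_n.
sq = "PREFIX ex: <http://example.org/> SELECT ?p WHERE { ?p a ex:Person }"

expected = {"http://example.org/john"}        # the outcome detailed in the CQ
actual = {str(row.p) for row in g.query(sq)}
assert actual == expected, f"query test failed: {actual} != {expected}"
```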

Fig. 1. A brief summary of SAMOD, starting with the “Collect requirements and develop a modelet” step.

3 Methodology

SAMOD is based on the following three iterative steps (briefly summarised in Fig. 1), where each step ends with the release of a snapshot of the current state of the process, called a milestone:

  1. OEs collect all the information about a specific domain, with the help of DEs, in order to build a modelet formalising the domain in consideration, following certain ontology development principles. Then OEs create a new test case that includes the modelet. If everything works fine (i.e. model test, data test, and query test are passed), OEs release a milestone and proceed;

  2. OEs merge the modelet of the new test case with the current model produced by the end of the last iteration of the process, and consequently they update all the test cases in BoT specifying the new current model as TBox. If everything works fine (i.e. model, data and query tests are passed according to their formal requirements only), OEs release a milestone and proceed;

  3. OEs refactor the current model, in particular focussing on the last part added in the previous step, taking into account good practices for ontology development processes. If everything works fine (i.e. model, data and query tests are passed), OEs release a milestone. In case there is another motivating scenario to be addressed, OEs iterate the process, otherwise the process ends.

The next sections elaborate on these steps, introducing a real running exampleFootnote 9 and considering a generic iteration n.

3.1 Step 1: Define a New Test Case

OEs and DEs work together to write down a motivating scenario MSn, keeping it as close as possible to the language DEs commonly use for talking about the domain. An example of a motivating scenario is illustrated in Table 1.

Table 1. An example of motivating scenario.

Given a motivating scenario, OEs and DEs should produce a set of informal competency questions CQn, each of them identified appropriately. An example of an informal competency question, formulated starting from the motivating scenario in Table 1, is illustrated in Table 2.

Table 2. An example of competency question.

Now, having both a motivating scenario and a list of informal competency questions, OEs and DEs write down a glossary of terms GoTn. An example of glossary of terms is illustrated in Table 3.

Table 3. An example of glossary of terms.

The remaining part of this step is led by OEs onlyFootnote 10, who are responsible for developing a modelet according to the motivating scenario, the informal competency questions and the glossary of termsFootnote 11.

In doing that work, they must strictly follow the following principles:

  • Keep it small. Keep the number of the developed ontology entities small – e.g. Miller’s magic number “7 ± 2” [9] entities per type (classes, object properties, data properties) – so as not to overload OEs’ working memory. In addition, by making small changes (and retesting frequently, as our framework prescribes), one always has a good idea of which change has caused an error/inconsistency in the model [2].

  • Use patterns. OEs should take into consideration existing knowledge, in particular existing and well-documented patterns – the Semantic Web Best Practices and Deployment Working Group pageFootnote 12 and the Ontology Design Patterns portalFootnote 13 are both valuable examples – as well as widely-adopted Semantic Web vocabularies – such as FOAFFootnote 14 for people, SIOCFootnote 15 for social communities, and so on.

  • Middle-out development. OEs should start by defining the most relevant concepts and then move towards both more abstract and more concrete ones. Such a middle-out approach [18] allows one to avoid unnecessary effort during the development because detail arises only as necessary, by adding sub- and super-classes to the basic concepts. Moreover, this approach, if used properly, tends to produce much more stable ontologies [17].

  • Keep it simple. The modelet must be designed according to the information obtained previously (MSn, CQn, GoTn) as quickly as possible, spending the minimum effort and without adding any unnecessary semantic structure – avoiding thinking about inferences at this stage, and rather focussing on describing the motivating scenario fully.

  • Self-explanatory entities. Each ontological entity must be understandable by humans by simply looking at its local name (i.e. the last part of the entity IRI). No labels or comments have to be added at this stage, and all the entity IRIs must not be opaque – class local names have to be capitalised (e.g. Justification) and in camel-case notation if composed of more than one word (e.g. DescriptionOfVagueness), property local names must start with a non-capitalised verbFootnote 16 and be in camel-case notation if composed of more than one word (e.g. wasAttributedTo), and individual local names must be non-capitalised (e.g. ceo) and dash-separated if composed of more than one word (e.g. quantitative-vagueness) – see the sketch after this list.
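As an illustration only, the naming conventions above (limited to what can be checked mechanically, i.e. casing and separators, not whether a property name is actually a verb) could be approximated with a few regular expressions; the exact patterns are our own assumption:

```python
import re

CLASS_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")            # e.g. Justification, DescriptionOfVagueness
PROPERTY_RE = re.compile(r"^[a-z][A-Za-z0-9]*$")         # e.g. wasAttributedTo
INDIVIDUAL_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # e.g. ceo, quantitative-vagueness

def local_name(iri: str) -> str:
    """Return the last part of an entity IRI (after '#' or the final '/')."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]

assert CLASS_RE.match(local_name("http://example.org/DescriptionOfVagueness"))
assert PROPERTY_RE.match(local_name("http://example.org/wasAttributedTo"))
assert INDIVIDUAL_RE.match(local_name("http://example.org/quantitative-vagueness"))
```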

The goal of OEs is to develop a modeletn, possibly starting from a graphical representation written in a proper visual language – such as Graffoo [4] – so as to convert it automatically into OWL by means of appropriate tools, e.g. DiTTO [6].

Starting from modeletn, OEs proceed in four phases:

  1. run a model test on modeletn. If it succeeds, then

  2. create an exemplar dataset ABoxn that formalises all the examples introduced in MSn according to modeletn. Then, OEs run a data test and, if it succeeds, then

  3. write the SPARQL queries in SQn, one for each informal competency question in CQn. Then, OEs run a query test and, if it succeeds, then

  4. create a new test case Tn = (MSn, CQn, GoTn, modeletn, ABoxn, SQn) and add it to BoT.

When running the model test, the data test and the query test, it is possible to use any appropriate software to support the task, such as reasoners (PelletFootnote 17, HermiTFootnote 18) and query engines (JenaFootnote 19, SesameFootnote 20).

Any failure of any test that is considered a serious issue by the OEs results in getting back to the most recent milestone. It is worth mentioning that an exception should also be raised if OEs think that the motivating scenario MSn is too big to be covered by only one iteration of the process. In this case, it may be necessary to re-schedule the whole iteration, e.g. by adequately splitting the motivating scenario into two new ones.

3.2 Step 2: Merge the Current Model with the Modelet

At this stage, OEs merge modeletn, included in the new test case Tn, with the current model, i.e. the version of the final model released at the end of the previous iteration (i.e. iteration n-1). OEs proceed in four consecutive steps:

  1. to define a new TBoxn by mergingFootnote 21 the current model with modeletn, i.e. by adding all the axioms in the current model and modeletn to TBoxn and then by collapsing semantically-identical entities, e.g. those that have similar names and represent the same real-world entity (for instance Person and HumanBeing) – a minimal sketch of this merge is shown after this list;

  2. to update all the test cases in BoT, swapping the TBox of each test case with TBoxn and refactoring each ABox and SQ according to the new entity names if needed, so as to refer to the most recent model;

  3. to run the model test, the data test and the query test on all the test cases in BoT, according to their formal requirements only;

  4. to set TBoxn as the new current model.
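A minimal sketch of this merge, using the rdflib Python library and hypothetical ontology fragments and IRIs, could look as follows:

```python
from rdflib import Graph, URIRef

# Hypothetical fragments standing in for the current model and modelet_n.
current_model = Graph().parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    ex:Person a <http://www.w3.org/2002/07/owl#Class> .
""")
modelet_n = Graph().parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    ex:HumanBeing a <http://www.w3.org/2002/07/owl#Class> .
""")

merged = current_model + modelet_n  # rdflib graph union (set of triples)

# Collapse semantically-identical entities, e.g. ex:HumanBeing into ex:Person.
old, new = URIRef("http://example.org/HumanBeing"), URIRef("http://example.org/Person")
tbox_n = Graph()
for s, p, o in merged:
    tbox_n.add((new if s == old else s,
                new if p == old else p,
                new if o == old else o))
```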

Any serious failure of any test – i.e. something went wrong in updating the test cases in BoT – results in getting back to a previous milestone. In this case, OEs have to consider either the most recent milestone, if they think there was a mistake in some actions performed during the current step, or one of the other previous milestones, if the failure is demonstrably a consequence of any of the components of the latest test case Tn.

3.3 Step 3: Refactor the Current Model

In the last step, OEs work to refactor the current model shared among all the test cases in BoT and, accordingly, each ABox and SQ of each test case, if needed. In doing this task, OEs must strictly follow the following principles:

  • Reuse existing knowledge. Reusing concepts and relations defined in other models is encouraged and often labelled as a common good practice [18]. The reuse can result either in including external entities in the current model as they are, or in providing an alignmentFootnote 22 or a harmonisationFootnote 23 with another model.

  • Document it. Add annotations – i.e. labels (rdfs:label), comments (rdfs:comment), and provenance information (rdfs:isDefinedBy) – to ontological entities, so as to provide natural language descriptions of them and to allow tools (e.g. LODE [12]) to produce human-readable HTML documentation from the ontology source.

  • Take advantage of technologies. Enrich the current model by using all the capabilities offered by OWL 2 (e.g. keys, property characteristics, property chains, inverse properties and the like) in order to automatically infer as much information as possible from a (possibly) small set of real data. In particular, it is important to avoid over-classification, i.e. specifying assertions that a reasoner could infer automatically – e.g. creating an inverse property of a property P and explicitly defining its domain and range even though they can be inferred automatically. A small refactoring sketch follows this list.
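A minimal sketch of these refactoring actions, again using rdflib and entirely hypothetical entity IRIs and annotation texts, could look as follows:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical ontology namespace
g = Graph()

# Document it: natural language annotations on an ontological entity.
g.add((EX.Justification, RDF.type, OWL.Class))
g.add((EX.Justification, RDFS.label, Literal("justification", lang="en")))
g.add((EX.Justification, RDFS.comment,
       Literal("An illustrative, made-up description of the class.", lang="en")))
g.add((EX.Justification, RDFS.isDefinedBy, URIRef("http://example.org/ontology")))

# Take advantage of OWL 2: declare an inverse property without restating its
# domain and range, which a reasoner can infer from the original property.
g.add((EX.attributedTo, RDF.type, OWL.ObjectProperty))
g.add((EX.attributedTo, OWL.inverseOf, EX.wasAttributedTo))
```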

Finally, once the refactoring is finished, OEs have to run the model test, the data test and the query test on all the test cases in BoT. This is a crucial task to perform, since it guarantees that the refactoring has not damaged any existing conceptualisation implemented in the current model.

3.4 Output of an Iteration

Each iteration of SAMOD produces a new test case that is added to the bag of test cases (BoT). Each test case describes a particular aspect of the model under development, i.e. the current model under consideration after one iteration of the methodology.

In addition to being an integral part of the methodology, each test case represents a complete documentation of a particular aspect of the domain described by the model, thanks to the natural language descriptions it includes (the motivating scenario, the informal competency questions, and the glossary of terms), as well as the formal implementation of exemplar data (the ABox) and possible ways of querying the data compliant with the model (the set of formal queries). All this additional information should help end users in understanding, with less effort, what the model is about and how they can use it to describe the particular domain it addresses.

4 Experiment

We performed an experiment so as to understand to what degree SAMOD can be used by people with limited experience in Semantic Web technologies and Ontology Engineering. In particular, we organised a user testing session so as to gather some evidence on the usability of SAMOD when modelling OWL ontologies.

We asked nine people from the Computer Science and Law domain – one professor, two post-docs, and six Ph.D. students – to use SAMOD (one iteration only) for modelling a particular motivating scenario provided as an exercise. SAMOD, as well as the basics of Ontology Engineering, OWL, and Semantic Web technologies, was introduced to the subjects during four lectures of four hours each. At the end of the last lecture, we asked them to answer three questionnaires:

  • a background questionnaire containing questions on previous experience in Ontology Engineering and OWL;

  • another questionnaire containing ten Likert-scale questions according to the System Usability Scale (SUS), which also allowed us to measure the sub-scales of pure Usability and pure Learnability, as proposed recently by Lewis and Sauro [8];

  • a final questionnaire asking for the experience of using SAMOD for completing the task.

The mean SUS score for SAMOD was 67.25 (in a 0 to 100 range), approaching the target score of 68 that indicates a good level of usability (according to [14]). The mean values for the SUS sub-scales Usability and Learnability were 65.62 and 73.75 respectively. In addition, an Experience score was calculated for each subject by considering the values of the answers given to the background questionnaire. We compared this score (x-axis in Fig. 2) with the SUS values and the other sub-scales (y-axis) using Pearson's r. As highlighted by the red dashed lines (referring to the related Least Squares Regression Lines), there is a positive correlation between the Experience score and the SUS values – i.e. the more a subject knew about ontology engineering in general, the more SAMOD was perceived as usable and easy to learn. However, only the relation between the Learnability score and the Experience score was statistically significant (p < 0.05).
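For readers unfamiliar with SUS scoring, the following sketch shows how the 0-100 SUS score and the correlation with an experience score can be computed; the answer values below are invented for illustration and do not reproduce our data.

```python
from statistics import mean
from scipy.stats import pearsonr

def sus_score(answers):
    """Standard SUS scoring for ten answers on a 1-5 Likert scale (0-100 range)."""
    adjusted = [(a - 1) if i % 2 == 0 else (5 - a) for i, a in enumerate(answers)]
    return sum(adjusted) * 2.5

# Invented answers and experience scores for three subjects.
answers = [[4, 2, 4, 3, 4, 2, 5, 2, 4, 3],
           [3, 3, 3, 4, 3, 3, 4, 3, 3, 4],
           [5, 1, 4, 2, 4, 1, 5, 2, 5, 2]]
experience = [6.0, 2.5, 8.0]

sus = [sus_score(a) for a in answers]
r, p = pearsonr(experience, sus)
print(f"mean SUS = {mean(sus):.2f}, Pearson r = {r:.2f} (p = {p:.3f})")
```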

Fig. 2. Three comparisons between the SUS score (and its sub-scales) and the experience score by the subjects.

Axial coding of the personal comments expressed in the final questionnaires [16] revealed a small number of widely perceived issues. Overall, the proposed methodology was evaluated positively by 7 subjects (described with adjectives such as “useful”, “natural”, “effective”, and “consistent”), but it also received criticisms from 5 subjects, mainly referring to the need for more expertise in Semantic Web technologies and Ontology Engineering in order to use it appropriately. The use of the tests for assessing the ontology developed after a certain step was appreciated (3 positive comments vs. 1 negative one), as was the use of the scenarios and examples in the very first step of SAMOD (3 positive comments) and the implementation of competency questions in the form of SPARQL queries (2 positive comments). All the outcomes of the questionnaires are available online in the SAMOD GitHub repositoryFootnote 24.

5 Related Works

Several quick-and-iterative ontology development processes have been introduced recently, which could be preferred when the ontology to develop is composed of a limited number of ontological entities – while the use of highly-structured and strongly-founded methodologies (e.g. [5, 17, 18]) is still necessary and, perhaps, mandatory for highly complex enterprise projects. In this section we introduce some of the most interesting agile approaches to ontology development.

One of the first agile methodologies introduced in the domain is eXtreme Design (XD) [13], which has been inspired by the eXtreme Programming methodology in Software Engineering. The authors describe XD as “an approach, a family of methods and associated tools, based on the application, exploitation, and definition of ontology design patterns (ODPs) for solving ontology development issues”. Summarising, XD is an agile methodology that uses pair design (i.e. groups of two ontology engineers working together during the development) and an iterative process which starts with the collection of stories and competency questions as requirements to address, and then proposes the re-use of existing ontology design patterns for addressing such informal requirements.

Another recent approach has been introduced by Keet and Lawrynowicz in [7]. They propose to transfer concepts related to Test-Driven Development in Software Engineering [2] into the Ontology Engineering world. The main idea behind this methodology is that tests have to be run before proceeding with the modelling of a particular (aspect of a) domain. Of course, the first execution of the tests should fail, since no ontology has yet been developed to address them properly, while the ontology developed in subsequent iterations of the process should eventually pass the tests.

De Nicola and Missikoff [3] have recently introduced their Unified Process for ONtology building methodology (a.k.a. UPON Lite), an agile ontology engineering method that places end users without specific ontology expertise (domain experts, stakeholders, etc.) at the centre of the process. The methodology is composed of an ordered set of six steps. Each step outputs a self-contained artefact immediately available to end users, which is used as input to the subsequent step. This makes the whole process progressive and differential, and involves ontology engineers only in the very last step of the process, i.e. when the ontology has to be formalised in some standard language.

6 Conclusions

In this paper we have introduced SAMOD, a Simplified Agile Methodology for Ontology Development. In particular, we have described its process by detailing each of its steps, and we have discussed the results of an experiment we have run involving nine people with no or limited expertise in Semantic Web technologies and Ontology Engineering.

In the future, we plan to involve a larger set of users so as to gather additional data about its usefulness, usability, and effectiveness. In addition, we plan to develop supporting tools for accompanying and facilitating users in each step of the methodology.