1 Introduction

Noise pollution is one of the many environmental problems affecting the health and well-being of citizens in large towns and cities. According to the World Health Organization, noise is the second largest environmental cause of health problems, after air quality (COM(2017) 151 final 2017). In response to this public health issue, many cities around the world have defined regulations to assess and manage environmental noise. In the European Union, Directive 2002/49/EC (Directive EU 2002), also known as the Environmental Noise Directive (END), was passed in 2002 to identify noise pollution levels and to trigger the necessary response at both member-state and European Union level. The END defines environmental noise as noise caused by road, rail and air traffic, industry, construction, as well as other outdoor activities. Under one of the key areas of this directive, member states are committed to reporting their strategic noise maps every five years. The purpose of these maps is to present and assess the calculated or measured noise levels over a geographical area in order to determine the population exposed to this pollution. At the national level, the Spanish Ley del Ruido 37/2003 (BOE 2003) was also enacted to protect citizens from excessive noise pollution.

To measure noise levels in cities, sensors located at strategic places (e.g. train stations, airports, wide boulevards) collect the data. In most cases, these sensors are part of an interconnected sensor network deployed over the city. Although cities report their strategic noise maps, there is no agreement on how to make available the noise level data collected to build them. Several cities publish their noise levels as Open Data, following different approaches. Open Data is defined as data that can be freely used, re-used and redistributed by anyone, subject only, at most, to the requirement to attribute and sharealike (Dietrich et al. 2009).

However, the lack of semantics in the description of these data hinders their reusability and interoperability. Ontologies, understood as formal specifications of shared conceptualizations (Studer et al. 1998), can be used to describe such data in order (a) to avoid ambiguities, (b) to provide semantic interoperability and, in some cases, (c) to enable inference over the data. Developers can take advantage of these benefits, for example, to integrate noise data coming from different sensors (providing data in different formats and granularities) in order to build ambient intelligence applications for smart cities, as in Mulero et al. (2018). Before undertaking any ontology development, a review of the state of the art should be carried out to identify potential ontologies to be reused. As shown in Sect. 2, none of the available ontologies in the noise domain allows the representation of noise pollution concepts in a general way. However, other ontologies are reused to complement the model proposed in this paper.

In this paper we describe the process and activities followed to develop an ontology that describes noise pollution data. Our work represents a step forward in improving semantic interoperability in the noise pollution domain. The process includes defining the ontological requirements, implementing the ontology, publishing it on the Web and maintaining it, and is detailed in Sect. 3. An example of use of the proposed ontology is provided in Sect. 4, and the ontology development experience is discussed in Sect. 5. Finally, Sect. 6 provides conclusions about the presented work and future directions.

2 Related work

In this section, we discuss the state of the art in the ontology-based representation of noise pollution. Several noise-related ontologies are available; their main features are presented below.

The Recommender System Context (rsctx) ontology (footnote 1) represents the context of mobile usage that may be relevant when providing recommendations to users (Karpus et al. 2016). This ontology represents several dimensions, such as noise level, traffic level and light level. In the noise level domain, it allows modeling the current level of noise in the environment as (1) a symbolic level (e.g. very noisy, silent) and (2) a number expressed in decibels. However, the rsctx ontology provides neither a representation of the devices used to measure noise nor detailed definitions of the different kinds of noise levels.

M3-lite (m3lite) (footnote 2) is a taxonomy that enables testbeds to semantically annotate the IoT data produced by heterogeneous devices and to store them in a federated datastore such as FIESTA-IoT (footnote 3) (Agarwal et al. 2016). This ontology provides several concepts representing measurements of noise level in the environment, sensors used to detect the noise level (e.g. sound sensor, noise level sensor, microphone) and the source of the noise (e.g. traffic, the siren of a police car). However, the m3lite ontology represents sources of sound that are considered noise nuisance (e.g. animals, crowds, neighbors) rather than environmental noise (e.g. rail traffic, air traffic); this sound source representation is therefore out of the scope of this work. In addition, the sound sensor is classified only under two domains of interest, smart building and transportation, which makes it difficult to reuse for our purposes.

The ISO 37120 standard (footnote 4), named “Sustainable development of communities - Indicators for city services and quality of life”, defines city indicators for specific domains (e.g. health, education, environment) in order to measure city performance. In our particular case, the environment domain contains a related indicator named noise pollution, which aims to estimate the percentage of the population affected by noise disturbance in a specific area. In general, most of the indicators in the standard are defined as a ratio between parameters of two populations. For the environment domain indicators, these populations include observations generated by sensors at different times and locations. The Global City Indicator Environment (iso37120en) ontology (footnote 5) has been defined to represent the eight environment indicators of the ISO 37120 specification. The iso37120en ontology is built on the Global City Indicators Foundation ontology, which aims at providing ontology design patterns to model indicators and indicator metadata (Fox 2018). The iso37120en ontology includes generic ontologies to represent pollutants and their concentrations, taxonomic groups of animal species, and sensor devices related to the environment indicators. It allows modelling noise pollution by means of a formula that includes the average sound level over a 24 h period (Lden) and the city population. However, iso37120en is heavyweight, represents only the Lden noise level and does not model the sources that produce the noise. As the purpose of our work is to represent several noise levels (e.g. average noise level and percentile noise level) originating from several acoustic emitters, this ontology does not fulfill our requirements.
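For reference, the Lden level mentioned above is defined in Annex I of the END as an energy average of the day, evening and night levels, with penalties of 5 dB on the evening period and 10 dB on the night period. The following is a minimal Python sketch of that formula, assuming the default 12 h / 4 h / 8 h split of the day (member states may shift these periods). It is not the iso37120en indicator formula, which additionally involves the city population.

```python
import math


def lden(l_day: float, l_evening: float, l_night: float,
         day_hours: int = 12, evening_hours: int = 4, night_hours: int = 8) -> float:
    """Day-evening-night level (dB) following Annex I of Directive 2002/49/EC.

    Each period level is converted to relative energy, the evening and night
    levels receive +5 dB and +10 dB penalties, and the weighted energies are
    averaged over 24 hours before converting back to decibels.
    """
    if day_hours + evening_hours + night_hours != 24:
        raise ValueError("day, evening and night periods must add up to 24 hours")
    energy = (day_hours * 10 ** (l_day / 10)
              + evening_hours * 10 ** ((l_evening + 5) / 10)
              + night_hours * 10 ** ((l_night + 10) / 10))
    return 10 * math.log10(energy / 24)


# With these inputs all penalised period levels equal 65 dB, so Lden is 65 dB.
print(round(lden(65.0, 60.0, 55.0), 1))
```

Because the average is taken over sound energy rather than over decibel values, a loud period dominates the result. This is also why the penalties on the evening and night periods weigh so heavily.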

3 Methodological background

The presented ontology was built following the LOT (Linked Open Terms) methodology, initially proposed by Poveda-Villalón (2012) and further developed by García-Castro et al. (2017). This methodology is based on agile techniques, in which sprints and iterations are the main form of workflow organization, so that ontology development is aligned with agile software development practices. In addition, the methodology focuses on (a) the reuse of terms (ontology classes, properties and attributes) from already published vocabularies or ontologies and (b) the publication of the built ontology according to Linked Data principles. It is also worth mentioning that, where available, the LOT methodology builds on the ontological engineering activities defined in the NeOn methodology (Suárez-Figueroa et al. 2015).

The LOT methodology defines iterations over a basic workflow composed of the following activities: (1) ontological requirements specification; (2) ontology implementation; (3) ontology publication; and (4) ontology maintenance. The activities, roles involved and expected outputs are depicted in Fig. 1.

Fig. 1

LOT methodology base workflow of processes. Image taken from García-Castro et al. (2017)

The following sections will briefly present, for each of the above-mentioned activities: (a) main definitions and guidelines provided by the methodology for the activity and (b) how the activity was carried out during the development of the noise ontology.

3.1 Ontology requirements specification

The ontology requirements specification activity refers to the activity of collecting the requirements that the ontology should fulfill (Suárez-Figueroa et al. 2015). These requirements are usually related to the goal of the ontology, to the domain that the ontology should model, or to technical details of the ontology like the implementation language, among others.

In order to carry out this activity, the LOT methodology proposes the exchange of different documents between domain experts and ontology users (potentially software developers) and the ontology development team. Such documents could be manuals, API specifications, datasets, standards, formats used in the community, etc. Once the ontology engineers gather the information from documents or interviews with domain experts, they propose a set of ontological requirements. These ontological requirements can be written following the Competency Questions technique (Grüninger and Fox 1995) or as natural language sentences. Each ontological requirement may contain additional information such as provenance, comments, relations with other requirements, priority or the sprint in which it will be addressed. This list of ontological requirements is later validated and completed together with domain experts and ontology users in order to create the Ontology Requirements Specification Document (ORSD), which is the main output of this activity.

3.1.1 Ontology requirements specification in the noise pollution ontology

The ontology requirements specification activity was performed using different inputs. Firstly, we used the noise pollution data available in Spanish open data portals to extract the most common terms in these data sources, such as the location of noise sensor stations, measurement date and types of noise levels. Secondly, we extracted several definitions from Ley del Ruido 37/2003, for instance, the acoustic emitters (agglomerations, major roads, major airports, etc.). Finally, we held several interviews with a domain expert in order to validate the first draft of the semantic model and to improve it by including important concepts. Other concepts were obtained from the international standards IEC 61672-1:2013 (level meters-Part, Electroacoustics-Sound 2013a) and IEC 61672-2:2013 (level meters-Part, Electroacoustics-Sound 2013b), documents suggested by the expert, in order to obtain definitions for sensor classes and weighting methods, respectively.
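The first input, extracting the most common terms from heterogeneous open data files, can be approximated by counting normalised column names across datasets. The sketch below assumes the CSV headers have already been downloaded. The three header lists are invented for illustration and do not reproduce any specific Spanish portal.

```python
from collections import Counter

# Hypothetical CSV headers from three municipal noise datasets; the column
# names are invented for illustration only.
DATASET_HEADERS = {
    "city_a.csv": ["station", "latitude", "longitude", "date", "LAeq", "L90"],
    "city_b.csv": ["Station", "Date", "Ld", "Le", "Ln", "Lden"],
    "city_c.csv": ["station", "date", "latitude", "longitude", "Ld", "Ln"],
}


def normalise(term: str) -> str:
    """Lower-case and trim a column name so trivial spelling variants merge."""
    return term.strip().lower()


def common_terms(headers: dict[str, list[str]],
                 min_datasets: int = 2) -> list[tuple[str, int]]:
    """Return terms found in at least `min_datasets` datasets, most frequent first."""
    counts = Counter()
    for columns in headers.values():
        # Use a set so a term repeated inside one file counts once per dataset.
        counts.update({normalise(c) for c in columns})
    return [(term, n) for term, n in counts.most_common() if n >= min_datasets]


for term, n in common_terms(DATASET_HEADERS):
    print(term, n)
```

Terms that recur across datasets are natural candidates for ontology concepts. Dataset-specific columns still need the expert interviews described above to be interpreted correctly.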

Using the first two inputs, we generated a first proposal of ontological requirements written as Competency Questions. It is worth noting that, even though the LOT methodology proposes describing the requirements both as Competency Questions and as natural language statements, for the noise ontology all the requirements were written as Competency Questions. To define and share them, an online spreadsheet with the following fields was used:

  • Identifier, unique for each requirement.

  • Competency Question, to define the requirements which the ontology should fulfill.

  • Answer, the answer to the competency question.

  • Clarification of Competency Questions, to include comments related to the questions.

  • References, to include the provenance.
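One spreadsheet row can be sketched as a small Python record with CSV export for sharing, as follows. The field names mirror the list above. The question wording for CQ1 is invented for illustration because Table 1 is not reproduced here. The answer is the CQ1 answer given in Sect. 3.2.1.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Requirement:
    """One row of the requirements spreadsheet (fields as listed above)."""
    identifier: str
    competency_question: str
    answer: str
    clarification: str = ""
    references: str = ""


def to_csv(requirements: list[Requirement]) -> str:
    """Serialise requirements to CSV text, e.g. to share with domain experts."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Requirement)])
    writer.writeheader()
    for req in requirements:
        writer.writerow(asdict(req))
    return buffer.getvalue()


cq1 = Requirement(
    identifier="CQ1",
    # Illustrative wording; the actual question is the one in Table 1.
    competency_question="What is the noise level Ld detected during April at a given location?",
    answer=("The detected noise level Ld during April in location with "
            "coordinates 40.42, -3.69, 648 is 67.4 dB."),
)
print(to_csv([cq1]))
```

The csv module quotes the answer automatically because it contains commas. This keeps the coordinates intact when the file is reopened in a spreadsheet tool.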

Once this first proposal was written, we shared the spreadsheet (footnote 6) with the domain expert in order to validate the requirements. The expert determined which questions were correct and added new ones. Table 1 shows an excerpt from the final requirements for the noise pollution ontology.

Table 1 Noise pollution ontology requirements excerpt

It is worth mentioning that all the files generated during the development process, including these requirements, are stored and managed in a GitHub repository (footnote 7) under the GitHub account of the Spanish thematic network on Open Data and Smart Cities (footnote 8).

3.2 Ontology implementation

The aim of the ontology implementation activity is to build the ontology using a formal ontology implementation language. The implementation is based on the ontological requirements identified by the domain experts (Suárez-Figueroa et al. 2015). This phase is carried out in different sprints, in each of which a set of requirements is selected to be implemented. The ontology development team schedules and plans the implementation according to the prioritization of the requirements. After each iteration or sprint, a new version of the ontology is produced. The ontology implementation is usually divided into the following sub-activities:

  • Conceptualization: Ontology conceptualization refers to the activity of organizing and structuring the information obtained during the acquisition process into meaningful models at the knowledge level, according to the ontology requirements specification document (Suárez-Figueroa et al. 2015). During this activity, ontology developers may use different tools, such as diagrams (for example, UML-based ones) or description logics, to create the model according to the requirements previously acquired.

  • Encoding: This activity refers to the transformation of the conceptualization into an ontology, or set of ontologies, expressed in the chosen ontology implementation language, for example OWL. The ontology reuse activity can also be carried out alongside encoding: when implementing the conceptualized model, ontology developers should search for existing ontologies in order to reuse them as a whole or in part. Searching in ontology registries and indexes like Linked Open Vocabularies (LOV) (Vandenbussche et al. 2017) is advisable to find potential ontologies to be reused. The ontology reuse activity is highly recommended and is one of the pillars of the LOT methodology, since reusing existing models maximizes the interoperability of the resulting ontology and can reduce the resources spent on development. Experienced developers may carry out the reuse activity either during conceptualization or during implementation.

  • Evaluation: This activity refers to checking the technical quality of an ontology against a frame of reference (Suárez-Figueroa et al. 2015). Such checking may consider different evaluation criteria, for example domain coverage, fitness for purpose or application, detection of bad practices and logical consistency. Recommended tools include reasoners for logical consistency checking and ontology validators like OOPS! (OntOlogy Pitfall Scanner!) (Poveda-Villalón et al. 2014) for detecting bad design practices. Depending on the evaluation results, the ontology development team may go back to the conceptualization or encoding activities in order to fix bugs or improve the current version of the ontology.
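Tools such as OOPS! and reasoners remain the recommended way to carry out evaluation. To illustrate the kind of check involved, the sketch below flags classes that lack an rdfs:label in each required language, similar in spirit to the missing-annotation pitfalls that OOPS! reports. It does not call OOPS!. It assumes the ontology has already been parsed into plain triples, and the example namespace is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# A triple is (subject, predicate, object); literals are (text, language) pairs.
Triple = tuple[str, str, object]


def classes_missing_labels(triples: list[Triple],
                           languages: tuple[str, ...] = ("en", "es")) -> dict[str, list[str]]:
    """Map each owl:Class to the required label languages it lacks."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    labelled: dict[str, set[str]] = {c: set() for c in classes}
    for s, p, o in triples:
        if p == RDFS_LABEL and s in labelled and isinstance(o, tuple):
            labelled[s].add(o[1])
    report = {}
    for cls, langs in labelled.items():
        missing = sorted(set(languages) - langs)
        if missing:
            report[cls] = missing
    return report


EX = "http://example.org/noise#"  # hypothetical namespace for the test data
sample = [
    (EX + "NoiseSensor", RDF_TYPE, OWL_CLASS),
    (EX + "NoiseSensor", RDFS_LABEL, ("noise sensor", "en")),
    (EX + "NoiseSensor", RDFS_LABEL, ("sensor de ruido", "es")),
    (EX + "AcousticEmitter", RDF_TYPE, OWL_CLASS),
    (EX + "AcousticEmitter", RDFS_LABEL, ("acoustic emitter", "en")),
]
print(classes_missing_labels(sample))
```

English and Spanish are used as defaults because the noise ontology annotates its elements in both languages (Sect. 3.2.1).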

3.2.1 Ontology implementation in the noise pollution ontology

Several terms were extracted from the output of the previous activity. To perform the ontology conceptualization activity, we used these terms to search for existing ontologies in LOV. As a result, we identified several well-known ontologies to represent these terms:

  1. The W3C Semantic Sensor Network (SSN) ontology (footnote 9), which allows describing sensors in terms of capabilities, measurement processes, observations and deployments (Compton et al. 2012).

  2. OWL-Time (footnote 10), an ontology of temporal concepts for describing the temporal properties of resources (Cox and Little 2017).

  3. The WGS84 ontology (footnote 11), for representing information about spatially-located things, such as latitude, longitude and altitude (Brickley 2006).

  4. The Ontology of Units of Measure and Related Concepts (OM) (footnote 12), covering the domain of quantities and units of measure (Rijgersberg et al. 2013). Since this ontology does not provide a definition for the decibel unit, we selected the term definition described in the DBpedia ontology.

The first conceptualization model is shown in Fig. 2. In this figure, the ontology in which each concept or relation is defined is indicated by a prefix. For example, the concept ssn:Sensor is defined in the “https://www.w3.org/2005/Incubator/ssn/ssnx/ssn#” namespace, denoted by the prefix “ssn”, while its local identifier is “Sensor”. The figure also shows that the ontology reuse activity was carried out in parallel with the ontology conceptualization, since the models to be reused were considered from the first draft.
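The prefix convention can be made concrete with a small expansion table. In this sketch, the ssn namespace is the one quoted above. The time and geo prefixes are the conventional ones for OWL-Time and WGS84, and noise uses the ontology base URI given in Sect. 3.3.1. The prefix names themselves are a labelling choice and are not part of the IRIs.

```python
PREFIXES = {
    # Namespace for "ssn" as quoted in the text above.
    "ssn": "https://www.w3.org/2005/Incubator/ssn/ssnx/ssn#",
    "time": "http://www.w3.org/2006/time#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "noise": "http://vocab.linkeddata.es/datosabiertos/def/medio-ambiente/contaminacion-acustica#",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'ssn:Sensor' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Inverse of expand(): use the longest matching namespace, else keep the IRI."""
    matches = [(p, ns) for p, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return iri
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(ns):]}"


print(expand("ssn:Sensor"))
print(compact(expand("noise:hasUnitOfMeasure")))
```

A prefix is only shorthand, so two documents may bind different prefixes to the same namespace and still refer to the same terms.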

Fig. 2

Initial conceptualization of the noise pollution ontology

The model depicted in Fig. 2 relates the reused ontologies, except for the class om:Unit_of_measure, because there is no object property linking it to the ssn:ObservationValue class. Because of this missing link, and because some concepts are not available in any ontology to be reused, we incrementally defined the missing elements until we obtained the final conceptualization model shown in Fig. 3. As can be observed, the property noise:hasUnitOfMeasure was created to link the above-mentioned classes. This property is needed to model the noise ontology requirements, as can be deduced from the CQ1 answer (Table 1): The detected noise level Ld during April in location with coordinates 40.42, -3.69, 648 is 67.4 dB. In this requirement, the value of the noise level is linked to the unit of measure in which it is expressed.
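Without noise:hasUnitOfMeasure, the CQ1 answer could return the number 67.4 but not the unit. The sketch below shows the link on a single ssn:ObservationValue node, using plain triples. The instance name ex:ValueLdApril, the ex:numericValue property and the dbr:Decibel unit IRI are hypothetical stand-ins, and the value property of the published ontology may differ.

```python
# Triples use prefixed names; "ex:" is a hypothetical namespace for instance data.
TRIPLES = [
    ("ex:ValueLdApril", "rdf:type", "ssn:ObservationValue"),
    ("ex:ValueLdApril", "ex:numericValue", 67.4),  # hypothetical value property
    ("ex:ValueLdApril", "noise:hasUnitOfMeasure", "dbr:Decibel"),  # assumed DBpedia unit IRI
]


def value_with_unit(triples, node):
    """Return (numeric value, unit) for an observation value node."""
    values = [o for s, p, o in triples if s == node and p == "ex:numericValue"]
    units = [o for s, p, o in triples if s == node and p == "noise:hasUnitOfMeasure"]
    if len(values) != 1 or len(units) != 1:
        raise LookupError(f"{node} needs exactly one value and one unit")
    return values[0], units[0]


print(value_with_unit(TRIPLES, "ex:ValueLdApril"))
```

Removing the noise:hasUnitOfMeasure triple makes the lookup fail. This mirrors why the property was added to the model.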

Once the final conceptualization model was defined, it was encoded in OWL. The ontology code is available in our GitHub repository, as explained in Sect. 3.1.1. An excerpt of the ontology Turtle code is shown in Listing 1.


Listing 1 Noise pollution ontology code excerpt in Turtle format

Fig. 3

Final conceptualization of the noise pollution ontology

Although the ontology elements contain annotations in English and Spanish, provided by rdfs:label and rdfs:comment, Listing 1 shows only the English comments for reasons of space and readability. Furthermore, we developed a SKOS (footnote 13) thesaurus to represent the types of acoustic emitters. We adopted this modeling approach because these emitters form a well-structured, closed list defined in the Ley del Ruido 37/2003, and SKOS is a well-known vocabulary for representing controlled vocabularies such as this one.
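A SKOS thesaurus of this kind consists of one skos:ConceptScheme plus one skos:Concept per emitter. The sketch below generates such a scheme in Turtle for the three emitter types named in Sect. 3.1.1. The namespace, the local names and the scheme name are hypothetical, and the labels are assumed to contain no double quotes. Spanish labels could be added in the same way with an @es tag.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical namespace and local names; the English labels follow the
# emitter examples given in Sect. 3.1.1.
EMITTER_NS = "http://example.org/acoustic-emitters#"
EMITTERS = {
    "Agglomeration": "agglomeration",
    "MajorRoad": "major road",
    "MajorAirport": "major airport",
}


def emitter_scheme_turtle(ns: str = EMITTER_NS,
                          emitters: dict[str, str] = EMITTERS) -> str:
    """Serialise the emitters as a SKOS concept scheme in Turtle."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix em: <{ns}> .",
        "",
        "em:AcousticEmitterScheme a skos:ConceptScheme .",
    ]
    for local, label in emitters.items():
        lines += [
            "",
            f"em:{local} a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            "    skos:inScheme em:AcousticEmitterScheme .",
        ]
    return "\n".join(lines) + "\n"


print(emitter_scheme_turtle())
```

Modelling emitters as concepts rather than as OWL classes keeps the closed list easy to extend or translate without changing the class hierarchy of the ontology.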

As shown in Fig. 4, we evaluated the ontology using OOPS!, which detected several pitfalls. We fixed the pitfalls related to our ontology, but not those related to the reused ontologies, since we have no authority over them. Only one critical pitfall was detected; it concerns two concepts of the SSN ontology that we do not reuse, so it does not represent a problem for our model. Finally, the important and minor pitfalls related to the reused ontologies are not crucial for the functionality of our ontology.

Fig. 4

Results of the noise pollution ontology evaluation performed in OOPS!

3.3 Ontology publication

The aim of the ontology publication activity is to make the ontology available online both as human-readable documentation and in a machine-readable format. Both versions of the ontology should be reachable from the ontology URI by means of content negotiation mechanisms. The machine-readable version is obtained during the implementation, whereas the human-oriented documentation has to be generated in a step prior to publication. Such documentation is usually composed of HTML pages describing the content of the ontology and may include diagrams and examples to improve the ontology’s readability and reusability.

3.3.1 Ontology publication of the noise pollution ontology

Although the URI strategy is defined during the encoding activity, we explain it in this subsection because these definitions must be kept in mind at publication time. We defined persistent URIs for the elements of the noise pollution ontology according to the best practices described in Tandy et al. (2017) and the Spanish Technical Interoperability Standard (Spain 2013). The base URI for all elements in the ontology is http://vocab.linkeddata.es/datosabiertos/def/medio-ambiente/contaminacion-acustica#. We also followed an upper camel case strategy to name classes and a lower camel case strategy for object properties, data properties and resources, as shown in Fig. 3.
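The naming strategy above can be sketched as a small URI-minting helper. It assumes English labels made of ASCII words, which is an assumption of the sketch (Spanish labels with accents would need transliteration first). The term-kind names are hypothetical. With these assumptions, the label "has unit of measure" yields the noise:hasUnitOfMeasure property used in the ontology.

```python
import re

BASE = "http://vocab.linkeddata.es/datosabiertos/def/medio-ambiente/contaminacion-acustica#"


def mint_uri(label: str, kind: str, base: str = BASE) -> str:
    """Build a term URI: UpperCamelCase for classes, lowerCamelCase otherwise."""
    words = re.findall(r"[A-Za-z0-9]+", label)  # assumes ASCII (English) labels
    if not words:
        raise ValueError(f"cannot derive a local name from {label!r}")
    upper = "".join(w[0].upper() + w[1:] for w in words)
    if kind == "class":
        local = upper
    elif kind in ("object_property", "data_property", "resource"):
        local = upper[0].lower() + upper[1:]
    else:
        raise ValueError(f"unknown term kind {kind!r}")
    return base + local


print(mint_uri("noise sensor", "class"))
print(mint_uri("has unit of measure", "object_property"))
```

Only the first letter of each word is changed, so acronyms such as "Ld" keep their original capitalisation.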

Once the ontology was encoded, we performed the tasks related to the publication activity. For this purpose we used OnToology (Alobaid et al. 2018), a web-based system that builds on top of Git-based environments and integrates a set of existing tools for documentation, evaluation and publication. These tools are Widoco (Garijo 2017) for generating the documentation in HTML format, AR2DTool (footnote 14) for generating diagrams, and OOPS! for evaluating the ontology. In addition, OnToology provides two alternatives for ontology publication: publishing the ontology with a permanent identifier using the https://w3id.org services, or downloading a bundle with all the files needed to publish the ontology on a server. For the noise ontology, the latter option was selected.

The noise pollution ontology is published on the Web and available under its URI http://vocab.linkeddata.es/datosabiertos/def/medio-ambiente/contaminacion-acustica, both as human-readable documentation and as a machine-readable file, using content negotiation. The human-readable documentation, generated with Widoco, is provided in English and Spanish. Figure 5 presents a screenshot of the English version of the ontology documentation in HTML format.
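With content negotiation, the same ontology URI returns HTML to a browser and RDF to a Linked Data client, depending on the HTTP Accept header. In practice this selection is usually configured in the web server that hosts the published files. The sketch below shows only the selection logic. The file names are hypothetical, q-values are read in a simplified way, and wildcards such as */* fall back to the HTML documentation.

```python
def negotiate(accept: str) -> str:
    """Pick the file to serve for the ontology URI from an HTTP Accept header."""
    offers = {  # hypothetical file names for each representation
        "text/html": "index.html",
        "application/rdf+xml": "ontology.owl",
        "text/turtle": "ontology.ttl",
    }
    best, best_q = "index.html", 0.0  # default: human-readable documentation
    for part in accept.split(","):
        media, *params = [x.strip() for x in part.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Keep the first offered media type with the highest preference.
        if media in offers and q > best_q:
            best, best_q = offers[media], q
    return best


print(negotiate("text/turtle"))
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```

A q-value of 0 marks a type as not acceptable, which is why the comparison uses a strict "greater than".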

Fig. 5

English version of the noise pollution ontology HTML documentation

3.4 Ontology maintenance

The goal of this activity is to update the ontology after its last release. Updates may be needed in different situations, for example when new requirements are identified, bugs are detected or improvements are suggested. This activity may be triggered during the ontology development process, in which case the implementation of new requirements or bug fixes is scheduled in one or more sprints. It may also be triggered after the development process, in which case a new version or revision of the ontology should be generated.

3.4.1 Ontology maintenance in the noise pollution ontology

To support the maintenance of the ontology, we used the GitHub issue tracker, which keeps track of the list of issues. When an ontology developer, domain expert or user wants to propose a new ontology requirement, a change, an improvement, etc., they should create a new issue in the GitHub repository.

All open issues are discussed by the ontology development team and, if the issue is agreed upon, the proposal is implemented in the ontology. For example, in our GitHub repository tracker (footnote 15), an issue proposes updating the ontology to the new version of the SSN ontology, namely SSN/SOSA (footnote 16). This proposal has been agreed upon by the development team and is currently being implemented in a new ontology version.
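Issues can also be opened programmatically, for example by a data publisher reporting a missing term. The sketch below builds, but does not send, a request to the GitHub REST API endpoint for creating issues (POST /repos/{owner}/{repo}/issues). The owner, repository, title, body and token are placeholders, not the project's actual repository.

```python
import json
import urllib.request


def build_issue_request(owner: str, repo: str, title: str, body: str,
                        token: str, labels: tuple[str, ...] = ()) -> urllib.request.Request:
    """Prepare a POST /repos/{owner}/{repo}/issues call to the GitHub REST API."""
    payload = {"title": title, "body": body}
    if labels:
        payload["labels"] = list(labels)
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_issue_request(
    "example-org", "noise-ontology",  # placeholder owner and repository
    title="Update the ontology to SSN/SOSA",
    body="Proposal: align the noise ontology with the W3C SSN/SOSA ontology.",
    token="YOUR_TOKEN",  # placeholder personal access token
    labels=("enhancement",),
)
# urllib.request.urlopen(request) would actually create the issue.
```

Issues created this way still go through the same discussion and agreement step by the development team before any change is implemented.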

4 Noise pollution ontology in practice

In this section, we provide an example of how users can instantiate the noise pollution ontology. The example represents a scenario in which a sensor collects measurements of the average daytime noise level (Ld) during April 2017 at the Paseo de Recoletos boulevard in Madrid. It is depicted in Fig. 6, where the prefix “ex” represents the namespace of the example.

Fig. 6

Example of instantiation of an observation

As the SSN ontology is reused, it is important to bear in mind that the observation concept (ssn:Observation) represents the measurement of a specific property (ssn:Property) of a feature of interest (ssn:FeatureOfInterest). Accordingly, the observation instance ex:NoiseSensorObservation_LdInterval201704 collects the average daytime noise level (Ld), instantiated as ex:LdESP, as the observed property of the feature of interest ex:FeatureOfInterestPaseoDeRecoletos (Paseo de Recoletos).

This observation has been performed by ex:NoiseSensor, and its result is represented as a sensor output instantiated as ex:NoiseSensorSensorOutput_ObservationLdInterval201704. Additionally, the observation value is represented by the instance ex:NoiseSensorObservationValue_ObservationLdInterval201704, which specifies its decimal value, unit of measure and weighting method. Finally, the time interval in which this observation was performed is modeled according to the OWL-Time ontology specification and is instantiated as ex:Interval201704, which corresponds to April 2017.
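The chain of individuals described above can be traversed programmatically. The sketch below encodes the individuals from the example as plain triples. The linking properties (ssn:observedBy, ssn:observedProperty, ssn:featureOfInterest, ssn:observationResult, ssn:hasValue, ssn:observationSamplingTime) are assumed from the SSN incubator ontology rather than read from Fig. 6. The numeric value, unit and weighting method are omitted because the text does not give them.

```python
# Individuals from the example; the linking properties are assumed SSN
# incubator terms and may differ from those shown in Fig. 6.
OBS = "ex:NoiseSensorObservation_LdInterval201704"
OUTPUT = "ex:NoiseSensorSensorOutput_ObservationLdInterval201704"
VALUE = "ex:NoiseSensorObservationValue_ObservationLdInterval201704"

EXAMPLE = [
    (OBS, "rdf:type", "ssn:Observation"),
    (OBS, "ssn:observedBy", "ex:NoiseSensor"),
    (OBS, "ssn:observedProperty", "ex:LdESP"),
    (OBS, "ssn:featureOfInterest", "ex:FeatureOfInterestPaseoDeRecoletos"),
    (OBS, "ssn:observationResult", OUTPUT),
    (OBS, "ssn:observationSamplingTime", "ex:Interval201704"),
    (OUTPUT, "rdf:type", "ssn:SensorOutput"),
    (OUTPUT, "ssn:hasValue", VALUE),
    (VALUE, "rdf:type", "ssn:ObservationValue"),
    ("ex:Interval201704", "rdf:type", "time:Interval"),
]


def one(triples, subject, predicate):
    """Return the single object of (subject, predicate, ?o), or fail loudly."""
    objs = [o for s, p, o in triples if s == subject and p == predicate]
    if len(objs) != 1:
        raise LookupError(f"expected one {predicate} for {subject}, got {len(objs)}")
    return objs[0]


def describe(triples, observation):
    """Follow the observation -> sensor output -> observation value chain."""
    output = one(triples, observation, "ssn:observationResult")
    return {
        "sensor": one(triples, observation, "ssn:observedBy"),
        "property": one(triples, observation, "ssn:observedProperty"),
        "feature": one(triples, observation, "ssn:featureOfInterest"),
        "time": one(triples, observation, "ssn:observationSamplingTime"),
        "value_node": one(triples, output, "ssn:hasValue"),
    }


print(describe(EXAMPLE, OBS))
```

The value node returned at the end of the chain is where the decimal value, unit of measure and weighting method would be attached.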

The code of this example is available in the resource documentation folder (footnote 17) of our GitHub repository. Additional examples are provided in the HTML documentation of the noise pollution ontology.

5 Discussion

Methodologies for building ontologies are designed to guide developers through the whole process, with the aim of turning the art of building ontologies into an engineering process. However, some activities, such as conceptualization, still require the developer’s know-how and skills. In other cases, the applicability of a methodology depends on the quantity and quality of the resources available to the project. The development of the noise ontology has not been exempt from these situations. The rest of this section discusses some insights into the ontology development process and future issues.

Firstly, during the ontology requirements specification activity, the Spanish open data portals were inspected in order to understand the noise datasets published by city councils. However, most of these portals do not provide the documentation needed to understand the datasets accurately. Thus, collaboration with domain experts becomes crucial for a correct understanding of the data. While it is advisable to involve domain experts in any ontology development project, it should be noted that domain experts’ time is a costly resource that not all projects can afford.

Secondly, several ontologies were reused during the ontology implementation activity; however, searching for and selecting potential ontologies to be reused is by no means a trivial task. Although available ontology registries and indexes speed up the search, analyzing the retrieved ontologies and selecting among them is still time consuming. This selection relies heavily on the experience of the ontology engineer and involves evaluating criteria such as the quality, trustworthiness and availability of the ontologies. In this regard, the development of better tools that facilitate the search and selection process is an open issue in the ontology engineering field.

Thirdly, changes to the ontology are allowed during the ontology maintenance activity. Sometimes these changes originate from external changes in reused ontologies. In the case of the noise ontology, as explained in Sect. 3.4.1, the new SSN/SOSA ontology (Haller et al. 2018) will give rise to a new noise ontology version. Replacing the original SSN ontology with its newer SSN/SOSA version would impact the presented noise ontology mainly in two ways: (a) the noise ontology will become lighter and (b) most of the pitfalls detected in the noise ontology evaluation (Sect. 3.2.1) will not appear in the new version. Both implications are due to the fact that the new version of SSN removed the dependency on the DOLCE Ultralite ontology.
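Part of such a migration is mechanical: instance data that uses old SSN terms has to be rewritten to their SOSA counterparts. The sketch below illustrates this with a small term table. Here "ssn:" denotes the old incubator namespace and "sosa:" denotes http://www.w3.org/ns/sosa/. The table covers only a few terms with an evident one-to-one counterpart and is an assumption of this sketch, not the mapping adopted in the new noise ontology version. Old terms without an entry, such as ssn:hasValue, are reported for manual review instead of being guessed.

```python
# Old-SSN (incubator) terms and assumed SOSA counterparts.
SSN_TO_SOSA = {
    "ssn:Sensor": "sosa:Sensor",
    "ssn:Observation": "sosa:Observation",
    "ssn:FeatureOfInterest": "sosa:FeatureOfInterest",
    "ssn:observedBy": "sosa:madeBySensor",
    "ssn:observedProperty": "sosa:observedProperty",
    "ssn:featureOfInterest": "sosa:hasFeatureOfInterest",
    "ssn:observationResult": "sosa:hasResult",
}


def migrate(triples):
    """Rewrite mapped terms; return (new triples, old-SSN terms left unmapped)."""
    unmapped = set()

    def convert(term):
        if isinstance(term, str) and term.startswith("ssn:"):
            if term in SSN_TO_SOSA:
                return SSN_TO_SOSA[term]
            unmapped.add(term)  # no agreed counterpart: keep it and report it
        return term

    new = [tuple(convert(t) for t in triple) for triple in triples]
    return new, unmapped


data = [
    ("ex:Obs", "ssn:observedBy", "ex:Sensor"),
    ("ex:Out", "ssn:hasValue", "ex:Value"),
    ("ex:Obs", "rdf:type", "ssn:Observation"),
]
new_data, left = migrate(data)
print(new_data)
print(left)
```

Reporting unmapped terms rather than guessing keeps modelling decisions, such as how to represent the former sensor output, with the development team.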

Finally, it is also important to consider how to keep data consistent across future versions of the ontology. One aspect to bear in mind is ensuring backward compatibility when developing a new ontology version. Backward compatibility means that, although the ontology changes, the new version remains compatible with the previous one (Ding et al. 2001), so data are interpreted in the same way under the new ontology as under the old one. In addition, data may include metadata stating the ontology version used to represent them. It may also be helpful to use a version control system that allows access to the dataset version valid at a certain time, that is, when the old ontology was operational. However, the lack of practical guidelines and best practices for ontology evolution in different scenarios hinders the reduction of the quality issues introduced by changes while evolving an ontology (Mihindukulasooriya et al. 2017). Furthermore, it remains necessary to integrate the existing solutions and tools in order to achieve a platform that could help in the process of ontology evolution (Zablith et al. 2015).
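A necessary, but not sufficient, condition for backward compatibility is that every ontology term used by existing data is still declared in the new version. The sketch below checks only this syntactic condition and does not detect changes in the meaning of retained terms. The version IRI, the declared terms and the noise:oldProperty term in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyVersion:
    version_iri: str       # e.g. the value of owl:versionIRI
    terms: frozenset[str]  # classes and properties declared in this version


def terms_used(triples, prefix: str = "noise:") -> set[str]:
    """Ontology terms a dataset depends on: its predicates and rdf:type classes."""
    used = set()
    for _, p, o in triples:
        if p.startswith(prefix):
            used.add(p)
        if p == "rdf:type" and isinstance(o, str) and o.startswith(prefix):
            used.add(o)
    return used


def dangling_terms(triples, new_version: OntologyVersion) -> set[str]:
    """Terms used by the data that the new version no longer declares."""
    return terms_used(triples) - new_version.terms


v2 = OntologyVersion(
    "http://example.org/noise/2.0",  # hypothetical version IRI
    frozenset({"noise:NoiseSensor", "noise:hasUnitOfMeasure"}),
)
data = [
    ("ex:S", "rdf:type", "noise:NoiseSensor"),
    ("ex:V", "noise:hasUnitOfMeasure", "dbr:Decibel"),
    ("ex:V", "noise:oldProperty", "x"),  # hypothetical term dropped in v2
]
print(dangling_terms(data, v2))
```

An empty result does not prove compatibility, but a non-empty one shows that existing data would no longer be fully described by the new version.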

6 Conclusions and future work

In this work, we have described the process followed to develop an ontology for representing the environmental noise pollution domain, and we have presented the resulting noise pollution ontology. Initially, this ontology aimed at representing noise data from open data sources in Spain; however, we modeled it in a general way so that it can also be reused in other territories and covers important concepts related to noise sensor devices.

The noise pollution ontology represents a step forward with regard to state-of-the-art ontologies, since it allows representing several noise levels, specific details about noise sensors (e.g. weighting methods, sensor classes) and different types of acoustic emitters. Important aspects of the noise pollution ontology are its alignment with international standards and directives and its reuse of well-known ontologies. As a result, it is expected to achieve broader interoperability than ad-hoc models for representing such data. Finally, reusing well-known, widely adopted and tested ontologies helped us during the development of our ontology, since it reduced time and effort compared with developing those models from scratch.

In addition, this ontology has been included as an example in the open data guide (FEMP 2017) published by the Spanish Federation of Municipalities and Provinces (FEMP), which aims to present a work itinerary on the opening of data and its reuse for all Spanish local administrations.

Finally, as ongoing work, we are developing a new version of the noise pollution ontology based on the SSN/SOSA ontology. We also plan to revise the ontology to determine possible extensions for new requirements or use cases, for example, to cover the representation of noise maps.