Abstract
Air pollution is one of the most serious problems in the world, so real-time monitoring of air quality is considered a necessity. Internet of Things (IoT) devices such as sensors enable real-time air quality monitoring: they continuously produce sensed data as data streams and transmit these data to a centralized server. Raw sensor stream data is useless unless properly annotated. Hence, researchers proposed the Semantic Sensor Web (SSW), which combines the Sensor Web with Semantic Web technologies. However, how to advance techniques for integrating semantic annotations in real time is still an open issue. This research focuses on real-time integration of semantics into heterogeneous sensor stream data with context in the IoT. To this end, an IoT real-time air quality monitoring system and different semantic annotations are developed for sensor stream data in the format of the Sensor Observation Service (SOS).
1 Introduction
The Internet of Things (IoT) is a new revolution of the Internet. It refers to the billions of physical devices around the world that are now connected to the Internet, all collecting and sharing data. Sensors are one of the main components that enable the IoT; they send their observations as stream data.
Furthermore, sensor data are made available on the web through the Sensor Web. Incorporating Semantic Web technologies into the Sensor Web creates the Semantic Sensor Web. In this way, sensor data streams can be annotated with semantics by providing machine-interpretable descriptions of what the data represents, where it originates from, how it can be related to its surroundings, who is providing it, and what its quality, technical, and non-technical attributes are [3]. The real-time integration of semantics into sensor data as dynamic data is defined as real-time semantic annotation, while the integration of semantics into sensor data that are first stored in a repository (data store) as static data is defined as non-real-time semantic annotation [16].
Organizations like the Open Geospatial Consortium (OGC) and the World Wide Web Consortium (W3C) have proposed several standards for sensor data. The OGC defines a standardization for the Sensor Web named Sensor Web Enablement (SWE). It is a framework and a set of standards that allow the exploitation of sensors and sets of sensors connected to a communication network. It is founded on the concept of the “Web Sensor”, using standard protocols and application interfaces [13].
This paper is an extended version of the conference proceedings paper [15]. In [15], we investigated how to integrate semantic annotations into sensor stream data. In particular, we discussed annotation techniques for real-time integration of semantics into heterogeneous sensor observation data and sensor metadata with context in the IoT. In contrast to [15], in this paper the system architecture is advanced by adding new features such as archival data streams, an ontology, OGC standards, and XLink-annotated SOS. The implemented system is also advanced, as shown in Fig. 7.
The paper is organized as follows: Sect. 2 reviews the literature on semantic annotation of sensor stream data. Section 3 gives an overview of the differences between sensor streaming and traditional streaming, of semantic annotation concepts, and of the selection of technologies and standards for semantic annotations. An overview of the system architecture is presented in Sect. 4, while Sect. 5 presents the implemented system, including the received sensor data format, the integration of semantic annotations into the sensor data, and the system outputs. Finally, Sect. 6 concludes the paper and identifies some future perspectives for the integration of semantics into sensor stream data.
2 Literature Review
Several recent studies have investigated the semantic enrichment of sensor stream data. The authors in [1] brought together the Semantic Web and data mining in the context of the IoT, with a focus on sensors as interconnected devices, and concluded that practical data mining applications can be built using the available real-world sensor ontologies, query mechanisms, and linked sensor data. The SSW is described in [17] as a synthesis of sensor data and semantic metadata. It represents an approach by the OGC and the Semantic Web Activity of the W3C to provide meaning for sensor data. The construction of a Semantic Sensor Observation Service (SemSOS) based on the SWE standards is discussed in [7]; it adds semantic annotations to sensor data and uses ontology models to reason over sensor observations.
An extension of the SWE framework to support standardized access to sensor data is described in [11]. As future work, the authors list the extension of the SOS server with semantics, since the lack of a semantically rich mechanism is seen as a significant issue that makes it hard to explore related concepts, subgroups of sensor types, or other dependencies between the sensors and the collected data.
3 Background
Currently, billions of interconnected IoT devices continuously produce sensed data as data streams and transmit these data to a centralized server. Due to the dramatic increase in streaming data, its management and exploitation have become increasingly important, and it has become difficult to process sensor data streams and integrate semantics into them in real time. Therefore, the selection of technologies and standards for developing techniques for real-time integration of semantics into heterogeneous sensor observation data and sensor metadata with context in the IoT is highly important. The proposed real-time semantic annotation system utilizes Spark Streaming, Apache Kafka, the Apache Cassandra database, and standards such as the OGC Sensor Web Enablement standards, which are discussed below.
3.1 Sensor Streaming Versus Traditional Streaming
The distinction between traditional data stream processing and sensor data stream processing is important because sensor data streams have their own features [5]:
- Sensor data streams are only samples of the entire population, while traditional streams such as network streams, web logs, stock market data, etc. represent the entire population of the data.
- Sensor data streams are considered noisy compared with traditional streaming data, which is regarded as exact and error-free. Environmental effects on the deployed sensor networks can also distort the sensed values, whereas web logs and web click streams are considered accurate compared with data generated by sensor networks.
- Sensor data streams are typically of moderate size compared with the overwhelming storage and processing demands of the huge data volumes in traditional streams.
3.2 Semantic Annotations
IoT applications are enabled by heterogeneous sensors, which send observational data, referred to as sensor stream data, to a remote server. Raw sensor stream data is useless unless properly annotated. Therefore, researchers proposed the Semantic Sensor Web (SSW), which combines the Sensor Web with Semantic Web technologies. According to the study in [14], the majority of the reviewed publications adopt the industry standards proposed by organizations like the OGC and the W3C, such as SWE, and techniques that can be used for annotating sensor data, such as Resource Description Framework in attributes (RDFa), XML Linking Language (XLink), and Semantic Annotations for WSDL and XML Schema (SAWSDL) [18]. However, how to advance techniques for integrating semantic annotations in real time is still an open issue that should be addressed.
3.3 Technologies
The proposed system for real-time integration of semantic annotations into sensor stream data for the IoT utilizes:
- Spark Streaming: Several stream data processing systems, including Spark Streaming, Storm, Google Data Flow, and Flink, have emerged to support real-time analytics for streaming data sets [10]. Most studies conclude that Spark Streaming works best, with high throughput, when the incoming volume is huge [6]. Therefore, we have chosen Spark Streaming to develop our system for real-time integration of semantic annotations into sensor stream data. Spark Streaming is an extension of Apache Spark that makes it possible to build scalable, fault-tolerant IoT applications for real-time processing of sensor stream data. It can receive data from different input sources such as Apache Kafka, TCP sockets, Flume, Kinesis, Hadoop Distributed File System (HDFS), or Twitter, and the data can be processed using complex algorithms expressed with high-level functions like map, join, reduce, and window. Finally, the processed streaming data can be published in IoT real-time applications or pushed out to databases or file systems.
- Apache Kafka: A distributed streaming platform with capabilities to publish and subscribe to streams of records (similar to a message queue or enterprise messaging system), to store streams of records in a fault-tolerant, durable way, and to process streams of records as they occur. Kafka is generally used for building real-time streaming data pipelines that reliably move data between systems or applications [9]. In our system, Kafka is used as middleware between the sensor stream data and Spark Streaming.
- Apache Cassandra database: A free and open-source distributed store for structured data that scales out on cheap commodity hardware or cloud infrastructure, which makes it a suitable platform for mission-critical data. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Spark Streaming interacts well with the Cassandra database. Therefore, in our system, the sensor stream data and their semantic annotations processed by Spark Streaming are stored in the Cassandra database.
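To make the map and reduce style of stream processing mentioned above more concrete, the following is a minimal sketch in plain Python. It does not use the Spark Streaming API; it only imitates the idea of cutting a stream into small batches and applying map and reduce steps to each batch. The record layout (`pm25` key), the batch size, and the function names are assumptions made for illustration.

```python
from itertools import islice
from statistics import mean


def micro_batches(stream, batch_size):
    """Group an iterable of records into consecutive lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Map each record to its PM2.5 value, then reduce the batch to one summary."""
    values = [record["pm25"] for record in batch]              # map step
    return {"count": len(values), "mean_pm25": mean(values)}   # reduce step


# Hypothetical readings standing in for records consumed from a Kafka topic.
readings = [{"pm25": v} for v in (40, 60, 80, 100, 120)]
summaries = [process_batch(b) for b in micro_batches(readings, batch_size=2)]
```

In a real deployment the batches would be formed by the streaming engine over time intervals rather than by record count; the count-based grouping here only keeps the sketch deterministic.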
3.4 Standards
The OGC defines a standardization for the Sensor Web named Sensor Web Enablement (SWE), which is divided into two parts [12]:
- SWE Information Model: Comprises conceptual language encodings that make sensor observations visible on the Internet. The SWE Information Model includes the following specifications: Sensor Model Language (SensorML), Observations and Measurements (O&M), and Transducer Model Language (TransducerML).
- SWE Service Model: A set of Web Service specifications that allow a client to search for and find the required information. The SWE Service Model includes the following specifications: Sensor Observation Service (SOS), Sensor Alert Service (SAS), Sensor Planning Service (SPS), and Web Notification Service (WNS).
In this paper, SOS O&M is used to encode the semantic annotations and the data gathered by sensors, as discussed in Sect. 5.2.
4 An Overview of the System Architecture
Figure 1 presents an overview of the system architecture for real-time integration of semantics into heterogeneous sensor stream data with context in the Internet of Things. As mentioned above, the proposed real-time semantic annotation system utilizes Apache Kafka, Spark Streaming, the Apache Cassandra database, and the SOS O&M standards.
The IoT-based sensor device wirelessly transmits the heterogeneous sensor stream data and thereby serves as the “producer” for the Kafka server. The “producer” client publishes streams of data to Kafka “topics”, which are distributed across one or more cluster nodes/servers called “brokers”. The streams of data published to Kafka are then processed by Apache Spark Streaming in parallel and in real time.
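The producer, topic, and broker roles described above can be illustrated with a small in-memory stand-in. This is a toy sketch and not the Kafka client API: the class `ToyBroker`, its methods, and the topic name `air-quality` are hypothetical, and the real system would use a Kafka client library against a running broker.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def publish(self, topic, record):
        """Append a record to the topic log and return its offset."""
        self._logs[topic].append(record)
        return len(self._logs[topic]) - 1

    def consume(self, topic, offset=0):
        """Return all records of a topic from the given offset onwards."""
        return self._logs[topic][offset:]


broker = ToyBroker()
# The sensor side acts as the producer and publishes raw readings.
broker.publish("air-quality", '{"station": "A", "pm25": 74}')
broker.publish("air-quality", '{"station": "B", "pm25": 31}')
# The processing side reads from the offset it has not yet seen.
unread = broker.consume("air-quality", offset=1)
```

Keeping an append-only log per topic and letting each consumer track its own offset is what allows several consumers to read the same stream independently.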
The Kafka server is used to receive sensor data streams in various formats (e.g. text, binary, JSON, XML) and to transform them into a particular format that will be processed by Spark Streaming.
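As a rough illustration of transforming several input formats into one common format, the sketch below normalizes JSON, XML, and plain-text payloads into the same Python dictionary. The field names (`parameter`, `value`), the XML layout, and the `name = value` text convention are assumptions chosen for the example; the paper does not specify the actual intermediate format.

```python
import json
import xml.etree.ElementTree as ET


def normalize(payload, fmt):
    """Convert a raw payload into a dict with 'parameter' and 'value' keys."""
    if fmt == "json":
        data = json.loads(payload)
        return {"parameter": data["parameter"], "value": float(data["value"])}
    if fmt == "xml":
        root = ET.fromstring(payload)
        return {"parameter": root.findtext("parameter"),
                "value": float(root.findtext("value"))}
    if fmt == "text":
        # Assumed convention: "<name> = <number>"
        name, raw_value = payload.split("=", 1)
        return {"parameter": name.strip(), "value": float(raw_value)}
    raise ValueError(f"unsupported format: {fmt}")


from_json = normalize('{"parameter": "t", "value": -3.8}', "json")
from_xml = normalize(
    "<reading><parameter>t</parameter><value>-3.8</value></reading>", "xml")
from_text = normalize("t = -3.8", "text")
```

All three calls yield the same dictionary, so the downstream processing step only has to handle a single record shape.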
Spark Streaming enables the real-time integration of semantics into heterogeneous sensor stream data with context in the IoT. It uses sensor metadata, archival data streams, data stream mining, and association rules to add semantic annotations with concept definitions from ontologies or other semantic sources, which makes the sensor data and metadata elements understandable. The semantic annotations are implemented in SOS O&M by using techniques such as XLink (without including XPath) and Embedded (only a single scalar value of the semantic annotation) to add annotations to XML files. These annotations can point to extra sources of information (e.g. a file) or to a Uniform Resource Name (URN).
The sensor stream data enriched with the semantic annotations are stored in the Cassandra database and displayed in the IoT real-time monitoring system. It is worth mentioning that Spark Streaming processes the sensor data streams in the format of OGC standards like SWE, specifically version 2.0 of the SOS standard, to encode the semantic annotations and the data gathered by sensors [4].
The detailed description is presented in Sect. 5.2 where an example of integration of semantic annotations into the sensor stream data with context in the IoT is given.
5 System Implementation
A web-based IoT real-time air quality monitoring system is developed to visualize sensor stream data and their semantic annotations. Sensor data of the Hydrometeorological Institute of Kosovo (HMIK) are used, obtained through the World Air Quality Index API (AQI API). The AQI API can be used for advanced programmatic integration, such as: access to more than 11000 station-level and 1000 city-level data sets, station names and coordinates, search for a station by name, geo-location queries based on latitude/longitude, the individual Air Quality Index (AQI) for all pollutants, current weather conditions, etc. [2].
5.1 Input Sensor Stream Data
The system receives raw sensor stream data from the AQI API in JSON format, as presented in Fig. 2, which supports real-time measurement of the following parameters:
- Carbon Monoxide (co),
- Humidity (h),
- Nitrogen Dioxide (no2),
- Ozone (o3),
- Pressure (p),
- PM10 (pm10),
- PM2.5 (pm25),
- Sulphur Dioxide (so2),
- Temperature (t),
- Wind (w), and
- Water Gauge (wg).
As shown in Fig. 2, the JSON data also contains attributes such as: data (station data: idx - unique id of the city monitoring station, aqi - real-time air quality information, time - measurement time information, with s - local measurement time and tz - station time zone), city (information about the monitoring station: name - name of the monitoring station, geo - latitude/longitude of the monitoring station, and url - URL for the attribution link), attributions (EPA attribution for the station), and iaqi (individual AQI measurements: pm25 - entry for PM2.5, v - individual AQI value for PM2.5).
Data are received from the sensors every 6 min through the AQI API and are represented in the corresponding numerical formats, e.g. −3.8 (°C) for the temperature parameter.
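The attribute layout described above can be read with a few lines of Python. The payload below is a hypothetical example with made-up values, and the exact nesting of the keys (for instance `city` and `iaqi` inside `data`) is an assumption based on the attribute description, not a copy of Fig. 2.

```python
import json

# Hypothetical payload; station name, coordinates, and values are invented.
sample = """
{"data": {"idx": 1, "aqi": 74,
          "time": {"s": "2020-02-20 10:00:00", "tz": "+01:00"},
          "city": {"name": "Example Station",
                   "geo": [42.66, 21.16],
                   "url": "https://example.org/station"},
          "attributions": [{"name": "Example Agency",
                            "url": "https://example.org"}],
          "iaqi": {"pm25": {"v": 74}, "pm10": {"v": 35}, "t": {"v": -3.8}}}}
"""


def extract_readings(raw):
    """Flatten the nested payload into station, time, and parameter values."""
    data = json.loads(raw)["data"]
    return {
        "station": data["city"]["name"],
        "measured_at": data["time"]["s"],
        "values": {name: entry["v"] for name, entry in data["iaqi"].items()},
    }


observation = extract_readings(sample)
```

The resulting `values` mapping uses the same short parameter codes as the list above (`pm25`, `pm10`, `t`, and so on), which keeps later annotation steps independent of the original JSON nesting.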
5.2 Processing Sensor Stream Data by Integrating Semantic Annotations
In our system, an ontology named ‘onto-core.owl’ is created (see Fig. 3). In it, different semantic annotations for sensor stream data are developed, such as:
- #AIQ_Index,
- #Air_Pollution_Level, and
- #Health_Implications
#AIQ_Index annotation: an index for reporting daily air quality that tells how clean or polluted the air is. The United States Environmental Protection Agency (EPA) calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. The AQI values range from 0 to 500.
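According to the EPA, the higher the AQI value, the greater the level of air pollution and the greater the health concern. The overall AQI is the maximum of all individual AQI values, as presented in Eq. 1:
$$\mathrm{AQI} = \max_{p \in P} \mathrm{AQI}_p \qquad (1)$$
where $P$ is the set of monitored pollutants and $\mathrm{AQI}_p$ is the individual AQI of pollutant $p$.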
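The rule of taking the maximum of all individual AQI values can be sketched as follows. The set of keys treated as pollutants and the function name are assumptions for illustration; weather parameters such as temperature are ignored because they do not carry an individual AQI.

```python
# Keys assumed to carry an individual AQI (particle pollution is split
# into PM2.5 and PM10, as in the parameter list of the input data).
POLLUTANTS = ("pm25", "pm10", "o3", "no2", "so2", "co")


def overall_aqi(individual_aqi):
    """Return (AQI, dominant pollutant) as the maximum individual AQI."""
    candidates = {p: v for p, v in individual_aqi.items() if p in POLLUTANTS}
    if not candidates:
        raise ValueError("no pollutant values available")
    dominant = max(candidates, key=candidates.get)
    return candidates[dominant], dominant


aqi, dominant = overall_aqi({"pm25": 74, "pm10": 35, "o3": 12, "t": -3.8})
```

Returning the dominant pollutant alongside the value is optional, but it is often useful when the result is later shown to users.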
#Air_Pollution_Level annotation: based on the AQI value, the air quality is assigned to one of six ‘Air Quality Index Levels of Health Concern’ categories:
- Good (AQI is 0 to 50)
- Moderate (AQI is 51 to 100)
- Unhealthy for Sensitive Groups (AQI is 101 to 150)
- Unhealthy (AQI is 151 to 200)
- Very Unhealthy (AQI is 201 to 300)
- Hazardous (AQI is 301 to 500)
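The six categories above amount to a simple range lookup, sketched below. The table of upper bounds is taken directly from the list; the function name and the error handling for out-of-range values are assumptions.

```python
# Upper bound of each AQI range and the corresponding category label.
LEVELS = (
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
)


def air_pollution_level(aqi):
    """Map an AQI value in the range 0..500 to its category label."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI out of range: {aqi}")
    for upper_bound, label in LEVELS:
        if aqi <= upper_bound:
            return label


level = air_pollution_level(74)
```

Because only upper bounds are compared, a non-integer value between two ranges (for example 50.5) falls into the higher category; whether that is desired depends on how the AQI values are rounded upstream.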
#Health_Implications annotation: each of the six categories described above corresponds to a different level of health concern, and the #Health_Implications annotation explains what each level means. For example, the “Unhealthy for Sensitive Groups” category means: ‘Although general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at a greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air.’, and the “Moderate” category means: ‘Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.’
The annotations described above are defined in the ontology named ‘ont-core’.
Having described the different types of semantic annotations for sensor stream data, we now present the semantic annotation process.
The sensor stream data may arrive at the Kafka server in different formats (JSON in our case), and the server transforms them into a specific format that will be processed by Spark Streaming. After that, based on the measured values, Spark Streaming semantically annotates the sensor data stream and converts it into the SOS O&M format. A fragment of an example of semantic annotations integrated into the SOS O&M format by using techniques like XLink and Embedded is presented in Fig. 4.
An SOS O&M observation document comprises zero or more observationData entries, each of which stores an instance of an observation. The common observation properties are presented below (the prefix gml indicates that the element is defined in OGC 07-033, while the prefix om indicates that the element is defined in OGC 10-025r1) [8]:
- gml:identifier (mandatory): identifies or refers to a specific observation.
- om:phenomenonTime (mandatory): describes the time instant or time period for which the observation contains sensor data.
- om:resultTime (mandatory): provides the time when the result became available (often this is identical to the phenomenonTime).
- om:procedure (mandatory): the identifier of the sensor instance that has generated the observation.
- om:observedProperty (mandatory): the identifier of the phenomenon that was observed.
- om:featureOfInterest (mandatory): an identifier of the geometric feature (e.g. sensor station) to which the observation is associated.
- om:result (mandatory): the observed value; the type of the result is restricted to the types shown in Table 1.
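The observation properties listed above can be assembled into an XML fragment with Python's standard library, as sketched below. This is an illustrative fragment, not a schema-validated SOS response and not a reproduction of Fig. 4. The namespace URIs are the published ones for O&M 2.0, GML 3.2, and XLink, whereas all identifiers, URNs, the ontology IRI, and in particular the placement of the semantic annotation as an `xlink:href` on `om:result` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "om": "http://www.opengis.net/om/2.0",
    "gml": "http://www.opengis.net/gml/3.2",
    "xlink": "http://www.w3.org/1999/xlink",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a namespace-qualified tag or attribute name."""
    return f"{{{NS[prefix]}}}{local}"


def build_observation(obs_id, time, procedure, observed_property,
                      feature, value, uom, annotation_iri):
    """Create an om:OM_Observation element with the seven listed properties."""
    obs = ET.Element(q("om", "OM_Observation"), {q("gml", "id"): obs_id})
    identifier = ET.SubElement(obs, q("gml", "identifier"),
                               {"codeSpace": "urn:example:observations"})
    identifier.text = obs_id
    phenomenon = ET.SubElement(obs, q("om", "phenomenonTime"))
    instant = ET.SubElement(phenomenon, q("gml", "TimeInstant"),
                            {q("gml", "id"): f"{obs_id}-time"})
    ET.SubElement(instant, q("gml", "timePosition")).text = time
    # The result time reuses the phenomenon time through a local reference.
    ET.SubElement(obs, q("om", "resultTime"),
                  {q("xlink", "href"): f"#{obs_id}-time"})
    ET.SubElement(obs, q("om", "procedure"), {q("xlink", "href"): procedure})
    ET.SubElement(obs, q("om", "observedProperty"),
                  {q("xlink", "href"): observed_property})
    ET.SubElement(obs, q("om", "featureOfInterest"),
                  {q("xlink", "href"): feature})
    # Assumption: the XLink annotation points at an ontology concept.
    result = ET.SubElement(obs, q("om", "result"),
                           {"uom": uom, q("xlink", "href"): annotation_iri})
    result.text = str(value)
    return obs


observation = build_observation(
    obs_id="obs-1",
    time="2020-02-20T10:00:00+01:00",
    procedure="urn:example:sensor:pm25-1",
    observed_property="urn:example:property:pm25",
    feature="urn:example:station:1",
    value=74,
    uom="aqi",
    annotation_iri="http://example.org/onto-core#Moderate",
)
xml_text = ET.tostring(observation, encoding="unicode")
```

Using `xlink:href` keeps the observation compact, since the annotation is only a pointer; an embedded annotation would instead carry the scalar annotation value inside the document itself.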
We have developed a new type of observation, named ‘SemObservation’, with the ‘gml:Sem MeasureType’ result type, as shown and described in Table 2.
To explain the integration of semantics into sensor observation data more clearly, Fig. 5 illustrates (a) the concept of O&M and the relationships between the entities involved in observations, (b) data streams generated by wireless sensor networks, (c) the sensor data integrated with sensor metadata, archival data streams, and the ontological knowledge, and finally, (d) the semantically annotated data with the attributes: sem-annotations data, the observed value, unit, metadata, location, timestamp, result type, and gml:id of the observation.
5.3 System Outputs
To display the heterogeneous sensor stream data and their semantic annotations, a real-time IoT application is developed in ASP.NET Core MVC, a cross-platform, high-performance, open-source framework for building modern, cloud-based, Internet-connected applications. The ‘DataStax C# Driver for Apache Cassandra’, a modern, feature-rich, and highly tunable C# client library, is used to read data from the Apache Cassandra database. To display the data on the map, Leaflet is used, an open-source JavaScript library for interactive web maps. Leaflet is designed with simplicity, performance, and usability in mind. It works efficiently across all major desktop and mobile platforms out of the box, taking advantage of HTML5 and CSS3 on modern browsers while being accessible on older ones too.
As shown in Fig. 6 and Fig. 7 (map and table view), users can observe the air quality at certain geographical points, marked on the map as measuring nodes. Each node (marker) has an AQI value that indicates the air pollution. Clicking on any marker shows the latest measurement values obtained for that point, such as PM2.5, PM10, O3, NO2, SO2, CO, Temperature, Pressure, Humidity, Wind, and Water Gauge, together with semantic annotations such as #AQI Index, #Air Pollution Level, and #Health Implications.
6 Conclusions and Future Work
Wireless sensor networks (WSNs) are one of the main components of the IoT. They produce observed data in continuous form, known as sensor stream data, and transmit it to a server for further processing. Raw sensor data are useless unless properly annotated. Adding semantic annotations with concept definitions from ontologies makes it possible to interpret and understand sensor data streams.
First, this study provides a literature review on the integration of semantics into sensor data for the IoT. Next, it describes the distinction between traditional data stream processing and sensor data stream processing. Then, it presents the selected technologies and standards (such as Spark Streaming, Apache Kafka, Apache Cassandra, and OGC standards) that are used to develop the proposed system. After that, an overview of the system architecture for real-time integration of semantics into heterogeneous sensor stream data with context in the Internet of Things is presented. Finally, the implementation of an IoT real-time air quality monitoring system is presented, including:
- input sensor stream data in JSON format for the following measured parameters: carbon monoxide, humidity, nitrogen dioxide, ozone, pressure, pm10, pm25, sulphur dioxide, temperature, wind, and water gauge;
- processing of sensor stream data by integrating semantic annotations into the sensor stream data in SOS O&M format, for which a new type of observation, SemObservation (with the gml:Sem MeasureType result type), is developed;
- system outputs that display the heterogeneous sensor stream data and their semantic annotations in map and table view format.
This paper is an extended version of the conference proceedings paper [15]. In contrast to [15], in this paper the system architecture is advanced by adding new features such as archival data streams, an ontology, OGC standards, and XLink-annotated SOS.
Extending the system with more advanced real-time semantic annotation techniques such as XPath annotations, developing techniques for real-time interpretation of semantic annotations, and evaluating the system performance are left for future work.
References
1. Aggarwal, C.C., Ashish, N., Sheth, A.: The internet of things: a survey from the data-centric perspective. In: Aggarwal, C.C. (ed.) Managing and Mining Sensor Data, pp. 383–428. Springer, Boston (2013). https://doi.org/10.1007/978-1-4614-6309-2_12
2. Aqicn: API – Air Quality Programmatic APIs. https://aqicn.org/api. Accessed 20 Feb 2020
3. Barnaghi, P., Wang, W., Henson, C., Taylor, K.: Semantics for the Internet of Things: early progress and back to the future. Int. J. Semant. Web Inf. Syst. (IJSWIS) 8(1), 1–21 (2012)
4. Bröring, A., Stasch, C., Echterhoff, J.: OGC sensor observation service interface standard. Open Geospatial Consortium Interface Standard, 12-006 (2012)
5. Elnahrawy, E.: Research directions in sensor data streams: solutions and challenges. Rutgers University, Technical report. DCIS-TR-527 (2003)
6. Gorasiya, D.V.: Comparison of open-source data stream processing engines: spark streaming, flink and storm. Technical report (2019). https://doi.org/10.13140/RG.2.2.16747.49440
7. Henson, C.A., Pschorr, J.K., Sheth, A.P., Thirunarayan, K.: SemSOS: semantic sensor observation service. In: International Symposium on Collaborative Technologies and Systems, CTS 2009, pp. 44–53. IEEE (2009)
8. Jirka, S., Stasch, Ch., Bröring, A.: OGC Best Practice for Sensor Web Enablement, Lightweight SOS Profile for Stationary In-Situ Sensors. Open Geospatial Consortium. Version 1.0, ref. no. 11-169r1 (2014)
9. Kafka Apache: Kafka Apache – A distributed streaming platform. https://kafka.apache.org. Accessed 15 Feb 2020
10. Karimov, J., Rabl, T., Katsifodimos, A., Samarev, R., Heiskanen, H., Markl, V.: Benchmarking distributed stream data processing systems. In: Proceedings of the IEEE 34th International Conference on Data Engineering (ICDE), Paris, France (2018)
11. Lee, Y.J., Trevathan, J., Atkinson, I., Read, W.: The integration, analysis and visualization of sensor data from dispersed wireless sensor network systems using the SWE framework. J. Telecommun. Inf. Technol. 4, 86 (2015)
12. OGC Standards: Open Geospatial Consortium (OGC). https://www.ogc.org/docs/is/. Accessed 05 Jan 2020
13. Pradilla, J., Palau, C., Esteve, M.: SOSLITE: Lightweight Sensor Observation Service (SOS) for the Internet of Things (IOT). ITU Kaleidoscope: Trust in the Information Society, Barcelona (2016)
14. Sejdiu, B., Ismaili, F., Ahmedi, L.: Integration of semantics into sensor data for the IoT - a systematic literature review. Int. J. Semant. Web Inf. Syst. (IJSWIS) 16(4), Article 1 (2020)
15. Sejdiu, B., Ismaili, F., Ahmedi, L.: A real-time integration of semantics into heterogeneous sensor stream data with context in the Internet of Things. In: The 15th International Conference on Software Technologies (ICSOFT 2020), Lieusaint - Paris, France, 07–09 July 2020 (2020)
16. Sejdiu, B., Ismaili, F., Ahmedi, L.: A management model of real-time integrated semantic annotations to the sensor stream data for the IoT. In: The 16th International Conference on Web Information Systems and Technologies (WEBIST 2020), Budapest, Hungary, 03–05 November 2020 (2020)
17. Sheth, A., Henson, C., Sahoo, S.: Semantic sensor web. IEEE Internet Comput. 12(4), 78–83 (2008). https://doi.org/10.1109/MIC.2008.87
18. W3C Semantic Sensor Network Incubator Group (SSN-XG): Semantic Sensor Network Ontology. https://www.w3.org/2005/Incubator/ssn/ssnx/ssn. Accessed 25 Feb 2020
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sejdiu, B., Ismaili, F., Ahmedi, L. (2021). A Real-Time Integration of Semantic Annotations into Air Quality Monitoring Sensor Data. In: van Sinderen, M., Maciaszek, L.A., Fill, HG. (eds) Software Technologies. ICSOFT 2020. Communications in Computer and Information Science, vol 1447. Springer, Cham. https://doi.org/10.1007/978-3-030-83007-6_5
DOI: https://doi.org/10.1007/978-3-030-83007-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83006-9
Online ISBN: 978-3-030-83007-6
eBook Packages: Computer Science (R0)