1 Introduction

Pervasive computing, the computing that disappears, was introduced by Weiser (1991). It was followed by ubiquitous computing, the computing that appears everywhere and at any time, introduced by Satyanarayanan (2001a). Both have profoundly changed Information Systems (IS) over the last three decades. Such IS are sometimes referred to as Pervasive Information Systems (PIS) (Kourouthanassis and Giaglis 2007). With PIS, IS features are enriched while their architecture becomes more and more distributed. In addition to the traditional databases they were built upon, they include data coming from the physical environment and should be accessible anytime and from any (mobile) device.

To illustrate the complexity of PIS throughout this chapter, we consider the case of a logistic chain traceability system related to the transport operations of shipments (Ahmed et al. 2021). Each shipment transport involves at least three types of stakeholders: (1) the shipper at the origin of the transport request; (2) the carriers in charge of transport operations; and (3) the consignee that receives the transported shipment. Other stakeholders can also be involved in this process, e.g., logistic service providers, customs, insurance companies, and banks. These traceability IS were centralized in the past, but next-generation IS in this domain are going to be more and more distributed. The system is deployed both locally, on each stakeholder’s infrastructure, and in the cloud. The IS includes data collected from the Internet of Things (IoT) through wireless connected devices (such as temperature sensors) deployed on the shipment and in the stakeholders’ infrastructures. Furthermore, traceability data may be used dynamically for decision-making purposes, e.g., a change of transport company, the notification of transport delays to the consignee and the carriers, and the early identification of transport faults such as the non-respect of temperature conditions.

PIS software architecture comprises several layers: a business layer, a service/middleware layer, and a context-management data layer (a.k.a. IoT layer). Each layer might be composed of several software components provided by different organizations and deployed on a large-scale, heterogeneous, and distributed infrastructure. A PIS may be abstracted by a distributed software architecture in which data and actions are transmitted among components, both inside and between layers. The so-called middleware is an essential part of the design and execution of this software architecture.

In a distributed computing system, middleware is defined as the software layer that lies between the operating system and the applications on each site of the system. Its role is to make application development easier, by providing common programming abstractions, by masking the heterogeneity and the distribution of the underlying hardware and operating systems, and by hiding low-level programming details (Krakowiak 2009). Middleware has provided a key set of features enabling distributed architectures to expand. In the 1990s, middleware started by offering the basic client-server model that has been extensively used by IS. Since then, there have been extensive innovations in middleware capabilities. We can mention the persistence capability that enables transparent interactions between applications and databases and the publish-subscribe interaction pattern that enables designers to decouple system components.

PIS have specific requirements concerning middleware. Biegel and Cahill (2007) have identified some of these requirements, such as loosely coupled communication and sensor and actuator abstractions. Raychoudhury et al. (2013) surveyed the literature on middleware for pervasive systems and highlighted new requirements for PIS, such as context management, i.e., how to consume high-level context information obtained after processing, fusing, and filtering a large amount of low-level context data collected from the environment. They also draw attention to the service-oriented paradigm, the common middleware abstraction in this decade, which comes with service discovery and service composition issues. In this chapter, we focus on presenting the state of the art on requirements concerning middleware for PIS in the context of the IoT, i.e., the integration into the Internet of connected devices that interact with the environment. As stated by Blair et al. (2016), the IoT brings new requirements and challenges for PIS middleware, such as scalability and heterogeneity.

At the same time, as PIS grow in complexity and distribution and become ubiquitous, they raise a new concern in terms of energy consumption. According to Ferreboeuf et al. (2021), the energy demand of Information Technology (IT) in 2019 was estimated at 4184 TWh (IT represents 4.2% of the energy consumption and 3.5% of greenhouse gas emissions in the world). If energy consumption continues to rise by 6.2% per year, as it has since 2015, both energy consumption and greenhouse gas emissions could double in 10 years, an unsustainable scenario. As middleware has a central position in IS and is used by many of them, middleware platforms might play a key role in making the systems developed atop them energy-aware and energy-efficient. These requirements are even more relevant considering that programmers often have limited knowledge of how much energy their software consumes and which parts use the most energy (Pang et al. 2016). Consequently, energy consumption is a first-class concern for PIS middleware that we address in this chapter.

The remainder of this chapter is organized as follows. Section 2 describes the requirements imposed by PIS on middleware. Section 3 presents how some of those requirements are handled by middleware in the literature. Section 4 details how platforms proposed in our research respond to some of the identified requirements concerning PIS middleware. Next, Sect. 5 outlines open challenges to be handled in the future. Section 6 concludes the chapter with final remarks.

2 Requirements for PIS Middleware

This section gives an overview of the requirements for PIS middleware in the context of the IoT. As the aim of a middleware layer is to bridge the gap between the pervasive elements spread over the physical environment and the applications, the requirements for PIS middleware include the provision of several services to allow applications to gather contextual information from heterogeneous distributed devices. We present the main functional requirements (i.e., driven by application constraints such as interacting with a given sensor or defining application adaptation rules) and non-functional requirements essential in a PIS scenario (such as handling interoperability, scalability, and the need for supporting energy-efficiency). Table 1 summarizes the presented requirements by organizing them in three categories: requirements necessary for Context data management in the IoT, Application support, and requirements Exacerbated in IoT systems. Table 1 also maps the middleware proposals that will be presented in Sect. 4 with the requirements they tackle and for which we discuss the state of the art in Sect. 3.

Table 1 Requirements for PIS in the context of the IoT

2.1 Sensing and Actuation Support

PIS middleware needs to deal with small, often battery-powered devices such as sensors and actuators, the physical elements that the system needs to interact with the environment. Sensors typically obtain information from entities of interest or their environment, whereas actuators act on an entity or the environment or provide feedback to the user. A relevant requirement for PIS middleware is to provide programming abstractions that enable event-driven programming at a high level, thereby significantly simplifying the use of sensors and actuators by hiding the complexity of accessing heterogeneous devices that use different communication protocols.

In the example of the logistic chain traceability related to the transport operations of shipments, all the data collected by sensors providing the temperature in the compartments of a ship during the transportation need to be received by the application. Similarly, the application needs to set the desired temperature remotely by sending a message to some temperature actuators. The middleware layer should provide programming abstractions for the communication with heterogeneous sensors and actuators to support high-level interaction with them.
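The kind of programming abstraction described above can be sketched as follows. This is only a minimal illustration, not an existing middleware API: the class DeviceHub, its methods, the device identifiers, and the temperature thresholds are all hypothetical names and values chosen for the example. The hub hides protocol-specific drivers behind a single event-driven interface, so that the application only registers handlers for readings and sends set-points to actuators.

```python
from typing import Callable, Dict, List

# All names below (DeviceHub, on_reading, ...) are hypothetical and illustrative.
Handler = Callable[[str, float], None]


class DeviceHub:
    """Hides protocol-specific drivers behind one event-driven interface."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self._actuators: Dict[str, Callable[[float], None]] = {}

    def on_reading(self, kind: str, handler: Handler) -> None:
        # The application registers its interest in one kind of reading.
        self._handlers.setdefault(kind, []).append(handler)

    def register_actuator(self, device_id: str, driver: Callable[[float], None]) -> None:
        # A driver wraps whatever protocol the physical actuator speaks.
        self._actuators[device_id] = driver

    def deliver(self, kind: str, device_id: str, value: float) -> None:
        # Called by a protocol driver whenever a device reports a value.
        for handler in self._handlers.get(kind, []):
            handler(device_id, value)

    def actuate(self, device_id: str, value: float) -> None:
        self._actuators[device_id](value)


hub = DeviceHub()
setpoints: List[float] = []  # stands in for a real temperature actuator
hub.register_actuator("cooler-1", setpoints.append)


def keep_cold(device_id: str, celsius: float) -> None:
    # Assumed rule: above 8.0 degrees Celsius, request a 4.0 degree set-point.
    if celsius > 8.0:
        hub.actuate("cooler-1", 4.0)


hub.on_reading("temperature", keep_cold)
hub.deliver("temperature", "sensor-7", 9.5)  # triggers the actuator
hub.deliver("temperature", "sensor-7", 5.0)  # within range, no action
```

The application code never refers to a communication protocol: supporting a new kind of device would only require a new driver calling deliver or registered through register_actuator.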

2.2 Context-Awareness

Several works have defined the terms context and context-awareness. In this chapter, we rely on a generic, well-known definition from Dey and Abowd (2000): “Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves.” Context-awareness is one of the most notable characteristics of PIS as it is related to the pervasive capability of collecting, processing, storing, and reasoning about environmental information on a real-time basis. This requirement is essential to support PIS self-adaptation to any environmental condition that can impact the quality of the application, for instance, users’ mobility or an environmental disruption such as a temperature increase.

In essence, middleware should provide a well-defined interface to generic context management solutions to prevent PIS from dealing with the burden of context-awareness management. Middleware for PIS typically should offer system-level services to deal with context data acquisition, storage, reasoning, discovery, and query processing, as well as automated context-aware adaptation.

In the logistic chain traceability system example, context-awareness is essential to provide specific information to the different stakeholders involved. For instance, the insurance companies do not need to receive the same information as the customs services. Insurance companies are often interested only in information important to the insurance context, which differs from the information of interest to the customs services. Another example of context-awareness is to automatically trigger an alert and a reconfiguration when an inappropriate temperature is detected.

2.3 Dynamic Adaptation Capabilities

PIS have to be dynamic for diverse reasons, such as failure management, energy budget, network unavailability, user mobility, and unpredictable interactions. In the face of these situations, PIS middleware should hence provide dynamic adaptation capabilities to ensure the quality and availability of applications at runtime. Dynamic adaptation means the ability of an application to reconfigure its structure, behavior, protocols, etc. without interrupting its execution, ideally with minimal or no human intervention or disruption.

PIS possess inherent characteristics that make dynamic adaptation particularly relevant. Context-awareness is related to the ability of a system to perceive information about the context in which it is inserted. By sensing environmental conditions, the system can recognize the current context and adapt itself according to changes in it. Another source of dynamic adaptation in PIS is device mobility, e.g., a user with a mobile device who is in the environment at a given moment and leaves that location at another, so that the PIS needs to transparently discover and (un)link participating devices into the network. Kourouthanassis and Giaglis (2007) also raise opportunistic user interaction as a challenge to the development of PIS, in the sense that it may not be possible to know in advance the users who will interact with the system or the frequency of such interactions. All these features need to be adequately supported by PIS middleware components to enable building applications atop them that can have their structure and behavior adapted at runtime while maintaining their availability and quality.

In the logistic chain traceability system example, dynamic adaptations may be required due to communication latency issues (e.g., changing protocols for the sake of reliability and performance), anomalous operation, unavailability of connected devices due to a low power level or even failure, or measures to improve the accuracy of gathered data. These scenarios require PIS middleware to maintain availability and work properly in such a dynamic environment while collecting, analyzing, planning, and reacting to changes.

2.4 Quality of Context Management

An important requirement concerns monitoring and managing the quality of the context information received by applications. International standardization bodies underline the importance of uncertainty in metrology (Joint Committee for Guides in Metrology 2008). When reporting the result of a measurement of a physical quantity, some quantitative indication of the quality of the result should be given so that those who use it can assess its reliability.

Regarding context information, Henricksen and Indulska (2004) acknowledge that it may be inherently ambiguous (when two different sources provide contradictory information), inaccurate (when too little information is available about a situation), or even erroneous (when it does not reflect reality). Information provided by open data or human beings, e.g., data from social networks, may be incomplete or erroneous, whether intentionally or not. In general, context information sources are numerous and diverse. They do not all share the same formats or units of measurement, which means that conversion operations are necessary and potentially add new errors.

Quality of Context (QoC) was first defined by Buchholz et al. (2003) as “any information that describes the quality of information that is used as context information” and can be represented as a set of parameters that reflects the quality of context data (Bellavista et al. 2012). We consider that QoC parameters, such as accuracy or currentness as defined in ISO/IEC 25012 (2008), should therefore be associated with context information in the form of metadata and be used to compute the quality level of context information.

In the case of logistic chains, at least four quality parameters should be considered in these metadata (Ahmed et al. 2021): (1) accuracy, to ensure that the collected data represent the reality of the shipment conditions; (2) completeness, to ensure that there is no gap in the collected data; (3) consistency, to ensure the users’ agreement on the traceability data collected from multiple sources; and (4) currentness, to ensure that the collected data are still valid at the time they are used.

Information provided to context-aware applications is derived through analysis operations and various transformations. However, if these operations are performed on erroneous information, the new information produced is also erroneous. QoC management must hence be carried out throughout the entire information life cycle, from its collection to its dissemination to the applications through all the intermediate transformation steps. Middleware should enable applications to become QoC-aware and provide PIS developers with QoC management facilities.
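To make the idea of QoC metadata travelling with context information more concrete, the following sketch attaches the four parameters listed above for logistic chains to each piece of context data and propagates them through one transformation (a mean). The data model, the normalization of each parameter to the range [0, 1], and the conservative rule that a derived value inherits the worst parameter of its inputs are assumptions made for illustration; they are not prescribed by the cited works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QoC:
    # Assumption: every parameter is normalized to the range [0, 1].
    accuracy: float
    completeness: float
    consistency: float
    currentness: float


@dataclass(frozen=True)
class ContextData:
    name: str
    value: float
    qoc: QoC  # QoC metadata is attached to the context information itself


def quality_level(qoc: QoC) -> float:
    # Assumed aggregation rule: the weakest parameter bounds the overall level.
    return min(qoc.accuracy, qoc.completeness, qoc.consistency, qoc.currentness)


def derive_mean(name: str, inputs: List[ContextData]) -> ContextData:
    # A transformation step: the derived value cannot be better than its
    # worst input, so each parameter is propagated with a minimum.
    value = sum(i.value for i in inputs) / len(inputs)
    qoc = QoC(
        accuracy=min(i.qoc.accuracy for i in inputs),
        completeness=min(i.qoc.completeness for i in inputs),
        consistency=min(i.qoc.consistency for i in inputs),
        currentness=min(i.qoc.currentness for i in inputs),
    )
    return ContextData(name, value, qoc)


# Hypothetical readings from two temperature sensors on the same shipment.
t1 = ContextData("temperature-1", 4.0, QoC(0.9, 1.0, 0.8, 0.7))
t2 = ContextData("temperature-2", 6.0, QoC(0.6, 1.0, 0.9, 0.95))
average = derive_mean("temperature-avg", [t1, t2])
```

An application could then filter on quality_level(average.qoc) before acting on the derived temperature, which is one simple way of being QoC-aware.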

2.5 Application Development Support

Middleware platforms are a key element in leveraging application development by abstracting away the specificities of the underlying distributed components from users and exposing valuable reusable services to applications. Besides an accessible programming model that adequately supports application developers by taking advantage of abstractions exposed by PIS middleware, it is relevant to come up with interoperable environments that could assist those developers in building their applications with little effort while orchestrating the diversity of existing devices, platforms, and services. Inspired by a cloud-based IoT scenario (Truong and Dustdar 2015), the life cycle of developing a PIS may comprise (1) selecting, composing, and integrating components across the system for specifying and developing possible governance and control operations, (2) deploying several types of software components at different levels of abstraction and capabilities to configure deployments and continuous resource provisioning, and (3) monitoring end-to-end metrics and performing governance processes across the system. Transversally, it is necessary to provide environments supporting the development of applications based on data streams generated by devices and available through the underlying deployment infrastructure (i.e., cloud, edge).

2.6 Support for Multiple Interaction Patterns

To facilitate the development of applications that exchange data between devices and services, PIS middleware platforms rely on IP-based protocols. These protocols abstract distributed peers that interact with each other based on different interaction patterns, such as request-reply, publish-subscribe and event-based. Middleware protocols are typically available through an API, and each protocol supports several characteristics (synchronous/asynchronous interactions, QoS guarantees, etc.). In general, each interaction pattern can be characterized by (1) its semantics, which expresses the different dimensions of coupling among interacting peers, and (2) its API, with a set of primitives expressed as functions provided by the middleware.

The request-reply pattern is commonly used for Web Services and followed by popular middleware protocols such as HTTP and XMPP. A client interacts directly (without intermediate components) with a server either by direct messaging (one-way) or through remote procedure calls (RPC). Request-reply protocols usually support both synchronous and asynchronous interactions. In turn, the publish-subscribe pattern is commonly used for content broadcasting. Middleware protocols such as MQTT and AMQP, APIs (e.g., JMS), and message brokers such as RabbitMQ, EMQx, and Mosquitto follow this pattern. Multiple publisher-consumer peers interact via an intermediate broker. Consumers subscribe to a specific filter (e.g., a topic-based filter) on the broker, publishers produce events matching that filter, and consumers receive these events in FIFO order. Publish-subscribe protocols commonly support asynchronous interactions. In Sect. 3.2, we provide an overview of existing protocols that can be classified into the request-reply and publish-subscribe patterns.
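The publish-subscribe interaction described above can be illustrated with a toy, in-process broker. This is a deliberately simplified sketch and not the API of MQTT, AMQP, JMS, or any of the brokers mentioned: the Broker class, its methods, and the topic names are hypothetical, filters are exact topic strings, and no network or QoS handling is involved. It only shows that publishers and consumers are decoupled by the broker and that each consumer reads its events in FIFO order.

```python
from collections import deque
from typing import Deque, Dict, List


class Broker:
    """Toy in-process broker with topic-based filters (illustrative only)."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, List[Deque[str]]] = {}

    def subscribe(self, topic: str) -> Deque[str]:
        # Each consumer gets its own queue for the topic it filters on.
        inbox: Deque[str] = deque()
        self._inboxes.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic: str, event: str) -> None:
        # The publisher does not know who consumes the event, or when.
        for inbox in self._inboxes.get(topic, []):
            inbox.append(event)


broker = Broker()
consignee_inbox = broker.subscribe("shipment/42/temperature")
customs_inbox = broker.subscribe("shipment/42/location")

broker.publish("shipment/42/temperature", "4.5")
broker.publish("shipment/42/location", "Le Havre")
broker.publish("shipment/42/temperature", "9.0")

# popleft() returns the oldest event first, i.e., FIFO delivery.
first_temperature = consignee_inbox.popleft()
```

A request-reply interaction would instead have the consignee call the sensor side directly and wait for the answer, which couples the two peers in both space and time.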

PIS are characterized by diverse entities (devices, systems, users, etc.) that are pervasively inserted into the environment and provide context information about this environment. These entities are also inherently mobile, i.e., they may be present in the surroundings at a given instant of time and no longer be there at another one, and they may be unknown a priori at design time. Given this inherent dynamicity of interaction among the system constituents, communication in a PIS should preferably be loosely coupled and should scale well with the many entities envisioned in PIS environments. In this perspective, Biegel and Cahill (2007) especially advocate using an event-based pattern (Bacon et al. 2000) in PIS middleware as a means of providing asynchronous communication in a many-to-many, loosely coupled interaction among the distributed application components.

2.7 Enabling Interoperability

It is essential to tackle heterogeneity across multiple layers to enable interoperability between IoT devices and other PIS components. For instance, in the logistic chain traceability scenario, a shipment may provide information regarding its state through the following application layer operation: get_shipment_state(id). However, a carrier may require the shipment status via query_shipment(shipment_id, state). Such issues at the application layer can be qualified as semantic heterogeneity issues. Ensuring end-to-end data consistency is one of the goals of semantic interoperability. There are two basic solutions for achieving semantic interoperability between two IoT devices. The first solution is a one-to-one model mapping. Another, more suitable approach is to use shared data meta-models, such as ontologies, that can be used to unambiguously define the meaning of terms in existing models. In Sect. 3.3, we discuss some existing semantic interoperability approaches in the literature.
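The first solution, a one-to-one mapping, can be sketched with a small adapter between the two operations named above. Only the operation names come from the scenario; the returned data, the field names, and the interpretation of the state parameter as the name of the state attribute requested by the carrier are assumptions made for this example.

```python
from typing import Callable, Dict

State = Dict[str, object]


def get_shipment_state(id: str) -> State:
    # Shipment-side operation. The body and the returned fields are
    # hypothetical stand-ins for data held by the shipment.
    return {"temperature_c": 5.0, "location": "port"}


def make_query_shipment(get_state: Callable[[str], State]) -> Callable[[str, str], object]:
    """One-to-one mapping from the carrier's operation to the shipment's."""

    def query_shipment(shipment_id: str, state: str) -> object:
        # Assumption: `state` names the attribute the carrier is asking for.
        return get_state(shipment_id)[state]

    return query_shipment


query_shipment = make_query_shipment(get_shipment_state)
temperature = query_shipment("S-1", "temperature_c")
```

Such an adapter has to be written for every pair of models, which is why shared meta-models such as ontologies are generally considered more suitable as the number of models grows.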

Semantic interoperability ensures mapping between diverse data models employed by IoT systems. However, this alone does not make the interacting devices fully interoperable. Different APIs and data representations and primitives used by IoT devices must be mapped with each other at the middleware layer. Solving the middleware interoperability issue is challenging, mainly due to the fast development of protocols and APIs. Existing efforts address the middleware interoperability issue by relying on service-oriented architectures (SOA), IoT gateways, cloud computing platforms, and model-driven engineering. In Sect. 3.3, we discuss some middleware interoperability approaches in the literature.

2.8 Security and Privacy

To promote the user acceptability of new IoT-enabled PIS applications, it is essential to provide mechanisms to ensure the privacy of users and the protection of the handled data. With the heterogeneity and amount of connected things and the unprecedented amount of collected data, security and privacy are no longer an option in PIS. They should be enforced throughout the entire software life cycle. PIS middleware is the right layer to intercept the information flow of applications and integrate security and privacy mechanisms. Such mechanisms can then benefit all applications by default, with the possibility to configure some specific business rules to take into account applications’ needs.

Security corresponds to the degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization (ISO/IEC 25010 2011). More specifically, cybersecurity is about ensuring three properties of information, services, and systems, namely confidentiality, integrity, and availability. Securing an information system means preventing an unauthorized entity from accessing information, services, and systems, modifying them, or making them unavailable. Privacy can be thought of as the confidentiality of the relationship between people and data. Therefore, it is important to notice that privacy can be guaranteed only when a security strategy is enforced in an end-to-end way. While relying on cryptographic primitives and protocols, privacy protection involves its own properties, techniques, and methodologies.

Cavoukian and Dixon (2013) recommend aligning seven principles for both security-by-design and privacy-by-design. These principles are: (1) proactive and preventative, not reactive and remedial, to anticipate and prevent invasive events before they happen; (2) protection as the default setting, as no action should be required on the part of individuals for their protection; (3) embedded into the design, not bolted on after the fact; (4) positive-sum, not zero-sum, i.e., full functionality by accommodating all legitimate interests and objectives; (5) an end-to-end approach, by ensuring secure life-cycle management of information with confidentiality, integrity, and availability of all information for all stakeholders; (6) visibility and transparency, by keeping IT systems’ internal parts transparent to users and providers and by following open standards; and (7) respect for the user in a user-centric approach to protecting the interests of all information owners.

PIS technology is still in its infancy and does not yet have fully standardized security and privacy requirements (Chaudhuri and Cavoukian 2018). Alhirabi et al. (2020) recommend using threat modelling techniques during the design stage, such as STRIDE for security threats and LINDDUN for privacy threats. STRIDE (Howard and Lipner 2006) is an acronym for Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. LINDDUN (Deng et al. 2011) is an acronym for Linkability, Identifiability, Non-repudiation, Detectability, information Disclosure, content Unawareness, and policy and consent Non-compliance.

Even though PIS mainly rely on an event-driven data reporting method (see Sect. 2.6), there may be situations when a query-driven approach is more relevant to get insights about some phenomenon at a given time. For instance, a query would get a particular set of sensor readings satisfying some condition. Data query privacy (López et al. 2017) is hence an important requirement of PIS in order to reduce the risk of exposing sensitive information to attackers when issuing queries.

In the logistic chain traceability system example, the collected data should be kept confidential and not be transferred to or stored by untrustworthy third parties. Anonymity or pseudonymity should also be enforced so that untrustworthy third parties cannot distinguish real location information from fake locations. These are just examples of some issues. Many more security and privacy aspects should be considered in a PIS middleware throughout the data life cycle and across the entire system architecture.

2.9 Scalability

The IoT paradigm calls for exchanging data among dynamic, heterogeneous sensors and client applications at unprecedented scales. We follow the framework from Duboc et al. (2007) for characterizing the scalability of PIS middleware. For instance, when considering an IoT-based solution, the scaling dimensions, which represent the scaling aspects, are the number of queries per second and the number of machines in the cluster. The non-scaling variables are the network conditions (e.g., available bandwidth). The dependent variables, which represent the aspects of the system behavior affected by changes in the scaling dimensions, are the response time for a query, bandwidth usage, and cluster load. In this first example, the requirement can then be formulated as follows: “the studied system shall scale with respect to latency” because it can maintain a maximum given response time as the number of requests per second scales by varying the number of machines in the cluster. In another architectural style, such as a highly-distributed publish-subscribe system for the PIS middleware, the scaling dimensions shall include the number of intermediary entities (i.e., brokers of the overlay network) that route data from sensors to client applications. We shall then measure the total resource consumption for filtering data records through the multiple brokers from the sensor to the client application.
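The latency requirement formulated in the first example can be checked against measurements with a few lines of code. The sketch below is only an illustration of the reasoning: the Measurement record, the function name, and the measurement values are hypothetical and do not come from Duboc et al. (2007). For each value of the scaling dimension (queries per second), it tests whether at least one cluster size keeps the dependent variable (response time) under the required maximum.

```python
from typing import Dict, List, NamedTuple


class Measurement(NamedTuple):
    queries_per_second: int   # scaling dimension
    machines: int             # scaling dimension that can be varied
    response_time_ms: float   # dependent variable


def scales_wrt_latency(measurements: List[Measurement], max_response_time_ms: float) -> bool:
    # Best (lowest) response time observed for each load level,
    # over all cluster sizes that were measured.
    best: Dict[int, float] = {}
    for m in measurements:
        previous = best.get(m.queries_per_second, float("inf"))
        best[m.queries_per_second] = min(previous, m.response_time_ms)
    # The requirement holds if every load level can be served in time.
    return all(t <= max_response_time_ms for t in best.values())


# Hypothetical measurements under fixed network conditions (non-scaling variable).
runs = [
    Measurement(100, 1, 40.0),
    Measurement(1_000, 1, 400.0),
    Measurement(1_000, 4, 90.0),
    Measurement(10_000, 4, 700.0),
    Measurement(10_000, 32, 95.0),
]

meets_100_ms = scales_wrt_latency(runs, 100.0)
meets_50_ms = scales_wrt_latency(runs, 50.0)
```

With these made-up figures, the requirement holds for a 100 ms bound because adding machines compensates for the higher load, and it fails for a 50 ms bound.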

When considering the logistic chain traceability illustrative application domain, architects may differentiate PIS systems deployed in relatively small areas mainly managed by one administrative entity, such as merchandise warehouses or ocean liners, from more extensive areas with many stakeholders, e.g., in port cities. In the former scenario, IoT solutions, including ones enhanced with cloud computing, may be appropriate. In the latter configuration, more distributed, decoupled solutions involving several brokers along with distributed routing and filtering might be required.

2.10 Energy Efficiency and Energy-Awareness

Penzenstadler (2015) points out that new quality attributes have recently been studied by the research community with the objective of keeping systems sustainable. In the past, resource utilization mainly referred to the efficiency of the use of the available processing, storage, and network resources. For energy-efficiency purposes, the resource to be monitored is the energy consumed.

While energy efficiency means using less energy to perform a given task, energy-awareness means knowing the energy consumption of a given task. The middleware can use energy-awareness to reduce energy consumption through energy-saving strategies, e.g., by adapting the protocol, the scheduling, or the volume of exchanged data. Energy-awareness can also be shared with the upper application layers. Applications may adjust their behavior for energy-saving purposes, e.g., reducing some requirements to remain within the limits of a given energy budget. Applications can also share energy consumption reports with end-users, who could adapt their usage based on energy consumption knowledge. Indeed, energy-awareness is expected to have a positive impact in terms of energy efficiency (Hassan et al. 2009).

In the logistic chain traceability system example, to achieve energy efficiency, the PIS middleware could: (1) at design time, choose the most energy-efficient consensus algorithm for sharing data securely and transparently between the stakeholders (Sedlmeir et al. 2020); (2) at runtime, reduce the volume of exchanged data by filtering data based on their content or minimizing the frequency of data transmissions (de Oliveira et al. 2020). For energy-awareness, the middleware could adapt the frequency of data transmissions to keep energy consumption within a given energy budget or transmit energy consumption information to the application level, for example, to inform the end-user about the consumption of the energy budget.
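Adapting the frequency of data transmissions to an energy budget can be sketched with a simple calculation. The model below is a strong simplification made for illustration: it assumes that every transmission costs the same, known amount of energy and that idle consumption is negligible. The function names and the numeric values are hypothetical.

```python
import math


def min_interval_s(budget_j: float, cost_per_tx_j: float, period_s: float) -> float:
    """Shortest interval between transmissions that respects the budget.

    Assumes a fixed energy cost per transmission and no idle consumption.
    """
    max_transmissions = math.floor(budget_j / cost_per_tx_j)
    if max_transmissions == 0:
        # The budget does not even cover one transmission in this period.
        return math.inf
    return period_s / max_transmissions


def adapted_interval_s(desired_s: float, budget_j: float,
                       cost_per_tx_j: float, period_s: float) -> float:
    # Never transmit more often than the energy budget allows.
    return max(desired_s, min_interval_s(budget_j, cost_per_tx_j, period_s))


# Hypothetical figures: 100 J available per hour, 0.5 J per transmission.
# At most 200 transmissions fit in the budget, i.e., one every 18 s.
interval = adapted_interval_s(desired_s=10.0, budget_j=100.0,
                              cost_per_tx_j=0.5, period_s=3600.0)
```

In this example the application asks for one transmission every 10 s, and the middleware stretches the interval to 18 s to stay within the budget. The same figures could also be reported to the application level to inform the end-user.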

PIS middleware should integrate architectural tactics for energy efficiency, e.g., energy monitoring, resource allocation, and resource adaptation (Paradis et al. 2021). It is reasonable to see middleware as the right level for integrating energy management strategies due to its operation at the protocol level and its high reusability (Noureddine et al. 2013). Additionally, middleware should provide energy-awareness mechanisms (Verdecchia et al. 2021) that allow future PIS providers to master energy consumption.

3 State of the Art on Middleware Supporting PIS Requirements

In Sect. 2, we have identified and defined the requirements for PIS middleware in the context of the IoT. The state of the art on context-awareness is covered in chapter “The Context Awareness Challenges for PIS” of this book. In this section, we present only the state of the art concerning the most pressing requirements in the context of the IoT.

3.1 QoC Management

Even though the management of the quality of context data has long been recognized as a requirement of PIS and context-aware applications (Buchholz et al. 2003), only a few middleware platforms actually provide the necessary support for QoC. We herein present some recent initiatives and summarize the provided mechanisms.

The ContextNet middleware (Endler and Silva 2018) integrates QoC management through the Context Data Distribution Layer (CDDL) (Gomes et al. 2017a). A set of QoC parameters is available, including accuracy, measurement time, age, completeness, and numeric resolution. ContextNet targets the Internet of Mobile Things (IoMT) and takes dynamicity into account at different levels. QoC parameters may also exhibit dynamic variability (they oscillate over time), and CDDL can monitor the variation of a given QoC parameter. CDDL also offers filtering based on context data and their QoC metadata.

The LAURA architecture (Teixeira et al. 2020) was designed to support the deployment of decoupled IoT applications. LAURA provides a fog layer that plays the role of an intermediate between applications and the network or sensor nodes and can be regarded as a middleware. This fog layer, still under development, is designed to filter or aggregate data received from the physical layer to prevent unnecessary or poor quality data from being sent to upper layers. QoC parameters are associated with the sensed data, allowing user applications to verify the context data’s usefulness or temporal relevance. QoC-based filtering and aggregation are seen as important features of LAURA.

Jagarlamudi et al. (2021) proposed a Service Level Agreement (SLA) template integrating a QoC-aware mechanism, called the Relative Reputation (RR), to select context providers with high RR values. The QoC evaluator generates the RR unit representing the match between QoC outcomes and QoC requirements. A penalty mechanism also specifies the penalties that apply when a QoC indicator in the context response degrades compared to its guarantees.

3.2 Protocols for Multiple Interaction Patterns

As mentioned in Sect. 2.6, PIS middleware platforms rely on communication protocols that follow different interaction patterns. We herein provide an overview of existing middleware-based IoT protocols. These protocols offer middleware primitives that aim to facilitate the development of IoT applications that include resource-constrained IoT devices. Karagiannis et al. (2015) compare the most promising IoT middleware protocols (more specifically, the ones mentioned here). Even though there are multiple IoT protocols, no single protocol has yet been adopted as the standard for IoT system development. This is mainly because the IoT is too diverse, including multiple data formats and (possibly highly) resource-constrained devices.

Protocols such as DPWS, OPC UA, CoAP, and XMPP have been introduced to support data exchange among peers based on the request-reply interaction pattern. OASIS introduced DPWS (Zeeb et al. 2007) in 2004 as an open standard, and it is suitable for supporting large-scale deployments and mobile devices. Nevertheless, the induced protocol overhead is noticeable and requires a large amount of RAM. The OPC Foundation designed OPC UA (Mahnke et al. 2009) in 2008 to target resource-constrained devices, but it implies a large payload unsuitable for IoT applications. IETF designed CoAP (Shelby et al. 2014), a lightweight protocol that supports highly resource-constrained devices and the delivery of small message payloads. Finally, XMPP (Saint-Andre 2011) is now a suitable protocol for IoT real-time communications, even though it uses XML data formats that create a significant computational overhead.

The publish-subscribe interaction pattern is an alternative to request-reply and offers time and space decoupled interactions. The Sun Microsystems’ JMS standard has been one of the most successful asynchronous messaging technologies available by defining an API for building messaging systems. DDS (OMG 2015) is a messaging protocol designed for brokerless architectures and real-time applications. AMQP (OASIS 2012) is another messaging protocol designed to support applications with high message traffic rates. To support highly resource-constrained devices, MQTT (Banks and Gupta 2014) offers a publish-subscribe centralized architecture, but its performance decreases significantly when sending large message payloads. WebSockets (Fette 2011) were introduced to support real-time full-duplex interactions using only two bytes of overhead in message payloads.

3.3 Enabling Interoperability

Different data representations and APIs among IoT devices, platforms, and applications can be mapped with each other at the middleware layer. However, this alone does not make the interacting peers fully interoperable. There are indeed incompatibilities of IoT devices at the application layer, e.g., operation/resource names, data semantics, etc.

Ontologies (Gruber 1993) provide a common model for annotating content and thus help systems to interoperate. We review well-known ontologies for general sensor modeling. The W3C Semantic Sensor Network (SSN) ontology (Compton et al. 2012) presents a vocabulary to describe sensors and their observations, actuators, and their association to features of interest. Its central building block is the SOSA (Sensor, Observation, Sample, and Actuator) ontology (Janowicz et al. 2019), a standalone light-weight ontology that offers the core vocabulary for the descriptions. The Smart Appliances REFerence (SAREF) ontology (Daniele et al. 2015) follows a similar design to describe concepts required by smart applications. In SAREF, devices make measurements related to properties of interest (similar to sensors making observations in SSN). Depending on the application under development, developers must use the appropriate ontology. For example, the SAREF ontology is commonly used to model information of appliances in smart homes.

Several approaches to bridge middleware-based protocols have been proposed concerning APIs, protocols, and data representations, e.g., the QEST broker for CoAP and RESTful APIs (Collina et al. 2012), HTTP-CoAP proxy (Castellani et al. 2012), and Ponte for REST, CoAP, and MQTT (Banks and Gupta 2014). These approaches implement one-to-one mappings between existing protocols. Despite its simplicity, this strategy scales poorly given the rapid proliferation of IoT protocols, since every new protocol requires a dedicated mapping to each existing one. Negash et al. (2015, 2016) introduced the Lightweight Internet of Things Service Bus (LISA) for tackling IoT heterogeneity. Derhamy et al. (2017) introduced a protocol translator that utilizes an intermediate format to capture all protocol-specific information. XWARE (Roth et al. 2018) implements mediators to translate messages of IoT protocols by using an intermediate format. Finally, Bouloukakis et al. (2019) extended the work of Georgantas et al. (2013) to deal with IoT heterogeneity using software abstractions and code generation.

While the above approaches considerably reduce the development effort, they do not consider semantic layer incompatibilities prevalent in the IoT. IoT platforms such as SemIoTic (Yus et al. 2019) provide end-to-end IoT interoperability in smart buildings by leveraging the SSN/SOSA ontologies and mediating adapters. In addition, it leverages the middleware-based interoperability approach that is further presented in Sect. 4.3.

3.4 Security and Privacy

In a comparison of 50 context-aware computing research projects, Perera et al. (2014) identified that only 11 projects (22%) provided security and privacy solutions. More recently, Alhirabi et al. (2020) reviewed the evolution of design notations, models, and languages that facilitate capturing the non-functional requirements of security and privacy. The majority of the requirement engineering efforts are focused on security. Among the 47 design notations analyzed in their study, security is supported by more than half (32 notations out of 47), while only three notations cover privacy. Even though a by-design approach has long been recommended for both security and privacy (Cavoukian and Dixon 2013), it is still not sufficiently put into practice by developers. Aljeraisy et al. (2021) highlight that there is still a significant gap between legislation and the design patterns that can help to translate and implement it.

Aljeraisy et al. (2021) analyzed data protection laws used across different countries, namely the European General Data Protection Regulation (GDPR), the Canadian Personal Information Protection and Electronic Documents Act (PIPEDA), the California Consumer Privacy Act (CCPA), the Australian Privacy Principles (APPs), and New Zealand’s Privacy Act 1993. The authors then retained the fundamental principles and individuals’ rights to define the Combined Privacy Law Framework (CPLF) by eliminating duplication. Finally, they mapped CPLF with privacy-by-design (PbD) schemes (e.g., privacy principles, strategies, guidelines, and patterns) previously developed by different researchers to investigate the gaps in existing schemes. The results of this extensive study helped to identify where new privacy patterns should be defined. More than 70 privacy patterns have already been proposed in the literature (Colesky et al. 2022; Kargl et al. 2022), and they are a relevant, concrete mechanism to handle data usage and protection in a specific context. However, some principles and rights of CPLF are not achieved by any existing privacy pattern and call for further research.

While security and privacy research is very active, its integration into operational middleware is still limited. Fremantle and Scott (2017) analysed 54 IoT middleware frameworks and observed that they address security and privacy in very different ways. A majority of these middleware frameworks provide access control and authentication mechanisms, and others focus on providing protection for the content shared on the network. However, very few middleware frameworks support a sufficient coverage of the features required to support security and privacy for PIS.

3.5 Scalability

Without middleware, i.e., when applications directly obtain IoT data from sensors, existing coupling significantly hampers the system’s scalability. Therefore, as formulated by Bellavista et al. (2012), PIS middleware architectures are classically first organized according to the following question: is the middleware centralized or decentralized? The centralized approach includes deploying middleware on a single host or cloud. The second approach has two subcategories depending on whether the distribution is hierarchical or not. Consequently, the basic solutions for scaling up follow these three classes of solutions.

The architectures of the first class of solutions have been referred to as Web of Things (Delicato et al. 2013) or, more recently, Cloud of Things (Dias et al. 2020). Scalability issues arising in these centralized architectures concern the complex processing of a huge quantity of data with many clients either producing or requesting data. Cugola and Margara (2012) surveyed solutions for complex event processing and stream processing.

The second class of solutions dealing with scalability targets this requirement at a local scale, a.k.a. localized scalability (Satyanarayanan 2001b). A collection of small clouds, i.e., cloudlets, is typically deployed close to the clients to lower latencies between so-called co-located clients: these smaller clouds are physically distributed, and each one serves a smaller group of clients. This architectural style corresponds to what is known as fog computing. Perera et al. (2017) surveyed such solutions for smart cities.

To target scalability at a global scale, an architecture based on publish-subscribe is preferred as it favors decoupling. Eugster et al. (2003) distinguish three forms of decoupling, namely space, time, and synchronization decoupling. In this approach, some clients publish IoT data while others consume these data. As surveyed by Bellavista et al. (2012), a first set of solutions organize an overlay of brokers responsible for routing IoT data from producers to subscribers. Producers push data to their access broker, and brokers forward them to the consumers that have subscribed to these data. A data model and a filtering model define the non-scaling variables of publish-subscribe solutions: roughly speaking, topic-based filtering with opaque data scales better than content-based filtering with structured data or semi-structured data. In addition, the diameter of the overlay network of brokers is the other non-scaling variable. Kermarrec and Triantafillou (2013) surveyed a second set of solutions targeting non-broker-based routing and using topic-based filtering. These solutions are constructed as peer-to-peer systems: peer nodes simultaneously play the three roles, namely publishers, subscribers, and routers.
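The difference between topic-based filtering with opaque data and content-based filtering with structured data can be sketched as follows. This is a minimal, in-memory illustration under our own assumptions: the `TopicBroker` and `ContentBroker` classes, the topic names, and the event fields are hypothetical and do not correspond to any specific middleware surveyed above.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal in-memory topic-based broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Topic-based filtering: one dictionary lookup, the payload stays opaque.
        for callback in self._subscribers.get(topic, []):
            callback(payload)


class ContentBroker:
    """Minimal in-memory content-based broker (illustrative only)."""

    def __init__(self):
        self._subscriptions = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self._subscriptions.append((predicate, callback))

    def publish(self, event):
        # Content-based filtering: every predicate inspects the structured event.
        for predicate, callback in self._subscriptions:
            if predicate(event):
                callback(event)


received_by_topic, received_by_content = [], []

topic_broker = TopicBroker()
topic_broker.subscribe("shipment/42/temperature", received_by_topic.append)
topic_broker.publish("shipment/42/temperature", b"7.5")
topic_broker.publish("shipment/43/temperature", b"3.0")

content_broker = ContentBroker()
content_broker.subscribe(
    lambda e: e["type"] == "temperature" and e["celsius"] > 6.0,
    received_by_content.append,
)
content_broker.publish({"type": "temperature", "celsius": 7.5})
content_broker.publish({"type": "temperature", "celsius": 3.0})
```

In the topic-based case, matching costs one lookup per publication, whereas the content-based broker evaluates every registered predicate, which hints at why the former scales better.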

Finally, note that broker-based PIS middleware protocols such as AMQP and MQTT are topic-based and cloud-based, but without complex event processing or streaming. This is precisely the role of recent works such as the one of Luckner et al. (2014), and of industrial platforms such as AWS IoT Core,Footnote 3 Google IoT Core,Footnote 4 Microsoft Azure IoT Hub,Footnote 5 and FIWARE,Footnote 6 to add complex event processing and streaming to publish-subscribe middleware standards. These platforms are proof, if any were needed, of the interest of major operators in working to integrate scalability into PIS middleware.

3.6 Energy Efficiency and Energy-Awareness

Middleware for the IoT has recently explored several strategies for energy efficiency and energy-awareness. The most used strategy for energy efficiency is network adaptation, which refers to introducing new protocols, modifying existing ones, and making network optimizations. Akkermans et al. (2016) proposed adapting a publish-subscribe middleware by adding a layer between the broker and the client applications to send notifications via IPv6 multicast rather than using several point-to-point messages. Kalbarczyk and Julien (2018) proposed Omni, a device-to-device middleware with periodic adaptive discovery of neighbor devices using lightweight discovery mechanisms in wireless local area networks. Discovered devices are only connected when data needs to be transferred, and the communication technology adapts both to the network energy efficiency and the volume of data.

Task offloading consists of using the network to transfer software components to other locations. For example, an application running on a mobile phone could send data to a server in a cloud or to another computer in its vicinity for data processing purposes. Several authors such as Aazam et al. (2020), Pasricha (2018), Song et al. (2017), Álvarez-Valera et al. (2019), and Shekhar et al. (2019) proposed middleware to offload software components to other nodes in the cloud or the fog as a means of saving energy on the source nodes. All these proposals show that task offloading has a benefit in terms of energy consumption for at least one of the nodes of the system.

The data filtering capability offered by some middleware consists of processing data to reduce the number or the size of messages according to specific criteria. Adaptive sampling and adaptive filtering (Giouroukis et al. 2020a) are two techniques that have emerged over the last decade. These techniques dynamically reconfigure rates and filter thresholds to trade off data quality against resource utilization. de Oliveira et al. (2020) proposed a data stream processing workflow to be deployed at the network’s edge to perform data cleaning tasks.
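As a rough illustration of how sampling rates and filter thresholds can be reconfigured, consider the following Python sketch. It is a simple rule-based stand-in written for this purpose, not one of the techniques surveyed by Giouroukis et al. (2020a); the function names, the halving/doubling rule, and all default parameter values are assumptions.

```python
def next_interval(current_interval, previous_value, new_value,
                  threshold=0.5, min_interval=1.0, max_interval=60.0):
    """Adaptive sampling: sample faster when the signal changes quickly.

    The interval (in seconds) is halved when two consecutive readings
    differ by more than `threshold`, and doubled otherwise, within bounds.
    """
    if abs(new_value - previous_value) > threshold:
        return max(min_interval, current_interval / 2)
    return min(max_interval, current_interval * 2)


def send_on_delta(values, delta):
    """Filtering: forward a reading only if it differs from the last
    forwarded reading by more than `delta` (a threshold that a middleware
    could reconfigure at run time)."""
    forwarded = []
    for value in values:
        if not forwarded or abs(value - forwarded[-1]) > delta:
            forwarded.append(value)
    return forwarded


temperatures = [4.0, 4.1, 4.2, 5.0, 5.1, 4.0]
sent = send_on_delta(temperatures, delta=0.5)
faster = next_interval(10.0, previous_value=4.0, new_value=6.0)
slower = next_interval(10.0, previous_value=4.0, new_value=4.1)
```

Raising `delta` or `threshold` sends fewer messages at the price of a coarser view of the signal, which is the data quality versus resource utilization trade-off mentioned above.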

Another strategy used by middleware is to temporarily reduce the activity of some nodes to reduce the infrastructure energy consumption. This strategy has been used for some time in data centers. For example, Binder and Suri (2009) presented a dispatch algorithm that concentrates services on a reduced number of servers so that inactive servers can be put into sleep mode to save energy in the data center. In the context of the IoT, this strategy is used as well and is known as active node selection. For example, Cecchinel et al. (2019) proposed determining an optimal configuration of sensors towards extending their battery life. Sarkar et al. (2016) proposed to reduce interactions among the nodes of a wireless sensor network and hence the network’s energy consumption. The data stream processing workflow proposed by de Oliveira et al. (2020) also includes active node selection. Active node selection can hence reduce energy consumption on some of the nodes of a PIS.
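The idea of concentrating services on few servers so that the remaining ones can sleep can be sketched with a generic first-fit decreasing heuristic. We stress that this is a textbook bin-packing heuristic used here as an assumption; it is not the dispatch algorithm of Binder and Suri (2009), and the `consolidate` function and the load values are hypothetical.

```python
def consolidate(service_loads, server_capacity):
    """Pack services onto as few servers as possible (first-fit decreasing).

    Loads and capacity are integers (e.g. percentages of one server).
    Returns one list of loads per server that must stay awake.
    """
    servers = []
    for load in sorted(service_loads, reverse=True):
        if load > server_capacity:
            raise ValueError("a service exceeds the capacity of one server")
        for placed in servers:
            if sum(placed) + load <= server_capacity:
                placed.append(load)  # fits on an already active server
                break
        else:
            servers.append([load])   # wake up one more server
    return servers


active = consolidate([50, 20, 40, 30, 60], server_capacity=100)
sleeping = 5 - len(active)  # assuming a pool of five servers
```

Here five services that could each occupy one server end up on two active servers, leaving three candidates for sleep mode under the stated assumptions.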

The second requirement concerning energy introduced in Sect. 2.10 is energy-awareness. Energy-awareness may be provided at the middleware or the application level (i.e., knowledge shared through middleware abstractions with the application components). At the former level, energy-awareness may be used to constrain the system’s energy consumption through an energy budget configuration. For example, Padhy et al. (2017) proposed a middleware to minimize the total energy consumption of an IoT application while ensuring that the requested accuracy is met. The middleware intends to find the sensors that consume the least energy while satisfying the sensing requirements and maximizing the overall accuracy under an energy budget. For the latter level, we found some examples where applications express energy requirements (e.g., Song et al. 2017) for deployment purposes. However, middleware does not usually expose energy consumption to upper layers.
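A simplified view of sensor selection under an energy budget is given below. This sketch is an assumption-laden illustration and not the method of Padhy et al. (2017): the `select_sensors` function, the tuple layout, and the cheapest-first rule are hypothetical, and the sketch only enforces a minimum accuracy per sensor rather than optimizing the overall accuracy.

```python
def select_sensors(sensors, required_accuracy, needed, energy_budget):
    """Pick `needed` sensors meeting `required_accuracy` at the lowest energy.

    `sensors` is a list of (name, accuracy, energy_cost) tuples.
    Raises ValueError when the budget cannot cover the requirement.
    """
    eligible = [s for s in sensors if s[1] >= required_accuracy]
    # Cheapest first; among equally cheap sensors, prefer the most accurate.
    eligible.sort(key=lambda s: (s[2], -s[1]))
    chosen, spent = [], 0
    for name, _accuracy, cost in eligible:
        if len(chosen) == needed:
            break
        if spent + cost <= energy_budget:
            chosen.append(name)
            spent += cost
    if len(chosen) < needed:
        raise ValueError("requirements cannot be met within the energy budget")
    return chosen


candidates = [("s1", 0.95, 5), ("s2", 0.80, 1), ("s3", 0.92, 2), ("s4", 0.99, 4)]
selection = select_sensors(candidates, required_accuracy=0.9,
                           needed=2, energy_budget=6)
```

Sensor `s2` is the cheapest but is excluded because it does not reach the requested accuracy, which illustrates how the accuracy requirement constrains the energy optimization.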

Energy consumption is a recent concern for the community working on IoT middleware. Existing middleware has mainly addressed energy efficiency, reducing energy consumption only on some parts of the system. We found only a few middleware proposals providing energy-awareness to the upper layers.

4 PIS Middleware Proposals

We have been working on middleware for the IoT and PIS for some years. Different software is available in open source (Bouloukakis et al. 2022; Conan et al. 2022; Gomes et al. 2017b) and some of these proposals are presented below. Table 1 summarizes the requirements tackled by each of them.

4.1 QoC Management with QoCIM and Processing Functions

Considering the QoC criteria most frequently mentioned in the literature, we observe that no single set of criteria can respond to all the needs of applications, each having its own method for computing the quality of context information. We have therefore focused our attention on realizing a model able to represent any type of QoC criterion. This resulted in QoCIMFootnote 7 (Quality of Context Information Model) (Marie et al. 2013), a meta-model dedicated to modeling QoC criteria and enforcing expressiveness, computability, and genericity of QoC management. QoCIM offers a flexible approach, i.e., it defines a basis to design and represent any QoC criterion instead of providing a predefined list of supported QoC criteria. With QoCIM, a given QoC criterion can also be built upon other primitive or composed QoC criteria.

QoCIM is complemented with the specification and implementation of a set of functions for processing context information and its QoC metadata. The goal of these processing functions is to provide the developers of PIS with middleware programming facilities to efficiently process context information together with its associated QoC metadata. The functions manage three types of data: (1) context information sensed and collected from different sources; (2) QoC metadata modeled with QoCIM, each piece of QoC metadata corresponding to an instance of a QoC indicator; and (3) messages, each encapsulating a piece of context information associated with a list of QoC metadata. There are functions for aggregation, filtering, inference, and fusion of context information with QoC metadata. These functions can be configured to determine what computing method to use and to indicate the number of messages to be taken as input. The configurability of the functions is based on a declarative solution.

The aggregation function applies an aggregation operator onto a list of messages. The result is a message with the same abstraction level. The choice of the aggregation operator (arithmetical average, for instance) is specified in a configuration file. There is also a distinction between temporal aggregation and spatial aggregation. The former handles information coming from a single context source and produced during some time. The latter handles information coming from several context sources that periodically produce the same type of context information. The filtering function analyzes the message and decides to remove it or not, but the content of the message itself is never modified. The inference function applies an inference operator onto a list of messages. The result is only one message with a higher abstraction level. The fusion function executes a set of functions sequentially. The result is a list of messages with a higher abstraction level.
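The processing functions described above can be sketched in Python as follows. This is not the QoCIM implementation: the `Message` structure, the function names, the single `freshness` indicator, and the rule of keeping the lowest freshness of the inputs are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Message:
    """Context information plus QoC metadata (hypothetical structure)."""
    source: str
    kind: str      # e.g. "temperature" or "cold-chain-status"
    value: object
    qoc: dict      # indicator name -> value, e.g. {"freshness": 0.9}


def lowest_freshness(messages):
    # Assumption: the result is only as fresh as its least fresh input.
    return {"freshness": min(m.qoc["freshness"] for m in messages)}


def aggregate(messages):
    """Aggregation: many messages in, one message of the same kind out."""
    return Message("aggregate", messages[0].kind,
                   mean(m.value for m in messages), lowest_freshness(messages))


def keep(message, min_freshness):
    """Filtering: decide whether to keep a message, never modify it."""
    return message.qoc["freshness"] >= min_freshness


def infer(messages, max_celsius):
    """Inference: derive one message with a higher abstraction level."""
    ok = all(m.value <= max_celsius for m in messages)
    return Message("inference", "cold-chain-status",
                   "ok" if ok else "breach", lowest_freshness(messages))


# Temporal aggregation: one source, several readings over a time window.
window = [
    Message("sensor-1", "temperature", 4.0, {"freshness": 0.9}),
    Message("sensor-1", "temperature", 6.0, {"freshness": 0.8}),
    Message("sensor-1", "temperature", 9.0, {"freshness": 0.2}),  # stale
]
# Fusion: run filtering, aggregation, and inference in sequence.
fresh = [m for m in window if keep(m, min_freshness=0.5)]
average = aggregate(fresh)
status = infer([average], max_celsius=8.0)
```

In the actual approach, the choice of operator and the number of input messages are given declaratively in a configuration file; the sketch hard-codes them for brevity.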

QoC management must take place throughout the whole chain of processing context information. A declarative programming approach allows qualifying context information and self-adapting QoC management due to potential physical limitations of the processing entities (Marie et al. 2016).

4.2 muDEBS

Distributed event-based systems (DEBS) for broad IoT face unprecedented scales regarding the volume of exchanged data, number of participants, and communication distance. As many brokers may be involved, a large number of messages may be exchanged when installing subscription filters and, most importantly, when routing numerous events from producers to consumers. muDEBS Footnote 8 (Conan et al. 2017) takes advantage of the inherently heterogeneous nature of broad IoT systems to control and limit the amount of exchanged data. Some sources of heterogeneity, such as geographical and group membership heterogeneity, may delimit visibility scopes for data distribution, with notifications being visible only in certain scopes. More precisely, Fiege et al. (2002) define a scope as an abstraction that bundles a set of clients (producers and consumers) such that the visibility of notifications published by a producer is confined to the consumers belonging to the same scope as the producer; a scope can recursively be a member of other scopes. In muDEBS, filtering is impacted by the visibility of notifications that are analyzed according to several dimensions of scopes. A client advertises or subscribes by providing a filter tagged with a set of scopes, with at most one scope per dimension, e.g., interest in geographical scopes or areas belonging to end-users scopes or groups. A notification is visible to a client if it is visible in all the dimensions. In summary, muDEBS targets scalability by scoping the distribution of data between producers and consumers.
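The rule that a notification must be visible in all dimensions can be illustrated with the following Python sketch. It deliberately simplifies the visibility semantics: we assume that, within one dimension, a notification is visible when the subscriber's scope equals or encloses the producer's scope, and that a dimension used by only one side does not constrain visibility. The scope names and the `PARENTS`, `encloses`, and `visible` identifiers are hypothetical and are not part of muDEBS.

```python
# Hypothetical scope hierarchies: child scope -> parent scope, per dimension.
PARENTS = {
    "geo": {"warehouse-A": "Lyon", "Lyon": "France"},
    "membership": {"carrier-X-drivers": "carrier-X"},
}


def encloses(dimension, outer, inner):
    """True if `outer` equals `inner` or is one of its ancestors."""
    scope = inner
    while scope is not None:
        if scope == outer:
            return True
        scope = PARENTS[dimension].get(scope)
    return False


def visible(publication_scopes, subscription_scopes):
    """Check visibility in every dimension used by both sides.

    Both arguments map a dimension to at most one scope.
    """
    shared = publication_scopes.keys() & subscription_scopes.keys()
    return all(
        encloses(d, subscription_scopes[d], publication_scopes[d])
        for d in shared
    )


publication = {"geo": "warehouse-A", "membership": "carrier-X-drivers"}
seen_by_carrier_x = visible(publication,
                            {"geo": "France", "membership": "carrier-X"})
seen_by_carrier_y = visible(publication,
                            {"geo": "France", "membership": "carrier-Y"})
```

The second subscriber matches the geographical dimension but not the membership dimension, so the notification is not delivered to it; brokers can thus avoid forwarding data outside its scopes.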

IoT data can be exploited by pervasive applications to detect the users’ current situation and provide them with the relevant services corresponding to their precise needs. The threats to the users’ privacy thus become more apparent, and Chabridon et al. (2014) have shown that QoC and privacy are closely related and must be addressed together in order to find a workable solution. As a first step, Lim et al. (2015) identified models for a first set of attributes to be specified in privacy policies, namely purpose (intention of use), visibility (who has access), and retention (for how long data may be retained). Following these models, IoT producers specify privacy requirements and QoC guarantees in producer context contracts that are then registered in muDEBS as XACML policies.Footnote 9 On their side, IoT consumers express their QoC requirements and the privacy guarantees that they are committed to fulfilling in consumer context contracts, mentioning at least for what purpose they are requesting access to some specific IoT data. Privacy guarantees take the form of ABAC information registered with the subscription filters. QoC guarantees and requirements are expressed by following the QoCIM model (see Sect. 4.1). As a second step, Denis et al. (2020) studied confidentiality under the semi-trusted broker assumption in which brokers are considered honest-but-curious, i.e., brokers route the publications to the interested consumers, but they can make use of the data for their own interest. More precisely, confidentiality concerns encompass (1) part or all of the constraints of the subscriptions, (2) part or all of the information in the publication that is used for routing against subscriptions, and (3) the payload of the publications. The solution proposed in muDEBS adapts an existing attribute-based encryption scheme and combines it with data splitting, a non-cryptographic method used to alleviate the cost of encrypted matching. Data splitting enables forming groups of attributes sent apart over several independent broker networks. It also prevents the identification of an end-user, and only attributes are encrypted to prevent data leakage.

4.3 DeX Mediators

IoT devices employ middleware-layer protocols such as MQTT, CoAP, ZeroMQ, and more to interact with each other. These protocols support different Quality of Service (QoS) semantics. They define multiple data-serialization formats (e.g., JSON, XML, protobuf, etc.) and different payloads suitable for constrained or healthy devices and follow different interaction patterns such as request-reply and publish-subscribe. IoT systems include heterogeneous IoT devices employing any of those protocols. In many cases, new heterogeneous IoT devices may be added to an IoT system in an on-demand fashion. For instance, in the logistic chain traceability scenario, IoT devices in the shipment must interact with the services of the IS that dynamically collect information for decision-making purposes. Therefore, generic, automated solutions must enable data exchange in such IoT systems.

The Data eXchange Mediator Synthesizer (DeXMS)Footnote 10 (Bouloukakis et al. 2019) addresses the heterogeneity of IoT devices and services by synthesizing software mediators. As depicted in Fig. 1, DeXMS relies on the Data eXchange (DeX) API, which implements POST and GET primitives for sending/receiving messages using existing IoT protocols such as CoAP, MQTT, XMPP, etc. In the illustration, the mediator converts temperature data sent by a package (in JSON format through the HTTP protocol) so that it can be received by an IS dashboard (in XML format through the MQTT protocol). Considering a set of heterogeneous IoT devices that have to interconnect with devices deployed in an IoT system, DeXMS accepts their input/output data representation models as input and synthesizes the required mediators. With respect to the requirements defined in Sect. 2, DeXMS provides a semi-automated way to tackle interoperability among devices employing middleware-layer protocols (classified into diverse interaction patterns). Regarding the application layer, DeXMS enables developers to manually perform data mappings between application semantics. More details on DeXMS can be found in the work of Bouloukakis et al. (2019).

Fig. 1 Enabling data exchange via mediators
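The data conversion performed by a mediator in the temperature scenario can be sketched as follows. This hand-written Python sketch covers only the JSON-to-XML conversion and abstracts the protocols away; it does not use the DeX API or the code generated by DeXMS. The `json_to_xml` function, the `Mediator` class, the `reading` root tag, and the payload fields are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_payload, root_tag="reading"):
    """Convert a flat JSON object into XML, with one child element per key."""
    data = json.loads(json_payload)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


class Mediator:
    """Receives a payload on one side, converts it, forwards it on the other."""

    def __init__(self, convert, forward):
        self._convert = convert
        # `forward` stands in for a protocol-specific send primitive,
        # e.g. a publish operation of an MQTT client.
        self._forward = forward

    def on_message(self, payload):
        self._forward(self._convert(payload))


outbox = []  # plays the role of the dashboard side in this sketch
mediator = Mediator(json_to_xml, outbox.append)
mediator.on_message('{"shipment": "42", "celsius": 7.5}')
```

Keeping the conversion function separate from the forwarding primitive mirrors the separation between data representation and protocol, so that either side can be replaced independently.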

4.4 QoDisco

A pervasive context encompasses a distributed plethora of heterogeneous resources (sensors, actuators, services) with different functionalities and communication protocols. In this scenario, a well-known challenge for both machines and users is finding, selecting, and using these resources. Discovery services play a significant role in addressing this issue by enabling clients (applications, middleware, end-users) to retrieve available resources based on complex search criteria that take into account the contextual information essential in a pervasive environment.

QoDiscoFootnote 11 is a QoC-aware federated discovery service supporting multiple-attribute searches, range queries, and synchronous/asynchronous operations. It encompasses an ontology-based information model for semantically describing resources, services, and QoC-related information. QoDisco is structured upon a distributed architecture composed of a federation of autonomous repositories cooperating with each other to perform data and service discovery tasks. It provides an API to perform discovery tasks in such repositories, and each repository provides operations for querying and updating records. Clients are responsible for semantically annotating resource data (such as the ones provided by sensors) by using the concepts of the QoDisco information model. When receiving a discovery request, QoDisco searches for resource descriptions or data stored in the available repositories, thereby hiding the heterogeneity.
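A multiple-attribute search with range queries over a federation of repositories can be sketched as follows. This in-memory Python sketch is not the QoDisco API and ignores the ontology-based information model: the `Repository` and `Federation` classes, the record attributes, and the query parameters are hypothetical.

```python
class Repository:
    """One autonomous repository holding resource descriptions."""

    def __init__(self, records):
        self._records = list(records)  # each record is a dict of attributes

    def query(self, exact=None, ranges=None):
        """Return records matching all exact values and all [low, high] ranges."""
        exact, ranges = exact or {}, ranges or {}
        return [
            r for r in self._records
            if all(r.get(k) == v for k, v in exact.items())
            and all(k in r and low <= r[k] <= high
                    for k, (low, high) in ranges.items())
        ]


class Federation:
    """Single entry point that hides how many repositories are queried."""

    def __init__(self, repositories):
        self._repositories = repositories

    def discover(self, exact=None, ranges=None):
        results = []
        for repository in self._repositories:
            results.extend(repository.query(exact, ranges))
        return results


repo_shipper = Repository([
    {"id": "t1", "type": "temperature", "accuracy": 0.95},
    {"id": "h1", "type": "humidity", "accuracy": 0.99},
])
repo_carrier = Repository([
    {"id": "t2", "type": "temperature", "accuracy": 0.70},
    {"id": "t3", "type": "temperature", "accuracy": 0.92},
])
federation = Federation([repo_shipper, repo_carrier])
found = federation.discover(exact={"type": "temperature"},
                            ranges={"accuracy": (0.9, 1.0)})
```

The client issues one request combining an exact attribute and a QoC range, and the federation collects the answers of each repository on its behalf.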

The semantic description of resources defined by the QoDisco information model relies on: (1) the SAN ontology (Spalazzi et al. 2014), an extension of the W3C’s SSN ontology (Barnaghi et al. 2011) that provides concepts, attributes, and properties to model both sensors and actuators; (2) part of the SOUPA ontology (Chen et al. 2005) aiming at including location-related concepts to describe spatial locations of entities in terms of latitude, longitude, altitude, distance, and surface, as well as symbolic representations of space and spatial relationships; (3) the OWL-S ontology for semantically modeling services exposed by the resources; and (4) part of the QoCIM meta-model (Marie et al. 2013) to describe QoC-related concerns (see Sect. 4.1). This information model supports the QoC management requirement and tackles data format heterogeneity using ontologies.

Due to the dynamic context in which the IoT resources operate, QoDisco handles both synchronous calls and asynchronous notifications. The former relies on request-reply interactions towards providing resource information at the moment of the search. The latter is based on publish-subscribe interactions to notify clients in case of resource removal, insertion, or update. More details on QoDisco can be found in the work of Gomes et al. (2019).

4.5 IoTVar

IoTVar Footnote 12 is a middleware that provides developers with abstractions for IoT variables. From a variable declaration, IoTVar automatically discovers matching data-producer objects and transparently handles updates to these variables by interacting with the underlying IoT systems. IoTVar offers an abstraction level to interact with virtualized sensors. It drastically reduces the number of lines of code to be written by the client application developer to obtain up-to-date sensor data, from several hundred lines of code to about a dozen.

The IoTVar architecture has been designed to integrate new IoT platforms and IoT systems. For this purpose, it exposes an interface that can easily be implemented for integrating with new platforms. The architecture focuses not only on developing IoT applications but also on extending the middleware to support multiple IoT platforms. IoTVar is currently integrated with the FIWARE, OneM2M (oneM2M Partners 2019), and muDEBS (Conan et al. 2017) IoT platforms. More details on IoTVar can be found in the work of Borges et al. (2019).

IoTVar responds to some of the previously mentioned PIS requirements. The multiple IoT platforms supported by IoTVar have different data models and API access and use different protocols to retrieve sensor data. For the sake of interoperability, IoTVar includes data unmarshallers, IoT protocol handlers, and IoT API handlers, and it supports both publish-subscribe and request-reply interaction patterns, to be chosen according to efficiency considerations. IoTVar also supports application development by providing an API accessible through code in the Java programming language and enabling IoT developers to access sensor data easily. The developer declares environment variables by providing a simple IoT variable declaration. Those IoT variables are then automatically updated.
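The notion of a variable that is kept up to date by platform-specific handlers can be sketched as follows. IoTVar itself exposes a Java API; this Python sketch is only an analogy written for illustration and does not reproduce that API. The `IoTVariable` and `FakePlatformHandler` classes, their methods, and the variable name are hypothetical.

```python
class IoTVariable:
    """Holds the latest value of a sensor stream and notifies observers."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self._observers = []

    def on_update(self, callback):
        self._observers.append(callback)

    def _update(self, value):
        # Called by a platform handler, never by the application itself.
        self.value = value
        for callback in self._observers:
            callback(value)


class FakePlatformHandler:
    """Stand-in for a platform-specific handler (protocol and unmarshalling)."""

    def __init__(self):
        self._bound = {}

    def bind(self, variable):
        self._bound[variable.name] = variable

    def deliver(self, name, raw_payload):
        # Unmarshal the platform-specific payload, then refresh the variable.
        self._bound[name]._update(float(raw_payload))


# Application side: declare the variable and react to its updates.
handler = FakePlatformHandler()
temperature = IoTVariable("shipment-42/temperature")
handler.bind(temperature)
seen = []
temperature.on_update(seen.append)

# Platform side: a new reading arrives, whatever the underlying protocol.
handler.deliver("shipment-42/temperature", "7.5")
```

The application code only declares the variable and registers a callback; everything related to protocols and data formats stays behind the handler, which is the part that would be implemented once per IoT platform.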

5 Open Challenges for Future PIS Middleware

Next-generation PIS are deployed at an unprecedented scale with components on connected mobile devices and remote servers in cloud and fog intermediaries. In this context, handling requirements from an end-to-end perspective is challenging. At the same time, mastering requirements such as privacy and sustainability becomes essential and even more complex. This section highlights some open challenges that can motivate research on future PIS middleware.

5.1 Enabling End-to-End Interoperability

As mentioned in Sect. 3.3, existing middleware approaches enable interoperability at each layer (i.e., application, middleware, network) independently. However, enabling IoT interoperability requires introducing end-to-end approaches. This is challenging due to: (1) the difficulty of selecting a unique data model and IoT protocol to develop cross-domain IoT applications, which results in composing multiple IoT protocols and data models; (2) the existence of numerous IoT protocols to support diverse types of devices (healthy/constrained/tiny in terms of resources); (3) the diversity of data models to cover multiple application domains (healthcare, autonomous driving, etc.); and (4) the fact that end-to-end approaches are usually developed for specific application domains (e.g., smart buildings) and are difficult to adapt to other domains. Therefore, advanced end-to-end interoperability approaches must be introduced while considering those challenges.

5.2 PIS Adaptive Middleware

Previous research on the so-called adaptive middleware can indeed contribute to supporting dynamic adaptation in PIS, including proposals on context-aware applications (Huebscher and McCann 2006), ubiquitous computing (Yau and Karim 2004), wireless sensor networks (Portocarrero et al. 2016), IoT (Cavalcanti et al. 2021), cyber-physical systems (García-Valls and Baldoni 2015), and cloud computing (Rafique et al. 2017). Adaptive middleware can be defined as a kind of middleware that enables modifying the behavior of a distributed application in response to changes in requirements or operating conditions (Sadjadi and McKinley 2003). To the best of our knowledge, the literature has still not explored building adaptive middleware to support PIS and provide these systems with dynamic adaptation capabilities.

Designing adaptive middleware needs to consider some 5W1H (What? Who? Where? When? Why? How?) issues typically associated with self-adaptive software systems (Salehie and Tahvildari 2009). It is necessary to understand (1) the need for adapting the middleware to changes in application requirements and context, (2) the time at which the adaptation needs to be triggered, whether proactively or reactively, (3) the extent of the adaptation in terms of how many components should be subjected to the adaptation, and (4) how the adaptation actions can be executed and implemented (Rosa et al. 2020). Designing PIS middleware with adequate support for dynamic adaptation should hence cope with these issues.

5.3 Support to Develop PIS Relying on Middleware

Middleware platforms are widely acknowledged to ease the development of distributed applications, but this does not seem to be the case for PIS yet. Indeed, there is still no available programming model for PIS that relies on middleware while coping with the characteristics of this class of systems. Biegel and Cahill (2015) highlight that existing solutions and approaches in the literature cannot currently address the requirements for PIS middleware comprehensively, but rather only a subset of them. The authors also point out the significant effort required from application developers to deal with these requirements, an issue that hampers a broader adoption of PIS middleware in industrial settings. Therefore, a programming model able to ease the development of PIS relying on middleware is desirable.

The development of PIS relying on middleware faces other challenges. The proliferation of physical devices and platforms to support PIS may lead these systems to become primarily vendor/platform- and hardware-specific (Taivalsaari and Mikkonen 2017). This may also make it difficult to find the most suitable solution (or set of solutions) for a specific application, a difficulty aggravated by users' lack of experience and knowledge in understanding the implications for current and future needs. PIS middleware should hence enable applications to benefit from using different devices and platforms while relieving developers from dealing with their specificities through proper high-level abstractions.

5.4 Privacy and Security

Security for PIS is still a significant challenge, as attacks are relatively easy in an open, connected world. Many devices were not designed with security in mind; their high number increases the attack surface, and their integration within the Internet exposes them to numerous potential attackers. We underline some specific areas where research challenges need to be addressed by PIS middleware in the short term: (1) the need for low-cost cryptography primitives suitable for devices with limited resources; (2) security analysis of new low-power wireless wide area network technologies; and (3) the need for frameworks and protocols to facilitate the development of devices where security is considered from the design stage.

Regarding privacy, our connected world has allowed unprecedented growth in personal data collection practices, with intrusions into our private lives. The lack of transparency, the fact that many services and devices behave like black boxes, and the lack of user control raise major research challenges to enable PIS middleware to enforce data protection and privacy patterns. In addition, robust anonymization, which effectively resists deanonymization attacks while preserving data utility, remains an open research topic.

With resource-constrained devices and sustainability objectives, resource consumption of security and privacy solutions is gaining importance. We consider that this also opens some new research directions where concerns for security, privacy, and sustainability can be addressed jointly in PIS middleware.

5.5 Context Data Sampling and Filtering

As discussed in Sect. 3.5, many contributions exist that enable scaling PIS solutions deployed in clouds. One of the next challenges is scaling PIS deployed in highly distributed environments, such as connected mobile devices with fog intermediaries. There, the contextual data filtering module of a PIS middleware should strive to increase the system's scalability by controlling and reducing the amount of transmitted data. Giouroukis et al. (2020b) classify filtering techniques into (1) time-based, i.e., sending data is suppressed until certain time conditions become true, and (2) change-based, i.e., sending data is suppressed as long as the contextual data are equal or similar to those previously transmitted. Any combination of time-based and change-based filtering techniques is possible. For example, in the illustrative logistic chain traceability system, some applications may request to receive location updates only if the new location is not identical to the previous one and if an interval of at least 10 min has elapsed.
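The location-update example can be sketched in a few lines of Python. This is only an illustration of combining a time-based and a change-based condition, not a filter taken from any existing middleware; the class name `LocationFilter`, its fields, and the choice of passing timestamps in seconds are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude); hypothetical representation


@dataclass
class LocationFilter:
    """Hypothetical filter combining a time-based and a change-based condition."""
    min_interval_s: float = 600.0          # 10 min, as in the example above
    last_sent_at: Optional[float] = None   # timestamp of the last forwarded update
    last_location: Optional[Location] = None

    def should_send(self, location: Location, now_s: float) -> bool:
        if self.last_sent_at is None:
            send = True  # nothing sent yet: always forward the first update
        else:
            changed = location != self.last_location                     # change-based
            waited = now_s - self.last_sent_at >= self.min_interval_s    # time-based
            send = changed and waited
        if send:
            self.last_sent_at = now_s
            self.last_location = location
        return send
```

An update is forwarded only when both conditions hold; replacing `and` with `or` would give the other natural combination, in which either a change or an elapsed interval is enough.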

Adaptive sampling is closely related to adaptive filtering. For instance, tuning the sensor sampling frequency helps optimize network usage and can be performed according to the frequency of requests from deployed software applications. As another significant outcome, the PIS middleware becomes a self-adaptive platform that extends sensor battery life while ensuring good data quality and freshness.
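As a minimal sketch of this idea, the sampling period of a sensor could follow the most demanding application request, within bounds set by the device. The function name, the default bounds, and the policy itself (follow the shortest requested period) are assumptions chosen for illustration.

```python
from typing import Iterable


def sampling_period_s(requested_periods_s: Iterable[float],
                      min_period_s: float = 1.0,
                      max_period_s: float = 3600.0) -> float:
    """Pick a sensor sampling period from the periods requested by applications.

    Hypothetical policy: sample as often as the most demanding application
    needs, but never outside [min_period_s, max_period_s].
    """
    periods = list(requested_periods_s)
    if not periods:
        return max_period_s  # no application is asking: sample as rarely as allowed
    fastest = min(periods)
    return max(min_period_s, min(fastest, max_period_s))
```

With requests of 600 s and 60 s, the sensor samples every 60 s; once the 60 s request is withdrawn, the period relaxes to 600 s, which saves both radio transmissions and battery.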

Taken together, selection-based filtering of publish-subscribe systems enables the system to limit dissemination to some scopes, in contrast to system-wide dissemination. Context-based filtering uses context data of different context dimensions to route IoT data at the application layer. Adaptive filtering, in turn, enables the system to decide whether some IoT data are worth passing on to intermediaries, depending on whether a sensor value is similar to previous values or evolves predictably. These issues are still not solved and are a fruitful area for future research.
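One possible reading of "similar to previous values or evolves predictably" is a filter that suppresses a reading when it lies within a tolerance of a linear extrapolation from the last two forwarded readings. The sketch below follows that reading; the class name `PredictiveFilter`, the linear model, and the tolerance parameter are assumptions, and timestamps are assumed to be strictly increasing.

```python
from typing import List, Optional, Tuple


class PredictiveFilter:
    """Hypothetical adaptive filter: forward a reading only if it is not predictable."""

    def __init__(self, tolerance: float) -> None:
        self.tolerance = tolerance
        self.sent: List[Tuple[float, float]] = []  # last two forwarded (time, value) pairs

    def _predict(self, t: float) -> Optional[float]:
        if not self.sent:
            return None                   # no history: nothing to predict from
        if len(self.sent) == 1:
            return self.sent[0][1]        # one point: assume the value is unchanged
        (t0, v0), (t1, v1) = self.sent
        slope = (v1 - v0) / (t1 - t0)     # assumes strictly increasing timestamps
        return v1 + slope * (t - t1)      # linear extrapolation

    def should_forward(self, t: float, value: float) -> bool:
        predicted = self._predict(t)
        if predicted is not None and abs(value - predicted) <= self.tolerance:
            return False                  # an intermediary could infer this value
        self.sent = (self.sent + [(t, value)])[-2:]
        return True
```

For a temperature sensor on a shipment, a slow and steady drift produces few forwarded readings, while an abrupt deviation from the trend is forwarded immediately.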

5.6 PIS Sustainability

In the last decade, the number of existing PIS has grown, bringing new facilities for end-users and a rising demand for computing power. However, sustainability in IT is now a first-class concern for enterprises. PIS middleware designers have to take this demand into account.

As seen in Sect. 3.6, many strategies have been proposed so far by middleware targeting energy efficiency. However, those strategies mainly target one of the components of the system. Considering energy efficiency at the scale of the whole system is still a challenge.

Even though middleware eases the task of application developers when dealing with energy efficiency, a developer may face difficulties in evaluating the energy consumption of the system. An important research direction to foster energy efficiency in PIS is providing energy-awareness at the middleware level. Techniques such as static code analysis (Vekris et al. 2012) and profilers that detect software energy and performance bugs (Nistor and Ravindranath 2014) have been proposed in recent years to ease the identification of energy-consuming components at development time. Energy-awareness may also be provided at runtime through abstractions expressing energy requirements and evaluating energy consumption. These abstractions, based on measures and energy consumption models, have yet to be integrated in middleware. We believe that energy-awareness may significantly increase the efficiency of the systems, as it brings a broader view of where and how the many resources (CPU, network, energy, etc.) used by an application behave in terms of energy consumption.

6 Conclusion

In this chapter, we have considered PIS middleware in the context of the IoT. This middleware provides applications with an easy integration of context data collected from connected objects spread over the Internet. This context comes with new challenges and requirements. In addition to context-awareness, middleware should tackle scalability, privacy, and interoperability, provide applications with new abstractions representing the physical environment, and ensure the quality of the data that may be used for decision-making.

We have shown through the state of the art that middleware research has proposed semantic interoperability for handling heterogeneities and large-scale publish-subscribe architectures to tackle scalability. However, while middleware has already enabled new kinds of PIS in various domains such as transport traceability, healthcare, and smart cities, the middleware community still faces new challenges, such as providing a high-level programming model for PIS, supporting PIS dynamic adaptation, disseminating and filtering large volumes of data, handling end-to-end privacy and interoperability, and enabling the deployment of sustainable applications.