Stream and Event Processing Services
The goal of Real-time Linked Dataspaces is to support real-time responses from intelligent systems to situations of interest within a smart environment. They do so by providing data processing support services that follow the data management philosophy of dataspaces and meet the requirements of real-time data processing. This part of the book details the support services for processing streaming and event data that support loose semantic integration and administrative proximity. These services include entity-centric queries, quality-of-service-aware service composition, stream dissemination, and semantic matching for heterogeneous events. Their aim is to let a real-time linked dataspace get up and running with low administrative setup costs (e.g. establishing data agreements, service selection, and service composition).
Driven by the adoption of the Internet of Things (IoT), smart environments are enabling data-driven intelligent systems that are transforming our everyday world. To support the interconnection of intelligent systems in the data ecosystem that surrounds a smart environment, there is a need to enable the sharing of data among systems. A data platform can provide a clear framework to support the sharing of data among a group of intelligent systems within a smart environment.
Dynamic data from sensors and IoT devices makes up a significant portion of the data generated in a smart environment. Responding to this trend requires a data platform to provide support services designed specifically for real-time data sources. These services must keep to the dataspace philosophy: they must coexist with the source systems, co-evolve over time, and ensure that a rigid data management approach does not subsume the source systems. Within the dataspace paradigm, data management pushes the boundaries of traditional databases in two main dimensions:
- Administrative Proximity: how close or far apart the data sources within a space of interest are in terms of control, and
- Semantic Integration: the degree to which the data schemas within the data management system are matched up.
We have created the Real-time Linked Dataspace (RLD) as a data platform for intelligent systems within smart environments. The RLD combines the pay-as-you-go paradigm of dataspaces with linked data and real-time stream and event processing capabilities to support a large-scale distributed heterogeneous collection of streams, events and data sources. The RLD follows a set of principles to describe the specific requirements within a real-time setting:
- An RLD must deal with many different formats of streams and events.
- An RLD does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
- Queries in the RLD are provided on a best-effort and approximate basis.
- The RLD must provide pathways to improve the integration between the data sources, including streams and events, in a pay-as-you-go fashion.
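The principles above can be illustrated with a minimal sketch. It is not the RLD implementation: the `Source` and `Dataspace` classes, the `native_query` callables, and the sample sensor data are all hypothetical names and values invented for this example. The sketch assumes each engine remains reachable through its own interface, that queries skip sources that fail (best effort), and that mappings between source-specific keys and a shared vocabulary are added only when someone needs them (pay-as-you-go).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical record type for illustration only.
Record = dict


@dataclass
class Source:
    """A stream or event engine that keeps its own native interface."""
    name: str
    native_query: Callable[[str], Iterable[Record]]  # engine-specific access
    # Mappings are added incrementally: native key -> shared key.
    mappings: dict = field(default_factory=dict)

    def normalise(self, record: Record) -> Record:
        # Keys without a mapping pass through unchanged.
        return {self.mappings.get(k, k): v for k, v in record.items()}


class Dataspace:
    """Loose wrapper around sources; it does not replace their native APIs."""

    def __init__(self):
        self.sources = []

    def register(self, source: Source) -> None:
        self.sources.append(source)

    def query(self, term: str, key: str) -> list:
        """Best-effort query: return what can be found, skip failing sources."""
        results = []
        for src in self.sources:
            try:
                for rec in src.native_query(term):
                    rec = src.normalise(rec)
                    if key in rec:
                        results.append((src.name, rec[key]))
            except Exception:
                continue  # best effort: an unavailable source is not fatal
        return results


# Assumed sample sources with different formats.
def sensor_a(term):
    return [{"temperature": 21.5}]


def sensor_b(term):
    return [{"temp_c": 19.0}]


def offline_engine(term):
    raise ConnectionError("engine offline")


space = Dataspace()
a = Source("a", sensor_a)
b = Source("b", sensor_b)
c = Source("c", offline_engine)
for s in (a, b, c):
    space.register(s)

# Before any integration work, only source "a" answers in the shared vocabulary.
before = space.query("room1", "temperature")

# Pay-as-you-go: add one mapping when it becomes worth the effort.
b.mappings["temp_c"] = "temperature"
after = space.query("room1", "temperature")
```

In this sketch the answer improves from one source to two after a single mapping is added, while the offline engine is silently skipped and every source's `native_query` remains directly callable.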
To apply these principles to the real-time processing of events and streams, we explore new techniques to support approximate and best-effort stream and event processing services within an RLD-Support Platform (RLD-SP). The RLD-SP services support many data formats, do not depend on prior agreement for composition or dissemination, and provide best-effort quality of service and approximate answers using a pay-as-you-go approach. As shown in the figure above, the stream and event processing services provided by the RLD-SP are:
- Entity-centric Index: The entity-centric index supports unified queries across live streams, historical streams, and entity data, giving full entity-centric views of the current and past state of the smart environment.
Further details available in: Ch 10. Stream and Event Processing Services for Real-time Linked Dataspaces
- Complex Event Processing: Individual event services within a smart environment, and compositions of those services, can have different quality-of-service levels. An RLD-SP must support quality-of-service-aware complex event service composition to maximise the level of service available.
Further details available in: Ch 11. Quality of Service-Aware Complex Event Service Composition in Real-time Linked Dataspaces
- Stream Dissemination: A key challenge for an RLD-SP is to disseminate events and streams to relevant data consumers efficiently. The dataspace must facilitate machine-to-machine communications to build an efficient stream dissemination system for a smart environment.
Further details available in: Ch 12. Dissemination of Internet of Things Streams in a Real-time Linked Dataspace
- Approximation: An RLD-SP needs to support the processing of heterogeneous events. Semantic event matchers are one approach to handling data heterogeneity within real-time events when few or no prior agreements exist.
Further details available in: Ch 13. Approximate Semantic Event Processing in Real-time Linked Dataspaces
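To make the idea of approximate semantic matching more concrete, the following is a toy sketch and not the matcher described in Ch 13. It assumes a small hand-written table of term relatedness scores (`RELATED`, with invented values), where a real matcher would more likely draw on a thesaurus or a distributional semantics model. The function names, the scoring formula, and the `THRESHOLD` value are also assumptions made for illustration. The point is that a subscription and an event can be matched with a score, even though their publishers and subscribers never agreed on a shared vocabulary.

```python
# Toy relatedness scores between terms (assumed values, illustration only).
RELATED = {
    frozenset({"energy", "power"}): 0.8,
    frozenset({"usage", "consumption"}): 0.9,
    frozenset({"laptop", "computer"}): 0.85,
}

THRESHOLD = 0.7  # assumed cut-off for treating a score as a match


def term_similarity(a: str, b: str) -> float:
    """1.0 for identical terms, a table lookup otherwise, 0.0 if unknown."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return RELATED.get(frozenset({a, b}), 0.0)


def phrase_similarity(p: str, q: str) -> float:
    """Average, over the words in p, of the best similarity to any word in q."""
    pw, qw = p.split(), q.split()
    if not pw or not qw:
        return 0.0
    return sum(max(term_similarity(x, y) for y in qw) for x in pw) / len(pw)


def match_score(subscription: dict, event: dict) -> float:
    """Score in [0, 1]: mean, over subscription predicates, of the best
    attribute-and-value similarity found among the event's attributes."""
    if not subscription:
        return 0.0
    total = 0.0
    for sub_attr, sub_val in subscription.items():
        best = max(
            (
                phrase_similarity(sub_attr, ev_attr)
                * phrase_similarity(str(sub_val), str(ev_val))
                for ev_attr, ev_val in event.items()
            ),
            default=0.0,
        )
        total += best
    return total / len(subscription)


def is_match(subscription: dict, event: dict) -> bool:
    return match_score(subscription, event) >= THRESHOLD


# The subscriber and the publisher use different words for the same thing.
subscription = {"type": "energy usage", "device": "computer"}
related_event = {"type": "power consumption", "device": "laptop"}
unrelated_event = {"type": "door opened", "device": "sensor"}

related_result = is_match(subscription, related_event)
unrelated_result = is_match(subscription, unrelated_event)
```

Here the related event is accepted and the unrelated one is rejected, although neither event uses the subscription's exact terms. Lowering or raising `THRESHOLD` trades recall against precision, which is the kind of approximate, best-effort answer the services above aim to provide.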
Each of these support services is designed following a tiered model for service provision. This means that it can provide incremental support for participants in the RLD in a “pay-as-you-go” fashion.
| Tier | Entity-centric Index | Complex Event Processing | Stream Dissemination | Semantic Approximation |
| | Basic processing | Single stream | None | Semantic matching |
| | Historical views of streams | Multi-service composition | Point-to-point | Thematic matching |
| | Stream enrichment with context and entity data | Quality of service aware service composition | Wireless broadcast | Entity-centric matching |
| ☆☆☆☆☆ Full Semantic Integration, Search, and Query | Entity-centric real-time query | Context-aware | Complex patterns | Context-aware |