Stream and Event Processing Services for Real-time Linked Dataspaces

The goal of Real-time Linked Dataspaces is to support a real-time response from intelligent systems to situations of interest within a smart environment. They do so by providing data processing support services that follow the data management philosophy of dataspaces and meet the requirements of real-time data processing. This part of the book details support services for processing streaming and event data under loose semantic integration and loose administrative proximity. The services include entity-centric queries, quality of service composition, stream dissemination, and semantic matching for heterogeneous events. Their purpose is to help a Real-time Linked Dataspace get up and running with low administrative setup costs (e.g. establishing data agreements, service selection, and service composition).

Driven by the adoption of the Internet of Things (IoT), smart environments are enabling data-driven intelligent systems that are transforming our everyday world. To support the interconnection of intelligent systems in the data ecosystem that surrounds a smart environment, there is a need to enable the sharing of data among systems. A data platform can provide a clear framework to support the sharing of data among a group of intelligent systems within a smart environment.

Dynamic data from sensors and IoT devices makes up a significant portion of the data generated in a smart environment. Responding to this trend requires a data platform to provide specific support services designed to work with real-time data sources. These services must stay in keeping with the dataspace philosophy: the source systems must be able to coexist and co-evolve over time, and a rigid data management approach must not subsume them. Within the dataspace paradigm, data management pushes the boundaries of traditional databases in two main dimensions:

  1. Administrative Proximity: how close or far apart the data sources within a space of interest are in terms of control, and
  2. Semantic Integration: the degree to which the data schemas within the data management system are matched up.

We have created the Real-time Linked Dataspace (RLD) as a data platform for intelligent systems within smart environments. The RLD combines the pay-as-you-go paradigm of dataspaces with linked data and real-time stream and event processing capabilities to support a large-scale distributed heterogeneous collection of streams, events and data sources. The RLD follows a set of principles to describe the specific requirements within a real-time setting:

  • An RLD must deal with many different formats of streams and events.
  • An RLD does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
  • Queries in the RLD are provided on a best-effort and approximate basis.
  • An RLD must provide pathways to improve the integration between the data sources, including streams and events, in a pay-as-you-go fashion.
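
The principles above can be illustrated with a minimal sketch. It is not the RLD implementation: the `Source` class, the `best_effort_query` function, and the example feeds are all hypothetical names introduced here. The sketch assumes each source keeps its own native access function, optionally gains a mapping to shared keys over time, and may fail without stopping the query.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Source:
    """Hypothetical stream or event source that keeps its native interface."""
    name: str
    fetch: Callable[[], Iterable[Any]]   # native access, not replaced by the dataspace
    to_common: Callable[[Any], dict]     # mapping to shared keys, added pay-as-you-go


def best_effort_query(sources: List[Source], entity: str) -> Dict[str, list]:
    """Collect records about one entity from whichever sources respond."""
    results, skipped = [], []
    for src in sources:
        try:
            for raw in src.fetch():
                record = src.to_common(raw)
                if record.get("entity") == entity:
                    results.append({**record, "source": src.name})
        except Exception:
            # Best effort: a failing source is reported, not fatal.
            skipped.append(src.name)
    return {"results": results, "skipped": skipped}


def broken_feed():
    """Simulates a source that is currently unreachable."""
    raise ConnectionError("sensor offline")


# Three sources in different formats: dicts, tuples, and one that fails.
sources = [
    Source("json_meter", lambda: [{"entity": "room1", "kwh": 2.5}], lambda r: r),
    Source("csv_meter", lambda: [("room1", 21.0), ("room2", 19.5)],
           lambda r: {"entity": r[0], "temp_c": r[1]}),
    Source("offline", broken_feed, lambda r: r),
]
answer = best_effort_query(sources, "room1")
```

Here the answer is approximate in the sense that it covers only the sources that responded, and the `skipped` list tells the caller which ones were left out.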

Figure: Support for stream and event processing in the Real-time Linked Dataspace

To put these principles into practice for real-time processing of events and streams, we explore new techniques for approximate and best-effort stream and event processing services within an RLD Support Platform (RLD-SP). The RLD-SP services support many data formats, do not depend on prior agreement for composition or dissemination, and provide a best-effort quality of service and approximate answers using a pay-as-you-go approach. As shown in the figure above, the stream and event processing services provided by the RLD-SP are entity-centric real-time query, complex event processing service composition, linked data stream dissemination, and approximate semantic event matching.

Figure: The four layers of the entity-centric real-time query service (Curry et al., 2019)
Figure: Overview of an event service network (Gao et al., 2017)
Figure: Overview of linked data stream dissemination service (Qin, Sheng and Curry, 2015)
Figure: Models and elements in the approximate semantic event processing model (Hasan and Curry, 2014)

Each of these support services follows a tiered model of service provision, so each can offer incremental support to participants in the RLD in a “pay-as-you-go” fashion. The tiers are summarized below by star rating.

Pay-as-you-go Star Rating | Entity-centric Index | Complex Event Processing | Stream Dissemination | Semantic Approximation
--------------------------|----------------------|--------------------------|----------------------|-----------------------
☆ Basic | None | None | None | None
☆☆ Machine-Readable | Basic processing | Single stream | None | Semantic matching
☆☆☆ Basic Integration | Historical views of streams | Multi-service composition | Point-to-point | Thematic matching
☆☆☆☆ Advanced Integration | Stream enrichment with context and entity data | Quality of service aware service composition | Wireless broadcast | Entity-centric matching
☆☆☆☆☆ Full Semantic Integration, Search, and Query | Entity-centric real-time query | Context-aware | Complex patterns | Context-aware
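
As a rough illustration of how the tiered model might be consulted in software, the sketch below encodes the star-rating table as a lookup structure. The cell values are copied from the table in the order they appear there; the names `SERVICES`, `TIERS`, `support_at`, and `upgrades` are hypothetical and are not part of the RLD-SP.

```python
from typing import Dict, List, Tuple

# Column headings of the tiered model, in table order.
SERVICES: List[str] = [
    "Entity-centric Index",
    "Complex Event Processing",
    "Stream Dissemination",
    "Semantic Approximation",
]

# One row per star rating (1 to 5), copied from the table.
TIERS: Dict[int, List[str]] = {
    1: ["None", "None", "None", "None"],
    2: ["Basic processing", "Single stream", "None", "Semantic matching"],
    3: ["Historical views of streams", "Multi-service composition",
        "Point-to-point", "Thematic matching"],
    4: ["Stream enrichment with context and entity data",
        "Quality of service aware service composition",
        "Wireless broadcast", "Entity-centric matching"],
    5: ["Entity-centric real-time query", "Context-aware",
        "Complex patterns", "Context-aware"],
}


def support_at(stars: int) -> Dict[str, str]:
    """Return the support level of every service at a given star rating."""
    if stars not in TIERS:
        raise ValueError(f"star rating must be 1 to 5, got {stars}")
    return dict(zip(SERVICES, TIERS[stars]))


def upgrades(current: int, target: int) -> Dict[str, Tuple[str, str]]:
    """List the services whose support level differs between two ratings."""
    before, after = support_at(current), support_at(target)
    return {s: (before[s], after[s]) for s in SERVICES if before[s] != after[s]}


# What a participant gains by moving from one star to two stars.
gained = upgrades(1, 2)
```

A lookup like this makes the pay-as-you-go idea concrete: a participant can see which services change, and how, before investing in the next level of integration.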

References

Curry, E. et al. (2019) ‘A Real-time Linked Dataspace for the Internet of Things: Enabling “Pay-As-You-Go” Data Management in Smart Environments’, Future Generation Computer Systems, 90, pp. 405–422. doi: 10.1016/j.future.2018.07.019.

Gao, F. et al. (2017) ‘Automated discovery and integration of semantic urban data streams: The ACEIS middleware’, Future Generation Computer Systems, 76, pp. 561–581. doi: 10.1016/j.future.2017.03.002.

Hasan, S. and Curry, E. (2014) ‘Approximate Semantic Matching of Events for the Internet of Things’, ACM Transactions on Internet Technology, 14(1), pp. 1–23. doi: 10.1145/2633684.

Qin, Y., Sheng, Q. Z. and Curry, E. (2015) ‘Matching Over Linked Data Streams in the Internet of Things’, IEEE Internet Computing, 19(3), pp. 21–27. doi: 10.1109/MIC.2015.29.