Data Support Services for Real-time Linked Dataspaces

Data Support Services

The objective of a Real-time Linked Dataspace is to support a real-time response from intelligent systems to situations of interest when a set of events takes place within a smart environment. In addition to the obvious need for real-time data processing support services, there is also a need for the fundamental data support services one would expect in a dataspace support platform. This book discusses the enhanced data support services developed for the Real-time Linked Dataspace to support data management for intelligent systems within smart environments. The goal of these services (e.g. catalog, entity management, search and query, and data service discovery) is to help a real-time dataspace system get up and running with low administrative setup costs. Each of the support services has been specifically designed for and evaluated within Internet of Things-based smart environments.

The adoption of Internet of Things (IoT)-enabled smart environments is empowering data-driven systems that are transforming our everyday world. To support the interconnection of intelligent systems in the data ecosystem that surrounds a smart environment, there is a need to enable the sharing of data among those systems. A data platform can provide a clear framework to support the sharing of data among a group of intelligent systems within a smart environment. In this book, we advocate the use of the dataspace paradigm within the design of data platforms to enable data ecosystems for intelligent systems.

We have created the Real-time Linked Dataspace (RLD) as a data platform for intelligent systems within smart environments. The RLD combines the pay-as-you-go paradigm of dataspaces with linked data, knowledge graphs, and real-time stream and event processing capabilities to support a large-scale, distributed, heterogeneous collection of streams, events, and data sources. At the foundation of the pay-as-you-go approach to data integration is the idea that the owners of the data sources are responsible for the incremental improvement in the integration and quality of the data available in the dataspace. The needs of the user drive incremental improvements over time. This pragmatic approach allows the dataspace to grow and improve gradually, with data sources or streams joining or leaving at any time. To reduce the burden on data source owners and users of the RLD, a support platform with a number of data support services is provided.

The design of the support services needs to conform to the principles of RLDs:

  • A Real-time Linked Dataspace must deal with many different formats of streams and events.
  • A Real-time Linked Dataspace does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
  • Queries in the Real-time Linked Dataspace are provided on a best-effort and approximate basis.
  • The Real-time Linked Dataspace must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion.
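
The second and third principles can be illustrated with a minimal sketch. It is not the RLD implementation: the `Source` class, the `best_effort_query` function, and the two example sources are hypothetical names introduced here, and it assumes each source exposes its native interface as a plain callable. It only shows the idea that the dataspace calls into each source without subsuming it, and that a failing source reduces the answer rather than aborting the query.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Source:
    """Hypothetical dataspace participant that keeps its own native query interface."""
    name: str
    query: Callable[[str], Iterable[dict]]  # native interface, not subsumed by the dataspace


def best_effort_query(sources: list[Source], keyword: str) -> tuple[list[dict], list[str]]:
    """Ask every source; keep whatever answers arrive and note which sources failed."""
    results: list[dict] = []
    skipped: list[str] = []
    for source in sources:
        try:
            for record in source.query(keyword):
                results.append({"source": source.name, **record})
        except Exception:
            # Best effort: a failing source degrades the answer instead of aborting it.
            skipped.append(source.name)
    return results, skipped


def _sensor_stream(keyword: str) -> list[dict]:
    # Illustrative in-memory data standing in for a live stream.
    readings = [
        {"id": "temp-1", "kind": "temperature"},
        {"id": "flow-7", "kind": "water flow"},
    ]
    return [r for r in readings if keyword in r["kind"]]


def _offline_archive(keyword: str) -> list[dict]:
    # Simulates a source that is currently unavailable.
    raise ConnectionError("archive unreachable")


sources = [Source("sensor-stream", _sensor_stream), Source("archive", _offline_archive)]
results, skipped = best_effort_query(sources, "temperature")
```

Returning the list of skipped sources alongside the results lets a caller see that the answer is approximate, which is one plausible way to surface the best-effort nature of a query.
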

[Figure: Data services provided by the Real-time Linked Dataspace]

The required data services provided by the RLD, which are depicted in the figure above, are:

[Figure: Example of managed entities in the Entity Management Service (Curry et al., 2019)]
[Figure: High-level architecture components for Treo (best-effort natural language) (Freitas et al., 2012)]
[Figure: Concept lattice of the example of Table 8.2 (Derguech et al., 2015)]
[Figure: Overview of the human task service for Real-time Linked Dataspace (Curry et al., 2019)]

Each of these services has been designed to follow the RLD principles and to offer tiered service levels following the 5-star pay-as-you-go model, as detailed in the table below. The 5-star scheme details the level of integration of the data sources with the support services of a dataspace. At the 1-star level, a data source only needs to be made available in the dataspace. Over time, the level of integration with the support services can be improved incrementally on an as-needed basis. The more that is invested in integrating with the support services, the better the integration that is achievable in the dataspace.

| Pay-as-you-go Star Rating | Data Format | Catalog | Access Control | Search and Query | Entity | Human Tasks |
|---|---|---|---|---|---|---|
| ☆ Basic | Any format (e.g. PDF) | Registry of datasets and streams | None | Browsing | None | None |
| ☆☆ Machine-Readable | Machine-readable structured data (e.g. Excel) | Documentation to understand the data/stream structure, format, and characteristics; non-machine-readable metadata document (e.g. PDF) | Coarse-grained (dataset level) | Keyword search | Entity identifiers in documentation | Schema mapping |
| ☆☆☆ Basic Integration | Non-proprietary format (e.g. CSV, JSON, XML) | Machine-readable metadata; equivalence among schema concepts | Fine-grained (entity level); secure query service | Structure search | Source level (siloed) | Entity mapping |
| ☆☆☆☆ Advanced Integration | Open standards (RDF, JSON-LD) to identify things/entities using the first two principles of linked data | Relations among schemas (dataspace level) | Data anonymisation | Structured queries | Canonical identifiers and entity mappings across sources | Entity enrichment |
| ☆☆☆☆☆ Full Semantic Integration, Search, and Query | Follows all publishing principles of linked data | Full semantic mappings | Usage control | Schema-agnostic question answering | Knowledge graphs semantically link entities to related entities, data, and streams | Data quality improvement |
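
As a rough illustration of how the tiers in the table above might be represented in software, the sketch below models the star rating as an ordered enumeration and records the search and query capability listed for each tier. The names `StarRating`, `SEARCH_AND_QUERY`, `sources_supporting`, and `upgrade`, as well as the two example sources, are hypothetical. Treating a higher rating as also satisfying lower tiers is an assumption made for this example, not something the table states.

```python
from enum import IntEnum


class StarRating(IntEnum):
    """The five pay-as-you-go tiers, ordered so ratings can be compared."""
    BASIC = 1
    MACHINE_READABLE = 2
    BASIC_INTEGRATION = 3
    ADVANCED_INTEGRATION = 4
    FULL_SEMANTIC_INTEGRATION = 5


# Search and query capability per tier, taken from the "Search and Query" column.
SEARCH_AND_QUERY = {
    StarRating.BASIC: "Browsing",
    StarRating.MACHINE_READABLE: "Keyword search",
    StarRating.BASIC_INTEGRATION: "Structure search",
    StarRating.ADVANCED_INTEGRATION: "Structured queries",
    StarRating.FULL_SEMANTIC_INTEGRATION: "Schema-agnostic question answering",
}


def sources_supporting(ratings: dict[str, StarRating], minimum: StarRating) -> list[str]:
    """Return the sources integrated to at least the given tier, in name order."""
    # Assumption: a source at a higher tier also meets the lower tiers.
    return sorted(name for name, rating in ratings.items() if rating >= minimum)


def upgrade(ratings: dict[str, StarRating], name: str) -> StarRating:
    """Move one source up a single tier (incremental improvement), capped at 5 stars."""
    current = ratings[name]
    ratings[name] = StarRating(min(current + 1, StarRating.FULL_SEMANTIC_INTEGRATION))
    return ratings[name]


# Hypothetical sources at different levels of integration.
ratings = {
    "building-pdf-reports": StarRating.BASIC,
    "water-meter-stream": StarRating.BASIC_INTEGRATION,
}
```

Calling `upgrade(ratings, "building-pdf-reports")` would move that source from the Basic tier to Machine-Readable, mirroring the idea that integration improves one step at a time as the need arises.
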

References

Curry, E. et al. (2019) ‘A Real-time Linked Dataspace for the Internet of Things: Enabling “Pay-As-You-Go” Data Management in Smart Environments’, Future Generation Computer Systems, 90, pp. 405–422. doi: 10.1016/j.future.2018.07.019.

Derguech, W. et al. (2015) ‘Using Formal Concept Analysis for Organizing and Discovering Sensor Capabilities’, The Computer Journal, 58(3), pp. 356–367. doi: 10.1093/comjnl/bxu088.

Freitas, A. et al. (2012) ‘Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends’, IEEE Internet Computing, 16(1), pp. 24–33. doi: 10.1109/MIC.2011.141.