Data Support Services
The objective of a Real-time Linked Dataspace is to support a real-time response from intelligent systems to situations of interest when a set of events takes place within a smart environment. In addition to the obvious need for real-time data processing support services, there is also a need for the fundamental data support services one would expect in a dataspace support platform. This book discusses the enhanced data support services developed for the Real-time Linked Dataspace to support data management for intelligent systems within smart environments. The goal of these services is to help a real-time dataspace system get up and running with a low overhead for administrative setup (e.g. of the catalog, entity management, search and query, and data service discovery). Each of the support services has been specifically designed for and evaluated within Internet of Things-based smart environments.
The adoption of Internet of Things (IoT)-enabled smart environments is empowering data-driven systems that are transforming our everyday world. To support the interconnection of intelligent systems in the data ecosystem that surrounds a smart environment, there is a need to enable the sharing of data among intelligent systems. A data platform can provide a clear framework to support the sharing of data among a group of intelligent systems within a smart environment. In this book, we advocate the use of the dataspace paradigm within the design of data platforms to enable data ecosystems for intelligent systems.
We have created the Real-time Linked Dataspace (RLD) as a data platform for intelligent systems within smart environments. The RLD combines the pay-as-you-go paradigm of dataspaces with linked data, knowledge graphs, and real-time stream and event processing capabilities to support large-scale, distributed, heterogeneous collections of streams, events, and data sources. At the foundation of the pay-as-you-go approach to data integration is the idea that the owners of the data sources are responsible for the incremental improvement in the integration and quality of data available in the dataspace. The needs of the users drive these incremental improvements over time. This pragmatic approach allows the dataspace to grow and improve gradually, with data sources or streams joining or leaving at any time. To reduce the burden on data source owners and users of the RLD, a support platform with a number of data support services is provided.
The design of the support services needs to conform to the principles of RLDs:
- A Real-time Linked Dataspace must deal with many different formats of streams and events.
- A Real-time Linked Dataspace does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
- Queries in the Real-time Linked Dataspace are provided on a best-effort and approximate basis.
- The Real-time Linked Dataspace must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion.
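The best-effort principle above can be illustrated with a minimal Python sketch. It is not the RLD implementation: the `best_effort_query` function, the `Source` type, and the two example sources are hypothetical names introduced only for illustration. The sketch assumes each source can be called through its own interface and that a source which fails to answer is skipped rather than causing the whole query to fail.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a source is any callable that takes a keyword
# and returns the matching records from its own native interface.
Source = Callable[[str], List[str]]


def best_effort_query(
    sources: Dict[str, Source], keyword: str
) -> Tuple[List[str], List[str]]:
    """Ask every source; keep the answers that arrive and note which sources failed."""
    results: List[str] = []
    failed: List[str] = []
    for name, source in sources.items():
        try:
            results.extend(source(keyword))
        except Exception:
            # An unavailable source is recorded and skipped, not treated as fatal.
            failed.append(name)
    return results, failed


def temperature_stream(keyword: str) -> List[str]:
    """Illustrative source that holds two made-up readings."""
    readings = ["room1 temperature 21.5", "room2 temperature 19.0"]
    return [r for r in readings if keyword in r]


def offline_sensor(keyword: str) -> List[str]:
    """Illustrative source that is currently unreachable."""
    raise ConnectionError("sensor unreachable")


SOURCES: Dict[str, Source] = {
    "temperature_stream": temperature_stream,
    "offline_sensor": offline_sensor,
}
```

Calling `best_effort_query(SOURCES, "room1")` returns the matching reading from the reachable source together with the name of the source that could not answer, so the caller can judge how complete the result is.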
The required data services provided by the RLD, depicted in the figure above, are:
- Catalog: The catalog service plays a crucial role by providing a repository of information about the data sources participating in the dataspace. Within the catalog, all datasets and entities are declared along with relevant metadata.
Further details available in: Ch 6. Catalog and Entity Management Service for Internet of Things-Based Smart Environments
- Entity Management: The Entity Management Service (EMS) manages information about the entities (e.g. real-world objects) in the dataspace. The EMS is an essential service for decision-making applications that rely on accurate entity information.
Further details available in: Ch 6. Catalog and Entity Management Service for Internet of Things-Based Smart Environments
- Search and Query: The Search and Query service helps developers, data scientists, and users to find relevant data sources within the dataspace.
Further details available in: Ch 7. Querying and Searching Heterogeneous Knowledge Graphs in Real-time Linked Dataspaces
- Data Service Discovery: Efficiently describing and organising data sources in dataspaces is essential. The Data Service Discovery Service organises and indexes data sources based on their capabilities.
Further details available in: Ch 8. Enhancing the Discovery of Internet of Things-Based Data Services in Real-time Linked Dataspaces
- Human Tasks: The Human Task service is concerned with the collaborative aspect of data management within the dataspace by enabling small data management tasks (e.g. data quality and enrichment) to be distributed among willing users in the smart environment. The Human Task service can also engage participants in citizen actuation tasks within the smart environment.
Further details available in: Ch 9. Human-in-the-Loop Tasks for Data Management, Citizen Sensing, and Actuation in Smart Environments
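To make the role of the catalog more concrete, the following is a minimal Python sketch of a registry in which data sources are declared with metadata and can then be found by keyword or by entity. It is an illustration under stated assumptions, not the RLD catalog: the `CatalogEntry` and `Catalog` classes, their fields, and their method names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data source or stream."""
    source_id: str
    description: str
    data_format: str
    entities: List[str] = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory registry of participating data sources."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Declaring a source (again) replaces its previous metadata.
        self._entries[entry.source_id] = entry

    def keyword_search(self, keyword: str) -> List[str]:
        """Return the ids of sources whose description mentions the keyword."""
        needle = keyword.lower()
        return sorted(
            e.source_id
            for e in self._entries.values()
            if needle in e.description.lower()
        )

    def sources_for_entity(self, entity: str) -> List[str]:
        """Return the ids of sources that declare data about the given entity."""
        return sorted(
            e.source_id for e in self._entries.values() if entity in e.entities
        )
```

A source owner would register an entry once and refine its metadata later, which matches the incremental, pay-as-you-go style of integration described earlier.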
Each of these services has been designed to follow the RLD principles and to offer tiered service levels following the 5-star pay-as-you-go model, as detailed in the table below. The 5-star scheme describes the level of integration of a data source with the support services of a dataspace. At the 1-star level, a data source only needs to be made available within the dataspace. Over time, the level of integration with the support services can be improved incrementally on an as-needed basis. The more that is invested in integrating a source with the support services, the better the integration that can be achieved in the dataspace.
Pay-as-you-go Star Rating | Data Format | Catalog | Access Control | Search and Query | Entity | Human Tasks |
☆ Basic | Any format (e.g. PDF). | Registry of datasets, and streams | None | Browsing | None | None |
☆☆ Machine-Readable | Machine-readable structured data (e.g. Excel) Documentation to understand the data/stream structure, format, and characteristics. | Non-machine-readable metadata document (e.g. PDF) | Coarse-grained (Dataset level) | Keyword search | Entities identifiers in documentation | Schema mapping |
☆☆☆ Basic Integration | Non-proprietary format (e.g. CSV, JSON, XML) | Machine-readable metadata; equivalence among schema concepts | Fine-grained (Entity-level); secure query service | Structure search | Source level (siloed) | Entity mapping |
☆☆☆☆ Advanced Integration | Open standards (RDF, JSON-LD) to identify things/entities using the first two principles of linked data | Relations among schemas (dataspace level) | Data anonymisation | Structured queries | Canonical identifiers and entity mappings across sources | Entity enrichment |
☆☆☆☆☆ Full Semantic Integration, Search, and Query | Follows all publishing principles of linked data | Full semantic mappings | Usage control | Schema-agnostic question answering | Knowledge graphs semantically link entities to related entities, data, and streams | Data quality improvement |
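The tiered model in the table can be expressed as a simple lookup, sketched below in Python. The values are copied from the "Search and Query" and "Human Tasks" columns of the table; the function names `service_level` and `upgrade_path` are hypothetical and are not part of the RLD.

```python
from typing import Dict, List, Optional

# Values copied from the "Search and Query" column of the table.
SEARCH_AND_QUERY: Dict[int, str] = {
    1: "Browsing",
    2: "Keyword search",
    3: "Structure search",
    4: "Structured queries",
    5: "Schema-agnostic question answering",
}

# Values copied from the "Human Tasks" column; None means no support.
HUMAN_TASKS: Dict[int, Optional[str]] = {
    1: None,
    2: "Schema mapping",
    3: "Entity mapping",
    4: "Entity enrichment",
    5: "Data quality improvement",
}


def service_level(stars: int) -> Dict[str, Optional[str]]:
    """Return the services available to a source at the given star rating."""
    if stars not in SEARCH_AND_QUERY:
        raise ValueError("star rating must be between 1 and 5")
    return {
        "search_and_query": SEARCH_AND_QUERY[stars],
        "human_tasks": HUMAN_TASKS[stars],
    }


def upgrade_path(current: int, target: int) -> List[str]:
    """List the search capabilities gained, in order, when moving up the ratings."""
    return [SEARCH_AND_QUERY[s] for s in range(current + 1, target + 1)]
```

For example, `upgrade_path(2, 4)` lists the search capabilities a machine-readable source would gain by investing in integration up to the 4-star level.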
References
Curry, E. et al. (2019) ‘A Real-time Linked Dataspace for the Internet of Things: Enabling “Pay-As-You-Go” Data Management in Smart Environments’, Future Generation Computer Systems, 90, pp. 405–422. doi: 10.1016/j.future.2018.07.019.
Derguech, W. et al. (2015) ‘Using Formal Concept Analysis for Organizing and Discovering Sensor Capabilities’, The Computer Journal, 58(3), pp. 356–367. doi: 10.1093/comjnl/bxu088.
Freitas, A. et al. (2012) ‘Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends’, IEEE Internet Computing, 16(1), pp. 24–33. doi: 10.1109/MIC.2011.141.