Future Research Directions for Dataspaces, Data Ecosystems, and Intelligent Systems

Research Directions

Dataspaces are a relatively new research area that brings together several other areas in computer science and other disciplines. We now discuss the research areas that are essential to enabling the next generation of dataspaces, data ecosystems, and intelligent systems (see figure below). Research is needed to overcome many challenges, including decentralised support services, support for multimedia data, trusted data sharing, governance and economic models, incremental systems engineering, and human-centricity.

Research directions for dataspaces, data ecosystems, and intelligent systems
  • Large-scale Decentralised Support Services: As dataspaces are deployed at larger scales, it will be necessary to create enhanced support services, to scale entity management, and to minimise the cost of operation for these deployments.
    • Enhanced Support Services: Many enhancements are possible for the support services of dataspaces, including natural language interfaces to improve user experience, decentralised support services for large-scale deployments, and privacy-by-design approaches to handle the growing volume of personal information captured by intelligent systems.
    • Scaling Entity Management: Within larger-scale deployments, it will be necessary to enhance the entity management services to support the increase in entities, data, and users. Ranking and summarisation need to be query- and activity-relevant, with relevant facts, but at the same time diverse enough to cater for a wide range of information and conceptualisation needs. A trade-off between processing time and expressiveness is necessary. Furthermore, there is significant potential for extensive use of large-scale crowdsourcing for “human-in-the-loop” data management and curation.
    • Maintenance and Operation Cost: As the size of deployment increases, it will be necessary to investigate new techniques to improve the performance of dataspaces in terms of the maintenance and operational costs of the support platform within large-scale deployments (e.g. city-level).
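
The tension noted under Scaling Entity Management, between relevance and diversity in ranking, can be illustrated with a small sketch. It uses a greedy selection in the style of maximal marginal relevance (MMR), which is a swapped-in technique chosen for illustration and is not named in the chapter. The function names, the `trade_off` parameter, and the use of Jaccard overlap between fact sets are all assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two fact sets, in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_diverse(
    relevance: Dict[str, float],
    facts: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.7,
) -> List[str]:
    """Greedy MMR-style ranking (illustrative, hypothetical helper).

    Each step picks the entity that maximises
    trade_off * relevance - (1 - trade_off) * (max similarity to entities already chosen).
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def score(entity: str) -> float:
            redundancy = max(
                (jaccard(facts[entity], facts[s]) for s in selected), default=0.0
            )
            return trade_off * relevance[entity] - (1 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)  # sorted() keeps ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `trade_off=1.0` the ranking is purely by relevance; lowering it penalises entities whose facts repeat those already shown. Each step compares every candidate with every selected entity, which is one concrete form of the processing-time cost mentioned above.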

  • Multimedia/Knowledge-Intensive Event Processing: As multimedia streams become more pervasive with the Internet of Multimedia Things (IoMT), it will be necessary for dataspaces to provide specific support services to process and manage these streams.
    • Support Services for Multimedia Data: As multimedia data becomes more common in data ecosystems through the IoMT, there will be a direct need for appropriate support services within dataspaces. There is an opportunity to leverage advances in deep learning for image processing (e.g. object detection) which can be the basis of dataspace support services for rich content types including text and multimedia streams.
    • Placement of Multimedia Data and Workloads: The increased computing resources needed to process and extract multimedia data will pose challenges for existing techniques for processing and data placement. This will require dataspace support services to consider the simultaneous training and processing of multimedia streams, taking into consideration the geospatial and temporal characteristics of smart environments.
    • Adaptive Training of Classifiers: To effectively process multimedia data, dataspaces will need to be able to assemble the appropriate classifiers to extract features from multimedia content based on the needs of users of the dataspace at runtime. The training of classifiers needs to be adaptable to the changing requirements of data ecosystems. There is a need to support transfer learning among intelligent systems and for collective efforts to build pre-trained models for datasets and to bootstrap dataspace support services. Finally, distributed approaches to training classifiers are needed to maximise the available resources from the cloud to the edge of the network.
    • Complex Multimedia Event Processing: To detect patterns from multimedia streams within a dataspace, it will be necessary to investigate new techniques for complex multimedia event processing. Challenges include defining a language to express the complexity of the event patterns and the content of the event, optimisation techniques to improve system performance for event detection over computationally intensive multimedia streams, and methods to train models over incoming media streams for new, unseen queries when training data is unavailable.
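
To make the idea of an event pattern over multimedia content more concrete, the following is a minimal sketch of one temporal pattern, "label A followed by label B within a time window", evaluated over object detections that a classifier is assumed to have already produced. The `Detection` type, the `followed_by` function, and the confidence threshold are hypothetical names introduced for illustration; a real event language would need far richer operators.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical output of an object detector for one video frame."""
    timestamp: float   # seconds since the start of the stream
    label: str         # e.g. "person", "car"
    confidence: float  # detector score in [0, 1]


def followed_by(
    events: Iterable[Detection],
    first: str,
    then: str,
    within: float,
    min_conf: float = 0.5,
) -> List[Tuple[Detection, Detection]]:
    """Return (a, b) pairs where `first` is followed by `then` within `within` seconds."""
    # Drop low-confidence detections, then order by time.
    usable = sorted(
        (e for e in events if e.confidence >= min_conf),
        key=lambda e: e.timestamp,
    )
    matches: List[Tuple[Detection, Detection]] = []
    for i, a in enumerate(usable):
        if a.label != first:
            continue
        for b in usable[i + 1:]:
            if b.timestamp - a.timestamp > within:
                break  # later events are even further away
            if b.label == then:
                matches.append((a, b))
    return matches
```

The nested scan is quadratic in the worst case, which hints at why optimisation over computationally intensive streams is listed as an open challenge.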

  • Trusted Data Sharing: There is a need to enable the trusted sharing of data among organisations, people, and systems.
    • Trusted Platforms: A trusted data platform focuses on secure data sharing among a group of participants (e.g. industrial consortiums sharing private or commercially sensitive data) within a clear legal framework. An ecosystem data platform would have to be infrastructure agnostic and must support continuous, coordinated data flows, seamlessly moving data among systems. Data exchange could be based on models for monetisation or reciprocity. Data platforms can create possibilities for smaller organisations and even individual developers to get access to large volumes of data, enabling them to explore their potential. Trusted platforms open many research areas for dataspaces, including data discovery, curation, linking, synchronisation, standardisation, and decentralisation.
    • Usage Control: The challenges with data sharing go beyond technical issues to issues of data ownership, privacy, business models, and licensing and authorised reuse by third parties. The control paradigm for shared data must shift from today’s access control to usage control, and dataspaces will need to support usage control for both organisations and individuals.
    • Personal Dataspaces: Building on the need for trusted platforms, there will be a need for personal dataspaces for the management of the data of the individual. Personal dataspaces will need to respect the relevant legislation for personal data (e.g. General Data Protection Regulation) and allow an individual to remain in control of their personal data and its use. Personal dataspaces will need to balance the need for privacy with the benefits of analytics and handle this trade-off based on the preferences of the individual. Techniques for preserving privacy for metadata, query privacy, and privacy-preserving integration of independent data sources will all be needed in next-generation dataspaces.
    • Industrial Dataspaces: The sharing of data among commercial organisations will also increase. Industrial dataspaces will be needed to facilitate the trusted and secure sharing and trading of commercial data among collaborating organisations. These platforms will need to provide support services that enable a data marketplace that facilitates the automated licensing of data exchanged among organisations and the enforcement of legal rights and appropriation of remuneration to the original data owners.
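
The shift from access control to usage control described above can be sketched in a few lines. Access control decides once whether a party may obtain the data, whereas usage control keeps checking conditions each time the data is used. The `UsagePolicy` class below and its fields (permitted purposes, an expiry date, a cap on the number of uses) are hypothetical and chosen only for illustration; real policies would be far richer and would need enforcement on the consumer's side.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass
class UsagePolicy:
    """Hypothetical policy that travels with a shared dataset."""
    allowed_purposes: FrozenSet[str]
    expires: date
    max_uses: int
    uses: int = 0

    def authorise(self, purpose: str, today: date) -> bool:
        """Check every use, not only the first access, and count it if allowed."""
        if purpose not in self.allowed_purposes:
            return False
        if today > self.expires or self.uses >= self.max_uses:
            return False
        self.uses += 1
        return True
```

A consumer would call `authorise` before each use, for example `policy.authorise("analytics", date.today())`, so that a request can be refused after the licence expires or the agreed number of uses is reached, even though access was originally granted.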

  • Ecosystem Governance and Economic Models: For mass collaboration to take place within data ecosystems, we need to overcome the challenges of dealing with large-scale agreements among potentially decoupled interacting parties.
    • Decentralised Data Governance: Research is needed on decentralised data governance models for data ecosystems that support collaboration and fully consider ethical, legal, and privacy concerns. Data governance within a data ecosystem must recognise data ownership, sovereignty, and regulation while supporting economic models for the sustainability of the data ecosystem. A range of decentralised governance approaches may guide a data ecosystem from authoritarian to democratic, including majority voting, reputation models (e.g. eBay), proxy-voting, and dynamic governance (e.g. sociocracy: circles and double linking). Dataspaces will need to enforce these data governance models automatically.
    • Economic Models: Economic models may be used as an incentive within governance models, including support for “data-vote exchange” models (pay for votes with data) and economic models for peer-to-peer systems. The sharing and exchange of data within dataspaces could also be based on models for monetisation or reciprocity. Data platforms can create possibilities for smaller organisations and even individual developers to get access to large volumes of data, enabling them to explore their potential.
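
Two of the governance approaches listed above, majority voting and reputation models, can be contrasted with a small sketch of how a dataspace might decide on a proposal automatically. The function names and the reputation scores are assumptions for illustration only, and the chapter does not prescribe any particular voting rule.

```python
from typing import Dict


def majority_vote(votes: Dict[str, bool]) -> bool:
    """One participant, one vote: passes on a strict majority of 'yes' votes."""
    yes = sum(votes.values())
    return yes > len(votes) / 2


def reputation_vote(votes: Dict[str, bool], reputation: Dict[str, float]) -> bool:
    """Votes weighted by reputation: passes if the weighted 'yes' share exceeds half."""
    total = sum(reputation[p] for p in votes)
    yes = sum(reputation[p] for p, in_favour in votes.items() if in_favour)
    return yes > total / 2
```

The same ballots can produce different outcomes under the two rules, which is why the choice of governance model matters: a single highly reputed participant can outweigh a numerical majority.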

  • Incremental Intelligent Systems Engineering (Cognitive Adaptability): The design of adaptive intelligent systems will need to consider the implication of operating within a smart environment and its associated data ecosystem.
    • Pay-as-you-go Systems: The boundaries of systems will be fluid and will change and evolve at runtime to adapt to the context of the current situation. However, we must consider the cost of system participation, and support “pay-as-you-go” approaches at both the system and data-levels. How can the pay-as-you-go approach of dataspaces be extended to the design of incremental and evolving systems? How can we integrate systems on an “as-needed” basis with the labour-intensive aspects of system integration postponed until they are required?
    • Cognitive Adaptability: Work on evolving systems engineering will need to consider the inclusion of data-driven probabilistic techniques that can provide “Cognitive Adaptability” that helps intelligent systems adapt to changes in the environment that were unknown at design-time by enriching the control-loop with observational data from the environment. Intelligent system designers will need to consider the varying levels of accuracy offered by data-driven approaches, providing best-effort or approximate results using the data accessible at the time. How can we mix deterministic and statistical approaches in the design of intelligent systems? How can we test and verify these systems? There is a need to support transfer learning among intelligent systems and for joint efforts to build pre-trained models for system adaptability. Dataspaces can play a role in supporting these collective efforts.

  • Towards Human-centric Systems: Currently, intelligent systems make critical decisions in highly engineered systems (e.g. autopilots) where users receive specialised training to interact with them (e.g. pilots). As we move forward, intelligent systems will be making both critical and lifestyle decisions: from choosing the course of treatment for a critical illness and safely driving a car to choosing what takeout to order and the temperature of our shower. This will pose specific challenges in the design of human-centric systems:
    • Explainable Artificial Intelligence and Data Provenance: Data-driven decision approaches (including Cognitive and Artificial Intelligence (AI)-based techniques) will need to provide explanations and evidence to support their decisions and guarantees for the decisions they recommend. How can we trust the large-scale, data-driven decision-making provided by dataspace-powered AI platforms? This will require greater provenance support within dataspaces to provide the audit trail necessary to justify a data-driven decision.
    • Human-in-the-loop: The role of users in intelligent systems will not be a passive one. Users are a critical part of socio-technical systems, and we need to consider more ways of including the “Human in the Loop” within future intelligent systems. Active participation of users can improve their engagement and sense of ownership of the system. Indeed, active involvement of the user could be a condition for them granting access and usage of their private data. Research is needed on building trust in algorithms and data, on the trusted co-evolution between humans and AI-based systems, and on the legal, ethical, and privacy issues associated with making data-driven critical decisions.
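
The audit trail mentioned under Explainable Artificial Intelligence and Data Provenance can be pictured as a chain of derivation records that is walked backwards from a decision to its sources. The sketch below is a minimal illustration under that assumption; `ProvenanceRecord`, `audit_trail`, and the example activity strings are hypothetical and do not correspond to any specific provenance standard or dataspace API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProvenanceRecord:
    """Hypothetical record of how one data item or decision was derived."""
    item_id: str
    activity: str  # what produced the item, e.g. "sensor 7" or "model v1"
    derived_from: List[str] = field(default_factory=list)


def audit_trail(item_id: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Walk back (depth-first) from a decision to every recorded item it depends on."""
    trail: List[str] = []
    seen: Set[str] = set()
    stack = [item_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in records:
            continue  # skip repeats and items with no recorded provenance
        seen.add(current)
        record = records[current]
        trail.append(f"{record.item_id} <- {record.activity}")
        stack.extend(reversed(record.derived_from))
    return trail
```

Items without a record are silently skipped here; in practice such gaps are exactly where a justification for a data-driven decision would break down, so a real service would probably need to flag them.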

Further details available in: Chapter 18 “Curry E. (2020) Future Research Directions for Dataspaces, Data Ecosystems, and Intelligent Systems. In: Real-time Linked Dataspaces. Springer, Cham”