
Data Tool/Service Discovery

In document GRDI2020 Final Roadmap Report (Page 58-62)

8. Infrastructural Challenges

8.1 Data and Data Service/Tool Findability

8.1.4 Data Tool/Service Discovery

Acceptance of the open science principle entails open access not only to research data but also to the data services, tools, analyses and methods that enable researchers to conduct their research activities efficiently and effectively.

Enabling automated location of data services/tools that adequately fulfill a given research need is an important challenge for a global research data infrastructure.

Description of Data Services/Tools

Publishing a data service/tool requires a description of the data service/tool capability, i.e., what functionality the data service/tool provides. This description is organized on three levels:

A first level, which describes the static characteristics of the service, also called abstract capabilities; the abstract capabilities of a service describe only what a published service can provide, but not under which circumstances a concrete service can actually be provided [51].

A second level, which describes the dynamic characteristics of the service, also called contracting capabilities; the contracting capabilities describe what input information is required for providing a concrete service and what conditions it must fulfill (i.e., service pre-conditions), and what conditions the delivered objects fulfill depending on the input given (i.e., post-conditions) [51]. The abstract capability might be automatically derived from the contracting capability, and both must be consistent with each other.

A third level, which describes the characteristics of the operational environment where the service will be hosted, i.e., the operational conditions, capacity requirements, the service's resource dependencies, and integrity and access constraints, also called deployment capabilities; the deployment capabilities describe the hosting operational environment.
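The three description levels can be pictured as a simple data structure. The following is a minimal sketch only; the class names, field names and the example "regrid" service are illustrative assumptions and do not come from the report or from [51].

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AbstractCapability:
    """Static level: what the published service can provide in general."""
    provides: List[str]


@dataclass
class ContractingCapability:
    """Dynamic level: required inputs plus pre- and post-conditions."""
    required_inputs: List[str]
    precondition: Callable[[Dict[str, object]], bool]
    postcondition: Callable[[Dict[str, object], Dict[str, object]], bool]


@dataclass
class DeploymentCapability:
    """Operational level: what the hosting environment must offer."""
    min_memory_gb: float
    resource_dependencies: List[str] = field(default_factory=list)
    access_constraints: List[str] = field(default_factory=list)


@dataclass
class ServiceDescription:
    name: str
    abstract: AbstractCapability
    contracting: ContractingCapability
    deployment: DeploymentCapability


# Hypothetical service used purely for illustration.
regrid = ServiceDescription(
    name="regrid",
    abstract=AbstractCapability(provides=["regridded-dataset"]),
    contracting=ContractingCapability(
        required_inputs=["dataset", "target_resolution"],
        # Pre-condition: a positive target resolution must be supplied.
        precondition=lambda inp: inp.get("target_resolution", 0) > 0,
        # Post-condition: the output has the requested resolution.
        postcondition=lambda inp, out: out.get("resolution") == inp["target_resolution"],
    ),
    deployment=DeploymentCapability(min_memory_gb=8.0, resource_dependencies=["netcdf"]),
)
```

Keeping the levels in separate objects mirrors the idea that each level is consulted at a different stage of the location process.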

Description of Researcher Needs [52]

Researchers may describe their desires in a very individual and specific way, which makes a direct mapping to data service/tool descriptions very complicated. Therefore, each service discovery attempt requires a process in which user expectations are mapped onto more generic need descriptions.

A researcher is expected to specify her/his needs in terms of what she/he wants to achieve by using a concrete data service/tool. We assume that a researcher will in general care about what she/he wants to get from a concrete service, but not about how it is achieved. Her/his desire is formally described by a so-called goal. In particular, goals describe what kind of outputs and effects are expected by the researcher.

Data service/tool requesters (researchers) are not expected to have the background required to describe their goals formally. Thus, either goals can be expressed in a language they are familiar with (such as natural language), or appropriate tools should be available to help requesters express their precise needs in a simpler manner. Since requesters are not expected to write formalized goals from scratch, a possible approach is to make available pre-defined, generic, formal and reusable goals that define the generic objectives requesters may have. These goals can be refined (or parameterized) by the requester to reflect her/his concrete needs. It is assumed that there will be a way for requesters to locate such pre-defined goals easily, e.g., through keyword matching.
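As a rough illustration of pre-defined goals that are located by keyword matching and then parameterized, consider the sketch below. The goal names, keywords and parameter names are hypothetical, and plain word overlap stands in for whatever matching technique an infrastructure would actually use.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Goal:
    name: str
    keywords: FrozenSet[str]
    expected_outputs: Tuple[str, ...]
    parameters: Tuple[Tuple[str, str], ...] = ()


# Hypothetical catalogue of pre-defined, generic goals.
PREDEFINED_GOALS = [
    Goal("convert-format", frozenset({"convert", "format", "file"}), ("converted-file",)),
    Goal("visualize-data", frozenset({"plot", "visualize", "chart"}), ("figure",)),
]


def find_goals(desire: str, goals: List[Goal]) -> List[Goal]:
    """Return goals sharing at least one keyword with the desire, best match first."""
    words = set(desire.lower().split())
    ranked = sorted(goals, key=lambda g: len(g.keywords & words), reverse=True)
    return [g for g in ranked if g.keywords & words]


def refine(goal: Goal, **params: str) -> Goal:
    """Return a copy of a generic goal, parameterized with the requester's values."""
    return replace(goal, parameters=tuple(sorted(params.items())))


candidates = find_goals("convert my file to another format", PREDEFINED_GOALS)
concrete_goal = refine(candidates[0], target="netcdf")
```

The generic goal stays unchanged in the catalogue; `refine` returns a new, requester-specific goal, so the pre-defined goals remain reusable.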

Modeling Approaches to Goals and Data Services/Tools [52]

Keyword based Representation Models

By adopting this model, both requester and provider use keywords to describe their goals and services respectively.

Controlled vocabularies

Another approach assumes that requester and provider use (not necessarily the same) controlled vocabularies in order to describe goals and services respectively.

Ontologies

The border between controlled vocabularies and ontologies is thin and allows a smooth, incremental evolution from one to the other. Ontologies are consensual and formal conceptualizations of a domain. Controlled vocabularies organized in taxonomies already contain all the necessary elements of an ontology. Ontologies may simply add some logical axioms that further restrict the semantics of the terminological elements. Notice that a service requester or provider gets these logical definitions "for free". She/he can select a couple of concepts for annotating her/his service or request, but she/he does not need to write logical expressions as long as she/he only reuses the ones already included in the ontology.

Full-fledged Logic

Simply reusing existing concept definitions, as described in the previous section, has the advantage of simplicity in annotating services and in reasoning about them. However, this approach has limited flexibility, expressivity and granularity in describing services and requests. It is therefore suitable only for scenarios that do not need a more precise description of requests and services. When higher precision in the results of the discovery process is needed, a full-fledged logic is required.

The Data Service/Tool Location Process

Based on formal models for the description of data services/tools and goals, a conceptual model for the semantic-based location process of services can be defined [51].

This process is composed of five steps:

Goal Discovery: Starting from a user desire (expressed using natural language or any other means), goal discovery will locate the pre-defined goals, resulting in a selected pre-defined goal. Such a pre-defined goal is an abstraction of the requester desire into a generic and reusable goal.

Goal Refinement: The selected pre-defined goal is refined, based on the given requester desire, in order to actually reflect such desire. This step will result in a formalized requester goal.

Service Discovery: Available services that can, according to their abstract capabilities, potentially fulfill the requester goal are discovered.

Service Contracting: Based on the contracting capability, the abstract services selected in the previous step will be checked for their ability to deliver a suitable concrete service that fulfills the requester's goal. Such service(s) will be selected. This step might involve interaction between service requester and provider.

Service Workability: The final step has to verify whether the hosting computing environment is suitable for efficiently running the selected service(s).
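The five steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the goals, services, field names (`accepts`, `min_memory_gb`) and the `input_format` parameter are all hypothetical, and each step is reduced to a simple filter rather than the semantic matching the report envisages.

```python
from typing import Dict, List, Optional

# Hypothetical pre-defined goals and service descriptions (illustrative only).
GOALS = {
    "convert-format": {"keywords": {"convert", "format"}, "output": "converted-file"},
    "plot-data": {"keywords": {"plot", "chart"}, "output": "figure"},
}
SERVICES = [
    {"name": "csv2netcdf", "provides": "converted-file", "accepts": {"csv"}, "min_memory_gb": 2},
    {"name": "bigconvert", "provides": "converted-file", "accepts": {"csv", "hdf5"}, "min_memory_gb": 64},
    {"name": "quickplot", "provides": "figure", "accepts": {"csv"}, "min_memory_gb": 1},
]


def goal_discovery(desire: str) -> Optional[str]:
    """Step 1: pick the pre-defined goal sharing the most keywords with the desire."""
    words = set(desire.lower().split())
    best = max(GOALS, key=lambda g: len(GOALS[g]["keywords"] & words))
    return best if GOALS[best]["keywords"] & words else None


def goal_refinement(goal_name: str, **params: str) -> Dict[str, str]:
    """Step 2: parameterize the generic goal into a concrete requester goal."""
    return {"goal": goal_name, "output": GOALS[goal_name]["output"], **params}


def service_discovery(goal: Dict[str, str]) -> List[dict]:
    """Step 3: keep services whose abstract capability covers the goal output."""
    return [s for s in SERVICES if s["provides"] == goal["output"]]


def service_contracting(goal: Dict[str, str], candidates: List[dict]) -> List[dict]:
    """Step 4: keep services whose pre-condition holds for the concrete input."""
    return [s for s in candidates if goal["input_format"] in s["accepts"]]


def service_workability(candidates: List[dict], host_memory_gb: float) -> List[dict]:
    """Step 5: keep services the hosting environment can actually run."""
    return [s for s in candidates if s["min_memory_gb"] <= host_memory_gb]


def locate(desire: str, host_memory_gb: float, **params: str) -> List[str]:
    name = goal_discovery(desire)
    if name is None:
        return []
    goal = goal_refinement(name, **params)
    contracted = service_contracting(goal, service_discovery(goal))
    return [s["name"] for s in service_workability(contracted, host_memory_gb)]
```

For example, `locate("please convert the format of my table", 16, input_format="csv")` keeps both converters through contracting and then drops the one that needs more memory than the host offers.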

Mediation Support in the Data Service/Tool Discovery Process

Data service/tool discovery is based on matching abstracted goal and service descriptions. To lift the discovery process to an ontological level, two processes are required: a) the concrete user input has to be generalized to more abstract goal descriptions, and b) concrete services and their descriptions have to be abstracted to the classes of services a science ecosystem can provide [52]. To carry out the data service/tool discovery process successfully, the data infrastructure must offer mediation support that establishes a mapping between the controlled vocabularies or ontologies used to describe goals and services. This support is needed because we assume that goals and data services/tools most likely use different controlled vocabularies or ontologies.

Depending on the modeling approach adopted for representing goals and data services/tools, different mediation support should be provided:

Keyword based Representation Models

The mediation process consists in matching a set of keywords extracted from a goal description against a set of keywords extracted from a service description.
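A minimal sketch of keyword-based matching follows. The report does not prescribe a similarity measure; Jaccard overlap is used here as one simple choice, and the stopword list and service descriptions are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Small illustrative stopword list (an assumption, not taken from the report).
STOPWORDS = {"a", "an", "the", "to", "of", "for", "and", "my", "from", "in"}


def extract_keywords(text: str) -> Set[str]:
    """Lowercase the text, split it into alphanumeric tokens and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_services(goal_text: str, services: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each service description against the goal and sort by score, best first."""
    goal_kw = extract_keywords(goal_text)
    scored = [(name, jaccard(goal_kw, extract_keywords(desc))) for name, desc in services.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


services = {
    "regrid": "regrid climate dataset to a target resolution",
    "plot": "plot a time series chart",
}
ranking = rank_services("regrid my climate dataset", services)
```

Keyword matching of this kind is purely syntactic: synonyms such as "rainfall" and "precipitation" would not match, which is what the vocabulary-based and ontology-based approaches address.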

Controlled vocabularies

The mediation process consists in mapping a set of concepts extracted from a goal description onto semantically equivalent concepts contained in a data service/tool description. Reasoning over hierarchical relationships may be required in the case of taxonomies. Mediation support is needed when the requester and the provider use different controlled vocabularies.
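The sketch below illustrates mediation between two controlled vocabularies, including a simple form of hierarchical reasoning. The term mapping, the taxonomy and the rule that a goal concept matches a service annotated with the same or a broader concept are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical mapping from requester terms to provider terms.
MAPPING: Dict[str, str] = {"rainfall": "precipitation", "temp": "temperature"}

# Hypothetical provider taxonomy: concept -> broader concept (assumed acyclic).
BROADER: Dict[str, str] = {
    "precipitation": "atmospheric-variable",
    "temperature": "atmospheric-variable",
    "atmospheric-variable": "variable",
}


def mediate(term: str) -> str:
    """Translate a requester term into the provider vocabulary if a mapping exists."""
    return MAPPING.get(term, term)


def ancestors(concept: str) -> Set[str]:
    """Collect all broader concepts by walking up the taxonomy."""
    seen: Set[str] = set()
    while concept in BROADER:
        concept = BROADER[concept]
        seen.add(concept)
    return seen


def matches(goal_terms: Set[str], service_terms: Set[str]) -> bool:
    """True if every goal concept equals, or is narrower than, some service concept."""
    for term in goal_terms:
        concept = mediate(term)
        if not ({concept} | ancestors(concept)) & service_terms:
            return False
    return True
```

Here `matches({"rainfall"}, {"atmospheric-variable"})` succeeds because "rainfall" is first translated to "precipitation", which the taxonomy places under "atmospheric-variable".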

Ontologies

The descriptions of abstract services and goals are based on ontologies that capture general knowledge about the problem domains under consideration.

The mediation process consists in mapping the set of concepts describing the goal into semantically equivalent concepts contained in the service description.

Full-fledged Logic

If a full-fledged logic is adopted for describing goals and services, mediation support can be provided only if the terminology used in the logical expressions is grounded in ontologies. Therefore, the mediation support required is the same as for ontology-based discovery.

Data Service/Tool Registration

A research data infrastructure should maintain a registry containing all the abstract (static), contracting (dynamic) and deployment descriptions of the data services/tools made publicly available by the research communities of the science ecosystem, as well as the pre-defined, generic, formal and reusable goals. These descriptions contain all the information necessary to enable efficient and effective automated location of data services/tools that adequately fulfill a given research need. In addition, research data infrastructures should maintain the appropriate tools, i.e., controlled vocabularies, ontologies, etc., as well as mapping algorithms, in order to provide efficient mediation support.
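A registry of this kind could be sketched as follows. This is an in-memory toy under stated assumptions: the class, its method names and the dictionary layout of the descriptions are hypothetical, and a real infrastructure would need persistence, access control and versioning.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    # Service name -> its three description levels.
    services: Dict[str, Dict[str, dict]] = field(default_factory=dict)
    # Pre-defined, reusable goals by name.
    goals: Dict[str, dict] = field(default_factory=dict)
    # Mediation resources: vocabulary-pair label -> term mapping.
    vocabulary_mappings: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register_service(self, name: str, abstract: dict,
                         contracting: dict, deployment: dict) -> None:
        """Store all three description levels under one service name."""
        if name in self.services:
            raise ValueError(f"service {name!r} is already registered")
        self.services[name] = {
            "abstract": abstract,
            "contracting": contracting,
            "deployment": deployment,
        }

    def register_goal(self, name: str, goal: dict) -> None:
        self.goals[name] = goal

    def services_providing(self, output: str) -> List[str]:
        """Look up services by abstract capability, sorted by name."""
        return sorted(
            name for name, desc in self.services.items()
            if output in desc["abstract"].get("provides", [])
        )


registry = Registry()
registry.register_service(
    "csv2netcdf",
    abstract={"provides": ["converted-file"]},
    contracting={"required_inputs": ["csv_file"]},
    deployment={"min_memory_gb": 2},
)
registry.register_goal("convert-format", {"expected_output": "converted-file"})
```

Holding goals, service descriptions and vocabulary mappings in one place reflects the idea that the registry supplies everything the location process and its mediation support need.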
