
Data Storage Model - R2E1-DMG


1.2.3.11 S Data Management The Data Management Subsystem is responsible for providing life cycle management, federation, preservation and presentation of OOI data holdings and associated metadata via data streams, repositories and catalogs. See: https://confluence.oceanobservatories.org/display/syseng/Release+Construction+Plan+Overview https://confluence.oceanobservatories.org/display/CIDev/Product+Description+Release+2 Public source repos at https://github.com/ooici. More docs at https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+Data+Management

1.2.3.11.2.22.2 SV Dynamic Data Distribution Services - DM R2 Elaboration

1) Provides publish/subscription services for routing the different types of data messages.
2) Provides request/response services that enable users and services to query for and retrieve data messages.
3) In combination with the CEI Processing Service (https://confluence.oceanobservatories.org/display/syseng/CIAD+CEI+OV+Process+Execution+Management), this can drive a policy decision to execute a process ---- [WAS: Provides publication, subscription, and query services associated with variant and dynamic data resources. This is the DM Distribution service. Used in combination with the CEI Processing Service to drive the policy decision to execute a process.]

CO 1 Define content and format needed to describe messages DEPENDENCIES: https://confluence.oceanobservatories.org/display/CIDev/UC.R2.13+Acquire+Data+From+Instrument, https://confluence.oceanobservatories.org/display/CIDev/UC.R1.06+Distribute+Data+Product 4 https://confluence.oceanobservatories.org/display/CIDev/DM+Data+Type+Representations GIT like version in the message? Consumed by internal OOI programs, won't be used outside. JBG: What would it mean for a message to have a version 2? I think messages are atomic and don't get versioned. (Is this a reference to annotations creating new resources?)

IT 0 R2E1 Prototype data streaming Based on the mechanisms of pubsub and the Exchange, prototype describing, registering and using data streams such that arbitrary consumers can (a) find and (b) subscribe to data streams. Consumers get a certain metadata context before actual real-time streaming would occur.

X 4 5 dstuebe Architecture Team + Team Leads https://confluence.oceanobservatories.org/display/CIDev/Prototype+data+streaming
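The find-then-subscribe flow described in this task can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `StreamRegistry`, `StreamDefinition`, the metadata keys and the stream id are hypothetical names, and a real prototype would sit on top of the pubsub service and the Exchange rather than a Python dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StreamDefinition:
    """Hypothetical description of a registered data stream."""
    stream_id: str
    metadata: Dict[str, str]


class StreamRegistry:
    """In-memory stand-in for the pubsub service and the Exchange."""

    def __init__(self) -> None:
        self._streams: Dict[str, StreamDefinition] = {}
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def register(self, stream_id: str, metadata: Dict[str, str]) -> None:
        """Describe and register a stream so that consumers can find it."""
        self._streams[stream_id] = StreamDefinition(stream_id, dict(metadata))

    def find(self, **criteria: str) -> List[StreamDefinition]:
        """(a) Find streams whose metadata matches every given key/value pair."""
        return [s for s in self._streams.values()
                if all(s.metadata.get(k) == v for k, v in criteria.items())]

    def subscribe(self, stream_id: str, callback: Callable[[dict], None]) -> Dict[str, str]:
        """(b) Subscribe; the metadata context is returned before any data flows."""
        definition = self._streams[stream_id]
        self._subscribers[stream_id].append(callback)
        return definition.metadata

    def publish(self, stream_id: str, message: dict) -> None:
        """Deliver one message to every subscriber of the stream."""
        for callback in self._subscribers[stream_id]:
            callback(message)


registry = StreamRegistry()
registry.register("ctd-01", {"parameter": "temperature", "units": "degC"})

received = []
matches = registry.find(parameter="temperature")
context = registry.subscribe(matches[0].stream_id, received.append)
registry.publish("ctd-01", {"value": 11.2})
```

The consumer holds `context` (units, parameter name) before the first message arrives, which is the ordering the task description asks for.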

IT R2E1 Prototype Science Data Processing: Part 1

Prototype HDF as a means of transport for science data X 4 4 dstuebe Tlennan https://confluence.oceanobservatories.org/display/CIDev/R2+Scientific+Data+Model+Development+Support+and+Java+Prototyping

IT R2E1 Prototype Science Data Processing: Part 2

Prototype YAML objects which are locators in an HDF Data System X 4 4 dstuebe Tlennan https://confluence.oceanobservatories.org/display/CIDev/Prototype+Science+Data+Representation

IT R2E2 Define the types of data messages

The DM needs to deal with messages of all types: events, data, metadata, queries, etc. Find and describe the appropriate level of abstraction for these messages to support their detailed definition in later iterations. Review work done to date on message types and evaluate its suitability.

Outline of current and additional message types and definitions required to meet R2 deliverables

X 4 3 dstuebe Moved to E1 as messages can be more accurately defined when we know more about the services.

https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+SV+Notifications+and+Events

IT R2E3 Define the content of the data messages

Confirm the circumstances under which data messages: 1) contain data in a standardized (internal canonical) format, 2) are represented as raw data (e.g., output from an instrument/instrument service), 3) contain notification about data available via some external interface. Evaluate existing content formats against known requirements.

Draft of R2 message types, scenarios for use and content of each

X 2 dstuebe MBARI Moved to E1 as messages can be more accurately defined when we know more about the services.

IT R2E3 Define the headers of the data messages

Specify how the FIPA headers are read and processed (COI) and how priority can be handled in transformation (DM) services.

One page design draft for DM data headers

X 2 dstuebe Moved to E1 as messages can be more accurately defined when we know more about the services.

CO 2 Data distribution control Capabilities to manage the routing underlying the data distribution services.

4 COI messaging. Need to understand the core-messaging before we can address this. What do we mean by 'underlying data distribution services'? MManning - Architecture presentations being arranged for design teams.

IT R2E2 Refactor R1 pubsub service to define data streams.

The R1 pubsub service has CRUD operations for topics and queues; refactor the service to manage streams and their definition.

A new pubsub service with the correct object model and service interface for R2.

X 5 5 dstuebe Architecture team

IT R2E2 Define data distribution policies and services

Define default/initial policies and service functional interface for distribution

X

IT Implement tools for users to input data distribution rules

This tool would allow users to define routes, and policies on those routes, for distributing data throughout the network

How does the user visualize and adjust the routing and priority? Like running a railroad? Is this Subscription?

IT Implement routing in message bus

Somehow the messaging backbone must be able to look up user-defined distribution routing and policy definitions and implement them in the messaging bus.

This is a COI function!

1.2.3.11.2.23 SV OOI Common Data and Metadata Model - DM R2 Elaboration

Extends the canonical data and metadata model for the Integrated Observatory with respect to associations (provenance, lineage, versions) as well as semantic interpretation, and enables the transformation to and from the canonical data format.

https://confluence.oceanobservatories.org/display/CIDev/Versioning%2C+Provenance%2C+and+Related+Concepts lineage == provenance. Depends on 1.2.3.11.1.2. JG: Example: NetCDF is decomposed into components (CDM) and attributes (semantic elements). Data streams do not have a version, but they do have provenance.

SV

CO 3 Information resource associations model

The data model as resources and associations for information resources and descriptions of provenance, citation, lineage and versions.

5 Information about a resource, as well as how resources are related. Is it part of this use case: https://confluence.oceanobservatories.org/display/CIDev/UC.R2.20+Annotate+Resources. MManning - Yes

IT R2E2 Define what resources are to be associated

Provenance, lineage (a subset of provenance), versions, and citations are all characterized using annotations on resources. These annotations are likely created using associations of particular types (essentially the relations describing the resources). It is also possible to link two resources using annotations. Here we need to define what resources can be annotated with what kind of associations.

List of current and R2 resources which can be annotated, and what kind of associations apply to each.

X 5 2 mmanning The simplest case is to associate one file with another. But can cached messages have provenance, versions? JBG: A resource of any type can be associated to other resources. (Not sure what is meant by 'cached messages' here.) Can we have a scenario where a version subtype is used?

CO 4 Information resource versioning

Capabilities to define, identify and retrieve different versions of a data set.

4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.22+Version+Resource

Will need to have a consistent understanding of the concept of 'version'.

IT 4 1 R2E2 Define versioning policy Define what products, resources and other entities will be versioned, policies for each and service design to support this capability and prototype implementation using CouchDB

X 3 2 dstuebe https://github.com/twitter/snowflake

IT R2E2 Investigate existing version control tools

Investigate existing version control tools (Hg, Git, Alfresco). Determine if they, or their model, are applicable to managing the versions of information resources in this system. Compare the existing use of the Git model in this process.

Short analysis of gaps in existing model and how version control tools address those gaps.

X 4 3 dstuebe

CO Data transformation service

Capability to register transformation process definitions and to execute them based on the data distribution services.

3 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.21+Transform+Data+in+Workflow

IT 0 R2E2 Prototype process execution with CEI

Prototype executing a pre-registered algorithm within an existing execution engine (e.g. Matlab script inside Matlab, or C program controlled by CC), connected to input queue and producing into output queue X 4 mmanning Architecture Team + Team Leads https://confluence.oceanobservatories.org/display/CIDev/Prototype+process+execution
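The input-queue to output-queue wiring can be sketched with the Python standard library alone. This is an assumption-laden stand-in: `run_process`, `scale_counts` and the `STOP` sentinel are hypothetical, and a thread with `queue.Queue` only mimics what the CEI execution engine and the message bus would provide.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down (illustrative)


def run_process(algorithm, input_queue, output_queue):
    """Consume messages, apply the registered algorithm, publish results."""
    while True:
        message = input_queue.get()
        if message is STOP:
            break
        output_queue.put(algorithm(message))


def scale_counts(sample):
    """Hypothetical pre-registered algorithm: raw counts to scaled units."""
    return {"volts": sample["counts"] * 2}


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=run_process, args=(scale_counts, inbox, outbox))
worker.start()

for counts in (1, 2, 3):
    inbox.put({"counts": counts})
inbox.put(STOP)
worker.join()

results = [outbox.get() for _ in range(outbox.qsize())]
```

The algorithm never sees the queues, so the same wrapper could drive a script in another engine as long as it accepts one message and returns one result.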

IT R2E2 Create a process definition

Define a framework for the creation of a process resource 1-2 page transformation service design

X 3 5 David Chris, Maurice

IT R2E2 Prototype transformation processing

Execute a transformation. Prototype of how a transformation is defined, executed and documented

X 4 4 dstuebe Chris, Adam Include handling of data transforms that may take a relatively long time to process.

IT R2E1 Create Data transformation service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X mmanning

IT R2E3 Implement basic Data transformation service capabilities

Complete capabilities required to meet LCA deliverables. X

WBS Type CO-ID IT-ID Iteration Task Description Outcome of task (deliverables) Risk (COs) R2E1-DM R2E2-DM R2E3-DM Jira Task# Priority Work days Assignee Support Roles Description Page URL

IT R2E2 Investigate using OGC or other standard process definition

Review the OGC and other standards for applicability and gaps. One page overview of external workflow engines

X 3 2 mmanning Chris, DavidS In particular, WPS, and deegree implementation

CO Data transformation process repository

Register, retrieve and apply data format transformation processes. 3

IT R2E2 Define storage mechanism for process definitions

Where and how will the process definitions be stored? How will processes be versioned, tracked and governed?

Define transform resource metadata and associations, etc.

X 3 3 mmanning Chris, DavidS

IT R2E3 Create a framework and API that allow users to easily annotate their transformations with metadata.

This should be some sort of library or API that users can use to instrument their own transformations so that when their transformations run, the process and provenance descriptions can be captured by the system into the repository. This could be a service that is called by the transformation process as it is running.

X 3 2 mmanning Chris, DavidS

This should be some sort of library or API that users can use to instrument their own transformations so that when their transformations run, the process and provenance descriptions can be captured by the system into the repository. This could be a service that is called by the transformation entry process as it is running.
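A decorator is one possible shape for such an instrumentation library. The sketch below assumes that form; the `transformation` decorator, the `PROVENANCE_REPOSITORY` list and the record fields are all hypothetical, with the list standing in for whatever repository service would actually receive the descriptions.

```python
import functools
import time

# Stand-in for the process/provenance repository (illustrative only).
PROVENANCE_REPOSITORY = []


def transformation(name, version):
    """Wrap a user function so each run records a provenance entry."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*inputs):
            started = time.time()
            output = func(*inputs)
            PROVENANCE_REPOSITORY.append({
                "process": name,
                "version": version,
                "inputs": list(inputs),
                "output": output,
                "started": started,
                "duration_s": time.time() - started,
            })
            return output
        return wrapper
    return decorator


@transformation(name="add_calibration_offset", version="1")
def add_calibration_offset(values):
    """Hypothetical user transformation: apply a fixed offset of 1."""
    return [v + 1 for v in values]


result = add_calibration_offset([1, 2])
record = PROVENANCE_REPOSITORY[-1]
```

The user's function body stays unchanged; only the decorator line is added, which keeps the annotation burden low as the task title requires.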

CO Semantics resource model The data model as resources and associations representing vocabularies, ontologies, inferencing related to resources

4

IT R2E3 Prototype bridging of vocabularies

Create mappings to/from exemplar vocabularies from/to the OOI common data model and services to make those conversions

X

CO Vocabulary repository service

Register and access domain specific vocabularies; update and extend vocabularies. Naming and versioning of vocabularies. Unique identification of terms and vocabularies with their versions.

4

IT R2E2 Create Vocabulary repository service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Vocabulary repository service capabilities

Complete capabilities required to meet LCA deliverables. X

CO Information resource provenance

Capabilities to define, identify and retrieve data sets related by provenance to other data sets.

4

IT R2E2 Create provenance and workflow schema

Create a schema that contains the necessary attributes to track provenance and workflow. Reflect SSDS and other existing provenance resources.

Draft schema that identifies critical attributes

X 4 5

IT R2E2 Define data set retrieval Need clarification on what it means to retrieve a dataset. Consider the possibility of storing a function/workflow and a reference and retrieving the result of the workflow. (Check out of version control, FTP, OpenDAP, file download via http/s, etc.) Describe the highest-priority retrieval protocols to implement.

Short analysis on data set retrieval capabilities and challenges

X 5 3 dstuebe

IT R2E2 Define versioning scheme/mechanism

What are versioning requirements for vocabularies? What schemes are supported by the storage mechanisms reviewed above? Propose a versioning scheme to satisfy ION needs.

One page proposal for versioning vocabularies, needs met and unanswered challenges

X 3 different types of vocabulary updates: add/delete/update terms, add/delete/update relationships; add/update term metadata

IT R2E2 Survey existing tools that provide SKOS and OWL services, characterize their interfaces, and define use cases enabled by them

Research the semantic tools that provide services beyond simple storage of vocabularies/ontologies.

Short analysis for tools reviewed and recommendations

X 4

IT R2E3 Implement or adopt a Python SPARQL wrapper

X e.g. http://sparql-wrapper.sourceforge.net/

This presumes SPARQL is the search service that's selected above.

IT R2E2 Integrate vocabulary services with model of associations/annotations

Provide the glue between the CI model for associations, and the services and information presented by the vocabulary repository. Includes recreating vocabularies for internal use from a vocabulary repository, using associations that are provided by the vocabulary repository, and publishing ION associations to a vocabulary.

X How shall we govern vocabularies if they are built bottom-up from associations? jg: this would be done by limiting the mechanisms for users creating those associations that can be turned into vocabularies. (All associations should be publishable as RDF, but not all associations should be turned into vocabularies.)

IT R2E3 Crowdsourcing strategy for vocabulary development and maintenance

Describe how expertise within and beyond the OOI community can be used to develop vocabularies and mappings. Identify the OOI needs that will not be met without using this approach.

One page description of proposed approach, plus a scenario.

X 3 graybeal kstocks, Ilya also a governance issue - need use case

IT R2E3 Identify or create vocabularies for several domains.

For the domains of features, parameters, functional sensor type, platform type, observing medium, data quality, and institutions: review existing alternatives and select from them or define new ones

List of reviewed and potential vocabularies and needs met.

X 5 What are the vocabularies we actually need at the beginning? - from use cases. Platform type, institution name would be 2 particularly useful ones; functional sensor type and parameter are next.

IT R2E3 Create or select a hierarchical vocabulary, to demonstrate navigation/search

Identify a topic that can use a hierarchical vocabulary to inform semantic searches, then find an existing vocabulary for that topic, or create one. Edit the vocabulary as needed to support effective search results.

Vocabulary, maintained in a controlled repository.

X Need to define how this hierarchy will be managed. Will there be multiple hierarchies, e.g. custom to users?

IT R2E3 Determine discovery keyword scheme

Consider existing (e.g. GCMD), pick one, map with others. X

IT R2E3 Mapping between vocabulary terms and values

For the mappings, need to define storage model, querying, versioning One example mapping and guidelines for additional mappings

X What does 'between terms and values' mean? If it means what I think it means, I don't think we want it as a task. jbg

CO 5 Ontology repository service

Register and retrieve ontological representations, representing universal domain knowledge and connecting specific vocabularies. Define a standard ontology language and representation format.

3 Same as 'Vocabulary repository service'. (At least, they can be implemented with the same tools.) Might or might not be similar enough to treat them identically.

No Task

CO 6 Semantics UI components Screens and plug-ins to the Web UI and application integration services related to the use and management of ontologies, vocabularies and related semantics based functions. Definition of vocabularies, ontologies and mappings

1

IT R2E2 Engage UX team w.r.t. semantic UI components

Discuss search and navigation capabilities; how semantic integration services and available semantic UI technologies may be leveraged.

Meetings with UX team to describe internal processing and interface to search service. Definition of key semantic-related UI components (screens, workflows, etc).

X 2 mmanning Susanne, Carolanne, kstocks, John

1.2.3.11.2.24 SV Persistent Archive Services - DM R2 Elaboration

Extensions of the persistent archive services provide cataloging, validation & curation to organize, persist and maintain data holdings with their associated metadata for an individual, group and/or community.

https://confluence.oceanobservatories.org/display/CIDev/Persistent+Archive+Storage+Architecture Q: Is the data transformed to engineering units upstream? Or do we store native output of instruments? Or is it a mix of both? REFS: https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+OV+Preservation https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+SV+Persistence+Architecture SV

CO 7 Persistent archive policy Capabilities for the definition of storage constraints, replication policies, access policies and their application.

5 IRODS is a strongly leading candidate. Do we leverage IRODS policy? But we want an abstraction layer in case we don't use IRODS.

CO 8 IRODS persistent archive Implementation of persistent archive using the iRODS technology 4

IT R2E2 Investigate other archives Include investigation of federation issues, update use cases and abstraction layer

Short analysis of potential archive technologies for consideration

X 4 dstuebe If you are already using EC2 for processes, why are we not looking at using cloud storage for archival purposes? See also "Persistent archive replication service" and "long-term data archive service" components for related tasks. Where does the National Archives requirement play?

IT R2E2 Prototype persistent archive services using CouchDB, MongoDB, Zookeeper - and compare performance

Prototype an alternate data object model and investigate performance gains with the alternate DBs (also see "Get requirements on latency" task)

X 4 https://confluence.oceanobservatories.org/display/CIDev/Datastore+and+Association+Service+performance+testing Need this to address data store and registry performance issues


IT Implement IRODS persistent archive

Implementation of persistent archive using the iRODS technology dstuebe If it is the chosen archive.

CO 9 Persistent archive replication service

Capabilities to replicate, based on policy, the content of persistent archives to distributed locations or other persistent archives.

4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.27+Manage+Replicated+Archive

IT R2E2 Investigate how other archives handle replication

As part of the investigation of archival technologies, investigate how they handle replication.

Short analysis of potential archive technologies and/or capabilities for consideration. Propose a design to support R2 requirements.

X 4 dstuebe

IT R2E2 Create service skeleton Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 10 Data caching service Capabilities to manage and operate short term caches of information in the network, with configurable cache fill and replacement policies. Caches may be geographically proximate to locations of use, or hold data of recurring interest.

4 What's the purpose of the caching? What are we caching? If the latency between fetch, transform and delivery from base storage is low do we need caching? What is acceptable latency? MManning - these questions can be part of scoping of this task.

IT R2E2 Design the caching mechanism

Detailed design for local data cache X

IT R2E3 Define caching policies Draft initial caching policies; types of data to be cached, expiry policy, update policy

X
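To make the policy vocabulary concrete (fill, replacement, expiry), here is a minimal sketch of a cache with a least-recently-used replacement policy and a time-to-live expiry policy. These two policies are chosen only as common examples, not as the policies the task will settle on; `PolicyCache` and its parameters are hypothetical names.

```python
import time
from collections import OrderedDict


class PolicyCache:
    """Small cache with LRU replacement and time-to-live expiry."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()    # key -> (expires_at, value)

    def put(self, key, value):
        """Store a value and evict the least recently used entry if full."""
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, key, default=None):
        """Return a fresh value, or `default` if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return default
        self._entries.move_to_end(key)   # mark as recently used
        return value


now = [0.0]  # fake clock so the example is deterministic
cache = PolicyCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used entry
cache.put("c", 3)   # cache is full, so "b" is evicted
```

An update policy (for example, refreshing an entry when its source changes) would be a third, separate knob and is not modelled here.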

IT R2E2 Create Data caching service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Data caching service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 11 Long-term data archive service

Manage information in dark or offline archives. Maintain information integrity. Provide estimates for data retrieval time. Move data to and from archive to online repository. Notify data requester of completed retrieval.

3 This is within IRODS capabilities. How is this different than 68?

IT R2E2 Investigate National Archive

Understand interface and policies related to moving data to this archive. Draft near-term archive plan.

X

IT R2E2 Create Long-term data archive service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Long-term data archive service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 12 Persistent archive federation

Capability to federate multiple persistent archives that are potentially under different domains of authority, to realize a composite higher level persistent archive

2

IT R2E2 Determine scope of use for Cassandra

Determine which data types will be managed by Cassandra and which are better suited to different technologies.

X

IT R2E2 Determine persistence technologies to employ

Map which entities are stored in each persistence store and the integration requirements that are created.

X

CO 13 Long-term data archive access

Capabilities to access remote data archives identified by OOI. 3

IT R2E3 Define search for data archives

Determine whether the search can and should be the same as the normal OOI search mechanisms. Possibly integration of OOI metadata with OAI

High level plan for R2 search

X 4 3 mmanning Semantic issues as we map these archives to terms/metadata used by archivists. This will largely depend on the archive chosen/implemented and what kind of native catalog they have. I suspect whatever is chosen will have to have a "wrapper" interface to handle more complex queries to handle other query fields defined in the OOI common data model.

CO 14 Data validation service Capabilities to ensure long term data integrity through analysis, replication etc.

3 This is about bit-level integrity, not content validation relative to the real world.
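Bit-level integrity in this sense is usually checked by comparing cryptographic digests of stored copies. The sketch below assumes SHA-256 from Python's standard `hashlib` module; the helper names, the replica names and the sample bytes are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the stored bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(expected_digest, replicas):
    """Return the names of replicas whose bytes no longer match the digest."""
    return [name for name, data in replicas.items()
            if fingerprint(data) != expected_digest]


original = b"temperature,11.2\n"
digest = fingerprint(original)  # recorded when the data is first archived

replicas = {
    "site-a": original,
    "site-b": b"temperature,11.3\n",  # one changed character
}
damaged = verify_replicas(digest, replicas)
```

The check says nothing about whether 11.2 was the right temperature; it only confirms that each copy still holds the bytes that were archived, matching the scope noted above.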

IT R2E2 Define data validation service

X

IT R2E2 Create Data validation service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Data validation service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 15 Data curation services Capabilities to enable the curation of stored data by trained professionals or automatic processes.

3

IT R2E2 Define data curation service

Define functional interface, dependencies and processing workflow for curation

X

IT R2E2 Create Data curation services skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Data curation services capabilities

Complete capabilities required to meet LCA deliverables. X

CO 16 Preservation management UI components

Screens and plug-ins to the Web UI and application integration services related to the management of both logical and physical repositories and their resources, as well as for data curation and validation

1 This will be dependent on the archiving technology of choice for physical repositories, but logical repositories ideally would be independent of archival technology.

No Task

CO 17 Persistent archive index service

Capabilities to define and query indexes for data holdings, backed by storage technologies. Must be extensible to add new indexes

4 No Task

IT R2E2 Define Persistent archive index service

Define functional interface, dependencies and processing workflow for curation

X

IT R2E2 Create Persistent archive index service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Persistent archive index service capabilities

Complete capabilities required to meet LCA deliverables. X

1.2.3.11.2.25 SV Search and Navigation Services - DM R2 Elaboration

Provides query and browsing services by context based on the content, metadata and semantics of the data holdings.

Use cases: https://confluence.oceanobservatories.org/display/CIDev/UC.R2.25+Advanced+Resource+Search, https://confluence.oceanobservatories.org/display/CIDev/UC.R2.24+Search+for+Resource, https://confluence.oceanobservatories.org/display/CIDev/UC.R2.26+Navigate+Resources+and+Metadata SV

CO 18 Catalog service Capabilities to present a structured, defined view (catalog) of resources matching a filter expression or of a certain type. Resources in a catalog are structured according to a choosable structure and can be navigated along the structure. There can be many catalogs for the same sets of resources.

4
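The catalog idea above (a filter expression plus a choosable structure, with several catalogs over the same resources) can be sketched as a function that builds a nested view. This is a toy under stated assumptions: `build_catalog`, the resource fields and the sample records are hypothetical, and the filter is a plain Python predicate rather than a real filter language.

```python
def build_catalog(resources, matches, structure):
    """Group the ids of matching resources into nested dicts.

    `matches` is the filter expression (a predicate on one resource) and
    `structure` is the ordered list of fields that define the hierarchy.
    """
    catalog = {}
    for resource in filter(matches, resources):
        node = catalog
        for key in structure[:-1]:
            node = node.setdefault(resource[key], {})
        node.setdefault(resource[structure[-1]], []).append(resource["id"])
    return catalog


resources = [
    {"id": "r1", "type": "dataset", "platform": "mooring", "parameter": "temperature"},
    {"id": "r2", "type": "dataset", "platform": "glider", "parameter": "temperature"},
    {"id": "r3", "type": "dataset", "platform": "mooring", "parameter": "salinity"},
    {"id": "r4", "type": "instrument", "platform": "mooring", "parameter": "temperature"},
]


def is_dataset(resource):
    return resource["type"] == "dataset"


# Two catalogs over the same resources, each with its own structure.
by_platform = build_catalog(resources, is_dataset, ["platform", "parameter"])
by_parameter = build_catalog(resources, is_dataset, ["parameter"])
```

Navigating "along the structure" then amounts to walking the nested keys, for example `by_platform["mooring"]["salinity"]`.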

IT 0 R2E2 Prototype Information Discovery

Prototype both navigation of resources and search via catalogs and index sets from the object and associations store.

X

IT R2E2 Investigate existing cataloging technologies and standards

Google appliance, Alfresco, etc.? Augmentation of existing or roll our own? On the standards side - look at OGC CSW, different profiles

X 3 5 Google appliance, Alfresco, etc.? Augmentation of existing or roll our own?

IT R2E3 Define schema for catalog metadata, and filter language

This must include information to support the search services defined below (geospatial, content, etc.). Look at OGC FES

Draft schema for catalog metadata for representational data sets

X 4 4 dstuebe Karen This must include information to support the search services defined below (geospatial, content, etc.)


IT R2E1 Create Catalog service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X 3 mmanning This would be automated as much as possible. For example, if somebody wants to register a NetCDF file in the catalog, a great deal of information for the catalog can be pre-populated from the file itself.

IT R2E2 Implement basic Catalog service capabilities

Complete capabilities required to meet LCA deliverables. X 3

CO 19 Index service Definition and application of resource level data and metadata indexes and access strategies. May be backed by persistent archives. May be implemented as associations.

4

IT R2E2 Investigate curation technologies

Research technologies useful to make data more usable in the long term (e.g., by validation and annotation), or make it easier to prepare for long-term storage and later use.

X What is currently out there for this kind of stuff (if there is anything)? JBG: Curation is about the policies we apply to make sure data is properly maintained and preserved, but can also include data archival solutions

IT R2E2 Define catalog and index services

Define the functional interface and process of creation, access and maintenance of catalogs

X

IT R2E2 Create Index service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Index service capabilities

Complete capabilities required to meet LCA deliverables. X

IT R2E2 Prototype index generation

Implement tools to crawl catalog and generate indexes X 3

CO 20 Geospatial indexing Capabilities to index and query datasets by geospatial metadata 4

IT R2E3 Prototype geospatial indexing

Implement tools to crawl catalog and generate geospatial indexes prototype of service X 4 5 dstuebe Not necessarily limited to simple vertical values; have to consider different geodetic reference frames (http://en.wikipedia.org/wiki/Datum_(geodesy)#Reference_datums)
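A fixed grid is about the simplest geospatial index that supports bounding-box queries, and it is sketched here only as one candidate. The sketch assumes that all coordinates are already expressed in a single datum (for example WGS84) and that query boxes do not cross the antimeridian; `GridIndex` and the dataset ids are hypothetical.

```python
import math
from collections import defaultdict


class GridIndex:
    """Index dataset positions by fixed-size latitude/longitude cells."""

    def __init__(self, cell_degrees=1.0):
        self._cell = cell_degrees
        self._cells = defaultdict(set)   # (row, col) -> dataset ids
        self._positions = {}             # dataset id -> (lat, lon)

    def _key(self, lat, lon):
        return (math.floor(lat / self._cell), math.floor(lon / self._cell))

    def add(self, dataset_id, lat, lon):
        """Register a dataset at a point position."""
        self._positions[dataset_id] = (lat, lon)
        self._cells[self._key(lat, lon)].add(dataset_id)

    def query(self, south, west, north, east):
        """Return sorted ids of datasets inside the bounding box."""
        lo_row, lo_col = self._key(south, west)
        hi_row, hi_col = self._key(north, east)
        hits = []
        for row in range(lo_row, hi_row + 1):
            for col in range(lo_col, hi_col + 1):
                for dataset_id in self._cells.get((row, col), ()):
                    lat, lon = self._positions[dataset_id]
                    # Cells only narrow the search; confirm the exact position.
                    if south <= lat <= north and west <= lon <= east:
                        hits.append(dataset_id)
        return sorted(hits)


index = GridIndex(cell_degrees=1.0)
index.add("mooring-1", 44.6, -124.3)
index.add("glider-7", 46.9, -124.9)
index.add("buoy-3", 36.7, -122.0)

found = index.query(south=44.0, west=-126.0, north=47.0, east=-124.0)
```

Datasets recorded in different reference frames would need to be converted to the index's datum before `add`, which is where the note about geodetic reference frames comes in.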

IT R2E3 Prototype UI to query data sets via geospatial index

Implement query tools and APIs for searching for data sets by geospatial extents

wireframes/mockups of UI view

X 4 2 Dependent on catalog technology. JBG: Our UI should not be dependent on catalog technology, but the other way around.

CO 21 Discovery service Find resources by navigating a data catalog and metadata associations.

5

IT R2E2 Define the Discovery service

Define the functional interface, dependencies and workflow sequences for the discovery service

X

IT R2E2 Create Discovery service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Prototype query for discovery data

Implement query tools and APIs for discovering data, metadata, resources, etc.

X dstuebe Dependent on catalog technology. JBG: Ideally the queries remain the same, while the Resource Agent knows how to modify or adapt those queries to the chosen technologies.

IT R2E3 Implement basic Discovery service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 22 Content search service Query information resources based on information resource content. 4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.24+Search+for+Resource

IT R2E3 Clarify content search and contextual query needs

Research use cases and define needs for both content search and contextual query. Identify enabling technologies and patterns.

X

IT R2E3 Prototype generation of indices for content-based search

Create indexing for content based search (either within documents like a Word doc, or within data sets).

prototype of service X 5 TBD dstuebe If this is to implement something like Google or Alfresco does with documents using Lucene/Solr, the content search capability will be largely dependent on creating indexes, which is a task that was not defined.

IT R2E3 Create Content search service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Content

search service capabilities

Complete capabilities required to meet LCA deliverables. X

IT R2E3 Prototype content-based

query

Implement query tools and APIs for searching by content. Explore standard search interfaces (OpenSearch?).

prototype of UI components

X 3 TBD dstuebe Dependent on catalog technology. JBG: Ideally the queries remain the same, while the Resource Agent knows how to modify or adapt those queries to the chosen technologies. What about browse interfaces?

CO 23 Metadata query service Query resource registry based on metadata content 4 How is this different from catalog and discovery search?

IT R2E2 Create Metadata query

service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E2 Implement basic Metadata

query service capabilities

Complete capabilities required to meet LCA deliverables. X

CO 24 Semantic query service Query resource registry based on the application of semantic technologies, for instance the use of defined vocabularies and semantic inference.

3

IT R2E2 Define services to which

semantics will be applied

This component focuses on the resource registry, but possibly semantics should be applied to all types of searches (discovery service, content search service, metadata query service, data archive service, persistent archive service, external data access framework)

X 5 4 TBD Karen,

Carlos, Ilya, dstuebe

IT R2I1 Define what we want to

accomplish with semantics

Clearly identify what success looks like for semantic query. What types of searches would a scientist be able to perform that are not possible with non-semantic approaches?

Scenarios and/or use cases.

5 graybeal,

Karen

these were defined earlier

CO 25 Semantic inference

service

Defines a common canonical interface for semantic inferencing based on available and envisioned inference engines. Select a specific inference engine suitable for OOI purposes. Utilize ontologies and relationships between vocabularies.

3 No Task

IT R2E2 Define semantic strategy

for DM and capabilities necessary for R2.

X

IT Create Semantic inference

service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

IT Implement basic Semantic

inference service capabilities

Complete capabilities required to meet LCA deliverables.

CO 26 Query result grouping

service

Capabilities for temporal and geographic grouping of query results 3 No Task

IT R2E2 Clarify result grouping

needs with UX team

Engage the UX team and designers to clearly define needs for query results organization

X

IT R2E2 Create Query result

grouping service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement basic Query

result grouping service capabilities

Complete capabilities required to meet LCA deliverables. X
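
To make the temporal and geographic grouping in CO 26 concrete, here is a minimal sketch under stated assumptions: results are plain dictionaries with hypothetical `id`, `time`, `lat` and `lon` keys, time is bucketed by calendar month, and location is bucketed into a fixed-size latitude/longitude grid. The actual grouping needs are still to be defined with the UX team.

```python
import math
from collections import defaultdict
from datetime import datetime


def group_results(results, cell_deg=10.0):
    """Group result ids by (month, grid cell); the cell is its south-west corner."""
    groups = defaultdict(list)
    for r in results:
        month = r["time"].strftime("%Y-%m")
        cell = (
            math.floor(r["lat"] / cell_deg) * cell_deg,
            math.floor(r["lon"] / cell_deg) * cell_deg,
        )
        groups[(month, cell)].append(r["id"])
    return dict(groups)


# Made-up query results, for illustration only.
results = [
    {"id": "a", "time": datetime(2011, 3, 2), "lat": 44.6, "lon": -124.1},
    {"id": "b", "time": datetime(2011, 3, 20), "lat": 46.0, "lon": -125.5},
    {"id": "c", "time": datetime(2011, 4, 1), "lat": 44.6, "lon": -124.1},
]
grouped = group_results(results)
```

Results "a" and "b" fall in the same month and the same 10-degree cell, so they share a group, while "c" lands in a separate group for the following month.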

CO 27 Contextual query service Contextual query of information resource content or metadata 3

IT R2E2 Define contextual query

capabilities required for R2.

X

IT Create Contextual query

service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

1.2.3.11.2.21 SV External Data Access

Services - DM R2 Elaboration

Provides an extensible suite of access interfaces and data formats for interoperability with external communities and applications

https://confluence.oceanobservatories.org/display/CIDev/UC.R2.29+Integrate+External+Data+Source Is this a service to transform external data (not a transform service that is external)?

SV

WBS TypeCO-ID IT-ID Iteration Task Description Outcome of task

(deliverables)

Risk (COs)

R2E1-DM R2E2-DM R2E3-DM Jira Task# Priority Work days Assignee Support Roles

Description Page URL


CO 29 External data transformation services

Services to support the definition and execution of external observatory and community specific data format ingestion, transformation and presentation processes

4 Reword? I think this means "to support the definition and execution of processes to ingest, transform, and present external observatory data and community-specific data".

IT R2E2 Define transformation

service detail

Define details on implementation of transformation definition, policies and workflow

Sequence diagrams which outline workflow and dependencies

X

IT R2E2 Create External data

transformation service skeleton

Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.

X

IT R2E3 Implement services that

allow users to register their external processes

How do these definitions get reflected as resources out to others so they can include them in their workflows?

X 3 May be an R3 capability. Stretch goal.

CO 30 Ingestion into common

data format

Transformation processes to the ION canonical data format 4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.23+Ingest+Dataset+Supplement

No Task

CO 31 Presentation from

common data format

Transformation processes from the ION canonical data format 4 Wouldn't these be user defined, so that users would simply have to register their transformations as above? MManning: I believe this task is about transforming data and metadata between internal and external formats. The user should not need to know the internal format and can select applicable formats for export.

No Task

IT R2E2 Define output formats Determine initial data output formats desired by Release 2 target users

X

CO 32 External data access

framework

Extensible framework to support integration of interfaces to external data both incoming and outgoing

4

IT R2E3 Architecture for external

data sources

Develop service OV and SV diagrams for registering and processing external data sources.

Draft schema of external service data & metadata along with ION services and resources

X 4 5 dstuebe cmueller,

mmanning

Probably need to look at some existing standards in this regard.

IT R2E3 Implement services for

users to register their external data sources

X 3

IT R2E3 Implement services to crawl these data source definitions, catalog them, and build search indexes for those data sources.

X 3 How does that infrastructure get notified when these sources change, or will the CI be responsible for figuring that out? MManning: CI will periodically check for updates to the source, but endpoint changes will require an update to the source definition.
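
The periodic check described in the comment above could be sketched as a fingerprint comparison. This is only an illustration under assumptions: the `check_for_updates` function, the `fetch` callable and the source names and endpoints are hypothetical, and a real service would also need scheduling and error handling. Endpoints come straight from the registered source definitions, so an endpoint change still requires updating the definition.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: bytes) -> str:
    """Stable digest of a source's current content or listing."""
    return hashlib.sha256(content).hexdigest()


def check_for_updates(
    sources: Dict[str, str],
    fetch: Callable[[str], bytes],
    last_seen: Dict[str, str],
) -> List[str]:
    """Return names of sources whose content changed since the previous check.

    `sources` maps a source name to its registered endpoint, `fetch` retrieves
    the bytes at an endpoint, and `last_seen` is updated in place.
    """
    changed = []
    for name, endpoint in sources.items():
        digest = fingerprint(fetch(endpoint))
        if last_seen.get(name) != digest:
            changed.append(name)
            last_seen[name] = digest
    return changed


# Simulated remote content, for illustration only.
remote = {"http://example.org/a": b"v1", "http://example.org/b": b"v1"}
sources = {"src_a": "http://example.org/a", "src_b": "http://example.org/b"}
seen: Dict[str, str] = {}

first = check_for_updates(sources, remote.__getitem__, seen)
second = check_for_updates(sources, remote.__getitem__, seen)
remote["http://example.org/b"] = b"v2"
third = check_for_updates(sources, remote.__getitem__, seen)
```

Only sources reported as changed would then need to be re-crawled and re-indexed.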

CO 33 DAP integration Integration of DAP servers using the external data access framework 3 No Task

CO 34 THREDDS integration Integration of THREDDS catalogs using the external data access framework

3 No Task

CO 35 External data access UI

components

Screens and plug-ins to the Web UI and application integration services related to external data access services

1 No Task

IT Implement basic

Contextual query service capabilities

Complete capabilities required to meet LCA deliverables.

CO 28 Search and navigation UI

components

Screens and plug-ins to the Web UI and application integration services related to search and navigation services.

1 No Task

Work Days by Iteration 13 Note: the numbers on the left are automatically computed for each developer, but they sum up independently of the marked iteration (r2i2, r2e1). Hidden rows do count. For these numbers to work, make sure that non-R2I2 tasks have a different assignee (such as *dstuebe) so that they do not match R2I2 DM tasks.

Including boilerplate tasks: dstuebe 59 59, Ilya 0 0, mmanning 14 14, graybeal 3 3, kstocks 0 0, mattr 0 0, mkanand 0 0

