1.2.3.11 S Data Management
The Data Management Subsystem is responsible for providing life cycle management, federation, preservation and presentation of OOI data holdings and associated metadata via data streams, repositories and catalogs. See: https://confluence.oceanobservatories.org/display/syseng/Release+Construction+Plan+Overview and https://confluence.oceanobservatories.org/display/CIDev/Product+Description+Release+2. Public source repositories are at https://github.com/ooici. More documentation is at https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+Data+Management.
1.2.3.11.2.22.2 SV Dynamic Data Distribution Services - DM R2 Elaboration
1) Provides publish/subscribe services for routing the different types of data messages. 2) Provides request/response services that enable users and services to query for and retrieve data messages. 3) In combination with the CEI Processing Service (https://confluence.oceanobservatories.org/display/syseng/CIAD+CEI+OV+Process+Execution+Management), this can drive a policy decision to execute a process. [WAS: Provides publication, subscription, and query services associated with variant and dynamic data resources. This is the DM Distribution service. Used in combination with the CEI Processing Service to drive the policy decision to execute a process.]
CO 1 Define content and format needed to describe messages DEPENDENCIES: https://confluence.oceanobservatories.org/display/CIDev/UC.R2.13+Acquire+Data+From+Instrument, https://confluence.oceanobservatories.org/display/CIDev/UC.R1.06+Distribute+Data+Product
4 https://confluence.oceanobservatories.org/display/CIDev/DM+Data+Type+Representations Git-like version in the message? Consumed by internal OOI programs; won't be used outside. JBG: What would it mean for a message to have a version 2? I think messages are atomic and don't get versioned. (Is this a reference to annotations creating new resources?)
IT 0 R2E1 Prototype data streaming Based on the mechanisms of pubsub and the Exchange, prototype describing, registering and using data streams such that arbitrary consumers can (a) find and (b) subscribe to data streams. Consumers receive a metadata context for a stream before real-time streaming begins.
X 4 5 dstuebe Architecture Team + Team Leads https://confluence.oceanobservatories.org/display/CIDev/Prototype+data+streaming
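A minimal, in-process sketch of the find/subscribe flow described above, assuming stream descriptions are plain dictionaries. `StreamRegistry` and its methods are hypothetical names, not the ION pubsub or Exchange API; a real implementation would route messages through the message broker rather than Python callbacks. It shows a consumer receiving a metadata context message before any data message.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Callback = Callable[[Message], None]


class StreamRegistry:
    """Hypothetical registry: describe, register, find and subscribe to streams."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, Dict[str, Any]] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def register(self, stream_id: str, description: Dict[str, Any]) -> None:
        """Register a stream together with its metadata description."""
        self._descriptions[stream_id] = dict(description)

    def find(self, **criteria: Any) -> List[str]:
        """(a) Find streams whose description matches every criterion."""
        return [sid for sid, desc in self._descriptions.items()
                if all(desc.get(k) == v for k, v in criteria.items())]

    def subscribe(self, stream_id: str, callback: Callback) -> None:
        """(b) Subscribe; the consumer is sent the metadata context first."""
        if stream_id not in self._descriptions:
            raise KeyError(f"unknown stream {stream_id!r}")
        callback({"type": "context", "stream": stream_id,
                  "metadata": self._descriptions[stream_id]})
        self._subscribers[stream_id].append(callback)

    def publish(self, stream_id: str, payload: Any) -> int:
        """Route a data message to current subscribers; return how many got it."""
        subs = self._subscribers.get(stream_id, [])
        for cb in subs:
            cb({"type": "data", "stream": stream_id, "payload": payload})
        return len(subs)


registry = StreamRegistry()
registry.register("ctd-01", {"instrument": "CTD", "units": {"temp": "degC"}})
received: List[Message] = []
for sid in registry.find(instrument="CTD"):
    registry.subscribe(sid, received.append)
registry.publish("ctd-01", {"temp": 11.2})
print([m["type"] for m in received])  # ['context', 'data']
```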
IT R2E1 Prototype Science Data Processing: Part 1
Prototype HDF as a means of transport for science data X 4 4 dstuebe Tlennan https://confluence.oceanobservatories.org/display/CIDev/R2+Scientific+Data+Model+Development+Support+and+Java+Prototyping
IT R2E1 Prototype Science Data Processing: Part 2
Prototype YAML objects that serve as locators in an HDF data system X 4 4 dstuebe Tlennan https://confluence.oceanobservatories.org/display/CIDev/Prototype+Science+Data+Representation
IT R2E2 Define the types of data messages
The DM needs to deal with messages of all types: events, data, metadata, queries, etc. Find and describe the appropriate level of abstraction for these messages to support their detailed definition in later iterations. Review work done to date on message types and evaluate its suitability.
Outline of current and additional message types and definitions required to meet R2 deliverables
X 4 3 dstuebe Moved to E1, as messages can be more accurately defined when we know more about the services. https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+SV+Notifications+and+Events
IT R2E3 Define the content of the data messages
Confirm the circumstances under which data messages: 1) contain data in a standardized (internal canonical) format, 2) are represented as raw data (e.g., output from an instrument/instrument service), 3) contain notifications about data available via some external interface. Evaluate existing content formats against known requirements.
Draft of R2 message types, scenarios for use and content of each
X 2 dstuebe MBARI Moved to E1, as messages can be more accurately defined when we know more about the services.
IT R2E3 Define the headers of the data messages
Specify how the FIPA headers are read and processed (COI) and how priority can be handled in transformation (DM) services.
One page design draft for DM data headers
X 2 dstuebe Moved to E1, as messages can be more accurately defined when we know more about the services.
CO 2 Data distribution control Capabilities to manage the routing underlying the data distribution services.
4 COI messaging. Need to understand the core messaging before we can address this. What do we mean by 'underlying data distribution services'? MManning: Architecture presentations are being arranged for design teams.
IT R2E2 Refactor R1 pubsub service to define data streams.
The R1 pubsub service has CRUD operations for topics and queues; refactor the service to manage streams and their definitions.
A new pubsub service with the correct object model and service interface for R2.
X 5 5 dstuebe Architecture team
IT R2E2 Define data distribution policies and services
Define default/initial policies and service functional interface for distribution
X
IT Implement tools for users to input data distribution rules
This tool would allow users to define routes, and policies on those routes, for distributing data throughout the network
How does the user visualize and adjust the routing and priority? Like running a railroad? Is this Subscription?
IT Implement routing in message bus
Somehow the messaging backbone must be able to look up user-defined distribution routing and policy definitions and implement them in the messaging bus.
This is a COI function!
1.2.3.11.2.23 SV OOI Common Data and Metadata Model - DM R2 Elaboration
Extends the canonical data and metadata model for the Integrated Observatory with respect to associations (provenance, lineage, versions) and semantic interpretation, and enables transformation to and from the canonical data format.
https://confluence.oceanobservatories.org/display/CIDev/Versioning%2C+Provenance%2C+and+Related+Concepts lineage == provenance. Depends on 1.2.3.11.1.2. JG: Example: NetCDF is decomposed into components (CDM) and attributes (semantic elements). Data streams do not have versions, but they do have provenance.
SV
CO 3 Information resource associations model
The data model as resources and associations for information resources and descriptions of provenance, citation, lineage and versions.
5 Information about a resource, as well as how resources are related. Is it part of this use case? https://confluence.oceanobservatories.org/display/CIDev/UC.R2.20+Annotate+Resources MManning: Yes
IT R2E2 Define what resources are to be associated
Provenance, lineage (a subset of provenance), versions, and citations are all characterized using annotations on resources. These annotations are likely created using associations of particular types (essentially the relations describing the resources). It is also possible to link two resources using annotations. Here we need to define what resources can be annotated with what kind of associations.
List of current and R2 resources which can be annotated, and what kind of associations apply to each.
X 5 2 mmanning The simplest case is to associate one file with another. But can cached messages have provenance or versions? JBG: A resource of any type can be associated to other resources. (Not sure what is meant by 'cached messages' here.) Can we have a scenario where a version subtype is used?
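A sketch of how a policy table could constrain which association types may link which resource types. The resource types (`DataSet`, `InstrumentStream`, `Citation`) and predicates (`hasVersion`, `derivedFrom`, `hasCitation`) are illustrative assumptions; defining the real list is the outcome of this task.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical policy table: (subject type, association type) -> allowed
# object types. Producing the real R2 table is the deliverable of this task.
ALLOWED: Dict[Tuple[str, str], Set[str]] = {
    ("DataSet", "hasVersion"): {"DataSet"},
    ("DataSet", "derivedFrom"): {"DataSet", "InstrumentStream"},
    ("DataSet", "hasCitation"): {"Citation"},
}


@dataclass(frozen=True)
class Resource:
    id: str
    type: str


@dataclass(frozen=True)
class Association:
    subject: Resource
    predicate: str
    object: Resource


class AssociationStore:
    def __init__(self) -> None:
        self._assocs: List[Association] = []

    def associate(self, subject: Resource, predicate: str, obj: Resource) -> Association:
        """Create a typed association, rejecting combinations the policy forbids."""
        allowed = ALLOWED.get((subject.type, predicate), set())
        if obj.type not in allowed:
            raise ValueError(f"{subject.type} -{predicate}-> {obj.type} is not permitted")
        assoc = Association(subject, predicate, obj)
        self._assocs.append(assoc)
        return assoc

    def find(self, subject: Optional[Resource] = None, predicate: Optional[str] = None,
             obj: Optional[Resource] = None) -> List[Association]:
        """Return associations matching every criterion that is not None."""
        return [a for a in self._assocs
                if (subject is None or a.subject == subject)
                and (predicate is None or a.predicate == predicate)
                and (obj is None or a.object == obj)]


raw = Resource("ds-raw", "DataSet")
qc = Resource("ds-qc", "DataSet")
paper = Resource("cit-1", "Citation")
store = AssociationStore()
store.associate(qc, "derivedFrom", raw)
store.associate(qc, "hasCitation", paper)
print(len(store.find(subject=qc)))  # 2
```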
CO 4 Information resource versioning
Capabilities to define, identify and retrieve different versions of a data set.
4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.22+Version+Resource
Will need to have a consistent understanding of the concept of 'version'.
IT 4 1 R2E2 Define versioning policy Define what products, resources and other entities will be versioned, the policies for each, the service design to support this capability, and a prototype implementation using CouchDB
X 3 2 dstuebe https://github.com/twitter/snowflake
IT R2E2 Investigate existing version control tools
Investigate existing version control tools (Hg, Git, Alfresco). Determine whether they, or their models, are applicable to managing the versions of information resources in this system. Compare against the existing use of the Git model in this process.
Short analysis of gaps in the existing model and how version control tools address those gaps.
X 4 3 dstuebe
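To illustrate the Git model under evaluation, here is a minimal content-addressed version chain, assuming versions are JSON-serializable dictionaries. Each version ID is the SHA-1 of the version's content, message and parent ID, so history cannot be altered without changing IDs. `VersionStore` is hypothetical, and the hashed layout is Git-like, not Git's actual object format.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List, Optional


class VersionStore:
    """Git-like sketch: immutable versions identified by a hash of content plus
    parent ID; a named ref points at the newest version (the head)."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.refs: Dict[str, str] = {}

    def commit(self, name: str, content: Dict[str, Any], message: str = "") -> str:
        obj = {"parent": self.refs.get(name),
               "content": copy.deepcopy(content),
               "message": message}
        # Canonical JSON so identical content and history give identical IDs.
        blob = json.dumps(obj, sort_keys=True).encode("utf-8")
        version_id = hashlib.sha1(blob).hexdigest()
        self.objects[version_id] = obj
        self.refs[name] = version_id
        return version_id

    def history(self, name: str) -> List[str]:
        """Version IDs from newest to oldest, following parent links."""
        out: List[str] = []
        vid: Optional[str] = self.refs.get(name)
        while vid is not None:
            out.append(vid)
            vid = self.objects[vid]["parent"]
        return out

    def checkout(self, version_id: str) -> Dict[str, Any]:
        """Return a copy of a version's content so stored versions stay immutable."""
        return copy.deepcopy(self.objects[version_id]["content"])


store = VersionStore()
v1 = store.commit("ctd-calibration", {"coeff": 1.00}, "initial")
v2 = store.commit("ctd-calibration", {"coeff": 1.02}, "recalibrated")
print(store.history("ctd-calibration") == [v2, v1])  # True
print(store.checkout(v1))  # {'coeff': 1.0}
```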
CO Data transformation service
Capability to register transformation process definitions and to execute them based on the data distribution services.
3 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.21+Transform+Data+in+Workflow
IT 0 R2E2 Prototype process execution with CEI
Prototype executing a pre-registered algorithm within an existing execution engine (e.g. Matlab script inside Matlab, or C program controlled by CC), connected to an input queue and producing into an output queue X 4 mmanning Architecture Team + Team Leads https://confluence.oceanobservatories.org/display/CIDev/Prototype+process+execution
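A sketch of the queue-in/queue-out pattern for a pre-registered algorithm, with a Python callable standing in for the external execution engine (Matlab, or a CC-controlled C program, in the task description). `register_process`, `run_process` and the `STOP` sentinel are hypothetical names; in the real system the queues would be broker queues managed with CEI, not `queue.Queue`.

```python
import queue
import threading
from typing import Any, Callable, Dict

# Hypothetical registry of pre-registered algorithms, keyed by name.
PROCESS_DEFINITIONS: Dict[str, Callable[[Any], Any]] = {}
STOP = object()  # sentinel telling a running process to shut down


def register_process(name: str) -> Callable:
    def decorator(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
        PROCESS_DEFINITIONS[name] = fn
        return fn
    return decorator


@register_process("celsius_to_kelvin")
def celsius_to_kelvin(sample: dict) -> dict:
    return {**sample, "temp": sample["temp"] + 273.15, "units": "K"}


def run_process(name: str, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Consume from inbox until STOP, apply the registered algorithm to each
    message, and put each result on outbox."""
    fn = PROCESS_DEFINITIONS[name]
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        outbox.put(fn(msg))


inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
worker = threading.Thread(target=run_process,
                          args=("celsius_to_kelvin", inbox, outbox))
worker.start()
for t in (10.0, 12.5):
    inbox.put({"temp": t, "units": "degC"})
inbox.put(STOP)
worker.join()  # the worker has drained the inbox, so outbox is complete
results = [outbox.get_nowait() for _ in range(outbox.qsize())]
print(results)
```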
IT R2E2 Create a process definition
Define a framework for the creation of a process resource 1-2 page transformation service design
X 3 5 David Chris, Maurice
IT R2E2 Prototype transformation processing
Execute a transformation. Prototype of how a transformation is defined, executed and documented
X 4 4 dstuebe Chris, Adam Include handling of data transforms that may take a relatively long time to process.
IT R2E1 Create Data transformation service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X mmanning
IT R2E3 Implement basic Data transformation service capabilities
Complete capabilities required to meet LCA deliverables. X
WBS TypeCO-ID IT-ID Iteration Task Description Outcome of task
(deliverables)
Risk (COs)
R2E1-DM R2E2-DM R2E3-DM Jira Task# Priority Work days Assignee Support Roles
Description Page URL
IT R2E2 Investigate using OGC or other standard process definition
Review the OGC and other standards for applicability and gaps. One page overview of external workflow engines
X 3 2 mmanning Chris, DavidS
In particular, WPS and the deegree implementation
CO Data transformation process repository
Register, retrieve and apply data format transformation processes. 3
IT R2E2 Define storage mechanism for process definitions
Where and how will the process definitions be stored? How will processes be versioned, tracked and governed?
Define transform resource metadata and associations, etc.
X 3 3 mmanning Chris, DavidS
IT R2E3 Create a framework and API that allow users to easily annotate their transformations with metadata.
This should be a library or API with which users can instrument their own transformations so that, when their transformations run, the process and provenance descriptions are captured by the system into the repository. This could be a service that is called by the transformation process as it runs.
X 3 2 mmanning Chris, DavidS
CO Semantics resource model The data model as resources and associations representing vocabularies, ontologies and inferencing related to resources
4
IT R2E3 Prototype bridging of vocabularies
Create mappings to/from exemplar vocabularies from/to the OOI common data model and services to make those conversions
X
CO Vocabulary repository service
Register and access domain-specific vocabularies; update and extend vocabularies. Naming and versioning of vocabularies. Unique identification of terms and vocabularies with their versions.
4
IT R2E2 Create Vocabulary repository service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Vocabulary repository service capabilities
Complete capabilities required to meet LCA deliverables. X
CO Information resource provenance
Capabilities to define, identify and retrieve data sets related by provenance to other data sets.
4
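Retrieving data sets related by provenance amounts to walking a graph of derivation links in either direction. A minimal sketch, assuming each link is recorded as a (derived, source) pair of data set IDs; `ProvenanceGraph` and the data set names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class ProvenanceGraph:
    """Records 'derived from' links between data set IDs and answers
    upstream (lineage) and downstream (dependents) queries."""

    def __init__(self) -> None:
        self._sources: Dict[str, Set[str]] = defaultdict(set)
        self._derived: Dict[str, Set[str]] = defaultdict(set)

    def add_derivation(self, derived: str, source: str) -> None:
        self._sources[derived].add(source)
        self._derived[source].add(derived)

    @staticmethod
    def _walk(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        """Breadth-first traversal; the seen set also guards against cycles."""
        seen: Set[str] = set()
        todo = deque([start])
        while todo:
            node = todo.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    def upstream(self, dataset: str) -> Set[str]:
        """Every data set this one was (transitively) derived from."""
        return self._walk(dataset, self._sources)

    def downstream(self, dataset: str) -> Set[str]:
        """Every data set (transitively) derived from this one."""
        return self._walk(dataset, self._derived)


g = ProvenanceGraph()
g.add_derivation("calibrated", "raw")
g.add_derivation("gridded", "calibrated")
g.add_derivation("gridded", "bathymetry")
print(sorted(g.upstream("gridded")))  # ['bathymetry', 'calibrated', 'raw']
print(sorted(g.downstream("raw")))    # ['calibrated', 'gridded']
```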
IT R2E2 Create provenance and workflow schema
Create a schema that contains the necessary attributes to track provenance and workflow. Reflect SSDS and other existing provenance resources.
Draft schema that identifies critical attributes
X 4 5
IT R2E2 Define data set retrieval Need clarification on what it means to retrieve a data set. Consider the possibility of storing a function/workflow and a reference, and retrieving the result of the workflow (checkout from version control, FTP, OPeNDAP, file download via HTTP/S, etc.). Describe the highest-priority retrieval protocols to implement.
Short analysis on data set retrieval capabilities and challenges
X 5 3 dstuebe
IT R2E2 Define versioning scheme/mechanism
What are the versioning requirements for vocabularies? What schemes are supported by the storage mechanisms reviewed above? Propose a versioning scheme to satisfy ION needs.
One page proposal for versioning vocabularies, needs met and unanswered challenges
X 3 Different types of vocabulary updates: add/delete/update terms; add/delete/update relationships; add/update term metadata
IT R2E2 Survey existing tools that provide SKOS and OWL services, characterize their interfaces, and define use cases enabled by them
Research the semantic tools that provide services beyond simple storage of vocabularies/ontologies.
Short analysis for tools reviewed and recommendations
X 4
IT R2E3 Implement or adopt a Python SPARQL wrapper
X e.g. http://sparql-wrapper.sourceforge.net/
This presumes SPARQL is the search service that's selected above.
IT R2E2 Integrate vocabulary services with model of associations/annotations
Provide the glue between the CI model for associations, and the services and information presented by the vocabulary repository. Includes recreating vocabularies for internal use from a vocabulary repository, using associations that are provided by the vocabulary repository, and publishing ION associations to a vocabulary.
X How shall we govern vocabularies if they are built bottom-up from associations? jg: This would be done by limiting the mechanisms for users creating those associations that can be turned into vocabularies. (All associations should be publishable as RDF, but not all associations should be turned into vocabularies.)
IT R2E3 Crowdsourcing strategy for vocabulary development and maintenance
Describe how expertise within and beyond the OOI community can be used to develop vocabularies and mappings. Identify the OOI needs that will not be met without using this approach.
One page description of proposed approach, plus a scenario.
X 3 graybeal kstocks, Ilya Also a governance issue; need use case.
IT R2E3 Identify or create vocabularies for several domains.
For the domains of features, parameters, functional sensor type, platform type, observing medium, data quality, and institutions: review existing alternatives and select from them, or define new ones.
List of reviewed and potential vocabularies and needs met.
X 5 What vocabularies do we actually need at the beginning? Derive them from use cases. Platform type and institution name would be two particularly useful ones; functional sensor type and parameter are next.
IT R2E3 Create or select a hierarchical vocabulary, to demonstrate navigation/search
Identify a topic that can use a hierarchical vocabulary to inform semantic searches, then find an existing vocabulary for that topic, or create one. Edit the vocabulary as needed to support effective search results.
Vocabulary, maintained in a controlled repository.
X Need to define how this hierarchy will be managed. Will there be multiple hierarchies, e.g. custom to users?
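A sketch of how a hierarchical vocabulary could inform search: a query for a broader term is expanded to all of its narrower terms, and the broader links give a navigation breadcrumb. The platform-type terms and data set tags are illustrative assumptions, and the code assumes a tree (each term has at most one broader term, no cycles), which is simpler than full SKOS.

```python
from collections import defaultdict
from typing import Dict, List


class Hierarchy:
    """SKOS-style broader/narrower tree; each term has at most one broader term."""

    def __init__(self) -> None:
        self.narrower: Dict[str, List[str]] = defaultdict(list)
        self.broader: Dict[str, str] = {}

    def add(self, term: str, broader: str) -> None:
        self.broader[term] = broader
        self.narrower[broader].append(term)

    def expand(self, term: str) -> List[str]:
        """The term plus every transitively narrower term (query expansion)."""
        out, stack, seen = [], [term], set()
        while stack:
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            out.append(t)
            stack.extend(self.narrower.get(t, []))
        return out

    def path(self, term: str) -> List[str]:
        """Breadcrumb from the root down to term, for navigation."""
        p = [term]
        while p[-1] in self.broader:
            p.append(self.broader[p[-1]])
        return list(reversed(p))


def search(tags: Dict[str, str], hierarchy: Hierarchy, term: str) -> List[str]:
    """Data sets tagged with the term or any narrower term."""
    wanted = set(hierarchy.expand(term))
    return sorted(ds for ds, tag in tags.items() if tag in wanted)


h = Hierarchy()
for term, parent in [("fixed platform", "platform"), ("mobile platform", "platform"),
                     ("mooring", "fixed platform"), ("glider", "mobile platform"),
                     ("AUV", "mobile platform")]:
    h.add(term, parent)
tags = {"ds-1": "glider", "ds-2": "mooring", "ds-3": "AUV"}
print(search(tags, h, "mobile platform"))  # ['ds-1', 'ds-3']
print(h.path("glider"))  # ['platform', 'mobile platform', 'glider']
```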
IT R2E3 Determine discovery keyword scheme
Consider existing (e.g. GCMD), pick one, map with others. X
IT R2E3 Mapping between vocabulary terms and values
For the mappings, need to define storage model, querying, versioning One example mapping and guidelines for additional mappings
X What does 'between terms and values' mean? If it means what I think it means, I don't think we want it as a task. jbg
CO 5 Ontology repository service
Register and retrieve ontological representations, representing universal domain knowledge and connecting specific vocabularies. Define a standard ontology language and representation format.
3 Same as 'Vocabulary repository service'. (At least, they can be implemented with the same tools.) Might or might not be similar enough to treat them identically.
No Task
CO 6 Semantics UI components Screens and plug-ins to the Web UI and application integration services related to using and managing ontologies, vocabularies and related semantics-based functions. Definition of vocabularies, ontologies and mappings
1
IT R2E2 Engage UX team w.r.t. semantic UI components
Discuss search and navigation capabilities; how semantic integration services and available semantic UI technologies may be leveraged.
Meetings with UX team to describe internal processing and interface to search service. Definition of key semantic-related UI components (screens, workflows, etc).
X 2 mmanning Susanne, Carolanne, kstocks, John
1.2.3.11.2.24 SV Persistent Archive Services - DM R2 Elaboration
Extensions of the persistent archive services provide cataloging, validation and curation to organize, persist and maintain data holdings with their associated metadata for an individual, group and/or community.
https://confluence.oceanobservatories.org/display/CIDev/Persistent+Archive+Storage+Architecture Q: Is the data transformed to engineering units upstream? Or do we store native output of instruments? Or is it a mix of both? REFS: https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+OV+Preservation https://confluence.oceanobservatories.org/display/syseng/CIAD+DM+SV+Persistence+Architecture
SV
CO 7 Persistent archive policy Capabilities for the definition of storage constraints, replication policies, access policies and their application.
5 IRODS is a strong leading candidate. Do we leverage IRODS policy? We still want an abstraction layer in case we don't use IRODS.
CO 8 IRODS persistent archive Implementation of persistent archive using the iRODS technology 4
IT R2E2 Investigate other archives Include investigation of federation issues, update use cases and abstraction layer
Short analysis of potential archive technologies for consideration
X 4 dstuebe If you are already using EC2 for processes, why are we not looking at using cloud storage for archival purposes? See also the "Persistent archive replication service" and "Long-term data archive service" components for related tasks. Where does the National Archives requirement come into play?
IT R2E2 Prototype persistent archive services using CouchDB, MongoDB, Zookeeper, and compare performance
Prototype an alternate data object model and investigate performance gains with the alternate DBs (also see "Get requirements on latency" task)
X 4 https://confluence.oceanobservatories.org/display/CIDev/Datastore+and+Association+Service+performance+testing Need this to address data store and registry performance issues
IT Implement IRODS persistent archive
Implementation of persistent archive using the iRODS technology dstuebe If it is the chosen archive.
CO 9 Persistent archive replication service
Capabilities to replicate, based on policy, the content of persistent archives to distributed locations or other persistent archives.
4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.27+Manage+Replicated+Archive
IT R2E2 Investigate how other archives handle replication
As part of the investigation of archival technologies, investigate how they handle replication.
Short analysis of potential archive technologies and/or capabilities for consideration. Propose a design to support R2 requirements.
X 4 dstuebe
IT R2E2 Create service skeleton Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 10 Data caching service Capabilities to manage and operate short term caches of information in the network, with configurable cache fill and replacement policies. Caches may be geographically proximate to locations of use, or hold data of recurring interest.
4 What's the purpose of the caching? What are we caching? If the latency between fetch, transform and delivery from base storage is low, do we need caching? What is acceptable latency? MManning: these questions can be part of scoping this task.
IT R2E2 Design the caching mechanism
Detailed design for local data cache X
IT R2E3 Define caching policies Draft initial caching policies; types of data to be cached, expiry policy, update policy
X
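As one possible starting point for the policy draft, the sketch below combines a time-to-live expiry policy with a least-recently-used replacement policy, filling the cache from base storage on a miss. `DataCache`, its parameters and the injectable clock are assumptions for illustration, not a proposed service interface.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class DataCache:
    """Short-term cache with a TTL expiry policy and an LRU replacement policy."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: Hashable, fetch: Optional[Callable[[], Any]] = None) -> Any:
        """Return a fresh cached value; on a miss or expiry, fill via fetch if given."""
        entry = self._items.get(key)
        if entry is not None:
            stored_at, value = entry
            if self.clock() - stored_at < self.ttl:
                self._items.move_to_end(key)  # mark as most recently used
                return value
            del self._items[key]  # expired: drop it
        if fetch is None:
            return None
        value = fetch()  # fill from base storage
        self.put(key, value)
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self.clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


now = [0.0]  # fake clock so expiry can be demonstrated deterministically
cache = DataCache(capacity=2, ttl_seconds=60.0, clock=lambda: now[0])
cache.put("ctd-01/latest", {"temp": 11.2})
cache.put("adcp-03/latest", {"u": 0.4})
cache.get("ctd-01/latest")                  # touch: ctd-01 becomes most recently used
cache.put("flntu-02/latest", {"chl": 1.1})  # evicts adcp-03, the least recently used
print(cache.get("adcp-03/latest"))          # None (evicted)
now[0] = 61.0                               # advance past the TTL
print(cache.get("ctd-01/latest"))           # None (expired)
```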
IT R2E2 Create Data caching service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Data caching service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 11 Long-term data archive service
Manage information in dark or offline archives. Maintain information integrity. Provide estimates for data retrieval time. Move data to and from archive to online repository. Notify data requester of completed retrieval.
3 This is within IRODS capabilities. How is this different from 68?
IT R2E2 Investigate National Archive
Understand interface and policies related to moving data to this archive. Draft near-term archive plan.
X
IT R2E2 Create Long-term data archive service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Long-term data archive service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 12 Persistent archive federation
Capability to federate multiple persistent archives that are potentially under different domains of authority, to realize a composite, higher-level persistent archive
2
IT R2E2 Determine scope of use for Cassandra
Determine which data types will be managed by Cassandra and which are better suited to other technologies.
X
IT R2E2 Determine persistence technologies to employ
Map which entities are stored in each persistence store and the integration requirements that result.
X
CO 13 Long-term data archive access
Capabilities to access remote data archives identified by OOI. 3
IT R2E3 Define search for data archives
Determine whether the search can and should be the same as the normal OOI search mechanisms. Possibly integrate OOI metadata with OAI.
High level plan for R2 search
X 4 3 mmanning Semantic issues as we map these archives to terms/metadata used by archivists. This will largely depend on the archive chosen/implemented and what kind of native catalog it has. I suspect whatever is chosen will need a "wrapper" interface to handle more complex queries on other query fields defined in the OOI common data model.
CO 14 Data validation service Capabilities to ensure long-term data integrity through analysis, replication etc.
3 This is about bit-level integrity, not content validation relative to the real world.
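Bit-level integrity is commonly checked with fixity audits: store a checksum when data is ingested and periodically recompute it. A minimal sketch using SHA-256 from Python's standard library; the manifest format (relative path to hex digest) and the status strings are assumptions. A copy flagged as corrupt could then be restored from a replica held by the replication service.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large granules need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(manifest: Dict[str, str], root: Path) -> Dict[str, str]:
    """Compare stored checksums with current contents.
    Returns {relative_path: 'ok' | 'corrupt' | 'missing'}."""
    report: Dict[str, str] = {}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            report[rel] = "missing"
        elif sha256_file(p) != expected:
            report[rel] = "corrupt"
        else:
            report[rel] = "ok"
    return report


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "a.nc").write_bytes(b"granule-a")
    (root / "b.nc").write_bytes(b"granule-b")
    manifest = {name: sha256_file(root / name) for name in ("a.nc", "b.nc")}
    (root / "b.nc").write_bytes(b"granule-B")  # simulate bit rot
    report = audit(manifest, root)
print(report)  # {'a.nc': 'ok', 'b.nc': 'corrupt'}
```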
IT R2E2 Define data validation service
X
IT R2E2 Create Data validation service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Data validation service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 15 Data curation services Capabilities to enable the curation of stored data by trained professionals or automatic processes.
3
IT R2E2 Define data curation service
Define functional interface, dependencies and processing workflow for curation
X
IT R2E2 Create Data curation services skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Data curation services capabilities
Complete capabilities required to meet LCA deliverables. X
CO 16 Preservation management UI components
Screens and plug-ins to the Web UI and application integration services related to the management of both logical and physical repositories and their resources, as well as for data curation and validation
1 This will be dependent on the archiving technology of choice for physical repositories, but logical repositories ideally would be independent of archival technology.
No Task
CO 17 Persistent archive index service
Capabilities to define and query indexes for data holdings, backed by storage technologies. Must be extensible to add new indexes.
4 No Task
IT R2E2 Define Persistent archive index service
Define functional interface, dependencies and processing workflow for curation
X
IT R2E2 Create Persistent archive index service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Persistent archive index service capabilities
Complete capabilities required to meet LCA deliverables. X
1.2.3.11.2.25 SV Search and Navigation Services - DM R2 Elaboration
Provides query and browsing services by context based on the content, metadata and semantics of the data holdings.
Use cases: https://confluence.oceanobservatories.org/display/CIDev/UC.R2.25+Advanced+Resource+Search, https://confluence.oceanobservatories.org/display/CIDev/UC.R2.24+Search+for+Resource, https://confluence.oceanobservatories.org/display/CIDev/UC.R2.26+Navigate+Resources+and+Metadata
SV
CO 18 Catalog service Capabilities to present a structured, defined view (catalog) of resources matching a filter expression or of a certain type. Resources in a catalog are structured according to a choosable structure and can be navigated along the structure. There can be many catalogs for the same sets of resources.
4
IT 0 R2E2 Prototype Information Discovery
Prototype both navigation of resources and search via catalogs and index sets from the object and associations store.
X
IT R2E2 Investigate existing cataloging technologies and standards
Google appliance, Alfresco, etc.? Augmentation of existing or roll our own? On the standards side - look at OGC CSW, different profiles
X 3 5
IT R2E3 Define schema for catalog metadata, and filter language
This must include information to support the search services defined below (geospatial, content, etc.). Look at OGC FES.
Draft schema for catalog metadata for representational data sets
X 4 4 dstuebe Karen
IT R2E1 Create Catalog service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X 3 mmanning This would be automated as much as possible. For example, if somebody wants to register a NetCDF file in the catalog, a great deal of information for the catalog can be pre-populated from the file itself.
IT R2E2 Implement basic Catalog service capabilities
Complete capabilities required to meet LCA deliverables. X 3
CO 19 Index service Definition and application of resource level data and metadata indexes and access strategies. May be backed by persistent archives. May be implemented as associations.
4
IT R2E2 Investigate curation technologies
Research technologies useful to make data more usable in the long term (e.g., by validation and annotation), or make it easier to prepare for long-term storage and later use.
X What is currently out there for this kind of stuff (if there is anything)? JBG: Curation is about the policies we apply to make sure data is properly maintained and preserved, but can also include data archival solutions.
IT R2E2 Define catalog and index services
Define the functional interface and process of creation, access and maintenance of catalogs
X
IT R2E2 Create Index service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Index service capabilities
Complete capabilities required to meet LCA deliverables. X
IT R2E2 Prototype index generation
Implement tools to crawl catalog and generate indexes X 3
CO 20 Geospatial indexing Capabilities to index and query datasets by geospatial metadata 4
IT R2E3 Prototype geospatial indexing
Implement tools to crawl catalog and generate geospatial indexes prototype of service X 4 5 dstuebe Not necessarily limited to simple vertical values; have to consider different geodetic reference frames (http://en.wikipedia.org/wiki/Datum_(geodesy)#Reference_datums)
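A minimal sketch of a geospatial index over data set bounding boxes, using a uniform grid of cells to narrow candidates before an exact intersection test. It assumes longitude/latitude in degrees on a single datum (e.g. WGS 84), boxes that do not cross the antimeridian, and no vertical dimension, which are exactly the simplifications the note above cautions against; a production service would more likely use an R-tree or a spatial database. `GridIndex`, `BBox` and the data set IDs are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, NamedTuple, Set, Tuple


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


class GridIndex:
    def __init__(self, cell_deg: float = 10.0) -> None:
        self.cell = cell_deg
        self._cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self._boxes: Dict[str, BBox] = {}

    def _cells_for(self, b: BBox) -> Iterator[Tuple[int, int]]:
        """Grid cells (column, row) overlapped by the box."""
        for x in range(math.floor(b.west / self.cell), math.floor(b.east / self.cell) + 1):
            for y in range(math.floor(b.south / self.cell), math.floor(b.north / self.cell) + 1):
                yield (x, y)

    def insert(self, dataset_id: str, box: BBox) -> None:
        self._boxes[dataset_id] = box
        for c in self._cells_for(box):
            self._cells[c].add(dataset_id)

    def query(self, box: BBox) -> List[str]:
        """IDs of data sets whose extent intersects the query box."""
        candidates: Set[str] = set()
        for c in self._cells_for(box):
            candidates |= self._cells.get(c, set())
        return sorted(d for d in candidates if self._boxes[d].intersects(box))


idx = GridIndex(cell_deg=5.0)
idx.insert("ds-a", BBox(-125.0, 44.0, -124.0, 47.0))
idx.insert("ds-b", BBox(-71.0, 39.8, -70.5, 40.4))
print(idx.query(BBox(-130.0, 40.0, -120.0, 50.0)))  # ['ds-a']
```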
IT R2E3 Prototype UI to query data sets via geospatial index
Implement query tools and APIs for searching for data sets by geospatial extents
wireframes/mockups of UI view
X 4 2 Dependent on catalog technology. JBG: Our UI should not be dependent on catalog technology, but the other way around.
CO 21 Discovery service Find resources by navigating a data catalog and metadata associations.
5
IT R2E2 Define the Discovery service
Define the functional interface, dependencies and workflow sequences for the discovery service
X
IT R2E2 Create Discovery service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Prototype query for discovery data
Implement query tools and APIs for discovering data, metadata, resources, etc.
X dstuebe Dependent on catalog technology. JBG: Ideally the queries remain the same, while the Resource Agent knows how to modify or adapt those queries to the chosen technologies.
IT R2E3 Implement basic Discovery service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 22 Content search service Query information resources based on information resource content. 4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.24+Search+for+Resource
IT R2E3 Clarify content search and contextual query needs
Research use cases and define needs for both content search and contextual query. Identify enabling technologies and patterns.
X
IT R2E3 Prototype generation of indices for content-based search
Create indexing for content-based search (either within documents such as a Word document, or within data sets).
prototype of service X 5 TBD dstuebe If this is to implement something like what Google or Alfresco do with documents using Lucene/Solr, the content search capability will depend largely on creating indexes, which is a task that was not defined.
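The core of Lucene/Solr-style content search is an inverted index that maps each term to the documents containing it. A minimal sketch with AND semantics across query terms; real engines add stemming, relevance ranking and phrase queries, which this does not attempt. `InvertedIndex` is a hypothetical name and the sample documents are made up.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens; no stemming or stop-word removal."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for t in terms[1:]:
            hits &= self._postings.get(t, set())
        return sorted(hits)


idx = InvertedIndex()
idx.add("doc-1", "CTD calibration report for a coastal mooring")
idx.add("doc-2", "Glider dissolved oxygen calibration")
print(idx.search("calibration"))         # ['doc-1', 'doc-2']
print(idx.search("oxygen calibration"))  # ['doc-2']
```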
IT R2E3 Create Content search service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Content search service capabilities
Complete capabilities required to meet LCA deliverables. X
IT R2E3 Prototype content-based query
Implement query tools and APIs for searching by content. Explore standard search interfaces (OpenSearch?)
prototype of UI components
X 3 TBD dstuebe Dependent on catalog technology. JBG: Ideally the queries remain the same, while the Resource Agent knows how to modify or adapt those queries to the chosen technologies. What about browse interfaces?
CO 23 Metadata query service Query resource registry based on metadata content 4 How is this different from catalog and discovery search?
IT R2E2 Create Metadata query service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E2 Implement basic Metadata query service capabilities
Complete capabilities required to meet LCA deliverables. X
CO 24 Semantic query service Query the resource registry by applying semantic technologies, for instance defined vocabularies and semantic inference. 3
IT R2E2 Define services to which semantics will be applied
This component focuses on the resource registry, but semantics could possibly be applied to all types of searches (discovery service, content search service, metadata query service, data archive service, persistent archive service, external data access framework).
X 5 4 TBD Karen, Carlos, Ilya, dstuebe
IT R2I1 Define what we want to accomplish with semantics
Clearly identify what success looks like for semantic query. What types of searches would a scientist be able to perform that are not possible with non-semantic approaches?
Scenarios and/or use cases.
5 graybeal, Karen
These were defined earlier.
CO 25 Semantic inference service
Defines a common canonical interface for semantic inferencing based on available and envisioned inference engines. Select a specific inference engine suitable for OOI purposes. Utilize ontologies and relationships between vocabularies.
3 No Task
IT R2E2 Define semantic strategy for DM and capabilities necessary for R2.
X
IT Create Semantic inference service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
IT Implement basic Semantic inference service capabilities
Complete capabilities required to meet LCA deliverables.
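One simple way "relationships between vocabularies" can strengthen a search is query expansion: a search for a broad term also matches resources tagged with narrower terms. The sketch below assumes a SKOS-style "narrower" relation and computes its transitive closure by breadth-first search. It is an illustration only; it is not a selected inference engine, and the class name and vocabulary terms are hypothetical.

```python
from collections import defaultdict, deque


class Vocabulary:
    """Hypothetical vocabulary graph holding 'narrower' links (SKOS-like)."""

    def __init__(self):
        self._narrower = defaultdict(set)

    def add_narrower(self, broader, narrower):
        self._narrower[broader].add(narrower)

    def expand(self, term):
        """Return the term plus every term reachable through narrower links.

        Breadth-first traversal; the 'seen' set guards against cycles.
        """
        seen = {term}
        queue = deque([term])
        while queue:
            current = queue.popleft()
            for child in self._narrower.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


vocab = Vocabulary()
vocab.add_narrower("ocean properties", "sea water temperature")
vocab.add_narrower("ocean properties", "salinity")
vocab.add_narrower("sea water temperature", "sea surface temperature")
print(sorted(vocab.expand("ocean properties")))
```

A real inference engine would handle many relation types (equivalence, subclass, mappings across vocabularies); this shows only the single-relation closure.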
CO 26 Query result grouping service
Capabilities for temporal and geographic grouping of query results 3 No Task
IT R2E2 Clarify result grouping needs with UX team
Engage the UX team and designers to clearly define needs for organizing query results.
X
IT R2E2 Create Query result grouping service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement basic Query result grouping service capabilities
Complete capabilities required to meet LCA deliverables. X
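Temporal and geographic grouping of query results can be pictured as bucketing each result by a time key and a grid-cell key. The sketch below assumes each result carries a timestamp and a latitude/longitude, groups by calendar date and a fixed-size degree grid, and uses hypothetical names throughout. The actual grouping dimensions and granularity are what the UX task above is meant to decide.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class QueryResult:
    """Hypothetical query result with a time and a position."""
    resource_id: str
    timestamp: datetime
    lat: float
    lon: float


def group_results(results, cell_deg=1.0):
    """Group result ids by (calendar date, lat cell index, lon cell index).

    math.floor keeps cell boundaries consistent for negative coordinates,
    e.g. lon -124.1 falls in cell -125 when cell_deg is 1.0.
    """
    groups = defaultdict(list)
    for r in results:
        key = (
            r.timestamp.date(),
            math.floor(r.lat / cell_deg),
            math.floor(r.lon / cell_deg),
        )
        groups[key].append(r.resource_id)
    return dict(groups)


results = [
    QueryResult("a", datetime(2011, 6, 1, 10, 0), 44.6, -124.1),
    QueryResult("b", datetime(2011, 6, 1, 18, 0), 44.2, -124.9),
    QueryResult("c", datetime(2011, 6, 2, 9, 0), 44.9, -124.5),
]
for key, ids in sorted(group_results(results).items()):
    print(key, ids)
```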
CO 27 Contextual query service Contextual query of information resource content or metadata 3
IT R2E2 Define contextual query capabilities required for R2.
X
IT Create Contextual query service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
1.2.3.11.2.21 SV External Data Access Services - DM R2 Elaboration
Provides an extensible suite of access interfaces and data formats for interoperability with external communities and applications
https://confluence.oceanobservatories.org/display/CIDev/UC.R2.29+Integrate+External+Data+Source Is this a service to transform external data (not a transform service that is external)?
SV
WBS Type | CO-ID | IT-ID | Iteration | Task | Description | Outcome of task (deliverables) | Risk (COs) | R2E1-DM | R2E2-DM | R2E3-DM | Jira Task# | Priority | Work days | Assignee | Support Roles | Description Page URL
CO 29 External data transformation services
Services to support the definition and execution of external observatory and community specific data format ingestion, transformation and presentation processes
4 Reword? I think this means "to support the definition and execution of processes to ingest, transform, and present external observatory data and community-specific data".
IT R2E2 Define transformation service detail
Define details on implementation of transformation definition, policies and workflow
Sequence diagrams which outline workflow and dependencies
X
IT R2E2 Create External data transformation service skeleton
Create the service with initial methods and interfaces (message definitions). Note responsibilities of each method.
X
IT R2E3 Implement services that allow users to register their external processes
How do these definitions get reflected as resources out to others so they can include them in their workflows?
X 3 May be an R3 capability. Stretch goal.
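Letting users register their own transformation processes, and letting others discover them, suggests a registry keyed by input and output format. The sketch below is a hypothetical in-memory version: the class, method names, and the format labels `"csv-row"` and `"ion-canonical"` are illustrative assumptions, not the ION canonical format or an agreed service interface. Publishing registrations as shared resources would sit on top of something like `list_registered`.

```python
from typing import Callable, Dict, List, Tuple

# A transform takes one record (as a dict) and returns a new record.
Transform = Callable[[dict], dict]


class TransformRegistry:
    """Hypothetical registry of user-supplied transforms keyed by (input, output) format."""

    def __init__(self):
        self._transforms: Dict[Tuple[str, str], Transform] = {}

    def register(self, input_format: str, output_format: str, fn: Transform) -> None:
        key = (input_format, output_format)
        if key in self._transforms:
            raise ValueError(f"transform {key} is already registered")
        self._transforms[key] = fn

    def list_registered(self) -> List[Tuple[str, str]]:
        """Expose what is registered so other users can find and reuse it."""
        return sorted(self._transforms)

    def apply(self, input_format: str, output_format: str, record: dict) -> dict:
        # Raises KeyError if no transform is registered for this format pair.
        return self._transforms[(input_format, output_format)](record)


def csv_row_to_canonical(record: dict) -> dict:
    """Illustrative transform; the output field names are assumptions."""
    return {"temperature": float(record["temp_c"]), "units": "degC"}


registry = TransformRegistry()
registry.register("csv-row", "ion-canonical", csv_row_to_canonical)
print(registry.list_registered())
print(registry.apply("csv-row", "ion-canonical", {"temp_c": "12.5"}))
```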
CO 30 Ingestion into common data format
Transformation processes to the ION canonical data format 4 https://confluence.oceanobservatories.org/display/CIDev/UC.R2.23+Ingest+Dataset+Supplement
No Task
CO 31 Presentation from common data format
Transformation processes from the ION canonical data format 4 Wouldn't these be user-defined, so that users would simply have to register their transformations as above? MManning: I believe this task is about transforming data and metadata between internal and external formats. The user should not need to know the internal format and can select applicable formats for export.
No Task
IT R2E2 Define output formats Determine initial data output formats desired by Release 2 target users
X
CO 32 External data access framework
Extensible framework to support integration of interfaces to external data both incoming and outgoing
4
IT R2E3 Architecture for external data sources
Develop service OV and SV diagrams for registering and processing external data sources.
Draft schema of external service data & metadata along with ION services and resources
X 4 5 dstuebe cmueller, mmanning
Probably need to look at some existing standards in this regard.
IT R2E3 Implement services for users to register their external data sources
X 3
IT R2E3 Implement services to crawl these data source definitions, catalog them, and build search indexes for those data sources.
X 3 How does that infrastructure get notified when these sources change, or will the CI be responsible for figuring that out? MManning: CI will periodically check for updates to sources, but endpoint changes will require an update to the source definition.
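The answer above (CI periodically checks sources for updates) can be sketched as polling each registered source and comparing a content fingerprint with the one seen last time. In the sketch below the fetch function is injected and hypothetical (in practice it might retrieve a THREDDS catalog or DAP metadata document); SHA-256 over the fetched bytes is one simple choice of fingerprint, not a prescribed mechanism. Scheduling the polls is left out.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


class SourceMonitor:
    """Hypothetical poller that reports which external sources changed since the last check."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # fetch is assumed to return the current listing/catalog bytes for a source URL.
        self._fetch = fetch
        self._fingerprints: Dict[str, str] = {}

    def check(self, source_url: str) -> bool:
        """Return True if the source is new or its content changed since the last check."""
        digest = hashlib.sha256(self._fetch(source_url)).hexdigest()
        changed = self._fingerprints.get(source_url) != digest
        self._fingerprints[source_url] = digest
        return changed

    def check_all(self, source_urls: Iterable[str]) -> List[str]:
        """Return the sources that need re-crawling and re-indexing."""
        return [url for url in source_urls if self.check(url)]


# In-memory stand-in for remote sources, for illustration only.
pages = {"http://example.org/catalog.xml": b"v1"}
monitor = SourceMonitor(pages.__getitem__)
print(monitor.check_all(pages))  # first poll: every source is new
print(monitor.check_all(pages))  # unchanged content
pages["http://example.org/catalog.xml"] = b"v2"
print(monitor.check_all(pages))  # content changed
```

As the comment above notes, a moved endpoint cannot be detected this way; the fetch would simply fail, and the source definition has to be updated.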
CO 33 DAP integration Integration of DAP servers using the external data access framework 3 No Task
CO 34 THREDDS integration Integration of THREDDS catalogs using the external data access framework 3 No Task
CO 35 External data access UI components
Screens and plug-ins to the Web UI and application integration services related to external data access services
1 No Task
IT Implement basic Contextual query service capabilities
Complete capabilities required to meet LCA deliverables.
CO 28 Search and navigation UI components
Screens and plug-ins to the Web UI and application integration services related to search and navigation services.
1 No Task
Work Days by Iteration 13
Note: the numbers on the left are automatically computed for each developer, but they sum up independently of the marked iteration (r2i2, r2e1). Hidden rows do count. For these numbers to work, make sure that non-R2I2 tasks have a different assignee (such as *dstuebe) so that they do not match R2I2 DM tasks.
Including boilerplate tasks:
dstuebe 59 59
Ilya 0 0
mmanning 14 14
graybeal 3 3
kstocks 0 0
mattr 0 0
mkanand 0 0