
Data Management Subsystem


Figure 1. DMS and Interfaces (diagram elements: Common Operating Infrastructure, ESB; Sensing & Acquisition Subsystem; Data Management Subsystem; Knowledge, Planning & Execution Subsystems; External Interfaces: OPeNDAP, THREDDS, LAS,…)

Data Management Subsystems report from PDR meeting

Arcot Rajasekar

Data Intensive Computing Environment Group, San Diego Supercomputer Center, UCSD

With participation of Kevin Gomes, Michael Wan and Michael Meisinger

The Data Management Subsystem (DMS) is the data and information handling component of the ORION CI. The work breakdown structure for the DMS is concentrated in releases 1 through 3 of the OOI Cyberinfrastructure. This write-up covers release 1 in detail, with brief outlines of the work breakdown for releases 2 and 3.

In release 1, an initial end-to-end automated data preservation and distribution network will be established. The main task of the data management group is to create a virtual data network with distributed storage repositories and data archives, and to provide data distribution and access mechanisms for soft real-time sensor data as well as archived data collections. The ingestion of soft real-time data into the DMS is governed by interactions with the Sensing and Acquisition Subsystem and its CI Channel system. Similarly, access to data in the DMS for application and human consumption is governed through interactions with the Analysis and Synthesis Subsystem. The internal activities of the DMS are coordinated through close interactions with the Common Operating Infrastructure Subsystem. In later releases, the DMS will also interact with the Knowledge Management, Planning & Prosecution, and Common Execution Infrastructure Subsystems. Another release 1 goal for the DMS is to provide data management services for the GSO CyberPoP endpoints by extending the virtual data networks to the CyberPoP deployment systems. The interaction of the DMS with other subsystems is shown in Figure 1.

The ORION project will begin assembling collections of digital holdings, both real-time sensor data and static collections and archives, that comprise the intellectual content on which current and future research will be based. These digital holdings can be massive in size, measured in petabytes and tens of millions of files, and must be maintained for decades. In addition, the digital holdings are distributed across multiple institutions and are published in digestible collections for access by other researchers.

Comment (sekar, 11/13/07 1:44 PM), originally from Kevin Gomes: What is the GSO? Is this supposed to be CGSN? Reply (Raja): Not sure about this.

Comment (sekar, 11/13/07 1:44 PM), originally from Kevin Gomes: Not sure I understand what this last sentence means. Is this just basically saying that the CGSN and RSN will have access to read from the DMS so that they can make decisions based on information from data processing or other observatory assets? Reply (Raja): Not sure about this.


The result is an ever-growing demand for software cyberinfrastructure that simultaneously supports data sharing on a day-to-day basis, data publication for reference and alternate analysis, and data preservation for long-term access. The OOI data management system must not only handle sensor data in conjunction with static and archived data; it must also interact with multiple organizations, each with autonomous ownership and stewardship. This requires a federation architecture with transparent policy-based mechanisms for storage and access within each autonomous organization, balanced against the need for single sign-on and a uniform access protocol. To meet these competing needs we will implement a federated data grid architecture with transparent and customizable policy-based data management. Table 1 provides a list of services provided by the DMS.

Table 1: List of Data Management Services

Online Data Repository: Data and metadata organization
Persistent Archive Service: Persistent naming, preservation processes
Asset Validation Service: Integrity and authenticity
Aggregation Service: Classification, categorization, grouping
Attribution Service: Community attributes, semantic ontology
Metadata Search & Navigation Services: Query and browse by context
Dynamic Data Distribution Services: Publish, subscribe and query for dynamic data resources
Data Access Services: Suite of external interfaces (OPeNDAP, THREDDS, LAS,…)
Management policy enforcement: Community-specific policies, access and processing controls

To meet the goal of providing a distributed data management and preservation infrastructure, we propose an architecture based on proven technologies that are currently used across a wide variety of projects: 1) the Storage Resource Broker (SRB) and the integrated Rule-Oriented Data System (iRODS), both developed by the Data Intensive Computing Environment (DICE) Group at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, and 2) the Shore Side Data Systems (SSDS) developed at the Monterey Bay Aquarium Research Institute (MBARI). The SRB/iRODS middleware provides all of the features needed to implement a production-level data grid, including facilities for collection building, sharing, management, querying, accessing, and preserving data in a federated, distributed framework with uniform access to diverse, heterogeneous storage resources across administrative domains. The middleware uses an integrated metadata catalog that holds system metadata and application- or domain-dependent metadata about resources, datasets, methods, and users. Together, the data management and metadata management systems provide a scalable information discovery and data access system for publishing and analyzing scientific data and metadata.

The DMS will also incorporate a component based on the Antelope Real Time System (ARTS) developed by BRTT, Colorado. Although ARTS is not open source, it is used in many projects for acquisition and dissemination of sensor stream data. The Sensing and Acquisition System will integrate ARTS as a subsystem. Since ARTS also provides database capabilities for storing stream data, we may provide an interface to it through the iRODS system. Other sensor network systems such as Data Turbine will also be considered as needs arise and as their use is indicated in one of the observatories. Complementary features offered by the iRODS and SSDS systems are listed in Table 2.
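Table 1 lists publish, subscribe and query for dynamic data resources among the DMS services. As a rough illustration of what such a service offers to a consumer of sensor streams, the following Python sketch models a topic-based broker in memory. The class name `ToyStreamBroker`, its methods and the topic names are all hypothetical; this is not the ARTS, SSDS or iRODS interface.

```python
from collections import defaultdict


class ToyStreamBroker:
    """In-memory stand-in for a dynamic data distribution service (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._latest = {}                      # topic -> most recent sample

    def subscribe(self, topic, callback):
        # Register a callback to be invoked for every new sample on the topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Remember the newest sample, then push it to all subscribers of the topic.
        self._latest[topic] = sample
        for callback in self._subscribers[topic]:
            callback(topic, sample)

    def query(self, topic):
        # Pull-style access: return the most recent sample, or None if none exists.
        return self._latest.get(topic)


received = []
broker = ToyStreamBroker()
broker.subscribe("ctd/temperature", lambda topic, sample: received.append(sample))
broker.publish("ctd/temperature", 4.2)   # delivered to the subscriber
broker.publish("ctd/salinity", 35.0)     # no subscriber; still available via query
```

The sketch keeps only the latest sample per topic; a real service would also need buffering, persistence and access control, none of which is modeled here.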

The DMS for release 1 will be developed in stages. Since iRODS and SSDS are established open source software, we propose to integrate them tightly at the software level by creating system-specific drivers. The Enterprise Service Bus (ESB) access point of SSDS will be used for interfacing with the COI. As we progress through later releases, the iRODS subsystem itself will provide this interface directly, giving multiple ways of interfacing with the COI. Similarly, the sensor interface provided by SSDS will be leveraged to provide access to sensor streams. Since the products coming out of SSDS are files and relational metadata, these will be made accessible through the iRODS infrastructure. The method is still under design: one can envision (a) replicating the information directly into iRODS or (b) registering the products in iRODS for access while they remain under the control of SSDS. The pros and cons of the two approaches will be discussed in the near future to find a path forward.

The iRODS system will be used as the main access point for many applications. To this end, access services based on OPeNDAP, THREDDS and other systems will be developed in addition to the native C, Java and Web interfaces of iRODS. Also, the SRB system (the forerunner of iRODS) has been integrated with the ARTS sensor repository and distribution system. We propose porting this capability to iRODS and providing similar access to the ARTS products and sensor streams through iRODS. The interactions between the two sub-subsystems are captured in Figures 2 and 3.
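The two candidate approaches for exposing SSDS products, (a) replication and (b) registration, differ mainly in where the bytes live and which system controls them. The following Python sketch illustrates that difference with a toy in-memory catalog. The names `ToyCatalog`, `replicate` and `register`, the `grid://` prefix and the example URIs are hypothetical and do not correspond to the iRODS or SSDS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalog record for a data product (hypothetical model)."""
    logical_name: str
    location: str        # where a reader fetches the bytes from
    controlled_by: str   # which system owns the physical copy
    origin: str          # URI the product originally came from


@dataclass
class ToyCatalog:
    """Toy stand-in for a data grid catalog; not the iRODS API."""
    storage: dict = field(default_factory=dict)   # grid-managed copies
    entries: dict = field(default_factory=dict)   # logical name -> Entry

    def replicate(self, logical_name, source_uri, payload):
        # Approach (a): copy the bytes into grid-managed storage.
        self.storage[logical_name] = payload
        entry = Entry(logical_name, f"grid://{logical_name}", "grid", source_uri)
        self.entries[logical_name] = entry
        return entry

    def register(self, logical_name, source_uri):
        # Approach (b): record only a pointer; the source system keeps the bytes.
        entry = Entry(logical_name, source_uri, "source", source_uri)
        self.entries[logical_name] = entry
        return entry


catalog = ToyCatalog()
replicated = catalog.replicate(
    "/ooi/ctd/day1.nc", "http://ssds.example/products/1", b"\x00\x01"
)
registered = catalog.register("/ooi/ctd/day2.nc", "http://ssds.example/products/2")
```

In this toy model, replication costs storage but leaves the grid able to serve the product on its own, while registration stores nothing and depends on the source system remaining reachable.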

Table 2: Features Supported in SSDS and iRODS Components of DMS

iRODS

Resource: Archives, Unix File Systems
API: Ingest/Access, Register, Metadata
Form: Hierarchical (POSIX)
Access: C, Java, PHP
Protocol: Native Bin/XML
Metadata Extraction: Link
Data Processing/Extraction: Micro-Services, Rules
Replication: supported
Metadata Catalog: RDB
  System: Owner, ACL, Chksum, Audit,…
  User-defined: KVU-Triplets

SSDS

Resource: RDB (File System for Backup)
API: Ingest/Access, Register, Metadata
Form: URI
Access: Java, REST, WS
Protocol: HTTP
Metadata Extraction: API/XML and Services
Replication: not supported
Metadata Catalog: RDB
  System: Ownership, provenance
  User-defined: No
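Table 2 lists user-defined metadata in iRODS as KVU triplets. Assuming that KVU means key, value and unit (the table does not expand the acronym), the following Python sketch shows how such triplets could be attached to data objects and used for discovery. The class `ToyMetadataCatalog` and the example paths and attributes are hypothetical; a real catalog would sit in a relational database, as the table indicates.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    key: str
    value: str
    unit: str  # empty string when the value has no unit


class ToyMetadataCatalog:
    """In-memory stand-in for a user-defined metadata catalog (hypothetical)."""

    def __init__(self):
        self._triplets = defaultdict(list)  # object path -> list of Triplet

    def add(self, path, key, value, unit=""):
        self._triplets[path].append(Triplet(key, value, unit))

    def find(self, key, value=None):
        """Return sorted paths having a triplet with this key (and value, if given)."""
        return sorted(
            path
            for path, triplets in self._triplets.items()
            if any(
                t.key == key and (value is None or t.value == value)
                for t in triplets
            )
        )


metadata = ToyMetadataCatalog()
metadata.add("/ooi/ctd/day1.nc", "depth", "150", "m")
metadata.add("/ooi/ctd/day1.nc", "instrument", "CTD")
metadata.add("/ooi/adcp/day1.nc", "instrument", "ADCP")

ctd_objects = metadata.find("instrument", "CTD")
all_instrumented = metadata.find("instrument")
```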


Figure 2: SSDS Data Flow (diagram elements: DP, DataStream, File, HTTP ingest, RDB, backup, WAN, registration <XML>, file and RDB stores for data and metadata, API, HTTP/REST, JMS, URI, ESB)

Figure 3 (caption not recovered; diagram elements: DP, File, Register, API, Put, POSIX, Web, File System, RDB, Get, OPeNDAP, THREDDS, LAS, DataStreams, Distributed/Replicated, ESB, SSDS, WS, HDF)


The DMS will provide distributed, replicated access to heterogeneous data from sensor streams in soft real-time and to static data in files and relational databases. The metadata and semantic ontology support provided by the subsystem will also enhance querying and discovery of such data products. The system will provide a facility to arrange the data products into collections that enable curation and policy-driven preservation. With features for access control, auditing, and integrity and authenticity checking, the system will be well suited for disseminating validated ocean data products through various interfaces to researchers, to the public, and to policy makers.
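Integrity checking and auditing of the kind mentioned above can be illustrated with a checksum recorded at ingest and re-verified later, with every operation written to an audit trail. The Python sketch below is a toy model only: the class `ToyArchive`, the audit record layout and the choice of SHA-256 are assumptions, since the report names checksums and auditing without specifying an algorithm or format.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the report does not name a checksum algorithm.
    return hashlib.sha256(data).hexdigest()


class ToyArchive:
    """In-memory stand-in for an archive with validation and auditing (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._checksums = {}
        self.audit_log = []  # list of (user, action, object name)

    def ingest(self, name, data, user):
        # Store the bytes and record their checksum at ingest time.
        self._data[name] = data
        self._checksums[name] = checksum(data)
        self.audit_log.append((user, "ingest", name))

    def verify(self, name, user):
        # Recompute the checksum and compare it with the one recorded at ingest.
        ok = checksum(self._data[name]) == self._checksums[name]
        self.audit_log.append((user, "verify-ok" if ok else "verify-failed", name))
        return ok


archive = ToyArchive()
archive.ingest("day1.nc", b"ocean data", user="curator")
first_check = archive.verify("day1.nc", user="curator")

archive._data["day1.nc"] = b"ocean dat4"   # simulate silent corruption
second_check = archive.verify("day1.nc", user="curator")
```

The audit trail here is a plain list; it is meant only to show that each validation leaves a record that can be inspected later.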

Comment (sekar, 11/13/07 1:46 PM), originally from Kevin Gomes: There is no mention in the document of R2 or R3. It makes it look like everything will be done in R1 even though the intro text mentions R2/R3. I think it would be helpful to actually lay out the WBS by time and list the various development pieces and where they go. After reading the Sensing and Acquisition texts and the DMS texts, they seem very much like islands of development. There should be text describing the two WBS schedules and milestones, and they should align and complement each other. That may not be part of this text, but somebody should put together the big picture that ties it all together. Also, I think there should be some LOE tied to the various tasks. For example, is the development of the OPeNDAP mechanism into iRODS/SRB a two-week effort or a two-year effort? It's hard to tell whether this is all feasible because no times are listed with the development tasks. Also, the diagrams make sense because I know the technologies, but at a PDR level there really is no text in here that describes how iRODS/SRB/SSDS will work together. Table 2 helps, but doesn't really address Figures 2 and 3. Reply (Raja): Not sure about this. What should we do? How can we map from one to another? We didn't discuss Releases 2 and 3 in our meeting because of time constraints. Can the items for these releases be given as a chart/table?
