• No results found

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

N/A
N/A
Protected

Academic year: 2021

Share "INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

INTEGRATED RULE ORIENTED DATA

SYSTEM (IRODS)

Todd BenDor

Associate Professor

Dept. of City and Regional Planning

UNC-Chapel Hill

[email protected]

http://irods.org/

(2)

Important note:

Most of this is blatantly ripped off from

iRODS official documentation

iRODS executive summary/introduction

Reagan Moore, UNC SILS

(3)

Punchline

Model integration and flexibility needs

sophisticated tools for collecting, curating, and

updating databases.

What happens when our model integration dreams

come true?

Maintaining data provenance and facilitating

(4)

Overview

What is iRODS?

Who Uses iRODS?

(5)

What is iRODS?

i

ntegrated

R

ule

O

riented

D

ata Management

S

ystem.

iRODS is

open source

data grid

middleware

that

implements…

Data Virtualization

Automation of Data Operations

A Robust Metadata Catalog

Data Management Policy Enforcement and

(6)

iRODS?

Based on considerable experience from Storage

Resource Broker (SRB) developed by Data Intensive

Cyber Environment (DICE) group at UNC, UCSD, SD

Super Computer Center.

Found many groups used SRB to store large quantities

of data.

A lot of server-side post-processing of the data is

required (e.g. replicate files, convert to different

format, checksum etc).

(7)

Emergence of iRODS

SRB experience motivated requirements for a new

data management system

Contained all SRB functionality

Add work-flow to manage server-side post-processing

Model coupling potential grows

Configurable – only include the 'services' you need

Open-source – SRB license imposed sever restrictions

(8)

Claims to fame…

An infinitely configurable data janitor

iRODS is the kind of technology you need to host everyone’s

unstructured data.

A powerful data migration tool

A data preservation technology

A tool for providing fine-grained privacy and

security controls

Can be seen as a basis for a Digital

Repository/Archive

Digital Repository/Archive is a Policy Driven

(9)

…but wait, there’s more!

Extensible: iRODS has command-line clients, APIs for

numerous programming languages, and web clients

Data is accessed using familiar APIs

Supports new plug-ins for storage resources,

authentication mechanisms, microservices, and network

protection

iRODS lets system administrators roll out an extensible

data grid without changing their infrastructure.

(10)

How will models interface with large

amounts of heterogeneous data?

Databases

Network

Services

Authentication

Mechanisms

Storage

Resources

Coordinating

Resources

Microservices

Command Line Client

Web Client

Integrated Custom

Client

Familiar

(11)

iRODS is Middleware

Operating System

Filesystem

Applications/Users

iRODS Data Grid

(Heterogeneous) Storage Systems

iRODS Middleware Layer

-

abstracts out the low-level I/O

-

provides a uniform interface to

heterogeneous storage systems

mid

dle

ware

`midl,we(

ə

)r

noun

software that acts as a bridge between an operating

(12)

Data Virtualization across Grids

Administrators control how the grid is presented to users

Implement replication, load-distribution, and archiving policies that

are completely transparent to the user.

Independent grids can be federated with one another to allow

controlled access to remote grids or grids operated by

separate workgroups.

Boston

Chicago

RTP System 1

RTP System 2

London

(13)

Automation of Data Operations

With iRODS,

any agent

can initiate

any action

upon

any trigger.

This powerful capability allows administrators to

automate policies such as:

Validating checksums every time a new file is

placed in a folder.

Backing up a set of files every second Thursday.

Archiving data that hasn’t been accessed in over

1 month.

Logging each time a file is replicated or

destroyed.

Permitting a file to be accessed by multiple

independently defined user groups.

These operations can be

distributed

to the storage

(14)

Policy Enforcement and Compliance

Verifications

Metadata + automation

= infrastructure to

enforce mandated data

management policies

Records retention and

privacy protection

requirements

Audit trails generated

by iRODS can be used

to verify compliance

with policy.

(15)

Who Uses iRODS?

iPlant: 15,000 users on an iRODS data grid with 100 million files

http://www.iplantcollaborative.org/

IN2P3 (French Nat. Inst. Nuclear and Particle Physics): over 6 PB of

data managed by iRODS

Sanger Institute (Genome Research): 20+ PB of iRODS data

NASA Center for Climate Simulations: 300 million metadata

attributes

CineGRID (exchanging digital media): sites distributed across

Japan-US-Europe

15

(16)

What Can iRODS Do?

iRODS simplifies data grid management

Data on different storage

devices at different

locations can be centrally

managed.

In situ migration to new

hardware can be managed

by replicating the legacy

resource before

repurposing or

decommissioning it.

Backup and archiving are

transparent and highly

configurable using the

automation and metadata

capabilities in iRODS.

BnF

(17)

What Can iRODS Do?

iRODS simplifies data discovery, data validation, and

data processing.

User-defined and intrinsic

metadata make stored

data searchable.

Validation and analytical tools can be automated to

process incoming data.

The results and process steps can be stored in the

iCAT metadata catalog.

attribute: library attribute: total_reads attribute: type attribute: lane attribute: is_paired_read attribute: study_accession_number attribute: library_id attribute: sample_accession_number attribute: sample_public_name attribute: manual_qc attribute: tag attribute: sample_common_name attribute: md5 attribute: tag_index attribute: study_title attribute: study_id attribute: reference attribute: sample attribute: target attribute: sample_id attribute: id_run attribute: study attribute: alignment attribute: library attribute: total_reads attribute: type attribute: lane attribute: is_paired_read attribute: study_accession_number attribute: library_id attribute: sample_accession_number attribute: sample_public_name attribute: manual_qc attribute: tag attribute: sample_common_name attribute: md5 attribute: tag_index attribute: study_title attribute: study_id attribute: reference attribute: sample attribute: target attribute: sample_id attribute: id_run attribute: study attribute: alignment attribute: library attribute: total_reads attribute: type attribute: lane attribute: is_paired_read attribute: study_accession_number attribute: library_id attribute: sample_accession_number attribute: sample_public_name attribute: manual_qc attribute: tag attribute: sample_common_name attribute: md5 attribute: tag_index attribute: study_title attribute: study_id attribute: reference attribute: sample attribute: target attribute: sample_id attribute: id_run attribute: study attribute: alignment

ute: libra

ribute: total_

ribute: type

ribute: lane

ribute: is_paire

ribute: study_

ute: li

(18)

What Can iRODS Do?

Data at scale

Large number of users

Complex

management tasks

Critical policy

enforcement

Data Preservation –

Digital Archives

Data

Maintenance

Data

Curation –

Digital Libraries

Data

Virtualization

Automated

Data

Processing

Policy

Enforcement

Distributed Data

Management

Data Sharing and

Access

Data Protection

and Security

(19)

Where iRODS fits?

Storage

Digital Repository

Access

Client interacts with digital repository

to access data

IRODS

Spans

Digital

Repository

and

Storage

Domains

(20)

Rules

Policies are implemented in iRODS as rules.

Rule is a series of logically connected steps.

Each step realised as a micro-service.

IRODS rules fully featured:

Contain loops and branches.

Can have rules contained within rules.

IRODS rules read from a rule-file (called core.irb by

(21)
(22)

Questions/ideas?

Todd BenDor

Department of City and

Regional Planning

[email protected]

References

Related documents

Yet, notwithstanding some recent reforms, public policies—including the eligibility ages for the Canada Pension Plan (CPP) and Quebec Pension Plan (QPP)—favour low retire- ment

The objectives of this research were to identify the possible pollution sources influencing the water quality, to determine the influence of land use and cover within zones

The potentiometric E-tongue performance for olive oil classification according to the autochthonous Tunisian olive cultivar (i.e., cvs Chemlali or Sahli) and for each single-

The future PDM systems should support data transfer management based on uniform schema, along with workflow definition and enactment, auditing functions, contracts, new

to restrict abortion services has made the procedure difficult to obtain in various parts of the US, even various parts of considerably prochoice states. The research from

Integrated medical management (dba Total Population Health Management) will improve savings and ROI. ROI will improve with appropriate information technology infrastructure

Life Skills Lesson Content Teaching/Learning Cycle Stages Language Features Technology/Math Lesson Content Learning contexts and tasks -Nutrition (giving and receiving