EcoTrends Cyber-infrastructure Development

(1)

Long Term Ecological Research Network Office

EcoTrends Cyber-infrastructure Development

Mark Servilla, LTER Network Office

LTER Information Managers Annual Meeting – San Jose, California, 2–5 August 2007

(2)

Building Blocks to Success

• EcoTrends NIS module

• PASTA NIS Module Framework

• Metacat/EML metadata and data management

• PostgreSQL RDBMS

• Java Servlet, JSP, and R programming

• Community support for data collection, documentation, and accessibility

Diagram layers: EcoTrends; PASTA; Metacat/EML; Community; PostgreSQL/Java/Tomcat

(3)

LNO NIS

PASTA Architecture

Diagram components: Source A, Source B, Source C, Metacat-Harvester, EML, Workflow Engine, Parser-Loader, Dataset Registry, Cache, Metadata, Derived Data, Web API, HTML, SOAP, EML.xml

• Data loading for synthetic processing based on events (e.g., new data, metadata change)

• Existing LTER metadata infrastructure (Metacat and EML)

• Source data cache available to all workflow engines

• Support for multiple scientific workflow engines (e.g., R script, Kepler, Chimera, D2K)

• Metadata and derived data products; metadata as EML

• Standard interfaces to support various web portals (e.g., Trends, GEOSS, GEON, NEON, WATERS) and web service APIs

• Metadata describing derived data, including data provenance and data versioning

– expand on community provenance research

• Derived data management

• Site data/metadata

Legend: Existing infrastructure; New infrastructure; Pluggable work flows

(4)

EcoTrends Development 2007

Diagram components (PASTA architecture): Source A, Source B, Source C, Metacat-Harvester, EML, Workflow Engine, Parser-Loader, Dataset Registry, Cache, Metadata, Derived Data, Web API, HTML, SOAP, EML.xml

(5)

Development Process

• Stages: Use-case, Project Plan, Requirements, Coding, Testing, Release

• Milestones

• Iterative solutions

• Participants: Editorial and technical committees, and LNO; Technical committee, NISAC, and LNO; Editorial and technical committees, and LNO

(6)

Major Milestones

• EML generation

• Derived data loading

• Website presentation/integration

• Data discovery and presentation

– Browse (by site, by topic/sub-topic)

– Search (simple keyword, advanced)

– Result (result set display, dataset display, plot display)

• Data exploration

– Graphing (single and multiple datasets)

– Aggregation (temporal)

– Download (data and metadata)

• Site auditing/DAS

(7)

EML Generation

Step 1: Core Metadata

– Define core metadata (e.g., contact information) that is repeated in all EML documents

Step 2: File Name Parsing

– Parse the derived data file names for site/station, variable, unit, and timescale metadata

Step 3: Derived Data Analysis

– Analyze derived data for temporal coverage and data value bounds

Step 4: R Script Analysis and Inclusion

– Include in the methods section of EML the R script used to generate derived data and any annotation associated with a specific derived data product

Step 5: Manual Documentation

– Include both non-automated metadata and tacit knowledge metadata into the EML
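
Step 3 above could be sketched as follows. This is a minimal illustration only, not the project's actual code: the record layout (start date, end date, observed value), the function name `summarize`, and the sample values are all assumptions made for the example.

```python
from datetime import date

def summarize(records):
    """Return temporal coverage and value bounds for (start, end, obs) records."""
    starts = [r[0] for r in records]
    ends = [r[1] for r in records]
    # Missing observations (None) are skipped when computing bounds.
    values = [r[2] for r in records if r[2] is not None]
    return {
        "begin_date": min(starts),
        "end_date": max(ends),
        "minimum": min(values),
        "maximum": max(values),
    }

# Hypothetical yearly records: (start date, end date, observed value)
records = [
    (date(2001, 1, 1), date(2001, 12, 31), 4.2),
    (date(2002, 1, 1), date(2002, 12, 31), None),
    (date(2003, 1, 1), date(2003, 12, 31), 5.1),
]
coverage = summarize(records)
print(coverage)
```

The resulting dictionary holds the kind of values that would feed the temporal coverage and attribute bounds of a generated EML document.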

(8)

Derived Data Loading

• Parse data and load relational database

• Record level attributes

– PRIMARY_KEY :: INTEGER

– START_DATE :: DATESTAMP

– END_DATE :: DATESTAMP

– OBS :: FLOAT

– N_EXPECTED :: INTEGER

– S_DEV :: FLOAT

– S_ERR :: FLOAT

– PROP_MISSING :: FLOAT

– PROP_QUESTIONABLE :: FLOAT

– PROP_ESTIMATED :: FLOAT

– PROP_TRACE :: FLOAT

– PROP_INVALID :: FLOAT

– COMMENT :: TEXT
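
The record-level attributes above might map onto a relational table along these lines. This is a hedged sketch: it uses Python's built-in SQLite in place of the PostgreSQL RDBMS named in the slides, the table name `derived_record` is hypothetical, dates are stored as ISO 8601 text, and the inserted row is invented sample data.

```python
import sqlite3

# Column names follow the slide; types are SQLite approximations.
DDL = """
CREATE TABLE derived_record (
    primary_key       INTEGER PRIMARY KEY,
    start_date        TEXT,    -- DATESTAMP on the slide; ISO 8601 text here
    end_date          TEXT,
    obs               REAL,
    n_expected        INTEGER,
    s_dev             REAL,
    s_err             REAL,
    prop_missing      REAL,
    prop_questionable REAL,
    prop_estimated    REAL,
    prop_trace        REAL,
    prop_invalid      REAL,
    comment           TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Hypothetical record; columns not listed are left NULL.
conn.execute(
    "INSERT INTO derived_record "
    "(start_date, end_date, obs, n_expected, prop_missing, comment) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("2006-01-01", "2006-12-31", 4.7, 365, 0.02, "hypothetical record"),
)
row = conn.execute(
    "SELECT primary_key, obs, n_expected, s_dev FROM derived_record"
).fetchone()
print(row)
```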

(9)

Website Presentation

• Initial design and development

– EcoTrends editorial committee

– Electric Sage Designs, LLC

– Laura Downey, Usability Engineer, SEEK Project

(10)

Website Integration

Stage 1: Apache, PHP, CSS, JavaScript, and MySQL

Refactor: Refactor original website to reflect consistency and modularity; modify CSS for application-specific design (e.g., table layout)

Stage 2: Apache, PHP, CSS, JavaScript, and MySQL

Refactor: Convert all PHP functionality to equivalent Java Server Page (JSP); integrate Metacat-based content

Stage 3: Tomcat, Servlet, JSP, CSS, JavaScript, and Metacat

(11)

(12)
(13)

(14)
(15)

Data Exploration

• Graphing (single and multiple datasets)

• Aggregation (temporal)
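
Temporal aggregation of a time series could look roughly like the sketch below. The slides do not specify the aggregation functions or intervals used by EcoTrends; monthly means over daily values are assumed here purely for illustration, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date

def aggregate_monthly(daily):
    """Average (date, value) pairs into one mean per (year, month)."""
    sums = defaultdict(lambda: [0.0, 0])  # (year, month) -> [running total, count]
    for day, value in daily:
        bucket = sums[(day.year, day.month)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

# Hypothetical daily observations
daily = [
    (date(2007, 1, 1), 2.0),
    (date(2007, 1, 2), 4.0),
    (date(2007, 2, 1), 6.0),
]
monthly = aggregate_monthly(daily)
print(monthly)
```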

(16)

Site Auditing/DAS

• Web page auditing

• Data access auditing

• Plot auditing

(17)

(19)

PASTA Application Stack

Components: EML, Metacat, and Harvester; Registry/Parser/Loader; Workflow Engine; Cache Database; Derived Database; Metadata Harvest; Web API - Portal; Site Data and EML Metadata

Functions: Network-level synthesis; Site-level data archive; Data transformation and integration; Standardized data products; Dataset identification and loading; Network interface; Existing EML harvesting

Axis: Scale (temporal-spatial-organizational)

(20)

Generalized Workflow

1. Sites collect and document time-series observation data (e.g., climate, social-economics, …)

2. Sites update EML with a new revision indicating new data

3. EML is harvested into Metacat

4. EML Loader/Parser loads new/updated dataset into “cache” database

5. Workflow Engine transforms “cache” data into “derived” data

6. Transformed data is stored in “derived” database

7. EML is generated for derived data and is stored in Metacat

(21)

Decomposed Workflow

1. Sites collect and document time-series observation data (e.g., climate, social-economics, …)

2. Sites update EML with a new revision indicating new data

3. EML is harvested into Metacat

4. EML Loader/Parser loads new/updated dataset into “cache” database

5. Workflow Engine transforms “cache” data into “derived” data

6. Transformed data is stored in “derived” database

7. EML is generated for derived data and is stored in Metacat

(22)

LTER Site Data Collection

• Time-series data

– Physical environment (e.g., climate, …)

– Human population and economy

– Biogeochemistry

– Biotic structure

• Data/metadata

– Relational Database

– Spreadsheet

– Text file

– HTML/XML

(24)

EML, Metacat, and the Harvester

• EML Package ID: knb-lter-site.XX.YY

– knb-lter-sev.354.1

– knb-lter-sev.354.2

– knb-lter-sev.354.3

• Metacat stores the XML of EML; new revisions take precedence – old revisions are deprecated, but not deleted

• Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat

• “existing LTER investment in technology”

Diagram components: Source A, Source B, Source C, Metacat-Harvester, EML
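
The package ID convention and the "new revisions take precedence" rule can be illustrated with a short sketch. It is an assumption-laden example, not Metacat's implementation: the names `PackageId`, `parse_package_id`, and `latest_revisions` are hypothetical, and the three-part split into scope, identifier, and revision is inferred from the IDs shown on the slide.

```python
from typing import NamedTuple

class PackageId(NamedTuple):
    scope: str       # e.g. "knb-lter-sev"
    identifier: int  # e.g. 354
    revision: int    # e.g. 3

def parse_package_id(text):
    """Split 'scope.identifier.revision' on its last two dots."""
    scope, identifier, revision = text.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))

def latest_revisions(package_ids):
    """Keep only the highest revision for each (scope, identifier) pair."""
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest

ids = ["knb-lter-sev.354.1", "knb-lter-sev.354.3", "knb-lter-sev.354.2"]
current = latest_revisions(ids)[("knb-lter-sev", 354)]
print(current)
```

Older revisions remain in the input list, mirroring the slide's point that they are deprecated rather than deleted.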

(26)

EML Loader/Parser

• Dataset registry identifies Trends data in Metacat

• New revisions assert a “new” data load. The EML parser/loader*

– Translates the site EML into the RDBMS DDL

– Creates a new DB table in the primary database based on the revision

– Loads the new data into the primary database

– Triggers the workflow to continue

Diagram components: Source A, Source B, Source C, Metacat-Harvester, EML, Parser-Loader, Dataset Registry, Cache
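
The translation from EML attribute descriptions to RDBMS DDL might be sketched as below. Everything here is illustrative: the type mapping, the helper names, the package ID `knb-lter-xyz.1.1`, and the attribute list are assumptions, and a real parser would read these values from the EML XML rather than from a Python list. The table-naming rule (dashes and dots replaced with underscores) is likewise an assumption, chosen to be consistent with names such as `knb_eco_trends_1_1`.

```python
# Assumed mapping from simplified attribute types to PostgreSQL-style column types.
TYPE_MAP = {
    "string": "TEXT",
    "float": "DOUBLE PRECISION",
    "integer": "INTEGER",
    "dateTime": "TIMESTAMP",
}

def table_name(package_id):
    """Derive a table name from a package ID, one table per revision."""
    return package_id.replace("-", "_").replace(".", "_")

def build_ddl(package_id, attributes):
    """Build a CREATE TABLE statement from (name, type) attribute pairs."""
    columns = ",\n".join(f"    {name} {TYPE_MAP[kind]}" for name, kind in attributes)
    return f"CREATE TABLE {table_name(package_id)} (\n{columns}\n);"

# Hypothetical attributes, as they might be extracted from a site's EML document
attributes = [("date_time", "dateTime"), ("wdir", "float"), ("wspd", "float")]
ddl = build_ddl("knb-lter-xyz.1.1", attributes)
print(ddl)
```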

(28)

Workflow Data Transformation

• “Cache” database stores site data in native site schema, based on snapshot version

• Workflow Engine

– reads native schema

– performs transformation/integration

– writes to global schema

– produces EML metadata

• “Derived” database stores derived data in consistent global schema

Diagram components: Workflow Engine, Cache, Metadata, Derived Data

(29)

Site to Global Schema Mapping

Site schema (MCM Canada Glacier Wind):

date_time – Timestamp of observation, 15 min interval

wdir – Wind direction (azimuth)

wdirstd – Standard deviation of wind direction

wspd – Wind speed, meters/second

wspdmin – Minimum wind speed, meters/second

wspdmax – Maximum wind speed, meters/second

Global schema (“triggered by data load”):

Wind direction (knb-eco-trends.1.1): Timestamp (daily), value

Wind direction std dev (knb-eco-trends.2.1): Timestamp (daily), value

Wind speed max (knb-eco-trends.5.1): Timestamp (daily), value
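
One such site-to-global mapping, from 15-minute observations to a daily (timestamp, value) series, could be sketched as follows. The slides do not state how each daily value is computed; taking the daily maximum of the `wspdmax` column is an assumption for this example, and the function name and sample rows are hypothetical.

```python
from datetime import datetime

def daily_maximum(rows, column):
    """Reduce sub-daily rows to sorted (day, maximum value) pairs for one column."""
    best = {}
    for row in rows:
        value = row[column]
        if value is None:  # skip missing observations
            continue
        day = row["date_time"].date()
        if day not in best or value > best[day]:
            best[day] = value
    return sorted(best.items())

# Hypothetical 15-minute rows in the site's native schema
rows = [
    {"date_time": datetime(2007, 1, 1, 0, 0), "wspdmax": 5.0},
    {"date_time": datetime(2007, 1, 1, 0, 15), "wspdmax": 7.5},
    {"date_time": datetime(2007, 1, 2, 0, 0), "wspdmax": None},
    {"date_time": datetime(2007, 1, 2, 0, 15), "wspdmax": 3.2},
]
daily = daily_maximum(rows, "wspdmax")
print(daily)
```

Each output pair corresponds to one record of a single-variable derived dataset in the global schema.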

(30)

Global Schema

knb_eco_trends_1_1

scope

identifier

(32)

EML for “derived” data

• EML metadata is generated for the derived data and inserted into Metacat

• Derived data is now accessible through “all” Metacat user interfaces

Diagram components: Metacat-Harvester, EML, Workflow Engine, Metadata, Derived, EML.xml

(34)

Web API

• Store Front provides API to derived data products in secondary DB

• HTML – today

• Web service – tomorrow

• Issues:

– Authentication

– Authorization

– Provenance

– Quality

– Interactive Plots

http://www.ecotrends.info (beta site location)

Diagram components: Metacat-Harvester, EML, Metadata, Derived Data, Web API, HTML, SOAP, EML.xml
