• No results found

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan

N/A
N/A
Protected

Academic year: 2021

Share "Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

Development of Earth Science

Observational Data

Infrastructure

of

Taiwan

Fang-Pang Lin

National Center for High-Performance Computing, Taiwan

(2)

The Path from Infrastructure to Data

Sensing for Understanding

Sensing:

(Networks change the game!)

Evolve since 10 years ago: Ecogrid, SARS Grid, … etc

Institutional missions based on special vehicles:

Satellites, Research Ships & Aircrafts, Met Stations …etc.

It is growing even larger and broader, e.g. IOT, social

network.

Understanding:

Modeling from hypothesis to discovery

(3)

Ecogrid:

(4)

The Path from Infrastructure to Data

Sensing for Understanding

Sensing:

(Networks change the game!)

Evolve since 10 years ago: Ecogrid, SARS Grid, … etc

Institutional missions based on special vehicles:

Satellites, Research Ships & Aircrafts, Met Stations …etc.

It is growing even larger and broader, e.g. IOT, social

network.

Understanding:

Modeling from hypothesis to discovery

(5)

5

Global Lake Ecological Observational Network (GLEON)

Lakebase: harvest quality data from internet and

collect more than ~25,000 lakes across the world.

Global compute service through CONDOR

>10 major real time observational data from selected

GLEON sites.

(6)

Fish4Knowledge–

human level query for Marine Biology

6 UCATANIA (Italy) Centrum Wiskunde & Informatica (Netherlands) U. of Edinburgh (UK)

• NCHC in Taiwan: sustainable system for data acquisition, storage, and computing.

• U. of Catania in Italy: fish detection and tracking.

• U. of Edinburgh in UK: workflow, fish recognition, and fish behavior.

• CWI in Netherlands: user interfaces.

NCHC (Taiwan)

Video Data: 10 camera years = 10 cameraframes = 112 Tb

massive data storage Descriptive data: 20 Tb

Fish: 10^10 detections

Summary data: 500 Gb Target: 1 sec query answering

(7)

Infrastructure enables Fish4Knowledge

2013/10/6 7

Cloud of Resources

Service Frontend Data Source Application Developers Storage Nodes Computing Nodes Video Server Marine Biologists Clients Computing platform Storage platform
(8)
(9)

Video Classification

(~350K videos from 2009 to 2013)

9 Algae: 9.2% Blurred: 33.5% Highly Blurred: 13.9% Complex Scenes: 4.3% Encoding: 23.9% Normal: 12.9% Unknown: 2.2%
(10)

The SEATANK Event & Other

SEATANK event

– It happened again!

– Require scientific evidences for Taiwan to win the lawsuit.

(TORI, NSPO, NCHC)

2006 Cargo ship ‘Tzini’ stranded in Yilan water and

leaks >100 tons fuel Oil

2013.1 Freighter SEATANK stranded in Penhu water

Other

– Oceanic temperature data of TW waters only available in JP, if it is

before 1990 – K.C. Shao.

(11)

The Path from Infrastructure to Data

Sensing for Understanding

Sensing:

(Networks change the game!)

Evolve since 10 years ago: Ecogrid, SARS Grid, … etc

Institutional missions based on special vehicles:

Satellites, Research Ships & Aircrafts, Met Stations …etc.

It is growing even larger and broader, e.g. IOT, social

network.

Understanding

: (Data change the game!)

Modeling from hypothesis to discovery

(12)

• Natural disaster

• Facilities infrastructure failure

• Storage failure

• Server hardware/software failure

• Application software failure

• External dependencies (e.g. PKI failure)

• Format obsolescence

• Legal encumbrance

• Human error

• Malicious attack by human or automated agents

• Loss of staffing competencies

• Loss of institutional commitment

• Loss of financial stability

• Changes in user expectations and requirements CC i m a g e b y Sh a ry n M o rro w o n F lic k r Source: http://blog.alltop.com.tw/story/archives/3278 Source: DataONE

(13)

Inf

o.

con

tent

Time (From Michener et al 1997)

Time for Publication

Accident Specific Details General Details Retirement or Career Change Death Data Management

(14)

The Treasure of NARLabs:

Earth Science Observational Data

(15)

Big Data Infrastructure Challenge in ESOD

Data Features:

high complexity, large scale, frequency, real-time and stream

Big

Data

Integration

– of images & data of F2-3 (NSPO), Marine Environmental Databank (TORI), Atmospheric Research Databank (TTFRI), Engineering Geological Database for TSMIP (NCREE) of NARLabs.

Data Size:

155TB/yr.

(Actual data size will 2~3x)

Currently, 103 TB for Q1. Simulation data 200TB from TTFRI.

Real time, High frequency Data:

Process > 18,000 records/sec.

Currently 6,000 records/sec

.

(16)

ESOD Big Data Service ~ 1PB

Parallel compute server d1 d2 d3 External client Parallel data server Query Source dataset Derived datasets Parallel file system (e.g., GFS, HDFS, GPFS) Result

Data-intensive computing system (e.g. Hadoop)

Parallel query server External data sources Examples: • Search

• Photo scene completion

• Log processing

• Science analytics

Characteristics:

• Small queries and results

• Massive data and computation performed on server

Source: David O’Hallaron

Continuous query stream Continuous query results 16 Virtually Centralized Resources

(17)

ESOD

Hardware Architecture

Challenges of use of distributed resources

ESOD Storage 1 1 1 1 2 2 2 3 3 3 1 Disk 918TB Tape 433TB TTFRI NSPO TORI NCREE NCHC 10G Bridge Gateway 8G Optical Fiber WindRider: 40 G infinitband (Lustre FS) 1G - 10G Fiber GPFS GPFS Preload/Stage Exmaples of Datasets 17

(18)

General Parallel File System

2 X 10GE

HSM Tape

library

ESOD

Data Archiving and System Infrastructure

General Parallel File System

2 X 10GE

HS M

Tape library

General Parallel File System

2 X 10GE HSM Tape library NCHC firewall Load balance server Master Load balance server Slave

 Failover between 3 sites.

 Automatic load balancing.

 Transparent file systems: direct read/write of files across different sites.

 Automatic storage backup.

18 18 HAProxy https ftps Scale out > 1 PB WindRider Formosa 2/3/5 Paralllel DBMS High IOPS

(19)

Users

Hardware layer (disk, tape, etc)

DATA DATA DATA DATA Data rule engine(polices) Metadata Catalog

User access to single united

data warehouse

User request for data

Request goes to data server

Server looks up information

in catalog

Catalog tells where the data

physically located

Server applied rules and

serve data

Data server

ESOD:

Data discovery

(20)

Data discovery: Metadata Catalog

Use relational database (mysql) with

multi-dimensional schema

design

to speed up searching

(migrating to NOSQL solutions now)

System Metadata

User name space

- Address / e-mail / telephone number

- Role (administrator, curator, user)

- File name space

- Creation date / size / location / checksum

- Owner / access controls

Storage resource name space

- Capacity / quotas / Type (archive, disk, fast cache)

Domain Metadata

User-given metadata

- Key-Value-Unit Triplets, Annotation

- Relational / XML Metadata

- Domain-specific Schema

Adopt

OGC standards

20
(21)

ESOD

: Smart query and answer

Develop a set of

control vocabulary

based on int’l

standards, e.g. HDF, NetCDF, OGC … etc.

Derive

RDF triple dataset

from common query

tasks of ESOD.

Combination of

visual data plus metadata

to

support a specific high-level information seeking

user task.

Design of an interface for

graph & visual

comparison search.

Selected one specific task, that of comparing sets

of objects, and designed a prototype interface on

top of

linked data sets

used by the experts to

support this task explicitly

Develop methods for data

provenance

.

(22)

The Path from Infrastructure to Data

Sensing for Understanding

Sensing:

(Networks change the game!)

Evolve since 10 years ago: Ecogrid, SARS Grid, … etc

Institutional missions based on special vehicles:

Satellites, Research Ships & Aircrafts, Met Stations …etc.

It is growing even larger and broader, e.g. IOT, social

network.

Understanding

: (Data change the game!)

Modeling from hypothesis to discovery

Data dominate

(23)

100G

TWAREN/

TANET

/

AS

References

Related documents

‘We were impressed with the way Huntsman® integrated into our data infrastructure,’ the Security Team Manager makes the point, ‘and how well it works with our other security

The software could be marketed directly to physi- cians for office use; it could be included in elec- tronic health record (EHR) systems; it could be embedded in insulin pumps

Overall, EHR problems can be delineated into specific usability violations which can be extrapolated into negative outcomes regarding patient safety, increased time, confusion,

Centred on the differentiated rights of PRC and South East Asian spouses derived from the dual legal track, this chapter compared the development of AHRLIM, as an umbrella alliance,

Abstract: The research results of runoff changes in the River Viliya at 3 stations (Steshitsy Village, Vileyka Town and Mihalishki village) during the period 1946–2014 for the

With the Keyguard unlocked, enter a phone number, then press the Left Soft Key [Save].. Use the Directional Key to highlight an existing

In order to address this hiatus, the present article provides findings from a National Health and Medical Research Council (NH&MRC) two-year study on Indigenous palliative