• No results found

Maximising the utility of OpeNDAP datasets through the NetCDF4 API

N/A
N/A
Protected

Academic year: 2021

Share "Maximising the utility of OpeNDAP datasets through the NetCDF4 API"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Maximising the utility of OpeNDAP datasets

through the NetCDF4 API

Stephen Pascoe (

[email protected]

)

Chris Mattmann (

[email protected]

)

Phil Kershaw (

[email protected]

)

(2)

Opportunities for the

“NetCDF/OPeNDAP Platform”

H ig h-Le ve l T oo ls N et C D F-A P I Application Script

Local Files

Remote Files

Virtual Datasets

Web App.

Remote Files

Virtual Datasets

E S G F S ec ur ity OPeNDAP Server F S ec ur ity OPeNDAP Server

W

A

N

LA N
(3)

Key Questions

What performance can a user expect today from

high-level interaction with OPeNDAP datasets through the

NetCDF API?

Can OPeNDAP servers cope with these scenarios?

How can we improve from where we are?

(4)

OpeNDAP Test Framework

http://github.com/stephenpascoe/dapbench

Tools to intercept OPeNDAP requests from clients for analysis

Load-testing framework for testing OPeNDAP servers

Specific tests discussed here

“pre-alpha” work in progress

(5)

Test System

● DELL optiplex 980. ● Intel i5 4-cores @ 3.2GHz ● 1Gbps NIC ● NetCDF 4.1.2-beta2, HDF5-1.8.4-patch1

● Xen Virtual Machine

● Host: DELL PowerEdge 2950 III ● 2x quad-core CPU, 24GB RAM ● Guest VM: Paravirtualised

OpenSuSE 11.0

● 2 CPU, 8GB RAM, 4GB swap

Measured Bandwidth: 930Mbps (iperf)

Client

LAN

Server

Test Servers

● Hyrax: OLFS-1.7.1, bes-3.9.0, dap-server-4.1.0 ● THREDDS Data Server: 3.17.3.1

● Pydap: 3.0.rc.15. netCDF4-python-0.9.3.

(6)

Client Tests

Test Clients Test Dataset

● CDAT (cdat-lite 6.0-alpha-3) ● CDO 1.4.7

● 4D temperature field

● shape [120,4,144,192] time/level/lat/lon ● 51MB NetCDF file.

Test Requests 1. Take a 45ºx45º subset of entire field

(7)

Client Results

Test Tool DODS

Requests Download Size Comment Subset CDO 481 50Mb 1 + time*level requests

CDAT 17281 1.5Mb 1 + time*level*lat requests Regrid CDO 481 50Mb 1 + time*level requests

CDAT 69121 50Mb 1 + time*level*lat requests

CDO downloads too much and with sub-optimal number of requests. CDAT

downloads just enough but with very sub-optimal number of requests. CDAT

iterates over all outer dimensions.

(8)

Server Tests

Test Requests

1. Request all of a random file in n slices, n = [15, 30, 60, 120, 240, 720, 1440] 2. Continuously reqest random subsets in m threads, m = [1, .. 20]

Test Framework Test Dataset

● Grinder load-testing framework ● NetCDF-Java API

● Python httplib2

● 30 files of 4D temperature field

● shape [120,4,144,192] time/level/lat/lon ● 600MB per NetCDF file.

(9)

Health Warning

Preliminary Results

These data may say as

much about the test

environment as the

(10)

Server Test 1: ramp slices

0 200 400 600 800 1000 1200 1400 1600 0 20000 40000 60000 80000 100000 120000 140000 160000 180000 200000

Time to download whole field

pydap tds hyrax Requests T im e / m s 0 200 400 600 800 1000 1200 1400 1600 0.00 5.00 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 Equivilent Bandwidth pydap tds hyrax Requests M B /s

~360Mbps

(11)
(12)

Solutions

(13)

How do we save

clients from themselves?

What if clients were forced to

cache data themselves?

What if the server only

responded to sensible sized

requests?

What if we could restrict the

number of allowed requests and

cache them?

(14)

DAP Response tiling / chunking

WMS web-clients use tiling to improve

client & server performance

Browser

Tile Cache Tiles WMS

Tiled WMS

Proxy Cache Cache
(15)

REST Constraints

Client-server

: separation of concerns

Stateless

: no client-context stored on the server

Cacheable

: responses can be cached to improve

performance

Layered

: clients are unaware whether they are connected

directly to the end server

Code on demand (optional)

: Servers can extend clients

with custom code

Uniform Interface

: generic interface between client and

server decouples them allowing independent evolution.

?

?

OPeNDAP

(16)

Big wins for caching

If tiling/chunking could help standard OpeNDAP performance it

could

hugely

help server-side processing

Consider

http://example.com/mydataset.nc.dods?zonalmean(tas

)

But

not all server-side processing can be expressed as a URI

(17)

Beyond Synchronous OPeNDAP

We know we have to develop

server-side processing systems

How do we keep them RESTful

and in the spirit of OPeNDAP?

So that the outputs remain

(18)
(19)

WPS Process Workflow

GET DescribeProcess

resource to discover process

arguments

POST to create a process

execution resource (unique

URL)

GET to poll status of process

execution

Navigate to outputs when

(20)

Baby steps towards

(21)

Is there a generic

lightweight pattern?

A Standard way to say “Processing, Come back later”

Where should the “process execution description”?

Specific DAP response

Containers, e.g. Catalogues

Provenance responses

(22)

Summary

Clients and Servers need to improve to support efficient

high-level OPeNDAP access through the NetCDF API

Benchmarking could help us find where we need to focus our

work

Caching and server-side processing should be part of the

solution

(23)(24)
(25)
http://github.com/stephenpascoe/dapbench

References

Related documents

It is hoped that this study can also be beneficial to pastors and the engaged couples they counsel in that it addresses what topics can be better covered in premarital

Topic – Basic Concept of Financial Reporting During and Post Covid 19 by :   CA- Pawan Kumar Dhawan​ (Independent Director,Hindustan Copper Limited) .. Topic- General Overview

Life cycle anchor points Risk management Key practices Success models Business case IKIWISI Stakeholder win-win Property models Cost Schedule Performance Reliability Product

Given these findings suggesting that parental psychopathology, and specifically ADHD and depression is associated with a more severe offspring ADHD clinical presentation, the next

As an example for the following description of visualization schemas, we construct a custom multiple-view visualization for the database of website hits whose data schema is shown

Six months after the Mabo decision, in December 1992, Prime Minister Paul Keating launched Australia into, what the United Nations had declared, the International Year of the

However, a current staff member who holds an appointment at the G-6 or G-7 level may also apply to temporary positions in the Professional category up to and including the P-3

Many machine learning tools have been applied to the problem of condition monitoring using static machine learning structures such as artificial neural network, support vector