Data Formats for Long-term Archiving of Climate Model Data at WDC Climate and DKRZ


(1)

Michael Lautenschlager and Jörg Wegner

WDC Climate / Max-Planck-Institute for Meteorology, Hamburg

MPG e-Science Seminar

(2)

DKRZ:

• Earth system model development
• Simulations of past, present and future climate

WDC Climate:

• Long-term data archiving
• Inter-disciplinary data dissemination

(3)
(4)

Diagram of the Hamburg IPCC climate model

(5)
(6)

Near-surface temperature change for the scenarios A1B and B1. Shown is the difference of the 30-year means 2071-2100 minus 1961-1990.

(7)

Comparison of the present-day sea ice cover in March and September (top) with the climate projection for the scenario A1B (bottom) in 2100. Additionally the snow over land can be

(8)

Spatial resolution of the North Atlantic sector in ECHAM5/MPI-OM

(9)

Data volumes in climate projections:

• IPCC AR4:
  - ECHAM5[T63L19]/MPI-OM produces 23 GB/year
  - Climate projection over 240 years (1860-2100): 5.5 TB and approx. 2 months computer run time
• Future:
  - ECHAM5[T106L31] produces 44 GB/year
  - Climate projection over 240 years (1860-2100): 106
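The relation between the per-year and the per-projection volumes can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and that "per year" means per simulated year; the function names are illustrative only.

```python
GB_PER_TB = 1000          # assumption: decimal units
YEARS = 2100 - 1860       # 240 simulated years, as on the slide

def total_tb(gb_per_year, years=YEARS):
    """Total volume in TB for a projection producing gb_per_year."""
    return gb_per_year * years / GB_PER_TB

def rate_gb_per_year(total_tb_value, years=YEARS):
    """Per-year rate in GB implied by a total volume in TB."""
    return total_tb_value * GB_PER_TB / years

# Rate implied by the quoted 5.5 TB for the 240-year AR4 projection
print(round(rate_gb_per_year(5.5), 1))
# Total implied by 44 GB/year over the same 240 years
print(total_tb(44))
```

Dividing the quoted 5.5 TB by 240 years gives roughly 23 GB per simulated year, and 44 GB per year over 240 years amounts to about 10.6 TB.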

(10)

Extrapolated HLRE2 linear archive increase (10 times HLRE)

Compute server architectures:

C90: Cray C90 / HLRE: NEC SX-6 / MPP: SUN-Cluster / HLRE2: new system

(11)

DKRZ System Diagram (figure; labels: local systems, remote systems, CS, DS, NW, GFS)

(12)

[Figure: DKRZ hardware configuration. Recoverable labels: NEC SX-6 compute nodes (24 nodes) connected via IXS, LAN, GFS disk 70 TB, DBMS disk 30 TB, UCFM cache 17 TB, tape drive types 9840C, 9940B, T10000 and LTO2, DXUL-DB (Oracle9i), SUN application server, GFS/UVDM, UDSN, HSM, DBMS, DS test; functions: archive, backup, compile, user applications.]

(13)

• Data classes
  - Test data from model code development, life cycle: weeks to months
  - Project data from scientific model evaluation and research projects (DKRZ resources at project level), life cycle: 3 – 5 years
  - Final results as data products for international projects (IPCC) and scientific publications, life cycle: 10 years and longer
• Data hierarchy levels
  - Temp(orary): scratch discs at compute server
  - Work: fixed disc space at project level for evaluation
  - Arch(ive): tape storage space (single copy) with expiration date for project data beyond available disc space
  - Docu(mentation): documented, long-term tape archive (security copy) for data products, focus on interdisciplinary data utilisation,
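The data classes and hierarchy levels above can be written down as a small lookup structure. This is only a sketch: the keys, class and function names are hypothetical, and the link from a data class to a tape level is taken from the wording of the "Arch" and "Docu" items (no tape level is named for test data, so it is left empty).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DataClass:
    name: str
    life_cycle: str            # wording as on the slide
    tape_level: Optional[str]  # tape hierarchy level named for this class, if any

DATA_CLASSES = {
    "test": DataClass("Test data", "weeks to months", None),
    "project": DataClass("Project data", "3 - 5 years", "arch"),
    "final": DataClass("Final results", "10 years and longer", "docu"),
}

HIERARCHY = {
    "temp": "scratch discs at compute server",
    "work": "fixed disc space at project level",
    "arch": "tape, single copy, with expiration date",
    "docu": "documented long-term tape archive with security copy",
}

def tape_level_for(key):
    """Return (level, description), or None if no tape level is named."""
    level = DATA_CLASSES[key].tape_level
    return None if level is None else (level, HIERARCHY[level])

print(tape_level_for("final"))
```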

(14)

Tape space distribution across archive classes at DKRZ at the beginning of 2007:

• part of the “work” space is on tape because GFS is too small
• the “docu” domain consists of WDCC
• no expiration dates in the “arch” domain; parts of the “arch” domain belong to “docu” but are not yet documented

(15)

• Data documentation requirements are met by using the WDCC infrastructure
  - CERA-2 metadata model developed in 1999
    - Catalogue interface: cera.wdc-climate.de
    - Input interface: input.wdc-climate.de
  - CERA-2 metadata content is complete with respect to browsing, discovering and using climate data which are stored in the database system or outside in flat files
  - The WDCC matches international description standards like ISO 19115, Dublin Core or GCMD and is integrated in international data federations
  - The data storage structure stores climate time series per variable in BLOB data tables. This allows for web-based data catalogue search and data

(16)

CERA Data Model (diagram blocks): Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org

(17)

Coloured columns correspond to BLOB data tables in WDCC. Collections of matrix rows represent storage in model raw data files (complete model output, storage time step by storage time step).
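The difference between the two layouts can be illustrated with a small example. The sketch below uses Python's built-in sqlite3 as a stand-in for the database system (the WDCC itself runs on Oracle), and the variable names, table layout and helper functions are hypothetical. Raw output is held one storage time step at a time (matrix rows), and is redistributed into one BLOB table per variable (matrix columns).

```python
import sqlite3
import struct

# Hypothetical raw model output: one entry per storage time step,
# each holding every variable (a "matrix row").
raw_steps = [
    {"step": 0, "temperature": [280.0, 281.5], "pressure": [1000.0, 998.0]},
    {"step": 1, "temperature": [280.5, 282.0], "pressure": [1001.0, 997.5]},
]

def pack(values):
    """Encode a field as big-endian 32-bit floats."""
    return struct.pack(f">{len(values)}f", *values)

def unpack(blob):
    """Decode a BLOB written by pack()."""
    return list(struct.unpack(f">{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
for var in ("temperature", "pressure"):  # fixed names, not user input
    conn.execute(f"CREATE TABLE {var} (step INTEGER PRIMARY KEY, field BLOB)")
    conn.executemany(
        f"INSERT INTO {var} VALUES (?, ?)",
        [(row["step"], pack(row[var])) for row in raw_steps],
    )

def time_series(var):
    """All time steps of one variable (one "matrix column")."""
    rows = conn.execute(f"SELECT step, field FROM {var} ORDER BY step")
    return [(step, unpack(blob)) for step, blob in rows]

def granule(var, step):
    """A single BLOB entry: one variable at one time step."""
    (blob,) = conn.execute(
        f"SELECT field FROM {var} WHERE step = ?", (step,)
    ).fetchone()
    return unpack(blob)

print(granule("temperature", 1))
```

With this layout a request for one variable touches a single table, and a single time step of one variable can be retrieved without reading the complete model output.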

(18)

Data infrastructure integrates data stewardship in the long-term archive

• Bit-stream preservation

• Quality assurance

(19)

Long-term archive data stewardship

• Bit-stream preservation
  - Secondary tape copies on different tapes and technology at a separate location
  - Copy to new tapes after the maximum number of tape accesses is reached (refreshment)
• Quality assurance
  - Semantic examinations: behaviour of a numerical model compared to observations and to other models, part of the scientific evaluation process
  - Syntactic examinations: formal aspects of data archiving and assurance that data archiving is free of errors as far as possible
    - Consistency between metadata and climate data
    - Completeness of climate data
    - Standard range of values
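The three syntactic examinations can be sketched as simple automated checks. The metadata keys, the function name and the sample values below are assumptions made for illustration; they do not reflect the actual WDCC procedures.

```python
def syntactic_checks(metadata, values):
    """Return a list of problems; an empty list means all checks passed.

    metadata: dict with hypothetical keys n_steps, valid_min, valid_max
    values:   one number per time step, None marking a missing value
    """
    problems = []

    # Consistency between metadata and climate data
    if len(values) != metadata["n_steps"]:
        problems.append(
            f"consistency: metadata announces {metadata['n_steps']} "
            f"time steps, data has {len(values)}"
        )

    # Completeness of climate data
    missing = [i for i, v in enumerate(values) if v is None]
    if missing:
        problems.append(f"completeness: missing values at steps {missing}")

    # Standard range of values
    lo, hi = metadata["valid_min"], metadata["valid_max"]
    outliers = [
        i for i, v in enumerate(values)
        if v is not None and not lo <= v <= hi
    ]
    if outliers:
        problems.append(f"range: values outside [{lo}, {hi}] at steps {outliers}")

    return problems

meta = {"n_steps": 4, "valid_min": 150.0, "valid_max": 350.0}
print(syntactic_checks(meta, [280.0, None, 999.0]))
```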

(20)

Long-term archive data stewardship (continued)

• Usability enabling
  - Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC
  - WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables)
  - Archive technology transfer must be backward compatible to keep old data technically readable
  - Data processing tools and data format access libraries

(21)

Standard Data Formats (SDFs) at WDC Climate

• Requirements
  - Self-descriptive (use metadata)
  - Machine independent
  - Should contain compression or packing
• Benefits
  - SDFs support long-term data preservation
  - SDFs support data exchange and dissemination
  - SDFs allow for application of standardized data

(22)

Data Formats at WDCC

GRIB 1
GRIB 2
NetCDF 3.x
NetCDF 4.x

tools:
convert: cdo, cdat, xconv, IDV
manipulate: cdo, cdat, nco, ncl
visualize: cdat, grads, ncview, GMT

(23)

GRIB 1

Message structure (the sequence is repeated for every record):

'GRIB'
Section 0 - length of message, edition number
Section 1 - product definition section
Section 2 - grid description section
Section 3 - bit map section
Section 4 - binary data section
'7777'

ds8 55 % grib -ginfo zzz.grb
Rec : Position   Size : V PDS GDS BMS   BDS : Code Level : LType GType
  1 :        0  36948 : 1  28  32   0 36876 :  133 20000 :   100     4
  2 :    36948  36948 : 1  28  32   0 36876 :  133 20000 :   100     4
  3 :    73896  36948 : 1  28  32   0 36876 :  133 20000 :   100     4
  4 :   110844  36948 : 1  28  32   0 36876 :  133 20000 :   100     4
  5 :   147792  36948 : 1  28  32   0 36876 :  133 20000 :   100     4
ds8 56 % grib -gdsinfo zzz.grb
Rec : GDS NV PVPL Typ : xsize ysize  Lat1 Lon1   Lat2   Lon2   dx dy
  1 :  32  0  255   4 :   192    96 88572    0 -88572 358125 1875 48
ds8 57 %

- compressed data -> small file size
- every 2d field (record) is a GRIB file -> UNIX commands for catenating
- library support for FORTRAN & C
- strong restrictions for header information
- header information coded (numbers) - need of tables for decoding
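Because every record is a complete message framed by 'GRIB' and '7777', a concatenated file can be split again by reading the message length from each header. The sketch below relies on the edition 1 header layout (4 bytes 'GRIB', a 3-byte big-endian total length, 1 byte edition number); the helper names are hypothetical, and the dummy messages only imitate the framing, not real section contents.

```python
def make_dummy_grib1(payload: bytes) -> bytes:
    """Frame a payload like a GRIB1 message (illustration only)."""
    total = 8 + len(payload) + 4  # indicator section + sections 1-4 + '7777'
    return b"GRIB" + total.to_bytes(3, "big") + bytes([1]) + payload + b"7777"

def scan_grib1(buf: bytes):
    """Return (position, size) for each GRIB edition 1 message in buf."""
    records = []
    pos = 0
    while True:
        pos = buf.find(b"GRIB", pos)
        if pos < 0 or pos + 8 > len(buf):
            break
        size = int.from_bytes(buf[pos + 4:pos + 7], "big")
        edition = buf[pos + 7]
        end = pos + size
        if edition != 1 or size < 12 or buf[end - 4:end] != b"7777":
            pos += 4  # not a valid message start, keep searching
            continue
        records.append((pos, size))
        pos = end
    return records

# Sections 1-4 of the listing above add up to 28 + 32 + 0 + 36876 bytes.
payload = bytes(28 + 32 + 0 + 36876)
two_records = make_dummy_grib1(payload) + make_dummy_grib1(payload)
print(scan_grib1(two_records))
```

The section sizes in the listing plus 8 bytes for the indicator section and 4 bytes for '7777' give the record size of 36948 bytes, so the second record of the dummy file starts at position 36948, as in the `-ginfo` output.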

(24)

GRIB 2

Section 0 - 'GRIB' indicator section
Section 1 - identification section*
Section 2 - local use section (optional)
Section 3 - grid definition section*
Section 4 - product definition section*
Section 5 - data representation section
Section 6 - bit map section
Section 7 - data section
Section 8 - end section '7777'

Repetition within one message: Sections 2 to 7, Sections 3 to 7, or Sections 4 to 7 can be repeated.

* Sections 1, 3, 4 represent the GRIB1 product definition section.

This splitting, combined with the option for iterating sections and the concept of templates, makes the main difference to GRIB1 and keeps GRIB2 very flexible.

Concept of templates: You can define your own templates for grid definition, product definition and data representation.

(25)

Example: 500 hPa height field forecasts on a Northern Hemisphere polar stereographic grid, produced by a particular numerical model at forecast hours 12, 24, 36, and 48. These four fields could be represented by a single GRIB2 message by repeating the sequence of Sections 4 to 7 four times, making the appropriate forecast time changes in the Product Definition Section in each iteration of the sequence.

Section 0: Indicator Section
Section 1: Identification Section
Section 2: Local Use Section (optional)
Section 3: Grid Definition Section
Section 4: Product Definition Section (hour = 12)  | repetition 1
Section 5: Data Representation Section             |
Section 6: Bit-Map Section                         |
Section 7: Data Section                            |
Section 4: Product Definition Section (hour = 24)  | repetition 2
Section 5: Data Representation Section             |
Section 6: Bit-Map Section                         |
Section 7: Data Section                            |
Section 4: Product Definition Section (hour = 36)  | repetition 3
Section 5: Data Representation Section             |
Section 6: Bit-Map Section                         |
Section 7: Data Section                            |
Section 4: Product Definition Section (hour = 48)  | repetition 4
Section 5: Data Representation Section             |
Section 6: Bit-Map Section                         |
Section 7: Data Section                            |
Section 8: End Section

Note that since the Grid Definition Section is not repeated, it remains in effect for all four forecast hours.
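The repetition rule in this example can be written as a short function that lists the section sequence for any set of forecast hours. It is a sketch of the layout only (no encoding takes place), and the function name and parameters are hypothetical.

```python
def grib2_layout(forecast_hours, local_use=False):
    """List the sections of one GRIB2 message holding several fields
    on a shared grid, repeating Sections 4 to 7 per forecast hour."""
    layout = ["Section 0: Indicator", "Section 1: Identification"]
    if local_use:
        layout.append("Section 2: Local Use")
    layout.append("Section 3: Grid Definition")  # stated once, valid for all fields
    for hour in forecast_hours:
        layout += [
            f"Section 4: Product Definition (hour = {hour})",
            "Section 5: Data Representation",
            "Section 6: Bit-Map",
            "Section 7: Data",
        ]
    layout.append("Section 8: End")
    return layout

for line in grib2_layout([12, 24, 36, 48]):
    print(line)
```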

(26)

NetCDF 3.x

- dimensions (1 unlimited possible)
- variables & attributes
- global attributes
- data

netcdf simple_xy {
dimensions:
    x = 6 ;
    y = 12 ;
variables:
    int data(x, y) ;

// global attributes:
    :CDO = "Climate Data Operators version 0.9.5 " ;
    :source = "ECHAM5.2" ;
data:

 data =
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
  12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
  24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
  36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
  60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71 ;
}

- no compression for data -> file size bigger than GRIB1
- data stored n-dimensional
- library support for FORTRAN & C
- file sizes >= 2 GByte possible with NetCDF 3.6
- no restrictions for header information
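The flat list in the data section of `simple_xy` maps onto the two-dimensional variable `data(x, y)` with the last dimension varying fastest. A minimal sketch in plain Python, without any NetCDF library (the helper name `at` is made up for this example):

```python
NX, NY = 6, 12                  # dimensions x and y of simple_xy
flat = list(range(NX * NY))     # the 72 values listed in the data section

def at(i, j):
    """Value of data(i, j); y (the last dimension) varies fastest."""
    return flat[i * NY + j]

# The same values arranged as NX rows of NY entries each
grid = [flat[i * NY:(i + 1) * NY] for i in range(NX)]

print(at(1, 0), at(5, 11))
```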

(27)

NetCDF 4.x, HDF5

NetCDF-4/HDF5 Format

With version 4.0 of netCDF, another new data format was introduced: netCDF-4/HDF5 format. This format is HDF5, with full use of the new dimension scales, creation ordering, and other features of HDF5 added in its version 1.8.0 release.

- Multiple unlimited dimensions.
- Groups to organize data.
- New types, including compound types and variable length arrays.
- Parallel I/O.

netcdf4 "example" {
  group "/" {
    group "group1" {
      dataset "set1" { dimension variables data }
      dataset "set2" { dimension variables data }
    }
    group "group2" { ... }
  }
}

Diagram label: netcdf3.x file
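The formats discussed so far can usually be told apart by the first bytes of a file. The sketch below checks the leading signature only; it assumes the HDF5 signature sits at offset 0 (the common case) and cannot tell a netCDF-4 file from any other HDF5 file. The function name is hypothetical.

```python
def sniff_format(head: bytes) -> str:
    """Guess the data format from the first bytes of a file."""
    if head.startswith(b"CDF\x01"):
        return "NetCDF classic"
    if head.startswith(b"CDF\x02"):
        return "NetCDF 64-bit offset"
    if head.startswith(b"\x89HDF\r\n\x1a\n"):
        return "HDF5 (possibly netCDF-4)"
    if head.startswith(b"GRIB") and len(head) >= 8:
        # the edition number is the 8th byte in both GRIB1 and GRIB2
        return f"GRIB edition {head[7]}"
    return "unknown"

print(sniff_format(b"CDF\x01" + bytes(4)))
print(sniff_format(b"GRIB" + bytes(3) + b"\x02"))
```

For a real file, the first eight bytes read in binary mode are enough, for example `sniff_format(open(path, "rb").read(8))` with `path` supplied by the caller.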

(28)

Tools

nco: for NetCDF
http://nco.sourceforge.net/

ncl: for NetCDF3, NetCDF4, GRIB1, GRIB2, HDF4
http://www.ncl.ucar.edu/

ncview: for NetCDF
http://meteora.ucsd.edu/~pierce/ncview_home_page.html

cdo: for GRIB1, NetCDF, ieg, EXTRA, Service
http://www.mpimet.mpg.de/fileadmin/software/cdo/

cdat: for GRIB1 (with GrADS control file), NetCDF, HDF
http://www-pcmdi.llnl.gov/software/

xconv: for NetCDF, GRIB
http://badc.nerc.ac.uk/help/software/xconv/

IDV: for GRIB, NetCDF
http://www.unidata.ucar.edu/software/

THG HDF5 tools: for HDF
http://www.hdfgroup.org/products/hdf5_tools/

GrADS: GRIB1 (with control file), NetCDF3.x
http://grads.iges.org/grads/grads.html
