
The results of this diploma thesis are an important contribution to the CEOP-AEGIS project. The project investigates the hydrology and meteorology of the Qinghai-Tibetan Plateau in order to better understand its role in climate, monsoon and extreme meteorological events. Within the project, the results of this thesis play a key role in the eighth work package, "Monitoring the water balance and water yield of the Plateau" (WP 8). A further goal of this work package is to share project results and information within the scientific community by implementing a CEOP-AEGIS Data Portal as a database management system that supports the hydrological monitoring of the Plateau. This database management system is a component of the Global Earth Observation System of Systems (GEOSS). Open access to standardized project data in GEOSS is of primary interest to the scientific community, to water resource management institutions, and to society at large. The CEOP-AEGIS Data Portal is based on the technologies NetCDF, OPeNDAP, Dapper and DChart, which are documented in full in this thesis.

Data modeling with NetCDF requires basic knowledge of its technical capabilities and its elements. NetCDF relies on data models, data formats and conventions, and these elements determine the possibilities and limitations of NetCDF data modeling. Dapper and DChart together form a rather complicated technology that is not easy to understand and that depends on various components. As an OPeNDAP-enabled web server, Dapper has some important limitations regarding the NetCDF data model implementations it supports. This is particularly the case for in-situ data, which has to be served by Dapper’s CDP service. The web-based user interface DChart can only visualize data that is correctly streamed via OPeNDAP. This in turn requires a correct translation from a NetCDF file to an OPeNDAP stream, which is performed by the Dapper OPeNDAP server. The functionality of the CEOP-AEGIS Data Portal therefore depends on the correct interaction of the four technologies NetCDF, OPeNDAP, Dapper and DChart. The correct interaction and functionality of these components ultimately depends on the NetCDF data model that is employed.

Consequently, it was essential to first evaluate the theoretical background of these components before a suitable NetCDF implementation was developed. This background is provided in chapter 2, which gives a detailed overview of the technologies NetCDF, OPeNDAP and Dapper / DChart as well as their interaction with each other. Sections 2.3, 2.4 and 2.5 of that chapter also serve as documentation of the CEOP-AEGIS Data Portal, which is based on the Dapper / DChart technology. With the help of this documentation it was possible to configure the data portal for the already standardized NetCDF output data and to publish this data online. The actual configuration files for Dapper and DChart are printed in the appendix in subsections A.1 (page 126) and A.2 (page 127). These configuration files can easily be extended if more data is to be fed into the CEOP-AEGIS Data Portal.

The decision-making process for NetCDF modeling was guided by the goal of generating highly interoperable output datasets for the scientific community. A NetCDF implementation consists of decisions on which NetCDF data model, data format and convention to employ. CEOP-AEGIS project data comprises a large amount and variety of multidimensional, array-oriented gridded data from remote sensing products and model outputs as well as in-situ measurements from ground observation stations (see section 1.2 on page 5). Well-established NetCDF conventions exist for gridded data, but not for in-situ data. One of these conventions is the Climate and Forecast (CF) convention, which was employed within this project for the reasons discussed in subsection 3.2.3 on page 89. The CF convention is one of the most popular conventions for sharing and processing NetCDF files and has achieved primacy among the conventions for data stored in the NetCDF 3 classic data format. It defines precise descriptions of the data content of a variable and specifies spatial and temporal properties. The focus of the CF convention is on gridded data. With respect to in-situ data, its definitions are sparse and unfortunately not sufficient to constrain such data completely.

The lack of a suitable and well-established convention for point observation data is a real problem for data model design with NetCDF. For storing such data in NetCDF, Unidata recommends the use of its Observation Dataset Convention. This convention, however, is declared deprecated in favor of a new CF Convention for Point Observations, which at present exists only in the form of proposals and drafts and is not yet an established convention. This lack of standardization also shows in the representation of datasets within NetCDF. Various ways of structuring in-situ data in NetCDF exist. Among others, the data may be organized as a linked list of record numbers, as contiguous lists, or as multidimensional nested structures. NetCDF 3, however, offers only limited means to implement these structures. In addition, the order of dimensions for data variables is not standardized either. These problems made it difficult to develop a NetCDF data model for CEOP-AEGIS in-situ data that both conforms to the CF convention and respects the constraints of the CDP service of the Dapper OPeNDAP server. Nevertheless, thanks to additional information provided by the developer of Dapper and DChart, such a NetCDF data model for in-situ data could be determined within this thesis. It extends the CF convention to reach conformance with the Dapper In-situ convention and, as a result, functionality within Dapper. This thesis also noted that more documentation of the technical details of Dapper and DChart would be needed in order to benefit fully from this powerful and very useful open-source technology.
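The difference between two of the layouts named above, the linked list of record numbers and the contiguous list, can be sketched in plain Python without any NetCDF library. The sketch is illustrative only: the array names (`first_record`, `next_record`, `start`, `record_count`) and the sample data are hypothetical and are not the variable names prescribed by any convention or used in the CEOP-AEGIS data model.

```python
from typing import List, Tuple

# Hypothetical sample: station index of each observation record, in arrival order.
record_station = [0, 1, 0, 2, 1, 0]
n_stations = 3


def build_linked_list(stations: List[int], n: int) -> Tuple[List[int], List[int]]:
    """Linked-list layout: every station points to its first record and every
    record points to the next record of the same station (-1 ends the chain)."""
    first = [-1] * n
    last = [-1] * n
    nxt = [-1] * len(stations)
    for rec, st in enumerate(stations):
        if first[st] == -1:
            first[st] = rec          # first record seen for this station
        else:
            nxt[last[st]] = rec      # chain from the previous record
        last[st] = rec
    return first, nxt


def records_of(station: int, first: List[int], nxt: List[int]) -> List[int]:
    """Follow the chain of one station and return its record numbers."""
    out = []
    rec = first[station]
    while rec != -1:
        out.append(rec)
        rec = nxt[rec]
    return out


def build_contiguous(stations: List[int], n: int) -> Tuple[List[int], List[int], List[int]]:
    """Contiguous layout: records are reordered so that each station's records
    are adjacent; each station stores a start offset and a record count."""
    order = sorted(range(len(stations)), key=lambda r: stations[r])  # stable sort
    start, count, pos = [], [], 0
    for st in range(n):
        c = stations.count(st)
        start.append(pos)
        count.append(c)
        pos += c
    return order, start, count


first_record, next_record = build_linked_list(record_station, n_stations)
order, start, record_count = build_contiguous(record_station, n_stations)
```

The linked list leaves the records in their original order and can be extended by appending, whereas the contiguous layout requires sorting the records by station but lets a reader fetch all records of a station as one slice, for example `order[start[1]:start[1] + record_count[1]]`.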

Any foreground project data that is to be published in the CEOP-AEGIS Data Portal must be standardized and follow international conventions. This is important not only for interoperability within the scientific community, but also for functionality within the CEOP-AEGIS Data Portal. In general, however, input data that uniformly conforms to one single suitable standard is not available. The reasons are the large amount of data and its heterogeneity in content, storage and metadata description. This thesis responds to these problems by introducing a newly developed upstream data interface application that converts the heterogeneous input data of project partners to standardized NetCDF output files. The data model determined for CEOP-AEGIS is implemented within this application and constrains the output data to the desired form. Conformance is checked by several internal routines. If a dataset does not conform, it can easily be modified within the intermediate data model. With this application, the goal of a properly designed data interface that converts heterogeneous input data of project partners into standardized and aggregated NetCDF output files was achieved. The NetCDF output files of this CEOP-AEGIS Data Interface are based on a standardized, consistent and adequate data and metadata model that ensures maximal compatibility and interoperability.
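The idea of checking a dataset for conformance before it is written out can be illustrated with a minimal Python sketch. It is not the routine implemented in the CEOP-AEGIS Data Interface: the dictionary-based intermediate model, the function name `check_conformance` and the small rule set (a `Conventions` global attribute naming a CF version, a `units` attribute, and a `standard_name` or `long_name` attribute per variable) are assumptions chosen for illustration and cover only a small subset of what the CF convention specifies.

```python
from typing import Dict, List


def check_conformance(dataset: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the
    dataset passed this (deliberately small, assumed) set of checks."""
    problems = []

    # Global attribute: the file should declare which CF version it follows.
    conventions = str(dataset.get("attributes", {}).get("Conventions", ""))
    if not conventions.startswith("CF-"):
        problems.append("global attribute 'Conventions' does not name a CF version")

    # Per-variable attributes: units and a descriptive name.
    for name, var in dataset.get("variables", {}).items():
        attrs = var.get("attributes", {})
        if "units" not in attrs:
            problems.append(f"{name}: missing 'units' attribute")
        if "standard_name" not in attrs and "long_name" not in attrs:
            problems.append(f"{name}: missing 'standard_name' or 'long_name' attribute")
    return problems


# Hypothetical intermediate representation of one partner dataset.
example = {
    "attributes": {"Conventions": "CF-1.4"},
    "variables": {
        "air_temperature": {
            "attributes": {"standard_name": "air_temperature", "units": "K"}
        },
        "precip": {"attributes": {"long_name": "daily precipitation"}},
    },
}

problems = check_conformance(example)
```

Returning a list of problems rather than raising on the first failure lets all deficiencies of a dataset be corrected in the intermediate model in one pass before the NetCDF file is generated.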