
2.3 Dapper

2.3.2 System architecture and Dapper services

As explained by Sirott (n.d.a, n.d.b), Dapper is a Java servlet with two configurable services that convert gridded and in-situ data into an OPeNDAP stream. It can be deployed in any web server that implements the Java Servlet API, such as Tomcat.

As documented by Cornillon et al. (2009), Sirott (2002) and Sirott et al. (2004b), OPeNDAP-enabled client programs access data by sending an OPeNDAP request to the Dapper web server, which streams the requested data back to the client. The conversion to an OPeNDAP stream is performed by two Dapper services implemented as a Java servlet (Sirott, n.d.b). Besides DChart, OPeNDAP-enabled clients for in-situ data served by Dapper include MATLAB, ncBrowse, Ferret, Java OceanAtlas, OceanShare and GrADS, and various other clients exist for gridded data, as mentioned by Sirott (n.d.a).

A request in space and time for an in-situ dataset follows a two-stage protocol (Cornillon et al., 2009). In the first stage, the Dapper-enabled client queries the spatial and temporal availability of data. Each in-situ dataset is identified by a unique ID value, as defined by the Dapper In-situ Conventions in subsection 3.2.2 on page 80. The IDs that match the selected criteria are returned to the requesting application via the OPeNDAP protocol. In the second stage, again via the OPeNDAP protocol, the desired data is requested on the basis of the obtained IDs, as defined by Cornillon et al. (2009).
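The two-stage request can be illustrated with a small in-memory sketch. It does not contact a real Dapper server. The `StationRecord` layout, the function names and the time unit are hypothetical, and the two functions only stand in for the two OPeNDAP requests described above. The first function filters metadata by a space/time window and returns IDs. The second function uses only those IDs to fetch measurements.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass(frozen=True)
class StationRecord:
    """Metadata for one in-situ profile or time series (hypothetical layout)."""
    id: int          # unique ID, as required by the in-situ conventions
    lat: float
    lon: float
    t_start: float   # hypothetical time unit, e.g. days since an epoch
    t_end: float

Range = Tuple[float, float]

def stage1_query_ids(catalog: Sequence[StationRecord], lat: Range, lon: Range,
                     time: Range) -> List[int]:
    """Stage 1: return the IDs of all records inside the space/time window."""
    return [
        r.id
        for r in catalog
        if lat[0] <= r.lat <= lat[1]
        and lon[0] <= r.lon <= lon[1]
        and r.t_start <= time[1] and r.t_end >= time[0]  # time spans overlap
    ]

def stage2_fetch(data_by_id: Dict[int, List[float]],
                 ids: Sequence[int]) -> Dict[int, List[float]]:
    """Stage 2: request the measurements only for the IDs chosen in stage 1."""
    return {i: data_by_id[i] for i in ids}

if __name__ == "__main__":
    catalog = [
        StationRecord(1, 10.0, 20.0, 0.0, 5.0),
        StationRecord(2, 40.0, 20.0, 0.0, 5.0),   # outside the latitude range
        StationRecord(3, 12.0, 22.0, 6.0, 9.0),   # outside the time window
    ]
    ids = stage1_query_ids(catalog, (0.0, 20.0), (15.0, 25.0), (0.0, 5.0))
    print(ids)  # [1]
    print(stage2_fetch({1: [1.2, 1.3], 2: [0.4], 3: [2.2]}, ids))
```

Separating the stages means the client transfers only small metadata records until it knows which profiles or time series it actually wants.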

Dapper thus contains two configurable OPeNDAP request-routing services. They convert gridded data formats, or in-situ data formats in the form of time series or profiles, into the OPeNDAP protocol and stream the data to an OPeNDAP-enabled client such as DChart. These two services, implemented as a Java servlet, are the CDP-service and the NetCDF-service (Sirott et al., 2004b, 2004a).
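The routing idea, one servlet handing each request to one of two configurable services, can be sketched as a prefix-based dispatcher. This is not Dapper's actual URL scheme. The prefixes `/dapper/insitu/` and `/dapper/netcdf/` and both service functions are hypothetical placeholders that only return a label instead of streaming OPeNDAP data.

```python
from typing import Callable, Dict

def cdp_service(path: str) -> str:
    # Placeholder for the CDP-service: would query the metadata database
    # and stream the in-situ data as an OPeNDAP Sequence.
    return f"CDP-service: {path}"

def netcdf_service(path: str) -> str:
    # Placeholder for the NetCDF-service: would stream a NetCDF file
    # as OPeNDAP Grid/Array data.
    return f"NetCDF-service: {path}"

# Hypothetical configuration mapping URL prefixes to the two services.
ROUTES: Dict[str, Callable[[str], str]] = {
    "/dapper/insitu/": cdp_service,
    "/dapper/netcdf/": netcdf_service,
}

def dispatch(path: str, routes: Dict[str, Callable[[str], str]] = ROUTES) -> str:
    """Hand the request to the service whose configured prefix matches the path."""
    # Check longer prefixes first so that more specific routes take precedence.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix](path[len(prefix):])
    raise LookupError(f"no service configured for {path}")
```

For example, `dispatch("/dapper/netcdf/sst/2008.nc")` returns `"NetCDF-service: sst/2008.nc"`. Keeping the mapping in a configuration table rather than in code mirrors the idea that the services are configurable.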

CDP-service

The CDP-service of Dapper provides an OPeNDAP interface to profile and time-series data available from the EPIC-compatible Climate Data Portal (Sirott et al., 2004b; Denbo et al., 2004). As documented by Sirott et al. (2004b, 2004a), Dapper receives HTTP requests from an OPeNDAP-enabled client, and the Dapper CDP-service converts the OPeNDAP command into the corresponding CDP requests. To do so, the Dapper CDP-service contacts the CDP through an Object Request Broker (ORB) of the standard distributed object architecture CORBA (Common Object Request Broker Architecture). It uses CORBA's communication protocol IIOP (Internet Inter-ORB Protocol), which is based on TCP/IP (Sirott et al., 2004b, 2004a).
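Before an OPeNDAP command can be translated into CDP requests, the service has to split the DAP2 constraint expression appended to the request URL. The expression has two parts: a projection, which names the variables to return, and selection clauses joined by `&`, which give the conditions the data must satisfy. The parser below is a simplified illustration written for this text, not Dapper's code. It ignores URL encoding, server-side function calls and quoted string values, and it leaves out the mapping onto CDP requests. The variable names in the example, such as `location.lat`, are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

class Constraint(NamedTuple):
    projection: List[str]                    # variables to return
    selections: List[Tuple[str, str, str]]   # (variable, operator, value)

# Longer operators are listed first so ">=" is not read as ">" followed by "=".
_SELECTION = re.compile(r"^\s*([\w.]+)\s*(<=|>=|!=|=~|=|<|>)\s*(.+?)\s*$")

def parse_constraint(ce: str) -> Constraint:
    """Split a simplified DAP2 constraint expression into projection and selections."""
    parts = ce.lstrip("?").split("&")
    projection = [v.strip() for v in parts[0].split(",") if v.strip()]
    selections = []
    for clause in parts[1:]:
        m = _SELECTION.match(clause)
        if m is None:
            raise ValueError(f"cannot parse selection clause: {clause!r}")
        selections.append(m.groups())
    return Constraint(projection, selections)
```

For example, parsing `?location.lat,location.lon,location._id&location.lat>=-5&location.lat<=5` yields three projected variables and two selection tuples. A real translation layer would then turn these tuples into requests against the data source.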

Instead of opening individual NetCDF files to obtain metadata, Dapper queries a relational database management system (RDBMS) based on MySQL, which yields faster results (Sirott et al., 2004b, 2004a). This relational database stores the coordinate boundaries and metadata information for each NetCDF file, while the measurements themselves remain in the individual NetCDF files, as also mentioned by Sirott (n.d.b).
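The speed-up comes from answering spatial and temporal queries from a small metadata table instead of opening every file. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, so that it runs without a database server. The table name and columns are hypothetical, not Dapper's schema. Only the files it returns would then be opened to read the actual measurements.

```python
import sqlite3
from typing import Iterable, List, Tuple

# One row per NetCDF file: path plus its coordinate and time bounds (hypothetical schema).
Entry = Tuple[str, float, float, float, float, float, float]

def build_index(conn: sqlite3.Connection, entries: Iterable[Entry]) -> None:
    """Create the metadata table and store the bounds of each file."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_meta (
               path TEXT PRIMARY KEY,
               lat_min REAL, lat_max REAL,
               lon_min REAL, lon_max REAL,
               t_min REAL, t_max REAL)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO file_meta VALUES (?, ?, ?, ?, ?, ?, ?)", entries
    )
    conn.commit()

def files_in_box(conn: sqlite3.Connection, lat0: float, lat1: float,
                 lon0: float, lon1: float, t0: float, t1: float) -> List[str]:
    """Return the files whose bounds overlap the query; no NetCDF file is opened."""
    rows = conn.execute(
        """SELECT path FROM file_meta
           WHERE lat_max >= ? AND lat_min <= ?
             AND lon_max >= ? AND lon_min <= ?
             AND t_max >= ? AND t_min <= ?
           ORDER BY path""",
        (lat0, lat1, lon0, lon1, t0, t1),
    )
    return [path for (path,) in rows]
```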

The Dapper database loader program (called dapperload) relies on a MySQL database (version 5.0 as of Dapper version 1.0.0RC1). It is used to update the server with metadata from in-situ files when the CDP-service is employed (Sirott, 2002, 2009). Sirott (2009, n.d.b) documents that data loaded with the Dapper database loader is organized in a three-level hierarchy. The lowest level consists of in-situ files, each containing a set of variables, and all of these files should follow the same convention. Each variable holds a measurement of physical data collected at a station. A group of such in-situ files is loaded into a dataset for aggregation, which forms the middle level. A database containing a group of datasets forms the highest level. With the Dapper database loader it is possible to load in-situ data into a dataset and unload it again, to create and remove a database, and to list the contents of a dataset or a database (Sirott, 2009, n.d.b).
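The three-level hierarchy and the loader operations listed above map naturally onto nested containers. The following in-memory model is purely illustrative. The class and method names are hypothetical and do not reproduce dapperload's command-line interface. In-situ files are represented only by their paths. The rule that loading into a missing dataset creates it is a choice made for this sketch, not documented Dapper behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Dataset:
    """Middle level: an aggregation of in-situ files (stored here as paths)."""
    files: List[str] = field(default_factory=list)

@dataclass
class Database:
    """Highest level: a named group of datasets."""
    datasets: Dict[str, Dataset] = field(default_factory=dict)

class Loader:
    """Hypothetical in-memory model of the loader operations; not dapperload's interface."""

    def __init__(self) -> None:
        self.databases: Dict[str, Database] = {}

    def create_database(self, db: str) -> None:
        self.databases.setdefault(db, Database())

    def remove_database(self, db: str) -> None:
        del self.databases[db]

    def load(self, db: str, dataset: str, path: str) -> None:
        # Sketch's own choice: loading into a missing dataset creates it.
        files = self.databases[db].datasets.setdefault(dataset, Dataset()).files
        if path not in files:
            files.append(path)

    def unload(self, db: str, dataset: str, path: str) -> None:
        self.databases[db].datasets[dataset].files.remove(path)

    def list_database(self, db: str) -> List[str]:
        return sorted(self.databases[db].datasets)

    def list_dataset(self, db: str, dataset: str) -> List[str]:
        return list(self.databases[db].datasets[dataset].files)
```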

Regarding the data, each profile or time series within a CDP dataset is represented by a single NetCDF file (Sirott et al., 2004b, 2004a). All NetCDF files within a dataset should follow the same convention to avoid unpredictable problems (Sirott, 2009). Only files that conform to the in-situ conventions supported by Dapper, as mentioned in subsection 2.3.4 on page 61, can be loaded and aggregated into a dataset of the Dapper database. As Sirott (2009) remarks, the database is only updated with new metadata from a file of a dataset if that file has been modified since it was last loaded into Dapper.
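The "only if modified" rule amounts to an incremental update. The sketch below assumes that the file's modification time is the change indicator and keeps the last-loaded times in a plain dictionary. Both are assumptions for illustration: the source states the rule but not how Dapper detects modifications. The function names are hypothetical.

```python
import os
from typing import Dict, Iterable, List

def files_to_reload(paths: Iterable[str], last_loaded: Dict[str, float]) -> List[str]:
    """Return the files that are new or modified since they were last loaded."""
    stale = []
    for p in paths:
        # Assumption: the file system modification time signals a change.
        mtime = os.path.getmtime(p)
        if p not in last_loaded or mtime > last_loaded[p]:
            stale.append(p)
    return stale

def mark_loaded(paths: Iterable[str], last_loaded: Dict[str, float]) -> None:
    """Record the modification time each file had when its metadata was loaded."""
    for p in paths:
        last_loaded[p] = os.path.getmtime(p)
```

A loader run would call `files_to_reload`, extract metadata only from the returned files, and then call `mark_loaded` for them. Unchanged files therefore cost only a `stat` call.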

In response to the CDP request, the CORBA ORB returns the requested data to Dapper via the application programming interface (API) of the CDP. Dapper then streams this data to the requesting client using the OPeNDAP protocol (Sirott et al., 2004b, 2004a). As announced by Sirott (2009), since Dapper version 1.0.0RC1 the CDP server has been integrated into Dapper and no longer uses CORBA to access in-situ data.

The requested data is of a CDP data type and is streamed as an OPeNDAP Sequence, as explained by Sirott et al. (2004b, 2004a) and Sirott (n.d.b). Cornillon et al. (2009) remark, however, that this data type, used for in-situ data in the form of profiles or time series, may limit interoperability because of a lack of consistent organizational structures. Sirott (2002) calls OPeNDAP in-situ data a “[...] poor stepchild of OPeNDAP gridded data” that is poorly supported by clients and web applications.

NetCDF-service

The Dapper NetCDF-service converts individual NetCDF files to the OPeNDAP protocol using an enhanced version of the Java NetCDF library developed by Unidata (Sirott et al., 2004b, 2004a; Sirott, n.d.b). The enhancement consists of an adaptation of the constructor and a modification that makes the parsing of NetCDF header attributes optional. According to Sirott et al. (2004a), this results in a true streaming server that is twice as fast. The NetCDF-service converts data to the more common Grid or Array data types of OPeNDAP. Since not all clients support the OPeNDAP Sequence data type, this fast streaming service allows clients to access individual gridded or in-situ NetCDF files as the OPeNDAP Grid data type, as Sirott et al. (2004b) explain.
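One practical advantage of the Grid type is that a DAP2 Grid carries a coordinate map vector for each dimension. A client can therefore translate a coordinate range into an index hyperslab of the form `var[start:stop]`, where DAP2 treats `stop` as inclusive. The sketch below performs that translation for ascending map vectors. The variable name `sst` and the coordinate values are hypothetical, and a real client would also handle descending coordinates and strides.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple

def index_range(coords: Sequence[float], lo: float, hi: float) -> Tuple[int, int]:
    """Inclusive index range of the ascending coordinate values within [lo, hi]."""
    start = bisect_left(coords, lo)
    stop = bisect_right(coords, hi) - 1
    if start > stop:
        raise ValueError("no coordinate values inside the requested range")
    return start, stop

def hyperslab(var: str, *ranges: Tuple[int, int]) -> str:
    """Build a DAP2 array constraint such as 'sst[1:3][1:2]' (stop is inclusive)."""
    return var + "".join(f"[{start}:{stop}]" for start, stop in ranges)
```

With latitudes `[-10, -5, 0, 5, 10]` and longitudes `[100, 110, 120]`, the ranges −5..5 and 105..120 give `hyperslab("sst", (1, 3), (1, 2))`, i.e. `sst[1:3][1:2]`. A client working with a Sequence has no such index structure and must filter row by row.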

Data served by the Dapper NetCDF-service is not loaded into the Dapper database. Instead, it is located in a directory declared in the Dapper properties configuration file, and this directory may itself contain subdirectories (Sirott, 2009, n.d.b). Figure 2.9 on page 61 illustrates the Dapper system architecture defined by Sirott et al. (2004b), as explained in this section.
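Because the NetCDF-service serves whatever lies below one configured directory, including its subdirectories, the set of available files can be found by a recursive directory walk. This is a sketch only. The root directory would come from the Dapper properties file, whose key name is not given here. Recognising files by the suffixes `.nc` and `.cdf` is an assumption, and the function name is hypothetical.

```python
import os
from typing import List, Tuple

def list_served_files(root: str, suffixes: Tuple[str, ...] = (".nc", ".cdf")) -> List[str]:
    """Relative paths of NetCDF files below root, including all subdirectories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        for name in sorted(filenames):
            # Assumption: NetCDF files are recognised by their file suffix.
            if name.lower().endswith(suffixes):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))  # URL-style separators
    return found
```

Using `/` in the returned paths makes them suitable for building request URLs regardless of the server's operating system.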

In document Development of an interface for the conversion of geodata in a NetCDF data model and publication of this data by the use of the web application DChart, related to the CEOP-AEGIS project (Page 88-90)