
5.3 Web architecture development

5.3.2 Building the workflow

The workflow required a high level of control and some translation between models, as in some instances it was not possible to pass the output of one model directly to the input of another. While workflow composition and orchestration are not the direct focus of this work, they must be considered when identifying the real challenges both in the Model Web and in uncertainty management within it.

LCCS outputs uncertain transition matrices, with Dirichlet distributions for each row. As LandSFACTS can only accept transition matrices with probabilities, the distributions must be sampled before LandSFACTS is executed, requiring an additional service to do so. When managing uncertainty in workflows, intermediate sampling is a common requirement, as many models do not inherently support uncertain inputs. A process for sampling uncertain transition matrices, which utilises the matlab-connector library to execute matrix sampling MATLAB code, was developed and exposed on the Web.
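The sampling step can be illustrated with a minimal sketch. It is written in Python rather than MATLAB, it is not the service described above, and the Dirichlet parameters, crop names and function names are hypothetical. Each row is sampled by normalising independent Gamma draws, a standard way of drawing from a Dirichlet distribution, so every sampled matrix has rows of probabilities that sum to one.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalising Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_transition_matrix(alpha_rows, rng):
    """Sample every row independently, giving one ordinary transition matrix."""
    return [sample_dirichlet(row, rng) for row in alpha_rows]


# Hypothetical Dirichlet parameters for three crops (wheat, barley, fallow);
# row i describes the uncertain transition probabilities out of crop i.
alpha_rows = [
    [8.0, 1.5, 0.5],
    [2.0, 6.0, 2.0],
    [4.0, 4.0, 2.0],
]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5                    # number of samples drawn from the uncertain matrix
samples = [sample_transition_matrix(alpha_rows, rng) for _ in range(n)]
```

Each of the n sampled matrices would then be passed to a separate LandSFACTS execution.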

The output from LandSFACTS contains simulated crops for each field, for a number of years. In the FERA case study, we are only interested in fields which contain wheat, and thus only want to run AquaCrop for these fields. This requires non-trivial orchestration logic, as the uncertain nature of the workflow means that for a single field, for a given simulation year, there could be n × s simulated crops, where n is the number of samples drawn from the LCCS uncertain transition matrix, and s is the number of internal LandSFACTS simulations performed. In addition, as LandSFACTS is run n times, there are n separate responses, which must either be combined or checked individually by the orchestrator for occurrences of wheat.
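A minimal sketch of this selection logic follows. The response structure (a mapping from field identifier to year to a list of s simulated crops) is an assumption made for illustration, not the actual LandSFACTS output format. The rule that a field is selected when wheat occurs in at least one of its n × s simulated crops is also an assumption.

```python
from collections import defaultdict


def wheat_fields_by_year(responses, crop="wheat"):
    """Return {year: sorted field ids} where `crop` occurs in at least one simulation."""
    selected = defaultdict(set)
    for response in responses:                  # n responses, one per sampled matrix
        for field_id, years in response.items():
            for year, crops in years.items():   # s simulated crops per field and year
                if crop in crops:
                    selected[year].add(field_id)
    return {year: sorted(fields) for year, fields in sorted(selected.items())}


# Hypothetical data: n = 2 responses, each with s = 2 internal simulations.
responses = [
    {"F1": {2010: ["wheat", "barley"], 2011: ["barley", "barley"]},
     "F2": {2010: ["fallow", "fallow"], 2011: ["wheat", "fallow"]}},
    {"F1": {2010: ["barley", "barley"], 2011: ["barley", "wheat"]},
     "F2": {2010: ["fallow", "barley"], 2011: ["fallow", "fallow"]}},
]

result = wheat_fields_by_year(responses)
```

Here the n responses are checked individually and the matches merged, which avoids building a combined response first.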

Once the list of fields containing wheat has been selected for each year, the orchestrator must retrieve the appropriate weather and soil data from the SOS. This involves a spatial query based on the location of the field, and also a temporal query for the given year. Supplying these inputs to AquaCrop requires a form of translation, because the weather and soil data are O&M measurements, whereas AquaCrop, which has no spatial or temporal context, accepts only single double values as inputs. The weather data has already been aggregated to a monthly level, meaning the O&M result value can be directly copied to the relevant AquaCrop input for the month of the measurement.
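The translation described above might look like the following sketch. The measurements are simplified dictionaries standing in for O&M measurements, and the key names and the `property_MM` naming of the AquaCrop inputs are hypothetical. Only the result value is copied; the spatial and temporal metadata is used solely to choose the target input.

```python
def to_aquacrop_inputs(measurements):
    """Copy each monthly result value to a plain double keyed by property and month."""
    inputs = {}
    for m in measurements:
        month = int(m["phenomenonTime"][5:7])   # "YYYY-MM" -> month number
        key = f'{m["observedProperty"]}_{month:02d}'
        inputs[key] = float(m["result"])
    return inputs


# Hypothetical monthly measurements for one field and one year.
measurements = [
    {"observedProperty": "rainfall", "phenomenonTime": "2010-03", "result": 54.2},
    {"observedProperty": "rainfall", "phenomenonTime": "2010-04", "result": 41.0},
    {"observedProperty": "temperature", "phenomenonTime": "2010-03", "result": 6.5},
]

inputs = to_aquacrop_inputs(measurements)
```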

As there is no spatial or temporal context in AquaCrop, the orchestrator must track individual AquaCrop runs, ensuring that it is aware of which calculated yield output matches which field and year. This burden on the orchestrator could be alleviated if the AquaCrop model accepted O&M inputs. However, this would negatively impact the usability of the model, as a user would be required to create multiple O&M measurements to supply as inputs, and this may seem unnecessary considering the additional metadata provided in the measurements will not actually be used by the model.
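One way for an orchestrator to keep this association is to key every run by field and year and to attach that key to the returned yield, as in the sketch below. The model call is a stub with a made-up formula, and the observation keys are simplified, hypothetical stand-ins for O&M properties.

```python
def run_aquacrop_stub(inputs):
    """Stand-in for the remote AquaCrop call; the formula is made up."""
    return sum(inputs.values()) / 100.0


def run_tracked(jobs, model=run_aquacrop_stub):
    """Run the model once per job and attach the field and year to each yield."""
    observations = []
    for (field_id, year), inputs in jobs.items():
        observations.append({
            "featureOfInterest": field_id,
            "phenomenonTime": str(year),
            "observedProperty": "wheat_yield",
            "result": model(inputs),
        })
    return observations


# Hypothetical jobs keyed by (field, year); the model itself never sees the key.
jobs = {
    ("F1", 2010): {"rainfall_03": 500.0, "rainfall_04": 300.0},
    ("F2", 2011): {"rainfall_03": 450.0, "rainfall_04": 250.0},
}

observations = run_tracked(jobs)
```

Because the context travels with the job key rather than with the model inputs, the model interface stays a set of plain doubles.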

The final workflow output is a set of wheat yield estimates grouped by field, for a five-year period. Provided the runs have been correctly tracked by the orchestrator, these estimates can be assembled as a set of O&M observations. If required, field-level yield estimates could be aggregated to regional level using the Spatio-temporal Aggregation Service (STAS) developed by UncertWeb. Integration with the Model Web through the adoption of O&M enables other generic tools, such as Greenland, to be used for visualisation of geospatial data. Figure 5.2 shows an example visualisation of the simulated crop yields.

Implementation of the workflow was undertaken by a research assistant on the UncertWeb project. BPEL and Taverna were both considered for orchestrating the workflow, but were excluded in favour of a JavaScript Web client. While it would be possible to create and orchestrate the workflow in Taverna, FERA required spatial visualisations for intermediate results. These visualisations would ideally be provided through a Web browser, requiring complex integration between Taverna, a server-side component, and the Web client. BPEL workflows can be deployed on an engine, after which they are available as a Web service, easing the integration problems faced with Taverna. However, BPEL still lacks a truly usable composition client, and problems were encountered adding the required level of workflow control using the Eclipse BPEL Designer. Exposing the models in the FERA case study using the processing service framework allowed the JavaScript workflow client to be developed with ease. Within the client, requests were built as JavaScript objects and sent to the Web service interface asynchronously using jQuery, a JavaScript library. Upon receiving the response, jQuery automatically parses it into an object. The adoption of GeoJSON allowed request and response data to be visualised using the OpenLayers mapping library, without additional translation.
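The idea of building a request as a plain object that embeds GeoJSON can be sketched as follows. The sketch is in Python rather than JavaScript, sends nothing over the network, and does not reproduce the request format of the processing service framework: the `process` and `inputs` keys, the process name and the coordinates are all hypothetical. Only the GeoJSON Feature structure follows the GeoJSON format itself.

```python
import json


def build_request(process_id, field_id, ring, inputs):
    """Build a request object that carries the field as a GeoJSON Feature."""
    return {
        "process": process_id,
        "inputs": {
            "field": {
                "type": "Feature",
                "id": field_id,
                "geometry": {"type": "Polygon", "coordinates": [ring]},
                "properties": {},
            },
            **inputs,
        },
    }


# Hypothetical field boundary as a closed ring of [longitude, latitude] pairs.
ring = [[-1.10, 54.00], [-1.09, 54.00], [-1.09, 54.01], [-1.10, 54.01], [-1.10, 54.00]]

request = build_request("aquacrop", "F1", ring, {"year": 2010})
payload = json.dumps(request)   # the text that would be sent to the service
echoed = json.loads(payload)    # parsing a JSON response back into an object
```

Because the geometry is already GeoJSON, the same object could be handed to a mapping library without an intermediate translation step.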

Figure 5.2: Comparing yield estimates over a 24-year period using the Greenland visualisation tool.

Unfortunately, the inclusion of the SOS as a data provider required some of the models to be executed through their SOAP interface, rather than the JSON one. The 52°North SOS implementation does not currently support JSON as a data format, most likely because the JSON encoding of O&M is not standardised, so we must retrieve data as XML from the SOS to pass to the SOAP interface. Although the use of the processing service framework allowed the switch to a SOAP interface to be made without any server-side changes, this demonstrates the challenges faced in the composition of Model Web workflows, and emphasises the need for standardised data formats.

In document Uncertainty analysis in the Model Web (Page 150-153)