1 July 2006
and
Year-4 Program Plan
NSF Cooperative Agreements ATM03-31574, 31578, 31579, 31480, 31586,
31587, 31591, 31594
Table of Contents
1. Introduction – the Genesis and Mission of LEAD . . . . 6
2. System Capabilities and the LEAD Environments . . . . . 7
3. The LEAD Architecture . . . 8
4. The LEAD Grid . . . 12
5. Research Agenda and Strategic Timeline . . . 13
5.1 Balancing Research and Deployment. . . 13
5.2 Strategic Timeline and Integrative Test Beds. . . 14
5.3 Research Agenda for Dynamic Adaptation . . . 15
5.3.1 Working Definitions and Key Questions and Issues . . 17
5.3.2 Dynamic Adaptation Use Case #1 . . . 18
5.3.3 Dynamic Adaptation Use Case #2 . . . 19
5.3.4 Dynamic Adaptation Use Case #3 . . . 20
5.4 Mapping the Research Agenda . . . 20
6. Project Administration and Oversight . . . 22
6.1 Project Manager and Technical Integration Coordinator. . . . 22
6.2 Internal and External Assessment of the LEAD Enterprise. . . 23
6.3 Collaboration Mechanisms . . . 23
6.4 External Advisory Panel . . . 25
7. Year-3 Accomplishments Compared with Plans . . . 25
7.1 LEAD Grid . . . 27 7.2 Data Thrust . . . 29 7.2.1 Components . . . 29 7.2.2 Integration . . . 31 7.3 Orchestration Thrust . . . 34 7.3.1 Overview . . . 35 7.3.2 Services Development . . . 36
7.3.3 Orchestration and Monitoring . . . 39
7.3.4 Siege – A Thick Client Strategy for Workflows . . . 41
7.3.5 Documenting the LEAD Architecture . . . 44
7.4 Tools Thrust . . . 44
7.4.1 ADaM . . . 46
7.4.2 Decoders . . . 47
7.4.3 Noesis . . . 48
7.4.4 Feature Detection in Streaming Data and Dynamic Workflow 49 Response 7.4.5 IDV . . . 50
7.4.7 Siege . . . 51 7.4.8 THREDDS . . . 51 7.4.9 ADAS . . . 52 7.4.10 WRF . . . 52 7.5 Meteorology Thrust . . . 53 7.6 Portal Thrust. . . 61 7.6.1 Overview . . . 61
7.6.2 Portal Functionality and Experiment Construction . . . 64
7.6.3 Accomplishments . . . 67
8. Community Engagement . . . 69
9. Inaugural Meeting of the Forum on Geosciences Information . . 71
Technology (FGIT) 10. Tri-Annual Unidata Users Workshop . . . 72
10.1 Introduction and Workshop Goals . . . 72
10.2 Application of LEAD at the Workshop . . . 73
10.3 Preparing for the Workshop . . . 75
10.4 Capturing the User Experience . . . 75
10.5 Preliminary User Feedback . . . 75
10.6 Preliminary Technical Lessons Learned . . . 76
11. Education and Outreach . . . 76
12. Diversity Enhancement . . . 79
13. Changes in Direction . . . 82
14. Collaboration with Other Projects . . . 82
14.1 Center for Collaborative Adaptive Sensing of the . . . 82
Atmosphere (CASA) 14.2 TeraGrid . . . 84
14.3 VGrADS . . . 84
14.4 WRF Developmental Test Bed Center (DTC) . . . . 85
14.5 Digital Library for Earth System Education (DLESE) . . 85
14.6 NOAA National Severe Storms Laboratory and Storm Prediction 85 Center 15. Broader Impacts . . . 86
15.1 Supercomputing 2006 and TeraGrid 06 . . . 86
15.2 NSF Middleware Initiative . . . 87
16. References Cited . . . 88
17.1 Journal Articles and Conference Proceedings Directly Related to 89 or Sponsored in Whole or in Part by LEAD
17.2 Book Chapters . . . 94
17.3 Related Journal Articles and Conference Proceedings . . 94
17.4 Internet Dissemination . . . 96
17.5 Degrees Awarded . . . 96
17.6 Presentations Involving LEAD . . . 97
18. International Activities . . . 100
18.1 Foreign Collaborators . . . 100
18.2 Visitors . . . 100
18.3 International Presentations . . . 100
19. Project Participants . . . 101
20. Year-3 Budget Analysis . . . 110
20.1 Year-3 Budget Allocations by Institution . . . 110
20.2 Expenditures by Institution. . . 111
20.2.1 University of Oklahoma. . . 111
20.2.2 University of Illinois. . . 113
20.2.3 University of Alabama in Huntsville . . . 114
20.2.4 UCAR Unidata Program . . . 115
20.2.5 Indiana University . . . . 116
20.2.6 Howard University . . . 117
20.2.7 Millersville University . . . 118
20.2.8 Colorado State University . . . 119
20.2.9 University of North Carolina . . . 120
20.3 Expenditures by Project Thrust. . . 121
20.3.1 Entire Project. . . 121
20.3.2 University of Oklahoma. . . 121
20.3.3 University of Illinois. . . 122
20.3.4 University of Alabama in Huntsville . . . 122
20.3.5 UCAR Unidata Program . . . 122
20.3.6 Indiana University . . . 123
20.3.7 Howard University . . . 123
20.3.8 Millersville University . . . 123
20.3.9 Colorado State University . . . 124
20.3.10 University of North Carolina . . . 124
20.4 Expenditures by Project Function . . . 124
20.4.1 Entire Project . . . 125
20.4.2 University of Oklahoma . . . 125
20.4.3 University of Illinois . . . 126
20.4.4 University of Alabama in Huntsville . . . 126
20.4.5 UCAR Unidata Program . . . 126
20.4.6 Indiana University . . . 127
20.4.8 Millersville University . . . 127
20.4.9 Colorado State University . . . 128
20.4.10 University of North Carolina . . . 128
20.5 Expenditures by Personnel Classification . . . 128
20.5.1 Entire Project . . . 129
20.5.2 University of Oklahoma . . . 129
20.5.3 University of Illinois . . . 129
20.5.4 University of Alabama in Huntsville . . . 130
20.5.5 UCAR Unidata Program . . . 130
20.5.6 Indiana University . . . 130
20.5.7 Howard University . . . 130
20.5.8 Millersville University . . . 131
20.5.9 Colorado State University . . . 131
20.5.10 University of North Carolina . . . 131
21. Disclosures, Licenses and Patents . . . 132
22. Prizes and Awards . . . 132
23. Images for NSF Public Relations Use . . . 132
24. Program Plans for Year-4 . . . 132
24.1 LEAD Grid . . . 133
24.2 Data Thrust . . . 134
24.3 Orchestration Thrust . . . 135
24.4 Tools Thrust . . . 136
24.4.1 ADaM . . . 136
24.4.2 ADAS and WRF 3DVAR . . . 137
24.4.3 Common Data Model . . . 138
24.4.4 IDV . . . 138 24.4.5 LDM and IDD . . . 138 24.4.6 Siege . . . 138 24.4.7 THREDDS . . . 139 24.4.8 WRF . . . 139 24.4.9 Noesis . . . 140 24.5 Meteorology Thrust . . . 140 24.6 Portal Thrust . . . 141
24.7 Community Engagement and Deployment . . . 142
24.9 Education and Outreach . . . 143
24.10 Diversity Enhancement . . . 145
25. Year-4 Budget Justifications . . . 145
25.1 Budget Allocations Among Participating Institutions . . 145
25.2 University of Oklahoma . . . 146
25.3 University of Illinois . . . 148
25.5 UCAR Unidata Program . . . 152
25.6 Indiana University . . . 154
25.7 Howard University . . . 156
25.8 Millersville University . . . 157
25.9 Colorado State University . . . 159
Appendix A. Year-2 Site Visit Agenda . . . 162
Appendix B. Year-2 Site Visit Report, Responses and Updates. . . 165
Appendix C. Selected Acronyms Used in LEAD . . . . 188
1. Introduction – the Genesis and Mission of LEAD
On 1 October 2003, the National Science Foundation funded a Large Information Technology Research (ITR) grant known as Linked Environments for Atmospheric Discovery (LEAD). A multi-disciplinary effort involving 9 institutions and more than 100 scientists, students and technical staff, LEAD is creating an integrated, scalable framework in which meteorological analysis tools, forecast models, and data repositories can operate as dynamically adaptive, on-demand, grid-enabled systems that a) change configuration rapidly and automatically in response to weather; b) respond to decision-driven inputs from users; c) initiate other processes automatically; and d) steer remote observing technologies to optimize data collection for the problem at hand. Although mesoscale meteorology is the particular domain to which these concepts are being applied, the methodologies and infrastructures being developed are extensible to others including as medicine, ecology, hydrology, geology, oceanography and biology.
LEAD has adopted a service-oriented architecture with two principal objectives:
• To lower the entry barrier for using, and increase the sophistication of problems that can be addressed by, complex end-to-end weather analysis and forecasting/simulation tools. Existing weather tools such as data ingest, quality control, and analysis/assimilation systems, as well as simulation/forecast models and post-processing environments, are enormously complex even if used individually. They consist of highly sophisticated software developed over long periods of time, contain numerous adjustable parameters and inputs, require one to deal with complex formats across a broad array of data types and sources, and often have limited transportability across computing architectures. When linked together and used with real data, the complexity increases dramatically. Indeed, the control infrastructures that orchestrate interoperability among multiple tools – which notably are available only at a few institutions in highly customized settings – can be as complex as the tools themselves, involving thousands of lines of code and requiring months to understand, apply and modify. Although many universities now run experimental forecasts on a daily basis using public-domain software such as the Weather Research and Forecast (WRF) model, they do so in very simple configurations using mostly local computing facilities and pre-generated analyses to which no new data have been added. LEAD seeks to democratize the availability of advanced weather technologies for research and education, lowering the barrier to entry, empowering application in a grid context, increasing the realism of how technologies are applied, and facilitating rapid understanding, experiment design and execution.
• To improve our understanding of and ability to detect, analyze and predict mesoscale atmospheric phenomena by interacting with weather in a dynamically adaptive manner. Most technologies used to observe the atmosphere, predict its evolution, and compute, transmit and store information about it operate not in a manner that accommodates the dynamic behavior of mesoscale weather, but rather as static, disconnected elements. Radars do not adaptively scan specific regions of storms, numerical models mostly are run on fixed time schedules in fixed configurations, and
cyberinfrastructure does not allow meteorological tools to operate on-demand, change their mode in response to weather, or provide the fault tolerance needed for rapid reconfiguration. As a result, today’s weather technology, and its use in research, operations and education, are far from optimal when applied to any particular situation [1]. To address these severe limitations, LEAD is
o Developing capabilities to allow models and other atmospheric tools to respond dynamically to their own output, to observations, and to user inputs so as to operate as effectively as possible in any given situation;
o Developing, in collaboration with the NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA), capabilities to allow models and other atmospheric tools to dynamically task adaptive observing systems, with an emphasis on Doppler radars, to provide data when and where needed based upon the application, user or situation at hand;
o Developing appropriate adaptive capabilities within supporting IT infrastructure.
2. System Capabilities and the LEAD Environments
LEAD comprises a complex array of services, applications, interfaces, and local and remote computing, networking and storage resources – so-called environments – that can be used in a stand-alone fashion or linkedtogether in workflows to study mesoscale weather; thus the name “Linked Environments for Atmospheric Discovery.” This framework provides users with an almost endless set of capabilities ranging from simply accessing data and perhaps visualizing it to running highly complex and linked data ingest, assimilation and forecast processes in real time and in a manner that adjusts dynamically to inputs as well as outputs. A brief overview of the LEAD service-oriented architecture (SOA) is presented in §3 and additional detail can be found in the year-2 annual report (see also [1]).
Figure 2.1 shows the logical structure of the LEAD environments. At the fundamental level of functionality, as shown by the top horizontal gray box, LEAD enables users to accomplish the following:
• Query for and Acquire a wide variety of information including but not limited to observational data (including real time streams) and gridded model output stored on local and remote servers, definitions of and interrelationships among meteorological Achieving these goals requires more than translating existing capabilities (e.g., batch processing of numerical models configured using name list input files) into new IT frameworks built upon symbolic workflow task graphs, web services and grid computing. Rather, it requires fundamental changes – underpinned by basic research in meteorology and CS/IT – in how experiments are conceived and performed, in the structure of user application tools and middleware, and in methodologies used to observe the atmosphere. The stretch goal for LEAD is to begin ushering in this paradigm change.
quantities, the status of an IT resource or workflow, and education modules at a variety of grade levels that are designed specifically for LEAD.
Figure 2.1. The fundamental capabilities (top gray box) and tangible outcomes (bottom gray box) of LEAD are enabled by a rich fabric of tools, functions and middleware services that
represent the LEAD research domain.
• Simulate and Predict using numerical atmospheric models, particularly the Weather Research and Forecast (WRF) model system now being developed by a number of organizations. The WRF can be run in a variety of modes ranging from basic (e.g., single vertical profiles of temperature, wind and humidity in a horizontally homogeneous domain) to very complex (full physics, terrain, and inhomogeneous initial conditions in single forecast or ensemble mode). Other models (e.g., ocean) can be included but are not fundamentally part of the LEAD system now being created.
• Assimilate data by combining observations, under imposed dynamical constraints, with background information to create a 3D atmospheric gridded analysis. As noted in the tools description below, LEAD supports the ARPS Data Assimilation System (ADAS) and will incorporate the WRF 3D Variational (3DVAR) Data Assimilation System beginning in year-4 (see §24.4.2).
LEAD Lexicon and Hierarchy of Capabilities
LEAD Lexicon and Hierarchy of Capabilities
Fundamental
Capabilities Simulate Assimilate Analyze/Mine Predict Visualize
Query Acquire Enabling Functions Store Execute Publish Authorize Move Edit Configure Monitor Foundational
User Tools ADAS WRF ADaM IDV
Catalog
Outcomes Data Sets Model Output Gridded Analyses Animations Static Images Relationships
Build Compile Allocate Trigger Steer Manage Define
Query & Acquire
Portal Middleware Services • Authorization • Authentication • Notification • Monitoring • Workflow • Security • ESML • Ontology • Host Environment • GPIR • Application Host • Execution Description • Application Description • VO Catalog • THREDDS Catalog • MyLEAD • Control • Query • Stream • Transcoder Middleware Services • Authorization • Authentication • Notification • Monitoring • Workflow • Security • ESML • Ontology • Host Environment • GPIR • Application Host • Execution Description • Application Description • VO Catalog • THREDDS Catalog • MyLEAD • Control • Query • Stream • Transcoder
New Knowledge, Understanding, Ideas
MyLEAD• Analyze and Mine observational data and model output to obtain quantitative information about spatio-temporal relationships among fields, processes, and features. • Visualize and quantitatively evaluate observational data and model output in 1D, 2D
and 3D frameworks using batch and interactive tools.
LEAD comprises a large number of tools ranging from simple services to highly sophisticated meteorological, data mining and visualization packages. Within this array we define a sub-set of foundational application or productivity tools that include:
• LEAD Portal, which serves as the primary though not exclusive user entry point into the LEAD environments;
• ARPS Data Assimilation System (ADAS; [2]), a sophisticated tool for data quality control and assimilation including preparation of model initial conditions;
• myLEAD [3], a flexible personalized data management tool that at its core is a metadata catalog. myLEAD stores metadata associated with data products generated and used in the course of scientific investigations and education activities.
• Weather Research and Forecast model (WRF; [4]), a next-generation limited-area atmospheric prediction and simulation model that runs on single or multiple processors at grid spacings ranging from meters to hundreds of kilometers;
• Algorithm Development and Mining (ADaM; [5]), a powerful suite of tools for mining observational data, assimilated data sets and model output; and
• Integrated Data Viewer (IDV; [6]), a widely used desktop application for visualizing, in an integrated manner, a broad array of multi-dimensional geophysical data.
The power of LEAD lies not only in the capabilities of its various tools but more importantly in the manner in which they can be linked together to solve a broad array of problems, as shown schematically in Figure 2.2. The tangible outcomes (bottom bar in Figure 2.1) include data sets, model output, gridded analyses, animations, static images, and a wide variety of relationships and other information that leads to new knowledge, understanding and ideas. The fabric in Figure 2.1 that links the top set of requirements with the bottom set of outcomes – namely, the extensive middleware, tool and service capabilities – is the research domain of LEAD and is described throughout the remainder of this report.
3. LEAD Architecture
As shown in Figure 3.1, the LEAD SOA is realized as five distinct yet highly interconnected layers (detailed descriptions of each service are provided in [1]). The bottom layer represents
raw resources consisting of computation as well as application and data resources distributed throughout the LEAD Grid (see §4) and elsewhere. At the next level up are web services that
provide access to “raw/basic” capabilities and services for accessing weather data. LEAD is leveraging these resources from other projects and modifying them as appropriate. A wide variety of configuration and execution services compose the next layer and represent services invoked by LEAD workflows. They are divided into four principal groups, the first being the
application and configuration service that manages the deployment and execution of fundamental user applications such as the WRF model, ADAS data assimilation system, and ADaM data mining tools.
Figure 2.2. Conceptual/functional linkages among components of LEAD. Virtually any mesoscale research or educational problem can be mapped onto this figure.
For each of these, additional services are needed to track deployment and execution environment requirements to enable dynamic staging and execution on any of the available host systems. A closely related service is the application resource broker, which is responsible for matching the appropriate host for execution to each application task based upon time constraints of the execution and other factors. Both of these services are invoked by workflow services, which drive experimental workflow instances. Catalog services control the manner in which a user discovers data for use in experiments via a virtual organization (VO) catalog. Finally a host of data services are used to search for and apply transformations to data products. An
ontology service resolves higher-level atmospheric concepts to specific naming schemes used in the various data services, and decoder and interchange services transform data from one form to another. Stream services manage live data streams such as those generated by the NEXRAD Doppler radar network.
Identify Problem Define Region
Of Interest, Data & Tools Requirements Query for Desired Data Configure Tools Allocate Computing and Storage Resources Acquire Data Create/Edit Meta Data IDV Data Visualiza-tion ADaM Data Mining MyLEAD Storage Decode Data Remapping, Conversion, Quality Control ADAS Objective Analysis/WRF 3DVAR Remapping of ADAS to WRF Coordinate Ensemble Initial Condition Generation WRF Forecast Model/Data Assimilation Decision Trigger CASA DCAS RadarsCASA DCAS Radars User
Mon, Alloc
Mon, Alloc Mon, Alloc Mon, Alloc Mon, Alloc Mon, Alloc
Mon, Alloc
Monitor
Mon, Alloc Mon, Alloc
Mon, Alloc
Mon, Alloc Mon, Alloc Mon, Alloc
Mon, Alloc
Mon, Alloc
Mon, Alloc
Figure 3.1. The LEAD service-oriented architecture.
Several services are used within all layers of the SOA and are referred to as crosscutting services, indicated in the left column of Figure 3.1. One such service is the notification service, which lies at the heart of both static and dynamic workflow orchestration. Each service is able to publish notifications and any service or client can subscribe to receive them. Another critical component is the monitoring service, which provides, among other things, mechanisms to ensure that desired tasks are completed by the specified deadline – an especially important issue in weather research and education.
A vital crosscutting service that ties multiple components together is the user metadata catalog
known as myLEAD. As an experiment runs, it generates data that are stored on the LEAD Grid or elsewhere (e.g., TeraGrid) and cataloged to the user’s myLEAD catalog. Notification messages generated during the course of workflow execution also are written to metadata and stored on behalf of a user. A user accesses metadata about the products used during or generated by an investigation through a set of metadata catalog-specific user interfaces built into the LEAD Portal. Note that users can edit metadata, and that LEAD has developed a specific schema based upon existing standards. Through these interfaces the user can browse holdings, search for products based on rich meteorological search criteria, publish products to broader groups or to the public, snapshot an experiment for archiving, or upload text or notes to
augment the experiment holdings. Authentication and authorization are handled by
specialized services based upon grid standards.
Finally, at the top level of the architecture in Figure 3.1 is the user interface, which consists of the LEAD portal and a collection of “service-aware” desktop tools . The portal is a container for user interfaces, called portlets, which provide access to individual services. When a user
Distributed Resources Computation Specialized Applications Steerable Instruments Storage Data Bases Resource Access Services GRAM Grid FTP SSH Scheduler LDM OPenDAP Generic Ingest Service User Interface Desktop Applications • IDV • WRF Configuration GUI
LEAD Portal
PortletsVisualization Workflow Education Monitor Control Ontology Query Browse Control Crosscutting Services Authorization Authentication Monitoring Notification C onf igur ation and Exe cution Ser vic es WorkflowMonitor MyLEAD Workflow Engine/Factories VO Catalog THREDDS Application Resource Broker (Scheduler) Host Environment GPIR Application Host Execution Description WRF, ADaM, IDV, ADAS Application Description
Application & Configuration Services
Client Interface Observations • Streams • Static • Archived D ata S ervice s Work fl ow S ervi ces Ca ta lo g Se rv ices RLS OGSA-DAI Geo-Reference GUI Control Service Query Service Stream Service Ontology Service Decoder/ Resolver Service Transcoder Service/ ESML
logs into the portal, his or her grid authentication and authorization credentials are loaded automatically. Each portlet can use these certificates to access individual services on behalf of the user, thus allowing users to command the portal to serve as his or her proxy for composing and executing workflows on back-end resources.
Alternatively, users may access services by means of desktop tools. For example, the
Integrated Data Viewer (IDV) can access and visualize data, and provide domain sub-setting capability, using a variety of sources including OPeNDAP servers. Similarly, the workflow composer tool can be used to design a workflow on the desktop that can be uploaded to the user’s myLEAD space for later execution, as at the Unidata Users Workshop (see §10).
4. The LEAD Grid
The LEAD Grid is a series of distributed computing systems located at six of the nine participating institutions (Oklahoma, Unidata, Illinois, Indiana, Alabama in Huntsville, and University of North Carolina – Figure 4.1). It represents a “clean room environment” for experimentation in which version control (e.g., of Globus) can be strictly managed. The resources at each site range from single-CPU Linux systems with a few terabytes of local storage to large cluster-based systems and mass storage facilities capable of serving many petabytes of data. The LEAD Grid is built upon a foundation consisting of two systems: the
Globus Grid infrastructure framework and the Unidata Local Data Manager (LDM).
Figure 4.1. The LEAD grid.
In the first two years of the project, most of the development and testing were performed on the LEAD Grid. During the past year we have migrated applications to the TeraGrid and use the LEAD Grid mostly to host data sets of relevance to mesoscale meteorology as well as the LEAD Portal. Each grid node runs the Unidata LDM/IDD software to ingest and make available several months worth of data including those from NEXRAD radars, upper-air
Unidata
OU
UI IU
UAH
UNC
The LEAD Grid
balloons, satellites, surface networks, commercial aircraft, wind profilers, and other systems. Further, the Grid sites host static data sets such as terrain and vegetation information.
5. Research Agenda and Strategic Timeline
5.1. Balancing Research and Deployment
From project inception, the LEAD team has faced the challenge of striking an appropriate balance between software developed to explore fundamental concepts versus the instantiation of this software as stable, persistent capabilities in well engineered systems deployed for use by the broader community. Although resource constraints prevent full attention from being given to both research and deployment, LEAD has created conceptual and practical frameworks for achieving what is believed to be an appropriate balance.
As shown in Figure 5.1.1, LEAD research begins with fundamental ideas (oval in the lower center of the diagram) that lead to basic research and an overarching system architecture. The resulting software components, most of which are developed piecemeal, then are integrated to provide increasingly greater functionality as part of so-called integrative test beds (ITBs, described in §5.2). A vitally important part of LEAD, ITBs are experimental frameworks for instantiating LEAD system components in a coordinated manner to evaluate fundamental concepts in an end-to-end fashion using selected end user testers. The former ensures an integrated systems-orientation to testing while the latter allows LEAD to obtain very specific user feedback on system design, capabilities, performance, etc. The integrative test beds spawn new ideas, from which new capabilities are developed and tested in a cyclic manner (Figure 5.1.1).
Note that most of the end user testers have been identified and educated about LEAD by the Education and Outreach Thrust. They include teachers in grades 6-12, students in grades 6 through graduate school, researchers in meteorology and computer science, non-research faculty, attendees at workshops (e.g., WRF Community Workshop, Unidata Users Workshop), and other special groups (e.g., participants in the NOAA/NCAR Developmental Test Bed Center). Efforts now are being extended to expose LEAD to other communities (e.g., oceanography; see §24.7), though judiciously owing to limited resources.
The conceptual (green) and practical (yellow) activities in Figure 5.1.1 are fundamentally part of LEAD during its 5-year tenure as an NSF ITR grant. The formal deployment of LEAD as an integrated system – something one might term “operational capability” in a 24/7 environment where user expectations are high and reliability is essential – resides in the red part of the diagram and for the most part is beyond the scope of the 5-year research grant. However, recognizing the importance of exposing LEAD to an audience larger and broader than our limited number of end user testers, Unidata will support somewhat scaled-down versions of ITB capabilities in a persistent manner (see §5.2) for its community that involves thousands of users (principally at the undergraduate and graduate level and including numerous faculty). Initial emphasis will be placed on data acquisition which, according to a recent study [7] prepared specifically for LEAD, is the need most frequently articulated by users.
Figure 5.1.1. The LEAD research process and role of integrative test beds as an instantiation of end-to-end systems concepts evaluated by end user testers. Most of the LEAD activity as a
5-year ITR grant resides to the left of the dashed line, though some limited deployment of persistent services to the broader community will be orchestrated by Unidata. Concept from D.
McLaughlin, NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA).
5.2. Strategic Timeline and Integrative Test Beds
Figure 5.2.1 shows the LEAD strategic timeline, which is founded upon three ITBs that sequentially build upon one another. Each is designed around a specific set of capabilities and each has a principal goal that addresses key research and education issues. Note that the starting and ending times of the ITBs, apart from the starting time of ITB1 (see below), intentionally are very “soft,” as implied by the gradient shading in the figure. As each ITB proceeds, early capabilities within them will become more stable and will be transitioned to the Unidata deployment, as noted above, while other capabilities are added.
The goal of the first integrative test bed, ITB1, is to expose end user testers to a service-oriented architecture containing the principle elements of LEAD with an eventual view toward democratizing complex experimentation with meteorological tools and creating an architecture suitable for studying adaptive capabilities. The specific features associated with ITB1 are:
• Multiple static workflows that can be selected from a repository, compiled and run
• Access to multiple data sets (e.g., METAR, rawinsonde, NCEP grids, ACARS)
including streams (NEXRAD Level II, CASA)
• The ability to search for and manage data and products (includes ontology & dictionary) • Data sub-setting within the query system and IDV
• Creation of ADAS and/or WRF 3DVAR analyses over user-configured domains at specified grid spacings
• Creation of a single WRF forecast over user-configured domains at user-specified grid spacings based upon ADAS, 3DVAR or NAM analyses
• Visualization of observations and/or grids using IDV • Automatic meta data generation, cataloging and storage
• Pre-scheduling of runs using resource brokering on the TeraGrid with real time monitoring
• Feature extraction of basic parameters from gridded model output and streaming NEXRAD Level II radar data
Figure 5.2.1. The LEAD strategic timeline built around three integrative test beds. See the text for further details.
The meteorological and computer science research and education issues to be addressed by ITB1 include but are not limited to
• Assessing the impact of data types, grid spacing, and other parameters on analyses and forecasts
• Evaluating user reaction and adaptation to the service-oriented architecture
• Evaluating the value of a service-oriented architecture in the conduct of research and education
• Quantifying system response time in preparation for dynamically adaptive systems • Evaluating a desktop client-based system, Siege, compared to BPEL
• Extracting features from observations, forecasts and analyses to prepare for dynamic adaptivity
• Assessing the ability of the TeraGrid to accommodate dozens of LEAD users
simultaneously
• Evaluating the ability to monitor a variety of LEAD system components, including cyberinfrastructure (CI), and make the information known via notification services • Understanding scheduling constraints on the TeraGrid and using them as a means for
developing on-demand capability with some prior notification • Testing LEAD on a wide variety of grid systems
ITB1 formally began during the week of 10-14 July 2006 in Boulder, Colorado with the set of capabilities put before 82 participants at the Unidata Users Workshop titled “Expanding the Use of Models as Educational Tools in the Atmospheric & Related Sciences” (see §10). Capabilties included pre-scheduled resources on the TeraGrid for launching WRF forecasts initialized from NCEP and ADAS analyses based upon user-specified domain location and grid spacing, basic cataloging of input and output, use of workflow task graphs, and real time monitoring and meta data generation. The specific work associated with ITB1 is described throughout the remainder of this document and “tire track” plots are being developed to more clearly portray the temporal sequencing of detailed research efforts and the instantiation of their outcomes into ITB1 and subsequent integrative test beds.
Simultaneous with ITB1 is so-called “look-ahead” research (Figure 5.2.1) that will be implemented in ITB2 (see §5.3). This research emphasizes dynamic adaptation (workflows, models, cyberinfrastructure), streaming static data, and workflows that users can compose from a variety of services (this capability exists now to some extent but has not been made available in general ways within the LEAD Portal). ITB2 will serve to test dynamically adaptive capabilities and provide on-demand scheduling on the TeraGrid, the creation of assimilated data sets using ADAS or WRF 3DVAR, WRF forecasts based upon these assimilated data sets including ensembles, more sophisticated monitoring that includes elements of resource prediction, and more capable meta data generation and automated cataloging. ITB2 also will bring in additional service capabilities focused on analysis and mining, as described below. The look-ahead research parallel with ITB2 (Figure 5.2.1) moves LEAD fully into the dynamically adaptive realm with reconfigurable workflows, dynamic resource allocation on the TeraGrid, and steering of the CASA radars via feature detection in models and observations. The third test bed, ITB3, reflects this research and focuses on dynamic workflows linked to the dynamically adaptive CASA radars.
5.3. Research Agenda for Dynamic Adaptation
As noted in §1, the second goal of LEAD, in addition to democratization, is dynamic adaptivity to weather of meteorological tools, cyberinfrastructure and observing systems, particularly Doppler radars via CASA. This is the transformative research challenge upon which LEAD was founded and to meet it, we have created a research agenda centered around three use case
scenarios, as recommended by the year-2 site visit team and consistent with our earlier use of canonical research problems (see Appendix A of the year-2 annual report).
5.3.1. Working Definitions and Key Questions & Issues
Before describing our research agenda in dynamic adaptation it is important to establish working definitions of key terms as used by LEAD:
• On-Demand – The capability to perform action immediately, with or without prior planning or notification;
• Real-Time – The transmission or receipt of information about an event nearly simultaneously with its occurrence, or the processing of data immediately upon receipt or request;
• Dynamically-Adaptive – The ability of a system as a whole, or any of its components, to respond manually or automatically, in a coordinated fashion, to internal and external influences in a manner that optimizes overall system performance;
• Streaming (from D. Luckham) – A real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items.
Additionally, it is important to recognize that adaptation can take many forms but in all cases
the objective of adaptive systems is to improve upon their static counterparts in some manner, ideally one that formally optimizes or at least quantitatively improves upon certain aspect(s) of performance. Systems or components may adapt in time, space or modality and the adaptation can be automated, manual, objective, heuristic, etc. Further, adaptation can occur in a variety of locations within the system (i.e., within the LEAD environments), at multiple levels and in highly connected, nonlinear ways.
Given this complexity, it is useful to frame the associated research agenda by key issues and questions in adaptivity that implicitly include concepts of streaming and on-demand functionality. The list below is not intended to be exhaustive but rather representative of the most compelling issues relevant to and being addressed by LEAD:
• When is adaptation useful?
• Can the cost and benefit of adaptation be quantified? • What types of adaptation are possible?
• What types of adaptation are most effective and how can they be chosen and combined? • How is adaptivity triggered/controlled?
• What elements of the system can or should adapt (application, workflow,
cyberinfrastructure, observing systems, combinations of these)?
• How can one deal with loss of resources or less than ideal availability to achieve the required adaptation?
• What metrics can be used to measure the effectiveness of adaptation? • What negative consequences exist to adaptation?
• Can “optimal” adaptivity be defined?
• Do adaptivity and on-demand functionality need to be pre-scheduled to any extent? • What triggers the decision to adapt and how is the decision communicated across the
system?
• How can applications most effectively be maintained in “stand-by” mode, ready to be invoked by an adaptive trigger?
• What does quality of service mean in an adaptive system? 5.3.2. Dynamic Adaptation Use Case #1
With that preface, Dynamic Adaptation Use Case #1, shown in Figure 5.3.2.1, involves operating on streaming NEXRAD Level II observations, and/or ADAS analyses, with a persistent “listener” or agent (e.g., via components of the ADaM data mining engine) that identifies prescribed features within radar reflectivity or radial velocity. Upon locating a region in which the criteria have been met, a single grid WRF forecast is automatically launched in an on-demand fashion on the TeraGrid using capabilities now largely resident within ITB1. In this case the workflow is the system component that responds to the event trigger by launching a WRF job while all other elements of the system remain static.
Figure 5.3.2.1. Dynamic Adaptation Use Case #1: A single-grid WRF forecast is triggered when a persistent data mining agent detects high reflectivity in streaming NEXRAD Level II
data and/or an ADAS analysis.
Although seemingly simple, this use case is rich with complexity and involves, among other things, basic research in trigger mechanisms, the optimal choice of WRF configurable parameters (e.g., domain size, grid spacing, forecast start time, trigger criteria), strategies for dealing with delays in data and cyberinfrastructure acquisition, on-demand resource allocation,
real time monitoring, and the intrinsic value of dynamic weather forecasts in comparison to their static counterparts, e.g., those run on fixed schedules in fixed configurations.
With regard to the last point, one of the potentially negative consequences to adaptation in the context of operational meteorology is that forecasters may not easily be able to learn the behavior of a given model in an adaptive framework because, in principle, every forecast could be run using a different configuration, i.e., tied dynamically to the event of the day or even hour. Such tradeoffs between dynamic and static systems are a fundamental component of LEAD research and these and other issues will be evaluated in a variety of ways, including in an operational forecast context as part of the ongoing NOAA Storm Prediction Center Spring Program (§14.6). During 2007 (§24.7) and subsequently, LEAD is expected to be the major provider of daily experimental WRF forecasts and will evaluate forecaster response to both dynamic and static capabilities.
We envision exploring a limited number of extensions to Dynamic Adaptation Use Case #1 to evaluate scalability and other system behavior, e.g., via mining multiple radars simultaneously (a capability that now exists) or mining the NSSL radar mosaic; streaming CASA or NEXRAD radar data directly into memory rather than using files; dealing with data outages, missing WRF input files or incorrectly specified WRF parameters; modifying the model configuration if only some percentage of the requested cyber resources are available; providing likelihood estimates that cyber resources will be needed as a means for soft on-demand capability (e.g., modifying the probability throughout time based upon the continuous detection of radar features) – a capability that will need to be included in the TeraGrid, perhaps using the NCSA MOAB scheduler (§24.4.6).
5.3.3. Dynamic Adaptation Use Case #2
Dynamic Adaptation Use Case #2, shown in Figure 5.3.3.1, extends Use Case #1 by adding a data mining component for detecting features in WRF forecast output and using this information, or that obtained from persistent mining of NEXRAD radar data or ADAS/3DVAR analyses, to refine an existing WRF grid over specified regions, e.g., intense storms, low-level boundaries. Multiple options exist for accomplishing the refinement, including launching an entirely new WRF forecast (workflow adapts) at finer spacing or creating a nested grid within WRF itself (application adapts). The choice among options, not all of which are listed here, represents a fundamental research issue in dynamically adaptive systems and will be explored with this use case.
As for Use Case #1, this seemingly simple example contains significant research questions
regarding optimality, dealing with potentially competing strategies for assessing the trigger condition and effectuating adaptation (here, grid refinement), and continuing the “parent” WRF run to provide boundary conditions for the next domain versus running a one-way nest. The latter, for example, may depend upon the availability of cyber resources. Extensions to this scenario include the use of multiple nests, mechanisms for choosing multiple small nested domains versus a single large fine-grid nest (again possibly a function of available machine partitions), adding ensembles based upon a specified condition, and using precursors of weather
Figure 5.3.3.1. Dynamic Adaptation Use Case #2: An extension of Use Case #1 in which WRF output is mined to determine when, where and how to emplace one or more finer-spacing grids. (e.g., tornado watches) as a trigger to launch a coarse-grid background forecast in preparation for finer-grid nests (a capability that is planned as an extension to Use Case #2).
5.3.4. Dynamic Adaptation Use Case #3
Finally, Dynamic Adaptation Use Case #3, shown in Figure 5.3.4.1, departs from the previous two cases in that it does not involve WRF or ADAS but rather the detection of features within streaming CASA data produced by the 4-radar network now in place in Oklahoma and use of this information to re-task the radars to meet a specified objective (e.g., the identification of a tornado, heavy rain region, convergence line). This capability is planned to exist within CASA using algorithms and models built into the CASA meteorological command and control (MC&C) software. However, interfacing the external LEAD system to CASA is a principal research challenge. Use Case #3 sets the stage for combining its features with those from the other two scenarios to arrive at a “closed loop” capability, shown in Figure 5.3.4.2, in which streaming data, workflows, tools and cyberinfrastructure mutually interact with mesoscale weather.
5.4. Mapping the Research Agenda
A project as complex as LEAD requires careful coordination and planning. Specific research tasks embedded in the three use cases just described have been mapped into detailed timelines across thrusts and organizations (see year-2 annual report), with the PI of each institution, and the cross-institutional thrust leaders, responsible for ensuring timely completion. To ensure coordination, LEAD employs requirements traceability matrices [7] and uses them as a road map for weekly AccessGrid integration and other meetings (see §6).
Figure 5.3.4.1. Dynamic Adaptation Use Case #3: Streaming data from the CASA test bed in Oklahoma are mined and the results used to re-task the radars.
Figure 5.3.4.2. The closed-loop dynamically adaptive LEAD system, in which streaming data, workflows, tools and cyberinfrastructure mutually interact with mesoscale weather.
6. Project Administration and Oversight
6.1. Project Manager and Technical Integration Coordinator
The governance of LEAD was described in the year-2 annual report and is not further discussed here apart from changes. Starting in year-4, Dr. Daniel Weber, Senior Research Scientist at CAPS, will assume the role of Project Manager and Technical Integration Coordinator (Figure 6.1.1), the former of which was held by Ms. Terri Leyton and the latter by Dr. Robert Wilhelmson. Midway through year-2, Ms. Leyton and her husband relocated to North Carolina and the LEAD team agreed that, for the sake of continuity, she would continue as Project Coordinator at 0.5 FTE until the end of year-3. Dr. Wilhelmson has been consumed by administrative issues at NCSA and will be required to spend considerable time in the next several months on other activities, though he will continue to participate in LEAD. Consequently, combining both his and Ms. Leyton’s previous functions into a single position at a particularly appropriate time will yield important benefits for LEAD.
Dr. Weber is in the process of visiting every LEAD institution to gain deeper insight into thrust and institutional activities and, with his background in meteorology and computing is ideally suited among available personnel to assume this broad, integrative responsibility. The LEAD web site, which Ms. Leyton has maintained, is being combined with the Portal, thereby reducing the overhead associated with the maintenance of two independent resources. Administrative support for scheduling site visits, EAP meetings and other related logistics will be assumed by the University of Oklahoma with assistance from other LEAD institutions.
Figure 6.1.1. LEAD organization chart starting in year-4 (1 October 2006).
External Advisory Panel University of Oklahoma (K. Droegemeier, PI; M. Xue, K. Brewster, D. Weber, Co-PIs) Meteorological Research, Education
PI and Project Director
(K. Droegemeier) Committee (PIs)Executive
University of Alabama in Huntsville
(S. Graves, PI; R. Ramachandran, J. Rushing, Co-PIs) Data
Mining, Interchange Technologies, Semantics
UCAR/Unidata
(M. Ramamurthy, PI; A. Wilson, Co-PIs)
Data Streaming and Distributed Storage
Indiana University
(D. Gannon, PI; Beth Plale, Co-PI) Data, Workflow, Orchestration, Services University of Illinois/NCSA + UNC (R. Wilhelmson, PI + D. Reed, PI)
Monitoring and Data Management
Millersville University
(R. Clark, PI; S. Yalda, Co-PI)
Education and Outreach
Howard University
(E. Joseph, PI)
Meteorological Research Education and Outreach
Colorado State University (Chandra, PI) Instrument Steering, Dynamic Updating National Science Foundation Partner Organizations Technical/Integration and Project Coordinator (D. Weber)
6.2. Internal and External Assessments of the LEAD Enterprise
In preparation for the spring all hands meeting held in St. Louis, Missouri on 4-5 May 2006, LEAD conducted an internal SWOT (strengths, weaknesses, opportunities and threats) analysis where responses were submitted anonymously and compiled by Ms. Leyton. Simultaneously, Dr. Katherine Lawrence and colleagues from the University of Michigan completed a year-long study of the LEAD R&D Team [8]. Both compilations were discussed at the all hands meeting and a number of actions have since been taken to improve collaboration and decision making, clarify strategic direction, delineate roles and responsibilities, and better define the demarcation between research and deployment. The latter outcome was described in §5.1 and, in conjunction with other meetings held at NCSA and during TeraGrid ’06 (Table 6.3.1), LEAD has considerably been strengthened.
6.3. Collaboration Mechanisms
As noted in the first two annual reports, LEAD relies heavily upon electronic tools, especially the Access Grid (AG) and LEAD web site (http://lead.ou.edu), to facilitate remote collaboration as a supplement to planned and ad hoc face-to-face meetings. As shown in Figure 6.3.1, two face-to-face all-hands meetings are scheduled each year, along with an External Advisory Panel meeting. [No external advisory panel meeting was held in 2005 owing to the year-2 site visit.] A summary of the year-2 site visit report, LEAD’s responses, and an update of actions taken since that visit are provided in Appendix B. The fall 2006 all-hands meeting will be held in September or October, probably at the University of Oklahoma and potentially involving participants in CASA.
Figure 6.3.1. LEAD calendar.
During the first two years of LEAD, more than a dozen AG sessions were held monthly among the thrust groups and integration team, along with a bi-weekly PI conference call. Once per month, a 2-hour all-hands AG session also was held. Notes and action items from each of these sessions were prepared by T. Leyton and placed on the private portion of the LEAD web site. Materials from face-to-face meetings, including presentations and notes, likewise were posted. Although effective, the large number of meetings became a detriment to productivity, particularly as software development began in earnest. To compensate, the number of AG sessions was reduced while the weekly (Thursday) session was retained, along with the weekly (Friday) PI conference call. The effectiveness of this reduction continued to be debated and the
Sept Oct Nov Dec
Jan
Feb Mar Apr May Jun
Jul Aug
12 1 2 3 4 5 6 7 8 9 10 11 All-Hands Meeting External Advisory Panel Annual Report Weekly AccessGrid and Ad Hoc Face-to-Face Meetings
Month in Grant Year
All-Hands Meeting
present schedule (Figure 6.3.2), which appears to be optimal, involves two 1-hour AG sessions per week (Monday and Thursday), the first rotating among the research thrusts and the second devoted to integration. Thrust leaders are responsible for creating the agenda and running the Monday meeting while the Thursday meeting now is facilitated by Dr. Weber, Technical Integration Coordinator and Project Manager.
Figure 6.3.2. LEAD weekly meeting schedule..
Owing to the limitations of electronic collaboration, LEAD conducts several ad hoc face-to-face meetings each year, some of which are scheduled during national conferences to reduce travel and facilitate attendance as well as cross-disciplinary scholarship. The face-to-face meetings held during or leading up to year-3 are summarized in Table 6.3.1.
Table 6.3.1. Face-to-Face Planning Meetings and Reviews Held During or Leading Up to Year-3.
Purpose Location Date
LEAD Year-2 Site Visit Urbana-Champaign, IL July 20-22, 2005
Special PI Meeting Bloomington, IN August 22, 2005
Special Data Subsystem Interoperability Meeting Bloomington, IN August 23, 2005
Special Education/Portal Meeting Bloomington, IN September 19-21,2005
National Forum on Geosciences Information Technology Washington, D.C. October 5-6, 2005
Fall All-Hands Meeting Chicago, IL October 20-21, 2005
LEAD/VGrADS Research Discussion Chapel Hill, NC November 4, 2005
Unidata/NCSA Meeting for Initial Design of Siege Client Atlanta, GA Jan. 29 – Feb. 2, 2006
Special Metadata and Vocabulary Meeting Huntsville, AL April 10-11, 2006
LEAD/VGrADS Integration Workshop Chapel Hill, NC April 18-21, 2006
Spring All-Hands Meeting St. Louis, MO May 4-5, 2006
Special Siege Workflow System Meeting Urbana-Champaign, IL June 8-9, 2006
Special Research Discussion Indianapolis, IN June 12, 2006
Special Portal Workflow System Meeting Indianapolis, IN June 13, 2006
Special Strategic Discussion Indianapolis, IN June 13, 2006
Special Monitoring and Workflow Orchestration Bloomington, IN June 15-16, 2006
Unidata Workshop Boulder, CO July 10-14, 2006
LEAD Weekly Meetings
Fri Thu Mon PI Conference Call Integration (AccessGrid) Workflow Research (AccessGrid) Week 4 PI Conference Call Integration (AccessGrid)
Meteorology Research & Education (AccessGrid)
Week 3
PI Conference Call Integration (AccessGrid)
Data & LEAD Grid Research (AccessGrid)
Week 2
PI Conference Call Integration (AccessGrid)
Portal & HCI Research (AccessGrid) Week 1 Fri Thu Mon PI Conference Call Integration (AccessGrid) Workflow Research (AccessGrid) Week 4 PI Conference Call Integration (AccessGrid)
Meteorology Research & Education (AccessGrid)
Week 3
PI Conference Call Integration (AccessGrid)
Data & LEAD Grid Research (AccessGrid)
Week 2
PI Conference Call Integration (AccessGrid)
Portal & HCI Research (AccessGrid)
6.4. External Advisory Panel
The LEAD External Advisory Panel (EAP) met late in year-1 but not in year-2 owing to the year-2 site visit. The panel, as currently constituted with 11 members, is shown in Table 6.4.1. Membership is rather significantly unbalanced with regard to type of institution represented, with two members from academia, eight from Federal laboratories or FFRDCs, and one from industry. Gender balance also is quite skewed, with nine men and two women on the panel.
Table 6.4.1. Current composition of the LEAD External Advisory Panel.
We are in the process of revising the EAP and as of this writing, the new 15-member panel consists of seven persons from academia, six from Federal laboratories/FFRDCs, and two from industry. Among this group we have added a teacher (the Chief e-Learning Office of Chicago Public Schools), a graduate student, a second member from industry, and another university faculty member. The disciplinary balance is roughly the same while diversity is much improved: two of the nine males, and one of the six females, are ethnic minorities. As recommended by the EAP in 2004, we have augmented the membership to include an additional individual in the area of data/data bases. We expect the EAP membership to be finalized by the end of July, 2006, at which time the list will be posted to the LEAD web site.
7. Year-3 Accomplishments Compared with Plans
Shown below, in arbitrary order, are the principal goals for year-3 taken directly from the year-2 annual report. We describe in subsequent sections our progress toward meeting these goals and note that the “stretch goal” was not achieved because the SPC and NSSL suspended their joint spring program in 2006 owing to their move to the National Weather Center building.
• Continue to refine system functional requirements, architecture plans and R&D priorities
• Move to type-specific data ingest on the LEAD Grid (e.g., Oklahoma ingests only NEXRAD radar data) and expand the number and scope of data sets available Complete the conversion of ADAS, WRF, and ADaM to services and expose more functionality • Develop fault tolerance in key services such as ADAS, ADaM and WRF
Member and Affiliation Primary Expertise
Ian Foster (ANL) Computer Science
John Horel (University of Utah) Meteorology
Joe Klemp (NCAR) Meteorology
Mary Marlino (NCAR) Education/Digital Libraries
Mike Wright (NCAR) Education/Digital Libraries
Dave Fulker (UCAR Unidata, retired) Computer Science/Data
Peter Cornillon (University of Rhode Island) Oceanography/Data
Tony Hey (Microsoft) Computer Science
Bill Johnston (LBL) Computer Science/Data
Roberta Johnson (UCAR) Education
• Integrate ontology, catalog, MyLEAD, broker, naming and repository services for interoperability
• Develop tool for writing and reading metadata schema
• Automate the generation of input files for key applications such as WRF and ADAS • Refine workflow system including service component interaction and monitoring • Develop first generation dynamic workflow capability
• Continue to refine all tools and enhance interoperability
• Improve the usability of the portal by implementing terminology that is more intuitive for various classes of users and remove unnecessary jargon
• Improve the human computer interface aspects of the portal by working with all end users, especially educators and students
• Be able to execute all key elements of canonical problems 1-3 across all test beds
• Make available selected capabilities to user-partner sites, especially the DTC and education partners, and conduct evaluations
• Integrate FSL’s WRF configuration application, if available, into LEAD • Execute the University of Michigan’s community engagement plan
• Complete the study of hazardous weather feature detection using radar-based algorithms applied to raw data versus assimilated data sets
• Continue work in ensemble Kalman filtering applied to streaming observations
• Complete the study to assess the value of dynamic adaptation in numerical analysis and prediction (CASA student)
• Ingest streaming data from the 4-radar CASA NetRad test bed in Oklahoma and produce associated meta data
• Conduct initial testing of dynamic instrument steering using the 4-radar CASA NetRad test bed in Oklahoma
• Develop new education proposals in collaboration with DLESE and other groups
• Present results in high quality refereed journals and at major conferences including those involving disciplines outside the core LEAD competencies\
As noted in previous annual reports, research and development within LEAD are organized around five parallel research thrusts (Figure 7.1), along with two cross-cutting components, the latter of which ensure that the former are tightly and continuously integrated. Note that intentional overlap exists among the thrusts and each is led by a PI from two separate institutions (Figure 6.1.1). We present below our results for year-3 and in §24 our plans for year-4.
Stretch Goal: On the Teragrid, use LEAD to generate daily WRF analyses and forecasts as part of the NOAA Storm Prediction Center & National Severe Storms Laboratory 2006 Spring
Figure 7.1. Organization of LEAD research and development. 7.1 LEAD Grid
The LEAD Grid group was established to configure, install and operate the LEAD Grid (§4 and Figure 4.1) as a stable research and development environment. The principal goals in year-3 were:
• Ensure that an optimal testbed environment is available as needed to support all LEAD goals and functionality
• Refine system requirements and specifications as the project evolves • Continue to develop and refine procedures and policies
• Develop a set of lessons learned for the broader community and disseminate them via web and journal publications
• Work with the GRIDS Center and TeraGrid personnel on strategies for deploying selected elements of the LEAD technology on other grids
In year-3, the LEAD Grid group worked to ensure that an optimal testbed environment was available as needed to support all LEAD goals and functionality, as well as to ensure that the Grid environment is moving toward the longer-term goals of providing a useful, easily deployable and maintainable architecture. During year-3, the LEAD Grid was used not only for research and development but also for a number of live workshops and demonstrations including the initiation of ITB1 during the July, 2006 Unidata Users Workshop (§10).
The work of the Grid group has required careful coordination with each of the LEAD thrusts to ensure that hardware, software, and data requirements are understood and the components correctly installed and available in a timely fashion. It also has required substantial coordination with outside partners, such as the GRIDS Center development team, other NMI
Portal
Meteorology(Models, Algorithms, Assimilation)
Tools(Cataloging, Aggregation, Mining, Visualization)
Orchestration(Monitoring, Allocation, Workflow)
Data(Cloud, Semantics, Streaming, Management)
Education and Outreach
(E&O) LE AD Gri d Integration Integration Systems Integration
developers, the TeraGrid and the broader community of users, to support longer-term LEAD goals. The LEAD Grid team continued to refine system requirements and specifications and develop systems administration documentation. Many issues related to security have been addressed, including authentication and authorization and site-specific policies, requirements, and implementations. The team helped develop and then implemented a security architecture in close coordination with the other thrust groups and TeraGrid personnel. A strategy for dealing with firewalls also has been implemented. Installation and configuration documentation and test plans continue to be developed for each system component as they become available and are posted on the LEAD internal web site.
Tiger teams were established as necessary to address various issues including those related to security and code repository policies. The LEAD code repository, based on CVS, now includes installation notes and configuration information in addition to component code and README files. The CVS repository and Bugzilla bug tracking system, both of which were established in year-1, helped facilitate integration and testing in year-3. Throughout year-3, the LEAD Grid supported all activities in the software development lifecycle, dealt with numerous issues concerning system functionality, software, processes, and documentation capabilities, and successfully hosted prototype demonstrations and workshops.
Starting in August 2005, the TeraGrid began funding LEAD scientists M. Christie and S. Marru, both at Indiana University, to work on the NMI portal, from which the LEAD portal has been developed. The Portal is being hosted on the LEAD Grid at Indiana. Because LEAD is developing a complex cyberinfrastructure for research, a great many lessons are being learned. They have been documented by the LEAD Grid team and disseminated via the web and presentations at workshops and conferences. This information has been especially valuable for the GRIDS Center and TeraGrid personnel. Other specific accomplishments in year-3 include:
• Refined LEAD Grid system requirements: Functional, hardware, operating system, middleware, data, tools
• Ensured that each node on the LEAD Grid had the capability to ingest datasets via LDM; installations were verified through the use of decoder output and visualization • Expanded the Unidata testbed to accommodate more than 120 days of data for most
datasets, with plans to provide 6 months of data on rotating disk in year-4
• Continued to develop the LEAD security model with an emphasis on interoperability • Refined and updated software and hardware configuration documentation
• Investigated the use of environment variables to ease integration of system components in the heterogeneous LEAD Grid environment
• Developed installation procedures, configuration guides, and test procedures for software components
• Developed and refined system management policies and procedures
• Participated in planning and supported cross-site software checkout and test for Integrative Testbed activities
7.2 Data Thrust
The data system within LEAD consists of services that provide discovery, access, and search support to data objects during the course of atmospheric sciences research and education. It has adopted a common XML schema for communicating between services regarding data objects and also provides storage services for select data products, most notably those generated during the course of experiments conducted within LEAD. The meteorology community has benefited from a relatively long history of access to a large number of observational and model generated data products, resulting in the early establishment of community-supported data dissemination, access, and visualization tools. These tools, most notably Internet Data Dissemination (IDD), THREDDS, and IDV – all developed by Unidata – serve a broad community of users. The LEAD data system has built upon this strong foundation with research in four areas of extended functionality:
• Information discovery – the Noesis user-interactive exploration tool, Glossary, and IDV visualization tool enable users to explore data, definitions, and terms interactively through the LEAD Science Gateway.
• Automated cataloging of private data – data products generated by a workflow are described automatically through automated metadata generation, at the point of creation, and are stored in a private repository for the user close to where the computations take place. Metadata, which includes domain-science attributes, is stored separately in a database to accommodate complex search criteria. Separation of metadata from data product storage is well suited to large-volume data products that are characteristic of mesoscale meteorology research and education.
• Approximate search over heterogeneous data collections – a common vocabulary, built-in search features in metadata catalogs, and a domain-specific ontology collectively support approximate search – that is, search capable of handling various dimensions of vagueness. Back end support for multiple heterogeneous collections extends search capability beyond a single catalog protocol.
• Smooth integration path for future services and data collections brought into the LEAD data subsystem – built into the design by means of an SOA architecture, a common metadata format, and ‘crosswalks’ as a means to map from other metadata schemas to the LEAD Metadata Schema, the data subsystem easily can be extended to include new services and data collections.
7.2.1 Components
As the end of year-3 approaches, the data system now exists on a solid foundation. All components of the architecture have been developed, although in varying stages of maturity. Specifically, the system (Figure 7.2.1) comprises a set of services that can roughly be categorized into storage repositories along the bottom, metadata catalogs and access services across the middle, and a portal layer across the top. The lower and middle layers, as a general rule, execute as persistent services on one or more machines in the LEAD grid. The portal and
portlets along the top execute part of their functionality at a portal server residing on the LEAD grid and part on a level that is downloaded to the client’s desktop or laptop machine.
Storage services along the bottom of Figure 7.2.1 include the Globus Toolkit GT4 Data Replica Server/Replica Locator Service (DRS/RLS), the Unidata THREDDS Data Service (TDS), and the THREDDS/OPeNDAP catalog suite. The latter hosts select community data products produced by the Unidata Internet Data Distribution (IDD) system. Data products are stored to an OPeNDAP server and cataloged by a THREDDS XML page. In addition to providing data sub-setting and aggregation via OPeNDAP, the TDS is integrated with the IDV, providing thus full data access and visualization capabilities to data stored there.
Figure 7.2.1. The LEAD data system architecture.
The LEAD Data Thrust determined in year-2 that metadata descriptions of data products would be encoded using the same general, extensible XML schema now known as the LEAD Metadata Schema (LMS). In response to finalizing the details of LMS, many initiatives were undertaken to support it. In one, we developed a crosswalk (shown in the lower right