27/01/2014
Workflows for Remote Sensing Data Processing:
Software Development Strategy &
SW Development strategy – Life cycle model
Project-driven evolutionary life-cycle model, i.e. build a larger prototype/operational system by pooling single projects to serve a common objective
» Objective 1 (Internal): Continuously upgrading and consolidating (= operational code) the department-internal technical and scientific capacity of operational photogrammetric & hyperspectral image/data processing (terrestrial, airborne, satellite).
» Objective 2 (External): Operational platform
• According to clear and measurable Service Level Agreements (SLA)
• From Level1 onwards, the Customer1 and Customer2 workflows are 100% the same
SW Development strategy – Life cycle model
[Diagram: innovation cycles and valorization cycles over 2011 and 2012. Labels: Customer1 Operational platform V1; Customer2 Operational platform V1; V2; SLA (Service Level Agreement) 1; SLA 2; Research, assembly and integration test platform; CVB: Processing Power + Archive (raw data + metadata & data products & database); Innovation by means of classical “project work” (FP7, BELSPO, IWT, …): AGIV, VITO, universities, companies, …]
Development strategy – Life cycle model verification
BSSC (ESA Board for Software Standardisation and Control)
ECSS-E-40 is the ECSS standard for software engineering and ECSS-Q-80 is the ECSS standard for software Product Assurance. These standards cover a wide range of applications; some of them may not be applicable to small, low-cost projects. Selecting the appropriate requirements for a particular project is called tailoring. To be accurate, tailoring must follow a certain number of rules and generally has to be done with the help of a standards expert.
Unfortunately, the "Guide to applying the ESA standards to small software projects, BSSC(96)2 Issue 1, May 1996" has no equivalent in ECSS, as the ECSS standards are intended to be tailored to the needs of each project, even a "small" one.
However, for small projects it is advisable to follow the guidelines defined in BSSC(96)2, since (a) BSSC(96)2 provides direct, hands-on guidelines that can easily be adopted by developers lacking ECSS standardization knowledge and (b) the project resources are far too limited to include a standardization expert.
ESA software engineering standards: life cycle verification and validation approach.
1. In ESA terminology, unit testing refers to the process of testing modules against the detailed design. The inputs to unit testing are the successfully compiled modules from the coding process. These are assembled during unit testing to make the largest units, i.e. the components of architectural design. The successfully tested architectural design components are the outputs of unit testing. At VITO a Functional Test Framework (FTF) is continuously maintained (grouping of data, auxiliary data, configuration files, executables, logging and results) to (a) test module behavior according to the design and (b) validate the ATBD. An up-to-date FTF is essential in the Rapid Application Development (RAD) process structure.
2. This FTF is also being used for integration testing. For example, atmospheric correction involves a number of subsequent modules (creating MODTRAN lookup tables, image-based AOD retrieval, image-based water-vapor estimation, BRDF correction, land/water/cloud identification, the actual atmospheric correction). Once each module is unit tested, the series of modules can be integrated in a processing sequence that can be implemented in the FTF by means of a simple batch file. As such, the FTF can be used to validate the behavior and results of this module sequence, which thus functions as an integration test.
3. After passing the unit testing and integration testing and once integrated in the workflow software system, system tests validate the resulting system against the SRD Bèta Hardware System.
4. At VITO, it is preferred that the final acceptance tests are executed by the project partner that represents the user segment. This is formalized by a project “acceptance review meeting”. When an external partner is not available due to the given project consortium layout, an internal acceptance review meeting is organized.
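The integration-test use of the FTF described in item 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two stand-in modules (`subtract_dark`, `apply_gain`), the helper names and the tolerance are hypothetical, whereas the FTF described above chains executables and configuration files through a batch file.

```python
def subtract_dark(pixels):
    """Hypothetical stand-in for a first processing module."""
    return [p - 10 for p in pixels]


def apply_gain(pixels):
    """Hypothetical stand-in for a second processing module."""
    return [p * 0.5 for p in pixels]


def run_sequence(modules, data):
    """Run the modules in order, feeding each output into the next module."""
    for module in modules:
        data = module(data)
    return data


def matches_reference(result, reference, tol=1e-6):
    """Compare a result with a stored reference, element by element."""
    return len(result) == len(reference) and all(
        abs(r - e) <= tol for r, e in zip(result, reference)
    )
```

Each module would first pass its own unit test; the sequence is then validated as a whole by calling `matches_reference(run_sequence([subtract_dark, apply_gain], data), reference)` against results kept in the test framework.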
Development strategy – Functional Test Framework
Functional Test Framework and Operational Scenarios
High Performance Computing Cluster for Research, questions to solve:
System dimensioning via testing:
1. CPU needs
2. Storage needs
3. Reliability needs
4. Availability needs
5. Scalability needs
6. Connectivity needs
Coordinated Data Acquisition:
• airborne multispectral
• airborne hyperspectral
• …
Product workflows (fast temporary processing storage + processing cluster):
• Archiving
• Workflow L0 → L1
• Workflow L1 → L2
• Workflow L2 → L3
• Workflow L2/3 → L4
• GBG/GBA
Archive (storage + database + life cycle management):
• Level 0: Raw image data, raw metadata.
• Level 1: Geometrically, radiometrically and spectrally calibrated image data.
• Level 2: Atmospherically corrected orthophoto products.
• Level 3: Mosaics covering a region of interest.
• Level 4: Change detection products.
Quality Assurance & Control (Operators, SW/HW Developers, Scientists, Support Software)
Network services: WWW interface archive & processing cluster, FTP, external hard-disks, OGC Web Services
User Community: Internal/external operators, external users
Research & Development: innovation (FP7, BELSPO, IWT, … projects)
Software System
SW system components:
1. Middleware
• Workflow system
• Database
• Workflow monitoring
• Hardware system monitoring
• Network interfaces
2. Algorithms
• VITO C++, Java, Python code
• Existing open source libraries
• All based on proven records (scientific papers)
Subdivided in 3 workflows grouped in one single WWW user interface:
1. Level1 (raw) to Level2 (geometrically and atmospherically corrected block of images)
2. Level2 to Level3 (mosaic of Level2)
3. Level2/3 to Level4 (e.g. change detection products, soft classifications, …)
This allows for:
1. Different processing “entry points”.
2. The option to forward Level2, 3 or 4 products to the archive system.
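The idea of different processing “entry points” can be illustrated with a small planner that, given the level of the available data and the requested product level, lists the workflows to run. This is a sketch only: the `WORKFLOWS` table, the `plan` function and the choice of the shortest chain are assumptions made for illustration, not part of the VITO system.

```python
from collections import deque

# Workflow name -> (accepted input levels, output level), as listed above.
WORKFLOWS = {
    "L1->L2": ({1}, 2),
    "L2->L3": ({2}, 3),
    "L2/3->L4": ({2, 3}, 4),
}


def plan(entry_level, target_level):
    """Return the shortest chain of workflows from entry to target level."""
    queue = deque([(entry_level, [])])
    seen = {entry_level}
    while queue:
        level, path = queue.popleft()
        if level == target_level:
            return path
        for name, (inputs, output) in WORKFLOWS.items():
            if level in inputs and output not in seen:
                seen.add(output)
                queue.append((output, path + [name]))
    raise ValueError(f"no workflow chain from Level{entry_level} to Level{target_level}")
```

For example, `plan(1, 3)` chains the first two workflows, while `plan(3, 4)` enters directly at the third workflow.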
Middleware is computer software that connects software components or applications. The middleware software consists of a set of enabling services that allow multiple processes running on one or more machines to interact across a network.
• Airborne missions generate thousands of images → need for distributed computing → need to choose patterns for parallelism:
Master/Worker: Master application constructs a list and maintains the job-dependency. Worker applications ask the master for a job and execute this job.
Task/Data Decomposition: an algorithmic module is executed on smaller subsets of data. The master application implements the task and data decomposition. Task decomposition = functional decomposition (orthorectification, atmospheric correction, building MODTRAN4 lookup tables, …)
• Parallelism is implemented in the middleware, NOT in the algorithmic applications (this keeps the C++/C/Fortran/Java/IDL/Python code of the applications as simple as possible)
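The job-pulling Master/Worker pattern can be sketched as follows. This is a single-process illustration using threads, assuming an acyclic dependency graph; the class and function names are hypothetical and it does not reproduce the message passing of a distributed middleware.

```python
import threading
import time


class Master:
    """Keeps the job list and the job dependencies; hands out ready jobs only."""

    def __init__(self, jobs):
        # jobs: job name -> set of job names that must finish first
        self._pending = {name: set(deps) for name, deps in jobs.items()}
        self._running = set()
        self._done = set()
        self._lock = threading.Lock()

    def request_job(self):
        """Called by a worker; returns a ready job name or None."""
        with self._lock:
            ready = next(
                (n for n, deps in self._pending.items() if deps <= self._done), None
            )
            if ready is not None:
                del self._pending[ready]
                self._running.add(ready)
            return ready

    def report_done(self, name):
        with self._lock:
            self._running.discard(name)
            self._done.add(name)

    def finished(self):
        with self._lock:
            return not self._pending and not self._running


def worker(master, log, run):
    """Pull jobs from the master until everything is finished."""
    while not master.finished():
        job = master.request_job()
        if job is None:
            time.sleep(0.01)  # nothing ready yet; ask again shortly
            continue
        run(job)  # the algorithmic application itself stays sequential
        log.append(job)
        master.report_done(job)


def run_cluster(jobs, n_workers=3, run=lambda job: None):
    """Start n_workers worker threads and return the completion order."""
    master, log = Master(jobs), []
    threads = [
        threading.Thread(target=worker, args=(master, log, run))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the workers pull jobs and the master alone tracks dependencies, the function passed as `run` needs no knowledge of parallelism, which matches the point made in the last bullet above.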
Middleware for cluster/distributed computing is developed by VITO:
• Message passing (over reliable TCP/IP sockets) and job-pulling Master-Worker pattern developed in Java (VITO, Dept. Remote Sensing)
• Multiple Masters can run next to each other. Multiple masters can run on one machine.
• Masters can be configured to take only processing jobs from specific registered users/operators.
• One worker per machine. Workers keep multiple threads alive which invoke the running of applications. The number of threads can be altered on-the-fly.
• Workflow Monitoring Software: Java GUI application for monitoring and configuring the processing cluster
» Multiple masters allow for flexibility to deploy multiple workflows in parallel next to each other.
» Dynamically adding and removing workers allows for adding extra horsepower as the need arises.
» Short Term Storage servers are written to in a round robin fashion to load-balance I/O output. The master decides to which STFS a worker job will write its output.
» Further details: see the hardware presentation.
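The round robin assignment of output servers can be sketched as below. The class name, function name and server names are hypothetical; the sketch only shows the allocation rule, with the master choosing the next server in turn for every job it hands out.

```python
import itertools


class StfsAllocator:
    """Hands out short-term file servers in a fixed rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)


def assign_outputs(job_names, servers):
    """Map each job to the server that will receive its output."""
    allocator = StfsAllocator(servers)
    return {job: allocator.next_server() for job in job_names}
```

With two servers, consecutive jobs alternate between them, so no single server receives all the output of a burst of jobs.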
Software System: Middleware vs Hardware Scalability
Image/Data Archive: Enterprise-level storage. Short Term File Servers (STFS) are optimized for I/O speed.
Software System Processing Cluster: other “middleware”
• MySQL 5.X or PostgreSQL: product/user/order database
• CACTI (www.cacti.net): monitoring of memory, disk space, system load, network traffic
• NAGIOS (www.nagios.org): service monitoring …
Example Level1 – Level2 workflow (orthorectification and atmospheric correction)
APEX flight calibration data: 23/06/2009 – level1
Automatic seam-line via:
1. Cost grid based on a combination of gradient smoothness (similarity between neighboring pixels) and similarity between the overlapping pixels.
2. Iterative cost grid masking.
3. Ford-Fulkerson Graph-Cut on masked cost grid pixels.
4. Optional multi-resolution spline blending.
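Step 3 names a Ford-Fulkerson graph cut. The sketch below uses the Edmonds-Karp variant (Ford-Fulkerson with breadth-first augmenting paths) on a generic directed graph. The function name `min_cut` and the dictionary-based graph layout are assumptions for illustration; how the cost grid is turned into edge capacities is not specified in the source and is left out.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Return (max_flow, source_side) for a directed graph.

    capacity maps node -> {neighbor: capacity}.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        for v, cap in neighbors.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the source side.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

In a seam-line setting, the source and sink would stand for the two overlapping images and the returned node set for the pixels taken from the first image; the seam runs along the cut edges.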
Iteratively masked cost grid: searching for the best possible solution (just before connectivity fails).
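One possible reading of the iterative masking is sketched below: the highest-cost pixels are masked step by step, and the last mask that still leaves a connected corridor is kept. The top-to-bottom connectivity criterion, the 4-neighborhood and the function names are assumptions made for this sketch; the source does not state them.

```python
from collections import deque


def connected_top_to_bottom(allowed):
    """True if allowed pixels link the top row to the bottom row (4-neighbors)."""
    rows, cols = len(allowed), len(allowed[0])
    queue = deque((0, c) for c in range(cols) if allowed[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and allowed[nr][nc]
                and (nr, nc) not in seen
            ):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def iterative_mask(cost):
    """Mask costly pixels from the highest cost down; stop before connectivity fails."""
    allowed = [[True] * len(row) for row in cost]
    thresholds = sorted({value for row in cost for value in row}, reverse=True)
    for threshold in thresholds:
        trial = [[value < threshold for value in row] for row in cost]
        if not connected_top_to_bottom(trial):
            break  # this step would disconnect the grid; keep the previous mask
        allowed = trial
    return allowed
```

The returned grid marks the pixels that remain candidates for the seam; a subsequent cut would then only consider those pixels.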
Example: GRB mutation and anomaly detection.
GRB (Grootschalig Referentiebestand) change detection: blue polygons are already mapped buildings, green polygons are the result of an automatic building detection process (combined K-means, Quadratic Discriminant Analysis with Mahalanobis distance metrics and post-classification logic)
Example Level2/3 – Level4 workflow (automated change detection in large scale national/regional vector databases of civil infrastructure)
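The GRB caption above mentions Quadratic Discriminant Analysis with Mahalanobis distance metrics. A minimal sketch of that classification step for two-dimensional feature vectors follows, using the standard Gaussian QDA score. The class names, means, covariances and priors are invented for illustration; the K-means stage and the post-classification logic are not shown.

```python
import math


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point; cov is [[a, b], [c, d]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def qda_classify(x, classes):
    """classes: name -> (mean, cov, prior). Return the best-scoring class name."""

    def score(name):
        mean, cov, prior = classes[name]
        dist_sq = mahalanobis_sq(x, mean, cov)
        (a, b), (c, d) = cov
        log_det = math.log(a * d - b * c)
        return -0.5 * log_det - 0.5 * dist_sq + math.log(prior)

    return max(classes, key=score)
```

Each class contributes its own covariance matrix, which is what makes the decision boundary quadratic rather than linear.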