What Software Engineering has to Offer to Computational Science

Scientific Challenges

6.3 What Software Engineering has to Offer to Computational Science

Our detailed analysis of the specific characteristics of scientific software development enables us to identify some shortcomings of existing proposals that are concerned with bridging the “chasm” between software engineering and computational science (Section 6.3.1). Based on this, we argue that there is a need of adapting existing software engineering methods for computational science and we give reasons why focusing on Model-Driven Software Engineering (MDSE) techniques is a promising starting point for this endeavor (Section 6.3.2).

6.3.1 Existing Attempts at Bridging the Software Chasm

Previous attempts at addressing the credibility and productivity crisis of computational science can be categorized into three groups:

1. Publish and review source code along with scientific articles to ensure reproducibility or at least repeatability of in silico experiments.

6.3. What Software Engineering has to Offer to Computational Science

2. Let software engineers build or re-engineer (parts of) the scientific software.

3. Train scientists to enable them to use state-of-the-art software engineering methods.

In the context of the credibility crisis of computational science, a dis- cussion about the reproducibility of scientific results that rely on computa- tion emerged both within science itself (Peng 2011) and in the software engineering community (LeVeque et al. 2012). Being able to—at least in principle—validate the findings of other scientists by reproducing their experiments is at the heart of the scientific method. However, it is common practice in many areas of computational science not to release the source code on which the findings of a publication are based. This practice impedes the reproduction of published results or even renders it outright impossible. Therefore, several authors suggest to make public disclosure of the source code mandatory for peer-reviewed publications and some even propose to include the code itself in the peer review process (Barnes 2010; Ince et al. 2012; Morin et al. 2012). In the software engineering community, for exam- ple, several large conferences recently started employing a peer-reviewed implementation artifact evaluation process (Krishnamurthi and Vitek 2015). These suggestions and efforts are certainly important steps in the right direction and could help to increase the appreciation of software and its quality in the computational science community. However, publishing source code alone does not adequately address the fundamental problem that the scientists lack the software engineering skills to tackle the underlying problems of both the credibility and the productivity crisis.

A second attempt to a solution is to try to have software engineers implement the software for the scientists. The experiences of Segal (2005), who put this approach to test, and the considerations given in Section 6.2.3 a) suggest that this is not a practicable way.

So far, the most promising attempt to solve the dual scientific software crisis seems to be education via workshop-based training programs focusing on Ph.D. students, such as the ones organized by Wilson (2014) and Messina (2015). While the education approach does address the skill gap that is central to the “software chasm,” it does so with inadequate means. Our analysis in Section 6.2 clearly indicates that just exposing scientists to software engineering methods will not be enough because these methods

often fail to consider the specific characteristics and constraints of scientific software development. We therefore conclude that we have to select suitable software engineering techniques and adapt them specifically to the needs of computational scientists.

6.3.2 Embracing the Characteristics of Scientific Software

Development

The results of our literature study clearly show that computational scientists are only “accidentally” involved in software development: ultimately, their goal is not to create software but to obtain novel scientific results (Section 6.2.3 c)). At the same time, however, they are very concerned about having full control over their applications and how these actually compute their results, which is why many prefer “older” programming languages with a relatively low level of abstraction from the underlying hardware (Section 6.2.2 b)).

Among the techniques and tools that software engineering has to offer, Model-Driven Software Engineering (MDSE) and specifically Domain-Spe- cific Languages (DSLs) are promising starting points for addressing the needs of computational scientists. SinceDSLsare designed to express solutions at the abstraction level of the domain, they allow the scientists to care about what matters most to them: doing science without having to deal with technical, implementation-specific details. While they use high-level domain abstractions, they still stay in full control over their development process as it is them who directly implement their solutions in formal and executable (e. g., through generation) programming languages. Additionally, generation from a formal language into a low-levelGPLpermits to examine the generated code to trace what is actually computed.

DSLscan also help to overcome the conflict between the quality require- ments of performance on the one hand and portability and maintainability on the other hand, which is responsible for many of the difficulties expe- rienced in scientific software development (Section 6.2.2 d)).DSL source code is maintainable because it is often pre-structured and much easier to read thanGPLcode, which makes it almost self-documenting. This almost self-documenting nature ofDSLsource code and the fact that it can rely on an—ideally—well-tested generator for program translation ensure the

6.3. What Software Engineering has to Offer to Computational Science

reliability of scientific results based on the output of the software. Portability of DSLcode is achieved by just replacing the generator for the language with one that targets another hardware platform. WithDSLs, the high abstraction level does not have to result in performance losses because the domain-specificity first of all enables to apply—at compile time—domain- specific optimizations and greatly simplifies automatic parallelization (see Chapter 2).

In the way described above, DSLs integrated into a custom MDSE approach could help to overcome both the productivity and the credibility crisis of computational science. A first indicator that supports this hypoth- esis can be found in the survey report of Prabhu et al. (2011), who find that those scientists who program withDSLs“report higher productivity and satisfaction compared to scientists who primarily use general purpose, numerical, or scripting languages.” In Chapter 7, we introduce anMDSEap- proach that takes into consideration the characteristics of scientific software development presented in this chapter and aims to take full advantage of the potential benefits ofDSLsfor computational science.

Chapter 7 The Sprat Approach: Hierarchies

In document Model-Driven Software Engineering for Computational Science Applied to a Marine Ecosystem Model (Page 94-99)