
Research Data Management CODING


Academic year: 2021




Research Data Management

Coding

When writing software or analytical code, it is important that others, and your future self, can understand what the code is doing.

Wilson et al. (2013) published ten practices that they regard as the “Best Practices for Scientific Computing”, and we agree.

“As scientists are never taught how to build software many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.”


Best Practice Coding

1. Write programs for people, not computers

• A program should not require its readers to hold more than a handful of facts in memory at once.

• Names should be consistent, distinctive, and meaningful

• Code style and formatting should be consistent

• All aspects of software development should be broken down into tasks, roughly an hour long (50-200 lines of code)
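The following Python sketch rests on our own assumptions. The body-mass-index calculation and every function name are hypothetical. They are chosen only to contrast two things: cryptic names against descriptive ones, and a single opaque expression against short, single-purpose functions.

```python
from typing import List, Tuple


# Hypothetical example. Hard to read: the reader must work out what f, w, h and r stand for.
def f(w, h):
    r = w / (h * h)
    return r


# Easier to read: names state meaning and units, and each function does one job.
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2 for one person."""
    return weight_kg / height_m ** 2


def mean_body_mass_index(people: List[Tuple[float, float]]) -> float:
    """Return the mean body mass index of (weight_kg, height_m) pairs."""
    indices = [body_mass_index(weight_kg, height_m) for weight_kg, height_m in people]
    return sum(indices) / len(indices)
```

`f` and `body_mass_index` compute the same number. Only the second tells a reader what goes in and what comes out, without any extra notes.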


2. Automate repetitive tasks

• Rely on the computer to repeat tasks

• Save recent commands in a file for reuse; this could be as simple as using Make.

• Use a build tool to automate your scientific workflows
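Build tools such as Make decide what to rerun by comparing file modification times. A target is rebuilt only if it is missing, or if it is older than one of the files it depends on. The Python sketch below imitates just that rule to show the idea. `is_stale` and `run_step` are hypothetical helpers, not part of Make or any other tool; in practice a real build tool is the better choice.

```python
import os
from typing import Callable, Sequence


# Hypothetical helpers that imitate Make's out-of-date rule; not any tool's real API.
def is_stale(target: str, sources: Sequence[str]) -> bool:
    """True if target is missing or older than any of its source files."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


def run_step(target: str, sources: Sequence[str], build: Callable[[], None]) -> bool:
    """Run build() only when target is out of date; return True if it ran."""
    if is_stale(target, sources):
        build()
        return True
    return False
```

For example, `run_step("clean.csv", ["raw.csv"], clean_data)` would call a hypothetical `clean_data()` only when `raw.csv` has changed since `clean.csv` was last written.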

3. Use the computer to record history

• Software tools should be used to track computational work automatically. It is already possible to record:

• Unique identifiers and version numbers for raw data records, programs and libraries
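One way to let the computer record this history is to write a small provenance file alongside each set of results. The Python sketch below makes several assumptions:

- It needs Python 3.8 or later, for `importlib.metadata`.
- The function names and the choice of fields are our own.
- SHA-256 checksums stand in for whatever identifiers your data repository actually issues.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from typing import Dict, Iterable


# Hypothetical provenance helpers; the field names are our own choice.
def sha256_of(path: str) -> str:
    """Return the SHA-256 checksum of a file, an identifier for its exact contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def provenance_record(data_files: Iterable[str], packages: Iterable[str], parameters: Dict) -> Dict:
    """Collect identifiers and version numbers describing one analysis run."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "script": sys.argv[0] if sys.argv else "",
        "data_checksums": {path: sha256_of(path) for path in data_files},
        "package_versions": {name: package_version(name) for name in packages},
        "parameters": parameters,
    }


def write_provenance(record: Dict, path: str) -> None:
    """Save the record as JSON next to the results it describes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A script might call `write_provenance(provenance_record(["raw.csv"], ["numpy"], {"alpha": 0.05}), "results/provenance.json")` at the end of each run. The file names and parameters here are placeholders.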


4. Make incremental changes

• Work in small steps with frequent feedback and course correction. At each stage, check that the incomplete code is working correctly

5. Use version control

Keeping changes as successive versions means that work can be reverted to an earlier state and developed collaboratively.

• Use a standard version control system (VCS)

• Everything that has been created manually should be put in version control


Wilson et al. (2013)

6. Don’t repeat yourself (or others)

Programmers use the DRY (“don’t repeat yourself”) principle to avoid re-analysing data and rewriting code:

• Every piece of data must have a single authoritative representation in the system

• At small scales, code should be modularized rather than copied and pasted
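In Python, the two DRY points might look like the sketch below. A named constant is the single authoritative value. A shared function replaces a loop that would otherwise be pasted into each site's analysis. The water-quality scenario, the limit, and every name are hypothetical.

```python
from typing import List, Optional

# Single authoritative representation: the limit is defined once, here.
DETECTION_LIMIT_MG_PER_L = 0.5  # hypothetical value, for illustration only


def censor_below_limit(values: List[float], limit: float = DETECTION_LIMIT_MG_PER_L) -> List[Optional[float]]:
    """Replace measurements below the detection limit with None."""
    return [value if value >= limit else None for value in values]


# Each (hypothetical) site reuses the one function instead of a pasted copy of the loop,
# so a bug fix or a new limit only has to be made in one place.
site_a = censor_below_limit([0.2, 0.7, 1.4])
site_b = censor_below_limit([0.9, 0.4])
```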


7. Plan for mistakes (they’re inevitable)

• Defensive programming: add assertions to programs to check their operation

Assertions ensure that if something goes wrong, the program halts immediately, which aids debugging. They are also executable documentation, i.e. they explain the program as well as checking its behaviour.

• Automated testing: check that a single unit of code returns correct results, or that the behaviour of a program hasn’t changed

Use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way



• Use a variety of oracles: an oracle tells a developer how a program should behave or what its output should be

In research, oracles include analytical results, experimental results, and previous results from other tried-and-tested software.

• Turn bugs into test cases: write tests that trigger the bug, so that the bug cannot reappear later unnoticed

• Use a symbolic debugger, which allows you to pause a program, inspect variable values, and step through the code to find the problem
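The practices above can be combined in one short Python example, sketched under our own assumptions:

- A hypothetical trapezoid-rule integrator guards its inputs with assertions.
- Tests written with the standard-library `unittest` module compare the integrator against analytical oracles.
- The regression test pins down an imagined earlier bug, invented for illustration: dividing by `n - 1` instead of `n`, which broke the single-interval case.

```python
import unittest


# Hypothetical integrator used to illustrate assertions and automated tests.
def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    # Defensive programming: stop immediately on inputs that make no sense.
    assert n >= 1, "need at least one interval"
    assert a <= b, "expected a <= b"
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TestTrapezoid(unittest.TestCase):
    def test_straight_line_is_exact(self):
        # Oracle: the rule is exact for linear functions; the integral of 2x + 1 on [0, 2] is 6.
        self.assertAlmostEqual(trapezoid(lambda x: 2 * x + 1, 0.0, 2.0, 1), 6.0)

    def test_matches_analytical_result(self):
        # Oracle: the integral of x^2 on [0, 1] is exactly 1/3.
        self.assertAlmostEqual(trapezoid(lambda x: x * x, 0.0, 1.0, 1000), 1 / 3, delta=1e-6)

    def test_single_interval_regression(self):
        # Bug turned into a test: the (hypothetical) old version failed when n == 1.
        self.assertAlmostEqual(trapezoid(lambda x: x, 0.0, 1.0, 1), 0.5)

    def test_rejects_reversed_bounds(self):
        # The assertion should halt the program rather than return a wrong answer.
        with self.assertRaises(AssertionError):
            trapezoid(lambda x: x, 1.0, 0.0, 10)
```

Saved as, say, `test_integration.py`, the tests run with `python -m unittest test_integration`. Python strips `assert` statements when started with the `-O` flag. Checks that must always run are therefore safer as explicit `raise` statements.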


8. Optimize software only after it works correctly

In most cases, the most productive way of optimizing code is to get it working correctly, then identify areas that can be sped up.

• Use a profiler to identify bottlenecks in your code

• Write code in the highest-level language possible – you can always shift to a low-level language (like C or Fortran) if a performance boost is needed
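Python ships with a profiler, `cProfile`, whose report shows where a program spends its time. In the sketch below, the function names are hypothetical. A deliberately slow but correct function is profiled first. The faster rewrite is then checked against it, following the order the practice recommends: correct first, fast second.

```python
import cProfile
import io
import pstats


# Hypothetical functions used only to demonstrate profiling.
def pairwise_product_sum_slow(values):
    """Correct but quadratic: adds up a * b over every ordered pair of values."""
    total = 0
    for a in values:
        for b in values:
            total += a * b
    return total


def pairwise_product_sum_fast(values):
    """Same result in linear time: the sum over all pairs of a * b equals (sum of values) ** 2."""
    s = sum(values)
    return s * s


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


if __name__ == "__main__":
    data = list(range(300))
    slow_result, report = profile_call(pairwise_product_sum_slow, data)
    print(report)
    # Only swap in the optimized version once it reproduces the trusted result.
    assert pairwise_product_sum_fast(data) == slow_result
```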

9. Document design and purpose, not mechanics

• Refactor code instead of explaining how it works, i.e. rather than writing a paragraph to explain a complex piece of code, reorganize it so that it is self-explanatory
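Below is a before-and-after sketch, in Python, of refactoring in place of explanation; the record fields and the 2015 cut-off are hypothetical. In the first version, a comment has to spell out what the condition means. In the second, small named functions make that comment unnecessary, and the docstring is left to state the purpose.

```python
# Hypothetical example. Before: the comment explains what the expression does.
def keep(r):
    # keep rows with a positive count, a non-blank site code, collected in 2015 or later
    return r["n"] > 0 and r["site"].strip() != "" and r["year"] >= 2015


# After: each condition has a name, so the top-level function reads like the rule itself.
FIRST_STUDY_YEAR = 2015  # hypothetical start of the study period


def has_observations(record):
    return record["n"] > 0


def has_site_code(record):
    return record["site"].strip() != ""


def collected_during_study(record):
    return record["year"] >= FIRST_STUDY_YEAR


def is_usable(record):
    """A record is usable if it has observations, a site code, and falls in the study period."""
    return has_observations(record) and has_site_code(record) and collected_during_study(record)
```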


10. Collaborate

• Code reviews are the most cost-effective way of finding bugs in code

• Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems: one developer writes the code while the other provides real-time feedback

• In larger teams of developers, use an issue tracking tool to maintain a list of tasks to be performed and bugs to be fixed

