
Two Best Practices for Scientific Computing


Academic year: 2021


(1)

Version Control Systems & Automated Code Testing

David Love

Software Interest Group University of Arizona

(2)

Applied alumnus Carlos Chiquete posted this paper on Facebook [a]

All were great, but I’d never encountered many

[a] Justifying every second ever wasted on Facebook

(3)

Lead author Greg Wilson founded a group called Software Carpentry

They have many videos documenting best practices for scientific computing

(4)

Basics

Branching

Remote Repositories

2 Unit (and other) Testing

Assertions

(5)

Basics

Branching

Remote Repositories

(6)

What is a Version Control System?

Version Control Systems are pieces of software designed to:

Maintain a complete history of the state of a project

Works especially well with program code, LaTeX files, and anything else you can read in a text editor

Other file types aren’t stored as efficiently

Allow different versions (branches) to exist concurrently and independently

Provide tools to integrate changes from different branches

(7)

How It Works

Version Control Systems maintain a database of document versions, called a repository

Users “check out” files from the repository, change them, then “commit” those changes to the repository

The VCS checks whether two editors (or branches) have edited the same lines, notes the conflict, and makes you resolve it

Greatly reduces the chance that editors will overwrite each other accidentally

Changes will not get lost
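The check-out and commit cycle above can be sketched with a toy, in-memory repository. This is only an illustration in Python, not how Git or any real VCS stores data; the class name `ToyRepository` and the file names are assumptions made for the example.

```python
import hashlib


class ToyRepository:
    """A toy, in-memory model of a repository (not how Git is implemented)."""

    def __init__(self):
        self.snapshots = {}   # commit id -> (message, file contents)
        self.history = []     # commit ids, oldest first

    def commit(self, files, message):
        """Record a full snapshot of `files` (name -> text) and return its id."""
        payload = repr(sorted(files.items())) + message + str(len(self.history))
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.snapshots[commit_id] = (message, dict(files))
        self.history.append(commit_id)
        return commit_id

    def checkout(self, commit_id):
        """Return a copy of the files exactly as they were at `commit_id`."""
        _, files = self.snapshots[commit_id]
        return dict(files)


repo = ToyRepository()
first = repo.commit({"paper.tex": "Draft 1"}, "Start the paper")
second = repo.commit({"paper.tex": "Draft 2"}, "Revise the introduction")

# Earlier states stay recoverable after later commits.
old_files = repo.checkout(first)
new_files = repo.checkout(second)
```

Because every commit keeps its own snapshot, checking out `first` still returns the original draft even after the second commit.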

(8)

Best Use of Version Control

Best Practices for Scientific Computing

“In practice, everything that has been created manually should be put version control, including programs, original field observations, and the source files for papers. Automated output and intermediate files can be regenerated at need. Binary files (e.g., images and audio clips) may be stored in version control, but it is often more sensible to use an

(9)

Types of VCSs

There are two basic types of VCSs:

Centralized: Maintains the repository on a centralized server. Clients only check out specific versions of files.

CVS (Concurrent Versions System)

SVN (Subversion) (Software Carpentry teaches SVN)

Decentralized: Keeps a copy of the entire repository on every system. Any client could (potentially) act as a server.

Git (I will demonstrate Git)

(12)

Why Git?

A very popular distributed VCS

Does not require setting up a separate location to store the database

This makes being a single user easier

Supported on most popular code hosting services

Google Code, SourceForge, GitHub, Bitbucket

git svn uses Git locally but works with a Subversion server

Free & Open Source

(13)

Git Resources

1 Pro Git (used for this talk)
2 Version Control By Example
3 Top 10 Git Tutorials for Beginners
4 O’Reilly Webcast: Git in One Hour
5 Git+LaTeX Workflow The highest rated answer to this stack

(14)

Basic Configuration

Git stores your name and email and attaches them to your contributions

git config --global user.name "David Love"

git config --global user.email [email protected]

Name your favorite editor

git config --global core.editor vim

Select a diff & merge tool

git config --global merge.tool meld

(15)

Merge Tools

Open Source
1 Diffuse
2 Emerge (emacs)
3 gvimdiff (gvim)
4 KDiff3
5 Meld
6 tkdiff
7 TortoiseMerge
8 xxdiff

Free Commercial Software
1 opendiff (OS X developer tool)
2 P4Merge

Pay Software
1 Araxis Merge
2 Beyond Compare
3 ECMerge

(16)

Creating a Git Repository

To create a new repository:

1 Move to the directory with your files

2 git init

To clone an existing repository:

Use the git command clone

Format: git clone <url> [<directory>]

URLs can use the protocols git, http(s), ssh:

git clone git://github.com/schacon/grit.git

git clone http://github.com/schacon/grit.git

git clone

(17)

The File Status Lifecycle in Git

Pro Git Image 2-1

Command git status lists

Untracked files

Modified but unstaged files

Staged but uncommitted files

Moving within the lifecycle:

Stage files with git add <file>

(18)

Committing Changes to the Repository

When you commit changes to the repository, Git asks for a commit message

Git opens your favorite editor, and gives a (commented out) default message

Now, type a short message describing what you changed during this commit

Structuring Commits

(19)

Commit Information

Once committed, Git gives a message like

[master b05ca11] Commit message

1 file changed, 3 insertions(+), 2 deletions(-)

master: Branch name

b05ca11: SHA-1 hash key (abbreviated)

Commit message: Your commit message

1 file changed: Number of files changed

3 insertions(+): Number of lines inserted

2 deletions(-): Number of lines deleted

Git identifies each commit by a 40-digit hexadecimal SHA-1 hash key
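To make the 40-digit key concrete, here is a small Python sketch that reproduces how Git hashes the contents of a file (a "blob"): it takes the SHA-1 of a short header followed by the data. Commit objects are hashed the same way with a different header. The function name `git_blob_hash` and the sample content are assumptions for illustration.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 that Git assigns to a file ("blob") with these contents."""
    # Git hashes "<type> <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


full_hash = git_blob_hash(b"zLower = c*x\n")
short_hash = full_hash[:7]          # abbreviated form, like b05ca11 above
empty_hash = git_blob_hash(b"")     # hash of an empty file
```

Since the key depends only on the content, two identical files always receive the same key, and any change to the content produces a different one.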

(20)

Committing all changes

git commit -a commits every modified tracked file without a separate git add

(21)

Viewing Changes

git diff Prints the differences between the modified file in your working directory and its staged version (or the most recent commit, if the file has not been staged)

git difftool Uses the merge tool to highlight the differences

--cached Modifies either command to show differences between the staged file and the most recent commit

(22)

Viewing the Commit History

Commit Log

git log shows the commit history in reverse chronological order.

Default information:

Commit hash

Author

Date & time committed

Commit message

Options:

-<number> Latest <number> entries, e.g., git log -4

--pretty=oneline Abbreviates to one line of output

--since= Look at commits since some time, e.g., yesterday, 1.week, "2 months", 2013/02/01, 02/01/2013

(23)

Undoing Changes

Changing Your Last Commit

You can replace your previous commit with a new, amended commit using git commit --amend.

Unstaging a Staged File

A file can be unstaged with git reset HEAD <file>

Unmodifying a Modified File

You can discard uncommitted modifications to a file with git checkout -- <file>

(24)

What is a Branch?

In Git and other VCSs, a branch is an independent line of development. In Git, a branch is a lightweight, movable pointer to a commit rather than a separate copy of the files

(25)

Basic Branch Commands in Git

The basic branch operations:

List branches git branch

Create branch git branch <branch name>

Check out branch git checkout <branch name>

Merge into current branch git merge <branch name>

(26)

Data Storage in Git

Pro Git Image 1-5

(27)

Data Storage in Git

Pro Git Image 3-2

Data Git stores about a commit, including the hash, the author, commit message etc. Horizontal arrows are pointers pointing to the

(28)

Branching in Git, Conceptually

Pro Git Images 3-3 through 3-9

(35)

Branch Merging, Conceptually

Pro Git Image 3-10

You want to fix issue #53. Next: Create a branch for that purpose

(36)

Branch Merging, Conceptually

Pro Git Image 3-11

(37)

Branch Merging, Conceptually

Pro Git Image 3-12

You stumble upon a bug that needs to be fixed immediately. Go back to master so your partial work on iss53 doesn’t get integrated too early. Commands to execute:

git checkout master

git checkout -b hotfix to create and immediately check out branch hotfix

Make a commit to fix the bug.

(38)

Branch Merging, Conceptually

Pro Git Image 3-13

After testing your work, you want to add the bug fix to master

(39)

Branch Merging, Conceptually

Pro Git Image 3-14

To merge hotfix into master:

1 git checkout master

2 git merge hotfix

Git responds with a message that includes "Fast forward", meaning Git simply moved the master label forward along the history of commits

Next: delete branch hotfix, since it is no longer needed

Next: go back to working on iss53
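The fast-forward case can be sketched with a toy commit graph in Python. This is a simplified model under stated assumptions: each commit has a single parent, and the commit names C0 to C4 are made up for illustration. A branch is just a name pointing at a commit, and a fast-forward is possible when the target branch's commit is an ancestor of the branch being merged.

```python
# Toy commit graph: each commit id maps to the id of its parent (None for the root).
parents = {"C0": None, "C1": "C0", "C2": "C1", "C3": "C2", "C4": "C2"}

# A branch is just a name pointing at one commit.
branches = {"master": "C2", "iss53": "C3", "hotfix": "C4"}


def is_ancestor(older, newer):
    """True if `older` is reachable by following parent links from `newer`."""
    current = newer
    while current is not None:
        if current == older:
            return True
        current = parents[current]
    return False


def can_fast_forward(target, source):
    """A fast-forward is possible when `target` is an ancestor of `source`."""
    return is_ancestor(branches[target], branches[source])


hotfix_ff = can_fast_forward("master", "hotfix")
if hotfix_ff:
    branches["master"] = branches["hotfix"]   # move the label; no new commit

# After the move, master has commits that iss53 does not contain.
iss53_ff = can_fast_forward("master", "iss53")
```

In this model the hotfix merge only moves the master label, while a later merge of iss53 cannot be a fast-forward because the two branches have diverged.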

(40)

Branch Merging, Conceptually

Pro Git Image 3-15

Want to merge iss53 into master

But master can’t just move up the commit history

Will do a three-way merge

(42)

Branch Merging, Conceptually

Pro Git Image 3-16

git merge iss53

Git analyzes the changes applied to the common ancestor by master and iss53

If master and iss53 made changes to the same lines, Git notes a conflict that must be resolved manually

(43)

Branch Merging, Conceptually

Pro Git Image 3-16

Git surrounds conflicts with standard conflict resolution markers:

Code between <<<<<<< and ======= is the code from HEAD (master)

Code between ======= and >>>>>>> is the code from the merging branch (iss53)
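A line-by-line three-way merge can be sketched in Python as follows. This is a deliberately simplified sketch, not Git's merge algorithm: it assumes all three versions have the same number of lines (no insertions or deletions), and the function name `three_way_merge` and the sample file contents are made up for illustration.

```python
def three_way_merge(base, ours, theirs, ours_name="HEAD", theirs_name="iss53"):
    """Merge three equally long lists of lines; return (merged_lines, had_conflict)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, had_conflict = [], False
    for b, o, t in zip(base, ours, theirs):
        if o == t:            # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:          # only their side changed the line
            merged.append(t)
        elif t == b:          # only our side changed the line
            merged.append(o)
        else:                 # both changed the same line differently
            had_conflict = True
            merged += ["<<<<<<< " + ours_name, o, "=======", t, ">>>>>>> " + theirs_name]
    return merged, had_conflict


base = ["title = 'Results'", "tol = 1e-6", "maxIter = 100"]
master = ["title = 'Results'", "tol = 1e-8", "maxIter = 100"]
iss53 = ["title = 'Final results'", "tol = 1e-4", "maxIter = 100"]

merged, had_conflict = three_way_merge(base, master, iss53)
```

Here the title change from iss53 merges cleanly because master left that line alone, while the `tol` line was edited on both sides and ends up wrapped in conflict markers.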

(44)

Branch Merging, Conceptually

Pro Git Image 3-16

Run git mergetool to use your merge tool to resolve the conflict

Git creates some files to help you merge the conflicts successfully:

file.local from the current branch (master)

file.base from the common ancestor

file.remote from the

(45)

Branch Merging, Conceptually

Pro Git Image 3-17

Git creates a merge commit once the conflicts are resolved (or if no conflicts)

Note: after resolving a conflict, you must then run git commit to generate the merge commit.

(46)

Git Branching

Comparing Branches

git difftool branch shows differences between the current branch and branch using the merge tool

Double Dot Notation

For branches A and B, A..B selects all commits in the history of B since splitting from A

git log A..B gives all commit messages in B since splitting from A

Triple Dot Notation

A...B selects commits on both branches since splitting
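The two notations can be read as set operations on the commits reachable from each branch. The Python sketch below uses a made-up history (commit names and the `reachable` helper are assumptions for illustration): A..B is the set of commits reachable from B but not from A, and A...B is the set reachable from exactly one of the two.

```python
# Toy history: commit id -> list of parent ids. A and B split after C2.
parents = {
    "C1": [], "C2": ["C1"],
    "A1": ["C2"], "A2": ["A1"],      # commits only on branch A
    "B1": ["C2"], "B2": ["B1"],      # commits only on branch B
}
tips = {"A": "A2", "B": "B2"}


def reachable(tip):
    """Return the set of all commits reachable from `tip`, including itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


in_a, in_b = reachable(tips["A"]), reachable(tips["B"])

double_dot = in_b - in_a   # A..B: reachable from B but not from A
triple_dot = in_a ^ in_b   # A...B: reachable from exactly one of the two
```

The shared commits C1 and C2 appear in neither result, which is why git log A..B lists only the work done on B since the split.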

(49)

Remote Repositories

Git can connect to remote repositories over networks to collaborate with others

origin repository

When you clone from a remote source, the remote repository is automatically added to your local repository and named origin

git remote List remote repositories

git remote -v List remote repositories with more information

git remote add Add a new remote repository

git remote rename Rename a remote repository

(50)

Remote Branches

Remote repositories have their own branches that you can examine and merge with

Remote Branch Names

Remote branches have names <repository>/<branch>, e.g., origin/master

git branch -r Show remote branches

(51)

Getting Updates from a Remote Repository

Two options to get data from a remote repository:

git fetch origin Updates remote branch from origin. Does not change any local branches.

git pull origin Updates remote branch from origin. Tries to merge these changes into your local branch. You will have to resolve any conflicts

(52)

Adding Updates to a Remote Repository

One command updates a remote branch with your local copy:

git push origin master Update the master branch on origin with your local copy of master

If no one has made changes to origin since your last pull, the push will go through.

If someone else has pushed to origin, Git will prevent you from pushing your changes.

You must first merge the changes in the local repository before pushing the new code.

1 Use git pull to merge the changes into your copy

  1 git mergetool to resolve any conflicts

  2 git commit to generate the merge commit

(53)

Workflow with Remote Git

1 Pull changes to start your work time

  1 Read the logs of changes made

2 Create local branches to make your changes

3 Once they are correct, merge your local changes back together

4 Push the changes back to the server

  1 If rejected, pull to merge changes

(54)

Sync in GitHub: Windows and Mac

Github Sync

(55)

2 Unit (and other) Testing

Assertions

(56)

Assertions

Assertion

An assertion is a statement that something is true at a particular point in a program. If the statement is false, the program will halt immediately.

Assertions can be used to ensure that:

1 Inputs are valid

2 Program or function outputs are consistent
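Both uses can be sketched in a few lines of Python. The function `normalize` and its data are hypothetical, chosen only to show one assertion guarding the inputs and one checking the output.

```python
def normalize(weights):
    """Scale nonnegative weights so that they sum to one."""
    # Input checks: fail immediately if the caller passes something invalid.
    assert len(weights) > 0, "weights must not be empty"
    assert all(w >= 0 for w in weights), "weights must be nonnegative"
    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    result = [w / total for w in weights]

    # Output check: the result should be consistent with what we promised.
    assert abs(sum(result) - 1.0) < 1e-12, "normalized weights must sum to 1"
    return result


probabilities = normalize([1.0, 1.0, 2.0])

# An invalid input stops the program at the failing assertion.
try:
    normalize([1.0, -1.0])
    rejected = False
except AssertionError:
    rejected = True
```

Note that Python skips `assert` statements when run with the `-O` option, so assertions of this kind are a development aid rather than a substitute for error handling on untrusted input.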

(57)

Example: Assertions in Matlab

My code has a lower bound zLower that should be uniformly nondecreasing as the algorithm progresses

It is updated with zLower = c*x

I use an assertion to ensure the bound never decreases when zLower is updated

Code example:
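A minimal Python sketch of this kind of check follows (the talk's own example is in MATLAB). The function name `update_lower_bound`, the tolerance `tol`, and the sample iterates are assumptions made for illustration; only the update zLower = c*x and the nondecreasing requirement come from the slide.

```python
def update_lower_bound(z_lower, c, x, tol=1e-9):
    """Return the new bound c*x, asserting that it did not drop below the old one."""
    new_bound = sum(ci * xi for ci, xi in zip(c, x))
    # The small tolerance allows for floating point rounding in the product.
    assert new_bound >= z_lower - tol, (
        f"lower bound decreased from {z_lower} to {new_bound}"
    )
    return new_bound


c = [2.0, 3.0]
z_lower = float("-inf")
for x in ([1.0, 0.0], [1.0, 1.0], [2.0, 1.0]):   # made-up iterates
    z_lower = update_lower_bound(z_lower, c, x)
```

If a later iterate ever produced a smaller value of c*x, the assertion would stop the run at that exact update instead of letting the error surface much later.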

(58)

Runtime Testing

Best Practices in Scientific Computing

“Assertions can make up a sizable fraction of the code in well-written applications, just as tools for calibrating scientific instruments can make up a sizable fraction of the equipment in a lab.”

If something goes wrong, the code halts immediately, greatly simplifying debugging

Best Practices in Scientific Computing

“Assertions are executable documentation, i.e., they explain the

(59)

Automated Testing

Best Practices for Scientific Computing

“[R]egression testing is the practice of running pre-existing tests after changes to the code in order to make sure that it hasn’t regressed, i.e., that things which were working haven’t been broken.”

The next line of defense is Automated Testing:

Unit Test: Tests a single unit of a program, e.g., a function or method

(60)

Kinds of Test Cases

Oracles: Anything that tells you how a program should be working

1 Closed form solutions to special cases

2 Simple/small cases of the problem

3 Older versions of the code

1 Slow, simple algorithm to test complicated, fast algorithm

2 High level implementation to test lower level code (e.g., MATLAB to C++)
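As a sketch of the "slow, simple algorithm as oracle" idea, the Python example below checks a fast routine against a brute-force one on random inputs. The problem chosen (largest sum of a contiguous run of numbers) and both function names are assumptions made purely for illustration.

```python
import random


def max_subarray_slow(values):
    """Oracle: try every contiguous slice. Easy to trust, but slow."""
    return max(sum(values[i:j])
               for i in range(len(values))
               for j in range(i + 1, len(values) + 1))


def max_subarray_fast(values):
    """Code under test: Kadane's algorithm, a single pass over the data."""
    best = current = values[0]
    for v in values[1:]:
        current = max(v, current + v)
        best = max(best, current)
    return best


rng = random.Random(0)   # fixed seed so the test is repeatable
mismatches = 0
for _ in range(200):
    data = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
    if max_subarray_fast(data) != max_subarray_slow(data):
        mismatches += 1
```

The inputs are kept small so that the slow oracle stays cheap; the fast version can then be trusted on the large inputs the oracle could never handle.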

(61)

MATLAB xUnit Test Framework

xUnit is a framework for writing unit tests

It has been implemented for almost any language you can think of

MATLAB xUnit Test Framework

(62)

Building tests with xUnit

xUnit tests have the same basic structure:

input = ...

expectedOutput = ...

realOutput = YourCode( input );

assertEqual( expectedOutput, realOutput );

Define the input and expected output (perhaps for multiple cases)

Run your code for each input value
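The same structure carries over to other xUnit-style frameworks. Below is a sketch using Python's standard `unittest` module rather than the MATLAB framework from the talk; `your_code` is a hypothetical function under test (here, the mean of a list), and the cases are made up.

```python
import unittest


def your_code(values):
    """Hypothetical function under test: the mean of a list of numbers."""
    return sum(values) / len(values)


class YourCodeTest(unittest.TestCase):
    def test_known_cases(self):
        # (input, expected output) pairs, one per case
        cases = [([1, 2, 3], 2.0), ([4], 4.0), ([-1, 1], 0.0)]
        for given, expected_output in cases:
            real_output = your_code(given)
            self.assertEqual(expected_output, real_output)


# Run the test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(YourCodeTest)
result = unittest.TestResult()
suite.run(result)
all_passed = result.wasSuccessful()
```

In everyday use the tests would live in their own file and be run from the command line with `python -m unittest`; the programmatic run above just keeps the sketch self-contained.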

(63)

xUnit Assertions

assertEqual(A,B) A and B are equal.

assertElementsAlmostEqual Elements of floating point matrices A and B are within some (absolute or relative) tolerance

assertVectorsAlmostEqual norm(A-B) is within some (absolute or relative) tolerance of zero

assertTrue, assertFalse Check Boolean values

assertFilesEqual Checks that files are the same
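Why the "almost equal" assertions exist can be seen in a short Python sketch. It uses the standard library's `math.isclose` as a stand-in for the MATLAB assertions listed above; the helper `vectors_almost_equal` and the tolerance values are assumptions for illustration.

```python
import math

a = 0.1 + 0.2
b = 0.3

exactly_equal = (a == b)                             # False: rounding error
close_relative = math.isclose(a, b, rel_tol=1e-9)    # relative tolerance
close_absolute = math.isclose(a, b, rel_tol=0.0, abs_tol=1e-12)   # absolute tolerance


def vectors_almost_equal(u, v, abs_tol=1e-12):
    """Compare two vectors through the Euclidean norm of their difference."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v))) <= abs_tol


vectors_close = vectors_almost_equal([a, 1.0], [b, 1.0])
```

A relative tolerance scales with the size of the numbers being compared, while an absolute tolerance is the better choice when the expected value is at or near zero.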

(64)

Running tests with xUnit

With MATLAB xUnit Test Framework:

Write your tests in their own directory

Write each test case as an M-file function that returns no output arguments

The function name should start or end with test or Test

Go to the test directory

Run all tests with runtests

(65)

Test Driven Development

Test Driven Development

Broadly speaking, TDD is the practice of writing the test cases for new software before the software is written.

Benefits:

Helps to clarify the purpose of the program before coding begins

Tends to create more modular and extensible code

Helps ensure tests are actually written!

Possible drawbacks:

May include poorly written tests

May create false confidence

(66)

References
