Version Control Systems & Automated Code Testing
David Love
Software Interest Group University of Arizona
Applied alumnus, Carlos Chiquete, posted this paper on Facebook (a)
All were great, but I’d never encountered many
(a) Justifying every second ever wasted on Facebook
Lead author Greg Wilson founded a group called Software Carpentry
They have many videos documenting best practices for scientific computing
Basics
Branching
Remote Repositories
2 Unit (and other) Testing
Assertions
What is a Version Control System?
Version Control Systems are pieces of software designed to:
Maintain a complete history of the state of a project
Works especially well with program code, LaTeX files, and anything else you can read in a text editor
Other file types aren’t stored as efficiently
Allow different versions (branches) to exist concurrently and independently
Provide tools to integrate changes from different branches
How It Works
Version Control Systems maintain a database of document versions, called a repository
Users “check out” files from the repository, change them, then “commit” those changes to the repository
The VCS checks whether two editors (or branches) have edited the same lines, notes the conflict, and makes you resolve it
Greatly reduces the chance that editors will overwrite each other accidentally
Changes will not get lost
Best Use of Version Control
Best Practices for Scientific Computing
“In practice, everything that has been created manually should be put in version control, including programs, original field observations, and the source files for papers. Automated output and intermediate files can be regenerated at need. Binary files (e.g., images and audio clips) may be stored in version control, but it is often more sensible to use an [...]”
Types of VCSs
There are two basic types of VCSs:
Centralized: Maintains the repository on a centralized server. Clients only check out specific versions of files.
CVS (Concurrent Versions System)
SVN (Subversion) (Software Carpentry teaches SVN)
Decentralized: Keeps a copy of the entire repository on every system. Any client could (potentially) act as a server.
Git (I will demonstrate Git)
Why Git?
A very popular distributed VCS
Does not require setting up a separate location to store the database
This makes being a single user easier
Supported on most popular code hosting services
Google Code, SourceForge, GitHub, Bitbucket
git svn uses Git locally but works with a Subversion server
Free & Open Source
Git Resources
1 Pro Git (used for this talk)
2 Version Control By Example
3 Top 10 Git Tutorials for Beginners
4 O’Reilly Webcast: Git in One Hour
5 Git+LaTeX Workflow: The highest rated answer to this stack
Basic Configuration
Git stores your name and email and attaches them to your contributions
git config --global user.name "David Love"
git config --global user.email <your email address>
Name your favorite editor
git config --global core.editor vim
Select a diff & merge tool
git config --global merge.tool meld
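A minimal sketch of the same settings, assuming you want to try them without touching your global configuration. It uses a throwaway repository and --local instead of --global; the name and email are placeholders, not values from the talk.

```shell
# Throwaway repository, so these settings never touch your global configuration
repo=$(mktemp -d)
cd "$repo"
git init -q

# --local writes to this repository's .git/config instead of ~/.gitconfig
git config --local user.name "Example User"        # placeholder name
git config --local user.email "user@example.com"   # placeholder email
git config --local core.editor vim
git config --local merge.tool meld

# Read the values back
configured_name=$(git config --get user.name)
configured_tool=$(git config --get merge.tool)
echo "$configured_name uses $configured_tool"
```

Settings made with --local apply only inside that repository and take precedence over --global ones.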
Merge Tools
Open Source
1 Diffuse
2 Emerge (emacs)
3 gvimdiff (gvim)
4 KDiff3
5 Meld
6 tkdiff
7 TortoiseMerge
8 xxdiff
Free Commercial Software
1 opendiff (OS X developer tool)
2 P4Merge
Pay Software
1 Araxis Merge
2 Beyond Compare
3 ECMerge
Creating a Git Repository
To create a new repository:
1 Move to the directory with your files
2 git init
To clone an existing repository, use the git command clone
Format: git clone <url> [<directory>]
URLs can use the protocols git, http(s), and ssh:
git clone git://github.com/schacon/grit.git
git clone http://github.com/schacon/grit.git
git clone
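Both ways of getting a repository can be tried offline. The sketch below assumes a temporary directory and clones from a local path instead of a network URL; the directory names, file name, and identity are made up for illustration.

```shell
work=$(mktemp -d)

# 1. Create a new repository in a directory that holds your files
mkdir "$work/project"
cd "$work/project"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# 2. Clone it; a local path works where a git://, http(s):// or ssh URL would go
cd "$work"
git clone -q "$work/project" project-copy
cloned_file=$(cat project-copy/notes.txt)
echo "$cloned_file"
```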
The File Status Lifecycle in Git
Pro Git Image 2-1
The command git status lists:
Untracked files
Modified but unstaged files
Staged but uncommitted files
Moving within the lifecycle:
Stage files with git add <file>
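The lifecycle can be watched with git status --short, which prints a two-letter code per file. This is a sketch in a throwaway repository; paper.tex and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "draft" > paper.tex
untracked=$(git status --short)    # "?? paper.tex": untracked

git add paper.tex
staged=$(git status --short)       # "A  paper.tex": staged, not yet committed

git commit -q -m "Add paper draft"
echo "revised" >> paper.tex
modified=$(git status --short)     # " M paper.tex": modified, not staged

printf '%s\n' "$untracked" "$staged" "$modified"
```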
Committing Changes to the Repository
When you commit changes to the repository, Git asks for a commit message
Git opens your favorite editor, and gives a (commented out) default message
Now, type a short message describing what you changed during this commit
Structuring Commits
Commit Information
Once committed, Git gives a message like
[master b05ca11] Commit message
1 file changed, 3 insertions(+), 2 deletions(-)
master                Branch name
b05ca11               SHA-1 hash key (abbreviated)
Commit message        Your commit message
1 file changed        Number of files changed
3 insertions(+)       Number of lines inserted
2 deletions(-)        Number of lines deleted
Git stores commits by a 40-digit SHA-1 hash key
Committing all changes
git commit -a stages every modified tracked file and commits it, without a separate git add
Viewing Changes
git diff Prints the differences between the modified files in the working directory and the staging area (which matches the most recent commit if nothing is staged)
git difftool Shows the same differences in the configured diff/merge tool
--cached Modifies either command to show the differences between the staged files and the most recent commit
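A short sketch of how git diff and git diff --cached differ when a file has both staged and unstaged edits. The repository, file name, and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "line two" >> notes.txt
git add notes.txt                 # stage "line two"
echo "line three" >> notes.txt    # leave "line three" unstaged

unstaged=$(git diff)              # working directory vs. staging area
staged=$(git diff --cached)       # staging area vs. most recent commit

echo "$unstaged"
echo "$staged"
```

Here git diff reports only "line three" as added, while git diff --cached reports only "line two".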
Viewing the Commit History
Commit Log
git log shows the commit history in reverse chronological order. Default information:
Commit hash
Author
Date & time committed
Commit message
Options:
-<number> Latest <number> entries, e.g., git log -4
--pretty=oneline Abbreviates to one line of output
--since= Look at commits since some time, e.g., yesterday, 1.week, "2 months", 2013/02/01, 02/01/2013
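The options can be combined. The sketch below builds a three-commit history in a throwaway repository (file name, messages, and identity are placeholders) and queries it.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Build a small history of three commits
for n in 1 2 3; do
    echo "change $n" >> log.txt
    git add log.txt
    git commit -q -m "Change $n"
done

latest_two=$(git log -2 --pretty=oneline)          # newest two commits, one line each
recent=$(git log --since=1.week --pretty=oneline)  # everything from the last week

echo "$latest_two"
git log -1    # default format: hash, author, date, message
```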
Undoing Changes
Changing Your Last Commit
You can replace your previous commit with a new one using git commit --amend.
Unstaging a Staged File
A file can be unstaged with git reset HEAD <file>
Unmodifying a Modified File
You can discard modifications to a file with git checkout -- <file>. The discarded changes cannot be recovered.
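The three undo operations, sketched in a throwaway repository. File names, commit messages, and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Changing your last commit: add a forgotten file and reword the message
echo "extra" > extra.txt
git add extra.txt
git commit -q --amend -m "Add notes and extra file"
commit_count=$(git rev-list --count HEAD)   # still a single commit

# Unstaging a staged file
echo "scratch" >> notes.txt
git add notes.txt
git reset -q HEAD notes.txt
after_reset=$(git status --short)           # " M notes.txt": modified, unstaged

# Unmodifying a modified file (the edit is lost for good)
git checkout -- notes.txt
after_checkout=$(git status --short)        # empty: nothing left to commit
```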
What is a Branch?
In Git and other VCSs, a branch is an independent line of development. In Git, a branch is implemented as a lightweight, movable pointer to a commit
Basic Branch Commands in Git
The basic branch operations:
List branches git branch
Create branch git branch <branch name>
Check out branch git checkout <branch name>
Merge into current branch git merge <branch name>
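The four commands in sequence, as a sketch in a throwaway repository. The branch name experiment, the file, and the identity are made up; git symbolic-ref is used only to make sure the first branch is called master regardless of local defaults.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

git branch experiment            # create branch
git checkout -q experiment       # check out branch
echo "new idea" >> file.txt
git commit -q -a -m "Try a new idea"

git checkout -q master
git merge -q experiment          # merge into current branch
git branch                       # list branches; * marks the current one
current=$(git symbolic-ref --short HEAD)
```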
Data Storage in Git
Pro Git Image 1-5
Pro Git Image 3-2
Data Git stores about a commit, including the hash, the author, commit message etc. Horizontal arrows are pointers pointing to the
Branching in Git, Conceptually
Pro Git Images 3-3 through 3-9
Branch Merging, Conceptually
Pro Git Image 3-10
You want to fix issue #53. Next: Create a branch for that purpose
Pro Git Image 3-11
Pro Git Image 3-12
You stumble upon a bug that needs to be fixed immediately. Go back to master so your partial work on iss53 doesn’t get integrated too early. Commands to execute:
git checkout master
git checkout -b hotfix to create and immediately check out branch hotfix
Make a commit to fix the bug.
Pro Git Image 3-13
After testing your work, you want to add the bug fix to master
Pro Git Image 3-14
To merge hotfix into master:
1 git checkout master
2 git merge hotfix
Git responds with a message that includes "Fast-forward", meaning Git simply moved the master label forward along the history of commits
Next: delete branch hotfix, since it is no longer needed
Next: Go back to working on iss53
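The hotfix scenario can be replayed end to end. This sketch assumes a throwaway repository; the file names, commit messages, and identity are invented for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "site" > index.html
git add index.html
git commit -q -m "Base"

# Start work on the issue
git checkout -q -b iss53
echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "Start work on issue 53"

# Urgent bug: branch from master, not from the unfinished iss53
git checkout -q master
git checkout -q -b hotfix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix urgent bug"
hotfix_tip=$(git rev-parse hotfix)

# Merge the fix; master only moves forward, so no merge commit is created
git checkout -q master
git merge hotfix

git branch -d hotfix             # no longer needed
git checkout -q iss53            # back to the interrupted work
```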
Pro Git Image 3-15
Want to merge iss53 into master
But master can’t just move up the commit history
Will do a three-way merge
Pro Git Image 3-16
git merge iss53
Git analyzes the changes applied to the common ancestor by master and iss53
If master and iss53 made changes to the same lines, Git notes a conflict that must be resolved manually
Git surrounds conflicts with standard conflict resolution markers:
Code between <<<<<<< and ======= is the code from HEAD (master)
Code between ======= and >>>>>>> is the code from the merging branch (iss53)
Run git mergetool to use your merge tool to resolve the conflict
Git creates some files to help you merge the conflicts successfully:
file.local from the current branch (master)
file.base from the common ancestor
file.remote from the merging branch (iss53)
Pro Git Image 3-17
Git creates a merge commit once the conflicts are resolved (or if there are no conflicts)
Note: after resolving a conflict, you must stage the resolved files and then run git commit to generate the merge commit.
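A sketch of a conflicting three-way merge, resolved by hand instead of with git mergetool so that it runs unattended. The file settings.txt, its contents, and the identity are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Base"

# Both branches change the same line
git checkout -q -b iss53
echo "color = green" > settings.txt
git commit -q -a -m "Use green"
git checkout -q master
echo "color = red" > settings.txt
git commit -q -a -m "Use red"

# The merge stops with a conflict and a non-zero exit status
if git merge iss53 >/dev/null 2>&1; then
    merge_status="clean"
else
    merge_status="conflict"
fi
cat settings.txt    # shows the <<<<<<<, ======= and >>>>>>> markers
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' settings.txt)

# Resolve (here: keep master's value), stage the file, then commit the merge
echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Merge branch 'iss53'"
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')   # commit + 2 parents
```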
Git Branching
Comparing Branches
git difftool <branch> shows the differences between your current working files and <branch> using the merge tool
Double Dot Notation
For branches A and B, A..B selects all commits in the history of B since splitting from A
git log A..B gives all commit messages in B since splitting from A
Triple Dot Notation
A...B selects commits on both branches since splitting
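The two notations can be compared on a small history. This sketch uses empty commits to keep it short; the branch names A and B follow the text, while the commit messages and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Base"
base=$(git rev-parse HEAD)

# Branch A gets one commit after the split
git checkout -q -b A "$base"
git commit -q --allow-empty -m "A1"

# Branch B gets two commits after the split
git checkout -q -b B "$base"
git commit -q --allow-empty -m "B1"
git commit -q --allow-empty -m "B2"

only_b=$(git log --pretty=format:%s A..B)    # commits on B but not on A
either=$(git log --pretty=format:%s A...B)   # commits on A or B, but not on both

echo "A..B:";  echo "$only_b"
echo "A...B:"; echo "$either"
```

The shared commit "Base" appears in neither listing, because it is reachable from both branches.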
Remote Repositories
Git can connect to remote repositories over networks to collaborate with others
origin repository
When you clone from a remote source, the remote repository is automatically added to your local repository and named origin
git remote List remote repositories
git remote -v List remote repositories with more information
git remote add Add a new remote repository
git remote rename Rename a remote repository
Remote Branches
Remote repositories have their own branches that you can examine and merge with
Remote Branch Names
Remote branches have names <repository>/<branch>, e.g., origin/master
git branch -r Show remote branches
Getting Updates from a Remote Repository
Two options to get data from a remote repository:
git fetch origin Updates the remote branches from origin. Does not change any local branches.
git pull origin Updates the remote branches from origin, then tries to merge these changes into your local branch. You will have to resolve any conflicts.
Adding Updates to a Remote Repository
One command to update a remote branch with your local copy
git push origin master Update the master branch on origin with your local copy of master
If no one has made changes to origin since your last pull, the push will go through.
If someone else has pushed to origin, Git will prevent you from pushing your changes.
You must first merge the changes in the local repository before pushing the new code.
1 Use git pull to merge the changes into your copy
  1 git mergetool to resolve any conflicts
  2 git commit to generate the merge commit
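The rejected-push case can be reproduced on one machine. This sketch assumes a bare repository on the local disk standing in for the server and two clones named alice and bob; all names are hypothetical, and make_clone is a helper defined only for this example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master

make_clone() {   # hypothetical helper: clone the server, set a placeholder identity
    git clone -q "$work/server.git" "$work/$1" 2>/dev/null
    git -C "$work/$1" config user.name "$1"
    git -C "$work/$1" config user.email "$1@example.com"
}

# Alice publishes the first commit
make_clone alice
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
echo "first line" > data.txt
git add data.txt
git commit -q -m "Add data file"
git push -q origin master

# Bob clones, commits, and pushes first
make_clone bob
cd "$work/bob"
echo "bob's notes" > bob.txt
git add bob.txt
git commit -q -m "Add Bob's notes"
git push -q origin master

# Alice commits without pulling; her push is rejected
cd "$work/alice"
echo "alice's notes" > alice.txt
git add alice.txt
git commit -q -m "Add Alice's notes"
if git push -q origin master 2>/dev/null; then
    push_result="accepted"
else
    push_result="rejected"
fi
echo "first push: $push_result"

# Pull (fetch + merge), then push again
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git log --oneline --graph
```

The two commits touch different files, so the pull merges without conflicts; otherwise the mergetool and commit steps above would be needed before the second push.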
Workflow with Remote Git
1 Pull changes at the start of your work session
  1 Read the logs of changes made
2 Create local branches to make your changes
3 Once they are correct, merge your local changes back together
4 Push the changes back to the server
  1 If rejected, pull to merge changes
Sync in GitHub: Windows and Mac
GitHub Sync
2 Unit (and other) Testing
Assertions
Assertion
An assertion is a statement that something is true at a particular point in a program. If the statement is false, the program will halt immediately.
Assertions can be used to ensure that:
1 Inputs are valid
2 Program or function outputs are consistent
Example: Assertions in MATLAB
My code has a lower bound zLower that should be nondecreasing as the algorithm progresses
It is updated with zLower = c*x
I use an assertion to ensure that each update to zLower does not decrease the bound
Code example:
Runtime Testing
Best Practices for Scientific Computing
“Assertions can make up a sizable fraction of the code in well-written applications, just as tools for calibrating scientific instruments can make up a sizable fraction of the equipment in a lab.”
If something goes wrong, the code halts immediately, greatly simplifying debugging
Best Practices for Scientific Computing
“Assertions are executable documentation, i.e., they explain the
Automated Testing
Best Practices for Scientific Computing
“[R]egression testing is the practice of running pre-existing tests after changes to the code in order to make sure that it hasn’t regressed, i.e., that things which were working haven’t been broken.”
The next line of defense is Automated Testing:
Unit Test Tests a single unit of a program, e.g., a function or method
Kinds of Test Cases
Oracles: Anything that tells you how a program should be working
1 Closed form solutions to special cases
2 Simple/small cases of the problem
3 Older versions of the code
1 Slow, simple algorithm to test complicated, fast algorithm
2 High level implementation to test lower level code (e.g., MATLAB to C++)
MATLAB xUnit Test Framework
xUnit is a framework for writing unit tests
It has been implemented for almost any language you can think of
Building tests with xUnit
xUnit tests have the same basic structure:
input = ...
expectedOutput = ...
realOutput = YourCode( input );
assertEqual( expectedOutput, realOutput );
Define the input and expected output (perhaps for multiple cases)
Run your code for each input value
xUnit Assertions
assertEqual(A,B) Checks that A and B are equal
assertElementsAlmostEqual Checks that elements of floating point matrices A and B are within some (absolute or relative) tolerance
assertVectorsAlmostEqual Checks that norm(A-B) is within some (absolute or relative) tolerance of zero
assertTrue, assertFalse Check Boolean values
assertFilesEqual Checks that files are the same
Running tests with xUnit
With MATLAB xUnit Test Framework:
Write your tests in their own directory
Write each test case as an M-file function that returns no output arguments
The function name should start or end with test or Test
Go to the test directory
Run all tests with runtests
Test Driven Development
Broadly speaking, TDD is the practice of writing the test cases for new software before the software is written.
Benefits:
Helps to clarify the purpose of the program before coding begins
Tends to create more modular and extensible code
Helps ensure tests are actually written!
Possible drawbacks:
May include poorly written tests
May create false confidence