Effective programming practices for economists
2. Version Control
Hans-Martin von Gaudecker
Department of Economics, Universität Mannheim
Problem 1: Collaboration
I Collaboration is becoming increasingly important:
I Science: [. . . ] half of EU research articles had international co-authors in 2007, more than twice the level of two decades ago (Hand, 2010)
I Economics: In the 1970s only 30 percent of the articles in the top five journals were coauthored. In the 1990s about 60 percent were coauthored. In the longer run the trend is even more striking: in 1959 only 3 percent of the articles in the JPE were coauthored. (Ellison, 2002)
I Need to have effective means for working with others.
I You are somebody else in the future as well . . .
Problem 2: Going back in time
I How many times in the process does one believe a paper is finished?
I And when you think it is done, enter . . .
I co-authors,
I supervisor,
I conference participants,
I referees.
I Often long lags in between changes in beliefs. Project changes in the meantime.
I Suddenly you’re not able to reproduce the results you had at the time you submitted {paper, thesis}.
I Need to keep track of older versions.
Ad-hoc solutions
I Sending emails?
I Might work for a little document – but entire directory structures?
I Even then very difficult to search for specific versions.
I Back up stuff regularly:
my_project my_project_v0 my_project_v1 my_project_v2
I You’re certain to miss the important version.
I Better to spend your energy on research than on such mindless stuff.
[Figure: you, your coauthor, and your future self.]
The right solution
[Figure: you, your coauthor, and your future self each work in a working copy that holds one version of the files. A central repository holds all versions of the files. Arrows between the two are labelled “check in / commit” and “check out (initialise)”.]
Checking out a populated repository
Right-click in the area of the PyDev Package Explorer, select New . . . → Project . . .
Expand the SVN category, select “Project from SVN” and click “Next”.
I Fill in the information as needed, here:
https://coll.gess.uni-mannheim.de/svn/prog-econ-comm-2010
I Don’t forget the ‘s’ of ‘https://’.
I Add svn/project-identifier to the address of the collaboration server
I You can find out the project identifier in Settings/Information on the project’s web page.
I For your group X, it should be prog-econ-2010-grp-X
I Click “Next”.
I Select “Check out as a project . . . ” and click on “Finish”.
I You may change the name of the project if you do not like the default. Eclipse will create a directory with that name in your workspace.
How version numbering works
I Every time you commit a changed file to the repository, this file and all directories higher up in the hierarchy get a new version number.
I This means that a revision number always uniquely identifies a state of the entire repository.
I But you can still quickly see when a file or directory (or something further down the hierarchy) last changed.
I On the collaboration server (the “repository”), all versions are kept.
I In your “working copy”, you only see one.
I Generally the latest version, but easy to check out previous ones as well.
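The numbering rule can be illustrated with a toy model in Python. This is only a sketch of the idea, not how Subversion stores its data: the class `ToyRepository`, its `commit` method, and the file names are hypothetical and made up for this example.

```python
from pathlib import PurePosixPath


class ToyRepository:
    """Toy model: one global revision counter, plus a 'last changed'
    revision for every file and every directory above it."""

    def __init__(self):
        self.revision = 0        # identifies the state of the whole repository
        self.last_changed = {}   # path -> revision in which it last changed

    def commit(self, *paths):
        """Commit changes to the given files and return the new revision."""
        self.revision += 1
        for path in paths:
            p = PurePosixPath(path)
            # The file itself and all directories higher up get the new number.
            for node in [p, *p.parents]:
                self.last_changed[str(node)] = self.revision
        return self.revision


repo = ToyRepository()
repo.commit("templates/paper_template.tex")     # revision 1
repo.commit("src/analysis.py")                  # revision 2
repo.commit("templates/exercise_template.tex")  # revision 3

print(repo.revision)                   # 3
print(repo.last_changed["src"])        # 2
print(repo.last_changed["templates"])  # 3
print(repo.last_changed["."])          # 3 (the root directory)
```

The `src` directory keeps the number 2 although the repository as a whole is at revision 3, which is how you can see at a glance where the latest changes happened.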
The basic cycle: Check what has changed
Right-click on the directory you want to check . . .
One file changed, see entire history at the bottom of the page.
After double-click on changed file.
Double-clicking a tab maximises it (and back).
The basic cycle: Update your working copy
Aside: Turn the SVN console on to see what’s happening.
Right-click on the folder to update, and . . .
exercise solution template.tex was modified.
The basic cycle: Changing your working copy
I You still need to do the real work . . .
I Modify files.
I Add files and directories.
I Here: Put a file named paper template.tex in the templates directory.
I Do that outside Eclipse.
I Need to hit F5 / right-click and “Refresh” on a directory somewhere up the hierarchy from the added file.
I Meaning of the symbols:
? The file has not been added to SVN yet – it is “unversioned”.
> The directory has changed.
The basic cycle: Committing your changes
Add the new file to version control.
Note how the symbol has changed, but there is no version number yet.
Pick a meaningful comment.
I Done!
I The two directories up the hierarchy will get new version numbers in the next update they are included in.
I Eclipse/Subversive will automatically include files we put in the directories, so manually adding them was not necessary (but it is required in other clients!).
I Why does the root directory appear to have changed?
Eclipse does not show you everything . . .
Check by attempting a commit from the root directory:
Keep only source code under version control
I .project contains internal Eclipse information.
I Only relevant for your local machine.
I Changes far too often, distracts from “real” changes.
I In general: what to put under VC?
I An early version of these slides’ pdf file contained:
/Author()/Title()/Subject()/Creator(LaTeX with beamer [...] /CreationDate (D:20100823183017+02’00’)
/ModDate (D:20100823183017+02’00’)
I Will change every time you run pdflatex, even if there are no substantive changes whatsoever . . .
I . . . and thus lead to many fake changes of the repository.
I So keep only sources under VC.
I Original data and source code from statistics programs, LaTeX, etc.
I Exception: You use VC to keep a project in sync across computers and there is a step you can only run on one machine (e.g. on a cluster).
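The point about timestamps in generated files can be made concrete with a small Python sketch. The function `fake_pdf` and the byte strings are hypothetical stand-ins for a real pdflatex run; the only assumption is that the output embeds a creation date, as in the metadata shown above.

```python
import hashlib


def fake_pdf(creation_date):
    """Return bytes that mimic a PDF: fixed content plus a timestamp field."""
    body = b"%PDF-1.4\n... identical page content ...\n"
    metadata = b"/CreationDate (D:" + creation_date.encode("ascii") + b")\n"
    return body + metadata


first_build = fake_pdf("20100823183017+02'00'")
second_build = fake_pdf("20100823183542+02'00'")  # rebuilt a few minutes later

digest_1 = hashlib.sha1(first_build).hexdigest()
digest_2 = hashlib.sha1(second_build).hexdigest()

# The sources did not change, yet the generated files differ,
# so a version control system would record a change on every build.
print(digest_1 == digest_2)  # False
```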
Setting properties on directories – svn:ignore
Right-click on the directory in question . . .
I Each property has a name . . .
I Usually pre-defined, but could be user-defined.
I We will meet:
svn:ignore svn:externals svn:keywords
I . . . and a value
I Depending on the property, the value requires a certain format.
I Right-click in the area to add a property.
I Every line specifies a pattern to ignore.
I Could specify entire file/directory names, or patterns
* means “match any number of characters of any type” (this includes no character at all).
I The .svn directories are automatically ignored (of course).
I Can always override the svn:ignore property by manually adding a file as we did before.
I E.g. for pdf’s that are sources (data documentation, references, . . . ).
I Here we want to apply the properties to all directories.
I (Note for reference that you won’t understand now)
I The content has not changed, but the properties have.
I Now the .project file is ignored indeed.
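How such ignore patterns behave can be tried out in Python. This is a sketch under assumptions: it uses Python's `fnmatch` module as a stand-in for Subversion's own pattern matching, which may differ in details, and the pattern list and the helper `is_ignored` are made up for this example.

```python
from fnmatch import fnmatchcase

# One pattern per line, as in the value of an svn:ignore property.
ignore_patterns = [".project", "*.pdf", "*.pdfsync"]


def is_ignored(filename, patterns):
    """Return True if the file name matches at least one ignore pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


print(is_ignored(".project", ignore_patterns))   # True: full file name
print(is_ignored("paper.pdf", ignore_patterns))  # True: matches *.pdf
print(is_ignored("paper.tex", ignore_patterns))  # False: a source file
print(is_ignored(".pdf", ignore_patterns))       # True: * may match nothing
```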
Trying to commit from an outdated working copy
I Ooops.
I You always have to commit from the latest version.
I The root directory and templates directory are still at revision 3.
I Run an svn update from the root and everything works as expected.
Why is that so?
I While you are working on the project, someone else might be working on it as well.
I Unless you lock a resource (file, directory): then nobody except you will be able to commit changes to this resource.
I Almost never necessary, far too restrictive.
I What if changes get in the way of each other?
I Merge your version and the latest one from the repository.
I For safety, Subversion requires you to do that in the working copy, i.e. during an update and not during a commit.
I Two types of changes:
I Non-conflicting: Subversion will automatically merge them (but you had better check).
I Conflicting: You have to edit conflicts yourself (with some help from specialised editors).
. . . and get the ‘commit failed’ message again . . .
A couple of lines were added, no problem . . .
Handling non-conflicting changes
I The file with my local modifications and the one from the repository with Ina’s changes were merged by Subversion.
I Nothing left to do, just commit version 7, see below.
I But it is always good to check things the way we did.
I That changes do not conflict according to Subversion’s algorithm does not mean that they do not conflict conceptually.
I E.g. insert two functions with the same name at different locations in a file.
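The example of two functions with the same name can be spelled out in Python. The function `clean_data` and its two bodies are hypothetical; the sketch shows a file as it might look after a textually clean merge of two independent additions.

```python
# Lines added by one author near the top of the file.
def clean_data(values):
    """Drop missing observations."""
    return [v for v in values if v is not None]


# Lines added by the other author further down in the same file.
def clean_data(values):
    """Replace missing observations by zero."""
    return [0 if v is None else v for v in values]


# Python silently keeps only the definition that comes last.
print(clean_data([1, None, 3]))  # [1, 0, 3]
```

No merge tool would flag this, but code that relies on the first definition now gets different results.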
How we end up at the next screen . . .
I Both Ina and I notice the typo in *.pdfsyc.
I Ina corrects it to *.pdfsync and commits her change.
I The repository is now at revision 9.
I I change it to *.pdfs* and try to commit.
I Pretty stupid change, actually.
I Unless there is a clear reason for a wildcard, better be explicit.
I The commit attempt leads to the well-known error.
I Run an update without further checks.
How to resolve conflicts
I Only the left pane is editable, of course.
I You could:
I accept the change from the repository,
I leave your local change as is,
I change the left pane to something entirely new.
I We will opt for taking the change from the repository.
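Why the *.pdfsyc example ends in a conflict can be sketched with the basic three-way rule, applied to a single line. This is a simplified illustration, not Subversion's actual algorithm (which works on blocks of lines); the function `merge_line` is hypothetical.

```python
def merge_line(base, mine, theirs):
    """Three-way rule for a single line.

    base is the common ancestor, mine the local version,
    theirs the latest version in the repository.
    Returns (merged_line, conflict_flag).
    """
    if mine == theirs:
        return mine, False    # both sides agree (or neither changed)
    if mine == base:
        return theirs, False  # only the repository version changed
    if theirs == base:
        return mine, False    # only the local version changed
    return None, True         # both changed it differently: conflict


base = "*.pdfsyc"     # the line with the typo
theirs = "*.pdfsync"  # Ina's committed correction
mine = "*.pdfs*"      # my local change

print(merge_line(base, mine, theirs))  # (None, True)
print(merge_line(base, base, theirs))  # ('*.pdfsync', False)
```

The second call shows what would have happened without my local edit: the update would simply have brought in Ina's correction.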
Sometimes a three-way merge is useful
See common base (here: revision 8 with typo) in top pane.
The final state of affairs
Everything is at revision 9 and Subversion sees that there are no local changes anymore.
Better alternative: Synchronise with repository
When you get the previous ‘commit failed’ message, go to . . .
If you find yourself resolving conflicts all the time
I . . . it is a sign that something is wrong with the workflow in your project.
I Not talking to co-authors as often as you should?
I Responsibilities not clearly assigned?
I Subversion helps you detect this.
I The conflict resolution mechanisms we have seen are limited to text files.
I I.e., human readable.
I Most parts of pdf-files, MS Office files, .mat or .dta data formats, . . . are not.
I The deeper reason for having only sources under VC.
I Subversion will mark such files as ‘binary’.
The case for plain LaTeX
I So use simple text files for writing papers.
I LaTeX is the de-facto standard.
I Will see later that it is also very useful for getting closer to that “red button”.
I What about the middle ground, i.e. Scientific Workplace, LyX?
I Better, but not good.
I Also format the source code.
I Not designed for human readability, so merging conflicts is a pain in the ***.
I More control over what is going on.
I Learning curve is steep.
I But shortly, your mind will ignore the markup commands when reading LaTeX source code.
Using Subversion to go back in time
I One of the big benefits you get from using a version control system.
I You can use it as an infinite undo-button.
I Its coarseness depends on how often you commit.
I Better too often than too seldom.
I Try to commit chunks that share a logical connection.
I Distinguish between two cases:
I Local changes that you do not want to keep.
I Changes you committed already – go back to a previous revision.
Reverting local changes
After this, a dialogue will pop up asking whether you really want to revert your changes. Click ‘OK’ and you’re done.
Going back to a previous state of the repository
I Now we look at what to do if I had committed the previous changes as revision 12 instead of reverting them.
After clicking ‘OK’, the Synchronise perspective will come up, but it is not very helpful here. So go back to the PyDev perspective.
Merging with prior versions is not ideal . . .
[Figure: revision history 10, 11, 12, later extended by revisions 13 and 14.]
Scientific work always involves trying lots of things; some of them work, some do not.
The better solution: Branching
[Figure: revisions 10 to 14 spread over the trunk and a branch “Experimental”.]
Taking a snapshot of the repository
[Figure: revisions 10 to 13 on the trunk and the branch “Experimental”, plus a tag “As submitted”.]
Recommended repository layout
[Figure: three top-level directories: Trunk, Branches, Tags.]
Creating a new project in Redmine
When you create a project in Redmine, you will automatically get a repository (wait for a couple of minutes).
Initialising a virgin repository
Tagging in action
A little more on Subversion’s internals
I In each directory that Subversion knows about, your local working copy contains a hidden folder .svn.
I Stores all information about its parent directory that Subversion needs to know.
I In particular, it contains information on:
I the location of the repository;
I files and directories in that folder that Subversion knows about;
I properties like svn:ignore;
I etc.
I Consequence is that when moving entire directories, you have to think about the .svn directory.
I (Subversive for) Eclipse assumes you want to do to the repository what you did to your working copy (e.g. move a directory from one place to another). It issues the relevant commands implicitly, but this does not always work, e.g. moving nested directory structures does not seem to work smoothly.
I When you move things in the Windows Explorer, the information in the .svn directories does not change. Might lead to conflicting information.
I For TortoiseSVN, see:
http://tortoisesvn.net/docs/release/TortoiseSVN_en/tsvn-dug-rename.html
When everything stops working . . .
I Don’t panic!!!
I You always have the situation from the last commit in the repository.
I So be sure to commit frequently.
I Always solve problems immediately so that you won’t lose much information should you have to go back.
I Try out the solution strategies on the next slide in the order in which they appear.
I First attempt: svn cleanup.
I If that doesn’t work: In the Windows Explorer, move all files with important local changes to some safe place outside the working copy.
I svn revert, see whether you can update. If so, move back the files with changes and commit.
I If that doesn’t work: In the Windows Explorer, delete the directory in question and run an svn update on its parent directory. Then move back the files with changes and commit.
I If the directory that causes trouble is the one you checked out, just delete the entire directory and check it out anew.
Moving a “clean” folder somewhere
I Sometimes you need a directory in a different repository, etc.
I If you moved it naïvely in the Windows Explorer, the internal .svn directories would get in each other’s way.
I Elegant solution: svn export, which copies everything under version control to a new directory without the .svn directories.
I Access it from context menus.
I If you’re past that (i.e. you moved naïvely and now have the conflict) . . .
I Turn on “view hidden files” somewhere in your Explorer “View” menu.
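The idea of a “clean” folder can be sketched in Python as a copy that leaves out the .svn directories. This is a rough stand-in, not a replacement for svn export: unlike svn export, it also copies files that were never added to version control. The helper `copy_without_svn` and all paths are hypothetical, and the demonstration runs in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def copy_without_svn(source, target):
    """Copy a directory tree, leaving out every .svn directory."""
    shutil.copytree(source, target, ignore=shutil.ignore_patterns(".svn"))


with tempfile.TemporaryDirectory() as tmp:
    # Build a fake working copy with a .svn folder in each directory.
    working_copy = Path(tmp) / "my_project"
    (working_copy / ".svn").mkdir(parents=True)
    (working_copy / "src" / ".svn").mkdir(parents=True)
    (working_copy / "src" / "analysis.py").write_text("print('hello')\n")

    clean_copy = Path(tmp) / "clean_copy"
    copy_without_svn(working_copy, clean_copy)

    # Collect what ended up in the copy.
    leftover = list(clean_copy.rglob(".svn"))
    copied_files = sorted(
        p.relative_to(clean_copy).as_posix()
        for p in clean_copy.rglob("*")
        if p.is_file()
    )

print(leftover)      # []
print(copied_files)  # ['src/analysis.py']
```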
A LaTeX primer
I Most important concept: You are responsible for the structure.
I LaTeX will do the design based on that structure.
I Concept also called markup.
I Basic structure of a LaTeX source file.
I Commands.
I References: Bibliographies.
I Use a reference manager, e.g. JabRef on all platforms, BibDesk on Mac.
I Best book: Kopka and Daly (2004).
I Disclaimer.
I Very useful reference resources:
I http://en.wikibooks.org/wiki/LaTeX/Formatting
I http://en.wikibooks.org/wiki/LaTeX/Mathematics
Aside: Eclipse issues
I Upon first start, Eclipse will show you a welcome screen. Just click it away to get the usual perspective.
I Eclipse will ask you at some point whether you want to upload your usage data, which developers will use to improve its usability. Decide for yourself what you want to do; performance is unaffected.
References
Ellison, Glenn (2002). “The Slowdown of the Economics Publishing Process”. In: The Journal of Political Economy 110.5, pp. 947–993.
Hand, Eric (2010). “‘Big Science’ Spurs Collaborative Trend”. In: Nature News 463. Available at http://www.nature.com/news/2010/100120/full/463282a.html.
Kopka, Helmut and Patrick W. Daly (2004). Guide to LaTeX. 4th ed. Addison Wesley / Pearson Education.
Acknowledgements and revision number
I This course is designed after and borrows a lot from the Software Carpentry course designed by Greg Wilson for scientists and engineers.
I The Software Carpentry course material is made available under a Creative Commons Attribution License, as is this course’s material.
I Last changed revision: 202
I Last changed date: 2010-11-01 16:49:50 +0100 (Mon, 01 Nov 2010)
License for the course material
You are free:
I to Share: to copy, distribute and transmit the work
I to Remix: to adapt the work
Under the following conditions:
I Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
With the understanding that:
I Waiver: Any of the above conditions can be waived if you get permission from the copyright holder.
I Public Domain: Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
I Other Rights: In no way are any of the following rights affected by the license:
I Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
I The author’s moral rights;
I Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
Notice: For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.