The rise of distributed version control systems (DVCSs) is revolutionizing the way teams cooperate. Where open source projects once emailed patches or posted them on forums, tools like Git and Mercurial make it incredibly easy to pull patches back and forth between developers and teams and to branch and merge work streams. DVCSs allow you to work easily offline, commit changes locally, and rebase or shelve them before pushing them to other users. The core characteristic of a DVCS is that every repository contains the entire history of the project, which means that no repository is privileged except by convention.
Thus, compared to centralized systems, DVCSs have an additional layer of indirection: Changes to your local working copy must be checked in to your local repository before they can be pushed to other repositories, and updates from other repositories must be reconciled with your local repository before you can update your working copy.
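The extra layer of indirection is easy to see with a few ordinary Git commands. The following is a minimal sketch rather than a prescribed workflow. It creates throwaway repositories in a temporary directory (the names shared.git and alice are made up for illustration) and shows that a commit exists only in the local repository until it is pushed.

```shell
#!/bin/sh
# Minimal sketch of the DVCS "extra layer of indirection", using throwaway
# local repositories (the names shared.git and alice are hypothetical).
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# A bare repository stands in for the repository the team shares.
git init -q --bare shared.git
# Cloning gives Alice her own complete repository plus a working copy.
git clone -q shared.git alice 2>/dev/null
cd alice

# Step 1: check the change in to the LOCAL repository only.
echo "hello" > README
git add README
git commit -q -m "Add README"
local_head=$(git rev-parse HEAD)

# The shared repository has not seen the commit yet.
if git --git-dir=../shared.git rev-parse -q --verify refs/heads/main >/dev/null; then
  before_push=present
else
  before_push=absent
fi

# Step 2: push the local commit so that other people can see it.
git push -q origin HEAD:refs/heads/main
after_push=$(git --git-dir=../shared.git rev-parse refs/heads/main)

echo "shared repository before push: $before_push"
echo "shared repository after push:  $after_push (local HEAD: $local_head)"
```

Going the other way is symmetric: git fetch brings other people's commits into your local repository, and only a subsequent merge (or git pull, which combines the two) updates your working copy.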
DVCSs offer new and powerful ways to collaborate. GitHub, for example, pioneered a new model of collaboration for open source projects. In the traditional model, committers acted as gatekeepers to the definitive repository for a project, accepting or rejecting patches from contributors. Forks of a project only occurred in extreme circumstances when there were irreconcilable arguments between committers. In the GitHub model, this is turned on its head. Contributions are made by first forking the repository of the project you wish to contribute to, making your changes, and then asking the owners of the original repository to pull your changes. On active projects, networks of forks rapidly proliferate, each with various new sets of features. Occasionally these forks diverge. This model is far more dynamic than the traditional model in which patches languish, ignored, on mailing list archives. As a result, the pace of development tends to be faster on GitHub, with a larger cloud of contributors.
However, this model challenges a fundamental assumption of the practice of CI: That there is a single, canonical version of code (usually called mainline, or trunk) to which all changes are committed. It is important to point out that you can use the mainline model of version control, and do CI perfectly happily, using a DVCS. You simply designate one repository as the master, have your CI server trigger whenever a change is made to that repository, and have everybody push all their changes to this repository in order to share them. This is a perfectly reasonable approach that we have seen used successfully on many projects. It retains the many benefits of DVCS, such as the ability to commit your changes very frequently without sharing them (like saving your game), which comes in very useful while exploring a new idea or performing a complex series of refactorings.
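The "trigger whenever a change is made to that repository" part can be as simple as polling. The sketch below is a hypothetical helper, not how any particular CI server works. poll_once asks the designated master repository for the current commit on a branch using git ls-remote, and runs the build only when that commit differs from the last one it built. The repository path, branch, state file, and build command are placeholders.

```shell
#!/bin/sh
# Hypothetical polling trigger for a designated master repository.
set -eu

# poll_once REPO BRANCH STATE_FILE BUILD_COMMAND...
# Runs BUILD_COMMAND if the branch head differs from the one recorded in
# STATE_FILE, then records the new head (even if the build fails, so the
# same commit is not rebuilt on every poll).
poll_once() {
  repo=$1 branch=$2 state=$3
  shift 3
  latest=$(git ls-remote "$repo" "refs/heads/$branch" | cut -f1)
  [ -n "$latest" ] || return 0          # branch does not exist yet
  last=$(cat "$state" 2>/dev/null || true)
  [ "$latest" != "$last" ] || return 0  # nothing new to build
  status=0
  "$@" || status=$?
  echo "$latest" > "$state"
  return "$status"
}

# --- Demo against a throwaway local repository ---
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
git init -q "$demo/master-repo"
git -C "$demo/master-repo" symbolic-ref HEAD refs/heads/master
git -C "$demo/master-repo" commit -q --allow-empty -m "first change"

builds=0
build() { builds=$((builds + 1)); }   # stand-in for the real build

poll_once "$demo/master-repo" master "$demo/last-built" build   # new commit: builds
poll_once "$demo/master-repo" master "$demo/last-built" build   # no change: skips
git -C "$demo/master-repo" commit -q --allow-empty -m "second change"
poll_once "$demo/master-repo" master "$demo/last-built" build   # new commit: builds
echo "builds run: $builds"
```

In practice a CI server would run something like poll_once on a schedule or replace polling with a notification from the repository host; either way, the only repository it watches is the one the team has agreed to treat as the master.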
However, there are some patterns of use of DVCS that prevent CI. The GitHub model, for example, violates the mainline/trunk model of code sharing, and so prevents true continuous integration.
In GitHub, each user’s set of changes exists in a separate repository, and there is no easy way to determine which sets from which users will integrate successfully. You could take the approach of creating a repository to watch all the other repositories and attempt to merge them all together whenever it detects a change to any of them. However, this will almost always fail at the merge stage, before the automated tests even run. As the number of contributors, and hence repositories, grows, the problem gets exponentially worse. Soon nobody takes any notice of what the CI server says, so CI fails as a method of communicating whether the application is currently working (and if not, who and what broke it).
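One rough way to see why the problem grows so quickly: if each fork's changes can either be included in an integration or left out, then with n forks there are 2^n - 1 non-empty combinations, any of which might or might not integrate cleanly. The helper below only evaluates that count. It illustrates the arithmetic and is not a model of how often merges actually fail.

```shell
#!/bin/sh
# Number of non-empty combinations of n forks: 2^n - 1.
set -eu

fork_combinations() {
  echo $(( (1 << $1) - 1 ))
}

for n in 2 5 10 20; do
  echo "$n forks: $(fork_combinations "$n") possible combinations"
done
```

Going from 10 forks to 20 takes the count from 1023 to 1048575, which is why trying combinations until one works is not a practical substitute for a single mainline.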
It is possible to fall back to a simpler model that provides some of the benefits of continuous integration. In this model, you create a CI build for each repository.
Every time a change is made, you attempt to merge from the designated master repository and run the build. Figure 3.2 shows CruiseControl.rb building the main repository for the Rapidsms project along with two forks of it.
Figure 3.2 Integrating branches
In order to create this system, a remote pointing to the main project repository was added to each of CC.rb’s Git repositories using the command git remote add core git://github.com/rapidsms/rapidsms.git. Every time the build is triggered, CC.rb attempts to merge and run the build:
git fetch core
git merge --no-commit core/master
[command to run the build]
After the build, CC.rb runs git reset --hard to reset the local repository to the head of the repository it is pointing at. This system does not provide true continuous integration. However, it does tell the maintainers of the forks, and the maintainer of the main repository, whether their fork could in principle be merged with the main repository, and whether the result would be a working version of the application. Interestingly, Figure 3.2 shows that the main repository’s build is currently broken, but the Dimagi fork not only merges successfully with it, but also fixes the broken tests (and possibly adds some additional functionality of its own).
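The CC.rb build described above can be sketched as a standalone shell function. This is an illustrative reconstruction, not CC.rb's actual code: integrate_with_core is a hypothetical name, and it assumes the fork's clone already has the core remote. One deliberate difference from the commands shown is the --no-ff flag. The git merge documentation notes that a fast-forward creates no merge commit, so --no-commit alone cannot stop it. Without --no-ff, a fork that is simply behind core/master would have its branch moved forward, and the later git reset --hard would not undo that.

```shell
#!/bin/sh
# Illustrative sketch of a per-fork integration build (not CC.rb's code).
# Assumes the current clone has a remote named "core";
# integrate_with_core is a hypothetical helper name.
set -eu

# integrate_with_core BUILD_COMMAND...
integrate_with_core() {
  git fetch -q core || return 1
  # --no-ff: a fast-forward creates no merge commit, so --no-commit alone
  # would still move this fork's branch; --no-ff keeps the merge uncommitted.
  if git merge -q --no-commit --no-ff core/master >/dev/null 2>&1; then
    status=0
    "$@" || status=$?
  else
    echo "could not merge core/master into this fork" >&2
    status=1
  fi
  # Discard the uncommitted merge so the fork's branch is left as it was.
  git reset -q --hard
  return "$status"
}

# --- Demo with throwaway local repositories (no network needed) ---
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

git init -q core
git -C core symbolic-ref HEAD refs/heads/master
echo base > core/a && git -C core add a && git -C core commit -q -m "base"
git clone -q core fork
echo fork > fork/b && git -C fork add b && git -C fork commit -q -m "fork change"
echo core > core/c && git -C core add c && git -C core commit -q -m "core change"

cd fork
git remote add core ../core
before=$(git rev-parse HEAD)

# Stand-in build: passes only if both the fork's and core's changes are present.
build() { [ -f b ] && [ -f c ]; }

if integrate_with_core build; then result=pass; else result=fail; fi
after=$(git rev-parse HEAD)
echo "integration build: $result"
echo "fork HEAD unchanged: $([ "$before" = "$after" ] && echo yes || echo no)"
```

The reset at the end is what keeps this a "could it merge?" check rather than an actual integration: the fork's branch is never changed, so the maintainers still decide when to merge for real.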
One step further away from continuous integration is what Martin Fowler calls “promiscuous integration” [bBjxbS]. In this model, contributors pull changes not just between forks and the central repository, but also between forks. This pattern is common in larger projects that use GitHub, when some developers are working on what are effectively long-lived feature branches and pull changes from other repositories that are forked off the feature branch. Indeed in this model there need not even be one privileged repository. A particular release of the software could come from any of the forks, provided it passed all the tests and was accepted by the project leaders. This model takes the possibilities of DVCS to their logical conclusion.
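In Git terms, pulling from another fork is no different from pulling from the central repository: you add the other fork as a remote and merge from it. The sketch below uses throwaway local repositories with made-up names (central, alice, bob, and a feature branch) to show Alice merging Bob's work directly while the central repository never sees it.

```shell
#!/bin/sh
# Sketch of promiscuous integration between two forks (all names hypothetical).
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

git init -q central
git -C central symbolic-ref HEAD refs/heads/master
echo v1 > central/app
git -C central add app
git -C central commit -q -m "v1"

# Two contributors fork the central repository.
git clone -q central alice
git clone -q central bob

# Bob works on a long-lived feature branch in his fork.
git -C bob checkout -q -b feature
echo "bob's work" > bob/feature.txt
git -C bob add feature.txt
git -C bob commit -q -m "Bob's feature"

# Alice pulls straight from Bob's fork, bypassing the central repository.
cd alice
git remote add bob ../bob
git fetch -q bob
git merge -q --no-edit bob/feature

if git -C ../central cat-file -e HEAD:feature.txt 2>/dev/null; then
  in_central=yes
else
  in_central=no
fi
echo "Alice has feature.txt: $([ -f feature.txt ] && echo yes || echo no)"
echo "central has feature.txt: $in_central"
```

Nothing in Git marks central as special here; it is privileged only by convention, which is exactly what makes this pattern possible and also what makes it hard to point a single CI build at "the" code.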
These alternatives to continuous integration can create high-quality, working software. However, this is only possible under the following conditions:
• A small and very experienced team of committers who manage pulling patches, tend the automated tests, and ensure the quality of the software.
• Regular pulling from forks, so as to avoid large amounts of hard-to-merge inventory accumulating on them. This condition is especially important if there is a strict release schedule, because the temptation is to leave merging till near the release, at which point it becomes extremely painful—the exact problem that continuous integration is designed to solve.
• A relatively small set of core developers, perhaps supplemented by a larger community which contributes at a relatively slow pace. This is what makes the merges tractable.
These conditions hold for most open source projects, and for small teams in general. However, they very rarely hold for medium or large teams of full-time developers.
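The second condition, regular pulling from forks, is easier to keep if someone can see how far a fork has drifted. One way, sketched here on the assumption that the main repository is available as a remote named core (as in the CC.rb setup), is git rev-list --left-right --count, which reports how many commits exist only on core/master and how many exist only in the fork. The MAX_UNMERGED threshold is a made-up policy value, not a recommendation from the text.

```shell
#!/bin/sh
# Sketch: report how far a fork has drifted from the main repository.
# Assumes a remote named "core"; MAX_UNMERGED is a hypothetical policy value.
set -eu

MAX_UNMERGED=${MAX_UNMERGED:-20}

# divergence UPSTREAM: prints "<behind> <ahead>", i.e. the number of commits
# only on UPSTREAM and the number only on the current branch.
divergence() {
  git rev-list --left-right --count "$1...HEAD" | tr '\t' ' '
}

# check_drift UPSTREAM: fails if too many commits are waiting to be merged.
check_drift() {
  counts=$(divergence "$1")
  behind=${counts% *}
  ahead=${counts#* }
  echo "behind $1 by $behind commits, ahead by $ahead"
  if [ $((behind + ahead)) -gt "$MAX_UNMERGED" ]; then
    echo "warning: $((behind + ahead)) unmerged commits; merge soon" >&2
    return 1
  fi
}

# --- Demo with throwaway local repositories ---
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

git init -q core
git -C core symbolic-ref HEAD refs/heads/master
git -C core commit -q --allow-empty -m "base"
git clone -q core fork
for i in 1 2 3; do git -C fork commit -q --allow-empty -m "fork work $i"; done
for i in 1 2; do git -C core commit -q --allow-empty -m "core work $i"; done

cd fork
git remote add core ../core
git fetch -q core
check_drift core/master
```

Run regularly, a check like this turns "hard-to-merge inventory" into a number the team can watch, rather than something discovered just before a release.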
To summarize: In general, distributed version control systems are a great advance and provide powerful tools for collaboration, whether or not you are working on a distributed project. DVCSs can be extremely effective as part of a traditional continuous integration system, in which there is a designated central repository to which everybody regularly pushes their changes (at least once a day). They can also be used in other patterns that do not allow for continuous integration, but may still be effective patterns for delivering software. However, we caution against using these patterns when the right conditions, listed above, are not satisfied. Chapter 14, “Advanced Version Control,” contains a full discussion of these and other patterns and the conditions under which they are effective.
Summary
If you were to choose just one of the practices in this book to implement on a development team, we would suggest that you choose continuous integration.
Time and time again we have seen it make a step change to the productivity of software development teams.
To implement continuous integration is to create a paradigm shift in your team. Without CI, your application is broken until you prove otherwise. With CI, the default state of your application is working, albeit with a level of confidence that depends upon the extent of your automated test coverage. CI creates a tight feedback loop which allows you to find problems as soon as they are introduced, when they are cheap to fix.
Implementing CI forces you to follow two other important practices: good configuration management and the creation and maintenance of an automated build and test process. For some teams, that will seem like a lot to bite off, but they can be achieved incrementally. We discussed the steps to good configuration management in the previous chapter. There is more on build automation in Chapter 6, “Build and Deployment Scripting.” We cover testing in more detail in the next chapter.
It should be clear that CI requires good team discipline—but then, any process requires this. What is different about continuous integration is that you have a simple indicator of whether or not discipline is being followed: The build stays green. If you discover that the build is green but there is insufficient discipline, for example poor unit test coverage, you can easily add checks to your CI system to enforce better behavior.
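For example, a coverage check can be added to an existing build as one more step that fails the build. The sketch below assumes your coverage tool can report a percentage; extracting that figure is tool-specific and not shown. The check_coverage helper and the 80% minimum are hypothetical.

```shell
#!/bin/sh
# Hypothetical coverage gate to run as an extra step in a CI build.
set -eu

# check_coverage ACTUAL MINIMUM: succeeds if ACTUAL >= MINIMUM.
# Both are percentages and may contain decimals, so awk does the comparison.
check_coverage() {
  if awk -v actual="$1" -v minimum="$2" 'BEGIN { exit !(actual + 0 >= minimum + 0) }'; then
    echo "coverage $1% meets the minimum of $2%"
  else
    echo "coverage $1% is below the minimum of $2%; failing the build" >&2
    return 1
  fi
}

# Example: a build step passing in the figure reported by its coverage tool.
check_coverage 82.5 80
```

Because the check exits non-zero on failure, any CI server that treats a failing step as a failed build will turn the build red when coverage drops below the agreed minimum.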
This brings us to our final point. An established CI system is a foundation on which you can build more infrastructure:
• Big visible displays which aggregate information from your build system to provide high-quality feedback
• A system of reference for reports and installers for your testing team
• A provider of data on the quality of the application for project managers
• A system that can be extended out to production, using the deployment pipeline, which provides testers and operations staff with push-button deployments
Introduction
Too many projects rely solely on manual acceptance testing to verify that a piece of software conforms to its functional and nonfunctional requirements. Even where automated tests exist, they are often poorly maintained and out-of-date and require supplementing with extensive manual testing. This and the related chapters in Part II of this book aim to help you to plan and implement effective automated testing systems. We provide strategies for automating tests in commonly occurring situations and describe practices that support and enable automated testing.
One of W. Edwards Deming’s fourteen points is, “Cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place” [9YhQXz]. Testing is a cross-functional activity that involves the whole team, and should be done continuously from the beginning of the project. Building quality in means writing automated tests at multiple levels (unit, component, and acceptance) and running them as part of the deployment pipeline, which is triggered every time a change is made to your application, its configuration, or the environment and software stack that it runs on. Manual testing is also an essential part of building quality in: Showcases, usability testing, and exploratory testing need to be done continuously throughout the project.
Building quality in also means constantly working to improve your automated testing strategy.
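As a rough illustration of running tests at several levels in sequence, the sketch below runs stages in order and stops at the first failure, so later stages never run against a change that an earlier stage has already rejected. The stage names and their bodies are placeholders; a real deployment pipeline would be configured in your CI server and would invoke your actual test suites.

```shell
#!/bin/sh
# Sketch of a pipeline that runs test stages in order (stage names are placeholders).
set -eu

# run_pipeline STAGE...: run each stage command; stop at the first failure.
run_pipeline() {
  for stage in "$@"; do
    echo "running stage: $stage"
    if ! "$stage"; then
      echo "pipeline failed at stage: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Placeholder stages; each would invoke a real test suite.
stages_run=""
unit_tests()       { stages_run="$stages_run unit"; }
component_tests()  { stages_run="$stages_run component"; }
acceptance_tests() { stages_run="$stages_run acceptance"; }

run_pipeline unit_tests component_tests acceptance_tests
```

Ordering the stages from fastest to slowest is a design choice here, not a requirement: it gives the quickest feedback on the most common failures.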
In our ideal project, testers collaborate with developers and users to write automated tests from the start of the project. These tests are written before developers start work on the features that they test. Together, these tests form an executable specification of the behavior of the system, and when they pass, they demonstrate that the functionality required by the customer has been implemented completely and correctly. The automated test suite is run by the CI system every time a change is made to the application, which means the suite also serves as a set of regression tests.