Testing at the Speed and Scale of Google - How Google Tests Software

by Pooja Gupta, Mark Ivey, and John Penix

Continuous integration systems play a crucial role in keeping software working while it is being developed. The basic steps most continuous integration systems follow are:

1. Get the latest copy of the code.

2. Run all tests.

3. Report results.

4. Repeat 1–3.
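The loop above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not any real CI system's code. Changes are integers in submission order. `arrivals` (a hypothetical name) groups the changes submitted during each build-test cycle. `tests_pass_at` is a hypothetical oracle that reports whether the full suite passes at a given change.

```python
from typing import Callable, List, Sequence, Tuple

def traditional_ci(
    arrivals: Sequence[List[int]],
    tests_pass_at: Callable[[int], bool],
) -> List[Tuple[List[int], bool]]:
    """Simulate steps 1-4. Each cycle picks up every change submitted since the
    previous cycle, runs the whole suite once at the newest of them, and
    reports one result for the entire batch."""
    reports: List[Tuple[List[int], bool]] = []
    for batch in arrivals:                     # 4. repeat, one cycle per batch
        if not batch:
            continue                           # nothing new was submitted
        head = batch[-1]                       # 1. get the latest copy of the code
        passed = tests_pass_at(head)           # 2. run all tests at that snapshot
        reports.append((list(batch), passed))  # 3. report results for the batch
    return reports

# Hypothetical scenario: change 2 breaks the suite; changes 2 and 3 share a cycle.
for batch, passed in traditional_ci([[1], [2, 3]], lambda change: change < 2):
    print(batch, "PASS" if passed else "FAIL")  # [1] PASS, then [2, 3] FAIL
```

The failing report covers both changes 2 and 3. The loop can only say that the culprit is one of them.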

This works well while the codebase is small, code flux is reasonable, and tests are fast. As a codebase grows over time, the efficiency of such a system decreases.

As more code is added, each clean run takes much longer and more changes get crammed into a single run. If something breaks, finding and backing out the bad change is a tedious and error-prone task for development teams.

Software development at Google happens quickly and at scale. The Google codebase receives over 20 changes per minute, and 50 percent of the files change every month! Each product is developed and released from “head,” relying on automated tests to verify product behavior. Release frequency varies from multiple times per day to once every few weeks, depending on the product team.

With such a huge, fast-moving codebase, teams can get stuck spending a lot of time just keeping their build “green.” A continuous integration system should help by identifying the exact change at which a test started failing, rather than reporting a range of suspect changes or requiring a lengthy binary search for the offending change. To find the exact change that broke a test, we could run every test at every change, but that would be very expensive.
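The binary search mentioned above can be sketched as follows. The sketch rests on several hypothetical assumptions. The suspect changes are in submission order. The test passed just before the first suspect and fails at the last one. The test is not flaky. `bisect_culprit` and `test_passes_at` are illustrative names. Each probe is a full build-and-test cycle at an older change, so the search is slow even though it needs only about log2(n) probes.

```python
from typing import Callable, Sequence, Tuple

def bisect_culprit(
    suspects: Sequence[int],
    test_passes_at: Callable[[int], bool],
) -> Tuple[int, int]:
    """Return (culprit, probes): the first suspect change at which the test
    fails, and how many extra test runs the search needed.

    Assumes suspects are in submission order, the test passed just before
    suspects[0], fails at suspects[-1], and stays failing once broken."""
    lo, hi = 0, len(suspects) - 1   # the culprit's index is always in [lo, hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1                  # each probe is a full build-and-test cycle
        if test_passes_at(suspects[mid]):
            lo = mid + 1             # still green at mid: culprit comes later
        else:
            hi = mid                 # already red at mid: culprit is mid or earlier
    return suspects[hi], probes

# Hypothetical: 20 suspect changes (101..120), and change 107 broke the test.
print(bisect_culprit(list(range(101, 121)), lambda change: change < 107))  # (107, 5)
```

With 20 suspects the search needs at most five probes. Each probe may take as long as a full build.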

To solve this problem, we built a continuous integration system (see Figure 2.6) that uses dependency analysis to determine all the tests a change transitively affects and then runs only those tests for every change. The system is built on top of Google’s cloud computing infrastructure, enabling many builds to be executed concurrently and allowing the system to run affected tests as soon as a change is submitted.

Here is an example where our system provides faster and more precise feedback than a traditional continuous build. In this scenario, there are two tests and three changes that affect these tests. The gmail_server_tests are broken by the second change; however, a typical continuous integration system can only tell that either change #2 or change #3 caused this test to fail. By using concurrent builds, we can launch tests without waiting for the current build-test cycle to finish. Dependency analysis limits the number of tests executed for each change, so in this example, the total number of test executions is the same as before.
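The key property of this approach is that a test runs only at changes that can affect it, so its first failing run identifies the culprit directly. The sketch below assumes deterministic (non-flaky) tests and a precomputed map from each change to the tests it affects. The function name, the `affected` map, and the `example_passes` oracle are all hypothetical. The per-change test sets in the usage are invented; only the outcome (change #2 breaks gmail_server_tests) mirrors the scenario above.

```python
from typing import Callable, Dict, List, Set

def first_failing_change(
    changes: List[int],
    affected: Dict[int, Set[str]],
    passes: Callable[[str, int], bool],
) -> Dict[str, int]:
    """Map each broken test to the exact change that broke it.

    At every change, only the tests that change affects are run. Assuming
    deterministic tests, a test's result can only change at a change that
    affects it, so its first failing run names the culprit directly."""
    culprit: Dict[str, int] = {}
    for change in changes:                       # in submission order
        for test in sorted(affected.get(change, set())):
            if not passes(test, change) and test not in culprit:
                culprit[test] = change           # first failure = culprit
    return culprit

# Hypothetical affected-test sets for changes #1 to #3.
affected = {
    1: {"gmail_client_tests"},
    2: {"gmail_server_tests"},
    3: {"gmail_client_tests", "gmail_server_tests"},
}

def example_passes(test: str, change: int) -> bool:
    """Hypothetical oracle: change #2 breaks gmail_server_tests."""
    return not (test == "gmail_server_tests" and change >= 2)

print(first_failing_change([1, 2, 3], affected, example_passes))
# {'gmail_server_tests': 2}
```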

FIGURE 2.6 A typical continuous integration system.

Our system uses dependencies from the build system rules that describe how code is compiled and data files are assembled to build applications and tests.

These build rules have clearly defined inputs and outputs that chain together to precisely describe what needs to happen during a build. Our system maintains an in-memory graph of these build dependencies (see Figure 2.7) and keeps it up to date with each change that gets checked in. This allows us to determine all tests that depend (directly or indirectly) on the code modified by a change and hence need to be rerun to know the current state of the build. Let’s walk through an example.
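One way to keep such a graph in memory and up to date is to store each rule's inputs together with a reverse index. A breadth-first walk over the reverse edges then answers which tests are affected. This is a sketch under assumptions, not Google's system. `BuildGraph`, `set_rule`, and `affected_tests` are hypothetical names. Test targets are recognized by a `_tests` suffix purely for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set

class BuildGraph:
    """In-memory build dependency graph (illustrative sketch).

    deps[t] holds the targets t directly depends on; rdeps is the reverse
    index, used to walk from a changed target up to everything that uses it."""

    def __init__(self) -> None:
        self.deps: Dict[str, Set[str]] = {}
        self.rdeps: Dict[str, Set[str]] = {}

    def set_rule(self, target: str, inputs: Iterable[str]) -> None:
        """Add or replace a target's rule, e.g. when a checked-in change edits it."""
        for old in self.deps.get(target, set()):
            self.rdeps[old].discard(target)          # drop stale reverse edges
        self.deps[target] = set(inputs)
        for dep in self.deps[target]:
            self.rdeps.setdefault(dep, set()).add(target)

    def affected_tests(self, changed: Iterable[str]) -> Set[str]:
        """All test targets that depend, directly or indirectly, on a changed target."""
        seen: Set[str] = set()
        queue = deque(changed)
        while queue:                                 # breadth-first over reverse edges
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(self.rdeps.get(node, set()))
        return {t for t in seen if t.endswith("_tests")}  # assumed naming rule

# Hypothetical targets, not taken from the figures.
g = BuildGraph()
g.set_rule("lib_b", ["lib_a"])
g.set_rule("b_tests", ["lib_b"])
g.set_rule("c_tests", ["lib_c"])
print(g.affected_tests(["lib_a"]))          # {'b_tests'}
g.set_rule("lib_c", ["lib_a"])              # a later change makes lib_c use lib_a
print(sorted(g.affected_tests(["lib_a"])))  # ['b_tests', 'c_tests']
```

`set_rule` updates the reverse index, so each query reflects the latest checked-in rules. A query visits only the targets that transitively depend on the change, not the whole graph.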

(Figure 2.6 panels: “Typical continuous integration system” and “Continuous integration system with dependency analysis,” each showing runs of gmail_client_tests and gmail_server_tests triggered by change #1, change #2, and change #3. Legend: marker = time when a change triggers tests; bar length = test’s run time; X = failed test.)

FIGURE 2.7 Example of build dependencies.

We now see how two isolated code changes, at different depths of the dependency tree, are analyzed to determine the affected tests, that is, the minimal set of tests that must be run to ensure that both the Gmail and Buzz projects are “green.”

CASE 1: CHANGE IN COMMON LIBRARY

For the first scenario, consider a change that modifies files in common_collections_util, as shown in Figure 2.8.

FIGURE 2.8 Change in common_collections_util.h.

(Dependency graph nodes. Gmail test targets: //depot/gmail_client_tests, //depot/gmail_server_tests. Buzz test targets: //depot/buzz_server_tests, //depot/buzz_client_tests. Other targets: common_collections_util, gmail_client, gmail_server, buzz_client, buzz_server, youtube_client, youtube_server.)

When this change is submitted, we follow the dependency edges back up the graph, eventually finding all tests that depend on it. When the search is complete (after a fraction of a second), we have all the tests that need to be run and can determine which projects need their statuses updated based on the results of these tests (see Figure 2.9).
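Once the affected tests are known, mapping them to project statuses is a small lookup. In this minimal sketch, the project-to-target mapping uses the Gmail and Buzz test-target names shown in the figures. `projects_to_update`, `update_statuses`, and the green/red rule are assumptions for illustration. Under that rule, a project is green only when all of its tests last passed.

```python
from typing import Dict, Iterable, Set

# Project -> test targets, using the target names shown in the figures.
PROJECT_TESTS: Dict[str, Set[str]] = {
    "Gmail": {"//depot/gmail_client_tests", "//depot/gmail_server_tests"},
    "Buzz": {"//depot/buzz_client_tests", "//depot/buzz_server_tests"},
}

def projects_to_update(affected: Iterable[str]) -> Set[str]:
    """Projects owning at least one affected test need a status refresh."""
    hit = set(affected)
    return {p for p, tests in PROJECT_TESTS.items() if tests & hit}

def update_statuses(latest: Dict[str, bool], results: Dict[str, bool]) -> Dict[str, str]:
    """Recompute status only for projects touched by `results`.

    `latest` holds each test's previous result; `results` holds results of the
    tests just rerun. Assumed rule: a project is green only if every one of its
    tests last passed (a test with no recorded result counts as failing)."""
    merged = {**latest, **results}
    return {
        p: "green" if all(merged.get(t, False) for t in PROJECT_TESTS[p]) else "red"
        for p in projects_to_update(results)
    }

# Hypothetical: everything was green, then buzz_client_tests fails.
latest = {t: True for tests in PROJECT_TESTS.values() for t in tests}
print(update_statuses(latest, {"//depot/buzz_client_tests": False}))  # {'Buzz': 'red'}
```

Only the Buzz status is recomputed in this usage. Gmail has no affected tests, so its status is left untouched.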

FIGURE 2.9 Tests affected by change.

CASE 2: CHANGE IN A DEPENDENT PROJECT

For the second scenario, we look at a change that modifies files in youtube_client (see Figure 2.10).

FIGURE 2.10 Change in youtube_client.

We perform the same analysis and conclude that only the buzz_client_tests are affected and that the status of the Buzz project needs to be updated (see Figure 2.11).
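Both cases can be replayed on a small model of the graph. The target names come from the figures, but the edges in `DEPS` are an assumption. They are one plausible wiring, chosen so that a change to youtube_client reaches only buzz_client_tests, as the text states. The result for common_collections_util follows from these assumed edges and is not read off Figure 2.9. `affected_tests` and `project_of` are hypothetical helpers.

```python
from collections import deque
from typing import Dict, List, Set

# ASSUMED edges (target -> direct dependencies). The names come from the
# figures; the wiring is illustrative, not read off Figure 2.7.
DEPS: Dict[str, List[str]] = {
    "gmail_client_tests": ["gmail_client"],
    "gmail_server_tests": ["gmail_server"],
    "buzz_client_tests": ["buzz_client"],
    "buzz_server_tests": ["buzz_server"],
    "buzz_client": ["youtube_client"],
    "buzz_server": ["youtube_server"],
    "gmail_client": ["common_collections_util"],
    "gmail_server": ["common_collections_util"],
    "youtube_client": ["common_collections_util"],
    "youtube_server": ["common_collections_util"],
}

def affected_tests(deps: Dict[str, List[str]], changed: str) -> Set[str]:
    """Collect test targets reachable from `changed` via reverse edges."""
    rdeps: Dict[str, Set[str]] = {}
    for target, inputs in deps.items():
        for dep in inputs:
            rdeps.setdefault(dep, set()).add(target)
    seen, queue = {changed}, deque([changed])
    while queue:
        for user in rdeps.get(queue.popleft(), set()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return {t for t in seen if t.endswith("_tests")}

def project_of(test: str) -> str:
    """Hypothetical naming rule: 'buzz_client_tests' -> 'Buzz'."""
    return test.split("_", 1)[0].capitalize()

case2 = affected_tests(DEPS, "youtube_client")
print(case2, {project_of(t) for t in case2})  # {'buzz_client_tests'} {'Buzz'}
print(sorted(affected_tests(DEPS, "common_collections_util")))
# ['buzz_client_tests', 'buzz_server_tests', 'gmail_client_tests', 'gmail_server_tests']
```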


In document How Google Tests Software (Page 81-85)