Development and Test Workflow - How Google Tests Software

Before we dig into the workflow specific to SETs, it is helpful to understand the overall development context in which SETs work. SETs and SWEs form a tight partnership in the development of a new product or service and there is a great deal of overlap in their actual work. This is by design because Google believes it is important that testing is owned by the entire engineering team and not just those with the word test in their job title.

Shipping code is the primary shared artifact between engineers on a team. It is the organization of this code, its development, care, and feeding that becomes the focus of everyday effort. Most code at Google shares a single repository and common tool chain. These tools and repository feed Google’s build and release process. All Google engineers, regardless of their role, become intimately familiar with this environment to the point that performing any tasks associated with checking in new code, submitting and executing tests, launching a build, and so on can be done without conscious thought by anyone on the team (assuming that person’s role demands it).

2 “The Life of” is a tip of the hat to a series of Google internal courses on how Search and Ads work. Google has courses for Nooglers (new Google employees) called Life of a Query that details the dynamics of how a search query is implemented and Life of a Dollar to show how Ads work.

3 Patrick Copeland covers the dawn of the SET in his preface to this book.

4 “Twenty percent time” is what Googlers call side projects. It’s more than a concept; it is an official structure that allows any Googler to take a day a week and work on something besides his day job. The idea is that you spend four days a week earning your salary and a day a week to experiment and innovate. It is entirely optional and some former Googlers have claimed the idea is a myth. In our experience, the concept is real and all three of us have been involved in 20 percent projects. In fact, many of the tools outlined in this book are the result of 20 percent efforts that eventually turned into real, funded products. But many Googlers choose to simply work on another product in their 20 percent time, and thus, the concept of a 20 percent contributor is something many products, particularly the new cool ones, enjoy.


This single repository makes a great deal of sense, as engineers moving from project to project have little to relearn and so-called “20 percent contributors”⁴ can be productive from their first day on a project. It also means that any source code is available to any engineer who needs to see it. Web app developers can see any browser code they need to make their job easier without asking permission. They can view code written by more experienced engineers and see how others performed similar tasks. They can pick up code for reuse at the module or even the control structure or data structure level of detail. Google is one company and has one easily searchable (of course!) source repository.

This openness of the codebase, the harmony of the engineering toolset, and companywide sharing of resources have enabled the development of a rich set of shared code libraries and services. This shared code works reliably on Google’s production infrastructure, speeding projects to completion and ensuring few failures due to underlying shared libraries.

Code associated with the shared infrastructure has a special type of treatment by engineers, and those working on it follow a set of unwritten but common practices that speak to the importance of the code and the care that engineers take when modifying it.

• All engineers must reuse existing libraries, unless they have a very good reason not to based on a project-specific need.

• All shared code is written first and foremost to be easily located and readable. It must be stored in the shared portion of the repository so it can be easily located. Because it is shared among various engineers, it must be easy to understand. All code is treated as though others will need to read or modify it in the future.

• Shared code must be as reusable and as self-contained as possible. Engineers get a lot of credit for writing a service that is picked up by multiple teams. Reuse is rewarded far more than complexity or cleverness.

• Dependencies must be surfaced and impossible to overlook. If a project depends on shared code, it should be difficult or impossible to modify that shared code without engineers on dependent projects being made aware of the changes.

• If an engineer comes up with a better way of doing something, he is tasked with refactoring all existing libraries and assisting dependent projects to migrate to the new libraries. Again, such benevolent community work is the subject of any number of available reward mechanisms.⁵

• Google takes code reviews seriously, and, especially with common code, developers must have all their code reviewed by someone with a “readability” in the relevant programming language. A committee grants readabilities after a developer establishes a good track record for writing clean code that adheres to style guidelines. Readabilities exist for C++, Java, Python, and JavaScript: Google’s four primary languages.

• Code in the shared repository has a higher bar for testing (we discuss this more later).
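The practice of surfacing dependencies can be pictured with a small sketch. Assuming dependencies are recorded as a simple mapping from each target to the targets it depends on (a hypothetical representation, not Google's actual tooling), the set of projects that should hear about a change to shared code is the set of its direct and transitive dependents. The function name `affected_projects` and the data layout are illustrative assumptions.

```python
from collections import defaultdict, deque


def affected_projects(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return every target that depends, directly or transitively, on `changed`.

    `deps` maps a target name to the names it depends on. This layout is an
    assumption made for illustration only.
    """
    # Invert the graph: for each dependency, record who depends on it.
    dependents: dict[str, set[str]] = defaultdict(set)
    for target, needs in deps.items():
        for dep in needs:
            dependents[dep].add(target)

    # Breadth-first walk over the inverted graph, starting at the changed code.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in dependents.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

With a mapping in which `maps` depends on `geo` and `strings`, and `geo` depends on `strings`, a change to `strings` would flag both `geo` and `maps` as needing to know about it.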

Platform dependencies are dealt with by minimizing them. Every engineer has a desktop OS as identical as possible to Google’s production system. Linux distributions are carefully managed to keep dependencies at a minimum so that a developer doing local testing on his own machine will likely achieve the same results as if he were testing on the production system. From desktop to data center, the variations between CPU and operating system are minimal.⁶ If a bug occurs on a tester’s machine, chances are it will reproduce on a developer’s machine and in production.

5 The most common of which is Google’s often touted “peer bonus” benefit. Any engineer whose work is positively impacted by another engineer can serve a peer bonus as a thank you. Managers additionally have access to other bonus structures. The idea is that impactful community work be positively reinforced so it keeps happening! Of course, there is also the informal practice of quid pro quo.

All code that deals with platform dependencies is pushed into libraries at the lowest level of the stack. The same team that manages the Linux distributions also manages these platform libraries. Finally, for each programming language Google uses, there is exactly one compiler, which is well maintained and constantly tested against the one Linux distribution. None of this is magic, but the work involved in limiting the impact of multiple environments saves a great deal of testing downstream and reduces hard-to-debug environmental issues that distract from the development of new functionality. Keep it simple, keep it safe.

6 The only local test labs Google maintains outside this common infrastructure are for Android and Chrome OS where various flavors of hardware must be kept on hand to exercise a new build.

Note

Keeping it simple and uniform is a specific goal of the Google platform: a common Linux distribution for engineering workstations and production deployment machines; a centrally managed set of common, core libraries; a common source, build, and test infrastructure; a single compiler for each core programming language; a language-independent, common build specification; and a culture that respects and rewards the maintenance of these shared resources.

The single platform, single repository theme continues with a unified build system, which simplifies working within the shared repository. A build specification language that is independent of a project’s specific programming language directs the build system. Whether a team uses C++, Python, or Java, they share the same “build files.”

A build is achieved by specifying a build target (which is a library, a binary, or a test set) composed of some number of source files. Here’s the overall flow:

1. Write a class or set of functions for a service in one or more source files and make sure all the code compiles.

2. Identify a library build target for this new service.

3. Write a unit test that imports the library, mocks out its nontrivial dependencies, and executes the most interesting code paths with the most interesting inputs.

4. Create a test build target for the unit test.

5. Build and run the test target, making necessary changes until all the tests pass cleanly.

6. Run all required static analysis tools that check for style guide compliance and a suite of common problems.

7. Send the resulting code out for code review (more details about the code review follow), make appropriate changes, and rerun all the unit tests.

The output of all this effort is a pair of build targets: the library build target representing the new service we wish to ship and a test build target that tests the service. Note that many developers at Google perform test-driven development, which means step 3 precedes steps 1 and 2.
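As a rough illustration of steps 1, 3, and 5, here is a minimal Python sketch. The `GreetingService` class, its `UserStore` dependency, and the test names are hypothetical, invented for this example; they are not taken from Google's codebase. The nontrivial dependency is mocked out with the standard `unittest.mock` module so that the test exercises only the service's own logic.

```python
import unittest
from unittest import mock


class UserStore:
    """Hypothetical nontrivial dependency (for example, a remote lookup)."""

    def display_name(self, user_id: int) -> str:
        raise NotImplementedError("would talk to a backend in a real system")


class GreetingService:
    """Hypothetical service under test (step 1)."""

    def __init__(self, store: UserStore) -> None:
        self._store = store

    def greet(self, user_id: int) -> str:
        name = self._store.display_name(user_id).strip()
        return f"Hello, {name}!" if name else "Hello, stranger!"


class GreetingServiceTest(unittest.TestCase):
    """Unit test that mocks the dependency (step 3)."""

    def setUp(self) -> None:
        # create_autospec keeps the mock's method signatures in line with UserStore.
        self.store = mock.create_autospec(UserStore, instance=True)
        self.service = GreetingService(self.store)

    def test_greets_known_user(self) -> None:
        self.store.display_name.return_value = "Ada"
        self.assertEqual(self.service.greet(7), "Hello, Ada!")
        self.store.display_name.assert_called_once_with(7)

    def test_falls_back_on_blank_name(self) -> None:
        self.store.display_name.return_value = "   "
        self.assertEqual(self.service.greet(7), "Hello, stranger!")
```

Saved as a module, the tests can be run with `python -m unittest`, which corresponds loosely to building and running the test target in step 5. In a test-driven workflow, the test class would be written before `GreetingService` itself.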

Larger services are constructed by continuing to write code and link together progressively larger library build targets until the entire service is complete. At this point, a binary build target is created from the main source file that links against the service library. Now you have a Google product that consists of a well-tested standalone binary, a readable, reusable service library with a suite of supporting libraries that can be used to create other services, and a suite of unit tests that cover all the interesting aspects of each of these build targets.
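The way library targets compose into a binary target can be modeled with a toy dependency graph. The sketch below is an assumption-laden illustration, not Google's build system or its build file syntax: the `Target` record, the `build_order` function, and the example target names are all hypothetical. It uses the standard library's `graphlib.TopologicalSorter` to order a goal target after everything it depends on.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Target:
    """Toy build target; `kind` is 'library', 'binary', or 'test' (illustrative)."""

    name: str
    kind: str
    srcs: tuple[str, ...]
    deps: tuple[str, ...] = ()


def build_order(targets: list[Target], goal: str) -> list[str]:
    """Return the goal's transitive dependencies, then the goal, in build order."""
    by_name = {t.name: t for t in targets}

    # Collect only the part of the graph reachable from the goal.
    graph: dict[str, tuple[str, ...]] = {}
    pending = [goal]
    while pending:
        name = pending.pop()
        if name not in graph:
            graph[name] = by_name[name].deps
            pending.extend(by_name[name].deps)

    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical targets: three libraries, one test target, one binary.
EXAMPLE_TARGETS = [
    Target("strings", "library", ("strings.py",)),
    Target("storage", "library", ("storage.py",), deps=("strings",)),
    Target("service", "library", ("service.py",), deps=("storage", "strings")),
    Target("service_test", "test", ("service_test.py",), deps=("service",)),
    Target("server", "binary", ("main.py",), deps=("service",)),
]
```

Calling `build_order(EXAMPLE_TARGETS, "server")` builds the supporting libraries before the service library and the binary last. The test target is left out because the binary does not depend on it.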

A typical Google product is a composition of many services and the goal is to have a 1:1 ratio between a SWE and a service on any given product team. This means that each service can be constructed, built, and tested in parallel and then integrated together in a final build target once they are all ready. To enable dependent services to be built in parallel, the interfaces that each service exposes are agreed on early in the project. That way, developers take dependencies on agreed-upon interfaces rather than the specific libraries that implement them. Fake implementations of these interfaces are created early to unblock developers from writing their service-level tests.
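A minimal sketch of this idea in Python follows. The `SpellChecker` interface, the `FakeSpellChecker` class, and the `annotate` function are hypothetical names chosen for illustration. The point is only that the dependent code is written against the agreed interface, so a simple in-memory fake is enough to test it before the real service exists.

```python
from typing import Protocol


class SpellChecker(Protocol):
    """Hypothetical interface agreed on early by the two teams."""

    def suggest(self, word: str) -> list[str]:
        ...


class FakeSpellChecker:
    """In-memory fake that stands in until the real service is available."""

    def __init__(self, corrections: dict[str, list[str]]) -> None:
        self._corrections = corrections

    def suggest(self, word: str) -> list[str]:
        # Return a copy so callers cannot mutate the fake's canned data.
        return list(self._corrections.get(word.lower(), []))


def annotate(text: str, checker: SpellChecker) -> str:
    """Dependent code written only against the SpellChecker interface."""
    words = []
    for word in text.split():
        suggestions = checker.suggest(word)
        words.append(f"{word}[{suggestions[0]}?]" if suggestions else word)
    return " ".join(words)
```

A service-level test can then construct `FakeSpellChecker({"teh": ["the"]})` and pass it to `annotate`. When the real implementation is ready, it replaces the fake without any change to `annotate`.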

SETs are involved in much of the test target builds and identify places where small tests need to be written. But it is in the integration of multiple build targets into a larger application build target where their work steps up and larger integration tests are necessary. On an individual library build target, mostly small tests (written by the SWE who owns that functionality with support from any SET on the project) are run. SETs get involved and write medium and large tests as the build target gets larger.

As the build target increases in size, small tests written against integrated functionality become part of the regression suite. They are always expected to pass and when they fail, bugs are raised against the tests and are treated no differently than bugs in features. Test is part of the functionality, and buggy tests are functional bugs and must be fixed. This ensures that new functionality does not break existing functionality and that any code modifications do not break any tests.

In all of this activity, the SETs are centrally involved. They assist developers in deciding what unit tests to write. They write many of the mocks and fakes. They write medium and large integration tests. It is this set of tasks the SET performs that we turn to now.


In document How Google Tests Software (Page 48-53)