Risk Analysis

Risk is everywhere. At home, on the roads, and at our workplace, everything we do has an element of risk, and shipping software is no different.

We buy safer cars and practice defensive driving to mitigate the risk of driving. At work, we watch what we say in meetings and try to find projects suitable to our skill set to mitigate the risk of losing our job. How do we mitigate the risk of shipping software? How can we navigate the overwhelming odds that the software will fail (after all, no software is perfect) and cause untold damage to our company’s reputation?

Certainly, not shipping the software isn’t an option, even though it would completely negate the risk of failure. The enterprise benefits from well-calculated risks.

Note that we did not say “well quantified” risks. Risk, at least for our purposes, isn’t something that requires mathematical precision. We walk on sidewalks rather than in the street, not because of a formula that shows a 59 percent decrease in risk, but because of a general knowledge that roads are not a safe place for pedestrians. We buy cars with air bags not because we know the math behind increasing our survival chances in case of a wreck but because they represent an obvious mitigation to the risk of breaking our faces on the steering wheel. Risk mitigations can be incredibly powerful without a great deal of precision. The process of determining risk is called risk analysis.

For software testing, we follow a common-sense process for understanding risk. The following questions have been helpful to us when understanding risk:

• What events are we concerned about?

• How likely are these events?

• How bad would they be to the enterprise?

• How bad would they be for customers?

• What mitigations does the product implement?

• How likely is it that these mitigations would fail?

• What would be the cost of dealing with the failure?

• How difficult is the recovery process?

• Is the event likely to recur or be a one-time problem?

There are many variables in determining risk, which makes quantifying it more trouble than mitigating it. At Google, we boil risk down to two primary factors: frequency of failure and impact. Testers assign simple values for each of these factors to each capability. We’ve found that risk is more useful as a relative, qualitative value than as an absolute number. It isn’t about assigning an accurate value; it’s about determining whether one capability is more or less risky than another capability. This is enough to determine which capabilities to test and which order to test them in. GTA presents the options, as shown in Figure 3.8.

FIGURE 3.8 Estimating risk in terms of frequency and impact in GTA for Google+.

GTA uses four predefined values for frequency of occurrence:

• Rarely: It’s hard to imagine a case where failure would occur and recovery would not be trivial.

• Example: Download page for Chrome.⁴ The content is largely static, parameterized only for auto-detection of the client OS. Even if the core HTML or script on the page broke, monitoring code would detect the break quickly.

• Seldom: There are cases where failure can occur, but low complexity or low usage would make such occurrences rare.

• Example: The Forward button in Chrome. This button is used, but far less frequently than the Back button. Historically, it doesn’t fail often, and even if it did regress, we would expect our early adopters on the early release channels to catch this issue quickly as it would be fairly obvious.

⁴ The web page where people can download Chrome is http://www.google.com/chrome.

• Occasionally: Failure circumstances are readily imaginable, somewhat complex, and the capability is one we expect to be popular.

• Example: Chrome Sync capabilities. Chrome synchronizes the bookmarks, themes, form-fill, history, and other user profile data across clients. There are different data types and multiple OS platforms, and merging changes is a somewhat complex computer science problem in its own right. Users are also likely to notice if this data isn’t synchronized. Synchronization happens only when data to be synchronized changes, like when a new bookmark has been added.

• Often: The capability is part of a high-use, high-complexity feature that experiences failure on a regular basis.

• Example: Rendering web pages. This is the primary use case of a browser. Rendering HTML, CSS, and JavaScript code of whatever origin and quality is the principal task a browser performs. If any of this code fails, it’s the browser that the user blames. Risk increases when you consider such a failure on a high-traffic site.

Rendering issues aren’t always caught by users either; they often result in elements that are slightly misaligned but still functional, or in elements that go missing without the user knowing they should be there.

Testers choose one of these values for each capability. We used an even number of values on purpose to keep testers from simply picking the middle value. It’s important to think about it a bit more carefully.

Estimating impact takes a similarly simple approach and is also based on choosing from an even number of possibilities (more examples from the Chrome browser):

• Minimal: A failure the user might not even notice.

• Example: Chrome Labs. This is optional functionality. Failure to load the chrome://labs page would affect few users. This page contains optional Chrome experimental features. Most users don’t even know they are there; the features themselves are labeled “use at your own risk” and don’t pose a threat to the core browser.

• Some: A failure that might annoy the user. If noticed, retry or recovery mechanisms are straightforward.

• Example: Refresh button. If this fails to refresh the current page, the user can retype the URL in the same tab, simply open a new tab to the URL, or even restart the browser in the extreme case.

The cost of the failure is mostly annoyance.

• Considerable: A failure would block usage scenarios.

• Example: Chrome extensions. If users installed Chrome extensions to add functionality to their browser and those extensions failed to load in a new version of Chrome, the functionality of those extensions is lost.

• Maximal: A failure would permanently damage the reputation of the product and cause users to stop using it.

• Example: Chrome’s autoupdate mechanism. Should this feature break, it would deny critical security updates or perhaps even lead the browser to stop working.

Sometimes the impact to the enterprise and the user are at odds. A banner ad that fails to load is a problem for Google but might not even be noticed by a user. It is good practice to note whether risk to the enterprise or risk to the user is being considered when assigning a score.
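The two scales can be sketched in a few lines of Python. This is only an illustration, not how GTA works internally: the numeric values 1 through 4, the choice to multiply frequency by impact, the `perspective` field, and the impact values paired with each capability below are all assumptions made for the example. Only the relative ordering of the results is meant to matter.

```python
from dataclasses import dataclass
from enum import IntEnum


class Frequency(IntEnum):
    """The four frequency-of-failure values, mapped to assumed ordinals."""
    RARELY = 1
    SELDOM = 2
    OCCASIONALLY = 3
    OFTEN = 4


class Impact(IntEnum):
    """The four impact values, mapped to assumed ordinals."""
    MINIMAL = 1
    SOME = 2
    CONSIDERABLE = 3
    MAXIMAL = 4


@dataclass(frozen=True)
class Capability:
    name: str
    frequency: Frequency
    impact: Impact
    # Hypothetical field: records whose risk the score describes.
    perspective: str = "user"  # "user" or "enterprise"

    @property
    def risk(self) -> int:
        # Assumption: combine the two ordinals by multiplying them.
        # The result is a relative score, not a calibrated probability.
        return int(self.frequency) * int(self.impact)


def rank_by_risk(capabilities):
    """Return capabilities ordered from most to least risky."""
    return sorted(capabilities, key=lambda c: c.risk, reverse=True)


# Frequencies follow the Chrome examples; impacts are illustrative guesses.
capabilities = [
    Capability("Load download page", Frequency.RARELY, Impact.MINIMAL),
    Capability("Forward button", Frequency.SELDOM, Impact.SOME),
    Capability("Sync bookmarks", Frequency.OCCASIONALLY, Impact.CONSIDERABLE),
    Capability("Render web pages", Frequency.OFTEN, Impact.MAXIMAL),
]

for cap in rank_by_risk(capabilities):
    print(f"{cap.risk:>2}  {cap.name}  ({cap.perspective})")
```

The ranked list is the useful output: it suggests which capabilities to test first, without claiming that a score of 16 is exactly four times as risky as a score of 4.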

We can generate a heat map of risk areas of Google Sites based on the values entered by the tester and the Attribute-Component grid shown earlier. This appears in Figure 3.9.

FIGURE 3.9 Heat map of risk for attribute-component grid for an early version of Google+.

The entries in the grid light up as red, yellow, or green depending on the risk level of the components assigned to those intersections. It is a simple calculation of risk for each of the values you’ve entered: we simply take the average of each capability’s risk. GTA generates this map, but a spreadsheet can also be used.
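For readers who prefer a script to a spreadsheet, the averaging step might look like the following sketch. The attribute and component names, the 1 to 16 risk scale, and the color thresholds (`yellow_at`, `red_at`) are hypothetical; the text states only that each cell shows the average risk of its capabilities as red, yellow, or green.

```python
from collections import defaultdict
from statistics import mean

# Each row: (attribute, component, risk score of one capability).
# Names and scores are placeholders, on an assumed 1-16 scale.
scored_capabilities = [
    ("Social", "Profile", 12),
    ("Social", "Profile", 16),
    ("Social", "Stream", 6),
    ("Fast", "Stream", 2),
    ("Fast", "Stream", 4),
]


def cell_color(average_risk, yellow_at=5.0, red_at=10.0):
    """Map an average risk to a color using assumed thresholds."""
    if average_risk >= red_at:
        return "red"
    if average_risk >= yellow_at:
        return "yellow"
    return "green"


def build_heat_map(rows):
    """Average capability risk per (attribute, component) cell and color it."""
    cells = defaultdict(list)
    for attribute, component, risk in rows:
        cells[(attribute, component)].append(risk)
    return {cell: cell_color(mean(risks)) for cell, risks in cells.items()}


heat_map = build_heat_map(scored_capabilities)
for (attribute, component), color in sorted(heat_map.items()):
    print(f"{attribute:<8}{component:<10}{color}")
```

Cells with no capabilities simply do not appear in the result, which matches an empty intersection in the grid.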

This diagram represents the testable capabilities of the product and their risk as you assign the values. It’s difficult to keep bias from these numbers, and testers do represent a specific point of view. We’re careful to solicit feedback from other stakeholders as well. Following is a list of stakeholders and some suggestions about getting them involved in assigning these risk values:

• Developers: Most developers, when consulted, will assign the most risk to the features they own. If they wrote the code, they want tests for it! It’s been our experience that developers overrate the features they own.

• Program Managers: PMs are also humans and introduce their own biases. They favor the capabilities they see as most important. In general, they favor the features that make the software stand out from its competition and make it “pop.”

• Salespeople: Sales are the ones who get paid for attracting users. They are biased toward features that sell the product and look good in demos.

• Directors and VPs: Executives often highlight the features that set the software apart from its major competitors.

Obviously, all stakeholders have significant biases, so our approach has been to solicit all their opinions and have them each separately score the capabilities using the previous two scales. It isn’t always easy to get their participation, but we’ve hit upon a strategy that has been successful.

Instead of explaining the process and convincing them to help, we simply do it ourselves and present them with our heat map. Once they see our bias, they are quick to supply their own. Developers participate en masse if they know we are using the map to prioritize testing; the same goes for PMs and salespeople. They all have a stake in quality.

There is power in this approach. In determining risk ourselves, we have undoubtedly come to a conclusion they will argue with. Indeed, in presenting our risk analysis as the basis for all forthcoming tests, we have given them something to argue about! And that is the point. Instead of asking their opinions about a concept they consider nebulous, we have shown them a specific conclusion they can argue against. People are often ready to tell you what the answer is not. We also avoid having everyone wade through all data for which they have little interest or context. With this little ruse in our toolkit, we generally get a lot of focused input that we can factor into the risk calculations.
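One way to factor that input in is sketched below. The text does not say how the separate scores are combined, so taking the median per capability and flagging wide disagreement are assumptions made purely for illustration, as are the stakeholder labels, the scores, and the `spread >= 2` cutoff.

```python
from statistics import median

# Hypothetical (frequency, impact) scores on the 1-4 scales,
# collected separately from each stakeholder group.
scores = {
    "Sync bookmarks": {
        "tester": (3, 3),
        "developer": (4, 3),
        "program manager": (3, 4),
        "sales": (2, 2),
    },
    "Forward button": {
        "tester": (2, 2),
        "developer": (2, 2),
        "program manager": (1, 2),
        "sales": (2, 1),
    },
}


def summarize(per_stakeholder):
    """Return median frequency and impact plus the widest disagreement."""
    frequencies = [f for f, _ in per_stakeholder.values()]
    impacts = [i for _, i in per_stakeholder.values()]
    spread = max(
        max(frequencies) - min(frequencies),
        max(impacts) - min(impacts),
    )
    return {
        "frequency": median(frequencies),
        "impact": median(impacts),
        "spread": spread,
    }


summary = {name: summarize(s) for name, s in scores.items()}
for name, s in summary.items():
    # Assumed cutoff: a gap of two or more scale steps is worth a conversation.
    flag = "  <- discuss" if s["spread"] >= 2 else ""
    print(f"{name}: frequency={s['frequency']}, impact={s['impact']}{flag}")
```

The median damps any single group's bias, while the spread points at the capabilities where the argument described above is most likely to be productive.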

Once risk is generally agreed upon, it’s time to start mitigating it.
