ing system provides an example of a usability specification.6 Table 1 is a summary of the usability specification for the first version of the VAX NOTES system. Five items are defined for each attribute: the measuring technique, the metric, the worst-case level, the planned level, and the best-case level.
Software Productivity Tools
Table 1  Summary Usability Specification for VAX NOTES Version 1.0

Usability           Measuring                   Metric                       Worst-Case  Planned  Best-Case
Attribute           Technique                                                Level       Level    Level
------------------  --------------------------  ---------------------------  ----------  -------  ---------
Initial use         NOTES benchmark task        Number of successful         1-2         3-4      8-10
                                                interactions in 30 minutes
Initial evaluation  Attitude questionnaire      Evaluation score (0 to 100)  50          67       83
Error recovery      Critical-incident analysis  Percent incidents "covered"  10%         50%      100%
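The five items defined for each attribute map naturally onto a small record type. The following is a minimal sketch, not part of the original specification: the class name, field names, and the choice to store every level as a (low, high) pair are assumptions made for illustration. The values themselves are transcribed from Table 1.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: each level is stored as (low, high); a single value v is (v, v).
Level = Tuple[float, float]


@dataclass(frozen=True)
class UsabilityAttribute:
    """One row of a usability specification (names are illustrative)."""
    name: str
    measuring_technique: str
    metric: str
    worst_case: Level
    planned: Level
    best_case: Level


# Values transcribed from Table 1 (VAX NOTES Version 1.0).
VAX_NOTES_V1 = [
    UsabilityAttribute(
        name="Initial use",
        measuring_technique="NOTES benchmark task",
        metric="Number of successful interactions in 30 minutes",
        worst_case=(1, 2), planned=(3, 4), best_case=(8, 10),
    ),
    UsabilityAttribute(
        name="Initial evaluation",
        measuring_technique="Attitude questionnaire",
        metric="Evaluation score (0 to 100)",
        worst_case=(50, 50), planned=(67, 67), best_case=(83, 83),
    ),
    UsabilityAttribute(
        name="Error recovery",
        measuring_technique="Critical-incident analysis",
        metric='Percent incidents "covered"',
        worst_case=(10, 10), planned=(50, 50), best_case=(100, 100),
    ),
]

for attr in VAX_NOTES_V1:
    print(f"{attr.name}: planned level {attr.planned}")
```

Keeping the measuring technique and metric alongside the levels means each row carries enough context to be read on its own, as in the summary table.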
The measuring technique defines the method used to measure the attribute. Details of the measuring technique (not shown in Table 1) accompany the brief description in the summary table. There are many different techniques for measuring usability attributes. We have usually measured usability attributes by asking users to perform a standardized task in a laboratory setting. We can then use this task as a benchmark for comparing usability attribute levels of different systems.
In the VAX NOTES case, we chose to measure initial use with a 14-item benchmark task that an expert VAX NOTES user could finish in three minutes. Initial users were Digital employees who had experience with the VMS operating system and the Digital Command Language but not with conferencing systems. The users completed their initial evaluations using 10-item Likert-style questionnaires after they finished the benchmark task. Error recovery was measured by a critical incident analysis. In the analysis, we used questionnaires and interviews to collect information about costly errors (critical incidents) made by users of the prototype versions of the VAX NOTES software.
The metric specifies how an attribute is expressed as a measurable quantity. Table 1 shows the definitions of the metrics in the VAX NOTES specification. For the initial-use attribute, the metric was the number of successful interactions in the first 30 minutes of the benchmark task. For the initial-evaluation attribute, we scored the questionnaire on a scale ranging from 0 (strongly negative) to 100 (strongly positive), with 50 representing a neutral evaluation. For error recovery, the metric was the percentage of incidents reported with the prototype systems that would be "covered" (i.e., eliminated) by changes made in version 1.0 of the VAX NOTES system.
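The two computed metrics can be sketched in a few lines. The article does not give the questionnaire's scoring formula, so the version below is an assumption: each item is answered on a 1-to-5 scale and the mean response is rescaled linearly so that the midpoint maps to 50. The function names and the incident counts in the usage lines are hypothetical.

```python
def evaluation_score(responses, low=1, high=5):
    """Rescale the mean Likert response linearly onto 0-100.

    Assumed formula (not from the source): `low` maps to 0, `high` to 100,
    so the scale midpoint maps to 50, the neutral evaluation.
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(r < low or r > high for r in responses):
        raise ValueError("response outside the Likert scale")
    mean = sum(responses) / len(responses)
    return 100.0 * (mean - low) / (high - low)


def percent_covered(covered, reported):
    """Percentage of reported critical incidents eliminated by changes."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    if covered < 0 or covered > reported:
        raise ValueError("covered must lie between 0 and reported")
    return 100.0 * covered / reported


# Hypothetical inputs, for illustration only.
print(evaluation_score([3] * 10))   # ten neutral answers
print(percent_covered(5, 10))       # 5 of 10 incidents covered
```

Under this assumed scoring, a respondent who answers every item at the midpoint lands exactly on the neutral value of 50 described above.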
The worst-case and planned levels define a range from failure to meet minimum acceptable requirements to meeting the specification in full. This range is an extension of Deming's single criterion value, which determines success or failure. It is easier to specify a range of values than a single value for success and failure. Providing a range of values for several attributes also makes it easier to manage trade-offs in levels of quality of different attributes.
The best-case level provides useful management information by estimating the state-of-the-art level for an attribute. The best case is an estimate of the best that could be achieved with this attribute, given enough resources.
For the initial use of VAX NOTES software, we defined the planned level as experiencing 3 or 4 successful interactions in the first half hour of use. We considered 1 or 2 successful interactions to be the minimum acceptable level, and 8 to 10 successful interactions to be the best that could be expected. In practice, the actual level was 13 successful interactions, suggesting that we set the levels for this attribute too conservatively.
The planned level for initial evaluation (67) was fairly positive. Users' neutral feelings were acceptable but negative feelings were not, so we set the worst case at 50. We set the best case at 83, which represented the highest scores we had seen so far when using this questionnaire with other products. The actual tested value was 67, matching the planned level.
Digital Technical Journal
We planned an error-recovery level that could cover 50 percent of the reported critical incidents. The worst-case level was set at a fairly low 10 percent, whereas the best case would be to cover all of the reported critical incidents. In practice, 72 percent of the critical incidents were covered, exceeding the planned level.
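The comparison of measured results against the specified levels can be expressed as a simple classification. This is an illustrative sketch rather than a procedure described in the article: the function name, the category labels, and the treatment of the initial-use ranges (lower bound of the worst-case and planned ranges, upper bound of the best-case range) are all assumptions. The measured values and levels are the ones reported in the text.

```python
def classify(actual, worst, planned, best):
    """Place a measured value relative to the specified levels.

    Assumes higher values are better, as for all three VAX NOTES attributes.
    """
    if actual < worst:
        return "below worst case"
    if actual < planned:
        return "acceptable, below planned"
    if actual <= best:
        return "planned level met"
    return "above best case"


# (actual, worst, planned, best) as reported for VAX NOTES Version 1.0.
# Assumption: for initial use, the ranges 1-2, 3-4, and 8-10 are reduced
# to the thresholds 1, 3, and 10.
RESULTS = {
    "Initial use": (13, 1, 3, 10),
    "Initial evaluation": (67, 50, 67, 83),
    "Error recovery": (72, 10, 50, 100),
}

for name, (actual, worst, planned, best) in RESULTS.items():
    print(f"{name}: {classify(actual, worst, planned, best)}")
```

With these thresholds, initial use falls above the best case while the other two attributes meet or exceed their planned levels, which agrees with the outcomes described in the text.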
Many usability specifications provide further detail by including "now" levels and references. Now levels represent current levels for an attribute, either for the current version of the product or for competitive products. References can be used to add more detail, such as describing how the levels were chosen, and to document the usability specification.
User needs and expectations are shaped in part by the marketplace; therefore, competitive analyses can provide important data for usability specifications. We have constructed usability specifications that compare the system under development to either the current market leader, the product with the most highly acclaimed user interface in the market, or both. We can also compare the systems by measuring usability on appropriate benchmark tasks.