Automatic Accessibility and Usability Validation

2.3 Existing Usability Tool Support for Web Developers

2.3.4 Automatic Accessibility and Usability Validation

To assess the accessibility and usability of existing web pages, automated validators can be utilised. These programs are given the URL of a web page. They download the HTML code of the page, or in some cases all pages of a given website. Subsequently, the HTML is analysed and possible problems with the page are listed in a report. In [Brajnik04Comparing], the quality of a good automated usability validator's report has been summarized using the three aspects below. Figure 2.10 helps to make them clear by comparing the set of actual usability problems that are present in a website with the set of problems that is reported by a validator.

Figure 2.10: Sets of actual usability problems which are present in a website vs. usability problems reported by a validator (panels labelled "100% complete" and "100% correct", each contrasting the actual and reported sets; correctly reported problems, false negatives and false positives are marked). When comparing validators, higher completeness of a tool means it has fewer false negatives, higher correctness means fewer false positives.

• Completeness: When restricting oneself to the usability problems which are actually present in the website, the completeness of a tool corresponds to the fraction of these usability problems that are correctly identified and reported. In other words, the completeness value specifies how well the tool reduces the number of false negatives, i.e. actual usability problems which are not detected. It does not make any statement regarding issues which are reported even though they are not present in the site. The left part of Figure 2.10 shows an extreme example: if a tool outputs an excessive number of problems, it will reach a high completeness, even if many of the reported problems are not really present.

Tools with high completeness can identify many issues – this way, their built-in expert knowledge can supplement the web developer’s own knowledge about usability guidelines.

• Correctness: The more correct a tool, the higher the percentage of usability issues in its reports which really are problems with the actual website. This means that higher correctness implies a reduction of false positives, i.e. properties which are incorrectly reported as problems even though they are not. Correctness is thus a counterpart to completeness, a symmetry that is also visible in Figure 2.10: on the right, the actual/reported problem sets for a fully correct tool are shown. A tool can achieve high correctness by only outputting problems whose presence it is certain of.

The important aspect of correctness is that tools must be able to suppress incorrect occurrences of problems in their reports to an acceptable level. A large number of false positives can make the report too long to read and can prevent actual problems from being spotted.

Because false positives are less problematic than false negatives (they only cause additional work for the developer), current tools tend to err on this side. However, this is far from ideal: too many false positives will cause the developer to stop using the tool because there is no perceived benefit compared to just checking the website manually.

Figure 2.11: Ensuring syntactical correctness of web pages (e.g. using the W3C validator, http://validator.w3.org) is a first step towards good accessibility of a website.

• Specificity: The tool's report describes the problem in as much detail as possible, or, from another point of view, it distinguishes between different possible errors for each property that is checked. This requirement follows from the assumption that the web developer may not be a usability expert, so he must be given enough information to understand the reported problems.
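In information retrieval terms, completeness is the *recall* of a validator and correctness is its *precision*. If A is the set of actual problems and R the set of reported ones, completeness = |A ∩ R| / |A| and correctness = |A ∩ R| / |R|. The following minimal sketch computes both values. The problem identifiers are hypothetical, and the value returned for an empty set is a convention chosen here, not one taken from [Brajnik04Comparing].

```python
def completeness(actual: set[str], reported: set[str]) -> float:
    """Fraction of actual problems that the tool reported (fewer false negatives -> higher)."""
    if not actual:
        return 1.0  # convention chosen here: nothing to find, so nothing was missed
    return len(actual & reported) / len(actual)


def correctness(actual: set[str], reported: set[str]) -> float:
    """Fraction of reported problems that are actual problems (fewer false positives -> higher)."""
    if not reported:
        return 1.0  # convention chosen here: an empty report has no false positives
    return len(actual & reported) / len(reported)


# Hypothetical problem identifiers, for illustration only.
actual = {"img-no-alt", "low-contrast", "missing-lang"}
reported = {"img-no-alt", "missing-lang", "table-no-summary", "frame-no-title"}

print(f"completeness = {completeness(actual, reported):.2f}")  # 2 of 3 actual problems found
print(f"correctness  = {correctness(actual, reported):.2f}")   # 2 of 4 reports are real
```

The "report everything" extreme in Figure 2.10 corresponds to `reported` being a superset of `actual`. Completeness is then 1.0, however low correctness becomes.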

Because Section 6.1 contains a detailed description of previous work in this area, only an overview of important tool properties is presented in the following.

Evaluation of single page or entire website: Many validators only download and analyse a single HTML page, for example WAVE [Kasday00WAVE]. In contrast, others such as Bobby [WatchfireBobby] download all pages which are up to a specified number of clicks away from the website's homepage. In the latter case, the tool's validation heuristics can also take the website's link structure into account, which can make the tool more powerful.
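Collecting "all pages up to a specified number of clicks away from the homepage" is a breadth-first search with a depth limit. The sketch below assumes the link structure is already available as an in-memory dictionary. A real tool would fill this dictionary by downloading each page and extracting its links. The site structure is hypothetical.

```python
from collections import deque


def pages_within(links: dict[str, list[str]], home: str, max_clicks: int) -> dict[str, int]:
    """Return every page reachable from `home` in at most `max_clicks` clicks,
    mapped to its minimum click distance (breadth-first search)."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        if depth[page] == max_clicks:
            continue  # do not follow links beyond the click limit
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site structure: page -> pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/products/b"],
    "/products/a": ["/products/a/specs"],
    "/about": ["/"],
}

print(pages_within(site, "/", max_clicks=2))
# {'/': 0, '/products': 1, '/about': 1, '/products/a': 2, '/products/b': 2}
```

The returned click distances are themselves link-structure information. For example, a hypothetical heuristic could flag pages that are only reachable via many clicks.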

Simple syntax checks or complicated usability heuristics: Different tools validate the page using different approaches. At the conceptually simplest level (e.g. the W3C or Schneegans validators, http://validator.w3.org, Figure 2.11, and http://schneegans.de/sv/), they only verify that the syntax of the page's (X)HTML and CSS is correct. More advanced validators instead aim at detecting a set of more abstract accessibility problems with the page. For example, the ATRC web accessibility checker (formerly A-Prompt, http://checker.atrc.utoronto.ca) contains code to detect problems with the WCAG conformance of pages. Even more abstract tests are imaginable; Chapter 6 will present a few additional ideas and approaches for implementing them.
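The following minimal sketch shows an accessibility check that goes beyond syntax, built on Python's standard `html.parser` module. It reports `<img>` elements that lack an `alt` attribute, reflecting the WCAG requirement that images have a text alternative. This is an illustration only, not the ATRC checker's code. Real checkers also have to judge whether an existing alt text is meaningful.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> elements that have no alt attribute at all.

    An empty alt="" is accepted, since it marks a purely decorative image.
    """

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[tuple[int, int]] = []  # (line, column) of each offending tag

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.problems.append(self.getpos())

    # Self-closing <img ... /> goes through handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.


page = """<p>Logo: <img src="logo.png"></p>
<p><img src="spacer.gif" alt=""> <img src="chart.png"/></p>"""

checker = MissingAltChecker()
checker.feed(page)
checker.close()
for line, col in checker.problems:
    print(f"line {line}, column {col}: <img> without alt attribute")
```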

WebTango [Ivory01Metrics] is special in that it does not check guidelines directly using an algorithmic implementation of usability guidelines. Instead, it measures the “web quality” of a site using a statistical method: First, a large number of sites with a given usability rating (determined by human experts) are inspected by the tool, and about 150 different measures are calculated, such as the amount of text on a page or its loading speed. Later, these measures are compared to those of a new site which is to be evaluated. Using e.g. linear regression, the new site's probable human expert rating is calculated.

Figure 2.12: The report of the TAW accessibility checker (http://www.tawdis.net) highlights problems using an annotated version of the analysed page.
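The statistical approach can be illustrated with ordinary least-squares linear regression. The model is fitted on sites with known expert ratings and then applied to a new site. The sketch below is not WebTango's model. It uses only two hypothetical measures instead of about 150, and the training data is invented so that the fitted weights are easy to check by hand. The normal equations are solved with plain Gaussian elimination to stay dependency-free.

```python
def fit_linear(features: list[list[float]], ratings: list[float]) -> list[float]:
    """Ordinary least squares: weights w (w[0] = intercept) minimising
    sum((w[0] + w[1]*x1 + w[2]*x2 + ... - rating)**2), via the normal equations."""
    rows = [[1.0] + list(x) for x in features]  # prepend a constant term
    n = len(rows[0])
    # Augmented matrix of the normal equations (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ratings))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (a[i][n] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(w: list[float], x: list[float]) -> float:
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Invented training data. Hypothetical measures per site:
# [words of text per page / 100, page load time in seconds]
training_sites = [[2, 1], [4, 1], [2, 3], [6, 2], [3, 4]]
expert_ratings = [5.0, 6.0, 3.0, 6.0, 2.5]

weights = fit_linear(training_sites, expert_ratings)
print("weights:", [round(v, 3) for v in weights])
print(f"predicted rating of new site: {predict(weights, [4, 2]):.2f}")
```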

Fully automatic and interactive operation: Many of the available validators operate fully automatically. This is advantageous because it facilitates integrating automatically scheduled checks into the web development workflow, and because it makes large-scale accessibility studies of many websites possible, e.g. the study carried out in [Marincu04Comparative].

However, asking the user for additional information also has advantages. For example, if the heuristic algorithm for a certain usability guideline encounters a case where it is unsure, it can ask the developer for details and take the answer into account when it encounters the same situation again in the future. This way, a small amount of additional information can improve validation quality by eliminating numerous identical false positives in the usability report. NAUTICUS [Correani04Nauticus] is an example of a tool which supports interactive operation.
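Remembering the developer's answers is what turns one question into many suppressed false positives. The following minimal sketch illustrates this, and is not how NAUTICUS is implemented. The example heuristic is hypothetical: an empty `alt=""` is only correct for decorative images, which the tool cannot decide by itself. The `ask` callback stands in for a dialogue box. Here it is scripted so that the example runs unattended.

```python
class InteractiveValidator:
    """Remembers the developer's answers so each uncertain case is asked only once."""

    def __init__(self, ask):
        self.ask = ask      # callable: question -> bool (e.g. a GUI dialogue)
        self.answers = {}   # image src -> True if the developer says it is decorative

    def check_empty_alt(self, pages):
        """pages: page URL -> list of (src, alt) pairs for the page's <img> tags."""
        report = []
        for page, images in pages.items():
            for src, alt in images:
                if alt != "":
                    continue  # non-empty alt: this heuristic has nothing to say
                # alt="" is fine only for decorative images; the tool cannot tell.
                if src not in self.answers:
                    self.answers[src] = self.ask(f"Is {src} purely decorative?")
                if not self.answers[src]:
                    report.append(f"{page}: {src} needs a non-empty alt text")
        return report


questions = []

def scripted_developer(question):
    """Stands in for a real dialogue: only spacer.gif is decorative."""
    questions.append(question)
    return "spacer.gif" in question


site = {
    "/":      [("spacer.gif", ""), ("chart.png", "")],
    "/about": [("spacer.gif", ""), ("chart.png", "")],
    "/news":  [("spacer.gif", ""), ("logo.png", "ACME Corp.")],
}
validator = InteractiveValidator(scripted_developer)
for line in validator.check_empty_alt(site):
    print(line)
print(len(questions), "questions asked")
```

Five empty `alt` attributes are examined, but the developer is asked only twice. The three occurrences of `spacer.gif` no longer appear as false positives.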

Presentation of accessibility/usability report: When it comes to presenting the results of the validation, the available validators can be divided into several groups. The “classical” type of output is a web page (or a dialogue in a desktop GUI program) which lists the failed tests for inspection by the developer and optionally provides information on how to repair the problems. One of the first accessibility checkers, the original Bobby service (no longer available as an online service, now replaced by WebXACT), fell into this category. An alternative way of highlighting problems is used by TAW (Test de Accesibilidad Web, http://www.tawdis.net) and other tools: as the test result, the tool returns the web page it has just analysed, annotated with additional icons for each problematic feature (see Figure 2.12). Finally, some validators such as ATRC can output their report in a machine-readable format, for which the W3C Evaluation and Report Language (EARL) [W3C-EARL] is the predominant standard.
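A machine-readable report lets other programs, such as editors or aggregating studies, process the results. The following simplified sketch serialises failed tests as EARL assertions in Turtle syntax. It uses terms from the W3C EARL 1.0 Schema vocabulary (namespace `http://www.w3.org/ns/earl#`), but should be checked against the specification before being relied upon. The tool and test URIs are hypothetical placeholders.

```python
def turtle_string(text: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def earl_report(page_url, tool_uri, failures):
    """Serialise failed tests as EARL assertions in Turtle syntax.

    failures: list of (test_uri, description) pairs, one per failed test.
    """
    lines = [
        "@prefix earl: <http://www.w3.org/ns/earl#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
    ]
    for test_uri, description in failures:
        lines += [
            "[] a earl:Assertion ;",
            f"   earl:assertedBy <{tool_uri}> ;",
            f"   earl:subject <{page_url}> ;",
            f"   earl:test <{test_uri}> ;",
            "   earl:result [",
            "      a earl:TestResult ;",
            "      earl:outcome earl:failed ;",
            f"      dct:description {turtle_string(description)}",
            "   ] .",
            "",
        ]
    return "\n".join(lines)


report = earl_report(
    "http://www.example.org/index.html",
    "urn:example:my-validator",  # hypothetical tool identifier
    [("urn:example:test:img-alt", 'Image "chart.png" has no alt attribute'),
     ("urn:example:test:html-lang", "Document language is not specified")],
)
print(report)
```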

Web-based service or desktop GUI application: With regard to the way they are implemented, automated usability validators can be divided into two types of tools. Many programs are provided via a website on a server and download web pages that are specified using an HTML form (e.g. ATRC). Other tools are GUI applications that are installed on a desktop computer and used by a single user at a time (e.g. MAGENTA, http://giove.isti.cnr.it/accessibility/magenta/). In some cases, both are possible, such as with WebXACT, which offers a web-based service of the same name and a desktop application under the name Bobby. (The original Bobby was a web-based service.)

Hard-coded heuristics and extensible rulesets: The majority of available tools implement accessibility and usability checks with specially written code, using e.g. Java or C. Adding more rules usually requires good knowledge of the programming language that was used, and a recompilation of the validation program. Thus, adding rules can become quite complicated. Additionally, since the person who writes new rules ideally needs to be both a programmer and a usability expert, the two roles cannot be separated easily.

For these reasons, some approaches concentrate on developing domain-specific languages for the description of usability tests, and provide mechanisms for extending the validator's set of guidelines at runtime. Examples include the Kwaresmi tool [Beirekdar02Kwaresmi] with its Guideline Definition Language (GDL) and EvalIris [Abascal03EvalIris], which uses an XML-based guideline language.
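In the rule-based design, the validator engine stays fixed and guidelines are data loaded at runtime. The sketch below shows this separation. The JSON rule format is a deliberately tiny, hypothetical one ("element X must have attribute Y") and does not reproduce GDL or EvalIris's XML language. A usability expert can add a rule by editing the rule data without touching or recompiling the engine.

```python
import json
from html.parser import HTMLParser

# Hypothetical rule format: each rule requires an attribute on an element.
RULES_JSON = """[
  {"id": "img-alt", "element": "img", "required_attribute": "alt",
   "message": "Image has no text alternative"},
  {"id": "html-lang", "element": "html", "required_attribute": "lang",
   "message": "Document language is not specified"}
]"""


class RuleEngine(HTMLParser):
    """Generic engine: knows nothing about specific guidelines, only the rule format."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.violations = []  # (rule id, message, line number)

    def handle_starttag(self, tag, attrs):
        present = {name for name, _ in attrs}
        for rule in self.rules:
            if tag == rule["element"] and rule["required_attribute"] not in present:
                self.violations.append((rule["id"], rule["message"], self.getpos()[0]))


def validate(html, rules):
    engine = RuleEngine(rules)
    engine.feed(html)
    engine.close()
    return engine.violations


rules = json.loads(RULES_JSON)
# Extending the guideline set at runtime, without changing the engine code:
rules.append({"id": "frame-title", "element": "iframe", "required_attribute": "title",
              "message": "Inline frame has no title"})

page = '<html><body><img src="a.png"><iframe src="map.html"></iframe></body></html>'
for rule_id, message, line in validate(page, rules):
    print(f"[{rule_id}] line {line}: {message}")
```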

Statistics-based tools like WebTango do not fit in either of the two categories of hard-coded tests or extensible rulesets.

Automatic critique: In addition to identifying problems, a tool can provide suggestions for how to repair them. In many cases, it can achieve this simply by adding to the usability report some fixed text describing possible solutions. This is done by many validators, e.g. ATRC. A more advanced form of automated critique is to provide suggestions which are tailored to the exact instance of the problem. For example, rather than telling the user to add an alt attribute to his <img> tag, the tool will cite the relevant portion of the HTML document, with a dummy alt attribute already inserted.
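A tailored suggestion of this kind can be built from the original tag text. Python's `HTMLParser.get_starttag_text()` returns the start tag exactly as written in the document, so the suggestion can quote it and show an edited copy. The following minimal sketch illustrates the idea. The dummy text `TODO: describe this image` is an arbitrary placeholder chosen for the example.

```python
import re
from html.parser import HTMLParser

DUMMY_ALT = ' alt="TODO: describe this image"'  # placeholder chosen for this example


class AltCritic(HTMLParser):
    """For each <img> without alt, records (line, original tag, suggested tag)."""

    def __init__(self):
        super().__init__()
        self.suggestions = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            original = self.get_starttag_text()  # tag text exactly as in the source
            # Insert the dummy attribute just before the closing ">" or " />".
            fixed = re.sub(r"(\s*/?>)$", DUMMY_ALT + r"\1", original, count=1)
            self.suggestions.append((self.getpos()[0], original, fixed))


page = ('<p>Sales: <img src="chart.png"></p>\n'
        '<p><img src="logo.png" alt="ACME" /> <img src="map.png" /></p>')

critic = AltCritic()
critic.feed(page)
critic.close()
for line, original, fixed in critic.suggestions:
    print(f"line {line}: replace  {original}\n        with     {fixed}")
```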

Automatic repair: As an extension of automatic critique, a tool can make it easy for the developer to apply the suggested solution to the website by integrating analysis and editing support into one application. The result is an automated interactive repair system such as NAUTICUS. Non-interactive operation is also imaginable: at the level of syntactical correctness of the HTML code, HTML Tidy (http://tidy.sourceforge.net) can be used to turn documents into valid HTML.
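The following sketch gives a flavour of non-interactive syntactic repair, and is not how HTML Tidy works internally. It re-emits a document, closes elements that were left open, and drops end tags that have no matching start tag. It deliberately ignores HTML's optional end-tag rules (e.g. for `<li>` and `<p>`), processing instructions and attribute errors, all of which a real repair tool must handle.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Re-emits a document with every non-void element properly closed."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; verbatim
        self.out = []
        self.open = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.open.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing: nothing to track

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag without a matching start tag: drop it
        while self.open:  # close any elements still open inside this one
            top = self.open.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def repaired(self):
        self.close()
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.open))


balancer = TagBalancer()
balancer.feed("<div><p>Text &amp; <b>bold</p></span></div><em>end")
print(balancer.repaired())
# <div><p>Text &amp; <b>bold</b></p></div><em>end</em>
```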

Conclusion

It is noticeable that each of the tools and services listed above concentrates on a certain area of automatic validation. Taken together, the features of the tools are impressive, but each individual tool has its shortcomings, usually either in terms of completeness, i.e. the number of different usability aspects that are tested, or with regard to correctness, e.g. the number of incorrectly reported problems. Some academic prototypes are only meant to demonstrate a concept and are not suitable for productive use. Today, no single tool performs very well in all three aspects cited by [Brajnik04Comparing] – while the foundations of the different tool concepts have been laid, there is room for improvement in many areas.

In document Atterer, Richard (2008): Usability Tool Support for Model-Based Web Development. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 39-44)