2.4 Audio Unaware Systems
2.4.2 Screen Reading
Unfortunately, the majority of existing desktop GUI applications do not provide direct, consistent, programmatic access to the state and function of their models. As an example, consider that, on the day of this writing, the CNET Top Ten Windows Downloads tracker (http://www.download.com/most-popular-windows-software/) lists only one application that offers a model-based API. Applications that do provide a public API often limit what information an external process can retrieve or manipulate for reasons such as security, data consistency, and protection of intellectual property. For example, Microsoft Office has an API for programmatically manipulating its primary document types: spreadsheet, document, slideshow, database. Yet this API does not cover all of the tasks a user can accomplish when working with these document formats in the Microsoft Office GUI.
Screen reading is an approach to enabling auditory display for GUI programs with closed models. Instead of relying on access to an application model, a screen reader
Console Screen Readers
In 1986, IBM released its first of many screen reader products, the IBM Screen Reader for DOS, a program capable of creating a synthesized speech display for applications running in the 80 by 24 character text-mode console. The software operated by peeking into the video buffer, extracting characters and cursor position, and speaking them to the user. The screen reader defined keyboard commands for querying information on the console immediately (e.g., read the next character, word, line), after a significant change (e.g., read new lines as they scroll into view), or on changes in specific screen regions (e.g., read status text when it changes in the ten bottom-right characters).
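The polling approach described above can be illustrated with a minimal sketch. It is not the IBM implementation: the TextConsole class stands in for the video buffer, and the names ConsoleReader, watch, and poll are hypothetical, chosen only for illustration. The console size follows the 80 by 24 figure given above, and the reader returns strings where a real product would drive a speech synthesizer.

```python
from dataclasses import dataclass, field

COLS, ROWS = 80, 24  # console size as stated in the text above


@dataclass
class TextConsole:
    """Stand-in for a text-mode video buffer: a grid of characters plus a cursor."""
    rows: list = field(default_factory=lambda: [" " * COLS for _ in range(ROWS)])
    cursor: tuple = (0, 0)  # (row, column)

    def write(self, row, col, text):
        """Overwrite part of one row, as an application drawing text would."""
        line = self.rows[row]
        self.rows[row] = (line[:col] + text + line[col + len(text):])[:COLS]


class ConsoleReader:
    """Peeks into the buffer and returns the strings that would be spoken."""

    def __init__(self, console):
        self.console = console
        self.watched = {}  # region name -> (row, start col, end col, last text)

    def read_char(self):
        row, col = self.console.cursor
        return self.console.rows[row][col]

    def read_word(self):
        """Return the space-delimited word under the cursor, or '' on a blank."""
        row, col = self.console.cursor
        line = self.console.rows[row]
        if line[col] == " ":
            return ""
        start = line.rfind(" ", 0, col) + 1
        end = line.find(" ", col)
        return line[start:end if end != -1 else COLS]

    def read_line(self):
        row, _ = self.console.cursor
        return self.console.rows[row].strip()

    def watch(self, name, row, start, end):
        """Remember the current text of a screen region for later comparison."""
        self.watched[name] = (row, start, end, self.console.rows[row][start:end])

    def poll(self):
        """Return the text of every watched region that changed since last poll."""
        changed = []
        for name, (row, start, end, last) in self.watched.items():
            now = self.console.rows[row][start:end]
            if now != last:
                self.watched[name] = (row, start, end, now)
                changed.append(now.strip())
        return changed


console = TextConsole()
console.write(0, 0, "C:\\> dir *.txt")
console.cursor = (0, 6)
reader = ConsoleReader(console)
word_at_cursor = reader.read_word()                 # "dir"

# Watch the ten bottom-right characters, then simulate a status change.
reader.watch("status", ROWS - 1, COLS - 10, COLS)
console.write(ROWS - 1, COLS - 10, "INS")
status_changes = reader.poll()                      # ["INS"]
```

The sketch separates immediate queries (read_char, read_word, read_line) from change-driven reports (watch and poll), mirroring the two kinds of keyboard commands described above.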
Console screen readers are still in use today, particularly in Unix-like environments where use of the text-only console is still practical and powerful. Open source projects such as Speakup (http://www.linux-speakup.org) and YASR (http://yasr.sourceforge.net) are available free of charge and operate on Linux virtual terminals. These new projects mimic most, if not all, of the features of the original IBM product. However, as most modern applications now run in GUI environments, simply relying on a text console for all work is not a solution for people with visual impairments, especially in the workplace.
Off-Screen Model Screen Readers
The advent of the graphical desktop brought with it a host of problems for screen reading. When running in graphics mode, the video buffer is no longer a convenient source of text content that can be directly reported to the user. Instead, it is filled with red, green, and blue color values void of semantic information at the programmatic level. The content of the video buffer is now a visual gestalt.
The first screen readers for GUI desktops overcame this problem by extracting information from earlier stages of the rendering pipeline. Messages from applications to video drivers proved to be useful, though unwieldy, descriptions of the visual display.
The SoundTrack system (Edwards, 1988) was an early attempt at generating an auditory display from this low-level source. As mentioned previously, SoundTrack simply played sounds and spoke the name of widgets as the mouse pointer passed over them, but this interaction proved extremely difficult for users with visual impairments. In contrast, the IBM Screen Reader for OS/2 (SR/2) used its render hooks to reorganize the content of GUI applications into a speech display reminiscent of console screen readers (Thatcher, 1994). Thatcher argued that GUIs and text-based interfaces were different ways of doing the same thing, and that familiar text-based access methods should be the basis of GUI access. This sentiment is echoed, to some extent, by desktop screen readers to this day.
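To make the idea of reorganizing render messages into a console-like display concrete, consider the following sketch. It is an illustration under stated assumptions, not a description of SR/2 internals: the on_draw_text hook, the fixed line_height, and the OffScreenModel class name are all hypothetical.

```python
from collections import defaultdict


class OffScreenModel:
    """Collects intercepted text-drawing calls and answers row-oriented queries."""

    def __init__(self, line_height=16):
        self.line_height = line_height   # assumed pixel height of one text row
        self.rows = defaultdict(list)    # row index -> [(x, text), ...]

    def on_draw_text(self, x, y, text):
        """Hypothetical render hook, called once per text-drawing message."""
        self.rows[y // self.line_height].append((x, text))

    def row_text(self, row):
        """Return the text drawn on a row, ordered left to right."""
        return " ".join(text for _, text in sorted(self.rows.get(row, [])))

    def row_count(self):
        return max(self.rows) + 1 if self.rows else 0


osm = OffScreenModel()
# Drawing order is arbitrary; the model restores reading order.
osm.on_draw_text(120, 40, "Cancel")
osm.on_draw_text(20, 42, "OK")
osm.on_draw_text(20, 8, "Save changes?")

first_row = osm.row_text(0)    # "Save changes?"
button_row = osm.row_text(2)   # "OK Cancel"
```

Grouping by pixel row and sorting by horizontal position is the simplest way to recover a row and column view; a real screen reader would also need to handle fonts of differing heights, overlapping windows, and erased regions.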
A push for component reuse in the 1990s drove applications and desktop environments to provide methods of programmatic information access. Applications and desktop environments began providing APIs for querying and controlling data models and user interfaces across process boundaries. Today, the two most popular screen readers in existence, JAWS (http://www.freedomscientific.com/fs_products/software_jaws.asp) and WindowEyes (http://www.gwmicro.com/Window-Eyes/), use the native Windows operating system API, application-specific APIs (e.g., Internet Explorer COM interface), and document model APIs (e.g., Internet Explorer DOM interface) as additional information sources (Blenkhorn and Evans, 2000).
Screen readers that observe multiple, disparate information sources create an aggregate off-screen model (OSM) (Schwerdtfeger, 1991). Such a screen reader then uses this model as the basis of its output to the user. For instance, the off-screen model in SR/2 is fed by one source, the video render pipeline, and organized to support row and column queries for text, much like a console. The off-screen models for JAWS and its commercial kin are far more complex. These OSMs are fed by a special video driver, one or more component or document object model APIs mentioned above, and at least one accessibility API mentioned in the next section. This off-screen model permits four common auditory displays of screen content:
1. Focus, selection and caret tracking. The screen reader reports the source of the last important on-screen event caused by user input or application feedback.
2. Spatial review. The screen reader reports the content under a spatial pointer such as the mouse pointer as the user moves it across the GUI desktop.
3. Hierarchical review. The screen reader reports the content of widgets as the user explores them in a pre-order tree traversal (e.g., desktop, window, container, menu bar, file menu, and so on).
4. Document review. The screen reader reports document content as the user navigates by content type such as paragraphs, headings, and sections.
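The pre-order traversal behind hierarchical review can be sketched as follows. The Widget class and the small example tree are hypothetical and serve only to show the visiting order: each widget is reported before any of its descendants.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Hypothetical node in a widget hierarchy."""
    role: str
    name: str = ""
    children: list = field(default_factory=list)


def preorder(widget):
    """Yield a widget, then each of its subtrees from first child to last."""
    yield widget
    for child in widget.children:
        yield from preorder(child)


desktop = Widget("desktop", "", [
    Widget("window", "Editor", [
        Widget("menu bar", "", [Widget("menu", "File"), Widget("menu", "Edit")]),
        Widget("text", "Document"),
    ]),
])

# One spoken phrase per widget, in the order a user would encounter them.
review_order = [f"{w.role} {w.name}".strip() for w in preorder(desktop)]
# ["desktop", "window Editor", "menu bar", "menu File", "menu Edit", "text Document"]
```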
The primary auditory method for conveying the content of the OSM to the user is synthesized speech. For instance, when an item is selected in a tree view widget, a typical screen reader reports "tree item, Inbox, one of ten in two" (i.e., role, text, peer index, total peers, tree level). Likewise, when a user issues a request to read the next Web page link, a screen reader might respond with "link, http://www.unc.edu" (i.e., role, text). The report is always serial and always concerned with one point of regard, a focus of user attention, at a time (e.g., a single widget, a character at the text caret, the title of a window). Figure 2.1 visually depicts the concept of a point of regard.
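As a rough illustration of how such a serial report might be assembled from the fields named above (role, text, peer index, total peers, tree level), consider this sketch. The report function and its parameters are hypothetical, and the phrasing simply follows the two examples in the text rather than the output format of any particular product.

```python
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def spoken(n):
    """Spell out small numbers; fall back to digits for anything larger."""
    return NUMBER_WORDS[n] if 0 <= n < len(NUMBER_WORDS) else str(n)


def report(role, text, index=None, total=None, level=None):
    """Build one utterance describing the current point of regard."""
    parts = [role, text]
    if index is not None and total is not None:
        position = f"{spoken(index)} of {spoken(total)}"
        if level is not None:
            position += f" in {spoken(level)}"
        parts.append(position)
    return ", ".join(parts)


tree_item = report("tree item", "Inbox", index=1, total=10, level=2)
# "tree item, Inbox, one of ten in two"
link = report("link", "http://www.unc.edu")
# "link, http://www.unc.edu"
```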
Some commercial screen readers also support the mapping of auditory icons to events and content. For instance, JAWS users can associate a sound with the red underline in Microsoft Word to indicate a misspelled word. Again, the sounds are almost exclusively used to indicate information at the current point of regard only.
Figure 2.1: Screen reader view of the GUI desktop. When the calculator window is activated and the Clr button receives the input focus, a screen reader reports "Calculator - Basic, push button, Clr". The screen reader is typically silent about events and information outside this point of regard.
Accessibility API Screen Readers
The trouble with an off-screen model is that it is hard to maintain in the face of an ever-growing number of applications, widget toolkits, desktops, and associated APIs. Recent developments have led to the creation of accessibility APIs for inspecting and controlling GUI applications through a single, consistent conduit on a given desktop. An accessibility API exposes a tree of accessible objects, rooted at the desktop, each having properties describing its role, state, attributes, and content, and methods for affecting them (e.g., Figure 2.2). The interface also includes support for observing important desktop events at the widget level such as text insertion, list item selection, the appearance of a new widget, and document reloading, to name a few.
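The shape of such an interface can be suggested with a small, self-contained model. This is a sketch under assumptions and does not reproduce any real accessibility API: the Accessible class, its fields, the find helper, and the "text-inserted" event name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Accessible:
    """One node of a hypothetical accessibility tree."""
    role: str
    name: str = ""
    states: set = field(default_factory=set)
    text: str = ""
    children: list = field(default_factory=list)
    listeners: list = field(default_factory=list, repr=False)

    def add_listener(self, callback):
        """Register callback(kind, source, detail) for events on this object."""
        self.listeners.append(callback)

    def insert_text(self, offset, new_text):
        """Change the content, then notify observers of the insertion."""
        self.text = self.text[:offset] + new_text + self.text[offset:]
        for callback in self.listeners:
            callback("text-inserted", self, new_text)


def find(root, role):
    """Depth-first search for the first accessible with the given role."""
    if root.role == role:
        return root
    for child in root.children:
        match = find(child, role)
        if match is not None:
            return match
    return None


entry = Accessible("text", "Delay", {"focusable", "editable"})
desktop = Accessible("desktop frame", children=[
    Accessible("dialog", "Keyboard Preferences", children=[entry]),
])

# A screen reader would speak these strings; here they are only collected.
utterances = []
find(desktop, "text").add_listener(
    lambda kind, source, detail: utterances.append(
        f"{source.role}, {source.name}, {detail}"))
entry.insert_text(0, "500")
# utterances == ["text, Delay, 500"]
```

The point of the sketch is that the screen reader no longer reconstructs widgets from rendering output. It walks a tree that the application already provides and waits for events on the nodes it cares about.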
Figure 2.2: Accessibility hierarchy for a GUI dialog. The window on the left is the keyboard properties dialog from GNOME 2.14. The tree on the right shows the roles and names of the widgets exposed by the platform accessibility API. Note the existence of many unnamed filler nodes that do not have corresponding visual counterparts.
Commercial OSM screen readers take advantage of accessibility APIs by using them as additional information feeds for their off-screen models. For instance, JAWS and WindowEyes both currently use the Microsoft Active Accessibility toolkit and plan to use Microsoft’s improved Universal Interface Automation API as GUI information sources on the Windows desktop. More recent screen readers, however, discard the concept of an aggregate OSM and rely solely on accessibility APIs. For example, the VoiceOver screen reader (http://www.apple.com/macosx/features/voiceover/) for Apple’s OS X desktop relies exclusively on the simply named Accessibility Framework for reporting
www.baum.ro/gnopernicus.html), Orca (http://live.gnome.org/Orca), and Linux Screen Reader (Parente and Clippingdale, 2006) applications for the GNOME desktop environment use the Assistive Technology Service Provider Interface (AT-SPI) alone to report information to users with visual impairments.
Though accessibility APIs have eased the screen reader burden of maintaining a complex model, the screen reader view (i.e., the auditory display) has remained largely the same. A single stream of synthesized speech describes the current user point of regard, and, in some systems, auditory icons name events and widgets.