Chapter 4 GridVis: a tool for supporting goal directed use through within-
5 System Implementation
This section first describes the use of XML to encode the documents and taxonomy, and
then gives an account of the visualisation’s implementation.
5.1 Encoding the metadata in XML
The documents and their metadata and taxonomies discussed in section 2 needed to be
encoded in a technically appropriate format, so that they could be conveniently authored
and visualised. XML was chosen to do this since, as discussed in chapter 2 section
2.3.2.1, it has been designed to allow structured documents to be conveniently encoded
and manipulated.
Each document’s structure, text and metadata are encoded in a single XML file. An
XML-Schema was used to ensure that the files were valid and to facilitate the metadata
authoring process. The metadata was encoded alongside the text to simplify technical requirements. This arrangement allows an XSL-T style sheet to filter and format paragraphs directly
according to the metadata assigned to them. It also allows GridVis to derive attributes from the text and the document structure (such as
paragraph length and section level) for use in the visualisation.
The XML tags were designed to allow an arbitrary hierarchical document structure and
enable rich metadata tagging at the paragraph level. The different XML tags used and
their legal relations to each other are captured in Figure 27. The nested
<collectionTag>s are used to encode document structure. A <terminalTag> holds a
single paragraph whose text sits in its <text> element, and whose metadata sits in its
<Description> element. The <Content_tag> contains an attribute which holds both the
tag and its location in the taxonomy (e.g. ‘publication.The_Independent’), and one
Figure 27 The structure of the XML documents visualised by GridVis. (Dotted lines indicate optional elements)
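The element structure described above can be illustrated with a small Java sketch. It is not GridVis code: the sample document, the class name ParagraphAttributes and the attribute name tag on <Content_tag> are assumptions made for illustration (the text does not name the attribute). The sketch walks the nested <collectionTag>s and, for each <terminalTag>, derives the section level (taken here to be the number of enclosing <collectionTag>s) and the paragraph length in characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ParagraphAttributes {
    // Hypothetical sample document; the attribute name "tag" is an assumption.
    static final String SAMPLE =
        "<collectionTag>"
      + "<collectionTag>"
      + "<terminalTag>"
      + "<text>Sales rose sharply last year.</text>"
      + "<Description><Content_tag tag=\"publication.The_Independent\"/></Description>"
      + "</terminalTag>"
      + "</collectionTag>"
      + "</collectionTag>";

    /** Returns one summary line per paragraph: section level, text length and tags. */
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Recursive descent: each enclosing collectionTag adds one section level.
    private static void walk(Element e, int level, List<String> out) {
        if (e.getTagName().equals("terminalTag")) {
            out.add(summarise(e, level));
            return;
        }
        int childLevel = e.getTagName().equals("collectionTag") ? level + 1 : level;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                walk((Element) n, childLevel, out);
            }
        }
    }

    // Reads the paragraph text and the tags listed in its Description element.
    private static String summarise(Element paragraph, int level) {
        String text = "";
        NodeList texts = paragraph.getElementsByTagName("text");
        if (texts.getLength() > 0) {
            text = texts.item(0).getTextContent().trim();
        }
        List<String> tags = new ArrayList<>();
        NodeList contentTags = paragraph.getElementsByTagName("Content_tag");
        for (int i = 0; i < contentTags.getLength(); i++) {
            tags.add(((Element) contentTags.item(i)).getAttribute("tag"));
        }
        return "level=" + level + " length=" + text.length() + " tags=" + tags;
    }

    public static void main(String[] args) {
        describe(SAMPLE).forEach(System.out::println);
    }
}
```

Keeping the metadata in the same file as the text means that a single pass over the document yields both the tags and the derived attributes.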
The XML standard for describing metadata is RDF (see chapter 2 section 2.3.2.1). This was initially used but proved unsuitable, since it has been designed to provide data for AI reasoning algorithms. Since no such algorithms are used in GridVis, RDF is of no immediate utility. Moreover, the use of RDF would have complicated the production of the metadata, since it would require separate elements for applicability. Consequently, only the <Description> elements follow the RDF standard.
An XML-Schema was not used to facilitate the authoring process for the first set of documents, because the XML-Schema standard had not been settled when they were produced. The consequences of not using an XML-Schema (or DTDs) were twofold. First, it necessitated error-checking code in GridVis, which could not provide very targeted debugging information; this in turn increased the time spent debugging the XML. Second, there was a tendency for equivalent tags to turn up in different places in the hierarchy, or with slightly different spellings (e.g. ‘store’ and ‘stores’). These errors had to be eliminated by a drawn-out and tiresome process of manual inspection.
In order to avoid similar problems, an XML-Schema was used, alongside a specialised XML authoring environment, to facilitate the authoring process for the second set of documents. The Schema defines each allowed XML element, the attributes it may have, the types those attributes may take and the legal relationships between elements. It also
allows for all the possible values of an attribute to be defined. This feature was used to
ensure that tags were consistent and correct across paragraphs and documents. The
XML-Spy authoring environment used allowed the tags defined in the schema to be
selected from a drop-down menu. The use of XML-Schemas alongside a specialist XML
authoring package made the process of authoring the XML within-document metadata much faster and easier.
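The enumeration feature can be sketched with the standard Java validation API (javax.xml.validation). The schema fragment below is hypothetical and is not the GridVis schema: the attribute name tag and the taxonomy location ‘retail.store’ are assumptions for illustration. It shows how restricting an attribute to an enumerated set of values causes a misspelt tag such as ‘stores’ to be rejected at validation time rather than found by manual inspection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class TagValidation {
    // Hypothetical schema fragment: the "tag" attribute may only take the listed values.
    static final String SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Content_tag\">"
      + "<xs:complexType>"
      + "<xs:attribute name=\"tag\" use=\"required\">"
      + "<xs:simpleType>"
      + "<xs:restriction base=\"xs:string\">"
      + "<xs:enumeration value=\"retail.store\"/>"
      + "<xs:enumeration value=\"publication.The_Independent\"/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "</xs:attribute>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns null if the XML is valid, otherwise the validator's error message. */
    static String validate(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
        try {
            // The default error handler throws on the first validation error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<Content_tag tag=\"retail.store\"/>"));
        System.out.println(validate("<Content_tag tag=\"retail.stores\"/>"));
    }
}
```

An authoring environment that reads the same schema can offer the enumerated values in a menu, so the misspelling is unlikely to be typed in the first place.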
5.2 The visualisation implementation
GridVis is designed for use on a corporate intranet, where software installation for low-priority
applications is problematic. It therefore has a client-server architecture: the
visualisation is generated by a client-side Java applet, and queries are answered by a Java servlet that uses XSL-T to produce customised HTML documents.
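The server-side step can be sketched with the standard Java transformation API (javax.xml.transform). This is a minimal illustration under stated assumptions, not the GridVis servlet: the servlet wiring is omitted, and the style sheet, the sample document, the parameter name wanted and the attribute name tag are all hypothetical. The style sheet selects the paragraphs that carry a requested tag and formats them as HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParagraphFilter {
    // Hypothetical style sheet: keeps only paragraphs whose Description holds the wanted tag.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:param name=\"wanted\"/>"
      + "<xsl:template match=\"/\">"
      + "<html><body>"
      + "<xsl:for-each select=\"//terminalTag[Description/Content_tag/@tag = $wanted]\">"
      + "<p><xsl:value-of select=\"text\"/></p>"
      + "</xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical two-paragraph document with one tag per paragraph.
    static final String SAMPLE =
        "<collectionTag>"
      + "<terminalTag><text>Sales rose sharply last year.</text>"
      + "<Description><Content_tag tag=\"publication.The_Independent\"/></Description></terminalTag>"
      + "<terminalTag><text>A new store opened in March.</text>"
      + "<Description><Content_tag tag=\"retail.store\"/></Description></terminalTag>"
      + "</collectionTag>";

    /** Returns an HTML page containing only the paragraphs tagged with wantedTag. */
    static String filter(String documentXml, String wantedTag) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setParameter("wanted", wantedTag);
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(documentXml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(filter(SAMPLE, "publication.The_Independent"));
    }
}
```

In a servlet, the requested tag would come from the query parameters and the resulting HTML would be written to the response, so the client needs nothing beyond a browser and the applet.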
GridVis starts by building the data structures needed for the visualisation from two
XML documents. One document defines the metadata taxonomy, and the other contains document content and accompanying metadata. The taxonomy is used to build the
metadata tree (see Figure 23). The metadata for each paragraph is read off and assigned
to the appropriate section of the metadata tree. Each paragraph and section is represented by a Chunk object, which retains the paragraph’s size, section heading and
HTML anchor number. These Chunk and Tag objects are stored in both linear (Vector)
and hierarchical (DOMTree) data structures. The applicability of each piece of metadata
to a particular paragraph is stored in a Chunk-by-Tag matrix (i.e. a two-dimensional
array). For the dynamic level-of-detail management feature, other data structures must
be built; these will be referred to as the visibleGrid data structures. These essentially
mirror, and are built from, the data structures just described, but only represent the
GridVis uses these data structures to build the visualisation. The matrix of applicability
information is used to construct the grid, while the linear Chunk data structure is
used to produce the iconic document overview, and the hierarchical Tag data structure is
used to build the metadata tree. If the level-of-detail management feature is being used,
each user event results in the visibleGrid data structures being rebuilt using simple
recursive tree traversal algorithms.
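The relationship between the full data structures and the visibleGrid data structures can be sketched as follows. This is an illustration only and not the GridVis source. It assumes that the visibleGrid holds just the tags whose ancestors in the metadata tree are all expanded; the expanded flag, the class name VisibleGridSketch and the method names are hypothetical. A recursive traversal collects the visible tags, and the corresponding columns of the Chunk-by-Tag matrix are then copied into a smaller grid.

```java
import java.util.ArrayList;
import java.util.List;

final class VisibleGridSketch {
    /** A node of the metadata tree; "expanded" is a hypothetical level-of-detail flag. */
    static final class Tag {
        final String name;
        final List<Tag> children = new ArrayList<>();
        boolean expanded = true;

        Tag(String name) { this.name = name; }

        Tag add(Tag child) { children.add(child); return this; }
    }

    /** Pre-order traversal that skips the descendants of collapsed tags. */
    static void collectVisible(Tag tag, List<Tag> out) {
        out.add(tag);
        if (!tag.expanded) {
            return;
        }
        for (Tag child : tag.children) {
            collectVisible(child, out);
        }
    }

    /** Copies the columns of the full Chunk-by-Tag matrix that belong to visible tags. */
    static boolean[][] visibleGrid(boolean[][] applies, List<Tag> allTags, List<Tag> visible) {
        boolean[][] grid = new boolean[applies.length][visible.size()];
        for (int chunk = 0; chunk < applies.length; chunk++) {
            for (int v = 0; v < visible.size(); v++) {
                grid[chunk][v] = applies[chunk][allTags.indexOf(visible.get(v))];
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        Tag publication = new Tag("publication").add(new Tag("The_Independent"));
        Tag root = new Tag("taxonomy").add(publication).add(new Tag("store"));

        List<Tag> allTags = new ArrayList<>();
        collectVisible(root, allTags);            // everything is expanded at this point

        // Rows are chunks (paragraphs); columns follow the order of allTags.
        boolean[][] applies = {
            {false, false, true, false},
            {false, false, false, true},
        };

        publication.expanded = false;             // simulated user event: collapse a branch
        List<Tag> visible = new ArrayList<>();
        collectVisible(root, visible);
        boolean[][] grid = visibleGrid(applies, allTags, visible);
        System.out.println(visible.size() + " visible tags, " + grid.length + " chunks");
    }
}
```

The sketch simply drops the columns of hidden tags. A fuller version might instead aggregate the applicability of a collapsed tag's descendants into that tag's column, but the source does not say which approach GridVis takes.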