
Chapter 4 GridVis: a tool for supporting goal directed use through within-

5 System Implementation

In this section the use of XML to encode the documents and taxonomy is described, then

an account of the visualisation’s implementation is given.

5.1

Encoding the metadata in XML

The documents and their metadata and taxonomies discussed in section 2 needed to be

encoded in a technically appropriate format, so that they could be conveniently authored

and visualised. XML was chosen to do this since, as discussed in chapter 2 section

2.3.2.1, it has been designed to allow structured documents to be conveniently encoded

and manipulated.

Each document’s structure, text and metadata are encoded in a single XML file. An

XML-Schema was used to ensure validity and to facilitate the metadata

authoring process. The metadata was encoded alongside the text to simplify technical requirements. It allows an XSL-T style sheet to directly filter and format paragraphs

according to the metadata assigned to them. Also, encoding the metadata this way allows GridVis to derive attributes from the text and the document structure (such as

paragraph length and section level) for use in the visualisation.
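
As an illustration of deriving such attributes, the following is a minimal sketch and not GridVis's actual code. It assumes that section level can be taken as the number of enclosing <collectionTag> elements and that paragraph length is the character count of the <text> element. The class name, method names and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DerivedAttributes {
    // Hypothetical sample: element names follow the text, the content is invented.
    static final String SAMPLE =
        "<collectionTag><collectionTag>"
      + "<terminalTag><text>A short paragraph.</text></terminalTag>"
      + "</collectionTag></collectionTag>";

    // Parses the XML and returns the first <terminalTag> element.
    private static Element firstParagraph(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("terminalTag").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Assumed definition: section level = number of <collectionTag> ancestors.
    static int sectionLevel(String xml) {
        int level = 0;
        for (Node n = firstParagraph(xml).getParentNode(); n != null; n = n.getParentNode()) {
            if (n instanceof Element && "collectionTag".equals(n.getNodeName())) {
                level++;
            }
        }
        return level;
    }

    // Assumed definition: paragraph length = character count of the <text> element.
    static int paragraphLength(String xml) {
        NodeList texts = firstParagraph(xml).getElementsByTagName("text");
        return texts.getLength() == 0 ? 0 : texts.item(0).getTextContent().length();
    }
}
```

Because the text and structure live in the same file as the metadata, both values can be read in a single pass over one DOM tree.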

The XML tags were designed to allow an arbitrary hierarchical document structure and

enable rich metadata tagging at the paragraph level. The different XML tags used and

their legal relations to each other are captured in Figure 27. The nested

<collectionTag>s are used to encode document structure. A <terminalTag> holds a

single paragraph whose text sits in its <text> element, and whose metadata sits in its

<Description> element. The <Content_tag> contains an attribute which holds both the

tag and its location in the taxonomy (e.g. ‘publication.The_Independent’), and one

Figure 27 The structure of the XML documents visualised by GridVis. (Dotted lines indicate optional elements)
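
The dotted value carried by a <Content_tag> (e.g. ‘publication.The_Independent’) could be split into a tag name and its position in the taxonomy roughly as follows. This is a sketch only. It assumes the last segment is the tag and the preceding segments are the path from the taxonomy root. The class and method names are hypothetical.

```java
final class TaxonomyPath {
    // Assumed convention: the last dot-separated segment is the tag itself.
    static String tagName(String value) {
        int i = value.lastIndexOf('.');
        return i < 0 ? value : value.substring(i + 1);
    }

    // Assumed convention: everything before the last dot is the tag's
    // location in the taxonomy; a top-level tag has an empty location.
    static String parentPath(String value) {
        int i = value.lastIndexOf('.');
        return i < 0 ? "" : value.substring(0, i);
    }

    // Depth in the taxonomy, counting a top-level tag as depth 1.
    static int depth(String value) {
        return value.split("\\.").length;
    }
}
```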

The XML standard for describing metadata is RDF (see chapter 2 section 2.3.2.1). This was initially used but proved unsuitable, since it has been designed to provide data for AI reasoning algorithms. Since no such algorithms are used in GridVis, RDF is of no immediate utility. Moreover, the use of RDF would have complicated the production of the metadata. Since it would require separate elements for applicability, it is only the <Description> elements that follow the RDF standard.

An XML-Schema was not used to facilitate the authoring process for the first set of documents. When the first set of documents was produced, the XML-Schema standard had not been settled. The consequences of not using XML-Schema (or DTDs) were two-fold. It necessitated error-checking code in GridVis which could not provide very targeted debugging information; this in turn increased time spent debugging the XML. Another consequence was a tendency for equivalent tags to turn up in different places in the hierarchy, or with slightly different spellings (e.g. ‘store’ and ‘stores’). These errors had to be eliminated by a drawn-out and tiresome process of manual inspection.

In order to avoid similar problems, an XML-Schema was used, alongside a specialised XML authoring environment, to facilitate the authoring process for the second set of documents. The Schema defines each allowed XML element, the attributes it may have, the types those attributes may take and the legal relationships between elements. It also


allows for all the possible values of an attribute to be defined. This feature was used to

ensure that tags were consistent and correct across paragraphs and documents. The

XML-Spy authoring environment used allowed the tags defined in the schema to be

selected from a drop-down menu. Using XML-Schemas alongside a specialist XML

authoring package made the process of authoring the XML within-document metadata much faster and easier.
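
The way an enumerated attribute catches misspelt tags such as ‘stores’ can be sketched with the standard Java validation API. The schema below is a hypothetical fragment, not the GridVis schema. The attribute name `tag` and the value ‘retail.store’ are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class TagValidation {
    // Hypothetical schema: the 'tag' attribute may only take the listed values.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Content_tag'>"
      + "<xs:complexType>"
      + "<xs:attribute name='tag' use='required'>"
      + "<xs:simpleType><xs:restriction base='xs:string'>"
      + "<xs:enumeration value='retail.store'/>"
      + "<xs:enumeration value='publication.The_Independent'/>"
      + "</xs:restriction></xs:simpleType>"
      + "</xs:attribute>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true when the XML validates against XSD, false on a schema violation.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here `isValid("<Content_tag tag='retail.stores'/>")` is rejected because ‘retail.stores’ is not among the enumerated values, which is the kind of error that previously required manual inspection.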

5.2

The visualisation implementation

GridVis is designed for use on a corporate intranet where software installation for low

priority applications is problematic. It therefore has a client-server architecture: the

visualisation is generated by a client-side Java applet, and queries are answered by a Java servlet using XSL-T to produce customised HTML documents.
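
The server-side step, filtering paragraphs by their metadata and formatting them as HTML, might look roughly like the sketch below. It uses the standard javax.xml.transform API outside of any servlet container. The style sheet, the `wanted` parameter, the `tag` attribute name and the nesting of <Content_tag> inside <Description> are assumptions for illustration, not the style sheet actually used by GridVis.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParagraphFilter {
    // Hypothetical style sheet: keeps only paragraphs carrying the wanted tag.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:param name='wanted'/>"
      + "<xsl:template match='/'>"
      + "<html><body>"
      + "<xsl:for-each select='//terminalTag[Description/Content_tag/@tag = $wanted]'>"
      + "<p><xsl:value-of select='text'/></p>"
      + "</xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical document with two tagged paragraphs.
    static final String SAMPLE =
        "<collectionTag>"
      + "<terminalTag><Description><Content_tag tag='publication.The_Independent'/></Description>"
      + "<text>Kept paragraph.</text></terminalTag>"
      + "<terminalTag><Description><Content_tag tag='retail.store'/></Description>"
      + "<text>Dropped paragraph.</text></terminalTag>"
      + "</collectionTag>";

    // Applies the style sheet and returns the resulting HTML as a string.
    static String toHtml(String xml, String wantedTag) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("wanted", wantedTag);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the query as a style sheet parameter keeps one style sheet reusable for every query, which suits a servlet answering many requests.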

GridVis starts by building the data structures needed for the visualisation from two

XML documents. One document defines the metadata taxonomy, and the other contains document content and accompanying metadata. The taxonomy is used to build the

metadata tree (see Figure 23). The metadata for each paragraph is read off and assigned

to the appropriate section of the metadata tree. Each paragraph and section is represented by a Chunk object, which retains the paragraph’s size, section heading and

HTML anchor number. These Chunk and Tag objects are stored in both linear (Vector)

and hierarchical (DOMTree) data structures. The applicability of each piece of metadata

to a particular paragraph is stored in a Chunk-by-Tag matrix (i.e. a two dimensional

array). For the dynamic level of detail management feature, other data structures must

be built; these will be referred to as the visibleGrid data structures. These essentially

mirror, and are built from, the data structures just described, but only represent the

GridVis uses these data structures to build the visualisation. The matrix of applicability

information is used to construct the grid, while the linear Chunk data structure is

used to produce the iconic document overview, and the hierarchical Tag data structure is

used to build the metadata tree. If the level-of-detail management feature is being used,

each user event results in the visibleGrid data structures being rebuilt using simple

recursive tree traversal algorithms.
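
One way such a rebuild could work is sketched below. It is a guess at the general shape, not the GridVis implementation. It assumes that a collapsed tag hides its descendants and shows their combined applicability, and it uses ArrayList where the text mentions Vector. All class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class VisibleGridSketch {
    // A node of the metadata tree; 'column' indexes the Chunk-by-Tag matrix.
    static final class Tag {
        final String name;
        final int column;
        boolean expanded = true;
        final List<Tag> children = new ArrayList<>();

        Tag(String name, int column) {
            this.name = name;
            this.column = column;
        }

        void add(Tag child) {
            children.add(child);
        }
    }

    // Recursive pre-order traversal: a tag is always visible,
    // its children only when it is expanded.
    static void collectVisible(Tag tag, List<Tag> out) {
        out.add(tag);
        if (tag.expanded) {
            for (Tag child : tag.children) {
                collectVisible(child, out);
            }
        }
    }

    // True if the tag or any of its descendants applies to the chunk's row.
    static boolean appliesWithin(Tag tag, boolean[] row) {
        if (row[tag.column]) {
            return true;
        }
        for (Tag child : tag.children) {
            if (appliesWithin(child, row)) {
                return true;
            }
        }
        return false;
    }

    // Rebuilds the visible grid: one row per chunk, one column per visible tag.
    static boolean[][] visibleGrid(Tag root, boolean[][] applicability) {
        List<Tag> visible = new ArrayList<>();
        collectVisible(root, visible);
        boolean[][] grid = new boolean[applicability.length][visible.size()];
        for (int chunk = 0; chunk < applicability.length; chunk++) {
            for (int v = 0; v < visible.size(); v++) {
                Tag t = visible.get(v);
                // A collapsed tag stands in for its whole subtree (assumption).
                grid[chunk][v] = t.expanded
                        ? applicability[chunk][t.column]
                        : appliesWithin(t, applicability[chunk]);
            }
        }
        return grid;
    }
}
```

Under these assumptions, collapsing a tag in response to a user event only requires flipping its `expanded` flag and calling `visibleGrid` again; the full matrix is never modified.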