2 Information Visualisation: Making Sense of Digital Data

2.3 Visual Encoding

Visual encoding is the process of mapping data either to suitable graphical marks that are used to form a view, or to suitable visualisation techniques (predefined views, such as bar charts and line charts). Graphical marks are areas, lines and points used to encode information through positional (one, two or three dimensions), retinal (size, colour, texture, shape, etc.) and temporal properties (Bertin, 1983). This section discusses 1) systems that map data to graphical marks to build visualisations and 2) systems that map data to a predefined visualisation technique, and then compares and contrasts the two methods.

2.3.1 Mapping Data to Graphical Marks

A number of visualisation systems (Livny et al., 1997, Tang et al., 2004, Stolte et al., 2002, Mackinlay, 1986) map data to graphical marks, which are combined to create a visualisation. For example, DEVise (Livny et al., 1997) consists of source data tables (TData) and graphical representations (GData). Using the TData schema, the mapping process converts the records in the TData tables to visual symbols, creating GData table entries that consist of visual attributes such as colour, orientation, axes, size and shape. Rivet (Tang et al., 2004) uses an architecture that maps nominal values to colour and quantitative values to size, and defines encodings that map particular data fields to particular visual variables, such as axes and colour.
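The kind of mapping described above can be illustrated with a minimal Python sketch, in which a nominal field is mapped to colour and a quantitative field to size. It is not code from DEVise or Rivet: the records, the palette, the `Mark` class and the `encode` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical source records (TData-like): one dict per row.
tdata = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 300.0},
    {"region": "north", "sales": 210.0},
]

PALETTE = ["steelblue", "darkorange", "seagreen"]  # assumed colour palette


@dataclass
class Mark:
    """A point mark described only by visual attributes (GData-like)."""
    x: int
    colour: str
    size: float


def encode(records, nominal_field, quantitative_field,
           min_size=4.0, max_size=20.0):
    """Map a nominal field to colour and a quantitative field to size."""
    # Nominal values: one palette colour per distinct category.
    categories = sorted({r[nominal_field] for r in records})
    colour_of = {c: PALETTE[i % len(PALETTE)] for i, c in enumerate(categories)}

    # Quantitative values: linear scale from the data range to the size range.
    values = [r[quantitative_field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns

    marks = []
    for i, r in enumerate(records):
        t = (r[quantitative_field] - lo) / span
        marks.append(Mark(x=i,
                          colour=colour_of[r[nominal_field]],
                          size=min_size + t * (max_size - min_size)))
    return marks


marks = encode(tdata, "region", "sales")
for m in marks:
    print(m)
```

Because the mapping is driven by field type rather than by a particular chart, the same `encode` function could be reused for any table with one nominal and one quantitative field.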

Tableau uses VizQL, a formal language for describing visualisations (Hanrahan, 2006). Users drag and drop data fields onto a visual canvas provided by Tableau to generate VizQL statements, which build visualisations. Tableau categorises and partitions database fields into scale and role classifications. The scale classification categorises a field as ordinal or quantitative, and this classification is used for visual representation: quantitative fields can be represented as axes, and ordinal fields can be shown as headers or classes. The role classification partitions fields into dimensions and measures. Graphic generation comprises three components: table configuration, graphic type within a pane, and visual encodings. The table configuration component uses database fields (ordinal and quantitative) to create expressions, which consist of operands (database fields) and three operators (+, × and /). These expressions form the clauses that create the table axes. The graphic type is chosen based on the axes, which are determined within the table configuration, and on a mark type chosen by the user. The mark types are text, line, Gantt bar, rectangle, glyph, polygon, circle and image. Finally, the visual encodings component uses the selected mark to encode the relevant database fields.
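The three operators can be sketched for ordinal fields as follows. The text above names only the symbols; the sketch assumes they behave as concatenation (+), cross (×) and nest (/), as in the Polaris table algebra (Stolte et al., 2002). The function names and sample records are hypothetical, and this is not VizQL itself.

```python
from itertools import product


def domain(records, field):
    """Distinct values of a field, sorted, each wrapped as a 1-tuple entry."""
    return [(v,) for v in sorted({r[field] for r in records})]


def concat(a, b):
    """'+' (assumed): the entries of a followed by the entries of b."""
    return list(a) + list(b)


def cross(a, b):
    """'×' (assumed): every entry of a combined with every entry of b."""
    return [x + y for x, y in product(a, b)]


def nest(records, outer, inner):
    """'/' (assumed): like cross, but only combinations present in the data."""
    return sorted({(r[outer], r[inner]) for r in records})


# Hypothetical records with two ordinal fields.
records = [
    {"quarter": "Q1", "month": "Jan"},
    {"quarter": "Q1", "month": "Feb"},
    {"quarter": "Q2", "month": "Apr"},
]

quarters = domain(records, "quarter")
months = domain(records, "month")

concatenated = concat(quarters, months)       # five headers on one axis
crossed = cross(quarters, months)             # all six quarter/month pairs
nested = nest(records, "quarter", "month")    # only the three pairs in the data

print(concatenated)
print(crossed)
print(nested)
```

Under these assumptions, each resulting entry would correspond to one header (and hence one pane row or column) on a table axis, which is why nesting avoids empty panes such as Q2/Jan.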

A Presentation Tool (APT) (Mackinlay, 1986) uses a composition algebra consisting of graphical languages and operators to generate designs. It requires application designers to supply data, which is analysed for the structural properties of the inputs, such as qualitative, ordinal, numerical and nominal values, when synthesising designs; the designs are produced as an abstract image description. APT uses graphical objects, including points, lines and areas, as sentences of graphical languages (collections of tuples), together with semantics, to encode arrangements of graphical representations. The graphical sentence specifies the location of the graphical object, which can be used to determine the height and width of the object (expressiveness). APT also considers the perception of an individual viewing the image and encodes ordering, size and colour into the graphical design process based on the input data. The composition algebra generates designs by using a set of primitive graphical languages and composing designs by merging parts that encode the same information. The synthesis algorithm consists of three processes: 1) partitioning the set of relations until a match is found with a primitive language; 2) selecting candidate designs for each partition; and 3) combining the designs using composition operators.
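The three processes can be illustrated with a deliberately simplified Python sketch. It is not APT's actual algorithm: the greedy composition step, the channel rankings in `RANKING`, and all function names are assumptions made for illustration only.

```python
# Assumed, abbreviated rankings of encoding channels per data type
# (best first). These lists are illustrative, not APT's own tables.
RANKING = {
    "quantitative": ["position", "length", "area", "colour"],
    "ordinal": ["position", "colour", "size"],
    "nominal": ["position", "colour", "shape"],
}


def partition(relation):
    """Step 1: split a relation (field name -> type) into single fields."""
    return list(relation.items())


def candidates(kind):
    """Step 2: candidate encodings for one partition, best first."""
    return list(RANKING[kind])


def compose(partitions):
    """Step 3: combine one candidate per partition into a single design.

    Greedy simplification: each channel is used once, except position,
    which offers two axes.
    """
    free = {"position": 2}
    design = {}
    for name, kind in partitions:
        for channel in candidates(kind):
            if free.get(channel, 1) > 0:
                free[channel] = free.get(channel, 1) - 1
                design[name] = channel
                break
    return design


# Hypothetical input relation.
relation = {
    "price": "quantitative",
    "mileage": "quantitative",
    "origin": "nominal",
}

design = compose(partition(relation))
print(design)
```

In this sketch the two quantitative fields take the two positional axes and the nominal field falls back to colour, which mirrors the idea of merging partial designs that share a view rather than reproducing APT's search.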

Analysis

Both historical and more recent visualisation systems have used techniques that map data to graphical marks. Expressions consisting of operands and operators, and a composition algebra, have been used effectively by Tableau and APT respectively to map data to graphical marks. These types of mappings offer several advantages, such as increased generalisation, extensibility and flexibility, as new mappings can easily be added (Tang et al., 2004). In addition, various types of datasets can use existing mappings. However, the literature suggests that over-generalisation, which can result from these types of mappings, requires additional work to create appropriate encodings for specific visualisations.

2.3.2 Mapping Data to Visualisation Techniques

A growing number of visualisation systems (Viegas et al., 2007, Gonzalez et al., 2010, Derntl et al., 2012) map data to visualisation techniques which are either part of the system or taken from the widely available range of existing visualisation libraries. For example, Many Eyes (Viegas et al., 2007) automatically renders user data through commonly used visualisation techniques. It uses a table-based data model consisting of same-length named columns, where each column holds either numeric or textual data. User-supplied data is interpreted by the system as unstructured if it comes from freeform text entered onto the form, or as a table if it is entered as tab-delimited text. For tabular data, the system uses heuristics to determine a visualisation technique, and it allows users to override this decision and select a different technique. Each dataset has metadata associated with it, some aspects of which are provided by the user while others are determined automatically by the system. Many Eyes offers over a dozen visualisation techniques, each of which has a predefined schema that specifies its data requirements. The schemas consist of mandatory and optional typed slots, such as textual slots, multiple textual slots, numeric slots, multiple numeric slots and unstructured slots. The typed slots are matched against the typed columns to determine a visualisation technique that matches the dataset.
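The matching of typed slots against typed columns can be sketched as follows. This is a minimal illustration and not Many Eyes code: the schemas in `SCHEMAS`, the technique names attached to them, and the `(minimum, maximum)` slot representation are hypothetical.

```python
def column_type(values):
    """Classify a column as 'numeric' if every value parses as a float,
    otherwise as 'textual'."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "textual"


# Hypothetical schemas: for each column type, (minimum, maximum) number of
# columns accepted. A maximum of None stands for a "multiple" slot.
SCHEMAS = {
    "bar chart": {"textual": (1, 1), "numeric": (1, None)},
    "scatterplot": {"textual": (0, 1), "numeric": (2, 2)},
    "pie chart": {"textual": (1, 1), "numeric": (1, 1)},
}


def matching_techniques(table):
    """Return the techniques whose typed slots fit the table's typed columns."""
    counts = {"textual": 0, "numeric": 0}
    for values in table.values():
        counts[column_type(values)] += 1

    matches = []
    for technique, slots in SCHEMAS.items():
        fits = all(
            lo <= counts[kind] and (hi is None or counts[kind] <= hi)
            for kind, (lo, hi) in slots.items()
        )
        if fits:
            matches.append(technique)
    return matches


# Hypothetical tab-delimited upload, already split into named columns.
table = {
    "country": ["IE", "FR"],
    "population": ["4.6", "66.0"],
    "area": ["70273", "643801"],
}

techniques = matching_techniques(table)
print(techniques)
```

A system of this kind would still need a heuristic to pick a default from the matching techniques, with the user free to choose another.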

Google Fusion Tables (Gonzalez et al., 2010) uses the Google Visualisation API [13] to source its visualisation techniques. The data types in the input data are compared to the types needed for each visualisation technique to determine the appropriate set of techniques to be used. The visualisation techniques used by the infrastructure discussed by Derntl et al. (2012) are the chart-based techniques sourced from the Google Visualisation API. Gretl [14] supports econometric analysis through a number of visualisation techniques, including line charts, box plots and scatter plots. The user is required to provide formatted data and select the technique to render it. Microsoft Excel [15] supports data analysis by automatically generating visualisations for user-specified data, with users selecting the chart type from a number of available techniques.

Analysis

Some of the more recent visualisation systems map data directly to visualisation techniques, either those built into the system or those provided by available visualisation libraries. Using these libraries offers the advantage of reducing the coding effort required to build the visual encoding component. Nevertheless, an understanding of the characteristics and affordances of each visualisation technique used needs to be developed and integrated into the visual encoding component to allow the correct visualisations to be selected.

This section has addressed the two methods used for visual encoding and highlighted their advantages and limitations. In comparing and contrasting the two practices, it can be seen that mapping data to graphical marks is more extensible and flexible than mapping data to visualisation techniques, as it can handle a broader range of mappings. However, it requires a greater coding effort, as it cannot take advantage of existing visualisation libraries. In the case of mapping data to visualisation techniques, a substantial body of literature (Dias et al., 2012, Chi, 2000, Carr et al., 1987, Heer et al., 2010, Graham & Kennedy, 2010), including evaluation papers, exists that can support designers in encoding the characteristics of visualisation techniques. In summary, both practices serve their individual purposes well, and one can be chosen over the other depending on the system requirements.

[13] http://developers.google.com/chart/
[14] http://gretl.sourceforge.net/
[15] Microsoft Office 2013

In document The derived data approach to support the construction and consumption of explorable visual narratives (Page 33-36)