Ordered and Unordered Treemap
Algorithms and Their Applications
on Handheld Devices
BJÖRN ENGDAHL
Master’s Degree Project
Stockholm, Sweden 2005
Numerisk analys och datalogi Department of Numerical Analysis
KTH and Computer Science
100 44 Stockholm Royal Institute of Technology
SE-100 44 Stockholm, Sweden
BJÖRN ENGDAHL
TRITA-NA-E05033
Master’s Thesis in Computer Science (20 credits) at the School of Computer Science and Engineering,
Royal Institute of Technology year 2005 Supervisor at Nada was Olle Bälter
Examiner was Yngve Sundblad
Ordered and Unordered Treemap Algorithms
and Their Applications on Handheld Devices
iii
Abstract
This thesis describes a way to visualize hierarchical structures on PDAs using Treemaps.
A new ordered layout algorithm for treemaps, called Split Layout, is pre-sented. Traditionally, treemap algorithms have exhibited a trade off between good aspect ratio and ordered layout. The new Split Layout is compared to five published treemap algorithms. Properties such as avarage aspect ratio, stability, run time and readability are measured, and it is shown that the new Split Layout performs better than any known ordered layout. The average aspect ratio is about 23% better and the stability increased between 28% and 40%. A user study was performed to measure the quality of the ordering of the new algorithm. 30 users gave a total of 360 measured response times, and it was found that the median time to locate a specific rectangle in the treemap only took 0.14 seconds (5%) longer for the Split Layout than for the Pivot by Split Size algorithm.
The technique was implemented on a PDA and used for visualizing threaded discussion forums. User studies confirm that the concept of using treemaps on PDAs looks promising. The contents of the forum were easily grasped, even though the number of articles exceeded one hundred.
iv
Ordnade och oordnade treemap-algoritmer och deras
användningsområden på handdatorer
Sammanfattning
Denna rapport beskriver ett sätt att visualisera hierarkiska data på PDA:er genom att använda Treemaps.
En ny ordnad Treemap-algoritm kallad Split Layout presenteras. Den jäm-förs med fem publicerade algoritmer. Egenskaper som längd-breddförhållande, stabilitet, körtid och läsbarhet mäts och det visas att den nya Split Layout-algoritmen skapar rektanglar vars längd-breddförhållande är 23% bättre än kända algoritmer. Dessutom ökar stabiliten i layouten med mellan 28% och 40%. En användarstudie utfördes för att mäta kvaliten på ordningen av rek-tanglarna. 30 användare gav totalt 360 mätresultat som visar att mediantiden att hitta en specifik rektangel bara tog 0.14 sekunder (5%) längre för Split Layout än för Pivot by Split Size.
Tekniken implementerades på en PDA och användes för att visualisera trådade diskussionsforum. Användarstudier bekräftar att metoden att använda Treemaps på PDA:er verkar lovande. Innehållet i forumet var lätt att överblicka, trots att antalet artiklar översteg hundra stycken.
Contents
Contents v 1 Introduction 1 1.1 Treemaps . . . 1 1.2 Project Goal . . . 1 1.3 Definitions . . . 2 1.4 Outline . . . 3 2 Background 5 2.1 Traditional Visualization Techniques . . . 52.2 Treemaps . . . 6
2.3 Other Visualization Techniques . . . 8
2.3.1 Cone Trees . . . 9
2.3.2 Reconfigurable Disc Trees . . . 9
3 Visualizing Threaded Discussion Forums 11 3.1 Threaded Discussion Fourms . . . 11
3.2 Layout Style . . . 12
3.2.1 Overview Layout Style . . . 13
3.2.2 Leaf Node Layout Style . . . 13
3.2.3 Detailed Layout Style . . . 15
3.3 Interaction . . . 15
4 Implementation 17 4.1 Hardware . . . 17
4.2 Software . . . 18
4.2.1 Connecting to Newsgroups . . . 18
4.2.2 Pocket Piccolo and Treemap Rendering . . . 19
4.2.3 Layout Algorithm . . . 19
4.2.4 Color assignment . . . 19
5 Algorithms 21 5.1 Slice And Dice . . . 22
5.2 Squarified . . . 23 v
vi CONTENTS 5.3 Strip . . . 26 5.4 Pivot . . . 27 5.5 Split Treemaps . . . 30 5.6 Comparison of Algorithms . . . 32 6 Results 35 6.1 Algorithm evaluation . . . 35 6.1.1 Automatic Test . . . 35 6.1.2 Readability . . . 39
6.1.3 Treemaps on PDAs and Discussion Forums . . . 41
6.2 Pocket Piccolo . . . 42
7 Conclusion 43
Chapter 1
Introduction
This report is part of the final project of the Master’s programme in Computer Science at the Royal Institute of Technology in Sweden.
The project was carried out as a part of a larger project at the Collaborative Visual Computing (CVC) research group in the Department of Computer Science at the University of Cape Town in South Africa.
1.1
Treemaps
Listening to a talk by Ben Shneiderman about the research performed at the Human Computer Interaction Lab (HCIL) at the University of Maryland, the idea was first born to try the concept of treemaps on compact displays. Having been a research topic at the HCIL for several years, the treemap is a means to display hierarchies in a space-filling approach. In doing so, you attempt to overcome the inherent drawbacks of using the traditional, treelike methods for visualizing hierarchical data.
Traditionally, hierarchies have been visualized using treelike structures with nodes that can be expanded and collapsed as the user navigates down the structure. One of the more obvious benefits of this traditional method is that you quickly grasp objects’ location in the hierarchy, but this comes to the cost of screen space - most pixels are used for the background rendering. One other disadvantage is that as the tree grows and the number of nodes become larger than what fits on the display without scrolling, the difficulties of grasping the hierarchical structures increase. The treemap tries to overcome the above problems by using the entire display for the hierarchical structure. Figure 2.1 shows a traditional tree of a directory structure. A treemap of the same structure is displayed in figure 2.2.
1.2
Project Goal
The properties stated in the previous section make the idea of using treemaps on compact displays appealing. The goal of this Master’s project was to explore dif-ferent algorithms for laying out the nodes of the treemap. The original Slice and
2 CHAPTER 1. INTRODUCTION
Dice algorithm proposed by Ben Shneiderman, very simple in its approach divided the space vertically and horizontally as the tree model was descended. It turns out that this algorithm has some disadvantages, a major one being poor aspect ratios. The rectangles produced by the algorithm are often very thin and hence difficult to select and compare to other rectangles. Figure 5.2 illustrates the problem. To overcome this, several other algorithms have been published during the past years, none of which however targeted for hand held devices with compact displays as a primary focus. The topic of this project was to evaluate the more popular algorithms on compact displays, and come up with modifications that might improve the final result. In a broader perspective, a goal was to find and enhance treemap algorithms targeted for PDAs (Personal Digital Assistant), and if applicable, even for treemaps in general.
The thesis was conducted as a part of a larger project, the goal of which was to evaluate different visualization techniques of threaded discussion forums on PDAs. Therefore, the target application for the treemap was a software for browsing threaded discussion forums running on a WiFi (def. sec 1.3) enabled iPAQ (PDA from HP). This was taken in account when stating the criteria for the properties of the treemap algorithm.
1.3
Definitions
Aspect ratio The longest side of a rectangle divided by its shortest side
GUI Graphical User Interface
JVM Java Virtual Machine, Required to run Java programs.
Con-verts the platform independent byte code to platform depen-dent, executable machine code.
Layout algorithm The way the rectangles of the treemap are distributed on the screen. Position and aspect ratio may vary for different layout algorithms.
Layout Style For treemaps, the way the original data structure is
con-verted into a treemap data structure. For example, some nodes of the original data structure might be omitted from the treemap data structure.
nntp Ancient protocol used to communicate with newsgroup
servers.
PDA Personal Digital Assistant. Handheld computer such as a
Palm Pilot or an iPAQ.
WiFi Wireless Fidelity, Local area network that uses high
fre-quency radio signals to transmit and receive data over dis-tances of a few hundred feet uses ethernet protocol.
1.4. OUTLINE 3
1.4
Outline
The thesis is organized in seven chapters. A background to the field is given in chapter 2. Chapter 3 describes how this project set about to visualize discussion forums with treemaps. In chapter 4, the software and hardware used in the project are described. The different layout algorithms used and their appearance on the PDA screen are discussed in chapter 5. The advantages and disadvantages of each algorithm are elaborated in detail. Furthermore, the new Split Layout is described. The overall results and experiences from the project are covered in chapter 6 and the thesis is concluded with a summary in chapter 7.
Chapter 2
Background
This chapter gives a background to what has been published so far in the field of hierarchical information visualization. First some traditional techniques are pre-sented. Second, an introduction to treemaps is given, and the chapter is concluded by other suggested alternatives to traditional layouts.
2.1
Traditional Visualization Techniques
The traditional way of visualizing hierarchical data can roughly be classified into three categories: listings, outlines and tree diagrams [11].
Listings are generally good at presenting content. The structural information, however, is not reflected very well. Displaying the location in the tree structure next to the items in the list is one solution, but that however is also suboptimal, since it requires that the user parse that path information.
Outlines, a trade-off between listings and tree diagrams, summarizes the first couple of lines in the listing, showing the structural location of the items by inden-tation. An example of this is shown in figure 2.1.
Figure 2.1. Outline (left) and Tree diagram (right) of the course folder hierarchy on one of the project computers. The outline is more detailed than the graphical tree diagram.
6 CHAPTER 2. BACKGROUND
The tree diagram is probably the most common method for displaying hierar-chical data sets that contain more than a few nodes. The items of the data sets are shown as icons in a graphical tree. The nodes can be expanded and collapsed which shows or hides the children of the nodes. A typical tree diagram is the directory tree in a file manager. The tree diagram is inferior to the other methods at presenting content, but instead, is much more capable of visualizing the structure. Figure 2.1 shows an example of an outline and a tree diagram of a folder hierarchy.
All three of the traditional methods have specific benefits and drawbacks, but they all share the property of producing defective layouts for very large data sets. The definition of large is not written in stone and of course varies with screen size. A desktop screen can display a larger structure than the screen of a handheld computer, but studies from treemap readability on desktop computers show that as the number of nodes grow larger than a hundred, it becomes increasingly difficult to grasp the structure [11]. This limit definitely decreases on smaller screens. Another problem with tree diagrams, which in particular occurs on small screens, is horizontal scrolling. In order to expand all the nodes in a hierarchy, the tree might become too wide to fit on the screen. Horizontal scrolling is very likely to make the user loose the orientation in the hierarchy, and should therefore be avoided in most situations [6, 4].
2.2
Treemaps
To overcome the shortcomings of the traditional visualization techniques described in the previous section, the treemap was developed [11]. It works by dividing the screen into nested sequences rectangles each of which representing a node in the tree. The area of the rectangles is proportional to some quantity in the data set and the color of the rectangle encodes another dimension. For example, figure 2.2 shows a treemap of the same file structure as figure 2.1. The area of the rectangles is proportional to the file size and the color indicates date. In this way you get an overview of the hierarchy as well as the content, without blurring up the display with too much text.
In a traditional tree diagram, a considerable part of the screen space is used as background pixels, the tree diagram itself only taking up a proportionally small amount of the screen. This is an issue for small screens in particular, where you have to use the display area with care. The treemaps, on the contrary, are space filling in their approach, thereby using 100 percent of the available screen space.
An important aspect of the treemap, which has received much attention in the recent years [1, 2, 13], is the algorithm used to layout the rectangles. Given a data set with fixed item sizes, it is possible to distribute the rectangles in many ways, yet retaining the proportional relationship between the areas of the rectangles and obviously some ways are better than others. In order to gain an understanding of which methods are preferable, some sort of metric of the algorithm has to be employed. A common metric used by many designers [1, 2] is the aspect ratio of the
2.2. TREEMAPS 7
Figure 2.2. Treemap (Slice and Dice layout) of the same file structure as shown in 2.1. The area of the rectangles is proportional to the file size and the color propor-tional to the date of the file. The frames represent the directories.
rectangles. Slice and Dice, the first published algorithm [11], often produces very thin rectangles which are difficult to select and whose areas are difficult to compare. Therefore, one goal for a treemap algorithm is to produce square-like rectangles whose aspect ratios are close to one. The squarified treemap algorithm [2] is good in this sense, and produces a layout with an average aspect ratio very close to one. This algorithm is described in detail in section 5.2.
Another metric used by [12] is the amount of change a layout undergoes as the data set is updated (stability). Imagine that you frequently watch the same data set, but the content may have been updated since the last time you studied it. Despite the change in the data, you expect to find the rectangles in roughly the same position as they were the last time. A layout algorithm that totally repositions the rectangles as the data is updated is given a high value and a consistent layout receives a value of zero. [12] defines the layout distance function, d(r1, r2), between to successive
layouts of a certain rectangle. Let a rectangle r be defined by a 4-tuple (x, y, w, h) indicating the x- and y-coordinate of the upper left corner, the width and the height. If rectangles r1 and r2 are defined by (x1, x2, w1, h1) and (x2, y2, w2, h2), then the
layout distance function is given by
d(r1, r2) =
q
(x1− x2)2+ (y1− y2)2+ (w1− w2)2+ (h1− h2)2
Consistency between two successive layouts will hence give a zero change (high stability), whereas the more change means the higher value of the distance function (low stability).
The squarified layout described in section 5.2 is an example of an algorithm with a low stability, whereas the Slice and Dice (section 5.1) algorithm has an average change close to zero, an extremely high stability.
Although aspect ratio often is a good metric of the usability and clearness of the treemap layout, there are some cases where it falls short. If you have a data
8 CHAPTER 2. BACKGROUND
set that produces a layout with plenty of small rectangles, aspect ratio might not be the best attribute to measure. What really should be avoided is a layout with a lot of rectangles that are too thin to select. It is often better to make the aspect ratio of the bigger rectangles worse in favour of the smaller rectangles. Therefore, this project used a metric of all rectangles whose width or height was smaller than a certain threshold. This gave a good indication of how readable and interactable a certain layout was.
Yet another way to evaluate a layout algorithm is its readability. Being a subjec-tive property per se, measuring it is not as the straightforward as the other metrics described earlier. One way is to perform a user study, requesting the participants to perform certain tasks. The readability could then be traced through timings and interviews. Another, more mechanical way used in [1], assigns a numerical value to a given layout, the value based on the number of times that the motion of the reader’s eye changes direction as the treemap layout is scanned in order. The value would then in some way correspond to how easy it is to find a particular item in the treemap. More precisely, they follow the center of the rectangles in order and count the number of times the vector from one center to the next changes direction more than 0.1 radians. The amount of change is not taken into consideration, as they argue that once the direction of scanning has changed at all, the eye has to stop, the actual amount of change being less important. Normalizing the result gives a value of 1.0 to the most readable layout, where all items follow a horizontal or vertical pattern, and 0.0 to a completely shuffled layout. An example of the former is the Slice and Dice layout (section 5.1), and of the latter one is the squarified layout (section 5.2).
The discussion in the previous paragraph brings up the concept of ordered and unordered treemaps. Since many real world data sets have a natural order, a de-sirable property of the layout algorithm would be to preserve that order. In other words, adjacent items in the data set, or list in the ordered case, would be assigned to rectangles next to each other in the treemap. Since the treemap is two-dimensional, there are many ways to convert the one-dimensional list into a two-dimensional treemap and the area has received a lot of attention recently [13, 12, 1]. Specifi-cally, although noone has come up with a mathematical proof, there seems to be a trade-off between aspect-ratio and ordering. The algorithm, currently known, that produces the best aspect ratios (section 5.2) is also the one that produces the worst ordering and vice verse).
2.3
Other Visualization Techniques
Treemaps are not the only alternative to the traditional layout techniques mentioned in section 2.1. Here are some other techniques that have received interest during the recent years.
2.3. OTHER VISUALIZATION TECHNIQUES 9
Figure 2.3. Example of a cone tree taken from [9], page 193. Note how the nodes overlap when projected onto a 2-dimensional plane.
2.3.1 Cone Trees
The Cone Tree [9] is a three dimensional visualization technique which like the treemap tries to make effective use of the available screen space while displaying the whole data structure at the same time. The node is projected onto the apex of the cone and the children are placed around the circular base. To further enhance the interaction, when a node is selected with the mouse, the cones rotate so that the selected node and all the nodes following the path to the top are brought to the front. Figure 2.3 shows a cone tree taken from [9] page 193. One disadvantage with the cone tree is that as the three dimensional structure is projected onto two dimensional plane, overlap inevitably occurs which obscures the view of the rear items. According to the author, cone trees have been used in a number of applications, ranging from file managers, organizational structure browsers to decision support systems, where cone trees visualize the companies’ operating plan. Other potential applications include IDE (Interactive Development Environment) class browsers and network management interfaces.
2.3.2 Reconfigurable Disc Trees
This method (RDT) was presented in [7] and is also 3D visualization technique of large hierarchies. According to the authors, the RDT is an improvement of the cone tree and tries to overcome the disadvantages of the cone tree, where overlapping occurs when the 3d structure is mapped onto a plane. The RDT has two properties associated with each cone: A reference point which is a node above the center point of the plane of the current node, and an apex point which is the top of the cone. By varying the distance between the parent node, the reference point and the apex point, different shapes can be created: The shapes can be flattened out and projected onto a 2d plane, which like the treemap use the available screen space more efficiently. This is achieved without overlap, this in contrast to the cone trees.
Chapter 3
Visualizing Threaded Discussion Forums
One of the goals of this project was to evaluate whether treemaps are a suitable visualization method on handheld devices with small screens. The treemap was applied to an application for viewing threaded discussion forums on PDAs. Used as a browsing tool, the treemap would give a good structural overview of the forum, as well as displaying additional information of the articles, such as thread activity and article size.
This chapter describes how the discussion forum data set was rendered on the screen as a treemap. Section 3.1 presents the conceptual model for threaded discus-sion forums and gives an insight into how a treemap can be used to visualize that model. In section 3.2, the three different methods that were used in the project to convert the model are described in detail. The last section 3.3, although not really within the scope of this project, outlines ideas how the user may interact with the treemap. However, nothing is mentioned about what algorithm is used to distribute the rectangles in the treemap. This is covered in chapter 5.
3.1
Threaded Discussion Fourms
A threaded discussion forum is hierachical data set that may consist of hundreds of threads and sub nodes. Here is a definition of the terms that will be used for describing a discussion forum.
A forum is a collection of articles. The articles are grouped in discussion topics which are called threads. A thread is represented by the first article in the discussion topic, i.e. the article that initiated the thread. To convert this into a tree, let each forum be represented as a root node in a tree. The threads will be the immediate children to the forum nodes. The articles of a thread are the thread node, together with all its children. If an article is posted as a reply to a specific article in the tree, that article node will become the parent of the reply. If a new thread (discussion topic) is created, that article will be a direct child to the root node (forum node).
Fitting the content on a small screen with the traditional text based tree struc-ture, where nodes are expanded and collapsed, is not ideal. The users will have
12 CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS
to scroll horizontally as well as vertically and moreover, the overview is lost as the number of posts in the forum exceeds the number of text rows that can be displayed on the screen. To overcome the problems connected with traditional text based browsing schemes, a treemap is used to display the contents of the discussion fo-rums, the intention being to provide users with a quick and comprehensible overview of the contents and the social structure (such as most popular thread) of the forum. Rather than being a tool for people seeking information on a very specific matter, the main focus is to support browsing behavior, where users are interested in seeing the activity in various groups. Traditional text based visualization techniques give the same space to each thread with no regard to attributes that might be interesting to forum users. When browsing a forum socially, some threads are more interesting and treemaps can highlight these threads in a natural way on a small screen by varying attributes such as size and color.
In this project, the threads are rendered as rectangles in a treemap, their size being proportional to the number of articles in the thread. The color can be pro-portional to the activity of the thread or, when searching in the forum, propro-portional to the relevance of the query for that particular thread. Representing the posts in the discussion forum as rectangles in the treemap, differs to some extent from what is normally the case with treemap interfaces. Often [1, 14], it is the leaf nodes of the tree that are the actual content nodes: files in a directory structure and photos in a folder hierarchy, multidimensional data in biomedicine are some examples. The interior nodes normally work as meta-content placeholders for the actual content. In discussion forums, however, there is no such distinction: the action of creating a new thread is not independent of the first posting to that thread. So, unlike a file system which may have an empty folder, discussion forums have data at both branch and leaf nodes. This means that one cannot do a naïve application of treemaps, but must consider how best to modify them for discussion groups. The only exception to this rule are the root nodes, which represent the discussion forums itself. Forum nodes are not content nodes, and can hence, unlike the interior article nodes, be rendered as meta-content place holders.
3.2
Layout Style
Bearing in mind what is mentioned in the previous section, about the differences in converting a discussion forum into a treemap, compared to traditional treemap data sets, three different solution were considered. The first one, which is called the overview version, promotes overview to detail, the emphasis of which is to provide the user with a top down view of the different on-going discussions in the forum. The second version promotes detail and displays all articles in a single thread. The third version, takes a different approach and displays all the leaf nodes of the discussion forum. To illustrate this, the three subsections below describe each of the layout styles. The example figures shown in the subsection shown all refer to the sample discussion forums shown in figure 3.1. The picture shows two forums with two
3.2. LAYOUT STYLE 13
Figure 3.1. Sample discussion forums used for when rendering different treemaps in the next secitons.
threads each. The first thread in the first forum has no reply, the second thread in the first forum has three replies. The first thread in the second forum has no reply, and the second thread has one reply.
To read the actual contents of the articles is beyond the scope of the project, but typically, the user clicks on one of the rectangles. Depending on what the rectangle represents (a thread or an individual article), text appears displaying the article content or a list with articles in the thread.
3.2.1 Overview Layout Style
Indicated by its name, the purpose of this style is to give the user an overview of the contents of the entire discussion forum. Motivated by research on browsing behaviour in social networks [3], the style shows only the forum nodes and the thread nodes, the intention being to give a new user an easy way to find which are currently the hot topics.
The thread nodes are drawn as rectangles in the treemap with the size propor-tional to the number of articles in the thread. The color of the rectangle indicates how active that particular thread is. To show which articles that belong to which forums, a frame is painted around the thread rectangles that reside in the same forum. An example is shown in figure 3.2. An example of the final application using overview style is given in figure 3.5.
3.2.2 Leaf Node Layout Style
This experimental style takes a different approach and renders only the leaf nodes of the discussion forum and the representation of the data model is therefore somewhat harder to grasp than for the other layout style.
The idea is to only render the leaf nodes, i.e. the most recently posted article in every sub tree within a thread. The motivation for this is that the top level thread node is not necessarily a good indication of the discussion topic for all subtrees in the thread. In fact, the topic of discussion often evolves as more articles are posted
14 CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS
Figure 3.2. Overview layout style of the discussion forum shown in figure 3.1. Rectangle area is proportional to the number of articles in the thread. The info rectangle appears when the user touches the different rectangle with the stylus and shows topic and date for the thread. The color is proportional to the thread activity.
to the thread, which creates different discussion topics within a thread, all of which originating from the same top level article. Possible advantages of only rendering the leaf nodes are:
• Not only the topic that originated the discussion is shown. Rather, all possible branches are displayed. If the discussion has evolved into different topics, the leaf nodes will probably represent them all.
• By clicking on a leaf node rectangle, a list with all the articles up to the thread root node is shown. This way, the hierarchical structure of a discussion, which is hard to render on a small screen, is converted into a linear structure, the content of which can easily be presented with a list.
The drawbacks are as mentioned, that the concept of only rendering the leaf nodes, is less intuitive and harder to grasp for the user. Additionally, the articles in between, that are not leaf nodes are hidden from the user. Even though they are shown when clicking on a leaf node, some content is lost. Figure 3.3 shows a picture of the layout style. Compare the picture with the tree structure in figure 3.1. What is rendered are the leaf nodes in the structure. Each rectangle in the figure 3.3 represent a leaf node of the tree in figure 3.1. When the user taps the stylus on one of the rectangles representing a leaf node, the user will be shown the topic of that particular leaf node article, together with all the articles up to the top node article.
3.3. INTERACTION 15
Figure 3.3. Example of Leaf Node Layout Style. The articles are grouped by thread. The area is proportional to the number of articles in each thread. The color of the rectangles is proportional to the date of article.
3.2.3 Detailed Layout Style
This layout style is either meant to be used for smaller forums, or for viewing a single thread, for instance when clicking on a thread in the overview layout style. While the overview style only rendered the top level threads of the discussion forums, this style displays all the articles in the thread. This is done nesting the rectangles inside one another. The color of the rectangle indicates the date it was written, and the size indicates the number of children. Since the number of nodes that have to be rendered is much bigger, and screen space is lost due to the nesting of the articles, this style is only applicable to small forums or individual threads. A picture of the style is shown in figure 3.4. The figure shows all the articles in the second thread of the first forum in figure 3.1.
3.3
Interaction
In order to create a realistic application, some level of interactivity needs to be present. Some treemaps display a label inside the rectangles with information of the item. Due to the small screen, that is not possible in this project. At first, the thought was to provide this information in an information bar at top of the screen. Clicking on the rectangle would show the subject of that article. It was however decided that clicking would add an unnecessary step, and instead a solution similar to tool tips was implemented, where holding the stylus over a rectangle brings up a square next to the stylus, with the subject, date and author of the article.
Features like bringing up a complete article with text and posting new articles were considered not to part of the project, and were thus left unimplemented.
16 CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS
Figure 3.4. Detailed layout style showing the second thread of the first forum in figure 3.1. This type of display is shown when the user clicks on a thread in the overview layout style (figure 3.3). The color of the rectangles is proportional to the date of the article.
Figure 3.5. Example of how the application might look when running on a PDA. Rectangles represent threads. Size prop. to number of articles in thread and color prop. to thread activity.
Chapter 4
Implementation
To test the different treemap algorithms and the suitability of using treemaps for visualizing discussion forums on PDAs, a prototype application was implemented on an iPAQ. This chapter will describe the various aspects and properties of the hard-ware platform used, the softhard-ware that was implemented and the tools and libraries that were employed when developing the software.
4.1
Hardware
The PDA used in the project had to follow the requirements stated below.
• To be able to display the treemap satisfactory, the screen must be good enough with a high resolution and bit depth. To low a resolution would mean less detail in the treemap which would make small rectangles difficult to select and compare.
• High performance. Some of the treemap algorithms have a complexity of O(n2) and with hundreds of objects to render, a slow CPU would mean the user would have to wait for treemap updates.
• Mobility. The PDA must be easy to carry with so it can be used naturally. Moreover, a wirless internet connection enhances the experience greatly, since the user may connect to internet usegroups without having to put the PDA in its cradle to access the Internet.
The device that was used was a Compaq iPaq 4150 equipped with built-in WLAN (Wireless LAN). The PDA was small, light and easy to carry with oneself. With a built in-support for WLAN, you could connect to different newsgroups on the internet whenever you were in range of an access point. Without WLAN support, in order to access different newsgroups, you would have needed an internet connected desktop to which you would attach the cradle with the PDA.
18 CHAPTER 4. IMPLEMENTATION
The screen was of a good quality with a resolution of 239x291 pixels and 64000 colors. The processor was an Intel (R) XScale 400 MHz, and the device was equipped with 64 MB RAM.
On the negative side were problems with the letter recognition. Apparently, the algorithms are not perfect yet, but as the application used in this project did not involve much writing, the problem was considered of less importance.
4.2
Software
The prototype application was developed in C# .NET using Microsoft Visual Studio 2003.
The first issue that had to be addressed was whether to use Java or a Microsoft solution. After a little consideration, it was decided to abolish the thought of using Java, the reason being that in order to install a JVM on the PDA, either Pocket PC had to be replaced by Linux or a special Pocket PC JVM had to be installed on the device. Both of the alternatives had drawbacks and were most likely to cause an overhead in development time compared to using a Microsoft solution. Furthermore, Visual Studio offers excellent debugging support when running the applications on the PDA. Having a good debugger is very important and saves a lot of time.
To make the evaluation process easier, the application was split into three parts. One part implemented the nntp protocol and was used to download articles from public news servers on the internet. The second part, which actually used a third part library called Pocket Piccolo, was responsible for rendering and interaction with the treemap elements. The last part was the layout algorithm, which was completely separated from the rest of the code, which made it particularly convenient to switch between and implement new layout algorithms, without relying on which technique was used to render the treemap.
4.2.1 Connecting to Newsgroups
In the first part, which required that the PDA be in range of a wlan access point, allowed the user to connect to a public newsgroup server on the internet. All the newsgroups on the server were downloaded and the user chose which groups to browse. When doing that, all the articles in the selected newsgroups were down-loaded to the PDA.
To allow for fast off-line access to the newsgroup, there was an option to save the downloaded newsgroups as an xml file locally on the PDA. In this way, the user was able to access the newsgroup later when not in range of a wlan access point.
Note that the emphasis of this part was purely to provide necessary and essential functionality so that the treemap algorithms could be evaluated. Therefore, the interface was as simple as possible and no effort was made to enhance the gui or implement more functionality apart from what was considered absolutely crucial for the rest of the application to function.
4.2. SOFTWARE 19
4.2.2 Pocket Piccolo and Treemap Rendering
Pocket Piccolo is a graphics library developed by the University of Maryland. The reason for using this graphics library was that it was easy to create graphics object and add interaction to the objects. Since as little time as possible of the 20 project weeks could be spent on prototype implementation, speed was important. The rectangles of the treemap were all represented as a Pocket Piccolo PNode object which had properties such as color and text. Each PNode object had built in support for interaction such as stylus down and tap and for manipulation such as zooming and scaling. This made it easy to get something working that also looked good in little time. In chapter 6 it is argued whether it would be feasible to use the same graphics library in a real system.
All the articles that the user wanted to browse were stored in a data structure that resembled a tree. When it was time to render the tree as a treemap, the nodes were traversed and for each node that would be drawn on the screen a PNode object was created and added to the canvas. To each PNode object, an input listener was attached which kept track on when the stylus was tapped inside that particular PNode. Upon such an event, an information square was displayed with data, such as date, author and subject, of the article represented by that rectangle.
4.2.3 Layout Algorithm
When all the PNode objects had been created, they were laid out on the screen by the current layout algorithm. All PNode objects implemented a common interface the purpose of which was to separate the layout algorithm from the presentation layer, thereby making the algorithm independent of Pocket Piccolo. In this way, it was easy to plug in new layout algorithms into the application. The different algorithms are described in detail in chapter 5, but they were all resulted in that each PNode object was assigned a bounds value, indicating the coordinates on the screen for each PNode rectangle.
4.2.4 Color assignment
The way in which the color was assigned to each rectangle was not obvious. In this application, the color indicated the date of the article, so that recent articles become green and older ones become darker. To make this flexible, each PNode had a function called getColorAttribute() which returned an integer, which in this case was the number obtained from the date by adding year, month, day, hour and second. For example, March 7 2005, 11.30 p.m. became 200503072330.
The second issue was how to distribute the colors among the articles. One option is to use all the colors, which basically means that all the rectangles are sorted by date and the most recent rectangle is assigned the greenest hue, the second most recent is assigned the second greenest hue, and the oldest article becomes black. This has the advantage that all the available colors are used, which hopefully makes it easy for the user to discriminate between the articles and to locate the most recent
20 CHAPTER 4. IMPLEMENTATION
article. The disadvantage is that if all the articles are very recent, say from the same day, still one will become green and one black, no difference in hue from another data set with articles ranging from today and three years ago.
Another option would be to sort the articles by date but let the actual date influence the color. That is, if 3 articles are from the same day and one article is three years old, the old one will be black whereas the recent articles will all be very green. This means, as opposed to the former alternative, that the hue will be proportional to the date. The disadvantage is that unless the article dates are equally distributed on the time scale, not all colors will be used when rendering the treemap.
The method used in this application resembles the first alternative which uses all colors, but with some modification. In social discussion forums, it is probably the most recent articles that are important, but if an article is six or eight weeks old is of less importance.
Therefore, it would be good if the resolution was higher for more recent arti-cles than for older ones. To accomplish this, the idea was to apply some sort of logarithmic function to the date value when the color was obtained. To get more tuning though, a discreet solution was implemented which assigned 50 percent of the colors to 20 percent of the most recent rectangles. This way, it became easier to discriminate between the more recent articles, whereas the older ones were more similar in hue.
Chapter 5
Algorithms
A key part of the treemap layout is the algorithm used to calculate the dimension and location of the rectangles. As mentioned in earlier chapters, there exists no unique mapping from a given data set to the way the rectangles are distributed. Neither does there to this day exist an equation to calculate the optimum layout for a given data set. Rather, the different algorithms are designed to optimize various aspects of the layout.
The algorithms that have been used in this project all serve a different purpose, each of which taking a different approach aimed at optimizing one or multiple of the metrics described in section 2.2.
Although optimizing different metrics, the general procedure is roughly the same for the algorithms. The overall workflow can be described in a recursive way:
void layoutNodes(Node[] nodes, Rectangle totalBounds) { foreach node in nodes
Rectangle nodeBounds = calculateBounds(node.size, totalBounds); node.bounds := nodeBounds;
totalBounds.remove(nodeBounds);
// Layout the children within the bounds of the parent node if (node.hasChildren)
layoutNodes(node.children, node.childrenBounds); }
The central part of the algorithm, is of course the third line, where some part of the available area is assigned to each node. The above listing is very simplified. It follows a sequential pattern, processing each node one at a time. This might not be the case. Some algorithms look ahead in the list of nodes when caculating the bounds for a single node. But the overall flow is identical for all the algorithms -a function th-at processes e-ach node, depending on its size, -assigns -a portion of the available area. If the node has children, they are laid out within the bounds of the parent node.
22 CHAPTER 5. ALGORITHMS
Each of the algorithms listed below were used in the experiment. They will be described in detail and each graded on the metrics aspect ratio, thin rectangles, change and readability described in section 2.2.
The algorithm can be grouped depending on how they function. Among the ordered treemap algorithms, which preserve the natural order of the data set are, are the Slice and Dice and the Strip layout.
Partially ordered treemaps are the Pivot and the Split layout. Although pre-serving order, the layout is not as readable as for the fully ordered layouts which follow a linear pattern. Instead, adjacent elements in the data set are laid out next to each other in the treemap, but the direction may change vertically or horizontally for each element, making it somewhat more difficult to grasp.
The squarified layout is totally unordered. The rectangles are sorted by size and their aspect ratios are extremely good.
All algorithms except the Split layout are well known published algorithms. The new Split layout was developed in this project to get an ordered algorithm with good average aspect ratio.
5.1
Slice And Dice
Published in 1991, this was the first layout algorithm for treemaps and perhaps one of the simplest to understand [11]. It functions well for smaller layouts, though its biggest problem being poor aspect ratios, the algorithm becomes less useful for larger data sets, and naturally small screens.
For each level in the tree the layout direction is conversely changed between horizontal and vertical, and for all the children at a given level, the display area is divided amongst the children, each child obtaining a slice proportional to its size attribute.
For example, suppose there are three children with sizes 1, 2 and 2 at a certain level and the layout direction for that level is horizontal. If the available space is a rectangle of dimension 100x100 pixels, the first item of size 1 will be placed to the left. Its size being 1+2+21 = 15 of the total size, the rectangle representing the first item will have dimensions 20x100. Analogously, the second and third item will be in the middle and to the right, both in rectangles of dimensions 40x100 pixels. Now suppose the second rectangle has two children with sizes 2 and 8. Descending one level, the rectangles will be laid out vertically inside the middle rectangle of dimension 40x100. The first child will be given 2
10 of the total area and thus placed
in the upper rectangle of dimension 40x20 pixels, whereas the second item will be placed in the lower rectangle with dimension 40x80 pixels.
Figure 5.1 shows how a small tree is converted into a Slice and Dice treemap. The structure is easy to grasp for very small trees, but as the number of nodes grows larger, some rectangles become very thin. This makes it difficult, not only to compare the area of the rectangles, but also to interact with the treemap and select different items.
5.2. SQUARIFIED 23
Figure 5.1. Example of how a tree is converted into a Slice and Dice treemap. The label indicate the node sizes which correspond to the rectangle areas.
The strengths of the algorithm are found in the change and readability measures. Being an ordered treemap algorithm, it is particularly predictable in its placement of the rectangles, and hence it is easy to find a certain item among the rectangles. When updated, all the rectangles stay in the same place, the only difference being that the left siblings of the changed node are squeezed to the left, and the right siblings are squeezed to the right which gives a very good value of the change metric. This is in contrast to the other algorithms where the rectangles may change location completely, making it difficult to locate the same item as the data set is updated.
Due to the drawbacks with low aspect ratio described above, this algorithm is rarely used in practice, merely provided as a comparison to other algorithms.
Performance
Analyzing the performance of the Slice and Dice algorithm is straightforward. The treemap can be drawn with one traversal through the tree. Assuming the calculation of size and color takes constant time, the run time is O(n), where n is the number of nodes in the tree.
5.2
Squarified
The squarified layout was first published by Bruls, Huizing and van Wijk in 2000 [2]. Realizing the need to find treemap algorithms that produce rectangles with good aspect ratios, they developed the squarified layout, which seeks a solution where all rectangles have an aspect ratio as close to one as possible. For a set of rectangles, R to be laid out, they try to fulfil the criterion
∀r ∈ R,max(r.width, r.height)
min(r.width, r.height) ≈ 1
Finding the optimal solution is not feasible as the problem is NP-hard, but by using a fairly simple rationale it is possible to get close to producing optimal aspect
24 CHAPTER 5. ALGORITHMS
ratios. Instead of looking at all levels at the same time, which would result in an explosion of computation time, the algorithm tries to produce square-like rectangles for the siblings at a certain level. Assuming that the algorithm succeeded in mak-ing good squares for the nodes at one level, this gives a good startmak-ing point when descending to the next level, and recursively assigning rectangles to the children of each parent sibling.
The idea behind the algorithm is to layout the rectangles along a row in the available display area as long as the worst (highest) aspect ratio of any rectangle in the current row keeps improving. Once the highest aspect ratio among the rectangles has reached a minimum, and adding another rectangle to the current row would increase the worst aspect ratio, the current row is fixed and a new row is created.
Upon creation of a new row, the layout direction for the row is determined. If the width of the remaining available area is larger than the height, the rectangles are added vertically within the row, otherwise horizontally. The order in which the rectangles are added plays a significant importance. As the larger rectangles are generally harder to fit than the smaller ones, the list of items to lay out is initially sorted so that the larger items are processed first.
The algorithm is described below in pseudo code
squarify(Queue nodes) { Queue currentRow;
nodes.sort(); // Sort on size while {nodes.length > 0) {
Item current := nodes.dequeue();
// Worst aspect ratio improves - add to current row if (worst(currentRow + current) < worst(currentRow))
currentRow.enqueue(current);
// Worst aspect ratio increased - create new row else { layoutRow(currentRow); currentRow.clear(); currentRow.enqueue(current); } }
foreach (Item node in nodes) squarify(node.children); }
First the nodes are sorted with the biggest item first. Then for each node in the queue, the function worst checks if adding the item to the current row would improve the worst aspect ratio of any rectangle in the current row. If not, the function layoutRow is called, which has access to a global data structure Rectangle availableArea, and sets the width and height of the nodes in the row depending on
5.2. SQUARIFIED 25
Figure 5.2. Comparison between squarified layout (left) and Slice and Dice layout (right). Note the much better aspect ratios of the squarified layout
their size attribute. The last row processes the children of the nodes by calling squarify recursively for each list of children.
This algorithm produces very good layouts in terms of aspect ratios, with average aspect ratios close to one, most often only differing in the second decimal. Though possible to come up with counter examples of poor layouts, for the type of data used in this project however, the average aspect ratio was excellent.
Figure 5.2 shows a comparison of the squarified and the Slice and Dice layout. Note how close to 1 the aspect ratio is in the squarified layout. Also note how difficult layouts the Slice and Dice algorithm produces as the data set grows larger. The main drawback with the algorithm is that it is unordered. Since the nodes are sorted by size, the natural ordering of many data set is lost. This makes it less readable in application where finding a specific item in the treemap is important. The change metric is also worse than most of the other algorithms. Updating the data set may completely relocate the rectangles in the treemap, making it difficult for users to orientate between successive accesses to the same data set.
Performance
First, the rectangles are sorted by size, which takes O(n log n) time. The run time for the remaining part of the algorithm depends on the average number of rectangles in each row. Since the dimension of the rectangles in the current row must be recomputed for every new rectangle added to the row, which has to be done for every rectangle in the data set, the run time becomes O(n log n) + O(nβ(n)) where β(n) is the average number of rectangles in the row. The run time is thus bounded below by O(n log n) which will be the case if O(β(n)) ≤ O(n log n). The worst case will be if all the rectangles are laid out in one row, meaning that β(n) = n, the run
26 CHAPTER 5. ALGORITHMS
time becoming O(n2), which hence constitutes the upper bound of the run time.
On average however, each row will contain√n rectangles, giving a total average run time of O(n√n).
5.3
Strip
The Strip treemap [12] is an ordered treemap and is a modification of the squarified algorithm described in section 5.2. The inputs to the algorithm is the available area to be subdivided and a set of items to distribute. Throughout the layout process, a current strip is maintained. For each item to layout, a check is made whether adding the item to the current strip decreases (improves) or increases the average aspect ratios of the rectangles in the current strip1. If the average aspect ratio decreases, the rectangle is added to the current strip, otherwise the current strip is fixed and a new strip is created, to which the item is added.
As opposed to the squarified algorithm, when the layout direction is re-determined for every new row, the direction for this layout is fixed from the start. If width of the total available area is bigger than the height, the strips are lined up horizontally, and vice verse vertically.
The Strip algorithm is an ordered layout algorithm, and hence the list of rectan-gles is first sorted for a given attribute. The main advantage of this algorithm is that it is very simple to implement and produces excellent readability. The placement of the rectangles is predictable. The average aspect ratio and the stability is slightly worse than the Pivot and the Split treemap algorithm described in section 5.4 and 5.5.
strip(Queue nodes) {
nodes.sort(); // Sort on any attribute while (nodes.length > 0) {
List currentStrip;
Item currentItem = nodes.dequeue;
if (avgAspectRatio(currentStrip + currentItem) < avgAspectRatio(currentStrip)) currentStrip.add(currentItem); else { layoutRow(currentStrip); currentStrip.clear(); currentStrip.add(currentItem); } }
foreach (Item node in nodes) strip(node.children); }
1As opposed to the squarifed treemap where you checked the maximum aspect ratio of any of the rectangles in the current row
5.4. PIVOT 27
Figures 5.4 and 5.5 compare the different ordered treemap layouts.
As mentioned, the readability metric of the algorithm is very good. Finding a specific item in the treemap is fast, due to the predictable layout of the items. The average aspect ratios are not as good as for other ordered algorithms, but often acceptable. The number of thin rectangles is sometimes a problem though when a lot of small and large items are being laid out. Moreover, the stability is quite good compared to the other algorithms. A problem in this regard, is that a small update in the data set can cause an entire strip to be removed or added, which induces a lot of change in the layout.
The algorithm as defined above works quite well in most cases, but a frequent problem occurs with the last strip that is laid out. Since the decision whether a rectangle is added to the current strip, or to a new strip, is based only on the average aspect ratio of the rectangels in the current strip, one can in the end be left with a couple of thin, poor aspect ratio rectangles in the last strips. Adding a lookahead to the layout often solves the problem. By comparing the average aspect ratio of the rectangles in the current strip and the lookahead strip to what happens if the lookahead rectangles are moved to the current strip, you can often avoid the thin rectangles in the last strips. Though at least doubling the run time of the algorithm, the complexity remains the same. This is motivated by that a maximum of one more strip is examined, which on average contains √n rectangles. 2√n or √
n has no effect on the complexity of the algorithm.
Performance
The run time of the algorithm can be analyzed as follows: First the items are sorted, which is done in O(n log n) time. Each strip contains an average of√n rectangles. For each strip, the dimension of all the rectangles in the current strip must be recomputed (√n rectangles). The recomputation is done for every rectangle in the data set (n rectangles). Therefore, the total average run time will be O(n log n) +
O(n√n) ∈ O(n√n).
Of course the very unlikely event may occur, that the entire layout is constituted by a single strip with all the rectangles, for example if the display area is very thin. This means that the number of rectangles in the strip is n and the worst case run time becomes O(n2).
5.4
Pivot
The Pivot is a partially ordered layout algorithm [12]. With a procedure similar to the quick sort algorithm where the elements in the list are processed recursively to render an ordered layout. The list of items are allocated into four parts: three sub lists and one special item called the pivot. The dimension of the pivot is fixed and the contents of the three sub lists are laid out recursively by applying the same sub-division process to each one of them.
28 CHAPTER 5. ALGORITHMS
Figure 5.3. Pivot configuration. The largest element is laid out in the Pivot. The figure shows the configuration for rectangles whose width >= the height. If this is not the case, the configuration is flipped along the line y = x.
The input to the algorithm is an ordered list of items of different sizes and a rectangle in which the items will be laid out. The first step is to select the pivot according to some strategy. The Pivot-by-Size variant, for instance, assigns the pivot to the largest element in the list. Next, all items in the list with an index less than that of the pivot item are assigned to list L1 and are laid out in a large sub
rectangle to the left of the pivot. The elements with an index higher than the pivot element are assigned to lists L2 and L3 which are laid out in sub rectangles below and to the right of the pivot item respectively. The distribution of items between L2 and L3 is determined so that all items in L2 have an index less than the items of
L3 and so that the aspect ratio of the pivot rectangle is as close to one as possible. Here is a description of the algorithm in more detail:
- Input: A list of items that sorted on some attribute, L to layout and a rectan-gle, R, in which the items are placed.
1. If the number of items in L are 0, return without doing anything.
2. If the number of items in L equals 1, set the bounds of the item to be R. Recursively lay out the children of the item by starting at step 1 with the item’s children as the list L and the rectangle R as their bounding rectangle. 3. |L| > 1, choose an item to be the pivot, P , according to some Pivot selection
strategy described later.
4. Divide R into four sub rectangles, r1, r2, rp and r3, according to figure 5.3. 5. Put the pivot P in rectangle rp, and put all items in L with an index less than
that of P in the list L1. Put the elements of L with an index greater than P
into L2.
6. Move an item from the end of L2 and insert it at the beginning of L3 as long as that action improves the aspect ratio of rp.
5.4. PIVOT 29
7. When moving yet another item would make the aspect ratio of rp worse, we
have three lists which all may be empty: L1 with all items of L whose index is
less than P , L2 with items greater than P , and L3 with items whose index is greater than those in L2. Now that the contents of L1, L2and L3is determined
the exact dimensions of rp is calculated and fixed.
8. Now recursively lay out the items of L1, L2, and L3 in the rectangles r1, r2,
and r3 starting at step 1.
The average aspect ratio is often slightly better than the Strip treemap. However, since it depends on the type of data set, there are several occasions where the Strip layout actually performs better regarding the average aspect ratio.
As to the number of thin rectangles, the Pivot layout almost always performs better than the Strip treemap. Moreover, the layout is considerably more stable. The problem with the Strip layout, where an update in the data set can cause an entire strip to be added or removed does not occur in the Pivot layout. Being a partially ordered algorithm, the readability is far better than the squarified treemap, although not as good as the Strip layout which follows a simple linear pattern.
Figures 5.4 and 5.5 show examples of the Pivot by Split Size layout.
Selecting the pivot
According to [12], the strategy used to select the pivot has implications upon the quality of the treemap. The best average aspect ratio is obtained if the Pivot-by-Size variation is followed. It assigns the pivot item to the largest item in the list. This makes sense, since the largest item often is the most difficult one to fit. Processing them first gives a lot of space to play with in order to make there aspect ratio close to one.
An alternative is the Pivot-by-Middle variation, assigns the pivot to the middle item of the list. (If the list contains n items, the pivot will be the item with index bn/2c). In doing this, the layout often becomes more stable, since the pivot selection is independent of the size of the items, while at the same time, splitting the list in the middle is likely to create a balanced layout. In general, Pivot-by-Size has slightly better aspect ratio, while Pivot-by-Middle performs better on the stability.
Performance
The worst case run time for the Pivot by Size algorithm is O(n2). For all n elements,
the biggest element has to be located. If this happened to be the last element in the list, n elements would have to be traversed, thus creating a worst case run time of O(n2). However, on avarage, only half the list will have to be traversed, yielding an
30 CHAPTER 5. ALGORITHMS
5.5
Split Treemaps
None of the above algorithms is ideal. The squarified layout, which has excellent aspect ratio, mixes the natural order of the data set, while the ordered layout algo-rithms have suboptimal aspect ratios.
In an attempt to produce an ordered layout with better average aspect ratio than any existing algorithm, the Split treemap layout was designed for this project. Especially for small screens, the importance of avoiding thin rectangles must be adhered to. Otherwise users will not be able to gain from the benefits that treemaps may provide compared to traditional layouts.
The first experiments on a small screen for the newsgroup browsing application were implemented with the squarified layout. Its square like rectangles made the treemap work well on a small screen. However, realizing the benefits of ordered layouts, the Pivot and the Strip layouts were implemented. Unfortunately, the average aspect ratio was too high and it appeared as though the current ordered treemap algorithms (Pivot and Strip) were not a satisfactory substitution for the squarified version already implemented. Therefore, an attempt was made to develop a new ordered treemap algorithm with improved average aspect ratio.
The result is the Split treemap which, like the Pivot, is a partially ordered algorithm. It produces a layout where the natural ordering of the data set is roughly preserved, while in most cases producing better aspect ratios than the Pivot and the Strip treemaps.
The procedure is very simple. Inputs to the algorithm are an ordered list, L =
{l1, l2, ..., ln} of items to layout and a rectangle, R, in which the items are distributed.
Furthermore, define the weight of the list, w(L) to be the sum of the sizes of all the elements in the list.
The algorithm follows a recursive process, where L is split into two halves, L1and
L2, such that that w(L1) is as close as possible to w(L2). Note that the ordering of the elements must not be changed. L1and L2are both ordered, and all the elements
of L1 have an index less than those of L2. We now have:
w(L1) ≈ w(L2) ≈ w(L)2 and ∀li ∈ L1, ∀lj ∈ L2 : li≤ li+1≤ lj ≤ lj+1
Now, define the α(R) to be the area of a rectangle R. The rectangle R is split, either horizontally or vertically depending on whether the width is bigger than the height, into two sub rectangles, R1 and R2 such that their areas corresponds to the
size of the elements of L1 and L2. That is:
α(R1) α(R) = w(L1) w(L) , α(R2) α(R) = w(L2) w(L)
Finally, recursively layout the contents of L1 and L2 in R1 and R2 according to
5.5. SPLIT TREEMAPS 31
splitLayout(List items, Rectangle r) { if (items.length == 0)
return;
if (items.length == 1) { items.bounds = r;
splitLayout(items.children, r); // Layout the children within }
List l1, l2; Rectangle r1, r2; double halfSize = w(items) / 2; double w1 = 0, tmp = 0;
// Pick out half the weight into l1, half into l2 for all items {
Item front = items[front]; tmp = w1 + front.size;
// Test if it got worse by picking another item if (abs(halfSize - tmp) > abs(halfSize - w1))
break;
// It was good to pick another l1.enqueue(items.dequeue); w1 = tmp;
}
// The rest of the items go into l2 l2 = items;
r1 = new Rectangle(r.x, r.y,
r.width * w(l1)/w(l1 + l2), r.height);
r2 = new Rectangle(r.x + r1.width, r.y, r.width - r1.width, r.height);
splitLayout(l1, r1); splitLayout(l2, r2); }
What about the quality of the treemap produced by the Split treemap algorithm? More about that in chapter 6 below, but generally, the Split algorithm produces layouts with lower average aspect ratio and better stability than does either of the Strip and the Pivot algorithms. Figures 5.4 and 5.5 show examples of the Split layout.
32 CHAPTER 5. ALGORITHMS
Performance
The run time of the algorithm is analyzed as follows. First the total sum of the items has to be computed, which takes O(n) time. Furthermore, the list and sublists have to be divided a total of n − 1 times. For each sublist division, the split point of the list has to be located. A naïve implementation does a linear search, which takes O(n) time. But by using dynamic programming, the cumulative sum can be saved the first time the sum is calculated on the entire list. That is, for each list element li, the sum w(l1) + w(l2) + · · · + w(li−1) + w(li) is saved. Doing that, the split point
can be found with a binary search, which takes O(log n) time in the worst case. On average however, the split point will be in the middle, which gives O(1) time to locate the middle. The split point must be found for each n − 1 list division.
The total run time, supposing dynamic programming is used, hence becomes: O(n) + O[(n − 1) log n] ∈ O(n log n) (Worst case)
O(n) + O(n − 1) ∈ O(n) (Average)
That means that the average and worst case run time of the split layout is better than any of the other ordered layout algorithms, except for the Slice and Dice.
5.6
Comparison of Algorithms
To get an estimation of the quality of the different algorithms, figures 5.4 and 5.5 shows a comparison of the algorithms laid out on an area proportional to a PDA screen.
The first set (figure 5.4) compare the three ordered layout algorithms for a rela-tive small data set of 20 elements. The Split layout has the best average aspect ratio, but the Strip treemap on the contrary, lays out the element in a very predictable manner, making the layout more readable. For a data set of this size, the Strip treemap is thus probably the best choice of algorithm.
The second set of figures (figure 5.5) compares the algorithm for a bigger data set of 100 elements. The order of the element is indicated by the shading. First note the outstanding aspect ratio of the squarified layout compared to the other algorithms. However, since it is unordered, locating a specific element takes a very long time. Among the ordered algorithms, the Split treemap again has the best average aspect ratio. The Strip treemap starts to produce layouts with thin rectangles for data sets of this size, and hence for sizes of 100 or more elements, the Strip treemap is probably an unwise choice.
5.6. COMPARISON OF ALGORITHMS 33
Figure 5.4. Comparison of layout of 20 items for three ordered algorithms. Left: Strip, avg aspect ratio = 2.19. Middle: Pivot by Split Size, avg aspect ratio = 1.92. Right: Split treemap, avg aspect ratio = 1.68.
Figure 5.5. Comparison of layout of 100 elements. Shading indicates order. UL: Squarified, avg aspect ratio = 1.15. UR: Strip, avg aspect ratio = 1.96. LL: Pivot by Split Size, avg aspect ratio = 2.7. LR: Split, avg aspect ratio = 1.88.
Chapter 6
Results
In this chapter, the general results from the project are presented. The chapter is divided into three parts. The first section evaluates the different treemap algorithms that were tested. In the second section, the general results from using treemaps on PDAs are discussed, specifically the prototype newsgroup browser, and finally the chapter is concluded by a remark on using Pocket Piccolo as a graphics library.
6.1
Algorithm evaluation
A total of five algorithms were implemented and evaluated. They are all described in detail in chapter 5. When evaluating the algorithms, you can either do a subjective discussion of the overall impression of the result or you can make an automatic test that gives numerical data on different properties of the algorithms.
6.1.1 Automatic Test
The properties that were tested were the average aspect ratio, stability, readability, number of thin rectangles and run time. In order to get results that were comparable to published data, the same technique as [1] describes was employed.
The surrounding bounds (screen) had the same proportion as an iPaq 4450. The test was run for different one level data sets where the number of items were 20, 100, 200, 300, 400, 600 or 800. One additional test was run with a two-level data set of size 10x10, i.e. a total of 100 items, with 10 items on the top level, and each of the top level items having 10 children.
For each experiment (data set), 100 trials were tested and for each trial the data set was updated 100 times.
Two different experiments were performed - one where the rectangle sizes were drawn from log-normal distribution created by exponentiating a normal distribution with mean 0 and variance 1. This distribution is frequent for positive natural data [10]. In the second experiment, the data was drawn from the Zipf distribution [10].