UNIVERSITY OF CALIFORNIA, IRVINE
GeoSocialMap: Visualization of Geosocial Data
THESIS
submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE in Networked Systems
Thesis Committee:
Athina Markopoulou, Chair
Nalini Venkatasubramanian
Chen Li
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT OF THE THESIS
1 Introduction
2 Related Work
3 System Design
3.1 Goal and Main Concepts
3.1.1 Geosocial Graph
3.1.2 Cluster
3.1.3 Connection
3.2 Functionality
3.2.1 Create a Cluster
3.2.2 Create a Connection
3.2.3 Hide/Show or Remove Clusters and Connections
3.2.4 Save/Retrieve Visualizations
3.2.5 Navigate Geographical Areas
3.3 Implementation
3.3.1 Implementation Overview
3.3.2 Create a Cluster by Attributes
3.3.3 Create a Cluster by Polygon Enclosing
3.3.4 Create a Connection
3.3.5 Hide and Remove Clusters and Connections
3.3.6 Save and Share Visualization
3.3.7 Dataset Management
4 Visualization Examples
4.1 Database
4.2.1 Universities
4.2.2 Countries
4.2.3 North American regions
5 Conclusion
LIST OF FIGURES
3.1 133 universities in the US
3.2 Public and private universities
3.3 Connections within and between public and private universities
3.4 The Cluster Panel and Connection Panel
3.5 Create a Cluster by Polygon Enclosing
3.6 Create a Cluster by Grouping
3.7 The Cluster Class, Connection Class and the hashtables
3.8 Create a Cluster by Polygon Enclosing
3.9 Test point in or out of a polygon
4.1 Set attributes filter for private universities
4.2 Create a connection between existing clusters
4.3 West Coast vs. East Coast
4.4 Intra-continental country connections
LIST OF TABLES
3.1 Node Attributes
3.2 Edge Attributes
3.3 Functional Requirements Overview
I would like to thank my advisor, Professor Athina Markopoulou, for her excellent guidance and great patience throughout this thesis project as well as my graduate studies. This thesis would not exist without her help. I have definitely learned a lot from our interactions. I would like to thank the two other members of my committee, Professor Nalini Venkatasubramanian and Professor Chen Li. Their flexibility in fitting me into their busy schedules is deeply appreciated.
I would like to thank Dr. Minas Gjoka for reviewing the thesis draft and giving many insightful comments on both structure and content.
I would also like to thank the postdoctoral scholars in our group, Dr. Maciej Kurant and Dr. Minas Gjoka, for their time and great suggestions for the project development.
ABSTRACT OF THE THESIS
GeoSocialMap: Visualization of Geosocial Data
By Yan Wang
Master of Science in Networked Systems University of California, Irvine, 2012
Athina Markopoulou, Chair
Visualization of data is an important research tool in many disciplines, as it enhances the understanding and enables the interpretation of the studied subjects. Visualization is a challenging task in its own right, and it is highly application-dependent.
In this thesis, we designed and implemented GeoSocialMap (available at http://geosocialmap.com), a web-based visualization tool that aims at visually representing social relationships embedded in geography. GeoSocialMap allows users to select geographical regions of interest, using a combination of attributes from a given data set. It then interactively shows a geographical map overlaid with highlighted spots and lines that represent these regions and their corresponding social ties. A social tie between two regions reflects the “friendship” (the interpretation of which is highly dependent on the dataset, e.g. it may indicate the frequency of interactions) among people residing in the regions. The width of the edge represents the tie strength.
I believe we need a “Digital Earth”. A multi-resolution, three-dimensional representation of the planet, into which we can embed vast quantities of geo-referenced data. Imagine, for example, a young child going to a Digital Earth exhibit at a local museum. After donning a head-mounted display, she sees Earth as it appears from space. Using a data glove, she zooms in, using higher and higher levels of resolution, to see continents, then regions, countries, cities, and finally individual houses, trees, and other natural and man-made objects. Having found an area of the planet she is interested in exploring, she takes the equivalent of a “magic carpet ride” through a 3-D visualization of the terrain. Of course, terrain is only one of the many kinds of data with which she can interact. Using the system's voice recognition capabilities, she is able to request information on land cover, distribution of plant and animal species, real-time weather, roads, political boundaries, and population.
Former Vice President Al Gore, The Digital Earth: Understanding Our Planet in the 21st Century, 1998
The term visualization was mentioned by a geographer as early as the 1950s, and was later formally defined by the National Science Foundation. Visualization became an interdisciplinary field spanning computer graphics, image processing and user interface studies. As stated in , the creation and use of geographic visualizations in the form of a cartographic map is one of the most basic means of human communication, at least as old as the invention of language and arguably as significant as the discovery of mathematics. Although geographic visualization takes many forms and the approaches used to implement it vary, it can usually be categorized into the following basic conceptual classes.
• Chart. These are statistical graphs that display data according to spatial coordinates. These have proved to be relatively convenient for generating maps and widely used in reports and papers. The generation of charts goes beyond just static images. The users are able to view, manage and compare data charts with the help of animation and interactive 3D programs, which gives them a better sense of the data and facilitates conclusions.
• Static map. These are generated images of geographic maps with an overlay of customized information and labels. These static images are suitable for light-weight geographic data, like climate, population and administrative regions. Static maps can be embellished with artistic elements to make them visually appealing, and they are usually used for public posters, commercials and magazines. However, static maps have limited capacity to show all dimensions of the data.
• Interactive map. These visualization systems are programs or web applications that have a zoomable map and a full dimensional database of geographical information. Users are allowed to select the data that they are interested in to show on the map canvas. These systems can also accept new user data or incorporate a visualization Application Programming Interface (API) that makes it possible to customize the visualization.
We build a web-based visualization system called GeoSocialMap, which can be classified as an interactive map. It can select and group nodes by their geographical information or associated attributes, draw edges of different thickness to show the strength of the relationships among the nodes, and save and share visualizations easily with a short URL.
This thesis is organized as follows. We discuss existing systems for their features in chapter 2. The functionalities and technologies used in the implementation are discussed in chapter 3. In chapter 4, we show examples of an application: we use GeoSocialMap to visualize Facebook data. Chapter 5 concludes the thesis.
Geographical databases usually contain a wide range of data, which makes it hard to interpret them or to extract useful information from them. Thus, over recent years many different visualization programs that display geographical information have been developed by researchers and engineers. Different forms of visualization allow users to see data from different viewpoints and get a better understanding of it. We considered and reviewed several existing visualization tools for geographic information and online social network data as follows.
ArcGIS is a commercial Geographic Information System (GIS) for working with maps and geographic information. This system provides extensive interfaces for geography researchers to analyze data on a map and to discover geographic information. It is supported by a complex infrastructure that makes maps and geographic information available on the web for public access. The ArcGIS suite contains several programs that run on Microsoft Windows, such as ArcReader, ArcView, ArcEdit and ArcInfo. It also includes server-based and mobile-based products.
software and data for individuals or enterprises to create visualizations of their interest. Visualizations on GeoCommons.com are web-based and interactive. Users are allowed to make customized visualizations on their own data, which they can upload. These can be made available to customize and share on other external websites.
GeoSocialApp is developed at the Pennsylvania State University GeoVista Center. This application tool helps users interpret social network data by visualizing over geography, attribute spaces and network. GeoSocialApp is able to visualize social network graphs in a way that helps to discover hidden complex patterns in large social network datasets.
Vizster is a visualization system designed and implemented at the University of California, Berkeley for end users to explore and play with their online social network connections. Vizster generates abstract graphs with nodes and links that are not tied to geographical elements or any node attribute spaces. TouchGraph provides visualization services similar to Vizster.
To the best of our knowledge, GeoSocialMap, developed in this thesis, provides unique features that are not available in any of the tools we studied above. ArcGIS provides complete functionality for different forms of visualization; however, it is a complex commercial software package with a high cost and a steep learning curve, even for generating simple visualizations. GeoCommons, Google Geomap and GeoSocialApp are light-weight tools, but they focus on showing geographical nodes and regions, and lack functionality to visualize the edges between nodes, which can represent necessary correlation information among them. Although Vizster and TouchGraph display edges (e.g. between users of a social network), their visualization is not embedded in a geographical space. We aim at building GeoSocialMap as a light-weight online application that supports visualizing data of nodes and edges on top of geographic maps.
Goal and Main Concepts
Given a geosocial graph as input, GeoSocialMap visualization is built on top of two fundamental concepts, Cluster and Connection, which work interactively with each other. In this section we define the term geosocial graph and describe these two fundamental concepts.
The input dataset to the visualization contains two data tables, that include the node and edge attributes, and we call it the geosocial graph. A node contains a geographical location together with customized attributes as listed in Table 3.1. For instance, a university can be a node. It is required to have geographical coordinates and may have other attributes such as name, population, ranking, etc.
Table 3.1: Node Attributes
Attribute          Datatype     Value Range
Node ID            Integer      Positive
Latitude           Real         [-90, 90]
Longitude          Real         [-180, 180]
Node Attribute1    Text
Node Attribute2    Integer
...
Node AttributeN    Enumerate

Table 3.2: Edge Attributes
Attribute          Datatype     Value Range
Edge ID            Integer      Positive
Node1 ID           Integer      Positive
Node2 ID           Integer      Positive
Weight             Real         Positive

An edge connects two nodes and carries a weight (see Table 3.2). For example, an edge between two universities can represent how well the students in one university know those at the other university.
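The two tables above can be mirrored in a small in-memory model. The following is a minimal sketch, assuming Python dataclasses; the names Node, Edge and GeosocialGraph are hypothetical and are not taken from the GeoSocialMap code base. It only enforces the value ranges listed in Tables 3.1 and 3.2.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """One row of the node table (Table 3.1). Names are illustrative."""
    node_id: int
    latitude: float
    longitude: float
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.node_id <= 0:
            raise ValueError("Node ID must be positive")
        if not -90 <= self.latitude <= 90:
            raise ValueError("Latitude must be in [-90, 90]")
        if not -180 <= self.longitude <= 180:
            raise ValueError("Longitude must be in [-180, 180]")


@dataclass
class Edge:
    """One row of the edge table (Table 3.2)."""
    edge_id: int
    node1_id: int
    node2_id: int
    weight: float

    def __post_init__(self) -> None:
        if min(self.edge_id, self.node1_id, self.node2_id) <= 0:
            raise ValueError("IDs must be positive")
        if self.weight <= 0:
            raise ValueError("Weight must be positive")


@dataclass
class GeosocialGraph:
    """Nodes keyed by ID plus a flat list of weighted edges."""
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Both endpoints must already exist in the node table.
        if edge.node1_id not in self.nodes or edge.node2_id not in self.nodes:
            raise KeyError("Edge refers to an unknown node")
        self.edges.append(edge)
```

A university, for instance, would be a Node whose attributes dictionary holds its name, type and calendar, and an edge between two universities would carry the friendship strength as its weight.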
In geographical visualizations it is not uncommon to have a large number of geographical elements (nodes and edges) in a single map. Since the simultaneous display of all such elements is impractical, we introduce the concepts of Cluster and Connection.
Figure 3.1: 133 universities in the United States
A cluster contains a set of nodes that are defined in the geosocial graph. The nodes in a cluster have one or more attributes in common. Essentially, the purpose of a cluster is to selectively visualize information.
To demonstrate the concept of a cluster, we start with Figure 3.1, which shows 133 universities in the United States. If public universities are of interest to us, we can form a cluster that contains only public universities and assign a green color to each node in the cluster (see Figure 3.2). Similarly, we can form a cluster of private universities and represent them with a blue color (see Figure 3.2).
We can also create a cluster using a combination of node attributes. For example, we can create a cluster of universities that are public and have a quarter calendar, and another one of private universities that have a semester calendar.
Figure 3.2: Public universities (green spots), private universities (blue spots)
A connection represents a set of the strongest edges between the nodes of two clusters. The edges between the nodes in the same cluster can be expressed by a connection that contains the same cluster twice. Edge weight in the input geosocial graph is visualized using different edge thickness. When a connection is added or removed, a legend of edge weights is updated. For instance, in the aforementioned university example, we can create a connection between the two clusters created in Figure 3.2 to examine the strength of relationships between the students in private and public universities.
We create three connections, as shown in Figure 3.3. The edges in red are between private and public universities, the edges in green are within public universities and the edges in blue are within private universities. All edges are drawn to scale in order to be comparable. By assigning a different color to each connection, we can gain a better intuition of what the data indicates. In this case, we observe that students in the same type of university have stronger connections than those in different types. The connection component is also useful to visualize any measurable information among the geographical nodes contained in our defined clusters, e.g., an individual's social network, traffic conditions or an airline network, if the appropriate data is available.
Figure 3.3: Connections within and between public (green spots) and private (blue spots) universities
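The definition of a connection as the strongest edges between two clusters can be sketched in a few lines. This is an illustration only, assuming edges are given as (node1, node2, weight) tuples and clusters as sets of node IDs; the function name strongest_edges and the parameter y are hypothetical, and the thesis does not state how the deployed system performs this selection.

```python
import heapq
from typing import Iterable, List, Set, Tuple

EdgeTuple = Tuple[int, int, float]  # (node1_id, node2_id, weight)


def strongest_edges(edges: Iterable[EdgeTuple],
                    cluster_a: Set[int],
                    cluster_b: Set[int],
                    y: int) -> List[EdgeTuple]:
    """Return the y heaviest edges with one endpoint in each cluster.

    Passing the same set twice yields the edges inside a single cluster.
    """
    def links(u: int, v: int) -> bool:
        # Edges are undirected, so test both orientations.
        return (u in cluster_a and v in cluster_b) or \
               (u in cluster_b and v in cluster_a)

    candidates = [e for e in edges if links(e[0], e[1])]
    return heapq.nlargest(y, candidates, key=lambda e: e[2])
```

Calling the function with the public and private clusters gives the inter-cluster connection, while calling it with the public cluster twice gives the connection within public universities.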
There are 5 main functionalities that GeoSocialMap provides to users and they are listed in Table 3.3. In this section we discuss the functional requirements that build on the concepts introduced in Section 3.1.
Create a Cluster
The user should be able to create a cluster either by selecting a set of desired node attributes or by drawing a polygon that encloses an area. To create a new cluster by attributes, the user first fills in a form with the preferred attributes, and also sets a name, description and color for the new cluster. GeoSocialMap then outputs the created cluster by showing the nodes on the map as spots with the assigned color, name and annotation. The created clusters are listed in the cluster panel for further hide-and-remove operations, as shown in Figure 3.4(a).
A cluster can also be created by drawing a polygon on the map. The user should be able to pick any vertices on the map and form a non-self-intersecting polygon. The resulting cluster should contain only nodes that are inside this polygon. Figure 3.5 illustrates this functionality in GeoSocialMap.

Table 3.3: Functional Requirements Overview

Functionality: Create a Cluster (Section 3.2.1)
Input (by attributes): A set of node attribute value ranges.
Input (by enclosing): Polygon vertex coordinates.
Output: Nodes displayed on the map canvas.

Functionality: Create a Connection (Section 3.2.2)
Input: Two existing clusters.
Output: Edges displayed on the map canvas.

Functionality: Hide/Show or Remove (Section 3.2.3)
Input: An existing cluster or connection.
Output: Nodes/edges disappear/show up.

Functionality: Save Visualizations (Section 3.2.4)
Input: Information of current visualization.
Output: A permanent link URL.

Functionality: Retrieve Visualizations (Section 3.2.4)
Input: The URL obtained by saving a visualization.
Output: The visualization saved with the URL.

Functionality: Navigate Geographical Areas (Section 3.2.5)
Input: Mouse clicking and dragging events on map.
Output: Changes of the map position and zoom level.

(a) The Cluster Panel (b) The Connection Panel
Figure 3.4: The Cluster Panel and Connection Panel
Create a Connection
A user should be able to create a connection using the connection panel (see Figure 3.4(b)). Exactly two of the existing clusters need to be selected, according to the concept of the connection defined in Section 3.1.3. The name of the new connection is automatically generated by concatenating the names of the two selected clusters. A user should also be able to set a numeric value y that limits this connection to contain only the y strongest edges out of all possible edges. The system then displays the edges of the new connection and lists the connection in the connection panel (Figure 3.4(b)) for further hide-and-remove operations.
Figure 3.5: Create a Cluster by Polygon Enclosing
When creating new connections, we recompute and regenerate the legend according to the edge weights of the currently available edges of all connections.
Hide/Show or Remove Clusters and Connections
As mentioned in Section 3.2.1 and Section 3.2.2, created clusters and connections can be hidden, shown or removed. A user can hide/show a cluster or connection by clicking the blue label “h”/“s” next to its name (Figure 3.4(a)). A user can also permanently remove a cluster or connection by clicking the red label “x” next to its name.
Save/Retrieve Visualizations
GeoSocialMap provides an interface that enables users to save the current visualization online and obtain a short URL. Users can click on “Get Link” and type a description for the visualization, which will be displayed when this visualization is retrieved. The system takes the description and the current visualization information as input and outputs a URL. This URL is a permanent link to retrieve the visualization, and it can be easily shared publicly. The retrieved visualization can be modified and saved under another URL.
Navigate Geographical Areas
GeoSocialMap is built on top of an embedded Google Map canvas. Users are able to use the interface inherited from Google Map to navigate the geographical area. The map can be dragged, moved, zoomed in and zoomed out. Such features enhance the user experience, especially when the map canvas is too crowded with nodes and edges. Users can move the canvas to any position and zoom to any level they like.
Table 3.4 lists the main functionalities of GeoSocialMap and the corresponding front-end and back-end actions that are triggered by the user. Next, we describe the implementation details and algorithms of these functionalities.
Create a Cluster by Attributes
The Cluster class, as shown in Figure 3.7, has the following class members: cluster ID, cluster name, appearance attributes, collection of node IDs contained in the cluster, and a cluster filter. The cluster filter contains a set of Boolean conditions that specify node inclusion criteria in the node attribute space. The collection of node IDs in the cluster only includes
Table 3.4: Functionalities and Front-end/Back-end Implementation
Functionality                                        Front-end actions                                 Back-end actions
Create cluster by node attributes                    Ajax request                                      Database query
Create cluster by enclosing nodes in polygon         Google Map API drawing, computational geometry    Ajax response, database query
Create connections by selecting existing clusters    Ajax request                                      Ajax response, database query
Save current visualization and obtain a short URL    Encoding data into URL text, Ajax request         Ajax response, database update
Retrieve saved visualization by obtained URL         Decoding URL to data, Ajax request                Database query, Ajax response
Upload data of geographic nodes and edges            HTML form submission                              Database operation
Figure 3.6: Create a Cluster by Grouping
Figure 3.7: The Cluster Class, Connection Class and the hashtables
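The class members listed for the Cluster class can be sketched as follows. This is a hedged illustration in Python rather than the front-end scripting language of the actual system: the field names, the representation of filter conditions as predicates over an attribute dictionary, and the apply_filter method are all assumptions. In GeoSocialMap the filter is evaluated through a database query on the back end (Table 3.4), not in memory as done here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
Condition = Callable[[Attributes], bool]  # one Boolean inclusion criterion


@dataclass
class Cluster:
    """Illustrative counterpart of the Cluster class in Figure 3.7."""
    cluster_id: int
    name: str
    color: str                       # appearance attribute
    conditions: List[Condition] = field(default_factory=list)  # cluster filter
    node_ids: List[int] = field(default_factory=list)
    hidden: bool = False

    def apply_filter(self, nodes: Dict[int, Attributes]) -> None:
        """Keep the IDs of nodes that satisfy every condition of the filter."""
        self.node_ids = [
            node_id for node_id, attrs in nodes.items()
            if all(cond(attrs) for cond in self.conditions)
        ]
```

With this shape, the cluster of public universities on a quarter calendar from Section 3.1.2 would carry two conditions, one per attribute, and a node is included only if both hold.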
Create a Cluster by Polygon Enclosing
We use the following method to determine whether a node is inside or outside the polygon. As in Figure 3.9, we draw a ray in an arbitrary direction from the test point to infinity. If the test point is inside the polygon, the ray crosses the polygon border an odd number of times before reaching infinity; otherwise, it crosses the border an even number of times.
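The crossing-count test described above can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes a fixed horizontal ray pointing in the positive x direction instead of an arbitrary one, and it treats coordinates as planar (x, y) pairs, which ignores the curvature of the Earth and polygons that cross the antimeridian. The function name point_in_polygon is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Even-odd rule: count border crossings of a ray cast towards +x."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # The edge is relevant only if it straddles the horizontal line
        # through the test point (this also skips horizontal edges).
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips the parity
    return inside
```

Flipping a Boolean on every crossing is equivalent to counting the crossings and checking whether the total is odd. Points that lie exactly on the border may be classified either way by this sketch.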
After determining the nodes to be included, the rest of the cluster creation process is similar to the creation of a cluster by attributes (also described in Figure 3.8). The clusters created by these two different methods are instances of the same cluster class.
Figure 3.8: Create a Cluster by Polygon Enclosing
Create a Connection
We render the edges as colored geodesic lines on the map. When creating new connections, we recompute and regenerate the legend according to the edge weights of the currently available edges of all connections.
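One way to keep edge thickness comparable across connections is to map weights to line widths over the range of all currently displayed edges and to rebuild the legend from the same range. The sketch below assumes a linear mapping and hypothetical pixel bounds; the thesis does not specify the actual scaling function or legend layout, so edge_widths, legend_entries and their parameters are illustrative only.

```python
from typing import List, Sequence, Tuple


def edge_widths(weights: Sequence[float],
                min_px: float = 1.0,
                max_px: float = 8.0) -> List[float]:
    """Linearly map weights to line widths between min_px and max_px."""
    if not weights:
        return []
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All edges are equally strong; draw them with the same width.
        return [max_px for _ in weights]
    scale = (max_px - min_px) / (hi - lo)
    return [min_px + (w - lo) * scale for w in weights]


def legend_entries(weights: Sequence[float],
                   steps: int = 3,
                   min_px: float = 1.0,
                   max_px: float = 8.0) -> List[Tuple[float, float]]:
    """Return (weight, width) samples spread evenly over the weight range.

    Call this with the weights of all displayed edges whenever a
    connection is added or removed. Requires steps >= 2.
    """
    if not weights:
        return []
    lo, hi = min(weights), max(weights)
    ticks = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return list(zip(ticks, edge_widths(ticks, min_px, max_px)))
```

Because both functions use the minimum and maximum over every visible edge, adding a connection with heavier edges rescales the widths of the edges already on the map, which is why the legend has to be regenerated.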
Hide and Remove Clusters and Connections
Users are allowed to hide and remove the created clusters and connections. When a cluster or a connection is set to be hidden, scripts set a hidden flag on that instance of the component and erase the object from the map canvas. The same component can be shown again efficiently since the information is still available and there is no need to request it from the server again.
When a cluster is set to be removed, scripts do not remove the associated node instances from the hashtable because clusters can overlap. Nodes are stored only once in the hashtable but may be referenced by multiple clusters.
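The interplay between the hidden flag and the shared node hashtable can be sketched as follows. This is an assumption-laden illustration: the class name Canvas, its methods and the dictionary layout are hypothetical, and the real front end additionally erases and redraws map overlays, which is omitted here.

```python
from typing import Any, Dict, Set


class Canvas:
    """Tracks clusters that share a single node hashtable."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Dict[str, Any]] = {}     # each node stored once
        self.clusters: Dict[int, Dict[str, Any]] = {}  # cluster_id -> state

    def add_cluster(self, cluster_id: int,
                    nodes: Dict[int, Dict[str, Any]]) -> None:
        self.nodes.update(nodes)  # overlapping nodes are not duplicated
        self.clusters[cluster_id] = {"node_ids": set(nodes), "hidden": False}

    def hide(self, cluster_id: int) -> None:
        # Only a flag changes; the data stays on the client.
        self.clusters[cluster_id]["hidden"] = True

    def show(self, cluster_id: int) -> None:
        # No server round trip is needed to show the cluster again.
        self.clusters[cluster_id]["hidden"] = False

    def remove(self, cluster_id: int) -> None:
        # Nodes are deliberately left in the hashtable because another
        # cluster may still refer to them.
        del self.clusters[cluster_id]

    def visible_node_ids(self) -> Set[int]:
        visible: Set[int] = set()
        for state in self.clusters.values():
            if not state["hidden"]:
                visible |= state["node_ids"]
        return visible
```

Removing a cluster therefore never invalidates another cluster that contains some of the same nodes.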
Save and Share Visualization
GeoSocialMap provides an interface that allows users to save the current visualization online and obtain a short URL. The URL is a permanent link to retrieve the visualization and can be easily shared publicly. The retrieved visualization can be modified and saved under another URL.
When the GeoSocialMap is launched with the URL, front-end scripts retrieve the compressed text from the server with the MD5 value in the URL. Then cluster and connection information can be recovered and rendered by decoding the retrieved text.
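The save and retrieve round trip can be illustrated with a small sketch. Only the use of compressed text and an MD5 value in the URL comes from the text above; everything else here is an assumption made for illustration, including JSON as the serialization format, zlib for compression, URL-safe Base64 for the text form, hashing the compressed text itself, and an in-memory dictionary standing in for the server-side database.

```python
import base64
import hashlib
import json
import zlib
from typing import Any, Dict, Tuple


def encode_visualization(state: Dict[str, Any]) -> Tuple[str, str]:
    """Serialize cluster/connection state; return (md5_key, compressed_text)."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    text = base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")
    key = hashlib.md5(text.encode("ascii")).hexdigest()
    return key, text


def decode_visualization(text: str) -> Dict[str, Any]:
    """Invert encode_visualization and recover the saved state."""
    raw = zlib.decompress(base64.urlsafe_b64decode(text.encode("ascii")))
    return json.loads(raw.decode("utf-8"))


# Stand-in for the database table that maps MD5 values to compressed text.
store: Dict[str, str] = {}


def save(state: Dict[str, Any]) -> str:
    key, text = encode_visualization(state)
    store[key] = text
    return key  # the value that would be embedded in the short URL


def retrieve(key: str) -> Dict[str, Any]:
    return decode_visualization(store[key])
```

Keying the stored text by a digest of its content means that saving an identical visualization twice yields the same link, while a modified visualization gets a new one.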
Dataset Management
The data management page is an interface through which users can upload their customized dataset to the system, as long as the data format is correct. GeoSocialMap currently accepts CSV files. After uploading the file, the user is asked to assign attribute names and types to the columns. The annotated data is then inserted into the database and becomes available for visualization.
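The step of assigning names and types to uploaded columns can be sketched as follows. This is an illustration under stated assumptions: the CSV file is taken to have no header row, the type names follow the datatypes of Table 3.1, and the function parse_dataset and the CASTS mapping are hypothetical. The database insertion performed by the real system is not shown.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Tuple

# Datatype names as in Table 3.1; enumerated values are kept as text here.
CASTS: Dict[str, Callable[[str], Any]] = {
    "integer": int,
    "real": float,
    "text": str,
    "enumerate": str,
}


def parse_dataset(csv_text: str,
                  columns: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Apply user-assigned (name, type) pairs to every CSV row.

    columns lists the assignments in the order of the CSV columns.
    """
    rows: List[Dict[str, Any]] = []
    for line_no, record in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(record) != len(columns):
            raise ValueError(f"line {line_no}: expected {len(columns)} columns")
        rows.append({
            name: CASTS[kind](value)
            for (name, kind), value in zip(columns, record)
        })
    return rows
```

A value that does not match its assigned type, such as text in a column declared as real, raises a ValueError from the cast, which is one simple way to reject a file whose format is incorrect.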
Last, we have an interface for system administration, which is secured so that only authenticated users can access it. Administrators are allowed to access every dataset and perform insert, modify and delete operations, similar to the well-known phpMyAdmin application. This page also shows traffic statistics, user tracking and database size information.
In this chapter we demonstrate GeoSocialMap with three types of geosocial graphs: college-to-college, country-to-country, and North American regions. We use the obtained visualization to gain insight into the friendship relations among the types of nodes in each geosocial graph.
The data collected in  contains the geographical region information at various granularities, depending on Facebook’s penetration in that region. We process the data sets to obtain a location-to-location friendship graph. A node in the data set indicates a category according to some criterion. An edge between two nodes indicates the strength of the “friendship” between the two nodes. For instance, a node is a specific country if the categories correspond to countries, and the weight of an edge between countries A and B is the probability that two randomly chosen users, one from A and one from B, are friends.
Figure 4.1: Set attributes filter for private universities
Visualization with GeoSocialMap
Public vs. Private
We first launch GeoSocialMap with the college-to-college geosocial graph at http://geosocialmap.com/college2010.php, and we see an empty map canvas. The data set information on the top left panel shows that we have 133 nodes with 19 node attributes and 17372 edges (see Figure 4.1).
To create a cluster of private universities, we click on “Add Cluster” to open the “New Cluster” form that exposes all available node attributes, as shown in Figure 4.1. We input the name “Private Uni”, choose the blue color on the top row, then select “Private” in the “type” attribute and submit the form. Immediately, 65 private universities show up as blue spots on the map. We repeat the process, this time to create a cluster of “public” universities (shown in green in Figure 3.2).
Figure 4.2: Create a connection between existing clusters
To create a connection, we need to select two of the existing clusters, the number of top edges between them, and an edge color, as shown in Figure 4.2. The set of those top edges will then be painted on the map. If we repeat this process for the cluster pairs “Private”-“Private”, “Private”-“Public”, and “Public”-“Public”, we get the visualization shown in Figure 3.3. The visualization indicates that students of public colleges find their friends locally. In contrast, students of private universities do not seem to be affected as much by distance. Money might be an important factor here.
West coast vs. East coast
Consider a scenario in which we want to group universities by their location on the West or East coast. Since we do not have such a node attribute available, we can use the polygon enclosing method. We click on “Draw Cluster” inside the panel “Clusters”, pick points on the map that roughly enclose the West coast and then click on “Finish”. The universities inside the polygon show up (Figure 4.3(a)) and the West coast cluster is created. We can similarly create an East coast cluster of universities by drawing a polygon that follows the East coast state border lines (Figure 4.3(b)). Then, we obtain the visualization shown in Figure 4.3(c).
(a) Enclosing West Coast universities (b) Enclosing East Coast universities
(c) West Coast vs. East Coast
Figure 4.3: West Coast vs. East Coast.
Figure 4.4: Intra-continental country connections. Available at http://geosocialmap.com/countries.php?tinyurl=continental
Countries
Based on the data set described in Section 4.1, we create the intra-continental country connections shown in Figure 4.4. We notice that there are strong cliques formed among Middle Eastern countries and among South-East Asian countries, and that Facebook is not popular in China.
North American regions
For the USA and Canada, the 2009 data set contains the geographical information at the granularity of 272 counties and provinces. This allows us to create the North American friendship map.
As presented in Figure 4.5, friendships within North America are strongly driven by geographical proximity. If we zoom in, we can discover more local connections, which are not clearly visible at the continental level.
Figure 4.5: Intra-continental country connections. Available at http://geosocialmap.com/america.php?tinyurl=northamerica
In this thesis, we have designed and implemented GeoSocialMap, a geographical visualization system that aims at visually representing social relationships embedded in geography. We discussed the main concepts, functionalities and implementation in Chapter 3. Users are able to customize their visualization by creating clusters and connections and by manipulating their display properties. Moreover, visualizations can be saved and shared online. We utilized web programming technologies to implement the aforementioned functionalities of GeoSocialMap. We embedded Google Map as the map canvas, which provides interfaces to navigate the visualizations to any position and zoom level. GeoSocialMap is available online at http://geosocialmap.com.
In Chapter 4, we presented step-by-step visualization examples as a tutorial, using GeoSocialMap to visualize friendship between various categories of Facebook data. In particular, we created visualizations of online social network data sampled from Facebook to gain insights into the factors that affect friendship relations.
ArcGIS Online. http://www.arcgis.com/home/, February 2012.
GeoCommons. http://geocommons.com/, February 2012.
Graph visualization and social network analysis software - Navigator - touchgraph.com. http://www.touchgraph.com/navigator, February 2012.
Visualization: Geomap, Google Chart Tools, Google Code. http://code.google.com/intl/zh-CN/apis/chart/interactive/docs/gallery/geomap.html, February 2012.
M. Dodge, M. McDerby, and M. Turner. Geographic Visualization: Concepts, Tools and Applications. John Wiley & Sons, 2011.
M. Gahegan, M. Takatsuka, M. Wheeler, and F. Hardisty. Introducing GeoVISTA Studio: an integrated suite of visualization and computational methods for exploration and knowledge construction in geography. Computers, Environment and Urban Systems, 26(4):267–292, 2002.
J. J. Garrett. Ajax: A new approach to web applications. http://adaptivepath.com/ideas/essays/archives/000385.php, February 2005. [Online; accessed 18.03.2008].
M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In Proceedings of IEEE INFOCOM ’10, San Diego, CA, March 2010.
A. Gore. The digital earth: Understanding our planet in the 21st century. In Open GIS Consortium, January 1998.
J. Heer and danah boyd. Vizster: Visualizing online social networks. In IEEE Information Visualization (InfoVis), pages 32–39, 2005.
M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, and A. Markopoulou. Coarse-Grained Topology Estimation via Graph Sampling. In arXiv:cs.SI:1105.5488, May 2011.
B. McCormick, T. DeFanti, and M. B. (Eds.). Visualization in scientific computing. In Computer Graphics, 21(6), p. 63, August 1987.
D. Sklar and A. Trachtenberg. PHP Cookbook (Cookbooks (O’Reilly)). O’Reilly Media, Inc., 2006.