Web Data Visualization
Department of Communication PhD Student Workshop Web Mining for Communication Research
April 22-25, 2014
http://weblab.com.cityu.edu.hk/blog/project/workshops
Review
• Day 1: Data collection
– NodeXL
– Visual Web Ripper
– APIs
• Day 2: Data preprocessing
– Python
– NLTK
– SPSS
• Day 3: Data analysis
– SNA
– R
• Today: Data visualization
– Text
– Network
– Spatial
– Temporal
Outline
I. Define visualization
II. What visualization can do
III. Research questions and visualization options
IV. Four types of data and related visualization tools
V. Resources
Jonathan Zhu: Visualization is of the data, by the data, and for the data
• Data visualization differs from general graphic design in that it is of the data, by the data, and for the data.
– Of the data: an integrated phase of discovery rather than a post-analysis phase to decorate the findings
– By the data: guided primarily by data results rather than aesthetic considerations
– For the data: to tell accurate, informative, and
To Visualize is to
Writings, drawings, etc.
Present Highlight
Select
What variables have been tested in the extant literature? What is my innovation/contribution?
How do I present my study to the audience (reviewers, seminar attendees, etc.)?
What Visualization Can Do
(Tufte 2001/1983)
• Show the data
• Induce the viewer to think about the data
• Avoid distorting what the data have to say
• Present many numbers in a small space
• Make large data sets coherent
• Encourage the eye to compare different pieces of data
• Reveal the data at several levels of detail, from overview to fine structure
• Serve a clear purpose:
– Description, exploration, tabulation, or decoration
• Be closely integrated with the statistical and
Misleading Visualization
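One way to quantify how misleading a chart is comes from the same Tufte book cited above: the "lie factor", the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies it to a bar chart whose axis does not start at zero. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def relative_change(first, second):
    """Size of an effect: the change from first to second, relative to first."""
    return abs(second - first) / abs(first)


def lie_factor(first_value, second_value, axis_baseline=0.0):
    """Lie factor of a two-bar chart whose axis starts at axis_baseline.

    The drawn bar heights are (value - baseline), so a baseline above zero
    exaggerates the visible difference between the two bars.
    """
    shown = relative_change(first_value - axis_baseline,
                            second_value - axis_baseline)
    actual = relative_change(first_value, second_value)
    return shown / actual


# Hypothetical values: the data grow from 50 to 60 (a 20% increase).
honest = lie_factor(50, 60, axis_baseline=0)      # axis starts at zero
truncated = lie_factor(50, 60, axis_baseline=40)  # axis starts at 40
print(honest, truncated)
```

A lie factor near 1 means the graphic represents the data faithfully; the truncated axis in this example makes a 20% increase look like a doubling.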
“Finding the right way to view your data is as much an art as a science.”
Research Questions and Visualization Options
• See relationships among data points
– Scatterplot
– Matrix Chart
– Network Diagram
• Compare a set of values
– Bar Chart
– Block Histogram
– Bubble Chart
• Track rises and falls over time
– Line Graph
– Stack Graph
– Stack Graph for Categories
• See the parts of a whole
– Pie Chart
– Treemap
– Treemap for Comparisons
• Analyze a text
– Word Tree
– Tag Cloud
– Phrase Net
• See the world
– Map
Today we’ll focus on four types of data
• Texts and discourse analysis
• Network and hyperlink network analysis
• Spatial data
• Temporal data
TEXTS AND DISCOURSE ANALYSIS
Wordle: How Toy Ad Vocabulary Reinforces Gender Stereotypes
• Guess which one is for boys and which one is for girls.
Source: http://www.achilleseffect.com/2011/03/word-cloud-how-toy-ad-vocabulary-reinforces-gender-stereotypes/#
Demo 1: Word Cloud of Obama’s Addresses
• State of the Union Addresses 2009-2012
• Data can be downloaded from http://weblab.com.cityu.edu.hk/blog/project/workshops/
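A word cloud sizes each word by how often it occurs, so the underlying step is a word-frequency count. The following is a minimal sketch of that step using only the Python standard library. The stopword list is a tiny illustrative subset, and the sample sentence is a made-up placeholder rather than text from the addresses; in practice you would read the downloaded files instead.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "we", "our"}


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent non-stopwords as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 1)
    return counts.most_common(top_n)


# Hypothetical placeholder text, not a quotation from any address.
sample = "Jobs, energy, and education. Jobs come first; energy and jobs go together."

for word, count in word_frequencies(sample, top_n=3):
    print(word, count)
```

The resulting (word, count) pairs are what a tool such as Wordle turns into font sizes.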
Word Trends in Voyant Tools
Data: Obama’s addresses in 2009 and 2012
Word Trends of Three Premiers of China
Source: http://news.qq.com/newspedia/baogao.htm
Reform
Word Net in Voyant Tools
Data: Obama’s address in 2012
Word Nets and Framing Analysis
Qin (2014). Snowden Wins on Twitter but Fails in News: The Mismatch between Social Media Frame and Mass Media Frame
Demo 2: Word Trends and Word Nets in Obama’s Addresses
• State of the Union Addresses 2009-2012
• Data can be downloaded from http://weblab.com.cityu.edu.hk/blog/project/workshops/
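To make the two ideas in this demo concrete, here is a rough sketch of what a word trend and a word net measure: the relative frequency of a word in each document, and how often two words appear close to each other. This is not Voyant's own algorithm. The function names, the window size, and the miniature documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_trend(documents, word):
    """Relative frequency of `word` in each document (label -> share of tokens)."""
    trend = {}
    for label, text in documents.items():
        tokens = tokenize(text)
        trend[label] = tokens.count(word) / len(tokens) if tokens else 0.0
    return trend


def cooccurrence_edges(text, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    tokens = tokenize(text)
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Hypothetical miniature "documents", not real address text.
docs = {"2009": "jobs jobs energy reform", "2012": "energy energy jobs tax"}
print(word_trend(docs, "jobs"))
print(cooccurrence_edges("clean energy jobs", window=1))
```

The trend values can be drawn as a line graph across documents, and the pair counts can be loaded as a weighted edge list into a network tool.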
NETWORKS AND HYPERLINK NETWORK ANALYSIS
Topology of the World Wide Web Based on Hyperlink Analysis
Daisy Model (Donato et al., 2005)
Bowtie Model (Broder et al., 2000)
– SCC: strongly connected component
– IN: unilaterally connected to SCC
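The bowtie components can be computed directly from a list of hyperlinks. The sketch below is a small pure-Python illustration, assuming the usual reading of the model: the SCC is the largest set of pages that can all reach one another, IN pages can reach the SCC but cannot be reached from it, and OUT pages (not listed on the slide, but part of Broder et al.'s model) can be reached from the SCC but cannot reach it. The function names and the toy edge list are hypothetical, and the approach is meant for small graphs only.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Return every node reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bowtie(edges):
    """Split the nodes of a directed graph into SCC, IN, OUT, and OTHER."""
    forward = defaultdict(set)   # page -> pages it links to
    backward = defaultdict(set)  # page -> pages linking to it
    nodes = set()
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
        nodes.update((src, dst))
    if not nodes:
        return {"SCC": set(), "IN": set(), "OUT": set(), "OTHER": set()}

    # A node's strongly connected component is the set of nodes it can
    # reach that can also reach it. Keep the largest such component.
    core, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        component = reachable(node, forward) & reachable(node, backward)
        assigned |= component
        if len(component) > len(core):
            core = component

    seed = next(iter(core))
    in_set = reachable(seed, backward) - core
    out_set = reachable(seed, forward) - core
    other = nodes - core - in_set - out_set
    return {"SCC": core, "IN": in_set, "OUT": out_set, "OTHER": other}


# Hypothetical toy web: a, b, c link in a cycle; x links in; y is linked to.
toy_links = [("a", "b"), ("b", "c"), ("c", "a"),
             ("x", "a"), ("c", "y"), ("p", "q")]
parts = bowtie(toy_links)
print(parts)
```

For web-scale graphs a linear-time algorithm such as Tarjan's would replace the repeated reachability searches used here.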
My Dissertation
Highlights in My Dissertation
• The content of hyperlinks: Inter-organizational hyperlinks are shaped by various pre-existing inter-organizational relationships, especially personal ties.
• The strength of hyperlinks: Hyperlinks are symbols of strong inter-organizational ties.
• The direction of hyperlinks: More of vertical
Tools I Am Going to Introduce
• NodeXL
• Google Fusion Tables (you need a Google account to use this tool)
Book: Analyzing Social Media Networks with NodeXL
I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social Media: New Technologies of Collaboration
3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
4. Layout, Visual Design & Labeling
5. Calculating & Visualizing Network Metrics
6. Preparing Data & Filtering
7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
8. Email
9. Threaded Networks
10. Twitter
11. Facebook
12. WWW
13. Flickr
14. YouTube
15. Wiki Networks
NodeXL
1. Import data: Edge lists.
2. Click “Show Graph”.
3. Play with filters and other options.
Demo 3: Hyperlink networks among 14 higher education institutions in Hong Kong
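An edge list is simply a two-column table in which each row is one hyperlink from a source site to a target site. The sketch below shows one way to prepare such a table as CSV and to compute in-degree and out-degree for a quick sanity check before importing. The site names are hypothetical placeholders, not the 14 institutions in the demo, and the column headers "Vertex 1" and "Vertex 2" are an assumption about how the NodeXL Edges worksheet labels its columns.

```python
import csv
import io
from collections import Counter

# Hypothetical hyperlinks (source site, target site); placeholders only.
edges = [
    ("univ_a", "univ_b"),
    ("univ_a", "univ_c"),
    ("univ_b", "univ_a"),
    ("univ_c", "univ_a"),
]


def edge_list_csv(edge_pairs):
    """Serialize the edges as CSV text with a two-column header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Vertex 1", "Vertex 2"])  # assumed column names
    writer.writerows(edge_pairs)
    return buffer.getvalue()


def degree_table(edge_pairs):
    """Return {site: (in_degree, out_degree)} for every site in the edges."""
    out_degree = Counter(src for src, _ in edge_pairs)
    in_degree = Counter(dst for _, dst in edge_pairs)
    sites = sorted(set(out_degree) | set(in_degree))
    return {site: (in_degree[site], out_degree[site]) for site in sites}


print(edge_list_csv(edges))
print(degree_table(edges))
```

The CSV text can be pasted into a spreadsheet, and the degree table gives a rough idea of which sites should appear most central once the graph is drawn.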
Google Fusion Tables
• Demo 4: Hyperlink networks among 14
Whereabouts of Ph.D. Graduates
Map of Doctoral Programs in Communication in the USA
Demo 5: The Map of Young Scholars
• http://www6.cityu.edu.hk/ccr/DuoWenYaJi_Scholar_All.aspx?year=2013
• Recall what you’ve learned on day 1:
– How to collect data from the web pages?
– How to preprocess the data?
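Once the scholar records have been collected and cleaned, mapping them comes down to attaching coordinates to each place and counting records per place. The sketch below illustrates that step by building a GeoJSON FeatureCollection, a format many mapping tools accept. The records, field names, and coordinates are hypothetical and approximate; they are not taken from the web page in the demo.

```python
import json
from collections import Counter

# Hypothetical cleaned records; real ones would come from the scraped page.
scholars = [
    {"name": "Scholar A", "city": "Hong Kong"},
    {"name": "Scholar B", "city": "Hong Kong"},
    {"name": "Scholar C", "city": "Beijing"},
]

# Approximate (longitude, latitude) pairs, for illustration only.
CITY_COORDS = {
    "Hong Kong": (114.17, 22.32),
    "Beijing": (116.40, 39.90),
}


def to_geojson(records, coords):
    """Count records per city and return a GeoJSON FeatureCollection dict."""
    counts = Counter(record["city"] for record in records)
    features = []
    for city, n in sorted(counts.items()):
        if city not in coords:
            continue  # skip places we cannot locate
        lon, lat = coords[city]
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"city": city, "scholars": n},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(scholars, CITY_COORDS)
print(json.dumps(collection, indent=2))
```

The "scholars" property can then drive marker size or color in the map.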
Temporal Data
• Temporal means “of or relating to time”.
• Change: change or growth
– Population
– Distribution
– Fire Perimeter
• Dynamic: something that moves
– Planes
– Vehicles
– Animals
– Satellites
– Storms
• Discrete: something that “just happens”
– Crimes
– Lightning
– Accidents
• Stationary: stands still but records changes
– Weather Stations
– Traffic Sensors
– Air Quality Sensors
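Discrete events usually have to be binned into time intervals before they can be drawn as a line graph. The sketch below counts events per calendar month and fills in empty months with zero so the line does not skip over gaps. The function name and the event dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(event_dates):
    """Return [("YYYY-MM", count), ...] for every month from first to last event."""
    if not event_dates:
        return []
    counts = Counter((d.year, d.month) for d in event_dates)
    year, month = min(counts)
    last = max(counts)
    series = []
    while (year, month) <= last:
        # Months with no events get an explicit zero.
        series.append((f"{year:04d}-{month:02d}", counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series


# Hypothetical event dates (for example, reported incidents).
events = [date(2013, 11, 3), date(2013, 11, 20), date(2014, 1, 5)]
print(monthly_counts(events))
```

The same binning works for days or years by changing the key used in the counter.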
Air Traffic
U.S. Unemployment: A Historical View
Interactive Visualization in Google Charts
Demo 6: Google Code Playground
• https://code.google.com/apis/ajax/
Resources: Texts
• Bamboo DiRT: This wiki lists tools used by Digital Humanities researchers. This link takes you to the list of text-analysis tools that includes brief descriptions.
• ManyEyes: A collection of data visualization tools. You can upload your own data and create web-based visualizations that are made available to the public for comments and discussions. You need to create an account to upload data.
• Voyant (Voyeur): A web-based text-analysis environment that incorporates visualization tools.
• WordSmith: A desktop text-analysis program that works with Windows. The program has been tested and works with any Unicode (UTF-8) text.
• Wordij: A semantic network tool. Wordij creates networks of
Resources: Networks
• aiSee: Graph visualization
• Cytoscape: Visualizing molecular interaction networks
• Gephi: Visualization and exploration platform
• KrackPlot: Social network visualization program
• Mage: 3D vector display program (showing kinemage graphics)
• NetDraw: Program associated with UCINET
• NodeXL
• OGDF (successor of AGD): Open Graph Drawing Framework
• Otter: Tool for topology display
• SoNIA: Visualizing longitudinal network data
• Tulip: Visualization of large graphs
• uDraw(Graph) (successor of daVinci): Graph drawing
• VOSON: Web-based software that enables the collection and analysis of online network data
Resources: Spatial and Temporal Data
• Google Geomap