Introduction to social network analysis
Paola Tubaro
University of Greenwich, London
Introducing SNA
Rise of online social networking services ⇒ social networks brought to the fore.
Renewed interest in social network analysis (SNA).
Yet networks have always existed!
Likewise, SNA already has a long history.
Today
Understand what SNA is.
Understand how you could use it.
Learn basic principles and measures.
Outline
1 Introduction
2 What is SNA
3 Data
4 Network metrics
5 Further readings
What can SNA be used for?
Improvements in organisational performance.
Policy interventions for behaviour change.
Formal chart vs. network
With whom do you discuss issues important to your work?
Interventions
Networks for behaviour change: smoking prevention
Network of friendships among sixth grade pupils.
Use popular pupils (“opinion leaders”) to reduce smoking in adolescents:
Identify the most popular pupils in the class;
Recruit and train them;
Use them to spread the message.
Valente et al. (2003): the network method was effective in reducing adolescent smoking.
Defining SNA
An approach to human behaviours and social interactions.
A set of specific analytical and statistical methods.
A special type of data (and techniques of data collection).
A set of visualisation tools.
What is a network: a formal definition
A network = a set of units (nodes) connected by one or more relations (ties).
What is a node?
⇒ Depends on the setting: person, group/organisation, object.
What is a tie?
⇒ A relation or a shared trait: friendship, advice, exchange, co-work.
Graphs and networks
Circles (A,B) represent nodes.
Lines (e.g. between A and B) represent ties/edges.
A graph visualises the whole structure of ties within a defined group.
Graphical conventions (colour, size of nodes and/or ties) can be added to show attributes.
For example, in a friendship network: blue = boys, red = girls.
Isolates, dyads and triads
[Figure: isolates, dyads and triads among nodes a to f]
A new perspective
SNA requires a change of mindset with respect to other social science approaches. The emphasis is on relationships, not attributes.
Not just dyadic relationships (just A and B), but dyadic relationships as embedded in a wider structure of ties.
Embedded relationships
Figure: Suppose the relationship represented here is friendship. How may friendship between A and B vary in these configurations?
Triads
[Figure: three triads among nodes a, b and c: intransitive, transitive, 3-cycle]
Intransitive: only bilateral ties.
Transitive: a friend of my friend is my friend.
Three-cycles: a form of generalized exchange.
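The triad types above can be counted mechanically. A minimal sketch, assuming directed ties stored as a set of (sender, receiver) pairs; the function name `count_triads` is hypothetical. It counts transitive triples (i→j, j→k and i→k) and 3-cycles (i→j, j→k, k→i):

```python
from itertools import permutations


def count_triads(ties):
    """Count transitive triples and 3-cycles in a directed network.

    ties: set of (sender, receiver) pairs.
    """
    nodes = {node for tie in ties for node in tie}
    transitive = 0
    cyclic = 0
    # Examine every ordered triple of distinct nodes (i, j, k).
    for i, j, k in permutations(nodes, 3):
        if (i, j) in ties and (j, k) in ties:
            if (i, k) in ties:
                transitive += 1  # a friend of my friend is my friend
            if (k, i) in ties:
                cyclic += 1      # i -> j -> k -> i
    # Each 3-cycle is found once per starting node, i.e. three times.
    return transitive, cyclic // 3


print(count_triads({("a", "b"), ("b", "c"), ("a", "c")}))  # (1, 0)
print(count_triads({("a", "b"), ("b", "c"), ("c", "a")}))  # (0, 1)
```

With reciprocated ties, one triad can contain several transitive triples, so the first count is of ordered triples rather than of distinct triads.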
Network effects, more globally
[Figure: network among nodes a to e]
For example, those who attract many choices will attract even more in future (reputation effect, “Matthew” effect).
Does a high (and growing) number of friends have advantages or disadvantages?
Now you know:
What a network is;
The correspondence between a network and a graph;
The difference between dyadic and triadic structures;
Global effects of network structure.
Network data
Data format:
what network data look like;
how they differ from other social science data;
from data to graph.
Data collection:
name generators/interpreters;
archives.
Data type I: Ego networks
The whole set of contacts (alters) of one person or entity (ego). Usually includes attributes of alters and ties among them.
Usually collected for a sample of egos (e.g. in a survey).
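An ego network can also be cut out of whole-network data. A minimal sketch, assuming undirected ties given as an edge list (here the friendship network used in the data-storage slides); `ego_network` is a hypothetical helper name:

```python
# The friendship network from the data-storage slides, as an edge list.
FRIENDSHIP = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
              ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]


def ego_network(ties, ego):
    """Return ego's alters and the ties among those alters (undirected ties)."""
    # A frozenset makes (a, b) and (b, a) the same undirected tie.
    edges = {frozenset(tie) for tie in ties}
    alters = {node for edge in edges if ego in edge for node in edge if node != ego}
    # Keep only ties whose two ends are both alters of ego.
    alter_ties = {edge for edge in edges if edge <= alters}
    return alters, alter_ties


alters, alter_ties = ego_network(FRIENDSHIP, "Alan")
print(sorted(alters))   # ['Bob', 'Sue', 'Tom']
print(len(alter_ties))  # 0: none of Alan's friends are friends with each other
```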
Typically, graphically represented with
Example: Ego networks to discover “hidden” populations
Data type II: Whole networks
Mapping the whole set of ties of a particular group, setting or population.
Not focused on one particular person or entity.
Network boundaries must be well-defined.
Examples: network of friends in a classroom; network of knowledge-sharing between employees of an organisation.
Data storage: traditional social science
Social science data are usually represented as a rectangular table, where each row is an observation and each column a variable. For example:
name   age  gender  married
Jane   25   0       0
Mary   31   0       0
Bob    29   1       1
Sue    28   0       1
Alan   32   1       0
Tom    29   1       1
Network data storage I: matrix
Network data can be stored as an n-by-n square matrix with all nodes listed in both rows and columns.
The value of cell (i, j) indicates whether node i and node j are connected (1) or not (0).
The diagonal (a node's tie with itself) is meaningless.
For example, for a friendship network:
       Jane  Mary  Bob  Sue  Alan  Tom
Jane   -     1     1    0    0     0
Mary   1     -     0    1    0     0
Bob    1     0     -    0    1     0
Sue    0     1     0    -    1     0
Alan   0     0     1    1    -     1
Tom    0     0     0    0    1     -
Data storage II: Edge list
The edge list stores each pair of connected nodes in a single row of a table.
For example, for the same friendship network:
ego    alter
Jane   Mary
Jane   Bob
Mary   Sue
Bob    Alan
Alan   Tom
Alan   Sue
Which format to choose
Most network analysis packages support both formats. Some provide conversion facilities (e.g. UCINET: edge list to matrix).
It is usually possible to combine network data (in matrix or edge list format) and attributes.
A rectangular table is usually needed for attribute data, as in traditional social science.
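Conversion between the two formats is simple. A minimal sketch of edge list to matrix conversion (the function name `edge_list_to_matrix` is hypothetical, not a UCINET command), using the friendship example and plain lists of lists for the matrix:

```python
def edge_list_to_matrix(edges, nodes, directed=False):
    """Convert an edge list into a binary adjacency matrix (list of lists)."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for ego, alter in edges:
        matrix[index[ego]][index[alter]] = 1
        if not directed:
            # An undirected tie is recorded in both cells (i, j) and (j, i).
            matrix[index[alter]][index[ego]] = 1
    return matrix


nodes = ["Jane", "Mary", "Bob", "Sue", "Alan", "Tom"]
edges = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
         ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]

for name, row in zip(nodes, edge_list_to_matrix(edges, nodes)):
    print(f"{name:<5}", *row)
```

The diagonal is simply left at 0 here.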
Some general rules
A matrix is visually appealing when the node set is small, but difficult to handle when it is large (because all possible pairs must be explicitly included).
With large node sets, an edge list is more convenient (because only existing ties need to be listed).
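The difference in size can be made concrete. A minimal sketch counting the entries each format must hold; the large network below is a hypothetical illustration, not data from the slides:

```python
def storage_size(n_nodes, n_ties):
    """Entries needed by each format for a network with n_nodes and n_ties."""
    matrix_cells = n_nodes ** 2  # one cell for every ordered pair of nodes
    edge_list_rows = n_ties      # one row per existing tie only
    return matrix_cells, edge_list_rows


# Friendship example: 6 nodes, 6 ties.
print(storage_size(6, 6))              # (36, 6)
# Hypothetical large network: 10,000 people with about 150 contacts each,
# i.e. 10,000 * 150 / 2 = 750,000 undirected ties.
print(storage_size(10_000, 750_000))   # (100000000, 750000)
```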
Tie data I
Directed ties:
A tie goes from one node to another, but not necessarily back.
E.g. advice-giving, money-lending.
Usual graphical representation: arrow.
When directed ties do go in both directions, they are reciprocal ties.
Usual graphical representation: double arrow.
Tie data II
Undirected ties:
Ties are mutual by definition. E.g. siblings, co-workers.
Usual graphical representation: line.
Undirected ties: matrix is symmetric
       Jane  Mary  Bob  Sue  Alan  Tom
Jane   -     1     1    0    0     0
Mary   1     -     0    1    0     0
Bob    1     0     -    0    1     0
Sue    0     1     0    -    1     0
Alan   0     0     1    1    -     1
Tom    0     0     0    0    1     -
Directed ties: matrix is NOT symmetric
       Jane  Mary  Bob  Sue  Alan  Tom
Jane   -     1     1    0    0     0
Mary   0     -     0    1    0     0
Bob    0     0     -    0    1     0
Sue    0     0     0    -    0     0
Alan   0     0     0    1    -     1
Tom    0     0     0    0    0     -
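Symmetry is easy to test, and a directed matrix can be turned into an undirected one. A minimal sketch using the directed friendship matrix above (rows = ego, columns = alter, order Jane, Mary, Bob, Sue, Alan, Tom, diagonal set to 0); the helper names are hypothetical:

```python
def is_symmetric(matrix):
    """True if cell (i, j) equals cell (j, i) for every pair of nodes."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(i + 1, n))


def symmetrize(matrix):
    """Treat a tie in either direction as an undirected tie."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


directed = [
    [0, 1, 1, 0, 0, 0],  # Jane
    [0, 0, 0, 1, 0, 0],  # Mary
    [0, 0, 0, 0, 1, 0],  # Bob
    [0, 0, 0, 0, 0, 0],  # Sue
    [0, 0, 0, 1, 0, 1],  # Alan
    [0, 0, 0, 0, 0, 0],  # Tom
]
print(is_symmetric(directed))              # False
print(is_symmetric(symmetrize(directed)))  # True
```

Taking the maximum keeps any tie reported by either side; taking the minimum instead would keep only reciprocated ties.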
Binary and valued ties
Binary ties indicate the presence or absence of a tie.
Valued ties can be stronger or weaker, under some definition of strength:
emotional closeness; frequency of contact; duration of the relationship.
Graphically: line (arrow) thickness often represents strength of tie.
Storing valued ties in an edge list
The edge list can include a third column with attributes of each tie. In our friendship example, we can include the duration of friendship:
ego    alter  duration (years)
Jane   Mary   5
Jane   Bob    2
Mary   Sue    3
Bob    Alan   1
Alan   Tom    2
Alan   Sue    2
Storing valued ties in a matrix
Instead of 0-1 values, the matrix holds different values depending on the duration of the relationship:
       Jane  Mary  Bob  Sue  Alan  Tom
Jane   -     5     2    0    0     0
Mary   0     -     0    3    0     0
Bob    0     0     -    0    1     0
Sue    0     0     0    -    0     0
Alan   0     0     0    2    -     2
Tom    0     0     0    0    0     -
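The valued matrix can be built directly from the three-column edge list. A minimal sketch; `valued_matrix` and `dichotomize` are hypothetical names, and the thresholding step is an added illustration (a common way to recover binary ties), not something the slides prescribe:

```python
nodes = ["Jane", "Mary", "Bob", "Sue", "Alan", "Tom"]
# Edge list with a third column: duration of friendship in years.
valued_edges = [("Jane", "Mary", 5), ("Jane", "Bob", 2), ("Mary", "Sue", 3),
                ("Bob", "Alan", 1), ("Alan", "Tom", 2), ("Alan", "Sue", 2)]


def valued_matrix(edges, nodes):
    """Directed valued matrix: cell (ego, alter) holds the tie value, 0 if no tie."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for ego, alter, value in edges:
        matrix[index[ego]][index[alter]] = value
    return matrix


def dichotomize(matrix, threshold=1):
    """Binary version: 1 where the tie value reaches the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


m = valued_matrix(valued_edges, nodes)
print(m[0])                            # [0, 5, 2, 0, 0, 0]
print(dichotomize(m, threshold=3)[0])  # [0, 1, 0, 0, 0, 0]
```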
Graphs
Basic principles of graph representation are simple (nodes and edges).
But graph visualisation is a complex problem in computer science.
Which representation is most suitable for detecting network structure and properties?
Now you know:
Formats for network data: square matrix, rectangular matrix, edge list.
Difference between ego and whole networks.
Directed and undirected ties.
Binary and valued ties.
Collecting network data
Networks are built from nodes and the ties between them.
Who are the nodes?
What are the ties?
How to elicit information?
How to identify nodes
Ego-network data are often collected as part of larger surveys.
Whole network data collection requires defining network boundaries, for example:
members of an organisation;
students of one school;
attendees of one particular event.
N.B. The collection of whole network data needs to be exhaustive, so it is sensitive to the response rate.
Collecting network data through surveys: name generators and interpreters
Name generators are questions to elicit respondents' alters, for example:
“From time to time, most people discuss important matters with other people. Looking back over the last six months, who are the people with whom you discussed matters important to you? Just tell me their names or initials.”
(General Social Survey, 1985)
They can be accompanied by name interpreters to report alter characteristics and identify ties between alters.
Collecting network data through surveys: rosters
Provide respondents with a list of potential network members and ask them to choose from the list those to whom they are tied, for example:
“Here is the list of all the members of your Firm. Would you go through this list, and check the names of those you socialize with outside work. You know their family, they know yours, for instance. I do not mean all the people you are simply on a friendly level with, or people you happen to meet at Firm functions.”
Collecting network data through surveys: rosters (cont.)
Used for whole network studies. Also useful as a memory aid.
Requires the researcher to have a complete list of nodes from the start.
Only feasible for relatively small networks (e.g. schools, companies).
Collecting network data from archives
For example: contract data from companies' financial statements; citation data from publishers' portals.
Depends on the quality of the archive and the actual availability of network information.
Need to ensure that the definition of ties is consistent and that data are reported uniformly across all nodes.
Webcrawling
Using dedicated software to retrieve websites and the links between them.
Increasingly popular with the rise of web-based networks and online social networking services, and with the study of the Internet as a network.
Defining network boundaries may be difficult.
Frequent need for manual verification of data quality.
Privacy protection issues.
Now you know:
Different ways of collecting network data: surveys, archives, webcrawling.
All have advantages and disadvantages.
Choice depends on research questions, context, and expected outcomes.
Measuring properties of networks
Focus is on properties of patterns of relationships, independently of node attributes.
Based on the mathematics of graph theory, refined with social science concepts.
A variety of algorithms, measures and software applications are available.
Size
Network size = number of nodes (= number of contacts in a personal network);
The “Dunbar number”: cognitive limitations restrict the size of personal networks to about 150 contacts;
An open question: have social media increased human capacity to maintain relationships?
Median network size on Facebook = 99; the average is about 150 to 200 (with large variation).
Density
The proportion of the ties that could exist in principle that actually exist, where L is the number of ties and n the number of nodes:
$\text{Density} = \frac{L}{n(n-1)/2}$ for undirected ties;
$\text{Density} = \frac{L}{n(n-1)}$ for directed ties.
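As a worked check of the two formulas, here is a minimal sketch applied to the friendship example (n = 6 nodes, L = 6 ties); `density` is a hypothetical helper name:

```python
def density(n_nodes, n_ties, directed=False):
    """Share of possible ties that actually exist."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs: n(n - 1)
    if not directed:
        possible //= 2                   # unordered pairs: n(n - 1) / 2
    return n_ties / possible


# Friendship example: 6 nodes, 6 ties.
print(density(6, 6))                 # 0.4  (6 / 15)
print(density(6, 6, directed=True))  # 0.2  (6 / 30)
```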
Application: Dense networks and behaviours
Why is this so?
Degree centrality
Who are the most “important” nodes?
Diane has the highest number of direct connections (degree);
A connector, or hub.
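Degree centrality is a count of each node's ties. The slides' figure is not reproduced here, but its node names match David Krackhardt's well-known “kite” network, so this minimal sketch assumes that network: the edge list below, including the nodes Andre, Beverly, Carol and Ed, is that assumption, not data taken from the slides.

```python
from collections import Counter

# Assumption: the figure is Krackhardt's "kite" network (undirected ties).
KITE = [("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
        ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
        ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
        ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
        ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
        ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane")]


def degree_centrality(edges):
    """Number of direct connections of each node (undirected ties)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


d = degree_centrality(KITE)
print(max(d, key=d.get), d["Diane"])  # Diane 6
```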
Betweenness centrality
Heather has fewer connections than Diane;
yet she occupies a strategic position, between different parts of the network;
she can control what flows between them.
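Betweenness counts, for each node, the share of shortest paths between other pairs of nodes that pass through it. A minimal sketch of Brandes' algorithm for undirected, unweighted ties (scores not normalised), again assuming the figure is Krackhardt's kite network (an assumption, including the nodes not named in the slides):

```python
from collections import deque

# Assumption: the figure is Krackhardt's "kite" network (undirected ties).
KITE = [("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
        ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
        ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
        ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
        ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
        ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane")]


def betweenness(edges):
    """Brandes' algorithm, undirected and unweighted ties, not normalised."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's
        # share of the shortest paths that pass through it.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of nodes was counted once from each end.
    return {v: total / 2 for v, total in score.items()}


b = betweenness(KITE)
print(max(b, key=b.get), b["Heather"])  # Heather 14.0
```

Heather's score of 14 reflects that every shortest path between Ike or Jane and the seven nodes of the dense core runs through her.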
Closeness centrality
Fernando and Garth have fewer connections than Diane;
but they are at a shorter distance from all other network members;
they can monitor the information flow in the network.
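Closeness is commonly computed as (n - 1) divided by the sum of a node's shortest-path distances to all others. A minimal sketch, assuming a connected network and, as before, that the figure is Krackhardt's kite network (an assumption, not slide data):

```python
from collections import deque

# Assumption: the figure is Krackhardt's "kite" network (undirected ties).
KITE = [("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
        ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
        ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
        ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
        ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
        ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane")]


def closeness(edges):
    """(n - 1) / sum of shortest-path distances; assumes a connected network."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    result = {}
    for source in adj:
        # Breadth-first search gives shortest distances in steps.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (len(adj) - 1) / sum(dist.values())
    return result


c = closeness(KITE)
for name in ("Fernando", "Garth", "Diane", "Heather"):
    print(name, round(c[name], 3))
# Fernando 0.643, Garth 0.643, Diane 0.6, Heather 0.6
```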
Core-periphery structures
Ike and Jane have low centrality scores: they sit on the periphery. E.g. they may be external contractors for a company;
they may be sources of fresh information!
Network centralisation
The extent to which a network is dominated by one (or a few) nodes:
[Figure: example networks with different degrees of centralisation]
Network centralisation
Measures the extent to which a network is dominated by a single central node.
Compares the centrality of the most central node with the centrality of the other nodes.
Normalised by dividing by the maximum centralisation possible for a network of the given size.
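For degree centrality, these three steps give Freeman's centralisation index: the sum of differences between the top degree and every node's degree, divided by its maximum possible value, (n - 1)(n - 2), which is reached by a star network. A minimal sketch for undirected ties, assuming at least three nodes and no isolates (isolates would not appear in an edge list):

```python
def degree_centralisation(edges):
    """Freeman's degree centralisation for an undirected network (0 to 1)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    top = max(degree.values())
    observed = sum(top - d for d in degree.values())
    # Largest possible sum, reached by a star network: (n - 1) * (n - 2).
    return observed / ((n - 1) * (n - 2))


star = [("hub", x) for x in "abcde"]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
print(degree_centralisation(star))  # 1.0: fully dominated by the hub
print(degree_centralisation(ring))  # 0.0: every node is equally central
```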
Centralisation may vary over time
Figure: The advice network of judges in a Parisian court; correlation between degrees, first to second observation.
Distance
Distance: the number of steps from one member to another.
Shorter paths in a network are the most important:
the shorter the path from one node to another, the quicker and more efficient the flow of information, advice, knowledge.
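Distances are found by breadth-first search, which visits nodes in order of increasing number of steps. A minimal sketch on the friendship network from the data-storage slides; `distance` is a hypothetical helper name:

```python
from collections import deque

FRIENDSHIP = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
              ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]


def distance(edges, source, target):
    """Number of steps on a shortest path (None if target is unreachable)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None


print(distance(FRIENDSHIP, "Jane", "Tom"))  # 3 (Jane-Bob-Alan-Tom)
print(distance(FRIENDSHIP, "Jane", "Sue"))  # 2 (Jane-Mary-Sue)
```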
Cliques
[Figure: a 3-member clique, a 4-member clique and a 5-member clique]
A clique is a subset of nodes where all possible pairs of nodes are directly connected.
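Cliques that cannot be enlarged (maximal cliques) can be listed with the Bron-Kerbosch algorithm. A minimal sketch of its basic version (no pivoting, fine for small networks) on a hypothetical example: a 4-member clique a, b, c, d plus one extra tie d-e:

```python
def maximal_cliques(edges):
    """List all maximal cliques (basic Bron-Kerbosch, undirected ties)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: nodes that could extend it; x: already explored.
        if not p and not x:
            cliques.append(r)  # nothing can be added: r is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e")]
cliques = maximal_cliques(edges)
print(sorted(sorted(c) for c in cliques))  # [['a', 'b', 'c', 'd'], ['d', 'e']]
```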
Real-world cliques
[Figure: examples of a 1-clique, a 2-clique and a 3-clique]
Completely connected groups are uncommon.
n-clique: a set of points connected to each other by paths of length at most n.
n-cliques with n greater than 2 are empirically infrequent (Scott 2000).
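The distance condition behind n-cliques can be checked directly. A minimal sketch, assuming that paths may pass through nodes outside the group and that maximality is not checked; `is_n_clique` is a hypothetical helper, applied to the friendship network from the data-storage slides:

```python
from collections import deque
from itertools import combinations

FRIENDSHIP = [("Jane", "Mary"), ("Jane", "Bob"), ("Mary", "Sue"),
              ("Bob", "Alan"), ("Alan", "Tom"), ("Alan", "Sue")]


def distances_from(adj, source):
    """Breadth-first search: steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def is_n_clique(edges, members, n):
    """True if every pair of members is at most n steps apart in the network."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for u, v in combinations(members, 2):
        if distances_from(adj, u).get(v, float("inf")) > n:
            return False
    return True


group = ["Jane", "Mary", "Bob"]
print(is_n_clique(FRIENDSHIP, group, 1))  # False: Mary and Bob are not tied
print(is_n_clique(FRIENDSHIP, group, 2))  # True: they are linked through Jane
```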
Application: Small Worlds
A “small world” network is sparse, but with dense neighbourhoods and short paths: there are few steps from one member to any other.
Now you know:
Key metrics to measure properties of networks:
Size;
Density;
Centrality / Centralisation;
Distance.
Books on social network analysis: general
Thomas W. Valente. Social Networks and Health: Models, Methods, and Applications, Oxford UP, 2010.
Christina Prell. Social Network Analysis: History, Theory and Methodology, Sage, 2011 (October).
Books on social network analysis: general (cont.)
Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications, Cambridge UP, 1994.
Peter J. Carrington, John Scott, Stanley Wasserman (Eds.). Models and Methods in Social Network Analysis, Cambridge UP, 2005.
Books on social network analysis: Theory
Ronald S. Burt. Brokerage and Closure: An Introduction to Social Capital, Oxford UP, 2005.
Ronald S. Burt. Neighbor Networks: Competitive Advantage Local and Personal, Oxford UP, 2010.
Nan Lin. Social Capital: A Theory of Social Structure and Action, Cambridge UP, 2002.
Books on social network analysis: Economics
Matthew O. Jackson. Social and Economic Networks, Princeton UP, 2010.
Sanjeev Goyal. Connections: An Introduction to the Economics of Networks, Princeton UP, 2009.
Fernando Vega-Redondo. Complex Social Networks, Cambridge UP, 2007.
Journals
Social Networks, Elsevier
Connections
Journal of Social Structure
Associations and conferences
INSNA: Sunbelt XXXIII conference, May 2013, Hamburg (www.insna.org);
AFS - RT26, summer school (École d'été), September 2012;
UKSNA: 8th annual conference, Bristol, June 2012;
ASNA: 9th annual conference, Zurich, September 2012.
Thank you!