• No results found

The Social Science Data Revolution

N/A
N/A
Protected

Academic year: 2021

Share "The Social Science Data Revolution"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

The Social Science Data Revolution

Gary King

Department of Government

Harvard University

(Horizons in Political Science talk, Government Department, Harvard University, 3/30/11)

(2)

The Changing Evidence Base of Social Science Research

The Last 50 Years:

Survey research

Aggregate government statistics

In depth studies of individual places, people, or events

The Next 50 Years: Spectacular increases in new data sources, due to. . .

Much more of the above — improved, expanded, and applied

Shrinking computers & the growing Internet: data everywhere

The replication movement: academic data sharing (e.g., Dataverse)

Governments encouraging data collection, distribution,

experimentation (e.g., GovData)

Advances in statistical methods, informatics, & software

The march of quantification: through academia, professions,

government, & commerce (SuperCrunchers,

The Numerati,

MoneyBall)

(3)

Examples of what’s now possible

Opinions of activists:

A few thousand interviews

millions of

political opinions in social media posts (1B tweets/week)

Exercise:

A survey: “How many times did you exercise last week?

500K people carrying cell phones with accelerometers

Social contacts:

A survey: “Please tell me your 5 best friends”

continuous record of phone calls, emails, text messages, bluetooth,

social media connections, electronic address books

Economic development in developing countries:

Dubious or

nonexistent governmental statistics

satellite images of

human-generated light at night, or networks of roads and other

infrastructure

Expert-vs-Statistician contests:

Whenever enough information is

quantified (& a right answer exists), stats wins every time

Many, many more. . .

(4)

How to make progress in the new data-rich world?

1

Computer-assisted methods:

Traditional quantitative-only or

qualitative-only approaches are infeasible

2

Large-scale, interdisciplinary, collaborative

research

3

New statistical methods & engineering

required

4

Better theory:

to respond to massive new evidence, privacy

challenges, data-driven science

(5)

Two Examples

of

Automated Text Analysis

(6)

Example 1: How to Read a Billion Blog Posts

(& Classify Deaths without Physicians)

Daniel Hopkins and Gary King. “

Extracting Systematic Social Science

Meaning from Text

AJPS,

commercialized via:

Gary King and Ying Lu. “

Verbal Autopsy Methods with Multiple

(7)

Data and Quantities of Interest

Input Data:

Large set of text documents (e.g., all English language blog posts)

Categories (posts about US candidates): extremely negative, negative,

neutral, positive, extremely positive, no opinion, not a blog

A small “training set” of documents hand-coded into the categories

Quantities of interest

Computer science

: individual document classifications (spam filters,

Google searches)

Social Science

: proportion in each category (proportion of email which

is spam; proportion extremely negative comment about Pres Bush)

Estimation

Can

get the 2nd by counting the 1st (if 1st is accurate)

High classification accuracy

;

unbiased category proportions

70% classification accuracy is high

disaster for category proportions

New methodology:

unbiased category proportions

, even when

classification accuracy is low

(8)

Out-of-sample Comparison: 60 Seconds vs. 8.7 Days

● ●

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Affect in Blogs

Actual P(D)

Estimated P(D)

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Affect in Blogs

Actual P(D)

Estimated P(D)

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Affect in Blogs

Actual P(D)

Estimated P(D)

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Affect in Blogs

Actual P(D)

Estimated P(D)

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Affect in Blogs

Actual P(D)

Estimated P(D)

(9)

Reactions to John Kerry’s Botched Joke

You know, education — if you make the most of it . . . you can

do well. If you don’t, you get stuck in Iraq.

● ●

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Affect Towards John Kerry

2006−2007

Proportion

Sept

Oct

Nov

Dec

Jan

Feb

Mar

−2

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Affect Towards John Kerry

2006−2007

Proportion

Sept

Oct

Nov

Dec

Jan

Feb

Mar

−1

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Affect Towards John Kerry

2006−2007

Proportion

Sept

Oct

Nov

Dec

Jan

Feb

Mar

0

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Affect Towards John Kerry

2006−2007

Proportion

Sept

Oct

Nov

Dec

Jan

Feb

Mar

1

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Affect Towards John Kerry

2006−2007

Proportion

Sept

Oct

Nov

Dec

Jan

Feb

Mar

2

(10)

Example 2: Computer-Assisted “Reading”

Justin Grimmer and Gary King. 2011. “General-Purpose Clustering

and Conceptualization”

Proceedings of the National Academy of

Sciences.

Conceptualization through

Classification

: “one of the most central

and generic of all our conceptual exercises. . . . Without classification,

there could be no advanced conceptualization, reasoning, language,

data analysis or,for that matter, social science research.” (Bailey,

1994).

Cluster Analysis:

simultaneously (1) invents categories and (2)

assigns documents to categories

(11)

What’s Hard about Clustering?

(Why Johnny Can’t Classify)

Goal: Computer-assisted conceptualization & clustering

Bell(n) = number of ways of partitioning

n

objects

Bell(2) = 2 (AB, A B)

Bell(3) = 5 (ABC, AB C, A BC, AC B, A B C)

Bell(5) = 52

Bell(100)

10

28

×

Number of elementary particles in the universe

Now imagine choosing the

optimal

classification scheme by hand!

Available compromises pursue different goals:

Standard Approach:

Fully automated methods

no method works

well in general; impossible to know which to apply!

Our Approach:

Computer-assisted methods

You, not some

computer algorithm, decides what’s important, but with help

(12)

Switch from Fully Automated to Computer Assisted

Computer-Assisted Clustering

Easy in theory:

list all clusterings; choose the best

Impossible in practice:

Too hard for us mere humans!

An

organized list

will make the search possible

Insight:

Many clusterings are perceptually identical

E.g.,: consider two clusterings that differ only because one document

(of 10,000) moves from category 5 to 6

(13)

How to Zoom Out while Reading

You choose one (or more) clustering, based on insight, discovery, useful information,. . .

Space of Clusterings

biclust_spectral clust_convex mult_dirproc dismea rock som spec_cos spec_euc spec_man spec_mink spec_max spec_canb mspec_cos mspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median

hclust euclidean centroid hclust maximum ward hclust maximum single

hclust maximum completehclust maximum averagehclust maximum mcquitty hclust maximum median hclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan median

hclust manhattan centroid

hclust canberra ward hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquittyhclust canberra median hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson average hclust pearson mcquitty hclust pearson median hclust pearson centroid

hclust correlation ward

hclust correlation single

hclust correlation complete hclust correlation average hclust correlation mcquitty

hclust correlation median hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average

hclust spearman mcquitty hclust spearman median hclust spearman centroid

hclust kendall ward

hclust kendall single

hclust kendall complete hclust kendall average

hclust kendall mcquitty hclust kendall median hclust kendall centroid

● ●

Clustering 1

Carter

Clinton

Eisenhower

Ford

Johnson

Kennedy

Nixon

Obama

Roosevelt

Truman

Bush

HWBush

Reagan

``Other

Presidents ''

``Reagan

Republicans''

Clustering 2

Carter

Eisenhower

Ford

Johnson

Kennedy

Nixon

Roosevelt

Truman

Bush

Clinton

HWBush

Obama

Reagan

``Roosevelt

To Carter''

`` Reagan To

Obama ''

(14)

Evaluation: More Informative Discoveries

Found 2 scholars analyzing lots of textual data for their work

Created 6 clusterings:

2 clusterings selected with our method (

biased

against us)

2 clusterings from each of 2 other methods (varying tuning parameters)

Created info packet on each clustering (for each cluster: exemplar

document, automated content summary)

Asked for

6

2

=15 pairwise comparisons

User chooses

only care about the one clustering that wins

Both cases a Condorcet winner:

“Immigration”:

Our Method 1

vMF 1

vMF 2

Our Method 2

K-Means 1

K-Means 2

“Genetic testing”:

(15)

Evaluation: What Do Members of Congress Do?

-

David Mayhew’s (1974) famous typology

-

Advertising

-

Credit Claiming

-

Position Taking

-

Data: 200 press releases from Frank Lautenberg’s office (D-NJ)

-

Apply our method

(16)

Example Discovery

: Partisan Taunting

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson averagehclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson averagehclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson average

hclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

● ● biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson averagehclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

Clusters in this Clustering

Mayhew

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson average

hclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

Clusters in this Clustering

Mayhew

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Pork

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson averagehclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

Clusters in this Clustering

Mayhew

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Pork

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Legislation

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson average

hclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

Clusters in this Clustering

Mayhew

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Pork

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Legislation

● ● ● ● ●● ● ● ● ●

Advertising

Advertising

biclust_spectral clust_convex mult_dirproc dismea dist_cos dist_fbinary dist_ebinary dist_minkowskidist_binarydist_maxdist_canb mec

rock som

sot_euc

sot_cor

spec_cosspec_manspec_euc spec_mink spec_maxspec_canb mspec_cosmspec_euc mspec_man mspec_mink mspec_max mspec_canb affprop cosine affprop euclidean affprop manhattan affprop info.costs affprop maximum divisive stand.euc divisive euclidean divisive manhattan kmedoids stand.euc kmedoids euclidean kmedoids manhattan mixvmf mixvmfVA kmeans euclidean kmeans maximum kmeans manhattan kmeans canberra kmeans binary kmeans pearson kmeans correlation kmeans spearman kmeans kendall

hclust euclidean ward hclust euclidean single

hclust euclidean complete hclust euclidean average

hclust euclidean mcquitty hclust euclidean median hclust euclidean centroid

hclust maximum ward hclust maximum single

hclust maximum completehclust maximum average hclust maximum mcquitty hclust maximum medianhclust maximum centroid

hclust manhattan ward hclust manhattan single

hclust manhattan complete hclust manhattan average hclust manhattan mcquitty hclust manhattan medianhclust manhattan centroid

hclust canberra ward

hclust canberra single

hclust canberra complete hclust canberra average

hclust canberra mcquitty hclust canberra median

hclust canberra centroid

hclust binary ward hclust binary single

hclust binary complete hclust binary average

hclust binary mcquitty hclust binary median

hclust binary centroid

hclust pearson ward hclust pearson single

hclust pearson complete hclust pearson averagehclust pearson mcquitty hclust pearson median

hclust pearson centroid

hclust correlation ward hclust correlation single hclust correlation complete

hclust correlation average hclust correlation mcquitty hclust correlation median

hclust correlation centroid

hclust spearman ward hclust spearman single

hclust spearman complete hclust spearman average hclust spearman mcquitty hclust spearman medianhclust spearman centroid

hclust kendall ward hclust kendall single

hclust kendall complete hclust kendall averagehclust kendall mcquitty hclust kendall medianhclust kendall centroid

Clusters in this Clustering

Mayhew

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

Pork

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Credit Claiming

● ● ● ● ●● ● ● ● ●

Advertising

Advertising

● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●

Partisan Taunting

The space of clusterings Found a

region

with particularly insightful clusterings

Let’s look at

one clustering

in this

region

Credit Claiming, Pork

:

“Sens. Frank R. Lautenberg (D-NJ)

and Robert Menendez (D-NJ)

announced that the U.S. Department of

Commerce has awarded a

$

100,000

grant to the South Jersey Economic

Development District”

Credit Claiming,

Legislation

:

“As the Senate begins its recess,

Senator Frank Lautenberg today

pointed to a string of victories in

Congress on his legislative agenda

during this work period”

Advertising

:

“Senate Adopts Lautenberg/Menendez

Resolution Honoring Spelling Bee

Champion from New Jersey”

Partisan Taunting

:

“Republicans Selling Out Nation on

Chemical Plant Security” “Senator

Lautenberg’s amendment would change

the name of. . . the Republican bill. . . to

‘More Tax Breaks for the Rich and

More Debt for Our Grandchildren

Deficit Expansion Reconciliation Act of

2006”’

Definition of Partisan Taunting

:

Explicit, public, and negative attacks on

another political party or its members

Taunting ruins

deliberation

(17)

In Sample Illustration of Partisan Taunting

Taunting ruins deliberation

Sen. Lautenberg

on Senate Floor

4/29/04

-

“Senator Lautenberg Blasts

Republicans as ‘Chicken Hawks’ ”

[Government Oversight]

-

“The scopes trial took place in

1925. Sadly, President Bush’s veto

today shows that we haven’t

progressed much since then”

[Healthcare]

-

“Every day the House Republicans

dragged this out was a day that

made our communities less

safe.”[Homeland Security]

(18)

Out of Sample Confirmation of Partisan Taunting

-

Discovered using 200 press releases; 1 senator.

-

Confirmed using 64,033 press releases; 301 senator-years.

-

Apply supervised learning method: measure

proportion of press

releases

a senator taunts other party

Prop. of Press Releases Taunting

Frequency

0.1

0.2

0.3

0.4

0.5

10

20

(19)

Out of Sample Confirmation of Partisan Taunting

-

Discovered using 200 press releases; 1 senator.

-

Confirmed using 64,033 press releases; 301 senator-years.

-

Apply supervised learning method: measure

proportion of press

releases

a senator taunts other party

Prop. of Press Releases Taunting

Frequency

0.1

0.2

0.3

0.4

0.5

10

20

30

Prop. of Press Releases Taunting

Frequency

0.1

0.2

0.3

0.4

0.5

10

20

30

On Avg., Senators Taunt

in 27 % of Press Releases

(20)

Normative Implications of Taunting

Partisan taunting:

Very common

Makes deliberation less likely

Occurs more often in homogeneously partisan districts (i.e., when

preaching to the choir)

Incompatibility of the principles of representative democracy

To get reflection: Homogeneous (noncompetitive) districts

To get deliberation (no taunting): Heterogeneous (competitive)

districts

(21)

Some New Data Types

1

Unstructured text:

emails (1 LOC every 10 minutes), speeches,

government reports, blogs, social media updates, web pages,

newspapers, scholarly literature

2

Commercial activity:

credit cards, sales data, and real estate

transactions, product RFIDs

3

Geographic location:

cell phones, Fastlane or EZPass transponders,

garage cameras

4

Health information:

digital medical records, hospital admittances,

google/MS health, and accelerometers and other devices being

included in cell phones

5

Biological sciences:

effectively becoming social sciences as genomics,

proteomics, metabolomics, and brain imaging produce huge numbers

of

person-level variables.

6

Satellite imagery:

increasing in scope, resolution, and availability.

7

Electoral activity:

ballot images, precinct-level results, individual-level

registration, primary participation, and campaign contributions

(22)

Some More New Data Examples

8

Social media:

facebook, twitter, social bookmarking, blog comments,

product reviews, virtual worlds, game behavior, crowd sourcing

9

Web surfing artifacts:

clicks, searches, and advertising clickthroughs.

(Google collects 1 petabyte/72 minutes on human behavior!)

10

Multiplayer web games and virtual worlds:

Billions of highly

controlled experiments on human behavior

11

Government bureaucracies:

moving from paper to electronic data

bases, increasing availability

12

Governmental policies:

requiring more data collection, such e.g., “No

Child Left Behind Act”; allowing randomized policy experiments;

Obama pushing data distribution

13

Scholarly data:

the replication movement in academia, led in part by

(23)

Enormous Emerging Opportunities for Social Scientists

For the first time:

technologies

,

policies

,

data

, and

methods

are

making it feasible to attack some of the most vexing problems that

afflict human society

A massive change from

studying problems

to

understanding and

solving problems

Opportunities require a change in our job descriptions, with new:

1

Computer-assisted methods:

Traditional quantitative-only or

qualitative-only approaches are infeasible

2

Large-scale, interdisciplinary, collaborative

research

3

New statistical methods & engineering

required

4

Better theory:

to respond to massive new evidence, privacy challenges,

data-driven science

And then there’s you & me:

In most legislatures, courts, academic departments, . . . , change comes

from replacement not conversion

Will we wait to be replaced? or put in the effort to convert and learn

how to use the new information?

(24)

For more information

References

Related documents

WHCK contains tests for GPS Sensors, Radio Manager, Device Fundamentals, System Fundamentals Power Management, Windows Driver Test Framework (WDTF), and USB Hardware Certification

might be that Denver police officers—and we suspect police officers in general—use arrests far too frequently when many incidents could have been handled through alternatives

The PML-RARA expression was detected only in the EGFP + fraction and not in the EGFP 2 fraction of the sorted human CD45 + /CD33 + cells from the NOG mice ( Figure 2C ).. The

“Biomimetic and non biomimetic synthesis of molecular sieves and their applications for controlled molecular release, electronics and catalysis” 9:50 – 10:30.

Based on this degradation model, the maintenance cost of two proposed prognostic maintenance policies are briefly reviewed and compared with the traditional degradation

 Walks around local community to examine human or natural environment..  Use compass directions during warm

Article 7 of the Rome Statute enumerates the acts which constitute crimes against humanity when committed as part of a widespread or systematic attack against a civilian

Issittup Ukiuanut 2007-08-mut suliniu- tiginnissimasut tassaapput nunat tama- laat akornanni ilisimatusarnikkut atta- veqaatit ICSU (International Council for Science) aamma WMO