Research Blog Survey 1-1
Imagine your child requires a life-saving operation. You
enter the hospital and are confronted with a stark choice.
a. Do you take the traditional path with human
medical staff, including doctors and nurses, where
long-term trials have shown a 90% chance that they will save
your child’s life?
b. Or do you choose the robotic track, in the
factory-like wing of the hospital, tended to by technical
specialists and an array of robots, but where similar
long-term trials have shown that your child has a 95% chance
of survival?
Research Blog Survey 1-2
How much do you value the “intimacy” between
doctor and patient?
a. Very important
b. Important
c. Neutral
d. Not so important
e. Meh
Research Blog Survey 1-3
Let’s say your medical treatment goes tragically
wrong but there was no human interference. Who
would you blame?
a. The person who programmed the robot
b. The hospital
c. The robot
d. The other doctors present during the
operation
Research Blog Survey 1-4
Let’s say that you are having heart problems.
Would you trust a diagnosis from a doctor that
has 40 years of experience, or would you trust
artificial intelligence that uses pattern
recognition to compare your heart scans, blood
test results, etc. to thousands of other patients
across the country?
a. Doctor
b. AI
Research Blog Survey 1-5
If you were to support robot-driven surgeries, what
would be your main reason?
a. Efficiency
b. Robots wouldn’t be affected by extraneous
factors and wouldn’t get tired if it was a long
surgery
c. Sanitation
d. Quicker
e. Robots could potentially catch more than a
human eye
Research Blog Survey 1-6
Discussion question (Answer on piazza)
What do you think are the pros and cons about
having artificial intelligence in healthcare?
Ethics of Big Data
Peter Danielson
Univ. of British Columbia
COGS 300.002
• Scientific accuracy (vs. hype):
  – Google Flu Trends
• Manipulation & Consent
  – in social science (Facebook)
  – in commerce & politics
• De-identification & Big Data Research
Google Flu Trends
• Success of GFT lead story in Mayer-Schonberger, V. & Cukier, K. (2013) Big data: A revolution that will transform how we live, work, and think. NY: Houghton Mifflin Harcourt.
“Flu” ~ Flu
Science, Vol. 343, 14 March 2014, p. 1204 (www.sciencemag.org), Policy Forum
Algorithm Dynamics
All empirical research stands on a foundation of measurement. Is the instrumentation actually capturing the theoretical construct of interest? Is measurement stable and comparable across cases and over time? Are measurement errors systematic? At a minimum, it is quite likely that GFT was an unstable reflection of the prevalence of the flu because of algorithm dynamics affecting Google’s search algorithm. Algorithm dynamics are the changes made by engineers to improve the commercial service and by consumers in using that service. Several changes in Google’s search algorithm and user behavior likely affected GFT’s tracking. The most common explanation for GFT’s error is a media-stoked panic last flu season (1, 15). Although this may have been a factor, it cannot explain why GFT has been missing high by wide margins for more than 2 years. The 2009 version of GFT has weathered other media panics related to the flu, including the 2005–2006 influenza A/H5N1 (“bird flu”) outbreak and the 2009 A/H1N1 (“swine flu”) pandemic. A more likely culprit is changes made by Google’s search algorithm itself.
The Google search algorithm is not a static entity; the company is constantly testing and improving search. For example, the official Google search blog reported 86 changes in June and July 2012 alone (SM). Search patterns are the result of thousands of decisions made by the company’s programmers in various subunits and by millions of consumers worldwide.
There are multiple challenges to replicating GFT’s original algorithm. GFT has never documented the 45 search terms used, and the examples that have been released appear misleading (14) (SM). Google does provide a service, Google Correlate, which allows the user to identify search data that correlate with a given time series; however, it is limited to national level data, whereas GFT was developed using correlations at the regional level (13). The service also fails to return any of the sample search terms reported in GFT-related publications (13, 14).
Nonetheless, using Google Correlate to compare correlated search terms for the GFT time series to those returned by the CDC’s data revealed some interesting differences. In particular, searches for treatments for the flu and searches for information on differentiating the cold from the flu track closely with GFT’s errors (SM). This points to the possibility that the explanation for changes in relative search behavior is “blue team” dynamics, where the algorithm producing the data (and thus user utilization) has been modified by the service provider in accordance with their business model. Google reported in June 2011 that it had modified its search results to provide suggested additional search terms and reported again in February 2012 that it was now returning potential diagnoses for searches including physical symptoms like “fever” and “cough” (21, 22). The former recommends searching for treatments of the flu in response to general flu inquiries, and the latter may explain the increase in some searches to distinguish the flu from the common cold. We document several other changes that may have affected GFT (SM).
In improving its service to customers, Google is also changing the data-generating process. Modifications to the search algorithm are presumably implemented so as to support Google’s business model: for example, in part, by providing users useful information quickly and, in part, to promote more advertising revenue. Recommended searches, usually based on what others have searched, will increase the relative magnitude of certain searches. Because GFT uses the relative prevalence of search terms in its model, improvements in the search algorithm can adversely affect GFT’s estimates. Oddly, GFT bakes in an assumption that relative search volume for certain terms is statically related to external events, but search behavior is not just exogenously determined, it is also endogenously cultivated by the service provider.
Blue team issues are not limited to Google. Platforms such as Twitter and Facebook are always being re-engineered, and whether studies conducted even a year ago on data collected from these platforms can be replicated in later or earlier periods is an open question.
Although it does not appear to be an issue in GFT, scholars should also be aware of the potential for “red team” attacks on the systems we monitor. Red team dynamics occur when research subjects (in this case Web searchers) attempt to manipulate the data-generating process to meet their own goals, such as economic or political gain. Twitter polling is a clear example of these tactics. Campaigns and companies, aware that news media are monitoring Twitter, have used numerous tactics to make sure their candidate or product is trending (23, 24). Similar use has been made of Twitter and Facebook to spread rumors about stock prices and markets. Ironically, the more successful we become at monitoring the behavior of people using these open sources of information, the more tempting it will be to manipulate those signals.
[Figure: two panels spanning 07/01/09 to 07/01/13. One panel plots % ILI and the other plots Error (% baseline), with series Google Flu, Lagged CDC, Google Flu + CDC, and CDC. Annotations: “Google estimates more than double CDC estimates”; “Google starts estimating high 100 out of 108 weeks”.]
GFT overestimation. GFT overestimated the prevalence of flu in the 2012–2013 season and overshot the actual level in 2011–2012 by more than 50%. From 21 August 2011 to 1 September 2013, GFT reported overly high flu prevalence 100 out of 108 weeks. (Top) Estimates of doctor visits for ILI. “Lagged CDC” incorporates 52-week seasonality variables with lagged CDC data. “Google Flu + CDC” combines GFT, lagged CDC estimates, lagged error of GFT estimates, and 52-week seasonality variables. (Bottom) Error [as a percentage {[(Non-CDC estimate) − (CDC estimate)]/(CDC estimate)}]. Both alternative models have much less error than GFT alone. Mean absolute error (MAE) during the out-of-sample period is 0.486 for GFT, 0.311 for lagged CDC, and 0.232 for combined GFT and CDC. All of these differences are statistically significant at P < 0.05. See SM.
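As a rough illustration of the two error measures named in the caption, the sketch below computes the percentage error relative to the CDC estimate and a mean absolute error. The weekly values are invented for illustration and are not taken from the paper, the function names are hypothetical, and the sketch assumes MAE is the average absolute gap between an estimate series and the CDC series.

```python
def percent_error(estimates, cdc):
    """Error as a percentage of the CDC estimate, one value per week."""
    return [100.0 * (e - c) / c for e, c in zip(estimates, cdc)]


def mean_absolute_error(estimates, cdc):
    """Average absolute gap between an estimate series and the CDC series."""
    return sum(abs(e - c) for e, c in zip(estimates, cdc)) / len(cdc)


# Hypothetical weekly % ILI values (NOT real GFT or CDC data).
cdc = [2.0, 3.0, 4.0, 2.5]
gft = [3.0, 4.5, 8.0, 3.0]

errors = percent_error(gft, cdc)
mae = mean_absolute_error(gft, cdc)
weeks_high = sum(1 for e in errors if e > 0)  # weeks where the estimate overshoots

print(errors, mae, weeks_high)
```

In this toy series the third week is a case of the estimate being double the CDC value, which shows up as a 100% error.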
Q1
Which of the following was not a conclusion of the emotional
contagion experiment conducted via Facebook?
A) Nonverbal behavior is not necessary for emotional contagion to
occur
B) Direct interaction is not necessary for emotional contagion to occur
C) Exposure to the happiness of others may produce an “alone
together social comparison effect” thereby actually depressing the
individuals who view it
D) Emotional contagion is proportional to emotional expression rather
than the content of a post
E) All of the above are true
Kevin
Rate Quiz Question 1
A. Excellent
B. Very Good
C. Good
D. Acceptable
E. Poor
Q2
Could participants opt out of the emotional
contagion experiment?
A. Yes, this is required by the Cornell Univ.’s
Human Research Protection Program.
B. Yes, because it involved emotions.
C. No, because it was conducted by Facebook for
internal purposes.
D. A & B
E. None of the above
Facebook Experiment
• Size: big sample and small effect
  – ~155,000 per condition; 1/10 of 1%
• Informed consent
  – And Facebook terms of service
    • (1-800 in US)
  – Opportunity to opt out
  – Editorial Expression of Concern
• Was experiment legal?
Rules (Canadian Tri Council)
“If the survey is normally administered as an
operational requirement for quality assurance,
quality improvement, or for program evaluation
purposes, then it would not require REB review
(Article 2.5), because the survey would not be
considered “research” as defined in this policy.”
Consequences
“And at the end of the day, the actual impact on
people in the experiment was the minimal
amount to statistically detect it,” …“Having
written and designed this experiment myself, I
can tell you that our goal was never to upset
anyone. […] In hindsight, the research benefits
of the paper may not have justified all of this
anxiety.” (Adam Kramer, lead author)
Double Standard II:
Internal Experiments
• ‘We noticed recently that people didn’t like it
when Facebook “experimented” with their
news feed. Even the FTC is getting involved.
But guess what, everybody: if you use the
Internet, you’re the subject of hundreds of
experiments at any given time, on every site.
That’s how websites work.’
• Christian Rudder
http://blog.okcupid.com/index.php/page/2/
Two Hello Cupid Experiments
• Remove Photos
• Disaster for usage:
• But
  – Responses to 1st messages up 44%
  – Conversations went deeper
  – Contact details exchanged quicker
Normal Tuesday
A Casual Experiment/Blog
• Meta-data on photos
Q3
According to political operatives, why would a campaign official ask citizens, before an
election, whether they would walk or drive to poll stations?
a) to know whether having a man handing out flyers near the poll station would
benefit the campaign.
b) to know how many citizens own a car, so they can change their automobile policies.
c) to get them to think about voting, thereby increasing the chances of actually going
to vote.
d) to understand the transportation behaviours of voters, so that locations of poll
stations could be changed for future elections.
Election Campaigns
• “You don’t want your analytical efforts to be obvious because voters get creeped out.” (Duhigg, “Campaigns..”)
• What explains what creeps us out?
  – Did Facebook experiment CYO?
  – Target?
• How related to moral permissibility?
Q4
What's the right order of steps in the process that creates habits?
1. Routine
2. Reward
3. Cue
A) 2 - 1 - 3
B) 1 - 2 - 3
C) 3 - 1 - 2
D) 3 - 2 - 1
Irem
Rate Quiz Question 4
A. Excellent
B. Very Good
C. Good
D. Acceptable
E. Poor
Q5
According to the New York Times Article on habits, when is a
woman most likely to develop new shopping habits?
a) Shortly after the birth of her child
b) Shortly after marriage
c) During pregnancy
Rate Quiz Question 5
A. Excellent
B. Very Good
C. Good
D. Acceptable
E. Poor
Discussion
• Would the data-driven techniques used by
Target be effective against rational agents?
• Does the use of these techniques undercut
Ethics Methods Foiled by Big Data
1. “Notice and consent”
  – Dilemma of new unforeseen uses for data
    • Flu from search
  – Not feasible to re-consult
  – Or blanket permission for all uses
2. Opting out:
  – German Street View blur
  – Target for egging!
Big Data is Private Data
• Research Methodology
  – Lack of transparency at Google Flu
  – Insider access at Yahoo, Facebook, OKCupid
  – How check or replicate private data?
Ethics Methods Foiled by Big Data
3. “Anonymization” vs. Reidentification
  – The Massachusetts Group Insurance Commission
had a bright idea back in the mid-1990s: it
decided to release "anonymized" data on state
employees that showed every single hospital visit.
The goal was to help researchers, and the state
spent time removing all obvious identifiers such as
name, address, and Social Security number. But a
graduate student in computer science saw a
chance to make a point about the limits of
anonymization.
• At the time GIC released the data, William Weld, then
Governor of Massachusetts, assured the public that GIC had
protected patient privacy by deleting identifiers. In response,
then-graduate student Sweeney started hunting for the
Governor’s hospital records in the GIC data. She knew that
Governor Weld resided in Cambridge, Massachusetts, a city
of 54,000 residents and seven ZIP codes. For twenty dollars,
she purchased the complete voter rolls from the city of
Cambridge, a database containing, among other things, the
name, address, ZIP code, birth date, and sex of every voter.
By combining this data with the GIC records, Sweeney found
Governor Weld with ease. Only six people in Cambridge
shared his birth date, only three of them men, and of them,
only he lived in his ZIP code. In a theatrical flourish, Dr.
Sweeney sent the Governor’s health records (which included
diagnoses and prescriptions) to his office.
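The linkage step described above can be sketched as a join on the fields both data sets share. This is a minimal sketch under stated assumptions: every record, name, ZIP label, and date below is invented, the function name `reidentify` is hypothetical, and it is not a reconstruction of Sweeney's actual procedure.

```python
# Quasi-identifiers assumed to appear in both data sets.
KEYS = ("zip", "birth_date", "sex")

# Invented public voter roll: includes names.
voter_roll = [
    {"name": "Voter A", "zip": "Z1", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "zip": "Z2", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter C", "zip": "Z1", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "Voter D", "zip": "Z3", "birth_date": "1961-05-05", "sex": "F"},
    {"name": "Voter E", "zip": "Z3", "birth_date": "1961-05-05", "sex": "F"},
]

# Invented "anonymized" records: names removed, quasi-identifiers kept.
hospital_records = [
    {"zip": "Z1", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "diagnosis X"},
    {"zip": "Z3", "birth_date": "1961-05-05", "sex": "F", "diagnosis": "diagnosis Y"},
]


def reidentify(records, roll, keys=KEYS):
    """Return (name, diagnosis) pairs for records that match exactly one voter."""
    by_key = {}
    for voter in roll:
        by_key.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    matches = []
    for record in records:
        names = by_key.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


matches = reidentify(hospital_records, voter_roll)
print(matches)
```

The second record stays ambiguous because two invented voters share its ZIP label, birth date, and sex; only a unique combination gives a confident match.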
Ethics Methods Foiled by Big Data
3. “Anonymization” vs. Reidentification
  – AOL – content of searches alone
    • “60 single men”, “landscapers in Lilburn Ga”
  – Netflix contest (and other data)
  – In 2000, [Sweeney] showed that 87 percent of all
Americans could be uniquely identified using only three
bits of information: ZIP code, birthdate, and sex.
  – Cell phone data even easier.
  – “This … new subspecialty of computer science,
reidentification science … unearths a tension that shakes a
foundational belief about data privacy: Data can be either
useful or perfectly anonymous but never both.” (Ohm,
2010, p. 1703f)
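A uniqueness figure like the 87 percent above is the share of people whose combination of fields is shared with no one else. The sketch below shows one way to compute that share on a toy population; the five people are invented, `fraction_unique` is a hypothetical helper name, and the resulting numbers say nothing about any real population.

```python
from collections import Counter

# Invented toy population (NOT real data).
people = [
    {"zip": "Z1", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "Z2", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "Z1", "birth_date": "1950-01-01", "sex": "F"},
    {"zip": "Z3", "birth_date": "1961-05-05", "sex": "F"},
    {"zip": "Z3", "birth_date": "1961-05-05", "sex": "F"},
]


def fraction_unique(population, keys):
    """Share of people whose values for `keys` match no one else's."""
    combos = [tuple(person[k] for k in keys) for person in population]
    counts = Counter(combos)
    return sum(1 for combo in combos if counts[combo] == 1) / len(population)


all_three = fraction_unique(people, ("zip", "birth_date", "sex"))
zip_only = fraction_unique(people, ("zip",))
sex_only = fraction_unique(people, ("sex",))

print(all_three, zip_only, sex_only)
```

Each field alone singles out few or none of the toy records, while the three together single out most of them, which is the pattern the slide describes.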
Vs. Yakowitz: “Tragedy of the Data
Commons”
  – Benefit of Public Data Sets
  – Risks: Theoretical?
  – Harm: "The risk of privacy harm from
re-identification is actually significantly lower than
many of the everyday risks we take for granted,
such as those attendant on throwing out our
Big Data Research Dilemma
• “We’re living through a golden age of
behavioral research. It’s amazing how much
we can figure out about how people think
now.” (Eric Siegel quoted in Duhigg, “How
Companies…”)
  – Watts to Rudder
• But also undercutting ethical basis of
behavioral research?
References
• Ohm, Paul, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization (August 13, 2009). UCLA Law Review, Vol. 57, p. 1701, 2010; U of Colorado Law Legal Studies Research Paper No. 9-12. Available at SSRN: http://ssrn.com/abstract=1450006
• Yakowitz, J. (2011). Tragedy of the data commons. Harv. JL & Tech., 25, 1.
• Mayer-Schonberger, V. & Cukier, K. (2013) Big data: A revolution that will transform how we live, work, and think. NY: Houghton Mifflin Harcourt.
• Watts, D.J. (2011) Everything Is Obvious: *Once You Know the Answer. Crown Business.
• Rudder, C. (2014) Dataclysm: Who We Are (when we think no one’s looking). Crown.