Privacy Risks of Big Data Analytics – From a Regulator’s Point of View
HKU Big Data and Privacy Workshop
1. Data protection principles
2. Big data analytics and privacy
Big Data Analytics and Mobile Apps
Personal Data Flow
[Figure: personal data flow, showing Collection; Storage, Use or Processing; Retention/Erasure]
[Figure: Collection Limitation, Data Quality, Use Limitation; IT System]
OECD Privacy Framework Principles
Big Data and Privacy
Failures of Big Data Analytics
Failures of Big Data Analytics – Google Flu Prediction
It does not always work…
• Underestimated by half in 2009 when compared with CDC data
• Overestimated by half in 2012 when compared with CDC data
• Predictor of flu or predictor of winter?
• A black-box approach makes it hard for people to judge
Failures of Big Data Analytics – US Presidential Election
“Past performance does not guarantee future results…”
• Colorado professors built a data model that correctly “backward predicted” the results of all eight US presidential elections since 1980
• It failed to predict the 2012 election going forward…
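One way to see how a model can “backward predict” every past result and still fail going forward is overfitting. This toy sketch is not the Colorado model, which the slide does not describe: it fits a degree-7 polynomial exactly through eight invented data points (one per past election) and then extrapolates one step ahead. The vote shares and the `lagrange_predict` helper are hypothetical.

```python
from fractions import Fraction

def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the n points (xs, ys) at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)  # exact rational arithmetic
        total += term
    return total

# Invented vote shares (%) for eight past elections, indexed 0..7.
xs = list(range(8))
ys = [51, 59, 53, 43, 49, 48, 51, 46]

# "Backward prediction": the fitted curve reproduces every past result exactly.
in_sample = [lagrange_predict(xs, ys, x) for x in xs]
print("in-sample errors:", [float(p - y) for p, y in zip(in_sample, ys)])

# "Forward prediction": extrapolate one election ahead.
forecast = lagrange_predict(xs, ys, 8)
print("forecast for the next election:", float(forecast))
```

The in-sample errors are all zero, yet the one-step forecast lands far outside the 0 to 100 range a vote share can take: a perfect fit to the past says little about the future.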
Privacy Risks of Big Data Analytics
1. Sense of rights violation or “surprise”
2. Re-identification
3. Negative impact/discrimination
Correct prediction can still be creepy
The Surprise of Big Data Analytics – Target’s Pregnancy Prediction
If it works in this way…
Target learnt this lesson:
“Then we started, in the same mailer, mixing baby items with other things we know they would never buy, like lawn mower… as long as the pregnant woman doesn’t know she has been spied on, it works and she would use the coupons…”
The Myth of Anonymisation
AOL released “anonymised” search records of 650,000 people over a three-month period
– User 4417749 was found to be Ms Arnold of Lilburn, Georgia, through the keywords she entered
– Her searches also included “nicotine effect”, “dry mouth”, “hand tremors”, “bipolar disorder” – do we need to
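The AOL case turns on a detail worth making concrete: replacing each name with a number does not anonymise a log if the number stays the same across queries, because the queries then accumulate into a profile that can point back to one person. The sketch below groups a small search log by pseudonymous ID; the IDs, queries and `build_profiles` helper are all invented for illustration, not taken from the AOL release.

```python
from collections import defaultdict

# Hypothetical search log: (pseudonymous user ID, query). Invented data.
search_log = [
    (1001, "weather springfield"),
    (2002, "cheap flights"),
    (1001, "dentists in springfield"),
    (1001, "hand tremors"),
    (2002, "hotel deals"),
    (1001, "smith family history"),
]

def build_profiles(log):
    """Group queries by pseudonymous ID; a stable ID lets queries pile up per person."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = build_profiles(search_log)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

No single query identifies user 1001, but a town, a surname and a health concern together narrow the field quickly.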
“Anonymised” Massachusetts state employee hospital records
– State employee hospital records released for research
– Governor reassured the public that the data was de-identified
– Governor’s own record re-identified by a researcher by …
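The slide’s account of how the record was matched is cut off. Re-identification of this kind is usually a linkage attack: joining the “de-identified” release with a second dataset that still carries names, using attributes that both datasets share. A minimal sketch, assuming the shared attributes are ZIP code, sex and date of birth; every record, name and the `link` helper are invented.

```python
# Hypothetical "de-identified" hospital records: names removed, other attributes kept.
hospital = [
    {"zip": "12345", "sex": "M", "dob": "1950-03-01", "diagnosis": "condition A"},
    {"zip": "12346", "sex": "F", "dob": "1962-11-20", "diagnosis": "condition B"},
]
# Hypothetical public list that pairs names with the same attributes.
voter_roll = [
    {"name": "Person X", "zip": "12345", "sex": "M", "dob": "1950-03-01"},
    {"name": "Person Y", "zip": "12347", "sex": "F", "dob": "1962-11-20"},
]

QUASI_IDS = ("zip", "sex", "dob")  # assumed shared attributes

def link(released, identified, keys=QUASI_IDS):
    """Join two datasets on shared attributes; return (name, diagnosis) for unique matches."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches

print(link(hospital, voter_roll))
```

Only the first record matches on all three attributes, so its diagnosis is now attached to a name; the second record survives only because its ZIP code differs.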
How much data do you need to identify someone?
– 87% of the US population can be uniquely identified by ZIP code, gender and date of birth;
– 53% by place, gender and date of birth; and
– 18% by county, gender and date of birth.
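Percentages like these measure how many people are unique on a given combination of attributes. A sketch of that measurement on a tiny invented population follows; the records and the `unique_fraction` helper are hypothetical, and the slide’s figures come from US population data, not from anything like this.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Share of records whose combination of `keys` appears exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Small invented population.
people = [
    {"zip": "12345", "county": "North", "sex": "F", "dob": "1980-02-01"},
    {"zip": "12345", "county": "North", "sex": "F", "dob": "1980-02-01"},
    {"zip": "12346", "county": "North", "sex": "M", "dob": "1975-06-12"},
    {"zip": "12347", "county": "North", "sex": "M", "dob": "1975-06-12"},
    {"zip": "12348", "county": "South", "sex": "F", "dob": "1990-09-30"},
]

for keys in [("zip", "sex", "dob"), ("county", "sex", "dob")]:
    print(keys, unique_fraction(people, keys))
```

Swapping ZIP code for the coarser county lowers the share of unique records, the same direction as the drop from 87% to 18% above.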
The only way to make data anonymous is to make it useless…
Professor Paul Ohm (University of Colorado Law School)
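The trade-off in this quote can be made concrete with generalisation, a common de-identification technique: coarsen the identifying attributes until every combination is shared by at least k records (k-anonymity). A sketch on invented data; the `generalise` rule (first three ZIP digits, birth year only) is an arbitrary choice for illustration.

```python
from collections import Counter

def k_anonymity(records, keys):
    """Smallest group size over all combinations of `keys` (the dataset's k)."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())

def generalise(record):
    """Coarsen attributes: keep 3 ZIP digits and only the birth year."""
    return {**record, "zip": record["zip"][:3] + "**", "dob": record["dob"][:4]}

# Invented records.
records = [
    {"zip": "12345", "sex": "F", "dob": "1980-02-01"},
    {"zip": "12349", "sex": "F", "dob": "1980-11-23"},
    {"zip": "12351", "sex": "M", "dob": "1975-06-12"},
    {"zip": "12358", "sex": "M", "dob": "1975-01-30"},
]
keys = ("zip", "sex", "dob")
coarse = [generalise(r) for r in records]
print("k before:", k_anonymity(records, keys))
print("k after: ", k_anonymity(coarse, keys))
```

k rises from 1 to 2, but the coarsened data can no longer answer questions that need exact ages or locations: each gain in anonymity is paid for in usefulness.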
Before we look at discrimination, let’s look at the reality of big data analytics
Big data analytics: correlation vs. causation
The Reality of Big Data Analytics
The (Academic) Reality of Big Data Analytics
• US spending on science, space, and technology reveals suicides by hanging, strangulation and suffocation?
• The number of Nicolas Cage films reveals swimming-pool drownings?
• The divorce rate in Maine reveals per capita consumption of margarine?
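These pairings are spurious correlations: two series that both drift over time can show a high correlation coefficient without any causal link. A sketch with two invented, trending series (not the real data behind the examples above) computes Pearson’s r on the raw values and on year-on-year changes; the series and helper names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def diffs(xs):
    """Year-on-year changes; differencing removes a shared linear trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

# Two invented yearly series that share nothing except an upward drift.
series_a = [10, 12, 11, 14, 15, 17, 16, 19]
series_b = [200, 205, 212, 210, 221, 226, 224, 233]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diffs(series_a), diffs(series_b))
print(f"levels r = {r_levels:.2f}, changes r = {r_changes:.2f}")
```

In this toy case the correlation is high on the levels and close to zero on the changes, a hint that the shared trend, not any link between the two quantities, drives the headline figure.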
But do we really care about the difference between correlation and causation?
The (Commercial) Reality of Big Data Analytics
Marketers are not interested in theories; they are interested in results.
– So if it works, what’s the problem?
– So if buyers of table-leg protectors pay back their loans promptly, what’s wrong with lending to them?
The problem lies with the ‘have-nots’: those that you …
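The ‘have-not’ problem can be shown with a toy lending rule built on a correlated proxy. Even if buyers of table-leg protectors do tend to repay, a rule that approves only them turns away creditworthy applicants who never bought the item, or for whom there is no purchase data at all. The applicants, fields and the `approve` rule are hypothetical.

```python
# Hypothetical applicants. `would_repay` is the ground truth the lender cannot see.
applicants = [
    {"name": "A", "bought_protectors": True, "would_repay": True},
    {"name": "B", "bought_protectors": False, "would_repay": True},
    {"name": "C", "bought_protectors": None, "would_repay": True},  # no purchase data at all
]

def approve(applicant):
    """Proxy rule: lend only to people observed buying protectors."""
    return applicant["bought_protectors"] is True

for a in applicants:
    print(a["name"], "approved" if approve(a) else "declined")

# Creditworthy people the proxy rule turns away.
wrongly_declined = [a["name"] for a in applicants if a["would_repay"] and not approve(a)]
print("wrongly declined:", wrongly_declined)
```

The rule never errs on the people it approves, so it can look successful, while its cost falls entirely on those who lack the signal.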
Is there a solution to this?
– We need to know what big data analytics is and isn’t good at:
IS
• a pattern matcher
• a source of recommendations
ISN’T
• a substitute for proper data collection and …
Privacy Challenge of Big Data Analytics
Risks recap:
1. The (unintended) impacts on people when it is working;
2. The risks of re-identifying people from anonymised sensitive data; and
3. The “targeted not”.