SRA Summer Event 2013
26 June 2013
Can the use of “big data” eliminate the need for yet another traditional Census in 2021?
Keith Dugmore
Daily Mail
2 March 2011
Page 2
Agenda
• Users’ needs: the value of Census-type information
• What was done in 2011 (and since 1801). Time for a change?
• Is there a better way? Can we learn from other countries? ONS’s “Beyond 2011” project.
• Which Big Data files of government administrative and commercial customer records would be most valuable?
• Opportunities, limitations, and barriers to be overcome
& my thanks and acknowledgement to:
– Barry Leventhal, Peter Furness, and Corrine Moy for the use of material from our joint paper in the IJMR Vol. 53 Issue 5
The Census of Population (e.g. 2011)
• A unique data source
– Compulsory, 100% count of population and households (achieves 94%, plus imputation)
– Many questions / topics, including age, health, ethnicity, language, religion, qualifications, occupation, travel to work, housing tenure, household size, car ownership, etc.
– Counts of not only residents, but also workplace populations
– All four countries of the UK
• Products
– Detailed statistics for very small Output Areas
– Geographical data too – OA boundaries, and a postcode / OA directory
– + Microdata files
– & all are free
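As an illustration of how the postcode / OA directory is typically used, here is a minimal Python sketch that counts customer records per Output Area. The directory contents, the OA codes, and the function names are all hypothetical; a real directory would be loaded from the published look-up file.

```python
from collections import Counter

# Hypothetical postcode -> Output Area directory (codes invented for illustration).
POSTCODE_TO_OA = {
    "BS1 1AA": "OA_0001",
    "BS1 1AB": "OA_0001",
    "BS3 4XY": "OA_0002",
}

def normalise_postcode(raw):
    """Upper-case and re-space a postcode so 'bs11aa' matches 'BS1 1AA'."""
    compact = "".join(raw.split()).upper()
    # The inward part of a UK postcode is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact

def customers_per_oa(postcodes):
    """Count records per Output Area; postcodes not in the directory are kept as UNMATCHED."""
    counts = Counter()
    for postcode in postcodes:
        oa = POSTCODE_TO_OA.get(normalise_postcode(postcode), "UNMATCHED")
        counts[oa] += 1
    return dict(counts)

counts = customers_per_oa(["bs11aa", "BS1 1AB", "bs3 4xy", "ZZ9 9ZZ"])
```

Keeping an explicit UNMATCHED bucket makes the share of unusable addresses visible rather than silently dropping them.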
Why is Census data so important to commercial companies?
• Decisions, decisions…
– What areas are best for our new branches?
– What should we offer in each outlet?
– Where should we advertise?
– Who are our best customers, and prospects?
– Which areas & people should we survey?
• Investments of £100s of millions to be targeted every year
• The Census provides a unique range of topics, small area statistics, & consistent and often UK-wide coverage to companies such as…
Sainsbury’s estate since release of 2001 Census data
[Maps: Sainsbury’s stores in March 2003 and March 2010; legend: Main Stores, Convenience Stores]
User sectors – the organisations
Commercial – DUG as the tip of the iceberg of 2.3 million businesses
Other sectors have similar needs (seeking to target services to the public efficiently)
• Central government
• Local government
• Health Service
• Charities
Or have similar interests in society
• Academics – teaching and research
Analysis, and the need for Government data generally
Analyses
• Local areas
• Profiling individuals
• Designing surveys
Data – with national coverage
• Statistics
– Census-type counts for very small areas
– Sample surveys
• Map data
– Background, point locations, road network, boundaries, postcode look-ups
• Lists
– Big files of individual addresses & sometimes people
[Map: Bristol area road network (A38, A4, A37, A4174, A4320, A3029, M5, M4, M32, M49) with labelled locations: 341 Bedminster; 404 Bristol Cribbs Causeway; 509 Bristol Galleries; 689 Bristol Imperial Park; 577 Emersons Green]
[Chart: Weekly Household Income by OAC SuperGroups (Average and SuperGroups 1 to 7); bar labels as extracted: 638.87, 523.91, 647.13, 754.63, 774.70, 387.13, 671.81, 609.63]
Census data collection
(what, still largely
A Traditional Census
• Winding back 50 years to 1961… no fundamental change
• Household forms (paper)
• Delivery
• Collection
• Some innovation: post out (2001 & 2011); post back (2011); online option 2011
But what do they do in other countries?
Big Data – Administrative and Customer files
Linkage Of Administrative Sources
– Customer Information System (DWP+HMRC)
– Patient Register
– Electoral Roll
– Higher Education (HESA)
– School Census
→ Resident Population
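To make the idea of linking administrative sources concrete, here is a minimal Python sketch. It is not ONS’s method: the match key (surname, date of birth, postcode), the rule of counting a person only if they appear in at least two files, and all the records are assumptions made up for illustration.

```python
from collections import Counter

def match_key(surname, date_of_birth, postcode):
    """Build a crude exact-match key; real linkage would need fuzzier matching."""
    return (surname.strip().lower(), date_of_birth, "".join(postcode.split()).upper())

def linked_population(sources, min_sources=2):
    """Return the match keys that appear in at least `min_sources` files."""
    appearances = Counter()
    for records in sources.values():
        # A set de-duplicates within a file, so each file counts once per person.
        for key in {match_key(*record) for record in records}:
            appearances[key] += 1
    return {key for key, n in appearances.items() if n >= min_sources}

# Made-up records: (surname, date of birth, postcode)
sources = {
    "patient_register": [("Smith", "1980-01-31", "BS1 1AA"), ("Jones", "1975-06-02", "BS3 4XY")],
    "electoral_roll": [("SMITH ", "1980-01-31", "bs11aa")],
    "school_census": [("Patel", "2005-09-10", "BS1 1AB")],
}
residents = linked_population(sources)
```

Requiring two or more appearances is one simple way to guard against out-of-date entries in any single file, at the cost of missing people who appear in only one.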
Big Data –
“Information collected by commercial companies”
• A report for ONS in 2009. You can find it at:
http://www.esrc.ac.uk/_images/UKDF0710-%20Keith%20Dugmore_tcm8-8500.pdf
• ONS’s interests – population estimates, 2011 Census planning, & Beyond 2011
• The UK population, as customers
• What information is collected? It varies greatly by sector and product
• (& we must remember that commercial companies that resell information – VARs such as Experian, Equifax, CACI, etc. – are a different species)
The information collected – its weaknesses, & strengths
• Weaknesses:
– Not representative samples or subsets of the population as a whole: biases by region, and demographics, and the impact of marketing campaigns
– Updating of records, especially addresses and demographics, can also be patchy
– [Big government files from DWP, HMRC, Health, Education must be the first targets]
• Strengths:
– Large stocks of customers (often >10 million)
– Large flows of new customers
– Timeliness
– Very detailed data on customer behaviour – transactions (+ debt & fraud)
– Insight / Intelligence from partial counts (c.f. accredited “National Statistics”)
And pooling of records within sectors to build near 100% coverage can be very powerful
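One common way to reduce the bias of a non-representative customer file is to reweight it to known population totals (post-stratification). The Python sketch below is illustrative only; the strata, counts, and spend figures are invented, and the function names are hypothetical.

```python
def poststratification_weights(sample_counts, population_counts):
    """Weight for each stratum = known population count / customer count in that stratum."""
    weights = {}
    for stratum, n in sample_counts.items():
        if n == 0:
            raise ValueError(f"no customers in stratum {stratum!r}; cannot weight it")
        weights[stratum] = population_counts[stratum] / n
    return weights

def weighted_mean(stratum_means, sample_counts, weights):
    """Mean across strata after each customer is scaled by their stratum weight."""
    total = sum(stratum_means[s] * sample_counts[s] * weights[s] for s in sample_counts)
    denominator = sum(sample_counts[s] * weights[s] for s in sample_counts)
    return total / denominator

# Invented example: the customer file over-represents younger people.
sample_counts = {"under_40": 80, "40_plus": 20}
population_counts = {"under_40": 500, "40_plus": 500}
mean_spend = {"under_40": 10.0, "40_plus": 30.0}

weights = poststratification_weights(sample_counts, population_counts)
unweighted = sum(mean_spend[s] * sample_counts[s] for s in sample_counts) / sum(sample_counts.values())
reweighted = weighted_mean(mean_spend, sample_counts, weights)
```

Reweighting only corrects for the characteristics used as strata; biases from marketing campaigns or other unmeasured factors remain.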
Information collected – some headlines for 6 sectors
• Retail
– Huge range of products
– Superstores & local shops, but also online, catalogue, etc.
– Major companies often have 10-15 million customers
– Limited demographics collected at time of application
– Loyalty cards track spending in great detail
Information collected – Financial Services
• Financial Services
– Wide range of products, e.g. current account, mortgage, savings, loans
– Various sales channels – branches, ATMs, online, post, etc.
– Often >10 million customers; aim to create a customer (c.f. product) view
– Detailed demographics collected for some products, e.g. mortgage
– Current accounts & credit cards track spending in great detail
– Pooling of databases is well established, e.g. mortgages, savings, credit, fraud
Information collected – Leisure
• Leisure
– Whitbread as an example: Restaurants, Costa Coffee, Hotels
– Millions of customers, but we don’t provide leisure companies with much information about ourselves
Information collected – Energy
• Electricity (& Gas)
– Electricity has 100% coverage, gas c.80%
– Coverage across the UK is patchy / regional
– Minimal demographic information
– Lots of effort put into maintaining address / meter files
– Good data on fraud & debt
– Pooling of databases well established – meter list used by ONS for 2011 Census to identify multi-occupied addresses; DECC statistics on energy consumption
Information collected – Water
• Water
– Each water company has its own territory (NB)
– Many properties still billed according to rateable value (c.f. metered)
– Lots of effort put into maintaining address files
– Minimal demographic information
– Good data on debt
Information collected – Telecoms
• Telecoms
– Mobile telephone & broadband have major players, each with >15 million customers
– Mobiles – Post Pay (monthly contract – application form)
– Mobiles – Pre Pay (little information collected)
– Address information – only basic PAF for c.50% on contract
– Transaction information – stunning: full detail of every call, inc. location
Beyond 2011 – commercial data workshop
[Press cuttings: Daily Mail, 2 March 2011, Page 2 and Page 1]
• ONS, January 2011: 11 B2C companies + 2 resellers
• Headline conclusions
– Sharing of individual records may be prevented by various factors, such as reputational risk, or limitations due to the Data Protection Act
– Achieving a single customer view can be very difficult due to data matching problems
– Computing power is much less of an issue
– Companies would seek to help ONS if they could do so – aiming to minimise risk, effort & cost
– Anonymised records or aggregate statistics may provide mechanisms
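As a hedged illustration of the “aggregate statistics” route, the Python sketch below turns individual customer records into small-area counts and suppresses any count below a threshold, so that no individual record leaves the company. The threshold of 10 and the area codes are assumptions for illustration, not a rule taken from the workshop.

```python
from collections import Counter

def safe_area_counts(customer_areas, threshold=10):
    """Count customers per area; counts below `threshold` are suppressed (returned as None)."""
    counts = Counter(customer_areas)
    return {area: (n if n >= threshold else None) for area, n in counts.items()}

# Invented input: one area code per customer record.
released = safe_area_counts(["OA_0001"] * 12 + ["OA_0002"] * 3)
```

Suppression trades detail for lower disclosure risk, which speaks to the reputational and Data Protection Act concerns listed above.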
Opportunities, limitations, and barriers to be overcome
Cost profile (real terms)
[Chart: Cost plotted against 2011, 2021, 2031, 2041; series labels: Census, ???, Alternative method]
Statistical benefit profile
[Chart: Benefit plotted against 2011, 2021, 2031, 2041; series labels: Census, Alternative method; areas between the series annotated “loss” and “gain”]
Can Big Data meet users’ needs?
• Frequency – e.g. annual
• Geography – Output Areas
• Topics?
– Omissions (e.g. Language)?
– Additions (e.g. Income); also proxies?
– Accuracy?
– Change / instability?
– All UK?
• Multivariate analysis?
– Not just complex tables: even simple 2-way