Big Data – a big issue for Official Statistics?
ASC Conference – 26 September 2014
Pete Brodie
Session objectives
• Big Data and Official Statistics
• The ONS Big Data Project aims
• Wider engagement and communication
• Pilots
• Infrastructure, Innovation Labs
• Smart meters
• Mobile Phones
• Prices
• Emerging Findings
• Next Steps
Big Data and Official Statistics
• Replace existing outputs
• Produce an entirely new output
• Complement other sources:
• Filling in gaps
• Auxiliary variables for statistical models
• Improve operational processes
• Quality assurance
ONS Big Data Project
• A one year project which aims to:
• investigate the potential for big data in official statistics while
understanding the challenges
• establish an ONS policy and longer term strategy which
incorporates ONS’s position within Government and
internationally in this field
• recommend next steps to support the strategy
Wider engagement and communication
• International:
• UNECE / ESS
• Cross-government:
• HMG Data Science Community of Interest Group
and Cross Profession Working Group
• GSS data strategy
• GDS/DfT/DECC/BoE/DSTL
• Big data for statistics vs other types of analysis
Wider engagement and communication
• Academia:
• ESRC/RSS
• University of Southampton/Cardiff/Huddersfield
• Private sector:
• Mobile network operators
• MySupermarket/Billion Prices
• Privacy groups
• Beyond 2011 Privacy Advisory Group/GDS Privacy and
Ethical Committee
Pilots – Infrastructure
• Huge and continuously growing data streams,
requiring new data architectures and software
• Feasibility and efficiency of processing,
typically requiring parallel computing on a
large scale
• New skills will be required, bringing together
statistical and technological expertise
Pilots – Innovation Labs
• A new facility to allow research with datasets
and tools without compromising ONS security
• INDEPENDENT of ONS main systems
• NOT SECURE, so only PUBLIC data can be
held there
• A “private cloud” – individual machines are
pooled together to provide an integrated
environment, accessed via web browser
Pilots – Smart meters
Investigating the potential of smart meter
electricity data (high frequency – 30 mins)
to identify household occupancy levels and,
potentially, household structure
• England and Ireland both conducted pilot
rollouts in 2009-2010 – the data are now
available for research
• University of Southampton commissioned by
Beyond 2011 to conduct preliminary research
Pilots – Smart meters
Irish smart meter pilot study:
Single meter, total daily electricity consumption
[Figure: total daily electricity consumption (kWh, axis ticks at 50, 100 and 150) by day, July 2009 to December 2010. Annotations mark Christmas 2009, Christmas 2010, and a run of consecutive days with low consumption, "possibly a week away?"]
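The run of low-consumption days annotated on the chart suggests a simple rule for flagging possible absences. The sketch below is illustrative only and is not the method used in the pilot: the function name, the threshold (half the median daily total) and the minimum run length are all assumptions, and the readings are made up.

```python
from statistics import median


def low_consumption_runs(daily_kwh, threshold_ratio=0.5, min_days=3):
    """Return (start_index, end_index) pairs for runs of low-usage days.

    A day counts as "low" when its total is below threshold_ratio times the
    median daily total. Both parameters are illustrative assumptions.
    """
    cutoff = threshold_ratio * median(daily_kwh)
    runs, start = [], None
    for i, kwh in enumerate(daily_kwh):
        if kwh < cutoff:
            if start is None:
                start = i  # a new run of low days begins here
        else:
            if start is not None and i - start >= min_days:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series
    if start is not None and len(daily_kwh) - start >= min_days:
        runs.append((start, len(daily_kwh) - 1))
    return runs


# Made-up daily totals (kWh) for a single meter
readings = [22, 25, 21, 24, 5, 4, 6, 5, 4, 5, 6, 23, 26, 20]
print(low_consumption_runs(readings))  # [(4, 10)]
```

A rule like this only flags candidate absences; a household with a low baseline load, or one that leaves heating on while away, would need a different treatment.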
Pilots – Mobile Phones
Investigating the use of mobile phone data to
model population flows, e.g. travel to work
statistics
• GDS
• Discussions with mobile phone providers
(Telefonica, Vodafone, EE) to provide
aggregate data on origin-destination flows
• Ethical/commercial issues
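As a rough illustration of what "aggregate data on origin-destination flows" could look like, the sketch below counts trips per origin-destination pair and drops small cells. The function name, the area codes and the suppression threshold of 10 are assumptions for illustration; the slides do not describe the format or disclosure rules the operators would actually apply.

```python
from collections import Counter


def aggregate_flows(trips, min_count=10):
    """Count trips per (origin, destination) pair, suppressing small cells.

    trips is an iterable of (origin_area, destination_area) tuples.
    Pairs with fewer than min_count trips are dropped, a common style of
    disclosure control (the threshold here is an assumed value).
    """
    counts = Counter((origin, dest) for origin, dest in trips)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up trips between hypothetical areas A, B and C
trips = [("A", "B")] * 12 + [("A", "C")] * 3 + [("B", "A")] * 10
print(aggregate_flows(trips))  # {('A', 'B'): 12, ('B', 'A'): 10}
```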
Pilots – Prices
Scraping price data from the internet for
use within price statistics
• Potential for richer, more frequent and
cheaper data collection
• Focus on grocery prices from three online
supermarkets
• Prototype scrapers collecting a selection of
CPI/RPI item categories (daily collection)
• We are purchasing data from
MySupermarket.com (linked data, longer time
series) for research purposes
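A minimal sketch of the parsing step of a scraper, using only Python's standard library. It is not the ONS prototype: the class `ProductParser` is hypothetical, the sample markup is a simplified stand-in, and the rule used (product name inside an `h3` with class `inBasketInfoContainer`, offer text inside the following `em`) is an assumption based on the HTML fragment shown on the Webscraping slide.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect product names and any offer text from a product listing."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._in_title = False
        self._in_promo = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "inBasketInfoContainer" in (attrs.get("class") or ""):
            self._in_title = True
            self._buf = []
        elif tag == "em" and self.products:
            # Assumed: an <em> after a product title holds its offer text
            self._in_promo = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self._in_title = False
            # Collapse line breaks and repeated spaces in the product name
            name = " ".join("".join(self._buf).split())
            self.products.append({"name": name, "offer": None})
        elif tag == "em" and self._in_promo:
            self._in_promo = False
            self.products[-1]["offer"] = "".join(self._buf).strip()

    def handle_data(self, data):
        if self._in_title or self._in_promo:
            self._buf.append(data)


# Simplified sample markup (not a real page)
sample = (
    '<li><h3 class="inBasketInfoContainer"><a href="/x">'
    '<span class="image"><img src="a.jpg" alt="" /></span>'
    'Warburtons Toastie Sliced\nWhite Bread 800G</a></h3>'
    '<div class="promo"><em>Any 2 for £2.00</em></div></li>'
)
parser = ProductParser()
parser.feed(sample)
parser.close()
print(parser.products)
```

Rules keyed to class names are brittle: a daily collection would need monitoring for layout changes on each supermarket site.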
Webscraping
Rendered webpage:
HTML code:
...
</div><div class="productLists" id="endFacets-1"><ul class="cf products line"><li id="p-254942348-3" class=" first"><div class="desc"><h3 class="inBasketInfoContainer"><a id="h-254942348" href="/groceries/Product/Details/?id=254942348" class="si_pl_254942348-title"><span class="image"><img
src="http://img.tesco.com/Groceries/pi/121\5010044000121\IDShot_90x90.jpg" alt="" /><!----></span>Warburtons Toastie Sliced
White Bread 800G</a></h3><p class="limitedLife"><a href="http://www.tesco.com/groceries/zones/default.aspx?name=quality-and-freshness">Delivering the freshest food to your door- Find out more ></a></p><div class="descContent"><!----><div class="promo"><a href="/groceries/SpecialOffers/SpecialOfferDetail/Default.aspx?promoId=A31234788" title="All products available for this offer" id="flyout-254942348-promo-A31234788--pos" class="promoFlyout"><span class="promoImgBox"><img src="/Groceries/UIAssets/I/Sites/Retail/Superstore/Online/Product/pos/2for.png" class="promoFlyout promo" alt="Special Offer" id="flyout-254942348-promo-A31234788--posimg" /></span><em>Any 2 for £2.00</em></a><span> valid from 21/1/2014 until 10/2/2014</span></div><div class="tools"><div class="moreInfo"><a href="/groceries/Product/Details/?id=254942348" class="midiFlyout" id="flyout-254942348-midi-0-"><img class="midiFlyout hd"
src="http://ui.tescoassets.com/groceries/UIAssets/I/../Compressed/I_635209615845382232/Sites/Retail/Superstore/Online/Product/infoBlue.gif" alt="" title="View product information" id="flyout-254942348-midi-1-" /></a></div><!----><div
class="links"><ul><li><a
href="http://www.tesco.com/groceries/product/browse/default.aspx?notepad=white%20sliced%20loaf%20800g&N=4294793217" class="shelfFlyout active plaintooltip" id="s-tt-254942348" title="Premium White Bread"> Rest of <span class="hide">Premium White Bread <!----></span>shelf </a></li></ul></div></div></div></div><div class="quantity"><div class="content addToBasket"><p