2 APRIL 2014 Paul Hinton Nicola Fulford Jeremy Harris Chris Wray
There’s data –
and then there’s big data
IT is one of four technologies that will shape
future global developments
“information technology is entering the
big data era”
“process power and data storage are
becoming almost free”
“networks and the cloud will provide
global access and pervasive services”
“social media and cybersecurity will be
large new markets”
US National Intelligence Council’s December 2012 Report –
“A collection of structured/unstructured data so large/complex that it is difficult to
process or analyse using traditional data processing applications.”
“A large and growing gap exists between the amount of data that organisations
can accumulate and their abilities to leverage those data in a way that is useful”
(NIC Report, p. 85/6)
Big Data Solutions = (i) hardware; (ii) software; and (iii) often cloud based services
capable of analysing and processing “Big Data”
But why? Because by analysing (often via computer algorithm) vast amounts of
information from multiple sources we can gain deep insights into underlying facts
and trends.
The stone age involved man's clever use of crude tools;
the information age, to date, has seen man's crude use of
clever tools.
The “Big Data” Analytical Horizontal Model
[Diagram labels: Social Media; Question; Regular Report; Big Data Storage Platform (Cloud?); Live data feed; Internal structured data; Third party review; Third Party Data; Transaction data; Internal unstructured Data; Algorithm; Search engine; database]
Unforeseen consequences
Plans
Change
BIG Data Talk
Data Protection – Nicola Fulford
IP Rights and Big Data - Jeremy Harris
BREAK
Data Contracts – Paul Hinton
The Solution - Data Process and Management – Chris Wray
NICOLA FULFORD
Privacy & Data Protection Partner
Big Data –
Data Protection
Implications
A vertical model for the data-centric world …
Data Protection Act 1998:
Data which relate to a living individual
who can be identified from those data or from other information in the [controller’s]
possession … including opinions about or intentions in respect of that person
“Personal data is being processed where information is collected and analysed with the intention of distinguishing one individual from another and to take a particular action in respect of an individual … even if no obvious identifiers, such as names or addresses, are held”
Information Commissioner’s Office Guidance
Big Data: And ‘sensitive’ personal data?
Personal data consisting of information as to…
– Religious beliefs
– Commission / alleged commission of an offence or related court proceedings
– Racial / ethnic origin
– Membership of a trade union
– Sexual life
– Political opinions
– Physical or mental health
Big Data Regulation: Examples of personal data
There is a lot of personal data out there:
– location tracking
– facial recognition
– wearables
– social media
– fitness apps
– hospital outcomes
– the Internet of things
– call recordings
– communications metadata
– supermarket loyalty schemes
– cookies
– smart meters
– YouTube etc. videos
– CCTV / ANPR
Big Data: Compliance with the DP principles
– Fair & lawful
– Purpose limitation
– Adequate, relevant and not excessive
– Accurate & up to date
– Retention limits
– Adequate protection outside EEA
– Kept secure
– Comply with subjects’ rights
ICO?
– ICO Code of Practice on Anonymisation
Article 29 Working Party?
– Opinion on Purpose Limitation
European Data Protection Supervisor?
– Opinion on privacy and competitiveness in the age of Big Data
Federal Trade Commission?
– Investigation into data brokers
… Comments by European Digital Chief Neelie Kroes in March 2014
Current status?
– Commission and Parliament have each agreed their own positions and texts
– Council is still discussing its position: nothing will be agreed until everything is agreed
– Next steps: trilogue negotiations
May be agreed end 2014?? / in force from 2017???
Key points that are likely to impact heavily on the use of big data:
– fines of up to €100m or 5% of worldwide turnover
– cookies and IP addresses will constitute personal data
– ‘profiling’ restricted
– extra-territorial effect
– direct obligations on processors
– right to erasure for individuals
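To give a feel for the scale of the proposed fines, the sketch below works through the arithmetic of the "€100m or 5% of worldwide turnover" ceiling. The slide does not say how the two limbs combine; the code assumes the higher of the two applies, and that assumption, along with the function name and the sample turnover figures, is illustrative only.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("100000000")  # EUR 100m limb, taken from the slide
TURNOVER_RATE = Decimal("0.05")       # 5% of worldwide turnover, taken from the slide


def max_fine_eur(worldwide_turnover_eur: Decimal) -> Decimal:
    """Ceiling on a fine, ASSUMING the higher of the two limbs applies."""
    return max(FIXED_CAP_EUR, TURNOVER_RATE * worldwide_turnover_eur)


if __name__ == "__main__":
    # Hypothetical turnover figures, chosen only to show where the 5% limb takes over.
    for turnover in (Decimal("500000000"), Decimal("2000000000"), Decimal("10000000000")):
        print(f"turnover EUR {turnover:,} -> maximum exposure EUR {max_fine_eur(turnover):,}")
```

On this assumption the percentage limb only exceeds the fixed limb once worldwide turnover passes €2bn.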
Big Data and IP
JEREMY HARRIS
The “Big Data” Factory
[Diagram labels: Social Media; Question; Regular Report; Big Data Storage Platform (Cloud?); Live data feed; Internal structured data; Third party review; Third Party Data; Transaction data; Internal unstructured Data; Algorithm; Search engine; database]
Understanding how IP fits in with Big Data means knowing:
what you are getting
from where
when
how
under what circumstances, and
how you’re using it
It’s worth looking at the IP position in terms of input and output data
Different types of data from many sources
Is there a licence in place? Does it matter from an IP perspective?
Not really:
– if there is no licence in place, it doesn’t mean that there are no IP rights in the
data
– even if there is a licence in place, still need to understand the IP position
– breach of licence could constitute infringement of IP, but
– the measure of damages for IP infringement is different to breach of contract
Licence might include indemnity against losses incurred as a result of
infringement of third party IP
– both from licensor to licensee and licensee to licensor
Database Right
Database Copyright
Literary Copyright
Confidence
Is there a qualifying database?
“a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means” (Art 1(2) DD)
Does the right subsist?
“qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents ..” (Art 7(1) DD)
– who created the data?
– remember BHB v William Hill – the ‘investment in obtaining, verifying or presenting’ protects ‘resources used to seek out independent material’
– therefore, resources used for the creation of the data are not protected
– contrast with recent case – Football DataCo v Sportradar
– data collected and recorded at live events - the compiler of this information had little control over it and it was therefore not ‘created’ by that person but merely ‘obtained’ by them
Would use of the data constitute an infringement?
– Extraction or reutilisation of a substantial part (quantitatively or qualitatively); or
– Repeated extraction or reutilisation of insubstantial parts
Substantial part:
‘quantitative’ evaluation - proportion of the volume of data lifted in relation to the total volume of the contents of the database
‘qualitative’ assessment – small part of the database which requires significant human, technical or financial investment, may amount to a substantial part evaluated qualitatively
– look at the ‘scale of investment’ in obtaining, verifying or presenting the part taken
Accordingly, you need to look at
– how much data is being taken
– how often
– how important it is
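The "how much" and "how often" questions can be monitored mechanically. The following is a minimal sketch, assuming each record in the source database has a stable identifier; the class and field names are hypothetical. It reports the proportion taken per extraction run and cumulatively. It sets no threshold, because none is given here, and it does not capture the qualitative limb ("how important it is"), which turns on the investment behind the part taken.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionLog:
    """Tracks how much of a third-party database has been taken, and how often."""
    total_records: int                        # size of the source database
    taken: set = field(default_factory=set)   # distinct record ids extracted so far
    extraction_count: int = 0                 # number of separate extraction runs

    def record(self, record_ids) -> float:
        """Log one extraction run and return the proportion taken in that run alone."""
        ids = set(record_ids)
        self.taken |= ids
        self.extraction_count += 1
        return len(ids) / self.total_records

    def cumulative_proportion(self) -> float:
        """Proportion of the database taken across all runs (distinct records only)."""
        return len(self.taken) / self.total_records


if __name__ == "__main__":
    log = ExtractionLog(total_records=10_000)
    for day in range(5):
        # Each daily run lifts a different block of 200 records.
        share = log.record(range(day * 200, (day + 1) * 200))
        print(f"run {log.extraction_count}: {share:.1%} taken, "
              f"{log.cumulative_proportion():.1%} cumulative")
```

Each run in the example looks small on its own, while the cumulative figure shows how repeated extraction of insubstantial parts adds up.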
Is there a qualifying database?
Same definition as for database right
Does the right subsist?
Protection if, “by reason of the selection or arrangement of the contents of the database the database constitutes the author’s own intellectual creation” (section 3A CDPA)
– what exactly is the ‘author’s own intellectual creation’?
– Football DataCo v Brittens Pools/Yahoo = an original expression of the creative freedom of the author (which is a matter for the national court to determine)
Accordingly:
– concept of ‘selection and arrangement’ does not extend to the creation of the data contained in the database
– there must be some creative ability expressed in an original manner, through free and creative choices as to selection/arrangement
Is there a copyright work?
Original creation? Literary merit?
Skill, labour & judgment - no longer relevant
– the test is whether it constitutes the author’s own intellectual creation (Infopaq)
Infringement:
– would taking the data constitute an infringing act?
– has a substantial part been taken, whether on a quantitative or qualitative basis?
– quality is more important
Are there obligations of confidence in place?
Is the data stated to be confidential?
Website terms?
Circumstances under which a reasonable person would understand it to be confidential?
Is IP being created in the output process?
What are you doing with the data?
– manipulation of the data in a new/different way
– incorporation with other data sets
– creation of master database
– producing reports
Is there any:
– investment in ‘obtaining’, ‘verifying’ or ‘presenting’?
– intellectual creativity in ‘selection and arrangement’?
BREAK
Data Contracts – Paul Hinton
The Solution - Data Process and Management – Chris Wray
Navigating the gap
Contracting for “Big Data” licensed content
PAUL HINTON
COMMERCIAL TECHNOLOGY PARTNER
Licensing requirements
Input data – standard contracts
Output data - key terms
Conclusion
The “Big Data” Factory
[Diagram labels: Social Media; Question; Regular Report; Big Data Storage Platform (Cloud?); Live data feed; Internal structured data; Social Media Data; Public Data; Transaction data; Internal unstructured Data; Algorithm; Search engine; database]
Ensure that all input data can be used for all “Big Data” purposes – ideally (i) assignment; or (ii) a broad, worldwide, irrevocable, perpetual licence to do anything.
Assuming this is not always possible:
– What will the data be used for? By whom? Internal v external?
– Will the data be relied upon for anything?
Even if ‘no’ licence applies:
– access can be made subject to a licence at any time
– once data obtains value this tends to happen
Many standard licences prohibit such use:
– General website terms and conditions
“You are not permitted (except where you have been given express permission to do so) to adapt or modify the Information on this Website or any part of it and the Information or any part of it may not be copied, reproduced, republished, downloaded, stored, databased, posted, broadcast or transmitted in any other way to any third parties for commercial gain.”
– They can be amended at any time.
Input data - standard contracts - social media
Contracts – your use of data
Facebook (FB) terms include…
• Only request data that you need for your application
• Must not include data in any search engine or directory without FB consent
• Cannot include user data in an advertising creative EVEN if user consents
• Cannot transfer data to advertising network
• Cannot sell data
• FB can force you to delete data if your use is “inconsistent with user expectations”
Twitter terms include…
• Not to store non-public user profile data or content
• Cannot use the Twitter API to aggregate geographic location information contained in Twitter Content
• May not use Twitter Content or other data collected from end users to create or maintain a separate status update or social network database or service
• Don’t sell access to the Twitter API or Twitter Content without Twitter’s consent
Pinterest terms include…
• They say very little… for now
• Reflects the limited data flow presently available on this platform
• Includes general restriction on use of “Pinterest Content” – you cannot use, modify, reproduce, distribute, sell, license, or otherwise exploit it without Pinterest’s permission.
Key issue for Licensor is that “Big Data” does not = substitute for original data
– What if data only forms a small part of query?
– What if the enquiry is a “one-off”?
If individual or search fees are charged = complex provisioning system
Input data – Typical licence issues = CHARGES
– Ideally = flat fee for all “Big Data” use
– Charges per individual licensed users or queries
Derived data:
“data of any kind containing Data or any part of it and/or resulting directly or indirectly from the manipulation or analysis of Data (whether generated by human or machine) whether alone or in conjunction with other data regardless of whether or not the Data is in any way identifiable from or within such data by any means;”
Input data – market data – derived data
Standard positions:
– Prohibits distribution of derived data
– Derived data distribution permitted
– Permitted but not “backwards calculable”
How much can “Big Data” be relied upon?
– data itself perhaps not verified if from third parties
– what if created by licensor?
– no remedy/reasonableness – UCTA unreasonable exclusion
Input data – market data – warranties and liability
– No warranties as to accuracy: “as is”, “as available”, “not to be relied upon”
– No IP warranty/indemnity
– Calculated in accordance with specification
– Damages capped at fees
– IP indemnity uncapped
“The Licensee shall permit [ ] to access and monitor the use of the Licensee's
system used to distribute the Data, in order to verify that the use of the Data by
the Licensee is in accordance with this Agreement and that Charges due under
this Agreement have been calculated and paid correctly.”
Confidentiality clauses relevant to other data - insert explicit licence for third
party auditors - but some of these will be competitors.
Input data – market data – audit
– No audit
– Full audit and downstream audit right to licensees
Often delete/purge obligations
Carve-outs:
– reports already sent out?
– record keeping for regulatory requirements?
– record keeping to enable protection of claims?
Input data – financial services market data – Exit
No deletion
Purge all data from
Output Data - Metered Access – key terms
Reflect upstream data input obligations accurately
Avoid sole reliance upon IP rights
– “Contract is king” - Etherton J in At the Races v BHB (2005)
The ideal is to create metered access to data subject to contract at each stage:
– Impose clear rights explicitly drafted as obligations into a contract
– Licence Clauses - set out only what may be done in detail – reserve all other rights
– Contract should be neutral and flexible, adapting where possible by reference to other documents that can change
Post-termination rights – do not presume that termination of the licence will be implied: Regina Glass Fibre v Schuller [1972] FSR 141 – “if the Licensee will not be able to enjoy the benefit of what he has paid for”. Explicitly set out what must be done post-termination – “purge”.
“Big Data” requires significant legal input and a clear legal methodology and lead to ensure compliance and to maximise value
Back to front – understanding outputs and system capability critical before negotiating input agreements
Input data licences are likely to need careful contract review and negotiation
Output data licences will need to be carefully drafted, flexible and updated
If complex licensed rights are asserted to data within the “Big Data Factory”, system capabilities will be needed to ensure “output” compliance: (i) data tracking; (ii) limiting access to or use of data; (iii) attaching terms/attributions to data; and (iv) deleting/removing data – how quickly/efficiently?
Data strategy, policies, process and data management framework – technological solution
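The four system capabilities listed above (tracking, limiting use, attaching terms and deleting) can be pictured as one small data structure. This is a minimal sketch, not a description of any real product: `LicenceTerms`, `DataStore`, the source name "VendorX" and the rule that derived data inherits only the uses permitted by all of its inputs are assumptions made for illustration. Whether derived data must also be purged on exit depends on the licence; the sketch takes the conservative view and ignores carve-outs such as reports already sent out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LicenceTerms:
    source: str                 # who supplied the data
    permitted_uses: frozenset   # e.g. {"internal"}; any other use is refused
    attribution: str = ""       # notice that must travel with the data


@dataclass
class DataStore:
    records: dict = field(default_factory=dict)   # record id -> (value, LicenceTerms)
    lineage: dict = field(default_factory=dict)   # derived id -> set of input ids

    def ingest(self, rid, value, terms):
        """(iii) Attach terms to data at the point of entry."""
        self.records[rid] = (value, terms)

    def derive(self, new_id, input_ids, value):
        """(i) Track lineage; derived data inherits the uses common to all its inputs."""
        if not input_ids:
            raise ValueError("derived data needs at least one input")
        inputs = [self.records[i][1] for i in input_ids]
        uses = frozenset.intersection(*(t.permitted_uses for t in inputs))
        notice = "; ".join(sorted({t.attribution for t in inputs if t.attribution}))
        self.records[new_id] = (value, LicenceTerms("derived", uses, notice))
        self.lineage[new_id] = set(input_ids)

    def read(self, rid, purpose):
        """(ii) Limit use: refuse any purpose the terms do not permit."""
        value, terms = self.records[rid]
        if purpose not in terms.permitted_uses:
            raise PermissionError(f"{rid}: use '{purpose}' is not licensed")
        return value, terms.attribution

    def purge_source(self, source):
        """(iv) Delete a source's records and everything derived from them."""
        doomed = {rid for rid, (_, t) in self.records.items() if t.source == source}
        changed = True
        while changed:  # follow lineage until no further derived records are found
            changed = False
            for derived, inputs in self.lineage.items():
                if derived not in doomed and inputs & doomed:
                    doomed.add(derived)
                    changed = True
        for rid in doomed:
            self.records.pop(rid, None)
            self.lineage.pop(rid, None)
        return doomed


if __name__ == "__main__":
    store = DataStore()
    store.ingest("a", 1, LicenceTerms("VendorX", frozenset({"internal", "external"}), "(c) VendorX"))
    store.ingest("b", 2, LicenceTerms("own", frozenset({"internal"})))
    store.derive("c", ["a", "b"], 3)
    print(store.read("c", "internal"))
    print(sorted(store.purge_source("VendorX")))
```

Holding the terms alongside each record is what makes the "how quickly/efficiently" question answerable: a purge becomes a lookup over lineage instead of a manual search.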
Management and compliance challenges
There’s data – and then there’s big data
CHRIS WRAY
Towards big data management and compliance
Multiple sources provide data
• Public domain
• Market data
• Personal data
• Confidential data
• Licensed data
• Government data
• Employee data
Self-generated and derived data
Multiple operations process data
• 3rd party applications
• “Secret sauce” algo
• Pan enterprise search
• One view of information
• Data “re-purposing”
• Employee data
Multiple purposes use data
• Internal use
• Product dev
• Sales & Marketing
• CRM
• Management
• Finance
• External use
Step 1: Risk Assessment
– structured process to review/assess/report/remediate
– involve all parts of the business
– establish all the types of data your organisation is using & their sources
– where does the data come from? what consents were obtained/are needed?
– what legal wrappers apply to all this data – IPR, contract, regulatory, etc
– what processes do these datasets undergo?
– what does your organisation use this data for?
– data protection environment is reasonably mature – use this as a start point?
Step 2: Strategy Statement
– the start point that everything is referable back to
– high level statement of organisation’s goals relating to big data
– list stakeholders - top management and all parts of the business bought in
– rationale, scope, governance, etc
Step 3: Policy Statement
– next level down
– people context: stakeholder groups, their interests & how they are achieved
– steering group, working party, compliance officers, etc
– project plan – scope, responsibilities, timelines, etc
– state tools to be used (mix of IT/system measures & processes & procedures)
– approvals, etc
Step 4: Processes and Procedures
– applicable to all staff – tie in to HR policies, etc
– proportionate processes & procedures to be followed
– IT system/measures & how they’re to be used
– awareness training
The importance of a logical data model
[Diagram labels: Business Information Model; Logical Data Model; Integration Specific Data Model; Application Specific Data Model; Data Warehouse Specific Data Model; Database Specific Data Model; End to End Scenarios; End to End Processes & Activities; Integration Processes; Computer Independent Model; Platform Independent Model; Platform Specific Model; IT Systems & Components; Private Cloud; Hybrid Cloud; Public Cloud; On Premise]