
(1)

15TH JANUARY 2014

Richard Kemp, Paul Hinton, Jeremy Harris

There’s data – and then there’s big data

(2)

IT is one of four technologies that will shape future global developments:

• “information technology is entering the big data era”

• “process power and data storage are becoming almost free”

• “networks and the cloud will provide global access and pervasive services”

• “social media and cybersecurity will be large new markets”

– US National Intelligence Council’s December 2012 Report

(3)

“Since modern data solutions have emerged, big data sets have grown exponentially in size. At the same time, the building blocks of knowledge discovery, and the software tools and best practices available to organisations that handle big data sets, have not kept pace with such growth. So a large – and rapidly growing – gap exists between the amount of data that organisations can accumulate and their abilities to leverage those data in a way that is useful” (NIC Report, p. 85/6)

The impact of big data is all about knowing your customer and the competitive advantage that confers

Technology focus: Data solutions

• Current status: Large data sorting and analysis is applied in various large industries, but the quantity of data accumulating is outstripping the ability of systems to leverage it efficiently.

• Potential for 2030: As software and hardware developments continue, new solutions will emerge to allow considerably more data to be collected, analysed and acted on.

• Issues: The greatest areas of uncertainty are the speed with which big data can be usefully and securely utilised by organisations.

• Impact: Opportunities for commercial organisations and governments to “know” their customers better will increase. These customers may object to the collection of so much data.

(4)
(5)
(6)
(7)

… in the context of organisations’ big data operations …

1. Input data from multiple sources

• public domain

• market data

• social media

• personal data

• confidential data

• licensed data

• government data

• employee data

• self-generated and derived data

(8)


2. Processing operations

• third party applications

• ‘secret sauce’ algo

• pan enterprise search

• ‘one view’ of information

• data ‘re-purposing’


(9)


3. Output data for multiple purposes

– for internal use

• product development

• sales & mktg

• CRM

• management

• finance

– for external use


(10)


Big Data and IP

(11)

The “Big Data” Factory

[Diagram elements: social media question; regular report; big data storage platform (cloud?); live data feed; internal structured data; third party review; third party data; transaction data; internal unstructured data; algorithm; search engine database.]

(12)

Understanding how IP fits in with Big Data means knowing:

• what you are getting

• from where

• when

• how

• under what circumstances, and

• how you’re using it

It’s worth looking at the IP position in terms of input and output data

(13)

• Different types of data from many sources

• Is there a licence in place? Does it matter from an IP perspective?

• Not really:

– if there is no licence in place, it doesn’t mean that there are no IP rights in the data

– even if there is a licence in place, you still need to understand the IP position

– breach of licence could constitute infringement of IP, but

– the measure of damages for IP infringement is different from that for breach of contract

• Licence might include indemnity against losses incurred as a result of infringement of third party IP

– both from licensor to licensee and licensee to licensor

(14)

• Database Right

• Database Copyright

• Literary Copyright

• Confidence

(15)

Is there a qualifying database?

• “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means” (Art 1(2) Database Directive (DD))

Does the right subsist?

• “qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents …” (Art 7(1) DD)

– who created the data? Remember BHB v William Hill: the ‘investment in obtaining, verifying or presenting’ protects the ‘resources used to seek out independent material’

– therefore, resources used for the creation of the data are not protected; contrast the recent case of Football DataCo v Sportradar

– data collected and recorded at live events: the compiler of this information had little control over it, so it was not ‘created’ by that person but merely ‘obtained’ by them

(16)

Would use of the data constitute an infringement?

• Extraction or reutilisation of a substantial part (quantitatively or qualitatively); or

• Repeated and systematic extraction or reutilisation of insubstantial parts (Art 7(5) DD)

Substantial part:

• ‘quantitative’ evaluation – proportion of the volume of data lifted in relation to the total volume of the contents of the database

• ‘qualitative’ assessment – a small part of the database which required significant human, technical or financial investment may amount to a substantial part evaluated qualitatively

– look at the ‘scale of investment’ in obtaining, verifying or presenting the part taken

• Accordingly, you need to look at:

– how much data is being taken

– how often

– how important it is
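The “how much / how often” questions can be backed by simple record keeping. What follows is a minimal sketch in Python, assuming an organisation logs each pull from a third-party database as a record count; `ExtractionLog`, `needs_legal_review` and all thresholds are hypothetical. The figures only support the quantitative evaluation and the repeated-extraction check: whether a part is substantial is a legal judgment, and the qualitative importance of what is taken is not captured by any count.

```python
# Hypothetical sketch: logging extractions so the quantitative questions can be answered.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExtractionLog:
    """Hypothetical log of what has been taken from one third-party database."""

    database_size: int  # total records in the source database (assumed known)
    extractions: list = field(default_factory=list)  # (date, records_taken) pairs

    def record(self, when: date, records_taken: int) -> None:
        self.extractions.append((when, records_taken))

    def share_taken(self) -> float:
        """How much: cumulative records extracted / total database volume.

        Records pulled more than once are counted each time they are pulled.
        """
        total = sum(n for _, n in self.extractions)
        return total / self.database_size

    def small_pulls(self, small_share: float) -> int:
        """How often: number of extractions individually below `small_share`."""
        return sum(1 for _, n in self.extractions
                   if n / self.database_size < small_share)


def needs_legal_review(log: ExtractionLog, share_threshold: float = 0.05,
                       small_share: float = 0.01, repeat_threshold: int = 10) -> bool:
    """Illustrative escalation rule; the thresholds are assumptions, not legal tests."""
    return (log.share_taken() >= share_threshold
            or log.small_pulls(small_share) >= repeat_threshold)


if __name__ == "__main__":
    log = ExtractionLog(database_size=10_000)
    for day in range(1, 13):
        log.record(date(2014, 1, day), 50)  # twelve small daily pulls of 50 records
    print(log.share_taken())        # 0.06
    print(needs_legal_review(log))  # True
```

A flag here is a prompt to ask the legal questions on this slide, not an answer to them.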

(17)

Is there a qualifying database?

• Same definition as for database right

Does the right subsist?

• Protection if, “by reason of the selection or arrangement of the contents of the database the database constitutes the author’s own intellectual creation” (section 3A CDPA)

– what exactly is the ‘author’s own intellectual creation’?

– Football DataCo v Brittens Pools/Yahoo: an original expression of the creative freedom of the author (which is a matter for the national court to determine)

• Accordingly:

– the concept of ‘selection and arrangement’ does not extend to the creation of the data contained in the database

– the author must express their creative ability in an original manner by making free and creative choices as to selection/arrangement

(18)

Is there a copyright work?

• Original creation?

• Literary merit?

• Skill, labour & judgment – no longer relevant

– the test is whether it constitutes the author’s own intellectual creation (Infopaq)

• Infringement:

– would taking the data constitute an infringing act?

– has a substantial part been taken, whether on a quantitative or qualitative basis? Quality is more important

(19)

• Are there obligations of confidence in place?

• Is the data stated to be confidential?

• Website terms?

• Circumstances under which a reasonable person would understand it to be confidential?

(20)

Is IP being created in the output process?

• What are you doing with the data?

– manipulation of the data in a new/different way

– incorporation with other data sets

– creation of master database

– producing reports

• Is there any:

– investment in ‘obtaining’, ‘verifying’ or ‘presenting’?

– intellectual creativity in ‘selection and arrangement’?

(21)


Navigating the gap

Contracting for “Big Data” content

(22)

• Licensing requirements

• Input data – standard contracts

• FS “market data” contract terms

• Output data – key terms

• Conclusion

(23)

The “Big Data” Factory

[Diagram elements: social media question; regular report; big data storage platform (cloud?); live data feed; internal structured data; social media data; public data; transaction data; internal unstructured data; algorithm; search engine database.]

(24)

Ensure that all input data can be used for all “Big Data” purposes – ideally (i) assignment; or (ii) a broad worldwide, irrevocable, perpetual licence to do anything.

Assuming this is not always possible:

– What will the data be used for? By whom? Internal v external?

– Will the data be relied upon for anything?

(25)

Licensing requirements – Input data

[Diagram labels: Plans; Change]

(26)

• Even if ‘no’ licence applies, access can be made subject to a licence at any time – once data acquires value, this tends to happen

• Many standard licences prohibit such use, e.g. general website terms and conditions:

“You are not permitted (except where you have been given express permission to do so) to adapt or modify the Information on this Website or any part of it and the Information or any part of it may not be copied, reproduced, republished, downloaded, stored, databased, posted, broadcast or transmitted in any other way to any third parties for commercial gain.”

– They can be amended at any time.

(27)

Input data – standard contracts – social media

Contracts – your use of data

Facebook – terms include…

• Only request data that you need for your application

• Must not include data in any search engine or directory without FB consent

• Cannot include user data in an advertising creative EVEN if user consents

• Cannot transfer data to advertising network

• Cannot sell data

• FB can force you to delete data if your use is “inconsistent with user expectations”

Twitter – terms include…

• Not to store non-public user profile data or content

• Cannot use the Twitter API to aggregate geographic location information contained in Twitter Content

• May not use Twitter Content or other data collected from end users to create or maintain a separate status update or social network database or service

• Don’t sell access to the Twitter API or Twitter Content without Twitter’s consent

Pinterest – terms include…

• They say very little… for now

• Reflects the limited data flow presently available on this platform

• Includes general restriction on use of “Pinterest Content” – you cannot use, modify, reproduce, distribute, sell, license, or otherwise exploit it without Pinterest’s permission.

(28)

• Financial Services “Market Data” = a mature data licensing market – licensor maximises control of data and income from downstream use

– “Historic” data v “Live” data

– “Internal” licence v external “Distribution” licence and charge

• Licensor controls any directly competing redistribution and requires a direct licence or identikit sub-licence

• Use for trading on a platform/analysing v creating a separate tradeable product

• Licences do not generally enable broad ‘Big Data’ use per se, but new licences and charging mechanisms are developing

(29)

Input data – financial services market data – licence and charges

• Ideally = flat fee or easily calculable fee for “Big Data” use

• Key issue for Licensor is that “Big Data” output must not become a substitute for the original data

• Typical charges based upon individual traceable licensed users (“charges per user”)

– What if upstream data only forms a small part of a “Big Data” query?

– What if the enquiry is a “one-off”?

– Can use be tracked?
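The questions above (a small share of a query, one-off enquiries, tracking) are partly engineering questions. A minimal Python sketch, assuming each query is logged with the user who ran it and the number of records drawn from each licensed source; `QueryEvent`, `UsageMeter` and the source names are hypothetical. It contrasts the per-user count that market data charges are typically based on with the share of data actually consumed, which is one way to evidence that upstream data formed only a small part of a query.

```python
# Hypothetical sketch: metering "Big Data" queries per user and per licensed source.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """One 'Big Data' query as a hypothetical usage meter might log it."""

    user_id: str
    records_by_source: dict  # licensed source name -> records drawn from it


class UsageMeter:
    def __init__(self) -> None:
        self.events: list = []

    def log(self, event: QueryEvent) -> None:
        self.events.append(event)

    def users_per_source(self) -> dict:
        """Distinct users who drew on each source: the basis of 'charges per user'."""
        users = defaultdict(set)
        for event in self.events:
            for source, n in event.records_by_source.items():
                if n > 0:
                    users[source].add(event.user_id)
        return {source: len(ids) for source, ids in users.items()}

    def source_share(self, source: str) -> float:
        """Share of all records consumed that came from `source` (a pro-rata view)."""
        total = sum(sum(e.records_by_source.values()) for e in self.events)
        used = sum(e.records_by_source.get(source, 0) for e in self.events)
        return used / total if total else 0.0


if __name__ == "__main__":
    meter = UsageMeter()
    meter.log(QueryEvent("alice", {"vendor_prices": 5, "internal_crm": 995}))
    meter.log(QueryEvent("bob", {"internal_crm": 1000}))
    print(meter.users_per_source())             # {'vendor_prices': 1, 'internal_crm': 2}
    print(meter.source_share("vendor_prices"))  # 0.0025
```

Which of the two figures a licensor will accept as the charging basis is a commercial point for negotiation; the sketch only shows that both can be produced if queries are logged.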

(30)

• Derived data:

“data of any kind containing Data or any part of it and/or resulting directly or indirectly from the manipulation or analysis of Data (whether generated by human or machine) whether alone or in conjunction with other data regardless of whether or not the Data is in any way identifiable from or within such data by any means;”

• Standard positions:

– prohibited distribution of derived data

– permitted only if no part of the original data is shown or backwards calculable

– derived data owned by the original licensor, with use reported/tracked, licensed and charged
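Whether derived data is “backwards calculable” can sometimes be checked mechanically before release. A minimal Python sketch, assuming derived figures are averages over identified licensed data points; `DerivedDataGate` and the `min_contributors` rule are hypothetical. It suppresses averages over too few inputs, and averages whose input set differs only slightly from an earlier published one, because the difference would reveal the values that changed. It compares pairs of releases only, so it illustrates the problem rather than guaranteeing that the original Data cannot be recovered.

```python
# Hypothetical sketch: suppressing derived averages that would expose licensed data points.
from statistics import mean
from typing import Dict, FrozenSet, List, Optional


class DerivedDataGate:
    """Hypothetical check applied before a derived figure is distributed."""

    def __init__(self, min_contributors: int = 5) -> None:
        self.min_contributors = min_contributors
        self.published_groups: List[FrozenSet[str]] = []

    def publish_average(self, data: Dict[str, float]) -> Optional[float]:
        """Return the mean of licensed data points, or None if it should be suppressed.

        `data` maps licensed data-point IDs to values.
        """
        group = frozenset(data)
        if len(group) < self.min_contributors:
            return None  # e.g. an average of one value *is* that value
        for previous in self.published_groups:
            # Averages over two groups that differ by only a few members let a
            # reader solve for those members ("differencing").
            if 0 < len(group ^ previous) < self.min_contributors:
                return None
        self.published_groups.append(group)
        return mean(data.values())


if __name__ == "__main__":
    gate = DerivedDataGate(min_contributors=3)
    print(gate.publish_average({"a": 1.0, "b": 2.0}))                       # None
    print(gate.publish_average({"a": 1.0, "b": 2.0, "c": 3.0}))             # 2.0
    print(gate.publish_average({"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}))  # None
```

The last call is refused because, with the earlier average of 2.0 over three points, a reader could compute d = 4 × 4.0 − 3 × 2.0 = 10.0.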


(31)

• How much can “Big Data” be relied upon?

• Warranties as to accuracy: “as is”, “as available”, “not to be relied upon”

– data itself perhaps not verified if from third parties

– what if created by licensor?

– what if derived by calculation by licensor? The licensor can warrant that the calculation is accurate

– no remedy/reasonableness

(32)

“display the [Trade Marks] at all times in accordance with the Permitted Distribution Policy solely in connection with the grant of licence in clause [ ].”

• practical or possible?

• will data be individually discernible?

(33)

“The Licensee shall permit [ ] to audit and inspect:

– the Licensee's accounts, records and other information and permit it to take copies or extracts and on demand supply copies to [ ] of such information;

– any information in the Licensee's control that relate to any Subscriber;

– access to and monitor the use of the Licensee's system used to distribute the Data, in order to verify that the use of the Data by the Licensee is in accordance with this Agreement and that Charges due under this Agreement have been calculated and paid correctly.”

• Is this possible/practical/desirable?

• Confidentiality clauses relevant to other data – insert an explicit licence for third party auditors – but some of these will be competitors.

• Alternative and lesser obligations

(34)

• Often delete/purge obligations:

“…the Licensee must stop using the Data and the Trade Marks and purge its systems of all Data”

• Is this possible?

• Also:

– what about reports already sent out?

– record keeping for regulatory requirements?

– record keeping to enable protection of claims?

(35)

Output Data – Metered Access – key terms

• Reflect upstream data input obligations accurately

• Avoid sole reliance upon IP rights

– “Contract is king” – Etherton J in At the Races v BHB (2005)

• The ideal is to create metered access to data subject to contract at each stage:

– Impose clear rights explicitly drafted as obligations into a contract

– Licence clauses – set out in detail only what may be done – reserve all other rights

– Contract neutral and able to adapt flexibly, where possible by reference to other documents that can change

• Post-termination rights – do not presume the licence will be implied to terminate: Regina Glass Fibre v Schuller [1972] FSR 141 – “if the Licensee will not be able to enjoy the benefit of

(36)

• “Big Data” requires significant legal input, and a clear legal methodology and lead, to ensure compliance and to maximise value

• Back to front – understanding outputs and system capability is critical before negotiating input agreements

• Input data licences are likely to need careful contract review and negotiation

• Output data licences will need to be carefully drafted, flexible and updated

• If complex licensed rights are asserted over data within the “Big Data Factory”, system capabilities will be needed to ensure “output” compliance: (i) data tracking; (ii) limiting access to or use of data; (iii) attaching terms/attributions to data; and (iv) deleting/removing data – how quickly/efficiently?

• Data strategy, policies, process and data management framework – technological solution
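The four system capabilities listed above can be sketched as a data store in which each record carries the terms of the licence it arrived under. This is a minimal Python sketch under that assumption; `Licence`, `Record`, `BigDataStore` and the purpose labels are hypothetical, and a real “Big Data Factory” would need the same controls across every copy, index and derived data set.

```python
# Hypothetical sketch: a store where every record carries its source licence terms.
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Licence:
    """Hypothetical summary of the terms attached to one input data source."""

    source: str
    permitted_uses: Set[str]  # e.g. {"internal_analytics"}
    attribution: str = ""  # text that must accompany any output using this source


@dataclass
class Record:
    record_id: str
    payload: dict
    licence: Licence  # (i) tracking: each record knows where it came from


class BigDataStore:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def ingest(self, record: Record) -> None:
        self._records[record.record_id] = record

    def use(self, record_id: str, purpose: str) -> dict:
        """(ii) limit use: refuse any purpose the source licence does not permit."""
        rec = self._records[record_id]
        if purpose not in rec.licence.permitted_uses:
            raise PermissionError(
                f"{purpose!r} is not licensed for source {rec.licence.source!r}")
        return rec.payload

    def export(self, record_ids: List[str], purpose: str) -> dict:
        """(iii) attach terms: the output lists the attribution of every source used."""
        data = [self.use(rid, purpose) for rid in record_ids]
        attributions = sorted({self._records[rid].licence.attribution
                               for rid in record_ids
                               if self._records[rid].licence.attribution})
        return {"data": data, "attributions": attributions}

    def purge_source(self, source: str) -> int:
        """(iv) delete: remove every record from one source; returns how many went."""
        doomed = [rid for rid, rec in self._records.items()
                  if rec.licence.source == source]
        for rid in doomed:
            del self._records[rid]
        return len(doomed)


if __name__ == "__main__":
    vendor = Licence("vendor_feed", {"internal_analytics"}, "Source: Vendor")
    crm = Licence("crm", {"internal_analytics", "external_report"})
    store = BigDataStore()
    store.ingest(Record("r1", {"price": 10}, vendor))
    store.ingest(Record("r2", {"customer": "x"}, crm))
    print(store.export(["r1", "r2"], "internal_analytics")["attributions"])  # ['Source: Vendor']
    print(store.purge_source("vendor_feed"))  # 1
```

Purging here only removes records from this one store: backups, downstream systems and reports already sent out still need separate handling.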


(37)

Richard Kemp

Big data – regulatory aspects

information management

CCF

(38)
(39)

Different legal areas where regulatory duties around data arise: 1 – generic

• Data Protection

– UK’s ICO has powers to fine up to £500,000

– current progress of draft Data Protection Regulation

– LIBE European Parliament Committee compromise amendments approved on 21 October 2013

– At the moment, fines of up to the greater of €100m or 5% of worldwide turnover

– Likely in force date 2015

– Most enterprise and many SME orgs have in place formal data protection compliance policies & processes

– can these be used as the basis for ‘big data’ compliance?

(40)

Different legal areas where regulatory duties around data arise: 2 – sector specific

Financial services

• MiFID equity trading rules – pre- & post-trade data – transaction reporting

• Market data

– source/exchange rules

• MiFID II – will extend to other asset classes

• Market Abuse, Capital Adequacy Directive requirements

– alleged LIBOR, forex market manipulation

Air Transport Industry (ATI)

• Passenger Name Record (PNR) data

• Fares data

– GDS (Amadeus, Worldspan, etc)

– Airline websites

• Mobile check in, etc

(41)

Different legal areas where regulatory duties around data arise: 2 – sector specific

Professional services

• e.g. legal services

• regulatory client confidentiality rules

• rules on conflicts of interest

• privilege rules

– litigation privilege

– legal professional privilege

Healthcare

• clinical outcomes data

– aggregated and anonymised?

• sensitive personal data

(42)

Different legal areas where regulatory duties around data arise: 3 – competition law

• Articles 101 and 102 TFEU (Chapters I & II Competition Act 1998)

– Competition authorities becoming more vigilant around commercial practices for supplying and licensing data

– Article 101/Chapter I concerned with anti-competitive agreements

– Article 102/Chapter II concerned with abusive conduct by organisations with market power

– Two cases around securities identifiers in the financial services area: S&P & CUSIPs; Thomson Reuters and RICs

– Markit & CDSs (Credit Default Swaps)

(43)
(44)

• step 1: risk assessment

– structured process to review/assess/report/remediate

– involve all parts of the business

– establish all the types of data your organisation is using & their sources

– where does the data come from? what consents were obtained/are needed?

– what legal wrappers apply to all this data – IPR, contract, regulatory, etc

– what processes do these data undergo?

– what does your organisation use these data for?

– data protection environment is reasonably mature – use this as a start point?
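Step 1 largely amounts to building a data register. A minimal Python sketch, assuming one entry per data set with the step 1 questions as fields; `DataAsset`, `open_questions` and the gap rules are hypothetical illustrations of the review/assess/report/remediate cycle, not a compliance standard (for example, consent is only one possible basis for processing personal data).

```python
# Hypothetical sketch: a step-1 data register and a report of open questions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAsset:
    """One row of a hypothetical step-1 data register."""

    name: str
    source: str  # where does the data come from?
    consents: List[str] = field(default_factory=list)  # consents obtained
    legal_wrappers: List[str] = field(default_factory=list)  # e.g. "IPR", "contract"
    processes: List[str] = field(default_factory=list)  # processes the data undergoes
    uses: List[str] = field(default_factory=list)  # what the organisation uses it for
    contains_personal_data: bool = False


def open_questions(register: List[DataAsset]) -> List[str]:
    """Flag gaps in the register for follow-up (the 'report/remediate' part)."""
    issues = []
    for asset in register:
        if not asset.legal_wrappers:
            issues.append(f"{asset.name}: legal wrappers (IPR, contract, regulatory) not identified")
        if asset.contains_personal_data and not asset.consents:
            issues.append(f"{asset.name}: personal data with no recorded consents")
        if not asset.uses:
            issues.append(f"{asset.name}: no recorded use")
    return issues


if __name__ == "__main__":
    register = [
        DataAsset("web clicks", "own website", uses=["marketing"],
                  contains_personal_data=True),
        DataAsset("prices", "vendor feed", legal_wrappers=["contract", "IPR"],
                  uses=["trading"]),
    ]
    for issue in open_questions(register):
        print(issue)
```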

(45)

• step 2: strategy statement

– the start point that everything is referable back to

– high level statement of organisation’s goals relating to big data

– list stakeholders – top management and all parts of the business bought in

– rationale, scope, governance, etc

(46)

• step 3: policy statement

– next level down

– people context: stakeholder groups, their interests & how they are achieved

– steering group, working party, compliance officers, etc

– project plan – scope, responsibilities, timelines, etc

– state tools to be used (mix of IT/system measures & processes & procedures)

– approvals, etc

(47)

• step 4: processes and procedures

– applicable to all staff – tie in to HR policies, etc

– proportionate processes & procedures to be followed

– IT system/measures & how they’re to be used

– awareness training

(48)

Our People

Paul Hinton, Commercial Partner, 020 7710 1623

Jeremy Harris, Commercial Partner, 020 7710 1658

Richard Kemp, Senior Partner, 020 7710 1610

(49)

(50)

If you would like to attend either of the following events, please contact us:

– 5th February – HR Forum – Tweaking your business for success in 2014

– 2nd April – There’s data…and then there’s big data (re-run)

We will also be running events on the following topics over the next couple of months – we will contact you with further information:

– Cloud

– Data Protection

(51)

Kemp Little LLP is a limited liability partnership registered in England and Wales (registered number: OC300242) and is authorised and regulated by the Solicitors Regulation Authority. Its registered office is Cheapside House, 138 Cheapside, London EC2V 6BJ. A list of members is open to inspection at the registered office.

KEMP LITTLE Cheapside House 138 Cheapside London EC2V 6BJ TEL +44 (0) 20 7600 8080 FAX +44 (0) 20 7600 7878 — kemplittle.com

Contact info

