
Review & AI

Lessons learned while using Artificial Intelligence

April 2013

www.pwc.nl

PwC

Introduction

Relative costs of producing electronic documents

• By activity: collection 8%, processing 19%, review 73%
• By party doing the work: internal 4%, vendors 26%, outside counsel 70%

Source: Nicholas M. Pace and Laura Zakaras, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery.

Slide 3 April 2013 Symposium eDiscovery HvA

Review methodologies

In order of increasing sophistication and efficiency of approach:

• Basic keyword search
• Boolean searching
• Pattern matching
• Clustering (unsupervised machine learning)
• Categorisation (supervised machine learning)
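The first three approaches are simple enough to sketch directly. The snippet below is a minimal illustration in Python, assuming a small in-memory collection; the document texts, the search terms, the phone-number pattern and the function names are hypothetical examples, not taken from the slides.

```python
import re

# Hypothetical mini document collection (id -> text), for illustration only.
DOCS = {
    1: "Please shred the pricing documents before the audit",
    2: "Pricing review meeting moved to Tuesday",
    3: "Call me on 06-12345678 about the deal",
}

def tokens(text):
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def keyword_search(docs, term):
    """Basic keyword search: ids of documents containing the term."""
    return sorted(i for i, text in docs.items() if term.lower() in tokens(text))

def boolean_search(docs, must=(), must_not=()):
    """Boolean search: every `must` term AND none of the `must_not` terms."""
    return sorted(
        i for i, text in docs.items()
        if all(t in tokens(text) for t in must)
        and not any(t in tokens(text) for t in must_not)
    )

def pattern_search(docs, pattern):
    """Pattern matching: ids of documents matching a regular expression."""
    regex = re.compile(pattern)
    return sorted(i for i, text in docs.items() if regex.search(text))

print(keyword_search(DOCS, "pricing"))                               # [1, 2]
print(boolean_search(DOCS, must=["pricing"], must_not=["meeting"]))  # [1]
print(pattern_search(DOCS, r"\b06-\d{8}\b"))                         # [3]
```

Keyword and Boolean search only find documents that use the exact terms someone chose; pattern matching generalises to a family of strings, but someone still has to write the pattern.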

Review methodologies

Clustering

Da Silva Moore v. Publicis Groupe, et al. (February 2012)

Don’t worry about being the guinea pig

‘Computer-assisted review now can be considered judicially-approved for use in appropriate cases.’

Work smarter

‘Computer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases.’

Go faster

‘Computer-assisted review…should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.’

Don’t focus on the black box

‘The idea is not to make [computer-assisted review] perfect, it’s not going to be perfect.’

‘I may be less interested in the science behind the ‘black box’ of the vendor’s software than in whether it produced responsive documents.’

‘Proof of a valid ‘process’, including quality control testing, also will be important.’

Judicially approved

Computer assisted review

Categorisation example

Components of assisted review systems

• Domain expert
• Analytics engine
• Statistical validation

Workflow

• Document universe
• Train the ‘computer’
• Categorise document universe
• Manually review categorised documents
• Validate results
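The slides do not say which analytics engine was used. As a minimal sketch of the ‘train’ and ‘categorise’ steps, assuming a multinomial naive Bayes classifier (a textbook supervised-learning technique swapped in here for illustration, not necessarily the engine used in the cases described), documents coded by the domain expert produce a model that then scores the rest of the document universe. The function names and training texts are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case whitespace tokens; deliberately simple for the sketch."""
    return text.lower().split()

def train(coded_docs):
    """Fit a multinomial naive Bayes model on (text, is_relevant) pairs coded by the expert."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in coded_docs:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    if doc_counts[True] == 0 or doc_counts[False] == 0:
        raise ValueError("training set needs both relevant and not-relevant examples")
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds of 'relevant' versus 'not relevant'; a positive score leans relevant."""
    word_counts, doc_counts, vocab = model
    log_prob = {}
    for label in (True, False):
        # Add-one (Laplace) smoothing stops an unseen word/label pair from zeroing the product.
        denom = sum(word_counts[label].values()) + len(vocab)
        lp = math.log(doc_counts[label] / sum(doc_counts.values()))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training carry no evidence
                lp += math.log((word_counts[label][word] + 1) / denom)
        log_prob[label] = lp
    return log_prob[True] - log_prob[False]

def categorise(model, universe, threshold=0.0):
    """Split {doc_id: text} into (categorised relevant, categorised not relevant) id lists."""
    relevant, not_relevant = [], []
    for doc_id, text in universe.items():
        target = relevant if relevance_score(model, text) > threshold else not_relevant
        target.append(doc_id)
    return relevant, not_relevant

training = [
    ("price fixing agreement with competitor", True),
    ("meeting to fix prices", True),
    ("lunch menu for friday", False),
    ("holiday party schedule", False),
]
model = train(training)
universe = {"d1": "agreement to fix prices", "d2": "friday lunch party"}
print(categorise(model, universe))  # (['d1'], ['d2'])
```

In a real matter the categorised documents would then go to manual review and validation; the threshold here stands in for whatever cut-off the review team settles on.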

How do we address the following ‘concerns’:

• How are the results?
• Training of the ‘computer’
   • Do we need to continue training?
   • Is the ‘expert’ training the system well?
   • Are the training docs representative?
   • How many documents will we need for training?
• Review of documents
   • Which documents should be submitted for review?
   • How do we verify the results?

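Categorisation example

• True positive: categorised relevant, and relevant according to the review
• True negative: categorised not relevant, and not relevant according to the review
• False positive: categorised relevant, but not relevant according to the review
• False negative: categorised not relevant, but relevant according to the review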
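Counting the four outcomes is mechanical once each document has both a categorisation result and a review result. A minimal Python sketch, assuming each document is represented as a hypothetical (categorised_relevant, truly_relevant) pair of booleans; the function names are illustrative. The usage line reproduces the counts of the second worked example on the following slides.

```python
from collections import Counter

def outcome(categorised_relevant, truly_relevant):
    """Classify one document as TP, FP, FN or TN."""
    if categorised_relevant:
        return "TP" if truly_relevant else "FP"
    return "FN" if truly_relevant else "TN"

def confusion_counts(pairs):
    """Count outcomes over (categorised_relevant, truly_relevant) pairs."""
    counts = Counter(outcome(c, t) for c, t in pairs)
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

pairs = ([(True, True)] * 1 + [(True, False)] * 3
         + [(False, True)] * 4 + [(False, False)] * 92)
print(confusion_counts(pairs))  # {'TP': 1, 'FP': 3, 'FN': 4, 'TN': 92}
```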

How are the results?

• Accuracy
• Recall
• Precision
• F-measure

Example 1 (rows: categorisation result; columns: review result, i.e. ‘the truth’)

                            Relevant   Not relevant   Total
Categorised relevant               0              0       0
Categorised not relevant           1             99     100
Total                              1             99     100

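• Accuracy = 99% (99 of 100 documents categorised correctly)
• Recall = 0% (the single relevant document was not found)
• Precision: not defined (no documents were categorised as relevant)
• F-measure: not defined

Accuracy alone is misleading here: by categorising every document as not relevant, the system scores 99% accuracy while finding none of the relevant documents.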

Example 2 (rows: categorisation result; columns: review result, i.e. ‘the truth’)

                            Relevant   Not relevant   Total
Categorised relevant               1              3       4
Categorised not relevant           4             92      96
Total                              5             95     100

• Accuracy = (1 + 92) / 100 = 93%
• Recall = 1 / 5 = 20%
• Precision = 1 / 4 = 25%
• F-measure = 2(0.25 × 0.20) / (0.25 + 0.20) ≈ 22%

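• Accuracy: the share of all documents that were categorised correctly (true positives plus true negatives, divided by all documents)
• Recall: the share of the truly relevant documents that were categorised as relevant
• Precision: the share of the documents categorised as relevant that are truly relevant
• F-measure: calculated as the harmonic mean of recall (R) and precision (P), F = 2(P × R) / (P + R)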
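These definitions translate directly into code. A minimal Python sketch (the function name `review_metrics` is hypothetical) that reproduces the two worked examples; it returns None for a measure whose denominator is zero, such as precision when no document was categorised as relevant.

```python
def review_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F-measure from confusion-matrix counts.

    A measure is None when its denominator is zero (undefined).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    if recall is None or precision is None:
        f_measure = None
    elif precision + recall == 0:
        f_measure = 0.0  # common convention when nothing relevant was found
    else:
        # Harmonic mean of precision and recall: F = 2PR / (P + R).
        f_measure = 2 * (precision * recall) / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}

# Example 1: nothing categorised relevant, one relevant document missed.
print(review_metrics(tp=0, fp=0, fn=1, tn=99))
# Example 2: 1 true positive, 3 false positives, 4 false negatives, 92 true negatives.
print(review_metrics(tp=1, fp=3, fn=4, tn=92))
```

Because the F-measure is a harmonic mean, it stays low whenever either precision or recall is low, which is why it exposes the weak result in Example 2 that the 93% accuracy hides.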

Do we need to continue training?

• Intuition?
• Objective training optimisation criterion
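The slides do not define the objective criterion. One simple possibility, offered here purely as an assumption, is to measure the F-measure (or recall) on a fixed, expert-coded control sample after each training round and stop once the score has stopped moving: each of the last `patience` round-to-round changes is smaller than `tolerance`. The function names, both parameters, their defaults and the scores are hypothetical.

```python
def training_has_stabilised(scores, tolerance=0.01, patience=2):
    """True once each of the last `patience` round-to-round changes is below `tolerance`.

    scores: one quality score per training round (e.g. F-measure on a fixed control sample).
    """
    if len(scores) < patience + 1:
        return False  # not enough rounds yet to judge stability
    recent = scores[-(patience + 1):]
    return all(abs(later - earlier) < tolerance
               for earlier, later in zip(recent, recent[1:]))

def first_stable_round(scores, tolerance=0.01, patience=2):
    """1-based number of the first round after which training could stop, or None."""
    for n in range(1, len(scores) + 1):
        if training_has_stabilised(scores[:n], tolerance, patience):
            return n
    return None

# Illustrative (made-up) F-measure per training round on a control sample.
scores = [0.40, 0.62, 0.70, 0.705, 0.708]
print(first_stable_round(scores))  # 5
```

Requiring several small changes in a row, rather than one, guards against stopping on a single lucky round.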

Which docs should be reviewed?

Manually review categorised documents
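The slides mention a cut-off point (confirmed or modified by the attorney during quality assurance) but not how it is set. A minimal sketch, assuming the categorisation gives every document a numeric relevance score and that the cut-off is chosen to reach a target recall on an expert-coded sample; the function names, the scores and the 75% target are hypothetical.

```python
import math

def split_by_cutoff(scored_docs, cutoff):
    """Split (doc_id, score) pairs into the 'review' set (score >= cutoff) and the 'not review' set."""
    review = [doc_id for doc_id, score in scored_docs if score >= cutoff]
    not_review = [doc_id for doc_id, score in scored_docs if score < cutoff]
    return review, not_review

def cutoff_for_target_recall(coded_sample, target_recall):
    """Highest cut-off that keeps at least `target_recall` of the relevant docs in a coded sample.

    coded_sample: (score, is_relevant) pairs reviewed by the expert.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    relevant_scores = sorted((s for s, rel in coded_sample if rel), reverse=True)
    if not relevant_scores:
        return None  # nothing relevant in the sample, so recall is undefined
    needed = max(1, math.ceil(target_recall * len(relevant_scores)))
    return relevant_scores[needed - 1]

sample = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.2, False), (0.1, True)]
cutoff = cutoff_for_target_recall(sample, target_recall=0.75)
print(cutoff)                                                          # 0.4
print(split_by_cutoff([("a", 0.9), ("b", 0.3), ("c", 0.4)], cutoff))  # (['a', 'c'], ['b'])
```

Raising the target recall lowers the cut-off and sends more documents to manual review; that trade-off is what the attorney weighs when confirming or moving the cut-off.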

How do we verify the results?

Quality assurance provides transparent verification of the generated results and is a key component of the computer assisted review process.

Quality assurance:

• A random sample of the ‘not review’ docs
• Size of sample based on level of statistical confidence
• Review of sample set by attorney
• Calculation of recall and precision within ‘not review’ docs
• Attorney can confirm or modify cut-off point
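The slides do not give the sampling formula. A common choice, assumed here, is the normal-approximation sample size for estimating a proportion, n0 = z² · p(1 - p) / e², with p = 0.5 as the most conservative guess and an optional finite-population correction. Recall can then be estimated from the share of relevant documents found in the ‘not review’ sample (often called elusion). The function names, the margins and the counts in the usage lines are illustrative assumptions, and the sketch covers the recall estimate only, not precision.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin=0.02, population=None, p=0.5):
    """Docs to sample so an estimated proportion is within +/- margin at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n = n * population / (n + population - 1)  # finite-population correction
    return math.ceil(n)

def estimated_recall(relevant_in_review_set, relevant_in_sample, sample_n, not_review_count):
    """Estimate recall from the elusion rate observed in a random 'not review' sample."""
    elusion = relevant_in_sample / sample_n
    estimated_missed = elusion * not_review_count
    return relevant_in_review_set / (relevant_in_review_set + estimated_missed)

print(sample_size(0.95, 0.05))                          # 385
print(sample_size(0.95, 0.02, population=1_300_000))    # 2397
print(round(estimated_recall(900, 2, 400, 10_000), 3))  # 0.947
```

The finite-population correction barely matters for a set of 1.3m documents, so the required sample is driven almost entirely by the chosen confidence level and margin of error.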

Real-life examples

Large Second Request

• Chose not to review 1.3m docs based on a 95% confidence level
• It took only 4 rounds of human review to stabilise at a 95% confidence level
• Statistics were also used to evaluate the human review; categorisation was found to be four times as accurate as the human review team

Bankruptcy Case

• 2m docs categorised after review of 1,500 docs

Civil Litigation

• 200k docs reviewed and produced in 2 days

Corruption Investigation

• Used on a subset of ~53k docs
• 97% of the docs were coded after review of the first sample, which was ~3% of the population
• After the first sample, humans agreed with the categorisation 87% of the time
• 91% of the overturned documents were exact duplicates or near-duplicates (90% similar)

© 2012 PwC. All rights reserved. Not for further distribution without the permission of PwC. "PwC" refers to the network of member firms of PricewaterhouseCoopers International Limited (PwCIL), or, as the context requires, individual member firms of the PwC network. Each member firm is a separate legal entity and does not act as agent of PwCIL or any other member firm. PwCIL does not provide any services to clients. PwCIL is not responsible or liable for the acts or omissions of any of its member firms nor can it control the exercise of their professional judgment or bind them in any way. No member firm is responsible or liable for the acts or omissions of any other member firm nor can it control the exercise of another member firm's professional judgment or bind another member firm or PwCIL in any way.
