Review & AI
Lessons learned while using
Artificial Intelligence
April 2013
www.pwc.nl
PwC
Introduction
Relative costs of producing electronic documents
• Collection 8%, Processing 19%, Review 73%
• Internal 4%, Vendors 26%, Outside counsel 70%
Source: Where the money goes: understanding litigant expenditures for producing electronic discovery / Nicholas M. Pace, Laura Zakaras.
Slide 3 April 2013 Symposium eDiscovery HvA
Review methodologies
In order of sophistication & efficiency of approach:
• Basic keyword search
• Boolean searching
• Pattern matching
• Clustering (unsupervised machine learning)
• Categorisation (supervised machine learning)
Review methodologies
Clustering
Da Silva Moore v. Publicis Groupe, et al. (February 2012)
• Don’t worry about being the guinea pig
‘Computer-assisted review now can be considered judicially-approved for use in appropriate cases.’
• Work smarter
‘Computer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases.’
• Go Faster
‘Computer-assisted review…should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.’
• Don’t focus on the black box
‘The idea is not to make [computer-assisted review] perfect, it’s not going to be perfect.’
‘I may be less interested in the science behind the ‘black box’ of the vendor’s software than in whether it produced responsive documents.’
‘Proof of a valid ‘process’, including quality control testing, also will be important.’
Judicially approved
Computer assisted review
Categorisation example
Components of assisted review systems
• Domain expert
• Analytics engine
• Statistical validation
Workflow
• Document universe
• Train the ‘computer’
• Categorise document universe
• Manually review categorised documents
• Validate results
How do we address the following ‘concerns’:
• How are the results?
• Training of the ‘computer’
  • Do we need to continue training?
  • Is the ‘expert’ training the system well?
  • Are the training docs representative?
  • How many documents will we need for training?
• Review of documents
  • Which documents should be submitted for review?
  • How do we verify the results?
Computer assisted review
Categorisation example
• True Positive
• True Negative
• False Positive
• False Negative
Workflow
How are the results?
• Accuracy
• Recall
• Precision
• F-measure
Categorisation result (rows) against review result, ‘the truth’ (columns):

                Relevant   Not Relevant   Total
Relevant               0              0       0
Not Relevant           1             99     100
Total                  1             99     100

• Accuracy = 99%
• Recall = 0%
• Precision
• F-measure
Categorisation result (rows) against review result, ‘the truth’ (columns):

                Relevant   Not Relevant   Total
Relevant               1              3       4
Not Relevant           4             92      96
Total                  5             95     100

• Accuracy = 93%
• Recall = 20%
• Precision = 25%
• F-measure
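The figures on the two example slides can be reproduced from the four cells of each table. The following is a minimal Python sketch and not part of the original presentation; the function name `review_metrics` and its argument names are illustrative assumptions. A ratio with a zero denominator is reported as `None`, which is one way to read the missing precision value in the first example, where nothing was categorised as relevant.

```python
from typing import Optional


def review_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, recall and precision from the four confusion-matrix cells.

    tp: categorised relevant, reviewed relevant (true positive)
    fp: categorised relevant, reviewed not relevant (false positive)
    fn: categorised not relevant, reviewed relevant (false negative)
    tn: categorised not relevant, reviewed not relevant (true negative)
    """
    total = tp + fp + fn + tn

    def ratio(num: int, den: int) -> Optional[float]:
        # A ratio with a zero denominator is undefined, so report None.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, total),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
    }


# First example: nothing is categorised as relevant.
first = review_metrics(tp=0, fp=0, fn=1, tn=99)
# Second example: four documents are categorised as relevant.
second = review_metrics(tp=1, fp=3, fn=4, tn=92)

print(first)
print(second)
```

The first example shows why accuracy alone is misleading: the categorisation scores 99% accuracy while finding none of the relevant documents.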
Workflow
How are the results?
• Accuracy
• Recall
• Precision
• F-measure:
  • Calculated as the harmonic mean of recall and precision
  • F = 2(P*R)/(P+R)
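As a worked illustration of the formula F = 2(P*R)/(P+R), the sketch below applies it to the precision (25%) and recall (20%) of the second categorisation example. The function name `f_measure` is an assumption made for this example, and returning 0 when both inputs are zero is a common convention rather than something stated on the slides.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F = 2(P*R)/(P+R)."""
    if precision + recall == 0:
        # Both are zero: the harmonic mean is taken as 0 by convention.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Precision 25% and recall 20%, as in the second categorisation example.
f = f_measure(0.25, 0.20)
print(f"F-measure = {f:.1%}")
```

For these inputs the result is 2/9, roughly 22%. The harmonic mean stays close to the weaker of the two measures, so a high F-measure requires both precision and recall to be high.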
Do we need to continue training?
• Intuition?
• Objective training optimisation criterion
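The slides do not say which objective criterion is meant. One possible reading, sketched below purely as an assumption, is to measure a quality score after every training round (for example the F-measure on a fixed, attorney-reviewed control sample) and to stop training once the score no longer changes noticeably. The function name, the `window` and `tolerance` defaults, and the sample scores are all hypothetical.

```python
def training_has_stabilised(scores, window=2, tolerance=0.01):
    """Return True when the last `window` round-to-round changes are all small.

    scores: one quality score per training round (for example the F-measure
            measured on a fixed, attorney-reviewed control sample).
    window and tolerance are illustrative defaults, not values from the slides.
    """
    if len(scores) < window + 1:
        return False  # not enough rounds yet to judge stability
    recent = scores[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(change <= tolerance for change in changes)


# Hypothetical F-measure after each training round (made-up numbers).
rounds = [0.41, 0.63, 0.74, 0.745, 0.75]
for n in range(1, len(rounds) + 1):
    print(n, training_has_stabilised(rounds[:n]))
```

With these made-up scores the check only passes after the fifth round, when two consecutive rounds have each moved the score by no more than one percentage point.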
Which docs should be reviewed?
Manually review categorised documents
How do we verify the results?
Quality assurance provides transparent verification of the generated results and is a key component of the computer assisted review process.
Quality assurance:
• A random sample of the ‘not review’ docs
• Size of sample based on level of statistical confidence
• Review of sample set by attorney
• Calculation of recall and precision within ‘not review’ docs
• Attorney can confirm or modify cut off point
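How the sample size follows from the level of statistical confidence can be sketched with the standard formula for estimating a proportion, n = z² p(1−p)/e², with an optional finite population correction. This is a textbook calculation offered as an assumption about what the quality assurance step involves, not the method used in the presentation; the margin of error, the worst-case proportion of 0.5 and the population of 100,000 documents are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.95, margin=0.05, proportion=0.5, population=None):
    """Number of documents to sample for a given confidence level.

    confidence: two-sided confidence level, e.g. 0.95
    margin:     acceptable margin of error (assumed, not from the slides)
    proportion: expected share of relevant docs; 0.5 is the worst case
    population: size of the 'not review' set, or None to skip the
                finite population correction
    """
    # z-score for the two-sided confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction: small sets need fewer samples.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


# Hypothetical 'not review' set of 100,000 documents.
print(sample_size(population=100_000))
print(sample_size(confidence=0.99, margin=0.02, population=100_000))
```

The required sample grows with the confidence level and shrinks as the tolerated margin of error widens, while the size of the document set has comparatively little effect once it is large.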
Real life examples
Large Second Request
• Chose not to review 1.3m docs based on 95% confidence level
• It took only 4 rounds of human review to stabilize at a 95% confidence level
• Stats were also used to evaluate human review; categorization was found to be 4 times as accurate as the human review team
Bankruptcy Case
• 2m docs categorized after review of 1,500 docs
Civil Litigation
• 200k docs reviewed and produced in 2 days
Corruption Investigation
• Used on a subset of ~53k docs
• 97% of the docs were coded after review of the first sample, which was ~3% of the population
• After the first sample, humans agreed with categorization 87% of the time
• 91% of the overturned documents were exact dupes or 90% similar
Creating relationships that create value.
© 2012 PwC. All rights reserved. Not for further distribution without the permission of PwC. "PwC" refers to the network of member firms of PricewaterhouseCoopers International Limited (PwCIL), or, as the context requires, individual member firms of the PwC network. Each member firm is a separate legal entity and does not act as agent of PwCIL or any other member firm. PwCIL does not provide any services to clients. PwCIL is not responsible or liable for the acts or omissions of any of its member firms nor can it control the exercise of their professional judgment or bind them in any way. No member firm is responsible or liable for the acts or omissions of any other member firm nor can it control the exercise of another member firm's professional judgment or bind another member firm or PwCIL in any way.