• No results found

Recent Developments in the Law & Technology Relating to Predictive Coding

N/A
N/A
Protected

Academic year: 2021

Share "Recent Developments in the Law & Technology Relating to Predictive Coding"

Copied!
39
0
0

Loading.... (view fulltext now)

Full text

(1)

© 2012 DOAR Litigation Consulting | www.DOAR.com

Presented by

Gene Klimov

VP & Managing Director

Recent Developments in the

Law & Technology Relating to

Predictive Coding

Presented by

Gerard Britton

Managing Director Presented by

Paul Neale

CEO

(2)

• Proper Timing & Implementation of Litigation Hold

– Identification of sources of ESI

– Identification of custodians

– Implementation of preservation protocols

– Designation of point person

– Signed acknowledgements by custodians

• Rule 26 Conference

– Scheduling and Planning

– Initial Disclosures

– Meet-and-Confer

– Privilege and Work-Product Protection

– Form of Production

(3)

© 2012 DOAR Litigation Consulting | www.DOAR.com

• Disclosure of sources of ESI

• Indication of ‘Not reasonably accessible sources’

– Disaster Recovery Systems (Backup tapes/Archives)

– Legacy Systems

– Transient or Ephemeral data

• Proprietary systems and/or atypical ESI

• Culling methodologies

– Keyword search terms, Advanced Analytics and/or Predictive Coding

• Protective order, clawback agreement & 502 protection

• Form of production

(4)

Predictive Coding (aka TAR, CAR, CBAA) is the use of

technology to assist in the categorization of documents

to reduce the time and cost associated with document

review and production.

What is Predictive Coding?

TAR – Technology Assisted Review

CAR – Computer Assisted Review

(5)

© 2012 | Predictive Coding

An iterative process that is designed to provide a sample set

of documents to the human reviewer(s) that is most

knowledgeable about the subject matter so that predictive

coding software can learn from the reviewer and replicate the

reviewer’s judgment across the entire document population.

Predictive Coding Process

Sample Review Train

Validate

Threshold Met?

Yes No

(6)
(7)

© 2012 | Predictive Coding

Traditional Culling

Date Filter Keywords Responsive False Hits Missed Hits Total Data

(8)

Traditional Review

Contract Attorney Contract Attorney Contract Attorney Contract Attorney

(9)

© 2012 | Predictive Coding

Training the System

Senior Attorney

(10)

Training the System

Privileged Responsive Irrelevant Train Senior Attorney

(11)

© 2012 | Predictive Coding

Training the System

Senior Attorney

(12)

Training the System

Privileged Responsive Irrelevant Train Senior Attorney

(13)

© 2012 | Predictive Coding

Training the System

Senior Attorney

(14)

Training the System – n

th

Iteration

Accuracy

Threshold Equilibrium is reached

when the system predicts the same coding for

documents as the human reviewer X% of the time. X = Preset Threshold

1st 2nd 3rd 4th 5th . . . nth

(15)

© 2012 | Predictive Coding

Yield Calculation

Initial random sample that is reviewed to determine a

baseline for responsiveness. This is used to compare to

overall system performance at the end of the process.

Industry Terminology

Seed Sets

Exemplar documents known to be responsive that are

used to train the system.

(16)

Iterative Training

Random – Completely random selection of

documents from the entire universe.

(NOTE: Data normalization is important)

Suggested – System derived document groups

based on concepts and similar

documents based on seed sets and

previous iterations.

(17)

© 2012 | Predictive Coding

Sample Calculation

1

Training the System

Sample Size =

Z

2

• p (1 - p)

c

2

Z = Z value (e.g. 1.96 for 95% confidence level)

p = percentage picking a choice, expressed as decimal (.5 used for sample size needed)

c = confidence interval (margin of error), expressed as decimal (e.g., .04 = ±4)

(18)

0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 10,000 50,000 250,000 500,000 1,000,000 2,000,000 5,000,000 10,000,000

Sample Size v. Population

Training the System

Population Sample Size 95% Accuracy ± 2% 99% Accuracy ± 2% 99% Accuracy ± 1%

(19)

© 2012 | Predictive Coding

Measuring Accuracy

The elements of a method’s level of accuracy in the

Information Retrieval context are Recall and Precision.

Recall =

Number of actually responsive documents retrieved

Number of responsive documents overall

Out of the total number of responsive documents that

exist in the population, how many did the process

(20)

Measuring Accuracy

The elements of a method’s level of accuracy in the

Information Retrieval context are Recall and Precision.

Precision =

Number of responsive documents retrieved

Number of documents retrieved

How accurate was the process in classifying

responsiveness in what was retrieved?

(21)

© 2012 | Predictive Coding

Measuring Accuracy

F-Measure (or F

1

Score)

The weighted harmonic mean of precision and recall, the

traditional F-measure or balanced F-score.

F

1

Score

= 2 •

Precision • Recall

Precision + Recall

A type of average that is calculated to determine how

well a specific content retrieval method performs.

(22)

Calculations

Measuring Accuracy

Example:

1,000 Documents 900 Irrelevant 100 Relevant System Retrieved Relevant: 100 Irrelevant: 0 Recall = 100 100 = 1.0 Precision = 100 100 = 1.0 F1 Score = 2 • 1.0 • 1.0 1.0 + 1.0 = 1.0

(23)

© 2012 | Predictive Coding

Calculations

Measuring Accuracy

Example:

1,000 Documents 900 Irrelevant 100 Relevant System Retrieved Relevant: 100 Irrelevant: 50 Recall = 100 100 = 1.0 Precision = 100 150 = 0.667 F1 Score = 2 • 0.667 • 1.0 0.667 + 1.0 = 0.4

(24)

Calculations

Measuring Accuracy

Example:

1,000 Documents 900 Irrelevant 100 Relevant System Retrieved Relevant: 40 Irrelevant: 10 Recall = 40 100 = 0.4 Precision = 40 50 = 0.8 F1 Score = 2 • 0.8 • 0.4 0.8 + 0.4 = 0.46

(25)

© 2012 | Predictive Coding

Measuring Accuracy

Larger gap indicates lower Recall Smaller gap indicates higher Precision

Relevant Documents Incorrectly Identified as “Irrelevant”

Relevant Documents Correctly Identified as “Relevant”

Irrelevant Documents Incorrectly Identified as “Relevant”

Irrelevant Documents Correctly Identified as “Irrelevant”

Total Corpus

Responsive

(26)

Validation

Senior Attorney Privileged Responsive Irrelevant Random Sample

(27)

© 2012 | Predictive Coding

Validation

Senior Attorney Privileged Responsive Irrelevant Random Sample

Recall & Precision

Calculation of both Precision and Recall from a statistically significant random sample of all documents.

Want to have High

Precision and

Reasonable Recall

(28)

Validation

Senior Attorney Privileged Responsive Irrelevant Random Sample

(29)

© 2012 | Predictive Coding

Validation

Senior Attorney Privileged Responsive Irrelevant Random Sample Elusion

A zero acceptance test would require that NO responsive documents be found in the Irrelevant sample.

(30)

System Validation

The final analysis of the result set predicted to be both responsive &

non-responsive through measurement of a sampled set to ensure

that the process meets the criteria for accuracy.

Recall & Precision

The calculated average between Recall & Precision that indicates that the vast majority of responsive documents were located accurately Elusion

The percentage of the rejected documents that are actually responsive. This would allow both parties to choose what is an acceptable level. Z-test

The comparison of responsive documents across a randomly sampled set at the start of the process compared to another randomly sampled set at the end of the process.

(31)

© 2012 | Predictive Coding

• Rapidly evolving segment of the electronic discovery

market

• Gaining momentum in law firms and corporations

• Fragmented market of providers

• On the edge of judicial approval

(32)

DA SILVA MOORE, et al. v. PUBLICIS GROUPE PLC, et al. SDNY

CASE NO. 11-CV-1279

• Joint ESI protocol filed by the parties (with objections)

• First opinion approving defendants’ use of predictive

coding issued by M. Judge Peck

• Objections filed by plaintiffs regarding the lack of

measurability and inability to validate the process

(33)

© 2012 | Predictive Coding

KLEEN PRODUCTS LLC, et al. v. PACKAGING CORPORATION

OF AMERICA, et al., ND ILL. CASE NO. 1:10-CV-5711

• Plaintiffs attempting to force defendants to use CBAA

• Multiple defendants with multiple ESI plans

• Hearing before M. Judge Nolan held on February 21,

2012 focusing on measurability and validation

• August 21, 2012 - Plaintiffs withdrew their demand

(34)

GLOBAL AEROSPACE INC., et al., v. LANDOW AVIATION, L.P., et al.

(Nos. CL 61040, CL 61991, CL 64475, CL 63795, CL 63190, CL 63575, CL 61909, CL 61712, 71633)

• Plaintiffs submitted motion for protective order to disallow the

Defendant’s use of Predictive Coding

• Defendant’s protocol contained transparency and referenced

precision, recall and steps for validation and measuring quality.

• Judge James H. Chamblin issued order approving the

Defendant’s use of Predictive Coding

• Case settled shortly thereafter

(35)

© 2012 | Predictive Coding

Actos (Pioglitazone) Products Liability Litigation, United States

District Court for the Western District of Louisiana, MDL No. 2299

• Order issued embodying agreed upon protocol for a “Search

Methodology Proof of Concept” process to test predictive coding

output:

– Random sample seed set

– Required met and confer conferences to resolve issue

– Iterative Process

– Elusion (“Test the Rest”) Validation

• Order issued

– Case Management Order issued; includes predictive coding protocol

(36)

Federal Housing Finance Agency v. SG Americas, et al. (S.D.N.Y)

• JPMC and Plaintiffs Unable to Agree on Use of Predictive Coding • Defendants Approached J. Cote Regarding Use of Predictive Coding

(7/20/12)

• Draft Protocol Submitted by JPMC Proposing Use of PC After Search Terms • Plaintiff Demanded Review of ALL Documents Predicted to be

Non-Responsive

• Court Ordered All Parties to Meet and Confer to Attempt Agreement

• Parties Have Agreed to Seed Set and Continue to Negotiate PC Process

Seminal Cases

(37)

© 2012 | Predictive Coding

TSI, Incorporated v. Azbil BioVigilant, Inc. (D.C. AZ)

• Judge David Campbell Orders Party to Consider Predictive Coding

• Parties File Joint Submission (8/24/12)

• Heavy Reliance on Da Silva Moore

• Scope of ESI Too Small Due to Court-Imposed Discovery Limits

• Cost & Time Too Much with Predictive Coding

• Use of Search Terms More Effective

(38)

• Lack of a definable standard

• Inconsistency of methodologies & technologies

• Use requires new levels of transparency by producing party

• Validation of results can only occur once costs are sunk

• Darwinian effect is inevitable

• Focus on validation of precision & recall

(39)

References

Related documents

If we are to increase the number of nurses in specialized roles (nurse practitioners, nurse anaesthetists), by how much must we increase nursing enrolments to accommodate these

A current carrying wire carries a current of 2A, which is out of the page, another wire carrying current of 4A in same direction lies parallel to the first, as shown in figure..

There is no danger to the life of your husband as Jupiter holds the lordship of your Mangalyasthana (8 th house) and the 7 th lord Saturn is posited in his own star. Rahu

The objective of this communication is to show as to how farmer participatory soil analysis can be used to diagnose and monitor soil health issues, increase soil

The aims of the study were a) to compare the prevalence of difficulties initiating sleep (DIS) and difficulties maintaining sleep (DMS) in untreated OSA patients vs. controls; b)

In PD, the most common sleep disorders include insomnia [difficulty initiating sleep and its associated restless legs syndrome (RLS), as a rea- son for the difficulty of falling

We averaged the variables based on actig- raphy and the sleep diary from the following periods: (1) days off: from bedtime to bedtime prior to and following all days off that

47 A further observational study of cancer patients treated in an outpatient setting also reported significant improvements in pain scores with OXN PR, although no