The Truth About Predictive Coding: Getting Beyond The Hype

(1)

(2)


(3)

David R. Cohen, Reed Smith LLP

Records & E-Discovery Practice Group Leader

David leads a group of more than 100 lawyers in his role as Practice Group Leader of Reed Smith’s Records & E-Discovery group. He serves as e-discovery counsel for multiple companies and also counsels clients on records management and litigation readiness issues. David has been named a “Pennsylvania Superlawyer” in litigation and is Chambers-ranked nationally and internationally in the area of e-discovery. He is a frequent author and trains judges, mediators and lawyers in e-discovery issues. He has also been a court-appointed E-Discovery Special Master in multiple cases.

(4)

Bryon Z. Bratcher, Reed Smith LLP

Director of Litigation Technology Services

Bryon directs Reed Smith’s global team of 25 Litigation Technology Analysts, drawing on more than a dozen years of experience in technology services for Am Law 100 firms. He assisted with the selection and implementation of the firm’s technology-assisted review tools, which he now manages, and in 2014 was named a winner of The Recorder’s Law Firm Innovator award for co-developing Reed

(5)

Mark E. Harrington, Guidance Software

Senior Vice President, General Counsel & Corporate Secretary

(6)

(7)

(8)

(9)

(10)

(11)

Agenda

• What is Predictive Coding?

• Why Predictive Coding?

• How Accurate is Human v. Predictive Coding?

• Barriers to Use of Predictive Coding

• Case Studies

• Current “Hot” Issues in Predictive Coding

• Takeaways

(12)

What is Predictive Coding?

• a.k.a. “TAR,” a.k.a. “CAR,” a.k.a. “RAR”

• Machine learning algorithms and statistical probability tools used to duplicate human decision making

• Software determines relevance after training by human reviewer

• Computer identifies properties to predict future coding

• Process continues until accuracy levels reach stability
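The loop described in these bullets (human coding, training, prediction, repeat until results stabilize) can be sketched in a few lines of Python. This is a toy illustration only: the word-count scoring model, the sample documents, the `human_code` stand-in for a reviewer, and the parameters `seed_size`, `batch_size` and `tolerance` are all assumptions made for the example, not a description of how any commercial predictive coding tool works.

```python
import math
from collections import Counter


def train(coded_docs):
    """Learn per-word log-odds of relevance from (text, is_relevant) pairs."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in coded_docs:
        (relevant if is_relevant else irrelevant).update(text.lower().split())
    vocab = set(relevant) | set(irrelevant)
    # Add-one smoothing so a word seen in only one class never yields log(0).
    rel_total = sum(relevant.values()) + len(vocab)
    irr_total = sum(irrelevant.values()) + len(vocab)
    return {
        word: math.log((relevant[word] + 1) / rel_total)
        - math.log((irrelevant[word] + 1) / irr_total)
        for word in vocab
    }


def predict(model, text):
    """Predict 'relevant' when the summed log-odds of known words is positive."""
    return sum(model.get(word, 0.0) for word in text.lower().split()) > 0


def review_loop(docs, human_code, seed_size=4, batch_size=2, tolerance=0.05):
    """Train on a seed set, then add human-coded batches until the
    overturn rate changes by no more than `tolerance` between rounds."""
    coded = [(doc, human_code(doc)) for doc in docs[:seed_size]]
    remaining = docs[seed_size:]
    overturn_rates = []
    while remaining:
        model = train(coded)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        # An "overturn" is a prediction the human reviewer disagrees with.
        overturns = sum(predict(model, doc) != human_code(doc) for doc in batch)
        overturn_rates.append(overturns / len(batch))
        coded += [(doc, human_code(doc)) for doc in batch]
        stable = (
            len(overturn_rates) >= 2
            and abs(overturn_rates[-1] - overturn_rates[-2]) <= tolerance
        )
        if stable:
            break
    return train(coded), overturn_rates


# Hypothetical mini-corpus; the "reviewer" treats pricing or merger talk as relevant.
DOCS = [
    "merger pricing agreement draft",
    "lunch menu friday",
    "pricing agreement signed",
    "holiday party invite",
    "pricing memo merger",
    "parking lot closed friday",
    "agreement on merger terms",
    "party lunch schedule",
]


def human_code(text):
    return "pricing" in text or "merger" in text


model, rates = review_loop(DOCS, human_code)
print("overturn rate per round:", rates)
print("new doc relevant?", predict(model, "merger pricing call"))
```

Real tools use far richer models and statistically sized QC samples, but the control flow is the same idea: keep feeding human decisions back in until the software stops being overturned at a changing rate.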

(13)

Technology-Assisted Review Reference Model

Courtesy of: EDRM.net

(14)

Workflow Overview

[Diagram: iterative categorization workflow. Labels and figures recoverable from the slide:]

• Total Number of Documents: 2,000,000

• Seed Set for Human Review: 2,000

• Results from Categorization: Responsive 596,400; Non Responsive 1,391,600; Uncategorized 10,000

• QC of 1st Round (Statistical Sample): 3,068

• 2nd Round of Categorization: Responsive 635,178; Non Responsive 1,349,754

• QC of 2nd Round (Statistical Sample): 3,068

• Validation Criteria Not Met

• Training Round Overturn Report; QC Round Overturn Report

(15)

The Numbers Behind the Statistics

[Chart: Sample Size (0 to 3,000) plotted against Document Count at 95% confidence, with series for margins of error of +/- 2.0, +/- 2.5 and +/- 5.0, plus a logarithmic trend line for +/- 2.0]
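For readers who want to reproduce sample sizes of the kind charted on this slide, the sketch below uses the standard formula for estimating a proportion, with an optional finite population correction. It is an illustration under stated assumptions: the function name `sample_size` and the worst-case `proportion=0.5` default are choices made here, and the slide does not say which formula produced its chart.

```python
import math
from statistics import NormalDist


def sample_size(margin, confidence=0.95, population=None, proportion=0.5):
    """Documents to sample so an estimated proportion is within +/- margin.

    proportion=0.5 is the worst case (largest sample). If a population
    size is given, the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    n = z * z * proportion * (1 - proportion) / (margin * margin)
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


for margin in (0.02, 0.025, 0.05):
    print(f"+/- {margin:.1%}: {sample_size(margin)} documents")
```

At 95% confidence this gives 2,401, 1,537 and 385 documents for margins of +/- 2.0%, +/- 2.5% and +/- 5.0%. The required sample barely grows with the size of the collection, which is why the curves flatten out as document counts rise.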

(16)

Why Predictive Coding?

• Cost savings

• Time savings

• Reduced risk of errors (?)

• Greater objectivity in classifications

• Sometimes volume of documents and/or value of case makes human review impractical

(17)

Universe of Available Documents

Technology Assisted Review

(18)

Universe of Available Documents

Relevant Documents

(19)

Documents

Technology Assisted Review

Documents Selected

(20)

Documents

Technology Assisted Review

Documents Selected

• Irrelevant Documents Mistakenly Selected (Poor Precision)

• Relevant Documents Mistakenly Missed (Poor Recall)
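The two error types on this slide correspond to the standard precision and recall measures. A minimal sketch, using made-up document IDs (the `selected` and `relevant` sets are hypothetical):

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a set of selected documents.

    precision: share of selected documents that are actually relevant.
    recall:    share of relevant documents that were selected.
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the review selected four documents, three are truly relevant.
selected = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc1", "doc2", "doc5"}

precision, recall = precision_recall(selected, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four selected documents are relevant (precision 0.50), and two of the three relevant documents were found (recall about 0.67).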

(21)

Myth #1

Computer Review Will Never Be As Accurate as Human Review

(22)

Da Silva Moore v. Publicis Groupe & MSL Group

287 F.R.D. 182 (S.D.N.Y. 2012)

Magistrate Judge Andrew J. Peck:

“…while some lawyers still consider manual review to be the ‘gold standard,’ that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than

(23)

Da Silva Moore v. Publicis Groupe & MSL Group

287 F.R.D. 182 (S.D.N.Y. 2012)

• Predictive Coding Was Appropriate Because:

• Parties Agreed

• Over 3 Million Documents

• Cost Effectiveness & Proportionality

• Transparent Process Proposed

• Spawned Huge Battle Over Protocol & Ultimate Motion to Recuse

(24)

Da Silva Moore v. Publicis Groupe & MSL Group

287 F.R.D. 182 (S.D.N.Y. 2012)

District Judge Approved Judge Peck’s Proposal:

• The “ESI protocol contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs. It provides that the search methods will be carefully crafted and tested for quality assurance, with Plaintiffs participating in their

(25)

“While this Court recognizes that computer-assisted review is not perfect, the Federal Rules of Civil Procedure do not require perfection.”

Magistrate Judge Andrew Peck

(26)

How Accurate is Human Coding?

• Computer 77%, Humans 60%

• “The myth that exhaustive manual review is the most effective… approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.”

• “Technology-assisted reviews require…human review of only 1.9% of the documents, a fifty-fold savings over exhaustive manual review.”

Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Maura R. Grossman & Gordon V. Cormack, XVII Richmond Journal of Law and Technology 11 (2011)

(27)

How Accurate is Human Coding?

Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Herbert L. Roitblat et al., 61 Journal of the American Society for Information Science and Technology 70 (2010)

• Performance of two computer systems was at least as accurate (measured against the original review) as that of human re-review

• Level of agreement among human reviewers: 70-75%

(28)

How Accurate is Human Coding?

Faster, better, cheaper legal document review, pipe dream or reality? Thomas I. Barnett and Svetlana Godjevac, Autonomy, Inc. (2011)

• Responsiveness rates of review groups ranged from 23% to 54%

• Unanimity of agreement less than half of the time

• 28,209 documents reviewed by 7 different reviewer groups (5 document review vendors and 2 law firms)

(29)

Look: the computer did as well as the humans!

(30)

“Using search terms is so last decade.”

- Judge Shira Scheindlin

(31)

Myth #2

(32)

• Not viable for cases with fewer than 10,000-20,000 documents requiring review

• Limited potential cost savings (e.g. not reliable for privilege)

• Risk of not getting opposing counsel agreement

• Time and expertise required to train computer

• Multiple case problem

• Unsympathetic judges/discovery masters

• Danger of losing key word filtering

(33)

Kleen Products LLC v. Packaging Corp. of Am.,

2012 WL 4498465 (N.D. Ill. Sept. 28, 2012)

• Plaintiffs requested court approval of predictive coding, defendant opposed

• Massive briefing and several days of hearings

• Plaintiff ultimately withdrew request as to current production requests

• Parties agreed to meet and confer regarding the search methodology for future production requests

(34)

Kleen Products LLC v. Packaging Corp. of Am.,

2012 WL 4498465 (N.D. Ill. Sept. 28, 2012)

STIPULATION & ORDER RELATING TO ESI SEARCH

“As to any … ESI beyond the First Request…, plaintiffs will not argue … that defendants should be required to use … ‘predictive coding’ methodology...

“With respect to any requests for production… beyond the First Request Corpus, the parties will meet and confer regarding the appropriate search methodology to be used for such newly collected documents. If the parties fail to agree on a search methodology,

(35)

Myth #3

(36)

Rio Tinto PLC v. Vale S.A.

14 Civ. 3042 (RMB) (AJP) (S.D.N.Y. March 2, 2015)

Magistrate Judge Andrew Peck, revisiting his landmark decision in Da Silva Moore three years later:

“the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it”

(37)

Rio Tinto PLC v. Vale S.A.

14 Civ. 3042 (RMB) (AJP) (S.D.N.Y. March 2, 2015)

Observes that “one TAR issue that remains open is how transparent and cooperative the parties need to be with respect to the seed or training set(s).”

In the absence of transparency, statistical estimation of recall and general quality control sampling can still be used to verify appropriate training of the software and secure satisfactory review outcomes

(38)

“Black Letter Law”?

A case law search for “predictive w/2 coding” returns 35 cases:

• 12 positive references, in commentary or tone

• 18 neutral references

• Often judicial approval of proposed ESI protocols

• 4 that utilized the term in a non-ESI context

Still gaining acceptance and momentum

(39)

Global Aerospace Inc. v. Landow Aviation, L.P.,

2012 WL 1431215 (Va. Cir. Ct. April 23, 2012)

• Defendants requested permission to use predictive coding

• Plaintiffs opposed the request

• Order issued approving the use of predictive coding

• Work now concluded

(40)

Global Aerospace Inc. v. Landow Aviation, L.P.,

2012 WL 1431215 (Va. Cir. Ct. April 23, 2012)

• 1.3 million docs after deduplication, 5,000 seeded

• Predictive coding identified 173,000 relevant docs

• 400 doc sample showed 80% precision

• Sample of 1.1 million “irrelevant” documents showed 2.9% relevant

• 31,000 missed relevant (over 80% recall)

• Time: 7 months/Cost: $200,000
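The recall figure on this slide can be approximated from the other numbers shown. The sketch below is a back-of-the-envelope check, not the parties' actual validation method; the function name `estimate_recall` and the term "elusion rate" (the share of relevant documents found in a sample of the discarded set) are labels introduced here.

```python
def estimate_recall(identified, precision, discarded, elusion_rate):
    """Estimate recall from a precision sample and a sample of the discard pile."""
    relevant_found = identified * precision      # truly relevant among those selected
    relevant_missed = discarded * elusion_rate   # relevant documents left behind
    recall = relevant_found / (relevant_found + relevant_missed)
    return relevant_found, relevant_missed, recall


found, missed, recall = estimate_recall(
    identified=173_000,    # docs predictive coding identified as relevant
    precision=0.80,        # from the 400-document sample
    discarded=1_100_000,   # docs coded "irrelevant"
    elusion_rate=0.029,    # share of the "irrelevant" sample that was relevant
)
print(f"found ~{found:,.0f}, missed ~{missed:,.0f}, recall ~{recall:.1%}")
```

With these inputs the estimate is roughly 138,400 relevant documents found and about 31,900 missed, close to the slide's rounded 31,000, for a recall of about 81%, consistent with "over 80% recall."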

(41)

In re: Biomet M2a Magnum Hip Implant Products Liability Litigation

Cause No. 3:12-MD-2391 (N.D. Ind., South Bend Div., April 18, 2013)

• Defendant Biomet used combination of electronic search functions to identify relevant documents

• Beginning universe was 19.5 million documents

• Used keyword culling and deduplication

• Reduced to 2.5 million

• Then employed predictive coding on those 2.5 million

(42)

In re: Biomet M2a Magnum Hip Implant Products Liability Litigation

Cause No. 3:12-MD-2391 (N.D. Ind., South Bend Div., April 18, 2013)

Plaintiffs objected to this procedure and requested that Biomet start over:

• Wanted Defendants to use predictive coding on all 19.5 million documents, with Plaintiffs and Defendants jointly training the software

(43)

Biomet Resolution

• Court held that Biomet’s methodology satisfied its obligations under F.R.C.P. 26(b)(2)(C)

• Likely benefits of going back to the 19.5 million document set would not outweigh burden and expense

• Assumed Biomet will remain open to “additional reasonably targeted search terms…”

• If Plaintiffs wish to restart predictive coding process, Plaintiffs must bear the expense

(44)

Progressive Casualty Insurance Co. v. Delaney

2014 WL 2112927 (D.Nev. May 20, 2014)

Court approved a Joint ESI Protocol under which:

• Parties mutually agreed to search terms for universe of collected documents

• Progressive had option to produce all non-privileged documents:

  • Captured by the agreed search terms; or

  • Captured by the agreed search terms responsive to the

(45)

Progressive Casualty Insurance Co. v. Delaney

2014 WL 2112927 (D.Nev. May 20, 2014)

• Progressive advised it would produce all docs Sept. – Oct. 2013

• Progressive produced nothing in six months

• Collected 1.8 million ESI docs, culled to 565,000 using search terms

• Began to review manually

• After review began, determined manual review was too time intensive and expensive

• Without informing Defendants or Court, used predictive coding to review only the 565,000

(46)

Progressive Casualty Insurance Co. v. Delaney

2014 WL 2112927 (D.Nev. May 20, 2014)

• “Many…have argued persuasively that the traditional ways lawyers have culled the …documents for production—manual human review, or keyword searches—are ineffective tools to cull responsive ESI in discovery.

• Predictive coding has emerged as a far more accurate means of producing responsive ESI in discovery. Studies show it is far more accurate than human review or keyword searches which have their own limitations.”

(47)

Progressive Casualty Insurance Co. v. Delaney

2014 WL 2112927 (D.Nev. May 20, 2014)

“Progressive is unwilling to engage in the type of cooperation and transparency that …is needed for a predictive coding protocol to be accepted by the court or opposing counsel as a reasonable method to search for and produce responsive ESI. Progressive is also unwilling to apply the predictive coding method it selected to the universe of ESI collected. The method described does not comply with all of Equivio's recommended best practices.”

(48)

Progressive Casualty Insurance Co. v. Delaney

• “Had the parties…agreed at the onset of this case to a predictive coding based ESI protocol, the court would not hesitate to approve a transparent mutually agreed upon ESI protocol.”

• Ordered Progressive to produce the 565,000 “hit” documents culled from the use of the search terms, subject to privilege filters, the clawback provisions of FRCP 26(b)(5)(B), and FRE 502(d) and the existing ESI protocol.

(49)

Case Study #1: Product Liability Case

• 3.5 million documents in Relativity

• Approximately 2 million had been reviewed

• Approximately equal numbers of responsive and non-responsive documents

• Approximately 40 reviewers on case

(50)

• Limited potential cost savings

• Difficult plaintiff’s counsel

• MDL + numerous state cases

• Unsympathetic judges/discovery masters

• Danger of losing key word filtering

(51)

How Could Predictive Coding Be Used?

• Accelerate the human review and improve our QC

• We could use predictive coding to accelerate the review, and check the human review

• It was impractical to use predictive coding as a substitute for human review in this case

(52)

Case Study #1: Cost Analysis

                  Docs/Hour   Cost/Hour   Total Records   Total Cost
Current           50          $39.50      2,000,000       $1,580,000
Cost Tier 1       44          $39.50      500,000         $448,863
Cost Tier 2       57          $39.50      1,200,000       $831,578
Cost Tier 3       80          $39.50      300,000         $148,125
TOTAL                                                     $1,428,566
Review Savings                                            $151,434
Analytics Cost                                            $60,000
Total Savings                                             $91,434
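The figures in this cost analysis follow from one formula: records divided by documents per hour, times the hourly rate. The sketch below reproduces them; the function name `review_cost` is an assumption for illustration, and dollar amounts are truncated to whole dollars because that is what matches the slide's totals.

```python
HOURLY_RATE_CENTS = 3950  # $39.50 per reviewer hour, kept in cents to avoid float error


def review_cost(records, docs_per_hour, rate_cents=HOURLY_RATE_CENTS):
    """Whole-dollar review cost: (records / docs_per_hour) hours at the hourly rate."""
    return records * rate_cents // docs_per_hour // 100


current_cost = review_cost(2_000_000, 50)

# (records, docs per hour) for the three tiers shown on the slide
tiers = [(500_000, 44), (1_200_000, 57), (300_000, 80)]
tiered_cost = sum(review_cost(records, rate) for records, rate in tiers)

analytics_cost = 60_000
review_savings = current_cost - tiered_cost
total_savings = review_savings - analytics_cost

print(f"current ${current_cost:,}  tiered ${tiered_cost:,}")
print(f"review savings ${review_savings:,}  net of analytics ${total_savings:,}")
```

The net saving is modest because the per-hour rate is unchanged; the gain comes only from faster review of the two higher-speed tiers (1.5 million of the 2 million records), partly offset by the slower first tier and the analytics fee.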

(53)

Case Study #2

• Client spinning off a division to become separate company

• Wants former employees to still access old e-mail

• Wishes to remove privileged documents from set to avoid waiver

• Perfection not required – not an adversarial situation but needs defensible process

(54)

Case Study #2

• Total volume: Approximately 200,000 documents

• Document-by-document review and privilege determinations could cost up to $2 per document

(55)

Case Study #2: Our Recommendations

• We recommended:

  • search term filtering

  • followed by sampling and

  • predictive coding to identify and remove privileged documents

• Set budget of $30,000

(56)

Case Study #2: Our Process

• Following initial filtering, two experienced reviewers sampled “hits” and “misses” and adjusted filter terms to fine-tune filtering

• Reviewers then “trained” software on selected samples of the remaining “hits”

• Analytics accurately identified remaining documents most likely to be privileged

• Those results were then used for two additional iterations of filter “fine-tuning”

(57)

Case Study #2: Results

• We were left with a document population, containing a negligible number of privileged documents, to make available to ex-employees

• Filtering was not perfect, but even human filtering is never perfect

• Client saved over 90% of the review costs, amounting to several hundred thousand dollars
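The savings claim can be sanity-checked against the figures given earlier in this case study. The sketch below assumes, as the slides do not state, that full review would have cost the upper-end $2 per document and that the final spend was close to the $30,000 budget; the variable names are illustrative.

```python
documents = 200_000
full_review_cost_per_doc = 2   # upper-end estimate from the case study, in dollars
budget = 30_000                # assumed here to approximate the actual spend

full_review_cost = documents * full_review_cost_per_doc
savings = full_review_cost - budget
savings_share = savings / full_review_cost

print(f"full review ${full_review_cost:,}, savings ${savings:,} ({savings_share:.1%})")
```

Under those assumptions the full review would have cost $400,000, so a $30,000 project saves $370,000, or 92.5%, in line with "over 90%" and "several hundred thousand dollars."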

(58)

Current “Hot” Issues in Predictive Coding

• Do parties have to give advance notice and/or obtain consent from adversaries or the court?

• Should courts allow predictive coding where opposing parties don’t consent?

• Is it okay to run keywords before starting the predictive coding?

• Should parties share their “seed sets” with opposing counsel, including irrelevant docs?

• What workflows are allowable or best?

(59)

Takeaways

• Predictive coding is gaining acceptance by courts and will be used increasingly, with or without opposing party notice and/or consent

• Practical considerations continue to rule out primary reliance on predictive coding for many reviews

• Even when not replacing human review, predictive coding can still be useful for many purposes

  • Non-adversary review situations

  • Accelerating human review

  • Improving quality control

  • Finding key documents sooner

(60)

Questions?

David R. Cohen: 412-288-1098

Bryon Z. Bratcher: 415-659-5948

Mark E. Harrington: 626-229-9191 x4660
