MEALEY’S™ LITIGATION REPORT

Discovery

The State Of Predictive Coding

by Royce F. Cohen and Derek I.A. Silverman

Stroock & Stroock & Lavan LLP
New York

A commentary article reprinted from the September 2014 issue of Mealey’s Litigation Report: Discovery



[Editor’s Note: Royce F. Cohen is a Special Counsel in the Litigation Practice Group of Stroock & Stroock & Lavan LLP, and co-chair of the firm’s eDiscovery and Information Governance Group. Derek I.A. Silverman is an associate in Stroock’s Litigation Practice Group, and a member of the firm’s eDiscovery and Information Governance Group. Both are in Stroock’s New York office. Copyright © 2014 by Royce F. Cohen and Derek I.A. Silverman. Responses are welcome.]

As the volume of electronically stored information (“ESI”) continues to soar, parties have increasingly turned to predictive coding to reduce the costs of reviewing ESI when disputes arise. Courts are increasingly accepting of the use of this discovery tool; however, parties should be aware that it is not a “fix-all” solution appropriate for every case, and an eDiscovery professional should be contacted to assess whether predictive coding should be considered for a specific matter. This article outlines the current state of the law with respect to predictive coding, and offers some guidelines on its use.

What Is Predictive Coding?

Predictive coding, alternatively referred to as “computer-assisted review” or “technology-assisted review,”1 involves an attorney “training” a software system, often known as an analytics engine, by tagging a subset of documents, known as a “seed set.” Once the attorney has coded these documents as either “responsive” or “non-responsive” to a document request or issue, the analytics engine applies an algorithm that uses the responsive or non-responsive concepts tagged in the seed set to similarly identify the remaining documents in a production as either responsive or non-responsive. By training the engine on multiple seed sets, and by using statistics and sampling to verify the accuracy of the computer coding, the majority of non-responsive documents can be segregated without necessitating attorney review. A document review project can thus be completed by reviewing only a fraction of the total number of documents, resulting in substantial cost savings over traditional “keyword”2 searching, as well as providing a more accurate review, according to recent studies.3
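The workflow described above (code a seed set, let the engine classify the rest, then check a sample) can be illustrated with a minimal sketch. It assumes a simple naive Bayes word-count model, which is only one of many possible algorithms and is not taken from any particular vendor’s analytics engine; the seed documents, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("responsive", "non-responsive")

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(seed_set):
    """Build per-label word counts from attorney-coded (text, label) pairs.

    Assumes the seed set contains at least one document for each label.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts

def predict(model, text):
    """Return the more probable label for an uncoded document."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Start from the share of seed documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def agreement_rate(model, coded_sample):
    """Fraction of an attorney-coded sample that the model labels the same way."""
    hits = sum(predict(model, text) == label for text, label in coded_sample)
    return hits / len(coded_sample)

# Hypothetical seed set for a discrimination matter.
seed_set = [
    ("Promotion denied despite strong performance review", "responsive"),
    ("Salary comparison for male and female account directors", "responsive"),
    ("Lunch order for the team offsite", "non-responsive"),
    ("Parking garage closed for maintenance on Friday", "non-responsive"),
]
model = train(seed_set)
```

In practice the sample used for `agreement_rate` would be drawn at random from the uncoded documents and reviewed by an attorney, and a low agreement rate would prompt another round of seed-set training.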

Da Silva Moore Paves The Way

The first (and most often cited) case approving the use of predictive coding is Da Silva Moore v. Publicis Groupe.4 Da Silva Moore involved a gender discrimination lawsuit in which a defendant was faced with a universe of over three million documents to potentially review.5 The parties agreed to use predictive coding to attempt to cull down these documents, yet disagreed “about how best to implement such review.”6 The parties were then instructed by Magistrate Judge Andrew Peck of the Southern District of New York to submit a joint ESI protocol that would outline the process by which the defendant would use predictive coding in the case.7 This protocol was so-ordered by Magistrate Judge Peck; however, the plaintiffs, whose own proposed protocol had been rejected, reserved their rights to object to its use.8 When the plaintiffs later raised their objections to the ESI protocol, District Judge Andrew Carter upheld Judge Peck’s rulings, “because they are well reasoned and they consider the potential advantages and pitfalls of the predictive coding software.”9 Judge Peck’s analysis showed that, in this case, the approval of predictive coding was “easy,” because of “(1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by [the defendant].”10 An important point to note concerning Da Silva Moore, however, is that the court did not order the parties to use predictive coding, as has been suggested on several blogs that have summarized the holding of this case.11 Rather, here, the parties had agreed on the use of this technology.12

Issues Left Unresolved By Da Silva Moore

While Da Silva Moore was the first case to give judicial imprimatur to the use of predictive coding, it left two key issues unresolved. First, as stated explicitly in the opinion itself, Da Silva Moore did not address what would happen if the producing party wanted to use predictive coding over the objections of the requesting party.13 Second, the court did not address whether the producing party was required to disclose the documents it had used for its seed set, as the parties again had agreed, as part of their stipulated ESI protocol, to disclose the seed set documents.14

1. What happens when the requesting party objects to the producing party’s use of predictive coding?

This question has since been addressed by two courts: first a county court in Virginia and then, more recently, the Southern District of New York. In the first case, Global Aerospace Inc. v. Landow Aviation, the Virginia court, in a summary order, granted the defendants’ request (over the plaintiff’s objections) to proceed with the use of predictive coding for the processing and production of ESI.15 While the court’s decision was issued after briefing, and after 40 minutes of oral argument, unlike Da Silva Moore, it did not provide the parties with guidance concerning the specific protocol for the use of predictive coding (for example, the number of rounds of seed sets), or provide steps to ensure quality control. Following Global Aerospace, Judge Denise Cote of the Southern District of New York, in Federal Housing Finance Agency v. HSBC North America Holdings Inc., likewise permitted JPMorgan Chase, over the objection of the plaintiff, to produce its documents through the use of predictive coding, relying on literature “that indicated that predictive coding had a better track record in the production of responsive documents than human review.”16

2. Are seed sets discoverable?

As part of the parties’ stipulated ESI order in Da Silva Moore, the defendant agreed to provide the plaintiffs with the documents that the defendant used for its seed sets, and agreed to disclose the coding that it applied to these documents.17 The court thus did not have to address the scenario that would arise had the parties refused to come to an agreement regarding seed set disclosure. This precise issue arose in the Northern District of Indiana in In re Biomet M2a Magnum Hip Implant Products Liability Litigation.18 In this case, after Biomet had performed keyword searching to reduce the number of potentially discoverable documents from 19.5 million to 2.9 million, it turned to predictive coding to further cull the universe of documents.19 After production, the receiving parties requested that Biomet identify the documents that it had used as its seed set, irrespective of whether they were identified as “responsive” or “nonresponsive” to the document requests.20

Judge Robert Miller, Jr., denied the receiving parties’ request, and noted that Federal Rule of Civil Procedure 26(b)(1), which makes discoverable “any nonprivileged matter that is relevant to any party’s claim or defense,” provided no authority for the receiving parties to discover how Biomet had used certain documents in the seed set prior to their disclosure.21 Indeed, were the full disclosure of seed set documents to be the norm when using predictive coding, it would go far beyond what is normally required of parties that perform either a keyword search or a linear review, as it would force parties to turn over documents that are not even responsive or relevant to an opposing party’s document requests. As some commentators have rightly noted, the selection and tagging of documents in the seed set could also constitute protected attorney work product, and thus a court should not force the disclosure of this process.22

Predictive Coding Gains Further Traction With The Courts

After the Da Silva Moore decision, other courts have become increasingly receptive to predictive coding as an alternative to keyword searching. In a more remarkable move towards the acceptance of this technology, in EORHB, Inc. v. HOA Holdings LLC, Vice Chancellor J. Travis Laster of the Delaware Chancery Court surprised both parties by requiring that they “(i) retain a single discovery vendor to be used by both sides, and (ii) conduct document review with the assistance of predictive coding,” despite the fact that neither party had requested the use of predictive coding in the case.23 Following objections by the parties, Vice Chancellor Laster amended his order to allow the parties to use different discovery vendors, and based on the low volume of relevant documents expected to be produced by EORHB, agreed that the cost of predictive coding would outweigh any practical benefit, and thus allowed EORHB to conduct document review using “traditional methods.”24 Despite the court’s subsequent about-face, EORHB, as well as Global Aerospace Inc. v. Landow Aviation, and Federal Housing Finance Agency v. HSBC North America Holdings Inc., supra, show that courts across the country are beginning to accept predictive coding as a viable alternative to keyword searching and linear document review.25

Instances In Which Courts Have Refused To Accept Parties’ Attempts To Use Predictive Coding

While courts are increasingly accepting of the use of predictive coding, in certain cases, when other issues have been present, courts have denied parties’ requests to use this technology. In one case, Kleen Products LLC v. Packaging Corporation of America, after the defendants had conducted keyword searches of their ESI, the plaintiffs argued that these searches would be insufficient to retrieve responsive documents, and requested that the court order the defendants to use predictive coding.26 The court denied the plaintiffs’ request, relying on Principle 6 of The Sedona Principles,27 which provides that “‘[r]esponding parties are best situated to evaluate the procedures, methodologies and technologies appropriate for preserving and producing their own electronically stored information.’”28 Recently, in Progressive Casualty Insurance Co. v. Delaney, Magistrate Judge Peggy Leen of the U.S. District Court for the District of Nevada also denied a party’s request for the use of predictive coding.29 In this instance, Progressive had agreed to apply keyword search terms to the approximately 1.8 million documents it had collected.30 After applying the search terms, Progressive’s potentially responsive documents were culled to approximately 565,000 documents.31 Progressive’s counsel began manually reviewing these 565,000 documents, as required by the parties’ stipulated ESI order, but determined that the manual review was too time intensive and expensive.32 Progressive then unilaterally began using predictive coding to review its ESI,33 without the defendants’ consent to deviate from the stipulated ESI order, and without seeking leave of the court to amend the order to include the use of predictive coding.34 Finding that Progressive was “propos[ing] a ‘do-over’ of its own invention that lacks transparency and cooperation regarding the search methodologies applied,” the court required that Progressive turn over the 565,000 documents that it had originally retrieved by the use of the search terms, without any further review to segregate nonresponsive documents.35

While the courts in both Kleen Products and Progressive denied parties’ attempts to use predictive coding, it is important to note that these cases involved very specific factual scenarios. In Kleen Products, a requesting party attempted to force a producing party, over its objections, to use predictive coding in lieu of traditional keyword searching. In Progressive, the parties had already stipulated to proceed using keyword searches, when a party, in violation of this court-approved stipulation, attempted to use predictive coding. Kleen Products and Progressive should thus not be viewed as a rejection by the courts of predictive coding, but rather as a recognition that this technology cannot be forced on an unwilling party, nor can it be used in the dark in clear violation of a court order.

Recommendations

While courts are increasingly likely to approve a party’s request to use predictive coding, a party should assess whether the use of this technology is appropriate as early as possible in the discovery process, and, as can be seen from the recent Progressive case, certainly prior to entering into an ESI stipulation. While disclosure of the use of predictive coding to an opposing party is not a predicate to its use, as with all aspects of discovery, to the extent it is possible, transparency and communication with opposing counsel at the early stages of discovery can be key to avoiding costly later disputes. As Magistrate Judge Peck (the author of the Da Silva Moore opinion) noted in an earlier opinion, “[e]lectronic discovery requires cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI.”36

When considering the use of predictive coding, parties should be aware that not all cases lend themselves to this relatively newly accepted technology. The following is a non-exclusive list of factors to consider when evaluating predictive coding for a case:

• Size of the document collection: Predictive coding can be a useful tool to avoid a manual review of hundreds of thousands, or even millions, of documents. However, as seen in EORHB, Inc. v. HOA Holdings LLC, when faced with small sets of data, implementing predictive coding may not be more cost-effective than traditional keyword searching.

• Types of documents in the collection: Parties should survey the types of documents to be reviewed to determine if the documents lend themselves to the concept-based searching offered by predictive coding. For example, if the document universe contains a high volume of spreadsheets consisting largely of numerical figures, the predictive coding analytics engine may have difficulty identifying responsive and non-responsive concepts from these spreadsheets. Likewise, if the document universe contains a large number of hard-copy documents, errors in converting these documents into machine-readable text37 could prevent an analytics engine from accurately evaluating the responsiveness of a document.

• Knowledge of the facts of the case: If the attorney reviewing and tagging the seed set has a comprehensive understanding of the facts of the case, and can thus accurately and quickly determine which documents are relevant to the document requests, predictive coding can be an effective tool to reduce the time and cost of a review. However, often attorneys are only able to learn about the relevant issues of a case, particularly when dealing with a new client, by performing a substantive review of the documents. In this latter scenario, the reviewing attorney may not be able to accurately and quickly code the seed set, thus greatly undermining the efficiencies that would otherwise be gained from predictive coding.

• Review timeline: If the review has a short deadline, predictive coding, when done correctly, can be a faster process than traditional review. However, a longer deadline allows parties the flexibility to consider other options for review, or to consider using a combination of predictive coding with traditional keyword reviews.

• Cost: Vendors and law firms can offer different pricing solutions for both predictive coding and other traditional forms of review. Generally, when using predictive coding, there will be a higher cost for both use of the software and the increased time required for technical support from a firm or vendor’s litigation support professionals. However, this cost can be offset by a reduction in the cost of the attorney time that would otherwise be required for a manual review of the documents.
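The interaction between the size and cost factors above can be made concrete with a rough break-even sketch. Every figure below (fixed software and support fee, review speed, hourly rate, and the fraction of documents still reviewed by attorneys) is a hypothetical placeholder rather than data from any case or vendor, and real pricing models may be structured differently.

```python
def manual_review_cost(num_docs, docs_per_hour, hourly_rate):
    """Attorney cost of reviewing every document by hand."""
    return num_docs / docs_per_hour * hourly_rate

def predictive_coding_cost(num_docs, fixed_fee, review_fraction,
                           docs_per_hour, hourly_rate):
    """Fixed software/support fee plus attorney review of a fraction of the set.

    review_fraction covers seed sets, validation samples, and the documents
    the engine flags as responsive (hypothetical simplification).
    """
    reviewed_docs = num_docs * review_fraction
    return fixed_fee + manual_review_cost(reviewed_docs, docs_per_hour, hourly_rate)

# Hypothetical inputs, chosen only to illustrate the trade-off.
FIXED_FEE = 100_000      # software licence and litigation-support time
REVIEW_FRACTION = 0.10   # share of documents attorneys still read
DOCS_PER_HOUR = 50
HOURLY_RATE = 60

for num_docs in (5_000, 2_000_000):
    manual = manual_review_cost(num_docs, DOCS_PER_HOUR, HOURLY_RATE)
    assisted = predictive_coding_cost(num_docs, FIXED_FEE, REVIEW_FRACTION,
                                      DOCS_PER_HOUR, HOURLY_RATE)
    cheaper = "predictive coding" if assisted < manual else "manual review"
    print(f"{num_docs:>9,} documents: cheaper option is {cheaper}")
```

Under these assumed inputs the fixed fee dominates for the small collection, while the saved attorney time dominates for the large one; different inputs would move the break-even point.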

Above all, parties should consider consulting with an eDiscovery attorney at the outset of discovery to evaluate whether predictive coding would be beneficial for a case, to help balance the benefit of predictive coding with the cost involved in using this technology, and to help develop a defensible ESI plan going forward.

Endnotes

1. Predictive coding is a subset of computer-assisted or technology-assisted review (“TAR”); however, the terms have largely been used interchangeably.

2. Keyword searching involves applying a single search term, or a combination of multiple search terms, to a set of documents, and then sequentially reviewing the documents containing these terms for responsiveness to document requests.

3. See Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11, 37 (2011) (reporting that manual reviewers identified between 25% and 80% of relevant documents, while technology-assisted review returned between 67% and 86%).

4. 287 F.R.D. 182 (S.D.N.Y. 2012) (Peck, M.J.), adopted sub nom., Da Silva Moore v. Publicis Groupe SA, No. 11 CIV. 1279(ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).

6. Id. at 189.

7. Id. at 187.

8. Id. at 187 n.6.

9. Da Silva Moore v. Publicis Groupe SA, No. 11 CIV. 1279 (ALC)(AJP), 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012).

10. Da Silva Moore, 287 F.R.D. at 192.

11. Id. at 183 n.1.

12. Id.

13. Id. at 190.

14. Id. at 186 (“MSL . . . had agreed to provide all 2,399 documents (and MSL’s coding of them) to plaintiffs for their review”) (emphasis added). Note, however, that the parties’ agreement came after the urging of Judge Peck during a discovery conference, who informed the defendants that: “If [they] do predictive coding, [they] are going to have to give [their] seed set, including the seed documents marked as nonresponsive to the plaintiff’s counsel so they can say, well, of course you are not getting any [relevant] documents, you’re not appropriately training the computer.” Id. at 185. Because of Judge Peck’s statements to the parties, some litigants continue to disagree as to whether the court’s opinion stands for the proposition that the contents of a party’s seed set must be disclosed to the opposing party. See Gordon v. Kaleida Health, No. 08-CV-378S F, 2013 WL 2250579, at *2-*3 (W.D.N.Y. May 21, 2013); Hinterberger v. Catholic Health Sys., Inc., No. 08-CV-380S F, 2013 WL 2250603, at *2 (W.D.N.Y. May 21, 2013).

15. No. CL 61040, 2012 WL 1431215 (Va. Cir. Ct., Loudoun Cnty., Apr. 23, 2012).

16. No. 11 CIV. 6189(DLC), 2014 WL 584300, at *3 (S.D.N.Y. Feb. 14, 2014).

17. Da Silva Moore, 287 F.R.D. at 201-202.

18. Cause No. 3:12-MD-2391, 2013 WL 6405156, at *2 (N.D. Ind. Aug. 21, 2013).

19. Id. at *1.

20. Id.

21. Id. at *2. The court noted, however, that since Biomet had not identified any harm that would result from identifying which of the responsive documents that Biomet had already produced were in Biomet’s seed set, Biomet’s cooperation with its opposing counsel fell below the level of what has been endorsed by the Sedona Conference®. Id. The court thus urged Biomet to reconsider its refusal to identify its seed set documents, but came short of ordering Biomet to do so. Id.

22. H. Christopher Boehning & Daniel K. Toal, ‘Seed Set’ Documents Should Not Be Discoverable, N.Y. L.J. Vol. 251, No. 2 (Feb. 3, 2014) (citing Karl Schieneman & Thomas C. Gricks III, “The Implications of Rule 26(g) on the Use of Technology-Assisted Review,” 7 FED. CTS. L. REV. 239, 262 (2013) (“To the extent that development of the seed set reflects attorney work product, the certification obligations of Rule 26(g) clearly do not require disclosure.”)).

23. C.A. No. 7409-VCL, 2012 WL 4896670 (Del. Ch. Oct. 15, 2012).

24. EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409-VCL, 2013 WL 1960621, at *1 (Del. Ch. May 6, 2013).

25. See also F.D.I.C. v. Bowden, No. CV413-245, 2014 WL 2548137, at *13 (S.D. Ga. June 6, 2014) (ordering that if there are further disagreements concerning an ESI protocol, the parties should consider the use of predictive coding).

26. No. 10 C 5711, 2012 WL 4498465, at *5 (N.D. Ill. Sept. 28, 2012).

27. The Sedona Principles represent a best-practice guideline for eDiscovery that evolved out of the discussions of The Sedona Conference®, a nonpartisan law and policy think tank.

28. EORHB, Inc., 2013 WL 1960621, at *5 (quoting The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, 8 SEDONA CONF. J. 189, 193 (Fall 2007)).


29. No. 2:11-CV-00678-LRH-PA, 2014 WL 3563467 (D. Nev. July 18, 2014).

30. Id. at *2.

31. Id.

32. Id.

33. Progressive applied predictive coding only to the 565,000 documents that were returned after a keyword search, rather than to the universe of documents that it had collected, contradicting its own vendor’s recommended best practices for the use of predictive coding. Id. at *11.

34. Id. at *2.

35. Id. at *10-*11. In denying Progressive’s attempt to use predictive coding, Magistrate Judge Leen’s opinion misstated the amount of disclosure required from a party concerning a seed set. Specifically, Judge Leen stated that “courts have required the producing party to provide the requesting party with full disclosure about the technology used, the process, and the methodology, including the documents used to ‘train’ the computer.” Id. at *10 (emphasis added) (citing Da Silva Moore, 287 F.R.D. at 191; In re Actos (Pioglitazone) Prods. Liab. Litig., No. 6:11-md-2299, 2012 WL 7861249 (W.D. La. July 27, 2012)). However, the courts in Da Silva Moore and In re Actos, to which Magistrate Judge Leen cited for this proposition, did not require the parties to disclose which documents they had used as their seed set. Rather, the parties had agreed, as part of their stipulated ESI protocols, to turn over these documents. Da Silva Moore, 287 F.R.D. at 186 (“MSL . . . had agreed to provide all 2,399 documents (and MSL’s coding of them) to plaintiffs for their review”) (emphasis added); In re Actos, 2012 WL 7861249, at *3 (“The Parties have discussed the methodologies or protocols for the search and review of ESI . . . and the following is a summary of the Parties’ agreement . . .”) (emphasis added).

36. William A. Gross Const. Associates, Inc. v. Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009).

37. The conversion of images to text is accomplished through Optical Character Recognition (“OCR”) software.


1600 John F. Kennedy Blvd., Suite 1655, Philadelphia, PA 19103, USA Telephone: (215)564-1788 1-800-MEALEYS (1-800-632-5397)
