RISE OF THE MACHINES: Technology-Assisted Coding in the ESI Age. Robert J. Burns Benjamin R. Wilson


Academic year: 2021



It was not long ago that business — and with it, litigation involving business — was conducted far differently. Managers drafted memoranda, employees created reports and assistants typed letters. Photocopies were made, and files and archives organized all the paper. A litigator collecting documents would identify the right employees, find their relevant files and examine the archive index to locate historical documents. Perhaps a peek in the email account, and after reviewing for privilege and applying bates numbers, the production was done.

Today’s employee, at nearly every level and in nearly every field, generates little paper but mountains of data. Email chains lengthen, splinter and multiply. Texts and instant messages fly from workstations, laptops and mobile devices. Documents reside on file servers and collaborative workspaces, and iterations proliferate. Materials are exchanged within and between companies on encrypted websites. Operative business and contractual communication often occurs via electronic exchange, not signed paper. The speed of business, the shrinking administrative workforce and the availability of useful (albeit limited) search tools all mean that nobody pays attention to organizing this amorphous data for posterity. And then the lawsuit hits.

Litigators were among the last to realize that business, and business litigation, had changed. At first, we pretended that electronic data was another category of paper, and satisfied ourselves (and, we thought, our discovery obligations) by asking lay custodians to identify, search and print their relevant electronic files. Then, Zubulake was issued, the Federal Rules were amended and courts took the offensive on discovery of electronically stored information (ESI). Minimizing ESI search and production obligations now posed untenable risks to our cases and to our clients.

Faced with this seismic shift, we improvised. Full manual review was impossible in most large cases, so we negotiated search terms, hoping to stumble on balanced terms that identified relevant documents and excluded the irrelevant. But terms proved under-inclusive, over-inclusive or both, leading to few needles in large haystacks. We hired armies of contract attorneys to sift through data to identify responsive (and "hot") documents. But even the brightest contract attorney had limited expertise with the legal issues in a case and limited visibility into the factual issues. This produced idiosyncratic decisions and, often, inconsistency and error. Then, once the review was completed and the contract army disbanded, the institutional knowledge developed by those closest to the documentary record largely


gigabytes exchanged, this process — which became the new standard for discovery compliance — advanced substantive case development too little, or not at all.

Emerging technology created this quandary. Now, emerging technology holds new promise to mitigate these challenges for litigators, and for business clients weary of paying too much for too little return. 2012 saw the first steps toward judicial validation of new "technology-assisted review" approaches that endeavor to re-balance the scale. 2013 has witnessed judicial reliance, endorsement and advocacy of machine learning tools that increase effectiveness and efficiency in locating responsive documents.

Technology-assisted review will not produce perfect discovery. In today’s data-intensive business world, no review approach will.1 Nor will these new tools completely replace previously employed review tools, at least not for a while. Indeed, for smaller data sets, the cost of using predictive coding may well outweigh any practical benefit of its use.2

However, for the right case, these new technologies offer efficient and defensible processes for reviewing large quantities of ESI with more consistency and fewer errors than the alternatives. Not only do these tools hold the promise of fewer overall dollars spent, they promise that the dollars will be spent in a smarter way. These new tools place a premium on the training of a computer algorithm by lawyers with knowledge of the facts, the law and the key issues. The more those training the algorithm know about the case, the smarter the algorithm will be and the better it will work. This increases overall accuracy, expedites identification of critical documents and increases early hands-on facility among the litigation team with the documentary record. In short, these tools promise not only to satisfy a party’s discovery obligations, but to more completely and more expeditiously arm the party’s litigators for the litigation to come.

The Methodology

The use of technology to assist with manual review is not new. Keyword searching applies search terms (sometimes Boolean, proximity and/or wildcard terms) to limit the universe of data for review. "Deduplication" and grouping of "near duplicates" further reduce the quantity of documents to be reviewed and minimize the risk of inconsistent review. Threading email chains enables all iterations of the same chain to be coded together, enhancing efficiency and consistency. "Smart filters" permit reviewers to restrict documents by email or domain address, for example, and eliminate many of the "junk" emails captured by keywords. Each of these tools is employed to winnow the review set and to streamline what remains, in essence, a manual review process.
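To make the winnowing steps concrete, here is a minimal sketch of exact-duplicate removal followed by a keyword filter with a trailing wildcard. It is an illustration only, not how any particular review platform works: the function names, the document layout (a dict with "id" and "text") and the sample terms are all assumptions, and real tools also handle near duplicates, proximity operators and email threading.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(documents):
    """Keep the first copy of each document whose normalized text is identical."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def keyword_filter(documents, terms):
    """Return documents matching at least one term; a trailing * acts as a wildcard."""
    patterns = []
    for term in terms:
        if term.endswith("*"):
            patterns.append(r"\b" + re.escape(term[:-1]) + r"\w*")
        else:
            patterns.append(r"\b" + re.escape(term) + r"\b")
    combined = re.compile("|".join(patterns), re.IGNORECASE)
    return [doc for doc in documents if combined.search(doc["text"])]


# Hypothetical four-document collection; documents 1 and 2 differ only in case and spacing.
collection = [
    {"id": 1, "text": "Please review the attached supply contract."},
    {"id": 2, "text": "please review the  attached supply contract."},
    {"id": 3, "text": "Lunch on Friday?"},
    {"id": 4, "text": "The contractor missed the delivery date."},
]
review_set = keyword_filter(deduplicate(collection), ["contract*", "delivery"])
review_ids = [doc["id"] for doc in review_set]
```

Even in this toy form the limits are visible: the filter keeps whatever contains a listed term and drops whatever does not, regardless of relevance, and every document that survives still has to be read by a person.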

1 "There is simply no [discovery] tool that guarantees perfection." Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC)(AJP), 2012 WL 1446534, at *3 (S.D.N.Y. Apr. 26, 2012).

2 See, e.g., EORHB, Inc. v. HOA Holdings, LLC, No. 7409-VCL, 2013 WL 1960621 (Del. Ch. May 6, 2013) (modifying a prior order requiring parties to conduct a review with the assistance of predictive coding where parties subsequently agreed that, based on the low volume of relevant documents expected to be produced in discovery, the cost of using predictive coding would likely outweigh any practical benefit of its use).


Now, new software technology platforms offer iterative learning methods intended to reduce, but not replace, human review. These tools are referred to variously as "technology-assisted coding" and/or "predictive coding." Though the algorithms and predictive analytics offered by particular software differ, these tools all rely on attorneys to train an algorithm that is applied to identify responsive, privileged and "hot" documents based upon similarity to the human-coded set. As documents are retrieved, lawyers (preferably those with significant knowledge of the facts, the law and the key issues in the case) instruct the algorithm whether its "predicted" coding is accurate. This process repeats itself over multiple iterations. As the database "trains" itself with more human reviewer feedback, the accuracy of the algorithm's predictions improves.

In practice, one or more members of the case team review an initial set of documents randomly selected from the full review population. These attorneys code each as either responsive or non-responsive, and with coding indicating substantive issues, privilege concerns and/or criticality. Using built-in analytical tools — usually, a sophisticated mix of keywords, Boolean connectors, concept searches and categorical groupings — the database identifies underlying elements and properties of the coded documents, and uses those elements and properties to make coding predictions for the un-reviewed universe of documents. A new set of documents is fed to the attorney review team, along with the algorithm’s "predicted" coding for each.

Attorneys correct the proposed coding if necessary and "re-train" the algorithm; then the process repeats. When the database’s predictions and the attorneys' coding coincide to a determined level of confidence, the system has learned enough to make confident predictions for the remaining documents. Although timelines vary somewhat, attorneys may only need to review a small fraction of the overall data set to reach this point. The process concludes with quality control rounds, where random samples are selected and computer-predicted results are tested against human coding. If the coding corresponds to a determined degree of confidence, the process is complete. If not, the algorithm returns for further rounds of training until it meets the quality control metric.
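The train, predict, correct and re-train cycle described above can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions: ToyModel is a hypothetical stand-in for a vendor's algorithm (it merely counts words), attorney is a simulated reviewer, and the batch size, agreement target and sample size are arbitrary values chosen for the example.

```python
import random
from collections import Counter


def tokens(text):
    """Split a document into lowercase words."""
    return text.lower().split()


class ToyModel:
    """Hypothetical stand-in for a vendor algorithm: scores words by how often
    they appeared in responsive versus non-responsive training documents."""

    def __init__(self):
        self.responsive = Counter()
        self.other = Counter()

    def train(self, text, is_responsive):
        target = self.responsive if is_responsive else self.other
        target.update(tokens(text))

    def predict(self, text):
        score = sum(self.responsive[w] - self.other[w] for w in tokens(text))
        return score > 0


def run_review(documents, attorney_codes, batch_size=4, target_agreement=0.9, seed=0):
    """Predict a batch, let the attorney correct it, retrain, and stop once
    predictions and attorney coding agree at the target rate."""
    rng = random.Random(seed)
    pool = list(documents)
    rng.shuffle(pool)  # each batch is a random draw from the review population
    model = ToyModel()
    rounds = 0
    while pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        rounds += 1
        agreed = 0
        for doc in batch:
            predicted = model.predict(doc)
            actual = attorney_codes(doc)  # the human decision always controls
            agreed += predicted == actual
            model.train(doc, actual)
        # The first round is skipped because the untrained model has learned nothing yet.
        if rounds > 1 and agreed / len(batch) >= target_agreement:
            break
    predictions = {doc: model.predict(doc) for doc in pool}
    return model, rounds, predictions


def quality_control(predictions, attorney_codes, sample_size, seed=1):
    """Draw a random sample of machine-coded documents and measure agreement with human coding."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(predictions), min(sample_size, len(predictions)))
    if not sample:
        return 1.0
    matches = sum(predictions[doc] == attorney_codes(doc) for doc in sample)
    return matches / len(sample)


# Hypothetical review population: ten pricing emails and ten lunch emails.
documents = [f"pricing call with competitor {i}" for i in range(10)]
documents += [f"lunch menu for friday {i}" for i in range(10)]


def attorney(text):
    """Simulated reviewer: anything about pricing is responsive."""
    return "pricing" in text


model, rounds, predictions = run_review(documents, attorney)
agreement = quality_control(predictions, attorney, sample_size=5)
```

In this sketch the attorney reads only the documents in the training batches; everything left in the pool is coded by the model and then spot-checked. If the quality control agreement fell below the agreed metric, the natural next step would be to feed the sampled corrections back into train and repeat.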

It is important to remember that selecting a particular tool is not one-size-fits-all. Each software platform employs slightly different approaches, and each provides distinctive workflows. Accordingly, consideration of predictive coding technologies should be a collaborative process between a client, outside counsel, litigation support staff and prospective vendors. The selection should reflect consideration of the volume of ESI, the scope of review, applicable timing and cost. Also, certain cases may benefit from hybrid approaches, such as where predictive coding is applied to thorny data sets (e.g., large email archives) while remaining documents are reviewed via traditional manual review, or where keyword and deduplication processes first reduce the volume of a data set and predictive coding is then used to locate responsive documents within the remaining corpus.3

3 See, e.g., In re Biomet M2A Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391, 2013 WL 1729682 (N.D. Ind. Apr. 18, 2013) (endorsing, over the plaintiffs' objection, the defendant's use of predictive coding to identify relevant documents for production from the 2.5 million that emerged from the keyword and deduplication processes, which initially reduced the universe of documents and attachments from 19.5 million).


Benefits of Predictive Coding

We enumerate here several of the chief advantages of predictive coding technologies that we highlighted in the discussion above.

1. Reliability

Magistrate Judge Peck of the U.S. District Court for the Southern District of New York, the first judge to publicly endorse the use of predictive coding, explained: "while some lawyers still consider manual review to be the 'gold standard,' that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review."4

While some may resist the idea that machines can perform portions of our jobs better than we do, human error is endemic in manual review. As litigators know and as the judiciary has begun to recognize, "such review is prone to human error and marred with inconsistencies from the various attorneys' determination of whether a document is responsive."5 Large-scale manual review almost invariably requires the use of contract attorneys. These attorneys' knowledge of a client's industry and its business is likely negligible, and their experience litigating cases involving similar issues is limited. Their knowledge of the facts of the case is contained entirely within the background information provided in the course of their review. The amount of data to be reviewed is enormous, and court-imposed deadlines are often tight, requiring long hours and sprawling teams. Individual reviewers differ in their ability to maintain alertness, spot responsive documents, assess a document's potential criticality to the case, navigate privilege issues, and make correct and consistent decisions about marginal documents.6

Notwithstanding quality control procedures, inconsistencies, inaccuracies and errors inevitably remain.

Keyword searches are not the answer, for several reasons. First, these searches are formulated at the earliest stages of case development, often before extensive interviews of key players, and generally before the issues and applicable lexicon are understood fully. Second, such searches fail to capture variations (for example, slang, misspellings and acronyms), which can exclude key data. Third, keyword searches often capture a large amount of irrelevant data caused by "false hits," requiring the same extensive manual review processes described above. In short, as Judge Shira Scheindlin of the U.S. District Court for the Southern District of New York put it, "[s]imple keyword searching is often not enough . . . there is increasingly strong evidence that keyword searching is not nearly as effective at identifying relevant information as many lawyers would like to believe."7

4 Moore v. Publicis Groupe, 287 F.R.D. 182, 190 (S.D.N.Y. 2012), adopted by, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).

5 Moore, 2012 WL 1446534, at *3.

6 Herbert L. Roitblat, Predictive Coding and Defensibility at 1 (2011 Orcatec).

7 Nat'l Day Laborer Org. Network v. U.S. Immigration and Customs Enforcement Agency, 877 F. Supp. 2d 87,


Utilizing manual review or keyword searches — or, most commonly, a hybrid of the two — often results in coding and production inconsistencies. Inconsistencies become fertile ground for exploitation by adversaries, often resulting in costly fights where nothing is gained and credibility may be lost. Similarly, these methods pose the risk that relevant, even critical, documents may remain undiscovered; the producing party may be without the benefit of documents that would make its case, or may be subject to sanctions for non-production of documents that would make its adversary's case.

Technology-assisted coding avoids many of these pitfalls by permitting a single attorney or small team to review and categorize large quantities of ESI with lower effort, and demonstrably greater consistency and accuracy. The software mechanisms for training the algorithm rely on a flexible and proven set of methods for identifying documents similar to those deemed relevant, or "hot," such that the risk of overlooking critical documents is reduced. And once trained to a defensible degree of confidence, the algorithm is applied consistently across the universe of collected data, eliminating inconsistencies by ensuring that all data is reviewed pursuant to a single set of parameters.

2. Prioritizing Review

Moreover, technology-assisted review allows for efficient workflows. Documents may be batched for review based upon the algorithm’s prediction of the likelihood that they are responsive and/or "hot." In a case where depositions will immediately follow large productions, those preparing for depositions will have the benefit of the most critical documents much earlier in the process. Also, prioritizing review allows a party to expedite the algorithm training process, such as by assigning documents with the highest predicted relevance to attorneys with the most underlying knowledge of the case.
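Batching by predicted likelihood amounts to a sort followed by a split. The sketch below assumes each document already carries a relevance score between 0 and 1 from the algorithm; the function name, document identifiers and scores are hypothetical.

```python
def prioritize(scored_documents, batch_size):
    """Sort documents by predicted responsiveness, highest first, and split them into review batches."""
    ranked = sorted(scored_documents, key=lambda item: item[1], reverse=True)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]


# Hypothetical (document id, predicted relevance score) pairs.
scored = [
    ("DOC-001", 0.12),
    ("DOC-002", 0.91),
    ("DOC-003", 0.55),
    ("DOC-004", 0.78),
    ("DOC-005", 0.03),
]
batches = prioritize(scored, batch_size=2)
first_batch_ids = [doc_id for doc_id, _ in batches[0]]
```

Here the first batch contains DOC-002 and DOC-004, the two highest-scoring documents, so it could be routed to the attorneys who know the case best or to the team preparing for depositions.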

3. Enhanced Institutional Knowledge

Leaving aside error rates, manual contract attorney review also suffers from a lack of institutional knowledge and memory: once the project is complete, the contract team disbands, and their coding notations are the only record of their process. Although this constitutes compliance with discovery obligations, it often leaves counsel ill-equipped to use the documentary record effectively in depositions, motion practice and at trial. Counsel who will be handling the litigation going forward are often several steps removed from the teams wading through data. Although senior litigation counsel may supervise the overall review process and may review certain key documents as they are identified, it is often not until deposition preparation that senior litigators in document-intensive cases get their arms around the documentary record. By that time, the senior litigator has a document set filtered through multiple sets of junior attorneys: the contract attorneys who identify responsive and "hot" documents, the associates who quality-control the production and assemble a definitive set of critical documents, and the associate who pulls together critical documents relevant to


Predictive coding turns this dynamic on its head. By having a senior attorney (or a team thereof) assume active involvement in the initial stages of teaching the database what is responsive and what is important, these senior attorneys acquire a greater understanding of the documentary record as it is being developed. And, by involving senior attorneys in the initial stages of review, those attorneys' knowledge of industry, client, facts and law is incorporated into their coding, which in turn produces a more robust algorithm. Junior attorneys may be called upon to complete the process, but with the algorithm doing much of the heavy lifting that contract attorneys once performed, review teams are leaner and composed of associates who will continue to work on the case going forward.

Similarly, once the investment is made to train the algorithm, the algorithm stays trained. If a new set of client data is later collected, the trained algorithm can be applied to that new data set without the need to mount a brand-new review process. And when the opponent's production arrives, the algorithm — understanding by that point a great deal about what counsel considers most important — will quickly identify potentially critical documents for review.

All of this means that effective use of these tools allows a party’s litigation counsel — those responsible for setting litigation strategy, counseling clients and handling subsequent stages of litigation — to get much smarter, much faster.

Conducting a Defensible Technology-Assisted Review

Though technology-assisted coding is relatively new, and was judicially approved last year, courts are trending in its favor. Indeed, in 2013, several federal courts across the United States have encouraged parties to use predictive coding and acknowledged the advantages of this technology.8

In light of the many advantages detailed above, now is the time to consider using predictive coding in appropriate cases. Below is a non-exhaustive list of guidelines, gleaned from recent experience and from the few court opinions issued to date, for developing and implementing an efficient, robust and defensible predictive coding process.

8 See, e.g., Chevron Corp. v. Donziger, No. 11 Civ. 0691(LAK), 2013 WL 1087236, at *32 (S.D.N.Y. Mar. 15, 2013) (encouraging a non-party to use predictive coding to reduce the burden and effort required to comply with a subpoena); Hinterberger v. Catholic Health Sys., Inc., No. 08-CV-378S(F), 2013 WL 2250579 (W.D.N.Y. May 21, 2013) and Gordon v. Kaleida Health, No. 08-CV-380S(F), 2013 WL 2250603 (W.D.N.Y. May 21, 2013) (noting that the court previously directed the parties to consider a computer-assisted ESI review and production method in accordance with Moore, and denying a motion to compel where defendants agreed to meet and confer with plaintiffs and plaintiffs' ESI consultants regarding defendants' production using predictive coding); cf. Gabriel Techs. Corp. v. Qualcomm Inc., No. 08-cv-1992 AJB (MDD), 2013 WL 410103, at *10 (S.D. Cal. Feb. 1, 2013) (awarding defense counsel over $2 million in fees under 35 U.S.C. § 285 attributable to computer-assisted review and finding defendants' "decision to undertake a more efficient and less time-consuming method of document review to be reasonable under the circumstances"); Nat'l Day, 877 F. Supp. 2d at 109 (Scheindlin, J.) (describing predictive coding as an emerging best practice for dealing with the shortcomings of traditional keyword searches).


First, the Sedona Conference Cooperation Proclamation states that "the best solution in the entire area of electronic discovery is cooperation among counsel."9 Predictive coding is no exception to this rule. Absent good reason not to, counsel should advise opposing counsel that it intends to use technology-assisted coding and attempt to secure opposing counsel's agreement. Counsel should also confer with opposing counsel on a review protocol. A non-exhaustive list of issues for discussion includes: (1) use of keywords in the collection of documents, (2) number of custodians, (3) size of the seed set, (4) use of concept groups, (5) number of iterative rounds to stabilize algorithm training, and (6) targeted confidence level. Open discussion with opposing counsel on these issues can ensure defensibility by securing agreement, or can narrow disputed issues for judicial resolution. At a minimum, up-front discussion prevents later claims of sandbagging.

Second, in business litigations where discovery burdens are roughly balanced, the invitation to use predictive coding will often be welcome and may result in a bilateral agreement.10

Third, counsel should continue its transparency throughout the review process. This may mean providing opposing counsel with: (1) a list of custodians, (2) keywords applied, (3) documents reviewed as a function of the initial seed or control set, whether they were ultimately coded as responsive or nonresponsive, (4) issue codes or concept groups, and/or (5) proof of a valid quality control process, including the confidence level determined to conclude the review. Timely disclosure of these matters could strengthen the protocol's defensibility in the event of later challenge. Such disclosure could require the adversary to articulate its objection(s) on the matters being disclosed or risk waiver.

In the event an adversary also elects to use technology-assisted coding, counsel should consider whether to engage a single vendor and split database costs. Of course, if this approach is proposed, the parties need a protocol for protecting the confidentiality of unproduced and privileged documents, and of party-specific issue coding.

11

Fourth, regardless of disclosure, all aspects of the process should be carefully documented. Even where there is agreement among counsel as to the use of predictive coding, the variations among tools and the presence of individualized determinations counsel in favor of meticulous documentation. The need to document becomes stronger still where the decision to use predictive coding is made unilaterally or over the adversary’s objection. Further, the

9 Moore, 287 F.R.D. at 192 (citing www.TheSedonaConference.org).

10 Consider, for example, the parties' bilateral agreement endorsed by a federal judge of the U.S. District Court for the Western District of Louisiana. In re Actos, MDL No. 6:11-md-2299, 2012 WL 7861249, at *3-8 (W.D. La. July 27, 2012). In training the database, both parties agreed to designate three individuals to work collaboratively to review the initial random documents collected from four agreed-upon custodians identified by the defendant. The plaintiffs' designated individuals, however, had to sign nondisclosure and confidentiality agreements, agreeing not to disclose information to co-counsel or their clients without defendant's consent. At the conclusion of this training phase, the parties agreed to meet and confer regarding which relevance score would provide a cutoff for documents to be manually reviewed by defense counsel for production.


responsibility for documentation should not be ceded to the vendor; counsel and/or its litigation support staff should document the process that they may be called upon to defend.

Fifth, to further control costs and fees, a party may consider staging review. For example, this could involve collecting and reviewing documents solely from sources or custodians most likely to have relevant data, without prejudice to the requesting party seeking additional documents after the conclusion of that first-stage review. Or, where the client has identified an initial set of key documents, these documents can be seeded into an initial training set as responsive and "hot," potentially shaving rounds off of the training process.

A Highly Promising Tool

In sum, predictive coding technologies hold great promise for extricating businesses, and their litigators, from the burdens posed by data proliferation. Structured and executed properly, a review protocol using these tools can be an efficient, cost-effective and defensible means of complying with discovery obligations. But just as importantly, these new technologies advance case development by enhancing review accuracy and consistency, increasing the likelihood that key documents are captured and allowing a party and its senior litigation team real-time visibility into the documentary record as it develops.

__________________________

Robert J. Burns is a partner in the Litigation Practice Group of Holland & Knight's New York City office. Mr. Burns has a broad complex business litigation practice, with particular focus upon antitrust, product liability, and insurance actions. In recent years, he has represented U.S. and foreign clients across a wide range of industries, including aviation and transportation, insurance and reinsurance, manufacturing and distribution, and finance and investment.

Benjamin R. Wilson is an associate in the Litigation Practice Group of Holland & Knight's New York office, where he is admitted to practice in both New York and New Jersey state and federal courts. Mr. Wilson concentrates on commercial litigation, including breach of contract, fraud and unfair competition claims. His experience encompasses all phases of the litigation process, from pre-trial discovery and motion practice through trial and appeal. Mr. Wilson also
