
1  |  Page   www.cobralegalsolutions.com  

READY FOR THE MATRIX?

MAN VERSUS MACHINE

by Laura Ewing-Pearle, CEDS, Assistant Director, Client Services, Cobra Legal Solutions

 

In a February 14, 2014 order, Judge Denise Cote gave predictive coding vendors a Valentine’s Day present, writing that “predictive coding had a better track record in the production of responsive documents than human review”[i].

She was quoting the 2011 article by Maura R. Grossman and Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” but she had signaled her beliefs much earlier in the case, during a telephone conference in August 2012: “I think there's every reason to believe that, if it's [predictive coding] done correctly, it may be more reliable -- not just as reliable but more reliable than manual review, and certainly more cost effective”[ii]. In late 2014, Judge John Copenhaver even opened the door to computer review for privilege[iii].

Is predictive coding making human review obsolete? Should we endorse the view of Agent Smith from The Matrix: “Never send a human to do a machine's job”?

With the confusing mingling of TAR, CAR, and predictive coding, perhaps a few definitions are in order. The idea of Technology-Assisted Review (TAR) is not new; technology can “assist” by searching for keywords, clustering documents based on similar concepts, grouping documents by percentage of near-duplication, and more. Predictive coding (usually also what is meant by “Computer-Assisted Review,” or CAR) takes the concept one step further: the computers actually code documents, based on an algorithm, semantic indexing, or some other form of iterative learning[a]. Predictive coding is not, however, a single fixed process: even experts disagree on how to select seed sets (random, judgmental, or mixed), whether to layer in search terms, and which analytic and coding methodology is best or most accurate. This article will not attempt to delve into the details of these processes; rather, we will discuss the question raised by Judge Cote, Maura Grossman, and others: Are machines always better at reviewing and coding documents? Or are humans still superior to computer decision-making in many current circumstances?

The general consensus is that predictive coding saves review time, and therefore money, by eliminating the need to review non-responsive documents, and in many cases this is true. The process starts when a subject matter expert, or SME, codes a seed set of documents as either responsive or non-responsive. The SME is usually described as a member of outside counsel who has already interviewed multiple custodians and is intimately familiar with the issues of the case[b].

[a] Even iterative learning can be further divided into “continuous active learning” and “simple passive learning.”

Because most predictive coding technology at this point is basically a binary (yes/no) classifier[c], the SME is not coding for issues or privilege at the same time; rather, the SME codes only for inclusion or exclusion based on responsiveness. Most tools also allow the SME to highlight the sections of responsive documents that will help the computer define “responsiveness.” Generally, if you are aiming for a confidence level of 95% with a confidence interval (margin of error) of ±2.5%, your sample set will be around 1,500 documents, whether your total population is 10,000,000, 1,000,000, or 100,000 documents[iv].
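Why does the sample barely change as the population grows tenfold? The standard sample-size formula for estimating a proportion depends mainly on the confidence level and the margin of error, and only slightly on population size. The sketch below is a minimal illustration, assuming simple random sampling, the most conservative proportion p = 0.5, and a finite population correction. We assume, but the article does not state, that the cited online calculator works this way; the function name and defaults are our own.

```python
import math

# Two-sided z-scores for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.025, p: float = 0.5) -> int:
    """Documents to sample to estimate a proportion within +/- margin.

    Assumes simple random sampling, p = 0.5 (the most conservative
    choice), and a finite population correction.
    """
    z = Z_SCORES[confidence]
    n0 = z * z * p * (1 - p) / (margin * margin)  # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return math.ceil(n)                           # round up to whole documents

for population in (10_000_000, 1_000_000, 100_000):
    print(population, sample_size(population))           # margin of error ±2.5%
print(1_000_000, sample_size(1_000_000, margin=0.02))    # margin of error ±2%
```

Run as written, this prints 1537, 1535, and 1514 for the three populations at ±2.5%, and 2396 for one million documents at ±2%. That matches the roughly 1,500 and 2,400 figures in the text.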

Change one variable slightly (say, narrow the confidence interval to ±2%) and your seed set for one million documents jumps to about 2,400. With a judgmental set, an SME can “plant” documents known to be responsive in the seed set to ensure the correct documents are found. No matter how the sample size is determined, and no matter which type of seed set is chosen, the seed set documents need to be reviewed, and possibly produced[d]. Once the seed set is coded, the computer applies its logic, and the population of documents is divided into three sets: documents the computer has coded responsive, documents coded non-responsive, and documents the computer could not code based on the available information (the “unknowns”). To check the quality of the computer’s work, and to help it learn so that it can code the unknown documents, the SME now reviews a new sample set. This iterative process can take as few as three rounds or as many as forty-five.

Obviously, if you are using outside counsel at $350/hour as the sole SME to code ten iterations of samples, you may not be saving as much money as you expect. But think of the savings if an SME reviews only 6,000 documents and the computer eliminates the review of 400,000 non-responsive documents. If Grossman and Cormack are correct and the computer is more accurate as well, so much the better: according to their report, predictive coding has a 67–86% accuracy rate versus 25–80% for human review.
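The three-way split into responsive, non-responsive, and “unknown” documents can be pictured with a short sketch. The article does not say how a tool decides that a document is unknown. We assume, for illustration only, that the engine gives each document a relevance score between 0 and 1 and leaves scores between two cut-offs uncoded. The `Document` type, the scores, and the 0.8/0.2 cut-offs are hypothetical; real platforms compute scores and choose thresholds in their own ways.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    score: float  # hypothetical engine score: estimated likelihood of responsiveness

def triage(docs, responsive_cutoff=0.8, nonresponsive_cutoff=0.2):
    """Split scored documents into the three sets described in the text.

    The cut-offs are illustrative assumptions, not values from any tool.
    """
    buckets = {"responsive": [], "non-responsive": [], "unknown": []}
    for doc in docs:
        if doc.score >= responsive_cutoff:
            buckets["responsive"].append(doc.doc_id)
        elif doc.score <= nonresponsive_cutoff:
            buckets["non-responsive"].append(doc.doc_id)
        else:
            # Too uncertain to code: a candidate for the SME's next sample.
            buckets["unknown"].append(doc.doc_id)
    return buckets

docs = [Document("DOC-001", 0.93), Document("DOC-002", 0.05),
        Document("DOC-003", 0.55), Document("DOC-004", 0.21)]
result = triage(docs)
print(result)
```

In this example DOC-001 is coded responsive, DOC-002 non-responsive, and DOC-003 and DOC-004 stay unknown. In the iterative process the text describes, the SME's coding of each new sample would be fed back to retrain the engine, which should gradually shrink the unknown set.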

But is that always true? Ignore for the moment whether the low rates for human review were based on accurate studies (Ralph Losey has written an excellent article on this topic). Have the analysts examined any of the advantages of human review? A few points to consider:

[b] Note: A few predictive coding bloggers are starting to assert that a team of reviewers can code the seed set as accurately as one SME.

[c] A notable exception is XERA’s Predictive Review, which allows for simultaneous issue tagging.

[d] Several judges recommend or require producing all non-privileged documents from the seed set, even non-responsive ones, to determine whether the entire process has been tainted. A recent order by Judge Brown requiring disclosure and transparency because of the miscoding of non-responsive seed documents can be found in Bridgestone Americas v. IBM, 3:13-cv-01196 (M.D. TN), Order filed February 5, 2015.


1. Predictive coding algorithms need text to analyze for content and, to a more limited extent, context. Ergo, documents with limited text are either intentionally omitted from decision sets or fall into the “unknown” set. This includes a plethora of electronic documents used in the course of business: CAD drawings, Excel and other financial spreadsheets, and Visio diagrams are just a few.

2. Related to the above are image-based documents (jpg, png, bmp, gif, etc.) as well as documents containing images and limited text (PowerPoint presentations, Word documents that use SmartArt, and more). Even emails can fall into this category, given the ubiquitous use of photos and Google Images. Let’s say you have an employment case in which an employee’s antagonism towards her boss is a key issue. The SME codes the few documents containing “My boss is evil” as responsive, and adds a few created documents with words like “anger” to a judgmental set. How will the computer handle the email below? Whether using semantic indexing or algorithms, machines cannot read these images or detect sarcasm, so they miss the malicious intent.


3. Depending on your platform, metadata is not always included in the computer’s analysis of a document. How important is this? We’ve all worked cases in which certain emails from “Sally Fields” to “Tom Hanks” are considered responsive, even if they only say “How’s the weather?” If a predictive coding tool does not or cannot search and analyze metadata, these messages either wind up in the large “unknown” bucket or get tossed into the non-responsive set.

4. We h8 #SocialMedia; it’s a pain in the YKW. Social media, text messages, and instant messaging are the new sources of relevant data, and all are replete with misspelled words and odd acronyms so people can share posts that would otherwise be NSFW. BTW, if u dk this, ask yr kids. Issues in this arena are compounded by the fact that punctuation is rarely indexed, making # or #(%* impossible to read.

How does predictive coding work with the following?

 

5. Time and money savings are not immediate. In the bundled Federal Housing cases against the banks, the FHFA argued that it had concerns about meeting deadlines because of the “testing and retesting” needed, and added: “again, the court in Da Silva Moore recognized that predictive coding may require extensions of the discovery period because it's impossible to predict when the program will be sufficiently trained”[v]. For cases involving more than one million documents, the time spent training a tool can pay off down the road. More recently, Judge Peck noted that “fear of spending more in motion practice than the savings from using TAR”[vi] can discourage parties from using this technology. Indeed, The Legal Intelligencer posited in January 2015 that “expense and time” could actually be barriers to predictive coding, stating: “Where no search terms are applied prior to predictive coding, the volume of responsive documents identified by the predictive coding engine could approach or exceed the volume from a keyword narrowed universe.”[vii] Smaller cases can benefit from concept clustering and from bulk-coding documents non-responsive based on concepts, domain names, or other facets, achieving the same results without the time and expense of training a tool.

6. Receiving reimbursement for technology costs under 28 U.S.C. § 1920 is much more difficult than receiving reimbursement for attorneys’ fees. (See Cobra’s white paper at http://www.cobralegalsolutions.com/pdf/Section_1920_Blues.pdf.)
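Points 1 through 3 suggest a practical safeguard: before trusting the predictive coding results, identify the documents the engine is likely to mishandle and route them to human reviewers. The sketch below shows a hypothetical pre-screen. It assumes each document record carries its file type, extracted text, and sender/recipient metadata. The `Record` fields, the 25-word threshold, the image-type list, and the key sender/recipient pair are illustrative assumptions, not settings from any review platform.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Record:
    doc_id: str
    file_type: str
    text: str  # extracted text; often empty for images, drawings, diagrams
    sender: str = ""
    recipients: list[str] = field(default_factory=list)

# Hypothetical settings, for illustration only.
MIN_WORDS = 25
IMAGE_TYPES = {"jpg", "png", "bmp", "gif"}
KEY_PAIRS = {("sally fields", "tom hanks")}  # sender/recipient pairs known to matter

def word_count(text: str) -> int:
    # Counts runs of letters, digits, and apostrophes; symbols such as '#' are
    # dropped, much as many indexes ignore punctuation (see point 4).
    return len(re.findall(r"[A-Za-z0-9']+", text))

def needs_human_review(rec: Record) -> list[str]:
    """Return the reasons a document should bypass the engine (empty if none)."""
    reasons = []
    if rec.file_type.lower() in IMAGE_TYPES:
        reasons.append("image-based file")
    if word_count(rec.text) < MIN_WORDS:
        reasons.append("little or no extracted text")
    sender = rec.sender.lower()
    if any((sender, r.lower()) in KEY_PAIRS for r in rec.recipients):
        reasons.append("key sender/recipient pair")
    return reasons

email = Record("EM-17", "msg", "How's the weather?", "Sally Fields", ["Tom Hanks"])
print(needs_human_review(email))
# ['little or no extracted text', 'key sender/recipient pair']
```

A rules-based pass like this does not replace the engine. It only ensures that documents with little text, image content, or key metadata reach a human instead of defaulting to “unknown” or non-responsive.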

In short, while predictive coding seems to be the future, many documents still need human review in 2015. As of now, Agent Smith’s assertion that computers are “the cure” seems premature.

Laura Ewing-Pearle, CEDS

An eDiscovery professional for almost ten years, Laura Ewing-Pearle currently works as Assistant Director – Client Services for Cobra Legal Solutions LLC. A Certified E-Discovery Specialist, Laura provides insight and clarity to clients on complex technical issues. Laura is a veteran of all three sides of the eDiscovery triangle: law firm, corporate client, and vendor. She worked for Nixon Peabody, a Global 100 Firm, and Thelen Reid Brown Raysman & Steiner, where she led eDiscovery efforts for a $200 million insurance case. Upon moving to Texas, Laura managed eDiscovery for Dell Inc.'s litigation team, which involved more than 2 TBs of data in the span of 2.5 years. At Scarab Consulting, she was promoted to Director of Project Management before leaving to start her own consulting business. Laura was the Director of the Austin Chapter of Women in eDiscovery for two years and has presented CLEs on Technology & Ethics in both Texas and Georgia, as well as seminars on “eDiscovery 101” and the role of the eDiscovery paralegal. She studied at Trinity University and graduated magna cum laude from San Francisco State University's ABA paralegal studies program.

 

[i] Federal Housing Finance Agency v. HSBC North America Holdings Inc., et al., 2014 WL 584300, February 14, 2014.

[ii] Federal Housing Finance Agency v. JPMorgan Chase & Co., Inc., et al., 11-CV-06188-DLC, Conference filed August 6, 2012.

[iii] Good v. American Water Works Co., Inc., 2014 WL 5486827 (S.D.W.Va.), October 29, 2014.

[iv] http://www.nss.gov.au/nss/home.nsf/pages/Sample+size+calculator

[v] Federal Housing Finance Agency v. JPMorgan Chase & Co., Inc., et al., 1:11-cv-06188-DLC (S.D.N.Y.), Telephone conference of July 24, 2012 (filed 08/06/2012).

[vi] Rio Tinto PLC v. Vale S.A., 2015 WL 872294 (S.D.N.Y.), March 2, 2015.

[vii] David R. Cohen and Marcin M. Krieger, “Seven Barriers to the Use of Predictive Coding,” The Legal Intelligencer, January 27, 2015, http://www.thelegalintelligencer.com/
