John Tredennick, Esq.
Founder and CEO, Catalyst Repository Systems
ARTICLE
An Open Look at Keyword Search vs. Predictive Analytics
Can Keyword Search Be As Effective as TAR?
Can Keyword Search Be As or More Effective Than Technology Assisted Review at Finding Relevant Documents?
A client recently asked me this question, and it is one I frequently hear from lawyers. The issue underlying the question is whether a TAR platform such as our Insight Predict is worth the fee we charge for it. The question is a fair one, and it can apply to a range of cases. The short answer, drawing on my 20-plus years of experience as a lawyer, is unequivocally, “It depends.”
Let me walk through my reasoning.
The Question
The question really is twofold: Can you get equal or better results with keyword search than with a product such as Predict? If so, can you do it at lower cost considering the hourly costs to develop and run the searches?
To answer these questions, we have to consider the following:
1. How many hours will it take to develop and test the searches, and at what cost?
2. How many documents do you end up reviewing to complete the process (precision)?
3. What level of recall was obtained?
4. The goal in all of this is to reduce the total cost of review across your cases. If it were my case, I wouldn’t care how that got done, just that it got done reliably.
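The considerations above can be rolled into one back-of-the-envelope calculation. The Python sketch below is a hypothetical illustration, not a Catalyst pricing formula: `ReviewPlan`, `review_cost`, and every rate and count in it are assumptions to be replaced with a matter’s real figures. Recall is carried alongside the cost rather than folded into it, because a cheaper review that finds fewer relevant documents is not an equal result.

```python
from dataclasses import dataclass


@dataclass
class ReviewPlan:
    """Inputs for one review approach; every figure is a hypothetical placeholder."""
    search_hours: float         # hours spent developing, testing and running searches
    hourly_rate: float          # billing rate for that search work
    docs_reviewed: int          # documents the team ends up reading
    cost_per_doc: float         # average review cost per document
    platform_fee: float = 0.0   # any flat tool fee (an assumption; 0 if none)
    recall: float = 0.0         # fraction of relevant documents found


def review_cost(plan: ReviewPlan) -> float:
    """Total cost = search development + document review + any platform fee."""
    development = plan.search_hours * plan.hourly_rate
    review = plan.docs_reviewed * plan.cost_per_doc
    return development + review + plan.platform_fee


# Hypothetical keyword-only plan; the numbers describe no real matter.
keyword_plan = ReviewPlan(search_hours=40, hourly_rate=250.0,
                          docs_reviewed=60_000, cost_per_doc=1.0, recall=0.70)
print(f"${review_cost(keyword_plan):,.0f} at {keyword_plan.recall:.0%} recall")
```

Running the same function on a second plan with its own hours, fees, review volume and recall gives a like-for-like comparison; the point is to compare cost only between plans that reach comparable recall.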
Measuring Success
To start the inquiry, we need a reliable measure of success. Several years ago, a different client challenged Predict after we ran it in parallel with the client’s keyword searches and it suggested that the team review more documents than the searches had hit. You can read that 2013 article, “My Key Word Searches are Better than Your Predictive Ranking Technology!”, at catalystsecure.com/blog.
To explain why Predict’s results made sense, I created this chart to show the four possible states of the search and Predict results.
As shown in the diagram, both approaches agreed on the documents in the top right and bottom left quadrants (specifically, on documents likely relevant and likely not relevant). The disagreement came in the top left and bottom right quadrants. In brief, Predict found many relevant documents that the keyword search missed, and it flagged a good number of the keyword search’s hits as false positives.
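The four states in the chart are a simple cross-tabulation of two sets of calls. A minimal Python sketch, assuming each approach’s results are available as sets of document IDs; the function name, quadrant labels and IDs are hypothetical, and the labels describe each quadrant’s content rather than its position in the chart:

```python
def quadrants(all_docs: set[str], keyword_hits: set[str],
              predicted_relevant: set[str]) -> dict[str, set[str]]:
    """Cross-tabulate keyword hits against a classifier's relevance calls."""
    return {
        # Both approaches say "likely relevant."
        "both_relevant": keyword_hits & predicted_relevant,
        # The classifier flags these, but no search term hit them.
        "classifier_only": predicted_relevant - keyword_hits,
        # Search terms hit these, but the classifier rates them non-relevant
        # (candidate false hits).
        "keyword_only": keyword_hits - predicted_relevant,
        # Both approaches say "likely not relevant."
        "neither": all_docs - keyword_hits - predicted_relevant,
    }


docs = {f"DOC-{i}" for i in range(1, 7)}
hits = {"DOC-1", "DOC-2", "DOC-3"}
predicted = {"DOC-1", "DOC-2", "DOC-4", "DOC-5"}
for name, ids in quadrants(docs, hits, predicted).items():
    print(name, sorted(ids))
```

The two disagreement quadrants (`classifier_only` and `keyword_only`) are the ones worth sampling, since they show where each approach is gaining or losing.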
Information retrieval scientists would couch this discussion in terms of precision and recall. The suggestion from the chart above is that keyword searches might have failed in terms of recall, at least if the goal was to reach a level of recall above 50%.
However, precision is equally important in calculating the total cost of review. If the method does well with recall but retrieves a lot of irrelevant documents in the process, you meet the recall goal but at the expense of increased review costs.
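In those terms, precision is the share of reviewed documents that turn out to be relevant, and recall is the share of all relevant documents that get found. Together they fix the review burden: reaching a recall target at a given precision means reading roughly (recall × relevant documents) ÷ precision documents. A minimal Python sketch of those standard definitions; the helper names and the 10,000-document figure are hypothetical:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved docs that are relevant.
    Recall: share of all relevant docs that were retrieved."""
    found = len(retrieved & relevant)
    precision = found / len(retrieved) if retrieved else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


def docs_to_review(total_relevant: int, target_recall: float, precision: float) -> float:
    """Documents that must be read to find target_recall of the relevant
    documents when the reviewed set runs at the given precision."""
    return target_recall * total_relevant / precision


# Hypothetical collection with 10,000 relevant documents and a 75% recall target.
for p in (0.50, 0.25):
    print(f"precision {p:.0%}: about {docs_to_review(10_000, 0.75, p):,.0f} documents")
```

Halving precision doubles the number of documents to read for the same recall (15,000 versus 30,000 here), which is the cost side of the trade-off described above.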
So Which Approach is Better?
Here is where the lawyer answer comes in: “It depends.”
While keyword search can be effective in finding relevant documents, research has shown that it can suffer from both low recall and poor precision. The oft-touted 1985 David Blair and M.W. Maron study, “An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System,” sets a framework for the discussion. In the case they studied, lawyers felt their keyword searches were quite effective, finding 75% of the relevant documents.
In fact, according to the scientists, the team had found (on average) just 20% of the relevant documents.[1] They were swayed by the seeming precision of the searches (seeing a lot of relevant documents) but didn’t realize they were missing many other relevant documents that didn’t hit on the searches.
You can read more about the study in my November 2015 blog post, “Revisiting the Blair and Maron Study: Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System.”
The antidote to this problem is to use broader searches, but that comes at the cost of lower precision. Recall can be pushed above the reported 20% average, but the gain will likely come with a drop in precision: the team will have to look at many more irrelevant documents, and total review costs will rise accordingly.
Some litigation support teams address this by using a sophisticated process of iterating keywords over a series of searches, sampling results and then refining the keyword searches. That can certainly improve results, but how much and under what circumstances are questions that require measurement. And, in doing so, you would want to include the cost of developing, refining and running those searches as well.
In contrast, most predictive analytics tools (and certainly Insight Predict) are designed algorithmically to provide the best bang for the buck across most cases. The goal is to reach a desired level of recall (e.g., 75%, 80%, 90% or 95%) while reviewing the fewest possible documents.
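That goal can be stated precisely: given documents ranked from most to least likely relevant, find the shortest stretch from the top of the ranking that contains the target share of relevant documents. The Python sketch below measures that review depth for an existing ranking; it does not produce a ranking and is not how Predict works internally. It assumes every document’s relevance is already known, as in a test collection; in a live matter the total number of relevant documents has to be estimated. The function name is hypothetical.

```python
import math


def review_depth_for_recall(ranked_labels: list[bool], target_recall: float) -> int:
    """How many top-ranked documents must be reviewed to reach target_recall.

    ranked_labels[i] is True if the document at rank i + 1 is relevant.
    """
    total_relevant = sum(ranked_labels)
    needed = math.ceil(target_recall * total_relevant)
    if needed == 0:
        return 0
    found = 0
    for depth, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant  # True counts as 1, False as 0
        if found >= needed:
            return depth
    return len(ranked_labels)  # only reached if target_recall > 1


ranking = [True, True, False, True, False, False, True, False]
for target in (0.5, 0.75, 1.0):
    print(f"{target:.0%} recall: review top {review_depth_for_recall(ranking, target)}")
```

The gap between the review depth and the number of relevant documents needed is the non-relevant reading a ranking costs; a better ranking pushes relevant documents toward the top and shrinks that gap.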
While I can’t explain how our proprietary algorithm works, it uses a sophisticated weighting of document features found through a continuous ranking process. The main difference here is that Predict uses tens of thousands of features with both positive and negative weighting during the process, and it refines the training literally hundreds (often thousands) of times during the review. To be sure, humans don’t build the searches (nor could they). But humans still control the process in that they identify the documents from which the searches are built.
[1] It is also important to note that this 20% recall figure was an average across many different trials in the study. In some trials, keywords performed much better, but they performed much worse in others. This wide variance in search term performance is another reason to be cautious when comparing techniques from only a few trials.
Comparing the Two Approaches
So, which approach is better? My first answer is that either approach could be better in a specific case, depending on facts and circumstances. For example, say you were asked to produce all documents containing the company name “Acme.” In that case, keyword search would be the quickest and most effective way to go.
We recently had a chance to participate in the Total Recall track of the TREC information retrieval program, which is sponsored by the National Institute of Standards and Technology (NIST). The Total Recall track was led by Gordon Cormack and Maura Grossman, with help from many others. Its goal was to use machine learning to find relevant documents quickly and with the least review possible. TREC restricts the release of results, so I won’t go into detail at this point. However, I can say that a number of the topics lent themselves to keyword search as readily as to machine learning. For example, one topic was “affirmative action,” which is not a typical e-discovery subject.[2] We used Predict to find relevant documents but later learned, by examining the answers, that a simple keyword search for “affirmative action or one world” would have returned more than 90% of the relevant documents with 63% precision. Several other topics, out of 30 in total, could also have been handled through the right combination of keywords.
How often is that the case in the real world of litigation and regulatory investigations? If your answer is often or always, then keyword search might be the answer for you. In my experience, and that of many others, keyword searches are useful but not effective or efficient at finding a high percentage of relevant documents with high precision.
[2] I should point out that the Total Recall track was not aimed at e-discoverists. Rather, several e-discovery-focused teams participated because it was as close to e-discovery as the tracks got. It was fun to participate, but the topics were not set up to match what we are used to in the legal realm, nor did they purport to be.
An added benefit of continuous active learning (CAL) is its ability to incorporate new documents into the collection at virtually any point in the review without sacrificing previous effort.
Ultimately, in each case you have to measure not only results but also the cost of getting those results. Our experience and the available research suggest that predictive analytics, and particularly a continuous active learning process, will outperform keyword search, in terms of both effectiveness and cost, in most cases and is a much safer bet as a standard practice.
Keyword vs. Predict
To be clear, we believe in the power of human-generated keyword searches. In fact, we encourage clients to use them for finding seed documents (as opposed to random selection used by some vendors) and for chasing down low-hanging fruit throughout the review process. For us, the question is not whether to use keyword searches. Rather, the question is how to use them most effectively during the review. Different cases will likely suggest different approaches.
If I were in a corporate counsel’s shoes, I would be asking how we might go about determining which cases might be better handled by keyword search rather than Predict. Independent research strongly suggests that keyword search can be less than optimal in many cases, missing large numbers of relevant documents (low recall) while bringing back too many non-relevant documents (low precision). Without doubt, smart searchers running an iterative process can improve results, but by how much and at what cost?
I am not aware of any research suggesting that keyword search can match the results of a good predictive analytics process across the board. In particular, it is rare to hear about keyword searches getting much above 75% or 80% recall. Perhaps others’ experiences are different, but any such conclusion should be backed by rigorous analysis, across a wide variety of cases, of the documents the searches did not return as well as those they did.
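One widely used way to examine the documents a search did not return is an elusion sample: draw a random sample from that null set, have reviewers code it, and project the share of relevant documents onto the whole null set to estimate how many were missed. The Python sketch below gives only a point estimate (a defensible protocol would also report a confidence interval); the function names and all figures are hypothetical.

```python
import random


def draw_elusion_sample(null_set_ids: list[str], sample_size: int,
                        seed: int = 42) -> list[str]:
    """Simple random sample, without replacement, of documents the search did NOT return."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(null_set_ids, sample_size)


def estimate_recall(relevant_found: int, null_set_size: int,
                    sample_size: int, relevant_in_sample: int) -> float:
    """Point estimate of recall from an elusion sample of the null set."""
    elusion_rate = relevant_in_sample / sample_size    # share relevant in the sample
    estimated_missed = elusion_rate * null_set_size    # projected onto the null set
    return relevant_found / (relevant_found + estimated_missed)


# Hypothetical: searches surfaced 3,000 relevant documents; 100,000 documents
# did not hit; reviewers found 5 relevant documents in a 500-document sample.
sample = draw_elusion_sample([f"DOC-{i}" for i in range(100_000)], 500)
print(len(sample), f"{estimate_recall(3_000, 100_000, 500, 5):.0%}")
```

In this made-up scenario, a null set that looks nearly empty (1% relevant) still hides an estimated 1,000 relevant documents, which caps recall at about 75%; that is the kind of shortfall the Blair and Maron lawyers could not see from the documents their searches returned.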
Given that you can rarely tell in advance which case is right for a keyword-only approach, it is hard to see why a legal team wouldn’t include predictive analytics as a core strategy. At the least, we need to measure the cost of Predict against the hourly cost of developing and running a keyword process. To that we would add the results differential (recall and precision) before making a comparison.
We recognize that TAR is a new process and that some clients may be leery about paying an extra charge for Predict without being certain of the results. For that reason, I initiated an unconditional guarantee for Predict costs. Simply put, if you use Predict and are unhappy with the results for any reason in the first 90 days, just say so. We will turn it off and refund the Predict fees without question. The bottom line is that we believe Predict with keyword search will be more effective than keyword search alone in almost every case. We believe that with such certainty that we are willing to put our proverbial money where our mouth is.
About the Author
John Tredennick is the founder and CEO of Catalyst. John is passionate about the role of search in e-discovery. Before founding Catalyst in 2000, he was a trial lawyer and litigation partner. John has been a frequent speaker on legal and technology issues for more than 30 years. He’s also written and edited five best-selling books and countless articles on litigation and technology issues.