White paper
DATA COLLECTION FOR MID-RANGE EDISCOVERY
Written by Kyle Sparks
EXECUTIVE SUMMARY
Data collection during ediscovery is critically important because a significant number of court sanctions are the result of inadequate or improper data collection. Here are two examples:
• In the case of Peerless Industries, Inc. v. Crimson AV, LLC, the plaintiff requested various documents that were held by Sycamore Manufacturing Co., Ltd., a sister company of Crimson. However, Crimson told the court that they had delegated the data collection process in this case to their vendor and they assumed that their vendor had instructed the Sycamore staff on how to collect documents, but neglected to verify that this was actually the case. Since Crimson was not able to collect the requested information and could not answer basic questions about their IT infrastructure and other key issues, the court sanctioned Crimson and ordered the company to “show that they in fact searched for the requested documents and, if those documents no longer exist or cannot be located, they must specifically verify what it is they cannot produce.”
• In Procaps S.A. v. Patheon Inc., two opinions were handed down by the court, largely in response to poor data collection practices and protocols in the case and inadequate support from the plaintiff’s counsel. The first opinion focused on the plaintiff’s decision not to implement a litigation hold, inadequate communication between counsel and data custodians in Colombia (where the plaintiff was based), and a failure to assist data custodians who were charged with the collection of Electronically Stored Information (ESI), the results of which were data searches that the court deemed “inadequate.” Sanctions in the case included payment of the attorneys’ fees awarded in the case, as well as the costs of hiring an independent third party to conduct a forensic examination of Procaps’ data stores.
Clearly, improper data collection can result in potentially significant sanctions.
KEY TAKEAWAYS
• Organizations should collect only the data they will need for ediscovery instead of gathering an enormous amount of data that will not address their specific ediscovery requirements.
• Personnel with the appropriate skill set should be charged with leading the data collection effort. The lead person should have a strong background in IT, but other disciplines can also provide value to the data collection effort.
• Understanding where data is located in an organization is an essential step in the data collection process, both to speed up collection and to minimize the risk of not finding relevant data.
ABOUT THIS WHITE PAPER
This white paper discusses the importance of proper data collection in “mid-range” ediscovery cases – i.e., cases that generate approximately 100-500 gigabytes of
IMPORTANT ISSUES IN ESI DATA COLLECTION
COURTS TEND TO APPROVE OF “CASTING A WIDE NET”
There are two basic approaches to collecting data during an ediscovery exercise:
• Collect everything that is available, including emails, files, text messages, social media posts and anything else that in some remote way might be relevant, and then use various tools and human review to cull the collected data.
• Use a more restrictive approach to gathering data, collecting only what will reasonably be deemed necessary to satisfy the requirements of the ediscovery order.
Courts tend to approve of the “casting a wide net” approach to data collection because it provides the assurance that a party is collecting everything that might potentially be relevant. This approach is also often favored by attorneys because it saves them time by reducing the amount of effort required to gather data, and because gathering everything possible is often faster than taking a more selective approach.
THE REALITY IS THAT IT’S BETTER TO DO THE OPPOSITE
However, the best practice is still to collect a large amount of data but to cull it so that only relevant data remains for ediscovery. For example, instead of making forensic copies of the contents of a large number of hard drives, it is more advantageous to produce only the relevant content from these hard drives through appropriate culling processes. While there are some limited situations in which courts seek production of very large quantities of data, this is not the norm. The primary advantage of collecting information in a more focused manner is that it saves substantially on attorney and paralegal costs and processing fees, since there is less information to examine during document processing. Content hosting fees are also lower because less data is stored during the litigation process. Moreover, given the tighter timelines that will be imposed on ediscovery under the FRCP amendments scheduled to go into effect in December 2015, minimizing the amount of data collected may make it easier to work within those more restrictive timeframes.
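As an illustration of the culling step described above, the following is a minimal sketch in Python, not a description of any particular review tool. It assumes that each collected item has already been reduced to a simple record; the `Item` class, its fields, the `cull` function and the sample data are all hypothetical. Real culling usually combines several more criteria, such as search terms and file types.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """Hypothetical record describing one collected item."""
    path: str
    custodian: str
    modified: date


def cull(items, custodians, start, end):
    """Keep only items from the named custodians modified within [start, end]."""
    wanted = {name.lower() for name in custodians}
    return [
        item for item in items
        if item.custodian.lower() in wanted and start <= item.modified <= end
    ]


# Illustrative data only.
collected = [
    Item("mail/0001.eml", "alice", date(2014, 3, 2)),
    Item("mail/0002.eml", "bob", date(2014, 3, 9)),
    Item("files/plan.docx", "alice", date(2011, 7, 1)),
]
relevant = cull(collected, ["Alice"], date(2014, 1, 1), date(2014, 12, 31))
print(len(relevant), "of", len(collected), "items remain after culling")
```

Filtering on custodian and date range before review is one common way to reduce the volume that attorneys and paralegals must examine; the criteria themselves would come from the ediscovery order and counsel.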
It is important to note that the majority of ediscovery cases do not generate enormous amounts of data. However, it is essential to keep in mind that:
• Data must still be collected properly regardless of the amount of data that must be collected. It is essential that the method of data collection chosen be forensically sound – i.e., that the ESI collected is not modified in any way and that a proper chain of custody can be established for the collected data.
• The focus in any data collection must be on preventing spoliation of data, including missing relevant data sources during the collection phase or somehow altering the data that is collected.
• The ultimate goal of any data collection effort is to minimize the risk that can be created by the collection process itself.
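One common way to show that collected ESI was not modified is to compute a cryptographic hash of each file before and after it is copied and to record who collected it and when. The sketch below is a simplified illustration under that assumption; the `collect_file` function, the record fields and the collector name are hypothetical, and a real chain-of-custody log would normally be produced by a purpose-built forensic tool.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source: Path, dest_dir: Path, collector: str) -> dict:
    """Copy one file and return a custody record with a verified hash."""
    before = sha256_of(source)
    target = dest_dir / source.name
    shutil.copy2(source, target)  # copy2 also attempts to keep timestamps
    after = sha256_of(target)
    return {
        "source": str(source),
        "copy": str(target),
        "sha256": before,
        "verified": before == after,  # True when the copy matches the source
        "collector": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "out").mkdir()
    sample = root / "memo.txt"
    sample.write_text("quarterly forecast")
    record = collect_file(sample, root / "out", collector="jdoe")
    print(record["verified"], record["sha256"][:12])
```

Matching hashes show that the copy is bit-for-bit identical to the source at the time of collection; the record of collector and timestamp is what supports the chain of custody.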
THERE IS WIDE VARIABILITY IN ORGANIZATIONS’ TECHNOLOGY PROFICIENCY
Most smaller organizations do not have the technology proficiency or specialized skill sets required to adequately address the various data collection issues involved in ediscovery. This not only tends to drive up the costs of data collection, but also increases the risk of over- or under-collecting data, spoliation of data, or data being rendered inadmissible.
BEST PRACTICES FOR DATA COLLECTION
There are several best practices that organizations should consider when addressing mid-range data collections.
Assemble the right personnel for data collection
First and foremost, a team with the right knowledge and skill set is key to reducing risk in data collection. The point person should have a strong background in IT because some of the content that may need to be collected will be from sources that require more specialized collection skills, such as proprietary CRM systems, Microsoft SharePoint, or databases.
The importance of having a data collection lead from IT who is skilled in finding and collecting ESI should not be underestimated. For example, in the case of Green v. Blitz USA, Inc., the manager who was put in charge of the defendant’s data collection efforts described himself as “about as computer…illiterate as they get.” While there are risks inherent in self-collection, these risks can largely be mitigated if the leader of the collection effort is technically competent.
Ideally, if the resources and personnel are available, a team consisting of IT, legal and business staff members should be assembled to manage the data collection process. These skill sets will permit a more thorough understanding of what is being collected and the relevance of the collected data to ensure further mitigation of risk in the collection process. While many in the legal profession are opposed to organizations’ self-collection of data during ediscovery, having the right technical skills on a team of competent professionals within an organization can mitigate much of the risk during
Create a data map
The next step should be to create a data map that will help to inventory corporate data and identify the location and type of all data that may be subject to collection. The benefit of a data map is that it can guide data collectors and speed the data collection process. Moreover, it can also satisfy a court’s requirement that an organization make a good faith assessment of where all potentially relevant data is located.
In an ideal world, creating a data map would be a relatively simple exercise, but it won’t be in many organizations. Potentially relevant data can be found on corporate desktops, laptops, mobile phones and tablets; corporate email systems; SharePoint and other collaboration systems; employee-owned laptops, mobile phones and tablets; employee-managed file sync and share solutions like Dropbox; corporate file shares; USB drives; and corporate- and employee-managed cloud storage and backup systems. Data types can include email, files, text messages, social media posts, photographs, and a wide range of other data types.
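A data map does not need to be elaborate to be useful. As a minimal sketch, assuming a simple inventory kept as structured records, the example below lists a few of the sources named above and flags those outside IT's control. The `DataSource` class, its fields and the sample entries are hypothetical; a real data map would also record retention periods, custodians and contact information.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One entry in a hypothetical data map."""
    name: str
    location: str
    data_types: list
    it_managed: bool
    owner: str = "IT"


# Illustrative entries only.
data_map = [
    DataSource("Corporate email", "Exchange server", ["email"], True),
    DataSource("Collaboration", "SharePoint", ["files"], True),
    DataSource("Personal file sync", "Dropbox (employee account)",
               ["files"], False, owner="employee"),
    DataSource("Personal phones", "employee-owned devices",
               ["text messages", "photographs"], False, owner="employee"),
]

# Sources outside IT's control are the ones most easily missed.
needs_follow_up = [source.name for source in data_map if not source.it_managed]
print(needs_follow_up)
```

Separating IT-managed sources from employee-managed ones makes it easier to see where custodians must be contacted directly during collection.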
There are two challenges inherent in creating a data map. First, when data is distributed across an organization and among many different platforms – only some of which are under IT’s control – data collected for ediscovery is a moving target and can be difficult to find. Second, some data may be difficult to locate at all. For example, a corporate business record created by an employee on his or her personally owned tablet and saved to a personal file sync and share tool may be “invisible” to those charged with collecting data for ediscovery. However, it is essential to collect data from all relevant sources, even those that are under the control of individual employees. For example, in the case of Small v. University Medical Center of Southern Nevada, the special master assigned to the case recommended a default judgment in favor of more than 600 plaintiffs because data from personally owned mobile devices, among other data sources, was not retained properly by the defendant.
Ensure that metadata is preserved
It is essential to collect data properly so that metadata is preserved throughout the data collection process. Metadata – which is data within files that provides information about these files, such as the author and last accessed date – is an essential element that must be retained intact and unmodified during the data collection process in order for information to be defensible. For example, simple drag-and-drop of data during the collection process can alter the metadata of the copied files, potentially rendering the data inadmissible for ediscovery. So important is metadata in the context of discoverable information that the Supreme Courts of Arizona and Washington State have determined that metadata must be retained as part of the information that an organization archives.
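The effect of a careless copy on file system metadata can be demonstrated with Python's standard library. The sketch below is only an illustration: it backdates a throwaway file, copies it once with `shutil.copyfile` (contents only) and once with `shutil.copy2` (contents plus timestamps where the platform allows), and compares the modified times. The file names are hypothetical, and `copy2` does not preserve every attribute (creation time, ownership and access control lists may still change), so it is not a substitute for a forensic collection tool.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "contract.txt"
    original.write_text("signed draft")

    # Backdate the file so a changed timestamp is easy to spot.
    old_time = 1_000_000_000  # seconds since the epoch (September 2001)
    os.utime(original, (old_time, old_time))

    naive = root / "naive_copy.txt"
    shutil.copyfile(original, naive)  # contents only; timestamps are new

    careful = root / "careful_copy.txt"
    shutil.copy2(original, careful)   # contents plus timestamps where possible

    naive_mtime = int(naive.stat().st_mtime)
    careful_mtime = int(careful.stat().st_mtime)
    print("naive copy kept modified time:  ", naive_mtime == old_time)
    print("careful copy kept modified time:", careful_mtime == old_time)
```

The naive copy carries the time of the copy rather than the time the document was last modified, which is the kind of alteration that can undermine the defensibility of collected data.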
Focus on the “low-hanging fruit” first
Another best practice is to concentrate first on the “low-hanging fruit” – the repositories that contain the largest volumes of data that will be relevant during ediscovery. In most organizations, this will include the corporate email system (typically Microsoft Exchange on the backend and Outlook on the desktop or laptop) and employees’ personal directories on their hard drives. Email is typically the largest single repository of corporate business records, largely because the typical information worker spends at least 150 minutes per day working in email. As part of the data collection process, necessary content can be extracted into .pst files or equivalents for loading into review platforms, although other repositories must also be processed.
SUMMARY
Data collection is an essential element of the ediscovery process because of the important ramifications it can have on the admissibility of evidence and the mitigation of risk during litigation. Organizations involved in mid-range data collection efforts should take special care to follow appropriate best practices so that collected data is defensibly gathered, the costs of data collection are kept as low as possible, and risk is minimized.
ABOUT THE AUTHOR
Kyle Sparks CEDS Certified Speaker
Kyle’s 22-year career in the legal discovery profession has traversed firm and vendor leadership roles. From paper discovery in big tobacco litigation, to building a litigation support department focused on e-discovery for an Am Law 200 firm, Kyle has obtained a comprehensive understanding of the discipline. Serving as an IT and lit support manager has provided a wide scope of industry software and legal knowledge. Today, as a Senior Ediscovery Specialist and Subject Matter Expert for Thomson Reuters, Kyle specializes in educating clients on all phases of the EDRM model as well as rules of civil procedure.
For more information, contact your Thomson Reuters representative at 1-800-937-8529 or visit www.thomsonreuters.com
http://www.redgravellp.com/sites/default/files/SanctionedForHands-OffApproachWithVendor-MatheaBulander.pdf
http://www.ediscoverylaw.com/2014/05/court-orders-forensic-examination-for-inadequate-preservation-collection-confirms-basic-rule-that-custodians-must-be-consulted-for-input-on-search-terms/
http://e-discoveryteam.com/?s=collection
Source: various Osterman Research, Inc. surveys