Redefining High Speed eDiscovery
Processing & Production
Conversion of the EDRM Enron Dataset from Natives to TIFF images in 5.3 hours (23 Million pages/day rate) using the Lexbe eDiscovery Processing System
August 6, 2014
Karsten Weber
eDiscovery Webinar Series
○ Takes Place Monthly
○ Covers a Variety of Relevant eDiscovery Topics
○ Presentations Available for Download by Registrants
About Lexbe
Lexbe is an Austin, TX-based eDiscovery software and services provider.
○ Lexbe eDiscovery Platform
Lexbe eDiscovery Platform is a hosted eDiscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions and analyze transcripts, conduct case analytics, prepare for dispositive motions, and provide litigation support during trial.
○ Lexbe eDiscovery Services
Lexbe performs large-volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related eDiscovery services.
Lexbe Sales
[email protected] (800) 401-7809 x22
If you have any questions or technical issues, please e-mail them to:
[email protected]
Questions will be forwarded to Gene and answered during the webinar or
via e-mail if we run out of time.
Karsten Weber bio
○ Current
- Principal of Lexbe LC
- Principal Architect of Lexbe eDiscovery Platform and Lexbe eDiscovery Services
○ Prior Experience
- Consulting Expert, Lumin Expert Group
- Director of Software, nLine Corporation
- Software Engineering Manager, KLA-Tencor
○ Education
- MBA, University of Texas
- M.S. Engineering, Danish Technical University
Contact
Karsten Weber 512-686-3469
Executive Summary
○ Background of eDiscovery Processing & Production
○ eDiscovery Review Tools in Use Today
○ TIFF Popularity and Processing Throughput Challenge
○ The Lexbe eDiscovery Processing System
○ Test Methodology & the EDRM Enron Data Set
○ Performance Results
○ Comparison with a Large Provider Using Traditional Processing Methods
○ Conclusion
Data Types and Volume Keep Expanding
Growth of Data Worldwide
VoIP | Email | iPhones | Peer-to-Peer | Online Storage | Digital Cameras | Facebook | LinkedIn | DropBox | Backup Devices | Elastic Storage | SaaS | Google Streets | Personal Blogs | Skype | World Satellite Images | Personal Scanners | Customer Service Recordings | Public Webcams | Google Goggles | Netbooks | Cloud Instance Servers | PaaS
[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes* (axis 1 to 4), 2005 to 2015]
Source: IDC Digital Universe Study (2012)
* 1 Zettabyte = 1 Trillion Gigabytes
Growth of eDiscovery Processing
Data Volume is Rising
[Chart: GBs of ESI in a Typical Commercial Case, low and high estimates, 1995 to 2015]
Enron Criminal Trial (2005)
○ Source ESI: 100M pages (~4 TBs)
○ Brought to Trial: 1M pages (~40 GBs)
○ Extraordinary at the time
○ Not now
Microsoft (2011)
○ Microsoft collects from 45 custodians per matter on average (2011)
○ Almost 1 TB per matter, average
Growth of eDiscovery Processing
Processing Costs Are Falling - But Still High
[Chart: Cost per GB to Process ESI in Volume, 2005 to 2015, axis $0 to $2,000]
○ $1,800/GB (2006). Source: Forrester Research
○ $500/GB (2011). Source: Forrester Research
High Speed eDiscovery Processing & Production | EDRM Enron Demonstration | August 2014
ESI Processing costs have fallen 90% in the last 10 years
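As a rough check on the two Forrester data points shown for this chart, the sketch below computes the percentage decline between them. The 90% figure covers a ten-year span, so it cannot be derived from these two points alone; the last line only shows what price a 90% drop would imply if measured from the 2006 figure, which is an assumption made here for illustration.

```python
def percent_decline(old: float, new: float) -> float:
    """Return the percentage drop from an old price to a new price."""
    return (old - new) / old * 100.0


# Data points cited on the slide (Forrester Research), in USD per GB.
cost_2006 = 1800.0
cost_2011 = 500.0

decline_2006_2011 = percent_decline(cost_2006, cost_2011)

# Assumption for illustration only: measure the 90% drop from the 2006 price.
implied_cost_after_90pct = cost_2006 * (1 - 0.90)

print(f"2006 to 2011 decline: {decline_2006_2011:.1f}%")
print(f"Price implied by a 90% drop from $1,800/GB: ${implied_cost_after_90pct:.0f}/GB")
```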
eDiscovery Market is Big & Growing
Source: Complex Discovery (ComplexDiscovery.com) Based on a combination of public market sizing estimates.
eDiscovery Software & Services
○ $5.5 Billion today
○ Growing 15.5% annually
○ Projected $9.8 Billion (2017)
○ Services (72%)
○ Software (28%)
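The market figures above can be related with simple compound growth. The sketch below compounds $5.5 billion at 15.5% per year; it reaches roughly $9.8 billion after four years, which suggests the base figure corresponds to 2013. That base year is an inference from the arithmetic, not something stated on the slide.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + annual_rate) ** years


base_billion = 5.5   # market size from the slide, USD billions
growth_rate = 0.155  # 15.5% annual growth from the slide

for years in range(1, 5):
    print(f"after {years} year(s): ${project(base_billion, growth_rate, years):.2f}B")
```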
eDiscovery Processing Background
Processing Activities & Functions
Collection
○ Identify and execute retrieval of discoverable documents and electronic
evidence.
Culling
○ Reduces collections using keyword or date range parameters
Native Processing
○ Convert Native Documents (Outlook, Microsoft Office, etc.) into reviewable
formats (TIFF, PDF, Near Native)
○ Can include application of OCR to make documents searchable
Review
○ Load/ingest ESI into Litigation Database to prepare for trial
Production
○ Create a production in a specified format and apply Bates Numbers
○ Apply Privilege QC procedures to avoid inadvertently producing confidential case documents.
[Processing graphic: Setup & Planning, Collection, Culling & Analysis, Review & Production, Depos & Motions, Processing]
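The Bates numbering mentioned under Production can be illustrated with a minimal sketch: each page in a production receives a sequential number, and each document is recorded by its first and last page number. The prefix `ABC`, the seven-digit width, and the function name are hypothetical choices for this example, not a description of any particular tool.

```python
from typing import Dict, List, Tuple


def assign_bates_ranges(
    documents: List[Tuple[str, int]],
    prefix: str = "ABC",   # hypothetical production prefix
    start: int = 1,
    width: int = 7,        # hypothetical zero-padding width
) -> Dict[str, Tuple[str, str]]:
    """Map each document ID to its (first, last) Bates number.

    `documents` is a list of (document_id, page_count) in production order.
    """
    ranges: Dict[str, Tuple[str, str]] = {}
    next_number = start
    for doc_id, page_count in documents:
        if page_count < 1:
            raise ValueError(f"{doc_id} has no pages to stamp")
        first = f"{prefix}{next_number:0{width}d}"
        last = f"{prefix}{next_number + page_count - 1:0{width}d}"
        ranges[doc_id] = (first, last)
        next_number += page_count  # numbering continues across documents
    return ranges


production = [("email_001", 3), ("memo_002", 1), ("report_003", 12)]
bates = assign_bates_ranges(production)
for doc_id, (first, last) in bates.items():
    print(doc_id, first, last)
```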
eDiscovery Processing Background
Review Environments and TIFF
Type | Example | Description
TIFF | Concordance, Summation, CaseLogistix, RingTail, iConnect | Currently the most commonly used format/review environment. Must process ESI to single-page TIFFs with text and load files before review.
PDF | WorldDox, Adobe | Requires documents to be converted to PDF for review.
Processed Natives | Relativity, Allegro | Must process ESI into a ‘native load file’. Generate ‘near native’ HTML for review.
Raw Natives | Lexbe, Digital Warroom, NextPoint | Load raw natives that will be automatically processed within the review software.
eDiscovery Processing Background
TIFF Background
○ A 2013 ILTA (International Legal Technology Association) survey found that the vast majority (91%) of firms still use TIFF-based software.
TIFF Benefits:
○ Standardized Review Format
○ Page level Bates Stamping can be applied
○ Addresses concern of opposition altering
native files
○ Easy to redact
○ TIFF viewer is only requirement
○ Often can be hosted & supported internally
eDiscovery Processing Background
The TIFFing Challenge
○ Traditional TIFFing methods have been time-consuming and expensive because the process requires considerable computing power
○ As data volumes continue to increase, the time and expense associated with TIFFing become more severe
High-Speed eDiscovery Processing
Meeting the Challenge - Study Goals
Evaluate the capabilities of the Lexbe eDiscovery Processing System (LEPS) under testable and repeatable conditions
● Use an industry-standard dataset to ensure transparent results. The study was run on the 53 GB EDRM Enron Data Set.
● What is the TIFF throughput rate of LEPS?
● How automated is LEPS?
● What quality control procedures are in place?
● How does LEPS compare to current industry leaders?
High-Speed Processing Demonstration
Lexbe Architecture
Scalable
Systems architecture allows LEPS to increase server instances to apply more resources to your processing task
Automated
LEPS minimizes the need for ‘babysitting’.
Fault Tolerant
Processing tasks are not ‘batch-centric’, and check-out/check-in procedures ensure individual processing steps operate independently
Secure Processing Environment
LEPS is powered by Amazon S3 servers to facilitate redundancy and high security standards. All data is strongly encrypted (256-bit) in transit and in place. Our data centers provide SOC 1 and SOC 2 reports published under SSAE 16 and ISAE 3402 professional standards and are ISO 27001 certified.
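The check-out/check-in idea described under Fault Tolerant can be sketched as a small work queue with leases: a server checks out one item, checks it back in when finished, and any item whose lease expires becomes available to another server. This is a minimal illustration under assumed names (`WorkQueue`, `lease_seconds`, the server labels); it is not LEPS code and the slides do not describe the actual mechanism.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkItem:
    item_id: str
    status: str = "pending"              # pending -> checked_out -> done
    worker: Optional[str] = None
    checked_out_at: Optional[float] = None


class WorkQueue:
    """Hypothetical per-document queue; no batch is locked as a whole."""

    def __init__(self, lease_seconds: float = 300.0) -> None:
        self.lease_seconds = lease_seconds
        self.items: Dict[str, WorkItem] = {}

    def add(self, item_id: str) -> None:
        self.items[item_id] = WorkItem(item_id)

    def check_out(self, worker: str, now: float) -> Optional[WorkItem]:
        """Hand out the first pending item, or one whose lease has expired."""
        for item in self.items.values():
            expired = (
                item.status == "checked_out"
                and now - item.checked_out_at > self.lease_seconds
            )
            if item.status == "pending" or expired:
                item.status = "checked_out"
                item.worker = worker
                item.checked_out_at = now
                return item
        return None

    def check_in(self, item_id: str, worker: str) -> bool:
        """Mark an item done, but only for the worker that currently holds it."""
        item = self.items[item_id]
        if item.status == "checked_out" and item.worker == worker:
            item.status = "done"
            return True
        return False


queue = WorkQueue(lease_seconds=60)
for name in ("a.msg", "b.docx"):
    queue.add(name)

first = queue.check_out("server-1", now=0)
second = queue.check_out("server-2", now=1)
queue.check_in(second.item_id, "server-2")

# server-1 never checks in; after the lease expires another server takes over.
retry = queue.check_out("server-3", now=100)
print(retry.item_id, retry.worker)
```

Because each item carries its own lease, one stalled server delays only the document it holds rather than a whole batch.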
High-Speed Processing Demonstration
Lexbe Process
○ Archive/Container Decompression
○ File Repair
○ Metadata extraction & fielding
○ MD5 hash code generation
○ System file identification & DeNIST
○ Email attachment extraction & parent email association
○ Native text extraction
○ OCR of image files
○ Full-text indexing
○ Bates stamping
○ PDF & TIFF creation
○ Placeholder creation
○ Native extracted, PDF and TIFF loadfile generation in multiple formats: XLSX (Lexbe), DAT/OPT (Case Logistix, Concordance, iPro Allegro, Ringtail, kCura Relativity) and DII (Summation)
○ Quality control reports
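Of the steps listed above, MD5 hash code generation is easy to illustrate. The sketch below hashes files in chunks with Python's standard `hashlib` and groups files that share a hash, which is the usual basis for exact-duplicate detection. The file names and helper names are made up for the example, and nothing here reflects how LEPS itself is implemented.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(paths: Iterable[Path]) -> Dict[str, List[str]]:
    """Group file names by MD5 digest; groups larger than one are duplicates."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        groups.setdefault(md5_of_file(path), []).append(path.name)
    return groups


# Demonstration with three throwaway files, two of which are identical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "memo.txt").write_bytes(b"quarterly results")
    (root / "memo_copy.txt").write_bytes(b"quarterly results")
    (root / "other.txt").write_bytes(b"different content")
    groups = group_by_hash(sorted(root.iterdir()))

duplicates = [names for names in groups.values() if len(names) > 1]
print(duplicates)
```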
High-Speed Processing Demonstration
Sample Output
High-Speed Processing Demonstration
Lexbe Quality Control Tools and Features
○ Programmatic batching of processing to individual servers (reduces
human error)
○ Custom QC flag creation and filtering
○ Integration with Excel for reporting and analysis
○ Pivot table analysis and charting
○ Ability to view all documents including parent containers (email and
attachments) together
○ Ability to verify image quality
○ Filtering and reporting by any captured or calculated fields including
failed to convert, words in document, placeholders, etc.
○ Native files are extracted and provided for linked load and review
○ Statistical sampling and reporting
High quality output is critical, especially when making a claim of increased efficiency.
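The statistical sampling item in the list above can be sketched as follows: pick a sample size for a given margin of error, then draw a reproducible random sample of document IDs for manual QC. The sample-size formula is the standard one for estimating a proportion with a finite population correction; the 95% confidence level, 5% margin, and document ID format are assumptions for this example, not settings taken from the slides.

```python
import math
import random
from typing import List, Sequence


def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Sample size for estimating a proportion, with finite population correction.

    z=1.96 corresponds to 95% confidence; p=0.5 is the most conservative guess.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_qc_sample(doc_ids: Sequence[str], seed: int = 0) -> List[str]:
    """Draw a simple random sample without replacement; seed makes it repeatable."""
    n = sample_size(len(doc_ids))
    return random.Random(seed).sample(list(doc_ids), n)


doc_ids = [f"DOC{n:06d}" for n in range(1, 100_001)]  # hypothetical IDs
qc_sample = draw_qc_sample(doc_ids, seed=42)
print(len(qc_sample), "documents selected for QC review")
```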
eDiscovery Processing Background
Providers of TIFF Processing
Type | Example | Description
Service Providers | Xerox, Lexbe, etc. | Business service bureaus that deliver a wide range of processing services. Local server setup and capacity.
Professionals | Internal Litigation Support | Department inside law firms responsible for conducting litigation support processing functions. Often works with service and software providers to meet internal demands.
Software Providers | Ipro, Law | Develop processing software that is licensed for resale by service providers or for use in internal litigation support departments.
Lexbe v. Xerox
Compare Lexbe to Industry Leaders
○ Xerox is known for its high-volume litigation processing and production capacity.
○ Xerox states in its service literature that its production capacity is 5 million pages a day.
High-Speed eDiscovery Processing
Summary
○ TIFF is important and turnaround time is critical
○ Traditional approaches:
■ Fixed capacity leading to variable turnaround time.
○ Lexbe approach:
■ Scalable capacity leading to fixed turnaround time.
○ The Lexbe study demonstrates what we believe is the world's fastest TIFF processing, allowing you to meet even the toughest discovery deadlines.
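The fixed-capacity versus scalable-capacity contrast can be made concrete with the figures quoted in this deck: a 23 million pages/day rate sustained for 5.3 hours, and a stated 5 million pages/day capacity for the comparison provider. The page count below is derived from those two numbers and is not reported directly in the slides; the per-server rate in the last function is purely hypothetical.

```python
import math

HOURS_PER_DAY = 24.0


def pages_processed(rate_pages_per_day: float, hours: float) -> float:
    """Pages completed at a given daily rate over a number of hours."""
    return rate_pages_per_day * hours / HOURS_PER_DAY


def hours_needed(pages: float, rate_pages_per_day: float) -> float:
    """Fixed capacity: turnaround time grows with the page count."""
    return pages / rate_pages_per_day * HOURS_PER_DAY


def servers_needed(pages: float, pages_per_server_hour: float,
                   deadline_hours: float) -> int:
    """Scalable capacity: hold the deadline fixed and solve for server count."""
    return math.ceil(pages / (pages_per_server_hour * deadline_hours))


lexbe_rate = 23_000_000      # pages/day, from the title slide
lexbe_hours = 5.3            # hours, from the title slide
reference_rate = 5_000_000   # pages/day, stated capacity of the comparison provider

implied_pages = pages_processed(lexbe_rate, lexbe_hours)
reference_hours = hours_needed(implied_pages, reference_rate)

# Hypothetical per-server throughput, for illustration only.
example_servers = servers_needed(implied_pages, pages_per_server_hour=10_000,
                                 deadline_hours=lexbe_hours)

print(f"Implied page count: {implied_pages:,.0f}")
print(f"Hours at 5M pages/day: {reference_hours:.1f}")
print(f"Servers for the same deadline at 10,000 pages/hour each: {example_servers}")
```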
Related Lexbe Services
High-Speed eDiscovery Processing
○ ESI Culling: Reduce ESI stores to manageable sizes with DeNIST, deduplication, date culling and keyword culling. Metadata extraction and PST reconstitution are available as well.
○ ESI Email Collection: Flatten and extract native file attachments and metadata to create loadfiles in preparation for native or near-native review.
○ Native Processing: Convert native documents, including Outlook email and Microsoft Office files, into TIFF or PDF format for searchability, Bates stamping, and preparation for online review.
○ eDiscovery OCR: Apply optical character recognition to increase searchability of PDFs, TIFFs, or document-formatted JPGs or PNGs.
○ NearDup Groupings: Identify key documents, group similar documents, ensure consistency in privilege coding, and enable email threading.
Thank You
Contact Info
Karsten Weber, Principal: [email protected], (800) 401-7809
Stu Van Dusen, Marketing Manager: [email protected], (512) 669-9485
Webinar Questions: [email protected]