
Redefining High Speed eDiscovery Processing & Production

Conversion of the EDRM Enron Dataset from Natives to TIFF images in 5.3 hours (23 Million pages/day rate) using the Lexbe eDiscovery Processing System

August 6, 2014

Karsten Weber

(2)

eDiscovery Webinar Series

Takes Place Monthly

Cover a Variety of Relevant eDiscovery Topics

Presentations Available for Download by Registrants.

(3)

eDiscovery Webinar Series

Lexbe is an Austin, TX based eDiscovery software and services provider.

Lexbe eDiscovery Platform

Lexbe eDiscovery Platform is a hosted eDiscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions, analyze transcripts, conduct case analytics, prepare for dispositive motions, and provide litigation support during trial.

Lexbe eDiscovery Services

Lexbe performs large volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related eDiscovery Services.

About Lexbe

Lexbe Sales

[email protected] (800) 401-7809 x22

(4)

If you have any questions or technical issues, please e-mail them to:

[email protected]

Questions will be forwarded to Gene and answered during the webinar or via e-mail if we run out of time.

eDiscovery Webinar Series

(5)

eDiscovery Webinar Series

Karsten Weber bio

Current

- Principal of Lexbe LC

- Principal Architect of Lexbe eDiscovery Platform and Lexbe eDiscovery Services

Prior Experience

- Consulting Expert, Lumin Expert Group

- Director of Software, nLine Corporation

- Software Engineering Manager, KLA-Tencor

Education

- MBA, University of Texas

- M.S. Engineering, Danish Technical University

Contact

Karsten Weber 512-686-3469

[email protected]

(6)

Background of eDiscovery Processing & Production

eDiscovery Review Tools in Use Today

TIFF Popularity and Processing Throughput Challenge

The Lexbe eDiscovery Processing System

Test Methodology & the EDRM Enron Data Set

Performance Results

Comparison with a Large Provider Using Traditional Processing Methods

Conclusion

Executive Summary

(7)

Data Types and Volume Keep Expanding

Growth of Data Worldwide

Data sources: VoIP, email, iPhones, peer-to-peer, online storage, digital cameras, Facebook, LinkedIn, DropBox, backup devices, elastic storage, SaaS, PaaS, Google Streets, personal blogs, Skype, world satellite images, personal scanners, customer service recordings, public webcams, Google Goggles, netbooks, cloud instance servers

[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes (axis 1 to 4), 2005 to 2015]

Source: IDC Digital Universe Study (2012). 1 zettabyte = 1 trillion gigabytes.

(8)

Growth of eDiscovery Processing

Data Volume is Rising

[Chart: GBs of ESI in a Typical Commercial Case, low and high estimates, 1995 to 2015]

Enron Criminal Trial (2005)

○ Source ESI: 100M pages (~4 TBs)

○ Brought to Trial: 1M pages (~40 GBs)

○ Extraordinary at time

○ Not now

Microsoft (2011)

○ Microsoft collects 45 custodians per matter average (2011)

○ Almost 1 TB per matter, average
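The Enron figures above imply a rough average size per page, which is useful for converting between page counts and data volume. A minimal sketch, assuming decimal units (1 TB = 1,000 GB); the helper name is illustrative and not from the presentation:

```python
# Rough page/volume conversion implied by the Enron figures on this slide.
# Assumption: decimal units (1 GB = 1,000,000 KB). Helper name is hypothetical.

KB_PER_GB = 1_000_000

def kb_per_page(total_gb: float, pages: int) -> float:
    """Average size of one page in kilobytes."""
    return total_gb * KB_PER_GB / pages

source_kb = kb_per_page(4_000, 100_000_000)  # ~4 TB across 100M pages
trial_kb = kb_per_page(40, 1_000_000)        # ~40 GB across 1M pages

print(source_kb, trial_kb)  # both work out to 40 KB per page
```

Both Enron figures are consistent with roughly 40 KB per page, or about 25,000 pages per GB.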

(9)

Growth of eDiscovery Processing

Processing Costs Are Falling - But Still High

[Chart: Cost per GB to Process ESI in Volume, 2005 to 2015]

High Speed eDiscovery Processing & Production | EDRM Enron Demonstration | August 2014

$1,800/GB (2006). Source: Forrester Research

$500/GB (2011). Source: Forrester Research

ESI Processing costs have fallen 90% in the last 10 years

(10)

eDiscovery Market is Big & Growing

Source: Complex Discovery (ComplexDiscovery.com) Based on a combination of public market sizing estimates.

eDiscovery

Software & Services

○ $5.5 Billion today

○ Growing 15.5% annually

○ Projected $9.8 Billion (2017)

○ Services (72%)

○ Software (28%)
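The market figures above are consistent with simple compound growth. A minimal sketch, assuming four years of compounding between the "today" figure and the 2017 projection (the slide does not state the base year, so that horizon is an assumption):

```python
# Compound-growth check of the market sizing figures on this slide.
# Assumption: four compounding periods; the slide does not give the base year.

def project(value_billion: float, annual_growth: float, years: int) -> float:
    """Value after compounding annual_growth for the given number of years."""
    return value_billion * (1 + annual_growth) ** years

projected_2017 = project(5.5, 0.155, 4)
services_today = 5.5 * 0.72   # services share of today's market
software_today = 5.5 * 0.28   # software share of today's market

print(round(projected_2017, 1))                              # 9.8
print(round(services_today, 2), round(software_today, 2))    # 3.96 1.54
```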

(11)

eDiscovery Processing Background

Processing Activities & Functions

Collection

○ Identify and execute retrieval of discoverable documents and electronic evidence.

Culling

○ Reduces collections using keyword or date range parameters

Native Processing

○ Convert Native Documents (Outlook, Microsoft Office, etc.) into reviewable formats (TIFF, PDF, Near Native)

○ Can include application of OCR to make documents searchable

Review

○ Load/ingest ESI into Litigation Database to prepare for trial

Production

○ Create a production in a specified format and apply Bates Numbers

○ Apply Privilege QC procedures to avoid inadvertently producing confidential case documents.

[Graphic: Processing. Stages shown: Setup & Planning; Collection; Culling & Analysis; Review & Production; Depos & Motions; Processing]
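The culling step described above can be sketched as a simple filter. This is an illustration only, not Lexbe's implementation: the `Doc` record, its fields, and the `cull` function are hypothetical, and the sketch applies the keyword and date range filters together.

```python
# Illustrative culling filter: keep documents inside a date range that
# mention at least one keyword. All names here are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    doc_id: str
    sent: date
    text: str

def cull(docs, keywords, start, end):
    """Return documents dated within [start, end] that contain any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        d for d in docs
        if start <= d.sent <= end and any(k in d.text.lower() for k in lowered)
    ]

docs = [
    Doc("A1", date(2001, 3, 5), "Quarterly forecast for the trading desk"),
    Doc("A2", date(2001, 9, 14), "Lunch on Friday?"),
    Doc("A3", date(1999, 1, 2), "Old forecast draft"),
]
kept = cull(docs, ["forecast"], date(2001, 1, 1), date(2001, 12, 31))
print([d.doc_id for d in kept])  # ['A1']
```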

(12)

eDiscovery Processing Background

Review Environments and TIFF

Type: TIFF
Examples: Concordance, Summation, CaseLogistix, RingTail, iConnect

○ Currently the most commonly used format/review environment

○ Must process ESI to single page TIFFs with text and load files before review

Type: PDF
Examples: WorldDox, Adobe

○ Requires documents to be converted to PDF for review

Type: Processed Natives
Examples: Relativity, Allegro

○ Must process ESI into a ‘native load file’

○ Generate ‘near native’ HTML for review

Type: Raw Natives
Examples: Lexbe, Digital Warroom, NextPoint

○ Load raw natives that will be automatically processed within the review software

(13)

eDiscovery Processing Background

TIFF Background

○ A 2013 ILTA (International Legal Technology Association) survey found that the vast majority (91%) of firms still use TIFF-based software.

TIFF Benefits:

○ Standardized Review Format

○ Page level Bates Stamping can be applied

○ Addresses concern of opposition altering native files

○ Easy to redact

○ TIFF viewer is only requirement

○ Often can be hosted & supported internally
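Page-level Bates stamping, listed among the TIFF benefits above, amounts to giving every page in a production a unique sequential label. A minimal sketch, assuming a hypothetical `ABC` prefix and a seven-digit zero-padded counter (both are illustrative choices, not a format taken from the presentation), and assuming every document has at least one page:

```python
# Illustrative page-level Bates numbering. Prefix and padding width are
# assumptions for the example; real productions follow the agreed format.

def bates_labels(prefix: str, start: int, page_count: int, width: int = 7):
    """Sequential labels for page_count pages, beginning at start."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + page_count)]

def assign_ranges(doc_pages: dict, prefix: str, start: int = 1) -> dict:
    """Map each document to its (first, last) Bates label, in input order."""
    ranges = {}
    next_no = start
    for doc_id, pages in doc_pages.items():
        labels = bates_labels(prefix, next_no, pages)
        ranges[doc_id] = (labels[0], labels[-1])
        next_no += pages
    return ranges

ranges = assign_ranges({"memo": 3, "email": 2}, "ABC")
print(ranges)
# {'memo': ('ABC0000001', 'ABC0000003'), 'email': ('ABC0000004', 'ABC0000005')}
```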

(14)

eDiscovery Processing Background

The TIFFing Challenge

○ Traditional TIFFing methods have been time-consuming and expensive because the process requires considerable computing power

○ As data volumes continue to increase, the time and expense issues associated with TIFFing become more severe

(15)

● Use an industry-standard dataset to ensure transparent results. The study was run on the 53 GB EDRM Enron Data Set.

● What is the TIFF throughput rate of LEPS?

● How automated is LEPS?

● What quality control procedures are in place?

● How does LEPS compare to current industry leaders?

Meeting the Challenge - Study Goals

High-Speed eDiscovery Processing


Evaluate the Capabilities of Lexbe eDiscovery Processing System (LEPS) under testable and repeatable conditions

(16)

High-Speed Processing Demonstration

Lexbe Architecture

Scalable

Systems architecture allows LEPS to increase server instances to apply more resources to your processing task

Automated

LEPS minimizes the need for ‘babysitting’.

Fault Tolerant

Processing tasks are not ‘batch-centric’, and check-out/check-in procedures ensure individual processing steps operate independently

Secure Processing Environment

LEPS is powered by Amazon S3 servers to facilitate redundancy and high security standards. All data is strongly encrypted (256-bit) in transit and in place. Our data centers provide SOC I and II reports published under SSAE 16 and ISAE 3402 professional standards and are ISO 27001 certified.
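The check-out/check-in idea under "Fault Tolerant" can be illustrated with a small work queue in which each document is an independent item and a stalled worker's item is handed out again. This is a generic sketch of the pattern, not a description of LEPS internals; the class names, the lease timeout, and the in-memory storage are all assumptions.

```python
# Generic check-out/check-in work queue (hypothetical; not LEPS code).
# A worker checks out one document at a time. If it never checks the
# document back in, the lease expires and another worker can take it.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkItem:
    doc_id: str
    status: str = "pending"          # pending -> checked_out -> done
    worker: Optional[str] = None
    lease_expires: float = 0.0

class WorkQueue:
    def __init__(self, doc_ids, lease_seconds: float = 60.0):
        self.items = {d: WorkItem(d) for d in doc_ids}
        self.lease_seconds = lease_seconds

    def check_out(self, worker: str, now: float) -> Optional[str]:
        """Hand the next available document to a worker, or None if none."""
        for item in self.items.values():
            expired = item.status == "checked_out" and now >= item.lease_expires
            if item.status == "pending" or expired:
                item.status = "checked_out"
                item.worker = worker
                item.lease_expires = now + self.lease_seconds
                return item.doc_id
        return None

    def check_in(self, worker: str, doc_id: str) -> bool:
        """Mark a document done; only the current lease holder may do so."""
        item = self.items[doc_id]
        if item.status == "checked_out" and item.worker == worker:
            item.status = "done"
            return True
        return False

queue = WorkQueue(["doc1", "doc2"], lease_seconds=60)
first = queue.check_out("worker-a", now=0)      # worker-a takes doc1
second = queue.check_out("worker-b", now=1)     # worker-b takes doc2
queue.check_in("worker-b", second)              # doc2 finishes normally
retry = queue.check_out("worker-b", now=100)    # worker-a stalled: doc1 reissued
```

Because each item is tracked on its own, one failed document does not hold up a whole batch.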

(17)

High-Speed Processing Demonstration

Lexbe Process

○ Archive/Container Decompression

○ File Repair

○ Metadata extraction & fielding

○ MD5 hash code generation

○ System file identification & DeNIST

○ Email attachment extraction & parent email association

○ Native text extraction

○ OCR of image files

○ Full-text indexing

○ Bates stamping

○ PDF & TIFF creation

○ Placeholder creation

○ Native extracted, PDF and TIFF loadfile generation in multiple formats: XLSX (Lexbe), DAT/OPT (Case Logistix, Concordance, iPro Allegro, Ringtail, kCura Relativity) and DII (Summation), and quality control reports
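Two of the steps listed above, MD5 hash code generation and DeNIST, can be sketched with Python's standard `hashlib`. DeNISTing generally means dropping files whose hashes appear on a list of known system files; here that list is a stand-in set built inside the example, and the function names are hypothetical. The sketch also drops exact duplicates by hash, which is an addition for illustration.

```python
# Illustrative MD5 hashing with known-file (DeNIST-style) filtering and
# exact-duplicate removal. The known-hash set below is a stand-in, not a
# real reference list.
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 digest of the file contents as a hex string."""
    return hashlib.md5(data).hexdigest()

def filter_files(files, known_hashes):
    """Keep the first copy of each file whose hash is not on the known list."""
    seen = set()
    kept = []
    for name, data in files:
        digest = md5_hex(data)
        if digest in known_hashes or digest in seen:
            continue
        seen.add(digest)
        kept.append((name, digest))
    return kept

known = {md5_hex(b"pretend system file")}
files = [
    ("memo.txt", b"hello"),
    ("memo_copy.txt", b"hello"),               # exact duplicate of memo.txt
    ("system.dll", b"pretend system file"),    # on the known-file list
]
kept = filter_files(files, known)
print(kept)  # [('memo.txt', '5d41402abc4b2a76b9719d911017c592')]
```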

(18)

High-Speed Processing Demonstration

(19)

○ High quality output is critical, especially when making a claim of increased efficiency.

High-Speed Processing Demonstration

Sample Output

(20)

High-Speed Processing Demonstration

Lexbe Quality Control Tools and Features

○ Programmatic batching of processing to individual servers (reduces human error)

○ Custom QC flag creation and filtering

○ Integration with Excel for reporting and analysis

○ Pivot table analysis and charting

○ Ability to view all documents including parent containers (email and attachments) together

○ Ability to verify image quality

○ Filtering and reporting by any captured or calculated fields including failed to convert, words in document, placeholders, etc.

○ Native files are extracted and provided for linked load and review

○ Statistical sampling and reporting

High quality output is critical, especially when making a claim of increased efficiency.
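Statistical sampling for QC, mentioned in the list above, typically means reviewing a random subset of processed documents and reporting the observed failure rate. A minimal sketch using Python's standard `random` module; the function names, sample size, and fixed seed are assumptions for illustration, not Lexbe's procedure.

```python
# Illustrative QC sampling: draw a reproducible random sample of document
# IDs and compute the share that failed review. Names are hypothetical.
import random

def qc_sample(doc_ids, sample_size: int, seed: int = 0):
    """Random sample without replacement; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))

def observed_failure_rate(sample, failed_ids) -> float:
    """Fraction of sampled documents that appear in failed_ids."""
    if not sample:
        return 0.0
    return sum(1 for d in sample if d in failed_ids) / len(sample)

doc_ids = [f"DOC{n:05d}" for n in range(1, 1001)]
sample = qc_sample(doc_ids, 5, seed=42)
rate = observed_failure_rate(["a", "b", "c", "d"], {"a"})
print(len(sample), rate)  # 5 0.25
```

Fixing the seed lets a reviewer regenerate the same sample later when documenting the QC run.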

(21)

eDiscovery Processing Background

Providers of TIFF Processing

Type: Service Providers
Examples: Xerox, Lexbe, etc.

● Business service bureaus that deliver a wide range of processing services.

● Local server setup and capacity

Type: Professionals
Examples: Internal Litigation Support

● Department inside of law firms responsible for conducting litigation support processing functions.

● Often work with service and software providers to meet internal demands.

Type: Software Providers
Examples: Ipro, Law

● Develop processing software that is licensed for resale by service providers or use in internal litigation support departments

(22)

Lexbe v. Xerox

Compare Lexbe to Industry Leaders

○ Xerox is known for its high-volume litigation processing and production capacity.

○ Xerox states in its service literature that its production capacity is 5 million pages a day.
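The comparison can be put in numbers using only figures stated in this presentation: the 5.3 hour run at a 23 million pages/day rate from the title slide, and the 5 million pages/day capacity cited above. The page count below is implied by those two figures rather than stated in the slides.

```python
# Throughput arithmetic from figures stated in the presentation.
# The implied page count is derived, not quoted from the slides.

LEXBE_PAGES_PER_DAY = 23_000_000   # rate from the title slide
RUN_HOURS = 5.3                    # duration of the Enron conversion
XEROX_PAGES_PER_DAY = 5_000_000    # capacity cited on this slide

implied_pages = LEXBE_PAGES_PER_DAY * RUN_HOURS / 24
rate_ratio = LEXBE_PAGES_PER_DAY / XEROX_PAGES_PER_DAY
days_at_cited_capacity = implied_pages / XEROX_PAGES_PER_DAY

print(round(implied_pages))              # about 5.08 million pages
print(round(rate_ratio, 1))              # 4.6
print(round(days_at_cited_capacity, 2))  # about 1.02 days
```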

(23)

High-Speed eDiscovery Processing

Summary

TIFF is important and turnaround time is critical.

Traditional approaches:

Fixed capacity leading to variable turnaround time.

Lexbe approach:

Scalable capacity leading to fixed turnaround time.

The Lexbe study demonstrates what we believe is the world's fastest TIFF processing, allowing you to meet even the toughest discovery deadlines.
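The contrast between fixed and scalable capacity comes down to which variable is held constant. A minimal sketch: the 5 million pages/day fixed capacity is the figure cited earlier in this deck, while the per-server rate of 500,000 pages/day and the function names are hypothetical values chosen only to make the arithmetic visible.

```python
# Fixed capacity: turnaround grows with job size.
# Scalable capacity: server count grows, turnaround stays at the deadline.
# The per-server rate is a hypothetical number for illustration.
import math

def turnaround_days(pages: int, pages_per_day: int) -> float:
    """Days needed when daily capacity is fixed."""
    return pages / pages_per_day

def servers_needed(pages: int, deadline_days: int, pages_per_server_per_day: int) -> int:
    """Servers required to finish within the deadline."""
    return math.ceil(pages / (deadline_days * pages_per_server_per_day))

FIXED_CAPACITY = 5_000_000   # pages/day, figure cited earlier in the deck
PER_SERVER = 500_000         # pages/day per server (assumed)

for pages in (5_000_000, 50_000_000):
    days = turnaround_days(pages, FIXED_CAPACITY)
    servers = servers_needed(pages, 1, PER_SERVER)
    print(pages, days, servers)
# 5000000 1.0 10
# 50000000 10.0 100
```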

(24)

Related Lexbe Services

High-Speed eDiscovery Processing

○ ESI Culling+ Reduce ESI stores to manageable sizes with DeNIST, deduplication, date culling and keyword culling. Metadata extraction and PST reconstitution are available as well.

○ ESI Email Collection+ Flatten and extract native file attachments and metadata to create loadfiles in preparation for native or near native review.

○ Native Processing+ Convert native documents, including Outlook Email and Microsoft Office files, into TIFF or PDF format for searchability, Bates stamping, and preparation for online review.

○ eDiscovery OCR+ Apply optical character recognition to increase searchability of PDFs, TIFFs, or document-formatted JPGs or PNGs.

○ NearDup Groupings+ Identify key documents, group similar documents, ensure consistency in privilege coding, and enable email threading.

(25)

Thank You

Contact Info

Karsten Weber: [email protected]

Principal: (800) 401-7809

Stu Van Dusen [email protected]

Marketing Manager: (512) 669-9485

Webinar Questions: [email protected]
