(1)

Bringing Science to Digital Forensics with Standardized Forensic Corpora.

Digital Evaluation and Exploitation (DEEP) Group

http://domex.nps.edu/

February 2010

(2)

NPS is the Navy's Research University.

Location: Monterey, CA

Campus Size: 627 acres

Students: 1500

§ US Military (All 5 services)

§ US Civilian (Scholarship for Service & SMART)

§ Foreign Military (30 countries)

§ All students are fully funded

Schools:

§ Business & Public Policy

§ Engineering & Applied Sciences

§ Operational & Information Sciences

(3)

Digital Forensics is at a turning point.

Yesterday's work was primarily reverse engineering.

Key technical challenges:

§ Evidence preservation.

§ File recovery (file system support); Undeleting files

§ Encryption cracking.

§ Keyword search.

(4)

Digital Forensics is at a turning point.

Today's work is increasingly scientific.

Evidence Reconstruction

§ Files (fragment recovery, carving)

§ Timelines (visualization)

Clustering and data mining

Social network analysis

Sense-making

Drives #74 & #77: 25 CCNs in common (same community college)

Drives #171 & #172: 13 CCNs in common (same medical center)

Drives #179 & #206: 13 CCNs in common (same car dealership)
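
The drive pairs listed above are the kind of result you get by counting identifiers that two drives have in common. A minimal sketch of that counting step follows. It assumes each drive has already been reduced to a set of extracted feature strings (the slide's CCNs would be one such feature type). The function name `shared_feature_counts` and the sample values are hypothetical placeholders, not the DEEP group's tooling or data.

```python
from itertools import combinations


def shared_feature_counts(features_by_drive, min_shared=1):
    """Count features shared by every pair of drives.

    features_by_drive: dict mapping a drive label to a set of feature strings.
    Returns a list of (drive_a, drive_b, n_shared) tuples, largest overlap first,
    keeping only pairs with at least `min_shared` features in common.
    """
    results = []
    for a, b in combinations(sorted(features_by_drive), 2):
        n_shared = len(features_by_drive[a] & features_by_drive[b])
        if n_shared >= min_shared:
            results.append((a, b, n_shared))
    # Sort by overlap size (descending), then by label for a stable order.
    results.sort(key=lambda r: (-r[2], r[0], r[1]))
    return results


# Made-up placeholder data: three drives, two of which overlap.
drives = {
    "drive-A": {"f1", "f2", "f3"},
    "drive-B": {"f2", "f3", "f4"},
    "drive-C": {"f9"},
}
pairs = shared_feature_counts(drives)
for a, b, n in pairs:
    print(f"{a} & {b}: {n} features in common")
```

Comparing every pair is quadratic in the number of drives, which is workable for a corpus of a few thousand drives but would need an inverted index (feature to drives) at larger scale.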

(5)

Science requires the scientific process.

Hallmarks of Science:

§ Controlled and repeatable experiments.

§ No privileged observers.

§ Publication of data and results.

§ Sharing of scientific materials.

Today's Digital Forensics is not Scientific!

§ Researchers work on their own data

Data can't be shared with other researchers (privacy)

Data can't be published (copyright)

§ Results can't be meaningfully compared.

(6)

Our solution:

Standardized Corpora for Digital Forensics Research.

"Standardized"

§ Known contents

§ Documented provenance

"Corpora"

§ Many data sets

§ Realistic — lifelike, but no Personally Identifiable Information (PII)

§ Real — Public and Private

"Digital Forensics Research"

§ Created to enable research

§ Legally obtained (cf. wiretap law)

§ Publishable results

(7)

UNCLASSIFIED

Test Data

§ Constructed for the purpose of testing a specific feature.

§ CFReDS “Russian Tea Room floppy disk image” to validate Unicode search & display.

Sampled Data

§ A subset of a large data source — e.g., sampled web pages or packets.

§ Hard to randomly sample.

Realistic Data

§ Not “real” — made in a lab, not in the field.

Real and Restricted Data

§ Created by actual human beings during activities that were not performed for the purpose of creating forensic data.

§ Controlled for privacy reasons.

Real but Unrestricted

§ Released for some reason, e.g., the Enron Email Dataset.

§ Photos on Flickr; User profiles on Facebook.


(8)

1 million(*) documents from US Government web servers

§ Specifically for file identification, data & metadata extraction.

§ Found by random word searches on Google & Yahoo

§ DOC, DOCX, HTML, ASCII, SWF, etc.

Free to use; Free to redistribute

§ No copyright issues — US Government work is not copyrightable.

§ Other files have simply been moved from one USG webserver to another.

§ No PII issues — These files were already released.

Distribution format: ZIP files

§ 1000 ZIP files with 1000 files each.

§ 10 “threads” of 1000 randomly chosen files for student projects.

§ Full provenance for every file (how found; when downloaded; SHA1; etc.)

______________________

(*Approximately 3000 files redacted after release.)
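
Because each file ships with a recorded SHA1, a downloaded ZIP can be checked against its provenance. The sketch below hashes every member of a ZIP archive and reports names whose digest is missing or different. The layout of the provenance records is not described on the slide, so the `expected` mapping (member name to SHA-1 hex digest), the member names, and the function names are assumptions for illustration only.

```python
import hashlib
import io
import zipfile


def sha1_of_zip_members(zip_source):
    """Map each file in a ZIP archive to the SHA-1 hex digest of its contents."""
    digests = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            h = hashlib.sha1()
            with zf.open(info) as member:
                # Read in 1 MiB chunks so large members are not loaded whole.
                for chunk in iter(lambda: member.read(1 << 20), b""):
                    h.update(chunk)
            digests[info.filename] = h.hexdigest()
    return digests


def find_mismatches(digests, expected):
    """Return the names in `expected` whose digest is absent or differs."""
    return sorted(name for name, sha1 in expected.items()
                  if digests.get(name) != sha1)


# Demo on an in-memory archive with hypothetical member names.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("000001.txt", b"hello")
    zf.writestr("000002.txt", b"world")
buf.seek(0)

digests = sha1_of_zip_members(buf)
expected = {
    "000001.txt": hashlib.sha1(b"hello").hexdigest(),  # matches
    "000002.txt": "0" * 40,                            # deliberately wrong
}
mismatches = find_mismatches(digests, expected)
print("mismatched members:", mismatches)
```

For a real archive, pass the path of the downloaded ZIP instead of the in-memory buffer and build `expected` from whatever provenance file accompanies it.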

http://domex.nps.edu/corp/files/govdocs1:

1 Million files available now

(9)

Test Images — Designed to demonstrate a particular aspect

§ nps-2009-hfstest1 (HFS+)

§ nps-2009-ntfs1 (NTFS)

Realistic Images — Like real life, but no personally identifiable info.

§ nps-2009-canon2 (FAT32)

§ nps-2009-UBNIST1 (FAT32)

§ nps-2009-casper-rw (embedded EXT3)

§ nps-2009-domexusers (NTFS)

Each image has:

§ Narrative of how the image was created and expected uses.

§ Image file in RAW/SPLITRAW, AFF and E01 formats

§ SHA1 of raw image

§ “Ground truth” report
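
Since each image is published with the SHA1 of the raw image, a split-raw download can be verified by hashing its segments in order, which yields the same digest as hashing the reassembled image. The following is a minimal sketch under that assumption. The segment file names in the demo are invented, and the caller is responsible for supplying the segments in the correct order.

```python
import hashlib
import os
import tempfile


def sha1_of_segments(segment_paths, chunk_size=1 << 20):
    """SHA-1 of the concatenation of the given files, read in order."""
    h = hashlib.sha1()
    for path in segment_paths:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
    return h.hexdigest()


# Demo: two tiny stand-in "segments" written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i, data in enumerate([b"abc", b"def"]):
        path = os.path.join(tmp, f"demo-image.{i:03d}")  # hypothetical naming
        with open(path, "wb") as f:
            f.write(data)
        parts.append(path)
    split_digest = sha1_of_segments(parts)

# Hashing the whole byte stream at once gives the same digest.
whole_digest = hashlib.sha1(b"abcdef").hexdigest()
print(split_digest == whole_digest)
```

Compare the resulting digest with the published SHA1 before relying on the image or its ground-truth report.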


http://domex.nps.edu/corp/images/nps/

"Test" and "Realistic" disk images

(10)

Typical scenarios include:

§ Distribution of simulated pornography ("kitty porn.")

§ Theft of corporate data.

Nitroba University:

§ University harassment case

m57 theft

§ Theft of corporate data

m57 patents

§ 3 week simulation of a small business

§ Four computers

§ Daily disk and memory images

§ Complete Network Packet Capture

http://domex.nps.edu/corp/scenarios/

Complete Scenarios

(11)

The Real Data Corpus:

"Real Data from Real People."

Most forensic work is based on “realistic” data created in a lab.

We get real data from CN, IN, IL, MX, and other countries.

Real data provides:

§ Real-world experience with data management problems.

§ Unpredictable OS, software, & content

§ Unanticipated faults

We have multiple corpora:

§ Non-US Persons Corpus

§ US Persons Corpus (@Harvard)

§ Releasable Real Corpus

§ Realistic Corpus

IRB approval required for federally funded research.

(12)


Real Data Corpus: Current Status

Country   HDs     Flash   Optical   GB (uncomp)
BA        7       -       -         38
CA        73      1       -         1,064
CE        1       -       -         82
CH        2       -       -         5
CN        143     568     98        3,627
DE        36      1       -         755
GR        13      -       -         27
IL        229     4       -         2,226
IN        317     66      -         19,540
MX        175     -       -         1,110
NZ        1       -       -         4
PS        98      -       -         957
TH        1       -       -         13
UA        22      -       -         55
Total     1,118   643     98        30,008

(13)


RDC has been provided to a range of researchers.

Data sharing requests for Real Data received and satisfied:

§ CMU Software Engineering Institute.

§ AccessData

§ I.D.E.A.L. Technology

Pending Agreements:

§ University of Texas San Antonio

§ University of California, Santa Cruz

§ Georgetown University

Data sharing for use in training:

§ West Point

§ DC3/DCCI

§ CMU Computer Science Department

(14)

Conclusion:

Digital forensics needs digital corpora!

National Research Council 2009 Report found a lack of “science” in forensics...

§ “Substantive information and testimony based on faulty forensic science analysis may have contributed to wrongful convictions of innocent people...”

§ “Moreover, imprecise or exaggerated expert testimony has sometimes contributed to the admission of erroneous or misleading evidence.”

National Research Council, 2009

Contact Information:

§ http://domex.nps.edu/deep

§ Joshua B. Gross <[email protected]>

§ Simson L. Garfinkel <[email protected]>

Questions?


PREPUBLICATION COPY

STRENGTHENING FORENSIC SCIENCE IN THE UNITED STATES:

A PATH FORWARD

Committee on Identifying the Needs of the Forensic Science Community

Committee on Science, Technology, and Law

Policy and Global Affairs

Committee on Applied and Theoretical Statistics

Division on Engineering and Physical Sciences

THE NATIONAL ACADEMIES PRESS

This prepublication version has been provided to the public to facilitate timely access to the committee's report. Although the substance of the report is final, editorial changes will be made throughout the text, and the citations will be checked before publication.
