Tom Ruoff
How DHS is Doing Cybersecurity with Content Filtering
Department of Homeland Security
National Protection and Programs Directorate
DHS & Content Filtering
Bottom Line Up Front
Q1. Why is DHS working on this?
A1. Because current signature and detonation approaches are not sufficient to allow control of cyber attacks.
Q2. What is better?
A2. Content Filtering. Test results indicate eMIST 3.0.3 is capable of blocking zero day malware at about a 99.5% rate.
Q3. What does DHS want to accomplish?
A3. Strategically – improve cybersecurity. Tactically – stimulate both sides of the supply-demand equation to significantly enhance the cybersecurity posture of the information technology systems of Federal Executive Branch Departments and Agencies, as well as critical infrastructure owners and operators, through use of commercially available technology acquired at market-driven cost.
What You Get Out of This Talk – Agenda
1. Technical understanding of what content filtering is
2. How well it works in neutering malware – test results
3. What DHS is doing with this cool stuff to protect itself
4. What our next steps are
5. What you can do with this knowledge
What is Content Filtering?
A filtering technology based on a robust understanding of the syntactic structure and semantic meaning of the file type or protocol being filtered, used to pass only known/validated good content
Uses a bit/byte-level understanding of the file, compared against the specification (e.g. RFC)
Decomposes objects into the base elements of the file type/object protocol specification and then re-assembles a “clean” version that excludes non-essential components
Requires access to the file type/protocol specification (RFC) and/or extensive reverse engineering
Specs frequently don’t match reality, so the decomposition process sometimes fails because the object does not decompose per the specification; a Word doc is sometimes not a Word document per the Word specification, or a Word document masquerades as a PowerPoint
Not signature based
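The decompose-and-reassemble idea above can be sketched on a toy example. This is a minimal illustration only: the chunk format (4-byte tag, 4-byte big-endian length, payload), the allow-list, and all function names are assumptions made up for this sketch, not a real file format or the eMIST implementation.

```python
import struct

# Hypothetical toy format: repeated chunks of a 4-byte ASCII tag,
# a 4-byte big-endian payload length, then the payload itself.
ALLOWED_TAGS = {b"TEXT", b"IMG "}  # assumed "known good" element types


class FilterError(Exception):
    """Raised when the object does not decompose per the (toy) specification."""


def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk in the toy format."""
    return struct.pack(">4sI", tag, len(payload)) + payload


def decompose(blob: bytes) -> list:
    """Strictly parse the blob into (tag, payload) pairs or fail."""
    chunks = []
    pos = 0
    while pos < len(blob):
        if pos + 8 > len(blob):
            raise FilterError("truncated chunk header")
        tag, length = struct.unpack_from(">4sI", blob, pos)
        pos += 8
        if pos + length > len(blob):
            raise FilterError("chunk length exceeds file size")
        chunks.append((tag, blob[pos:pos + length]))
        pos += length
    return chunks


def reassemble(chunks: list) -> bytes:
    """Write out a fresh object containing only allow-listed elements."""
    return b"".join(make_chunk(tag, payload)
                    for tag, payload in chunks if tag in ALLOWED_TAGS)


def sanitize(blob: bytes) -> bytes:
    """Pass known good: decompose, drop everything else, rebuild."""
    return reassemble(decompose(blob))
```

Note that nothing here tries to recognize malware. An unknown element such as a `MACR` chunk is simply never written to the output, and an object that fails to parse is rejected outright.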
World of Malware – Where Content Filter Fits In
Two types of Malware attacks (1 of 2)
1. Syntactic – The attacker sends incorrect, malformed, or unexpected data to the system in order to execute an exploit. Within syntactic-based attacks there are two main variants:
a. Non-compliance with Specification – In this attack, the data does not comply with the file format/protocol specification and the software processing that data does not properly handle it, leading to a program crash and possible exploit.
b. Compliance with Specification – In this attack, the data complies with the specification, but an incorrect assumption or decision by the developer on how to implement the specification leads to a potential program crash and exploit. For example, suppose a program processes a length-delimited file and the specification says that a data field is 128 characters, but the developer knew that by convention (e.g. common use) only 16 characters were used, so he hardcoded an array to be 16 characters long. If an attacker sent a specification-compliant data field with 128 characters of data instead of 16 characters, it could lead to a buffer overflow and possible exploit.
World of Malware – Where Content Filter Fits in
Two types of Malware attacks (2 of 2)
2. Semantic – The attacker sends structurally correct but logically incorrect data to the system to cause the device to operate outside of its design parameters (e.g. tell a generator to operate at 20K RPM, above its design tolerance of 5K RPM).
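The two attack classes map onto two different kinds of check, which can be sketched as follows. The 128-character field and the 5K RPM tolerance come from the examples above; the function names and the idea of a fixed-size storage buffer are assumptions for illustration.

```python
SPEC_MAX_FIELD_LEN = 128  # from the example: the specification allows 128 characters
MAX_SAFE_RPM = 5_000      # from the example: design tolerance of 5K RPM


def check_syntactic(field: bytes) -> bool:
    """Structural check: does the field fit what the specification allows?"""
    return len(field) <= SPEC_MAX_FIELD_LEN


def check_semantic(rpm: int) -> bool:
    """Logical check: is a well-formed value inside the design parameters?"""
    return 0 <= rpm <= MAX_SAFE_RPM


def store_field(field: bytes) -> bytes:
    """Size the buffer from the specification, not from convention (16)."""
    if not check_syntactic(field):
        raise ValueError("field longer than the specification allows")
    buf = bytearray(SPEC_MAX_FIELD_LEN)
    buf[:len(field)] = field
    return bytes(buf)


# A spec-compliant 128-character field is accepted and stored safely,
# while a well-formed but out-of-range RPM command is rejected.
print(len(store_field(b"A" * 128)), check_semantic(20_000))
```

A command of 20,000 RPM passes any purely structural check, which is why the semantic check has to know the meaning of the field as well as its layout.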
So Why Does Content Filtering Work?
Most malware is very fragile; format conversion changes to the file can break it (render it operationally useless)
Malware likes to misrepresent itself
E.g. a JPEG claiming to be TIFF
Malware exploits defects in parsing, usually by providing a structurally wrong or logically incorrect file
Malware developers like to hide in the portions of files used for metadata storage, at the end of the file, between segments/markers in a file, and via steganographic techniques in the payload of files (e.g. image data)
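The "JPEG claiming to be TIFF" case can be caught by comparing the claimed type with the file's leading signature bytes. The sketch below is a simplified illustration: the signature values are the well-known ones for these formats, but the table covers only four types and the function names are made up here.

```python
import os

# Leading signature ("magic") bytes for a few image formats.
MAGIC = {
    "jpeg": (b"\xff\xd8\xff",),
    "tiff": (b"II*\x00", b"MM\x00*"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}

# File extensions mapped to the type they claim to be.
EXTENSIONS = {
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".tif": "tiff", ".tiff": "tiff",
    ".png": "png", ".gif": "gif",
}


def sniff_type(data: bytes):
    """Return the type implied by the leading bytes, or None if unknown."""
    for name, signatures in MAGIC.items():
        if data.startswith(signatures):
            return name
    return None


def claim_matches_content(filename: str, data: bytes) -> bool:
    """True only if the extension's claimed type agrees with the content."""
    ext = os.path.splitext(filename)[1].lower()
    claimed = EXTENSIONS.get(ext)
    return claimed is not None and claimed == sniff_type(data)
```

A signature match is only a first gate. A full filter would go on to validate the whole structure against the specification, since a file can carry a correct header and still be malformed further in.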
Content Filtering: Deep Content Inspection & Sanitization
ASSUMPTIONS
1. Detecting malware is really hard, so don’t try
2. Malware is fragile, so extracting content and re-assembling objects neuters almost all attacks
3. Exploding the malware is a good start to observe malicious behavior but not entirely effective
4. Active content within the object protocol (e.g. Excel formulas) is benign – the rest is assumed malicious
5. There is a user impact (like rendering URLs inactive) that needs to be part of policy settings
6. If the object is not definable (syntactic attack - kind of a Word 2007…) then policy can
Content Filtering Methods
Deep Content Inspection and Sanitization
Verifies file complies with specification, then writes out known good content
Format Conversion
Converts a file to another related format before converting back to the original file format (e.g. PDF to PS to PDF)
File Flattening
Converts file to another similar but usually less complex format that doesn’t have the data attack risks of the original (e.g. PPT to series of JPG files)
Canonicalization
Converts contents from specialized form into normalized/raw form (e.g. audio files into PCM)
Typical Content Filtering Process
Text Dirty Word Search
Based on a “Dirty” and “Clean” word list
Macro removal filter
Images are inspected for format and sanitized for embedded information or malware
Embedded objects are inspected up to a configurable level deep, usually 1
Virus Cleaning
[Diagram: a typical Office document containing embedded Image, Excel, and Macro objects]
How Does it Work: MS Office (1 of 2)
Microsoft Office Filters (97-2010), Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx) - Processing Steps
1. Validate file type complies with the official specification from Microsoft (2003 and below) or from Microsoft and the ISO (2007+)
2. Recursively process the MS Office file into its constituent parts
How Does it Work: MS Office (2 of 2)
Microsoft Office Filters (97-2010), Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx) - Processing Steps continued
4. Send all non-MS Office components that are supported to other filters. If a file type is not supported, then either fail the MS Office file or remove that object from the MS Office file*
5. Non-MS Office components are filtered by their respective filters and if
How Does it Work: Imagery
JPEG (.jpg, .jpeg), Windows Bitmap (.bmp/.dib), Windows Metafile (.wmf), Windows Enhanced Metafile (.emf), Graphics Interchange Format (.gif), Portable Network Graphics (.png), Tagged Image File Format (.tiff)
Processing Steps:
1. Validate file type complies with official specification
2. Validate and/or remove metadata
3. Send metadata for dirty word analysis
4. Zeroize the least significant bits of the image data*
5. Rebuild and recompress image
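Step 4 can be illustrated with a short sketch. It assumes the image has already been decoded into raw 8-bit samples (one byte per channel value); it does not operate on compressed file bytes, and the function name and `bits` parameter are made up for this example.

```python
def zeroize_lsb(samples: bytes, bits: int = 1) -> bytes:
    """Clear the lowest `bits` bits of every 8-bit sample.

    This wipes the places where least-significant-bit steganography
    hides data, at the cost of a small change in each sample value.
    """
    if not 0 <= bits <= 8:
        raise ValueError("bits must be between 0 and 8")
    mask = (0xFF << bits) & 0xFF  # e.g. bits=1 gives 0b11111110
    return bytes(sample & mask for sample in samples)


# Three decoded samples; the low bit of each may carry hidden data.
pixels = bytes([0b10101011, 0b00000001, 0b11111111])
print(list(zeroize_lsb(pixels)))
```

With `bits=1` each sample changes by at most 1 out of 255, so the visible image is essentially unchanged while any payload stored in those bits is destroyed.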
How Does it Work: Compressed Files
PKzip (.zip), UNIX tar (.tar), GNU zip (.gz), BZip2 (.bz2)
Steps:
1. Validate file type complies with official specification
2. Check for excessive levels of embedding (zip/tar)
3. Extract directory structure data
4. Extract all the files and throw away the container
5. Filter files
6. Rebuild container by reinserting filtered files. Failed files are replaced
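The steps above can be sketched for ZIP files with Python's standard `zipfile` module. Several things here are assumptions: the depth limit, the placeholder text used for failed files (the source does not say what failed files are replaced with), and `filter_member`, which is a hypothetical stand-in for the real per-file-type filters.

```python
import io
import zipfile

MAX_DEPTH = 2  # assumed policy limit on nested archives
PLACEHOLDER = b"[file removed by content filter]"  # assumed replacement


def filter_member(name: str, data: bytes):
    """Hypothetical per-file filter: cleaned bytes, or None on failure.

    For demonstration it passes .txt files and fails everything else.
    """
    return data if name.lower().endswith(".txt") else None


def filter_zip(blob: bytes, depth: int = 1) -> bytes:
    """Extract members, filter each, and rebuild a brand-new container."""
    if depth > MAX_DEPTH:
        raise ValueError("excessive levels of embedding")
    if not zipfile.is_zipfile(io.BytesIO(blob)):
        raise ValueError("not a valid zip container")
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(blob)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue
            data = src.read(info)
            if info.filename.lower().endswith(".zip"):
                cleaned = filter_zip(data, depth + 1)  # nested archive
            else:
                cleaned = filter_member(info.filename, data)
            dst.writestr(info.filename,
                         PLACEHOLDER if cleaned is None else cleaned)
    return out.getvalue()


def make_zip(files: dict) -> bytes:
    """Helper for the demonstration: build a zip from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buf.getvalue()
```

Because the output archive is written from scratch, anything hidden in the original container's slack space or extra fields is discarded along with the container itself. In this sketch an archive nested deeper than the limit rejects the whole file.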
How Does it Work: Text
Text files (.txt/.csv/.log) – Support 7-bit/8-bit ASCII and Unicode UTF-8 - Steps
1. Validate the file is non-executable textual data
2. Apply Regular Expressions to data (usually to neuter URLs)
3. Apply Dirty Word Filter to textual data by rotating through a series of
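A rough sketch of the three text steps follows. It is simplified in ways the source does not specify: only UTF-8 is accepted (7-bit ASCII is a subset, 8-bit ASCII is not handled), URLs are neutered by rewriting `http` to `hxxp`, and the dirty word list is hypothetical.

```python
import re

URL_SCHEME_RE = re.compile(r"\bhttp(s?)://", re.IGNORECASE)
DIRTY_WORDS = {"secret", "classified"}  # hypothetical policy word list


def is_text(data: bytes) -> bool:
    """Step 1 (approximation): decodes as UTF-8 with only printable
    characters plus common whitespace."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ch.isprintable() or ch in "\r\n\t" for ch in text)


def neuter_urls(text: str) -> str:
    """Step 2: make links non-clickable by rewriting the scheme."""
    return URL_SCHEME_RE.sub(r"hxxp\1://", text)


def find_dirty_words(text: str) -> list:
    """Step 3: report which listed words appear, case-insensitively."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sorted(tokens & DIRTY_WORDS)


print(neuter_urls("see https://example.com for details"))
print(find_dirty_words("This is SECRET."))
```

Rewriting URLs is an example of the user impact mentioned earlier: the text stays readable, but the recipient can no longer click through.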
How Does it Work: PDF
Adobe Portable Document Format (PDF) - Processing Steps
1. Validate file type complies with official specification
2. Perform text extraction for Dirty Word Analysis
3. Convert PDF to PostScript (PS) then back to PDF
Content Filtering Lab Test Results
Methodology for determining eMIST’s effectiveness at neutralizing malware and determining false positive rates:
1. Collect presumed good and malicious test data.
2. Verify the malicious data using established test bed.
3. Configure eMIST v3.0.3 with the appropriate policies, network configuration, etc.
4. Process files through eMIST v3.0.3.
5. Record output results (e.g., passed, modified, rejected) for each file, per file type.
6. Evaluate malicious test set output files for malicious content using established test bed.
How Well Does Content Filtering Work – Lab Results
At 95% Confidence Factor
File Type         Block/Cleansing Rate (479 Policy)   Block/Cleansing Rate (Basic Policy)
Doc               95.28% ± 2.02%                      98.63% ± 1.56%
Ppt               80.48% ± 24.76% //99%               71.92% ± 33.67% //99%
[label missing]   99.80% ± 0.16%                      99.87% ± 0.18%
Xls               96.62% ± 1.33% //98%                98.06% ± 1.43% //98%
Gif               98.22% ± 2.50% //100%               96.56% ± 4.78% //100%
Jpg               2.91% ± 1.33%                       2.88% ± 1.86%
Rtf               N/A //99.8%                         N/A //99.8%
How Well Does Content Filtering Work – Lab Testing
File Type   False Positive Rate (479 Policy)   False Positive Rate (Basic Policy)
doc         4.28% ± 0.79%                      4.27% ± 1.12%
ppt         5.36% ± 1.53%                      5.68% ± 2.21%
xls         8.26% ± 2.94%                      8.73% ± 4.23%
docx        5.03% ± 0.50%                      44.55% ± 1.62%
pptx        15.39% ± 1.10%                     25.81% ± 1.89%
xlsx        16.73% ± 2.37%                     19.16% ± 3.52%
pdf         1.49% ± 0.20%                      3.39% ± 0.43%
gif         1.73% ± 0.58%                      1.82% ± 0.84%
tiff        1.32% ± 0.32%                      1.36% ± 0.46%
jpg         1.45% ± 0.31%                      1.36% ± 0.42%
png         1.66% ± 0.29%                      1.83% ± 0.42%
bmp         1.88% ± 0.53%                      2.03% ± 0.78%
wmf         1.25% ± 0.56%                      1.31% ± 0.81%
emf         1.35% ± 0.42%                      1.28% ± 0.57%
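The "± x%" figures at 95% confidence can be read as margins of error on a measured proportion. The slides do not say which method was used, so the sketch below assumes the common normal-approximation interval, and the sample counts in it are hypothetical, not taken from the test data.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(rate: float, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for a proportion.

    rate is the observed proportion (0..1) and n the number of files tested.
    """
    if n <= 0 or not 0.0 <= rate <= 1.0:
        raise ValueError("need n > 0 and 0 <= rate <= 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    return z * sqrt(rate * (1.0 - rate) / n)


# Hypothetical counts for illustration only.
blocked, total = 950, 1000
rate = blocked / total
print(f"{rate:.2%} ± {margin_of_error(rate, total):.2%}")
```

Under this reading, the margin shrinks with the square root of the sample size, which would explain why file types with wide margins (such as Ppt in the block-rate table) likely had far fewer samples than those with narrow ones.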
Review of Lab Testing
Results from testing indicate eMIST 3.0.3 appears to be capable of blocking zero day malware at about a 99.5% rate
Pass rate is 98.5%; it can be improved by tailoring the dirty word list
OR
If object is not defined then send to secondary inspection process since this means the object may be malicious – take a systems approach
DHS Operational Testing of eMIST 3.0.3
We will put eMIST 3.0.3 in our operational network (LAN A) to assess operational malicious content kill rate
Test results forthcoming: we ran into operational issues so test results need to be verified before public release
#RSAC
eMist Mail Content Filtering Combined with Behavior-based Tools
[Diagram: Internet, DHS SOC, OneNet DC2, LAN-A, OneNet Hub Transport Server, @dhs.gov Email Server, MS Outlook Client, Main Inbox]
Current @dhs.gov email path
[Diagram adds: eMist Email Server; CS&C Participants with EPP-equipped laptops]
Pilot adds Endpoint Protection (EPP)-equipped laptops
Email traffic entering dhs.gov is replicated and goes to both the primary Outlook server and eMist
eMist extracts embedded attachments in emails and cleans them
Emails are reconstructed with their now-cleansed attachments re-inserted
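The extract, clean, and re-insert flow for email attachments can be sketched with Python's standard `email` package. This is an illustration of the flow only, not how eMist works internally: `clean_attachment` is a hypothetical stand-in for the real content filters, and `build_demo_email` exists just to produce test input.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def clean_attachment(filename: str, data: bytes) -> bytes:
    """Hypothetical stand-in for the content filter.

    It only strips a literal marker so the flow can be demonstrated.
    """
    return data.replace(b"<MACRO>", b"")


def sanitize_email(raw: bytes) -> bytes:
    """Parse the email, clean each attachment, and re-serialize it."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.iter_attachments():
        filename = part.get_filename()
        maintype, subtype = part.get_content_type().split("/")
        disposition = part.get_content_disposition() or "attachment"
        # Simplification: assumes a leaf (non-multipart) attachment.
        cleaned = clean_attachment(filename or "",
                                   part.get_payload(decode=True))
        # set_content replaces the old payload and Content-* headers.
        part.set_content(cleaned, maintype=maintype, subtype=subtype,
                         disposition=disposition, filename=filename)
    return msg.as_bytes()


def build_demo_email() -> bytes:
    """Build a small test message with one 'dirty' attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("See attached.")
    msg.add_attachment(b"quarterly <MACRO>numbers",
                       maintype="application", subtype="octet-stream",
                       filename="report.doc")
    return msg.as_bytes()
```

The message body and headers pass through untouched here; only the attachments are rebuilt, so the recipient sees the same email with cleansed files in place of the originals.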
Pilot participants with EPP laptops have Outlook clients connected to 2 inboxes (Main Inbox and Test Inbox)
Allows EPP tools to detect malicious behavior from files originating from either email inbox
EPP on the laptop monitors for and alerts on suspicious behaviors, including reference to the files that are the source of suspect behaviors
Data aggregated by the EPP server (EPP-detected behaviors from laptops) now supports multiple cybersecurity activities
Malicious items successfully blocked by eMist / missed by current mechanisms
[Diagram: EPP-detected behaviors from .gov emails compared with EPP-detected behaviors from eMist test emails]
Malicious items not blocked by eMist – candidates for tuning, signature development, or heuristics
DHS Use of Content Filtering
What is DHS doing with content filtering to promote its use?
We put eMIST 3.0.3 and follow-on commercial products in our operational network (LAN A) to assess operational malicious content kill rate – slide show
Will use evidence to justify and encourage procurement of commercial content filtering products
Partnering with vendors to advance state of art for email and web content filtering
DHS Use of Content Filtering
What is DHS doing next with content filtering?
Programming next set of commercial product tests and operational demonstrations of kill rate – email and web
Planning next set of operational tests using a TBD commercial product to perform content filtering on DHS LAN A email
Focus will be on sanitization rate, usability and availability
Using evidence to justify and encourage procurement of commercial content filtering products
Partnering with vendors to advance state of art for email and web content filtering
What Can YOU Do with this Knowledge?
1. Research content filtering technology – become smarter on “pass known good” approach
2. Become familiar with current commercial state of art
3. Go get some and protect your networks!!!
4. Demand vendors improve offerings – the demand side of supply/demand
5. Developers: Go make better commercial offerings to advance
Parting Words - Motivation
1. This approach works – 98% zero day kill rate
2. It is not monetarily costly, sort of depends…
3. This approach impacts user experience (based upon policy to block/pass undefinable objects) – this is a good thing as it re-sets expectations for “cost of security”
4.