Finding the needle in the haystack with ELK

Elasticsearch for Incident Handlers and Forensic Analysts

(2)

Whoami

• Working for the Belgian Government
  my own company
• Incident Handling
• Malware analysis
• Forensics (network + system)
• Open Source minded
• Creator of MISP – Malware Information Sharing Platform
• Creator of pystemon – pastebin monitoring tool
• Core organizer of the FOSDEM conference for many years

(4)
(5)

What tools do you use?

• Text logs
• notepad
• grep
• awk / sed / cut

(6)
(7)

Optimizing

• grep -F PATTERN log.txt
• zgrep -F PATTERN log.txt
• zgrep -f patterns.txt -F log.txt
• find "$LOGS_DIR" -iname "*.gz" -print0 | parallel --gnu -0 -n1 -P8 zgrep -f patterns.txt -F > result-all.txt
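The same idea (fixed-string patterns, compressed logs, several files at once) can be sketched in Python. This is only an illustration under assumptions: the function names `search_gz` and `search_logs` are hypothetical, files are assumed to be UTF-8 text, and the `*.gz` match is case-sensitive, unlike `find -iname`.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def search_gz(path, needles):
    """Return the lines of one gzip file that contain any of the fixed strings."""
    hits = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if any(needle in line for needle in needles):
                hits.append(line.rstrip("\n"))
    return hits


def search_logs(logs_dir, needles, workers=8):
    """Search every *.gz file below logs_dir, several files at a time."""
    files = sorted(Path(logs_dir).rglob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the (sorted) file order in its results
        per_file = pool.map(lambda p: search_gz(p, needles), files)
        return [hit for hits in per_file for hit in hits]
```

Plain substring tests play the role of `-F` here: no regular expression engine is involved, which is what makes fixed-string matching cheap.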

(8)

Optimizing

• MySQL / MS Access
• Splunk
  • Free license = 500 MB/day
• ELSA – Enterprise Log Search and Archive
  • Limitation of the number of columns

(9)

Trick for Splunk Addicts

• Limit is 500 MB/day
• 3 license violations allowed per month
• Set the date to 00:01 AM
• Index as much as possible, 24h/day, for 3 days (while loops are your friend)

(10)

Trick for all = ELK

• Elasticsearch, Logstash, Kibana
• Index as much as you want
• No limit on volume, speed or position of the moon

(11)

Configurations

• https://github.com/cvandeplas/ELK-forensics
  • Repository with Logstash and Kibana configurations
  • Mactime, BlueCoat, Mail IMSS, IWSVA, IIS, SuperTimeline, Plaso, …
• http://christophe.vandeplas.com/2014/06/setting-up-single-node-elk-in-20-minutes.html
• Our focus today:
  • Forensics and Incident Handling

(12)
(13)
(14)
(15)
(16)
(17)
(18)

(19)

Trick for all = ELK

• Elasticsearch, Logstash, Kibana
• Index as much as you want
• No limit on volume, speed or position-of-the-moon-licensing

(20)

Inputs

• Inputs & codecs
  • Inputs: collectd, drupal_dblog, elasticsearch, eventlog, exec, file, ganglia, gelf, gemfire, generator, graphite, heroku, imap, invalid_input, irc, jmx, log4j, lumberjack, pipe, puppet_facter, rabbitmq, rackspace, redis, relp, s3, snmptrap, sqlite, sqs, stdin, stomp, syslog, tcp, twitter, udp, unix, varnishlog, websocket, wmi, xmpp, zenoss, zeromq
  • Codecs: cloudtrail, collectd, compress_spooler, dots, edn, edn_lines, fluent, graphite, json, json_lines, json_spooler, line, msgpack, multiline, netflow, noop, oldlogstashjson, plain, rubydebug, spool
• Outputs

(21)

Input Example

• I usually don’t use “file” as input
  • It keeps a reference to its position in the file
• TCP socket is the easiest for me

(22)

Outputs

• Inputs & codecs
• Outputs
  • boundary, circonus, cloudwatch, csv, datadog, datadog_metrics, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, google_bigquery, google_cloud_storage, graphite, graphtastic, hipchat, http, irc, jira, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, rabbitmq, rackspace, redis, redmine, riak, riemann, s3, sns, solr_http, sqs, statsd, stdout, stomp, syslog, tcp, udp, websocket, xmpp, zabbix, zeromq

(23)
(24)

Filters

• Inputs & codecs
• Outputs
• Filters
  • advisor, alter, anonymize, checksum, cidr, cipher, clone, collate, csv, date, dns, drop, elapsed, elasticsearch, environment, extractnumbers, fingerprint, gelfify, geoip, grep, grok, grokdiscovery, i18n, json, json_encode, kv, metaevent, metrics, multiline, mutate, noop, prune, punct, railsparallelrequest, range, ruby, sleep, split, sumnumbers, syslog_pri, throttle, translate, unique, urldecode, useragent, …

(25)
(26)
(27)

Grok

• Named regular expressions to match patterns and extract data.
• Logstash ships with lots of patterns!
  https://github.com/elasticsearch/logstash/tree/master/patterns
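To make "named regular expressions" concrete, here is a rough Python imitation of the idea. It is not how Logstash implements grok: the tiny `PATTERNS` table and the helpers `grok_to_regex` and `parse` are hypothetical, and the regexes are simplified stand-ins for the shipped patterns.

```python
import re

# Hypothetical, simplified stand-ins for a few shipped grok patterns.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATH": r"/\S*",
    "NUMBER": r"\d+",
}


def grok_to_regex(expression):
    """Expand every %{NAME:field} into a named capture group (?P<field>...)."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


LINE = grok_to_regex("%{IP:client} %{WORD:method} %{URIPATH:request} %{NUMBER:bytes}")


def parse(line):
    """Return the extracted fields as a dict, or None when the line does not match."""
    m = LINE.match(line)
    return m.groupdict() if m else None
```

With these definitions, `parse("55.3.244.1 GET /index.html 15824")` yields a dict with the keys `client`, `method`, `request` and `bytes`, all as strings.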

(28)
(29)

Data Enrichment with Filters

• Extract fields: csv, grok, kv
• Extract date
• Modify using mutate
• Enrich with
  • Geoip
  • User-agent
  • Urldecode
  • Translate
  • …

(30)
(31)
(32)
(33)
(34)
(35)
(36)

Ruby as last resort

(37)


(38)

Trick for all = ELK

• Elasticsearch, Logstash, Kibana
• Index as much as you want
• No limit on volume, speed or season-licensing

(39)

Elasticsearch

• Wikipedia: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
• Very very fast

(40)

Elasticsearch

• Be cautious
  • No security by default
  • Auto-discovery, auto-distribution if another node is present
• Elastic HQ plugin
  • cd /usr/share/elasticsearch/bin

(41)

Trick for all = ELK

• Elasticsearch, Logstash, Kibana
• Index as much as you want
• No limit on volume, speed or horoscope-licensing

(42)

Kibana

• Fancy GUI
• Extremely easy to build up a dashboard
• Gives a good overview of the data
• Powerful, but limited in capability

(43)

DO NOT PRESS

(44)

Search syntax

• Apache Lucene search syntax
  • title:foo title:"foo bar"
  • title:"foo bar" AND body:"quick fox"
  • (title:"foo bar" AND body:"quick fox") OR title:fox
  • title:foo -title:bar
  • title:foo*bar
  • time_taken:[10000 TO 999999999]

(45)
(46)
(47)
(48)

(49)

Performance goals

• Focus: incident handling and forensics
• Max speed of indexing
• Max speed of searching
• Searching may be slow while indexing is running
• No need for redundancy

(50)

Performance Logstash

• Memory setting (/etc/default/logstash)
  • LS_HEAP_SIZE="500m"
• Command line flag:
  • -w or --filterworkers AMOUNT_OF_CORES (default: 1)
• Each extra filter slows it down
  • Grok, aka regex = slow
  • Prefer csv, kv
  • Use as few wildcards (* or +) as possible
  • Geoip = slow but very practical

(51)

Performance Elasticsearch

• Memory setting (/etc/default/elasticsearch)
  • ES_HEAP_SIZE=12g => set to half of RAM (max 32 GB)
• Disable redundancy (/etc/elasticsearch/elasticsearch.yml)
  • index.number_of_replicas: 0
• Shards for number of nodes (/etc/elasticsearch/elasticsearch.yml)
  • index.number_of_shards: 1
• Increase memory buffer for search

(52)

Perf. Elasticsearch Indexes

• Open index = memory usage + disk usage
  Closed index = disk usage only, so close an index when it is not needed
• New indexes per case
  Similar logs go in the same index, but use a field “host” to differentiate investigations
  • system timelines: logstash-%{[case]}-%{[type]}
  • mail logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM}
  • proxy logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM.dd}
• curl -XPOST "localhost:9200/logstash-${case}*/_close"


(53)

Performance Kibana

• Each block/graph is an extra search
  • So 10 graphs equals 10 simultaneous searches

1. First select a small date/time window
2. Test your search on a small data set
3. Add filters
4. Zoom out on date/time
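The "small window first" workflow corresponds to a search that combines the Lucene query with a time range. The sketch below builds such a request body. It assumes the `bool` query with a `filter` clause as found in more recent Elasticsearch versions (older releases used a different wrapper) and the conventional `@timestamp` field. The helper name `windowed_query` is hypothetical.

```python
from datetime import datetime, timedelta


def windowed_query(lucene_query, end, hours):
    """Restrict a Lucene query to the `hours` before `end`."""
    start = end - timedelta(hours=hours)
    time_range = {"gte": start.isoformat(), "lte": end.isoformat()}
    return {
        "query": {
            "bool": {
                "must": {"query_string": {"query": lucene_query}},
                "filter": {"range": {"@timestamp": time_range}},
            }
        }
    }
```

Starting with `hours=1` and widening only once the query returns what is expected keeps every test search cheap.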

(54)

Keep in mind

• Logstash is (relatively) SLOW
• Finished? Close the index, do NOT delete it
  • Or save the JSON to files (Logstash output plugin) and re-index them later

(55)

(56)

Plaso

• Plaso = the new log2timeline and more
• log2timeline.py win7-64-nfury-10.3.58.6.dump /path/to/disk/image

(57)

ELK-forensics

• https://github.com/cvandeplas/ELK-forensics
• Logstash configs
• Kibana dashboards
• Mactime, Log2timeline csv, BlueCoat, Mail IMSS, IWSVA, IIS

(58)

Other interesting projects using Elasticsearch

• Moloch – Open Source large scale IPv4 full PCAP capturing, indexing and database system. https://github.com/aol/moloch
• Mozdef – PoC – automate IH process and facilitate real-time activities. https://github.com/jeffbryner/MozDef
• Suricata – Exports data in EVE format (JSON).

(59)
(60)
(61)
(62)
(63)

Places to be?

• https://github.com/cvandeplas/ELK-forensics
• http://www.elasticsearch.org/overview/elkdownloads/
• http://logstash.net/
• https://groups.google.com/forum/#!forum/logstash-users
