netflow-indexer Documentation

Release 0.1.28

Justin Azoff

CONTENTS

1 Installation
    1.1 Install prerequisites
    1.2 Install netflow-indexer
2 Configuration
    2.1 Example Configuration files
    2.2 Cron
    2.3 Daily compaction
3 Example Session
    3.1 Indexing data
    3.2 Searching the index
    3.3 Specifying output columns
    3.4 Dumping data
4 API
    4.1 Searching with the API
    4.2 Example
    4.3 File metadata
    4.4 Searching with pynfdump
5 Indices and tables
Index

netflow-indexer is a program that uses Xapian to index the flat-file databases used by nfdump or flow-tools.

1 Installation

1.1 Install prerequisites

netflow-indexer uses the Python Xapian bindings. The IPy module is used for subnet calculations to support CIDR searching. On Debian you can install all the dependencies using:

# apt-get install python-pip python-xapian xapian-tools python-ipy

1.2 Install netflow-indexer

I recommend installing netflow-indexer into a virtual environment:

# pip install -s -E /usr/local/python_env/ netflowindexer-0.1.9.tar.gz

Then modify your path or source the activation script:

# PATH=/usr/local/python_env/bin/:$PATH

2 Configuration

netflow-indexer uses a small configuration file that sets the type of indexer to use and the location of the files on disk. It has the following settings:

• indexer - the type of indexer to use: nfdump, nfdump_full, flowtools, or flowtools_full

• dbpath - the path to save the indexes to

• fileglob - the shell glob that will expand to the flow data files for the current hour

• allfileglob - the shell glob that will expand to all flow data files

• pathregex - a regular expression or simple string used to extract metadata from flow file paths.

2.1 Example Configuration files

2.1.1 nfdump using full indexing (recommended)

[nfi]
indexer = nfdump_full
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
pathregex = /profiles/:profile/:source/nfcapd

2.1.2 nfdump

[nfi]
indexer = nfdump
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
pathregex = /profiles/:profile/:source/nfcapd

2.1.3 flow-tools

[nfi]
indexer = flowtools
dbpath = /usr/local/var/db/flows/nfi
flowpath = /usr/local/var/db/flows/packeteer
fileglob = %(flowpath)s/%(year)s/%(year)s-%(month)s/%(year)s-%(month)s-%(day)s/ft-v05.%(year)s-%(month)s-%(day)s.%(hour)s*
allfileglob = %(flowpath)s/*/*/*/ft-v05.*
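The %(flowpath)s and %(year)s placeholders in the example files above follow the same syntax as the interpolation in Python's standard configparser module. The following is a minimal sketch of how such a fileglob could be expanded. It assumes that the date values (year, month, day, hour) are supplied at run time, which this document does not describe in detail; the expand_fileglob helper is hypothetical and is not part of netflow-indexer.

```python
import configparser
from datetime import datetime

# A copy of the nfdump_full example from section 2.1.1 (pathregex omitted).
EXAMPLE_INI = """
[nfi]
indexer = nfdump_full
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
"""


def expand_fileglob(ini_text, when):
    """Hypothetical helper: expand the fileglob setting for a given datetime."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # Assumption: the date parts are passed in as extra interpolation values.
    date_vars = {
        "year": when.strftime("%Y"),
        "month": when.strftime("%m"),
        "day": when.strftime("%d"),
        "hour": when.strftime("%H"),
    }
    # flowpath comes from the file itself, the rest from date_vars.
    return cfg.get("nfi", "fileglob", vars=date_vars)


if __name__ == "__main__":
    print(expand_fileglob(EXAMPLE_INI, datetime(2012, 5, 1, 16, 30)))
```

Expanding the setting by hand like this can be a quick way to check that a new fileglob matches the files you expect before running the indexer.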

2.2 Cron

netflow-indexer should be run from cron 5 minutes after every hour when using the nfdump indexer and every 5 minutes when using the nfdump_full indexer:

MAILTO=root
PATH=/usr/local/python_env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
45 0 * * * cd /data/nfdump_xap/ && ./daily_compact > /dev/null
*/5 * * * * sleep 30;netflow-index-update /data/nfdump_xap/nfdump.ini
55 0 * * * netflow-index-cleanup /data/nfdump_xap/nfdump.ini -d

2.3 Daily compaction

Xapian allows you to compact an index for read-only use. Compaction reduces disk usage and improves search speed. Daily compaction is a work in progress.

2.3.1 examples/daily_compact.sh

#!/bin/sh
DAY=`date +"%Y%m%d" -d "60 minutes ago"`
./xap_compact ${DAY}.db
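The cron entry in section 2.2 runs daily_compact at 00:45, so "60 minutes ago" resolves to the previous calendar day, whose index has presumably stopped receiving updates. The date arithmetic can be sketched in Python as follows; the function name is hypothetical and exists only for illustration.

```python
from datetime import datetime, timedelta


def database_to_compact(now):
    """Return the index name for the day that was current 60 minutes before `now`.

    Mirrors `date +"%Y%m%d" -d "60 minutes ago"` with ".db" appended.
    """
    return (now - timedelta(minutes=60)).strftime("%Y%m%d") + ".db"


if __name__ == "__main__":
    # Run at 00:45 on 2012-05-02, this selects the index for 2012-05-01.
    print(database_to_compact(datetime(2012, 5, 2, 0, 45)))
```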

2.3.2 examples/xap_compact.sh

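#!/bin/sh
orig="$1"
tmp=tmp_$$.db
tmp2=tmp2_$$.db
# Skip databases that have already been compacted.
if [ -e $orig/.compacted ] ; then
    exit 0
fi

# Compact into a temporary database, swap it into place, then mark it as done.
xapian-compact -F $orig $tmp && mv $orig $tmp2 && mv $tmp $orig && rm -rf $tmp2 && touch $orig/.compacted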

3 Example Session

3.1 Indexing data

Tell the netflow indexer to index the current netflow files. For this example I deleted today's index so that it can be re-created:

netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
read /data/nfsen/profiles/live/podium/nfcapd.201205010000 in 2.4 seconds. 64501 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010005 in 2.5 seconds. 70830 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010010 in 3.8 seconds. 120925 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010015 in 2.7 seconds. 65676 ips.
...
read /data/nfsen/profiles/live/podium/nfcapd.201205010240 in 1.3 seconds. 54040 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010245 in 1.3 seconds. 52391 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010250 in 1.2 seconds. 49993 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010255 in 1.2 seconds. 52161 ips.
Flush took 7.4 seconds.
...
read /data/nfsen/profiles/live/podium/nfcapd.201205011615 in 7.4 seconds. 159399 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011620 in 7.1 seconds. 155225 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011625 in 5.7 seconds. 110510 ips.
Flush took 28.9 seconds.

Running the indexer when more data is available does an incremental update:

netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
read /data/nfsen/profiles/live/podium/nfcapd.201205011630 in 3.7 seconds. 110257 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011635 in 3.7 seconds. 116742 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011640 in 4.2 seconds. 107927 ips.
Flush took 7.0 seconds.

netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
netflow@nf:~$

When indexing for the first time you should use the --full-index or -f option to index all the data. By default netflow-indexer only tries indexing files that match fileglob:

netflow-index-update /data/nfdump_xap/nfdump.ini --full-index

3.2 Searching the index

Search the index for 2011-04-19:

# 59.124.163.60 is an address that just scanned us
remote@nf:~$ time netflow-index-search /data/nfdump_xap/nfdump.ini /data/nfdump_xap/20110419.db 59.124.163.60
2011-04-19 05:35:00
2011-04-19 05:40:00
2011-04-19 05:45:00
2011-04-19 05:50:00
2011-04-19 05:55:00
2011-04-19 06:00:00
2011-04-19 06:05:00
2011-04-19 07:40:00
2011-04-19 07:45:00
2011-04-19 07:50:00
2011-04-19 07:55:00

real 0m0.072s

This output shows that the address was present in the index in eleven 5-minute chunks. Searching the 30 day index takes only slightly longer and returns the same results:

remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60

Searching for an IP that does not exist in the index is very fast:

remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 9.254.9.254

real 0m0.097s
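The timeslots printed by the first search in this section fall into two separate bursts of activity. As a rough sketch, sorted search output can be collapsed into contiguous runs like this. It assumes 5-minute capture files, as in the example; group_timeslots is a hypothetical helper and is not part of netflow-indexer.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timeslots copied from the netflow-index-search output in this section.
SLOTS = [
    "2011-04-19 05:35:00", "2011-04-19 05:40:00", "2011-04-19 05:45:00",
    "2011-04-19 05:50:00", "2011-04-19 05:55:00", "2011-04-19 06:00:00",
    "2011-04-19 06:05:00", "2011-04-19 07:40:00", "2011-04-19 07:45:00",
    "2011-04-19 07:50:00", "2011-04-19 07:55:00",
]


def group_timeslots(slots, step=timedelta(minutes=5)):
    """Collapse sorted timeslot strings into (first, last) contiguous runs."""
    runs = []
    for text in slots:
        ts = datetime.strptime(text, FMT)
        if runs and ts - runs[-1][1] == step:
            runs[-1][1] = ts       # directly follows the previous slot
        else:
            runs.append([ts, ts])  # gap found, start a new run
    return [(first.strftime(FMT), last.strftime(FMT)) for first, last in runs]


if __name__ == "__main__":
    for first, last in group_timeslots(SLOTS):
        print("%s -> %s" % (first, last))
```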

3.3 Specifying output columns

netflow-index-search and netflow-index-search-all support a -c option which selects which columns are output. By default only time is output. The other built-in field is filename. Additional fields are made available by using the pathregex configuration option. Columns can be selected in two ways:

-c time -c filename

or:

-c time,filename
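Both forms of -c select the same columns. One way to implement that behaviour is sketched below with argparse. This is not the tool's actual option parser; parse_columns is a hypothetical function that only illustrates how repeated and comma-separated values can be merged into one list.

```python
import argparse


def parse_columns(argv):
    """Hypothetical parser: merge repeated and comma-separated -c values."""
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of -c into a list.
    parser.add_argument("-c", dest="columns", action="append")
    args = parser.parse_args(argv)
    columns = []
    # Fall back to the documented default column, time.
    for value in args.columns or ["time"]:
        columns.extend(part for part in value.split(",") if part)
    return columns


if __name__ == "__main__":
    print(parse_columns(["-c", "time", "-c", "filename"]))
    print(parse_columns(["-c", "time,filename"]))
```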

3.4 Dumping data

netflow-index-search and netflow-index-search-all support a -d option which automatically runs the appropriate netflow tool for you:

remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d|head
2011-04-19 05:38:36.468 1.696 TCP 59.124.163.60:39432 -> 123.123.2.245:22 4 192 1
2011-04-19 05:38:36.468 1.776 TCP 59.124.163.60:50920 -> 123.123.2.246:22 4 192 1
2011-04-19 05:38:36.468 1.428 TCP 123.123.2.245:22 -> 59.124.163.60:39432 4 237 1
2011-04-19 05:38:36.472 0.828 TCP 59.124.163.60:36167 -> 123.123.2.247:22 3 152 1
...

You can also use the -f option to pass an additional filter:

remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d -f 'not port 22'

4 API

4.1 Searching with the API

class netflowindexer.main.Searcher(ini_file)

    Create a new searcher instance. Call with the path to the ini file.

    list_databases()

        Return a list of database files in the 'dbpath' directory.

    search(database, ips, dump=False, filter=None, mode=None)

        Search a specific database file.

        Parameters

        • database – the full path to a database file.

        • ips – a list of IP addresses to search for.

        • dump – if True, dump the full netflow records, otherwise just the seen timeslots.

        • filter – optional additional netflow search filter to be used when dump=True.

        • mode – set to 'pipe' to have nfdump list pipe-delimited records.

    search_all(ips, dump=False, filter=None, mode=None)

        Search all database files. Takes the same parameters as search().

4.2 Example

The Searcher class can be used to search for records:

>>> from netflowindexer import Searcher
>>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
>>> print s.list_databases()
['/spare/tmp/netflow/20110408.db']
>>> for record in s.search_all(['8.8.8.8']):
...     print record
2011-04-08 15:00:00
2011-04-08 15:05:00
2011-04-08 15:10:00
2011-04-08 15:15:00
2011-04-08 15:20:00
...
>>> for record in s.search_all(['8.8.8.8'], dump=True):
...     print record
2011-04-08 14:59:32.696 0.000 UDP 111.222.121.54:53241 -> 8.8.8.8:53 2 138 1
2011-04-08 14:59:32.708 0.028 UDP 8.8.8.8:53 -> 111.222.121.54:53241 2 266 1
2011-04-08 14:59:38.416 0.000 UDP 111.222.121.127:51528 -> 8.8.8.8:53 1 77 1
2011-04-08 14:59:38.396 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:51528 1 165 1
2011-04-08 14:59:38.400 0.000 UDP 111.222.121.127:60043 -> 8.8.8.8:53 1 77 1
2011-04-08 14:59:38.368 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:60043 1 151 1
2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:60128 -> 8.8.8.8:53 1 85 1
2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:63357 -> 8.8.8.8:53 1 86 1
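Because search_all() takes a list of addresses, a caller who wants to know which address was seen in which timeslot can query one address at a time. The sketch below shows that pattern. timeslots_by_ip is a hypothetical helper, and FakeSearcher is a stand-in that lets the example run without an index; with the real library you would pass a Searcher instance instead.

```python
def timeslots_by_ip(searcher, ips):
    """Map each IP address to the list of timeslots in which it was seen."""
    # str(record) yields the timeslot; one query per address keeps them apart.
    return {ip: [str(rec) for rec in searcher.search_all([ip])] for ip in ips}


class FakeSearcher(object):
    """Stand-in for netflowindexer's Searcher, used only for this demo."""

    def __init__(self, data):
        self._data = data

    def search_all(self, ips, dump=False, filter=None, mode=None):
        # Yield the canned timeslots for each requested address.
        for ip in ips:
            for slot in self._data.get(ip, []):
                yield slot


if __name__ == "__main__":
    fake = FakeSearcher({"8.8.8.8": ["2011-04-08 15:00:00", "2011-04-08 15:05:00"]})
    print(timeslots_by_ip(fake, ["8.8.8.8", "9.254.9.254"]))
```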

4.3 File metadata

Each search result is an object. str() returns only the time of the matching flow records, but other fields are available:

>>> for record in s.search_all(['8.8.8.8']):
...     print repr(record)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081500, time=2011-04-08 15:00:00, profile=tmp)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081505, time=2011-04-08 15:05:00, profile=tmp)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081510, time=2011-04-08 15:10:00, profile=tmp)

>>> for record in s.search_all(['8.8.8.8']):
...     print record.time, record.profile
2011-04-08 15:00:00 tmp
2011-04-08 15:05:00 tmp
2011-04-08 15:10:00 tmp

These field extractions are done via the pathregex configuration option.
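This document does not spell out the pathregex syntax. One plausible reading of a pattern such as /profiles/:profile/:source/nfcapd is that each :name placeholder captures a single path component. The sketch below implements that reading with named regular-expression groups. It is an assumption made for illustration, not the library's actual implementation, and both function names are hypothetical.

```python
import re


def compile_pathregex(pattern):
    """Turn ':name' placeholders into named groups matching one path component."""
    # With one capture group, re.split alternates literal text and names.
    parts = re.split(r":([A-Za-z_][A-Za-z0-9_]*)", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)            # literal text between placeholders
        else:
            regex += "(?P<%s>[^/]+)" % part     # assumed: one component per name
    return re.compile(regex)


def extract_metadata(pattern, path):
    """Return the captured fields for `path`, or an empty dict if nothing matches."""
    match = compile_pathregex(pattern).search(path)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_metadata(
        "/profiles/:profile/:source/nfcapd",
        "/data/nfsen/profiles/live/podium/nfcapd.201205010000",
    ))
```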

4.4 Searching with pynfdump

pynfdump [1] is another module I have written. You can easily use netflow-indexer with pynfdump:

>>> from netflowindexer import Searcher
>>> import pynfdump
>>> d = pynfdump.Dumper()
>>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
>>> records = s.search_all(["8.8.8.8"], dump=True, filter='dst port 53', mode='pipe')
>>> for rec in d.parse_search(records):
...     print rec['dstip'], rec['bytes']
8.8.8.8 138
8.8.8.8 77
8.8.8.8 77
8.8.8.8 85
8.8.8.8 86
8.8.8.8 85
8.8.8.8 86
8.8.8.8 86
8.8.8.8 55
8.8.8.8 55
8.8.8.8 68

The above example used netflowindexer to find all flows to 8.8.8.8, then used nfdump to filter them by 'dst port 53', and finally handed the result to pynfdump for parsing.

[1] http://packages.python.org/pynfdump/

5 Indices and tables

• genindex

• search

Index

L

list_databases() (netflowindexer.main.Searcher method)

S

search() (netflowindexer.main.Searcher method)
search_all() (netflowindexer.main.Searcher method)
Searcher (class in netflowindexer.main)
