netflow-indexer Documentation
Release 0.1.28
Justin Azoff
CONTENTS
1 Installation
1.1 Install prerequisites
1.2 Install netflow-indexer
2 Configuration
2.1 Example Configuration files
2.2 Cron
2.3 Daily compaction
3 Example Session
3.1 Indexing data
3.2 Searching the index
3.3 Specifying output columns
3.4 Dumping data
4 API
4.1 Searching with the API
4.2 Example
4.3 File metadata
4.4 Searching with pynfdump
5 Indices and tables
Index
Netflow-indexer is a program that uses Xapian to index the flat-file databases used by nfdump or flow-tools.
CHAPTER
ONE
INSTALLATION
1.1 Install prerequisites
netflow-indexer uses the Python Xapian bindings. The IPy module is used for subnet calculations to support CIDR searching. On Debian you can install all the dependencies using:
# apt-get install python-pip python-xapian xapian-tools python-ipy
1.2 Install netflow-indexer
I recommend installing netflow-indexer into a virtual environment:
# pip install -s -E /usr/local/python_env/ netflowindexer-0.1.9.tar.gz
Then modify your path or source the activation script:
# PATH=/usr/local/python_env/bin/:$PATH
CHAPTER
TWO
CONFIGURATION
Netflow-indexer uses a small configuration file that sets up the type of indexer to use and the location of the files on disk. It has the following settings:
• indexer - the type of indexer to use: nfdump, nfdump_full, flowtools, or flowtools_full
• dbpath - the path to save the indexes to
• fileglob - the shell glob that will expand to the flow data files for the current hour
• allfileglob - the shell glob that will expand to all flow data files
• pathregex - a regular expression or simple string used to extract metadata from flow file paths.
2.1 Example Configuration files
2.1.1 nfdump using full indexing (recommended)
[nfi]
indexer = nfdump_full
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
pathregex = /profiles/:profile/:source/nfcapd
2.1.2 nfdump
[nfi]
indexer = nfdump
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
pathregex = /profiles/:profile/:source/nfcapd
2.1.3 flow-tools
[nfi]
indexer = flowtools
dbpath = /usr/local/var/db/flows/nfi
flowpath = /usr/local/var/db/flows/packeteer
fileglob = %(flowpath)s/%(year)s/%(year)s-%(month)s/%(year)s-%(month)s-%(day)s/ft-v05.%(year)s-%(month)s-%(day)s.%(hour)s*
allfileglob = %(flowpath)s/*/*/*/ft-v05.*
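The %(name)s placeholders in these files follow the interpolation syntax of Python's standard configparser module. The following is a minimal sketch, not netflow-indexer's own code, of how such a file could be expanded for a given date. Passing year, month, and day through the vars argument is an assumption made for illustration, and expand_fileglob is a hypothetical helper name.

```python
import configparser

# Copy of the nfdump_full example configuration from this section.
EXAMPLE_INI = """
[nfi]
indexer = nfdump_full
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
pathregex = /profiles/:profile/:source/nfcapd
"""


def expand_fileglob(ini_text, year, month, day):
    """Return the fileglob setting with all %(name)s placeholders filled in.

    Hypothetical helper: flowpath comes from the [nfi] section itself,
    while the date parts are supplied by the caller (an assumption).
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    date_parts = {"year": year, "month": month, "day": day}
    return parser.get("nfi", "fileglob", vars=date_parts)


print(expand_fileglob(EXAMPLE_INI, "2012", "05", "01"))
```

The expanded value is a shell glob, so it could then be handed to glob.glob() to list the matching flow files.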
2.2 Cron
Netflow-indexer should be run from cron 5 minutes after every hour when using the nfdump indexer and every 5 minutes when using the nfdump_full indexer:
MAILTO=root
PATH=/usr/local/python_env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
45 0 * * * cd /data/nfdump_xap/ && ./daily_compact > /dev/null
*/5 * * * * sleep 30;netflow-index-update /data/nfdump_xap/nfdump.ini
55 0 * * * netflow-index-cleanup /data/nfdump_xap/nfdump.ini -d
2.3 Daily compaction
Xapian allows you to compact an index for read-only use. Compaction reduces disk usage and improves search speed. Daily compaction is a work in progress.
2.3.1 examples/daily_compact.sh
#!/bin/sh
DAY=`date +"%Y%m%d" -d "60 minutes ago"`
./xap_compact ${DAY}.db
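The "60 minutes ago" offset matters because the cron entry runs the compaction at 00:45: subtracting an hour selects the database of the day that has just ended. A minimal Python sketch of the same date arithmetic follows; database_for_compaction is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta


def database_for_compaction(now):
    """Return the database name for the day that was current 60 minutes before `now`."""
    return (now - timedelta(minutes=60)).strftime("%Y%m%d") + ".db"


# A run at 00:45 on 2 May 2012 selects the database for 1 May 2012.
print(database_for_compaction(datetime(2012, 5, 2, 0, 45)))
```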
2.3.2 examples/xap_compact.sh
#!/bin/sh
orig="$1"
tmp=tmp_$$.db
tmp2=tmp2_$$.db
if [ -e $orig/.compacted ] ; then
    exit 0
fi
xapian-compact -F $orig $tmp && mv $orig $tmp2 && mv $tmp $orig && rm -rf $tmp2 && touch $orig/.compacted
CHAPTER
THREE
EXAMPLE SESSION
3.1 Indexing data
Tell netflow-indexer to index the current netflow files. For this example I deleted today's index so it can be re-created:
netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
read /data/nfsen/profiles/live/podium/nfcapd.201205010000 in 2.4 seconds. 64501 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010005 in 2.5 seconds. 70830 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010010 in 3.8 seconds. 120925 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010015 in 2.7 seconds. 65676 ips.
...
read /data/nfsen/profiles/live/podium/nfcapd.201205010240 in 1.3 seconds. 54040 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010245 in 1.3 seconds. 52391 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010250 in 1.2 seconds. 49993 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205010255 in 1.2 seconds. 52161 ips.
Flush took 7.4 seconds.
...
read /data/nfsen/profiles/live/podium/nfcapd.201205011615 in 7.4 seconds. 159399 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011620 in 7.1 seconds. 155225 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011625 in 5.7 seconds. 110510 ips.
Flush took 28.9 seconds.
Running the indexer when more data is available does an incremental update:
netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
read /data/nfsen/profiles/live/podium/nfcapd.201205011630 in 3.7 seconds. 110257 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011635 in 3.7 seconds. 116742 ips.
read /data/nfsen/profiles/live/podium/nfcapd.201205011640 in 4.2 seconds. 107927 ips.
Flush took 7.0 seconds.
netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
netflow@nf:~$
When building an index for the first time you should use the --full-index or -f option to index all the data. By default netflow-indexer only tries indexing files that match fileglob:
netflow-index-update /data/nfdump_xap/nfdump.ini --full-index
3.2 Searching the index
Search the index for 2011-04-19:
# 59.124.163.60 is an address that just scanned us
remote@nf:~$ time netflow-index-search /data/nfdump_xap/nfdump.ini /data/nfdump_xap/20110419.db 59.124.163.60
2011-04-19 05:35:00
2011-04-19 05:40:00
2011-04-19 05:45:00
2011-04-19 05:50:00
2011-04-19 05:55:00
2011-04-19 06:00:00
2011-04-19 06:05:00
2011-04-19 07:40:00
2011-04-19 07:45:00
2011-04-19 07:50:00
2011-04-19 07:55:00
real 0m0.072s
This output shows that the address was present in the index in eleven 5-minute chunks. Searching the 30-day index takes only slightly longer and returns the same results:
remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60
Searching for an IP that does not exist in the index is very fast:
remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 9.254.9.254
real 0m0.097s
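Because the search output is one timeslot per line, it is easy to post-process. As an illustrative sketch (not part of netflow-indexer), the following groups consecutive 5-minute timeslots into activity windows. The function name group_timeslots and the 5-minute step are assumptions based on the output shown earlier in this section.

```python
from datetime import datetime, timedelta


def group_timeslots(lines, step=timedelta(minutes=5)):
    """Merge consecutive timeslots into (first, last) windows.

    `lines` is an iterable of 'YYYY-MM-DD HH:MM:SS' strings, such as the
    output of netflow-index-search. Blank lines are ignored.
    """
    slots = sorted(
        datetime.strptime(line.strip(), "%Y-%m-%d %H:%M:%S")
        for line in lines
        if line.strip()
    )
    windows = []
    for slot in slots:
        # Extend the current window when the gap is at most one step.
        if windows and slot - windows[-1][1] <= step:
            windows[-1][1] = slot
        else:
            windows.append([slot, slot])
    return [tuple(window) for window in windows]


example_output = [
    "2011-04-19 05:35:00", "2011-04-19 05:40:00", "2011-04-19 05:45:00",
    "2011-04-19 05:50:00", "2011-04-19 05:55:00", "2011-04-19 06:00:00",
    "2011-04-19 06:05:00", "2011-04-19 07:40:00", "2011-04-19 07:45:00",
    "2011-04-19 07:50:00", "2011-04-19 07:55:00",
]
for first, last in group_timeslots(example_output):
    print(first, "to", last)
```

For the scan shown earlier, this yields two windows: 05:35 to 06:05 and 07:40 to 07:55.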
3.3 Specifying output columns
netflow-index-search and netflow-index-search-all support a -c option which selects what columns should be output. By default only time is output. The other built-in field is filename. Additional fields are made available by using the pathregex configuration option. Columns can be selected by using two methods:
-c time -c filename
or:
-c time,filename
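Both forms can be handled by one option parser. The following is a sketch of one way to accept them with Python's argparse module; it is not netflow-indexer's actual option handling, and parse_columns is a hypothetical helper.

```python
import argparse


def parse_columns(argv):
    """Return the list of requested columns from repeated or comma-separated -c options."""
    parser = argparse.ArgumentParser()
    # action="append" collects every -c occurrence into a list.
    parser.add_argument("-c", dest="columns", action="append")
    args = parser.parse_args(argv)
    if not args.columns:
        return ["time"]  # documented default: only time is output
    columns = []
    for value in args.columns:
        # Each occurrence may itself hold several comma-separated names.
        columns.extend(name for name in value.split(",") if name)
    return columns


print(parse_columns(["-c", "time", "-c", "filename"]))
print(parse_columns(["-c", "time,filename"]))
```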
3.4 Dumping data
netflow-index-search and netflow-index-search-all support a -d option which automatically runs the appropriate netflow tool for you:
remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d|head
2011-04-19 05:38:36.468 1.696 TCP 59.124.163.60:39432 -> 123.123.2.245:22 4 192 1
2011-04-19 05:38:36.468 1.776 TCP 59.124.163.60:50920 -> 123.123.2.246:22 4 192 1
2011-04-19 05:38:36.468 1.428 TCP 123.123.2.245:22 -> 59.124.163.60:39432 4 237 1
2011-04-19 05:38:36.472 0.828 TCP 59.124.163.60:36167 -> 123.123.2.247:22 3 152 1
...
You can also use the -f option to pass an additional filter:
remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d -f 'not port 22'
CHAPTER
FOUR
API
4.1 Searching with the API
class netflowindexer.main.Searcher(ini_file)
    Create a new searcher instance. Call with the path to the ini file.

    list_databases()
        Return a list of database files in the 'dbpath' directory.

    search(database, ips, dump=False, filter=None, mode=None)
        Search a specific database file.
        Parameters
        • database – The full path to a database file.
        • ips – a list of IP addresses to search for.
        • dump – if True dump the full netflow records, otherwise just the seen timeslots
        • filter – optional additional netflow search filter to be used when dump=True
        • mode – set to 'pipe' to have nfdump list pipe-delimited records

    search_all(ips, dump=False, filter=None, mode=None)
        Search all database files. Takes the same parameters as search().
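As a rough sketch of how list_databases() and search() fit together, the helper below searches each database separately and collects the timeslots per database. It is written against any object offering those two methods. FakeSearcher is a hypothetical stand-in so that the sketch runs without an index, and timeslots_by_database is not part of netflow-indexer.

```python
def timeslots_by_database(searcher, ips):
    """Map each database path to the timeslots in which any of `ips` was seen."""
    results = {}
    for database in searcher.list_databases():
        # str(record) gives the timeslot of a search result.
        results[database] = [str(record) for record in searcher.search(database, ips)]
    return results


class FakeSearcher(object):
    """Stand-in with the documented method signatures, used only for illustration."""

    def list_databases(self):
        return ["/spare/tmp/netflow/20110408.db"]

    def search(self, database, ips, dump=False, filter=None, mode=None):
        return ["2011-04-08 15:00:00", "2011-04-08 15:05:00"]


print(timeslots_by_database(FakeSearcher(), ["8.8.8.8"]))
```

With a real index, the same helper would presumably be called with a Searcher instance created from the ini file instead of FakeSearcher().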
4.2 Example
The Searcher class can be used to search for records:
>>> from netflowindexer import Searcher
>>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
>>> print s.list_databases()
['/spare/tmp/netflow/20110408.db']
>>> for record in s.search_all(['8.8.8.8']):
...     print record
2011-04-08 15:00:00
2011-04-08 15:05:00
2011-04-08 15:10:00
2011-04-08 15:15:00
2011-04-08 15:20:00
...
>>> for record in s.search_all(['8.8.8.8'], dump=True):
...     print record
2011-04-08 14:59:32.696 0.000 UDP 111.222.121.54:53241 -> 8.8.8.8:53 2 138 1
2011-04-08 14:59:32.708 0.028 UDP 8.8.8.8:53 -> 111.222.121.54:53241 2 266 1
2011-04-08 14:59:38.416 0.000 UDP 111.222.121.127:51528 -> 8.8.8.8:53 1 77 1
2011-04-08 14:59:38.396 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:51528 1 165 1
2011-04-08 14:59:38.400 0.000 UDP 111.222.121.127:60043 -> 8.8.8.8:53 1 77 1
2011-04-08 14:59:38.368 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:60043 1 151 1
2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:60128 -> 8.8.8.8:53 1 85 1
2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:63357 -> 8.8.8.8:53 1 86 1
4.3 File metadata
Each search result is an object. Calling str() on it returns only the time of the matching flow records, but other fields are available:
>>> for record in s.search_all(['8.8.8.8']):
...     print repr(record)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081500, time=2011-04-08 15:00:00, profile=tmp)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081505, time=2011-04-08 15:05:00, profile=tmp)
SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081510, time=2011-04-08 15:10:00, profile=tmp)
>>> for record in s.search_all(['8.8.8.8']):
...     print record.time, record.profile
2011-04-08 15:00:00 tmp
2011-04-08 15:05:00 tmp
2011-04-08 15:10:00 tmp
These field extractions are done via the pathregex configuration option.
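To illustrate the idea, here is a sketch of how a pattern such as /profiles/:profile/:source/nfcapd could be turned into a regular expression with named groups. How netflow-indexer actually translates pathregex is not documented here, so the :name to named-group conversion below is an assumption, and both function names are hypothetical.

```python
import re


def compile_path_pattern(pattern):
    """Turn ':name' placeholders into named groups that match one path component.

    Assumption: everything outside the placeholders is left as regex text.
    """
    regex = re.sub(r":([A-Za-z_][A-Za-z0-9_]*)", r"(?P<\1>[^/]+)", pattern)
    return re.compile(regex)


def extract_metadata(pattern, path):
    """Return a dict of the fields found in `path`, or an empty dict if nothing matches."""
    match = compile_path_pattern(pattern).search(path)
    return match.groupdict() if match else {}


print(extract_metadata(
    "/profiles/:profile/:source/nfcapd",
    "/data/nfsen/profiles/live/podium/nfcapd.201205010000",
))
```

Under this assumption, each placeholder name becomes a field on the search result, alongside the built-in time and filename fields.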
4.4 Searching with pynfdump
pynfdump (http://packages.python.org/pynfdump/) is another module I have written. You can easily use netflow-indexer with pynfdump:
>>> from netflowindexer import Searcher
>>> import pynfdump
>>> d=pynfdump.Dumper()
>>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
>>> records = s.search_all(["8.8.8.8"], dump=True, filter='dst port 53', mode='pipe')
>>> for rec in d.parse_search(records):
...     print rec['dstip'], rec['bytes']
8.8.8.8 138
8.8.8.8 77
8.8.8.8 77
8.8.8.8 85
8.8.8.8 86
8.8.8.8 85
8.8.8.8 86
8.8.8.8 86
8.8.8.8 55
8.8.8.8 55
8.8.8.8 68
The above example used netflowindexer to find all flows involving 8.8.8.8, then used nfdump to filter them by 'dst port 53', and finally handed them off to pynfdump for parsing.
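Once the records are parsed, they can be summarised with ordinary Python. The sketch below totals bytes per destination address. It assumes only that each record can be indexed by 'dstip' and 'bytes' as in the session above, and it uses plain dictionaries as sample data so that it runs without pynfdump. bytes_per_destination is a hypothetical helper.

```python
from collections import defaultdict


def bytes_per_destination(records):
    """Sum the 'bytes' field of each record, keyed by its 'dstip' field."""
    totals = defaultdict(int)
    for rec in records:
        totals[str(rec["dstip"])] += int(rec["bytes"])
    return dict(totals)


# Sample data shaped like the first few records printed above.
sample_records = [
    {"dstip": "8.8.8.8", "bytes": 138},
    {"dstip": "8.8.8.8", "bytes": 77},
    {"dstip": "8.8.8.8", "bytes": 77},
]
print(bytes_per_destination(sample_records))
```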
CHAPTER
FIVE
INDICES AND TABLES
• genindex
• search
INDEX
L
list_databases() (netflowindexer.main.Searcher method)
S
search() (netflowindexer.main.Searcher method)
search_all() (netflowindexer.main.Searcher method)
Searcher (class in netflowindexer.main)