• No results found

SQRRL PRODUCT TRAINING

N/A
N/A
Protected

Academic year: 2021

Share "SQRRL PRODUCT TRAINING"

Copied!
93
0
0

Loading.... (view fulltext now)

Full text

(1)

Target. Hunt. Disrupt.

SQRRL

PRODUCT TRAINING

(2)

Querying sources

Sqrrl Query Language (SqrrlQL) is a query language designed specifically to retrieve Sqrrl data

SELECT queries are used to retrieve rows of data from one or more Sqrrl sources, models or from instances of a single Sqrrl entity

Sqrrl also includes a SqrrlQL API for using SqrrlQL queries in custom development

(3)

SELECT query

Allows you to retrieve rows of data from sources or models

SQL-like language with extensions to handle hierarchical data and Lucene-style searches

You can issue SELECT queries from the

• Sqrrl Enterprise web application

• Sqrrl shell using the select command • SqrrlClient API

SELECT query results are always in the form of a table unless you explicitly request the results in JSON format

(4)

Select overview

SELECT selectExpressionList

• selectExpressionList Identifies the data that you want to return

• Can request values from a single column or contain more complex expressions including aggregated expressions

FROM dataList

• Identifies the location to return the data from (1) list of sources (2) list of models (3) single model entity

• Is optional if you are already in the context of a specific source, model or entity SELECTselectExpressionList FROMdataList [ WHERE whereExpression ] [ GROUP BY

groupingExpressionList ] [ ORDER BY orderExpression ] [ LIMIT limitExpression ] [OFFSET offsetExpression]

(5)

Select overview

WHERE whereExpression (optional) • Filters the returned data

GROUP BY groupingExpressionList (optional)

• If the SELECT clause includes aggregated expressions then the GROUP BY clause determines how to group the returned results

SELECTselectExpressionList FROM dataList [ WHERE whereExpression ] [ GROUP BY groupingExpressionList ] [ ORDER BY orderExpression ] [ LIMIT limitExpression ] [OFFSET offsetExpression]

(6)

Select overview

ORDER BY orderExpression (optional) • Controls the order of the returned results

LIMIT limitExpression (optional)

• Limits the total number of results to return from the query

OFFSET offsetExpression (optional)

• Indicates where to start the matching results by skipping the specified number of rows from the beginning of the original results

• Can only be used when there is an ORDER BY clause

SELECT selectExpressionList FROMdataList [ WHERE whereExpression ] [ GROUP BY groupingExpressionList ] [ ORDER BY orderExpression ] [ LIMITlimitExpression ] [OFFSET offsetExpression]

(7)

Select clause

Identifies the data that you want to return and consists of a list of query expressions

Each expression in the list becomes a column in the SELECT query results

SELECT clause Expressions support requests to • Return basic source field or entity feature values • Specify visibility labels to include

• Aggregate values using grouping values provided in the GROUP BY clause • Use mathematical operations to adjust or combine values

(8)

Select clause

SqrrlQL keywords can be all uppercase like 'SELECT' or all lowercase like 'select' but not mixed-case

# returns everything from the source CounterOps SELECT json() FROM CounterOps

select json() from CounterOps # not allowed

X Select json() from CounterOps

FROM clause is optional if you first execute dataset CounterOps in the shell prior to running the query

SELECT query results are always in the form of a table unless you request the results in JSON format

(9)

SELECT … LIMIT

Determines the total number of SELECT query results to return

# only fetch 5 results

SELECT json() FROM CounterOps LIMIT 5 SELECT JSON() FROM CounterOps LIMIT 5 SELECT json() FROM Sqrrl_Netflow LIMIT 5

(10)

Selecting Specific Fields

# select only specific fields

SELECT hostname FROM CounterOps LIMIT 5

SELECT hostname, ipAddress FROM CounterOps LIMIT 5

SELECT path('hostname'), path(ipAddress) FROM CounterOps LIMIT 5Any field reference in a query expression that is not present in a

matching record is interpreted as a null value

# 'foo' is not a field in hostname so it will be null for each record

(11)

Selecting Array Elements

SELECT hostname, path('secondaryIds[0]') FROM CounterOps LIMIT 5 SELECT hostname, path('secondaryIds[1]') FROM CounterOps LIMIT 5 SELECT json('secondaryIds') FROM CounterOps LIMIT 5

(12)

Selecting Hierarchical Data

Hierarchical fields are delimited with forward slashes and must be wrapped in the path() function to avoid SQL parsing errors

(13)

Shell Options for SELECT Queries

-np,--no-pagination

SELECT hostname, ipAddress FROM CounterOps–np--batch-size (default is 1000)

SELECT hostname, ipAddress FROM CounterOps-np --batch-size 10

--format=table (default)

SELECT hostname, ipAddress FROM CounterOps LIMIT 10 --format=table

--format=simple

SELECT hostname, ipAddress FROM CounterOps LIMIT 10 --format=simple

--format=csv [--delimiter <delimiter>] (default is tab-delimited)

SELECT hostname, ipAddress FROM CounterOps LIMIT 10 --format=csv

SELECT hostname, ipAddress FROM CounterOps LIMIT 10 --format=csv --delimiter ,

--format=null

(14)

Running Queries in the Background

SELECT and NEXT commands can take an optional trailing & which indicates to run the search in the background

To send a running search to the background press the z key

To monitor the progress of a background query use the

WATCHPROGRESS, CHECKPROGRESS, and PEEK commands To rejoin the query use the NEXT command

SELECT and NEXT commands also take an optional -w or --watch-progress argument

(15)

Running Queries in the Background

Once the batch is returned to the shell a message is printed on the console

• You can use NEXT without the “&” to print the background results to the screen

• Typing NEXT without the “&” rejoins the search and blocks until the current batch is returned

While waiting for search results

• Type "z” to return to the prompt while the query continues in the background • Type "c” to cancel the query

(16)

Select Using as

SELECT query supports using AS to alias a SELECT clause expression

• After you define an alias you can use that alias in subsequent expressions in the query. (You cannot use an alias in a SELECT query WHERE clause)

Also used as the column heading in the query results

Aliases are often used for grouped or ordered queries where you may need to refer to a SELECT expression in the GROUP BY or ORDER BY clause

(17)

Aliases

AS keyword is optional

SELECT hostname AS Name FROM CounterOps LIMIT 5 SELECT hostname Name FROM CounterOps LIMIT 5

(18)

© 2019, Amazon Web Services, Inc. or its Affiliates.

FROM clause

Identifies the location of the SELECT query data

Comma-separated list of source name (FROM source1,source2,source3)

• Sqrrl looks for matching data in the source records

Comma-separated list of model names (FROM model1,model2,model3)

• Sqrrl looks for matching data in the entity instances for those models

A single entity in the format modelName.entityName (FROM model1.entity)

(19)

Where clause

Allows you to filter the returned query results based on expression values

Supports the full SQL query language except for aggregate expressions

• You cannot use an alias from the SELECT clause in the WHERE clause

Supports regular expressions to check for specific content

When you specify sources or models in the FROM clause but don't provide a WHERE clause the results list includes rows from *ALL* of the sources and *ALL* all of the entities from the model

(20)

Comparison Operators (=, <, >, <=, >=, <>)

All comparison operators with the exception of <> can take advantage of indexes as follows

• Numeric, DATETIME, and INET_ADDR queries can exploit VALUE_BY_FIELD indexes

• STRING queries perform best with the VALUE_BY_FIELD index but fall back to the TERMS_BY_FIELD or TERMS index if either is available

• Matching aggregate field values can exploit AGGREGATE indexes

• The <> operator can never utilize indexes and requires a full table scan which can be SLOW !

(21)

Comparison Operators: =, <, >, <=, >=, <>

SELECT hostname FROM CounterOps WHERE hostname = 'cam' SELECT hostname FROM CounterOps WHERE hostname < 'cam' SELECT hostname FROM CounterOps WHERE hostname >= 'cam' SELECT hostname FROM CounterOps WHERE hostname <> 'cam'

SELECT SourceAddress Src, DestinationAddress Dst, Protoco FROM Netflow WHERE Protocol = 'ICMP'

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol <> 'ICMP'

(22)

Where Like clause

Use LIKE operator to compare values to a specified pattern

• If you just provide a value without any pattern wildcards the string must match exactly

Pattern Description

% Matches zero or more characters.

For example, "b%ad" would match "bad", "brad", and "broad”. _ Matches any single character.

For example, "b_ad" would match "brad" or "bead". It would not match "bad" or "broad”. […] Matches any character in the brackets. Also supports inclusive ranges of values ( [a-z]).

For example, "b[a-f]ad" would match "bead", but not "brad”

[^…] Matches any character or range of characters other than the ones specified. For example, "b[^a-f]d would match "bird", but not "bead”

(23)

WHERE … LIKE

SELECT hostname FROM CounterOps WHERE hostname LIKE 'cam' SELECT hostname FROM CounterOps WHERE hostname LIKE 'cam_' SELECT hostname FROM CounterOps WHERE hostname LIKE 'cam[23]_' SELECT hostname FROM CounterOps WHERE hostname LIKE 'cam[^23]_'

# range search - lower bound is inclusive, upper bound is exclusive

SELECT hostname FROM CounterOps WHERE hostname LIKE '[b-c]%'

# these queries always require a table scan

SELECT hostname FROM CounterOpsWHERE hostname LIKE '%cam'

SELECT hostname FROM CounterOpsWHERE hostname LIKE 'cam%'

(24)

WHERE … IN

Use IN operator to check whether a value matches one of a list of expression values

<value> IN (<match1>, <match2>, ...)

SELECT hostname FROM CounterOps WHERE hostname IN ('athena','cam') SELECT SourceAddress Src, DestinationAddress Dst, Protocol,

(25)

WHERE … BETWEEN

BETWEEN operator checks whether a value falls within an inclusive range

<value> BETWEEN <lowerBound> AND <upperBound>

Range search: lower bound is inclusive, upper bound is exclusive

String queries can exploit VALUE_BY_FIELD indexes.

Numeric queries can exploit VALUE_BY_FIELD or AGGREGATE indexes

SELECT hostname FROM CounterOps WHERE hostname BETWEEN 'b' AND 'd' SELECT hostname, path('secondaryIds[0]') AS id0,

path('secondaryIds[1]') AS id1 FROM CounterOps WHERE path('secondaryIds[1]') BETWEEN 'b' AND 'd'

(26)

WHERE … NOT

SELECT hostname FROM CounterOps WHERE hostname NOT LIKE 'a%' SELECT hostname FROM CounterOps WHERE hostname NOT IN

('athena','cam')

SELECT hostname FROM CounterOps WHERE hostname NOT BETWEEN 'b' AND

'd'

Use the NOT operator to search for records that do not match a given expression

NOT operator cannot use indexes as a result some queries may be very SLOW !

(27)

WHERE … IS [NOT] NULL

# Note that the training sources do not contain any null fields

SELECT hostname FROM CounterOps WHERE hostname IS NULL

SELECT hostname FROM CounterOps WHERE hostname IS NOT NULL

Use the NULL keyword to find NULL values for an expression

Can NEVER utilize indexes requiring a full table scan which can be very SLOW !

(28)

WHERE … AND | OR

WHERE clause can include multiple expressions using AND or OR • Expressions combined using AND must match all WHERE expressions • Expressions combined using OR must match at least one of WHERE

expressions

(29)

WHERE … AND | OR

SELECT SourceAddress Src, DestinationAddress Dst, Protocol, DestinationPort DstPort FROM Netflow WHERE Protocol='TCP' AND DestinationPort=88

SELECT SourceAddress Src, DestinationAddress Dst, Protocol, DestinationPort DstPort FROM Netflow WHERE Protocol IN('TCP','UDP') AND DestinationPort=88 SELECT InBytes, InPackets, SourceAddress Src, DestinationAddress Dst,

Protocol, DestinationPort DstPort FROM Netflow WHERE InBytes>100000000 AND InPackets>100000

SELECT InBytes, InPackets, SourceAddress Src, DestinationAddress Dst,

Protocol, DestinationPort DstPort FROM Netflow WHERE InBytes > 100000000 OR InPackets>100000

SELECT InBytes, InPackets, SourceAddress Src, DestinationAddress Dst,

Protocol, DestinationPort DstPort FROM Netflow WHERE (InBytes>100000000 OR InPackets>100000) AND DestinationPort=139

(30)

lucene

The lucene() function in WHERE clauses supports the full Lucene 4.0 syntax except for fuzzy searches and the NOT operator

Lucene() function DOES SUPPORT

• Fielded or unfielded queries Leading or trailing wildcards

• AND, OR String and numeric range searches

• Phrase searches Full regular expression searches

Lucene() function DOES NOT SUPPORT

• Negations ('NOT','-') • Fuzzy searches ('~') • Proximity searches ('“”~')

(31)

WHERE Lucene()

Use cases Lucene queries are well-suited for

• Tokenization of individual field data into more granular Terms that can be indexed & queried including

– email headers, senders, recipients, subjects, & message bodies – structured or unstructured textual content in general

Full-Text searches that can match either terms or phrases anywhere in a record regardless of which field contains them

Fielded searches that can match either terms or phrases anywhere in a particular field

Regular Expression-based queries

In all other cases prefer '=' or 'LIKE' instead of Lucene queries for better performance!

(32)

WHERE Lucene()

A Lucene search pattern consists of terms and operators

There are two types of Lucene terms

• A single term is a single word such as 'test' or 'hello'.

• A phrase is a group of words surrounded by double quotes such as "hello dolly”

(33)

Lucene Tokenization & Term Indexing

The Lucene Standard Tokenizer tokenizes field values to more granular terms as follows

• Words are split at punctuation characters removing punctuation

A dot that is not followed by whitespace is considered part of a token

• Email addresses and internet host & domain names are recognized as as single tokens

A TERMS_BY_FIELD index indexes individual terms per field

A TERMS index indexes individual terms across multiple document fields

(34)

Unfielded Lucene Queries

• An unfielded Lucene query string consists of a search pattern to match against without restricting the search to any particular document field(s), thereby providing Full-Text Search capability

lucene('pattern') - True if any field(s) in a record match the search pattern otherwise false

(35)

Fielded Lucene Queries

• A fielded Lucene query string additionally specifies one or more document fields which restricts the search to those fields

lucene('fieldname:pattern') - True if any records contain a field whose

name matches the specified fieldname (which may contain wildcards) and whose data matches the search pattern otherwise false

(36)

Lucene Phrase Searches

Specifies a literal quoted phrase as the search pattern

• Search evaluates to true if the exact phrase is found anywhere in a document field irrespective of term tokenization boundaries

• Can exploit TERMS_BY_FIELD or TERMS indexes

Phrase search patterns are always treated as literal values

• Unlike term searches they cannot take advantage of search pattern modifiers such as ranges, wildcards, and regular expressions

(37)

Lucene Phrase Searches

# search for the phrase "cam" in any field

SELECT hostname FROM CounterOps WHERE lucene('"cam"') # search for the phrase "192.168.1.108" in any field

SELECT SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE lucene('"192.168.1.108"')

# search for the phrase "cam" in the CounterOps.hostname field SELECT hostname FROM CounterOps WHERE lucene('hostname:"cam"')

# search for the phrase "192.168.1.108" in the Netflow.SourceAddress field SELECT SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE

(38)

Lucene Term Searches

Lucene term search specifies an (unquoted) Lucene single term as the search pattern which must match a tokenized term in a document field for the search to evaluate to true

Term searches can exploit TERMS_BY_FIELD or TERMS indexes

Lucene term searches can also take advantage of search pattern modifiers such as ranges, wildcards, and regular

(39)

Lucene Term Searches

# search for the exact term 'cam' in any field

SELECT hostname FROM CounterOps WHERE lucene('cam')

# search for the exact term '192.168.1.108' in any field

SELECT SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE

lucene('192.168.1.108')

# search for the exact term 'cam' in the CounterOps.hostname field

SELECT hostname FROM CounterOps WHERE lucene('hostname:cam')

# search for the exact term '192.168.1.108' in the Netflow.SourceAddress field

SELECT SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE

(40)

Lucene Term Searches

# matches any hostname containing the hyphen-delimited term 'sqrrl' # does not match single term 'sqrrl_wk301'

SELECT hostname FROM CounterOps WHERE lucene('hostname:sqrrl')

# matches any hostname containing the hyphen-delimited term 'wk301' # does not match single term 'sqrrl_wk301'

SELECT hostname FROM CounterOps WHERE lucene('hostname:wk301')

# these queries return no rows because the search pattern matches no term in any hostname field

SELECT hostname FROM CounterOps WHERE lucene('hostname:qrrl-wk') SELECT hostname FROM CounterOps WHERE lucene('hostname:sqrrl-wk301')

# matches single term 'sqrrl_wk301'

(41)

Fields in the Lucene Query String

Lucene Field Selectors & Visibility Labels

• Lucene queries do not require visibility labels with field names

• A specified field name matches all visible fields with the same base name in accordance with the user's visibility roles

Hierarchical fields are specified in a flattened format of elements separated by forward slashes

(42)

ARRAYS in Lucene Field Selectors

# exact term search on secondaryIds[0]

SELECT json() FROM CounterOps WHERE lucene('secondaryIds[0]:cam')

# exact term search on secondaryIds[1]

SELECT json() FROM CounterOps WHERE lucene('secondaryIds[1]:"cam.sqrrl.com"')

# use path() to select a particular array element

SELECT path('secondaryIds[1]') FROM CounterOps WHERE lucene('secondaryIds[0]:cam')

# use json() to return the entire array

SELECT json('secondaryIds') FROM CounterOps WHERE lucene('secondaryIds[0]:cam')

(43)

ARRAYS in Lucene Field Selectors

All array members are searched if either • the last element of a path is an array • the array index is '[]' or '[*]'

# the following equivalent queries all search all members of the array

SELECT json() FROM CounterOps WHERE lucene('secondaryIds:cam') SELECT json() FROM CounterOps WHERE lucene('secondaryIds[]:cam') SELECT json() FROM CounterOps WHERE lucene('secondaryIds[*]:cam')

(44)

Wildcards In Lucene Field Selectors

The following wildcards can be used in Lucene field selectors

• One asterisk * matches any path element or portion of a path element but * does not span multiple levels of a hierarchical path

• Two asterisks ** match any number of intervening levels in a hierarchical path

SELECT hostname FROM CounterOps WHERE lucene('host*:cam') SELECT hostname FROM CounterOps WHERE lucene('host*ame:cam') SELECT hostname FROM CounterOps WHERE lucene('*ostname:cam') SELECT hostname FROM CounterOps WHERE lucene('host**:cam') SELECT hostname FROM CounterOps WHERE lucene('host**ame:cam') SELECT hostname FROM CounterOps WHERE lucene('**ostname:cam')

(45)

Searching Multiple Sources

It can sometimes be useful to search for a term or phrase across multiple or even all Sqrrl sources

# queries all fields in multiple named sources:

SELECT dataset(), json() FROM CounterOps, Netflow WHERE lucene('192.168.1.108')

# you can optionally supply a field spec if the sources have similarly named

fields

SELECT dataset(), json() FROM CounterOps, Netflow WHERE lucene('*Address:192.168.1.108')

# no FROM clause - queries all fields in all visible datasets

# (including both sources & models)

(46)

Lucene Wildcards

Single term search patterns (but not phrase patterns) can contain wildcards

Trailing wildcard searches match against tokenized Terms in addition

to the entire field value

• They can exploit TERMS_BY_FIELD or TERMS indexes

# trailing wildcards search individual terms in addition to the entire field

SELECT hostname FROM CounterOps WHERE lucene('hostname:cam?')

# also returns 'sqrrl-cam002' & 'sqrrl-cam002-2', but not 'b1lobbycam'

SELECT hostname FROM CounterOps WHERE lucene('hostname:cam*')

# WHERE NOT with a lucene wildcard query

(47)

Lucene Wildcards

Non-trailing wildcard searches match against the entire field value not against tokenized terms

They require full table scans and therefore can be much slower than other search patterns

(48)

Lucene Wildcards

# non-trailing wildcards cannot exploit any indexes so require a full table scan) # non-trailing wildcards match against entire fields, not terms

SELECT hostname FROM CounterOps WHERE lucene('hostname:cam*1') SELECT hostname FROM CounterOps WHERE lucene('hostname:cam?1')

# these queries match both field values 'sqrrl_wk301' and 'sqrrl-wk301' SELECT hostname FROM CounterOps WHERE lucene('hostname:sqrrl?wk301') SELECT hostname FROM CounterOps WHERE lucene('hostname:sqrrl*wk301')

# leading wildcards are permitted

# like other non-trailing wildcards they match against entire fields, not terms SELECT hostname FROM CounterOps WHERE lucene('hostname:*qrrl-wk301')

(49)

Lucene String Range Searches

String range searches match against tokenized terms in addition to the entire field value

• Can exploit TERMS_BY_FIELD or TERMS indexes.

Range searches allow one to match documents whose field values are between the (inclusive) lower bound and the (exclusive) upper bound specified by the range search

Both of these queries return 11 more rows than their 'LIKE' counterparts do which are of the form /[a-zA-Z]*\-[b-d].*/:

SELECT hostname FROM CounterOps WHERE lucene('hostname:[b TO d]*') SELECT path('secondaryIds[1]') id1 FROM CounterOps WHERE

(50)

Lucene Regular Expressions

Match against the entire field value not against tokenized terms

Require full table scans and therefore can be much slower than other search patterns

Must be delimited with double slashes

SELECT hostname FROM CounterOps WHERE lucene('hostname://cam//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam//') SELECT hostname FROM CounterOps WHERE lucene('hostname://cam$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam.$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam\d+$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam.*1$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^[b-c]//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam[23].$//') SELECT hostname FROM CounterOps WHERE lucene('hostname://^cam[^23].$//')

(51)

© 2019, Amazon Web Services, Inc. or its Affiliates.

Lucene Compound Queries

Search pattern can combine single terms and phrases with the

Boolean operators AND / OR, or their respective counterparts && and ||

OR is the default conjunction operator

• Links two terms and finds a matching document if either of the terms exist in a document - equivalent to a set union

AND operator matches documents where both terms exist anywhere in the text of a single document - equivalent to a set intersection.

Supports using parentheses to construct compound queries based on sub-queries

SELECT SourceAddress Src, DestinationAddress Dst, Protocol, DestinationPort DstPort FROM Netflow WHERE lucene('(Protocol:TCP OR Protocol:UDP) AND

(52)

Escaping Lucine Special Characters

Lucene supports escaping the following special characters that are part of its query syntax

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

To escape these characters precede them with “\” (to search for (1+1):2 use the query)

(53)

ORDER BY – Sorting Query Results

Used to control the sort order for SELECT query results

Can include columns or expressions that are not part of the SELECT clause

(54)

ORDER BY a Single Expression

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src ASC

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src DESC

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY InBytes DESC

(55)

ORDER BY Multiple Expressions

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src, Dst

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Dst, Src

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src ASC, Dst DESC

(56)

ORDER BY Column Numbers

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY 1, 2

(57)

ORDER BY … LIMIT … OFFSET

You can only use OFFSET following both an ORDER BY clause and a LIMIT clause

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src, Dst LIMIT 5

SELECT SourceAddress Src, DestinationAddress Dst, Protocol FROM Netflow WHERE Protocol='ICMP' ORDER BY Src, Dst LIMIT 5 OFFSET 1

(58)

Aggregate Functions

COUNT, MAX, MIN, SUM, AVG

You can use aggregate functions only in the following clauses • SELECT

• GROUP BY

(59)

Counting Source Records

COUNT(fieldname) returns the total number of non-null entries in a given field or path across all matching documents

SELECT COUNT(hostname) FROM CounterOps

You can apply a WHERE clause as usual to restrict the result set

SELECT COUNT(hostname) FROM CounterOps WHERE hostname LIKE 'cam%'

Use IS NULL to count matching records with null values in a particular field

# NOTE THAT THE TRAINING SOURCES DO NOT CONTAIN ANY NULL FIELDS SELECT COUNT(hostname) FROM CounterOps WHERE hostname IS NULL

# These two queries are equivalent

SELECT COUNT(hostname) FROM CounterOps WHERE hostname IS NOT NULL SELECT COUNT(hostname) FROM CounterOps

(60)

Counting Source Records

COUNT(*) returns the total number of matching records and is more efficient than counting a particular field

SELECT COUNT(*) FROM CounterOps

SELECT COUNT(*) Ct FROM Netflow WHERE Protocol='TCP'

SELECT COUNT(*) Ct FROM Netflow WHERE Protocol='TCP' AND

(61)

Counting Unique Values

COUNT(DISTINCT fieldname) is the preferred way to estimate the number of unique values in a field

COUNT(DISTINCT) uses an advanced algorithm (HyperLogLog++) to estimate the number of unique values in any particular field, typically within 5% of the true count

COUNT(DISTINCT) performs much better and consumes far fewer resources than an exact unique field value count

The DISTINCT keyword is unique to the COUNT() function and cannot be used in any other context

(62)

Counting Unique Values

# estimate the number of unique values in a field - the fast but approximate method SELECT COUNT(DISTINCT hostname) FROM CounterOps

SELECT COUNT(DISTINCT hostname) FROM CounterOps WHERE hostname LIKE 'cam%' # not supported

X SELECT DISTINCT hostname FROM CounterOps

If you require an exact count of unique values in a field despite the

performance and resource penalties you can execute a query with the

following form

# count unique values in a field - the slow but exact method SELECT hostname, COUNT(*) FROM CounterOps GROUP BY hostname

(63)

Other Aggregate Functions

SELECT MIN(InBytes), MAX(InBytes), SUM(InBytes), AVG(InBytes) FROM Netflow

SELECT MIN(InBytes) AS MinBytes, MAX(InBytes) AS MaxBytes,

SUM(InBytes) AS SumBytes, round(AVG(InBytes)) AS AvgBytes from

Netflow

SELECT MIN(TimeDuration), MAX(TimeDuration), AVG(TimeDuration),

SUM(TimeDuration) FROM Netflow

SELECT round(MIN(TimeDuration)) MinDuration,

round(MAX(TimeDuration)) MaxDuration, round(AVG(TimeDuration)) AvgDuration, round(SUM(TimeDuration)) SumDuration FROM Netflow

(64)

GROUP BY - Aggregating SELECT Query Results

Allows you to group the returned results of an aggregated expression in the SELECT clause

If the SELECT clause includes an aggregated expression then you can use a GROUP BY clause to group the returned results

GROUP BY value must include all of the non-aggregated expressions from the SELECT clause

Aliases are commonly used in both grouped and ordered queries for

(65)

GROUP BY a Single Expression

GROUP BY with no aggregates

# returns the unique values in the Netflow.Protocol field w/out any aggregation

SELECT Protocol FROM Netflow GROUP BY Protocol

If you group by a single expression then the aggregated values are grouped by every value of the GROUP BY expression that is in the matching data

# outputs the COUNT(*) aggregate for each unique value in the Netflow.Protocol field

SELECT Protocol, COUNT(*) Ct FROM Netflow GROUP BY ProtocolYou cannot group by an aggregated expression

# not allowed

(66)

GROUP BY Multiple Expressions

If you group by multiple expressions the results are grouped by all combinations of values in the grouping expression

The grouping order reflects the order of the GROUP BY expressions

SELECT Protocol, DestinationPort Port, COUNT(*) Ct FROM Netflow GROUP BY

Protocol, Port LIMIT 15

SELECT Protocol, SourceAddress Src, DestinationPort Port, COUNT(*) Ct FROM Netflow GROUP BY Protocol, Src, Port

SELECT Protocol, SourceAddress Src, DestinationAddress Dst, DestinationPort Port, COUNT(*) Ct FROM Netflow GROUP BY Protocol, Src, Dst, Port

(67)

GROUP BY … WHERE

You can apply a WHERE clause as usual to restrict the result set

SELECT Protocol, COUNT(*) Ct FROM Netflow WHERE Protocol IN

('TCP','UDP') GROUP BY Protocol

SELECT Protocol, DestinationPort Dst, COUNT(*) Ct FROM Netflow WHERE

Protocol NOT IN ('TCP','UDP') GROUP BY Protocol, Dst

SELECT Protocol, SourceAddress Src, DestinationPort Port, COUNT(*) Ct FROM Netflow WHERE Protocol NOT IN ('TCP','UDP') GROUP BY

(68)

Order by

Used to control the sort order for SELECT query results

Can include columns that are not part of the SELECT clause

(69)

GROUP BY … ORDER BY

Order by Aggregated expressions

SELECT Protocol, COUNT(*) Ct FROM Netflow GROUP BY Protocol ORDER BY Ct DESC

SELECT DestinationAddress Dst, COUNT(*) Ct FROM Netflow GROUP BY Dst ORDER BY Ct DESC LIMIT 15

Order by any expressions from the GROUP BY clause

SELECT Protocol, COUNT(*) Ct FROM Netflow GROUP BY Protocol ORDER BY Protocol

SELECT Protocol, DestinationPort DstPort, COUNT(*) Ct FROM Netflow GROUP BY Protocol, DstPort ORDER BY Protocol, DstPort

SELECT DestinationPort DstPort, Protocol, COUNT(*) Ct FROM Netflow GROUP BY DstPort, Protocol ORDER BY DstPort, Protocol

Order by a combination of both

SELECT hostname, COUNT(*) AS Ct FROM CounterOps GROUP BY hostname ORDER BY Ct DESC, hostname LIMIT 15

SELECT Protocol, DestinationPort DstPort, COUNT(*) Ct FROM Netflow GROUP BY DstPort, Protocol ORDER

(70)

Arithmetic Operators in Queries

Sqrrl supports using the following basic comparison operators to compare expressions in a SELECT query WHERE clause

• =, <, >, <=, >=, <>

• FLOAT operators: + - * / • INTEGER operators: div mod • STRING concatenation: +\

SELECT TimeStart, InBytes+OutBytes TotalBytes, SourceAddress Src, DestinationAddress Dst, Protocol + ':' + DestinationPort AS

"Protocol:Port" FROM Netflow WHERE InBytes+OutBytes >= 10000000 ORDER BY TotalBytes DESC

(71)

Query Functions – Already Covered

json([fieldPath1, ...])

• Returns a JSON object representing the fields of this document

• Optionally, the results can only show the tree under the specified paths • dataset()

• Returns the dataset containing the matching document • Useful for grouping multi-dataset queries

• Not valid in WHERE clauses

(72)

Query Functions – Already Covered

path(fieldPath)

• Returns the value of a field at the specified path

• Must be used instead of a bare column reference for slash-delimited paths or for fields with reserved keyword names

round(x, [precision])

• Rounds the given number to precision decimal places • If precision is omitted defaults to 0

(73)

Query Functions – Record Identifiers

uuid()

• Returns the uuid of a document/record • event_time()

• Returns the event time of a source record

• Can only be used in queries against sources

SELECT uuid(), event_time(), TimeStart, TimeEnd FROM Netflow ORDER BY TimeStart LIMIT 10

(74)

Query Functions – Date & Time Functions

DATETIME Type Conversions

• A DATETIME value converts to a STRING as an ISO 8601 formatted timestamp rendered in the appropriate time zone

• A STRING value converts to a DATETIME as a prefix of an ISO 8601 formatted timestamp with all remaining fields set to zero. If the time zone is not provided explicitly then UTC is assumed

• The to_date() and format_date() functions offer more control of DATETIME format patterns and timezones

(75)

Query Functions – Date & Time Functions

DATETIME Type Conversions

• DATETIMEs are converted to and from numbers as milliseconds since the epoch 1970-01-01 00:00Z

Aggregate functions such as MIN(), MAX(), and AVG() therefore interpret DATETIME values as milliseconds since the epoch and the result can be explicitly cast back to a DATETIME

DATETIME Arithmetic

DATETIME+/- numeric - Adds or subtracts a number of days from the DATETIME

(76)

Query Functions – Date & Time Functions

to_date(expr, [format])

• Attempts to parse expr as a DATETIME according to the specified format pattern or to the ISO 8601 format if omitted. If an integer is provided as the single argument interprets the value as milliseconds since the epoch

SELECT TimeStart, TimeEnd, SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE TimeStart BETWEEN to_date('2014-11-01') AND to_date('2014-11-03') ORDER BY TimeStart SELECT TimeStart, TimeEnd, SourceAddress Src, DestinationAddress Dst FROM Netflow WHERE

TimeStart BETWEEN to_date('2014-11-03')-1 AND to_date('2014-11-03') ORDER BY TimeStart SELECT to_date(MIN(TimeStart)) MinTime, to_date(AVG(TimeStart)) AvgTime,

to_date(MAX(TimeStart)) MaxTime FROM Netflow

(77)

Query Functions – Date & Time Functions

format_date(datetime, format, [timezone]

• Formats the specified DATETIME with a format pattern

SELECT format_date(MIN(TimeStart), 'MM/dd/yyyy hh:mm:ss') MinTime FROM Netflow

now()

(78)

Query Functions – Date & Time Functions

timestampdiff(field, datetime1, datetime2)

• Returns the difference between two DATETIMEs in the specified units (one of YEAR, MONTH, DAY, HOUR, MINUTE, SECOND). In SELECT queries, field is a keyword and should not be enclosed in quotes

SELECT timestampdiff(DAY, to_date(MAX(TimeStart)), now()) FROM Netflow SELECT timestampdiff(HOUR, to_date(MAX(TimeStart)), now()) FROM Netflow SELECT DestinationAddress Dst, COUNT(*) Ct, format_date(MIN(event_time()),

'yyyy/MM/dd hh:mm:ss') MinTime, format_date(MAX(event_time()), 'yyyy/MM/dd hh:mm:ss') MaxTime, timestampdiff(HOUR, to_date(MIN(event_time())),

to_date(MAX(event_time())))+'H' TimeSpan FROM Netflow GROUP BY Dst ORDER BY Ct DESC LIMIT 20

(79)

Query Functions – Processing Inet ADDR

inet(ipaddress)

• Converts a STRING into an INET_ADDR

SELECT to_date(MIN(event_time())) MinTime, to_date(MAX(event_time())) MaxTime FROM Netflow WHERE DestinationAddress=inet('192.168.0.120')

SELECT ipAddress FROM CounterOps ORDER BY ipAddress

SELECT ipAddress FROM CounterOps WHERE ipAddress < inet('172.16.23.10') ORDER BY ipAddress

SELECT ipAddress FROM CounterOps WHERE ipAddress BETWEEN

(80)

Query Functions – Processing Inet ADDR

contains(subnet, inet_address)

• Returns true if the specified subnet contains the specified INET_ADDR otherwise false

SELECT ipAddress FROM CounterOps WHERE contains('65.18.0.0/8', ipAddress) ORDER BY ipAddress

SELECT COUNT(*) FROM CounterOps WHERE contains('192.168.0.0/24', ipAddress) SELECT COUNT(*) FROM CounterOps WHERE contains('192.168.0.0/255.255.255.0',

ipAddress)

SELECT COUNT(*) Ct, timestampdiff(HOUR, to_date(MIN(event_time())),

to_date(MAX(event_time())))+'H' TimeSpan FROM Netflow WHERE

(81)

Query Functions – String functions

concat datalength len locate lower

ltrim regex replace rtrim str

(82)

Query Functions – Math Functions

abs acos asin atan atan2

ceiling cos exp floor int

ln log log2 log10 round

(83)

Visibility Labels & SELECT Queries

When a source field has visibility labels they become part of the field name so to query those fields you must also specify the visibility label

SELECT path('field@[visibility]')

If you do not specify the visibility label for a field that requires one then no results are returned for that field

If the --strip-visibilities query option is enabled Sqrrl returns all matching data allowed by any of the user's visibility roles

(84)

Visibility Labels & SELECT Queries

If you do include visibility labels when --strip-visibilities is enabled then no results are returned for those fields

If two fields have the same name but different visibility labels AND the user has access to both visibility labels then the resulting value is not

(85)

Escaping Reserved Keywords & Special Characters

A source, field, model, entity, feature, or relationship name that is an invalid SQL identifier or a reserved SQL keyword must be enclosed in double quotes

SELECT field FROM "3"

SELECT username FROM SecurityAnalysis."User"

Generally use the backslash (\) to escape special characters

SELECT json() FROM CounterOps WHERE hostname = 'name\+123'

Except to escape a single quote, use another single quote

(86)

Sqrrl Indexes & SELECT … WHERE Expressions

VALUE_BY_FIELD

• (field, type, value) → Record • Equality, Range, Prefix

• All non-aggregate types

AGGREGATE

• field → (value, record)

• Numeric equality and range • Aggregates only

(87)

Sqrrl Indexes & SELECT … WHERE Expressions

TERMS_BY_FIELD

• (field, term) → record

• Exact-field Lucene term queries

• Can be used for string equality at a cost

• All non-aggregate types, but will be compared as strings

TERMS

• term → (field, record)

• Unfielded or field pattern Lucene term queries

• Can be used for fielded Lucene or string equality at a cost • All non-aggregate types, but will be compared as strings

(88)

Explain plan

Explain() method on the query operations object returned by the createQuery() or createGqlQuery() API call returns a JSON tree representing the iterator hierarchy that will be used to execute the query

Result is *implementation-dependent* and is not guaranteed to be consistent across different versions of Sqrrl but can be used to

optimize WHERE clause formulation to help avoid inefficient full table scans

(89)
(90)

Explain plan

Queries with a relativeSpeed of 3 take advantage of a specifically optimized index and go directly to matching records

Queries with a relativeSpeed of 2 use indexes but may degrade given particular characteristics of the data

• Range searches whose result sizes exceed

sqrrl.server.rangesearch.maxResortingEntries fail over to scanning every document

Queries with a relativeSpeed of 1 indicate queries whose running time scales linearly with the total size of the data

(91)

SELECT queries in the Sqrrl Web UI

You can also execute queries in the Sqrrl Web UI

SELECT hostname FROM CounterOps

Web UI supports SELECT clauses of the form SELECT * which returns all of the available columns for the matching source records

To avoid a mix of columns from different items include an explicit FROM clause with SELECT *

First column displayed contains the event time for each source record • Remaining columns are then displayed in alphabetical order by name

(92)

SELECT queries in the Sqrrl Web UI

Provide a WHERE clause to limit the matching results from SELECT *

SELECT * FROM CounterOps WHERE hostname LIKE 'cam%'

From the results list, you can:

Sort by a specific column by clicking the column heading

change the column order by dragging and dropping a column to a new position

Hide and show individual columns by right-clicking the column name in the table heading then checking or unchecking the checkboxes in the column list

You cannot use SELECT * outside of the Sqrrl Web UI • Executing SELECT * in the Sqrrl shell returns an error

(93)

© 2019, Amazon Web Services, Inc. or its Affiliates.

End

References

Related documents

KALLIWODA | RESEARCH GmbH employees or other persons that were involved in the preparation of this document or related parties are possibly holders of instruments that are

With the exception of the first semester accounting course (Accounting I), no significant differences were found in the mean IS core course grades for students

implementation of a cardiac rehabilitation program at your institute by using this scale.. Please rate eligible patients' perceptions of barriers to participation in a

Flow Entry OpenFlow Controller Ingress Port MAC DA MAC SA EtherType VLAN ID IP Src IP Dst IP Protocol TCP/UDP src port TCP/UDP dst port P-bits IP DSCP Layer 2

a) SELECT * FROM Persons SORT BY ‘FirstName’ DESC b) SELECT * FROM Persons ORDER FirstName DESC c) SELECT * FROM Persons SORT ‘FirstName’ DESC d) SELECT * FROM Persons ORDER

Based on this hypothesis, three analytical mechanical models are developed to predict several aspects of thin- film spallation failure including nucleation, stable and unstable

In conclusion, where statistical meta-analysis is not possible or appropriate, reviewers should take sample size and magnitude of effect into consideration when interpreting find-

All patients despite their level of literacy, understanding, value system and linguistic group, must be allowed the opportunity to exercise their right to be the master of their