

SSL_PROTOCOL

Protocol for the established SSL connection.

SSL_SESSION_ID

SSL session ID for the established SSL connection.

Match type

Select Regex Match to require a match or Regex No Match to negate the match.

Match criteria

A regular expression to match data in the selected client cert field. Example: If SSL_CLIENT_S_DN is selected, OU=management would match certificates where the client cert has Organisational Unit = management.

1.7. Regular expressions

Web Security Manager has full support for standard PCRE (Perl Compatible Regular Expressions). Below is a brief regular expression "survival guide". For a more thorough explanation of the subject, some links and books are recommended at the end of the section.

1.7.1. What are regular expressions

A regular expression is a formula for matching strings that follow some pattern.

Regular expressions are made up of normal characters and special characters. Normal characters include upper and lower case letters and digits. The characters with special meanings are described in detail below.

In the simplest case, a regular expression looks like a standard text string. For example, the regular expression "john" contains no special characters. It will match "john" and "john doe" but it will not match "John".

In an input validation context we always want the expression to match the whole string. The expression above would then be written as ^john$, where the characters ^ and $ mean start of line and end of line. Now the expression matches only "john" and not "john doe". To match "john doe" as well as "john smith" etc., we employ a few more simple special characters. In its simplest form "john lastname" could be expressed as "^john.*$", meaning: a string starting with the characters "john" followed by zero or more (the "*") occurrences of any character (the "."). For those familiar with the simple wildcard character "*" in, among others, DOS and Unix, ".*" equals "*", that is: anything. Specifying anything is not very useful in an input validation context. With regular expressions, much more fine-grained input validation masks can be defined using the rich set of metacharacters, character classes, repetition quantifiers, etc.
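The difference between an unanchored and an anchored expression can be tried out in any PCRE-like engine. The sketch below uses Python's standard re module as a stand-in, which is an assumption made for illustration; Web Security Manager itself is not involved. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None

# Unanchored: "john" is found inside longer strings, and matching is case sensitive.
print(matches("john", "john"))            # True
print(matches("john", "john doe"))        # True
print(matches("john", "John"))            # False

# Anchored with ^ and $: the whole string must be exactly "john".
print(matches("^john$", "john doe"))      # False

# "john" followed by zero or more occurrences of any character.
print(matches("^john.*$", "john smith"))  # True
```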

1.7.2. Metacharacters

Character  Meaning
^          Beginning of string (implied in Web Security Manager)
$          End of string (implied in Web Security Manager)
.          Any character except newline
*          Match 0 or more times
+          Match 1 or more times
?          Match 0 or 1 times; or: shortest match quantifier (i.e. *?)
|          Alternative (like logical OR)
()         Grouping
[]         Set of characters (a list of characters)
{}         Repetition modifier
\          Quote or special

Table 5.1. Metacharacters in regular expressions

To present a metacharacter as a data character standing for itself, precede it with \ (e.g. \. matches the full stop character "." only).

Note

In Web Security Manager all regular expressions are forced to match the entire string (URL path or parameter value) by automatically prefixing an expression with "^" and suffixing it with "$".
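A minimal sketch of what whole-string matching and escaping mean in practice, using Python's re.fullmatch to emulate the implied "^" and "$". The function name validate is hypothetical, and Python's engine is only assumed to behave like PCRE for these simple cases.

```python
import re

def validate(expression: str, value: str) -> bool:
    """Emulate implicit anchoring: the expression must match the entire value."""
    return re.fullmatch(expression, value) is not None

# Without explicit anchors, a partial match is still rejected.
print(validate("john", "john doe"))                # False

# An escaped dot matches only the full stop character.
print(validate(r"/index\.html", "/index.html"))    # True
print(validate(r"/index\.html", "/indexXhtml"))    # False

# An unescaped dot is a metacharacter and matches any character.
print(validate("/index.html", "/indexXhtml"))      # True
```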

1.7.3. Repetition

Expression   Meaning
a*           Zero or more a's
a+           One or more a's
a?           Zero or one a's (i.e., optional a)
a{m}         Exactly m a's
a{m,}        At least m a's
a{m,n}       At least m but at most n a's
repetition?  Same as repetition but the shortest match is taken

Table 5.2. Repetition in regular expressions

Read "a's" as "occurrences of strings, each of which matches the pattern a". Read repetition as any of the repetition expressions listed above it.

Shortest match means that the shortest string matching the pattern is taken. The default is "greedy matching", which finds the longest match.
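Greedy and shortest matching are easiest to see side by side. The following sketch uses Python's re module as an assumed stand-in for a PCRE engine; the sample text is made up for illustration.

```python
import re

text = "<b>bold</b>"

# Greedy (default): .+ takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group(0)
print(greedy)    # <b>bold</b>

# Shortest match: .+? stops at the first ">" that completes the pattern.
shortest = re.search(r"<.+?>", text).group(0)
print(shortest)  # <b>

# Bounded repetition: between 2 and 4 occurrences of "a", whole string only.
print(re.fullmatch(r"a{2,4}", "aaa") is not None)    # True
print(re.fullmatch(r"a{2,4}", "aaaaa") is not None)  # False
```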

1.7.4. Special notations with \

Expression  Meaning
\t          Tab
\n          Newline
\r          Return (CR)
\xhh        Character with hex. code hh
\b          "Word" boundary (zero space assertion)
\B          Not a "word" boundary
\w          Matches any single international character classified as a "word" character (alphanumeric or _). Examples: A, z, 1, 9, Æ, â
\W          Matches any non-"word" character
\s          Matches any whitespace character (space, tab, newline)
\S          Matches any non-whitespace character
\d          Matches any digit character, equiv. to [0-9]
\D          Matches any non-digit character
\pN         Matches any UNICODE character classified as numeric

Table 5.3. Notations with \ in Web Security Manager regular expressions
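Some of these notations can be illustrated with Python's re module, used here as an assumed approximation of the PCRE behaviour described in the table. Two differences are worth flagging: Python's \d also accepts non-ASCII digits unless the re.ASCII flag is given, and the standard re module does not support \pN, so that notation is left out of the sketch.

```python
import re

# \w accepts international word characters (written as escapes: Æ and â).
word_chars = ["A", "z", "1", "9", "\u00c6", "\u00e2"]
print(all(re.fullmatch(r"\w", ch) for ch in word_chars))  # True

# \b asserts a word boundary without consuming a character.
print(re.findall(r"\bcat\b", "cat concat cats cat."))     # ['cat', 'cat']

# \s matches whitespace such as a tab; \xhh matches a character by hex code.
print(re.fullmatch(r"\d+\s\d+", "12\t34") is not None)    # True
print(re.fullmatch(r"\x41", "A") is not None)             # True

# With re.ASCII, \d is restricted to [0-9]; U+0663 is an Arabic-Indic digit.
print(re.fullmatch(r"\d", "\u0663") is not None)            # True
print(re.fullmatch(r"\d", "\u0663", re.ASCII) is not None)  # False
```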

1.7.5. Character sets [...]

A character set is denoted by [...]. Different meanings apply inside a character set ("character class") so that, instead of the normal rules given here, the following apply:

Expression    Meaning
[characters]  Matches any of the characters in the list (c,h,a,r,a,c,t,e,r,s)
[x-y]         Matches any of the characters from x to y (inclusively) in the ASCII code
[\-]          Matches the hyphen character -
[\n]          Matches the newline; other single character denotations with \ apply normally, too
[^something]  Negation. Matches any character except those that [something] denotes; that is, immediately after the leading [ the circumflex ^ means "not" applied to all of the rest

Table 5.4. Character sets in regular expressions
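The character set rules above can be checked with a few whole-string matches. This is a sketch in Python's re module, assumed to behave like PCRE for these constructs; the helper name full and the sample values are invented for illustration.

```python
import re

def full(expression: str, value: str) -> bool:
    """Whole-string match, emulating the implied ^ and $."""
    return re.fullmatch(expression, value) is not None

# A list and a range describing the same set of letters.
print(full(r"[abcdefgh]+", "cafe"))   # True
print(full(r"[a-h]+", "cafe"))        # True
print(full(r"[a-h]+", "cafes"))       # False, "s" is outside a-h

# An escaped hyphen inside a set stands for itself.
print(full(r"[\w\-]+", "my-file_1"))  # True

# Negation: any characters except the angle brackets.
print(full(r"[^<>]+", "a = b"))       # True
print(full(r"[^<>]+", "a < b"))       # False

# Inside a set the dot loses its special meaning and matches only ".".
print(full(r"[.]", "x"))              # False
```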

1.7.6. Lookaround

The lookaround construct allows for the creation of regular expressions that match something only when it is followed or preceded (or not followed or preceded) by something else. Note that a lookaround is a zero-width assertion: it tests for a match of something else but does not actually consume it, which is why it is called an assertion. Lookaround constructs allow for the creation of expressions that would otherwise be impossible or too complex.

In an input validation context, lookahead could be used to specify an expression that allows angle brackets <> but only when they are not closed.

Expression: a(?!expression)
Meaning: Negative lookahead. Matches "a" when not followed by expression, where expression is any regular expression.

Expression: a(?=expression)
Meaning: Positive lookahead. Matches "a" when followed by expression.

Expression: (?<!fixed-expression)a
Meaning: Negative lookbehind. Matches "a" when not preceded by fixed-expression, where fixed-expression is any regular expression specifying a fixed number of characters. That is, "aaa" will work but a+ will not work.

Expression: (?<=fixed-expression)a
Meaning: Positive lookbehind. Matches "a" when preceded by fixed-expression.

Table 5.5. Lookaround in regular expressions
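The lookaround constructs can be sketched as follows in Python's re module, assumed here to be close enough to PCRE for illustration. The expression UNCLOSED_ONLY is one possible way to allow angle brackets only when they are not closed; it is an invented example, not an expression taken from Web Security Manager.

```python
import re

# Negative lookahead: "a" only when not followed by "b" (the "a" in "ac").
print(re.findall(r"a(?!b)", "ab ac"))          # ['a']

# Positive lookbehind: digits only when preceded by a dollar sign.
print(re.findall(r"(?<=\$)\d+", "$30 or 40"))  # ['30']

# Lookbehind needs a fixed number of characters: "a+" is rejected.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)                       # False

# Reject any value containing "<" followed later by ">", accept everything else.
# re.DOTALL lets "." cross newlines so a line break cannot hide a closed bracket.
UNCLOSED_ONLY = re.compile(r"(?!.*<[^>]*>).*", re.DOTALL)

def allows(value: str) -> bool:
    return UNCLOSED_ONLY.fullmatch(value) is not None

print(allows("1 < 2"))      # True
print(allows("a > b"))      # True
print(allows("<b>hi</b>"))  # False
```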

1.7.7. Examples

1.7.7.1. Global URL regular expressions

The URL regular expressions filter matches URLs without parameters on a proxy global basis. If a request matches any of the defined regular expressions, it will be marked as valid by Web Security Manager and forwarded to the back-end server.

Expression: (/[\w\-]+)+\.html
Matches: URL with the extension html containing any international word characters, digits, _ and -. (\w matches upper and lower case alphanumeric characters plus _).

Expression: /abc(?:/[\w\-]+)*\.html
Matches: Same URL starting with /abc, including the URL /abc.html.

Expression: (/[\w\-]+)+\.html?
Matches: Same URL matching extensions html and htm.

Expression: (/[\w\-]+)+\.(html|pdf)
Matches: Same URL matching extensions html and pdf.

Expression: (/[abcdefgh]+)+\.html
Matches: URL with the extension html containing any of the lower case letters abcdefgh.

Expression: /index\.html
Matches: Exact match of /index.html.

Expression: (/[\w\-]+)+/?
Matches: "Natural" URL containing international alphanumeric characters, digits, _ and -.

Expression: /sw[0-9]{0,12}\.asp
Matches: URL with the extension asp starting with /sw followed by 0-12 digits.

Expression: /(login|logout)
Matches: Only URLs /login or /logout.

Expression: (/[\w\-]+)+\.(htm|html|shtml|pdf)
Matches: Any international characters URL with one of the extensions htm, html, shtml or pdf.

Table 5.6. Examples of global URL regular expressions
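The "valid if any expression matches" behaviour described for the URL filter can be sketched like this. The code is a Python illustration under the assumption that each expression is matched against the whole URL path; the names URL_EXPRESSIONS and url_is_valid are hypothetical, and the list is an arbitrary selection from the table.

```python
import re

# A few expressions taken from the table above.
URL_EXPRESSIONS = [
    r"(/[\w\-]+)+\.html?",
    r"/sw[0-9]{0,12}\.asp",
    r"/(login|logout)",
]
COMPILED = [re.compile(expression) for expression in URL_EXPRESSIONS]

def url_is_valid(path: str) -> bool:
    """A path is valid if at least one expression matches it in full."""
    return any(pattern.fullmatch(path) for pattern in COMPILED)

print(url_is_valid("/docs/index.html"))      # True
print(url_is_valid("/docs/index.htm"))       # True
print(url_is_valid("/sw123.asp"))            # True
print(url_is_valid("/sw1234567890123.asp"))  # False, 13 digits exceed {0,12}
print(url_is_valid("/login"))                # True
print(url_is_valid("/admin"))                # False
```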

1.7.7.2. Validating input parameters

Regular expression: ^[\w \.@()\-]+$
Matches: International alphanumeric characters, underscore, a space, dot, @, parentheses and a dash.

Regular expression: ^[0-9a-zA-Z. ]+$
Matches: Digits, ASCII characters a-z and A-Z, a dot and a space.

Regular expression: ^[0-9]+$
Matches: Only digits. [0-9] can also be expressed as \d.

Regular expression: ^[\d]{1,5}$
Matches: One to five digits.

Regular expression:
Matches: Only lower case ASCII characters from a-z.

Regular expression: ^[a-z]{0,32}$
Matches: Only lower case ASCII characters from a-z and limits the total length to a maximum of 32 characters.

Table 5.7. Examples of regular expressions for input validation
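A short sketch of how such expressions behave as validators, written in Python as an assumed stand-in for the PCRE engine. The field names in VALIDATORS are hypothetical. re.fullmatch is used because in Python "$" also matches just before a trailing newline, which would otherwise let a value such as "123\n" through.

```python
import re

# Hypothetical field names mapped to expressions from the table above.
VALIDATORS = {
    "name": r"^[\w \.@()\-]+$",
    "count": r"^[\d]{1,5}$",
    "code": r"^[a-z]{0,32}$",
}

def is_valid(field: str, value: str) -> bool:
    """Check a value against the expression registered for its field."""
    return re.fullmatch(VALIDATORS[field], value) is not None

print(is_valid("name", "John Doe (j.doe@example.com)"))  # True
print(is_valid("name", "<script>"))                      # False
print(is_valid("count", "12345"))                        # True
print(is_valid("count", "123456"))                       # False
print(is_valid("count", "123\n"))                        # False
print(is_valid("code", ""))                              # True, {0,32} allows empty
print(is_valid("code", "a" * 33))                        # False
```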

1.7.7.3. Global parameters

When specifying global parameters both the name and the value are defined using regular expressions.

Name: usepf
Value: true
Matches: The specific parameter usepf with the static value true.

Name: parm\d{3}
Value: [a-zA-Z\d]{3,32}
Matches: All parameters with name starting with parm followed by three digits with the value any combination of letters a-Z (upper and lowercase) or digits with a minimum length of 3 and a maximum length of 32 characters.

Name: \w{1,25}
Value: [\w\s_,/:()@$*\.\-]*
Matches: Any parameter with name consisting of international word characters and with values containing zero or more "friendly characters".

Table 5.8. Examples of global parameters regular expressions
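Because both the name and the value are regular expressions, a parameter has to satisfy a pair of expressions at once. The sketch below shows one way this could work, in Python; how Web Security Manager actually evaluates the pairs is not described here, so the "any pair matches" logic and the names GLOBAL_PARAMETERS and parameter_is_valid are assumptions. The broad third row of the table is left out so that the rejections are visible.

```python
import re

# (name expression, value expression) pairs from the first two table rows.
GLOBAL_PARAMETERS = [
    (r"usepf", r"true"),
    (r"parm\d{3}", r"[a-zA-Z\d]{3,32}"),
]

def parameter_is_valid(name: str, value: str) -> bool:
    """Valid if some pair matches the whole name and the whole value."""
    return any(
        re.fullmatch(name_expr, name) and re.fullmatch(value_expr, value)
        for name_expr, value_expr in GLOBAL_PARAMETERS
    )

print(parameter_is_valid("usepf", "true"))      # True
print(parameter_is_valid("usepf", "false"))     # False
print(parameter_is_valid("parm123", "abc42"))   # True
print(parameter_is_valid("parm12", "abc42"))    # False, name needs three digits
print(parameter_is_valid("parm123", "ab"))      # False, value shorter than 3
```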

1.7.7.4. Predefined standard classes in Web Security Manager

The following classes are predefined in Web Security Manager. The classes are presented in the order the Automatic Policy Generator evaluates them when automatically mapping classes to input parameters.

Class: empty
Regular expression:
Description: No values allowed.

Class: num
Regular expression: \d{1,32}
Description: Digits - a maximum of 32 digits.

Class: payment_card
Regular expression: (?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?
Description: Payment card numbers, allows for spaces and hyphens between number groups.

Class: ms_ident
Regular expression: {?[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}?
Description: Microsoft identifier with optional preceding and trailing curly brackets.

Class: alphanum
Regular expression: \w{1,256}
Description: International alphanumeric characters. No spaces. Max. 256 chars.

Class: text
Regular expression: (?!.*(\.\.|//).*)[\w\x20+.,\-:@=/]+
Description: Simple international text.

Class:
Regular expression: (?:https?://)?(?!.*(\.\.|//).*)[\w\x20@,.(){}/\-=?&]+
Description: Simple international URL match. With parameters. Consecutive "/" or "." not allowed (negative look ahead).

Class: standard
Regular expression: [\w\s@,.(){}/\-=?&_:]+
Description: Text input, international, several special characters allowed including newline.

Class: printable
Regular expression: [^\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+
Description: Any number of printable characters. Defined by negating character class containing non-printable characters.

Class: anything
Regular expression: .+
Description: Anything but newline.

Class: Anything_multiline
Regular expression: (?:.|\n)*
Description: Anything including newline.

Table 5.9. Predefined standard classes in Web Security Manager
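Since the classes are evaluated in order, the same value can end up in different classes depending on which expression accepts it first. The following Python sketch illustrates that idea with a subset of the classes; treating the first fully matching class as the result is an assumption about the Automatic Policy Generator, and the names CLASSES and first_matching_class are hypothetical.

```python
import re

# Subset of the predefined classes, kept in table order.
CLASSES = [
    ("num", r"\d{1,32}"),
    ("payment_card", r"(?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?"),
    ("alphanum", r"\w{1,256}"),
    ("text", r"(?!.*(\.\.|//).*)[\w\x20+.,\-:@=/]+"),
    ("standard", r"[\w\s@,.(){}/\-=?&_:]+"),
    ("printable", r"[^\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+"),
    ("anything", r".+"),
]
COMPILED = [(name, re.compile(expression)) for name, expression in CLASSES]

def first_matching_class(value: str):
    """Return the name of the first class whose expression matches the whole value."""
    for name, pattern in COMPILED:
        if pattern.fullmatch(value):
            return name
    return None

print(first_matching_class("12345"))                # num
print(first_matching_class("4111 1111 1111 1111"))  # payment_card
print(first_matching_class("hello_world"))          # alphanum
print(first_matching_class("hello world"))          # text
print(first_matching_class("a..b"))                 # standard
print(first_matching_class("50% off!"))             # printable
```

Note that "a..b" is rejected by text because of the negative lookahead on ".." and falls through to standard, which has no such restriction.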

1.7.8. Further reading

A number of web sites and books describe regular expressions in more detail.

1.7.8.1. Web sites

Wikipedia

A general description

http://en.wikipedia.org/wiki/Regular_expression

The 30 Minute Regex Tutorial