Note: this appendix is not intended to be an introduction to regular expressions (regexes for short) but a description of their application within Kannel. For general information regarding regexes, please refer to [BIBLIO-REGEXP].
Syntax and semantics of the regex configuration parameters
This section describes the regex configuration parameters and their effects in combination with the respective non-regex parameters, e.g. white-list and white-list-regex.
How to set up the regex parameters
Regex parameters are configured like any other parameter. Regular expressions are supported as defined by POSIX Extended Regular Expressions. Suppose a configuration in which only SMS messages from senders whose number starts with "040", "050", "070" or "090" are accepted. Without regexes the configuration would read
allowed-prefix="040;050;070;090"
Using regular expressions yields a more concise configuration
allowed-prefix-regex=^0[4579]0
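The equivalence of the two configurations can be checked with a short script. The following is only a sketch: it uses Python's re module as a stand-in for the POSIX matching done by Kannel (for this simple pattern the two agree), and the sample numbers are made up for illustration.

```python
import re

# The four prefixes from the non-regex configuration above.
listed_prefixes = "040;050;070;090".split(";")

# The regex variant; "^" anchors the match at the start of the number.
prefix_regex = re.compile(r"^0[4579]0")


def accepted_by_list(number):
    """True if the number starts with one of the listed prefixes."""
    return any(number.startswith(p) for p in listed_prefixes)


def accepted_by_regex(number):
    """True if the anchored regex matches the start of the number."""
    return prefix_regex.search(number) is not None


# Hypothetical sender numbers, for illustration only.
samples = ["0401234567", "0509876543", "0701112223", "0903334445",
           "0601234567", "1040123456"]
for number in samples:
    print(number, accepted_by_list(number), accepted_by_regex(number))
```

Both functions should give the same answer for every number, since the bracket expression [4579] lists exactly the second digits of the four prefixes.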
The following table gives an overview of some regex operators and their meaning; for the complete syntax see the POSIX Regular Expressions manual page (regex(7)). Once again, the extended regex syntax is used. The table is only meant as a quick start to regular expressions; the Examples section below features some more complex examples.
Operator          Meaning
|                 or, for example "dog|hog" matches "dog" or "hog".
{number,number}   repetition, for example "a{2,5}" matches - among others - "aa", "aaa" and "baaaaad".
*                 shorthand for {0,}
?                 shorthand for {0,1}
+                 shorthand for {1,}
[]                bracket expression, defines a class of possible single character matches. For example "[hb]og" matches "hog" and "bog". If the expression starts with ^ then the class is negated, e.g. "[^hb]og" does not match "hog" and "bog" but matches for example "dog".
()                groups patterns, e.g. "[hb]o(g|ld)" matches "hog", "hold", "bog", "bold".
[:class:]         a character class such as digit, space etc. See wctype(3) for details.
^                 start of line anchor.
$                 end of line anchor.
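The operators can be tried out interactively before a pattern goes into the configuration. The sketch below again assumes Python's re module as a stand-in for POSIX Extended Regular Expressions. The [:class:] operator is left out because Python's re module does not support POSIX character classes. The subject strings are illustrative.

```python
import re

# (pattern, subject, expected result) triples based on the operator table.
cases = [
    (r"dog|hog", "hog", True),
    (r"a{2,5}", "aa", True),
    (r"a{2,5}", "baaaaad", True),   # unanchored: matches inside the string
    (r"a{2,5}", "bad", False),      # only a single "a"
    (r"[hb]og", "bog", True),
    (r"[^hb]og", "hog", False),     # negated class excludes "h"
    (r"[^hb]og", "dog", True),
    (r"[hb]o(g|ld)", "bold", True),
    (r"^5(23)?$", "523", True),
    (r"^5(23)?$", "5234", False),   # "$" rejects trailing characters
]

# Run every pattern against its subject and record whether it matched.
results = [(p, s, re.search(p, s) is not None) for p, s, _ in cases]
for pattern, subject, matched in results:
    print(f"{pattern!r:16} {subject!r:12} -> {matched}")
```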
The advantages of regular expressions are obvious:
• Regexes are easier to understand, provided one is fluent in POSIX Regular Expressions. Simple expressions such as the one shown above should be clear to anyone who has ever used a standard UN*X shell.
• Regexes are easier to maintain. Suppose the example above needed to cope with dozens of different prefixes, each with subtle differences. In such cases a carefully constructed regular expression could help to keep things in apple-pie order. Furthermore, regexes help reduce redundancy within the configuration.
• Regexes are more flexible than standard parameters.
Nevertheless, it must be mentioned that, in addition to the overhead involved, complexity is an issue, too. Although the syntactic correctness of each regular expression is checked (see below), its semantic correctness cannot be verified automatically.
Expressions that cannot be compiled, i.e. that are not valid POSIX regexes, force Kannel to panic with a message like the following (note the missing "]")
ERROR: gwlib/regex.c:106: gw_regex_comp_real: regex compilation ‘[hbo(g|ld)’ failed: Invalid regular expression (Called from gw/urltrans.c:987:create_onetrans.)
PANIC: Could not compile pattern ’[hbo(g|ld)’
As shown, the erroneous pattern is reported in the error message.
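Since an invalid pattern stops Kannel at startup, it can be convenient to pre-check patterns before editing the configuration. The following is a rough sketch only: check_pattern is a hypothetical helper, and Python's re module does not accept exactly the same syntax as POSIX Extended Regular Expressions, so passing this check does not guarantee that Kannel will accept the pattern.

```python
import re


def check_pattern(pattern):
    """Return None if the pattern compiles, otherwise the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The valid pattern from the operator table and the broken one from the
# error message above (missing "]").
for candidate in ["[hb]o(g|ld)", "[hbo(g|ld)"]:
    problem = check_pattern(candidate)
    print(candidate, "->", "ok" if problem is None else problem)
```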
Regex and non-regex-parameters
Using the regex and non-regex version of a parameter at the same time should be done with caution.
Both are combined in a boolean-or sense, for example
white-list=01234
white-list-regex=^5(23)?$
implies that a number is accepted if it is "01234", "5" or "523" - note the use of anchors! The same goes for all the other parameters. Both mechanisms can thus be used in parallel without problems, but care should be taken that the implications are understood and wanted.
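The boolean-or combination can be modelled in a few lines. This is a sketch of the described semantics only, not of Kannel's internal implementation; is_white_listed is a hypothetical name, and Python's re module is assumed as a stand-in for the POSIX matching.

```python
import re

white_list = {"01234"}                      # literal entries
white_list_regex = re.compile(r"^5(23)?$")  # regex entry, fully anchored


def is_white_listed(number):
    """Accept if the literal list OR the regex accepts the number."""
    return (number in white_list
            or white_list_regex.search(number) is not None)


for number in ["01234", "5", "523", "52", "0523", "5230"]:
    print(number, is_white_listed(number))
```

Without the anchors the regex would also accept "0523" and "5230", because both contain a "5".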
Performance issues
While there is some overhead involved, the actual performance degradation is negligible. At startup, i.e. when the configuration files are parsed, the regular expressions are compiled and stored in compiled form, so later comparisons only involve executing the compiled expression on a string. To be on the safe side, some benchmarking should be performed before using regexes extensively, to verify that the loss of performance is acceptable.
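The compile-once, match-many pattern and a simple timing comparison can be sketched as follows. The script measures Python, not Kannel, so the numbers only illustrate the method; meaningful benchmarks have to be run against Kannel itself. The sender number and the run count are assumptions.

```python
import re
import timeit

# Compiled once, comparable to what happens at configuration-parse time.
compiled = re.compile(r"^0[4579]0")
prefixes = ("040", "050", "070", "090")
number = "0701234567"  # hypothetical sender number


def by_regex():
    """Match using the pre-compiled expression."""
    return compiled.search(number) is not None


def by_prefix_list():
    """Match using plain prefix comparison."""
    return number.startswith(prefixes)


runs = 100_000
t_regex = timeit.timeit(by_regex, number=runs)
t_list = timeit.timeit(by_prefix_list, number=runs)
print(f"regex: {t_regex:.4f}s  prefix list: {t_list:.4f}s  ({runs} runs)")
```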
Examples
This section discusses some simple scenarios and potential solutions based on regexes. The examples are not meant to be comprehensive but rather informative.
Example 1: core-configuration
The bearerbox must only accept SMS messages from three customers. The first customer uses numbers that always start with "0824". The second one uses numbers that either start with "0123" and end in "314" or start with "0882" and end in "666". The third customer uses numbers starting with "0167" and ending in a number between "30" and "57".
Important in this and in the following examples is the use of anchors; otherwise a "string contains" semantic would be used instead of a "string is equal" semantic.
group=core
...
white-list-regex=^((0824[0-9]+)|(0123[0-9]+314)|(0882[0-9]+666)|(0167[0-9]+([34][0-9]|5[0-7])))$
...
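A pattern of this size is worth testing against sample numbers before deployment. The sketch below assumes Python's re module as a stand-in for the POSIX matching; all numbers are invented for illustration.

```python
import re

# The white-list-regex from the example, split over two lines for readability.
white_list_regex = re.compile(
    r"^((0824[0-9]+)|(0123[0-9]+314)|(0882[0-9]+666)"
    r"|(0167[0-9]+([34][0-9]|5[0-7])))$"
)

# Hypothetical numbers that should pass ...
accepted = ["08241", "01239314", "08825666", "0167930", "0167957"]
# ... and numbers that should be refused.
rejected = ["0824", "0123314", "0167958", "0167929", "908241"]

for number in accepted + rejected:
    print(number, white_list_regex.search(number) is not None)
```

Note that "[0-9]+" demands at least one digit after the prefix, which is why "0824" and "0123314" are refused.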
Example 3: smsc-configuration
Only SMS messages originating from certain SMSCs (smsc-id is either "foo", "bar" or "blah") are preferably forwarded to this smsc. Furthermore, messages from SMSCs with an id containing "vodafone" must never be forwarded to this smsc. Note the missing anchors around "vodafone".
group=smsc
...
preferred-smsc-id-regex=^(foo|bar|blah)$
denied-smsc-id-regex=vodafone
...
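The difference between the anchored and the unanchored pattern can be seen in a small sketch. The classify function, the order of its checks and the sample ids are assumptions for illustration; they do not describe Kannel's routing code. Python's re module stands in for the POSIX matching.

```python
import re

preferred = re.compile(r"^(foo|bar|blah)$")  # anchored: whole id must match
denied = re.compile(r"vodafone")             # unanchored: substring match


def classify(smsc_id):
    """Return how the two patterns from the example treat an smsc-id."""
    if denied.search(smsc_id):
        return "denied"
    if preferred.search(smsc_id):
        return "preferred"
    return "neutral"


for smsc_id in ["foo", "blah", "foobar", "vodafone-uk", "my-vodafone-2"]:
    print(smsc_id, classify(smsc_id))
```

The id "foobar" is not preferred because the anchors require the complete id to be "foo", "bar" or "blah", while any id that merely contains "vodafone" is denied.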
Appendix F. Regular Expressions
Example 4: sms-service-configuration
Please note that there is a mandatory keyword field and an optional keyword-regex field. That means that service selection can be simplified as in the following example. Suppose that some Web content should be delivered to the mobile. Different customers use the same service but rely on different keywords.
Whenever an sms-service is requested, Kannel first checks whether a regex has been defined; if not, a literal match based on keyword is performed. If a regex is configured, the literal match is never tried.
group=sms-service
...
keyword=web_service
keyword-regex=^(data|www|text|net)$
get-url=http://someserver.net/getContent.jsp
...
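The selection rule described above can be modelled as follows. This is a sketch of the stated rule only: service_matches is a hypothetical function, the literal comparison is modelled as plain string equality (details such as case handling are not covered), and Python's re module stands in for the POSIX matching.

```python
import re


def service_matches(word, keyword, keyword_regex=None):
    """If a regex is configured it replaces the literal keyword match."""
    if keyword_regex is not None:
        return re.search(keyword_regex, word) is not None
    return word == keyword


regex = r"^(data|www|text|net)$"
print(service_matches("www", "web_service", regex))          # regex is used
print(service_matches("web_service", "web_service", regex))  # literal skipped
print(service_matches("web_service", "web_service"))         # literal match
```

With the configuration from the example, a message starting with "web_service" therefore does not select the service, because the literal keyword is never tried once keyword-regex is set.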