A regular expression is a set of pattern-matching rules, written in a compact string syntax, that is widely used in text editors and programming languages to search, copy, and replace text, and to generate input and output data for programs.
Programmers have used regular expressions for decades and have developed a standardized set of symbols that defines the syntax. This appendix is not intended to serve as a full reference; its goal is to get the user up and running quickly with the Bulk Configurator.
For a deeper understanding, users are encouraged to consult the many books and online resources on regular expressions.
Regular Expression
A regular expression (aka regex or regexp) is a special way of specifying search patterns.
A simple example of a search pattern is *.doc or *.xls to search for files in Windows Explorer. Another example is the Find and Replace (using wild cards) feature in Microsoft Word.
Regular expressions let us search within a given block of text. They can be used to find strings in a text block, to validate data against specific character sequences, and to form new strings by replacement. The regex equivalents of the above examples are .*\.doc$ and .*\.xls$, respectively.
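To try the *.doc example outside the Bulk Configurator, the short sketch below uses Python's re module as a stand-in for the .NET engine; for a pattern this simple the two behave the same. The helper is_doc_file is a hypothetical name chosen for illustration.

```python
import re

# .*\.doc$ : any characters, then a literal ".doc" at the end of the string
DOC_PATTERN = re.compile(r".*\.doc$")


def is_doc_file(name: str) -> bool:
    """Return True if name ends in .doc (the regex form of the *.doc wildcard)."""
    return DOC_PATTERN.search(name) is not None


for name in ["report.doc", "report.docx", "report_doc"]:
    print(name, is_doc_file(name))  # True, False, False
```

Without the backslash, .*.doc$ would also accept report_doc, because an unescaped dot matches any character, including the underscore.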
.NET Regular Expressions
Regular expressions are supported by many languages and tools, such as the Perl® and Java® programming languages, the Microsoft .NET® languages, awk, and the UNIX® grep tool. Although the exact syntax and supported features vary among these implementations (often called flavors), the core syntax is very similar.
For a comparison of the different flavors, refer to:
• http://www.regular-expressions.info/refflavors.html
The Bulk Configurator was developed using Microsoft .NET and therefore uses the .NET regular expression engine and its syntax.
Starter Syntax Examples
In the Bulk Configurator Rulebook, a rule specification includes:
1. A ‘search’ regular expression to search for an input or output tag matching a certain pattern
2. A ‘replace’ regular expression to generate a model name by replacing parts of the matched tag
3. A ‘replace’ regular expression to determine the other tag name that needs to be connected back to the tag in (1) to complete the tieback.
Therefore, a basic knowledge of both search and replace syntax is essential.
A few basic syntax examples are listed below. For a more detailed reference, refer to the section Regular Expression Basic Syntax Reference in this document.
Search Syntax
• The characters [\^$.|?*+(){} have special meaning in regular expressions. All other characters (letters, digits and other symbols) match a single instance of themselves.
• The character . (dot) means, “match any character” (except a line break).
• The character * means, “match the previous character zero or more times”.
• Combining the above two, .* means, “match any character zero or more times”. .* is the regex equivalent of the wildcard * in Windows Explorer search.
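Similarly, a* means “match the character a zero or more times”, so it can match the a in abc, the aa in baac, or the aaa in bcaaa. Because zero occurrences also count, a* can also match an empty string, so on its own it even succeeds on bcd.
ab*c would match with ac, abc and fabbcg but not with fbbc or abdc.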
• A backslash \ in front of any of the above special characters suppresses their special meaning. For example, \. matches a literal dot and \* matches a literal asterisk.
• The character ? means, “the previous character is optional”. abc1?d will match with abcd and abc1d but not with abc11d (1 is optional, i.e., either zero or one 1 but not more).
• \w means, “match a word character” (a letter, digit or underscore), and \d means, “match a digit”.
\w* would match all of abc or 12_345, but not all of a!bc. \d* would match all of 123, but not all of abc or ab!c.
Negated versions of these are also available. Refer to the section Regular Expression Basic Syntax Reference.
• […] matches any single character between the square brackets.
a[123]b matches with a1b, a2b but not with abb or a11b.
• ^ means, “start of string”.
^ab matches with abcd but not with cabcd.
• ^ immediately after [ has a different meaning. [^…] matches any single character not between the square brackets.
a[^1]b matches with a2b but not with a1b.
• $ means, “end of string”.
abc$ matches with abc, 123abc but not with 123abc456.
• {n} means, “match the previous character exactly n times”.
ab{2}c matches with abbc and 123abbc456 but not with abbbc.
• (…) is used to specify grouping and is useful in building a replace regular expression.
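The examples in the list above can be checked mechanically. This sketch uses Python's re module in place of the .NET engine (the constructs used here behave the same in both); matches and matches_whole are hypothetical helpers. “Matches with” is read as “the pattern is found somewhere in the text”, while the \w* and \d* examples are read as whole-string matches.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern: str, text: str) -> bool:
    """True only if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# (pattern, texts it should be found in, texts it should not be found in)
EXAMPLES = [
    (r"ab*c", ["ac", "abc", "fabbcg"], ["fbbc", "abdc"]),
    (r"abc1?d", ["abcd", "abc1d"], ["abc11d"]),
    (r"a[123]b", ["a1b", "a2b"], ["abb", "a11b"]),
    (r"^ab", ["abcd"], ["cabcd"]),
    (r"a[^1]b", ["a2b"], ["a1b"]),
    (r"abc$", ["abc", "123abc"], ["123abc456"]),
    (r"ab{2}c", ["abbc", "123abbc456"], ["abbbc"]),
]

for pattern, good, bad in EXAMPLES:
    assert all(matches(pattern, t) for t in good), pattern
    assert not any(matches(pattern, t) for t in bad), pattern

# \w* and \d* examples compare against the whole string
assert matches_whole(r"\w*", "12_345") and not matches_whole(r"\w*", "a!bc")
assert matches_whole(r"\d*", "123") and not matches_whole(r"\d*", "ab!c")
print("all examples behave as described")
```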
Regex Replace
A “search” regular expression and a “replace” regular expression can be used in combination to first match a string and then form a different string by replacement.
The (…) construct is especially useful for this purpose. As mentioned earlier, it is used to specify grouping. In the replace expression, $1, $2, etc. stand for the text matched by the first, second, etc. group in the search expression.
Search Syntax   Matches   Replace Syntax   After
As can be seen from the above examples, the new string is formed by replacing the matches in the original string with the replacement string.
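The following sketch illustrates search-and-replace with and without groups. It uses Python's re.sub in place of .NET's Regex.Replace; one difference to keep in mind is that Python writes group references in the replacement as \1 or \g<1>, where .NET uses $1. The input strings are made up.

```python
import re

tag = "1B03_MW:FY5F01.OUT"

# Only the matched text "FY" is replaced; the rest of the string is kept.
renamed = re.sub(r"FY", "FI", tag)

# Groups: .NET writes this replacement as $2:$1, Python as \2:\1.
swapped = re.sub(r"(.*):(.*)", r"\2:\1", "CMP:BLK")

# A string that does not match the search pattern is returned unchanged.
untouched = re.sub(r"(.*):(.*)", r"\2:\1", "NOCOLON")

print(renamed)    # 1B03_MW:FI5F01.OUT
print(swapped)    # BLK:CMP
print(untouched)  # NOCOLON
```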
The different replacement sequences are listed below:
$1, $2, … Matched text of the numbered capture group
${name} Matched text of a named capture group
$` Text before the match
$$ A literal $ character
$+ Last subgroup match
$_ Original input string
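Python's re.sub does not understand these .NET $-sequences, so experimenting with them outside the Bulk Configurator requires a translation step. The sketch below is a hypothetical helper (expand_dotnet and dotnet_replace are not part of any library) that handles the sequences listed above in a simplified way. It does not reproduce every .NET edge case; for example, a reference to a group that does not exist raises an error here instead of being left as literal text.

```python
import re

# $n, ${name}, or one of the single-character sequences $` $+ $_ $$
_TOKEN = re.compile(r"\$(?:(\d+)|\{(\w+)\}|([`+_$]))")


def expand_dotnet(m: re.Match, template: str) -> str:
    """Expand a .NET-style replacement template for one match (simplified sketch)."""

    def repl(t: re.Match) -> str:
        number, name, symbol = t.groups()
        if number is not None:
            return m.group(int(number)) or ""  # numbered group ($0 = whole match)
        if name is not None:
            return m.group(name) or ""  # named group
        if symbol == "`":
            return m.string[: m.start()]  # text before the match
        if symbol == "+":
            # highest-numbered group; '' if the pattern has no groups (simplification)
            return (m.group(m.re.groups) or "") if m.re.groups else ""
        if symbol == "_":
            return m.string  # original input string
        return "$"  # "$$" is a literal dollar sign

    return _TOKEN.sub(repl, template)


def dotnet_replace(pattern: str, template: str, text: str) -> str:
    """Replace every match of pattern in text using a .NET-style template."""
    return re.sub(pattern, lambda m: expand_dotnet(m, template), text)


print(dotnet_replace(r"(\w+)-(\w+)", "$2-$1", "ab-cd"))      # cd-ab
print(dotnet_replace(r"X", "[$`]", "abXcd"))                # ab[ab]cd
print(dotnet_replace(r"(?P<num>\d+)", "#${num}", "a1b22"))  # a#1b#22
print(dotnet_replace(r"\d+", "$$", "a1b"))                  # a$b
```

Note that Python spells a named group (?P<num>…), whereas .NET uses (?<num>…); the ${num} reference in the template is the same in both.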
Examples specific to I/A Series and Triconex Applications
The Bulk Configurator looks at the entire I/O tag in the cross-reference file. I/A Series software tags are of the format CompoundName:BlockName.ParameterName. TRISIM tags are simpler and just have the point name with no delimiters like “:” and “.”. Some examples specific to the Bulk Configurator are shown below.
Note: Valid Dynsim model names should start with a letter, contain no special characters other than _, and be less than 60 characters long. In addition, the tag names used here are made up to demonstrate regex usage and may not conform to any naming standard.
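The naming rules in the note can themselves be written as a regular expression, which is handy for checking the output of a “replace” expression before it is used as a model name. This sketch assumes that a letter means an ASCII letter A to Z or a to z, and that “less than 60 characters” means at most 59; is_valid_model_name is a hypothetical helper, not part of the Bulk Configurator.

```python
import re

# Assumed reading of the note: an ASCII letter first, then letters, digits or _,
# and fewer than 60 characters in total (so at most 59).
MODEL_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,58}")


def is_valid_model_name(name: str) -> bool:
    """Check a proposed Dynsim model name against the rules in the note (sketch)."""
    return MODEL_NAME.fullmatch(name) is not None


for name in ["XV5F01", "DIFY5F01_1", "1B03_5F01", "XV-5F01"]:
    print(name, is_valid_model_name(name))  # True, True, False, False
```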
Filter based
Search: (.*):FY(.*)\.OUT
Replace (tieback tag): $1:FI$2.MEAS1
Replace (model name): XV$2
Input Tag   Tieback Tag   Model Name
1B03_MW:FY5F01.OUT   1B03_MW:FI5F01.MEAS1   XV5F01
1B03_MW:FY5F02.OUT   1B03_MW:FI5F02.MEAS1   XV5F02

Block + Parameter Names
Search: (.*):(.*)\.FBCO_(.*)
Replace (tieback tag): $1:$2.FBCIN_$3
Replace (model name): DI$2_$3
Input Tag   Tieback Tag   Model Name
1B03_MW:FY5F01.FBCO_1   1B03_MW:FY5F01.FBCIN_1   DIFY5F01_1
1B03_MW:FY5F01.FBCO_2   1B03_MW:FY5F01.FBCIN_2   DIFY5F01_2
1B03_MW:FY5F01.FBCO_3   1B03_MW:FY5F01.FBCIN_3   DIFY5F01_3
1B03_MW:FY5F02.FBCO_1   1B03_MW:FY5F02.FBCIN_1   DIFY5F02_1
1B03_MW:FY5F02.FBCO_2   1B03_MW:FY5F02.FBCIN_2   DIFY5F02_2
1B03_MW:FY5F02.FBCO_3   1B03_MW:FY5F02.FBCIN_3   DIFY5F02_3

Compound + Block
Search: (.*)MW:FY(.*)\.OUT
Replace (tieback tag): $1AWD:FI$2.POINT
Replace (model name): XV$1$2 (group 1 already ends with the underscore, for example 1B03_)
Input Tag   Tieback Tag   Model Name
1B03_MW:FY5F01.OUT   1B03_AWD:FI5F01.POINT   XV1B03_5F01
1B03_MW:FY5F02.OUT   1B03_AWD:FI5F02.POINT   XV1B03_5F02
1B03_MW:FY5F03.OUT   1B03_AWD:FI5F03.POINT   XV1B03_5F03
1B04_MW:FY5F04.OUT   1B04_AWD:FI5F04.POINT   XV1B04_5F04
1B05_MW:FY5F05.OUT   1B05_AWD:FI5F05.POINT   XV1B05_5F05
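The three rule groups above can be exercised with a short script. This is a minimal sketch, not the Bulk Configurator's own implementation: it assumes the search expression must match the entire tag, uses Python's re module instead of .NET, and rewrites each .NET $n reference as Python's \g<n>. The names apply_rule, FILTER_BASED, BLOCK_PARAM and COMPOUND_BLOCK are hypothetical.

```python
import re
from typing import Optional, Tuple

# Each rule: (search, tieback replace, model replace), based on the examples above.
# A .NET "$n" in a replacement is written "\g<n>" in Python.
FILTER_BASED = (r"(.*):FY(.*)\.OUT", r"\g<1>:FI\g<2>.MEAS1", r"XV\g<2>")
BLOCK_PARAM = (r"(.*):(.*)\.FBCO_(.*)", r"\g<1>:\g<2>.FBCIN_\g<3>", r"DI\g<2>_\g<3>")
COMPOUND_BLOCK = (r"(.*)MW:FY(.*)\.OUT", r"\g<1>AWD:FI\g<2>.POINT", r"XV\g<1>\g<2>")


def apply_rule(rule: Tuple[str, str, str], tag: str) -> Optional[Tuple[str, str]]:
    """Return (tieback tag, model name) for tag, or None if the search does not match."""
    search, tieback, model = rule
    m = re.fullmatch(search, tag)  # assumption: the whole tag must match
    if m is None:
        return None
    return m.expand(tieback), m.expand(model)


print(apply_rule(FILTER_BASED, "1B03_MW:FY5F01.OUT"))
print(apply_rule(BLOCK_PARAM, "1B03_MW:FY5F02.FBCO_3"))
print(apply_rule(COMPOUND_BLOCK, "1B04_MW:FY5F04.OUT"))
```

For the Compound + Block rule, group 1 captures 1B03_ including its trailing underscore, so the expected model name XV1B03_5F01 comes from the template XV$1$2; the template XV$1_$2 would produce XV1B03__5F01.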
Regular Expression Basic Syntax Reference
Characters
Character Description Example
Any character except [\^$.|?*+() All characters except the listed special characters match a single instance of themselves. { and } are literal characters, unless they're part of a valid regular expression token (e.g. the {n} quantifier).
a matches a
\ (backslash) A backslash escapes special characters to suppress their special meaning.
\+ matches +
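\Q...\E Matches the characters between \Q and \E literally, suppressing the meaning of special characters. Not supported by the .NET engine used by the Bulk Configurator; escape each special character with a backslash instead.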
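\xFF where FF are 2 hexadecimal digits Matches the character with the specified hexadecimal character code. In .NET, this is the Unicode character U+0000 through U+00FF rather than a value that depends on the code page. Can be used in character classes.
\n, \r and \t Match an LF character, CR character and a tab character respectively. Can be used in character classes.
\a, \e, \f and \v Match a bell character (\x07), escape character (\x1B), form feed (\x0C) and vertical tab (\x0B) respectively. Can be used in character classes.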
\cA through \cZ Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A. Can be used in character classes.
\cM\cJ matches a DOS/Windows CRLF line break.
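A few of the escapes above in action, using Python's re module as a stand-in. Python has no \cA-style escapes and no \Q...\E, so the sketch writes the CRLF line break as \r\n and uses re.escape to escape a whole literal string; the input strings are made up.

```python
import re

# Unescaped, "+" is a quantifier: 1+1 means "one or more 1s, then 1", so it needs "11".
print(re.search(r"1+1", "1+1=2"))            # None
# Escaped, \+ matches a literal plus sign.
print(re.search(r"1\+1", "1+1=2").group())   # 1+1
# re.escape adds the backslashes when a whole string must match literally.
print(re.escape("1+1"))                      # 1\+1
# \x41 is "A"; \r\n is a DOS/Windows line break (the \cM\cJ of the table).
print(re.findall(r"\x41|\r\n", "A\r\nB"))    # ['A', '\r\n']
```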
Character Classes or Character Sets [abc]
Character Description Example
[ (opening square bracket) Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except \n, \r, \t and \xFF.
Any character except ^-]\ All characters except the listed special characters.
[abc] matches a, b or c
\ (backslash) A backslash escapes special characters to suppress their special meaning.
- (hyphen) Specifies a range of characters. (Specifies a hyphen if placed immediately after the opening [)
[a-zA-Z0-9] matches any letter or digit
^ (caret) immediately after the opening [ Negates the character class, causing it to match a single character not listed in the character class. (Specifies a caret if placed anywhere except after the opening [)
\d, \w and \s Shorthand character classes matching digits, word characters (letters, digits and the underscore) and whitespace respectively. Can be used inside and outside character classes.
[\d\s] matches a character that is a digit or whitespace
\D, \W and \S Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.)
\D matches a character that is not a digit
[\b] Inside a character class, \b is a backspace character.
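The character class rules can be checked with re.findall, which returns every match. Python is used as a stand-in here; Python's \w, like .NET's, includes the underscore, and [\b] is a backspace in both. The sample text is made up.

```python
import re

text = "a-1_ b\x08"  # \x08 is a backspace character

print(re.findall(r"[a-zA-Z0-9]", text))  # ['a', '1', 'b']       letters and digits
print(re.findall(r"[-a]", text))         # ['a', '-']            hyphen is literal right after [
print(re.findall(r"[^a-z]", "ab1c"))     # ['1']                 negated class
print(re.findall(r"[\d\s]", text))       # ['1', ' ']            digit or whitespace
print(re.findall(r"\w", text))           # ['a', '1', '_', 'b']  includes the underscore
print(re.findall(r"[\b]", text))         # ['\x08']              backspace inside a class
```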
Dot
Character Description Example
. (dot) Matches any single character except the line feed \n. (In .NET, the dot does match the carriage return \r.) Most regex flavors have an option to make the dot match line breaks too; in .NET this is RegexOptions.Singleline.
. matches x or (almost) any other character
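A quick illustration of the dot and line breaks, with Python as a stand-in. Python's re.DOTALL flag plays the role of the .NET RegexOptions.Singleline option; in both engines the dot otherwise excludes only \n. The sample text is made up.

```python
import re

text = "abc a\nc"

# By default the dot does not match a line feed.
print(re.findall(r"a.c", text))             # ['abc']
# re.DOTALL (RegexOptions.Singleline in .NET) lets the dot match \n as well.
print(re.findall(r"a.c", text, re.DOTALL))  # ['abc', 'a\nc']
```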
Anchors
Character Description Example
^ (caret) Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.
$ (dollar) Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
.$ matches f in abc\ndef
\A Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks.
\Z Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break.
.\Z matches f in abc\ndef
\z Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks.
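The anchors differ subtly between flavors, so the sketch below is worth reading with care. In Python, re.MULTILINE corresponds to .NET's RegexOptions.Multiline, and Python's \Z behaves like the .NET \z (absolute end of string). Python has no anchor spelled like the .NET \Z, but $ without MULTILINE matches at the same positions. The sample text is made up.

```python
import re

text = "abc\ndef\n"

print(re.findall(r"^.", text))                # ['a']       ^: start of string
print(re.findall(r"^.", text, re.MULTILINE))  # ['a', 'd']  ^: also after each line break
print(re.findall(r".$", text))                # ['f']       $: end, or before a final line break
print(re.findall(r".$", text, re.MULTILINE))  # ['c', 'f']  $: also before each line break
print(re.findall(r"\A.", text, re.MULTILINE)) # ['a']       \A: start of string only
print(re.search(r".\Z", text))                # None        Python \Z = .NET \z
print(re.search(r".\Z", "abc\ndef").group())  # f
```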
Word Boundaries
Character Description Example
\b Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
.\b matches c in abc
\B Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W).
\B.\B matches b in abc
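Word boundaries illustrated with Python's re module, which treats \b and \B the same way as .NET for ASCII text. The tag strings in the last line are made up; the point is that \b anchors FY to the start of a word, so it is not found inside XFY5F02.

```python
import re

print(re.search(r".\b", "abc").group())    # c
print(re.search(r"\B.\B", "abc").group())  # b
# \b keeps FY at the start of a word, so the FY inside XFY5F02 is skipped.
print(re.findall(r"\bFY\w*", "FY5F01 XFY5F02 FY5F03"))  # ['FY5F01', 'FY5F03']
```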
Alternation
Character Description Example
| (pipe) Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of options.
| (pipe) The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression.
abc(def|xyz) matches abcdef or abcxyz
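The effect of the pipe's low precedence is easiest to see side by side. Python is used as a stand-in for the .NET engine, and the input string is made up.

```python
import re

text = "abcxyz"

# The pipe has the lowest precedence: abcdef|xyz means "abcdef" OR "xyz".
print(re.search(r"abcdef|xyz", text).group())    # xyz
# Grouping limits the alternation: abc followed by either def or xyz.
print(re.search(r"abc(def|xyz)", text).group())  # abcxyz
```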
Quantifiers
Character Description Example
? (question mark) Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.
abc? matches ab or abc
?? Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. This construct is often excluded from documentation because of its limited use.
* (star) Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.
".*" matches "def" "ghi" in abc "def" "ghi" jkl
*? (lazy star) Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.
".*?" matches "def" in abc "def" "ghi" jkl
+ (plus) Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
".+" matches "def" "ghi" in abc "def" "ghi" jkl
+? (lazy plus) Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.
".+?" matches "def" in abc "def" "ghi" jkl
{n} Repeats the previous item exactly n times.
{n,m} where n >= 0 and m >= n Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.
{n,m}? where n >= 0 and m >= n Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.
{n,} where n >= 1 Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.
a{2,} matches aaaaa in aaaaa
{n,}? where n >= 1 Repeats the previous item at least n times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.
a{2,}? matches aa in aaaaa
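Greedy and lazy quantifiers compared on the same input, using Python's re module as a stand-in for .NET (both engines backtrack the same way for these patterns). The sample text is made up.

```python
import re

text = 'abc "def" "ghi" jkl'

print(re.search(r'".*"', text).group())   # "def" "ghi"  greedy: up to the last quote
print(re.search(r'".*?"', text).group())  # "def"        lazy: up to the first closing quote
print(re.search(r'".+"', text).group())   # "def" "ghi"
print(re.search(r'".+?"', text).group())  # "def"

print(re.search(r"a{2,}", "aaaaa").group())   # aaaaa
print(re.search(r"a{2,}?", "aaaaa").group())  # aa
print(re.search(r"abc??", "abc").group())     # ab
```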
Other Resources
• Books (several)
• Google® for .NET regular expressions
• Tools
o RegexCoach (http://www.weitz.de/regex-coach)
o RegexBuddy (http://www.regexbuddy.com)