Chapter 6. Regular Expressions

6.2 Position Specifiers

Position specifiers are characters that are used to specify the position of text within a line. Sometimes these are also called anchor characters. The caret character (^) is the starting position specifier. It is used to match a text string occurring at the start of a line of text. The dollar sign ($) is the end-position specifier and is used to refer to a line that ends with a particular string.

Table 6-1 shows the uses of position specifiers.

Table 6-1. Uses of Position Specifiers

Position Specifier Example   Result of Match
--------------------------   -----------------------------------------------
^Miami                       Matches word Miami at the start of a line.
Miami$                       Matches word Miami at the end of a line.
^Miami$                      Matches a line containing only one word, Miami.
^$                           Matches a blank line.
^\^                          Matches a ^ at the beginning of a line.
\$$                          Matches a $ at the end of a line.
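The patterns in Table 6-1 can be tried on a small sample. The following is a minimal sketch: the sample text and the variable names are made up for illustration and are not taken from the chapter. The patterns are single-quoted so that the shell passes them to grep unchanged.

```shell
# Hypothetical sample text, one case per line (not part of myfile).
sample='Miami is warm
I flew to Miami
Miami

^ marks the start
the price is 5$'

# grep -c prints the number of matching lines.
at_start=$(printf '%s\n' "$sample" | grep -c '^Miami')    # lines starting with Miami
at_end=$(printf '%s\n' "$sample" | grep -c 'Miami$')      # lines ending with Miami
only_word=$(printf '%s\n' "$sample" | grep -c '^Miami$')  # lines that are exactly Miami
blank=$(printf '%s\n' "$sample" | grep -c '^$')           # empty lines
caret=$(printf '%s\n' "$sample" | grep -c '^\^')          # lines starting with a literal ^
dollar=$(printf '%s\n' "$sample" | grep -c '\$$')         # lines ending with a literal $

echo "start=$at_start end=$at_end only=$only_word blank=$blank caret=$caret dollar=$dollar"
```

The line that holds only Miami is counted by all of the first three patterns, which is why the first two counts come out higher than the third.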

Use of $

The dollar sign $ is used to match a string if it occurs at the end of a line. Consider a file with the name myfile having contents as shown below after using the cat command.

$ cat myfile
Finally I got it done. The procedure for adding a
new template is completed in three steps.

1- Create a new template.
2- Assign this template to a node with this procedure.
Action -> Agents -> Assign Templates -> Add -> Enter
hostname and template nee -> OK

3- After assignment, the template is still on the ITO
server. To install it on the required server, the
procedure is:
Action -> Agents -> Install/Update SW & Config ->
Select Templates, Node name & Force update -> OK

If step 3 is successful, a message appears on ITO
message browser showing that update process on the node
is complete.

IMPORTANT
===========
The template will not work if the node name specified
in it is unknown to ITO server. In our template we
specified batch_server which was unknown to ITO server
node name in the template. Finally I got out the node
name which is more convenient as ITO automatically takes
current node name if the name is not specified in the
template.

Template Options
===============
1- It runs every minute. Scans the file only if it is
modified.
2- User initiated action is specified to run restart.
3- A short instruction is provided to run the script.
It needs to be modified to make more meaningful.

Let us use the grep command to find all lines in the file that contain the word node.

$ grep node myfile
2- Assign this template to a node with this procedure.
message browser showing that update process on the node
The template will not work if the node name specified
node name in the template. Finally I got out the node
current node name if the name is not specified in the
$

You found out that there are five lines in the file containing the word node. Now let us find only those lines that end with this word by using the $ position specifier.

$ grep node$ myfile
message browser showing that update process on the node
node name in the template. Finally I got out the node

The position specifiers can be used with any command that deals with text-type data.
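As a sketch of the same position specifiers in other text tools, the lines below use sed and awk, both of which accept ^ and $ in their patterns. The two-line sample and the variable names are hypothetical.

```shell
# Hypothetical sample lines, loosely modeled on myfile.
sample='node name in the template. Finally I got out the node
current node name if the name is not specified in the'

# sed -n suppresses the default output; /regex/p prints only matching lines.
ends_with_node=$(printf '%s\n' "$sample" | sed -n '/node$/p')

# awk prints every line for which the pattern matches.
starts_with_node=$(printf '%s\n' "$sample" | awk '/^node/')

echo "$ends_with_node"
echo "$starts_with_node"
```

Both commands select the first sample line only, since it is the one that starts and ends with node.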

Use of ^

The caret character (^) matches a string at the start of a line. Using the same example of finding the word node, now at the start of a line, enter the following command and watch the result.

$ grep ^node myfile
node name in the template. Finally I got out the node
$

As another example, you can list all users on your system with login names starting with the letter "m" as follows.

$ grep ^m /etc/passwd

Getting Rid of Blank Lines

Use of position specifiers is very useful in many cases. To show you one example, ^$ can find blank lines in a file. If you want to count blank lines, you can just pipe output of the grep command to the wc command as in the following.

$ grep ^$ myfile | wc -l
5
$

This command will scan myfile and tell you exactly how many blank lines there are in the file. You can use the grep command to take out all blank lines from the file as shown below. The grep -v command reverses the selection and shows those lines that are not empty.

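$ grep -v ^$ myfile
Finally I got it done. The procedure for adding a
new template is completed in three steps.
1- Create a new template.
2- Assign this template to a node with this procedure.
Action -> Agents -> Assign Templates -> Add -> Enter
hostname and template nee -> OK
3- After assignment, the template is still on the ITO
server. To install it on the required server, the
procedure is:
Action -> Agents -> Install/Update SW & Config ->
Select Templates, Node name & Force update -> OK
If step 3 is successful, a message appears on ITO
message browser showing that update process on the node
is complete.
IMPORTANT
===========
The template will not work if the node name specified
in it is unknown to ITO server. In our template we
specified batch_server which was unknown to ITO server
node name in the template. Finally I got out the node
name which is more convenient as ITO automatically takes
current node name if the name is not specified in the
template.
Template Options
===============
1- It runs every minute. Scans the file only if it is
modified.
2- User initiated action is specified to run restart.
3- A short instruction is provided to run the script.
It needs to be modified to make more meaningful.
$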

Please note that an "empty line" means a line that doesn't contain any characters. Some lines seem to be empty but actually contain a space or tab character. These lines are not matched by the above command. The expression ^[ ]$, where there is a space character between the two square brackets, matches a line that contains exactly one space. To match lines made up of any number of spaces, including none, add an asterisk and enclose the pattern in single quotes so that the shell leaves it alone: '^[ ]*$'.
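The difference between empty lines and lines that only look empty can be checked with a short experiment. This is a sketch on made-up input. The [[:space:]] character class used here is standard POSIX bracket syntax for spaces and tabs, but it is not introduced in this chapter, so treat it as an addition.

```shell
# Hypothetical input: line 2 is empty, line 4 holds one space, line 6 holds a tab.
sample=$(printf 'first\n\nsecond\n \nthird\n\t\nlast')

empty=$(printf '%s\n' "$sample" | grep -c '^$')                   # truly empty lines
one_space=$(printf '%s\n' "$sample" | grep -c '^[ ]$')            # exactly one space
blank_like=$(printf '%s\n' "$sample" | grep -c '^[[:space:]]*$')  # empty or whitespace only
kept=$(printf '%s\n' "$sample" | grep -v -c '^[[:space:]]*$')     # lines with visible text

echo "empty=$empty one_space=$one_space blank_like=$blank_like kept=$kept"
```

Using grep -v with the last pattern, without -c, prints the file with both empty and whitespace-only lines removed.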

Escaping Position Specifiers

Sometimes the actual string contains one of the position specifiers or meta characters. If you pass this string as-is to a command, the command will give the meta character its special meaning, and you will not get correct results. To make the command treat such a character as ordinary text, you need to escape it by placing a backslash (\) before the character. The backslash is also special to the shell, which removes an unquoted backslash before the command ever sees it, so enclose the pattern in single quotes. For example, if you want to search for the $ character in a file, you will use the grep '\$' command instead of grep $. If you don't escape the $ character, grep takes it as the end-of-line specifier, which matches every line, and the command will display all contents of the file.

Please note that \ is also a special character. To match a backslash, you need to use two backslashes \\ in the string, again inside single quotes, as in grep '\\'.
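The effect of escaping can be seen by counting matches with and without the backslash. The sketch below uses a made-up three-line sample, and every pattern is single-quoted so that the backslashes reach grep intact.

```shell
# Hypothetical three-line sample.
sample='total: $40
path: C:\temp
plain line'

# Unescaped $ is the end-of-line specifier, so every line matches.
all_lines=$(printf '%s\n' "$sample" | grep -c '$')

# Escaped $ is a literal dollar sign.
dollar_lines=$(printf '%s\n' "$sample" | grep -c '\$')

# Two backslashes in the pattern stand for one literal backslash.
backslash_lines=$(printf '%s\n' "$sample" | grep -c '\\')

echo "all=$all_lines dollar=$dollar_lines backslash=$backslash_lines"
```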

6.3 Meta Characters

Meta characters are those that have special meaning when used within a regular expression. You already have seen two meta characters used as position specifiers. A list of other meta characters and their meanings is shown in Table 6-2.

Table 6-2. Meta Characters Used in Regular Expressions

Character   Description
---------   ------------------------------------------------------------
*           Matches zero or more occurrences of the preceding character.
.           Matches any character, one at a time.
[]          One of the enclosed characters is matched. The enclosed
            characters may be a list of characters or a range.
\{n1,n2\}   Matches minimum of n1 and maximum of n2 occurrences of the
            preceding character or regular expression.
\<          Matches at the beginning of the word.
\>          Matches at the end of the word.
\           The character following acts as a regular character, not a
            meta character. It is used for escaping a meta character.
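The repetition count is the only entry in Table 6-2 that has no worked example later in the chapter. As a minimal sketch on made-up input, the following matches lines that contain two or three "o" characters followed by "d".

```shell
# Hypothetical sample: one to four o characters followed by d.
sample='od
ood
oood
ooood'

# \{2,3\} applies to the preceding o. The ^ and $ position specifiers stop
# a longer run of o characters from matching through one of its parts.
two_or_three=$(printf '%s\n' "$sample" | grep '^o\{2,3\}d$')

# Without position specifiers, ooood also matches because it contains oood.
unanchored=$(printf '%s\n' "$sample" | grep -c 'o\{2,3\}d')

echo "$two_or_three"
echo "unanchored=$unanchored"
```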

Use of the Asterisk * Character

The asterisk character is used to match zero or more occurrences of the preceding character. If you take our example of myfile, the result of the following grep command will be as shown below.

$ grep mom* myfile
name which is more convenient as ITO automatically takes
modified.
It needs to be modified to make more meaningful.
$

Is this what you were expecting? The grep command found all text patterns that start with "mo" and after that have zero or more occurrences of the letter m. The words that match these criteria are "more" and "modified." Use of * with only a single character is meaningless as it will match anything. For example, m* matches any number of "m" characters, including zero. A line with no "m" in it at all still matches, because it contains zero occurrences of "m". So one must be careful when using the asterisk (*) character in regular expressions. It is also safer to enclose a pattern containing * in single quotes, as in grep 'mom*' myfile, because the shell otherwise replaces an unquoted mom* with any matching file names in the current directory.
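The behavior of the asterisk described above can be confirmed by counting matching lines. This is a sketch with a made-up sample. The pattern mm* is not from the text and is included only to show one way of requiring at least one "m".

```shell
# Hypothetical sample: the third line contains no m at all.
sample='more meaningful
modified
a node
mmm'

# mom* means "mo" followed by zero or more m characters.
mo_lines=$(printf '%s\n' "$sample" | grep -c 'mom*')

# m* alone can match zero characters, so every line matches.
any_lines=$(printf '%s\n' "$sample" | grep -c 'm*')

# mm* requires one m and then allows zero or more further m characters.
m_lines=$(printf '%s\n' "$sample" | grep -c 'mm*')

echo "mo=$mo_lines any=$any_lines m=$m_lines"
```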

Use of the Dot (.) Character

The dot character matches any character excluding the new line character, one at a time. See the example below where we used the dot to match all words containing the letter "s" followed by any character, followed by the letter "e".

$ grep s.e myfile

new template is completed in three steps.
If step 3 is successful, a message appears on ITO
The template will not work if the node name specified
specified batch_server which was unknown to ITO server
current node name if the name is not specified in the
1- It runs every minute. Scans the file only if it is
2- User initiated action is specified to run restart.
$

In every line shown above, there is a word containing an "s" followed by another character and then "e". The second-to-last line is of special interest, where this letter combination occurs when we combine the two words "runs every." Here "s" is followed by a space and then an "e".
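The point about "runs every" can be reproduced in isolation. The sketch below uses four made-up lines: the dot accepts a letter or a space between "s" and "e", but it must match exactly one character.

```shell
# Hypothetical sample lines.
sample='three steps
It runs every minute
ITO server
se'

# s.e needs exactly one character, of any kind, between s and e.
matches=$(printf '%s\n' "$sample" | grep 's.e')

echo "$matches"
```

The line "ITO server" is not printed because its "s" is followed directly by "e", and the line "se" is too short to match at all.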

Use of Range Characters [...]

Consider that you want to list all files in a directory that start with the letters a, b, c, d, or e. You can use a command such as:

$ ls a* b* c* d* e*

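This is not convenient if the list grows. The alternate way is to use a range pattern like the following. Note that these ls examples use the shell's file name matching, in which * stands for any string of characters; square brackets work the same way there as in regular expressions.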

$ ls [a-e]*
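To try the range pattern without touching real files, one option is a scratch directory. The sketch below assumes the mktemp command is available, and the file names in it are invented for the example.

```shell
# Create a throwaway directory with a few hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/apple" "$workdir/banana" "$workdir/cherry" \
      "$workdir/date" "$workdir/eggs" "$workdir/fig" "$workdir/grape"

# The shell expands [a-e]* to the matching names before ls runs.
# The subshell keeps the cd from affecting the rest of the session.
listed=$(cd "$workdir" && ls -d [a-e]* | tr '\n' ' ')
echo "$listed"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The files fig and grape are left out because their first letters fall outside the range a to e.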

Square brackets are used to specify ranges of characters. For example, if you want to match all words that contain any of the capital letters from A to D, you can use [A-D] in the regular expression.

$ grep [A-D] myfile

1- Create a new template.
2- Assign this template to a node with this procedure.
Action -> Agents -> Assign Templates -> Add -> Enter
3- After assignment, the template is still on the ITO
Action -> Agents -> Install/Update SW & Config ->
IMPORTANT
3- A short instruction is provided to run the script.
$

Similarly, if you need to find lowercase vowels, [aeiou] will serve the purpose, and \<[aeiou] restricts the match to words starting with one. If such words are desired to be at the beginning of a line, we can use ^[aeiou]. Multiple ranges can also be used, such as ^A[a-z0-9], which matches a word that is at the start of a line, has "A" as the first character, and has either a lowercase letter or a digit as the second character.

The selection criteria can also be reversed using ^ as the first character within the square brackets. An expression [^0-9] matches any character other than a digit.
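The bracket expressions discussed in this section can be compared side by side. The following is a sketch on a made-up sample. Setting LC_ALL=C is an added precaution, not something the chapter requires, so that ranges such as A-D follow plain ASCII order on every system.

```shell
# Use the C locale so that ranges follow ASCII order (an assumption for portability).
LC_ALL=C
export LC_ALL

# Hypothetical sample lines.
sample='Action menu
install it
1999
template 7
under review'

upper_a_to_d=$(printf '%s\n' "$sample" | grep -c '[A-D]')       # contains A, B, C, or D
vowel_start=$(printf '%s\n' "$sample" | grep -c '^[aeiou]')     # begins with a lowercase vowel
a_then_lower=$(printf '%s\n' "$sample" | grep -c '^A[a-z0-9]')  # A, then lowercase letter or digit
has_non_digit=$(printf '%s\n' "$sample" | grep -c '[^0-9]')     # at least one non-digit character

echo "A-D=$upper_a_to_d vowel=$vowel_start A+lower=$a_then_lower non-digit=$has_non_digit"
```

Note that [^0-9] does not mean "a line without digits." It matches any line that has at least one character that is not a digit, which is why only the line 1999 is excluded.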

Use of the Word Delimiters \< and \>

These two sets of meta characters can be used to match complete words. The \< character matches the start of a word and \> checks the end of a word. Without these meta characters, all regular expressions match a string irrespective of its presence in the start, end, or middle of a word. If we want to match all occurrences of "this" or "This" as a whole word in a file, we can use the following grep command.

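$ grep '\<[tT]his\>' myfile

The pattern is enclosed in single quotes so that the shell does not remove the backslashes or treat < and > as redirection characters. If you use \< only, the pattern is matched if it occurs in the start of a word. Using only \> matches a pattern occurring in the end of a word.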
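The effect of each word delimiter can be seen by counting matches on a made-up sample. This sketch assumes a grep that supports \< and \>, as the grep described in this chapter does; GNU grep does as well, but some other implementations may not.

```shell
# Hypothetical sample lines.
sample='This is a test
do this now
thistle grows
whist club'

whole_word=$(printf '%s\n' "$sample" | grep -c '\<[tT]his\>')  # this or This as a complete word
word_start=$(printf '%s\n' "$sample" | grep -c '\<[tT]his')    # words beginning with this or This
anywhere=$(printf '%s\n' "$sample" | grep -c 'his')            # his anywhere in the line
word_end=$(printf '%s\n' "$sample" | grep -c 'his\>')          # words ending with his

echo "whole=$whole_word start=$word_start anywhere=$anywhere end=$word_end"
```

The word thistle is counted by the second pattern but not by the first, and whist is counted only when no delimiter is used.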

6.4 Standard and Extended Regular Expressions

Sometimes you may want to make logical OR operations in regular expressions. As an example, you may need to find all lines in your saved mail in the $HOME/mbox file that contain a sender's address or the date of sending. All such lines start with the word "From:" or "Date:". Using a standard regular expression it would be very difficult to extract this information. The egrep command uses an extended regular expression as opposed to the grep command that uses standard regular expressions. If you use parentheses and the logical OR operator (|) in extended regular expressions with the egrep command, the above-mentioned information can be extracted as follows.

$ egrep '^(From|Date):' $HOME/mbox

Note that we don't use \ prior to parentheses in extended regular expressions.

You may think that this task can also be accomplished using a standard regular expression with the following command; it might seem correct at first sight, but it is not.

$ grep '[FD][ra][ot][me]:' $HOME/mbox

This command does not work because it will also match "Fate," "Drom," "Droe," and so on. Extended regular expressions are used with the egrep and awk commands. Sometimes it is more convenient to use standard expressions. At other times, extended regular expressions may be more useful. There is no hard and fast rule as to which type of expression you should use. I use both of these and sometimes combine commands using both types of expressions with pipes to get a desired result. With practice you will come to know the appropriate use.
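The difference between the two approaches shows up on a small sample. The sketch below uses made-up mail header lines and grep -E, which POSIX specifies as the equivalent of egrep. The "Fate:" line is invented to expose the weakness of the bracket pattern.

```shell
# Hypothetical header lines; the address and date are invented.
sample='From: alice@example.com
Date: Mon, 1 Jan 2001
Subject: hello
Fate: not a real header'

# Extended syntax: | means OR, and parentheses group without backslashes.
wanted=$(printf '%s\n' "$sample" | grep -E -c '^(From|Date):')

# The bracket pattern accepts any mix of the listed letters, so Fate: matches too.
loose=$(printf '%s\n' "$sample" | grep -c '[FD][ra][ot][me]:')

echo "wanted=$wanted loose=$loose"
```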

Chapter Summary

Regular expressions are very useful in day-to-day work where you need to match character patterns. In this chapter, you learned how a UNIX command is executed. Position specifiers are used to match a pattern at the start or end of a line, and you learned the use of caret ^ and dollar $ position specifiers. Then you studied other meta characters and their use in regular expressions. The asterisk character is used to match zero or more occurrences of the preceding character. The dot character matches any one character at a time, excluding the new line character. Square brackets [] are used for specifying a list or range of characters, one of which must match. You also used word delimiters \< and \>. These are used to match a complete word during a text pattern matching process.

Chapter Review Questions

1: Describe the process used by the UNIX shell for command execution.

2: What is the command to find all lines in a file that start or end with the word "an"?

3: What is the result of the following command?

grep ^[a-z]$ ?

4: Write a command that lists all users in the /etc/passwd file whose name starts with a vowel and who are assigned the POSIX shell (/usr/bin/sh).

Test Your Knowledge

1: The purpose of the command grep ^Test$ is:

to find the word "Test" in the start of a line
to find the word "Test" in the end of a line
to find the word "Test" in the start or end of a line
to find a line containing a word "Test" only

2: Square brackets in pattern matching are used for:

escaping meta characters
specifying a range of characters; all of which must be present for a match
specifying a range of characters; only one of which must be present for a match
specifying a range of characters; one or more of which must be present for a match

3: A regular expression \<join matches:

all words starting with "join"
all words ending with "join"
all words starting or ending with "join"
none of the above

4: The grep command can use:

standard regular expressions only
extended regular expressions only
both standard and extended regular expressions
either standard or extended regular expressions but not both of these simultaneously

5: Which of these is NOT a meta character?

*
\
$
-
