Sed is the ultimate stream editor. If that sounds strange, picture a stream flowing through a pipe. Okay, you can't see a stream if it's inside a pipe.
That's what I get for attempting a flowing analogy.
Anyhow, sed is a marvelous utility. Unfortunately, most people never learn its real power. The language is very simple, but the documentation is terrible.
The Solaris on-line manual pages for sed are five pages long, and two of those pages describe the 34 different errors you can get. A program that spends as much space documenting the errors as it does documenting the language has a serious learning curve.
Sed has several commands, but most people only learn the substitute command: s. The substitute command changes occurrences of the regular expression into a new value; by default, only the first match on each line is changed. A simple example is changing "day" in the "old" file to "night" in the "new" file:
sed s/day/night/ <old >new
I didn't put quotes around the argument because this example didn't need them. If you read my earlier tutorial, you would understand why it doesn't need quotes. If you have meta-characters in the command, quotes are necessary.
In any case, quoting is a good habit, and I will quote all future examples. That is:
sed 's/day/night/' <old >new
There are four parts to this substitute command:
s          Substitute command
/../../    Delimiter
day        Regular Expression Pattern String
night      Replacement string
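To watch the four parts at work without creating any files, here is a minimal sketch that pipes one sample line into sed. The sample text and the variable names are made up for illustration. It also shows that, without a trailing g flag, only the first match on each line is replaced.

```shell
# Sample input invented for this illustration.
line='day after day'

# Without a flag, s replaces only the first match on each line.
first=$(printf '%s\n' "$line" | sed 's/day/night/')

# With the g flag, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/day/night/g')

printf '%s\n' "$first"   # night after day
printf '%s\n' "$all"     # night after night
```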
sed in shell script
If you have many commands and they won't fit neatly on one line, you can break up the line using a backslash:
sed -e 's/a/A/g' \
-e 's/o/O/g' \
-e 's/u/U/g' <old >new
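If you don't have an "old" file handy, you can try the same three commands on a line of text from a pipe. This is only a sketch; the sample sentence and the variable name are my own inventions.

```shell
# Sample text invented for this illustration.
# Each -e adds one more editing command; the backslashes continue the line.
result=$(printf 'a quick brown fox jumps\n' | sed -e 's/a/A/g' \
    -e 's/o/O/g' \
    -e 's/u/U/g')

printf '%s\n' "$result"   # A qUick brOwn fOx jUmps
```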
Sed is extremely powerful, and you can do things in sed that you can't do in any standard word processor. And because sed is external to the word processor and comes with every Unix system in the world, once you learn sed you'll have a very handy tool in your toolkit, even if (like me) you rarely use Unix.
How it works: You feed sed a script of editing commands (like, "change every line that begins with a colon to such-and-such") and sed sends your revised text to the screen. To save the revisions on disk, use the redirection arrow,
>newfile.txt. Sample syntax:
sed "one-or-two-sed-commands" input.file >newfile.txt
sed -f bigger_sed.script input.file >newfile.txt
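Here is a small sketch of the second form, using a script file. The file names (vowels.sed and friends) and the sample text are hypothetical, and I am assuming your system has mktemp -d so the example can work in a scratch directory.

```shell
# Work in a scratch directory so nothing existing is overwritten.
# (Assumes mktemp -d is available, as it is on most current systems.)
tmpdir=$(mktemp -d)

# A two-command sed script; the file name is hypothetical.
cat > "$tmpdir/vowels.sed" <<'EOF'
s/a/A/g
s/o/O/g
EOF

# Sample input invented for this illustration.
printf 'a boat\nno oars\n' > "$tmpdir/input.file"

# Run the script and save the revisions with the redirection arrow.
sed -f "$tmpdir/vowels.sed" "$tmpdir/input.file" > "$tmpdir/newfile.txt"

result=$(cat "$tmpdir/newfile.txt")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

The first line comes out as "A bOAt" and the second as "nO OArs".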
awk:
Awk is a ``pattern scanning and processing language'' which is useful for writing quick and dirty programs that don't have to be compiled. The calling syntax of awk is like sed:
UNIX> awk program [ file ] or
UNIX> awk -f program-file [ file ]
Like sed, awk can work on standard input or on a file. Like the shell, if you start an awk program with
#!/bin/awk -f
then you can execute the program directly from the shell.
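The two calling forms can be sketched as follows. The file names and the sample text are made up, and I am again assuming mktemp -d is available. The tiny program { print } just prints each input line unchanged.

```shell
# Scratch directory and sample input, both invented for this illustration.
tmpdir=$(mktemp -d)
printf 'alpha beta\ngamma delta\n' > "$tmpdir/input"

# Form 1: the program is given on the command line; input comes from a file.
from_file=$(awk '{ print }' "$tmpdir/input")

# Form 2: the program lives in its own file and is read with -f;
# here the input arrives on standard input instead.
printf '{ print }\n' > "$tmpdir/copy.awk"
from_stdin=$(awk -f "$tmpdir/copy.awk" < "$tmpdir/input")

printf '%s\n' "$from_file"
rm -r "$tmpdir"
```

Both forms produce the same two lines of output.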
Most systems also have nawk, which stands for ``new awk.'' Nawk has many more features than awk and is generally more useful. I am just going to cover awk, but you should check out nawk too on your own time. Nawk has some nice things, like a random number generator, that awk doesn't have.
awk programs are composed of ``pattern-action'' statements of the form:
pattern { action }
What such a statement does is apply the action to all lines that match the pattern. If there is no pattern, then it applies the action to all lines. If there is
no action, then the default action is to copy the line to standard output.
Patterns can be regular expressions enclosed in slashes (they can be more than that, but for now, just assume that they are regular expressions).
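The three cases (pattern and action, no pattern, no action) can be sketched with one-line programs given on the command line. The two input lines and the variable names are invented for illustration.

```shell
# Sample input invented for this illustration.
input=$(printf 'Jim Smith\nBill Clinton\n')

# Pattern and action: the action runs only on lines matching /Jim/.
both=$(printf '%s\n' "$input" | awk '/Jim/ { print "found:", $0 }')

# No pattern: the action runs on every line.
no_pattern=$(printf '%s\n' "$input" | awk '{ print "line:", $0 }')

# No action: matching lines are copied to standard output.
no_action=$(printf '%s\n' "$input" | awk '/Bill/')

printf '%s\n' "$both"        # found: Jim Smith
printf '%s\n' "$no_action"   # Bill Clinton
```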
So, for example, the program awkgrep works just like ``grep Jim''.
UNIX> cat awkgrep
#!/bin/awk -f
/Jim/
UNIX> cat input
Which of these lines doesn't belong:
Bill Clinton
Actions basically look like C programs. There are some big differences, but for the most part, you can do most basic things that you can do in C.
Awk breaks up each line into fields, which are basically whitespace-separated words. You can get at word i by specifying $i. The variable NF contains the number of words on the line. The variable $0 is the line itself.
So, to print out the first and last words on each line, you can do:
UNIX> cat input
Which of these lines doesn't belong:
Bill Clinton
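A minimal sketch of a first-and-last-word program, run here as a one-liner on the two input lines shown above, might look like this. The variable name is my own.

```shell
# $1 is the first word on the line; $NF is the last,
# because NF holds the number of words.
first_last=$(printf "Which of these lines doesn't belong:\nBill Clinton\n" |
    awk '{ print $1, $NF }')

printf '%s\n' "$first_last"
```

This prints "Which belong:" for the first line and "Bill Clinton" for the second.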
An alternative awkgrep prints out $0 when it finds the pattern:
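A sketch of such a program, assuming the pattern is still /Jim/ and using three sample lines I made up, could be:

```shell
# Sample input invented for this illustration.
input=$(printf 'Jim Smith\nBill Clinton\nJimmy Carter\n')

# Print the whole line ($0) whenever the pattern matches.
matches=$(printf '%s\n' "$input" | awk '/Jim/ { print $0 }')

# For comparison, the same search with grep.
grepped=$(printf '%s\n' "$input" | grep Jim)

printf '%s\n' "$matches"
```

Both commands select the "Jim Smith" and "Jimmy Carter" lines.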
Awk has a printf just like C. You don't have to use parentheses when you call it (although you can if you'd like). Unlike print, printf will not print a newline if you don't want it to. So, for example, awkrev reverses the lines of a file:
UNIX> cat awkrev
#!/bin/awk -f
{ for (i = NF; i > 0; i--) printf "%s ", $i
  printf "\n" }
UNIX> awkrev input
belong: doesn't lines these of Which
Clinton Bill
A few things that you'll notice about awkrev: Actions can be multiline. You don't need semicolons to separate lines like in C. However, you can specify multiple commands on a line and separate them with semicolons as in C.
And you can block commands with curly braces as in C. If you want a command to span two lines (this often happens with complex printf statements), you need to end the first line with a backslash.
Also, you'll notice that awkrev didn't declare the variable i. Awk just figured out that it's an integer.
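Here is a small sketch that pulls those points together: semicolons on one line, curly braces for blocking, a backslash to continue a long printf, and variables that are never declared. The input text and the variable names (n, longest, total) are invented for illustration.

```shell
report=$(printf 'one two three\nfour five\n' | awk '
{
    n = NF; longest = ""              # two statements on one line
    for (i = 1; i <= NF; i++) {       # braces group the two statements
        if (length($i) > length(longest)) longest = $i
        total++                       # never declared; starts out as 0
    }
    printf "%d words, longest is %s, running total %d\n", \
        n, longest, total
}')

printf '%s\n' "$report"
```

The first line reports 3 words with "three" as the longest, and the second reports 2 words with "four" as the longest and a running total of 5. Note that total keeps its value from one line to the next, while n and longest are reset by the action each time.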