The language L is regular because it can be defined by the regular expression
[a(a + b)*a + b(a + b)*b]*
and is accepted by the TG
[Figure: TG for this language; the surviving edge labels read "a, b"]
COMPLEMENTS AND INTERSECTIONS
DEFINITION
If L is a language over the alphabet Σ, we define its complement, L', to be the language of all strings of letters from Σ that are not words in L. •
Many authors use the bar notation L̄ to denote the complement of the language L, but as with most writing for computers, we will use the form more easily typed.
Complements and Intersections 173
EXAMPLE
If L is the language over the alphabet Σ = {a b} of all words that have a double a in them, then L' is the language of all words that do not have a double a. •
It is important to specify the alphabet Σ, or else the complement of L might contain cat, dog, frog, . . . , because these are definitely not strings in L.
Notice that the complement of the language L' is the language L. We could write this as
(L')' = L
This is a theorem from set theory that is not restricted only to languages.
THEOREM 11
If L is a regular language, then L' is also a regular language. In other words, the set of regular languages is closed under complementation.
PROOF
If L is a regular language, we know from Kleene's theorem that there is some FA that accepts the language L. Some of the states of this FA are final states and, most likely, some are not. Let us reverse the final status of each state; that is, if it was a final state, make it a nonfinal state, and if it was a nonfinal state, make it a final state. If an input string formerly ended in a nonfinal state, it now ends in a final state and vice versa. This new machine we have built accepts all input strings that were not accepted by the original FA (all the words in L') and rejects all the input strings that the FA used to accept (the words in L). Therefore, this machine accepts exactly the language L'. So, by Kleene's theorem, L' is regular. •
Notice that even the final status of the - state gets reversed: - ↔ ±
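The construction in this proof can be sketched in a few lines of Python. The dictionary encoding of an FA (keys "start", "finals", "delta") and the function names are assumptions made for this illustration only; the sample machine is an FA for the words containing a double a.

```python
from itertools import product

# Hypothetical encoding of an FA: a start state, a set of final states,
# and a complete transition table delta[state][letter].
def accepts(fa, word):
    state = fa["start"]
    for letter in word:
        state = fa["delta"][state][letter]
    return state in fa["finals"]

def complement(fa):
    """Reverse the final status of every state; the edges are unchanged."""
    all_states = set(fa["delta"])
    return {"start": fa["start"],
            "finals": all_states - fa["finals"],
            "delta": fa["delta"]}

# An FA over {a, b} that accepts exactly the words containing a double a.
double_a = {
    "start": 1,
    "finals": {3},
    "delta": {1: {"a": 2, "b": 1},
              2: {"a": 3, "b": 1},
              3: {"a": 3, "b": 3}},
}
no_double_a = complement(double_a)

# Every word up to length 6 is accepted by exactly one of the two machines.
words = ["".join(w) for n in range(7) for w in product("ab", repeat=n)]
exactly_one = all(accepts(double_a, w) != accepts(no_double_a, w) for w in words)
```

The flip only works because the table is complete: every state has an edge for every letter, so every input string ends in some state. A TG or a machine with missing edges would have to be converted to an FA first.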
EXAMPLE
An FA that accepts only the strings aba and abb is shown below:
[Figure: FA accepting only aba and abb]
An FA that accepts all strings other than aba and abb is shown below:
[Figure: the same FA with the final status of every state reversed] •
THEOREM 12
If L1 and L2 are regular languages, then L1 ∩ L2 is also a regular language. In other words, the set of regular languages is closed under intersection.
PROOF
By De Morgan's law for sets of any kind (regular languages or not):
L1 ∩ L2 = (L1' + L2')'
This is illustrated by the Venn diagrams below:
[Figure: Venn diagrams illustrating (L1' + L2')' = L1 ∩ L2]
This means that the language L1 ∩ L2 consists of all words that are not in either L1' or L2'. Because L1 and L2 are regular, so are L1' and L2'. Since L1' and L2' are regular, so is L1' + L2'. And because L1' + L2' is regular, so is (L1' + L2')', which means L1 ∩ L2 is regular. •
This is a case of "the proof is quicker than the eye." When we start with two languages L1 and L2, which are known to be regular because they are defined by FAs, finding the FA for L1 ∩ L2 is not as easy as the proof makes it seem. If L1 and L2 are defined by regular expressions, finding L1 ∩ L2 can be even harder. However, all the algorithms that we need for these constructions have already been developed.
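De Morgan's law can be checked on a small scale with ordinary Python sets. This is only a sketch: the universe is cut off at strings of length 6 so that the sets are finite (an assumption made for the illustration), and the two sample languages are chosen here for convenience.

```python
from itertools import product

# All strings over {a, b} of length at most 6: a finite stand-in for the
# set of all strings, used only so the languages can be listed.
UNIVERSE = {"".join(w) for n in range(7) for w in product("ab", repeat=n)}

L1 = {w for w in UNIVERSE if "aa" in w}               # has a double a
L2 = {w for w in UNIVERSE if w.count("a") % 2 == 0}   # even number of a's

def comp(language):
    """Complement relative to the finite universe."""
    return UNIVERSE - language

# L1 ∩ L2 = (L1' + L2')', where + is union.
via_de_morgan = comp(comp(L1) | comp(L2))
law_holds = via_de_morgan == (L1 & L2)
```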
EXAMPLE
Let us work out one example in complete detail. We begin with two languages over Σ = {a b}.
L1 = all strings with a double a
L2 = all strings with an even number of a's
These languages are not the same, because aaa is in L1 but not in L2 and aba is in L2 but not in L1.
They are both regular languages because they are defined by the following regular expressions (among others):
r1 = (a + b)*aa(a + b)*
r2 = b*(ab*ab*)*
The regular expression r2 is somewhat new to us. A word in the language L2 can have some b's in the front, but then whenever there is an a, it is balanced (after some b's) by another a. This gives us factors of the form (ab*ab*). The word can have as many factors of this form as it wants. It can end in an a or a b.
Because these two languages are regular, Kleene's theorem says that they can also be defined by FAs. The two smallest of these are
[Figure: FA1, a three-state machine for L1, and FA2, a two-state machine for L2]
In the first machine, the first a we read takes us to the middle state. This is our opportunity to find a double a. If we read another a from the input string while in the middle state, we move to the final state where we remain. If we miss our chance and read a b, we go back to -. If we never get past the middle state, the word has no double a and is rejected. We have seen this before.
The second machine switches from the left state to the right state or from the right state to the left state every time it reads an a. It ignores all b's. If the string begins on the left and ends on the left, it must have made an even number of left/right switches. Therefore, the strings this machine accepts are exactly those in L2. We have also seen this before.
Now the first step in building the machine (and regular expression) for L1 ∩ L2 is to find the machines that accept the complementary languages L1' and L2'. Although it is not necessary for the successful execution of the algorithm, the English description of these languages is
L1' = all strings that do not contain the substring aa
L2' = all strings having an odd number of a's
In the proof of the theorem where the complement of a regular language is regular, we gave the algorithm for building the machines that accept these languages. All that we have to do is reverse what is a final state and what is not a final state. The machines for these languages are then
[Figure: FA1' and FA2', the machines above with the final status of every state reversed]
Even if we are going to want both the regular expression and the FA for the intersection language, we do not need to find the regular expressions that go with these two component machines. However, it is good exercise and the algorithm for doing this was presented as part of the proof of Kleene's theorem. Recall that we go through stages of transition graphs with edges labeled by regular expressions. FA1' becomes
[Figure: transition graph for FA1']
State 3 is part of no path from - to +, so it can be dropped. To bypass state 2, we need to join the incoming a-edge with both outgoing edges (b-edge to 1 and Λ-edge to +). When we add the two loops, we get b + ab and the sum of the two edges from 1 to + is a + Λ, so the machine looks like this:
[Figure: transition graph with loop b + ab at state 1 and an edge a + Λ from state 1 to +]
The last step is to bypass state 1. To do this, we concatenate the incoming Λ-label with the loop label starred, (b + ab)*, concatenated with the outgoing (a + Λ)-label to produce one edge from - to + with the regular expression for L1'.
r1' = (b + ab)*(a + Λ)
Let us now do the same thing for the language L2'. FA2' becomes
[Figure: transition graph for FA2']
Let us start the simplification of this picture by eliminating state 2. There is one incoming edge, a loop, and two outgoing edges, so we need to replace them with only two edges:
The path 1-2-2-1 becomes a loop at 1 and the path 1-2-2-+ becomes an edge from 1 to +. After bypassing state 2 and adding the two loop labels, we have
[Figure: transition graph with loop b + ab*a at state 1]
We can now eliminate state 1 and we have a single edge from - to +, which gives us the regular expression
r2' = (b + ab*a)*ab*
This is one of several regular expressions that define the language of all words with an odd number of a's. Another is
b*ab*(ab*ab*)*
which we get by adding the factor b*a in front of the regular expression for L2. This works because words with an odd number of a's can be interpreted as b*a in front of words with an even number of a's. The fact that these two different regular expressions define the same language is not obvious. The question, "How can we tell when two regular expressions are equal?", will be answered in Chapter 11.
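Short of a real decision procedure, the two expressions can at least be compared on all short words. The sketch below uses Python's re module, where the book's "+" (union) is written "|"; the length bound of 10 is an arbitrary choice, so agreement here is evidence and not a proof.

```python
import re
from itertools import product

# Python's re syntax writes the union sign "+" of the text as "|".
r_first = re.compile(r"(b|ab*a)*ab*")
r_second = re.compile(r"b*ab*(ab*ab*)*")

def words(max_len):
    """Yield every string over {a, b} of length at most max_len."""
    for n in range(max_len + 1):
        for w in product("ab", repeat=n):
            yield "".join(w)

# Words on which the two expressions disagree with each other.
disagreements = [w for w in words(10)
                 if bool(r_first.fullmatch(w)) != bool(r_second.fullmatch(w))]

# Words on which the first expression disagrees with "odd number of a's".
odd_mismatches = [w for w in words(10)
                  if bool(r_first.fullmatch(w)) != (w.count("a") % 2 == 1)]
```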
We now have regular expressions for L1' and L2', so we can write the regular expression for L1' + L2'. This will be
r1' + r2' = (b + ab)*(Λ + a) + (b + ab*a)*ab*
We must now go in the other direction and make this regular expression into an FA so that we can take its complement to get the FA that defines L1 ∩ L2.
To build the FA that corresponds to a complicated regular expression is no picnic, as we remember from the proof of Kleene's theorem, but it can be done. However, not by anybody as reasonable as ourselves. Clever people like us can always find a better way.
An alternative approach is to make the machine for L1' + L2' directly from the machines for L1' and L2' without resorting to regular expressions.
Let us label the states in the two machines for FA1' and FA2' as shown:
[Figure: FA1' with states x1, x2, x3 and FA2' with states y1, y2]
where the start states are x1 and y1 and the final states are x1, x2, and y2. The six possible combination states are
z1 = x1 or y1   start, final (words ending here are accepted on FA1')
z2 = x1 or y2   final (words ending here are accepted on FA1' and FA2')
z3 = x2 or y1   final (words ending here are accepted on FA1')
z4 = x2 or y2   final (words ending here are accepted on FA1' and FA2')
z5 = x3 or y1   not final on either machine
z6 = x3 or y2   final (words ending here are accepted on FA2')
The transition table for this machine is
         a     b
±z1      z4    z1
+z2      z3    z2
+z3      z6    z1
+z4      z5    z2
 z5      z6    z5
+z6      z5    z6
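The table above can be regenerated mechanically. In the sketch below, the transitions of FA1' and FA2' are reconstructed from the prose descriptions of the two machines (the pictures themselves are not reproduced here), and the dictionary layout and function name are assumptions of this illustration.

```python
# FA1' (no double a) and FA2' (odd number of a's), with the state labels
# used in the text. Transitions are reconstructed from the prose description.
FA1C = {"start": "x1", "finals": {"x1", "x2"},
        "delta": {"x1": {"a": "x2", "b": "x1"},
                  "x2": {"a": "x3", "b": "x1"},
                  "x3": {"a": "x3", "b": "x3"}}}
FA2C = {"start": "y1", "finals": {"y2"},
        "delta": {"y1": {"a": "y2", "b": "y1"},
                  "y2": {"a": "y1", "b": "y2"}}}

def union_machine(m1, m2, alphabet="ab"):
    """Pair up states; a pair is final if either component is final."""
    start = (m1["start"], m2["start"])
    delta, finals, todo = {}, set(), [start]
    while todo:
        pair = todo.pop()
        if pair in delta:          # already expanded
            continue
        x, y = pair
        if x in m1["finals"] or y in m2["finals"]:
            finals.add(pair)
        delta[pair] = {c: (m1["delta"][x][c], m2["delta"][y][c])
                       for c in alphabet}
        todo.extend(delta[pair].values())
    return {"start": start, "finals": finals, "delta": delta}

union = union_machine(FA1C, FA2C)

# Rename the pairs with the numbering z1..z6 used in the text.
names = {("x1", "y1"): "z1", ("x1", "y2"): "z2", ("x2", "y1"): "z3",
         ("x2", "y2"): "z4", ("x3", "y1"): "z5", ("x3", "y2"): "z6"}
table = {names[p]: (names[t["a"]], names[t["b"]])
         for p, t in union["delta"].items()}
final_names = {names[p] for p in union["finals"]}
```

Only pairs reachable from the start pair are generated; in this example all six combinations turn out to be reachable.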
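And so the union machine can be pictured like this:
[Figure: the six-state union machine]
This is an FA that accepts the language L1' + L2'. If we reverse the status of each state from final to nonfinal and vice versa, we produce an FA for the language L1 ∩ L2. This is it:
[Figure: the same machine with z5 as its only final state]
Bypassing z2 and z6 gives
[Figure: the machine after bypassing z2 and z6]
Then bypassing z3 gives
[Figure: the reduced transition graph; surviving edge labels are b + abb*ab, a + bb*aab*a, and b + ab*a]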
So, the whole machine reduces to the regular expression
(b + abb*ab)*a(a + bb*aab*a)(b + ab*a)*
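Before analyzing the expression by hand, it can be tested against the defining property of L1 ∩ L2 on all short words. This is a bounded check written with Python's re module ("|" in place of "+"); the length bound of 10 is an arbitrary choice for this sketch.

```python
import re
from itertools import product

# The expression derived above, with "+" written as "|".
regex = re.compile(r"(b|abb*ab)*a(a|bb*aab*a)(b|ab*a)*")

def in_intersection(word):
    """Membership in L1 ∩ L2: a double a and an even number of a's."""
    return "aa" in word and word.count("a") % 2 == 0

all_words = ["".join(t) for n in range(11) for t in product("ab", repeat=n)]
mismatches = [w for w in all_words
              if bool(regex.fullmatch(w)) != in_intersection(w)]
```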
Even though we know this expression must be our answer because we know how it was derived, let us try to analyze it anyway to see whether we can understand what this language means in some more intuitive sense.
As it stands, there are four factors (the second is just an a and the first and fourth are starred). Every time we use one of the options from the two end factors, we incorporate an even number of a's into the word (either none or two). The second factor gives us an odd number of a's (exactly one). The third factor gives us the option of taking either one or three a's. In total, the number of a's must be even. So, all the words in this language are in L2.
The second factor gives us an a, and then we must immediately concatenate this with one of the choices from the third factor. If we choose the a, then we have formed a double a. If we choose the other expression, bb*aab*a, then we have formed a double a in a different way. By either choice, the words in this language all have a double a and are therefore in L1. This means that all the words in the language of this regular expression are contained in the language L1 ∩ L2. But are all the words in L1 ∩ L2 included in the language of this expression?
The answer to this is yes. Let us look at any word that is in L1 ∩ L2. It has an even number of a's and a double a somewhere in it. There are two possibilities to consider separately:
1. Before the first double a, there are an even number of a's.
2. Before the first double a, there are an odd number of a's.
Words of type 1 come from the expression below:
(even number of a's but not doubled)(first aa)(even number of a's may be doubled)
= (b + abb*ab)*(aa)(b + ab*a)*
= type 1
Notice that the third factor defines the language L2 and is a shorter expression than the r2 we used above.
Words of type 2 come from the expression
(odd number of not doubled a's)(first aa)(odd number of a's may be doubled)
This completes the calculation that was started on p. 174. •
The proofs of the last three theorems are a tour de force of technique. The first was proved by regular expressions and TGs, the second by FAs, and the third by a Venn diagram.
We must confess now that the proof of the theorem that the intersection of two regular languages is again a regular language was an evil pedagogical trick. The theorem is not really as difficult as we made it seem. We chose the hard way to do things because it was a good example of mathematical thinking: Reduce the problem to elements that have already been solved.
This procedure is reminiscent of a famous story about a theoretical mathematician. Professor X is surprised one day to find his desk on fire. He grabs the extinguisher and douses the flames. The next day, he looks up from his book to see that his wastepaper basket is on fire. Quickly, he takes the basket and empties it onto his desk, which begins to burn. Having thus reduced the problem to one he has already solved, he goes back to his reading. (The students who find this funny are probably the ones who have been setting the fires in his office.)
The following is a more direct proof that the intersection of two regular languages is regular.
GOOD PROOF OF THEOREM 12
Let us recall the method we introduced to produce the union-machine FA3 that accepts any string accepted by either FA1 or FA2.
To prove this, we showed how to build a machine with states z1, z2, . . . of the form x_something if the input is running on FA1 or y_something if the input is running on FA2. If either the x-state or the y-state was a final state, we made the z-state a final state.
Let us now build the exact same machine FA3, but let us change the designation of final states. Let the z-state be a final state only if both the corresponding x-state and the corresponding y-state are final states. Now FA3 accepts only strings that reach final states simultaneously on both machines.
The words in the language for FA3 are words in both the languages for FA1 and FA2. This is therefore a machine for the intersection language. •
Not only is the proof shorter but also the construction of the machine has fewer steps.
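The direct proof translates almost word for word into code. The sketch below is one possible rendering, assuming a dictionary encoding of FAs and a flag named both that selects the rule for final states; the two sample machines are the FAs for "has a double a" and "has an even number of a's" described earlier, with transitions reconstructed from the prose.

```python
from itertools import product

def product_machine(m1, m2, both, alphabet="ab"):
    """Run two FAs in parallel on pairs of states.
    both=True:  a pair is final only if both components are final (intersection).
    both=False: a pair is final if either component is final (union)."""
    start = (m1["start"], m2["start"])
    delta, finals, todo = {}, set(), [start]
    while todo:
        pair = todo.pop()
        if pair in delta:
            continue
        x, y = pair
        fx, fy = x in m1["finals"], y in m2["finals"]
        is_final = (fx and fy) if both else (fx or fy)
        if is_final:
            finals.add(pair)
        delta[pair] = {c: (m1["delta"][x][c], m2["delta"][y][c])
                       for c in alphabet}
        todo.extend(delta[pair].values())
    return {"start": start, "finals": finals, "delta": delta}

def accepts(fa, word):
    state = fa["start"]
    for letter in word:
        state = fa["delta"][state][letter]
    return state in fa["finals"]

FA1 = {"start": "x1", "finals": {"x3"},          # has a double a
       "delta": {"x1": {"a": "x2", "b": "x1"},
                 "x2": {"a": "x3", "b": "x1"},
                 "x3": {"a": "x3", "b": "x3"}}}
FA2 = {"start": "y1", "finals": {"y1"},          # even number of a's
       "delta": {"y1": {"a": "y2", "b": "y1"},
                 "y2": {"a": "y1", "b": "y2"}}}

FA3 = product_machine(FA1, FA2, both=True)
words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
matches_definition = all(
    accepts(FA3, w) == ("aa" in w and w.count("a") % 2 == 0) for w in words)
```

The union machine and the intersection machine share the same states and edges; calling product_machine with both=False changes nothing except which pairs are marked final.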
EXAMPLE
In the proof of Kleene's theorem, we took the sum of the machine that accepts words with a double a,
         a     b
-x1      x2    x1
 x2      x3    x1
+x3      x3    x3
and the machine that accepts all words in EVEN-EVEN,
         a     b
±y1      y3    y2
 y2      y4    y1
 y3      y1    y4
 y4      y2    y3
The resultant union-machine was
         a     b     Old States
±z1      z2    z3    x1 or y1
 z2      z4    z5    x2 or y3
 z3      z6    z1    x1 or y2
+z4      z7    z8    x3 or y1
 z5      z9    z10   x1 or y4
 z6      z8    z10   x2 or y4
+z7      z4    z11   x3 or y3
+z8      z11   z4    x3 or y2
 z9      z11   z1    x2 or y2
 z10     z12   z5    x1 or y3
+z11     z8    z7    x3 or y4
+z12     z7    z3    x2 or y1
The intersection machine is identical to this except that it has only one final state. In order for the z-state to be a final state, both the x- and y-states must be final states. If FA1 and FA2 have only one final state, then FA3 can have only one final state (if it can be reached at all). The only final state in our FA3 is z4, which is x3 or y1.
This complicated machine is pictured below:
[Figure: the twelve-state intersection machine, with some edges drawn as dashed lines]
The dashed lines are perfectly good edges, but they have to cross other edges. With a little imagination, we can see how this machine accepts all EVEN-EVEN with a double a. All north-south changes are caused by b's, all east-west by a's. To get into the inner four states takes a double a. •
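The counts claimed in this example (twelve combination states, one final state for the intersection) can be confirmed by a short computation. The sketch below stores each machine as a table of (next state on a, next state on b), copied from the two transition tables of this example; the variable and function names are assumptions of this illustration.

```python
# Each entry is (next state on a, next state on b).
FA1 = {"x1": ("x2", "x1"), "x2": ("x3", "x1"), "x3": ("x3", "x3")}
FA2 = {"y1": ("y3", "y2"), "y2": ("y4", "y1"),
       "y3": ("y1", "y4"), "y4": ("y2", "y3")}
FINAL1, FINAL2 = {"x3"}, {"y1"}
START = ("x1", "y1")

def step(pair, letter):
    """Advance both machines by one letter."""
    i = "ab".index(letter)
    return (FA1[pair[0]][i], FA2[pair[1]][i])

# Collect every pair of states reachable from the start pair.
reachable, todo = set(), [START]
while todo:
    pair = todo.pop()
    if pair not in reachable:
        reachable.add(pair)
        todo += [step(pair, "a"), step(pair, "b")]

intersection_finals = {p for p in reachable
                       if p[0] in FINAL1 and p[1] in FINAL2}
union_finals = {p for p in reachable
                if p[0] in FINAL1 or p[1] in FINAL2}

def accepts(word):
    """Acceptance by the intersection machine."""
    pair = START
    for letter in word:
        pair = step(pair, letter)
    return pair in intersection_finals
```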
EXAMPLE
Let us rework the example in the first proof once again, this time by the quick method.
This is like the citizens of the fabled city of Chelm who on learning that they did not have to carry all their logs down from the top of the mountain were so overjoyed that they carried them all back up again so that they could use the clever work-saving method of
The machine that simulates the same input running on both machines at once is
[Figure: the machine that runs the input on FA1 and FA2 simultaneously]
EXAMPLE
Let us work through one last example of intersection. Our two languages will be
L1 = all words that begin with an a
L2 = all words that end with an a
r1 = a(a + b)*
r2 = (a + b)*a
The intersection language will be
L1 ∩ L2 = all words that begin and end with the letter a
The language is obviously regular because it can be defined by the regular expression
a(a + b)*a + a
Note that the first term requires that the first and last a's be different, which is why we need the second choice "+ a."
In this example, we were lucky enough to "understand" the languages, so we could concoct a regular expression that we "understand" represents the intersection. In general, this does not happen, so we follow the algorithm presented in the proof, which we can execute even without the benefit of understanding. (Although the normal quota of insights per human is one per year, the daily adult requirement of interpreting regular expressions is even lower.)
For this, we must begin with FAs that define these languages:
[Figure: FA1 with states x1, x2, x3 and FA2 with states y1, y2]
As it turns out, even though the two regular expressions are very similar, the machines are very different. There is a three-state version of FA2, but no two-state version of FA1.
We now build the transition table of the machine that runs its input strings on FA1 and FA2 simultaneously:
State               Read a       Read b
-z1   x1 or y1      x2 or y2     x3 or y1
 z2   x2 or y2      x2 or y2     x2 or y1
 z3   x3 or y1      x3 or y2     x3 or y1
 z4   x2 or y1      x2 or y2     x2 or y1
 z5   x3 or y2      x3 or y2     x3 or y1
The machine looks like this:
[Figure: the five-state machine z1 through z5]
If we are building the machine for
L1 + L2 = all words in either L1 or L2 or in both
we would put +'s at any state representing acceptance by L1 or L2, that is, any state with an x2 or a y2:
z2 +    z4 +    z5 +
Because we are instead constructing the machine for
L1 ∩ L2 = all words in both L1 and L2
we put a + only in the state that represents acceptance by both machines at once:
z2 + = x2 or y2
Strings ending here are accepted if being run on FA1 (by ending in x2) and if being run on FA2 (by ending in y2). •
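The difference between the two ways of placing the + can be seen by running both machines side by side. In this sketch the transitions of FA1 and FA2 are read off the transition table of this example (x2 and y2 are the final states); the function names are assumptions of this illustration, and the comparison with the regular expression a(a + b)*a + a is a bounded check only.

```python
import re
from itertools import product

FA1 = {"x1": {"a": "x2", "b": "x3"},   # begins with an a; x2 is final
       "x2": {"a": "x2", "b": "x2"},
       "x3": {"a": "x3", "b": "x3"}}
FA2 = {"y1": {"a": "y2", "b": "y1"},   # ends with an a; y2 is final
       "y2": {"a": "y2", "b": "y1"}}

def run_both(word):
    """Return the pair of states reached on the two machines."""
    x, y = "x1", "y1"
    for letter in word:
        x, y = FA1[x][letter], FA2[y][letter]
    return x, y

def in_intersection(word):
    # A + only on the combination "x2 or y2" (state z2).
    return run_both(word) == ("x2", "y2")

def in_union(word):
    # A + on any combination containing x2 or y2 (states z2, z4, z5).
    x, y = run_both(word)
    return x == "x2" or y == "y2"

# The expression a(a + b)*a + a, with "+" written as "|".
regex = re.compile(r"a(a|b)*a|a")
words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
agrees_with_regex = all(
    in_intersection(w) == bool(regex.fullmatch(w)) for w in words)
```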
Do not be fooled by this slight confusion:
z2 = x2 or y2 = accepted by FA1 and FA2
The poor plus sign is perilously overworked.
2 + 2       (sometimes read "2 and 2 are 4")   Arithmetic
(a + b)*    (a or b repeated as often as we choose)
a+          (a string of at least one a)
L1 + L2     (all words in L1 or L2)
z2 +        (z2 is a final state, the machine accepts input strings if they end here)
For each of the following pairs of regular languages, find a regular expression and an FA that each define L1 ∩ L2: