Proof of Part 3
The proof of this part will be by recursive definition and constructive algorithm at the same time. This is the hardest part of our whole theorem, so we shall go very slowly.
We know that every regular expression can be built up from the letters of the alphabet Σ and Λ by repeated application of certain rules: addition, concatenation, and closure. We shall see that as we are building up a regular expression, we could at the same time be building up an FA that accepts the same language.
We present our algorithm recursively.
Rule 1 There is an FA that accepts any particular letter of the alphabet. There is an FA that accepts only the word Λ.
Proof of Rule 1 If x is in Σ, then the FA
[Figure: an FA whose edges are labeled "all Σ", "all Σ except x", and "all Σ"]
accepts only the word x.
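Rule 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-state layout (start, accept, trap), the dictionary representation, and the names letter_fa, lambda_fa, and accepts are hypothetical and do not come from the text.

```python
def letter_fa(x, alphabet):
    """Return an FA (as a dict) that accepts only the one-letter word x."""
    start, accept, trap = "start", "accept", "trap"
    delta = {}
    for c in alphabet:
        # From the start state, x leads to acceptance; every other letter is fatal.
        delta[(start, c)] = accept if c == x else trap
        # Any further letter after x leads to the trap state.
        delta[(accept, c)] = trap
        # The trap state can never be left.
        delta[(trap, c)] = trap
    return {"start": start, "finals": {accept}, "delta": delta}


def lambda_fa(alphabet):
    """Return an FA that accepts only the null word."""
    start, trap = "start", "trap"
    delta = {}
    for c in alphabet:
        delta[(start, c)] = trap
        delta[(trap, c)] = trap
    # The start state is also the only final state.
    return {"start": start, "finals": {start}, "delta": delta}


def accepts(fa, word):
    """Run word through fa and report whether it ends in a final state."""
    state = fa["start"]
    for c in word:
        state = fa["delta"][(state, c)]
    return state in fa["finals"]
```

For the alphabet {a, b}, accepts(letter_fa("a", "ab"), "a") is true, while the null word, b, and aa are all rejected.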
Converting Regular Expressions into FAs 109
Rule 2 If there is an FA called FA1 that accepts the language defined by the regular expression r1 and there is an FA called FA2 that accepts the language defined by the regular expression r2, then there is an FA that we shall call FA3 that accepts the language defined by the regular expression (r1 + r2).
Proof of Rule 2
We are going to prove Rule 2 by showing how to construct the new machine in the most reasonable way from the two old machines. We shall prove FA3 exists by showing how to construct it.
Before we state the general principles, let us demonstrate them in a specific example.
Suppose we have the machine FA1, pictured below, which accepts the language of all words over the alphabet Σ = {a, b} that have a double a somewhere in them
[Figure: FA1, with states x1, x2, x3]
and the familiar machine FA2, which accepts all words that have both an even number of total a's and an even number of total b's (EVEN-EVEN)
[Figure: FA2, with states y1, y2, y3, y4]
We shall show how to design a machine that accepts both sets. That is, we shall build a machine that accepts all words that either have an aa or are in EVEN-EVEN and rejects all strings with neither characteristic.
The language the new machine accepts will be the union of these two languages. We shall call the states in this new machine z1, z2, z3, and so on, for as many as we need. We shall define this machine by its transition table.
Our guiding principle is this: The new machine will simultaneously keep track of where the input would be if it were running on FA1 alone and where the input would be if it were running on FA2 alone.
First of all, we need a start state. This state must combine x1, the start state for FA1, and y1, the start state for FA2. We call it z1. If the string were running on FA1, it would start in x1 and if on FA2 in y1.
All z-states in the FA3 machine carry with them a double meaning - they keep track of which x state the string would be in and which y state the string would be in. It is not as if we are uncertain about which machine the input string is running on - it is running on both FA1 and FA2, and we are keeping track of both games simultaneously.
What new states can occur if the input letter a is read? If the string were being run on the first machine, it would put the machine into state x2. If the string were running on the second machine, it would put the machine into state y3. Therefore, on our new machine an a puts us into state z2, which means either x2 or y3, in the same way that z1 means either x1 or y1. Because y1 is a final state for FA2, z1 is also a final state in the sense that any word whose path ends there on the z-machine would be accepted by FA2.
±z1 = x1 or y1
z2 = x2 or y3
On the machine FA3, we are following both the path the input would make on FA1 and the input's path on FA2 at the same time. By keeping track of both paths, we know when the input string ends, whether or not it has reached a final state on either machine.
Let us not consider this "x or y" disjunction as a matter of uncertainty. We know for a fact that the same input is running on both machines; we might equivalently say "x and y."
We may not know whether a certain person weighed 100 or 200 lb to start with, but we are certain that after gaining 20 lb, then losing 5, and then gaining 1, his total weight is now exactly either 116 or 216 lb. So, even if we do not know in which initial state the string started, we can still be certain that given a known sequence of transformations, it is now definitely in either one of two possible conditions.
If we are in state z1 and we read the letter b, then being in x1 on FA1 and reading a b, we return to x1, whereas being in y1 on FA2 and reading a b sends us to y2.
z3 = x1 or y2
The beginning of our transition table for FA3 is
        a     b
±z1     z2    z3
Suppose that somehow we have gotten into state z2 and then we read an a. If we were in FA1, we would now go to state x3, which is a final state. If we were in FA2, we would now go back to y1, which is also a final state. We will call this condition z4, meaning either x3 or y1. Because this string could now be accepted on one of these two machines, z4 is a final state for FA3. As it turns out, in this example the word is accepted by both machines at once, but this is not necessary. Acceptance by either machine FA1 or FA2 is enough for acceptance by FA3.
Membership in either language is enough to guarantee membership in the union.
If we are in state z2 and we happen to read a b, then in FA1 we are back to x1, whereas in FA2 we are in y4. Call this new condition z5 = state x1 or y4.
+z4 = x3 or y1
z5 = x1 or y4
At this point, our transition table looks like this:
        a     b
±z1     z2    z3
 z2     z4    z5
What happens if we start from state z3 and read an a? If we were in FA1, we are now in x2; if in FA2, we are now in y4. This is a new state in the sense that we have not encountered this combination of x and y before; call it state z6.
z6 = x2 or y4
What if we are in z3 and we read a b? In FA1 we stay in x1, whereas in FA2 we return to y1. This means that if we are in z3 and we read a b, we return to state z1. This is the first time that we have not had to create a new state. If we never got any use out of the old states, the machine would grow ad infinitum.
Our transition table now looks like this:
        a     b
±z1     z2    z3
 z2     z4    z5
 z3     z6    z1
Both of these are final states because a string ending here on the z-machine will be accepted by FA1, because x3 is a final state for FA1.
If we are in z11 and we read an a, we go to x3 or y2 = z8.
If we are in z11 and we read a b, we go to x3 or y3 = z7.
If we are in z12 and we read an a, we go to x3 or y3 = z7.
If we are in z12 and we read a b, we go to x1 or y2 = z3.
Our machine is now complete. The full transition table is
        a     b
±z1     z2    z3
 z2     z4    z5
 z3     z6    z1
+z4     z7    z8
 z5     z9    z10
 z6     z8    z10
+z7     z4    z11
+z8     z11   z4
 z9     z11   z1
 z10    z12   z5
+z11    z8    z7
+z12    z7    z3
Here is what FA3 may look like:
If a string traces through this machine and ends up at a final state, it means that it would also end at a final state either on machine FA1 or on machine FA2. Also, any string accepted by either FA1 or FA2 will be accepted by this FA3.
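The claim that the z-machine accepts exactly the union can be spot-checked by brute force. The sketch below encodes the twelve-row transition table as data and compares the machine's verdict with a direct test of the two language definitions on every word up to a chosen length. The names TABLE, z_accepts, in_union, and mismatches are illustrative assumptions, not notation from the text.

```python
from itertools import product

# Transition table of the z-machine: state number -> (state on a, state on b).
TABLE = {
    1: (2, 3), 2: (4, 5), 3: (6, 1), 4: (7, 8),
    5: (9, 10), 6: (8, 10), 7: (4, 11), 8: (11, 4),
    9: (11, 1), 10: (12, 5), 11: (8, 7), 12: (7, 3),
}
START = 1
FINALS = {1, 4, 7, 8, 11, 12}


def z_accepts(word):
    """Trace word through the z-machine and report acceptance."""
    state = START
    for letter in word:
        state = TABLE[state][0 if letter == "a" else 1]
    return state in FINALS


def in_union(word):
    """Direct test: word has a double a, or is in EVEN-EVEN."""
    has_double_a = "aa" in word
    even_even = word.count("a") % 2 == 0 and word.count("b") % 2 == 0
    return has_double_a or even_even


def mismatches(max_len):
    """Collect every word of length <= max_len on which the two tests disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if z_accepts(word) != in_union(word):
                bad.append(word)
    return bad
```

An empty result from mismatches(8) does not prove the construction correct, but it is a quick sanity check on the table.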
ALGORITHM
The general description of the algorithm we employed earlier is as follows. Starting with two machines, FA1 with states x1, x2, x3, ... and FA2 with states y1, y2, y3, ..., build a new machine FA3 with states z1, z2, z3, ..., where each z is of the form "x_something or y_something." The combination state x_start or y_start is the - state of the new FA. If either the x part or the y part is a final state, then the corresponding z is a final state. To go from one z to another by reading a letter from the input string, we see what happens to the x part and the y part and go to the new z accordingly. We could write this as a formula:
z_new after letter p = [x_new after letter p on FA1] or [y_new after letter p on FA2]
Because there are only finitely many x's and y's, there can be only finitely many possible z's. Not all of them will necessarily be used in FA3 if no input string beginning at - can get to them. In this way, we can build a machine that can accept the sum of two regular expressions if we already know machines to accept each of the component regular expressions separately. ■
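The algorithm can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the dictionary representation and the names make_fa and union_fa are assumptions, and the two sample machines are reconstructed from the transitions quoted in the worked example (double a, and EVEN-EVEN), since the original pictures are not reproduced here.

```python
from collections import deque


def make_fa(start, finals, rows):
    """Build an FA from rows mapping state -> (next state on a, next state on b)."""
    delta = {}
    for state, (on_a, on_b) in rows.items():
        delta[(state, "a")] = on_a
        delta[(state, "b")] = on_b
    return {"start": start, "finals": set(finals), "delta": delta}


def union_fa(fa1, fa2, alphabet="ab"):
    """Build FA3 for the union, creating z-states only as they are reached."""
    start = (fa1["start"], fa2["start"])
    names = {start: "z1"}            # (x, y) pair -> name of the z-state
    delta, finals = {}, set()
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        x, y = pair
        # A z-state is final if either of its parts is final.
        if x in fa1["finals"] or y in fa2["finals"]:
            finals.add(names[pair])
        for p in alphabet:
            new_pair = (fa1["delta"][(x, p)], fa2["delta"][(y, p)])
            if new_pair not in names:
                # A combination not seen before: give it the next z-name.
                names[new_pair] = f"z{len(names) + 1}"
                queue.append(new_pair)
            delta[(names[pair], p)] = names[new_pair]
    return {"start": "z1", "finals": finals, "delta": delta, "pairs": names}


# Assumed reconstruction of the two machines from the worked example.
FA1 = make_fa("x1", {"x3"},
              {"x1": ("x2", "x1"), "x2": ("x3", "x1"), "x3": ("x3", "x3")})
FA2 = make_fa("y1", {"y1"},
              {"y1": ("y3", "y2"), "y2": ("y4", "y1"),
               "y3": ("y1", "y4"), "y4": ("y2", "y3")})
FA3 = union_fa(FA1, FA2)
```

Because new states are numbered in the order they are discovered, with a tried before b, the sketch happens to reproduce the numbering z1 through z12 used in the example above.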
EXAMPLE (Inside the proof of Theorem 6)
Let us go through this very quickly once more on the two machines:
[Figure: FA1, with states x1, x2, x3, and FA2, with states y1, y2]
FA1 accepts all words with a double a in them, and FA2 accepts all words ending in b. The machine that accepts the union of the two languages for these two machines begins:
-z1 = x1 or y1
In z1 if we read an a, we go to x2 or y1 = z2.
In z1 if we read a b, we go to x1 or y2 = z3, which is a final state since y2 is.
The partial picture of this machine is now
[Figure: partial picture of the union machine, showing z1, z2, and z3]
In z2 if we read an a, we go to x3 or y1 = z4, which is a final state because x3 is.
In z2 if we read a b, we go to x1 or y2 = z3.
In z3 if we read an a, we go to x2 or y1 = z2.
In z3 if we read a b, we go to x1 or y2 = z3.
In z4 if we read an a, we go to x3 or y1 = z4.
In z4 if we read a b, we go to x3 or y2 = z5, which is a final state.
In z5 if we read an a, we go to x3 or y1 = z4.
In z5 if we read a b, we go to x3 or y2 = z5.
The whole machine looks like this:
[Figure: the complete union machine, with states z1 through z5]
This machine accepts all words that have a double a or that end in b.
The seemingly logical possibility
z6 = x2 or y2
does not arise. This is because to be in x2 on FA1 means the last letter read is an a. But to be in y2 on FA2 means the last letter read is a b. These cannot both be true at the same time, so no input string ever has the possibility of being in state z6. ■
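The observation that x2 and y2 never occur together can be checked by running both machines in tandem on every short input. In the sketch below the two tables are read off the transitions listed in this example, and the names run_both and pairs_seen are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Each table maps state -> (next state on a, next state on b).
FA1 = {"x1": ("x2", "x1"), "x2": ("x3", "x1"), "x3": ("x3", "x3")}  # has a double a
FA2 = {"y1": ("y1", "y2"), "y2": ("y1", "y2")}                      # ends in b


def run_both(word):
    """Return the (x, y) pair reached after running word on both machines."""
    x, y = "x1", "y1"
    for letter in word:
        i = 0 if letter == "a" else 1
        x, y = FA1[x][i], FA2[y][i]
    return x, y


def pairs_seen(max_len):
    """Collect every (x, y) pair reached by some word of length <= max_len."""
    seen = set()
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            seen.add(run_both("".join(letters)))
    return seen
```

Under these assumptions, pairs_seen(6) contains exactly the five combinations named z1 through z5, and the pair (x2, y2) is not among them.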
EXAMPLE (Inside the proof of Theorem 6)
Let FA1 be the machine below that accepts all words that end in a:
[Figure: FA1]
and let FA2 be the machine below that accepts all words with an odd number of letters (odd length):
[Figure: FA2]
Using the algorithm produces the machine below that accepts all words that either have an odd number of letters or end in a:
[Figure: the union machine]
The only state that is not a + state is the - state. To get back to the start state, a word must have an even number of letters and end in b. ■
EXAMPLE (Inside the proof of Theorem 6)
Let FA1 be
[Figure: FA1]
which accepts all words ending in a, and let FA2 be
[Figure: FA2]
which accepts all words ending in b.
Using the algorithm, we produce
[Figure: the union machine]
which accepts all words ending in a or b, that is, all words except Λ. Notice that the state x2 or y2 cannot be reached because x2 means "we have just read an a" and y2 means "we have just read a b." ■
There is an alternate procedure for producing the union-machine from two component machines that has a more compact mathematical description, but whose disadvantages are well illustrated by the example we have just considered. Let FA1 have states x1, x2, ... and FA2 have states y1, y2, .... Then we can define FA3 initially as having all the possible states xi or yj for all combinations of i and j. The number of states in FA3 would always be the product of the number of states in FA1 and the number of states in FA2. For each state in FA3 we could then, in any order, draw its a-edge and b-edge because they would go to already existing states. What we have done before is create new z-states as the need arose, as in the Japanese "just in time" approach to automobile manufacturing. This may seem a little haphazard, and we never really know when or whether the need for a new combination of x and y would arise. This alternate, more organized, approach has the advantage of knowing from the beginning just how many states and edges we will need to draw, always the pessimistic estimate of the largest possible number. For the example above, we would start with four possible states:
For each of these four states we would draw two edges, producing
[Figure: the four-state machine with both edges drawn from every state]
This is a perfectly possible FA for the union language FA1 + FA2. However, on inspection we see that its lower right-hand state is completely useless because it can never be entered by any string starting at -. It is not against the definition of an FA to have such a useless state, nor is it a crime. It is simply an example of the tradeoff between constructing states in our need-to-have policy versus the more universal-seeming all-at-once strategy.
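The all-at-once strategy can be sketched as follows for the two machines of the last example (words ending in a, words ending in b). The tables assume x1 and y1 are the start states and x2 and y2 the final states, which matches the remarks about x2 and y2 above. The names full_product and reachable are hypothetical.

```python
from itertools import product

ALPHABET = "ab"
# Each table maps state -> (next state on a, next state on b).
FA1 = {"x1": ("x2", "x1"), "x2": ("x2", "x1")}   # words ending in a
FA2 = {"y1": ("y1", "y2"), "y2": ("y1", "y2")}   # words ending in b
FA1_FINALS, FA2_FINALS = {"x2"}, {"y2"}


def full_product():
    """Create every (x, y) combination up front, then draw all the edges."""
    states = list(product(FA1, FA2))
    delta = {}
    for (x, y) in states:
        for i, letter in enumerate(ALPHABET):
            delta[((x, y), letter)] = (FA1[x][i], FA2[y][i])
    finals = {(x, y) for (x, y) in states
              if x in FA1_FINALS or y in FA2_FINALS}
    return states, delta, finals


def reachable(start, delta):
    """Return the set of states that some input string can reach from start."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for letter in ALPHABET:
            nxt = delta[(state, letter)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


STATES, DELTA, FINALS = full_product()
USELESS = set(STATES) - reachable(("x1", "y1"), DELTA)
```

Here STATES has 2 x 2 = 4 members and DELTA has eight edges, while USELESS contains only the combination x2 or y2, the state that the need-to-have policy never creates.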
By either algorithm, this concludes the proof of Rule 2.
We still have two rules to go.
Rule 3 If there is an FA1 that accepts the language defined by the regular expression r1 and an FA2 that accepts the language defined by the regular expression r2, then there is an FA3 that accepts the language defined by the concatenation r1r2, the product language.
Proof of Rule 3
Again, we shall verify this rule by a constructive algorithm. We shall prove that such an FA3 exists by showing how to construct it from FA1 and FA2. As usual, first we do an illustration; then we state the general principles, but our illustration here first is of what can go wrong, not what to do right.
Let L1 be the language of all words with b as the second letter. One machine that accepts L1 is FA1:
[Figure: FA1]
Let L2 be the language of all words that have an odd number of a's. One machine for L2 is FA2:
[Figure: FA2]
Now consider the input string ababbaa. This is a word in the product language L1L2, because it is the concatenation of a word in L1 (ab) with a word in L2 (abbaa). If we begin to run this string on FA1, we would reach the + state after the second letter. If we could now somehow automatically jump over into FA2, we could begin running what is left of the input, abbaa, starting in the - state. This remaining input is a word in L2, so it will finish its path in the + state of FA2. Basically, this is what we want to build - an FA3 that processes the first part of the input string as if it were FA1; then when it reaches the FA1 + state, it turns into the - state on FA2. From there it continues processing the string until it reaches the + state on FA2, and we can then accept the input.
Tentatively, let us say FA3 looks something like this:
[Figure: tentative FA3]
Unfortunately, this idea, though simple, does not work. We can see this by considering a different input string from the same product language. The word ababbab is also in L1L2, because abab is in L1 (it has b as its second letter) and bab is in L2 (it has an odd number of a's).
If we run the input string ababbab first on FA1, we get to the + state after two letters, but we must not say that we are finished yet with the L1 part of the input. If we stopped running on FA1 after ab, when we reached + in FA1, the remaining input string abbab could not reach + on FA2 because it has an even number of a's.
Remember that FA1 accepts all words with paths that end at a final state. They could pass through that final state many times before ending there. This is the case with the input abab. It reaches + after two letters. However, we must continue to run the string on FA1 for two more letters. We enter + three times. Then we can jump to FA2 (whatever that means) and run the remaining string bab on FA2. The input bab will then start on FA2 in the - state and finish in the + state.
Our problem is this: How do we know when to jump from FA1 to FA2? With the input ababbaa we should jump when we first reach the + in FA1. With the input ababbab (which differs only in the last letter), we have to stay in FA1 until we have looped back to the + state some number of times before jumping to FA2. How can a finite automaton, which must make a mandatory transition on each input letter without looking ahead to see what the rest of the string will be, know when to jump from FA1 to FA2?
This is a subtle point, and it involves some new ideas.
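The failure of the tentative machine can be seen concretely in a small sketch. The state names s1, s2, s3, dead, even, and odd are assumptions chosen for illustration, since the pictures of FA1 and FA2 are not reproduced here. The function greedy_jump models the tentative FA3 that switches at the first + state, and in_product tests membership in L1L2 directly by trying every split point.

```python
# Each table maps state -> (next state on a, next state on b).
# FA1: b is the second letter; s3 is the only final state.
FA1 = {"s1": ("s2", "s2"), "s2": ("dead", "s3"),
       "s3": ("s3", "s3"), "dead": ("dead", "dead")}
FA1_START, FA1_FINALS = "s1", {"s3"}
# FA2: an odd number of a's.
FA2 = {"even": ("odd", "even"), "odd": ("even", "odd")}
FA2_START, FA2_FINALS = "even", {"odd"}


def run(table, state, word):
    """Follow word through table starting from state; return the last state."""
    for letter in word:
        state = table[state][0 if letter == "a" else 1]
    return state


def greedy_jump(word):
    """Tentative FA3: switch to FA2 the first time FA1 reaches a + state."""
    state = FA1_START
    for i in range(len(word) + 1):
        if state in FA1_FINALS:
            # Hand the rest of the input to FA2 and never look back.
            return run(FA2, FA2_START, word[i:]) in FA2_FINALS
        if i < len(word):
            state = run(FA1, state, word[i])
    return False


def in_product(word):
    """True if some split word = u + v has u in L1 and v in L2."""
    return any(
        run(FA1, FA1_START, word[:i]) in FA1_FINALS
        and run(FA2, FA2_START, word[i:]) in FA2_FINALS
        for i in range(len(word) + 1)
    )
```

Both ababbaa and ababbab pass in_product, but greedy_jump accepts only the first. The second needs the split abab followed by bab, which the greedy rule never tries.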
We have to build a machine that has the characteristic of starting out like FA1 and following along it until it enters a final state at which time an option is reached. Either we continue along FA1 waiting to reach another +, or else we switch over to the start state of FA2 and begin circulating there. This is tricky, because the r1 part of the input string can generate an arbitrarily long word if it has a star in it, and we cannot be quite sure of when to jump out of FA1 and into FA2. And what happens (heavens forfend) if FA1 has more than one +?
Now let us illustrate how to build such an FA3 for a specific example. The two machines we shall use are
FA1 = the machine that accepts only strings with a double a in them and
FA2 = the machine that accepts all words that end in the letter b
[Figure: FA1 and FA2]
We shall start with the state z1, which is exactly like x1. It is a start state, and it means that the input string is being run on FA1 alone. Unlike the union machine the string is not being ... state that the string must pass through to get eventually to its last state in FA1. Many strings, some of which are accepted and some of which are rejected, pass through several + states on their way through any given machine.
If we are now in z3 in its capacity as the final state of FA1 for the first part of this input string, we must begin running the rest of the input string as if it were input of FA2 beginning at state y1. Therefore, the full meaning of being in z3 is
z3 = x3, and we are still running on FA1
     or
     y1, and we have begun to run on FA2
Notice the similarity between this disjunctive (either/or) definition of z3 and the disjunctive definitions for the z-states produced by the algorithm given for the addition of two FAs.