
Streaming String-to-Term Transducers

QREs generalized DReX by allowing expressions to map input strings to output terms. Similarly, SSTTs generalize SSTs by producing and maintaining intermediate terms, rather than concrete values. Most simply, an SSTT is obtained by permitting the registers of the machine to carry terms (as syntactic objects), instead of being restricted to carrying concrete values.

[Figure 3.3 diagram: a transition from state $q$ to state $q'$ on input symbol $a$, with register updates $x := \min(y\{q := z + 3\},\ x\{r := 2\})$, $y := q + 3$, and $z := \min(r, 5)$.]

Figure 3.3: Example transition in an SSTT. There are three registers, $\{x, y, z\}$, all integer-valued. The parameters of interest are $p$, $q$, and $r$, all also integer-valued. We declare that $P_x = \{p, r\}$, $P_y = \{q\}$, and $P_z = \{r\}$, indicating what parameters the runtime values of these registers may depend on. There are two parts to ensuring copylessness: first, the registers must appear at most once on the right-hand side of the update expressions, and second, each register must always hold a single-use term.

There are two issues in this generalization that require a little care: (a) ensuring that each register always holds a term of the correct type, and (b) ensuring copylessness in the presence of parameter substitutions. Each register $v$ holds a term $t$: the idea is to associate $v$ with a set of parameters $P_v$ which over-approximates $\mathrm{Param}(t)$, and ensure that the register updates are consistent with $P_v$.

Recall from section 2.3 that $P$ is the set of all parameters in the universe. For simplicity, we assume that the set of registers $V$ is itself a finite subset of $P$, $V \subseteq P$. Each register $v$ is associated with a type $T_v$, expressed succinctly as $v : T_v$, and with a set $P_v \subseteq P$ of parameters such that $P_v \cap V = \emptyset$.

Example 3.6. We present an example transition of an SSTT in figure 3.3. There are three integer-valued registers, $x$, $y$, and $z$, and three parameters, $p$, $q$, and $r$, all also integer-valued. We declare that $P_x = \{p, r\}$, $P_y = \{q\}$, and $P_z = \{r\}$, i.e. that at runtime, if the register valuations are $\{x \mapsto t_x, y \mapsto t_y, z \mapsto t_z\}$, then

\[ \mathrm{Param}(t_x) \subseteq \{p, r\}, \quad \mathrm{Param}(t_y) \subseteq \{q\}, \quad \text{and} \quad \mathrm{Param}(t_z) \subseteq \{r\}. \]

Thus, the following is a well-formed register assignment:

\[ \mathrm{val}_1 = \{x \mapsto \min(p, r + 5),\ y \mapsto 4,\ z \mapsto r + 2\}, \]

but the following is not well-formed:

\[ \mathrm{val}_2 = \{x \mapsto \min(p, r + 5),\ y \mapsto r + 4,\ z \mapsto r + 2\}, \]

because here $y$ depends on the forbidden parameter $r$.

Now focus on the register updates during the $q \to q'$ transition. As with QREs, the update expression $\dot{t}_1\{p := \dot{t}_2\}$ indicates that the actual term substitution needs to be performed while executing the machine. Observe that the expressions guarantee that if the original register values depend only on permitted parameters ($\mathrm{Param}(\mathrm{val}(v)) \subseteq P_v$, for all registers $v$), then their final values also depend only on permitted parameters.

Lastly, let the initial register valuation be $\{x \mapsto \min(p, r),\ y \mapsto q + 2,\ z \mapsto r\}$. Then the term held by $x$ after the transition is

\[ t_x' = \min((q + 2)[q := r + 3],\ \min(p, r)[r := 2]) = \min((r + 3) + 2,\ \min(p, 2)) = \min(r + 5, p, 2), \]

which is also a single-use term. On the other hand, consider the following alternative update expression for the register $x$:

\[ x := \min(y\{q := z + 3\},\ x). \]

If we consider the same initial valuation of the registers as before, then the term held by $x$ after the transition is:

\[ t_x'' = \min((q + 2)[q := r + 3],\ \min(p, r)) = \min((r + 3) + 2,\ \min(p, r)) = \min(r + 5, p, r), \]

which is not single-use. Observe that the original update expression of figure 3.3 statically guarantees that if the original register valuation is single-use, then so is the final register valuation.

We will set up the definitions of this section so that SSTTs automatically guarantee these well-typedness and single-use properties.

Register valuations, update terms, and register assignments. A register valuation is a function $\mathrm{val} : V \to \mathrm{Terms}$. It is well-formed if for each $v$, (a) $\mathrm{val}(v) : T_v$, (b) $\mathrm{val}(v)$ is single-use, and (c) $\mathrm{Param}(\mathrm{val}(v)) \subseteq P_v$.
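To make conditions (b) and (c) concrete, the sketch below checks them on the valuations $\mathrm{val}_1$ and $\mathrm{val}_2$ of example 3.6. The encoding (integers for constants, strings for parameters, tuples for operations) and the function names are assumptions made for illustration. Condition (a) is omitted because every term in the example is integer-valued.

```python
def occurrences(term):
    """List every parameter occurrence in the term, with repetition."""
    if isinstance(term, str):
        return [term]
    if isinstance(term, tuple):  # (op, t1, ..., tk)
        return [p for arg in term[1:] for p in occurrences(arg)]
    return []  # integer constant

def well_formed(val, allowed):
    """Check conditions (b) and (c); typing (a) is assumed to hold."""
    for v, term in val.items():
        occ = occurrences(term)
        single_use = len(occ) == len(set(occ))   # condition (b)
        permitted = set(occ) <= allowed[v]       # condition (c)
        if not (single_use and permitted):
            return False
    return True

# Declared parameter sets P_x, P_y, P_z from example 3.6.
ALLOWED = {"x": {"p", "r"}, "y": {"q"}, "z": {"r"}}

val1 = {"x": ("min", "p", ("+", "r", 5)), "y": 4, "z": ("+", "r", 2)}
val2 = {"x": ("min", "p", ("+", "r", 5)), "y": ("+", "r", 4), "z": ("+", "r", 2)}

print(well_formed(val1, ALLOWED))
print(well_formed(val2, ALLOWED))
```

The second valuation is rejected because the term held by `y` mentions `r`, which is outside $P_y$.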

From constants $c$, parameters $p$, registers $v$, and operations $\mathrm{op}$, we can naturally construct update terms:

\[ \dot{t} ::= c \mid p \mid \mathrm{op}(\dot{t}_1, \dot{t}_2, \ldots, \dot{t}_k) \mid \dot{t}_1\{p := \dot{t}_2\} \tag{3.2.1} \]

Compare this definition with the definition of terms in equation 2.3.1: the only difference is the addition of the operator for substitution $\dot{t}_1\{p := \dot{t}_2\}$. As with QREs, this operation is distinct from term substitution $t_1[p := t_2]$, and indicates that substitution is to be performed at runtime.

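Let $\mathrm{UpdTerms}(T)$ be the set of well-typed update terms of type $T$. The parameter support $\mathrm{Param}(\dot{t})$ of an update term $\dot{t}$ is inductively defined as follows: (a) $\mathrm{Param}(c) = \emptyset$, (b) $\mathrm{Param}(p) = \{p\}$ if $p \notin V$, and $\mathrm{Param}(v) = P_v$ for all $v \in V$, (c) $\mathrm{Param}(\mathrm{op}(\dot{t}_1, \dot{t}_2, \ldots, \dot{t}_k)) = \mathrm{Param}(\dot{t}_1) \cup \mathrm{Param}(\dot{t}_2) \cup \cdots \cup \mathrm{Param}(\dot{t}_k)$, and (d) $\mathrm{Param}(\dot{t}_1\{p := \dot{t}_2\}) = (\mathrm{Param}(\dot{t}_1) \setminus \{p\}) \cup \mathrm{Param}(\dot{t}_2)$. The register support $\mathrm{Regs}(\dot{t})$ is the set of registers appearing in $\dot{t}$.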
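The inductive definitions of the parameter support and the register support translate directly into recursive functions. The sketch below assumes a hypothetical encoding in which the runtime substitution $\dot{t}_1\{p := \dot{t}_2\}$ is the tuple `("subst", t1, p, t2)`, other tuples are operations, strings are parameters or registers, and integers are constants. The registers and parameter sets are those of figure 3.3.

```python
REGS = {"x", "y", "z"}                                # the register set V
P = {"x": {"p", "r"}, "y": {"q"}, "z": {"r"}}         # declared sets P_v

def param(t):
    """Parameter support Param of an update term, cases (a) to (d)."""
    if isinstance(t, str):
        return set(P[t]) if t in REGS else {t}        # case (b)
    if isinstance(t, tuple) and t[0] == "subst":
        _, t1, p, t2 = t
        return (param(t1) - {p}) | param(t2)          # case (d)
    if isinstance(t, tuple):
        return set().union(*(param(a) for a in t[1:]))  # case (c)
    return set()                                      # case (a): constant

def regs(t):
    """Register support Regs: the registers appearing in the update term."""
    if isinstance(t, str):
        return {t} if t in REGS else set()
    if isinstance(t, tuple) and t[0] == "subst":
        return regs(t[1]) | regs(t[3])
    if isinstance(t, tuple):
        return set().union(*(regs(a) for a in t[1:]))
    return set()

# Update term for x in figure 3.3: min(y{q := z + 3}, x{r := 2}).
upd_x = ("min", ("subst", "y", "q", ("+", "z", 3)), ("subst", "x", "r", 2))

print(sorted(param(upd_x)))
print(sorted(regs(upd_x)))
```

For this update term the parameter support is $\{p, r\}$, which is contained in $P_x$, and all three registers appear in it.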

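Copyless update terms $\dot{t}$ are inductively defined as follows: (a) the leaf terms $c$ and $p$ are always copyless, (b) $\mathrm{op}(\dot{t}_1, \dot{t}_2, \ldots, \dot{t}_k)$ is copyless if each sub-term $\dot{t}_i$ is copyless and the sets $\mathrm{Param}(\dot{t}_i)$ are pairwise disjoint, and (c) $\dot{t}_1\{p := \dot{t}_2\}$ is copyless if $(\mathrm{Param}(\dot{t}_1) \setminus \{p\}) \cap \mathrm{Param}(\dot{t}_2) = \emptyset$.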

A register assignment is a function $f : V \to \mathrm{UpdTerms}$. It is well-formed if the sets $\mathrm{Regs}(f(v))$ are pairwise disjoint and, for each register $v$, (a) $f(v) : T_v$, (b) $f(v)$ is copyless, and (c) $\mathrm{Param}(f(v)) \subseteq P_v$.
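As an illustration, the following sketch checks well-formedness of the register assignment of figure 3.3 and of the alternative assignment from example 3.6. It rests on several assumptions: runtime substitution is encoded as `("subst", t1, p, t2)`, typing is not checked because all terms are integer-valued, and the sub-terms of a substitution are also required to be copyless, which the definition above leaves implicit. All function names are hypothetical.

```python
from itertools import combinations

REGS = {"x", "y", "z"}
P = {"x": {"p", "r"}, "y": {"q"}, "z": {"r"}}

def children(t):
    """Sub-terms of an update term (the bound parameter of a subst is skipped)."""
    if not isinstance(t, tuple):
        return []
    return [t[1], t[3]] if t[0] == "subst" else list(t[1:])

def param(t):
    if isinstance(t, str):
        return set(P[t]) if t in REGS else {t}
    if isinstance(t, tuple) and t[0] == "subst":
        return (param(t[1]) - {t[2]}) | param(t[3])
    return set().union(*map(param, children(t)))

def regs(t):
    if isinstance(t, str):
        return {t} & REGS
    return set().union(*map(regs, children(t)))

def disjoint(sets):
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

def copyless(t):
    # Assumption: sub-terms must themselves be copyless in every case.
    if not all(copyless(c) for c in children(t)):
        return False
    if isinstance(t, tuple) and t[0] == "subst":
        return (param(t[1]) - {t[2]}).isdisjoint(param(t[3]))   # case (c)
    return disjoint([param(c) for c in children(t)])            # cases (a), (b)

def well_formed(f):
    """Conditions (b) and (c) plus disjoint register supports; typing omitted."""
    return (disjoint([regs(t) for t in f.values()])
            and all(copyless(t) and param(t) <= P[v] for v, t in f.items()))

fig33 = {"x": ("min", ("subst", "y", "q", ("+", "z", 3)), ("subst", "x", "r", 2)),
         "y": ("+", "q", 3),
         "z": ("min", "r", 5)}
alt = dict(fig33, x=("min", ("subst", "y", "q", ("+", "z", 3)), "x"))

print(well_formed(fig33), well_formed(alt))
```

The alternative assignment fails because both arguments of `min` may depend on `r`, so their parameter supports are not disjoint.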

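Given the set of registers $V = \{v_1, v_2, \ldots, v_k\}$, a register valuation $\mathrm{val} : V \to \mathrm{Terms}$ and a register assignment $f : V \to \mathrm{UpdTerms}$, we define the subsequent register valuation $\mathrm{val}' = f(\mathrm{val})$ as follows:

\[ \mathrm{val}'(v) = f(v)[v_1 := \mathrm{val}(v_1),\ v_2 := \mathrm{val}(v_2),\ \ldots,\ v_k := \mathrm{val}(v_k)]. \]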
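One way to read this definition operationally is sketched below, under the assumption that a runtime substitution `("subst", t1, p, t2)` is resolved by first instantiating the registers in both sub-terms and then performing ordinary term substitution, as in the calculation of example 3.6. The encoding and the names `instantiate` and `apply_assignment` are hypothetical. Because register values never mention registers ($P_v \cap V = \emptyset$), a single recursive pass behaves like a simultaneous substitution.

```python
def subst(term, param, repl):
    """Ordinary term substitution term[param := repl]."""
    if isinstance(term, str):
        return repl if term == param else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(subst(a, param, repl) for a in term[1:])
    return term

def instantiate(upd, val):
    """Replace registers by their current terms and resolve runtime substitutions."""
    if isinstance(upd, str):
        return val.get(upd, upd)          # register -> its term, parameter -> itself
    if isinstance(upd, tuple) and upd[0] == "subst":
        _, t1, p, t2 = upd
        return subst(instantiate(t1, val), p, instantiate(t2, val))
    if isinstance(upd, tuple):
        return (upd[0],) + tuple(instantiate(a, val) for a in upd[1:])
    return upd                            # integer constant

def apply_assignment(f, val):
    """Compute the subsequent valuation val' = f(val)."""
    return {v: instantiate(f[v], val) for v in f}

# Initial valuation and assignment from example 3.6 / figure 3.3.
val = {"x": ("min", "p", "r"), "y": ("+", "q", 2), "z": "r"}
f = {"x": ("min", ("subst", "y", "q", ("+", "z", 3)), ("subst", "x", "r", 2)),
     "y": ("+", "q", 3),
     "z": ("min", "r", 5)}

val_next = apply_assignment(f, val)
print(val_next["x"])
```

The resulting term for `x` is the unsimplified form of $\min(r + 5, p, 2)$; the sketch does not perform any arithmetic simplification.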

The following result is a sanity check that the above definitions are meaningful:

Proposition 3.7. If $\mathrm{val}$ is a well-formed register valuation, and $f : V \to \mathrm{UpdTerms}$ is a well-formed register assignment, then the subsequent register valuation $\mathrm{val}' = f(\mathrm{val})$ is also a well-formed register valuation.

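Definition 3.8 (SSTT). An SSTT is a tuple $M = (Q, V, \Sigma, T_{out}, \delta, \mu, q_0, \mathrm{val}_0, F, \nu)$, where:

1. $Q$ is a finite set of states,

2. $V$ is a finite set of registers,

3. $\Sigma$ is a finite input alphabet,

4. $T_{out}$ is the type of the output term,

5. $\delta : Q \times \Sigma \to Q$ is the state transition function,

6. $\mu : Q \times \Sigma \times V \to \mathrm{UpdTerms}$ is the register update function such that for each $q$, $a$, the register assignment $\mu(q, a) : V \to \mathrm{UpdTerms}$ is well-formed,

7. $q_0 \in Q$ is the initial state,

8. $\mathrm{val}_0$ is a well-formed initial register valuation,

9. $F \subseteq Q$ is the set of accepting states, and

10. $\nu$ is the output function, which associates each accepting state $q \in F$ with the output term $\nu(q)$ used in equation 3.2.2.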

Semantics. We will define the semantics of SSTTs using configurations, just as we did with SSTs. A configuration of $M$ is a pair $\gamma = (q, \mathrm{val})$, where $q \in Q$ is the current state, and $\mathrm{val}$ is a well-formed register valuation. The initial configuration is $\gamma_0 = (q_0, \mathrm{val}_0)$.

Given a configuration $\gamma = (q, \mathrm{val})$ and an input symbol $a$, the update function $\mu(q, a) : V \to \mathrm{UpdTerms}$ defines a register assignment. We then define the subsequent configuration $\gamma' = (q', \mathrm{val}')$ as: (a) $q' = \delta(q, a)$, and (b) $\mathrm{val}' = \mu(q, a)(\mathrm{val})$. We use the notation $\gamma \xrightarrow{a} \gamma'$ to express the relation that $\gamma'$ is the successor configuration of $\gamma$, and lift this to entire strings $\gamma \xrightarrow{w} \gamma'$ in the usual way.

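Finally, we define the function $\llbracket M \rrbracket : \Sigma^* \to \mathrm{Terms}(T_{out})$ implemented by the SSTT $M$ as follows. Given an input string $w$, say that $\gamma_0 \xrightarrow{w} \gamma_f$, where $\gamma_f = (q_f, \mathrm{val}_f)$. If $q_f \in F$, then:

\[ \llbracket M \rrbracket(w) = \nu(q_f)[v_1 := \mathrm{val}_f(v_1)][v_2 := \mathrm{val}_f(v_2)] \cdots [v_k := \mathrm{val}_f(v_k)]. \tag{3.2.2} \]

Otherwise, $\llbracket M \rrbracket(w)$ is undefined. A function $f : \Sigma^* \to \mathrm{Terms}(T_{out})$ is a regular cost function if there is an SSTT $M$ such that for all input strings $w$, $f(w) = \llbracket M \rrbracket(w)$.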
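The semantics can be read as a simple interpreter loop over configurations. The sketch below is a minimal illustration under assumed conventions: a machine is a Python dictionary with keys `q0`, `val0`, `delta`, `mu`, `F`, and `nu`, terms are nested tuples, and runtime substitution is encoded as `("subst", t1, p, t2)`. The example machine is hypothetical and does not appear in figure 3.4. An undefined output is represented by `None`.

```python
def subst(term, param, repl):
    """Ordinary term substitution term[param := repl]."""
    if isinstance(term, str):
        return repl if term == param else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(subst(a, param, repl) for a in term[1:])
    return term

def instantiate(upd, val):
    """Replace registers by their current terms and resolve runtime substitutions."""
    if isinstance(upd, str):
        return val.get(upd, upd)
    if isinstance(upd, tuple) and upd[0] == "subst":
        _, t1, p, t2 = upd
        return subst(instantiate(t1, val), p, instantiate(t2, val))
    if isinstance(upd, tuple):
        return (upd[0],) + tuple(instantiate(a, val) for a in upd[1:])
    return upd

def run(machine, word):
    """Return the output term of the machine on word, or None if undefined."""
    q, val = machine["q0"], dict(machine["val0"])
    for a in word:
        f = machine["mu"][(q, a)]                      # register assignment mu(q, a)
        val = {v: instantiate(f[v], val) for v in f}   # val' = mu(q, a)(val)
        q = machine["delta"][(q, a)]                   # q' = delta(q, a)
    if q not in machine["F"]:
        return None
    return instantiate(machine["nu"][q], val)          # equation 3.2.2

# Hypothetical machine: accepts words of even length over {a};
# register x starts as the parameter p and is incremented on every symbol.
machine = {
    "q0": "even", "val0": {"x": "p"}, "F": {"even"},
    "delta": {("even", "a"): "odd", ("odd", "a"): "even"},
    "mu": {("even", "a"): {"x": ("+", "x", 1)}, ("odd", "a"): {"x": ("+", "x", 1)}},
    "nu": {"even": ("min", "x", 10)},
}

print(run(machine, "aa"))
print(run(machine, "a"))
```

The output is a term over the parameter `p` rather than a number, which reflects the fact that registers carry syntactic objects.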

We present some example machines in figure 3.4. While parameters are required for full expressive power, in most cases of practical interest, parameter-free SSTTs suffice, as shown in the figure. Furthermore, the definition of SSTTs we give here, where each register holds an arbitrary term, is slightly more permissive than the original definition of [13], where each register contained at most one hole named “?”. By theorem 3 of [11], it turns out that the two definitions are equivalent.

SSTTs generalize the class of functions expressible by SSTs. If the cost domain is restricted to a monoid $(D, \cdot, 1_D)$, then the following theorem states that the classes of functions expressible by SSTs and by SSTTs coincide. This theorem originally appears as theorem 3 of [13], and states that “$Fc(D, \otimes) = R(D, \otimes)$”. When rephrased in the terminology of this thesis, it states:

Theorem 3.9 (Theorem 3 of [13]). Let $(D, \cdot, 1_D)$ be a monoid, and $\Sigma$ be a finite input alphabet. Let $F$ be the class of functions $\Sigma^* \to D$ expressible by SSTs. Let $R$ be the class of functions expressible by SSTTs where all registers $v$ and all terms $t$ appearing in the description are of type $D$, and the only operation appearing in $t$ is the monoid operation $\cdot$. Then $F$ and $R$ are equal.

Non-determinism. Once again, we can replace the state transition function $\delta : Q \times \Sigma \to Q$ with a state transition relation $\Delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$ and the register update function $\mu : Q \times \Sigma \times V \to \mathrm{UpdTerms}$ with $\mu : \Delta \times V \to \mathrm{UpdTerms}$ to obtain non-deterministic SSTTs. A non-deterministic machine $M$ defines a relation, $\llbracket M \rrbracket \subseteq \Sigma^* \times \mathrm{Terms}(T_{out})$. The machine is unambiguous if for each string $w$, there is at most one accepting path $q_0 \xrightarrow{w} q_f$. For unambiguous machines, the denoted object is once again a partial function $\llbracket M \rrbracket : \Sigma^* \to \mathrm{Terms}(T_{out})$. Along similar lines as theorem 3.4, we have the following result:

Theorem 3.10. If $M$ is an unambiguous SSTT which maps input strings $w \in \Sigma^*$ to outputs $t \in \mathrm{Terms}(T_{out})$, then there is an effectively constructible (deterministic) SSTT $M_d$ such that $M_d$ computes the same function as $M$.

[Figure 3.4 diagrams: state-transition diagrams of the machines $M_3$, $M_4$, and $M_5$.]

Figure 3.4: (a) $M_3$ computes the function coffee from example 2.7. (b) $M_4$ computes the function #avgC from example 2.13. (c) $M_5$ computes the function bal from example 2.14.

Proof. We first construct an SST $M_{d1}$ which maps an input string $w$ to the (necessarily unique) accepting path through $M$. $M_{d1}$ is identical to the construction presented in theorem 3.4.

The second SSTT $M_{d2}$ maps each accepting path $\pi$ through the original machine $M$ to the final value computed. By theorem 7 of [11], a deterministic machine $M_d$ can be constructed which computes the same function as the composition $\llbracket M_{d2} \rrbracket \circ \llbracket M_{d1} \rrbracket$.

Remark 3.11. As with SSTs, we will only use unambiguous SSTTs for the proof of theorem 4.1. The adjective “deterministic” will always be implicit in our usage of the unqualified term “SSTT”.

Converting Function Expressions into Transducers

QREs and DReX expressions can express exactly the class of regular functions. In this chapter, we will prove the first half of this claim, by presenting a translation algorithm from DReX expressions to SSTs and from QREs to SSTTs. The algorithms are mostly straightforward generalizations of the classical algorithm which converts regular expressions into equivalent DFAs, assuming the determinization procedure of theorems 3.4 and 3.10. Formally, the main result of this chapter is the following:

Theorem 4.1 (Closure). If $\Sigma$ is a finite input alphabet, then:

1. If $e$ is a consistent DReX expression, then we can construct an SST $M$ which computes $\llbracket e \rrbracket$.

2. If $e$ is a consistent QRE, then we can construct an SSTT $M$ which computes $\llbracket e \rrbracket$.

We made a stylistic decision in chapter 3 to present SSTs and SSTTs as operating over a finite input alphabet $\Sigma$. This is consistent with their original descriptions [10, 13], but narrows the immediate applicability of the results of this chapter to finite input alphabets. Theorem 4.1 will readily generalize if a determined reader rephrases the definitions of regular machines to operate on symbolic predicates.

4.1 From DReX to SSTs

We will now prove the first part of theorem 4.1, that every DReX expression can be translated into an equivalent SST. We will divide the proof into two parts: In the first part, lemma 4.2, we will show that every DReX expression that does not contain an occurrence of the chain or left-chain combinators can be translated into an equivalent SST. The construction will proceed by induction on the expression $e$, and is very similar to the familiar conversion from regular expressions to NFAs. While we could, in principle, carry out a similar construction even for the case of the chained sum, the details would be messy and unenlightening. Instead, in lemma 4.3, we construct a desugaring from the chained sum to function composition and the remaining DReX operators, and we already know that SSTs are closed under function composition (theorem 3.3). The first part of theorem 4.1 will follow from lemmas 4.2 and 4.3.

Lemma 4.2. If $e$ is a consistent DReX expression without any occurrences of the chain or left-chain combinators, then we can effectively construct an SST $M$ which computes $e$.

Proof. The proof is by induction on the expression $e$. For each case, we will construct an unambiguous SST $M_u$, which can then immediately be determinized by applying theorem 3.4. For the induction hypotheses, we will assume deterministic SSTs for each sub-expression. We present several of the cases graphically in figure 4.1. For each machine constructed, we need to establish: (a) copylessness, (b) unambiguity, and (c) that it computes the relevant expression: well-parsed strings correspond to accepting paths and vice-versa.

Base cases: These cases are straightforward, as presented in figures 4.1a, 4.1b, and 4.1c. The machines are trivially copyless, unambiguous, and compute the appropriate base functions.

$\mathrm{combine}(e, f)$: We perform the traditional product construction. Given SSTs $M_e$ and $M_f$ for $e$ and $f$ respectively, we construct a machine $M$ with state space $Q = Q_e \times Q_f$ and register set $V = V_e \uplus V_f$. Here $A \uplus B = (\{0\} \times A) \cup (\{1\} \times B)$ is the disjoint union operator. The initial state of $M$ is $(q_{0e}, q_{0f})$, and the set of accepting states is $F = \{(q_{fe}, q_{ff}) \mid q_{fe} \in F_e,\ q_{ff} \in F_f\}$. The output function in state $(q_{fe}, q_{ff})$ is given by $\nu_e(q_{fe}) \cdot \nu_f(q_{ff})$, where the component output functions $\nu_e$ and $\nu_f$ are implicitly renamed to the new registers $\{0\} \times V_e$ and $\{1\} \times V_f$ respectively. For each symbol $a \in \Sigma$, and all synchronized transitions $q_e \xrightarrow{a} q_e'$ of $M_e$ and $q_f \xrightarrow{a} q_f'$ of $M_f$, we create a transition $(q_e, q_f) \xrightarrow{a} (q_e', q_f')$ in $M$. Finally, the registers are updated separately: $\mu((q_e, q_f), a, v_e) = \mu_e(q_e, a, v_e)$ for each register $v_e$ of $M_e$, and $\mu((q_e, q_f), a, v_f) = \mu_f(q_f, a, v_f)$ for each register $v_f$ of $M_f$. Once again, the registers in these equations are implicitly renamed to the appropriate registers of the product machine.

Because the registers are maintained separately, and have no interaction except in the output functionν, copylessness is trivial.

Consider a potentially ambiguous input string $w$, with two different accepting paths $\pi_1$ and $\pi_2$ through the product machine. Project each of these paths onto the component machines, to obtain paths $\pi_{1e}$, $\pi_{1f}$, $\pi_{2e}$, $\pi_{2f}$. Since $\pi_1 \neq \pi_2$, it follows that either $\pi_{1e} \neq \pi_{2e}$ or $\pi_{1f} \neq \pi_{2f}$. Each of these possibilities leads to the conclusion that we have discovered an ambiguous string for one of the component machines, and this violates the induction hypothesis that the machines are deterministic. We therefore conclude that $M$ is unambiguous.
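The product construction for the combine case can be sketched in a few lines. The code below is an illustration under assumed conventions rather than the construction as formally stated: machines are Python dictionaries, update expressions are nested tuples, the tagged register $(i, v)$ is written as the string `"i.v"`, and the combining operation is passed in as the symbol `op`. The initial valuation of the product is also an assumption, since the text does not spell it out. The two component machines are hypothetical examples.

```python
from itertools import product

def rename(term, tag, regs):
    """Rename every register v in the term to the tagged register 'tag.v'."""
    if isinstance(term, str):
        return f"{tag}.{term}" if term in regs else term
    if isinstance(term, tuple):  # (op, t1, ..., tk)
        return (term[0],) + tuple(rename(a, tag, regs) for a in term[1:])
    return term

def tagged(updates, tag, regs):
    """Rename both the targets and the right-hand sides of a register assignment."""
    return {f"{tag}.{v}": rename(t, tag, regs) for v, t in updates.items()}

def combine(me, mf, op):
    """Product of two deterministic machines over the same alphabet."""
    delta, mu = {}, {}
    for qe, qf, a in product(me["Q"], mf["Q"], me["sigma"]):
        # Synchronized transition: both components read the same symbol.
        delta[((qe, qf), a)] = (me["delta"][(qe, a)], mf["delta"][(qf, a)])
        # Registers of the two components are updated separately.
        mu[((qe, qf), a)] = {**tagged(me["mu"][(qe, a)], 0, me["V"]),
                             **tagged(mf["mu"][(qf, a)], 1, mf["V"])}
    F = set(product(me["F"], mf["F"]))
    nu = {(qe, qf): (op, rename(me["nu"][qe], 0, me["V"]),
                     rename(mf["nu"][qf], 1, mf["V"])) for qe, qf in F}
    return {"Q": set(product(me["Q"], mf["Q"])), "sigma": me["sigma"],
            "V": {f"0.{v}" for v in me["V"]} | {f"1.{v}" for v in mf["V"]},
            "delta": delta, "mu": mu, "q0": (me["q0"], mf["q0"]),
            # Assumption: the initial valuation is the tagged union of both.
            "val0": {**tagged(me["val0"], 0, me["V"]), **tagged(mf["val0"], 1, mf["V"])},
            "F": F, "nu": nu}

def counter(symbol):
    """Hypothetical one-state machine counting occurrences of one symbol."""
    return {"Q": {"s"}, "sigma": {"a", "b"}, "V": {"x"}, "q0": "s", "F": {"s"},
            "delta": {("s", c): "s" for c in "ab"},
            "mu": {("s", c): {"x": ("+", "x", 1) if c == symbol else "x"} for c in "ab"},
            "val0": {"x": 0}, "nu": {"s": "x"}}

m = combine(counter("a"), counter("b"), "+")
print(m["nu"][("s", "s")])
print(m["mu"][(("s", "s"), "a")])
```

Both components use a register named `x`, and the tagging keeps the two copies apart, which is the role played by the disjoint union in the construction.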