• No results found

Anytime Algorithms for Speech Parsing?

N/A
N/A
Protected

Academic year: 2020

Share "Anytime Algorithms for Speech Parsing?"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Anytime Algorithms fl)r Speech Parsing?*

Gfinl;her ( ] 6 r z M a r c u s K e s s c l e r U n i v o r s i t ; y of l,'zlange|>Niirnberg, I M M D V I I I

goerz@inf ormat ik. uni-erl angen, d e

T O P I C A I , PAI']~I/

K e y w o r d s : a n y t i m e M g o r i t h m s , p;~rsing, s p e e c h a n a l y s i s

A b s t r a c t

This paper discusses to which extent the concept of "anytime algorithms" can be applied to parsing algo- rithms with feature unification. We first try to give a more precise definition of what an anytinm algorithm is. We arque that parsing algorithms have to be clas- sified as contract algorithms as opposed to (truly) it> terruptible algorithms. With the restriction that the transaction being active at the time an inl,errupt, is is- sued has to be COml)leted before the interrupt cart be executed, it is possible to provide a parser with linritcd anytime t)ehavior, which is in fact t)dng realized in our re.search l)rototype.

1

I n t r o d u c t i o n

The idea of '%nylime algorithms", which has been around in the tieht of plmming for some time 1, has recently been suggested for application in natural lan- guage and speech l)rocessing ( N L / S P ) 2. An anytime algorithm is an algorit.hm "whose quality of results (legrades graceflflly a~s computation time decreases" ([Russell attd Zilt)erstein 1991], p. 212). In the follow- ing we will first give a more specilic definition of which properties allow an algorithm to be implemented and used as an anytime algorithm. We then apply this knowledge to a specitic aspect of NL/SP, namely pars- ing algorithms in a speech understanding system. In the Appendix we present the A I)C protocol which sup- ports anytime computations.

We will discuss these matters in the framework of the Verbmobil joint research project a, where we are working on the implementation of an incremental chart parser 4. The conception of this I)arser has been derived from earlier work by the llrst author 5.

lef. e.g. [llussell m M Z i l b e r s t e i n 1991]

P'so [ W a h l s t e r 1992] in his i n v i t e d t a l k at CO1,[N(I-92 a ~lThe V e r b m n b i l j o i n t r e s e a r c h p r o j e c t h a s b e e n defined in t h e d o c u m e n t [ V e r b m o b i l tl.eport 1991]

4 t h e V e r l ) m o l , i l / 1 5 p a r s e r , of. [ W e b e r 1!)!)'2] Sthe G u L P p a r s e r , of. [Gi~rz 1988].

2

A i , y t i m e A l g o r i t h m s

[1)etm and Boddy 1988] give the fi)llowing characteri~ zation of anytime algorithms:

1.

2.

3.

they lend themselves to preemptive scheduling techniques (i.e., they cart bc suspended and re- sumed with negligible overhead),

they can be terminated at any time and will return

SOl[le a n s w e r ~ a n d

the attswers reI, urned iml)rove in some welt- behaved maturer as a function of time.

Unforl,unately this characterization does not make a clear distinction between the intplementation of an algorithm and tile algorithm as such.

Point (1) is true of a great many Mgorithms imple- mented on preenq)tive operatirLg systems.

Poin~ (2) can be made true for any algorithm by adding all explicit R e s u l t slot, that is I)reset by a wdue. denoting a w)id result. I,et us call the implementation of an anyl;inm algorithm an anytime producer. Accord- ingly we ttanle the entity interested in the result of such an anytime computation the anytime consumer. Fig- urc 1 shows two such processes in a tightly coupled synchronization loop. Figure 2 shows the same com- municating processes decoupled by the introduction of the R e s u l t slot. Note that synrhronisation is much cheaper in terms of perceived complexity R)r the pro- gramrne.r and runtime synehronisation overhead (just the time to cheek and eventually traverse the mutual exclusion barrier). In such an architecture producer and consumer work under a regime that allows the consmner to interrupt the producer at any lime and dentand a result. The risk that the consumer incurs by such flexibility is a eertMn non-zero probability that this result is void ~ or mtchanged since the last result retrievah

(2)
[image:2.612.132.266.83.159.2] [image:2.612.98.298.217.335.2]

9

-""

... % )

Result

Anytime

Anytime

Consumer

Producer

Figure 1: Tightly coupled processes with complex syn- chronization internals.

Result Slot

Mutex

(preset with

Barrier

"VOID")

~',

\ / '\ ,/

& ~ .~...~

? "X

/+

- f

t /

M . / ' 7 , ; 7 d , - . . . .

, . _ - / M . /

Anytime

Anytime

Consumer

Producer

Figure 2: Processes decoupled by using a result slot protected by a simple mutual exclusion barrier.

Point (3) is surely a much too strong restriction, since it is not always possible to define what exactly

an improvement is for any given algorithm. In NL/SP,

where we are often dealing with scored hypotheses, it is difficult, if not impossible, to devise algorithms that supply answers that improve monotonically as a flmc- tion of invested computational resources (time or pro- cessing units in a parallel architecture).

We propose the following characterization of any- time algorithms:

An algorithm is fit to be used as an anytime producer if its implementation yields a pro- gram that has a Result Production Granular-

ity (RPG) that is cmnpatible with the time

constraints of the consumer.

The notion of R P G is based on the following obser- vation: Computations being performed on f n i t e state machines do not proceed directly from goal state to goal state. Instead they go through arbitrarily large sequences of states that yield no extractable or intelli- gible data to an outside observer. To interrupt a pro- ducer on any of these intermediate states is fruitless, since the result obtained could at best, according to the observation made on point (2) above, be the result that was available in the last goal state of the producer. From the point of view of the consumer the transitions from goal state to goal state in the producer are atomic transactions.

The average length of these transactions in the al- gorithm correspond to average time intervals in the im- plementation, so that we can speak of a granularity with which results are produced.

The time constraints under which the eonsumer is operating then give the final verdict if the implemeuta- tion of an algorithm is usable as an anytime producer. Let us illustrate this by an example: In a real-time NL/SP-system tim upper bound for the R P G will I, yp- ieally be in the range of 10 lOOms. T h a t is, a producer implemented with such an R P G ofl>rs the consumer the chance to trade a 500ms delay for 5 to 50 fllrther potential solutions.

Note that goal states can also be associated with

intermediate results in the producer algorithm. Con-

ceptually there really is not much of a difference be- tween a result and an intermediate result,, but in highly optimized implementations there might be the need to explicitly export such intermediate results, due to data representation incompatibilities or simply because the data might be overwritten by other (non-result) data. Section 4 gives an example of how the R P G of an imple- mentation can be reduced by identifying intermediate goal states that yield information which is of interest to the consumer.

3

B r e a d t h and D e p t h of A n a l y -

sis

In the following we will ask whether and how the idea of anytime producers can be applied within the active chart parsing algorithm scheme with feature unifica- tion. Although the analogy to decision making in plan- ning where the idea of anytime algorithms has been developed seems to be rather shallow, we can, for the operation of the parser, distinguish between depth

and breadth of analysis 7.

We define depth of analysis as the concept refering to the growing size of information content in a fea- ture structure over a given set of non-competing word hypotheses in a certain time segment dur- ing its computation. Larger depth corresponds to a more detailed linguistic description of the same objects.

In contrast, we understand by breadth of analy- sis the consideration of linguistic descriptions re- sulting from the analysis of growing sets of word hypotheses, either from growing segments of the utterance to be parsed or from a larger number of competing word hypotheses in a given time seg- ment.

q'o regard breadth of analysis as a measure in the context of the anytime algorithm concept is in a sense

(3)

trivial: Considering only one l)arse, the more process- ing time the parser is given the larger the analyzed segment of the i n p u t u t t e r a n c e will be. In general, larger breadth corresponds to more information a b o u t competing word hypotheses in an (half-) open time in- terval as opposed to more information a b o u t a given word sequence. So, obviously, b r e a d t h of analysis does not correspond to w h a t is intended by the concel)t of a n y t i m e algorithms, whereas d e p t h of analysis meets the inliention.

If an utterance is syntactically ambiguous, we (:an c o m p u t e more parses the more processing time the parser is given. Therefore, tohis case is a p a r t , icular instance of depth of analysis, beeaase the same word sequence is considered, and not of b r e a d t h of analysis given the definition above. In this case one would like to get the best analysis in t e r m s of the quality scores of its constituents first, and other readings late,', ordered by score. If the parser works incrementally, what hap- pens to be the case for the V e r b m o b i l / 1 5 parser s, the intended effect car, be achieved by the a d j u s t m e n t of a strategy p a r a m e t e r namely to report the analysis of a g r a m m a t i c a l f r a g m e n t of the i n p u t utterance as soon as it is found.

At least one distinction m i g h t be useful for the V e r b m o b i l / [ 5 parser. In our parser a category check is performed on two chart edges for eIficiency reasons, and only if this check is successflfi, the unificatkm of the associated feature structures is performed, llence, an i n t e r r u p t would be admissible after ,,he category check. In this case we emphasize a factorization of the set; of constraints in two distinct subsets: phrasal constraints which are processed by the act.iw~ chart parsing algo- r i t h m schema (with l)olynomial complexity), and func- tional constraints which are solved by the unification a l g o r i t h m (with exponential complexity). ' r h e interface between b o t h types of constraints is a crucial place for the i n t r o d u c t i o n of control in the parsing process in general 9

Since we use a constraint-hased g r a m m a r formal- ism, whose central operation is the unification of fea- ture structures, it does not make sense to a d m i t inter rupts at any time. Instead, the operation of the parser consists of a sequence of transactions. At the most coarse grained level, a transaction would be an appli- cation of the f l m d a m e n t a l rule of active chart t)arsing, i.e. a series of operations which ends when a new edge is introduced into the chart, including the c o m p u t a t i o n of the feature s t r u c t u r e associated with it. Of course this a r g u m e n t holds when an application of the fun- d a m e n t a l rule results in a n o t h e r application of it on s u b u n i t s due to the reeursive structure of the g r a m m a r ruleQ °. Certainly one m i g h t ask whether a smaller grain size makes sense, i.e. the construclion of a fea- ture structure should itself he interruptible. In this case one could t h i n k of the possibility of au interrupt.

Sand for G u l , t ' as well 9 cf. [Maxwell a n d K a p l a n 1994]

l ° T h i s h,'ts been i m p l e m e n t e d in the i n t e r r u p t system of (lul,l) [Ggrz 1988].

after one feature in one of the two feature s t r u c t u r e s to be unified has been l)roeessed. We think t h a t this possibility shouhl be rejected, since feature structures usually contain eoreli'.rences. If we consider a partial feature s t r u c t u r e - - as in an i n t e r m e d i a t e step in the unitication of two feature structures in the s i t u a t i o n where j u s t one feature has been processed, this struc- ture m i g h t not be a realistic partial description of the part of speech under consideration, but simply inad- equate as long as not all e m b e d d e d eoreferences have been established. It seems obvious t h a t the grain size c a n n o t be meaningfully decreased below the processing of one feature. Therefore we decided t h a t t r a n s a c t i o n s m u s t be defined in terms of c o m p u t a t i o n s of whole fea- ture structures.

Nevertheless, a possibility for i n t e r r u p t i n g the com- p u t a t i o n of a feature s t r u c t u r e could be considered in case the set of featnre, s is divided in ~wo classes: fea- tures which are obligatory and features which are op- tional. Members of the last group are candidates for c o n s t r a i n t relaxation which seems to be relevant with respect to robustness at least in the case of speech parsing. We have j u s t s t a r t e d to work on the c o n s t r a i n t relaxation problem, but there is no d o u b t t h a t this is an i m p o r t a n t issue for further research. Nevertheless, at the time being we d o u b t whether the above men- tione.d problem with coreferences couht be avoided in this case.

A further o p p o r t u n i t y for i n t e r r u p t s comes up in cases where the processing of alternatives in unifying disjm)ctiw~' feature structures is delayed. In this case, unilication with one of the disjuncts can be considered as a transaction.

Another chance R)r the i m p l e m e n t a t i o n of a n y t i m e behavior in parsing arises if we consider the gram- m a r from a linguistic perspective ~ oppose.d to the purely formal view taken above. Since semantic con- struction is done by our g r a m m a r as well, the func- tional constraints contain a distinct subset for the pur- pose of s e m a n t i c construction. In a separate b, vesti- gation [Fischer 1994] i m p l e m e n t e d a version of A-I)t{;I ~ [l)inkal 1993] within the. same feature unification fo> realism which buihts semantic structures within the framework of Discourse Representation Theory. It has been shown t h a t the process of DRS construction can be split in two types of transactions, one which can be performed incrementally basically the construction of event representations w i t h o u t t e m p o r a l information - - a n d a n o t h e r one which c a n n o t be concluded before the end of an u t t e r a n c e has been reached - - supplying t e m p o r a l information. Since the first kind of transac- tions represents meaningfnl partial semantic analyses those can be supplied i m m e d i a t e l y on d e m a n d under au a n y t i m e regime.

(4)

4

F e a t u r e

U n i f i c a t i o n

as

a n

A n y t i m e A l g o r i t h m ?

Up to now, in our discussion of an appropriate grain size for the unification of feature structures we consid: ered two cases: the unification of two whole feature structures or the unification of parts of two feature structures on the level of disjuncts or individual fea- tures..In all of these cases unitication is considered as a single step, neglecting its real cost, i.e. time constraints would only affect the number of unification steps, but not the execution of a particular unification operation. Alternatively, one might consider the unification algo- rithm itself as an anytime algorithm with a property which one might call "shallow unification". A shallow unification process would quickly come up with a first, incomplete and only partially correct solution which then, given more computation time, would have to be refined and possibly revised. It seems that this prop- erty cannot be achieved by a modification of existing unification algorithms, but would require a radically different approach. A prerequisite for that would be a sort of quality measure 11 tbr different partial feature structures describing a given linguistic object which is distinct from the subsumption relation. To our knowl- edge, the definition of such a measure is an open re: search question.

5

C o n c l u s i o n

According to [Russell and Zilberst, ein 1991] parsing al- gorithms with feature unification have to be classified as contract algorithms as opposed to (truly) interrupt- ible algorithms: They must be given a particular time allocation in advance, because interrupted at any time shorter than the contract time they will not yield useflll results. At least the transaction which is active at the time an interrupt occurs has to be completed before the interrupt can be executed. With this restriction, it is possible to provide a parser with linqited anytime behavior, which is in fact being realized in the current version of the Verbmobil/15 parser.

A c k n o w l e d g e m e n t s . The authors would like to thank Gerhard Kraetzschmar, Herbert Stoyan, and Hans Weber for w~luable comments on a previous ver.- sion of this paper.

R e f e r e n c e s

[Dean and Boddy 1988]

Thomas Dean and Mark Boddy: An Analysis of

'I]me-Dependent Planning. AAAI 1988, 49--54

[Dongarra, Geist, Manchek and Sundaram 1993] Jack Dongarra, G. A. Geist, Robert Manchek and V. S. Sundaram: Integrated PVM Framework Supports

11 c.f. [Russell a n d W e f a l d 1989]

Heterogeneous Network Computing. Comlmters in

Physics, Vol. 7, No. 2, 1993, 166-175

[Fischer 1994] Fischer, I.: Die kompositionelle Bildung

yon .Diskursrepriisentationsstrtzkturen fiber einer

Chart. Submitted to KONVENS 94, Vienna.

[Gfrz 1988] G5rz, G.: Struktm'analyse natfirli&er

Spra&e. Bonn: Addison-Wesley, 1988

[Maxwell and Kaplan 1.994] Maxwell, J.T, Kaplan, R.: The Interface between Pbr<~sal and F~metional

Constraints. Computational l,inguistics, Vol. 19,

1994, 571- 590

[Pinkal 1993] PinkM, M.: Semantik. In: Gfrz, G. (Ed.): Einffihr,ng in die Kfinstliehe Intelligenz. Bonn: Addison-Wesley, 1993, 425-498

[Russell and Wefald 1989] Rnssell, S.J. and Wefald, E:

Principles of Metareasoning. Proc. KR-89, 1989,

400 411.

[Russell and Zilberstein 1991] Russell, S. a., Zilber- stein, S.: Composing Real-Time Systems. Proc. I3CAI-91, Sydney, 1991, 212-217

[Verbmobil Report 1991] Verbmobil Konsor- titan (13¢1.): Verbmobil- Mobiles Dohnets&ge- rat. BMFT Report, Miinchen, 1991

[Wahlster 1992] Wahlster, W.: Complltational Models of Face-to-Face Dialogs: Multimodality, Negotia-

tion and Translation. Invited talk at COLING-92,

Nantes, 1992. Not contained in the proceedings; copies of slides are available from the author. [Weber 1992] Weber, 1I.: Chart Parsing m ASI, ASL

Tecbnical Report ASL-TR-28-92/UER, Univer- sity of 1;rlaugen-Niirnberg, IMMD VIII, Erlangen, 1992

A p p e n d i x :

A P r o t o c o l for A n y -

t i m e

P r o d u c e r / C o n s u m e r

P r o -

c e s s e s

In the following we introduce the A P C (Anytime Pro-

ducer Consumer) protocol which allows for easy estab-

lishment of anytime producer/consumer relationships on parallel architectures.

Let P r o d u c e r be the flmction that implements the producer algorithm. In a purely sequential procedural call/return implementation this function would have a control structure similar to:

(defml Producer (...) (Initialize)

(let ((Result nil))

(~hile (not (GoodEnough? Result)) (ImproveResult))

(5)

The RP(] of P r o d u c e r is at least t h a t of the func- tion ImproveResult. It is finer if ImproveResult is itself made of loops that produce intermediate results that are ext)ortable to consumers.

q'he consumer is ilnplemented as the function

C o n s u m e r , t h a t at some point calls the l)roducer: (defun Consumer ( . . . )

( P r o d u c e r . . . )

We now translate P r o d u c e r and C o n s u m e r into parallel processes , s i n g the A P C protocol, which is directly implemented by functions t h a t act as in- terfaces to the underlying c o m m u n i c a t i o n / s y n c h r o - nization system. All functions implementing the protocol have the prefix APC: (In our imphunenta- | i o n all of them are in the Conmlon~l,isp package

a n y t | m e - p r o d u c e r - c o n s u m e r ) .

(defun AnytimeProducer (...) (Initialize)

(let ((Result nil))

(while (not (GoodEnough? Result)) (ImproveResult)

;; Make Result available to consumers (APC:SetResult! Result)

;; Check for messages/instructions ;; from Consumer

(APC:CheckStatus)

Result))

In a paralM i m p l e m e n t s | | o r , it is not sullicient for the consumer to simt)ly call the producer. The pro° ducer has to be spawT~ed or forked as a separate process:

(defun AnytimeConsumer ( . . . )

• Create a new p r o c e s s ( l e t ( ( P - A n y t i m e P r o d u c e r - 1

( A I ' g : S t a r t P r o c e s s ( A n y t i m e P r o d u c e r . . . ) ) ) )

( l e t ( ( R e s u l t

(hPC:GetResult P - h n y t i m e P r o d u c e r - 1 ) ) ) (while (not (ConsumerGoodEnough? Result))

; I)o something else, like going to sleep ; to give tile producer some more time

(setf Result,

(APe : G e t R e s n l t P - A n y t i m e P r o d u c e r - 1 ) )

) )

(APe : h b o r t P r o c e s s P-AnytimeProducer-:l) )

The APC Pr()/;()('()l

A P C : S t a r t P r o c e s s F starts a new process in which the procedure F is executed. This function is also responsibh; for tile creation of the protected R e s u l t slot. APe : S t a r t P r o c e s s returns the id of the new process.

No~e that an arbitrm:y number of producers may be started by a consnlner. A prodtlcer may o[' course also start other producers.

A P C : A b o r t P r o c e s s P r o c aborts the process P r e c .

A P C : S e t R e s u l t ! R sets the value of t h e R e s u l t slot to R.

A P C : G e t R e s u l t P • retrieves the current value of

the R e s u l t slot from process P. R e m e m b e r t h a t APC:SetResult ! and APC:GetResult avoid

r e a d / w r i t e conflicts by a locking m e c h a n i s m t h a t implements mutual exclusion.

APC : g e s e t P r o c e s s P r o c I - restarts the process P r o c with new input I.

A P C : C h e c k S t a t u s [ P r o c ] check if any inessages or instructions have arrived from Proc. Often par- allel soft;ware environments offer only very crude process scheduling and control primitives. The user may have to implement sortie of t h e m by himself. A P C : R e s e t P r o e e s s , for example, is (lit" ticult to formulate in a general way. R e s e t

can also involve, ltla.intenance or eleannp work, which is clearly beyond any process-oriented im- ph'.mentation of R e s e t . 'l'he idea is t h a t these user implemented control procedures are hooked into h P C : C h e c k S t a t u s [ P r o c ] . 'lk) a|,tain a line- grained control relationship between consunter and t)roducer, the user simply inserts APC : CheckStatus

at key-positions in the code.

The AP(; protocol has been i m p l e m e n t e d aud tested under a coarse grained p a r a l M C o m m o n l,ist) System running on a four processor SUN- SPARC MP-670. UNIX IPC 1~ shared m e m - ory and s e n | s p h e r e s are used to i m p l e m e n t the h)w-level communica.tion and synchronisation facil- ities. We are currently porting the system to Solaris 2.3, with PVM (Parallel Virtual Machine, see [l)ongarra, Geist, Manchek and S n n d a r a m 1993]) as the basic c o m m u n i c a t k m facility. IWM would al- low us to mow~ our parallel system h'om tile current high communication and low memory b a n d w i d t h im plementation on a shared m e m o r y machine, to a low c o m m u n i c a t i o n / h i g h memory b a n d w i d t h implementa- tion tutoring on a cluster of workstations.

Figure

Figure 2: Processes decoupled by using a result slot protected by a simple mutual exclusion barrier

References

Related documents

Intensified RITA treatment did not increase CDKN1A expression, but increased MDM2 expression (P = 0.01) and activated the TP53 pathway in the xenograft tumor tissue

LCMAPS is a Local Credential MAPping Service [19] which allows the acquisition of credentials (like Unix user ids) for Grid jobs that run on the Local fabric.. LCMAPS offers

Ignoring the uncertainty in the modeling parameters and solely relying on the MAP model (corresponding to the maximum of the posterior PDF) or the MLE model (corresponding to

Common-mode RF currents couple onto the 2-wire 5 V power supply cable and are radiated, as indicated by a current probe that has been clamped around the dc input power cable

The benefit of monitoring community tension and sharing information between communities and partners is primarily to support communities to develop their own solutions and

there is a general misconception, borne of a mix laissez-faire ideo- logy and myth that Europe is subsidising its farmers beyond all reason and ~conomic sense,

débouche sur une concep- tion plus élevée des rap- ports entre nations, sur le dépassement de l'égoïsme national, sur une vision d'avenir commun. Les trente

Additionally, the internal consistence of external efficacy and confidence is not influenced by education, so it is more likely that respondents with low education just