• No results found

Plug and Play Speech Understanding

N/A
N/A
Protected

Academic year: 2020

Share "Plug and Play Speech Understanding"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

Manny Rayner, Ian Lewin

& Genevieve Gorrell

netdecisions Ltd

Wellington House,

East Road, CambridgeCB1 1BH, UK

manny.raynerjian.lewinjgen eviev e.go rrel l

@netdecisions.com

Johan Boye

Telia Research

S-12386Farsta, Sweden

[email protected]

Abstract

Plug and Play is an increasingly

im-portant concept in systemand network

architectures. We introduce and

de-scribe a spoken language dialogue

sys-tem architecture which supports Plug

and Playablenetworks of objects in its

domain. Eachdeviceinthenetwork

car-riesthelinguistic anddialogue

manage-mentinformationwhichispertinenttoit

and uploads it dynamically to the

rele-vantlanguageprocessingcomponentsin

the spoken language interface. We

de-scribethe currentstate ofour plugand

play demonstratorand discuss

theoreti-calissuesthatarisefromourwork.Plug

and Play forms a central topic for the

DHommeproject.

1 Introduction

ThenotionofPlugandPlayndsitsmost

natu-ralhomeintheworldofnetworkedhomedevices,

whereitoersatleastthefollowingtwoimportant

properties

thenetwork ofdevices isdynamically

recon-gurableasdevicesarebroughtonlineor

dis-appearoine

zerore-congurationbytheuserisrequired

FrameworksforachievingPlugandPlay

gener-allyaddressthisbyincludingatleastthefollowing

devicesannouncethemselvesonthe network

when theyare pluggedinto it (andalso

dis-covertheexistenceofothers)

devices describe their own capabilities,

pro-vide a means for accessing them and can

queryandaccess thecapabilities ofothers

devicesshouldsupport,wherepossible,

seam-Plug and Play is, not surprisingly, viewed as

apre-requisite forthecommercialsuccess of

net-worked devices in the home. There are already

severalpromisingcandidateplatformsfor

achiev-ing thenecessaryfunctionality,including

Univer-salPlug and Play(UPnP) (Microsoft, 2000)and

Jini (Oaks and Wong, 2000). In this paper, we

address the requirementson spoken dialogue

in-terfaces that arisefrom aplug andplay domain.

We also presentthe current state of ourEnglish

languageplugandplaydemonstratorfor

control-ling lamps, dimmers and sensors, previously

de-scribedin(Rayneretal.,2001b). (Thereisalsoa

Swedishinstantiation).

First, however, we need briey to distinguish

our notion from other notions of plug and play

and recongurability.

The notion of Plug and Play has been used

for dialogue systemtoolkits in which the various

dierent language processing components

them-selves (e.g. recognition, parsing, generation and

dialogue management) can be plugged in and

out. The most prominent instance of this is

theDarpaCommunicatorarchitecture(Goldschen

and Loehr, 1999), which denes interoperability

standards for language processing components.

The intentionis simply that researchers and

de-veloperscanexperimentwithsystemscontaining

dierent instantiations of the language

process-ing components. The Communicator

Architec-tureisnotdesignedtoaddressthespecial

require-ments of a plug and play domain. In fact, the

Communicatorarchitecturedoesnotsupportthe

dynamic re-conguration of language processing

componentswhilethesystemisrunning.

Atamoregenerallevel,simplere-conguration

ofspokenlanguagedialoguesystemshasofcourse

long been a goal of language engineering. But

such re-conguration is nearly always viewed as

the problem of cross-domain or possibly

cross-language porting, e.g. (Glass, 1999). Once one

(2)

even the \database access" scenario. There are

various toolkits,architecturesand methodologies

for rapidly and/or semi-expertly generating new

instances of dialogue systems, e.g. by

abstract-ing away from domain orapplication dependent

features of particular systems, e.g. (Fraser and

Thornton,1995;Kolzer,1999),or`bottom-up'by

aggregationof useful re-congurablecomponents

, e.g. (Sutton et al, 1998; Larsson and Traum,

2000). The automated within-domain

recongu-ration required for a plug and play domain, has

not,toourknowledge,beendescribedpreviously.

Pursuitof plugand playfunctionality(and its

realizationinstrongandweakforms-discussedin

section3) formsacentraltheme oftheDHomme

project. 1

In the rest of this paper, we begin by

detail-ing ourconcrete Plug andPlayscenario- device

control in the home - with an example dialogue

fromourdemonstratorandanoutlineofthemain

dialogueprocessingelements. Insection3,we

dis-tinguishstrongandweaknotionsofPlugandPlay

and their applicability to spoken language

inter-faces. In section 4, we discuss the strong plug

andplaycapabilitywehavebuiltintothe

recogni-tion,parsingand(contextindependent)semantic

interpretation systemcomponentsof our

demon-strator. Insection5,wediscusssomefuturework

forPlugandPlaydialoguemanagement. Section

6containsourconclusions.

2 A Plug and Play Scenario

Inthissectionwepresentexampledialoguesfrom

our currentdemonstrator and briey outline the

main processing elements. The domain is

net-workedhomedevices,anareawhereplugandplay

isalreadyinareasonablyadvancedstateof

devel-opmentandspeechcontrollookshighlyattractive.

2.1 Plug and Play Examples

Figure 1 displays an example dialogue from our

currentdemonstrator.

InU1,theuserasks fortheTV tobeswitched

on. Thesystem reports (S1) that it simply does

not understand the user. (If it possessed \TV"

initsrecognitionvocabularyandknewsomething

aboutthe conceptof TVsit couldhavereported

I don'tknow of any TV).WhenaTVis plugged

intothenetwork(followingU2),thesystemisable

tounderstand, andactontheuser'srepeated

re-questtoswitchontheTV.Thesystemreportson

1

This work is supported by EU 5th Framework

ror" message. Whenanother TV isplugged into

thenetwork,thesystemmustnowengage in

dis-ambiguationbehaviourwhereaspreviouslyithad

noneedto(S7).

S10illustratesthat,intheabsenceofdimmable

lights, \Dim" is not understood, and, possibly,

not even recognized. When a dimmable light is

plugged in (or, at least, knowledge of dimmable

lights is plugged in), then a more helpful error

message can begiven in S12. Finally, when the

grammarisincreasedtocovernewcommands,the

systemmaybegintomakemistakesthatitdidnot

makeoriginally (S13).

2.2 The currentdemonstrator

Our demonstrator expects devices of three main

types: switchable,dimmableandsensors.

Switch-able devicesare binary statedevicesthat can be

set or queried. Dimmable devices have a state

varyingonasinglescalardimensionwhichcanbe

set, changed,or queried. Sensorsare similarbut

canonlybequeried.

Formally, these commands and queriesare

en-coded by a 4-ary slot-value structure and

De-vice Grammars must generate semantic values

containing these slots. (Not all slots are

re-quired for every utterance, of course.) The

four slots are: op (lled by command or query);

level; change and dir (on or o). In

or-der to identify devices, there are 4 other slots:

device(light,tv...),loc(bathroom,kitchen...),

device-spec(all,the...) andpronoun(it,them

...). For example, Is the light in the hall on?

translates to h op=query dir=on device=light

device-spec=theloc=halli. Dim everything by

ten percenttranslatesto hop=commanddir=o

device-spec=everything change=10 i. Switch

the hall and kitchen lights o translates to h

op=command dir=o level=0 device=light h

loc=kitchenloc=hallii.

Dialogue interpretation contains three stages.

First,conjunctionsintheinput(whicharetreated

asjustpackedrepresentations)areunpackedinto

asetof(7-ary)slot-structures.Secondly,aformof

ellipsisresolution,\stickydefaults",takesplacein

which missing slotsare lled in from their

previ-ousvalues. A fragmentarysemanticvalueis

sim-plypastedoverthecorrespondingpartsofthelast

one. Thus And in the bathroom translates to h

loc=bathroomibut theother required slots(e.g.

device)are suppliedfromtheprevious

represen-tation. Finallyreferenceresolutiontriesto

deter-minedeviceidentiersforpronounsanddenitely

(3)

identi-Following contextual interpretation, either a

program(a command orsequence ofsuch) or an

`error'condition(e.g. resolutionfailed toidentify

a device) will havebeen generated. The system

must then executethe program and/orgenerate

feedback. Knowledgeofhowtoexecutethese

pro-grams,e.g. thatitisan`error'totrytoswitchon

alight that is already on, and possiblefeedback

messagesaresimplyhardcodedinto theDialogue

Manager. There is apre-dened set of feedback

messagestructuresassociatedwitheach

underly-ingactionandtheirpossibleresults. Some

exam-pleparaphrasesofmessagestructuresare\TheX

isnowon",\TheXisalreadyon",\TheXisnow

atYpercent",\ThereisnoXintheY".

3 Strong and Weak Plug and Play

Initsweakest form,Plugand Playrefersonlyto

the ability to add a device to a network

with-out manual conguration. Knowledge

distribu-tionisnotincluded. StandardPlugand Playfor

PCperipheralssimplyautomatesthematchingup

of physical devices with software device-specic

driversin thePC. Communicationlinks between

them are establishedby reserving resourcessuch

assharedmemoryandinterruptrequestnumbers.

The weak sense is very useful. Users need not

congure their hardware via jumper switches or

software drivers by entering `magic' numbers in

congurationles.

In the strong sense, Plug and Play can refer

also to modular, distributed knowledge. Devices

notonlysetupnetworkcommunicationsbut

pub-lishinformation aboutthemselvesoverit. Other

devices can obtainand useit. InJini, for

exam-ple, anew printerdevice canregisterits printing

service(andjavacodeforinvokingmethodsonit)

onthenetwork. Then,aword-processing

applica-tion can nd it and congure itself to use it. In

UPnP,devices publishXML descriptionsof their

interfaces.

Thestrong-weakcontrast isnota sharpor

bi-nary one. The word-processor might know the

industry agreed printer interface and so display

a greyed-out print button if no printer is

net-worked. When a new type of printer is

net-worked, it mightsupply additional print options

(e.g. \print colour") that the processor knows

nothingabout.

Onedesirable Plug and Play property in both

strong and weak forms iscommutativity, i.e. the

system understands the same commands in the

same way no matter which device is connected

ingdeviceX.Thisseemsreasonableinaweakplug

and playsystem, but in thestrongcase itwould

mean that the recognizer would cease to

under-standtheword\TV"assoonastheTVwere

dis-connected. Thismightbeconfusingfortheuser.

The strong and weak senses of plug and play

apply to spoken languagedialogueinterfaces. In

the weakest sense, the dialoguesystem mightbe

entirely pre-congured to deal with all possible

devices and device-combinations. The required

knowledgeisalreadypresentinthenetwork. Plug

andPlaythenconsistsofidentifyingwhich

partic-ular devices are currently networked and

estab-lishing communication channels with them. In

thestrongersense,thecomponentsof thespoken

languagedialogueinterfaceacquiretheknowledge

pertinenttoparticulardevicesfromthosedevices.

So,asinexampleS1above,thespeechrecognizer

may not have the word \TV" in its vocabulary

until aTV is pluggedinto thenetwork. The

di-alogue manager may not be capable of uttering

\That deviceis notdimmable"untiladimmable

device is plugged into the network. A strongly

Plug and Play system may therefore be

distin-guishable from aweaker oneby its behaviourin

the absence of certain device specic knowledge.

Iftherelevantknowledgeispresent,onecannotbe

certainwhetheritwaspre-conguredoruploaded

\on demand".

Plug and Play also enforces a certain sort of

modularityonthesystem. Sincedevicesmust

de-claretheinformation requiredtoupdate the

dia-loguecomponents,aclearinterfaceisprovidedfor

re-conguringthe systemfornewtypesof device

aswellasaclearerpictureoftheinternalstructure

ofthose dialoguecomponents. Indeed,itisreally

justadesignchoicewhetherdeviceknowledgeisin

fact installed onlywhenthedevice ispluggedin.

One may, forexample, choose tooptimize

recog-nition performanceon theset ofdevices actually

installed by notloadinginformation about other

devices. Alternatively, one might prefer to

rec-ognize thenamesof devices notinstalled sothat

helpful errormessagescanbedelivered.

Potentially, each component in a spoken

lan-guageinterface(recognizer,parser,interpreter,

di-aloguemanageretc.) canbeupdatedby

informa-tionfromadeviceinaPlugandPlaydomain.

Dif-ferentcomponentsmightsupportdierentdegrees

ofstrengthofthePlugandPlaynotion. F

urther-more,dierentinstantiationsofthesecomponents

may require very dierent sorts of update. To

(4)

car-particular device will evidently beasignicantly

dierent task from updating a recognizer which

usesagrammar-basedlanguagemodel.

Ourcurrentdemonstratorprograminstantiates

aPlugandPlaycapabilityforrecognition,parsing

andcontextindependentsemanticanalysisandis

builtontopoftheNuancetoolkit(Nuance

Com-munications, 1999). The next section discusses

the capability in detail. Section 5discusses and

makessomeproposalsforPlugandPlayDialogue

Management.

4 Distributed Grammar

4.1 Introduction

Inthis section,wewilldescribehowwehave

ad-dressedtheissuesthat arise whenweattempt to

apply the(strong)Plug andPlayscenarioto the

tasksofspeechrecognitionandlanguage

process-ing. Eachdevicewill providetheknowledgethat

thespeechinterfaceneedsinordertorecognisethe

new types of utterance relevant to the device in

question, and convert these utterancesinto

well-formedsemanticrepresentations.

Let's start by considering what this means in

practice. Thereare in fact awhole rangeof

pos-siblescenariosto consider,dependingonhowthe

speechunderstandingmoduleiscongured. Ifthe

module'sconstructionissimpleenough,theremay

beno signicant problems involvedin extending

it to oer Plug and Play functionality. For

ex-ample, the command vocabulary oered by the

speech interfacemayjustconsistof alistofxed

phrases. Inthiscase,PlugandPlayspeech

recog-nitionbecomestrivial:eachdevicecontributesthe

phrases it needs, after which they can be

com-bined into a single grammar. An approach of

thiskindfailshowevertoscaleuptoaninterface

whichsupportscomplexcommands,inparticular

commandswhichcombinewithin thesame

utter-ance language referring to two or moredierent

devices. For example, a command may address

severaldevicesatonce(\turnontheradioandthe

livingroomlight");alternately,severalcommands

maybecombinedintoasingleutterance(\switch

onthecookerandswitchothemicrowave"). Our

experiencewithpracticalspokendeviceinterfaces

suggeststhatexampleslikethesearebynomeans

uncommon.

Another architecture relatively easy to

com-bine with Plug and Play is doing recognition

through a general large-vocabulary recogniser,

and language-processing through device-specic

phrase-spotting (Milward, 2000). Therecogniser

lemsatthelevelofspeechrecognition,anditisin

principlepossibletosupportcomplexcommands.

The maindrawback,however,isthat recognition

qualityismarkedlyinferiorcomparedtoasystem

inwhichrecognitioncoverageislimitedtothe

do-main denedbythecurrentsetofdevices.

Modern speech interfaces supporting complex

commands are typically specied using a

rule-based grammar formalism dened by a platform

like Nuance (Nuance Communications, 1999) or

SpeechWorks (Inc, 2001). The type of grammar

supportedissomesubsetoffullCFG,extendedto

includesemanticannotations. Grammarrules

de-nethelanguagemodelthatconstrainsthe

recog-nitionprocess,tuningittothedomaininorderto

achievehighperformance. (They alsosupplythe

semanticrulesthat denetheoutput

representa-tion;wewillreturntothispointlater). Ifwewant

to implementanambitiousPlugandPlayspeech

recognitionmodulewithinthiskindofframework,

wehavetwotop-levelgoals. Ontheonehand,we

want to achieve high-quality speech recognition.

At thesametime, standard software engineering

considerationssuggest that we want to minimize

the overlapbetween the rule-sets contributed by

each device: ideally, the device will only upload

thespeciclexicalitemsrelevanttoit.

It turns out that our software engineering

ob-jectives conict to some extent with our initial

goalofachievinghigh-qualityspeechrecognition.

Considerastraightforwardsolution,inwhichthe

grammaticalinformationcontributedbyeach

de-vice consistspurely of lexical entries,i.e. entries

oftheform

<Nonterminal> --> <Terminal>

In aCFG-based framework,this implies that we

haveacentraldevice-independentCFGgrammar,

which denesthe otherruleswhich link together

the nonterminals that appear on the

left-hand-sides of thelexical rules. The crucial questionis

what these lexical non-terminal symbols will be.

Suppose, for concreteness, that we want our set

ofdevicestoincludelightswithdimmerswitches,

which will among otherthings acceptcommands

like \dim the light". We might achieve this by

makingthedeviceuploadlexicalrulesoftherough

form

TRANSITIVE_VERB --> dim

NOUN --> light

wheretheLHSsareconventionalgrammatical

cat-egories. (We will for the moment skip over the

questionofhowtorepresentsemantics). The

(5)

NP --> DET NOUN

DET --> the

Thiskindofsolutioniseasytounderstand,but

ex-perienceshowsthatitleadstopoorspeech

recog-nition. The problem is that the languagemodel

producedbythegrammarisunderconstrained:it

willinparticularallowanytransitiveverbto

com-binewithanyNP.However,averblike\dim"will

only combine with a restricted range of possible

NPs, and ideally we would like to capture this

fact. Whatwereally want to dois parameterise

thelanguagemodel. Inthepresentcase,wewant

toparameterisetheTRANSITIVEVERB\dim"with

theinformationthatitonlycombineswithobject

NPs that can be used to refer to dimmable

de-vices. We will parameterisethe NP and NOUN

non-terminals similarly. The obvious way to do

thiswithintheboundsofCFGistospecialisethe

rulesapproximatelyasfollows:

COMMAND --> TRANS_DIM_VERB DIMMABLE_NP

DIMMABLE_NP --> DET DIMMABLE_NOUN

TRANS_DIM_VERB --> dim

DIMMABLE_NOUN --> light

DET --> the

Unfortunately, however, this defeats the original

object of the exercise, since the \general" rules

nowmakereferencetothedevice-specicconcept

of dimming. What we want instead is a more

generictreatment,likethefollowing:

COMMAND -->

TRANSITIVE_VERB:[sem_obj_type=T]

NP:[sem_type=T]

NP:[sem_type=T] -->

DET NOUN:[sem_type=T]

DET --> the

TRANSITIVE_VERB:[sem_obj_type=dimmable]

--> dim

NOUN:[sem_type=dimmable] --> light

This kind of parameterisation of a CFG is not

in any way new: it is simply unication

gram-mar (Pullum and Gazdar, 1982; Gazdar et al.,

1985). Thus our rst main idea is to raise the

levelofabstraction,formulatingthedevice

gram-mar at the level of unication grammars, and

compiling these down into the underlying CFG

representation. There are now a number of

sys-temswhich canperform thistypeof compilation

(Moore, 1998; Kieferand Krieger,2000);the

ba-sic methods we use in our system are described

in detail elsewhere(Rayner et al., 2001a). Here,

wefocusontheaspectsthatarerequiredfor

\dis-tributed" unication grammars needed for Plug

object-oriented programming".

Our basic idea is to start with a general

device-independent unication grammar, which

imple-mentsthecoregrammar rules. Inourprototype,

there are 34 core rules. Typical examples are

the NP conjunction and PP modications rules,

schematically

NP --> NP CONJ NP

NP --> NP PP

which are likelyto occur in connection with any

kind ofdevice. Theserules areparameterisedby

various features. Forexample,thesetof features

associated with the NP category includes

gram-maticalnumber(singularorplural),WH(plusor

minus)andsortaltype(multiple options).

Each individual type of device can extend the

coregrammarinoneofthreepossibleways:

New lexical entries A device may add lexical

entriesfordevice-specicwordsandphrases;

e.g.,adevicewillgenerallycontributeatleast

onenounusedtorefertoit.

New grammar rules Adevicemayadd

device-specic rules; e.g., adimmerswitch may

in-cluderulesfordimmingandbrightening,like

\anotherXpercent"or\abitbrighter".

New featurevalues Least obviously, a device

mayextendtherangeofvaluesthata

gram-maticalfeature cantake(seefurther below).

Forusualsoftwareengineeringreasons,wendit

convenienttodividethedistributedgrammarinto

modules; the grammatical knowledge associated

withadevicemayresideinmorethanonemodule.

Thegrammarinourcurrentdemonstrator

con-tains 21 modules, including the \core" grammar

described above. Each device typically requires

betweentwo and vemodules. For example, an

on/olightswitch loads three modules: thecore

grammar,thegeneralgrammarforon/o

switch-able devices, and the grammar specically for

on/oswitchablelights. Thecoregrammar,as

al-readyexplained,consistsoflinguisticallyoriented

device-independent grammar rules. The

mod-ule for on/o switchable devices contains

gram-marrulesspecictoon/oswitchablebehaviour,

which in general make use of the framework

es-tablished by thegeneralgrammar. Forexample,

there arerulesoftheschematic form

QUESTION -->

is

(6)

PARTICLE_VERB:[particle_type=onoff]

--> switch

Finally,themoduleforon/oswitchablelightsis

verysmall, and justconsists of ahandful of

lexi-calentriesfornounslike\light",deningtheseas

nounsreferringtoon/oswitchabledevices. The

way in which nouns of this kind cancombine is

howeverdened entirelybytheon/oswitchable

devicegrammarandcoregrammar.

Thepatternhereturnsouttobetheusualone:

thegrammarappropriatetoadeviceiscomposed

ofachainofmodules,eachonedependingonthe

previouslinkinthechainandinsomeway

special-isingit. Structurally,thisissimilar tothe

organ-isation of apiece of normal object-oriented

soft-ware,andwehavebeeninterestedtodiscoverthat

manyofthestandardconceptsofobject-oriented

programmingcarry overnaturallyto distributed

unicationgrammars. Intheremainderofthe

sec-tion,wewillexpandonthisanalogy.

Ifwethinkin termsof Javaorasimilar

main-stream OO language, a major grammatical

con-stituent likeS, NP or PPhas manyof the

prop-erties ofan OOinterface. Grammarrulesin one

module canmakereferencetothese constituents,

letting rules in other modules implement their

denition. For example, the temperature

sen-sor grammar module containsasmall numberof

highlyspecialisedrules,e.g.

QUESTION -->

what is the temperature

PP:[pp_type=location]

QUESTION -->

how many degrees is it

PP:[pp_type=location]

The point to note here is that the temperature

sensorgrammarmodule doesnotdenethe

loca-tive PP construction; this is handled elsewhere,

currently in the core grammar module. The

up-shotisthatthetemperaturesensormoduleisable

todeneitsconstructionswithoutworryingabout

theexactnatureof thelocativePPconstruction.

Asaresult,wewereforinstance ableto upgrade

thePPrulestoincludeconjoinedPPs(thus

allow-ing e.g. \whatis the temperaturein thekitchen

and the living room") withoutin any way

alter-ingthegrammar rulesin thetemparaturesensor

module 2

2

Anambitioustreatmentofconjunctionmight

ar-guably alsonecessitate changesinthe dialogue

man-agement componentspecicto the temperature

sen-sor device. Inthe implemented system, conjunction

is uniformly treated as distributive, so \what is the

{themajorcategories{naturallyneedtobe

well-dened. In practice, this implies restrictions on

thewaywehandlethreethings: thesetof

syntac-ticfeatures associatedwith acategory,therange

of possible values (the domain) associated with

each feature, and the semantics of the category.

Weconsidereachofthesein turn.

Most obviously, we need to standardise the

feature-set for the category. At present, we

de-ne most major categories in the core grammar

module, to theextentof specifying therethe full

range of features associated with each category.

It turns out,however,that it issometimes

desir-ablenottoxthedomainofafeatureinthecore

grammar, but rather to allow this domain to be

extended asnew modules areadded. The issues

thatarisehereareinteresting,andwewilldiscuss

them insomedetail.

The problems occur primarily in connection

with features mediating sortal constraints. As

we have already seen in examples above, most

constituents will have at least one sortal

fea-ture, encoding thesortaltypeof theconstituent;

there may also be further features encoding the

sortal types of possible complements and

ad-juncts. For example, the V category has a

fea-ture vtypeencoding the sortaltype of the V

it-self, afeature objsemnp typeencodingthe

sor-tal typeofapossibledirect object,andafeature

vp modifierstypeencodingthesortaltypeofa

possiblepostverbalmodier.

Featureslikethese pose twointerrelated

prob-lems. First, the plug and play scenario implies

thatwecannotknowaheadoftimethewhole

do-mainofasortalfeature. Itisalwayspossiblethat

wewill connect a device whose associated

gram-mar module requires denition of a new sortal

type,inordertoenforceappropriateconstraintsin

the languagemodel. Thesecond problem isthat

it isstill oftennecessaryto dene grammarrules

referringto sortalfeatures before thedomains of

these features areknown: in particular, thecore

modulewillcontainmanysuchrules. Evenbefore

knowingtheidentityof anyspecic devices,

gen-eral grammar rules may well wantto distinguish

between \device" NPs and \location" NPs. For

example, the general \where-question" rule has

theform

QUESTION --> where is NP

Here,weprefertoconstraintheNPsoastomake

itreferonlytodevices,sincethesystemcurrently

tomatically interpretedas equivalentto \whatis the

(7)

tempera-toaroom,e.g. \whereisthebathroom".

Wehaveaddressedtheseissuesinanaturalway

byadapting theOO-orientedideaofinheritance:

specically, wedene ahierarchyof possible

fea-turevalues, allowingone feature valueto inherit

from another. In the context of the \where is

NP" rule above, we dene the rule in the core

module; in this module, the sortal NP feature

semnp typemayonlytakethetwovaluesdevice

andlocation,whichwespecifywiththe

declara-tion 3

domain(sem_np_type, [location, device])

Thisallowsustowritetheconstrained\whereis"

ruleas

QUESTION -->

where is NP:[sem_np_type=device]

Supposenowthatweaddmodulesforbothon/o

switchable and dimmable devices; we would like

to make these into distinct sortal types, called

switchabledevice and dimmable device. We

dothis by includingthefollowingdeclarations in

the\switchable"module:

domain(sem_np_type,

[location,

device,

switchable_device])

specialises(switchable_device, device)

andcorrespondinglyin the\dimmable"module:

domain(sem_np_type,

[location,

device,

dimmable_device])

specialises(dimmable_device, device)

When all these declarations are combined at

compile-time, the eect is as follows. The

do-main of the sem nptype feature is now the

union of the domains specied by each

compo-nent, and is thus the set flocation, device,

switchabledevice, dimmabledeviceg. Since

switchabledevice and dimmable device are

the precise values specialising device, the

com-piler systematically replaces the original feature

valuedevicewiththedisjunction

switchable_device \/ dimmable_device

Thusthe\whereis"rulenowbecomes

QUESTION -->

where is

NP:[sem_np_type=switchable_device \/

dimmable_device]

3

Wehaveslightlysimpliedtheformofthe

decla-switchabledevice, then the rule will again be

adjustedbythecompilersoastoinclude

appropri-ate newelementsin thedisjunction. The

impor-tantpointtonoticehereisthatnochangeismade

to the original rule denition; in line with

nor-malOOthinking,thefeaturedomaininformation

isdistributedacrossseveralindependentmodules,

andthechangesoccurinvisiblyatcompile-time 4

.

Wehavesofarsaidnothingabouthowwedeal

with semantics, and we conclude the section by

sketching our treatment. In fact, it is not clear

to us that the demands of supporting Plug and

Play greatly aect semantics. If they do, the

most important practicalconsiderationis

proba-bly that plug and play becomes easierto realise

ifthesemanticsare keptsimple. Wehaveat any

rate adopted a minimal semantic representation

scheme,andthelackofproblemswehave

experi-encedwithregardtosemanticsmaypartlybedue

to this.

TheannotatedCFGgrammarsproducedbyour

compilerareinnormalNuanceGrammar

Speci-cation Language (GSL)notation, which includes

semantics; unication grammar rules encode

se-manticsusingthedistinguishedfeaturesem,which

translatesinto theGSL returnconstruction. So

forexampletheunicationgrammarrules

DEVICE_NOUN:[sem=light] --> light

DEVICE_NOUN:[sem=heater] --> heater

translatesintotheGSLrule

DEVICE_NOUN

[ light {return(light)}

heater {return(heater)}]

Unicationgrammarrulesmaycontainvariables,

translatingdownintoGSLvariables;sofor

exam-ple,

NP:[sem=[D, N]] -->

DET:[sem=D]

NOUN:[sem=N]

translatesintotheGSLrule

NP (DET:d NOUN:n) {return(($d $n))}

Ourbasicsemanticrepresentationisaformof

fea-ture/value notation, extended to allow handling

4

Readers familiar with OO methodology may

be disturbed by the fact that the rule appears

to have been attached to the daughter nodes

(switchable device dimmable device, etc), rather

thantothemotherdevicenode. Wewouldarguethat

the ruleis still conceptuallyattachedto the device

node,butthatthenecessityofeventuallyrealisingit

inCFGformimpliesthatitmustbecompiledinthis

(8)

construction:

Simplevalues,e.g.light,heater. Typically

associatedwithlexicalentries.

Feature/value pairs expressed in list

no-tation, e.g. [device, light], [location,

kitchen]. These areassociated withnouns,

adjectivesandsimilar constituents.

Lists of feature/value pairs, e.g. [[device,

light], [location, kitchen]]. Theseare

associated with major constituents such as

NP,PP,VP andS.

Conjunctions of lists of feature/value pairs,

e.g.[and, [[device, light]], [[device,

heater]]] These represent conjoined

con-stituents,e.g. conjoinedNPs,PPsandSs.

Thisschememakesitstraightforwardtowritethe

semanticpartsofgrammarrules. Mostoften,the

rulejust concatenatesthesemanticcontributions

of its daughters: thus for example the semantic

featuresofthenominalPPrule aresimply

NP:[sem=concat(Np, Pp)] -->

NP:[sem=Np]

PP:[sem=Pp]

Thesemanticoutputofaconjunctionruleis

typ-ically the conjunction of its daughters excluding

theconjunctionitself,e.g.

NP:[sem=[and, Np1, Np2]] -->

NP:[sem=Np1]

and

NP:[sem=Np2]

5 Future Plug & Play work

In the future, we intend to move to a system

in which all dialogue components can be

recon-gured by devices. For example, in a complete

Plug and Play scenario, the possible device

ac-tions themselves should be declared by devices

perhapsfollowing UPNPstandards in which

de-vices publish allinterface commands in theform

actionname(arg

1

...arg

i

)plusaninternalstate

modelofasimplevectorofvalues. Inthissection

we start with some very generalobservations on

PlugandPlaydialoguemanagementandtherole

ofinference. Thenweoutlineaproposalforarule

basedformalism.

At a very general level of course, indirections

betweenexecutableactionsandlinguisticcontents

can arise at several levels: the speech act level

(\It'stoowarminhere"),thecontentlevel(\How

cal constructions. At the moment, our

pronom-inal and ellipticalresolution methods depend on

verysimple`matching'algorithms. Ingeneral,one

mightatleastwantsomesortofconsistencycheck

betweenthelinguistic properties expressed in an

utteranceandthose ofcandidateobjectsreferred

to. Onemightexpectthat inferentialelementsin

contextualinterpretationshouldbestronglyPlug

and Play - theywill depend, for correctness and

eÆciency, on tailoring to the current objects in

the domain. The research project of uploading

relevant axioms and meaning postulates from a

device toageneralpurpose inferenceenginethat

canbeinvokedincontextualresolutionlooksvery

exciting.

Evidently, higher pragmatic relations between

what the user\strictlysays" and possible device

operationsare also very heavily inference based.

At the moment, we simply encode any

neces-sary inferencesdirectlyinto thedevice grammars

and this suÆces to deal with certain simple

be-haviours. However, the requirement to

encapsu-latealldevicebehaviourinaPlugandPlay

man-ner imposes a signicant requirement. For

ex-ample, themost naturalinteraction with a

ther-mometer is, for example, \How warm is it?" or

\What isthetemperature?" andnot\Querythe

thermometer". In our demonstrator, the

(gram-mar derived) semantic values simply reect

di-rectly therelevantdevice operations:hop=query

device=thermometeri. Thestrategysupportsthe

simple natural interactions it is designed to. It

eveninteractstolerablywellwithourellipsis and

reference resolutionmethods. \What is the

tem-perature in the hall? And in the living room?"

and \Whatisthetemperatureinthehall? What

is it in theliving room?" can both be correctly

interpreted. Other interactions are less natural.

The defaultoutput whenresolution cannot

iden-tify a device N is \I don't know which N you

mean". However,askingforthetemperatureina

roomwithseveral thermometersshould probably

notresultin\Idon'tknowwhichtemperatureyou

mean". Itfollowsthatprescribingallbehaviourin

aPlugandPlayfashionisasignicantconstraint.

Indeed,amoregeneralpointcanbemadehere.

A problem hasarisenbecausetheinference from

servicerequiredtoserviceproviderhasbecome

in-secure in the presence ofother service providers.

Inthehighlynetworkedhomesofthefuture,more

sophisticated inference may be required just

(9)

In this section we assume that semantic

values consist of 5-ary slot-structure with

slots deviceclass (dimmable light, TV ...),

deviceattributes(kitchen,blue ...),pronoun

(as before) and device-specifier (as before)

and operation. An operation is the action

the user wants carried out, e.g. switchon,

setlevel(X), where X is a real number (for

dimmable lights), setprogram(Y) etc. (for

Hi-Fis,TVs),and soon. Asin thegrammar,device

classes are ordered in a hierarchy in a standard

object-orientedway. Thus \dimmable light" is a

subclass of \dimmable device"and inherits from

it.

For strong plug and play, at least the

follow-ing information must beloaded bya device into

the dialogue manager: the device interface (e.g.

thatthe\switchablelight"classhasaswitchon

method; the feedbackto theuser; the update to

thesystem'sdevicemodelgeneratedbyexecuting

thecommand. Clearly,behaviourcanalsodepend

onthe currentstate. Reactionto \switchon the

kitchenlight"dependsonwhetherthelampiso,

on,andwhetherthereisakitchenlight. Wewrite

a rule as command(A,B,C,D)where A is a

com-mand,BisaclassofdevicesforwhichAis

appli-cable,Cisalistofdeviceattributeswhosevalues

mustbeknowninordertoexecuteA,Disalistof

items describinghow thesystem should reacton

A.Eachitem hasthefollowingcomponents:

precondition(X) | X is an executable

proce-durethatteststhenetworkstateandreturns

trueorfalse. Iftrue,theitemcan`re'.

action(Y) |Yistheprocedureto execute

feedback(Z,R) |Zisthefeedbackfortheuser

and candepend on R, thereturn valuefrom

thedeviceoperation

upd(W,R) | W describes how the system's

model of the network state should be

up-dated. AlsoWcandependonR.

For example switching on

a light might be encoded as

command(switch;switchablelight;[id=ID];D)

where D is a list of items in the above form of

which one describes behaviour when the light is

alreadyothus:

[ precondition(lightoff(ID));

action(switchonlight(ID));

feedback(lightnowon(ID);success);

feedback(couldnotswitch(ID);error);

upd([devupdate(ID;status=1)];success);

provided bythe lamp. Thefeedbackto the user

and theupdate rulesdependontheresult ofthe

switchon lightprocedure.

6 Conclusion

Applying theideaofplug andplaytospoken

di-alogue interfaces poses a number of interesting

and importantproblems. Sincethelinguisticand

dialogue management information is distributed

throughout the network,aplug and playsystem

must updateitsspeechinterfacewheneveranew

device is connected. In this paper, we have

fo-cussed in particular ondistributed grammars for

plug and play speech recognition which we have

integratedintoourdemonstratorsystem. Wehave

also examinedsomeissuesand describeda

possi-bleapproachtodistributeddialoguemanagement

whichweplanto undertakeinfurtherwork.

Acknowledgments

We are very grateful to our partners in the

DHommeprojectfordiscussionoftheaboveideas

- especially on theimportance and roleof

dier-ing strengths of Plug and Play. The DHomme

projectpartnersincludenetdecisionsLtd,SRI

In-ternational,TeliaResearchAB,andthe

Universi-ties ofEdinburgh, GothenburgandSeville.

References

N.M.FraserandJ.H.S.Thornton. 1995. Vocalist:

A robust, portable spoken language dialogue

system for telephone applications. In Proc. of

Eurospeech '95,pages1947{1950,Madrid.

Gerald Gazdar, Ewan Klein, Georey Pullum,

andIvanSag. 1985. GeneralizedPhrase

Struc-tureGrammar. HarvardUniversityPress,

Cam-bridge,MA.

J.R Glass. 1999. Challengesfor spoken dialogue

systems. InProc.IEEEASRUWorkshop,

Key-stone,CO.

A.GoldschenandDLoehr. 1999. Theroleofthe

darpa communicator architecture as a human

computer interfacefor distributed simulations.

In 1999 SISO Spring Simulation

Interoperabil-ityWorkshop, Orlando, Florida, March 1999.

SpeechWorks Int Inc, 2001. SpeechWorks.

http://www.speechworks.com. Asat31/01/01.

B. Kiefer and H. Krieger. 2000. A context-free

approximationofhead-drivenphrase structure

grammar. InProceedings of 6th Int.Workshop

(10)

for conversational systems. In Proceedings of

IJCAI'99 Workshop on Knowledge &

Reason-ingIn Practical Dialogue Systems,Stockholm.

S.LarssonandD.Traum. 2000. Informationstate

anddialoguemanagementinthetrindidialogue

moveenginetoolkit. Nat.Lang. Engineering, 6.

Microsoft,2000. Universal Plug andPlay Device

Architecture. http://www.upnp.org. Version

1.0,8June2000.

D.Milward. 2000. Distributingrepresentationfor

robustinterpretationofdialogueutterances. In

Proc. of38thACL, HongKong,pages133{141.

R. Moore. 1998. Using naturallanguage

knowl-edgesourcesin speechrecognition. In

Proceed-ingsof the NATOAdvancedStudiesInstitute.

Nuance Communications, 1999. Nuance Speech

RecognitionSystemDeveloper'sManualversion

6.2. 1380WillowRoad,MenloPark,CA94025.

S. Oaks and H. Wong. 2000. Jini in a Nutshell.

O'Reilly.

Georey K. Pullum and Gerald Gazdar. 1982.

natural languages and context-free languages.

Llinguistics andPhilosophy,4:471{504.

M.Rayner,J.Dowding,andB.A.Hockey. 2001a.

Abaselinemethodforcompilingtyped

unica-tiongrammarsintocontextfreelanguage

mod-els. Proc.Eurospeech2001,Aalborg.

M.Rayner,G.Gorrell,B.A.Hockey,J.Dowding,

and J. Boye. 2001b. Do cfg based language

models need agreement constraints? In

Pro-ceedings of 2ndNAACL,Pittsburgh.

S.Suttonetal. 1998. Universalspeechtools: The

cslu toolkit. In Proc. ICSLP-98, pages 3221{

3224.

Networkstatus: Thereisno TV.ThereisaVCR

in the TVroom.

U1 Switchonthetelevision

S1 Sorry,tryagain

U2 Add new component (this must precede

deviceinstallation)

Now, aTV is plugged intothe living room.

Cur-rently we simulate this in asimple text interface,

where devices and rooms are chosen from

prede-nedlists. The speech interface isthen

automati-callyrecongured(thiscurrentlytakes29seconds)

U3 Switchonthetelevision

S3 TheTVisnowon

U4 Switchonthetelevision

S4 TheTVisalreadyon

U5a Addnewcomponent

Now, another TV ispluggedintothe kitchen. No

speechinterfacerecongurationrequiredthis time

U5 Switchonthetelevision

S5 Theliving-roomTVisalreadyon

U6 SwitchontheVCR

S6 TheVCRisnowon

U7 Switchonthetelevision

S7 Sorry,I don'tknowwhichTVyoumean

U8 Thetelevisioninthekitchen

S8 ThekitchenTVisnowon

U9a Addnewcomponent

Now, on-o lights are installed in the hall and

bathroom. Recongurationtimeis27 seconds

U9 Switchonthelightin thehall

S9 Thehalllightisnowon

U10 Dimthelighttoftypercent

S10 Sorry,tryagain.

U11a Addnewcomponent

Now, a dimmablelight isinstalled inthe kitchen.

Recongurationtimeis39 seconds

U11 Dimthekitchenlightto 50percent

S11 Thelightinthekitchen isnowonatfty

percent

U12 Dimthelightin thehall

S12 I'msorry,thehalllightcanonlybeonor

o

U13 Turnonthelightinthekitchen(misheard

as\increasethelightinthekitchen")

References

Related documents

The coupling of stress and environment during static and cyclic creep of Alloy 617 at 800°C in controlled helium environments revealed that mechanical loading increased the

The Trading Atrium’s infrastructure is powerful and fail-safe: it incorporates state-of-the-art technology, premium- class servers and ultra-low latency network systems; its

Unlike other inexact Newton methods, the INIS-type optimization algorithm is shown to preserve the local convergence properties and the asymptotic contraction rate of the

Abstract: The aim of the present study was to test infrared thermography (IRT), under field conditions, as a possible tool for the evaluation of cow udder health status. 3)

Figure 4 compares the time series of annual 0.01% and 0.001% exceeded rain rates, for SE region, derived directly from distributions of 1 km Nimrod data, with estimates derived by

The process of designing proposed layouts is done by using qualitative methods (CORELAP, BLOCPLAN &amp; CRAFT) and quantitative methods (ABSMODEL &amp; CRAFT)

This publication, which is an offshoot of the 2010 position document of the Economic Commission for Latin America and the Caribbean, entitled Time for equality: closing gaps,