Manny Rayner, Ian Lewin
& Genevieve Gorrell
netdecisions Ltd
Wellington House,
East Road, CambridgeCB1 1BH, UK
manny.raynerjian.lewinjgen eviev e.go rrel l
@netdecisions.com
Johan Boye
Telia Research
S-12386Farsta, Sweden
Abstract
Plug and Play is an increasingly
im-portant concept in systemand network
architectures. We introduce and
de-scribe a spoken language dialogue
sys-tem architecture which supports Plug
and Playablenetworks of objects in its
domain. Eachdeviceinthenetwork
car-riesthelinguistic anddialogue
manage-mentinformationwhichispertinenttoit
and uploads it dynamically to the
rele-vantlanguageprocessingcomponentsin
the spoken language interface. We
de-scribethe currentstate ofour plugand
play demonstratorand discuss
theoreti-calissuesthatarisefromourwork.Plug
and Play forms a central topic for the
DHommeproject.
1 Introduction
ThenotionofPlugandPlayndsitsmost
natu-ralhomeintheworldofnetworkedhomedevices,
whereitoersatleastthefollowingtwoimportant
properties
thenetwork ofdevices isdynamically
recon-gurableasdevicesarebroughtonlineor
dis-appearoine
zerore-congurationbytheuserisrequired
FrameworksforachievingPlugandPlay
gener-allyaddressthisbyincludingatleastthefollowing
devicesannouncethemselvesonthe network
when theyare pluggedinto it (andalso
dis-covertheexistenceofothers)
devices describe their own capabilities,
pro-vide a means for accessing them and can
queryandaccess thecapabilities ofothers
devicesshouldsupport,wherepossible,
seam-Plug and Play is, not surprisingly, viewed as
apre-requisite forthecommercialsuccess of
net-worked devices in the home. There are already
severalpromisingcandidateplatformsfor
achiev-ing thenecessaryfunctionality,including
Univer-salPlug and Play(UPnP) (Microsoft, 2000)and
Jini (Oaks and Wong, 2000). In this paper, we
address the requirementson spoken dialogue
in-terfaces that arisefrom aplug andplay domain.
We also presentthe current state of ourEnglish
languageplugandplaydemonstratorfor
control-ling lamps, dimmers and sensors, previously
de-scribedin(Rayneretal.,2001b). (Thereisalsoa
Swedishinstantiation).
First, however, we need briey to distinguish
our notion from other notions of plug and play
and recongurability.
The notion of Plug and Play has been used
for dialogue systemtoolkits in which the various
dierent language processing components
them-selves (e.g. recognition, parsing, generation and
dialogue management) can be plugged in and
out. The most prominent instance of this is
theDarpaCommunicatorarchitecture(Goldschen
and Loehr, 1999), which denes interoperability
standards for language processing components.
The intentionis simply that researchers and
de-veloperscanexperimentwithsystemscontaining
dierent instantiations of the language
process-ing components. The Communicator
Architec-tureisnotdesignedtoaddressthespecial
require-ments of a plug and play domain. In fact, the
Communicatorarchitecturedoesnotsupportthe
dynamic re-conguration of language processing
componentswhilethesystemisrunning.
Atamoregenerallevel,simplere-conguration
ofspokenlanguagedialoguesystemshasofcourse
long been a goal of language engineering. But
such re-conguration is nearly always viewed as
the problem of cross-domain or possibly
cross-language porting, e.g. (Glass, 1999). Once one
even the \database access" scenario. There are
various toolkits,architecturesand methodologies
for rapidly and/or semi-expertly generating new
instances of dialogue systems, e.g. by
abstract-ing away from domain orapplication dependent
features of particular systems, e.g. (Fraser and
Thornton,1995;Kolzer,1999),or`bottom-up'by
aggregationof useful re-congurablecomponents
, e.g. (Sutton et al, 1998; Larsson and Traum,
2000). The automated within-domain
recongu-ration required for a plug and play domain, has
not,toourknowledge,beendescribedpreviously.
Pursuitof plugand playfunctionality(and its
realizationinstrongandweakforms-discussedin
section3) formsacentraltheme oftheDHomme
project. 1
In the rest of this paper, we begin by
detail-ing ourconcrete Plug andPlayscenario- device
control in the home - with an example dialogue
fromourdemonstratorandanoutlineofthemain
dialogueprocessingelements. Insection3,we
dis-tinguishstrongandweaknotionsofPlugandPlay
and their applicability to spoken language
inter-faces. In section 4, we discuss the strong plug
andplaycapabilitywehavebuiltintothe
recogni-tion,parsingand(contextindependent)semantic
interpretation systemcomponentsof our
demon-strator. Insection5,wediscusssomefuturework
forPlugandPlaydialoguemanagement. Section
6containsourconclusions.
2 A Plug and Play Scenario
Inthissectionwepresentexampledialoguesfrom
our currentdemonstrator and briey outline the
main processing elements. The domain is
net-workedhomedevices,anareawhereplugandplay
isalreadyinareasonablyadvancedstateof
devel-opmentandspeechcontrollookshighlyattractive.
2.1 Plug and Play Examples
Figure 1 displays an example dialogue from our
currentdemonstrator.
InU1,theuserasks fortheTV tobeswitched
on. Thesystem reports (S1) that it simply does
not understand the user. (If it possessed \TV"
initsrecognitionvocabularyandknewsomething
aboutthe conceptof TVsit couldhavereported
I don'tknow of any TV).WhenaTVis plugged
intothenetwork(followingU2),thesystemisable
tounderstand, andactontheuser'srepeated
re-questtoswitchontheTV.Thesystemreportson
1
This work is supported by EU 5th Framework
ror" message. Whenanother TV isplugged into
thenetwork,thesystemmustnowengage in
dis-ambiguationbehaviourwhereaspreviouslyithad
noneedto(S7).
S10illustratesthat,intheabsenceofdimmable
lights, \Dim" is not understood, and, possibly,
not even recognized. When a dimmable light is
plugged in (or, at least, knowledge of dimmable
lights is plugged in), then a more helpful error
message can begiven in S12. Finally, when the
grammarisincreasedtocovernewcommands,the
systemmaybegintomakemistakesthatitdidnot
makeoriginally (S13).
2.2 The currentdemonstrator
Our demonstrator expects devices of three main
types: switchable,dimmableandsensors.
Switch-able devicesare binary statedevicesthat can be
set or queried. Dimmable devices have a state
varyingonasinglescalardimensionwhichcanbe
set, changed,or queried. Sensorsare similarbut
canonlybequeried.
Formally, these commands and queriesare
en-coded by a 4-ary slot-value structure and
De-vice Grammars must generate semantic values
containing these slots. (Not all slots are
re-quired for every utterance, of course.) The
four slots are: op (lled by command or query);
level; change and dir (on or o). In
or-der to identify devices, there are 4 other slots:
device(light,tv...),loc(bathroom,kitchen...),
device-spec(all,the...) andpronoun(it,them
...). For example, Is the light in the hall on?
translates to h op=query dir=on device=light
device-spec=theloc=halli. Dim everything by
ten percenttranslatesto hop=commanddir=o
device-spec=everything change=10 i. Switch
the hall and kitchen lights o translates to h
op=command dir=o level=0 device=light h
loc=kitchenloc=hallii.
Dialogue interpretation contains three stages.
First,conjunctionsintheinput(whicharetreated
asjustpackedrepresentations)areunpackedinto
asetof(7-ary)slot-structures.Secondly,aformof
ellipsisresolution,\stickydefaults",takesplacein
which missing slotsare lled in from their
previ-ousvalues. A fragmentarysemanticvalueis
sim-plypastedoverthecorrespondingpartsofthelast
one. Thus And in the bathroom translates to h
loc=bathroomibut theother required slots(e.g.
device)are suppliedfromtheprevious
represen-tation. Finallyreferenceresolutiontriesto
deter-minedeviceidentiersforpronounsanddenitely
identi-Following contextual interpretation, either a
program(a command orsequence ofsuch) or an
`error'condition(e.g. resolutionfailed toidentify
a device) will havebeen generated. The system
must then executethe program and/orgenerate
feedback. Knowledgeofhowtoexecutethese
pro-grams,e.g. thatitisan`error'totrytoswitchon
alight that is already on, and possiblefeedback
messagesaresimplyhardcodedinto theDialogue
Manager. There is apre-dened set of feedback
messagestructuresassociatedwitheach
underly-ingactionandtheirpossibleresults. Some
exam-pleparaphrasesofmessagestructuresare\TheX
isnowon",\TheXisalreadyon",\TheXisnow
atYpercent",\ThereisnoXintheY".
3 Strong and Weak Plug and Play
Initsweakest form,Plugand Playrefersonlyto
the ability to add a device to a network
with-out manual conguration. Knowledge
distribu-tionisnotincluded. StandardPlugand Playfor
PCperipheralssimplyautomatesthematchingup
of physical devices with software device-specic
driversin thePC. Communicationlinks between
them are establishedby reserving resourcessuch
assharedmemoryandinterruptrequestnumbers.
The weak sense is very useful. Users need not
congure their hardware via jumper switches or
software drivers by entering `magic' numbers in
congurationles.
In the strong sense, Plug and Play can refer
also to modular, distributed knowledge. Devices
notonlysetupnetworkcommunicationsbut
pub-lishinformation aboutthemselvesoverit. Other
devices can obtainand useit. InJini, for
exam-ple, anew printerdevice canregisterits printing
service(andjavacodeforinvokingmethodsonit)
onthenetwork. Then,aword-processing
applica-tion can nd it and congure itself to use it. In
UPnP,devices publishXML descriptionsof their
interfaces.
Thestrong-weakcontrast isnota sharpor
bi-nary one. The word-processor might know the
industry agreed printer interface and so display
a greyed-out print button if no printer is
net-worked. When a new type of printer is
net-worked, it mightsupply additional print options
(e.g. \print colour") that the processor knows
nothingabout.
Onedesirable Plug and Play property in both
strong and weak forms iscommutativity, i.e. the
system understands the same commands in the
same way no matter which device is connected
ingdeviceX.Thisseemsreasonableinaweakplug
and playsystem, but in thestrongcase itwould
mean that the recognizer would cease to
under-standtheword\TV"assoonastheTVwere
dis-connected. Thismightbeconfusingfortheuser.
The strong and weak senses of plug and play
apply to spoken languagedialogueinterfaces. In
the weakest sense, the dialoguesystem mightbe
entirely pre-congured to deal with all possible
devices and device-combinations. The required
knowledgeisalreadypresentinthenetwork. Plug
andPlaythenconsistsofidentifyingwhich
partic-ular devices are currently networked and
estab-lishing communication channels with them. In
thestrongersense,thecomponentsof thespoken
languagedialogueinterfaceacquiretheknowledge
pertinenttoparticulardevicesfromthosedevices.
So,asinexampleS1above,thespeechrecognizer
may not have the word \TV" in its vocabulary
until aTV is pluggedinto thenetwork. The
di-alogue manager may not be capable of uttering
\That deviceis notdimmable"untiladimmable
device is plugged into the network. A strongly
Plug and Play system may therefore be
distin-guishable from aweaker oneby its behaviourin
the absence of certain device specic knowledge.
Iftherelevantknowledgeispresent,onecannotbe
certainwhetheritwaspre-conguredoruploaded
\on demand".
Plug and Play also enforces a certain sort of
modularityonthesystem. Sincedevicesmust
de-claretheinformation requiredtoupdate the
dia-loguecomponents,aclearinterfaceisprovidedfor
re-conguringthe systemfornewtypesof device
aswellasaclearerpictureoftheinternalstructure
ofthose dialoguecomponents. Indeed,itisreally
justadesignchoicewhetherdeviceknowledgeisin
fact installed onlywhenthedevice ispluggedin.
One may, forexample, choose tooptimize
recog-nition performanceon theset ofdevices actually
installed by notloadinginformation about other
devices. Alternatively, one might prefer to
rec-ognize thenamesof devices notinstalled sothat
helpful errormessagescanbedelivered.
Potentially, each component in a spoken
lan-guageinterface(recognizer,parser,interpreter,
di-aloguemanageretc.) canbeupdatedby
informa-tionfromadeviceinaPlugandPlaydomain.
Dif-ferentcomponentsmightsupportdierentdegrees
ofstrengthofthePlugandPlaynotion. F
urther-more,dierentinstantiationsofthesecomponents
may require very dierent sorts of update. To
car-particular device will evidently beasignicantly
dierent task from updating a recognizer which
usesagrammar-basedlanguagemodel.
Ourcurrentdemonstratorprograminstantiates
aPlugandPlaycapabilityforrecognition,parsing
andcontextindependentsemanticanalysisandis
builtontopoftheNuancetoolkit(Nuance
Com-munications, 1999). The next section discusses
the capability in detail. Section 5discusses and
makessomeproposalsforPlugandPlayDialogue
Management.
4 Distributed Grammar
4.1 Introduction
Inthis section,wewilldescribehowwehave
ad-dressedtheissuesthat arise whenweattempt to
apply the(strong)Plug andPlayscenarioto the
tasksofspeechrecognitionandlanguage
process-ing. Eachdevicewill providetheknowledgethat
thespeechinterfaceneedsinordertorecognisethe
new types of utterance relevant to the device in
question, and convert these utterancesinto
well-formedsemanticrepresentations.
Let's start by considering what this means in
practice. Thereare in fact awhole rangeof
pos-siblescenariosto consider,dependingonhowthe
speechunderstandingmoduleiscongured. Ifthe
module'sconstructionissimpleenough,theremay
beno signicant problems involvedin extending
it to oer Plug and Play functionality. For
ex-ample, the command vocabulary oered by the
speech interfacemayjustconsistof alistofxed
phrases. Inthiscase,PlugandPlayspeech
recog-nitionbecomestrivial:eachdevicecontributesthe
phrases it needs, after which they can be
com-bined into a single grammar. An approach of
thiskindfailshowevertoscaleuptoaninterface
whichsupportscomplexcommands,inparticular
commandswhichcombinewithin thesame
utter-ance language referring to two or moredierent
devices. For example, a command may address
severaldevicesatonce(\turnontheradioandthe
livingroomlight");alternately,severalcommands
maybecombinedintoasingleutterance(\switch
onthecookerandswitchothemicrowave"). Our
experiencewithpracticalspokendeviceinterfaces
suggeststhatexampleslikethesearebynomeans
uncommon.
Another architecture relatively easy to
com-bine with Plug and Play is doing recognition
through a general large-vocabulary recogniser,
and language-processing through device-specic
phrase-spotting (Milward, 2000). Therecogniser
lemsatthelevelofspeechrecognition,anditisin
principlepossibletosupportcomplexcommands.
The maindrawback,however,isthat recognition
qualityismarkedlyinferiorcomparedtoasystem
inwhichrecognitioncoverageislimitedtothe
do-main denedbythecurrentsetofdevices.
Modern speech interfaces supporting complex
commands are typically specied using a
rule-based grammar formalism dened by a platform
like Nuance (Nuance Communications, 1999) or
SpeechWorks (Inc, 2001). The type of grammar
supportedissomesubsetoffullCFG,extendedto
includesemanticannotations. Grammarrules
de-nethelanguagemodelthatconstrainsthe
recog-nitionprocess,tuningittothedomaininorderto
achievehighperformance. (They alsosupplythe
semanticrulesthat denetheoutput
representa-tion;wewillreturntothispointlater). Ifwewant
to implementanambitiousPlugandPlayspeech
recognitionmodulewithinthiskindofframework,
wehavetwotop-levelgoals. Ontheonehand,we
want to achieve high-quality speech recognition.
At thesametime, standard software engineering
considerationssuggest that we want to minimize
the overlapbetween the rule-sets contributed by
each device: ideally, the device will only upload
thespeciclexicalitemsrelevanttoit.
It turns out that our software engineering
ob-jectives conict to some extent with our initial
goalofachievinghigh-qualityspeechrecognition.
Considerastraightforwardsolution,inwhichthe
grammaticalinformationcontributedbyeach
de-vice consistspurely of lexical entries,i.e. entries
oftheform
<Nonterminal> --> <Terminal>
In aCFG-based framework,this implies that we
haveacentraldevice-independentCFGgrammar,
which denesthe otherruleswhich link together
the nonterminals that appear on the
left-hand-sides of thelexical rules. The crucial questionis
what these lexical non-terminal symbols will be.
Suppose, for concreteness, that we want our set
ofdevicestoincludelightswithdimmerswitches,
which will among otherthings acceptcommands
like \dim the light". We might achieve this by
makingthedeviceuploadlexicalrulesoftherough
form
TRANSITIVE_VERB --> dim
NOUN --> light
wheretheLHSsareconventionalgrammatical
cat-egories. (We will for the moment skip over the
questionofhowtorepresentsemantics). The
NP --> DET NOUN
DET --> the
Thiskindofsolutioniseasytounderstand,but
ex-perienceshowsthatitleadstopoorspeech
recog-nition. The problem is that the languagemodel
producedbythegrammarisunderconstrained:it
willinparticularallowanytransitiveverbto
com-binewithanyNP.However,averblike\dim"will
only combine with a restricted range of possible
NPs, and ideally we would like to capture this
fact. Whatwereally want to dois parameterise
thelanguagemodel. Inthepresentcase,wewant
toparameterisetheTRANSITIVEVERB\dim"with
theinformationthatitonlycombineswithobject
NPs that can be used to refer to dimmable
de-vices. We will parameterisethe NP and NOUN
non-terminals similarly. The obvious way to do
thiswithintheboundsofCFGistospecialisethe
rulesapproximatelyasfollows:
COMMAND --> TRANS_DIM_VERB DIMMABLE_NP
DIMMABLE_NP --> DET DIMMABLE_NOUN
TRANS_DIM_VERB --> dim
DIMMABLE_NOUN --> light
DET --> the
Unfortunately, however, this defeats the original
object of the exercise, since the \general" rules
nowmakereferencetothedevice-specicconcept
of dimming. What we want instead is a more
generictreatment,likethefollowing:
COMMAND -->
TRANSITIVE_VERB:[sem_obj_type=T]
NP:[sem_type=T]
NP:[sem_type=T] -->
DET NOUN:[sem_type=T]
DET --> the
TRANSITIVE_VERB:[sem_obj_type=dimmable]
--> dim
NOUN:[sem_type=dimmable] --> light
This kind of parameterisation of a CFG is not
in any way new: it is simply unication
gram-mar (Pullum and Gazdar, 1982; Gazdar et al.,
1985). Thus our rst main idea is to raise the
levelofabstraction,formulatingthedevice
gram-mar at the level of unication grammars, and
compiling these down into the underlying CFG
representation. There are now a number of
sys-temswhich canperform thistypeof compilation
(Moore, 1998; Kieferand Krieger,2000);the
ba-sic methods we use in our system are described
in detail elsewhere(Rayner et al., 2001a). Here,
wefocusontheaspectsthatarerequiredfor
\dis-tributed" unication grammars needed for Plug
object-oriented programming".
Our basic idea is to start with a general
device-independent unication grammar, which
imple-mentsthecoregrammar rules. Inourprototype,
there are 34 core rules. Typical examples are
the NP conjunction and PP modications rules,
schematically
NP --> NP CONJ NP
NP --> NP PP
which are likelyto occur in connection with any
kind ofdevice. Theserules areparameterisedby
various features. Forexample,thesetof features
associated with the NP category includes
gram-maticalnumber(singularorplural),WH(plusor
minus)andsortaltype(multiple options).
Each individual type of device can extend the
coregrammarinoneofthreepossibleways:
New lexical entries A device may add lexical
entriesfordevice-specicwordsandphrases;
e.g.,adevicewillgenerallycontributeatleast
onenounusedtorefertoit.
New grammar rules Adevicemayadd
device-specic rules; e.g., adimmerswitch may
in-cluderulesfordimmingandbrightening,like
\anotherXpercent"or\abitbrighter".
New featurevalues Least obviously, a device
mayextendtherangeofvaluesthata
gram-maticalfeature cantake(seefurther below).
Forusualsoftwareengineeringreasons,wendit
convenienttodividethedistributedgrammarinto
modules; the grammatical knowledge associated
withadevicemayresideinmorethanonemodule.
Thegrammarinourcurrentdemonstrator
con-tains 21 modules, including the \core" grammar
described above. Each device typically requires
betweentwo and vemodules. For example, an
on/olightswitch loads three modules: thecore
grammar,thegeneralgrammarforon/o
switch-able devices, and the grammar specically for
on/oswitchablelights. Thecoregrammar,as
al-readyexplained,consistsoflinguisticallyoriented
device-independent grammar rules. The
mod-ule for on/o switchable devices contains
gram-marrulesspecictoon/oswitchablebehaviour,
which in general make use of the framework
es-tablished by thegeneralgrammar. Forexample,
there arerulesoftheschematic form
QUESTION -->
is
PARTICLE_VERB:[particle_type=onoff]
--> switch
Finally,themoduleforon/oswitchablelightsis
verysmall, and justconsists of ahandful of
lexi-calentriesfornounslike\light",deningtheseas
nounsreferringtoon/oswitchabledevices. The
way in which nouns of this kind cancombine is
howeverdened entirelybytheon/oswitchable
devicegrammarandcoregrammar.
Thepatternhereturnsouttobetheusualone:
thegrammarappropriatetoadeviceiscomposed
ofachainofmodules,eachonedependingonthe
previouslinkinthechainandinsomeway
special-isingit. Structurally,thisissimilar tothe
organ-isation of apiece of normal object-oriented
soft-ware,andwehavebeeninterestedtodiscoverthat
manyofthestandardconceptsofobject-oriented
programmingcarry overnaturallyto distributed
unicationgrammars. Intheremainderofthe
sec-tion,wewillexpandonthisanalogy.
Ifwethinkin termsof Javaorasimilar
main-stream OO language, a major grammatical
con-stituent likeS, NP or PPhas manyof the
prop-erties ofan OOinterface. Grammarrulesin one
module canmakereferencetothese constituents,
letting rules in other modules implement their
denition. For example, the temperature
sen-sor grammar module containsasmall numberof
highlyspecialisedrules,e.g.
QUESTION -->
what is the temperature
PP:[pp_type=location]
QUESTION -->
how many degrees is it
PP:[pp_type=location]
The point to note here is that the temperature
sensorgrammarmodule doesnotdenethe
loca-tive PP construction; this is handled elsewhere,
currently in the core grammar module. The
up-shotisthatthetemperaturesensormoduleisable
todeneitsconstructionswithoutworryingabout
theexactnatureof thelocativePPconstruction.
Asaresult,wewereforinstance ableto upgrade
thePPrulestoincludeconjoinedPPs(thus
allow-ing e.g. \whatis the temperaturein thekitchen
and the living room") withoutin any way
alter-ingthegrammar rulesin thetemparaturesensor
module 2
2
Anambitioustreatmentofconjunctionmight
ar-guably alsonecessitate changesinthe dialogue
man-agement componentspecicto the temperature
sen-sor device. Inthe implemented system, conjunction
is uniformly treated as distributive, so \what is the
{themajorcategories{naturallyneedtobe
well-dened. In practice, this implies restrictions on
thewaywehandlethreethings: thesetof
syntac-ticfeatures associatedwith acategory,therange
of possible values (the domain) associated with
each feature, and the semantics of the category.
Weconsidereachofthesein turn.
Most obviously, we need to standardise the
feature-set for the category. At present, we
de-ne most major categories in the core grammar
module, to theextentof specifying therethe full
range of features associated with each category.
It turns out,however,that it issometimes
desir-ablenottoxthedomainofafeatureinthecore
grammar, but rather to allow this domain to be
extended asnew modules areadded. The issues
thatarisehereareinteresting,andwewilldiscuss
them insomedetail.
The problems occur primarily in connection
with features mediating sortal constraints. As
we have already seen in examples above, most
constituents will have at least one sortal
fea-ture, encoding thesortaltypeof theconstituent;
there may also be further features encoding the
sortal types of possible complements and
ad-juncts. For example, the V category has a
fea-ture vtypeencoding the sortaltype of the V
it-self, afeature objsemnp typeencodingthe
sor-tal typeofapossibledirect object,andafeature
vp modifierstypeencodingthesortaltypeofa
possiblepostverbalmodier.
Featureslikethese pose twointerrelated
prob-lems. First, the plug and play scenario implies
thatwecannotknowaheadoftimethewhole
do-mainofasortalfeature. Itisalwayspossiblethat
wewill connect a device whose associated
gram-mar module requires denition of a new sortal
type,inordertoenforceappropriateconstraintsin
the languagemodel. Thesecond problem isthat
it isstill oftennecessaryto dene grammarrules
referringto sortalfeatures before thedomains of
these features areknown: in particular, thecore
modulewillcontainmanysuchrules. Evenbefore
knowingtheidentityof anyspecic devices,
gen-eral grammar rules may well wantto distinguish
between \device" NPs and \location" NPs. For
example, the general \where-question" rule has
theform
QUESTION --> where is NP
Here,weprefertoconstraintheNPsoastomake
itreferonlytodevices,sincethesystemcurrently
tomatically interpretedas equivalentto \whatis the
tempera-toaroom,e.g. \whereisthebathroom".
Wehaveaddressedtheseissuesinanaturalway
byadapting theOO-orientedideaofinheritance:
specically, wedene ahierarchyof possible
fea-turevalues, allowingone feature valueto inherit
from another. In the context of the \where is
NP" rule above, we dene the rule in the core
module; in this module, the sortal NP feature
semnp typemayonlytakethetwovaluesdevice
andlocation,whichwespecifywiththe
declara-tion 3
domain(sem_np_type, [location, device])
Thisallowsustowritetheconstrained\whereis"
ruleas
QUESTION -->
where is NP:[sem_np_type=device]
Supposenowthatweaddmodulesforbothon/o
switchable and dimmable devices; we would like
to make these into distinct sortal types, called
switchabledevice and dimmable device. We
dothis by includingthefollowingdeclarations in
the\switchable"module:
domain(sem_np_type,
[location,
device,
switchable_device])
specialises(switchable_device, device)
andcorrespondinglyin the\dimmable"module:
domain(sem_np_type,
[location,
device,
dimmable_device])
specialises(dimmable_device, device)
When all these declarations are combined at
compile-time, the eect is as follows. The
do-main of the sem nptype feature is now the
union of the domains specied by each
compo-nent, and is thus the set flocation, device,
switchabledevice, dimmabledeviceg. Since
switchabledevice and dimmable device are
the precise values specialising device, the
com-piler systematically replaces the original feature
valuedevicewiththedisjunction
switchable_device \/ dimmable_device
Thusthe\whereis"rulenowbecomes
QUESTION -->
where is
NP:[sem_np_type=switchable_device \/
dimmable_device]
3
Wehaveslightlysimpliedtheformofthe
decla-switchabledevice, then the rule will again be
adjustedbythecompilersoastoinclude
appropri-ate newelementsin thedisjunction. The
impor-tantpointtonoticehereisthatnochangeismade
to the original rule denition; in line with
nor-malOOthinking,thefeaturedomaininformation
isdistributedacrossseveralindependentmodules,
andthechangesoccurinvisiblyatcompile-time 4
.
Wehavesofarsaidnothingabouthowwedeal
with semantics, and we conclude the section by
sketching our treatment. In fact, it is not clear
to us that the demands of supporting Plug and
Play greatly aect semantics. If they do, the
most important practicalconsiderationis
proba-bly that plug and play becomes easierto realise
ifthesemanticsare keptsimple. Wehaveat any
rate adopted a minimal semantic representation
scheme,andthelackofproblemswehave
experi-encedwithregardtosemanticsmaypartlybedue
to this.
TheannotatedCFGgrammarsproducedbyour
compilerareinnormalNuanceGrammar
Speci-cation Language (GSL)notation, which includes
semantics; unication grammar rules encode
se-manticsusingthedistinguishedfeaturesem,which
translatesinto theGSL returnconstruction. So
forexampletheunicationgrammarrules
DEVICE_NOUN:[sem=light] --> light
DEVICE_NOUN:[sem=heater] --> heater
translatesintotheGSLrule
DEVICE_NOUN
[ light {return(light)}
heater {return(heater)}]
Unicationgrammarrulesmaycontainvariables,
translatingdownintoGSLvariables;sofor
exam-ple,
NP:[sem=[D, N]] -->
DET:[sem=D]
NOUN:[sem=N]
translatesintotheGSLrule
NP (DET:d NOUN:n) {return(($d $n))}
Ourbasicsemanticrepresentationisaformof
fea-ture/value notation, extended to allow handling
4
Readers familiar with OO methodology may
be disturbed by the fact that the rule appears
to have been attached to the daughter nodes
(switchable device dimmable device, etc), rather
thantothemotherdevicenode. Wewouldarguethat
the ruleis still conceptuallyattachedto the device
node,butthatthenecessityofeventuallyrealisingit
inCFGformimpliesthatitmustbecompiledinthis
construction:
Simplevalues,e.g.light,heater. Typically
associatedwithlexicalentries.
Feature/value pairs expressed in list
no-tation, e.g. [device, light], [location,
kitchen]. These areassociated withnouns,
adjectivesandsimilar constituents.
Lists of feature/value pairs, e.g. [[device,
light], [location, kitchen]]. Theseare
associated with major constituents such as
NP,PP,VP andS.
Conjunctions of lists of feature/value pairs,
e.g.[and, [[device, light]], [[device,
heater]]] These represent conjoined
con-stituents,e.g. conjoinedNPs,PPsandSs.
Thisschememakesitstraightforwardtowritethe
semanticpartsofgrammarrules. Mostoften,the
rulejust concatenatesthesemanticcontributions
of its daughters: thus for example the semantic
featuresofthenominalPPrule aresimply
NP:[sem=concat(Np, Pp)] -->
NP:[sem=Np]
PP:[sem=Pp]
Thesemanticoutputofaconjunctionruleis
typ-ically the conjunction of its daughters excluding
theconjunctionitself,e.g.
NP:[sem=[and, Np1, Np2]] -->
NP:[sem=Np1]
and
NP:[sem=Np2]
5 Future Plug & Play work
In the future, we intend to move to a system
in which all dialogue components can be
recon-gured by devices. For example, in a complete
Plug and Play scenario, the possible device
ac-tions themselves should be declared by devices
perhapsfollowing UPNPstandards in which
de-vices publish allinterface commands in theform
actionname(arg
1
...arg
i
)plusaninternalstate
modelofasimplevectorofvalues. Inthissection
we start with some very generalobservations on
PlugandPlaydialoguemanagementandtherole
ofinference. Thenweoutlineaproposalforarule
basedformalism.
At a very general level of course, indirections
betweenexecutableactionsandlinguisticcontents
can arise at several levels: the speech act level
(\It'stoowarminhere"),thecontentlevel(\How
cal constructions. At the moment, our
pronom-inal and ellipticalresolution methods depend on
verysimple`matching'algorithms. Ingeneral,one
mightatleastwantsomesortofconsistencycheck
betweenthelinguistic properties expressed in an
utteranceandthose ofcandidateobjectsreferred
to. Onemightexpectthat inferentialelementsin
contextualinterpretationshouldbestronglyPlug
and Play - theywill depend, for correctness and
eÆciency, on tailoring to the current objects in
the domain. The research project of uploading
relevant axioms and meaning postulates from a
device toageneralpurpose inferenceenginethat
canbeinvokedincontextualresolutionlooksvery
exciting.
Evidently, higher pragmatic relations between
what the user\strictlysays" and possible device
operationsare also very heavily inference based.
At the moment, we simply encode any
neces-sary inferencesdirectlyinto thedevice grammars
and this suÆces to deal with certain simple
be-haviours. However, the requirement to
encapsu-latealldevicebehaviourinaPlugandPlay
man-ner imposes a signicant requirement. For
ex-ample, themost naturalinteraction with a
ther-mometer is, for example, \How warm is it?" or
\What isthetemperature?" andnot\Querythe
thermometer". In our demonstrator, the
(gram-mar derived) semantic values simply reect
di-rectly therelevantdevice operations:hop=query
device=thermometeri. Thestrategysupportsthe
simple natural interactions it is designed to. It
eveninteractstolerablywellwithourellipsis and
reference resolutionmethods. \What is the
tem-perature in the hall? And in the living room?"
and \Whatisthetemperatureinthehall? What
is it in theliving room?" can both be correctly
interpreted. Other interactions are less natural.
The defaultoutput whenresolution cannot
iden-tify a device N is \I don't know which N you
mean". However,askingforthetemperatureina
roomwithseveral thermometersshould probably
notresultin\Idon'tknowwhichtemperatureyou
mean". Itfollowsthatprescribingallbehaviourin
aPlugandPlayfashionisasignicantconstraint.
Indeed,amoregeneralpointcanbemadehere.
A problem hasarisenbecausetheinference from
servicerequiredtoserviceproviderhasbecome
in-secure in the presence ofother service providers.
Inthehighlynetworkedhomesofthefuture,more
sophisticated inference may be required just
In this section we assume that semantic
values consist of 5-ary slot-structure with
slots deviceclass (dimmable light, TV ...),
deviceattributes(kitchen,blue ...),pronoun
(as before) and device-specifier (as before)
and operation. An operation is the action
the user wants carried out, e.g. switchon,
setlevel(X), where X is a real number (for
dimmable lights), setprogram(Y) etc. (for
Hi-Fis,TVs),and soon. Asin thegrammar,device
classes are ordered in a hierarchy in a standard
object-orientedway. Thus \dimmable light" is a
subclass of \dimmable device"and inherits from
it.
For strong plug and play, at least the
follow-ing information must beloaded bya device into
the dialogue manager: the device interface (e.g.
thatthe\switchablelight"classhasaswitchon
method; the feedbackto theuser; the update to
thesystem'sdevicemodelgeneratedbyexecuting
thecommand. Clearly,behaviourcanalsodepend
onthe currentstate. Reactionto \switchon the
kitchenlight"dependsonwhetherthelampiso,
on,andwhetherthereisakitchenlight. Wewrite
a rule as command(A,B,C,D)where A is a
com-mand,BisaclassofdevicesforwhichAis
appli-cable,Cisalistofdeviceattributeswhosevalues
mustbeknowninordertoexecuteA,Disalistof
items describinghow thesystem should reacton
A.Eachitem hasthefollowingcomponents:
precondition(X) | X is an executable
proce-durethatteststhenetworkstateandreturns
trueorfalse. Iftrue,theitemcan`re'.
action(Y) |Yistheprocedureto execute
feedback(Z,R) |Zisthefeedbackfortheuser
and candepend on R, thereturn valuefrom
thedeviceoperation
upd(W,R) | W describes how the system's
model of the network state should be
up-dated. AlsoWcandependonR.
For example switching on
a light might be encoded as
command(switch;switchablelight;[id=ID];D)
where D is a list of items in the above form of
which one describes behaviour when the light is
alreadyothus:
[ precondition(lightoff(ID));
action(switchonlight(ID));
feedback(lightnowon(ID);success);
feedback(couldnotswitch(ID);error);
upd([devupdate(ID;status=1)];success);
provided bythe lamp. Thefeedbackto the user
and theupdate rulesdependontheresult ofthe
switchon lightprocedure.
6 Conclusion
Applying theideaofplug andplaytospoken
di-alogue interfaces poses a number of interesting
and importantproblems. Sincethelinguisticand
dialogue management information is distributed
throughout the network,aplug and playsystem
must updateitsspeechinterfacewheneveranew
device is connected. In this paper, we have
fo-cussed in particular ondistributed grammars for
plug and play speech recognition which we have
integratedintoourdemonstratorsystem. Wehave
also examinedsomeissuesand describeda
possi-bleapproachtodistributeddialoguemanagement
whichweplanto undertakeinfurtherwork.
Acknowledgments
We are very grateful to our partners in the
DHommeprojectfordiscussionoftheaboveideas
- especially on theimportance and roleof
dier-ing strengths of Plug and Play. The DHomme
projectpartnersincludenetdecisionsLtd,SRI
In-ternational,TeliaResearchAB,andthe
Universi-ties ofEdinburgh, GothenburgandSeville.
References
N.M.FraserandJ.H.S.Thornton. 1995. Vocalist:
A robust, portable spoken language dialogue
system for telephone applications. In Proc. of
Eurospeech '95,pages1947{1950,Madrid.
Gerald Gazdar, Ewan Klein, Georey Pullum,
andIvanSag. 1985. GeneralizedPhrase
Struc-tureGrammar. HarvardUniversityPress,
Cam-bridge,MA.
J.R Glass. 1999. Challengesfor spoken dialogue
systems. InProc.IEEEASRUWorkshop,
Key-stone,CO.
A.GoldschenandDLoehr. 1999. Theroleofthe
darpa communicator architecture as a human
computer interfacefor distributed simulations.
In 1999 SISO Spring Simulation
Interoperabil-ityWorkshop, Orlando, Florida, March 1999.
SpeechWorks Int Inc, 2001. SpeechWorks.
http://www.speechworks.com. Asat31/01/01.
B. Kiefer and H. Krieger. 2000. A context-free
approximationofhead-drivenphrase structure
grammar. InProceedings of 6th Int.Workshop
for conversational systems. In Proceedings of
IJCAI'99 Workshop on Knowledge &
Reason-ingIn Practical Dialogue Systems,Stockholm.
S.LarssonandD.Traum. 2000. Informationstate
anddialoguemanagementinthetrindidialogue
moveenginetoolkit. Nat.Lang. Engineering, 6.
Microsoft,2000. Universal Plug andPlay Device
Architecture. http://www.upnp.org. Version
1.0,8June2000.
D.Milward. 2000. Distributingrepresentationfor
robustinterpretationofdialogueutterances. In
Proc. of38thACL, HongKong,pages133{141.
R. Moore. 1998. Using naturallanguage
knowl-edgesourcesin speechrecognition. In
Proceed-ingsof the NATOAdvancedStudiesInstitute.
Nuance Communications, 1999. Nuance Speech
RecognitionSystemDeveloper'sManualversion
6.2. 1380WillowRoad,MenloPark,CA94025.
S. Oaks and H. Wong. 2000. Jini in a Nutshell.
O'Reilly.
Georey K. Pullum and Gerald Gazdar. 1982.
natural languages and context-free languages.
Llinguistics andPhilosophy,4:471{504.
M.Rayner,J.Dowding,andB.A.Hockey. 2001a.
Abaselinemethodforcompilingtyped
unica-tiongrammarsintocontextfreelanguage
mod-els. Proc.Eurospeech2001,Aalborg.
M.Rayner,G.Gorrell,B.A.Hockey,J.Dowding,
and J. Boye. 2001b. Do cfg based language
models need agreement constraints? In
Pro-ceedings of 2ndNAACL,Pittsburgh.
S.Suttonetal. 1998. Universalspeechtools: The
cslu toolkit. In Proc. ICSLP-98, pages 3221{
3224.
Networkstatus: Thereisno TV.ThereisaVCR
in the TVroom.
U1 Switchonthetelevision
S1 Sorry,tryagain
U2 Add new component (this must precede
deviceinstallation)
Now, aTV is plugged intothe living room.
Cur-rently we simulate this in asimple text interface,
where devices and rooms are chosen from
prede-nedlists. The speech interface isthen
automati-callyrecongured(thiscurrentlytakes29seconds)
U3 Switchonthetelevision
S3 TheTVisnowon
U4 Switchonthetelevision
S4 TheTVisalreadyon
U5a Addnewcomponent
Now, another TV ispluggedintothe kitchen. No
speechinterfacerecongurationrequiredthis time
U5 Switchonthetelevision
S5 Theliving-roomTVisalreadyon
U6 SwitchontheVCR
S6 TheVCRisnowon
U7 Switchonthetelevision
S7 Sorry,I don'tknowwhichTVyoumean
U8 Thetelevisioninthekitchen
S8 ThekitchenTVisnowon
U9a Addnewcomponent
Now, on-o lights are installed in the hall and
bathroom. Recongurationtimeis27 seconds
U9 Switchonthelightin thehall
S9 Thehalllightisnowon
U10 Dimthelighttoftypercent
S10 Sorry,tryagain.
U11a Addnewcomponent
Now, a dimmablelight isinstalled inthe kitchen.
Recongurationtimeis39 seconds
U11 Dimthekitchenlightto 50percent
S11 Thelightinthekitchen isnowonatfty
percent
U12 Dimthelightin thehall
S12 I'msorry,thehalllightcanonlybeonor
o
U13 Turnonthelightinthekitchen(misheard
as\increasethelightinthekitchen")