WebSphere Voice Server for Multiplatforms: VoiceXML Programmer’s Guide


Note

Before using this information and the product it supports, be sure to read the information in “Notices” on page 185.

Sixth Edition (November 2005)

This edition applies to version 5 release 1 of IBM® WebSphere® Voice Server for Multiplatforms and to all subsequent releases and modifications until otherwise indicated in new editions. IBM may publish one or more new editions of this publication in a downloadable format after the program is generally available. To obtain the most recent edition of this publication, go to the Web site at http://www.elink.ibmlink.ibm.com/public/applications/publications/cgibin/pbi.cgi

© Copyright International Business Machines Corporation 2000, 2005. All rights reserved.

Figures

1. Spoke-too-soon (STS) incident
2. Spoke-way-too-soon (SWTS) incident

List of tables

1. Sample script for “Wizard of Oz” testing
2. When to use a speech interface
3. Barge-in enabled versus disabled
4. Barge-in detection methods
5. Audio formatting
6. Simple versus natural command grammars
7. Prompt styles
8. Mixed input modes
9. Recommended list of global command types
10. Use of human agents
11. Comparison of help styles
12. Successful use of example
13. Unsuccessful use of example—switch to directed dialog
14. Recommended maximum number of menu items
15. Recognition errors when spelling
16. Grammar word/phrase length trade-offs
17. Vocabulary robustness and grammar complexity trade-offs
18. Number of active grammar trade-offs
19. Error-recovery techniques
20. Summary of VoiceXML elements and attributes
21. Summary of SSML elements and attributes
22. Built-in types for US English
23. Properties for capturing audio during speech recognition
24. Predefined events and event handlers for US English
25. Variable scope
26. Examples of relational operators
27. Shadow variables
28. Speech Recognition Grammar limitations in WebSphere Voice Server
29. Form grammar scope
30. Incomplete timeout
31. Complete timeout
32. Sample input for US English built-in field types
33. Canadian French built-in types
34. Canadian French predefined events and event-handler messages
35. Canadian French built-in VoiceXML browser commands
36. Sample input for Canadian French built-in field types
37. Limitations for Canadian French SSML elements
38. German built-in types
39. German predefined events and event-handler messages
40. German built-in VoiceXML browser commands
41. Sample input for German built-in field types
42. Japanese built-in types
43. Japanese predefined events and event-handler messages
44. Japanese built-in VoiceXML browser commands
45. Sample input for Japanese built-in field types
46. Limitations for Japanese SSML elements
47. Korean built-in field types
48. Korean predefined events and event-handler messages
49. Korean built-in VoiceXML browser commands
50. Sample input for Korean built-in field types
51. Limitations for Korean SSML elements
52. Simplified Chinese built-in types
53. Simplified Chinese predefined events and event-handler messages
54. Simplified Chinese built-in VoiceXML browser commands
55. Sample input for Simplified Chinese built-in field types
56. Limitations for Simplified Chinese SSML elements
57. UK English built-in types
58. Sample input for UK English built-in field types

About this book

This book provides information about using VoiceXML Version 2.0 and 2.1 to design and develop voice applications. The resulting applications can then be deployed in a telephony environment using IBM® WebSphere® Voice Server to provide voice access to Web-based data using standard telephones.

Who should read this book

Read this book if you are:
• Someone who wants to find out about the advantages of using VoiceXML to deliver Web-based voice services
• An application developer interested in creating VoiceXML applications
• A content creator responsible for the creative aspects of VoiceXML applications

Reference, design, and programming information for creating voice applications is available from a variety of sources, as represented by the documents listed in this section.

Note: Guidelines and publications cited in this book are for your information only and do not in any manner serve as an endorsement of those materials. You alone are responsible for determining the suitability and applicability of this information to your needs.

Specifications and standards

You may want to refer to the following sources for information about relevant specifications and standards:
• Voice Extensible Markup Language (VoiceXML) Version 2.0 specification, published by W3C and available at http://www.w3.org/TR/voicexml20/
• Speech Synthesis Markup Language Version 1.0 specification, published by W3C and available at http://www.w3.org/TR/speech-synthesis/
• Speech Recognition Grammar Specification (SRGS) Version 1.0, published by W3C and available at http://www.w3.org/TR/speech-grammar/
• Semantic Interpretation for Speech Recognition W3C Working Draft 1 April 2003, published by W3C and available at http://www.w3.org/TR/semantic-interpretation/
• HTTP State Management Mechanism (Cookie Specification) available at http://www.w3.org/Protocols/rfc2109/rfc2109
• ECMA Standard 262: ECMAScript Language Specification, 3rd Edition, published by ECMA at http://www.ecma.ch/ecma1/stand/ECMA-262.htm
• The International Phonetic Alphabet, published by the International Phonetic Association at http://www2.arts.gla.ac.uk/IPA/ipachart.html
• The Unicode Standard Version 3.0, published by The Unicode Consortium at http://www.unicode.org

Speech user-interface design

The speech user interface guidelines presented in this book are an evolving set of recommendations based on industry research and lessons learned in the process of developing our own VoiceXML and telephony applications. For more information, refer to speech industry literature and publications such as the following sources:
• Audio System for Technical Readings (ASTeR) by T. V. Raman, a Ph.D. Thesis published by Cornell University, May 1994.
• Auditory User Interfaces—Towards The Speaking Computer by T. V. Raman, published by Kluwer Academic Publishers, August 1997.
• “Directing the Dialog: The Art of IVR” by Myra Hambleton, published in Speech Technology, Feb/Mar 2000.
• Handbook of Human-Computer Interaction by Thomas K Landauer, Martin Helander, and Prasad V. Prabhu, published by Elsevier Science, Amsterdam, North-Holland, June 1997.
• How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues (Second Edition) by Bruce Balentine, David P. Morgan, and William S. Meisel, published by Enterprise Integration Group, San Ramon, CA, 2001.
• Human Factors and Voice Interactive Systems by Daryle Gardner-Bonneau, published by Kluwer Academic Publishers, Boston, MA, March 1999.

Server-side programming

Information about server-side programming is available from a number of sources, including the following:
• Building Blocks for CGI Scripts in Perl at http://www.cc.ukans.edu/~acs/docs/other/cgi-with-perl.shtml
• Design and Implement Servlets, JSPs, and EJBs for IBM WebSphere Application Server (IBM Redbook) SG24-5754-00
• Developing an e-business Application for the IBM WebSphere Application Server (IBM Redpiece) SG24-5423-00
• JavaServer Pages (JSP) at http://java.sun.com/products/jsp
• Java Servlet at http://java.sun.com/products/servlet/
• The ASP (Active Server Pages) Resource Index at http://www.aspin.com/index/default.asp
• The Front of IBM WebSphere Building e-business User Interfaces (IBM Redbook) SG24-5488-00
• WebSphere V4.0 Advanced Edition Handbook (IBM Redbook) SG24-6176-00
• WebSphere Version 4 Application Development Handbook (IBM Redbook) SG24-6134-00

Deployment information

For information about deploying your voice applications, refer to the documentation provided with WebSphere Voice Server for Multiplatforms and WebSphere Voice Response for AIX®.

The following documentation is provided in softcopy only with the product, or can be downloaded from the IBM Publications Center:
• IBM Text-to-Speech SSML Programming Guide Version 6.7.3.0

WebSphere Voice Server for Multiplatforms
• WebSphere Voice Server for Multiplatforms: Administrator’s Guide, G210-1561
• WebSphere Voice Server for Multiplatforms: Application Development using State Tables, G210-1562

WebSphere Voice Response for AIX
• WebSphere Voice Response for AIX: General Information and Planning, GC33-1840
• WebSphere Voice Response for AIX: Installation, GC33-1842
• WebSphere Voice Response for AIX: User Interface Guide, SC33-1841
• WebSphere Voice Response for AIX: Configuring the System, SC33-1843
• WebSphere Voice Response for AIX: Managing and Monitoring the System, SC33-1844
• WebSphere Voice Response for AIX: Designing and Managing State Table Applications, SC33-1845
• WebSphere Voice Response for AIX: Application Development using State Tables, SC33-1846
• WebSphere Voice Response: Developing Java Applications, GC34-6318
• WebSphere Voice Response: Deploying and Managing VoiceXML and Java Applications, GC34-6319
• WebSphere Voice Response: Application Development using Java and VoiceXML, GC34-6049
• WebSphere Voice Response for AIX: Custom Servers, SC33-1847
• WebSphere Voice Response for AIX: 3270 Servers, SC33-1848
• WebSphere Voice Response for AIX: Fax using Brooktrout, SC34-5982
• WebSphere Voice Response for AIX: Cisco ICM Interface User’s Guide, SC34-5317
• WebSphere Voice Response for AIX: Programming for the ADSI Feature, SC34-5380
• WebSphere Voice Response for AIX: Programming for the Signaling Interface, SC33-1851

How this book is organized

Chapter 1, “Introduction to VoiceXML,” on page 1 provides an introduction to voice applications and VoiceXML.

Chapter 2, “Designing a speech user interface (SUI),” on page 9 presents user-interface guidelines for developing voice applications.

Chapter 3, “VoiceXML language,” on page 75 introduces basic concepts and constructs of VoiceXML. For complete syntax, please refer to the VoiceXML 2.0 specification at http://www.w3.org/TR/voicexml20/

Chapter 4, “Hints, tips, and best practices,” on page 123 contains hints and tips for structuring and coding your VoiceXML applications. It includes examples of the following: use of the VoiceXML <object> element to reference Java code; use of the <property> element’s maxnbest name to obtain n-best results; and use of a precompiled grammar.

Appendix A, “CTTS caching,” on page 137 describes audio caching for improving the performance of applications that use concatenative text-to-speech (CTTS) synthesis.

You might want to refer to the following appendixes for language-specific information:
• Appendix C, “German,” on page 149 contains information that is specific to German.
• Appendix D, “Japanese,” on page 157 contains information that is specific to Japanese.
• Appendix F, “Simplified Chinese,” on page 171 contains information that is specific to Simplified Chinese.
• Appendix G, “UK English,” on page 177 contains information that is specific to UK English.

“Notices” on page 185 contains notices and trademark information.

“Glossary” on page 189 defines key terminology used in this document.

Document conventions and terminology

The following conventions are used to present information in this document:

Italic        Used for emphasis, to indicate variable text, and for references to other documents.
Bold          Used for configuration parameters, file names, URLs, and user-interface controls such as command buttons and menus.
Monospaced    Used for sample code.
<text>        Used for editorial comments in scripts.

Making comments on this book

If you especially like or dislike anything about this book, feel free to send us your comments.

You can comment on what you regard as specific errors or omissions, and on the accuracy, organization, subject matter, or completeness of this book. Please limit your comments to the information that is in this book and to the way in which the information is presented. Speak to your IBM representative if you have suggestions about the product itself.

When you send us comments, you grant to IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

You can get your comments to us quickly by sending an e-mail to [email protected]. Alternatively, you can mail your comments to:

User Technologies
IBM United Kingdom Laboratories, Mail Point 095, Hursley Park,
Winchester, Hampshire, SO21 2JN, United Kingdom

Chapter 1. Introduction to VoiceXML

This introduction addresses the following questions:
• “What are voice applications?”
• “Why create voice applications?” on page 2
• “What are typical types of voice applications?” on page 2
• “What is VoiceXML?” on page 3
• “What are the advantages of VoiceXML?” on page 4
• “How do you create and deploy a VoiceXML application?” on page 5
• “How do users access the deployed application?” on page 6

What are voice applications?

Voice applications are applications in which the input and/or output are through a spoken, rather than a graphical, user interface. The application files can reside on the local system, an intranet, or the Internet. Users can access the deployed applications anytime, anywhere, from any telephone.

“Voice-enabling the World Wide Web” does not simply mean:
• Using spoken commands to tell a visual browser to look up a specific Web address or go to a particular bookmark
• Having a visual browser throw away the graphics on a traditional visual Web page and read the rest of the information aloud
• Converting the bold or italics on a visual Web page to some kind of emphasized speech

Rather, voice applications provide an easy and novel way for users to surf or shop on the Internet—“browsing by voice.” Users can interact with Web-based data (that is, data available via Web-style architecture such as servlets, ASPs, JSPs, JavaBeans, CGI scripts, etc.) using speech rather than a keyboard and mouse.

The form that this spoken data takes is often not identical to the form it takes in a visual interface, due to the inherent differences between the interfaces. For this reason, transcoding—that is, using a tool to automatically convert HTML files to VoiceXML—may not be the most effective way to create voice applications.

Why create voice applications?

Until recently, the World Wide Web has relied exclusively on visual interfaces to deliver information and services to users via computers equipped with a monitor, keyboard, and pointing device. In doing so, a huge potential customer base has been ignored: people who (due to time, location, and/or cost constraints) do not have access to a computer.

Many of these people do, however, have access to a telephone. Providing “conversational access” (that is, spoken input and audio output over a telephone) to Web-based data will permit companies to reach this untapped market. Users benefit from the convenience of using the mobile Internet for self-service transactions, while companies enjoy the Web’s relatively low transaction costs. And, unlike applications that rely on dual tone multi-frequency (DTMF) (telephone keypress) input, voice applications can be used in a hands-free or eyes-free environment, as well as by customers with rotary pulse telephone service or telephones in which the keypad is on the handset.

What are typical types of voice applications?

Voice applications will typically fall into one of the following categories:
• “Queries”
• “Transactions” on page 3

Queries

In this scenario, a customer calls into a system to retrieve information from a Web-based infrastructure.

The system guides the customer through a series of menus and forms by playing instructions, prompts, and menu choices using prerecorded audio files or synthesized speech.

The customer uses spoken commands or DTMF input to make menu selections and fill in form fields.

Based on the customer’s input, the system locates the appropriate records in a back-end enterprise database. The system presents the desired information to the customer, either by playing back prerecorded audio files or by synthesizing speech based on the data retrieved from the database.

Examples of this type of self-service interaction include applications or voice portals providing weather reports, movie listings, stock quotes, health-care-provider listings, and customer service information (Web call centers).

Transactions

In this scenario, a customer calls into a system to execute specific transactions with a Web-based back-end database.

The system guides the customer to provide the data required for the transaction by playing instructions, prompts, and menu choices using prerecorded audio files or synthesized speech. The customer responds using spoken commands or DTMF input.

Based on the customer’s input, the system conducts the transaction and updates the appropriate records in a back-end enterprise database. Typically the system also reports back to the customer, either by playing prerecorded audio files or by synthesizing speech based on the information in the database records.

Examples of this type of self-service interaction include applications or voice portals for employee benefits, employee timecard submission, financial transactions, travel reservations, calendar appointments, electronic relationship management (eRM), sales automation, and order management.

What is VoiceXML?

The Voice eXtensible Markup Language (VoiceXML) is an XML-based markup language for creating distributed voice applications, much as HTML is a markup language for creating distributed visual applications. It is an industry standard defined by the World Wide Web Consortium (W3C) at http://www.w3.org/TR/voicexml21/.

The VoiceXML language enables Web developers to use a familiar markup style and Web server-side logic to deliver applications for use over telephone lines. The resulting VoiceXML applications can interact with existing back-end business data and logic.

Using VoiceXML, application developers can create Web-based voice applications that users can access by telephone or other pervasive devices. VoiceXML supports dialogs that feature:
• Recognition of spoken input
• DTMF input
• Recording of spoken input
• Synthesized speech output (“text-to-speech”)
• Pre-recorded digitized audio output
• Dialog flow control
• Automatic Number Identification (ANI)
• Dialed Number Identification Service (DNIS)
• Call transfer

What are the advantages of VoiceXML?

While you could certainly build voice applications without using a voice markup language and a speech browser (for example, by writing your applications directly to a speech API), using VoiceXML and a VoiceXML browser provides several important capabilities:
• VoiceXML is a markup language that makes building voice applications easier, in the same way that HTML simplifies building visual applications. VoiceXML also reduces the amount of speech expertise that developers need.
• VoiceXML applications can use the same existing back-end business logic as their visual counterparts, enabling voice solutions to be introduced to new markets quickly. Current and long-term development and maintenance costs are minimized by leveraging the Web design skills and infrastructures already present in the enterprise. Customers can benefit from a consistency of experience between voice and visual applications.
• VoiceXML implements a client/server paradigm, where a Web server provides VoiceXML documents that contain dialogs to be interpreted and presented to a user. The user’s responses are submitted to the Web server, which responds by providing additional VoiceXML documents, as appropriate. VoiceXML allows you to request documents and submit data to server scripts using Uniform Resource Identifiers (URIs). VoiceXML documents can be static, or they can be dynamically generated by CGI scripts, JavaBeans, ASPs, JSPs, Java servlets, or other server-side logic.
• Unlike a proprietary Interactive Voice Response (IVR) system, VoiceXML provides an open application development environment that generates portable applications. This makes VoiceXML a cost-effective alternative for providing voice access services.
• Most installed IVR systems today accept input from the telephone keypad only. In contrast, VoiceXML is designed predominantly to accept spoken input, but it can also accept DTMF input, if desired. As a result, VoiceXML helps speed up customer interactions by providing a more natural interface that replaces the traditional, hierarchical IVR menu tree with a streamlined dialog using a flattened command structure.
• VoiceXML directly supports networked and Web-based applications, meaning that a user at one location can access information or an application provided by a server at another geographically or organizationally distant location. This capitalizes on the connectivity and commerce potential of the World Wide Web.
• Using a single VoiceXML browser to interpret streams of markup language originating from multiple locations provides the user with a seamless conversational experience across independent applications. For example, a voice portal application might allow a user to temporarily suspend an airline purchase transaction to interact with a banking application on a different server to check an account balance.
• VoiceXML supports local processing and validation of user input.
• VoiceXML supports playback of prerecorded audio files.
• VoiceXML supports recording of user input. The resulting audio can be played back locally or uploaded to the server for storage, processing, or playback at a later time.
• VoiceXML defines a set of events corresponding to such activities as a user request for help, the failure of a user to respond within a timeout period, and an unrecognized user response. A VoiceXML application can provide catch elements that respond appropriately to a given event for a particular context.
• VoiceXML supports context-specific and tapered help using a system of events and catch elements. Help can be tapered by specifying a count for each event handler, so that different event handlers are executed depending on the number of times that the event has occurred in the specified context. This can be used to provide increasingly more detailed messages each time the user asks for help. For more information, see “Choosing help mode or self-revealing help” on page 40.
• VoiceXML supports subdialogs, which are roughly the equivalent of function or method calls. Subdialogs can be used to provide a disambiguation or confirmation dialog, and to create reusable dialog components. For more information, see “Subdialogs” on page 79.
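
As an illustration of the event handling and tapered help described in the list above, the following is a minimal sketch rather than code from a shipped application. It assumes a hypothetical account-number field that uses the built-in digits type, and the form name, field name, and prompt wording are invented for the example. Each <help> handler carries a count attribute, so a more detailed message is selected each time the caller asks for help in this field.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="account_lookup">
    <!-- Hypothetical field; the built-in "digits" type needs no separate grammar -->
    <field name="account" type="digits">
      <prompt>What is your account number?</prompt>

      <!-- First help request in this field: a brief hint -->
      <help count="1">
        <prompt>Please say your account number.</prompt>
      </help>

      <!-- Second help request: more detail -->
      <help count="2">
        <prompt>Say the digits of your account number one at a time,
          or enter them on the telephone keypad.</prompt>
      </help>

      <!-- Third and later help requests: the most detailed message -->
      <help count="3">
        <prompt>Say or key in each digit of your account number,
          for example, one two three four.</prompt>
      </help>

      <!-- The caller did not respond within the timeout period -->
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>

      <!-- The caller's response was not recognized -->
      <nomatch>
        <prompt>Sorry, I did not understand. Please say only the digits.</prompt>
      </nomatch>

      <!-- Runs once the field has a recognized value -->
      <filled>
        <prompt>You said <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The handler with the highest count that does not exceed the number of times the event has occurred is the one selected, which is why the third handler also covers any later requests.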

How do you create and deploy a VoiceXML application?

1. An application developer creates a voice application written in VoiceXML. You can write VoiceXML applications using a text editor but you might find it more convenient to use a graphical development environment that helps you create, manage and test VoiceXML files. The Voice Toolkit for WebSphere Studio (Voice Toolkit) supports the development of VoiceXML-based applications.

   VoiceXML pages can be static or may be generated dynamically from CGI scripts, JavaBeans, ASPs, JSPs, Java servlets, or other server-side techniques.

2. (Optional) The developer publishes the VoiceXML application (VoiceXML documents, grammar files, any prerecorded audio files, and any

3. (Optional) The developer uses a desktop workstation and the Voice Toolkit to test the VoiceXML application running on the Web server or local disk, pointing the VoiceXML browser to the appropriate starting VoiceXML page.

4. A telephony expert configures the telephony infrastructure for WebSphere Voice Response for AIX. See WebSphere Voice Response for AIX: Installation and WebSphere Voice Response for AIX: Configuring the System for instructions.

5. The system administrator uses WebSphere Voice Response for AIX to configure, deploy, monitor, and manage a dedicated WebSphere Voice Server system. WebSphere Voice Response’s telephone network connection provides the audio channels for the VoiceXML browser.

6. The developer uses a real telephone to test the VoiceXML application using WebSphere Voice Server.

How do users access the deployed application?

Once your voice applications are deployed, users simply dial the telephone number that you provide and are connected to the corresponding voice application. The figure below shows a flowchart of a typical call.

Figure: flowchart of a typical call (Answer the telephone call; Play a prompt; Wait for the caller’s response; Take action as directed by the caller; Complete the interaction)

1. A user dials the telephone number you provide. WebSphere Voice Response answers the call and executes the application referenced by the dialed phone number.

2. WebSphere Voice Server plays a greeting to the caller and prompts the caller to indicate what information he or she wants.
   • The application can use prerecorded greetings and prompts, or the application can have the greeting or prompt synthesized from text using the text-to-speech engine.
   • If the application supports barge-in, the caller can interrupt the prompt if he or she already knows what to do.

3. The application waits for the caller’s response for a set period of time.
   • The caller can respond either by speaking or by pressing one or more keys on a DTMF telephone keypad, depending on the types of responses expected by the application.
   • If the response does not match the criteria defined by the application (such as the specific word, phrase, or digits), the voice application can prompt the caller to enter the response again, using the same or different wording.
   • If the waiting period has elapsed and the caller has not responded, the application can prompt the caller again, using the same or different wording.

4. The application takes whatever action is appropriate to the caller’s response. For example, the application might:
   • Update information in a database
   • Retrieve information from a database and speak it to the caller
   • Store or retrieve a voice message
   • Launch another application
   • Play a help message

   After taking action, the application prompts the caller with what to do next.

5. The caller or the application can terminate the call. For example:
   • The caller can terminate the interaction at any moment by hanging up. WebSphere Voice Response can detect if the caller hangs up and can then disconnect itself.
   • If the application permits it, the caller can use a command to indicate explicitly that the interaction is over (for example, by saying “Exit”).
   • If the application has finished running, it can play a closing message and then disconnect.
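
The call flow above can be sketched as a single VoiceXML 2.0 document. This is an illustrative sketch under stated assumptions, not a definitive implementation: the greeting text, the field name wants_report, and the URL http://www.example.com/weather are hypothetical, and a real application would normally use its own grammars rather than the built-in boolean type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Step 3: wait up to five seconds for a response before a noinput event -->
  <property name="timeout" value="5s"/>

  <!-- Step 5: the caller hangs up at any point in the call -->
  <catch event="connection.disconnect.hangup">
    <exit/>
  </catch>

  <form id="main">
    <!-- Step 2: greeting; barge-in is turned off so the whole greeting plays -->
    <block>
      <prompt bargein="false">Welcome to the example information line.</prompt>
    </block>

    <!-- Steps 2 and 3: prompt the caller, who may interrupt, then wait for yes or no -->
    <field name="wants_report" type="boolean">
      <prompt bargein="true">Would you like to hear the weather report?</prompt>

      <!-- The waiting period elapsed with no response: ask again -->
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>

      <!-- The response did not match: ask again with different wording -->
      <nomatch>
        <prompt>Sorry, I did not understand. Please say yes or no.</prompt>
      </nomatch>

      <!-- Step 4: act on the caller's response -->
      <filled>
        <if cond="wants_report">
          <!-- Hypothetical server-side script that returns the next VoiceXML document -->
          <submit next="http://www.example.com/weather" namelist="wants_report"/>
        <else/>
          <!-- Step 5: the application plays a closing message and ends -->
          <prompt>Thank you for calling. Goodbye.</prompt>
          <exit/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Answering the call (step 1) does not appear in the markup because, as described above, the telephony platform answers the call and selects the application from the dialed number.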


Chapter 2. Designing a speech user interface (SUI)

This chapter covers the following topics:
• “Introduction”
• “The importance of SUI design” on page 10
• “Design methodology” on page 13
• “Getting started—high-level design decisions” on page 19
• “Getting specific—low-level design decisions” on page 43
• “Advanced user interface topics” on page 71

Introduction

The SUI guidelines presented here are just that: guidelines. In some cases, the requirements and objectives of particular VoiceXML applications may present valid reasons for overriding certain guidelines. Furthermore, these guidelines address the design points we have found to be the most important in producing speech/audio user interfaces, but are not as comprehensive as those found in a book dedicated to the topic. Finally, keep in mind that while the following guidelines can help you produce a usable application, they do not guarantee usability or user satisfaction; you should plan to conduct usability tests with your application. (See “Design methodology” on page 13.)

Note: The guidelines and referenced publications presented in this book are for your information only, and do not in any manner serve as an endorsement of those materials. You alone are responsible for determining the suitability and applicability of this information to your needs.

The goals of the guidelines presented in this chapter include:
• Helping you create standardized, well-behaved VoiceXML applications
• Reducing development time by teaching current best practices in speech user interface (SUI) design
• Increasing the usability of the SUI and reducing the end user’s learning curve by promoting consistent computer output and predictable user input

The importance of SUI design

The bases of SUI design

Effective SUI design draws upon many disciplines. The key scientific disciplines are Psychology, Human-Computer Interaction, Human Factors, Linguistics and Communication Theory. The artistic disciplines of Auditory Design and Writing (especially the techniques of writing dialog) are also very important. Finally, for true craftsmanship in SUI design, there is no substitute for experience and codification of best practices (such as the information provided in this chapter).

The consumers of SUI design

A SUI has many consumers and must simultaneously satisfy many objectives. Among the consumers of SUIs are marketers, service providers, end users, and developers.

Marketers have the responsibility of selling speech applications to service providers. The primary objective of a SUI from a marketer’s point of view is to appeal to the targeted service provider, thus helping to make the sale.

Service providers rely on the speech application to help them provide a service to end users. For service providers, the primary objectives of a SUI are to save money and maintain customer contact and satisfaction. To this end, they are also concerned with their corporate image and how their speech applications will fit into their overall branding strategies.

End users call the speech application for the purpose of obtaining a service from the service provider. End users want SUIs that are easy to use, allow efficient task completion, and provide a pleasant user experience.

Developers must write the code that creates the entire speech application, including the SUI. The primary objectives for developers creating SUIs are that the interface be technologically feasible, capable of completion given resource constraints, and require minimal development effort.

e-Service and speech technology

Speech applications are one way to provide customer self-service over electronic networks (the technologies collectively known as e-Service). e-Services must focus on meeting customer needs to increase market share and revenue. Technologies are enablers in e-Service that can enhance customer convenience and support, but applications require carefully designed user interfaces to manage customer expectations.

However, a key element of customer relationship building, human interaction, is absent from most e-Service facilities. Through careful design, speech

limitations of current technologies, conversations with automated systems are very different from conversations between humans. Nonetheless, excellent user interface designs working with current technologies can create the illusion of human-like interaction. To achieve this level of excellence in user interface design, it is important to consider users’ interpersonal communication, cognitive, and social skills in the initial design, then to apply usability testing methods throughout the design cycle to tune the interface.

Good e-Service design requires a balance of business needs and end user needs. Some key business needs are:
• Provide information and services.
• Market products and services.
• Build customer loyalty and trust.
• Convey a positive business image to the customer.
• Reduce costs.

In contrast, key customer (end user) goals are:
• Obtain information or perform a task quickly and easily.
• Have a smooth and pleasant interaction.

These goals converge when designers apply knowledge of human conversational behavior to the user interface. Key elements to consider are:
• Organization and call flow
• Language and prompt style
• The system voice (especially loudness, pitch and pitch variation)
• Use of non-speech audio (for example, music and audio logos)
• Social expectation of the service provider role

Customer satisfaction with e-Service

According to recent market research, customers are satisfied with self-service technologies when they:
• Are better than other service alternatives by virtue of saving time, saving money, providing more customer control or avoiding service employees. (68%)
• Do their jobs and perform as intended. (21%)
• Solve an intensified need. (11%)

Customers become dissatisfied when:
• Technology fails. (43%)
• Applications have poor design. (36%)
• Service process fails. (17%)

Poor user interface design can be the root cause of customer perception of these apparently different problems.

Service recovery

A common factor that runs through customer dissatisfaction is the lack of service recovery in the face of system failure. To avoid customer dissatisfaction, it is important to:
• Use high quality, reliable telephony and speech technology.
• Design for communication breakdown by expecting misrecognitions to occur.
• Reinforce correct user responses by moving forward.
• Use a hierarchy of non-intrusive error messages that foster an impression of moving forward while repairing the communicative breakdown in the most appropriate way.
• Use a user-centered design process of iterative evaluation and redesign to quickly identify and avoid user problems.

SUI misconceptions

These statements are not necessarily true:
• It is always easy and natural to use SUIs. After all, almost everyone can talk!
• Talking to a computer is just like talking to a person.
• Usable speech applications require the latest and greatest technologies.
• Barge-in is essential. Without barge-in, a SUI is unusable.
• Users hate beeps and other tones in SUIs because they aren’t natural.

Fundamental SUI design

SUI design is not just reading a visual web page. You must decide:
• What to present.
• How much to present.
• How to present it.
• When to present it.

Effective SUI design is based on:
• Understanding customer profiles.
• Meeting realistic expectations.
• Following a design methodology that uses proven techniques.

Major SUI objectives

In their influential book on SUIs, Balentine and Morgan stated that the main enemy of the spoken user interface is time. The basis for this assertion is that speech has a temporary existence and listeners must remember what they have heard. If prompts in a speech application are too short, however, they can be subject to multiple interpretations. A clear design objective, therefore, is to avoid making users hear more (or less) than they need to hear or to say more (or less) than they need to say.

It is also important to strive to make every interaction move the user forward (or at least create the illusion of moving forward). This is easier said than done because dialogs need careful crafting and usability evaluation. To work toward these objectives, use prompts that are succinct and sincere (modeled after the prompts provided by expert call center agents), provide self-revealing contextual help messages, and use a professional voice talent for recorded prompts.

The power of the SUI

A good SUI has a natural, human-like quality. There is no need to associate a function with a number (in contrast to touchtone user interfaces) when speech labels provide good functional descriptions. System prompts become much shorter and more natural, and it is possible to add options in the future without any need to change existing speech labels, even if the order of functions changes. Finally, there is no need for a user to move the phone away from his or her ear to find the button to press.

A Harris survey conducted in 2003 supports the claim of the effectiveness of speech applications with the following results:

v Speech is widely used and accepted (only 7% of respondents in the survey would avoid future use of speech systems).

v Consumers reported high satisfaction with speech experiences (61% highly satisfied with most recent speech interaction).

v Consumers feel that speech provides many advantages over other e-Service methods (90% of respondents preferred speech to touchtone systems).

Design methodology

Developing SUIs, like most development activities, involves an iterative 4-phase process:

v “Design Phase”

v “Prototype phase (“Wizard of Oz” testing)” on page 16

v “Test phase” on page 18

v “Refinement phase” on page 19

Because this process is iterative, you should attempt to keep the interface as fluid as possible for as long as possible.

Design Phase

In this phase, the goal is to define proposed functionality and create an initial design. This involves the following tasks:

v “Analyzing your users”

v “Analyzing user tasks”

v “Developing the conceptual design (vision clips)” on page 15

v “Making high-level decisions” on page 15

v “Making low-level decisions” on page 15

v “Defining the complete call flow” on page 15

v “Creating the initial dialog script” on page 16

v “Planning for expert users” on page 16

Analyzing your users

The first step in designing your VoiceXML applications should be to conduct user analysis to identify any user characteristics and requirements that might influence application design. For example:

v How frequently will your users use the system?

v What is their motivation for using the system?

v In what type of environment will your users use the system (quiet office, outdoors, noisy shopping mall)?

v What type of telephone connection will most of your users have (land-line, cordless, cellular)?

v Are many of your intended users non-native speakers of the language in which the application will be written?

v How comfortable are your users with automated (“self-service”) applications?

v Will your application be personalized (based on ANI information or user login)?

Analyzing user tasks

After you have identified who your users are, the next step is to determine what tasks your application should support.

Consider:

v What are the most common tasks your users will perform? What tasks are less common?

v Are your users familiar with the tasks they will need to complete?

v Will your users be able to perform these tasks by other means (in person, using a visual Web interface, by calling a customer service representative, etc.)?

v Will your users have the option of transferring to a human operator?

v What words and phrases do your users typically use to describe the tasks and items in your proposed application?

For example, tasks in a banking application could include transferring money, obtaining current account balance, listing some number of most recent transactions, etc. You might allocate the functions as follows:

v Have the application locate, sort, and store account information, and make any routine decisions. For example, if the user attempts to transfer more money than is available, the application plays a warning message.

v Have the user confirm transactions and make any non-routine or elective decisions. For example, ask the user to confirm the amount of money and account number before the application submits a form that initiates a monetary transfer.
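The allocation of decisions in the banking example can be sketched as a small function. This is only an illustration: the `TransferRequest` type, the `next_action` function, and the action names are hypothetical and are not part of WebSphere Voice Server or VoiceXML. The application makes the routine decision (insufficient funds) on its own, while the elective decision (confirming the amount and account) is left to the user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferRequest:
    """Hypothetical container for one transfer the caller has asked for."""
    amount: float        # amount the caller wants to move
    available: float     # balance available in the source account
    to_account: str      # destination account number


def next_action(request: TransferRequest, confirmed: Optional[bool] = None) -> str:
    """Decide what the dialog should do next for a transfer request."""
    # Routine decision, made by the application without asking the user.
    if request.amount > request.available:
        return "play_warning"
    # Elective decision, made by the user: confirm amount and account first.
    if confirmed is None:
        return "ask_confirmation"
    return "submit_transfer" if confirmed else "cancel_transfer"
```

A dialog layer would call `next_action` once before asking for confirmation and once again after the caller answers.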

Developing the conceptual design (vision clips)

After identifying the tasks, the first design activity should be the conceptual development of the application. This is usually done by writing high-level scripts of proposed user-system interactions called vision clips. There is no attempt made to completely define the call flow. Rather, the focus is on the user-system interaction in key parts of important tasks. This design activity provides an inexpensive first step prior to any large investment of resources.

In a vision clip, designers create preliminary samples of conversations to promote discussions between the designer and the customer regarding the customer’s task and user interface expectations. Ideally, these scripts are recorded so customers can review the sound and feel of the proposed interactions. Designers of vision clips should be very familiar with the capabilities of the speech technologies to avoid preparing vision clips that would be difficult or impossible to deploy as applications.

Making high-level decisions

The next step is to make high-level application decisions, such as selecting the appropriate user interface, barge-in style, prompt style, and help style. For details, see “Getting started: high-level design decisions” on page 19.

Making low-level decisions

After you have made the high-level decisions, you should proceed to the lower-level system decisions that address such issues as sound and feel, word choice, etc. “Getting specific: low-level design decisions” on page 43 provides information to help you make these decisions.

Defining the complete call flow

Next, you will want to outline the call flow that maps the interaction between your application and the user. For example:

v What questions do you need to ask the user?

Your application interaction should have a logical progression that takes into account typical responses, unusual responses, and any error conditions that might occur.

Creating the initial dialog script

After you have defined the call flow, you should be ready to create an initial draft of the script for the dialog between the application and the user. The script should include all of the text that will be spoken by the application, as well as expected user responses.

Planning for expert users

As the final step in the Design phase, you may want to identify the potential for expert users and begin considering where you may be able to help them cut through some of the interface to quickly perform common tasks.

Prototype phase (“Wizard of Oz” testing)

The goal of this phase is to create a prototype of the application, leaving the design flexible enough to accommodate changes in prompts and dialog flow in subsequent phases of the design.

For the first iteration, you may want to use a technique known as “Wizard of Oz” testing. This technique can be used before you begin coding, as it requires only a prototype paper script and two people: one to play the role of the user, and a human “wizard” to play the role of the computer system. Here’s how it works:

v The script should include the proposed introduction, prompts, list of global commands, and all planned self-revealing help. Consider Table 1 as an example of a script appropriate for “Wizard of Oz” testing:

Table 1. Sample script for “Wizard of Oz” testing

| Title | Message types | Prompts and responses | System actions |
|---|---|---|---|
| Greeting | Intro message | Welcome to PhonePay! You can say Repeat or Help at any time. | Go to Get Account Number. |
| Get Account Number | Prompt | Account number? | |
| | Help 1 | Your account number is on the upper-right portion of your bill. Speak only the numbers on the left side of the dash. You can ignore leading zeros. | |
| | Help 2 | At any time, you can say Help, Repeat, Go Back, Main Menu, Exit, or Transfer to Agent. To continue, say or enter your account number. | |
| | Caller response | <Says or enters numbers> | If input spoken, go to Confirm Account Number. If entered via DTMF, go to Get PIN Number. |
| Confirm Account Number | Prompt | Was that <number>? | |
| | Help 1 | Please say Yes, No, or Repeat. | |
| | Help 2 | At any time, you can say Help, Repeat, Go Back, Main Menu, Exit, or Transfer to Agent. To continue, say Yes or No. | |
| | Caller response | Yes | Go to Get PIN Number. |
| | Caller response | No | Go to Get Account Number. |

v The two participants should be physically separated so that they cannot use visual cues to communicate; a partition will suffice, or you could use separate rooms and allow the people to communicate by telephone.

v The wizard must be very familiar with the script. The user should never see the script.

v The user telephones (or pretends to telephone) the wizard, who begins reading the script aloud. The user responds to the prompts, and the wizard continues the dialog based on the scripted response to the user’s utterance.

“Wizard of Oz” testing helps you fix problems in the script and task flow before committing any of the design to code. However, it cannot detect other types of usability problems, such as recognition or audio quality problems; addressing problems linked to these types of errors requires a working prototype.
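One way to check a paper script for gaps before a “Wizard of Oz” session is to write its transitions down as data. The sketch below encodes only the system actions listed in Table 1. The `SCRIPT` dictionary, the input labels (`"spoken"`, `"dtmf"`, `"yes"`, `"no"`), and the rule of staying in the same state on unscripted input are assumptions made for illustration, not part of the original script.

```python
# Transitions taken from the "System actions" column of Table 1.
# "*" is a hypothetical wildcard meaning "no caller input required".
SCRIPT = {
    "Greeting": {
        "prompt": "Welcome to PhonePay! You can say Repeat or Help at any time.",
        "transitions": {"*": "Get Account Number"},
    },
    "Get Account Number": {
        "prompt": "Account number?",
        "transitions": {"spoken": "Confirm Account Number", "dtmf": "Get PIN Number"},
    },
    "Confirm Account Number": {
        "prompt": "Was that <number>?",
        "transitions": {"yes": "Get PIN Number", "no": "Get Account Number"},
    },
}


def next_state(state: str, caller_input: str) -> str:
    """Return the state the wizard should move to after the caller's input."""
    transitions = SCRIPT[state]["transitions"]
    key = caller_input.strip().lower()
    if key in transitions:
        return transitions[key]
    if "*" in transitions:
        return transitions["*"]
    # Assumption: unscripted input keeps the wizard in the same state.
    return state
```

Walking every state and input through `next_state` shows where the script has no scripted response, which is the kind of problem this test phase is meant to expose.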

Test phase

After you have incorporated the results of the “Wizard of Oz” testing, you will want to code and test a working prototype of the application. During this phase, be sure to analyze the behavior of both new and, if applicable, expert users.

Identifying recognition problems

As you proceed with the Test phase, note any consistent recognition problems. The most common cause of recognition problems is acoustic confusability among the currently active phrases. For example, both Madison and Addison are US airports. Thus, these potential user inputs to a travel application are highly confusable:

User: Flying from Madison

User: Flying from Addison

Sometimes there is nothing you can do when this happens. Other times you can try to correct the problem by:

v Using a synonym for one of the terms. For example, if the system is confusing “no” and “new,” you might be able to replace “new” with “recent,” depending on the application’s context.

v Adding a word to one or more of the choices. For the Madison/Addison airport confusion, you could make states optional in the grammar for most cities, but require the state for low-traffic airports that have acoustic confusability with higher-traffic airports.

v Planning for disambiguation by writing code that includes or accesses data about typical acoustic confusions. For example:

System: Flying from?

User: Los Angeles <not flagged as confusable>

System: Flying to?

User: Newark <flagged as confusable with New York>

System: Newark, New Jersey or New York, New York?

User: Newark, New Jersey
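The disambiguation dialog above can be driven by a small lookup of known confusions. The following is a minimal sketch under stated assumptions: the `CONFUSABLE_CITIES` table and the `disambiguation_prompt` function are hypothetical names, and in a real application the confusion data would come from test results rather than being hard-coded.

```python
from typing import Optional

# Hypothetical table of recognition results that are acoustically confusable,
# mapped to the fully qualified choices the system should offer.
CONFUSABLE_CITIES = {
    "newark": ("Newark, New Jersey", "New York, New York"),
    "new york": ("New York, New York", "Newark, New Jersey"),
}


def disambiguation_prompt(recognized: str) -> Optional[str]:
    """Return a follow-up prompt if the result is flagged as confusable."""
    choices = CONFUSABLE_CITIES.get(recognized.strip().lower())
    if choices is None:
        return None  # not flagged; the dialog can simply move forward
    return f"{choices[0]} or {choices[1]}?"
```

Results that are not flagged return `None`, so the extra confirmation turn is spent only where confusions are known to occur.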

Identifying any user interface breakdowns

The Test phase is also where you will identify potential user interface breakdowns. Some factors you may want to analyze include:

v Percentage of users who did not successfully complete your test scenarios

v Percentage of users who transferred to a human operator, when this was not the desired outcome

v Points in the application where users experienced the most difficulty

v Unexpected user behaviors

v Effectiveness of error recovery mechanisms

v Time to complete typical transactions

v Self-reported level of user satisfaction

The first round of user testing typically reveals places where the system’s response needs to be rephrased to improve usability. For this reason, system prompts and other messages should be left flexible for as long as possible, at least until after the first round of user testing.

Refinement phase

During this phase, you will update the user interface based on the results of testing the prototype. For example, you may revise prototype scripts, add tapered prompts and customizable expertise levels, and create dialogs for inter- and intra-application interactions.

Finally, you will want to iterate the Design, Prototype, Test, and Refine process. This includes, in the Test phase, users from previous rounds of testing and users new to the system. Ideally, the final usability test should be on a deployed system to allow evaluation of its accuracy, latency, barge-in characteristics, and quality of speech output.

Getting started: high-level design decisions

Designing a SUI involves at least two levels of design decisions. First, you need to make certain high-level design decisions regarding system-level interface properties. Only then can you get down to the details of designing specific system prompts and dialogs. The high-level decisions you need to make include:

v “Selecting an appropriate user interface” on page 20

v “Deciding on the type and level of information” on page 20

v “Choosing the barge-in style” on page 21

v “Selecting recorded prompts or synthesized speech” on page 24

v “Deciding whether to use audio formatting” on page 27

v “Using simple or natural command grammars” on page 29

v “Adopting a terse or verbose prompt style” on page 31

v “Allowing only speech input or speech plus DTMF” on page 32

v “Deciding whether to use human agents in the deployed system” on page 39

v “Choosing help mode or self-revealing help” on page 40

There is no single correct answer; the appropriate decisions depend on the application, the users, and the users’ environment(s). The remainder of this section presents the trade-offs associated with each of these decisions.

Selecting an appropriate user interface

The first decision that you must make is to select the appropriate user interface for your application. Not all applications are well-suited to a SUI; some work best with a visual interface, and others benefit from a multi-modal interface (that is, both a speech and a visual interface).

The characteristics in Table 2 can help you decide whether your application is suited to a SUI.

Table 2. When to use a speech interface

| Consider using speech if: | Applications may not be suited to speech if: |
|---|---|
| Users are motivated to use the speech interface because it saves them time or money, is available 24 hours a day, provides access to features not available through other means, or allows them to remain anonymous and avoid discussing sensitive subjects with a human. | Users are not motivated to use the speech interface. |
| Users will not have access to a computer keyboard when they want to use the application. | The nature of the application requires a lot of graphics or other visuals (for example, maps or commerce applications for apparel). |
| Users want to use the application in a “hands-free” or “eyes-free” environment. | Users will be operating the application in an extremely noisy environment (due to simultaneous conversations, background noise, etc.). |
| Users are visually impaired or have limited use of their hands. | Users are hearing impaired, have difficulty speaking, or are in an environment that prohibits speech (for example, a courtroom). |

Deciding on the type and level of information

To keep from overloading the user’s short-term memory, information presented in a SUI must generally be more concise than information presented visually. It is common to present only the most essential information initially, then give users the opportunity to access detailed information.

For example, consider a banking application in which a user can request a list of recently cleared checks. In a visual interface, the application might return a table showing the check number, date cleared, payee name, and amount. A similar application with a speech interface might return only the check number and date cleared, and then permit the user to select a specific check number to hear the payee name and amount, if desired.

Choosing the barge-in style

Enabling barge-in allows the computer and the user to speak at the same time, permitting the user’s speech to interrupt system prompts as the machine plays them.

On the surface, it might seem that enabling barge-in is always preferable to disabling barge-in. It is easy to imagine experienced users wanting to interrupt prompts (especially lengthy ones) when they know what to say. There are situations, however, in which a system for which barge-in has been disabled will be as easy or easier to use.

Table 3 compares implementations with barge-in enabled or disabled:

Table 3. Barge-in Enabled versus Disabled

| Style | Advantages | Disadvantages |
|---|---|---|
| Barge-in enabled | Experienced users can interrupt system prompts to speed up the interaction. Users can say “Quiet” to stop the prompt. Note: For commands in languages other than US English, see the appropriate appendixes. | Inexperienced users may inadvertently interrupt the prompt before hearing enough to form an acceptable response. You can minimize this problem by keeping system prompts short, to lessen the user’s need to barge in; if your prompts are long, you should try to present key information early in the prompt. When using hotword barge-in (see Table 4 on page 22), Lombard speech and the stuttering effect can be problematic. To minimize this problem, you should keep required user inputs very short. See “Controlling Lombard speech and the stuttering effect” on page 23 for more information. |
| Barge-in disabled | Guarantees that the entire prompt text plays. This may be especially useful for applications with lots of legal notices, advertisements, or other information that you want to make sure always gets presented to the user. Creates a “my turn-your turn” rhythm for the dialog. | Experienced users cannot interrupt prompts; however, if the prompts are short enough, users should not need to interrupt. Users may experience turn-taking errors. Keeping prompts short helps minimize this. |

If enabling barge-in, you should play an initial prompt of 3 seconds or longer with barge-in disabled to give the system time to calibrate echo cancellation.

Comparing barge-in detection methods

To use barge-in effectively, it is important to understand how the system determines when to stop an interrupted prompt. For WebSphere Voice Server, the default barge-in detection method is speech.

Table 4 compares the available barge-in detection methods.

Table 4. Barge-in detection methods

| Barge-in detection method | Description | Advantages | Disadvantages |
|---|---|---|---|
| hotword | Audio output stops only after the system determines that the user has spoken a complete word or phrase that is valid in a currently active grammar. | Resistant to accidental interruptions, such as those caused by coughing, muttering, or using the system in an environment with loud ambient conversation. | Increased incidence of Lombard speech and the “stuttering effect” (see next section); however, you can control this somewhat by making required user responses as short as possible. The time required to recognize spoken input can cause slower system response times. |
| speech | Audio output stops as soon as the speech recognition engine detects sound. | This behavior is more typical of conversation between two humans. Minimizes Lombard speech, the stuttering effect, and the distortion to the first syllable of user speech that often occurs when users barge in. | Susceptible to accidental interruption due to background noise, non-speech vocalizations, and speech not intended for the system. |

Controlling Lombard speech and the stuttering effect

When speaking in noisy environments, people tend to exaggerate their speech or raise their voices so others can hear them over the noise. This distorted speech pattern is known as Lombard speech (named for the researcher E. Lombard, who in 1911 was the first to report such an effect), and it can occur even when the only noise is the voice of another participant in the conversation (for example, when one person tries to interrupt another, or, in the case of a voice application, when the user tries to barge in while the computer is speaking).

The “stuttering effect” may occur when a prompt keeps playing for more than about 300 ms after the user begins speaking. Unless users have undergone training with the system, they may interpret the continued playing of the prompt as evidence that the system did not hear them. In response, some users may stop what they were saying and begin speaking again, causing a stuttering effect. This stuttering makes it virtually impossible for the system to match the utterance to anything in an active grammar, so the system generally treats the input as an “out-of-grammar” utterance, even if what the user intended to say was actually in one of the active grammars.

To control Lombard speech and the stuttering effect when using hotword barge-in detection, the prompt should stop within about 300 ms after the user begins talking. The average time required to produce a syllable of speech is about 150-200 ms; this means that the system design should promote short user responses (ideally no more than two or three syllables) when using hotword barge-in detection. You should also try to keep prompts as short as possible to minimize the likelihood that users will want to interrupt the prompt. If this is not possible, you should consider switching to speech barge-in detection, or in extreme cases consider disabling barge-in.

Weighing user and environmental characteristics

When deciding whether to use barge-in and which type of barge-in detection is most appropriate, you should consider how frequently users will use your application (expert users are more likely to barge in), and in what environment (quality of the telephone connection, general noise level, etc.).

In general, you should enable barge-in for deployed applications. However, if echo cancellation on your telephony equipment is not good enough, it might be necessary to disable barge-in.
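When weighing hotword detection, it can help to redo the arithmetic behind the timing guideline in “Controlling Lombard speech and the stuttering effect”: a prompt that should stop within about 300 ms, and syllables that take about 150-200 ms each. The function below is a rough check only. The name `syllable_budget` and its defaults are illustrative, and the result is an estimate rather than a property of any recognizer.

```python
from typing import Tuple


def syllable_budget(stop_window_ms: float = 300.0,
                    syllable_ms: Tuple[float, float] = (150.0, 200.0)) -> Tuple[float, float]:
    """Estimate how many syllables fit before the prompt should have stopped.

    Returns (slow_speaker_estimate, fast_speaker_estimate).
    """
    fastest, slowest = syllable_ms
    return (stop_window_ms / slowest, stop_window_ms / fastest)
```

With the default figures, roughly one and a half to two syllables fit inside the window, which is why the guidance favors very short required responses with hotword barge-in.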

Minimizing the need to barge in

Even when the system permits barge-in, many users do not like to interrupt the system. To minimize the user’s need to barge in, you might consider placing short pauses (around 0.75 second) at logical points during and between prompts, such as at the end of a sentence or after each menu item. These brief pauses will give users the opportunity to begin talking without actively interrupting the system. In systems for which barge-in has been disabled, you can simulate barge-in by enabling recognition during these pauses. Be sure not to produce a turn-taking tone at the end of these “recognition windows” because speech at these times is optional, not required.

Using audio formatting

If you need to temporarily disable barge-in (using <prompt bargein="false">), such as while the system reads legal notices or advertisements, you may want to use a unique background sound, tone, or prompt as an indicator. For guidance, see “Applying audio formatting” on page 28.

If you disable barge-in, consider playing a tone to signal the user when it is time to speak. The introductory message should explicitly tell users to speak only after this “turn-taking” tone.

Note: The use of tones to signal user input is somewhat controversial, with some designers avoiding tones based on a belief that tones are unnatural in speech and annoying to users. Others contend that effective computer speech interfaces need not perfectly mimic human conversation, and that a well-designed tone can promote clear and efficient turn-taking without annoyance. For guidance in creating an effective turn-taking tone, see “Designing audio tones” on page 27.

Wording prompts

For systems without barge-in, make prompts as concise as possible. If a prompt must be relatively long, place the key information toward the end of the prompt to discourage users from speaking before their turn.

You can do the same for systems with barge-in, assuming your prompts are relatively short; if the prompts are long, you may decide to move the key information to the beginning of the prompt so users know what input to provide if they interrupt the prompt.

Selecting recorded prompts or synthesized speech

Synthesized speech (text-to-speech or TTS) is useful as a placeholder during application development, or when the data to be spoken is “unbounded” (not known in advance), which makes it impossible to prerecord.

When deploying your applications, however, you should plan to use professionally recorded prompts whenever possible. Users expect commercial systems to use high-quality recorded speech, and only recorded speech can guarantee highly natural pronunciation and prosody.

Creating recorded prompts

The following guidelines will help you generate high-quality recorded prompts:

v Use professional voice talent, quality recording equipment, and a suitable recording environment.

v Maintain consistency in microphone placement and recording area.

v If prompts contain long numbers, or if many of the application’s users are not native speakers of the language in which the application speaks, consider slowing down the speech or exaggerating natural pauses.

v If you are planning on disabling barge-in, aggressively trim recorded prompts to remove the beginning and ending silences. If you are planning on enabling barge-in or if another speech segment will follow immediately, trim the beginning aggressively, but leave silence at the end that is appropriate for the ending punctuation (500 ms for final punctuation, 250 ms for non-final punctuation). Otherwise, leave as little silence as possible.

v As a general rule, use only one voice. When using multiple voices, have a clear design goal (for example, a female voice for introduction and prompts, and a male voice for menu choices). For a consistent sound, you should record your own messages to handle the <error>, <cancel> and <help> events.

v If a voice segment will appear in phrases with different intonations, be sure to record that segment for each intonation. For example, suppose the system will seek confirmation of a telephone number using the phrase “Was that four three three <pause> five five six three?” The “three” that appears before the pause should have a slightly falling pitch, but the “three” that appears before the question mark should have a rising pitch. The “three” that appears between two other numbers should have a steady pitch. This suggests that it will be necessary to obtain one recording for each of the three intonations, to obtain the highest-quality speech output. Note that the development effort required for this might not be appropriate for every application.

v Be aware of the appropriate stress to use in each segment that you plan to record. If the appropriate stress point is not the last open-class item (which is either a noun or a verb) in the sentence, make a note of where the speaker should place the stress.

v If you are recording segments that the application will play sequentially (in other words, will splice), be sure to choose the splice points carefully. If possible, choose splice points at natural pause points. Avoid splice points that separate articles such as “a,” “an,” and “the” from the following word (or any other combination that speakers normally run together).

v If you intend to translate the application into other languages, plan ahead when defining the audio segments to record. You might need to seek assistance from a native speaker of the target language. In general, try to avoid defining audio segments to record that are isolated nouns because in many languages the correct form for determiners (in English, “a”, “an” and “the”) depends on the following noun. You should be aware that there might be other contextual dependencies that are important in the target language. Some of the known issues are gender sensitivity, ordering of recorded segments, and plurality. Good planning early in the definition of audio segments can prevent unnecessary rework during translation.

When using recorded prompts, you can improve system performance by prefetching and caching the audio files. See “Fetching and caching resources for improved performance” on page 124.
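The intonation guideline above (falling pitch before a pause, rising pitch before the question mark, steady pitch elsewhere) amounts to a rule for choosing which recording of each digit to play. The sketch below assumes one recording per digit and contour, named like `3_falling.wav`. Those file names, the `digit_segments` function, and the use of the steady contour for the first digit of a group are all assumptions made for illustration.

```python
from typing import List


def digit_segments(groups: List[str]) -> List[str]:
    """Pick a recording for each digit of a number read back as a question.

    `groups` holds the digit groups separated by pauses, for example
    ["433", "5563"] for "four three three <pause> five five six three?".
    """
    segments: List[str] = []
    for group_index, group in enumerate(groups):
        is_last_group = group_index == len(groups) - 1
        for digit_index, digit in enumerate(group):
            is_last_digit = digit_index == len(group) - 1
            if not is_last_digit:
                contour = "steady"    # digit is followed by another digit
            elif is_last_group:
                contour = "rising"    # digit ends the question
            else:
                contour = "falling"   # digit comes just before a pause
            segments.append(f"{digit}_{contour}.wav")
        if not is_last_group:
            segments.append("<pause>")
    return segments
```

For the telephone number in the example, the two trailing “three” digits map to different recordings, which is the reason for recording each digit in all three intonations.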

For DTMF prompts (for example, “For checking, press 1. For savings, press 2. To transfer to an agent, press 3.”) use the following timing guidelines:

v Use a 500 ms pause between items.

v Use a 250 ms pause before “press”.

v Use no detectable pause after “for”, “to” or “press”.

For speech prompts (for example, “Select checking, savings or transfer to agent.” or “To work with your checking account say checking.”) use the following timing guidelines:

v Use a 750 ms pause between items when there are more than 3 items. When there are only two or three items, do not introduce any exaggerated pauses. Speak the phrase as a normal sentence.

v Use a 250 ms pause before “say” or “select”.

v Use no detectable pause after “say” or “select”.

For a mixture of both DTMF and speech prompts use the following timing guidelines:

v Use a 300 to 500 ms pause after an informational message that precedes the presentation of a menu.

v For longer messages, use 250 ms for a comma type pause and 500 ms for a period type pause.
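The timing guidelines above can be kept in one place and applied mechanically when assembling a menu. The following sketch turns a list of DTMF choices into a sequence of speak and pause steps. The constant names, the `("speak", ...)` and `("pause", ...)` tuples, and both functions are hypothetical, and how the steps are rendered (recorded silence, markup, or something else) is left open.

```python
from typing import List, Tuple, Union

# Pause lengths taken from the timing guidelines above, in milliseconds.
DTMF_ITEM_PAUSE_MS = 500      # between menu items
BEFORE_PRESS_PAUSE_MS = 250   # before "press"
SPEECH_ITEM_PAUSE_MS = 750    # between spoken items, only for more than 3 items

Step = Tuple[str, Union[str, int]]


def dtmf_menu(items: List[Tuple[str, str]]) -> List[Step]:
    """Build speak/pause steps for items such as ("For checking", "1")."""
    steps: List[Step] = []
    for index, (lead_in, key) in enumerate(items):
        if index > 0:
            steps.append(("pause", DTMF_ITEM_PAUSE_MS))
        steps.append(("speak", lead_in))
        steps.append(("pause", BEFORE_PRESS_PAUSE_MS))
        # No pause is inserted after "press", per the guideline.
        steps.append(("speak", f"press {key}"))
    return steps


def speech_item_pause_ms(item_count: int) -> int:
    """Return the added pause between spoken menu items (0 means none)."""
    return SPEECH_ITEM_PAUSE_MS if item_count > 3 else 0
```

Keeping the values as named constants makes it easier to adjust the pauses after usability testing without touching every prompt.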

Using TTS prompts

Although recorded prompts are best for many applications, it is important to keep in mind that it is easier to maintain and modify an application that uses TTS prompts. For this reason, you should typically use TTS prompts during development.

When you are ready to deploy your application, use recorded prompts when possible. If part of a sentence requires production via TTS, it is generally better to generate that entire sentence with TTS to avoid the jarring juxtaposition of recorded and artificial speech. It is also possible to design sentences to position the dynamic content at the end, and to play the dynamic content following a short pause to separate the dynamic TTS content from the static recorded content. For now, designers should be cautious in using this approach because it isn’t clear whether people would generally prefer hearing all TTS or this type of combination of recorded and TTS output.

Handling unbounded data: If the information that the application needs to speak is unbounded, you will need to use TTS. Examples of unbounded information include:

v Telephone directories

v E-mail messages

v Frequently updated lists of employee or customer names, movie titles, or other proper nouns

v Up-to-the-minute news stories

Improving TTS output: You can improve the quality of synthesized speech output by using SSML to provide additional information in the input text. For example:

v You can improve the TTS engine’s processing of numerical constructs by using the <say-as> element to specify the desired pronunciation.

v You can improve the TTS engine’s processing of uncommon names by using the <phoneme> tag.

v Forsyn
