VECSYS LIMSI
ARCHITECTURE
Samir Bennacef
Vecsys
Centralized Architecture
[Figure: centralized architecture. Processing modules: Telephone Interface, Speech Recognizer, Semantic Analyzer, Dialog Manager, Information Retrieval, Sentence Generator, Speech Synthesis. Knowledge sources: acoustic models, language models, caseframe grammar, task model, database, generation grammar, unit dictionary. Data exchanged between modules: text, semantic frames, queries, results, sentence.]
Telephone Interface
• phone program
– Input: commands and speech
– Output: events and speech
• Recording and playback, DTMF detection and generation, pickup, hangup, and call transfer
• Hardware echo cancellation
• Barge-in based on adaptive speech detection
• NMS QX2000 hardware
Speech Recognizer
• Cepstral features computation: sig2mfcc
– Input: speech recorded by the phone program
– Output: a 13-component cepstral vector every 10 ms, on an 8 kHz bandwidth
• Speech recognizer: nsearch
– Inputs: commands and cepstral coefficients
– Output: recognized text
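As a rough illustration of the frame rate quoted above, the sketch below counts how many cepstral vectors a recording would yield. It assumes an 8 kHz sampling rate and a 240-sample (30 ms) analysis window stepped by 80 samples (10 ms); those values are a reading of the `-r8000 -w240 -s80` options in the launch script, not something the slides state, and `frame_count` is a hypothetical helper, not part of sig2mfcc.

```python
def frame_count(num_samples: int, sample_rate: int = 8000,
                window_ms: float = 30.0, step_ms: float = 10.0) -> int:
    """Number of full analysis windows that fit in the signal."""
    window = int(sample_rate * window_ms / 1000)  # 240 samples (assumed)
    step = int(sample_rate * step_ms / 1000)      # 80 samples = 10 ms
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


# One second of 8 kHz speech.
frames = frame_count(8000)
values = frames * 13  # 13 cepstral components per frame
```

Under these assumptions the recognizer receives roughly one hundred 13-component vectors per second of speech.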
Semantic Analyzer
• Lexical normalization and labelling: sentprocess
– Input: recognized sentence
– Output: labelled sentence
• Caseframe analysis: cases
– Input: labelled sentence
– Output: semantic frame
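The two analysis steps can be sketched as a lookup followed by slot filling. The tables below are hypothetical stand-ins for rules.txt and caseframe.txt, populated only with the labels and slots that appear in the runtime trace later in this deck; the real rule formats are not shown in the slides.

```python
# Hypothetical lexical rules: surface word -> label.
LEXICON = {
    "paris": "$place",
    "lille": "$place",
    "pour": "*to",
    "demain": "demain",
    "matin": "*matin",
}

# Hypothetical caseframe: label -> slot name.
SLOTS = {
    "$place": "place",
    "*matin": "departure-period",
    "demain": "departure-date",
}


def label_sentence(sentence):
    """Return (label, word) pairs for the words the lexicon knows."""
    pairs = []
    for word in sentence.lower().split():
        if word in LEXICON:
            pairs.append((LEXICON[word], word))
    return pairs


def build_frame(labelled):
    """Turn labelled words into (slot, value) pairs; unmapped labels are skipped."""
    frame = []
    for label, word in labelled:
        slot = SLOTS.get(label)
        if slot is None:
            continue
        # Place names keep their surface form; other slots keep the label.
        value = word.capitalize() if label == "$place" else label
        frame.append((slot, value))
    return frame


labelled = label_sentence("Paris Lille pour demain matin")
frame = build_frame(labelled)
```

The frame is kept as a list of pairs rather than a dictionary because the same slot (here `place`) can occur more than once before the dialog manager decides which is the origin and which the destination.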
Dialog Manager
• dialog
– Input: semantic frame resulting from cases
– Output: semantic frame to be converted into natural language
• Contextual understanding
• Database query generation
• Semantic frame generation
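A minimal sketch of how contextual understanding and query generation might fit together, assuming the dialog context is a simple slot dictionary in which slots from earlier turns persist until overridden. The function names are hypothetical, and the query syntax only loosely follows the request shown in the runtime trace; approximate constraints such as `arrh ~= 1000` are left out.

```python
def merge_context(context, new_slots):
    """Slots from the current turn override those remembered from earlier turns."""
    merged = dict(context)
    merged.update(new_slots)
    return merged


def build_query(columns, slots):
    """Build a SELECT ... WHERE string with one equality clause per slot."""
    clauses = [f"{name}={value}" for name, value in slots.items()]
    return "SELECT " + ", ".join(columns) + " WHERE " + " AND ".join(clauses)


# Turn 1 gives the route, turn 2 adds the day.
context = merge_context({}, {"from": "Paris", "to": "Lille"})
context = merge_context(context, {"day": "17/5/101"})
query = build_query(["from", "deph", "to", "arrh"], context)
```

Keeping the context separate from the per-turn frame lets a short follow-up such as a date alone still produce a complete query.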
Natural Language Generator
• genere
– Input: semantic frame resulting from dialog
– Output: natural language sentence
Information Retrieval Interface
• dbserver
– Input: SQL query
– Output: database result
• Query parsing and translation
• Retrieves information from the target database
• Provides the result table
Speech Synthesis System
• syn
– Input: sentence resulting from genere
– Output: speech signal, which is played by the telephone interface
• Uses a unit dictionary
• Selects the best sequence of units using a dynamic programming algorithm
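The slides do not say which costs the synthesizer optimizes, so the following is only a generic sketch of dynamic-programming unit selection: one candidate unit is chosen per position so that the sum of per-unit costs and unit-to-unit join costs is minimal. The unit names and both cost functions are hypothetical, and every position is assumed to have at least one candidate.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position minimizing target + join costs (Viterbi search)."""
    # best[i][u] = (cost of the cheapest path ending in unit u at position i, predecessor)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, len(candidates)):
        column = {}
        for u in candidates[i]:
            prev, cost = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[u] = (cost + target_cost(i, u), prev)
        best.append(column)
    # Backtrack from the cheapest final unit.
    unit = min(best[-1], key=lambda u: best[-1][u][0])
    total = best[-1][unit][0]
    path = [unit]
    for i in range(len(candidates) - 1, 0, -1):
        unit = best[i][unit][1]
        path.append(unit)
    path.reverse()
    return path, total


# Toy data: units recorded in the same session (same digit) join for free.
TARGET = {"a1": 1, "a2": 0, "b1": 0, "b2": 2}
path, total = select_units(
    [["a1", "a2"], ["b1", "b2"]],
    target_cost=lambda i, u: TARGET[u],
    join_cost=lambda p, u: 0 if p[-1] == u[-1] else 3,
)
```

In this toy case the search prefers `a1` followed by `b1`, even though `a2` is the cheapest first unit on its own, because the join penalty outweighs the saving.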
C-shell script
• # --- Phone interface --- #
rsh $remote $bin/phone.exe -h$dialhost $dialport -t70 -x8192 -n2 \
-l2 -g -f$cfg/cta.cfg -a$data &
• # --- Speech recognizer loading --- #
• # ---- SigToCep ---- #
set CEP = \
$bin/sig2mfcc -w240 -s80 -l20 -n12 -r8000 -b0:3500 -c -en0 -0 \
--$fifo/tosentrec.fifo$i $fifo/tosig2mfcc.fifo$i -:
• # ---- Speech Recognizer ----#
set RECO = \
$bin/nsearch -@$phones -d$fifo/tosentrec.fifo$i -t \
-p${plist}:$stbl -s0:160:0:f -l$voc -z3 -w4:25 -n1 -q63,12:8:3 \
-zb$tg -zw30 -zr -xg$gsl -zy$clst -sw50 -sh25000 \
-cmr${cepmean}:0.996 -en4.5 -- $hmm -xf
• $bin/recocheck -r $fifo/torecord.fifo$i -c$CEP -d$RECO \
-t$fifo/fromdial.fifo$i -v < \
• #--- Semantic Analyzer and Dialogue loading ---#
$bin/sentprocess -k -t -d -c -v2 $dial/rules.txt < \
$fifo/tocases.fifo$i | \
$bin/cases -k -o -m -v $dial/caseframe.txt | \
$bin/dialogue -i -v1 $dial/task.txt $dial/dial.arg \
-tr$fifo/pushtotalk.fifo$i -fp$fifo/fromplay.fifo$i \
-fn$fifo/todial.fifo$i -rf$tmp/reco.tmp$i \
-e$fifo/fromdial.fifo$i -fg$fifo/fromgenere.fifo$i \
-tt$fifo/todb.fifo$i -ft$fifo/fromdb.fifo$i | \
$bin/genere $dial/genere.txt -f$fifo/fromgenere.fifo$i -v
-l"$logcmd" -s$fifo/torecord.fifo$i -db$fifo/fromplay.fifo$i \
-dt$fifo/todial.fifo$i -df$fifo/fromdial.fifo$i -dp$dialpid \
-kf$fifo/fromdbconn.fifo$i -kt$fifo/todb.fifo$i \
-kw$synt/sig/waitdb.sig -kl$synt/sig/wait.sig -v –r \
< $fifo/fromphone.fifo$i > $fifo/tophone.fifo$i &
• # --- Database Loading --- #
$bin/dbserver -t$fifo/todbtarg.fifo$i -f$fifo/fromdbtarg.fifo$i \
-c$db/table1.txt -s$db/table2.txt -p$db/table3.txt \
-d$fifo/fromdbconn.fifo${i}:120 -m10 -a -v2 \
< $fifo/todb.fifo$i > $fifo/fromdb.fifo$i &
• # --- Synthesis loading --- #
$bin/syn -s${sig}:2 -l$wd -w4:2:0 -o$fifo/toplay.fifo$i -c \
$synt/wdlist.lst &
How the system works
• server.csh: telephone interface loading
server.csh: speech recognizer loading
server.csh: dialog loading
server.csh: dispatcher loading
server.csh: dbserver loading
server.csh: synthesis loading
• telephone: pickup
telephone: line number=[0]
telephone: play
telephone: get dtmf [*]
• dialogue: frame: { concept: (acte formalite-ouverture). }
• genere: Quel voyage souhaitez-vous effectuer ?
• telephone: play
telephone: end of play
telephone: recording
Lille -> $place
matin -> *matin
$place(Paris) $place(Lille) *to(pour) demain(demain) *matin(matin)
• cases: <defaut>
{
place: Paris.
place: Lille.
departure-period: *matin.
departure-date: demain.
}
• dialogue: request=[SELECT from, deph, to, arrh, chg, day, stopa, stopah, stopd, stopdh, stopdur, type WHERE from=Paris AND to=Lille AND day=17/5/101 AND arrh ~= 1000]
• dbserver: target query=[00043 00000001 ? 12 FRPAR FRLIL 17 MAY 1000]
dbserver: result=[1 ( from deph to arrh chg day stopa
nb-trains: (value 1).
concept2: (acte confirmation) (value hour).
from-place: (value Paris-Gare-du-Nord).
to-place: (value Lille-Flandres).
departure-wday: (value jeudi).
departure-day: (value 17/5/101).
departure-period: (value *matin).
stop: (value 0).
sched: (dep 0858) (arr 0959). }
• genere: Le matin , jeudi dix-sept mai vous avez un train de Paris-Nord à Lille-Flandres à huit heures cinquante-huit arrivant à neuf heures cinquante-neuf. Cet horaire vous convient-il ?
• nsearch: <s> oui </s>
• cases: <defaut> { mode: *affirmatif.}
• dialogue: { concept: (acte relance) (value retour). }
• genere: Souhaitez-vous le retour ?
• nsearch: recognized string: <s> non merci </s>
• genere: Vous avez donc un aller Paris-Nord Lille-Flandres le jeudi dix-sept mai départ huit heures cinquante-huit, arrivée neuf heures cinquante-neuf. Souhaitez-vous un autre trajet ?
Distributed Architecture
…
[Figure: distributed architecture. Client applications 1 to m connect over a TCP/UDP network through an application programming interface (data exchange protocols, service name-address resolution). Host 1 (master) through host n (slave) each run a Vnetd daemon and services: recognizer, dialogue, speech synthesis, other services. A net audio server is also shown.]
Services
• 1. Audio
• 2. Speech recognition
• 3. Dialog (understanding, dialog and generation)
• 4. Information retrieval
• 5. Speech synthesis
Galaxy Communicator
• Similarities between GC and Oasis
– A distributed client/server architecture
– A central manager: the hub in GC and the application manager in Oasis
– A set of services listening for client connections and requests
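The shared pattern, a central manager that routes requests to named services, can be sketched in a few lines. This is an in-process toy with no networking; the `Hub` class and its methods are hypothetical and do not reproduce the Galaxy Communicator or Oasis APIs.

```python
class Hub:
    """Toy central manager: services register by name, clients send requests."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Make a service reachable under the given name."""
        self._services[name] = handler

    def dispatch(self, name, frame):
        """Forward a frame to the named service and return its reply."""
        if name not in self._services:
            raise KeyError(f"no service registered as {name!r}")
        return self._services[name](frame)


hub = Hub()
hub.register("generation", lambda frame: f"concept={frame['concept']}")
reply = hub.dispatch("generation", {"concept": "acte relance"})
```

Because clients only know service names, a service can move to another host without any client change, which is the role the name-address resolution layer plays in the distributed architecture.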
Compliant
• Include the GC server functions in all services:
– perform initialization
– include a dispatch function
– invoke the hub using the GalIO_Comm family of functions
Tests and Evaluation
• The speech recognizer only
• The dialog connected to the database
• The dialog with the recognizer
• The whole system