Voice Search 2009 Conference, March 3rd, 2009
Loquendo Speech Technologies as key differentiating factor
Paolo Baggia
Director of International Standards
Agenda
Loquendo Today
Loquendo Products
Loquendo Speech Technologies
Automatic Speech Recognition
Text To Speech
Loquendo MRCP Server
VoxNauta Platform
Company Profile
Privately held company (fully owned by Telecom Italia), founded in 2001 as a spin-off from Telecom Italia Labs, capitalizing on 30 years of experience and expertise in voice processing.
Global company, leader in Europe and South America for award-winning, high-quality voice technologies (synthesis, recognition, authentication and identification), available in 26 languages and 62 voices.
Multilingual, proprietary technologies protected by over 100 patents worldwide.
Financially robust: break-even reached in 2004, revenues and earnings growing year on year.
Growth-plan investment approved for the evolution of products and services.
Headquarters in Torino, offices in New York, local representative sales offices in Rome, Madrid, Paris, London, Munich.
Flexible: about 100 employees, plus a vibrant ecosystem of local freelancers.
International Awards
“Best Innovation in Automotive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2007
“Best Innovation in Expressive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2006
“Best Innovation in Multi-Lingual Speech Synthesis” Prize, AVIOS-SpeechTEK West 2005
“2008 Frost & Sullivan European Telematics and Infotainment Emerging Company of the Year” Award
Winner of “Market leader-Best Speech Engine” Speech Industry Award 2007 and 2008
Loquendo MRCP Server: Winner of 2008 IP Contact Center Technology Pioneer Award
Loquendo main points
A complete set of speech technologies and voice platforms (TTS, ASR, SV)
    Focus on quality and innovative features, simplifying application development
    Multilingual worldwide coverage
Extensive support for international standards
    All speech-related W3C and IETF standards
A full range of integration options
    APIs, standard interfaces and protocols, client-server configurations
Same technologies on a wide spectrum of platforms
    Same core engine for server, desktop, embedded and mobile devices guarantees platform-independent software engineering
Partnership as a key factor
    Strategic alliance portfolio for each vertical market
    Set of powerful tools made available to our partners for tuning and improving speech applications, without the need for costly professional services
Same Core Engine
Same core engines for all versions: Server, Multimedia and Embedded.
Same languages (voices) are available for all versions.
Same APIs and support for standards (W3C, SAPI, …).
Multiplatform: Symbian™ Series 60 (7, 8 and 9), Pocket PC 2003™, CE.NET™ 4.2 and 5, Windows Mobile 2005/6, Windows Automotive™, SmartPhone™ 2003, WindRiver VxWorks™, QNX™, Linux, Windows™ XP Embedded, Windows™ XP TabletPC Edition, Windows Vista, …
The only embedded engines with server quality and features.
Powered by Loquendo: solutions in vertical markets
Telco
Automotive and Navigation
Industry
Banking and Insurance
Local and Central Government
Mobile devices
Transport
Healthcare and Differently able
Media Center and Set-top-box
Loquendo: Value chain & Product positioning
Loquendo Focus
Solutions: turnkey solutions (Auto Attendant, DA, Banking, CRM, Self-service, Voice Controlled Media Center)
Applications: basic resources for application development (specialized lexicons, grammars, reusable Dialogue Objects)
Servers:
    VoxNauta: VoiceXML & CCXML platform for vocal applications on any network (fixed, mobile, VoIP)
    MRCP Server: turnkey MRCP (v1 & v2) server for interfacing with IVRs and third-party voice platforms
Speech Engines (LTTS, LASR, LSV): SW only; Text-To-Speech, Automatic Speech Recognition, Speaker Verification and Identification, Language Identification. For servers, desktops,
O.S.: Windows, Linux, WinMobile, Symbian, …
Hardware
Language Coverage
Language Female Male
English US PP PPP
English UK PP P
Spanish (Castilian) PP PP
Catalan (bilingual) P P
Valencian (bilingual) P
Galician (bilingual) P
American Spanish / Colombian P P
Mexican P
Chilean P
Argentinean P
Italian PPPP PPPP
French PP PP
Canadian French P P
Portuguese P P
Brazilian Portuguese PP P
German PP P
Dutch P P
Greek PP
Danish P P
Finnish P P
Swedish P P
Russian P P
Polish P P
Turkish P P
Chinese PP
Esperanto (robotic) P
Loquendo Speech Technologies
Text To Speech
Loquendo TTS – Text To Speech
Multi-language: 26 languages, 62 voices – and more coming!
Truly natural and expressive-sounding voices for highly…
Emotional pronunciation: commonly used phrases such as “How are you?” or “You’ve got to be kidding!” and paralinguistic events such as yawning, coughing and laughing, to confirm, exclaim, thank, express doubt, etc.
Reading styles and specialized support (e.g. addresses, SMS, etc.)
Audio Mixer: complete control over all audio sources (including sampling rates and coding); audio files can be mixed, looped, faded in/out, and synchronized with synthetic speech
Mixed Language Capability:
    Language Guesser: automatically identifies the language of any text so that by means of…
    Phonetic Mapping: any of Loquendo's voices can correctly pronounce any foreign word (e.g. English words in an Italian text)
Voice Creator tool for new voice generation
TTS Director for designing effective prompts, and Lexicon Manager tool for creating personalized user lexicons
Loquendo TTS Director
Loquendo TTS Director is a complete development environment for creating your own voice prompts, and for designing your own personalised voices.
Target: clients wanting to edit their prompts at a more complex level and adjust parameters with far more precision, as well as to add pauses, phonetic transcriptions, and tailor-made lexicons for atypical pronunciation.
Loquendo TTS Tools: Lexicon Manager
Loquendo Lexicon Manager helps to define the pronunciation of foreign language words, toponyms, proper names, acronyms, abbreviations, etc.
The virtual keyboard helps to write the phonetic transcriptions
Sections for different languages in the same lexicon
Future support for the PLS (Pronunciation Lexicon Specification) standard format
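To give an idea of what a PLS document looks like, here is a minimal Python sketch that builds a small lexicon with the standard library. The namespace and the `lexicon`, `lexeme`, `grapheme` and `alias` elements come from the W3C PLS 1.0 specification; the helper name `build_lexicon`, its argument shape and the sample entries are assumptions made for this illustration and say nothing about how Lexicon Manager stores or exports its data.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(lang, entries):
    """Return a PLS 1.0 document (as a string).

    `entries` maps a written form to its spoken expansion. Both this
    helper and its argument shape are hypothetical, for illustration only.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for written, spoken in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = written
        # <alias> gives the pronunciation as other orthography,
        # so no phonetic transcription is needed for this example.
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = spoken
    return ET.tostring(lexicon, encoding="unicode")


pls_doc = build_lexicon(
    "en-US",
    {"W3C": "World Wide Web Consortium", "TTS": "text to speech"},
)
print(pls_doc)
```

Using `alias` keeps the example free of phonetic symbols; a real lexicon for atypical pronunciations would more likely use `phoneme` entries in the declared alphabet.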
Loquendo ASR
A reliable speaker-independent technology
Broad Vocabulary & Flexible Recognition: recognizes up to 1,000,000 words; supports isolated and continuous speech even in the noisiest environments, such as wireless
Highly Accurate Speech Recognition: thanks to the integration of neural networks and hidden Markov models, and detailed acoustic-phonetic units trained on large speech corpora
Multi-language speech recognition (20 languages)
Barge-in capability to guarantee high reactivity and robustness to noise and background speech
Garbage rules definition to match arbitrary spoken sequences not modeled by the grammar
Powers Loquendo Speaker Verification
Tool package that automatically analyzes data collected in the field to improve service performance, including:
    Acoustic Model Adaptation (to the environment, speaker, channel adaptation, etc.)
    Phonetic Learning to identify frequent formulations that have not been covered and additional pronunciation variants
Loquendo ASR
→ Extended standards support future-proofs customer investments:
– MRCP compliance (for client-server architectures);
– complete support for grammar standards, such as W3C SRGS & SISR, enables optimization for VoiceXML applications;
– support for AURORA DSR (for distributed speech recognition).
→ ASR efficiency reduces hardware costs: lower PC-power requirements enable more recognition channels to run simultaneously. Loquendo ASR is so efficient that the core ASR engine is used on embedded platforms, e.g. smartphones and navigation devices.
→ Loquendo Speaker Verification, an extension to the ASR module, combines both speaker and knowledge verification (i.e. matches 'what was said' with 'who said it').
→ A highly accurate phonetic transcriber enhances recognition results. Loquendo ASR is based on the same phonetic transcriber as Loquendo TTS, tested both automatically and by painstaking human listening.
→ Either HMM (Hidden Markov Models) or NN (Neural Networks) for core algorithms. Loquendo ASR combines both approaches, giving high performance and increased efficiency with large vocabularies.
Learn to distinguish a best-in-breed ASR from those not up to the job
→ Key Factors For Success
• ASR tuning, e.g. to the environment, to the speaker
• Tools enabling ASR to learn from the field: avoids the need for costly professional services, e.g. Acoustic Model Adaptation Tool, Phonetic Learning Tool
• De-noising module: improves performance in noisy environments by cleaning the signal while computing spectral parameters
→ Loquendo ASR specialized tasks, including:
• 'Word Spotting': recognizes keywords in audio streams;
• 'Garbage Rules' definition: enables free speech, simplifying grammars and application development; matches expressions like “Um, Er”, “Well”, “Let me think”, etc., giving greater flexibility and a more natural interaction experience.
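The garbage idea can be sketched with the W3C SRGS 1.0 XML format, which defines a special rule named `GARBAGE` that matches any speech up to the next expected token. The Python snippet below generates such a grammar with the standard library. The slides do not show how Loquendo ASR exposes its own garbage rules, so this is only a standards-level illustration; the helper name `city_grammar`, the rule id `city` and the city list are assumptions.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def city_grammar(cities, lang="en-US"):
    """Build an SRGS grammar that accepts a city name surrounded by filler.

    Hypothetical helper: the rule layout is an illustration, not a
    Loquendo-specific recipe.
    """
    ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace

    def q(name):
        return f"{{{SRGS_NS}}}{name}"

    grammar = ET.Element(
        q("grammar"),
        {"version": "1.0", "mode": "voice", "root": "city", XML_LANG: lang},
    )
    rule = ET.SubElement(grammar, q("rule"), {"id": "city", "scope": "public"})
    # Leading filler such as "Um", "Well", "Let me think"
    ET.SubElement(rule, q("ruleref"), {"special": "GARBAGE"})
    one_of = ET.SubElement(rule, q("one-of"))
    for city in cities:
        ET.SubElement(one_of, q("item")).text = city
    # Trailing filler such as "please"
    ET.SubElement(rule, q("ruleref"), {"special": "GARBAGE"})
    return ET.tostring(grammar, encoding="unicode")


srgs_doc = city_grammar(["Torino", "Rome", "New York"])
print(srgs_doc)
```

Without the two `GARBAGE` references, the grammar would need explicit alternatives for every hesitation or courtesy phrase a caller might add around the city name.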
Loquendo ASR SDK Tool Suite
Evaluation Kit
Tool to select recorded audio material:
Offline execution of recognition tests
Statistics about recognition performance
Semantic parsing over text strings
Acoustic Model Adaptation
Tool to efficiently adapt the recognition engine to difficult conditions, such as:
different audio channels (wireless, multimedia microphone, PDA, etc)
different environments (in-car, public areas, factory)
application-dependent vocabulary (specific jargon, such as aeronautical terms)
ways of speaking (regional accents, fast speech)
Phonetic Learning
Tool to improve performance using data collected in the field to deal with:
additional linguistic formulations
complex speech recognition applications
Loquendo MRCP Server
Loquendo MRCP Server
Optimized client-server solution for the large-scale deployment of speech technologies in the telephony field, such as call centers, CRM, news and email-reading, self-service applications, etc.
Full benefits of Loquendo's high-performance technologies using standard protocols and languages.
Easy to integrate through the standard IETF protocol MRCP (Media Resource Control Protocol). Both MRCP versions are supported: MRCP v1 (RFC 4463), based on RTSP/RTP, and MRCP v2, the new IETF protocol, based on SIP/RTP, offering the new audio recording and Speaker Verification functionalities.
Loquendo MRCP Server is fully configurable and makes software component status available to both its onboard Management Console and external Management Systems through the SNMP protocol.
Its modular architecture leaves Loquendo MRCP Server independent from ASR/TTS engine releases.
MRCP Standard benefits
The Media Resource Control Protocol (MRCP) versions are IETF specifications:
    MRCP v1 is RFC 4463, http://www.ietf.org/rfc/rfc4463.txt
    MRCP v2 is an Internet Draft, http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-17
Provides a mechanism to control Speech Recognition and Text-To-Speech servers in distributed environments, allowing for the implementation of distributed IVR platforms
Standard for speech technologies integration: cut costs and protect investments
Standards are key drivers for the market. Support for standards is one of the key considerations solution providers should make when selecting a speech system/platform.
We do not only adopt standards. We drive them!
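As a rough illustration of what a client sends to an MRCP v2 server, the Python sketch below frames a `SPEAK` request. It assumes the message layout described in the MRCPv2 specification: a start line of the form `MRCP/2.0 message-length method request-id`, a `Channel-Identifier` header, and a `message-length` field that counts the bytes of the whole message, start line included. The function name `build_mrcp_request` and the sample identifiers are assumptions for this example; the SIP session setup and the TCP transport are not shown, and nothing here reflects the Loquendo MRCP Server API.

```python
def build_mrcp_request(method, request_id, channel_id, body="",
                       content_type="application/ssml+xml"):
    """Frame an MRCPv2-style request as bytes (hypothetical helper)."""
    payload = body.encode("utf-8")
    headers = [f"Channel-Identifier: {channel_id}"]
    if payload:
        headers.append(f"Content-Type: {content_type}")
        headers.append(f"Content-Length: {len(payload)}")
    header_block = ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")

    # message-length covers the entire message, including the start line
    # that carries it, so iterate until the declared value is consistent.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(header_block) + len(payload)
        if total == length:
            return start_line + header_block + payload
        length = total


ssml = ('<?xml version="1.0"?>'
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">Welcome to the voice portal.</speak>')
message = build_mrcp_request("SPEAK", 543257,
                             "32AECB23433802@speechsynth", ssml)
print(message.decode("utf-8"))
```

The fixed-point loop is needed because adding a digit to the length field changes the length itself; it settles after a few iterations since the digit count grows far more slowly than the value.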
VoxNauta Platform
Multi-Purpose Server Solution
VoxNauta 7.0 platform offers all the key drivers for voice applications, multimodal and multimedia, to create both telco and enterprise solutions:
VoIP SIP/RTP interface is based on a SW-only implementation, with no additional cost for HW boards
DSR (Distributed Speech Recognition) support minimizes the network load for voice transmission and allows the creation of multimodal applications on GPRS networks
SIP RFC 4240 (NETANN) support is required by the IMS (IP Multimedia Subsystem) architecture
Video applications are possible through a subset of VoiceXML 3.0 (2009)
Highly scalable and flexible solution allows a wide range of deployed applications, with support for advanced management for both telcos and enterprises
VoxNauta 7.0 – Key Points
A renewed architecture – The latest release of the Loquendo speech platform is based on a renewed architecture that exploits the MRCP v2 protocol for technology integration, with a pervasive modularity that ensures the highest efficiency.
Full standard compliance – VoxNauta 7.0 and Loquendo technologies implement ALL the most advanced standards in the speech area: VoiceXML 2.1, CCXML 1.0, MRCP v2, SRGS 1.0, SISR 1.0, SSML 1.0.
VoiceXML Forum certified – VoxNauta 7.0 has been formally certified by the VoiceXML Forum to be VoiceXML 2.0 compliant.
SW technology – The SW has been re-engineered to be largely independent of the Operating System, without any loss in efficiency.
Call control – The new platform incorporates a CCXML interpreter that allows complete service development and control in the platform back-end (e.g. any third-party application server).
Service development – The new version of the Loquendo VoiceXML interpreter supports both 2.0 and 2.1, giving the customer more options for service deployment.
Management and configuration – The VoxNauta Management Console brings all OA&M needs together in a single user-friendly, hierarchical graphical interface. In addition, proprietary provisioning mechanisms are no longer required: files, grammars and lexicons are accessed through standard URIs whenever required.
VoxNauta Management and Reporting
Management Console: Configuration, Logging, Monitoring, Reporting
    Multiplatform graphic tool (Win and Linux) based on SNMP
    Network control of multiple platforms
Trap Viewer: real-time visualization of SNMP traps sent by VoxNauta components in case of errors
Reporting: statistics (calls, duration, etc.)
Service Log Analyzer: in-depth analysis of VoiceXML execution
VoxNauta Compliance with Standards
Full Standard Compliance: complete support of all the relevant speech IETF and W3C standards
VoiceXML: complete support of VoiceXML 2.0 and 2.1, certified by the VoiceXML Forum Certification Program
CCXML: Call Control XML, the standard for call control
ASR: the W3C SRGS 1.0 (Speech Recognition Grammar Specification) grammar format in both its XML and ABNF (Augmented Backus-Naur Form) forms, plus complete support of SISR 1.0 (Semantic Interpretation for Speech Recognition)
DTMF: even DTMF applications can take advantage of the SRGS 1.0 and SISR 1.0 standards, so that a voice/DTMF application receives uniform results from voice and DTMF interactions
TTS: the W3C SSML (Speech Synthesis Markup Language) is the standard for enhancing text-to-speech rendering and for accessing the many unique features of Loquendo TTS
EMMA: published by the MMI (Multimodal Interaction) group of the W3C, it is a language for returning results from different modalities (voice, gesture, keyboard) to the application
Pronunciation Lexicon: PLS (Pronunciation Lexicon Specification) standard for TTS and ASR (Loquendo is editor of this specification)
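To make the TTS entry more concrete, the following Python sketch assembles a small SSML 1.0 prompt in which each sentence carries its own `xml:lang`, separated by a timed `break`. Only elements and attributes defined by W3C SSML 1.0 are used (`speak`, `s`, `break`, `xml:lang`, `time`); the helper name `build_prompt`, the pause value and the sample sentences are assumptions, and how a given engine handles the language switch is not covered here.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_prompt(sentences, default_lang="it-IT", pause_ms=300):
    """Build an SSML 1.0 prompt from (language, text) pairs.

    Hypothetical helper for illustration; not part of any Loquendo API.
    """
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace

    def q(name):
        return f"{{{SSML_NS}}}{name}"

    speak = ET.Element(q("speak"), {"version": "1.0", XML_LANG: default_lang})
    for index, (lang, text) in enumerate(sentences):
        if index > 0:
            # Short pause between consecutive sentences
            ET.SubElement(speak, q("break"), {"time": f"{pause_ms}ms"})
        sentence = ET.SubElement(speak, q("s"), {XML_LANG: lang})
        sentence.text = text
    return ET.tostring(speak, encoding="unicode")


ssml_doc = build_prompt([
    ("it-IT", "Benvenuti alla conferenza."),
    ("en-US", "Welcome to the conference."),
])
print(ssml_doc)
```

Marking the language per sentence keeps the document valid SSML 1.0, where `xml:lang` is allowed on `speak`, `p` and `s`.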
Loquendo Speech Technologies
Why Choose Loquendo?
Loquendo is a global provider of high-quality and reliable TTS, ASR and Speaker Verification, covering 26 languages and 62 voices – and rising!
TTS: our most renowned best-in-breed product
ASR: providing innovative features such as Garbage techniques
Full standards support. We strive for an open world: standards drive the speech industry. Customers are free to choose us without proprietary bindings
Highly professional, customer-oriented technical assistance
Price competitive
Looking Over the Engine, Checking Under the Hood
The Key to Success in Voice Applications and CRM:
A natural, accurate, well-designed speech interface
A first-rate ASR and TTS to power it.
Your voice-enabled service is the face your company presents to your customers!
The naturalness and user-friendliness of your voice interface are key to enhancing customer experience.
THANK YOU
For more information, please keep an eye on: www.loquendo.com
Contact: paolo.baggia@loquendo.com
Loquendo S.p.A., 745 Fifth Ave, 27th Floor, New York, NY 10151, USA. Tel. +1 212.310.9075, Fax +1 212.310.9001
Loquendo S.p.A., Via Arrigo Olivetti, 6, 10148 Torino, Italy. Tel. +39 011 291 3111, Fax +39 011 291 3199
Keep in touch with Loquendo news: subscribe to the Loquendo Newsletter
Try our interactive TTS demo: insert your text, choose a language, and listen
The latest news at a click: consult the Loquendo Newsletter online and keep up to date on events and initiatives