Loquendo Speech Technologies as key differentiating factor

Voice Search 2009 Conference, March 3rd, 2009

Paolo Baggia
Director of International Standards


Agenda

Loquendo Today

Loquendo Products

Loquendo Speech Technologies

Automatic Speech Recognition

Text To Speech

Loquendo MRCP Server

VoxNauta Platform


Company Profile



• Privately held company (fully owned by Telecom Italia), founded in 2001 as a spin-off from Telecom Italia Labs, capitalizing on 30 years of experience and expertise in voice processing.
• Global company, leader in Europe and South America for award-winning, high-quality voice technologies (synthesis, recognition, authentication and identification), available in 26 languages and 62 voices.
• Multilingual, proprietary technologies protected by over 100 patents worldwide.
• Financially robust: break-even reached in 2004, revenues and earnings growing year on year.
• Growth-plan investment approved for the evolution of products and services.
• Offices in New York. Headquarters in Torino; local representative sales offices in Rome, Madrid, Paris, London, Munich.
• Flexible: about 100 employees, plus a vibrant ecosystem of local freelancers.


International Awards

• “Best Innovation in Automotive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2007
• “Best Innovation in Expressive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2006
• “Best Innovation in Multi-Lingual Speech Synthesis” Prize, AVIOS-SpeechTEK West 2005
• “2008 Frost & Sullivan European Telematics and Infotainment Emerging Company of the Year” Award
• Winner of “Market leader-Best Speech Engine” Speech Industry Award 2007 and 2008
• Loquendo MRCP Server: Winner of 2008 IP Contact Center Technology Pioneer Award


Loquendo main points



• A complete set of speech technologies and voice platforms (TTS, ASR, SV)
  - Focus on quality & innovative features, simplifying apps development
  - Multilingual worldwide coverage
• Extensive support for international standards
  - All speech-related W3C and IETF standards
• A full range of integration options
  - APIs, standard interfaces and protocols, client-server configurations
• Same technologies on a wide spectrum of platforms
  - Same core engine for server, desktop, embedded & mobile devices, guarantees platform-independent sw engineering
• Partnership as a key factor
  - Strategic alliance portfolio for each vertical market
  - Set of powerful tools made available to our partners for tuning and improving speech applications, without need for costly professional services


Same Core Engine

• Same Core Engines for all versions: Server, Multimedia and Embedded.
• Same languages (voices) are available for all versions
• Same APIs and support for standards (W3C, SAPI, …)
• Multiplatform: Symbian™ series 60 (7, 8 and 9), Pocket PC 2003™, CE.NET™ 4.2 and 5, Windows Mobile 2005/6, Windows Automotive™, SmartPhone™ 2003, WindRiver VxWorks™, QNX™, Linux, Windows™ XP Embedded, Windows™ XP TabletPC Edition, Windows Vista, …
• The only embedded engines with server quality and features


Powered by Loquendo solutions in vertical markets

• Telco
• Automotive and Navigation
• Industry
• Banking and Insurance
• Local and Central Government
• Mobile devices
• Transport
• Healthcare and Differently able
• Media Center and Set-top-box


Loquendo: Value chain & Product positioning

Loquendo Focus

• Applications / Solutions
  - Turnkey solutions: Auto Attendant, DA, Banking, CRM, Self-service, Voice Controlled Media Center
  - Basic Resources for application developments: Specialized lexicons, grammars, Reusable Dialogue Objects
• Servers
  - VoxNauta: VoiceXML & CCXML Platform, for vocal applications on any network (fixed, mobile, VoIP)
  - MRCP Server: Turnkey MRCP (v1 & v2) Server, for interfacing with IVRs and third party voice platforms
• Speech Engines (LTTS, LASR, LSV)
  - Speech Engines, SW only: Text-To-Speech, Automatic Speech Recognition, Speaker Verification and Identification, Language Identification
  - For servers, desktops,
• O.S.: Windows, Linux, WinMobile, Symbian, …
• Hardware


Language Coverage

Language Female Male
English US PP PPP
English UK PP P
Spanish (Castilian) PP PP
Catalan (bilingual) P P
Valencian (bilingual) P
Galician (bilingual) P
American Spanish / Colombian P P
Mexican P
Chilean P
Argentinean P
Italian PPPP PPPP
French PP PP
Canadian French P P
Portuguese P P
Brazilian Portuguese PP P
German PP P
Dutch P P
Greek PP
Danish P P
Finnish P P
Swedish P P
Russian P P
Polish P P
Turkish P P
Chinese PP
Esperanto (robotic) P


Loquendo Speech Technologies


Text To Speech


Loquendo TTS – Text To Speech



• Multi-language: 26 languages, 62 voices, and more coming!
• Truly Natural and Expressive sounding voices for highly…
• Emotional pronunciation:
  - Commonly used phrases such as “How are you?” or “You’ve got to be kidding!” and paralinguistic events such as yawning, coughing and laughing (to confirm, exclaim, thank, express doubt, etc.)
• Reading Styles and specialized support (e.g. addresses, SMS, etc.)
• Audio Mixer: complete control over all audio sources (including sampling rates and coding); audio files can be mixed, looped, faded in/out, and synchronized with synthetic speech
• Mixed Language Capability:
  - Language Guesser: automatically identifies the language of any text so that by means of…
  - Phonetic Mapping: any of Loquendo’s voices can correctly pronounce any foreign word (e.g. English words in an Italian text)
• Voice Creator tool for new voice generation
• TTS Director for designing effective prompts and Lexicon Manager tool for creating personalized user lexicons




Loquendo TTS Director

Loquendo TTS Director is a complete development environment for creating your own voice prompts, and for designing your own personalised voices.

Target: clients wanting to edit their prompts at a more complex level and adjust parameters with far more precision, as well as to add pauses, phonetic transcriptions, and tailor-made lexicons for atypical pronunciation.


Loquendo TTS Tools: Lexicon Manager



• Loquendo Lexicon Manager helps to define the pronunciation of foreign language words, toponyms, proper names, acronyms, abbreviations, etc.
• The virtual keyboard helps to write the phonetic transcriptions
• Sections for different languages in the same lexicon
• Future support for the PLS (Pronunciation Lexicon Specification) standard format
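
To give a feel for what a lexicon in the PLS standard format looks like, here is a minimal Python sketch that writes a W3C PLS 1.0 document. It is not the Lexicon Manager's own file format (the slide lists PLS only as future support); the function name `build_pls_lexicon` and the sample transcriptions are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls_lexicon(entries, lang="it-IT", alphabet="ipa"):
    """Return a PLS 1.0 document (string) for (grapheme, phoneme) pairs.

    Hypothetical helper for illustration; not part of any Loquendo tool.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries:
        # One <lexeme> per word: its spelling and its phonetic transcription.
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


if __name__ == "__main__":
    # Illustrative (unverified) IPA for an English word in an Italian lexicon.
    print(build_pls_lexicon([("software", "ˈsɔftwer")]))
```

Each `lexeme` pairs a written form with a transcription, which is the same kind of entry the Lexicon Manager bullets describe for foreign words, proper names and acronyms.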




Loquendo ASR –

• A reliable speaker independent technology
• Broad Vocabulary & Flexible Recognition - recognizes up to 1,000,000 words; supports isolated and continuous speech even in the noisiest environments, such as wireless
• Highly Accurate Speech Recognition - thanks to integration of neural networks and hidden Markov models, and detailed acoustic-phonetic units trained on large speech corpora
• Multi-language speech recognition (20 languages)
• Barge-in capability to guarantee high reactivity and robustness to noise and background speech; Garbage rules definition to match arbitrary spoken sequences not modeled by the grammar
• Powers Loquendo Speaker Verification

Tool package that automatically analyzes data collected in the field to improve service performance, including:
• Acoustic Model Adaptation (to the environment, speaker, channel adaptation, etc.)
• Phonetic Learning to identify frequent formulations that have not been covered and additional pronunciation variants


Loquendo ASR –

→ Extended Standards Support future-proofs customer investments:
  - MRCP compliance (for client-server architectures);
  - complete support for grammar standards, such as W3C SRGS & SISR, enables optimization for VoiceXML applications;
  - support for AURORA DSR (for distributed speech recognition);
→ ASR efficiency reduces hardware costs: lower PC-power requirements enable more recognition channels to run simultaneously. Loquendo ASR is so efficient that the core ASR engine is used on embedded platforms, e.g. smartphones, navigation devices.
→ Loquendo Speaker Verification - an extension to the ASR module - combines both speaker and knowledge verification (i.e. matches “what was said” with “who said it”).
→ A highly accurate phonetic transcriber enhances recognition results. Loquendo ASR is based on the same phonetic transcriber as Loquendo TTS, tested both automatically and by painstaking human listening.
→ Core algorithms: rather than either HMM (Hidden Markov Models) or NN (Neural Networks) alone, Loquendo ASR combines both approaches, giving high performance, and increased efficiency with large vocabularies.


Learn to distinguish a best-in-breed ASR from those not up to the job

→ Key Factors For Success
• ASR Tuning, e.g. to the environment, to the speaker
• Tools enabling ASR to learn from the field - avoids need for costly professional services, e.g. Acoustic Model Adaptation Tool, Phonetic Learning Tool
• De-noising module - improves performance in noisy environments by cleaning the signal while computing spectral parameters

→ Loquendo ASR specialized tasks, including:
• “Word Spotting” - recognizes keywords in audio streams;
• “Garbage Rules” definition - enables free speech, simplifying grammars and application development; matches expressions like “Um, Er”, "Well", "Let me think", etc., giving greater flexibility and a more natural interaction experience.
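
As a rough illustration of the garbage-rule idea, the sketch below generates a W3C SRGS 1.0 XML grammar in which a keyword list is wrapped by the standard special rule `GARBAGE`, so that fillers before or after the keyword need not be modeled explicitly. The slides do not say how Loquendo ASR itself expresses garbage rules, so mapping them onto SRGS `special="GARBAGE"` is an assumption, and `keyword_grammar` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Speech Recognition Grammar Specification 1.0.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def _q(tag):
    """Qualify a tag name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{tag}"


def keyword_grammar(keywords, lang="en-US"):
    """Return an SRGS XML grammar: GARBAGE, one keyword, GARBAGE.

    Hypothetical helper; assumes the recognizer honours the SRGS
    special rule GARBAGE.
    """
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        _q("grammar"),
        {"version": "1.0", "root": "main", "mode": "voice",
         f"{{{XML_NS}}}lang": lang},
    )
    rule = ET.SubElement(grammar, _q("rule"), {"id": "main", "scope": "public"})
    # Leading filler such as "um" or "let me think" is absorbed here.
    ET.SubElement(rule, _q("ruleref"), {"special": "GARBAGE"})
    one_of = ET.SubElement(rule, _q("one-of"))
    for word in keywords:
        ET.SubElement(one_of, _q("item")).text = word
    # Trailing filler such as "please" is absorbed here.
    ET.SubElement(rule, _q("ruleref"), {"special": "GARBAGE"})
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(keyword_grammar(["Rome", "Madrid", "Paris"]))
```

With such a grammar, an utterance like "well, let me think, Madrid please" can still resolve to the single keyword, which is the simplification the slide attributes to garbage rules.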


Loquendo ASR SDK Tool Suite



• Evaluation Kit: Tool to select recorded audio material:
  - Offline execution of recognition tests
  - Statistics about recognition performance
  - Semantic parsing over text strings
• Acoustic Model Adaptation: Tool to efficiently adapt the recognition engine to difficult conditions, such as:
  - different audio channels (wireless, multimedia microphone, PDA, etc.)
  - different environments (in-car, public areas, factory)
  - application-dependent vocabulary (specific jargon, such as aeronautical terms)
  - ways of speaking (regional accents, fast speech)
• Phonetic Learning: Tool to improve performance using data collected in the field to deal with:
  - additional linguistic formulations
  - complex speech recognition applications
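
The slides do not say which statistics the Evaluation Kit reports. A common measure for offline recognition tests is word error rate (WER), sketched below in Python as an assumption about what such a report might contain; the function name `word_error_rate` is hypothetical and unrelated to the Loquendo SDK.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Counts substitutions, insertions and deletions needed to turn the
    recognized text (hypothesis) into the transcript (reference).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("call john smith", "call jon smith"))
```

Running the same test set before and after acoustic model adaptation or phonetic learning and comparing the two figures is one simple way to check whether tuning helped.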


Loquendo MRCP Server



• Optimized client-server solution for the large-scale deployment of speech technologies in the telephony field, such as call centers, CRM, news and email-reading, self-service applications, etc.
• Full benefits of Loquendo’s high-performance technologies using standard protocols and languages.
• Easy-to-integrate through the standard IETF protocol MRCP (Media Resource Control Protocol). Both MRCP versions are supported: MRCP v1 (RFC 4463), based on RTSP/RTP, and MRCP v2, the new IETF protocol, based on SIP/RTP, offering the new audio recording and Speaker Verification functionalities.
• Loquendo MRCP Server is fully configurable and makes software component status available to both its onboard Management Console and external Management Systems through the SNMP protocol.
• Its modular architecture leaves Loquendo MRCP Server independent from ASR/TTS engine releases




MRCP Standard benefits

• MRCP (Media Resource Control Protocol) is specified by the IETF:
  - MRCP v1 is RFC 4463, http://www.ietf.org/rfc/rfc4463.txt
  - MRCP v2 is Internet Draft, http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-17
• Provides a mechanism to control Speech Recognition and Text-To-Speech servers in distributed environments, allowing for implementation of distributed IVR platforms
• Standard for Speech technologies integration: cut costs and protect investments

Standards are key drivers for the market. Support for Standards is one of the key considerations solution providers should make when selecting a speech system/platform

We do not only adopt standards. We drive them!
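
To make the "control a Text-To-Speech server" idea concrete, here is a hedged Python sketch that formats an MRCP v2 `SPEAK` request carrying an SSML body. The start-line layout (`MRCP/2.0`, total message length, method, request id) and the header names follow the MRCP v2 specification as the editor understands it and should be checked against the draft cited above; the SIP session setup that normally precedes this message is not shown, and the channel identifier and function name are hypothetical.

```python
def build_speak_request(request_id, channel_id, ssml):
    """Return the bytes of an MRCP v2 SPEAK request (illustrative only).

    The start line carries the length in bytes of the whole message,
    including the start line itself, so the length is found by iterating
    until the declared value matches the actual size.
    """
    body = ssml.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        "Content-Type: application/ssml+xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")

    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(headers) + len(body)
        if total == length:  # declared length now matches the real length
            break
        length = total
    return start_line + headers + body


if __name__ == "__main__":
    ssml = (
        '<?xml version="1.0"?>'
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">Welcome to the service.</speak>'
    )
    # "32AECB23433801@speechsynth" is a made-up channel identifier.
    message = build_speak_request(1, "32AECB23433801@speechsynth", ssml)
    print(message.decode("utf-8"))
```

Because the message is plain text over a standard protocol, any MRCP-capable client can drive any compliant server, which is the integration benefit the slide describes.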


VoxNauta Platform


Multi-Purpose Server Solution

VoxNauta 7.0 platform offers all the key drivers for voice, multimodal and multimedia applications, to create both telco and enterprise solutions:
• VoIP SIP/RTP interface is based on a SW-only implementation, no additional cost for HW boards
• DSR (Distributed Speech Recognition) support minimizes the network load for voice transmission and allows the creation of multimodal applications on GPRS networks
• SIP RFC 4240 (NETANN) support, required by the IMS (IP Multimedia Subsystem) architecture
• Video applications are possible through a subset of VoiceXML 3.0 (2009)
• Highly scalable and flexible solution allows a wide range of deployed applications with support of advanced management for both telcos and enterprises


VoxNauta 7.0 – Key Points

• A renewed architecture: the latest release of the Loquendo speech platform is based on a renewed architecture that exploits the MRCP v2 protocol for technology integration, with a pervasive modularity that ensures the highest efficiency.
• Full standard compliance: VoxNauta 7.0 and Loquendo technologies implement ALL the most advanced standards in the speech area: VoiceXML 2.1, CCXML 1.0, MRCP v2, SRGS, SISR 1.0, SSML 1.0.
• VoiceXML Forum certified: VoxNauta 7.0 has been formally certified by the VoiceXML Forum to be VoiceXML 2.0 compliant.
• SW technology: the SW has been re-engineered to allow Operating System independence to a great extent, without any loss in efficiency.
• Call control: the new platform incorporates a CCXML interpreter that allows complete service development and control in the platform back-end (e.g. any third party application server).
• Service development: the new version of the Loquendo VoiceXML interpreter supports both 2.0 and 2.1, giving increased possibilities to the customer for service deployment.
• Management and configuration: the VoxNauta Management Console consolidates all OA&M needs into a single user-friendly hierarchical graphical interface. In addition, proprietary provisioning mechanisms are no longer required, relying on standard URI access to files, grammars and lexicons whenever required.


VoxNauta Management and Reporting



• Management Console: Configuration, Logging, Monitoring, Reporting
  - Multiplatform graphic tool (Win and Linux) based on SNMP
  - Network control of multiple platforms
• Trap Viewer: Real-time visualization of SNMP traps sent by VoxNauta components in case of errors
• Reporting: Statistics: calls, duration, etc.
• Service Log Analyzer: In-depth analysis of VoiceXML execution


VoxNauta Compliance with Standards

Full Standard Compliance - complete support of all the relevant speech IETF and W3C standards
• VoiceXML: complete support of VoiceXML 2.0 and 2.1, certified by the VoiceXML Forum Certification Program
• CCXML: Call Control XML, the standard for Call Control
• ASR: W3C SRGS 1.0 (Speech Recognition Grammar Specification) grammars in both XML and ABNF (Augmented Backus-Naur Form) forms, and also complete support of SISR 1.0 (Semantic Interpretation for Speech Recognition)
• DTMF: even DTMF applications can take advantage of the SRGS 1.0 and SISR 1.0 standards, so that a voice/DTMF application can be given uniform results from voice and DTMF interactions
• TTS: the W3C SSML (Speech Synthesis Markup Language) is the standard for enhancing text-to-speech rendering and for accessing the many unique features of Loquendo TTS
• EMMA: published by the MMI (Multimodal Interaction) group of the W3C, it’s a language for returning different modality results to the application (voice, gesture, keyboard)
• Pronunciation Lexicon: PLS (Pronunciation Lexicon Specification) standard for TTS and ASR (Loquendo is editor of this specification)
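
As a small illustration of the TTS item in the list above, the following Python sketch builds an SSML 1.0 prompt with a pause and a slower-spoken phrase. It uses only standard SSML elements (`speak`, `break`, `prosody`); Loquendo-specific features reachable through SSML are not shown because the slides do not describe them, and `build_prompt` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Speech Synthesis Markup Language 1.0.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def _q(tag):
    """Qualify a tag name with the SSML namespace."""
    return f"{{{SSML_NS}}}{tag}"


def build_prompt(intro, slow_part, lang="en-US", pause="300ms"):
    """Return an SSML document: intro text, a pause, then slower speech."""
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        _q("speak"), {"version": "1.0", f"{{{XML_NS}}}lang": lang}
    )
    speak.text = intro
    # <break> inserts a pause of the given duration.
    ET.SubElement(speak, _q("break"), {"time": pause})
    # <prosody rate="slow"> slows down the enclosed text.
    prosody = ET.SubElement(speak, _q("prosody"), {"rate": "slow"})
    prosody.text = slow_part
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt("Your confirmation code is", "four seven one nine"))
```

The same document can be sent unchanged to any SSML-compliant synthesizer, which is the portability argument behind the standards list.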


Loquendo Speech Technologies


Why Choose Loquendo?

Loquendo is a global provider of high-quality and reliable TTS, ASR and Speaker Verification, covering 26 languages and 62 voices, and rising!
• TTS: our most renowned best-in-breed product
• ASR: providing innovative features such as Garbage techniques
• Full standards support. We strive for an open world: standards drive the speech industry. Customers are free to choose us without proprietary bindings
• Highly professional, customer-oriented technical assistance
• Price competitive


Looking Over the Engine, Checking Under the Hood

The Key to Success in Voice Applications and CRM:
• A natural, accurate, well-designed speech interface
• A first-rate ASR and TTS to power it.

Your voice-enabled service is the face your company presents to your customers!

The naturalness and user-friendliness of your voice interface is key to enhancing customer experience.


For more information please:
Keep an eye on: www.loquendo.com
Contact: paolo.baggia@loquendo.com

Loquendo S.p.A.
745 Fifth Ave, 27th Floor
New York, NY 10151 USA
Tel. +1 212.310.9075
Fax. +1 212.310.9001
www.loquendo.com

Loquendo S.p.A.
Via Arrigo Olivetti, 6
10148 TORINO, Italy
Tel. +39 011 291 3111
Fax +39 011 291 3199
www.loquendo.com

• Keep in touch with Loquendo news, subscribe to the Loquendo Newsletter
• Try our interactive TTS demo: insert your text, choose a language, and listen
• The latest News at a click
• Consult the Loquendo Newsletter online
• Keep up to date on events and initiatives

THANK YOU
