RealSpeak Telecom Software Development Kit


User’s Guide for German

V4.0

1. Grant of Rights

In consideration of a possible commercial relationship, ScanSoft hereby grants to you, the LICENSEE, who accepts, a non-exclusive right to internally evaluate and test the software program (“the Software”).

2. Ownership of Software

ScanSoft retains title, interests and ownership of the Software recorded on the original disk(s) and all subsequent copies of the Software and Documentation, regardless of the form or media in or on which the original and other copies may exist. ScanSoft reserves all rights not expressly granted to LICENSEE.

3. Copy Restrictions

This Software and the accompanying documentation are copyrighted. Unauthorized copying of the Software, including Software that has been merged or included with other software, or of the documentation is expressly forbidden. LICENSEE may be held legally responsible for any intellectual property infringement that is caused or encouraged by his failure to abide by the terms of this agreement. LICENSEE is allowed to make two (2) copies of the Software solely for backup purposes, provided that the copyright notice is included on the backup copy.

4. Use Restrictions

LICENSEE agrees not to use the Software for any other purpose than internally evaluating the Software. LICENSEE may physically transfer the Software from one computer to another, provided that the Software is used on only one computer at a time. LICENSEE may not modify, adapt, translate, reverse engineer, decompile, disassemble or create derivative works based on the Software. LICENSEE may not modify, adapt, translate or create derivative works based on the documentation provided by ScanSoft. The Software may not be transferred to anyone without the prior written consent of ScanSoft. In no event may LICENSEE transfer, assign, lease, sell or otherwise dispose of the Software and Documentation on a temporary or permanent basis except as expressly provided herein.

5. Warranty

THE SOFTWARE IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. ScanSoft shall have no liability to LICENSEE or any third party for any claim, loss or damage of any kind, including but not limited to lost profits, punitive, incidental, consequential or special damages, arising out of or in connection with the use or performance of the Software and accompanying documentation.

6. Termination

This agreement is effective until terminated. ScanSoft reserves the right to terminate this agreement automatically if any provision of this agreement is violated. LICENSEE may terminate this agreement by returning the Software and the accompanying documentation to ScanSoft, along with a written warranty stating that all copies have been returned.


Trademarks

MS-DOS®, WINDOWS®, MICROSOFT® VISUAL C++, BORLAND C++ and Sound Blaster are registered trademarks of their respective owners. ScanSoft is a registered trademark. All rights reserved.

Date            Type of change                                      Software Version
July 2004       Reformatting of document; update for version 4.0    V4.0
November 2004   Update for version 4.0                              V4.0
December 2005   Update “SSML Preprocessor” chapter and “Using Control Sequences” section of Chapter I

Native Character Set
Using Control Sequences
Quick Reference of the RealSpeak native Control Sequences for German
Entering phonetic input
How to proceed
Lexical stress and sentence accents in phonetic input
The German L&H+ & UNIPA Phonetic Alphabet
Using a User Dictionary
Using the Microsoft SAPI5 Lexicon
User Lexicons
Application Lexicons
The German SAPI5 Phoneme List
Notes on the German Text-To-Speech System
Cardinal Numbers
Decimal Numbers
Ordinal Numbers
Roman Numbers
Telephone Numbers
Bank Account Numbers
Dates
Time Indications
Currencies
Abbreviations and Acronyms

E-MAIL PREPROCESSOR
Introduction
E-Mail Header Processing
Header Field Extraction
Header Field Reading
From Field
Date Field
Subject Field
E-Mail body processing
Message Extraction
Text Normalization
Language Specific Text Normalization
Umlaut
English Words
Customizing the E-Mail Preprocessor

Introduction
Proper names Custom G2P dictionary

APPENDICES

Chapter I

German Text-To-Speech System


Introduction

This section provides operational instructions for the ScanSoft Text-To-Speech system for German. It reviews the functionality of the system, and describes how the user can customize the pronunciation of input texts. This part also describes issues that are particular to the German Text-To-Speech system. It introduces the German phonetic alphabet and discusses a number of the language-specific features of the German Text-To-Speech system.

Preparing a text for Text-To-Speech

In general, there are four ways to intervene in the pronunciation of text:

• By using control sequences
• By entering phonetic input
• By using a user dictionary or a user ruleset
• By using one of the supported APIs

These mechanisms are described in the Programmer’s Guide.

In this part, however, the specifications for German are fully described.

Native Character Set

The native character set of the German TTS system is Windows-1252 which has the printable characters in the ASCII range 1-127 as a subset. Note that TTS input encoded in another supported character set is converted to the native character set for that language before it is processed internally. Consequently, input must be representable in the native character set even if it is encoded in another character set supported by the API.
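Because input must be representable in Windows-1252, it can help to check a text before handing it to the engine. The following is a minimal sketch, not part of the SDK: the function name `first_unrepresentable` is hypothetical, and Python's `cp1252` codec is used as a stand-in for the engine's own conversion.

```python
def first_unrepresentable(text: str, encoding: str = "cp1252"):
    """Return (index, character) of the first character that cannot be
    encoded in the given character set, or None if the whole text fits.

    Hypothetical helper: it only mirrors the rule that TTS input must be
    representable in the native character set (Windows-1252 for German).
    """
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return index, char
    return None


# German umlauts and sharp s are part of Windows-1252.
print(first_unrepresentable("Grüße aus Köln"))
# The rouble sign is not part of Windows-1252.
print(first_unrepresentable("Preis: 5 ₽"))
```

Reporting the position of the offending character, rather than a plain yes/no, makes it easier to replace or remove that character before synthesis.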

Using Control Sequences

For a description of the various supported markup languages (independent from the language), refer to the Programmer's Guide.

Remark: <ESC> represents the escape character “\x1B”.

Below, you find a quick reference table for the RealSpeak native control sequences for German. The language-specific support for the SSML markup language is described in the “SSML Preprocessor” chapter.

Quick Reference of the RealSpeak native Control Sequences for German

Sequence: <ESC>\vol=x\
Description: Volume
Range: x = 0..100 (0 = silence, 10 = low, 100 = high)
Default: 80
Delimiter: No
For example:
<ESC>\vol=10\ Ich kann sehr leise sprechen, <ESC>\vol=90\ aber auch sehr laut.

Sequence: <ESC>\rate=x\
Description: Speech rate
Range: x = 1..100 (10 = slow, 100 = fast)
Default: 50
Delimiter: No
For example:
Ich kann <ESC>\rate=70\ sehr schnell sprechen <ESC>\rate=20\ oder aber ganz langsam.

Sequence: <ESC>\rate_wpm=xxx\
Description: Words per minute
Range: xxx = 1..1000; voice-specific (see subsequent table)
Default: Voice-specific
Delimiter: No
For example:
Ich kann <ESC>\rate_wpm=350\ sehr schnell sprechen <ESC>\rate_wpm=110\ oder aber ganz langsam.

Sequence: <ESC>Mx
Description: Read mode; some read modes are not supported in e-mail mode
Range: x = 0..3 (0 = character-by-character; 1 = word-by-word, not supported in e-mail mode; 2 = sentence-by-sentence; 3 = line-by-line, not supported in e-mail mode)
Default: 2
Delimiter: Yes
For example:
<ESC>M0 Demonstration (The word "Demonstration" will be spelled.)
<ESC>M1 Dies ist eine Demonstration. (This sentence will be read word by word.)
<ESC>M2 Dies ist eine Demonstration. (This sentence will be read as one sentence.)

Sequence: <ESC>Wx
Description: Wait period
Range: x = 0..9 (0 = no wait period, 1 = 200 ms, …)
For example:
Sie hören jetzt <ESC>W5

Sequence: <ESC>\pause=xxx\
Description: Long pause
Range: xxx = 1..65535 msec
Delimiter: No
For example:
Man kann die Dauer der Pause ganz genau <ESC>\pause=1500\ definieren.

Sequence: <ESC>"
Description: Sentence accent
Delimiter: No
For example:
<ESC>"Jakob hat gestern angerufen. (nicht Peter)
Jakob hat <ESC>"gestern angerufen. (nicht heute)
Note: Manually inserted sentence accents may have no effect in RealSpeak. The RealSpeak synthesis module may have reasons to override the requested sentence accent, and thus not realize it.

Sequence: <ESC>C
Description: Continuation
Delimiter: No
For example:
Der 50. Besucher.
Der 50. <ESC>C Besucher.
In the first of the above examples, the text-to-speech system will detect an end-of-sentence after 50 and will inappropriately split the input into two separate sentences. In the second example, a continuation sequence is inserted in order to make the system pronounce the entire input as one sentence.

Sequence: <ESC>E
Description: End-of-message
Delimiter: Yes
For example:
Dies ist der erste Satz <ESC>E und dies ist der zweite.
In the above example, the sequence <ESC>E forces the system to pronounce the two halves of the input separately.

Sequence: <ESC>/+
Description: Phonetic input (L&H+ phonetic alphabet)
Delimiter: No
For example:
<ESC>/+'Stat<ESC>/+

Sequence: <ESC>%x
Description: Preprocessing mode
Range: x = text (standard text mode) or email (e-mail mode)
Delimiter: Yes
For example:
<ESC>%text Sie hören eine Nachricht von Hans Kessler.
<ESC>%email Von: [email protected] (Hans Kessler)

Sequence: <ESC>\tn=s\
Description: Guide text normalization; limited support in e-mail mode
Range: s = string: address = address mode (not supported in e-mail mode), normal = standard mode, spell = spell mode. The text normalization types corresponding with the SSML <say-as> types are also supported in standard text mode (not in e-mail mode); see the “SSML Preprocessor” chapter for more details.
Default: Normal
Delimiter: No
For example:
<ESC>\tn=address\ Prof. Dr. Meier, GH Bonn, Gebäude III, 2. St., PF 360, Bonn <ESC>\tn=normal\ Dr. GH Meier

Sequence: <ESC>F
Description: Reset to default
Delimiter: Yes
For example:
<ESC>\vol=10\ Dies ist der niedrigste Wert der Lautstärke. <ESC>F Und dies ist der normale Wert.
<ESC>\rate=90\ Dies ist der höchste Wert der Geschwindigkeit. <ESC>F Und dies ist der normale Wert.

Sequence: <ESC>@c
Description: Declare the part-of-speech (not supported in German)
Range: c = character
Delimiter: No

Sequence: <ESC>\domain=s\
Description: Enable the extension (only if a custom g2p has been loaded)
Range: s = string: the name

Sequence: <ESC>\voice=s\
Description: Set the voice (if more than one voice is available)
Range: s = string: the name of the voice
Delimiter: Yes

Sequence: <ESC>\mrk=n\
Description: Insert a bookmark
Range: n = 0..2147483647
Delimiter: No
For example:
Hallo, ich bin \mrk=2000\ Steffi

Sequence: <ESC>\p\
Description: Insert a paragraph boundary
Delimiter: Yes
For example:
Herr Krueger \p\ Kirchstrasse 12 \p\ Berlin

Sequence: <ESC>\audio="s"\
Description: Insert an audio file; not supported in e-mail mode
Range: s = string: the URI of a document with an appropriate MIME type
Delimiter: Yes
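When input text is assembled in a program, the control sequences from the quick reference can be generated rather than typed by hand. The sketch below is illustrative only: the helper names (`param`, `vol`, `rate`, `pause`) are hypothetical and not part of any RealSpeak API. It only builds strings with the escape character and the ranges quoted in the quick reference.

```python
ESC = "\x1b"  # the escape character written as <ESC> in this guide


def param(name: str, value) -> str:
    """Build a sequence of the form <ESC>\\name=value\\ (hypothetical helper)."""
    return f"{ESC}\\{name}={value}\\"


def vol(x: int) -> str:
    """Volume, 0..100 according to the quick reference."""
    if not 0 <= x <= 100:
        raise ValueError("volume must be in 0..100")
    return param("vol", x)


def rate(x: int) -> str:
    """Speech rate, 1..100 according to the quick reference."""
    if not 1 <= x <= 100:
        raise ValueError("rate must be in 1..100")
    return param("rate", x)


def pause(msec: int) -> str:
    """Long pause, 1..65535 msec according to the quick reference."""
    if not 1 <= msec <= 65535:
        raise ValueError("pause must be in 1..65535 msec")
    return param("pause", msec)


text = (
    f"{vol(10)} Ich kann sehr leise sprechen, "
    f"{vol(90)} aber auch sehr laut."
)
print(repr(text))
```

Validating the ranges before the text reaches the engine is a design choice of this sketch; how the engine itself reacts to out-of-range values is not stated in this guide.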



NOTE

The speech rate is language, gender and technology dependent. It can be set in 9 discrete steps. The values given here are in words per minute.

Rate level   Steffi RealSpeak rate (wpm)
1            40
2            70
3            100
4            120
5            140
6            220
7            300
8            380
9            470
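Because the rate can only take nine discrete values, an application may want to know which step a requested words-per-minute value is closest to. The sketch below uses the values listed for the Steffi voice. Picking the nearest step is an assumption made for illustration; this guide does not say how the engine maps an arbitrary `rate_wpm` value to a step, and the function name is hypothetical.

```python
# Rate level -> words per minute, as listed for the Steffi voice.
STEFFI_WPM = {1: 40, 2: 70, 3: 100, 4: 120, 5: 140, 6: 220, 7: 300, 8: 380, 9: 470}


def nearest_rate_level(wpm: int) -> int:
    """Return the rate level whose wpm value is closest to the request.

    Assumption: nearest-value matching; ties resolve to the lower level.
    """
    return min(STEFFI_WPM, key=lambda level: (abs(STEFFI_WPM[level] - wpm), level))


for requested in (350, 115, 40):
    level = nearest_rate_level(requested)
    print(requested, "->", level, f"({STEFFI_WPM[level]} wpm)")
```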

Entering phonetic input

How to proceed

To switch from orthographic to phonetic mode, insert <ESC>/+ to use the L&H+ phonetic alphabet. The phonetic input mode remains active until the command is explicitly reset by entering <ESC>/+ again.

The phonetic input string is composed of symbols of the L&H+ phonetic alphabet (see phonetic table below). Examples are given below in the phonetic table.

In addition to the phonetic symbols, it is advised to use the following characters in the phonetic input string:

Special characters L&H+

Symbol: ' (ASCII 39, Hex 27)
Meaning: primary word stress
As in: <ESC>/+ mo:.'dERn <ESC>/+ (Adjektiv 'modern') vs. <ESC>/+ 'mo:.d$Rn <ESC>/+ (Verb 'modern')

Symbol: '2
Meaning: secondary word stress
As in: <ESC>/+ 'tE.nIs.'2Spi:.l$R <ESC>/+ (Tennisspieler)

Symbol: " (ASCII 34, Hex 22)
Meaning: sentence accent
As in: <ESC>/+Es_gIbt_"tsva&i_?ak."t&sEn.t$_?In_'di:.z$m_'zats <ESC>/+ (Es gibt ZWEI AKZENTE in diesem Satz.)

Symbol: .
Meaning: syllable boundary
As in: <ESC>/+ 'bu:x.Sta:.b$ <ESC>/+ (Buchstabe)

Symbol: #
Meaning: silence (pause)
As in: <ESC>/+ ?ER_"fRa:k.t$_#_vi:_"ge:t_?Es <ESC>/+ (Er fragte: wie geht es?)

Note that the use of punctuation marks remains useful within phonetic input to assure a correct intonation. Each punctuation mark needs to be preceded by an asterisk.


For example:

<ESC>/+"vIl.kO.m$n*,_"kO.m$n_zi:_hE."Ra&in*. <ESC>/+ (Willkommen, kommen Sie herein.)

Punctuation Marks

L&H+ Symbol   Meaning
_             Word delimiter
*.            End of declarative
*,            Comma
*!            End of exclamation
*?            End of question
*;            Semicolon
*:            Colon
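The rules for phonetic input strings can be sketched in a few lines. The helper below is hypothetical and not part of the SDK. It joins word transcriptions with an underscore, as the examples in this section do, prefixes each punctuation mark with an asterisk, and wraps the result in the `<ESC>/+` switch.

```python
ESC = "\x1b"  # the escape character written as <ESC> in this guide
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def phonetic_input(tokens) -> str:
    """Build an L&H+ phonetic input string from a list of tokens.

    Each token is either a word transcription or a single punctuation
    mark. Words are joined with '_'; punctuation is attached to the
    preceding word and preceded by '*'. Hypothetical helper.
    """
    body = ""
    for token in tokens:
        if token in PUNCTUATION:
            body += "*" + token
        else:
            if body:
                body += "_"
            body += token
    # The same sequence switches phonetic mode on and off again.
    return f"{ESC}/+{body}{ESC}/+"


# "Willkommen, kommen Sie herein."
print(repr(phonetic_input(['"vIl.kO.m$n', ",", '"kO.m$n', "zi:", 'hE."Ra&in', "."])))
```

Only a token that consists of a punctuation mark alone is treated as punctuation, so the syllable boundary `.` inside a word transcription is left untouched.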

Lexical stress and sentence accents in phonetic input

In phonetic input strings, lexical stress and sentence accents can be indicated manually by the user, by using a single quote (') or double quote (") respectively.

Note that manually inserted lexical stress or sentence accents may have no effect in RealSpeak. The RealSpeak synthesis module may indeed have reasons to override the requested stress/accent and thus not realize it.

1. The Text-To-Speech system will automatically convert all lexical stress marks into sentence accents in case no manually added sentence accents are found in the phonetic input string.

Example:

<ESC>/+IC_'ha:.b$_zi:_'hO&y.t$_nOx_g$.'SpRO.x$n*.<ESC>/+

is the same as

<ESC>/+IC_"ha:.b$_zi:_"hO&y.t$_nOx_g$."SpRO.x$n*. <ESC>/+

(Ich habe sie heute noch gesprochen.)

2. If phonetic input contains at least one manually added sentence accent, no additional sentence accents are assigned by the Text-To-Speech system. Therefore, only those words marked with " will get a sentence accent. As a consequence, a message containing only one manual sentence accent will have an almost flat intonation on all the other words.

Example:

<ESC>/+IC_'ha:.b$_zi:_"hO&y.t$_nOx_g$.'SpRO.x$n*. <ESC>/+

(Only one sentence accent will be realized.)

3. Phonetic input can also be combined with orthographic input. If no sentence accents are found in the input text (indicated by <ESC>" in orthographic input, or by " in phonetic input), the Text-To-Speech system will automatically assign sentence accents. In the orthographic part of the input, the Text-To-Speech system will realize these sentence accents on the basis of part-of-speech and syntactic information. In the phonetic part of the input, all lexical stress marks (if any) will be converted into sentence accents. If there are no lexical stress marks, no sentence accent will be realized for the phonetic part of the input (see point 1 above).

If the user has manually specified one or more sentence accents, no additional sentence accents will be realized (see point 2 above).

For example:

Er hat heute noch mit <ESC>/+'klIn.t$n <ESC>/+ gesprochen.

(No sentence accents are found; the Text-To-Speech system will automatically assign sentence accents.)

Er hat heute noch mit <ESC>/+"klIn.t$n <ESC>/+ gesprochen.

(A sentence accent is specified in the phonetic part of the input text. No additional sentence accents will be realized.)

Er hat <ESC>"heute noch mit <ESC>/+'klIn.t$n <ESC>/+ gesprochen.

(A sentence accent is specified in the orthographic part of the input text. No additional sentence accents will be realized. Hence, the lexical stress that is specified in the phonetic part will NOT be converted into a sentence accent.)
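Rules 1 and 2 for purely phonetic input can be modelled with a small function. This is only a sketch of the described behaviour, not the engine's implementation: the function name is hypothetical, and leaving the secondary stress mark `'2` untouched is an assumption, since the text only speaks of "lexical stress marks" in general.

```python
import re


def effective_accents(phonetic: str) -> str:
    """Model of rules 1 and 2 for a purely phonetic input string.

    If the string already contains a sentence accent ("), it is returned
    unchanged (rule 2). Otherwise every primary stress mark (') becomes
    a sentence accent (rule 1). Assumption: '2 (secondary stress) is
    not converted.
    """
    if '"' in phonetic:
        return phonetic
    return re.sub(r"'(?!2)", '"', phonetic)


no_accent = "IC_'ha:.b$_zi:_'hO&y.t$_nOx_g$.'SpRO.x$n*."
one_accent = "IC_'ha:.b$_zi:_\"hO&y.t$_nOx_g$.'SpRO.x$n*."
print(effective_accents(no_accent))
print(effective_accents(one_accent))
```

Rule 3, where orthographic and phonetic input are mixed, also depends on part-of-speech and syntactic analysis and is not covered by this sketch.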

The German L&H+ & UNIPA Phonetic Alphabet

Vowels and Diphthongs

L&H+ Symbol  L&H+ Transcription  UNIPA Symbol  UNIPA Transcription  As in
a            'Stat               a             'Stat                Stadt
a:           'va:.g$n            a:            'va:.g$n             Wagen
E            'lEts.t$            E             'lEts.t$             Letzte
e:           'ke:.l$             e:            'ke:.l$              Kehle
I            'mIlC               I             'mIlC                Milch
i:           'Ri:.z$             i:            'Ri:.z$              Riese
O            'fOl                O             'fOl                 voll
o:           'gRo:s              o:            'gRo:s               groß
U            'kUnst              U             'kUnst               Kunst
u:           'fu:s               u:            'fu:s                Fuß
Y            'kYs.t$             Y             'kYs.t$              Küste
y:           'gRy:n              y:            'gRy:n               grün
E+           'lE+.S$n            E=            'lE=.S$n             löschen
e+           'Se+n               e=            'Se=n                schön
E:           'fE:.R$             E:            'fE:.R$              Fähre
$            'tas.t$             $             'tas.t$              Taste
a&u          'ba&um              a+u           'ba+um               Baum
O&y          'hO&y.t$            O+y           'hO+y.t$             heute
a&i          'ta&il              a+i           'ta+il               Teil
A%~          REs.to:.'RA%~       A%~           REs.to:.'RA%~        Restaurant
O%~          bal.'kO%~           O%~           bal.'kO%~            Balkon
E%~          'tE%~               E%~           'tE%~                Teint

Consonants

L&H+ Symbol  L&H+ Transcription  UNIPA Symbol  UNIPA Transcription  As in
p            'pOst               p             'pOst                Post
b            'ba&in              b             'ba+in               Bein
t            'tIn.t$             t             'tIn.t$              Tinte
d            'dIC                d             'dIC                 dich
k            'kla&in             k             'kla+in              klein
g            'li:.g$n            g             'li:.g$n             liegen
f            'fElt               f             'fElt                Feld
v            'vax                v             'vax                 wach
s            'fEls               s             'fEls                Fels
S            'Sne:               S             'Sne:                Schnee
z            'za:l               z             'za:l                Saal
Z            ZUR.'na:l           Z             ZUR.'na:l            Journal
C            'mIlC               C             'mIlC                Milch
x            'bax                x             'bax                 Bach
h            'hant               h             'hant                Hand
j            'je:.mant           j             'je:.mant            jemand
l            'lICt               l             'lICt                Licht
R            'Ra&i.z$            R             'Ra+i.z$             Reise
m            'man                m             'man                 Mann
n            'nOR.d$n            n             'nOR.d$n             Norden
nK           'RInK               nK            'RInK                Ring
?            b$.'?ax.t$n         ?             b$.'?ax.t$n          beachten
t&s          't&su:k             t+s           't+su:k              Zug
p&f          'p&fe:Rt            p+f           'p+fe:Rt             Pferd
t&S          't&SIl.p$n          t+S           't+SIl.p$n           tschilpen

NOTE

Note that the L&H+ alphabet is not SSML compliant. For SSML, use the UNIPA alphabet.
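In the vowel and consonant tables, the UNIPA column differs from the L&H+ column in only two ways: `+` becomes `=` (as in E+ and E=) and `&` becomes `+` (as in a&u and a+u). On that basis, a transcription can be converted as sketched below. The function name is hypothetical, and the mapping is inferred solely from the rows in those tables, so symbols not listed there may need different handling.

```python
def lhplus_to_unipa(transcription: str) -> str:
    """Convert an L&H+ transcription to UNIPA (inferred mapping).

    The order matters: '+' must be rewritten to '=' before '&' is
    rewritten to '+', otherwise the new '+' signs would be changed too.
    """
    return transcription.replace("+", "=").replace("&", "+")


for word in ("'ba&um", "'lE+.S$n", "'t&su:k", "'Stat"):
    print(word, "->", lhplus_to_unipa(word))
```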

Using a User Dictionary

For information on how to create and use user dictionaries, please refer to the “User Configuration” chapter in the RealSpeak Telecom Programmer’s Guide.

Using the Microsoft SAPI5 Lexicon

Microsoft SAPI5 provides lexicons so that users and applications can specify pronunciation and part of speech information for particular words. As such, all SAPI compliant Text-To-Speech engines should use these lexicons to guarantee uniformity of pronunciation and part of speech information.

There are two types of lexicons in SAPI: user lexicons and application lexicons.

User Lexicons

Each user who logs into a computer will have a User Lexicon. Initially, this lexicon is empty; words can be added either programmatically, or by using an engine's add/remove words UI component (for example, the sample application Dictation Pad provides an Add/Remove Words dialog).

Application Lexicons

Applications can create and ship their own lexicons of specialized words. These lexicons are fixed and cannot be edited.

Detailed information on how to use the MS SAPI5 lexicons can be found in the manual “Microsoft Speech SDK V5.1”, chapter “ISpLexicon Interface”.



NOTE

Note that the Microsoft Speech SDK V5.1 only provides a phoneme set for American English.

The German SAPI5 phoneme set mentioned below has been developed by ScanSoft, based on the symbols available for American English. The phoneme list below is therefore not to be considered as an official phoneme set defined by Microsoft SAPI5.

SAPI5 Symbols

SAPI Symbol  PhoneID   Example        SAPI Transcription
A            13        Satz           Z A TS
A :          13 12     Tat            T A : T
AW           14        Haus           H AW S
AX           15        bitte          B IH T AX
AX           15        besser         B EH S AX R
AY           16        Eis            AY S
EH           20        Gesetz         G AX Z EH TS
EY           22        Beet, spät     B EY T , S P EY T
EH :         20 12     (Buchstabe) Ä  EH :
OE           34        plötzlich      P L OE TS L IH X
EU           21        blöd           B L EU T
IH           26        Sitz           Z IH TS
IY           27        lieb           L IY P
OH           35        Trotz          T R OH TS
OW           36        Boot           B OW T
OY           37        Kreuz          K R OY TS
UY           48        hübsch         H UY P SH
UE           45        süss (süß)     Z UE S
UH           46        Schutz         SH UH TS
UW           47        Blut           B L UW T
B            17        Bein           B AY N
D            18        Deich, dank    D AY X , D A NG K
CH           19        deutsch        D OY CH
F            23        fast           F A S T
G            24        Gunst          G UH N S T
H            25        Hand           H A N T
JH           28        dschungel      JH UH NG AX L
K            29        Kunst          K UH N S T
L            30        Leim           L AY M
M            31        mein           M AY N
N            32        nein           N AY N
NG           33        Ding           D IH NG
P            38        Pein           P AY N
PF           39        Pfahl          PF A : L
R            40        Reim           R AY M
S            41        Tasse          T A S AX
SH           42        waschen        V A SH AX N
T            43        Teich          T AY X
TS           44        Zahl           TS A : L
V            49        was            V A S
X            50        sicher         Z IH X AX R
X            50        Buch           B UW X
Y            51        Jahr           Y A : AX R
Z            52        Hase           H A : Z AX
ZH           53        Genie          ZH EY N IY
IY AX R      27 15     Tier           T IY AX R
IH AX R      26 15     Wirt           V IH AX R T
UE AX R      45 15     Tür            T UE AX R
UY AX R      48 15     Türke          T UY AX R K AX
EY AX R      22 15     schwer         SH W EY AX R
EH AX R      20 15     Berg           B EH AX R K
EH : AX R    20 12 15  Bär            B EH : AX R
EU AX R      21 15     Föhr           F EU AX R
OE AX R      34 15     Wörter         V OE AX T AX R
A : AX R     13 12 15  Haar           H A : AX R
A AX R       13 15     hart           H A AX R T
UW AX R      47 15     Kur            K UW AX R
UH AX R      46 15     kurz           K UH AX R TS
OW AX R      36 15     Ohr            OW AX R
OH AX R      35 15     dort           D OH AX R T

SAPI5 Symbol  SAPI PhoneID  Meaning                                  SAPI Transcription
-             1             syllable boundary (hyphen)               B IH - T AX
!             2             Sentence terminator (exclamation mark)   B IH T AX !
&             3             word boundary                            AX RB IH T AX & B EH S
,             4             Sentence terminator (comma)              AX RB IH T AX , _ B EH S
.             5             Sentence terminator (period)             B IH T AX .
?             6             Sentence terminator (question mark)      AY S ?
_             7             Silence (underscore)                     AX RB IH T AX , _ B EH S
1             9             primary stress                           1 B IH - T AX
2             10            secondary stress
~             11            nasalization
:             12            lengthen
^             8             Verein                                   F EH AX R ^ AY N
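The PhoneID column can be used to turn a space-separated SAPI transcription into a list of numeric phone IDs. The sketch below copies the single-symbol IDs from the tables above; it is a plain lookup written for illustration and does not call any SAPI interface. The names `PHONE_IDS` and `to_phone_ids` are hypothetical.

```python
# Symbol -> PhoneID, copied from the German SAPI5 symbol tables.
PHONE_IDS = {
    "-": 1, "!": 2, "&": 3, ",": 4, ".": 5, "?": 6, "_": 7, "^": 8,
    "1": 9, "2": 10, "~": 11, ":": 12,
    "A": 13, "AW": 14, "AX": 15, "AY": 16, "B": 17, "D": 18, "CH": 19,
    "EH": 20, "EU": 21, "EY": 22, "F": 23, "G": 24, "H": 25, "IH": 26,
    "IY": 27, "JH": 28, "K": 29, "L": 30, "M": 31, "N": 32, "NG": 33,
    "OE": 34, "OH": 35, "OW": 36, "OY": 37, "P": 38, "PF": 39, "R": 40,
    "S": 41, "SH": 42, "T": 43, "TS": 44, "UE": 45, "UH": 46, "UW": 47,
    "UY": 48, "V": 49, "X": 50, "Y": 51, "Z": 52, "ZH": 53,
}


def to_phone_ids(transcription: str) -> list:
    """Map a space-separated SAPI transcription to its phone IDs.

    Raises KeyError for a symbol that is not in the table.
    """
    return [PHONE_IDS[symbol] for symbol in transcription.split()]


print(to_phone_ids("Z A TS"))    # Satz
print(to_phone_ids("T A : T"))   # Tat
```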

Notes on the German Text-To-Speech System

The German Text-To-Speech system is designed to correctly pronounce any input written according to the rules of German orthography. The following cases, however, require special attention.

Cardinal Numbers

Cardinal numbers up to 15 digits are pronounced as full numbers. Periods may be used to separate groups of digits.

For example: 6230

or

6.230

Decimal Numbers

Decimal numbers may consist of up to 15 digits before or after the comma. Periods may be used to separate groups of digits in the string before the comma. The digits after the comma are pronounced one by one. For example: 9550,5 9.550,5
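Numbers that an application generates can be written in the notation described here before they are sent to the engine. The following sketch uses hypothetical helper names, groups the digits before the comma with periods and enforces the 15-digit limit. It only formats text and says nothing about how the engine parses it.

```python
MAX_DIGITS = 15  # limit stated for cardinal and decimal numbers


def format_cardinal(number: int) -> str:
    """Write a non-negative integer with periods between digit groups."""
    if number < 0:
        raise ValueError("only non-negative numbers are handled here")
    if len(str(number)) > MAX_DIGITS:
        raise ValueError("more than 15 digits")
    # Python groups with commas; German notation uses periods instead.
    return f"{number:,}".replace(",", ".")


def format_decimal(integer_part: int, fraction_digits: str) -> str:
    """Write a decimal number with a comma before the fraction digits."""
    if not fraction_digits.isdigit() or len(fraction_digits) > MAX_DIGITS:
        raise ValueError("fraction must be 1 to 15 digits")
    return f"{format_cardinal(integer_part)},{fraction_digits}"


print(format_cardinal(6230))
print(format_decimal(9550, "5"))
```

The fraction is passed as a string so that leading zeros such as "05" are preserved.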



NOTE

Numerals that are normally pronounced as full numbers, can also be pronounced digit by digit by using the control sequence


Ordinal Numbers

A cardinal number smaller than 32 followed by a period is pronounced as an ordinal number if it is not in a sentence initial or sentence final position.

For example: am 15. Mai 1998

Roman Numbers

Roman numbers smaller than 10 that cannot be interpreted as single letters are pronounced as full numbers (i.e. II, III, IV, VI, VII, VIII, IX).

For example: Abteilung IV Garzweiler II

Telephone Numbers

In order to ensure a correct pronunciation of telephone numbers, it is recommended to use parentheses to separate country code and/or area code from the remainder of the telephone number. Also, use spaces to separate groups of digits. Telephone numbers written in this format will always be pronounced in groups of two or three digits, with a pause at the place of the space.

For example: (041) 317 11 33 (03 35) 23 02 16

Bank Account Numbers

To have a bank account number correctly pronounced (in groups of 2 or 3 digits), use hyphens between groups of digits. To have the number pronounced digit by digit, switch to spell mode (<ESC>\tn=spell\).


LG Stuttgart 303-504-52 BLZ 900-563-78

Dates

Dates can be written as structured groups of digits, separated by periods or slashes.

Day (1 or 2 digits)/Month (1 or 2 digits)/Year (2 digits)
Day (1 or 2 digits).Month (1 or 2 digits).Year (2 or 4 digits)

For example: 17/12/2003 9.9.2001 01.02.97 am 10.07.99

Time Indications

Time indications will be pronounced correctly when written in one of the following formats:

9:15 09:15 Uhr 4.00 Uhr 22.15 Uhr

Currencies

The German Text-To-Speech system correctly handles the German currency indication DM when written in one of the following ways:

DM 40
50 DM
DM 10,-

The Austrian, Swiss and American currencies are handled correctly:

19 Sfr
130 ÖS.
5 $

15 Franken
10 Gulden

Currencies up to 15 digits will be correctly pronounced. Periods may be used to separate groups of digits.

For example: 250.850.990 DM 250850990 DM

Decimal digits in combination with currency indications are also supported. Decimal currency amounts up to 15 digits will be pronounced correctly.

For example: 1999,50 DM 1.999,50 DM

Abbreviations and Acronyms

The German Text-To-Speech system contains a dictionary with the most common abbreviations and acronyms, such as:

bzw.: beziehungsweise MwSt.: Mehrwertsteuer usw.: und so weiter

Some abbreviations are not case-sensitive: uppercase and lowercase are both accepted. Examples of case-sensitive abbreviations are:

Hrsg.: Herausgeber hrsg.: Herausgegeben So.: Sonntag

Abbreviations that are NOT in the dictionary:

• will be spelled if they consist of consonants only (with or without punctuation)
• will be spelled if the abbreviation contains one or more vowels, separated by periods
• will be spoken as full words if the abbreviation contains one or more vowels and is not separated by periods
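The three rules for abbreviations that are not in the dictionary can be expressed as a small decision function. This is a sketch under stated assumptions, not the engine's logic: the vowel set, the choice to ignore a trailing period when testing for "separated by periods", and the function name are all hypothetical, and the sample abbreviations are chosen for illustration only.

```python
VOWELS = set("aeiouäöüAEIOUÄÖÜ")  # assumption: the vowels considered


def reading_of_unknown_abbreviation(abbreviation: str) -> str:
    """Return 'spelled' or 'spoken as word' for an unknown abbreviation."""
    has_vowel = any(char in VOWELS for char in abbreviation)
    if not has_vowel:
        return "spelled"          # consonants only
    # Assumption: only periods between letters count as separators.
    if "." in abbreviation.rstrip("."):
        return "spelled"          # vowels, separated by periods
    return "spoken as word"       # vowels, no separating periods


for sample in ("BMW", "o.A.", "Nato"):
    print(sample, "->", reading_of_unknown_abbreviation(sample))
```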

Chapter II

E-Mail Preprocessor


Introduction

The ScanSoft e-mail preprocessor (EMPP) was developed to analyze a specific type of text, namely e-mail messages. E-mail messages differ from ordinary text in both their structure and contents. An e-mail message consists of two clearly distinguishable parts: the header and the body. A substantial part of the header contains routing and administrative information, which is irrelevant to the user. Both the header and the body contain all kinds of e-mail specific text features, e.g. e-mail addresses, emoticons such as smileys, etc. Furthermore, informal writing is often combined with a lack of grammatical conventions. Spelling rules are frequently violated, punctuation is often omitted, etc.

Although the standard ScanSoft Text-To-Speech system can handle special text items (abbreviations, numbers, dates, etc.), it is not capable of correctly handling all e-mail specific text features. These text features are therefore dealt with by the e-mail preprocessor. The EMPP transforms e-mail specific information into a format that complies with the rules of the standard ScanSoft Text-To-Speech system. The EMPP is a plug-in preprocessing module of the ScanSoft Text-To-Speech system. It replaces the preprocessor of the standard Text-To-Speech system.

In the following sections you will find a description of the functioning of the ScanSoft e-mail preprocessor as well as an overview of its features.

The e-mail preprocessor has two main tasks: processing of the e-mail header and processing of the body of the e-mail message.

The input to the EMPP consists of one or more e-mail messages. In order to process the e-mail header, the EMPP extracts relevant header fields and then provides an intelligent header field reading.

During the processing of the e-mail body, the text is divided into smaller text units, called text-to-speech messages, which are synthesized by the Text-To-Speech system. Text normalization is applied to e-mail specific text features such as e-mail addresses, proper names, emoticons, URLs (Uniform Resource Locators), etc. For the text normalization of an e-mail message, the ScanSoft EMPP applies linguistic rules and performs dictionary look-up, in order to yield an adequate phonetic transcription. The EMPP also supports the ScanSoft user dictionary mechanism, which allows the user to customize the output of the e-mail processing.

E-Mail Header Processing

Header Field Extraction

An e-mail message consists of two clearly distinguishable parts: the header and the body. The EMPP detects the header and extracts the relevant header fields. Information that is of no interest to the user (such as routing information) is not retained.

The EMPP extracts the following header fields:

From Field      Contains the sender’s name and/or address
Date Field      Contains the date and time of sending
Subject Field   Optionally contains the subject of the e-mail

The extraction of the header fields is based on the detection of specific keywords in the e-mail header. The supported keywords for the extraction of the header fields are listed below:

From Field      From: Author: Sender: De: Von:
Date Field      Date: Enviado: Gesendet:
Subject Field   Subject: Subj: Asunto: Betreff:

The following is an example of header field extraction. The original header holds information that is irrelevant to the user. After extraction of date, sender and subject, the processed header merely mentions the Date field, the From field and the Subject field:

Original header:

Path: news.be.innet.net!INbe.net!news.nl.innet.net!INnl.net!hunter.premier.net!www.nntp.primenet.com!nntp.primenet.com!feed1.news.erols.com!howland.erols.net!news.sprintlink.net!news-peer.sprintlink.net!uunet!in3.uu.net!01-newsfeed.univie.ac.at!02-newsfeed.univie.ac.at!news.ecrc.de!news00.btx.dtag.de!not-for-mail
From: [email protected] (Ulrike Noska)
Newsgroups: de.etc.sprache.deutsch
Subject: Re: Neue Wörter
Date: 19 Oct 1996 20:08:38 GMT
Organization: Telekom Online Internet Gateway
Lines: 10
Message-ID: <[email protected]>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Sender: [email protected] (Ulrike Noska)
X-Mailer: Mozilla 2.01DT [de] (Win16; I)

Extracted header fields:

From: [email protected] (Ulrike Noska)
Subject: Re: Neue Wörter
Date: 19 Oct 1996 20:08:38 GMT
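Keyword-based header field extraction can be sketched as follows. The code is an illustration, not the EMPP implementation: the function name is hypothetical, and it assumes one field per line, an exact match of the keyword at the start of the line, and that the first occurrence of a field wins. The sample sender is a made-up placeholder.

```python
FIELD_KEYWORDS = {
    "From": ("From:", "Author:", "Sender:", "De:", "Von:"),
    "Date": ("Date:", "Enviado:", "Gesendet:"),
    "Subject": ("Subject:", "Subj:", "Asunto:", "Betreff:"),
}


def extract_header_fields(header: str) -> dict:
    """Return the From, Date and Subject values found in a header.

    Lines that start with none of the supported keywords (routing
    information, X- fields, and so on) are dropped.
    """
    fields = {}
    for line in header.splitlines():
        for field, keywords in FIELD_KEYWORDS.items():
            if field in fields:
                continue  # keep the first occurrence only (assumption)
            for keyword in keywords:
                if line.startswith(keyword):
                    fields[field] = line[len(keyword):].strip()
                    break
    return fields


sample_header = "\n".join([
    "Path: news.example.org!not-for-mail",
    "From: Ulrike Noska",
    "Newsgroups: de.etc.sprache.deutsch",
    "Subject: Re: Neue Wörter",
    "Date: 19 Oct 1996 20:08:38 GMT",
    "X-Sender: Ulrike Noska",
])
print(extract_header_fields(sample_header))
```

Because the match is anchored at the start of the line, a field such as `X-Sender:` is not mistaken for the `Sender:` keyword.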

Header Field Reading

After the header fields have been extracted, they are processed by the EMPP. The header field keywords (see above) are replaced by an introductory message. The remainder of each header field is processed by the EMPP so that the Text-To-Speech system can read the field intelligently.

From Field

The From field keyword is replaced by the introductory message “Nachricht von:”.

For example:

Author: Thomas Kärtner

is pronounced:

Nachricht von: Thomas Kärtner

The remainder of the From field is further processed by the EMPP. The EMPP supports From fields that consist of

a) a proper name
b) a proper name and an address
c) an address

a) - b) In case the From field contains a proper name, this name and only this name is sent to the Text-To-Speech system. This means that if both a name and an address are found in the From field, the address will not be read by the Text-To-Speech system.

For example:

From: Alex Wunneburger
From: "Jo Hans Kessler" <[email protected]>
From: Peter Kuppelwieser at IepXchgPO
From: Udo Kohl/LHS/IEP/BE

are pronounced:

Nachricht von: Alex Wunneburger
Nachricht von: Jo Hans Kessler
Nachricht von: Peter Kuppelwieser
Nachricht von: Udo Kohl
Nachricht von: Heiko Felsmann

c) In case the From field contains only an address, the EMPP extracts the name out of the address and expands the domain that is contained in the address. In other words, the e-mail address is not read literally.

For example:

Author: [email protected]
Author: wien.at!kunze

are pronounced:

Nachricht von: thomas b at Bundestag punkt d e
Nachricht von: kunze at Wien punkt a t

Date Field

The Date field keyword is replaced by the introductory message “Datum:”.

The Date field contains the date and time of sending. The EMPP supports multiple date and time formats, which are transformed to a uniform format that complies with the rules for date and time indications of the ScanSoft Text-To-Speech system. The EMPP only pronounces the date.

The EMPP supports dates in the following formats:

For example:

Date: 12/23/2002 21:15 PM
Date: 13 Mar 1996 06:50 AM

are pronounced:

Datum: 23. Dezember 2002
Datum: 13. März 1996
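The two Date field examples can be reproduced with a short function. It is a sketch covering only the two formats shown in the examples (month/day/year with slashes, and day, English month abbreviation, year). Other formats the EMPP may accept are not listed in this section and are not handled. The function name is hypothetical.

```python
import re

MONTHS_DE = [
    "Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
    "August", "September", "Oktober", "November", "Dezember",
]
MONTH_ABBREVIATIONS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def read_date_field(value: str) -> str:
    """Turn the value of a Date field into 'Datum: <day>. <month> <year>'.

    The time of day is ignored, since only the date is pronounced.
    """
    match = re.match(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    if match:
        month, day, year = (int(part) for part in match.groups())
    else:
        match = re.match(r"(\d{1,2}) ([A-Za-z]{3}) (\d{4})", value)
        if not match:
            raise ValueError(f"unsupported date format: {value!r}")
        day = int(match.group(1))
        month = MONTH_ABBREVIATIONS[match.group(2).lower()]
        year = int(match.group(3))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in: {value!r}")
    return f"Datum: {day}. {MONTHS_DE[month - 1]} {year}"


print(read_date_field("12/23/2002 21:15 PM"))
print(read_date_field("13 Mar 1996 06:50 AM"))
```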

Subject Field

The Subject field keyword is replaced by the introductory message “Betreff:”.

The Subject field can contain all kinds of data, but may also be empty. The EMPP searches for keywords that are typical of the subject field (e.g. RE, FYI, FW).

For example:

Subject: RE: Wer holt mich ab?
Subject: FYI: Stundenplan fuer die naechsten Monate
Subject: FW: Ich habe es gefunden!!!!

are pronounced:

Betreff: Antwort auf: Wer holt mich ab?
Betreff: zu Ihrem Interesse: Stundenplan fuer die naechsten Monate
Betreff: weitergeleitete Nachricht: Ich habe es gefunden!!!!
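The Subject field examples amount to a keyword lookup. The sketch below covers only the three keywords shown (RE, FYI, FW). Matching them case-insensitively is an assumption, and the function name is hypothetical.

```python
SUBJECT_KEYWORDS = {
    "RE": "Antwort auf",
    "FYI": "zu Ihrem Interesse",
    "FW": "weitergeleitete Nachricht",
}


def read_subject_field(value: str) -> str:
    """Expand a leading subject keyword and prepend 'Betreff:'."""
    prefix, separator, rest = value.partition(":")
    keyword = prefix.strip().upper()
    if separator and keyword in SUBJECT_KEYWORDS:
        return f"Betreff: {SUBJECT_KEYWORDS[keyword]}: {rest.strip()}"
    return f"Betreff: {value.strip()}"


print(read_subject_field("RE: Wer holt mich ab?"))
print(read_subject_field("FW: Ich habe es gefunden!!!!"))
```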

E-Mail body processing

Message Extraction

The e-mail preprocessor splits the body of the e-mail message into text-to-speech messages. This is done on the basis of a number of criteria, such as punctuation, capitalization, layout, intelligent abbreviation handling, etc.

The following examples illustrate some criteria for splitting the e-mail text into text-to-speech messages:

 Using sentence final punctuation and capital letters

Schreiben Sie mir bitte, wenn Sie noch Fragen haben. Meine E-mail-Adresse ist [email protected]

 Using layout

Themen der Versammlung: 1) Neue Kollegen

2) Organisation des Umzugs 3) Colloquium in Berlin

 Using intelligent abbreviation handling


Text Normalization

An e-mail message typically contains e-mail specific text features, such as e-mail addresses, URLs, file names, emoticons, etc. The EMPP transforms these e-mail specific features into a format that complies with the rules of the standard text normalization of the ScanSoft Text-To-Speech system.

The following are examples of e-mail specific text normalization:

• Support for multiple e-mail address formats

[email protected]

• Support for URLs (Uniform Resource Locators)

http://www.norwegen.org
http://www.chemie.fu-berlin.de/adressen
gopher://viertel.com/1

• Support for file names

ldb001.tse
sysinfo.exe
lipedu.xls

• Processing of emoticons

:-) is pronounced: haha
:-x is pronounced: Nicht weitersagen!
:-( is pronounced: och

• Processing of overuse of punctuation

PASS AUF!!!!!!!: INTERNET VIRUS!!!!!!!!

Ich habe immer noch Probleme! #%&#@$<! Kannst Du mir helfen?

becomes:

PASS AUF! INTERNET VIRUS!


• Normalization of layout lines (e.g. part of an e-mail signature); not active when in spell mode.

These sequences of identical characters are not pronounced:

o 10 or more identical digits
o a word consisting of 5 or more identical US-ASCII encoded letters of the modern Latin alphabet
o a sequence of 3 or more identical US-ASCII characters that are not letters, digits, sentence-final punctuation marks (.?!) or white space; e.g. '&', '#', '%', '*', '-'

For example:

oooooooooooooooooooooooooooooo
---
will be removed.
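The three rules for layout lines translate fairly directly into a regular expression. The Python sketch below is one possible reading of those rules, not the EMPP's own code; the names `LAYOUT_RUN` and `strip_layout_runs` are hypothetical, and the spell-mode exception is not modelled.

```python
import re
import string

# US-ASCII punctuation without the sentence-final marks . ? !
SYMBOLS = "".join(c for c in string.punctuation if c not in ".?!")

LAYOUT_RUN = re.compile(
    r"(\d)\1{9,}"                          # 10 or more identical digits
    r"|\b([A-Za-z])\2{4,}\b"               # a word of 5 or more identical letters
    r"|([" + re.escape(SYMBOLS) + r"])\3{2,}"  # 3 or more identical symbols
)


def strip_layout_runs(text):
    """Remove runs of identical characters that only serve as layout."""
    return LAYOUT_RUN.sub("", text)
```

Runs of sentence-final punctuation such as "!!!" are deliberately left alone by this sketch, since the rules above exclude them; the overuse-of-punctuation step described earlier deals with those.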


• Processing of Question/Answer (FAQ)

Q. Ich habe eine neue E-mail-Adresse. Wie kann ich meine EcoLink mails weiter empfangen?

A. Schick einfach eine E-mail mit Deiner alten und Deiner neuen Adresse an [email protected].

becomes:

Frage:

Ich habe eine neue E-mail-Adresse. Wie kann ich meine EcoLink mails weiter empfangen?

Antwort:

Schick einfach eine E-mail mit Deiner alten und Deiner neuen Adresse an [email protected].
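As a rough illustration of the Question/Answer rewrite, the following Python sketch replaces a leading "Q." or "A." marker with its German label on a line of its own. The function name `expand_faq_markers` is hypothetical, and recognizing the markers only at the start of a line is an assumption.

```python
import re

FAQ_LABELS = {"Q": "Frage:", "A": "Antwort:"}


def expand_faq_markers(text):
    """Replace line-initial 'Q.' / 'A.' markers by 'Frage:' / 'Antwort:'."""
    return re.sub(
        r"(?m)^(Q|A)\.\s+",
        lambda match: FAQ_LABELS[match.group(1)] + "\n",
        text,
    )
```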

• Processing of inserted mail

Ralf> Na, ich hab' mir 'ne neue Stereoanlage
Ralf>gekauft und bin ziemlich
Ralf> enttäuscht. Die klingt zwar nicht schlecht,
Ralf> aber die Lautstärke-
Ralf> regelung wirkt nicht. Ich weiss nicht, was
Ralf>ich damit machen soll.

Heide> Bring sie gleich zurück und tausch sie um!

becomes:

Ralf:

Na, ich hab' mir 'ne neue Stereoanlage gekauft und bin ziemlich enttäuscht. Die klingt zwar nicht schlecht, aber die Lautstärke-regelung wirkt nicht. Ich weiss nicht, was ich damit machen soll.

Heide:

Bring sie gleich zurück und tausch sie um!
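The regrouping of quoted lines by author can be sketched in Python as below. This is an illustration under assumptions, not the EMPP algorithm: the quote prefix is taken to be a single word followed by ">", lines without such a prefix are skipped, and a line ending in a hyphen is joined to the next one without a space (which reproduces "Lautstärke-regelung" in the example). The name `merge_quoted_lines` is hypothetical.

```python
import re

# Assumed prefix shape: optional spaces, one word, '>', then the quoted text.
QUOTE_PREFIX = re.compile(r"^\s*(\w+)>\s*(.*)$")


def merge_quoted_lines(lines):
    """Group consecutive quoted lines by author and join their text."""
    blocks = []  # each entry is [author, text]
    for line in lines:
        match = QUOTE_PREFIX.match(line)
        if match is None:
            continue  # this sketch only handles prefixed lines
        author, text = match.group(1), match.group(2).strip()
        if blocks and blocks[-1][0] == author:
            # No space after a trailing hyphen, one space otherwise.
            separator = "" if blocks[-1][1].endswith("-") else " "
            blocks[-1][1] += separator + text
        else:
            blocks.append([author, text])
    return [(author, text) for author, text in blocks]
```

Each returned pair would then be announced as the author name followed by a colon and the merged text.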


Language Specific Text Normalization

Umlaut

The L&H E-mail preprocessor for German is able to convert umlauted characters. In e-mail messages, umlauts are written in several ways:

über ueber gelähmt

All above-mentioned ways of writing the umlaut are supported by the L&H e-mail preprocessor.

English Words

Since e-mail is an international medium, German e-mail messages will inevitably contain many English words, which might refer to the Internet, electronic mail, software or hardware. The typical e-mail jargon is handled by the exceptions dictionary of the e-mail preprocessor. This dictionary is a lexicon for e-mail terminology and provides the Text-To-Speech system with an adequate German transcription for a number of English words.

asciitext /+'as.ki:.tEkst
attachment /+$.'tEt&S.m$nt
banner /+'bE.n$R
forwarded /+'fo:R.vaR.d$t
frame /+'fRe:m
hypertext /+'ha&i.p$R.tEkst
iexplorer /+'?In.t$R.nEt.Eks.'plo:R$R
image /+'?I.m$t&S


Customizing the E-Mail Preprocessor

The e-mail preprocessor supports the standard ScanSoft Text-To-Speech SDK user dictionary mechanism, which allows the user to customize the output of the e-mail preprocessor. The user dictionary is consulted both during the header processing and the body processing.

For more information on how to build and use user dictionaries, see the “User Configuration” chapter of the Programmer’s Guide.

Customizing the E-Mail Header

The user dictionary is consulted during the header processing while reading the From field and the Subject field.

From Field

The From field consists of either:

a) a proper name
b) a proper name and an address
c) an address

a) In case the From field contains only a proper name, the name is passed to the user dictionary. If the lookup is successful, the proper name is substituted by the replacement string. If not, the name is further processed by the header reading module.

For example:

If the user dictionary contains the following line:

Johann /+'jo:.han

the following From field:

From: Johann Strauss

Becomes:


b) In case the From field contains a proper name and an address, the EMPP first passes the address to the user dictionary. If the lookup is successful, both the proper name and the address are substituted by the replacement string. If not, the EMPP passes the proper name to the user dictionary. If this lookup is successful, the name and the address are substituted by the replacement string. If not, the name is further processed by the header reading module. The address will not be read by the Text-To-Speech system.

For example:

If the user dictionary contains the following lines:

[email protected] Peter, vom Tennis
Heinz Mein bester Freund
Schreve /+ 'sCRe:.v$

the following From fields:

From: "Alex Van Schreve" <[email protected]>
Author: [email protected] (Heinz)
From: [email protected] (P. Schmidt)

become:

Nachricht von: Alex Van */+ 'sCRe:.v$*/+
Nachricht von: Mein bester Freund
Nachricht von: Peter, vom Tennis

c) In case the From field contains only an address, the complete address is looked up in the user dictionary. If the lookup is successful, a proper name is added to the From field. If not, only the domain part is sent to the user dictionary. The EMPP first calls the dictionary for the complete domain part. If the lookup is successful, the complete domain part is substituted by the replacement string. Otherwise, the EMPP cuts off the leftmost sublevel domain and repeats the lookup and matching procedures for the remainder of the domain part. If the lookup is successful, the remainder of the domain part is substituted by the replacement string. This procedure is repeated until the top level domain is encountered. If none of the lookups is successful, the address is further processed by the header reading module.


For example:

If the e-mail user dictionary contains the following lines:

[email protected] Mein bester Freund
postbank.de Deutsche Postbank AG

the following From fields:

Sender: [email protected]
From: [email protected]

become:

Nachricht von: Mein bester Freund

Nachricht von: anja gihl at Deutsche Postbank AG
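The lookup order described under c) can be sketched in Python as follows. The dictionary is modelled as a plain `dict`, the function name `find_replacement` and the addresses in the usage note are hypothetical, and stopping before the bare top level domain is one reading of "until the top level domain is encountered".

```python
def find_replacement(address, user_dict):
    """Return (matched_key, replacement) for a bare From address, or None.

    Lookup order: the complete address first, then the complete domain,
    then the domain with the leftmost sublevel removed, and so on.
    """
    if address in user_dict:
        return address, user_dict[address]
    _, _, domain = address.partition("@")
    labels = domain.split(".")
    # Assumption: the top level domain on its own is never looked up.
    for start in range(len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in user_dict:
            return candidate, user_dict[candidate]
    return None
```

With a dictionary containing `"postbank.example"`, an address such as `"anja.gihl@mail.hq.postbank.example"` is matched after two sublevels have been cut off. When the function returns None, the address would be passed on to the header reading module.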



NOTE

To allow a correct processing of the From field, the replacement string in the user dictionary should not contain an address or a domain.

Subject Field

Every word in the Subject field is sent to the user dictionary. If the lookup is successful, the replacement string is sent directly to the Text-To-Speech system. If not, the Subject field is further processed by the header reading module.

For example:

If the user dictionary contains the following lines:

ECAI /+ ?e:.tse:.?a:.'?i:
IDT I D T

the following Subject fields:

Subject: ECAI Versammlung am Donnerstag
Subject: Windows und IDT

are pronounced:

Betreff: */+ ?e:.tse:.?a:.'?i:*/+ Versammlung am Donnerstag
Betreff: Windows und I D T

Customizing the E-Mail Body

When the user dictionary has been loaded, the EMPP will call the dictionary for every word of the e-mail body. If the word is found in the user dictionary, it is substituted by the replacement string. If not, the body is further processed by the e-mail body processing module.

For example:

If the user dictionary contains the following line:

IH : Iris Heller

The word "IH" in the following sentence:

Ich bleibe hier und IH wird die Kunden besuchen.

is replaced by the corresponding string found in the e-mail user dictionary:

Ich bleibe hier und Iris Heller wird die Kunden besuchen.
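A minimal Python sketch of the per-word lookup follows. It models the user dictionary as a plain `dict` and treats every run of word characters as one word; both choices, the case-sensitive match, and the function name `substitute_words` are assumptions made for illustration.

```python
import re


def substitute_words(text, user_dict):
    """Replace every word found in the user dictionary by its replacement."""
    def replace(match):
        word = match.group(0)
        # Words without an entry are passed through unchanged.
        return user_dict.get(word, word)

    return re.sub(r"\w+", replace, text)
```

Applied to the sentence above with the entry `{"IH": "Iris Heller"}`, the sketch returns "Ich bleibe hier und Iris Heller wird die Kunden besuchen."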


Chapter III

SSML Preprocessor


Introduction

SSML (Speech Synthesis Markup Language) is part of a set of markup specifications by the W3C for voice browsers.

General information regarding the RealSpeak SSML processor can be found in the SSML Support chapter of the Programmer’s Guide. The RealSpeak Telecom SDK provides a built-in preprocessor that supports a large portion of the SSML 1.0 September 2004 Recommendation (REC). Moreover, RealSpeak extends SSML with a number of ScanSoft-specific elements and attributes. The set supported by ScanSoft is called “ScanSoft SSML” (4SML).

The section below describes the language-specific SSML support included in the RealSpeak Telecom V4.0–German language version.

German specific SSML markup

XML encoding types for German

The encoding is specified in the XML text declaration ("<?xml… ?>") by the encoding declaration, which is of the form encoding="<EncodingName>".

E.g. <?xml version="1.0" encoding="UTF-8"?>

RealSpeak Telecom V4.0–German supports:

• “Windows-1252” and “ISO-8859-1” (ISO Latin1)

• The Unicode encodings “UTF-8”, “UTF-16” and “UCS-4” (note that the alias "ISO-10646-UCS-4" is not supported)

• Any coded character set supported by the ICU component, as long as the input text only contains characters that can be transcoded to the native coded character set, which is “Windows-1252”. For more information about the character sets supported by ICU, see the ICU website:

http://www-306.ibm.com/software/globalization/icu


NOTE

Encoding names are parsed case-insensitively; hyphens and underscores are ignored.
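The name matching rule in this note can be illustrated with a short Python sketch. It covers only the encodings named explicitly in the list above (not the additional character sets available through ICU), and the function names are hypothetical.

```python
# Encodings named explicitly in this section; ICU-provided sets are not modelled.
SUPPORTED_ENCODINGS = ["Windows-1252", "ISO-8859-1", "UTF-8", "UTF-16", "UCS-4"]


def normalize_encoding_name(name):
    """Lower-case the name and drop hyphens and underscores."""
    return name.lower().replace("-", "").replace("_", "")


_SUPPORTED_KEYS = {normalize_encoding_name(name) for name in SUPPORTED_ENCODINGS}


def is_listed_encoding(name):
    """True if the name matches one of the explicitly listed encodings."""
    return normalize_encoding_name(name) in _SUPPORTED_KEYS
```

Under this rule "utf_8", "UTF8" and "Utf-8" all refer to the same encoding, while the alias "ISO-10646-UCS-4" is not recognized.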

4SML Specifics for German

For reasons of compatibility with the ‘standard’ German system, the parallel text control sequence (<esc> sequence) is listed where applicable. In this way, similar TTS behavior can be obtained from, or combined with, non-SSML driven text input.

4SML Tags    Comment    Corresponding control sequence

High-level and document structure tags

xml:lang    Supported (‘de-DE’ for German). Attribute of speak, paragraph, sentence and voice.

Text normalization tags

<say-as interpret-as="xxx">    Supported; limited support in e-mail mode. In e-mail mode the only supported interpret-as value is “spell”.
<say-as interpret-as="number" format="cardinal">    Supported    <esc>\tn=number_cardinal\
<say-as interpret-as="number" format="digits">    Supported    <esc>\tn=number_digits\
<say-as interpret-as="number" format="decimal">    Supported    <esc>\tn=number_decimal\
<say-as interpret-as="number">    Supported    <esc>\tn=number\
<say-as interpret-as="number" format="ordinal">    Supported    <esc>\tn=number_ordinal\
<say-as interpret-as="number" format="telephone" detail="punctuation">    Supported    <esc>\tn=number_telephone_punctuation\
<say-as interpret-as="ordinal">    Supported    <esc>\tn=ordinal\
<say-as interpret-as="acronym">    Supported    <esc>\tn=acronym\
<say-as interpret-as="acronym" detail="strict">    Supported    <esc>\tn=acronym_strict\
<say-as interpret-as="measure">    Supported    <esc>\tn=measure\
<say-as interpret-as="letters">    Supported    <esc>\tn=letters\
<say-as interpret-as="letters" detail="strict">    Supported    <esc>\tn=letters_strict\
<say-as interpret-as="words">    Supported    <esc>\tn=words\
<say-as interpret-as="date">    Supported    <esc>\tn=date\
<say-as interpret-as="date" format="mdy">    Supported    <esc>\tn=date_mdy\
<say-as interpret-as="date" format="dmy">    Supported    <esc>\tn=date_dmy\
<say-as interpret-as="date" format="ymd">    Supported    <esc>\tn=date_ymd\
<say-as interpret-as="date" format="ym">    Supported    <esc>\tn=date_ym\
<say-as interpret-as="date" format="my">    Supported    <esc>\tn=date_my\
<say-as interpret-as="date" format="dm">    Supported    <esc>\tn=date_dm\
<say-as
<say-as interpret-as="date" format="m">    Supported    <esc>\tn=date_m\
<say-as interpret-as="date" format="d">    Supported    <esc>\tn=date_d\
<say-as interpret-as="time">    Supported    <esc>\tn=time\
<say-as interpret-as="time" format="h">    Supported    <esc>\tn=time_h\
<say-as interpret-as="time" format="hm">    Supported    <esc>\tn=time_hm\
<say-as interpret-as="time" format="hms">    Supported    <esc>\tn=time_hms\
<say-as interpret-as="duration" format="hms">    Supported    <esc>\tn=duration_hms\
<say-as interpret-as="duration" format="hm">    Supported    <esc>\tn=duration_hm\
<say-as interpret-as="duration" format="ms">    Supported    <esc>\tn=duration_ms\
<say-as interpret-as="duration" format="h">    Supported    <esc>\tn=duration_h\
<say-as interpret-as="duration" format="m">    Supported    <esc>\tn=duration_m\
<say-as interpret-as="duration" format="s">    Supported    <esc>\tn=duration_s\
<say-as interpret-as="duration">    Supported    <esc>\tn=duration\
<say-as interpret-as="currency">    Supported    <esc>\tn=currency\
<say-as interpret-as="telephone">    Supported    <esc>\tn=telephone\
<say-as interpret-as="telephone" detail="punctuation">
<say-as interpret-as="spell">    Supported    <esc>\tn=spell\
<say-as interpret-as="name">    Supported    <esc>\tn=name\
<say-as interpret-as="net" format="email">    Supported    <esc>\tn=net_email\
<say-as interpret-as="net" format="uri">    Supported    <esc>\tn=net_uri\
<say-as interpret-as="net">    Supported    <esc>\tn=net\

Pronunciation tags

<phoneme alphabet="unipa">    Supported. See section ‘The German L&H+ and UNIPA phonetic alphabets’ for an overview of the alphabet.
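The control sequences in the table appear to follow a regular naming pattern: the interpret-as value, then the format and detail values when present, joined by underscores. The Python sketch below encodes that observation. The pattern is inferred from the listed rows rather than stated by the guide, so rows that are not listed should not be assumed to exist; the function name `control_sequence` is hypothetical, and `<esc>` is kept as the literal placeholder used in the table.

```python
def control_sequence(interpret_as, fmt=None, detail=None):
    """Build the <esc>\\tn=...\\ text that parallels a <say-as> element.

    The naming pattern is inferred from the table rows, not from a
    documented rule.
    """
    parts = [interpret_as]
    if fmt:
        parts.append(fmt)
    if detail:
        parts.append(detail)
    # "<esc>" stands for the escape character, as in the table.
    return "<esc>\\tn=" + "_".join(parts) + "\\"
```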


Chapter IV

Custom G2P Dictionaries


Introduction

ScanSoft's RealSpeak system now offers support for custom G2P dictionaries. A custom G2P dictionary module is an add-on module specifically designed to improve the quality of pronunciation for specific kinds of words.

One example of a custom G2P dictionary module currently available from ScanSoft is the “propernames” module, which is described below. Check with ScanSoft for the availability of other custom G2P dictionary modules.

Proper names Custom G2P dictionary

The standard German RealSpeak system correctly pronounces a number of common proper names. However, given the complexity of the grapheme to phoneme conversion for proper names, a dedicated module is used to guarantee the same quality for proper names as for common words.

The “propernames” module contains a lexicon of around 85K proper names (including company/brand names, countries, cities, and first/last names of people).

The “propernames” custom G2P module is dynamically enabled/disabled by using the 4SML tags <ssft=domain type="propernames"> or <say-as type="name">, or the ScanSoft escape sequences.

Example:

Mr. VanderHoff went to Abbeville, and also to Bois-de-Boulogne.

<ssft=domain type="propernames"> Mr. VanderHoff went to Abbeville, and also to Bois-de-Boulogne </ssft=domain>.

Mr. VanderHoff went to Abbeville, and also to <say-as


Appendices


Appendix A: German voice names

The RealSpeak Telecom Text-To-Speech system now supports selecting the voice and language via a string as well as a define (please see the definition of the function TtsInitializeEx() in the Programmer’s Guide and also the Backwards Compatibility Guide for details). The name strings for the currently supported German voices are listed in the table below.

German Voice Name Strings

Voice    Name String
Steffi   “Steffi”
