User's Guide for German
V4.0
1. Grant of Rights
In consideration of a possible commercial relationship, ScanSoft hereby grants to you, the LICENSEE, who accepts, a non-exclusive right to internally evaluate and test the software program ("the Software").
2. Ownership of Software
ScanSoft retains title, interests and ownership of the Software recorded on the original disk(s) and all subsequent copies of the Software and Documentation, regardless of the form or media in or on which the original and other copies may exist. ScanSoft reserves all rights not expressly granted to LICENSEE.
3. Copy Restrictions
This Software and the accompanying documentation are copyrighted. Unauthorized copying of the Software, including Software that has been merged or included with other software, or of the documentation is expressly forbidden. LICENSEE may be held legally responsible for any intellectual property infringement that is caused or encouraged by his failure to abide by the terms of this agreement. LICENSEE is allowed to make two (2) copies of the Software solely for backup purposes, provided that the copyright notice is included on the backup copy.
4. Use Restrictions
LICENSEE agrees not to use the Software for any other purpose than internally evaluating the Software. LICENSEE may physically transfer the Software from one computer to another, provided that the Software is used on only one computer at a time. LICENSEE may not modify, adapt, translate, reverse engineer, decompile, disassemble or create derivative works based on the Software. LICENSEE may not modify, adapt, translate or create derivative works based on the documentation provided by ScanSoft. The Software may not be transferred to anyone without the prior written consent of ScanSoft. In no event may LICENSEE transfer, assign, lease, sell or otherwise dispose of the Software and Documentation on a temporary or permanent basis except as expressly provided herein.
5. Warranty
THE SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. ScanSoft shall have no liability to LICENSEE or any third party for any claim, loss or damage of any kind, including but not limited to lost profits, punitive, incidental, consequential or special damages, arising out of or in connection with the use or performance of the Software and accompanying documentation.
6. Termination
This agreement is effective until terminated. ScanSoft reserves the right to terminate this agreement automatically if any provision of this agreement is violated. LICENSEE may terminate this agreement by returning the Software and the accompanying documentation to ScanSoft, along with a written warranty stating that all copies have been returned.
Trademarks
MS-DOS®, WINDOWS®, MICROSOFT® VISUAL C++, BORLAND C++ and Sound Blaster are registered trademarks of their respective owners. ScanSoft is a registered trademark. All rights reserved.
Date            Type of change                                        Software Version
July 2004       Reformatting of document; Update for version 4.0      V4.0
November 2004   Update for version 4.0                                V4.0
December 2005   Update "SSML Preprocessor" chapter and
                "Using Control Sequences" section of Chapter I
Native Character Set
Using Control Sequences
Quick Reference of the RealSpeak native Control Sequences for German
Entering phonetic input
How to proceed
Lexical stress and sentence accents in phonetic input
The German L&H+ & UNIPA Phonetic Alphabet
Using a User Dictionary
Using the Microsoft SAPI5 Lexicon
User Lexicons
Application Lexicons
The German SAPI5 Phoneme List
Notes on the German Text-To-Speech System
Cardinal Numbers
Decimal Numbers
Ordinal Numbers
Roman Numbers
Telephone Numbers
Bank Account Numbers
Dates
Time Indications
Currencies
Abbreviations and Acronyms
E-MAIL PREPROCESSOR
Introduction
E-Mail Header Processing
Header Field Extraction
Header Field Reading
From Field
Date Field
Subject Field
E-Mail body processing
Message Extraction
Text Normalization
Language Specific Text Normalization
Umlaut
English Words
Customizing the E-Mail Preprocessor
Introduction
Proper names Custom G2P dictionary
APPENDICES
Chapter I
German Text-To-Speech System
Introduction
This section provides operational instructions for the ScanSoft Text-To-Speech system for German. It reviews the functionality of the system, and describes how the user can customize the pronunciation of input texts. This part also describes issues that are particular to the German Text-To-Speech system. It introduces the German phonetic alphabet and discusses a number of the language-specific features of the German Text-To-Speech system.
Preparing a text for Text-To-Speech
In general, there are four ways to intervene in the pronunciation of text:
• By using control sequences
• By entering phonetic input
• By using a user dictionary or a user ruleset
• By using one of the supported APIs
These mechanisms are described in the Programmer's Guide.
In this part, however, the specifications for German are fully described.
Native Character Set
The native character set of the German TTS system is Windows-1252 which has the printable characters in the ASCII range 1-127 as a subset. Note that TTS input encoded in another supported character set is converted to the native character set for that language before it is processed internally. Consequently, input must be representable in the native character set even if it is encoded in another character set supported by the API.
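The representability requirement can be checked before text is sent to the engine. The following is a minimal sketch, not part of the RealSpeak API: the function name `unrepresentable_chars` is hypothetical, and it assumes that Python's `cp1252` codec is an adequate stand-in for the engine's Windows-1252 conversion.

```python
def unrepresentable_chars(text, encoding="cp1252"):
    """Return the distinct characters of `text` that cannot be encoded
    in `encoding`, in order of first appearance.

    Assumption: Python's cp1252 codec approximates the native character
    set described above; the engine's own conversion may differ.
    """
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# German umlauts and sharp s are part of Windows-1252, so nothing is reported.
print(unrepresentable_chars("Grüße aus München"))
# Characters outside Windows-1252 are reported so they can be replaced first.
print(unrepresentable_chars("Straße in Łódź"))
```

An empty result means the input can be converted to the native character set without loss.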
Using Control Sequences
For a description of the various supported markup languages (independent of the language), refer to the Programmer's Guide.
Remark: <ESC> represents the escape character "\x1B".
Below, you find a quick reference table for the RealSpeak native control sequences for German. The language-specific support for the SSML markup language is described in the "SSML Preprocessor" chapter.
Quick Reference of the RealSpeak native Control Sequences for German

Sequence:     <ESC>\vol=x\
Description:  Volume
Range:        x = 0..100 (0 = silence, 10 = low, 100 = high)
Default:      80
Delimiter:    No
Example:      <ESC>\vol=10\ Ich kann sehr leise sprechen, <ESC>\vol=90\ aber auch sehr laut.

Sequence:     <ESC>\rate=x\
Description:  Speech Rate
Range:        x = 1..100 (10 = slow, 100 = fast)
Default:      50
Delimiter:    No
Example:      Ich kann <ESC>\rate=70\ sehr schnell sprechen <ESC>\rate=20\ oder aber ganz langsam.

Sequence:     <ESC>\rate_wpm=xxx\
Description:  Words per minute
Range:        xxx = 1..1000; voice-specific (see subsequent table)
Default:      Voice-specific
Delimiter:    No
Example:      Ich kann <ESC>\rate_wpm=350\ sehr schnell sprechen <ESC>\rate_wpm=110\ oder aber ganz langsam.

Sequence:     <ESC>Mx
Description:  Read mode; some read modes are not supported in e-mail mode
Range:        x = 0..3
              0 = character-by-character
              1 = word-by-word (not supported in e-mail mode)
              2 = sentence-by-sentence
              3 = line-by-line (not supported in e-mail mode)
Default:      2
Delimiter:    Yes
Example:      <ESC>M0 Demonstration
              (The word "Demonstration" will be spelled.)
              <ESC>M1 Dies ist eine Demonstration.
              (This sentence will be read word by word.)
              <ESC>M2 Dies ist eine Demonstration.
              (This sentence will be read as one sentence.)

Sequence:     <ESC>Wx
Description:  Wait Period
Range:        x = 0..9 (0 = no wait period, 1 = 200 ms)
Example:      Sie hören jetzt <ESC>W5
Sequence:     <ESC>\pause=xxx\
Description:  Long Pause
Range:        xxx = 1..65535 msec
Delimiter:    No
Example:      Man kann die Dauer der Pause ganz genau <ESC>\pause=1500\ definieren.

Sequence:     <ESC>"
Description:  Sentence Accent
Delimiter:    No
Example:      <ESC>"Jakob hat gestern angerufen. (nicht Peter)
              Jakob hat <ESC>"gestern angerufen. (nicht heute)
Note:         Manually inserted sentence accents may have no effect in RealSpeak. The RealSpeak synthesis module may indeed have reasons to override the requested sentence accent, and thus not realize it.

Sequence:     <ESC>C
Description:  Continuation
Delimiter:    No
Example:      Der 50. Besucher.
              Der 50. <ESC>C Besucher.
              In the first of the above examples, the text-to-speech system will detect an end-of-sentence after 50 and will inappropriately split the input into two separate sentences. In the second example, a continuation sequence is inserted in order to make the system pronounce the entire input as one sentence.

Sequence:     <ESC>E
Description:  End-of-Message
Delimiter:    Yes
Example:      Dies ist der erste Satz <ESC>E und dies ist der zweite.
              In the above example, the sequence <ESC>E forces the system to pronounce the two halves of the input separately.

Sequence:     <ESC>/+
Description:  Phonetic Input (L&H+ phonetic alphabet)
Delimiter:    No
Example:      <ESC>/+'Stat<ESC>/+
Sequence:     <ESC>%x
Description:  Preprocessing Mode
Range:        text = standard text mode
              email = e-mail mode
Delimiter:    Yes
Example:      <ESC>%text Sie hören eine Nachricht von Hans Kessler.
              <ESC>%email Von: [email protected] (Hans Kessler)

Sequence:     <ESC>\tn=s\
Description:  Guide text normalization; limited support in e-mail mode
Range:        s = string:
              address = address mode (not supported in e-mail mode)
              normal = standard mode
              spell = spell mode
              The text normalization types corresponding with the SSML <say-as> types are also supported in standard text mode (not in e-mail mode), see the "SSML Preprocessor" chapter for more details.
Default:      Normal
Delimiter:    No
Example:      <ESC>\tn=address\ Prof. Dr. Meier, GH Bonn, Gebäude III, 2. St., PF 360, Bonn <ESC>\tn=normal\ Dr. GH Meier

Sequence:     <ESC>F
Description:  Reset to Default
Delimiter:    Yes
Example:      <ESC>\vol=10\ Dies ist der niedrigste Wert der Lautstärke. <ESC>F Und dies ist der normale Wert.
              <ESC>\rate=90\ Dies ist der höchste Wert der Geschwindigkeit. <ESC>F Und dies ist der normale Wert.

Sequence:     <ESC>@c
Description:  Declare the part-of-speech (not supported in German)
Range:        c = character
Delimiter:    No

Sequence:     <ESC>\domain=s\
Description:  Enable the extension (only if a custom g2p has been loaded)
Range:        s = string: the name
Sequence:     <ESC>\voice=s\
Description:  Set the voice (if more than one voice is available)
Range:        s = string: the name of the voice
Delimiter:    Yes

Sequence:     <ESC>\mrk=n\
Description:  Insert a bookmark
Range:        n = 0..2147483647
Delimiter:    No
Example:      Hallo, ich bin \mrk=2000\ Steffi

Sequence:     <ESC>\p\
Description:  Insert a paragraph boundary
Delimiter:    Yes
Example:      Herr Krueger \p\ Kirchstrasse 12 \p\ Berlin

Sequence:     <ESC>\audio="s"\
Description:  Insert an audio file; not supported in e-mail mode
Range:        s = string: the URI of a document with an appropriate MIME type
Delimiter:    Yes

NOTE
The speech rate is language, gender and technology dependent. It can be set in 9 discrete steps. The values given here are in words per minute.

Rate level   Steffi RealSpeak Rate (wpm)
1            40
2            70
3            100
4            120
5            140
6            220
7            300
8            380
9            470
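Control sequences are plain text embedded in the input, so an application can build them with small helper functions. The sketch below is illustrative only: the helper names `vol`, `rate` and `pause` are hypothetical and not part of any ScanSoft API. It assumes that the ranges listed in the quick reference are the only validation needed.

```python
ESC = "\x1b"  # the escape character that introduces every control sequence


def vol(level):
    """Build a volume sequence; the quick reference gives the range 0..100."""
    if not 0 <= level <= 100:
        raise ValueError("volume must be in 0..100")
    return f"{ESC}\\vol={level}\\"


def rate(level):
    """Build a speech rate sequence; the quick reference gives the range 1..100."""
    if not 1 <= level <= 100:
        raise ValueError("rate must be in 1..100")
    return f"{ESC}\\rate={level}\\"


def pause(msec):
    """Build a long pause sequence; the quick reference gives 1..65535 msec."""
    if not 1 <= msec <= 65535:
        raise ValueError("pause must be in 1..65535 msec")
    return f"{ESC}\\pause={msec}\\"


# Reproduces the volume example from the quick reference.
text = f"{vol(10)} Ich kann sehr leise sprechen, {vol(90)} aber auch sehr laut."
```

Validating the argument before building the sequence keeps out-of-range values from reaching the engine, whose behavior for such values is not described here.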
Entering phonetic input
How to proceed
To switch from orthographic to phonetic mode, insert <ESC>/+ to use the L&H+ phonetic alphabet. The phonetic input mode remains active until the command is explicitly reset by entering <ESC>/+ again.
The phonetic input string is composed of symbols of the L&H+ phonetic alphabet (see phonetic table below). Examples are given below in the phonetic table.
In addition to the phonetic symbols, it is advised to use the following characters in the phonetic input string:
Special characters L&H+

Symbol:   ' (ASCII 39, Hex 27)
Meaning:  primary word stress
As in:    <ESC>/+ mo:.'dERn <ESC>/+ (Adjektiv 'modern')
          vs.
          <ESC>/+ 'mo:.d$Rn <ESC>/+ (Verb 'modern')

Symbol:   '2
Meaning:  secondary word stress
As in:    <ESC>/+ 'tE.nIs.'2Spi:.l$R <ESC>/+ (Tennisspieler)

Symbol:   " (ASCII 34, Hex 22)
Meaning:  sentence accent
As in:    <ESC>/+Es_gIbt_"tsva&i_?ak."t&sEn.t$_?In_'di:.z$m_'zats <ESC>/+ (Es gibt ZWEI AKZENTE in diesem Satz.)

Symbol:   .
Meaning:  syllable boundary
As in:    <ESC>/+ 'bu:x.Sta:.b$ <ESC>/+ (Buchstabe)

Symbol:   #
Meaning:  silence (pause)
As in:    <ESC>/+ ?ER_"fRa:k.t$_#_vi:_"ge:t_?Es <ESC>/+ (Er fragte: wie geht es?)
Note that the use of punctuation marks remains useful within phonetic input to assure a correct intonation. Each punctuation mark needs to be preceded by an asterisk.
For example:
<ESC>/+"vIl.kO.m$n*,_"kO.m$n_zi:_hE."Ra&in*. <ESC>/+ (Willkommen, kommen Sie herein.)
Punctuation Marks
L&H+ Symbol   Meaning
_             Word delimiter
*.            End of declarative
*,            Comma
*!            End of exclamation
*?            End of question
*;            Semicolon
*:            Colon
Lexical stress and sentence accents in phonetic input
In phonetic input strings, lexical stress and sentence accents can be indicated manually by the user, by using a single quote (') or double quote (") respectively.
Note that manually inserted lexical stress or sentence accents may have no effect in RealSpeak. The RealSpeak synthesis module may indeed have reasons to override the requested stress/accent and thus not realize it.
1. The Text-To-Speech system will automatically convert all lexical stress marks into sentence accents in case no manually added sentence accents are found in the phonetic input string.
Example:
<ESC>/+IC_'ha:.b$_zi:_'hO&y.t$_nOx_g$.'SpRO.x$n*.<ESC>/+
is the same as
<ESC>/+IC_"ha:.b$_zi:_"hO&y.t$_nOx_g$."SpRO.x$n*.<ESC>/+
(Ich habe sie heute noch gesprochen.)
2. If phonetic input contains at least one manually added sentence accent, no additional sentence accents are assigned by the Text-To-Speech system. Therefore, only those words marked with " will get a sentence accent. As a consequence, a message containing only one manual sentence accent will have an almost flat intonation on all the other words.
Example:
<ESC>/+IC_'ha:.b$_zi:_"hO&y.t$_nOx_g$.'SpRO.x$n*. <ESC>/+
(Only one sentence accent will be realized.)
3. Phonetic input can also be combined with orthographic input. If no sentence accents are found in the input text (indicated by <ESC>" in orthographic input, or by " in phonetic input), the Text-To-Speech system will automatically assign sentence accents. In the orthographic part of the input, the Text-To-Speech system will realize these sentence accents on the basis of part-of-speech and syntactic information. In the phonetic part of the input, all lexical stress marks (if any) will be converted into sentence accents. If there are no lexical stress marks, no sentence accent will be realized for the phonetic part of the input (see point 1 above).
If the user has manually specified one or more sentence accents, no additional sentence accents will be realized (see point 2 above).
For example:
Er hat heute noch mit <ESC>/+'klIn.t$n <ESC>/+ gesprochen.
(No sentence accents are found; the Text-To-Speech system will automatically assign sentence accents.)
Er hat heute noch mit <ESC>/+"klIn.t$n <ESC>/+ gesprochen.
(A sentence accent is specified in the phonetic part of the input text. No additional sentence accents will be realized.)
Er hat <ESC>"heute noch mit <ESC>/+'klIn.t$n <ESC>/+ gesprochen.
(A sentence accent is specified in the orthographic part of the input text. No additional sentence accents will be realized. Hence, the lexical stress that is specified in the phonetic part will NOT be converted into a sentence accent.)
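The conversion described in points 1 and 2 can be modeled as a small string transformation. This is only a sketch of the documented behavior for a purely phonetic string, not the engine's implementation: the function name `promote_stress_to_accents` is hypothetical, and it assumes that secondary stress marks ('2) are left untouched, which the text above does not state.

```python
import re


def promote_stress_to_accents(phonetic):
    """Model points 1 and 2 for a purely phonetic input string.

    If the string already contains a sentence accent ("), it is returned
    unchanged (point 2). Otherwise every primary stress mark (') becomes
    a sentence accent (point 1).
    Assumption: the secondary stress mark '2 is not promoted.
    """
    if '"' in phonetic:
        return phonetic
    # A single quote followed by 2 is secondary stress, so it is skipped.
    return re.sub(r"'(?!2)", '"', phonetic)


without_accent = "IC_'ha:.b$_zi:_'hO&y.t$_nOx_g$.'SpRO.x$n*."
with_accent = "IC_'ha:.b$_zi:_\"hO&y.t$_nOx_g$.'SpRO.x$n*."

print(promote_stress_to_accents(without_accent))
print(promote_stress_to_accents(with_accent))
```

The first call yields the string with all three stress marks promoted, matching the example under point 1; the second string is returned as is.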
The German L&H+ & UNIPA Phonetic Alphabet
Vowels and Diphthongs
L&H+ Symbol   L&H+ Transcription   UNIPA Symbol   UNIPA Transcription   As in:
a 'Stat a 'Stat Stadt
a: 'va:.g$n a: 'va:.g$n Wagen
E 'lEts.t$ E 'lEts.t$ Letzte
e: 'ke:.l$ e: 'ke:.l$ Kehle
I 'mIlC I 'mIlC Milch
i: 'Ri:.z$ i: 'Ri:.z$ Riese
O 'fOl O 'fOl voll
o: 'gRo:s o: 'gRo:s groß
U 'kUnst U 'kUnst Kunst
u: 'fu:s u: 'fu:s Fuß
Y 'kYs.t$ Y 'kYs.t$ Küste
y: 'gRy:n y: 'gRy:n grün
E+ 'lE+.S$n E= 'lE=.S$n löschen
e+ 'Se+n e= 'Se=n schön
E: 'fE:.R$ E: 'fE:.R$ Fähre
$ 'tas.t$ $ 'tas.t$ Taste
a&u 'ba&um a+u 'ba+um Baum
O&y 'hO&y.t$ O+y 'hO+y.t$ heute
a&i 'ta&il a+i 'ta+il Teil
A%~ REs.to:.'RA%~ A%~ REs.to:.'RA%~ Restaurant
O%~ bal.'kO%~ O%~ bal.'kO%~ Balkon
E%~ 'tE%~ E%~ 'tE%~ Teint
Consonants
L&H+ Symbol   L&H+ Transcription   UNIPA Symbol   UNIPA Transcription   As in:
p 'pOst p 'pOst Post
b 'ba&in b 'ba+in Bein
t 'tIn.t$ t 'tIn.t$ Tinte
d 'dIC d 'dIC dich
k 'kla&in k 'kla+in klein
g 'li:.g$n g 'li:.g$n liegen
f 'fElt f 'fElt Feld
v 'vax v 'vax wach
s 'fEls s 'fEls Fels
S 'Sne: S 'Sne: Schnee
z 'za:l z 'za:l Saal
Z ZUR.'na:l Z ZUR.'na:l Journal
C 'mIlC C 'mIlC Milch
x 'bax x 'bax Bach
h 'hant h 'hant Hand
j 'je:.mant j 'je:.mant jemand
l 'lICt l 'lICt Licht
R 'Ra&i.z$ R 'Ra+i.z$ Reise
m 'man m 'man Mann
n 'nOR.d$n n 'nOR.d$n Norden
nK 'RInK nK 'RInK Ring
? b$.'?ax.t$n ? b$.'?ax.t$n beachten
t&s 't&su:k t+s 't+su:k Zug
p&f 'p&fe:Rt p+f 'p+fe:Rt Pferd
t&S 't&SIl.p$n t+S 't+SIl.p$n tschilpen
NOTE
Note that the L&H+ alphabet is not SSML compliant. For SSML, use the UNIPA alphabet.
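Comparing the two columns of the tables above, the German L&H+ and UNIPA symbols differ only in two characters: L&H+ "+" corresponds to UNIPA "=", and L&H+ "&" corresponds to UNIPA "+". Under the assumption that the tables are exhaustive for German, a transcription can be converted mechanically. The function name `lhplus_to_unipa` is hypothetical and not part of any ScanSoft API.

```python
def lhplus_to_unipa(transcription):
    """Convert a German L&H+ transcription to UNIPA notation.

    Assumption: the only differences are the ones visible in the vowel
    and consonant tables ("+" -> "=" and "&" -> "+").
    The "+" replacement must run first; otherwise the "+" produced from
    "&" would be rewritten a second time.
    """
    return transcription.replace("+", "=").replace("&", "+")


for lhplus in ("'lE+.S$n", "'ba&um", "'t&su:k", "'mIlC"):
    print(lhplus, "->", lhplus_to_unipa(lhplus))
```

Symbols that are identical in both alphabets, such as those in 'mIlC, pass through unchanged.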
Using a User Dictionary
For information on how to create and use user dictionaries, please refer to the "User Configuration" chapter in the RealSpeak Telecom Programmer's Guide.
Using the Microsoft SAPI5 Lexicon
Microsoft SAPI5 provides lexicons so that users and applications can specify pronunciation and part of speech information for particular words. As such, all SAPI compliant Text-To-Speech engines should use these lexicons to guarantee uniformity of pronunciation and part of speech information.
There are two types of lexicons in SAPI: user lexicons and application lexicons.
User Lexicons
Each user who logs into a computer will have a User Lexicon. Initially, this lexicon is empty; words can be added either programmatically, or by using an engine's add/remove words UI component (for example, the sample application Dictation Pad provides an Add/Remove Words dialog).
Application Lexicons
Applications can create and ship their own lexicons of specialized words. These lexicons are fixed and cannot be edited.
Detailed information on how to use the MS SAPI5 lexicons can be found in the manual "Microsoft Speech SDK V5.1", chapter "ISpLexicon Interface".
NOTE
Note that the Microsoft Speech SDK V5.1 only provides a phoneme set for American English.
The German SAPI5 phoneme set mentioned below has been developed by ScanSoft, based on the symbols available for American English. The phoneme list below is therefore not to be considered as an official phoneme set defined by Microsoft SAPI5.
SAPI5 Symbols
SAPI Symbol   PhoneID    Example          SAPI Transcription
A             13         Satz             Z A TS
A :           13 12      Tat              T A : T
AW            14         Haus             H AW S
AX            15         bitte            B IH T AX
AX            15         besser           B EH S AX R
AY            16         Eis              AY S
EH            20         Gesetz           G AX Z EH TS
EY            22         Beet, spät       B EY T , S P EY T
EH :          20 12      (Buchstabe) Ä    EH :
OE            34         plötzlich        P L OE TS L IH X
EU            21         blöd             B L EU T
IH            26         Sitz             Z IH TS
IY            27         lieb             L IY P
OH            35         Trotz            T R OH TS
OW            36         Boot             B OW T
OY            37         Kreuz            K R OY TS
UY            48         hübsch           H UY P SH
UE            45         süss (süß)       Z UE S
UH            46         Schutz           SH UH TS
UW            47         Blut             B L UW T
B             17         Bein             B AY N
D             18         Deich, dank      D AY X , D A NG K
CH            19         deutsch          D OY CH
F             23         fast             F A S T
G             24         Gunst            G UH N S T
H             25         Hand             H A N T
JH            28         dschungel        JH UH NG AX L
K             29         Kunst            K UH N S T
L             30         Leim             L AY M
M             31         mein             M AY N
N             32         nein             N AY N
NG            33         Ding             D IH NG
P             38         Pein             P AY N
PF            39         Pfahl            PF A : L
R             40         Reim             R AY M
S             41         Tasse            T A S AX
SH            42         waschen          V A SH AX N
T             43         Teich            T AY X
TS            44         Zahl             TS A : L
V             49         was              V A S
X             50         sicher           Z IH X AX R
X             50         Buch             B UW X
Y             51         Jahr             Y A : AX R
Z             52         Hase             H A : Z AX
ZH            53         Genie            ZH EY N IY
IY AX R       27 15      Tier             T IY AX R
IH AX R       26 15      Wirt             V IH AX R T
UE AX R       45 15      Tür              T UE AX R
UY AX R       48 15      Türke            T UY AX R K AX
EY AX R       22 15      schwer           SH W EY AX R
EH AX R       20 15      Berg             B EH AX R K
EH : AX R     20 12 15   Bär              B EH : AX R
EU AX R       21 15      Föhr             F EU AX R
OE AX R       34 15      Wörter           V OE AX T AX R
A : AX R      13 12 15   Haar             H A : AX R
A AX R        13 15      hart             H A AX R T
UW AX R       47 15      Kur              K UW AX R
UH AX R       46 15      kurz             K UH AX R TS
OW AX R       36 15      Ohr              OW AX R
OH AX R       35 15      dort             D OH AX R T
SAPI5 Symbols
SAPI5 Symbol   SAPI PhoneID   Meaning                                   SAPI Transcription
-              1              syllable boundary (hyphen)                B IH - T AX
!              2              Sentence terminator (exclamation mark)    B IH T AX !
&              3              word boundary                             AX RB IH T AX & B EH S
,              4              Sentence terminator (comma)               AX RB IH T AX , _ B EH S
.              5              Sentence terminator (period)              B IH T AX .
?              6              Sentence terminator (question mark)       AY S ?
_              7              Silence (underscore)                      AX RB IH T AX , _ B EH S
1              9              primary stress                            1 B IH - T AX
2              10             secondary stress
~              11             nasalization
:              12             lengthen
^              8              (as in: Verein)                           F EH AX R ^ AY N
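The PhoneID columns above define a direct mapping from SAPI symbols to numeric IDs. The sketch below copies that mapping into a dictionary and converts a space-separated SAPI transcription into a list of PhoneIDs. The names `PHONE_IDS` and `to_phone_ids` are hypothetical; this is not the SAPI5 API itself, only an illustration of how the tables can be read.

```python
# Symbol -> PhoneID, copied from the two SAPI5 tables above.
PHONE_IDS = {
    "-": 1, "!": 2, "&": 3, ",": 4, ".": 5, "?": 6, "_": 7, "^": 8,
    "1": 9, "2": 10, "~": 11, ":": 12,
    "A": 13, "AW": 14, "AX": 15, "AY": 16, "B": 17, "D": 18, "CH": 19,
    "EH": 20, "EU": 21, "EY": 22, "F": 23, "G": 24, "H": 25, "IH": 26,
    "IY": 27, "JH": 28, "K": 29, "L": 30, "M": 31, "N": 32, "NG": 33,
    "OE": 34, "OH": 35, "OW": 36, "OY": 37, "P": 38, "PF": 39, "R": 40,
    "S": 41, "SH": 42, "T": 43, "TS": 44, "UE": 45, "UH": 46, "UW": 47,
    "UY": 48, "V": 49, "X": 50, "Y": 51, "Z": 52, "ZH": 53,
}


def to_phone_ids(transcription):
    """Convert a space-separated SAPI transcription to a list of PhoneIDs.

    Raises KeyError for a symbol that is not listed in the tables.
    """
    return [PHONE_IDS[symbol] for symbol in transcription.split()]


print(to_phone_ids("T A : T"))  # "Tat": the long vowel is A followed by ":"
```

Multi-symbol entries such as "A :" are covered automatically, since their PhoneIDs in the table are the IDs of their parts.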
Notes on the German Text-To-Speech System
The German Text-To-Speech system has been designed in order to pronounce correctly any input written according to the rules of German orthography. The following cases, however, require special attention.
Cardinal Numbers
Cardinal numbers up to 15 digits are pronounced as full numbers. Periods may be used to separate groups of digits.
For example: 6230 or 6.230
Decimal Numbers
Decimal numbers may consist of up to 15 digits before or after the comma. Periods may be used to separate groups of digits in the string before the comma. The digits after the comma are pronounced one by one.
For example:
9550,5
9.550,5
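The decimal number rules can be illustrated with a small parser that separates the part read as a full number from the digits read one by one. This is a sketch under stated assumptions, not the engine's normalizer: the name `split_decimal` is hypothetical, and it assumes that periods group exactly three digits, which the text above does not specify.

```python
import re

# Integer part: plain digits, or groups of three digits separated by periods
# (the three-digit grouping is an assumption). Fraction: digits after the comma.
_DECIMAL = re.compile(r"^(\d{1,3}(?:\.\d{3})+|\d+),(\d+)$")


def split_decimal(token):
    """Return (integer_value, fraction_digits) for a German decimal number,
    or None if the token does not fit the format or exceeds 15 digits."""
    match = _DECIMAL.match(token)
    if match is None:
        return None
    integer_digits = match.group(1).replace(".", "")
    fraction_digits = match.group(2)
    if len(integer_digits) > 15 or len(fraction_digits) > 15:
        return None
    # The integer part is read as a full number, the fraction digit by digit.
    return int(integer_digits), list(fraction_digits)


print(split_decimal("9550,5"))
print(split_decimal("9.550,5"))
```

Both spellings of the example yield the same integer value and the same list of fraction digits.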
NOTE
Numerals that are normally pronounced as full numbers can also be pronounced digit by digit by using the control sequence
Ordinal Numbers
A cardinal number smaller than 32 followed by a period is pronounced as an ordinal number if it is not in a sentence initial or sentence final position.
For example: am 15. Mai 1998
Roman Numbers
Roman numbers smaller than 10 that cannot be interpreted as single letters are pronounced as full numbers (i.e. II, III, IV, VI, VII, VIII, IX).
For example:
Abteilung IV
Garzweiler II
Telephone Numbers
In order to ensure a correct pronunciation of telephone numbers, it is recommended to use parentheses to separate country code and/or area code from the remainder of the telephone number. Also, use spaces to separate groups of digits. Telephone numbers written in this format will always be pronounced in groups of two or three digits, with a pause at the place of the space.
For example:
(041) 317 11 33
(03 35) 23 02 16
Bank Account Numbers
To have a bank account number correctly pronounced (in groups of 2 or 3 digits), use hyphens between groups of digits. To have the number pronounced digit by digit, switch to spell mode (<ESC>\tn=spell\).
For example:
LG Stuttgart 303-504-52
BLZ 900-563-78
Dates
Dates can be written as structured groups of digits, separated by periods or slashes.
Day (1 or 2 digits)/Month (1 or 2 digits)/Year (2 digits)
Day (1 or 2 digits).Month (1 or 2 digits).Year (2 or 4 digits)
For example:
17/12/2003
9.9.2001
01.02.97
am 10.07.99
Time Indications
Time indications will be pronounced correctly when written in one of the following formats:
9:15
09:15 Uhr
4.00 Uhr
22.15 Uhr
Currencies
The German Text-To-Speech system correctly handles the German currency indication DM when written in one of the following ways:
DM 40
50 DM
DM 10,-
The Austrian, Swiss and American currencies are handled correctly:
19 Sfr
130 ÖS.
5 $
15 Franken
10 Gulden
Currencies up to 15 digits will be correctly pronounced. Periods may be used to separate groups of digits.
For example: 250.850.990 DM or 250850990 DM
Decimal digits in combination with currency indications are also supported. Decimal currency amounts up to 15 digits will be pronounced correctly.
For example: 1999,50 DM or 1.999,50 DM
Abbreviations and Acronyms
The German Text-To-Speech system contains a dictionary with the most common abbreviations and acronyms, such as:
bzw.: beziehungsweise
MwSt.: Mehrwertsteuer
usw.: und so weiter
Some abbreviations are not case-sensitive: uppercase and lowercase are both accepted. Examples of case-sensitive abbreviations are:
Hrsg.: Herausgeber
hrsg.: Herausgegeben
So.: Sonntag
Abbreviations that are NOT in the dictionary:
• will be spelled if they consist of consonants only (with or without punctuation)
• will be spelled if the abbreviation contains one or more vowels, separated by periods
• will be spoken as full words if the abbreviation contains one or more vowels and is not separated by periods
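The three fallback rules for abbreviations that are not in the dictionary can be summarized as a small decision function. This is a sketch of the rules as stated, not the engine's implementation. The name `reading_mode` is hypothetical, and two points are assumptions: the set of letters treated as vowels, and the reading of "separated by periods" as "contains a period other than a trailing one".

```python
VOWELS = set("aeiouäöü")  # assumption: the letters counted as vowels


def reading_mode(abbreviation):
    """Return "spell" or "word" for an abbreviation NOT found in the dictionary."""
    letters = [c for c in abbreviation.lower() if c.isalpha()]
    # Rule 1: consonants only (with or without punctuation) -> spelled.
    if not any(c in VOWELS for c in letters):
        return "spell"
    # Rule 2: contains vowels and is separated by periods -> spelled.
    # Assumption: a single trailing period does not count as a separator.
    if "." in abbreviation.rstrip("."):
        return "spell"
    # Rule 3: contains vowels and is not separated by periods -> full word.
    return "word"


for abbreviation in ("bzgl.", "u.U.", "NATO"):
    print(abbreviation, "->", reading_mode(abbreviation))
```

The sample inputs are illustrative only; whether a given abbreviation is actually in the built-in dictionary is not something this sketch can know.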
Chapter II
E-Mail Preprocessor
Introduction
The ScanSoft e-mail preprocessor (EMPP) is developed to analyze a specific type of text, namely e-mail messages. E-mail messages differ from any average type of text in both their structure and contents. An e-mail message consists of two clearly distinguishable parts: the header and the body. A substantial part of the header contains routing and administrative information, which is irrelevant to the user. Both the header and the body contain all kinds of e-mail specific text features, e.g. e-mail addresses, emoticons such as smileys, etc. Furthermore, informal writing is often combined with a lack of grammatical conventions. Spelling rules are frequently violated, punctuation is often omitted, etc.
Although the standard ScanSoft Text-To-Speech system can handle special text items (abbreviations, numbers, dates, etc.), it is not capable of correctly handling all e-mail specific text features. These text features are therefore dealt with by the e-mail preprocessor. The EMPP transforms e-mail specific information into a format that complies with the rules of the standard ScanSoft Text-To-Speech system. The EMPP is a plug-in preprocessing module of the ScanSoft Text-To-Speech system. It replaces the preprocessor of the standard Text-To-Speech system.
In the following sections you will find a description of the functioning of the ScanSoft e-mail preprocessor as well as an overview of its features.
The e-mail preprocessor has two main tasks: processing of the e-mail header and processing of the body of the e-mail message.
The input to the EMPP consists of one or more e-mail messages. In order to process the e-mail header, the EMPP extracts relevant header fields and then provides an intelligent header field reading.
During the processing of the e-mail body, the text is divided into smaller text units, called text-to-speech messages, which are synthesized by the Text-To-Speech system. Text normalization is applied to e-mail specific text features such as e-mail addresses, proper names, emoticons, URLs (Uniform Resource Locators), etc. For the text normalization of an e-mail message, the ScanSoft EMPP applies linguistic rules and performs dictionary look-up, in order to yield an adequate phonetic transcription. The EMPP also supports the ScanSoft user dictionary mechanism, which allows the user to customize the output of the e-mail processing.
E-Mail Header Processing
Header Field Extraction
An e-mail message consists of two clearly distinguishable parts: the header and the body. The EMPP detects the header and extracts the relevant header fields. Information that is of no interest to the user (such as routing information) is not retained.
The EMPP extracts the following header fields:
From Field      Contains the sender's name and/or address
Date Field      Contains the date and time of sending
Subject Field   Optionally contains the subject of the e-mail

The extraction of the header fields is based on the detection of specific keywords in the e-mail header. The supported keywords for the extraction of the header fields are listed below:

From Field      From:  Author:  Sender:  De:  Von:
Date Field      Date:  Enviado:  Gesendet:
Subject Field   Subject:  Subj:  Asunto:  Betreff:

The following is an example of header field extraction. The original header holds information that is irrelevant to the user. After extraction of date, sender and subject, the processed header merely mentions the Date field, the From field and the Subject field:
Original header:
Path: news.be.innet.net!INbe.net!news.nl.innet.net!INnl.net!hunter.premier.net!www.nntp.primenet.com!nntp.primenet.com!feed1.news.erols.com!howland.erols.net!news.sprintlink.net!news-peer.sprintlink.net!uunet!in3.uu.net!01-newsfeed.univie.ac.at!02-newsfeed.univie.ac.at!news.ecrc.de!news00.btx.dtag.de!not-for-mail
From: [email protected] (Ulrike Noska)
Newsgroups: de.etc.sprache.deutsch
Subject: Re: Neue Wörter
Date: 19 Oct 1996 20:08:38 GMT
Organization: Telekom Online Internet Gateway
Lines: 10
Message-ID: <[email protected]>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Sender: [email protected] (Ulrike Noska)
X-Mailer: Mozilla 2.01DT [de] (Win16; I)

Extracted header fields:
From: [email protected] (Ulrike Noska)
Subject: Re: Neue Wörter
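Keyword-based header field extraction can be sketched as follows. This is an illustration of the idea, not the EMPP implementation: the names `FIELD_KEYWORDS` and `extract_header_fields` are hypothetical, the sample address is a placeholder, and the sketch assumes one header field per line and case-insensitive keyword matching, neither of which is stated above.

```python
# Supported keywords per header field, as listed in this section.
FIELD_KEYWORDS = {
    "From": ("From:", "Author:", "Sender:", "De:", "Von:"),
    "Date": ("Date:", "Enviado:", "Gesendet:"),
    "Subject": ("Subject:", "Subj:", "Asunto:", "Betreff:"),
}


def extract_header_fields(header):
    """Return a dict with the From, Date and Subject values found in `header`.

    Lines that do not start with a supported keyword (routing and
    administrative information) are ignored. The first occurrence of
    each field wins.
    """
    fields = {}
    for line in header.splitlines():
        stripped = line.strip()
        for field, keywords in FIELD_KEYWORDS.items():
            for keyword in keywords:
                if stripped.lower().startswith(keyword.lower()):
                    fields.setdefault(field, stripped[len(keyword):].strip())
                    break
    return fields


sample = """Path: news.example.org!not-for-mail
From: someone@example.org (Ulrike Noska)
Newsgroups: de.etc.sprache.deutsch
Subject: Re: Neue Wörter
Date: 19 Oct 1996 20:08:38 GMT
X-Sender: someone@example.org (Ulrike Noska)"""

print(extract_header_fields(sample))
```

Because matching is anchored at the start of the line, "X-Sender:" is not mistaken for the "Sender:" keyword.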
Header Field Reading
After the header fields have been extracted, they are processed by the EMPP. The header field keywords (see above) are replaced by an introductory message. The remainder of the header fields is processed by the EMPP in order to allow the Text-To-Speech system to intelligently read the fields.
From Field
The From field keyword is replaced by the introductory message "Nachricht von:".
For example: Author: Thomas Kärtner
is pronounced:
Nachricht von: Thomas Kärtner
The remainder of the From field is further processed by the EMPP. The EMPP supports From fields that either consist of
a) a proper name
b) a proper name and an address
c) an address
a) - b) In case the From field contains a proper name, this name and only this name is sent to the Text-To-Speech system. This means that if both a name and an address are found in the From field, the address will not be read by the Text-To-Speech system.
For example:
From: Alex Wunneburger
From: "Jo Hans Kessler"
From: Peter Kuppelwieser at IepXchgPO
From: Udo Kohl/LHS/IEP/BE
are pronounced:
Nachricht von: Alex Wunneburger
Nachricht von: Jo Hans Kessler
Nachricht von: Peter Kuppelwieser
Nachricht von: Udo Kohl
Nachricht von: Heiko Felsmann
c) In case the From field contains only an address, the EMPP extracts the name out of the address and expands the domain that is contained in the address. In other words, the e-mail address is not read literally.
For example:
Author: [email protected]
Author: wien.at!kunze
are pronounced:
Nachricht von: thomas b at Bundestag punkt d e
Nachricht von: kunze at Wien punkt a t
Date Field
The Date field keyword is replaced by the introductory message "Datum:".
The Date field contains the date and time of sending. The EMPP supports multiple date and time formats, which are transformed to a uniform format that complies with the rules for date and time indications of the ScanSoft Text-To-Speech system. The EMPP only pronounces the date.
The EMPP supports dates in the following formats:
For example:
Date: 12/23/2002 21:15 PM
Date: 13 Mar 1996 06:50 AM
are pronounced:
Datum: 23. Dezember 2002
Datum: 13. März 1996
Subject Field
The Subject field keyword is replaced by the introductory message "Betreff:".
The Subject field can contain all kinds of data, but may also be empty. The EMPP searches for keywords that are typical of the subject field (e.g. RE, FYI, FW).
For example:
Subject: RE: Wer holt mich ab?
Subject: FYI: Stundenplan fuer die naechsten Monate
Subject: FW: Ich habe es gefunden!!!!
are pronounced:
Betreff: Antwort auf: Wer holt mich ab?
Betreff: zu Ihrem Interesse: Stundenplan fuer die naechsten Monate
Betreff: weitergeleitete Nachricht: Ich habe es gefunden!!!!
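The Subject field reading shown in the examples can be modeled as a keyword replacement. The sketch below covers only the three keywords that appear in the examples above; the names `SUBJECT_PREFIXES` and `read_subject` are hypothetical, and case-insensitive matching of the keywords is an assumption.

```python
# Spoken replacements for the subject keywords shown in the examples above.
SUBJECT_PREFIXES = {
    "RE": "Antwort auf:",
    "FYI": "zu Ihrem Interesse:",
    "FW": "weitergeleitete Nachricht:",
}


def read_subject(line):
    """Turn a raw Subject header line into the text sent to the TTS system."""
    # Drop the field keyword ("Subject:", "Betreff:", ...).
    _, _, body = line.partition(":")
    body = body.strip()
    # Look for a typical subject keyword in front of the next colon.
    prefix, separator, rest = body.partition(":")
    spoken = SUBJECT_PREFIXES.get(prefix.strip().upper())
    if separator and spoken is not None:
        body = f"{spoken} {rest.strip()}"
    return f"Betreff: {body}"


print(read_subject("Subject: RE: Wer holt mich ab?"))
print(read_subject("Subject: FW: Ich habe es gefunden!!!!"))
```

A subject without one of the listed keywords is passed through with only the introductory message replaced.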
E-Mail body processing
Message Extraction
The e-mail preprocessor splits the body of the e-mail message into text-to-speech messages. This is done on the basis of a number of criteria, such as punctuation, capitalization, layout, intelligent abbreviation handling, etc.
The following examples illustrate some criteria for splitting the e-mail text into text-to-speech messages:
• Using sentence final punctuation and capital letters
Schreiben Sie mir bitte, wenn Sie noch Fragen haben. Meine E-mail-Adresse ist [email protected]
• Using layout
Themen der Versammlung:
1) Neue Kollegen
2) Organisation des Umzugs
3) Colloquium in Berlin
• Using intelligent abbreviation handling
Text Normalization
An e-mail message typically contains e-mail specific text features, such as e-mail addresses, URLs, file names, emoticons, etc. The EMPP transforms these e-mail specific features into a format that complies with the rules of the standard text normalization of the ScanSoft Text-To-Speech system.
The following are examples of e-mail specific text normalization:
ï· Support for multiple e-mail address formats
ï· Support for URLs (Universal Resource Locators)
http://www.norwegen.org
http://www.chemie.fu-berlin.de/adressen gopher://viertel.com/1
ï· Support for file names ldb001.tse
sysinfo.exe lipedu.xls
ï· Processing of emoticons
:-) is pronounced : haha
:-x is pronounced : Nicht weitersagen! :-( is pronounced :och
ï· Processing of overuse of punctuation
PASS AUF!!!!!!!: INTERNET VIRUS!!!!!!!!
Ich habe immer noch Probleme! #%&#@$<! Kannst Du mir helfen?
becomes:
PASS AUF! INTERNET VIRUS!
• Normalization of layout lines (e.g. part of an e-mail signature); not active when in spell mode.
These sequences of identical characters are not pronounced:
o 10 or more identical digits
o a word consisting of 5 or more identical US-ASCII encoded letters of the modern Latin alphabet
o a sequence of 3 or more identical US-ASCII characters that are not letters, digits, sentence-final punctuation marks (.?!) or white space; e.g. '&', '#', '%', '*', '-'
For example:
oooooooooooooooooooooooooooooo
---
will be removed.
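The three rules for layout lines can be approximated with a single regular expression. The sketch below is an assumption-laden illustration (the name strip_layout_runs and the whitespace clean-up afterwards are not from this guide); it only shows how the stated thresholds translate into patterns.

```python
import re

_LAYOUT_RUNS = re.compile(
    r"([0-9])\1{9,}"                          # 10 or more identical digits
    r"|\b([A-Za-z])\2{4,}\b"                  # a word of 5+ identical ASCII letters
    r"|(?=[!-~])(?![A-Za-z0-9.?!])(.)\3{2,}"  # 3+ identical other ASCII characters
)


def strip_layout_runs(text: str, spell_mode: bool = False) -> str:
    """Remove layout runs; the rule is inactive in spell mode."""
    if spell_mode:
        return text
    cleaned = _LAYOUT_RUNS.sub("", text)
    # Collapse the gaps left behind by removed runs (assumed clean-up step).
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Runs of sentence-final punctuation such as "!!!" are deliberately left alone here, since the text above excludes them from this rule.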
• Processing of Question/Answer (FAQ)
Q. Ich habe eine neue E-mail-Adresse. Wie kann ich meine EcoLink mails weiter empfangen?
A. Schick einfach eine E-mail mit Deiner alten und Deiner neuen Adresse an [email protected].
becomes:
Frage:
Ich habe eine neue E-mail-Adresse. Wie kann ich meine EcoLink mails weiter empfangen?
Antwort:
Schick einfach eine E-mail mit Deiner alten und Deiner neuen Adresse an [email protected].
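The Question/Answer rewrite above amounts to replacing a leading "Q." or "A." marker with a spoken label on its own line. A minimal Python sketch, assuming the markers always appear at the start of a line (the function name expand_faq is hypothetical):

```python
import re

# Spoken labels taken from the example above.
_QA_LABELS = {"Q": "Frage:", "A": "Antwort:"}

# A line that starts with "Q." or "A." followed by white space.
_QA_LINE = re.compile(r"^([QA])\.\s+(.*)$")


def expand_faq(lines: list) -> list:
    """Replace leading 'Q.'/'A.' markers with 'Frage:'/'Antwort:' lines."""
    out = []
    for line in lines:
        match = _QA_LINE.match(line)
        if match:
            out.append(_QA_LABELS[match.group(1)])
            out.append(match.group(2))
        else:
            out.append(line)  # ordinary line, unchanged
    return out
```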
• Processing of inserted mail
Ralf> Na, ich hab' mir 'ne neue Stereoanlage
Ralf> gekauft und bin ziemlich
Ralf> enttäuscht. Die klingt zwar nicht schlecht,
Ralf> aber die Lautstärke-
Ralf> regelung wirkt nicht. Ich weiss nicht, was
Ralf> ich damit machen soll.
Heide> Bring sie gleich zurück und tausch sie um!
becomes:
Ralf:
Na, ich hab' mir 'ne neue Stereoanlage gekauft und bin ziemlich enttäuscht. Die klingt zwar nicht schlecht, aber die Lautstärke-regelung wirkt nicht. Ich weiss nicht, was ich damit machen soll.
Heide:
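One way to picture the handling of inserted mail is to group consecutive quoted lines by their "Name>" prefix and join them into one passage per author. The following Python is a sketch under that assumption; merge_quoted is a hypothetical name, and joining a line that ends in a hyphen without a space simply mirrors the "Lautstärke-regelung" output above.

```python
import re
from itertools import groupby

# "Name>" quoting prefix, with or without a space after the ">".
_QUOTED = re.compile(r"^(\w+)>\s*(.*)$")


def merge_quoted(lines: list) -> list:
    """Return (author, text) pairs; unquoted lines get author None."""
    parsed = []
    for line in lines:
        match = _QUOTED.match(line)
        if match:
            parsed.append((match.group(1), match.group(2).strip()))
        else:
            parsed.append((None, line.strip()))
    merged = []
    for author, group in groupby(parsed, key=lambda pair: pair[0]):
        text = ""
        for _, chunk in group:
            if not text or text.endswith("-"):
                text += chunk  # first chunk, or continue a hyphenated word
            else:
                text += " " + chunk
        merged.append((author, text))
    return merged
```

Each resulting pair corresponds to one "Name:" announcement followed by the reassembled text.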
Language Specific Text Normalization
Umlaut
The L&H E-mail preprocessor for German is able to convert umlauted characters. In e-mail messages, umlauts are written in several ways:
über ueber gelähmt
All above-mentioned ways of writing the umlaut are supported by the L&H e-mail preprocessor.
English Words
Since e-mail is an international medium, German e-mail messages will inevitably contain many English words, which may refer to the Internet, electronic mail, or software and hardware. The typical e-mail jargon is handled by the exceptions dictionary of the e-mail preprocessor. This dictionary is a lexicon for e-mail terminology and provides the Text-To-Speech system with an adequate German transcription for a number of English words.
asciitext /+'as.ki:.tEkst
attachment /+$.'tEt&S.m$nt
banner /+'bE.n$R
forwarded /+'fo:R.vaR.d$t
frame /+'fRe:m
hypertext /+'ha&i.p$R.tEkst
iexplorer /+'?In.t$R.nEt.Eks.'plo:R$R
image /+'?I.m$t&S
Customizing the E-Mail Preprocessor
The e-mail preprocessor supports the standard ScanSoft Text-To-Speech SDK user dictionary mechanism, which allows the user to customize the output of the e-mail preprocessor. The user dictionary is consulted both during the header processing and the body processing.
For more information on how to build and use user dictionaries, see the "User Configuration" chapter of the Programmer's Guide.
Customizing the E-Mail Header
The user dictionary is consulted during the header processing while reading the From field and the Subject field.
From Field
The From field consists of either
a) a proper name
b) a proper name and an address
c) an address
a) In case the From field contains only a proper name, the name is passed to the user dictionary. If the lookup is successful, the proper name is substituted by the replacement string. If not, the name is further processed by the header reading module.
For example:
If the user dictionary contains the following line:
Johann /+'jo:.han
the following From field:
From: Johann Strauss
Becomes:
b) In case the From field contains a proper name and an address, the EMPP first passes the address to the user dictionary. If the lookup is successful, both the proper name and the address are substituted by the replacement string. If not, the EMPP passes the proper name to the user dictionary. If this lookup is successful, the name and the address are substituted by the replacement string. If not, the name is further processed by the header reading module. The address will not be read by the Text-To-Speech system.
For example:
If the user dictionary contains the following lines:
[email protected], vom Tennis
Heinz Mein bester Freund
Schreve /+ 'sCRe:.v$
the following From fields:
From: "Alex Van Schreve" <[email protected]>
Author: [email protected] (Heinz)
From: [email protected] (P. Schmidt)
become:
Nachricht von: Alex Van */+ 'sCRe:.v$*/+
Nachricht von: Mein bester Freund
Nachricht von: Peter, vom Tennis
c) In case the From field contains only an address, the complete address is looked up in the user dictionary. If the lookup is successful, a proper name is added to the From field. If not, only the domain part is sent to the user dictionary. The EMPP first calls the dictionary for the complete domain part. If the lookup is successful, the complete domain part is substituted by the replacement string. Otherwise, the EMPP cuts off the leftmost sublevel domain and repeats the lookup and matching procedures for the remainder of the domain part. If the lookup is successful, the remainder of the domain part is substituted by the replacement string. This procedure is repeated until the top level domain is encountered. If none of the lookups is successful, the address is further processed by the header reading module.
For example:
If the e-mail user dictionary contains the following lines:
[email protected] Mein bester Freund
postbank.de Deutsche Postbank AG
the following From fields:
Sender: [email protected]
From: [email protected]
become:
Nachricht von: Mein bester Freund
Nachricht von: anja gihl at Deutsche Postbank AG
NOTE
To allow a correct processing of the From field, the replacement string in the user dictionary should not contain an address or a domain.
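The lookup order described under c) can be sketched in Python as follows. This is an illustration only: the function names, the tuple return value, and the sample addresses are hypothetical, and the sketch assumes that the bare top-level domain is never looked up on its own, which is one reading of "until the top level domain is encountered".

```python
from typing import Dict, Optional, Tuple


def lookup_domain(domain: str, user_dict: Dict[str, str]) -> Optional[str]:
    """Try the full domain, then strip the leftmost label and retry."""
    labels = domain.split(".")
    while len(labels) > 1:  # assumption: stop before the bare top-level domain
        candidate = ".".join(labels)
        if candidate in user_dict:
            return user_dict[candidate]
        labels = labels[1:]
    return None


def resolve_address(address: str,
                    user_dict: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return ('address', text), ('domain', text), or None if nothing matched."""
    if address in user_dict:
        return ("address", user_dict[address])
    _, _, domain = address.partition("@")
    replacement = lookup_domain(domain, user_dict)
    if replacement is not None:
        return ("domain", replacement)
    return None  # left to the header reading module
```

With a dictionary entry for postbank.de, a hypothetical address such as anja.gihl@mail.postbank.de first misses on the full address, then on mail.postbank.de, and finally matches postbank.de.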
Subject Field
Every word in the Subject field is sent to the user dictionary. If the lookup is successful, the replacement string is sent directly to the Text-To-Speech system. If not, the Subject field is further processed by the header reading module.
For example:
If the user dictionary contains the following lines:
ECAI /+ ?e:.tse:.?a:.'?i:
IDT I D T
the following Subject fields:
Subject: ECAI Versammlung am Donnerstag
Subject: Windows und IDT
are pronounced:
Betreff: */+ ?e:.tse:.?a:.'?i:*/+ Versammlung am Donnerstag
Betreff: Windows und I D T
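The word-by-word substitution can be illustrated with a short Python sketch. The function name apply_user_dictionary is hypothetical, and wrapping phonetic entries (those starting with "/+") as "*...*/+" is inferred only from the example output above; this section does not explain that notation further.

```python
from typing import Dict


def apply_user_dictionary(text: str, user_dict: Dict[str, str]) -> str:
    """Replace each whitespace-separated word found in the dictionary."""
    spoken = []
    for word in text.split():
        replacement = user_dict.get(word)
        if replacement is None:
            spoken.append(word)  # no entry: left for further processing
        elif replacement.startswith("/+"):
            # Phonetic entry: wrapped as in the example output (inferred).
            spoken.append(f"*{replacement}*/+")
        else:
            spoken.append(replacement)
    return " ".join(spoken)
```

Note that this sketch matches whole whitespace-delimited tokens only, so a word with attached punctuation would not be found.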
Customizing the E-Mail Body
When the user dictionary has been loaded, the EMPP will call the dictionary for every word of the e-mail body. If the word is found in the user dictionary, it is substituted by the replacement string. If not, the body is further processed by the e-mail body processing module.
For example:
If the user dictionary contains the following line:
IH : Iris Heller
The word "IH" in the following sentence:
Ich bleibe hier und IH wird die Kunden besuchen.
is replaced by the corresponding string found in the e-mail user dictionary:
Ich bleibe hier und Iris Heller wird die Kunden besuchen.
Chapter III
SSML Preprocessor
Introduction
SSML (Speech Synthesis Markup Language) is part of a set of markup specifications by the W3C for voice browsers.
General information regarding the RealSpeak SSML processor can be found in the SSML Support chapter of the Programmer's Guide. The RealSpeak Telecom SDK provides a built-in preprocessor that supports a large portion of the SSML 1.0 September 2004 Recommendation (REC). Moreover, RealSpeak extends SSML with a number of ScanSoft-specific elements and attributes.
The set supported by ScanSoft is called "ScanSoft SSML" (4SML).
The section below describes the language-specific SSML support included in the RealSpeak Telecom V4.0 German language version.
German specific SSML markup
XML encoding types for German
The encoding is specified in the XML text declaration ("<?xml ... ?>") by the encoding declaration, which is of the form encoding="<EncodingName>".
E.g. <?xml version="1.0" encoding="UTF-8"?>
RealSpeak Telecom V4.0 German supports:
• "Windows-1252" and "ISO-8859-1" (ISO Latin1)
• The Unicode encodings "UTF-8", "UTF-16" and "UCS-4" (note that the alias "ISO-10646-UCS-4" is not supported)
• Any coded character set supported by the ICU component, as long as the input text only contains characters that can be transcoded to the native coded character set, which is "Windows-1252". For more information about the character sets supported by ICU, see the ICU website: http://www-306.ibm.com/software/globalization/icu
NOTE
Encoding names are parsed case-insensitively; hyphens and underscores are ignored.
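The matching rule in the note can be expressed as a small normalization step. The Python below is a sketch: the function names are hypothetical, and the set covers only the encodings listed explicitly above, not the additional character sets available through ICU.

```python
def normalize_encoding_name(name: str) -> str:
    """Ignore case, hyphens and underscores, as the note describes."""
    return name.replace("-", "").replace("_", "").casefold()


# Only the names listed explicitly in this section (ICU extras not covered).
_LISTED = {
    normalize_encoding_name(n)
    for n in ("Windows-1252", "ISO-8859-1", "UTF-8", "UTF-16", "UCS-4")
}


def is_listed_encoding(name: str) -> bool:
    """True if the name matches one of the explicitly listed encodings."""
    return normalize_encoding_name(name) in _LISTED
```

Under this rule "utf_8", "UTF8" and "Utf-8" all refer to the same encoding, while the unsupported alias "ISO-10646-UCS-4" does not match "UCS-4".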
4SML Specifics for German
For reasons of compatibility with the "standard" German system, the parallel text control sequence (<esc> sequence) is listed where applicable. As such, a similar TTS behavior can be created (or combined) with non-SSML driven text input.
4SML Tags | Comment | Corresponding control sequence

High-level and document structure tags
xml:lang | Supported; "de-DE" for German. Attribute of speak, paragraph, sentence and voice. |

Text normalization tags
<say-as interpret-as="xxx"> | Supported; limited support in e-mail mode. In e-mail mode the only supported interpret-as value is "spell". |
<say-as interpret-as="number" format="cardinal"> | Supported | <esc>\tn=number_cardinal\
<say-as interpret-as="number" format="digits"> | Supported | <esc>\tn=number_digits\
<say-as interpret-as="number" format="decimal"> | Supported | <esc>\tn=number_decimal\
<say-as interpret-as="number"> | Supported | <esc>\tn=number\
<say-as interpret-as="number" format="ordinal"> | Supported | <esc>\tn=number_ordinal\
<say-as interpret-as="number" format="telephone" detail="punctuation"> | Supported | <esc>\tn=number_telephone_punctuation\
<say-as interpret-as="ordinal"> | Supported | <esc>\tn=ordinal\
<say-as interpret-as="acronym"> | Supported | <esc>\tn=acronym\
<say-as interpret-as="acronym" detail="strict"> | Supported | <esc>\tn=acronym_strict\
<say-as interpret-as="measure"> | Supported | <esc>\tn=measure\
<say-as interpret-as="letters"> | Supported | <esc>\tn=letters\
<say-as interpret-as="letters" detail="strict"> | Supported | <esc>\tn=letters_strict\
<say-as interpret-as="words"> | Supported | <esc>\tn=words\
<say-as interpret-as="date"> | Supported | <esc>\tn=date\
<say-as interpret-as="date" format="mdy"> | Supported | <esc>\tn=date_mdy\
<say-as interpret-as="date" format="dmy"> | Supported | <esc>\tn=date_dmy\
<say-as interpret-as="date" format="ymd"> | Supported | <esc>\tn=date_ymd\
<say-as interpret-as="date" format="ym"> | Supported | <esc>\tn=date_ym\
<say-as interpret-as="date" format="my"> | Supported | <esc>\tn=date_my\
<say-as interpret-as="date" format="dm"> | Supported | <esc>\tn=date_dm\
<say-as interpret-as="date" format="m"> | Supported | <esc>\tn=date_m\
<say-as interpret-as="date" format="d"> | Supported | <esc>\tn=date_d\
<say-as interpret-as="time"> | Supported | <esc>\tn=time\
<say-as interpret-as="time" format="h"> | Supported | <esc>\tn=time_h\
<say-as interpret-as="time" format="hm"> | Supported | <esc>\tn=time_hm\
<say-as interpret-as="time" format="hms"> | Supported | <esc>\tn=time_hms\
<say-as interpret-as="duration" format="hms"> | Supported | <esc>\tn=duration_hms\
<say-as interpret-as="duration" format="hm"> | Supported | <esc>\tn=duration_hm\
<say-as interpret-as="duration" format="ms"> | Supported | <esc>\tn=duration_ms\
<say-as interpret-as="duration" format="h"> | Supported | <esc>\tn=duration_h\
<say-as interpret-as="duration" format="m"> | Supported | <esc>\tn=duration_m\
<say-as interpret-as="duration" format="s"> | Supported | <esc>\tn=duration_s\
<say-as interpret-as="duration"> | Supported | <esc>\tn=duration\
<say-as interpret-as="currency"> | Supported | <esc>\tn=currency\
<say-as interpret-as="telephone"> | Supported | <esc>\tn=telephone\
<say-as interpret-as="telephone" detail="punctuation"> | |
<say-as interpret-as="spell"> | Supported | <esc>\tn=spell\
<say-as interpret-as="name"> | Supported | <esc>\tn=name\
<say-as interpret-as="net" format="email"> | Supported | <esc>\tn=net_email\
<say-as interpret-as="net" format="uri"> | Supported | <esc>\tn=net_uri\
<say-as interpret-as="net"> | Supported | <esc>\tn=net\

Pronunciation tags
<phoneme alphabet="unipa"> | Supported. See section "The German L&H+ and UNIPA phonetic alphabets" for an overview of the alphabet. |
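To show how the say-as rows relate to their control sequences, here is a Python sketch that builds a small German SSML document and derives the matching tn= name. The helper names are hypothetical. The speak element's version and xmlns values come from the W3C SSML 1.0 Recommendation rather than from this guide, and the naming pattern interpret-as[_format][_detail] is inferred from the table, so rows not listed there may not follow it. "<esc>" is kept as literal text, as in the table.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # from the W3C SSML 1.0 REC


def say_as(text: str, interpret_as: str, fmt: Optional[str] = None,
           detail: Optional[str] = None) -> str:
    """Wrap text in a <say-as> element with the given attributes."""
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt:
        attrs += f" format={quoteattr(fmt)}"
    if detail:
        attrs += f" detail={quoteattr(detail)}"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


def speak_de(body_markup: str) -> str:
    """Wrap markup in a UTF-8 SSML document with xml:lang="de-DE"."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="de-DE">'
        f"{body_markup}</speak>"
    )


def tn_control_sequence(interpret_as: str, fmt: Optional[str] = None,
                        detail: Optional[str] = None) -> str:
    """Control sequence in the table's notation (pattern inferred)."""
    parts = [p for p in (interpret_as, fmt, detail) if p]
    return "<esc>\\tn=" + "_".join(parts) + "\\"
```

For example, say_as("1234", "number", "cardinal") corresponds to the table row whose control sequence is returned by tn_control_sequence("number", "cardinal").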
Chapter IV
Custom G2P Dictionaries
Introduction
ScanSoft's RealSpeak system now offers support for custom G2P dictionaries. A custom G2P dictionary module is an add-on module specifically designed to improve the quality of pronunciation for specific kinds of words.
One example of a custom G2P dictionary module currently available from ScanSoft is the "propernames" module, which is described below. Check with ScanSoft for the availability of other custom G2P dictionary modules.
Proper names Custom G2P dictionary
The standard German RealSpeak system correctly pronounces a number of common proper names. However, given the complexity of the grapheme to phoneme conversion for proper names, a dedicated module is used to guarantee the same quality for proper names as for common words.
The "propernames" module contains a lexicon of around 85K proper names (including company/brand names, countries, cities, and first/last names of people).
The "propernames" custom G2P module is dynamically enabled/disabled by using the 4SML tags <ssft=domain type="propernames">, or <say-as type="name">, or the ScanSoft escape sequences.
Example:
Mr. VanderHoff went to Abbeville, and also to Bois-de-Boulogne.
<ssft=domain type="propernames"> Mr. VanderHoff went to Abbeville, and also to Bois-de-Boulogne </ssft=domain>.
Mr. VanderHoff went to Abbeville, and also to <say-as
Appendices
Appendix A: German voice names
The RealSpeak Telecom Text-To-Speech system now supports selecting the voice and language via a string as well as via a define (please see the definition of the function TtsInitializeEx() in the Programmer's Guide and also the Backwards Compatibility Guide for details). The name strings for the currently supported German voices are listed in the table below.
German Voice Name Strings
Voice | Name String
Steffi | "Steffi"