
Efficient diphone database creation for MBROLA, a multilingual speech synthesiser

Jolanta Bachan

Institute of Linguistics

Adam Mickiewicz University Poznań

OWD 2010


Why MBROLA?

● useful for testing speech models in linguistic work
● easy manipulation of duration and pitch values
● easy to create new synthetic voices
● Recently used for:
  ● expressive speech
  ● dialogue synthesis
  ● voice quality
  ● under-resourced languages
  ● large speech corpora
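
The point about easy manipulation of duration and pitch can be illustrated with MBROLA's plain-text phoneme input, in which each line gives a phone label, a duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The sketch below is only an illustration and is not taken from the slides; the helper names `pho_line` and `stretch` and the sample phones are hypothetical.

```python
from typing import List, Tuple

# One phone: (label, duration in ms, [(position in % of duration, pitch in Hz), ...])
Phone = Tuple[str, int, List[Tuple[int, int]]]

def pho_line(label: str, duration_ms: int, pitch_points: List[Tuple[int, int]]) -> str:
    """Format one line of phoneme input: label, duration, then pitch pairs."""
    fields = [label, str(duration_ms)]
    for position, hertz in pitch_points:
        fields += [str(position), str(hertz)]
    return " ".join(fields)

def stretch(phones: List[Phone], factor: float) -> List[Phone]:
    """Scale every duration by `factor`, leaving the pitch targets untouched."""
    return [(label, round(dur * factor), pts) for label, dur, pts in phones]

# Hypothetical utterance; "_" is used here as the silence symbol.
utterance: List[Phone] = [
    ("_", 100, []),
    ("t", 60, []),
    ("a", 120, [(0, 120), (100, 110)]),
    ("k", 80, []),
    ("_", 100, []),
]

# Slow the utterance down by 50% without touching the pitch contour.
pho_text = "\n".join(pho_line(*p) for p in stretch(utterance, 1.5))
print(pho_text)
```

Changing speech rate or intonation then amounts to editing numbers in a text file, which is what makes the format convenient for testing speech models.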


2010-10-24 Efficient diphone database creation for 3

Ph.D. thesis context

● to model different speech styles which will align with the speaker in a consultation situation
  ● in a stress situation
  ● based on the phonetic and linguistic characteristics of the speaker’s speech
● to design and build a speech synthesis component and a style selection module for an adaptive dialogue system


Ph.D. thesis context

● Adaptive dialogue system

  ● to adapt its speech by selecting a speech style appropriate for the speaker’s level of speech arousal
  ● to improve human-computer interaction at emergency unit control centres and the help desks of call centres, by making the dialogue more


Objectives

● Minimisation of the material to be recorded and annotated for synthetic voice creation
● Automatisation of the process of synthetic voice


MBROLA voice creation

(Dutoit et al. 1996)

● Creating text corpus
  ● list of phones with allophones (PL)
  ● list of diphones (DL), |DL| = |PL|²
  ● list of words
  ● words in carrier sentences
● Recording corpus with monotonous intonation
● Segmenting corpus
  ● phone level
  ● automatically and/or manually
  ● extracting diphones
● Equalising corpus (mbrolation)
  ● energy levels normalisation
  ● pitch normalisation
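
The relation |DL| = |PL|² follows from pairing every phone with every phone, including itself. A minimal sketch of building such a diphone list, assuming a toy phone inventory and a hypothetical `left-right` naming scheme:

```python
from itertools import product

def diphone_list(phones):
    """Return every ordered pair of phones, so len(result) == len(phones) ** 2."""
    return [f"{left}-{right}" for left, right in product(phones, repeat=2)]

# Toy phone list (hypothetical); "_" stands for silence.
phones = ["_", "a", "t", "k"]
diphones = diphone_list(phones)
print(len(diphones))   # 16, i.e. 4 squared
print(diphones[:5])
```

Not every pair in the full list necessarily occurs in the language, so the list is an upper bound on what has to be found in the recordings.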



Mbrolation

The Mbrolator is a software suite for MBROLA voice creation

● database file in the SEG format
  ● diphone filename
  ● diphone start & end
  ● diphone label
  ● diphone subsplitting
● restrictions put on the diphone files are:
  ● 16000 Hz sampling rate
  ● no longer than 10000 samples
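
The two restrictions above are easy to check before a database is submitted for mbrolation. The following is a sketch only, assuming the diphone files are stored as mono WAV files (the slides do not state the container format); the function name `check_diphone_wav` is hypothetical.

```python
import wave

SAMPLE_RATE = 16000   # Hz, as stated on the slide
MAX_SAMPLES = 10000   # maximum diphone length, as stated on the slide

def check_diphone_wav(path):
    """Return a list of problems found in one diphone file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()   # for mono audio, one frame is one sample
        if rate != SAMPLE_RATE:
            problems.append(f"sampling rate is {rate} Hz, expected {SAMPLE_RATE}")
        if frames > MAX_SAMPLES:
            problems.append(f"{frames} samples, limit is {MAX_SAMPLES}")
    return problems
```

At 16000 Hz, the 10000-sample limit corresponds to 0.625 s per diphone file.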


Phonetically rich sentence extractor

to select the smallest possible set of sentences from a text corpus which will contain the largest number of diphones
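
One common way to approach this selection task is a greedy heuristic: repeatedly pick the sentence that adds the most diphones not yet covered. The sketch below illustrates that heuristic under the assumption that each sentence is already transcribed as a phone sequence; it is not necessarily the algorithm used by the author, and greedy selection does not guarantee the smallest possible set. The function name and toy corpus are hypothetical.

```python
def select_rich_sentences(sentences):
    """Greedy selection: repeatedly take the sentence adding the most uncovered diphones.

    `sentences` maps a sentence id to its phone sequence (list of labels).
    Returns the chosen ids in selection order and the set of covered diphones.
    """
    # Diphones of a sentence are its pairs of adjacent phones.
    diphones = {
        sid: set(zip(phones, phones[1:]))
        for sid, phones in sentences.items()
    }
    covered, chosen = set(), []
    while True:
        best = max(diphones, key=lambda sid: len(diphones[sid] - covered), default=None)
        if best is None or not diphones[best] - covered:
            break   # no sentence contributes anything new
        chosen.append(best)
        covered |= diphones[best]
    return chosen, covered

# Toy corpus (hypothetical); "_" stands for silence.
corpus = {
    "s1": ["_", "t", "a", "k", "_"],
    "s2": ["_", "t", "a", "_"],
    "s3": ["_", "k", "a", "t", "_"],
}
chosen, covered = select_rich_sentences(corpus)
print(chosen, len(covered))
```

Sentences that contribute no new diphones are never selected, which is how a small subset can cover most of the diphones found in a large corpus.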


Available text resources

● 1623 sentences from the BOSS corpus
● 8828 sentences from the Jurisdict database
● 10451 sentences altogether
● transcription in
  ● Polish SAMPA = 37 phonemes


Results

● SAMPA (38 × 38 = 1444 diphones)
  ● 1008 diphones in 211 sentences out of 10451
● PE-SAMPA (41 × 41 = 1681 diphones)


Diphone extractor

● to automatically cut out diphones from the recordings based on the annotations of those recordings on the phone level
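
The idea of cutting diphones from phone-level annotation can be sketched as follows. The example assumes the annotation is available as (label, start, end) tuples in seconds and uses the common convention of cutting from the midpoint of one phone to the midpoint of the next; the slides do not specify the cut points actually used, and the function names are hypothetical.

```python
def diphone_cuts(annotation):
    """Turn phone-level annotation into diphone cut points.

    `annotation` is a list of (label, start_s, end_s) tuples in time order.
    Each diphone runs from the midpoint of one phone to the midpoint of the next.
    Returns (name, start_s, boundary_s, end_s), where boundary_s is the
    point at which the first phone ends.
    """
    cuts = []
    for (l1, s1, e1), (l2, s2, e2) in zip(annotation, annotation[1:]):
        cuts.append((f"{l1}-{l2}", (s1 + e1) / 2, e1, (s2 + e2) / 2))
    return cuts

def to_samples(seconds, rate=16000):
    """Convert a time in seconds to a sample index at the given rate."""
    return round(seconds * rate)

# Hypothetical annotation of a short utterance; "_" stands for silence.
annotation = [("_", 0.00, 0.10), ("t", 0.10, 0.16), ("a", 0.16, 0.28), ("_", 0.28, 0.40)]
for name, start, boundary, end in diphone_cuts(annotation):
    print(name, to_samples(start), to_samples(boundary), to_samples(end))
```

Because the cut points are derived entirely from the annotation, the quality of the extracted diphones depends directly on the accuracy of the phone boundaries.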


Available material

● 1580 sentences from BOSS corpus
● recordings in professional recording studio
● recorded male voice in monotonous intonation
● annotated in Polish Extended-SAMPA
  – automatic annotation
  – manual correction


Diphone extraction results

● SAMPA: 1039 diphones from 1580 sentences


Tools combination and evaluation

● 226 sentences recorded by a male speaker
● sentences annotated automatically
● 1002 extracted diphones
● MBROLA voice creation
● Total time: ca. 5 hours


Tools combination and evaluation

● original

● fully automatic


Conclusions

● Phonetically rich sentence extractor and diphone extractor seem to be indispensable in MBROLA voice creation


Acknowledgements

● This work was partly funded by
  ● the research supervisor project grant to Prof. Grażyna Demenko & the author No. N N104 119838
  ● the international cooperation scholarship funded by Bielefeld University, Germany
  ● the scholarship for scientific achievements funded by the Kulczyk Family Foundation
● The author is very grateful to Prof. Grażyna Demenko for providing the text and speech corpora and to Prof. Dafydd Gibbon for his invaluable advice on the system design and implementation.
