1
Using Segmentation Constraints
in an Implicit Segmentation Scheme
for Online Word Recognition
Émilie CAILLAULT – LASL, ULCO/University of Littoral (Calais)
Christian VIARD-GAUDIN – IRCCyN, University of Nantes
2
Aim
• Problem :
– Training of a neuro-markovian system in the context
of cursive handwritten word recognition.
• Difficulty :
– Bootstrapping of the system allowing a training from
scratch.
• Proposed solution :
Implicit segmentation procedure
•
Definition of a segmentation control function
3
How train a neuro-markovian system?
Classifier
HMM
Teaching
Signal: character
Signal
Character level training
Classifier
HMM
Teaching signal:
word
Word level training
RN-MMC
Parole/Hors-ligne
Différents hybrides
MMC-MMC
RN-MMC
MMC - RN
SVM-MMC
Kppv-MMC
4
Global system overview : MS-TDNN
O b se rva tion w in d o w s O t of th e T D N N S a m e c o nv o lu tion a l w ind o w in sid e th e o b se rva tio n w in do w s of th e T D N N O b s1 O bs n O b s1 O bs n … … ... x r y dx dy cy cx P T D N N in pu ts : te m p o ral se q u en c e o f fe a ture s v e cto rs for th e w o rd « qu a n d »
‘z’ ‘a ’ ‘b ’ ‘c’ ... O b s1 O bs n O b sT
O b se rva tion s v e cto rs se q u en c es p ro ce s se d
by the H M M
T D N N T D N N S a m e w e ig th s
m a tric es
• Scanning of the input word by a
TDNN (temporal convolution).
• TDNN produces the sequence of
observation probabilities.
•
No explicit segmentation
is
carried out but just an overlapping
window is regularly shifted.
• Advantages of the TDNN
architecture: weight sharing
⇒
reduction in computation times
and in memory requirements.
• Reduction of the number of
computation with an adapted
topology of the TDNN due to the
sliding observation window.
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
1- MsTDNN-HMM
P/D1 state /
character:
66 states
3 states /
character:
198 states =
198 output neurons
5
Dynamic Programming Likelihood computation
a
b
c
d
…
.n
o
p
q
…
.u
…
.z
Ch
a
ra
c
te
r
m
o
d
e
ls
O1 O2 O3 ….. Ot, Observation sequences
Best Viterbi path
bj(Ot), observation probability aij = cst transition probability
Dynamic Programming
to find the best path alignment
Word Lexicon
---un
deux
trois
…
quand
…
qs 11 uu 11 nn 11 qf qs 11 qq 11 uu 11 aa 11 nn 11 dd 11 qfWords HMM = concatenation of letters HMM
DP
DP
Likelihood
Scores
• Hence, the likelihood for each word in
the lexicon is computed by
multiplying the observation
probabilities over the best path
through the graph using the Viterbi
algorithm.
• The word model with the highest
likelihood is the top one recognition
candidate.
For each entry in the
lexicon, a
pseudo-HMM-word model is constructed
dynamically by
concatenating letter HMM
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
1- MsTDNN-HMM
q
s
q
1
q
f
1-state letter model
q
sq
1q
2q
3q
f3-state letter model
6
Training
Word Likelihood Computation
Recognition
List of words ranked by their WL score
Word Lexicon
Normalisation
...
Sequence of features vectors
Original data Delta x - coord y - coord x - direction Y _ direction .... y - coord x - direction Y _ direction .... 1 N
...
Sequence of vectors of observation probabilities ,
P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... Obs 1 ObsT
DP
TDNN
Features Extraction
Normalized dataCharacter Likelihood
Delta x - coordGlobal cost function
Backpropagation of the gradient
Training :
• Word level training
• Classical
back-propagation algorithm
• Cost function mixing
MLE, MMI and
character level
discrimination
[ICDAR 2005]
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
1- MsTDNN-HMM
« one » recognized as « over »
True Hmm: (+)
ooonneee
Best Hmm: (–)
oovveeer
Best character: (–)
aouvneel
« one »
True Hmm: (+)
ooonneee
« one » recognized as « over »
True Hmm: (+)
ooonneee
7
Illustration of the segmentation problem
• Random initialisation of the TDNN
⇒
all outputs are equally probable
Example: Possible paths for the English word
‘of’ with 10 observations (TDNN
windows)
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
2- Implicit Segmentation Control
ooooooooof
ooooooooff
ooooooofff
ooooooffff
ooooofffff
ooooffffff
ooofffffff
ooffffffff
offfffffff
o
f
8
How to obtain desired Likelihood distribution?
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
2- Implicit Segmentation Control
P(
Γ
)
offfffffff
1
ooffffffff
ooofffffff
ooooffffff
ooooofffff
ooooooffff
ooooooofff
ooooooooff
ooooooooof
9
Segmentation
paths
for the word
‘of’
Existing solutions:
Training with a character database
- duration model for HMM sub-unit [Penacee]
- manual segmentation constraints [Remus, Npen++]
Proposed solution:
- no modification of the HMM structure (transitions=1)
- no training with a character database
9
Weighting function
f
T
:
B = First observation position
for the concerned letter
E = Last observation position
for the concerned letter
H = Height of the function
ƒ
T
S = E-B = letter size
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
2- Implicit Segmentation Control
0 10 20 30 40 50 60 70 80 90 100 1 2 3 4 5 6 7 8 9
Observations range corresponding to one letter
W
e
ig
h
ti
n
g
o
f
T
D
N
N
o
u
tp
u
ts
%
H=100
H>1000
( , )
min(
NN
( , ) *
T
(
);1)
Outputs t l
=
Outputs
t l
f
H
H
B
E
10
Weighting function applied : case of the word ‘of’
H=100
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
2- Implicit Segmentation Control
0
50
100
1
2
3
4
5
6
7
8
9
10
observation position
w
e
ig
h
ti
n
g
c
o
e
ff
ic
ie
n
t
o
f
1
11
How to evaluate the effectiveness of this weighting function ? ASR
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
2- Implicit Segmentation Control
ASR : Averaged homogenous Segmentation Rate
e:
index of the considered sample
T:
number of observations scanned by the TDNN
N:
(fixed or limited) number of training samples
NL(e):
number of letters in the word label
NLs(e): number of sequences of different letters
i:
index of a sequence of letters from 1 to NLs
NLc(i): number of the same consecutive letter
and nb_obs(i): number of observations of the letter i in the optimal Viterbi path.
( )
1
1
1
_
( )
(
)
( ) *
/
( )
NLs e
N
e
i
nb obs i
ASR
N
=
=
NLc i
T NL e
=
∑ ∑
0.36
0.64
0.84
0.96
1
0.96
0.84
0.64
0.36
ASR
1
2
3
4
5
6
7
8
9
NL(f)
9
8
7
6
5
4
3
2
1
NL(o)
Case: ‘of’
12
Experiments :
Training
in 2 stages :
1-
f
T
segmentation
constraint function
for 20 epochs
2- training with no
segmentation
constraint
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
3- Conclusion
Word Likelihood Computation
...
Sequence of vectors of observation probabilities ,
P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... Obs 1 ObsT
True HMM
TDNN
Character Likelihood
Normalisation
Original data Normalized data...
Sequence of features vectors
Delta x - coord y - coord x - direction Y _ direction .... y - coord x - direction Y _ direction .... 1 N
Features Extraction
Delta x - coordGlobal cost function
Backpropagation
of the gradient
...
Constrained Sequence wtih
duration model
,P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... P ( Class 1 | X ) P ( Class 2 | X ) P ( Class 3 | X ) P ( Class 4 | X ) .... Obs 1 ObsT
Label of the true word
Weighting
function
ƒ
TWeighting
function
ƒ
T13
1- MsTDNN-HMM 2- Implicit Segmentation Control
3- Conclusion
3- Conclusion
0.684
0.498
11.8
17.7
Relative
improvement (%)
0.612
0.423
Average
Segmentation
Rate (ASR)
94.81
3S-TDNN
Segmentation
constraint
28.1
32.6
Relative
improvement (%)
90.51
91.31
87.09
Recognition rate
(%)
3State-TDNN
1S-TDNN
Segmentation
constraint
1State-TDNN
System
Results on online word IRONOFF database
IRONOFF word database
English / French
Lexicon : 197 words
2/3 Training : 20 898 words
14
Conclusions
• A simple and effective process to bootstrap a hybrid
system without any labeled character database
• Important increasing of the recognition rate by an
initialisation stage with a duration model apply
directly on the TDNN outputs
Weighting function ƒ
T
• Better quality of segmentation
⇒
better recognition
rates
ASR – Averaged Segmentation Rate
1- MsTDNN-HMM 2- Implicit Segmentation Control
15
Future works : controlled segmentation stage
• Measure of the ASR for each training iteration
• Adaptation of the parameter H in the weighting
function to reduce the segmentation constraints
according to the ASR value.
– H=ƒ(ASR).
– Initialisation with H = 100
⇒
high segmentation
constraint, only balanced segmentation paths are possible
– Then relax constraint on H when ASR has reached a
reasonable value.
1- MsTDNN-HMM 2- Implicit Segmentation Control
16
17
Le moteur de reconnaissance : le TDNN
TDNN
...
...
MLP
...
Global
Vision
Weight
sharing
Local
vision to
global
vision