Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
1990
Word hypothesis of phonetic strings using hidden
Markov models
Jeffery W. Engbrecht
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
.
Recommended Citation
Rochester Institute of Technology
School of Computer Science and Technology
WQrQ.
Hypothesis
Qf
Phonetic Strings
~
Hidden Markoy Models
by
Jeffery W. Engbrecht
A thesis, submitted to the Faculty of the School of Computer Science and Technology, in partial
fulfillment of the requirements for the degree of Master of Science in Computer Science.
Approved by :
John A. Biles
Word Hypothesis of Phonetic Strings Using Hidden Markov
Models
I Jeffery W. Engrecht Hereby grant permission to the Wallace
Memorial Library of RIT to reproduce my thesis in whole or in part.
Any reproduction will not be for commercial use or profit.
~----ABSTRACT
This thesis
investigates
a stochastic modeling approach to word hypothesis ofphonetic strings for a speaker
independent,
large vocabulary, continuous speechrecognition system. The stochastic modeling technique used is Hidden Markov
Modeling.
Hidden Markov Models(HMM)
are probabilistic modeling tools most oftenused to analyze complex systems.
This
thesisis
part of a speakerindependent,
large vocabulary, continuousspeech understanding system under development at the Rochester Institute of
Technology
ResearchCorporation.
The system is primarily data-driven and is void ofcomplex control structures such as the blackboard approach used in many expert
systems. The software modules used to
implement
the HMM were created in COMMONLISP on a Texas Instruments Explorer II workstation.
The HMM was
initially
tested on a digit lexicon and then scaled up to a U.S. AirForce cockpit lexicon. A sensitivity analysis was conducted using varying error
rates. The results are discussed and a comparison with Dynamic Time
Warping
resultsis
made.ACM Keywords: Speech Recognition and
Understanding
Natural Language
Processing
Table
of ContentsPage
1.
Introduction
1.1
Problem
Statement
11.2
Previous Work
31.3
Theoretical
andConceptual
Development 31.3.1 An Introduction to Hidden Markov
Models
41.3.2
HMMs
inSpeech
Recognition 71.3.3
The Three Problems
of HMMs 101.3.4 HMM types 1 5
2.
Project
Description andMethodology
1 52.1
Phoneme
ConfusionData
1 72.2
Knowledge Used
1 82.3
Software Tools
1 92.3.1
Phone
and Word Model Construction 2 02.3.2 Phoneme
String
Generator 20
2.3.3 Viterbi
Search
with Optional BeamSearch Technique
23
2.3.4 Grammar Construction 2 5
2.3.5 Error Analysis 2 8
2.4 Hardware
Tools
2 82.5 Lexicon Construction 2
8
2.6
Test
Data Creation 2 92.7
Log
probabilities 2 93.
Results
3 11.
Introduction
andBackground
1.1
Problem
Statement
This
thesisinvestigates
a stochasticmodeling
approach to word hypothesis of phoneticstrings for a speaker
independent,
large
vocabulary, continuous speech recognition system. Thestochastic
modeling
technique usedis Hidden
Markov Modeling. Hidden MarkovModels
(HMM)
are probabilistic
modeling
tools most often used to analyze complex systems.This,
in addition totheir
inherent ability
tohandle
time-varying
processes, makesHMMs
a natural candidate foruse in speech recognition.
This
thesisis
part of a speakerindependent,
large
vocabulary, continuous speechunderstanding
system underdevelopment
at the RochesterInstitute
ofTechnology
ResearchCorporation.
The
systemis primarily
data-driven andis
void of complex control structuressuch as the
blackboard
approach used inmany
expert systems.Figure
1 shows the software architecture ofthe project. Aninput
utterance is processed to produce adigitized
signal. Thedigitized
signalis
then provided asinput
to severalfeature
extractors.The feature
extractorscompress the
data
as much as possible withoutlosing
the phonetic content.The feature frames
are then sentto a
knowledge based
phoneme builder to produce strings of undifferentiatedphonemes.
Words
are then hypothesized.The
wordhypothesis
portion ofthe projectis
thefocus
of this thesis and will
be described in detail later.
The last
processis
the use of syntactic andsemantic
knowledge
sourcesin
an attempt to form asyntactically
correct utterance and provideit
with meaning.The parsing
of a phonetic transcription would not pose a problemif
there were no errorscontained within the transcription.
It
wouldbe
a simple matter ofstring
comparison.However,
errorsintroduced
to the phonetic transcriptionincrease
thecomplexity
and prevent the lexicalaccess procedure from
being
a simplelexicon lookup.
Front-end
errors are causedby
a lowerlevel's
inability
to segment the signalproperly
anddistinguish between
similarsounding
Digitized Signal
Spectral
Analysis
Energy
Measures
Pitch
Trackers
Formant
Trackers
Fricative Classifier
Stop
Classifier
Vowel
Classifier
Word Hypothesizer
Natural
Language
Understanding
Insertion Error
-chauffeur : sh ow f er -> sh ow
I
f
erDeletion Error
- and: ae n
d
-> nA substitution error
is
one in which a similarsounding
error takes the place of the proper one.An insertion
erroris
onein
which a phoneme, not in the correct phonetictranscription,
is
added to the string.And
finally,
a deletion error occurs when a phonemeis
not recognized andsubsequently
left
out.When
combined, these three types of errors can present aformidable
obstaclefor
the speech recognition system to overcome.1.2
Previous
WorkAs a
first
attempt to solve the wordhypothesis
problem in the RIT SpeechProject,
R.Thomas Selman investigated
adynamic programming
approachby
usingDynamic Time
Warping
[SELM89].
Dynamic Time
Warping (DTW)
is
a common method of sequence comparison used inmatching
the acousticfeature
vectorsrepresenting
an unknowninput
utterance and some reference utterance. DTW met withlimited
success. Although theaccuracy
obtainedby
using DTW wasrelatively
good, the time required to process a phoneticstring did
not provesatisfactory.
In
1989,
L. E. Levinson
used a form ofHidden Markov Models
which wasindependent
oflexical
and syntactic constraints[LEVI90].
In
that study,Levinson
treated word recognition as a classicalstring-to-string editing
problem whichis
solved with a two-leveldynamic
programming
algorithm that accountsfor lexical
and syntactic structure.Again,
Levinson
obtainedfairly
accurate results,but
the time required to obtain those resultsis far away from
our goal of real time processing.1.3 Theoretical and Conceptual Development
phoneme recognition
level,
asLee
did,
this thesis simulates the levels of phoneme classification.As
part ofthe speech project, this thesis groups togethergiven phonemes, obtained from alower
level phoneme classification routine, to represent words.
1.3.1
An
Introduction
toHMMs
An HMM is
adoubly
stochastic process with an underlying stochastic process thatis
notobservable
(it is
hidden),
but
canonly be
observed through another set of stochastic processesthat producesthe sequence of observed symbols
[RABI86].
Simply
stated, an HMMis
a collection of statesconnectedby
transitions.They
are somewhat analogous to automata.The
transitionsinclude
two sets of probabilities; an outputprobability
density
function(pdf)
and a transitionprobability from
state to state. In themodeling
of speech, the pdf candefine
theconditional
probability
of an output phoneme and the transitionprobability
can define theconnection ofthose phonemes.
The
connectionof multiple phonemes comprisewords. Twoexamples are provided which provide a
better understanding
of howHMM's
work.They
are thelily
pond example[HOWA71]
and the coin toss example [RABI86].Lilv Pond
The
lily
pond example shown inFigure
2illustrates
a situation where afrog
is sitting
ona
lily
pad.The
frog
is
permitted tojump
only
from pad to pad(the
frog
may
not swim between pads).Hence,
pads are represented asdiscrete
states of the transitions possiblein
thelily
pad.The
frog
may
evenjump
andland
on the same padhe is currently
on.Figure
3 shows some of thepossible transitions in the
lily
pad example.The
probabilitiesillustrated
canbe
represented in matrixform.
This
facilitates easy
manipulationby
computers. In apurely Markovian
system,the sum of the probabilities
in any
row sum to1.
The
transition of thefrog
from
one pad to anotheris based
on the Markovian assumption which states thatonly
thelast
state occupiedby
the given process
is
relevant indetermining
its future behavior.
This
assumptionis
avery
Lilypad
-frog
example
The
frog
must always sit onthe
lily
pad;
he
neverswims.
Occasionally
the
frog
jumps in
the
airandlands
on another pad.Markovian
Assumption:
Lilypad
-frog
example
p
=(pij)
Pi
iP12
Pin
P21
P22
P2N
Pni
Pn2
Pnn
Figure
3
wide classof systems.
Chemical
processes and speech recognition arebut
two examples ofsystems thatcan
be
emulated using HMMs.Coin Toss
The
coin toss exampleillustrates
how states in a Markov model are hidden.Imagine
youare
in
a room where thereis
another personbehind
a partition and that personis giving
you results of consecutive coin tosses.The
observations are described by:=
1.2
n
h,t
hUsing
HMM's
to represent thehidden
processoccurring behind
the partition can providegreat assistance
in
determining
whatis
transpiring
to create the observations. A single faircoin is the model most people would useto describe the outcomesof heads and tails. Figure 4
showsthis model
along
with twoother models.The
observed outcomes ofthe coin tosscan alsobe
explainedby
a 2 fair coins model, a 2biased
coins model, or a model withany
number ofcoins.
The
probabilities associated withheads
ortailsvary
according
to the model used.1.3.2
HMMs
in
Speech
RecognitionHidden Markov models are named after the Russian mathematician
Andrei Markov.
Markov's modeling
came about quite indirectly. Markov wastrying
todetermine
if the law oflarge
numbers applied todependent
variables as well asindependent
variables.In
particular,Markov wondered
if
the sum of the dependent variables wouldsatisfy
the centrallimit
theoremCoin
Toss Example
fair
coin.5
P(h)=l
P(t)=0
P(h)=0
P(t)=l
2 fair
coinsP(h)= .5 P(t)=
.5
P(h)= .5 P(t)=
.5
2 biased
coinsP(h)= .75 P(t)=
.25
.5
Figure
4
P(h)= .25 P(t)=
developed
for speech recognition,formed
thebasis for
much of the work accomplished in thespeech recognition
domain.
Until
theearly
1970's,
HMMs
were not used in speech recognition.This
wasprimarily
because
of thetractability
problem encountered whiletraining
HMMs
to change their associated probabilities to moreoptimally
model the process.A
major breakthrough occurred when a maximization technique wasdeveloped
in theearly
1970's.This technique,
known as theforward-backward
re-estimation orBaum-Welch
algorithm, solved thetractability
problemfor HMMs [BAUM72].
This breakthrough
paved theway
for the application ofHMM's
toautomatic speech recognition
[BAKE75], [BAKI76],
and [JELI76].Funding by
theAdvanced Research Project
Agency (ARPA)
oftheDepartment
ofDefense
provided much ofthe
impetus
formany
of the speech recognition systems developed in the 1970's.Of
the systemsdeveloped
in the1970's,
the two that couldbe
classified asforerunners
of
HMM's
were theDRAGON
system[BAKE75]
and theHarpy
system [L0WE77].The
DRAGON system,developed
atCarnegie-Mellon
University,
was thefirst
system touse a simplistic
Markov
technique.The
DRAGON system used uniform stochasticmodeling
forall
knowledge
sources.However,
the results were not aspromising
as werefirst
expected. Ononly
a 194 word speaker-dependent continuoustask, DRAGON
recognized 84% ofthe words correctly.Consequently,
theinterest in
the DRAGON system designgroup dissipated
and theDRAGON
system evolvedinto
theHarpy
speech recognition system.The
Harpy
system, alsodeveloped
in theComputer Science Department
atCarnegie-Mellon
University,
was theonly
speakerindependent,
large
vocabulary, continuous speech recognition system to meet or surpass the standards set forthby
ARPA.The
Harpy
systemhad
a sentenceaccuracy
of 91% acrossfive different
speakers(3
male and 2female)
and ranin less
than 7 million machineinstructions
per second (MIPS). Although theHarpy
system used statediagrams
similar to those usedin
HMMs,
CMU's
approach was to combine the syntactic,lexical,
and wordjuncture knowledge into
onelarge
state diagramdesigned specifically for
adistinct
lexical domain.
That
statediagram,
or network, became a complete and pre-compiledrepresentation of all possible utterances in the task language.
Since
these works accomplishedin
the1970's,
therehave been
two approaches to theproblems encountered
in
the speech recognition domain.One
direction was toview speechinvolves
stochasticmodeling
andthe use ofHMMs.
Waibel
[WAIB86]
also showed that the use ofhuman knowledge
of prosodic parameters such as stress,intensity
and duration couldbe
used toenhance performance.
This
approachis promising because it
improves word recognitionsignificantly
[LEE89].
In
1989,
R.Thomas Selman
usedDynamic Time
Warping
in an attempt to solve the wordhypothesis
problem[SELM89].
His
system met withlimited
results which willbe
used laterfor
comparison.A widely
acceptedtheory today
holds
that speech recognition inhumans
proceedsfrom anintermediate
representation of the acoustic signalin
terms of a small number of phoneticsymbols.
Levinson
used a speech recognition systembased
on thistheory [LEVI90]
in
which theacoustic-to-phonetic
mapping
wasdone using HMMs. It
resulted in a 76.6% wordaccuracy
on the DARPAresource managementtask.The latest
speakerindependent,
large
vocabulary, continuous speech recognition systemdeveloped
was theSPHINX
system atCarnegie-Mellon
University
in the late 1980's [LEE89].The SPHINX
speech recognition system usedHMMs
and Vector Quantization(VQ)
codebooks
forphoneme recognition.
The
results oftheSPHINX
system were promising.When
theHMMs
are combined with phonological rules, the systemaccuracy
approaches 97% for word recognitionusing a lexicon of size 997 and a
perplexity
of 20.1.3.3
The Three
Problems
of Hidden MarkovModels
To
useHMMs,
aformal
notationis
required to addressnecessary
variables.The formal
notation used throughout the thesis
is
shownin Figure
5.The
variables ofA,
B
and n are the three mostimportant
variablesin
the notation used.Hence HMMs
are represented asX
being
afunction ofA,
B
andn.When using
theHMM
approach to model a particulardomain,
the three problems whichare presented are the evaluation problem, the
decoding
problem and thelearning
problem.The
first
problemis
to determine theprobability
of an observed sequence(0
=o-|, 02, .
. .
Formal HMM
Notation
T
=length
of observation sequencesN
= number of statesin
the
modelM
= number of observation sequencesQ
={q1,q2)...
,qn} states
V
={
v-i, V2,
. . .,
vm
}
discrete
set of possible observationsA
={
ajj
},
ay
, statetransition
probability
distribution
B
={ bj(k)
}, bj(k),
observation symbolprobability
distribution
in
staten
={
7tj},
7i| ,initial
statedistribution
multiplications and N-1 additions.
The brute force
method becomes quickly intractable.The
forward-backward
algorithm solves thefirst
problem [BAUM72]. Figure6
illustrates asimple
HMM
and thecomputationsinvolved
inboth
aforward
and backwardsweep
of the lattice.The
conceptbehind
theforward-backward
algorithmis
to start at the first andlast
outputsymbols and worktoward the middle of the trellis
summing
aand P's as you go.The
second problemis
how tobest
choose a state sequence(i)
so thatit
maximizestheexpected number of correct
individual
states. The Viterbi algorithm was developed toaccomplish
just
that[VITE67].
It
worksby
simply
determining
the mostlikely
state atevery
instance
without regard to thelattice
structure, theneighboring
statesin time,
and the length oftheobservation sequence.
The Viterbi
algorithm canbe
used for segmentation, annotation, andrecognition.
Figure
7highlights
the formal stepsin
the Viterbi algorithm.The
Viterbialgorithm
is
similarin implementation
to theforward-backward
calculation.However,
amaximization over previous states
is
used in place of thesumming
procedure. A trellisstructure
efficiently implements
the computation [RABI86].The
third probleminvolves performing
asensitivity
analysis on the model. Modelparameters are adjusted to maximize the
probability
of the observation sequence.There does
not exist an analytical method to solve this problem.
However,
as mentioned earlier, a gradientiterative
technique wasdeveloped
by
Baum [BAUM72].The
Baum-Welch re-estimationformulas
guarantee that either1)
theprobability
willbe improved
or2)
a critical pointhas
been
reached where theprobability is already
maximized.The
re-estimationformulas
are asfollows:
1. =
yi
(i),
1 <=i<=N
2. = E
$t(ij)
/ S Yi(') for t = 1 . . . T-13.
bj'(k)
= I yt(j)/I
yT(cp) fort*1
...T
The
re-estimationformula for ttj is
trivially
theprobability
ofbeing
in
stateqj
at t = 1.The
re-estimationformula for
ay
is
the ratio of the expected number oftransitions from
stateqj
to qi,divided
by
the expected number of transitions out of state qj.The
re-estimationForward
and
Backward
Sweep
of
the
Lattice
Qi =
1
qi =
2
qi =
N
at(i)
t+1
at+i
G)
qj
qi
qi-1
qj=2
Viterbi
Algorithm
Step
1
-Initialization
5x0)
=Klbi(01),1
<=i<=NStep
2
-Recursion
for
2<=t
<=T,
1<=j<=N
5,(j)
= max[ 5t-i(i) a\\ ]
bj
(O,)
for
1
<=i
<=N
^t(j)
= argmax[
8t-i
(i) a.\\ ]
for
1
<=i
<=N
Step
3
-Termination
P*
= max
[ 5T(i)
]
for
1
<=i
<=N
ij*= argmax
[ bj(\) ]
for
1
<=i
<=N
Step
4
-Path
(state
sequence)
backtracking
for t
=T-1,
T-2,
,1it* =
t+i(it+n
symbol
k divided
by
the expected number of times ofbeing
in state j.The
primed values of thenew model are
iteratively
used to generate a new model until an optimum pointis
reached[RABI86].
1.3.4
HMM
Types
Hidden Markov Models
are categorizedby
types.As
shown in Figure8,
the twoprimary
types of
HMM's
are ergodic and non-ergodic.Ergodic HMM's
are where all statesin
the model are interconnected.Using
automataterminology,
they
canbe
thought of as a clique of states.The
non-ergodic, and theleft-to-right
in particular, HMMs are of specialinterest
inspeech recognition.
The left
to rightHMM displays
aninherent
temporal structure.Transitions
are allowed
only
to an equal or higher numbered state.This
makes theleft-to-right HMM
virtually ideal for for
modelling
time-varying
processes such as speech.2.
Project
Description andMethodology
Little
or notraining
data
existed for thelexicons
available.Therefore, training
of theHMM was not a viable option.
Hence,
the direction ofthisstudy
was guided toward thedecoding
problem of HMMs.
This study
was not used in conjunction withany
other part of the speechproject.
As
such, a unique method ofgenerating phoneme strings needed tobe developed.
Inaddition, the natural
language
understanding portion of the speech project required notonly
thebest
state sequence,but
also required a set oflikely
alternatives to the recognized phrase.Consequently,
alattice
of possible words was constructed.The
implementation of the HMM work was accomplished in three phases.Phase
I was theimplementation of the baseline HMM system for digits.
Phase
Iincluded
the construction of thephone and word models.
The
phone models weredeveloped using
output probabilities takenfrom
the confusion matrix generated
in
R.Thomas Selman's
thesis[SELM89].
The
confusion matrixTypes
ofHMM's
ergodic
non-ergodic
(left-to-right)
Phase I
was theViterbi
search algorithm(to include
a beam search capability), variance oferror rates in the phoneme models, variance of
string
length,
and the construction of a lattice ofdifferent
parsings.Phase
IIinvolved
the addition ofknowledge
to thebaseline
system. A grammar wasintroduced
to thebaseline
system in an attempt toincrease
word accuracy.Phase III involved
thescaling up
of thebaseline
model to the cockpit lexicon used in thespeech project at the
RIT Research Corporation.
A
grammar, in theforms
of word pair andbi-gram which will
be
explainedlater,
wasintroduced
to the cockpit lexicon todetermine its
effecton the system.
2.1
Phoneme
confusiondata
The
work performedin
thisstudy is based
on the assumption that the confusionprobabilities used
between
phonemes were accurate.The
vowel confusion probabilities wereobtained
directly
from
the vowel classificationstudy
[HILL87].Diphthongs
were broken downinto
combinations of vowels. For example, thediphthong
ay
was representedby
ahih . A fulllisting
of thediphthong
representations willbe
presented later.However,
for consonants, noconfusion
data
existed.The only data
available wastheperceptualdistance
measures betweenconsonants. R.T.
Selman's
[SELM89]
examination of confusion probabilities anddistances
forvowels yielded an approximation to the
following
exponentialrelationship (base
= 1.50):1
Confusion
Probability
(input
vs. output) =distance
2
1distance
where the
Z
is
taken over all phones.The
aboverelationship
was applied to all phonemes.The
confusion probabilitiesin
2.2
Knowledge Used
Inter-word
transition probabilities were extracted from the grammars introduced.The
word pair grammar, which
is
a simple grammar that specifiesonly
thelist
of words that canlegally
follow any
given word, and thebi-gram
grammar were used.The
use of word pairknowledge
comes from the development of inter-word transitionprobabilities.
These
probabilities are a result of the extraction of pairs of words which occurin phrases throughout the
lexical domain.
The
lexicaldomain is
parsed to produce the pairs ofwords.
Then
allduplicate
word pairsfound
therein are removed. When a word pairis
found,
atransition
probability is
assigned.When
no word pairis
found,
no transitionis
permittedbetween
the two words.The
transitionprobability
usedis
theinverse
of the number of wordpairs
found
forany
one word.This implies
that the transitions allowed from one word toany
other permissible word
have
the same probabilities. Whenknowledge is
available about themanner
in
which thelexicon is
used, the measure of the number of choices at each decision pointcan
be
reduced.The perplexity
orentropy
of the modelis roughly
the number of choices perdecision
point andis
a measure ofthe constraint imposedby
the grammaror the level ofuncertainty
given the grammar [LEE89].The perplexity
of thedigit
lexiconusing
no grammarwas 10.
The perplexity
was reduced to 5 when using a word pair grammar.The perplexity
ofthe cockpit
lexicon
testdata
set with no knowledge added was 666.The
word pair grammarreduced the
perplexity
toapproximately
4.6.The
addition of bi-gramknowledge is
similar to that of word pair.The difference is
thatduplicates
are not removed and a more accurateinter-word
transitionprobability is
assigned.The
probabilities are estimatedby
counting.The probability is
calculatedby
adivision
of thenumber of
instances
of a specific word pairby
the total number of transitions permittedfrom
the
first
word toany
other word.The perplexity
of the digit lexiconusing
abi-gram
grammarwas
approximately
4.9.The
test set perplexity obtained from usingbi-gram
probabilities onthe cockpit lexicon was reduced to approximately 4.4.
The limited
testdata for
the cockpitdomain
did
notsignificantly
reduce theperplexity
as was the case in theSphinx
speechrecognition system.
However,
better information aboutinter-word
transitions should produce2.3
Software
Tools
A software module was created to
facilitate
the developmentof the HMM models. Thesoftware module was
implemented
inCOMMON
LISPdue
to the predeterminingfact
that the RITResearch Corporation Speech Project is
being
developed
on a LISP machine. LISPis
afunctionalprogramming language
oriented for the manipulation of symbols.Using
LISP in an interpretivefashion
allowed rapid prototyping.This
gives the programmer quick conformation of thesuccess or
failure
of thecode.COMMON
LISP wasdeveloped
in an effort tocombine the featuresof other
dialects
in an optimal was and to promote thecommonality
amongdiverging
new LISPdialects
[STEE84].This study
wasdeveloped
using the flavor system, whichis
anobject-oriented
programming facility.
When a flavor typeis
defined,
a data type and a set of operationsimplemented
by
function objects called methods thatoperate on thatdata
type are defined.Instances
ofthesedata
typescan thenbe
generated.The
flavorsystem allows the combination ofvarious flavor
definitions
to construct a new flavor.This may
prove helpful in the integrationof several
levels
of the speech project.The
software module contained thefollowing
items:
1.
Phone
andWord
model constructioninput
-lexicon,
error rates, and optional grammar2. Phoneme
String
generatorinput
-word or phrase sequence,
lexicon,
and error rates3. Viterbi search with optional beam search
capability
input
-list
of word models and phoneme stringsoutput
-word
lattice
generated from the search space4. Grammar construction
input
- collection of sentences or phrasesoutput - word pair and bi-gram probabilities
5. Error
Analysis
input
-list
of correct parsings and search spacelattice
Figure
9 shows the general architecture of the HMM system implemented.The
modelis
loaded
by
using
thelexicon,
error rates and an optional grammar.The
phoneme stringis
thengenerated and used as an
input
to theViterbi
module.Hypothesized
words are produced andstatistics are gathered.
2.3.1
Phone
andWord
model constructionThe
model used torepresent words was one of parallel connected phoneme strings.The
baseline digit
modelis
shown inFigure
10 .This
model was used forits
simplicity.Once
astarting
state wasdetermined,
transitions between states occurred in a very clean andeasy
tofollow
manner.With
no grammarimposed,
a transition from a terminal state to aninitial
stateoccurs with a
probability
equal to the reciprocal of the number of words in the lexicon.Deletion
errors were permitted asindicated
in Figure 10. Insertion errors occurred attheir specified error rate and
simply
resulted in a return of the model to the same stateit
wasin last.
Insertions
are not shownin Figure
10.Deletions from
one state to a statelocated
2temporal positions to the right
in
the model took place with aprobability
equal to the specifiederror rate.
Any
deletion transition greater than 2 temporal positionshad
an associatedprobability
which wasdecreased
as the square ofthe number of positionsjumped
above 2 .While
tolerating
transitions of great magnitude, this particular modeling structureimposed
lowprobabilities to preclude this from
frequently
occurring.With
an optional grammarintroduced,
the transitions permitted fromany
terminal stateto
any
other initial state were constrained.This
effect reduced the expected number ofinter
word transition decisions in the model significantly.
2.3.2 Phoneme string generator
In
all three phases of thisthesis,
phoneme strings were generatedby
the softwaremodule using the confusion matrix
data
(AppendixA)
and a random number generator.General
Architecture
collectionofsentences lexicon Error Rates input string
c
Grammar
Construction
word pair/
bi-gram
Word Model
Phone Model
Phoneme
String
GeneratorFigure
9
Viterbi
Module
Recognized Word
String
Lattice\
C
Error Analysis3
I
G>
Digit
Model
Note: insertion transitionsare not shown
phoneme and the movement within the model to the next state
in
time.A
random numberwasgenerated. When the magnitude ofthe random number was
less
than or equal to the cumulativeprobability, that phoneme was
instantiated
and appendedto the phoneme string.The
sameprocedure was repeated to
determine
the transition to the next state.During
Phase
I,
thestarting location
as well as the transition between word models wasdetermined
at random.This
producedtruly
random sequences of words.The only
restrictionplaced upon the construction of aphoneme sequence was that
it
terminate after a preset numberof phoneme
based
word strings were created.Word
phrases of length between 1 and5
were usedto create phoneme strings.
The
addition of a grammarto thebaseline digit
systemin Phase
II affected the productionof a phoneme string.
The
transitionsbetween
a terminal state of one word and an initial state ofanother word were now no
longer randomly
generated,but
were constrainedby
theinter-word
transition probabilities obtained
from
the grammar.In
Phase
III,
the generation of phoneme strings was further restricted.The
phonemegenerator
determined
thestarting location based
on the first word of theinput
word sequence.Transitions
were permitted asbefore
until a transition to another word was reached. Then thenext word
in
theinput
sequence was taken and theprocedure was repeated.The
testing
ofspecific word phrases from the cockpit lexicon did not permit the generation of a phoneme
string randomly
or as inPhase
II.2.3.3
Viterbi search with optional beam searchThe
beam search option of the Viterbi algorithm restricts the allowable space to searchfor
the correct phrase [LEE90]. With no beam search capability, theViterbi
algorithmlooks
atevery
possible state atany
particular instant in time to determine the mostlikely
state(Figure
11).
For large
vocabularies, the exhaustive search technique without the use of a grammar canbe
a timeconsuming
and resource expensive process.The
beam search option restricts thesearch space that the Viterbi algorithm
is
permitted to look atby
pruning
offunlikely
Lexical Search
Space
with no
beam
search
w o ro
55
Observation
Sequence
8
values and their associated state numbers(Figure
12).This figure is
a crude representationof the search space which
is very difficult
to portray. Infact,
theremay
exist more than one setof
boundaries inter-dispersed
throughout
thelattice.
The
second version was to expand thesearch space
initially
and then todecrease
the search spacelinearly
after several transitionsthrough the
lattice
structure(Figure
13).This
technique was used to prevent the correctphrase
from
being
pruned at anearly
stage in thelexical
search space.As
part of theViterbi
module, alattice
of possible wordsfound
within the search spacewas constructed.
This lattice
generated willbe
passed on to the next higher level ofthe speechproject
for
naturallanguage
understanding.Along
with the wordidentified,
the lattice containedthe
starting
andending
positionin
the phonemestring
and theprobability
of the word.2.3.4
Grammar
constructionThe
grammar used as aknowledge
source in thedigit
model allowed transitionsonly
between
words thatwere even and between words thatwere odd.As
a result,only
word stringslike
zero -svctwo and one-nine -three-fivecouldbe
generated.Additionally,
a transitionback
to the same word was twice as
likely
as a transition to adifferent
word.This
would makegenerating
the phrase zero-zero -zero morelikely
than that of zero-two -zero.The
grammar usedin
the cockpit modelis
an amalgamation ofdata
taken from threedifferent
sources.Situational input
phrases taken from theCockpit
NaturalLanguage
Study
[LIZZ87],
phrases usedfor
evaluation purposesby
R.T.Selman
[SELM89],
andadditionally
constructed phrases were used.
The
phrases used forevaluation purposes served as abasis
forcomparing
the results of R.T.Selman's dynamic
timewarp
work and the results of this study.The additionally
constructed phrases were incorporated to ensure thatevery
word in thelexicon
had
a transitionprobability
associated withits
terminal states.In both sets ofgrammars, sentences or phrases were used as input.
As
mentionedearlier, in
determining
word pair probabilities, all duplicate word pairs were removedfrom
the input.
They
wereleft in
for the calculation of bi-gram probabilities.The
entire grammarsLexical Search Space
with
linear bounds
CO a>
ii
ro CO
Observation
Sequence
Lexical Search
Space
with
extended
and
then
linearly
reduced
boundaries
CO
ro CO
Observation
Sequence
2.3.5
Error
Analysis
In
all three phrases, the percentage of correct phrases was determined.Also,
thelattice
structure of possible words contained in the search space generated as a result of the Viterbi
module was used to
determine
if the correct phrase was present in the search space.The
percentage ofoccurrences ofthe correct phrase
in
the search space was also calculated.This
will
be
used as anindicator
ofthe results that the naturallanguage
portion of the speech projectcan expect.
The
average time required to process phoneme strings through the Viterbi modulewas also recorded for each phase ofthis study.
2.4
Hardware
toolsThis study
wasimplemented
on aTexas
Instruments Explorer3 II machine.The
Exploreris
a microprogrammed,dedicated
workstation, providing a comprehensive ArtificialIntelligence
(Al)
environment forfast
symbolic processing.Additionally,
the Explorer canbe
augmentedwith a
TMS
32020Signal
Processor Board that allows low-levelfeature
extraction to proceedin parallel
using
fourindependent
signal processors.These
characteristics allow theintegration
of low-levelprocessing
with the high level control mechanismstypically
found
inAl
applications.
2.5
Lexicon constructionThe
vocabulary used in thisstudy
was taken from the UnitedStates
AirForce Cockpit
Natural
Language study
[LIZZ87]. For each of the 656 words containedin
the study, alexicon
entry
was constructed containing the wordin
thestudy
andits
phonetic transcription.The
words from the Air
Force study
wereinput into
a text-to-speech synthesis system as part ofUniversity
phonetic symbol-set(Figure
14).Homonyms
form a single lexicalentry
withmultiple
English
representations.Words
with multiple pronunciations wereonly
entered onceinto
thelexicon.
For those words, a proper phonetic transcription was taken from Webster'sNew World
Dictionary [WEBS66]
and then enteredinto
the lexicon.As
mentioned earlier,diphthongs
werebroken down into
theirvowel components.The digit
and cockpit lexicons arelocated in Appendices D-1
andD-2,
respectively.2.6
Test data
creationThe
AirForce Cockpit Natural Language study
[LIZZ89]
was the source ofthe testutterances used as
input
strings for the HMM process. A set of 148 test phrases was selectedfrom the
study
that combined a widevariety
of words available from the lexicon.The
averagelength
of the phonetic transcription over the 148 phrases was 20.4.Initially,
test phraseswere translated to their phonetic representations with a 5% substitution error and
0%
insertion
anddeletion
error rates.After this,
insertion and deletion errors wereincreased
by
intervals
of5%
until their maximum of20%
was reached.The
substitution error rate wasthen
increased
inintervals
of 5% and the process was repeated.The
errors in the phonetictranscription were created as mentioned earlier. It was believed these error rates would
be
more than sufficient to represent actual errors obtained from the phonetic classifier.
2.7
Log
probabilitiesIn
alexicon
of substantial size, theprobability
of a word, phrase or sentencerapidly
approaches zero
during
forward
computations in the Viterbi algorithm.Recall
that thecalculation of the
8
value results from the multiplication of the previous 8 value and thetransition
probability
3jj.
Normally,
this would result in afloating
point underflow.To
preclude an underflow condition, the probabilities were transformed
into
log
probabilities.CMU
Phoneme
Symbols
Vowels
Example
erbird
aa cot aebat
ahbutt
aobought
awbough
ax the ehbet
ih
bit
IX rosesiy
beat
owboat
uhbook
uwboot
Liauids
Example
lled
r redGlides
y
wExample
yet wetStops
Example
ng
td
k sing totdad
kick g m gag mom n nonP
bpop
bob
Fricatives
Example hh fhay
fife
V verb th thief zh dh s measurethey
sister z sh zoo shoeAffricates
Example
chjh
churchjudge
Diphthong
Conversions
ay
-> ah ihey
-> eh ihoy
-> ow ihThe
addition oflogs is
required whennormalizing
a matrix oflog
values.The
addition oflogs is
more complicated.For
us to add two numbersP-j
andP2
whereP-|
is
greater than orequal to
P2,
thefollowing
formula
takenfrom
theSphinx
system was used.log
b
(p1
+P2)
=log
b
I
b'9 b P1 + b'g bP2]
=
log
b
[
blo9b
P1 (1 + blogb
P2-log
b
P1)]
=
log
b
Pi
+log
b
[1 + bl09b
P2 "lo9
b
P1]
Results
The accuracy
results for all three phases are contained in theirentirety
in AppendixC
and represent the results achieved from
testing
both lexicons over thepreviously
mentionedrange of error rates.
Below are three examples ofthe results produced
by
the software modules.The first
twoare taken
from Phase I.
The
third exampleis
taken fromPhase
III.
The first
exampleillustrates
a situation where the correct phrase was notfound
as themost
likely
candidate.However,
the correct phraseis located
in the word lattice.The
secondexample shows where the correct phrase was not found in either the Viterbi search or the word
Example Set
1
Actual
(zero
seven nineseven)
Phoneme
string
(z
r ow s ehd
sh n n ahih
n s eh v axn)
Found
(zero
six nineseven)
substitution .1
insertion .1
deletion .1
Search Space (phoneme
length)
10 15 20
zero
(-4.7)
s e v e n
(-12.22)
two
(-5.1)
s e v
(-21.38)
e n z..e..r..o .(-1.3)
I
t.w.o(-4.9)
t.h.r.e.e |(-5.3)
I
f...o...u...rI
(-7.56)
f..o..u..r(-7.9)
t...w...o(-8.2)
1 o n e
\
(-8.03)
\
s i x\
(-8.91)
e...i..g..h...t
(-18.43)
l s e v e n
(-11.13)
\
t..w..o
(-17.3)
i
(-13.02)
s i x
V
f....o...u...r\("7-43)
/
z e r o(-12.66)
/
(-22.09)
"
n i n e
(-0.38)
1
t...w...o
(-17.45)
f i v e
(-5.5)
f....o....u....r
(-20.05)
o n e
(-6.52)
o n e
(-20.49)
n i n
(-4.09)
Example
Set 2
Actual
(five one)
x ' substitution 5insertion .1
Phoneme
string
(f I
b
uh ahr)
deletion .1Found
(four one)
Search Space (phoneme
length)
5 10 15 20
two
(-3.2)
f...
(-4 o...u...r
.87)
f 0 u r
(-5.56)
n i n e
(-6.46)
f....
(-6
,..i v e
9)
f..o..u..r
(-3.52)
f....o....u..
(-6.43)
..r 0 n e
(-3.38)
z....e....r.
(-6.79)
..0 e..i..g..h...t
(-10.17)
t..h...r...e
(-7.09)
..e
z e....
(-9.08)
...r... 0
(-9.78)
Example
Set
3
correct phrase
phoneme
string
phrase
found
range and
bearing
wingmanr ae ih n jh ae 1 n g b eh ih r ix jh w ih ih ng ng g m uh b
range and
bearing
wingmancorrect phrase
phoneme
string
phrase
found
status of strike
flight
s t eh 1 ax s ax v s hh aa ih ih k sh 1 ih ih t
status of strike flight
correct phrase
-give me more information on the threat
phoneme
string
-g ih eh v m
iy
m ow r ix n f er m sh ix f aa aa y g ax th r eh nphrase
found
-give me more information on visual
3.1
Phase I
Initially,
all candidates with a5
value below thetop
5
were pruned offfrom
the searchspace. Then thesearch space was expanded toprune off
only
those with a5
valuebelow
thetop
ten states.
Figure
15 shows that for a substitution error of .15, aninsertion
error of .10, anda deletion error of .10, phrase recognition
accuracy
was above 70%.As
wasexpected, theincrease in
thesearch space raised theaccuracy
ofthe system.However,
early pruning
causedthe correct phrase tobe located
outside of the search space onseveral occasions.
For
that reason, the search space wasinitially
expanded and then reducedafter several steps through the HMM lattice.
Figure
16 shows theimprovement
in
accuracy
The
test results reflect statistics gathered from thetesting
of phrases of different length(between 1 and
5)
words. Eachiteration
of the particular phrase length specified was runthrough the Viterbi module
30
times.The
bar graphs shown areindicative
of the set of results obtained for the digit lexiconwith no grammar.
Figure
17 compares other results over the four different circumstancestested.
There
was a gradualdecrease in
word phrase accuracy asthe error rates increased.The
system ran a phonemestring
through the Viterbi module and produced a hypothesizedComparison
of
Phase I
Search
Space
Parameters
Figure 15
><
o (0
I.
3 o o
<
100
80
-60
-40
-20
-sub. error= .15
insert,error=.05
delete,error=.05
top
5 candidatestop
10 candidatesComparison
ofPhase
I
Parameters
with an
Extended
Search
Space
Figure 16
u
(0
i_ 3
o o
<
100
80
-
60-40
-
20-sub. error=.15 insert, error=.05 delete,error=.05
top
5top
10top
5 ext.Circumstances
top
10 ext. ext.Comparison
of
Various
Error Rates
in
Phase
I
Figure
17
100
ffi
CO co
u ffi
o u
LJ
top
5E3
top5w/wordlatticeM
top
100
topi 0viiwordlattice'
5Vj3
Qrammar
15-00-00 15-05-05 15-10-10 15-15-15 15-20-20
3.2
Phase
IIGrammar
wasintroduced
to the modelin
terms of the use of word pair and bi-gramprobabilities.
The
addition of a word pair grammarincreased
the phrase accuracy above thatachieved without
knowledge
(Figure
18).The
combination of an extended search space and wordpair grammar provided the
best
results to that point.Overall,
the word pair and bi-gramknowledge improved
the performance of the system.However,
theincrease
in performance wasnot as great as thatobtained
by
Kai-Fu Lee [LEE89].
Lee increased
theaccuracy
ofhis
systemnearly
twofold
by decreasing
theperplexity from
997 with no grammar to 60 with word pairgrammar and
finally
to20
withbi-gram
grammar.As Figure
19 shows, the combination of abi-gram grammar and an extended lexical search space assisted the model in surpassing all
other results.
These
tendencies were consistent through all the results.The
graph inFigure
20shows that the
primary
variablesinvolved in
increasing
the modelaccuracy
were the extensionofthe search space andtheaddition of
knowledge
to the system.Because
the resultsshown aretaken over all of the error rates
tested,
theaccuracy figures may
notbe
significant.However,
an appreciation of the
impact
of certain variables canbe
obtained.Dramatic
improvements in
the
accuracy
of the system were not realized when grammars were added.This
canbe primarily
attributed to the low
perplexity
of the lexicon. InPhase
I,
theperplexity
was 10. InPhase
II,
the
perplexity
was reduced to 5. Because thedrop
in magnitude of theperplexity
of the modelfrom
Phase I
toPhase
II was not substantial, theaccuracy
of the systemdid
not raisesignificantly.
As
was the casein Phase
I,
Phase
II saw the Viterbi moduleoperating
in real time.Comparison
of noKnowledge
and the use of aWord Pair Grammar
in
Phase
II
100
o CO
b. 3 o o
<
Figure
18
sub. error=.15
insert,error=.05
delete, error=.05
LO O
X X
Q_
5 X X
Q. CD CD CD CD
o Q. LO O
*^
o CO O CL . CL
Q_ o 5 Q. 5
O Q_ ' '
IT) O O
*-O
CL
O Q.
O
wp - word pair ext.
-extendedsearch space
Comparison
of noKnowledge,
Word
Pair
and a
Bi-gram
Grammar in
Phase
II
100 -i CO o
m
CO r^ a>
LO CO 00
u CO w 3 o o <
Figure
19
sub. error=.15
insert, error=.05
delete,error=.05
a. o a. o X CD a. o X CD a. o LO a. o
CL CD Dl _:
X 5
o
CL O
X _o X _o X
CD CD LO CD o CD CL 5 CL CL O CT> CL J3
LO O LO
CL
o O
CL O CL
O Q.
O
"
O
bg
- bi-gram wp - word pairext. - extended
search space
Results
from Phase
I
andPhase II
Figure 20
-H
top
5-?
top
5vi/ wordlattice-B
top
1 0-
top
1 0vi/ wordlattice*
%fsultsfromoil testsconducted
3.3
Phase III
The
tests were conducted on the cockpitlexicon using
thefollowing
search spaceparameters:
Position in the I attire
Candidates
kept1
top
100
2-5
top
70>5
top
35The
parameters were set to these valuesin
an attempt to preventearly pruning
of thecorrect candidate.
The
cockpitlexicon
was tested with no grammar added.Preliminary
test resultsindicated
anaccuracy
ofapproximately 35%
(7 out of20)
for error rates of .05,0.0,
0.0 for
substitution, insertion and deletion.
Only
preliminary
tests were conductedbecause
of the timeelapsed when
processing
a phoneme string. Aperplexity
of 666 necessitates thatat eachstep
through the HMM
lattice
structure, 666 other possible transitions be considered. The pruningprocess removed all but the most
likely
candidates.However,
pruning
could not occur untilafter all
8
values were calculated and sorted.The
end result ofhaving
such a highperplexity
was that the time required toprocess
only
one phoneme averaged about 12 minutes.That
corresponds to
approximately
4hours
perinput string
and an overall time of30,000 hours
totest the
data
over all error rates considered.This
timesimply
proved too great forfurther
practical consideration.
Hence, testing
withoutknowledge
wasdiscontinued.
However,
the timerequired to process an
input
string using no grammaris
not as dismal asit may
appear.Levinson conducted tests on a 47 states ergodic semi-Markov model
using
an 8CE Alliant FX-80
super computer and a 992 word lexicon [LEVI90].
He found
that recognition required about 15times real time.
With
a grammar added, the percentage of phrases more thandoubled
(Figure 21).
A
comparison
using
the error rates of .05,0,
0 for substitution,insertion
anddeletion
errorsResults
from Phase III
Figure 21
100 -A
80
-o
CO
k. 3
o
o <
60-40
-20
sub. error=.05
insert,error=0 delete,error=0
No grammar
1 ' 1 ' T"
WP WP viI lattice BG BG viI lattice
Circumstances WP
-wordpair BG- bi-gram
precise
inter-word
transition probabilitiesbeing
assigned. There was a smalldecrease
in theperplexity
of the model (~ 4.4).This
was causedby
the static properties of the model and thesmall size ofthe grammar.
Probabilities
were not adjusted becausetraining
of the model wasnot conducted.
The lack
oftraining
resulted in no reduction in the number of differenttransition
locations
that mustbe
considered.For
substitution error rates at 10% orbelow,
anincrease
in insertion and deletionerror rates caused a monotonic
decrease
in the system performance. When substitution errorrates rose above
10%,
anincrease
ininsertion
and deletion error ratesfrom
0% to5%
resulted
in
asharp
drop
in
the percentage of phrasescorrectly
recognized (Figure 22).The
reason for this acute
decrease may be
causedby
thesize of the lexicalsearch space. An increasein
the number of candidates retained for the nextstep
through thelattice
wouldcertainly
increase
the phrase recognition figures.An examination of the overall performance (Figure
23)
indicates thereis
a smalladvantage in using a
bi-gram
grammar over a word pair grammar.The
grammar used(Appendix
B-2),
althoughseemingly
lengthy,
providedonly
moderate gains in accuracy.This
was a result of the
inter-word
transition probabilitiesbeing
more precise and a slightdecrease
in
the perplexity.3.4
Comparison
ofHMMs
andDynamic Time
Warping
The
results of Hidden MarkovModels
comparefavorably
with those of theDynamic Time
Warping
methodology
usedby
Selman.
As Figure
24illustrates,
whenusing
equal substitutionrates, the HMM was able to recognize a
higher
percentage of theinput
phrases.While
thismay
not
be
afair
comparison,it does
show that the HMM method permits theincorporation
ofknowledgewhile the DTW
does
not.The
incorporation ofknowledge about thelexical domain
provided enhanced recognition performance. In addition, this
illustrates
animportant feature
ofHMMs: that knowledge can
be easily
addedtothe model.The
time required to process a phoneme string for theDynamic Time
Warping
methodwas on the order of 9 seconds per phoneme in test conditions
providing
the greatest percentageComparison
of
Word
Pair
and
Bi-gram
over
Different
Error
Rates
Figure 22
100
>.
o co i. 3 O u
<
Errorrates
15-00-00
0
15-05-05M
15-10-10El
15-15-15Results
Showing
the
Benefit
ofthe
Word
Lattice
Figure 23
50 -\
40
-u co
u o
<
*
overall results
fron
alltestsconducted
bg
w/ ssComparison
ofHMM
andDTW Results
Figure
24
100
c o
CO o o e
QC
ffi u
k.
ffi
Q.
bi-gram
DTWvi/highThreshc
10-00-00 15-00-00 20-00-00 25-00-00 15 mixed
ErrorRates
*
sub. insert. dele
HMM 10 5 5
From
these two viewpoints, theHMM
methodhas
proved more effective and efficient inthe
hypothesizing
of words than theDTW
method.4.
Conclusions
Hidden Markov
Models
have
provided a markedimprovement in
speech recognition overpreviously
used methodologies.Their inherent
structure, whichis
virtually ideal forhandling
time-varying
processes,
makesHMMs
aforerunner in
the speech recognition field.The
implementation
of theHidden Markov Models
on aLISP
workstationhas its
advantages and
disadvantages.
LISP
allowsfor
rapid prototyping.This
permits the programmerto
quickly incorporate
newideas
and approaches.However,
LISPis notoriously bad
for numbercalculations. In order to model the entire cockpit
lexicon,
many array
structures weremaintained as well as
large lists.
Even
though LISP wasprimarily designed
to processlists,
it
is very
slow when thelists become lengthy.
With no grammarintroduced
to the cockpitlexicon,
nearly
12 minutes was required to process a single phoneme.The
use of the Explorer3 provided anability
to use Flavors.This may become
particularly
usefulin
theintegration
of the entire speech project.However,
avery large
amount of
memory
was required toimplement
the HMM technology.Garbage
collectionhad
tobe
an
ongoing
process so thatvirtualmemory
space was not exceeded.This
slowedthe performanceof the Viterbi module significantly.
The
use of alittle knowledge
can go along
way. Dramaticimprovements in
the phraseaccuracy
were realized when a grammar was added.The
percentage of phrases recognized rosemore than
2.5
times whenknowledge
wasintroduced.
But,
there waslittle
gainin using
bi-gram over word pair bi-grammar.
This
canbe
attributed to thelimited data
provided about themanner
in
which the words are used and the overall static nature of theHMM.
A
more completeindex
of the situations certain word phrases are used in woulddefinitely
increase
the accuracy.The
HMM method performedvery
well when there were noinsertion
ordeletion
errorspermitted for a
large
lexicon. Asharp
decrease in the performance was noted when these errorswere introduced and the substitution error rate was above 10%.
The
confusion createdby
these
size.
So,
the HMMis
sensitive toinsertion
anddeletion
errors.One
possible method forde
sensitizing
the model wouldbe
to segmentthe phoneme string.This
would give the system abetter idea
of where words start andfinish.
The
lexical search space was akey
factor
in the results obtained.Currently
there existsno algorithmic solution to
determine
the proper size of the search space.As
an alternative, athreshold could
be
used to prune unwanted candidates,but
the problem