Context-free Grammars in Speech Processing

The grammars in spoken dialog systems serve two purposes: first, they formally define the natural language used in a certain domain and map different natural language user utterances to a representative semantics, which can then be used in the dialog management component; second, they are commonly used for language modeling to improve the quality of speech recognition [Allen et al., 2000; Bui et al., 2005; Dusan & Flanagan, 2000].

Among the different types of formal models used in natural-language processing systems, the most widely adopted one is the context-free grammar, owing to its power to express the different linguistic structures found in natural language. Over the years, an augmentation of context-free grammar, the so-called “unification grammar”, has been established, which adds the notion of feature constraints to context-free grammar [McTear, 1998]. Unification grammar has been adopted because it allows a compact and concise definition of the language. Although unification grammar is known to be more expressive than context-free grammar, it is still arguable whether this additional expressive power is actually needed to describe natural languages [Moore, 1999]. We therefore regard context-free grammar as the most common formal model in speech processing for expressing natural user utterances, mapping them to semantics, and providing a language model for better speech recognition. In the dialog systems introduced in Allen et al. [2000], Bui et al. [2005] and Dusan & Flanagan [2000], context-free grammars are adopted without any restriction on their expressive power for the natural language required.

6.3.1 Approximations of Context-free Grammars

Despite the extensive literature on efficient context-free parsing for large and highly ambiguous grammars, context-free parsing still poses a serious problem in many practical applications such as real-time speech recognition. Human language users seem to process language in linear time; they understand longer sentences with no noticeable delay. This implies that context-free grammars, although powerful for language interpretation, are not likely models of human language processing. Therefore, various approaches have been proposed in the research literature for approximating context-free grammars with finite-state devices, which are known to allow very efficient processing in linear time. In practice, these approaches resolve the conflict between the requirements of language modeling for recognition and those of language analysis for sentence interpretation. Current recognition algorithms use finite-state acceptors as language models for computational efficiency. These models are known to be inadequate for natural language interpretation, since they cannot express all relevant syntactic and semantic regularities. Context-free grammars can express many of those regularities, but are computationally less suitable for language modeling because of the inherent cost of computing state transitions in their parsers. Approximating context-free grammars with finite-state devices integrates these two techniques in a single system.

According to the Chomsky hierarchy, finite-state devices are less powerful than context-free grammars. Interestingly, though not mentioned in the literature, I found that at least some of the constructions that cannot be treated with finite-state devices are also difficult for humans. For example, constructions involving center-embedding are very hard for humans to process, but are regarded as grammatical by linguists. In English particle verb constructions, the particle can either precede or follow the direct object (“put the book down” / “put down the book”). If the direct object contains a relative clause and the particle follows the direct object, the examples become very hard to understand (see Appendix 4 for examples).

If there is no restriction on the amount of center-embedding (recursion), it is likewise impossible for finite-state devices to process these sentences. This suggests that finite-state devices could offer language models that adequately account for the efficiency of human language processing.

Therefore, context-free grammars used in spoken-dialog applications often represent regular languages (which are equivalent to the languages modeled by finite-state automata), either by construction or as a result of a finite-state approximation of a more general context-free grammar.

6.3.2 Different Approximation Approaches

It would be ideal if a context-free grammar that actually generates a regular language could be used as a general form of specification, and an equivalent finite-state automaton could be derived from it for use in the recognition process. Unfortunately, there is no general algorithm that maps an arbitrary context-free grammar generating a regular language into a corresponding finite-state automaton (see Theorem 8.15 in Hopcroft and Ullman [1979]).

However, in the existing literature, a number of methods have been proposed for approximating a context-free language with a finite-state automaton. Nederhof [2000a] gives a good survey of different approximation methods.

Several of these methods can be categorized into classes. One class of approaches constructs a pushdown automaton from the grammar, where the language accepted by the automaton is identical to the language generated by the grammar, and then approximates the pushdown automaton with a finite automaton [Johnson, 1998; Grimley Evans, 1997; Pereira & Wright, 1997]. Among them, there are two kinds of pushdown automaton approximation: subset and superset approximation. Subset approximations such as that of Johnson [1998] reduce the infinite set of stacks in a pushdown automaton to a finite set, which leads to a finite automaton. Superset approximations such as that of Pereira and Wright [1997] build congruence classes of stack symbols and translate each congruence class to a unique state of a non-deterministic finite automaton.
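The idea behind a subset approximation can be illustrated with a minimal sketch. It is not the construction of Johnson [1998]; it only shows, for the center-embedded language a^n b^n, how bounding the stack height leaves finitely many configurations. The function name and the `max_depth` parameter are assumptions made for this illustration.

```python
def bounded_depth_accepts(word: str, max_depth: int) -> bool:
    """Recognise a^n b^n (n >= 1) with the stack height capped at max_depth.

    The counter `depth` plays the role of the pushdown stack. Because it can
    only take the values 0..max_depth, the recogniser has finitely many
    configurations and is therefore a finite automaton.
    """
    depth = 0        # current (simulated) stack height
    seen_b = False   # once a 'b' has been read, no further 'a' is allowed
    for symbol in word:
        if symbol == "a" and not seen_b:
            depth += 1
            if depth > max_depth:
                # The exact pushdown automaton would continue here;
                # the approximation gives up and rejects.
                return False
        elif symbol == "b" and depth > 0:
            depth -= 1
            seen_b = True
        else:
            return False
    return seen_b and depth == 0
```

With `max_depth=2` the sketch accepts "ab" and "aabb" but rejects the grammatical string "aaabbb", so the accepted language is a proper subset of the original one.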

Another superset approximation approach retains only the information about allowable terminals or pairs of adjacent parts of speech (cf. unigrams, bigrams, and trigrams) [Stolcke & Segal, 1994].
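A minimal sketch of such a pair-based superset approximation is given below. It is not the method of Stolcke & Segal [1994], which computes n-gram probabilities; it only collects which terminals may start a sentence, end a sentence, or stand next to each other. The grammar encoding (a dictionary from non-terminals to lists of right-hand sides) and both function names are assumptions of this illustration, and the grammar is assumed to contain no ε-rules.

```python
from itertools import product


def bigram_approximation(rules, start):
    """Return (sentence-initial terminals, sentence-final terminals, adjacent pairs).

    rules maps each non-terminal to a list of right-hand sides (tuples of
    symbols); every symbol that is not a key of rules is a terminal.
    Assumes there are no epsilon rules.
    """
    first = {nt: set() for nt in rules}  # terminals that can begin a phrase
    last = {nt: set() for nt in rules}   # terminals that can end a phrase
    changed = True
    while changed:                       # fixed-point iteration
        changed = False
        for lhs, rhss in rules.items():
            for rhs in rhss:
                for table, sym in ((first, rhs[0]), (last, rhs[-1])):
                    new = table[sym] if sym in rules else {sym}
                    if not new <= table[lhs]:
                        table[lhs] |= new
                        changed = True

    pairs = set()
    for rhss in rules.values():
        for rhs in rhss:
            for left, right in zip(rhs, rhs[1:]):
                ends = last[left] if left in rules else {left}
                starts = first[right] if right in rules else {right}
                pairs |= set(product(ends, starts))
    return first[start], last[start], pairs


def accepts(model, words):
    """Accept if the first word, the last word and all adjacent pairs are allowed."""
    starts, ends, pairs = model
    if not words:
        return False
    return (words[0] in starts and words[-1] in ends
            and all(pair in pairs for pair in zip(words, words[1:])))
```

For the grammar S → a S b | a b, the model accepts "aabb", but it also accepts "aab", which the grammar does not generate. Everything the grammar generates passes the three checks, so the approximation is a superset.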

A superset approximation based on recursive transition networks is introduced in Nederhof [2000a]. The approach constructs a finite automaton for each non-terminal, and builds the complete recursive transition network by collecting the finite automata of all non-terminals.

Many approximation approaches prove to be exact if the context-free grammar being approximated is left-linear or right-linear [Grimley Evans, 1997; Pereira & Wright, 1997; Nederhof, 2000a]. Intuitively, this can be explained by the fact that a left-linear or right-linear grammar generates a regular language and is therefore equivalent to a finite-state automaton.
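The equivalence for the right-linear case can be sketched with the textbook construction, in which every non-terminal becomes a state. The encoding of the grammar, the state name `FINAL` and the function names are assumptions of this illustration; rules are restricted to the forms A → a B, A → a and A → ε.

```python
FINAL = "<final>"  # hypothetical name for the single extra accepting state


def right_linear_to_nfa(rules):
    """Turn rules of the form A -> a B | a | epsilon into an NFA.

    rules maps each non-terminal to a list of right-hand sides (tuples).
    Returns (transitions, accepting), where transitions maps
    (state, terminal) to the set of successor states.
    """
    transitions = {}
    accepting = {FINAL}
    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 0:            # A -> epsilon: A itself is accepting
                accepting.add(lhs)
                continue
            terminal, rest = rhs[0], rhs[1:]
            if terminal in rules or len(rest) > 1 or (rest and rest[0] not in rules):
                raise ValueError(f"not right-linear in the assumed form: {lhs} -> {rhs}")
            target = rest[0] if rest else FINAL
            transitions.setdefault((lhs, terminal), set()).add(target)
    return transitions, accepting


def nfa_accepts(nfa, start, words):
    """Simulate the NFA by tracking the set of states reachable so far."""
    transitions, accepting = nfa
    current = {start}
    for word in words:
        current = set().union(*(transitions.get((s, word), set()) for s in current))
    return bool(current & accepting)
```

For a toy grammar such as S → call N | help, N → john | mary, the resulting automaton accepts exactly the three sentences the grammar generates, with no approximation involved.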

Furthermore, in Nederhof [2000a] it is proven that context-free grammars that are not self-embedding generate regular languages. According to Chomsky [1959b], a self-embedding grammar is defined as follows:

A grammar is self-embedding if there is some A ∈ N such that A →* αAβ for some α ≠ ε and β ≠ ε.
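The definition can be checked mechanically. The following is a minimal sketch, not the algorithm of Nederhof [2000a]: it reads the arrow as derivation in any number of steps and treats α and β as strings of grammar symbols. The grammar encoding and the function name are assumptions of this illustration, and the grammar is assumed to contain no useless non-terminals.

```python
def is_self_embedding(rules):
    """Check whether some non-terminal A derives alpha A beta with both
    alpha and beta non-empty.

    rules maps each non-terminal to a list of right-hand sides (tuples of
    symbols); every symbol that is not a key of rules is a terminal.
    """
    for origin in rules:
        # A state (B, left, right) means: origin derives alpha B beta, where
        # the flags record whether alpha and beta are non-empty.
        start = (origin, False, False)
        seen = {start}
        stack = [start]
        while stack:
            nonterminal, left, right = stack.pop()
            for rhs in rules[nonterminal]:
                for i, symbol in enumerate(rhs):
                    if symbol not in rules:
                        continue  # terminals are never expanded
                    state = (symbol,
                             left or i > 0,               # something precedes symbol
                             right or i < len(rhs) - 1)   # something follows symbol
                    if state == (origin, True, True):
                        return True
                    if state not in seen:
                        seen.add(state)
                        stack.append(state)
    return False
```

Under this reading, S → a S b | a b is self-embedding, whereas the right-linear grammar S → a S | b is not. The grammar S → a S | S b is also reported as self-embedding, because S derives a S b in two steps.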

A grammar that is not self-embedding is defined to be a strongly regular grammar [Nederhof, 2000b]. The proof is based on a constructive algorithm that maps a strongly regular grammar into an equivalent finite automaton [Nederhof, 2000a; Nederhof, 2000b]. This proof does not conflict with Theorem 8.15 in Hopcroft & Ullman [1979], since strong regularity is a sufficient condition for the language to be regular, but not a necessary one. That is, there are grammars which are not strongly regular but which nevertheless generate a regular language. For such grammars, the approximations generate a larger language.

In practice, finite automata approximations are normally applied to enhance recognition accuracy. They are used as a front-end filter for the real parser. It is therefore acceptable that a certain percentage of ungrammatical input is recognized. It is also acceptable that “pathological” grammatical sentences which seldom occur in practice are rejected; sentences requiring multiple levels of self-embedding are an example. Given these practical considerations, most approximations are acceptable in terms of approximation quality. The most serious problem is rather the complexity of constructing the automata from the compact representation for large grammars [Nederhof, 2000a].

6.4 Comparison of Context-free Grammars

In document Song, Dongyi (2006): Combining Speech User Interfaces of Different Applications. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 105-108)