3.1 Fundamental concepts in NLP
3.1.1 Approaches to processing natural language
The two basic approaches for processing natural language are usually referred to as symbolic and stochastic approaches (Jurafsky and Martin, 2009: p. 44).
Symbolic approaches to NLP are based on algorithms that allow humans, usually computational linguists, to write dictionaries or rules that determine what kind of linguistic operations can be performed on a text. These operations usually imply the linguistic analysis of the words to obtain the corresponding morphological, syntactic, semantic or pragmatic information. According to Jurafsky and Martin (2009: p. 44), symbolic approaches to NLP are related to research lines such as formal language theory, reasoning and type logic.
Stochastic approaches apply data-driven techniques to linguistic tasks using like- lihood mathematics and prediction models. These approaches consist in extracting linguistic knowledge from data, usually manually annotated by experts, by means of algorithms that generalise word behaviour on the basis of some sort of distributional property grasped by mathematical principles.
Reasons to choose between one or the other are often related to the size of the data to be handled, as well as the complexity of the phenomenon to be tackled. On the one side, NLP tasks that are well understood in terms of linguistics or that can be reasonably abstracted into lexico-grammatical patterns by NLP specialists are more convenient for symbolic approaches. This is also true for tasks for which the amount of data is low. By contrast, tasks for which there is a large amount of (usually) annotated data, or which are not easily explained in terms of linguistics or lexico-grammatical patterns, are typically tasks suited to stochastic approaches.
Although in the beginning NLP researchers tended to work following either one approach or the other, in the 1990s researchers started to work on hybrid solutions for natural language processing, that is, NLP solutions that used both symbolic approaches and stochastic approaches (see Resnik, 1995, Padr´o, 1998). The use and implementation of hybrid approaches is still a hot topic today (see for instance the latest EACL workshop on Innovative hybrid approaches to the processing of textual data, http://www-limbio.smbh.univ-paris13.fr/membres/hamon/hybrid/).
Researchers in the field discussed the advantages and disadvantages of one or the other (see for instance Tapanainen and Voutilainen, 1994, and Voutilainen and Padr´o, 1997). This thesis does not make a point in this respect: We argue that, if properly designed and implemented, both approaches are equally useful and effi- cient. However, under certain circumstances – constraintness of the task, availability of large or annotated corpora, cognitive complexity of the task, and so on –, applying one approach or the other might be more efficient. What is relevant to us, in line with Jurafsky and Martin (2009: p. 36), is that “what distinguishes language pro- cessing applications from other data processing systems is their use of knowledge of
language”, which corresponds to knowledge in phonetics and phonology, morphology, syntax, semantics, pragmatics and dialogue.
3.1.1.1 Deep versus shallow NLP processing
The result of applying NLP techniques to a text is a linguistically analysed text, which is often an intermediate step before an applied task (from machine transla- tion to phone-based banking) can be performed. A typical full syntactic analysis for a sentence such as The cat ate the fish. is represented in Figure 3.1.1 The NLP
processing techniques that provide such full-fledged morphosyntactic parses are com- monly referred to as deep parsing techniques. Higher levels of analysis might also be tackled, but this does not make a difference in the point discussed here.
(a) (b)
Figure 3.1: Full syntactic parse of a sentence in parenthetic and tree representation. Deep parses include all types of relations and interdependencies between the words in the sentences. For instance, in Figure 3.1, the subject the cat is described as consisting of a determiner and a noun that, together, form a noun phrase, of which the noun is the head, as represented in the graphical tree in Figure 3.1b. Deep parsing creates too many difficulties for the underlying algorithms, mainly related with the high degree of ambiguity and the complexity of the linguistic structures, though it is also true that certain tasks do require it (e.g. prepositional phrase attachment or coordination ambiguities, Jurafsky and Martin, 2009: p. 451).
In the early 1990s, Abney (Abney, 1991, 1996) introduced the concept of partial, or shallow, parsing as well as the notion of chunk, which he defined as clusters of words that correspond in some way to prosodic patterns (1991: p. 1). Although he applied shallow techniques to syntactic parsing, the term is nowadays generally used any kind NLP task implying linguistic annotation for which a less complex analysis suffices – and replaces a full linguistic analysis.
1Parsed with the Stanford Parser at http://nlp.stanford.edu:8080/parser in June 2012. Visuali-
A typical partial or shallow parse of the sentence The cat ate the fish is represented in Figure 3.2.2 This type of analysis ignores the relationships between determiners
and nouns, as well as the relations between the verb phrase and the two noun phrases.
Figure 3.2: Partial syntactic parse of a sentence in parenthetic representation. Shallow approaches simplify the parsing strategy in that only the amount of information needed to complete the task will be extracted. Moreover, NLP prac- tice has shown that this simplification of the analysis facilitates the application of NLP techniques to real-life tasks for which a complete processing of the text is not needed. This is the case of Information Extraction tasks, where templates requiring specific data must be completed (Jurafsky and Martin, 2000: p. 385–386, Jurafsky and Martin, 2009: ch. 22).3
An important advantage of shallow approaches to NLP is the use of cascades of finite-state automata (Abney, 1996, Jurafsky and Martin, 2009: pp. 450–451), which are much more efficient than standard parsing algorithms. Of course, efficiency is improved at the cost of coverage, but the tasks for which it is applied benefit from the trade.