
2.2 Selection of the appropriate modality

2.2.1 First level: Necessity and possibility

As stated in Section 1.1, the final objective of this work is twofold:

• To perform a comparative quantitative study between Spanish and Japanese modality based on spoken corpora.

• To develop an automatic modality tagger for both languages based on rules extracted from the theoretical and empirical information.

Since we are working in a computational field, we face the two fundamental challenges that any study in this area encounters when aiming for the highest possible precision and recall: ambiguity resolution and portability. As Abney (2011) explains, any natural language is filled with ambiguity. For example, automatically assigning a morphological category to the word duck in the following sentences could prove difficult (from Abney, 2011, p. 4):


(9) a. When he began flailing about, he made her duck.

b. When he invited her to dinner, he made her duck.

For a competent speaker of English with some linguistic knowledge, it is easy to tag duck in sentence (a) as a verb and duck in sentence (b) as a noun, but formalising this for a computer is not so straightforward. There are two solutions to this problem: (1) a human writes the rules for the computer to apply, a solution that was very popular in the 1980s, or (2) the more modern one, having the computer derive its own rules via machine learning.

Machine learning, powerful as it is, does not provide the linguist with any linguistic explanation for the solution, only probability calculations. One way of studying modality would be to manually annotate what we consider the attitude or subjectivity of the speaker and then allow the computer to learn from those annotations automatically. This is a possible solution for trends a and b, as we saw in the previous section. However, not only would we obtain a method that is useless linguistically speaking, as the problem of formalisation would remain unsolved for humans, but subjectivity is also very difficult to define and, due to its vagueness, one annotator may understand it differently from another.

The creation of rules allows the problem to be solved on the basis of our linguistic knowledge. The formalisation is made by the human and then processed by the computer. The drawback of this approach, as has been revealed in recent decades, is its inability to deal with situations that are impossible to standardise. Natural languages are ambiguous, complex and constantly changing, and many situations would require too many rules to process. When working with this approach, the feature under study should be as contained and objective as possible.

The second challenge mentioned by Abney is portability, or “the difficulty of porting a system developed for one subject domain to a new domain” (2011, p. 4). As shown by Biber (Biber, 1991; Biber et al., 1999), language frequencies and variables change according to the type of text and discourse. Any kind of study has to take this into account, and computational ones are no exception. If the approach is rule-based, the rules should be contained enough to adapt to different situations. If it is generated through machine learning, the training process must be repeated in different types of texts.

This study aims to tackle these problems with the creation of hand-made rules. The focus is rule-based, as we want to formalise the coding of modality and create a series of instructions, based on observations in theoretical studies and corpora, for a tagger to automatically annotate modal markers. Therefore, the understanding of what modality is, and of the way it is coded in the sentence, must resolve these challenges as efficiently as possible. Ambiguity arises because the encoding of modality in the text may be ambiguous: one marker can denote several types of modality, as I will explain below. Portability arises because we are moving between two languages, registers (formal, informal) and discourse types (monologues, conversations and dialogues). Therefore, our understanding, definition, classification and marking of modality must fulfil the following requisites:

1. It must account for the grammatical differences between Spanish and Japanese.

2. It must have a morphological and syntactic approach, moving away from pragmatics.

3. It must work independently from context.

4. It must classify modal markers avoiding as much ambiguity as possible, providing a sufficient amount of relevant information.

5. It must be compatible with other elements present in the discourse like negation or ellipsis.

The best way to approach this will be through modal logic, as it easily resolves ambiguity, portability and formalisation. Also, one of the most successful and widespread applications of modern logic, especially mathematical logic, has been the development of computers and computer programs. The best way to formalise modality into rules will therefore be through the formal aspects of modality based on logic.

The most common understanding of modal logic (once again, there is a lack of consensus on the matter) is the so-called alethic logic, which holds that the truth value of a proposition representing a state of affairs may be qualified as either necessary or possible, expressed through modal markers such as adverbs (possibly, necessarily, etc.) or auxiliaries (must, may, etc.). It rests on the philosophical studies pioneered by Aristotle and Boethius and adapted by many typological linguists, as explained previously. Consider the following sentences:

(10) a. It may rain tomorrow.

b. You must eat more vegetables.

c. I am possibly mistaken.

Sentences 10a., 10b. and 10c. can be represented respectively by the following formulae selecting between possible and necessary:

(11) a. The fact that tomorrow is raining is possibly true.

b. The fact that you eat vegetables is necessarily true.

c. The fact that I am mistaken is possibly true.

Since alethic modality is a propositional or sentential logic, in that it studies the modification of propositions (in this case, by means of necessity and possibility), we can express the formulae with symbols. If a proposition p is necessary, it is represented as □p. If a proposition p is possible, it is represented as ♢p. This can be applied to any language, which makes it very attractive for cross-linguistic studies. Sentences 12 and 13 show an example in each language.

(12) 明日 は 雨 かもしれない
ashita wa ame kamoshirenai
tomorrow top rain may.MODadv

‘It may rain tomorrow’

(13) Probablemente lluev-a mañana
Probably.MODadv rain-sbjv tomorrow

‘It (will) probably rain tomorrow’

Sentences 12 and 13 can both be formulated as ‘the fact/truth value that tomorrow is raining is possible’, or simply ‘♢p’, p being the proposition ‘rain tomorrow’.

The notions of necessity and possibility may lead to misunderstandings. To better explain the concepts, we must address Kripke’s ‘possible worlds’ (1963): the understanding that, at least abstractly, an infinite number of worlds, universes, or states of affairs is possible at any moment. In our case, we are evaluating the speaker’s utterances; hence, the set of possible worlds (w) is established by the speaker. The fact that it may rain tomorrow is established by the speaker, according to his or her own knowledge.

Logic assumes each sentence is either true or false (the Law of Excluded Middle), but not both true and false (the Law of Non-Contradiction) (Kaufmann et al., 2006). If the truth value of a proposition is necessary (□p), it is true in all possible worlds. If the truth value of a proposition is possible (♢p), it is true in at least one of the possible worlds. The set of possible worlds against which the proposition is evaluated has been called the ‘modal base’ (R) (Kratzer, 1981). Taking a sample sentence (φ), and (V) as the evaluation function (0 for False, 1 for True), this can be formalised as follows (taken from Kaufmann et al., 2006, p. 80):

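(14)
\[
V_w(\Diamond_R \varphi) =
\begin{cases}
1 & \text{if } V_{w'}(\varphi) = 1 \text{ for some } w' \in R_w \\
0 & \text{otherwise}
\end{cases}
\]
\[
V_w(\Box_R \varphi) =
\begin{cases}
1 & \text{if } V_{w'}(\varphi) = 1 \text{ for all } w' \in R_w \\
0 & \text{otherwise}
\end{cases}
\]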
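The evaluation in (14) can be illustrated with a minimal Python sketch. The toy worlds, the proposition name "rain" and the function names (holds, possibly, necessarily) are assumptions made for illustration only; they are not part of Kaufmann et al.'s formalisation or of the tagger developed in this work.

```python
from typing import Dict, Set

World = str
# Hypothetical toy model: each world maps to the set of propositions true in it.
Valuation = Dict[World, Set[str]]
# Modal base R: for each world w, the set of worlds considered from w.
ModalBase = Dict[World, Set[World]]


def holds(valuation: Valuation, world: World, prop: str) -> int:
    """V_w(phi): 1 if the proposition is true in the given world, else 0."""
    return 1 if prop in valuation[world] else 0


def possibly(valuation: Valuation, base: ModalBase, world: World, prop: str) -> int:
    """V_w(<>phi): 1 if phi is true in at least one world of R_w, else 0."""
    return 1 if any(holds(valuation, w2, prop) == 1 for w2 in base[world]) else 0


def necessarily(valuation: Valuation, base: ModalBase, world: World, prop: str) -> int:
    """V_w([]phi): 1 if phi is true in every world of R_w, else 0."""
    return 1 if all(holds(valuation, w2, prop) == 1 for w2 in base[world]) else 0


# Assumed example: from w0 the speaker considers w1 (rain) and w3 (no rain).
valuation = {"w0": set(), "w1": {"rain"}, "w2": {"rain"}, "w3": set()}
base = {"w0": {"w1", "w3"}, "w1": {"w1", "w2"}}

rain_possible_at_w0 = possibly(valuation, base, "w0", "rain")      # 1
rain_necessary_at_w0 = necessarily(valuation, base, "w0", "rain")  # 0
rain_necessary_at_w1 = necessarily(valuation, base, "w1", "rain")  # 1
```

In this sketch ‘It may rain tomorrow’ evaluates to 1 at w0 because one of the worlds in the modal base of w0 contains rain, while the necessity reading evaluates to 0 because another does not.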

In addition to this, negation can also be processed easily. As we saw in Figure 1 of Section 2.1 by Apuleius, necessity and possibility are connected, and each can be expressed through the other by means of negation. Adding negation to one operator changes it into the opposite operator:

(15) a. □p ⇐⇒ ¬♢¬p

b. ♢p ⇐⇒ ¬□¬p
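The equivalences in (15) can be checked by brute force on small models. The sketch below is only an illustration under simplifying assumptions: a set of worlds is represented as a tuple of truth values for p, and the names box, diamond and duality_holds are hypothetical.

```python
from itertools import product
from typing import Tuple


def box(truths: Tuple[bool, ...]) -> bool:
    """Necessity: p is true in every world considered."""
    return all(truths)


def diamond(truths: Tuple[bool, ...]) -> bool:
    """Possibility: p is true in at least one world considered."""
    return any(truths)


def duality_holds(max_worlds: int = 4) -> bool:
    """Check (15a) and (15b) for every assignment of p over 0..max_worlds worlds."""
    for n in range(max_worlds + 1):
        for truths in product([False, True], repeat=n):
            negated = tuple(not t for t in truths)  # truth values of 'not p'
            if box(truths) != (not diamond(negated)):      # (15a)
                return False
            if diamond(truths) != (not box(negated)):      # (15b)
                return False
    return True


result = duality_holds()  # True
```

The check is exhaustive only up to the chosen number of worlds; it illustrates the equivalences rather than proving them in general.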

That is, ‘necessary p’ is equivalent to ‘not possible not p’: ‘It will rain tomorrow’ if and only if ‘it is not possible for it not to rain tomorrow’. Likewise, ‘possible p’ is equivalent to ‘not necessary not p’: ‘It may rain tomorrow’ if and only if ‘it is not necessary for it not to rain tomorrow’. In other words, the negation of a possibility becomes a necessity in the form of an impossibility, whereas the negation of a necessity becomes a possibility, since a ‘not necessity’ implies the possibility of the event becoming true or not. Another example can be seen in the following sentences, taken from Palmer (2001, p. 91):

(16) a. Mary must come tomorrow. - Necessity

b. Mary may come tomorrow. - Possibility

c. Mary can’t come tomorrow. - Not-possibility, i.e. necessity. (Mary not coming tomorrow is necessarily true)

d. Mary needn’t come tomorrow. - Not-necessity, i.e. possibility. (Mary not coming tomorrow is possibly true)

Therefore, the first level of the tree that forms our classification of modality will be divided into Necessity and Possibility. If the modal marker states that the sentence is true in at least one of the worlds perceived by the speaker, it will be tagged as Possibility. If, on the other hand, the marker states that the sentence is true in every one of those worlds, it will be tagged as Necessity. The next subclassification will consist of Epistemic or Deontic modality, and will be explained in the following section.
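As a rough illustration of this first level, the following Python sketch assigns Necessity or Possibility to a marker and flips the tag when the modal operator itself is negated, following (15) and (16). The lexicon, the tag strings and the function names are hypothetical and cover only markers mentioned in this section; this is not the rule set of the tagger developed in this work. The negated flag is assumed to mean negation of the operator (as in can’t and needn’t), not negation of the proposition.

```python
from typing import Optional

NECESSITY = "Necessity"
POSSIBILITY = "Possibility"

# Hypothetical mini-lexicon built from the examples in this section.
LEXICON = {
    "must": NECESSITY,
    "need": NECESSITY,
    "may": POSSIBILITY,
    "can": POSSIBILITY,
    "possibly": POSSIBILITY,
    "kamoshirenai": POSSIBILITY,
    "probablemente": POSSIBILITY,
}


def flip(tag: str) -> str:
    """Negating an operator yields the opposite one: not-possible is a
    necessity, not-necessary is a possibility."""
    return POSSIBILITY if tag == NECESSITY else NECESSITY


def first_level_tag(marker: str, negated: bool = False) -> Optional[str]:
    """Return the first-level tag of a marker, or None if it is not listed."""
    base = LEXICON.get(marker.lower())
    if base is None:
        return None
    return flip(base) if negated else base


tag_must = first_level_tag("must")                  # "Necessity"   (16a)
tag_may = first_level_tag("may")                    # "Possibility" (16b)
tag_cant = first_level_tag("can", negated=True)     # "Necessity"   (16c)
tag_neednt = first_level_tag("need", negated=True)  # "Possibility" (16d)
```

A lookup table keeps each rule inspectable by the linguist, which is the motivation given above for preferring hand-made rules; a real rule set would also need to decide the scope of negation before applying the flip.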