Polarity Reversals Under Sluicing

Margaret Ilona Kroll

University of California, Santa Cruz

This project presents novel English sluicing data that cannot be captured under existing theories of sluicing. The data supply robust evidence for a previously unobserved phenomenon in which the elided content and the antecedent content in a sluiced construction contain opposite polarity. The phenomenon challenges current accounts of identity conditions on ellipsis by demonstrating that a greater mismatch between antecedent and elided content is possible than previously thought. This project provides six categories of corpus and constructed polarity-reversal examples and presents a new, pragmatic sluicing analysis that accounts for and unites the data.

Sluicing, first noted by Ross (1969), is an ellipsis phenomenon in which the TP of an interrogative is elided, stranding an overt wh-phrase in the CP domain. The dominant semantics-based theory of sluicing is Merchant's (2001) theory of e-GIVENness. Simplified, e-GIVENness says that a TP can be elided if and only if it semantically entails and is entailed by a salient antecedent TP. e-GIVENness incorrectly predicts the polarity reversal data to be impossible. Example (1) is a polarity reversal in which negation is present in the ellipsis site (subscripted E), but absent in the antecedent site (subscripted A). Strikethrough material signifies elided content.

(1) [I don't think that [TP Trump_i will comply with the debate requirements]_A]_A' but I don't know why [TP he_i won't comply with the debate requirements]_E.

e-GIVENness requires mutual semantic entailment between A and E; however, neither TP in (1) semantically entails the other. A move to include the matrix negation in the antecedent by taking the entire matrix clause A' as antecedent also fails on its two available readings. Interpreting negation in its surface position fails to deliver A'/E entailment, as the speaker's lacking a belief that Trump will comply does not semantically entail him not complying. Interpreting negation in the embedded clause A—the neg-raised and only felicitous reading of (1) (Horn 1989)—also fails to deliver A'/E entailment, as the speaker's thinking that Trump won't comply does not semantically entail that he won't. That is, the speaker can have false beliefs.

Interim Conclusion I: Bidirectional semantic entailment accounts are too restrictive and fail to predict the existence of polarity reversal data.
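The failure of mutual entailment in (1) can be checked in a toy possible-worlds model. The sketch below is only an illustration of the simplified condition stated above, not Merchant's full definition: the two-feature worlds and the names `WORLDS`, `entails`, and `mutually_entail` are assumptions introduced here.

```python
from itertools import product

# Each world is a pair (speaker_believes_wont_comply, trump_complies).
# This two-feature model is an illustrative assumption, not part of the abstract.
WORLDS = set(product([True, False], repeat=2))


def entails(p, q):
    """p entails q iff every p-world is also a q-world."""
    return p <= q


def mutually_entail(p, q):
    """Simplified e-GIVENness: entailment must hold in both directions."""
    return entails(p, q) and entails(q, p)


A = {w for w in WORLDS if w[1]}              # Trump will comply
E = {w for w in WORLDS if not w[1]}          # Trump won't comply
A_NEG_RAISED = {w for w in WORLDS if w[0]}   # speaker thinks Trump won't comply

print(mutually_entail(A, E))             # False
print(mutually_entail(A_NEG_RAISED, E))  # False
print(entails(A_NEG_RAISED, E))          # False: the belief may be false
```

The world in which the speaker believes Trump won't comply but he does comply is what blocks the entailment from the neg-raised antecedent to E.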

I account for the neg-raised reading of A' in (1) with the pragmatic excluded-middle presupposition that Gajewski (2007) proposes is associated with neg-raising verbs. However, corpus example (2) shows that the polarity reversal phenomenon is robust beyond neg-raising constructions, and therefore cannot be explained by appealing to a syntactic account of neg-raising. The inference from A to E in (2) is a pragmatic entailment that ¬remember p → ¬p. No semantic entailment exists between a speaker not remembering an event and the event having not occurred, as the following is felicitous: “I don't remember being scared, but apparently I was!”

(2) Context: [O]n the day the Japanese invaded Pearl Harbor, Hummel was rounded up and locked in an internment camp along with about 2,000 other foreigners. . .

Sluice: “I don't know why [I wasn't scared]_E, but [I really cannot remember being scared]_A,” [Hummel] said. “It all seemed like great fun.”

Examples (1) and (2) show that sluicing may rely crucially on pragmatically-enriched content. Therefore, the identity condition proposed to hold between the antecedent and elided content must allow for the pragmatic enrichment of semantic content. I propose that the correct identity relationship needed for sluicing is contextual entailment. Informally, a TP can be elided iff it expresses a proposition that is entailed by the local context (c_L) and is uniquely salient. Formally:

The Sluicing Condition correctly predicts the availability of the ellipsis in (1) and (2) as well as in (3) below, in which antecedent and elided propositions are separated by exclusive disjunction:

(3) Context: Students in a math class were given an option to do an extra credit problem, but were required to mark the problem # on a spreadsheet. The professor and TA thought John would do a problem, but see nothing marked by his name. The TA says:

Sluice: [John_j either didn't do an extra credit problem]_A, or he_j didn't mark which one_i [he_j did t_i]_E.

The current account predicts (3) by utilizing Karttunen's (1974) local context for exclusive disjunction: for propositions p, q such that p ∨ q is uttered in a context c, c_L for p = c and c_L for q = c ∩ ¬p. Informally, the local context for the second disjunct of an exclusive disjunction is the negation of the first disjunct. Applied to (3), this predicts that the local context of E is the negation of the proposition expressed by A, [John didn't do an extra credit problem], which entails the proposition expressed by E, [John did an extra credit problem]. Formally:

Local Context for A and E: Sluicing Condition:

Interim Conclusion II: The proposal that the licensing condition for sluicing is sensitive to (i) the pragmatic enrichment of the antecedent content and (ii) the local context of the ellipsis site correctly accounts for the polarity reversal sluices shown here.

Concerns about the over-generation of a pragmatic sluicing account are resolved by using existing constraints on question-asking in felicitous discourses:

The Well-Formedness Condition: If the underlying question of a sluice is infelicitous, then the sluice will not be well-formed (see e.g. Dayal and Schwarzschild 2010).

Maximize Presupposition (Heim 1991): A question must presuppose the existence of any partial answers that are already available in the discourse (cf. Barker's 2013 Answer Ban).

Conclusion: Polarity reversals under sluicing present a new challenge to the enterprise of determining the conditions under which linguistic content can be felicitously elided. This project shows that, counter to its dominant treatment in the syntactic literature, ellipsis is an inherently pragmatics-sensitive phenomenon subject to contextual licensing. The ability to elide linguistic content therefore fits naturally into general theories of constraints regulating coherent discourses.

Sluicing Condition: A TP α can be deleted iff ExClo(⟦α⟧^g) expresses a proposition p, such that c_L ⊆ p and p is uniquely salient.

Local Context for A and E in (3):

c_L-A = c = W

c_L-E = c ∩ ¬A = W ∩ {w : ¬[¬∃x[extra credit problem(x)(w) ∧ do(x)(j)(w)]]}

Sluicing Condition applied to (3):

c_L-E ⊆ ExClo(⟦E⟧^{w,g}) = {w : ¬[¬∃x[extra credit problem(x)(w) ∧ do(x)(j)(w)]]} ⊆ {w : ∃x[extra credit problem(x)(w) ∧ do(x)(j)(w)]}
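The local-context computation for (3) can be replayed with finite sets of worlds. This is a minimal sketch under stated assumptions: the two-feature world model and the helper `local_contexts` are hypothetical, and the unique-salience requirement of the Sluicing Condition is not modeled, only the entailment c_L ⊆ p.

```python
from itertools import product

# World = (john_did_a_problem, john_marked_it); a toy model assumed for illustration.
W = set(product([True, False], repeat=2))

A = {w for w in W if not w[0]}  # John didn't do an extra credit problem
E = {w for w in W if w[0]}      # ExClo of E: John did some extra credit problem


def local_contexts(c, p):
    """Local contexts for the disjuncts of 'p or q' uttered in context c."""
    # c_L for p is c itself; c_L for q is c ∩ ¬p, i.e. c minus the p-worlds.
    return c, c - p


c_L_A, c_L_E = local_contexts(W, A)

print(c_L_E <= E)  # True: the local context of the ellipsis site entails E
print(W <= E)      # False: the global context alone does not
```

The second check shows why the local context matters: without the update by ¬A, nothing in the context entails that John did a problem.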

References

Barker, Chris. 2013. Scopability and Sluicing. Linguistics and Philosophy 36: 187-223.

Dayal, Veneeta and Roger Schwarzschild. 2010. Definite Inner Antecedents and Wh-Correlates in Sluices. In Rutgers Working Papers in Linguistics 3: 92-114. Eds. P. Staroverov, D. Altschuler, A. Braver, C. Fasola, and S. Murray. New Brunswick, NJ: LGSA.

Gajewski, Jon. 2007. Neg-Raising and Polarity. Linguistics and Philosophy 30: 289-328.

Heim, Irene. 1991. Artikel und Definitheit. In Semantics: An International Handbook of Contemporary Research, Berlin: de Gruyter.

Horn, Laurence. 1989. A Natural History of Negation. Chicago: University of Chicago Press.

Karttunen, Lauri. 1974. Presuppositions and Linguistic Context. Theoretical Linguistics 1: 181-194.

Merchant, Jason. 2001. The Syntax of Silence. Oxford University Press.

Ross, John. 1969. Guess who? Proceedings from the 5th Meeting of the Chicago Linguistics

BAYES NETS AND THE DYNAMICS OF PROBABILISTIC LANGUAGE

Daniel Lassiter, Stanford University

Abstract to be presented at Sinn und Bedeutung 21, University of Edinburgh, September 4-6 2016

Representing confidence & ignorance. You are asked to predict the outcome of a competition (say, a chess game between A and B) based on your background knowledge. Consider the following cases:

i) You know nothing at all about A and B.

ii) After watching many matches between A and B, you are confident that they are evenly matched.

In both cases, “A and B are equally likely to win” is an appropriate judgment, but for intuitively very different reasons. It is widely assumed that the classical Bayesian theory, where an agent’s uncertainty is represented by a unique measure, cannot account for this difference. Halpern [’03, §2.3] takes a similar case to motivate the use of sets of measures or related enrichments: “probability is not good at representing ignorance”. Such considerations have also led to widespread use of these devices in formal epistemology [Joyce’10, Elga’10].

However, de Finetti [’77] and Pearl [’88, §7.3] point out that the availability of hierarchical models, e.g. Bayes nets, undercuts the intuitive motivation for this more complex representation. These models are used in many modern applications in statistics, AI, and psychology. Probabilities derive from graphs representing causal relations among variables, together with the conditional distribution on each variable given its parents. Bayes nets readily represent the two kinds of uncertainty: see Fig. 1a. Player i’s performance is a Gaussian with parameters µ_i (skill) and σ_i (consistency). Each parameter’s parents represent uncertainty about causal factors. P(A beats B) is the probability that A’s noisy performance exceeds B’s, here equal to P(µ_A > µ_B). There is always a precise best guess about µ_A − µ_B, but confidence depends on the amount of evidence: low when only general domain knowledge is available, and high when evidence indicates equal skill (Fig. 1b).

Figure 1

(a) Hierarchical model of player performance in a match between A and B, inspired by the Microsoft TrueSkill system used to rank Xbox Live players [Bishop’13]. Nodes: winner, perf_A, µ_A, σ_A, perf_B, µ_B, σ_B, with ∀i: µ_i ∼ N(0, 1) and ∀i: σ_i = .1. Each perf_i has a N(µ_i, σ_i) distribution, varying with the player’s uncertain skill µ_i and consistency σ_i. We simplify for presentational purposes by fixing σ_i = .1 for all i. winner is a deterministic node, with value A if perf_A > perf_B and B otherwise.

(b) Panels (each plotting Density): candidate performance distributions; marginal on performance; performance difference, case (i); performance difference, case (ii). Top left: some of the infinitely many candidate performance distributions varying with µ_i. Right: marginal on performance. Bottom: distributions on perf_A − perf_B. Both are centered on 0, so that P(winner = x) = .5 for x ∈ {A, B}. Case (i) [left] is the prior; case (ii) [right] is conditional on each player winning 15 of 30 matches. Increased confidence is reflected in lower variance of the expected difference.
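The contrast between cases (i) and (ii) can be reproduced numerically. The following is a rough sketch, not the computation behind Fig. 1: it uses a grid approximation over the skill difference d = µ_A − µ_B (an assumption made here for simplicity), with the prior d ∼ N(0, √2) that follows from µ_i ∼ N(0, 1) and the fixed σ_i = .1. The names `p_a_beats_b` and `summarize` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 0.1                                # fixed consistency, as in Fig. 1a
STD_NORMAL = NormalDist()
PRIOR_DIFF = NormalDist(0.0, sqrt(2.0))    # mu_A - mu_B when each mu_i ~ N(0, 1)


def p_a_beats_b(d):
    """P(perf_A > perf_B) given the skill difference d = mu_A - mu_B."""
    return STD_NORMAL.cdf(d / (SIGMA * sqrt(2.0)))


def summarize(wins_a, wins_b, lo=-6.0, hi=6.0, n=2401):
    """Grid approximation: return (P(A wins next match), sd of mu_A - mu_B)."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    probs = [p_a_beats_b(d) for d in grid]
    # Prior density times the likelihood of the observed win counts.
    weights = [PRIOR_DIFF.pdf(d) * p ** wins_a * (1 - p) ** wins_b
               for d, p in zip(grid, probs)]
    total = sum(weights)
    weights = [w / total for w in weights]
    p_win = sum(w * p for w, p in zip(weights, probs))
    mean = sum(w * d for w, d in zip(weights, grid))
    sd = sqrt(sum(w * (d - mean) ** 2 for w, d in zip(weights, grid)))
    return p_win, sd


p_prior, sd_prior = summarize(0, 0)     # case (i): no match evidence
p_post, sd_post = summarize(15, 15)     # case (ii): each player won 15 of 30
print(round(p_prior, 3), round(p_post, 3))  # both 0.5
print(sd_post < sd_prior)                   # True: same best guess, more confidence
```

In both cases the point prediction is .5; only the spread of the distribution over the skill difference changes, which is the sense in which a single measure over a hierarchical model separates ignorance from confident parity.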

Communicating uncertainties. Recent work suggests that likely, probable, and perhaps other epistemic operators have a probabilistic semantics as in (1) [Yalcin’05, ’07, Swanson’06, Lassiter’10,’15]. If so, we need a model of the way that people take up the information in probability statements, as in dialogue (2).

(1) ⟦likely⟧^P = λφ_⟨s,t⟩. P(φ) > .5

(2) a. Carl: Who is going to win the chess game between A and B?

b. David: B is likely to win.

Carl has learned something here, but what? Conditioning won’t do: ‘P(rain) > .5’ does not pick out a set of worlds, so P(⋅ ∣ P(rain) > .5) is not even defined. Nor will treating likely as a test on information states solve the problem [cf. Veltman’96], since then (2b) should be informationally inert.

Yalcin’s [’12] approach involves a dynamic semantics/pragmatics using sets of measures. Update by “φ is likely” eliminates probability measures µ where µ(φ) ≯ .5, cf. (3). Probabilistic statements are thus straightforwardly informative, as desired.

(3) I_Carl ⟹_{learn (2b)} I_Carl ∩ {µ ∣ µ(B wins) > .5}

To be clear, Yalcin’s modeling purposes are not the same as ours: his interest is the effect of probabilistic assertions on an abstract representation of the discourse common ground, rather than the way that individual participants’ credal states are affected by taking up the information conveyed by such statements. Still, Yalcin’s account gives us an initially plausible dynamics for the latter purpose as well, and so provides a reasonable starting point in our search for a solution.

An extension of Yalcin’s account to updating credal states gives us several reasons for pause, though. First, it requires different update procedures for probabilistic and non-probabilistic language. Second, the procedure accounts for the communication of known uncertainties—e.g., the color of a ball drawn with uniform probability from one of two urns of known composition—only if we use sets of measures to represent these cases as well, violating the spirit of the argument from ignorance. Third, the ability of hierarchical models to represent degrees of ignorance calls into question whether the added complexity of sets of measures is ever needed. Problems with the treatment of inductive learning and of decision-making also create serious difficulties for this representation of uncertainty ([Elga’10,White’10]).

I propose to resolve this tension by modifying the probabilistic semantics, relativizing probability statements not to a measure P but to a Bayes net B, as in (4). Suppose φ is a value of variable V. Each world w is associated with a ‘local probability’ function P_w, which takes φ to its conditional probability given the values, at w, of the parents of V’s closest non-deterministic ancestors. (As a future event, V cannot be observed; if it were, P_w would take φ to 0 or 1 depending on the observed value.)

Associating probabilistic statements with sets of worlds makes it possible to condition on them, simplifying the dynamics dramatically: all update is conditionalization. For example, conditioning on “B is likely to win” (2b) assigns zero mass to worlds where the parents of the closest non-deterministic ancestors of winner (µ_A, σ_A, µ_B, and σ_B) do not interact so as to favor B’s victory: see (5). The effect of conditioning on {w ∣ P_w(B wins) > .5}, depicted in Fig. 2, is to filter combinations of the closest non-deterministic ancestors of winner that do not make B wins likely given the conditional probability information encoded by the Bayes net. The result is that B’s strength must exceed A’s, the intuitively correct result, and this choice of conditioning expression is generated compositionally.

(4) ⟦likely⟧^B = λφ_⟨s,t⟩. {w ∣ P_w(φ) > .5}

(5) P_Carl ⟹_{learn (2b)} P_Carl(⋅ ∣ {w ∣ P_w(B wins) > .5})
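Entries (4) and (5) can be illustrated on a small finite set of worlds. The sketch below makes several assumptions that are not in the abstract: worlds are pairs (µ_A, µ_B) drawn from a coarse, hypothetical grid `SKILLS`, the prior over worlds is uniform, and σ is fixed at .1 so that the local probability P_w(B wins) depends only on the two skill values.

```python
from itertools import product
from math import sqrt
from statistics import NormalDist

SIGMA = 0.1                  # fixed consistency for both players
STD_NORMAL = NormalDist()
SKILLS = [-1.0, 0.0, 1.0]    # hypothetical coarse grid of skill values


def local_prob_b_wins(world):
    """P_w(B wins): probability that perf_B exceeds perf_A given the skills at w."""
    mu_a, mu_b = world
    return STD_NORMAL.cdf((mu_b - mu_a) / (SIGMA * sqrt(2.0)))


def likely(local_prob, worlds, threshold=0.5):
    """Entry (4): the set of worlds whose local probability exceeds the threshold."""
    return {w for w in worlds if local_prob(w) > threshold}


def condition(prior, event):
    """Entry (5): ordinary conditionalization of a distribution over worlds."""
    total = sum(p for w, p in prior.items() if w in event)
    return {w: (p / total if w in event else 0.0) for w, p in prior.items()}


worlds = list(product(SKILLS, SKILLS))
prior = {w: 1 / len(worlds) for w in worlds}   # uniform prior, for illustration only
event = likely(local_prob_b_wins, worlds)
posterior = condition(prior, event)

# Every world that keeps positive mass has B stronger than A.
print(all(mu_b > mu_a for (mu_a, mu_b), p in posterior.items() if p > 0))  # True
```

Because the update is plain conditionalization on a set of worlds, the same `condition` function would serve for non-probabilistic assertions as well.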

Ongoing work suggests that this semantics also yields reasonable predictions for epistemics under other epistemics, such as definitely possible and probably unlikely (cf. [Moss’15]).

While many formal and empirical questions remain, we suggest that models of the dynamics of communication & reasoning could benefit from increased engagement with applied Bayesian work.

Figure 2: Effect of conditioning the prior on the truth of “B is likely to win”. Panel title: Update on (2b). Axes: Strength of A, Strength of B, Density.

References

Bishop’13, Model-based machine learning
Elga’10, Subjective probabilities should be sharp
de Finetti’77, Probabilities of probabilities
Halpern’03, Reasoning about Uncertainty
Joyce’10, Defense of imprecise credences
Lassiter’10, Gradable epistemic modals
Lassiter’15, Epistemic comparison
Moss’15b, Semantics & pragmatics of epistemic vocabulary
Pearl’88, Probabilistic Reasoning in Intelligent Systems
Swanson’06, Interactions With Context
Veltman’96, Defaults in update semantics
White’10, Evidential symmetry and mushy credence
Yalcin’05, A puzzle about epistemic modals
Yalcin’07, Epistemic modals
Yalcin’12, Context probabilism

Quantified indicative conditionals and the relative reading of “most”

Sven Lauer (sven@sven-lauer.net)

Georg-August-Universität Göttingen / University of Konstanz

Prerna Nadathur (pnadathur@stanford.edu)

Stanford University

Quantified indicative conditionals (QICs) constitute a challenge to semantic compositionality (Higginbotham 1986): while the interpretations of (1a) and (1b) are intuitively equivalent, there is no consensus as to how, or even whether, this can be compositionally derived.

(1) a. Everyone failed if they goofed off.

b. No one passed if they goofed off.

The most straightforward approach to this problem is built on what von Fintel and Iatridou (2002) dub the ‘folkloric’ solution (von Fintel 1998): assuming that if-clauses can semantically restrict nominal quantifiers, just as they restrict modals and quantificational adverbs on Kratzer’s (1986) influential analysis. The ‘restrictor analysis’ assigns equivalent semantic content to the sentences (1a,b) but has been argued to produce unsatisfactory results in several other cases (von Fintel and Iatridou 2002, Higginbotham 2003, Huitink 2009). Leslie (2009), however, has revived the restrictor analysis, defending a version where the problematic cases are resolved by postulating a modal quantifier taking wide scope over a nominal quantifier restricted by if.

The inverse reading of most-QICs. Kratzer (in press) points out a surprising reading for QICs under most. Out of context, (2) receives the ‘vanilla’ interpretation paraphrased in (2a), but in the right context, and with the right focus marking (as given in (3)), it can be interpreted as (2b). Observe that the continuation in (3) is incompatible with the ‘vanilla’ reading.

(2) Most kids used calculators if they had to do long divisions.

a. The majority of kids who had to do long divisions asked for calculators.

b. The majority of kids who asked for calculators were ones who had to do long division.

(3) You: Did you see kids using calculators when you volunteered in your son’s school yesterday? What did they use the calculators for?

Me: Most kids asked for calculators if they had to do [long divisions]_F. But I am pleased to report that most kids in my son’s school do long divisions by hand.

Prima vista, the existence of ‘reverse’ readings makes a strong case against a restrictor analysis of QICs (folkloric, Leslie-style, or otherwise). These accounts only predict the reading (2a), where the if-clause indeed appears to enter into the restriction of most, and the matrix VP provides the nuclear scope. On the ‘reverse’ reading (2b), however, it appears as if the matrix VP enters the restriction of most, while the if-clause provides the scope!

Kratzer (in press) uses the existence of (2b)-type readings to support the startling conclusion that the perceived interpretation of QICs arises non-compositionally. She proposes that in the logical form of QICs, a (material) conditional operator is embedded under the quantifier (as in (4)). This gives the patently wrong truth conditions in (4b) for (1b). Kratzer remedies this by appeal to pragmatic domain restriction. In vanilla readings of (1a,b), the upstairs quantifier happens to get restricted to individuals verifying the antecedent of the embedded conditional (yielding an interpretation equivalent to (5a,b)). For most, Kratzer’s approach must additionally assume that the conditional is interpreted biconditionally, via embedded conditional perfection (Geis and Zwicky 1971).

(4) a. All_x [pers(x)] [goof(x) ⊃ fail(x)]

b. No_x [pers(x)] [goof(x) ⊃ pass(x)]

(5) a. All_x [pers(x) ∧ goof(x)] [goof(x) ⊃ fail(x)]

b. No_x [pers(x) ∧ goof(x)] [goof(x) ⊃ pass(x)]
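The difference between (4b) and (5b) can be verified on a two-person model. This is an illustrative sketch only: the individuals `Ann` and `Bo`, the `Person` record, and the helper names are invented here, and every member of the domain is assumed to satisfy pers(x).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    goofed: bool
    passed: bool


def no(domain, restrictor, scope):
    """No_x [restrictor(x)] [scope(x)]: no restrictor-individual satisfies the scope."""
    return not any(scope(x) for x in domain if restrictor(x))


def material(p, q):
    """Material conditional p ⊃ q."""
    return (not p) or q


# Hypothetical model: Ann goofed off and failed; Bo did not goof off and passed.
people = [Person("Ann", goofed=True, passed=False),
          Person("Bo", goofed=False, passed=True)]


def is_person(x):
    return True


def goofed(x):
    return x.goofed


def conditional(x):
    return material(x.goofed, x.passed)


unrestricted_4b = no(people, is_person, conditional)            # (4b)
restricted_5b = no(people, goofed, conditional)                 # (5b)
restrictor_reading = no(people, goofed, lambda x: x.passed)     # if-clause as restrictor

print(unrestricted_4b)     # False: Bo verifies the conditional vacuously
print(restricted_5b)       # True
print(restrictor_reading)  # True
```

In this model (1b) is intuitively true, so the unrestricted logical form (4b) gives the wrong verdict, while (5b) and the restrictor reading agree with intuition.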

The ‘reverse’ reading (2b) can then be construed as a case where the pragmatic domain restriction proceeds differently: in particular, where the domain of the quantifier is restricted to those individuals satisfying the conditional consequent. Problematically, this predicts that the ‘reverse’ reading is available for all QICs, not just those involving most. But (1a,b) have no such readings.

We argue that the ‘reverse’ reading of most-QICs does not motivate abandoning compositionality, and show that it is easily accounted for on a restrictor analysis. Our proposal predicts that QICs with most (and many) have ‘reverse’ readings, while those with, e.g., every do not.

Proposal. Given a restrictor analysis, we show that the ‘reverse’ reading is simply an instance of the ‘relative’ reading of most. For concreteness, we analyze most à la Hackl (2009), as modified by Romero (2015). Most decomposes into the cardinal quantifier many (6a) and the focus-sensitive superlative morpheme -est, which can scope independently of its host (Heim 1999, (6b)). On the relative reading, (3) is assigned the logical form sketched in (7a). With focus on long divisions, the comparison class for -est can be assumed to be the one in (7b). This produces the desired interpretation (7c).

(6) a. ⟦many_card⟧ = λd_n λP_et λQ_et. ∃x : P(x) [Q(x) & |x| ≥ d], where n is the degree type

b. ⟦-est⟧ = λ𝒞_⟨dt,t⟩ λP_dt. ∃d [P(d) & ∀C ∈ 𝒞 [C ≠ P → ¬C(d)]], where 𝒞 is a comparison class

(7) a. [-est 𝒞]_1 [t_1-many kids [used calcs if they had long-div_F]] ∼ 𝒞

b. Alts: ⟦𝒞⟧ ⊆ {λd′. d′-many kids used calcs if they had long-div, λd′. d′-many kids used calcs if they had decimals, . . . }

c. ∃d ∃x : (kid(x) & long-div(x)) [calc(x) & |x| ≥ d] & ∀C ∈ ⟦𝒞⟧ [C ≠ λd′. ∃x : (kid(x) & long-div(x)) [calc(x) & |x| ≥ d′] → ¬C(d)]

⇝ # calculator-using long-div kids > # calculator-using kids doing other problem types
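The truth conditions in (7c) reduce to a comparison of counts, which can be contrasted with the ‘vanilla’ proportional reading (2a) on a small data set. The sketch below is an assumption-laden illustration: the counts, the task labels other than long division, and the function names are invented, and the degree semantics is collapsed into cardinality comparison.

```python
from collections import Counter

# (problem_type, used_calculator) per kid; the counts are invented for illustration.
kids = ([("long-div", True)] * 3 + [("long-div", False)] * 7
        + [("decimals", True)] * 1 + [("decimals", False)] * 2
        + [("fractions", True)] * 1 + [("fractions", False)] * 2)


def proportional_reading(kids, focus):
    """(2a): more than half of the kids with the focused task used calculators."""
    group = [used for task, used in kids if task == focus]
    return sum(group) > len(group) / 2


def relative_reading(kids, focus):
    """(7c): calculator users with the focused task outnumber those with each alternative."""
    counts = Counter(task for task, used in kids if used)
    return all(counts[focus] > n for task, n in counts.items() if task != focus)


print(proportional_reading(kids, "long-div"))  # False: 3 of 10 long-div kids used one
print(relative_reading(kids, "long-div"))      # True: 3 users vs. 1 and 1
```

This matches the scenario in (3): most kids do long divisions by hand, so the proportional reading fails, yet long division is the task with the most calculator users, so the relative reading holds.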

Notably, ‘reverse’ readings do not arise when most is restricted by a relative clause instead of a conditional. We suggest that the reverse reading of (8) is blocked by an independently motivated constraint on the relative reading of most: the focus associate of (English) -est cannot be internal to the DP headed by most. Pancheva and Tomaszewicz (2012) trace this effect to the presence of the definite determiner in definite NPs with most (the most albums by U2), but it is needed for bare most on any theory that allows for relative readings: else, (9a) is predicted to have the non-existent reading (9b).

(8) Most students who had to do long divisions asked for calculators.

(9) a. Most [Scandinavians]F won the Nobel prize in literature.

b. More people from Scandinavia won the Nobel prize than from any other region.

Conclusion. We show that the ‘reverse’ reading of most-QICs is not a counter-argument to the restrictor analysis of QICs; moreover, a compositional analysis of this reading falls out directly on a restrictor view by building on independent evidence for LF (7a), focus-sensitive comparison class construction (7b) and the lexical entries in (6). Analogous readings are correctly predicted only for quantifiers that are focus-sensitive in the right way (e.g., many and few, according to the analysis of Romero 2015), but not for other quantifiers, in particular every, all or no. Additionally, importing Leslie’s (extended) restrictor analysis solves the equivalence problem for QICs like (1a,b), and, moving forward, allows us to handle apparent scope ambiguities in indicative conditionals involving multiple quantifier types (e.g. No one always passes if they goof off).

References

Geis, M. and Zwicky, A.: 1971, On invited inferences, Linguistic Inquiry 2, 561–566.

Hackl, M.: 2009, On the grammar and processing of proportional quantifiers: Most versus more than half, Natural Language Semantics 17, 63–98.

Heim, I.: 1999, Notes on superlatives. Lecture Notes, Massachusetts Institute of Technology.

Higginbotham, J.: 1986, Linguistic theory and Davidson’s program in semantics, in E. Lepore (ed.), Truth and Interpretation: Perspectives on the Philosophy of Donald Davidson, Basil Blackwell, Oxford, pp. 29–48.

Higginbotham, J.: 2003, Conditionals and compositionality, Philosophical Perspectives 17.

Huitink, J.: 2009, Quantified conditionals and compositionality, Language and Linguistics Compass 4, 42–53.

Kratzer, A.: 1986, Conditionals, Chicago Linguistic Society, Vol. 22, pp. 1–15.

Kratzer, A.: in press, Chasing Hook: Quantified indicative conditionals, in L. Walters and J. Hawthorne (eds), Conditionals, Probability, and Paradox: Themes from the Philosophy of Dorothy Edgington.

Leslie, S.-J.: 2009, “if”, “unless”, and quantification, in R. J. Stainton and C. Viger (eds), Compositionality, Context and Semantic Values: Essays in Honor of Ernie Lepore, Springer.

Pancheva, R. and Tomaszewicz, B.: 2012, Cross-linguistic differences in superlative movement out of nominal