Phase Theory: An Introduction

Phase Theory

Phase Theory is the latest empirical and conceptual innovation in syntactic theory within the Chomskyan generative tradition. Adopting a crosslinguistic perspective, this book provides an introduction to Phase Theory, tracing the development of phases in minimalist syntax. It reviews both empirical and theoretical arguments in favor of phases, and examines the role phases play at the interface with semantics and phonology. Analyzing current phasehood diagnostics, it applies them in a systematic fashion to a broad range of syntactic categories, both phases and non-phases. It concludes with a discussion of some of the more contentious issues in Phase Theory, involving crosslinguistic variation with respect to phasehood and the dynamic versus static nature of phases.

Barbara Citko is Associate Professor of Linguistics at the University of Washington.

Research Surveys in Linguistics

In large domains of theoretical and empirical linguistics, the needs of scholarly communication are directly comparable to those in analytical and natural sciences. Conspicuously lacking in the inventory of publications for linguists, compared to those in the sciences, are concise, single-authored, non-textbook reviews of rapidly evolving areas of inquiry. The series Research Surveys in Linguistics is intended to fill this gap. It consists of well-indexed volumes that survey topics of significant theoretical interest on which there has been a proliferation of research in the last two decades. The goal is to provide an efficient overview of, and entry into, the primary literature for linguists – both advanced students and researchers – who wish to move into, or stay literate in, the areas covered. Series authors are recognized authorities on the subject matter, as well as clear, highly organized writers. Each book offers the reader relatively tight structuring in sections and subsections, and a detailed index for ease of orientation.

Previously published in this series

A Thematic Guide to Optimality Theory, John J. McCarthy
The Phonology of Tone and Intonation, Carlos Gussenhoven
Argument Realization, Beth Levin and Malka Rappaport Hovav
Lexicalization and Language Change, Laurel J. Brinton and Elizabeth Closs Traugott
Defining Pragmatics, Mira Ariel
Quantification, Anna Szabolcsi
Word Order, Jae Jung Song

Phase Theory

An Introduction

BARBARA CITKO

University Printing House, Cambridge CB2 8BS, United Kingdom

Published in the United States of America by Cambridge University Press, New York

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org

Information on this title: www.cambridge.org/9781107040847

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2014

Printed in the United Kingdom by MPG Printgroup Ltd, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data

Citko, Barbara, 1970–

Phase theory : an introduction / Barbara Citko.

pages cm. – (Research surveys in linguistics)

ISBN 978-1-107-04084-7 (hardback)

1. Phraseology 2. Minimalist theory (Linguistics) 3. Grammar, Comparative and general – Syntax. 4. Generative grammar. I. Title.

P326.5.P45C48 2014

415–dc23 2013040516

ISBN 978-1-107-04084-7 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Acknowledgments

First, I would like to thank the many cohorts of students who took my graduate syntax classes at the University of Washington. The idea for this book came from you and all the questions you asked (that I may or may not have had answers to at the time). I hope next time I will! I also want to thank all my colleagues in the Department of Linguistics for creating an intellectually stimulating work environment, and especially my fellow syntacticians, Edith Aldridge, Karen Zagona and Julia Herschensohn, for encouragement and support, especially in the final stages of the project; and Toshi Ogihara for very kindly answering all my last-minute semantic questions.

Portions of this book were written at the Whiteley Writing Center, a wonderful writing oasis (on an island!), which I thank for the good writing vibes. Some parts were written when I was a visiting scholar at the Department of Linguistics and Philosophy at MIT in the Fall of 2011. I thank the department for the welcoming atmosphere it creates for all its visitors, and for all the inspiring classes, talks and conversations I had while I was there. In particular, I would like to thank David Pesetsky, Sabine Iatridou and Shigeru Miyagawa for their openness and interest in my research. I also want to thank Andrew Winnard and the whole editorial team at Cambridge University Press for their patience and guidance throughout the project, Alexander Sugar for help proofing the manuscript and Brent Woo for help with the index.

Given that the literature on phases in syntactic theory is vast and ever-growing, it is impossible to do justice here to all the phenomena that bear on all the issues surrounding phases, to all the literature on phases and to all the researchers working on phases. I alone remain responsible for any omissions or misinterpretations.

Last but not least, I thank my husband Randy for his support and perspective. I dedicate this book to the memory of Icarus.

Abbreviations

The following list includes not only abbreviations used in this book but also the abbreviations often used in the primary literature on phases.

φ phi-features

Φ phonological component

Σ semantic component

ACC Accusative

ACD Antecedent-Contained Deletion

ATB across the board

AUGB approaching UG from below

AUX Auxiliary

BPS Bare Phrase Structure

C Case

CHL Computation of Human Language

CAUS Causative

C-I conceptual–intentional

CP Complementizer Phrase

DNS Derivation of Narrow Syntax

DP Determiner Phrase

ECM Exceptional Case Marking

EM External Merge

EPP Extended Projection Principle

ERG Ergative

FEM Feminine

FI Feature Inheritance

FIFA Feature Inheritance from Above

FIFB Feature Inheritance from Below

FL language faculty

FP Functional Projection

G Goal

H Head

HTA High Tone Anticipation

IC Interface Condition(s) or the Inclusiveness Condition

iF Interpretable Feature

IM Internal Merge

INTER Interrogative

INDEF Indefinite

iwh interpretable wh-feature

LA Lexical Array

LBE Left Branch Extraction

LCA Linear Correspondence Axiom

LDA Long-Distance Agree

Lex Lexicon

LF Logical Form

LI Lexical Item

MASC Masculine

MOM Merge over Move

MSO Multiple Spell-Out

N Numeration

NEUT Neuter

NMLZ Nominalizer

NOM Nominative

NPI Negative Polarity Item

NS Narrow Syntax

NSL Null Subject Language

NSR Nuclear Stress Rule

NUM Number

P Probe

PART Participle

PF Phonological Form

P-feature Periphery Feature

PH Phase

PIC Phase Impenetrability Condition

PL Pair List or Plural

PLA Properly Local Agreement

PP Prepositional Phrase

PrP Predication Phrase

PST Past

Q Question or Interrogative

QP Quantifier Phrase

QR Quantifier Raising

RNR Right Node Raising

RP Relator Phrase

SA Single Answer

SG Singular

SM Sensorimotor

SMT Strong Minimalist Thesis

S-O Spell-Out

SOR Subject to Object Raising

SOT Sequence of Tense

T Tense

TNP Traditional Noun Phrase

TP Tense Phrase

uF uninterpretable or unvalued feature

uQ uninterpretable Q feature

val Feature Value

VIR Virile

vP verb Phrase

Introduction

This research survey combines an introduction to Phase Theory with an assessment of the state of the art in Phase Theory. The term Phase Theory refers to a set of theoretical innovations in post-2000 minimalism (Chomsky 2000, 2001, 2004, 2005, 2008).1 One of the core ideas in minimalism is the idea that the language faculty is an optimal solution to the constraints imposed on it by the two cognitive systems with which it interacts: the system of thought and the articulatory–perceptual system. What Phase Theory adds to this picture is the idea that the language faculty interacts with these two cognitive systems at very specific points during the syntactic derivation, and, consequently, that syntactic derivations are constructed in chunks referred to as phases.2 In the most general terms, phases cannot be accessed by the narrow syntax once they are transferred to the interfaces.

My goal in this survey is to combine an introduction to a given issue within Phase Theory with an overview of the existing research on this issue (and an assessment thereof), giving the reader a sense of what is fairly settled upon and what is still under debate.3 The fact that there is a lot of research that relies on phases shows a need for a survey that situates phases in current syntactic theory, introduces the technical details of Phase Theory, synthesizes the existing research on phases (pointing out issues that might be still contentious), outlines directions for future research, and, last but not least, standardizes the notation.

1 Chomsky in his writings is very careful about distinguishing a program from a theory, emphasizing the programmatic nature of minimalism (see in particular Boeckx 2006 for a more detailed discussion of this distinction). Chomsky does not use the term Phase Theory in his early writings on phases but does so more recently: ‘One goal of Phase Theory is to provide the mechanisms to distinguish copies from repetitions, as straightforwardly as possible’ (Chomsky 2012: 3).

2 The resulting model is also sometimes referred to as a Multiple Spell-Out (MSO) model. The idea of Multiple Spell-Out goes back to Uriagereka (1999) (see also Uriagereka 2012 for a fuller, book-length exposition).

3 There are a number of monographs, volumes and collections that focus on various aspects of Phase Theory, which I build on and am intellectually indebted to (see, among others, Frascarelli 2006, Gallego 2010, 2012, Grohmann 2009a, 2009b, 2009c, McGinnis & Richards 2005, and the individual contributions in the two issues of Linguistic Analysis 33, guest-edited by Kleanthes Grohmann).

Even though many (though not all) syntacticians (explicitly or implicitly) assume the concept of a phase, there seems to be less of a consensus regarding many of the most fundamental properties of phases, such as those listed in (1).

(1) a. How do we define phases?
b. What categories count as phases?
c. Do the same categories count as phases with respect to semantic and phonological considerations?
d. Are phases dynamic or static?
e. Is there any crosslinguistic variation with respect to phasehood?
f. How do phases interact with the interfaces?

The fundamental question to answer before we can even begin to address some of the questions listed above is in what sense a syntactic theory that assumes phases is more adequate (in a descriptive, explanatory, or beyond-explanatory adequacy kind of sense) than a syntactic theory that does not assume phases.4 This is the question we will be coming back to throughout the book. In the remainder of this introduction, I provide a brief summary of each chapter.

Chapter 1 ‘The Minimalist Program’ provides an overview of the core aspects of the Minimalist Program. It outlines the general architecture of the minimalist grammar, and lays the groundwork for the discussion in the following chapters by focusing on the concepts that will be crucial to the understanding of phases, such as the distinction between interpretable and uninterpretable features and the concept of Spell-Out. This chapter is not meant as an introduction to (or survey of) minimalist syntax; however, readers less familiar with minimalism will find all the necessary concepts, terms and mechanisms introduced in this chapter.

Chapter 2 ‘Motivating phases’ turns to phases themselves. It introduces the concept of a phase, situating phases in the context of current minimalist approaches to syntactic dependencies, and asking if syntactic theory with phases is more adequate than a theory without phases, or a theory in which all phrases are phases. This chapter also gives a historical perspective on phases, and addresses some of the criticisms that have been levied in the literature against them, such as Boeckx & Grohmann’s (2007) critique of phases as ‘barriers in disguise’.

Even though the idea of a cycle, which is conceptually related to a phase, goes back to the early days of generative grammar, the current concept of a phase first appeared in Chomsky’s (2000) ‘Minimalist Inquiries’, where phases (to be more specific, lexical subarrays associated with phases) were introduced as a solution to a problem arising from the so-called Merge over Move principle. Since then, much research has focused on defining phases and formulating independent phasehood diagnostics. The definitions of phases I survey in this chapter are listed in (2).5

4 Following Chomsky (2004), I take beyond explanatory adequacy to refer to the why-questions about language, captured by the following quote: ‘In principle, then, we can seek a level of explanation deeper than explanatory adequacy, asking not only what the properties of language are but also why they are that way.’ (Chomsky 2004: 105).

(2) a. Phases are propositional objects.
b. Phases are convergent objects.
c. Phases are objects interacting with the interfaces.
d. Phase heads are loci of uninterpretable features.
e. Phases are predication structures.
f. Phases are phrases.

From a diagnostic perspective, perhaps the most important aspect of Phase Theory is the so-called Phase Impenetrability Condition, which deems a portion of a phase impenetrable or inaccessible to operations from the outside. This chapter also surveys the various versions of the Phase Impenetrability Condition proposed in the literature, focusing on the empirical predictions they make, and ways to unify them (see Müller 2004, Richards 2004, 2011, among others). The Phase Impenetrability Condition is tightly linked to the concept of Multiple Spell-Out, which I also elaborate on in this chapter, sorting through the logical possibilities of how Multiple Spell-Out can proceed (for example, spelling out to the two interfaces at different points in the derivation). Finally, this chapter introduces the concept of Feature Inheritance, as developed by Chomsky (2008) and Richards (2008), which is a logical consequence of defining phase heads as hosts of uninterpretable features. If uninterpretable features are a property of phase heads, the only way non-phase heads can get them is via Feature Inheritance.

Chapter 3 ‘Phasehood diagnostics’ turns to the many diagnostics that have been proposed in the literature, a subset of which will serve as the basis for the discussion of specific phases (CPs, vPs, DPs, PPs etc.) in the chapters that follow, and the arguments in favor of (or against) these categories being phases. A common thread in many existing characterizations of phases is that they should exhibit a certain amount of independence and coherence at the interfaces. This, however, only raises the question of what it means for a given category (a candidate for a phase) to be semantically or phonologically independent and coherent. Furthermore, are there any phasehood diagnostics that do not fall neatly into either of the two groups (PF versus LF diagnostics): purely syntactic or purely morphological diagnostics? Given such rather vague existing characterizations of phases, this chapter focuses on the more tangible questions that can be (and have been) asked to establish the phasehood of a given category, which I list in (3) below. It examines these questions with a critical eye towards establishing genuine phasehood diagnostics, and avoiding those that might instead be diagnosing something other than phasehood (such as constituency or phrasal status).

5 See also Boeckx (2006), Boeckx & Grohmann (2007), Den Dikken (2007) and Gallego (2010) for a discussion of these different views of phases, and of the problems some of them raise.

(3) a. Is XP a domain for feature valuation?
b. Is X the locus of uninterpretable features?
c. Does X trigger Spell-Out?
d. Is XP a phonological domain?
e. Can the complement of X be elided?
f. Can XP be moved?

The Phase Impenetrability Condition also gives rise to a number of tangible phasehood diagnostics, coming mostly from the realm of successive cyclic movement through the edge of the phase, which in turn can be diagnosed by affirmative answers to the following questions:

(4) a. Can the moved element be interpreted at the edge of the phase?
b. Can the moved element be pronounced at the edge of the phase?
c. Can the moved element leave something behind at the edge of the phase?

The discussion of phasehood diagnostics also raises the question of whether there is any crosslinguistic variation with respect to phasehood. This is the issue which I come back to in Chapter 6. While variation with respect to whether a given language has phases or not seems highly unlikely and implausible, given the conceptual arguments in favor of having phases to begin with (such as reducing computational load and being independently motivated by the interfaces), it is certainly possible for languages to differ with respect to what categories count as phases.

Chapter 4 ‘Classic phases’ discusses in detail three categories that are commonly assumed to be phases – CPs, vPs and DPs – and applies the diagnostics established in Chapter 3 to these categories.6 The phasehood status of CP is relatively easy to establish: the evidence in favor of successive cyclicity from the literature on A-bar dependencies is typically taken as evidence for CPs being phases (see, for example, Lahne 2008 for an illuminating overview). The evidence includes phenomena like wh-copying (Felser 2004, Manetta 2010, McDaniel 1989, among many others), scope marking (Dayal 1996, Lutz, Müller & von Stechow 2000, Stepanov 2000, among others), complementizer agreement (Carstens 2003, Carstens & Diercks 2011, Haegeman 1992, Haegeman & Van Koppen 2011, Zwart 1993, 1997), wh-quantifier stranding (McCloskey 2000, 2001), reconstruction (Barss 1986, 2001) and left branch extraction (Wiland 2010). The evidence in favor of vPs being phases is similar in spirit. This chapter also reviews the debate on whether unaccusative and passive vPs constitute phases, as argued for by Legate (2003), and against by Den Dikken (2006a). While many of the facts that are typically deemed to bear on the issue of C or v being a phase head might be given alternative explanations that do not necessarily rely on movement through the specifier of CP or vP, such accounts typically still posit a relationship between C (endowed with uninterpretable features of the requisite sort) and the wh-pronoun in its domain. This also points towards C being a phase head, given that only phase heads are assumed to be the loci of uninterpretable features. More generally, I hope to show in this chapter that phase-theoretical accounts have the advantage of establishing connections between sets of facts that otherwise remain isolated and require independent explanations. For example, why should complementizer agreement phenomena and locality restrictions on movement involve C? Or why would Austronesian extraction restriction and constraints on parasitic gap formation be sensitive to the properties of little v? Granting these projections a privileged syntactic status (namely, the status of a phase) brings us closer towards understanding why syntactic phenomena should cluster around them.

6 The discussion in Chapters 4 and 5 is a sequence of case studies. There are other categories that are conspicuously absent from the discussion (APs, AdvPs, various functional projections in the left periphery of a clause) whose phasehood we might wonder about. I thank Kleanthes Grohmann for raising this issue.

The idea that DP might be a phase as well, explored by Matushansky (2005), Hiraiwa (2005) and Svenonius (2004), among others, should not come as a surprise, given the many structural and interpretive parallels between CPs and DPs, discussed in the literature going back to the very early days of generative grammar. However, since CPs contain other phases (namely vPs), an interesting question is whether DPs contain other phases as well. In order to tackle this question, this chapter also addresses the internal structure of DPs, motivating the need for DP-internal projections such as NumberP, PersonP or ClassifierP, and asking which of them, if any, might be phases as well.

Chapter 5 ‘Other ph(r)ases’ turns to categories whose phasehood status is somewhat more controversial, and still debated in the literature: Predication Phrases, Prepositional Phrases and Applicative Phrases. All of them have been argued to constitute phases (see, for example, Abels 2003, Radkevich 2010 on PPs as phases and McGinnis 2001 on Applicative Phrases as phases); yet they are not considered to fall into the widely accepted phasehood canon. What makes these categories somewhat more controversial is that many other questions about them have to be answered first before their phasehood can be entertained. For example, if phasehood is a property of functional categories, the question that needs to be resolved for prepositions is whether they are functional or lexical (or both or neither, depending on the preposition).

Chapter 6 ‘Variation in phasehood’ takes up the issue of whether there is any crosslinguistic variation with respect to phasehood. There are two questions to consider here: the question of whether non-phase heads can acquire phasehood status in the course of the derivation (and conversely, whether phase heads can lose their phasehood status in the course of the derivation), and the question of whether different categories can count as phases with respect to phonological and semantic considerations. The former scenario (a head becoming a phase or ceasing to be a phase) has been argued to arise as a result of head movement (Phase Extension of Den Dikken 2007 or Phase Sliding of Gallego 2010). The latter scenario (a category being a PF phase but not an LF phase or vice versa) has been explored by Marušič (2005) as a way to handle total reconstruction and covert movement and by Felser (2004) to handle wh-copying. This chapter also addresses crosslinguistic variation with respect to phasehood: if phases are dynamic and head movement can extend phasehood, a certain amount of variation will come from independent considerations (such as the presence or absence of certain types of head movement). Variation in phasehood can also follow from variation in lexical inventories.

Chapter 7 ‘Phases at the interfaces’ examines the roles phases play at the interfaces, putting them in the more general context of the syntax–phonology and syntax–semantics interface. With respect to the PF interface, it focuses on the questions of whether phases (or Spell-Out domains) are relevant and substantive phonological units, and how these phasal or Spell-Out units are manipulated by phonology. This chapter examines the role phases play in determining linear order (see Fox & Pesetsky’s (2005) Cyclic Linearization) and nuclear stress (see Adger 2007, Kahnemuyipour 2003, 2004, 2005, Kratzer & Selkirk 2007, among others).

The potential evidence for the significance of phases at the syntax–semantics interface comes not only from phenomena like scope ambiguities (on the assumption that Quantifier Raising is constrained by phasehood) and the propositional status of phases – both of which feature prominently as phasehood diagnostics – but also from the idea that the boundary between vP and CP phasal domains corresponds to the distinction between nuclear scope and restrictive domain in the tripartite quantificational structure, as proposed explicitly by Biskup (2009a).

1

The Minimalist Program

1.1 General architecture

The current chapter offers a bird’s eye view of the Minimalist Program. It is not meant as a comprehensive introduction to (or a thorough overview of) minimalism. Rather, its goal is to give readers less familiar with minimalism the necessary and sufficient background to follow the discussion of phases in the rest of this book. For the sake of clarity, the technical terms that I will be referring to throughout the book will be given in bold when they are first introduced. For a more thorough textbook-style introduction to minimalism, I refer the interested reader to Adger (2003) and Hornstein, Nunes and Grohmann (2005) and the references therein.

What came to be known as the Minimalist Program was articulated explicitly in the early nineties with the publication of works such as Chomsky’s (1991) ‘Some Notes on Economy of Derivation and Representation’, and his (1993) ‘A Minimalist Program for Linguistic Theory’, both of which later became two of the four chapters of Chomsky’s (1995) The Minimalist Program. As Chomsky emphasizes in his writings, minimalism is grounded in the Principles and Parameters model, which gave us the beginnings of an understanding of which properties of language are universal (and perhaps unique to it), and which ones are subject to crosslinguistic variation. This, in turn, led to deeper questions, which are at the core of minimalist theorizing nowadays. These are questions that go beyond explanatory adequacy, alluded to above, such as the question of why language is the way it is. Computational efficiency and interface conditions play a central role, as stated succinctly in the following quote from ‘Beyond Explanatory Adequacy’.

(1) Its [the Minimalist Program’s, B.C.] task is to examine every device (principle, idea, etc.) that is employed in characterizing languages to determine to what extent it can be eliminated in favor of a principled account in terms of general conditions of computational efficiency and the interface condition [emphasis mine, B.C.] that the organ must satisfy for it to function at all.

(Chomsky 2004: 106)

This is also clear in Chomsky’s discussion of the so-called three factors in language design and three types of conditions in language acquisition (see Chomsky 2005 in particular). These are listed in (2a–c), and they help determine how a child gets from the initial state (S₀) of linguistic competence to the final state: the fully formed adult state of linguistic competence. (2b) are the interface conditions, and (2c) are the general properties of efficient computation.

(2) a. unexplained elements of S₀
b. IC (the principled part of S₀)
c. general properties

(Chomsky 2004: 106)

The minimum that the language faculty (FL) has to accomplish is to interface with language-external systems. The two external systems in question are the sensorimotor (SM) system and the conceptual-intentional (C-I) system. The conditions imposed by these two external systems are referred to as Legibility Conditions, Bare Output Conditions or Interface Conditions (IC).1 The Strong Minimalist Thesis (SMT), given in (3), states that language is designed to interface with the external systems in an optimum way.2

(3) The substantive thesis is that language design may really be optimal in some respects, approaching a “perfect solution” to minimal design specifications.

(Chomsky 2000: 93)

The general architecture of the language faculty is as follows. Language has three components: Narrow Syntax (NS), the phonological component Φ and the semantic component Σ.3 For the most part, we will be concerned here with Narrow Syntax and its computational processes.

Each derivation starts with a set of ‘lexical items’ which are manipulated in the course of the derivation by the syntactic operations Merge and Agree. I will discuss these two operations in more detail in Sections 1.2 and 1.4, respectively. The ‘lexical item’ is, strictly speaking, a bundle of features, not a primitive syntactic object (hence the quotes).4 This set of lexical items is called a Lexical Array (LA) and is represented as an unordered set. A Lexical Array augmented by information on how many times each lexical item is selected from the lexicon is called a Numeration (N). This information is represented by the subscripts. In simple cases, Numerations and Lexical Arrays are equivalent, as shown in (4b–c).

1 To the best of my knowledge, these terms are used interchangeably.

2 The formulation of the Strong Minimalist Thesis in ‘Minimalist Inquiries’ (which is the one given in (3)) is slightly different from the one Chomsky gives in ‘Beyond Explanatory Adequacy’, where he formulates it as in (i):

(i) The set of unexplained elements of S₀ is empty. (Chomsky 2004: 106)

S₀ refers to the genetically determined initial state in the process of language acquisition, which is what UG provides.

3 There is no PF or LF cycle and thus there are no PF or LF operations. The terms are used to refer to PF or LF representations. The terms PF interface and LF interface are used to refer to the interface with SM or C-I systems, respectively.

4 This assumes something like the Late Vocabulary Insertion model of Distributed Morphology of Halle & Marantz (1993) and much later work.

(4) a. Icarus likes nuts.
b. LA = {Icarus, likes, nuts, v, T, C}
c. N = {Icarus₁, likes₁, nuts₁, v₁, T₁, C₁}

The two diverge when a single item is used more than once in a given derivation, as in the infamous example given in (5a).5

(5) a. Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.
(Pinker 1994: 2006, crediting Annie Senghas)
b. LA = {Buffalo, buffalo, buffalo, C, v, T}
c. N = {Buffalo₃, buffalo₂, buffalo₂, C₂, v₂, T₂}

However, one does not need to resort to exotic examples of this sort to show how Lexical Arrays and Numerations differ; relatively simple sentences with any level of embedding make the same point:

(6) a. Icarus thinks he likes nuts.
b. LA = {Icarus, thinks, he, likes, nuts, v, T, C}
c. N = {Icarus₁, thinks₁, he₁, likes₁, nuts₁, v₂, T₂, C₂}
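The difference between a Lexical Array and a Numeration in (6) can be made concrete with a small sketch. It is only an illustration under stated assumptions: the Numeration is modeled as a multiset (Python's Counter) and the Lexical Array as a plain set, and the list of selected items, including one v, T and C per clause, is entered by hand to match (6c). The names selected_items and diverges are hypothetical.

```python
from collections import Counter

# Hand-built list of items selected for (6a); one v, T and C per clause
# is an assumption taken from the Numeration in (6c).
selected_items = [
    "Icarus", "thinks", "he", "likes", "nuts",
    "v", "T", "C",   # matrix clause
    "v", "T", "C",   # embedded clause
]

# A Numeration records how many times each item is selected.
numeration = Counter(selected_items)

# A Lexical Array is the unordered set of items, with the counts dropped.
lexical_array = set(numeration)

def diverges(numeration):
    """True if some item is selected more than once, i.e. the Numeration
    carries information that the Lexical Array does not."""
    return any(count > 1 for count in numeration.values())

print(sorted(lexical_array))
print(dict(numeration))
print(diverges(numeration))
```

For the simple clause in (4), every count is 1, so the same check returns False and the two representations are equivalent.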

The output of a Narrow Syntax derivation is a pair of representations <PHON, SEM>, which is accessed by the two interfaces deﬁned above: the SM and the C-I interface. The derivation converges if the two representations satisfy the con-ditions imposed by the two interfaces; otherwise it crashes.6For a given repre-sentation to meet the Interface Conditions simply means it has to be legible to the external systems; hence the term Legibility Conditions. The question of what it means to be legible at a given interface is not trivial. A common and intuitively correct understanding of this concept of legibility is the following: an expression is legible at a given interface level (PF or LF) only if it consists of features that can be interpreted by the language-external systems: the SM and C-I system, respec-tively. But, of course, making convergence contingent on the presence of features that can only be interpreted at the interfaces raises an obvious question of what features the two interfaces can interpret. It seems quite plausible to assume that

5 _{Its notoriety comes from lexical ambiguity, of which it provides a very extreme illustration, not from}

the distinction between a lexical array, which in this case contains distinct lexical items‘buffalo’ (i.e. the proper name Buffalo, the common noun buffalo, and the less commonly used transitive verb to buffalo meaning to bully) and the Numeration that includes multiple occurrences of each of them. The following paraphrase helps distinguish the different meanings of buffalo:

(i) The buffalo from Buffalo that (another) buffalo from Buffalo bullies himself bullies (yet another) buffalo from Buffalo.

6

See, however, Frampton & Gutmann (2002) for a proposal that syntax only generates convergent derivations, and Preminger (2011) for a proposal that having unvalued features does not always lead to a crash.


the SM interface can interpret features having to do with linear order (if such exist), syllable structure, prosodic structure or intonation. The C-I interface, on the other hand, should be able to interpret features having to do with scope, quantification, referentiality, specificity, propositional status etc. Neither interface can interpret formal features, such as structural case features or categorial features. Since features play such a major role in minimalism and there are quite a few contentious issues surrounding them, we will devote an entire section to them (Section 1.3).

The three basic operations that manipulate lexical items selected from the lexicon are External Merge, Internal Merge and Agree. These three, in conjunction with a more detailed discussion of features, are the focus of the next three sections.

1.2 External and Internal Merge

Recursion, the property of language that allows smaller units to combine iteratively into larger, hierarchically structured objects, and displacement, the property that gives us the intuition that syntactic objects can surface in one position but be understood as belonging in another position, are two very fundamental (perhaps the most fundamental) properties of language. Chomsky (2004) distinguishes two kinds of Merge, External Merge (EM) and Internal Merge (IM), to capture these two fundamental phenomena. External Merge is the basic concatenation operation responsible for recursion in language. It takes two objects (such as X and Y in (7a), which have been first selected from the Numeration), and combines them into one bigger object, as shown in (7b).7 External Merge is a recursive operation; one of these two objects could

7

Elsewhere, I have argued that Merge can also create structures in which a single object can end up shared between two objects, referring to this type of Merge as Parallel Merge (see Citko 2011b and the references therein), illustrated in (i–ii). Parallel Merge combines the properties of External Merge and Internal Merge. Before Merge takes place, Z and YP are disjoint (as in External Merge) but Z merges with a subpart of YP (as in Internal Merge).

(i) Merge X and Y, Project Y: [YP Y XP]

(ii) (Parallel) Merge X and Z, Project Z: [YP Y XP] and [ZP XP Z], where XP is a single object shared between YP and ZP

Chomsky (2007) excludes such a possibility on the grounds that ‘it requires new operations and conditions on what counts as a copy, hence additional properties of UG’ (Chomsky 2007: 8, note 10).


itself be the output of a prior Merge operation, as shown in (7d), where the complex object that is the output of (7b) merges with Z.8

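(7) a. Select X and Y from the Numeration

b. Merge X and Y: [X X Y] (a node labeled X dominating X and Y)

c. Select Z from the Numeration

d. Merge Z with X: [Z Z [X X Y]] (a node labeled Z dominating Z and the output of (7b))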

8

Following Chomsky (1994), I assume a bare-bones approach to phrase structure, referred to as Bare Phrase Structure (BPS), in which the X-bar status of a given projection (in particular, whether it is a head (X⁰), a phrase (XP) or an intermediate projection (X′)) is not a primitive of the theory, as it can always be derived from the structure in which it appears. Thus a head is an element that is not a projection of another element of the same type. It can (but it does not have to) project further. A phrase is an element that does not project any further. Again, it can, but it does not have to, be a projection of another element of the same type. The representations in (7a–d) are BPS representations; the ones in (i) and (ii) are their more traditional X-bar theoretical counterparts, with the ones in (ib) and (iib) simplified in that they lack vacuous intermediate projections.

(i) a. [XP [X′ X YP]] b. [XP X YP]

(ii) a. [ZP [Z′ Z [XP [X′ X YP]]]] b. [ZP Z [XP X YP]]

Even though I will be using representations of the kind given in (ib)–(iib) throughout this book, mostly for familiarity’s sake, I do assume a BPS approach throughout. The advantages of such an approach are twofold. First, it allows a given element to be simultaneously a head and a phrase, something that the behavior of clitics suggests has to be allowed in the grammar. Y in (7b) and (7d) is such an element; it is a head because it is not a projection of another Y and it is a phrase because it does not project any further. Second, BPS dispenses with vacuous intermediate projections, a welcome move from the perspective of the Strong Minimalist Thesis (SMT).


The output of External Merge could also be represented in set notation. (8a) is equivalent to (7b) and (8b) to (7d).

(8) a. {X, {X, Y}} b. {Z, {Z, {X, {X, Y}}}}
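The set notation in (8) lends itself to a small recursive data structure. The sketch below is only an illustration under stated assumptions: the names `Merged`, `external_merge` and `show` are hypothetical, lexical items are modeled as bare strings, and the requirement that the label be supplied by one of the two merged objects is simply built in.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Merged:
    """Output of Merge: a label plus the two objects that were combined."""
    label: str
    left: "SO"
    right: "SO"

# A syntactic object is either a lexical item (a string) or a Merge output.
SO = Union[str, Merged]

def label_of(so: SO) -> str:
    """A lexical item labels itself; a Merge output carries its label."""
    return so if isinstance(so, str) else so.label

def external_merge(a: SO, b: SO, projector: SO) -> Merged:
    """Combine two separate objects; `projector` (a or b) supplies the label."""
    if projector != a and projector != b:
        raise ValueError("the label must come from one of the merged objects")
    return Merged(label_of(projector), a, b)

def show(so: SO) -> str:
    """Render an object in the set notation of (8): {label, {left, right}}."""
    if isinstance(so, str):
        return so
    return "{%s, {%s, %s}}" % (so.label, show(so.left), show(so.right))

xy = external_merge("X", "Y", projector="X")   # corresponds to (7b)
zxy = external_merge("Z", xy, projector="Z")   # corresponds to (7d)
print(show(xy))
print(show(zxy))
```

Because the second call takes the output of the first as one of its inputs, the recursive character of External Merge falls out of the data structure itself.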

It is standard to assume that the result of Merge also has to have a label, listed first in each set in (8a–b) above.9 The label is determined by one of the two merged elements; other options would violate the Inclusiveness Condition (IC), which prohibits the introduction of new elements in the course of the derivation. Any label other than X or Y in (8a) (and other than Z and X in (8b)) would be a violation of this condition. Chomsky (1995) discusses two other logical possibilities. In a set consisting of two elements (such as X and Y in (8a)), the label could also be the intersection of X and Y or the union of X and Y. However, he excludes the first option on the grounds that the intersection in many cases is a null set. The second option is excluded because the union of X and Y is contradictory if X and Y differ in features, as is the case when one is N and the other V, for example. This leaves us with only one option: for either X or Y to project as the label. The question of which of the two it is, or the even more fundamental question of whether labels are necessary at all, is far from being settled. A common and fairly intuitive idea is that the object which has its selectional feature checked as a result of Merge is the object that determines the label. For example, when a verb or a preposition merges with a noun, this verb (or preposition) has its selectional feature satisfied via the Merge operation. This means that the label is going to be determined by this verb or preposition.10,11 Many existing labeling algorithms make reference to selection; the one from Cecchetto & Donati (2010), given in (9) below, is a representative example:12

9

An interesting issue is whether all the features of X determine the label, or only a subset thereof. While it seems clear (and commonly assumed) that categorial features (N, V, D etc.) are part of the label, what other features need to project is somewhat less clear (see Citko 2008a, 2011b, Cecchetto & Donati 2010, Donati & Cecchetto 2011, for a discussion of this and related issues).

10

Things get a little more complicated if c-selection is removed from our syntactic toolkit, as Chomsky suggests in ‘Beyond Explanatory Adequacy’ (2004).

11

Chomsky (2008) proposes a different labeling algorithm, given in (i) below, which does not make reference to selection.

(i) In {H, α}, H a lexical item (LI), H is the label.

If α is internally merged to β forming {α, β} then the label of β is the label of {α, β}.

(Chomsky 2008: 145)

12

Other examples similar in spirit, as also noted by Cecchetto & Donati (2010) and Gallego (2010) include those proposed by Adger (2003), Pesetsky & Torrego (2006) and Boeckx (2008b):

(i) The head is the syntactic object which selects in any Merge operation. (Adger 2003: 91)

(ii) If α and β merge, some feature F of α must probe F on β. (Pesetsky & Torrego 2006)

(iii) The label of {α, β} is whichever of α or β probes the other, where the Probe = Lexical Item whose uF


(9) The label of a syntactic object {α, β} is the feature(s) that act(s) as a Probe of the merging operation creating {α, β}. (Cecchetto & Donati 2010: 245)

Chomsky (2004) considers the possibility that the grammar requires no labels whatsoever (see also Collins 2002, Gallego 2010, Seely 2006, among others).13 While intuitively appealing from the perspective of the Strong Minimalist Thesis, eliminating labels raises some non-trivial issues, such as how selection or relativized minimality works. I will not discuss these issues here, as they strike me as tangential to the issue of phases. Instead, I refer the interested reader to Citko (2008a, 2011b) and the references therein for relevant discussion, pointing to the conclusion that labels are not dispensable. One argument often adduced against the existence of labels comes from the Inclusiveness Condition. However, if a label is simply a copy of the features of one of the two merged elements (or more likely, of a subset of its features), inclusiveness is not violated.

Internal Merge, which, as Chomsky suggests in ‘Beyond Explanatory Adequacy’ (2004), is a very natural way to capture movement, is similar to External Merge in that it takes two objects and combines them to form one larger object. The only difference is that one of these two objects is already part of the other one (hence the term Internal Merge). (10) below provides an illustration; Y undergoes Internal Merge with X or, to use more traditional terminology, moves to the specifier of X.

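(10) Internal Merge of X and Y14

a. [X Y [X X_{EPP} Y]] (the higher Y is a copy of the lower Y)

b. [X [X X_{EPP} Y]], with the single occurrence of Y also attached directly to the root X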

The label in Internal Merge structures is determined by the element that drives movement. In (10a–b), for example, it is some feature of X that drives movement of Y to its specifier. I assume that the feature in question is the EPP feature, which is checked by this movement. Checking is marked by a strikethrough. In what

13

However, Chomsky (2013) seems to depart from this conclusion, and points out a number of cases in which the lack of labels forces movement. This suggests that in other (non-movement) cases, labels have to be in principle available.

14

(10a) represents movement (Internal Merge in our terms) in a familiar Copy and Merge notation, whereas (10b) does so in a way that reflects the term ‘Internal Merge’ perhaps a little more directly. Both are BPS-style representations. Strictly speaking, the movement illustrated here (i.e. movement of a complement to a specifier of the same head) might be banned by independent anti-locality principles, which I discuss in more detail in Chapters 4 and 5.


follows, I use the term valuation to refer to what used to be known as checking and reserve the term checking for Extended Projection Principle (EPP) feature checking, the only remnant of checking in the current system.
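The Internal Merge configuration in (10) differs from External Merge only in that one of the two objects is already contained in the other. A minimal sketch of that difference follows. The names `Merged`, `contains` and `internal_merge` are hypothetical, lexical items are again bare strings, and the EPP feature that drives the movement is not modeled. The sketch merely lets the target project, in line with the idea that the element driving movement determines the label.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Merged:
    """A labeled object built by Merge from two smaller objects."""
    label: str
    left: "SO"
    right: "SO"

SO = Union[str, Merged]

def contains(whole: SO, part: SO) -> bool:
    """True if `part` is `whole` itself or occurs somewhere inside it."""
    if whole == part:
        return True
    if isinstance(whole, Merged):
        return contains(whole.left, part) or contains(whole.right, part)
    return False

def internal_merge(target: Merged, mover: SO) -> Merged:
    """Re-merge `mover`, which must already be part of `target`, at the root.
    The target projects; the lower occurrence is left in place, as in (10a)."""
    if mover == target or not contains(target, mover):
        raise ValueError("Internal Merge needs an object contained in the target")
    return Merged(target.label, mover, target)

xp = Merged("X", "X", "Y")        # [X X Y]
moved = internal_merge(xp, "Y")   # [X Y [X X Y]]
print(moved)
```

Attempting to internally merge an object that is not part of the target raises an error, which is the one respect in which this function differs from an External Merge of the same two objects.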

The third operation that plays a crucial role in minimalist syntax is Agree. Since it is an operation that affects features, let me digress briefly and discuss features first. This is the focus of the next section, which discusses syntactic features and the role they play in Narrow Syntax.

1.3 Features

Examples of syntactic features that populate the syntactic literature include categorial features (such as N features or V features), case features, φ-features (gender, number, person), EPP features, wh-features, and Q features, among others. Features play a crucial role in minimalism in that they drive syntactic operations.

Features come in two guises: Interpretable Features (iF) and Uninterpretable Features (uF). Chomsky (2001, 2004) reduces the distinction between these two types to feature valuation. Uninterpretable features are simply features that enter the derivation unvalued, whereas interpretable ones are the ones that come with inherently specified feature values:

(11) An uninterpretable feature F must be distinguished somehow in LEX from interpretable features. The simplest way, introducing no new devices, is to enter F without value: for example, [uNumber]. That is particularly natural because the value is redundant, determined by Agree. (Chomsky 2004: 116)

I will use the following notation to distinguish these two types, where val stands for feature value.

(12) a. Interpretable feature: iF[val] (e.g. iT[past], iφ[3sg, masc])

b. Uninterpretable feature: uF[ ] (e.g. uCase[ ], uφ[ ])
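The notation in (12) can be mirrored in a small data structure in which interpretability is not stored separately but read off the presence of a value, following the reduction in (11). This is a sketch under that assumption only; the class name `Feature` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feature:
    name: str                    # e.g. "T", "Case", "φ"
    value: Optional[str] = None  # None models the empty brackets [ ]

    @property
    def interpretable(self) -> bool:
        # Following (11): a feature counts as interpretable iff it is valued.
        return self.value is not None

    def __str__(self) -> str:
        prefix = "i" if self.interpretable else "u"
        shown = self.value if self.value is not None else " "
        return f"{prefix}{self.name}[{shown}]"

print(Feature("T", "past"))   # iT[past]
print(Feature("Case"))        # uCase[ ]
```

One consequence of this encoding is that a feature which receives a value later in the derivation can no longer be told apart from one that was valued from the start.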

In ‘Minimalist Inquiries’ (2000), Chomsky takes the presence of uninterpretable features to be an imperfection from the perspective of the Strong Minimalist Thesis, since the existence of such features is not forced by Bare Output Conditions. Interestingly, though, he takes this imperfection to be linked to another imperfection in the system: movement. Uninterpretable features drive movement. ‘Perhaps these devices [uninterpretable features, B.C.] are used to yield the dislocation property’ (Chomsky 2000: 121). The fact that these two imperfections are related is somewhat ‘suspect’, and it opens up the possibility that they might not be imperfections after all. This is indeed the view Chomsky takes in ‘Beyond Explanatory Adequacy’, where he takes movement to be Internal Merge.

The concept of interpretability assumed here is a relative notion in that it refers to interpretability with respect to one of the two (or perhaps both) interfaces: the C-I interface and the SM interface. Being interpretable at a given interface means containing features that this interface can interpret. For example,


it seems plausible to assume the C-I interface can interpret features that have some semantic import; examples of such features might be features involving tense (past, present, future (and null, on some accounts)), aspect, quantification or φ-features (person, number and gender). The SM interface, on the other hand, can interpret features that are relevant for articulation. This allows us to distinguish features that are interpretable at a given interface from features which the interface cannot interpret: a feature is interpretable at a given interface if it makes a distinction between two representations at this interface.15,16 Svenonius (2007a) (see also Adger & Svenonius 2011 for relevant discussion) puts it succinctly as follows:

(13) A feature F is an X feature iff F can constitute a distinction between two different X representations. (Svenonius 2007a: 375)

Crucially, a given feature is not inherently interpretable or uninterpretable. It can be interpretable in one position but not in another, or at one interface but not the other. This, in turn, depends on what lexical item the feature in question is a feature of. Perhaps the most straightforward illustration comes from the domain of φ-features. These features are typically taken to be interpretable on nouns but not interpretable on verbs, which reflects the intuition that verbs agree with nouns and not vice versa. One way to think about it is that nouns are selected from the Numeration inherently specified for these features. Verbs, however, are selected from the Numeration unspecified. The following examples from Polish, a language with grammatical gender distinctions, provide an illustration.17 In (14), the noun papuga ‘parrot’ is lexically specified as a

15

This view of interpretable features bears some resemblance to distinctive features, not an unwelcome result. Distinctive features in phonology are features that distinguish between two phonological representations. By analogy, we could think of features that are interpretable at the C-I interface as distinctive semantic features.

16

Svenonius also makes the following distinction between module-internal features (for example, syntax-internal features) and interface features (for example, syntax–semantics features). Features that have no semantic import at all, such as class features, or arguably EPP features, provide good examples of syntax-internal features.

(i) For any F, and any modules X and Y,

a. F is an X-internal feature iff F is an X feature and not a feature of any other module.

b. F is an X-Y interface feature iff F is an X feature and a Y feature. (Svenonius 2007a: 375)

On one view, any syntax-internal feature is an uninterpretable feature. Svenonius, however, gives uninterpretable features a narrower definition: uninterpretable features for him are those features that are not translated or interpreted in the mapping from one module to another.

17

I use English glosses for the sake of clarity. Also, the use of actual lexical items in the Numeration is misleading. Since, in this case, the verb enters the derivation with its inflectional features unvalued, it is more accurate to think of the Numeration as consisting of feature bundles and the derivation as manipulating these feature bundles. The use of small caps is meant to capture this intuition.


third-person singular feminine noun; the presence of third-person singular feminine agreement on the verb, however, is a result of run-of-the-mill subject–verb agreement.18 In (15), on the other hand, the noun ptak ‘bird’ is specified as a third-person singular masculine noun and the verb agrees with it.

(14) a. Papuga siedziała na drzewie. [Polish]

parrot.3SG.FEM sit.PST.3SG.FEM on tree

‘A parrot sat on a tree.’

b. N = {PARROT_{iφ[3SG.FEM]}, . . ., SIT_{uφ[ ]}}

(15) a. Ptak siedział na drzewie.

bird.3SG.MASC sit.PST.3SG.MASC on tree

‘A bird sat on a tree.’

b. N = {BIRD_{iφ[3SG.MASC]}, . . ., SIT_{uφ[ ]}}

The idea that φ-features are always interpretable on nouns has sometimes been questioned in the relevant literature. We have seen above (note 18) that number might be different from other features in this respect. Legate (2002, 2012) also raises some issues for the way Chomsky characterizes the distinction between interpretable and uninterpretable features, pointing out that it is not clear in what sense gender features in languages with non-natural gender are semantically

18

Number is different from person and gender. For most nouns, gender specification is inherent and invariant; the Polish noun książka ‘book’ is always third person feminine but it can be either singular or plural. This suggests that number might not be inherent to the noun itself in the same way person and gender are. One way to implement it is to think of Number as being uninterpretable on the noun but interpretable on a higher Number head, as shown in (i) (see Danon 2011 for a more concrete proposal).

(i) [DP D [NumP Num_{iNum[val]} [NP N_{iPers[val], iGen[val], uNum[ ]}]]]

However, there do exist nouns that can be either feminine or masculine (such as natural gender marked nouns in (ii)), or nouns that are always singular or plural (such as so-called pluralia tantum nouns in (iii)).

(ii) aktor/aktorka ‘actor/actress’

lekarz/lekarka ‘male doctor/female doctor’

pilot/pilotka ‘male pilot/female pilot’

(iii) spodnie ‘pants’

nożyczki ‘scissors’

Since little hinges on this distinction as far as the specifics of Phase Theory are concerned, I will abstract away from it in what follows.


interpretable. Consider, for example, the noun book: in Russian it is feminine, in French masculine, and in German neuter, as shown in (16a–c).

(16) a. eta kniga [Russian]

this.3SG.FEM book.3SG.FEM

b. un livre [French]

a.3SG.MASC book.3SG.MASC

c. ein Buch [German]

a.3SG.NEUT book.3SG.NEUT

The mismatches between syntactic number and semantic number (discussed by Munn 1999, for example) raise similar issues. A noun like ‘group’ or ‘committee’ is syntactically singular but semantically plural. Does this mean that its interpretable number feature is singular or plural? Given that interpretability in this case refers to the semantic interface, it seems that it should be plural, but then the question is why (at least in American English) it determines singular verb agreement. So perhaps a better distinction is between inherently/lexically specified features (which may or may not map into semantically relevant features but which can provide values to other features) and those that receive their specification in the course of a syntactic derivation.

The idea that the same feature can be interpretable in one location but uninterpretable in another one raises the question of whether every interpretable feature has an uninterpretable counterpart and vice versa: whether every uninterpretable feature has an interpretable counterpart. For features such as φ-features, the answer to both questions seems to be yes. φ-features, as we saw above, are interpretable on nouns but uninterpretable on verbs. What about other features? Tense features (i.e. past, present, future) have to be interpretable somewhere; the question is where and what their uninterpretable counterparts are. It is generally assumed that the Complementizer–Tense complex provides tense information; this suggests that tense features are interpretable on these heads in spite of the fact that in many languages these features are realized on verbs, as pointed out by Pesetsky & Torrego (2007). If tense is interpreted on T and it ‘needs’ an uninterpretable counterpart, the tense feature on the verb comes to mind as its uninterpretable counterpart, as shown in (17).

(17) T_{iT[PST]} walked_{uT[PST]}

A slightly different explanation (but not one that is incompatible with the tense feature on the verb being uninterpretable) comes from Pesetsky and Torrego’s (2001) work on case. In their view, Nominative case is an instance of an uninterpretable T feature on the subject:

(18) T_{iT[PST]} DP_{uT[NOM]}

In a similar spirit, Svenonius (2002) reanalyzes the Accusative case feature as an uninterpretable Aspect feature (uAsp). Support for such a reanalysis comes from languages like Turkish or Finnish (see Enç 1991, Kiparsky 1998, Megerdoomian


2000, among others), in which Accusative case marks aspectual notions such as delimitedness or specificity:

(19) a. Ali bir kitabı aldı. [Turkish]

Ali one book.ACC bought

‘A book is such that Ali bought it.’

b. Ali bir kitap aldı.

Ali one book bought

‘Ali bought some book or other.’

(20) a. Matti luk-i kirja-t (tunni-ssa). [Finnish]

Matti-SG/NOM read-PST/3SG book-PL-ACC (hour-INESS)

‘Matti read the books (in an hour).’

b. Matti luk-i kirjo-j-a (tunni-n).

Matti-SG/NOM read-PST/3SG book-PL-PART (hour-ACC)

‘Matti read books (for an hour).’ (Megerdoomian 2000: 316–17)

The mechanism responsible for valuing unvalued features is called Agree. Since Agree is the subject of the next section, for now all we need to know is that Agree values uninterpretable features. I use the notation in (21) to capture this distinction. Val stands for any feature value complex, empty brackets signify the lack of value, and filled brackets signify valued features. The only two possibilities are valued interpretable features and unvalued uninterpretable features; other combinations, to which I turn shortly, would be contradictions in terms.

(21) a. valued/interpretable features: iF[val]

b. unvalued/uninterpretable features: uF[ ]

Uninterpretable features have to be valued in the course of the derivation. If there are only two types of features, uninterpretable/unvalued and interpretable/valued, providing a value to an uninterpretable feature will make it indistinguishable from a feature that was interpretable (i.e. it had a value) to begin with. Furthermore, if unvalued features need to be deleted by the time of Transfer to the interfaces, there is no way to know which features need to be deleted, as they are all valued by the time of Transfer. This gives rise to the following paradox, discussed by Epstein & Seely (2002), Richards (2008) and Chomsky (2008), among others. Uninterpretable features cannot be interpreted by the interfaces, so they have to be deleted by the time they reach the interfaces. However, since they might have a phonetic reflex (such as the presence of subject–verb agreement as a reflex of an uninterpretable φ-feature on T or overt case morphology as a reflex of an uninterpretable case feature), they must be transferred to the phonological component before they get deleted. This is not a problem, however, if feature valuation, Transfer and deletion happen at the same point during the derivation. This point in the derivation is determined by phase heads. Thus the solution to the timing paradox provides an indirect argument in favor of phases. Since we have not said much about phases yet, I will defer a more complete discussion of the


interaction between feature valuation, Transfer and deletion till the next chapter. To preview, we will see there that feature valuation and deletion have to happen simultaneously at the point of Transfer to the interfaces. There is thus no need for the derivation to keep track of which features entered the derivation valued and which ones did not.

As pointed out by Pesetsky & Torrego, Chomsky’s reasoning for equating interpretability with valuation has to do with the fact that Narrow Syntax lacks the look-ahead property to determine whether a given feature is going to be interpretable at the interface or not. It is important to note that the timing issue alluded to here only arises if we equate uninterpretability with lack of value; if uninterpretable features remain uninterpretable throughout the derivation, the issue does not arise. Indeed, this equivalence has also been questioned in the literature. Pesetsky & Torrego (2007) and Bošković (2011), for example, argue that valuation and interpretability should be taken as independent of each other. This allows feature combinations that would be contradictory if feature interpretability amounted to valuation. The ‘new’ possibilities are bolded in (22).

(22) Types of features

uF[val] an uninterpretable and valued feature

iF[val] an interpretable and valued feature

uF[ ] an uninterpretable and unvalued feature

iF[ ] an interpretable and unvalued feature (cf. Pesetsky & Torrego 2007)

For Pesetsky & Torrego (2007), tense on the verb is valued in the lexicon (which, they argue, is plausible given the fact that quite often tense surfaces on the verb).19 It is nevertheless uninterpretable and remains so throughout the derivation. The tense feature on the T head, on the other hand, is interpretable but unvalued:

(23) a. T_{iT[ ]} walked_{uT[PST]}

b. T_{iT[PST]} walked_{uT[PST]}

There are thus two plausible candidates for what an uninterpretable T feature might be: a tense feature on the verb or a Nominative case feature on the noun. Furthermore, Pesetsky & Torrego argue that such a dissociation of interpretability and valuation also removes the need for two types of features on interrogative complementizer heads and wh-phrases. Complementizer heads in wh-questions are typically thought to require two types of features: the feature that marks the clause as interrogative (essentially Cheng’s (1991) typing

19

I am simplifying Pesetsky and Torrego’s system somewhat. For them, feature valuation is feature sharing of the kind proposed by Frampton and Gutmann (2000), for example. Instead of (23), they suggest (i), in which [2] is the value. The number 2 has no significance; it just marks the same feature value in both locations:

(i) Tense_{iT[2]} walked_{uT+past[2]}


feature) and the feature that drives a wh-element to the specifier of the interrogative clause. The two are typically referred to as a Q-feature and a wh-feature, respectively; the complementizer has an interpretable Q feature (iQ feature) and an uninterpretable wh-feature (uwh-feature). A wh-phrase in its scope, on the other hand, has an interpretable wh-feature (iwh) and an uninterpretable Q feature (uQ).20 Agree, as shown in (24), ensures valuation of both unvalued features, the uninterpretable wh-feature of C and the uninterpretable Q feature of the wh-phrase.

(24) a. C_{iQ, uwh[ ]} WH_{iwh, uQ[ ]}

b. C_{iQ, uwh[val]} WH_{iwh, uQ[val]}

This is a proliferation of interrogative-type features. For Pesetsky and Torrego, C has an interpretable unvalued Q feature and the wh-phrase has its uninterpretable valued counterpart. As shown in (25), there is only one type of feature (Q feature), and wh is its value.

(25) a. C_{iQ[ ]} WH_{uQ[wh]}

b. C_{iQ[wh]} WH_{uQ[wh]}
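Once interpretability and valuation are dissociated, as in (22), a feature needs two independent properties rather than one. The sketch below illustrates this with the Q feature in (25). It is a simplification under stated assumptions: the names `Feature` and `share_value` are hypothetical, and valuation is modeled as plain copying of a value rather than as feature sharing in Pesetsky and Torrego's full sense.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feature:
    name: str
    interpretable: bool          # stored independently of the value
    value: Optional[str] = None  # None models the empty brackets [ ]

    def __str__(self) -> str:
        prefix = "i" if self.interpretable else "u"
        shown = self.value if self.value is not None else " "
        return f"{prefix}{self.name}[{shown}]"

def share_value(a: Feature, b: Feature) -> None:
    """Copy the value from the valued instance to the unvalued one.
    Interpretability is left untouched on both instances."""
    if a.name != b.name:
        raise ValueError("only instances of the same feature can share a value")
    if a.value is None and b.value is not None:
        a.value = b.value
    elif b.value is None and a.value is not None:
        b.value = a.value

# The four logically possible types in (22).
four_types = [str(Feature("F", i, v)) for i in (False, True) for v in ("val", None)]
print(four_types)

# (25): C carries iQ[ ], the wh-phrase carries uQ[wh].
c_q = Feature("Q", interpretable=True)
wh_q = Feature("Q", interpretable=False, value="wh")
share_value(c_q, wh_q)
print(c_q, wh_q)
```

After valuation the two instances still differ in interpretability, so nothing needs to keep track of which of them entered the derivation valued.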

1.4 Agree

The discussion in the previous section relied on the assumption that Agree is an operation responsible for valuing unvalued features.21 In this section, we turn to the details of the Agree mechanism. Following the general consensus in the field, I will capitalize Agree when I use it in this technical sense and distinguish it from the descriptive terms ‘agreement’ or ‘agrees with’, used quite broadly in the literature.

For Agree to be possible, the following conditions have to hold (cf. Chomsky 2000: 122–3).

(26) a. The Probe and the Goal have to be active, where being active means having uninterpretable/unvalued features. THE ACTIVITY CONDITION

20

Some sources take C to have an interpretable wh-feature and an uninterpretable Q feature, and the wh-phrase to have an uninterpretable wh-feature and an interpretable Q feature. This seems to be a terminological rather than a substantive difference.

21

Chomsky describes Agree as an operation responsible for deleting uninterpretable features under matching:

(i) The erasure of uninterpretable features of probe and goal is the operation we called Agree. (Chomsky 2000: 122)

Feature deletion is distinct from feature valuation, which raises the question of whether we have been mistaken in taking Agree to be responsible for feature valuation. I do not believe so. As we will see in Section 2.6, if valuation and deletion happen simultaneously (at the phase level), there is no incompatibility here.


b. The features of the Probe and Goal have to match, where matching refers to feature identity. THE MATCHING CONDITION

c. The Goal has to be inside the domain of the Probe, where the domain of the Probe is its sister. THE DOMAIN CONDITION

d. The Goal has to be in a local relationship with the Probe, where locality is closest c-command. THE LOCALITY CONDITION

When all these conditions are met, the unvalued features get valued (and deleted). A typical configuration satisfying all four conditions is given in (27). Agree is indicated by a dotted line (a convention I am going to employ throughout the book). Agree is typically assumed to be impossible if both the Probe and the Goal have only unvalued features.22

(27) a. P_{uF[ ]} . . . AGREE . . . G_{iF[val], uF[ ]}

b. P_{uF[val]} G_{iF[val], uF[val]}

A very straightforward illustration comes from Nominative case licensing, which in current terms reduces to valuation of the uninterpretable Case feature (uC[ ], or uT[ ] in Pesetsky & Torrego’s system). Nominative case is assumed to be a reflex of the Agree relationship between a finite T (which, in current terms, means it has a valued/interpretable tense feature and an unvalued set of φ-features) and a subject in the domain of T, which has a valued set of φ-features and an unvalued Case feature, as represented schematically in (28) for a past tense T and a third-person singular subject. Agree provides values to the uninterpretable features on both; in (28b) there are no unvalued features left.

(28) a. T_{uφ[ ], iT[pst]} DP_{iφ[3sg.fem], uC[ ]}

b. T_{uφ[3sg.fem], iT[pst]} DP_{iφ[3sg.fem], uC[Nom]}
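The four conditions in (26) and the valuation step in (28) can be put together in a short procedural sketch. Several simplifications are assumptions of the sketch rather than claims of the theory: the Probe's domain is flattened into a list ordered from closest to farthest, inactive items are simply skipped (whether they would block Agree is left open), and the Case value assigned to the Goal is passed in as a parameter. The names `Item` and `agree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Item:
    name: str
    # feature name -> value; None stands for an unvalued feature such as uφ[ ]
    features: Dict[str, Optional[str]] = field(default_factory=dict)

    def active(self) -> bool:
        """Activity: the item still has at least one unvalued feature."""
        return any(value is None for value in self.features.values())

def agree(probe: Item, domain: List[Item], probe_feature: str,
          goal_feature: str, goal_value: str) -> Optional[Item]:
    """Value `probe_feature` on the probe from the closest suitable goal and
    value `goal_feature` on that goal. `domain` stands in for the probe's
    sister, flattened and ordered closest first (an assumption of the sketch)."""
    # Activity (Probe): the probing feature must be present and unvalued.
    if probe_feature not in probe.features or probe.features[probe_feature] is not None:
        return None
    for goal in domain:  # Domain and Locality: scan the sister, closest first
        matching = goal.features.get(probe_feature) is not None  # Matching
        if matching and goal.active():                           # Activity (Goal)
            probe.features[probe_feature] = goal.features[probe_feature]
            if goal_feature in goal.features and goal.features[goal_feature] is None:
                goal.features[goal_feature] = goal_value
            return goal
    return None

# (28a): a past tense T and a third-person singular feminine subject.
t = Item("T", {"φ": None, "T": "pst"})
dp = Item("DP", {"φ": "3sg.fem", "Case": None})
goal = agree(t, [dp], probe_feature="φ", goal_feature="Case", goal_value="Nom")
print(t.features, dp.features)
```

After the call neither item has an unvalued feature left, which corresponds to (28b); a second attempt to probe from T returns nothing because T is no longer active.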

For Chomsky, Agree is a binary relationship, involving a single Probe and a single Goal. However, others have argued that Agree does not have to be binary and that it also can involve one Probe undergoing Agree with multiple Goals or multiple Probes undergoing Agree with a single Goal. Hiraiwa (2001) dubs this type of Agree Multiple Agree, focusing specifically on the scenarios in which a single Probe (T in this case) agrees simultaneously with two (or more) Goals (two DPs in the case at hand):

(29) a. T_{uφ[ ], iT[val]} DP_{iφ[val], uC[ ]} DP_{iφ[val], uC[ ]}

b. T_{uφ[val], iT[val]} DP_{iφ[val], uC[val]} DP_{iφ[val], uC[val]}

22

Some researchers allow such non-standard Agree (see Adger 2003: 169, for example).


In my own work on Parallel Merge structures (see Citko 2011b for the most complete exposition), I argued in favor of the other possibility: the possibility of a single Goal agreeing simultaneously with multiple Probes:

(30) a. T_{uφ[ ], iT[val]} T_{uφ[ ], iT[val]} DP_{iφ[val], uC[ ]}
     b. T_{uφ[val], iT[val]} T_{uφ[val], iT[val]} DP_{iφ[val], uC[val]}
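The two non-binary configurations in (29) and (30) can be put side by side in a small Python sketch. It is an illustration only: the function multiple_agree and the dictionary layout are hypothetical, the concrete values ("3sg", "3pl", "Nom") are placeholders for the [val] of the schemas, and the question of how a Probe resolves the values of several Goals is left open (the sketch simply records all of them).

```python
def multiple_agree(probes, goals, case):
    """Hypothetical sketch: every Probe agrees simultaneously with every Goal.

    Each unvalued Probe collects the phi-values of all Goals, and each Goal
    with an unvalued Case feature has it valued as `case`.
    """
    for probe in probes:
        if probe["u_phi"] is None:
            probe["u_phi"] = [goal["i_phi"] for goal in goals]
    for goal in goals:
        if goal["u_case"] is None:
            goal["u_case"] = case


def make_T():
    """A T head with an unvalued set of phi-features."""
    return {"label": "T", "u_phi": None}


def make_DP(phi):
    """A DP with valued phi-features and an unvalued Case feature."""
    return {"label": "DP", "i_phi": phi, "u_case": None}


# (29): a single Probe agreeing with two Goals
t = make_T()
dp1, dp2 = make_DP("3sg"), make_DP("3pl")
multiple_agree([t], [dp1, dp2], case="Nom")
print(t["u_phi"], dp1["u_case"], dp2["u_case"])

# (30): two Probes agreeing with a single Goal
t1, t2 = make_T(), make_T()
dp = make_DP("3sg")
multiple_agree([t1, t2], [dp], case="Nom")
print(t1["u_phi"], t2["u_phi"], dp["u_case"])
```

Binary Agree falls out as the special case in which both lists contain exactly one element.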

Since the issue of whether Agree is a binary operation or not does not bear directly on any aspect of Phase Theory, I will not elaborate on it further here, and I refer the interested reader to the works cited above (and the references therein) for data, evidence and further discussion. The one thing about Agree that is relevant to Phase Theory concerns the locality conditions on Agree; in particular the issue of whether Agree is possible across phase boundaries. This is the issue we come back to in Section 2.4, which focuses on the Phase Impenetrability Condition.

2

Introducing phases

2.1 Merge over Move preference1

This chapter introduces the concept of a phase, providing both theoretical and empirical motivation for it. To the best of my knowledge, the term first appeared in Chomsky’s (2000) ‘Minimalist Inquiries’, where phases (to be more specific, lexical subarrays associated with phases) were introduced as a solution to a problem arising from the Merge over Move (MOM) principle. The concept of a phase, however, builds on many prior principles involving locality domains: cycles, barriers, islands, to name a few.

Let us start by looking at the Merge over Move Principle in more detail. What it says is that, all things being equal, Merge is preferred over Move.2 Consider the following contrast involving a raising predicate be likely, modeled upon Chomsky’s (2000) (non-parrot) examples:

(1) a. There are likely to be many parrots at the clay lick right now.
    b. *There are likely many parrots to be at the clay lick right now.

Without the MOM principle, it is impossible to explain the ungrammaticality of (1b). The crucial step in its derivation is the one illustrated in (2b), where the next step has to involve checking the EPP feature of T.

(2) a. N = {there, T, are, likely, to, be, many, parrots, at, the, clay lick, right, now}
    b. [TP to_{EPP} be many parrots_{iφ[3pl], uC[ ]} at the clay lick right now]

1 See also Hornstein, Nunes & Grohmann (2005) for a very lucid overview of the rationale behind the Merge over Move principle. The logic of the presentation here mirrors theirs. See, however, Castillo, Drury & Grohmann (2004) for a discussion of some issues MOM raises and ways to resolve these issues.

2 Chomsky points out that MOM could be derived from more general economy principles, if Move is a more complex operation than Merge, consisting of Copy and Merge. It is not clear, however, if the logic survives when Move is treated as a variant of Merge, i.e. Internal Merge. There is no sense in which Internal Merge is more complex than External Merge.

We have a choice here: the EPP feature of T can be checked either by merging the expletive there in [Spec,TP] or by moving the DP many parrots there. However, only one of these choices leads to a convergent derivation. If we move many parrots to [Spec,TP], the derivation proceeds as follows:

(3) a. Move [DP many parrots] to [Spec,TP], checking the EPP feature of T

[TP [DP_{iϕ[3pl], uC[ ]} many parrots] [T′ [T to_{EPP}] [VP be many parrots at the clay lick right now]]]

b. Merge [TP many parrots to be at the clay lick right now] with the adjective likely

[AdjP [Adj likely] [TP [DP_{iϕ[3pl], uC[ ]} many parrots] [T′ [T to_{EPP}] [VP be many parrots at the clay lick right now]]]]

c. Merge [AdjP likely many parrots to be at the clay lick right now] with the verb are

[VP [V are] [AdjP [Adj likely] [TP [DP_{iϕ[3pl], uC[ ]} many parrots] [T′ [T to_{EPP}] [VP be many parrots at the clay lick right now]]]]]

d. Merge [VP are likely many parrots to be at the clay lick right now] with T
Move are to T

[TP [T_{EPP, uϕ[3pl]} are] [VP [V are] [AdjP [Adj likely] [TP [DP_{iϕ[3pl], uC[Nom]} many parrots] [T′ [T to_{EPP}] [VP be many parrots at the clay lick right now]]]]]]

e. Merge the expletive there with [TP are likely many parrots to be at the clay lick right now], checking the EPP feature of T

[TP there [T′ [T_{EPP, uϕ[3pl]} are] [VP [V are] [AdjP [Adj likely] [TP [DP_{iϕ[3pl], uC[Nom]} many parrots] [T′ [T to_{EPP}] [VP be many parrots at the clay lick right now]]]]]]]

f. There are likely many parrots to be at the clay lick right now.

Interestingly, nothing seems to go wrong with this derivation, yet the result (example (3f)) is ungrammatical. The uninterpretable features are all checked and/or valued, the Numeration is exhausted, and locality is respected. This is where the Merge over Move (MOM) principle comes in: the step in (3a) violates MOM, as we are moving the DP many parrots to [Spec,TP] rather than merging the expletive there. If we merge the expletive there first, the derivation proceeds as schematized in (4) instead, leading to the grammatical (4f).

(4) a. Merge there with [TP to be many parrots at the clay lick right now], checking the EPP feature of T

[TP there [T′ [T to_{EPP}] [VP be many parrots_{iϕ[3pl], uC[ ]} at the clay lick right now]]]

b. Merge [TP there to be many parrots at the clay lick right now] with the adjective likely

[AdjP [Adj likely] [TP there [T′ [T to_{EPP}] [VP be many parrots_{iϕ[3pl], uC[ ]} at the clay lick right now]]]]

c. Merge [AdjP likely there to be many parrots at the clay lick right now] with the verb are

[VP [V are] [AdjP [Adj likely] [TP there [T′ [T to_{EPP}] [VP be many parrots_{iϕ[3pl], uC[ ]} at the clay lick right now]]]]]

d. Merge [VP are likely there to be many parrots at the clay lick right now] with the matrix T
Agree between T and many parrots
Move are to T

[TP [T_{uϕ[3pl], EPP} are] [VP [V are] [AdjP [Adj likely] [TP there [T′ [T to_{EPP}] [VP be [DP many parrots_{iϕ[3pl], uC[Nom]}] at the clay lick right now]]]]]]

e. Move the expletive there to [Spec,TP], checking the EPP feature of T

[TP there [T′ [T_{uϕ[3pl], EPP} are] [VP [V are] [AdjP [Adj likely] [TP there [T′ [T to_{EPP}] [VP be many parrots_{iϕ[3pl], uC[Nom]} at the clay lick right now]]]]]]]

f. There are likely to be many parrots at the clay lick right now.

2.2 Motivating phases

Now, with a basic grasp of the Merge over Move principle, let us look at slightly more complex examples, such as (5).

(5) There is a strong likelihood that many parrots will be at the clay lick right now.

Since its Numeration, given in (6a) below, contains an expletive, the prediction is that (5) should be ungrammatical, since at the stage given in (6b), according to MOM, the Merge of there should win out over the Move of many parrots.

(6) a. N = {there, is, a, strong, likelihood, that, many, parrots, will, be, at, the, clay lick, right, now}

b. [TP [T′ [T will_{uϕ[3sg], EPP}] [VP be many parrots_{iϕ[3pl], uC[Nom]} be at the clay lick right now]]]

c. [TP there [T′ [T will_{uϕ[3sg], EPP}] [VP be many parrots_{iϕ[3pl], uC[Nom]} be at the clay lick right now]]]
