A Centered Index of Spatial Concentration: Expected Influence Approach

(1)

A Centered Index of Spatial Concentration:

Expected Influence Approach

and Application to Population and Capital Cities

∗

Filipe R. Campante

†

and Quoc-Anh Do

‡

This version: April 2010

Abstract

We construct a general axiomatic approach to measuring spatial concentration around a center or capital point of interest, a concept with wide applicability from urban eco-nomics, economic geography and trade, to political economy and industrial organization. By analogy with expected utility theory, we propose a basic axiom of independence (sub-group consistency) and continuity for a concentration order that ranks any two distribu-tions relative to the capital point. We show that this axiom implies anexpected influence representation of that order, conceptualizing concentration as an aggregation of the ex-pected influence exerted by the capital on all points in the relevant space (or vice-versa). We then propose two axioms (monotonicity and rank invariance) and prove that they im-ply that the associated influence function must be a decreasing isoelastic function of the distance to the capital. We apply our index to measure the concentration of population around capital cities across countries and US states, and also in US metropolitan areas. We show its advantages over alternative measures, and explore its correlations with many economic and political variables of interest.

Keywords: Spatial Concentration, Expected Influence, Expected Utility, Population Concen-tration, Capital Cities, Gravity, CRRA, Harmonic Functions, Axiomatics.

JEL Classification: C43, D78, D81, R23.

∗_{We owe special thanks to Joan Esteban, James Foster, and Debraj Ray for their help and very useful}

sugges-tions. We are also grateful to Philippe Aghion, Alberto Alesina, Dan Vu Cao, Davin Chor, Juan Dubra, Georgy Egorov, Ed Glaeser, Jerry Green, Bard Harstad, Daniel Hojman, Michael Kremer, David Laibson, Antoine Loeper, Erzo F.P. Luttmer, Nolan Miller, Rohini Pande, Karine Serfaty de Medeiros, Anh T.T. Vu, and semi-nar participants at Clemson University, Ecole Polytechnique, ESOP (University of Oslo), George Washington, GRIPS (Tokyo), Harvard Econ, Harvard Kennedy School, Kellogg MEDS, Paris School of Economics, Singapore Management University, University of Houston, University of Lausanne’s Faculty of Business and Economics, and the Workshop on Conflicts and Inequality (PRIO, Oslo) for helpful comments, and to Ngan Dinh, Janina Matuszeski, and C. Scott Walker for help with the data. The usual disclaimer applies. The authors gratefully acknowledge the financial support from the Taubman Center for State and Local Government (Campante), and the financial support from the Lee Foundation and financial support and hospitality from the Weatherhead Center for International Affairs (Project on Justice, Welfare and Economics) (Do).

†_{Harvard Kennedy School, Harvard University.} _{79 JFK St., Cambridge, MA 02138.} _Email:

fil-ipe [email protected].

‡_{School of Economics, Singapore Management University, 90 Stamford Road, Singapore 178903. Email:}

(2)

1 Introduction

Spatial concentration is a very important concept in the social sciences, and in economics in particular – both in the sense of geographical space, as studied by urban economics, economic geography or international trade, and in more abstract settings (e.g. product or policy spaces) that are studied in many different fields, from industrial organization to political economy. As a result, a number of methods have been developed to measure this concept, from relatively ad hoc measures such as the Herfindahl index to theoretically grounded approaches such as the “dartboard” method of Ellison and Glaeser (1997), and also including the adaptation of indices used to capture related concepts such as inequality (Gini coefficient, entropy measures). These measures are well-suited to analyzing the concentration of a given variable over a “uniform” space, in which no point is considered to be of particular importance in anex ante sense.

In practice, however, it is often the case that some points are indeed more important than others. In other words, we might be interested in measuring the concentration of a given variablearound a point (e.g. a city or a specific site), rather than its concentration over some area (e.g. a region or country). The standard indices of concentration are not suited to capture this type of situation, as they leave aside plenty of information on actual spatial distributions. This paper presents a coherent framework to understand the concept of concentration around a capital point of interest across a broad range of applications – ultimately, the concen-tration of any variable in very general spaces of economic interest. We conceptualize a spatial distribution as describing the probability of an individual observation being located at any given point in the relevant space. This analogy with probability distributions leads us to reach for the tools of expected utility to build an axiomaticExpected Influence framework for concentration around a capital point. We then develop this framework to generate a theoretically grounded measure to quantify the concept: a centered index of spatial concentration (CISC).

Our approach establishes ordinal properties that should be satisfied by any relation C designed to capture the concept of concentration around a capital point C. The first basic axiom builds on the analogy with expected utility theory to pose properties ofIndependence, or Subgroup Consistency, andContinuity(Axiom 1) that are appealing in light of the probabilistic interpretation underlying the approach – and find a natural counterpart in the literature on the measurement of inequality and poverty. These axioms yield the Expected Influence Theorem

(3)

(Theorem 1), whereby any concentration order C has an Expected Influence (EI) represen-tation. In words, we show that we can understand concentration as an aggregation of the expected influence exerted by the capital on all points in the relevant space (or vice-versa).

We then proceed to give further content to the concentration order C by specifying two basic properties any such order should satisfy. Monotonicity (Axiom 2) considers a pair of distributions such that one of them, for any given distance to C, places more mass closer to that point than the other, and requires that the former should be ranked as more concentrated around C than the latter. Rank Invariance (Axiom 3) prescribes that the ranking between different distributions is preserved when the unit, or scale, of distance measure is arbitrarily changed – in other words, the ranking of two distributions should not change based on whether distances are measured in miles, kilometers, or millimeters. These two very natural properties, when appended to the EI representation, define the class of CISC (Centered Index of Spatial Concentration) (Theorem 2): the expected influence, with the influence function being a mono-tonically decreasing, isoelastic (“constant relative risk aversion”) function of the distance to the capital point C.

We then provide further discussion on how to calibrate the crucial degree of freedom left by the CISC – the elasticity parameter. We can interpret it as measuring how marginal influence is affected by the distance to the capital point, and as measuring how the concentration order reacts to mean-preserving spreads (generalized to many dimensions). These two interpretations define two special cases: theLinear CISC (L-CISC), with constant marginal influence, and the Gravity-based CISC (G-CISC), which is invariant with respect to uniform mean-preserving spreads and can thus be interpreted as eliciting the “gravitational pull” exerted by the capital. We also discuss how the CISC can be normalized to suit different applications.

Examples of circumstances in which there is specific ex ante knowledge of the importance of a given capital point are not hard to come by. Topics in urban economics (such as the study of urban sprawl and urbanization, e.g. Henderson 2003, Glaeser and Kahn 2004), political economy (political importance of capital cities and urban centers, e.g. Ades and Glaeser 1995, Traugott 1995, Campante and Do 2007), international trade (gravity equations, as formalized by Anderson and van Wincoop 2003),1 industrial organization (concentration of competitors

1_{Their formalization involves the concept of multilateral resistance, expressible as a measure of how remote a}

particular country is from the ensemble of other countries. The geographical concentration of the world around each country, in this sense, is theoretically expected and empirically verified to affect trade flows in and out

(4)

around a firm or producer), economic geography (“market potential”, e.g. Fujita et al. 1999) – all of these place emphasis on the concentration of population and economic activity around a geographical center of interest. The concept is also important in non-geographical contexts. For instance, one can think about concentration around points of interest within a product space, with obvious applications in IO models of spatial competition, but also in development.2

The same goes for abstract policy spaces, in political economy.3

Not surprisingly in light of that, the literature has grappled with the question of devising a centered measure of spatial concentration. Our EI approach, besides generating a specific family of indices, enables us to systematically evaluate those alternative measures within a unified framework, in very general spaces. We show that measures such as “capital primacy” (e.g. the share of population of a country that lives in the capital or main city, as in Ades and Glaeser 1995 or Henderson 2003) violate continuity and monotonicity, as they discard a lot of information by attaching zero weight to all observations falling outside of the designated boundary. Other approaches, such as the negative exponential density functions from the urban economics literature that has tried to measure the “centrality” of “mononuclear” urban areas, can be shown to violate rank invariance. Another interesting example also comes from the same literature on urban sprawl. Galster et al. (2001), for instance, measure this centrality by the inverse of the sum of the distances of each observation to the central business district. We can show that this is a monotonic transformation of L-CISC, and therefore satisfies all of our axioms. This underscores that our general framework can go much further than the typically ad hoc approaches in the literature, which are inherently limited by what an intuitive grasp of the properties of a given space will provide.

The EI approach also relates very naturally to the literature on the measurement of riskiness (Aumann and Serrano 2008) (as is clear from the connection with expected utility theory), inequality, and polarization (Duclos et al. 2004). It provides a foundation to extend these other of that country. In an earlier version of this paper, available upon request, we detail the surprisingly close connection between the measure of concentration we develop and the formula of multilateral resistance.

2_{Hidalgo et al.} _{(2007), for instance, are interested in a product space in which distances measure the}

likelihood that a country might move from one type of product to another. One might then be interested in how concentrated a country’s economy is around a specific industry, say, oil production.

3_{See for instance Baron and Diermeier (2001), where the status quo policy has special clout, and hence the}

concentration of preferences around that status quo point may be of particular interest. Empirically, one could immediately connect this to the voting records of politicians, or to the collection of opinions from, say, the World Values Surveys.

(5)

contexts to multidimensional settings, and also lends itself to the development of a measure of (non-centered) concentration.

The second part of the paper provides an example of empirical implementation of our mea-sure, by computing an index of population concentration around capital cities across countries. We use the L-CISC and G-CISC as illustration, and show that our index provides a much more sensible ranking of countries than currently usedad hoc alternatives and non-centered proxies. It also uncovers a negative correlation between the size of population and its concentration around the capital city that is not detected by those alternatives.

In addition, motivated by the idea that political influence diminishes with distance to the capital – as put by Ades and Glaeser (1995, p. 198-199), “spatial proximity to power increases political influence” – we consider the correlation between population concentration and a num-ber of measures of quality of governance. We show that there is a positive correlation between concentration and the checks that are faced by governments, and that this correlation is present only in non-democratic countries.4 _{The statistical significance of this correlation is substantially}

improved by using our index instead of the ad hoc alternatives.

We also illustrate how our index can shed light on the issue of the choice of where to locate the capital city, which goes back at least to James Madison during the debates at the US Constitutional Convention of 1787. We show that there is a pattern in which both very autocratic and very democratic countries tend to have their capital cities in places with relatively low concentration of population. Inspired by the Madisonian origins of this debate, we extend our implementation by computing our index for US states, and finish it off by running the computations for US metropolitan areas.

The remainder of the paper is organized as follows. Section 2 presents the main definitions required by the theory. Section 3 presents the foundations of the EI approach, and Section 4 adds the axioms that characterize the CISC. Section 5 discusses the interpretation of the elas-ticity parameter, normalization procedures, and the comparison with other measures. Section 6 contains the empirical implementation and correlation analysis, and Section 7 concludes.

4_{This is consistent with Campante and Do (2007), who present a theory of revolutions and redistribution}

(6)

2 Main Definitions

We start by spelling out the definitions of the main mathematical objects and transformations that are required for our approach. Our main concern is the spatial concentration of a variable – which can be thought of as population, economic activity, etc. – around a point of interest, so we consider a pointCin a compact subset X of_Rn_.5 _{and denote}_X _the_σ₋_{algebra of Lebesgue}

measurable subsets of X. We refer to C as the capital point.6 _Denote _D

X to be the space of

positive bounded measures on X, endowed with the topology of weak convergence, andPX to

be its subspace of probability measures (i.e. PX = {p :

R

Xdp = 1}). Notice that PX is an

affine space.

We will use the term normalized distribution, or simply distribution, for all measures p ∈ PX. A normalized distribution px is called -simple, or just simple, at a point x∈S if px is a

uniform measure on the ballB(x, ) of center xand radius(with B(x, ) ={z:|z−x|< }). We focus our attention on distributions whose size is normalized to one because, generally speaking, we want to be able to disentangle features of the distribution that are distinct from concentration per se. Most importantly among these features, of course, is the size of the population under consideration.7

We can now define our main object of interest, which is an order that compares distributions in relation to the capital point C.

Definition 1 (Concentration Order) A concentration order of distributions based on the capital C is a complete preference relation _%C on PX, that is, a transitive binary relation on

PX such that for all p and q ∈PX either p%Cq or q%Cp.

5_{While our presentation focuses on Euclidean spaces, our approach is, for the most part, easily generalizable}

to any normed vector spaces. (We will explicitly note the parts that are not.) We also conjecture that the approach is generalizable to a large set of “well-behaved” metric spaces. (See the discussion on normed vector spaces and metric spaces in the Appendix.) This means that we can deal with non-Euclidean metrics such as, say, the time (or cost) to travel between any two points on the map. This is a metric that satisfies the basic metric properties (identity of indiscernibles, symmetry and triangular inequality), while being complicated by road conditions, natural and institutional barriers, etc. We can thus extend our analysis to an even more general framework, which could be tailored to specific applications.

6_{We use the term “capital point”, and not “center”, to emphasize that said point need not be located at any}

spatial concept of a center, such as a baricenter or the center of a circle.

7_{We will see that it is always possible to normalize the distribution so that it is re-scaled to a unit size,}

and we can focus without loss of generality on such normalized distributions. This is what we will do in the remainder of the paper. Certain axiomatic frameworks, such as the construction of inequality measures, may choose to incorporate a population invariance axiom imposing the equivalence of distributions that differ only in size. We decide to leave that choice to specific applications.

(7)

We will read p C q as “p is more concentrated around C than q”. Our entire approach can be understood as giving further content to this relation.

In order to think about the concept of spatial concentration, it will be helpful to define some transformations on the space DX, namely “squeeze” (homothety) and translation. Roughly

speaking, a squeeze brings each point in _Rn _{closer (when the scaling ratio is positive and less}

than one) to a center point by the same proportion. Atranslation moves each point in _Rn _away

by the same vector. These concepts are easily extendable to distributions, and we can define them more formally as follows:

Definition 2 (Squeeze, or homothetic transformation) A squeeze of origin O∈_Rn _and

ratioρ∈_R, denoted S(O,ρ), is a self-map onRn that brings any pointxcloser to O by a factor

of ρ:

S(O,ρ)(x) =ρ(x−O) +O.

Definition 3 (Translation) A translation of vector t, denoted Tt, is a self-map on Rn such that:

Tt(x) = x+t,

Definition 4 (Extension to sets and distributions) The squeeze and the translation are defined on the σ−algebra X such that for each set Z ∈ X:

S(O,ρ)(Z) = {S(O,ρ)(z) :z∈Z}; Tt(Z) ={Tt(z) :z∈Z}.

The squeeze and the translation are defined onDX such that for each distribution p∈DX and

each setZ ∈ X:

S(O,ρ)(p)(Z) = p(S(O,ρ)(p)(Z)); Tt(p)(Z) =p(Tt(Z)).

A couple of additional definitions will be helpful for our statements and proofs. First, let us define aunivariate distribution that is associated with anyp∈PX andC∈X. This univariate

distribution, intuitively speaking, characterizes the probability of a point being within a certain distance from C, under the distribution p. (Note that we thus define univariate distributions on_R+_{, so that they should not be confused with the distributions} _p_∈_P

X.) More formally, it

(8)

Definition 5 (Univariate distribution) Each p∈ PX is associated with a unique

univari-ate distribution FC

p on R+ characterized by the following cumulative distribution function:

F_pC(d) = p({x:|x−C| ≤d}).

Note that the definition is sensible because for everyp∈PX, the functionFpConR+is (weakly)

increasing and continuous on the right.

Our final definition is that of a uniform ball distribution, which is simply a uniform distri-bution defined over a ball.

Definition 6 (Uniform ball distribution) A uniform ball distribution p(T,κ) of center T

and radius κ is a uniform probability distribution on the ball B(T, κ).

3 The Expected Influence Approach

We want to give content to the concentration order _%C, so that it makes sense to compare distributions with regard to how concentrated they are around the capital C. For that, it is useful to start by laying out our approach in an informal way. Since we are interested in normalized distributions, represented by the subspacePX, it is convenient to think of them as

probability distributions. To fix ideas, let us think about the application within which we will exemplify the empirical implementation of the approach: the population of a country and its concentration around the capital city. We can thus conceptualize the distribution as describing the probability that a given individual (behind a “veil of ignorance”) will end up located at any given point in the country. Our basic intuition is to think about the concentration order %C as ranking the aggregate influence that is exerted by the capital C on all individuals, or symmetrically the influence exerted by all points on the capital, under each distribution. This probabilistic interpretation suggests that we can resort to expected utility theory in order to think of desirable properties that such concentration order ought to display. This is the guiding principle of ourExpected Influence approach.

The analogy with von Neumann-Morgenstern expected utility theory suggests our basic axiom, which (in consolidated form) closely parallels the standardindependence and continuity axioms:

(9)

Axiom 1 (Independence / Subgroup Consistency and Continuity) 1. Independence / Subgroup Consistency:

p_%C q⇒λp+ (1−λ)r %Cλq+ (1−λ)r ∀p, q, r∈PX , λ∈[0,1].

2. Continuity: ∀p ∈PX, {q ∈ PX :p %C q} and {q ∈ PX : q %C p} are both closed in the topology of weak convergence.

The intuition for this axiom is straightforward in light of the aforementioned probabilistic interpretation. In words, the independence component of the axiom states that if a distribution p is more concentrated around C than q, then a distribution that combines p and another distributionr will be more concentrated around Cthan a distribution that combines q and r. This is the familiar notion of independence of irrelevant alternatives: if an individual ends up located at a certain point, the points were she could have ended up, but did not, should not matter for the assessment of influence. By the same token, the continuity component means that small changes in the “probabilities” of being located in different points should not affect the concentration ranking.

The alternative label of subgroup consistency illustrates an alternative (and equivalent) in-terpretation for the independence axiom. Subgroup consistency can be understood again in reference to our application: suppose that we divide the population of the country into sub-groups, and that one of these subgroups becomes more concentrated around the capital, while the distribution of all other subgroups remains unaffected. Subgroup consistency requires that the total population of the country be judged to be more concentrated as a result of that change. That is to say, for instance, that if the Flemish population becomes more concentrated around Brussels, and the Walloon population stays put, the population of Belgium as a whole will have become more concentrated around its capital. This is a standard property often imposed by the literature concerned with the measurement of inequality (Foster and Sen 1997) and poverty (Foster and Shorrocks 1991), and seems to be a natural starting point for our concept of concentration. While it does impose restrictions in terms of possible interactions between the concentrations of different subgroups – it essentially imposes that they are independent – this agnostic position with respect to the direction of these possible interactions seems to be appropriate for our general approach.

(10)

Axiom 1 thus states the first properties that will give content to the concentration relation %C. We can now use it, with the arsenal of expected utility theory, to establish our first main result, the backbone of our Expected Influence approach:

Theorem 1 (Expected Influence) The concentration order_%Csatisfies Axiom 1 if and only if there exists a function hC:X →_R that is bounded, continuous, and such that:

p_%C q⇔IC(p)≥IC(q) ∀p, q ∈PX,

where IC is a real-valued function on PX such that:

IC(p) =

Z

X

hCdp ∀p∈PX.

The function hC is called the influence function associated with IC(·). Moreover, IC and its associated influence function are unique up to positive affine transformations.

Proof. From Axiom 1, the result follows from standard arguments from expected utility theory – see for instance Theorem 3.2.2 from Karni and Schmeidler (1991).

This Expected Influence Theorem means that the concentration order_%Ccan be represented by an index IC defined on PX (uniquely up to an affine transformation) and its associated

influence functionhCdefined onX. The analogy with expected utility is once again instructive, as the influence function plays a role analogous to that of the (Bernoulli) utility function: it is a cardinal measure of the influence exerted by the capital on any given point.

The theorem gives us a very natural way to frame the concept of concentration around a point of interest: a distribution will be more concentrated than another when the expected influence of the capital over an individual (or vice-versa) is greater in the former than in the latter. With that in mind, we will refer to this index IC as an Expected Influence (EI) representation of the concentration relation. (Of course, every monotonic transformation ofIC will still represent the same order _%C, but without the EI representation.)8

The EI representation displays a property that proves very convenient in applications, which we call decomposability and state as follows:

8_{It is important to keep in mind that Axiom 1 may rule out what could be interesting cases in specific}

applications. Just as with expected utility, it leaves open the possibility of “paradoxes” that violate independence – one could think, for instance, of a situation where the influence of an individual is disproportionately increased by the presence of other individuals in the same location. In any case, as with expected utility, Axiom 1 provides us with a general language to understand the concept, and a benchmark against which to interpret departures in specific cases.

(11)

Corollary 1.1 (Decomposability)

I(λp+ (1−λ)q,C) = λI(p,C) + (1−λ)I(q,C) ∀p, q ∈Px , λ∈[0,1].

Decomposability means that the EI index can be computed separately for any given set of subgroups, and we can add these indices to obtain the overall measure for the entire population. This means, for instance, that the concentration of the US population around Washington, DC can be decomposed into the concentration of the population of each state around that capital point, or into the concentration of groups defined along ethnic lines, income, or any other arbitrary criterion – we can compute the index separately for each group, and from those be able to obtain the overall index for the entire population. This property is closely related to subgroup consistency, as noted by Foster and Shorrocks (1991), but it builds in more structure, with the additive feature. In particular, while any monotonic transformation of the EI index will represent the same concentration order (and hence satisfy Axiom 1), only the EI representation will be decomposable.

4 The Centered Index of Spatial Concentration

Having obtained the EI representation, we can now specify additional basic properties that the order _%C should satisfy so that it captures the idea of concentration around the capital point

C. We will show that these basic properties impose substantial constraints on the EI indices that would be acceptable. These constraints will in turn define our class of Centered Indices of Spatial Concentration (CISC).

The first of these properties is monotonicity, which should be satisfied by any reasonable concept of concentration around a capital point. Suppose we compare two distributions,p and q, and for any d > 0 there is more population that is within a distance d from C under the former than under the latter. It should naturally be the case that p is judged to be more concentrated thanq. Note that this can be stated in terms of first-order stochastic dominance (FOSD): if the univariate distribution FC

q FOSDs the univariate distribution FpC, then the

concentration order should rankp ahead ofq. This enables us to capture this idea concisely as follows:

Axiom 2 (Monotonicity or First-Order Spatial Dominance)

(12)

The axiom states that there exists a neighborhood around the capital point within which FOSD implies a specific ordering of the associated distributions. Note that in this particular definition monotonicity is defined locally, that is to say in a neighborhood of the capital. This is a weaker notion than one where stochastic dominance is defined globally, and leaves open the possibility that some points would be more influential than others that are closer to the capital.

The second basic property, which we labelrank invariance, can be understood with reference to what happens when we change the units in which distances are measured. Suppose we have two distributions, p and q, and the distances between points are measured in miles. If p is deemed to be more concentrated around C than q, it stands to reason that this relative ranking should not change if distances were instead measured in kilometers. In other words, changing the unit of distance measure should not change the ordering of distributions by the concentration order_%C.

This change in units is isomorphic to a squeeze of the distribution around the capital point of interest by a factor ofρ, whereρ >0 gives us the conversion rate between the different units. (Obviously, a “squeeze” in whichρ >1 is actually an expansion around the capital point.) As a result, our property affirms that the relative order of different distributions remains unchanged when they are squeezed or expanded around the capital. We state this in the following axiom:

Axiom 3 (Rank Invariance)

p_%q⇔ S(C,ρ)(p)%S(C,ρ)(q) ∀p, q ∈PX, ρ >0.

This axiom embodies a property of neutrality with respect to labels: to build on the example of a Euclidean space, it requires that relabeling all the axes proportionally will leave the order unaffected. This means that we attribute independent meaning to distances, regardless of the unit in which they are measured.9

As it turns out, these two very natural axioms impose remarkably powerful restrictions on the set of influence functions that can be used to represent the concentration order within the EI framework. This can be stated in the following theorem, which characterizes the admissible CISCs:

9_{This feature is obviously natural in a geographical context, but we surmise that it is just as fundamental}

in abstract contexts where the specific scale is arbitrary. Consider an example in which preferences with

respect to policy are measured in an arbitrary 1-5 scale. If the researcher is not ready to impose the rank invariance property were she to rescale them to a 1-10 scale, it would probably not make much sense to speak of concentration.

(13)

Theorem 2 (CISC) A concentration order _%C satisfies Axioms 1-3 if and only if it is repre-sented by an EI index IC with the influence function hC such that:

hC(x) = α|x−C|γ+β

def

≡ h(|x−C|), where α <0 and γ >0.

Moreover, ifγ <0, or ifh(|x−C|) = αlog(|x−C|) +β, and α <0, then the corresponding %C satisfies Axioms 2 and 3, and Axiom 1.1 (Independence); but not Axiom 1.2 (Continuity).

Proof. See Appendix.

Let us start by focusing on the first part of the theorem. In essence, Axioms 2 and 3 imply that the influence function that was defined in Theorem 1 must actually be a monotonically decreasing, isoelastic function of the distance to the capital point C.10 (We henceforth denote distances by z, for conciseness.) By isoelastic, we mean that, if we define Rh(z)

def

≡ −h_h000(₍z_z)₎z

as the elasticity of the marginal influence function with respect to distance – or alternatively, the “coefficient of relative risk aversion” of the influence function – then Theorem 2 establishes that our class of admissible CISCs must haveRh(z) = 1−γ ≡Rh, a constant.11

The isoelastic property is directly related to Axiom 3, i.e. rank invariance. The monotonicity property is in turn tightly linked to Axiom 2. We should note, however, that the latter axiom is local, in the sense that it refers to a neighborhood of the capital point, but the property of monotonicity is global. This is due to the combination of the two axioms, since the isoelastic shape means that local monotonicity implies global monotonicity.

The problem of unboundedness The first part of Theorem 2 imposes a constraint on the elasticity parameter: γ >0 (Rh < 1). This merits additional scrutiny. Theorem 1 establishes

that the EI representation implies bounded influence functions, whereas functions withRh ≥1

are unbounded at z = 0. It is well-known from expected utility theory that unboundedness implies that expected utility (and hence expected influence, in our context) could be infinite for certain probability distributions, which necessarily violates Axiom 1.2 (Continuity).12 This is exactly what the second part of Theorem 2 states.

10_{We denote this function by}_h_{, as distinct from}_h

C, in order to emphasize the distinction between the two

objects. However, with a slight abuse of terminology, we still refer to it as the influence function.

11_{Technically, Theorem 2 also affirms the infinite differentiability of the impact function}_h_{, so the expression}

ofRh(z) is meaningful.

(14)

As we will see in the next section, the elasticity parameter has very meaningful interpreta-tions, and we do not want to rule out a range of possible values for that parameter on technical grounds. Fortunately, expected utility theory provides several answers to the unboundedness problem (see for instance Ledyard 1971, Fishburn 1975, 1976) that allow us to keep (at least some of) the influence functions with Rh ≥ 1 within the set of admissible solutions. These

solutions typically involve relaxing Axiom 1’s Continuity, and imposing certain restrictions on the class of distributions PX that are admissible. These adaptations, which we discuss in

the Appendix, are very mild and, we surmise, innocuous in the vast majority of conceivable applications of the index.

In practice, any discrete dataset – which is likely to encompass the vast majority of instances of empirical implementation of the index – will allow the full set of elasticity parameter values. The one implementation requirement is that no observations be exactly at the capital point

C, although we could have observations that are arbitrarily close to it. (We will get back to this issue when presenting our empirical implementation.) Even in continuous cases, a simple weakening of Axiom 1’s Continuity allows for a wide range if functions with Rh ≥ 1 to be

admissible.

5 Discussion

Our three basic axioms define a circumscribed class of admissible CISCs, but before moving on to the implementation stage a few issues must be tackled. First, a crucial feature of the CISC defined in Theorem 2 is its flexibility: the degrees of freedom afforded by the parameters γ, α and β mean that in any application it will be possible to shape the index in order to make it most suitable to the specific goals of the analysis. We must thus discuss how to pick those parameter values, and for that we first need an interpretation for the elasticity parameter γ. We also want to have a framework within which to think of the free parametersαandβ, which we will do in the context of the normalization of the index. In addition, we will also provide a comparison of the CISC with alternative measures of concentration and with other related concepts.

(15)

5.1 Interpreting the Elasticity Parameter

5.1.1 Elasticity and Marginal Influence

The first thing to note about the elasticity parameter γ is that it has to do with how the marginal influence function is affected by the distance to the capital. To make things more concrete, suppose we consider bringing an individual closer to the capital along the ray going through her initial location, and ask what is the impact of this movement on expected influence. The value of γ determines how this impact is affected when we vary the initial point of that movement: if we move an individual 10 miles closer to the capital, how does it matter whether she started 20 or 200 miles away?

As it turns out, our framework enables us to understand this choice in terms of the marginal influence at the capital C relative to other points in X. Two very mild conditions on that relative marginal influence will allow us to partition the parameter space into one region in which the answer to the question above would be that the impact is greater when the movement starts closer to the capital, and another region in which the answer would be the opposite.

Condition 1 (Maximal Marginal Influence) ∃x6=C, η >0, and t=η(C−x)such that: 1 2pC+ 1 2px %C 1 2pTt(C)+ 1 2pT−t(x)

Condition 2 (Minimal Marginal Influence) ∃x6=C, η >0, and t=η(C−x) such that: 1 2pC+ 1 2px -C 1 2pTt(C)+ 1 2pT−t(x)

The first condition states, in a nutshell, that there exists a point at which the marginal influence is (weakly) smaller than at the capital. In ordinal terms, we consider a distribution that consists of a mass point atCand another mass point at some other pointx. If we slightly shift the former mass point away from C, while shifting the latter towards C by the same distance and in the same direction, the condition implies that the resulting distribution will be less concentrated around C than the original one. This is what underlies the idea that the marginal influence at x is smaller than at C. The second condition states that there exists a point at which that marginal influence is (weakly) greater than at the capital, i.e. the same shift will result in a more concentrated distribution. These two conditions, plus Axioms 1-3, yield the following:

(16)

Proposition 1 (Convexity / Concavity) Suppose Axioms 1-3 hold, then: 1. Condition 1 holds if and only if Rh ≥0 (or equivalently, γ ≤1).

2. Condition 2 holds if and only if Rh ≤0 (or equivalently, γ ≥1).

Condition 1 thus implies that the influence function will be a convex function of the distance to the capital, whereas Condition 2 implies the concavity of that function.13 _{The former case}

means, intuitively, that movements that occur close to the capital point will have a (weakly) greater weight than similar movements occurring farther away. The latter case has the opposite implication, and it is obvious that the two can only happen simultaneously in the case where Rh = 0 (γ = 1). We will refer to this limit case, in which all points display the same marginal

influence, as the Linear CISC (L-CISC). We state it here for future reference:

Definition 7 (Linear CISC) The Linear Centered Index of Spatial Concentration (L-CISC) is the special case of the CISC defined in Theorem 2 in which γ = 1. The influence function is defined as:

h(z) = αz+β, with α <0.

A brief inspection shows that Conditions 1 and 2 are mild enough that, in principle, they could clearly hold simultaneously even beyond the boundary case. It is only when combined with our basic axioms (particularly Axiom 3) that this ceases to be possible. In other words, with the exception of the boundary case that yields the L-CISC, having both conditions hold requires giving up rank invariance, and accepting the possibility that changes in scale will affect the ranking of distributions.

In sum, Proposition 1 means thatRhorγ are sufficient statistics when it comes to describing

the relative weight that different points will have in affecting the ranking of distributions, depending on their distance to the capital. Different contexts might call for different values,

13_{The convex case generated by Condition 1 includes influence functions that can run into the unboundedness}

issue highlighted in the previous section. More specifically, this is the case forRh ≥1 (γ ≤0). As pointed

out in the previous discussion, using these indices will require imposing some restrictions on the admissible distributions to avoid unboundedness. Also as previously discussed, the discrete-support distributions that are likely to arise in most instances of empirical implementation are always admissible, so that any elasticity parameter can be used.

(17)

but we know that higherRh implies attaching a greater marginal influence to closer points, and

that negative values forRh mean that more distant points have greater marginal influence.

5.1.2 Elasticity and Mean-Preserving Spreads

We can gain further intuition about the elasticity parameter by extending the analogy with expected utility theory to include the concept of second-order stochastic dominance (SOSD), just as we have done with FOSD. We know from the study of choice under uncertainty that there is a close connection between the curvature of utility functions and the ranking of mean-preserving spreads (MPS) of distributions, as captured by the concept of SOSD. We can now show that extending this concept to our setting yields another interpretation for the elasticity. The first challenge in to extend the concepts of MPS and SOSD for general, n-dimensional Euclidean spaces, which is crucial since our applications will often involve more than one di-mension.14 _{We must thus extend the concepts to that multidimensional context, which is not}

obvious because it is not immediate to conceptualize a MPS in many dimensions.

A natural way would be to use the concept of univariate distributions associated with the distributionsp∈PX, and directly define SOSD over them. It will turn out to be more fruitful

to start instead from what we may call a “uniform MPS”, which we define in terms of a squeeze of the uniform ball distribution that we have defined. Informally speaking, consider individual observations that are uniformly distributed over a ball around a given point T, such that C

is not in that ball. Now suppose we squeeze this distribution around that point, such that the resulting distribution is also uniform over the resulting ball.15 _{(Of course, once again this}

could be a squeeze or an expansion (“spread”) depending on the coefficient associated with the transformation.)

The interesting question is how the concentration order _%C will rank the distribution that results from this uniform MPS. Let us consider the case where we have ρ <1 (i.e. a squeeze): should the resulting distribution be ranked as more or less concentrated around C than the initial distribution? The answer to this question must be context-specific – still pushing the analogy with choice under uncertainty, we could be in a case of “risk averse” or “risk loving” behavior. But we can show that it is intimately related to the elasticity parameter, in a way

14_{This is in fact the only part of our approach that is restricted to Euclidean spaces.}

15_{Other types of MPS could be envisioned, in ways that would not preserve uniformity. We will see shortly}

(18)

that enables us to uncover some additional interesting properties.

As it turns out, the answer relies on two conditions that have a similar flavor to Conditions 1 and 2. Their implications can be studied with reference to harmonic function theory, which is why we label then sub- and super-harmonicity. They are as follows:16

Condition 3 (Sub-Harmonicity)

S(T,ρ)(p(T,κ))-Cp(T,κ) ∀ρ <1,B(T, κ)63C. Condition 4 (Super-Harmonicity)

S(T,ρ)(p(T,κ))%Cp(T,κ) ∀ρ <1,B(T, κ)63C.

Condition 3 states that the distribution that results from a uniform squeeze around T

is (weakly) less concentrated around C than the original distribution; Condition 4 states the opposite. Note that Condition 4 is akin to SOSD: any (uniform) MPS will lead to a distribution that is ranked below (“dominated by”) the original distribution. Condition 3 inverts that. This in turn suggests that these conditions will be related to the curvature of the influence function, just as SOSD is linked to the concavity of utility functions. This is indeed the case, but the multidimensional uniform extension leads us into a result that is subtly but importantly different:17

Proposition 2 (SOSD) Suppose Axioms 1-3 hold, then:

1. Condition 3 holds if and only if Rh ≥n−1 (or equivalently, γ ≤2−n).

2. Condition 4 holds if and only if Rh ≤n−1 (or equivalently, γ ≥2−n).

This proposition thus provides us with another interpretation for the elasticity parameter, having to do with MPS, in addition to the interpretation focusing on marginal influence.18 _As

16_{Note that the definitions explicitly restrict attention to uniform MPS defined for a ball around}_T_{that does}

not containC. We will return to this point shortly.

17_{Another way to express the Proposition is to say that Condition 3 implies that the influence function is}

sub-harmonic, whereas Condition 4 implies that the influence function is super-harmonic. See Appendix.

18_{Note in particular that there are functions that are convex, but super-harmonic; in other words, they}

repre-sent concentration orders under which a uniform MPS results in a function that is ranked above (“dominates”) the original function.

(19)

before, the specific choice for the parameter can be thought of as context-specific.19 _{but it}

turns out that the “harmonic” case of Rh = n−1 displays interesting properties. Intuitively

speaking, it is a case in which any local force of concentration at an arbitrary pointTdoes not affect the ranking of distributions in terms of their concentration aroundC. In that sense, the index measures the “gravitational pull” exerted by the capital point, while disentangling from the data any impact of the presence of other local forces. It is invariant to changes in the degree of attraction, or “gravity”, of any other points. An analogy comes from the gravitational pull of the Sun over the Earth, which is measured focusing only on the distance between the two bodies and their characteristics (mass), leaving aside the influence of all other planets. For this reason, we will refer to this special case as the Gravity-based CISC (G-CISC).20

Besides the helpful interpretation, the G-CISC has a convenient property in terms of the implementation of the index. In practice, the information used to compute the index typically comes in a grid format, where we only know the aggregate information of each cell. This intro-duces a source of measurement error. The G-CISC is orthogonal to symmetric measurement error, because it satisfies both Condition 3 and 4 – in other words, the influence function is harmonic.

Definition 8 (Gravity-based CISC) The Gravity-based Centered Index of Spatial Concen-tration (G-CISC) is the special case of the CISC defined in Theorem 2 in whichγ = 2−n. The influence function is defined as: h(z) =

  αz+β ∀z >0, α <0 if n= 1 αlogz+β ∀z >0, α <0 if n= 2 αz2−n+β ∀z >0, α >0 if n≥3 .

19_{One caveat may be in order, however. If we look at uniform MPS defined for a ball around}_T _{that does}

containC– which were not considered by Conditions 3 and 4 – it turns out that it is not possible to pin down

what will happen to the index when the influence function is sub-harmonic. This is because the monotonicity

with respect to pointC (implied by Axioms 2 and 3) pushes in the opposite direction of the sub-harmonicity.

This indeterminacy could be thought of as an argument for focusing on the super-harmonic case (Rh≤1).

20_{The analogy with gravity is actually deeper than what might seem at first sight: there is a connection}

between the G-CISC and the concept ofpotential in physics, which refers to the potential energy stored within

a physical system – e.g. gravitational potential is the stored energy that results in forces that could move objects in space. In fact, our G-CISC can be interpreted, roughly speaking, as a measure of the potential associated with the capital point of interest. This is because the G-CISC is the index that satisfies the Laplace equation, and potential is defined by a solution to that equation – those readers familiar with the physics of the concept will have noticed the connection from the use of harmonic function theory. To see this connection more clearly, note that in the “real life” three-dimensional space of Newtonian mechanics, the potential of a point with respect to a mass is the sum of inverse distances from that point to each location in the mass – and since the gravity force is the derivative of potential, it is proportional to the inverse of the squared distance, as one may recall from the classical Newtonian representation. Our G-CISC in three-dimensional space is also the weighted sum of the inverse of distance, and thus coincides with the concept of potential in physics.

(20)

5.2 Normalization

It is useful to normalize the index so that it can be easier to interpret, and this normalization should be suitably chosen in order to address the questions that are relevant in the specific context. The EI approach affords us two sets of levers for this normalization. First, there is the focus on normalized distributions, i.e. probability measures. Second, we have the degrees of freedom afforded by the CISC as defined in Theorem 2, in addition to the elasticity parameter, which are the choice of parametersα and β.

In order to see both angles more clearly, and motivated by our empirical implementation, let us fix ideas by focusing on a situation where our index is applied to the concentration of the population of a given geographic unit of analysis (e.g. a country) around a point of interest (e.g. the capital city), in two dimensions. Furthermore, we start by focusing our interest on the special case of the G-CISC – which means that the influence function is given byh(z) =αlog(z)+β. (It is straightforward to extend the following analysis to other contexts.) In this context, the first thing to note is that there is often other information that we may want to disentangle from population concentration per se, e.g. population size and the geographical size of the country. This is exactly why our theory focuses on normalized distri-butions, i.e. probability measures. In practice, any implementation requires that the actual distribution be normalized so that it can be expressed as a probability distribution, containing only the information we are interested in. This will be context-specific, since each context will determine what kind of information we want to leave out.

Second, we would like to have an easily interpretable scale, and this is what can guide the choice ofαand β. A convenient rule would be to restrict the index to the [0,1] interval, with 0 and 1 representing situations of minimum and maximum concentration, respectively. The latter can be defined simply as a situation in which the entire population is located arbitrarily close to the capital, but the former will typically vary greatly with context. One could think of it as a case in which the entire population is located as far from the center as possible, but “as far as possible” could have different meanings, as we will discuss below. Alternatively, one could think of minimum concentration as a situation where the population is uniformly distributed over the entire country, along the lines of Ellison and Glaeser’s (1997) “dartboard” approach.21

This choice of parameters will inevitably be context-specific too.

(21)

As a general principle, we can proceed with normalization in two steps: (1) normalize the distribution under analysis, transforming it into a distribution p∈PX that contains only the

information we are interested in; and (2) set the parameters α and β to fit a [0,1] scale. The specifics of each of these steps will depend on the context, so in what follows we discuss a few benchmark examples that will be relevant for our application, and that can illustrate the general procedure.

Maximum distance across units of analysis A first approach is to set the minimum concentration based on the maximum possible distance between a point and the capital city in any of the countries for which the index is to be computed. In this case, the index is evaluated at zero if the entire population lives as far away from the center as it is possible to live in any country.22 (As a result, only one country, the one where this maximum distance is registered, could conceivably display an index equal to zero.) This will be appropriate if we want to compare each country’s concentration against a single benchmark.

In order to achieve this, the two steps are:

1. Normalize the distribution by dividing it by total population size, which means that we will be taking each country to have a population of size one.

2. Set: (α, β) = − 1 log(z),1

where z ≡ maxxi,i|xi−Ci| is the maximum distance between a point and the center in

any country. This means that we take the (logarithm of the) largest distance between a point and the capital in any country to equal to one.

Maximum distance within unit of analysis Another possibility is to evaluate the index at zero for a given country if its entire population lives as far away from the capital as it is possible to be in that particular country. This is appropriate if we want to compare each country’s actual concentration to what its own conceivably lowest level would be.

With that in mind, the two steps are now:

22_{For instance, in the cross-country implementation later in the paper, the largest recorded distance from the}

(22)

1. Besides the standard normalization by population size, normalize by log(zi), where zi ≡

maxx|x−Ci| is the maximum distance between a point in country i and that country’s

capital. This means that we not only take each country’s population to be of size one, but also that we take the (logarithm of the) largest distance between a point and the capital of that country to be one as well.

2. Set:

(α, β) = (−1,1).

Our empirical implementation will illustrate both of these cases of normalization and their different interpretations.

5.3 Comparison with Other Indices

5.3.1 Comparison with other Centered Indices

The first obvious comparison is to other centered indices of spatial concentration used in the literature. The EI approach in fact gives us a coherent framework to think of these alternative measures, and to understand their properties and limitations.

Let us start with a class of widely used measures of spatial concentration that essentially attach zero weight to observations located at more than a certain distance from the capital point. A typical application of such a measure is the share of population of a country that lives in the capital city (often refereed to as “capital primacy”), or in the main city in that country, as in Ades and Glaeser (1995), Henderson (2003), among many others. It is easy to check that this class of measures violates monotonicity (Axiom 2), and also that it violates continuity (Axiom 1.2). Intuitively speaking, this is so because this class of measures essentially discard a lot of information. Our approach, in stark contrast, incorporates all of that information, with enough flexibility to allow for different weights according to the application – as parameterized by the elasticity of the marginal impact function with respect to distance (as in Proposition 1): the greater that elasticity, the less weight is attached to observations that are farther away from the capital point.

We can also use the EI approach to make sense of other widely used measures that do not discard that type of information. One example is the negative exponential function that is widely used in the empirical urban economics literature to describe the density of population or

(23)

economic activity (e.g. Clark 1951, Mills 1970). (This has also been used in political economy, as in Busch and Reinhardt (1999).)It has been recognized that this standard density function is very restrictive, and alternative implementations have been proposed (McMillen 2004), but (to the best of our knowledge) not from a theoretically grounded approach. With our EI framework, it can be easily verified that this measure violates rank invariance (Axiom 3), and can thus lead to different rankings as a result of changes in units.

Another interesting example, as illustrated by Galster et al. (2001), uses the inverse of the sum of the distances of each observation to the center as a measure of “centrality” in the context of studying urban sprawl. Prima facie, it is apparent that this measure does not fit the EI representation, because of the inverse operation. A closer look reveals, however, that this measure is a monotonic transformation of the L-CISC (Rh = 0). As such, Theorem 1 implies

that it must represent the same concentration order that is represented by the L-CISC. This means that the Galster et al. (2001) index does satisfy – or more accurately, it represents a concentration order that satisfies – all of our axioms.

These desirable properties were certainly not obvious from thead hoc approach used in the construction of the index. Our axiomatic approach enables us to identify them, and guarantee that they are satisfied in very general spaces, without being limited by an intuitive grasp of the features of a specific, concrete application. Our approach also enables us to understand in greater depth the properties of such an ad hoc index: in the case of Galster et al. (2001), following Propositions 1 and 2, we can state that their index attaches the same marginal influence regardless of distance, and that their measured concentration would decrease as a result of a uniform mean-preserving spread.

In sum, the EI approach provides a unified “language” to codify the concept of spatial concentration around a capital point. This language has very wide applicability, and it provides us not only with a family of CISCs but also with a way of making sense of pre-existing ad hoc approaches.

5.3.2 Comparison with Non-Centered Indices of Concentration

Besides the obvious distinction that our index is built on the concept of a particular “center”, it also contains considerably more spatial information than non-centered indices. For instance, let us consider the many indices of concentration that are based on measures first designed

(24)

to deal with inequality – such as the location Gini coefficient calculated on a distribution of cells.23 _{Such measures do not take into account the actual spatial distribution.}

Indeed, consider a thought experiment where half of the cells contain exactly one individual observation, and the other half contain zero observations. Such indices do not make any dis-tinction between a situation in which the former cells are all in the East and all the latter cells are in the West, and another situation in which both types are completely mingled together in a chess-board pattern. Generally speaking, this type of measure fails to take into account the relative positions between the cells, which can be highly problematic in many circumstances.24

The same can be said of cruder measures of concentration, such as population density.25 _In

addition, these measures are also highly non-linear with respect to individuals, because they contain a function of the cell distribution. In that sense, they are not grounded at the individual level in the way that our CISC is.

These differences are highlighted by the fact that we can actually derive a non-centered index from our centered measure. We can do so by averaging the CISC over all possible capital points (i.e. all points) within the support of the distribution, a feasible task in most applications. On_R (say, applied to individual income distributions), this aggregative, non-centered index coincides with the much familiar Gini index of inequality. For now, we leave further explorations on this subject for future work.26 _{In any case it is not possible to follow the reverse path and obtain}

a centered index from a given non-centered measure. This underscores the versatility of our approach.

5.3.3 Comparison with Related Indices: Inequality, Polarization, Riskiness

Finally, it is worth noting the links between our index and other indices designed to measure other aspects of distributions, be they spatial or not. The connection with inequality measures

23_{For instance, the}_{location Gini} _{is used,} _{inter alia}_{, in the context of economic geography (Krugman 1991,}

Jensen and Kletzer 2005), studies of migration (Rogers and Sweeney 1998), political economy (Collier and Hoeffler 2004).

24_{In practice, for instance, if we measure the concentration of the US auto industry around Detroit, it matters}

whether car plants are in nearby Ohio or in distant Georgia. However, a non-centered index of concentration computed at the state level would stay unchanged if all the plants in Ohio were moved to Georgia and vice-versa.

25_{A related index that does use spatial information is the measure of compactness developed by Fryer and}

Holden (2007).

26_{We should note that to fully develop a non-centered version of our approach, we would have to evaluate the}

axioms from that non-centered perspective. For instance, we conjecture that such a framework should probably add super-harmonicity (Condition 4) as an axiom, since we would not want to have that uniform MPS would lead to more concentration.

(25)

has already emerged from the very fact that such measures are used to capture spatial concen-tration, and we have noted that a natural extension of our approach to a non-centered setting highlights an interesting connection with the Gini coefficient.

Our approach is also interestingly related to the work on polarization by Duclos, Esteban and Ray (2004). Intuitively speaking, one could think of a notion of “polarization” as broadly working in the opposite direction of “concentration”. Keeping that in mind, our concentration order would satisfy their axioms 1 and 3 (both related to our Monotonicity axiom) and also their axiom 4 (related to our Rank Invariance axiom), as an “anti-polarization” order. Their axiom 2, however, would work in the opposite direction of our Monotonicity axiom, which is of course due to the fact that polarization and concentration around a point do capture different concepts. Nevertheless, the analogy with the other three axioms suggests that our approach could help build a generalization of their measure to multidimensional contexts.

The EI approach also entails interesting connections with the measurement of riskiness – as should be evident from the connection with expected utility, and as was highlighted in our discussion of the elasticity parameter. A desirable property of a measure of riskiness, as spelled out by Aumann and Serrano (2008) is that it respects first- and second-order stochastic dominance. Axiom 2 (Monotonicity) is clearly linked to FOSD – particularly if we were to state it in a global, rather than local version. Just as previously highlighted, our Conditions 3 and 4 (and hence Proposition 2) are closely related to the notion of SOSD, and could be thought of as entailing a generalization of that concept to multidimensional contexts.

6 Application: Population Concentration around

Capi-tal Cities

Having established the EI approach, derived the CISC, and discussed its properties, we now move on to illustrate its applicability in practice. We focus on the distribution of population around capital points of interest – capital cities across countries and across US states, and the political center (e.g. the location of city halls) in US metropolitan areas. We will discuss descriptive statistics and basic correlations with variables of interest, and also how the index can be used to shed light on competing theories, using as an example the determinants of the location of capital cities.

(26)

6.1 Cross-Country Implementation

In our first application, we calculate population concentration around capital cities across coun-tries in the world. We use the database Gridded Population of the World (GPW), Version 3 from the Socio-Economic Data Center (SEDC) at Columbia University. This dataset, published in 2005, contains the information for the years 1990, 1995 and 2000, and is arguably the most detailed world population map available. Over the course of more than 10 years, these data are gathered from national censuses and transformed into a global grid of 2.5 arc-minute side cells (approximately 5km), with data on population for each of the cells in this grid. 27 Details on these and all other variables used in the analysis can be found in the Data Appendix.

The first step in the analysis is to decide which version of the index to focus on. As we have shown, this entails a choice of the elasticity parameter, and of the appropriate normalization. We would expect that it will oftentimes be convenient to have a specific model that might lead to a specific choice. Since our goal in this paper is to broadly illustrate the application of the index, we choose to retain generality and implement (and compare) a broader array of indices. In any case, the choice of this broader array can exemplify the general principles discussed in the previous section in their practical implementation.

With respect to the elasticity parameter, let us first consider the interpretation focused on the marginal influence as a function of distance. We are mostly interested in the capital city as a national political center. In this case, it seems natural to assume that the marginal influence would be highest in the capital. To take a concrete example, if we want to measure the concentration of Russia’s population around Moscow, it stands to reason that a given movement of individuals towards the capital would not matter less if it were to happen in the outskirts of Moscow than if it were to happen in Vladivostok, in the far east corner of the country. This assumption, as we have seen, places us in the range of convex influence functions – namely with Rh ≥0 (γ ≤1).28 Within this range, one focal point is the limit case of the L-CISC (Rh = 0),

which would attach equal marginal weights regardless of distance. While somewhat extreme,

27_{We limit our analysis to countries with more than one million inhabitants, since most of the examples with}

extremely high levels of concentration come from small countries and islands. The results with the full sample are very much similar, however, and are available upon request.

28_{As an example of a model that could pin down the elasticity parameter in this context, one could look}

at the simple model of revolutions sketched in Campante and Do (2007). In that model, political influence is modeled as a cost of joining a rebelion, which depends on the distance to the political center. The standard assumption of convex costs would naturally lead to convexity, and specific functional forms would translate into different elasticity parameters.

(27)

it also has the advantage of providing a natural benchmark for comparison, since it represents the same order as the alternative measure used for instance by Galster et al. (2001).

The second interpretation can help us expand our array, still within the set of convex influence functions, to include an example in which the marginal influence at the capital is strictly greater than anywhere else. As far as how we should think about uniform MPS in terms of political influence relative to the capital, we choose to stick with the “agnostic” position regarding the direction of the impact. This leads us to the limit case of the G-CISC – which in our two-dimensional application corresponds to Rh = 1, the logarithmic influence function.29

Following the argument in the previous section, this is also a natural focal point in light of the gravity interpretation and the robustness to (symmetric) measurement error.

When it comes to normalization, we use the two cases highlighted in section 5.2. The first version (GCISC1) is normalized by the maximum distance across countries, and the second

version (GCISC2) is normalized by the maximum distance within the country. The former

case captures concentration relative to what it could possibly be in any country, while the latter captures concentration relative to what it could possibly be in that specific country.30

In the interest of brevity, for the L-CISC we will only compute the version normalized by the maximum distance across countries, LCISC1.

We will compare our indices with two alternative measures of concentration. The first alternative is the location Gini coefficient (“Gini P op”), a non-centered measure that is often used in the literature, and the second one is the share of the population living in the capital city (“Capital P rimacy”), which provides a benchmark for comparison with another very simple centered index.

6.1.1 Descriptive Statistics

Table 1 shows the basic descriptive statistics for the different measures, for the three years in the sample, and Table 2 presents their correlation. The first remarkable fact is that there is

29_{Recall that the log function is in the range of parameters that might face the unboundedness problem.}

Simply put, in practice the influence function is not defined for the distance of zero. In accordance with the

solutions that we have discussed, what we need to do is to choose the arbitrary sizeof the neighborhood around

the capital in which we will truncate the distribution. We pick the value of 1 (kilometer). In other words, we assign the value 1 (kilometers) to the distance from each grid center to the capital whenever it is measured at less than 1, truncating the log distance function at 0: h(z) =αlog max{z,1}+β=αmax{logz,0}+β. There is very little change in the index when we replace 1 by 10.

30_{While the non-normalized measure may present some interest in itself, we do not report it because of its}

(28)

very little variation over that span of time: the autocorrelation is extremely high, and almost all variation comes from the cross-country dimension. This suggests that the pattern of population distribution is fairly constant within each country, and that a period of 10 years may be too short to see important changes in that pattern. For this reason, we choose to focus on one of the years; we choose 1990 because it is the one that has the highest quality of data, as judged by the SEDC.31

[TABLES 1 AND 2 HERE]

Let us start by comparing the basic properties of our indices with those of the comparison measures, noting that the appropriate benchmark for comparison in the case of G-CISC is GCISC1, and not GCISC2, since both location Gini and Capital P rimacy do not normalize

by the geographical size of each country. The striking fact that immediately jumps from Table 2 is that our index captures a very different concept from what the location Gini is capturing: they are negatively correlated, both in the case of G-CISC and L-CISC. This underscores the point that typical measures of concentration are ill-suited for getting at the idea of concentration around a given point.

This point becomes even more striking when we compare the list of countries with very high and very low levels of concentration, which are displayed in Table 3. We can see that the list of the countries whose population is least concentrated around their capital cities accords very well with what was to be expected: these are by-and-large countries where the capital city is not the largest city. (The exceptions are Russia, on which we will elaborate later, and the Democratic Republic of the Congo, formerly Zaire, whose capital is located on the far west corner of the country.) By the same token, the list of highly concentrated countries is quite intuitive as well, with Singapore leading the way. The same list for the location Gini, in contrast, surely helps us understand why the correlation between the two is negative. It ranks very highly countries that have big territories and unevenly distributed populations. While this concept of concentration may of course be useful for many applications, it is quite apparent that using non-centered measures of concentration can be very misleading if the application calls for a centered notion of concentration.32

31_{Our results are very similar when we use the other two years.}

32_{It is also worth noting that a measure such as location Gini is quite sensitive to how “coarse” the grid that}

is being used to compute the index is: the fewer cells there are, the lower the location Gini will tend to be. Our index, on the other hand, has the “unbiasedness” feature that we have already discussed.