
Persuading a Rationally Inattentive Agent*

[Click here for most recent version]

Alexander W. Bloedel†    Ilya Segal‡

October 28, 2020

Abstract

How should information be disclosed to an inattentive audience? We develop a model in which a Sender transmits signals about an uncertain state to a rationally inattentive Receiver, who privately bears a mutual information cost (Sims (2003)) to process these signals before taking an action. Information disclosure serves dual purposes: to persuade Receiver when preferences over actions are misaligned, and to manipulate Receiver's attention, which is subject to moral hazard. The latter friction causes the standard Obedience Principle to fail: Sender cannot simply provide an action recommendation because Receiver would sometimes ignore it. We characterize the optimal form of disclosure in a canonical binary-action setting using a first-order approach. Under aligned preferences, full disclosure is generally suboptimal: Sender instead attracts Receiver's attention by pooling intermediate- and high-stakes states (in which taking the correct action is most valuable) while fully revealing low-stakes states (in which it is least valuable). Under misaligned preferences, Sender may also distract Receiver by providing detailed information about states in which their preferred actions differ. We show that higher attention costs tend to induce more informative disclosure and that, when Sender lacks commitment power, her incentive to "exaggerate" hinders communication. We also explore broader modeling issues, such as the importance of Sender's "language" when Receiver's cost function differs from mutual information, and the difference between Receiver learning about Sender's signal ("communication") vs. directly about the state ("delegated information acquisition").

JEL Codes: D82, D83, D91

Keywords: Bayesian persuasion, rational inattention, attention manipulation, costly communication, monotone partition, mutual information

*This paper subsumes an earlier version circulated as "Persuasion with Rational Inattention" (Bloedel and Segal (2018)). We are grateful to Doug Bernheim, Luca Braghieri, Andrew Caplin, Giorgio Martini, Filip Matějka, Paul Milgrom, Doron Ravid, Vasiliki Skreta, and Mike Woodford for helpful feedback, and to Konrad Mierendorff and Nikhil Vellodi for serving as discussants. We thank audiences at Stanford, the 2018 Young Economist Symposium (NYU), 2019 Sloan-Nomis Workshop on the Cognitive Foundations of Economic Behavior (NYU), 2019 Econometric Society NASM (UW), 2019 Frontiers in Design Conference (UCL), 2019 European Summer Symposium in Economic Theory, and 2020 Canadian Economic Theory Conference (UBC) for useful comments.

†Department of Economics, Stanford University. Email: abloedel@stanford.edu. ‡Department of Economics, Stanford University. Email: ilya.segal@stanford.edu.

1 Introduction

Communication is fundamental to all human interactions, whether in markets, corporate hierarchies, politics, or social life. For communication to be effective, it is not enough that the underlying information be reliable and that the sender be credible — the receiver must also be able and willing to absorb, process, and utilize the information he is given. In modern society, where "a wealth of information creates a poverty of attention," this is a substantial, even dominant, bottleneck to effective communication: "In an information-rich world, most of the cost of information is the cost incurred by the recipient. It is not enough to know how much it costs to produce and transmit information; we must also know how much it costs, in terms of scarce attention, to receive it" (Simon, 1971, p. 41).

This paper studies the optimal design of communication when the “transaction cost” of scarce attention is a primary concern. How should a sender speak, knowing that her audience has limited capacity to process what she says? How does it depend on the alignment of interests between sender and receiver? On the sender’s credibility? On the richness of the information to be conveyed?

This design problem is ubiquitous in practice. Online retailers and advertisers face the increasingly difficult task of attracting the attention of information-saturated consumers, a necessary first step before convincing them to buy a particular product. In organizations, the need to capture busy executives' attention shapes the advice proffered by their advisors and the information allowed into meetings and briefings by their chiefs of staff.1

We develop a model in which a Sender (she), who has exclusive access to information concerning a payoff-relevant state, communicates with a rationally inattentive Receiver (he), who privately bears a cost of processing Sender's signals before taking an action. As in Bayesian persuasion (Kamenica and Gentzkow (2011); Rayo and Segal (2010)), Sender commits to a disclosure rule that transmits signals contingent on the state. While Receiver observes the disclosure rule, he does not directly "see" the transmitted signals; instead, he observes noisy subjective perceptions of them encoded into his brain using a flexible attention strategy. As in the rational inattention (RI) literature, Receiver's attention strategy has a cost that is proportional to the mutual information between his perceptions and Sender's signals (Sims (1998); Sims (2003)), capturing the idea that Receiver's attention is flexible and well-adapted to the information-rich world in which he operates.2 Importantly, because Sender does not internalize Receiver's attention cost, Receiver's allocation of attention is subject to moral hazard. This gives rise to an incentive conflict even when the agents' material preferences (over actions) are congruent.

1 Other examples include the decision to omit evidence at trial due to jury inattention (Lester et al. (2012)); the design of central bank announcements to minimize inattention-induced misunderstandings and market movements (Sims (2010)); and the use of name/race/sex-blind forms to minimize the extent to which hiring committees' limited attention amplifies statistical discrimination in the labor market (Bartoš et al. (2016)).

2 Formally, the RI attention cost can be justified as the smallest per-instance number of bits to be communicated in a channel transmitting many independent copies of Sender's signal into its noisy perception in Receiver's brain (Cover and Thomas, 2006, Ch. 10). This presumes that Receiver optimizes the communication channel after observing Sender's disclosure rule. The joint assumptions of Sender commitment, multiple instances, and optimal coding may be viewed as capturing the steady-state behavior of repeated interactions. For example, online retailers such as Amazon adopt fixed algorithms that govern which product recommendations are shown to consumers who, in turn, learn which parts of their homepages tend to contain the most relevant ads and use this knowledge to optimize the order in which they visually scan the page.

The RI paradigm has several important implications for signal design and disclosure. First, the "language" of Sender's signals does not affect Receiver's attention strategy, and so we may without loss identify signals with their information content (i.e., the posterior beliefs over states induced in a fully attentive observer). Second, Receiver has "free disposal" of information, and so always prefers more to less. Sender therefore withholds information only due to incentive conflicts — not to help Receiver economize on attention costs. Third, Receiver does not pay full attention to any signal, no matter how simple — not even a recommendation of which action to take. In the binary-action environments on which we focus (described below), full obedience of such a recommendation would cost Receiver at most 1 bit. However, even then he finds it optimal to reduce the cost by responding to any action recommendation with noise, since his "marginal cost" of attention at full obedience is infinite. Thus, even though (as discussed below) Receiver optimally uses a "direct recommendation" attention strategy — wherein his perceptions coincide with the actions that he takes — Sender cannot "pre-process" information into an action recommendation on Receiver's behalf, and so must disclose more information than Receiver ultimately internalizes.3 The standard "Obedience Principle" from the information design literature therefore fails in our model, giving rise to a rich disclosure problem.

Optimal Disclosure. In order to explicitly characterize the form of optimal disclosure, we primarily focus on a canonical binary-action environment in which Receiver can either "act" (e.g., purchase a product or make an investment) or not.4 The state space is one-dimensional, with each state identified with the state-contingent material payoff difference for Receiver between action and inaction (so that action is preferable only in positive states). Sender's payoff is linear in Receiver's material payoff and the probability that Receiver acts, with the degree of Sender's bias in favor of action captured by the weight that she places on the latter (cf. Kolotilin et al. (2017)). A key simplification afforded by this Linear Model is that Sender's signals can be further reduced to their induced posterior means concerning the state. A fully-attentive Receiver would therefore act conditional (only) on positive signals, with the stakes of taking the correct action being greater the farther a signal is away from zero (at which point Receiver is indifferent between action and inaction).

Our rationally inattentive Receiver sometimes makes mistakes by acting (resp., not acting) conditional on negative (resp., positive) signals. Given a distribution of posterior means generated by Sender's disclosure rule, Receiver's optimal stochastic response takes the "logit" form shown by Matějka and McKay (2015), in which he pays more attention to, and rarely makes mistakes following, high-stakes signals, while paying less attention and making more frequent mistakes following low-stakes signals. This logit characterization reduces Receiver's high-dimensional choice of a flexible attention strategy to the one-dimensional choice of the optimal unconditional probability of acting. Furthermore, Receiver's optimal response is characterized by the first-order condition saying that he pays no attention to signals that make him indifferent between the actions, i.e., responds to such signals by randomizing with the unconditional probability. Importantly, Receiver's optimal response to any given signal depends not just on its posterior mean, but also on Receiver's attention strategy, which is endogenous to the entire disclosure rule adopted by Sender. This endogeneity makes Sender's disclosure problem in our model different from that in the standard Bayesian persuasion literature.

3 In the coding interpretation, consider again the example in which Amazon makes purchase recommendations to a consumer regarding many independent products. The consumer's optimal communication channel would compress these recommendations into a perception using a smaller number of bits, giving rise to a stochastic response to each individual recommendation (even if the perception is a deterministic function of the entire "block" of recommendations). Sims (2010, p. 175) makes a related observation in the context of central bank transparency: "Ordinary people will most likely pay little attention to even simple policy announcements, and they will react sluggishly — in effect simplifying the policy statement through their own information-processing filters — whether the information supplied is dense and complex or simple."

We solve Sender's disclosure problem using a first-order approach typical in moral hazard problems, putting a Lagrange multiplier on Receiver's first-order condition and thereby reducing Sender's problem to an unconstrained disclosure problem that we solve using the linear programming duality techniques of Dworczak and Martini (2019). The solution takes the form of a two-sided censorship rule: all low states are pooled together into a "low" signal, all high states are pooled together into a "high" signal, and intermediate states are fully revealed. The regions' cutoffs are determined by Sender's bias, Receiver's attention cost, and the distribution of the state, and any of the regions may potentially be empty.

To interpret the solution, recall first from the Bayesian persuasion literature that when material preferences (over actions) are misaligned, Sender persuades Receiver to act in her favor by withholding certain information. In our model, this familiar persuasion motive combines with a novel attention manipulation motive, whereby Sender withholds certain information to guide the focus of Receiver's attention.

To isolate the attention manipulation motive, we consider the special case of Aligned Preferences, in which Sender's "material" preferences over actions are congruent with Receiver's, and the only conflict is that Sender does not internalize Receiver's attention cost. While in this case the agents have a common preference over how to allocate a fixed amount of attention, Sender optimally induces Receiver to pay more attention by withholding information about some states while disclosing detailed information about others. Namely, Sender attracts Receiver's attention to intermediate-stakes states by pooling them with the highest-stakes states, providing simple and convincing recommendations. At the same time, Sender attracts Receiver's attention to moderately low-stakes states by providing detailed information that separates them from the lowest-stakes states. This provides an intuition for the optimality of two-sided censorship.

In the Aligned case, we show that Sender tends to reveal more information when Receiver's (marginal) attention cost parameter is higher. Intuitively, when attention costs are low, Sender can attract Receiver's nearly-full attention to even the lowest-stakes states by pooling them with high-stakes states. Thus, optimal disclosure converges to a simple action recommendation as attention costs vanish. Conversely, when attention costs are high, Sender instead makes Receiver pay some attention to most states by separating them from lower-stakes states. With sufficiently high attention costs, full disclosure becomes optimal, since otherwise Receiver would stop paying attention altogether. Sender's optimal payoff is always decreasing in Receiver's attention cost, but Receiver's payoff may be non-monotone due to the countervailing effect of Sender's disclosure becoming more informative.

We then study the interaction between the persuasion and attention manipulation motives when the agents' material preferences are misaligned. We focus on the simplest misaligned case, in which Sender's preferences are State-Independent, meaning that she simply wants to maximize the probability that Receiver acts. In this case, Sender still aims to attract Receiver's attention to positive states (in which their preferred actions agree), but also wants to distract Receiver from, and increase his mistakes in, some negative states (in which their preferred actions differ). We show that the bottom pooling region of Sender's optimal disclosure rule is empty, giving rise to an upper censorship rule: all sufficiently low states are fully separated, while all sufficiently high states are pooled together. Sender discloses more information than in any full-attention solution (as characterized by Gentzkow and Kamenica (2016)), for two distinct reasons. First, while any full-attention solution would leave Receiver with zero (material) surplus, the need to attract Receiver's attention requires Sender to provide a convincing signal, giving Receiver a strictly positive expected payoff to acting. The optimal convincing signal is simple, pooling intermediate favorable states with extremely favorable states in order to attract Receiver's attention and reduce his mistakes. Second, Sender diverts Receiver's attention from unfavorable states by providing detailed information that separates them from the most adverse states.

As attention costs vanish, optimal disclosure in the State-Independent case approaches the most informative full-attention solution, which fully separates negative states to maximize Receiver mistakes. At the other extreme of high attention costs, optimal disclosure converges to full disclosure, as in the Aligned case. Intuitively, the need to attract Receiver's attention eventually outweighs the persuasive motive for pooling negative states with positive ones. As a result, Receiver's payoff is maximized when his attention cost parameter is non-zero but not prohibitively large, in which case his attention cost acts as a commitment device to extract more information from Sender than would be disclosed if attention were free.

Model Variants. We consider several variants of the baseline model described above. First, to understand the role of Sender commitment, we study an alternative model of "cheap talk" communication (Crawford and Sobel (1982)) in which Sender chooses her signals after observing the realized state and simultaneously with Receiver's choice of attention strategy, which she takes as given. Without commitment, Sender has an incentive to exaggerate the state by sending whichever signal maximizes the probability that Receiver takes Sender's preferred state-contingent action. This implies that in any "attentive equilibrium," Sender uses just two signals: one that is sent when the state is above, and another that is sent when the state is below, the cutoff state at which Sender is indifferent between actions. Thus, in contrast to the commitment benchmark, the information disclosed by Sender is independent of Receiver's attention cost parameter, implying that Receiver never benefits from his inattention and leading to a communication breakdown, wherein Receiver stops paying attention when his attention cost parameter is only moderately high, so that informative communication would still be possible under Sender commitment.

Second, we illustrate that both tenets of the RI model — that Receiver's attention is flexible and subject to the mutual information cost function — are important for our results. When Receiver's attention is not flexible, Sender's attention manipulation motive can be significantly muted. To illustrate this point, we consider an alternative model (inspired by Dewatripont and Tirole (2005)) in which Receiver only chooses the probability with which he observes Sender's signal, but is required to allocate the same attention to each possible signal. Because Sender cannot induce Receiver to pay differential attention to different signals, we show that Sender is indifferent among many optimal disclosure rules and, in particular, always finds it optimal to provide a simple action recommendation.

When Receiver’s attention is flexible but its cost is not determined by mutual information, Re-ceiver’s optimal response depends not just on the information provided by Sender but also on the

(6)

“language” in which it is conveyed. Thus, Sender cannot merely identify signals with their infor-mation content, and instead faces a joint problem of inforinfor-mation design and “language design.” To illustrate this point, we characterize optimal disclosure in a model (inspired byCaplin et al.(2018)) in which Receiver’s attention cost is based on Tsallis entropy. This class of cost functions is character-ized by a single “elasticity” parameter, and includes mutual information when this parameter equals one. In the “elastic” case that this parameter exceeds one, Receiver finds it less costly to learn about low-probability signals. We show that Sender can therefore (approximately) implement the outcome that would arise if Receiver were fully attentive by transmitting a large number of “redundant” sig-nals, each with the same information content. In the “inelastic” case that the parameter is below one, Receiver finds it less costly to learn about high-probability signals. Consequently, we show that Receiver may pay no attention to full disclosure, but can be induced to pay partial attention when

Sender discloses coarse information. In both cases, Receiver prefersless informative disclosure if it is

conveyed in a “language” he finds easier to process, which never happens under mutual information costs.

Finally, we examine a fundamental modeling issue: what "disclosing information" means when the recipient has limited ability to process it. In our model, Receiver learns about the state by paying costly attention to Sender's signals, which captures the idea that communication is itself a costly activity. An alternative model of delegated information acquisition, developed in the contemporaneous work of Lipnowski et al. (2020a), views Receiver as acquiring costly information about the state itself, with the information provided by Sender merely restricting what he is able to learn. While these two perspectives are equivalent in the standard Bayesian persuasion paradigm in which Receiver is fully attentive, they are substantively different — and suited to different applications — in the presence of attention costs.5 We show that, under mutual information costs (but not generally), Sender's problem in the delegated information acquisition model is a (sometimes strict) relaxation of her problem in the communication model. Moreover, these two approaches give rise to substantively different forms of optimal disclosure. For instance, direct action recommendations are always optimal in the delegated information acquisition model but, as discussed above, are not even feasible in the communication model.

5 Subsequently to our working paper Bloedel and Segal (2018) and the contemporaneous working paper version of Lipnowski et al. (2020a), Angeletos and Sastry (2019) made a related distinction between "state-tracking" and "price-tracking" economies in a general equilibrium setting with rationally inattentive consumers and firms.

The rest of the paper is organized as follows. The remainder of this section reviews related literature. Section 2 introduces the model. Section 3 develops the first-order approach to Sender's problem. Section 4 presents our main results on optimal disclosure. Section 5 discusses several model variants and compares our model to related ones in the literature. Section 6 concludes. All proofs, as well as some measure-theoretic details related to the model formulation, are relegated to the appendices.

1.1 Related Literature

Bayesian Persuasion. Our Sender’s problem builds on the Bayesian persuasion paradigm of infor-mation disclosure under commitment (Rayo and Segal(2010);Kamenica and Gentzkow(2011)), and the stochasticity of Receiver’s response is reminiscent of models with a privately-informed Receiver

5Subsequently to our working paperBloedel and Segal(2018) and the contemporaneous working paper version ofLipnowski et al.

(2020a),Angeletos and Sastry(2019) made a related distinction between “state-tracking” and “price-tracking” economies in a general equilibrium setting with rationally inattentive consumers and firms.

(7)

(Rayo and Segal(2010); Kolotilin et al. (2017); Kolotilin (2018); Guo and Shmaya (2019)). Yet, as previously noted, our model differs from prior work in several important respects. First, Receiver’s response to a given signal depends not only on its information content, but on theentire disclosure rule

through his choice of attention strategy (see Subsection5.1for some implications).6 Second, the Obe-dience Principle, a cornerstone of the information design literature (Bergemann and Morris(2019)), fails in our model.7 Finally, our Sender’s problem is constrained by Receiver’s first-order condition

for optimal attention. In contemporaneous work,Boleslavsky and Kim (2018) study persuasion in a moral hazard setting where an agent’s performance is disclosed to a third party — the Receiver. While they use a related first-order approach to incorporate the incentive constraint involving the agent’s one-dimensional effort choice, the Receiver in their model pays full attention and does not have an incentive constraint.

A few recent papers study persuasion with agents who learn directly about the state subject to RI costs, while Sender's signals are freely observed so that the Obedience Principle holds. In Gentzkow and Kamenica (2014), Receiver is fully attentive while Sender bears an RI cost for her chosen disclosure rule.8 We show in Subsection 5.5 that their model reduces to a standard single-agent RI problem and generates essentially opposite comparative statics from ours. In contemporaneous work, Matysková (2018) studies a model in which Receiver can acquire additional costly information about the state after observing Sender's signal, and shows that her Sender without loss completely deters such learning.

Also in contemporaneous work, Lipnowski et al. (2020a) independently develop a model of information provision to an RI agent with similar motivation to our work. In addition to the modeling differences described above, Lipnowski et al. (2020a) focus exclusively on the case of Aligned Preferences, finding that full disclosure is always optimal in their framework with two states but is sometimes suboptimal when there are three states. However, they do not characterize the form of optimal partial disclosure. The follow-up paper Lipnowski et al. (2020b) characterizes optimal partial disclosure in a three-state example under Aligned Preferences and a special case of the Tsallis attention cost functions that we consider in Subsection 5.3, but obtains a starkly different form of optimal disclosure. Subsequently to our working paper, Wei (2020) extended the Lipnowski et al. (2020a) framework to State-Independent Sender preferences in a two-state setting and showed that full disclosure is never optimal, in contrast to our solution for this special case in Subsection 5.1. We discuss in detail the relation between these papers and ours in Subsection 5.4.

6 The Bayesian persuasion literature assumes that Receiver's response to a given signal is independent of other signals that might have been sent. Alonso and Câmara (2016) and Galperti (2019) discuss the importance of this assumption for the standard concavification approach.

7 It also fails in Ely et al. (2015) and Lipnowski and Mathevet (2018), but for a different reason: their Receivers have intrinsic preferences over posterior beliefs, so providing information is optimal even when it has no instrumental value, e.g., there is no decision to be made.

8 Le Treust and Tomala (2019) provide information-theoretic foundations for a variant of the Gentzkow and Kamenica (2014) model in which Sender faces a capacity constraint instead of an intrinsic cost, in the same way that rate-distortion theory provides foundations for the single-agent capacity-constrained RI model. Akyol et al. (2017) study the same problem as Le Treust and Tomala (2019) in a one-dimensional Gaussian-Quadratic setting with Gaussian channel noise, which trivializes the qualitative properties of optimal disclosure.

Rational Inattention in Games. Our paper joins a growing literature that applies the RI model to game-theoretic settings in which inattentive agents track strategically-chosen variables, such as prices (Matějka and McKay (2012); Martin (2017); Ravid (2020)) and political platforms (Matějka and Tabellini (2016)).9,10 However, the majority of this literature considers simultaneous strategic choices without commitment, so that others' attention strategies are taken as given, as in the more tractable "cheap talk" variant of our model. Exceptions are the monopolistic pricing model of Matějka (2015), in which the buyer is inattentive to offered prices, and the security design models of Yang (2019) and Yang and Zeng (2019), in which the buyer learns about the underlying asset's cashflow (i.e., the exogenous state) while being fully attentive to the offered security. While our focus on information disclosure generates qualitatively different results from models of strategic action choices,11 our novel first-order approach for solving Sender's problem may prove useful for analyzing other models of strategic commitment with RI agents.

Costly Communication. The team-theoretic literature on organizational economics (Marschak and Radner (1972); Garicano and Prat (2011); Dessein and Prat (2016)) studies communication with noisy transmission (e.g., Dessein et al. (2016)) and coarse message spaces (Cremer et al. (2007); Sobel (2015); Dilmé (2017)), but abstracts from any incentive conflicts. More related to our work is the model of Dewatripont and Tirole (2005), in which communication is costly and subject to moral hazard. Among other differences, disclosure and attention are inflexible in their model: Sender (Receiver) is required to choose a mixture of full and no disclosure (resp., attention). In contrast, our model features fully flexible disclosure and attention, giving rise to a rich and novel set of tradeoffs (see Subsection 5.3 for a more specific comparison). Persson (2018) considers a variant of the Dewatripont and Tirole (2005) model in which Receiver is subject to "information overload" when Sender "complexifies" her signals; this bears some resemblance to our analysis of inelastic Tsallis costs in Subsection 5.3, but she uses a model of costly attention unrelated to the RI paradigm.

2 Model

2.1 Timing

There are two players, Sender (she) and Receiver (he). Sender has exclusive access to a state of nature θ drawn from a compact metrizable state space Θ according to the (common) prior distribution Q ∈ ∆(Θ).12 Let θ̃ denote the corresponding random variable.13 Receiver has the exclusive authority to choose an action a from a compact metrizable action space A.

The timing of the game is as follows: First, Sender chooses a disclosure rule ⟨S, σ⟩, where S is a measurable signal space and σ : Θ → ∆(S) is a mapping from states to distributions over signals.

9 These papers build on the logit characterization for decision problems (Matějka and McKay (2015); Caplin et al. (2019)), originally developed in work on the rate-distortion problem in information theory (which is dual to the RI problem). To our knowledge, it first appeared in Gallager (1968), and some of our general results rely on the abstract treatment in Csiszár (1974).

10 A separate line of work uses the RI model to generate endogenous information structures in games of incomplete information, assuming that all players simultaneously acquire information about the same payoff-relevant state (Yang (2015); Denti (2019); Morris and Yang (2019)).

11 In particular, note that in our model the space of uncertainty on which Receiver's RI problem is defined (i.e., Sender's signal space) is endogenous, while in extant work agents learn about actions chosen from a fixed set (or about exogenous states). Some important implications of this endogeneity are discussed in Subsection 5.3.

12 We will refer to priors with two-point (resp., three-point) supports as binary and ternary.

13 Capital letters denote sets and lower case letters denote random variables (with tildes) and their realizations (without tildes). For simplicity, we often omit the sigma-algebras with which various measurable spaces are endowed. Throughout, any metrizable space X will always be endowed with the Borel sigma-algebra, and ∆(X), the space of Borel probability measures on X, will be equipped with the weak* topology.


Then, after observing Sender's disclosure rule (but not the realized signal s ∈ S), Receiver chooses an attention strategy ⟨M, µ⟩, where M is a measurable perception space and µ : S → ∆(M) is a mapping from signals to distributions over perceptions. (We will refer to the composition µ ◦ σ : Θ → ∆(M) as Receiver's (induced) learning strategy.) Note that Receiver's general attention strategy allows him full flexibility in choosing whether to pay attention to Sender's signals, how much attention to pay, and what to attend to. Finally, given realized state θ, signal s, and perception m, Receiver observes (only) the perception m and chooses action a ∈ A.

Remark 1 (Notation). Every prior and strategy profile induces a Markov chain θ̃ → s̃ → m̃ → ã. As will become clear, the agents' payoffs are fully determined by the joint distribution of (θ̃, s̃, m̃, ã). Thus, with slight abuse of notation, we will often suppress the explicit strategies and simply identify them with these induced random variables (leaving implicit the spaces S and M, as well as the probability space on which the random variables are defined).

2.2 Payoffs

When action a is chosen in state θ, Sender's payoff is v(a, θ) and Receiver's material payoff is u(a, θ), where both u, v : A × Θ → ℝ are continuous. Preferences are Aligned if u ≡ v, and Sender's preferences are State-Independent if v(a, ·) is constant for each a ∈ A. We refer to this general class of problems as the General Model. Except where noted, for the remainder of the paper we restrict attention to the following class of problems.14

Definition 1 (Linear Model). The Linear Model is the special case in which:

(i) The action space is binary, A = {0, 1};

(ii) The state space is an interval Θ = [θ̲, θ̄] with θ̲ < 0 < θ̄, and the prior is described by the distribution function G(θ) ≡ Q(θ̃ ≤ θ), which we without loss assume satisfies max supp(G) = θ̄ and min supp(G) = θ̲;

(iii) Receiver's material payoff is u(a, θ) = aθ and Sender's payoff is v(a, θ) = βa + (1 − β)u(a, θ), for some "bias" parameter β ∈ [0, 1].15

We refer to a = 1 as "action" (e.g., purchasing a product or making an indivisible investment) and a = 0 as "inaction." The payoffs to inaction are normalized to zero for both players. The state θ specifies Receiver's material payoff when he chooses to act. Sender may care both about Receiver's payoff and about whether Receiver acts, with the bias β controlling the degree of preference alignment. Note that β = 0 corresponds to Aligned Preferences, and β = 1 corresponds to State-Independent Sender preferences.

14 Many aspects of our formulation of Sender's problem (Section 3) and several results on Sender's optimal disclosure under Aligned Preferences (Subsection 4.2) extend to the General Model. We make explicit note of this where appropriate, relegating formal developments to the appendices.

15 The only substantive assumptions are that the action space is binary and that Sender's payoff is an affine function of Receiver's material payoff. Given these, it is without loss to define the state as θ ≡ u(1, θ) − u(0, θ) and normalize payoffs as stated above. It is also without loss to normalize the bias parameter to lie in [0, 1], as any β > 1 can be re-normalized to lie in [0, 1] and, for any β < 0, we can simply re-label actions and "reflect" the prior around the origin.


Attention Cost. In addition to his material payoff, Receiver pays an attention cost that is determined by his attention strategy (and Sender's disclosure rule). We assume the attention cost to be λI(s̃; m̃), where λ > 0 is a parameter and I(s̃; m̃) is the mutual information between Sender's signal s̃ and Receiver's perception m̃, given by

$$ I(\tilde s; \tilde m) = \begin{cases} \displaystyle\int \log\left(\frac{dP_{(\tilde s, \tilde m)}}{d(P_{\tilde s} \times P_{\tilde m})}\right) dP_{(\tilde s, \tilde m)}, & \text{if } P_{(\tilde s, \tilde m)} \ll P_{\tilde s} \times P_{\tilde m}, \\ +\infty, & \text{otherwise,} \end{cases} $$

where P_(s̃,m̃), P_s̃, and P_m̃ are, respectively, the joint and the marginal distributions of random variables s̃ and m̃. The classical foundation for using mutual information to measure communication cost is that it describes the minimal per-instance number of bits to be transmitted in an optimally designed noisy communication channel converting many independently drawn signals s̃ into corresponding perceptions m̃ (Cover and Thomas, 2006, Ch. 10). By the chain rule for mutual information, we have that

$$ I(\tilde s; \tilde m) = I(\tilde\theta; \tilde m) + I(\tilde s; \tilde m \mid \tilde\theta). \tag{1} $$

We refer to I(s̃; m̃) as Receiver's attention effort (to be distinguished from his cost, which includes the λ multiplier), and call I(θ̃; m̃) his learning effort. The former always exceeds the latter because I(s̃; m̃ | θ̃) ≥ 0 (with equality if and only if s̃ and m̃ are independent conditional on θ̃), reflecting the additional effort required to track Sender's signal when her disclosure rule involves state-contingent randomization.

Remark 2 (Full-Attention Upper Bound). Sender's optimal payoff in our model is always (weakly) lower than in the standard Bayesian persuasion model with a fully attentive Receiver (i.e., with λ = 0). Indeed, every Receiver learning strategy corresponds to a distribution of his posterior beliefs over θ̃ upon observing m̃ and, since he is Bayesian, any such distribution could be directly disclosed by Sender in the full-attention model (in which m̃ ≡ s̃ without loss). Our Sender simply faces the additional constraint that Receiver's induced learning strategy is a Blackwell garbling of her disclosure rule.

3 First-Order Approach

In this section, we state an explicit solution to Receiver's RI problem and introduce a tractable formulation of Sender's problem.

3.1 Solution to Receiver’s RI Problem

Our characterization of the solution to Receiver's RI problem is similar to those obtained by Matějka and McKay (2015) and Yang (2019), but relaxes their technical assumptions to allow for a completely general "prior" (i.e., distribution of Sender's signals).16 We also introduce a new proof and add intuition to make the solution a building block for subsequent analysis of Sender's problem.

We begin with two preliminary observations. First, it is well known that an RI agent will not acquire more information than is needed to determine which action to take. Thus, without loss we can restrict attention to Receiver using a "direct recommendation" attention strategy ⟨M, µ⟩ in which M = A and, upon observing perception m, he chooses action a = m with probability one. In the Linear Model, any such action strategy can be described by a stochastic choice rule p(s) ≡ µ(ã = 1 | s), the probability of acting given signal s. Second, in the Linear Model, Receiver's expected material payoff of acting given signal s depends only on the posterior mean E[θ̃ | s̃ = s] induced by that signal. Therefore, it turns out that — given the mutual information cost function — Receiver optimally treats all signals that induce the same posterior mean identically. For purposes of characterizing his optimal attention strategy, it is therefore without loss to identify signals with their implied posterior means, i.e., let s ≡ E[θ̃ | s̃ = s].17

16 Both papers assume that the prior is either discrete or absolutely continuous, and the definition of their cost functions directly in

Lemma 1. Let F be the marginal distribution of posterior means chosen by Sender, and suppose that F does not put full mass on s = 0. Then Receiver's RI problem has an essentially unique solution, in which the probability of acting is described F-almost everywhere as follows:

(i) If E_F[e^{s̃/λ}] ≤ 1, then p(s) ≡ 0;

(ii) If E_F[e^{−s̃/λ}] ≤ 1, then p(s) ≡ 1;

(iii) Otherwise,

$$ p(s) = p_\alpha(s) \equiv \frac{\alpha e^{s/\lambda}}{\alpha e^{s/\lambda} + 1 - \alpha}, \tag{2} $$

where α ∈ (0, 1) is the unique interior solution to the equation

$$ \mathbb{E}_F[p_\alpha(\tilde s)] = \alpha. \tag{3} $$

Proof of Lemma 1. Write mutual information as I(s̃; ã) = E_F[D_KL(p(s̃) ∥ α)], the expected Kullback–Leibler (KL) divergence

$$ D_{KL}(p(s) \,\|\, \alpha) \equiv p(s) \log\left(\frac{p(s)}{\alpha}\right) + (1 - p(s)) \log\left(\frac{1 - p(s)}{1 - \alpha}\right) \tag{4} $$

between the probability of acting p(s) conditional on signal s and the unconditional probability of acting, which we label α. Consider the "de-coupled" problem of maximizing Receiver's expected payoff E_F[s̃ p(s̃)] − λI(s̃; ã) by choosing a function p : Θ → [0, 1] and a number α ∈ [0, 1] without imposing the constraint E_F[p(s̃)] = α. The KL divergence (4) is (Gateaux) differentiable in p(·) when α ∈ (0, 1), so any "interior" optimum (with α ∈ (0, 1)) must satisfy the pointwise FOC with respect to p(s), which yields (2), and the FOC with respect to α, which yields (3). Since the ignored constraint is satisfied at any such solution, it also solves Receiver's RI problem. The KL divergence (4) is also jointly strictly convex in p(s) and α, so this is Receiver's unique optimum whenever it exists.

To see when an interior optimum exists, note that (3) is always solved by α ∈ {0, 1} and that the function P(α) ≡ E_F[p_α(s̃)] is differentiable with P′(0) = E_F[e^{s̃/λ}] and P′(1) = E_F[e^{−s̃/λ}]. Thus, by the Intermediate Value Theorem, min{P′(0), P′(1)} > 1 is sufficient for existence of an interior optimum. It is also necessary, for when P′(0) ≤ 1, we have for any α ∈ (0, 1) that

$$ P(\alpha) < \frac{\alpha\, \mathbb{E}_F\left[e^{\tilde s/\lambda}\right]}{\alpha\, \mathbb{E}_F\left[e^{\tilde s/\lambda}\right] + 1 - \alpha} \le \alpha $$

by Jensen's inequality (since s̃ is non-degenerate). When P′(1) ≤ 1, a symmetric argument shows that P(α) > α, establishing case (iii) of the proposition.

To characterize corner solutions, note that when P′(0) ≤ 1, we have E_F[s̃] < λ log E_F[e^{s̃/λ}] ≤ 0 (also by Jensen's inequality), so the optimal strategy involves paying no attention and never acting (α = 0). This establishes case (i) of the proposition. When P′(1) ≤ 1, a symmetric argument establishes case (ii).

17 Similarly, in the General Model, signals can without loss be reduced to their induced posterior beliefs, i.e., let s(Θ̂) ≡ Pr(θ̃ ∈ Θ̂ | s̃ = s) for all Borel Θ̂ ⊆ Θ. See Lemma 8 in Appendix B for formal details. That such reductions are without loss is not ex ante obvious because Receiver's attention cost directly depends on Sender's disclosure rule, including the signal space itself. In Subsection 5.3, we further discuss this point and study optimal disclosure under an alternative attention cost function for which signals cannot be reduced to posterior means.

Figure 1: Receiver's optimal logit rule (left) and attention allocation rule (right).

In cases (i) and (ii) of the proposition, Receiver chooses not to pay any attention and to take the ex ante best action with probability 1. In case (iii), Receiver pays some attention but makes mistakes, since paying full attention to some signals (i.e., setting p(s) ∈ {0, 1} when p(·) is not constant) has infinite marginal cost. The mistakes result in an action probability p_α(s) that takes the logit form depicted in Figure 1. It will be important in what follows to note that p_α(s) is strictly convex on [θ̲, s_α] and strictly concave on [s_α, θ̄], where s_α is the median point that solves p_α(s_α) = 1/2.

We will refer to α as Receiver's activity level, as it describes his average propensity to act. Notice that the proof of Lemma 1 establishes that we can view Receiver as choosing his activity level independently from his conditional action probabilities. This fact will be useful for interpreting Sender's problem below.
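As a numerical companion to Lemma 1 (a sketch of our own, not code from the paper), the routine below solves Receiver's RI problem for a finitely supported distribution F of posterior means: it checks the corner conditions of cases (i)–(ii) and otherwise locates the interior activity level α by bisecting the fixed-point equation (3). The particular support, weights, and cost parameter are hypothetical.

```python
import numpy as np

def solve_receiver(s, f, lam):
    """Solve Receiver's RI problem (Lemma 1) for posterior means s with probabilities f.

    Returns (alpha, p), where p[i] is the probability of acting on signal s[i].
    """
    s, f = np.asarray(s, float), np.asarray(f, float)
    if np.dot(f, np.exp(s / lam)) <= 1:      # case (i): pay no attention, never act
        return 0.0, np.zeros_like(s)
    if np.dot(f, np.exp(-s / lam)) <= 1:     # case (ii): pay no attention, always act
        return 1.0, np.ones_like(s)

    def p_alpha(a):                          # the logit rule (2)
        w = a * np.exp(s / lam)
        return w / (w + 1 - a)

    lo, hi = 0.0, 1.0                        # bisection on P(alpha) - alpha, eq. (3)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.dot(f, p_alpha(mid)) > mid:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, p_alpha(alpha)

# Hypothetical example: three posterior means with unequal stakes, lambda = 0.5.
alpha, p = solve_receiver([-1.0, 0.1, 1.0], [0.3, 0.4, 0.3], lam=0.5)
print(f"activity level alpha = {alpha:.4f}")
print("acting probabilities:", np.round(p, 4))   # strictly inside (0, 1): Receiver makes mistakes
```

Consistent with the discussion above, the response remains noisy even at the extreme signals, and the response to the near-indifferent signal s = 0.1 stays close to the unconditional activity level.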

Attention Allocation. It will also be useful to define Receiver's attention (effort) allocation rule as e_α(s) ≡ D_KL(p_α(s) ∥ α), so that Receiver's optimal attention effort I(s̃; ã) can be expressed as the expectation of e_α(s̃) given his optimal α. The optimal attention allocation rule is depicted in Figure 1. Note that e_α(0) = 0 because p_α(0) = α. Intuitively, Receiver pays no costly attention to s = 0, at which point he is indifferent between the actions. Receiver pays positive attention to all other signals: e_α(s) > 0 for all s ≠ 0. He also pays more attention to higher-stakes signals: e_α(s) is increasing in s for s > 0 and decreasing in s for s < 0. Moreover, e_α(s) is strictly convex in an interval around s = 0 and strictly concave elsewhere. It asymptotes to D_KL(1 ∥ α) = log(1/α) as s → +∞ and to D_KL(0 ∥ α) = log(1/(1 − α)) as s → −∞.18

3.2 LP Formulation of Sender’s Problem

We now introduce a simple two-step first-order approach to solving Sender's problem.19 First, we formulate Sender's component problem for optimal implementation of a given interior activity level α ∈ (0, 1). We can justify holding α fixed in this way by appealing to the proof of Lemma 1, which showed that we can view Receiver as choosing p(·) and α independently.

Given Lemma 1 and the linearity of Sender's payoff in the state, the posterior mean induced by signal s is also a sufficient statistic for Sender's interim payoff given s and a fixed activity level α. Therefore, both agents' payoffs depend on the disclosure rule only through the induced marginal distribution of posterior means. Going forward, we thus identify any disclosure rule ⟨S, σ⟩ with F, its induced marginal distribution of posterior means (under the given prior); all uniqueness statements about Sender's optimal disclosure reference uniqueness of such distributions. It is well known that such a distribution is feasible (i.e., can be induced by some disclosure rule) if and only if it is a mean-preserving contraction of the prior, written as F ≼ G, where ≽ is the mean-preserving spread (MPS) order on distribution functions.20 Moreover, given any two feasible F and F′, we have F ≽ F′ if and only if there exists some ⟨S, σ⟩ inducing F and some ⟨S′, σ′⟩ inducing F′ such that the former is Blackwell more informative than the latter (e.g., Gentzkow and Kamenica (2016)). Thus, we say that disclosure rule F is more informative than F′ if F ≽ F′; henceforth, any statement referencing Blackwell informativeness will have an explicit "Blackwell" qualifier.
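The feasibility criterion is easy to check numerically; the sketch below (our own, with a hypothetical uniform prior) verifies the integrated-CDF condition from footnote 20 on a grid, up to discretization error.

```python
import numpy as np

def is_mpc(F, G, grid):
    """Check F is a mean-preserving contraction of G via integrated CDFs (footnote 20)."""
    dx = np.diff(grid, prepend=grid[0])
    tol = grid[1] - grid[0]                      # allow for Riemann-sum discretization error
    return bool(np.all(np.cumsum(G(grid) * dx) >= np.cumsum(F(grid) * dx) - tol))

grid = np.linspace(-1.0, 1.0, 2001)
G = lambda x: np.clip((x + 1) / 2, 0.0, 1.0)     # hypothetical uniform prior on [-1, 1]
F = lambda x: (x >= 0).astype(float)             # no disclosure: all mass at the prior mean 0
print(is_mpc(F, G, grid))                        # True: withholding all information is feasible
```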

Given this reduction, Sender’s component problem can be written as

W (α | G) ≡ sup

F4G

EF[(β + (1 − β)˜s) · pα(˜s)] (CP)

s.t. EF[pα(˜s)] = α. (Act)

A solution to this relaxed problem is denoted by F*_α. Then, in the second step, we optimize over the activity levels α ∈ (0, 1) that are implementable, i.e., for which the feasible set in (CP) is nonempty:

$$ W(G) \equiv \sup_{\alpha \in (0,1)} W(\alpha \mid G). \tag{RP} $$

A solution is denoted by α∗. This approach is analogous to the two-step procedure often used to solve moral hazard models, with the activity level α playing the role of the usual one-dimensional “effort” variable, and the agent’s incentive constraint relaxed to the first-order condition. Note, however, that while the activity level α is a one-dimensional parameter, Receiver’s moral hazard problem of attention allocation is inherently infinite-dimensional.21

18 Note that e_α(s) must be convex around s = 0 because the KL divergence D_KL(· ∥ α) is convex and p_α(s) is smooth (hence, locally linear), and that e_α(s) must be asymptotically concave as |s| → ∞ because it is bounded. That it has exactly this one region of convexity follows from Receiver's optimal attention-smoothing: since the KL divergence is convex, it is cost-efficient to discriminate most finely between states (i.e., make p_α(s) steepest) where the least attention is being paid (i.e., where e_α(s) is smaller), which in the present binary-action setting is precisely the interval around s = 0.

19 Appendix B shows how an analogous first-order approach can be used to formulate Sender's problem in the General Model.

20 Recall that G ≽ F if and only if ∫_θ̲^x G(t) dt ≥ ∫_θ̲^x F(t) dt for all x ∈ [θ̲, θ̄].

21 Also, the relationship between α and Receiver's attention effort I(s̃; ã) is typically non-monotone, so the activity level does not directly measure Receiver's attention effort.


Recall from Lemma 1 that the activity constraint (Act) is necessary and sufficient for p_α(s) to solve Receiver's RI problem given F. Thus, the first-order approach is valid for interior solutions:

Corollary 1. Problem (RP) has a solution α* ∈ [0, 1]. Furthermore:

(i) If α* ∈ (0, 1), then Sender's problem is solved by the corresponding disclosure rule F* ≡ F*_{α*} solving (CP) given α*.

(ii) Otherwise, Sender's problem is solved by some F* satisfying either E_{F*}[e^{s̃/λ}] ≤ 1, in which case α* = 0, or E_{F*}[e^{−s̃/λ}] ≤ 1, in which case α* = 1.

We will primarily be interested in the interior solutions in case (i). The corner solutions in case (ii) arise primarily in trivial cases, e.g., when Sender's preferences are State-Independent and action is ex ante optimal for Receiver. Another instance covered by case (ii) is when the marginal attention cost exceeds the shutdown level

$$ \lambda(G) \equiv \min\left\{\lambda > 0 : \min\left\{\mathbb{E}_G\left[e^{\tilde\theta/\lambda}\right], \mathbb{E}_G\left[e^{-\tilde\theta/\lambda}\right]\right\} \le 1\right\}. $$

We say that the prior G is attention-proof whenever the marginal cost λ ≥ λ(G), in which case Receiver would not pay attention given any disclosure rule.22
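The shutdown level admits a direct numerical computation for a finitely supported prior. The sketch below (ours; the priors are hypothetical) scans for the smallest λ at which one of the two moment conditions from Lemma 1 holds at the prior itself; by the convexity argument in footnote 22, the same condition then holds for every feasible disclosure rule.

```python
import numpy as np

def shutdown_level(theta, g, lams=np.linspace(0.01, 50, 5000)):
    """Approximate the smallest lambda at which Receiver ignores every disclosure rule."""
    theta, g = np.asarray(theta, float), np.asarray(g, float)
    for lam in lams:
        if min(np.dot(g, np.exp(theta / lam)), np.dot(g, np.exp(-theta / lam))) <= 1:
            return float(lam)
    return np.inf

# A negative-mean prior is attention-proof for large enough lambda...
print(shutdown_level([-1.0, 0.5], [0.6, 0.4]))   # finite shutdown level
# ...while a zero-mean prior never is (cf. the symmetric case in Subsection 4.2).
print(shutdown_level([-1.0, 1.0], [0.5, 0.5]))   # inf
```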

Remark 3 (Pure Noisy Persuasion). Omitting the activity constraint (Act) from the component problem (CP) yields the unconstrained pure noisy persuasion problem

$$ \sup_{F \preceq G} \ \mathbb{E}_F\left[(\beta + (1-\beta)\tilde s) \cdot p_\alpha(\tilde s)\right], \tag{PNP} $$

which represents a Bayesian persuasion setting in which Receiver's response is noisy for an exogenous reason, such as his private information (as in Rayo and Segal (2010) or Kolotilin et al. (2017)). Interestingly, in some settings described below, constraint (Act) does not bind in the component problem (CP) with the optimal activity level α*, meaning its solution coincides with the solution to the corresponding noisy problem. Yet even in this case, the characterization of the optimal α* and the comparative statics distinguish our Sender's problem under rational inattention from the pure noisy problem. See Subsection 5.1 below for further details.

4 Optimal Disclosure

This section has several parts. First, Subsection 4.1 characterizes the solution to the component problem (CP) for general Sender biases. The remaining parts analyze in detail, and provide additional intuition for, our leading special cases: Aligned Preferences in Subsection 4.2 and State-Independent Sender preferences in Subsection 4.3.

4.1 General Characterization

The first result characterizes the solutions to Sender’s component problem (CP) for a fixed interior activity level α. For ease of presentation, we state the complete characterization only for continuous and full-support priors G, in which case optimal disclosure is shown to be deterministic.

22 This follows from Lemma 1 and the facts that F ≼ G and the functions e^{s/λ} and e^{−s/λ} are convex.

Theorem 1. Given any implementable α ∈ (0, 1), there exists an essentially unique solution F*_α to Sender's component problem (CP). If G is continuous and has full support, then F*_α is induced by a "two-sided censorship" rule. Specifically, there exist cutoffs θ_l, θ_h ∈ [θ̲, θ̄] such that θ_l ≤ θ_h and:

(i) All states in the "lower pooling region" [θ̲, θ_l] are pooled together into a single "low signal" l ≡ E_G[θ̃ | θ̃ ≤ θ_l],

(ii) Each state in the "separating region" (θ_l, θ_h) is fully revealed,

(iii) All states in the "upper pooling region" [θ_h, θ̄] are pooled together into a single "high signal" h ≡ E_G[θ̃ | θ̃ ≥ θ_h].

In general, any of these three regions may be empty.

To understand the idea behind the proof of Theorem 1, write the Lagrangian for Sender's component problem (CP) as

$$ \mathcal{L}(F, \alpha, \eta) \equiv \mathbb{E}_F\left[(\beta + (1-\beta)\tilde s) \cdot p_\alpha(\tilde s)\right] + \eta \cdot \left(\mathbb{E}_F\left[p_\alpha(\tilde s)\right] - \alpha\right), $$

where η ∈ ℝ is a Lagrange multiplier on the activity constraint (Act). For a fixed value of η, the Lagrangian maximization problem

$$ \max_{F \preceq G} \ \mathcal{L}(F, \alpha, \eta) \tag{5} $$

is an unconstrained persuasion problem with a shifted Sender bias β_η ≡ (β + η)/(1 + η). Observe that the Lagrangian payoff function (β_η + (1 − β_η)s) · p_α(s) is strictly concave for low s, strictly convex for intermediate s, and strictly concave again for high s (see Figure 2). Intuitively, this suggests that it is optimal to pool on each of the concave regions and separate on the convex region. However, this intuition is only partially correct: optimality requires, for instance, that the marginal value of slightly decreasing θ_h (which increases the probability of action at θ_h) is exactly offset by the marginal loss of the corresponding decrease in h (which lowers the probability of action at each s ≥ θ_h).

Formally, in the proof we find cutoffs (and a corresponding F) that can be supported by a suitable "price function," which we explicitly construct, and appeal to complementary slackness results from Dworczak and Martini (2019). The price function — roughly, a Lagrange multiplier on the mean-preserving contraction constraint in (CP) — is a convex function that is affine on the pooling intervals, majorizes Sender's Lagrangian payoff, and coincides with it on the support of the optimal F. These properties, together with the shape of Sender's Lagrangian payoff, imply that the pooled signals l and h are determined by tangency conditions between these two functions, capturing the aforementioned marginal condition required by optimality (see Figure 2). As a result, it is optimal to separate only on part of the convex region.

The above procedure characterizes all solutions to the Lagrangian problem (5) for any fixed η and, a fortiori, any solution to the component problem (CP). In the remainder of the proof, we verify that the latter problem indeed has a unique solution.
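Theorem 1 reduces Sender's problem to a two-dimensional search in practice. The sketch below (our own illustration, not from the paper) discretizes a hypothetical uniform prior, restricts attention to two-sided censorship rules, lets Receiver best-respond via the Lemma 1 fixed point, and grid-searches over the cutoffs (θ_l, θ_h); the bias β and cost λ are hypothetical choices.

```python
import numpy as np

def receiver_alpha(s, f, lam):
    """Receiver's activity level (Lemma 1): corner checks, then bisection on eq. (3)."""
    if np.dot(f, np.exp(s / lam)) <= 1:
        return 0.0
    if np.dot(f, np.exp(-s / lam)) <= 1:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        w = mid * np.exp(s / lam)
        lo, hi = (mid, hi) if np.dot(f, w / (w + 1 - mid)) > mid else (lo, mid)
    return 0.5 * (lo + hi)

def censor(theta, g, lo_cut, hi_cut):
    """Two-sided censorship: pool states below lo_cut and above hi_cut into conditional means."""
    low, mid, high = theta <= lo_cut, (theta > lo_cut) & (theta < hi_cut), theta >= hi_cut
    s, f = list(theta[mid]), list(g[mid])
    for mask in (low, high):
        if g[mask].sum() > 0:
            s.append(np.dot(theta[mask], g[mask]) / g[mask].sum())   # pooled signal l or h
            f.append(g[mask].sum())
    return np.array(s), np.array(f)

# Hypothetical primitives: discretized uniform prior on [-1, 1], bias beta, cost lambda.
theta = np.linspace(-1, 1, 201)
g = np.full(theta.size, 1 / theta.size)
beta, lam = 0.5, 0.2

best = None
for lo_cut in theta[::10]:                       # coarse grid over cutoff pairs
    for hi_cut in theta[::10]:
        if lo_cut >= hi_cut:
            continue
        s, f = censor(theta, g, lo_cut, hi_cut)
        a = receiver_alpha(s, f, lam)
        w = a * np.exp(s / lam)
        p = w / (w + 1 - a) if 0 < a < 1 else np.full(s.shape, float(a))
        payoff = np.dot(f, (beta + (1 - beta) * s) * p)   # Sender's objective in (CP)
        if best is None or payoff > best[0]:
            best = (payoff, lo_cut, hi_cut)
print("best payoff %.4f at cutoffs (theta_l, theta_h) = (%.2f, %.2f)" % best)
```

The sketch is meant only to illustrate the structure of the solution; it searches within the censorship class justified by Theorem 1 and does not replace the duality argument in the proof.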

Remark 4 (Delegation Interpretation). Since the disclosure rules in Theorem 1 are deterministic, Receiver's attention and learning efforts coincide under these rules (i.e., I(s̃; ã) = I(θ̃; ã) for all action strategies ã). Thus, at the optimum it is "as if" Sender merely restricts the learning strategies available to Receiver, who then acquires information directly about the state (see Subsection 5.4 below for related discussion).


Figure 2: Sender’s objective (dashed green curve), Lagrangian objective (solid green curve), and the

Dworczak and Martini(2019) price function supporting her optimal disclosure (red curve).

Remark 5 (General Priors). Several of our subsequent results are established for general priors, not just those that are continuous with full support. Theorem 6 in Appendix A shows that the above two-sided censorship characterization extends to all priors, with the caveat that randomization may be needed at the cutoffs θ_l and θ_h if the prior has atoms there. Note that, since any prior can be approximated by continuous and full-support ones, any such solution can be viewed as a "distributional limit" of the partitions described in Theorem 1. For instance, when the prior is binary and Sender has State-Independent preferences, we show in Subsection 5.1 below that Sender's optimal disclosure induces two posterior means, l = θ̲ and h ∈ (0, θ̄]. Such signal distributions are approximated by optimal partitions for a "nearly binary" continuous, full-support prior, in which the bottom pooling interval is empty and the separation interval shrinks down to the single point {θ̲}, the mass of which is "split" into two signals.

4.2 Aligned Preferences

In this subsection we focus on the case in which material preferences are Aligned. This isolates Sender's attention manipulation motive, for when Receiver is fully attentive (i.e., λ = 0) there is no incentive conflict: a disclosure rule is optimal if and only if it is more informative than an action recommendation, defined to be any disclosure rule that pools on [θ̲, 0) and (0, θ̄] and sends exactly two signals with positive probability.23 With costly attention, the only incentive conflict is that Sender wants Receiver to pay more attention in order to minimize losses from his mistakes. Indeed, given a fixed amount of attention, the agents' preferences over attention strategies would be perfectly aligned: in a variant of the model where Receiver has no attention cost but faces a capacity constraint I(s̃; ã) ≤ κ, he would always allocate the capacity in a way that is optimal for Sender, so full disclosure would be optimal.24

But in our model with a linear attention cost, Sender optimally induces Receiver to pay more attention by withholding some information. While it is impossible to increase the attention paid to every signal, Sender can profitably induce Receiver to reallocate his attention, focusing it on certain signals while diverting it away from others. At the optimum, this improves the agents' material payoffs by increasing Receiver's overall attention effort I(s̃; ã) and learning effort I(θ̃; ã) (see Lemma 9 in Appendix C). We now turn to the optimal form of partial disclosure.

4.2.1 Symmetric Priors and Disclosure

To build intuition for the solution, we first consider the special case in which the prior is symmetric. Namely, we say that distribution function H is symmetric (around θ = 0) if θ̄ = −θ̲ and H(−θ) = 1 − H(θ) for all θ ∈ [θ̲, θ̄]. We assume that prior G is symmetric, and examine the optimal symmetric disclosure rule F, i.e., the optimal rule within the class of symmetric rules.25 (We drop these restrictions in Subsection 4.2.2 below.) The simplification from focusing on symmetric disclosure rules is that they all induce α = 1/2 and ensure that the constraint (Act) does not bind in Sender's component problem (CP), so that Sender's optimum also solves the noisy persuasion problem (PNP) given α = 1/2.26

Proposition 1. Suppose that preferences are Aligned, and that G is continuous, has full support, and is symmetric. The unique optimal symmetric disclosure rule is a two-sided censorship rule described in

Theorem 6, with θh= −θl> 0. Moreover:

(i) There exists some ˆλ > 0 such that full disclosure is optimal (i.e., θh = ¯θ) if and only if λ ≥ ˆλ, in which case it is uniquely optimal.

(ii) As costs vanish, optimal disclosure converges to an action recommendation: θh → 0 as λ → 0.

Proposition 1 shows that action recommendations are only optimal when attention is free, full disclosure is only optimal when costs are sufficiently high, and in general optimal disclosure lies strictly between these extremes in the MPS order. At the optimum, Sender provides a convincing action recommendation when the stakes are high (i.e., |s| ≥ θh), and delegates learning by providing detailed information when the stakes are low (i.e., |s| ≤ θh).

This two-sided censorship structure can be explained by the shape of Sender’s objective function s · p1/2(s), which is convex near the indifference point s = 0 and concave elsewhere (see Figure 3, where we plot s · p1/2(s) − s/2 to symmetrize the function). When the stakes are high, this objective is concave and Sender is effectively risk-averse; by pooling, she hedges her bets that Receiver takes the correct action. When the stakes are low, the objective is convex and Sender is effectively risk-loving; she provides maximal information in a gamble that Receiver takes the correct action.
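As a quick numerical illustration of this curvature claim, the following sketch assumes, purely for illustration (the formula is not displayed in the text), that Receiver’s conditional choice probability takes the logit form p1/2(s) = 1/(1 + e^{−s/λ}) implied by mutual-information costs at α = 1/2, and locates where s · p1/2(s) switches from convex to concave:

    import numpy as np

    lam = 0.5                                  # marginal attention cost (illustrative value)
    s = np.linspace(-3.0, 3.0, 2001)           # grid of posterior-mean "stakes"
    p = 1.0 / (1.0 + np.exp(-s / lam))         # assumed logit choice probability p_{1/2}(s)
    obj = s * p                                # Sender's objective under Aligned Preferences
    d2 = np.gradient(np.gradient(obj, s), s)   # numerical second derivative
    print(f"convex for |s| below ~{s[d2 > 0].max():.2f}, concave farther out")

Running this for several values of λ also previews the comparative statics below: the convex region around s = 0 widens as λ grows.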

24 Specifically, full disclosure would be uniquely optimal when the capacity constraint binds (i.e., would be violated under the first-best action strategy ã = 1(θ̃ ≥ 0)), while otherwise any disclosure rule sufficient for the first-best action strategy would be optimal, as in the full-attention model. See Lemma 10 in Appendix C for a formal statement (established for the General Model with Aligned Preferences).

25 We conjecture that symmetric disclosure is uniquely optimal whenever the prior is symmetric, but were only able to verify it numerically in a range of examples.

26 Given any symmetric F, the solution to Receiver’s RI problem involves α = 1/2 and, conversely, constraint (Act) is satisfied at α = 1/2.

Figure 3: Price function supporting Sender’s optimal disclosure (left) and its manipulation of Receiver’s attention allocation (right) under Aligned Preferences.

Another intuition for the result comes from the observation that Sender’s goal is to increase Receiver’s attention, which is allocated across the signal/state space according to the allocation rule e1/2(s) shown in Figure 3. First, since Receiver pays nearly full attention to high-stakes states, pooling intermediate-stakes states with high-stakes states increases his attention to the former by more than it reduces his attention to the latter, explaining the optimality of pooling on the intervals [θh, ¯θ] and [−¯θ, −θh]. (Formally, the attention allocation rule e1/2(s) is concave except in an interval around the indifference point s = 0.) On the other hand, since Receiver pays negligible attention to very low stakes, distinguishing intermediate stakes from low stakes increases his attention to the former by more than it reduces his attention to the latter, which explains the optimality of separation on the interval [−θh, θh]. (Formally, the attention allocation rule is convex on an interval around s = 0.) Note that, while a simple action recommendation would maximize the probability that Receiver takes the correct action, it would not be optimal for Sender, as it would dilute Receiver’s attention to higher-stakes states too much.27
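A companion sketch applies the same illustrative assumptions to the attention allocation: we use the realized entropy reduction e1/2(s) = h(1/2) − h(p1/2(s)), with h the binary entropy function, as a stand-in for the allocation rule (again our assumption for illustration, not a formula quoted from the text), and check that it is convex near s = 0 and concave in the tails:

    import numpy as np

    lam = 0.5
    s = np.linspace(-3.0, 3.0, 2001)
    p = 1.0 / (1.0 + np.exp(-s / lam))         # assumed logit choice probability

    def h(q):
        # binary entropy, clipped to avoid log(0) at the extremes
        q = np.clip(q, 1e-12, 1.0 - 1e-12)
        return -q * np.log(q) - (1.0 - q) * np.log(1.0 - q)

    e = h(0.5) - h(p)                          # candidate attention allocation rule e_{1/2}(s)
    d2 = np.gradient(np.gradient(e, s), s)
    print(f"e(s) convex for |s| below ~{s[d2 > 0].max():.2f}, concave in the tails")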

We now turn to the comparative statics described in Proposition 1. Point (i) states that full disclosure is optimal (only) when the marginal attention cost λ is sufficiently large. This derives from the fact that Receiver’s attention allocation rule e1/2(s) “flattens out” and becomes convex over a larger interval around s = 0 as λ increases. For λ large enough, the attention allocation rule e1/2(s) is strictly convex over the entire state space, at which point full disclosure uniquely maximizes his attention effort. (Because any symmetric prior G has zero mean, Lemma 1 implies that Receiver pays some attention to any (nondegenerate) disclosure for all λ, i.e., λ(G) = ∞.) Intuitively, as λ grows, Receiver pays “nearly full” attention to fewer states and “almost no attention” to more states, so pooling with the former becomes comparatively less useful than separation from the latter as a means of attracting his attention. Conversely, point (ii) states that optimal disclosure converges to an action recommendation in the small-cost limit, in which e1/2(s) becomes concave everywhere except a vanishingly small neighborhood of s = 0 because Receiver pays “nearly full” attention to most states. Consequently, pooling with high-stakes states becomes effective at attracting his attention to even the lowest-stakes states, without appreciably diluting his attention to the former. This can be read as a selection result, whereby a small attention cost selects one of the many optima from the full-attention (λ = 0) model.

27 Formally, disclosing just the sign of θ̃ uniquely maximizes Pr[ã = 1(θ̃ ≥ 0)] among all symmetric disclosure rules. This is because p1/2(s) is strictly concave on (0, ¯θ] and strictly convex on [θ, 0), so that pooling on (0, ¯θ] uniquely maximizes Pr[ã = 1 | θ̃ > 0] and pooling on [θ, 0) uniquely minimizes Pr[ã = 1 | θ̃ < 0] when α = 1/2 is held fixed.

The next result shows how the agents’ payoffs vary as the marginal attention cost λ increases:

Proposition 2. Suppose that preferences are Aligned, and that G is continuous, has full support, and is symmetric. Under the optimal symmetric disclosure rule: (i) Sender’s optimal payoff is continuous and strictly decreasing in λ ≥ 0, while (ii) Receiver’s payoff is uniquely maximized at λ = 0 and strictly decreasing on [ˆλ, ∞), but may be non-monotone on (0, ˆλ).

Point (i) follows from the fact that, as λ increases, Receiver makes more mistakes given any signal, i.e., p1/2(s) strictly decreases for s > 0 and strictly increases for s < 0. Consequently, Sender’s symmetrized objective s · p1/2(s) − s/2 strictly decreases everywhere, implying that the value of any disclosure rule strictly decreases. As for Receiver, the first two parts of point (ii) are obvious. To see that his payoff may be non-monotone for intermediate λ, note that there are two countervailing effects: Sender discloses more information, but Receiver processes less of it. We show by example that the former effect can dominate the latter, leading to a local strict increase in Receiver’s payoff.
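This non-monotonicity can be explored numerically. The sketch below makes two illustrative assumptions that are not taken from the text, namely a uniform prior on [−1, 1] and the logit rule p1/2(s) = 1/(1 + e^{−s/λ}) (consistent with α = 1/2 by symmetry); it restricts Sender to symmetric two-sided censorship with cutoff t = θh and reports both agents’ payoffs at Sender’s best cutoff:

    import numpy as np

    def payoffs(t, lam, n=4000):
        # Sender's and Receiver's expected payoffs from censorship cutoff t in (0, 1],
        # under a uniform prior on [-1, 1] and the assumed logit choice rule.
        sig = lambda x: 1.0 / (1.0 + np.exp(-x / lam))
        h = lambda q: -q * np.log(q) - (1.0 - q) * np.log(1.0 - q)
        m = (1.0 + t) / 2.0                    # posterior mean of the pooled states [t, 1]
        s = np.linspace(1e-6, t, n)            # revealed states (symmetry covers s < 0)
        ds = s[1] - s[0]
        sender = (0.5 * np.sum(s * (2.0 * sig(s) - 1.0)) * ds
                  + 0.5 * (1.0 - t) * m * (2.0 * sig(m) - 1.0))
        info = np.log(2.0) - (np.sum(h(sig(s))) * ds + (1.0 - t) * h(sig(m)))
        return sender, sender - lam * info     # Receiver nets out the attention cost

    for lam in (0.05, 0.2, 0.5, 1.0):
        ts = np.linspace(0.02, 1.0, 200)
        vals = [payoffs(t, lam) for t in ts]
        i = int(np.argmax([v[0] for v in vals]))
        print(f"lam={lam:.2f}: t*={ts[i]:.2f}, Sender={vals[i][0]:.4f}, Receiver={vals[i][1]:.4f}")

Sweeping a finer grid of λ and plotting the Receiver column then gives a direct way to look for the local increase in Receiver’s payoff described above.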

Remark 6 (Incomparable Learning Strategies). We emphasize that the same basic logic — that as λ increases, Sender discloses more information but Receiver processes less of it — also implies that Receiver’s induced learning strategy (i.e., the composition of Sender’s disclosure rule and Receiver’s attention strategy) typically varies in a Blackwell-incomparable way as λ increases. In the present setting, it never becomes Blackwell more informative, which would contradict Proposition 2(i), and it only becomes Blackwell less informative for λ ≥ ˆλ, where Sender’s optimal disclosure is held fixed (this can be seen explicitly by computing Receiver’s induced posteriors over θ̃). It is straightforward to show that similar arguments apply to the other Linear Model environments studied in Subsections 4.2.2 and 4.3 below, implying that Receiver’s induced learning strategy varies with λ in a Blackwell-incomparable manner quite generally in the Linear Model.

4.2.2 General Priors and Disclosure

We now consider general prior distributions and drop the restriction to symmetric disclosure rules. While the analysis is now complicated by the fact that constraint (Act) typically binds at Sender’s optimum (see Subsection 5.1 below for an example), we show that the main qualitative insights from the symmetric special case remain valid.

Theorem 2. Suppose that preferences are Aligned. Given any prior G:

(i) As λ ↑ λ(G), every optimal disclosure rule converges to full disclosure (namely, θl → θ and θh → ¯θ). Furthermore, if EG[θ̃] = 0, then there exists some ˆλ > 0 such that full disclosure is uniquely optimal if and only if λ ≥ ˆλ.

(ii) As λ → 0, any limit of optimal disclosure rules pools on [θ, 0) and (0, ¯θ] (namely, θl, θh → 0). Furthermore, if 0 ∉ supp(G), then there exists some ˆλ > 0 such that the action recommendation is uniquely optimal when λ ≤ ˆλ.


Thus, while we can no longer guarantee that each of the three regions from Theorem 6 is always nonempty (as in Proposition 1), point (i) of Theorem 2 shows that the separation interval must be nonempty when costs are large, and point (ii) implies that both pooling regions must be nonempty when costs are small. To understand point (i), note that the shutdown cost λ(G) < ∞ if and only if G has non-zero mean. In that case, when λ is just below the shutdown level, inducing Receiver to pay any attention requires near-full disclosure (even if full disclosure is not exactly optimal).

A direct but important implication of Theorem 2 is that, for generic priors, full disclosure and action recommendations are each strictly suboptimal for a range of attention costs:

Corollary 2. For any prior G with |supp(G) \ {0}| > 2, there exist λf, λr ∈ (0, λ(G)) such that any optimal disclosure rule is:28

(i) Strictly less informative than full disclosure when λ ≤ λf.

(ii) Strictly more informative than an action recommendation when λ ≥ λr.

The intuition for Theorem 6 and Corollary 2 is similar to that for the symmetric special case considered above. Namely, withholding information by pooling intermediate- and high-stakes states attracts Receiver’s attention to the former while distracting from the latter. When λ is not too large (λ ≤ λf), the former effect dominates, leading to higher-quality decisions than would arise under full disclosure. As λ → 0, optimal disclosure approximates an action recommendation because Receiver pays “nearly full” attention to almost all signals, so that pooling the lowest-stakes states with high-stakes states significantly attracts his attention to the former without appreciably diluting his attention to the latter. On the other hand, when λ is not too small (λ ≥ λr), Receiver would not pay any attention to a simple action recommendation, leading Sender to disclose additional information so as to attract his attention.

Notably, the hypothesis of Corollary 2 excludes (essentially only) binary priors, under which full disclosure and action recommendations are equivalent (in the Linear Model). Indeed, given a binary prior, Sender would have no scope to profitably reallocate Receiver’s attention: partial disclosure would necessarily involve pooling positive and negative states, obfuscating the optimal action and reducing Receiver’s overall attention effort. This intuition is general: Theorem 7 in Appendix C shows, among other things, that full disclosure is uniquely optimal under binary priors and Aligned Preferences for all decision problems allowed by our General Model. However, Corollary 2 implies that this property is special to binary priors, and thus non-generic. (See Subsection 5.4 for discussion of a related result of Lipnowski et al. (2020a).)

Finally, it can also be shown that the comparative statics from Proposition 2 extend to general priors — and even to the General Model with Aligned Preferences — although establishing this requires a different proof technique (see Appendix C). The basic idea is that, for any fixed disclosure rule, Sender’s payoff is strictly increasing in Receiver’s optimal attention effort I(s̃; ã), which can be shown to be non-increasing in λ by a monotone comparative statics argument.
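The comparative statics step admits a short revealed-preference sketch (our paraphrase of the standard argument; the formal treatment is in Appendix C). Fix a disclosure rule, and let the attention strategy Receiver chooses at cost λi achieve gross material payoff Ui with effort Ii = I(s̃; ã), for i = 1, 2 with λ1 < λ2. Optimality at each cost gives

    U1 − λ1 I1 ≥ U2 − λ1 I2    and    U2 − λ2 I2 ≥ U1 − λ2 I1.

Adding these inequalities and canceling U1 + U2 from both sides yields (λ2 − λ1)(I1 − I2) ≥ 0, so I2 ≤ I1: Receiver’s optimal attention effort is non-increasing in λ, as claimed.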

28 In most known examples, we can let λr < λf, meaning there is a non-degenerate interval of λ values on which Sender’s optimal disclosure must lie strictly between action recommendation and full disclosure in the MPS order, as in the symmetric case described in Proposition 1.
