Analysis of Markovian Competitive Situations using
Nonatomic Games—the Shock-driven Case and Its
Dynamic Pricing Application
Jian Yang
Department of Management Science and Information Systems
Business School, Rutgers University
Newark, NJ 07102
Email: [email protected]
August 2015
Abstract
We show that equilibria derived for nonatomic games (NGs) can be used by large finite games to achieve near-equilibrium performance. Rather than single-period games, we deal with dynamic games in which the evolution of a player’s state is influenced by his own action as well as by other players’ states and actions. We focus on the case where both random state transitions and random actions are driven by independently generated shocks. The NG equilibria we consider are random state-to-action maps. These simple NG equilibria can be adopted in a variety of real situations where players’ awareness of other players’ states can be anywhere between full and none. The transient results here also form the basis for a link between an NG’s stationary equilibrium (SE) and good stationary profiles for large finite games. Our approach works well for certain dynamic pricing games, both without and with production.
1 Introduction
Many multi-period competitive situations, as first noted by Shapley [29], involve randomly evolving and action-influenced player states that in turn affect players’ payoffs. When making a decision, a player has to be concerned not only with what states other players are in and how other players will act, but also with how his and others’ states and actions will influence the future evolution of states. Another complicating factor is that players may have zero, partial, or full knowledge of other players’ states before they take their actions in each period. Analyzing these dynamic games is understandably challenging. As an example of such competitive situations, consider a dynamic pricing game. Multiple firms start a fixed time horizon with stocks of the same product. Each of them is bent on using pricing to influence demand and earn the highest revenue from selling its stock. In any given period, a firm is aware of its own current inventory level but not the levels of others. Yet, demand arrival to the firm is random and influenced not only by its own price but also by the prices charged by other firms. In spite of its practical relevance, this dynamic pricing game has not received satisfactory treatment, probably due to its evident complexity.
The ultimate purpose in analyzing such a Markovian game is to identify an equilibrium action plan that, in every period, earns each player the highest total payoff from that period on, given that all players adhere to the same plan. But even in the stationary setting, known equilibria come in quite complicated forms that, for real implementation, demand a high degree of coordination among players; see, e.g., Mertens and Parthasarathy [24], Duffie, Geanakoplos, Mas-Colell, and McLennan [7], and Solan [30]. Since reaching an exact equilibrium is thus too lofty a goal, we propose to arrive at equilibrium asymptotically as the number of players grows, on the premise that the game’s nonatomic-game (NG) counterpart is analyzable. In the latter, a continuum of players compete, none of whom has any discernible influence on any other player, and yet all players in aggregate hold sway over players’ payoffs and state evolutions. The key advantage of such a game is that its state distribution evolves in a deterministic fashion. This results in the relatively simple form taken by the NG’s equilibria $x$: the pure or mixed action plan $x_t(s_t)$, though dependent on the time period $t$ and the player’s own individual state $s_t$, is insensitive to whatever portion of the overall state distribution the player can observe.
When an NG equilibrium is at hand, we show that it can be used in the original finite Markovian game to serve our intended purpose. Relying on intermediate results stemming from the Prohorov metric and the weak Law of Large Numbers (LLN) concerning empirical distributions, we establish two main results. In Theorem 1, we show that the empirical distribution of players’ states, which is itself random in the finite game, nevertheless converges in probability to the deterministic distribution predicted for the NG counterpart as the number of players grows to infinity. This convergence paves the way for Theorem 2, which states that players can use the observation-blind NG equilibrium in the finite-player situation and obtain an average performance that is ever harder to beat as the number of players grows. Here, the “average” over players’ states is assessed on the state distribution prevailing in either the NG or the finite game.
After assuming time-invariant payoff and transition functions, as well as a fixed discount factor over time and an infinite time horizon, we obtain a stationary setting. For that, we establish Theorem 3, effectively our affirmative answer to whether stationary equilibria (SE) studied in the past literature can be useful to large finite games. That this result is derived through the transient Theorem 2 indicates that transient results are probably more fundamental. It is also not difficult to introduce exogenous global states into our setting. The above theory is most useful when the NG counterpart is relatively easy to deal with in comparison to the corresponding finite games. Besides evidence in the literature, this point is illustrated through the aforementioned dynamic pricing game, which demonstrates the usefulness of the transient Theorem 2. We extend the game by considering locked-in production, wherein every firm uses production to bring its inventory back up to a predetermined level whenever it becomes empty. This extended game is asymptotically analyzable thanks to the stationary Theorem 3.
We summarize this paper’s contributions as follows. First, we establish one more link between NGs and their finite-game counterparts. Previously, links were mostly established for single-period games, special multi-period games without individual states, or games exhibiting stationary features. The introduction of information-carrying individual states allows proper treatment of a much wider swath of applicable situations involving present-future tradeoffs and transient properties. Second, we demonstrate that the usefulness of SEs to large finite games stems from more fundamental properties possessed by transient NG equilibria. Third, we add to the revenue management literature a new way to tackle the notoriously difficult dynamic pricing game in which firms are capable of reacting in real time to their changing inventory levels and yet are blind to other firms’ inventory levels.
Here is our plan for the remainder of the paper. Section 2 surveys related research. Basic model primitives are presented in Section 3, and the nonatomic as well as finite games are introduced in Section 4. We discuss the convergence of environments in Section 5 and that of equilibria in Section 6. Based on the above transient results, we develop in Section 7 a link between SEs in nonatomic games and equilibria in large finite games with stationary features. These results are applied to a dynamic pricing game involving nonperishable items in Section 8. We conclude the paper in Section 9.
2 Literature Survey
NGs are often easier to analyze than their finite counterparts because, in them, the action of an individual player has no impact on the payoffs of other players. Systematic research on NGs started with Schmeidler [28]. He formulated a single-period semi-anonymous NG, wherein the joint distribution of other players’ types and actions may affect any given player’s payoff. When the action space is finite, Schmeidler established the existence of pure equilibria when the game is anonymous, so that only the distribution of other players’ actions matters. Mas-Colell [23] showed the existence of distributional equilibria in anonymous NGs with compact action spaces. Khan, Rath, and Sun [19] identified a limit to the extent to which Schmeidler’s result can be extended. Links between NGs and their finite counterparts were covered in Green [12], Housman [14], Carmona [6], Kalai [18], Al-Najjar [4], and Yang [34].
This paper differs from the above by its focus on multi-period games. For multi-period games without individual states that allow past actions to impact future gains, Green [11], Sabourian [27], and Al-Najjar and Smorodinsky [3] showed that equilibria for large games are nearly myopic. With individual states that inherit traces of past actions, the games we study pose new challenges. Our main message is about observation-blindness. An NG equilibrium for our situation is certainly not myopic, as it takes the current action’s future consequences into account. Rather, it is insensitive to real-time observations made on other players’ states. We show that such a simple action plan can be used in finite situations with randomly evolving state distributions, of which a player may have zero, partial, or full knowledge, and still generate decent payoffs. The type of NGs we deal with is similar to the sequential anonymous games studied by Jovanovic and Rosenthal [16], who established the existence of distributional equilibria. The result was generalized to games involving aggregate shocks by Bergin and Bernhardt [5]. Unlike these papers, we work on the link between NGs and finite games, not on the NGs themselves.
individual player influences as done through the NG approach. In addition, they strove for the so-called stationary equilibria (SE), which further stress the long-run steady-state nature of individual action plans and system states; see, e.g., Hopenhayn [13] and Adlakha and Johari [1]. The oblivious equilibrium (OE) concept proposed by Weintraub, Benkard, and van Roy [31], though accounting for the impacts of large players, took the same stationary approach by letting firms be aware of only long-run average system states. We caution that the implicit stationarity of SE or OE renders it ill-suited to applications that are transient by nature, for instance, the dynamic pricing game to be studied, wherein the inventory level of every player can only decrease over time.
Some recent works have also contributed to the understanding of links between equilibria of infinite-player games and their finite-player brethren. Weintraub, Benkard, and van Roy [32] did so for a setting where a long-run average system state can be defined. Adlakha, Johari, and Weintraub [2] established the existence of SE and achieved a similar conclusion by using only exogenous conditions on model primitives. While this paper tackles the case where exogenous shocks drive state evolution and decision making, Yang [35] dealt with the more general setting where such shocks are not necessarily identifiable; however, the technical challenges faced there forced the state and action spaces to be discrete.
The dynamic pricing example has its own practical significance. In today’s marketplace, companies are often obliged to take competition into their revenue management considerations. The permeation of Internet access among virtually all homes and businesses brings price competition to an even more intense, global level. Sites that pool dynamically changing prices from tens or even hundreds of providers and let customers choose the most suitable ones include Expedia, Hotwire, Orbitz, and Travelocity, just to name a few. The competition is intense in several industries and is most manifest in the hospitality industry and the secondary ticket market.
Yet, research on competitive dynamic pricing has only been sporadic. Perakis and Sood [26] studied such a problem in which each firm reacts most conservatively to the inventory-independent price schedules of its competitors. Xu and Hopp [33] considered a multi-seller dynamic pricing game, where demand arrival is governed by a geometric Brownian motion and only the lowest-priced seller sells. Lin and Sibdari [21] and Levin, McGill, and Nediak [20] resolved equilibrium existence issues for specific dynamic pricing situations; neither revealed structural properties of their equilibria. Yang and Xia [36] tackled a continuous-time variant of the current pricing example, and found in computation that an NG equilibrium would
fare well in an actual competitive situation with as few as 30 firms.
3 Model Setup
We use a complete separable metric space $S$ for individual states and a separable metric space $X$ for player actions. Every time players are engaged, each of them is aware of his own individual state. To each player, other players’ states and actions are felt in a semi-anonymous fashion: what matters is the joint distribution of other players’ states and actions. We let shocks drive players’ decision-making processes as well as their random state transitions. That is, players choose their actions according to shocks drawn from a complete separable metric space $G$; after taking their actions, players each experience another shock, drawn from a complete separable metric space $I$, which sways the evolution of their states. The completeness requirements on $S$, $G$, and $I$ stem from the need to invoke Lemma 3 in Appendix A. These requirements are not stringent, as multi-dimensional Euclidean spaces, as well as finite and countable sets endowed with the discrete metric, are all complete separable metric spaces.
Given a separable metric space $A$, we use $d_A$ to denote its metric, $\mathcal{B}(A)$ its Borel $\sigma$-field, and $\mathcal{P}(A)$ the set of all probability measures (distributions) on the measurable space $(A,\mathcal{B}(A))$. The space $\mathcal{P}(A)$ is metrized by the Prohorov metric $\rho_A$, which induces on it the weak topology. The aforementioned joint state-action distribution comes from the space $\mathcal{P}(S\times X)$. Given separable metric spaces $A$ and $B$, we use $\mathcal{M}(A,B)$ to represent all measurable functions from $A$ to $B$, i.e., functions $y:A\to B$ such that, for any $B'\in\mathcal{B}(B)$, the set $y^{-1}(B')=\{a\in A\mid y(a)\in B'\}$ is a member of $\mathcal{B}(A)$. We let the distribution of players’ action-inducing shocks be fixed at some $\bar\gamma\in\mathcal{P}(G)$ and that of players’ transition-inducing shocks be fixed at some $\bar\iota\in\mathcal{P}(I)$.
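We let time periods be indexed by $1,2,\dots,\bar t$. For period $t=1,\dots,\bar t$, we suppose a player’s state $s\in S$, his action $x\in X$, and the instantaneous environment $\mu\in\mathcal{P}(S\times X)$ he faces together determine his payoff in the period. Thus, we introduce a bounded payoff function
$$\tilde f_t : S\times X\times\mathcal{P}(S\times X)\to[-\bar f_t,\bar f_t], \tag{1}$$
where $\bar f_t$ is some positive constant. It is required that $\tilde f_t(\cdot,\cdot,\mu)\in\mathcal{M}(S\times X,[-\bar f_t,\bar f_t])$ for every $\mu\in\mathcal{P}(S\times X)$. To describe how a player’s state evolves into his next state under shock $i\in I$, we introduce the mapping
$$\tilde s_t : S\times X\times\mathcal{P}(S\times X)\times I\to S. \tag{2}$$
We require that $\tilde s_t(\cdot,\cdot,\mu,\cdot)\in\mathcal{M}(S\times X\times I,S)$ at every $\mu\in\mathcal{P}(S\times X)$. In Yang [35], state evolution is governed by some $\tilde g_t : S\times X\times\mathcal{P}(S\times X)\to\mathcal{P}(S)$. Here, we face the special case with
$$\tilde g_t(s,x,\mu)=\bar\iota\cdot(\tilde s_t(s,x,\mu,\cdot))^{-1}. \tag{3}$$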
However, when $S$ is a closed subset of the real line $\Re$, it would suffice to study the current case with $I=[0,1]$, $\bar\iota$ being uniform, and
$$\tilde s_t(s,x,\mu,i)=\inf\{s'\in S\mid\tilde g_t(s,x,\mu\mid(-\infty,s']\cap S)\ge i\}, \tag{4}$$
where we have used $\theta(a\mid B')$ for $(\theta(a))(B')$ when $\theta:A\to\mathcal{P}(B)$, $a\in A$, and $B'\in\mathcal{B}(B)$. The action plans we consider are of the form $y_t\in\mathcal{M}(S\times G,X)$. That is, a player will use
action $x_t=y_t(s,g)$ in period $t$ when he starts with state $s\in S$ and receives action-inducing shock $g$. This can be considered a special case of the random action plan $\chi_t:S\to\mathcal{P}(X)$ studied in Yang [35], because of the association
$$\chi_t(s)=\bar\gamma\cdot(y_t(s,\cdot))^{-1}. \tag{5}$$
However, when $X$ is a closed subset of the real line, we can let $G$ be $[0,1]$, $\bar\gamma$ be uniform, and
$$y_t(s,g)=\inf\{x\in X\mid\chi_t(s\mid(-\infty,x]\cap X)\ge g\}. \tag{6}$$
Also, we can always use a singleton $G$, and hence a degenerate $\bar\gamma$, to handle a pure action plan $x_t:S\to X$.
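The quantile constructions in (4) and (6) are the familiar inverse-transform device: a uniform shock on $[0,1]$ fed through the generalized inverse of a distribution function reproduces that distribution. The Python sketch below assumes, purely for illustration, a finitely supported $\chi_t(s)$ on the real line; the helper `quantile_plan` and the three-price example are hypothetical. It builds $g\mapsto\inf\{x\mid\chi_t(s\mid(-\infty,x])\ge g\}$ and checks on a midpoint grid of $g$ values that each action receives its intended probability.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def quantile_plan(dist):
    """Return g -> inf{x : dist((-inf, x]) >= g} for a finitely supported dist on the reals.

    dist -- dict mapping real action values to probabilities (a hypothetical chi_t(s)).
    """
    values = sorted(dist)
    cdf = list(accumulate(dist[v] for v in values))

    def y(g):
        k = bisect_left(cdf, g)  # first index whose CDF value is >= g
        return values[min(k, len(values) - 1)]  # guard against round-off near g = 1

    return y


# Hypothetical mixed action over three prices for one fixed state s.
chi_s = {1.0: 0.2, 2.0: 0.5, 3.0: 0.3}
y_s = quantile_plan(chi_s)

# Push a uniform shock through y_s (midpoint rule on [0, 1]) and tabulate the actions.
grid = 10_000
freq = Counter(y_s((k + 0.5) / grid) for k in range(grid))
print({x: c / grid for x, c in sorted(freq.items())})
```

Using `bisect_left` matches the infimum in (6): it returns the first support point whose cumulative probability reaches $g$, so ties at a jump of the distribution function are resolved toward the smaller action, as the definition requires.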
We introduce more notation to better describe finite games. Let $A$ be a separable metric space. For $a\in A$, we use $\varepsilon_a$ to denote the singleton probability measure with $\varepsilon_a(\{a\})=1$. For $a=(a_1,\dots,a_n)\in A^n$, where $n\in\mathbb{N}$, the set of natural numbers, we use $\varepsilon_a$ for $\sum_{m=1}^{n}\varepsilon_{a_m}/n$. The two uses are consistent. We also use $\mathcal{P}_n(A)$ to denote the space of probability measures of the type $\varepsilon_a$ for $a\in A^n$, i.e., the space of empirical distributions generated from $n$ samples.
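As a small worked instance of this notation (the sample below is arbitrary and chosen only for illustration), take $A=\Re$, $n=4$, and $a=(0,1,1,2)$:

```latex
\varepsilon_a=\frac{1}{4}\sum_{m=1}^{4}\varepsilon_{a_m}
=\frac{1}{4}\varepsilon_{0}+\frac{1}{2}\varepsilon_{1}+\frac{1}{4}\varepsilon_{2}\in\mathcal{P}_4(\Re),
\qquad
\varepsilon_a\big((-\infty,1]\big)=\frac{3}{4}.
```

With $n=1$, the same formula returns the point mass $\varepsilon_{a_1}$, which is the sense in which the two uses of $\varepsilon$ agree.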
4 Nonatomic and Finite Games
For obvious reasons, we will call a state distribution $\sigma\in\mathcal{P}(S)$ a pre-action environment and a joint state-action distribution $\mu\in\mathcal{P}(S\times X)$ an in-action environment.
Given an initial pre-action environment $\sigma_1\in\mathcal{P}(S)$, we can define a nonatomic game $\Gamma(\sigma_1)$. For it, we use $x=(x_t\mid t=1,\dots,\bar t)\in(\mathcal{M}(S\times G,X))^{\bar t}$ to denote a policy profile. Here, each $x_t\in\mathcal{M}(S\times G,X)$ is a map from a player’s state-shock pair to his prescribed action. Along with the given initial environment $\sigma_1$, this profile generates a deterministic pre-action environment trajectory $\sigma=(\sigma_t\mid t=1,2,\dots,\bar t,\bar t+1)\in(\mathcal{P}(S))^{\bar t+1}$. This allows a player’s policy to be observation-blind; that is, what portion of $\sigma_t$ is observable to the player in each period $t$ is of no concern. The determinism of the environment evolution in $\Gamma(\sigma_1)$ can be heuristically argued along the lines of the LLN. However, in view of Feldman and Gilles [9] and Judd [17], a rigorous proof does not seem to be in the offing. Thus, we have made this feature an axiom of the nonatomic game rather than a property derived from its finite counterparts. We believe this strengthens rather than weakens our main result. It can now be said that a solution to an NG, though model-wise not necessarily representing the limiting behavior of a continuum of players in the most rigorous sense, can nevertheless be shown through the LLN to offer proper solutions to large finite games.
In the following, we discuss how the deterministic trajectory is formed. Let $t=1,\dots,\bar t$ be given. When all players form state distribution $\sigma_t\in\mathcal{P}(S)$ at the beginning of the period and adopt the same plan $x_t\in\mathcal{M}(S\times G,X)$, the in-action environment $\mu_t\in\mathcal{P}(S\times X)$ experienced by all players takes the form
$$\mu_t=(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1}, \tag{7}$$
where $1_S$ stands for the projection map from $S\times G$ to $S$. The meaning of (7) is that, for any $W'\in\mathcal{B}(S\times X)$,
$$\mu_t(W')=(\sigma_t\times\bar\gamma)\big((1_S,x_t)^{-1}(W')\big)=\int_G\sigma_t(\{s\in S\mid(s,x_t(s,g))\in W'\})\,\bar\gamma(dg). \tag{8}$$
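For finite $S$, $G$, and $X$, the integral in (8) becomes a finite sum: each state-shock pair $(s,g)$ carries mass $\sigma_t(s)\bar\gamma(g)$ and is sent to $(s,x_t(s,g))$. The Python sketch below computes this push-forward; the inventory-style states, the two-point shock distribution, the plan, and the helper name `in_action_environment` are hypothetical placeholders rather than part of the model.

```python
from collections import defaultdict


def in_action_environment(sigma, gamma, plan):
    """Finite-space version of (7)-(8): push sigma x gamma through (s, g) -> (s, plan(s, g)).

    sigma -- dict state -> probability (pre-action environment sigma_t)
    gamma -- dict action-shock -> probability (gamma-bar)
    plan  -- function (state, shock) -> action (the policy x_t)
    """
    mu = defaultdict(float)
    for s, p_s in sigma.items():
        for g, p_g in gamma.items():
            mu[(s, plan(s, g))] += p_s * p_g
    return dict(mu)


# Hypothetical example: inventory levels 0, 1, 2; a fair two-point action shock;
# a plan that posts price 1.0 or 2.0 depending on the shock whenever stock is positive.
sigma_t = {0: 0.2, 1: 0.5, 2: 0.3}
gamma_bar = {"lo": 0.5, "hi": 0.5}


def plan(s, g):
    if s == 0:
        return 0.0  # nothing to sell; the price is immaterial
    return 1.0 if g == "lo" else 2.0


mu_t = in_action_environment(sigma_t, gamma_bar, plan)
print(mu_t)
```

The state marginal of `mu_t` reproduces `sigma_t`, as it must, since $1_S$ leaves the state coordinate untouched.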
For a player who starts with state $s_t$ and has experienced action-inducing shock $g_t$ and transition-inducing shock $i_t$, his new state will be governed by (2) to follow
$$s_{t+1}=\tilde s_t\big(s_t,x_t(s_t,g_t),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1},i_t\big). \tag{9}$$
To describe the transition of the overall pre-action environment from $\sigma_t$ to $\sigma_{t+1}$ under action plan $x_t$, we define operator $T_t(x_t)$ on $\mathcal{P}(S)$. Note that states are distributed according to $\sigma_t$, action-inducing shocks according to $\bar\gamma$, and transition-inducing shocks according to $\bar\iota$. So following (9), for any $S'\in\mathcal{B}(S)$,
$$\sigma_{t+1}(S')=[T_t(x_t)\circ\sigma_t](S')=\int_G\int_I\sigma_t\big(\{s\in S\mid\tilde s_t(s,x_t(s,g),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1},i)\in S'\}\big)\,\bar\iota(di)\,\bar\gamma(dg). \tag{10}$$
The environment trajectory alluded to earlier is merely
$$\sigma=\big(T_{[1,t-1]}(x_{[1,t-1]})\circ\sigma_1\mid t=1,2,\dots,\bar t,\bar t+1\big), \tag{11}$$
where $T_{[tt']}(x_{[tt']})$ is understood to be the identity map whenever $t'\le t-1$, and, when $t\le t'$,
$$T_{[tt']}(x_{[tt']})=T_{t'}(x_{t'})\circ T_{[t,t'-1]}(x_{[t,t'-1]}). \tag{12}$$
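In the finite setting, (10) turns into a sum over $(s,g,i)$, and (11)-(12) simply compose these one-step maps. The Python sketch below assumes a hypothetical sales model that is not taken from the paper: a player with positive stock sells one unit when a uniform transition shock falls below a demand probability that decreases in his own price and increases in the environment’s mean price. The uniform law $\bar\iota$ is approximated by ten equally weighted grid points, and all function names are illustrative.

```python
from collections import defaultdict

I_GRID = {(k + 0.5) / 10: 0.1 for k in range(10)}  # discretized uniform iota-bar (assumption)
GAMMA = {"lo": 0.5, "hi": 0.5}                      # hypothetical action-shock law


def plan(s, g):
    # hypothetical policy x_t: price 1.0 or 2.0 when stock is positive
    if s == 0:
        return 0.0
    return 1.0 if g == "lo" else 2.0


def mean_price(mu):
    # zero-stock players enter at price 0.0 (a modelling shortcut for illustration)
    return sum(p * x for (_, x), p in mu.items())


def sell_one(s, x, mu, i):
    """Hypothetical transition s~_t: sell one unit when i < demand probability."""
    q = min(1.0, max(0.0, 0.8 - 0.3 * x + 0.2 * mean_price(mu)))
    return s - 1 if (s > 0 and i < q) else s


def push_forward(sigma, gamma, policy):
    mu = defaultdict(float)  # in-action environment, as in (7)
    for s, p_s in sigma.items():
        for g, p_g in gamma.items():
            mu[(s, policy(s, g))] += p_s * p_g
    return dict(mu)


def step(sigma, policy, gamma=GAMMA, iota=I_GRID, s_next=sell_one):
    """Finite-space version of (10): returns T_t(x_t) o sigma_t."""
    mu = push_forward(sigma, gamma, policy)
    out = defaultdict(float)
    for s, p_s in sigma.items():
        for g, p_g in gamma.items():
            x = policy(s, g)
            for i, p_i in iota.items():
                out[s_next(s, x, mu, i)] += p_s * p_g * p_i
    return dict(out)


def trajectory(sigma1, policies):
    """(11)-(12): sigma_t = T_{[1,t-1]}(x_{[1,t-1]}) o sigma_1 for t = 1, ..., tbar + 1."""
    traj = [sigma1]
    for policy in policies:
        traj.append(step(traj[-1], policy))
    return traj


sigma_path = trajectory({2: 1.0}, [plan, plan, plan])
for t, sigma in enumerate(sigma_path, start=1):
    print(t, {s: round(p, 4) for s, p in sorted(sigma.items())})
```

No randomness is involved: given $\sigma_1$ and the profile, the whole path is fixed, which is exactly the determinism the NG takes as an axiom.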
For some $n\in\mathbb{N}$ and initial state distribution $\sigma_1\in\mathcal{P}_n(S)$, we can also define an $n$-player game $\Gamma_n(\sigma_1)$. Note that the initial pre-action environment $\sigma_1$ must be of the form $\varepsilon_{s_1}=\varepsilon_{(s_{11},s_{12},\dots,s_{1n})}$, where each $s_{1m}$ is player $m$’s initial state. The game’s payoff and state transition are still governed by (1) and (2). In period $t$, the pre-action environment is also some $\sigma_t=\varepsilon_{(s_{t1},\dots,s_{tn})}\in\mathcal{P}_n(S)\subset\mathcal{P}(S)$. Hence, the in-action environment $\mu_{t1}\in\mathcal{P}_{n-1}(S\times X)\subset\mathcal{P}(S\times X)$ experienced by any designated player 1 is the empirical distribution $\varepsilon_{s_{t,-1},y_{t,-1}}=\varepsilon_{((s_{t2},y_{t2}),\dots,(s_{tn},y_{tn}))}$ when each player $m$ starts with state $s_{tm}\in S$ and takes action $y_{tm}\in X$. Suppose players still adopt policy $x=(x_t\mid t=1,\dots,\bar t)\in(\mathcal{M}(S\times G,X))^{\bar t}$, which is but the crudest of the many choices available to the $n$ players. We will see later, however, that this restriction does little harm.
Simple as it may seem, $x$ will not generate a deterministic environment trajectory. For shock vectors $g_t=(g_{t1},\dots,g_{tn})\in G^n$ and $i_t=(i_{t1},\dots,i_{tn})\in I^n$, we can define $T_{nt}(x_t,g_t,i_t)$ as the operator on $\mathcal{P}_n(S)$ that converts a period-$t$ pre-action environment into a period-$(t+1)$ one. Thus, following (9), $\varepsilon_{s_{t+1}}=T_{nt}(x_t,g_t,i_t)\circ\varepsilon_{s_t}$ is such that
$$s_{t+1,m}=\tilde s_t\big(s_{tm},x_t(s_{tm},g_{tm}),\varepsilon_{s_{t,-m},g_{t,-m}}\cdot(1_S,x_t)^{-1},i_{tm}\big),\qquad\forall m=1,2,\dots,n, \tag{13}$$
where each $\varepsilon_{s_{t,-m},g_{t,-m}}$ represents the empirical distribution built on the state-shock pairs $(s_{t1},g_{t1}),\dots,(s_{t,m-1},g_{t,m-1}),(s_{t,m+1},g_{t,m+1}),\dots,(s_{tn},g_{tn})$. Again, we define $T_{n,[tt']}$ as the identity map when $t'\le t-1$ and, when $t\le t'$, let
$$T_{n,[tt']}(x_{[tt']},g_{[tt']},i_{[tt']})=T_{nt'}(x_{t'},g_{t'},i_{t'})\circ T_{n,[t,t'-1]}(x_{[t,t'-1]},g_{[t,t'-1]},i_{[t,t'-1]}). \tag{14}$$
The evolution of pre-action environments $\sigma_t=\varepsilon_{s_t}$ is guided by the random shocks $g_t$ and $i_t$, and hence is stochastic by nature.
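By contrast with (10), the map in (13) is random. The Python sketch below simulates one application of $T_{nt}(x_t,g_t,i_t)$ under the same kind of hypothetical sales model; the point it illustrates is that player $m$’s transition uses the empirical in-action environment of the other $n-1$ players, obtained here by removing his own state-action pair from a pooled count. The shock laws, the plan, the transition, and the helper names are assumptions made for illustration.

```python
import random
from collections import Counter

G_VALUES = ["lo", "hi"]                          # hypothetical action shocks, equally likely
I_VALUES = [(k + 0.5) / 10 for k in range(10)]   # discretized uniform transition shocks


def plan(s, g):
    if s == 0:
        return 0.0
    return 1.0 if g == "lo" else 2.0


def sell_one(s, x, mu, i):
    mean_price = sum(p * a for (_, a), p in mu.items())
    q = min(1.0, max(0.0, 0.8 - 0.3 * x + 0.2 * mean_price))
    return s - 1 if (s > 0 and i < q) else s


def finite_game_step(states, policy, rng):
    """One random draw of T_{nt}(x_t, g_t, i_t) o eps_{s_t}, following (13); needs n >= 2."""
    n = len(states)
    pairs = [(s, policy(s, rng.choice(G_VALUES))) for s in states]  # (s_tm, x_t(s_tm, g_tm))
    counts = Counter(pairs)
    new_states = []
    for s, x in pairs:
        # empirical in-action environment of everyone except this player
        mu_minus_m = {k: (c - int(k == (s, x))) / (n - 1) for k, c in counts.items()}
        new_states.append(sell_one(s, x, mu_minus_m, rng.choice(I_VALUES)))
    return new_states


rng = random.Random(0)
states = [2] * 5 + [1] * 5
next_states = finite_game_step(states, plan, rng)
print(next_states)
```

Pooling the counts once and subtracting each player’s own pair keeps a step linear in $n$, instead of rebuilding $n$ separate empirical distributions.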
5 Convergence of Environments
Even before discussing payoffs and equilibria, we have in Theorem 1 a link between finite games and their NG counterpart: the stochastic environment pathways experienced by large finite games converge to the NG’s deterministic environment trajectory.
Let $A$, $B$, and $C$ be separable metric spaces and $\pi_B\in\mathcal{P}(B)$ be a distribution. We use $\mathcal{K}(A,B,\pi_B,C)\subset\mathcal{M}(A\times B,C)$ to represent the space of all functions from $A\times B$ to $C$ that are uniformly continuous in a probabilistic sense. The criterion for $y\in\mathcal{K}(A,B,\pi_B,C)$ is as follows. For any $\epsilon>0$, there exists $\delta>0$ such that, for any $a,a'\in A$ satisfying $d_A(a,a')<\delta$,
$$\pi_B(\{b\in B\mid d_C(y(a,b),y(a',b))<\epsilon\})>1-\epsilon.$$
When $B$ is a singleton and hence $\pi_B$ is degenerate, $y\in\mathcal{K}(A,B,\pi_B,C)$ merely means that $y$ is a uniformly continuous function from $A$ to $C$, a situation we denote by $y\in\mathcal{K}(A,C)$. Probabilistic continuity in the sense of $\mathcal{K}(A,B,\pi_B,C)$ also has a much-anticipated link with ordinary continuity.
Proposition 1 (I) Let $\pi_B\in\mathcal{P}(B)$, $y\in\mathcal{M}(A\times B,C)$, and $\theta:A\to\mathcal{P}(C)$ be crafted from
$$\theta(a)=\pi_B\cdot(y(a,\cdot))^{-1}.$$
When $y\in\mathcal{K}(A,B,\pi_B,C)$, one has $\theta\in\mathcal{K}(A,\mathcal{P}(C))$.
(II) On the other hand, suppose $B=[0,1]$, $\pi_B\in\mathcal{P}(B)$ is uniform, $C$ is a closed subset of the real line, $\theta$ is a mapping from $A$ to $\mathcal{P}(C)$, and $y:A\times B\to C$ is crafted from
$$y(a,b)=\inf\{c\in C\mid\theta(a\mid(-\infty,c]\cap C)\ge b\}.$$
Then, we still have $\theta(a)=\pi_B\cdot(y(a,\cdot))^{-1}$. Suppose in addition that $y(a,\cdot)$ is continuous at an $a$-independent rate. Then, when $\theta(\cdot)$ is continuous at a given $a\in A$, one has $y(a_n,\cdot)$ converging to $y(a,\cdot)$ in probability for any sequence $a_n$ converging to $a$.
We may see from Proposition 1 that, as a requirement, $y\in\mathcal{K}(A,B,\pi_B,C)$ is stronger than $\theta(a)=\pi_B\cdot(y(a,\cdot))^{-1}$ being uniformly continuous in $a$, but probably not by much.
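A simple example suggests why $\mathcal{K}(A,B,\pi_B,C)$ is a natural requirement for randomized plans. Take $A=B=[0,1]$, $\pi_B$ uniform, $C=\{0,1\}$, and the hypothetical threshold rule $y(a,b)=1$ if $b<a$ and $0$ otherwise, so that $\theta(a)=\pi_B\cdot(y(a,\cdot))^{-1}$ is the Bernoulli($a$) law. For each fixed $b$, $y(\cdot,b)$ jumps at $a=b$, so $y$ is not continuous in $a$ pointwise; yet $y(a,\cdot)$ and $y(a',\cdot)$ disagree only on a set of $\pi_B$-measure $|a-a'|$, so the criterion holds with $\delta=\epsilon$. The Python sketch below checks this numerically with a midpoint rule; the function names are illustrative.

```python
def y(a, b):
    """Hypothetical randomized plan: take action 1 when the uniform shock b falls below a."""
    return 1 if b < a else 0


def disagreement_prob(a, a2, grid=100_000):
    """pi_B({b : y(a, b) != y(a2, b)}) for uniform pi_B on [0, 1], via a midpoint rule."""
    points = ((k + 0.5) / grid for k in range(grid))
    return sum(1 for b in points if y(a, b) != y(a2, b)) / grid


# Pointwise, y(., b) jumps at a = b ...
print(y(0.5 - 1e-9, 0.5), y(0.5 + 1e-9, 0.5))  # 0 1
# ... but the two plans differ only on a set of shocks of measure |a - a2|.
print(disagreement_prob(0.30, 0.35))  # 0.05
```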
Technical developments of this section are left to Appendix A. We now make two assumptions on the transition function $\tilde s_t$:
(S1) For every $\mu\in\mathcal{P}(S\times X)$, the function $\tilde s_t(\cdot,\cdot,\mu,\cdot)$ is a member of $\mathcal{K}(S\times X,I,\bar\iota,S)$. That is, for any $\mu\in\mathcal{P}(S\times X)$ and $\epsilon>0$, there exist $\delta_S>0$ and $\delta_X>0$ such that, for any $s,s'\in S$ and $x,x'\in X$ satisfying $d_S(s,s')<\delta_S$ and $d_X(x,x')<\delta_X$,
$$\bar\iota(\{i\in I\mid d_S(\tilde s_t(s,x,\mu,i),\tilde s_t(s',x',\mu,i))<\epsilon\})>1-\epsilon.$$
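(S2) Not only is $\tilde s_t(s,x,\cdot,\cdot)\in\mathcal{K}(\mathcal{P}(S\times X),I,\bar\iota,S)$ at every $(s,x)\in S\times X$, but also, for any $\mu\in\mathcal{P}(S\times X)$ and $\epsilon>0$, there exists $\delta>0$ such that, for any $\mu'\in\mathcal{P}(S\times X)$ satisfying $\rho_{S\times X}(\mu,\mu')<\delta$, as well as any $s\in S$ and $x\in X$,
$$\bar\iota(\{i\in I\mid d_S(\tilde s_t(s,x,\mu,i),\tilde s_t(s,x,\mu',i))<\epsilon\})>1-\epsilon.$$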
For a separable metric space $A$, we use $(A^n,\mathcal{B}^n(A))$ to denote the product measurable space that houses $n$-long sample sequences. Given $\pi\in\mathcal{P}(A)$, we use $\pi^n$ to denote the product measure on $(A^n,\mathcal{B}^n(A))$. We have a technical result showing that a one-step evolution in a big game is not much different from that in a nonatomic game.
Proposition 2 Given a separable metric space $A$, distribution $\pi\in\mathcal{P}(A)$, and environment $\sigma\in\mathcal{P}(S)$, suppose $s_n=(s_n(a)\mid a\in A^n)$ for each $n\in\mathbb{N}$ is a member of $\mathcal{M}(A^n,S^n)$, and $\varepsilon_{s_n(a)}$ converges to $\sigma$ in probability, to the effect that
$$\pi^n(\{a\in A^n\mid\rho_S(\varepsilon_{s_n(a)},\sigma)<\epsilon\})>1-\epsilon,$$
for any $\epsilon>0$ and any $n$ large enough. Then, for any $t=1,\dots,\bar t-1$, $T_{nt}(x,g,i)\circ\varepsilon_{s_n(a)}$ will converge to $T_t(x)\circ\sigma$ in probability for any probabilistically continuous $x$. That is, for any $x\in\mathcal{K}(S,G,\bar\gamma,X)$,
$$(\pi\times\bar\gamma\times\bar\iota)^n\big(\{(a,g,i)\in(A\times G\times I)^n\mid\rho_S(T_{nt}(x,g,i)\circ\varepsilon_{s_n(a)},T_t(x)\circ\sigma)<\epsilon\}\big)>1-\epsilon,$$
for any $\epsilon>0$ and any $n$ large enough.
Imagine that $(A,\mathcal{B}(A),\pi)$ provides the exogenous shocks that drive the games’ evolution up to period $t$: $A=S\times G^{t-1}\times I^{t-1}$ and $\pi=\sigma_1\times\bar\gamma^{t-1}\times\bar\iota^{t-1}$. Proposition 2 states that, when period $t$ starts with initial state vectors $s_n(a)$ in $n$-player games that in aggregate increasingly resemble the given starting distribution $\sigma$ for the NG, the state vectors of large games after the period-$t$ transition will still in aggregate resemble the NG’s state distribution. By exploiting this proposition iteratively, we arrive at this section’s main result on the convergence of environments.
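Theorem 1 Let a policy profile $x_{[t\bar t]}\in(\mathcal{M}(S\times G,X))^{\bar t-t+1}$ for periods $t,t+1,\dots,\bar t$ be such that each $x_{t'}$ is a member of $\mathcal{K}(S,G,\bar\gamma,X)$. Then, when we sample $s_t=(s_{t1},\dots,s_{tn})$ from a given pre-action environment $\sigma_t\in\mathcal{P}(S)$, the sequence $\big(T_{n,[t,t'-1]}(x_{[t,t'-1]},g_{[t,t'-1]},i_{[t,t'-1]})\circ\varepsilon_{s_t}\mid t'=t,t+1,\dots,\bar t,\bar t+1\big)$ will converge to $\big(T_{[t,t'-1]}(x_{[t,t'-1]})\circ\sigma_t\mid t'=t,t+1,\dots,\bar t,\bar t+1\big)$ in probability. That is, for any $\epsilon>0$ and any $n$ large enough,
$$(\sigma_t\times\bar\gamma^{\bar t-t+1}\times\bar\iota^{\bar t-t+1})^n(\tilde A_n(\epsilon))>1-\epsilon,$$
where $\tilde A_n(\epsilon)\in\mathcal{B}^n(S\times G^{\bar t-t+1}\times I^{\bar t-t+1})$ is such that, for any $(s_t,g_{[t\bar t]},i_{[t\bar t]})\in\tilde A_n(\epsilon)$,
$$\rho_S\big(T_{n,[t,t'-1]}(x_{[t,t'-1]},g_{[t,t'-1]},i_{[t,t'-1]})\circ\varepsilon_{s_t},\;T_{[t,t'-1]}(x_{[t,t'-1]})\circ\sigma_t\big)<\epsilon,\qquad\forall t'=t,t+1,\dots,\bar t,\bar t+1.$$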
Suppose an NG starts period $t$ with pre-action environment $\sigma_t$ and a slew of finite games start the period with pre-action environments sampled from $\sigma_t$. Let the evolution of both types of games be guided by players acting according to the same probabilistically continuous policy profile $x_{[t\bar t]}$. Then, as the number of players $n$ in the finite games grows to infinity, Theorem 1 predicts ever smaller chances for the finite games’ period-$t'$ environments $T_{n,[t,t'-1]}(x_{[t,t'-1]},g_{[t,t'-1]},i_{[t,t'-1]})\circ\varepsilon_{s_t}$ to be even slightly away from the NG’s deterministic period-$t'$ environment $T_{[t,t'-1]}(x_{[t,t'-1]})\circ\sigma_t$.
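Theorem 1 can be watched at work in a small simulation. The Python sketch below uses a hypothetical sales model (not from the paper): the NG trajectory is computed exactly by the finite-space analogue of (10)-(12), while $n$-player games start from states sampled from $\sigma_1$ and evolve by (13). Because $S$ is finite and carries the discrete metric, the Prohorov distance between two state distributions coincides with their total-variation distance, which is what the code reports. The averages should shrink as $n$ grows; the specific numbers depend on the seed and on the made-up primitives.

```python
import random
from collections import Counter, defaultdict

GAMMA = {"lo": 0.5, "hi": 0.5}                   # hypothetical action-shock law
IOTA = {(k + 0.5) / 10: 0.1 for k in range(10)}  # discretized uniform transition shocks
SIGMA1 = {1: 0.5, 2: 0.5}                        # hypothetical initial environment
G_KEYS, I_KEYS = list(GAMMA), list(IOTA)         # uniform sampling is valid: both laws are uniform


def plan(s, g):
    if s == 0:
        return 0.0
    return 1.0 if g == "lo" else 2.0


def sell_one(s, x, mu, i):
    mean_price = sum(p * a for (_, a), p in mu.items())
    q = min(1.0, max(0.0, 0.8 - 0.3 * x + 0.2 * mean_price))
    return s - 1 if (s > 0 and i < q) else s


def ng_step(sigma):
    """T_t(x_t) o sigma, as in (10), on finite spaces."""
    mu = defaultdict(float)
    for s, p in sigma.items():
        for g, pg in GAMMA.items():
            mu[(s, plan(s, g))] += p * pg
    out = defaultdict(float)
    for s, p in sigma.items():
        for g, pg in GAMMA.items():
            for i, pi in IOTA.items():
                out[sell_one(s, plan(s, g), mu, i)] += p * pg * pi
    return dict(out)


def finite_step(states, rng):
    """One random draw of T_{nt}(x_t, g_t, i_t) o eps_{s_t}, as in (13)."""
    n = len(states)
    pairs = [(s, plan(s, rng.choice(G_KEYS))) for s in states]
    counts = Counter(pairs)
    out = []
    for s, x in pairs:
        mu_minus = {k: (c - int(k == (s, x))) / (n - 1) for k, c in counts.items()}
        out.append(sell_one(s, x, mu_minus, rng.choice(I_KEYS)))
    return out


def tv(p, q):
    """Total-variation distance, equal to the Prohorov distance under the discrete metric."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def mean_distance(n, periods=3, reps=20, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        states = rng.choices(list(SIGMA1), weights=list(SIGMA1.values()), k=n)
        sigma = SIGMA1
        for _ in range(periods):
            states = finite_step(states, rng)
            sigma = ng_step(sigma)
        emp = {s: c / n for s, c in Counter(states).items()}
        total += tv(emp, sigma)
    return total / reps


results = {n: mean_distance(n) for n in (20, 200, 2000)}
for n, d in results.items():
    print(n, round(d, 4))
```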
6 NG and Finite-game Equilibria
In defining $\Gamma(\sigma_1)$’s equilibria, we subject a candidate policy profile to the one-time deviation of a single player, who is by default infinitesimal in influence. The deviation will not alter the environment trajectory corresponding to the candidate profile. Thus, we define $v_t(s_t,\sigma_t,x_{[t\bar t]})$ as the total expected payoff a player can make from time $t$ to $\bar t$, when he starts with state $s_t\in S$ and outside environment $\sigma_t\in\mathcal{P}(S)$, and players’ policy profile from $t$ to $\bar t$ is given by $x_{[t\bar t]}=(x_t,\dots,x_{\bar t})\in(\mathcal{M}(S\times G,X))^{\bar t-t+1}$. As a terminal condition, we have
$$v_{\bar t+1}(s_{\bar t+1},\sigma_{\bar t+1})=0. \tag{15}$$
For $t=\bar t,\bar t-1,\dots,1$, we have the recursive relationship
$$\begin{aligned}v_t(s_t,\sigma_t,x_{[t\bar t]})={}&\int_G\tilde f_t\big(s_t,x_t(s_t,g_t),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1}\big)\,\bar\gamma(dg_t)\\&+\int_G\int_I v_{t+1}\big(\tilde s_t(s_t,x_t(s_t,g_t),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1},i_t),\,T_t(x_t)\circ\sigma_t,\,x_{[t+1,\bar t]}\big)\,\bar\iota(di_t)\,\bar\gamma(dg_t),\end{aligned}\tag{16}$$
due to the dynamics illustrated in (9) and (10).
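When $S$, $G$, and $I$ are finite, (15)-(16) can be evaluated by a forward pass that fixes the deterministic environments $\sigma_t$ and $\mu_t$, followed by a backward pass over states; the environment argument of $v_{t+1}$ never has to be enumerated because the profile pins it down. The Python sketch below does this for a hypothetical pricing-style model in which $\tilde f_t(s,x,\mu)$ is the expected one-period revenue $x\cdot q(x,\mu)$ of a player with positive stock; these primitives and function names are assumptions, not the paper’s.

```python
from collections import defaultdict

STATES = [0, 1, 2]
GAMMA = {"lo": 0.5, "hi": 0.5}
IOTA = {(k + 0.5) / 10: 0.1 for k in range(10)}  # discretized uniform transition shocks


def plan(s, g):
    if s == 0:
        return 0.0
    return 1.0 if g == "lo" else 2.0


def demand(x, mu):
    mean_price = sum(p * a for (_, a), p in mu.items())
    return min(1.0, max(0.0, 0.8 - 0.3 * x + 0.2 * mean_price))


def payoff(s, x, mu):  # hypothetical f~_t: expected revenue from at most one sale
    return x * demand(x, mu) if s > 0 else 0.0


def transition(s, x, mu, i):  # hypothetical s~_t: sell one unit when i < demand
    return s - 1 if (s > 0 and i < demand(x, mu)) else s


def push_forward(sigma, policy):
    mu = defaultdict(float)
    for s, p in sigma.items():
        for g, pg in GAMMA.items():
            mu[(s, policy(s, g))] += p * pg
    return dict(mu)


def values(sigma1, policies):
    """Return [v_1, ..., v_tbar] as dicts state -> value, following (15)-(16)."""
    sigma, mus = sigma1, []
    for policy in policies:  # forward pass: environments via (7) and (10)
        mu = push_forward(sigma, policy)
        mus.append(mu)
        nxt = defaultdict(float)
        for s, p in sigma.items():
            for g, pg in GAMMA.items():
                for i, pi in IOTA.items():
                    nxt[transition(s, policy(s, g), mu, i)] += p * pg * pi
        sigma = dict(nxt)
    v_next = {s: 0.0 for s in STATES}  # terminal condition (15)
    out = []
    for policy, mu in zip(reversed(policies), reversed(mus)):  # backward pass (16)
        v = {}
        for s in STATES:
            total = 0.0
            for g, pg in GAMMA.items():
                x = policy(s, g)
                total += pg * payoff(s, x, mu)
                total += pg * sum(pi * v_next[transition(s, x, mu, i)] for i, pi in IOTA.items())
            v[s] = total
        out.append(v)
        v_next = v
    return out[::-1]


v = values({2: 1.0}, [plan, plan, plan])
print({s: round(val, 4) for s, val in v[0].items()})
```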
Given policy $y\in\mathcal{M}(S\times G,X)$, we define $v'_t(s_t,\sigma_t,x_{[t\bar t]},y)$ as the total expected payoff similar to the just-defined entity, with the only difference that, in period $t$, the current player adopts policy $y$ instead of $x_t$. For $t=1,2,\dots,\bar t$, we have
$$\begin{aligned}v'_t(s_t,\sigma_t,x_{[t\bar t]},y)={}&\int_G\tilde f_t\big(s_t,y(s_t,g_t),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1}\big)\,\bar\gamma(dg_t)\\&+\int_G\int_I v_{t+1}\big(\tilde s_t(s_t,y(s_t,g_t),(\sigma_t\times\bar\gamma)\cdot(1_S,x_t)^{-1},i_t),\,T_t(x_t)\circ\sigma_t,\,x_{[t+1,\bar t]}\big)\,\bar\iota(di_t)\,\bar\gamma(dg_t).\end{aligned}\tag{17}$$
We deem policy $x^*=(x^*_t\mid t=1,2,\dots,\bar t)\in(\mathcal{M}(S\times G,X))^{\bar t}$ a Markov equilibrium for the game $\Gamma(\sigma_1)$ when, for every $t=1,2,\dots,\bar t$ and $y\in\mathcal{M}(S\times G,X)$,
$$\int_S v_t\big(s_t,T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1,x^*_{[t\bar t]}\big)\,[T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1](ds_t)\ge\int_S v'_t\big(s_t,T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1,x^*_{[t\bar t]},y\big)\,[T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1](ds_t). \tag{18}$$
That is, policy $x^*$ is regarded as an equilibrium when it cannot be beaten by any plan $y\in\mathcal{M}(S\times G,X)$ in any period $t$, in an average sense defined by the period-$t$ environment $\sigma_t=T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1$ induced by the policy itself. We caution that (18) is weaker than
$$v_t\big(s_t,T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1,x^*_{[t\bar t]}\big)\ge v'_t\big(s_t,T_{[1,t-1]}(x^*_{[1,t-1]})\circ\sigma_1,x^*_{[t\bar t]},y\big), \tag{19}$$
for every $s_t\in S$. On the other hand, since $y\in\mathcal{M}(S\times G,X)$ allows much freedom in choosing, for each state $s\in S$ and shock $g\in G$, a competitive reaction $y(s,g)$, there is not much difference between the two criteria aside from measurability subtleties.
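The remark that $y$ may be chosen separately for each $(s,g)$ also suggests how (18) could be checked numerically in a finite model: given $v_{t+1}$ and $\mu_t$, the best deviation picks, for every state, an action maximizing the one-period payoff plus the expected continuation value, and (18) holds in period $t$ exactly when the $\sigma_t$-average of the resulting gain is non-positive. The Python sketch below computes these gains for a hypothetical pricing model, with a finite price menu `PRICES` standing in for $X$; it is an illustrative check under these assumptions, not a procedure from the paper.

```python
from collections import defaultdict

STATES = [0, 1, 2]
PRICES = [0.0, 1.0, 2.0]  # hypothetical finite action set X
GAMMA = {"lo": 0.5, "hi": 0.5}
IOTA = {(k + 0.5) / 10: 0.1 for k in range(10)}


def plan(s, g):
    if s == 0:
        return 0.0
    return 1.0 if g == "lo" else 2.0


def demand(x, mu):
    mean_price = sum(p * a for (_, a), p in mu.items())
    return min(1.0, max(0.0, 0.8 - 0.3 * x + 0.2 * mean_price))


def payoff(s, x, mu):
    return x * demand(x, mu) if s > 0 else 0.0


def transition(s, x, mu, i):
    return s - 1 if (s > 0 and i < demand(x, mu)) else s


def deviation_gains(sigma1, policies):
    """For each t, the sigma_t-average of (best deviation value - v_t), the largest violation of (18)."""
    sigmas, mus, sigma = [], [], sigma1
    for policy in policies:  # forward pass: deterministic sigma_t and mu_t
        mu = defaultdict(float)
        for s, p in sigma.items():
            for g, pg in GAMMA.items():
                mu[(s, policy(s, g))] += p * pg
        mu = dict(mu)
        sigmas.append(sigma)
        mus.append(mu)
        nxt = defaultdict(float)
        for s, p in sigma.items():
            for g, pg in GAMMA.items():
                for i, pi in IOTA.items():
                    nxt[transition(s, policy(s, g), mu, i)] += p * pg * pi
        sigma = dict(nxt)
    v_next, gains = {s: 0.0 for s in STATES}, []
    for t in reversed(range(len(policies))):  # backward pass
        mu, policy = mus[t], policies[t]

        def q(s, x, mu=mu, v_next=v_next):  # one-period payoff plus expected continuation
            return payoff(s, x, mu) + sum(pi * v_next[transition(s, x, mu, i)] for i, pi in IOTA.items())

        v = {s: sum(pg * q(s, policy(s, g)) for g, pg in GAMMA.items()) for s in STATES}
        best = {s: max(q(s, x) for x in PRICES) for s in STATES}  # pointwise best deviation
        gains.append(sum(p * (best[s] - v[s]) for s, p in sigmas[t].items()))
        v_next = v
    return gains[::-1]


gains = deviation_gains({2: 1.0}, [plan, plan, plan])
print([round(g, 4) for g in gains])  # the profile satisfies (18) iff every entry is <= 0
```

The deviation is optimized state by state because, in this toy model, the continuation value depends on the shock $g$ only through the chosen action; this mirrors the freedom in $y(s,g)$ noted above.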
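For an $n$-player game, let $v_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x_{[t\bar t]})$ be the total expected payoff player 1 can make from $t$ to $\bar t$, when he starts with state $s_{t1}\in S$, other players’ initial environment is describable by their aggregate empirical state distribution $\varepsilon_{s_{t,-1}}=\varepsilon_{(s_{t2},\dots,s_{tn})}$, and all players adopt the policy $x_{[t\bar t]}=(x_t,x_{t+1},\dots,x_{\bar t})\in(\mathcal{M}(S\times G,X))^{\bar t-t+1}$ from period $t$ to period $\bar t$. As a terminal condition, we have
$$v_{n,\bar t+1}(s_{\bar t+1,1},\varepsilon_{s_{\bar t+1,-1}})=0. \tag{20}$$
For $t=\bar t,\bar t-1,\dots,1$, we have the recursive relationship
$$\begin{aligned}v_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x_{[t\bar t]})=\int_{G^n}\bar\gamma^n(dg_t)\times\Big\{&\tilde f_t\big(s_{t1},x_t(s_{t1},g_{t1}),\varepsilon_{s_{t,-1},g_{t,-1}}\cdot(1_S,x_t)^{-1}\big)\\&+\int_{I^n}\bar\iota^n(di_t)\times v_{n,t+1}\big(\tilde s_t(s_{t1},x_t(s_{t1},g_{t1}),\varepsilon_{s_{t,-1},g_{t,-1}}\cdot(1_S,x_t)^{-1},i_{t1}),\,[T_{nt}(x_t,g_t,i_t)\circ\varepsilon_{s_t}]_{-1},\,x_{[t+1,\bar t]}\big)\Big\},\end{aligned}\tag{21}$$
due to the dynamics illustrated in (9) and (13). By $[T_{nt}(x_t,g_t,i_t)\circ\varepsilon_{s_t}]_{-1}$, we mean $\varepsilon_{s_{t+1,-1}}$, where $\varepsilon_{s_{t+1}}$ is $T_{nt}(x_t,g_t,i_t)\circ\varepsilon_{s_t}$ as defined through (13).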
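Given policy $y\in\mathcal{M}(S\times G,X)$, let $v'_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x_{[t\bar t]},y)$ be the total expected payoff of player 1 similar to the just-defined entity, with the only difference that, in period $t$, player 1 adopts policy $y$. For $t=1,2,\dots,\bar t$, we have
$$\begin{aligned}v'_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x_{[t\bar t]},y)=\int_{G^n}\bar\gamma^n(dg_t)\times\Big\{&\tilde f_t\big(s_{t1},y(s_{t1},g_{t1}),\varepsilon_{s_{t,-1},g_{t,-1}}\cdot(1_S,x_t)^{-1}\big)\\&+\int_{I^n}\bar\iota^n(di_t)\times v_{n,t+1}\big(\tilde s_t(s_{t1},y(s_{t1},g_{t1}),\varepsilon_{s_{t,-1},g_{t,-1}}\cdot(1_S,x_t)^{-1},i_{t1}),\,[T_{nt}(x_t,g_t,i_t)\circ\varepsilon_{s_t}]_{-1},\,x_{[t+1,\bar t]}\big)\Big\}.\end{aligned}\tag{22}$$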
Let $\sigma=(\sigma_t\mid t=1,\dots,\bar t)\in(\mathcal{P}(S))^{\bar t}$ be a sequence of environments. For $\epsilon\ge0$, we deem $x^*=(x^*_t\mid t=1,\dots,\bar t)\in(\mathcal{M}(S\times G,X))^{\bar t}$ an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon_{s_1})\mid s_1\in S^n)$ in the sense of $\sigma$ when, for every $t=1,\dots,\bar t$ and $y\in\mathcal{M}(S\times G,X)$,
$$\int_{S^n}v_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x^*_{[t\bar t]})\,\sigma_t^n(ds_t)\ge\int_{S^n}v'_{nt}(s_{t1},\varepsilon_{s_{t,-1}},x^*_{[t\bar t]},y)\,\sigma_t^n(ds_t)-\epsilon. \tag{23}$$
That is, action plan $x^*$ is an $\epsilon$-Markov equilibrium in the sense of $\sigma$ when, under its guidance, the average payoff from any period $t$ on cannot be improved by more than $\epsilon$ through any deviation, where the “average” is taken with respect to the state distribution $\sigma_t$.
To go further, we need to assume that the single-period payoff functions $\tilde f_t$ are continuous:
(F1) Each $\tilde f_t(s,x,\mu)$ is continuous in $(s,x)$.
(F2) Each $\tilde f_t(s,x,\mu)$ is continuous in $\mu$ at an $(s,x)$-independent rate.
First, we have a couple of intermediate results.
Proposition 3 $v_t(s_t,\sigma_t,x_{[t\bar t]})$ is continuous in $s_t$ under probabilistically continuous $x_t$’s.
Proposition 4 Let $\sigma_t\in\mathcal{P}(S)$ and $x_{[t\bar t]}\in(\mathcal{K}(S,G,\bar\gamma,X))^{\bar t-t+1}$ be given. Suppose the sequence $s_{t,-1}=(s_{t2},s_{t3},\dots)$ is sampled from $\sigma_t$. Then $v_{nt}(s_{t1},\varepsilon_{s^n_{t,-1}},x_{[t\bar t]})$ will converge to $v_t(s_{t1},\sigma_t,x_{[t\bar t]})$ in probability at an $s_{t1}$-independent rate, where $s^n_{t,-1}$ stands for the cutoff $(s_{t2},s_{t3},\dots,s_{tn})$.
Now we are in a position to present the main transient result.
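Theorem 2 For some $\sigma_1 \in P(S)$, suppose $x^* = (x^*_t \mid t = 1, 2, \ldots, \bar t) \in (K(S, G, \bar\gamma, X))^{\bar t}$ is a probabilistically continuous Markov equilibrium of the nonatomic game $\Gamma(\sigma_1)$. Then, for any $\epsilon > 0$ and large enough $n \in N$, this $x^*$ is also an $\epsilon$-Markov equilibrium for the game family $(\Gamma_n(\varepsilon_{s_1}) \mid s_1 \in S^n)$ in the sense of $(T_{[1,t-1]}(x^*_{[1,t-1]}) \circ \sigma_1 \mid t = 1, \ldots, \bar t)$. Furthermore, the same is true in the sense of $(T_{n,[1,t-1]}(x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}) \circ \sigma_1 \mid t = 1, \ldots, \bar t)$ in probability.
The latter means that, for any $\epsilon > 0$ and large enough $n \in N$, for any $t = 1, \ldots, \bar t$ and $y \in M(S \times G, X)$,
$$\begin{aligned}
&\int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \bar\gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \bar\iota^{n(t-1)}(di_{[1,t-1]}) \\
&\qquad \times v_{nt}\big(s_{t,1}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}), \varepsilon_{s_{t,-1}}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}), x^*_{[t\bar t]}\big) \\
\ge\ &\int_{S^n} \sigma_1^n(ds_1) \times \int_{G^{n(t-1)}} \bar\gamma^{n(t-1)}(dg_{[1,t-1]}) \times \int_{I^{n(t-1)}} \bar\iota^{n(t-1)}(di_{[1,t-1]}) \\
&\qquad \times v'_{nt}\big(s_{t,1}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}), \varepsilon_{s_{t,-1}}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]}), x^*_{[t\bar t]}, y\big) - \epsilon,
\end{aligned}$$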
where both $s_{t,1}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]})$ and $\varepsilon_{s_{t,-1}}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]})$ should be understood as being extracted from $\varepsilon_{s_t}(s_1, x^*_{[1,t-1]}, g_{[1,t-1]}, i_{[1,t-1]})$.
Technical developments of this section can be found in Appendix B. Theorem 2 says that, when there are enough of them, players in a finite game can agree on an NG equilibrium and expect to lose little on average; also, the distribution with respect to which the “average” is taken is itself an accurate assessment of what players’ states would be had they followed the NG equilibrium all along. A prominent feature of an NG equilibrium is its insensitivity to the vicissitudes of pre-action environments, which players in a finite game may or may not be able to observe. In the NG limit, the evolution of these environments is deterministic after all. An equilibrium of the NG counterpart, which is necessarily observation-blind, serves as a good asymptotic equilibrium for finite games when there are enough players; moreover, this asymptotic result is independent of the observational power of players in the finite games. The root cause, as stated in Theorem 1, is that the aggregate environment in large games evolves in a nearly deterministic fashion, so the value of observation vanishes.
7
A Stationary Situation
Now we study an infinite-horizon model with stationary features. To this end, we suppose there is a state transition function ˜s, so that
$$\tilde s_t(s, x, \mu, i) = \tilde s(s, x, \mu, i), \quad \forall t = 1, 2, \ldots. \tag{24}$$
In addition, we suppose there is a payoff function $\tilde f$, so that
$$\tilde f_t(s, x, \mu) = \bar\phi^{t-1} \cdot \tilde f(s, x, \mu), \quad \forall t = 1, 2, \ldots, \tag{25}$$
where $\bar\phi \in [0, 1)$ is a discount factor. Also, we use $\bar f$ for the bound $\bar f_1$ that appears in (1).
For the nonatomic game Γ with the above stationary features, we use x = (x(s, g) | s ∈ S, g ∈ G) ∈ M (S ×G, X) to represent a stationary policy profile. It is a map from the current period’s state and action-inducing shock to the player’s action. Given an x ∈ M (S × G, X), we denote by T (x) the operator on P (S) that converts one state distribution σ to another σ0 so that, for every S0 ∈ B(S),
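$$\sigma'(S') = [T(x) \circ \sigma](S') = \int_G \int_I \sigma\big(\{s \in S \mid \tilde s(s, x(s, g), (\sigma \times \bar\gamma) \cdot (1_S, x)^{-1}, i) \in S'\}\big) \cdot \bar\iota(di) \cdot \bar\gamma(dg). \tag{26}$$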
An environment $\sigma \in P(S)$ is said to be associated with $x$ when
$$\sigma = T(x) \circ \sigma. \tag{27}$$
That is, we consider $\sigma \in P(S)$ to be associated with $x \in M(S \times G, X)$ when the former is invariant under the state transition facilitated by the $T(x)$ operator.
Suppose stationary policy $x \in M(S \times G, X)$ is associated with pre-action environment $\sigma \in P(S)$. For $t = 0, 1, \ldots$, we define $v_t(s, \sigma, x)$ as the total expected payoff a player can make from period 1 to $t$, when he starts period 1 with state $s \in S$ and outside environment $\sigma \in P(S)$, while all players keep on using policy $x$ from period 1 to $t$. From Appendix C, we can see that the sequence $\{v_t(s, \sigma, x) \mid t = 0, 1, \ldots\}$ has a limit point, say $v_\infty(s, \sigma, x)$. This $v_\infty(s, \sigma, x)$ can be understood as the infinite-horizon total discounted expected payoff a player can obtain by starting with state $s$ and environment $\sigma$, while all players adhere to the action plan $x$.
Given policy $y \in M(S \times G, X)$, we can define $v'_\infty(s, \sigma, x, y)$ as the total infinite-horizon expected payoff defined like the entity just introduced, the only difference being that, at the very beginning, the current player adopts policy $y$ instead of $x$. We leave recursive relationships involving these value functions to Appendix C as well. We deem a policy-environment pair $(x^*, \sigma^*) \in M(S \times G, X) \times P(S)$ a Markov equilibrium for the nonatomic game $\Gamma$ when (27) holds for $(x, \sigma) = (x^*, \sigma^*)$ and, for every $y \in M(S \times G, X)$,
$$\int_S v_\infty(s, \sigma^*, x^*) \cdot \sigma^*(ds) \ge \int_S v'_\infty(s, \sigma^*, x^*, y) \cdot \sigma^*(ds). \tag{28}$$
Therefore, a policy will be considered part of an equilibrium when it induces an invariant environment profile under which the policy forms a best response in the long run.
Now we move on to the $n$-player game $\Gamma_n$ with the same stationary features provided by $\tilde s$, $\tilde f$, and $\bar\phi$. Given policy profile $x = (x(s, g) \mid s \in S, g \in G) \in M(S \times G, X)$, action-inducing shock vector $g = (g_1, \ldots, g_n) \in G^n$, and transition-inducing shock vector $i = (i_1, \ldots, i_n) \in I^n$,
we define $T_n(x, g, i)$ as the operator on $P_n(S)$ that converts a period's pre-action environment into that of the next period. Thus, $\varepsilon_{s'} = T_n(x, g, i) \circ \varepsilon_s$ is such that
$$s'_m = \tilde s(s_m, x(s_m, g_m), \varepsilon_{s_{-m}, g_{-m}} \cdot (1_S, x)^{-1}, i_m), \quad \forall m = 1, 2, \ldots, n. \tag{29}$$
Let $v_{nt}(s_1, \varepsilon_{s_{-1}}, x)$ be the total expected payoff player 1 can make from period 1 to $t$, when the player's starting state is $s_1 \in S$, the other players' initial environment is described by their aggregate empirical state distribution $\varepsilon_{s_{-1}} = \varepsilon_{(s_2, \ldots, s_n)}$, and all players adopt the policy $x \in M(S \times G, X)$. From Appendix C, we can tell that the sequence $\{v_{nt}(s_1, \varepsilon_{s_{-1}}, x) \mid t = 0, 1, \ldots\}$ has a limit point, say $v_{n\infty}(s_1, \varepsilon_{s_{-1}}, x)$. Given policy $y \in M(S \times G, X)$, let $v'_{n\infty}(s_1, \varepsilon_{s_{-1}}, x, y)$ be the total infinite-horizon expected payoff defined like $v_{n\infty}(s_1, \varepsilon_{s_{-1}}, x)$, the only difference being that, at the very beginning, the player adopts policy $y$. We again leave recursive relationships involving these value functions to Appendix C.
We make the following assumptions, which are $t$-independent versions of (S1) to (F2):
(S1-s) For every $\mu \in P(S \times X)$, the function $\tilde s(\cdot, \cdot, \mu, \cdot)$ is a member of $K(S \times X, I, \bar\iota, S)$.
(S2-s) Not only does $\tilde s(s, x, \cdot, \cdot) \in K(P(S \times X), I, \bar\iota, S)$ hold at every $(s, x) \in S \times X$, but the continuity is also achieved at a rate independent of the $(s, x)$ present.
(F1-s) The function $\tilde f(s, x, \mu)$ is continuous in $(s, x)$.
(F2-s) The function $\tilde f(s, x, \mu)$ is continuous in $\mu$ at an $(s, x)$-independent rate.
There now follows the main result for the stationary case.
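Theorem 3 Suppose $(x^*, \sigma^*) \in K(S, G, \bar\gamma, X) \times P(S)$ is a probabilistically continuous Markov equilibrium for the nonatomic game $\Gamma$. Then, for any $\epsilon > 0$ and large enough $n \in N$, for any $y \in M(S \times G, X)$,
$$\int_{S^n} v_{n\infty}(s_1, \varepsilon_{s_{-1}}, x^*) \cdot (\sigma^*)^n(ds) \ge \int_{S^n} v'_{n\infty}(s_1, \varepsilon_{s_{-1}}, x^*, y) \cdot (\sigma^*)^n(ds) - \epsilon.$$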
Theorem 3 says that players in a large finite game will not regret much by continuing to adopt a stationary equilibrium of the corresponding nonatomic game. The regret is measured in an average sense, where the invariant state distribution used to compute the “average” is itself part of the NG equilibrium. So players can fare well by responding to their individual states in the same fashion all the way into the infinite future.
8
A Dynamic Pricing Game
We study a dynamic pricing game involving nonperishable items. Since the random demand arrival process is influenced by prices charged by all firms and leftover items are stored for future sales, the finite-player version of this problem is virtually intractable. The usefulness of the transient result, Theorem 2, is thus on full display. The stationary result, Theorem 3, can further be applied to the stationary case that also involves production.
8.1
The Setup
Here, players are firms engaged in price competition, while items unsold in a period are set aside for future sales. For some strictly positive integer $\bar s$, let $S = \{0\} \cup \{1, \ldots, \bar s\}$ be the set of potential inventory levels of individual firms. The model's action space $X$ is the set $[\underline x, \bar x] \cup \{\infty\}$ of potential prices, in which $\underline x$ and $\bar x$ are two constants satisfying $\bar x > \underline x \ge 0$. Firms with no stock left, and only those firms, will charge the demand-stopping price $\infty$. We use the discrete metric for $S$ and the Euclidean metric for $X$. A distribution $\sigma \in P(S)$ on the inventory level can be represented by a positive-valued vector $(\sigma_s \mid s = 1, \ldots, \bar s)$, such that each $\sigma_s$ for $s = 1, \ldots, \bar s$ represents the probability of there being $s$ items left and the value $\sigma_0 = 1 - \sum_{s=1}^{\bar s} \sigma_s \ge 0$ represents the probability of there being no stock.
Later, it will be clear that at any moment, all firms will charge at most $\bar s$ different prices within $[\underline x, \bar x]$, each corresponding to the price charged by firms with a certain inventory level. We can confine our consideration of price distributions to $P_0(X) \subset P(X)$, so that each $\xi \in P_0(X)$ is representable by a vector $(\xi_1, \ldots, \xi_{\bar s}, \xi_{\bar s+1}, \ldots, \xi_{2\bar s})$. In it, the first half $(\xi_1, \ldots, \xi_{\bar s})$ is a member of $[\underline x, \bar x]^{\bar s}$, while the second half $(\xi_{\bar s+1}, \ldots, \xi_{2\bar s})$ is positive-valued and satisfies $1 - \sum_{s=1}^{\bar s} \xi_{\bar s+s} \ge 0$. For $s = 1, \ldots, \bar s$, each $\xi_s$ in the first half may be understood as the price charged by firms with inventory level $s$ and each $\xi_{\bar s+s}$ in the second half as the fraction of these firms; also, the positive value $\xi_0 = 1 - \sum_{s=1}^{\bar s} \xi_{\bar s+s}$ represents the fraction of firms with zero inventory and hence, effectively, the $\infty$ price level.
We let the transition-inducing shock $i$ come from $I = [0, 1]$ and its distribution $\bar\iota$ be uniform. Let $\tilde\lambda(\cdot, \cdot)$ be a function from $[\underline x, \bar x] \times P_0(X)$ to $[0, 1]$. In a period, suppose a particular firm charges price $x \in [\underline x, \bar x]$, faces an outside price distribution $\xi \in P_0(X)$, and receives a shock $i \in I$. Then, it will experience a unit demand when $i \le \tilde\lambda(x, \xi)$ and no demand otherwise. This way, we have effectively let demand be distributed as a Bernoulli random variable with parameter $\tilde\lambda(x, \xi)$ when the firm charges price $x$ under environment $\xi$.
For the single-period payoff $\tilde f_t$ defined in (1), we let
$$\tilde f_t(s, x, \mu) = \tilde f(s, x, \mu|_X). \tag{30}$$
In the above, $\mu|_X$ stands for the marginal price distribution of the given joint inventory-price distribution $\mu$; also, $\tilde f: S \times X \times P_0(X) \to [0, \bar x]$ satisfies
$$\tilde f(s, x, \xi) = 1(s \ge 1) \cdot x \cdot \tilde\lambda(x, \xi), \tag{31}$$
where $1(\cdot)$ stands for the indicator function. This function $\tilde f$ reflects that the firm can earn $x \cdot \tilde\lambda(x, \xi)$ on average when it charges the price $x$ in a period when the outside environment is $\xi$ and it has some inventory left. For the state-transition map $\tilde s_t$ as defined through (2),
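we suppose
$$\tilde s_t(s, x, \mu, i) = \tilde s(s, x, \mu|_X, i), \tag{32}$$
where $\tilde s: S \times X \times P_0(X) \times I \to S$ satisfies
$$\tilde s(s, x, \xi, i) = 1(s \ge 1) \cdot [s - 1(i \le \tilde\lambda(x, \xi))]. \tag{33}$$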
The above (33) reflects that, if a firm starts a period with inventory level s ≥ 1 and outside environment ξ, charges price x, and experiences shock i in the period, then it will secure a unit demand as long as i ≤ ˜λ(x, ξ); also, the firm’s inventory will remain empty once it reaches the bottom level zero.
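To make (31) and (33) concrete, the following Python sketch encodes the single-period payoff and the inventory transition. It then checks by simulation that a uniform shock $i$ turns (33) into Bernoulli demand with parameter $\tilde\lambda(x, \xi)$. The function names and the value 0.3 for the arrival probability are hypothetical. The arrival probability is passed in as an already-evaluated number rather than as the function $\tilde\lambda$.

```python
import random


def expected_revenue(s: int, x: float, lam: float) -> float:
    """Single-period payoff (31): 1(s >= 1) * x * lambda(x, xi).

    `lam` is the arrival probability lambda(x, xi), already evaluated at the
    firm's price x and the outside price distribution xi.
    """
    return x * lam if s >= 1 else 0.0


def next_inventory(s: int, lam: float, i: float) -> int:
    """State transition (33): 1(s >= 1) * [s - 1(i <= lambda(x, xi))]."""
    if s == 0:
        return 0  # an empty firm stays empty (no production in this setup)
    return s - 1 if i <= lam else s


# Monte Carlo check that a uniform shock i turns (33) into Bernoulli(lam) demand.
rng = random.Random(0)
lam = 0.3
n_draws = 100_000
sales = sum(1 for _ in range(n_draws) if next_inventory(2, lam, rng.random()) == 1)
sale_freq = sales / n_draws  # should be close to lam = 0.3
print(sale_freq)
```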
8.2
Formulation of the NG
A nonatomic game is defined by an initial pre-action environment $\sigma_1 = (\sigma_{11}, \ldots, \sigma_{1\bar s}) \in P(S)$. It describes the initial inventory distribution of all firms. We shall see that only pure policies are needed for this game, so issues revolving around the space $G$ of action-inducing shocks are moot. As the action adopted by an out-of-stock firm has been fixed, $x = (x_{ts} \mid t = 1, \ldots, \bar t,\ s = 1, \ldots, \bar s) \in [\underline x, \bar x]^{\bar s \bar t}$ will suffice as a policy profile. Here, each $x_{ts}$ is the price prescribed to a firm with a starting inventory level $s$ in period $t$.
The policy $x$ along with the given initial environment $\sigma_1$ will help generate a deterministic in-action environment trajectory $\mu = (\mu_t \mid t = 1, \ldots, \bar t) \in (P(S \times X))^{\bar t}$. This sequence will in turn generate the price-distribution trajectory $\xi = (\xi_t \mid t = 1, \ldots, \bar t) \in (P(X))^{\bar t}$, where each $\xi_t = \mu_t|_X$. Indeed, it is guaranteed that $\xi \in (P_0(X))^{\bar t}$. Note that in period $t = 1, \ldots, \bar t$, the joint inventory-price distribution is given by $\mu_t = \sigma_t \cdot (1_S, x_t)^{-1}$, where $1_S$ has degenerated into the identity map on $S$. The distribution's price projection $\xi_t$ is then
$$\xi_t = \mu_t|_X = [\sigma_t \cdot (1_S, x_t)^{-1}]|_X = \sigma_t \cdot x_t^{-1}. \tag{34}$$
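The above can be represented by $\xi_t = (\xi_{t1}, \ldots, \xi_{t\bar s}, \xi_{t,\bar s+1}, \ldots, \xi_{t,2\bar s})$, where
$$\xi_{ts} = x_{ts}, \quad \xi_{t,\bar s+s} = \sigma_{ts}, \quad \forall s = 1, \ldots, \bar s. \tag{35}$$
So in $\xi_t$, the first half $(\xi_{t1}, \ldots, \xi_{t\bar s})$ contains information on prices being charged by all firms and the second half $(\xi_{t,\bar s+1}, \ldots, \xi_{t,2\bar s})$ spells out their individual fractions. From (33), we may obtain the one-period environment transition $\sigma_{t+1} = T_t(x_t) \circ \sigma_t$ for $t = 1, \ldots, \bar t - 1$ that corresponds to (10):
$$\begin{cases} \sigma_{t+1,s} = [1 - \tilde\lambda(x_{ts}, \xi_t)] \cdot \sigma_{ts} + \tilde\lambda(x_{t,s+1}, \xi_t) \cdot \sigma_{t,s+1}, & \forall s = 1, \ldots, \bar s - 1, \\ \sigma_{t+1,\bar s} = [1 - \tilde\lambda(x_{t\bar s}, \xi_t)] \cdot \sigma_{t\bar s}, \end{cases} \tag{36}$$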
where $\xi_t$ is given by (34). The above reflects that a firm will end up with inventory level $s$ at the end of period $t$ if either (i) it starts the period with inventory level $s$ and the period's shock does not unleash any demand, or (ii) it starts the period with inventory level $s + 1$ and the period's shock releases one unit of demand. Under the fixed $\sigma_1 \in P(S)$, the whole process of obtaining $\xi \in (P_0(X))^{\bar t}$ from a given $x \in [\underline x, \bar x]^{\bar s \bar t}$ through the iterative use of (34) and (36) may be denoted by $\xi = Z_{ED}(x \mid \sigma_1)$.
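The forward map $Z_{ED}$ can be sketched in Python as follows. The sketch assumes that $\xi_t$ is stored through its representation (35) as a pair (prices, fractions) and that the arrival rate is supplied as a callable. The name `z_ed`, the data layout, and the constant rate in the usage line are illustrative assumptions, not part of the paper.

```python
from typing import Callable, List, Sequence, Tuple

# xi_t is stored as (prices, fractions): prices[s-1] = xi_ts and
# fractions[s-1] = xi_{t, s_bar+s}, for inventory levels s = 1..s_bar.
PriceDist = Tuple[Tuple[float, ...], Tuple[float, ...]]


def z_ed(x: Sequence[Sequence[float]], sigma1: Sequence[float],
         lam: Callable[[float, PriceDist], float]) -> List[PriceDist]:
    """Forward pass xi = Z_ED(x | sigma_1) built from (35) and (36).

    x[t][s-1]   : price charged in period t+1 by firms holding s units
    sigma1[s-1] : initial fraction of firms holding s units (sigma_0 is implied)
    lam(p, xi)  : arrival probability of a firm pricing at p under xi
    """
    s_bar = len(sigma1)
    sigma = list(sigma1)
    xis: List[PriceDist] = []
    for t, x_t in enumerate(x):
        xi_t = (tuple(x_t), tuple(sigma))  # (35)
        xis.append(xi_t)
        if t == len(x) - 1:
            break  # (36) is only needed for t = 1, ..., t_bar - 1
        rate = [lam(x_t[k], xi_t) for k in range(s_bar)]  # rate[s-1] = lambda(x_ts, xi_t)
        sigma = [
            (1.0 - rate[k]) * sigma[k]                                 # stayed at level s
            + (rate[k + 1] * sigma[k + 1] if k + 1 < s_bar else 0.0)   # sold one unit from s+1
            for k in range(s_bar)
        ]
    return xis


# Usage with a hypothetical constant arrival rate of 1/2:
xis = z_ed([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [0.0, 0.0, 1.0], lambda p, xi: 0.5)
print(xis[1][1])  # (0.0, 0.5, 0.5)
```

Starting with every firm holding $\bar s = 3$ units, half of them sell one unit in period 1, which is exactly what the second line of (36) predicts.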
Suppose a price-distribution sequence $\xi = (\xi_t \mid t = 1, \ldots, \bar t) \in (P_0(X))^{\bar t}$ is given. Let $v^*_{ts}$ be the optimal total expected payoff a firm can make from time $t$ to $\bar t$ when it starts with inventory level $s = 0, 1, \ldots, \bar s$. As a terminal condition, we have
$$v^*_{\bar t+1,s} = 0, \quad \forall s = 0, 1, \ldots, \bar s. \tag{37}$$
For $t = \bar t, \bar t - 1, \ldots, 1$, we have, due to (31) and (33), $v^*_{t0} = 0$ and, for $s = 1, \ldots, \bar s$,
$$v^*_{ts} = \sup_{y \in [\underline x, \bar x]} \{\tilde\lambda(y, \xi_t) \cdot (y + v^*_{t+1,s-1}) + [1 - \tilde\lambda(y, \xi_t)] \cdot v^*_{t+1,s}\}. \tag{38}$$
Solving (38) yields a policy $x$ that corresponds to the given $\xi$. Here, the impact of an individual firm's decision on the environment's future evolution is negligible; however, the future evolution of the firm's own inventory level is part of the decision. We may denote the solution of this dynamic program by $x = Z_{DE}(\xi)$.
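A discretized version of the backward recursion (37)-(38) is sketched below. It only approximates $Z_{DE}$: the supremum over $y \in [\underline x, \bar x]$ is replaced by a maximum over a finite price grid, which is our assumption. The arrival rate is again a caller-supplied function, and `z_de` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def z_de(xis: Sequence[object], s_bar: int, price_grid: Sequence[float],
         lam: Callable[[float, object], float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Backward induction on (37)-(38), i.e. a discretized x = Z_DE(xi).

    The sup over y in [x_lo, x_hi] in (38) is replaced by a max over
    `price_grid`. Returns (policy, values) with policy[t][s-1] the chosen
    price at inventory s in period t+1 and values[t][s] = v*_{t+1,s}.
    """
    t_bar = len(xis)
    values = [[0.0] * (s_bar + 1) for _ in range(t_bar + 1)]  # (37); also v*_{t0} = 0
    policy = [[0.0] * s_bar for _ in range(t_bar)]
    for t in range(t_bar - 1, -1, -1):
        nxt = values[t + 1]
        for s in range(1, s_bar + 1):
            best_val, best_y = float("-inf"), price_grid[0]
            for y in price_grid:
                p = lam(y, xis[t])
                val = p * (y + nxt[s - 1]) + (1.0 - p) * nxt[s]  # bracket in (38)
                if val > best_val:
                    best_val, best_y = val, y
            values[t][s] = best_val
            policy[t][s - 1] = best_y
    return policy, values


# Usage: two periods, one unit, a hypothetical price-insensitive rate of 1/2.
policy, values = z_de([None, None], 1, [1.0, 2.0], lambda y, xi: 0.5)
print(policy, values[0])  # [[2.0], [2.0]] [0.0, 1.5]
```

With a price-insensitive rate the firm always charges the top grid price. The value $1.5$ combines $0.5 \cdot 2$ from selling now with $0.5 \cdot 1$ from carrying the unit into the last period, where its value is $v^*_{21} = 1$.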
8.3
Analysis and Application of Theorem 2
We do not have to follow the definition outlined from (15) to (18) to find a Markov equilibrium. Instead, as done above, we can first treat the decision-to-environment and environment-to-decision processes separately, and then fuse them together. It is easy to see that a fixed point of the operator $Z_{DD}(\sigma_1) = Z_{DE} \circ Z_{ED}(\sigma_1)$ is a Markov equilibrium in the sense of (19) and hence (18). Under reasonable conditions on the arrival-rate function $\tilde\lambda(\cdot, \cdot)$, we can establish the existence of equilibria for the nonatomic dynamic pricing game. Moreover, prices $x_{ts}$ that decrease in the inventory level $s$ will be shown to arise naturally in this problem. Hence, we are prompted to define $\bar\Delta \subset [\underline x, \bar x]^{\bar s}$, so that
$$\bar\Delta = \{(x_1, \ldots, x_{\bar s}) \in [\underline x, \bar x]^{\bar s} \mid x_1 \ge x_2 \ge \cdots \ge x_{\bar s}\}. \tag{39}$$
Rather than $[\underline x, \bar x]^{\bar s \bar t}$, we need only consider pricing policies within $\bar\Delta^{\bar t}$. Details have been
Proposition 5 Under a given $\sigma_1 \in P_0(S)$, there exists a fixed point $x^*(\sigma_1) \in \bar\Delta^{\bar t}$ for the operator $Z_{DD}(\sigma_1)$.
Further analysis in Appendix D.2 will confirm the applicability of our transient result.
Proposition 6 Assumptions (S1), (S2), (F1), and (F2) are all satisfied. Therefore, we can apply Theorem 2 to the NG equilibrium obtained in Proposition 5.
Thus, when their number is big enough, firms engaged in the dynamic pricing competition do not have to fret about their inability to observe the inventory levels of other firms. They can all adopt the same pricing policy as implied by an observation-blind equilibrium of the NG counterpart, and still expect to receive a total payoff that cannot on average be much improved by any one-time unilateral deviation. Note that the NG equilibrium may in turn be obtained from solving a fixed point problem, which, in practice, can most likely be solved through an iterative procedure.
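One possible iterative procedure, sketched here under stated assumptions, is a damped iteration $x \leftarrow (1-d)\,x + d\,Z_{DE}(Z_{ED}(x \mid \sigma_1))$. The instance below is hypothetical: three inventory levels, four periods, prices in $[1, 2]$, a grid of 51 prices, and an arrival rate of the form (40) with $a = 1.5$, $b = 0.5$, $c = 0.1$, reading the last bound in (40) as $\bar x$. The damping scheme is our choice, since the paper does not specify an algorithm. The loop is not guaranteed to converge, so the returned residual should be inspected.

```python
import math

# Hypothetical instance (not from the paper).
S_BAR, T_BAR, X_LO, X_HI = 3, 4, 1.0, 2.0
A, B, C = 1.5, 0.5, 0.1
GRID = [X_LO + k * (X_HI - X_LO) / 50 for k in range(51)]


def lam(p, xi):
    """Arrival rate of the form (40); xi = (prices, fractions)."""
    prices, fracs = xi
    spill = sum(f * (q - (1 + C) * X_HI) for q, f in zip(prices, fracs))
    return math.exp(A * (X_LO - p) + B * spill)


def z_ed(x, sigma1):
    """Forward pass (35)-(36): policy -> price distributions xi_1..xi_T."""
    sigma, xis = list(sigma1), []
    for t in range(T_BAR):
        xi = (tuple(x[t]), tuple(sigma))
        xis.append(xi)
        r = [lam(x[t][k], xi) for k in range(S_BAR)]
        sigma = [(1 - r[k]) * sigma[k] + (r[k + 1] * sigma[k + 1] if k + 1 < S_BAR else 0.0)
                 for k in range(S_BAR)]
    return xis


def z_de(xis):
    """Backward pass (37)-(38) on GRID: price distributions -> policy."""
    v_next, policy = [0.0] * (S_BAR + 1), [None] * T_BAR
    for t in reversed(range(T_BAR)):
        v_now, row = [0.0] * (S_BAR + 1), []
        for s in range(1, S_BAR + 1):
            val, y = max((lam(y, xis[t]) * (y + v_next[s - 1])
                          + (1 - lam(y, xis[t])) * v_next[s], y) for y in GRID)
            v_now[s] = val
            row.append(y)
        policy[t], v_next = row, v_now
    return policy


def solve(sigma1, damping=0.5, tol=1e-9, max_iter=300):
    """Damped iteration x <- (1 - d) x + d * Z_DE(Z_ED(x | sigma1)).

    Returns the last iterate and its fixed-point residual; a small residual
    means x is an approximate fixed point of Z_DD(sigma1) on GRID.
    """
    x = [[X_HI] * S_BAR for _ in range(T_BAR)]
    for _ in range(max_iter):
        target = z_de(z_ed(x, sigma1))
        gap = max(abs(p - q) for xt, yt in zip(x, target) for p, q in zip(xt, yt))
        if gap < tol:
            break
        x = [[(1 - damping) * p + damping * q for p, q in zip(xt, yt)]
             for xt, yt in zip(x, target)]
    return x, gap


x_star, residual = solve([0.2, 0.3, 0.5])
print(residual)
for row in x_star:
    print([round(p, 3) for p in row])
```

Damping keeps each iterate a convex combination of prices in $[\underline x, \bar x]$. If the grid best response keeps jumping between grid points, the residual stays bounded away from zero. In that case, a finer grid or a smaller damping factor are natural things to try.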
For an example whose legitimacy is confirmed in Appendix D.3, consider the following. Suppose $\underline x > 0$. For some constants $a > 1/\underline x$ and $b, c \ge 0$, we may suppose
$$\tilde\lambda(x, \xi) = \exp\Big[a \cdot (\underline x - x) + b \cdot \sum_{s=1}^{\bar s} \xi_{\bar s+s} \cdot (\xi_s - (1 + c) \cdot \bar x)\Big]. \tag{40}$$
This way, when a firm's own price is at its lowest level $\underline x$ and all other firms have sold out, rendering all finite-price probabilities $\xi_{\bar s+1} = \xi_{\bar s+2} = \cdots = \xi_{2\bar s} = 0$, the arrival rate will be at its maximum value of 1. Otherwise, the arrival rate is decreasing in the firm's own price and increasing, in a stochastic sense, in other firms' prices.
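The arrival rate (40) can be coded directly. The sketch below assumes that the price bounds in (40) read as $\underline x$ in the first term and $\bar x$ in the second, which is the reading under which $\tilde\lambda \le 1$. The parameter values are hypothetical and chosen to satisfy $\underline x > 0$, $a > 1/\underline x$, and $b, c \ge 0$. The printed checks mirror the three properties stated above.

```python
import math
from typing import Sequence


def arrival_rate(p: float, prices: Sequence[float], fracs: Sequence[float],
                 x_lo: float, x_hi: float, a: float, b: float, c: float) -> float:
    """Arrival probability (40), with xi = (prices, fracs):
    exp[a (x_lo - p) + b * sum_s fracs[s] * (prices[s] - (1 + c) x_hi)].
    """
    spill = sum(f * (q - (1.0 + c) * x_hi) for q, f in zip(prices, fracs))
    return math.exp(a * (x_lo - p) + b * spill)


# Hypothetical parameters satisfying x_lo > 0, a > 1/x_lo, and b, c >= 0.
params = dict(x_lo=1.0, x_hi=2.0, a=1.5, b=0.5, c=0.1)

print(arrival_rate(1.0, [1.5, 1.2], [0.0, 0.0], **params))  # 1.0: lowest price, rivals sold out
print(arrival_rate(1.5, [1.5, 1.2], [0.3, 0.4], **params) <
      arrival_rate(1.0, [1.5, 1.2], [0.3, 0.4], **params))  # True: decreasing in own price
print(arrival_rate(1.5, [1.5, 1.2], [0.3, 0.4], **params) <
      arrival_rate(1.5, [1.9, 1.2], [0.3, 0.4], **params))  # True: increasing in a rival's price
```

Both exponents are nonpositive on the admissible domain: $p \ge \underline x$, and each rival price is at most $\bar x < (1+c)\bar x$ when $c \ge 0$ (for $c = 0$ the term can reach zero but not exceed it). This is why the rate never exceeds 1.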
8.4
Production and Application of Theorem 3
We now add production to the pricing game and study its stationary version. At the cost of $\bar k$, we suppose a firm's inventory will be brought back to the full level $\bar s$ in one period's time when it drops down to 0. That is, the production lead time is one period and every firm exercises an $(s^*, S^*)$ production policy with $s^* = 0$ and $S^* = \bar s$. The situation with $s^* > 0$ can be handled similarly. As before, we use $S = \{0\} \cup \{1, \ldots, \bar s\}$ as the set of potential inventory levels and $X = [\underline x, \bar x] \cup \{\infty\}$ as the set of pricing actions. We adhere to the definition of $P_0(X) \subset P(X)$, the shock space $I = [0, 1]$, the uniform distribution $\bar\iota$, and the arrival rate $\tilde\lambda(\cdot, \cdot)$ as a function from $[\underline x, \bar x] \times P_0(X)$ to $[0, 1]$. We now assume a per-period inventory holding cost of $\bar h$ per unit.
For the single-period payoff, we let $\tilde f: S \times X \times P_0(X) \to \mathbb{R}$ satisfy
$$\tilde f(s, x, \xi) = 1(s \ge 1) \cdot x \cdot \tilde\lambda(x, \xi) - \bar h \cdot s - \bar k \cdot 1(s = 0). \tag{41}$$
The first term in the function remains the same as in (31). The term $\bar h \cdot s$ reflects the inventory holding cost and the term $\bar k \cdot 1(s = 0)$ reflects the total production cost needed to bring the inventory level back to $\bar s$ from 0. For convenience, we have let the inventory holding cost be assessed on the pre-selling inventory level. For state transition, we let $\tilde s: S \times X \times P_0(X) \times I \to S$ satisfy
$$\tilde s(s, x, \xi, i) = 1(s \ge 1) \cdot [s - 1(i \le \tilde\lambda(x, \xi))] + 1(s = 0) \cdot \bar s. \tag{42}$$
The first term means the same as in (33). The new term $1(s = 0) \cdot \bar s$ reflects the locked-in production policy of bringing the inventory level back to full in one period's time.
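Under a fixed stationary price vector, the production dynamics (42) define a mean-field Markov chain on $S$. An environment associated with the policy in the sense of (27) is then a fixed point of the induced $T(x)$. The sketch below simply iterates $\sigma \leftarrow T(x) \circ \sigma$. The function name, starting point, and tolerance are assumptions, and when $\tilde\lambda$ genuinely depends on $\xi$, plain iteration need not converge.

```python
from typing import Callable, List, Sequence


def stationary_environment(x: Sequence[float], lam: Callable[[float, tuple], float],
                           tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Iterate sigma <- T(x) o sigma under the production dynamics (42).

    x[s-1] is the price used at inventory s (s = 1..s_bar) and lam(p, xi) the
    arrival probability; sigma[s] is the fraction of firms at level s = 0..s_bar.
    A limit, if reached, is an environment associated with x in the sense of (27).
    """
    s_bar = len(x)
    sigma = [0.0] * s_bar + [1.0]  # start with every firm fully stocked
    for _ in range(max_iter):
        xi = (tuple(x), tuple(sigma[1:]))  # prices and fractions of stocked firms
        new = [0.0] * (s_bar + 1)
        new[s_bar] += sigma[0]  # empty firms are restocked to s_bar within a period
        for s in range(1, s_bar + 1):
            r = lam(x[s - 1], xi)
            new[s] += (1.0 - r) * sigma[s]  # no sale: stay at s
            new[s - 1] += r * sigma[s]      # one sale: move down to s - 1
        if max(abs(u - w) for u, w in zip(new, sigma)) < tol:
            return new
        sigma = new
    return sigma


# Usage with a hypothetical constant arrival rate of 1/2 and s_bar = 2.
sigma_star = stationary_environment([1.5, 1.2], lambda p, xi: 0.5)
print([round(v, 6) for v in sigma_star])  # [0.2, 0.4, 0.4]
```

With a constant rate of $1/2$ and $\bar s = 2$, the balance equations are $\sigma_0 = \sigma_1/2$, $\sigma_1 = \sigma_2$, and $\sigma_2 = \sigma_0 + \sigma_2/2$. They give $(0.2, 0.4, 0.4)$, which the iteration reproduces.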
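Now define
$$v = \frac{1}{1 - \bar\phi} \cdot \Big[\bar k \vee (\bar h \cdot \bar s) \vee \sup_{y \in [\underline x, \bar x],\ \xi \in P_0(X)} \{y \cdot \tilde\lambda(y, \xi)\}\Big]. \tag{43}$$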
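The role of $v$ can be seen from a short bound, which we add as a reading aid; it is not taken from Appendix D.4. For $s \ge 1$, $\tilde f(s, x, \xi)$ in (41) is the difference of the nonnegative numbers $x\tilde\lambda(x, \xi)$ and $\bar h s$, so its absolute value is at most the larger of the two. For $s = 0$, the indicator removes the revenue term and $\tilde f = -\bar k$. Discounting by $\bar\phi$ then gives
$$\begin{aligned}
&s \ge 1:\quad |\tilde f(s, x, \xi)| = |x\,\tilde\lambda(x, \xi) - \bar h\, s| \le \Big(\sup_{y, \xi'} y\,\tilde\lambda(y, \xi')\Big) \vee (\bar h\, \bar s), \qquad s = 0:\quad |\tilde f(0, x, \xi)| = \bar k, \\
&\Big|\sum_{t=1}^{\infty} \bar\phi^{\,t-1}\, \tilde f(s_t, x_t, \xi_t)\Big| \le \sum_{t=1}^{\infty} \bar\phi^{\,t-1} \Big[\bar k \vee (\bar h\, \bar s) \vee \sup_{y, \xi'} y\,\tilde\lambda(y, \xi')\Big] = v.
\end{aligned}$$
So every total discounted payoff lies in $[-v, v]$. Since $\tilde\lambda$ takes values in $[0, 1]$, the supremum in (43) is at most $\bar x$.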
Using analysis detailed in Appendix D.4, we show that an equilibrium for the pricing game with production can be achieved in the form of a fixed point $(x^*, \xi^*, \sigma^*, v^*) \in [\underline x, \bar x]^{\bar s} \times P_0(X) \times P(S) \times [-v, v]^{\bar s+1}$ of an operator ZDEF VDEF V.
Proposition 7 There exists a fixed point $(x^*, \xi^*, \sigma^*, v^*) \in [\underline x, \bar x]^{\bar s} \times P_0(X) \times P(S) \times [-v, v]^{\bar s+1}$ for the operator ZDEF VDEF V.
When production is allowed, we can learn from Proposition 7 about the existence of an NG equilibrium that prescribes for firms the dynamic pricing policy $x^*$. The latter would help the firms fare well in the long run under the environment $\sigma^*$, which is consistent with all these firms adopting the policy $x^*$. Due to the presence of production, we can no longer claim that lower inventory levels lead to higher prices. Here, unlike in Section 8.3, the equilibrium pricing policy $x^*$ is not known to come from $\bar\Delta$.
From more analysis done in Appendix D.5, we know that the stationary theory can be applied to the current situation.
Proposition 8 Assumptions (S1-s), (S2-s), (F1-s), and (F2-s) are all satisfied. Therefore, we can apply Theorem 3 to the NG equilibrium obtained in Proposition 7.
Proposition 8 further says that, when their number is finite but large enough, firms can still use the same pricing policy x∗ to fare relatively well in the long run in an average sense that is defined through the inventory distribution σ∗.
9
Concluding Remarks
Using the weak LLN for empirical distributions, we have established links between multi-period Markovian games and their NG counterparts. In essence, the evolution of player-state distributions in large finite games, though random, resembles in probability the deterministic pathway taken by their NG brethren. This allows NG equilibria to be well adapted to large finite games. In our derivation, transient results played pivotal roles in forming comparable conclusions for stationary systems. Our results have been successfully brought to bear on dynamic pricing situations of practical significance.
Still, many dynamic competitive situations are better described by continuous-time models, whose study requires vastly different techniques. For one thing, the mathematical induction approach we have taken to deal with multiple periods would not seem to go well with a discrete-time approximation of a continuous-time model. In the latter model, even identifying the environment induced by all players adopting a common policy might involve solving a fixed point problem. This problem therefore poses serious challenges and forms a promising area for future research.
Appendices
A
Technical Developments in Section 5
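Given separable metric space $A$, the Prohorov metric $\rho_A$ is such that, for any distributions $\pi, \pi' \in P(A)$,
$$\rho_A(\pi, \pi') = \inf\{\epsilon > 0 \mid \pi'((A')^\epsilon) + \epsilon \ge \pi(A'),\ \text{for all } A' \in B(A)\}, \tag{A.1}$$
where
$$(A')^\epsilon = \{a \in A \mid d_A(a, a') < \epsilon \text{ for some } a' \in A'\}. \tag{A.2}$$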
The metric ρA is known to generate the weak topology for P (A).
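On a finite set with the discrete metric, which is the metric used for $S$ in Section 8, the definition (A.1) becomes a finite computation. For $0 < \epsilon \le 1$, the neighbourhood $(A')^\epsilon$ in (A.2) is just $A'$, so $\rho_A(\pi, \pi')$ equals $\max_{A'}[\pi(A') - \pi'(A')]$, i.e. the total variation distance. The brute-force sketch below checks this numerically with exact fractions. The distributions are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def prohorov_discrete(p: Sequence[Fraction], q: Sequence[Fraction]) -> Fraction:
    """Prohorov distance (A.1) rho(p, q) on a finite set with the discrete metric.

    For 0 < eps <= 1 the eps-neighbourhood (A.2) of a set A' is A' itself, so
    the condition in (A.1) reads q(A') + eps >= p(A') for every subset A'; the
    infimum is therefore the max over A' of p(A') - q(A') (which is at most 1).
    """
    points = range(len(p))
    worst = Fraction(0)  # the empty subset contributes 0
    for k in range(len(p) + 1):
        for subset in combinations(points, k):
            gap = sum((p[j] - q[j] for j in subset), Fraction(0))
            worst = max(worst, gap)
    return worst


def total_variation(p: Sequence[Fraction], q: Sequence[Fraction]) -> Fraction:
    return sum((abs(u - v) for u, v in zip(p, q)), Fraction(0)) / 2


p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(prohorov_discrete(p, q), total_variation(p, q))  # 1/4 1/4
```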
Proof of Proposition 1: We prove (I) first. Let $\epsilon > 0$ be given. By the hypothesis, we know there is $\delta > 0$, so that for any $a, a' \in A$ with $d_A(a, a') < \delta$, for the set
we have
$$\pi_B(B') > 1 - \epsilon. \tag{A.4}$$
Now consider any $C' \in B(C)$. The key observation from (A.3) is that
$$(y(a, \cdot))^{-1}(C') \cap B' \subset (y(a', \cdot))^{-1}((C')^\epsilon). \tag{A.5}$$
Then,
$$\begin{aligned} \theta(a|C') &= \pi_B((y(a, \cdot))^{-1}(C') \cap B') + \pi_B((y(a, \cdot))^{-1}(C') \cap (B \setminus B')) \\ &\le \pi_B((y(a', \cdot))^{-1}((C')^\epsilon)) + \pi_B(B \setminus B') < \theta(a'|(C')^\epsilon) + \epsilon, \end{aligned} \tag{A.6}$$
where the only equality is an identity, the first inequality is due to (A.5), and the second inequality is due to (A.4). So by (A.1), we have $\rho_C(\theta(a), \theta(a')) < \epsilon$.
We prove (II) next. By $y$'s definition, we have
$$b \in [0, \theta(a|(-\infty, c])] \text{ if and only if } y(a, b) \le c. \tag{A.7}$$
Thus,
$$\pi_B(\{b \in B \mid y(a, b) \le c\}) = \pi_B([0, \theta(a|(-\infty, c])]) = \theta(a|(-\infty, c]), \tag{A.8}$$
where the first equality is from (A.7) and the second one is from the uniform nature of $\pi_B$. So indeed $\pi_B \cdot (y(a, \cdot))^{-1}$ is $\theta(a)$.
Let the sequence $a_n$ converge to $a \in A$. By $\theta(\cdot)$'s continuity at $a$, we know that $\pi_B \cdot (y(a_n, \cdot))^{-1}$ converges to $\pi_B \cdot (y(a, \cdot))^{-1}$ under the $\rho_C$-metric. Since $B$ is $[0, 1]$, $\pi_B$ is uniform, and $C$ is complete, we know from Skorohod's representation theorem that there exist functions $z_n \in M(B, C)$ and $z \in M(B, C)$ with $\pi_B \cdot z_n^{-1} = \pi_B \cdot (y(a_n, \cdot))^{-1}$ and $\pi_B \cdot z^{-1} = \pi_B \cdot (y(a, \cdot))^{-1}$ such that $z_n$ converges to $z$ almost surely.
According to Jouini, Schachermayer, and Touzi [15] (Lemma A.4), there is a one-to-one measure-preserving mapping $e: B \to B$ so that $y(a, b) = z(e(b))$. Note that $z_n(e(\cdot))$ converges to $z(e(\cdot)) = y(a, \cdot)$ almost surely as well. Without loss of generality, we suppose $e$ is the identity mapping on $B$. This entails that $z_n(\cdot)$, which has the same distribution as $y(a_n, \cdot)$, converges to the monotone $z(\cdot) = y(a, \cdot)$.
Let $\epsilon > 0$ be given. Because $y(a, \cdot)$ is continuous on the compact $B = [0, 1]$ at an $a$-independent rate, there exists $\delta \in (0, \epsilon]$, so that for $b, b' \in B$ with $|b - b'| < \delta$ and any $n \in N$, one has
$$|y(a_n, b) - y(a_n, b')| < \frac{\epsilon}{2}. \tag{A.9}$$
Due to $z_n$'s almost sure convergence to $z$, we also know that, for $n$ large enough,
where
$$B'_n = \Big\{b \in B \;\Big|\; |z_n(b) - z(b)| < \frac{\epsilon}{2}\Big\}. \tag{A.11}$$
There is a measure-preserving map $f_n$ on $B'_n$ so that $z_n(f_n(\cdot))$ is monotone on $B'_n$. Since $z(\cdot) = y(a, \cdot)$ is monotone, one key observation from (A.11) is that
$$|z_n(f_n(b)) - z(b)| < \frac{\epsilon}{2}, \quad \forall b \in B'_n. \tag{A.12}$$
Since $z_n(\cdot)$ has the same distribution as $y(a_n, \cdot)$, there is a measure-preserving map $g_n$ on $B$ so that $z_n(g_n(\cdot)) = y(a_n, \cdot)$. Due to $y(a_n, \cdot)$'s monotonicity, $z_n(g_n(\cdot))$ is monotone on the entire $B$.
By the monotonicity of $z_n(f_n(\cdot))$ on $B'_n$, that of $z_n(g_n(\cdot))$ on $B$, and (A.10), we reach another key observation that
$$|f_n(b) - g_n(b)| < \delta, \quad \forall b \in B'_n. \tag{A.13}$$
Now for $b \in B'_n$,
$$|y(a_n, b) - y(a, b)| = |z_n(g_n(b)) - z(b)| \le |z_n(g_n(b)) - z_n(f_n(b))| + |z_n(f_n(b)) - z(b)|. \tag{A.14}$$
By (A.9) and (A.13), the first term on the right-hand side is below $\epsilon/2$, whereas, by (A.12), the second term on the right-hand side is below $\epsilon/2$ as well. It is now clear that $y(a_n, \cdot)$ converges to $y(a, \cdot)$ in probability.
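Part (II) of the proof rests on the quantile construction behind (A.7)-(A.8). With $b$ uniform on $B = [0, 1]$, the map $b \mapsto \min\{c : \theta(a|(-\infty, c]) \ge b\}$ pushes $\pi_B$ forward to $\theta(a)$. A minimal sketch for a distribution with finite support on the real line follows. The support, the probabilities, and the helper name are hypothetical.

```python
import bisect
import random
from itertools import accumulate
from typing import Callable, Sequence


def quantile_map(support: Sequence[float], probs: Sequence[float]) -> Callable[[float], float]:
    """Return b -> y(b) = min{c in support : theta((-inf, c]) >= b} for b in [0, 1].

    `support` must be sorted increasingly; theta puts mass probs[k] on support[k].
    This y satisfies (A.7): y(b) <= c iff b <= theta((-inf, c]) (for b > 0).
    """
    cdf = list(accumulate(probs))

    def y(b: float) -> float:
        k = bisect.bisect_left(cdf, b)  # first k with cdf[k] >= b
        return support[min(k, len(support) - 1)]  # guard against cdf[-1] < 1 by rounding

    return y


y = quantile_map([1.0, 2.0, 3.0], [0.2, 0.5, 0.3])
print(y(0.1), y(0.5), y(0.9))  # 1.0 2.0 3.0

# With b uniform on [0, 1], the image pi_B . y^{-1} reproduces theta, as in (A.8).
rng = random.Random(0)
draws = [y(rng.random()) for _ in range(100_000)]
freq_two = draws.count(2.0) / len(draws)  # should be close to 0.5
```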
According to Parthasarathy [25] (Theorem II.7.1), the strong LLN applies to the empirical distribution under the weak topology, and hence under the Prohorov metric. In the following, we state its weak version.
Lemma 1 Given separable metric spaces $A$ and $B$, suppose we are given a distribution $\pi_A \in P(A)$ and a measurable mapping $y \in M(A, B)$. Then, for any $\epsilon > 0$, as long as $n$ is large enough,
$$(\pi_A)^n(\{a = (a_1, \ldots, a_n) \in A^n \mid \rho_B(\varepsilon_a \cdot y^{-1}, \pi_A \cdot y^{-1}) < \epsilon\}) > 1 - \epsilon.$$
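Lemma 1 can be visualized in the simplest case. Here $A$ is a finite set with the discrete metric and $y$ is the identity, so that $\rho_B$ reduces to the total variation distance. The sketch below uses a hypothetical three-point distribution. For growing $n$, it prints the distance between the empirical distribution of $n$ draws and the true distribution. It illustrates the convergence in probability but does not prove it.

```python
import random
from collections import Counter
from typing import Sequence


def tv_to_empirical(probs: Sequence[float], n: int, rng: random.Random) -> float:
    """Distance between the empirical distribution of n i.i.d. draws from probs
    and probs itself, on a finite set with the discrete metric (where the
    Prohorov metric equals total variation)."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = Counter(draws)
    return 0.5 * sum(abs(counts[k] / n - p) for k, p in enumerate(probs))


rng = random.Random(1)
probs = [0.2, 0.5, 0.3]
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(tv_to_empirical(probs, n, rng), 4))  # typically shrinks like 1/sqrt(n)
```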
For separable metric space $A$, point $a \in A$, and $(n-1)$-point empirical distribution $\pi \in P_{n-1}(A)$, we use $(a, \pi)_n$ to represent the member of $P_n(A)$ that has an additional $1/n$ weight on the point $a$, but with the probability masses in $\pi$ reduced to $(n-1)/n$ times their original values. For $a \in A^n$ and $m = 1, \ldots, n$, we have $(a_m, \varepsilon_{a_{-m}})_n = \varepsilon_a$. Concerning
Lemma 2 Let $A$ be a separable metric space. Then, for any $n = 2, 3, \ldots$, $a \in A$, and $\pi \in P_{n-1}(A)$,
$$\rho_A((a, \pi)_n, \pi) \le \frac{1}{n}.$$
Proof: Let $A' \in B(A)$ be chosen. If $a \notin A'$, then
$$(a, \pi)_n(A') \le \pi(A') \le (a, \pi)_n(A') + \frac{1}{n}; \tag{A.15}$$
if $a \in A'$, then
$$(a, \pi)_n(A') - \frac{1}{n} \le \pi(A') \le (a, \pi)_n(A'). \tag{A.16}$$
Hence, it is always true that
$$|(a, \pi)_n(A') - \pi(A')| \le \frac{1}{n}. \tag{A.17}$$
In view of (A.1) and (A.2), we have
$$\rho_A((a, \pi)_n, \pi) \le \frac{1}{n}. \tag{A.18}$$
We have thus completed the proof.
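The bound of Lemma 2 can be checked exactly on a finite set with the discrete metric. There, $\rho_A$ is total variation and $(a, \pi)_n$ is simply the empirical distribution after appending the point $a$ to the $n-1$ points behind $\pi$. The sketch below uses exact fractions and hypothetical random points.

```python
import random
from fractions import Fraction
from typing import List, Sequence, Tuple


def empirical(points: Sequence[int], size: int) -> List[Fraction]:
    """Empirical distribution of `points` on {0, ..., size - 1}."""
    n = len(points)
    return [Fraction(points.count(k), n) for k in range(size)]


def total_variation(p: Sequence[Fraction], q: Sequence[Fraction]) -> Fraction:
    return sum((abs(u - v) for u, v in zip(p, q)), Fraction(0)) / 2


def check_lemma2(n_max: int, size: int, seed: int) -> List[Tuple[int, Fraction, Fraction]]:
    """For n = 2..n_max, compare rho((a, pi)_n, pi) with the bound 1/n.

    pi is the empirical distribution of n - 1 random points, and (a, pi)_n is
    the empirical distribution after appending one more point a. On a finite
    set with the discrete metric, rho is total variation.
    """
    rng = random.Random(seed)
    rows = []
    for n in range(2, n_max + 1):
        others = [rng.randrange(size) for _ in range(n - 1)]
        a = rng.randrange(size)
        pi = empirical(others, size)
        joined = empirical(others + [a], size)  # (a, pi)_n
        rows.append((n, total_variation(joined, pi), Fraction(1, n)))
    return rows


rows = check_lemma2(n_max=30, size=4, seed=3)
print(all(dist <= bound for _, dist, bound in rows))  # True
```

The bound is tight. Appending a point $a$ that lies outside the support of $\pi$ gives distance exactly $1/n$, e.g. $n = 2$ with $\pi = \delta_0$ and $a = 1$.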
The following result is important for showing the near-trajectory evolution of aggregate environments in large multi-period games.
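Lemma 3 Given separable metric space $A$, as well as complete separable metric spaces $B$ and $C$, suppose $y_n \in M(A^n, B^n)$ for every $n \in N$, $\pi_A \in P(A)$, $\pi_B \in P(B)$, and $\pi_C \in P(C)$. If
$$(\pi_A)^n(\{a \in A^n \mid \rho_B(\varepsilon_{y_n(a)}, \pi_B) < \epsilon\}) > 1 - \epsilon,$$
for any $\epsilon > 0$ and any $n$ large enough, then
$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon_{y_n(a), c}, \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$
for any $\epsilon > 0$ and any $n$ large enough.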
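Proof: Suppose the sequence $\{\pi'_{B1}, \pi'_{B2}, \ldots\}$ weakly converges to the given probability measure $\pi_B$, and the sequence $\{\pi'_{C1}, \pi'_{C2}, \ldots\}$ weakly converges to the given probability measure $\pi_C$. We are to show that the sequence $\{\pi'_{B1} \times \pi'_{C1}, \pi'_{B2} \times \pi'_{C2}, \ldots\}$ weakly converges to $\pi_B \times \pi_C$.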
Let $F(B)$ denote the family of uniformly continuous real-valued functions on $B$ with bounded support. Let $F(C)$ be similarly defined for $C$. We certainly have
$$\begin{cases} \lim_{k \to +\infty} \int_B f(b) \cdot \pi'_{Bk}(db) = \int_B f(b) \cdot \pi_B(db), & \forall f \in F(B), \\ \lim_{k \to +\infty} \int_C f(c) \cdot \pi'_{Ck}(dc) = \int_C f(c) \cdot \pi_C(dc), & \forall f \in F(C). \end{cases} \tag{A.19}$$
Define $F$ so that
$$F = \{f \mid f(b, c) = f_B(b) \cdot f_C(c) \text{ for any } (b, c) \in B \times C, \text{ where } f_B \in F(B) \cup \{1\} \text{ and } f_C \in F(C) \cup \{1\}\}, \tag{A.20}$$
where $1$ stands for the function whose value is 1 everywhere. (A.19) and (A.20) readily lead to
$$\lim_{k \to +\infty} \int_{B \times C} f(b, c) \cdot (\pi'_{Bk} \times \pi'_{Ck})(d(b, c)) = \int_{B \times C} f(b, c) \cdot (\pi_B \times \pi_C)(d(b, c)), \quad \forall f \in F. \tag{A.21}$$
According to Ethier and Kurtz [8] (Proposition III.4.4), $F(B)$ and $F(C)$ happen to be convergence determining families for $P(B)$ and $P(C)$, respectively. As $B$ and $C$ are complete, Ethier and Kurtz ([8], Proposition III.4.6, whose proof involves Prohorov's Theorem, i.e., the equivalence between tightness and relative compactness of a collection of probability measures defined for complete separable metric spaces) further says that $F$ as defined through (A.20) is convergence determining for $P(B \times C)$. Therefore, we have the desired weak convergence by (A.21).
Let $\epsilon > 0$ be given. In view of the above product-measure convergence and the equivalence between the weak topology and that induced by the Prohorov metric, there must be $\delta_B > 0$ and $\delta_C > 0$, such that $\rho_B(\pi'_B, \pi_B) < \delta_B$ and $\rho_C(\pi'_C, \pi_C) < \delta_C$ will imply
$$(\rho_B \times \rho_C)(\pi'_B \times \pi'_C, \pi_B \times \pi_C) < \epsilon. \tag{A.22}$$
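By (A.1) and the given hypothesis, there is $\bar n_1 \in N$, so that for $n = \bar n_1, \bar n_1 + 1, \ldots$,
$$(\pi_A)^n(\tilde A_n) > 1 - \frac{\epsilon}{2}, \tag{A.23}$$
where $\tilde A_n$ contains all $a \in A^n$ such that
$$\rho_B(\varepsilon_{y_n(a)}, \pi_B) < \delta_B. \tag{A.24}$$
By (A.1) and Lemma 1, on the other hand, there is $\bar n_2 \in N$, so that for $n = \bar n_2, \bar n_2 + 1, \ldots$,
$$(\pi_C)^n(\tilde C_n) > 1 - \frac{\epsilon}{2}, \tag{A.25}$$
where $\tilde C_n$ contains all $c \in C^n$ such that
$$\rho_C(\varepsilon_c, \pi_C) < \delta_C. \tag{A.26}$$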
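For any $n = \bar n_1 \vee \bar n_2, \bar n_1 \vee \bar n_2 + 1, \ldots$, let $(a, c)$ be an arbitrary member of $\tilde A_n \times \tilde C_n$. We have from (A.22), (A.24), and (A.26) that
$$(\rho_B \times \rho_C)(\varepsilon_{y_n(a), c}, \pi_B \times \pi_C) < \epsilon. \tag{A.27}$$
Noting that the facilitating $(a, c)$ is but an arbitrary member of $\tilde A_n \times \tilde C_n$, we see that
$$(\pi_A \times \pi_C)^n(\{(a, c) \in (A \times C)^n \mid \rho_{B \times C}(\varepsilon_{y_n(a), c}, \pi_B \times \pi_C) < \epsilon\}) \ge (\pi_A)^n(\tilde A_n) \times (\pi_C)^n(\tilde C_n), \tag{A.28}$$
which, by (A.23) and (A.25), is greater than $1 - \epsilon$.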
Because the equivalence between tightness and relative compactness of a collection of probability measures is indirectly related to the proof of Lemma 3, we require B and C to be complete separable metric spaces.
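Lemma 4 Given separable metric spaces $A$, $B$, $C$, and $D$, as well as distributions $\pi_A \in P(A)$, $\pi_B \in P(B)$, and $\pi_C \in P(C)$, suppose $y_n \in M(A^n, B^n)$ for every $n \in N$ and $z \in K(B, C, \pi_C, D)$. If
$$(\pi_A \times \pi_C)^n(\{a \in A^n, c \in C^n \mid \rho_{B \times C}(\varepsilon_{y_n(a), c}, \pi_B \times \pi_C) < \epsilon\}) > 1 - \epsilon,$$
for any $\epsilon > 0$ and any $n$ large enough, then
$$(\pi_A \times \pi_C)^n(\{a \in A^n, c \in C^n \mid \rho_D(\varepsilon_{y_n(a), c} \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}) < \epsilon\}) > 1 - \epsilon,$$
for any $\epsilon > 0$ and any $n$ large enough.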
Proof: Let $\epsilon > 0$ be given. Since $z \in K(B, C, \pi_C, D)$, there exist $C' \in B(C)$ satisfying
$$\pi_C(C') > 1 - \frac{\epsilon}{2}, \tag{A.29}$$
as well as
$$\delta \in (0, \epsilon/2], \tag{A.30}$$
such that for any $b, b' \in B$ satisfying $d_B(b, b') < \delta$ and any $c \in C'$,
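For any subset $D'$ in $B(D)$, we therefore have
$$(z^{-1}(D'))^\delta \cap (B \times C') \subset z^{-1}((D')^\epsilon). \tag{A.32}$$
This leads to $(z^{-1}(D'))^\delta \setminus (B \times (C \setminus C')) \subset z^{-1}((D')^\epsilon)$, and hence, due to (A.29),
$$(\pi_B \times \pi_C)(z^{-1}((D')^\epsilon)) \ge (\pi_B \times \pi_C)((z^{-1}(D'))^\delta) - \frac{\epsilon}{2}. \tag{A.33}$$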
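On the other hand, by the hypothesis, we know that for $n$ large enough,
$$(\pi_A \times \pi_C)^n(E'_n) > 1 - \delta, \tag{A.34}$$
where
$$E'_n = \{a \in A^n, c \in C^n \mid \rho_{B \times C}(\varepsilon_{y_n(a), c}, \pi_B \times \pi_C) < \delta\} \in B^n(A \times C). \tag{A.35}$$
By (A.35), for any $(a, c) \in E'_n$ and $F' \in B(B \times C)$,
$$(\pi_B \times \pi_C)((F')^\delta) \ge \varepsilon_{y_n(a), c}(F') - \delta. \tag{A.36}$$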
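Combining the above, we have, for any $(a, c) \in E'_n$ and $D' \in B(D)$,
$$\begin{aligned} [(\pi_B \times \pi_C) \cdot z^{-1}]((D')^\epsilon) &= (\pi_B \times \pi_C)(z^{-1}((D')^\epsilon)) \ge (\pi_B \times \pi_C)((z^{-1}(D'))^\delta) - \epsilon/2 \\ &\ge \varepsilon_{y_n(a), c}(z^{-1}(D')) - \delta - \epsilon/2 \ge \varepsilon_{y_n(a), c}(z^{-1}(D')) - \epsilon = (\varepsilon_{y_n(a), c} \cdot z^{-1})(D') - \epsilon, \end{aligned} \tag{A.37}$$
where the first inequality is due to (A.33), the second inequality is due to (A.36), and the third inequality is due to (A.30). That is, we have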
$$\rho_D(\varepsilon_{y_n(a), c} \cdot z^{-1}, (\pi_B \times \pi_C) \cdot z^{-1}) \le \epsilon, \quad \forall (a, c) \in E'_n. \tag{A.38}$$
In view of (A.30) and (A.34), we have the desired result.
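To see what Lemmas 3 and 4 deliver in the pricing game of Section 8, consider one transition with the environment frozen, as in the map $z$ of (A.39)-(A.41) used next. A firm at level $s \ge 1$ sells a unit when its uniform shock falls below a fixed sale probability. The sketch below compares the exact nonatomic transition with the empirical transition of $n$ sampled firms. The environment and sale probabilities are hypothetical. The action-inducing shock $g$ is dropped because only pure policies are needed there.

```python
import random
from collections import Counter
from typing import List, Sequence


def ng_transition(sigma: Sequence[float], rates: Sequence[float]) -> List[float]:
    """(sigma x iota) . z^{-1} for a fixed environment, as in (A.41): a firm at
    level s >= 1 sells one unit with probability rates[s]; level 0 is absorbing."""
    out = [0.0] * len(sigma)
    out[0] += sigma[0]
    for s in range(1, len(sigma)):
        out[s] += (1.0 - rates[s]) * sigma[s]
        out[s - 1] += rates[s] * sigma[s]
    return out


def empirical_transition(sigma: Sequence[float], rates: Sequence[float],
                         n: int, rng: random.Random) -> List[float]:
    """eps_{s,i} . z^{-1}: draw n states from sigma and n uniform shocks, apply z."""
    states = rng.choices(range(len(sigma)), weights=sigma, k=n)
    nxt = [s - 1 if s >= 1 and rng.random() <= rates[s] else s for s in states]
    counts = Counter(nxt)
    return [counts[s] / n for s in range(len(sigma))]


sigma = [0.1, 0.3, 0.6]   # hypothetical environment on S = {0, 1, 2}
rates = [0.0, 0.4, 0.5]   # hypothetical sale probabilities; rates[0] is unused
exact = ng_transition(sigma, rates)
approx = empirical_transition(sigma, rates, 200_000, random.Random(0))
print([round(v, 3) for v in exact], [round(v, 3) for v in approx])
```

The two vectors printed agree up to sampling noise of order $n^{-1/2}$. The remaining step in the proof of Proposition 2 replaces the frozen $\sigma \times \bar\gamma$ with the empirical environment of the other players.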
We can now prove Proposition 2 and then Theorem 1.
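Proof of Proposition 2: Let $t = 1, \ldots, \bar t - 1$ and $x \in K(S, G, \bar\gamma, X)$ be given. Define map $z \in M(S \times G \times I, S)$, so that
$$z(s, g, i) = \tilde s_t(s, x(s, g), (\sigma \times \bar\gamma) \cdot (1_S, x)^{-1}, i), \quad \forall s \in S, g \in G, i \in I. \tag{A.39}$$
In view of (10) and (A.39), we have, for any $S' \in B(S)$,
$$\begin{aligned} [T_t(x) \circ \sigma](S') &= \int_I \int_G \sigma(\{s \in S \mid z(s, g, i) \in S'\}) \cdot \bar\gamma(dg) \cdot \bar\iota(di) \\ &= (\sigma \times \bar\gamma \times \bar\iota)(\{(s, g, i) \in S \times G \times I \mid z(s, g, i) \in S'\}) = (\sigma \times \bar\gamma \times \bar\iota)(z^{-1}(S')). \end{aligned} \tag{A.40}$$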
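For $n \in N$, $g = (g_1, \ldots, g_n) \in G^n$, and $i = (i_1, \ldots, i_n) \in I^n$, also define operator $T'_n(g, i)$ on $P_n(S)$ so that $T'_n(g, i) \circ \varepsilon_s = \varepsilon_{s'}$, where for $m = 1, 2, \ldots, n$,
$$s'_m = z(s_m, g_m, i_m) = \tilde s_t(s_m, x(s_m, g_m), (\sigma \times \bar\gamma) \cdot (1_S, x)^{-1}, i_m). \tag{A.41}$$
It is worth noting that (A.41) is different from the earlier (13). In view of (A.39) and (A.41), we have, for $S' \in B(S)$,
$$[T'_n(g, i) \circ \varepsilon_s](S') = \frac{1}{n} \cdot \sum_{m=1}^n 1(z(s_m, g_m, i_m) \in S') = \varepsilon_{((s_1, g_1, i_1), \ldots, (s_n, g_n, i_n))}(z^{-1}(S')). \tag{A.42}$$
Combining (A.40) and (A.42), we arrive at a key observation that
$$T_t(x) \circ \sigma = (\sigma \times \bar\gamma \times \bar\iota) \cdot z^{-1}, \quad \text{while} \quad T'_n(g, i) \circ \varepsilon_s = \varepsilon_{s, g, i} \cdot z^{-1}. \tag{A.43}$$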
In the rest of the proof, we first show the asymptotic closeness between $T_t(x) \circ \sigma$ and $T'_n(g, i) \circ \varepsilon_{s_n(a)}$, and then that between the latter and $T_{nt}(x, g, i) \circ \varepsilon_{s_n(a)}$.
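First, due to the hypothesis on the convergence of $\varepsilon_{s_n(a)}$ to $\sigma$ and Lemma 3, we have
$$(\pi \times \bar\gamma \times \bar\iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_{S \times G \times I}(\varepsilon_{s_n(a), g, i}, \sigma \times \bar\gamma \times \bar\iota) < \epsilon'\}) > 1 - \epsilon', \tag{A.44}$$
for any $\epsilon' > 0$ and any $n$ large enough. By (S1) and the fact that $x \in K(S, G, \bar\gamma, X)$, we may see that $z$ as defined through (A.39) is a member of $K(S, G \times I, \bar\gamma \times \bar\iota, S)$. By Lemma 4, this fact along with (A.44) will lead to
$$(\pi \times \bar\gamma \times \bar\iota)^n(\{(a, g, i) \in (A \times G \times I)^n \mid \rho_S(\varepsilon_{s_n(a), g, i} \cdot z^{-1}, (\sigma \times \bar\gamma \times \bar\iota) \cdot z^{-1}) < \epsilon'\}) > 1 - \epsilon', \tag{A.45}$$
for any $\epsilon' > 0$ and any $n$ large enough. By (A.43), this is equivalent to the following: given $\epsilon > 0$, there exists $\bar n_1 \in N$ so that for any $n = \bar n_1, \bar n_1 + 1, \ldots$,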
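$$(\pi \times \bar\gamma \times \bar\iota)^n(\tilde A_n(\epsilon)) > 1 - \frac{\epsilon}{2}, \tag{A.46}$$
where
$$\tilde A_n(\epsilon) = \Big\{(a, g, i) \in (A \times G \times I)^n \;\Big|\; \rho_S(T_t(x) \circ \sigma, T'_n(g, i) \circ \varepsilon_{s_n(a)}) < \frac{\epsilon}{2}\Big\} \in B^n(A \times G \times I). \tag{A.47}$$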
Next, note that the only difference between $T_{nt}(x, g, i) \circ \varepsilon_{s_n(a)}$ and $T'_n(g, i) \circ \varepsilon_{s_n(a)}$ lies in that $\varepsilon_{s_{n,-m}(a), g_{-m}}$ is used in the former as in (13), whereas $\sigma \times \bar\gamma$ is used in the latter as in (A.41). Here, $s_{n,-m}(a)$ refers to the vector $(s_{n1}(a), \ldots, s_{n,m-1}(a), s_{n,m+1}(a), \ldots, s_{nn}(a))$. By
(S2), there is δ ∈ (0, /4] and I0 ∈ B(I) with ¯
ι(I0) > 1 −