1.3 Roadmap and Contributions
1.3.1 Rational behavior models
Rational algorithms
The main goal of the rational algorithms thrust is to develop algorithms with which agents compute stage BNE strategies and propagate beliefs in BNG. In this thrust, we consider a specific class of Bayesian network games called Gaussian quadratic network games. In this class, at the start of the game each agent makes a private observation of the unknown parameter corrupted by additive Gaussian noise. In addition, the payoffs of individuals are represented by a utility function that is quadratic in the actions of all agents and an unknown state of the world. That is, at any time $t$, the selection of actions $\{a_i := a_{i,t} \in \mathbb{R}\}_{i \in N}$ when the state of the world is $\theta \in \mathbb{R}$ results in agent $i$ receiving the payoff

$$u_i(a_i, a_{-i}, \theta) = -\frac{1}{2} \sum_{j \in N} a_j^2 + \sum_{j \in N \setminus i} \beta_{ij} a_i a_j + \delta a_i \theta, \tag{1.17}$$

where $\beta_{ij}$ and $\delta$ are real-valued constants. The constant $\beta_{ij}$ measures the effect of $j$'s action on $i$'s utility. For convenience we let $\beta_{ii} = 0$ for all $i \in N$. Other terms that depend on $a_j$ for $j \in N \setminus i$ or $\theta$ can be added.

Figure 1.3: Quadratic Network Game (QNG) filter. Agents run the QNG filter to compute BNE actions in games with quadratic payoffs and Gaussian private signals.
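As a minimal sketch of how the payoff (1.17) turns into a decision rule, the code below evaluates the payoff and the maximizer of its expectation in $a_i$. Setting the derivative with respect to $a_i$ to zero gives $a_i = \sum_{j \in N \setminus i} \beta_{ij} E_i[a_j] + \delta E_i[\theta]$, because (1.17) is concave in $a_i$ and linear in $a_j$ and $\theta$ wherever $a_i$ appears. The function names, the two-agent weights and the numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np


def quadratic_payoff(i, a, beta, delta, theta):
    """Payoff (1.17) of agent i for action profile a and state theta."""
    a = np.asarray(a, dtype=float)
    beta = np.asarray(beta, dtype=float)
    cost = -0.5 * np.sum(a ** 2)             # -(1/2) * sum_j a_j^2
    interaction = a[i] * np.dot(beta[i], a)  # sum_{j != i} beta_ij a_i a_j (beta_ii = 0)
    return float(cost + interaction + delta * a[i] * theta)


def best_response(i, expected_a, beta, delta, expected_theta):
    """Maximizer of the expected payoff in a_i (first-order condition of (1.17))."""
    beta = np.asarray(beta, dtype=float)
    return float(np.dot(beta[i], expected_a) + delta * expected_theta)


# Hypothetical two-agent example: agent 0 expects agent 1 to play 2.0.
beta = np.array([[0.0, 0.5],
                 [0.5, 0.0]])
delta, theta = 1.0, 2.0
a_star_0 = best_response(0, [0.0, 2.0], beta, delta, theta)
```

Here the first entry of the expected action vector is irrelevant for agent 0 because $\beta_{00} = 0$.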
Rational behavior requires a delicate consistency of rationality among individuals: the model that an individual has of the society must be correct, and the model that the society has of the individual must be correct as well. The concern is whether decision-makers have the depth of understanding required to optimize their behavior with respect to their anticipation of the behavior of others. This entails evaluating the expected behavior of all other individuals over all possible societies given local information, as per (1.8) or (1.9). Evaluating this expectation requires a high level of astuteness, as an individual has to consider the society not only from its own viewpoint but also from the viewpoint of every other individual. In particular, given its uncertainty about the information of others, the individual needs to reason about which societies each other individual considers possible, as demonstrated in the example in Section 1.2.1. Our goal in specializing to Gaussian quadratic network games is to use the linearity enabled by Gaussian expectations and quadratic payoffs to overcome the burden of computing equilibrium behavior. We detail the derivation and specifics of the algorithm in Chapter 2. Below we provide some intuition.
To determine a mechanism for calculating equilibrium actions, we introduce an outside clairvoyant observer that knows all private observations. For this clairvoyant observer the trajectory of the game is completely determined, but individual agents operate by forming a belief on the private signals of other agents. We start from the assumption that this probability distribution is normal with an expectation that, from the perspective of the outside observer, can be written as a linear combination of the actual private signals. If such is the case, we can prove that there exists a set of linear equations that can be solved to obtain actions that are linear combinations of estimates of private signals. This result is then used to show that, after observing the actions of their respective adjacent peers, the beliefs of all agents on private signals remain Gaussian with expectations that are still linear combinations of the actual private signals. We can then close a complete induction loop to derive a recursive expression that the outside clairvoyant observer can use to compute BNE actions for all game stages. We leverage this recursion to derive the Quadratic Network Game (QNG) filter that agents can run locally, i.e., without access to all private signals, to compute their equilibrium actions. A schematic representation of the QNG filter is shown in Fig. 1.3 to emphasize the parallelism with the Kalman filter. The difference is in the computation of the filter coefficients, which requires solving a system of linear equations that incorporates the belief on the actions of others.
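The "set of linear equations" step can be illustrated for the first stage only, under assumptions that are not in the text: an improper flat prior on $\theta$, so that $E_i[\theta] = s_i$ and $E_i[s_j] = s_i$ before any action is observed, and a weight matrix for which $I - \beta$ is invertible. With linear strategies $a_i = k_i s_i$, the first-order condition of (1.17) reduces to $(I - \beta) k = \delta \mathbf{1}$. This is a sketch of that single stage, not the QNG filter derived in Chapter 2; the three-agent weights and all names are hypothetical.

```python
import numpy as np


def first_stage_coefficients(beta, delta):
    """Solve (I - beta) k = delta * 1 for linear strategies a_i = k_i * s_i.

    Assumes a flat prior on theta (E_i[theta] = E_i[s_j] = s_i) and that
    I - beta is invertible; both are illustrative assumptions.
    """
    n = beta.shape[0]
    return np.linalg.solve(np.eye(n) - beta, delta * np.ones(n))


# Hypothetical three-agent example with symmetric payoff weights.
beta = np.array([[0.0, 0.2, 0.0],
                 [0.2, 0.0, 0.2],
                 [0.0, 0.2, 0.0]])
delta = 1.0
k = first_stage_coefficients(beta, delta)

# Private signals: the state theta corrupted by additive Gaussian noise.
rng = np.random.default_rng(0)
theta = 1.5
signals = theta + rng.normal(scale=0.5, size=3)
actions = k * signals

# Each agent's best response to its own beliefs, E_i[a_j] = k_j * s_i.
best_responses = (beta @ k) * signals + delta * signals
```

In this sketch `best_responses` coincides with `actions`, which is the fixed-point property that defines the stage equilibrium. Later stages additionally require the belief propagation described above.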
Asymptotic analysis
In this thrust, our goal is to answer the question ‘what is the eventual outcome of MPBE behavior in networked interactions?’. As per the interactive decision-making environment model presented above, individuals receive private signals $s_{i,t}$ and exchange messages $m_{i,t}$. They use this information to better infer the actions of others and the unknown state parameters. Since individuals are all rational, how others process information is known. We can then interpret an individual's goal as the eventual learning of its peers' information; that is, agents play against uncertainty. An important question pertaining to the eventual outcome of the game is therefore whether this information is learned or not. The answer depends on what messages are exchanged among individuals and on the type of the game, i.e., the payoffs. For instance, in the simple example considered in Section 1.2.1, where agents only observe the actions of their neighbors, i.e., $m_{i,t} = a_{i,t}$, and the payoff is given by (1.10), agents eventually correctly learn each other's actions and play a consensus action, while they do not necessarily have the same estimate of the state $\theta$.
Our focus in this thrust is on the class of symmetric and strictly supermodular games. In supermodular games, agents' actions are strategic complements; that is, agents have an incentive to increase their actions upon observing an increase in others' actions. For a twice differentiable utility function $u_i(a_i, a_{-i}, \theta)$, this is equivalent to requiring that $\partial^2 u_i / \partial a_i \partial a_j > 0$ for all $i \neq j$. Supermodular games are suitable for modeling coordinated movement toward a target among a team of autonomous robots or power control in wireless networks; see Chapter 4 for more examples. We remark that the target coverage example in Section 1.1 is not a supermodular game. In that example agents' actions are strategic substitutes; that is, the choice of a target by one agent decreases another's incentive to pick the same target. We assume agents only observe the actions of their neighbors, $m_{i,t} = a_{i,t}$. Our analysis shows that, given a connected network, rational behavior yields asymptotic convergence of the actions of all agents to the same value. This consensus implies that agents' eventual payoffs are identical. Our analysis leverages the rational behavior definition (Definition 1.1) to first prove that each agent's action asymptotically converges to a limit action, and then uses the definition of supermodularity to argue that this limit cannot differ across agents. This result suggests that in a coordination game, where agents' interests are aligned, repeated interactions between autonomous agents who are selfish and myopic could eventually lead them to coordinate on the same action. We provide the details in Chapter 4 and discuss further implications of these results.
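As an illustration of the cross-derivative condition for supermodularity stated above, it can be applied to the quadratic payoff (1.17). This is offered only as a worked example of the condition; the class of games analyzed in Chapter 4 is defined there.

```latex
\frac{\partial u_i}{\partial a_i}
  = -a_i + \sum_{j \in N \setminus i} \beta_{ij} a_j + \delta \theta,
\qquad
\frac{\partial^2 u_i}{\partial a_i \, \partial a_j} = \beta_{ij}
\quad \text{for } j \neq i .
```

Under this reading, a payoff of the form (1.17) satisfies the strict condition exactly when $\beta_{ij} > 0$ for all $j \neq i$, and negative weights correspond to strategic substitutes.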