Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

(1)

INFORMS is located in Maryland, USA

Mathematics of Operations Research

Publication details, including instructions for authors and subscription information:

http://pubsonline.informs.org

Partially Observable Total-Cost Markov Decision Processes

with Weakly Continuous Transition Probabilities

Eugene A. Feinberg, Pavlo O. Kasyanov, Michael Z. Zgurovsky

To cite this article:

Eugene A. Feinberg, Pavlo O. Kasyanov, Michael Z. Zgurovsky (2016) Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities. Mathematics of Operations Research 41(2):656-681. http:// dx.doi.org/10.1287/moor.2015.0746

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service.

Please scroll down for article—it is on subsequent pages

INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics.

(2)

Vol. 41, No. 2, May 2016, pp. 656–681

ISSN 0364-765X (print) ISSN 1526-5471 (online)

Partially Observable Total-Cost Markov Decision Processes with

Weakly Continuous Transition Probabilities

Eugene A. Feinberg

Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York 11794, [email protected]

Pavlo O. Kasyanov

Institute for Applied System Analysis, National Technical University of Ukraine “Kyiv Polytechnic Institute,” 03056, Kyiv, Ukraine,[email protected]

Michael Z. Zgurovsky

National Technical University of Ukraine “Kyiv Polytechnic Institute,” 03056, Kyiv, Ukraine, [email protected]

This paper describes sufficient conditions for the existence of optimal policies for partially observable Markov decision processes (POMDPs) with Borel state, observation, and action sets, when the goal is to minimize the expected total costs over finite or infinite horizons. For infinite-horizon problems, one-step costs are either discounted or assumed to be nonnegative. Action sets may be noncompact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semicontinuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to completely observable Markov decision processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities for the COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov decision process are weakly continuous and observation probabilities for the POMDP are continuous in total variation. Moreover, the continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.

Keywords : partially observable Markov decision processes; total cost; optimality inequality; optimal policy MSC2000 subject classification : Primary: 90C40; secondary: 90C39

OR/MS subject classification : Primary: dynamic programming/optimal control; secondary: Markov, infinite state History : Received January 10, 2014; revised May 1, 2015. Published online in Articles in Advance January 22, 2016.

1. Introduction. Partially observable Markov decision processes (POMDPs) play an important role in operations research, electrical engineering, and computer science. They have a broad range of applications to various areas including sensor networks, artificial intelligence, target tracking, control and maintenance of complex systems, finance, and medical decision making. In principle, it is known how to solve POMDPs. A POMDP can be reduced to a completely observable Markov decision process (COMDP), which is a fully observable Markov decision process (MDP) whose states are belief (posterior state) probabilities for the POMDP; see Hinderer [23, §7.1] and Sawarigi and Yoshikawa [29] for countable state spaces and Rhenius [26], Yushkevich [36], Dynkin and Yushkevich [12, Chapter 8], Bertsekas and Shreve [8, Chapter 10], and Hernández-Lerma [20, Chapter 4] for Borel state spaces. After an optimal policy for the COMDP is found, it can be used to compute an optimal policy for the POMDP. However, except for finite state and action POMDPs (Sondik [33]), problems with a continuous filtering transition probability H described in Equation (6) (Hernández-Lerma [20, Chapter 4], Hernández-Lerma and Romera [22]), and a large variety of particular problems considered in the literature, little is known regarding the existence and properties of optimal policies for COMDPs and POMDPs.

This paper investigates the existence of optimal policies for COMDPs and therefore for POMDPs with the expected total discounted costs and, if the one-step costs are nonnegative, with the expected total costs. We provide conditions for the existence of optimal policies and for the validity of other properties of optimal values and optimal policies: they satisfy optimality equations, optimal values are lower semicontinuous functions, and value iterations converge to optimal infinite-horizon values.

Since a COMDP is an MDP with Borel state and action sets, it is natural to apply results on the existence of optimal policies for MDPs to COMDPs. Feinberg et al. [14] introduced a mild assumption, called Assumption(W∗₎_,

for the existence of stationary optimal policies for infinite-horizon MDPs, lower semicontinuity of value functions, characterization of the sets of optimal actions via optimality equations, and convergence of value iterations to optimal values for the expected total discounted costs, if one-step costs are bounded below, and for the expected total costs, if the one-step costs are nonnegative (according to the main result in Feinberg et al. [14], if another mild assumption is added to Assumption(W∗₎_{), then there exist stationary optimal policies for average costs per}

656

(3)

unit time). Assumption(W∗₎_{consists of two conditions: transition probabilities are weakly continuous and one-step}

cost functions are-inf-compact. The notion of -inf-compactness (see the definition below) was introduced in Feinberg et al. [15], and it is slightly stronger than the lower semicontinuity of the cost function and its inf-compactness in the action parameter. In operations research applications, one-step cost functions are usually -inf-compact.

Here we consider a POMDP whose underlying MDP satisfies Assumption(W∗₎_{. According to Theorem}_3.3_{, this}

implies-inf-compactness of the cost function for the COMDP. Theorem3.6states that weak continuity of transition probabilities and continuity of observation probabilities in total variation imply weak continuity of transi-tion probabilities for the COMDP. Thus, Assumptransi-tion(W∗₎_{for the underlying MDP and continuity of observation}

probabilities in total variation imply that the COMDP satisfies Assumption(W∗₎_{and therefore optimal policies}

exist for the COMDP and for the POMDP, value iterations converge to the optimal value, and other optimality properties hold; see Theorem3.5. Example4.1demonstrates that continuity of observation probabilities in total variation cannot be relaxed to setwise continuity.

For problems with incomplete information, the filtering equation z_t+1= H 4z_t1 a_t1 y_t+15 presented in Equation (7), that links the posterior state probabilities zt1 zt+11 the selected action at, and the observation yt+11 plays an

important role. This equation presents a general form of Bayes’s rule. Hernández-Lerma [20, Chapter 4] showed that the weak continuity of the stochastic kernel H in all three variables and weak continuity of transition and observation probabilities imply weak continuity of transition probabilities for the COMDP. In this paper we introduce another condition, Assumption(H), which is weaker than the weak continuity of the filtering kernel H in 4z_t1 a_t1 y_t+15. We prove that Assumption (H)and setwise continuity of the stochastic kernel on the observation set, given a posterior state probability and prior action, imply weak continuity of the transition probability for the COMDP; see Theorem3.4. Furthermore, weak continuity of transition probabilities and continuity of the observation kernel in total variation imply Assumption(H)and setwise continuity of the stochastic kernel described in the previous sentence; see Theorem3.6. In particular, if either of these two assumptions or the weak continuity of H and observation probabilities are added to Assumption(W∗₎_{for the underlying MDP of the POMDP,}

the COMDP satisfies Assumption(W∗₎_{and therefore various optimality properties, including the existence of}

stationary optimal policies and convergence of value iterations; see Theorem3.2.

If the observation set is countable and it is endowed with the discrete topology, convergence in total variation and weak convergence are equivalent. Thus, Theorem 3.6implies weak continuity of the transition probability for the COMDP with a countable observation set endowed with the discrete topology and with weakly continuous transition and observation kernels; see Hernández-Lerma [20, p. 93]. However, as Example4.2demonstrates, under these conditions the filtering transition probability H may not be continuous. In other words, the statement in Hernández-Lerma [20, p. 93], that continuity of transition and observation kernels imply weak continuity of H 1 if the observation set is countable and endowed with discrete topology, is incorrect. Example4.2motivated us to introduce Assumption(H).

The main results of this paper are presented in §3. Section4contains three counterexamples. In addition to the two described examples, Example4.3demonstrates that setwise continuity of the stochastic kernel on the observation set, given a posterior state probability and prior action, is essential to ensure that Assumption(H) implies continuity of the transition probability for the COMDP. Section5describes properties of stochastic kernels used in the proofs of the main results presented in §6. Section7introduces a sufficient condition for the weak continuity of transition probabilities for the COMDP that combines Assumption(H)and the weak continuity of H . Combining these properties together is important because Assumption(H)may hold for some observations and weak continuity of H may hold for others. Section8contains three illustrative examples: (i) a model defined by stochastic equations including Kalman’s filter; (ii) a model for inventory control with incomplete records (for particular inventory control problems of such type see Bensoussan et al. [4,5,6,7] and references therein); and (iii) the classic Markov decision model with incomplete information studied by Aoki [1], Dynkin [11], Shiryaev [31], Hinderer [23, §7.1], Sawarigi and Yoshikawa [29], Rhenius [26], Yushkevich [36], and Dynkin and Yushkevich [12, Chapter 8], for which we provide a sufficient condition for the existence of optimal policies, convergence of value iterations to optimal values, and other optimality properties formulated in Theorems3.1,3.2, and3.5.

Some of the results of §§5and 8.3 have been further developed in Feinberg et al. [17,18]. In particular, new results on convergence of probability measures and on continuity of stochastic kernels are described in Feinberg et al. [17, §§3, 4], and they are used to generalize in Feinberg et al. [17, Theorem 6.2] the main result of §8.3, Theorem6.2, that states sufficient conditions for optimality for Markov decision models with incomplete information. The uniform Fatou’s lemma is introduced in Feinberg et al. [18, Theorem 1.1]. It provides the natural necessary and sufficient condition for a stronger inequality than Fatou’s lemma, and Theorem5.2presents a particular version of its necessary condition.

(4)

2. Model description. For a metric space, let B45 be its Borel -field, that is, the -field generated by all open subsets of the metric space. For a Borel set E ∈ B451 we denote by B4E5 the -field whose elements are intersections of E with elements ofB45. Observe that E is a metric space with the same metric as on , and B4E5 is its Borel -field. For a metric space , we denote by 45 the set of probability measures on 41 B4550 A sequence of probability measures 84n5₉

n=1121 : : : from45 converges weakly 4setwise5 to ∈ 45 if for any

bounded continuous (bounded Borel-measurable) function f on Z f 4s54n54ds5 → Z f 4s54ds5 as n → 0

A sequence of probability measures 84n5₉

n=1121 : : : from45 converges in total variation to ∈ 45 if

sup Z f 4s54n5_{4ds5 −}Z f 4s54ds5 f 2 _{→ 6−11 17 is Borel-measurable} → 0 as n → 3

see Feinberg et al. [18] for properties of these three types of convergence of probability measures. These types of convergence of measures are used in Yüksel and Linder [35] to describe convergence of observation channels. Note that 45 is a separable metric space with respect to the weak convergence topology for probability measures, when is a separable metric space; see Parthasarathy [25, Chapter II]. For metric spaces1and2, a

(Borel-measurable) stochastic kernel R4ds₁ s₂5 on ₁ given2 is a mapping R4· ·52B415 ×2→ 601 17, such

that R4· s₂5 is a probability measure on₁for any s₂∈₂, and R4B ·5 is a Borel-measurable function on₂ for any Borel set B ∈B415. A stochastic kernel R4ds1 s25 on1 given2 defines a Borel-measurable mapping

s₂→ R4· s₂5 of2to the metric space415 endowed with the topology of weak convergence. A stochastic

kernel R4ds1 s25 on1given2is called weakly continuous 4setwise continuous, continuous in total variation5,

if R4· x4n5_{5 converges weakly (setwise, in total variation) to R4· x5 whenever x}4n5 _{converges to x in} 2. For

one-point sets 8s₁9 ⊂1, we sometimes write R4s1 s25 instead of R48s19 s25.

For a Borel subset S of a metric space 41 5, where is a metric, consider the metric space 4S1 5. A set B is called open (closed, compact) in S if B ⊆ S and B is open (closed, compact, respectively) in 4S1 5. Of course, if S =, we omit “in .” Observe that, in general, an open (closed, compact) set in S may not be open (closed, compact, respectively).

Let, , and be Borel subsets of Polish spaces (a Polish space is a complete separable metric space), P 4dx0_{x1 a5 be a stochastic kernel on}

given × , Q4dy a1 x5 be a stochastic kernel on given × , Q₀4dy x5 be a stochastic kernel on given , p be a probability distribution on , c2 × → ¯ = ∪ 8+9 be a bounded below Borel function on × , where is a real line.

A POMDP is specified by a tuple 41 1 1 P 1 Q1 c5, where is the state space, is the observation set, is the action set, P 4dx0_{x1 a5 is the state transition law, Q4dy a1 x5 is the observation kernel, c2}

× → ¯ is the one-step cost.

The partially observable Markov decision process evolves as follows:

• at time t = 0, the initial unobservable state x0 has a given prior distribution p;

• the initial observation y0 is generated according to the initial observation kernel Q04· x05;

• at each time epoch t = 01 11 : : : 1 if the state of the system is xt∈ and the decision maker chooses an action

a_t∈, then the cost c4xt1 at5 is incurred;

• the system moves to a state xt+1 according to the transition law P 4· xt1 at5, t = 01 11 : : : ;

• an observation yt+1∈ is generated by the observation kernel Q4· at1 xt+15, t = 01 11 : : : 0

Define the observable histories: h₀2= 4p1 y₀5 ∈0 and ht 2= 4p1 y01 a01 : : : 1 yt−11 at−11 yt5 ∈t for all

t = 11 21 : : : , where₀2=_{45 × and}_t2=_t−1× × if t = 11 21 : : : 0 A policy for the POMDP is defined as a sequence = 8_t9_{t=0111 : : :} of stochastic kernels _t on_given_t. A policy is called nonrandomized, if each probability measure _t4· h_t5 is concentrated at one point. The set of all policies is denoted by ç. The Ionescu Tulcea theorem (Bertsekas and Shreve [8, pp. 140–141] or Hernández-Lerma and Lassere [21, p. 178]) implies that a policy ∈ ç and an initial distribution p ∈45, together with the stochastic kernels P , Q, and Q0,

determine a unique probability measure P

p on the set of all trajectories 4 × × 5

_{endowed with the -field}

defined by the products of Borel -fieldsB45, B45, and B45. The expectation with respect to this probability measure is denoted byƐp.

For a finite horizon T = 01 11 : : : 1 the expected total discounted costs are

V_{T 1} 4p5 2=Ɛ p T −1 X t=0 tc4x_t1 a_t51 p ∈451 ∈ ç1 (1)

where ≥ 0 is the discount factor, V

01 4p5 = 00 Consider the following assumptions.

(5)

Assumption (D). The function c is bounded below on × and ∈ 601 15. Assumption (P). The function c is nonnegative on × and ∈ 601 17.

When T = 1 formula (1) defines the infinite horizon expected total discounted cost, and we denote it by V

4p50 We use the notations(D)and(P)following Bertsekas and Shreve [8, p. 214], where cases (D), (N), and (P)

are considered. However, Assumption(D)here is weaker than the conditions assumed in case(D)in Bertsekas and Shreve [8, p. 214], where one-step costs are assumed to be bounded.

Since the function c is bounded below by a constant M on × , a discounted model can be converted into a model with nonnegative costs by replacing costs c4x1 a5 with c4x1 a5 + M . Though Assumption(P)is more general, Assumption(D) is met in a wide range of applications. Thus we formulate the results for either of these assumptions.

For any function g_{4p5, including g}_{4p5 = V}

T 1 4p5 and g4p5 = V4p5, define the optimal values

g4p5 2= inf

∈çg

_4p51 _{p ∈}

450

A policy is called optimal for the respective criterion, if g_{4p5 = g4p5 for all p ∈}_{450 For g}_{= V} T 1 , the

optimal policy is called T -horizon discount-optimal; for g_{= V}

, it is called discount-optimal.

In this paper, for the expected total costs and objective values, we use similar notations for POMDPs, MDPs, and COMDPs. However, we reserve the symbol V for POMDPs, the symbol v for MDPs, and the notation ¯v for COMDPs. So, in addition to the notations V

T 1 1 V1 VT 1 1 and V introduced for POMDPs, we shall use the

notations v

T 1 1 v1 vT 1 , v and ¯vT 1 1 ¯v1 ¯vT 1 , ¯v for the similar objects for MDPs and COMDPs, respectively.

We recall that a function c defined on× with values in ¯ is inf-compact if the set 84x1 a5 ∈ ×2 c4x1 a5 ≤ 9 is compact for any finite number . A function c defined on × with values in ¯ is called -inf-compact on × , if for any compact set K ⊆ , the function c2 K × → ¯ defined on K × is inf-compact; see Feinberg et al. [13,15, Definition 1.1]. According to Feinberg et al. [15, Lemma 2.5], a bounded below function c is-inf-compact on the product of metric spaces and if and only if it satisfies the following two conditions:

(a) c is lower semicontinuous; (b) if a sequence 8x4n5₉

n=1121 : : : with values in converges and its limit x belongs to , then any sequence

8a4n5₉

n=1121 : : : with a4n5∈, n = 11 21 : : : 1 satisfying the condition that the sequence 8c4x4n51 a4n559n=1121 : : : is

bounded above, has a limit point a ∈.

Various applications deal with-inf-compact cost functions that are not inf-compact. For example, the functions c4x1 a5 = 4x − a52 _{and c4x1 a5 = x − a defined on}_{× are -inf-compact, but they are not inf-compact.}

For a POMDP 41 1 1 P 1 Q1 c5, consider the MDP 41 1 P 1 c5, in which all the states are observable. An MDP can be viewed as a particular POMDP with = and Q4B a1 x5 = Q4B x5 = I8x ∈ B9 for all x ∈ 1 a ∈_{, and B ∈ B45. In addition, for an MDP an initial state is observable. Thus, for an MDP, an initial state x is} considered instead of the initial distribution p0 In fact, this MDP possesses the property that action sets at all the states are equal. For MDPs, Feinberg et al. [14] provides general conditions for the existence of optimal policies, validity of optimality equations, and convergence of value iterations. Here we formulate these conditions for an MDP whose action sets in all states are equal.

Assumption (W∗) (cp. Feinberg et al. [14,15, Lemma 2.5]). (i) the function c is-inf-compact on × ;

(ii) the transition probability P 4· x1 a5 is weakly continuous in 4x1 a5 ∈_{× .}

For an MDP, a nonrandomized policy is called Markov if all decisions depend only on the current state and time. A Markov policy is called stationary if all decisions depend only on current states.

Theorem 2.1 (cp. Feinberg et al. [14, Theorem 2]). Let MDP 41 1 P 1 c5 satisfy Assumption(W∗). Let either Assumptions(P)or(D) hold. Then,

(i) the functions vt1 , t = 01 11 : : : , and v are lower semicontinuous on, and vt1 4x5 → v4x5 as t → for

all x ∈₃

(ii) for each x ∈ and t = 01 11 : : : 1

v_t+114x5 = min a∈ c4x1 a5 + Z v_t14y5P 4dy x1 a5 1 (2)

where v₀₁4x5 = 0 for all x ∈, and the nonempty sets

A_t14x5 2= a ∈_{2 v}_t+114x5 = c4x1 a5 + Z v_t14y5P 4dy x1 a5 1 x ∈_{1 t = 01 11 : : : 1}

(6)

satisfy the following properties: (a) the graph Gr4A_t15 = 84x1 a52 x ∈1 a ∈ At1 4x59, t = 01 11 : : : 1 is a Borel

subset of × , and (b) if vt+11 4x5 = +, then At1 4x5 = and, if vt+11 4x5 < +, then At1 4x5 is compact;

(iii) for each T = 11 21 : : : , there exists an optimal Markov T -horizon policy 4₀1 : : : 1 _{T −1}5, and if for a T -horizon Markov policy 4₀1 : : : 1 _{T −1}5 the inclusions _{T −1−t}4x5 ∈ A_t14x5, x ∈_{1 t = 01 : : : 1 T − 11 hold, then} this policy is T -horizon optimal;

(iv) for each x ∈

v4x5 = min a∈ c4x1 a5 + Z v4y5P 4dy x1 a5 1 (3)

and the nonempty sets

A4x5 2= a ∈2 v4x5 = c4x1 a5 + Z v4y5P 4dy x1 a5 1 x ∈1

satisfy the following properties: (a) the graph Gr4A5 = 84x1 a52 x ∈1 a ∈ A4x59 is a Borel subset of × ,

and (b) if v4x5 = +, then A4x5 = and, if v4x5 < +, then A4x5 is compact;

(v) for infinite-horizon problems there exists a stationary discount-optimal policy , and a stationary policy ∗

is optimal if and only if ∗

4x5 ∈ A4x5 for all x ∈;

(vi) (Feinberg and Lewis [19, Proposition 3.1(iv)]) if c is inf-compact on × , then the functions vt1 ,

t = 11 21 : : : , and v are inf-compact on_.

3. Reduction of POMDPs to COMDPs and main results. In this section we formulate the main results of the paper, Theorems3.2,3.5, and the relevant statements. These theorems provide sufficient conditions for the existence of optimal policies for COMDPs and therefore for POMDPs with expected total costs, as well as optimality equations and convergence of value iterations for COMDPs. These conditions consist of two major components: the conditions for the existence of optimal policies for the underlying MDP and additional conditions on the POMDP. Theorem3.5 states that the continuity of the observation kernel Q in total variation is the additional sufficient condition under which there is a stationary optimal policy for the COMDP, and this policy satisfies the optimality equations and can be found by value iterations. In particular, the continuity of Q in total variation and the weak continuity of P imply the setwise continuity of the stochastic kernel R0_{defined in (}₅_{) and}

the validity of Assumption(H)introduced in this section; see Theorem3.6. These two additional properties imply the weak continuity of the transition probability q for the COMDP (Theorem3.4) and eventually the desired properties of the COMDP; see Theorem3.2.

This section starts with the description of known results on the general reduction of a POMDP to the COMDP; see Bertsekas and Shreve [8, §10.3], Dynkin and Yushkevich [12, Chapter 8], Hernández-Lerma [20, Chapter 4], Rhenius [26], and Yushkevich [36]. To simplify notations, we sometimes drop the time parameter. Given a posterior distribution z of the state x at time epoch t = 01 11 : : : and given an action a selected at epoch t, denote by R4B × C z1 a5 the joint probability that the state at time 4t + 15 belongs to the set B ∈B45 and the observation at time t + 1 belongs to the set C ∈B45,

R4B × C z1 a5 2= Z Z B Q4C a1 x05P 4dx0 x1 a5z4dx51 B ∈B451 C ∈ B451 z ∈ 451 a ∈ 0 (4)

Observe that R is a stochastic kernel on × given 45 × ; see Bertsekas and Shreve [8, §10.3], Dynkin and Yushkevich [12, Chapter 8], Hernández-Lerma [20, p. 87], Yushkevich [36], or Rhenius [26] for details. The probability that the observation y at time t + 1 belongs to the set C ∈B45, given that at time t the posterior state probability is z and selected action is a1 is

R0_{4C z1 a5 2=}Z Z Q4C a1 x0_{5P 4dx}0_{x1 a5z4dx51} _{C ∈} B451 z ∈ 451 a ∈ 0 (5)

Observe that R0 _{is a stochastic kernel on}

given 45 × 0 By Bertsekas and Shreve [8, Proposition 7.27], there exists a stochastic kernel H on_{given 45 × × such that}

R4B × C z1 a5 = Z

C

H 4B z1 a1 y5R04dy z1 a51 B ∈B451 C ∈ B451 z ∈ 451 a ∈ 0 (6)

The stochastic kernel H 4· z1 a1 y5 defines a measurable mapping H 2 45 × × → 45, where H 4z1 a1 y54 · 5 = H 4· z1 a1 y50 For each pair 4z1 a5 ∈_{45 × , the mapping H4z1 a1 ·52 → 45 is defined} R0_{4· z1 a5-almost surely uniquely in y ∈}

; see Bertsekas and Shreve [8, Corollary 7.27.1] or Dynkin and

(7)

Yushkevich [12, Appendix 4.4]. For a posterior distribution z_t∈45, action at∈, and an observation yt+1∈1

the posterior distribution z_t+1∈45 is

z_t+1= H 4z_t1 a_t1 y_t+150 (7) However, the observation y_t+1is not available in the COMDP model, and therefore y_t+1is a random variable with the distribution R0_{4· z}

t1 at5, and the right-hand side of (7) maps 4zt1 at5 ∈45 × to 44550 Thus, zt+1is a

random variable with values in45 whose distribution is defined uniquely by the stochastic kernel

q4D z1 a5 2=Z

I8H 4z1 a1 y5 ∈ D9R0_{4dy z1 a51} _{D ∈}

B44551 z ∈ 451 a ∈ 3 (8)

see Hernández-Lerma [20, p. 87]. The particular choice of a stochastic kernel H satisfying (6) does not effect the definition of q from (8), since for each pair 4z1 a5 ∈_{45 × , the mapping H4z1 a1 ·52 → 45 is defined} R0_{4· z1 a5 almost surely uniquely in y ∈}

; see Bertsekas and Shreve [8, Corollary 7.27.1], and Dynkin and Yushkevich [12, Appendix 4.4].

Similar to the stochastic kernel R, consider a stochastic kernel R₀ on × given 45 defined by

R₀4B × C p5 2=Z

B

Q₀4C x5p4dx51 B ∈_{B451 C ∈ B451 p ∈ 450}

This kernel can be decomposed as

R₀4B × C p5 =Z

C

H₀4B p1 y5R0

04dy p51 B ∈B451 C ∈ B451 p ∈ 451 (9)

where R0

04C p5 = R04 × C p5, C ∈ B45, p ∈ 45, is a stochastic kernel on given 45 and H04dx p1 y5

is a stochastic kernel on given 45 × . Any initial prior distribution p ∈ 45 and any initial observation y0

define the initial posterior distribution z₀= H₀4p1 y₀5 on 41 B455. Similar to (7), the observation y₀ is not available in the COMDP and this equation is stochastic. In addition, H₀4p1 y5 is defined R0

04dy p5 almost surely

uniquely in y ∈ for each p ∈ 45. Similar to (8), the stochastic kernel

q₀4D p5 2= Z

I8H₀4p1 y5 ∈ D9R0₀4dy p51 D ∈B4455 and p ∈ 451 (10)

on45 given 45 defines the initial distribution on the set of posterior probabilities. Define q04p54D5 = q04D p51

where D ∈B44550 Then q04p5 is the initial distribution of z0= H04p1 y05 corresponding to the initial state

distribution p0

The COMDP is defined as an MDP with parameters (451 1 q1 ¯c), where (i) 45 is the state space;

(ii) is the action set available at all states z ∈ 45; (iii) the one-step cost function ¯c2 _{45 × → ¯, defined as}

¯

c4z1 a5 2= Z

c4x1 a5z4dx51 z ∈451 a ∈ 3 (11)

(iv) transition probabilities q on45 given 45 × defined in (8).

Denote by it, t = 01 11 : : : , a t-horizon history for the COMDP, also called an information vector,

i_t2= 4z₀1 a₀1 : : : 1 z_t−11 a_t−11 z_t5 ∈ I_t1 t = 01 11 : : : 1

where z₀ is the initial posterior distribution and z_t∈45 are recursively defined by Equation (7), I_t2= 45 × 4 × 455t _{for all t = 01 11 : : : , with I}

02=45. An information policy (I-policy) is a policy in a

COMDP, i.e., I -policy is a sequence = 8_t2 t = 01 11 : : : 9 such that _t4· i_t5 is a stochastic kernel on given It

for all t = 01 11 : : : ; see Bertsekas and Shreve [8, Chapter 10], and Hernández-Lerma [20, p. 88]. Denote by ã the set of all I -policies. We also consider Markov I -policies and stationary I -policies.

For an I -policy = 8_t2 t = 01 11 : : : 91 define a policy _{= 8}

t2 t = 01 11 : : : 9 in ç as

_t4· h_t5 2= _t4· i_t4h_t55 for all h_t∈ H_t and t = 01 11 : : : 1 (12) where i_t4h_t5 ∈ I_t is the information vector determined by the observable history h_t via (7). Thus and _are

equivalent in the sense that

t assigns the same conditional probability on given the observed history ht as t

(8)

for the history i_t4h_t5. If is an optimal policy for the COMDP then _{is an optimal policy for the POMDP. This}

follows from the facts that V_t14p5 = ¯v_t14q₀4p55, t = 01 11 : : : 1 and V4p5 = ¯v4q₀4p55; see Hernández-Lerma [20, p. 89] and references therein. Let zt4ht5 be the last element of the information vector it4ht5. With a slight abuse of

notations, by using the same notations for a measure concentrated at a point and a function at this point, if is Markov, then (12) becomes

t4ht5 = t4zt4ht55, and if is stationary, then t4ht5 = 4zt4ht551 t = 01 11 : : : 0

Thus, an optimal policy for a COMDP defines an optimal policy for the POMDP. However, very little is known for the conditions on POMDPs that lead to the existence of optimal policies for the corresponding COMDPs. For the COMDP, Assumption(W∗₎_{has the following form:}

(i) ¯c is-inf-compact on 45 × ;

(ii) the transition probability q4· z1 a5 is weakly continuous in 4z1 a5 ∈45 × .

Recall that the notation ¯v has been reserved for the expected total costs for COMDPs. The following theorem follows directly from Theorem2.1applied to the COMDP 4_{451 1 q1 ¯c5.}

Theorem 3.1. Let either Assumption(D) or Assumption(P)hold. If the COMDP 4451 1 q1 ¯c5 satisfies Assumption(W∗₎_{, then}

(i) the functions ¯v_t1, t = 01 11 : : : , and ¯v are lower semicontinuous on_{45, and ¯v}_t14z5 → ¯v4z5 as t → for all z ∈453

(ii) for each z ∈45 and t = 01 11 : : : 1

¯ v_t+114z5 = min a∈ ¯ c4z1 a5 + Z 45 ¯ v_t14z0_5q4dz0 z1 a5 = min a∈ _Z c4x1 a5z4dx5 + Z Z Z ¯ v_t14H 4z1 a1 y55Q4dy a1 x0_{5P 4dx}0 x1 a5z4dx5 1 (13)

where ¯v₀₁4z5 = 0 for all z ∈_{45, and the nonempty sets}

A_t14z5 2= a ∈_{2 ¯v}_t+114z5 = ¯c4z1 a5 + Z 45 ¯ v_t14z0_5q4dz0 z1 a5 1 z ∈_{451 t = 01 11 : : : 1}

satisfy the following properties: (a) the graph Gr4A_t15 = 84z1 a52 z ∈_{451 a ∈ A}_t14z59, t = 01 11 : : : 1 is a Borel subset of_{45 × , and (b) if ¯v}_t+114z5 = +, then A_t14z5 =_{and, if ¯v}_t+114z5 < +, then A_t14z5 is compact; (iii) for each T = 11 21 : : : , there exists an optimal Markov T -horizon I -policy 4₀1 : : : 1 _{T −1}5, and if for a T -horizon Markov I -policy 4₀1 : : : 1 _{T −1}5 the inclusions _{T −1−t}4z5 ∈ A_t14z5, z ∈451 t = 01 : : : 1 T − 11 hold, then this I -policy is T -horizon optimal;

(iv) for each z ∈₄₅

¯ v4z5 = min a∈ ¯ c4z1 a5 + Z 45 ¯ v4z0_5q4dz0_{z1 a5} = min a∈ _Z c4x1 a5z4dx5 + Z Z Z ¯ v4H 4z1 a1 y55Q4dy a1 x05P 4dx0 x1 a5z4dx5 1 (14)

and the nonempty sets

A4z5 2= a ∈_{2 ¯v}4z5 = ¯c4z1 a5 + Z 45 ¯ v4z0_5q4dz0 z1 a5 1 z ∈₄₅₁

satisfy the following properties: (a) the graph Gr4A5 = 84z1 a52 z ∈_{451 a ∈}4z59 is a Borel subset of 45 × , and (b) if ¯v4z5 = +, then A4z5 = and, if ¯v4z5 < +, then A4z5 is compact;

(v) for infinite-horizon problems there exists a stationary discount-optimal I -policy , and a stationary I -policy ∗

is optimal if and only if ∗

4z5 ∈ A4z5 for all z ∈453 and

(vi) if ¯c is inf-compact on_{45 × , then the functions ¯v}_t1, t = 11 21 : : : , and ¯v are inf-compact on_45. Thus, in view of Theorem3.1, the important question is under which conditions on the original POMDP, the COMDP satisfies the conditions under which there are optimal policies for MDPs. Monograph Hernández-Lerma [20, p. 90] provides the following conditions for this: (a) is compact, (b) the cost function c is bounded and continuous, (c) the transition probability P 4dx0_{x1 a5 and the observation kernel Q4dy a1 x5 are weakly continuous}

stochastic kernels, and (d) there exists a weakly continuous stochastic kernel H on given 45 × × satisfying (6). Consider the following relaxed version of assumption (d) that does not require that H is continuous in y. We introduce this assumption, called Assumption(H), because it holds in many important situations when a weakly continuous stochastic kernel H satisfying (6) does not exist; see Example4.2and Theorem3.6.

(9)

Assumption (H). There exists a stochastic kernel H on given 45 × × satisfying (6) such that if a sequence 8z4n5₉

n=1121 : : : ⊆45 converges weakly to z ∈ 45, and a sequence 8a4n59n=1121 : : :⊆ converges to

a ∈_{as n → , then there exists a subsequence 84z}4nk5_{1 a}4nk5₅₉

k=1121 : : : ⊆ 84z4n51 a4n559n=1121 : : : and a measurable

subset C of such that R0_{4C z1 a5 = 1 and for all y ∈ C}

H 4z4nk5_{1 a}4nk5_{1 y5 converges weakly to H 4z1 a1 y50} ₍₁₅₎ In other words, (15) holds R0_{4· z1 a5 almost surely.}

Theorem 3.2. Let the following assumptions hold: (a) either Assumption(D)or Assumption(P)holds; (b) the function c is -inf-compact on × ; (c) either

(i) the stochastic kernel R0_{4dy z1 a5 on}

given 45 × is setwise continuous and Assumption(H)holds; or

(ii) the stochastic kernels P 4dx0_{x1 a5} _on

given × and Q4dy a1 x5 on given × are weakly continuous and there exists a weakly continuous stochastic kernel H 4dx z1 a1 y5 on given 45 × × satisfying (6).

Then the COMDP 4_{451 1 q1 ¯c5 satisfies Assumption}(W∗₎ _{and therefore statements (i)–(vi) of Theorem}_3.1

hold.

Remark 3.1. Assumption(H)is weaker than weak continuity of H stated in Assumption (ii) of Theorem3.2. Assumption(H)is introduced because, according to Theorem3.6, this assumption and setwise continuity of R0

stated in assumption (i) hold, if the stochastic kernel P 4dx0_{x1 a5 on}

given × is weakly continuous and the stochastic kernel Q4dy a1 x5 on given × is continuous in total variation, whereas a weakly continuous version of H may not exist; see Example4.2.

Remark 3.2. Throughout this paper we follow the terminology according to which finite sets are countable. If is countable, then Equation (13) transforms into

¯ v_t+114z5 = min a∈ _Z c4x1 a5z4dx5 + X y∈ ¯ v_t14H 4z1 a1 y55R04y z1 a5 1 z ∈451 t = 01 11 : : : 1

and Equation (14) transforms into

¯ v4z5 = min a∈ _Z c4x1 a5z4dx5 + X y∈ ¯ v4H 4z1 a1 y55R04y z1 a5 1 z ∈450

Theorem3.2follows from Theorems3.1,3.3, and3.4. In particular, Theorem3.3implies that, if Assumption(D) or(P) holds for a POMDP, then it also holds for the corresponding COMDP.

Theorem 3.3. If the function c2 × → ¯ is bounded below and -inf-compact on × , then the cost function ¯c2 45 × → ¯ defined for the COMDP in (11) is bounded from below by the same constant as c and -inf-compact on 45 × .

Theorem 3.4. The stochastic kernel q4dz0 z1 a5on45 given 45 × is weakly continuous if condition (c) from Theorem3.2holds.

The following theorem provides sufficient conditions for the existence of optimal policies for the COMDP and therefore for the POMDP in terms of the initial parameters of the POMDP.

Theorem 3.5. Let assumptions (a) and (b) of Theorem3.2hold, the stochastic kernel P 4dx0 x1 a5on given × be weakly continuous, and the stochastic kernel Q4dy a1 x5 on given × be continuous in total variation. Then the COMDP 4451 1 q1 ¯c5 satisfies Assumption(W∗₎ _{and therefore statements (i)–(vi) of}

Theorem3.1hold.

Theorem3.5follows from Theorem3.3and from the following statement.

Theorem 3.6. The weak continuity of the stochastic kernel P 4dx0 x1 a5on_{given × and continuity in} total variation of the stochastic kernel Q4dy a1 x5 on_{given × imply that condition (i) from Theorem}3.2 holds (that is, R0_{is setwise continuous and Assumption}_(H)_{holds) and therefore the stochastic kernel q4dz}0_{z1 a5}

on45 given 45 × is weakly continuous.

(10)

Example4.1demonstrates that, if the stochastic kernel Q4dy a1 x5 on given × is setwise continuous, then the transition probability q for the COMDP may not be weakly continuous in 4z1 a5 ∈45 × . In this example the state set consists of two points. Therefore, even if the stochastic kernel P 4dx0_{x1 a5 on}

given × is setwise continuous (even if it is continuous in total variation) in 4x1 a5 ∈ × then the setwise continuity of the stochastic kernel Q4dy a1 x5 on given × is not sufficient for the weak continuity of q.

Corollary 3.1 (cp. Hernández-Lerma [20, p. 93]). If the stochastic kernel P 4dx0 x1 a5on given × is weakly continuous, is countable, and for each y ∈ the function Q4y a1 x5 is continuous on × , then the following statements hold:

(a) for each y ∈ the function R0_{4y z1 a5 is continuous on}

45 × with respect to the topology of weak convergence on_{451 and Assumption} (H)holds;

(b) the stochastic kernel q4dz0_{z1 a5} _on

45 given 45 × is weakly continuous; and

(c) if, in addition to the above conditions, assumptions (a) and (b) from Theorem3.2hold, then the COMDP 4_{451 1 q1 ¯c5 satisfies Assumption}(W∗₎_{and therefore statements (i)–(vi) of Theorem}_3.1_hold.

Proof. For a countable, the continuity in total variation of the stochastic kernel Q4· a1 x5 on given × follows from the continuity of Q4y a1 x5 for each y ∈ in 4a1 x5 ∈ × and from Q4 a1 x5 = 1 for all 4a1 x5 ∈_{× (the similar fact holds for the stochastic kernel R}0_{4· z1 a5). Indeed, let 4a}4n5_{1 x}4n5_{5 → 4a1 x5 as}

n → . For an arbitrary > 0 choose a finite set Y⊆_{such that Q4Y} a1 x5 ≥ 1 − 1 which is equivalent to Q4_\Y a1 x5 ≤ 0 Let us choose a natural number N such that Q4Y a4n5_{1 x}4n5_{5 ≥ 1 − 2 and, therefore,}

Q4_\Y a4n5_{1 x}4n5_{5 ≤ 2 for n > N}

. Denote ãn2=

P

y∈Q4y a4n51 x4n55 − Q4y a1 x50 Observe that

ã_n≤ Q4_\Y a4n5_{1 x}4n5_{5 + Q4}

\Y a1 x5 +

X

y∈Y

Q4y a4n5_{1 x}4n5_{5 − Q4y a1 x50}

Since Q4y a1 x5 is continuous in 4a1 x5 ∈ × for each y ∈ and the set Y is finite, the limit of the sum in the

right-hand side of the last inequality is 0 as n → 0 This implies that lim sup_n→ã_n≤ 30 Since > 0 is arbitrary, then ãn→ 0 as n → 0 Thus, the continuity of Q4· a1 x5 in total variation takes place. Statements (a) and (b)

follow from Theorem3.6, and statement (c) follows from Theorem 3.5.

4. Counterexamples. In this section we provide three counterexamples. Example4.1 demonstrates that the assumption in Theorems3.5and3.6, that the stochastic kernel Q is continuous in total variation, cannot be weakened to the assumption that Q is setwise continuous. Example4.2 shows that, under conditions of Corollary3.1, a weakly continuous mapping H satisfying (7) may not exist. The existence of such a mapping is mentioned in Hernández-Lerma [20, p. 93]. Example4.3illustrates that the setwise continuity of the stochastic kernel R04dy z1 a5 on_{given 45 × is essential in condition (i) of Theorem}3.2. Without this assumption, Assumption(H)alone is not sufficient for the weak continuity of the stochastic kernel q4dz0_{z1 a5 on}

45 given 45 × and therefore for the correctness of the conclusions of Theorems3.2and3.4.

We would like to mention that before the authors constructed Example4.1, Huizhen (Janey) Yu provided them with an example when the weak continuity of the observation kernel Q is not sufficient for the weak continuity of the stochastic kernel q4· z1 a50 In her example, = 811 29, the system does not move, = = 601 17, at state 1 the observation is 0 for any action a and at state 2, under an action a ∈1 the observation is uniformly distributed on 601 a70 The initial belief distribution is z = 40051 00550

Example 4.1. Continuity of Q in total variation cannot be relaxed to setwise continuity in Theorems3.5 and3.6. Let = 811 29, = 601 17, and = 809 ∪ 81/n2 n = 11 21 : : : 9. The system does not move. This means that P 4x x1 a5 = 1 for all x = 11 2 and a ∈. This stochastic kernel P is weakly continuous and, since is finite, it is setwise continuous and continuous in total variation. The observation kernel Q is Q4dy a1 15 = Q4dy 01 25 = m4dy5, a ∈, with m being the Lebesgue measure on = 601 171 and Q4dy 1/n1 25 = m4n5_{4dy5, n = 11 21 : : : 1 where}

m4n5_{is the absolutely continuous measure on}_{= 601 17 with the density f}4n5_,

f4n5_{4y5 =}

(

01 if 2k/2n_{< y < 42k + 15/2}n _{for k = 01 11 : : : 1 2}n−1_{− 1;}

21 otherwise. (16)

First we show that Q4dy a1 x5 on given × is setwise continuous in 4a1 x5. In our case, this means that the probability distributions Q4dy 1/n1 i5 converge setwise to Q4dy 01 i5 as n → , where i = 11 2. For i = 1 this statement is trivial, because Q4dy a1 15 = m4dy5 for all a ∈_{. For i = 2 we need to verify that m}4n5_converge

setwise to m as n → . According to Bogachev [10, Theorem 8.10.56], which is Pfanzagl’s generalization

(11)

of the Fichtenholz-Dieudonné-Grothendiek theorem, measures m4n5 converge setwise to the measure m, if m4n5_{4C5 → m4C5 for each open set C in 601 17. Since m}4n5_{405 = m405 = m}4n5_{415 = m4151 n = 11 21 : : : 1 then}

m4n5_{4C5 → m4C5 for each open set C in 601 17 if and only if m}4n5_{4C5 → m4C5 for each open set C in 401 15.}

Choose an arbitrary open set C in 401 15. Then C is a union of a countable set of open disjoint intervals 4a_i1 b_i5. Therefore, for any > 0 there is a finite number nof open intervals 84a_i1 b_i52 i = 11 : : : 1 n9 such that m4C\C5 ≤ , where C=Sn

i=14ai1 bi50 Since f4n5≤ 2, this implies that m4n54C\C5 ≤ 2 for any n = 11 21 : : : 0

Since m4n544a1 b55 − m44a1 b55 ≤ 1/2n−1_{1 n = 11 21 : : : 1 for any interval 4a1 b5 ⊂ 601 171 this implies that}

m4C5 − m4n5_4C

5 ≤ if n ≥ N, where Nis any natural number satisfying 1/2N−1≤ 0 Therefore, if n ≥ N1

then m4n5_{4C5 − m4C5 ≤ m}4n5_4C

5 − m4C5 + m4C\C5 + m4n54C\C5 ≤ 40 This implies that m4n54C5 → m4C5 as

n → . Thus m4n5 _{converge setwise to m as n → .}

Second, we verify that the transition kernel q does not satisfy the weak continuity property. Consider the posterior probability distribution z = 4z4151 z4255 = 40051 0055 of the state at the current step. Since the system does not move, this is the prior probability distribution at the next step. If the action 0 is selected at the current step, then nothing new can be learned about the state during the next step. Thus q4z z1 05 = 1. Let y be an observation at the next step, and let D be the event that the state is 2. At the next step, the prior probability of the event D is 0.5, because z425 = 0050 Now let an action 1/n be selected at the current step. The new posterior state probabilities depend on the event A = 8f4n5_{4y5 = 290 If the event D takes place (the state is 2), then the}

probability of the event A is 1 and the probability of the event ¯A = 8f4n5_{4y5 = 09 is 0. If the event ¯}_{D takes place}

(the new state is 1), then the probabilities of the events A and ¯A are 0.5. Bayes’s formula implies that the posterior probabilities are 41/31 2/351 if f4n5_{4y5 = 21 and 411 051 if f}4n5_{4y5 = 0. Since f}4n5_{425 = 2 with probability 3/4}

and f4n5_{4y5 = 0 with probability 1/41 then q441/31 2/35 z1 1/n5 = 3/4 and q4411 05 z1 1/n5 = 1/4. So, all the}

measures q4· z1 1/n5 are constants and they are not equal to the measure q4· z1 051 which is concentrated at the point z = 40051 00550 Thus the transition kernel q on45 given 45 × is not weakly continuous.

Example 4.2. Under conditions of Corollary3.1there is no weakly continuous stochastic kernel H 4· z1 a1 y5 on given 45 × × satisfying (7). Consider a POMDP with the state and observation spaces = = 811 29; the action space = 6−11 17; the system does not move, that is P 41 11 a5 = P 42 21 a5 = 1 for all a ∈ 3 for each y ∈ the observation kernel Q4y a1 x5 is continuous in a ∈ ,

Q41 a1 15 = ( a1 a ∈ 6−11 051 a2₁ _{a ∈ 601 171} Q41 a1 25 = ( a2₁ _{a ∈ 6−11 051} a1 a ∈ 601 171 Q42 a1 15 = ( 1 − a1 a ∈ 6−11 051 1 − a2₁ _{a ∈ 601 171} Q42 a1 25 = ( 1 − a2₁ _{a ∈ 6−11 051} 1 − a1 a ∈ 601 173 and z = 4z4151 z4255 = 41₂11₂5 is the probability distribution on = 811 29.

Formula (4) with B = 819 and C = 819 implies

R4411 15 z1 a5 =1 2Q41 a1 15 =        a 2 1 a ∈ 6−11 051 a2 21 a ∈ 601 170 (17) Setting C = 819 in (5), we obtain R041 z1 a5 =1 2Q41 a1 15 + 1 2Q41 a1 25 = a + a2 2 1 a ∈ 6−11 170 (18) Formulas (17) and (18) imply that, if H satisfies (6), then

H 41 z1 a1 15 =R4411 15 z1 a5 R0_{41 z1 a5} =          a a + a21 a ∈ 6−11 051 a2 a + a21 a ∈ 401 170 Therefore, lim

a↑0H 41 z1 a1 15 = 1 and lima↓0H 41 z1 a1 15 = 00

Thus, the stochastic kernel H on given 45 × × is not weakly continuous in a, that is, H2 45 × × → 45 is not a continuous mapping. In view of Corollary3.1, Assumption(H)holds.

(12)

Example 4.3. Stochastic kernels P on given × and Q on given × are weakly continuous, the stochastic kernel R0 _on

given 45 × , defined by formula (5), is weakly continuous, but it is not setwise continuous. Though Assumption(H)holds, the stochastic kernel q on45 given 45 × , defined by formula (8), is not weakly continuous.

Let = 811 29, = = 8111 21

1

31 : : : 9 ∪ 809 with the metric 4a1 b5 = a − b, a1 b ∈, and P 4x x1 a5 = 1,

x ∈1 a ∈ . Let also Q40 01 x5 = 1, Q40 1/m1 x5 = Q41/n 01 x5 = 0, x ∈ 1 and Q41/n 1/m1 15 = a_{m1 n}sin24n/42m55, Q41/n 1/m1 25 = a_{m1 n}cos2_{4n/42m55, m1 n = 11 21 : : : 1 where a}

m1 2mk+`= 1/42k+1m5 for k = 01 11 : : : 1 ` = 11 21 : : : 1 2m. SinceP2m `=1sin 2_{4`/42m55 =}P2m `=1cos24`/42m55 = Pm `=14sin 2_{4`/42m55 +} cos2_{4`/42m555 = m, then}P n=1Q41/n 1/m1 x5 = P k=0 P2m `=1Q41/42mk + `5 1/m1 x5 = P k=041/2 k+1_{5 = 1,}

x ∈, and Q is a stochastic kernel on given × . The stochastic kernels P on given × and Q on given × are weakly continuous. The former is true for the same reasons as in Example4.1. The latter is true because lim sup_m→Q4C a_m1 x5 ≤ Q4C 01 x5 for any closed set C in0 Indeed, a set C is closed in if and only if either (i) 0 ∈ C or (ii) 0 y C and C is finite. Let C ⊆ be closed. In case (i), lim sup_m→Q4C a_m1 x5 ≤ 1 = Q4C 01 x5 as a_m→ 0, x ∈_{. In case (ii), lim}_m→Q4C a_m1 x5 = 0 = Q4C 01 x5 as a_m→ 0, since lim_m→Q41/n a_m1 x5 = 0 = Q41/n 01 x5 for n = 11 21 : : : and for x ∈0

Formula (4) implies that R411 1/n z1 1/m5 = z415a_{m1 n}sin24n/42m55, R421 1/n z1 1/m5 = z425a_{m1 n}cos2_{4n/42m55, R411 0 z1 1/m5 = 0, R421 0 z1 1/m5 = 0, and R411 1/n z1 05 = 0, R421 1/n z1 05 = 0,}

R411 0 z1 05 = z415, R421 0 z1 05 = z425 for m1 n = 11 21 : : : , z = 4z4151 z4255 ∈_{45. Formula (}5) yields R0_{40 z1 1/m5 = 0, R}0_{41/n z1 1/m5 = z415a}

m1 nsin24n/42m55 + z425am1 ncos24n/42m55, and R

0_{40 z1 05 = 1,}

R0_{41/n z1 05 = 0 for m1 n = 11 21 : : : 1 z = 4z4151 z4255 ∈}

45. Therefore, R0_{40 z1 1/m5 6→ R}0_{40 z1 05 as m → .}

Thus the stochastic kernel R0_on

given 45 × is not setwise continuous. However, stochastic kernel R0 _on

given45 × is weakly continuous.

Observe that45 = 84z4151 z42552 z4151 z425 ≥ 01 z415 + z425 = 19 ⊂ 2_{. Let z = 4z4151 z4255 ∈}_{45. If}

R0_{4y z1 a5 > 0, in view of (}₆_{), H 4x}0_{z1 a1 y5 = R44x}0_{1 y5 z1 a5/R}0_{4y z1 a5 for all x}0_∈

1 a ∈ , and y ∈ 0 Thus, if R0_{4y z1 a5 > 0, then} H 4z1 a1 y5 =                  _{z415 sin}2_4n/42m55 z415 sin24n/42m55 + z425 cos2_4n/42m551 z425 cos2_4n/42m55 z415 sin24n/42m55 + z425 cos2_4n/42m55 1 if a = 1 m1 y = 1 n1 m1 n = 11 21 : : : 1 4z4151 z42551 if a = y = 00

If R0_{4y z1 a5 = 0, we set H 4z1 a1 y5 = z = 4z4151 z42550 In particular, H 4z1 1/m1 05 = z for all m = 11 21 : : : 0}

Observe that Assumption(H)holds because, if R0_{4y z1 a5 > 0 and if sequences 8z}4N 5₉

N =1121 : : : ⊆45 and

8a4N 5₉

N =1121 : : : ⊆ converge to z ∈ 45 and a ∈ , respectively, as N → , then H4z4N 51 a4N 51 y5 → H 4z1 a1 y5 as

N → 0 Indeed, it is sufficient to verify this property only for the following two cases: (i) y = 1/n, a = 1/m, and R0_{41/n z1 1/m5 > 0, where m1 n = 11 21 : : : 1 and (ii) y = a = 00 In case (i), a}4N 5_{= 1/m, when N is large enough,}

and H 4z4N 5_{1 1/m1 1/n5 → H 4z1 1/m1 1/n5 as N → because the function H 4z1 1/m1 1/n5 is continuous in z, when}

R0_{41/n z1 1/m5 > 0. For case (ii), H 4z}4N 5_{1 a}4N 5_{1 05 = z}4N 5_{→ z as N → 0}

Fix z = 41₂11

25. According to the above formulae, H 4z1 1/m1 1/n5 = 4sin

2_{4n/2m51 cos}2_{4n/2m55 and}

R0_{41/n z1 1/m5 = a}

m1 n/2. Consider a closed subset D = 84z

0_{4151 z}0_{4255 ∈} 452 z0_{415 ≥}3 49 in45. Then q4D z1 1/m5 = X n=1121 : : : 8sin24n/2m5 ≥ 3/494a_{m1 n}/25 = X k=0 2m X `=1 Isin4`/42m55 ≥√3/2 44a_{m1 2mk}+ `5/25 = 2m X `=1 I8sin4`/2m5 ≥√3/291/2m X k=0 41/2k+15 →1 3> 0

as m → , where the limit takes place because 62m/37 −P2m

`=1I8sin4`/2m5 ≥

√

3/29 ≤ 1, where 6 · 7 is the integer part of a number, andP

k=041/2

k+1_{5 = 1. In addition, q4D z1 05 = 0 since z y D and q4z z1 05 =}

I8H 4z1 01 05 = z9R0_{40 z1 05 = 1. Thus, lim}

m→q4D z1 1/m5 =13> 0 = q4D z1 05 for a closed set D in450

This implies that the stochastic kernel q on45 given 45 × is not weakly continuous.

5. Continuity of transition kernels for posterior probabilities. This section contains the proofs of Theorems3.4and3.6. The following two versions of Fatou’s lemma for a sequence of measures 84n59_{n=1121 : : :} are used in the proofs provided below.

(13)

Lemma 5.1 (Generalized Fatou’s Lemma). Let be an arbitrary metric space, 84n59n=1121 : : :⊂45, and

8f4n5₉

n=1121 : : : be a sequence of measurable nonnegative ¯-valued functions on . Then

(i) (Royden [28, p. 231]) if 84n5₉

n=1121 : : :⊂45 converges setwise to ∈ 45, then

Z lim inf n→ f 4n5_{4s54ds5 ≤ lim inf} n→ Z f4n54s54n54ds53 (19)

(ii) (Schäl [30, Lemma 2.3(ii)], Ja´skiewicz and Nowak [24, Lemma 3.2], Feinberg et al. [14, Lemma 4], Feinberg et al. [16, Theorem 1.1]) if 84n59_{n=1121 : : :}⊂_{45 converges weakly to ∈ 45, then}

Z lim inf n→1 s0_→sf 4n5_4s0_{54ds5 ≤ lim inf} n→ Z f4n5_4s54n5_4ds50 ₍₂₀₎

Proof of Theorem3.4. According to Parthasarathy [25, Theorem 6.1, p. 40], Shiryaev [32, p. 311], or Billingsley [9, Theorem 2.1], the stochastic kernel q4dz0_{z1 a5 on}

45 given 45 × is weakly continuous if and only if q4D z1 a5 is lower semicontinuous in 4z1 a5 ∈ 445 × 5 for every open set D in 451 that is,

lim inf

n→ q4D z

4n5_{1 a}4n5_{5 ≥ q4D z1 a51} ₍₂₁₎

for all z1 z4n5_∈_{45, and a1 a}4n5_∈_{, n = 11 21 : : : , such that z}4n5_{→ z weakly and a}4n5_{→ a0}

To prove (21), suppose that

lim inf

n→ q4D z

4n5_{1 a}4n5_{5 < q4D z1 a50}

Then there exists ∗_{> 0 and a subsequence 84z}4n115_{1 a}4n115₅₉

n=1121 : : :⊆ 84z4n51 a4n559n=1121 : : : such that

q4D z4n1 15_{1 a}4n1 15_{5 ≤ q4D z1 a5 −}∗₁ _{n = 11 21 : : : 0}

(22) If condition (ii) of Theorem 3.2 holds, then formula (8), the weak continuity of the stochastic kernel R0

on given 45 × (this weak continuity is proved in Hernández-Lerma [20, p. 92]), and Lemma5.1(ii) contradict (22). If condition (i) of Theorem3.2holds, then there exists a subsequence 84z4n1251 a4n125₅₉

n=1121 : : :

⊆ 84z4n115_{1 a}4n115₅₉

n=1121 : : : such that H 4z4n1251 a4n1251 y5 → H 4z1 a1 y5 weakly as n → 1 R

0_{4· z1 a5 almost surely in}

y ∈_{0 Therefore, since D is an open set in 45,} lim inf

n→ I8H 4z

4n1 25_{1 a}4n1 25_{1 y5 ∈ D9 ≥ I8H 4z1 a1 y5 ∈ D91} _R0

4· z1 a5 almost surely in y ∈0 (23)

Formulas (8), (23), the setwise continuity of the stochastic kernel R0 _on

given 45 × , and Lemma5.1(i) imply lim infn→q4D z4n1 251 a4n1 255 ≥ q4D z1 a51 which contradicts (22). Thus (21) holds.

To prove Theorem3.6, we need to formulate and prove several auxiliary facts. Let be a metric space, 45 and45 be, respectively, the spaces of all real-valued functions and all bounded continuous functions defined on_{. A subset A}₀⊆_{45 is said to be equicontinuous at a point s ∈ , if sup}_{f ∈A}

0f 4s

0_{5 − f 4s5 → 0 as s}0_{→ s. A}

subsetA0⊆45 is said to be uniformly bounded, if there exists a constant M < + such that f 4s5 ≤ M1 for all

s ∈ and for all f ∈ A00 Obviously, if a subsetA0⊆45 is equicontinuous at all the points s ∈ and uniformly

bounded, thenA0⊆450

Theorem 5.1. Let1,2, and3 be arbitrary metric spaces, ë 4ds2 s15 be a weakly continuous stochastic

kernel on2 given1, and a subsetA0⊆42×35 be equicontinuous at all the points 4s21 s35 ∈2×3 and

uniformly bounded. If2 is separable, then for every open setO in 2 the family of functions defined on1×3,

AO= 4s₁1 s₃5 → Z O f 4s₂1 s₃5ë 4ds₂ s₁52 f ∈A0 1

is equicontinuous at all the points 4s₁1 s₃5 ∈1×3 and uniformly bounded.

Proof. The familyAconsists of a single function, which is identically equal to 0. Thus, the statement of the

theorem holds whenO = 0 Let O 6= 0 Since A0⊆42×35 is uniformly bounded, then

M = sup f ∈A0 sup s2∈2 sup s3∈3 f 4s₂1 s₃5 < 1 (24)

and, since ë 4ds₂ s₁5 is a stochastic kernel, the family of functionsAO is uniformly bounded byM.

(14)

Let us fix an arbitrary nonempty open setO ⊆ 2 and an arbitrary point 4s11 s35 ∈1×3. We shall prove that

AO⊂41×35 is equicontinuous at the point 4s11 s35. For any s ∈2 and > 0 denote by B4s5 and ¯B4s5,

respectively, the open and closed balls in the metric space2of radius with center s and by S4s5 the sphere in

2of radius with center s. Note that S4s5 = ¯B4s5\B4s5 is the boundary of B4s5. Every ball B4s5 contains a

ball B04s5, 0 < 0≤ , such that

ë 4 ¯B04s5\B04s5 s₁5 = ë 4S04s5 s₁5 = 01

that is, B04s5 is a continuity set for the probability measure ë 4· s15; Parthasarathy [25, p. 50]. SinceO is an open

set in2, for any s ∈O there exists s> 0 such that Bs4s5 is a continuity set for a probability measure ë 4· s15 and B

s4s5 ⊆O. The family 8Bs4s52 s ∈O9 is a cover of O. Since 2 is a separable metric space, by Lindelöf’s lemma, there exists a sequence 8s4j5₉

j=1121 : : :⊂O such that 8B_s4j54s4j552 j = 11 21 : : : 9 is a cover ofO. The sets

A4152= B s4154s 415₅₁ _A425_{2= B} _s4254s 425_5\B _s4154s 415_{51 : : : 1} _A4j5_{2= B} _s4j54s 4j5₅/ j−1 [ i=1 B s4i54s 4i5₅ 1 : : : 1

are continuity sets for the probability measure ë 4· s15. In view of Parthasarathy [25, Theorem 6.1(e), p. 40],

ë 4A4j5_s0 15 → ë 4A 4j5_s 151 as s 0 1→ s11 j = 11 21 : : : 0 (25) Moreover, [ j=1121 : : : A4j5₌

O and A4i5_{∩ A}4j5_{= 1} _{for all i 6= j0} ₍₂₆₎

The next step of the proof is to show that for each j = 11 21 : : :

sup f ∈A0 Z A4j5 f 4s₂1 s0 35ë 4ds2 s 0 15 − Z A4j5 f 4s₂1 s₃5ë 4ds₂ s₁5 → 0 as 4s0 11 s 0 35 → 4s11 s350 (27)

Fix an arbitrary j = 11 21 : : : 0 If ë 4A4j5 s₁5 = 0, then formula (27) directly follows from (25) and (24). Now let ë 4A4j5_s

15 > 0. Formula (25) implies the existence of such > 0 that ë 4A4j5 s 0

15 > 0 for all s 0

1∈ B4s15. We

endow A4j5 _{with the induced topology from}

2and set ë_j4C s0 15 2= ë 4C s0 15 ë 4A4j5_s0 15 1 s0 1∈ B4s151 C ∈B4A 4j5₅₀ Formula (25) yields

ë_j4ds₂ s₁05 converges weakly to ë_j4ds₂ s₁51 in 4A4j5_{5 as s}0

1→ s10 (28)

According to Parthasarathy [25, Theorem 6.8, p. 51],

sup f ∈A0 Z A4j5 f 4s₂1 s₃5ë 4ds₂ s0₁5 −Z A4j5 f 4s₂1 s₃5ë 4ds₂ s₁5 → 01 as s₁0→ s₁0 (29)

Equicontinuity ofA0⊆42×35 at all the points 4s21 s35 ∈2×3 and the inequality f 4s 0 21 s 0 35 − f 4s 0 21 s35 ≤ f 4s0 21 s 0 35 − f 4s21 s35 + f 4s 0 21 s35 − f 4s21 s35 imply lim sup 4s0 21 s035→4s21 s35 sup f ∈A0 f 4s₂01 s0 35 − f 4s 0 21 s35 = 01 for all s2∈20 (30)

Thus, formulas (30) and (28) and Lemma5.1(ii) imply

lim sup 4s0 11 s305→4s11 s35 sup f ∈A0 Z A4j5 4f 4s₂1 s₃05 − f 4s₂1 s₃55ë 4ds₂ s₁05 ≤ Z A4j5 lim sup 4s0 21 s305→4s21 s35 sup f ∈A0 f 4s₂01 s0 35 − f 4s 0 21 s35ë 4ds2 s15 = 00 (31)

Formula (27) follows from (29) and (31).

(15)

Since, for all j = 11 21 : : : and for all 4s0 11 s 0 35 ∈1×3, sup f ∈A0 Z A4j5 f 4s₂1 s0 35ë 4ds2 s 0 15 − Z A4j5 f 4s₂1 s₃5ë 4ds₂ s₁5 ≤ 2Më 4A4j5_s 15 andP j=1ë 4A 4j5_s

15 = ë 4O s15 ≤ 1, then equicontinuity ofAOat the point 4s11 s35 follows from (26) and (27).

Indeed, sup f ∈A0 Z O f 4s₂1 s0 35ë 4ds2 s 0 15 − Z O f 4s₂1 s₃5ë 4ds₂ s₁5 ≤ X j=1121 : : : sup f ∈A0 Z A4j5f 4s21 s 0 35ë 4ds2 s 0 15 − Z A4j5f 4s21 s35ë 4ds2 s15 → 01 as 4s₁01 s0₃5 → 4s₁1 s₃50

As 4s₁1 s₃5 ∈₁×3 is arbitrary, the above inequality implies thatAO is equicontinuous at all the points

4s₁1 s₃5 ∈₁×₃. In particular,_A_O⊆₄₁×₃5.

For a set B ∈B45, let RB be the following family of functions defined on45 × :

RB= 84z1 a5 → R4B × C z1 a52 C ∈B4590 (32)

Lemma 5.2. Let the stochastic kernel P 4dx0 x1 a5on given × be weakly continuous and the stochastic kernel Q4dy a1 x5 on_{given × be continuous in total variation. Consider the stochastic kernel R4· z1 a5} on_{× given 45 × defined in formula (}4). Then, for every pair of open sets _O₁ and _O₂ in_{, the} family of functionsRO1\O2 defined on45 × is uniformly bounded and is equicontinuous at all the points 4z1 a5 ∈_{45 × , that is, for all z1 z}4n5_∈

45, a1 a4n5_∈

, n = 11 21 : : : , such that z4n5_{→ z}_{weakly and a}4n5_{→ a,}

sup

C∈B45

R44O1\O25 × C z

4n5_{1 a}4n5_{5 − R44}

O1\O25 × C z1 a5 → 00 (33)

Proof. Since R is a stochastic kernel, all the functions in the familyRO1\O2 are nonnegative and bounded above by 1. Thus, this family is uniformly bounded. The remaining proof establishes the equicontinuity ofRO1\O2 at all the points 4z1 a5 ∈45 × . First we show that ROis equicontinuous at all the points 4z1 a5 whenO is an open

set in. Theorem5.1, with1= × , 2=, 3=, O = O, ë = P 1 and A0= 84a1 x5 → Q4C a1 x52 C ∈

B459 ⊆ 4 × 5, implies that the family of functions A1

O= 84x1 a5 →

R

OQ4C a1 x

0_{5P 4dx}0_{x1 a52 C ∈}

B459 is equicontinuous at all the points 4x1 a5 ∈ × . In particular, A1

O⊆4 × 5. Thus, Theorem5.1, with

1=45, 2=, 3=, O = , ë 4B z5 = z4B5, B ∈ B45, z ∈ 45, and A0=A1O, implies that the family

RO is equicontinuous at all the points 4z1 a5 ∈45 × . Second, let O1andO2 be arbitrary open sets in. Then

the families of functionsRO1,RO2, andRO1∪O2 are equicontinuous at all the points 4z1 a5 ∈45 × . Thus, for all z1 z4n5_∈_{45, a1 a}4n5_∈_{, n = 11 21 : : : , such that z}4n5_{→ z weakly and a}4n5_{→ a,}

sup C∈B45 R44O1\O25 × C z 4n5_{1 a}4n5_{5 − R44} O1\O25 × C z1 a5 ≤ sup C∈B45 R44O1∪O25 × C z 4n5_{1 a}4n5_{5 − R44} O1∪O25 × C z1 a5 + sup C∈B45 R4O2× C z 4n5_{1 a}4n5_{5 − R4} O2× C z1 a5 → 01 (34)

that is, the family of functionsRO1\O2 is equicontinuous at all the points 4z1 a5 ∈45 × .

Corollary 5.1. Let assumptions of Lemma5.2hold. Then the stochastic kernel R04dy z1 a5 on given 45 × , defined in formula (5), is continuous in total variation.

Proof. This corollary follows from Lemma5.2applied toO1= and O2= .

Theorem 5.2 (cp. Feinberg et al. [18, Corollary 1.8]). Let be an arbitrary metric space, 8h1 h4n59n=1121 : : :

be Borel-measurable uniformly bounded real-valued functions on_{, 8}4n5₉

n=1121 : : :⊂45 converges in total variation to ∈_{45, and} sup S∈B45 Z S h4n54s54n54ds5 − Z S h4s54ds5 → 01 as n → 0 (35) Then 8h4n5₉

n=1121 : : : converges in probability to h, and therefore there is a subsequence 8h4nk59k=1121 : : : ⊆

8h4n5₉

n=1121 : : : such that 8h4nk59k=1121 : : : converges almost surely to h0

(16)

Proof. Fix an arbitrary > 0 and set

S4n1 +52= 8s ∈2 h4n5_{4s5 − h4s5 ≥ 91} _S4n1 −5_{2= 8s ∈}

2 h4s5 − h4n5_{4s5 ≥ 91}

S4n5_{2= 8s ∈}

2 h4n5_{4s5 − h4s5 ≥ 9 = S}4n1 +5_{∪ S}4n1 −5₁ _{n = 11 21 : : : 0}

Note that for all n = 11 21 : : :

4n54S4n1 +55 ≤ Z S4n1 +5 h4n54s54n54ds5 − Z S4n1 +5 h4s54n54ds5 ≤ Z S4n1 +5h 4n5_4s54n5_{4ds5 −}Z S4n1 +5h4s54ds5 + Z S4n1 +5h4s5 4n5_{4ds5 −}Z S4n1 +5h4s54ds5 0 (36)

The convergence in total variation of 4n5_{to ∈}_{45 implies that}

Z S4n1 +5 h4s54n5_{4ds5 −}Z S4n1 +5 h4s54ds5 → 0 and 4n5_4S4n1 +5_{5 − 4S}4n1 +5₅ → 01 as n → 0 (37) Formula (35) yields R_S4n1 +5h 4n5_4s54n5_{4ds5 −}R

S4n1 +5h4s54ds5 → 0 as n → . Thus, in view of (36) and (37),

4S4n1 +5_{5 → 01} _{as n → 0} ₍₃₈₎

Being applied to the functions 8−h1 −h4n5₉

n=1121 : : :, formula (38) implies that 4S4n1−55 → 0 as n → 0 This fact

and (38) yield 4S4n5_{5 = 4S}4n1 +5_{5 + 4S}4n1 −5_{5 → 0 as n → 0 Since > 0 is arbitrary, 8h}4n5₉

n=1121 : : : converges to

h in probability and, therefore, 8h4n5₉

n=1121 : : : contains a subsequence 8h4nk59k=1121 : : : that converges almost

surely to h.

Lemma 5.3. If the topology on has a countable base b= 8O4j59j=1121 : : : such that, for each finite

intersection _{O =}TN

i=1O4ji5 of its elementsO4ji5∈ b1 i = 11 21 : : : 1 N , the family of functions RO defined in

(32) is equicontinuous at all the points 4z1 a5 ∈45 × , then for any sequence 84z4n5_{1 a}4n5₅₉

n=1121 : : :1 such

that 8z4n5₉

n=1121 : : :⊆45 converges weakly to z ∈ 45 and 8a4n59n=1121 : : : ⊆ converges to a, there exists a

subsequence 84z4nk5_{1 a}4nk5₅₉

k=1121 : : : and a set C ∗_∈

B45 such that

R04C∗ z1 a5 = 1 and H 4· z4nk5_{1 a}4nk5_{1 y5 converges weakly to H 4· z1 a1 y51} _{for all y ∈ C}∗₁ ₍₃₉₎

and, therefore, Assumption(H)holds.

As explained in Feinberg et al. [17, Remark 4.5], the intersection assumption in Lemma 5.3is equivalent to the similar assumption for finite unions.

Proof of Lemma5.3. According to Billingsley [9, Theorem 2.1] or Shiryaev [32, p. 311], (39) holds if there exists a subsequence 84z4nk5_{1 a}4nk5₅₉

k=1121 : : : of the sequence 84z4n51 a4n559n=1121 : : : and a set C ∗_∈

B45 such that

R0_4C∗_{z1 a5 = 1} _and _{lim inf} k→ H 4O z

4nk5_{1 a}4nk5_{1 y5 ≥ H 4}_{O z1 a1 y51} _{for all y ∈ C}∗₁ ₍₄₀₎

for all open setsO in . The rest of the proof establishes the existence of a subsequence 84z4nk5_{1 a}4nk5₅₉

k=1121 : : : of

the sequence 84z4n5_{1 a}4n5₅₉

n=1121 : : : and a set C ∗_∈

B45 such that (40) holds for all open setsO in .

LetA1be a family of all the subsets of that are finite unions of sets from b, and letA2be a family of all

subsets B of such that B = ˜O\O0 _{with ˜}

O ∈ b andO 0_∈

A1. Observe that (i) bothA1 andA2 are countable, (ii)

any open setO in can be represented as O = [ j=1121 : : : O4j1 15₌ [ j=1121 : : : B4j1 151 for someO4j1 15_∈ b1 j = 11 21 : : : 1 (41) where B4j1 15₌_O4j1 15_\4Sj−1

i=1O4i1 155 are disjoint elements ofA2(it is allowed thatO4j1 15= or B4j1 15= for some

j = 11 21 : : : ).

To prove (40) for all open setsO in , we first show that there exists a subsequence 84z4nk5_{1 a}4nk5₅₉

k=1121 : : : of

the sequence 84z4n5_{1 a}4n5₅₉

n=1121 : : : and a set C ∗_∈

B45 such that (40) holds for allO ∈ A20