Factored MDPs

Top PDF results for Factored MDPs:

Solving factored MDPs with exponential-family transition models

Generalized hybrid factored MDPs. Discrete-state factored MDPs (Boutilier, Dearden, & Goldszmidt 1995) permit a compact representation of stochastic decision problems by exploiting their structure. In this section, we introduce a new formalism for representing hybrid factored MDPs with an exponential-family transition model. This formalism is based on the HMDP framework (Guestrin, Hauskrecht, & Kveton 2004) and generalizes its mixture-of-beta transition model for continuous variables.
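The factored representation this excerpt describes can be sketched as a product of per-variable conditional distributions, as in a dynamic Bayesian network. A minimal illustration, with variables, parent sets, and probabilities invented for the example rather than taken from the paper:

```python
# A sketch of a discrete-state factored MDP transition model: the next-state
# distribution factors as a product of per-variable conditionals
# P(x_i' | parents_i(x)), as in a DBN. All tables below are illustrative.
cpts = {
    # variable -> (parent variables, {parent values: {next value: prob}})
    "a": (("a",), {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}),
    "b": (("a", "b"), {
        (0, 0): {0: 1.0, 1: 0.0}, (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.0, 1: 1.0},
    }),
}

def transition_prob(state, next_state):
    """P(x' | x) computed as the product of per-variable conditionals."""
    p = 1.0
    for var, (parents, table) in cpts.items():
        key = tuple(state[parent] for parent in parents)
        p *= table[key][next_state[var]]
    return p

# The factored model defines a proper distribution over next states:
state = {"a": 1, "b": 0}
total = sum(transition_prob(state, {"a": a, "b": b})
            for a in (0, 1) for b in (0, 1))  # sums to 1.0
```

The compactness comes from each conditional table growing with the number of parents of one variable, not with the full joint state space.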

An MCMC Approach to Solving Hybrid Factored MDPs

To overcome the limitations of the discussed constraint satisfaction techniques, we propose a novel Markov chain Monte Carlo (MCMC) method for finding the most violated constraint of a relaxed HALP. The method operates directly in the domains of continuous variables, takes into account the structure of factored MDPs, and has space complexity proportional to the number of variables. Such a separation oracle can be easily embedded into the ellipsoid or cutting-plane method for solving linear programs, and therefore constitutes a key step towards solving HALP efficiently.

Piecewise Linear Value Function Approximation for Factored MDPs

The main drawback of linear models is the need for a good basis set. While these approaches may scale, the quality of the approximation depends critically on the underlying basis. If no decent approximate value function lies in the subspace spanned by the basis, it is impossible to obtain good solutions using such techniques. Unfortunately, the recent work on linear approximations for factored MDPs offers no proposals for either: (a) the choice of a good basis; or (b) the modification of an existing basis to improve decision quality. Studies to date have used simple characteristic functions over (very small) subsets of state variables.

Efficient Reinforcement Learning in Factored MDPs

Structured probabilistic models, and particularly Bayesian networks, have revolutionized the field of reasoning under uncertainty by allowing compact representations of complex domains. Their success is built on the fact that this structure can be exploited effectively by inference and learning algorithms. This success leads one to hope that similar structure can be exploited in the context of planning and reinforcement learning under uncertainty. This paper, together with the recent work on representing and reasoning with factored MDPs [Boutilier et al., 1999], demonstrates that substantial computational gains can indeed be obtained from these compact, structured representations.

An Associative State-Space Metric for Learning in Factored MDPs

In this paper we proposed a new associative state-space metric for factored MDPs that draws inspiration from classical conditioning in nature. Our metric relies on associations between state variables identified by the learning agent during its interaction with the environment. These associations are learned using a sensory pattern-mining algorithm and determine the similarity between states, thus providing a state-space metric that requires no prior knowledge of the structure of the underlying decision problem. The sensory pattern-mining algorithm relies on an associative sensory tree, which captures the frequency of co-occurrence of stimuli in the agent's environment.

Solving Factored MDPs with Continuous and Discrete Variables

We present the first framework that exploits problem structure and solves large hybrid MDPs efficiently. The MDPs are modelled by hybrid factored MDPs, where the stochastic dynamics is represented compactly by a probabilistic graphical model, a hybrid dynamic Bayesian network (DBN) (Dean & Kanazawa 1989). The solution of the MDP is approximated by a linear combination of basis functions (Bellman, Kalaba, & Kotkin 1963; Bertsekas & Tsitsiklis 1996). Specifically, we use a factored (linear) value function (Koller & Parr 1999), where each basis function depends on a small number of state variables. We show that the weights of this approximation can be optimized using a convex formulation that we call hybrid approximate linear programming (HALP).
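The factored linear value function mentioned above can be sketched as a weighted sum of basis functions, each depending on a small scope of state variables. The basis functions and weights below are illustrative; in HALP the weights would come from the convex optimization the abstract describes:

```python
# A sketch of a factored linear value function V(x) = sum_i w_i * h_i(x),
# where each basis function h_i reads only a small subset of the state.
# Scopes, functions, and weights are invented for illustration.
basis = [
    (("a",), lambda x: 1.0 if x["a"] == 1 else 0.0),  # indicator over one variable
    (("b",), lambda x: float(x["b"])),                # linear in one variable
    ((), lambda x: 1.0),                              # constant bias basis
]
weights = [2.0, -0.5, 1.0]  # would be optimized (e.g. by an LP) in practice

def value(x):
    """Evaluate the factored linear value function at state x."""
    return sum(w * h(x) for w, (_scope, h) in zip(weights, basis))

v = value({"a": 1, "b": 3})  # 2.0*1 + (-0.5)*3 + 1.0*1 = 1.5
```

Because each basis function touches only a few variables, expectations and constraints over the value function decompose along the DBN structure, which is what makes the approach tractable in large state spaces.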

Efficient solutions to factored MDPs with imprecise transition probabilities

Markov Decision Processes (MDPs) [1] have become the de facto standard model for decision-theoretic planning problems, and a great deal of research in recent years has aimed to exploit structure in order to compactly represent and efficiently solve factored MDPs [2–5]. However, in many real-world problems it is simply impossible to obtain a precise representation of the transition probabilities in an MDP. This may occur for many reasons, including (a) imprecise or conflicting elicitations from experts, (b) insufficient data from which to estimate reliable precise transition models, or (c) non-stationary transition probabilities due to insufficient state information.

Causal Graph Based Decomposition of Factored MDPs

Markov decision processes, or MDPs, are widely used to model stochastic control tasks. Many researchers have developed algorithms that determine optimal or near-optimal decision policies for MDPs. However, most of these algorithms scale poorly as the size of a task grows. Much recent research on MDPs has focused on finding task structure that makes it possible to simplify construction of a useful policy. In this paper, we present Variable Influence Structure Analysis, or VISA, an algorithm that identifies task structure in factored MDPs and combines hierarchical decomposition and state abstraction to exploit task structure and simplify policy construction. VISA was first introduced in a conference paper (Jonsson and Barto, 2005); this paper provides more detail and additional insights as well as a new section on compact activity models.

Discovering hidden structure in factored MDPs

To our knowledge, there have been no previous attempts to handle identification of dead ends in MDPs. The "Sensitive but Slow" and "Fast but Insensitive" mechanisms were not actually designed specifically for the purpose of identifying dead ends and are unsatisfactory in many ways. One possible reason for this omission may be that most MDPs studied by the Artificial Intelligence and Operations Research communities until recently had no dead ends. However, MDPs with dead ends have been receiving attention in the past few years as researchers realized their probabilistic interestingness [35].

Besides the analogy to EBL, SixthSense can also be viewed as a machine learning algorithm for rule induction, similar in purpose, for example, to CN2 [12]. While this analogy is valid, SixthSense operates under different requirements than most such algorithms, because we demand that SixthSense-derived rules (nogoods) have a zero false-positive rate.

Last but not least, our term "nogood" shares its name with and closely mirrors the concept from the areas of truth maintenance systems (TMSs) [13] and constraint satisfaction problems (CSPs) [14]. However, our methodology for finding nogoods has little in common with algorithms used in that literature.

Stochastic dynamic programming with factored representations

Dietterich and Flann [32,33] also consider the application of regression methods to the solution of MDPs in the context of reinforcement learning. Their original proposal [32] is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS-like operators), thus rendering true goal-regression techniques directly applicable. They extend their approach in [33] to allow stochastic actions, thus providing a stochastic generalization of goal regression. One key difference between their model and ours is that they deal exclusively with goal-based problems whereas we allow general reward functions. Thus we might classify their work as stochastic regression and ours as decision-theoretic regression.

The general motivation and spirit of their proposal is very similar to ours, but focuses on different representations. In the abstract, Dietterich and Flann simply require operators (actions) that can be inverted, and they develop grid-world navigation and chess end-games as examples of deterministic regression. In the stochastic case, Dietterich and Flann place an emphasis on algorithms for manipulating rectangular regions of grid worlds.

In contrast, our approach deals with general DBN/decision-tree representations of discrete, multi-variable systems. Our decision-tree representation has certain advantages in multi-variable domains (e.g., we will see below that it provides leverage for approximation). In navigation domains (to take one example), the region-based representation is clearly superior, as such domains offer very little structure that can be exploited by a decision tree. Both approaches can be seen as particular instances of a more general approach to regression in MDPs.

A Simplified Chinese Parser with Factored Model

In this evaluation, we use the TCT Treebank as the development and experimental data. The Treebank uses an annotation scheme with double tagging (Zhou, 2004). Under this scheme, every sentence is annotated with a complete parse tree, where each non-terminal constituent is assigned two tags, a syntactic constituent tag and a grammatical relation tag; this annotation scheme is new and differs from the head-constituent annotation in previous TCT versions. To fit this annotation of TCT, we use the unlexicalized model for PCFG parsing and the CKY-based decoder in the Stanford parser. Finally, we mainly use TregEx (Levy, 2006), a useful tool for visualizing and querying syntactic structures, to generate a head propagation table for the factored model in order to improve performance.

Factored models for Deep Machine Translation

We contributed mainly in two directions: better analysis with an improved pipeline for Bulgarian, and different, more complex types of factored models to explore successful factor combinations. We have experimented with a number of combinations of the listed factors, language model types (word and POS), and translation and generation steps. The best performing model featuring a semantic factor for the direction BG→EN includes four factors: word form, lemma, POS and variable type, with word- and POS-based language models. In the transfer step, two alternative approaches are used. If possible, a mapping …

Interval Iteration Algorithm for MDPs and IMDPs

… is satisfied along paths of the IMDP M starting in state s and following policy σ. Regarding the definitions, IMDPs may be seen as an extension of MDPs with an infinite (even uncountable) set of actions, without taking into account the randomisation in policies. This makes their study a priori more complex. However, one of the contributions of [13] regarding IMDPs is to show that their behaviour can be captured by finite MDPs. We now explain this reduction, which we will use for proofs but not for algorithms, since it constructs a finite MDP with a number of actions exponentially larger than the original IMDP. The main idea is to make explicit the set of possible choices of probability distributions in Steps(a) for a given action a ∈ A(s). Recall that it consists of all distributions p ∈ Dist(S) such that …
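The set Steps(a) described above can be sketched as a membership test: a candidate distribution belongs to it exactly when it is a proper distribution and respects every interval bound on successor probabilities. The states and bounds below are invented for illustration:

```python
# A sketch of the interval constraints defining Steps(a) in an IMDP:
# p is admissible iff it sums to one and each p(s') lies in its interval.
# The successor states and bounds are illustrative, not from the paper.
intervals = {"s0": (0.1, 0.5), "s1": (0.3, 0.9)}

def in_steps(p, intervals, eps=1e-9):
    """Return True iff p is a distribution respecting all interval bounds."""
    if abs(sum(p.values()) - 1.0) > eps:
        return False
    return all(lo - eps <= p.get(s, 0.0) <= hi + eps
               for s, (lo, hi) in intervals.items())

in_steps({"s0": 0.4, "s1": 0.6}, intervals)  # admissible
in_steps({"s0": 0.6, "s1": 0.4}, intervals)  # rejected: 0.6 exceeds s0's upper bound
```

The exponential blow-up mentioned in the excerpt comes from enumerating the extreme points of this polytope of admissible distributions, one finite-MDP action per extreme point.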

Factored Upper Bounds for Multiagent Planning Problems under Uncertainty with Non-Factored Value Functions

… took 3.31 seconds for IO-Q-MMDP and 2696.23 seconds for IO-Q-Dec-POMDP. Fig. 5 shows the results, which indicate that the upper bound is relatively tight: the solutions found by TP are not far from the upper bound. In particular, the EAF lies typically between 1.4 and 1.7, thus demonstrating that IO-UBs can provide firm guarantees for solutions of factored Dec-POMDPs with up to 700 agents. Moreover, we see that the EAF stays roughly constant for the larger problem instances, indicating that the relative guarantees do not degrade as the number of agents increases.

Factored Models for Probabilistic Modal Logic

Abstract. Modal logic represents knowledge that agents have about other agents' knowledge. Probabilistic modal logic further captures probabilistic beliefs about probabilistic beliefs. Models in those logics are useful for understanding and decision making in conversations, bargaining situations, and competitions. Unfortunately, probabilistic modal structures are impractical for large real-world applications because they represent their state space explicitly. In this paper we scale up probabilistic modal structures by giving them a factored representation. This representation applies conditional independence for factoring the probabilistic aspect of the structure (as in Bayesian Networks (BN)). We also present two exact and one approximate algorithm for reasoning about the truth value of probabilistic modal logic queries over a model encoded in a factored form. The first exact algorithm applies inference in BNs to answer a limited class of queries. Our second exact method applies a variable elimination scheme and is applicable without restrictions. Our approximate algorithm uses sampling and can be used for applications with very large models. Given a query, it computes an answer and its confidence level efficiently.

Factored Translation with Unsupervised Word Clusters

Based on the hypothesis that the factorisations are beneficial when translating some sentences and not others, we completed an oracle-based evaluation, in which we assume to know a priori whether to use the factored model for translating a given sentence, or just go with the baseline, unfactored model. In reality, we don't have such an oracle method for arbitrary sentences, but when dealing with the shared task test set (or other corpora for which we have reference translations), it was easy enough to check per-sentence BLEU scores for each model and make the decision based on a comparison. Table 1b lists BLEU scores obtainable with each factor configuration given such an oracle method. In this scenario, most factored models beat the baseline, indicating that the factorisations are beneficial for certain sentences and detrimental for others.
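The oracle selection the excerpt describes can be sketched directly: for each sentence, keep whichever system's output scores higher sentence-level BLEU against the reference. The per-sentence scores below are made up for illustration:

```python
# A sketch of oracle-based system selection: per sentence, take the system
# with the higher sentence-level BLEU. The scores are invented for the example.
baseline_bleu = [0.30, 0.55, 0.20]  # per-sentence BLEU, baseline system
factored_bleu = [0.45, 0.50, 0.35]  # per-sentence BLEU, factored system

choices = ["factored" if f > b else "baseline"
           for b, f in zip(baseline_bleu, factored_bleu)]
oracle_scores = [max(b, f) for b, f in zip(baseline_bleu, factored_bleu)]
oracle_mean = sum(oracle_scores) / len(oracle_scores)
```

The oracle mean is an upper bound on what any per-sentence selection strategy could achieve; the gap between it and either single system is what motivates the analysis in the excerpt.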

Factored Planning: How, When, and When Not

The idea of divide and conquer through domain decomposition has always appealed to planning researchers. In this paper we provided a formal study of some of the fundamental questions factored planning brings up. This study resulted in a number of key results and insights. First, it provides a novel factored planning approach that is more efficient than the best previous method of (Amir & Engelhardt 2003). Second, it identifies the domain's causal graph as one of the key parameters in the complexity of factored and non-factored planning. Third, the complexity analysis provided enables us to compare between the complexity of standard and factored methods, and provides new classes of tractable planning problems. As we noted, these tractable classes appear to be of genuine practical interest, which has not often been the case for past results on tractable planning. Finally, our analysis helps to understand what makes one factorization better than another, and makes a concrete recommendation on how to factor a problem domain both in presence and in absence of additional domain knowledge.

Dec-POMDPs as Non-Observable MDPs

Letting δ̄ denote the joint decision rules that map information states to such augmented joint actions, we can define a plan-time model where the transition function no longer depends on …

Factored Markov Translation with Robust Modeling

5.5 Comparison with Lexical Reordering

Our Markov model learns a joint model of jump, source and target factors, and this is similar to the lexical reordering model of Moses (Koehn et al. …

Reinforcement Learning with Factored States and Actions

First, we implemented the direct policy algorithm of Peshkin et al. (2000). This algorithm is designed to learn policies for MDPs on factored state and action spaces. To parameterize the policy, we used a feed-forward neural network with one hidden layer. The number of hidden units was chosen to match the number of hidden variables in the competing restricted Boltzmann machine. The output layer of the neural network consisted of a softmax unit for each action variable, which gave the probability of executing each value for that action variable. For example, if an action variable has four possible values, then there are separate inputs (weights and activations) entering the output unit for each of the four possible values. The output unit produces four normalized probabilities by first exponentiating the values, and then normalizing by the sum of the four exponentiated values.
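The per-action-variable softmax described above (exponentiate each activation, then normalize by their sum) can be sketched as follows; the activation values are illustrative, not from the paper:

```python
import math

# A sketch of a softmax output unit for one action variable: its activations
# are exponentiated and normalized into a distribution over the variable's
# possible values. The four activation values below are invented.
def softmax(activations):
    """Map raw activations to normalized probabilities."""
    exps = [math.exp(a) for a in activations]
    z = sum(exps)
    return [e / z for e in exps]

# An action variable with four possible values gets four activations.
probs = softmax([2.0, 1.0, 0.5, 0.5])  # probabilities summing to 1
```

With one such unit per action variable, the joint action distribution factors as a product over variables, which is what keeps the policy tractable on factored action spaces.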

118 documents in total.