Bandit Problems

The consequences of behavioural bias: Bandit problems and product liability law

... contexts, bandit problems and the case of legal decision ... to bandit problems, the focus of interest is to examine the role of risk aversion and loss aversion, which are both excluded from ...

243 pages

Optimistic Bayesian Sampling in Contextual-Bandit Problems

... At the other end of the spectrum, in belief-lookahead methods, such as those suggested by Gittins (1979), a fully Bayesian approach is incorporated in which the action yielding the highest expected cumulative reward over ...

38 pages

MODIFIED ACTION VALUE METHOD APPLIED TO ‘n’-ARMED BANDIT PROBLEMS USING REINFORCEMENT LEARNING

... Reinforcement Learning (RL) is an area of Artificial Intelligence (AI) concerned with how an agent should take actions in a stochastic environment so as to optimize a cumulative reward signal. This paper investigates a ...

7 pages

Klein, Nicolas (2010): Learning and Experimentation in Strategic Bandit Problems. Dissertation, LMU München: Volkswirtschaftliche Fakultät

... Bandit problems have been used in economics to study the trade-off between experimentation and exploitation since Rothschild’s (1974) discrete-time single-agent ... two-armed bandit machines, as well ...

149 pages

Optimal Policies for Observing Time Series and Related Restless Bandit Problems

... Restless Bandit Approach to Multi-Target ... multi-armed bandit approach to the multi-target tracking problem, but they did not pursue the Whittle index approach, rather focussing on trying to find ...

93 pages

Mechanisms with learning for stochastic multi armed bandit problems

... multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in the context of online ... of problems namely stochastic MAB problems where the rewards are ... MAB ...

44 pages

Approximations of the Restless Bandit Problem

... multi-armed bandit problems arise in various modern real-world applications, such as online advertisement, and Internet ... These problems are typically studied under the assumption that the pay-offs ...

37 pages

BinaryBandit: An Efficient Julia Package for Optimization and Evaluation of the Finite Horizon Bandit Problem with Binary Responses

... of bandit problems with more than two arms, such as the Gittins and Whittle index rules, and to cover randomized designs such as those common in the biostatistics literature for adaptive clinical ...

15 pages

Exploration vs Exploitation with Partially Observable Gaussian Autoregressive Arms

... This paper provides a starting point for a rigorous investigation of the structural properties and performance of index policies in partially observable restless bandit problems with AR(1) arms. This ...

8 pages

Parallelizing Exploration-Exploitation Tradeoffs in Gaussian Process Bandit Optimization

... contextual bandit problems with finite decision sets, and thus not to settings with complex (even nonparametric) payoff ... sequential bandit algorithms to the delayed, finite decision set ...

51 pages

Optimizing Adaptive Marketing Experiments with the Multi-Armed Bandit

... multi-armed bandit problems in ... a bandit problem that does not have an existing solution framework, propose such a solution ... cated bandit problem with many components: attributes ...

148 pages

Regret Bounds and Minimax Policies under Partial Monitoring

... We reduce the above gaps by improving the upper bounds as shown by Table 2. Different proof techniques are used and new forecasting strategies are proposed. The most original contribution is the introduction of a new ...

52 pages

Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation

... deterministic bandit logs is possible despite these seemingly contradictory theoretical ... simulated bandit feedback for two different SMT tasks, showing improvements of up to 2 BLEU in SMT domain ...

11 pages

On Bandit Organizations and Their (IL)Legitimacy: Concept Development and Illustration

... that bandit organizations foster a specific perception of the political and economic establishment – as illegitimate – to build their own flows of ... how bandit organizations form field level ...

45 pages

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

... Part I. Let us consider the following three hypothesis testing problems. For each problem, we are interested in a δ-correct policy, i.e., a policy whose probability of error is less than δ under any hypothesis. We ...

26 pages

Bandit Structured Prediction for Neural Sequence to Sequence Learning

... (2016). Bandit Learning. The NMT bandit models that optimize the EL objective yield generally a much higher improvement over the out-of-domain models than the corresponding linear models: As listed in ...

11 pages

Kernel Estimation and Model Combination in A Bandit Problem with Covariates

... of bandit problem: for every visitor interaction event, only one article is displayed, and we only have this visitor’s response to the displayed article, while his/her response to other articles is not available, ...

37 pages

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

... Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up ...

52 pages

LIMSI Submission for WMT’17 Shared Task on Bandit Learning

... This paper describes LIMSI participation to the WMT’17 shared task on Bandit Learning. The method we propose to adapt a seed system trained on out-domain data to a new, unknown domain relies on two components. ...

6 pages

A multi-armed bandit approach for exploring partially observed networks

... Active search on graphs (Wang et al. 2013; Bilgic et al. 2010) is another related problem with the objective of finding as much target nodes as possible possessing a given property. Most of the previous work relating ...

18 pages
