Bellman’s recursion for T = ∞ - Sequential Resource Allocation Under Uncertainty: An Index Poli

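Below is Bellman's recursion on the value functions for a single-task system when $T = \infty$.

If $l = U$ and $w_x = 0$:

$$V_x(\alpha_x, \beta_x, w_x, l) = R(\alpha_x, \beta_x); \tag{A.7a}$$

if $l = U$ and $w_x > 0$:

$$V_x(\alpha_x, \beta_x, w_x, l) = \frac{\alpha_x}{\alpha_x + \beta_x}\, V_x(\alpha_x + 1, \beta_x, w_x - 1, l) + \frac{\beta_x}{\alpha_x + \beta_x}\, V_x(\alpha_x, \beta_x + 1, w_x - 1, l); \tag{A.7b}$$

if $l < U$:

$$V_x(\alpha_x, \beta_x, w_x, l) = \frac{r}{q_x} \max_{a_{l,x} \in \{0,1\}} \bigl\{ V_x(\alpha_x, \beta_x, w_x + a_{l,x}, l + 1) - \lambda_{l+1}\, a_{l,x} \bigr\} + \frac{\mu w_x}{q_x} \left[ \frac{\alpha_x}{\alpha_x + \beta_x}\, V_x(\alpha_x + 1, \beta_x, w_x - 1, l) + \frac{\beta_x}{\alpha_x + \beta_x}\, V_x(\alpha_x, \beta_x + 1, w_x - 1, l) \right].$$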
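The recursion above is well-founded: every step either increases $l$ or keeps $l$ fixed and decreases $w_x$, so it can be evaluated directly with memoization. The following is a minimal Python sketch under several assumptions that this section does not state. The normalizing rate is taken to be $q_x = r + \mu w_x$, the garbled terms are read as the fractions $r/q_x$ and $\mu w_x/q_x$, and `make_value_function`, `reward`, and the list `lam` (holding $\lambda_1, \dots, \lambda_U$ at indices 1 to U) are hypothetical names introduced only for illustration. The reward used in the usage example is a placeholder, not the $R$ defined in the source.

```python
from functools import lru_cache


def make_value_function(U, r, mu, lam, reward):
    """Build V(alpha, beta, w, l) following recursion (A.7).

    Assumptions (not taken from the text):
      * q = r + mu * w is the total event rate, with r > 0;
      * lam[l] holds lambda_l for l = 1..U (lam[0] is unused);
      * reward(alpha, beta) stands in for R(alpha, beta).
    """

    @lru_cache(maxsize=None)
    def V(alpha, beta, w, l):
        if w > 0:
            # One outstanding unit completes: Beta-Bernoulli posterior update.
            p = alpha / (alpha + beta)
            completed = (p * V(alpha + 1, beta, w - 1, l)
                         + (1 - p) * V(alpha, beta + 1, w - 1, l))
        else:
            completed = 0.0  # no outstanding work, this term has weight zero

        if l == U:
            # (A.7a) when w == 0, (A.7b) when w > 0.
            return reward(alpha, beta) if w == 0 else completed

        # Case l < U: choose whether to allocate (a = 1) at price lam[l + 1].
        q = r + mu * w
        best = max(V(alpha, beta, w + a, l + 1) - lam[l + 1] * a
                   for a in (0, 1))
        return (r / q) * best + (mu * w / q) * completed

    return V


# Usage with a placeholder reward (hypothetical, for illustration only).
def placeholder_reward(alpha, beta):
    return max(alpha, beta) / (alpha + beta)


V = make_value_function(U=1, r=1.0, mu=2.0, lam=[0.0, 0.0],
                        reward=placeholder_reward)
value_at_start = V(1, 1, 0, 0)
```

Because the state includes the integers $\alpha_x$ and $\beta_x$, the cache grows with the number of reachable posterior states. For large $U$, an explicit backward pass over $l = U, U-1, \dots, 0$ would avoid deep recursion.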


In document Sequential Resource Allocation Under Uncertainty: An Index Policy Approach (Page 97-101)