Course MDP: Markov Decision Processes

Time:	9 Mondays: 13.00 - 14.45 (November 14 - December 12 & January 30 - February 20).
Location:	Room 107A, Buys Ballot Laboratorium, Princetonlaan 4, The Uithof, Utrecht.
Lecturers:	Dr. S. Bhulai (VU) and Prof.dr. L.C.M. Kallenberg (UL/UM).

Course description:
The theory of Markov decision processes (MDPs) - also known under the names sequential decision theory, stochastic control or stochastic dynamic programming - studies sequential optimization of stochastic systems by controlling their transition mechanism over time. Each control policy defines a stochastic process and values of objective functions associated with this process. The goal is to select a control policy that optimizes a function of the values generated by the utility functions.
In real life, decisions that are made usually have two types of impact. Firstly, they cost or save resources, such as money or time. Secondly, by influencing the dynamics of the system they have an impact on the future as well. Therefore, the decision with the largest immediate profit may not be good in view of future rewards in many situations. MDPs model this paradigm and provide results on the structure and existence of good policies, and on methods for the computation of optimal policies.
Contents of the lectures:
1. Model formulation, policies, optimality criteria, the finite horizon.
2. Discounted rewards: optimality equation and solution methods.
3. Average rewards: optimality equation and solution methods.
4. Modelling issues in MDPs.
5. Structural properties.
6. More sensitive rewards, part 1: Laurent expansion and solution methods.
7. More sensitive rewards, part 2: Blackwell optimality.
8. Applications of MDPs.
9. Further topics in MDPs

Literature and examination:
Sheets including exercises of part II (Kallenberg) are weekly available as pdf file.
The sheets and exercises of week 1 (January 30) : Sheets and exercises week 1
The sheets and exercises of week 2 (February 6) : Sheets and exercises week 2
For the paper by Avrachenkov and Altman see: Nested linear programs for ergodic MDPs.
The sheets and exercises of week 3 (February 13) : Sheets and exercises week 3
The sheets and exercises of week 4 (February 20) : Sheets and exercises week 4
The sheets and exercises of part II of the course (4 weeks; January 30 until February 20) : Sheets

Prerequisites:
- Elementary knowledge of linear programming (e.g. K.G. Murty, Linear programming, Wiley, 1983).
- Elementary knowledge of probability theory ( e.g. S.M. Ross, A first course in probability, Macmillan, New York, 1976).
- Elementary knowledge of (numerical) analysis (e.g. Banach space; contracting mappings; Newton’s method; Laurent series).

Examination:
Take home problems.
There are two options:
1. You make the exercises alone: in that case you choose each week two exercises out of three exercises which are given each week (see the sheets).
2. You make the exercises as couple: in that case you have to make the three exercises which are given each week (see the sheets).

Address of the lecturers:
Dr. S. Bhulai
Faculty of Mathematics and Computer Science
Free University Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam
Phone: 020 – 4447755
E-mail: sbhulai@few.vu.nl

Prof.dr. L.C.M. Kallenberg
Mathematical Institute
Leiden University, P.O. Box 9512, 2300 RA Leiden
Phone: 071-5277130
E-mail: kallenberg@math.leidenuniv.nl

Last modified: February 15, 2006