Time: 9 Mondays, 13.00 - 14.45 (November 14 - December 12 and January 30 - February 20).
Location: Room 107A, Buys Ballot Laboratorium, Princetonlaan 4, The Uithof, Utrecht.
Lecturers: Dr. S. Bhulai (VU) and Prof.dr. L.C.M. Kallenberg (UL/UM).
Course description:
The theory of Markov decision processes (MDPs) - also known under the
names sequential decision theory, stochastic control or stochastic
dynamic programming - studies sequential optimization of stochastic
systems by controlling their transition mechanism over time. Each
control policy defines a stochastic process and values of objective
functions associated with this process. The goal is to select a control
policy that optimizes a function of the values of these objective
functions.
In real life, decisions that are made usually have two types of impact.
Firstly, they cost or save resources, such as money or time. Secondly,
by influencing the dynamics of the system they have an impact on the
future as well. Therefore, in many situations the decision with the
largest immediate profit is not the best one once future rewards are
taken into account.
MDPs model this paradigm and provide results on the structure and
existence of good policies, and on methods for the computation of
optimal policies.
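
To make the trade-off between immediate and future rewards concrete, here
is a minimal sketch of value iteration on a two-state MDP with discounted
rewards. The example is not taken from the course sheets: the states,
actions, rewards, transition probabilities and the discount factor 0.9
are all hypothetical, chosen only so that the action with the largest
immediate reward turns out to be suboptimal.

```python
# Illustrative two-state MDP; every number below is an assumption made
# for this sketch, not course material.
GAMMA = 0.9  # discount factor (assumed)

# REWARD[s][a]: immediate reward for taking action a in state s.
# State 0: action 0 ("greedy") pays 10, action 1 ("patient") pays 6.
# State 1: both actions pay 0.
REWARD = [[10.0, 6.0], [0.0, 0.0]]

# P[s][a][t]: probability of moving from state s to state t under action a.
# "greedy" sends the system to the unrewarding state 1; "patient" stays in 0.
P = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

STATES = range(2)
ACTIONS = range(2)


def q_value(s, a, v):
    """Immediate reward plus discounted expected value of the next state."""
    return REWARD[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))


def value_iteration(tol=1e-10, max_iter=10_000):
    """Repeatedly apply the optimality operator until the values stabilize."""
    v = [0.0 for _ in STATES]
    for _ in range(max_iter):
        new_v = [max(q_value(s, a, v) for a in ACTIONS) for s in STATES]
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(ACTIONS, key=lambda a: q_value(s, a, v)) for s in STATES]
    return v, policy


# Myopic rule: maximize the immediate reward only.
myopic_policy = [max(ACTIONS, key=lambda a: REWARD[s][a]) for s in STATES]
values, optimal_policy = value_iteration()

print("myopic policy: ", myopic_policy)
print("optimal policy:", optimal_policy)
print("optimal values:", [round(x, 2) for x in values])
```

In state 0 the myopic rule picks the "greedy" action because it pays 10
instead of 6, whereas value iteration selects the "patient" action: always
collecting 6 is worth 6 / (1 - 0.9) = 60 in discounted terms, which is more
than the greedy action can achieve after it pushes the system into the
unrewarding state.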
Contents of the lectures:
1. Model formulation, policies, optimality criteria, the finite
horizon.
2. Discounted rewards: optimality equation and solution methods.
3. Average rewards: optimality equation and solution methods.
4. Modelling issues in MDPs.
5. Structural properties.
6. More sensitive rewards, part 1: Laurent expansion and solution
methods.
7. More sensitive rewards, part 2: Blackwell optimality.
8. Applications of MDPs.
9. Further topics in MDPs.
Literature and examination:
The sheets for part II (Kallenberg), including the exercises, are made
available weekly as PDF files.
The sheets and exercises of week 1 (January 30): Sheets and exercises week 1
The sheets and exercises of week 2 (February 6): Sheets and exercises week 2
For the paper by Avrachenkov and Altman see: Nested linear programs for
ergodic MDPs.
The sheets and exercises of week 3 (February 13): Sheets and exercises week 3
The sheets and exercises of week 4 (February 20): Sheets and exercises week 4
The sheets and exercises of part II of the course (4 weeks; January 30
until February 20): Sheets
Prerequisites:
- Elementary knowledge of linear programming (e.g. K.G. Murty, Linear
programming, Wiley, 1983).
- Elementary knowledge of probability theory (e.g. S.M. Ross, A first
course in probability, Macmillan, New York, 1976).
- Elementary knowledge of (numerical) analysis (e.g. Banach spaces;
contraction mappings; Newton’s method; Laurent series).
Examination:
Take home problems.
There are two options:
1. You do the exercises alone: in that case you choose two of the
three exercises that are given each week (see the sheets).
2. You do the exercises as a pair: in that case you have to do all
three exercises that are given each week (see the sheets).
Address of the lecturers:
Dr. S. Bhulai
Faculty of Mathematics and Computer Science
Free University Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam
Phone: 020 – 4447755
E-mail: sbhulai@few.vu.nl
Prof.dr. L.C.M. Kallenberg
Mathematical Institute
Leiden University, P.O. Box 9512, 2300 RA Leiden
Phone: 071-5277130
E-mail: kallenberg@math.leidenuniv.nl