Technische Universität Ilmenau

Sequential Decision Making - Interactive curricula of TU Ilmenau

The interactive curricula provide information on the degree programmes offered by TU Ilmenau.

Please refer to the respective study and examination rules and regulations for the legally binding curricula (Annex Curriculum).

You can find all details on planned lectures and classes in the course catalogue.

Please note that this page is no longer updated. All modules and study plans from PO version 2021 onwards (Bachelor and Master study programs) are now available on the Campus Portal.

module properties Sequential Decision Making in degree program Master Data Science 2026
module number: 201205
examination number: 2400899
department: Department of Mathematics and Natural Sciences
ID of group: 2414 (Mathematics of Data Science)
module leader: Prof. Dr. Jana de Wiljes
term: winter term only
language: English
credit points: 5
on-campus program (h): 45
self-study (h): 105
obligation: obligatory module
exam: written examination performance, 120 minutes
previous knowledge and experience

fundamentals of analysis, linear algebra, probability theory, Python programming or Matlab programming

learning outcome

Upon completing this course, students will understand the fundamentals of sequential learning. They will be able to independently derive upper bounds on the average loss of standard methods for bandit problems, using common proof techniques.
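
One concrete instance of such a bound comes from Theorem 1 of Auer, Cesa-Bianchi and Fischer (2002) for the UCB1 algorithm. It is chosen here for illustration, and the module may present other bounds. Assume K arms with rewards in [0, 1], arm means μ_i, optimal mean μ*, gaps Δ_i = μ* − μ_i, and let T_i(n) be the number of times arm i is played in the first n rounds. Then the regret after n plays satisfies

```latex
R_n = n\mu^* - \sum_{i=1}^{K} \mu_i \, \mathbb{E}\left[T_i(n)\right]
  \le 8 \sum_{i:\,\mu_i < \mu^*} \frac{\ln n}{\Delta_i}
  + \left(1 + \frac{\pi^2}{3}\right) \sum_{j=1}^{K} \Delta_j
```

Dividing by n shows that the average loss per round of UCB1 decays at rate O(ln n / n).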

Furthermore, students will be able to implement widely used reinforcement learning algorithms, such as Q-learning and Monte Carlo Tree Search, on their own. They will also be able to apply them to data from diverse fields such as recommendation systems, games, autonomous driving, medicine and finance. The course thus provides a solid foundation for independent research in sequential learning.
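
The following is a minimal sketch of tabular Q-learning with ε-greedy exploration. It gives a flavour of such an implementation and is written in Python, one of the languages named for the exercises. Several parts are hypothetical choices made only for this illustration and are not course material: the five-state chain environment, the function names `step` and `q_learning`, and all parameter values.

```python
import random

# Hypothetical toy environment (illustration only): a chain of states 0..N_STATES-1.
# Action 0 moves left, action 1 moves right; reaching the last state gives
# reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection; ties between greedy actions broken at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the greedy value of the next state,
            # except at the terminal state, where no future reward follows.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

With these settings, the learned greedy policy is expected to choose action 1 (move right) in every non-terminal state, because reaching the right end is the only source of reward. The terminal check in the update stops the algorithm from bootstrapping past the end of an episode.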

content

We begin with a motivating introduction that presents important applications of sequential learning and decision-making under uncertainty. Many of these examples accompany us throughout the course and connect the theory to real-world problems. After a brief review of fundamental concepts in statistics, stochastic processes, linear algebra and numerical methods, we study stochastic multi-armed bandit problems in detail. We explore a variety of algorithms (e.g., UCB, Thompson Sampling), discuss them, apply them to diverse datasets and then analyze them theoretically.
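
The two bandit algorithms named above can be sketched in a few lines of Python. This is a minimal illustration that assumes Bernoulli-distributed rewards. The simulated environment `pull`, the arm means and the horizon are hypothetical. UCB1 follows the index rule of Auer, Cesa-Bianchi and Fischer (2002). Thompson Sampling is shown in its common Beta-Bernoulli form with uniform priors.

```python
import math
import random


def pull(arm_means, arm, rng):
    """Hypothetical simulated environment: Bernoulli reward with the arm's mean."""
    return 1.0 if rng.random() < arm_means[arm] else 0.0


def ucb1(arm_means, horizon, seed=0):
    """UCB1: play each arm once, then the arm with the largest index
    empirical_mean + sqrt(2 ln n / n_i), where n is the number of plays so far.
    Returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for n in range(horizon):  # n = number of plays made so far
        if n < k:
            arm = n  # initialisation phase: try every arm once
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(n) / counts[i]))
        reward = pull(arm_means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def thompson_sampling(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors.
    Returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior and play the best draw.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if pull(arm_means, arm, rng) > 0:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


means = [0.2, 0.5, 0.7]  # hypothetical arm means
ucb_counts = ucb1(means, horizon=2000)
ts_counts = thompson_sampling(means, horizon=2000)
print("UCB1 pulls per arm:", ucb_counts)
print("Thompson Sampling pulls per arm:", ts_counts)
```

Both functions return how often each arm was pulled. In this example the best arm (mean 0.7) should receive the large majority of pulls. UCB1 selects arms deterministically from the observed rewards, whereas Thompson Sampling randomizes its choices by sampling from the posterior.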

In the second part of the course, we move on to general Markov decision processes and derive and discuss state-of-the-art reinforcement learning algorithms. We also introduce and discuss established implementations such as AlphaGo and AlphaGo Zero. At the end of the course, each student is assigned a small application project.
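
For a small, fully known Markov decision process, the Bellman optimality equation can be solved by value iteration. This is the classic dynamic-programming method on which many reinforcement learning algorithms build. The sketch below is chosen for illustration and is not necessarily how the course introduces Markov decision processes. The two-state example MDP and the function name `value_iteration` are hypothetical.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10):
    """Value iteration for a finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward for action a in state s.
    Returns the (approximately) optimal state values and a greedy policy.
    """
    n_states = len(transitions)

    def q_value(v, s, a):
        # Bellman backup: immediate reward plus discounted expected next value.
        return rewards[s][a] + gamma * sum(p * v[s2] for p, s2 in transitions[s][a])

    v = [0.0] * n_states
    while True:
        new_v = [max(q_value(v, s, a) for a in range(len(transitions[s])))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:  # stop once a sweep changes no value by more than tol
            break
    policy = [max(range(len(transitions[s])), key=lambda a: q_value(v, s, a))
              for s in range(n_states)]
    return v, policy


# Hypothetical two-state MDP. Action 0 = "wait/stay", action 1 = "move".
transitions = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],  # state 0: moving succeeds with probability 0.8
    [[(1.0, 1)], [(1.0, 0)]],            # state 1
]
rewards = [
    [0.1, 0.0],  # state 0: waiting pays a little, moving pays nothing immediately
    [1.0, 0.0],  # state 1: staying pays 1 per step
]
values, policy = value_iteration(transitions, rewards)
print(values, policy)
```

For this MDP the optimal policy is to move in state 0 and to stay in state 1. This gives V*(1) = 1/(1 - 0.9) = 10 and V*(0) = 0.9 · (0.8 · 10 + 0.2 · V*(0)), that is, V*(0) = 7.2/0.82 ≈ 8.78.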

media of instruction and technical requirements for education and examination in case of online participation

projector, assignments, slides, Jupyter notebooks, personal computer with Python or Matlab for the programming part of the exercises

literature / references

T. Lattimore and C. Szepesvári (2020): Bandit Algorithms; Cambridge University Press

P. Auer, N. Cesa-Bianchi and P. Fischer (2002): Finite-time Analysis of the Multiarmed Bandit Problem; Machine Learning, 47, 235–256

T. L. Lai and H. Robbins (1985): Asymptotically efficient adaptive allocation rules; Advances in Applied Mathematics, 6(1), 4–22

S. Meyn (2022): Control Systems and Reinforcement Learning; Cambridge University Press

D. P. Bertsekas (2023): A Course in Reinforcement Learning; Athena Scientific

D. P. Bertsekas (2019): Reinforcement Learning and Optimal Control; Athena Scientific
