Sequential Decision Making - Interactive curricula of TU Ilmenau

The interactive curricula provide information on the degree programmes offered by the TU Ilmenau.

Please refer to the respective study and examination rules and regulations for the legally binding curricula (Annex Curriculum).

You can find all details on planned lectures and classes in the course catalogue.

Please note that this page is no longer updated. All modules and study plans from PO version 2021 onwards (Bachelor and Master study programs) are now available on the Campus Portal.

module properties Sequential Decision Making in degree program Master Data Science 2026
module number	201205
examination number	2400899
department	Department of Mathematics and Natural Sciences
ID of group	2414 (Mathematics of Data Science)
module leader	Prof. Dr. Jana de Wiljes
term	winter term only
language	English
credit points	5
on-campus program (h)	45
self-study (h)	105
obligation	obligatory module
exam	written examination, 120 minutes
previous knowledge and experience	fundamentals of analysis, linear algebra, and probability theory; programming in Python or MATLAB
learning outcome	Upon completing this course, students will understand the fundamentals of sequential learning. They will be able to independently derive bounds on the average loss for bandit problems using standard methods. They will also be able to implement common reinforcement learning algorithms, such as Q-learning and Monte Carlo Tree Search, on their own and apply them to data from fields such as recommendation systems, gaming, autonomous driving, medicine, and finance. The course thus provides a solid foundation for independent research in sequential learning.
content	We begin with an introduction that presents significant application examples of sequential learning and decision making under uncertainty. Many of these examples accompany us throughout the course and serve as links to real-world practice. After a brief review of fundamental concepts in statistics, stochastic processes, linear algebra, and numerical methods, we turn to stochastic multi-armed bandit problems. We discuss a variety of algorithms (e.g., UCB, Thompson Sampling) and apply them to diverse datasets, then study their theoretical properties in more depth. In the second part of the course, we move on to the more general setting of Markov Decision Processes and derive and discuss state-of-the-art reinforcement learning algorithms. We also introduce and discuss established implementations such as AlphaGo/Zero. To conclude the course, every student is assigned a small application project.
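As an illustration of the bandit algorithms named above, the following is a minimal sketch of the UCB1 index rule from Auer, Cesa-Bianchi and Fischer (2002), assuming Bernoulli rewards. It is not taken from the course material; the function name `ucb1` and the parameters `arm_means`, `horizon` and `seed` are hypothetical and chosen only for this example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit (illustrative assumption).

    Returns the number of pulls per arm and the total collected reward.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # cumulative reward observed per arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            plays_so_far = t - 1
            # Pick the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[a]),
            )
        # Simulated Bernoulli reward with the (hidden) mean of the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    pulls, reward = ucb1([0.2, 0.8], horizon=2000, seed=1)
    print("pulls per arm:", pulls, "total reward:", reward)
```

The confidence bonus shrinks as an arm is pulled more often, so the rule keeps exploring rarely tried arms while mostly playing the arm that currently looks best.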
media of instruction and technical requirements for education and examination in case of online participation	projector, assignments, slides, Jupyter notebooks, personal computer with Python or MATLAB to work on the programming part of the exercises
literature / references
T. Lattimore and C. Szepesvári (2020): Bandit Algorithms; Cambridge University Press
P. Auer, N. Cesa-Bianchi and P. Fischer (2002): Finite-time Analysis of the Multiarmed Bandit Problem; Machine Learning, 47, 235-256
T. Lai and H. Robbins (1985): Asymptotically efficient adaptive allocation rules; Advances in Applied Mathematics, 6(1), 4-22
S. Meyn (2022): Control Systems and Reinforcement Learning
D. P. Bertsekas (2023): A Course in Reinforcement Learning; Athena Scientific
D. P. Bertsekas (2019): Reinforcement Learning and Optimal Control; Athena Scientific