Technische Universität Ilmenau

Sequential Decision Making - Interactive Curricula of TU Ilmenau

The Interactive Curricula are an information service on the degree programs of TU Ilmenau.

For the legally binding curricula, please refer to the respective study and examination regulations (appendix "Studienplan").

All information on planned courses can be found in the electronic course catalog.

Please note that this page is no longer updated. All modules and curricula from examination regulations (PO) version 2021 onward (Bachelor's and Master's programs) are now available in the Campus-Portal.

Module information for Sequential Decision Making in the Master's program Mathematik und Wirtschaftsmathematik 2022
Module number: 201205
Examination number: 2400899
Faculty: Fakultät für Mathematik und Naturwissenschaften (Faculty of Mathematics and Natural Sciences)
Subject area number: 2414 (Mathematics of Data Science)
Module coordinator: Prof. Dr. Jana de Wiljes
Offered: winter semester
Language: English
Credit points: 5
Contact hours (h): 45
Self-study (h): 105
Obligation: elective module
Assessment: written examination, 120 minutes
Details on the assessment:
Link to the Moodle course:
Lecturers:
Registration procedure for alternative examinations (PL) or course achievements (SL):
Maximum number of participants:
Prerequisites

Fundamentals of analysis, linear algebra, and probability theory; programming in Python or MATLAB

Learning outcomes and acquired competencies

Upon completing this course, students will understand the fundamentals of sequential learning. They will be able to independently derive bounds on the average loss for bandit problems using standard methods.

Furthermore, students will be able to independently implement common reinforcement learning algorithms, such as Q-Learning and Monte Carlo Tree Search, and apply them to data from diverse fields such as recommendation systems, gaming, autonomous driving, medicine, and finance. The course thus provides a solid foundation for independent research in sequential learning.

Content

We begin with an introduction presenting important application examples of sequential learning and decision making under uncertainty. Many of these examples recur throughout the course as links to real-world practice. After a brief review of fundamental concepts in statistics, stochastic processes, linear algebra, and numerical methods, we turn to stochastic multi-armed bandit problems. We discuss a variety of algorithms (e.g., UCB, Thompson Sampling) and apply them to diverse datasets, followed by a deeper theoretical analysis of these algorithms.
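As a rough illustration of the kind of algorithm named above, the following is a minimal sketch of UCB1 in the form analyzed by Auer, Cesa-Bianchi and Fischer (2002). It is not taken from the course materials: the Bernoulli reward model, the function name `ucb1`, and the arm means are assumptions chosen for illustration only.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return the pull count per arm.

    `arm_means` are hypothetical success probabilities, used only to
    simulate rewards; the algorithm itself never reads them directly.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Choose the arm maximizing empirical mean + sqrt(2 ln t / n_i).
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)  # pull counts per arm
```

In this simulation the pull counts concentrate on the arm with the highest mean, while the exploration bonus ensures the other arms are still sampled occasionally.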

In the second part of the course, we move on to more general Markov Decision Processes and derive and discuss state-of-the-art reinforcement learning algorithms. We also introduce and discuss established implementations such as AlphaGo/Zero. To conclude the course, each student will be assigned a small application project.
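To give a flavor of reinforcement learning on a Markov Decision Process, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical toy chain invented for this example, not a course dataset, and the function name `q_learning` and all parameter values are likewise assumptions.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # The terminal state has value 0, so only the reward remains there.
            bootstrap = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state = next_state
    return q


q_table = q_learning()
for s, (left, right) in enumerate(q_table):
    print(s, round(left, 3), round(right, 3))
```

After training, the greedy policy with respect to the learned table moves right in every non-terminal state, which is the optimal behavior in this toy chain.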

Media formats and technical requirements for teaching and assessments in electronic form

Projector, assignments, slides, Jupyter notebooks, and a personal computer with Python or MATLAB for the programming part of the exercises

Literature

T. Lattimore and C. Szepesvári (2020): Bandit Algorithms; Cambridge University Press

P. Auer, N. Cesa-Bianchi and P. Fischer (2002): Finite-time Analysis of the Multiarmed Bandit Problem; Machine Learning, 47, 235-256

T. Lai and H. Robbins (1985): Asymptotically efficient adaptive allocation rules; Advances in Applied Mathematics, 6(1), 4-22

S. Meyn (2022): Control Systems and Reinforcement Learning

D. P. Bertsekas (2023): A Course in Reinforcement Learning; Athena Scientific

D. P. Bertsekas (2019): Reinforcement Learning and Optimal Control; Athena Scientific

 

Teaching evaluation