Revision as of 12:41, 18 November 2021

Reinforcement Learning

  • Course name: Reinforcement Learning

Course Characteristics

Key concepts of the class

  • Fundamentals of Reinforcement Learning
  • Sample-based Learning Methods
  • Prediction and Control with Function Approximation

What is the purpose of this course?

Harnessing the full potential of artificial intelligence requires adaptive learning systems. Reinforcement learning (RL) is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare.

Course objectives based on Bloom’s taxonomy

- What should a student remember at the end of the course?

By the end of the course, the students should remember:

  • Markov Decision Processes
  • Exploration vs. Exploitation
  • Value Functions
  • Temporal-difference Learning
  • Q-learning
  • Expected Sarsa
  • Actor-Critic

- What should a student be able to understand at the end of the course?

By the end of the course, the students should understand:

  • How to build an RL system for sequential decision making
  • How to formalize a task as an RL problem
  • The space of RL algorithms

- What should a student be able to apply at the end of the course?

By the end of the course, the students should be able to apply:

  • RL for solving real-world problems
  • TD-algorithms for estimating value functions
  • Expected Sarsa and Q-Learning
  • Actor-Critic Method
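The Q-Learning and Expected Sarsa updates listed above can be sketched in a few lines of Python. This is a minimal illustration only: the two-state, two-action table, step size, and discount factor are made-up values, not part of the course material.

```python
ALPHA, GAMMA = 0.1, 0.9          # step size and discount (illustrative values)

# Action-value table for a hypothetical MDP with 2 states and 2 actions.
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}

def q_learning_update(s, a, r, s_next):
    """Q-learning: bootstrap from the greedy action in the next state."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in range(2))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def expected_sarsa_update(s, a, r, s_next, eps=0.1):
    """Expected Sarsa: bootstrap from the epsilon-greedy expectation instead."""
    q_next = [Q[(s_next, b)] for b in range(2)]
    greedy = q_next.index(max(q_next))
    probs = [eps / 2 + (1.0 - eps if b == greedy else 0.0) for b in range(2)]
    target = r + GAMMA * sum(p * q for p, q in zip(probs, q_next))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# One sample transition: moves Q[(0, 1)] toward r + GAMMA * max_b Q[(1, b)].
q_learning_update(0, 1, 1.0, 1)
```

The only difference between the two methods is the bootstrap target: Q-learning uses the maximum next-state value, while Expected Sarsa averages over the behavior policy's action probabilities.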

Course evaluation

Course grade breakdown
Type                            Points
Labs/seminar classes            20
Interim performance assessment  50
Exams                           30

Grades range

Course grading range
Grade            Points
A. Excellent     [85, 100]
B. Good          [70, 84]
C. Satisfactory  [55, 69]
D. Poor          [0, 54]

Resources and reference material

  • Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition.
  • Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds.

Course Sections

The main sections of the course and the approximate hour distribution between them are as follows:

Section 1

Section title

Fundamentals of Reinforcement Learning

Topics covered in this section

  • Sequential Decision Making
  • Markov Decision Processes
  • Value Functions & Bellman Equations
  • Dynamic Programming for Value Functions

What forms of evaluation were used to test students’ performance in this section?

Form                                                       Yes/No
Development of individual parts of software product code   Yes
Homework and group projects                                Yes
Midterm evaluation                                         No
Testing (written or computer based)                        Yes
Reports                                                    No
Essays                                                     No
Oral polls                                                 No
Discussions                                                Yes

Typical questions for ongoing performance evaluation within this section

  1. What is sequential decision making?
  2. What is exploration vs. exploitation trade-off in sequential decision making?
  3. What are Markov Decision Processes?
  4. What is the difference between episodic and continuing tasks?
  5. What are policies, value functions and Bellman equations?
  6. How to use dynamic programming to compute value functions and optimal policies?
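As a companion to question 6, iterative policy evaluation (the dynamic-programming way of computing a value function) can be sketched as follows. The three-state chain, its rewards, and the fixed policy below are hypothetical values chosen only for illustration.

```python
# Iterative policy evaluation on a hypothetical 3-state chain MDP.
GAMMA, THETA = 0.9, 1e-8     # discount and convergence threshold (illustrative)

# Deterministic transitions under a fixed policy: state -> (next_state, reward).
# State 2 is terminal and has no outgoing transition.
policy_step = {0: (1, 0.0), 1: (2, 1.0)}

V = [0.0, 0.0, 0.0]          # value estimates, initialized to zero
while True:
    delta = 0.0
    for s, (s_next, r) in policy_step.items():
        v_new = r + GAMMA * V[s_next]          # Bellman expectation backup
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new                           # in-place sweep
    if delta < THETA:                          # stop when values settle
        break
```

Repeated Bellman backups converge to the fixed point of the Bellman expectation equation; here the state before the terminal one settles at the reward 1.0, and the start state at 0.9 times that.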

Typical questions for seminar classes (labs) within this section

  1. What are the strengths and weaknesses of different exploration algorithms?
  2. What is an epsilon greedy agent?
  3. How to translate a real-world problem into a Markov Decision Process?
  4. Why Bellman equations?
  5. What is generalized policy iteration?
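For the epsilon-greedy questions above, a minimal sketch of such an agent on a hypothetical three-armed bandit might look like the following; the arm means, noise level, and step count are made up for illustration, and the action values are estimated with incremental sample-average updates.

```python
import random

random.seed(0)                         # deterministic run for illustration

true_means = [0.2, 0.5, 0.8]           # hypothetical expected reward per arm
Q = [0.0, 0.0, 0.0]                    # action-value estimates
N = [0, 0, 0]                          # pull counts per arm
EPS = 0.1                              # exploration rate

def select_action():
    if random.random() < EPS:                      # explore: random arm
        return random.randrange(len(Q))
    return max(range(len(Q)), key=Q.__getitem__)   # exploit: greedy arm

for _ in range(5000):
    a = select_action()
    r = true_means[a] + random.gauss(0.0, 0.1)     # noisy reward sample
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]          # incremental sample-average update
```

After enough steps the estimates approach the true arm means, so the greedy choice concentrates on the best arm while the epsilon fraction of steps keeps exploring the others.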

Tasks for midterm assessment within this section

  1. Suppose you are given two action-value functions, each corresponding to the same arbitrary fixed policy but evaluated under a different reward function. Using the Bellman equation, explain whether or not these value functions can be combined in a simple manner to obtain a new action-value function corresponding to a single reward function r.

Test questions for final assessment in this section

  1. How to implement incremental algorithms for estimating action-values?
  2. How to implement and test an epsilon-greedy agent?
  3. Create an example of your own that fits into the Markov Decision Process framework.
  4. How to use optimal value functions to get optimal policies?
  5. How to implement an efficient dynamic programming agent?