This repository contains the coursework I completed for my MSc module, COMP0089 Reinforcement Learning.
- Multi-armed Bernoulli Bandit Problem
  - Implemented several agents with the following algorithms:
    - UCB
    - Greedy
    - $\epsilon$-greedy
    - Policy gradient (REINFORCE)
- Markov Decision Process
  - Implemented several RL algorithms for an MDP:
    - Tabular TD learning
    - Policy iteration
    - Value iteration
  - Analysed an MDP
- Actor-Critics
  - Implemented a deep RL agent using `jax`.
- Off-Policy Learning
  - Implemented several off-policy multi-step return estimates:
    - Full importance sampling
    - Per-decision importance sampling (PDIS)
    - PDIS with control variates
    - PDIS with control variates and adaptive bootstrapping
  - Analysed the convergence and variance properties of a proposed TD error
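
To illustrate the difference between the first two off-policy estimates listed above, here is a minimal NumPy sketch. It is not the coursework code: the function names `full_is_return` and `pdis_return`, and the convention that `rhos[t]` is the importance ratio of the target policy to the behaviour policy for the action taken at step `t`, are assumptions made for illustration. Both functions estimate the return from the first step of a single finished episode, without bootstrapping.

```python
import numpy as np


def full_is_return(rewards, rhos, gamma):
    """Full importance sampling: weight the whole discounted return
    by the product of all importance ratios in the episode."""
    rewards = np.asarray(rewards, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return np.prod(rhos) * np.sum(discounts * rewards)


def pdis_return(rewards, rhos, gamma):
    """Per-decision importance sampling: each reward is weighted only by
    the ratios of the actions taken up to and including the step that
    produced it. Computed backwards via G_t = rho_t * (R_{t+1} + gamma * G_{t+1})."""
    rewards = np.asarray(rewards, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    g = 0.0  # return after the terminal step is zero (no bootstrapping)
    for r, rho in zip(rewards[::-1], rhos[::-1]):
        g = rho * (r + gamma * g)
    return g
```

For example, with `rewards=[1.0, 2.0]`, `rhos=[0.5, 2.0]` and `gamma=0.9`, the full estimate is approximately 2.8 and the per-decision estimate is approximately 2.3. When every ratio equals 1 (on-policy data), both reduce to the ordinary discounted return.
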
- Python
- NumPy
- JAX

Requirement: `python=3.11`

`pip install -r requirements.txt`