
Udacity Deep Reinforcement Learning course - Multi Agent RL - Collaboration and Competition

This repository contains code that trains a pair of agents to solve the environment proposed in the Multi-Agent Reinforcement Learning section of the Udacity Deep Reinforcement Learning (DRL) course.

Environment


The environment has two agents playing tennis. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground, or hits the ball out of bounds, it receives a reward of -0.01. The goal of each agent is therefore to keep the ball in play. The task is episodic, and the environment is considered solved when the average, over 100 consecutive episodes, of the maximum score between the two agents reaches +0.5.

Both the action space and the state space are continuous. The state space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
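To make the solving criterion concrete, here is a minimal sketch of the check (episode_scores and agent_returns are assumed names for illustration, not identifiers from this repository):

import numpy as np

def is_solved(episode_scores, window=100, target=0.5):
    # Solved when the mean of the per-episode max score over the
    # last `window` episodes reaches the target.
    if len(episode_scores) < window:
        return False
    return float(np.mean(episode_scores[-window:])) >= target

# After each episode, append the larger of the two agents' summed rewards:
# episode_scores.append(max(agent_returns))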

Getting started

Unity environments

Unity does not need to be installed, since a pre-built environment is already provided. Download the environment matching your operating system:

  • Linux: https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Linux.zip
  • Mac OSX: https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis.app.zip
  • Windows (32-bit): https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86.zip
  • Windows (64-bit): https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86_64.zip

Python dependencies

The project uses Python 3.6 and relies on the Udacity Value-based-methods repository. Clone that repository and follow the instructions in its README to install the necessary dependencies.
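For example (the repository URL and install step follow the usual Udacity setup; double-check them against that README):

git clone https://github.com/udacity/Value-based-methods.git
cd Value-based-methods/python
pip install .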

Instructions

The repository contains 2 scripts under the collaboration_competition package: train.py and play.py.

Train

The script train.py can be used to train the agents. The environment has been solved using the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. More details can be found in ipynb/report.ipynb.
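In short, MADDPG trains one DDPG-style actor per agent, each acting on its own local observation, while each agent's critic is centralized: during training it sees the observations and actions of both agents. The PyTorch sketch below only illustrates this input layout; the layer sizes and names are illustrative, not the networks used in this repository:

import torch
import torch.nn as nn

STATE_SIZE = 8    # per-agent observation size (see Environment above)
ACTION_SIZE = 2   # per-agent continuous action size
N_AGENTS = 2

class Actor(nn.Module):
    # Decentralized actor: maps one agent's local observation to its action.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_SIZE, 64), nn.ReLU(),
            nn.Linear(64, ACTION_SIZE), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    # Centralized critic: scores the joint observations and actions of all agents.
    def __init__(self):
        super().__init__()
        joint_size = N_AGENTS * (STATE_SIZE + ACTION_SIZE)
        self.net = nn.Sequential(
            nn.Linear(joint_size, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, all_obs, all_actions):
        return self.net(torch.cat([all_obs, all_actions], dim=-1))

# Forward pass on a dummy batch of 4 transitions.
obs = torch.randn(4, N_AGENTS, STATE_SIZE)
actors = [Actor() for _ in range(N_AGENTS)]
actions = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
critic = CentralizedCritic()
q_values = critic(obs.reshape(4, -1), actions.reshape(4, -1))  # shape (4, 1)

Only the actors are needed at play time, so each agent still acts on purely local information.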

The script accepts the following arguments:

  • --env-path: path pointing to the Unity Tennis environment
  • --weights-path: directory where the agents' neural network weights will be stored
  • --experiment-id: subdirectory of --weights-path where the weights and plots for a given experiment will be stored

The algorithm hyperparameters are stored in conf.py to simplify experimentation.
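As an illustration, such a file typically groups values like the ones below (hypothetical names and numbers; see conf.py in this repository for the actual settings):

# Illustrative MADDPG hyperparameters only, not the repository's real values.
BUFFER_SIZE = int(1e6)   # replay buffer capacity
BATCH_SIZE = 256         # minibatch size per learning step
GAMMA = 0.99             # discount factor
TAU = 1e-3               # soft-update rate for the target networks
LR_ACTOR = 1e-4          # actor learning rate
LR_CRITIC = 1e-3         # critic learning rate
NOISE_SIGMA = 0.2        # scale of the exploration noise
N_EPISODES = 3000        # maximum number of training episodes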

Example:

python train.py --env-path /deep-reinforcement-learning/p3_collab-compet/Tennis_Linux/Tennis.x86_64 \
  --weights-path /repos/deep-reinforcement-learning/p3_collab-compet/weights \
  --experiment-id maddpg_5

Play

The trained agents can be used to play! To do so, run the play.py script, providing the paths to the Unity environment and to the agents' weights:

python play.py --env-path /deep-reinforcement-learning/p3_collab-compet/Tennis_Linux/Tennis.x86_64 \
  --weights-path /repos/deep-reinforcement-learning/p3_collab-compet/weights
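Under the hood, a play loop against the Unity Tennis environment looks roughly like this sketch. The unityagents calls are the standard ones from the course environments; the random actions stand in for querying the trained actors, which is what play.py actually does:

import numpy as np
from unityagents import UnityEnvironment

# The file name would come from --env-path; this value is illustrative.
env = UnityEnvironment(file_name="Tennis_Linux/Tennis.x86_64")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]  # train_mode=False renders in real time
states = env_info.vector_observations               # one observation row per agent
scores = np.zeros(len(env_info.agents))

while True:
    # play.py would ask the trained actors for actions here; random ones stand in.
    actions = np.clip(np.random.randn(len(env_info.agents),
                                      brain.vector_action_space_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    states = env_info.vector_observations
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

print("Episode score (max over agents):", np.max(scores))
env.close()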
