diff --git a/README.md b/README.md index b91fc42f..ea9b3da6 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,8 @@ -
- -
+[comment]: <> (
) + +[comment]: <> () + +[comment]: <> (
)

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

@@ -10,10 +12,9 @@ [![GitHub issues](https://img.shields.io/github/issues/Replicable-MARL/MARLlib)](https://github.com/Replicable-MARL/MARLlib/issues) [![PyPI version](https://badge.fury.io/py/marllib.svg)](https://badge.fury.io/py/marllib) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Replicable-MARL/MARLlib/blob/sy_dev/marllib.ipynb) -[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html) [![Organization](https://img.shields.io/badge/Organization-ReLER_RL-blue.svg)](https://github.com/Replicable-MARL/MARLlib) [![Organization](https://img.shields.io/badge/Organization-PKU_MARL-blue.svg)](https://github.com/Replicable-MARL/MARLlib) - +[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html) > __News__: > We are excited to announce that a major update has just been released. For detailed version information, please refer to the [version info](https://github.com/Replicable-MARL/MARLlib/releases/tag/1.0.2). @@ -55,7 +56,7 @@ Here we provide a table for the comparison of MARLlib and existing work. 
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | 4 cooperative | 1 | share + separate | MLP + GRU | :x: | | [MAlib](https://github.com/sjtu-marl/malib) | 4 self-play | 10 | share + group + separate | MLP + LSTM | [![Documentation Status](https://readthedocs.org/projects/malib/badge/?version=latest)](https://malib.readthedocs.io/en/latest/?badge=latest) | [EPyMARL](https://github.com/uoe-agents/epymarl)| 4 cooperative | 9 | share + separate | GRU | :x: | -| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 11 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) | +| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 12 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) | | Library | Github Stars | Documentation | Issues Open | Activity | Last Update |:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:| @@ -108,7 +109,7 @@ First, install MARLlib dependencies to guarantee basic usage. following [this guide](https://marllib.readthedocs.io/en/latest/handbook/env.html), finally install patches for RLlib. 
```bash -$ conda create -n marllib python=3.8 +$ conda create -n marllib python=3.8 # or 3.9 $ conda activate marllib $ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib $ pip install -r requirements.txt @@ -185,6 +186,7 @@ Most of the popular environments in MARL research are supported by MARLlib: | **[GRF](https://github.com/google-research/football)** | collaborative + mixed | Full | Discrete | 2D | | **[Hanabi](https://github.com/deepmind/hanabi-learning-environment)** | cooperative | Partial | Discrete | 1D | | **[MATE](https://github.com/XuehaiPan/mate)** | cooperative + mixed | Partial | Both | 1D | +| **[GoBigger](https://github.com/opendilab/GoBigger)** | cooperative + mixed | Both | Continuous | 1D | Each environment has a readme file, standing as the instruction for this task, including env settings, installation, and important notes. @@ -320,7 +322,11 @@ More tutorial documentations are available [here](https://marllib.readthedocs.io ## Awesome List -A collection of research and review papers of multi-agent reinforcement learning (MARL) is available [here](https://marllib.readthedocs.io/en/latest/resources/awesome.html). The papers have been organized based on their publication date and their evaluation of the corresponding environments. +A collection of research and review papers of multi-agent reinforcement learning (MARL) is available. The papers have been organized based on their publication date and their evaluation of the corresponding environments. + +Algorithms: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html) +Environments: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/handbook/env.html) + ## Community diff --git a/ROADMAP.md b/ROADMAP.md index 3ead2415..eed984ae 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -11,7 +11,8 @@ This list describes the planned features including breaking changes. 
- [ ] manual training, refer to issue: https://github.com/Replicable-MARL/MARLlib/issues/86#issuecomment-1468188682 - [ ] new environments - [x] MATE: https://github.com/UnrealTracking/mate - - [ ] Go-Bigger: https://github.com/opendilab/GoBigger + - [x] Go-Bigger: https://github.com/opendilab/GoBigger - [ ] Voltage Control: https://github.com/Future-Power-Networks/MAPDN - [ ] Overcooked: https://github.com/HumanCompatibleAI/overcooked_ai -- [ ] Support Transformer architecture + - [ ] CloseAirCombat: https://github.com/liuqh16/CloseAirCombat +- [ ] Support Transformers diff --git a/docs/source/handbook/env.rst b/docs/source/handbook/env.rst index f3dcfd6c..baec9148 100644 --- a/docs/source/handbook/env.rst +++ b/docs/source/handbook/env.rst @@ -594,4 +594,52 @@ Installation .. code-block:: shell - pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate \ No newline at end of file + pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate + + +.. _GoBigger: + +GoBigger +============== +.. only:: html + + .. figure:: images/env_gobigger.gif + :width: 320 + :align: center + + +GoBigger is a game engine that offers an efficient and easy-to-use platform for agar-like game development. It provides a variety of interfaces specifically designed for game AI development. The game mechanics of GoBigger are similar to those of Agar, a popular massive multiplayer online action game developed by Matheus Valadares of Brazil. The objective of GoBigger is for players to navigate one or more circular balls across a map, consuming Food Balls and smaller balls to increase their size while avoiding larger balls that can consume them. Each player starts with a single ball, but can divide it into two when it reaches a certain size, giving them control over multiple balls. +Official Link: https://github.com/opendilab/GoBigger + +.. 
list-table:: + :widths: 25 25 + :header-rows: 0 + + * - ``Original Learning Mode`` + - Cooperative + Mixed + * - ``MARLlib Learning Mode`` + - Cooperative + Mixed + * - ``Observability`` + - Partial + Full + * - ``Action Space`` + - Continuous + * - ``Observation Space Dim`` + - 1D + * - ``Action Mask`` + - No + * - ``Global State`` + - No + * - ``Global State Space Dim`` + - / + * - ``Reward`` + - Dense + * - ``Agent-Env Interact Mode`` + - Simultaneous + + +Installation +----------------- + +.. code-block:: shell + + conda install -c opendilab gobigger \ No newline at end of file diff --git a/docs/source/images/env_gobigger.gif b/docs/source/images/env_gobigger.gif new file mode 100644 index 00000000..918f74fa Binary files /dev/null and b/docs/source/images/env_gobigger.gif differ diff --git a/marllib/envs/base_env/__init__.py b/marllib/envs/base_env/__init__.py index c06d11be..c917fff3 100644 --- a/marllib/envs/base_env/__init__.py +++ b/marllib/envs/base_env/__init__.py @@ -88,3 +88,9 @@ except Exception as e: ENV_REGISTRY["mate"] = str(e) +try: + from marllib.envs.base_env.gobigger import RLlibGoBigger + ENV_REGISTRY["gobigger"] = RLlibGoBigger +except Exception as e: + ENV_REGISTRY["gobigger"] = str(e) + diff --git a/marllib/envs/base_env/config/gobigger.yaml b/marllib/envs/base_env/config/gobigger.yaml new file mode 100644 index 00000000..aee7b3c5 --- /dev/null +++ b/marllib/envs/base_env/config/gobigger.yaml @@ -0,0 +1,33 @@ +# MIT License + +# Copyright (c) 2023 Replicable-MARL + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice 
and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +env: gobigger + +env_args: + map_name: "st_t1p2" # st(andard)_t(eam)1p(layer)2 + #num_teams: 1 + #num_agents: 2 + frame_limit: 1600 +mask_flag: False +global_state_flag: False +opp_action_in_cc: True +fixed_batch_timesteps: 3200 # optional, all scenarios will use this batch size, only valid for on-policy algorithms diff --git a/marllib/envs/base_env/gobigger.py b/marllib/envs/base_env/gobigger.py new file mode 100644 index 00000000..dbbb003a --- /dev/null +++ b/marllib/envs/base_env/gobigger.py @@ -0,0 +1,202 @@ +# MIT License + +# Copyright (c) 2023 Replicable-MARL + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import copy + +from gobigger.envs import create_env_custom +from gym.spaces import Dict as GymDict, Box +from ray.rllib.env.multi_agent_env import MultiAgentEnv +import numpy as np + + +policy_mapping_dict = { + "all_scenario": { + "description": "mixed scenarios with t >= 2 (num_teams > 1)", + "team_prefix": ("team0_", "team1_"), + "all_agents_one_policy": True, + "one_agent_one_policy": True, + }, +} + + +class RLlibGoBigger(MultiAgentEnv): + + def __init__(self, env_config): + + map_name = env_config["map_name"] + + env_config.pop("map_name", None) + self.num_agents_per_team = int(map_name.split("p")[-1][0]) + self.num_teams = int(map_name.split("_t")[1][0]) + if self.num_teams == 1: + policy_mapping_dict["all_scenario"]["team_prefix"] = ("team0_",) + self.num_agents = self.num_agents_per_team * self.num_teams + self.max_steps = env_config["frame_limit"] + self.env = create_env_custom(type='st', cfg=dict( + team_num=self.num_teams, + player_num_per_team=self.num_agents_per_team, + frame_limit=self.max_steps + )) + + self.action_space = Box(low=-1, + high=1, + shape=(2,), + dtype=float) + + self.rectangle_dim = 4 + self.food_dim = self.num_agents * 100 + self.thorns_dim = self.num_agents * 6 + self.clone_dim = self.num_agents * 10 + self.team_name_dim = 1 + self.score_dim = 1 + + self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \ + self.clone_dim + self.team_name_dim + self.score_dim + + self.observation_space = GymDict({"obs": Box( + low=-1e6, + high=1e6, + shape=(self.obs_dim,), + dtype=float)}) + + self.agents = [] + for team_index in range(self.num_teams): + for agent_index in range(self.num_agents_per_team): + self.agents.append("team{}_{}".format(team_index, agent_index))
+ + env_config["map_name"] = map_name + self.env_config = env_config + + def reset(self): + original_obs = self.env.reset() + obs = {} + for agent_index, agent_name in enumerate(self.agents): + + rectangle = list(original_obs[1][agent_index]["rectangle"]) + + overlap_dict = original_obs[1][agent_index]["overlap"] + + food = overlap_dict["food"] + if 4 * len(food) > self.food_dim: + food = food[:self.food_dim // 4] + else: + padding = [0] * (self.food_dim - 4 * len(food)) + food.append(padding) + food = [item for sublist in food for item in sublist] + + thorns = overlap_dict["thorns"] + if 6 * len(thorns) > self.thorns_dim: + thorns = thorns[:self.thorns_dim // 6] + else: + padding = [0] * (self.thorns_dim - 6 * len(thorns)) + thorns.append(padding) + thorns = [item for sublist in thorns for item in sublist] + + clone = overlap_dict["clone"] + if 10 * len(clone) > self.clone_dim: + clone = clone[:self.clone_dim // 10] + else: + padding = [0] * (self.clone_dim - 10 * len(clone)) + clone.append(padding) + clone = [item for sublist in clone for item in sublist] + + team = original_obs[1][agent_index]["team_name"] + score = original_obs[1][agent_index]["score"] + + all_elements = rectangle + food + thorns + clone + [team] + [score] + all_elements = np.array(all_elements, dtype=float) + + obs[agent_name] = { + "obs": all_elements + } + + return obs + + def step(self, action_dict): + actions = {} + for i, agent_name in enumerate(self.agents): + actions[i] = list(action_dict[agent_name]) + actions[i].append(-1) + + original_obs, team_rewards, done, info = self.env.step(actions) + + rewards = {} + obs = {} + infos = {} + + for agent_index, agent_name in enumerate(self.agents): + + rectangle = list(original_obs[1][agent_index]["rectangle"]) + + overlap_dict = original_obs[1][agent_index]["overlap"] + + food = overlap_dict["food"] + if 4 * len(food) > self.food_dim: + food = food[:self.food_dim // 4] + else: + padding = [0] * (self.food_dim - 4 * len(food)) + 
food.append(padding) + food = [item for sublist in food for item in sublist] + + thorns = overlap_dict["thorns"] + if 6 * len(thorns) > self.thorns_dim: + thorns = thorns[:self.thorns_dim // 6] + else: + padding = [0] * (self.thorns_dim - 6 * len(thorns)) + thorns.append(padding) + thorns = [item for sublist in thorns for item in sublist] + + clone = overlap_dict["clone"] + if 10 * len(clone) > self.clone_dim: + clone = clone[:self.clone_dim // 10] + else: + padding = [0] * (self.clone_dim - 10 * len(clone)) + clone.append(padding) + clone = [item for sublist in clone for item in sublist] + + team = original_obs[1][agent_index]["team_name"] + score = original_obs[1][agent_index]["score"] + + all_elements = rectangle + food + thorns + clone + [team] + [score] + all_elements = np.array(all_elements, dtype=float) + + obs[agent_name] = { + "obs": all_elements + } + + rewards[agent_name] = team_rewards[team] + + dones = {"__all__": done} + return obs, rewards, dones, infos + + def get_env_info(self): + env_info = { + "space_obs": self.observation_space, + "space_act": self.action_space, + "num_agents": self.num_agents, + "episode_limit": self.max_steps, + "policy_mapping_info": policy_mapping_dict + } + return env_info + + def close(self): + self.env.close() diff --git a/marllib/envs/global_reward_env/__init__.py b/marllib/envs/global_reward_env/__init__.py index d5088910..8ab46a8d 100644 --- a/marllib/envs/global_reward_env/__init__.py +++ b/marllib/envs/global_reward_env/__init__.py @@ -24,56 +24,70 @@ try: from marllib.envs.global_reward_env.mpe_fcoop import RLlibMPE_FCOOP + COOP_ENV_REGISTRY["mpe"] = RLlibMPE_FCOOP except Exception as e: COOP_ENV_REGISTRY["mpe"] = str(e) try: from marllib.envs.global_reward_env.magent_fcoop import RLlibMAgent_FCOOP + COOP_ENV_REGISTRY["magent"] = RLlibMAgent_FCOOP except Exception as e: COOP_ENV_REGISTRY["magent"] = str(e) try: from marllib.envs.global_reward_env.mamujoco_fcoop import RLlibMAMujoco_FCOOP + 
COOP_ENV_REGISTRY["mamujoco"] = RLlibMAMujoco_FCOOP except Exception as e: COOP_ENV_REGISTRY["mamujoco"] = str(e) try: from marllib.envs.global_reward_env.smac_fcoop import RLlibSMAC_FCOOP + COOP_ENV_REGISTRY["smac"] = RLlibSMAC_FCOOP except Exception as e: COOP_ENV_REGISTRY["smac"] = str(e) try: from marllib.envs.global_reward_env.football_fcoop import RLlibGFootball_FCOOP + COOP_ENV_REGISTRY["football"] = RLlibGFootball_FCOOP except Exception as e: COOP_ENV_REGISTRY["football"] = str(e) try: from marllib.envs.global_reward_env.rware_fcoop import RLlibRWARE_FCOOP + COOP_ENV_REGISTRY["rware"] = RLlibRWARE_FCOOP except Exception as e: COOP_ENV_REGISTRY["rware"] = str(e) try: from marllib.envs.global_reward_env.lbf_fcoop import RLlibLBF_FCOOP + COOP_ENV_REGISTRY["lbf"] = RLlibLBF_FCOOP except Exception as e: COOP_ENV_REGISTRY["lbf"] = str(e) try: from marllib.envs.global_reward_env.pommerman_fcoop import RLlibPommerman_FCOOP + COOP_ENV_REGISTRY["pommerman"] = RLlibPommerman_FCOOP except Exception as e: COOP_ENV_REGISTRY["pommerman"] = str(e) - try: from marllib.envs.global_reward_env.mate_fcoop import RLlibMATE_FCOOP + COOP_ENV_REGISTRY["mate"] = RLlibMATE_FCOOP except Exception as e: COOP_ENV_REGISTRY["mate"] = str(e) +try: + from marllib.envs.global_reward_env.gobigger_fcoop import RLlibGoBigger_FCOOP + + COOP_ENV_REGISTRY["gobigger"] = RLlibGoBigger_FCOOP +except Exception as e: + COOP_ENV_REGISTRY["gobigger"] = str(e) diff --git a/marllib/envs/global_reward_env/gobigger_fcoop.py b/marllib/envs/global_reward_env/gobigger_fcoop.py new file mode 100644 index 00000000..455314c8 --- /dev/null +++ b/marllib/envs/global_reward_env/gobigger_fcoop.py @@ -0,0 +1,207 @@ +# MIT License + +# Copyright (c) 2023 Replicable-MARL + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, 
copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import copy + +from gobigger.envs import create_env_custom +from gym.spaces import Dict as GymDict, Box +from ray.rllib.env.multi_agent_env import MultiAgentEnv +import numpy as np + +policy_mapping_dict = { + "all_scenario": { + "description": "cooperative scenarios with t = 1 (num_teams = 1)", + "team_prefix": ("team0_",), + "all_agents_one_policy": True, + "one_agent_one_policy": True, + }, +} + + +class RLlibGoBigger_FCOOP(MultiAgentEnv): + + def __init__(self, env_config): + + map_name = env_config["map_name"] + + env_config.pop("map_name", None) + self.num_agents_per_team = int(map_name.split("p")[-1][0]) + self.num_teams = 1 + self.num_agents = self.num_agents_per_team * self.num_teams + self.max_steps = env_config["frame_limit"] + self.env = create_env_custom(type='st', cfg=dict( + team_num=self.num_teams, + player_num_per_team=self.num_agents_per_team, + frame_limit=self.max_steps + )) + + self.action_space = Box(low=-1, + high=1, + shape=(2,), + dtype=float) + + self.rectangle_dim = 4 + self.food_dim = self.num_agents * 100 + self.thorns_dim = self.num_agents * 6 + self.clone_dim = self.num_agents * 10 + self.team_name_dim = 1 + self.score_dim = 1 + + 
self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \ + self.clone_dim + self.team_name_dim + self.score_dim + + self.observation_space = GymDict({"obs": Box( + low=-1e6, + high=1e6, + shape=(self.obs_dim,), + dtype=float)}) + + self.agents = [] + for team_index in range(self.num_teams): + for agent_index in range(self.num_agents_per_team): + self.agents.append("team{}_{}".format(team_index, agent_index)) + + env_config["map_name"] = map_name + self.env_config = env_config + + def reset(self): + original_obs = self.env.reset() + obs = {} + for agent_index, agent_name in enumerate(self.agents): + + rectangle = list(original_obs[1][agent_index]["rectangle"]) + + overlap_dict = original_obs[1][agent_index]["overlap"] + + food = overlap_dict["food"] + if 4 * len(food) > self.food_dim: + food = food[:self.food_dim // 4] + else: + padding = [0] * (self.food_dim - 4 * len(food)) + food.append(padding) + food = [item for sublist in food for item in sublist] + + thorns = overlap_dict["thorns"] + if 6 * len(thorns) > self.thorns_dim: + thorns = thorns[:self.thorns_dim // 6] + else: + padding = [0] * (self.thorns_dim - 6 * len(thorns)) + thorns.append(padding) + thorns = [item for sublist in thorns for item in sublist] + + clone = overlap_dict["clone"] + if 10 * len(clone) > self.clone_dim: + clone = clone[:self.clone_dim // 10] + else: + padding = [0] * (self.clone_dim - 10 * len(clone)) + clone.append(padding) + clone = [item for sublist in clone for item in sublist] + + team = original_obs[1][agent_index]["team_name"] + score = original_obs[1][agent_index]["score"] + + all_elements = rectangle + food + thorns + clone + [team] + [score] + + assert len(all_elements) == self.obs_dim, \ + "unexpected observation length" + + all_elements = np.array(all_elements, dtype=float) + + obs[agent_name] = { + "obs": all_elements + } + + return obs + + def step(self, action_dict): + actions = {} + for i, agent_name in enumerate(self.agents): + actions[i] = list(action_dict[agent_name]) + 
actions[i].append(-1) + + original_obs, team_rewards, done, info = self.env.step(actions) + + rewards = {} + obs = {} + infos = {} + + for agent_index, agent_name in enumerate(self.agents): + + rectangle = list(original_obs[1][agent_index]["rectangle"]) + + overlap_dict = original_obs[1][agent_index]["overlap"] + + food = overlap_dict["food"] + if 4 * len(food) > self.food_dim: + food = food[:self.food_dim // 4] + else: + padding = [0] * (self.food_dim - 4 * len(food)) + food.append(padding) + food = [item for sublist in food for item in sublist] + + thorns = overlap_dict["thorns"] + if 6 * len(thorns) > self.thorns_dim: + thorns = thorns[:self.thorns_dim // 6] + else: + padding = [0] * (self.thorns_dim - 6 * len(thorns)) + thorns.append(padding) + thorns = [item for sublist in thorns for item in sublist] + + clone = overlap_dict["clone"] + if 10 * len(clone) > self.clone_dim: + clone = clone[:self.clone_dim // 10] + else: + padding = [0] * (self.clone_dim - 10 * len(clone)) + clone.append(padding) + clone = [item for sublist in clone for item in sublist] + + team = original_obs[1][agent_index]["team_name"] + score = original_obs[1][agent_index]["score"] + + all_elements = rectangle + food + thorns + clone + [team] + [score] + + assert len(all_elements) == self.obs_dim, \ + "unexpected observation length" + + all_elements = np.array(all_elements, dtype=float) + + obs[agent_name] = { + "obs": all_elements + } + + rewards[agent_name] = team_rewards[team] + + dones = {"__all__": done} + return obs, rewards, dones, infos + + def get_env_info(self): + env_info = { + "space_obs": self.observation_space, + "space_act": self.action_space, + "num_agents": self.num_agents, + "episode_limit": self.max_steps, + "policy_mapping_info": policy_mapping_dict + } + return env_info + + def close(self): + self.env.close() diff --git a/marllib/marl/algos/README.md b/marllib/marl/algos/README.md deleted file mode 100644 index dcd00f6c..00000000 --- a/marllib/marl/algos/README.md +++ /dev/null @@ -1,37 +0,0 @@ -10
environments are available for Independent Learning - -- Football -- MPE -- SMAC -- mamujoco -- RWARE -- LBF -- Pommerman -- Magent -- MetaDrive -- Hanabi - - -7 environments are available for Value Decomposition - -- Football -- MPE -- SMAC -- mamujoco -- RWARE -- LBF -- Pommerman - -9 environments are available for Centralized Critic - -- Football -- MPE -- SMAC -- mamujoco -- RWARE -- LBF -- Pommerman -- Magent -- Hanabi - - diff --git a/marllib/marl/algos/hyperparams/common/coma.yaml b/marllib/marl/algos/hyperparams/common/coma.yaml index b7a455b0..44274589 100644 --- a/marllib/marl/algos/hyperparams/common/coma.yaml +++ b/marllib/marl/algos/hyperparams/common/coma.yaml @@ -29,6 +29,6 @@ algo_args: lambda: 1.0 vf_loss_coeff: 1.0 batch_episode: 10 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" lr: 0.0005 entropy_coeff: 0.01 diff --git a/marllib/marl/algos/hyperparams/common/facmac.yaml b/marllib/marl/algos/hyperparams/common/facmac.yaml index ad22a4d4..8ffbe6d9 100644 --- a/marllib/marl/algos/hyperparams/common/facmac.yaml +++ b/marllib/marl/algos/hyperparams/common/facmac.yaml @@ -36,6 +36,6 @@ algo_args: buffer_size_episode: 1000 target_network_update_freq_episode: 1 tau: 0.002 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" mixer: "qmix" # qmix or vdn diff --git a/marllib/marl/algos/hyperparams/common/happo.yaml b/marllib/marl/algos/hyperparams/common/happo.yaml index 42b71349..1388564b 100644 --- a/marllib/marl/algos/hyperparams/common/happo.yaml +++ b/marllib/marl/algos/hyperparams/common/happo.yaml @@ -38,4 +38,4 @@ algo_args: entropy_coeff: 0.01 vf_clip_param: 10.0 min_lr_schedule: 1e-11 - batch_mode: "complete_episodes" \ No newline at end of file + batch_mode: "truncate_episodes" \ No newline at end of file diff --git a/marllib/marl/algos/hyperparams/common/hatrpo.yaml b/marllib/marl/algos/hyperparams/common/hatrpo.yaml index 85fc6f5f..ddf4e9c5 100644 --- a/marllib/marl/algos/hyperparams/common/hatrpo.yaml +++ 
b/marllib/marl/algos/hyperparams/common/hatrpo.yaml @@ -34,7 +34,7 @@ algo_args: vf_loss_coeff: 1.0 entropy_coeff: 0.01 vf_clip_param: 10.0 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" kl_threshold: 0.00001 accept_ratio: 0.5 critic_lr: 0.00005 diff --git a/marllib/marl/algos/hyperparams/common/ia2c.yaml b/marllib/marl/algos/hyperparams/common/ia2c.yaml index 2b2c4fa6..76af2158 100644 --- a/marllib/marl/algos/hyperparams/common/ia2c.yaml +++ b/marllib/marl/algos/hyperparams/common/ia2c.yaml @@ -29,6 +29,6 @@ algo_args: lambda: 1.0 vf_loss_coeff: 1.0 batch_episode: 10 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" lr: 0.0005 entropy_coeff: 0.01 diff --git a/marllib/marl/algos/hyperparams/common/iddpg.yaml b/marllib/marl/algos/hyperparams/common/iddpg.yaml index cfbe62aa..c4971a4a 100644 --- a/marllib/marl/algos/hyperparams/common/iddpg.yaml +++ b/marllib/marl/algos/hyperparams/common/iddpg.yaml @@ -36,5 +36,5 @@ algo_args: buffer_size_episode: 1000 target_network_update_freq_episode: 1 tau: 0.002 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" diff --git a/marllib/marl/algos/hyperparams/common/ippo.yaml b/marllib/marl/algos/hyperparams/common/ippo.yaml index dad13578..8df638d1 100644 --- a/marllib/marl/algos/hyperparams/common/ippo.yaml +++ b/marllib/marl/algos/hyperparams/common/ippo.yaml @@ -35,5 +35,5 @@ algo_args: entropy_coeff: 0.01 clip_param: 0.3 vf_clip_param: 10.0 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" diff --git a/marllib/marl/algos/hyperparams/common/itrpo.yaml b/marllib/marl/algos/hyperparams/common/itrpo.yaml index 1b0ad894..66d1e072 100644 --- a/marllib/marl/algos/hyperparams/common/itrpo.yaml +++ b/marllib/marl/algos/hyperparams/common/itrpo.yaml @@ -34,7 +34,7 @@ algo_args: vf_loss_coeff: 1.0 entropy_coeff: 0.01 vf_clip_param: 10.0 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" kl_threshold: 0.00001 accept_ratio: 0.5 critic_lr: 0.00005 
diff --git a/marllib/marl/algos/hyperparams/common/maa2c.yaml b/marllib/marl/algos/hyperparams/common/maa2c.yaml index 449462d6..df3b0abb 100644 --- a/marllib/marl/algos/hyperparams/common/maa2c.yaml +++ b/marllib/marl/algos/hyperparams/common/maa2c.yaml @@ -29,6 +29,6 @@ algo_args: lambda: 1.0 vf_loss_coeff: 1.0 batch_episode: 10 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" lr: 0.0005 entropy_coeff: 0.01 diff --git a/marllib/marl/algos/hyperparams/common/maddpg.yaml b/marllib/marl/algos/hyperparams/common/maddpg.yaml index 20d42498..5c957a8d 100644 --- a/marllib/marl/algos/hyperparams/common/maddpg.yaml +++ b/marllib/marl/algos/hyperparams/common/maddpg.yaml @@ -36,5 +36,5 @@ algo_args: buffer_size_episode: 1000 target_network_update_freq_episode: 1 tau: 0.002 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" diff --git a/marllib/marl/algos/hyperparams/common/mappo.yaml b/marllib/marl/algos/hyperparams/common/mappo.yaml index c03dcb26..efcbb7f2 100644 --- a/marllib/marl/algos/hyperparams/common/mappo.yaml +++ b/marllib/marl/algos/hyperparams/common/mappo.yaml @@ -35,6 +35,6 @@ algo_args: entropy_coeff: 0.01 clip_param: 0.3 vf_clip_param: 10.0 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" diff --git a/marllib/marl/algos/hyperparams/common/matrpo.yaml b/marllib/marl/algos/hyperparams/common/matrpo.yaml index 4d44a416..76a86a8d 100644 --- a/marllib/marl/algos/hyperparams/common/matrpo.yaml +++ b/marllib/marl/algos/hyperparams/common/matrpo.yaml @@ -34,7 +34,7 @@ algo_args: vf_loss_coeff: 1.0 entropy_coeff: 0.01 vf_clip_param: 10.0 - batch_mode: "complete_episodes" + batch_mode: "truncate_episodes" kl_threshold: 0.00001 accept_ratio: 0.5 critic_lr: 0.00005 diff --git a/marllib/marl/algos/hyperparams/common/vda2c.yaml b/marllib/marl/algos/hyperparams/common/vda2c.yaml index f2d0e24d..95c03bb6 100644 --- a/marllib/marl/algos/hyperparams/common/vda2c.yaml +++ b/marllib/marl/algos/hyperparams/common/vda2c.yaml 
@@ -29,7 +29,7 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 10
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
   mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/common/vdppo.yaml b/marllib/marl/algos/hyperparams/common/vdppo.yaml
index 04e90420..3adf7000 100644
--- a/marllib/marl/algos/hyperparams/common/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/common/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
index 62b186af..1d7f09ce 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
   buffer_size_episode: 1000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
index 1b2707dd..1451a6f9 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
@@ -36,6 +36,6 @@ algo_args:
   lr: 0.0001
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   min_lr_schedule: 1e-11
   gain: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
index fd289dc4..45436c7c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
index 2b2c4fa6..76af2158 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 10
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
index babce71a..e6e84b55 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 1000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
index 6dde1d9d..c25d964c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
index 2578927b..e38d6d2c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
index 449462d6..df3b0abb 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 10
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
index 476e6a96..9dc602d9 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 1000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
index eef75ff0..802aed8f 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
index 5942888f..9770b9a2 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
index f2d0e24d..95c03bb6 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 10
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
   mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
index fe3f1bd4..d1b53e56 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
index 26ad593f..3b54ae3b 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 128
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
index 2c8d62b7..f42ce4ec 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
   buffer_size_episode: 1000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
index 4ab06ad1..afef9151 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
@@ -38,4 +38,4 @@ algo_args:
   entropy_coeff: 0.01
   vf_clip_param: 10.0
   min_lr_schedule: 1e-11
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
index 588d1ed3..a0d81929 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
index 2b2c4fa6..76af2158 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 10
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
index 94ba33ef..621beb54 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 1000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
index aa8d522d..8c6c08b4 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 20.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
index 3e8cc247..f41374a8 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
index a5201c1f..74dccc18 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 128
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
index 61ec7e6c..2faf2b4e 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 10000
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
index e5f13fc5..823705a1 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 20.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
index 3a3da10f..6ded245c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
index e11990b1..7053131f 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 128
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
   mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
index dc45d4cb..5df3d881 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 20.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/test/coma.yaml b/marllib/marl/algos/hyperparams/test/coma.yaml
index f320a3f1..e3019d38 100644
--- a/marllib/marl/algos/hyperparams/test/coma.yaml
+++ b/marllib/marl/algos/hyperparams/test/coma.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 2
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/facmac.yaml b/marllib/marl/algos/hyperparams/test/facmac.yaml
index 40c0d4df..e41bbc68 100644
--- a/marllib/marl/algos/hyperparams/test/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/test/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
   buffer_size_episode: 10
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/test/happo.yaml b/marllib/marl/algos/hyperparams/test/happo.yaml
index 85ed5d79..dfbbc47d 100644
--- a/marllib/marl/algos/hyperparams/test/happo.yaml
+++ b/marllib/marl/algos/hyperparams/test/happo.yaml
@@ -38,4 +38,4 @@ algo_args:
   entropy_coeff: 0.01
   vf_clip_param: 10.0
   min_lr_schedule: 1e-11
-  batch_mode: "complete_episodes"
\ No newline at end of file
+  batch_mode: "truncate_episodes"
\ No newline at end of file
diff --git a/marllib/marl/algos/hyperparams/test/hatrpo.yaml b/marllib/marl/algos/hyperparams/test/hatrpo.yaml
index 3b74bca1..33af497c 100644
--- a/marllib/marl/algos/hyperparams/test/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/ia2c.yaml b/marllib/marl/algos/hyperparams/test/ia2c.yaml
index faed5009..5d830e6a 100644
--- a/marllib/marl/algos/hyperparams/test/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 2
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/iddpg.yaml b/marllib/marl/algos/hyperparams/test/iddpg.yaml
index d52f814d..a1f237f3 100644
--- a/marllib/marl/algos/hyperparams/test/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/test/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 10
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/ippo.yaml b/marllib/marl/algos/hyperparams/test/ippo.yaml
index c40456a9..e13de22e 100644
--- a/marllib/marl/algos/hyperparams/test/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/test/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/itrpo.yaml b/marllib/marl/algos/hyperparams/test/itrpo.yaml
index ed85d536..ce0093d6 100644
--- a/marllib/marl/algos/hyperparams/test/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/maa2c.yaml b/marllib/marl/algos/hyperparams/test/maa2c.yaml
index 1a199a75..cca3b1e3 100644
--- a/marllib/marl/algos/hyperparams/test/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 2
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/maddpg.yaml b/marllib/marl/algos/hyperparams/test/maddpg.yaml
index efe7c914..a4f3197c 100644
--- a/marllib/marl/algos/hyperparams/test/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/test/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
   buffer_size_episode: 10
   target_network_update_freq_episode: 1
   tau: 0.002
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/mappo.yaml b/marllib/marl/algos/hyperparams/test/mappo.yaml
index f13392e2..c96c5f9a 100644
--- a/marllib/marl/algos/hyperparams/test/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/test/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/matrpo.yaml b/marllib/marl/algos/hyperparams/test/matrpo.yaml
index 915e843d..29972443 100644
--- a/marllib/marl/algos/hyperparams/test/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
   vf_loss_coeff: 1.0
   entropy_coeff: 0.01
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   kl_threshold: 0.00001
   accept_ratio: 0.5
   critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/vda2c.yaml b/marllib/marl/algos/hyperparams/test/vda2c.yaml
index 3f0bd5c4..c3889033 100644
--- a/marllib/marl/algos/hyperparams/test/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
   lambda: 1.0
   vf_loss_coeff: 1.0
   batch_episode: 2
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   lr: 0.0005
   entropy_coeff: 0.01
   mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/test/vdppo.yaml b/marllib/marl/algos/hyperparams/test/vdppo.yaml
index b2b4c59e..0e792bec 100644
--- a/marllib/marl/algos/hyperparams/test/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/test/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
   entropy_coeff: 0.01
   clip_param: 0.3
   vf_clip_param: 10.0
-  batch_mode: "complete_episodes"
+  batch_mode: "truncate_episodes"
   mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/ray/ray.yaml b/marllib/marl/ray/ray.yaml
index 9ba4d281..b58285be 100644
--- a/marllib/marl/ray/ray.yaml
+++ b/marllib/marl/ray/ray.yaml
@@ -24,7 +24,7 @@
 local_mode: False # True for debug mode only
 share_policy: "group" # individual(separate) / group(division) / all(share)
-evaluation_interval: 10 # evaluate model every 10 training iterations
+evaluation_interval: 50 # evaluate model every 50 training iterations
 framework: "torch"
 num_workers: 1 # thread number
 num_gpus: 1 # gpu to use
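The recurring change above flips RLlib's `batch_mode` from `"complete_episodes"` to `"truncate_episodes"`. In RLlib, `"complete_episodes"` makes each rollout worker keep stepping until every started episode terminates, so sample batches can grow far beyond the configured fragment length; `"truncate_episodes"` cuts batches at a fixed step count and resumes episodes on the next sample call. The following sketch illustrates the difference; it is not RLlib source code, and the episode lengths and fragment size are hypothetical example values:

```python
# Illustrative sketch (not RLlib internals) of the two batch modes.

def collect_batch(episode_lengths, rollout_fragment_length, batch_mode):
    """Return how many env steps a worker gathers in one sample call."""
    if batch_mode == "truncate_episodes":
        # Episodes may be cut mid-way: the worker returns exactly
        # `rollout_fragment_length` steps and continues the episode later.
        return rollout_fragment_length
    if batch_mode == "complete_episodes":
        # The worker only stops at episode boundaries, so the batch is at
        # least `rollout_fragment_length` steps and can be much larger.
        steps = 0
        for length in episode_lengths:
            steps += length
            if steps >= rollout_fragment_length:
                break
        return steps
    raise ValueError(f"unknown batch_mode: {batch_mode}")

episodes = [200, 200, 200]  # hypothetical episode lengths
print(collect_batch(episodes, 128, "truncate_episodes"))   # 128
print(collect_batch(episodes, 128, "complete_episodes"))   # 200
```

With long-horizon environments such as MAMuJoCo, `"complete_episodes"` inflates batch sizes and memory use, which is one plausible motivation for switching the defaults to `"truncate_episodes"`.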