
Commit

Merge pull request #98 from Replicable-MARL/sy_dev
Sy dev
Theohhhu committed Apr 25, 2023
2 parents 392cf4b + d6be00b commit 12f0ce7
Showing 66 changed files with 585 additions and 105 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -1,6 +1,8 @@
<div align="center">
<img src=docs/source/images/logo1.png width=65% />
</div>
[comment]: <> (<div align="center">)

[comment]: <> (<img src=docs/source/images/logo1.png width=65% />)

[comment]: <> (</div>)

<h1 align="center"> MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library </h1>

@@ -10,10 +12,9 @@
[![GitHub issues](https://img.shields.io/github/issues/Replicable-MARL/MARLlib)](https://github.com/Replicable-MARL/MARLlib/issues)
[![PyPI version](https://badge.fury.io/py/marllib.svg)](https://badge.fury.io/py/marllib)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Replicable-MARL/MARLlib/blob/sy_dev/marllib.ipynb)
[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
[![Organization](https://img.shields.io/badge/Organization-ReLER_RL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)
[![Organization](https://img.shields.io/badge/Organization-PKU_MARL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)

[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)

> __News__:
> We are excited to announce that a major update has just been released. For detailed version information, please refer to the [version info](https://github.com/Replicable-MARL/MARLlib/releases/tag/1.0.2).
@@ -55,7 +56,7 @@ Here we provide a table for the comparison of MARLlib and existing work.
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | 4 cooperative | 1 | share + separate | MLP + GRU | :x: |
| [MAlib](https://github.com/sjtu-marl/malib) | 4 self-play | 10 | share + group + separate | MLP + LSTM | [![Documentation Status](https://readthedocs.org/projects/malib/badge/?version=latest)](https://malib.readthedocs.io/en/latest/?badge=latest)
| [EPyMARL](https://github.com/uoe-agents/epymarl)| 4 cooperative | 9 | share + separate | GRU | :x: |
| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 11 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |
| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 12 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |

| Library | Github Stars | Documentation | Issues Open | Activity | Last Update
|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
@@ -108,7 +109,7 @@ First, install MARLlib dependencies to guarantee basic usage.
following [this guide](https://marllib.readthedocs.io/en/latest/handbook/env.html), finally install patches for RLlib.

```bash
$ conda create -n marllib python=3.8
$ conda create -n marllib python=3.8 # or 3.9
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt
@@ -185,6 +186,7 @@ Most of the popular environments in MARL research are supported by MARLlib:
| **[GRF](https://github.com/google-research/football)** | collaborative + mixed | Full | Discrete | 2D |
| **[Hanabi](https://github.com/deepmind/hanabi-learning-environment)** | cooperative | Partial | Discrete | 1D |
| **[MATE](https://github.com/XuehaiPan/mate)** | cooperative + mixed | Partial | Both | 1D |
| **[GoBigger](https://github.com/opendilab/GoBigger)** | cooperative + mixed | Both | Continuous | 1D |

Each environment has a README file that serves as the instruction for the task, covering environment settings, installation, and important notes.
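
As a quick illustration, here is a minimal training sketch for the newly added GoBigger environment. It assumes the `marl.make_env` / `marl.algos` quick-start API described in the MARLlib documentation; the scenario name `st_t1p2` comes from this PR's config file, while the model settings and stop condition are illustrative placeholders.

```python
from marllib import marl

# "gobigger" / "st_t1p2" match the env and map names registered in this PR
env = marl.make_env(environment_name="gobigger", map_name="st_t1p2")

# MAPPO with an MLP core suits GoBigger's 1D observations and continuous actions
mappo = marl.algos.mappo(hyperparam_source="common")
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-128"})

# short smoke-test run; agents within a team share one policy
mappo.fit(env, model, stop={"timesteps_total": 10000}, share_policy="group")
```
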
@@ -320,7 +322,11 @@ More tutorial documentation is available [here](https://marllib.readthedocs.io.

## Awesome List

A collection of research and review papers of multi-agent reinforcement learning (MARL) is available [here](https://marllib.readthedocs.io/en/latest/resources/awesome.html). The papers have been organized based on their publication date and their evaluation of the corresponding environments.
A collection of research and review papers of multi-agent reinforcement learning (MARL) is available. The papers have been organized based on their publication date and their evaluation of the corresponding environments.

Algorithms: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
Environments: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/handbook/env.html)


## Community

5 changes: 3 additions & 2 deletions ROADMAP.md
@@ -11,7 +11,8 @@ This list describes the planned features including breaking changes.
- [ ] manual training, refer to issue: https://github.com/Replicable-MARL/MARLlib/issues/86#issuecomment-1468188682
- [ ] new environments
- [x] MATE: https://github.com/UnrealTracking/mate
- [ ] Go-Bigger: https://github.com/opendilab/GoBigger
- [x] Go-Bigger: https://github.com/opendilab/GoBigger
- [ ] Voltage Control: https://github.com/Future-Power-Networks/MAPDN
- [ ] Overcooked: https://github.com/HumanCompatibleAI/overcooked_ai
- [ ] Support Transformer architecture
- [ ] CloseAirCombat: https://github.com/liuqh16/CloseAirCombat
- [ ] Support Transformers
50 changes: 49 additions & 1 deletion docs/source/handbook/env.rst
@@ -594,4 +594,52 @@ Installation

.. code-block:: shell
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
.. _GoBigger:

GoBigger
==============
.. only:: html

.. figure:: images/env_gobigger.gif
:width: 320
:align: center


GoBigger is a game engine that offers an efficient and easy-to-use platform for agar-like game development. It provides a variety of interfaces specifically designed for game AI development. The game mechanics of GoBigger are similar to those of Agar, a popular massive multiplayer online action game developed by Matheus Valadares of Brazil. The objective of GoBigger is for players to navigate one or more circular balls across a map, consuming Food Balls and smaller balls to increase their size while avoiding larger balls that can consume them. Each player starts with a single ball, but can divide it into two when it reaches a certain size, giving them control over multiple balls.
Official Link: https://github.com/opendilab/GoBigger

.. list-table::
:widths: 25 25
:header-rows: 0

* - ``Original Learning Mode``
- Cooperative + Mixed
* - ``MARLlib Learning Mode``
- Cooperative + Mixed
* - ``Observability``
- Partial + Full
* - ``Action Space``
- Continuous
* - ``Observation Space Dim``
- 1D
* - ``Action Mask``
- No
* - ``Global State``
- No
* - ``Global State Space Dim``
- /
* - ``Reward``
- Dense
* - ``Agent-Env Interact Mode``
- Simultaneous


Installation
-----------------

.. code-block:: shell
conda install -c opendilab gobigger
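
After installation, the environment can be smoke-tested directly. The sketch below mirrors the ``create_env_custom`` call used by MARLlib's ``RLlibGoBigger`` wrapper added in this commit; the random movement action and the step count are only illustrative.

.. code-block:: python

    from gobigger.envs import create_env_custom

    # 1 team, 2 players per team, same frame limit as the default MARLlib config
    env = create_env_custom(type='st', cfg=dict(team_num=1, player_num_per_team=2, frame_limit=1600))
    obs = env.reset()
    for _ in range(10):
        # one [x, y, action_type] per player; the MARLlib wrapper passes action_type=-1 (movement only)
        actions = {player_id: [0.5, 0.5, -1] for player_id in range(2)}
        obs, team_rewards, done, info = env.step(actions)
    env.close()
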
Binary file added docs/source/images/env_gobigger.gif
6 changes: 6 additions & 0 deletions marllib/envs/base_env/__init__.py
Expand Up @@ -88,3 +88,9 @@
except Exception as e:
ENV_REGISTRY["mate"] = str(e)

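# Register GoBigger lazily: if its dependencies are missing, the error message is
# stored in the registry instead, so the remaining environments stay usable.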
try:
from marllib.envs.base_env.gobigger import RLlibGoBigger
ENV_REGISTRY["gobigger"] = RLlibGoBigger
except Exception as e:
ENV_REGISTRY["gobigger"] = str(e)

33 changes: 33 additions & 0 deletions marllib/envs/base_env/config/gobigger.yaml
@@ -0,0 +1,33 @@
# MIT License

# Copyright (c) 2023 Replicable-MARL

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

env: gobigger

env_args:
map_name: "st_t1p2" # st(andard)_t(eam)1p(layer)2: standard map, 1 team, 2 players per team
#num_teams: 1
#num_agents: 2
frame_limit: 1600
mask_flag: False
global_state_flag: False
opp_action_in_cc: True
fixed_batch_timesteps: 3200 # optional; all scenarios use this batch size, only valid for on-policy algorithms
202 changes: 202 additions & 0 deletions marllib/envs/base_env/gobigger.py
@@ -0,0 +1,202 @@
# MIT License

# Copyright (c) 2023 Replicable-MARL

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import copy

from gobigger.envs import create_env_custom
from gym.spaces import Dict as GymDict, Box
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import numpy as np


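# Describes, for MARLlib, how agents are grouped into teams (team_prefix) and which
# policy-sharing schemes are valid for this environment.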
policy_mapping_dict = {
    "all_scenario": {
        "description": "mixed scenarios when num_teams > 1",
"team_prefix": ("team0_", "team1_"),
"all_agents_one_policy": True,
"one_agent_one_policy": True,
},
}


class RLlibGoBigger(MultiAgentEnv):

def __init__(self, env_config):

map_name = env_config["map_name"]

env_config.pop("map_name", None)
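        # map_name encodes the scenario, e.g. "st_t1p2": st(andard) map, t1 -> 1 team, p2 -> 2 players per team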
self.num_agents_per_team = int(map_name.split("p")[-1][0])
self.num_teams = int(map_name.split("_t")[1][0])
if self.num_teams == 1:
policy_mapping_dict["all_scenario"]["team_prefix"] = ("team0_",)
self.num_agents = self.num_agents_per_team * self.num_teams
self.max_steps = env_config["frame_limit"]
self.env = create_env_custom(type='st', cfg=dict(
team_num=self.num_teams,
player_num_per_team=self.num_agents_per_team,
frame_limit=self.max_steps
))

self.action_space = Box(low=-1,
high=1,
shape=(2,),
dtype=float)

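        # Fixed per-category feature sizes (in scalars); the variable-length overlap
        # lists returned by GoBigger are truncated or zero-padded to these lengths.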
self.rectangle_dim = 4
self.food_dim = self.num_agents * 100
self.thorns_dim = self.num_agents * 6
self.clone_dim = self.num_agents * 10
self.team_name_dim = 1
self.score_dim = 1

self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \
self.clone_dim + self.team_name_dim + self.score_dim

self.observation_space = GymDict({"obs": Box(
low=-1e6,
high=1e6,
shape=(self.obs_dim,),
dtype=float)})

self.agents = []
for team_index in range(self.num_teams):
for agent_index in range(self.num_agents_per_team):
self.agents.append("team{}_{}".format(team_index, agent_index))

env_config["map_name"] = map_name
self.env_config = env_config

def reset(self):
original_obs = self.env.reset()
obs = {}
for agent_index, agent_name in enumerate(self.agents):
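            # Build a fixed-length observation vector: viewport rectangle, then the
            # food/thorns/clone overlaps (truncated or zero-padded), then team id and score.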

rectangle = list(original_obs[1][agent_index]["rectangle"])

overlap_dict = original_obs[1][agent_index]["overlap"]

food = overlap_dict["food"]
if 4 * len(food) > self.food_dim:
food = food[:self.food_dim // 4]
else:
padding = [0] * (self.food_dim - 4 * len(food))
food.append(padding)
food = [item for sublist in food for item in sublist]

thorns = overlap_dict["thorns"]
if 6 * len(thorns) > self.thorns_dim:
thorns = thorns[:self.thorns_dim // 6]
else:
padding = [0] * (self.thorns_dim - 6 * len(thorns))
thorns.append(padding)
thorns = [item for sublist in thorns for item in sublist]

clone = overlap_dict["clone"]
if 10 * len(clone) > self.clone_dim:
clone = clone[:self.clone_dim // 10]
else:
padding = [0] * (self.clone_dim - 10 * len(clone))
clone.append(padding)
clone = [item for sublist in clone for item in sublist]

team = original_obs[1][agent_index]["team_name"]
score = original_obs[1][agent_index]["score"]

all_elements = rectangle + food + thorns + clone + [team] + [score]
all_elements = np.array(all_elements, dtype=float)

obs[agent_name] = {
"obs": all_elements
}

return obs

def step(self, action_dict):
actions = {}
for i, agent_name in enumerate(self.agents):
actions[i] = list(action_dict[agent_name])
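            # GoBigger expects [x, y, action_type] per player; appending -1 fills the
            # action-type slot, leaving only movement under the policy's control.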
actions[i].append(-1)

original_obs, team_rewards, done, info = self.env.step(actions)

rewards = {}
obs = {}
infos = {}

for agent_index, agent_name in enumerate(self.agents):
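            # Rebuild the fixed-length observation exactly as in reset().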

rectangle = list(original_obs[1][agent_index]["rectangle"])

overlap_dict = original_obs[1][agent_index]["overlap"]

food = overlap_dict["food"]
if 4 * len(food) > self.food_dim:
food = food[:self.food_dim // 4]
else:
padding = [0] * (self.food_dim - 4 * len(food))
food.append(padding)
food = [item for sublist in food for item in sublist]

thorns = overlap_dict["thorns"]
if 6 * len(thorns) > self.thorns_dim:
thorns = thorns[:self.thorns_dim // 6]
else:
padding = [0] * (self.thorns_dim - 6 * len(thorns))
thorns.append(padding)
thorns = [item for sublist in thorns for item in sublist]

clone = overlap_dict["clone"]
if 10 * len(clone) > self.clone_dim:
clone = clone[:self.clone_dim // 10]
else:
padding = [0] * (self.clone_dim - 10 * len(clone))
clone.append(padding)
clone = [item for sublist in clone for item in sublist]

team = original_obs[1][agent_index]["team_name"]
score = original_obs[1][agent_index]["score"]

all_elements = rectangle + food + thorns + clone + [team] + [score]
all_elements = np.array(all_elements, dtype=float)

obs[agent_name] = {
"obs": all_elements
}

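            # Every agent receives its own team's reward (rewards are shared within a team).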
rewards[agent_name] = team_rewards[team]

dones = {"__all__": done}
return obs, rewards, dones, infos

def get_env_info(self):
env_info = {
"space_obs": self.observation_space,
"space_act": self.action_space,
"num_agents": self.num_agents,
"episode_limit": self.max_steps,
"policy_mapping_info": policy_mapping_dict
}
return env_info

def close(self):
self.env.close()
