Goal: train an agent to learn how to escape from the school as fast as possible.
Built with Unity ML-Agents Release 17.
- School_Only1F (Curriculum Learning)
- School_From2F (Curriculum Learning) << currently training
- Exit1 ~ 3 : When the Agent touches one of these, the episode ends as a success.
- Obstacles : When the Agent touches one of these, the episode ends as a failure.
- The Agent spawns at a random place in the stage.
[!] A spawnable area, made of a collider, sits above the floor. If the Agent lands on the floor without touching it, the Agent is respawned.
- One of three different brains is set on the Agent depending on where it spawns.
- The Agent can move forward and backward and turn right and left (Discrete Actions).
- The Agent observes its surroundings with a ray sensor whose rays are fired over 360 degrees.
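The discrete move/turn actions above might be handled roughly like this. This is a minimal C# sketch, not the project's actual code: the class name, branch layout, tag names, and speed values are all assumptions.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical sketch of the Agent's discrete action handling.
public class EscapeAgent : Agent
{
    public float moveSpeed = 2f;    // assumed value
    public float turnSpeed = 180f;  // assumed value (degrees/sec)

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Branch 0: 0 = stay, 1 = forward, 2 = backward
        int move = actions.DiscreteActions[0];
        // Branch 1: 0 = no turn, 1 = turn right, 2 = turn left
        int turn = actions.DiscreteActions[1];

        Vector3 dir = Vector3.zero;
        if (move == 1) dir = transform.forward;
        else if (move == 2) dir = -transform.forward;
        transform.position += dir * moveSpeed * Time.deltaTime;

        float rot = 0f;
        if (turn == 1) rot = 1f;
        else if (turn == 2) rot = -1f;
        transform.Rotate(Vector3.up, rot * turnSpeed * Time.deltaTime);
    }
}
```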
<Curriculum Training>
There are two curriculum parameters: SpawnableAreaNum and StepReward.
cf. \config\AgentManagerCurriculum.yaml
The curriculum settings are as follows:
SpawnableAreaNum | StepReward | Using Behavior | Threshold |
---|---|---|---|
0.0 (B_StairSide) | -0.0002 | EL_B_StairSide | 0.6 |
1.0 (A_StairSide) | -0.0002 | EL_A_StairSide | 0.5 |
2.0 (C_StairSide) | -0.0002 | EL_C_StairSide | 0.5 |
3.0 (All) | -0.00025 | One of three | - |
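The table maps onto ML-Agents' curriculum syntax roughly like this. This is a sketch of what AgentManagerCurriculum.yaml might contain, not the actual file; StepReward would get a parallel curriculum block with the same completion criteria.

```yaml
environment_parameters:
  SpawnableAreaNum:
    curriculum:
      - name: B_StairSide
        completion_criteria:
          measure: reward
          behavior: EL_B_StairSide
          threshold: 0.6
        value: 0.0
      - name: A_StairSide
        completion_criteria:
          measure: reward
          behavior: EL_A_StairSide
          threshold: 0.5
        value: 1.0
      - name: C_StairSide
        completion_criteria:
          measure: reward
          behavior: EL_C_StairSide
          threshold: 0.5
        value: 2.0
      - name: All   # final lesson, no completion criteria
        value: 3.0
```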
Training starts from SpawnableAreaNum = 0.0.
The max step of each Behavior is as follows:
Behavior Name | Max Step |
---|---|
EL_B_StairSide | 1,000,000 |
EL_A_StairSide | 10,000,000 |
EL_C_StairSide | 10,000,000 |
- The Agent gets `StepReward` (set by curriculum training) at every step.
- When the Agent touches an Obstacle, it gets `-1.0`.
- If the Agent reaches the Exit closest to where it spawned, it gets `1.5`; otherwise, it gets `0.75`.
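The reward scheme above could be implemented roughly like this. Again a minimal C# sketch under assumptions, not the project's code: the tag names and the `IsClosestExitToSpawn` helper are hypothetical, and the curriculum parameter name `StepReward` is taken from the table above.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical sketch of the reward scheme described above.
public class EscapeAgentRewards : Agent
{
    float stepReward;  // per-step reward driven by the curriculum

    public override void OnEpisodeBegin()
    {
        // Curriculum parameters arrive through the EnvironmentParameters API.
        stepReward = Academy.Instance.EnvironmentParameters
            .GetWithDefault("StepReward", -0.0002f);
    }

    void FixedUpdate()
    {
        AddReward(stepReward);  // small per-step penalty from the curriculum
    }

    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Obstacle"))
        {
            SetReward(-1.0f);   // failure: touched an obstacle
            EndEpisode();
        }
        else if (other.CompareTag("Exit"))
        {
            // 1.5 for the exit closest to the spawn point, 0.75 for any other.
            bool closest = IsClosestExitToSpawn(other);  // hypothetical helper
            SetReward(closest ? 1.5f : 0.75f);
            EndEpisode();
        }
    }

    bool IsClosestExitToSpawn(Collider exit)
    {
        // Project-specific distance check would go here.
        return true;
    }
}
```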
Here is the result video. *The video is slow due to the specs of my PC :(
You can watch it on my Twitter.
Here are the graphs:
Here is the scatter plot. Please compare it with the environment map.
Finally, here are the results for each value.
The Agent has a 90% chance of evacuating from the 1F of this school!