Author: Ruiwen Zhou
E-mail: [email protected]
This is the final assignment of the EE228 course @ SJTU, in which we are required to build a learning-based agent that automatically plays the 2048 game for a high average score over 10 games.
In this project, I use a completely offline method to obtain a high-performance CNN agent, which wins a single game about 25% of the time and reaches 1024 in more than 70% of games.
The main HIGHLIGHTS of this project can be summarized as follows:
- The training is completely offline, which makes the solution simple.
- The convolution structure of my network is a novel combination of two CNNs.
- The performance of my agent is excellent and stable.
- The operation of generating log files and analyzing data is efficient and simple.
My best submission is only one step away from 10 victories; unfortunately, I never had the luck to produce a perfect log file. Due to time limits, I haven't tried the unlimited or harder modes, and I'd like to work on them in the future.
- `game2048/`: the API and Expectimax package.
  - `game.py`: the core 2048 `Game` class.
  - `agents.py`: the `Agent` class with instances.
  - `displays.py`: the `Display` class with instances, to show the `Game` state.
  - `expectimax/`: a powerful ExpectiMax agent (from here).
- `static/`: frontend assets (based on Vue.js) for the web app.
- `MyAgent.py`: the `MyOwnAgent` class with its implementation.
- `cnn.py`: the `Conv_Net_Com` class used as the agent, together with its two originals:
  - `Conv_Net`: a CNN model offered by TA @duducheng.
  - `Conv_Net_v2`: a CNN model published here.
- `train_model.py`: the whole process for training a `Conv_Net_Com` instance, with a testing function inside.
- `data_process.py`: the `Dataset_2048` class to read and store data, based on `torch.utils.data.Dataset`.
- `generate_data.py`: uses the planning-based `Expectimax` agent to generate ideal trajectories as the training set.
- `generate_fingerprint.py`: pre-offered script for solution validation.
- `evaluate.py`: evaluates the self-defined agent.
- `mdl_final.pkl`: the stored final version of the model.
- `run.sh`: shell script for batch operations on log files.
- `data_analysis.cpp`: C++ code that traverses all log files to analyze the agent's performance.
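As a rough illustration of the "combination of two CNNs" idea behind `Conv_Net_Com`, a network could run two convolutional branches in parallel and merge their features before the final move classifier. This is only a hedged sketch, not the actual implementation: the layer sizes, kernel shapes, and the 16-channel one-hot board encoding are all assumptions.

```python
import torch
import torch.nn as nn

class ConvNetComSketch(nn.Module):
    """Hypothetical sketch: two convolutional branches whose features are
    concatenated before a shared linear head over the 4 moves."""
    def __init__(self, channels=16):
        super().__init__()
        # Branch A: asymmetric kernels capture row/column merge patterns
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=(1, 2)), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=(2, 1)), nn.ReLU(),
        )
        # Branch B: square kernels capture local 2x2 structure
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=(2, 2)), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=(2, 2)), nn.ReLU(),
        )
        # Feature sizes: branch A -> (128, 3, 3), branch B -> (128, 2, 2)
        self.head = nn.Linear(128 * 3 * 3 + 128 * 2 * 2, 4)

    def forward(self, x):  # x: (N, channels, 4, 4) one-hot encoded board
        a = self.branch_a(x).flatten(1)
        b = self.branch_b(x).flatten(1)
        return self.head(torch.cat([a, b], dim=1))
```

Concatenating the two feature maps lets the classifier see both line-oriented and block-oriented patterns of the board at once.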
Due to space limits, the generated log files and training dataset are not included in this repository.
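For reference, a dataset wrapper like `Dataset_2048` might look roughly like the sketch below. This is hypothetical: the field names, the exponent representation of tiles, and the one-hot encoding are assumptions, not the repository's actual code.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class Dataset2048Sketch(Dataset):
    """Hypothetical sketch of a (board, move) dataset for imitation learning
    on Expectimax trajectories."""
    def __init__(self, boards, moves, channels=16):
        self.boards = boards    # (N, 4, 4) array of tile exponents, 0 = empty
        self.moves = moves      # (N,) array of expert moves in {0, 1, 2, 3}
        self.channels = channels

    def __len__(self):
        return len(self.moves)

    def __getitem__(self, idx):
        # One-hot encode each cell's exponent into a (channels, 4, 4) tensor
        onehot = np.zeros((self.channels, 4, 4), dtype=np.float32)
        for r in range(4):
            for c in range(4):
                onehot[self.boards[idx, r, c], r, c] = 1.0
        return torch.from_numpy(onehot), int(self.moves[idx])
```

Such a class plugs directly into `torch.utils.data.DataLoader` for shuffled mini-batch training.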
- Code tested on Windows and Linux (Windows 10 and Ubuntu 18.04)
- A recent version of PyTorch and torchvision is recommended
- Python 3 (specifically Anaconda 3.6.3) with numpy, pandas and tqdm
- To train a new CNN agent:

```shell
python train_model.py
# Training information is printed here:
# ......
Save Model? [Y/n] # press Y to store the current model, N to discard it
```
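The `Save Model? [Y/n]` prompt above suggests an interactive save step at the end of training. A minimal sketch of such prompt logic is shown below; the function name and the default-to-save behavior are assumptions for illustration, not the actual `train_model.py` code.

```python
def should_save(answer: str) -> bool:
    """Hypothetical sketch of a 'Save Model? [Y/n]' prompt:
    save on 'y'/'yes' (or empty input, treating capital Y as the default),
    discard otherwise."""
    return answer.strip().lower() in ("", "y", "yes")

# A caller might then do (model and path are placeholders):
# if should_save(input("Save Model? [Y/n] ")):
#     torch.save(model.state_dict(), "mdl_final.pkl")
```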
- To evaluate the trained agent:

```shell
python evaluate.py >> EE228_evaluation.log
```
- To generate more log files in a single command, either for a higher score or for data analysis:

```shell
bash run.sh
# The current round and the average score of each log are printed here
# Example:
# Round: 1/1000, Average score: @10 times 2048.0
# ......
# Finally a C++ program analyzes the data automatically
# and lists some insights about the agent's performance
# Example:
# Average score per game: 2048.0
# ......
```
With the parameters set in `run.sh`, any desired number of log files can be generated and then analyzed easily.
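The log analysis performed by `data_analysis.cpp` could be mirrored in a few lines of Python, as sketched below. The log line format (`score: <number>`) and the function name are assumptions for illustration, not the repository's actual format.

```python
import re

def summarize_scores(log_lines):
    """Hypothetical sketch: extract per-game scores from evaluation log
    lines such as 'score: 2048' and report count, average, and win ratio."""
    scores = [int(m.group(1)) for line in log_lines
              if (m := re.search(r"score:\s*(\d+)", line))]
    if not scores:
        return {"games": 0, "average": 0.0, "win_ratio": 0.0}
    wins = sum(1 for s in scores if s >= 2048)  # 2048 reached = victory
    return {"games": len(scores),
            "average": sum(scores) / len(scores),
            "win_ratio": wins / len(scores)}
```

Running this over all log files would reproduce statistics like the average score per game and the single-game victory ratio reported below.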
- To train and evaluate an agent, the operation is exactly the same as in an Ubuntu terminal.
- `run.sh` cannot be used on Windows due to the lack of GPU support in WSL (this may change by the end of this year, as Microsoft plans to publish an update for this).
Using `mdl_final.pkl` as the agent's model, I generated 15,000 log files in total and provide some results here:
- Average score per game: 1116.08
- Max average score @10 games ever obtained: 1945.6
- Ratio of runs with AVERAGE score > 1024 over 10 games: 70.2267%
- Ratio of SINGLE-GAME victories: 25.4527%
| Max Tile | ≤16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Count | 27 | 69 | 427 | 2083 | 8918 | 31371 | 68926 | 38179 | 150000 |
| Frequency | 0.018% | 0.043% | 0.285% | 1.389% | 5.945% | 20.914% | 45.951% | 25.453% | 100.000% |
To be honest, the games that end with a max tile of 16 or below look so odd that I really wonder why my agent produced those actions.
- As `mdl_final.pkl` is already powerful, incremental learning on top of it might be effective afterwards.
- I do not use data preprocessing or multi-model decision making here, which could be useful.
- I train my model offline, but I do think this project is better suited to a DQN-based model.
For a tutorial on the given 2048 API, refer to the API Tutorial.
The code is under Apache-2.0 License.