- Should the new Q(S, a) be "r" or "(1 - alpha) * Q_old + alpha * r" when (S, a) leads to a terminal state? I think it should be the latter!
- Is alpha = 0.1 a good learning rate?
- Use a Boltzmann (softmax) distribution for selecting actions?
- Batch updates, or update only when an event occurs? Or both?
- Is the basic Q-learning formula OK for my scenario? I think yes, because the action of the other player could be considered a "random" element of the environment's state transition.
- Illegal moves need to trigger an immediate update of Q; otherwise we may get endless loops. I am doing that, but I wonder whether this is some sort of double counting?
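The questions above can be made concrete with a minimal tabular Q-learning sketch in Python. It assumes hashable states and actions and introduces a discount factor `GAMMA`, a Boltzmann temperature `TAU` and an illegal-move penalty `ILLEGAL_PENALTY`; these names, their values and the helper functions are hypothetical and not taken from the repository. Only `ALPHA = 0.1` comes from the notes. The opponent's reply is assumed to be folded into `next_state`, i.e. treated as part of the environment's transition.

```python
import math
import random
from collections import defaultdict

# Hypothetical helper names and constants; not from the repository.
# Tabular Q-values, defaulting to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)

ALPHA = 0.1             # learning rate floated in the notes
GAMMA = 0.9             # discount factor: an assumption
TAU = 1.0               # Boltzmann temperature: an assumption
ILLEGAL_PENALTY = -1.0  # reward for an illegal move: an assumption


def q_update(state, action, reward, next_state, next_actions, terminal):
    """One Q-learning step. next_state is the state after the opponent has
    replied, so the opponent acts as part of the environment."""
    if terminal:
        target = reward  # no future value after a terminal state
    else:
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + GAMMA * best_next
    # Blend old estimate and target: (1 - alpha) * Q_old + alpha * target
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * target
    return Q[(state, action)]


def boltzmann_action(state, actions, tau=TAU, rng=random):
    """Sample an action with probability proportional to exp(Q / tau)."""
    prefs = [Q[(state, a)] / tau for a in actions]
    m = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(actions, weights=weights, k=1)[0]


def punish_illegal(state, action):
    """Assumed handling: update an illegal move immediately as a terminal
    transition with a penalty, so its Q-value drops and it is chosen less."""
    return q_update(state, action, ILLEGAL_PENALTY, None, [], terminal=True)
```

In this sketch a terminal step still blends the old value with the target r via alpha, the latter option from the first question; setting Q to r directly is the special case alpha = 1. Whether a penalized illegal move should also end the episode, or leave the agent in the same state to choose again, is left to the caller.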
helmuthj/ReinforcementGame