
Can't reproduce your result through the models that you provided #1

Open
Weile0409 opened this issue Jun 16, 2021 · 0 comments
Hi, I am currently studying reinforcement learning. I used your code to train the LunarLander DDQN, DQN, and Priority models, and the learning curves are similar to your results! However, when I load a saved model (mine or yours) and run the last part of your provided code:

import gym
import torch

rs = []
env = gym.make('LunarLander-v2')
env.seed(0)
state = env.reset()
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
eps = 1

agent = DDQNAgent(state_size, action_size, 1, ddqn=True, priority=True)
# agent.qnetwork_local.load_state_dict(torch.load('LunarLander-v2_DQN_4000_20210429222334.pt', map_location="cuda:0"))  # choose whichever GPU device number
agent.qnetwork_local.load_state_dict(torch.load('model/LunarLander-v2_Priority_4000_20210430004016.pt'))

# img = plt.imshow(env.render(mode='rgb_array'))  # only call this once
for _ in range(2000):
    env.render()
    action = agent.act(state, eps)
    next_state, reward, done, _ = env.step(action)
    rs.append(reward)
    state = next_state
    if done:
        print(reward)
        state = env.reset()
        
env.close()
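One thing worth noting about the snippet above: `eps = 1` is passed to `agent.act`. Assuming `agent.act` follows the usual epsilon-greedy pattern (this is an assumption about the repository's code, not something confirmed here), `eps = 1` means every action is drawn uniformly at random and the trained Q-network is never consulted. A minimal, self-contained sketch of that behaviour, using a hypothetical `epsilon_greedy` helper:

```python
import random

def epsilon_greedy(q_values, eps):
    # Hypothetical helper mirroring a standard epsilon-greedy policy:
    # with probability eps pick a uniformly random action, else the argmax.
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.9, -0.3, 0.0]  # the greedy action is index 1
random.seed(0)
greedy_picks = [epsilon_greedy(q, eps=0.0) for _ in range(100)]  # always action 1
random_picks = [epsilon_greedy(q, eps=1.0) for _ in range(100)]  # ignores q entirely
```

With `eps = 0.0` the learned Q-values fully determine the action; with `eps = 1.0` the policy is pure random exploration, which in LunarLander typically ends in a crash.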

The output of `print(reward)` is shown below:
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100
-100

Why do I get the same negative reward every time? Any suggestions for solving this problem? Thank you very much!
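A second detail that may explain the uniform `-100` values: the snippet prints `reward` only on the step where `done` is true, so it shows just the terminal step's reward, not the episode return. In LunarLander-v2 a crash contributes `-100` on the final step, so every crashed episode prints exactly `-100` regardless of what happened earlier. A generic sketch of accumulating the episode return instead (using a stand-in environment so the example runs without gym/box2d; this is not the repository's code):

```python
class DummyEnv:
    """Stand-in for an episodic gym env: 4 steps of +1, then a -100 crash penalty."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        done = self.t >= 5
        reward = -100.0 if done else 1.0  # terminal crash penalty on the last step only
        return 0, reward, done, {}

env = DummyEnv()
state = env.reset()
episode_return = 0.0
returns = []
while True:
    next_state, reward, done, _ = env.step(0)
    episode_return += reward
    state = next_state  # advance the state every step
    if done:
        returns.append(episode_return)
        break
# The final step's reward is -100, but the episode return (-96 here) also
# reflects the rewards collected before the crash.
```

Printing the accumulated return per episode gives a much clearer picture of agent quality than the last step's reward alone.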