Clarification about the observation or system state returned by the task class #96

Open
wilhem opened this issue Jul 2, 2024 · 1 comment
Labels
question Further information is requested

wilhem commented Jul 2, 2024

Hello,

I have been studying the code for the Panda reach task carefully, and two questions came to mind:

  1. The observation vector returned by the system contains the position of the robot's end-effector. Would it also work if the observation consisted of the robot's joint angles instead of the end-effector position? In theory, the agent should still be able to learn, shouldn't it?
  2. The reward is computed from the distance between the target and the end-effector or, in sparse mode, consists only of zeros and ones depending on whether distance < distance_threshold. With a sparse reward, though, any DDPG, PPO, or SAC agent will fail to learn. How do you train the agent with the sparse reward? Did you use hindsight experience replay (HER) from SB3?
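
The two reward modes mentioned in question 2 can be sketched in a few lines. This is only an illustration, not the library's code: the function name `reach_reward`, the default threshold of 0.05, and the sign convention (negative distance for dense; 0 inside the threshold and -1 outside for sparse, a common goal-conditioned convention) are all assumptions made for the example. The exact values the environment returns should be checked in its source.

```python
import math


def reach_reward(achieved, desired, reward_type="sparse", distance_threshold=0.05):
    """Hypothetical reward for a reach task (not the library's implementation)."""
    # Euclidean distance between the end-effector and the target.
    d = math.dist(achieved, desired)
    if reward_type == "sparse":
        # Assumed convention: 0 on success, -1 everywhere else.
        return 0.0 if d < distance_threshold else -1.0
    # Dense: the closer the end-effector, the higher (less negative) the reward.
    return -d


target = (0.1, 0.0, 0.2)
print(reach_reward((0.1, 0.0, 0.21), target, "sparse"))  # within the threshold
print(reach_reward((0.4, 0.0, 0.2), target, "sparse"))   # far from the target
print(reach_reward((0.4, 0.0, 0.2), target, "dense"))    # negative distance
```

With the sparse variant, every state outside the threshold looks equally bad, which is why exploration alone rarely finds a learning signal.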

Thanks

wilhem added the question label Jul 2, 2024
Owner

qgallouedec commented Jul 2, 2024

  1. Yes! It is precisely the config of PandaReachJoints-v3. Edit: my bad, in this environment you still get the end-effector (ee) position.
  2. True again, the sparsity makes the task really hard to learn. For reach it could still work, but for the other tasks you have a very low chance of learning anything. That's why we use tricks like HER, indeed.
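
To make the HER idea concrete, here is a minimal sketch of hindsight relabeling with the "final" strategy: a failed episode is stored a second time with the goal replaced by the position that was actually reached, so at least one transition carries a success signal. The helper names (`sparse_reward`, `relabel_with_final_goal`), the threshold, and the 0 / -1 reward convention are assumptions for illustration; this is not the SB3 implementation.

```python
import math


def sparse_reward(achieved, goal, threshold=0.05):
    # Assumed convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if math.dist(achieved, goal) < threshold else -1.0


def relabel_with_final_goal(episode):
    """Relabel an episode with the last achieved position as the goal.

    `episode` is a list of (achieved_position, desired_goal) pairs.
    Returns a list of (achieved_position, new_goal, new_reward) triples.
    """
    final_achieved = episode[-1][0]
    return [
        (achieved, final_achieved, sparse_reward(achieved, final_achieved))
        for achieved, _ in episode
    ]


# A failed episode: the end-effector never gets near the real target.
target = (0.5, 0.5, 0.5)
episode = [
    ((0.0, 0.0, 0.1), target),
    ((0.1, 0.0, 0.1), target),
    ((0.2, 0.0, 0.1), target),
]

original_rewards = [sparse_reward(a, g) for a, g in episode]
relabeled_rewards = [r for _, _, r in relabel_with_final_goal(episode)]

print(original_rewards)   # [-1.0, -1.0, -1.0]
print(relabeled_rewards)  # [-1.0, -1.0, 0.0]
```

In Stable-Baselines3, HER is provided as `HerReplayBuffer`, passed to an off-policy algorithm such as SAC or DDPG through `replay_buffer_class`. It does not apply to PPO, which has no replay buffer.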
