In sm_simple.py, SM-G-SUM and SM-G-ABS scaling differ by sz^2 #2

Open

GLJeff opened this issue Oct 26, 2018 · 1 comment
GLJeff commented Oct 26, 2018

Note that torch.autograd.backward() computes the sum of the gradients over all states in the batch (at least in 0.4.1: https://pytorch.org/docs/stable/autograd.html?highlight=backward#torch.autograd.backward).

SM-G-SUM feeds backward() output gradients of 1 and then uses the returned gradients unaltered (i.e. their sum across states).
SM-G-ABS feeds backward() output gradients of 1/sz and then manually takes the mean of the per-state gradients, whereas in SM-G-SUM they were already summed inside backward().

The result is that SM-G-SUM uses a scale that is sz^2 larger in magnitude than SM-G-ABS. This is difficult to notice when the number of states is only 2, as in the example, especially since SM-G-ABS naturally returns a larger scale because taking absolute values prevents cancellation (washout). A minimal sketch of the scaling argument is given below.
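
Here is a minimal, self-contained sketch of that scaling argument, using a hypothetical linear model rather than the network in sm_simple.py (sz, states and the parameter w are illustrative names, not the repo's):

```python
import torch

# A tiny linear "policy" evaluated on a batch of sz states, comparing the
# SM-G-SUM-style backward pass with the SM-G-ABS-style per-state pass.
sz = 8                                      # number of states
states = torch.randn(sz, 3)                 # hypothetical batch of states
w = torch.randn(3, 2, requires_grad=True)   # hypothetical parameters

# SM-G-SUM style: output gradients of 1 for every state; backward() then
# returns the gradients already summed over all sz states.
out = states @ w
out.backward(torch.ones_like(out))
summed_grad = w.grad.clone()                # magnitude grows roughly like sz

# SM-G-ABS style: output gradients of 1/sz per state, followed by a manual
# mean of the absolute per-state gradients.
per_state = []
for s in range(sz):
    w.grad = None
    out_s = states[s:s + 1] @ w
    out_s.backward(torch.ones_like(out_s) / sz)   # the 1/sz output scaling
    per_state.append(w.grad.abs().clone())
mean_abs_grad = torch.stack(per_state).mean(0)    # plus a mean over states

# summed_grad picks up a factor of sz (a sum over states with unit output
# gradients), while mean_abs_grad picks up 1/sz twice (the output scaling
# and the mean), so the two scales differ by roughly sz**2, up to the sign
# cancellation that only affects the summed version.
print(summed_grad.abs().mean().item(), mean_abs_grad.mean().item())
```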

Absolutely awesome work on your genetic and evolutionary research! Safe mutations are an incredible milestone in genetic optimization! Now just throw away TensorFlow and PyTorch and start coding in pure CUDA like you ought to be :)


GLJeff commented Oct 28, 2018

To further clarify: I believe both implementations are wrong, in the sense that neither computes a scaling vector that is independent of the number of states.

SM-G-SUM should set:
grad_output[:, i] = 1.0 / len(_states)
since the gradients get summed inside the backward() pass.

SM-G-ABS should EITHER:
a) grad_output[:, i] = 1.0
since these values are then averaged along axis 2,
or
b) mean_abs_jacobian = torch.abs(jacobian).sum(2)
to sum them instead of averaging them. A self-contained sketch of both corrections is given below.
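
Here is a rough, self-contained sketch of what the corrected scalings would look like, again on a hypothetical linear model rather than the repo's network (only the names _states, grad_output and mean_abs_jacobian are taken from the snippets above; everything else is illustrative):

```python
import torch

# Hypothetical setup (not the repo's network): a linear map from states to
# outputs, used only to check that the corrected scalings agree.
_states = torch.randn(8, 3)                   # 8 states, 3 dims each
w = torch.randn(3, 2, requires_grad=True)     # parameters, 2 outputs
num_outputs = 2

# Corrected SM-G-SUM: scale the output gradients by 1/len(_states), so the
# sum that backward() performs over states becomes an average.
sum_grads = []
for i in range(num_outputs):
    w.grad = None
    out = _states @ w
    grad_output = torch.zeros_like(out)
    grad_output[:, i] = 1.0 / len(_states)
    out.backward(grad_output)
    sum_grads.append(w.grad.clone())
sm_g_sum_scale = torch.stack(sum_grads).abs().mean()

# Corrected SM-G-ABS, option (a): unit output gradients per state, followed
# by a mean of the absolute per-state gradients over the state axis.
jacobian = torch.zeros(num_outputs, w.numel(), len(_states))
for i in range(num_outputs):
    for s in range(len(_states)):
        w.grad = None
        out_s = _states[s:s + 1] @ w
        grad_output = torch.zeros_like(out_s)
        grad_output[:, i] = 1.0
        out_s.backward(grad_output)
        jacobian[i, :, s] = w.grad.flatten()
mean_abs_jacobian = torch.abs(jacobian).mean(2)   # option (b) would .sum(2)
                                                  # while keeping the
                                                  # 1/len(_states) outputs

# Both quantities are now per-state averages, so their magnitudes no longer
# differ by a factor of len(_states)**2; any remaining gap comes from the
# sign cancellation (washout) that only affects the summed version.
print(sm_g_sum_scale.item(), mean_abs_jacobian.mean().item())
```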
