-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] feat: encode and decode using element_coder
#117
Conversation
lmk if the pre-commit checks are a pain and I can have it run the normal test suite regardless. |
fr = frac_coords[i] | ||
site_ids = np.where(at > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this was your criterion to identify the number of sites. However, I'm not sure if this will work so well (it will at least require some changes) as some elements will encode to zero and zero values will decode to some valid elements.
Remedies might be to
- separately encode the number of sites
- always ensure that sites occupied with elements are encoded > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encoding num_sites
as a separate variable might be best. While it's more information for a generative algorithm to learn, it might help out with the issue in #82. Originally, I was leaning towards encoding a non-atom as a integer (i.e. 0
), but something interesting that happens here is that the distance between a "non-atom" and the lowest atom is the same distance between the lowest atom and the second lowest atom, which seems a bit off to me. If we encode the information separately, then we've essentially encoded parameters for a mask to be applied. That sits a bit better with me.
There's also the possibility of creating a buffer between a "non-atom" and the lowest atom, (e.g. 0 < non_atom < 10
).
I think I'll open the encoding num_sites
as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to remove tmp.html
and tmp.png
(I'm guessing this was leftover from me).
fr = frac_coords[i] | ||
site_ids = np.where(at > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encoding num_sites
as a separate variable might be best. While it's more information for a generative algorithm to learn, it might help out with the issue in #82. Originally, I was leaning towards encoding a non-atom as a integer (i.e. 0
), but something interesting that happens here is that the distance between a "non-atom" and the lowest atom is the same distance between the lowest atom and the second lowest atom, which seems a bit off to me. If we encode the information separately, then we've essentially encoded parameters for a mask to be applied. That sits a bit better with me.
There's also the possibility of creating a buffer between a "non-atom" and the lowest atom, (e.g. 0 < non_atom < 10
).
I think I'll open the encoding num_sites
as a separate PR.
atomic_symbols = [ | ||
decode_many( | ||
encoding, self.element_encoding, metric=self.element_decoding_metric | ||
) | ||
for encoding in unscaled_atom_encodings | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decodes non-atoms to "He" using the mod_pettifor
representation (related to atom_range=(0,117)
instead of atom_range=(1,118)
, though it gets dropped later on per your comment.
uniq_atoms = np.unique(list(chain(*atomic_numbers))) self._atom_range = [np.min(uniq_atoms), np.max(uniq_atoms)]
element_coder
ToDo:
fit
to return a list of elements (as not all scales change the same way across the PTE)encode_many
anddecode_many
kjappelbaum/element-coder#7 is implemented (should be easy due to thesciris
data structure we use) switch to theencode_many
anddecode_many