Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: encode and decode using element_coder #117

Merged
merged 13 commits into from
Jun 28, 2022

Conversation

kjappelbaum
Copy link
Contributor

@kjappelbaum kjappelbaum commented Jun 20, 2022

ToDo:

@kjappelbaum kjappelbaum marked this pull request as draft June 20, 2022 07:20
@sgbaird
Copy link
Member

sgbaird commented Jun 22, 2022

lmk if the pre-commit checks are a pain and I can have it run the normal test suite regardless.

@kjappelbaum kjappelbaum marked this pull request as ready for review June 25, 2022 12:30
fr = frac_coords[i]
site_ids = np.where(at > 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, this was your criterion to identify the number of sites. However, I'm not sure if this will work so well (it will at least require some changes) as some elements will encode to zero and zero values will decode to some valid elements.

Remedies might be to

  • separately encode the number of sites
  • always ensure that sites occupied with elements are encoded > 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding num_sites as a separate variable might be best. While it's more information for a generative algorithm to learn, it might help out with the issue in #82. Originally, I was leaning towards encoding a non-atom as a integer (i.e. 0), but something interesting that happens here is that the distance between a "non-atom" and the lowest atom is the same distance between the lowest atom and the second lowest atom, which seems a bit off to me. If we encode the information separately, then we've essentially encoded parameters for a mask to be applied. That sits a bit better with me.

There's also the possibility of creating a buffer between a "non-atom" and the lowest atom, (e.g. 0 < non_atom < 10).

I think I'll open the encoding num_sites as a separate PR.

Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove tmp.html and tmp.png (I'm guessing this was leftover from me).

fr = frac_coords[i]
site_ids = np.where(at > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding num_sites as a separate variable might be best. While it's more information for a generative algorithm to learn, it might help out with the issue in #82. Originally, I was leaning towards encoding a non-atom as a integer (i.e. 0), but something interesting that happens here is that the distance between a "non-atom" and the lowest atom is the same distance between the lowest atom and the second lowest atom, which seems a bit off to me. If we encode the information separately, then we've essentially encoded parameters for a mask to be applied. That sits a bit better with me.

There's also the possibility of creating a buffer between a "non-atom" and the lowest atom, (e.g. 0 < non_atom < 10).

I think I'll open the encoding num_sites as a separate PR.

Comment on lines +975 to +980
atomic_symbols = [
decode_many(
encoding, self.element_encoding, metric=self.element_decoding_metric
)
for encoding in unscaled_atom_encodings
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decodes non-atoms to "He" using the mod_pettifor representation (related to atom_range=(0,117) instead of atom_range=(1,118), though it gets dropped later on per your comment.

@kjappelbaum kjappelbaum changed the title [WIP] feat: encode and decode using [WIP] feat: encode and decode using element_coder Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consider kwarg for mapping noble gases to a "nearby" element or (better) using different elemental featurizer
2 participants