[WIP] feat: encode and decode using `element_coder` #117

kjappelbaum · 2022-06-20T07:19:54Z

ToDo:

test
modify fit to return a list of elements (as not all scales change the same way across the PTE)
once encode_many and decode_many (kjappelbaum/element-coder#7) are implemented (should be easy due to the sciris data structure we use), switch to encode_many and decode_many

sgbaird · 2022-06-22T02:09:29Z

lmk if the pre-commit checks are a pain and I can have it run the normal test suite regardless.

kjappelbaum · 2022-06-25T12:33:23Z

src/xtal2png/core.py

            fr = frac_coords[i]
-            site_ids = np.where(at > 0)


As I understand it, this was your criterion for identifying the number of sites. However, I'm not sure it will work well here (it will at least require some changes), since some elements will encode to zero and zero values will decode to valid elements.

Remedies might be to

separately encode the number of sites

always ensure that sites occupied with elements are encoded > 0

sgbaird

I'm going to remove tmp.html and tmp.png (I'm guessing this was leftover from me).

sgbaird · 2022-06-27T23:34:20Z

src/xtal2png/core.py

            fr = frac_coords[i]
-            site_ids = np.where(at > 0)


Encoding num_sites as a separate variable might be best. While it's more information for a generative algorithm to learn, it might help with the issue in #82. Originally, I was leaning towards encoding a non-atom as an integer (e.g. 0), but something interesting happens here: the distance between a "non-atom" and the lowest atom is the same as the distance between the lowest atom and the second-lowest atom, which seems a bit off to me. If we encode the information separately, then we've essentially encoded parameters for a mask to be applied. That sits a bit better with me.

There's also the possibility of creating a buffer between a "non-atom" and the lowest atom (e.g. 0 < non_atom < 10).

I think I'll open the encoding num_sites as a separate PR.
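
A rough sketch of both points, under stated assumptions: the integer codes below are hypothetical, real sites are assumed to be packed at the front of the padded array, and `apply_site_mask` is a made-up helper name rather than anything in xtal2png.

```python
# Hypothetical integer codes: 0 for an empty site, 1 and 2 for the two
# lowest-encoded elements.
NON_ATOM = 0
lowest, second_lowest = 1, 2

# The "non-atom" is exactly as far from the lowest atom as the lowest atom
# is from the second-lowest one, so nothing marks it as special.
gap_to_non_atom = abs(lowest - NON_ATOM)
gap_between_atoms = abs(second_lowest - lowest)
print(gap_to_non_atom == gap_between_atoms)  # True


def apply_site_mask(decoded_symbols, num_sites):
    """Keep the first num_sites entries; everything after is padding.

    Assumes real sites come first in the padded array.
    """
    if not 0 <= num_sites <= len(decoded_symbols):
        raise ValueError("num_sites must lie within the padded length")
    return decoded_symbols[:num_sites]


# Padded slots decode to some valid element (here "He"), but the separately
# stored num_sites tells us where the real sites end.
decoded = ["Li", "He", "O", "He", "He"]
masked = apply_site_mask(decoded, num_sites=3)
print(masked)  # ['Li', 'He', 'O']
```

With num_sites stored separately, an occupied He site survives while the trailing padding is dropped, regardless of what the padding decodes to.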

sgbaird · 2022-06-27T23:39:59Z

src/xtal2png/core.py

+            atomic_symbols = [
+                decode_many(
+                    encoding, self.element_encoding, metric=self.element_decoding_metric
+                )
+                for encoding in unscaled_atom_encodings
+            ]


This decodes non-atoms to "He" using the mod_pettifor representation (related to atom_range=(0,117) instead of atom_range=(1,118)), though it gets dropped later on per your comment.
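
For illustration only, a nearest-value decoder over a hypothetical one-dimensional table reproduces this behaviour. The table values and the function `decode_nearest` are assumptions made for this sketch; this is not the `element_coder` implementation or its `decode_many` signature.

```python
# Hypothetical 1-D table whose scale starts at 0; values are illustrative only.
TOY_TABLE = {"He": 0.0, "Ne": 1.0, "Ar": 2.0, "Kr": 3.0}


def decode_nearest(value, table=TOY_TABLE):
    """Return the symbol whose encoding is closest to value."""
    return min(table, key=lambda sym: abs(table[sym] - value))


padded = [2.0, 0.0, 0.0]  # one Ar site followed by zero padding
symbols = [decode_nearest(v) for v in padded]
print(symbols)  # ['Ar', 'He', 'He']
```

Because the scale starts at 0, every zero-padded slot decodes to a valid element, so the padding has to be removed by some other means.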

feat: encode and decode using

fc39868

kjappelbaum marked this pull request as draft June 20, 2022 07:20

sgbaird and others added 3 commits June 24, 2022 11:33

Merge branch 'main' into element_coder

e717934

fix: imports

8a7a51b

chore: use decode_many

6dbfd34

kjappelbaum marked this pull request as ready for review June 25, 2022 12:30

kjappelbaum commented Jun 25, 2022

chore: expose decode metric

35c7ef9

sgbaird reviewed Jun 28, 2022

sgbaird added 7 commits June 27, 2022 18:56

mask_type merge from other PR

dc0e4e9

add PR'd tests and element_coder test

848606c

Delete tmp.html

0118e8d

Delete tmp.png

ad5935d

Update .gitignore

21e8ae7

keep a _atom_range variable that will probably get removed later

3bfa9fe

uniq_atoms = np.unique(list(chain(*atomic_numbers)))
self._atom_range = [np.min(uniq_atoms), np.max(uniq_atoms)]

check against _atom_range instead of atom_range during test_fit

918a42c

sgbaird mentioned this pull request Jun 28, 2022

replace volume with num_sites and use num_sites as a mask #141

Merged

Boron's atomic number is 5

d705b7c

kjappelbaum changed the title ~~[WIP] feat: encode and decode using~~ [WIP] feat: encode and decode using element_coder Jun 28, 2022

sgbaird approved these changes Jun 28, 2022

sgbaird merged commit 8b5f690 into sparks-baird:main Jun 28, 2022

sgbaird linked an issue Jul 8, 2022 that may be closed by this pull request

consider kwarg for mapping noble gases to a "nearby" element or (better) using different elemental featurizer #82

Closed
