Generalization to building blocks rather than only atoms #200

sgbaird · 2022-07-29T16:01:11Z

From internal communication. By Berend Smit:

Coming from the perspective of MOFs I see one main point as a potential opportunity for the library:

If one could abstract the encoding of the image a bit more from atoms as fundamental building blocks one could apply it also to MOFs (or some coarse-grained representation).

That is, the most general implementation would have an interface such as
encode(structure, site_encoding_func, embedding_encoding_func) -> image array
decode(image, site_decoding_func, embedding_decoding_func) -> structure
By default, the functions would do the encoding of the elements. However, if users provide other sites or use symbols to indicate certain building blocks, they might want to choose their own encoding/decoding functions. This should also make it easier to use Wyckoff sites instead of all sites.

The embedding_encoding_func would be a function that by default creates the pairwise distance matrix, but might also be the adjacency matrix (which can be useful if one aims to generate new crystallographic nets).
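As a rough illustration of the pluggable interface described above, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration and not xtal2png's actual API or image layout: `ToyStructure` stands in for a pymatgen `Structure`, the lookup tables `ATOMIC_NUMBERS` and `BLOCK_IDS` are made up, and the "image" is simply the pairwise matrix bordered by the encoded site values. Only the site half of `decode` is shown, since recovering coordinates from a distance or adjacency matrix is a separate problem.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


@dataclass
class ToyStructure:
    """Hypothetical stand-in for a structure: one label and one position per site."""
    labels: List[str]
    coords: List[Tuple[float, float, float]]


# Assumed lookup tables, for illustration only.
ATOMIC_NUMBERS: Dict[str, int] = {"H": 1, "C": 6, "O": 8, "Zn": 30}
BLOCK_IDS: Dict[str, int] = {"node": 1, "linker": 2}


def make_site_funcs(table: Dict[str, int]):
    """Build a matching (encode, decode) pair of site functions from a label table."""
    inverse = {value: key for key, value in table.items()}

    def encode_sites(labels: Sequence[str]) -> List[float]:
        return [float(table[label]) for label in labels]

    def decode_sites(values: Sequence[float]) -> List[str]:
        return [inverse[int(round(value))] for value in values]

    return encode_sites, decode_sites


def distance_matrix(structure: ToyStructure) -> Matrix:
    """Default embedding: pairwise Euclidean distances between sites."""
    return [[dist(a, b) for b in structure.coords] for a in structure.coords]


def adjacency_matrix(structure: ToyStructure, cutoff: float = 2.0) -> Matrix:
    """Alternative embedding: 1.0 where two distinct sites lie within `cutoff`."""
    return [
        [1.0 if i != j and dist(a, b) <= cutoff else 0.0
         for j, b in enumerate(structure.coords)]
        for i, a in enumerate(structure.coords)
    ]


def encode(
    structure: ToyStructure,
    site_encoding_func: Callable[[Sequence[str]], List[float]],
    embedding_encoding_func: Callable[[ToyStructure], Matrix] = distance_matrix,
) -> Matrix:
    """Border the pairwise matrix with the encoded site values (assumed layout)."""
    sites = site_encoding_func(structure.labels)
    pairwise = embedding_encoding_func(structure)
    image = [[0.0] + sites]  # first row: corner placeholder, then site values
    for value, row in zip(sites, pairwise):
        image.append([value] + list(row))  # first column repeats the site value
    return image


def decode_sites_from_image(
    image: Matrix, site_decoding_func: Callable[[Sequence[float]], List[str]]
) -> List[str]:
    """Recover the site labels from the first row of the image."""
    return site_decoding_func(image[0][1:])


encode_elements, decode_elements = make_site_funcs(ATOMIC_NUMBERS)
encode_blocks, decode_blocks = make_site_funcs(BLOCK_IDS)

# Atoms as sites, distances as the embedding (the default behaviour).
water = ToyStructure(["O", "H", "H"], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
atom_image = encode(water, encode_elements)

# Coarse-grained building blocks as sites, adjacency as the embedding.
mof = ToyStructure(["node", "linker", "node"], [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)])
net_image = encode(mof, encode_blocks, adjacency_matrix)
```

Swapping the two callables is the only change needed to move from atoms with distances to building blocks with connectivity, which is the point of the proposed interface.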

Another interesting question might be how materials cluster in "xtal2png" space compared to other representations, e.g. SOAP. However, this would require an implementation that is invariant to site permutation and supercell expansion.
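To make the permutation issue concrete, the small sketch below (pure Python, with made-up coordinates) shows that reordering the sites changes a pairwise distance matrix, while a sorted list of the pairwise distances does not change. The sorted list is only a simple example of a permutation-invariant summary, not something proposed in this thread, and it does not address supercell expansion.

```python
from itertools import combinations
from math import dist


def distance_matrix(coords):
    """Pairwise distance matrix; depends on the order of the sites."""
    return [[dist(a, b) for b in coords] for a in coords]


def sorted_distances(coords):
    """Sorted pairwise distances; independent of the order of the sites."""
    return sorted(round(dist(a, b), 8) for a, b in combinations(coords, 2))


# Made-up coordinates for three sites, and the same sites in a different order.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
permuted = [coords[2], coords[0], coords[1]]

matrices_equal = distance_matrix(coords) == distance_matrix(permuted)
summaries_equal = sorted_distances(coords) == sorted_distances(permuted)
print(matrices_equal, summaries_equal)
```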

I really like this suggestion. It points to a common issue with many materials informatics repositories. For a while, I've wanted to make CrabNet agnostic to chemical formulas (sparks-baird/CrabNet#6). @Pepe-Marquez is also interested in featurization for more general building blocks, based on some internal discussions I've had with him.

Implementing a truly general "building blocks" framework seems non-trivial to me, at least at first. I think the common thread here would be that the encoding functions (site_encoding_func and embedding_encoding_func) operate on pymatgen Structure objects, while the decoding functions (site_decoding_func and embedding_decoding_func) operate on images, where each row/column represents a unique building block. In the latter case, the current xtal2png representation starts to break down, since it contains site coordinates. For arbitrary building blocks (e.g. structural motifs), additional (invertible) information about the composition and structure of the motifs would need to be present.


sgbaird · 2022-08-05T02:26:20Z

Following up from a chat with @kjappelbaum, there could be a hierarchy of building block types manifested in the layers. For example, the first layer encodes information about the atoms, the second layer encodes information about structural motifs (larger building blocks), etc.

sgbaird added the enhancement label (New feature or request) on Jul 29, 2022
