Generalization to building blocks rather than only atoms #200

Open
sgbaird opened this issue Jul 29, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@sgbaird
Member

sgbaird commented Jul 29, 2022

From internal communication. By Berend Smit:

Coming from the perspective of MOFs I see one main point as a potential opportunity for the library:

If one could abstract the image encoding a bit further away from atoms as the fundamental building blocks, one could also apply it to MOFs (or some coarse-grained representation).

That is, the most general implementation would have an interface such as

encode(structure, site_encoding_func, embedding_encoding_func) -> image array
decode(image, site_decoding_func, embedding_decoding_func) -> structure

By default, the functions would do the encoding of the elements. However, if users provide other sites and/or use symbols to indicate certain building blocks, they might want to choose their own encoding/decoding functions. This should also make it easier to use Wyckoff sites instead of all sites.

The embedding_encoding_func would be a function that by default creates the pairwise distance matrix, but might also be the adjacency matrix (which can be useful if one aims to generate new crystallographic nets).
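To make the proposed interface concrete, here is a minimal sketch in plain Python. It is not the xtal2png API: `SimpleStructure`, `encode_elements`, `encode_blocks`, `decode_sites`, the lookup tables, and the choice to store site codes on the diagonal of the matrix are all assumptions made for illustration. The distance matrix here ignores periodic boundary conditions, and only the site labels are decoded (recovering coordinates from the matrix is left out).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


@dataclass
class SimpleStructure:
    """Hypothetical stand-in for a pymatgen Structure: labels plus Cartesian coordinates."""
    labels: List[str]
    coords: List[Tuple[float, float, float]]


# Tiny illustrative lookups; a real default would cover the whole periodic table.
ATOMIC_NUMBERS: Dict[str, int] = {"H": 1, "C": 6, "N": 7, "O": 8, "Zn": 30}
BLOCK_CODES: Dict[str, int] = {"Zn4O": 1, "BDC": 2}  # hypothetical building-block symbols


def encode_elements(labels: Sequence[str]) -> List[float]:
    """Default site encoding: element symbol -> atomic number."""
    return [float(ATOMIC_NUMBERS[label]) for label in labels]


def encode_blocks(labels: Sequence[str]) -> List[float]:
    """User-supplied site encoding for coarse-grained building blocks."""
    return [float(BLOCK_CODES[label]) for label in labels]


def decode_elements(codes: Sequence[float]) -> List[str]:
    """Inverse of encode_elements."""
    inverse = {number: symbol for symbol, number in ATOMIC_NUMBERS.items()}
    return [inverse[int(round(code))] for code in codes]


def distance_matrix(structure: SimpleStructure) -> Matrix:
    """Default embedding: pairwise Euclidean distances (no periodic images)."""
    return [[math.dist(a, b) for b in structure.coords] for a in structure.coords]


def adjacency_matrix(structure: SimpleStructure, cutoff: float = 2.0) -> Matrix:
    """Alternative embedding: 1.0 where two distinct sites lie within `cutoff`."""
    d = distance_matrix(structure)
    n = len(d)
    return [[1.0 if i != j and d[i][j] <= cutoff else 0.0 for j in range(n)]
            for i in range(n)]


def encode(
    structure: SimpleStructure,
    site_encoding_func: Callable[[Sequence[str]], List[float]] = encode_elements,
    embedding_encoding_func: Callable[[SimpleStructure], Matrix] = distance_matrix,
) -> Matrix:
    """Off-diagonal entries hold the embedding; the diagonal holds the site codes."""
    image = [row[:] for row in embedding_encoding_func(structure)]
    for i, code in enumerate(site_encoding_func(structure.labels)):
        image[i][i] = code
    return image


def decode_sites(
    image: Matrix,
    site_decoding_func: Callable[[Sequence[float]], List[str]] = decode_elements,
) -> List[str]:
    """Partial decode: read the site codes back off the diagonal."""
    return site_decoding_func([image[i][i] for i in range(len(image))])


atoms = SimpleStructure(["Zn", "O"], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)])
atom_image = encode(atoms)
net_image = encode(atoms, embedding_encoding_func=adjacency_matrix)
```

Swapping `site_encoding_func=encode_blocks` in the same call would encode building-block symbols instead of elements, without touching the embedding step.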

Another interesting question might be how materials cluster in "xtal2png" space compared to other representations, e.g. SOAP. However, this would require an implementation that is invariant to permutation and supercell expansion.
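The permutation issue can be illustrated with a toy example. The sketch below (plain Python, hypothetical helper names, no periodic boundary conditions) shows that reordering the sites changes a pairwise distance matrix, while a simple order-independent summary such as the sorted list of pairwise distances does not. That summary is lossy and not invertible, and it does nothing for supercell expansion, so it only demonstrates the problem rather than solving it.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances in the given site order."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def sorted_pair_distances(coords: Sequence[Point], ndigits: int = 8) -> List[float]:
    """Order-independent (but lossy) summary: sorted distances over all site pairs."""
    return sorted(round(math.dist(a, b), ndigits) for a, b in combinations(coords, 2))


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
permuted = [coords[2], coords[0], coords[1]]  # same sites, different order

# The matrix "image" depends on site order...
matrices_differ = distance_matrix(coords) != distance_matrix(permuted)
# ...whereas the sorted pair distances do not.
summaries_match = sorted_pair_distances(coords) == sorted_pair_distances(permuted)
```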

I really like this suggestion. It points to a common issue with many materials informatics repositories. For a while, I've wanted to make CrabNet agnostic to chemical formulas (sparks-baird/CrabNet#6). @Pepe-Marquez is also interested in featurization for more general building blocks, based on some internal discussions I've had with him.

Implementing a truly general "building blocks" framework seems non-trivial to me, at least at first. I think the common thread is that each site_encoding_func and embedding_encoding_func would operate on a pymatgen Structure, while each site_decoding_func and embedding_decoding_func would operate on an image in which each row/column represents a unique building block. In the latter case, the current xtal2png representation starts to break down, since it contains site coordinates. For arbitrary building blocks (e.g. structural motifs), additional (invertible) information about the composition and structure of the motifs would need to be present.

@sgbaird sgbaird added the enhancement New feature or request label Jul 29, 2022
@sgbaird
Member Author

sgbaird commented Aug 5, 2022

Following up from a chat with @kjappelbaum, there could be a hierarchy of building block types manifested in the layers. For example, the first layer encodes information about the atoms, the second layer encodes information about structural motifs (larger building blocks), etc.
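One way to picture the layered idea is as a multi-channel image, with one square matrix per building-block level. The sketch below is purely illustrative: `pad_square` and `stack_layers` are hypothetical helpers, the example values are made up, and zero-padding the coarser layers to a common size is an assumption rather than anything decided in this thread.

```python
from typing import List

Matrix = List[List[float]]


def pad_square(matrix: Matrix, size: int, fill: float = 0.0) -> Matrix:
    """Pad a square matrix with `fill` on the right and bottom up to size x size."""
    padded = [list(row) + [fill] * (size - len(row)) for row in matrix]
    padded += [[fill] * size for _ in range(size - len(matrix))]
    return padded


def stack_layers(layers: List[Matrix]) -> List[Matrix]:
    """Stack per-level matrices as channels, padded to the largest level."""
    size = max(len(layer) for layer in layers)
    return [pad_square(layer, size) for layer in layers]


# Layer 0: atom-level matrix (3 atoms); layer 1: motif-level matrix (2 motifs).
atom_layer = [[30.0, 1.5, 1.5], [1.5, 8.0, 2.0], [1.5, 2.0, 8.0]]
motif_layer = [[1.0, 4.0], [4.0, 2.0]]

image = stack_layers([atom_layer, motif_layer])  # shape: 2 channels x 3 x 3
```

Because coarser levels have fewer entries, a scheme like this would also need a way to tell padding apart from real values when decoding.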
