Upgrades Living Document - Models #445

Open
ardunn opened this issue Mar 19, 2020 · 3 comments
Labels
living document ISSUE with ideas/progress on large projects or upgrades

Comments

@ardunn
Contributor

ardunn commented Mar 19, 2020

Rationale

In addition to featurizers and datasets, it might be useful if matminer could provide access to models in the same way it downloads datasets.

This would extend the usage of matminer as a general purpose "toolbox" for materials science. It would also make matminer a curated resource which would hold many valuable models otherwise spread out across random GH repos and figshare accounts.

I imagine the majority of these would be graphnet models.

Example of usage

Predicting properties directly

For example:

from matminer.models import retrieve_model

m = retrieve_model("automatminer_KVRH_2020")

Or:

m = retrieve_model("MEGNET_2018_e_form")

Then:

my_predictions = m.predict(target="e_form", inputs=["structure"])

Transfer learning for graphnet models

my_embeddings = m.embeddings(inputs=["structure"])

This would provide a consistent interface for getting embeddings from graphnet models for use in downstream ML tasks, without having to awkwardly go through BaseFeaturizer as the CGCNN featurizer did.
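A minimal sketch of how the proposed interface could fit together, assuming a simple name-to-factory registry. Nothing here exists in matminer: `DummyModel`, `register_model`, and `_MODEL_REGISTRY` are hypothetical names, and the model returns placeholder values instead of running inference.

```python
from typing import Callable, Dict, List


class DummyModel:
    """Hypothetical stand-in for a pretrained model (returns placeholders)."""

    def __init__(self, name: str, targets: List[str]):
        self.name = name
        self.targets = targets

    def predict(self, target: str, inputs: List[object]) -> List[float]:
        if target not in self.targets:
            raise ValueError(f"{self.name} was not trained on {target!r}")
        # A real model would run inference here; this sketch returns zeros.
        return [0.0 for _ in inputs]

    def embeddings(self, inputs: List[object]) -> List[List[float]]:
        # A real graphnet would return its learned representation per input.
        return [[0.0, 0.0, 0.0, 0.0] for _ in inputs]


# Hypothetical registry mapping a model name to a factory that builds it.
_MODEL_REGISTRY: Dict[str, Callable[[], DummyModel]] = {}


def register_model(name: str, factory: Callable[[], DummyModel]) -> None:
    _MODEL_REGISTRY[name] = factory


def retrieve_model(name: str) -> DummyModel:
    """Look up a model by name, mirroring how datasets are fetched by name."""
    try:
        factory = _MODEL_REGISTRY[name]
    except KeyError:
        available = ", ".join(sorted(_MODEL_REGISTRY))
        raise KeyError(f"Unknown model {name!r}; available: {available}") from None
    return factory()


register_model(
    "MEGNET_2018_e_form",
    lambda: DummyModel("MEGNET_2018_e_form", targets=["e_form"]),
)

m = retrieve_model("MEGNET_2018_e_form")
my_predictions = m.predict(target="e_form", inputs=["structure"])
my_embeddings = m.embeddings(inputs=["structure"])
```

Keeping the lookup behind a single function means new models only need a registry entry, and an unknown name fails with a list of what is available.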


Models to add

CLMP

CRABNet

Roost

MEGNet

MODNet

CGCNN

  • code is too difficult to maintain inside a package like matminer as it is not versioned or made into an easily importable package
ardunn changed the title from "Add models" to "Upgrades Living Document - Models" on Jun 7, 2021
ardunn added the "living document" label and removed the "enhancement" label on Jun 8, 2021
@CompRhys

CompRhys commented Jun 8, 2021

I am keen to help make it as easy as possible to use Roost and [model name we're working on] going forward, as this is the last year of my PhD. I believe that standardising on something like matminer, which will hopefully have longer-term support from LBNL, will help, so if there's anything I can help with on this models initiative, let me know. I have a re-implementation of CGCNN in the same repo as Roost with an identical interface that could be used here for free if a standard model API is developed.

https://github.com/gomes-lab/H-CLMP is the link for the CLMP model
https://doi.org/10.1038/s41524-021-00552-2 is the journal DOI for MODNet

@ardunn
Contributor Author

ardunn commented Jun 8, 2021

Thanks @CompRhys, I think Roost will be a great addition to matminer.

The way I am considering implementing this models module is something like this, for each external package we want to use:

- models/
    - roost/
        - tests/
        - main.py
        - requirements.txt

Each model has a directory.

  • main.py holds classes that implement simple methods for making predictions and getting embeddings for each model.
    • something like class RoostModel(BaseModel), which would implement methods for getting predictions, embeddings, and a list of the primitives the model was trained on
    • Multiple fitted models can be inside one class (e.g., RoostModel(pretrained="e_form"), or RoostModel(pretrained="k_vrh"), etc.)
    • the class would also have a method for fetching weights from matminer figshare and putting them into the model, using the pre-existing functions in the matminer.datasets module
  • requirements.txt
    • Holds the pip requirements (or git+<url>@<commit> pins) needed to run the model
  • Tests must be included and can be run in individual CI workflows to avoid weird dependency conflicts between different models
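The class layout described above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing matminer API: `BaseModel`, `RoostModel`, `PRETRAINED`, `load_weights`, and `_fetch_weights` are hypothetical, the weight fetching is stubbed out (a real version would download from the matminer figshare via the matminer.datasets helpers), and predictions are placeholders.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class BaseModel(ABC):
    """Hypothetical common interface for every wrapped external model."""

    # Names of the fitted variants a subclass ships, mapped to their target.
    PRETRAINED: Dict[str, str] = {}

    def __init__(self, pretrained: str):
        if pretrained not in self.PRETRAINED:
            options = ", ".join(sorted(self.PRETRAINED))
            raise ValueError(f"Unknown pretrained model {pretrained!r}; choose from: {options}")
        self.pretrained = pretrained
        self.weights: Any = None

    def load_weights(self) -> None:
        """Fetch weights once and cache them on the instance."""
        if self.weights is None:
            self.weights = self._fetch_weights()

    @abstractmethod
    def _fetch_weights(self) -> Any:
        """Download or load the weights for ``self.pretrained``."""

    @property
    @abstractmethod
    def primitives(self) -> List[str]:
        """Input types the model was trained on (e.g. composition, structure)."""

    @abstractmethod
    def predict(self, inputs: Sequence[Any]) -> List[float]:
        """Return one prediction per input."""

    @abstractmethod
    def embeddings(self, inputs: Sequence[Any]) -> List[List[float]]:
        """Return one learned representation per input."""


class RoostModel(BaseModel):
    """Sketch of a wrapper; several fitted models live in one class."""

    PRETRAINED = {"e_form": "e_form", "k_vrh": "k_vrh"}

    def _fetch_weights(self) -> Any:
        # Stub: stands in for downloading the weights file for this variant.
        return {"variant": self.pretrained}

    @property
    def primitives(self) -> List[str]:
        return ["composition"]  # Roost works from composition alone

    def predict(self, inputs: Sequence[Any]) -> List[float]:
        self.load_weights()
        return [0.0 for _ in inputs]  # placeholder, no real inference

    def embeddings(self, inputs: Sequence[Any]) -> List[List[float]]:
        self.load_weights()
        return [[0.0, 0.0] for _ in inputs]  # placeholder vectors


model = RoostModel(pretrained="e_form")
predictions = model.predict(["Fe2O3", "NaCl"])
```

Because each wrapper lives in its own directory with its own requirements.txt, a class like `RoostModel` can import its external package without affecting the dependencies of the other models.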

@CompRhys

I've been thinking about this a little more, and I think the most effective way to quickly add structure-based graphnets to matminer would be to wrap https://github.com/Open-Catalyst-Project/ocp/tree/master/ocpmodels/models. That way matminer would get access to more models for free as the OCP benchmarking suite expands over time. The only issue I can see is that this neglects MEGNet, but you could have a separate wrapper for their package, or perhaps the MEGNet authors might want to add their model to the OCP benchmark.

Separately, I could look at whether it is possible to standardise Roost, CrabNet and H-CLMP together. As far as I understand, the primary difference between Roost and CrabNet is the choice of message passing function: they use QKV attention, whereas mine mimics the attention mechanism in the GAT paper. The H-CLMP model duplicates a fair chunk of code from Roost, although I am not 100% sure how their model works yet.
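For reference, the two attention mechanisms mentioned above can be written in the generic form given in their original papers. This is not taken from the Roost or CrabNet code, and the symbols (node features h_i, weight matrix W, attention vector a, neighbourhood N_i, key dimension d_k) are the usual textbook ones, assumed here for illustration.

```latex
% QKV (scaled dot-product) attention
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

% GAT-style attention over the neighbours j of node i
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\, W h_i \,\Vert\, W h_j \,\right]\right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \qquad
h_i' = \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\right)
```

In the first form the score between two elements is a dot product of learned query and key projections. In the second it comes from a small learned function applied to the concatenated pair of projected features.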
