Add some elementary EDA to interactive dataset pages #126

Merged: 3 commits, Aug 12, 2022


@janosh (Member) commented Apr 8, 2022

On all the dataset details pages, we could show some interactive Plotly plots.

Taking Matbench Dielectrics as an example,

https://ml.materialsproject.org/projects/matbench_dielectric

here are some ideas for plots that would render below the table on each dataset page:

Matbench Dielectric EDA

[Figures: matbench-dielectric-elements, dielectric-spacegroup-sunburst, dielectric-violin, dielectric-scatter]
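The spacegroup sunburst groups structures first by crystal system (inner ring) and then by spacegroup number (outer ring). A minimal stdlib sketch of that aggregation, assuming spacegroup numbers (1-230) have already been computed for each structure (for instance with pymatgen's `SpacegroupAnalyzer(structure).get_space_group_number()`, not shown). `crystal_system` and `sunburst_counts` are hypothetical helper names, not code from the notebook:

```python
from collections import Counter

# Inclusive upper spacegroup-number bound of each crystal system, following
# the International Tables for Crystallography numbering (1-230).
_CRYSTAL_SYSTEM_BOUNDS = (
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
)


def crystal_system(spg_num: int) -> str:
    """Map an international spacegroup number (1-230) to its crystal system."""
    if not 1 <= spg_num <= 230:
        raise ValueError(f"invalid spacegroup number {spg_num}")
    for upper, system in _CRYSTAL_SYSTEM_BOUNDS:
        if spg_num <= upper:
            return system
    raise AssertionError("unreachable: bounds cover 1-230")


def sunburst_counts(spg_nums):
    """Count structures per (crystal system, spacegroup number) pair.

    This is the two-level hierarchy a sunburst plot needs:
    inner ring = crystal system, outer ring = spacegroup number.
    """
    return Counter((crystal_system(n), n) for n in spg_nums)
```

For example, `sunburst_counts([225, 225, 62, 1])` counts two cubic structures in spacegroup 225 and one each in orthorhombic 62 and triclinic 1.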

Code

Here's a Jupyter notebook to generate those plots. Doing the same for the other datasets would mostly just mean swapping out the argument to load_dataset and skipping some plots for composition-only tasks.
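Deciding which plots to make per dataset could look roughly like the sketch below. It only plans the figure files; the loading (e.g. matminer's `load_dataset(name)`) and plotting calls are left out. The structure-based vs. composition-only split follows the Matbench v0.1 task list, while the plot names, the `{dataset}_{plot}.html` naming scheme, the helper `planned_figures` and the assumption that only the spacegroup sunburst needs crystal structures are hypothetical choices for illustration:

```python
# Matbench v0.1 tasks whose inputs are compositions only (no crystal structure).
COMPOSITION_ONLY = frozenset({
    "matbench_expt_gap",
    "matbench_expt_is_metal",
    "matbench_glass",
    "matbench_steels",
})
# Matbench v0.1 tasks whose inputs are crystal structures.
STRUCTURE_BASED = frozenset({
    "matbench_dielectric",
    "matbench_jdft2d",
    "matbench_log_gvrh",
    "matbench_log_kvrh",
    "matbench_mp_e_form",
    "matbench_mp_gap",
    "matbench_mp_is_metal",
    "matbench_perovskites",
    "matbench_phonons",
})

# Hypothetical plot names; only the sunburst is assumed to require structures.
ALL_PLOTS = ("elements", "spacegroup_sunburst", "violin", "scatter")
STRUCTURE_ONLY_PLOTS = frozenset({"spacegroup_sunburst"})


def planned_figures(dataset: str) -> list[str]:
    """Return the HTML file names to generate for one Matbench dataset."""
    if dataset not in COMPOSITION_ONLY | STRUCTURE_BASED:
        raise KeyError(f"unknown dataset {dataset!r}")
    plots = [
        plot
        for plot in ALL_PLOTS
        # composition-only tasks skip plots that need crystal structures
        if dataset in STRUCTURE_BASED or plot not in STRUCTURE_ONLY_PLOTS
    ]
    return [f"{dataset}_{plot}.html" for plot in plots]
```

Looping `planned_figures(name)` over `sorted(COMPOSITION_ONLY | STRUCTURE_BASED)` would then cover all 13 tasks, with the sunburst omitted for the four composition-only ones.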

@ardunn I'm not sure where this code would live exactly if you think it's worth adding. Would it run as part of rebuild_docs.py? Some guidance there would be great!

@ardunn (Collaborator) commented May 19, 2022

@janosh woah. this is, like, super cool.

I think directly adding them to MPcontribs as something that would be dynamically generated on MP's side is opening a can of worms (i.e., we would have to pester @tschaume to maintain and update this code, which is something I don't think he wants to do).

I like the idea of adding it to rebuild_docs which would generate static HTML iframes which we can display (by reference) on the matbench website as well as on the MP ML page and MP contribs if MP people so desire.
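A pre-rendered figure could then be embedded by reference with an `<iframe>` pointing at its static HTML file. This is a minimal sketch, not code from the PR; the helper name `iframe_tag`, the example `static/` path and the default height are hypothetical:

```python
from html import escape


def iframe_tag(src: str, title: str, height_px: int = 500) -> str:
    """Build an <iframe> that embeds a standalone Plotly HTML file by URL.

    Attribute values are HTML-escaped so quotes or '&' in a path or title
    can't break the surrounding markup.
    """
    return (
        f'<iframe src="{escape(src, quote=True)}" title="{escape(title, quote=True)}" '
        f'width="100%" height="{int(height_px)}" style="border: none;"></iframe>'
    )
```

For example, `iframe_tag("static/matbench_dielectric_violin.html", "Dielectric violin")` yields a full-width, 500 px tall frame. Since the figure lives in its own file, the same HTML can be referenced from the Matbench docs, the MP ML page or MPContribs without duplicating it.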

I can merge this in and then add the ipynb code to rebuild_docs if that would be easiest. I am having some trouble figuring out what the code changes in the 36 changed files are, though. Is it just an artifact from different linting?

@janosh (Member, Author) commented May 20, 2022

> I can merge this in and then add the ipynb code to rebuild_docs if that would be easiest.

Good to know there's interest! I'm happy to do that and save you some trouble.

> I am having some trouble figuring out what the code changes in the 36 changed files are, though. Is it just an artifact from different linting?

My bad, the many changed files are the result of running a few CLI commands that apply auto-fixes like codespell, isort and pyupgrade. I'll split those off into a separate PR.

@janosh mentioned this pull request May 20, 2022

@janosh (Member, Author) commented May 20, 2022

@ardunn Alright, here's a rough draft for a function generate_plotly_eda_figs() that creates the above plots for all matbench datasets and saves them in docs_src/static.

I saved all figures as HTML with include_plotlyjs="cdn" to keep the file size down. Does that work for how they'll be deployed?

Also, to speed up regenerating these plots, generate_plotly_eda_figs() caches crystal_system and spacegroup numbers for all datasets to bz2-compressed dataframes in scripts/artifacts. I included them in this PR, but I'm happy to gitignore them if you think they'll pollute the git history.
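The caching pattern (compute the symmetry info once, reuse it on later runs) can be sketched with the standard library alone. The PR caches bz2-compressed dataframes; this sketch swaps in bz2-compressed JSON so it needs no pandas, and the helper `load_or_compute`, the cache-file layout and the `compute` callback are hypothetical. With pandas, `DataFrame.to_pickle` and `pd.read_pickle` can infer bz2 compression from a `.bz2` file extension and would play the same role:

```python
import bz2
import json
from pathlib import Path
from typing import Callable


def load_or_compute(
    cache_file: Path, compute: Callable[[], dict[str, int]]
) -> dict[str, int]:
    """Return cached results from a bz2-compressed JSON file, computing on a miss.

    `compute` should return e.g. {material_id: spacegroup_number}. It is only
    called when the cache file does not exist yet.
    """
    if cache_file.exists():
        # Cache hit: decompress and parse the stored mapping.
        with bz2.open(cache_file, "rt", encoding="utf-8") as f:
            return json.load(f)
    # Cache miss: do the (slow) computation once and store it compressed.
    result = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with bz2.open(cache_file, "wt", encoding="utf-8") as f:
        json.dump(result, f)
    return result
```

Deleting the cache file forces a recompute, which keeps the cache easy to invalidate whether or not the artifacts are committed to git.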

@janosh (Member, Author) commented May 20, 2022

Btw, I call generate_plotly_eda_figs() a rough draft since we still need to tweak the title casing and placement here and there. I also just found a bug on this line:

target = df.columns[2] # 3rd col doesn't always give the target
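One way to stop relying on column position is to pick the target as the only column that is neither an input (`structure`/`composition`) nor a cached derived column. A hedged sketch that works on a plain list of column names (e.g. `list(df.columns)`); the helper `find_target_column` and the set of non-target column names are assumptions, not the fix that went into the PR:

```python
from collections.abc import Iterable

# Assumed input and derived (cached) column names that are never the target.
NON_TARGET_COLS = frozenset({"structure", "composition", "crystal_system", "spacegroup"})


def find_target_column(
    columns: Iterable[str], non_target: frozenset[str] = NON_TARGET_COLS
) -> str:
    """Return the single column name that isn't an input or derived feature."""
    candidates = [col for col in columns if col not in non_target]
    if len(candidates) != 1:
        # Fail loudly instead of silently plotting the wrong column.
        raise ValueError(f"expected exactly one target column, got {candidates}")
    return candidates[0]
```

Raising on zero or several candidates surfaces datasets whose columns don't fit the assumption, rather than mislabeling a plot.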

Title is off

[Figures: matbench_log_gvrh_spacegroup_sunburst, matbench_perovskites_spacegroup_sunburst, matbench_mp_e_form_elements, matbench_steels_elements, matbench_glass_elements]

@janosh mentioned this pull request Jul 7, 2022

@janosh (Member, Author) commented Jul 26, 2022

@mkhorton suggested we could put these plots on https://materialsproject.org/ml (which apparently has just been a placeholder so far) and make them interactive using dash/crystal-toolkit.

@ardunn (Collaborator) commented Jul 27, 2022

Yes, that sounds great! Right now there seem to be some conflicts (I tried to fix them but I may have messed something up lol)

@janosh (Member, Author) commented Jul 27, 2022

Oops, did you already solve the conflicts in e8b88e6? If so, I'll undo my merge.

I wasn't quite sure if I'd merged everything correctly anyway. It was 2 months ago, so I can't quite remember which parts I wrote.

@ardunn (Collaborator) commented Jul 27, 2022

@janosh I think I tried but something got messed up. I think maybe the easiest thing is to undo the merge (without re-linting afterwards; I think that was the source of many conflicts).

Edit: Oops, didn't mean to close this.

@ardunn closed this Jul 27, 2022
@ardunn reopened this Jul 27, 2022

@janosh (Member, Author) commented Jul 31, 2022

I undid the merge. Now scripts/rebuild_docs.py conflicts. I'll resolve that manually without merging upstream/main into my main. Is that what you meant?

@ardunn (Collaborator) commented Aug 12, 2022

> I undid the merge. Now scripts/rebuild_docs.py conflicts. I'll resolve that manually without merging upstream/main into my main. Is that what you meant?

Sure, that seems fine to me!

@janosh (Member, Author) commented Aug 12, 2022

> Sure, that seems fine to me!

That part is done, so I think this is good to go. The CI errors seem to be the issue described in hackingmaterials/matminer#840 and not related to my changes.

@ardunn merged commit c2df9fc into materialsproject:main on Aug 12, 2022

@sgbaird (Contributor) commented Aug 13, 2022

pymatviz for the win! Nice @janosh! Excited to see when these appear on the Matbench website

@janosh (Member, Author) commented Aug 13, 2022

> pymatviz for the win! Nice @janosh! Excited to see when these appear on the Matbench website

Me too. 😄 @ardunn Let me know if I can help with that in any way.


3 participants