
Docs for data container and HPC #179

Merged 1 commit on Jul 29, 2022

Conversation

odunbar (Collaborator) commented Jul 7, 2022

Purpose and Content

Adds two docs pages requested in issue #153, and closes #148.

  • DataContainers page
  • HPC and Parallelism page

Adds an example that demonstrates each of the parallelism types (serial, multithreaded, `pmap`, and distributed `for`):

  • distributed_Lorenz_example.jl
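
For readers unfamiliar with the four parallelism types named above, the following is a minimal sketch of how they differ. It is not taken from `distributed_Lorenz_example.jl`: `toy_G` and the `ensemble_*` functions are hypothetical names, and the toy model stands in for a real forward model. Only `Distributed` and `Threads` from the Julia standard library are used.

```julia
using Distributed  # standard library: pmap, @distributed, @everywhere

# Hypothetical stand-in for a forward model (not the Lorenz model from the PR).
# @everywhere defines it on every process, including any added workers.
@everywhere toy_G(u) = [sum(u), sum(abs2, u)]

# Serial: evaluate one ensemble member at a time.
function ensemble_serial(params)
    return reduce(hcat, [toy_G(params[:, j]) for j in 1:size(params, 2)])
end

# Multithreaded: each iteration writes to its own column, so no locking is needed.
function ensemble_threads(params)
    N_ens = size(params, 2)
    g_ens = zeros(2, N_ens)
    Threads.@threads for j in 1:N_ens
        g_ens[:, j] = toy_G(params[:, j])
    end
    return g_ens
end

# pmap: hand each column to an available process; results come back in input order.
function ensemble_pmap(params)
    cols = [params[:, j] for j in 1:size(params, 2)]
    return reduce(hcat, pmap(toy_G, cols))
end

# Distributed for: split the loop range across processes and reduce with hcat.
function ensemble_distfor(params)
    g_ens = @distributed (hcat) for j in 1:size(params, 2)
        toy_G(params[:, j])
    end
    return g_ens
end

params = [1.0 2.0 3.0; 4.0 5.0 6.0]  # N_params x N_ensemble
g_ref = ensemble_serial(params)
```

Run as written, everything executes on a single process and a single thread. Starting Julia with `-t 4` gives the threaded version more threads, and starting it with `-p 4` (or calling `addprocs` before the `@everywhere` line) gives `pmap` and `@distributed` worker processes to use.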

PR Checklist

  • This PR has a corresponding issue OR is linked to an SDI.
  • I have followed CliMA's codebase contribution and style guidelines OR N/A.
  • I have followed CliMA's documentation policy.
  • I have checked all issues and PRs and I certify that this PR does not duplicate an open PR.
  • Documentation has been added/updated OR N/A.

@odunbar odunbar changed the title [WIP] docs for data container and HPC Docs for data container and HPC Jul 20, 2022
@odunbar odunbar requested a review from haakon-e July 20, 2022 19:47
haakon-e (Member) left a comment

Overall great addition to the docs!

One major suggestion:

  • I don't like that a lot of code is duplicated between the GModel_xx.jl files. I feel like this obscures the actual differences between the parallelism options. Instead, I think the components that are identical should be stored in a common file and loaded into each of _distfor, _multithread, etc.
    We could, for example, call this common file GModel.jl and rename the serial case to, e.g., GModel_serial.jl.

I'm happy to review again once my comments and suggestions are addressed! :)


To provide a consistent form for data (such as observations, parameter ensembles, model evaluations) across the package, we store the data in simple wrappers internally.

Data is always stored as columns of `AbstractMatrix`. That is, we obey the format
Member review comment:

Suggested change
- Data is always stored as columns of `AbstractMatrix`. That is, we obey the format
+ Data is always stored column-wise in an `AbstractMatrix` subtype. That is, we obey the format

Looking at the constructor code, it appears we only enforce that the data is a concrete subtype of `AbstractMatrix`, so the stored data will have whatever concrete type was supplied upon construction.
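
A small sketch may make the point about concrete types easier to see. `ColumnData`, `n_samples`, and `get_sample` are hypothetical names invented for illustration and are not the package's wrapper or API; the sketch only assumes a type parameter bounded by `AbstractMatrix`, as the comment describes.

```julia
using LinearAlgebra  # for the Transpose type name

# Hypothetical wrapper (not the package's type): the field keeps whatever
# concrete AbstractMatrix subtype is passed to the constructor.
struct ColumnData{M <: AbstractMatrix}
    data::M
end

# Column-wise convention: each column is one sample, rows index the data dimension.
n_samples(c::ColumnData) = size(c.data, 2)
get_sample(c::ColumnData, j::Integer) = c.data[:, j]

# A dense 2 x 3 matrix: two data dimensions, three samples.
dense = ColumnData([1.0 2.0 3.0; 4.0 5.0 6.0])

# The same values supplied as a lazy transpose of a 3 x 2 matrix.
lazy = ColumnData(transpose([1.0 4.0; 2.0 5.0; 3.0 6.0]))
```

Here `dense.data` is a `Matrix{Float64}` while `lazy.data` remains a `Transpose` wrapper; nothing is converted to a common storage type, yet both obey the same column-wise access pattern.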

docs/src/data_wrappers.md (4 threads, outdated, resolved)
examples/Lorenz/distributed_Lorenz_example.jl (resolved)
docs/src/parallel_hpc.md (3 threads, outdated, resolved)
examples/Lorenz/GModel_distfor.jl (outdated, resolved)
odunbar (Collaborator, Author) commented Jul 28, 2022

bors try

bors bot added a commit that referenced this pull request Jul 28, 2022
bors bot (Contributor) commented Jul 28, 2022

try

Build succeeded:

docs/src/parallel_hpc.md (outdated, resolved)
haakon-e (Member) left a comment

With the final comments, and squashing, LGTM!

I have not looked in detail at the structure of GModel_common.jl or distributed_Lorenz_example.jl; we can revisit those in a future PR.

Comment on lines +74 to +72
Run the forward model G for an array of parameters by iteratively
calling run_G for each of the N_ensemble parameter values.
Return g_ens, an array of size N_data x N_ensemble, where
g_ens[:,j] = G(params[:,j])
Member review comment:

This isn't exactly correct in the current implementation, but I'm happy to leave this for a future PR that cleans up the runscript
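
For reference, the behaviour the quoted docstring describes can be sketched as below. This is an illustration of the stated contract only, not the code in the PR (which, per the comment above, differs): the function names are borrowed from the docstring, but the signatures are simplified assumptions and `run_G` here is a hypothetical toy model.

```julia
# Hypothetical forward model: maps one parameter vector to N_data = 2 outputs.
run_G(u::AbstractVector) = [sum(u), prod(u)]

# Serial ensemble evaluation matching the docstring's contract:
# g_ens has size N_data x N_ensemble and g_ens[:, j] = G(params[:, j]).
function run_G_ensemble(params::AbstractMatrix)
    N_ensemble = size(params, 2)
    g_first = run_G(params[:, 1])  # evaluate once to learn N_data
    g_ens = zeros(eltype(g_first), length(g_first), N_ensemble)
    g_ens[:, 1] = g_first
    for j in 2:N_ensemble
        g_ens[:, j] = run_G(params[:, j])
    end
    return g_ens
end

params = [1.0 2.0; 3.0 4.0]  # N_params x N_ensemble
g_ens = run_G_ensemble(params)
```

With this layout, column `j` of `g_ens` always corresponds to column `j` of `params`, which is the property a cleaned-up docstring would need to state for whichever implementation is kept.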

docs/src/parallel_hpc.md (outdated, resolved)
parallel & hpc page

added lorenz examples to supplement docs

examples/Lorenz/distributed_Lorenz_example.jl

format

docs changes from review

typo

removed comment

remove serial case, add else case

format

refactor distributed Lorenz example

format

adjusted docs to suit new code

remove ...

review comments

consistent structure for GModel

fixed broken link

links to GModels

typo
odunbar (Collaborator, Author) commented Jul 29, 2022

bors r+

bors bot (Contributor) commented Jul 29, 2022

Build succeeded:

@bors bors bot merged commit e30f2f9 into main Jul 29, 2022
@bors bors bot deleted the orad/docs-HPC branch July 29, 2022 18:35
Development

Successfully merging this pull request may close these issues.

Small question: parallellization
2 participants