Complete redesign of "Observations" object enabling introduction of minibatching #384

Merged
merged 32 commits into from
Jul 31, 2024

Conversation

odunbar
Collaborator

@odunbar odunbar commented Jun 17, 2024

Purpose

Closes #382
Closes #383
Closes #385
Improves some of the getters in #386

Summarized by the docs here
and the API docs here

Content

  • Remove the unnecessary module wrapping the Observations object, and replace the old and largely useless Observation object.
  • Create new Observation, ObservationSeries and Minibatcher objects that store the y's and Gamma's, together with a minibatching framework that batches up epochs over them.
  • EKP now always stores an ObservationSeries internally, rather than y and \Gamma separately.
  • Observations are now accessed through getter functions that pull the stacked y and blocked \Gamma from the ObservationSeries object for the current minibatch.
  • update_minibatch! is called at the end of each EKP step to update the batch. At the end of an epoch, it also calls create_new_epoch! so that the minibatcher creates a new epoch of minibatches.
  • Compatible with EKI, ETKI, EKS and UKI, as Observation stores the inverse observation noise covariance too.
  • Learning rate schedulers are compatible with minibatching.
  • Backward-compatible with the old interface of passing y and Gamma to EKP.
  • Changes the examples that used the old Observation object.
  • Unit tests for all new structs.
  • Added a docs page with small examples.
  • API docs and docstrings.
  • Resolves the ETKI bug with the timestepper, and ensures scaling is preserved in the new update.

  • I have read and checked the items on the review checklist.
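
For readers skimming this thread, the minibatching idea described above can be sketched roughly as follows. This is a toy illustration, not the package's implementation: `ToyObservation`, `ToySeries`, `new_epoch`, `advance!`, `stacked_obs` and `blocked_cov` are hypothetical names invented here, and the random shuffling of indices is an assumption about how an epoch might be built. It only shows the general pattern of cycling through minibatches, starting a new epoch when one is exhausted, and assembling a stacked y and block-diagonal \Gamma for the current minibatch.

```julia
using LinearAlgebra, Random

# Hypothetical stand-ins for illustration only; NOT the package's types.
struct ToyObservation
    y::Vector{Float64}   # one observation sample
    Γ::Matrix{Float64}   # its noise covariance
end

mutable struct ToySeries
    observations::Vector{ToyObservation}
    batch_size::Int
    epoch::Vector{Vector{Int}}  # minibatches (index lists) still unused in this epoch
    current::Vector{Int}        # indices of the current minibatch
end

# Shuffle all observation indices and split them into minibatches (assumed scheme).
function new_epoch(n::Int, batch_size::Int, rng::AbstractRNG)
    idx = shuffle(rng, 1:n)
    return [idx[i:min(i + batch_size - 1, n)] for i in 1:batch_size:n]
end

function ToySeries(observations, batch_size; rng = Random.default_rng())
    epoch = new_epoch(length(observations), batch_size, rng)
    current = popfirst!(epoch)
    return ToySeries(observations, batch_size, epoch, current)
end

# Move to the next minibatch; build a fresh epoch once the current one is used up.
function advance!(s::ToySeries; rng = Random.default_rng())
    if isempty(s.epoch)
        s.epoch = new_epoch(length(s.observations), s.batch_size, rng)
    end
    s.current = popfirst!(s.epoch)
    return s
end

# Stack the y's of the current minibatch into a single vector.
stacked_obs(s::ToySeries) = reduce(vcat, [s.observations[i].y for i in s.current])

# Assemble the block-diagonal noise covariance of the current minibatch.
function blocked_cov(s::ToySeries)
    blocks = [s.observations[i].Γ for i in s.current]
    n = sum(size(B, 1) for B in blocks)
    Γ = zeros(n, n)
    offset = 0
    for B in blocks
        k = size(B, 1)
        Γ[offset+1:offset+k, offset+1:offset+k] .= B
        offset += k
    end
    return Γ
end

# Usage: five 2-dimensional observations in minibatches of two.
obs = [ToyObservation(fill(Float64(i), 2), Matrix(0.1I, 2, 2)) for i in 1:5]
series = ToySeries(obs, 2; rng = MersenneTwister(1))
y_batch = stacked_obs(series)   # length 4: two stacked samples
Γ_batch = blocked_cov(series)   # 4×4 block-diagonal matrix
```

In this sketch an epoch of five observations with batch size two yields three minibatches (of sizes 2, 2 and 1), and a fourth call to `advance!` would trigger a new shuffled epoch.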

@odunbar odunbar changed the title from 'Clean-up "Observations" object and introducing minibatching' to '[WIP] Clean-up "Observations" object and introducing minibatching' Jun 17, 2024
removed Observations Module

format

Redesign of Observation

Observation tests and format

tests for minibatchers

ObservationSeries tested

build=true default for get_obs and get_obs_noise_cov

interface for EKP

add some more convenience functions for ObservationSeries

test no_minibatching setup

updated examples with MB

UKI constructor

remove build-bug where obs_noise_cov append flattens array

typo

format

add vec

typo

added Dict to construct ObservationSeries, and added == operations

added storage of observation inverses
@odunbar odunbar changed the title from '[WIP] Clean-up "Observations" object and introducing minibatching' to '[WIP] Complete redesign of "Observations" object enabling introduction of minibatching' Jun 26, 2024
@odunbar odunbar changed the title from '[WIP] Complete redesign of "Observations" object enabling introduction of minibatching' to 'Complete redesign of "Observations" object enabling introduction of minibatching' Jul 11, 2024
Contributor

@eviatarbach eviatarbach left a comment

This is excellent, thank you for all the work on this!

I made a few comments in the code. Besides these comments, I was wondering why you removed the multiple samples of y in Cloudy_example_eki.jl and aerosol_activation.jl? It seems like it would be useful to have an example with multiple samples, especially now that they can be handled better.

X = FT.((u .- mean(u, dims = 2)) / sqrt(m - 1))
Y = FT.((g .- mean(g, dims = 2)) / sqrt(m - 1))
Ω = inv(I + Y' * Γ_inv * Y)
w = FT.(Ω * Y' * Γ_inv * (y .- mean(g, dims = 2)))
tmp = get_buffer(get_process(ekp)) # the buffer stores Y' * Γ_inv of [size(Y,2),size(Y,1)]
Contributor

This whole section of code is quite difficult to read and verbose. Any way it can be simplified?

Member

+1. If it can't be simplified, some additional comments would also be helpful (and commented-out code snippets removed).

Collaborator Author

@odunbar odunbar Jul 26, 2024

I managed to allocate all the buffers up front, which let me remove that logic from the computation. I think it looks much cleaner now!
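
The updated code is not shown in this thread, so as a rough illustration only: allocating buffers up front for the weight computation quoted above might look something like the sketch below. `ToyBuffers`, `allocate_buffers` and `etki_weights!` are hypothetical names, not the package's API. The sketch also swaps the explicit `inv` for a linear solve with `\`, which is a substitution made here rather than something taken from the PR, and it only reuses buffers for the two matrix products (the anomaly matrix `Y` is still allocated on each call).

```julia
using LinearAlgebra, Statistics

# Hypothetical buffer container, not the package's implementation.
struct ToyBuffers
    tmp::Matrix{Float64}    # holds Y' * Γ_inv, size (m, d)
    Ωinv::Matrix{Float64}   # holds I + Y' * Γ_inv * Y, size (m, m)
end

# Allocate once, for output dimension d and ensemble size m.
allocate_buffers(d::Int, m::Int) = ToyBuffers(zeros(m, d), zeros(m, m))

# Compute w = Ω * Y' * Γ_inv * (y - mean(g)) with Ω = inv(I + Y' * Γ_inv * Y),
# writing the intermediate products into the preallocated buffers.
function etki_weights!(buf::ToyBuffers, g::AbstractMatrix, y::AbstractVector, Γ_inv::AbstractMatrix)
    m = size(g, 2)
    g_mean = vec(mean(g, dims = 2))
    Y = (g .- g_mean) ./ sqrt(m - 1)   # scaled output anomalies, size (d, m)
    mul!(buf.tmp, Y', Γ_inv)           # tmp  = Y' * Γ_inv
    mul!(buf.Ωinv, buf.tmp, Y)         # Ωinv = Y' * Γ_inv * Y
    for i in 1:m
        buf.Ωinv[i, i] += 1.0          # Ωinv = I + Y' * Γ_inv * Y
    end
    # Solve Ωinv * w = tmp * (y - g_mean) instead of forming the inverse explicitly.
    return buf.Ωinv \ (buf.tmp * (y .- g_mean))
end

# Usage with a small made-up ensemble: d = 3 outputs, m = 4 members.
d, m = 3, 4
g = [1.0 2.0 0.5 1.5; 0.0 1.0 2.0 1.0; 2.0 0.0 1.0 3.0]
y = [1.0, 1.0, 1.0]
Γ_inv = Matrix(2.0I, d, d)
buf = allocate_buffers(d, m)
w = etki_weights!(buf, g, y, Γ_inv)
```

The same `buf` can be passed to every subsequent call as long as the output dimension and ensemble size stay fixed; with minibatches of varying size the buffers would need to be sized per batch, which this sketch does not attempt.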

Member

@costachris costachris left a comment

Thanks for addressing the comments - Looks good to me

@odunbar
Collaborator Author

odunbar commented Jul 30, 2024

@eviatarbach In response to your comment on the examples: previously we were not actually using more than one sample, even though we gathered many, so what was happening in those examples did not really make sense anyway. My replacement does not affect the functionality. You are right, however, that we could include multiple samples in the examples; perhaps this can be left to a future PR?

@eviatarbach
Contributor

LGTM! Thank you.

@odunbar odunbar merged commit dab5aff into main Jul 31, 2024
11 of 12 checks passed
3 participants