Implement Batching for input and output groups (i.e. Domain localization and observation localization) #377

Open
Tracked by #374
odunbar opened this issue May 16, 2024 · 0 comments · May be fixed by #380

@odunbar
Collaborator

odunbar commented May 16, 2024

Purpose

To reduce computational complexity by blocking the dependence between parameters and data. The blocking would break the update of all parameters into several smaller updates, one per partition of the parameter space. All of this would take place within a single iteration of EKP.
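To make the blocking concrete, here is a minimal sketch assuming the standard perturbed-observation ensemble Kalman inversion (EKI) update. For a group with parameter indices I and data indices J, each member j is updated as u_j[I] ← u_j[I] + C^{u_I g_J} (C^{g_J g_J} + Γ_JJ)^{-1} (y_j[J] − g_j[J]), so only |J| × |J| systems are solved. The helper `eki_block_update` and the toy linear forward map are hypothetical and not part of EnsembleKalmanProcesses.jl.

```julia
using LinearAlgebra, Statistics, Random

"""
    eki_block_update(u, g, y, Γ, u_idx, y_idx; rng)

Hypothetical helper: one perturbed-observation EKI update of the parameter
rows `u_idx`, using only the observation rows `y_idx`.
`u` is (p × N), `g` is (d × N), `y` has length d, `Γ` is (d × d).
"""
function eki_block_update(u, g, y, Γ, u_idx, y_idx; rng = Random.default_rng())
    uI = u[u_idx, :]
    gJ = g[y_idx, :]
    ΓJ = Γ[y_idx, y_idx]
    N = size(u, 2)
    # ensemble anomalies (deviations from the ensemble mean)
    du = uI .- mean(uI, dims = 2)
    dg = gJ .- mean(gJ, dims = 2)
    Cug = du * dg' / (N - 1)   # cross-covariance, |u_idx| × |y_idx|
    Cgg = dg * dg' / (N - 1)   # output covariance, |y_idx| × |y_idx|
    # perturbed observations, one column per ensemble member
    noise = cholesky(Symmetric(ΓJ)).L * randn(rng, length(y_idx), N)
    yJ = y[y_idx] .+ noise
    return uI + Cug * ((Cgg + ΓJ) \ (yJ - gJ))
end

# toy usage: two parameter groups updated within one iteration
rng = MersenneTwister(1)
p, d, N = 4, 6, 50
u = randn(rng, p, N)
A = randn(rng, d, p)          # hypothetical linear forward map
g = A * u                     # forward evaluations, computed once for all groups
y = randn(rng, d)
Γ = Matrix(0.1I, d, d)
groups = [(1:2, 1:4), (3:4, 3:6)]  # (u_subset, y_subset); the y_subsets overlap
u_new = similar(u)
for (ui, yi) in groups
    u_new[ui, :] = eki_block_update(u, g, y, Γ, ui, yi; rng = rng)
end
```

Note that `g` is evaluated once before the loop; each group only reads the rows it needs.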

Example

Prototype a user interface to pass:

  • parameter partitioning (e.g. via parameter names); NB we may wish to enforce that no parameter can appear in two different groups
  • data selection (not necessarily a partition, as data can be reused to update different parameter groups)
  • possibly different EK processes, localizers, inflation, etc. for the different update stages (all within the same iteration)
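One way to prototype partitioning by name is sketched below, assuming each named parameter occupies a contiguous block of the stacked parameter vector. The helpers `name_to_indices` and `groups_from_names`, and the names "ocean", "atmos" and "coupling", are hypothetical illustrations, not existing EKP API.

```julia
"""
Hypothetical helper: map named parameter blocks to index ranges in the stacked
parameter vector, e.g. ["a" => 2, "b" => 1] -> Dict("a" => 1:2, "b" => 3:3).
"""
function name_to_indices(name_dims::Vector{Pair{String, Int}})
    out = Dict{String, UnitRange{Int}}()
    start = 1
    for (name, dim) in name_dims
        out[name] = start:(start + dim - 1)
        start += dim
    end
    return out
end

"""
Hypothetical helper: turn groups of parameter names into index vectors,
throwing an error if any name is assigned to more than one group.
"""
function groups_from_names(name_dims, name_groups::Vector{Vector{String}})
    idx = name_to_indices(name_dims)
    seen = Set{String}()
    u_subsets = Vector{Vector{Int}}()
    for grp in name_groups
        for name in grp
            name in seen && error("parameter $name appears in more than one group")
            push!(seen, name)
        end
        push!(u_subsets, reduce(vcat, [collect(idx[n]) for n in grp]))
    end
    return u_subsets
end

name_dims = ["ocean" => 2, "atmos" => 3, "coupling" => 1]
u_subsets = groups_from_names(name_dims, [["ocean", "coupling"], ["atmos"]])
# u_subsets == [[1, 2, 6], [3, 4, 5]]
```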

Simultaneously, we should be thinking about how this interface could be used in a mini-batching setting, where different partial observations are used in different EKP iterations.
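For the mini-batching case, a schedule of observation subsets could be generated up front, with each subset acting as the y_subset for one EKP iteration. The sketch below assumes uniform random batches drawn without replacement within each epoch; `minibatch_schedule` is a hypothetical helper, and other batching strategies are equally plausible.

```julia
using Random

"""
Hypothetical helper: split observation indices 1:d into shuffled mini-batches,
once per epoch; each batch would be the y_subset used in one EKP iteration.
"""
function minibatch_schedule(d::Int, batch_size::Int, n_epochs::Int; rng = Random.default_rng())
    batches = Vector{Vector{Int}}()
    for _ in 1:n_epochs
        perm = randperm(rng, d)
        for start in 1:batch_size:d
            push!(batches, perm[start:min(start + batch_size - 1, d)])
        end
    end
    return batches
end

batches = minibatch_schedule(10, 4, 2; rng = MersenneTwister(0))
for (iter, y_subset) in enumerate(batches)
    # an EKP iteration would update the parameters using only y[y_subset]
    println("iteration $iter uses observations ", sort(y_subset))
end
```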

Discussed implementation

  • Create an array of structs defining conditionally dependent parameters and data, passed into the EKP constructor as update_groups, in addition to the current arguments
struct UpdateGroup{VV <: AbstractVector}
    u_subset::VV # indices of the parameters updated by this group
    y_subset::VV # indices of the data used to update them
    # process::Process # in future
    # localizer::Localizer # in future
    # inflation::Inflation # in future
end
  • check that the u_subsets together form a partition of 1:p, where p is the parameter dimension
  • if update_groups is not provided, create a single UpdateGroup whose u_subset and y_subset are the full index ranges
  • in the update_ensemble!(...) function in EnsembleKalmanProcess.jl, replace
u = update_ensemble!(ekp, ...)

with

u = zeros(param_size, ens_size) # updated ensemble, filled in block by block
for group in update_groups # parallelizable!
    idx = group.u_subset
    u[idx, :] = update_ensemble!(ekp, group, ...) # updates only this group's parameters
end
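The steps above (struct, partition check, default group, blocked loop) can be put together in a self-contained sketch. The per-group update is passed in as a function, because the real per-group `update_ensemble!` method does not exist yet; `check_partition`, `default_update_groups`, `blocked_update` and `toy_update` are hypothetical names chosen for illustration.

```julia
struct UpdateGroup{VV <: AbstractVector}
    u_subset::VV  # parameter indices updated by this group
    y_subset::VV  # observation indices used for this group's update
end

"""Throw unless the u_subsets are disjoint and together cover 1:p."""
function check_partition(groups::AbstractVector{<:UpdateGroup}, p::Int)
    all_idx = reduce(vcat, [collect(grp.u_subset) for grp in groups])
    sort(all_idx) == collect(1:p) ||
        throw(ArgumentError("u_subsets must form a partition of 1:$p"))
    return nothing
end

"""Default when no groups are given: one group using all parameters and all data."""
default_update_groups(p::Int, d::Int) = [UpdateGroup(collect(1:p), collect(1:d))]

"""
Assemble the new ensemble group by group. `group_update(u, group)` is a
hypothetical per-group update returning the new rows u[group.u_subset, :].
"""
function blocked_update(u::AbstractMatrix, groups, group_update)
    check_partition(groups, size(u, 1))
    u_new = similar(u)
    for group in groups  # groups write disjoint rows, so this loop could be parallelized
        u_new[group.u_subset, :] = group_update(u, group)
    end
    return u_new
end

u = reshape(collect(1.0:8.0), 4, 2)   # p = 4 parameters, N = 2 members
groups = [UpdateGroup([1, 3], [1]), UpdateGroup([2, 4], [1, 2])]
# toy per-group update: shift by the number of observations the group uses
toy_update(u, group) = u[group.u_subset, :] .+ length(group.y_subset)
u_new = blocked_update(u, groups, toy_update)
```

Because the partition check runs before any group is updated, every row of `u_new` is guaranteed to be written exactly once.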

Possible hurdles

  • move push!(g, ...) out of the inner update_ensemble! call, so that it is done once outside the per-group updates
  • processes that use priors, e.g. UKI/EKS (for example, we would need to subset process.cov)
  • UKI generating its initial ensemble
  • do we calculate the parameter dimension from a fixed quantity anywhere, or is it all deduced from the values passed in?
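For processes that carry a Gaussian prior, subsetting amounts to taking the marginal over the group's parameters: the marginal of N(m, C) over indices I is N(m[I], C[I, I]). A minimal sketch, assuming the prior is available as a plain mean vector and covariance matrix (the helper `subset_prior` is hypothetical):

```julia
"""
Hypothetical helper: the marginal of a Gaussian prior N(m, C) over the
parameter indices `idx` is N(m[idx], C[idx, idx]).
"""
subset_prior(m::AbstractVector, C::AbstractMatrix, idx) = (m[idx], C[idx, idx])

m = [0.0, 1.0, 2.0]
C = [1.0 0.5 0.0; 0.5 2.0 0.3; 0.0 0.3 3.0]
m_sub, C_sub = subset_prior(m, C, [1, 3])
# m_sub == [0.0, 2.0]; C_sub == [1.0 0.0; 0.0 3.0]
```

One consequence worth noting: each group only sees its own diagonal block, so prior cross-covariances between groups (here C[1, 2] = 0.5) play no role in the blocked update.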

Example problem

  • a coupled system without timescale separation, with data
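As a rough illustration of such a problem, the sketch below assumes a toy discrete-time linear system with two components evolving on the same timescale and coupled with strength c. Parameter a drives x and b drives z; the function `coupled_forward` and all values are hypothetical. With c ≠ 0 each observed series depends on both parameters, so assigning the x data to a and the z data to b is an approximation that the blocked update would be testing.

```julia
"""
Hypothetical toy forward map: two components with equal timescales and
coupling strength c. Parameters u = [a, b]; returns the stacked
observations [x_1, ..., x_T, z_1, ..., z_T].
"""
function coupled_forward(u::AbstractVector; c = 0.1, T = 5, x0 = 1.0, z0 = 1.0)
    a, b = u
    xs, zs = zeros(T), zeros(T)
    x, z = x0, z0
    for t in 1:T
        # simultaneous update: both right-hand sides use the previous x and z
        x, z = a * x + c * z, b * z + c * x
        xs[t], zs[t] = x, z
    end
    return vcat(xs, zs)
end

y = coupled_forward([0.9, 0.8])   # synthetic noise-free data at a = 0.9, b = 0.8
x_obs, z_obs = 1:5, 6:10
# one possible grouping: a updated from the x series, b from the z series
groups = [([1], collect(x_obs)), ([2], collect(z_obs))]
```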