
EKP for general likelihoods? #262

Closed
sdwfrost opened this issue Mar 17, 2023 · 6 comments

Comments

@sdwfrost

Is it straightforward to implement the algorithm described in https://www.sciencedirect.com/science/article/pii/S0167715222000967?

@odunbar
Collaborator

odunbar commented Mar 17, 2023

I've had some discussion with @eviatarbach about this.

The key point is that, in the notation of the linked paper, they view the forward map H as deterministic.

In our code, the EKP forward map G can be nonlinear and/or non-deterministic (and non-Gaussian). For example, in the paper's notation, you could view our forward map G as an evaluation theta -> p_D(z|H(theta)), where D is some unknown distribution and H is an unknown deterministic map.

If you do so, with perfect observation y = z (i.e. p(y|z) a delta function), then you obtain exactly Algorithm 1 in the paper. (NB: we don't yet have the timestepping implementation for h.)

The only non-default option you therefore require, if you want to do something like this, is the ability to observe the non-deterministic map perfectly.

For this we have a keyword argument deterministic_forward_map::Bool = true in update_ensemble! (at least for the EKI Inversion() update). If

  1. true: the form p(y|G(x)) is assumed Gaussian with the given noise covariance obs_noise_cov;
  2. false: the form p(y|G(x)) is assumed to be a delta function, y = G(x).
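To make the distinction concrete, here is a minimal pure-Python sketch of a single EKI update for a scalar parameter. This is illustrative only, not the EKP implementation: the helper name eki_update, its signature, and the toy problem are all assumptions. The flag mirrors deterministic_forward_map: when true, each ensemble member sees a noise-perturbed copy of the observation; when false, the observation is used unperturbed.

```python
import random
import statistics

def eki_update(thetas, g, y, obs_noise_var, deterministic_forward_map=True, rng=None):
    """One EKI step for a scalar parameter theta and scalar data y.
    deterministic_forward_map=True: Gaussian likelihood, so each member's
    copy of the observation is perturbed with noise of variance obs_noise_var.
    deterministic_forward_map=False: delta likelihood y = G(theta), so the
    observation is used unperturbed."""
    rng = rng or random.Random(0)
    gs = [g(t) for t in thetas]
    m_t = statistics.fmean(thetas)
    m_g = statistics.fmean(gs)
    # Sample (co)variances between parameters and forward-map outputs.
    cov_tg = statistics.fmean((t - m_t) * (gi - m_g) for t, gi in zip(thetas, gs))
    cov_gg = statistics.fmean((gi - m_g) ** 2 for gi in gs)
    gain = cov_tg / (cov_gg + obs_noise_var)
    new_thetas = []
    for t, gi in zip(thetas, gs):
        noise = rng.gauss(0.0, obs_noise_var ** 0.5) if deterministic_forward_map else 0.0
        new_thetas.append(t + gain * (y + noise - gi))
    return new_thetas

# Toy problem: G(theta) = 2*theta, observation y = 4, so the optimum is theta = 2.
rng = random.Random(1)
thetas = [rng.gauss(0.0, 2.0) for _ in range(100)]
for _ in range(10):
    thetas = eki_update(thetas, lambda t: 2.0 * t, 4.0, 1e-4,
                        deterministic_forward_map=False, rng=rng)
```

With deterministic_forward_map=False and a small obs_noise_var kept as algorithmic regularization, the ensemble mean collapses toward theta = 2.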

Quick resolution

You should be able to run the code with the following alteration

  • call update_ensemble!(..., deterministic_forward_map=false).
  • you may need to provide a small observational noise, e.g. obs_noise_cov = 1e-4*I, to EnsembleKalmanProcess(....,Inversion()), viewed as a small algorithmic regularization, though you can try with obs_noise_cov = 0*I.

Let us know how you get on!

Future actions

  • We currently have an open issue on implementing a range of timestepping schemes for EKP, i.e. the correct choice of h in the paper. It's on our radar, and we hope to get to it later in the year.

@sdwfrost
Author

Thanks @odunbar, this is super helpful. I have to deal with count (integer) data (e.g. the number of new infections per day for an infectious disease), which I normally transform via log(x+1). In the standard framework I get biased parameter estimates (the populations are small, so my statistics are far from Gaussian), but as an initial condition for another scheme, your EKP library is super useful.
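For reference, the log(x+1) transform above can be sketched as follows (the function names are illustrative):

```python
import math

def transform(counts):
    """log(x + 1): defined at zero counts and compresses large counts,
    so residuals on the transformed scale are closer to Gaussian."""
    return [math.log1p(c) for c in counts]

def untransform(z):
    """Map transformed values back to the count scale."""
    return [math.expm1(v) for v in z]

daily_cases = [0, 2, 5, 13, 34, 89]
z = transform(daily_cases)
recovered = untransform(z)
```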

@odunbar
Collaborator

odunbar commented Mar 20, 2023

Thanks!

I should caveat that EKP still only has good theory for linear & Gaussian problems.

That is to say, although our parameter distribution setups and algorithms do allow for more interesting priors and likelihoods by pushing nonlinearities/non-Gaussianities into the forward map, this can still yield biases, as EKP still only evolves a Gaussian approximation to perform optimization. In practice, though, it has proven effective in many cases.

PS: Re the count data, I am by no means an expert, but we do have some work in a filtering context, where we converted infection test results into PPV/FOR data for data assimilation (cf. the "Synthetic Data" section). This is a different transformation that puts the data into [0,1], and might be of interest to you, though the boundaries can still prove challenging.
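I can't speak to the paper's exact construction, but as an illustration of how such [0,1] quantities arise, PPV (positive predictive value) and FOR (false omission rate) follow from Bayes' rule given test sensitivity, specificity, and prevalence. This is a standard textbook computation; the helper name is an assumption.

```python
def ppv_for(sensitivity, specificity, prevalence):
    """Positive predictive value P(infected | test positive) and false
    omission rate P(infected | test negative), via Bayes' rule.
    Both quantities lie in [0, 1]."""
    p, s, c = prevalence, sensitivity, specificity
    ppv = s * p / (s * p + (1 - c) * (1 - p))
    fo_rate = (1 - s) * p / ((1 - s) * p + c * (1 - p))
    return ppv, fo_rate
```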

@sdwfrost
Author

Thanks for the feedback. Here is a writeup of my toy model using EKI and EKS. I'm pretty happy with how well it worked out - thanks!

@odunbar
Collaborator

odunbar commented Apr 3, 2023

Awesome! Thanks for the write-up, I'm pleased it worked out for you, it's really great to hear.

If you ever work with a model you can't run >10^4 times (as EKS requires), we also have an ML-based pipeline, CalibrateEmulateSample.jl (CES), currently just for moderate-dimensional parameter spaces, O(10), to accelerate obtaining approximate posterior distributions from the output of EKI. In fact, for CES the posteriors need not rely on the Gaussianity that EKS requires, so they may also prove more flexible in capturing correlations.

[Please let me know if you are interested in trying this out too. It should hopefully just require a short script taking the priors and the EKI ensemble as input.]

@odunbar
Collaborator

odunbar commented Apr 13, 2023

Closing as resolved.

@odunbar odunbar closed this as completed Apr 13, 2023