-
Notifications
You must be signed in to change notification settings - Fork 18
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
loop branch for mutable timestepper added parameters to EKSStabilityTimestepper added DMC timestepper timestepper interface bugfixes format externalize dim-check error, bugfix stableEKS timestep Unscented consistent, and tests pass update Delt =... to timestepper = ... in unit tests` update example unit test for constructors format consistency wiith 1,9 (not-yet-refined) timestepper example with various plots Timestepper termination condition upheld docstrings unit tests loosen tol typo improved output example loosen tolerance cleaned up plots and experiment repeated for many runs format removed nothing # hide, and added more strings rename timestepper to scheduler added continue options for DMC run test longer preliminary docs for scheduler comparison example improve plot legend and docs typos etc examples/LearningRateSchedulers/compare_schedulers.jl move docs page out of examples subfolder move docs in contents modify conclusions API docstrings
- Loading branch information
Showing
16 changed files
with
905 additions
and
139 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# [Learning Rate Schedulers (a.k.a) Timestepping](@id learning-rate-schedulers) | ||
|
||
## Overview | ||
|
||
We demonstrate the behaviour of different learning rate schedulers through solution of a nonlinear inverse problem. | ||
|
||
In this example we have a model that produces the exponential of a sinusoid ``f(A, v) = \exp(A \sin(t) + v), \forall t \in [0,2\pi]``. Given an initial guess of the parameters as ``A^* \sim \mathcal{N}(2,1)`` and ``v^* \sim \mathcal{N}(0,5)``, the inverse problem is to estimate the parameters from a noisy observation of only the maximum and mean value of the true model output. | ||
|
||
We shall compare the following configurations of implemented schedulers. | ||
1. Fixed, long timestep `DefaultScheduler(0.5)` - orange | ||
2. Fixed, short timestep `DefaultScheduler(0.02)` - green | ||
3. Adaptive timestep (designed originally to ensure EKS remains stable) `EKSStableScheduler()` [Kovachki & Stuart 2018](https://doi.org/10.1088/1361-6420/ab1c3a) - red | ||
4. Misfit controlling timestep (Terminating) `DataMisfitController()` [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) - purple | ||
5. Misfit controlling timestep (Continuing beyond Terminate condition) `DataMisfitController(on_terminate="continue")` - brown | ||
|
||
|
||
## Timestep and termination time | ||
|
||
Recall, for example for EKI, we perform updates of our ensemble of parameters ``j=1,\dots,J`` at step ``n = 1,\dots,N_\mathrm{it}`` using | ||
|
||
`` \theta_{n+1}^{(j)} = \theta_{n}^{(j)} - \dfrac{\Delta t_n}{J}\sum_{k=1}^J \left \langle \mathcal{G}(\theta_n^{(k)}) - \bar{\mathcal{G}}_n \, , \, \Gamma_y^{-1} \left ( \mathcal{G}(\theta_n^{(j)}) - y \right ) \right \rangle \theta_{n}^{(k)},`` | ||
|
||
where``\bar{\mathcal{G}}_n`` is the mean value of ``\mathcal{G}(\theta_n)`` | ||
across ensemble members. We denote the current time ``t_n = \sum_{i=1}^n\Delta t_i``, and the termination time as ``T = t_{N_\mathrm{it}}``. | ||
|
||
!!! note | ||
Adaptive Schedulers typically try to make the biggest update that controls some measure of this update. For example, `EKSStableScheduler()` controls the frobenius norm of the update, while `DataMisfitController()` controls the Jeffrey divergence between the two steps. Largely they follow a pattern of scheduling very small initial timesteps, leading to much larger steps at later times. | ||
|
||
There are two termination times that the theory indicates are useful | ||
- ``T=1``: In the linear Gaussian case, the ``\{\theta_{N_\mathrm{it}}\}`` will represent the posterior distribution. In nonlinear cases we often find ``\bar{\theta}_{N_\mathrm{it}}`` (the ensemble mean) provides a conservative estimate of the true parameters, while retaining spread. It is noted in [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) that with small enough (or well chosen) step-sizes this estimate at ``T=1`` satisfies a discrepancy principle with respect to the observational noise. | ||
- ``T\to \infty``: Though theoretical concerns have been made with respect to continuation beyond ``T=1`` for inversion methods such as EKI and UKI, in practice we commonly see better estimates ``\bar{\theta}_{N_\mathrm{it}}`` to the true parameters. As expected this procedure leads to ensemble collapse, and so no meaningful information can be taken from the posterior spread. | ||
|
||
## The experiment | ||
|
||
We assess the schedulers by solving the inverse problem 100 times with different random initial ensembles (using the same data realization for all). Shown below is an example plot of one of these solves with each timestepper. | ||
|
||
![Solution ensembles](assets/ensemble_compare_scheduler.png) | ||
|
||
*Left: The true model over ``[0,2\pi]`` (black), and the ensembles from different schemes (colors). | ||
Right: The noisy observation (black), the distribution it was sampled from (gray-ribbon), and the corresponding ensemble-mean approximation given from each scheduler (colors).* | ||
|
||
To assess the timestepping we show the convergence plot against the algorithm iteration we measure two quantities. | ||
- error (solid) is defined by ``\frac{1}{N_{ens}}\sum^{N_{ens}}_{i=1} \| \theta_i - \theta^* \|^2`` where ``\theta_i`` are ensemble members and ``\theta^*`` is the true value used to create the observed data. | ||
- spread (dashed) is defined by ``\frac{1}{N_{ens}}\sum^{N_{ens}}_{i=1} \| \theta_i - \bar{\theta} \|^2`` where ``\theta_i`` are ensemble members and ``\bar{\theta}`` is the mean over these members. | ||
|
||
![Error vs spread](assets/error_vs_spread_over_iteration_compare_scheduler.png) | ||
|
||
*Left: the error and spread of the different timesteppers at over iterations of the algorithm for a single run. | ||
Right: the error and spread of the different timesteppers at their final iterations, averaged from 100 different draws of initial conditions.* | ||
|
||
From this plot there are several take-home messages. | ||
- Termination at ``T=1``. DMC with termination (purple), closely mimics a small-fixed timestep (green) that finishes stepping at ``T=1``. Both retain more spread than other approaches, and DMC is far more efficient, typically terminating after around 10-20 steps, where fixed-stepping takes 50. We see that (for this experiment) this is a conservative estimate, as continuing to solve (e.g. brown) until later times often leads to a better error while still retaining similar "error vs spread" curves (before inevitable collapse). | ||
- Large fixed step (orange). This is very efficient, but can get stuck in local minima when too large, it collapses the ensemble. Note that even on average it gives lower error to the true parameters than DMC. | ||
- Continuation ``T \to \infty``. Both EKSStable and DMC with continuation schedulers, perform very similarly. Both retain good ensemble spread for during convergence, and collapse after finding a local optimum. This optimum on average has the best error to the true parameters in this experiment. They appear to consistently find the same optimum as ``T\to\infty`` but DMC finds this in fewer iterations. | ||
- Adaptivity. The fixed-step may eventually find the same optimum as EKS/DMC, but after 50 iterations ``T\sim 10^{12} - 10^{15}`` in the adaptive algorithms, while the fixed stepper has reached ``T=25``. On the other end of the scale, the initial timesteps selected are often on the order of ``\Delta t \sim 10^{-8} - 10^{-6}``. The implication is that adaptivity is likely very useful for efficiency of these algorithms. | ||
|
||
### Consequence | ||
|
||
This experiment motivates the possibility of making DMC (with/without) continuation a default timestepper in future releases, for EKI/UKI. | ||
|
||
Currently we will retain constant timestepping while we investigate further, though the DMC is available for use. | ||
|
||
Please let us know how you get on by setting the keyword argument in EKP | ||
```julia | ||
scheduler = DataMisfitController(on_terminate = "continue") | ||
``` | ||
|
||
!!! warning "Ensemble Kalman Sampler" | ||
We observe blow-up in EKS, when not using the `EKSStableScheduler`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
[deps] | ||
ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4" | ||
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" | ||
EnsembleKalmanProcesses = "aa8a2aa5-91d8-4396-bcef-d4f2ec43552d" | ||
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,280 @@ | ||
# # Comparison of different Learning rate schedulers | ||
|
||
# In this example we have a model that produces the exponential of a sinusoid | ||
# ``f(A, v) = exp(A \sin(t) + v), \forall t \in [0,2\pi]``. Given an initial guess of the parameters as | ||
# ``A^* \sim \mathcal{N}(2,1)`` and ``v^* \sim \mathcal{N}(0,5)``, our goal is | ||
# to estimate the parameters from a noisy observation of the maximum, minimum, | ||
# and mean of the true model output. | ||
|
||
# We repeat the experiment using several timestepping methods with EKI, | ||
# we also repeat the experiment over many seeds | ||
|
||
# We produce 3 plots: | ||
# (1) Final model ensembles of a single run, and reproduction of the observed data (max & mean of model) | ||
# (2) Convergence of a single run over iterations, and the average performance over many runs. | ||
# (3) Convergence of a single run over algorithm time, and the average performance ovver many runs. | ||
|
||
|
||
# First, we load the packages we need: | ||
using LinearAlgebra, Random | ||
|
||
using Distributions, Plots, ColorSchemes | ||
using Plots.PlotMeasures | ||
using EnsembleKalmanProcesses | ||
using EnsembleKalmanProcesses.ParameterDistributions | ||
const EKP = EnsembleKalmanProcesses | ||
|
||
## We set up the file syste, | ||
|
||
homedir = pwd() | ||
println(homedir) | ||
figure_save_directory = homedir * "/output/" | ||
data_save_directory = homedir * "/output/" | ||
if ~isdir(figure_save_directory) | ||
mkdir(figure_save_directory) | ||
end | ||
if ~isdir(data_save_directory) | ||
mkdir(data_save_directory) | ||
end | ||
|
||
# Seed for pseudo-random number generator. | ||
repeats = 100 | ||
rng_seeds = [99 + 135 * i for i in 1:repeats] | ||
rngs = [Random.MersenneTwister(rs) for rs in rng_seeds] | ||
|
||
# Now, we define a model which generates a sinusoid given parameters ``\theta``: an | ||
# amplitude and a vertical shift. It then is exponentiated, We will estimate these parameters from data. | ||
|
||
# The model adds a random phase shift upon evaluation. | ||
dt = 0.01 | ||
trange = 0:dt:(2 * pi + dt) | ||
function model(amplitude, vert_shift) | ||
return exp.(amplitude * sin.(trange) .+ vert_shift) | ||
end | ||
|
||
# We then define ``G(\theta)``, which returns the observables of the sinusoid | ||
# given a parameter vector. These observables should be defined such that they | ||
# are informative about the parameters we wish to estimate. | ||
function G(u) | ||
theta, vert_shift = u | ||
sincurve = model(theta, vert_shift) | ||
return [maximum(sincurve), mean(sincurve)] | ||
end | ||
|
||
# Suppose we have a noisy observation of the true system. Here, we create a | ||
# pseudo-observation ``y`` by running our model with the correct parameters | ||
# and adding Gaussian noise to the output. | ||
dim_output = 2 | ||
|
||
Γ = 0.1 * I | ||
noise_dist = MvNormal(zeros(dim_output), Γ) | ||
|
||
u_true = [1.0, 0.8] | ||
y_nonoise = G(u_true) | ||
y = y_nonoise .+ rand(noise_dist) | ||
|
||
gr() | ||
|
||
## Solving the inverse problem | ||
|
||
# We now define prior distributions on the two parameters. For the amplitude, | ||
# we define a prior with mean 2 and standard deviation 1. It is | ||
# additionally constrained to be nonnegative. For the vertical shift we define | ||
# a Gaussian prior with mean 0 and standard deviation 5. | ||
prior_u1 = constrained_gaussian("amplitude", 2, 1, 0, Inf) | ||
prior_u2 = constrained_gaussian("vert_shift", 0, 5, -Inf, Inf) | ||
prior = combine_distributions([prior_u1, prior_u2]) | ||
unconstrained_u_true = transform_constrained_to_unconstrained(prior, u_true) | ||
|
||
# We now generate the initial ensemble and set up the ensemble Kalman inversion. | ||
N_ensemble = 50 | ||
N_iterations = 50 | ||
|
||
# Set up some initial plot information | ||
clrs = palette(:tab10) | ||
plt = plot(trange, model(u_true...), c = :black, label = "Truth", legend = :topright, linewidth = 2, title = "Model") | ||
xlabel!(plt, "Time") | ||
plt_thin = plot(grid = false, xticks = false, title = "(max, mean)") | ||
for i in 1:2 | ||
plot!( | ||
plt_thin, | ||
trange, | ||
repeat([y_nonoise[i]], length(trange)), | ||
ribbon = [2 * sqrt(Γ[i, i]); 2 * sqrt(Γ[i, i])], | ||
linestyle = :dash, | ||
c = :grey, | ||
label = "", | ||
) | ||
plot!(plt_thin, trange, repeat([y[i]], length(trange)), c = :black, label = "") | ||
end | ||
plt2 = | ||
plot(xlabel = "iterations", yscale = :log10, ylim = [1e-4, 1e2], ylabel = "L2 norm", title = "Example convergence") | ||
plt3 = plot( | ||
xlabel = "algorithm time", | ||
yscale = :log10, | ||
ylim = [1e-4, 1e2], | ||
ylabel = "L2 norm", | ||
title = "Example convergence", | ||
) | ||
|
||
|
||
final_misfit = zeros(5, repeats) | ||
final_u_err = zeros(5, repeats) | ||
final_u_spread = zeros(5, repeats) | ||
ts_tmp = [] | ||
|
||
# We run two loops, over `repeats` number of random initial samples | ||
# and over the collection of schedulers `timestepppers` | ||
|
||
for (rng_idx, rng) in enumerate(rngs) | ||
initial_ensemble = EKP.construct_initial_ensemble(rng, prior, N_ensemble) | ||
N_iters = repeat([N_iterations], 5) | ||
schedulers = [ | ||
DefaultScheduler(0.5), | ||
DefaultScheduler(0.02), | ||
EKSStableScheduler(), | ||
DataMisfitController(), | ||
DataMisfitController(on_terminate = "continue"), | ||
] | ||
push!(ts_tmp, schedulers) | ||
for (idx, scheduler, N_iter) in zip(1:length(schedulers), schedulers, N_iters) | ||
ensemble_kalman_process = | ||
EKP.EnsembleKalmanProcess(initial_ensemble, y, Γ, Inversion(); scheduler = scheduler, rng = rng) | ||
|
||
# We are now ready to carry out the inversion. At each iteration, we get the | ||
# ensemble from the last iteration, apply ``G(\theta)`` to each ensemble member, | ||
# and apply the Kalman update to the ensemble. | ||
|
||
u_err = zeros(N_iter + 1) | ||
u_spread = zeros(N_iter + 1) | ||
for i in 1:N_iter | ||
params_i = get_ϕ_final(prior, ensemble_kalman_process) | ||
u_i = get_u_final(ensemble_kalman_process) | ||
u_err[i] = 1 / size(u_i, 2) * sum((u_i .- unconstrained_u_true) .^ 2) | ||
u_spread[i] = 1 / size(u_i, 2) * sum((u_i .- mean(u_i, dims = 2)) .^ 2) | ||
G_ens = hcat([G(params_i[:, i]) for i in 1:N_ensemble]...) | ||
|
||
terminated = EKP.update_ensemble!(ensemble_kalman_process, G_ens) | ||
if !isnothing(terminated) | ||
N_iters[idx] = i - 1 | ||
break # if the timestep was terminated due to timestepping condition | ||
end | ||
|
||
end | ||
# this will change in on failure condition | ||
N_iter = N_iters[idx] | ||
|
||
# required for plots | ||
u_err = u_err[1:(N_iter + 1)] | ||
u_spread = u_spread[1:(N_iter + 1)] | ||
u_i = get_u_final(ensemble_kalman_process) | ||
u_err[N_iter + 1] = 1 / size(u_i, 2) * sum((u_i .- unconstrained_u_true) .^ 2) | ||
u_spread[N_iter + 1] = 1 / size(u_i, 2) * sum((u_i .- mean(u_i, dims = 2)) .^ 2) | ||
|
||
final_u_err[idx, rng_idx] = u_err[end] | ||
final_u_spread[idx, rng_idx] = u_spread[end] | ||
|
||
Δt = ensemble_kalman_process.Δt | ||
alg_time = [sum(Δt[1:i]) for i in 1:length(Δt)] | ||
pushfirst!(alg_time, 0.0) | ||
misfit = get_error(ensemble_kalman_process) | ||
final_misfit[idx, rng_idx] = misfit[end] | ||
|
||
if rng_idx == 1 | ||
plot!(plt2, 0:N_iter, u_err, c = clrs[idx + 1], label = "$(nameof(typeof(scheduler)))") | ||
plot!(plt2, 0:N_iter, u_spread, c = clrs[idx + 1], ls = :dash, label = "") | ||
|
||
plot!(plt3, alg_time, u_err, c = clrs[idx + 1], label = "$(nameof(typeof(scheduler)))") | ||
plot!(plt3, alg_time, u_spread, c = clrs[idx + 1], ls = :dash, label = "") | ||
|
||
# Finally, we get the ensemble after the last iteration. This provides our estimate of the parameters. | ||
final_ensemble = get_ϕ_final(prior, ensemble_kalman_process) | ||
|
||
# To visualize the success of the inversion, we plot model with the true | ||
# parameters, the initial ensemble, and the final ensemble. | ||
|
||
plot!( | ||
plt, | ||
trange, | ||
model(final_ensemble[:, 1]...), | ||
alpha = 0.2, | ||
c = clrs[idx + 1], | ||
label = "$(nameof(typeof(scheduler)))", | ||
) | ||
plot!( | ||
plt, | ||
trange, | ||
[model(final_ensemble[:, i]...) for i in 2:N_ensemble], | ||
alpha = 0.2, | ||
c = clrs[idx + 1], | ||
label = "", | ||
) | ||
G_final_mean = mean(hcat([G(final_ensemble[:, i]) for i in 1:N_ensemble]...), dims = 2) | ||
for k in 1:2 | ||
plot!( | ||
plt_thin, | ||
trange, | ||
repeat([G_final_mean[k]], length(trange)), | ||
linestyle = :dot, | ||
c = clrs[idx + 1], | ||
label = "", | ||
) | ||
end | ||
end | ||
end | ||
end | ||
|
||
#some final plot tweaking / combining | ||
ylims!(plt_thin, ylims(plt)) | ||
ll = @layout [a{0.8w} b{0.2w}] | ||
plt = plot(plt, plt_thin, layout = ll, right_margin = 10mm) | ||
|
||
mean_final_misfit = mean(final_misfit, dims = 2) | ||
println(final_u_err[5, :]) | ||
mean_final_u_err = mean(final_u_err, dims = 2) | ||
mean_final_u_spread = mean(final_u_spread, dims = 2) | ||
|
||
|
||
plot!(plt2, [1], [ylims(plt2)[2] + 1], c = :gray, label = "error") | ||
plot!(plt2, [1], [ylims(plt2)[2] + 1], c = :gray, ls = :dash, label = "spread") | ||
plot!(plt3, [1], [ylims(plt3)[2] + 1], c = :gray, label = "error") | ||
plot!(plt3, [1], [ylims(plt3)[2] + 1], c = :gray, ls = :dash, label = "spread") | ||
plt2_thin = plot(yscale = :log10, grid = false, xticks = false, title = "Final ($(repeats) runs)") | ||
for i in 1:5 | ||
plot!( | ||
plt2_thin, | ||
trange, | ||
repeat([mean_final_u_err[i]], length(trange)), | ||
yscale = :log10, | ||
c = clrs[i + 1], #idx + 1 for colours | ||
label = "", | ||
) | ||
plot!( | ||
plt2_thin, | ||
trange, | ||
repeat([mean_final_u_spread[i]], length(trange)), | ||
c = clrs[i + 1], #idx + 1 for colours | ||
ls = :dash, | ||
label = "", | ||
) | ||
end | ||
plt3_thin = plt2_thin | ||
ylims!(plt2_thin, ylims(plt2)) | ||
plt2 = plot(plt2, plt2_thin, layout = ll, right_margin = 10mm) | ||
ylims!(plt3_thin, ylims(plt3)) | ||
plt3 = plot(plt3, plt3_thin, layout = ll, right_margin = 10mm) | ||
|
||
for (i, t) in enumerate(ts_tmp[1]) | ||
println(" ") | ||
println("Method : ", t) | ||
println("Final misfit: ", mean_final_misfit[i]) | ||
println("Final error : ", mean_final_u_err[i]) | ||
println("Final spread: ", mean_final_u_spread[i]) | ||
end | ||
|
||
savefig(plt, joinpath(figure_save_directory, "ensembles.png")) | ||
savefig(plt2, joinpath(figure_save_directory, "error_vs_spread_over_iteration.png")) | ||
savefig(plt3, joinpath(figure_save_directory, "error_vs_spread_over_time.png")) | ||
savefig(plt, joinpath(figure_save_directory, "ensembles.pdf")) | ||
savefig(plt2, joinpath(figure_save_directory, "error_vs_spread_over_iteration.pdf")) | ||
savefig(plt3, joinpath(figure_save_directory, "error_vs_spread_over_time.pdf")) |
Oops, something went wrong.