Small docs updates #315

Merged 1 commit on Jul 19, 2023
2 changes: 1 addition & 1 deletion docs/src/API/EnsembleKalmanProcess.md
@@ -32,7 +32,7 @@ sample_empirical_gaussian
split_indices_by_success
```

-## Learning Rate Schedulers
+## [Learning Rate Schedulers](@id scheduler_api)

```@docs
DefaultScheduler
55 changes: 43 additions & 12 deletions docs/src/learning_rate_scheduler.md
@@ -7,12 +7,48 @@ We demonstrate the behaviour of different learning rate schedulers through solut
In this example we have a model that produces the exponential of a sinusoid ``f(A, v) = \exp(A \sin(t) + v), \forall t \in [0,2\pi]``. Given an initial guess of the parameters as ``A^* \sim \mathcal{N}(2,1)`` and ``v^* \sim \mathcal{N}(0,25)``, the inverse problem is to estimate the parameters from a noisy observation of only the maximum and mean value of the true model output.

We shall compare the following configurations of implemented schedulers.
-1. Fixed, long timestep `DefaultScheduler(0.5)` - orange
-2. Fixed, short timestep `DefaultScheduler(0.02)` - green
+1. Fixed, "long" timestep `DefaultScheduler(0.5)` - orange
+2. Fixed, "short" timestep `DefaultScheduler(0.02)` - green
3. Adaptive timestep (designed originally to ensure EKS remains stable) `EKSStableScheduler()` [Kovachki & Stuart 2018](https://doi.org/10.1088/1361-6420/ab1c3a) - red
4. Misfit controlling timestep (Terminating) `DataMisfitController()` [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) - purple
5. Misfit controlling timestep (Continuing beyond Terminate condition) `DataMisfitController(on_terminate="continue")` - brown

One can define the schedulers as
```julia
scheduler = DefaultScheduler(0.5) # fixed stepsize of 0.5 (constructor default: 1)
```
Then when constructing an EnsembleKalmanProcess, one uses the keyword argument
```julia
ekpobj = EKP.EnsembleKalmanProcess(args...; scheduler = scheduler, kwargs...)
```
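
For concreteness, here is a minimal sketch of a full construction (hypothetical setup: `prior`, `y`, and `Γ` are assumed to be a previously defined prior, observation vector, and observational noise covariance, and the ensemble size of 50 is illustrative):
```julia
using EnsembleKalmanProcesses
const EKP = EnsembleKalmanProcesses

# draw 50 ensemble members from the prior, then build an EKI object
# that uses the fixed "long" timestep from above
initial_ensemble = EKP.construct_initial_ensemble(prior, 50)
ekpobj = EKP.EnsembleKalmanProcess(initial_ensemble, y, Γ, Inversion(); scheduler = DefaultScheduler(0.5))
```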
A variety of other schedulers can be defined similarly:
```julia
scheduler = MutableScheduler(2) # modifiable stepsize
scheduler = EKSStableScheduler(numerator=10.0, nugget = 0.01) # Stable for EKS
scheduler = DataMisfitController(on_terminate = "continue") # non-terminating
```
Please see the [learning rate schedulers API](@ref scheduler_api) for defaults and other details.

### Early termination

Early termination can be implemented in the calibration loop as
```julia
using EnsembleKalmanProcesses # for get_ϕ_final, update_ensemble!
# given
# * the number of iterations `N_iter`
# * a prior `prior`
# * a forward map `G`
# * the EKP object `ekpobj`

for i in 1:N_iter
    params_i = get_ϕ_final(prior, ekpobj)
    g_ens = G(params_i)
    terminated = update_ensemble!(ekpobj, g_ens) # check for termination
    if !isnothing(terminated) # if termination is flagged, break the loop
        break
    end
end
```

## Timestep and termination time

@@ -30,6 +66,10 @@ There are two termination times that the theory indicates are useful
- ``T=1``: In the linear Gaussian case, the ``\{\theta_{N_\mathrm{it}}\}`` will represent the posterior distribution. In the nonlinear case it should still provide an approximation to the posterior distribution. Note that, as the posterior does not necessarily optimize the data-misfit, we find ``\bar{\theta}_{N_\mathrm{it}}`` (the ensemble mean) provides a conservative estimate of the true parameters, while retaining spread. It is noted in [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) that with small enough (or well-chosen) step-sizes this estimate at ``T=1`` satisfies a discrepancy principle with respect to the observational noise.
- ``T\to \infty``: Though theoretical concerns have been raised about continuation beyond ``T=1`` for inversion methods such as EKI, in practice we commonly see better optimization of the data-misfit, and thus a better representation ``\bar{\theta}_{N_\mathrm{it}}`` of the true parameters. As expected, this procedure leads to ensemble collapse, so no meaningful information can be taken from the posterior spread, and the optimizer is not likely to be the posterior mode.
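
To see which regime a given run has reached, one can read off the accumulated algorithm time from the EKP object (a sketch, assuming the per-iteration steps are stored in the field `ekpobj.Δt`, as in the `calculate_timestep!` source shown at the end of this diff):
```julia
# accumulated algorithm time T, i.e. the sum of all timesteps taken so far;
# near T = 1 the ensemble spread approximates posterior uncertainty, while for
# T ≫ 1 one should expect ensemble collapse around a misfit-optimizing mean
T_current = sum(ekpobj.Δt)
@info "Current algorithm time" T_current
```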

## The experiment with EKI & UKI

We assess the schedulers by solving the inverse problem with EKI and UKI (we average results over 100 initial ensembles in the case of EKI). We will not draw comparisons between EKI and UKI here; rather, we use them to observe consistent behavior in the schedulers. Shown below are the solution plots of one solve with each timestepper, for both methods.
@@ -59,16 +99,7 @@ Optimizing the objective function (continuing ``T \to \infty``):

### DMC as a default in future?

-This experiment motivates the possibility of making DMC (with/without) continuation a default timestepper in future releases, for EKI/SEKI/UKI.
-
-Currently we will retain constant timestepping while we investigate further, though the DMC is available for use.
-
-Please let us know how you get on by setting the keyword argument in EKP
-```julia
-scheduler = DataMisfitController() # terminating at `T=1`
-scheduler = DataMisfitController(terminate_at = 10) # terminating at `T=10`
-scheduler = DataMisfitController(on_terminate = "continue") # non-terminating
-```
+This experiment motivates the possibility of making DMC (with or without continuation) a default timestepper in future releases, for EKI/SEKI/UKI. Currently we will retain constant timestepping as the default while we investigate further.

!!! warning "Ensemble Kalman Sampler"
    We observe blow-up in EKS when not using the `EKSStableScheduler`.
15 changes: 8 additions & 7 deletions docs/src/parameter_distributions.md
@@ -117,25 +117,26 @@ In [Simple example revisited](@ref) below, we repeat this example "manually" wit

The recommended constructor can be called as `constrained_gaussian(...; repeats = n)` to return a combined prior formed by `n` identical priors.
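
For example (a sketch; the name and numbers are illustrative), three identical parameters, each constrained to ``(0, 1)`` with approximate mean 0.5 and standard deviation 0.2, could be built as:
```julia
using EnsembleKalmanProcesses.ParameterDistributions

# three i.i.d. priors over (0, 1), combined into a single ParameterDistribution
prior = constrained_gaussian("coefficient", 0.5, 0.2, 0.0, 1.0; repeats = 3)
```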

-## ParameterDistribution class
+## ParameterDistribution struct

This section provides more details on the components of a `ParameterDistribution` object.

### ParameterDistributionType

-The `ParameterDistributionType` class comprises three subclasses for specifying different types of prior distributions:
+The `ParameterDistributionType` struct wraps four types for specifying different kinds of prior distribution (a construction sketch follows the list):

- The `Parameterized` type is initialized using a Julia `Distributions.jl` object. Samples are drawn randomly from the distribution object.
- The `VectorOfParameterized` type is initialized with a vector of distributions.
- The `Samples` type is initialized using a two dimensional array. Samples are drawn randomly (with replacement) from the columns of the provided array.
+- The `FunctionParameterDistributionType` struct defines parameters specified as fields over a domain. More detail can be found [here](@ref function-parameter-type).
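
A construction sketch for the first three types (the distributions and array sizes are illustrative):
```julia
using Distributions
using EnsembleKalmanProcesses.ParameterDistributions

d1 = Parameterized(Normal(0, 1))                        # wraps a Distributions.jl object
d2 = VectorOfParameterized([Normal(0, 1), Gamma(2, 1)]) # one distribution per dimension
d3 = Samples(randn(2, 100))                             # columns are 100 samples of a 2D parameter
```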

!!! warning
We recommend that the distributions be unbounded (see next section), as the filtering algorithms in EnsembleKalmanProcesses are not guaranteed to preserve constraints unless defined through the `ConstraintType` mechanism.

### ConstraintType

-The inference algorithms implemented in EnsembleKalmanProcesses assume unbounded parameter domains. To be able to handle constrained parameter values consistently, the ConstraintType class defines a bijection between the physical, constrained parameter domain and an unphysical, unconstrained domain in which the filtering takes place. This bijection is specified by the functions `transform_constrained_to_unconstrained` and `transform_unconstrained_to_constrained`, which are built from either predefined constructors or user-defined constraint functions given as arguments to the `ConstraintType` constructor.
+The inference algorithms implemented in EnsembleKalmanProcesses assume unbounded parameter domains. To be able to handle constrained parameter values consistently, the `ConstraintType` defines a bijection between the physical, constrained parameter domain and an unphysical, unconstrained domain in which the filtering takes place. This bijection is specified by the functions `transform_constrained_to_unconstrained` and `transform_unconstrained_to_constrained`, which are built from either predefined constructors or user-defined constraint functions given as arguments to the `ConstraintType` constructor.
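
As an illustration (a sketch, assuming the `Constraint` object exposes its two maps as the fields shown; `bounded` is one of the predefined constructors listed below):
```julia
using EnsembleKalmanProcesses.ParameterDistributions

c = bounded(0.0, 1.0) # bijection between the physical domain (0, 1) and the real line

u = c.constrained_to_unconstrained(0.25) # physical value -> unconstrained value
x = c.unconstrained_to_constrained(u)    # inverse map; x ≈ 0.25
```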

We provide the following predefined constructors which implement mappings that handle the most common constraints:

@@ -160,15 +161,15 @@ Currently we only support multivariate constraints which are the Cartesian product
This is simply a `String` used to identify different parameters in multi-parameter situations, as in the methods below.


-### FunctionParameterDistributionType
-
-A subtype of `ParameterDistributionType`, we currently support one subclass for specifying prior distributions over functions.
+### [FunctionParameterDistributionType](@id function-parameter-type)

Learning a function distribution is useful when one wishes to obtain a parametric representation of a function that is (relatively) agnostic of the underlying grid discretization. Most practical implementations involve posing a restrictive class of functions by truncation of a spectral decomposition. The function is then represented as a set of coefficients of these modes (known as degrees of freedom), rather than directly through the values at evaluation points.

+As a subtype of `ParameterDistributionType`, we currently support one option for specifying prior distributions over functions:

- The `GaussianRandomFieldInterface` type is initialized with a Gaussian Random Field object and the GRF package. Currently we support objects from [`GaussianRandomFields.jl`](https://github.com/PieterjanRobbe/GaussianRandomFields.jl) with package `GRFJL()`. Gaussian random fields allow the definition of scalar function distributions defined over a uniform mesh on interval, rectangular, and hyper-rectangular domains.

-When constructing a function distribution, one can also enforce a constraint on the output space of the function using a `Constraint()`.
+As with other `ParameterDistribution`s, a function distribution is built from a name, a `FunctionParameterDistributionType` struct, and a single constraint, placed on the scalar output space of the function using a `Constraint()`.
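
A minimal sketch of such a construction (the mesh, Matern parameters, 30-mode truncation, and names are illustrative, following the `GaussianRandomFields.jl` interface):
```julia
using GaussianRandomFields # provides the underlying random field object
using EnsembleKalmanProcesses.ParameterDistributions
const GRF = GaussianRandomFields

# a 1D Matern random field on [0, 1], truncated to 30 Karhunen-Loeve modes,
# i.e. 30 degrees of freedom
grfjl_obj = GRF.GaussianRandomField(
    GRF.CovarianceFunction(1, GRF.Matern(0.05, 2)),
    GRF.KarhunenLoeve(30),
    collect(0:0.01:1),
)
grf_dist = GaussianRandomFieldInterface(grfjl_obj, GRFJL())

# name + distribution + a single constraint on the scalar output space
pd = ParameterDistribution(
    Dict("distribution" => grf_dist, "name" => "positive_function", "constraint" => bounded_below(0)),
)
```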

!!! note "constraints"
    The transformation `transform_unconstrained_to_constrained` will map from (unconstrained) degrees of freedom to (constrained) evaluations of the function on a numerical grid. In particular, the `transform_constrained_to_unconstrained` is *no longer the inverse* of this map; it now simply maps from constrained evaluations to unconstrained evaluations on the grid.
18 changes: 11 additions & 7 deletions src/LearningRateSchedulers.jl
@@ -56,15 +56,15 @@ end
$(TYPEDEF)
Scheduler known to be stable for EKS.
-In particular, ``\\Delta t = \\frac{\\alpha}{\\|U\\| + \\varepsilon}`` where ``U = (G(u) - \\bar{G(u)}])^T\\Gamma^{-1}(G(u) - y)``.
+In particular, ``\\Delta t = \\frac{\\alpha}{\\|U\\| + \\varepsilon}`` where ``U = (G(u) - \\bar{G(u)})^T\\Gamma^{-1}(G(u) - y)``.
Cannot be overridden.
$(TYPEDFIELDS)
"""
struct EKSStableScheduler{FT} <: LearningRateScheduler where {FT <: AbstractFloat}
-    "the numerator ``\alpha``"
+    "the numerator ``\\alpha``"
    numerator::FT
-    "the nugget term ``\varepsilon``"
+    "the nugget term ``\\varepsilon``"
    nugget::FT
end

@@ -91,21 +91,25 @@ $(TYPEDEF)
Scheduler from Iglesias, Yang, 2021, based on Bayesian tempering.
Terminates at `T=1` by default; at this time, the ensemble spread provides a (more) meaningful approximation of posterior uncertainty.
In particular, for parameters ``\\theta_j`` at step ``n``, to calculate the next timestep
-``\\Delta t_n = \\min\\left(\\max\\left(\\frac{J}{2\\Phi}, \\sqrt{\\frac{J}{2\\langle \\Phi, \\Phi \\rangle}}\\right), 1-\\sum^{n-1}_i t_i\\right) `` where ``\\Phi_j = \\|\\Gamma^{-1}(G(\\theta_j) - y)\\|^2``.
+``\\Delta t_n = \\min\\left(\\max\\left(\\frac{J}{2\\Phi}, \\sqrt{\\frac{J}{2\\langle \\Phi, \\Phi \\rangle}}\\right), 1-\\sum^{n-1}_i t_i\\right) `` where ``\\Phi_j = \\|\\Gamma^{-\\frac{1}{2}}(G(\\theta_j) - y)\\|^2``.
Cannot be overridden by user-provided timesteps.
By default termination returns `true` from `update_ensemble!` and
- if `on_terminate == "stop"`, stops further iteration.
-- if `on_terminate == "continue_fixed", continues iteration with the final timestep fixed
-- if `on_terminate == "continue", continues the algorithm (though no longer compares to ``1-\\sum^{n-1}_i t_i``)
+- if `on_terminate == "continue_fixed"`, continues iteration with the final timestep fixed
+- if `on_terminate == "continue"`, continues the algorithm (though no longer compares to ``1-\\sum^{n-1}_i t_i``)
The user may also change `T` with the `terminate_at` keyword.
$(TYPEDFIELDS)
"""
struct DataMisfitController{FT, M, S} <:
       LearningRateScheduler where {FT <: AbstractFloat, M <: AbstractMatrix, S <: AbstractString}
+    "the current iteration"
    iteration::Vector{Int}
+    "the inverse square-root of the noise covariance is stored"
    inv_sqrt_noise::Vector{M}
+    "the algorithm time for termination, default: 1.0"
    terminate_at::FT
+    "the action on termination, default: \"stop\""
    on_terminate::S
end # Iglesias, Yang 2021

@@ -302,7 +306,7 @@ function calculate_timestep!(
    push!(ekp.Δt, Δt)

    if (sum_Δt < T) && (sum_Δt + Δt >= T)
-        @info "Termination condition of timestepping scheme `DataMisfitController` has been satisfied."
+        @info "Termination condition of timestepping scheme `DataMisfitController` will be exceeded during the next iteration."
    end
    nothing
end