Small docs updates #315

Merged 1 commit on Jul 19, 2023
2 changes: 1 addition & 1 deletion docs/src/API/EnsembleKalmanProcess.md
@@ -32,7 +32,7 @@ sample_empirical_gaussian
split_indices_by_success
```

-## Learning Rate Schedulers
+## [Learning Rate Schedulers](@id scheduler_api)

```@docs
DefaultScheduler
55 changes: 43 additions & 12 deletions docs/src/learning_rate_scheduler.md
@@ -7,12 +7,48 @@ We demonstrate the behaviour of different learning rate schedulers through solut
In this example we have a model that produces the exponential of a sinusoid ``f(A, v) = \exp(A \sin(t) + v), \forall t \in [0,2\pi]``. Given an initial guess of the parameters as ``A^* \sim \mathcal{N}(2,1)`` and ``v^* \sim \mathcal{N}(0,25)``, the inverse problem is to estimate the parameters from a noisy observation of only the maximum and mean value of the true model output.

We shall compare the following configurations of implemented schedulers.
-1. Fixed, long timestep `DefaultScheduler(0.5)` - orange
-2. Fixed, short timestep `DefaultScheduler(0.02)` - green
+1. Fixed, "long" timestep `DefaultScheduler(0.5)` - orange
+2. Fixed, "short" timestep `DefaultScheduler(0.02)` - green
3. Adaptive timestep (designed originally to ensure EKS remains stable) `EKSStableScheduler()` [Kovachki & Stuart 2018](https://doi.org/10.1088/1361-6420/ab1c3a) - red
4. Misfit controlling timestep (Terminating) `DataMisfitController()` [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) - purple
5. Misfit controlling timestep (Continuing beyond Terminate condition) `DataMisfitController(on_terminate="continue")` - brown

One can define the schedulers as
```julia
scheduler = DefaultScheduler(0.5) # fixed stepsize of 0.5 (constructor default: 1)
```
Then when constructing an EnsembleKalmanProcess, one uses the keyword argument
```julia
ekpobj = EKP.EnsembleKalmanProcess(args...; scheduler = scheduler, kwargs...)
```
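
For concreteness, here is a minimal sketch of a full construction (hypothetical setup: `prior`, `y`, and `Γ` are assumed to be a previously defined prior, observation vector, and observational noise covariance, and the ensemble size of 50 is illustrative):
```julia
using EnsembleKalmanProcesses
const EKP = EnsembleKalmanProcesses

# draw 50 ensemble members from the prior, then build an EKI object
# that uses the fixed "long" timestep from above
initial_ensemble = EKP.construct_initial_ensemble(prior, 50)
ekpobj = EKP.EnsembleKalmanProcess(initial_ensemble, y, Γ, Inversion(); scheduler = DefaultScheduler(0.5))
```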
A variety of other schedulers can be defined similarly:
```julia
scheduler = MutableScheduler(2) # modifiable stepsize
scheduler = EKSStableScheduler(numerator=10.0, nugget = 0.01) # Stable for EKS
scheduler = DataMisfitController(on_terminate = "continue") # non-terminating
```
Please see the [learning rate schedulers API](@ref scheduler_api) for defaults and other details.

### Early termination

Early termination can be implemented in the calibration loop as
```julia
using EnsembleKalmanProcesses # for get_ϕ_final, update_ensemble!
# given
# * the number of iterations `N_iter`
# * a prior `prior`
# * a forward map `G`
# * the EKP object `ekpobj`

for i in 1:N_iter
    params_i = get_ϕ_final(prior, ekpobj)
    g_ens = G(params_i)
    terminated = update_ensemble!(ekpobj, g_ens) # check for termination
    if !isnothing(terminated) # if termination is flagged, break the loop
        break
    end
end
```

## Timestep and termination time

@@ -30,6 +66,10 @@ There are two termination times that the theory indicates are useful
- ``T=1``: In the linear Gaussian case, the ``\{\theta_{N_\mathrm{it}}\}`` will represent the posterior distribution. In the nonlinear case it should still provide an approximation to the posterior distribution. Note that, as the posterior does not necessarily optimize the data-misfit, we find ``\bar{\theta}_{N_\mathrm{it}}`` (the ensemble mean) provides a conservative estimate of the true parameters, while retaining spread. It is noted in [Iglesias & Yang 2021](https://doi.org/10.1088/1361-6420/abd29b) that with small enough (or well-chosen) step-sizes this estimate at ``T=1`` satisfies a discrepancy principle with respect to the observational noise.
- ``T\to \infty``: Though theoretical concerns have been raised about continuation beyond ``T=1`` for inversion methods such as EKI, in practice we commonly see better optimization of the data-misfit, and thus a better representation ``\bar{\theta}_{N_\mathrm{it}}`` of the true parameters. As expected, this procedure leads to ensemble collapse, so no meaningful information can be taken from the posterior spread, and the optimizer is not likely to be the posterior mode.
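
To see which regime a given run has reached, one can read off the accumulated algorithm time from the EKP object (a sketch, assuming the per-iteration steps are stored in the field `ekpobj.Δt`, as in the `calculate_timestep!` source shown at the end of this diff):
```julia
# accumulated algorithm time T, i.e. the sum of all timesteps taken so far;
# near T = 1 the ensemble spread approximates posterior uncertainty, while for
# T ≫ 1 one should expect ensemble collapse around a misfit-optimizing mean
T_current = sum(ekpobj.Δt)
@info "Current algorithm time" T_current
```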

## The experiment with EKI & UKI

We assess the schedulers by solving the inverse problem with EKI and UKI (we average results over 100 initial ensembles in the case of EKI). We will not draw comparisons between EKI and UKI here; rather, we use them to observe consistent behavior in the schedulers. Shown below are the solution plots of one solve with each timestepper, for both methods.
@@ -59,16 +99,7 @@ Optimizing the objective function (continuing ``T \to \infty``):

### DMC as a default in future?

-This experiment motivates the possibility of making DMC (with/without) continuation a default timestepper in future releases, for EKI/SEKI/UKI.
-
-Currently we will retain constant timestepping while we investigate further, though the DMC is available for use.
-
-Please let us know how you get on by setting the keyword argument in EKP
-```julia
-scheduler = DataMisfitController() # terminating at `T=1`
-scheduler = DataMisfitController(terminate_at = 10) # terminating at `T=10`
-scheduler = DataMisfitController(on_terminate = "continue") # non-terminating
-```
+This experiment motivates the possibility of making DMC (with or without continuation) a default timestepper in future releases, for EKI/SEKI/UKI. Currently we will retain constant timestepping as the default while we investigate further.

!!! warning "Ensemble Kalman Sampler"
    We observe blow-up in EKS when not using the `EKSStableScheduler`.
15 changes: 8 additions & 7 deletions docs/src/parameter_distributions.md
@@ -117,25 +117,26 @@ In [Simple example revisited](@ref) below, we repeat this example "manually" wit

The recommended constructor can be called as `constrained_gaussian(...; repeats = n)` to return a combined prior formed by `n` identical priors.
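
For example (a sketch; the name and numbers are illustrative), three identical parameters, each constrained to ``(0, 1)`` with approximate mean 0.5 and standard deviation 0.2, could be built as:
```julia
using EnsembleKalmanProcesses.ParameterDistributions

# three i.i.d. priors over (0, 1), combined into a single ParameterDistribution
prior = constrained_gaussian("coefficient", 0.5, 0.2, 0.0, 1.0; repeats = 3)
```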

-## ParameterDistribution class
+## ParameterDistribution struct

This section provides more details on the components of a `ParameterDistribution` object.

### ParameterDistributionType

-The `ParameterDistributionType` class comprises three subclasses for specifying different types of prior distributions:
+The `ParameterDistributionType` struct wraps four types for specifying different kinds of prior distribution (a construction sketch follows the list):

- The `Parameterized` type is initialized using a Julia `Distributions.jl` object. Samples are drawn randomly from the distribution object.
- The `VectorOfParameterized` type is initialized with a vector of distributions.
- The `Samples` type is initialized using a two dimensional array. Samples are drawn randomly (with replacement) from the columns of the provided array.
+- The `FunctionParameterDistributionType` struct defines parameters specified as fields over a domain. More detail can be found [here](@ref function-parameter-type).
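
A construction sketch for the first three types (the distributions and array sizes are illustrative):
```julia
using Distributions
using EnsembleKalmanProcesses.ParameterDistributions

d1 = Parameterized(Normal(0, 1))                        # wraps a Distributions.jl object
d2 = VectorOfParameterized([Normal(0, 1), Gamma(2, 1)]) # one distribution per dimension
d3 = Samples(randn(2, 100))                             # columns are 100 samples of a 2D parameter
```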

!!! warning
We recommend that the distributions be unbounded (see next section), as the filtering algorithms in EnsembleKalmanProcesses are not guaranteed to preserve constraints unless defined through the `ConstraintType` mechanism.

### ConstraintType

-The inference algorithms implemented in EnsembleKalmanProcesses assume unbounded parameter domains. To be able to handle constrained parameter values consistently, the ConstraintType class defines a bijection between the physical, constrained parameter domain and an unphysical, unconstrained domain in which the filtering takes place. This bijection is specified by the functions `transform_constrained_to_unconstrained` and `transform_unconstrained_to_constrained`, which are built from either predefined constructors or user-defined constraint functions given as arguments to the `ConstraintType` constructor.
+The inference algorithms implemented in EnsembleKalmanProcesses assume unbounded parameter domains. To be able to handle constrained parameter values consistently, the `ConstraintType` defines a bijection between the physical, constrained parameter domain and an unphysical, unconstrained domain in which the filtering takes place. This bijection is specified by the functions `transform_constrained_to_unconstrained` and `transform_unconstrained_to_constrained`, which are built from either predefined constructors or user-defined constraint functions given as arguments to the `ConstraintType` constructor.
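
As an illustration (a sketch, assuming the `Constraint` object exposes its two maps as the fields shown; `bounded` is one of the predefined constructors listed below):
```julia
using EnsembleKalmanProcesses.ParameterDistributions

c = bounded(0.0, 1.0) # bijection between the physical domain (0, 1) and the real line

u = c.constrained_to_unconstrained(0.25) # physical value -> unconstrained value
x = c.unconstrained_to_constrained(u)    # inverse map; x ≈ 0.25
```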

We provide the following predefined constructors which implement mappings that handle the most common constraints:

@@ -160,15 +161,15 @@ Currently we only support multivariate constraints which are the Cartesian product
This is simply a `String` used to identify different parameters in multi-parameter situations, as in the methods below.


-### FunctionParameterDistributionType
-
-A subtype of `ParameterDistributionType`, we currently support one subclass for specifying prior distributions over functions.
+### [FunctionParameterDistributionType](@id function-parameter-type)

Learning a function distribution is useful when one wishes to obtain a parametric representation of a function that is (relatively) agnostic of the underlying grid discretization. Most practical implementations involve posing a restrictive class of functions by truncation of a spectral decomposition. The function is then represented as a set of coefficients of these modes (known as degrees of freedom), rather than directly through the values at evaluation points.

+As a subtype of `ParameterDistributionType`, we currently support one option for specifying prior distributions over functions:

- The `GaussianRandomFieldInterface` type is initialized with a Gaussian Random Field object and the GRF package. Currently we support objects from [`GaussianRandomFields.jl`](https://github.com/PieterjanRobbe/GaussianRandomFields.jl) with package `GRFJL()`. Gaussian random fields allow the definition of scalar function distributions defined over a uniform mesh on interval, rectangular, and hyper-rectangular domains.

-When constructing a function distribution, one can also enforce a constraint on the output space of the function using a `Constraint()`.
+As with other `ParameterDistribution`s, a function distribution is built from a name, a `FunctionParameterDistributionType` struct, and a single constraint, placed on the scalar output space of the function using a `Constraint()`.
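
A minimal sketch of such a construction (the mesh, Matern parameters, 30-mode truncation, and names are illustrative, following the `GaussianRandomFields.jl` interface):
```julia
using GaussianRandomFields # provides the underlying random field object
using EnsembleKalmanProcesses.ParameterDistributions
const GRF = GaussianRandomFields

# a 1D Matern random field on [0, 1], truncated to 30 Karhunen-Loeve modes,
# i.e. 30 degrees of freedom
grfjl_obj = GRF.GaussianRandomField(
    GRF.CovarianceFunction(1, GRF.Matern(0.05, 2)),
    GRF.KarhunenLoeve(30),
    collect(0:0.01:1),
)
grf_dist = GaussianRandomFieldInterface(grfjl_obj, GRFJL())

# name + distribution + a single constraint on the scalar output space
pd = ParameterDistribution(
    Dict("distribution" => grf_dist, "name" => "positive_function", "constraint" => bounded_below(0)),
)
```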

!!! note "constraints"
    The transformation `transform_unconstrained_to_constrained` will map from (unconstrained) degrees of freedom to (constrained) evaluations of the function on a numerical grid. In particular, the `transform_constrained_to_unconstrained` is *no longer the inverse* of this map; it now simply maps from constrained evaluations to unconstrained evaluations on the grid.
18 changes: 11 additions & 7 deletions src/LearningRateSchedulers.jl
@@ -56,15 +56,15 @@ end
$(TYPEDEF)
Scheduler known to be stable for EKS.
-In particular, ``\\Delta t = \\frac{\\alpha}{\\|U\\| + \\varepsilon}`` where ``U = (G(u) - \\bar{G(u)}])^T\\Gamma^{-1}(G(u) - y)``.
+In particular, ``\\Delta t = \\frac{\\alpha}{\\|U\\| + \\varepsilon}`` where ``U = (G(u) - \\bar{G(u)})^T\\Gamma^{-1}(G(u) - y)``.
Cannot be overridden.
$(TYPEDFIELDS)
"""
struct EKSStableScheduler{FT} <: LearningRateScheduler where {FT <: AbstractFloat}
-    "the numerator ``\alpha``"
+    "the numerator ``\\alpha``"
    numerator::FT
-    "the nugget term ``\varepsilon``"
+    "the nugget term ``\\varepsilon``"
    nugget::FT
end

@@ -91,21 +91,25 @@ $(TYPEDEF)
Scheduler from Iglesias, Yang, 2021, based on Bayesian tempering.
Terminates at `T=1` by default; at this time, the ensemble spread provides a (more) meaningful approximation of posterior uncertainty.
In particular, for parameters ``\\theta_j`` at step ``n``, to calculate the next timestep
-``\\Delta t_n = \\min\\left(\\max\\left(\\frac{J}{2\\Phi}, \\sqrt{\\frac{J}{2\\langle \\Phi, \\Phi \\rangle}}\\right), 1-\\sum^{n-1}_i t_i\\right) `` where ``\\Phi_j = \\|\\Gamma^{-1}(G(\\theta_j) - y)\\|^2``.
+``\\Delta t_n = \\min\\left(\\max\\left(\\frac{J}{2\\Phi}, \\sqrt{\\frac{J}{2\\langle \\Phi, \\Phi \\rangle}}\\right), 1-\\sum^{n-1}_i t_i\\right) `` where ``\\Phi_j = \\|\\Gamma^{-\\frac{1}{2}}(G(\\theta_j) - y)\\|^2``.
Cannot be overridden by user-provided timesteps.
By default termination returns `true` from `update_ensemble!` and
- if `on_terminate == "stop"`, stops further iteration.
-- if `on_terminate == "continue_fixed", continues iteration with the final timestep fixed
-- if `on_terminate == "continue", continues the algorithm (though no longer compares to ``1-\\sum^{n-1}_i t_i``)
+- if `on_terminate == "continue_fixed"`, continues iteration with the final timestep fixed
+- if `on_terminate == "continue"`, continues the algorithm (though no longer compares to ``1-\\sum^{n-1}_i t_i``)
The user may also change `T` with the `terminate_at` keyword.
$(TYPEDFIELDS)
"""
struct DataMisfitController{FT, M, S} <:
       LearningRateScheduler where {FT <: AbstractFloat, M <: AbstractMatrix, S <: AbstractString}
+    "the current iteration"
    iteration::Vector{Int}
+    "the inverse square-root of the noise covariance is stored"
    inv_sqrt_noise::Vector{M}
+    "the algorithm time for termination, default: 1.0"
    terminate_at::FT
+    "the action on termination, default: \"stop\""
    on_terminate::S
end # Iglesias, Yang 2021

@@ -302,7 +306,7 @@ function calculate_timestep!(
    push!(ekp.Δt, Δt)

    if (sum_Δt < T) && (sum_Δt + Δt >= T)
-        @info "Termination condition of timestepping scheme `DataMisfitController` has been satisfied."
+        @info "Termination condition of timestepping scheme `DataMisfitController` will be exceeded during the next iteration."
    end
    nothing
end