Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substitute types by abstractions. #100

Merged
merged 1 commit into from
Jan 27, 2022
Merged

Substitute types by abstractions. #100

merged 1 commit into from
Jan 27, 2022

Conversation

ilopezgp
Copy link
Contributor

@ilopezgp ilopezgp commented Jan 26, 2022

Relaxes the allowed types of some objects and input args for improved generalization:

  • Vector - > AbstractVector
  • Matrix -> AbstractMatrix
  • Sparse.SparseMatrix -> AbstractMatrix
  • Union{Vector, Matrix} -> AbstractVecOrMat

For covariance matrices in output space,

  • Matrix -> Union{AbstractMatrix, UniformScaling}

Tests involving different matrix types are now included for all Ensemble Kalman Processes.

@ilopezgp ilopezgp linked an issue Jan 26, 2022 that may be closed by this pull request
@@ -16,9 +16,9 @@ Container to store data samples as columns in an array.
"""
struct DataContainer{FT <: Real}
#stored data, each piece of data is a column [data dimension × number samples]
stored_data::Array{FT, 2}
stored_data::AbstractMatrix{FT}
Copy link
Member

@charleskawczynski charleskawczynski Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jakebolewski jakebolewski Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say most of infa here isn't performance critical so dynamic dispatch on a field is fine if all the computation is in the following LA operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A concern that can be limiting for the package is unnecessary memory use. In some cases, we want to store covariances that are extremely sparse, or even diagonal. Casting them and storing them as a Matrix{FT} then takes up much more space than e.g. Diagonal {FT}. Whichever solution solves this problem would work. Let me know how to proceed.

Copy link
Contributor

@jakebolewski jakebolewski Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write it as generically as possible and then if there is some performance hotspots after a bit of profiling they should be straightforward to address. You don't need to specialize on every Matrix type here (as you said there are many) so IMO what is written is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting to specialize on every type of matrix, just to make the type (and its properties) concrete. This was @glwagner's suggestion, too. It's not much overhead to have very concrete types and simply not specialize on methods that use it.

Copy link
Contributor

@jakebolewski jakebolewski Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then you cannot mix and match representations (different matrix types might be required for different fields). Until performance is actually an issue I wouldn't worry about it and then you can go back and parameterize over type constraints for specialization when those constraints are better understood.

Copy link
Contributor

@jakebolewski jakebolewski Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for calibration the costs are going to be dominated by actually running the models and IO, everything else I'm guessing won't matter much (~python speed is fine) but we'll see as we go

Copy link
Member

@glwagner glwagner Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A concern that can be limiting for the package is unnecessary memory use. In some cases, we want to store covariances that are extremely sparse, or even diagonal. Casting them and storing them as a Matrix{FT} then takes up much more space than e.g. Diagonal {FT}. Whichever solution solves this problem would work. Let me know how to proceed.

@ilopezgp the point that @charleskawczynski was making is the difference between

struct DataContainer{FT <: Real}
    #stored data, each piece of data is a column [data dimension × number samples]
    stored_data::AbstractMatrix{FT}

and

struct DataContainer{M <: AbstractMatrix}
    #stored data, each piece of data is a column [data dimension × number samples]
    stored_data::M

In the first case, stored_data is abstractly typed (versus concretely typed), which means that the compiler cannot infer the type of stored_data given the type DataContainer. This has performance penalties (which @jakebolewski argues are not important). But it's utterly fundamental and so an important point to grasp.

Writing M <: AbstractMatrix will preclude UniformScaling. Omitting <: AbstractMatrix is the conservative choice. You might also write M <: Union{AbstractMatrix, UniformScaling}. However this is risky because you may be failing to anticipate other valid matrix-like types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the difference between abstract types and parameterized types, thank you. I think the conversation was about whether it practically matters in this application.

@ilopezgp ilopezgp changed the title [WIP] Substitute types by abstractions. Substitute types by abstractions. Jan 26, 2022
src/ParameterDistributions.jl Outdated Show resolved Hide resolved
src/ParameterDistributions.jl Show resolved Hide resolved
src/UnscentedKalmanInversion.jl Outdated Show resolved Hide resolved
src/UnscentedKalmanInversion.jl Outdated Show resolved Hide resolved
src/UnscentedKalmanInversion.jl Outdated Show resolved Hide resolved
@odunbar
Copy link
Collaborator

odunbar commented Jan 27, 2022

Could I ask that you add the specific changes to the PR comment as a reference for the conventions we take? i see currently

  • Vector - > AbstractVector
  • Matrix -> AbstractMatrix
  • Sparse.SparseMatrix -> AbstractMatrix
  • Union{Vector, Matrix} -> AbstractVecOrMat
  • Vector{Vector} -> Iterable{Vector}

And on from Greg's point, we currently will exclude UniformScaling. (is this just for aesthetic?)

@ilopezgp
Copy link
Contributor Author

Could I ask that you add the specific changes to the PR comment as a reference for the conventions we take? i see currently

  • Vector - > AbstractVector
  • Matrix -> AbstractMatrix
  • Sparse.SparseMatrix -> AbstractMatrix
  • Union{Vector, Matrix} -> AbstractVecOrMat
  • Vector{Vector} -> Iterable{Vector}

And on from Greg's point, we currently will exclude UniformScaling. (is this just for aesthetic?)

UniformScaling was in fact a specific request in issue #99, so we should include it.

Copy link
Collaborator

@odunbar odunbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good step towards greater flexibility! LGTM and addresses #99

@ilopezgp
Copy link
Contributor Author

Closes #99 (the specific problems initially raised).

@ilopezgp
Copy link
Contributor Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jan 27, 2022

Build succeeded:

@bors bors bot merged commit db54e9f into main Jan 27, 2022
@bors bors bot deleted the abstract_types_1 branch January 27, 2022 20:10
bors bot added a commit to CliMA/CalibrateEmulateSample.jl that referenced this pull request Apr 29, 2022
132: [WIP] Substitute types with abstractions r=tsj5 a=tsj5

This PR implements the abstract typing done in CliMA/EnsembleKalmanProcesses.jl#100, e.g. `Array{FT, 2}` → `AbstractMatrix{FT}`, in order to be consistent with that dependency.

See the discussion concerning performance at that PR; use of abstract types is [recommended against](https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-fields-with-abstract-containers) for perf reasons, but the rationale here is that the code is essentially "glue" rather than numerical routines appearing in hot loops, so writing for generality over perf is justified. 
Another downside is that existing abstract types aren't "abstract" enough, e.g. `UniformScaling` is not a subtype of `AbstractMatrix` and must be handled separately.

As a benefit, the changes made here result in some method signatures being more strongly typed than they are in `master`, allowing us to replace repeated code with multiple dispatch ("Don't Repeat Yourself"). 

`MCMC` is changed to a mutable struct, instead of continuing the current code's practice of enabling mutability by making fields of the struct 1x1 Arrays instead of scalars. This change is a moot point, however, since it will be overridden by PR #130.

Co-authored-by: Thomas Jackson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Covariance matrix obs_noise_cov is restricted to Array{FT, 2}
5 participants