Infrastructure required for SMC #3

Open
25 tasks
alcrene opened this issue May 8, 2020 · 0 comments

alcrene commented May 8, 2020

This issue documents the work plan for making sinn usable in SMC (Sequential Monte Carlo) applications. Although the project requiring SMC is currently on ice (and thus also the work on these features), we keep this list for future reference. All points below address at least one of the following:

  • Build more efficient computational graphs. For SMC on complex models this is likely not just “nice to have”, but necessary.
  • Standardize the constructs to reduce time spent writing interfacing code.
  • Simplify the constructs to reduce time spent relearning how to instantiate them.
  • Standardize conventions before I incur more technical debt.

DynArray updates

  • sampling_shape: Number of samples. Might as well allow for multiple sampling dimensions (same effort). Can be used for multiple batches, particle filter. Use empty tuple if there are no sampling dimensions.
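
To make the sampling_shape convention concrete, here is a minimal sketch in plain Python. The helper name `full_shape` and the choice to place sampling dimensions before the time dimension are assumptions made for illustration; the issue does not specify the layout.

```python
def full_shape(time_steps, shape, sampling_shape=()):
    """Hypothetical helper: storage shape of a DynArray-like object.

    Assumed layout (not specified in the issue):
    (*sampling_shape, time_steps, *shape).
    """
    return tuple(sampling_shape) + (time_steps,) + tuple(shape)

# An empty tuple means "no sampling dimensions".
plain = full_shape(100, (3,))
# One sampling dimension, e.g. 50 particles for a particle filter.
particles = full_shape(100, (3,), sampling_shape=(50,))
# Two sampling dimensions, e.g. 4 batches of 50 particles each.
batched = full_shape(100, (3,), sampling_shape=(4, 50))
```

Treating the sample count as a tuple rather than an integer means the single-sample, multi-particle and multi-batch cases all go through the same code path.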

Make History inherit from DynArray

  • History should no longer define the following functionality itself, but inherit it from DynArray:
    • update_function() -> compute()
    • Indexing functions (and have them use DiscreteAxis)
    • locking function and attributes (incl. observed)
    • access ([]) and evaluation (())
    • compute() (renamed from _compute_up_to())
      This method is subclass-specific and takes care of caching.
      Different data structures will use the cache differently. For example, a Series history stores the result of every calculation, while a LagSeries stores only the last few values.
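
The split between a shared base class and a subclass-specific compute() might look roughly like the sketch below. All class names ending in `Sketch`, the `cache` attributes and the `n` parameter are hypothetical; this is not sinn's actual API, only an illustration of two caching policies behind one interface.

```python
from collections import deque

class DynArraySketch:
    """Hypothetical stand-in for DynArray: owns the update function."""
    def __init__(self, update_function):
        self.update_function = update_function

    def compute(self, i):
        # The caching policy is left to subclasses.
        raise NotImplementedError

class SeriesSketch(DynArraySketch):
    """Stores the result of every calculation."""
    def __init__(self, update_function):
        super().__init__(update_function)
        self.cache = {}

    def compute(self, i):
        if i not in self.cache:
            self.cache[i] = self.update_function(i)
        return self.cache[i]

class LagSeriesSketch(DynArraySketch):
    """Stores only the last n computed steps."""
    def __init__(self, update_function, n):
        super().__init__(update_function)
        self.cache = deque(maxlen=n)  # (index, value) pairs, oldest dropped first

    def compute(self, i):
        for j, value in self.cache:
            if j == i:
                return value
        value = self.update_function(i)
        self.cache.append((i, value))
        return value

def square(i):
    return i * i

series = SeriesSketch(square)
lagged = LagSeriesSketch(square, n=2)
for i in range(5):
    series.compute(i)
    lagged.compute(i)

kept_by_series = sorted(series.cache)
kept_by_lagged = [j for j, _ in lagged.cache]
```

After five steps the Series-like object holds all five indices, while the lagged one holds only the two most recent, which is the memory saving the plan is after.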

History updates

  • Implement new types
    These types should fix, or at least greatly alleviate, my memory issues. They should also translate into a less convoluted Theano graph, which is good for a) execution & compilation time and b) putting code on GPU.
    • LagFreeSeries: Stores only the most recently computed time.
      • Test: Caching works with single times, slices and arrays of times.
    • LagSeries: Stores only the last n time steps.

      Returns LagFreeSeries if lag is 0.
      • Test: Evaluating in past within lag does not change cache.
      • Test: Evaluating in past beyond lag returns Sinn.Expired (for now – easier to implement than recomputing and updating cache)
      • Test: Evaluating at future $t_i$ moves cache ahead so that values from $t_i - \text{lag}$ to $t_i$ are available.
    • VirtualHistory: Does not store anything
      • Basically just a wrapper around a function
      • Can be safely shared between samples/threads since nothing is cached.
    • ProbabilisticHistoryMixin:
      • Identifies history as probabilistic (i.e. PyMC3 distribution)
      • Sets history type to None
        • Disables cast to history's dtype in update. Replaces with an optional check when config.debug = True (Set this at instantiation – don't check value at every call to update)
        • stores original dtype as sampling_dtype
      • Adds make_sampling_history method:
        • Takes a sample_size or particles argument and adds a sampling dimension to the new history.
        • Wraps the compute method with a function calling sample on the result.
        • Sets history type to sampling_dtype
    • Add observed keyword argument to initializer
      • Sets the data
      • Locks the data
    • Use the NotComputed, NoLongerComputed & Expired return types appropriately.
      • Access ([]) may return
        • cached value
        • Sinn.NotComputed if value has not been computed
        • Sinn.NoLongerComputed(Sinn.NotComputed) if it must be recomputed
        • Sinn.Expired if it was computed and cannot be recomputed (for example, if it depends on a RNG.)
      • Evaluation (())
        • Returns: value or sinn.Expired.
        • Implemented as
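          def __call__(self, t):
            r = self[t]
            # r is either a cached value or a sentinel class, so guard
            # issubclass(), which raises TypeError on non-class arguments
            if isinstance(r, type) and issubclass(r, NotComputed):
              r = self.compute(t)
            return r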
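
The access and evaluation rules above can be exercised with a small self-contained sketch. It assumes the sentinels are returned as classes (as the issubclass test suggests) and that Expired is not a subclass of NotComputed, so that evaluation passes it through unchanged. `LagFreeSketch` and its `recomputable` flag are hypothetical names, not sinn's API.

```python
class NotComputed:
    """Sentinel: the value has never been computed."""

class NoLongerComputed(NotComputed):
    """Sentinel: the value was dropped from the cache but can be recomputed."""

class Expired:
    """Sentinel: the value was dropped and cannot be recomputed."""

class LagFreeSketch:
    """Hypothetical history that keeps only the most recently computed time."""
    def __init__(self, update_function, recomputable=True):
        self.update_function = update_function
        self.recomputable = recomputable  # False e.g. when an RNG is involved
        self.last = None                  # (t, value) of the cached entry
        self.latest_t = None              # furthest time computed so far

    def __getitem__(self, t):
        if self.last is not None and self.last[0] == t:
            return self.last[1]
        if self.latest_t is None or t > self.latest_t:
            return NotComputed
        return NoLongerComputed if self.recomputable else Expired

    def compute(self, t):
        value = self.update_function(t)
        self.last = (t, value)
        self.latest_t = t if self.latest_t is None else max(self.latest_t, t)
        return value

    def __call__(self, t):
        r = self[t]
        if isinstance(r, type) and issubclass(r, NotComputed):
            r = self.compute(t)
        return r

def double(t):
    return 2 * t

h = LagFreeSketch(double)
before = h[0]           # nothing computed yet
first = h(0)            # evaluation computes and caches t = 0
second = h(1)           # cache now holds only t = 1
dropped = h[0]          # access reports that t = 0 must be recomputed
again = h(0)            # evaluation recomputes it

rng_like = LagFreeSketch(double, recomputable=False)
rng_like(0)
rng_like(1)
expired = rng_like(0)   # evaluation passes Expired through
```

Keeping NoLongerComputed as a subclass of NotComputed lets the evaluation path handle both with one check, while Expired stays outside that hierarchy and surfaces to the caller.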

Model updates

  • Allow any history in a model to be passed as a parameter, or created with a default type
    This will be used to create the “particle” and “probability” models we will need.
    • Create HistorySpec to define at class level the history names, shapes, and allowed/default types.
    • Default created histories use model's time axis.
    • Add make_sampling_model which creates a new model where some probabilistic histories are replaced by samplers.
      Args:
      • sampled_histories: histories it's conditioned on: these probabilistic histories are replaced by samplers

        Other probabilistic histories are left probabilistic

        Passing a non-probabilistic history has no effect

        Throw error if a given history is locked

        The special value all turns all probabilistic histories into samplers
      • conditioned_on: synonym for sampled_histories
      • sample_size or particles
    • Locked / observed histories are shared between models
    • VirtualHistory instances are shared between models
    • Other histories are recreated
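
The sharing and replacement rules for make_sampling_model can be sketched as follows. The `Hist` dataclass, its flags and the dict-based "model" are hypothetical simplifications. Two behaviours here are also assumptions the issue leaves open: "all" skips locked histories, and a locked history raises even if it is not probabilistic.

```python
from dataclasses import dataclass, replace

@dataclass
class Hist:
    """Hypothetical minimal history: only the flags the rules depend on."""
    name: str
    probabilistic: bool = False
    locked: bool = False
    virtual: bool = False
    sampler: bool = False
    sample_size: int = 0

def make_sampling_model(histories, sampled_histories, sample_size):
    """Return a new name -> Hist mapping with selected histories as samplers."""
    if sampled_histories == "all":
        # Assumption: "all" silently skips locked histories.
        sampled_histories = [n for n, h in histories.items()
                             if h.probabilistic and not h.locked]
    for name in sampled_histories:
        if histories[name].locked:
            raise ValueError(f"history '{name}' is locked")
    new = {}
    for name, h in histories.items():
        if h.locked or h.virtual:
            new[name] = h  # shared between models
        elif h.probabilistic and name in sampled_histories:
            new[name] = replace(h, probabilistic=False, sampler=True,
                                sample_size=sample_size)
        else:
            new[name] = replace(h)  # recreated as an independent copy
    return new

hists = {
    "I": Hist("I", locked=True),          # observed input
    "u": Hist("u", probabilistic=True),
    "v": Hist("v", probabilistic=True),
    "f": Hist("f", virtual=True),
    "a": Hist("a"),
}
# Passing the non-probabilistic "a" has no effect beyond the usual copy.
model = make_sampling_model(hists, ["u", "a"], sample_size=10)

try:
    make_sampling_model(hists, ["I"], sample_size=10)
except ValueError as err:
    locked_error = str(err)
```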
@alcrene alcrene added this to the v0.3 milestone May 8, 2020