
Releases: eozd/bnmf-algs

v0.2.1

29 May 14:02

bnmf-algs v0.2.1

In this release we implement a new parallel expectation-maximization algorithm and parallelize all BLD algorithms. We also add new allocation model utility functions and fix several bugs.

Highlights

  • A new expectation-maximization algorithm, online_EM, that performs matrix completion while inferring the hidden tensor S from the dataset. This EM algorithm never explicitly stores the S tensor itself, only its marginals along its 3 axes; therefore, its memory footprint is much smaller than that of the BLD algorithms, and it generally runs faster as well.
  • All BLD and EM algorithms are now parallelized using OpenMP. Sampling-based algorithms still remain sequential due to their very short running times.
  • A new allocation model utility function called total_log_marginal that computes p(X|v) = \sum_S p(X|S)p(S|v), where v are the model parameters. total_log_marginal can be used for model selection by computing the likelihood of the matrix X with respect to the given model parameters, including the model order (rank). However, this function calculates an exact sum; therefore, it has exponential time complexity and scales very badly for matrices larger than 4x4. In an upcoming release we intend to provide a Variational Bayes algorithm that performs this computation in a reasonable amount of time.
  • Various utility functions with some of them being:
    • multinomial_mode: Compute the mode of a multinomial distribution using Finucan's algorithm (a simplified sketch is given after this list).
    • all_partitions: Compute all partitions of an integer N into K bins. Consecutive partitions differ by 1 in exactly two bins.
    • partition_change_indices: Compute all increment and decrement indices to apply to an existing partition to generate a new partition.
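
As an illustration of the kind of computation multinomial_mode performs, here is a minimal greedy sketch in C++. It is a simplification under our own assumptions: the function name and signature are hypothetical, and the library's actual implementation follows Finucan's procedure, which starts from a rounded allocation and adjusts only the remainder instead of looping N times.

```cpp
#include <cstddef>
#include <vector>

// Greedy sketch of finding a multinomial mode: distribute N counts over the
// bins one at a time, always picking the bin whose likelihood gain
// p[i] / (counts[i] + 1) is largest. Each bin's log-contribution
// n*log(p[i]) - log(n!) is concave in n, so this greedy allocation maximizes
// the multinomial pmf. Hypothetical helper, not the library's API.
std::vector<long> multinomial_mode_sketch(long N, const std::vector<double>& p) {
    std::vector<long> counts(p.size(), 0);
    for (long trial = 0; trial < N; ++trial) {
        std::size_t best = 0;
        double best_ratio = -1.0;
        for (std::size_t i = 0; i < p.size(); ++i) {
            const double ratio = p[i] / static_cast<double>(counts[i] + 1);
            if (ratio > best_ratio) {
                best_ratio = ratio;
                best = i;
            }
        }
        ++counts[best];
    }
    return counts; // sums to N and maximizes the Multinomial(N, p) pmf
}
```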

v0.2

28 Apr 10:38
Pre-release

bnmf-algs v0.2

In this release we make major changes to the library layout. In addition, we provide parallel and much more efficient implementations of certain BLD algorithms and functions.

Highlights

  • OpenMP and CUDA support for certain algorithms. When possible, parallelization is performed using Eigen devices; in other cases, parallel implementations are written by hand. All CUDA kernel sources are in the cuda_src directory.
  • The time complexity of sampling a matrix without replacement is reduced from O(sum x MN) to O(sum x log(MN)) using a heap-based systematic sampling technique with beta-sampled intervals, where (M, N) is the shape of the matrix. This makes all the sampling-based BLD algorithms (seq_greedy_bld, collapsed_gibbs, collapsed_icm) much faster.
  • An approximation to the digamma function using its Taylor expansion that runs twice as fast as the GSL implementation and provides 10 digits of precision for inputs as small as 1e-5, and much higher precision for inputs greater than 1 (a sketch of this style of approximation is given after this list).
  • bnmf-algs is now header-only except for the CUDA and benchmark parts. This allows the library to be used with arbitrary floating-point types instead of enforcing double as the matrix element type, and makes using the library in other projects simpler: if not using CUDA, it is enough to add the library path to the include directories and link the project against GSL.
  • Two example projects are provided. These example projects demonstrate how to build the library with/without CUDA support.
  • A new CUDA implementation of the bld_mult algorithm, called bld_mult_cuda. All O(N^3) operations are performed on the GPU, while the faster O(N^2) updates are performed on the CPU. Additionally, CPU and GPU updates are interleaved, i.e. they update the matrices/tensors at the same time. This version provides a nearly 20% speed-up on a GeForce GT 550M, which is a very old and slow GPU.
  • A CUDA implementation of summing a tensor along its axes, cuda::tensor_sums. Since this operation is used by many BLD algorithms, performing it on the GPU may help many of the upcoming CUDA BLD implementations.
  • CUDA memory helper classes that simplify CUDA copying and memory operations. These classes and functions provide the infrastructure to implement CUDA-based algorithms more easily and will be used in the upcoming CUDA BLD implementations.
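
To illustrate the digamma item above, here is a minimal sketch of the common recurrence-plus-series approach. The cutoff, the number of series terms, and the function name are our own illustrative choices; the library's actual approximation (described as a Taylor expansion) may use different terms and constants.

```cpp
#include <cmath>

// Sketch of a fast digamma approximation: shift the argument up with the
// recurrence psi(x) = psi(x + 1) - 1/x until it is large, then evaluate the
// truncated asymptotic series
//   psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
double digamma_sketch(double x) {
    double result = 0.0;
    while (x < 6.0) {      // accumulate -1/x while shifting x upward
        result -= 1.0 / x;
        x += 1.0;
    }
    const double inv = 1.0 / x;
    const double inv2 = inv * inv;
    result += std::log(x) - 0.5 * inv
              - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0));
    return result;
}
```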

Testing

  • Since we can't use GPUs on Travis, CUDA implementations are tested only locally.
  • All the OpenMP implementations are tested on Travis.

v0.1.1

28 Mar 09:48
Pre-release

bnmf-algs v0.1.1

In this release we implement benchmarks for all NMF and BLD algorithms. There is also an improvement to bnmf_algs::bld::seq_greedy_bld that speeds it up significantly, since the previous implementation had a major performance flaw.

Highlights

  • Benchmarks using the Celero library.
    • Benchmark code is separated into modules. Adding a new benchmark is as simple as adding a new header file; the new benchmark group is automatically added to the benchmark executable when the header file is included in benchmark.cpp.
  • bnmf_algs::bld::seq_greedy_bld performance improvement
    • The previous version of bnmf_algs::bld::seq_greedy_bld allocated and deallocated a GSL rng object in each iteration. Since the number of iterations needed in seq_greedy_bld can be very high for dense matrices, this degraded performance dramatically.
    • In this version, instead of using the GSL rng to sample from a custom distribution, we implement our own inverse CDF sampling. This is done by keeping an O(n)-sized cumulative distribution array: sampling an individual element takes O(log n) time, whereas updating the cumulative distribution takes O(n) time (see the sketch after this list).
    • In later versions, this sampling mechanism will be improved even further by performing an O(n log n) preprocessing step at the beginning and then performing each sampling and update operation in O(log n) time.
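
The following is a minimal sketch of the inverse CDF sampling mechanism described above, assuming a plain vector of unnormalized weights; the struct and method names are illustrative and are not the library's actual API.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Inverse CDF sampling over unnormalized weights: build an O(n) cumulative
// sum, draw a uniform number in [0, total), and binary search for the owning
// index in O(log n). When a weight changes, the suffix of the cumulative
// array is patched in O(n).
struct CumulativeSampler {
    std::vector<double> cumsum; // cumsum[i] = w[0] + ... + w[i]

    explicit CumulativeSampler(const std::vector<double>& weights)
        : cumsum(weights.size()) {
        std::partial_sum(weights.begin(), weights.end(), cumsum.begin());
    }

    // Draw an index proportional to its weight in O(log n).
    template <typename Rng>
    std::size_t sample(Rng& rng) const {
        std::uniform_real_distribution<double> unif(0.0, cumsum.back());
        const double u = unif(rng);
        return std::upper_bound(cumsum.begin(), cumsum.end(), u) - cumsum.begin();
    }

    // Apply a change of `delta` to the weight at `idx` in O(n).
    void update(std::size_t idx, double delta) {
        for (std::size_t i = idx; i < cumsum.size(); ++i) {
            cumsum[i] += delta;
        }
    }
};
```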

v0.1.0

19 Mar 09:24
Pre-release

bnmf-algs v0.1.0

This is the first release of the bnmf-algs library with support for non-optimized versions of Nonnegative Matrix Factorization (NMF) and Best Latent Decomposition (BLD) algorithms.

Highlights

  • Modern C++ API using Eigen Matrix and Tensor classes.
  • A fully vectorized and parallelized (using Eigen and OpenMP) NMF implementation with beta divergence under the bnmf_algs::nmf namespace.
  • A memory and CPU efficient bnmf_algs::util::Generator class, modeled after Python generators.
  • Matrix Allocation Model related functions
    • bnmf_algs::allocation_model::bnmf_priors
    • bnmf_algs::allocation_model::sample_S
    • bnmf_algs::allocation_model::log_marginal_S
    • bnmf_algs::allocation_model::sample_ones
  • Best Latent Decomposition (BLD) solvers and auxiliaries
    • bnmf_algs::bld::seq_greedy_bld
    • bnmf_algs::bld::bld_mult
    • bnmf_algs::bld::bld_add
    • bnmf_algs::bld::bld_appr
    • bnmf_algs::bld::collapsed_icm
    • bnmf_algs::bld::collapsed_gibbs
    • bnmf_algs::bld::bld_fact
  • Various helper and utility functions

Documentation

The bnmf-algs library is extensively documented. Inputs, outputs, and possible exceptions thrown by bnmf-algs code are all documented, and automatic doxygen documentation is generated. Developers can refer to this documentation at bnmf-algs.readthedocs.io.

Tests

The bnmf-algs library is extensively tested. Build information can be seen on Travis and code coverage on Codecov.

Future Work

This is a list of future changes to bnmf-algs; it is far from complete:

  • Time and memory optimizations for BLD algorithms
  • Vectorization and parallelization for BLD algorithms
  • New samplers for Allocation Model and new solvers for BLD
  • More tests for BLD algorithms to ensure stricter correctness
  • Python bindings for bnmf-algs using either the Python C API or a library such as Boost.Python or pybind.
  • Many more extensions and improvements that can't be foreseen just yet