Sketching-based matrix computations for numpy arrays

positiveblue/randNLA

RandNLA

Introduction

Randomized matrix algorithms have been an active research topic in recent years. Recent developments have shown their utility in large-scale machine learning and statistical data analysis applications.

RandNLA is an implementation of many randomized algorithms for numerical linear algebra on top of NumPy/SciPy.

Some of these methods are being implemented in libraries like SciPy and scikit-learn. However, I could not find any widely used library dedicated to these methods, so I decided to implement them myself.

Motivation

Sketching is a way to compress a matrix while preserving its essential properties. For some problems, sketches can be used to find high-precision solutions to the original problem faster. The technique applies to least-squares and robust regression, eigenvector analysis, non-negative matrix factorization, and more.
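As a rough illustration of the idea (not RandNLA's API), the sketch-and-solve approach to least squares can be written in a few lines of NumPy. The function name `sketched_lstsq`, the dense Gaussian sketch, and the problem sizes are assumptions made for this example. Note that this simple variant only gives an approximate solution; high-precision methods typically use the sketch to build a preconditioner instead.

```python
import numpy as np

def sketched_lstsq(A, b, sketch_size, seed=0):
    """Approximate argmin_x ||Ax - b||_2 by solving a smaller sketched problem.

    Hypothetical helper for illustration; not part of RandNLA.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Dense Gaussian sketch S of shape (sketch_size, n), scaled so E[S.T @ S] = I.
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    # Solve the compressed problem min_x ||(S A) x - S b||_2.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Tall, overconstrained synthetic problem: 5000 equations, 10 unknowns.
rng = np.random.default_rng(42)
A = rng.standard_normal((5000, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.1 * rng.standard_normal(5000)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_lstsq(A, b, sketch_size=500)

# Compare residuals on the ORIGINAL problem.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print("residual ratio (sketched / exact):", res_sketch / res_exact)
```

The sketched problem has 500 rows instead of 5000, and its solution's residual on the original problem should be only slightly larger than the optimal one.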

The main idea of sketching matrices is not new. One of the best-known results behind the efficiency of random projections is the Johnson-Lindenstrauss lemma, which has a "crude" implementation in scikit-learn.
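The Johnson-Lindenstrauss lemma says, informally, that a small set of points in a high-dimensional space can be mapped to a much lower dimension while approximately preserving pairwise distances. The following is a minimal NumPy sketch of that behaviour using a Gaussian random projection; the dimensions and the helper `pairwise_sq_dists` are arbitrary choices for this example, and scikit-learn is not used here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, dim, k = 50, 2000, 500

# 50 points in 2000 dimensions.
X = rng.standard_normal((n_points, dim))

# Gaussian random projection to k dimensions, scaled to preserve
# squared norms in expectation.
P = rng.standard_normal((dim, k)) / np.sqrt(k)
Y = X @ P

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    G = Z @ Z.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2 * G

# Ratio of projected to original squared distance for every pair i < j.
iu = np.triu_indices(n_points, k=1)
ratios = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

All ratios should sit close to 1, and the spread shrinks as `k` grows.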

More recent work has been developed by Kenneth Clarkson and David Woodruff. Their paper "Low Rank Approximation and Regression in Input Sparsity Time" defines a new family of subspace embedding matrices and shows how those matrices can be used to obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and p-regression.
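To give a feel for this kind of embedding, here is a minimal NumPy sketch of a CountSketch-style sparse embedding, in which each input row is hashed to one output row with a random sign. The function name `countsketch` and the sizes are assumptions for illustration, and this is a simplified reading of the construction rather than RandNLA's implementation.

```python
import numpy as np

def countsketch(A, sketch_size, seed=0):
    """Apply a sparse embedding S (one nonzero, +1 or -1, per column) to A.

    Hypothetical helper for illustration. S is never formed explicitly,
    so the cost is proportional to the number of entries of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, sketch_size, size=n)   # target bucket for each input row
    signs = rng.choice([-1.0, 1.0], size=n)       # random sign for each input row
    SA = np.zeros((sketch_size, A.shape[1]))
    # Unbuffered accumulation: rows hashed to the same bucket are summed.
    np.add.at(SA, rows, signs[:, None] * A)
    return SA

rng = np.random.default_rng(1)
A = rng.standard_normal((20000, 5))
SA = countsketch(A, sketch_size=2000)

# A subspace embedding approximately preserves the singular values of A.
sv_A = np.linalg.svd(A, compute_uv=False)
sv_SA = np.linalg.svd(SA, compute_uv=False)
sv_ratios = sv_SA / sv_A
print("singular value ratios:", sv_ratios)
```

SciPy ships a related routine, `scipy.linalg.clarkson_woodruff_transform`, which may be a useful point of comparison.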

During my time at IBM Research Almaden, I spent the last year working on libSkylark, an XDATA open source project. The library is suitable for general statistical data analysis and optimization applications, but it is heavily focused on distributed systems. The quality of the project is high, but libSkylark is not as developer-friendly as I would like. Even with Python bindings, many people had trouble using the library.

Contributing

First off, thanks for taking the time to contribute!

Before making a pull request, take a moment to be sure your contribution makes sense to everyone else, and please read the Contributing Guide.

Issue tracker

Found a problem? Want a new feature? First, check whether your issue or idea has already been reported. If it hasn't, open a new issue with a clear, descriptive title.

License

See the file LICENSE for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES.
