Linear algebra library #138

Open
aalexandrov opened this issue Dec 22, 2015 · 9 comments

@aalexandrov
Contributor

The examples folder has several algorithms that mix the DataBag abstraction with linear algebra. Currently, we use Breeze, but we might switch to something else if we agree that it is better.

Let's summarize the pros and cons of the various options here:

Breeze

  • 👍 Delegates to native libraries (e.g. BLAS, LAPACK)
  • 👍 Seems to be quite popular already.
  • 👎 The type design is a bit clumsy.

Spire

  • 👍 Proper type design inspired by algebra systems.
  • 👎 Executes everything in Scala.

netlib-java (used by Breeze)

  • 👍 Seems to be gaining a lot of traction.
  • 👎 Java-based.

@joroKr21
Member

The netlib-java home page says that Breeze is built on top of it. Basically, Breeze is a higher-level Scala wrapper around it, so I would vote against using netlib-java directly.

Then we have on the one hand Breeze, which is used in MLlib for Spark, and on the other hand Spire, which is somewhat similar to Twitter's Algebird, which they say can be used on top of Scalding or Storm.

The question we need to answer is whether to translate the linear algebra API completely into DataBag comprehensions (slower, but every element type can be supported) or to chunk the matrices and vectors into blocks that are forwarded locally to native libraries (faster, but only numeric types can be supported). Ideally, we would handle (products of) numerics natively and fall back to the JVM for more complex data (with some warnings, of course).
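
To make the second option concrete, here is a rough sketch of a row-blocked matrix-vector product in plain Scala. Everything in it is hypothetical: `BlockedMatVec`, `RowBlock`, `chunk`, and `localMultiply` are illustrative names, not Emma or Breeze API, and the loop in `localMultiply` only stands in for the native call (for example a BLAS `gemv`) that a real implementation would make.

```scala
// Hypothetical sketch: none of these names come from Emma, Breeze, or netlib-java.
object BlockedMatVec {
  // A block of consecutive rows; `offset` is the index of its first row in the full matrix.
  final case class RowBlock(offset: Int, rows: Array[Array[Double]])

  // Split a dense row-major matrix into blocks of at most `blockSize` rows.
  def chunk(matrix: Array[Array[Double]], blockSize: Int): Vector[RowBlock] =
    matrix.grouped(blockSize).zipWithIndex.map {
      case (rows, i) => RowBlock(i * blockSize, rows)
    }.toVector

  // Local kernel: a plain loop standing in for a call into a native library.
  def localMultiply(block: RowBlock, x: Array[Double]): Array[Double] =
    block.rows.map { row =>
      var acc = 0.0
      var j = 0
      while (j < row.length) { acc += row(j) * x(j); j += 1 }
      acc
    }

  // Multiply block by block, then stitch the partial results together in row order.
  def multiply(matrix: Array[Array[Double]], x: Array[Double], blockSize: Int): Array[Double] =
    chunk(matrix, blockSize)
      .sortBy(_.offset)
      .flatMap(block => localMultiply(block, x).toSeq)
      .toArray
}
```

In a distributed setting the blocks would presumably be the elements of a DataBag and only `localMultiply` would run natively; that mapping is an assumption of this sketch, not something decided in this thread.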

@fschueler
Contributor

Yes, netlib-java is very low-level and is used by Breeze.

I think numerics cover a big part of all use cases, and I would vote for speed in their case, ideally even for local execution (through Breeze).

Nonetheless, I like the Scalding/Algebird approach very much. Allowing linear algebra operations on, for example, vectors of Bloom filters sounds really cool.
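
As a rough illustration of that idea, the sketch below defines vector addition for any element type that comes with a monoid. It is plain Scala and does not use Algebird's actual API: `GenericVectors`, `Monoid`, `add`, and `sum` are made-up names, and `Set` under union is only a stand-in for a Bloom filter (which is also merged by union).

```scala
// Hypothetical sketch: a hand-rolled type class, not Algebird's or Spire's API.
object GenericVectors {
  // Minimal monoid: an associative `combine` with an identity element `empty`.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // Doubles under addition: the ordinary numeric case.
  implicit val doubleAddition: Monoid[Double] = new Monoid[Double] {
    val empty: Double = 0.0
    def combine(x: Double, y: Double): Double = x + y
  }

  // Sets under union: a stand-in for Bloom filters in this sketch.
  implicit def setUnion[T]: Monoid[Set[T]] = new Monoid[Set[T]] {
    val empty: Set[T] = Set.empty[T]
    def combine(x: Set[T], y: Set[T]): Set[T] = x union y
  }

  // Element-wise vector addition for any element type with a Monoid instance.
  def add[A](u: Vector[A], v: Vector[A])(implicit m: Monoid[A]): Vector[A] = {
    require(u.length == v.length, "vectors must have the same length")
    u.zip(v).map { case (a, b) => m.combine(a, b) }
  }

  // Fold a vector down to a single value, starting from the monoid identity.
  def sum[A](v: Vector[A])(implicit m: Monoid[A]): A =
    v.foldLeft(m.empty)(m.combine)
}
```

After `import GenericVectors._`, the same `add` call works on `Vector[Double]` and on `Vector[Set[String]]`. The numeric instance is the one that could be routed to a native library; the set instance has to stay on the JVM.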

@joroKr21
Member

Oh God, I just realized that Breeze doesn't have the outer vector product 🤦

@aalexandrov
Contributor Author

But you use this in ALS, don't you?

@joroKr21
Member

No, I use Breeze only to invert the matrix. There's a ticket for the outer product on GitHub.
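
For reference, the outer product of a vector u of length m and a vector v of length n is the m×n matrix whose entry (i, j) is u(i) * v(j). A minimal library-free sketch follows; `OuterProduct` and `outer` are made-up names and this is not Breeze's API.

```scala
// Hypothetical helper in plain Scala; not part of Breeze.
object OuterProduct {
  // Outer product u v^T: row i of the result is v scaled by u(i).
  def outer(u: Array[Double], v: Array[Double]): Array[Array[Double]] =
    u.map(ui => v.map(vj => ui * vj))
}
```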

@aalexandrov aalexandrov modified the milestones: Jan 2016, Mar 2016 Feb 26, 2016
@aalexandrov aalexandrov modified the milestones: Mar 2016, Apr 2016 Apr 4, 2016
@aalexandrov aalexandrov assigned akunft and unassigned fschueler Apr 5, 2016
@aalexandrov
Contributor Author

I'm leaving @akunft in charge of this.

@aalexandrov
Contributor Author

I think we can close this, as the discussion has moved to #187. @stratosphere/emma-committers Does anybody object?

@fschueler
Contributor

👍

@akunft
Contributor

akunft commented Apr 20, 2016

New meta-issue in #188.

@aalexandrov aalexandrov modified the milestones: Apr 2016, May 2016 May 2, 2016
@akunft akunft added the LINALG label Jul 7, 2016
@aalexandrov aalexandrov modified the milestone: May 2016 Aug 23, 2016