Use modern RNG #3041

Closed
wants to merge 5 commits into from

flying-sheep
Member

@flying-sheep flying-sheep commented Apr 30, 2024

  • Release notes not necessary because:

This code seems to be very performance-sensitive: having a Python implementation of getrandbits in 8572ecb roughly doubled the runtime of one benchmark:

| Change | Before [0d4554b] | After [1b2d9dd] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 15.2±0.03ms | 31.7±0.1ms | 2.09 | preprocessing.time_highly_variable_genes |
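
To illustrate what a Python-level getrandbits can look like, here is a minimal sketch. It is not the code from 8572ecb: the class name GeneratorBits and its byte-based approach are hypothetical, chosen only to show that every call passes through Python and a scalar NumPy draw, whereas random.Random.getrandbits is implemented in C.

```python
import random

import numpy as np


class GeneratorBits:
    """Hypothetical adapter: exposes getrandbits() on top of a NumPy Generator."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = np.random.default_rng(seed)

    def getrandbits(self, k: int) -> int:
        # Draw enough random bytes to cover k bits, then drop the surplus bits.
        n_bytes = (k + 7) // 8
        value = int.from_bytes(self._rng.bytes(n_bytes), "little")
        return value >> (n_bytes * 8 - k)


py_rng = GeneratorBits(seed=0)
c_rng = random.Random(0)  # stdlib RNG, getrandbits implemented in C

# Both calls return a non-negative integer below 2**32.
print(py_rng.getrandbits(32), c_rng.getrandbits(32))
```

Timing the two getrandbits calls with timeit on your own machine is one way to check whether per-call overhead of this kind is relevant to the slowdown reported above.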


codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.88%. Comparing base (8d046ff) to head (d9877c9).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3041      +/-   ##
==========================================
+ Coverage   75.87%   75.88%   +0.01%     
==========================================
  Files         110      110              
  Lines       12536    12542       +6     
==========================================
+ Hits         9512     9518       +6     
  Misses       3024     3024              
| Files | Coverage Δ |
|---|---|
| scanpy/_utils/__init__.py | 75.41% <100.00%> (+0.31%) ⬆️ |


scverse-benchmark bot commented Apr 30, 2024

Benchmark changes

| Change | Before [8d046ff] | After [d9877c9] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 15.1±0.05ms | 17.2±0.1ms | 1.14 | tools.time_leiden |

Comparison: https://github.com/scverse/scanpy/compare/8d046ff37e024ae88eadfb22ea8fd142a6b95aa1..d9877c996b655a236f14fc242717a637365cd7d8
Last changed: Tue, 4 Jun 2024 12:01:33 +0000

More details: https://github.com/scverse/scanpy/pull/3041/checks?check_run_id=25784166991

@flying-sheep flying-sheep added this to the 1.10.2 milestone Apr 30, 2024
@ilan-gold
Contributor

@RubenVanEsch could you look at this as well for your issue?

I am not sure a 15% slowdown is acceptable, given that the answer could simply be "use WSL", a suggestion we never got a response on because we got sidetracked. Could @RubenVanEsch or someone else confirm whether the original problem is reproducible on WSL?

@flying-sheep flying-sheep self-assigned this May 2, 2024
@ilan-gold
Contributor

Also, @flying-sheep, coming back to this: why doesn't this break tests? Is the underlying number generation mechanism somehow the same, or similar enough?
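
As general NumPy background for this question (not a statement about what this branch changes or why its tests pass): the legacy RandomState API and the modern Generator API use different default bit generators, so the same seed is not expected to give the same draws. A minimal sketch, with the seed and bounds chosen arbitrarily:

```python
import numpy as np

seed = 0
legacy = np.random.RandomState(seed)  # legacy API, Mersenne Twister (MT19937)
modern = np.random.default_rng(seed)  # modern API, PCG64 by default

legacy_draws = legacy.randint(0, 2**31 - 1, size=5)
modern_draws = modern.integers(0, 2**31 - 1, size=5)

# Same seed, different bit generators: the two streams are not expected to match.
print(legacy_draws)
print(modern_draws)
print(np.array_equal(legacy_draws, modern_draws))
```

Tests that compare against stored reference values would normally notice such a change, while tests that only check shapes or cluster counts might not.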

@RubenVanEsch

@ilan-gold It turns out I can't install WSL on my laptop after all, so unfortunately I can't check this.

@flying-sheep
Member Author

The other way round: If using WSL would be a viable workaround, we don’t need this.

So if you can’t use WSL, it’s even more important that you help us by checking if this PR fixes things for you.

@ilan-gold ilan-gold modified the milestones: 1.10.2, 1.10.3 Jun 25, 2024
@flying-sheep flying-sheep removed this from the 1.10.3 milestone Aug 8, 2024
@patrick-nicodemus

Hi, I cloned this repo, switched to modern-rng, and installed it with pip. I was able to reproduce the same error.

Exception ignored in: <class 'ValueError'>
Traceback (most recent call last):
    File "numpy\random\_generator.pyx", line 622, in numpy.random._generator.Generator.integers
    File "numpy\random\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

I am using numpy 1.26, which is the numpy version required by this branch.
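
For reference, the same message can be triggered directly in NumPy. This is only a sketch of how Generator.integers produces the error when the requested upper bound does not fit into int32; it does not reproduce the igraph call path, and the bounds used here are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
int32_max = np.iinfo(np.int32).max  # 2**31 - 1

# Within range: the upper bound is exclusive, so int32_max + 1 is still accepted.
ok = rng.integers(0, int32_max + 1, dtype=np.int32)

# Out of range: the largest value that could be drawn would exceed int32_max.
message = None
try:
    rng.integers(0, 2**32, dtype=np.int32)
except ValueError as err:
    message = str(err)
    print(message)
```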

@patrick-nicodemus

The exception is raised from within the igraph C module that actually runs the clustering algorithm, GraphBase.community_leiden. So it may be a bug in igraph, or incorrect arguments are being passed to igraph.

Here is some sample code to reproduce.

import numpy as np
import anndata as ad
import scanpy as sc

# Synthetic count matrix: negative draws are clipped to 0, so most entries are zero
rng = np.random.default_rng()
counts = rng.integers(low=-1000, high=100, size=(100, 1000))
counts = np.maximum(counts, 0)
adata = ad.AnnData(counts)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
# The ValueError shown above appears during this call
sc.tl.leiden(adata, flavor='igraph', n_iterations=2)

@flying-sheep
Member Author

Hi Patrick, we’re quite busy with the scverse conference right now, so don’t be distraught if we can’t investigate this more right away.

I’ll link to your reproducers here from the issue, as a closed PR is a place few people ever look, and we’ll come back to this in a week or so.

Development

Successfully merging this pull request may close these issues.

leiden alg with igraph flavor causes out of bounds freezing