-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix for scanpy.tl.score_genes #3167
base: main
Are you sure you want to change the base?
Conversation
changes the ranking system so number of genes in each bin succeeds when data is scaled and when there are many zeros in dataset.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3167 +/- ##
=======================================
Coverage 76.54% 76.54%
=======================================
Files 109 109
Lines 12490 12492 +2
=======================================
+ Hits 9560 9562 +2
Misses 2930 2930
|
Hi, that certainly seems like an improvement! I took the liberty to split out two bugs: #3168 and #3169 We had a similar fix #2875, which is hidden behind a I think since that fix is not yet released, we should rename the flag and unify both fixes behind it. Could you please
|
for more information, see https://pre-commit.ci
Hi, Thank you!
Thanks again |
Sure, I’ll happily elaborate! As you can see, our test suite is failing. That’s because we have a test Now of course we’d also like to fix things eventually, so we’d implement your fix behind an option (so people have to opt-in to the changes). Eventually, we’d switch to the new behavior (likely scanpy 2.0). That’s why I propose to rename the Another thing: We need to test that this works as intended. Can you create a reproducer with built-in datasets (or synthetic data that you create using def test_score_genes():
adata = TODO # create test data here
gene_list = TODO
gene_pool = TODO
gene_list, gene_pool, get_subset = _check_score_genes_args(
adata, gene_list, gene_pool, use_raw=use_raw, layer=layer
)
bins = list(_score_genes_bins(
gene_list,
gene_pool,
ctrl_as_ref=False, # needs to be renamed
ctrl_size=50,
n_bins=25,
get_subset=get_subset,
))
assert 0 not in map(len, bins) |
The score_genes procedure currently uses a ranking system to split genes into bins of similar expression levels. The current approach fails with some datasets.