Dask dataframe support #823
for more information, see https://pre-commit.ci
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #823      +/-   ##
==========================================
- Coverage   83.49%   83.33%   -0.16%
==========================================
  Files          34       32       -2
  Lines        4441     4333     -108
==========================================
- Hits         3708     3611      -97
+ Misses        733      722      -11
…ndata into dask-dataframe
So, I've looked into the length thing a bit. It looks like there is still no way to include info on the number of rows for a dask dataframe. This is tracked in multiple places in the dask repo, but this issue looks like the most recent: dask/dask#5633. It's possible we can do something clever to work around this, like persisting the index of the data frame and doing length checks there. We could also skip length checks on dask dataframes until we try to compute, and error then. @ryan-williams, any chance you have thoughts here? Is it best to just wait on dask some more?
Here is a gist with some code for reading a dataframe saved in AnnData to a dask DataFrame
@ivirshup I've got a branch with your gist - I can start an issue for this but so far what I see is that:
This PR introduces support for Dask dataframes in anndata.
TODOs:
Related PR (Dask array support): #813
Contributors: @rahulbshrestha @syelman