
(feat): xarray with experimental backed reading #1247

Open
wants to merge 281 commits into
base: main
from


@ilan-gold (Contributor) commented Nov 30, 2023

This PR is a lighter-weight version of #947 that uses the original AnnData class to hold obs and var as xr.Dataset objects.


codecov bot commented Dec 7, 2023

Codecov Report

Attention: Patch coverage is 88.58131% with 33 lines in your changes missing coverage. Please review.

Project coverage is 84.56%. Comparing base (f448eb2) to head (1540d27).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/experimental/backed/_lazy_arrays.py | 86.15% | 9 Missing ⚠️ |
| src/anndata/_core/storage.py | 37.50% | 5 Missing ⚠️ |
| src/anndata/tests/helpers.py | 64.28% | 5 Missing ⚠️ |
| src/anndata/_io/specs/lazy_methods.py | 91.30% | 4 Missing ⚠️ |
| src/anndata/experimental/backed/_compat.py | 86.20% | 4 Missing ⚠️ |
| src/anndata/experimental/backed/_io.py | 89.47% | 4 Missing ⚠️ |
| src/anndata/_core/aligned_df.py | 80.00% | 1 Missing ⚠️ |
| src/anndata/_core/index.py | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   86.87%   84.56%   -2.31%     
==========================================
  Files          39       44       +5     
  Lines        6033     6303     +270     
==========================================
+ Hits         5241     5330      +89     
- Misses        792      973     +181     
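The project-level percentages follow directly from the hit and line counts in the diff above. A quick check in Python, assuming coverage here is simply hits divided by tracked lines (the report lists no partially covered lines):

```python
def coverage_pct(hits: int, lines: int) -> float:
    # Assumption: coverage = hits / tracked lines, as a percentage rounded to 2 decimals
    return round(100 * hits / lines, 2)


base = coverage_pct(5241, 6033)  # main (f448eb2)
head = coverage_pct(5330, 6303)  # PR head (1540d27)
print(base, head, round(head - base, 2))  # 86.87 84.56 -2.31
```

The Misses row is likewise Lines minus Hits (6033 - 5241 = 792 on main, 6303 - 5330 = 973 on the PR head).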
| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_core/anndata.py | 83.77% <100.00%> (+0.04%) ⬆️ |
| src/anndata/_core/merge.py | 83.91% <ø> (-11.08%) ⬇️ |
| src/anndata/_core/views.py | 85.71% <100.00%> (-5.40%) ⬇️ |
| src/anndata/_io/specs/__init__.py | 100.00% <ø> (ø) |
| src/anndata/_io/specs/registry.py | 95.53% <100.00%> (-0.50%) ⬇️ |
| src/anndata/_io/zarr.py | 83.75% <100.00%> (+0.20%) ⬆️ |
| src/anndata/_types.py | 85.29% <100.00%> (ø) |
| src/anndata/experimental/__init__.py | 100.00% <100.00%> (ø) |
| src/anndata/experimental/backed/__init__.py | 100.00% <100.00%> (ø) |
| src/anndata/experimental/backed/_xarray.py | 100.00% <100.00%> (ø) |
... and 8 more

... and 4 files with indirect coverage changes

@ilan-gold added this to the 0.12.0 milestone on Aug 27, 2024
@flying-sheep (Member) left a comment

Very nice!

I think there are tests missing for a bunch of moving parts like Dataset2D, but a lot of the public functionality seems to already be there and working well!

src/anndata/_io/specs/lazy_methods.py (outdated)
 """Coerce arrays stored in layers/X, and aligned arrays ({obs,var}{m,p})."""
 # If value is a scalar and we allow that, return it
 if allow_array_like and np.isscalar(value):
     return value
 # If value is one of the allowed types, return it
-if isinstance(value, StorageType.classes()):
+if isinstance(value, (*StorageType.classes(), Dataset2D)):
@flying-sheep (Member) commented Aug 27, 2024

so that we don’t forget: let’s get rid of the circular import hack

elem_name = get_elem_name(elem)
index_label = f'{elem_name.replace("/", "")}_names'
index_key = elem.attrs["_index"]
index = elem_dict[index_key] # no sense in reading this in multiple times
Member

this line can move into the helper
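A rough sketch of what such a helper could look like, using plain dicts in place of the zarr/HDF5 group. The helper name index_label_and_values and the toy data are hypothetical; only the label derivation and the "_index" attribute lookup mirror the snippet above. Because the index is read inside the helper, callers get it from a single lookup.

```python
def index_label_and_values(elem_name: str, attrs: dict, elem_dict: dict):
    """Hypothetical helper: derive the index dimension label and read the index once."""
    # "/obs" -> "obs_names", "/var" -> "var_names"
    index_label = f'{elem_name.replace("/", "")}_names'
    # The "_index" attribute names the key under which the index is stored
    index_key = attrs["_index"]
    # Single lookup, so no caller needs to read the index again
    return index_label, elem_dict[index_key]


# Toy stand-ins for a stored dataframe group (not real anndata objects)
attrs = {"_index": "_index"}
elem_dict = {"_index": ["cell_0", "cell_1"], "int64": [1, 2]}
label, index = index_label_and_values("/obs", attrs, elem_dict)
print(label, index)  # obs_names ['cell_0', 'cell_1']
```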

src/anndata/_io/specs/registry.py (outdated)
src/anndata/experimental/backed/_io.py (outdated)
Comment on lines +67 to +69
needs_xarray = pytest.mark.skipif(
    not find_spec("xarray"), reason="Xarray is not installed"
)
Member

you could just set pytestmark once at the top of the module; then the whole file is skipped and you don’t have to annotate each individual function.

pytestmark = pytest.mark.skipif(
    not find_spec("xarray"), reason="Xarray is not installed"
)

tests/test_read_backed_experimental.py (outdated)
assert store.get_access_count("obs/int64") == 1, store.get_subkeys_accessed(
    "obs/int64"
)
# one for 0, .zmetadata handles .zarray
Member

I don’t understand either part of this comment; maybe expand on it a bit?

Comment on lines +188 to +190
assert store.get_access_count("obs/.zgroup") == 1, store.get_subkeys_accessed(
    "obs/.zgroup"
)
Member

you know, reading this pattern again and again, maybe we should make this a helper.

It could go directly into the AccessTrackingStore.

That way it’s not possible to ever use two different strings on the left and right side of this statement, which would probably cause hours of debugging.
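One way to do that, sketched without zarr: a hypothetical, dependency-free stand-in for AccessTrackingStore that records which keys were read and exposes an assert_access_count(key, count) method. Both assert_access_count and the record_read hook are assumptions, not existing API, and the real test store wraps an on-disk zarr store, which this sketch does not model. Because the key is passed only once, the count check and the debug message can never refer to different keys.

```python
class AccessTrackingStore:
    """Hypothetical stand-in: models only the access bookkeeping, no storage."""

    def __init__(self, tracked_keys):
        # Map each tracked key (e.g. "obs/int64") to the full keys read under it
        self._accessed = {key: [] for key in tracked_keys}

    def record_read(self, full_key):
        # Attribute a read to every tracked key it equals or lives under
        for key, reads in self._accessed.items():
            if full_key == key or full_key.startswith(key + "/"):
                reads.append(full_key)

    def get_access_count(self, key):
        return len(self._accessed[key])

    def get_subkeys_accessed(self, key):
        return list(self._accessed[key])

    def assert_access_count(self, key, count):
        # Single source of truth for the key: used for both the check and the message
        accessed = self.get_subkeys_accessed(key)
        assert len(accessed) == count, f"{key}: expected {count} access(es), got {accessed}"


store = AccessTrackingStore(["obs/int64", "obs/.zgroup"])
store.record_read("obs/int64/0")
store.record_read("obs/.zgroup")
store.assert_access_count("obs/int64", 1)
store.assert_access_count("obs/.zgroup", 1)
```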

tests/test_read_backed_experimental.py (outdated)
2 participants