Make `strings_to_categoricals` upon `.write()` optional #1474
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1474      +/-   ##
==========================================
- Coverage   86.69%   82.80%   -3.89%
==========================================
  Files          37       37
  Lines        5937     5939       +2
==========================================
- Hits         5147     4918     -229
- Misses        790     1021     +231
|
Just chipping in to say that while I think this would be great, it doesn't really solve scverse/scanpy#1747, as scanpy functions will still cause this conversion (unless something has changed). So this helps if you are just reading/writing. You probably knew this already, but I wanted to make it clear for anyone reading this later. |
Great that you're also onboard, @lazappi!
Yes, you're right about the Scanpy plotting functionality. But I think the issue is more pressing for anndata, given its widespread use outside of Scanpy. |
@flying-sheep I am disabling auto-merge. Isaac brought up a good point about whether this should be in |
We agreed on the following scope for settings (I’ll add checks for things that apply here)
I think we didn’t fully agree on whether this should be an “and” or an “or” for things to go into settings. I think IO is localized enough that it doesn’t need a global setting, but I have no strong opinion here. |
@ivirshup your call then. |
I am fine leaving this as-is @flying-sheep so will enable auto-merge |
I am comfortable with "or" here |
…()` optional
…ional (#1584) Co-authored-by: Alex Wolf <[email protected]>
To illustrate that it'd be a simple change, this PR adds the 3 lines needed for what @ivirshup suggested here 3 years ago:
In the two issues below, @grst, @adamgayoso, @lazappi & @ljjh20 expressed that "forced" sanitization is problematic.
I found these issues only now but also agree that a write function should not mutate the object that's about to be written. @chaichontat made me aware.
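To make the "write should not mutate" point concrete, here is a minimal toy sketch in plain Python. It is not anndata's implementation: `write_table`, `to_categorical`, and the keyword `convert_strings_to_categoricals` are hypothetical names chosen for this illustration, and the "file" is just a returned dictionary.

```python
def to_categorical(values):
    """Encode a list of strings as (sorted unique categories, integer codes)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return categories, [index[value] for value in values]


def write_table(columns, *, convert_strings_to_categoricals=True):
    """Hypothetical writer: build and return the payload that would be serialized.

    The payload is built from scratch, so the caller's `columns` mapping
    and its lists are never modified, whatever the flag is set to.
    """
    payload = {}
    for name, values in columns.items():
        is_string_column = all(isinstance(value, str) for value in values)
        if convert_strings_to_categoricals and is_string_column:
            payload[name] = {"kind": "categorical", "data": to_categorical(values)}
        else:
            payload[name] = {"kind": "plain", "data": list(values)}
    return payload


# Toy annotation table (made-up values).
obs = {"cell_type": ["B", "T", "B"], "n_counts": [10, 20, 30]}

default_payload = write_table(obs)
opt_out_payload = write_table(obs, convert_strings_to_categoricals=False)
```

With the default, `cell_type` is encoded as categories plus codes in the payload only; with the opt-out it is written as plain strings. In both cases `obs` itself is left exactly as the caller passed it.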
Things turn out particularly badly for gene symbols, which should remain strings, both for performance reasons and for compatibility with JS. Having 30k categories in a 32k-dimensional gene vector is detrimental, not beneficial.
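The arithmetic behind that last point can be sketched with a toy categorical encoding in plain Python. The gene symbols below are synthetic, the sizes simply mirror the 30k/32k figures above, and `categorical_encode` is a hypothetical helper; real on-disk cost depends on the storage format and string lengths.

```python
def categorical_encode(values):
    """Return (sorted unique categories, one integer code per value)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    codes = [index[value] for value in values]
    return categories, codes


# Synthetic gene symbols: 32,000 entries, of which 30,000 are distinct.
n_genes, n_unique = 32_000, 30_000
symbols = [f"GENE{i % n_unique}" for i in range(n_genes)]

categories, codes = categorical_encode(symbols)

# Plain storage keeps one string per gene.
stored_plain = len(symbols)
# Categorical storage keeps every distinct string plus one code per gene.
stored_categorical = len(categories) + len(codes)
# The encoding only avoids storing the repeated strings.
strings_saved = len(symbols) - len(categories)
```

Under these assumptions the categorical form avoids storing only 2,000 repeated strings while adding a 32,000-element code array and an extra lookup on every access, which is why near-unique columns such as gene symbols gain little from the conversion.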