Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid importing from dask_cudf.core #593

Merged
merged 3 commits into from
Apr 26, 2024

Conversation

rjzamora
Copy link
Member

This PR is intended to resolve test failures related to the recent dask-expr migration in dask.dataframe. I noticed that cuxfilter was importing from dask_cudf.core (no longer allowed when query-planning is enabled). There may be other issues as well.

@AjayThorve
Copy link
Member

AjayThorve commented Apr 25, 2024

@rjzamora adding the following env variable to both ./ci/test_python.sh and ci/test_wheel.sh should resolve the tests for now:

  1. ci/test_python.sh
DASK_DATAFRAME__QUERY_PLANNING=False pytest \
  1. ci/test_wheel.sh
DASK_DATAFRAME__QUERY_PLANNING=False python -m pytest -n 8 ./python/cuxfilter/tests

@rjzamora
Copy link
Member Author

Thanks @AjayThorve !

Just to clarify: My primary goal here is to find/fix the problems with DASK_DATAFRAME__QUERY_PLANNING=True. Therefore, It's fine to switch of query-planning to unblock CI, but in this particular PR, I'd like to make sure query-planning is enabled.

@AjayThorve AjayThorve added bug Something isn't working non-breaking Non-breaking change labels Apr 25, 2024
@AjayThorve
Copy link
Member

Thanks @AjayThorve !

Just to clarify: My primary goal here is to find/fix the problems with DASK_DATAFRAME__QUERY_PLANNING=True. Therefore, It's fine to switch of query-planning to unblock CI, but in this particular PR, I'd like to make sure query-planning is enabled.

okay, that makes sense! Thanks for the PR!

@rjzamora
Copy link
Member Author

I am only seeing a failure in one test now (cuxfilter/tests/charts/datashader/test_graph_assets.py::TestGraphAssets::test_calc_connected_edges).

As far as I can tell, we are comparing the following output DataFrame:

     x    y  color
0  1.0  1.0   11.0
1  1.0  1.0   11.0
2  NaN  NaN    NaN
3  1.0  1.0   10.0
4  0.0  0.0   10.0
5  NaN  NaN    NaN

...to an "expected" DataFrame:

     x    y  color
0  1.0  1.0   10.0
1  0.0  0.0   10.0
2  NaN  NaN    NaN
3  1.0  1.0   11.0
4  1.0  1.0   11.0
5  NaN  NaN    NaN

Before I dig in too far here, is it possible that both results are correct? That is, does the order of the rows matter here, or just the relationship between "x", "y" and "color"? The new dask_cudf API now uses more upstream algorithms than the legacy API did. So, the ordering of some results may be different.

@rjzamora rjzamora changed the title [WIP] Avoid importing from dask_cudf.core Avoid importing from dask_cudf.core Apr 25, 2024
@rjzamora rjzamora marked this pull request as ready for review April 25, 2024 19:59
@rjzamora rjzamora requested a review from a team as a code owner April 25, 2024 19:59
)

assert res.to_pandas().equals(result.to_pandas())
assert_eq(res, result, check_divisions=False, check_index=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only use check_index=False if the ordering of rows is not important. Not sure if this is the case for calc_connected_edges, but I have a feeling it might be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts @AjayThorve ? Seems like this PR resolves query-planning failures if this test can be modified. Otherwise, I'll need to investigate the differences in row ordering further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the ordering is not important for calc_connected_edges, so it's safe to use check_index=False.

Copy link
Member

@AjayThorve AjayThorve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Thanks for this @rjzamora

@AjayThorve
Copy link
Member

@rjzamora should I go ahead and merge this? Lgtm.

@rjzamora
Copy link
Member Author

should I go ahead and merge this? Lgtm.

Seems fine to me as long as you think the test_graph_assets.py change makes sense :)

@AjayThorve
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 41819ce into rapidsai:branch-24.06 Apr 26, 2024
30 checks passed
@rjzamora rjzamora deleted the avoid-core-imports branch April 26, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants