Cannot remove instruction: %all-reduce = f32[32]{0} all-reduce(f32[32]{0} %reduce.3), channel_id=1, replica_groups={{0}}, to_apply=%region_0.44, metadata={op_name="parallelize(train_step_shard_parallel)/jit(main)/reduce_sum[axes=(0, 1, 2)];"} #15114

Closed
huhuiqi7 opened this issue Jul 19, 2024 · 1 comment

huhuiqi7 commented Jul 19, 2024

I am testing auto-sharding on a CNN and encountered the error below. I call the xla_extension library, built from the XLA source, from Python.

error log:
(MeshHostWorker pid=15170) 2024-07-15 11:02:36.208517: E external/xla/xla/status_macros.cc:54] INTERNAL: RET_CHECK failure (external/xla/xla/hlo/ir/hlo_computation.cc:318) IsSafelyRemovable(instruction) Cannot remove instruction: %all-reduce = f32[32]{0} all-reduce(f32[32]{0} %reduce.3), channel_id=1, replica_groups={{0}}, to_apply=%region_0.44, metadata={op_name="parallelize(train_step_shard_parallel)/jit(main)/reduce_sum[axes=(0, 1, 2)];"}
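
For triage, the useful parts of a log line like the one above are the RET_CHECK source location, the failed condition, and the instruction name. The following is only a rough sketch of pulling those fields out with a regular expression. The function name `parse_ret_check` and the pattern are hypothetical helpers written for this one message shape, not part of XLA, and `SAMPLE` is an abbreviated copy of the log line with the metadata field left out.

```python
import re

# Pattern written against the single log line in this report; other
# RET_CHECK messages may be worded differently and will not match.
RET_CHECK_RE = re.compile(
    r"RET_CHECK failure \((?P<file>[^:()]+):(?P<line>\d+)\)\s+"
    r"(?P<condition>.+?)\s+Cannot remove instruction:\s+"
    r"(?P<instruction>%[\w.\-]+)"
)

# Abbreviated copy of the reported log line (metadata=... omitted).
SAMPLE = (
    "E external/xla/xla/status_macros.cc:54] INTERNAL: RET_CHECK failure "
    "(external/xla/xla/hlo/ir/hlo_computation.cc:318) "
    "IsSafelyRemovable(instruction) Cannot remove instruction: "
    "%all-reduce = f32[32]{0} all-reduce(f32[32]{0} %reduce.3), "
    "channel_id=1, replica_groups={{0}}, to_apply=%region_0.44"
)


def parse_ret_check(log_line):
    """Return the RET_CHECK fields as a dict, or None if the line does not match."""
    match = RET_CHECK_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    # replica_groups is optional in the text, so it is looked up separately.
    groups = re.search(r"replica_groups=(\{\{.*?\}\})", log_line)
    info["replica_groups"] = groups.group(1) if groups else None
    return info
```

Calling `parse_ret_check(SAMPLE)` gives the file, line number, condition, instruction name, and replica groups as separate fields, which makes it easier to search for the same failure in other reports.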

python code:
compiled = backend.compile(xla_computation_to_mlir_text(hlo.get_computation()), compile_options)

build command:
HERMETIC_PYTHON_VERSION=3.9 HTTP_PROXY=http://10.88.206.35:31166 HTTPS_PROXY=http://10.88.206.35:31166 bazel build --repo_env=HERMETIC_PYTHON_VERSION --test_output=all --spawn_strategy=sandboxed //xla/python:xla_extension --python_path=/root/miniconda3/envs/py39/bin/python --sandbox_debug

environment:
cuda=12.2, nccl=2.19.3-1+cuda12.2, clang=17.0.2, python=3.9.19, Ubuntu=20.04

huhuiqi7 commented Aug 1, 2024

Fixed by updating JAX from 0.4.8 to 0.4.10.
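
For anyone hitting the same RET_CHECK, a quick way to confirm that the installed JAX is at least the version reported to work is a small version check. This is a minimal sketch: the helper names (`parse_version`, `meets_minimum`, `check_jax`) are hypothetical, and the 0.4.10 threshold comes only from the comment above, not from any documented compatibility table.

```python
from importlib import metadata

# Version reported in this thread as resolving the error (an assumption
# taken from the comment above, not an official requirement).
MIN_JAX = "0.4.10"


def parse_version(text):
    """Turn '0.4.10' into (0, 4, 10), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_JAX):
    """Compare numerically; a plain string compare would rank '0.4.8' above '0.4.10'."""
    return parse_version(installed) >= parse_version(minimum)


def check_jax():
    """Return a short status message about the installed jax package."""
    try:
        found = metadata.version("jax")
    except metadata.PackageNotFoundError:
        return "jax is not installed"
    if meets_minimum(found):
        return f"jax {found} is at least {MIN_JAX}"
    return f"jax {found} is older than {MIN_JAX}; consider upgrading"
```

Running `print(check_jax())` in the environment that loads xla_extension shows whether the upgrade applies there.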

huhuiqi7 closed this as completed Aug 1, 2024