Build fails #14352

Open
AleksKnezevic opened this issue Jul 2, 2024 · 43 comments
Comments

@AleksKnezevic

AleksKnezevic commented Jul 2, 2024

I'm trying to build xla from source for CPU following the instructions here and it's failing with:

xla/service/gpu/runtime/command_buffer_cmd_emitter.cc:32:10: fatal error: 'xla/service/gpu/runtime/gpublas_lt_matmul_thunk.h' file not found
   32 | #include "xla/service/gpu/runtime/gpublas_lt_matmul_thunk.h"

I'm on an Ubuntu 20.04 machine running a TF Docker container, as the guide recommends. I have Bazel 6.5.0 and Clang 17. Is this configuration supported? Is there anything better to use?

If I run build with -s I see that -iquote . is included, and xla/service/gpu/runtime/gpublas_lt_matmul_thunk.h is there. Here's the full subcommand output:

SUBCOMMAND: # //xla/mlir_hlo:transforms_passes [action 'Compiling xla/mlir_hlo/transforms/alloc_to_arg_pass.cc', configuration: 94361239bbff404c4f64fc2130c27f83352dcbf11f78849ccfba5713e8af8cb0, execution platform: @local_execution_config_platform//:platform]
(cd /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/execroot/xla && \
  exec env - \
    CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang \
    DOCKER_CACHEBUSTER=1719550690218542678 \
    LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    PATH=/root/.cache/bazelisk/downloads/bazelbuild/bazel-6.5.0-linux-x86_64/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    TF2_BEHAVIOR=1 \
  /usr/lib/llvm-17/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++14' -MD -MF bazel-out/k8-opt/bin/xla/mlir_hlo/_objs/transforms_passes/alloc_to_arg_pass.d '-frandom-seed=bazel-out/k8-opt/bin/xla/mlir_hlo/_objs/transforms_passes/alloc_to_arg_pass.o' '-DLLVM_ON_UNIX=1' '-DHAVE_BACKTRACE=1' '-DBACKTRACE_HEADER=<execinfo.h>' '-DLTDL_SHLIB_EXT=".so"' '-DLLVM_PLUGIN_EXT=".so"' '-DLLVM_ENABLE_THREADS=1' '-DHAVE_DEREGISTER_FRAME=1' '-DHAVE_LIBPTHREAD=1' '-DHAVE_PTHREAD_GETNAME_NP=1' '-DHAVE_PTHREAD_H=1' '-DHAVE_PTHREAD_SETNAME_NP=1' '-DHAVE_REGISTER_FRAME=1' '-DHAVE_SETENV_R=1' '-DHAVE_STRERROR_R=1' '-DHAVE_SYSEXITS_H=1' '-DHAVE_UNISTD_H=1' -D_GNU_SOURCE '-DHAVE_LINK_H=1' '-DHAVE_MALLINFO=1' '-DHAVE_SBRK=1' '-DHAVE_STRUCT_STAT_ST_MTIM_TV_NSEC=1' -DHAVE_BUILTIN_THREAD_POINTER '-DLLVM_NATIVE_ARCH="X86"' '-DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser' '-DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter' '-DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler' '-DLLVM_NATIVE_TARGET=LLVMInitializeX86Target' '-DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo' '-DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC' '-DLLVM_NATIVE_TARGETMCA=LLVMInitializeX86TargetMCA' '-DLLVM_HOST_TRIPLE="x86_64-unknown-linux-gnu"' '-DLLVM_DEFAULT_TARGET_TRIPLE="x86_64-unknown-linux-gnu"' '-DLLVM_VERSION_MAJOR=19' '-DLLVM_VERSION_MINOR=0' '-DLLVM_VERSION_PATCH=0' '-DLLVM_VERSION_STRING="19.0.0git"' -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS '-DLLVM_HAS_AArch64_TARGET=1' '-DLLVM_HAS_AMDGPU_TARGET=1' '-DLLVM_HAS_ARM_TARGET=1' '-DLLVM_HAS_NVPTX_TARGET=1' '-DLLVM_HAS_PowerPC_TARGET=1' '-DLLVM_HAS_RISCV_TARGET=1' '-DLLVM_HAS_SystemZ_TARGET=1' '-DLLVM_HAS_X86_TARGET=1' '-DBLAKE3_USE_NEON=0' -DBLAKE3_NO_AVX2 -DBLAKE3_NO_AVX512 -DBLAKE3_NO_SSE2 
-DBLAKE3_NO_SSE41 '-DBAZEL_CURRENT_REPOSITORY=""' -iquote . -iquote bazel-out/k8-opt/bin -iquote external/llvm-project -iquote bazel-out/k8-opt/bin/external/llvm-project -iquote external/stablehlo -iquote bazel-out/k8-opt/bin/external/stablehlo -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/transforms_passes -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/deallocation_passes -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/deallocation_passes_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/deallocation_utils -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/ArithCanonicalizationIncGen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/AsmParserTokenKinds -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/mhlo_passes -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/chlo_legalize_to_hlo_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_legalize_to_stablehlo -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/map_stablehlo_to_hlo_op -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/mlir_hlo -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/canonicalize_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/convert_op_folder -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_attrs_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_common -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_enums_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_pattern_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/hlo_ops_typedefs_inc_gen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/MLIRShapeCanonicalizationIncGen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/base -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/base_attr_interfaces_inc_gen 
-Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/broadcast_utils -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/chlo_ops -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/chlo_attrs_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/chlo_enums_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/chlo_ops_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_type_inference -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_assembly_format -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_ops -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_attrs_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_enums_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_ops_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_types_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/version -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/legalize_to_linalg_utils -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/map_mhlo_to_scalar_op -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/legalize_to_standard_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/lower_complex_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/map_chlo_to_hlo_op -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/mhlo_pass_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/mhlo_rng_utils -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/mhlo_scatter_gather_utils -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/shape_component_analysis -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/stablehlo_legalize_to_hlo -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/type_conversion -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/unfuse_batch_norm 
-Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_passes -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/linalg_passes -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/linalg_pass_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/stablehlo_pass_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_ops -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_attr_interfaces_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_attrs_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_enums_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_op_interfaces_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_ops_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_types -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_type_interfaces_inc_gen -Ibazel-out/k8-opt/bin/external/stablehlo/_virtual_includes/vhlo_types_inc_gen -Ibazel-out/k8-opt/bin/xla/mlir_hlo/_virtual_includes/transforms_passes_inc_gen -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/include -isystem external/llvm-project/mlir/include -isystem bazel-out/k8-opt/bin/external/llvm-project/mlir/include -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS -Wno-sign-compare '-Wno-error=unused-command-line-argument' -Wno-gnu-offsetof-extensions '-std=c++17' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c xla/mlir_hlo/transforms/alloc_to_arg_pass.cc -o bazel-out/k8-opt/bin/xla/mlir_hlo/_objs/transforms_passes/alloc_to_arg_pass.o
@adeel10x

adeel10x commented Jul 3, 2024

I got the same error.

[21,949 / 23,537] Compiling xla/service/gpu/runtime/command_buffer_cmd_emitter.cc; 14s processwrapper-sandbox ... (4 actions running)
ERROR: /xla/xla/service/gpu/runtime/BUILD:113:11: Compiling xla/service/gpu/runtime/command_buffer_cmd_emitter.cc failed: (Exit 1): clang failed: error executing command (from target //xla/service/gpu/runtime:command_buffer_cmd_emitter) /usr/lib/llvm-17/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 ... (remaining 232 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from xla/service/gpu/runtime/command_buffer_cmd_emitter.cc:16:
In file included from ./xla/service/gpu/runtime/command_buffer_cmd_emitter.h:20:
In file included from ./xla/service/gpu/runtime/command_buffer_cmd.h:40:
In file included from ./xla/service/buffer_assignment.h:51:
In file included from ./xla/service/memory_space_assignment/memory_space_assignment.h:188:
./xla/service/memory_space_assignment/cost_analysis.h:118:3: warning: explicitly defaulted default constructor is implicitly deleted [-Wdefaulted-function-deleted]
  118 |   HloCostAnalysisCosts() = default;
      |   ^
./xla/service/memory_space_assignment/cost_analysis.h:120:26: note: default constructor of 'HloCostAnalysisCosts' is implicitly deleted because field 'hlo_cost_analysis_' of reference type 'const HloCostAnalysis &' would not be initialized
  120 |   const HloCostAnalysis& hlo_cost_analysis_;
      |                          ^
./xla/service/memory_space_assignment/cost_analysis.h:118:28: note: replace 'default' with 'delete'
  118 |   HloCostAnalysisCosts() = default;
      |                            ^~~~~~~
      |                            delete
xla/service/gpu/runtime/command_buffer_cmd_emitter.cc:32:10: fatal error: 'xla/service/gpu/runtime/gpublas_lt_matmul_thunk.h' file not found
   32 | #include "xla/service/gpu/runtime/gpublas_lt_matmul_thunk.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.
INFO: Elapsed time: 34675.488s, Critical Path: 338.10s
INFO: 21953 processes: 7250 internal, 1 local, 14702 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
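
The -Wdefaulted-function-deleted warning in the log above is separate from the fatal error. As a minimal sketch of what triggers it, assuming hypothetical types (`Analysis` and `CostsWithRef` are illustrative, not the XLA classes): a class with a reference member cannot be default-constructed, so a defaulted default constructor ends up deleted, and writing `= delete` states that intent directly, as the compiler note suggests.

```cpp
#include <type_traits>

// Hypothetical stand-in for the analysis object being referenced.
struct Analysis {
  int flops = 0;
};

// A reference member must be bound at construction time, so this class
// has no usable default constructor.
class CostsWithRef {
 public:
  explicit CostsWithRef(const Analysis& analysis) : analysis_(analysis) {}

  // Spelling this as "= default" would compile but produce an implicitly
  // deleted constructor, which is what the warning points out.
  CostsWithRef() = delete;

  int flops() const { return analysis_.flops; }

 private:
  const Analysis& analysis_;
};

// The type is not default-constructible either way.
static_assert(!std::is_default_constructible<CostsWithRef>::value,
              "reference member prevents default construction");
```

The only way to build such an object is through the constructor that binds the reference.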

@mooskagh
Member

mooskagh commented Jul 4, 2024

It fails for me with a different error, but still fails.

I'll check whether this is something recent or has been broken for some time now.

@mooskagh
Member

mooskagh commented Jul 4, 2024

The build of this particular file has been broken since the end of May. There are two more similar cases where GPU files fail to build in a CPU build.

If you build only the useful targets (rather than everything, //xla/...), the build actually works.

I'll ask around about the intended behaviour there, but it's likely one of the following:

  1. Targets that don't make sense for a given configuration should somehow be excluded, so that "build //xla/..." doesn't pick them up.
  2. They should build successfully even if they don't make sense (e.g. by putting #ifdefs around GPU-specific header files and functions).
  3. Those rules are fine to fail, and we should update the build instructions to not mention //xla/... there.
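
Option 2 could look roughly like the following. This is a minimal sketch under stated assumptions: the macro `EXAMPLE_GPU_BACKEND`, the header path, and `BackendDescription` are all hypothetical names chosen for illustration, not actual XLA macros or files.

```cpp
#include <string>

// Hypothetical macro: assume the build system defines EXAMPLE_GPU_BACKEND=1
// only when a GPU backend is configured. It is not a real XLA macro.
#if defined(EXAMPLE_GPU_BACKEND) && EXAMPLE_GPU_BACKEND
#include "example/gpu_only_header.h"  // hypothetical GPU-only header
constexpr bool kHasGpuSupport = true;
#else
constexpr bool kHasGpuSupport = false;
#endif

// Reports which code path was compiled in. In a CPU-only build the GPU
// header above is never included, so the file still compiles.
inline std::string BackendDescription() {
#if defined(EXAMPLE_GPU_BACKEND) && EXAMPLE_GPU_BACKEND
  return "gpu";
#else
  return "cpu-only";
#endif
}
```

The trade-off is that every GPU-only include and function body needs a guard, which is why excluding the targets (option 1) may be easier to maintain.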

@AleksKnezevic
Author

Thanks @mooskagh. What is the command to build for CPU, then? How do I check what's tested in CI to know what should be working?
I'm also having issues with the macOS build.

@AleksKnezevic
Author

AleksKnezevic commented Jul 5, 2024

I tried bazel build //xla/pjrt/... --spawn_strategy=sandboxed --test_output=all, which works on Linux but fails on macOS running on arm64 with:

ERROR: /private/var/tmp/_bazel_aknezevic/c8bf39b27849f05994a284a377632764/external/gloo/BUILD.bazel:91:11: Compiling gloo/transport/tcp/socket.cc failed: (Exit 1): wrapped_clang_pp failed: error executing command (from target @gloo//:transport_tcp) external/local_config_cc/wrapped_clang_pp '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 39 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
external/gloo/gloo/transport/tcp/socket.cc:23:45: error: use of undeclared identifier 'SOCK_NONBLOCK'
  auto rv = socket(ai_family, SOCK_STREAM | SOCK_NONBLOCK, 0);
                                            ^
external/gloo/gloo/transport/tcp/socket.cc:65:18: error: non-constant-expression cannot be narrowed from type 'rep' (aka 'long long') to '__darwin_suseconds_t' (aka 'int') in initializer list [-Wc++11-narrowing]
      .tv_usec = (timeout.count() % 1000) * 1000,
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/gloo/gloo/transport/tcp/socket.cc:65:18: note: insert an explicit cast to silence this issue
      .tv_usec = (timeout.count() % 1000) * 1000,
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 static_cast<__darwin_suseconds_t>( )

I see that only certain parts of gloo are excluded for the macOS build (this not being one of them), so I assume this should work? Is there anything else I need to set up in my env?
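
For context on the two errors: SOCK_NONBLOCK is not declared on macOS (hence the undeclared identifier), and `tv_usec` there is a narrower type than `long long`. The following is a minimal sketch of a portable variant, assuming POSIX headers; `MakeNonBlockingSocket` and `ToTimeval` are hypothetical helpers for illustration, not gloo's code or its actual fix.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <chrono>

// Creates a non-blocking TCP socket; returns -1 on failure.
inline int MakeNonBlockingSocket(int family) {
#ifdef SOCK_NONBLOCK
  // Platforms that define the flag can request non-blocking mode atomically.
  return socket(family, SOCK_STREAM | SOCK_NONBLOCK, 0);
#else
  // Fallback: create the socket, then set O_NONBLOCK with fcntl.
  int fd = socket(family, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    close(fd);
    return -1;
  }
  return fd;
#endif
}

// Converts a non-negative millisecond timeout to a timeval, with explicit
// casts so the field types may differ between platforms.
inline timeval ToTimeval(std::chrono::milliseconds timeout) {
  timeval tv{};
  tv.tv_sec = static_cast<decltype(tv.tv_sec)>(timeout.count() / 1000);
  tv.tv_usec =
      static_cast<decltype(tv.tv_usec)>((timeout.count() % 1000) * 1000);
  return tv;
}
```

Using `decltype` for the casts avoids naming a platform-specific type such as `__darwin_suseconds_t` in the source.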

@akuegel
Member

akuegel commented Jul 5, 2024

Can you please check whether a49d8aa fixes the issue with compiling command_buffer_cmd_emitter.cc?

@AleksKnezevic
Author

@akuegel, what is the build command I should use? Just running bazel build //xla/... --spawn_strategy=sandboxed --test_output=all, I still see:

ERROR: /localdev/aknezevic/xla/xla/service/gpu/tests/BUILD:960:9: Compiling xla/service/gpu/tests/gpu_cub_sort_test.cc failed: (Exit 1): clang failed: error executing command (from target //xla/service/gpu/tests:gpu_cub_sort_test_gpu_h100) /usr/lib/llvm-14/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 ... (remaining 220 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
xla/service/gpu/tests/gpu_cub_sort_test.cc:27:10: fatal error: 'xla/service/gpu/gpu_sort_rewriter.h' file not found
#include "xla/service/gpu/gpu_sort_rewriter.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@akuegel
Member

akuegel commented Jul 5, 2024

But that is a different error from before, so at least the other error is fixed now. Will look into the new error next week, unless someone else fixes it before then.

@AleksKnezevic
Author

Sounds good, thanks!

@akuegel
Member

akuegel commented Jul 10, 2024

cc2ec3f hopefully fixes this error related to not finding the gpu_sort_rewriter.h header

@akuegel
Member

akuegel commented Jul 11, 2024

I did two more fixes today: 15f0014 and 3e5c57e
Please let me know if you still see some errors.

@AleksKnezevic
Author

Thanks @akuegel! Those initial errors seem to have been resolved. I'm running into this now:

ERROR: /localdev/aknezevic/.cache/bazel/_bazel_aknezevic/f015ee4000db43b08516019840665d5e/external/shardy/shardy/dialect/sdy/ir/BUILD:145:11: Compiling shardy/dialect/sdy/ir/verifiers.cc failed: (Exit 1): clang failed: error executing command (from target @shardy//shardy/dialect/sdy/ir:dialect) /usr/lib/llvm-14/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 ... (remaining 120 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:708:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:708:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::OperandRange>, mlir::ValueTypeRange<llvm::MutableArrayRef<mlir::BlockArgument>>>'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:788:14: note: in instantiation of function template specialization 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::OperandRange>, mlir::ValueTypeRange<llvm::MutableArrayRef<mlir::BlockArgument>>>' requested here
  if (failed(verifyManualComputationValue(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:708:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::ResultRange>, mlir::TypeRange>'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:791:14: note: in instantiation of function template specialization 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::ResultRange>, mlir::TypeRange>' requested here
      failed(verifyManualComputationValue(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(

Any thoughts?

@akuegel
Member

akuegel commented Jul 11, 2024

@tomnatan30 as this is a compile error in shardy, maybe you can take a look?

@sixshotx

Hey, I am also following similar instructions to build xla from source for the CPU backend and ran into the following error:

In file included from ./xla/service/memory_space_assignment/memory_space_assignment.h:188:
./xla/service/memory_space_assignment/cost_analysis.h:118:3: warning: explicitly defaulted default constructor is implicitly deleted [-Wdefaulted-function-deleted]
  118 |   HloCostAnalysisCosts() = default;
      |   ^
./xla/service/memory_space_assignment/cost_analysis.h:120:26: note: default constructor of 'HloCostAnalysisCosts' is implicitly deleted because field 'hlo_cost_analysis_' of reference type 'const HloCostAnalysis &' would not be initialized
  120 |   const HloCostAnalysis& hlo_cost_analysis_;
      |                          ^
./xla/service/memory_space_assignment/cost_analysis.h:118:28: note: replace 'default' with 'delete'
  118 |   HloCostAnalysisCosts() = default;
      |                            ^~~~~~~
      |                            delete
xla/service/gpu/fusions/triton.cc:41:10: fatal error: 'xla/service/gpu/ir_emitter_triton.h' file not found
   41 | #include "xla/service/gpu/ir_emitter_triton.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Am I missing anything? Thanks.

@tomnatan30
Contributor

tomnatan30 commented Jul 11, 2024 via email

@tomnatan30
Contributor

tomnatan30 commented Jul 11, 2024 via email

@tomnatan30
Contributor

tomnatan30 commented Jul 11, 2024 via email

@AleksKnezevic
Author

I was just following the instructions on the build page:

./configure.py --backend=CPU
bazel build --test_output=all --spawn_strategy=sandboxed //xla/...

@tomnatan30
Contributor

tomnatan30 commented Jul 11, 2024 via email

@AleksKnezevic
Author

Hmm, I just tried a clean compile and got the same error. @sixshotx and @adeel10x, are you seeing the same thing? You'll have to pull the latest main to get @akuegel's fixes.

@akuegel
Member

akuegel commented Jul 12, 2024

@tomnatan30 I think this might be a gcc-specific compile error. We have seen similar cases in the XLA code base. The fix should be to explicitly capture valueIndex here: https://github.com/openxla/shardy/blob/main/shardy/dialect/sdy/ir/verifiers.cc#L706
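
As a minimal sketch of one portable way to use a loop's structured binding inside a lambda, assuming a C++17 compiler: naming a structured binding directly in a capture list is only standardized as of C++20, and older compiler releases may reject it, so the sketch uses init-captures instead. `DescribeEntries` and its data are hypothetical and are not the shardy code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Formats each (index, name) pair through a lambda, mirroring the pattern of
// a lambda that needs a value introduced by a structured binding.
inline std::vector<std::string> DescribeEntries(
    const std::vector<std::pair<int, std::string>>& entries) {
  std::vector<std::string> out;
  for (const auto& [index, name] : entries) {
    // Init-captures create ordinary closure members, so the lambda does not
    // depend on capturing the structured binding itself.
    auto describe = [index = index, &name = name]() {
      return name + " at index " + std::to_string(index);
    };
    out.push_back(describe());
  }
  return out;
}
```

The init-capture copies `index` by value and binds `name` by reference, which matches a capture list of the form `[valueIndex, &valueKindStr]` in intent.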

@akuegel
Member

akuegel commented Jul 12, 2024

openxla/shardy#8 should hopefully fix this error

@akuegel
Member

akuegel commented Jul 12, 2024

The fix has been merged in openxla/shardy@2da0ff5, @AleksKnezevic can you please give it another try?

Edit: never mind, I think I also need to update the pin of shardy in the xla repo.

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@AleksKnezevic
Author

Thanks @akuegel, lmk once it's in. I tried updating it to test, but I'm getting an

 Error applying patch /localdev/aknezevic/xla/third_party/shardy/shardy.patch

error.

@akuegel
Member

akuegel commented Jul 12, 2024

The patch file is obsolete if you update to the latest revision, so you can just remove everything in shardy.patch. Unfortunately, the change to update the revision requires approval from someone in the US.

@AleksKnezevic
Author

Awesome, building now. By the way, what tool do you use to see/browse third_party at the right commit and create these patches? I'm not too familiar with Bazel, and I'm used to git submodules.

@AleksKnezevic
Author

Now running into:

ERROR: /localdev/aknezevic/.cache/bazel/_bazel_aknezevic/f015ee4000db43b08516019840665d5e/external/shardy/shardy/dialect/sdy/ir/BUILD:113:11: Compiling shardy/dialect/sdy/ir/verifiers.cc failed: (Exit 1): clang failed: error executing command (from target @shardy//shardy/dialect/sdy/ir:dialect) /usr/lib/llvm-14/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 ... (remaining 121 arguments skipped)
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:706:33: error: 'valueIndex' in capture list does not name a variable
    auto emitManualAxesError = [valueIndex, &valueKindStr, &op](
                                ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:709:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:709:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::OperandRange>, mlir::ValueTypeRange<llvm::MutableArrayRef<mlir::BlockArgument>>>'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:789:14: note: in instantiation of function template specialization 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::OperandRange>, mlir::ValueTypeRange<llvm::MutableArrayRef<mlir::BlockArgument>>>' requested here
  if (failed(verifyManualComputationValue(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:709:42: error: reference to local binding 'valueIndex' declared in enclosing function 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::ResultRange>, mlir::TypeRange>'
             << " sharding at index " << valueIndex
                                         ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:792:14: note: in instantiation of function template specialization 'mlir::sdy::(anonymous namespace)::verifyManualComputationValue<mlir::ValueTypeRange<mlir::ResultRange>, mlir::TypeRange>' requested here
      failed(verifyManualComputationValue(
             ^
external/shardy/shardy/dialect/sdy/ir/verifiers.cc:665:14: note: 'valueIndex' declared here
  for (auto [valueIndex, valueEntry] : llvm::enumerate(llvm::zip_equal(
             ^

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@AleksKnezevic
Author

So this was with the latest commit in shardy:

/localdev/aknezevic/xla > git diff third_party/shardy/workspace.bzl
diff --git a/third_party/shardy/workspace.bzl b/third_party/shardy/workspace.bzl
index ca3f6421dc..29bdd6f908 100644
--- a/third_party/shardy/workspace.bzl
+++ b/third_party/shardy/workspace.bzl
@@ -3,13 +3,12 @@
 load("//third_party:repo.bzl", "tf_http_archive", "tf_mirror_urls")
 
 def repo():
-    SHARDY_COMMIT = "7afabee9bf7addaef719244fe0a605463738384d"
-    SHARDY_SHA256 = "7271375db347541dfd544fca1390b9aa700013849fba22c48b46bcf04a7f281a"
+    SHARDY_COMMIT = "2fe34232a42752d8ae972948db9fd48a6f33f412"
+    SHARDY_SHA256 = ""
 
     tf_http_archive(
         name = "shardy",
         sha256 = SHARDY_SHA256,
         strip_prefix = "shardy-{commit}".format(commit = SHARDY_COMMIT),
         urls = tf_mirror_urls("https://github.com/openxla/shardy/archive/{commit}.zip".format(commit = SHARDY_COMMIT)),
-        patch_file = ["//third_party/shardy:shardy.patch"],
     )

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@AleksKnezevic
Author

An empty SHA256 just skips the check, I believe, and I removed the patch_file line. The correct commit is applied; if you look at the errors above, they're different.

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@sixshotx

@AleksKnezevic I can compile xla successfully now with clang 17.0.6; sync point is 886d989e.

@akuegel Thanks for the quick fix in the gpu directory!

@AleksKnezevic
Author

@tomnatan30 on the latest commit I hit:

ERROR: /localdev/aknezevic/.cache/bazel/_bazel_aknezevic/f015ee4000db43b08516019840665d5e/external/shardy/shardy/dialect/sdy/transforms/propagation/BUILD:93:11: Compiling shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc failed: (Exit 1): clang failed: error executing command (from target @shardy//shardy/dialect/sdy/transforms/propagation:op_sharding_rule_registry) /usr/lib/llvm-14/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 ... (remaining 121 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:321:36: error: reference to local binding 'lhsDim' declared in enclosing lambda expression
                builder.addFactor({lhsDim, addRhs ? rhsDim : kNullDim},
                                   ^
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:302:22: note: 'lhsDim' declared here
          for (auto [lhsDim, rhsDim, outDim] :
                     ^
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:321:53: error: reference to local binding 'rhsDim' declared in enclosing lambda expression
                builder.addFactor({lhsDim, addRhs ? rhsDim : kNullDim},
                                                    ^
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:302:30: note: 'rhsDim' declared here
          for (auto [lhsDim, rhsDim, outDim] :
                             ^
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:322:44: error: reference to local binding 'outDim' declared in enclosing lambda expression
                                  addOut ? outDim : kNullDim, factorSize);
                                           ^
external/shardy/shardy/dialect/sdy/transforms/propagation/op_sharding_rule_registry.cc:302:38: note: 'outDim' declared here
          for (auto [lhsDim, rhsDim, outDim] :
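
The errors above come from a nested lambda that uses names introduced by a structured binding in its enclosing scope. As a minimal sketch of one way to sidestep that on compilers that reject it, assuming C++17: bind the tuple elements to ordinary local variables with `std::get`, which any lambda can capture. `SumSelected` and its arithmetic are hypothetical and only mimic the shape of the loop; this is not the shardy fix.

```cpp
#include <tuple>
#include <vector>

// Sums selected tuple elements, using a lambda inside the loop body.
inline int SumSelected(const std::vector<std::tuple<int, int, int>>& dims,
                       bool addRhs, bool addOut) {
  int total = 0;
  for (const auto& entry : dims) {
    // Ordinary locals instead of "auto [lhsDim, rhsDim, outDim]".
    const int lhsDim = std::get<0>(entry);
    const int rhsDim = std::get<1>(entry);
    const int outDim = std::get<2>(entry);
    // Capturing plain variables by reference is valid in every C++11+ mode.
    auto addFactor = [&]() {
      total += lhsDim + (addRhs ? rhsDim : 0) + (addOut ? outDim : 0);
    };
    addFactor();
  }
  return total;
}
```

This costs three extra lines per loop but does not depend on the compiler's support for capturing structured bindings.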
                     

@tomnatan30
Contributor

tomnatan30 commented Jul 12, 2024 via email

@AleksKnezevic
Author

After pulling to latest main (which includes all of your changes @tomnatan30), I hit the following error:

ERROR: /localdev/aknezevic/xla/xla/service/gpu/model/fuzztest/BUILD:11:28: no such package 'tools': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /localdev/aknezevic/xla/tools and referenced by '//xla/service/gpu/model/fuzztest:affine_grammar_source'
ERROR: Analysis of target '//xla/service/gpu/model/fuzztest:affine_grammar_source' failed; build aborted: Analysis failed
INFO: Elapsed time: 17.830s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (168 packages loaded, 688 targets configured)

@jreiffers, seems to be related to something you added. If I comment out this BUILD file the compile passes successfully. Thanks everyone for your help!

@tomnatan30
Contributor

tomnatan30 commented Jul 15, 2024 via email

@jreiffers
Member

Hm, this is a target generated by googlefuzztest. The relevant line is here. I think the problem is that this tool is within fuzztest, not within XLA, so the path is wrong when we generate the target within XLA. I'll try to find someone who built the tooling.

@akuegel
Member

akuegel commented Jul 16, 2024

@jreiffers I have already sent a change to googlefuzztest that I hope will fix this.

@akuegel
Member

akuegel commented Jul 18, 2024

#15060 is the attempt to update the revision of google_fuzztest that contains the fix. Unfortunately it is blocked because at head, google_fuzztest is now also using Riegeli, and we don't necessarily want to pull in that dependency as well.
