This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

VertexAICustomTrainingJob does not have accelerator_count #175

Closed
1 task done
jeremy-thomas-roc opened this issue Apr 24, 2023 · 0 comments · Fixed by #174


jeremy-thomas-roc commented Apr 24, 2023

Expectation / Proposal

I expected that using the VertexAICustomTrainingJob class to attach a GPU to a custom training job would work. It turns out that the MachineSpec submitted does not include an accelerator_count, so specifying an accelerator_type makes the submission fail: the API rejects a MachineSpec that sets only one of the two fields (see the traceback below).

Traceback / Example

"Submission failed. Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "List of found errors: 1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none. "
    debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {created_time:"2023-04-24T13:37:52.582665137+00:00", grpc_status:3, grpc_message:"List of found errors:\t1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none.\t"}"
>

The above exception was the direct cause of the following exception:

google.api_core.exceptions.InvalidArgument: 400 List of found errors: 1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none. [field_violations { field: "job_spec.worker_pool_specs[0].machine_spec.accelerator_type" description: "Both accelerator_type and accelerator_count should be specified or none." } ]"

I opened #174 to fix this.
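
For anyone hitting the same error, here is a minimal sketch of the both-or-neither rule that the API reports in the traceback above. The helper build_machine_spec is hypothetical and not part of any library or of #174; the dict keys follow the field names in the error message, and the machine type and accelerator type values are only illustrative.

```python
from typing import Optional


def build_machine_spec(
    machine_type: str,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
) -> dict:
    """Build a MachineSpec-shaped dict (hypothetical helper, for illustration only)."""
    # Mirror the server-side check: either both accelerator fields are set,
    # or neither is. Setting only one is what triggers INVALID_ARGUMENT.
    if (accelerator_type is None) != (accelerator_count is None):
        raise ValueError(
            "Both accelerator_type and accelerator_count should be specified or none."
        )

    spec = {"machine_type": machine_type}
    if accelerator_type is not None:
        spec["accelerator_type"] = accelerator_type
        spec["accelerator_count"] = accelerator_count
    return spec


# GPU job: both fields present.
gpu_spec = build_machine_spec("n1-standard-4", "NVIDIA_TESLA_T4", 1)

# CPU-only job: both fields absent.
cpu_spec = build_machine_spec("n1-standard-4")
```

Catching the mismatch client-side like this surfaces the problem before the job is submitted, rather than as a gRPC error from the service.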
