VertexAICustomTrainingJob does not have accelerator_count #175

jeremy-thomas-roc · 2023-04-24T13:53:14Z

Expectation / Proposal

I expected that using the VertexAICustomTrainingJob class to attach a GPU to a custom training job would work. However, the MachineSpec that the block submits does not include an accelerator_count, so specifying an accelerator_type breaks the block: Vertex AI rejects the job with an INVALID_ARGUMENT error.

Traceback / Example

"Submission failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "List of found errors: 1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none. "
    debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {created_time:"2023-04-24T13:37:52.582665137+00:00", grpc_status:3, grpc_message:"List of found errors:\t1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none.\t"}"
>

The above exception was the direct cause of the following exception:

google.api_core.exceptions.InvalidArgument: 400 List of found errors: 1.Field: job_spec.worker_pool_specs[0].machine_spec.accelerator_type; Message: Both accelerator_type and accelerator_count should be specified or none. [field_violations { field: "job_spec.worker_pool_specs[0].machine_spec.accelerator_type" description: "Both accelerator_type and accelerator_count should be specified or none." } ]"
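For illustration, the rule the API enforces in the error above can be sketched in plain Python. This is only a rough sketch, not the code from the block or from #174: `build_machine_spec` is a hypothetical helper, and the plain dict stands in for the real MachineSpec message. The field names `machine_type`, `accelerator_type`, and `accelerator_count` follow the error text; the example values are assumptions.

```python
from typing import Optional


def build_machine_spec(
    machine_type: str,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build a machine spec as a plain dict.

    Mirrors the server-side rule from the traceback: accelerator_type and
    accelerator_count must be given together or both left out.
    """
    has_type = accelerator_type is not None
    has_count = accelerator_count is not None
    if has_type != has_count:
        raise ValueError(
            "Both accelerator_type and accelerator_count should be "
            "specified or none."
        )

    spec = {"machine_type": machine_type}
    if has_type:
        if accelerator_count < 1:
            raise ValueError("accelerator_count must be at least 1.")
        # Always send the pair together so the API accepts the spec.
        spec["accelerator_type"] = accelerator_type
        spec["accelerator_count"] = accelerator_count
    return spec


# Example values below are assumptions chosen for illustration only.
gpu_spec = build_machine_spec("n1-standard-4", "NVIDIA_TESLA_T4", 1)
cpu_spec = build_machine_spec("n1-standard-4")
```

Checking the pair client-side like this would surface the problem before submission instead of as a 400 from the service.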

I would like to help contribute a pull request to resolve this!

I opened #174 for this fix

jeremy-thomas-roc mentioned this issue Apr 24, 2023

Add accelerator_count for VertexAICustomTrainingJob #174

Merged

desertaxle closed this as completed in #174 May 2, 2023
