-
Notifications
You must be signed in to change notification settings - Fork 26
feat: add kill_infrastructure
implementation for vertex ai worker
#213
Conversation
|
||
# Vertex will include an error message upon valid | ||
# flow cancellations, so we'll avoid raising an error in that case | ||
if error_msg and "CANCELED" not in error_msg: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mock = job_config.credentials.job_service_client.create_custom_job | ||
# the CancelCustomJobRequest class seems to reject a MagicMock value | ||
# so here, we'll use a SimpleNamespace as the mocked return values | ||
mock.return_value = SimpleNamespace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not fully understanding why, but returning a plain MagicMock here with the correct name
and state
keys would cause the test to fail, somewhere in the CancelCustomJobRequest
class
self = <[AttributeError('Unknown field for CancelCustomJobRequest: _pb') raised in repr()] CancelCustomJobRequest object at 0x168bb4490>
mapping = {'name': <MagicMock name='mock_name.name' id='5749201248'>}, ignore_unknown_fields = False, kwargs = {'name': <MagicMock name='mock_name.name' id='5749201248'>}
params = {'name': <MagicMock name='mock_name.name' id='5749201248'>}, marshal = <proto.marshal.marshal.Marshal object at 0x106f4c790>, key = 'name'
value = <MagicMock name='mock_name.name' id='5749201248'>, pb_value = <MagicMock name='mock_name.name' id='5749201248'>
so, here i just use a SimpleNamespace
to return the right key values that can be access via dot notation when it's being passed into CancelCustomJobRequest
downstream
title="Accelerator Count", | ||
description=( | ||
"The number of accelerators to attach to the machine. " | ||
"See https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec" | ||
), | ||
default=0, | ||
example=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unrelated to the kill infra path, but something i found while testing this further
both the accelerator_*
params are optional, and they have an XOR relationship - both have to exist, or both have to be null. so, i swapped these to be optional + removed the keys from the job body validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these to be optional in the UI, they'll need a default value of None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok gotcha. Do you get errors in the worker when you don't fill in those fields? I'd expect Pydantic to complain that no value is provided if there's no default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's so weird, i would expect that as well. but i just created a new work pool and started the worker, and:
$ prefect work-pool create 'a-new-pool' --type vertex-ai
Created work pool 'a-new-pool'.
$ prefect worker start -p a-new-pool
Discovered type 'vertex-ai' for work pool 'a-new-pool'.
Worker 'VertexAIWorker b1b27bbe-1070-4f5a-9ccc-f63a6e7adad1' started!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't see the error until a flow run is prepared to be run because that's when a configuration object is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i didn't see it at all during my e2e testing, where i created a bunch of vertex jobs. does it make a difference that these properties are on the variables class, not the configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw i added the default None
values in, this thread is more for my understanding / sanity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ I didn't realize this is the variables class. No objects are ever created from this class, so you wouldn't see that error. It all makes sense now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok whew, sanity restored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks great! I have one small comment and I'm going to push some changelog updates to this branch so that we can release once this is merged!
title="Accelerator Count", | ||
description=( | ||
"The number of accelerators to attach to the machine. " | ||
"See https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec" | ||
), | ||
default=0, | ||
example=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these to be optional in the UI, they'll need a default value of None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀
Closes #212
Followup to #211
Example
Screenshots
Checklist
pre-commit
checks.pre-commit install && pre-commit run --all
locally for formatting and linting.mkdocs serve
view documentation locally.