-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HyperAI integration: orchestrator and service connector #2372
Merged
stefannica
merged 140 commits into
zenml-io:develop
from
christianversloot:hyperai-integration
Feb 6, 2024
Merged
HyperAI integration: orchestrator and service connector #2372
stefannica
merged 140 commits into
zenml-io:develop
from
christianversloot:hyperai-integration
Feb 6, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
schustmi
approved these changes
Feb 2, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🦭
New dependencies detected. Learn more about Socket for GitHub ↗︎
|
src/zenml/integrations/hyperai/orchestrators/hyperai_orchestrator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/hyperai/orchestrators/hyperai_orchestrator.py
Outdated
Show resolved
Hide resolved
…le and fix is_remote
…nml into hyperai-integration
…nml into hyperai-integration
kabinja
pushed a commit
to kabinja/zenml
that referenced
this pull request
Feb 6, 2024
* Add init for HyperAI integration * WIP: HyperAI service connector * WIP * WIP: HyperAI Service Connector * WIP: HyperAI Orchestrator * Replace Docker compose write with temporary file and SCP * Variable assignment error * Set dependency * Set basic values of the HyperAI settings and config * Add config property * Allow mounts to be made * Remove newline * Finish (untested) orchestrator * Import HyperAI integration * Import HyperAI service connector in service connector registry * Rename resource type * Rename auth method * Force key to be base64 * Fixes to service connector * Identify instance by name and IP address * Strip IP address Python * Strip IP address Python * Return paramiko client * WIP * Mimic sagemaker integration * Fixes to make HyperAI orchestrator visible * Fixes to make orchestrator work * Temp change default local ip for testing * Environment fix * Use upstream steps to determine dependencies * Add support for scheduled pipelines * Polish schedules * Add configuration support for multiple Paramiko key types * Add Base64 instructions * Rename various vars * Add instructions about possible cron * Some docstring edits * Add setting for CR autologin * Add rudimentary Docker login * Move value * Add docstring * Remove unused def * Extract Paramiko key type given service connector configuration * Add better warnings * Check for None differently * Automatic Docker login if configured * Add HyperAI orchestrator flavor to docs * Basic docs for HyperAI orchestrator * Add HyperAI service connector to auth management docs * Add HyperAI service connector to docs * Set autologin to False by default * Add test similar to Airflow orchestrator * Formatting * Revert changes needed to run successfully locally * Add mount path validation * Improve error handling and formatting * Format mount paths differently * Upgrade azureml-core to 1.54.0.post1 * Fix docstring * Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py Co-authored-by: Michael Schuster <[email protected]> * Rename def into _validate_mount_paht * Update config docstring to default to False * Move Settings, Config and Flavor to lavor folder * Remove type from docstring * Remove type from docstring * Remove type check convered by pydantic * Select container registry more efficiently * Remove redundant type conversion * Move Paramiko client creation into helper method * Reformatting * Fix imports * Temp changes for local testing * Fix imports * Revert "Temp changes for local testing" This reverts commit 76fdb29. * Rename HYPERAI_RESOURCE_TYPE into hyperai-instance * Rename ip_address into hostname * Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py Co-authored-by: Stefan Nica <[email protected]> * Raise AuthorizationException if client cannot be created * Remove RuntimeError in two places because it will never arrive in that state anymore * Remove try/catch statement * Let exception fall through if applicable * Remove raises * Add warning hint about long-lived credentials * Renames in docs based on changes * Add missing io import * Formatting * Add automatic_cleanup_pipeline_files to HyperAIOrchestratorConfig * Remove redundant variable assignment * Clean only if users configure auto cleaning * Update docs * Work in progress: multi IP service connector * Resources * Append hostname instead * Omit assigning value * Rename config value * Ensure that hostname is passed to Paramiko client * Raise NotImplementedError instead of pass value * Formatting * Changes to _verify * Reflect changes in service connector docs * Fix connector value validation to allow arrays to be used with the CLI * Reflect changes in orchestrator docs * Fix connector verification to allow the multi-instance case * Ensure that pipelines can run when scheduled by setting run ID dynamically * Reformatting * Add information about scheduled pipelines to docs * Use service connector username to create Compose files on instance * Add GPU reservation if configured that way * Formatting * Add instruction * Add prerequisites for HyperAI instance * Formatting and docstrings * Fixed remaining linter errors * Applied review suggestions * Add paramiko to API docs mocks * HyperAI orchestrator config tests; make additional assertions available and fix is_remote * Remove GPU-based Dockerfile * Ensure that shell commands are escaped when used * Provide password to stdin differently * Escape case where file cannot be written to HyperAI instance * Escape inputs differently * Use network mode host to avoid non-overlapping IPv4 network pool error * Disable security check for paramiko auto-add-policy * Changes to escaping * Silenced remaining security issues and fixed remaining linter errors --------- Co-authored-by: Michael Schuster <[email protected]> Co-authored-by: Stefan Nica <[email protected]> Co-authored-by: Alex Strick van Linschoten <[email protected]>
adtygan
pushed a commit
to adtygan/zenml
that referenced
this pull request
Mar 21, 2024
* Add init for HyperAI integration * WIP: HyperAI service connector * WIP * WIP: HyperAI Service Connector * WIP: HyperAI Orchestrator * Replace Docker compose write with temporary file and SCP * Variable assignment error * Set dependency * Set basic values of the HyperAI settings and config * Add config property * Allow mounts to be made * Remove newline * Finish (untested) orchestrator * Import HyperAI integration * Import HyperAI service connector in service connector registry * Rename resource type * Rename auth method * Force key to be base64 * Fixes to service connector * Identify instance by name and IP address * Strip IP address Python * Strip IP address Python * Return paramiko client * WIP * Mimic sagemaker integration * Fixes to make HyperAI orchestrator visible * Fixes to make orchestrator work * Temp change default local ip for testing * Environment fix * Use upstream steps to determine dependencies * Add support for scheduled pipelines * Polish schedules * Add configuration support for multiple Paramiko key types * Add Base64 instructions * Rename various vars * Add instructions about possible cron * Some docstring edits * Add setting for CR autologin * Add rudimentary Docker login * Move value * Add docstring * Remove unused def * Extract Paramiko key type given service connector configuration * Add better warnings * Check for None differently * Automatic Docker login if configured * Add HyperAI orchestrator flavor to docs * Basic docs for HyperAI orchestrator * Add HyperAI service connector to auth management docs * Add HyperAI service connector to docs * Set autologin to False by default * Add test similar to Airflow orchestrator * Formatting * Revert changes needed to run successfully locally * Add mount path validation * Improve error handling and formatting * Format mount paths differently * Upgrade azureml-core to 1.54.0.post1 * Fix docstring * Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py Co-authored-by: Michael Schuster <[email protected]> * Rename def into _validate_mount_paht * Update config docstring to default to False * Move Settings, Config and Flavor to lavor folder * Remove type from docstring * Remove type from docstring * Remove type check convered by pydantic * Select container registry more efficiently * Remove redundant type conversion * Move Paramiko client creation into helper method * Reformatting * Fix imports * Temp changes for local testing * Fix imports * Revert "Temp changes for local testing" This reverts commit 76fdb29. * Rename HYPERAI_RESOURCE_TYPE into hyperai-instance * Rename ip_address into hostname * Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py Co-authored-by: Stefan Nica <[email protected]> * Raise AuthorizationException if client cannot be created * Remove RuntimeError in two places because it will never arrive in that state anymore * Remove try/catch statement * Let exception fall through if applicable * Remove raises * Add warning hint about long-lived credentials * Renames in docs based on changes * Add missing io import * Formatting * Add automatic_cleanup_pipeline_files to HyperAIOrchestratorConfig * Remove redundant variable assignment * Clean only if users configure auto cleaning * Update docs * Work in progress: multi IP service connector * Resources * Append hostname instead * Omit assigning value * Rename config value * Ensure that hostname is passed to Paramiko client * Raise NotImplementedError instead of pass value * Formatting * Changes to _verify * Reflect changes in service connector docs * Fix connector value validation to allow arrays to be used with the CLI * Reflect changes in orchestrator docs * Fix connector verification to allow the multi-instance case * Ensure that pipelines can run when scheduled by setting run ID dynamically * Reformatting * Add information about scheduled pipelines to docs * Use service connector username to create Compose files on instance * Add GPU reservation if configured that way * Formatting * Add instruction * Add prerequisites for HyperAI instance * Formatting and docstrings * Fixed remaining linter errors * Applied review suggestions * Add paramiko to API docs mocks * HyperAI orchestrator config tests; make additional assertions available and fix is_remote * Remove GPU-based Dockerfile * Ensure that shell commands are escaped when used * Provide password to stdin differently * Escape case where file cannot be written to HyperAI instance * Escape inputs differently * Use network mode host to avoid non-overlapping IPv4 network pool error * Disable security check for paramiko auto-add-policy * Changes to escaping * Silenced remaining security issues and fixed remaining linter errors --------- Co-authored-by: Michael Schuster <[email protected]> Co-authored-by: Stefan Nica <[email protected]> Co-authored-by: Alex Strick van Linschoten <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Describe changes
HyperAI (hyperai.ai) has recently become one of our suppliers of GPU instances. Unfortunately, unlike the major public cloud providers, HyperAI does not yet have an SDK like
sagemaker
. Still, it is critical for us to keep using ZenML, as many of our pipelines are built that way. This PR implements a HyperAI integration by means of an orchestrator and service connector.Service connector:
paramiko
, it provides an authenticatedSSHClient
given the configuredip_address
(with an optionalinstance_name
serving as a nickname to distinguish multiple IP addresses),username
,base64_ssh_key
and optionallyssh_passphrase
.The orchestrator:
service_completed_successfully
depends_on
condition to compose a Docker Compose file which guarantees the order of execution, including more complex pipelines by usingstep.spec.upstream_steps
.Logo:
The code assumes a HyperAI logo to be present in a seemingly public bucket on your end. I can ask the HyperAI team to provide a proper logo that can be put in this bucket so that it's visible within the ZenML dashboard.
This way:
Testing:
As discussed with @htahir1 , we're actively awaiting HyperAI usage (and so is the HyperAI team, as they've been in touch as far as I know) so we'd prefer to start using this integration as soon as possible. Do note that this does not mean that we should merge it recklessly, we're just very excited :)
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes
Summary by CodeRabbit