HyperAI integration: orchestrator and service connector (zenml-io#2372)
* Add init for HyperAI integration

* WIP: HyperAI service connector

* WIP

* WIP: HyperAI Service Connector

* WIP: HyperAI Orchestrator

* Replace Docker compose write with temporary file and SCP

* Variable assignment error

* Set dependency

* Set basic values of the HyperAI settings and config

* Add config property

* Allow mounts to be made

* Remove newline

* Finish (untested) orchestrator

* Import HyperAI integration

* Import HyperAI service connector in service connector registry

* Rename resource type

* Rename auth method

* Force key to be base64

* Fixes to service connector

* Identify instance by name and IP address

* Strip IP address Python

* Strip IP address Python

* Return paramiko client

* WIP

* Mimic sagemaker integration

* Fixes to make HyperAI orchestrator visible

* Fixes to make orchestrator work

* Temp change default local ip for testing

* Environment fix

* Use upstream steps to determine dependencies

* Add support for scheduled pipelines

* Polish schedules

* Add configuration support for multiple Paramiko key types

* Add Base64 instructions

* Rename various vars

* Add instructions about possible cron

* Some docstring edits

* Add setting for CR autologin

* Add rudimentary Docker login

* Move value

* Add docstring

* Remove unused def

* Extract Paramiko key type given service connector configuration

* Add better warnings

* Check for None differently

* Automatic Docker login if configured

* Add HyperAI orchestrator flavor to docs

* Basic docs for HyperAI orchestrator

* Add HyperAI service connector to auth management docs

* Add HyperAI service connector to docs

* Set autologin to False by default

* Add test similar to Airflow orchestrator

* Formatting

* Revert changes needed to run successfully locally

* Add mount path validation

* Improve error handling and formatting

* Format mount paths differently

* Upgrade azureml-core to 1.54.0.post1

* Fix docstring

* Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py

Co-authored-by: Michael Schuster <[email protected]>

* Rename def into _validate_mount_paht

* Update config docstring to default to False

* Move Settings, Config and Flavor to flavors folder

* Remove type from docstring

* Remove type from docstring

* Remove type check covered by pydantic

* Select container registry more efficiently

* Remove redundant type conversion

* Move Paramiko client creation into helper method

* Reformatting

* Fix imports

* Temp changes for local testing

* Fix imports

* Revert "Temp changes for local testing"

This reverts commit 76fdb29.

* Rename HYPERAI_RESOURCE_TYPE into hyperai-instance

* Rename ip_address into hostname

* Update src/zenml/integrations/hyperai/service_connectors/hyperai_service_connector.py

Co-authored-by: Stefan Nica <[email protected]>

* Raise AuthorizationException if client cannot be created

* Remove RuntimeError in two places because it will never arrive in that state anymore

* Remove try/catch statement

* Let exception fall through if applicable

* Remove raises

* Add warning hint about long-lived credentials

* Renames in docs based on changes

* Add missing io import

* Formatting

* Add automatic_cleanup_pipeline_files to HyperAIOrchestratorConfig

* Remove redundant variable assignment

* Clean only if users configure auto cleaning

* Update docs

* Work in progress: multi IP service connector

* Resources

* Append hostname instead

* Omit assigning value

* Rename config value

* Ensure that hostname is passed to Paramiko client

* Raise NotImplementedError instead of pass value

* Formatting

* Changes to _verify

* Reflect changes in service connector docs

* Fix connector value validation to allow arrays to be used with the CLI

* Reflect changes in orchestrator docs

* Fix connector verification to allow the multi-instance case

* Ensure that pipelines can run when scheduled by setting run ID dynamically

* Reformatting

* Add information about scheduled pipelines to docs

* Use service connector username to create Compose files on instance

* Add GPU reservation if configured that way

* Formatting

* Add instruction

* Add prerequisites for HyperAI instance

* Formatting and docstrings

* Fixed remaining linter errors

* Applied review suggestions

* Add paramiko to API docs mocks

* HyperAI orchestrator config tests; make additional assertions available and fix is_remote

* Remove GPU-based Dockerfile

* Ensure that shell commands are escaped when used

* Provide password to stdin differently

* Escape case where file cannot be written to HyperAI instance

* Escape inputs differently

* Use network mode host to avoid non-overlapping IPv4 network pool error

* Disable security check for paramiko auto-add-policy

* Changes to escaping

* Silenced remaining security issues and fixed remaining linter errors

---------

Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Stefan Nica <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>
4 people authored and kabinja committed Feb 6, 2024
1 parent ce9a34f commit c1abd87
Showing 22 changed files with 1,359 additions and 3 deletions.
@@ -101,6 +101,11 @@ zenml service-connector list-types
┃ │ │ 🌀 kubernetes-cluster │ service-account │ │ ┃
┃ │ │ 🐳 docker-registry │ oauth2-token │ │ ┃
┃ │ │ │ impersonation │ │ ┃
┠──────────────────────────────┼───────────────┼───────────────────────┼──────────────────┼───────┼────────┨
┃ HyperAI Service Connector │ 🤖 hyperai │ 🤖 hyperai-instance │ rsa-key │ ✅ │ ✅ ┃
┃ │ │ │ dsa-key │ │ ┃
┃ │ │ │ ecdsa-key │ │ ┃
┃ │ │ │ ed25519-key │ │ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛
```
{% endcode %}
@@ -0,0 +1,57 @@
---
description: Configuring HyperAI Connectors to connect ZenML to HyperAI instances.
---

# HyperAI Service Connector

The ZenML HyperAI Service Connector allows authenticating with a HyperAI instance for deployment of pipeline runs. This connector provides pre-authenticated Paramiko SSH clients to Stack Components that are linked to it.

```
$ zenml service-connector list-types --type hyperai
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓
┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH METHODS │ LOCAL │ REMOTE ┃
┠───────────────────────────┼────────────┼────────────────────┼──────────────┼───────┼────────┨
┃ HyperAI Service Connector │ 🤖 hyperai │ 🤖 hyperai-instance │ rsa-key │ ✅ │ ✅ ┃
┃ │ │ │ dsa-key │ │ ┃
┃ │ │ │ ecdsa-key │ │ ┃
┃ │ │ │ ed25519-key │ │ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛
```

## Prerequisites
The HyperAI Service Connector is part of the HyperAI integration. It is necessary to install the integration in order to use this Service Connector:

* `zenml integration install hyperai` installs the HyperAI integration

## Resource Types
The HyperAI Service Connector supports HyperAI instances.

## Authentication Methods
ZenML creates an SSH connection to the HyperAI instance in the background when using this Service Connector. It then provides these connections to stack components requiring them, such as the HyperAI Orchestrator. Multiple authentication methods are supported:

1. RSA key-based authentication.
2. DSA (DSS) key-based authentication.
3. ECDSA key-based authentication.
4. ED25519 key-based authentication.

{% hint style="warning" %}
SSH private keys configured in the connector will be distributed to all clients that use them to run pipelines with the HyperAI orchestrator. SSH keys are long-lived credentials that give unrestricted access to HyperAI instances.
{% endhint %}

When configuring the Service Connector, you must provide at least one hostname via `hostnames` and the `username` with which to log in. Optionally, you can provide an `ssh_passphrase` if applicable. This allows you to use the HyperAI service connector in multiple ways:

1. Create one service connector per HyperAI instance with different SSH keys.
2. Configure a reused SSH key just once for multiple HyperAI instances, then select the individual instance when creating the HyperAI orchestrator component.

## Auto-configuration

{% hint style="info" %}
This Service Connector does not support auto-discovery and extraction of authentication credentials from HyperAI instances. If this feature is useful to you or your organization, please let us know by messaging us in [Slack](https://zenml.io/slack-invite) or [creating an issue on GitHub](https://github.com/zenml-io/zenml/issues).
{% endhint %}

## Stack Components use

The HyperAI Service Connector can be used by the HyperAI Orchestrator to deploy pipeline runs to HyperAI instances.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -0,0 +1,90 @@
---
description: Orchestrating your pipelines to run on HyperAI instances.
---

# HyperAI orchestrator
[HyperAI](https://www.hyperai.ai) is a cutting-edge cloud compute platform designed to make AI accessible for everyone. The HyperAI orchestrator is an [orchestrator](orchestrators.md) flavor that allows you to easily deploy your pipelines on HyperAI instances.

{% hint style="warning" %}
This component is only meant to be used within the context of
a [remote ZenML deployment scenario](/docs/book/deploying-zenml/zenml-self-hosted/zenml-self-hosted.md).
Usage with a local ZenML deployment may lead to unexpected behavior!
{% endhint %}

### When to use it

You should use the HyperAI orchestrator if:

* you're looking for a managed solution for running your pipelines.
* you're a HyperAI customer.

### Prerequisites
You will need to do the following to start using the HyperAI orchestrator:

* Have a running HyperAI instance. It must be accessible from the internet (or at least from the IP addresses of your ZenML users) and allow SSH key based access (passwords are not supported).
* Ensure that a recent version of Docker is installed. This version must include Docker Compose, meaning that the command `docker compose` works.
* Ensure that the appropriate [NVIDIA Driver](https://www.nvidia.com/en-us/drivers/unix/) is installed on the HyperAI instance (if not already installed by the HyperAI team).
* Ensure that the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) is installed and configured on the HyperAI instance.

Note that it is possible to omit installing the NVIDIA Driver and NVIDIA Container Toolkit. However, you will then be unable to use the GPU from within your ZenML pipeline. Additionally, you will then need to disable GPU access within the container when configuring the Orchestrator component, or the pipeline will not start correctly.

## How it works
The HyperAI orchestrator works with Docker Compose, which can be used to construct machine learning pipelines.
Under the hood, it creates a Docker Compose file which it then deploys and executes on the configured HyperAI instance.
For each ZenML pipeline step, it creates a service in this file. It uses the `service_completed_successfully` condition
to ensure that pipeline steps will only run if their connected upstream steps have successfully finished.
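As a simplified sketch of this translation (the function name and entrypoint command below are illustrative assumptions, not ZenML's actual implementation), each step becomes a Compose service whose `depends_on` entries are gated on the `service_completed_successfully` condition of its upstream steps:

```python
def build_compose_services(steps, image):
    """Build a Docker Compose `services` mapping from step dependencies.

    `steps` maps each step name to the list of its upstream step names.
    This is an illustrative sketch of the structure, not ZenML's code.
    """
    services = {}
    for step_name, upstream_steps in steps.items():
        service = {
            "image": image,
            # Hypothetical entrypoint; ZenML's real step command differs.
            "command": ["zenml-entrypoint", "--step", step_name],
        }
        if upstream_steps:
            # Gate this step on successful completion of its upstream steps.
            service["depends_on"] = {
                upstream: {"condition": "service_completed_successfully"}
                for upstream in upstream_steps
            }
        services[step_name] = service
    return services


services = build_compose_services(
    {"load_data": [], "train": ["load_data"], "evaluate": ["train"]},
    image="my-registry/zenml-pipeline:latest",
)
```

Serializing such a mapping under a top-level `services:` key yields a Compose file in which `docker compose up` runs the steps in dependency order.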

If configured for it, the HyperAI orchestrator will connect the HyperAI instance to the stack's container registry to ensure
a smooth transfer of Docker images.

### Scheduled pipelines

[Scheduled pipelines](../../../user-guide/advanced-guide/pipelining-features/schedule-pipeline-runs.md) are supported by the HyperAI orchestrator. Currently, only cron expressions are supported via `cron_expression`. When pipeline runs are scheduled, they are added as a crontab entry
on the HyperAI instance.
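For illustration (the exact command ZenML registers on the instance is an implementation detail, and the file path below is hypothetical), a scheduled run reduces to a single crontab line that pairs the configured cron expression with a deployment command:

```python
def crontab_entry(cron_expression, command):
    """Format a crontab line from a cron expression and a shell command.

    Illustrative sketch only; ZenML's actual scheduled command differs.
    """
    # A standard crontab expression has exactly five time fields.
    if len(cron_expression.split()) != 5:
        raise ValueError("a cron expression must have exactly 5 fields")
    return f"{cron_expression} {command}"


entry = crontab_entry(
    "0 2 * * *",  # every day at 02:00
    "docker compose -f /home/zenml/pipeline.compose.yml up -d",  # hypothetical path
)
```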

### How to deploy it
To use the HyperAI orchestrator, you must configure a HyperAI Service Connector in ZenML and link it to the HyperAI orchestrator
component. The service connector contains credentials with which ZenML connects to the HyperAI instance.

Additionally, the HyperAI orchestrator must be used in a stack that contains a container registry and an image builder.

### How to use it

To use the HyperAI orchestrator, we must configure a HyperAI Service Connector first using one of its supported authentication
methods. For example, for authentication with an RSA-based key, create the service connector as follows:

```shell
zenml service-connector register <SERVICE_CONNECTOR_NAME> --type=hyperai --auth-method=rsa-key --base64_ssh_key=<BASE64_SSH_KEY> --hostnames=<INSTANCE_1>,<INSTANCE_2>,..,<INSTANCE_N> --username=<INSTANCE_USERNAME>
```

Hostnames are either DNS resolvable names or IP addresses.

For example, if you have two servers - one at `1.2.3.4` and another at `4.3.2.1`, you could provide them as `--hostnames=1.2.3.4,4.3.2.1`.

Optionally, it is possible to provide a passphrase for the key (`--ssh_passphrase`).
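The `<BASE64_SSH_KEY>` value is the Base64 encoding of your private key file's raw contents. One way to produce it (the key path is an example):

```python
import base64
from pathlib import Path


def encode_ssh_key(key_path):
    """Return the Base64 encoding of a private key file's contents."""
    return base64.b64encode(Path(key_path).read_bytes()).decode("ascii")


# Example (path is an assumption; use your actual key file):
# encoded = encode_ssh_key(Path.home() / ".ssh" / "id_ed25519")
```

On many Linux systems, `base64 -w0 ~/.ssh/id_ed25519` should produce the same value from the shell.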

After registering the service connector, we can register the orchestrator and use it in our active stack:

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> --flavor=hyperai

# Register and activate a stack with the new orchestrator
zenml stack register <STACK_NAME> -o <ORCHESTRATOR_NAME> ... --set
```

You can now run any ZenML pipeline using the HyperAI orchestrator:

```shell
python file_that_runs_a_zenml_pipeline.py
```

#### Enabling CUDA for GPU-backed hardware

Note that if you wish to use this orchestrator to run steps on a GPU, you will need to
follow [the instructions on this page](/docs/book/user-guide/advanced-guide/infrastructure-management/scale-compute-to-the-cloud.md) to ensure
that it works. It requires some additional settings customization and is essential for enabling CUDA so the GPU can
deliver its full acceleration.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

@@ -38,6 +38,7 @@ Additional orchestrators are provided by integrations:
| [SkypilotAWSOrchestrator](skypilot-vm.md) | `vm_aws` | `skypilot[aws]` | Runs your pipelines in AWS VMs using SkyPilot |
| [SkypilotGCPOrchestrator](skypilot-vm.md) | `vm_gcp` | `skypilot[gcp]` | Runs your pipelines in GCP VMs using SkyPilot |
| [SkypilotAzureOrchestrator](skypilot-vm.md) | `vm_azure` | `skypilot[azure]` | Runs your pipelines in Azure VMs using SkyPilot |
| [HyperAIOrchestrator](hyperai.md) | `hyperai` | `hyperai` | Runs your pipelines on HyperAI instances |
| [Custom Implementation](custom.md) | _custom_ | | Extend the orchestrator abstraction and provide your own implementation |

If you would like to see the available flavors of orchestrators, you can use the command:
1 change: 1 addition & 0 deletions docs/mocked_libs.json
@@ -144,6 +144,7 @@
"neptune",
"neuralprophet",
"openai",
"paramiko",
"polars",
"pyarrow",
"pyarrow.parquet",
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -152,6 +152,7 @@ types-certifi = { version = "^2021.10.8.0", optional = true }
types-croniter = { version = "^1.0.2", optional = true }
types-futures = { version = "^3.3.1", optional = true }
types-Markdown = { version = "^3.3.6", optional = true }
types-paramiko = { version = ">=3.4.0", optional = true }
types-Pillow = { version = "^9.2.1", optional = true }
types-protobuf = { version = "^3.18.0", optional = true }
types-PyMySQL = { version = "^1.0.4", optional = true }
@@ -232,6 +233,7 @@ dev = [
"types-croniter",
"types-futures",
"types-Markdown",
"types-paramiko",
"types-Pillow",
"types-protobuf",
"types-PyMySQL",
8 changes: 6 additions & 2 deletions src/zenml/cli/utils.py
@@ -915,12 +915,14 @@ def prompt_configuration(
config_dict = {}
for attr_name, attr_schema in config_schema.get("properties", {}).items():
title = attr_schema.get("title", attr_name)
attr_type = attr_schema.get("type", "string")
attr_type_name = attr_type = attr_schema.get("type", "string")
if attr_type == "array":
attr_type_name = "list (CSV or JSON)"
title = f"[{attr_name}] {title}"
required = attr_name in config_schema.get("required", [])
hidden = attr_schema.get("format", "") == "password"
subtitles: List[str] = []
subtitles.append(attr_type)
subtitles.append(attr_type_name)
if hidden:
subtitles.append("secret")
if required:
@@ -938,6 +940,8 @@
if hidden and not show_secrets:
title += " is currently set to: [HIDDEN]"
else:
if attr_type == "array":
existing_value = json.dumps(existing_value)
title += f" is currently set to: '{existing_value}'"
else:
title += " is not currently set"
1 change: 1 addition & 0 deletions src/zenml/integrations/__init__.py
@@ -35,6 +35,7 @@
GreatExpectationsIntegration,
)
from zenml.integrations.huggingface import HuggingfaceIntegration # noqa
from zenml.integrations.hyperai import HyperAIIntegration # noqa
from zenml.integrations.kaniko import KanikoIntegration # noqa
from zenml.integrations.kserve import KServeIntegration # noqa
from zenml.integrations.kubeflow import KubeflowIntegration # noqa
2 changes: 1 addition & 1 deletion src/zenml/integrations/azure/__init__.py
@@ -43,7 +43,7 @@ class AzureIntegration(Integration):
"azure-keyvault-keys",
"azure-keyvault-secrets",
"azure-identity==1.10.0",
"azureml-core==1.48.0",
"azureml-core==1.54.0.post1",
"azure-mgmt-containerservice>=20.0.0",
"azure-storage-blob==12.17.0", # temporary fix for https://github.com/Azure/azure-sdk-for-python/issues/32056
"kubernetes",
1 change: 1 addition & 0 deletions src/zenml/integrations/constants.py
@@ -30,6 +30,7 @@
GRAPHVIZ = "graphviz"
KSERVE = "kserve"
HUGGINGFACE = "huggingface"
HYPERAI = "hyperai"
GREAT_EXPECTATIONS = "great_expectations"
KANIKO = "kaniko"
KUBEFLOW = "kubeflow"
53 changes: 53 additions & 0 deletions src/zenml/integrations/hyperai/__init__.py
@@ -0,0 +1,53 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization of the HyperAI integration."""
from typing import List, Type

from zenml.integrations.constants import HYPERAI
from zenml.integrations.integration import Integration
from zenml.stack import Flavor

# Service connector constants
HYPERAI_CONNECTOR_TYPE = "hyperai"
HYPERAI_RESOURCE_TYPE = "hyperai-instance"


class HyperAIIntegration(Integration):
"""Definition of HyperAI integration for ZenML."""

NAME = HYPERAI
REQUIREMENTS = [
"paramiko>=3.4.0",
]

@classmethod
def activate(cls) -> None:
"""Activates the integration."""
from zenml.integrations.hyperai import service_connectors # noqa

@classmethod
def flavors(cls) -> List[Type[Flavor]]:
"""Declare the stack component flavors for the HyperAI integration.

Returns:
List of stack component flavors for this integration.
"""
from zenml.integrations.hyperai.flavors import (
HyperAIOrchestratorFlavor
)

return [HyperAIOrchestratorFlavor]


HyperAIIntegration.check_installation()
20 changes: 20 additions & 0 deletions src/zenml/integrations/hyperai/flavors/__init__.py
@@ -0,0 +1,20 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""HyperAI integration flavors."""

from zenml.integrations.hyperai.flavors.hyperai_orchestrator_flavor import (
HyperAIOrchestratorFlavor,
)

__all__ = ["HyperAIOrchestratorFlavor"]