This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

add GCP Cloud Run worker guide #203

Merged
merged 54 commits into from
Oct 23, 2023
86505a7
first commit for new worker and cloud run guide
biancaines Aug 2, 2023
cde4564
added section detailing how to create a google service account
biancaines Aug 3, 2023
9005348
Added additional sections to the guide
biancaines Aug 31, 2023
0bb2789
added link and images to the guide, and finished the full draft.
biancaines Sep 1, 2023
01331e9
Merge branch 'main' into bianca-worker-guide
biancaines Sep 1, 2023
0a5b6bd
added health-check back to worker start cmd
biancaines Sep 1, 2023
2624251
Merge branch 'bianca-worker-guide' of https://github.com/biancaines/p…
biancaines Sep 1, 2023
1249503
Update docs/gcp-worker-guide.md
biancaines Sep 1, 2023
39bd954
changed some wording and added links, ty Taylor
biancaines Sep 1, 2023
a52fca3
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
c1ed8e0
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
d9c59e5
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
72a6390
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
d0c0249
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
5547710
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
6931f8c
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
12614a7
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
34894b6
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
ba208d9
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
3215e6f
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
98b2255
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
7027814
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
f0d6f88
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
860f8f5
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
ce340d5
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
50fab01
Update docs/gcp-worker-guide.md
biancaines Sep 12, 2023
ef642f7
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
a00fc23
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
557cc9b
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
e1ea765
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
dd4909e
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
5aca3b9
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
38e0451
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
7abc11f
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
9f5b52b
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
7133d94
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
45f2f20
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
428eb2f
Update docs/gcp-worker-guide.md
biancaines Sep 25, 2023
1a84f8d
Pushing custom image to Google Artifact Registry instead of GCR.io. M…
biancaines Oct 3, 2023
23ab4ab
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
7029e6c
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
68e959b
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
3526b74
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
277fc95
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
2e83589
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
132c673
replaced references to us-east1 with multi-region
biancaines Oct 3, 2023
f3937e0
Update docs/gcp-worker-guide.md
biancaines Oct 3, 2023
becf203
replaced us-east1 with multi-region
biancaines Oct 3, 2023
fdb0509
Merge branch 'main' into bianca-worker-guide
biancaines Oct 3, 2023
0ac25f6
Merge branch 'bianca-worker-guide' of https://github.com/biancaines/p…
biancaines Oct 3, 2023
a3c5ea7
Merge branch 'bianca-worker-guide' of https://github.com/biancaines/p…
biancaines Oct 3, 2023
e333d02
Merge branch 'main' into bianca-worker-guide
discdiver Oct 4, 2023
19517eb
Merge branch 'main' into bianca-worker-guide
desertaxle Oct 13, 2023
d788f7e
Merge branch 'main' into bianca-worker-guide
discdiver Oct 23, 2023
337 changes: 337 additions & 0 deletions docs/gcp-worker-guide.md
@@ -0,0 +1,337 @@
# Google Cloud Run Worker Guide

## Why use Google Cloud Run for flow run execution?
Google Cloud Run is a fully managed compute platform that automatically scales your containerized applications.

1. Serverless architecture: Cloud Run follows a serverless architecture, which means you don't need to manage any underlying infrastructure. Google Cloud Run automatically handles the scaling and availability of your flow run infrastructure, allowing you to focus on developing and deploying your code.

2. Scalability: Cloud Run can automatically scale your pipeline to handle varying workloads and traffic. It can quickly respond to increased demand and scale back down during low activity periods, ensuring efficient resource utilization.

3. Integration with Google Cloud services: Google Cloud Run easily integrates with other Google Cloud services, such as Google Cloud Storage, Google Cloud Pub/Sub, and Google Cloud Build.
This interoperability enables you to build end-to-end data pipelines that use a variety of services.

4. Portability: Since Cloud Run uses container images, you can develop your pipelines locally using Docker and then deploy them on Google Cloud Run without significant modifications. This portability allows you to run the same pipeline in different environments.

Review comment (Contributor): Nice!


## Google Cloud Run guide
After completing this guide, you will have:

1. Created a Google Cloud Service Account
2. Created a Prefect Work Pool
3. Deployed a Prefect Worker as a Cloud Run Service
4. Deployed a Flow
5. Executed the Flow as a Google Cloud Run Job

If you're looking for a general introduction to workers, work pools, and deployments, check out the [workers and work pools tutorial](https://docs.prefect.io/latest/tutorial/workers/).

### Prerequisites
Before starting this guide, make sure you have:

- A [Google Cloud Platform (GCP) account](https://cloud.google.com/gcp).
- A project on your GCP account where you have the necessary permissions to create Cloud Run Services and Service Accounts.
- The `gcloud` CLI installed on your local machine. You can follow Google Cloud's [installation guide](https://cloud.google.com/sdk/docs/install). If you're on macOS (or Linux), you can also install it with [Homebrew](https://formulae.brew.sh/cask/google-cloud-sdk).
- [Docker](https://www.docker.com/get-started/) installed on your local machine.
- A Prefect server instance. You can sign up for a forever free [Prefect Cloud Account](https://app.prefect.cloud/) or, alternatively, self-host a [Prefect server](https://docs.prefect.io/latest/guides/host/).

### Step 1. Create a Google Cloud service account
First, open a terminal or command prompt on your local machine where `gcloud` is installed. If you haven't already authenticated with `gcloud`, run the following command and follow the instructions to log in to your GCP account.

```bash
gcloud auth login
```

Next, set the project in which you'd like to create the service account. Use the following command and replace `<PROJECT-ID>` with your GCP project's ID.

```bash
gcloud config set project <PROJECT-ID>
```

For example, if your project's ID is `prefect-project` the command will look like this:

```bash
gcloud config set project prefect-project
```

Now you're ready to make the service account. To do so, you'll need to run this command:

```bash
gcloud iam service-accounts create <SERVICE-ACCOUNT-NAME> --display-name="<DISPLAY-NAME>"
```

Here's an example of the command above with the service account name and display name already filled in. It also adds an optional description of the service account:

```bash
gcloud iam service-accounts create prefect-service-account \
--description="service account to use for the prefect worker" \
--display-name="prefect-service-account"
```

The last step of this process is to make sure the service account has the proper permissions to execute flow runs as Cloud Run jobs.
Run the following commands to grant the necessary permissions:

```bash
gcloud projects add-iam-policy-binding <PROJECT-ID> \
--member="serviceAccount:<SERVICE-ACCOUNT-NAME>@<PROJECT-ID>.iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountUser"
```
```bash
gcloud projects add-iam-policy-binding <PROJECT-ID> \
--member="serviceAccount:<SERVICE-ACCOUNT-NAME>@<PROJECT-ID>.iam.gserviceaccount.com" \
--role="roles/run.admin"
```
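Both bindings use the same `--member` value, so a typo in either one leaves the service account only partially configured. As a minimal sketch, the member string can be derived once from the account name and project ID. The helper name `service_account_member` is hypothetical and is not part of `gcloud` or Prefect:

```python
def service_account_member(account_name: str, project_id: str) -> str:
    """Build the --member value used in the policy bindings above."""
    email = f"{account_name}@{project_id}.iam.gserviceaccount.com"
    return f"serviceAccount:{email}"


# Example values taken from earlier in this guide.
member = service_account_member("prefect-service-account", "prefect-project")
print(member)
```

The printed value can be pasted into both `add-iam-policy-binding` commands.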

### Step 2. Create a Cloud Run work pool
Let's walk through the process of creating a Cloud Run work pool.

#### Create a GCP Credentials Block
You'll need to create a GCP Credentials block to manage authentication with GCP. This block will be referenced in the base job template of your work pool.

The block created in this guide will contain the JSON key for the service account created in the previous step.
To get the JSON key, paste the following command into your terminal.
```bash
gcloud iam service-accounts keys create my_key.json \
--iam-account=<SERVICE-ACCOUNT-NAME>@<PROJECT-ID>.iam.gserviceaccount.com
```
Review comment on lines +90 to +93 (Contributor):

@biancaines Hi, I am trying out the guide at the moment to set up Prefect with Cloud Run for my org. Thank you so much, it is such a valuable resource! 😊

This command, however, did not work for me, because `--serviceAccount` is not recognized as an argument:

    ERROR: (gcloud.iam.service-accounts.keys.create) unrecognized arguments: --serviceAccount:<my-account-name>@<my-project>.iam.gserviceaccount.com

With the help of the GCP documentation, the following command worked for me:

    gcloud iam service-accounts keys create my_key.json \
        --iam-account=<SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com

Reply (Contributor Author):

Hello @bjarneschroeder! Thank you for your kind words, and for providing the command you used! I'll give this a try as well, and either swap out the command or create a note that it can be used as an alternative.

Running this command will generate a JSON key file in your directory.

Now you're ready to create the GCP Credentials block. Navigate to the Blocks page in the Prefect UI, and create a new GCP Credentials block with a descriptive block name. Enter your GCP project ID into the `Project` field.
Copy the contents of the JSON key file in your directory and paste them into the `Service Account Info` field.
Last but not least, save the block.
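If you'd rather script this step than use the UI, here is a minimal sketch. It assumes the `prefect-gcp` package is installed and that your Prefect API settings are already configured. The helper names and the idea of checking the key file first are illustrative additions, not part of the original guide:

```python
import json
from pathlib import Path

# Fields that a Google service account key file contains, among others.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(path: str) -> dict:
    """Read a service account key file and check that it looks complete."""
    info = json.loads(Path(path).read_text())
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if info["type"] != "service_account":
        raise ValueError("key file is not a service account key")
    return info


def save_credentials_block(key_path: str, block_name: str) -> None:
    """Save the key as a GCP Credentials block (requires prefect-gcp)."""
    # Imported lazily so the key check above works without prefect-gcp.
    from prefect_gcp import GcpCredentials

    info = load_service_account_key(key_path)
    GcpCredentials(
        service_account_info=info,
        project=info["project_id"],
    ).save(block_name, overwrite=True)
```

For example, `save_credentials_block("my_key.json", "my-gcp-creds")` would store the key under the (hypothetical) block name `my-gcp-creds`.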

#### Fill out the work pool base job template
You can create a new work pool using the Prefect UI or CLI. The following command creates a work pool of type `cloud-run` via the CLI (replace `<WORK-POOL-NAME>` with the name of your work pool, and remove the angle brackets):
```bash
prefect work-pool create --type cloud-run <WORK-POOL-NAME>
```

Once the work pool is created, find the work pool in the UI and edit it.

There are many ways to customize the base job template for the work pool. Modifying the template influences the infrastructure configuration that the worker provisions for flow runs submitted to the work pool. For this guide we are going to modify just a few of the available fields.

Specify the region for the Cloud Run job.
![region](img/cloud-run-work-pool-region.png)

Select the GCP credentials block that has the JSON key file for the service account.
![creds](img/cloud-run-work-pool-gcp-creds.png)

Enter the name of the service account created in the first step of this guide.
![name](img/cloud-run-work-pool-service-account-name.png)

Your work pool is now ready to receive scheduled flow runs!

### Step 3. Deploy a Cloud Run worker
Now you can launch a Cloud Run service to host the Cloud Run worker. This worker will poll the work pool that you created in the previous step.

Navigate back to your terminal and run the following commands to set your Prefect API key and URL as environment variables.
Be sure to replace `<ACCOUNT-ID>` and `<WORKSPACE-ID>` with your Prefect account and workspace IDs (both will be available in the URL of the UI when previewing the workspace dashboard).
You'll want to replace `<YOUR-API-KEY>` with an active API key as well.

```bash
export PREFECT_API_URL='https://api.prefect.cloud/api/accounts/<ACCOUNT-ID>/workspaces/<WORKSPACE-ID>'
export PREFECT_API_KEY='<YOUR-API-KEY>'
```
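The deploy command in the next step forwards these two variables to the worker as-is, so an unset variable or a leftover placeholder produces a worker that cannot reach the Prefect API. The following is a small, optional pre-flight check. The function name `unset_prefect_settings` is hypothetical, and treating any `<` in a value as a leftover placeholder is an assumption made for this sketch:

```python
import os
from typing import Mapping, Optional


def unset_prefect_settings(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the names of required settings that are empty or placeholders."""
    env = os.environ if env is None else env
    problems = []
    for name in ("PREFECT_API_URL", "PREFECT_API_KEY"):
        value = env.get(name, "")
        # An empty value or a leftover "<...>" placeholder would be passed
        # to the worker unchanged, so flag both.
        if not value or "<" in value:
            problems.append(name)
    return problems


if __name__ == "__main__":
    print(unset_prefect_settings() or "Prefect settings look ready")
```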

Once those variables are set, run the following shell command to deploy your worker as a service.
Don't forget to replace `<YOUR-SERVICE-ACCOUNT-NAME>` with the name of the service account you created in the first step of this guide, and replace `<WORK-POOL-NAME>` with the name of the work pool you created in the second step.

```bash
gcloud run deploy prefect-worker --image=prefecthq/prefect:2-latest \
--set-env-vars PREFECT_API_URL=$PREFECT_API_URL,PREFECT_API_KEY=$PREFECT_API_KEY \
--service-account <YOUR-SERVICE-ACCOUNT-NAME> \
--no-cpu-throttling \
--min-instances 1 \
--args "prefect","worker","start","--install-policy","always","--with-healthcheck","-p","<WORK-POOL-NAME>","-t","cloud-run"
```

After running this command, you'll be prompted to specify a region. Choose the same region that you selected when creating the Cloud Run work pool in the second step of this guide.
The next prompt will ask if you'd like to allow unauthenticated invocations to your worker. For this guide, you can select "No".
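The `--args` value in the deploy command is a comma-separated list in which each element becomes one argument passed to the container. As a rough sketch of how that string maps back to the `prefect worker start` command (the helper `cloud_run_args` is hypothetical and only builds the string):

```python
def cloud_run_args(work_pool_name: str) -> str:
    """Return the comma-separated value passed to --args in the deploy command."""
    tokens = [
        "prefect", "worker", "start",
        "--install-policy", "always",
        "--with-healthcheck",
        "-p", work_pool_name,
        "-t", "cloud-run",
    ]
    # Quote each token and join with commas, matching the command above.
    return ",".join(f'"{token}"' for token in tokens)


print(cloud_run_args("my-cloud-run-pool"))
```

Read left to right, the tokens spell out `prefect worker start --install-policy always --with-healthcheck -p <WORK-POOL-NAME> -t cloud-run`.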

After a few seconds, you'll be able to see your new `prefect-worker` service by navigating to the Cloud Run page of your Google Cloud console. Additionally, you should be able to see a record of this worker in the Prefect UI on the work pool's page by navigating to the `Worker` tab.
Let's not leave our worker hanging, it's time to give it a job.

Review comment (Contributor): lol get a job worker


### Step 4. Deploy a flow
Let's prepare a flow to run as a Cloud Run job. In this section of the guide, we'll "bake" our code into a Docker image, and push that image to Google Artifact Registry.

#### Create a registry
Let's create a Docker repository in your Google Artifact Registry to host your custom image. If you already have a registry, and are authenticated to it, skip ahead to the *Write a flow* section.

The following command creates a repository using the `gcloud` CLI. You'll want to replace `<REPOSITORY-NAME>` with your own value:
```bash
gcloud artifacts repositories create <REPOSITORY-NAME> \
--repository-format=docker --location=us
```

Now you can authenticate to Artifact Registry:
```bash
gcloud auth configure-docker us-docker.pkg.dev
```

#### Write a flow
First, create a new directory. This will serve as the root of your project's repository. Within the directory, create a sub-directory called `flows`.
Navigate to the `flows` subdirectory and create a new file for your flow. Feel free to write your own flow, but here's a ready-made one for your convenience:

```python
import httpx
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact


@task
def mark_it_down(temp):
    markdown_report = f"""# Weather Report
## Recent weather
| Time | Temperature |
|:--------------|-------:|
| Now | {temp} |
| In 1 hour | {temp + 2} |
"""
    create_markdown_artifact(
        key="weather-report",
        markdown=markdown_report,
        description="Very scientific weather report",
    )


@flow
def fetch_weather(lat: float, lon: float):
    base_url = "https://api.open-meteo.com/v1/forecast/"
    weather = httpx.get(
        base_url,
        params=dict(latitude=lat, longitude=lon, hourly="temperature_2m"),
    )
    most_recent_temp = float(weather.json()["hourly"]["temperature_2m"][0])
    mark_it_down(most_recent_temp)


if __name__ == "__main__":
    fetch_weather(38.9, -77.0)
```

In the remainder of this guide, this script will be referred to as `weather_flow.py`, but you can name yours whatever you'd like.
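To see what the flow does with the API response without calling the network or Prefect, here is a small sketch that mirrors its two steps as plain functions. The sample payload is hand-written for illustration and only includes the keys the flow reads. The function names are hypothetical:

```python
def first_hourly_temperature(payload: dict) -> float:
    """Pick the first hourly temperature, as the flow above does."""
    return float(payload["hourly"]["temperature_2m"][0])


def render_report(temp: float) -> str:
    """Build the same markdown table that the task turns into an artifact."""
    return (
        "# Weather Report\n"
        "## Recent weather\n"
        "| Time | Temperature |\n"
        "|:--------------|-------:|\n"
        f"| Now | {temp} |\n"
        f"| In 1 hour | {temp + 2} |\n"
    )


# Hand-written stand-in for the JSON body returned by the forecast endpoint.
sample_payload = {
    "hourly": {
        "time": ["2023-10-23T00:00", "2023-10-23T01:00"],
        "temperature_2m": [20.0, 19.5],
    }
}

print(render_report(first_hourly_temperature(sample_payload)))
```

Note that the "In 1 hour" row is simply the current value plus two, which is why the artifact description calls the report "very scientific".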

#### Creating a `prefect.yaml` file
Now we're ready to make a `prefect.yaml` file, which will be responsible for managing the deployments of this repository.
**Navigate back to the root of your directory**, and run the following command to create a `prefect.yaml` file using Prefect's docker deployment recipe.

```bash
prefect init --recipe docker
```

You'll receive a prompt to enter values for the image name and tag. Since we will be pushing the image to Google Artifact Registry, the name of your image should be prefixed with the path to the Docker repository you created within the registry. For example: `us-docker.pkg.dev/<PROJECT-ID>/<REPOSITORY-NAME>/`. You'll want to replace `<PROJECT-ID>` with the ID of your project in GCP. This should match the ID of the project you used in the first step of this guide. Here is an example of what this could look like:

```bash
image_name: us-docker.pkg.dev/prefect-project/my-artifact-registry/gcp-weather-image
tag: latest
```
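The image reference is made up of the registry host for the repository's location, the project ID, the repository name, the image name, and the tag. A minimal sketch of that composition follows. The helper `artifact_registry_image` is hypothetical, and the default `location="us"` matches the multi-region used when the repository was created earlier in this guide:

```python
def artifact_registry_image(
    project_id: str,
    repository: str,
    image: str,
    tag: str = "latest",
    location: str = "us",
) -> str:
    """Compose a full Artifact Registry image reference for Docker."""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


print(
    artifact_registry_image(
        "prefect-project", "my-artifact-registry", "gcp-weather-image"
    )
)
```

Everything before the final `:` is what you enter as `image_name`, and the part after it is the `tag`.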

At this point, there will be a new `prefect.yaml` file available at the root of your project. The contents will look similar to the example below. However, I've added a combination of [yaml templating options](https://docs.prefect.io/latest/concepts/deployments/#templating-options) and [prefect deployment actions](https://docs.prefect.io/latest/concepts/deployments/#deployment-actions) to build out a simple CI/CD process. Feel free to copy the contents and paste them into your `prefect.yaml`:

```yaml
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: <WORKING-DIRECTORY>
prefect-version: 2.13.4

# build section allows you to manage and build docker image
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image
    requires: prefect-docker>=0.3.1
    image_name: <PATH-TO-ARTIFACT-REGISTRY>/gcp-weather-image
    tag: latest
    dockerfile: auto
    platform: linux/amd64

# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker>=0.3.1
    image_name: '{{ build_image.image_name }}'
    tag: '{{ build_image.tag }}'

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.set_working_directory:
    directory: /opt/prefect/<WORKING-DIRECTORY>

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: gcp-weather-deploy
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: null
  entrypoint: flows/weather_flow.py:fetch_weather
  parameters:
    lat: 14.5994
    lon: 28.6731
  work_pool:
    name: my-cloud-run-pool
    work_queue_name: default
    job_variables:
      image: '{{ build_image.image }}'
```
!!!Tip
    After copying the example above, don't forget to replace `<WORKING-DIRECTORY>` with the name of the directory where your flow folder and `prefect.yaml` live. You'll also need to replace `<PATH-TO-ARTIFACT-REGISTRY>` with the path to the Docker repository in your Google Artifact Registry.

To get a better understanding of the different components of the `prefect.yaml` file above and what they do, feel free to read this next section. Otherwise, you can skip ahead to *Flow Deployment*.

In the `build` section of the `prefect.yaml` the following step is executed at deployment build time:

1. `prefect_docker.deployments.steps.build_docker_image` : builds a Docker image automatically which uses the name and tag chosen previously.

Review comment (Contributor) on the warning below: beautiful! nice inclusion

!!!Warning
    If you are using an ARM-based chip (such as an M1 or M2 Mac), add `platform: linux/amd64` to your `build_docker_image` step so that your Docker image is built for the AMD64 architecture. For example:

    ```yaml
    - prefect_docker.deployments.steps.build_docker_image:
        id: build_image
        requires: prefect-docker>=0.3.1
        image_name: us-docker.pkg.dev/prefect-project/my-docker-repository/gcp-weather-image
        tag: latest
        dockerfile: auto
        platform: linux/amd64
    ```


The `push` section sends the Docker image to the Docker repository in your Google Artifact Registry, so that it can be easily accessed by the worker for flow run execution.

The `pull` section sets the working directory for the process prior to importing your flow.

In the `deployments` section of the `prefect.yaml` file above, you'll see that there is a deployment declaration named `gcp-weather-deploy`. Within the declaration, the entrypoint for the flow is specified along with some default parameters which will be passed to the flow at runtime. Last but not least, the name of the work pool that we created in step 2 of this guide is specified.
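The `'{{ build_image.image_name }}'`, `'{{ build_image.tag }}'`, and `'{{ build_image.image }}'` values refer to outputs of the build step whose `id` is `build_image`. The following is only an illustrative sketch of that substitution idea, not Prefect's actual implementation. The `resolve_templates` function and the example output dictionary are assumptions made for this sketch:

```python
import re


def resolve_templates(value: str, step_outputs: dict) -> str:
    """Replace '{{ step_id.key }}' references with values from earlier steps."""

    def lookup(match: re.Match) -> str:
        step_id, key = match.group(1), match.group(2)
        return str(step_outputs[step_id][key])

    return re.sub(r"\{\{\s*(\w+)\.(\w+)\s*\}\}", lookup, value)


# Assumed outputs of the step with id `build_image`; the keys mirror the
# references used in the prefect.yaml above.
image_name = "us-docker.pkg.dev/prefect-project/my-artifact-registry/gcp-weather-image"
outputs = {
    "build_image": {
        "image_name": image_name,
        "tag": "latest",
        "image": f"{image_name}:latest",
    }
}

print(resolve_templates("{{ build_image.image }}", outputs))
```

This is why the `push` step and the work pool's `job_variables` never repeat the image name: they reuse whatever the build step produced.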

#### Flow deployment
Once you're happy with the specifications in the `prefect.yaml` file, run the following command in the terminal to deploy your flow:

```bash
prefect deploy --name gcp-weather-deploy
```

Once the flow is deployed to Prefect Cloud or your self-hosted Prefect server, it's time to queue up a flow run!

### Step 5. Flow execution
Find your deployment in the UI, and hit the *Quick Run* button.
You have now successfully submitted a flow run to your Cloud Run worker!
If you used the flow script provided in this guide, check the *Artifacts* tab for the flow run once it completes.
You'll have a nice little weather report waiting for you there. Hope your day is a sunny one!
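As an alternative to the *Quick Run* button, a run can also be queued from Python. This is a hedged sketch that assumes Prefect 2 is installed, your API settings are configured, and the flow keeps its default name, which Prefect derives from the function name `fetch_weather`. The helper names are hypothetical:

```python
def deployment_identifier(flow_function_name: str, deployment_name: str) -> str:
    """Build the '<flow-name>/<deployment-name>' identifier."""
    # Assumption: the default flow name is the function name with
    # underscores replaced by hyphens.
    return f"{flow_function_name.replace('_', '-')}/{deployment_name}"


def trigger_weather_run():
    """Queue a run of the deployment created in this guide."""
    # Imported lazily so the helper above can be used without Prefect.
    from prefect.deployments import run_deployment

    return run_deployment(
        name=deployment_identifier("fetch_weather", "gcp-weather-deploy"),
        parameters={"lat": 38.9, "lon": -77.0},
        timeout=0,  # return right away instead of waiting for completion
    )
```

Calling `trigger_weather_run()` submits the run to the work pool, where the Cloud Run worker picks it up just as it would after a click in the UI.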

### Recap and next steps
Congratulations on completing this guide! Looking back on our journey, you have:

1. Created a Google Cloud service account
2. Created a Cloud Run work pool
3. Deployed a Cloud Run worker
4. Deployed a flow
5. Executed a flow

For next steps, you could:

- Take a look at some of the other [work pools](https://docs.prefect.io/latest/concepts/work-pools/) Prefect has to offer
- Do a deep dive on Prefect [concepts](https://docs.prefect.io/latest/concepts/)
- Try out [another guide](https://docs.prefect.io/latest/guides/) to explore new deployment patterns and recipes

The world is your oyster 🦪✨.
Binary file added docs/img/cloud-run-work-pool-gcp-creds.png
Binary file added docs/img/cloud-run-work-pool-region.png
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -74,6 +74,7 @@ watch:

nav:
  - Home: index.md
  - Google Cloud Run Execution Guide: gcp-worker-guide.md
  - Blocks Catalog: blocks_catalog.md
  - Examples Catalog: examples_catalog.md
  - Contributing: contributing.md