
feat: Support for Kubernetes #1058

Merged
@sundar-mudupalli-work merged 22 commits into develop from K8-indexed on Jan 8, 2024

Conversation

@sundar-mudupalli-work (Contributor) commented on Nov 23, 2023:

Hi,

This merge includes an upgrade that enables DVT to be orchestrated using Kubernetes jobs. See below:
feat: Horizontal scaling of row validations via Kubernetes - for how this can be used, see here. For design details, please see here.

Please review and set up time with me if you have any questions.

Thank you.

Sundar Mudupalli

…rlier

Added functionality to support Kubernetes Indexed Jobs - which, when provided with a directory, will only run the validation corresponding to the index.
Tested in a non-Kubernetes setup.
Shortened the option to the two-character code `-kc`.
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

4 similar comments

@sundar-mudupalli-work changed the title from "feat: Support for Kubernetes, updated secrets usage, documentation updates and removing pyArrow dependency" to "feat: Support for Kubernetes, retrieving database connections from secret manager and documentation updates" on Nov 25, 2023
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

### Passing database connection parameters
DVT database connection parameters are saved in the `$HOME/.config/google-pso-data-validator` directory with passwords in raw text. With Kubernetes, DVT cannot depend on the `.config` directory holding the connections unless they are baked into the image (or mounted as a volume - see the Cloud Run limitation below), which would require each customer to modify the container image we provide. A better approach (for regular and containerized DVT) would be to use the (GCP) Secret Manager and retrieve connection credentials as a JSON object when we connect to the database. DVT currently uses the Secret Manager to retrieve secrets but stores them in the `.config` directory when connections are added. This seems rather odd, as it defeats the main purposes of using the Secret Manager - security and password rotation.

I am proposing a simple command line change to resolve this issue. Whenever a connection parameter is specified, allow the user to optionally specify a secret manager (provider, project-id). If a secret manager is specified, then DVT retrieves the connection information directly from the secret manager at the time of creating the connection. This is the recommended approach to handle secrets as opposed to mounting secrets as volumes. Cloud Run also has a limitation that multiple secrets [cannot be mounted with the same path](https://cloud.google.com/run/docs/configuring/services/secrets#disallowed_paths_and_limitations). Since DVT requires connections to two different databases with the connection info being mounted in the same directory, i.e. `$HOME/.config/google-pso-data-validator`, DVT cannot run within Cloud Run without this change. With this change, DVT can be run in a container in Cloud Run or Kubernetes fetching the connection information from the GCP Secret Manager.
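To make the proposal concrete, here is a minimal sketch, assuming the whole connection definition is stored as a single JSON secret in GCP Secret Manager; the project ID, secret name, and helper function are illustrative and are not part of DVT.

```python
# Minimal sketch (not DVT code): fetch a connection definition stored as a
# single JSON secret and use it at connection time. Project and secret names
# are examples only.
import json
from google.cloud import secretmanager

def fetch_connection(project_id: str, secret_id: str, version: str = "latest") -> dict:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(name=name)
    # The payload holds the entire connection config as one JSON object,
    # so a password rotation only requires adding a new secret version.
    return json.loads(response.payload.data.decode("utf-8"))

if __name__ == "__main__":
    conn = fetch_connection("my-project", "my_bq_connection")
    print(conn.get("source_type"))
```

Because the secret is read at connection time rather than copied into `$HOME/.config/google-pso-data-validator`, rotation and revocation keep working as intended.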
@nehanene15 (Collaborator) commented:

Introducing the secret-manager-project/type for every validation command removes the de-coupling of connection configuration and validation configuration. I understand the issue here is storing connection configs locally, which would be inconvenient in Kubernetes.
But why can't we use GCS connections + Secret Manager to get the correct config at runtime? We would only need to set the PSO_CONFIG_DIR env variable to enable GCS connections.
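For illustration only, a rough sketch of that suggestion, using the env variable name from the comment above and a hypothetical GCS bucket; the exact variable DVT honours should be checked against the docs.

```python
# Hypothetical illustration: point DVT at a GCS config location instead of the
# local .config directory, then invoke the CLI. The bucket path is an example
# and the env variable name is taken from the comment above.
import os
import subprocess

os.environ["PSO_CONFIG_DIR"] = "gs://my-dvt-bucket/dvt-config"
# The child process inherits the environment, so the CLI would resolve
# connections from the GCS location rather than the local home directory.
subprocess.run(["data-validation", "connections", "list"], check=True)
```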

@sundar-mudupalli-work (Contributor Author) replied:

Neha,

I had not thought of that as an option initially. There are some challenges:

  1. The directory structure under PSO_DV_CONFIG contains two directories: connections, with connection information, and validations, with YAML files. When we look for YAML files we only look in the current directory - in this case under validations. The way generate-table-partitions works, it creates YAML files under config-dir/<schema_name>.<table_name>. Therefore PSO_DV_CONFIG cannot be used directly - we would need to copy files after running generate-table-partitions (see the sketch after this list).
  2. We have an open security issue with using GCS (or the file system) for connection information.
  3. Database passwords and connection information are secrets, and the proposed implementation is simpler than the current implementation, which requires multiple secrets per connection.
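The copy step mentioned in point 1 could look roughly like this; all paths and directory names are illustrative.

```python
# Hypothetical sketch of the copy step from point 1: move the partition YAMLs
# that generate-table-partitions wrote under <config-dir>/<schema>.<table>/
# into the validations/ directory expected under PSO_DV_CONFIG.
import shutil
from pathlib import Path

src = Path("my_config_dir/my_schema.my_table")
dst = Path("pso_dv_config/validations")
dst.mkdir(parents=True, exist_ok=True)
for yaml_file in src.glob("*.yaml"):
    shutil.copy(yaml_file, dst / yaml_file.name)
```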

Hope that helps.

Sundar Mudupalli

@nehanene15 (Collaborator) replied:

Yeah - Issue #756 which you mentioned involves de-coupling `connections/` and `validations/`. I think we need to address this issue before we move forward with Kubernetes or other distributed validations. This addresses your first point as well - theoretically `generate-table-partitions -cdir my-table/validations/` should drop the YAML in the right directory.

I agree with your proposed change to make the entire connection JSON a secret rather than each field in the JSON - we should move forward with that. I just think we need to reconsider adding secret manager flags to each command and instead use GCS based connections, even if we need to address Issue 756 first.

@sundar-mudupalli-work changed the title from "feat: Support for Kubernetes, retrieving database connections from secret manager and documentation updates" to "feat: Support for Kubernetes" on Dec 19, 2023
The pull-request-size bot added the size/L label and removed the size/M label on Dec 27, 2023
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@nehanene15 (Collaborator) left a comment:

Looks great overall, just left comments on documentation!

(Review comments on README.md and docs/internal/kubernetes_jobs.md: outdated, resolved)

Indexed completion mode supports partitioned YAML files generated by `generate-table-partitions` in Data Validation, provided each worker process runs only the YAML file corresponding to its index. I have introduced an optional parameter `--kube-completions` or `-kc`. When this flag is used with `data-validation configs run` with a config directory and the container runs in indexed jobs mode, each container only processes the specific validation YAML file corresponding to its index. If the flag is used with `data-validation configs run` with a config directory and DVT is not running in indexed jobs mode, a warning is issued. In all other instances, this flag is ignored.
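To make the indexed-completion behaviour concrete, here is a minimal sketch of how a worker could map its Kubernetes completion index to one partition file; the helper and directory names are illustrative and this is not the PR's actual implementation.

```python
# Sketch only: each indexed-job worker reads the index Kubernetes assigns it
# and runs just the matching partition YAML from the config directory.
import os
from pathlib import Path

def pick_partition(config_dir: str) -> Path:
    # Kubernetes sets JOB_COMPLETION_INDEX for Jobs with completionMode: Indexed.
    index = int(os.environ["JOB_COMPLETION_INDEX"])
    yaml_files = sorted(Path(config_dir).glob("*.yaml"))
    return yaml_files[index]

if __name__ == "__main__":
    target = pick_partition("my_config_dir/my_schema.my_table")
    print(f"This worker would run: {target}")
```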
### IAM Permissions
### Passing database connection parameters
@nehanene15 (Collaborator) commented:

Is this section accurate? I know we plan on creating a separate PR that tackles Secret Manager support. Until then, users should be able to use GCS connections for Kubernetes/Cloud Run orchestration.

@sundar-mudupalli-work (Contributor Author) replied:

Updated the document and added items to Future Work regarding using the secret manager and other cleanup.

@nehanene15 (Collaborator) replied:

Personally, I think we should get rid of the 'Future Work' section since it doesn't belong in the public documentation; these are internal roadmap items that we can track with issues.

If you want to keep it in, we should at least delete the line "This inconsistent behavior is challenging and should be fixed." since it doesn't belong in a product's public docs.

@sundar-mudupalli-work (Contributor Author) commented:

@nehanene15 - please take a look. I have updated the document(s).

(Further review comments on README.md: resolved)


@sundar-mudupalli-work (Contributor Author) commented:

Done. PTAL

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@nehanene15 (Collaborator) left a comment:

There is still an unresolved comment - I would remove the 'Future Work' section or delete the line "This inconsistent behavior is challenging and should be fixed" since we shouldn't point out bugs in our public docs.

I'll approve it since the next PR to fix the GCS configs will delete this section of the docs anyway.

@sundar-mudupalli-work merged commit fdbdbe0 into develop on Jan 8, 2024
5 checks passed
@sundar-mudupalli-work deleted the K8-indexed branch on January 8, 2024 at 17:13