
feat: Support for Kubernetes #1058

Merged
@sundar-mudupalli-work merged 22 commits into develop from K8-indexed on Jan 8, 2024

Conversation

@sundar-mudupalli-work (Contributor) commented on Nov 23, 2023:

Hi,

This merge includes an upgrade that enables DVT to be orchestrated using Kubernetes jobs. See below:
feat: Horizontal scaling of row validations via Kubernetes - for how this can be used, see here. For design details, please see here.

Please review and set up time with me if you have any questions.

Thank you.

Sundar Mudupalli

…rlier

Added functionality to support Kubernetes Indexed Jobs - which, when provided with a directory, will only run the validation corresponding to the index.
Tested in a non-Kubernetes setup.
Shortened the option to the two-character code `-kc`.
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

4 similar comments

@sundar-mudupalli-work changed the title from "feat: Support for Kubernetes, updated secrets usage, documentation updates and removing pyArrow dependency" to "feat: Support for Kubernetes, retrieving database connections from secret manager and documentation updates" on Nov 25, 2023
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

### Passing database connection parameters
DVT database connection parameters are saved in the `$HOME/.config/google-pso-data-validator` directory with passwords in raw text. With Kubernetes, DVT cannot depend on the `.config` directory holding the connections unless they are baked into the image (or mounted as a volume - see the Cloud Run limitation below), which would require each customer to modify the container image we provide. A better approach (for regular and containerized DVT) would be to use the (GCP) Secret Manager and retrieve connection credentials as a JSON object when we connect to the database. DVT currently uses the Secret Manager to retrieve secrets but stores them in the `.config` directory when connections are added. This seems rather odd, as it defeats the main purposes of using the Secret Manager - security and password rotation.

I am proposing a simple command line change to resolve this issue. Whenever a connection parameter is specified, allow the user to optionally specify a secret manager (provider, project-id). If a secret manager is specified, then DVT retrieves the connection information directly from the secret manager at the time of creating the connection. This is the recommended approach to handle secrets as opposed to mounting secrets as volumes. Cloud Run also has a limitation that multiple secrets [cannot be mounted with the same path](https://cloud.google.com/run/docs/configuring/services/secrets#disallowed_paths_and_limitations). Since DVT requires connections to two different databases with the connection info being mounted in the same directory, i.e. `$HOME/.config/google-pso-data-validator`, DVT cannot run within Cloud Run without this change. With this change, DVT can be run in a container in Cloud Run or Kubernetes fetching the connection information from the GCP Secret Manager.
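To make the proposal concrete, here is a minimal sketch, assuming the whole connection definition is stored as a single JSON secret in GCP Secret Manager; the project ID, secret name, and helper function are illustrative and are not part of DVT.

```python
# Minimal sketch (not DVT code): fetch a connection definition stored as a
# single JSON secret and use it at connection time. Project and secret names
# are examples only.
import json
from google.cloud import secretmanager

def fetch_connection(project_id: str, secret_id: str, version: str = "latest") -> dict:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(name=name)
    # The payload holds the entire connection config as one JSON object,
    # so a password rotation only requires adding a new secret version.
    return json.loads(response.payload.data.decode("utf-8"))

if __name__ == "__main__":
    conn = fetch_connection("my-project", "my_bq_connection")
    print(conn.get("source_type"))
```

Because the secret is read at connection time rather than copied into `$HOME/.config/google-pso-data-validator`, rotation and revocation keep working as intended.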
@nehanene15 (Collaborator) commented:

Introducing the secret-manager-project/type for every validation command removes the de-coupling of connection configuration and validation configuration. I understand the issue here is storing connection configs locally, which would be inconvenient in Kubernetes.
But why can't we use GCS connections + Secret Manager to get the correct config at runtime? We would only need to set the PSO_CONFIG_DIR env variable to enable GCS connections.
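For illustration only, a rough sketch of that suggestion, using the env variable name from the comment above and a hypothetical GCS bucket; the exact variable DVT honours should be checked against the docs.

```python
# Hypothetical illustration: point DVT at a GCS config location instead of the
# local .config directory, then invoke the CLI. The bucket path is an example
# and the env variable name is taken from the comment above.
import os
import subprocess

os.environ["PSO_CONFIG_DIR"] = "gs://my-dvt-bucket/dvt-config"
# The child process inherits the environment, so the CLI would resolve
# connections from the GCS location rather than the local home directory.
subprocess.run(["data-validation", "connections", "list"], check=True)
```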

@sundar-mudupalli-work (Contributor Author) replied:

Neha,

I had not thought of that as an option initially. There are some challenges:

  1. The directory structure under PSO_DV_CONFIG contains two directories: connections, with connection information, and validations, with YAML files. When we look for YAML files we only look in the current directory - in this case under validations. The way generate-table-partitions works, it creates YAML files under config-dir/<schema_name>.<table_name>. Therefore PSO_DV_CONFIG cannot be used directly - we would need to copy files after running generate-table-partitions (see the sketch after this list).
  2. We have an open security issue with using GCS (or the file system) for connection information.
  3. Database passwords and connection information are secrets, and the proposed implementation is simpler than the current implementation, which requires multiple secrets per connection.
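The copy step mentioned in point 1 could look roughly like this; all paths and directory names are illustrative.

```python
# Hypothetical sketch of the copy step from point 1: move the partition YAMLs
# that generate-table-partitions wrote under <config-dir>/<schema>.<table>/
# into the validations/ directory expected under PSO_DV_CONFIG.
import shutil
from pathlib import Path

src = Path("my_config_dir/my_schema.my_table")
dst = Path("pso_dv_config/validations")
dst.mkdir(parents=True, exist_ok=True)
for yaml_file in src.glob("*.yaml"):
    shutil.copy(yaml_file, dst / yaml_file.name)
```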

Hope that helps.

Sundar Mudupalli

@nehanene15 (Collaborator) replied:

Yeah - Issue #756 which you mentioned involves de-coupling `connections/` and `validations/`. I think we need to address this issue before we move forward with Kubernetes or other distributed validations. This addresses your first point as well - theoretically `generate-table-partitions -cdir my-table/validations/` should drop the YAML in the right directory.

I agree with your proposed change to make the entire connection JSON a secret rather than each field in the JSON - we should move forward with that. I just think we need to reconsider adding secret manager flags to each command and instead use GCS based connections, even if we need to address Issue 756 first.

@sundar-mudupalli-work changed the title from "feat: Support for Kubernetes, retrieving database connections from secret manager and documentation updates" to "feat: Support for Kubernetes" on Dec 19, 2023
The pull-request-size bot added the size/L label and removed the size/M label on Dec 27, 2023
@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@nehanene15 (Collaborator) left a comment:

Looks great overall, just left comments on documentation!

(Review comments on README.md and docs/internal/kubernetes_jobs.md: outdated, resolved)

Indexed completion mode supports partitioned YAML files generated by `generate-table-partitions` in Data Validation, provided each worker process runs only the YAML file corresponding to its index. I have introduced an optional parameter `--kube-completions` or `-kc`. When this flag is used with `data-validation configs run` with a config directory and the container runs in indexed jobs mode, each container only processes the specific validation YAML file corresponding to its index. If the flag is used with `data-validation configs run` with a config directory and DVT is not running in indexed jobs mode, a warning is issued. In all other instances, this flag is ignored.
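To make the indexed-completion behaviour concrete, here is a minimal sketch of how a worker could map its Kubernetes completion index to one partition file; the helper and directory names are illustrative and this is not the PR's actual implementation.

```python
# Sketch only: each indexed-job worker reads the index Kubernetes assigns it
# and runs just the matching partition YAML from the config directory.
import os
from pathlib import Path

def pick_partition(config_dir: str) -> Path:
    # Kubernetes sets JOB_COMPLETION_INDEX for Jobs with completionMode: Indexed.
    index = int(os.environ["JOB_COMPLETION_INDEX"])
    yaml_files = sorted(Path(config_dir).glob("*.yaml"))
    return yaml_files[index]

if __name__ == "__main__":
    target = pick_partition("my_config_dir/my_schema.my_table")
    print(f"This worker would run: {target}")
```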
### IAM Permissions
### Passing database connection parameters
@nehanene15 (Collaborator) commented:

Is this section accurate? I know we plan on creating a separate PR that tackles Secret Manager support. Until then, users should be able to use GCS connections for Kubernetes/Cloud Run orchestration.

@sundar-mudupalli-work (Contributor Author) replied:

Updated the document and added items to Future Work regarding using the secret manager and other cleanup.

@nehanene15 (Collaborator) replied:

Personally, I think we should get rid of the 'Future Work' section since it doesn't belong in the public documentation; these are internal roadmap items that we can track with issues.

If you want to keep it in, we should at least delete the line "This inconsistent behavior is challenging and should be fixed." since it doesn't belong in a product's public docs.

@sundar-mudupalli-work (Contributor Author) commented:

@nehanene15 - please take a look. I have updated the document(s).

(Further review comments on README.md: resolved)


@sundar-mudupalli-work (Contributor Author) commented:

Done. PTAL

@sundar-mudupalli-work (Contributor Author) commented:

/gcbrun

@nehanene15 (Collaborator) left a comment:

There is still an unresolved comment - I would remove the 'Future Work' section or delete the line "This inconsistent behavior is challenging and should be fixed" since we shouldn't point out bugs in our public docs.

I'll approve it since the next PR to fix the GCS configs will delete this section of the docs anyway.

@sundar-mudupalli-work merged commit fdbdbe0 into develop on Jan 8, 2024
5 checks passed
@sundar-mudupalli-work deleted the K8-indexed branch on January 8, 2024 at 17:13