From 0328c0e4099963982a6b7ddb44b64d913379f2c1 Mon Sep 17 00:00:00 2001
From: afleisc <86309184+afleisc@users.noreply.github.com>
Date: Tue, 29 Jun 2021 11:06:31 -0400
Subject: [PATCH] docs: Issue263 Installation doc updates (#270)

* docs: Updated the installation doc to include a section on cloning the repo, docs on authentication, and updating Terraform instructions so that it can function

* docs: Reformat sections to be more context friendly. Add more clear language and link to Google Cloud SDK installation guide

* docs: specified terraform folder
---
 docs/installation.md | 52 +++++++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/docs/installation.md b/docs/installation.md
index 2b7a322a7..0ae340beb 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,5 +1,4 @@
 # Data Validation Tool Installation Guide
-The data validation tool can be installed on any machine that has Python 3.6+ installed.
 
 The tool natively supports BigQuery connections. If you need to connect to other databases such as Teradata or Oracle, you will need to install the appropriate connection libraries. (See the [Connections](connections.md) page for details)
 
@@ -7,23 +6,36 @@ This tool can be natively installed on your machine or can be containerized and
 
 ## Prerequisites
 
-The Data Validation Tool can be configured to store the results of validation runs into BigQuery tables. To allow tool to do that, we need to do following:
+
+- Any machine with Python 3.6+ installed.
 
 ## Setup
 
-To write results to BigQuery, you'll need to setup the required cloud
-resources, local authentication, and configure the tool.
+By default, the data validation tool writes the results of data validation to `stdout`. However, we recommend storing the results of validations to a BigQuery table in order to standardize the process and share results across a team. In order to allow the data validation tool to write to a BigQuery table, users need to have a BigQuery table created with a specific schema. If you choose to write results to a BigQuery table, there are a couple of requirements:
+
+- A Google Cloud Platform project with the BigQuery API enabled.
+
+- A Google user account with appropriate permissions. If you plan to run this tool in production, it's recommended that you create a service account specifically for running the tool. See our [guide](https://cloud.google.com/docs/authentication/production) on how to authenticate with your service account. If you are using a service account, you need to grant your service account appropriate roles on your project so that it has permissions to create and read resources.
+
+Clone the repository onto your machine and navigate inside the directory:
+
+```
+git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
+cd professional-services-data-validator
+```
+
+There are two methods of creating the BigQuery output table for the tool: via *Terraform* or the *Cloud SDK*.
-A Google Cloud Platform project with the BigQuery API enabled is required.
-Confirm which Google user account will be used to execute the tool. If you plan to run this tool in
-production, it's recommended that you create a service account specifically
-for running the tool.
 
-There are two methods of creating the Cloud resources necessary for the tool: via Terraform or the Cloud SDK.
 
-### Create cloud resources - Terraform
+### Cloud Resource Creation - Terraform
 
-You can use Terraform to create the necessary BigQuery resources. (See next
-section for manually creating resources with `gcloud`.)
+By default, Terraform is run inside a test environment and needs to be directed to your project. Perform the following steps to direct the creation of the BigQuery table to your project:
+
+1. Delete the `testenv.tf` file inside the `terraform` folder
+2. View `variables.tf` inside the `terraform` folder and replace `default = "pso-kokoro-resources"` with `default = "YOUR_PROJECT_ID"`
+
+
+After installing the [terraform CLI tool](https://learn.hashicorp.com/tutorials/terraform/install-cli) and completing the steps above, run the following commands from inside the root of the repo:
 
 ```
 cd terraform
@@ -31,18 +43,17 @@ terraform init
 terraform apply
 ```
 
-You should see a dataset named `pso_data_validator` and a table named
-`results`.
+### Cloud Resource Creation - Cloud SDK (gcloud)
 
-### Create cloud resources - Cloud SDK (gcloud)
+Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) if necessary.
 
-Create a dataset for validation results.
+Create a dataset for validation results:
 
 ```
 bq mk pso_data_validator
 ```
 
-Create a table.
+Create a table:
 
 ```
 bq mk --table \
@@ -52,6 +63,13 @@ bq mk --table \
   terraform/results_schema.json
 ```
 
+### Cloud Resource Creation - After success
+
+You should see a dataset named `pso_data_validator` and a table named
+`results` created inside of your project.
+
+After installing the CLI tool using the instructions below, you will be ready to run data validation commands and output the results to BigQuery. See an example [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#store-results-in-a-bigquery-table).
+
 ## Deploy Data Validation CLI on your machine
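
The authentication paragraph added in this patch links to the Google Cloud guide but stops short of showing commands. As a rough illustration only, not part of the patch itself, the flow might look like the sketch below; `dv-runner`, `/path/to/key.json`, and `YOUR_PROJECT_ID` are placeholders, and the two BigQuery roles are one reasonable choice rather than a requirement documented by the tool:

```
# Authenticate as the service account that will run the tool
# (or run `gcloud auth login` to use your own Google user account instead).
gcloud auth activate-service-account \
  dv-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --key-file=/path/to/key.json
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

# Point subsequent gcloud and bq commands at your project.
gcloud config set project YOUR_PROJECT_ID

# Grant the service account permissions to create the results dataset/table
# and to run BigQuery jobs (placeholder account and roles, adjust as needed).
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dv-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dv-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```

With credentials and roles in place, the Terraform and `bq mk` steps shown in the patch can be run under the service account rather than a personal user account.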