docs: Issue263 Installation doc updates #270

Merged 4 commits on Jun 29, 2021
`docs/installation.md` (32 additions, 18 deletions)
# Data Validation Tool Installation Guide
The data validation tool can be installed on any machine that has Python 3.6+ installed.

The tool natively supports BigQuery connections. If you need to connect to other databases such as Teradata or Oracle, you will need to install the appropriate connection libraries (see the [Connections](connections.md) page for details).

This tool can be installed natively on your machine or containerized and run with Docker.


## Prerequisites

- Any machine with Python 3.6+ installed.
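
A quick way to confirm the Python requirement is met (assuming `python3` is the interpreter on your PATH):

```
# Exits non-zero with a message if the interpreter is older than 3.6.
python3 -c 'import sys; assert sys.version_info >= (3, 6), sys.version'
python3 --version
```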

## Setup

By default, the data validation tool writes validation results to `stdout`. However, we recommend storing results in a BigQuery table in order to standardize the process and share results across a team. The tool writes to a BigQuery table with a specific schema, so if you choose this option there are a couple of requirements:

- A Google Cloud Platform project with the BigQuery API enabled.

- A Google user account with appropriate permissions. If you plan to run this tool in production, it's recommended that you create a service account specifically for running the tool. See the Google Cloud [guide](https://cloud.google.com/docs/authentication/production) on how to authenticate with your service account. If you use a service account, grant it the appropriate roles on your project so that it can create and read the necessary resources.
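
For example, a role can be granted with `gcloud` (a sketch only; the service-account name and the role shown are illustrative placeholders you should adapt, not values mandated by the tool):

```
# Grant the service account permission to write validation results to BigQuery.
# YOUR_PROJECT_ID and the service-account address are placeholders.
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:data-validator@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```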

Clone the repository onto your machine and navigate inside the directory:

```
git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
cd professional-services-data-validator
```

There are two methods of creating the BigQuery output table for the tool: via *Terraform* or the *Cloud SDK*.


### Cloud Resource Creation - Terraform

By default, Terraform is run inside a test environment and needs to be directed to your project. Perform the following steps to direct the creation of the BigQuery table to your project:

1. Delete the `testenv.tf` file inside the `terraform` folder
2. View `variables.tf` inside the `terraform` folder and replace `default = "pso-kokoro-resources"` with `default = "YOUR_PROJECT_ID"`


After installing the [terraform CLI tool](https://learn.hashicorp.com/tutorials/terraform/install-cli) and completing the steps above, run the following commands from inside the root of the repo:

```
cd terraform
terraform init
terraform apply
```
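
If you would like to review the resources before creating them, `terraform plan` (standard Terraform usage, not specific to this tool) previews the pending changes without applying them:

```
cd terraform
terraform init
# Show the dataset and table that would be created, without applying.
terraform plan
```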

### Cloud Resource Creation - Cloud SDK (gcloud)

Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) if necessary.

Create a dataset for validation results:

```
bq mk pso_data_validator
```

Create a table:

```
bq mk --table \
  ... \
terraform/results_schema.json
```

### Cloud Resource Creation - Verification

You should see a dataset named `pso_data_validator` and a table named
`results` created inside of your project.
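
Whichever method you used, you can confirm the resources exist with the `bq` CLI (assuming the Cloud SDK is installed and authenticated against your project):

```
# The pso_data_validator dataset should be listed in your project.
bq ls

# The results table should exist, with the schema from terraform/results_schema.json.
bq show pso_data_validator.results
```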

After installing the CLI tool using the instructions below, you will be ready to run data validation commands and output the results to BigQuery. See an example [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#store-results-in-a-bigquery-table).

## Deploy Data Validation CLI on your machine
