docs: Issue263 Installation doc updates (#270)
* docs: Updated the installation doc to include a section on cloning the repo and docs on authentication, and updated the Terraform instructions so that they work as written

* docs: Reformat sections to be more reader friendly. Add clearer language and a link to the Google Cloud SDK installation guide

* docs: specified terraform folder
afleisc committed Jun 29, 2021
1 parent 3c21ee5 commit 0328c0e
Showing 1 changed file (docs/installation.md) with 35 additions and 17 deletions.
# Data Validation Tool Installation Guide
The data validation tool can be installed on any machine with Python 3.6+.

The tool natively supports BigQuery connections. If you need to connect to other databases such as Teradata or Oracle, you will need to install the appropriate connection libraries (see the [Connections](connections.md) page for details).

This tool can be natively installed on your machine or can be containerized and run with Docker.


## Prerequisites

- Any machine with Python 3.6+ installed.
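
To confirm that the interpreter on your machine meets this requirement, a quick check (assuming `python3` is the interpreter on your `PATH`):

```
python3 --version
```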

## Setup

By default, the data validation tool writes validation results to `stdout`. However, we recommend storing results in a BigQuery table so that the process is standardized and results can be shared across a team. To do this, the tool needs a BigQuery table created with a specific schema. If you choose to write results to BigQuery, there are a couple of requirements:

- A Google Cloud Platform project with the BigQuery API enabled.

- A Google user account with appropriate permissions. If you plan to run this tool in production, it's recommended that you create a service account specifically for running the tool. See our [guide](https://cloud.google.com/docs/authentication/production) on how to authenticate with your service account. If you are using a service account, you need to grant your service account appropriate roles on your project so that it has permissions to create and read resources.
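
For local runs, authenticating with Application Default Credentials is usually enough; a service account instead needs BigQuery roles on the project and the tool needs to find its key. The commands below are a minimal sketch: the service account name `dvt-runner`, the key path, and the exact roles are illustrative placeholders (the linked guide is authoritative).

```
# Option 1: authenticate locally as your own user account
gcloud auth application-default login

# Option 2: grant a service account BigQuery access (names are illustrative)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dvt-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Point client libraries at the service account key file
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
```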

Clone the repository onto your machine and navigate inside the directory:

```
git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
cd professional-services-data-validator
```

There are two methods of creating the BigQuery output table for the tool: via *Terraform* or the *Cloud SDK*.

### Cloud Resource Creation - Terraform

You can use Terraform to create the necessary BigQuery resources (see the next section for creating the resources manually with `gcloud`).

By default, the Terraform configuration in this repo targets a test environment and needs to be pointed at your project. Perform the following steps to direct the creation of the BigQuery table to your project:

1. Delete the `testenv.tf` file inside the `terraform` folder
2. View `variables.tf` inside the `terraform` folder and replace `default = "pso-kokoro-resources"` with `default = "YOUR_PROJECT_ID"` (see the illustrative snippet below)
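
For orientation, the relevant block in `variables.tf` should end up looking roughly like the sketch below; the variable name and description are illustrative and may differ in the repo — only the `default` value matters for this step.

```
variable "project_id" {
  description = "Project that will hold the validation results dataset"
  default     = "YOUR_PROJECT_ID"
}
```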


After installing the [terraform CLI tool](https://learn.hashicorp.com/tutorials/terraform/install-cli) and completing the steps above, run the following commands from inside the root of the repo:

```
cd terraform
terraform init
terraform apply
```
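
If you want to preview what will be created before applying, or confirm afterwards which resources Terraform manages, the standard commands below also work here (a minimal sketch with no extra flags assumed):

```
# Preview the resources that will be created, without applying
terraform plan

# After apply, list the resources tracked in Terraform state
terraform state list
```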

### Cloud Resource Creation - Cloud SDK (gcloud)

Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) if necessary.
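
If the SDK is freshly installed, authenticate it and point it at your project so the `bq` commands below run against the right place (`YOUR_PROJECT_ID` is a placeholder):

```
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
```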

Create a dataset for validation results:

```
bq mk pso_data_validator
```
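
If your active `gcloud` configuration points at a different project, you can qualify the dataset with the project ID instead; an equivalent, illustrative invocation:

```
bq mk YOUR_PROJECT_ID:pso_data_validator
```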

Create a table:

```
bq mk --table \
  ...
terraform/results_schema.json
```

### Cloud Resource Creation - After success

You should see a dataset named `pso_data_validator` and a table named
`results` created inside of your project.
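
Whichever path you used, a quick check with the Cloud SDK confirms the resources exist (assuming `bq` is configured for your project):

```
# List datasets in the current project; pso_data_validator should appear
bq ls

# Show the results table and its schema
bq show pso_data_validator.results
```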

After installing the CLI tool using the instructions below, you will be ready to run data validation commands and output the results to BigQuery. See an example [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#store-results-in-a-bigquery-table).


## Deploy Data Validation CLI on your machine

