diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2608f7acb..d488df6c7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -34,6 +34,8 @@ Clone repo locally and install via pip: ``` git clone git@github.com:GoogleCloudPlatform/professional-services-data-validator.git cd professional-services-data-validator/ +python -m venv env +source env/bin/activate python -m pip install --upgrade pip python -m pip install . ``` diff --git a/README.md b/README.md index 59dfc33ce..f718e16e5 100644 --- a/README.md +++ b/README.md @@ -1,98 +1,81 @@ -# Data Validation Tool (Beta) +# Data Validation Tool -The Data Validation Tool (Beta) is an open sourced Python CLI tool based on the -[Ibis framework](https://ibis-project.org/docs/tutorial/01-Introduction-to-Ibis.html) +The Data Validation Tool is an open sourced Python CLI tool based on the +[Ibis framework](https://ibis-project.org/docs/3.0.2/) that compares heterogeneous data source tables with multi-leveled validation functions. -Data validation is a critical step in a Data Warehouse, Database or Data Lake -migration project, where structured or semi-structured data from both the source -and the destination tables are compared to ensure they are matched and correct -after each migration step (e.g. data and schema migration, SQL script -translation, ETL migration, etc.). The Data Validation Tool (DVT) provides an -automated and repeatable solution to perform this task. +Data validation is a critical step in a data warehouse, database, or data lake +migration project where data from both the source and the target tables are +compared to ensure they are matched and correct after each migration step +(e.g. data and schema migration, SQL script translation, ETL migration, etc.). +The Data Validation Tool (DVT) provides an automated and repeatable solution to +perform this task. 
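For a quick sense of what a run looks like, here is a minimal sketch of a column validation using the default COUNT(*) aggregation. The connection name `my_bq_conn` is a placeholder; the full command syntax and flags are covered under Usage below.

```
# Minimal column validation: compares COUNT(*) between source and target.
data-validation validate column \
  -sc my_bq_conn \
  -tc my_bq_conn \
  -tbls bigquery-public-data.new_york_citibike.citibike_trips
```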
DVT supports the following validations: -* Column validation (count, sum, avg, min/max, group_by) -* Row level validation +* Column validation (count, sum, avg, min, max, group by) +* Row validation (BQ, Hive, and Teradata only) * Schema validation * Custom Query validation -* RawSQL exploration +* Ad hoc SQL exploration DVT supports the following connection types: -* [BigQuery](docs/connections.md#google-bigquery) -* [DB2](docs/connections.md#db2) -* [FileSystem](docs/connections.md#filesystem) -* [Hive](docs/connections.md#hive) -* [Impala](docs/connections.md#impala) -* [MSSQL](docs/connections.md#mssql-server) -* [MySQL](docs/connections.md#mysql) -* [Oracle](docs/connections.md#oracle) -* [Postgres](docs/connections.md#postgres) -* [Redshift](docs/connections.md#redshift) -* [Snowflake](docs/connections.md#snowflake) -* [Spanner](docs/connections.md#google-spanner) -* [Teradata](docs/connections.md#teradata) - -The [Connections](docs/connections.md) page provides details about how to create +* [BigQuery](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#google-bigquery) +* [DB2](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#db2) +* [FileSystem](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#filesystem) +* [Hive](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#hive) +* [Impala](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#impala) +* [MSSQL](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#mssql-server) +* [MySQL](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#mysql) +* [Oracle](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#oracle) +* [Postgres](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#postgres) +* [Redshift](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#redshift) +* [Snowflake](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#snowflake) +* [Spanner](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#google-spanner) +* [Teradata](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#teradata) + +The [Connections](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md) page provides details about how to create and list connections for the validation tool. ### Disclaimer This is not an officially supported Google product. Please be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes. Feel free to open bugs or feature requests, or contribute directly -(see [CONTRIBUTING.md](CONTRIBUTING.md) for details). +(see [CONTRIBUTING.md](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/CONTRIBUTING.md) for details). ## Installation -The [Installation](docs/installation.md) page describes the prerequisites and -setup steps needed to install and use the data validation tool. 
+The [Installation](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/installation.md) page describes the prerequisites and +setup steps needed to install and use the Data Validation Tool. ## Usage Before using this tool, you will need to create connections to the source and target tables. Once the connections are created, you can run validations on those tables. Validation results can be printed to stdout (default) or outputted -to BigQuery. The validation tool also allows you to save or edit validation +to BigQuery (recommended). DVT also allows you to save or edit validation configurations in a YAML file. This is useful for running common validations or updating the configuration. ### Managing Connections -The Data Validation Tool expects to receive a source and target connection for -each validation which is run. +Before running validations, DVT requires setting up a source and target connection. +These connections can be stored locally or in a GCS directory. To create connections, +please review the [Connections](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md) page. -These connections can be supplied directly to the configuration, but more often -you want to manage connections separately and reference them by name. +### Running Validations -Connections can be stored locally or in a GCS directory. +The CLI is the main interface for this tool, and it has several different +commands that can be used to create and run validations. Below are the command +syntax and options for running validations. -To create connections please review the [Connections](docs/connections.md) page. - -### Running CLI Validations - -The data validation CLI is a main interface to use this tool. - -The CLI has several different commands which can be used to create and re-run -validations. - -The validation tool first expects connections to be created before running -validations. To create connections please review the -[Connections](docs/connections.md) page. - -Once you have your connections set up, you are ready to run the validations. - -### Validation command syntax and options - -Below are the command syntax and options for running validations from the CLI. -DVT supports column (including grouped column) and schema validations. +Alternatives to running DVT from the CLI include deploying DVT to Cloud Run, Cloud Functions, or Airflow +([Examples Here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/samples)). See the [Validation Logic](https://github.com/GoogleCloudPlatform/professional-services-data-validator#validation-logic) section +to learn more about how DVT uses the CLI to generate SQL queries. #### Column Validations Below is the command syntax for column validations. To run a grouped column -validation, simply specify the `--grouped-columns` flag. You can also take -grouped column validations a step further by providing the `--primary-key` flag. -With this flag, if a mismatch was found, DVT will dive deeper into the slice -with the error and find the row (primary key value) with the inconsistency. +validation, simply specify the `--grouped-columns` flag. You can also supply string columns to the aggregation flags, in which case the aggregation is calculated over `length(string_col)`. Running an aggregation @@ -147,8 +130,7 @@ data-validation (--verbose or -v) validate column The default aggregation type is a 'COUNT *'.
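For example, the sketch below relies on that default COUNT aggregation and simply breaks the result out by a grouped column (connection and column names are placeholders drawn from the public Citi Bike examples):

```
# Grouped COUNT(*) validation, broken out by start_station_name.
data-validation validate column \
  -sc my_bq_conn \
  -tc my_bq_conn \
  -tbls bigquery-public-data.new_york_citibike.citibike_trips \
  --grouped-columns start_station_name
```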
If no aggregation flag (e.g. count, sum, min, etc.) is provided, the default aggregation will run. -The [Examples](docs/examples.md) page provides many examples of how a tool can -used to run powerful validations without writing any queries. +The [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md) page provides many examples of how the tool can be used to run powerful validations without writing any queries. #### Row Validations @@ -166,9 +148,9 @@ the table. This will include casting to string, sanitizing the data (ifnull, rtr and finally hashing the row. -Additionally you can use -[Calculated Fields](#calculated-fields) to compare derived values such as string -counts and hashes of multiple columns. +Under the hood, row validation uses +[Calculated Fields](https://github.com/GoogleCloudPlatform/professional-services-data-validator#calculated-fields) to +apply functions such as IFNULL() or RTRIM(). These can be edited in the YAML config to customize your row validation. ``` data-validation (--verbose or -v) validate row @@ -237,7 +219,7 @@ data-validation (--verbose or -v) validate schema Defaults to table. ``` -### Custom Query Column Validations +#### Custom Query Column Validations Below is the command syntax for custom query column validations. @@ -279,17 +261,18 @@ data-validation (--verbose or -v) validate custom-query The default aggregation type is a 'COUNT *'. If no aggregation flag (e.g. count, sum, min, etc.) is provided, the default aggregation will run. -The [Examples](docs/examples.md) page provides few examples of how this tool can -used to run custom query validations. +The [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md) +page provides a few examples of how this tool can be used to run custom query validations. -### Custom Query Row Validations +#### Custom Query Row Validations -#### (Note: Row hash validation is currently only supported for BigQuery, Imapala/Hive and Teradata) +(Note: Row hash validation is currently only supported for BigQuery, Teradata, and +Impala/Hive. Struct and array data types are not currently supported.) Below is the command syntax for row validations. In order to run row level -validations you need to pass `--hash` flag with `*` value which means all the fields -of the custom query result will be concatenated and hashed. +validations you need to pass the `--hash` flag, which specifies the fields +of the custom query result that will be concatenated and hashed. Below is the command syntax for custom query row validations. @@ -324,15 +307,59 @@ data-validation (--verbose or -v) validate custom-query Defaults to table. ``` -The [Examples](docs/examples.md) page provides few examples of how this tool can -used to run custom query row validations. +The [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md) +page provides a few examples of how this tool can be used to run custom query row validations. + +### YAML Configuration Files + +You can customize the configuration for any given validation by providing use +case specific CLI arguments or editing the YAML configuration file. + +For example, the following command creates a YAML file for the validation of the +`new_york_citibike` table: `data-validation validate column -sc my_bq_conn -tc +my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_trips -c +citibike.yaml`.
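The generated `citibike.yaml` looks roughly like the sketch below, a default COUNT validation of the table against itself (exact fields may vary between DVT versions):

```
result_handler: {}
source: my_bq_conn
target: my_bq_conn
validations:
- aggregates:
  - field_alias: count
    source_column: null
    target_column: null
    type: count
  filters: []
  labels: []
  schema_name: bigquery-public-data.new_york_citibike
  table_name: citibike_trips
  target_schema_name: bigquery-public-data.new_york_citibike
  target_table_name: citibike_trips
  threshold: 0.0
  type: Column
```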
+ +The validation config file is saved to the GCS path specified by the `PSO_DV_CONFIG_HOME` +env variable if that has been set; otherwise, it is saved to wherever the tool is run. + +You can now edit the YAML file if, for example, the `new_york_citibike` table is +stored in datasets that have different names in the source and target systems. +Once the file is updated and saved, the following command runs the +validation: + +``` +data-validation configs run -c citibike.yaml +``` + +View the complete YAML file for a Grouped Column validation on the +[Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-config-grouped-column-validation) page. + +You can view a list of all saved validation YAML files using `data-validation configs list`, and print a YAML config using `data-validation configs get -c citibike.yaml`. + +### Validation Reports + +The result handlers tell DVT where to store the results of +each validation. The tool can write the results of a validation run to Google +BigQuery or print to stdout (default). View the schema of the results +table [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/terraform/results_schema.json). +To output to BigQuery, simply include the `-bqrh` flag during a validation run +like so: +``` +data-validation validate column + -sc bq_conn + -tc bq_conn + -tbls bigquery-public-data.new_york_citibike.citibike_trips + -bqrh project_id.dataset.table + -sa service-acct@project.iam.gserviceaccount.com +``` -### Running Custom SQL Exploration +### Ad Hoc SQL Exploration There are many occasions where you need to explore a data source while running validations. To avoid the need to open and install a new client, the CLI allows -you to run custom queries. +you to run ad hoc queries. ``` data-validation query @@ -342,6 +369,27 @@ data-validation query The raw query to run against the supplied connection ``` +### Building Matched Table Lists + +Creating the list of matched tables can be a hassle. We have added a feature +which may help you to match all of the tables together between source and +target. The find-tables tool: + +- Pulls all tables in the source (applying a supplied allowed-schemas filter) +- Pulls all tables from the target +- Uses Levenshtein distance to match tables +- Finally, it prints a JSON list of tables which can be a reference for the + validation run config. + +Note that the score cutoff defaults to 0.8, which was manually tested to be an +accurate value. If no matches occur, reduce this value. + +``` +data-validation find-tables --source-conn source --target-conn target \ + --allowed-schemas pso_data_validator \ + --score-cutoff 0.8 +``` + ### Using Beta CLI Features There may be occasions where we want to release a new CLI feature under a Beta flag. @@ -361,62 +409,14 @@ Functions, and other deployment services. `data-validation beta deploy` -## Query Configurations - -You can customize the configuration for any given validation by providing use -case specific CLI arguments or editing the saved YAML configuration file. - -For example, the following command creates a YAML file for the validation of the -`new_york_citibike` table: `data-validation validate column -sc my_bq_conn -tc -my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_trips -c -citibike.yaml`.
- -The vaildation config file is saved to the GCS path specified by the `PSO_DV_CONFIG_HOME` env variable if that has been set; otherwise, it is saved to wherever the tool is run. - -Here is the generated YAML file named `citibike.yaml`: - -``` -result_handler: {} -source: my_bq_conn -target: my_bq_conn -validations: -- aggregates: - - field_alias: count - source_column: null - target_column: null - type: count - filters: [] - labels: [] - schema_name: bigquery-public-data.new_york_citibike - table_name: citibike_trips - target_schema_name: bigquery-public-data.new_york_citibike - target_table_name: citibike_trips - threshold: 0.0 - type: Column -``` - -You can now edit the YAML file if, for example, the `new_york_citibike` table is -stored in datasets that have different names in the source and target systems. -Once the file is updated and saved, the following command runs the new -validation: - -``` -data-validation configs run -c citibike.yaml -``` - -View the complete YAML file for a GroupedColumn validation on the -[examples](docs/examples.md#) page. - -You can view a list of all saved validation YAML files using `data-validation configs list`, and print a YAML config using `data-validation configs get -c citibike.yaml`. - +## Validation Logic ### Aggregated Fields Aggregate fields contain the SQL fields that you want to produce an aggregate -for. Currently the functions `COUNT()`, `AVG()`, `SUM()`, `MIN()` and `MAX()` +for. Currently the functions `COUNT()`, `AVG()`, `SUM()`, `MIN()`, and `MAX()` are supported. -#### Sample Aggregate Config - +Here is a sample aggregate config: ``` validations: - aggregates: @@ -432,12 +432,11 @@ validations: source_column: tripduration target_column: tripduration type: sum - - field_alias: bit_xor__hashed_column - source_column: hashed_column - target_column: hashed_column - type: bit_xor ``` +If you are aggregating columns with large values, you can CAST() before aggregation +with calculated fields as shown in [this example](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-with-calc-fields-cast-to-numeric-before-aggregation). + ### Filters Filters let you apply a WHERE statement to your validation query (i.e. `SELECT * @@ -455,24 +454,34 @@ Grouped Columns contain the fields you want your aggregations to be broken out by, e.g. `SELECT last_updated::DATE, COUNT(*) FROM my.table` will produce a resultset that breaks down the count of rows per calendar date. -### Comparison Fields +### Hash and Comparison Fields + +Row-level validations can involve either a hash/checksum or comparison fields. +A hash validation (`--hash '*'`) will first sanitize the data with the following +operations on all or selected columns: CAST to string, IFNULL replace with a default +replacement string, RSTRIP, and UPPER. Then, it will CONCAT() the results +and run a SHA256() hash and compare the source and target results. Since each row will +be returned in the result set, it is recommended to use the `--use-random-row` feature +to validate a subset of the table. -For row validations you need to specify the specific columns that you want to -compare. These values will be compared via a JOIN on their corresponding primary +Comparison field validations (`--comp-fields column`) involve a direct comparison of the +column values. These values will be compared via a JOIN on their corresponding primary key and will be evaluated for an exact match.
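As an illustrative sketch, a comparison-field run over a single column could look like the following (placeholder connection names; the table and column come from the public Citi Bike examples):

```
# Joins source and target on station_id and checks `name` for an exact match.
data-validation validate row \
  -sc my_bq_conn \
  -tc my_bq_conn \
  -tbls bigquery-public-data.new_york_citibike.citibike_stations \
  --primary-keys station_id \
  --comp-fields name
```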
+See hash and comparison field validations in the [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#run-a-row-hash-validation-for-all-rows) page. + ### Calculated Fields Sometimes direct comparisons are not feasible between databases due to differences in how particular data types may be handled. These differences can -be resolved by applying functions to columns in the source query itself. +be resolved by applying functions to columns in the query itself. Examples might include trimming whitespace from a string, converting strings to a single case to compare case insensitivity, or rounding numeric types to a significant figure. Once a calculated field is defined, it can be referenced by other calculated fields at any "depth" or higher. Depth controls how many subqueries are executed -in the resulting query. For example, with the following yaml config... +in the resulting query. For example, with the following YAML config... ``` - calculated_fields: @@ -506,144 +515,14 @@ FROM ( ) as table_0 ``` -Calculated fields can be used by aggregate fields to produce validations on -calculated or sanitized raw data, such as calculating the aggregate hash of a -table. For example the following yaml config... - -``` -validations: -- aggregates: - - field_alias: xor__multi_statement_hash - source_column: multi_statement_hash - target_column: multi_statement_hash - type: bit_xor - calculated_fields: - - field_alias: multi_statement_hash - source_calculated_columns: [multi_statement_concat] - target_calculated_columns: [multi_statement_concat] - type: hash - depth: 2 - - field_alias: multi_statement_concat - source_calculated_columns: [calc_length_col_a, - calc_ifnull_col_b, - calc_rstrip_col_c, - calc_upper_col_d] - target_calculated_columns: [calc_length_col_a, - calc_ifnull_col_b, - calc_rstrip_col_c, - calc_upper_col_d] - type: concat - depth: 1 - - field_alias: calc_length_col_a - source_calculated_columns: [col_a] - target_calculated_columns: [col_a] - type: length - depth: 0 - - field_alias: calc_ifnull_col_b - source_calculated_columns: [col_b] - target_calculated_columns: [col_b] - type: ifnull - depth: 0 - - field_alias: calc_rstrip_col_c - source_calculated_columns: [col_c] - target_calculated_columns: [col_c] - type: rstrip - depth: 0 - - field_alias: calc_upper_col_d - source_calculated_columns: [col_d] - target_calculated_columns: [col_d] - type: upper - depth: 0 -``` - -is equivalent to the following SQL query... - -``` -SELECT - BIT_XOR(multi_statement_hash) AS xor__multi_statement_hash -FROM ( - SELECT - FARM_FINGERPRINT(mult_statement_concat) AS multi_statement_hash - FROM ( - SELECT - CONCAT(calc_length_col_a, - calc_ifnull_col_b, - calc_rstrip_col_c, - calc_upper_col_d) AS multi_statement_concat - FROM ( - SELECT - CAST(LENGTH(col_a) AS STRING) AS calc_length_col_a - , IFNULL(col_b, - 'DEFAULT_REPLACEMENT_STRING') AS calc_ifnull_col_b - , RTRIM(col_c) AS calc_rstrip_col_c - , UPPER(col_d) AS calc_upper_col_d - FROM my.table - ) AS table_0 - ) AS table_1 - ) AS table_2 -``` - -## Validation Reports - -The output handlers tell the data validation tool where to store the results of -each validation. The tool can write the results of a validation run to Google -BigQuery or print to stdout (default). - -View the schema of the results [here](terraform/results_schema.json). 
- -### Configure tool to output to BigQuery - -``` -data-validation validate column - -sc bq_conn - -tc bq_conn - -tbls bigquery-public-data.new_york_citibike.citibike_trips - -bqrh project_id.dataset.table - -sa service-acct@project.iam.gserviceaccount.com -``` - -## Building Matched Table Lists - -Creating the list of matched tables can be a hassle. We have added a feature -which may help you to match all of the tables together between source and -target. The find-tables tool: - -- Pulls all tables in the source (applying a supplied allowed-schemas filter) -- Pulls all tables from the target -- Uses Levenshtein distance to match tables -- Finally, it prints a JSON list of tables which can be a reference for the - validation run config. - -Note that our score cutoff default is a 0.8, which was manually tested to be an -accurate value. If no matches occur, reduce this value. - -``` -data-validation find-tables --source-conn source --target-conn target \ - --allowed-schemas pso_data_validator \ - --score-cutoff 0.8 -``` - -## Add Support for an existing Ibis Data Source - -If you want to add an Ibis Data Source which exists, but was not yet supported -in the Data Validation tool, it is a simple process. - -1. In data_validation/data_validation.py - - - Import the extended Client for the given source (ie. from - ibis.sql.mysql.client import MySQLClient). - - Add the "": Client to the global CLIENT_LOOKUP dictionary. - -2. In third_party/ibis/ibis_addon/operations.py - - - Add the RawSQL operator to the data source registry (for custom filter - support). +If you generate the config file for a row validation, you can see that it uses +calculated fields to generate the query. You can also use calculated fields +in column-level validations to compute the length of a string, or cast +an INT field to BIGINT for aggregations. -3. You are done, you can reference the data source via the config. +See the [Examples page](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-with-calc-fields-cast-to-numeric-before-aggregation) for a sample +cast to NUMERIC. - - Config: {"source_type": "", ...KV Values required in Client...} - ## Contributing -Contributions are welcome. See the [contributing guide](CONTRIBUTING.md) for -details. +Contributions are welcome. See the [contributing guide](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/CONTRIBUTING.md) for details. diff --git a/docs/connections.md b/docs/connections.md index 42ec31cb6..15e3c2844 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -53,10 +53,9 @@ The data validation tool supports the following connection types. * [FileSystem](#filesystem) * [Impala](#Impala) * [Hive](#Hive) -* [DB2] (#DB2) +* [DB2](#DB2) As you see above, Teradata and BigQuery have different sets of custom arguments (for example project_id for BQ versus host for Teradata). - Every connection type requires its own configuration for connectivity. To find out the parameters for each connection type, use the following command. ``` @@ -233,7 +232,7 @@ Then `pip install pyodbc`. } ``` -## FileSystem +## FileSystem (CSV or JSON only) ``` { # Configuration Required for All Data Sources @@ -245,7 +244,7 @@ Then `pip install pyodbc`. # The local, s3, or GCS file path to the data "file_path": "gs://path/to/file", - # The file type. Either 'csv' or 'json + # The file type.
Either 'csv' or 'json' "file_type":"csv" } ``` diff --git a/docs/examples.md b/docs/examples.md index 56969c552..c103c6244 100644 --- a/docs/examples.md +++ b/docs/examples.md @@ -41,11 +41,16 @@ data-validation validate column -sc my_bq_conn -tc my_bq_conn -tbls bigquery-pub data-validation validate column -sc my_bq_conn -tc my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_trips --count bikeid,gender ```` -#### Run a checksum validation for all rows +#### Run a row hash validation for all rows ````shell script data-validation validate row -sc my_bq_conn -tc my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_stations --primary-keys station_id --hash '*' ```` +#### Run a row-level comparison field validation for 100 random rows +````shell script +data-validation validate row -sc my_bq_conn -tc my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_stations --primary-keys station_id --comp-fields name -rr -rbs 100 +```` + #### Store results in a BigQuery table ````shell script data-validation validate column -sc my_bq_conn -tc my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_trips --count tripduration,start_station_name -bqrh $YOUR_PROJECT_ID.pso_data_validator.results @@ -178,7 +183,7 @@ validations: threshold: 0.0 type: Column use_random_rows: false - ``` +``` #### Sample YAML with Calc Fields (Cast to NUMERIC before aggregation) diff --git a/samples/bq_utils/add_columns_schema.sh b/samples/bq_utils/add_columns_schema.sh index 057e1c5b7..185bbbbb9 100755 --- a/samples/bq_utils/add_columns_schema.sh +++ b/samples/bq_utils/add_columns_schema.sh @@ -3,7 +3,7 @@ # Reference: https://cloud.google.com/bigquery/docs/managing-table-schemas#adding_columns_to_a_tables_schema_definition DATASET=pso_data_validator -TABLE=results_test +TABLE=results # The JSON schema includes two additional columns for primary_keys and num_random_rows bq update $PROJECT_ID:$DATASET.$TABLE ../../terraform/results_schema.json \ No newline at end of file
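For context, a hedged sketch of how the updated helper above might be invoked: it assumes the `bq` CLI is authenticated and that `PROJECT_ID` is exported by the caller, since the lines shown do not set it.

```
# Hypothetical invocation; PROJECT_ID must be exported beforehand.
export PROJECT_ID=my-project
cd samples/bq_utils
./add_columns_schema.sh   # runs bq update against pso_data_validator.results
```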