feat: Add example of BigQuery cast to NUMERIC, update chore release version #476

Merged
merged 2 commits into from
May 12, 2022
58 changes: 47 additions & 11 deletions docs/examples.md
@@ -140,7 +140,7 @@ data-validation query
--query, -q The Raw query to run against the supplied connection
````

#### Sample YAML file (Grouped Column validation)
#### Sample YAML Config (Grouped Column validation)
```yaml
result_handler:
  project_id: my-project-id
@@ -155,36 +155,72 @@ validations:
    source_column: null
    target_column: null
    type: count
  - field_alias: sum__num_bikes_available
    source_column: num_bikes_available
    target_column: num_bikes_available
    type: sum
    cast: float64
  - field_alias: sum__num_docks_available
    source_column: num_docks_available
    target_column: num_docks_available
    type: sum
  calculated_fields: []
  filters:
  - source: region_id=71
    target: region_id=71
    type: custom
  format: table
  grouped_columns:
  - cast: null
    field_alias: region_id
    source_column: region_id
    target_column: region_id
  labels:
  - !!python/tuple
    - description
    - user
    - test
  random_row_batch_size: null
  schema_name: bigquery-public-data.new_york_citibike
  table_name: citibike_stations
  target_schema_name: bigquery-public-data.new_york_citibike
  target_table_name: citibike_stations
  threshold: 0.0
  type: Column
  use_random_rows: false
```

#### Sample YAML with Calc Fields (Cast to NUMERIC before aggregation)

The NUMERIC data type in BigQuery is equivalent to DECIMAL(38,9). This configuration
casts the column to NUMERIC before aggregating, i.e. it runs SUM(CAST(column AS NUMERIC)),
to avoid integer overflow.

```yaml
result_handler: {}
source: my_bq_conn
target: my_bq_conn
validations:
- aggregates:
  - field_alias: count
    source_column: null
    target_column: null
    type: count
  - field_alias: sum__int
    source_column: cast__int
    target_column: cast__int
    type: sum
  calculated_fields:
  - depth: 0
    field_alias: cast__int
    source_calculated_columns:
    - int
    target_calculated_columns:
    - int
    type: cast
    default_cast: decimal(38,9)
  filters: []
  format: table
  labels: []
  random_row_batch_size: null
  schema_name: project.dataset
  table_name: my_table
  target_schema_name: project.dataset
  target_table_name: my_table
  threshold: 0.0
  type: Column
  use_random_rows: false
```
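
To see why casting before the aggregation matters, here is a minimal Python sketch of the arithmetic only. It does not call the Data Validation Tool or BigQuery. The helper names `sum_as_int64` and `sum_as_numeric` are hypothetical, and the overflow check is a simulation of a signed 64-bit accumulator, not a model of how any particular engine reports the error.

```python
from decimal import Context, Decimal

INT64_MAX = 2**63 - 1        # largest signed 64-bit integer
NUMERIC = Context(prec=38)   # assumption: mirror the 38 digits of DECIMAL(38,9)


def sum_as_int64(values):
    """Sum into a simulated signed 64-bit accumulator; raise when it overflows."""
    total = 0
    for v in values:
        total += v
        if not -2**63 <= total <= INT64_MAX:
            raise OverflowError(f"int64 overflow at partial sum {total}")
    return total


def sum_as_numeric(values):
    """Cast each value to a 38-digit decimal first, then sum."""
    total = Decimal(0)
    for v in values:
        total = NUMERIC.add(total, Decimal(v))
    return total


values = [INT64_MAX, 1]
try:
    sum_as_int64(values)
except OverflowError as exc:
    print("int64 sum failed:", exc)
print("numeric sum:", sum_as_numeric(values))
```

The decimal sum succeeds because 38 digits of precision leave ample headroom above the 19 digits of the largest 64-bit integer, which is the same reason the `cast__int` calculated field is summed instead of the raw `int` column.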

#### Run a custom query column validation
````shell script
data-validation validate --custom-query-type column custom-query --source-query-file source_query.sql --target-query-file target_query.sql -sc my_bq_conn -tc my_bq_conn -tbls bigquery-public-data.new_york_citibike.citibike_stations