diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index df3f686e..0774c0bb 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -48,7 +48,23 @@ To run our local testing suite, use:
 
 `python3 -m nox --envdir ~/dvt/envs/ -s unit_small blacken lint`
 
-You can also use [our script](tests/local_check.sh) with all checks step by step.
+See [our script](tests/local_check.sh) to run the nox checks step by step.
+
+You can also run pytest directly:
+```shell
+pip install pyfakefs==4.6.2
+pytest tests/unit
+```
+
+To lint your code, run:
+```shell
+pip install black==22.3.0
+pip install flake8
+black $BLACK_PATHS  # Find this variable in our noxfile
+flake8 data_validation
+flake8 tests
+```
+The above is similar to our [noxfile lint session](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/noxfile.py).
 
 ## Conventional Commits
 
diff --git a/README.md b/README.md
index 18160823..85f90e55 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,8 @@ Alternatives to running DVT in the CLI include deploying DVT to Cloud Run, Cloud
 ([Examples Here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/samples)).
 See the [Validation Logic](https://github.com/GoogleCloudPlatform/professional-services-data-validator#validation-logic) section to learn more about how DVT uses the CLI to generate SQL queries.
 
+Note that we do not support nested or complex columns for column or row validations.
+
 #### Column Validations
 
 Below is the command syntax for column validations. To run a grouped column
@@ -98,9 +100,6 @@ data-validation (--verbose or -v) (--log-level or -ll) validate column
                         i.e 'bigquery-public-data.new_york_citibike.citibike_trips'
   [--grouped-columns or -gc GROUPED_COLUMNS]
                         Comma separated list of columns for Group By i.e col_a,col_b
-  [--primary-keys or -pk PRIMARY_KEYS]
-                        Comma separated list of columns to use as primary keys
-                        (Note) Only use with grouped column validation. See *Primary Keys* section.
   [--count COLUMNS]     Comma separated list of columns for count or * for all columns
   [--sum COLUMNS]       Comma separated list of columns for sum or * for all numeric
   [--min COLUMNS]       Comma separated list of columns for min or * for all numeric
@@ -135,8 +134,8 @@ data-validation (--verbose or -v) (--log-level or -ll) validate column
                         Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). If no list is provided, all statuses are returned.
 ```
 
-The default aggregation type is a 'COUNT *'. If no aggregation flag (i.e count,
-sum , min, etc.) is provided, the default aggregation will run.
+The default aggregation type is a 'COUNT *', which runs in addition to any validations you specify. To remove this default,
+use [YAML configs](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop#running-dvt-with-yaml-configuration-files).
 
 The [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md) page provides many
 examples of how a tool can be used to run powerful validations without writing any queries.
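
To make the reworded default concrete: a minimal sketch using only flags shown in the help text above, with `my_bq_conn` assumed to be an existing connection (it is the name used in the partitioning example below). Because only `--sum` is requested, DVT still adds the default 'COUNT *' validation:

```shell script
# Sum validation over all numeric columns; the default 'COUNT *'
# validation runs in addition to the requested sums.
data-validation validate column \
  -sc my_bq_conn \
  -tc my_bq_conn \
  -tbls bigquery-public-data.new_york_citibike.citibike_trips \
  --sum '*'
```
To suppress the default count, save the validation as a YAML config and remove the count aggregate there, per the link in the change above.
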
diff --git a/docs/examples.md b/docs/examples.md
index 6a71cb1c..962e09c7 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -45,13 +45,13 @@ Above command executes validations stored in a config file named citibike.yaml.
 #### Generate partitions and save as multiple configuration files
 ````shell script
 data-validation generate-table-partitions \
+  -sc my_bq_conn \
   -tc my_bq_conn \
   -tbls bigquery-public-data.new_york_trees.tree_census_2015 \
   --primary-keys tree_id \
   --hash '*' \
   --filters 'tree_id>3000' \
   -cdir partitions_dir \
-  --partition-key tree_id \
   --partition-num 200
 ````
 Above command creates multiple partitions based on `--partition-key`. Number of generated configuration files is decided by `--partition-num`
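
The validation and partitioning examples above both pass `my_bq_conn` for `-sc` and `-tc`. For completeness, a sketch of creating such a connection, assuming the `connections add` subcommand and `--project-id` flag from the DVT connections documentation; `MY_PROJECT` is a placeholder:

```shell script
# Create a BigQuery connection that can be reused as both the source (-sc)
# and the target (-tc) of a validation. MY_PROJECT is a placeholder project ID.
data-validation connections add \
  --connection-name my_bq_conn \
  BigQuery \
  --project-id MY_PROJECT
```
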