feat: first class support for row level hashing #345

Merged: renzokuken merged 33 commits into develop from issue300-make-hashing-user-friendly on Feb 23, 2022


renzokuken (Collaborator)

No description provided.

renzokuken changed the title from "adding scaffolding for calc field builder in config manager" to "feat: adding scaffolding for calc field builder in config manager" on Jan 29, 2022
renzokuken changed the title from "feat: adding scaffolding for calc field builder in config manager" to "feat: first class support for row level hashing" on Jan 29, 2022
dhercher (Collaborator) left a comment


Round 1 of changes

data_validation/__main__.py (resolved)
data_validation/__main__.py (resolved)
data_validation/__main__.py (outdated, resolved)
data_validation/cli_tools.py (outdated, resolved)
data_validation/combiner.py (outdated, resolved)
data_validation/config_manager.py (outdated, resolved)
data_validation/data_validation.py (resolved)
data_validation/data_validation.py (outdated, resolved)
data_validation/config_manager.py (outdated, resolved)
renzokuken (Collaborator, Author)

@nehanene15 @dhercher PR updated for review. Please let me know if you have any questions or concerns.

data_validation/config_manager.py (resolved)
data_validation/config_manager.py (outdated, resolved)
data_validation/config_manager.py (outdated, resolved)
data_validation/config_manager.py (resolved)
data_validation/result_handlers/text.py (outdated, resolved)
data_validation/validation_builder.py (resolved)
data_validation/validation_builder.py (resolved)
tests/unit/test_schema_validation.py (outdated, resolved)
nehanene15 (Collaborator)

We will also need to add documentation about this functionality - we can do that in this PR or open a new one. This would include:

  • CLI flags/descriptions
  • Updating the examples.md with row validations
  • When to use calculated vs. comparison fields, and how to exclude columns from hashing
  • Limitations of row validation

nehanene15 (Collaborator) left a comment


Could we also document a couple examples of row hashing in docs/examples?

renzokuken merged commit 3d78ee5 into develop on Feb 23, 2022
renzokuken deleted the issue300-make-hashing-user-friendly branch on February 23, 2022 21:51
ajwelch4 added a commit to ajwelch4/professional-services-data-validator that referenced this pull request Feb 26, 2022
…threshold and labels to be ignored for column validations (GoogleCloudPlatform#376)
nehanene15 pushed a commit that referenced this pull request Mar 3, 2022
ngdav pushed a commit that referenced this pull request Mar 16, 2022
* adding scaffolding for calc field builder in config manager

* exposing cast via calculated fields. Don't know if we necessarily need this just adding for consistency

* diff check

* config file generating as expected

* expanding cli for row level validations

* splitting out comparison fields from aggregates

* row comparisons operational (sort of)

* re-enabling aggregate validations

* cohabitation of validation types!

* figuring out why unit tests are borked

* continuing field split

* stash before merge

* testing diff

* tests passing

* removing extra print statements

* tests and lint

* adding fail tests

* first round of requested changes

* change requests round two.

* refactor CLI and lint

* swapping out farm fingerprint for sha256 as default

* changes per CR

* fixing text result tests

* adding docs

* hash example

* linting

* think I found the broken test

* fixed tests

* setting default for depth length

* relaxing system test
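The commits above add row-level validation by hashing each row, with SHA-256 replacing FarmHash fingerprint as the default hash. The sketch below is a rough, stdlib-only illustration of that general idea, not the project's implementation. The helper names `row_hash` and `compare_rows`, the `exclude` parameter, the `"|"` separator, and the rule that NULL becomes an empty string are all hypothetical choices made for this example.

```python
import hashlib
from typing import Dict, Iterable, List, Mapping, Sequence

Row = Mapping[str, object]


def row_hash(row: Row, columns: Sequence[str], exclude: Iterable[str] = ()) -> str:
    """Return the SHA-256 hex digest of the selected columns of one row.

    Hypothetical conventions for this sketch: NULL (None) becomes "",
    every value is cast with str(), and values are joined with "|".
    """
    skipped = set(exclude)
    parts = []
    for col in columns:
        if col in skipped:
            continue  # excluded columns do not contribute to the hash
        value = row.get(col)
        parts.append("" if value is None else str(value))
    payload = "|".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare_rows(
    source: List[Row],
    target: List[Row],
    key: str,
    columns: Sequence[str],
    exclude: Iterable[str] = (),
) -> Dict[object, str]:
    """Match rows on `key` and mark each key "success" or "fail".

    In this sketch a key present on only one side also counts as "fail".
    """
    excluded = list(exclude)  # materialize so it can be reused for both sides
    src = {r[key]: row_hash(r, columns, excluded) for r in source}
    tgt = {r[key]: row_hash(r, columns, excluded) for r in target}
    status: Dict[object, str] = {}
    for k in src.keys() | tgt.keys():
        if k in src and k in tgt and src[k] == tgt[k]:
            status[k] = "success"
        else:
            status[k] = "fail"
    return status


source = [
    {"id": 1, "name": "a", "updated": "2022-01-01"},
    {"id": 2, "name": "b", "updated": None},
]
target = [
    {"id": 1, "name": "a", "updated": "2022-02-01"},
    {"id": 2, "name": "B", "updated": None},
]
# Excluding a column that is expected to differ (here "updated") lets row 1 match.
print(compare_rows(source, target, "id", ["id", "name", "updated"], exclude=["updated"]))
```

Because this sketch casts every value to a string and maps NULL to `""`, a NULL and an empty string hash identically, and so do `1` and `"1"`. A real implementation has to make these normalization choices explicitly, which is one reason documenting calculated fields and column exclusion matters.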
ngdav pushed a commit that referenced this pull request Mar 16, 2022
ngdav added a commit that referenced this pull request May 4, 2022
* feat: add db2 connection

* feat: add connection

* feat: DB2 connection fix

* fix: do not require db2 client unless needed

* fix: Db2 count validation/agg functions, DB2Client

fixes sum, min, avg, max functions for mysql, ps, db2, and more
streamline DB2Client imports

* style: linting

* Fix: Multiple updates (#359)

* fix: update spelling

* fix:Adding double quote to prevent globbing and word splitting.

Adding double quote to prevent globbing and word splitting.

* fix:updating comment

* fix: Updating inline comments

* fix:Spelling

* fix:Updating spelling

* test: Support local integration tests for Teradata, Postgres and SQL Server (#364)

* test: get Teradata user name from TERADATA_USER env var

* test: add --no-cloud-sql flag to pytest options

* test: instantiate CloudSQLResourceManager in a fixture when --no-cloud-sql is not passed

* test: optionally get Postgres host from POSTGRES_HOST env var

* test: optionally get SQL Server host from SQL_SERVER_HOST env var

* test: optionally get SQL server user from SQL_SERVER_USER env var

Co-authored-by: A.J. Welch

* fix: supporting non default schemas for mssql (#365)

* fix: supporting non default schemas for mssql

* fix:updated MSSQL client instantiation

* fix: typo

* feat: GCS support for validation configs (#340)

* gcs support for validation configs, incl. get and list functionality, and new 'configs' cmd

* fix: test for nan when calculating fail/success in combiner (#341) (#366)

* fix: ensure all statuses are success or fail, particularly after _join_pivots (#329) (#370)

* feat: first class support for row level hashing (#345)

* feat: Hive partitioned tables support (#375)

* feat: add support for partitioned tables

* feat: import schema class

* fix: update docs

* fix: use an appropriate column filter list for schema validation (#350) (#371)

* fix: make status values consistent across validation types (#377) (#378)

* fix: make status values consistent across validation types (#377)

* fix: make validation status values consts (#377)

* fix: revert change from #345 that causes filters, threshold and labels to be ignored for column validations (#376) (#379)

* feat: Hive hash function support (#392)

* adding addons for impala hive hashing functions

* fix: import fixed_arity

* move logic to ibis_addon

* replacing isnull with nvl

* adding nvl function

* test FillNa

* missing import

* updating t0 prefix to column names



Co-authored-by: Mike Hilton

* docs: add Db2 link to README

Co-authored-by: Elaina Yao
Co-authored-by: David Ng
Co-authored-by: Alejandro Leal
Co-authored-by: AJ
Co-authored-by: A.J. Welch
Co-authored-by: Neha Nene
Co-authored-by: dmedora
Co-authored-by: Mike Hilton
Co-authored-by: ngdav
Co-authored-by: Dylan Hercher
4 participants