fix: use an appropriate column filter list for schema validation (#350) #371

ajwelch4
Member

  • fix: use an appropriate column filter list for schema validation (BigQuery Schema Validation Keyword Exception for Table with Records and Repeated Fields #350)
    • Here is the full traceback I was able to reproduce:
      Traceback (most recent call last):
        File "/home/ajwelch/professional-services-data-validator/env/bin/data-validation", line 33, in <module>
          sys.exit(load_entry_point('google-pso-data-validator', 'console_scripts', 'data-validation')())
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 397, in main
          validate(args)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 376, in validate
          run(args)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 357, in run
          run_validations(args, config_managers)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 333, in run_validations
          run_validation(config_manager, verbose=args.verbose)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 322, in run_validation
          validator.execute()
        File "/home/ajwelch/professional-services-data-validator/data_validation/data_validation.py", line 101, in execute
          return self.result_handler.execute(self.config, result_df)
        File "/home/ajwelch/professional-services-data-validator/data_validation/result_handlers/text.py", line 61, in execute
          self.print_formatted_(result_df)
        File "/home/ajwelch/professional-services-data-validator/data_validation/result_handlers/text.py", line 46, in print_formatted_
          result_df.drop(self.cols_filter_list, axis=1).to_markdown(
        File "/home/ajwelch/professional-services-data-validator/env/lib/python3.9/site-packages/pandas/core/frame.py", line 4308, in drop
          return super().drop(
        File "/home/ajwelch/professional-services-data-validator/env/lib/python3.9/site-packages/pandas/core/generic.py", line 4153, in drop
          obj = obj._drop_axis(labels, axis, level=level, errors=errors)
        File "/home/ajwelch/professional-services-data-validator/env/lib/python3.9/site-packages/pandas/core/generic.py", line 4188, in _drop_axis
          new_axis = axis.drop(labels, errors=errors)
        File "/home/ajwelch/professional-services-data-validator/env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5591, in drop
          raise KeyError(f"{labels[mask]} not found in axis")
      KeyError: "['labels' 'pct_threshold'] not found in axis"
      
  • fix: only reference use_random_row and random_row_batch_size args if we are not doing a schema validation
    • fixes the following traceback:
      Traceback (most recent call last):
        File "/home/ajwelch/professional-services-data-validator/env/bin/data-validation", line 33, in <module>
          sys.exit(load_entry_point('google-pso-data-validator', 'console_scripts', 'data-validation')())
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 394, in main
          validate(args)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 373, in validate
          run(args)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 349, in run
          config_managers = build_config_managers_from_args(args)
        File "/home/ajwelch/professional-services-data-validator/data_validation/__main__.py", line 170, in build_config_managers_from_args
          use_random_rows=args.use_random_row,
      AttributeError: 'Namespace' object has no attribute 'use_random_row'
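
The first traceback above comes from `DataFrame.drop` being given column names that a schema-validation result does not contain. A minimal sketch of the idea behind the fix, assuming a per-validation-type filter list: the names `COLUMN_FILTER_LIST`, `SCHEMA_FILTER_LIST`, `pick_filter_list`, and the `run_id` column are hypothetical and not taken from the project; only `labels` and `pct_threshold` appear in the error message.

```python
import pandas as pd

# Hypothetical filter lists. Only 'labels' and 'pct_threshold' come from the
# KeyError in the traceback; every other name here is made up for illustration.
COLUMN_FILTER_LIST = ["run_id", "labels", "pct_threshold"]
SCHEMA_FILTER_LIST = ["run_id"]


def pick_filter_list(validation_type):
    """Return the columns to hide for the given validation type."""
    if validation_type == "Schema":
        return SCHEMA_FILTER_LIST
    return COLUMN_FILTER_LIST


# A schema-validation style result with no 'labels' or 'pct_threshold' columns.
schema_df = pd.DataFrame(
    {"run_id": ["abc"], "source_column_name": ["id"], "status": ["success"]}
)

# Dropping with the column-validation list reproduces the KeyError.
try:
    schema_df.drop(COLUMN_FILTER_LIST, axis=1)
    drop_failed = False
except KeyError:
    drop_failed = True

# Dropping with the list chosen for the validation type succeeds.
visible = schema_df.drop(pick_filter_list("Schema"), axis=1)
visible_columns = list(visible.columns)
print(drop_failed, visible_columns)
```

Passing `errors="ignore"` to `drop` would also avoid the exception, but it would silently hide a misspelled column name, which is one reason to prefer an explicit list per validation type.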
      

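The second traceback above is what `argparse` produces when code reads an attribute that only some subcommands define. A minimal sketch, assuming a hypothetical CLI in which only a `column` subcommand declares the random-row flags; `build_parser`, `random_row_options`, and the subcommand layout are illustrative assumptions, not the project's actual parser.

```python
import argparse


def build_parser():
    """Hypothetical CLI: only the 'column' subcommand defines the random-row flags."""
    parser = argparse.ArgumentParser(prog="demo-validation")
    subparsers = parser.add_subparsers(dest="validate_cmd")
    column = subparsers.add_parser("column")
    column.add_argument("--use-random-row", action="store_true")
    column.add_argument("--random-row-batch-size", type=int, default=None)
    subparsers.add_parser("schema")  # no random-row flags here
    return parser


def random_row_options(args):
    """Read the random-row settings only when not doing a schema validation."""
    if args.validate_cmd == "schema":
        return None, None
    return args.use_random_row, args.random_row_batch_size


parser = build_parser()
schema_args = parser.parse_args(["schema"])
column_args = parser.parse_args(
    ["column", "--use-random-row", "--random-row-batch-size", "100"]
)

# The schema Namespace never receives the attribute, so direct access would
# raise AttributeError, as in the traceback.
schema_has_flag = hasattr(schema_args, "use_random_row")
schema_opts = random_row_options(schema_args)
column_opts = random_row_options(column_args)
print(schema_has_flag, schema_opts, column_opts)
```

Guarding on the subcommand keeps the intent explicit; `getattr(args, "use_random_row", False)` is a shorter alternative when a default value is acceptable.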
@nehanene15 nehanene15 merged commit 806151a into GoogleCloudPlatform:develop Mar 1, 2022
ngdav added a commit that referenced this pull request May 4, 2022
* feat: add db2 connection

* feat: add connection

* feat: DB2 connection fix

* fix: do not require db2 client unless needed

* fix: Db2 count validation/agg functions, DB2Client

fixes sum, min, avg, max functions for mysql, ps, db2, and more
streamline DB2Client imports

* style: linting

* Fix: Multiple updates (#359)

* fix: update spelling

* fix:Adding double quote to prevent globbing and word splitting.

Adding double quote to prevent globbing and word splitting.

* fix:updating comment

* fix: Updating inline comments

* fix:Spelling

* fix:Updating spelling

* test: Support local integration tests for Teradata, Postgres and SQL Server (#364)

* test: get Teradata user name from TERADATA_USER env var

* test: add --no-cloud-sql flag to pytest options

* test: instantiate CloudSQLResourceManager in a fixture when --no-cloud-sql is not passed

* test: optionally get Postgres host from POSTGRES_HOST env var

* test: optionally get SQL Server host from SQL_SERVER_HOST env var

* test: optionally get SQL server user from SQL_SERVER_USER env var

Co-authored-by: A.J. Welch <[email protected]>

* fix: supporting non default schemas for mssql (#365)

* fix: supporting non default schemas for mssql

* fix:updated MSSQL client instantiation

* fix: typo

* feat: GCS support for validation configs (#340)

* gcs support for validation configs, incl. get and list functionality, and new 'configs' cmd

* fix: test for nan when calculating fail/success in combiner (#341) (#366)

* fix: ensure all statuses are success or fail, particularly after _join_pivots (#329) (#370)

* feat: first class support for row level hashing (#345)

* adding scaffolding for calc field builder in config manager

* exposing cast via calculated fields. Don't know if we necessarily need this just adding for consistency

* diff check

* config file generating as expected

* expanding cli for row level validations

* splitting out comparison fields from aggregates

* row comparisons operational (sort of)

* re-enabling aggregate validations

* cohabitation of validation types!

* figuring out why unit tests are borked

* continuing field split

* stash before merge

* testing diff

* tests passing

* removing extra print statements

* tests and lint

* adding fail tests

* first round of requested changes

* change requests round two.

* refactor CLI and lint

* swapping out farm fingerprint for sha256 as default

* changes per CR

* fixing text result tests

* adding docs

* hash example

* linting

* think I found the broken test

* fixed tests

* setting default for depth length

* relaxing system test

* feat: Hive partitioned tables support (#375)

* feat: add support for partitioned tables

* feat: import schema class

* fix: update docs

* fix: use an appropriate column filter list for schema validation (#350) (#371)

* fix: make status values consistent across validation types (#377) (#378)

* fix: make status values consistent across validation types (#377)

* fix: make validation status values consts (#377)

* fix: revert change from #345 that causes filters, threshold and labels to be ignored for column validations (#376) (#379)

* feat: Hive hash function support (#392)

* adding addons for impala hive hashing functions

* fix: import fixed_arity

* move logic to ibis_addon

* replacing isnull with nvl

* adding nvl function

* test FillNa

* missing import

* updating t0 prefix to column names



Co-authored-by: Mike Hilton <[email protected]>

* docs: add Db2 link to README

Co-authored-by: Elaina Yao <[email protected]>
Co-authored-by: David Ng <[email protected]>
Co-authored-by: Alejandro Leal <[email protected]>
Co-authored-by: AJ <[email protected]>
Co-authored-by: A.J. Welch <[email protected]>
Co-authored-by: Neha Nene <[email protected]>
Co-authored-by: dmedora <[email protected]>
Co-authored-by: Mike Hilton <[email protected]>
Co-authored-by: ngdav <[email protected]>
Co-authored-by: Dylan Hercher <[email protected]>