feat: Add --dry-run option to validate. #778

Merged: 3 commits, May 12, 2023

ajwelch4
Member

Adds a --dry-run option to the validate command:

> data-validation validate --dry-run row \
  -sc my_bq_conn \
  -tc my_bq_conn \
  -tbls bigquery-public-data.new_york_citibike.citibike_stations \
  --primary-keys station_id \
  --hash '*'
{
    "source_query": "SELECT `hash__all`, `station_id`\nFROM (\n  SELECT *, TO_HEX(SHA256(`concat__all`)) AS `hash__all`\n  FROM (\n    SELECT *,\n           ARRAY_TO_STRING([`upper__rstrip__ifnull__cast__station_id`, `upper__rstrip__ifnull__cast__name`, `upper__rstrip__ifnull__cast__short_name`, `upper__rstrip__ifnull__cast__latitude`, `upper__rstrip__ifnull__cast__longitude`, `upper__rstrip__ifnull__cast__region_id`, `upper__rstrip__ifnull__cast__rental_methods`, `upper__rstrip__ifnull__cast__capacity`, `upper__rstrip__ifnull__cast__eightd_has_key_dispenser`, `upper__rstrip__ifnull__cast__num_bikes_available`, `upper__rstrip__ifnull__cast__num_bikes_disabled`, `upper__rstrip__ifnull__cast__num_docks_available`, `upper__rstrip__ifnull__cast__num_docks_disabled`, `upper__rstrip__ifnull__cast__is_installed`, `upper__rstrip__ifnull__cast__is_renting`, `upper__rstrip__ifnull__cast__is_returning`, `upper__rstrip__ifnull__cast__eightd_has_available_keys`, `upper__rstrip__ifnull__cast__last_reported`], '') AS `concat__all`\n    FROM (\n      SELECT *,\n             upper(`rstrip__ifnull__cast__station_id`) AS `upper__rstrip__ifnull__cast__station_id`,\n             upper(`rstrip__ifnull__cast__name`) AS `upper__rstrip__ifnull__cast__name`,\n             upper(`rstrip__ifnull__cast__short_name`) AS `upper__rstrip__ifnull__cast__short_name`,\n             upper(`rstrip__ifnull__cast__latitude`) AS `upper__rstrip__ifnull__cast__latitude`,\n             upper(`rstrip__ifnull__cast__longitude`) AS `upper__rstrip__ifnull__cast__longitude`,\n             upper(`rstrip__ifnull__cast__region_id`) AS `upper__rstrip__ifnull__cast__region_id`,\n             upper(`rstrip__ifnull__cast__rental_methods`) AS `upper__rstrip__ifnull__cast__rental_methods`,\n             upper(`rstrip__ifnull__cast__capacity`) AS `upper__rstrip__ifnull__cast__capacity`,\n             upper(`rstrip__ifnull__cast__eightd_has_key_dispenser`) AS `upper__rstrip__ifnull__cast__eightd_has_key_dispenser`,\n        
     upper(`rstrip__ifnull__cast__num_bikes_available`) AS `upper__rstrip__ifnull__cast__num_bikes_available`,\n             upper(`rstrip__ifnull__cast__num_bikes_disabled`) AS `upper__rstrip__ifnull__cast__num_bikes_disabled`,\n             upper(`rstrip__ifnull__cast__num_docks_available`) AS `upper__rstrip__ifnull__cast__num_docks_available`,\n             upper(`rstrip__ifnull__cast__num_docks_disabled`) AS `upper__rstrip__ifnull__cast__num_docks_disabled`,\n             upper(`rstrip__ifnull__cast__is_installed`) AS `upper__rstrip__ifnull__cast__is_installed`,\n             upper(`rstrip__ifnull__cast__is_renting`) AS `upper__rstrip__ifnull__cast__is_renting`,\n             upper(`rstrip__ifnull__cast__is_returning`) AS `upper__rstrip__ifnull__cast__is_returning`,\n             upper(`rstrip__ifnull__cast__eightd_has_available_keys`) AS `upper__rstrip__ifnull__cast__eightd_has_available_keys`,\n             upper(`rstrip__ifnull__cast__last_reported`) AS `upper__rstrip__ifnull__cast__last_reported`\n      FROM (\n        SELECT *,\n               rtrim(`ifnull__cast__station_id`) AS `rstrip__ifnull__cast__station_id`,\n               rtrim(`ifnull__cast__name`) AS `rstrip__ifnull__cast__name`,\n               rtrim(`ifnull__cast__short_name`) AS `rstrip__ifnull__cast__short_name`,\n               rtrim(`ifnull__cast__latitude`) AS `rstrip__ifnull__cast__latitude`,\n               rtrim(`ifnull__cast__longitude`) AS `rstrip__ifnull__cast__longitude`,\n               rtrim(`ifnull__cast__region_id`) AS `rstrip__ifnull__cast__region_id`,\n               rtrim(`ifnull__cast__rental_methods`) AS `rstrip__ifnull__cast__rental_methods`,\n               rtrim(`ifnull__cast__capacity`) AS `rstrip__ifnull__cast__capacity`,\n               rtrim(`ifnull__cast__eightd_has_key_dispenser`) AS `rstrip__ifnull__cast__eightd_has_key_dispenser`,\n               rtrim(`ifnull__cast__num_bikes_available`) AS `rstrip__ifnull__cast__num_bikes_available`,\n               
rtrim(`ifnull__cast__num_bikes_disabled`) AS `rstrip__ifnull__cast__num_bikes_disabled`,\n               rtrim(`ifnull__cast__num_docks_available`) AS `rstrip__ifnull__cast__num_docks_available`,\n               rtrim(`ifnull__cast__num_docks_disabled`) AS `rstrip__ifnull__cast__num_docks_disabled`,\n               rtrim(`ifnull__cast__is_installed`) AS `rstrip__ifnull__cast__is_installed`,\n               rtrim(`ifnull__cast__is_renting`) AS `rstrip__ifnull__cast__is_renting`,\n               rtrim(`ifnull__cast__is_returning`) AS `rstrip__ifnull__cast__is_returning`,\n               rtrim(`ifnull__cast__eightd_has_available_keys`) AS `rstrip__ifnull__cast__eightd_has_available_keys`,\n               rtrim(`ifnull__cast__last_reported`) AS `rstrip__ifnull__cast__last_reported`\n        FROM (\n          SELECT *,\n                 IFNULL(`cast__station_id`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__station_id`,\n                 IFNULL(`cast__name`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__name`,\n                 IFNULL(`cast__short_name`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__short_name`,\n                 IFNULL(`cast__latitude`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__latitude`,\n                 IFNULL(`cast__longitude`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__longitude`,\n                 IFNULL(`cast__region_id`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__region_id`,\n                 IFNULL(`cast__rental_methods`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__rental_methods`,\n                 IFNULL(`cast__capacity`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__capacity`,\n                 IFNULL(`cast__eightd_has_key_dispenser`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__eightd_has_key_dispenser`,\n                 IFNULL(`cast__num_bikes_available`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_bikes_available`,\n                 IFNULL(`cast__num_bikes_disabled`, 'DEFAULT_REPLACEMENT_STRING') 
AS `ifnull__cast__num_bikes_disabled`,\n                 IFNULL(`cast__num_docks_available`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_docks_available`,\n                 IFNULL(`cast__num_docks_disabled`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_docks_disabled`,\n                 IFNULL(`cast__is_installed`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_installed`,\n                 IFNULL(`cast__is_renting`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_renting`,\n                 IFNULL(`cast__is_returning`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_returning`,\n                 IFNULL(`cast__eightd_has_available_keys`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__eightd_has_available_keys`,\n                 IFNULL(`cast__last_reported`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__last_reported`\n          FROM (\n            SELECT *, `station_id` AS `cast__station_id`, `name` AS `cast__name`,\n                   `short_name` AS `cast__short_name`,\n                   CAST(`latitude` AS STRING) AS `cast__latitude`,\n                   CAST(`longitude` AS STRING) AS `cast__longitude`,\n                   CAST(`region_id` AS STRING) AS `cast__region_id`,\n                   `rental_methods` AS `cast__rental_methods`,\n                   CAST(`capacity` AS STRING) AS `cast__capacity`,\n                   CAST(`eightd_has_key_dispenser` AS STRING) AS `cast__eightd_has_key_dispenser`,\n                   CAST(`num_bikes_available` AS STRING) AS `cast__num_bikes_available`,\n                   CAST(`num_bikes_disabled` AS STRING) AS `cast__num_bikes_disabled`,\n                   CAST(`num_docks_available` AS STRING) AS `cast__num_docks_available`,\n                   CAST(`num_docks_disabled` AS STRING) AS `cast__num_docks_disabled`,\n                   CAST(`is_installed` AS STRING) AS `cast__is_installed`,\n                   CAST(`is_renting` AS STRING) AS `cast__is_renting`,\n                   CAST(`is_returning` 
AS STRING) AS `cast__is_returning`,\n                   CAST(`eightd_has_available_keys` AS STRING) AS `cast__eightd_has_available_keys`,\n                   FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP(`last_reported`), 'UTC') AS `cast__last_reported`\n            FROM `bigquery-public-data.new_york_citibike.citibike_stations`\n          ) t5\n        ) t4\n      ) t3\n    ) t2\n  ) t1\n) t0",
    "target_query": "SELECT `hash__all`, `station_id`\nFROM (\n  SELECT *, TO_HEX(SHA256(`concat__all`)) AS `hash__all`\n  FROM (\n    SELECT *,\n           ARRAY_TO_STRING([`upper__rstrip__ifnull__cast__station_id`, `upper__rstrip__ifnull__cast__name`, `upper__rstrip__ifnull__cast__short_name`, `upper__rstrip__ifnull__cast__latitude`, `upper__rstrip__ifnull__cast__longitude`, `upper__rstrip__ifnull__cast__region_id`, `upper__rstrip__ifnull__cast__rental_methods`, `upper__rstrip__ifnull__cast__capacity`, `upper__rstrip__ifnull__cast__eightd_has_key_dispenser`, `upper__rstrip__ifnull__cast__num_bikes_available`, `upper__rstrip__ifnull__cast__num_bikes_disabled`, `upper__rstrip__ifnull__cast__num_docks_available`, `upper__rstrip__ifnull__cast__num_docks_disabled`, `upper__rstrip__ifnull__cast__is_installed`, `upper__rstrip__ifnull__cast__is_renting`, `upper__rstrip__ifnull__cast__is_returning`, `upper__rstrip__ifnull__cast__eightd_has_available_keys`, `upper__rstrip__ifnull__cast__last_reported`], '') AS `concat__all`\n    FROM (\n      SELECT *,\n             upper(`rstrip__ifnull__cast__station_id`) AS `upper__rstrip__ifnull__cast__station_id`,\n             upper(`rstrip__ifnull__cast__name`) AS `upper__rstrip__ifnull__cast__name`,\n             upper(`rstrip__ifnull__cast__short_name`) AS `upper__rstrip__ifnull__cast__short_name`,\n             upper(`rstrip__ifnull__cast__latitude`) AS `upper__rstrip__ifnull__cast__latitude`,\n             upper(`rstrip__ifnull__cast__longitude`) AS `upper__rstrip__ifnull__cast__longitude`,\n             upper(`rstrip__ifnull__cast__region_id`) AS `upper__rstrip__ifnull__cast__region_id`,\n             upper(`rstrip__ifnull__cast__rental_methods`) AS `upper__rstrip__ifnull__cast__rental_methods`,\n             upper(`rstrip__ifnull__cast__capacity`) AS `upper__rstrip__ifnull__cast__capacity`,\n             upper(`rstrip__ifnull__cast__eightd_has_key_dispenser`) AS `upper__rstrip__ifnull__cast__eightd_has_key_dispenser`,\n        
     upper(`rstrip__ifnull__cast__num_bikes_available`) AS `upper__rstrip__ifnull__cast__num_bikes_available`,\n             upper(`rstrip__ifnull__cast__num_bikes_disabled`) AS `upper__rstrip__ifnull__cast__num_bikes_disabled`,\n             upper(`rstrip__ifnull__cast__num_docks_available`) AS `upper__rstrip__ifnull__cast__num_docks_available`,\n             upper(`rstrip__ifnull__cast__num_docks_disabled`) AS `upper__rstrip__ifnull__cast__num_docks_disabled`,\n             upper(`rstrip__ifnull__cast__is_installed`) AS `upper__rstrip__ifnull__cast__is_installed`,\n             upper(`rstrip__ifnull__cast__is_renting`) AS `upper__rstrip__ifnull__cast__is_renting`,\n             upper(`rstrip__ifnull__cast__is_returning`) AS `upper__rstrip__ifnull__cast__is_returning`,\n             upper(`rstrip__ifnull__cast__eightd_has_available_keys`) AS `upper__rstrip__ifnull__cast__eightd_has_available_keys`,\n             upper(`rstrip__ifnull__cast__last_reported`) AS `upper__rstrip__ifnull__cast__last_reported`\n      FROM (\n        SELECT *,\n               rtrim(`ifnull__cast__station_id`) AS `rstrip__ifnull__cast__station_id`,\n               rtrim(`ifnull__cast__name`) AS `rstrip__ifnull__cast__name`,\n               rtrim(`ifnull__cast__short_name`) AS `rstrip__ifnull__cast__short_name`,\n               rtrim(`ifnull__cast__latitude`) AS `rstrip__ifnull__cast__latitude`,\n               rtrim(`ifnull__cast__longitude`) AS `rstrip__ifnull__cast__longitude`,\n               rtrim(`ifnull__cast__region_id`) AS `rstrip__ifnull__cast__region_id`,\n               rtrim(`ifnull__cast__rental_methods`) AS `rstrip__ifnull__cast__rental_methods`,\n               rtrim(`ifnull__cast__capacity`) AS `rstrip__ifnull__cast__capacity`,\n               rtrim(`ifnull__cast__eightd_has_key_dispenser`) AS `rstrip__ifnull__cast__eightd_has_key_dispenser`,\n               rtrim(`ifnull__cast__num_bikes_available`) AS `rstrip__ifnull__cast__num_bikes_available`,\n               
rtrim(`ifnull__cast__num_bikes_disabled`) AS `rstrip__ifnull__cast__num_bikes_disabled`,\n               rtrim(`ifnull__cast__num_docks_available`) AS `rstrip__ifnull__cast__num_docks_available`,\n               rtrim(`ifnull__cast__num_docks_disabled`) AS `rstrip__ifnull__cast__num_docks_disabled`,\n               rtrim(`ifnull__cast__is_installed`) AS `rstrip__ifnull__cast__is_installed`,\n               rtrim(`ifnull__cast__is_renting`) AS `rstrip__ifnull__cast__is_renting`,\n               rtrim(`ifnull__cast__is_returning`) AS `rstrip__ifnull__cast__is_returning`,\n               rtrim(`ifnull__cast__eightd_has_available_keys`) AS `rstrip__ifnull__cast__eightd_has_available_keys`,\n               rtrim(`ifnull__cast__last_reported`) AS `rstrip__ifnull__cast__last_reported`\n        FROM (\n          SELECT *,\n                 IFNULL(`cast__station_id`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__station_id`,\n                 IFNULL(`cast__name`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__name`,\n                 IFNULL(`cast__short_name`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__short_name`,\n                 IFNULL(`cast__latitude`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__latitude`,\n                 IFNULL(`cast__longitude`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__longitude`,\n                 IFNULL(`cast__region_id`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__region_id`,\n                 IFNULL(`cast__rental_methods`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__rental_methods`,\n                 IFNULL(`cast__capacity`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__capacity`,\n                 IFNULL(`cast__eightd_has_key_dispenser`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__eightd_has_key_dispenser`,\n                 IFNULL(`cast__num_bikes_available`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_bikes_available`,\n                 IFNULL(`cast__num_bikes_disabled`, 'DEFAULT_REPLACEMENT_STRING') 
AS `ifnull__cast__num_bikes_disabled`,\n                 IFNULL(`cast__num_docks_available`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_docks_available`,\n                 IFNULL(`cast__num_docks_disabled`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__num_docks_disabled`,\n                 IFNULL(`cast__is_installed`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_installed`,\n                 IFNULL(`cast__is_renting`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_renting`,\n                 IFNULL(`cast__is_returning`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__is_returning`,\n                 IFNULL(`cast__eightd_has_available_keys`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__eightd_has_available_keys`,\n                 IFNULL(`cast__last_reported`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__last_reported`\n          FROM (\n            SELECT *, `station_id` AS `cast__station_id`, `name` AS `cast__name`,\n                   `short_name` AS `cast__short_name`,\n                   CAST(`latitude` AS STRING) AS `cast__latitude`,\n                   CAST(`longitude` AS STRING) AS `cast__longitude`,\n                   CAST(`region_id` AS STRING) AS `cast__region_id`,\n                   `rental_methods` AS `cast__rental_methods`,\n                   CAST(`capacity` AS STRING) AS `cast__capacity`,\n                   CAST(`eightd_has_key_dispenser` AS STRING) AS `cast__eightd_has_key_dispenser`,\n                   CAST(`num_bikes_available` AS STRING) AS `cast__num_bikes_available`,\n                   CAST(`num_bikes_disabled` AS STRING) AS `cast__num_bikes_disabled`,\n                   CAST(`num_docks_available` AS STRING) AS `cast__num_docks_available`,\n                   CAST(`num_docks_disabled` AS STRING) AS `cast__num_docks_disabled`,\n                   CAST(`is_installed` AS STRING) AS `cast__is_installed`,\n                   CAST(`is_renting` AS STRING) AS `cast__is_renting`,\n                   CAST(`is_returning` 
AS STRING) AS `cast__is_returning`,\n                   CAST(`eightd_has_available_keys` AS STRING) AS `cast__eightd_has_available_keys`,\n                   FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP(`last_reported`), 'UTC') AS `cast__last_reported`\n            FROM `bigquery-public-data.new_york_citibike.citibike_stations`\n          ) t5\n        ) t4\n      ) t3\n    ) t2\n  ) t1\n) t0"
}
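The output above is a JSON object with two keys, `source_query` and `target_query`. As a minimal sketch of how it could be consumed, assuming the JSON has already been captured into a string (the `sample_output` value and the `load_dry_run` helper below are hypothetical and not part of the tool), the two generated queries can be extracted and compared like this:

```python
import json

# Hypothetical, heavily shortened stand-in for the dry-run output shown above.
# The real output contains the full generated SQL for each side.
sample_output = '''
{
    "source_query": "SELECT `hash__all`, `station_id`\\nFROM t0",
    "target_query": "SELECT `hash__all`, `station_id`\\nFROM t0"
}
'''


def load_dry_run(text):
    """Parse dry-run JSON text and return (source_query, target_query)."""
    payload = json.loads(text)
    return payload["source_query"], payload["target_query"]


source_sql, target_sql = load_dry_run(sample_output)

# When source and target use the same connection and table, as in the
# example command, the two generated queries are expected to be identical.
queries_match = source_sql == target_sql

print(source_sql)
print("queries identical:", queries_match)
```

Printing the parsed string rather than the raw JSON turns the escaped `\n` sequences into real line breaks, which makes the nested SQL easier to read.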

@conventional-commit-lint-gcf

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use GitHub's automerge feature with squashing, or use the automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@nehanene15
Collaborator

/gcbrun


@nehanene15 left a comment

LGTM. Let's add this to the README as well, then we're good to merge.

@ajwelch4
Member Author

/gcbrun

@ajwelch4 changed the title from "WIP: Add --dry-run option to validate." to "Add --dry-run option to validate." on May 12, 2023
@ajwelch4
Member Author

/gcbrun

@ajwelch4 changed the title from "Add --dry-run option to validate." to "feat: Add --dry-run option to validate." on May 12, 2023
@ajwelch4 merged commit 8989350 into GoogleCloudPlatform:develop on May 12, 2023