Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Changed result schema 'status' column to 'validation_status' #420

Merged
merged 20 commits into from
Apr 11, 2022

Conversation

Raniksingh
Copy link
Contributor

Changed result schema 'status' column to 'validation_status' to resolve duplicate column error.
Closes #417

@Raniksingh
Copy link
Contributor Author

/gcbrun

@Raniksingh Raniksingh changed the title Changed result schema 'status' column to 'validation_status' feat : Changed result schema 'status' column to 'validation_status' Apr 4, 2022
@Raniksingh Raniksingh changed the title feat : Changed result schema 'status' column to 'validation_status' feat: Changed result schema 'status' column to 'validation_status' Apr 4, 2022
@nehanene15
Copy link
Collaborator

This change isn't backwards compatible, since it will break users' existing commands since DVT will now try to write to BQ with a different schema. We should catch the BQ load error and raise an exception with a descriptive error message describing that we have updated the schema and the user will need to update that on BigQuery.

@nehanene15 nehanene15 changed the title feat: Changed result schema 'status' column to 'validation_status' BREAKING CHANGE: Changed result schema 'status' column to 'validation_status' Apr 5, 2022
@nehanene15 nehanene15 changed the title BREAKING CHANGE: Changed result schema 'status' column to 'validation_status' feat!: Changed result schema 'status' column to 'validation_status' Apr 5, 2022
@ryanmcdowell
Copy link
Collaborator

To this point... can we give the customer the actual query to execute in order to perform this upgrade. Since BigQuery doesn't currently support a RENAME clause for columns, we should make this as simple as possible for end users. Ideally just executing a provided script or template SQL.

Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per Ryan's comment, let's document the actual query users will have to run to rename this column.

We can add this script under data_validation/samples/bq_utils/update_schema.sql and we can reference this script in the error message.

@Raniksingh
Copy link
Contributor Author

@nehanene15 Added the script data_validation/samples/bq_utils/update_schema.sh to update bigquery schema from 'status' to 'validation_status'

@Raniksingh
Copy link
Contributor Author

/gcbrun

@Raniksingh
Copy link
Contributor Author

/gcbrun

Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small changes in bash script + wrong path provided in error message.

data_validation/result_handlers/bigquery.py Outdated Show resolved Hide resolved
samples/bq_utils/update_schema.sh Outdated Show resolved Hide resolved
samples/bq_utils/update_schema.sh Outdated Show resolved Hide resolved
samples/bq_utils/update_schema.sh Outdated Show resolved Hide resolved
samples/bq_utils/update_schema.sh Outdated Show resolved Hide resolved
samples/bq_utils/update_schema.sh Outdated Show resolved Hide resolved

#PROJECT=pso-kokoro-resources
DEPRECATED_TABLE=pso_data_validator.results_deprecated #--Deprecated results table with column name 'status'
CURRENT_TABLE=pso_data_validator.results #--Current results table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive my ignorance...

Since the -bqrh flag allows the user to specify the output table in PROJECT_ID.DATASET.TABLE format... wouldn't it be better to take this an input?

./update_schema.sh <current-bq-result-table>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, although we aren't running the script for them just pointing them to the script in the error message.

Since we already will require the user to specify a env var for the deprecated table name, I think it would require the same amount of user effort to specify 2 env vars versus 1 env var and a script input.

@Raniksingh Raniksingh merged commit dfcd0d5 into GoogleCloudPlatform:develop Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update result schema 'status' column to 'validation_status' to avoid duplicate column names
4 participants