feat!: Changed result schema 'status' column to 'validation_status' #420
/gcbrun
This change isn't backwards compatible: it will break users' existing commands, because DVT will now try to write to BQ with a different schema. We should catch the BQ load error and raise an exception with a descriptive message explaining that the schema has been updated and that the user needs to update it on BigQuery.
To this point... can we give the customer the actual query to execute in order to perform this upgrade? Since BigQuery doesn't currently support a
As per Ryan's comment, let's document the actual query users will have to run to rename this column.
We can add this script under data_validation/samples/bq_utils/update_schema.sql and we can reference this script in the error message.
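A minimal sketch of the error handling suggested above, not DVT's actual code. `ResultSchemaError`, `write_results`, and `load_fn` are hypothetical names, and matching on the column name in the error text is an assumption about what the BigQuery failure looks like. The script path is the one that appears later in this thread and should be confirmed against the repository.

```python
class ResultSchemaError(Exception):
    """Hypothetical exception: the results table still uses the old schema."""


# Path as it appears in this thread; confirm it against the repository.
MIGRATION_SCRIPT = "samples/bq_utils/update_schema.sh"


def write_results(load_fn, rows):
    """Call load_fn(rows) and translate a schema failure into a clear error.

    load_fn stands in for whatever call writes rows to BigQuery.
    """
    try:
        return load_fn(rows)
    except Exception as exc:
        # Assumption: a schema mismatch mentions the new column name.
        if "validation_status" not in str(exc):
            raise  # unrelated failure, leave it untouched
        raise ResultSchemaError(
            "The results schema changed: column 'status' is now "
            "'validation_status'. Update the BigQuery results table, "
            f"for example with {MIGRATION_SCRIPT}."
        ) from exc
```

Chaining with `from exc` keeps the original BigQuery error in the traceback, so the descriptive message adds context without hiding the underlying cause.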
@nehanene15 Added the script data_validation/samples/bq_utils/update_schema.sh to update the BigQuery schema from 'status' to 'validation_status'.
/gcbrun
/gcbrun
Small changes in bash script + wrong path provided in error message.
samples/bq_utils/update_schema.sh
#PROJECT=pso-kokoro-resources
DEPRECATED_TABLE=pso_data_validator.results_deprecated #--Deprecated results table with column name 'status'
CURRENT_TABLE=pso_data_validator.results #--Current results table
Forgive my ignorance...
Since the -bqrh flag allows the user to specify the output table in PROJECT_ID.DATASET.TABLE format... wouldn't it be better to take this as an input?
./update_schema.sh <current-bq-result-table>
True, although we aren't running the script for them, just pointing them to the script in the error message.
Since we will already require the user to specify an env var for the deprecated table name, I think specifying 2 env vars would take the same amount of user effort as 1 env var plus a script input.
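To make the two options in this exchange concrete, here is a hedged sketch of how the table names could be resolved and turned into a rename statement. The contents of update_schema.sh are not shown in this thread, so the SQL below is an assumption: it copies the deprecated table with `SELECT * EXCEPT`, which places `validation_status` as the last column and does not handle partitioning or clustering. `build_rename_statement` and `resolve_tables` are hypothetical helpers; only the `DEPRECATED_TABLE` and `CURRENT_TABLE` names come from the script excerpt.

```python
def build_rename_statement(deprecated_table, current_table):
    """Return SQL that copies the old table with 'status' renamed."""
    return (
        f"CREATE TABLE `{current_table}` AS\n"
        f"SELECT * EXCEPT (status), status AS validation_status\n"
        f"FROM `{deprecated_table}`"
    )


def resolve_tables(argv, environ):
    """Pick the table names from an argument list and an environment mapping.

    The deprecated table always comes from DEPRECATED_TABLE. The current
    table comes from the first argument if one is given, otherwise from
    CURRENT_TABLE, which covers both options discussed above.
    """
    deprecated = environ["DEPRECATED_TABLE"]
    current = argv[1] if len(argv) > 1 else environ["CURRENT_TABLE"]
    return deprecated, current
```

Calling `resolve_tables(sys.argv, os.environ)` and passing the result to `build_rename_statement` would print-ready the statement either way, which illustrates the point that both designs ask for the same two pieces of information.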
Changed result schema 'status' column to 'validation_status' to resolve duplicate column error.
Closes #417