Enable an 'allowlist' for data type mappings for schema validations #606

Closed
nehanene15 opened this issue Oct 12, 2022 · 2 comments · Fixed by #643

Comments

@nehanene15
Collaborator

There are use cases when users expect a certain data type to be mapped to a different target data type in migrations. For example, when using the Data Transfer Service, users will expect a Redshift non-nullable string to be converted to a BigQuery nullable string data type.

Another example is when users change the precision/scale, for instance from Redshift DECIMAL(12,2) to BigQuery BIGNUMERIC/DECIMAL(38,9).

DVT schema validations should let users provide an allowlist of data type mappings that should be accepted as a successful validation.

Example:

data-validation validate schema -sc redshift -tc bq -tbls schema.table=schema.table --allow-list decimal(12,2):decimal(38,9),string[non-nullable]:string

The allowlist should accept a comma-separated list of type mappings or a file with the mappings.

[--allow-list MAPPING_FILE | MAPPING] Comma-separated list of data type mappings to be considered successful matches between source and target.
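
One wrinkle with the comma-separated form is that types such as decimal(12,2) contain commas themselves, so a plain split on "," would break them apart. Below is a minimal sketch of how the proposed `--allow-list` value might be parsed. The function name `parse_allow_list` and the choice to lowercase the type names are assumptions for illustration, not part of DVT.

```python
import re
from typing import Dict


def parse_allow_list(raw: str) -> Dict[str, str]:
    """Parse 'source:target,source:target' into a dict (hypothetical helper).

    Commas inside parentheses, as in decimal(12,2), are not treated
    as separators between mappings.
    """
    # Split on commas that are NOT followed by a closing parenthesis
    # before any other parenthesis, i.e. commas outside of (...).
    pairs = re.split(r",(?![^()]*\))", raw)
    mapping: Dict[str, str] = {}
    for pair in pairs:
        source, sep, target = pair.partition(":")
        if not sep or not source.strip() or not target.strip():
            raise ValueError(f"Invalid type mapping: {pair!r}")
        # Assumption: type names are compared case-insensitively.
        mapping[source.strip().lower()] = target.strip().lower()
    return mapping


allowed = parse_allow_list("decimal(12,2):decimal(38,9),string[non-nullable]:string")
```

A schema validation could then treat a source/target type mismatch as a success whenever `allowed.get(source_type) == target_type`.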
@damianmomotgoogle

Suggestion: instead of disabling checks completely, IMHO it would be better to distinguish errors (schema mismatches that cause loss of data or functionality) from warnings (the schema does not match exactly, but the data fits and there is no loss of precision).

Examples:

optional -> required: error
required -> optional: warning
int64 -> int32: error
int32 -> int64: warning
DECIMAL(38,9) -> DECIMAL(10,2): error
DECIMAL(10,2) -> DECIMAL(38,9): warning

This way, errors remain errors, while cases in which the data fits and still works properly are not silenced completely but are displayed as warnings.

@nehanene15 nehanene15 added type: feature request 'Nice-to-have' improvement, new feature or different behavior or design. priority: p1 High priority. Fix may be included in the next release. good first issue Good issue for new DVT contributors labels Oct 14, 2022
@kanhaPrayas kanhaPrayas self-assigned this Nov 24, 2022
@nehanene15
Collaborator Author

@kanhaPrayas The above suggestion is the ideal route, as it puts less burden on the user. We will probably need to parse the string value of each data type to determine whether a given source-to-target mapping is a warning or an error.
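
As a rough illustration of that parsing, here is a minimal sketch that classifies a source/target pair using the examples listed earlier in the thread. The function names, the "ok"/"warning"/"error" return values, and the decision to treat any unrecognized mismatch as an error are all assumptions, not existing DVT behavior.

```python
import re

# Patterns for the two type families used in the examples above.
_INT = re.compile(r"^int(\d+)$")
_DECIMAL = re.compile(r"^(?:decimal|numeric)\((\d+),\s*(\d+)\)$")


def classify_type_change(source: str, target: str) -> str:
    """Return 'ok', 'warning', or 'error' for a source -> target type pair."""
    src, tgt = source.strip().lower(), target.strip().lower()
    if src == tgt:
        return "ok"

    s_int, t_int = _INT.match(src), _INT.match(tgt)
    if s_int and t_int:
        # Widening (int32 -> int64) fits; narrowing may lose data.
        widened = int(t_int.group(1)) >= int(s_int.group(1))
        return "warning" if widened else "error"

    s_dec, t_dec = _DECIMAL.match(src), _DECIMAL.match(tgt)
    if s_dec and t_dec:
        s_prec, s_scale = map(int, s_dec.groups())
        t_prec, t_scale = map(int, t_dec.groups())
        # Data fits if the target keeps at least as many fractional
        # digits and at least as many integer digits as the source.
        fits = t_scale >= s_scale and (t_prec - t_scale) >= (s_prec - s_scale)
        return "warning" if fits else "error"

    # Assumption: any other mismatch is reported as an error.
    return "error"


def classify_nullability(source_nullable: bool, target_nullable: bool) -> str:
    """optional -> required is an error; required -> optional is a warning."""
    if source_nullable == target_nullable:
        return "ok"
    return "warning" if target_nullable else "error"
```

For instance, `classify_type_change("DECIMAL(10,2)", "DECIMAL(38,9)")` yields a warning under these rules, while the reverse direction yields an error. Real engines have many more type names and aliases, so an actual implementation would need a wider mapping than the two patterns shown here.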
