Support aggregate validation on string columns with length(string_col) #424

Closed
nehanene15 opened this issue Apr 5, 2022 · 1 comment · Fixed by #430
Labels
priority: p2 (Medium priority. Fix may not be included in next release, e.g. minor documentation, cleanup)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

@nehanene15
Collaborator

DVT should support column-level aggregates (sum, min, max, avg) on the length of string columns. This is already supported via calculated fields in a YAML config, but it should also be supported through the CLI.

A '*' aggregation should run over all numeric columns:
data-validation validate column -sc my_bq_conn -tc my_bq_conn -tbls pso-kokoro-resources.hivetest.mascot -sum '*'

But specifying a string column by name should implicitly run the aggregation over length(string_col):
data-validation validate column -sc my_bq_conn -tc my_bq_conn -tbls pso-kokoro-resources.hivetest.mascot -sum name
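
One way to read the two cases above, as a minimal Python sketch. This is not DVT's implementation: `resolve_columns`, `aggregate_expression`, the schema dict, and the type names are all hypothetical and only illustrate the proposed rule.

```python
# Hypothetical illustration of the proposed rule; not DVT's actual code.
NUMERIC_TYPES = {"int64", "float64", "numeric"}  # assumed type names


def resolve_columns(requested, schema):
    """'*' expands to numeric columns only; an explicit name is kept as given."""
    if requested == "*":
        return [col for col, col_type in schema.items() if col_type in NUMERIC_TYPES]
    return [requested]


def aggregate_expression(agg, column, schema):
    """Return the SQL-like expression a column aggregate would compute."""
    col_type = schema[column]
    if col_type in NUMERIC_TYPES:
        return f"{agg}({column})"
    if col_type == "string":
        # Proposed behavior: a named string column is aggregated over its length.
        return f"{agg}(length({column}))"
    raise ValueError(f"cannot aggregate column {column!r} of type {col_type!r}")


# Made-up schema for illustration only.
schema = {"id": "int64", "name": "string", "score": "float64"}

wildcard = [aggregate_expression("sum", c, schema) for c in resolve_columns("*", schema)]
named = [aggregate_expression("sum", c, schema) for c in resolve_columns("name", schema)]
```

Under these assumptions, `-sum '*'` would skip `name` entirely, while `-sum name` would compare `sum(length(name))` between source and target.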

@nehanene15 nehanene15 added type: feature request 'Nice-to-have' improvement, new feature or different behavior or design. priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Apr 5, 2022
@nehanene15 nehanene15 assigned nehanene15 and unassigned nehanene15 Apr 5, 2022
@ryanmcdowell
Collaborator

Perhaps adding an additional flag could also help:

--wildcard-include-string-len

I'm thinking it may be simpler to do a single validation run for cases where you want both numeric and string-length aggregates, instead of separate runs.
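
A rough sketch of how such a flag might change wildcard expansion. The function name `expand_wildcard`, the `include_string_len` parameter, and the type names are assumptions made for illustration; nothing here reflects how DVT actually parses or applies the flag.

```python
# Hypothetical sketch of the suggested flag; names are illustrative, not DVT's API.
NUMERIC_TYPES = {"int64", "float64", "numeric"}  # assumed type names


def expand_wildcard(schema, include_string_len=False):
    """Expand '*' into (column, wrap_in_length) pairs.

    Numeric columns are always included as-is. String columns are included,
    marked for length() wrapping, only when include_string_len is set.
    """
    targets = []
    for column, col_type in schema.items():
        if col_type in NUMERIC_TYPES:
            targets.append((column, False))
        elif col_type == "string" and include_string_len:
            targets.append((column, True))
    return targets


# Made-up schema for illustration only.
schema = {"id": "int64", "name": "string", "score": "float64"}

default_targets = expand_wildcard(schema)
with_strings = expand_wildcard(schema, include_string_len=True)
```

With the flag off, the wildcard behaves as in the original proposal (numeric columns only); with it on, one run would cover both numeric columns and string lengths.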

@nehanene15 nehanene15 changed the title Support aggregate validation on string columns with length(string_col) Support aggregate validation on string/array columns with length(string_col) Apr 5, 2022
@nehanene15 nehanene15 changed the title Support aggregate validation on string/array columns with length(string_col) Support aggregate validation on string columns with length(string_col) Apr 6, 2022
@nehanene15 nehanene15 added priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) and removed priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Apr 7, 2022