custom-query validations broken on Hive backend #1162

Closed
nehanene15 opened this issue Jun 6, 2024 · 2 comments · Fixed by #1164
Labels: priority: p0 (Highest priority. Critical issue. Will be fixed prior to next release.); type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@nehanene15
Collaborator

nehanene15 commented Jun 6, 2024

When running custom-query column or row validations against Hive, the generated query is invalid. The column names are prefixed with "t0." (for example "t0.id"), and Hive rejects them as invalid column references.

Command to reproduce:

data-validation -v validate custom-query column -sc hive -tc hive -sq "select id, name as count_name from default.mascot" -tq "select id, name as count_name from default.mascot"

Query generated:

SELECT count(1) AS `count`
FROM (
  SELECT t1.`t0.id`, t1.`t0.name`
  FROM (
    select id, name from default.mascot
  ) t1
) t0

Error: impala.error.HiveServer2Error: Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 3:12 Invalid column reference 't0.id'

We should add a custom query integration test for Hive as part of this PR.

@nehanene15 nehanene15 added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Jun 6, 2024
@nehanene15
Collaborator Author

The issue stems from here: https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/data_validation/clients.py#L146

If I print iq.columns, it returns ['t0.id', 't0.name'] when it should return ['id', 'name'].
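
To make the link between those column names and the failing query concrete, here is a minimal sketch. The helper render_projection is hypothetical and is not the validator's or Ibis's actual compiler code; it only mimics the backquoting and aliasing visible in the generated query above.

```python
def render_projection(columns, alias="t1"):
    """Backquote each column name and qualify it with the subquery alias.

    Hypothetical helper: it imitates the SELECT list seen in the generated
    query, not the real query compiler.
    """
    return ", ".join(f"{alias}.`{name}`" for name in columns)


reported = ["t0.id", "t0.name"]   # what iq.columns returns today
expected = ["id", "name"]         # what it should return

bad_projection = render_projection(reported)
good_projection = render_projection(expected)

print(bad_projection)   # t1.`t0.id`, t1.`t0.name`
print(good_projection)  # t1.`id`, t1.`name`
```

Because the whole string "t0.id" ends up inside one pair of backquotes, Hive looks for a column literally named "t0.id" in the subquery, which matches the SemanticException reported above.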

@Raniksingh
Contributor

Raniksingh commented Jun 6, 2024

A possible fix is to remove the "t0." prefix, which is generated in this function, by adding the line below:

cur.description = [(x[0].replace('t0.', '', 1), *x[1:]) for x in cur.description]

An initial test worked fine, but this needs thorough testing.
cc - @piyushsarraf
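
For anyone who wants to try the idea outside the client code, below is a small standalone sketch. It assumes a DB-API style cursor description, that is, a sequence of tuples whose first element is the column name. The function name strip_alias_prefix and the sample rows are made up for illustration. Unlike the one-liner above, which uses replace() and would also remove a "t0." that appears in the middle of a name, this variant only strips the prefix when the name starts with it.

```python
def strip_alias_prefix(description, alias="t0"):
    """Return a copy of a DB-API style description with a leading
    '<alias>.' removed from each column name.

    Hypothetical helper for illustration; not part of the validator.
    """
    prefix = alias + "."
    cleaned = []
    for col in description:
        name = col[0]
        if name.startswith(prefix):
            name = name[len(prefix):]
        # Keep every other field of the description tuple unchanged.
        cleaned.append((name, *col[1:]))
    return cleaned


# Made-up sample shaped like cursor.description (7-item tuples).
description = [
    ("t0.id", "INT", None, None, None, None, True),
    ("t0.name", "STRING", None, None, None, None, True),
]

cleaned = strip_alias_prefix(description)
print([col[0] for col in cleaned])  # ['id', 'name']
```

Whether the prefix is always exactly "t0." on the Hive backend is not confirmed in this thread, so the alias is left as a parameter.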
