Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize row validation to avoid SELECT * in each level #466

Closed
nehanene15 opened this issue May 4, 2022 · 0 comments · Fixed by #599
Closed

Optimize row validation to avoid SELECT * in each level #466

nehanene15 opened this issue May 4, 2022 · 0 comments · Fixed by #599
Assignees
Labels
priority: p1 High priority. Fix may be included in the next release. type: cleanup An internal cleanup or hygiene concern.

Comments

@nehanene15
Copy link
Collaborator

Currently a row validation generates a query like so:

SELECT *, TO_HEX(SHA256(`concat__all`)) AS `hash__all`
FROM (
  SELECT *,
         ARRAY_TO_STRING([`upper__rstrip__ifnull__cast__id`, `upper__rstrip__ifnull__cast__ts`, `upper__rstrip__ifnull__cast__order_date`], '') AS `concat__all`
  FROM (
    SELECT *,
           upper(`rstrip__ifnull__cast__id`) AS `upper__rstrip__ifnull__cast__id`,
           upper(`rstrip__ifnull__cast__ts`) AS `upper__rstrip__ifnull__cast__ts`,
           upper(`rstrip__ifnull__cast__order_date`) AS `upper__rstrip__ifnull__cast__order_date`
    FROM (
      SELECT *, rtrim(`ifnull__cast__id`) AS `rstrip__ifnull__cast__id`,
             rtrim(`ifnull__cast__ts`) AS `rstrip__ifnull__cast__ts`,
             rtrim(`ifnull__cast__order_date`) AS `rstrip__ifnull__cast__order_date`
      FROM (
        SELECT *,
               IFNULL(`cast__id`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__id`,
               IFNULL(`cast__ts`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__ts`,
               IFNULL(`cast__order_date`, 'DEFAULT_REPLACEMENT_STRING') AS `ifnull__cast__order_date`
        FROM (
          SELECT *, CAST(`id` AS STRING) AS `cast__id`,
                 CAST(`ts` AS STRING) AS `cast__ts`,
                 CAST(`order_date` AS STRING) AS `cast__order_date`
          FROM `pso-kokoro-resources.hivetest.test_timestamp`
        ) t4
      ) t3
    ) t2
  ) t1
) t0

We can optimize the results returned by the query if we get rid of the SELECT * at each level.

@nehanene15 nehanene15 added type: cleanup An internal cleanup or hygiene concern. priority: p1 High priority. Fix may be included in the next release. labels Jun 1, 2022
@nehanene15 nehanene15 reopened this Sep 20, 2022
@nehanene15 nehanene15 self-assigned this Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p1 High priority. Fix may be included in the next release. type: cleanup An internal cleanup or hygiene concern.
Projects
None yet
2 participants