Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

column validation: --sum of time zoned timestamp on BigQuery has side effect on other engines #938

Open
nj1973 opened this issue Aug 9, 2023 · 1 comment
Labels
priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nj1973
Copy link
Contributor

nj1973 commented Aug 9, 2023

This problem affects at least SQL Server and Teradata.

You can use the standard dvt_core_types integration test table to reproduce.

When comparing --sum=col_tstz on SQL Server and BigQuery then we get an epoch seconds mismatch:

  • SQL Server: 259206
  • BigQuery: 280806

However when comparing SQL Server to itself then the value is 280806, which matches BigQuery.

The problem stems from these lines in config_manager.py:

        elif column_type == "timestamp" or column_type == "!timestamp":
            if (
                self.source_client.name == "bigquery"
                or self.target_client.name == "bigquery"
            ):
                calc_func = "cast"
                cast_type = "timestamp"

They are saying that if the source or target is BigQuery then cast both source and target to timestamp before doing the epoch seconds expression. The cast to timestamp is cropping the time zone in SQL Server and Teradata.

First thought was to change the SQL Server cast to first convert to UTC but I also think it is wrong that a BigQuery requirement changes what we do on the other engine. If BigQuery needs a pre-cast to TIMESTAMP then it probably shouldn't be catered for in both source and target queries. Or maybe both thoughts are true.

When resolved we should add col_tstz to sum validation in the test_column_validation_core_types_to_bigquery integration test for SQL Server and Teradata.

@nj1973 nj1973 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Aug 9, 2023
@nehanene15 nehanene15 added the priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) label Jan 22, 2024
@nj1973
Copy link
Contributor Author

nj1973 commented Jan 29, 2024

I've retested this on the latest branch and it is still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

2 participants