Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate column: sum, min, max false success for numeric values with precision > 16 #754

Closed
nj1973 opened this issue Mar 2, 2023 · 2 comments
Labels
priority: p1 High priority. Fix may be included in the next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@nj1973
Copy link
Contributor

nj1973 commented Mar 2, 2023

Test BigQuery tables:

create table `dvt_test.tab_long_dec1` (
  id int64 not null
, col_16 numeric(16)
, col_17 numeric(17)
, col_18 numeric(18));
insert into dvt_test.tab_long_dec1 values
(1,NUMERIC '1234567890123456',NUMERIC '12345678901234567',NUMERIC '123456789012345678');
create table dvt_test.tab_long_dec2 (
  id int64 not null
, col_16 numeric(16)
, col_17 numeric(17)
, col_18 numeric(18));
insert into dvt_test.tab_long_dec2 values
(1,NUMERIC '1234567890123456'+1,NUMERIC '12345678901234567'+1,NUMERIC '123456789012345678'+1);

Note that tab_long_dec2 data differs from tab_long_dec1. The value in col_16, col_27 and col_18 has been incremented by 1.
Therefore sum and max comparison of these columns should fail.

col_16

data-validation validate column -sc bq -tc bq \
-tbls=dvt_test.tab_long_dec1=dvt_test.tab_long_dec2 \
-sum=col_16 -max=col_16

Correct behaviour, all columns have validation_status="fail"

col_17

data-validation validate column -sc bq -tc bq \
-tbls=dvt_test.tab_long_dec1=dvt_test.tab_long_dec2 \
-sum=col_17 -max=col_17

Incorrect behaviour, sum__col_17 and max__col_17 have validation_status="success". This is wrong.

col_18

data-validation validate column -sc bq -tc bq \
-tbls=dvt_test.tab_long_dec1=dvt_test.tab_long_dec2 \
-sum=col_18 -max=col_18

Incorrect behaviour, sum__col_18 and max__col_18 have validation_status="success". This is wrong.

I suspect this relates to code in data_validation/combiner.py that casts all values to float64. 64 bit float is a lossy data type for large values.
I saw the same issue when testing PostgreSQL too, I believe the problem is not engine specific.

@nj1973
Copy link
Contributor Author

nj1973 commented Mar 2, 2023

We should also add an integration test for this.

@nehanene15 nehanene15 added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 High priority. Fix may be included in the next release. labels Mar 28, 2023
@sharangagarwal sharangagarwal self-assigned this May 31, 2023
@sharangagarwal sharangagarwal removed their assignment Jun 8, 2023
@nj1973
Copy link
Contributor Author

nj1973 commented Jul 28, 2023

Closed by fix for issue 900, duplicate issues.

@nj1973 nj1973 closed this as completed Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p1 High priority. Fix may be included in the next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

3 participants