
validate column: Sum on Oracle and Teradata TIMESTAMP Throws Exception #762

Closed
nj1973 opened this issue Mar 10, 2023 · 4 comments · Fixed by #941
Assignees: sharangagarwal
Labels: priority: p0 (Highest priority. Critical issue. Will be fixed prior to next release.), type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

nj1973 (Contributor) commented Mar 10, 2023

It's fair to say that a SUM on a date-based column doesn't make a lot of sense, but we support it on other engines and it is often triggered when columns are provided using *.

Test table:

CREATE TABLE dvt_test.tab_ts
(   id              NUMBER(8) NOT NULL PRIMARY KEY
,   col_datetime    TIMESTAMP(3)
,   col_tstz        TIMESTAMP(3) WITH TIME ZONE
);

INSERT INTO dvt_test.tab_ts VALUES
(1,TIMESTAMP'1970-01-01 00:00:01'
,to_timestamp_tz('1970-01-01 00:00:01 -01:00','YYYY-MM-DD HH24:MI:SS TZH:TZM'));
COMMIT;

Test command:

data-validation validate column \
-sc ora_conn -tc ora_conn \
-tbls=dvt_test.tab_ts --sum='*' --min='*' --max='*'

Exception:

sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00907: missing right parenthesis
[SQL: SELECT count(:count_1) AS count, sum(t0.id) AS sum__id, sum(t0.epoch_seconds__col_datetime) AS sum__epoch_seconds__col_datetime, sum(t0.epoch_seconds__col_tstz) AS sum__epoch_seconds__col_tstz, min(t0.id) AS min__id, min(t0.col_datetime) AS min__col_datetime, min(t0.col_tstz) AS min__col_tstz, max(t0.id) AS max__id, max(t0.col_datetime) AS max__col_datetime, max(t0.col_tstz) AS max__col_tstz
FROM (SELECT t1.id AS id, t1.col_datetime AS col_datetime, t1.col_tstz AS col_tstz, CAST(EXTRACT(epoch FROM t1.col_datetime) AS NUMBER(19)) AS epoch_seconds__col_datetime, CAST(EXTRACT(epoch FROM t1.col_tstz) AS NUMBER(19)) AS epoch_seconds__col_tstz
FROM dvt_test.tab_ts t1) t0]
[parameters: {'count_1': '*'}]

The problem SQL expressions are:

CAST(EXTRACT(epoch FROM t1.col_datetime) AS NUMBER(19)) AS epoch_seconds__col_datetime,
CAST(EXTRACT(epoch FROM t1.col_tstz) AS NUMBER(19)) AS epoch_seconds__col_tstz

EXTRACT(epoch FROM …) is not valid Oracle syntax; Oracle's EXTRACT accepts only datetime fields such as YEAR, MONTH, DAY, HOUR, MINUTE, SECOND and the time zone fields.
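For reference, Oracle can compute epoch seconds with DATE arithmetic instead. A minimal hand-written sketch (illustrative only, not necessarily the expression the fix should emit; SYS_EXTRACT_UTC normalises the time zone column to UTC, and casting to DATE drops fractional seconds):

-- DATE subtraction returns days, so multiply by 86400 to get seconds.
SELECT CAST((CAST(t1.col_datetime AS DATE) - DATE '1970-01-01') * 86400
            AS NUMBER(19)) AS epoch_seconds__col_datetime
,      CAST((CAST(SYS_EXTRACT_UTC(t1.col_tstz) AS DATE) - DATE '1970-01-01') * 86400
            AS NUMBER(19)) AS epoch_seconds__col_tstz
FROM   dvt_test.tab_ts t1;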

Once this is fixed we need to enable the integration tests for this problem: search for "issue-762" in tests/system/data_sources/test_oracle.py and tests/system/data_sources/test_teradata.py. Enabling these tests is blocked by #759.

@nj1973 nj1973 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Mar 10, 2023
@nj1973 nj1973 changed the title validate column: Sum on Oracle TIMESTAMP Throws Exception validate column: Sum on Oracle and Teradata TIMESTAMP Throws Exception Mar 10, 2023
nj1973 (Author) commented Mar 10, 2023

Same problem on Teradata.

Test table:

CREATE TABLE dvt_test.tab_ts
(   id              NUMBER(8) NOT NULL PRIMARY KEY
,   col_datetime    TIMESTAMP(3)
,   col_tstz        TIMESTAMP(3) WITH TIME ZONE
);

INSERT INTO dvt_test.tab_ts VALUES
(1,TIMESTAMP'1970-01-01 00:00:01'
,CAST('1970-01-01 00:00:01.000-01:00' AS TIMESTAMP(3) WITH TIME ZONE));

Test command:

data-validation validate column \
-sc ora_conn -tc ora_conn \
-tbls=dvt_test.tab_ts --sum='*'

Exception:

03/09/2023 05:21:48 PM-INFO: SELECT count(*) AS "count",
       sum("epoch_seconds__col_datetime") AS "sum__epoch_seconds__col_datetime"
FROM (
  SELECT id, col_int8, col_int16, col_int32, col_int64, col_dec_20, col_dec_38, col_dec_10_2, col_float32, col_float64, col_varchar_30, col_char_2, col_string, col_date, col_datetime, col_tstz,
         unix_timestamp("col_datetime") AS "epoch_seconds__col_datetime"
  FROM udf.dvt_core_types
) t0
Traceback (most recent call last):
  File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/pandas/io/sql.py", line 2056, in execute
    cur.execute(*args, **kwargs)
  File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/teradatasql/__init__.py", line 705, in execute
    self.executemany (sOperation, None, ignoreErrors)
  File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/teradatasql/__init__.py", line 952, in executemany
    raise OperationalError (sErr)
teradatasql.OperationalError: [Version 17.20.0.15] [Session 5886] [Teradata Database] [Error 3706] Syntax error: Data Type "col_datetime" does not match a Defined Type name.
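The generated unix_timestamp() call is a Hive-style function that Teradata does not provide, hence the syntax error. For reference, a hand-written Teradata sketch for epoch seconds (illustrative only; assumes DATE subtraction returns whole days, and the final CAST drops fractional seconds):

-- Days since the epoch converted to seconds, plus the time-of-day components.
SELECT (CAST(col_datetime AS DATE) - DATE '1970-01-01') * 86400
       + EXTRACT(HOUR FROM col_datetime) * 3600
       + EXTRACT(MINUTE FROM col_datetime) * 60
       + CAST(EXTRACT(SECOND FROM col_datetime) AS INTEGER) AS epoch_seconds__col_datetime
FROM   dvt_test.tab_ts;

The time zone column would additionally need normalising to UTC (e.g. with AT TIME ZONE) before the same arithmetic applies.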

@nehanene15 nehanene15 added the priority: p1 High priority. Fix may be included in the next release. label Mar 28, 2023
@sharangagarwal sharangagarwal self-assigned this May 31, 2023
@nehanene15 nehanene15 added priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. and removed priority: p1 High priority. Fix may be included in the next release. labels Aug 1, 2023
nehanene15 (Collaborator) commented
Part of this PR could also include removing timestamp columns from the default aggregations IMO.
For example, we can still support --sum col_ts explicitly, but --sum '*' should not gather timestamp columns.
We could also support a --wildcard-include-timestamp flag, similar to the existing --wildcard-include-string-len flag.
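By analogy with the existing flag, usage might look like this sketch (--wildcard-include-timestamp is the proposal above, not an implemented option):

data-validation validate column \
-sc ora_conn -tc ora_conn \
-tbls=dvt_test.tab_ts --sum='*' --wildcard-include-timestamp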

nj1973 (Author) commented Aug 8, 2023

Yes, that makes sense. Summing a timestamp is probably not expected, so it should be fine for it to be available on request instead of being default behaviour.
