Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate-table-partitions between Teradata and BQ does not work with uppercase primary key names #1136

Closed
sundar-mudupalli-work opened this issue May 16, 2024 · 0 comments · Fixed by #1142
Assignees
Labels
priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@sundar-mudupalli-work
Copy link
Contributor

sundar-mudupalli-work commented May 16, 2024

Hi,

It appears that generate-table-partitions does not work with primary key names in Teradata that are in upper case. Teradata's column names are not case sensitive.

The DDL that was used was

CREATE SET TABLE udfs.issue1136, FALLBACK (
    N_NATIONKEY INTEGER NOT NULL,
    N_NAME CHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
    N_REGIONKEY INTEGER NOT NULL,
    N_COMMENT CHAR(152) CHARACTER SET LATIN CASESPECIFIC NOT NULL )
UNIQUE PRIMARY INDEX ( N_NATIONKEY);

When we try the following command, it works fine data-validation generate-table-partitions -sc teradata -tc teradata -tbls=udfs.issue1136 -pk=N_NATIONKEY -hash '*' -pn 3 -cdir .

The real purpose is to compare two tables from different databases - so we try to compare teradata with BigQuery table defined as follows:

CREATE TABLE pso_data_validator.issue1136 (
  n_nationkey INT64,
  n_name STRING,
  n_regionkey INT64,
  n_comment STRING
);

If we use the command data-validation generate-table-partitions -sc teradata -tc bq -tbls=udfs.issue1136=pso_data_validator.issue1136 -pk=N_NATIONKEY -hash '*' -pn 3 -cdir . you get the error:

    target_count = target_partition_row_builder.get_count()
  File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/data_validation/query_builder/partition_row_builder.py", line 65, in get_count
    return self.query[self.primary_keys].count().execute()
...
ibis.common.exceptions.IbisTypeError: Column 'N_NATIONKEY' is not found in table. Existing columns: 'n_nationkey', 'n_name', 'n_regionkey', 'n_comment'.

However if you invert the case and try data-validation generate-table-partitions -sc teradata -tc bq -tbls=udfs.issue1136=pso_data_validator.issue1136 -pk=n_nationkey -hash '*' -pn 3 -cdir . you get the error:

    source_count = source_partition_row_builder.get_count()
  File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/data_validation/query_builder/partition_row_builder.py", line 65, in get_count
    return self.query[self.primary_keys].count().execute()
....
ibis.common.exceptions.IbisTypeError: Column 'n_nationkey' is not found in table. Existing columns: 'N_NATIONKEY', 'N_NAME', 'N_REGIONKEY', 'N_COMMENT'.

In the first case, ibis is not able to execute the count operation since the BQ column names are all in lower case. In the second case, ibis is not able to execute the count operation since the Teradata column names are all in upper case. So either case you use - it does not work. We try to get the count to make sure the table sizes are within 10% of each other - if not the partition split is not going to be very even.

I checked with a table defined with lower case column names in Oracle and it return ibis column names in lower case. A bit more investigation is needed.

Suggestions on how to fix ?

@sundar-mudupalli-work sundar-mudupalli-work changed the title generate-table-partitions does not work with uppercase primary key names generate-table-partitions between Teradata and BQ does not work with uppercase primary key names May 16, 2024
@helensilva14 helensilva14 added priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 16, 2024
@nehanene15 nehanene15 self-assigned this May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
Status: Done 🚀
3 participants