Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate row: ORA-00972: identifier is too long error on Oracle 11g #724

Closed
nj1973 opened this issue Feb 15, 2023 · 1 comment · Fixed by #749
Closed

validate row: ORA-00972: identifier is too long error on Oracle 11g #724

nj1973 opened this issue Feb 15, 2023 · 1 comment · Fixed by #749
Assignees

Comments

@nj1973
Copy link
Contributor

nj1973 commented Feb 15, 2023

Prior to version 12.2 Oracle has an identifier length limit of 30 characters.

When running a row validation columns in the SQL projection are aliased with names reflecting the functions that have been applied. Example:

data-validation validate row -sc ora_conn -tc bq_conn -tbls=dvt_test.tab_dec_id=dvt_test.tab_dec_id \
--primary-keys id --concat 'col1'
...
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00972: identifier is too long
[SQL: SELECT t0.concat__all, t0.id
FROM (SELECT t1.id AS id, t1.col1 AS col1, t1.cast__col1 AS cast__col1, t1.ifnull__cast__col1 AS ifnull__cast__col1, t1.rstrip__ifnull__cast__col1 AS rstrip__ifnull__cast__col1, t1.upper__rstrip__ifnull__cast__col1 AS upper__rstrip__ifnull__cast__col1, upper__rstrip__ifnull__cast__col1 AS concat__all
FROM (SELECT t2.id AS id, t2.col1 AS col1, t2.cast__col1 AS cast__col1, t2.ifnull__cast__col1 AS ifnull__cast__col1, t2.rstrip__ifnull__cast__col1 AS rstrip__ifnull__cast__col1, upper(t2.rstrip__ifnull__cast__col1) AS upper__rstrip__ifnull__cast__col1
FROM (SELECT t3.id AS id, t3.col1 AS col1, t3.cast__col1 AS cast__col1, t3.ifnull__cast__col1 AS ifnull__cast__col1, rtrim(t3.ifnull__cast__col1) AS rstrip__ifnull__cast__col1
FROM (SELECT t4.id AS id, t4.col1 AS col1, t4.cast__col1 AS cast__col1, coalesce(t4.cast__col1, :param_1) AS ifnull__cast__col1
FROM (SELECT t5.id AS id, t5.col1 AS col1, CAST(t5.col1 AS VARCHAR2(4000 CHAR)) AS cast__col1
FROM dvt_test.tab_dec_id t5) t4) t3) t2) t1) t0]
[parameters: {'param_1': 'DEFAULT_REPLACEMENT_STRING'}]

The identifier length has exceeded 30:

$ echo "upper__rstrip__ifnull__cast__col1" | awk '{print length($1)}'
33

Most other engines (including Oracle 12.2+) have a limit of 128 characters so we are less likely to run into this, although in theory they are all vulnerable.

@nj1973 nj1973 self-assigned this Feb 23, 2023
@nj1973
Copy link
Contributor Author

nj1973 commented Feb 27, 2023

This also affects column validation using aggregate functions.

I've made changes on the associated branch to:

  1. Identify the maximum identifier length for both source and target engines and take the lowest one as the ceiling for the validation.
  2. When prefixing a column alias with an operation, such as sum__, ifnull__ etc, we compare the length of the resulting name with the maximum length.
  3. If the new name breaches the maximum then a simplified alias is used. The simplified alias uses the column position in place of its name.
  4. This has been done for both concat/hash row validation and aggregate column validation.
  5. I've added unit tests for the name shortening logic.

Ideally we could do with adding integration tests for Oracle 11g, I think this can happen outside of this issue.

Example of column validation SQL from new logic:

SELECT count(:count_1) AS count
, max(t0.id) AS max__id
, max(t0.col_dec_long_123456789012345) AS max__dvt_calc_col_1
, max(t0.length__dvt_calc_col_4) AS max__length__dvt_calc_col_4
FROM (
  SELECT t1.id AS id
  , t1.col_dec_long_123456789012345 AS col_dec_long_123456789012345
  , t1.col_dec2_long_123456789012345 AS col_dec2_long_123456789012345
  , t1.col_date_long_1234567890 AS col_date_long_1234567890
  , t1.col_string_long_1234567890 AS col_string_long_1234567890
  , length(t1.col_string_long_1234567890) AS length__dvt_calc_col_4
  FROM dvt_test.tab_long_cols t1
) t0

Note that:

  • length__dvt_calc_col_4 is introduced at depth 1 because the length of length__col_string_long_1234567890 would cross 30 characters
  • The max__ alias is then applied to the length column at depth 0: max__length__dvt_calc_col_4
  • The max__ alias is applied to the decimal column at depth 0 because the length of max__col_dec_long_123456789012345 would cross 30 characters
  • max__id is unaffected because it remains below 30 characters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant