Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Protect column and row validation calculated column names from Oracle 30 character identifier limit #749

Merged
merged 3 commits into from
Mar 14, 2023

Conversation

nj1973
Copy link
Contributor

@nj1973 nj1973 commented Feb 27, 2023

This protects all engines not just Oracle 12.1 and below. It's just that Oracle 12.1 and below have a 30 character limit which we almost always cross for row validation. Other engines have a 128 character identifier limit and therefore are less likely to be breached.

This PR introduces the following change:

  1. Identify the maximum identifier length for both source and target engines and take the lowest one as the ceiling for the validation.
  2. When prefixing a column alias with an operation, such as sum__, ifnull__ etc, we compare the length of the resulting name with the maximum length.
  3. If the new name breaches the maximum then a simplified alias is used. The simplified alias uses the column position in place of its name.
  4. This has been done for both concat/hash row validation and aggregate column validation.
  5. I've added unit tests for the name shortening logic.
  6. Ideally we could do with adding integration tests for Oracle 11g, I think this can happen outside of this issue.

Example of column validation SQL from new logic:

SELECT count(:count_1) AS count
, max(t0.id) AS max__id
, max(t0.col_dec_long_123456789012345) AS max__dvt_calc_col_1
, max(t0.length__dvt_calc_col_4) AS max__length__dvt_calc_col_4
FROM (
  SELECT t1.id AS id
  , t1.col_dec_long_123456789012345 AS col_dec_long_123456789012345
  , t1.col_dec2_long_123456789012345 AS col_dec2_long_123456789012345
  , t1.col_date_long_1234567890 AS col_date_long_1234567890
  , t1.col_string_long_1234567890 AS col_string_long_1234567890
  , length(t1.col_string_long_1234567890) AS length__dvt_calc_col_4
  FROM dvt_test.tab_long_cols t1
) t0

Note that:

  • length__dvt_calc_col_4 is introduced at depth 1 because the length of length__col_string_long_1234567890 would cross 30 characters
  • The max__ alias is then applied to the length column at depth 0: max__length__dvt_calc_col_4
  • The max__ alias is applied to the decimal column at depth 0 because the length of max__col_dec_long_123456789012345 would cross 30 characters
  • max__id is unaffected because it remains below 30 characters

The same logic would be seen in a row validation too but more functions are applied so we're more likely to see dvt_calc_col columns. It's worth pointing out that this is only visible in the SQL statement, the end report still uses proper column names.

@nj1973 nj1973 linked an issue Feb 27, 2023 that may be closed by this pull request
@nj1973
Copy link
Contributor Author

nj1973 commented Mar 6, 2023

/gcbrun

1 similar comment
@nj1973
Copy link
Contributor Author

nj1973 commented Mar 6, 2023

/gcbrun

Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nj1973 nj1973 merged commit 89413c1 into develop Mar 14, 2023
@nj1973 nj1973 deleted the fix/issue-724-identifier-too-long branch March 14, 2023 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

validate row: ORA-00972: identifier is too long error on Oracle 11g
3 participants