fix(snowflake): opt-in denormalization of column names #24982

Merged: villebro merged 6 commits into apache:master from villebro/snowflake-normalize on Aug 15, 2023

Conversation

villebro
Member

@villebro villebro commented Aug 14, 2023

SUMMARY

PR #24471, which was meant to harmonize column naming for Oracle-like databases such as Snowflake, caused issues for deployments that relied on the existing behavior of normalizing column names for physical datasets. This PR adds a field, normalize_columns, to the Dataset/SQLA Table models. It defaults to False for new datasets, but for existing datasets it is set to True via a DB migration to ensure we don't break them.

For existing datasets, "Normalize columns" is checked:
image
When checked, the behavior is consistent with how it was previously, i.e. physical datasets on Snowflake have normalized column names:
image

For new datasets, the checkbox is unchecked:
image
In this case, a physical dataset on Snowflake will denormalize the columns, usually showing them as UPPERCASE:
image

This means that for new datasets, column names are no longer normalized unless the flag is checked.
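The effect of the flag can be sketched in plain Python. This is a minimal illustration, not Superset's implementation: the helper names `normalize_name`, `denormalize_name` and `dataset_column_names` are hypothetical, and the sketch assumes that reflection hands back already-normalized (lowercase) names for case-insensitive Snowflake identifiers.

```python
from __future__ import annotations


def normalize_name(name: str) -> str:
    """Lowercase an all-uppercase identifier; leave mixed-case names alone."""
    return name.lower() if name.isupper() else name


def denormalize_name(name: str) -> str:
    """Uppercase an all-lowercase identifier; leave mixed-case names alone."""
    return name.upper() if name.islower() else name


def dataset_column_names(reflected: list[str], normalize_columns: bool) -> list[str]:
    """Return the column names a physical dataset would show.

    Assumption: `reflected` holds names as returned by reflection, i.e.
    already normalized to lowercase for case-insensitive identifiers.
    """
    if normalize_columns:
        # Existing datasets: keep the normalized names as before.
        return list(reflected)
    # New datasets: denormalize, which usually yields UPPERCASE on Snowflake.
    return [denormalize_name(name) for name in reflected]
```

With the flag checked, `order_id` stays `order_id`; with it unchecked, the same column is shown as `ORDER_ID`. A mixed-case (quoted) identifier is left untouched in both cases.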

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro requested a review from a team as a code owner August 14, 2023 17:57
@villebro villebro force-pushed the villebro/snowflake-normalize branch from 254921a to 1fa6d86 Compare August 14, 2023 17:58
@villebro villebro force-pushed the villebro/snowflake-normalize branch 2 times, most recently from 87fed4b to 04575d3 Compare August 14, 2023 18:30
@villebro villebro force-pushed the villebro/snowflake-normalize branch from 04575d3 to c7a7656 Compare August 14, 2023 18:54
def upgrade():
    op.add_column(
        "tables",
        sa.Column("normalize_columns", sa.Boolean(), nullable=True, default=False),

Member

What's the difference between NULL and FALSE?

Member Author

@john-bodley they're essentially the same. Would you prefer I change it to just nullable=True without a default value, or just have the default value?

Member

If we only need two states, then I would stick with TRUE and FALSE, i.e., non-nullable, unless there's a performance or storage cost for using FALSE rather than NULL—which likely will be the predominant value.

Member

+1

Member Author

It seems SQLAlchemy is slightly flaky when it comes to assigning default values with the NOT NULL constraint in place. I dug around and found that the is_sqllab_viz flag on the SqlaTable model is expected to work similarly, and there the migration also had to allow for nullable=True. So I'm reverting back to that.
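To illustrate the outcome of this thread, here is a rough sketch of the "nullable column plus backfill" pattern using the standard-library sqlite3 module rather than Alembic. It is not the PR's migration: only the `tables` table and the `normalize_columns` column come from the diff above, while the helper functions and sample rows are made up for the example. It assumes that readers of the flag treat NULL the same as False.

```python
import sqlite3


def add_normalize_columns(conn: sqlite3.Connection) -> None:
    """Add the flag as a nullable column, then backfill existing rows to True."""
    # No NOT NULL constraint: rows that never set the flag simply hold NULL.
    conn.execute("ALTER TABLE tables ADD COLUMN normalize_columns BOOLEAN")
    # Existing datasets keep the old (normalizing) behavior.
    conn.execute("UPDATE tables SET normalize_columns = 1")


def is_normalized(conn: sqlite3.Connection, table_name: str) -> bool:
    """Read the flag, treating NULL the same as False."""
    row = conn.execute(
        "SELECT normalize_columns FROM tables WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT)")
conn.execute("INSERT INTO tables (table_name) VALUES ('legacy_orders')")

add_normalize_columns(conn)

# A dataset created after the migration that never sets the flag.
conn.execute("INSERT INTO tables (table_name) VALUES ('new_orders')")
```

After this runs, `legacy_orders` reads as normalized and `new_orders` does not, even though the latter stores NULL rather than an explicit FALSE.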

@villebro villebro force-pushed the villebro/snowflake-normalize branch 2 times, most recently from 4f5c21e to 232f9b4 Compare August 14, 2023 23:38
@@ -105,6 +105,7 @@ def test_external_metadata_by_name_for_physical_table(self):
            "database_name": tbl.database.database_name,
            "schema_name": tbl.schema,
            "table_name": tbl.table_name,
            "normalize_columns": tbl.normalize_columns,

Member

Can we add some tests that create a dataset with/without this value to check for api backwards compatibility?

Member Author

Good idea, I'll do that 👍

Member

+1
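A backwards-compatibility check of the kind requested here might look roughly like the following. This is only a sketch: `DatasetPayload` and `parse_dataset_payload` are hypothetical stand-ins for the real dataset API schema, and the assumption is that a client omitting normalize_columns should get the new-dataset default of False.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class DatasetPayload:
    table_name: str
    # New datasets default to not normalizing column names.
    normalize_columns: bool = False


def parse_dataset_payload(payload: dict[str, Any]) -> DatasetPayload:
    """Accept payloads from clients that predate the normalize_columns field."""
    return DatasetPayload(
        table_name=payload["table_name"],
        # A missing or null value falls back to False.
        normalize_columns=bool(payload.get("normalize_columns", False)),
    )


def test_create_dataset_without_flag() -> None:
    dataset = parse_dataset_payload({"table_name": "orders"})
    assert dataset.normalize_columns is False


def test_create_dataset_with_flag() -> None:
    dataset = parse_dataset_payload(
        {"table_name": "orders", "normalize_columns": True}
    )
    assert dataset.normalize_columns is True
```

The first test covers older API clients that do not know about the field; the second covers clients that opt in explicitly.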

@villebro villebro force-pushed the villebro/snowflake-normalize branch 2 times, most recently from 2765dca to d3d8655 Compare August 14, 2023 23:57
@villebro villebro force-pushed the villebro/snowflake-normalize branch from d3d8655 to ddee4b3 Compare August 15, 2023 02:12
@villebro villebro force-pushed the villebro/snowflake-normalize branch 4 times, most recently from 1a715e6 to fc593d2 Compare August 15, 2023 14:42
@villebro villebro force-pushed the villebro/snowflake-normalize branch from fc593d2 to 114e867 Compare August 15, 2023 15:02
Member

@michael-s-molina michael-s-molina left a comment

LGTM. Thank you for the fix @villebro!

Member

@eschutho eschutho left a comment

Looks great!

@villebro villebro merged commit f94dc49 into apache:master Aug 15, 2023
29 checks passed
@villebro villebro deleted the villebro/snowflake-normalize branch August 16, 2023 01:21
@michael-s-molina michael-s-molina added the v3.0 label (added by the release manager to track PRs to be included in the 3.0 branch) Aug 16, 2023
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Aug 16, 2023
@mistercrunch mistercrunch added the 🍒 3.0.0, 🍒 3.0.1, 🍒 3.0.2, 🍒 3.0.3, 🍒 3.0.4, 🏷️ bot (used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and 🚢 3.1.0 labels Mar 8, 2024
Labels
🏷️ bot, size/L, v3.0, 🍒 3.0.0, 🍒 3.0.1, 🍒 3.0.2, 🍒 3.0.3, 🍒 3.0.4, 🚢 3.1.0
5 participants