fix: Retroactively add granularity param to charts #12960

Merged: 2 commits merged on Feb 11, 2021

@etr2460 etr2460 commented Feb 5, 2021

SUMMARY

We ran into some data accuracy issues in our environment where time range filters wouldn't apply to certain charts (like Big Number) when set on a dashboard. We determined that this was because these charts somehow didn't have a granularity param set on them, so the time range wasn't appropriately set on the chart query. This was especially dangerous, because the chart showed as applying the filter, but it wasn't actually applied to the query.

The fix is this db migration that adds the granularity param onto charts that should have one but don't.

TEST PLAN

Before the migration, confirm that Big Number charts without a granularity or granularity_sqla param are not filtered correctly by the dashboard time range.

Run the migration, then confirm that the dashboard applies the filter correctly.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

to: @john-bodley @ktmud @graceguo-supercat @villebro
cc: @junlincc

@etr2460 etr2460 added the risk:db-migration label (PRs that require a DB migration) Feb 5, 2021
- Find all charts without a granularity or granularity_sqla param.
- Get the dataset that backs the chart.
- If the dataset has the main dttm column set, use it.
- Otherwise, find all the dttm columns in the dataset and use the first one.
Member

Should this mention that this mimics the behavior of the frontend?

Member Author

good call, done
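
The selection steps listed in the docstring above can be sketched as a small pure function. This is a minimal illustration, not the migration's actual code: `pick_granularity` and its arguments are hypothetical names, and the caller is assumed to write the returned column name back into the chart's params.

```python
from typing import Any, Dict, List, Optional


def pick_granularity(
    params: Dict[str, Any],
    main_dttm_col: Optional[str],
    dttm_columns: List[str],
) -> Optional[str]:
    """Return the datetime column to add to a chart, or None to leave it alone.

    Hypothetical helper: argument names are illustrative, not Superset's API.
    """
    # Charts that already carry either param are skipped.
    if "granularity" in params or "granularity_sqla" in params:
        return None
    # Prefer the dataset's main datetime column when one is set.
    if main_dttm_col:
        return main_dttm_col
    # Otherwise fall back to the first datetime column, if the dataset has any.
    return dttm_columns[0] if dttm_columns else None
```

A chart backed by a dataset with no datetime columns at all gets `None` here, so it would be left unchanged.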

    if "granularity" in params or "granularity_sqla" in params:
        continue

    table = session.query(SqlaTable).get(slc.datasource_id)
Member

Is this performant? I wonder if the join should be part of the slice query.

Member Author

This ended up altering only 150 slices in our DB (and only about 4k even reached this step), so I'm not sure performance matters that much. It's a trade-off between doing the join across a much larger number of slices (200k+) and deferring the lookup until we get to this step. idk

@etr2460 etr2460 force-pushed the erik-ritter--add-granularity-migration branch from 05b1ea9 to 12210ca Compare February 5, 2021 04:37
codecov-io commented Feb 5, 2021

Codecov Report

Merging #12960 (30660ce) into master (9982fde) will decrease coverage by 2.43%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #12960      +/-   ##
==========================================
- Coverage   69.14%   66.70%   -2.44%     
==========================================
  Files        1025      491     -534     
  Lines       48767    28888   -19879     
  Branches     5188        0    -5188     
==========================================
- Hits        33718    19269   -14449     
+ Misses      14915     9619    -5296     
+ Partials      134        0     -134     
Flag         Coverage Δ
cypress      ?
javascript   ?
python       66.70% <0.00%> (-0.92%) ⬇️

Impacted Files                                         Coverage Δ
...43f2fdb_add_granularity_to_charts_where_missing.py  0.00% <0.00%> (ø)
superset/sql_validators/postgres.py                    50.00% <0.00%> (-50.00%) ⬇️
superset/views/database/views.py                       62.69% <0.00%> (-24.88%) ⬇️
superset/dataframe.py                                  91.66% <0.00%> (-8.34%) ⬇️
superset/databases/commands/create.py                  83.67% <0.00%> (-8.17%) ⬇️
superset/databases/commands/update.py                  85.71% <0.00%> (-8.17%) ⬇️
superset/sql_validators/base.py                        93.33% <0.00%> (-6.67%) ⬇️
superset/db_engine_specs/sqlite.py                     90.62% <0.00%> (-6.25%) ⬇️
superset/db_engine_specs/base.py                       79.85% <0.00%> (-6.15%) ⬇️
superset/db_engine_specs/presto.py                     82.25% <0.00%> (-5.63%) ⬇️
... and 577 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
villebro commented Feb 5, 2021

@etr2460 I noticed a similar weirdness when I was fixing a regression in the table chart. For some reason the chart looked different in the dashboard compared to the Explore view. When looking at the metadata I noticed that the chart was missing a control value that had been added to the control panel after the chart had been created. Upon closer inspection it turned out Explore merges the chart metadata on top of the default control values, but dashboard doesn't. I didn't yet have time to look into this more closely, but I believe making sure the metadata flow is the same in Dashboard and Explore view might solve this problem, potentially making the migration unnecessary.

@ktmud ktmud left a comment

I think this is a safe migration if the query can be tuned to be more performant. Unifying the logic that merges form_data with control defaults between Dashboard and Explore might take a lot more time than a relatively straightforward migration, so I'll vote +1 on moving this forward.


slices_changed = 0

for slc in session.query(Slice).filter(Slice.datasource_type == "table").all():
Member

This query will fetch all chart slices with a SQLA datasource and run the JSON parse in Python. Could we select only those that don't have granularity or granularity_sqla instead?

Suggested change
- for slc in session.query(Slice).filter(Slice.datasource_type == "table").all():
+ for slc in (
+     session.query(Slice)
+     .filter(
+         and_(
+             Slice.datasource_type == "table",
+             # Use ~ to negate the LIKE clause; Python's `not` does not build SQL.
+             ~Slice.params.like('%"granularity%'),
+         )
+     )
+     .yield_per(500)
+ ):

Not that it matters in practice, but I also try to stream results whenever I fetch an unknown number of rows (.yield_per(500)).

Member Author

so you're saying do a plain text filter first to remove most of the slices that aren't eligible, and only do the json parse on what's remaining? It looks janky, but it should work, will update

Member

I think this makes sense - no point in pulling in slices that aren't applicable

Member

Yeah, since we don't need a complex regexp here, a simple text match should work.
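
A minimal sketch of the two-stage filtering discussed in this thread: a plain-text `NOT LIKE` prefilter in SQL, followed by a JSON parse of whatever remains. It uses the standard-library `sqlite3` module instead of Superset's SQLAlchemy models so that it runs on its own. The `slices` table layout and the names `build_demo_db` and `find_slices_missing_granularity` are assumptions made for illustration.

```python
import json
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the slices table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slices (id INTEGER PRIMARY KEY, datasource_type TEXT, params TEXT)"
    )
    conn.executemany(
        "INSERT INTO slices VALUES (?, ?, ?)",
        [
            (1, "table", '{"viz_type": "big_number"}'),
            (2, "table", '{"viz_type": "big_number", "granularity_sqla": "ds"}'),
            (3, "druid", '{"viz_type": "table"}'),
            (4, "table", "{}"),
            (5, "table", "not json"),
        ],
    )
    return conn


def find_slices_missing_granularity(conn: sqlite3.Connection) -> List[int]:
    """Return ids of table-backed slices whose params lack a granularity key."""
    # Stage 1: the text match drops every row that mentions "granularity...
    # which covers both the granularity and granularity_sqla keys.
    cursor = conn.execute(
        "SELECT id, params FROM slices "
        "WHERE datasource_type = 'table' "
        "AND params NOT LIKE '%\"granularity%' "
        "ORDER BY id"
    )
    missing = []
    for slice_id, raw_params in cursor:
        # Stage 2: parse only the survivors and re-check the keys exactly.
        try:
            params = json.loads(raw_params)
        except (TypeError, ValueError):
            continue  # Unparseable params are skipped rather than modified.
        if not isinstance(params, dict):
            continue
        if "granularity" in params or "granularity_sqla" in params:
            continue
        missing.append(slice_id)
    return missing
```

The text match is deliberately coarse: it can exclude a row whose params merely contain the string `"granularity` somewhere else, which errs on the side of leaving a chart untouched.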

@etr2460 etr2460 force-pushed the erik-ritter--add-granularity-migration branch from 12210ca to 9c6ba17 Compare February 5, 2021 17:08
villebro commented Feb 8, 2021

> @etr2460 I noticed a similar weirdness when I was fixing a regression in the table chart. For some reason the chart looked different in the dashboard compared to the Explore view. When looking at the metadata I noticed that the chart was missing a control value that had been added to the control panel after the chart had been created. Upon closer inspection it turned out Explore merges the chart metadata on top of the default control values, but dashboard doesn't. I didn't yet have time to look into this more closely, but I believe making sure the metadata flow is the same in Dashboard and Explore view might solve this problem, potentially making the migration unnecessary.

I looked into this, and it turns out default values are in fact applied to the chart form data on the Dashboard, just as in the Explore view. The control panel on the Table chart was just setting the value of the queryMode control (which had the default value null) while rendering the control panel based on other form data.

Going forward we should potentially make default values "smarter" by making it possible to introduce hooks that return defaults based on other context. In the case of queryMode in the table chart, it would be inferred from the other formData, and in the case of granularity, it could default to main_dttm_col in the dataset if missing/unset.
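
A rough sketch of what such context-aware default hooks could look like, written in Python for consistency with the migration code even though the real control defaults live in the frontend. Everything here is hypothetical: `default_granularity`, `apply_defaults`, and the shape of the context dict are illustrative assumptions, not an existing Superset API.

```python
from typing import Any, Callable, Dict

Context = Dict[str, Any]
DefaultHook = Callable[[Context], Any]


def default_granularity(context: Context) -> Any:
    """Hypothetical hook: fall back to the dataset's main datetime column."""
    return context.get("dataset", {}).get("main_dttm_col")


def apply_defaults(
    form_data: Dict[str, Any],
    hooks: Dict[str, DefaultHook],
    context: Context,
) -> Dict[str, Any]:
    """Fill missing or unset form data keys by asking each hook for a default."""
    resolved = dict(form_data)
    for key, hook in hooks.items():
        if resolved.get(key) is None:
            # Hooks see the dataset plus the form data resolved so far.
            value = hook({**context, "form_data": resolved})
            if value is not None:
                resolved[key] = value
    return resolved
```

For example, `apply_defaults({"viz_type": "big_number"}, {"granularity_sqla": default_granularity}, {"dataset": {"main_dttm_col": "ds"}})` would add `granularity_sqla` set to `"ds"` without touching values that are already set.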

@villebro villebro left a comment

LGTM

ktmud commented Feb 10, 2021

Closing and reopening to trigger docker-build and python-lint (3.7).

@ktmud ktmud closed this Feb 10, 2021
@ktmud ktmud reopened this Feb 10, 2021
villebro commented

FYI a migration was merged today, so I believe we need to update the downgrade revision id. #13052

etr2460 commented Feb 11, 2021

Good call, updating now. And I agree, @villebro, there's some weirdness in how parameters are passed between Dashboard and Explore. This might not be a long-term fix; it mostly addresses the weirdness that's been cropping up recently.

@etr2460 etr2460 merged commit a6df284 into apache:master Feb 11, 2021
@etr2460 etr2460 deleted the erik-ritter--add-granularity-migration branch February 11, 2021 21:52
amitmiran137 pushed a commit to nielsen-oss/superset that referenced this pull request Feb 14, 2021
* fix: Retroactively add granularity param to charts

* Update down revision
amitmiran137 pushed a commit to nielsen-oss/superset that referenced this pull request Feb 14, 2021
* master: (30 commits)
  refactor(native-filters): decouple params from filter config modal (first phase) (apache#13021)
  fix(native-filters): set currentValue null when empty (apache#13000)
  Custom superset_config.py + secret envs (apache#13096)
  Update http error code from 400 to 403 (apache#13061)
  feat(native-filters): add storybook entry for select filter (apache#13005)
  feat(native-filters): Time native filter (apache#12992)
  Force pod restart on config changes (apache#13056)
  feat(cross-filters): add cross filters (apache#12662)
  fix(explore): Enable selecting an option not included in suggestions (apache#13029)
  Improves RTL configuration (apache#13079)
  Added a note about the ! prefix for breaking changes to CONTRIBUTING.md (apache#13083)
  chore: lock down npm to v6 (apache#13069)
  fix: API tests, make them possible to run independently again (apache#13076)
  fix: add config to disable dataset ownership on the old api (apache#13051)
  add required * indicator to message content/notif method (apache#12931)
  fix: Retroactively add granularity param to charts (apache#12960)
  fix(ci): multiline regex in change detection (apache#13075)
  feat(style): hide dashboard header by url parameter (apache#12918)
  fix(explore): pie chart label bugs (apache#13052)
  fix: Disabled state button transition time (apache#13008)
  ...
@mistercrunch mistercrunch added the 🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and 🚢 1.2.0 labels Mar 12, 2024
Labels: 🏷️ bot, risk:db-migration, size/L, 🚢 1.2.0
6 participants