Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(dashboard): Virtualization POC #21438

Merged
merged 7 commits into from
Oct 11, 2022

Conversation

kgabryje
Copy link
Member

@kgabryje kgabryje commented Sep 12, 2022

SUMMARY

This PR is a proof of concept for dashboard virtualization.
Due to a performance issues on large dashboards (60+ charts in a single tab/view), this PR introduces virtualization in an attempt to reduce the number of elements (DOM nodes) rendered at once.
In the initial state, we render only the charts in current viewport and charts max 1 view height away from current viewport.
When user scrolls down, charts that cross the 1 view height margin get rendered too.
We remove charts from DOM tree when they are at least 4 view heights away from current viewport.
Those values were chosen arbitrarily/experimentally to provide good middle-ground between memory efficiency and good user experience when scrolling up/down.

To sum it up: if chart is not rendered and it's less than 1 view height away from viewport - render it; if chart is rendered and it's more than 4 view heights away from viewport - remove it from DOM

Rerendering charts do NOT trigger new queries - we reuse data fetched on dashboard mount and only redraw the charts.

As of now, this feature is in proof-of-concept faze and should be considered as experimental. It's hidden behind a feature flag DASHBOARD_VIRTUALIZATION

Virtualization is disabled for bot workers (like Celery) to ensure that reports and alerts work correctly

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen.Recording.2022-10-09.at.18.45.21.mov

TESTING INSTRUCTIONS

  1. Enable FF DASHBOARD_VIRTUALIZATION
  2. Open a dashboard with a lot of charts
  3. Scroll up and down - verify that only the charts in current viewport (or close to it - 100% margin) are rendered. The rest of them displays a spinner and loads only when user scroll them into view
  4. Verify that charts render when user scrolls, but do not send new queries to the backend

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags: DASHBOARD_VIRTUALIZATION
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Sep 12, 2022

Codecov Report

Merging #21438 (d169d72) into master (f58227a) will decrease coverage by 0.01%.
The diff coverage is 22.72%.

❗ Current head d169d72 differs from pull request most recent head 67052ff. Consider uploading reports for the commit 67052ff to get more accurate results

@@            Coverage Diff             @@
##           master   #21438      +/-   ##
==========================================
- Coverage   66.84%   66.83%   -0.02%     
==========================================
  Files        1800     1802       +2     
  Lines       68951    68972      +21     
  Branches     7335     7342       +7     
==========================================
+ Hits        46092    46096       +4     
- Misses      20965    20980      +15     
- Partials     1894     1896       +2     
Flag Coverage Δ
hive 52.92% <ø> (ø)
javascript 53.14% <22.72%> (-0.03%) ⬇️
mysql 78.24% <ø> (ø)
postgres 78.31% <ø> (ø)
presto 52.82% <ø> (ø)
python 81.45% <ø> (ø)
sqlite 76.79% <ø> (ø)
unit 51.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
.../src/dashboard/components/gridComponents/Chart.jsx 55.04% <ø> (ø)
...ashboard/components/gridComponents/ChartHolder.tsx 88.88% <ø> (ø)
...tend/src/utils/isDashboardVirtualizationEnabled.ts 0.00% <0.00%> (ø)
superset/config.py 91.63% <ø> (ø)
superset/views/base.py 75.70% <ø> (ø)
...nd/src/dashboard/components/gridComponents/Row.jsx 51.21% <17.64%> (-23.79%) ⬇️
superset-frontend/src/components/Chart/Chart.jsx 51.66% <50.00%> (-0.88%) ⬇️
superset-frontend/src/utils/isBot.ts 100.00% <100.00%> (ø)
...nts/controls/DateFilterControl/DateFilterLabel.tsx 41.41% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kgabryje
Copy link
Member Author

/testenv up FEATURE_DASHBOARD_VIRTUALIZATION

@github-actions
Copy link
Contributor

@kgabryje Ephemeral environment spinning up at http://34.216.203.189:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@hushaoqing
Copy link
Contributor

Love this feature.
One suggestion: from my experience, it would be better if it didn't re-render when scrolling back.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, super nice feature! I kinda agree with @hushaoqing that leaving the charts rendered if they've already been rendered would be nicer. But since this is behind a FF we can probably iterate on these details in forthcoming PRs if there is a good reason not to make that change now.

@michael-s-molina
Copy link
Member

We currently have react-virtualized and react-window in our dependencies that provide virtualization. react-window is the lighter-weight alternative. Can we use one of them for this problem?

@kgabryje
Copy link
Member Author

Thank you for the feedback!
@hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.

@michael-s-molina I considered using react-window, but I don't think it's applicable in this case. react-window expects a clear list of elements to render, with their heights and widths. The logic of rendering the dashboard is more complex - here we have DashboardComponents recursively rendering other DashboardComponents.
Moreover, even if we managed to restructure dashboard rendering logic to use react-window, that would mean that we not only delay drawing the chart, but also running chart queries - as it's the src/components/Chart.jsx component that handles running queries and we'd not render it until it's in view.
I think in this case the "manual" implementation of virtualization with IntersectionObserver is a better choice

@michael-s-molina
Copy link
Member

@hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.

One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:

DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED
NONE = No virtualization is applied
VIEWPORT = Only elements in the viewport are kept in the DOM
PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed

@kgabryje
Copy link
Member Author

One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:

DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED NONE = No virtualization is applied VIEWPORT = Only elements in the viewport are kept in the DOM PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed

Do we have an established pattern for 3-state feature flags?

@rusackas
Copy link
Member

Do we have an established pattern for 3-state feature flags?

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

Loving this feature, btw. I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.

@kgabryje
Copy link
Member Author

I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.

If you're getting perf issues with virtualization, but not without it, that might mean that the intersection observer is too expensive to be put on every chart. If that's the case, we might want to put it on Row component instead and propagate the visibility prop down to charts...

@kgabryje
Copy link
Member Author

Do we have an established pattern for 3-state feature flags?

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

I refactored it like this: DASHBOARD_VIRTUALIZATION: Literal["NONE", "VIEWPORT", "PAGINATED"] = "NONE". It's available in redux state as common.conf.DASHBOARD_VIRTUALIZATION. We use a similar pattern with another config option: DASHBOARD_AUTO_REFRESH_MODE: Literal["fetch", "force"] = "force". However, I'm also worried about potential problems with feature flag managers...

@michael-s-molina
Copy link
Member

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

If that is a concern, a possible approach would be to treat it like a configuration instead of a feature flag, similar to CACHE_TYPE which defaults to null but can be overridden with different cache backends. This is not a blocker for me, I'm just trying to avoid two feature flags for the same feature.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 13, 2022
@kgabryje
Copy link
Member Author

kgabryje commented Sep 13, 2022

@rusackas @michael-s-molina Updated the PR:

  1. Make DASHBOARD_VIRTUALIZATION an enum config instead of a feature flag
  2. VIEWPORT mode - remove charts that are not in view; PAGINATED - persist the already rendered charts; NONE - virtualization disabled
  3. Move IntersectionObserver to Row. Now we'll have much fewer observers - 1 per row instead of 1 per chart

@kgabryje
Copy link
Member Author

Unfortunately I can't start a testenv with virtualization since now it's a config instead of a feature flag, so if you want to test manually, you need to pull my branch to your local setup

@stephenLYZ
Copy link
Member

Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido

@mayurnewase
Copy link
Contributor

Awesome feature, tested locally.

@michael-s-molina
Copy link
Member

michael-s-molina commented Sep 14, 2022

@kgabryje @rusackas Thinking more carefully about the problem, I think we could dynamically choose the strategy (viewport or paginated) based on the number of charts that needs to be rendered. This way we would have the best of both strategies and avoid adding another feature flag or configuration.

Comment on lines +20 to +23
export enum DASHBOARD_VIRTUALIZATION_MODE {
NONE = 'NONE',
VIEWPORT = 'VIEWPORT',
PAGINATED = 'PAGINATED',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm always nagging about this 😄 But I think we should stick to PascalCase enums in our TS codebase: https://www.typescriptlang.org/docs/handbook/enums.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on PascalCase. In the latest graphql-codegen CLI, it'd even force the backend enums to be PascalCase by default.

@geido
Copy link
Member

geido commented Sep 16, 2022

Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido

Good catch @stephenLYZ. I think we can use the same approach we would use for the tabs. We would be catching series as they appear and either apply an existing color if that was seen before or assign a new color from the scheme and persist it.

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! Just one preference from me, I think the options VIEWPORT and PAGINATED are not super clear at first glance. If we set the charts to load on scroll I think LAZY might be a more appropriate keyword.

@michael-s-molina
Copy link
Member

Love it! Just one preference from me, I think the options VIEWPORT and PAGINATED are not super clear at first glance. If we set the charts to load on scroll I think LAZY might be a more appropriate keyword.

The problem is that both are lazy 😜. Anyway, I vote for removing the flag/configuration completely and choosing the strategy based on the number of charts that need to be rendered.

@justinpark
Copy link
Member

justinpark commented Sep 16, 2022

@kgabryje Is this safe to support the screen capture from "Reports" worker?
I personally recommend using react-intersection-observer which can cover the fallback case of unsupported IntersectionObserver (which possibly issue on screen capture from the celery worker)

btw I recently introduced the similar feature (which only limits the runQuery) in #21488

@kgabryje
Copy link
Member Author

kgabryje commented Sep 19, 2022

Thank you all for feedback and suggestions!
@justinpark I'm not sure if delaying queries is a better solution. Queries are initialized in the order in which the charts are placed on the dashboard, so while we usually are exceeding the browser's simultaneous query limit, charts at the top of the page don't really need to wait for the charts at the bottom. Moreover, I've chosen to only delay drawing charts instead of drawing + running queries in order to reduce a possible "slow scroll" experience.

Great catch about the reports! Do you have any idea how to disable virtualization for celery workers?

@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards.
WDYT?

@villebro Good point about the Pascal case! However, if we can figure out the dynamic approach proposed by Michael we won't need that config, so I'm not changing that for now 🙂

@michael-s-molina
Copy link
Member

@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards.
WDYT?

Sounds good. We can determine the arbitrary number by displaying a heavy viewport of charts, measuring the consumed memory, and multiplying it by what we consider acceptable.

@ktmud
Copy link
Member

ktmud commented Sep 20, 2022

IIRC, react-virtualized has been discontinued by the author in favor of continued support for react-window. So that's another reason to not use react-virtualized.

if (
isDashboardVirtualizationEnabled(this.props.dashboardVirtualizationMode)
) {
this.observer = new IntersectionObserver(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we disconnect and unobserve on unmount? Just to reduce the chance of memory leak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kgabryje
Copy link
Member Author

kgabryje commented Oct 9, 2022

@michael-s-molina @justinpark @geido @villebro @hushaoqing @rusackas @ktmud Thank you all for feedback and suggestions... and your patience. I updated the PR - now the charts render when they are 1vh away from the viewport, but they're removed from DOM only if they are more than 4vh away from viewport. See the updated summary for more details

@kgabryje
Copy link
Member Author

/testenv up FEATURE_DASHBOARD_VIRTUALIZATION

@github-actions
Copy link
Contributor

@kgabryje Ephemeral environment spinning up at http://52.42.75.81:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the enhancements!

@kgabryje kgabryje merged commit 406e44b into apache:master Oct 11, 2022
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

@mistercrunch mistercrunch added 🚢 2.1.3 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Feb 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.