Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High (100%) CPU usage on Linux/Nvidia, even with vsync enabled #275

Open
64kramsystem opened this issue Aug 9, 2021 · 5 comments
Open

Comments

@64kramsystem
Copy link

64kramsystem commented Aug 9, 2021

Seems to be the same issue as #257, but leaving here for tracking purposes in case somebody searches.

On my system, any Macroquad program causes CPU to run at 100%+. This is unique to Macroquad - all the other engines I've tried don't have this problem (Amethist/Tetra/Ggez).

I'm on Ubuntu 20.04, and vsync is enabled.

@ar37-rs
Copy link

ar37-rs commented Aug 18, 2021

for temporary solution, put main thread to sleep for a while after next_frame:
next_frame().await;
std:: thread ::sleep(Duration::from_millis(10));

@64kramsystem 64kramsystem changed the title High (100%) CPU usage, even with vsync enabled High (100%) CPU usage on Linux/Nvidia, even with vsync enabled Oct 10, 2021
@64kramsystem 64kramsystem reopened this Oct 10, 2021
@64kramsystem
Copy link
Author

(Cross-posted from #257, as this is the correct location).

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in #257 (comment) this seems to be a nvidia driver issue. A web search finds related issues in a couple of projects (e.g. gnome), but I don't understand what the solution was.

With the information at hand (thanks @martinxyz), and after some digging, I think it's an overlook ("bug-ish" sort of thing) in MQ.

I've checked the glFinish() invocation history, and it is used for telemetry:

commit 05e69c49c5e8b5281f7f6645882168628e45b89f
Author: Fedor Logachev <[email protected]>
Date:   Thu Jan 7 12:11:04 2021 -0600

    telemetry: More internal metrics

diff --git a/profiler/src/lib.rs b/profiler/src/lib.rs
index 0317101..4286e55 100644
--- a/profiler/src/lib.rs
+++ b/profiler/src/lib.rs
@@ -188,14 +188,20 @@ fn zone_ui(ui: &mut Ui, zone: &Zone, n: usize) {
                     }
                 }
 
-                ui.label(
-                    None,
-                    &format!(
-                        "Full frame time: {:.3}ms {:.1}(1/t)",
-                        get_frame_time(),
-                        (1.0 / get_frame_time())
-                    ),
-                );
+                if let Some(frame) = state
+                    .selected_frame
+                    .as_ref()
+                    .or_else(|| state.frames_buffer.get(0))
+                {
+                    ui.label(
+                        None,
+                        &format!(
+                            "Full frame time: {:.3}ms {:.1}(1/t)",
+                            frame.full_frame_time,
+                            (1.0 / frame.full_frame_time)
+                        ),
+                    );
+                }
 
                 if state.paused {
                     if ui.button(None, "resume") {
diff --git a/src/lib.rs b/src/lib.rs
index 486e01c..4e3bd7f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -247,6 +247,8 @@ fn draw(&mut self) {
             let _z = telemetry::ZoneGuard::new("Event::draw");
 
             if let Some(future) = unsafe { MAIN_FUTURE.as_mut() } {
+                let _z = telemetry::ZoneGuard::new("Main loop");
+
                 get_context().begin_frame();
 
                 if exec::resume(future) {
@@ -263,6 +265,16 @@ fn draw(&mut self) {
 
             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }
 
         telemetry::reset();

I've had a look around, and for example, Intel discourages using glFinish() in its OpenGL optimization guide.

Potentially, this could be a misuse of the OpenGL APIs, but I'm no expert in this.

Regardless, at the very least, telemetry should be preset according to the profile (e.g. disabled in release mode, and enabled in debug mode), or anyway, it should be configurable, and if enabled in release mode, it should be very clearly documented that it can potentially hurt performance.

@martinxyz
Copy link

Out of curiosity, I switched to the nouveau driver (open source) and the CPU load didn't disappear.

The traceback was still going through the same glFinish() call, though usually ending at poll() (which would be good) and almost never at sched_yield(). Not 100% trusting this. Anyway, strace still showed ~8'000 sched_yield calls per second.

When I comment out the glFlush(); glFinish(); lines added by the telemetry commit (see previous comment by @64kramsystem) then the sched_yield() syscalls go away.

My "Ctrl-C" tracebacks in gdb now go from miniquad::graphics::Context::clear() through dri_image_drawable_get_buffers() and end up in a poll() syscall.

CPU load also seems lower, but I'm having trouble measuring it reliably. Load average and top disagree, and it seems to depend on the window size. A larger window gives lower(?!) CPU load.

In case it matters, all my tests were on Gnome3 and Xorg (not Wayland).

@mhcerri
Copy link

mhcerri commented May 30, 2023

@not-fl3, what exactly are you trying to achieve with this piece of code bellow? Is there any reason it is only used for linux and wasm32?

@@ -263,6 +265,16 @@ impl EventHandlerFree for Stage {

             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }

         telemetry::reset();

glFinish() is known for causing busy waits with Nvidia drivers. If that is really needed, do you think it would be possible to only call it when the profiler is enabled?

Thanks for the awesome project!

@DissolveDZ
Copy link

this is still an active issue, I might do a pull request soon if i get around to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants