Dragonfly sometimes fails with OOM even when --cache_mode=true #3155

Open
bogdanp05 opened this issue Jun 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@bogdanp05

bogdanp05 commented Jun 10, 2024

Describe the bug
Dragonfly is killed by the OOM killer even when --cache_mode=true. This happens after several hours or days of sustained heavy load.

To Reproduce
We noticed this on several services that were processing ~200k commands/s and had between 11k and 15k connected clients. At the time of the OOM restart, RSS was 162GB while used memory was 128GB.
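To make the RSS versus used-memory gap easier to track over time, here is a minimal sketch that parses `INFO memory`-style text and computes the difference. It assumes the server reports the Redis-compatible `used_memory` and `used_memory_rss` fields in bytes; the helper names and the sample payload are hypothetical, and the sample simply reuses the two figures quoted above with 1 GB taken as 10^9 bytes (the report does not say whether GB or GiB was meant).

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse 'key:value' lines from INFO-style output, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def rss_gap(info_text: str) -> tuple[int, float]:
    """Return (rss - used, rss / used) from INFO-style text.

    Assumes the fields 'used_memory' and 'used_memory_rss' are present, in bytes.
    """
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    rss = int(fields["used_memory_rss"])
    return rss - used, rss / used


GB = 10**9  # assumption: decimal gigabytes
# Hypothetical payload built from the numbers in this report.
sample = f"# Memory\r\nused_memory:{128 * GB}\r\nused_memory_rss:{162 * GB}\r\n"

gap_bytes, ratio = rss_gap(sample)
print(f"RSS exceeds used memory by {gap_bytes / GB:.0f} GB (ratio {ratio:.2f})")
```

With these figures, RSS is about 34GB above used memory, so roughly a fifth of the resident set is not accounted for by the used-memory counter.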

Jun 08 13:56:10 dragonfly-1-4 systemd[1]: dragonfly.service: A process of this unit has been killed by the OOM killer.
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Main process exited, code=killed, status=9/KILL
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Failed with result 'oom-kill'.
Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Consumed 2w 4d 4h 34min 1.671s CPU time.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: dragonfly.service: Scheduled restart job, restart counter is at 1.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: Stopped dragonfly.service - Aiven dragonfly in container.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: dragonfly.service: Consumed 2w 4d 4h 34min 1.671s CPU time.
Jun 08 13:56:53 dragonfly-1-4 systemd[1]: Started dragonfly.service - Aiven dragonfly in container.
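
For anyone checking their own hosts for the same failure, here is a rough sketch that scans journal lines in the format shown above for units that failed with result 'oom-kill'. The regular expression is an assumption based only on these log lines, and the function name is hypothetical.

```python
import re

# Matches lines like:
# "Jun 08 13:56:38 host systemd[1]: unit.service: Failed with result 'oom-kill'."
OOM_RESULT = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd\[1\]: "
    r"(?P<unit>\S+): Failed with result 'oom-kill'\.$"
)


def find_oom_failures(lines):
    """Return (timestamp, host, unit) for every unit that failed with 'oom-kill'."""
    hits = []
    for line in lines:
        match = OOM_RESULT.match(line.strip())
        if match:
            hits.append((match["ts"], match["host"], match["unit"]))
    return hits


# Three of the lines from the log above.
SAMPLE = [
    "Jun 08 13:56:10 dragonfly-1-4 systemd[1]: dragonfly.service: A process of this unit has been killed by the OOM killer.",
    "Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Main process exited, code=killed, status=9/KILL",
    "Jun 08 13:56:38 dragonfly-1-4 systemd[1]: dragonfly.service: Failed with result 'oom-kill'.",
]

for ts, host, unit in find_oom_failures(SAMPLE):
    print(f"{ts} {host}: {unit} was OOM-killed")
```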

Expected behavior
Dragonfly shouldn't crash.

Environment (please complete the following information):

  • OS: Fedora 38
  • Kernel: 6.6.12
  • Containerized?: Fedora toolbox container
  • Dragonfly Version: 1.18.1
bogdanp05 added the bug label Jun 10, 2024
@kostasrim
Contributor

Hi @bogdanp05, thank you for reporting this.

@chakaz I see RSS mentioned here. Is this something we are already aware of, or shall we investigate?

@romange
Collaborator

romange commented Jun 12, 2024

We are taking care of this.
