[Question] reload config time #390

Open
wurmrobert opened this issue Dec 31, 2018 · 5 comments
Labels
performance Related to performance bugs, info, improvements

@wurmrobert

wurmrobert commented Dec 31, 2018

  • Version: 0.0.8
  • OS: docker with alpine latest

I am using your collector for about 550 devices.
Everything works well, but is it normal that the config reload is sometimes really slow? Most of the time it takes more than one minute. I run the collector in a Docker container. Restarting the container is much faster, and the gather process then also picks up the latest config. So are my device settings at fault, or is this a known issue?

Each Device has the following settings:

around 15 "indexed with direct tag" measurements
1 measurement group
no custom filters

timeout: 20
retries: 5
log level: info
snmpdebug: false
snmp version: 2c
disable bulk: false
max repetitions: 5
freq: 300
update flt freq: 60
concurrent gather: true
@jensenja
Contributor

jensenja commented Jan 2, 2019

I'm sure the devs will chime in, but I've also found that config reloads can hang/take some time when the config reload is issued during a polling cycle. When snmpcollector isn't actively polling anything, config reloads are quite snappy. I only came to this conclusion via trial and error when some devices I was testing snmpcollector with were timing out. The config reloads would just hang because snmpcollector seemed to be busy attempting to poll devices that were eventually just never going to respond. I'm also running snmpcollector in a Docker container. I have about 250 devices but I'm federating their polling with 8 different snmpcollector instances.

@sbengo
Collaborator

sbengo commented Jan 7, 2019

Hi @wurmrobert , @jensenja

As you have described, the reload config action waits until all devices finish polling so that it can safely load the runtime configuration on all devices. This means the maximum wait time should equal the polling time of the slowest device.

If a device has lots of measurements/metrics and tends to hang, you can play with the timeout and retries parameters to try to decrease the reload config time.
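
As a rough illustration of why timeout and retries matter here, the following sketch estimates how long a single unanswered SNMP request could block. It is not taken from snmpcollector: the function name is hypothetical, and it assumes that each attempt waits the full timeout and that retries counts the extra attempts after the first one.

```python
def worst_case_block_seconds(timeout_s: float, retries: int) -> float:
    """Upper bound on how long one unanswered SNMP request can block.

    Assumption (not confirmed by the thread): every attempt waits the
    full timeout, and `retries` counts attempts made after the first.
    """
    attempts = retries + 1
    return timeout_s * attempts


# Settings from the original report: timeout 20, retries 5.
print(worst_case_block_seconds(20, 5))
```

Under these assumptions, the settings from the original report (timeout 20, retries 5) allow one unresponsive request to block for up to 120 seconds, which would be consistent with reloads that take more than one minute.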

Thanks,
Regards!!

@wurmrobert
Author

Thanks for your explanations @jensenja, @sbengo.

What exactly is the "safe-load" functionality needed for? Is it required so that data does not get lost?
Are there any issues that speak against an "unsafe-reload", or would that even be possible?

@JuSacco

JuSacco commented Jan 20, 2021

Hi! @jensenja @sbengo @wurmrobert
I'm using snmpcollector to poll data from about 5k devices. The issue is that when I reload the config, I get holes in the metrics on my Grafana dashboards. I know that is because I have a lot of devices. I'm now playing with timeout and retries, but without encouraging results:

Host  Measurements  Timeout  Retries  Time
5027  5             5        2        0:05:54
5027  5             5        2        0:03:34
5027  5             5        2        0:03:51
5027  5             5        2        0:05:31
5027  5             5        2        0:06:09
5027  5             5        2        0:05:50
5027  5             3        1        0:07:19
5027  5             3        1        0:07:12
5027  5             3        1        0:06:00
5027  5             3        1        0:05:43
5027  5             3        1        0:07:29
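
To make the two settings in the table easier to compare, here is a small sketch that averages the times per (timeout, retries) pair. It assumes the Time column is the reload duration in h:mm:ss; the helper names are only illustrative.

```python
from statistics import mean

# Times copied from the table above, grouped by (timeout, retries).
reload_times = {
    (5, 2): ["0:05:54", "0:03:34", "0:03:51", "0:05:31", "0:06:09", "0:05:50"],
    (3, 1): ["0:07:19", "0:07:12", "0:06:00", "0:05:43", "0:07:29"],
}


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mean_seconds(times: list[str]) -> float:
    """Average a list of h:mm:ss strings, in seconds."""
    return mean(to_seconds(t) for t in times)


for (timeout, retries), times in reload_times.items():
    avg = mean_seconds(times)
    print(f"timeout={timeout} retries={retries}: mean {avg:.0f} s over {len(times)} reloads")
```

On these samples the mean is about 308 s for timeout 5 / retries 2 and about 405 s for timeout 3 / retries 1, so the lower values did not shorten the reload in this small set of runs.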

I second wurmrobert's question: what are the issues with doing an "unsafe-reload"?
Is it possible to stop gathering as close as possible to the moment the devices are ready to reload the config?

Thanks in advance!

@toni-moreno
Owner

Hello @JuSacco, @wurmrobert, @jensenja, as an easy workaround we recently added (in 8247ecb) a new API call, /api/rt/agent/shutdown (available after logging in), which ends the process immediately. If you are working with the snmpcollector Docker image, you can set the --restart=always option; snmpcollector will then quickly reload the config and resume the gathering process. Feel free to test it and give us feedback on this new option.
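
A minimal sketch of how a script might prepare a call to that endpoint is shown below. Only the path /api/rt/agent/shutdown comes from the comment above; the base URL, the HTTP method, the cookie-based session, and the function name are assumptions made for illustration and should be checked against the running instance.

```python
import urllib.request

SHUTDOWN_PATH = "/api/rt/agent/shutdown"  # path quoted in the comment above


def build_shutdown_request(
    base_url: str,
    session_cookie: str,
    method: str = "GET",  # assumption: the thread does not state the HTTP method
) -> urllib.request.Request:
    """Prepare (but do not send) a request to the shutdown endpoint.

    Assumption: the logged-in session is carried in a Cookie header.
    """
    url = base_url.rstrip("/") + SHUTDOWN_PATH
    request = urllib.request.Request(url, method=method)
    request.add_header("Cookie", session_cookie)
    return request


# Hypothetical base URL and placeholder cookie value.
req = build_shutdown_request("http://localhost:8090", "session=PLACEHOLDER")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

The request is only built here, not sent, so the sketch can be run safely. With the container started under --restart=always, sending it would end the process and let Docker start it again with the latest config.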

@toni-moreno toni-moreno added this to the 2.0 milestone Mar 7, 2021
@toni-moreno toni-moreno added the performance Related to performance bugs, info, improvements label Mar 7, 2021