Fixing the WORKER TIMEOUT error with Flask and gunicorn #174

Open
AlexiaChen opened this issue Apr 28, 2023 · 0 comments
Labels
Python (everything related to the Python language), Software Debugging (debugging techniques and reflections)

AlexiaChen commented Apr 28, 2023

The service architecture is a WSGI server started by gunicorn, with Nginx in front as a reverse proxy. In other words, the commonly described Nginx + gunicorn + Flask stack.

The error log:

[2023-04-28 01:58:09 +0000] [11] [CRITICAL] WORKER TIMEOUT (pid:15)
[2023-04-28 01:58:09,717] INFO in client: Got keepalive def03be1-9193-4219-be32-5c3caf806f6e in 10.36s
Exception ignored in: <function _ChannelCallState.__del__ at 0x7fd37fb905e0>
Traceback (most recent call last):
  File "/app/__pypackages__/3.8/lib/grpc/_channel.py", line 1247, in __del__
    self.channel.close(cygrpc.StatusCode.cancelled,
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 513, in grpc._cython.cygrpc.Channel.close
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 399, in grpc._cython.cygrpc._close
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 420, in grpc._cython.cygrpc._close
  File "/usr/local/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
  File "/app/__pypackages__/3.8/lib/gevent/thread.py", line 121, in acquire
    acquired = BoundedSemaphore.acquire(self, blocking, timeout)
  File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 259, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
        Hub: <Hub '' at 0x7fd38477b220 epoll default pending=0 ref=0 fileno=6 resolver=<gevent.resolver.thread.Resolver at 0x7fd383daf100 pool=<ThreadPool at 0x7fd380a6c740 tasks=0 size=0 maxsize=10 hub=<Hub at 0x7fd38477b220 thread_ident=0x7fd385f4b740>>> threadpool=<ThreadPool at 0x7fd380a6c740 tasks=0 size=0 maxsize=10 hub=<Hub at 0x7fd38477b220 thread_ident=0x7fd385f4b740>> thread_ident=0x7fd385f4b740>
        Handles:
[]

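In production I noticed that an HTTP request to a REST API service written in Python Flask stayed blocked for a long time. Raising gunicorn's timeout setting did not help. I tried the various suggestions in https://stackoverflow.com/questions/10855197/frequent-worker-timeout, including setting preload to True, and none of them worked either.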

The gunicorn configuration:

import multiprocessing

bind = "0.0.0.0:5000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gevent"
loglevel = "info"

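Thinking it over, I asked myself why none of the other HTTP REST API endpoints blocked and timed out like this. The difference is that this endpoint calls Stable Diffusion's gRPC API, not Stable Diffusion's REST API. On top of that, my gunicorn worker_class is set to gevent; with the default sync worker the problem does not occur. For my server-side workload, though, an async worker such as gevent is the recommended choice. So I searched Google for keywords like grpc, gevent, and gunicorn, and finally found the cause: gevent and grpc are simply not compatible with each other.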
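
One plausible reading of the WORKER TIMEOUT line is that the gevent worker could no longer check in with the gunicorn master while the gRPC call held its thread, since gunicorn kills workers that stay silent longer than the timeout. The following is only a toy model of that starvation effect using plain generators; it is not gunicorn or gevent code, and the names heartbeat, handler, and max_heartbeat_gap are hypothetical.

```python
import time


def heartbeat(beats):
    """Cooperative task: record a timestamp, then hand control back."""
    while True:
        beats.append(time.monotonic())
        yield


def handler(total_seconds, cooperative):
    """Simulated request that waits on a slow backend call."""
    if cooperative:
        # Waits in short slices and yields between them, the way a
        # gevent-aware wait returns control to the event loop.
        slices = 20
        for _ in range(slices):
            time.sleep(total_seconds / slices)
            yield
    else:
        # Holds the thread for the whole wait, like a blocking call
        # that the event loop cannot interrupt.
        time.sleep(total_seconds)
        yield


def max_heartbeat_gap(cooperative, total_seconds=0.2):
    """Run both tasks round-robin and return the longest heartbeat gap."""
    beats = []
    beat = heartbeat(beats)
    next(beat)  # first heartbeat before the request starts
    for _ in handler(total_seconds, cooperative):
        next(beat)  # the heartbeat only runs when the handler yields
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))
```

With the blocking handler, the longest gap is roughly the full wait; with the cooperative handler, it stays near the length of one slice. In the real service, the analogous gap would be measured against gunicorn's timeout.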

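In your Flask entry point, for example app.py, add the following compatibility patch at the very top of the file, before importing any standard-library modules: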

from gevent import monkey
monkey.patch_all()

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()

# import a bunch of standard packages

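I found this code rather ugly, and the related compatibility issues are fairly old. My gevent and grpc versions should not be outdated, so in principle the open-source community should have fixed this long ago; they also said it had been resolved and that this was only a temporary patch. To my surprise, the patch is still what fixed the problem. Since it is fixed, I did not look into exactly which version was supposed to have resolved it.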
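
Because the patch only works if it runs before everything else, it can help to confirm at startup that gevent's monkey patching is actually active in the process. The following is a minimal sketch, assuming gevent may or may not be installed; gevent_patch_status is a hypothetical helper name, and the only gevent call used is monkey.is_module_patched.

```python
def gevent_patch_status():
    """Report whether gevent's monkey patches are active in this process."""
    try:
        from gevent import monkey
    except ImportError:
        # gevent is not installed, so nothing can have been patched.
        return {"gevent_installed": False, "socket": False, "threading": False}
    return {
        "gevent_installed": True,
        # True once monkey.patch_all() has replaced the module.
        "socket": monkey.is_module_patched("socket"),
        "threading": monkey.is_module_patched("threading"),
    }


if __name__ == "__main__":
    print(gevent_patch_status())
```

Logging this dictionary once when the app starts makes it easy to spot a worker in which the patch was skipped or ran too late. It does not check the gRPC side; that still depends on init_gevent() having been called.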

After applying the patch, I recommend changing the gunicorn configuration to the following:

import multiprocessing

bind = "0.0.0.0:5000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gevent"
loglevel = "info"
timeout = 100
graceful_timeout = 100
keepalive = 256
# In a config file the setting is named preload_app (--preload is the CLI flag).
preload_app = True
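
If the timeouts need to differ between environments, the same settings can be read from environment variables with the values above as defaults. This is a sketch only: int_from_env and the APP_GUNICORN_* variable names are hypothetical and not something gunicorn defines, and only the timeout-related settings are shown.

```python
import multiprocessing
import os


def int_from_env(name, default):
    """Read an integer setting from the environment, else use the default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


bind = "0.0.0.0:5000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gevent"
loglevel = "info"

# Seconds a worker may stay silent before the master kills and restarts it.
timeout = int_from_env("APP_GUNICORN_TIMEOUT", 100)
# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = int_from_env("APP_GUNICORN_GRACEFUL_TIMEOUT", 100)
# Seconds to wait for the next request on a keep-alive connection.
keepalive = int_from_env("APP_GUNICORN_KEEPALIVE", 256)
```

Keeping the defaults in the file means the service behaves as described above when no variables are set.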
