
fix(integrations): KeyError('sentry-monitor-start-timestamp-s') #3278

Conversation

@Mohsen-Khodabakhshi (Contributor) commented Jul 10, 2024

I described my problem in issue #3277.

I preferred to use .get("sentry-monitor-start-timestamp-s") to solve the problem without raising an exception. I hope this is useful to some extent.

Fixes #3277
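The change can be sketched as swapping direct indexing for dict.get, which returns None (or a supplied default) instead of raising KeyError when the header is absent. The variable names below are illustrative, not the SDK's actual code:

```python
# A header dict that was cleared or never populated:
headers = {}

# Direct indexing would raise:
#   headers["sentry-monitor-start-timestamp-s"]  -> KeyError

# dict.get returns None instead of raising:
start_timestamp = headers.get("sentry-monitor-start-timestamp-s")
assert start_timestamp is None

# A default can also be supplied:
start_timestamp = headers.get("sentry-monitor-start-timestamp-s", "0")
assert start_timestamp == "0"
```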

@sl0thentr0py (Member) left a comment

@sl0thentr0py sl0thentr0py added the Trigger: tests using secrets PR code is safe; run CI label Jul 11, 2024
@szokeasaurusrex (Member)
@sl0thentr0py @Mohsen-Khodabakhshi, we should also handle the case where the other header values are missing. If we merge this PR as is, it's quite possible you could still get a KeyError for one of the other values, since the headers may be getting cleared somewhere.
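One way to extend the fix to the other header values is to read every monitor-related key defensively with a default. The helper and the key list below are illustrative, not the SDK's actual implementation:

```python
# Hypothetical defaults for the monitor-related headers; the key names
# here are examples, not necessarily the SDK's full set.
MONITOR_HEADER_DEFAULTS = {
    "sentry-monitor-start-timestamp-s": None,
    "sentry-monitor-slug": None,
    "sentry-monitor-config": None,
}

def read_monitor_headers(headers):
    # dict.get with a default never raises KeyError, even if the
    # header dict was cleared elsewhere.
    return {key: headers.get(key, default)
            for key, default in MONITOR_HEADER_DEFAULTS.items()}

# An empty header dict yields defaults instead of an exception:
print(read_monitor_headers({}))
```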

@szokeasaurusrex (Member)

I can take over the PR and implement these changes if you would like.

@sl0thentr0py (Member)

@szokeasaurusrex ideally this shouldn't happen at all, so I think something else might be going wrong in this function and getting swallowed silently:

def _update_celery_task_headers(original_headers, span, monitor_beat_tasks):

@szokeasaurusrex (Member)

@sl0thentr0py interesting. Could it be that something else is just overwriting these headers, or should our SDK really be the only thing accessing them?

@sl0thentr0py (Member)

No, I don't think something is overwriting the headers. I think we don't even add our headers, because that function is a mess and this with block just swallows some other exception:

with capture_internal_exceptions():
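The concern can be illustrated with a minimal stand-in for capture_internal_exceptions: if anything inside the with block raises, the rest of the block is skipped and no error surfaces, so the monitor headers are never added. This is a sketch, not the SDK's actual code:

```python
import contextlib

@contextlib.contextmanager
def swallow_exceptions():
    # Minimal stand-in for sentry_sdk's capture_internal_exceptions:
    # any exception raised inside the block is discarded.
    try:
        yield
    except Exception:
        pass

def broken_propagation_headers():
    # Stands in for an earlier call in the block that fails.
    raise RuntimeError("trace propagation failed")

headers = {}
with swallow_exceptions():
    headers.update(broken_propagation_headers())           # raises here...
    headers["sentry-monitor-start-timestamp-s"] = "123.0"  # ...so this never runs

# The block exits without any visible error, but the header is missing:
assert "sentry-monitor-start-timestamp-s" not in headers
```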

codecov bot commented Jul 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.36%. Comparing base (1c86489) to head (26d7406).
Report is 24 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3278      +/-   ##
==========================================
- Coverage   79.37%   79.36%   -0.01%     
==========================================
  Files         132      132              
  Lines       14271    14271              
  Branches     2997     2997              
==========================================
- Hits        11327    11326       -1     
  Misses       2099     2099              
- Partials      845      846       +1     
Files Coverage Δ
sentry_sdk/integrations/celery/beat.py 86.06% <100.00%> (ø)

... and 1 file with indirect coverage changes

@Mohsen-Khodabakhshi (Contributor, Author)

The question is why this condition is fine in this function (in beat.py):

def _setup_celery_beat_signals(monitor_beat_tasks):
    ...
    if monitor_beat_tasks: # I mean here
        task_success.connect(crons_task_success)
        task_failure.connect(crons_task_failure)
        task_retry.connect(crons_task_retry)

but the same condition is not fine in the _update_celery_task_headers function (in __init__.py):

def _update_celery_task_headers(original_headers, span, monitor_beat_tasks):
    ...
    with capture_internal_exceptions():
        headers = dict(
            Scope.get_isolation_scope().iter_trace_propagation_headers(span=span)
        )

        if monitor_beat_tasks: # I mean here
            headers.update(
                {
                    "sentry-monitor-start-timestamp-s": "%.9f"
                    % _now_seconds_since_epoch(),
                }
            )
        ...

Interestingly, on my OS I don't have any problem :)

@sl0thentr0py (Member)

@Mohsen-Khodabakhshi just to rule it out: is your Sentry SDK config somehow different in your worker process and the process that does the beat scheduling?

@Mohsen-Khodabakhshi (Contributor, Author) commented Jul 13, 2024

@sl0thentr0py

from celery import Celery
from celery.schedules import crontab
from sentry_sdk.scrubber import EventScrubber
from sentry_sdk.integrations.celery import CeleryIntegration

from core.settings import settings
import sentry_sdk

from domains.common.services.sentry import SENTRY_DENYLIST

sentry_sdk.init(
    dsn=settings.SENTRY_URL,
    enable_tracing=True,
    traces_sample_rate=1.0,
    event_scrubber=EventScrubber(denylist=SENTRY_DENYLIST),
    integrations=[CeleryIntegration(
        monitor_beat_tasks=True,
    )]
)

celery = Celery(
    __name__,
    include=["domains.concierge_sale.tasks"],
)
celery.conf.result_backend = settings.TRADE_CELERY_RESULT_BACKEND_URL
celery.conf.broker_url = settings.TRADE_CELERY_BROKER_URL
celery.conf.beat_schedule = {
    "shop-analytics-data-every-day": {
        "task": "send_concierge_sale_to_assistant_offer_task",
        "schedule": crontab(minute="*/1"),
    },
}
celery.conf.imports = ["domains.concierge_sale.tasks"]
celery.conf.result_backend_transport_options = {
    "global_keyprefix": settings.TRADE_CELERY_TASK_DEFAULT_QUEUE,
    "visibility_timeout": 172800,
}
celery.conf.celery_task_default_queue = settings.TRADE_CELERY_TASK_DEFAULT_QUEUE

This is my celery_.py file 👆
and I run the Celery worker and beat using:

celery -A core.celery_.celery worker --loglevel=info
celery -A core.celery_.celery beat --loglevel=info

In production, the Celery beat and the Celery worker each run in a separate Kubernetes pod.

@szokeasaurusrex (Member)

Hi @Mohsen-Khodabakhshi, it looks like you are initializing the SDK incorrectly. When using the SDK with Celery Beat, the SDK needs to be initialized within a function decorated with @signals.beat_init.connect and @signals.celeryd_init.connect, like so:

from celery import signals
import sentry_sdk

@signals.beat_init.connect
@signals.celeryd_init.connect
def init_sentry(**kwargs):
    sentry_sdk.init(...)
@github-actions github-actions bot removed the Trigger: tests using secrets PR code is safe; run CI label Jul 18, 2024
@antonpirker antonpirker enabled auto-merge (squash) July 18, 2024 11:33
@antonpirker antonpirker added the Trigger: tests using secrets PR code is safe; run CI label Jul 18, 2024
@antonpirker antonpirker merged commit 531f8f7 into getsentry:master Jul 18, 2024
124 of 126 checks passed
Labels: Trigger: tests using secrets PR code is safe; run CI
4 participants