
feat(sdk-trace-base): improve log messages when dropping span events #4223

Merged

legendecas merged 3 commits into main on Dec 15, 2023

Conversation

mkubliniak
Contributor

Which problem is this PR solving?

After the number of events added to a span exceeds the span limits, a warning log message "Dropping extra events." is emitted for each dropped event on the span.

When the number of span events is much higher than the limit, the log is flooded with "Dropping extra events." messages, which IMO are not necessary.

The specification says:

There SHOULD be a log emitted to indicate to the user that an attribute, event, or link was discarded due to such a limit. To prevent excessive logging, the log should not be emitted once per span, or per discarded attribute, event, or links.

which I'm a bit confused about.

IMO a good approach is similar to what BatchSpanProcessor does with maxQueueSize: log a debug message for the first dropped event, then log a warning message with the number of dropped events when the span ends.

Short description of the changes

When dropping an event, we first check whether it is the first dropped event before logging a message.
When the span ends and some events were dropped, we emit a warning log with the number of dropped events (see the sketch below).

This change could be breaking for users who rely on the exact wording of the log message, or who rely on the warning being emitted when the first event is dropped.
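A rough TypeScript sketch of the two changes described above. The class and field names (`SpanSketch`, `_droppedEventsCount`, `_eventCountLimit`) are illustrative placeholders, not the exact code merged into Span.ts; only the `diag` logger is the real `@opentelemetry/api` API:

```typescript
import { diag } from '@opentelemetry/api';

// Illustrative sketch of the dropped-event bookkeeping; not the exact
// implementation merged in this PR.
class SpanSketch {
  private _droppedEventsCount = 0;
  events: { name: string; time: number }[] = [];

  constructor(private readonly _eventCountLimit: number) {}

  addEvent(name: string): void {
    if (this.events.length >= this._eventCountLimit) {
      // Log only for the first dropped event to avoid flooding the log.
      if (this._droppedEventsCount === 0) {
        diag.debug('Dropping extra events.');
      }
      this._droppedEventsCount++;
      return;
    }
    this.events.push({ name, time: Date.now() });
  }

  end(): void {
    // Summarize all drops once, when the span ends.
    if (this._droppedEventsCount > 0) {
      diag.warn(
        `Dropped ${this._droppedEventsCount} events because eventCountLimit reached`
      );
    }
  }
}
```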

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

I added a check for log messages to 'should drop extra events'.

  • Run Span.test.ts

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@mkubliniak mkubliniak requested a review from a team as a code owner October 25, 2023 07:48
@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: mkubliniak / name: Martin Kubliniak (09a3ef8, 555d212)
  • ✅ login: legendecas / name: Chengzhong Wu (b7ef274)

@codecov

codecov bot commented Oct 27, 2023

Codecov Report

Merging #4223 (b7ef274) into main (d828041) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4223      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +0.01%     
==========================================
  Files         332      332              
  Lines        9464     9467       +3     
  Branches     2009     2011       +2     
==========================================
+ Hits         8728     8732       +4     
+ Misses        736      735       -1     
| Files | Coverage Δ |
| --- | --- |
| packages/opentelemetry-sdk-trace-base/src/Span.ts | 97.50% <100.00%> (+0.04%) ⬆️ |

... and 1 file with indirect coverage changes

@legendecas
Member

I believe the spec statement assumes that exceeding span limits, such as attribute or event counts, is likely a bug in the libraries or the end application, and can hardly be alleviated at runtime. Excessive logging can make the situation worse at runtime in such cases.

IMO this is different from maxQueueSize, as the queue gets drained and that condition can resolve automatically.

@mkubliniak
Contributor Author

I agree that exceeding event counts can be a bug, and that the mechanism of exceeding the limit is different from maxQueueSize. But the way it is logged could be similar.

IMO emitting the same warning log for each dropped event does not provide additional insight into what's going on. It's hard to tell whether all such logs came from a few spans that exceeded the limit by a lot, or whether the number of dropped events is typically low and increasing the limit would capture them all.

@legendecas
Member

Thank you for the clarification. The change LGTM.

@legendecas legendecas added the sdk:traces Issues and PRs related to the Traces SDK label Dec 15, 2023
@legendecas legendecas merged commit 3e59291 into open-telemetry:main Dec 15, 2023
21 checks passed
rdeavila94 pushed a commit to rdeavila94/opentelemetry-js that referenced this pull request Jan 3, 2024