-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add public logs for Broadcaster instrospection #2815
Conversation
server/broadcast.go
Outdated
@@ -56,6 +56,8 @@ var submitMultiSession = func(ctx context.Context, sess *BroadcastSession, seg * | |||
go submitSegment(ctx, sess, seg, segPar, nonce, calcPerceptualHash, resc) | |||
} | |||
|
|||
var PublicLogging bool = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this now that we're going down the promtail route
1833c29
to
0a5ee68
Compare
The O community has shown interest in B introspection where some logging can be published to a public facing Loki instance that can be used to debug the selection algorithm. To that end, this change adds a few methods to the clog package to publish logs with a specific tag ([PublicLogs]) and removes any contextual info that should not be published. To log specific lines to the public facing Loki instance, one can do the following as an example: - clog.PublicInfof(ctx, "Updating O ID=%v, reason=%v", "fooID", "bar") The resulting log will have specific contextual info like manifestID, orchSessionID, etc pre-pended along with a [PublicLogs] tag.
5b7a2f3
to
162cf6d
Compare
c537a3f
to
3649d01
Compare
server/broadcast.go
Outdated
@@ -216,6 +216,16 @@ func selectSession(sessions []*BroadcastSession, exclude []*BroadcastSession, du | |||
// threshold is selectable | |||
if len(session.SegsInFlight) == 0 { | |||
if session.LatencyScore > 0 && session.LatencyScore <= SELECTOR_LATENCY_SCORE_THRESHOLD { | |||
clog.PublicInfof(ctx, | |||
"Selected orchestrator reason=%v, eth-address=0x%v, ip-address=%v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to put reason=
at the end of log as it contains commas and spaces. It might be easier for reading and parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, can we have these as key/value fields rather than baked into the log line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomshutt you mean to extract eth-address
and ip-address
as promtail/loki labels? We could log them as string in broadcaster and then add regex logic in promtail
to extract eth-address
and ip-address
as labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomshutt: yes, the plan is to include some of these fields as key/value fields in clog but we should do that only for ones that are repeatedly logged. I'm planning to use a follow-up PR to add more logs based on this [1] and we can determine which fields should be part of clog's key/value pairs. The other goal is to get this merged so that we can verify the full pipeline from logs to loki and then follow on with optimizations.
@pwilczynskiclearcode: yep, will move reason to end. Let's hold off on labeling anything in promtail/loki until we have the initial pipeline working and verified. We can optimize based on feedback from the O community.
lmk if this sounds ok to you guys.
[1] ad-astra-video@ac3be68#diff-6a8752e5207bc4096237c9a19e0520c8d238fab098e942920f95d4045f886f09
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what its worth, I updated to same approach. I did some conformity updates in commits after this one that better followed the clog presentation. Also added some updates to return information to level where clog context was active (e.g. selection) so could have better conformity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ad-astra-video: for this initial PR, I opted for simpler logs. Several unit tests were failing due to code races when attempting to log certain values (e.g. info related to payment tickets) where it was causing certain http responses to time out.
Also unit tests caught a few panics which were getting triggered when trying to log values like ethcommon.Bytes2Hex(session.OrchestratorInfo.TicketParams.Recipient)
when no segments were in-flight -- this was being caused by a null session.OrchestratorInfo
object under certain conditions. I'm not fully ramped up on this code base yet so didn't quite understand all the conditions that can lead to this. I'll be spending some more time on this to get a better understanding.
Once we verify logging to the public loki instance is working, we can follow up with further changes that will log all the values that will be most beneficial.
3649d01
to
0038aa6
Compare
Updates:
|
6acf64a
to
c46ec62
Compare
c46ec62
to
84554cc
Compare
Codecov Report
@@ Coverage Diff @@
## master #2815 +/- ##
===================================================
+ Coverage 56.62789% 56.74404% +0.11615%
===================================================
Files 88 88
Lines 19086 19128 +42
===================================================
+ Hits 10808 10854 +46
+ Misses 7684 7679 -5
- Partials 594 595 +1
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
What does this pull request do? Explain your changes. (required)
clog: add new logging methods to publish a set of public logs
The O community has shown interest in B introspection where some logging
can be published to a public facing Loki instance that can be used to
debug the selection algorithm. To that end, this change adds a few
methods to the clog package to publish logs with a specific tag
([PublicLogs]) and removes any contextual info that should not be
published.
To log specific lines to the public facing Loki instance, one can do the
following as an example:
The resulting log will have specific contextual info like manifestID,
orchSessionID, etc pre-pended along with a [PublicLogs] tag.
Specific updates (required)
PublicInfof
,PublicCloneCtx
infof
,formatMessage
methods to accept a boolean that indicates if the log is supposed to be public or internal facing.How did you test each of these updates (required)
Does this pull request close any open issues?
Fixes VID-248
Checklist:
make
runs successfully./test.sh
pass --> (NOTE: this never passed locally on my machine but the CI tests pass)