Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add public logs for Broadcaster instrospection #2815

Merged
merged 3 commits into from
Jun 29, 2023
Merged

Conversation

emranemran
Copy link
Contributor

@emranemran emranemran commented Jun 22, 2023

What does this pull request do? Explain your changes. (required)

clog: add new logging methods to publish a set of public logs
The O community has shown interest in B introspection where some logging
can be published to a public facing Loki instance that can be used to
debug the selection algorithm. To that end, this change adds a few
methods to the clog package to publish logs with a specific tag
([PublicLogs]) and removes any contextual info that should not be
published.

To log specific lines to the public facing Loki instance, one can do the
following as an example:

  • clog.PublicInfof(ctx, "Updating O ID=%v, reason=%v", "fooID", "bar")

The resulting log will have specific contextual info like manifestID,
orchSessionID, etc pre-pended along with a [PublicLogs] tag.

Specific updates (required)

  • Update existing clog package to add new methods: PublicInfof, PublicCloneCtx
  • Update definition of existing infof, formatMessage methods to accept a boolean that indicates if the log is supposed to be public or internal facing.
  • Updated/added tests

How did you test each of these updates (required)

  • Unit tests added and all tests pass. Specific tests were added to ensure only non-approved contextual info is not being leaked.

Does this pull request close any open issues?
Fixes VID-248

Checklist:

  • Read the contribution guide
  • make runs successfully
  • All tests in ./test.sh pass --> (NOTE: this never passed locally on my machine but the CI tests pass)
  • README and other documentation updated
  • Pending changelog updated

@@ -56,6 +56,8 @@ var submitMultiSession = func(ctx context.Context, sess *BroadcastSession, seg *
go submitSegment(ctx, sess, seg, segPar, nonce, calcPerceptualHash, resc)
}

var PublicLogging bool = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this now that we're going down the promtail route

@emranemran emranemran force-pushed the emran/public-logs branch 3 times, most recently from 1833c29 to 0a5ee68 Compare June 26, 2023 22:30
The O community has shown interest in B introspection where some logging
can be published to a public facing Loki instance that can be used to
debug the selection algorithm. To that end, this change adds a few
methods to the clog package to publish logs with a specific tag
([PublicLogs]) and removes any contextual info that should not be
published.

To log specific lines to the public facing Loki instance, one can do the
following as an example:
 - clog.PublicInfof(ctx, "Updating O ID=%v, reason=%v", "fooID", "bar")

The resulting log will have specific contextual info like manifestID,
orchSessionID, etc pre-pended along with a [PublicLogs] tag.
@emranemran emranemran force-pushed the emran/public-logs branch 2 times, most recently from 5b7a2f3 to 162cf6d Compare June 26, 2023 22:51
@emranemran emranemran marked this pull request as ready for review June 26, 2023 22:53
@emranemran emranemran changed the title Emran/public logs Add public logs for Broadcaster instrospection Jun 26, 2023
@linear
Copy link

linear bot commented Jun 27, 2023

@@ -216,6 +216,16 @@ func selectSession(sessions []*BroadcastSession, exclude []*BroadcastSession, du
// threshold is selectable
if len(session.SegsInFlight) == 0 {
if session.LatencyScore > 0 && session.LatencyScore <= SELECTOR_LATENCY_SCORE_THRESHOLD {
clog.PublicInfof(ctx,
"Selected orchestrator reason=%v, eth-address=0x%v, ip-address=%v",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to put reason= at the end of log as it contains commas and spaces. It might be easier for reading and parsing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, can we have these as key/value fields rather than baked into the log line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomshutt you mean to extract eth-address and ip-address as promtail/loki labels? We could log them as string in broadcaster and then add regex logic in promtail to extract eth-address and ip-address as labels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomshutt: yes, the plan is to include some of these fields as key/value fields in clog but we should do that only for ones that are repeatedly logged. I'm planning to use a follow-up PR to add more logs based on this [1] and we can determine which fields should be part of clog's key/value pairs. The other goal is to get this merged so that we can verify the full pipeline from logs to loki and then follow on with optimizations.

@pwilczynskiclearcode: yep, will move reason to end. Let's hold off on labeling anything in promtail/loki until we have the initial pipeline working and verified. We can optimize based on feedback from the O community.

lmk if this sounds ok to you guys.

[1] ad-astra-video@ac3be68#diff-6a8752e5207bc4096237c9a19e0520c8d238fab098e942920f95d4045f886f09

Copy link
Contributor

@ad-astra-video ad-astra-video Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what its worth, I updated to same approach. I did some conformity updates in commits after this one that better followed the clog presentation. Also added some updates to return information to level where clog context was active (e.g. selection) so could have better conformity.

ad-astra-video@af65b34

ad-astra-video@9d8d89b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ad-astra-video: for this initial PR, I opted for simpler logs. Several unit tests were failing due to code races when attempting to log certain values (e.g. info related to payment tickets) where it was causing certain http responses to time out.

Also unit tests caught a few panics which were getting triggered when trying to log values like ethcommon.Bytes2Hex(session.OrchestratorInfo.TicketParams.Recipient) when no segments were in-flight -- this was being caused by a null session.OrchestratorInfo object under certain conditions. I'm not fully ramped up on this code base yet so didn't quite understand all the conditions that can lead to this. I'll be spending some more time on this to get a better understanding.

Once we verify logging to the public loki instance is working, we can follow up with further changes that will log all the values that will be most beneficial.

@emranemran
Copy link
Contributor Author

Updates:

  • Addressed comments
  • Planning to merge tomorrow. Any further changes can be addressed in a follow-up PR once we verify loki logging is working as expected.

@emranemran emranemran force-pushed the emran/public-logs branch 9 times, most recently from 6acf64a to c46ec62 Compare June 29, 2023 04:28
@codecov
Copy link

codecov bot commented Jun 29, 2023

Codecov Report

Merging #2815 (84554cc) into master (1b9da7f) will increase coverage by 0.11615%.
The diff coverage is 72.72727%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #2815         +/-   ##
===================================================
+ Coverage   56.62789%   56.74404%   +0.11615%     
===================================================
  Files             88          88                 
  Lines          19086       19128         +42     
===================================================
+ Hits           10808       10854         +46     
+ Misses          7684        7679          -5     
- Partials         594         595          +1     
Impacted Files Coverage Δ
clog/clog.go 63.01370% <51.61290%> (+7.01370%) ⬆️
server/broadcast.go 77.87904% <100.00000%> (+0.39169%) ⬆️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b9da7f...84554cc. Read the comment docs.

Impacted Files Coverage Δ
clog/clog.go 63.01370% <51.61290%> (+7.01370%) ⬆️
server/broadcast.go 77.87904% <100.00000%> (+0.39169%) ⬆️

... and 1 file with indirect coverage changes

@emranemran emranemran merged commit 8bd65fb into master Jun 29, 2023
16 checks passed
@emranemran emranemran deleted the emran/public-logs branch June 29, 2023 05:12
This was referenced Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants