Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement OpenTelemetry to emit rich client-level telemetry on queries (and maybe ingest) #232

Open
andrejpk opened this issue Mar 22, 2022 · 3 comments
Assignees

Comments

@andrejpk
Copy link

andrejpk commented Mar 22, 2022

Is your feature request related to a problem? Please describe.
As a developer or operator of a system using OpenTelemetry behind a REST API, the calls through the Kusto Query client library show up in App Insights (and other monitoring/telemetry tools) as pain HTTP dependency calls. Challenges:

  • Hard to troubleshoot slow queries without seeing commands and seeing the query stats
  • No easy way to correlate client-side logs in App Insights with service diagnostics logs in Azure Monitor

These can be implemented by each client, but many modern

Describe the solution you'd like

This work could be done inside of the Kusto client library or possibly as an add-on package in there are sufficient hooks to wire into it from the outside. (If not and an external instrumentation library is preferred, it may be necessary to add some hooks into a PR into this library).

  1. Create a Span for query execution calls
    Enhance the Kusto Query (and possibly ingest) client using OpenTelemetry, creating a Span around the call into .execute method. On that Span add attributes like:

Query:

  • query statement (opt-in due to possible data leakage into logs)

Response:

  • Number of rows returned in main result set (possibly other result metadata)
  • Returned query stats JSON

The Span would have the client perspective on query duration and wrap the downstream HTTP request which would still land in OpenTelemetry through instrumentation of underlying HTTP client libraries.

  1. Pass the OpenTelemetry trace context into the call to ADX

This would allow joining the client Span data (in App Insights, the dependency table) to ADX's Azure Monitor server-side query logs. This makes it easy to trace a client-side performance issue to the server-side perspective.

Do this by capturing the OpenTelemetry spanContext.traceId into the clientRequestProperties.clientRequestId property in .execute. (Use this as the default unless the client passes a value in for this into .execute

I haven't worked enough with the Ingest side of this library to have an opinion on how OpenTelemetry could add value there. I suggest starting with the query side for this first cut.

Describe alternatives you've considered
Currently wrapping the KustClient to add this functionality. It has been very useful in the short time we have had it.

Additional context
Prior art:

I may have a hackathon team interested in working on this and contributing it here... TBD.

@yogilad
Copy link
Contributor

yogilad commented Apr 7, 2022

Hi @andrejpk

We have a plan to introduce client-side logs to all SDKs. though we do not plan to take a dependency on OpenTelemetry.
Generally, Query requests contain ClientRequestId which allows for to correlate actions with the Kusto backend (though I can't attest as to how it works in App Insights or Azure Monitor.

@AsafMah , FYI.

@andrejpk
Copy link
Author

andrejpk commented Apr 14, 2022

With just out-of-the-box OpenTelemetry instrumentation on a typical Java environment, the call to ADX just just an opaque REST request:
image

It's kind of a dead-end TBH.

But with SQL Server, CosmosDb and others, I automatically get custom Trace data with Spans that give me useful information on the query and ADX's work on that operation so I don't have to copy/paste or join between systems to learn the basics.

On a recent project, my team added custom OpenTelemetry hooks at the application level in our Kusto Client query wrapper library, adding the query command, RequestProperties, QueryCompetionProperties and the returned rowcount in a new ADX Query Trace Span that appears in the App Insights distributed trace tree. With this in place, a system operator can easily troubleshoot an issue entirely in App Insights vs having to search log files or copy/paste/join correlation IDs between different logs. (In other work with CosmosDb and SQL Server, this wasn't necessary -- it was just something we had to enable or configure.)

An instrumentation would define some sensible defaults on what to capture since some of this context is verbose, but OOTB some reasonable defaults would make this more operator and developer friendly to use.

This does seem like a common issue across any version of the library.. .NET, Java, Node/JS all have OpenTelemetry SDKs to enable this.

Are there hooks where an OpenTelemetry add-on package/library could be created? Many libraries do it this way. If the hooks are there, an add-on would be registered on top of the library.

@yogilad
Copy link
Contributor

yogilad commented May 14, 2023

@enmoed , you can close this issue when you're done with OTel integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants