Xray Sampling does not seem to be taking effect. #163

Open
jamesoneill opened this issue Jan 14, 2022 · 16 comments

@jamesoneill

We are using X-Ray in ECS Fargate. We currently see a lot of health check traces being sampled, so we would like to define sampling rules.

I have defined some rules and believe the permissions are working correctly, but I do not see any trend data, even for the default sampling rule.

I am using the latest X-Ray daemon from the repo, and sampling with the .NET SDK.

Any help you can offer would be much appreciated.

@lupengamzn lupengamzn self-assigned this Jan 18, 2022
@lupengamzn

Hi @jamesoneill,

Are you trying to sample regular traces and filter out the health check ones? If so, you can follow the instructions here to customize the sampling rules. Note that only the matching rule with the highest priority will be picked up and applied by your services.
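
For readers following the priority remark above: the X-Ray documentation describes rules as being evaluated in ascending order of their priority number, so a rule with priority 1 is checked before one with priority 100, and the built-in default rule is checked last. A minimal sketch of that ordering is below. The `pickRule` helper, the rule names, and the simplified path comparison (exact string or `*` only) are illustrative assumptions, not the service's implementation.

```javascript
// Simplified model: rules are tried in ascending order of their priority number,
// and the first rule whose criteria match the request decides sampling.
// Only the URL path is compared here, as an exact string or the catch-all '*'.
function pickRule(rules, request) {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  return ordered.find(
    (rule) => rule.urlPath === '*' || rule.urlPath === request.urlPath
  );
}

// Hypothetical rule set: a health check rule plus a catch-all default.
const rules = [
  { name: 'Default', priority: 10000, urlPath: '*', fixedRate: 0.05 },
  { name: 'health-check', priority: 1, urlPath: '/ping', fixedRate: 0 },
];

const healthRule = pickRule(rules, { urlPath: '/ping' });
const otherRule = pickRule(rules, { urlPath: '/orders' });
console.log(healthRule.name, otherRule.name); // prints "health-check Default"
```

In this model a priority of 1 is already "higher" than the default rule, because the lower number is evaluated first.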

@jamesoneill
Author

Yes, my aim is to filter out requests coming from our load balancer to the services behind it. They make the generated graph completely unreadable and interfere with useful sampling.

Currently I have this configuration with a priority of 1. I tried to set it higher than the default priority, but the UI does not allow this.

Reservoir size: 0 requests per second
Fixed rate: 0 percent

Matching criteria:

Service name: *
Service type: *
HTTP method: *
URL path: */ping
Resource ARN: *
Host: *

Am I creating this rule correctly, assuming the endpoint I want to filter out is example.com/ping?

Kind regards,

Jamie.
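
To sanity-check a URL path pattern such as `*/ping`, it can help to test it offline. The sketch below assumes the wildcard semantics described in the X-Ray documentation (`*` matches any run of characters, including none, and `?` matches exactly one character). The helper names are hypothetical, and case handling and the exact string an SDK passes in as the URL path are not modeled.

```javascript
// Convert a wildcard pattern into an anchored regular expression.
// '*' matches zero or more characters, '?' matches exactly one character.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters other than the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${source}$`);
}

function wildcardMatch(pattern, text) {
  return wildcardToRegExp(pattern).test(text);
}

console.log(wildcardMatch('*/ping', '/ping')); // true
console.log(wildcardMatch('*/ping', '/api/conversation/health-check/ping')); // true
console.log(wildcardMatch('*/ping', '/ping/')); // false: trailing slash is not covered
console.log(wildcardMatch('/healthz', '/healthz')); // true
```

Under these assumptions the pattern itself matches `/ping`, so a rule that still records traces is more likely affected by which rule set is in use than by the pattern.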

@lupengamzn

Hi @jamesoneill ,

I believe if you want to filter out the unwanted traces, you may have to specify the ones you want to trace in the URL path or Host section. It's a little unintuitive but seems like it's the only option at this moment.

@trobert2

I've been having a very similar issue while deploying ECS services (Flask and Express) along with the X-Ray daemon (which I think should have no impact here, since it is not the inflection point).
Basically, we use /healthz as the health check for our ALBs.

Configuring the service like so:

AWSXRay.middleware.setSamplingRules({
  version: 2,
  rules: [
    {
      description: 'Health',
      http_method: '*',
      host: '*',
      url_path: '/healthz',
      fixed_target: 0,
      rate: 0.0
    }
  ],
  default: { fixed_target: 1, rate: 0.1 }
})
app.use(AWSXRay.express.openSegment('my-api'))

does not exclude the traces. They still appear in X-Ray.
As I understand it, given that host is * and url_path is exactly the path used by the health check, this rule should work.

Is there anything else missing here? I keep re-reading the docs and can't figure out where the mistake is.

@willarmiros
Contributor

@trobert2 this is more of a question for the Node.js SDK repo, but basically you are configuring local sampling rules by calling the setSamplingRules function. By default, the SDKs use centralized sampling, where rules are defined in the X-Ray console. This is the recommended approach, as it avoids code changes.

However, if you'd like to stick with local sampling rules, you must explicitly disable centralized sampling by calling:

AWSXRay.middleware.disableCentralizedSampling()

docs for reference.
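
Putting the two calls from this thread together, a minimal sketch might look like the following. The `configureLocalSampling` helper and the `fakeSdk` stand-in are hypothetical and exist only so the snippet runs without the package installed; in an application you would pass `require('aws-xray-sdk')` instead.

```javascript
// Hypothetical helper: the SDK object is passed in as a parameter.
function configureLocalSampling(AWSXRay, rules) {
  // Opt out of centralized sampling so the local rules are the ones consulted.
  AWSXRay.middleware.disableCentralizedSampling();
  AWSXRay.middleware.setSamplingRules(rules);
}

// Local rules as posted earlier in this thread.
const localRules = {
  version: 2,
  rules: [
    {
      description: 'Health',
      http_method: '*',
      host: '*',
      url_path: '/healthz',
      fixed_target: 0,
      rate: 0.0,
    },
  ],
  default: { fixed_target: 1, rate: 0.1 },
};

// Stand-in for the SDK that records which functions were called (demo only).
const calls = [];
const fakeSdk = {
  middleware: {
    disableCentralizedSampling: () => calls.push('disableCentralizedSampling'),
    setSamplingRules: (r) => calls.push(`setSamplingRules:${r.rules.length}`),
  },
};

configureLocalSampling(fakeSdk, localRules);
console.log(calls);
```

In an Express app, this configuration would presumably run before `app.use(AWSXRay.express.openSegment('my-api'))`.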

@trobert2

Hey @willarmiros,
Thanks for the answer. That makes sense.

I had this issue with two separate APIs written in two different languages, which is why I mentioned Flask alongside the Express example.
Reading your answer, I understand that it is mandatory to call disableCentralizedSampling in order for my rules to take effect. Is that correct?

I have read the document you are referencing and it says:

To use only local rules, call disableCentralizedSampling.

To me this sounds like both rule sets are used unless I call that function, after which only the local ones are used. As I have defined no central rules, I expected the rules defined using setSamplingRules to be used. Maybe I missed something or didn't pay enough attention the first time I read the doc, and I'm sorry if that is the case, but on a second read it is still unclear that my rules will be ignored unless I disable centralized sampling.

So if I understood your answer correctly, I agree it is not an aws-xray-daemon issue. I think it is a documentation and UX issue. Thanks for taking the time to answer! I think this will help others that stumble onto this thread.

@willarmiros
Contributor

willarmiros commented Apr 11, 2022

I understand that it is mandatory to call disableCentralizedSampling in order for my rules to take effect.

That is correct. Unless centralized sampling is disabled, a default centralized sampling rule is applied to all requests, even if you have not defined any centralized rules yourself.

@rishabh-shastri

rishabh-shastri commented May 20, 2023

I have a request: example.com/api/conversation/health-check/ping. I have created a rule in the AWS Console with these criteria:

ServiceName = *
ServiceType = *
Host = *
ResourceARN = *
HTTPMethod = *
URLPath = /api/conversation/health-check/ping

Yet traces are still being generated. Am I doing anything wrong?

@atshaw43
Contributor

Are Reservoir size and Fixed rate both set to 0?
Are you sure you are using centralized sampling rules? Which language are you using?
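
Both values matter because, as the X-Ray documentation describes it, the reservoir admits a fixed number of matching requests per second before the fixed rate is applied to the remainder. The following is a rough per-second model of that behaviour, not the SDK's or the service's actual implementation. `createSampler` and `countSampled` are hypothetical names, and the random source is injectable so results are reproducible.

```javascript
// Simplified model of one sampling rule: the first `reservoir` matching
// requests in each second are sampled, and the rest are sampled with
// probability `rate`.
function createSampler({ reservoir, rate }, random = Math.random) {
  let currentSecond = null;
  let usedThisSecond = 0;
  return function shouldSample(nowMs) {
    const second = Math.floor(nowMs / 1000);
    if (second !== currentSecond) {
      // A new second has started, so the reservoir is refilled.
      currentSecond = second;
      usedThisSecond = 0;
    }
    if (usedThisSecond < reservoir) {
      usedThisSecond += 1;
      return true;
    }
    return random() < rate;
  };
}

// Count sampled requests for 100 requests spread evenly over 10 seconds.
function countSampled(rule) {
  const shouldSample = createSampler(rule);
  let sampled = 0;
  for (let i = 0; i < 100; i += 1) {
    if (shouldSample(i * 100)) sampled += 1;
  }
  return sampled;
}

const blocked = countSampled({ reservoir: 0, rate: 0 });
const reservoirOnly = countSampled({ reservoir: 1, rate: 0 });
console.log(blocked, reservoirOnly); // prints "0 10"
```

In this model, a reservoir of 1 with a fixed rate of 0 still records one matching request every second, which is why a health check rule needs both values at 0.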

@puddlewitt

I am seeing similar behaviour. I have a block-all centralized rule which should be used by API Gateway to apply a sampling decision of do not sample.

Sampling Rule

"SamplingRule": {
    "RuleName": "block-all",
    "RuleARN": "ARN",
    "ResourceARN": "*",
    "Priority": 100,
    "FixedRate": 0.0,
    "ReservoirSize": 0,
    "ServiceName": "api.mydomain.com/",
    "ServiceType": "*",
    "Host": "*",
    "HTTPMethod": "*",
    "URLPath": "*",
    "Version": 1,
    "Attributes": {}
}

I would expect the above rule to result in a "do not sample" decision. However, I see API Gateway matching my block-all rule and still deciding to sample. The trend graph for the centralized rule shows an increase in traffic as the block-all rule is matched, yet the FixedRate and ReservoirSize of 0 are ignored.

I understand that rules don't always apply perfectly (in an eventually consistent system); however, this is reproducible and doesn't appear to be a centralized rule syncing issue.

@jj22ee
Contributor

jj22ee commented Aug 7, 2023

Hey @rishabh-shastri and @puddlewitt

Do you know how many traces are being generated per second, as in, how many requests/second are being made to APIGW?
If the number of requests per second is very low (about 1 request/s), the sampling rule expected to be applied to the APIGW may not be applied. However, at much higher request/s, the sampling rules will likely be applied much more accurately. If the request/s is very low, the number of traces shouldn't incur a significant X-Ray cost.

@puddlewitt

@jj22ee Apologies that it has taken me so long to get back to you. To replicate this, I have been running a load test at a constant 9 requests per second.

@jj22ee
Contributor

jj22ee commented Nov 13, 2023

The TPS is low enough that APIGW may not be applying the sampling rules 100% accurately. This is especially true when you want 0 requests sampled in APIGW. You can probably get close to 0 requests sampled, but unfortunately this may not eliminate all sampling. In this scenario, you may want to disable tracing for that particular APIGW stage as a workaround.

@puddlewitt

I ran another load test with increased traffic.

Over 5 minutes at 20 rps I saw 0 recorded traces, so it works as you suggest.

https://docs.aws.amazon.com/xray/latest/devguide/xray-console-sampling.html

X-Ray uses a best-effort approach in applying sampling rules, and in some cases the effective sampling rate may not exactly match the configured sampling rules. However, over time the number of requests sampled should be close to the configured percentage.

@epomatti

Since this thread is still open, I'll share my situation here, which is very similar.

I've created these sampling rules, but even with the limit set to 0 r/sec and a fixed rate of 0, it keeps recording my health checks.

image

Here I can see that health check traces are being captured. I wouldn't expect this to happen.

image

This is how I'm starting the app, which is running on App Runner:

ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar /opt/aws-opentelemetry-agent.jar
ENV JAVA_TOOL_OPTIONS=-javaagent:/opt/aws-opentelemetry-agent.jar
ENV OTEL_PROPAGATORS=xray
ENV OTEL_TRACES_SAMPLER=xray
ENV OTEL_METRICS_EXPORTER=none
ENV OTEL_SERVICE_NAME=MyApp

EXPOSE 8080
ENTRYPOINT ["java", "-jar","*.jar"]

No errors in the logs:

01-14-2024 08:53:09 AM Picked up JAVA_TOOL_OPTIONS: -javaagent:/opt/aws-opentelemetry-agent.jar
01-14-2024 08:53:09 AM OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
01-14-2024 08:53:10 AM [otel.javaagent 2024-01-14 11:53:10:920 +0000] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 1.32.0-aws

@jamesoneill
Author

I'm very surprised this has not been sorted out yet. The problem is clear: filtering out URLs is not working.
