UnmarshalError: failed decoding error message on PutTelemetryRecord API call #107

Open
srprash opened this issue Mar 15, 2021 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@srprash
Collaborator

srprash commented Mar 15, 2021

The AWS X-Ray daemon occasionally fails to send telemetry records to the service through the PutTelemetryRecord API. The debug logs show an error message like this:

2020-05-10T11:53:00Z [Info] Successfully sent batch of 1 segments (2.943 seconds)
2020-05-10T11:53:47Z [Debug] Send 1 telemetry record(s)
2020-05-10T11:54:49Z [Debug] Send 1 telemetry record(s)
2020-05-10T11:55:51Z [Debug] Failed to send telemetry 1 record(s). Re-queue records. SerializationError: failed to unmarshal response error
	status code: 400, request id: 
caused by: UnmarshalError: failed decoding error message
	00000000  3c 68 74 6d 6c 3e 0d 0a  3c 68 65 61 64 3e 3c 74  |<html>..<head><t|
00000010  69 74 6c 65 3e 34 30 30  20 42 61 64 20 52 65 71  |itle>400 Bad Req|
00000020  75 65 73 74 3c 2f 74 69  74 6c 65 3e 3c 2f 68 65  |uest</title></he|
00000030  61 64 3e 0d 0a 3c 62 6f  64 79 20 62 67 63 6f 6c  |ad>..<body bgcol|
00000040  6f 72 3d 22 77 68 69 74  65 22 3e 0d 0a 3c 63 65  |or="white">..<ce|
00000050  6e 74 65 72 3e 3c 68 31  3e 34 30 30 20 42 61 64  |nter><h1>400 Bad|
00000060  20 52 65 71 75 65 73 74  3c 2f 68 31 3e 3c 2f 63  | Request</h1></c|
00000070  65 6e 74 65 72 3e 0d 0a  3c 2f 62 6f 64 79 3e 0d  |enter>..</body>.|
00000080  0a 3c 2f 68 74 6d 6c 3e  0d 0a                    |.</html>..|

caused by: invalid character '<' looking for beginning of value

It appears that the daemon may be sending incorrect or missing parameters to the PutTelemetryRecord API, which produces the 400 error. The daemon then cannot process the response because the body is HTML rather than JSON.

This causes a lot of noise in the debug logs and needs to be handled by the daemon.
I did some digging and found that when a telemetry record fails to send, it is re-queued with the next batch, and sometimes that next batch is sent successfully. This is confusing, because the 400 status code suggests that the telemetry record itself was incorrect. I printed some of the telemetry records and they looked fine; the only required field for a TelemetryRecord is the timestamp.

I will need to talk to the service team to find the root cause of the issue.

@nicolascb

Any status?

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.

@stale stale bot added the stale label Jan 8, 2022
@NathanielRN NathanielRN added bug Something isn't working and removed stale labels Jan 12, 2022