Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Long-poll still causing timeouts #2520

Closed
remvst opened this issue Jan 19, 2021 · 11 comments
Closed

HTTP Long-poll still causing timeouts #2520

remvst opened this issue Jan 19, 2021 · 11 comments

Comments

@remvst
Copy link
Contributor

remvst commented Jan 19, 2021

I'm currently building a signaling server that acts as an intermediate between clients and Janus. The server has a connection-poll job queue that queries the session ID/plugin handle until it gets a 404. If it gets a 200, it runs the same job again. This part is working, I'm able to send and receive audio/video to/from Janus.

However, it seems like Janus incorrectly detects session timeouts. The session timeouts are set to the default 60 seconds, and my polling job is querying the endpoints much more often than that (at least once every 30 seconds).
After 60 seconds, even though I am still polling the connection, Janus closes the session automatically.

Reading the docs, I believe there should be no timeout:

for a more prolonged inactivity with respect to messaging, on plain HTTP the session is usually kept alive through the regular long poll requests, which act as activity as long as the session is concerned.

I'm using v0.10.9 and the video room plugin on an Ubuntu VM on gcloud, but have seen similar results within docker-compose on an older version.

This seems like a bug, but I'm wondering if I'm misreading the docs.

@atoppi
Copy link
Member

atoppi commented Jan 20, 2021

You have not misread the docs, everything should work as intended.
I'm unable to reproduce though.
Are you sure that connections are not being dropped due to having a high number of them ? (check this recent PR)
Does the same happen if you switch to another transport (like websocket) ?

@atoppi
Copy link
Member

atoppi commented Jan 20, 2021

Also are you able to replicate in a local setup ?
Do you have any logs to share ? Janus will log any inbound keepalive request.

@remvst
Copy link
Contributor Author

remvst commented Jan 20, 2021

@atoppi thank you for the quick reply. After looking at it with my coworker, it seems like it was an authentication issue on the GET requests (we were seeing some 403s on Janus logs). This line in the docs explains what I should have done:

The same applies for the long poll GET messages as well, which will need to contain the apisecret as a query string parameter.

I was originally passing the API secret like this (Node.js + axios):

axios.get(url, { apisecret: 'foo' })

This seems to fix it:

axios.get(url + '?apisecret=foo', data);

However, even though this fixed it, the fact that the requests were working before that (I was able to receive offers and answers from Janus and pass them to the clients) seems to indicate that there is a bug within Janus itself. Maybe it read the API secret and responded but didn't update the last_activity field?

@lminiero
Copy link
Member

No bug: if you provide the wrong apisecret, or don't provide it at all, the request never gets to the core, but stays in the transport (where you get the 403). Since it doesn't reach the core, it doesn't count as activity that can reset the last activity flag.

@lminiero
Copy link
Member

Oh I think I see what you mean: you were indeed getting a response to (some?) long polls, even though the API secret was passed the wrong way, and so the transport shouldn't have had access to it. The face you see 403s seems to suggest it rejected at least some of them. I'll have to check if there are instances where the long poll may be done even when the API secret isn't correctly provided (which shouldn't happen). For requests where you did get events, those did reset the last activity counter.

@remvst
Copy link
Contributor Author

remvst commented Jan 20, 2021

Oh I think I see what you mean: you were indeed getting a response to (some?) long polls, even though the API secret was passed the wrong way

I don't think I actually ever saw a 403 coming back to me (I don't remember the job failing at all). I only ever saw 403s in the Janus logs, but I don't think I received them.
All poll (GET) requests were done with that incorrect API secret, and I was able to receive events such as webrtcup (and probably some others that I don't remember).
What I would have expected would be to be rejected entirely and not receive any events, which I don't think Janus currently does?

@lminiero
Copy link
Member

Yes, that's what should happen, this is the code where we do that for long polls:
https://github.com/meetecho/janus-gateway/blob/master/transports/janus_http.c#L1534

You can try adding some manual log lines to the is_api_secret_valid method and the value it returns to see if it's misbehaving for you there.

@lminiero
Copy link
Member

lminiero commented Jan 20, 2021

Wait, looking at the code I think I know what may be happening... these are the checks:

if(gateway->is_api_secret_valid(&janus_http_transport, secret)) {
	/* API secret is valid */
	secret_authorized = TRUE;
}
if(gateway->is_auth_token_valid(&janus_http_transport, token)) {
	/* Token is valid */
	token_authorized = TRUE;
}
/* We consider a request authorized if either the proper API secret or a valid token has been provided */
if(!secret_authorized && !token_authorized) {
	response = MHD_create_response_from_buffer(0, NULL, MHD_RESPMEM_PERSISTENT);
	janus_http_add_cors_headers(msg, response);
	ret = MHD_queue_response(connection, MHD_HTTP_FORBIDDEN, response);
	MHD_destroy_response(response);
	goto done;
}

The if is an and of !secret_authorized and !token_authorized, though: since you only enabled the API secret but not tokens, is_auth_token_valid will always return TRUE meaning it will proceed even if it shouldn't. Can you check if turning the && to || in the if makes it work as expected for you?

@remvst
Copy link
Contributor Author

remvst commented Jan 20, 2021

Thank you, I will try that later (will have to figure out how to compile from OSX).

I believe an easy test setup will be:

  1. setup Janus with API secret
  2. create a session
  3. poll that session while passing the secret incorrectly
  4. expected: 403 right away, actual: 200 keep-alive after 30 seconds???

@remvst
Copy link
Contributor Author

remvst commented Jan 20, 2021

I'm looking at the code right now, and I believe you are correct. is_auth_token_valid() returns TRUE if token auth is disabled:

gboolean janus_transport_is_auth_token_valid(janus_transport *plugin, const char *token) {
	if(!janus_auth_is_enabled())
		return TRUE;
	return token && janus_auth_check_token(token);
}

which will cause the condition to be falsy.

Will still double check later and submit a PR if I manage to compile it :)

@lminiero
Copy link
Member

Closing as I just merged the PR. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants