Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition with ReconnectableObservable #2

Open
ndglover opened this issue May 26, 2016 · 14 comments
Open

Race condition with ReconnectableObservable #2

ndglover opened this issue May 26, 2016 · 14 comments

Comments

@ndglover
Copy link

Hi Dave,

I believe there is a race condition in the logic for Disposable versus new subscribers. I can't seem to reproduce it with a unit test using TestSchedulers. I can though using more realistic code (see attached), albeit not every time. What seems to happen is the RefCount drops to zero and the disposable is scheduled but not executed, subsequently a new subscriber comes in and gets the existing subject, then the subject is disposed beneath them and no more updates come though.

It only happens when you have a .SubscribeOn

Wondered if you had any thoughts on this?

Crude test code
CrudeTestCode.txt

@RxDave
Copy link
Owner

RxDave commented May 26, 2016

Interesting, thanks for reporting this. I haven't thought it through fully yet, but I can see how there may be a race condition. Perhaps the simplest solution is to dispose of the source subscription within the gate?

@ndglover
Copy link
Author

ndglover commented May 26, 2016

I think that would be a valid improvement but I'm not sure it fixes the issue. I've changed the code and tested it to be sure. It's a lot more reliable now but there is still a race I believe. I can imagine a scenario where you call dispose which is then scheduled, following this control returns and a new subscriber gets returned the existing subject, then the scheduled dispose runs and nulls out the subscription. Does that sound plausible?

Albeit in the case of my test code it's using Scheduler.Default so I wouldn't have thought the above could happen. Presumably subscribers would need to be using different schedulers for this to be the case. I guess that means something else is happening.

@RxDave
Copy link
Owner

RxDave commented May 26, 2016

Hmm I see, it definitely sounds plausible. I wonder whether it's an issue with the operator though or just a higher-level problem.

In the problem case, user code calls Dispose followed by Subscribe; however, SubscribeOn is causing disposal to be deferred, and while it's deferred the Subscribe call occurs. Because Dispose was actually invoked first by user code, the expectation Is that the subsequent call to Subscribe must occur after Dispose, but in fact it could happen in the reverse order due to SubscribeOn. This is a realistic race condition for any hot operator when using SubscribeOn.

Consider an attempt at a solution to the problem in this operator:

  1. When Subscribe is called, we need to know whether Dispose was already scheduled by SubscribeOn, and if so, return the next subject rather than the current subject.
  2. We obviously can't know that, so Subscribe must always returns the current subject.
  3. Therefore, if there's a solution to this problem then it must be within Dispose.
  4. When Dispose is called, we need to know whether any active subscriptions are referencing the subject, and whether any of them were called in user code after Dispose, and if so, cherry pick the latter subscriptions from the source, dispose of the original connection (it still has to happen for its side-effects), recreate a new subject and finally re-subscribe all of the latter subscriptions to the new subject.
  5. We obviously can't know whether SubscribeOn was applied at a higher level, so we actually don't have any information to go on about which subscriptions we'd need to cherry pick (barring the perhaps impossible task of accomplishing that goal regardless).

Therefore, this seems like a higher-level problem to me. Am I missing anything?

@ndglover
Copy link
Author

I agree with you in regards to points 1-5. I still think it is an issue with the operator though. I'm not sure what you mean by higher-level problem. The behavior is not one you would expect worse still it appears to the final subscriber as if they are subscribed but just moments later is disposed. If the original hot sequence that was being multicast has subscription side effects like creating a new socket connection these also get executed after the disposal. This new hot sequence is then never subscribed to.

I can think of 2 options for solving this potentially.

  1. Enforce an IScheduler is passed in. Then call SubcribeOn for the Subject and _source. I believe this might fix the ordering of dispose and subscribe. I'm not 100% confident this would fix it but it seems to work in testing, possibly its just that little bit of trampoline that makes it work more often that not.
  2. Combine the RefCounting and Subscription logic into one operator that returns IObservable. I've used this approach in the past with RefCountDisposable. This would be a big deviation from the current operator though.

I think I'm still missing something though. Given the .Multicast sequence is using a single scheduler why could they be out of order. Is it just because of the Subject vs the _source. So we have subscribed to the _source on the same scheduler but the Subject is on the current thread at the time that Subscribe is called. That's what led me to try option 1.

Let me know what you think.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

What I meant by "higher level problem" is that it's not the operator's fault that it doesn't avoid race conditions between Dispose and Subscribe in user code. You have to fix the problem in your code instead by ensuring that these methods are called in the "correct" order.

Requiring an IScheduler in the operator doesn't seem like a good solution to me. This operator doesn't need to introduce concurrency, which is what parameterizing with an IScheduler implies, and it certainly shouldn't attempt to force calls to its Dispose and Subscribe methods in some order, which may or may not be correct.. It would also be adding complexity to avoid an edge-case problem.

I believe it's an edge case, because using SubscribeOn is an anti-pattern in Rx. SubscribeOn is one of those operators that are rarely used, and often when it's used it's incorrect to use it in the first place. You should reconsider your use of it, if in fact it's causing the problem.

Your point about using the same IScheduler for disposing and subscribing is a good one though, assuming that we're correct about what's actually causing the strange behavior that you're observing. I could see this problem happening when you have multiple subscriptions, some with SubscribeOn and some without; however, this makes me think even more that it's a higher-level problem. Ultimately, the point is that this operator isn't introducing concurrency, so if there's a race condition it's because that concurrency is being introduced outside of the operator. If the concurrency is causing the Dispose and Subscribe methods to be invoked in an unexpected order, then you'll just have to somehow ensure that the order is correct yourself. That's the contract for this operator. It's similar to the Rx grammar, which requires you to call your OnNext's before your OnError or OnCompleted.

This is potentially a problem with any operator that cares about the order in which Dispose and Subscribe are called; however, I can't think of any other operator that does, so perhaps it is actually a unique problem with this operator.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

I just examined your code sample again and noticed that you're applying the SubscribeOn operator and then subscribing to the resulting observable twice. I forgot that SubscribeOn returns a cold observable, so that would do it. Subscription and disposal may occur in parallel between each usage!

I suspect that SubscribeOn isn't what you want to be using here. Have you considered using ObserveOn instead?

Edit: To be clear, the reason is because when you subscribe to SubscribeOn's observable, it schedules the subscription and eventually the disposal (on the thread pool, in your example), but it doesn't serialize these calls with other subscriptions to SubscribeOn's observable. Each subscription independently causes its own side effects, which means that the calls to Subscribe and Dispose on the underlying observable may occur simultaneously with respect to multiple subscriptions to SubscribeOn's observable.

@ndglover
Copy link
Author

hmm... I hadn't ever considered SubscribeOn to be an anti-pattern. The logic here is the work inside the subscribe operator might take a little time so you want to offload this to the threadpool. It's not a use case for ObserveOn.

That said I wasn't aware that 2 independent subscriptions could be running simultaneously when using SubscribeOn with the same scheduler. I thought things where run in the order they are scheduled hence any call to .Dispose would be scheduler to occur before the next Subscribe in this case.

Consider the use case were this multicast observable is exposed via a service IService. This service is injected into 2 view controllers that independently subscribe and dispose exactly as my test code does. How would you solve this race condition? I don't believe you'd even be aware of it. No other RX operator springs to mind that would exhibit this behavior. Hence I think it must be solved in the operator itself. Just my opinion and happy to be convinced otherwise.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

How could it be solved in the operator? Using an IScheduler and SubscribeOn won't help because the operator can't know in which order you intended for Dispose and Subscribe to be called. You just have to be sure that you call them in the order that you intended!

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

IScheduler doesn't care about ordering in general. You're using ThreadPoolScheduler, so it will just call QueueUserWorkItem for each action immediately. The problem lies in the fact that you have two separate subscriptions to SubscribeOn's observable. Each subscription can independently schedule Dispose or Subscribe on the underlying observable. For a given subscription, they'll be scheduled in the "correct" order. But between the two subscriptions, all bets are off. That's the cold observable model.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

Take a look at the SubscribeOn implementation, it may help. Consider that it returns an observable that you're subscribing to twice, so each subscription can schedule the call to Dispose independently of the other. The operator doesn't attempt to serialize your calls between subscriptions. Each subscription is entirely independent of any other.

public static IObservable<TSource> SubscribeOn<TSource>(IObservable<TSource> source, IScheduler scheduler)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (scheduler == null)
    {
        throw new ArgumentNullException("scheduler");
    }
    return new AnonymousObservable<TSource>(delegate(IObserver<TSource> observer)
    {
        SingleAssignmentDisposable singleAssignmentDisposable = new SingleAssignmentDisposable();
        SerialDisposable d = new SerialDisposable();
        d.Disposable = singleAssignmentDisposable;
        singleAssignmentDisposable.Disposable = scheduler.Schedule(delegate
        {
            d.Disposable = new ScheduledDisposable(scheduler, source.SubscribeSafe(observer));
        });
        return d;
    });
}

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

Hmm, now I'm thinking that this isn't your problem to begin with.

What seems to happen is the RefCount drops to zero and the disposable is scheduled but not executed, subsequently a new subscriber comes in and gets the existing subject, then the subject is disposed beneath them and no more updates come though.

This couldn't be the case because SubscribeOn happens before RefCount; i.e., the call to Dispose is scheduled, and then RefCount decrements whenever it gets called, not the other way around.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

Well, to be clear, the order in which Dispose (on the disposable returned by Connect) and Subscribe are called may still be the problem. And that's an issue because of the race condition that you're introducing with SubscribeOn.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

Perhaps it's the following sequence of events that's causing the issue.

  1. Observer A calls Subscribe.
    1. SubscribeOn schedules the call to RefCount.Subscribe.
  2. The scheduled call executes:
    1. RefCount calls Multicast.Subscribe.
    2. Multicast creates a new subject.
    3. RefCount calls Multicast.Connect.
  3. Observer A calls 'Dispose' at the same time as observer B calls 'Subscribe'.
    1. SubscribeOn schedules both calls (race) to RefCount.Dispose and RefCount.Subscribe.
    2. RefCount.Dispose and RefCount.Subscribe execute in parallel (multi-core).
    3. RefCount calls Dispose (multicast subscription) and Multicast.Subscribe in parallel.
      1. I checked the implementation of RefCount and it doesn't synchronize these calls.
    4. Multicast.Subscribe (observer B) subscribes to the same subject as observer A.
    5. RefCount.Dispose lock is acquired for observer A before RefCount.Subscribe for observer B:
      1. Count goes to 0.
      2. RefCount calls Dispose on the Multicast connection.
      3. Multicast nulls the subject (same for observer A and B).
    6. RefCount.Subscribe lock is acquired for observer B:
      1. Count goes to 1.
      2. RefCount calls Multicast.Connect.
      3. Multicast creates a new subject.

Therefore, perhapsRefCount is actually the "problem" because it doesn't serialize the calls to Dispose (connection) and Subscribe like you're expecting. It allows them to race. Although, I'm not sure if that's really a "problem" or ultimately this is just an unintended consequence of using SubscribeOn.

@RxDave
Copy link
Owner

RxDave commented May 27, 2016

You could prove my hypothesis by creating a new version of RefCount that calls Dispose and Subscribe within the lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants