Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashes caused by failing to unlock or destroy a static mutex while the app is being terminated #577

Merged
merged 3 commits into from
Sep 22, 2017

Conversation

nguyenhuy
Copy link
Member

@nguyenhuy nguyenhuy commented Sep 20, 2017

This is an interesting crash. Here is an example report in Pinterest app, happening on the static cacheLock mutex of ASImageNode:

screen shot 2017-09-20 at 7 09 29 pm

screen shot 2017-09-20 at 7 09 42 pm

There are other similar crashes related to __staticMutex in ASTextKitContext reported in #136. For instance, see thread #0 and #5 in this report, or thread #0 and #14 here.

Basically, what's happening is that all static mutexes are destroyed when an app is terminated by the system. At the same time, a static mutex is being unlocked or destroyed on another thread. The unlock or destruction yields a non-zero error code, which triggers an assertion, even in production.

A similar issue was discussed and fixed in Realm-Core (realm/realm-core#2243). For Texture, I went with a different approach that is arguably safer, IMHO. This should fix #136.

Mutex initialization using the PTHREAD_MUTEX_INITIALIZER does not immediately initialize the mutex. Instead, on first use, pthread_mutex_timedlock_np() or pthread_mutex_lock() or pthread_mutex_trylock() branches into a slow path and causes the initialization of the mutex. Because a mutex is not just a simple memory object and requires that some resources be allocated by the system, an attempt to call pthread_mutex_destroy() or pthread_mutex_unlock() on a mutex that was statically initialized using PTHREAD_MUTEX_INITIALIZER and was not yet locked causes an EINVAL error.

(https://www.ibm.com/support/knowledgecenter/en/ssw_i5_54/apis/users_65.htm)

@@ -46,28 +46,42 @@ @interface ASBasicImageDownloaderContext ()
@implementation ASBasicImageDownloaderContext

static NSMutableDictionary *currentRequests = nil;
static ASDN::RecursiveMutex currentRequestsLock;
static ASDN::StaticMutex currentRequestsLock = ASDISPLAYNODE_MUTEX_INITIALIZER;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is a recursive mutex in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope for no good reason! Otherwise deadlocks…

@3a4oT
Copy link

3a4oT commented Sep 20, 2017

@nguyenhuy thanks for the agenda and patch, very interesting. I still not understand how it triggers the assertion. Just two questions: 1 Have you reproduced it? And when do you plan next release?;)

Copy link
Member

@Adlai-Holler Adlai-Holler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow it's awesome to know what is going on here.

What do you think about changing assert in ASDISPLAYNODE_THREAD_ASSERT_ON_ERROR into NSCAssert so that we won't get crashes in production? assert is enabled unless NDEBUG is defined – which most apps don't do.

Copy link
Member

@garrettmoon garrettmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nguyenhuy Since this is a failure case when the app exits I'm wondering if there's a solution which doesn't pollute client (to the lock) code. It seems that there is no user facing cost to these crashes?

While this is more formally correct, the react solution seems cleaner and likely has the same user facing impacts?

@@ -46,28 +46,42 @@ @interface ASBasicImageDownloaderContext ()
@implementation ASBasicImageDownloaderContext

static NSMutableDictionary *currentRequests = nil;
static ASDN::RecursiveMutex currentRequestsLock;
static ASDN::StaticMutex currentRequestsLock = ASDISPLAYNODE_MUTEX_INITIALIZER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope for no good reason! Otherwise deadlocks…

Copy link
Contributor

@maicki maicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @nguyenhuy. I would agree with @garrettmoon. The realm solution seems may a bit more simpler and we can repeat this pattern in other places if this repeats.
Otherwise it may makes it harder to change code were we handling the unlock of the lock.

@nguyenhuy
Copy link
Member Author

@3a4oT To answer your questions:

  • If a static mutex has been destroyed on the main thread during an app exit, calling pthread_mutex_unlock() to unlock it on a background thread causes the method to return a non-zero value. The value then triggers the assertion here.
  • No, I have not been able to reproduce it.
  • Soon™ :P

@nguyenhuy
Copy link
Member Author

nguyenhuy commented Sep 21, 2017

@Adlai-Holler Good idea. I went ahead and convert all assertions in ASThread to ASDisplayNodeCAssert.

@garrettmoon @maicki Thinking again, my approach requires more effort on client-side and so it is not scalable. I changed to the approach used in Realm.

@@ -46,11 +46,12 @@ @interface ASBasicImageDownloaderContext ()
@implementation ASBasicImageDownloaderContext

static NSMutableDictionary *currentRequests = nil;
static ASDN::RecursiveMutex currentRequestsLock;
// Allocate currentRequestsLock on the heap to prevent destruction at app exit (https://github.com/TextureGroup/Texture/issues/136)
static ASDN::StaticMutex& currentRequestsLock = *new ASDN::StaticMutex;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked the code here and I'm 99% sure that this mutex doesn't need to be recursive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool :)

@nguyenhuy
Copy link
Member Author

Merging. Thanks, everyone!

@nguyenhuy nguyenhuy merged commit 8e0aa1e into TextureGroup:master Sep 22, 2017
bernieperez pushed a commit to AtomTickets/Texture that referenced this pull request Apr 25, 2018
…le the app is being terminated (TextureGroup#577)

* Fix crashes caused by failing to unlock or destroy a static mutex while the app is being terminated

* Allocate static mutexes on the heap memory to avoid destruction at app exit

* ASThread to use ASDisplayNodeCAssert() instead of assert()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Crash] ASDN::Mutex::~Mutex()
5 participants