[RFC] Thread spawn hook (inheriting thread locals) #3642

m-ou-se · 2024-05-22T11:53:15Z

Rendered

m-ou-se · 2024-05-22T11:53:51Z

cc @epage - You said you wanted this for your new test framework. :)

m-ou-se · 2024-05-22T12:21:44Z

Implementation, including use in libtest: rust-lang/rust#125405

m-ou-se · 2024-05-22T12:29:52Z

Demonstration:

Code

#![feature(thread_spawn_hook)]

use std::cell::Cell;
use std::thread;

thread_local! {
    static ID: Cell<usize> = panic!("ID not set!");
}

fn main() {
    ID.set(123);

    thread::add_spawn_hook(|_| {
        let id = ID.get();
        Ok(move || ID.set(id))
    });

    thread::spawn(|| {
        println!("1:     {}", ID.get());
        thread::spawn(|| {
            println!("1.1:   {}", ID.get());
            thread::spawn(|| {
                println!("1.1.1: {}", ID.get());
            }).join().unwrap();
            thread::spawn(|| {
                println!("1.1.2: {}", ID.get());
            }).join().unwrap();
        }).join().unwrap();
        thread::spawn(|| {
            ID.set(456); // <-- change thread local `ID`
            println!("1.2:   {}", ID.get());
            thread::spawn(|| {
                println!("1.2.1: {}", ID.get());
            }).join().unwrap();
            thread::spawn(|| {
                println!("1.2.2: {}", ID.get());
            }).join().unwrap();
        }).join().unwrap();
    }).join().unwrap();

    thread::spawn(|| {
        println!("2:     {}", ID.get());
    }).join().unwrap();
}

Output

1:     123
1.1:   123
1.1.1: 123
1.1.2: 123
1.2:   456
1.2.1: 456
1.2.2: 456
2:     123
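For readers on a stable toolchain, the inheritance shown in the output above can be approximated without the proposed API. The following is only a minimal sketch, not the RFC's implementation: `HOOKS`, `add_hook`, `spawn_with_hooks`, and `demo` are hypothetical names, the hooks are infallible for brevity, and unlike `add_spawn_hook` they only fire for threads created through the wrapper.

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

// What a hook hands to the child: a one-shot closure run before the child's main function.
type ChildInit = Box<dyn FnOnce() + Send>;
type Hook = Box<dyn Fn() -> ChildInit + Send + Sync>;

// Hypothetical global, add-only registry (registering only, no unregistering).
static HOOKS: Mutex<Vec<Hook>> = Mutex::new(Vec::new());

thread_local! {
    static ID: Cell<usize> = panic!("ID not set!");
}

// Register a hook. `hook` runs on the parent thread; the closure it returns runs on the child.
fn add_hook<F, G>(hook: F)
where
    F: Fn() -> G + Send + Sync + 'static,
    G: FnOnce() + Send + 'static,
{
    HOOKS
        .lock()
        .unwrap()
        .push(Box::new(move || Box::new(hook()) as ChildInit));
}

// Stand-in for `thread::spawn` that runs the registered hooks.
fn spawn_with_hooks<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Hooks run while the registry lock is held, so a hook that calls
    // `add_hook` would deadlock or panic in this sketch.
    let inits: Vec<ChildInit> = HOOKS.lock().unwrap().iter().map(|hook| hook()).collect();
    thread::spawn(move || {
        for init in inits {
            init();
        }
        f()
    })
}

// Returns the IDs seen by a child, its grandchild (after the child changed ID), and a sibling.
fn demo() -> Vec<usize> {
    ID.set(123);
    add_hook(|| {
        let id = ID.get(); // read on the parent thread
        move || ID.set(id) // applied on the child thread
    });

    let (child, grandchild) = spawn_with_hooks(|| {
        let own = ID.get();
        ID.set(456); // change the thread local in the child only
        let grandchild = spawn_with_hooks(|| ID.get()).join().unwrap();
        (own, grandchild)
    })
    .join()
    .unwrap();
    let sibling = spawn_with_hooks(|| ID.get()).join().unwrap();
    vec![child, grandchild, sibling]
}

fn main() {
    println!("{:?}", demo());
}
```

The grandchild sees the value its direct parent set, while the sibling spawned from the main thread still sees the main thread's value, mirroring lines 1, 1.2.1 and 2 of the output.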


the8472 · 2024-05-22T13:21:44Z

Could this be made more controllable through thread::Builder? We could make it Clone, modify a global default instance and new() would then clone from the default, giving users the option to modify it further (such as modifying the set of hooks) before spawning additional threads.

kennytm · 2024-05-22T13:21:49Z

text/3642-thread-spawn-hook.md

+
+However, the first (global) behavior is conceptually simpler and allows for more
+flexibility. Using a global hook, one can still implement the thread local
+behavior, but this is not possible the other way around.


Hmm sure yes it is likely implementation-wise simpler but I don't think it's conceptually simpler.

In particular we already have the Command::pre_exec which I think is the parallel for Process.

m-ou-se · 2024-05-22

I don't see how Command::pre_exec is related. That provides a closure to run for one specific Command, not a way to hook into all future Commands.

joshtriplett · 2024-05-22

It seems theoretically possible to implement the global behavior via the thread-local behavior, using a thread-local hook that installs itself in the new thread. But I agree that that's not ideal.

kennytm · 2024-05-22

@m-ou-se If the local effect isn't thread::Builder::add_spawn_hook() I'm not sure what is meant by "local effect" in this section then 🤔
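To make the "thread-local hook that installs itself in the new thread" idea from this review thread concrete, here is a rough sketch on stable Rust. Everything in it is an assumption for illustration: `LOCAL_HOOKS`, `add_local_hook`, and `spawn_inheriting` are hypothetical names, and the wrapper stands in for a `thread::spawn` that would consult the spawning thread's hook list.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

type Hook = Arc<dyn Fn() + Send + Sync>;

thread_local! {
    // Hypothetical per-thread hook list; every thread starts with an empty one.
    static LOCAL_HOOKS: RefCell<Vec<Hook>> = RefCell::new(Vec::new());
}

static CALLS: AtomicUsize = AtomicUsize::new(0);

// Register a hook that only affects threads spawned (transitively) from this thread.
fn add_local_hook(hook: impl Fn() + Send + Sync + 'static) {
    LOCAL_HOOKS.with(|hooks| hooks.borrow_mut().push(Arc::new(hook)));
}

fn spawn_inheriting<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Snapshot the spawning thread's hooks.
    let hooks: Vec<Hook> = LOCAL_HOOKS.with(|hooks| hooks.borrow().clone());
    thread::spawn(move || {
        // Reinstall the hooks in the child, so that its own children inherit them too.
        LOCAL_HOOKS.with(|slot| *slot.borrow_mut() = hooks.clone());
        for hook in &hooks {
            hook();
        }
        f()
    })
}

// Returns the number of hook calls after the hooked subtree ran, and after an unrelated thread ran.
fn demo() -> (usize, usize) {
    let start = CALLS.load(Ordering::SeqCst);

    // This thread registers a hook; its child and grandchild both run it.
    thread::spawn(|| {
        add_local_hook(|| {
            CALLS.fetch_add(1, Ordering::SeqCst);
        });
        spawn_inheriting(|| {
            spawn_inheriting(|| {}).join().unwrap();
        })
        .join()
        .unwrap();
    })
    .join()
    .unwrap();
    let hooked = CALLS.load(Ordering::SeqCst) - start;

    // A thread outside that subtree never registered anything, so its children run no hooks.
    thread::spawn(|| {
        spawn_inheriting(|| {}).join().unwrap();
    })
    .join()
    .unwrap();
    let after_unrelated = CALLS.load(Ordering::SeqCst) - start;

    (hooked, after_unrelated)
}

fn main() {
    println!("{:?}", demo());
}
```

Registering such a hook on the main thread before any other thread exists would approximate the global behavior, which is the direction of emulation described as possible but not ideal.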

m-ou-se · 2024-05-22T13:26:58Z

> Could this be made more controllable through thread::Builder? We could make it Clone, modify a global default instance and new() would then clone from the default, giving users the option to modify it further (such as modifying the set of hooks) before spawning additional threads.

That might make sense for the stack size, but not for the thread name. And those are (at least today) the only two settings that a Builder has.

I don't think it makes sense to allow 'modifying' the set of hooks. If you want to change/remove any hook, you need a way to identify them. Using a number or string as identifier has many issues, and using some kind of ThreadSpawnHookHandle for each of them seems a bit much, without clear use cases.

the8472 · 2024-05-22T13:35:01Z

The default thread name would be empty, as it is today.
And people have asked for more features in builders such as affinity or thread priorities (or hooks that can take care of those for specific threads): rust-lang/libs-team#195

> I don't think it makes sense to allow 'modifying' the set of hooks. If you want to change/remove any hook, you need a way to identify them.

The current proposal provides add. With the builder we could at least have clear to opt-out for some subset of threads.
Since the closures have to be 'static anyway we could just do it based on typeID? The types could be made nameable via existential types.

m-ou-se · 2024-05-22T13:47:31Z

> The current proposal provides add. With the builder we could at least have clear to opt-out for some subset of threads. Since the closures have to be 'static anyway we could just do it based on typeID? The types could be made nameable via existential types.

Why would you want to fully clear the hooks? Without knowing which hooks are registered, you can't know which things you're opting out of, so you basically can't know what clear even means. E.g. would you expect clearing these hooks to break a #[test] with threads?

I strongly believe this should just work like atexit() (global, only registering, no unregistering).

(Even if we were to allow some way to skip the hooks for a specific builder/thread, it doesn't make sense to me to have a global "clear" function to clear all the registered hooks (from the 'global builder' as you propose), because you can't really know which other hooks (of other crates) you might be removing.)

> Since the closures have to be 'static anyway we could just do it based on typeID?

TypeId is not enough. Two hooks might have the same type but a different value (and different behaviour), e.g. with different captures. (Also if you register already boxed Box<dyn Fn> as hooks, they will all just have the (same) TypeId of that box type.)
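Both points about TypeId can be checked on stable Rust. The snippet below is a small illustration with hypothetical helper names (`type_id_of`, `make_hook`, `same_type_different_behaviour`); it is not tied to the proposed hook API.

```rust
use std::any::TypeId;

// Returns the TypeId of the value's static type.
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

// Every closure returned from here has the same anonymous type, whatever `n` is captured.
fn make_hook(n: u32) -> impl Fn() -> u32 + 'static {
    move || n
}

// Returns (closures share a TypeId but behave differently,
//          boxed closures share a TypeId but behave differently).
fn same_type_different_behaviour() -> (bool, bool) {
    // Same closure expression, different captures.
    let a = make_hook(1);
    let b = make_hook(2);
    let closures = type_id_of(&a) == type_id_of(&b) && a() != b();

    // Two unrelated closures, both already boxed as trait objects:
    // only the TypeId of the box type is visible.
    let x: Box<dyn Fn() -> u32> = Box::new(|| 10);
    let y: Box<dyn Fn() -> u32> = Box::new(|| 20);
    let boxed = type_id_of(&x) == type_id_of(&y) && x() != y();

    (closures, boxed)
}

fn main() {
    println!("{:?}", same_type_different_behaviour());
}
```

In both cases the TypeIds compare equal even though the hooks return different values, so a TypeId alone could not select one of them for removal.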

the8472 · 2024-05-22T14:15:51Z

> Why would you want to fully clear the hooks? Without knowing which hooks are registered, you can't know which things you're opting out of, so you basically can't know what clear even means.

The RFC already states that threads can be in an unhooked state through pthread spawn. And I assume that with some unusual linkage setups (dlopening statically linked plugins?) you could end up with a lot more threads not executing hooks.
So unlike atexit it should already be treated as semi-optional behavior.

Additionally I'm somewhat concerned about a tool intended for automatic state inheritance being a global. In the Enterprise Java world this kind of implicit behavior and magic automatism occasionally gets abused to do terribly bloated things, which means it's great when that bloat can be limited to a smaller scope or at least be opted out of.
Linux systems face similar problems with increasingly complex inheritable process properties and are occasionally trying to shed this kind of state (e.g. sudo vs. systemd's run0, or process spawn servers instead of fork).

In the context of a test framework I can imagine tests running in sets and then clearing global state (or at least enumerating it to print information about what didn't get cleaned up) to decouple tests from each other.

> E.g. would you expect clearing these hooks to break a #[test] with threads?

"break" in the sense of not doing output capture, as it already is today? Yeah, sure, if the tests need a clean environment for whatever reason that might be a side-effect.

m-ou-se · 2024-05-22T14:47:44Z

> as it already is today?

Today, inheriting the output capturing is done unconditionally when std spawns a thread. There is no way to disable it or bypass it for specific threads (other than just using pthread or something directly). What I'm proposing here is a drop-in replacement for that exact feature, but more flexible so it can be used for other things than just output capturing. It will work in the exact same situations.

I'm not aware of anyone doing anything specifically to avoid inheriting libtest's output capturing (such as using pthread_create directly for that purpose).

> Additionally I'm somewhat concerned about a tool intended for automatic state inheritance being a global.

The destructors for e.g. thread_local!{} are also registered globally (depending on the platform). atexit() is global. All these things start with a global building block.

I agree it'd be nice to have some sort of with_thread_local_context(|| { ... }) to create a scope in which all threads spawned will have a certain context, but the building block you need to make that feature is what is proposed in this RFC.

joshtriplett · 2024-05-22T17:03:08Z

I do think we're likely to want deregistration as well, but I think we'd want to do that via a scoped function rather than naming. I don't think it has to be in the initial version in any case.

EDIT: On second thought, I'm increasingly concerned about not having the ability to remove these hooks.

ChrisDenton · 2024-05-22T17:13:01Z

Are there use cases other than testing frameworks in mind? Do they have needs that differ from the testing framework case?

joshtriplett · 2024-05-22T17:15:12Z

I think hooks are going to be quite prone to potential deadlocks. That's not a blocker for doing this, just something that'll need to be documented.

tmccombs · 2024-06-02T22:50:31Z

> Are there use cases other than testing frameworks in mind?

I can think of a few other use cases:

- Creating a new span in a tracing framework for the new thread that uses the thread local context of the parent thread as the parent span.
- Creating a new logger in TLS with configuration based on the local logger of the parent thread.
- Inheriting a value for a thread local variable across spawning a new thread, which could have several use cases.
- Possibly recording metrics on when threads are started, although that would probably be most useful if there was some way to hook into thread destruction as well.
- Initializing a thread local random number generator seeded by the parent thread's random number generator. I'm not sure if that is advantageous over current approaches to thread local RNGs, but it would be possible with this.

programmerjake · 2024-06-02T23:30:25Z

> possibly recording metrics on when threads are started, although that would probably be most useful if there was some way to hook into thread destruction as well.

TLS drop implementations could likely be used for that.
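As a rough illustration of that suggestion, a thread local whose type implements Drop can count thread exits. This is a sketch under assumptions: `ExitGuard`, `track_current_thread`, and `run_workers` are hypothetical names, the tracking call is made manually here (a spawn hook would be the natural place for it), and the standard library documents that thread-local destructors are not guaranteed to run on every platform or for the main thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static STARTED: AtomicUsize = AtomicUsize::new(0);
static EXITED: AtomicUsize = AtomicUsize::new(0);

// Dropped when the owning thread's thread-local storage is torn down.
struct ExitGuard;

impl Drop for ExitGuard {
    fn drop(&mut self) {
        EXITED.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    static GUARD: ExitGuard = ExitGuard;
}

// Call at the start of a thread to record it and arm the exit counter.
fn track_current_thread() {
    STARTED.fetch_add(1, Ordering::SeqCst);
    // Thread locals are initialized lazily, so touch the guard to make sure
    // it exists and its destructor is registered for this thread.
    GUARD.with(|_| {});
}

// Spawns `n` tracked threads and returns how many starts and exits were recorded for them.
fn run_workers(n: usize) -> (usize, usize) {
    let started_before = STARTED.load(Ordering::SeqCst);
    let exited_before = EXITED.load(Ordering::SeqCst);

    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(track_current_thread))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    (
        STARTED.load(Ordering::SeqCst) - started_before,
        EXITED.load(Ordering::SeqCst) - exited_before,
    )
}

fn main() {
    println!("{:?}", run_workers(4));
}
```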

kpreid · 2024-06-03T01:00:30Z

text/3642-thread-spawn-hook.md

+thread afterwards, which will execute them one by one before continuing with its
+main function.
+
+# Downsides


When this feature is used as intended, it will now be the case, where it wasn't before, that Rust threads inherit some characteristics from the thread which spawned them. This could result in unintended nondeterministic behavior when a thread is spawned lazily. For example, rayon lazily spawns its global thread pool the first time a parallel operation is used.

Therefore, I think that:

- The above should be listed as a downside (it's one more piece of complexity that library authors may need to keep in mind).
- There should be a way to opt out in thread::Builder, resulting in the new thread having the same state as if it was created first thing in main().

(In another design I have seen for tasks inheriting state, for a language other than Rust, there was no opt-out, but there was also no equivalent of static items, so it was always possible for the action that captured the local state (here, thread spawning) to be done exactly at the time of construction of the thing that interacts with the spawned thread, rather than lazily later. Because Rust has static items but a thread can't be spawned at program loading time, I think we need this option.)

kpreid · 2024-06-03T01:29:17Z

text/3642-thread-spawn-hook.md

+
+# Rationale and alternatives
+
+## Use of `io::Result`.


A hazard here is that lazy library authors might write an insufficiently contextful error. In particular, they might use the ? operator to return a real IO error from some kind of per-thread resource creation; such errors are already cryptic, and become more so when separated from their original context.

Therefore, it might make sense to ensure that, if thread::Builder::spawn() fails due to a hook, the error it returns is a wrapped error whose to_string() is something along the lines of "failed to execute thread spawn hook" and whose Error::source() is the error the hook returned. (The message could even include the Location of the add_spawn_hook() call, so that when debugging, one knows which hook to blame.)

If that is done, then the error type can, and perhaps should, be something other than io::Error, because there is no advantage to the hook producing an io::Error specifically, and making it an IO error might be misleading. (Use Box<dyn Error + Send + Sync> if nothing better comes to mind.)
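A wrapper along those lines could look roughly like the following. This is a sketch of the suggestion, not something the RFC specifies: `SpawnHookError`, `run_hook`, `failing_hook`, and `describe_failure` are hypothetical names, and the hook is modelled as a plain closure returning `Box<dyn Error + Send + Sync>`.

```rust
use std::error::Error;
use std::fmt;
use std::io;

type BoxError = Box<dyn Error + Send + Sync>;

// Hypothetical wrapper that records that the failure came from a spawn hook.
#[derive(Debug)]
struct SpawnHookError {
    source: BoxError,
}

impl fmt::Display for SpawnHookError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to execute thread spawn hook")
    }
}

impl Error for SpawnHookError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Expose the hook's own error as the cause.
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

// Runs one hook and wraps whatever error it returns.
fn run_hook<F>(hook: F) -> Result<(), SpawnHookError>
where
    F: FnOnce() -> Result<(), BoxError>,
{
    hook().map_err(|source| SpawnHookError { source })
}

// A hook that propagates a bare IO error with `?`, which is cryptic without context.
fn failing_hook() -> Result<(), BoxError> {
    let io_result: io::Result<()> =
        Err(io::Error::new(io::ErrorKind::NotFound, "no such file"));
    io_result?;
    Ok(())
}

// Returns the wrapper's message and the message of its source.
fn describe_failure() -> (String, String) {
    let err = run_hook(failing_hook).unwrap_err();
    let cause = err.source().map(|e| e.to_string()).unwrap_or_default();
    (err.to_string(), cause)
}

fn main() {
    let (message, cause) = describe_failure();
    println!("{message}: {cause}");
}
```

The caller sees the context first and can still walk `source()` to reach the original error.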


m-ou-se force-pushed the thread-spawn-hook branch 2 times, most recently from dfbcd39 to de2c8b3 on May 22, 2024 11:57

m-ou-se changed the title from "[RFC] Thread spawn hook" to "[RFC] Thread spawn hook (inheriting thread locals)" on May 22, 2024

Add thread_spawn_hook rfc.

2acdaeb

m-ou-se force-pushed the thread-spawn-hook branch from de2c8b3 to 2acdaeb on May 22, 2024 12:07

m-ou-se mentioned this pull request May 22, 2024

Add std::thread::add_spawn_hook. rust-lang/rust#125405

Open

epage reviewed May 22, 2024


Add relevant history.

71364c4

epage mentioned this pull request May 22, 2024

Capturing stdout/stderr in custom test harnesses rust-lang/testing-devex-team#8

Open

kennytm reviewed May 22, 2024


Add note on a global registration lang feature.

1e18872

kpreid reviewed Jun 3, 2024


Suggest Once instead of AtomicBool.

17da70a

Co-authored-by: Kevin Reid <[email protected]>

Amanieu removed the I-libs-api-nominated label (indicates that an issue has been nominated for prioritizing at the next libs-api team meeting) on Jun 11, 2024
