Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement custom thread pool #158

Open
kgorking opened this issue May 20, 2021 · 2 comments
Open

Implement custom thread pool #158

kgorking opened this issue May 20, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request improvement Makes the code cleaner and/or faster

Comments

@kgorking
Copy link
Owner

kgorking commented May 20, 2021

Using the parallel std works "fine I guess", but because it's so generic, a lot of the meta-information I have available is not used and potential performance is wasted. There is also absolutely no point in dynamically scheduling work when the layout and execution order is known in advance.

My custom thread pool would work with static scheduling, where each thread has a pre-determined list of jobs to complete, where the jobs are running systems on batches of components. The jobs for each thread will be arranged according to data accessed and system dependencies.

Consider 2 systems, each writing to components A and B respectively, on an 8-thread machine. The work for each system results in 5 jobs each.

1 2 3 4 5 6 7 8
A A A A A B B B
B B

Because the two systems are independent, they can be executed in parallel. If I add another system that reads from A and B and writes to C, a naive implementation would schedule the work as follows:

1 2 3 4 5 6 7 8
A A A A A B B B
B B C C C C C

C running on thread 3 now has to pull data from A and B into its cache that are already present on thread 1 and 6. If I instead use cache-aware ordering the work could be rearranged in the following manner to exploit cache temporal locality:

1 2 3 4 5 6 7 8
A A A A A
B B B B B
C C C C C

This now opens up another optimisation: collapsing jobs. Because the 3x5 jobs operate on the same range of components, the jobs can be merged and save 2 passes over the data:

1 2 3 4 5 6 7 8
A
B
C
A
B
C
A
B
C
A
B
C
A
B
C
@kgorking kgorking added enhancement New feature or request improvement Makes the code cleaner and/or faster labels May 20, 2021
@kgorking kgorking self-assigned this May 20, 2021
@kgorking
Copy link
Owner Author

If I time each job on a thread I can get a threads total runtime for all its jobs, which I can then use to shift jobs between threads to ensure the minimum makespan, if one thread runs longer than the others.

@kgorking kgorking linked a pull request May 23, 2021 that will close this issue
@kgorking
Copy link
Owner Author

kgorking commented Nov 4, 2021

Skip-lists for work in each thread should simplify the design

@kgorking kgorking removed a link to a pull request Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request improvement Makes the code cleaner and/or faster
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant