Unity Jobs and Burst #204
Hi again!
Well, the biggest issue with Burst would be that it's not supported on all platforms (WebGL), but since the Jobs System can be used without Burst, that's not a big issue :) And it looks like the Jobs System is available in 2018+? I don't think I'll be supporting anything older than that in future releases anyway, so that should be fine!

I'm curious about your numbers and implementation though. Which dataset importers are you testing with? I would expect that most of the work the SimpleITK importer does can't be made any faster using the Jobs System (if it's even possible to use it from a job?), since it deals with managed data and the underlying implementation is already native code. So in your case, is it the post-processing that takes the most time? More specifically, gradient calculation? Do you know which parts take the most time?

Things like gradient and TF calculation are definitely something that could be improved using jobs! Though I'm not sure we even need to store this data on the CPU, so I've also thought about the possibility of using compute shaders for that, but again WebGL is annoying with its lack of compute shader support 😅

So yes, I think this is a great suggestion! Though I'm curious about exactly what code you moved to the Jobs System (not the actual SimpleITK-related stuff, I guess?). I'll also do some investigation on my side :)
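For reference, a Burst-compiled gradient pass could look roughly like this. This is only a sketch with assumed names (`GradientJob`, `density`, `gradients`) and a plain central-difference scheme, not the plugin's actual code:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Sketch: one work item per voxel, central differences clamped at borders.
[BurstCompile]
public struct GradientJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> density;
    public NativeArray<float3> gradients;
    public int dimX, dimY, dimZ;

    public void Execute(int index)
    {
        int x = index % dimX;
        int y = (index / dimX) % dimY;
        int z = index / (dimX * dimY);

        float dx = Sample(math.min(x + 1, dimX - 1), y, z) - Sample(math.max(x - 1, 0), y, z);
        float dy = Sample(x, math.min(y + 1, dimY - 1), z) - Sample(x, math.max(y - 1, 0), z);
        float dz = Sample(x, y, math.min(z + 1, dimZ - 1)) - Sample(x, y, math.max(z - 1, 0));
        gradients[index] = new float3(dx, dy, dz) * 0.5f;
    }

    private float Sample(int x, int y, int z) => density[x + y * dimX + z * dimX * dimY];
}

// Scheduling (batch size 64 is an arbitrary starting point):
// JobHandle handle = job.Schedule(dimX * dimY * dimZ, 64);
// handle.Complete();
```

With `[BurstCompile]` removed, the same job still runs on the worker threads, which is what makes the WebGL fallback mentioned above possible.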
That has been there since day 1 (when this was just a simple weekend project I was not planning to touch again 😂), so I can't say I remember for sure, but I believe it was because of precision. I haven't put any work into investigating whether switching to Color32 has a visual impact. But in theory there can be a big difference between 32-bit floats and 8-bit unsigned integers. Whether or not you will notice a difference likely depends on both your data and how you use the gradients. Since the 2D TF editor is still very basic, it's not that easy to make a good test case 😅 And it might be more noticeable if we change how gradients are calculated, as suggested in #198. That would be even slower though, so again - using the Jobs System or compute shaders would likely be needed :)
Ye, I suppose Burst can just be disabled if the platform doesn't support it. Compute shaders sound nice, but also like overkill to me :) Most of the waiting time in the app now is between Burst jobs, waiting for the next job to be initialized, so I also think the GPU would not help much more here.
Yes, there is really no point in doing anything with SimpleITK. SimpleITK is extremely quick even for very large datasets. I only added Burst to all the demanding parts of the post-processing in VolumeDataset.cs. So now downscaleData, finding min/max, creating the default texture, and creating the gradient texture have Burst variants (in my project I also have a bunch of other post-processing regarding segmentation and separate layers, where Burst helped by a ton). Creating the gradient always took the longest time.
In theory yes, but in practice it seems worth it to me to switch to Color32, given the amount of memory saved. One thing is how precisely it is stored; the other is whether a human eye can even see that precision. It also means extremely large datasets will probably no longer crash with an OutOfMemory exception, because the gradient previously took an extreme amount of space. But that might be because I am mainly working with the 1D TF, so it might be different with 2D.
Ok, thanks for the info! Which dataset did you test with btw? One of the ircadb ones? I guess it must have been a rather big one. But yes, this is definitely a great use case for the Jobs system! I suppose we could easily combine it with the current async code as well. We probably don't want to block the main thread while the importers are doing their work (especially for the OpenDICOM importer, which I know some people use on WebGL and other platforms where the SimpleITK integration isn't available), so we could probably trigger the jobs after the importer has done its work and then have it await the job? Not sure how you've done it, but I'll take a look at your repo :)
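Triggering the job after the importer finishes and then awaiting it could be done by polling the handle, so the main thread is never blocked. A minimal sketch (the helper name is made up, not an existing API):

```csharp
using System.Threading.Tasks;
using Unity.Jobs;

public static class JobAsyncUtil
{
    // Hypothetical helper: await a scheduled job from async import code.
    public static async Task AwaitJobAsync(JobHandle handle)
    {
        while (!handle.IsCompleted)
            await Task.Yield(); // give control back to the main thread each frame

        handle.Complete(); // must still be called to sync and release the job
    }
}
```

Note that `JobHandle.Complete()` has to be called even when `IsCompleted` is already true; it is what actually fences the job and makes its results safe to read.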
Yes, it's not unlikely that Color32 is good enough for 2D transfer functions as well! But I suppose we could support both and add a setting for this? More code to maintain though, so I'll do some testing and consider just switching to Color32 completely, as you suggested. Also, we might want to consider changing the texture format on the GPU as well. Currently it uses half precision floats when supported. If we switched to an 8-bit format (like RGBA32) there too, that could reduce memory usage even further.
That 2:30 minutes is for an internal hospital-specific dataset, which is huge, with 730 slices. But even for this dataset SimpleITK is very fast, in terms of single seconds. The ircadb datasets are a piece of cake; the whole loading and processing for them is usually under 10 seconds :D
Yes, I have it combined with async methods, and it works great. You can take a look here if you want. This class will probably be quite different from your version, because there are a lot of modifications. But you can see how it works quite easily in the CreateTextureInternalAsync method; the other methods work with very similar logic.
Ye, that is a great idea, in case it turns out there is a slight visual difference.
Yep, I am already using the RGBA32 texture format with gradients in my project. Haven't done performance tests yet, but there should be some improvement.
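Creating the gradient texture with an 8-bit format pairs naturally with storing the data as Color32, since the array can be uploaded directly. A sketch with placeholder names (not the plugin's actual method):

```csharp
using UnityEngine;

public static class GradientTextureUtil
{
    // Sketch: build a 3D gradient texture from Color32 data.
    // RGBA32 = 4 bytes per voxel on the GPU, vs 8 for RGBAHalf or 16 for RGBAFloat.
    public static Texture3D CreateGradientTexture(Color32[] gradients, int dimX, int dimY, int dimZ)
    {
        var tex = new Texture3D(dimX, dimY, dimZ, TextureFormat.RGBA32, false);
        tex.SetPixels32(gradients); // no per-channel float conversion needed
        tex.Apply(false, true);     // upload; 'true' frees the CPU-side copy
        return tex;
    }
}
```

Passing `makeNoLongerReadable: true` to `Apply` would halve the combined CPU+GPU footprint, but only if the gradient data is never read back on the CPU afterwards.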
Yes, SimpleITK is lightning fast! That's a really big improvement in import time then 😁 I would expect it to be faster with the job system and Burst, but this is even more than I had imagined. Thanks a lot for sharing!
Thank you very much! I'll have a look at it sometime soon. I think this fits nicely together with some other things I wanted to do, such as improving the gradient calculations (#198) and maybe adding some filters (gaussian blur, etc.).
Ok, good! I suspect it's a bit faster, because I saw a big improvement when switching to the half precision floating point format. And again, thanks a lot for all your feedback, suggestions and code contributions so far! I've heard a lot of positive feedback on the async loading and the sphere cutout tool from other people using this plugin :D
Oh no, I'm sorry. My formulation of that sentence was confusing. What I meant is that SimpleITK is so fast that there is not really any point in doing anything with it, because it only takes a few seconds to import that huge dataset, and the rest of the time was spent in post-processing. So that 2:30 → 17 seconds was achieved simply by adding Jobs & Burst to the post-processing. I haven't messed with SimpleITK.
Filters sound really nice. Jobbing & bursting the post-processing should be fairly straightforward. Do you want me to make a PR, or do you want to do it yourself, with adding a bunch of stuff around it and disclaimers about Unity 2018+?
Haha, that is nice to hear :D Take these contributions as a big thanks for this amazing library of yours, which made it possible to build, in a reasonable time window, an app that surgeons plan to use in real clinical environments :)
Sure, more user settings are better :D I wonder how different it will look.
Oh, if you are already getting to it, I would probably leave it to you :D I haven't started adding my changes to the fork for a PR yet.
That is cool. Yes please :)
Hi again @SitronX! Maybe it would be better to replace the original data array with a NativeArray and just pass that directly to the jobs (and use a lock to make sure it's not modified elsewhere before the job finishes)? However, Unity's default serialisation doesn't seem to work with native containers, and AFAIK there is no way to do custom serialisation, except using OnBeforeSerialize to copy the data to a normal array - which requires another large memory allocation... So switching to NativeArray would break serialisation. Do you have any suggestions? Of course, the original data array being this large is another problem. Even if you have a lot of free memory, you may not be able to do one massive linear memory allocation, so dividing the data up into chunks could help. But I'd also like to avoid doing the extra memory allocation if possible.
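The OnBeforeSerialize workaround mentioned above could be sketched like this, keeping the working copy in a NativeArray and mirroring it into a managed array only for serialisation. All names here are illustrative, and the `ToArray()` call in `OnBeforeSerialize` is exactly the large temporary allocation being discussed:

```csharp
using Unity.Collections;
using UnityEngine;

// Sketch: NativeArray-backed volume data that survives Unity serialisation.
public class VolumeData : ScriptableObject, ISerializationCallbackReceiver
{
    [SerializeField] private float[] serializedData; // what Unity actually serialises
    private NativeArray<float> data;                 // what the jobs read and write

    public void OnBeforeSerialize()
    {
        if (data.IsCreated)
            serializedData = data.ToArray(); // large temporary managed allocation
    }

    public void OnAfterDeserialize()
    {
        if (serializedData != null)
            data = new NativeArray<float>(serializedData, Allocator.Persistent);
    }

    private void OnDestroy()
    {
        if (data.IsCreated)
            data.Dispose(); // native memory is not garbage collected
    }
}
```

The explicit `Dispose` is the other maintenance cost of this approach: unlike a managed array, a persistent NativeArray leaks if nothing ever frees it.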
Hi @mlavik1
Well, one way would be to make it an option only for users that have enough memory. Users with low memory would just use the old method?
Yep, this is a good way to avoid that issue from the first point: having everything in a NativeArray. I don't know how you want to approach this, but in my solution I didn't launch all the jobs at once - I still have it in stages. When one stage finishes, only then are the jobs from the next stage started, so the same data is not affected by multiple jobs at once, and thus the lock is not really needed. Launching all jobs at once with locks would also make progress reporting very hard.
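The staging described above can also be expressed with JobHandle dependencies, letting the job system enforce the ordering instead of waiting manually between stages. The job struct names here are placeholders:

```csharp
using Unity.Jobs;

// Sketch: chain the post-processing stages via dependencies, so no two
// stages touch the same data at once and each boundary is a natural
// progress checkpoint.
JobHandle downscale = downscaleJob.Schedule(voxelCount, 64);
JobHandle minMax    = minMaxJob.Schedule(downscale);                 // starts after stage 1
JobHandle gradient  = gradientJob.Schedule(voxelCount, 64, minMax);  // starts after stage 2

gradient.Complete(); // or poll gradient.IsCompleted from the async path
```

Either way (manual stages or dependency chaining) gives the same guarantee; dependencies just let the worker threads start the next stage without a round trip through the main thread.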
Well, this serialisation you are talking about - is it about saving the data of the volume objects? In that case, I haven't actually used it in my project, so I didn't look much into it. But I think NativeArrays and NativeContainers should be serialisable, because there is such a thing as NetCode for ECS, which should heavily rely on serialisation of native containers to send data. I also found this, where there are some examples of serialisation of NativeArrays even in NetCode for GameObjects. It is not the default way to serialise it, but can this be used in your case?
Ye, it is probably a good idea to split the original massive array into multiple ones. Tbh I never had any problems with allocation myself, even when I used some massive datasets, but it doesn't hurt to split it just to be safe.
Yes, I'm starting to think this is the best solution. There's going to be a lot more code to maintain though.. But it's probably worth it :)
Oh yes, that's right. What I meant is that I might need a lock to make sure it doesn't break if the data array is modified from outside the jobs (it's public, and users are free to modify it as they want). But that shouldn't be a big issue anyway.
I should probably have made that more clear, since there are different types of serialisation in Unity now 😅 I've thought a lot about it, but for now maybe the best solution is to:
Another thing I thought about is writing the data to a file in StreamingAssets, but that feels a bit overkill and would be hard to maintain (delete when no longer referenced, etc.). Though, something like that would probably be needed for large time series datasets (which I've also been requested to implement haha). But for now, I suppose the above might be a good solution? I guess it's mostly the same as what you have done, except that I will use the old implementation as a fallback.
Aha, well sadly I don't have any experience with this, so I can't really help. Maybe some 3rd-party open-source serialisers exist for this purpose? Or maybe that NetCode could somehow help, even if it looks intimidating? Sometimes it is not bad to just borrow a small part from ecosystems like this. An example is right here with Burst and Jobs, which are part of the DOTS ecosystem, even though we purposely ignore the ECS part. It is just an idea though - I haven't used NetCode yet, so I have no idea how complex it is :D But if there is no other way to do it, I guess duplicating the data, with a backup solution in case of low memory, is the only way for now.
Thanks for the feedback @SnowWindSaveYou - I think you're right. I'll give that a try when I'm back from holiday, thanks :)
Hello @mlavik1 ,
I have recently been researching Unity Jobs and the Burst compiler, and I tried to incorporate them here for loading, processing data, making textures, etc. The way your library processes the data is really nicely set up for a parallel workflow. I have to say the results are really nice. My huge dataset with 730 slices previously took around 2:30 minutes to load and process. Now it takes around 17-20 seconds with Jobs + Burst, and the CPU is not even maxed out thanks to SIMD etc. Normal datasets are blazing fast, in terms of seconds.
I also have other processing and loading of segmentations in my project, so for me the loading and processing speedup is even bigger.
Default.mp4
Jobs.mp4
Jobs+Burst.mp4
Due to the vectorization and other things, it is kinda hard to keep exact track of the loading state, and doing so would slow the process down by a ton, so I removed the percentage reporting and only have stage reporting now. But I think it is kinda pointless to track the exact percentage when, with Burst, everything is a matter of seconds.
The downside might be that Burst is only supported from Unity 2018+, I think, so older Unity versions would not work.
I know you made a bunch of progress trackers, even for the gradient texture, but I think having a blazing fast load is better than watching a loading indicator :D
What do you think about this?
Edit: Also a question - is there a reason why the gradient is saved as Color instead of Color32? The gradient texture takes an extreme amount of space, and with Color32 the memory usage would be 1/4. To me, Color32 looks the same and I don't see any loss of detail. Was there a particular reason why you used the Color class?
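The rough numbers behind that 1/4 figure: Color stores four 32-bit floats per voxel (16 bytes), Color32 four bytes total. For a dataset on the scale discussed here (dimensions assumed - the thread only mentions 730 slices):

```csharp
// Back-of-the-envelope gradient memory for an assumed 512x512x730 volume.
long voxels = 512L * 512L * 730L;           // ~191 million voxels

long colorBytes   = voxels * 4 * 4;         // Color: 4 channels x 4-byte float ≈ 2.9 GiB
long color32Bytes = voxels * 4;             // Color32: 4 channels x 1 byte    ≈ 0.7 GiB

// colorBytes / color32Bytes == 4, hence the "1/4 memory" figure -
// and a single ~3 GiB linear allocation is exactly the kind of thing
// that triggers OutOfMemory even when total free memory would suffice.
```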