Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
I'm getting a major performance hit from my Obi ropes, and it's significantly worse when using Burst as the backend rather than Oni. FPS is ~75 with all ropes disabled, 30 with Burst, and 40 with Oni.
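For a rough sense of scale, those frame rates can be converted to per-frame times. This is only a back-of-the-envelope sketch: it assumes frame time is simply 1000 / FPS and that everything above the ropes-disabled baseline is attributable to the ropes. The function names are illustrative and not part of Obi or Unity.

```python
def frame_ms(fps: float) -> float:
    """Convert a frame rate into milliseconds per frame."""
    return 1000.0 / fps


def rope_cost_ms(fps_with_ropes: float, fps_baseline: float) -> float:
    """Extra milliseconds per frame relative to the ropes-disabled baseline."""
    return frame_ms(fps_with_ropes) - frame_ms(fps_baseline)


baseline_fps = 75.0  # all ropes disabled, as reported in this post
for backend, fps in (("Burst", 30.0), ("Oni", 40.0)):
    print(f"{backend}: {frame_ms(fps):.1f} ms/frame, "
          f"~{rope_cost_ms(fps, baseline_fps):.1f} ms above baseline")
```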
Using Unity 2022.3.13f1
I've got 11 ropes in the scene - each has its own solver.
So the story is...
After looking into some performance issues today, I noticed that our Obi ropes were the biggest offender. I googled around and realized that I didn't have all the packages installed that Obi needs in order to use Burst, which should theoretically improve performance a lot. I installed what I needed and I no longer got the package dependency warning, but performance tanked badly. At that point, I was getting an error similar to the one here:
http://obi.virtualmethodstudio.com/forum...-3492.html
I do have the Jobs debugger, safety checks, and leak detection all disabled.
I've tried a few different things to improve performance. They helped a bit, but performance still isn't good enough. Specifically, I have:
- Reduced solver steps from 4->3
- Reduced rope resolution
- Increased decimation
Also tried switching to the line renderer instead of extruded mesh, but that didn't make a difference, presumably because the performance issues are coming from the simulation rather than the rendering.
I also tried installing the old Jobs package individually, but that didn't seem to make any difference either.
Let me know if you want any specific profiling info - I wasn't sure what would be useful and didn't want to flood this post with unnecessary info.
I suspect that my best bet for a big improvement is getting the Burst backend to work properly, but something is definitely wrong there. Any ideas what might be the issue and how I can solve it?
Posts: 6,347
Threads: 24
Joined: Jun 2017
Reputation:
400
Obi Owner:
13-03-2024, 12:47 PM
(This post was last modified: 13-03-2024, 12:48 PM by josemendez.)
Hi!
Quote: I've got 11 ropes in the scene - each has its own solver.
Make sure all solvers are updated by the same updater component: if you also have one updater per rope then you're taking no advantage of multithreading.
When faced with any performance issue, it's always a good idea to use the profiler, as it will tell you (and us) exactly what the problem is. Please run your app with deep profiling enabled for a few seconds, then export the profiling session and either post it here (if small enough) or send it to support(at)virtualmethodstudio.com
kind regards,
Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
13-03-2024, 03:37 PM
(This post was last modified: 14-03-2024, 11:11 AM by ShawnF.)
Thanks for the quick response!
They're all on the same updater component, so that's not the issue at least.
Do you think it's worth the effort of moving all the ropes to share one solver? Would that make a difference?
EDIT - Updated the links here...
Anyway, here's the profile data. I ran it in an almost empty level with just the basic game manager / player stuff and a bunch of ropes.
Oni: https://drive.google.com/file/d/1U3mr45Z...drive_link
Burst: https://drive.google.com/file/d/1ehNKe2u...drive_link
Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
Hey, just wanted to ping this again since I think the thread may have gotten forgotten. Any ideas on what could be causing me to not be able to use the Burst backend / why it seems to be slower than Oni?
Posts: 6,347
Threads: 24
Joined: Jun 2017
Reputation:
400
Obi Owner:
19-03-2024, 07:57 AM
(This post was last modified: 19-03-2024, 08:08 AM by josemendez.)
(13-03-2024, 03:37 PM)ShawnF Wrote: Hey, just wanted to ping this again since I think the thread may have gotten forgotten.
Yes, it flew under my radar - thanks for bumping it, and please accept my apologies!
(13-03-2024, 03:37 PM)ShawnF Wrote: Do you think it's worth the effort of moving all the ropes to share one solver? Would that make a difference?
It will save some memory, but performance-wise there should not be any discernible difference.
(13-03-2024, 03:37 PM)ShawnF Wrote: Anyway, here's the profile data. I ran it in an almost empty level with just the basic game manager / player stuff and a bunch of ropes.
Oni: https://drive.google.com/file/d/1U3mr45Z...drive_link
Burst: https://drive.google.com/file/d/1ehNKe2u...drive_link
Both sessions seem to be profiling completely different scenes? In the Oni one there are around 12 ropes, but only 3 in Burst?... It's pretty difficult to compare performance in these conditions.
Also according to your profiling data, Burst seems to be consistently faster than Oni by a small margin - even with deep profiling enabled. I'm not sure why you're experiencing the opposite? Note that the performance difference between both backends isn't huge (except on mobile platforms, where Burst is clearly faster) since both use very similar technology: Oni is hand-written multithreaded/vectorized code that comes precompiled, while Burst is auto-vectorized and compiled when building the standalone. At the end of the day however, both make use of multiple threads and SIMD.
Keep in mind that deep profiling injects a lot of extra measurement calls in C#, making it run a lot slower than usual. Oni is unaffected by this since it's precompiled C++, so Unity is unable to add any extra stuff into it. As a result, Burst's performance is hindered a lot more by deep profiling than Oni's.
Here are screenshots from your profiling sessions:
Oni: 8.1 ms/frame
Burst: 4.7 ms/frame
In the Burst one you can see that the work done by Burst (thin aquamarine lines at the bottom) takes less time than the work done by Oni in the top screenshot. However, there's a lot more waiting time between actual work, due to the profiling overhead. Subtracting the time worker threads spend waiting from the total time, Burst is faster than Oni. But then again, there are fewer ropes in the Burst scene, so any comparisons are moot.
To home in on smaller differences, it would be important to profile the *exact same* scene using both. Otherwise it is hard to draw conclusions, since the work done by each backend is completely different.
kind regards,
Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
Ack, sorry about that. I actually did do a profile of them both on the same map, but uploaded the wrong files. :/ I just ran it again from scratch to be 100% sure that everything was exactly the same. You can find the profiles here:
Oni: https://drive.google.com/file/d/11aDctSf...drive_link
Burst: https://drive.google.com/file/d/1caXFWU8...drive_link
In this version, I'm getting an average of ~110 FPS with Oni and ~30 with Burst. It's a pretty big difference, which makes me think that the Burst version isn't working correctly for some reason.
And sorry about the slow response - I was gone for GDC/vacation and didn't think to follow up on this while I was away.
Posts: 6,347
Threads: 24
Joined: Jun 2017
Reputation:
400
Obi Owner:
02-04-2024, 07:52 AM
(This post was last modified: 02-04-2024, 07:55 AM by josemendez.)
(01-04-2024, 04:15 PM)ShawnF Wrote: In this version, I'm getting an average of ~110 FPS with Oni and ~30 with Burst. It's a pretty big difference, which makes me think that the Burst version isn't working correctly for some reason.
Hi!
As explained before, deep profiling will have a huge negative effect on Burst while having no effect at all on Oni. It's useful to determine what's slowing down some specific piece of code (that's the reason why I initially asked you for a deep-profiled session of Burst), but not when making A/B tests or absolute performance measurements.
Quote: Keep in mind that deep profiling injects a lot of extra measurement calls in C#, making it run a lot slower than usual. Oni is unaffected by this since it's precompiled C++, so Unity is unable to add any extra stuff into it. As a result, Burst's performance is hindered a lot more by deep profiling than Oni's.
The reason is that deep profiling tells Unity to disable all compile-time optimizations and inserts extra instrumentation instructions in the code to measure time taken by every single function call, which makes it run a lot slower.
Burst is compiled by Unity, so it is affected by this. However, Oni comes as a precompiled C++ library - essentially a black box - which means it can't be changed in any way by Unity and is hence unaffected by deep profiling.
As a result, Burst-compiled code will basically behave as regular C# when using deep profiling (or when using a development build with deep profiling support enabled), and you'll lose most of its benefits.
Furthermore, in your case FixedUpdate is being called 5 (!!) times per frame when using Burst, but only 2 when using Oni.
This is quite a bad case of death spiraling, which makes me suspect your project either has a very large maximum allowed timestep or a very small fixed timestep.
- How's performance with deep profiling disabled?
- What are your project's timestep settings?
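To illustrate the death spiral mentioned above, here is a rough sketch of a generic fixed-timestep loop. It is a simplified model, not Unity's actual scheduler: the function `simulate`, the accumulator logic, and the cost values (a 5 ms render cost, and a physics step costing either 2 ms or 30 ms) are all assumptions chosen for illustration. The point is only that once a single fixed step costs more wall-clock time than the fixed timestep it simulates, each frame needs more steps than the last, until the maximum allowed timestep caps the catch-up.

```python
def simulate(fixed_dt, max_dt, step_cost, render_cost, frames=200):
    """Return how many fixed steps run in each frame of a simplified game loop.

    All times are in seconds. The previous frame's duration (clamped to
    max_dt) is added to an accumulator, and one fixed step is run for every
    fixed_dt of accumulated time.
    """
    accumulator = 0.0
    frame_time = render_cost  # duration of the previous frame
    steps_per_frame = []
    for _ in range(frames):
        accumulator += min(frame_time, max_dt)  # clamp the catch-up time
        steps = int(accumulator / fixed_dt + 1e-9)  # epsilon guards float error
        accumulator -= steps * fixed_dt
        frame_time = render_cost + steps * step_cost  # slow steps lengthen the frame
        steps_per_frame.append(steps)
    return steps_per_frame


# Cheap physics step: the loop settles at no more than one step per frame.
cheap = simulate(fixed_dt=0.02, max_dt=0.1, step_cost=0.002, render_cost=0.005)
# Expensive physics step: the loop spirals until the clamp is hit.
costly = simulate(fixed_dt=0.02, max_dt=0.1, step_cost=0.03, render_cost=0.005)
# Same expensive step, but with a smaller clamp.
capped = simulate(fixed_dt=0.02, max_dt=0.06, step_cost=0.03, render_cost=0.005)

print("cheap step, max 0.1  :", max(cheap), "steps/frame at most")
print("costly step, max 0.1 :", max(costly[-50:]), "steps/frame at steady state")
print("costly step, max 0.06:", max(capped[-50:]), "steps/frame at steady state")
```

In this model the expensive step ends up pinned at the cap set by the maximum allowed timestep, which is why that setting matters once the simulation is slower than real time.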
Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
So to clarify, the 100 vs 30 fps was with no profiling enabled. That's just the default state.
Fixed Timestep is the default 0.02. Maximum Allowed Timestep is set to 0.1, although to be honest I don't remember changing it and don't remember why I did so. My guess is that I was just testing something temporarily and forgot to change it back.
Posts: 6,347
Threads: 24
Joined: Jun 2017
Reputation:
400
Obi Owner:
02-04-2024, 08:54 AM
(This post was last modified: 02-04-2024, 08:55 AM by josemendez.)
(02-04-2024, 08:52 AM)ShawnF Wrote: So to clarify, the 100 vs 30 fps was with no profiling enabled. That's just the default state.
Could you share profiling sessions with deep profiling disabled, to be able to compare both Burst and Oni on the same grounds?
(02-04-2024, 08:52 AM)ShawnF Wrote: Fixed Timestep is the default 0.02. Maximum Allowed Timestep is set to 0.1, although to be honest I don't remember changing it and don't remember why I did so. My guess is that I was just testing something temporarily and forgot to change it back.
0.1/0.02 = 5, so that's consistent with the profiling results. This will allow Unity to update physics at most 5 times per frame, which can be a lot. Typically your max allowed timestep should be a small multiple of your timestep, something like 0.04 (2 updates per frame) or 0.06 (3 updates per frame).
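The arithmetic above can be written as a small helper. This is just a sketch of the ratio described in this post; `max_fixed_updates_per_frame` is a hypothetical name, not a Unity API, and exact fractions are used only to avoid floating-point rounding in the division.

```python
from fractions import Fraction


def max_fixed_updates_per_frame(fixed_timestep: float, max_allowed_timestep: float) -> int:
    """Upper bound on physics updates per frame: max allowed timestep / fixed timestep."""
    fixed = Fraction(str(fixed_timestep))      # e.g. "0.02" -> 1/50, exactly
    maximum = Fraction(str(max_allowed_timestep))
    return maximum // fixed                    # floor division on exact fractions


for max_dt in (0.1, 0.06, 0.04):
    updates = max_fixed_updates_per_frame(0.02, max_dt)
    print(f"fixed 0.02, max allowed {max_dt}: up to {updates} updates per frame")
```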
Posts: 21
Threads: 5
Joined: Aug 2023
Reputation:
0
Thanks for the heads up about the maximum allowed timestep - I'll set it to 0.06 for now and read up on it a bit to make sure I understand it better.
In the meantime, here are the non-deep-profiled versions:
Oni: https://drive.google.com/file/d/1Xp1TaIh...drive_link
Burst: https://drive.google.com/file/d/1H6sbukm...drive_link