Bug / Crash  Performance with Burst on is very bad
#11
(02-04-2024, 09:30 AM)ShawnF Wrote: Thanks for the heads up about maximum fixed timestep - I'll set it to 0.06 for now and read up on it a bit to make sure I understand it better.

In the meantime, here are the non-deep-profiled versions:

Oni: https://drive.google.com/file/d/1Xp1TaIh...drive_link

Burst: https://drive.google.com/file/d/1H6sbukm...drive_link

Hi,

I finally figured out the problem: since each rope is in its own solver, you're paying for job scheduling once per rope. Since the ropes in your case seem fairly simple (maybe just a couple dozen constraints each?), Burst spends far more time scheduling threads than doing meaningful work. As a result, you get short bursts of simulation interleaved with scheduling:

[Image: rWuqDvP.png]

While Oni's simulation code is slower, it doesn't pay as much for thread scheduling, since it uses C++'s std::thread rather than Unity's job system. It will therefore be faster when you have many solvers with very little work each: in that case the bottleneck isn't the simulation itself, it's managing jobs.

Compare the performance of both backends in a setup with 12 (still very simple) ropes in a single solver:

Burst:
[Image: R4aDtgD.png]

Oni:
[Image: WuhHtrn.png]

You can see the timings are similar, with both taking around 2.7 ms per frame. In this case I'd advise merging multiple ropes into a single solver if possible. Otherwise you'll be paying for thread juggling, since each solver has to push its own work onto Unity's job scheduler regardless of how little work that is.
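To make the per-solver overhead concrete, here is a rough toy model in Python. It is only a sketch: the function name and both constants are hypothetical and were not measured from Obi, Burst, or the profiles in this thread. It assumes each solver pays one fixed scheduling cost per frame and it ignores any overlap between solvers.

```python
def frame_cost_ms(num_solvers, ropes_per_solver, schedule_ms, work_ms_per_rope):
    """Toy estimate of per-frame cost.

    Every solver pays a fixed scheduling cost once per frame,
    plus simulation work proportional to the ropes it contains.
    """
    per_solver = schedule_ms + ropes_per_solver * work_ms_per_rope
    return num_solvers * per_solver


# Hypothetical values chosen only for illustration (not profiler data).
SCHEDULE_MS = 0.5        # assumed fixed cost of scheduling one solver's jobs
WORK_MS_PER_ROPE = 0.05  # assumed simulation time for one simple rope

# 12 ropes, one solver each: scheduling is paid 12 times.
separate = frame_cost_ms(12, 1, SCHEDULE_MS, WORK_MS_PER_ROPE)

# The same 12 ropes in a single solver: scheduling is paid once.
merged = frame_cost_ms(1, 12, SCHEDULE_MS, WORK_MS_PER_ROPE)

print(f"separate solvers: {separate:.2f} ms, single solver: {merged:.2f} ms")
```

The simulation work is identical in both layouts (12 ropes' worth). Only the number of times the fixed scheduling cost is paid changes, which is why merging helps most when each rope is cheap.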
#12
Okay, thanks! I actually had that on my list of general performance improvements to try (putting all the ropes into one solver), but it's a bit time-consuming to redo some of my existing setup, so I kept putting it off. I'll give it a try soon and let you know if it solves my issues.

If this ends up making a big difference, I'd actually recommend putting that into the documentation for solvers / general setup in the same way that you recommend to only have one updater per scene.

Before reworking my logic / scenes, I decided to give this a test in a version of the same level where I put a bunch of ropes on a single solver (but without the additional script I use for my gameplay logic). It does seem worth doing in any case though, as even with Oni there's a little bit of an improvement in terms of framerate.

However... while the big difference in performance between Oni and Burst goes away, Oni is still SLIGHTLY better, which seems like something must still not be working as intended? 

Oni: https://drive.google.com/file/d/1dE_pCfd...drive_link

Burst: https://drive.google.com/file/d/1wR9B0C9...drive_link
#13
(02-04-2024, 01:17 PM)ShawnF Wrote: However... while the big difference in performance between Oni and Burst goes away, Oni is still SLIGHTLY better, which seems like something must still not be working as intended? 

That seems about right to me. As you add more workload to the solver, the difference should start to go the other way, in Burst's favor.

This is similar to how GPU/CPU scale in terms of workload vs performance: small workloads run faster on the CPU since each individual core is faster, but large workloads take less time on the GPU since it has many more cores.
[Image: CPU_GPU_scaling.png]

The base cost of scheduling multiple threads is higher in Burst, but its threads get work done faster. You just need to make sure you have enough work for them.
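The crossover can be sketched with a simple linear cost model. This is an illustration only: the "Burst-like" and "Oni-like" numbers below are hypothetical, picked so that one backend has a higher fixed cost and the other a higher per-unit cost. They are not benchmark results for either backend.

```python
def cost_ms(overhead_ms, ms_per_unit, work_units):
    """Linear cost model: fixed scheduling overhead plus per-unit work."""
    return overhead_ms + ms_per_unit * work_units


def break_even_units(backend_a, backend_b):
    """Workload at which two (overhead_ms, ms_per_unit) backends cost the same.

    Solves overhead_a + rate_a * W == overhead_b + rate_b * W for W.
    Returns None when the per-unit rates are equal (the lines never cross).
    """
    overhead_a, rate_a = backend_a
    overhead_b, rate_b = backend_b
    if rate_a == rate_b:
        return None
    return (overhead_a - overhead_b) / (rate_b - rate_a)


# Hypothetical (overhead_ms, ms_per_unit) pairs, for illustration only.
BURST_LIKE = (0.4, 0.001)  # assumed: higher scheduling cost, faster per unit
ONI_LIKE = (0.1, 0.002)    # assumed: lower scheduling cost, slower per unit

crossover = break_even_units(BURST_LIKE, ONI_LIKE)
print(f"break-even at roughly {crossover:.0f} work units")

# Compare both backends below and above the crossover point.
for units in (100, 1000):
    print(units, cost_ms(*BURST_LIKE, units), cost_ms(*ONI_LIKE, units))
```

Below the break-even workload the low-overhead backend is cheaper; above it the backend with the faster per-unit rate wins, which matches the shape of the scaling graph above.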

Note that there are ongoing efforts in Unity to make job scheduling faster:
https://www.google.com/search?client=saf...8&oe=UTF-8

I also started a thread on this subject quite a while ago in the Unity forums:
https://forum.unity.com/threads/schedule...be.808491/
#14
Ah, interesting! I was under the impression that Burst was meant to be faster across the board, but it sounds like you only really see the benefits once you hit a certain critical mass.

Hm. I'll need to make a decision then once I get more of a sense of how many ropes I'll want to max out at in a scene. So far, I've never actually had more than 8-10 or so, so it might end up making sense to just stick with Oni.

Anyway, appreciate you looking into this! I feel like I have a much better sense of my options and what to expect now.
#15
(02-04-2024, 02:33 PM)ShawnF Wrote: Ah, interesting! I was under the impression that Burst was meant to be faster across the board, but it sounds like you only really see the benefits once you hit a certain critical mass.

They're supposed to be in the same performance ballpark. After all, they both use similar technology, the only difference being that Burst is an integral part of Unity while Oni was custom built (at the time it was written, Unity had no support whatsoever for multithreading or vectorization).

Burst does take a slight hit in thread scheduling, but it produces faster code because its automatic vectorization is more efficient, so for many use cases it is slightly faster overall. I expect Unity to improve scheduling performance in the future.

Also, Burst usually wins on mobile devices since it takes advantage of the Neon instruction set, which Oni can't (it uses SSE/AVX).