Optimizing Obi Cloth Scheduled Jobs
#1
I've been trying to optimize Obi Cloth's job scheduling, since I noticed idle time between jobs at runtime. One thing I've done (at least for my use case) is change the ApplyCollisionConstraintsJob, BurstColliderCollisionConstraintsBatch and BurstColliderFrictionConstraintsBatch jobs from IJob to IJobFor with parallel scheduling. This reduced the overall time Obi Cloth takes to complete its jobs.

While I think this is helpful, I'm not sure whether parallelizing the constraint jobs breaks anything else. As far as I can tell, the jobs don't depend on other elements within the same array they are processing (please see the ProfilerAnalyzer-Results attachment).



The part I'm most interested in optimizing is the idle time between jobs. Are there any tips for reducing it? In particular, is it possible to schedule and chain a few jobs together to reduce the number of times Obi Cloth calls JobHandle.Complete() on prior input data? For example, GenerateContactsJob could be scheduled with DequeueIntoArrayJob chained after it, instead of forcing GenerateContactsJob to complete before DequeueIntoArrayJob is scheduled. (Please see the Profiler-Results attachment.)



I'm willing to experiment and see what I can come up with, since it requires some refactoring of the jobs, but I'd like to know if there are any suggestions or warning flags I should be aware of. I'm profiling Obi Cloth on an AMD Ryzen 9 5900HS, in the Editor (release mode) with Leak Detection off, Burst enabled, and the Jobs Debugger disabled.


#2
(21-07-2022, 10:09 PM)initialPrefabs Wrote: I've been trying to optimize Obi Cloth's job scheduling, since I noticed idle time between jobs at runtime. One thing I've done (at least for my use case) is change the ApplyCollisionConstraintsJob, BurstColliderCollisionConstraintsBatch and BurstColliderFrictionConstraintsBatch jobs from IJob to IJobFor with parallel scheduling. This reduced the overall time Obi Cloth takes to complete its jobs.

While I think this is helpful, I'm not sure whether parallelizing the constraint jobs breaks anything else. As far as I can tell, the jobs don't depend on other elements within the same array they are processing (please see the ProfilerAnalyzer-Results attachment).

Hi!

As soon as you have rigidbodies in your scene, this will lead to a race condition: several particles writing to the same memory location simultaneously (BurstMath.ApplyImpulse, line 300-ish in BurstColliderCollisionConstraintsBatch.cs).

The only way to correctly parallelize that is to ensure only one particle can update rigidbody velocities at a time, which is doable using CAS (compare-and-swap) operations. Or, make sure there are no rigidbodies in your scene, which is a quite restrictive assumption.
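To make the CAS idea concrete, here is a minimal, language-neutral sketch in C++ rather than Obi's actual code. The names `atomic_add` and `accumulate_impulses` are hypothetical, and a single `std::atomic<float>` stands in for one component of a rigidbody's velocity delta. In a Burst job the analogous call would presumably be `System.Threading.Interlocked.CompareExchange` on the float, but that is an assumption about how one might port it, not something Obi does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Adds `delta` to `target` with a compare-and-swap retry loop, so concurrent
// writers never lose each other's updates.
static void atomic_add(std::atomic<float>& target, float delta) {
    float expected = target.load(std::memory_order_relaxed);
    // If another thread changed the value between our read and our write,
    // compare_exchange_weak fails, reloads `expected`, and we try again.
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_relaxed)) {
    }
}

// Hypothetical stand-in for many particles applying impulses to one rigidbody:
// each of `thread_count` threads adds `impulse` `impulses_per_thread` times.
float accumulate_impulses(int thread_count, int impulses_per_thread, float impulse) {
    assert(thread_count > 0);
    std::atomic<float> velocity_delta{0.0f};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < impulses_per_thread; ++i)
                atomic_add(velocity_delta, impulse);
        });
    }
    for (auto& w : workers) w.join();  // wait for every writer to finish
    return velocity_delta.load();
}
```

Two caveats with this approach: the order in which impulses are summed is no longer deterministic, so floating-point results can differ slightly between runs, and heavy contention on a single rigidbody means many retries, which eats into the gain from parallelizing.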

For the GPU backend in Obi 7, since there's no two-way rigidbody coupling support (at least not at launch), contact solving is fully parallel. And since it happens on the GPU, it can handle orders of magnitude more contacts.


(21-07-2022, 10:09 PM)initialPrefabs Wrote: The part I'm most interested in optimizing is the idle time between jobs. Are there any tips for reducing it? In particular, is it possible to schedule and chain a few jobs together to reduce the number of times Obi Cloth calls JobHandle.Complete() on prior input data? For example, GenerateContactsJob could be scheduled with DequeueIntoArrayJob chained after it, instead of forcing GenerateContactsJob to complete before DequeueIntoArrayJob is scheduled. (Please see the Profiler-Results attachment.)

In order to allocate space for the contacts array and batch data, the collision spatial acceleration structure has to be built and the contacts queue has to have been generated. These allocations have to happen on the main thread, so GenerateContactsJob must be completed first: otherwise you don't know how much memory needs to be allocated. You cannot chain GenerateContactsJob and DequeueIntoArrayJob together.
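The dependency can be illustrated with a small sketch, again in C++ and not taken from Obi. `Contact`, `generate_then_dequeue`, and the `p % 3` collision test are all hypothetical, and a mutex-guarded `std::queue` stands in for the contacts queue. The point is only that the array size is data-dependent, so the wait has to come before the allocation.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Contact { int particle; int collider; };  // hypothetical contact record

// Phase 1: workers generate a data-dependent number of contacts into a queue.
// Phase 2: once all workers are done, the count is known and the flat array
// can be sized and filled.
std::vector<Contact> generate_then_dequeue(int particle_count, int thread_count) {
    assert(thread_count > 0);
    std::queue<Contact> contact_queue;
    std::mutex queue_mutex;

    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            for (int p = t; p < particle_count; p += thread_count) {
                if (p % 3 == 0) {  // arbitrary stand-in for a collision test
                    std::lock_guard<std::mutex> lock(queue_mutex);
                    contact_queue.push({p, 0});
                }
            }
        });
    }

    // Plays the role of completing the generation job: wait for all producers.
    for (auto& w : workers) w.join();

    // Only now is the required allocation size available.
    std::vector<Contact> contacts;
    contacts.reserve(contact_queue.size());
    while (!contact_queue.empty()) {
        contacts.push_back(contact_queue.front());
        contact_queue.pop();
    }
    return contacts;
}
```

Moving the `reserve` call above the `join` loop would read the queue size while workers are still pushing, which is the situation the forced completion avoids.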

As you can see in the profiler pic you attached, you wouldn't save the full 0.09 ms by chaining these together. The main thread is busy doing other work right after the end of the previous job, and about 0.05 ms after that it is already scheduling other jobs. So at least half of those 0.09 ms are spent working on the main thread, not idling.
#3
Quote:Hi!

As soon as you have rigidbodies in your scene, this will lead to a race condition: several particles writing to the same memory location simultaneously (BurstMath.ApplyImpulse, line 300-ish in BurstColliderCollisionConstraintsBatch.cs).

The only way to correctly parallelize that is to ensure only one particle can update rigidbody velocities at a time, which is doable using CAS (compare-and-swap) operations. Or, make sure there are no rigidbodies in your scene, which is a quite restrictive assumption.

For the GPU backend in Obi 7, since there's no two-way rigidbody coupling support (at least not at launch), contact solving is fully parallel. And since it happens on the GPU, it can handle orders of magnitude more contacts.

Ah okay, thanks for the quick reply. I'll likely revert the change I had sneaked in on my end, since we'll want to use rigidbodies in our scene anyway (on the CPU), given that we are currently GPU bound.

Quote: In order to allocate space for the contacts array and batch data, the collision spatial acceleration structure has to be built and the contacts queue has to have been generated. These allocations have to happen on the main thread, so GenerateContactsJob must be completed first: otherwise you don't know how much memory needs to be allocated. You cannot chain GenerateContactsJob and DequeueIntoArrayJob together.

As you can see in the profiler pic you attached, you wouldn't save the full 0.09 ms by chaining these together. The main thread is busy doing other work right after the end of the previous job, and about 0.05 ms after that it is already scheduling other jobs. So at least half of those 0.09 ms are spent working on the main thread, not idling.

Ah okay, thanks for validating my concerns.

P.S. Thanks for making Obi Cloth and making it Burst friendly. :)