Help  High variance in ObiFixedUpdate performance impact
#1
I've been struggling with optimizing my ropes for quite some time now and tried all of the recommended solutions I've been able to find in the forums.
But for some reason, I can't get the ObiFixedUpdater under control. 

This application is built for Oculus Quest and runs perfectly at 72 frames when no Obi Solvers are enabled (but meshed ropes are rendered).

Using the Fixed Updater with 1 substep and no Unity physics substepping (even though I have a single two-way dynamic attachment).
Fixed timestep and max allowed timestep locked to the same value; I tested both 0.02 and 0.06 (results below).
Total Particles: 18
Smoothing: 0
Segment: 4
Unchecked all unnecessary constraints on the solver, and only have 1 iteration on everything (except 2 on parallel collisions).

I should mention I also have a 3 particle antenna (rod) with some smoothing, but that has a negligible performance impact according to my testing. 

I'm getting around 50-60 frames when I activate the rope's solver, but when running the profiler directly on the Oculus Quest I notice a big performance variation.

As you can see below some of the worst cases are around 10-12ms, while others are as low as 3 ms. 


Timestep 0.02 - 11.48ms:
[Image: 11ms.png]
Timestep 0.02 - 3.05ms:
[Image: 3ms.png]

Timestep 0.06 - 9.14ms:
[Image: 0-06-timestep.png]

Timestep 0.06 - 1.44ms:
[Image: 0-06-timestep-low.png]


It doesn't seem like this is the timestep death spiral, or v-sync (which is forced off). 
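For reference, the "timestep death spiral" mentioned above can be illustrated with a generic fixed-timestep accumulator. This is only a minimal Python sketch under stated assumptions, not Unity's or Obi's actual loop: the function name and the use of integer milliseconds are made up for the illustration. It shows how a max allowed timestep caps the number of fixed steps a slow frame can trigger.

```python
def fixed_steps_per_frame(frame_times_ms, fixed_dt_ms, max_dt_ms):
    """Return how many fixed steps run during each rendered frame.

    Generic accumulator (hypothetical, for illustration only): each
    frame's elapsed time is clamped to max_dt_ms, added to an
    accumulator, and consumed in slices of fixed_dt_ms.
    """
    accumulator = 0
    steps = []
    for frame_time in frame_times_ms:
        # Clamping is what prevents the death spiral: a slow frame
        # cannot schedule an unbounded number of catch-up steps.
        accumulator += min(frame_time, max_dt_ms)
        count = 0
        while accumulator >= fixed_dt_ms:
            accumulator -= fixed_dt_ms
            count += 1
        steps.append(count)
    return steps


# 50 ms frames, 20 ms fixed step: with max == fixed there is never more
# than one step per frame; with a 60 ms max the frames must catch up.
capped = fixed_steps_per_frame([50, 50, 50], fixed_dt_ms=20, max_dt_ms=20)
uncapped = fixed_steps_per_frame([50, 50, 50], fixed_dt_ms=20, max_dt_ms=60)
```

Note that even without any death spiral, a frame time shorter than the fixed timestep means some frames run one fixed step and others run none, so per-frame cost is not constant in such a loop.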

So my question is simply:
Is this expected behavior? If not, what can I do to improve my performance?
#2
Performance can vary depending on what's in the scene, how close the rope is to potential colliders, even the speed of the rope particles. Obi's broad phase works using two multilevel grids: one for colliders, another for particles.

- Each step, Obi checks whether any collider has changed its bounding box. Colliders that have changed are removed from the collider grid and reinserted in every cell they overlap. Colliders that do not move are not updated.

- Each step, a list of particles that have moved from one cell of the grid to another one is constructed. They are removed from their old cell and inserted in the new cell. Particles that haven't moved between cells are not updated.
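The incremental particle update described in the bullet above can be sketched roughly as follows. This is a simplified, single-level uniform grid written in Python for illustration; Obi's real broad phase is a multilevel grid in native code, and the class and method names here are hypothetical.

```python
from collections import defaultdict


class ParticleGrid:
    """Toy uniform grid that only reinserts particles that changed cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # cell coordinates -> particle ids
        self.cell_of = {}              # particle id -> cell coordinates

    def _cell(self, position):
        return tuple(int(c // self.cell_size) for c in position)

    def update(self, positions):
        """Update the grid and return how many particles had to be moved."""
        moved = 0
        for pid, pos in positions.items():
            new_cell = self._cell(pos)
            old_cell = self.cell_of.get(pid)
            if new_cell == old_cell:
                continue  # same cell as last step: no work at all
            if old_cell is not None:
                self.cells[old_cell].discard(pid)
            self.cells[new_cell].add(pid)
            self.cell_of[pid] = new_cell
            moved += 1
        return moved


grid = ParticleGrid(cell_size=1.0)
first = grid.update({0: (0.1, 0.1, 0.1), 1: (2.5, 0.0, 0.0)})   # both inserted
second = grid.update({0: (0.2, 0.1, 0.1), 1: (2.5, 0.0, 0.0)})  # nobody changes cell
third = grid.update({0: (0.2, 0.1, 0.1), 1: (3.1, 0.0, 0.0)})   # particle 1 crosses a cell
```

The returned count is the amount of grid work done in that step, which is why a resting rope is cheap and a moving one is not.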

This way, both broad phase structures (grids) are updated incrementally, but the cost increases with the number of moving particles/colliders. Once both structures have been updated:

- For each cell in the particle grid, a bounding box is calculated that envelops the particles and their velocity vectors. Then, this bounding box is tested against the collider grid. Contacts are generated between the particles and all colliders in the cells overlapped by this velocity-expanded bounding box.
So, fast-moving particles generate more contacts (as they can potentially have more collisions in a single step), and are more costly.
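To make the effect of velocity concrete, here is a small Python sketch that counts how many grid cells a velocity-expanded bounding box overlaps. It works per particle rather than per grid cell and uses a single-level grid, so it is a simplification of what is described above; the function name and all parameter values are assumptions chosen for the example.

```python
import math


def cells_overlapped(position, velocity, radius, dt, cell_size):
    """Count grid cells overlapped by a particle's velocity-expanded AABB."""
    count = 1
    for p, v in zip(position, velocity):
        end = p + v * dt                   # predicted position after one step
        lo = min(p, end) - radius          # box encloses start and end points
        hi = max(p, end) + radius
        # Number of cells spanned along this axis.
        count *= math.floor(hi / cell_size) - math.floor(lo / cell_size) + 1
    return count


start = (0.5, 0.5, 0.5)
resting = cells_overlapped(start, (0.0, 0.0, 0.0), 0.1, 0.02, 1.0)
fast = cells_overlapped(start, (100.0, 0.0, 0.0), 0.1, 0.02, 1.0)
fast_long_step = cells_overlapped(start, (100.0, 0.0, 0.0), 0.1, 0.06, 1.0)
```

Every extra cell is another place where colliders may be found and speculative contacts generated, so in this sketch both higher speed and a longer timestep widen the search.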

With this in mind, imagine a scene full of colliders around a swinging pendulum-like rope: When the pendulum swings from left to right, the velocity of the particles is higher, and more speculative contacts will be generated against the surrounding colliders. At its lowest kinetic energy point (maximum potential energy), the pendulum has very little to no velocity, and almost no contacts will be generated.

Also, timestep length has an impact on this: lower timesteps result in smaller position deltas, so less performance variance.

Do you get the same variance if you disable collision constraints completely?
#3
(14-04-2020, 03:00 PM)josemendez Wrote: [...]
Do you get the same variance if you disable collision constraints completely?

So the rope is not moving at all, and I only have static colliders in the scene (using the static physics material).

In the editor Obi seems to run more stable as you can see from this image:
Timestep 0.0138 (red circle indicates when I turned off collision constraints)
[Image: build-no-colission-test.png]
There's no difference at all. 
I'm doing a build now, but IL2CPP takes time - will edit this post.

[EDIT] Build version - with no collision detection
[Image: no-colission-build.png]
#4
Hi!

I compiled for Android using IL2CPP (Galaxy S6; I don't own a Quest, so that's the closest I can do), but I'm unable to reproduce such variability in performance.

Can you describe your scene in more detail? (number of ropes/solvers, renderer settings, etc.)

Also, what Oculus hardware level are you running at?
#5
(14-04-2020, 04:19 PM)josemendez Wrote: [...]
Can you describe your scene in more detail? (number of ropes/solvers, renderer settings, etc.)

Ok, I just started writing an explanation but my build just finished (without the antenna rod, with 0.02 timestep, and with substep unity physics enabled), and I noticed a very high increase in performance, as you can see below (left side is previous build).
[Image: no-antenna.png]

However, there is still high variance in the performance and with a slightly longer rope I dip back to under 60 fps sometimes.  
[Image: loger-rope.png]



My antenna looks like this:
[Image: rod.png]

With the dynamic attachment being a simple object with 0.001 mass.



Here is the continued explanation of my scene:

So I basically have 16 ropes in pairs under 8 solvers. 
A pair consists of rope (A), with one end statically attached to an animated object and the other statically attached to an intermediate object, LoopAttatchment.
The other rope (B) is the loop, which has both its ends attached to LoopAttatchment - one static and one dynamic (to avoid weirdness). The loop then has another two static attachments for an animated character to grab and move (or other animations or spring joints). 

Rope (A) is being extended and shortened on demand by either scripted events or through player interaction. 

These 8 pairs of ropes are enabled and disabled on demand by the scripted scenario, so I can pause the simulation, but keep the visual when the ropes are not moving. 
I'm using rope extruders now due to the finding in my last post (http://obi.virtualmethodstudio.com/forum...p?tid=2150).

I would love to send the project but I would have to make a new project and take out everything not relevant to this issue to satisfy my employer.

Render settings:
Using URP 7.3.1, Unity 2019.3.6f1
[Image: render-settings.png] 

Player Settings:
[Image: player-settings.png]
#6
(14-04-2020, 04:19 PM)josemendez Wrote: [...]
Also, what Oculus hardware level are you running at?

Any new insights? 
I still have unstable performance on the quest but stable in editor.
#7
(20-04-2020, 10:27 AM)TheMunk Wrote: Any new insights? 
I still have unstable performance on the quest but stable in editor.

I could not reproduce on any hardware I've tried, mostly mobiles (that are probably closest to Quest hardware-wise). ms/frame are quite stable.

What hardware level are you using, or are you relying on automatic throttling? (I don't know for sure whether this is still a thing on Quest as it was on Rift/Go.) The built-in throttling system might be working against you: it reduces the level when a frame is light, which makes the next frame take longer to simulate, so it jumps to a higher hardware level, then reduces it again to save battery, and so on. It's the only possible cause that comes to mind, since this issue is clearly hardware-dependent. Try setting a fixed CPU hardware level every frame, and see if that improves stability.
#8
(20-04-2020, 10:34 AM)josemendez Wrote: [...] Try setting a fixed CPU hardware level every frame, and see if that improves stability.

I'm already setting the CPU and GPU levels to 4 (max), although not every frame, but the OVR Metrics Tool tells me it stays at 4. Maybe it's thermal throttling, but then I guess it would be a more permanent decrease in performance (also, the device does not feel hot).
#9
(20-04-2020, 10:34 AM)josemendez Wrote: [...] It's the only possible cause that comes to mind, since this issue is clearly hardware-dependent. [...]

Could it be due to dynamic particle attachments? I got some weird results, with a very large increase in performance that may be connected to me changing all dynamic attachments on a rope so that they connect to rigidbodies that are kinematic for particles.

[EDIT]
It looks more like it's because two solvers are simulated at the same time. I can simulate a 50+ particle rope alone at 72 frames, but as soon as I toggle another rope with just 3 particles I return to ~50 frames. (This would also explain why I noticed the initial performance increase when I disabled the 3-particle antenna solver.)
Is there something with the new Obi 5.X and multiple solvers (in the same updater) that doesn't work well together on Android?
#10
(21-04-2020, 04:05 PM)TheMunk Wrote: [...]
Is there something with the new Obi 5.X and multiple solvers (in the same updater) that doesn't work well together on Android?

Internally, each solver spawns multiple tasks that are picked up by a task scheduler (thread pool). Each actor is decomposed into a few tasks, but it does not matter whether they are all under the same solver or under separate solvers, as the tasks end up being exactly the same. So one solver does the same work as multiple ones: it all comes down to tasks and a thread pool.
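As a rough illustration of that idea (not Obi's actual scheduler, which is native code), here is a Python sketch in which actors are split into chunks of work and handed to a shared thread pool. The chunk size, function names and the dummy "work" are all hypothetical; the point is only that the task list comes out the same whether the actors sit under one solver or two.

```python
from concurrent.futures import ThreadPoolExecutor


def actor_tasks(actor_name, particle_count, chunk_size=16):
    """Split one actor into (actor, start, end) chunks of particles."""
    return [(actor_name, start, min(start + chunk_size, particle_count))
            for start in range(0, particle_count, chunk_size)]


def solver_tasks(actors):
    """Collect the tasks of every actor owned by one solver."""
    tasks = []
    for name, count in actors:
        tasks.extend(actor_tasks(name, count))
    return tasks


def run(tasks, workers=4):
    """Execute all tasks on a shared pool; the 'work' is counting particles."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: task[2] - task[1], tasks))


one_solver = solver_tasks([("rope", 50), ("antenna", 3)])
two_solvers = solver_tasks([("rope", 50)]) + solver_tasks([("antenna", 3)])
processed = sum(run(one_solver))
```

In this model the pool never sees solvers at all, only tasks, so grouping actors differently does not change what gets scheduled.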

Just to test this, do you see any performance difference if you have the 50 particle rope and the 3 particle rope under the same solver, as opposed to having a solver for each one?