Obi Official Forum
Help ObiSolver.Lateupdate() bad performance



ObiSolver.Lateupdate() bad performance - TheMunk - 16-12-2019

So I've been reading up a lot on the forums here about optimizing ropes, and most of the time the issue is death spiraling in FixedUpdate. My issue, however, seems to be the rope extruder taking up a lot of CPU time: specifically 7 ms. Turning off the extruder on each rope gives me the performance back.

Main issue: 

Obisolver.LateUpdate() taking 7 ms cpu time. 


Rope Info:
3 ropes
~200 particles in total
Simulated in FixedUpdate
2 substeps
Standard constraint settings (3 iterations, 1 relaxation)
A few handles and a few colliders (fewer than 10), mostly static
Extruder uses a 4-segment section

Time settings:
Fixed timestep: 0.01388889 (Oculus Quest)
Max: 0.0139
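
For context, the time settings above can be turned into a frame budget. The sketch below assumes the headset renders one frame per fixed timestep (the 72 Hz figure is derived from the timestep, not stated in the post) and uses the 7 ms LateUpdate cost reported earlier in this post; the variable names are illustrative only.

```python
# Values taken from the post; everything derived from them is plain arithmetic.
fixed_timestep_s = 0.01388889   # "Fixed timestep" from the time settings
late_update_ms = 7.0            # reported cost of ObiSolver.LateUpdate()

# Assumption: one rendered frame per fixed timestep, so the timestep is the frame budget.
frame_budget_ms = fixed_timestep_s * 1000.0
refresh_hz = 1.0 / fixed_timestep_s
budget_share = late_update_ms / frame_budget_ms

print(f"refresh rate: {refresh_hz:.1f} Hz")
print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"LateUpdate share of budget: {budget_share:.0%}")
```

Under that assumption, rope rendering alone uses roughly half of the time available per frame, which is why it stands out in the profiler.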

Solutions tried:
Using the line renderer, but I lost almost as much performance to Camera.FireOnPreCull() as I gained on ObiSolver.LateUpdate(). (EDIT: it seems to be UpdateRenderer on the line renderer.)
EDIT: using 3 line renderers on the same ropes (200 particles), all with 0 smoothing, yields 5.4 ms on Camera.FireOnPreCull() - see latest attached screenshot


Turning smoothing down to 0 - seems to give around 1.5 ms for a single rope. Having all ropes at 0 smoothing and 0.2 resolution yields 5.4 ms of ObiSolver.LateUpdate()


System info:
CPU: AMD Ryzen 5 1600 Six-core processor
GPU: GTX 1070





Is this expected performance or am I missing something?


Edit:
Tried creating a new scene with 3 ropes, all with 0 smoothing and the default 8-segment rope section: one with 0.5 resolution, one with 0.3 and one with 0.2.
All 3 ropes on the same solver; total particles: 285
Getting around 3.8 ms on ObiSolver.LateUpdate(). Why is there such a big difference?
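
To make the two measurements comparable, they can be normalized per particle. This is only a rough sketch: the particle count of the project scene is approximate ("~200"), the two scenes use different sections and possibly different smoothing, and the dictionary keys are labels made up for this example.

```python
# (LateUpdate cost in ms, particle count) as reported in this post.
measurements = {
    "project scene, 4-segment section": (7.0, 200),
    "test scene, 8-segment section, 0 smoothing": (3.8, 285),
}

cost_us_per_particle = {
    name: ms * 1000.0 / particles
    for name, (ms, particles) in measurements.items()
}

for name, cost in cost_us_per_particle.items():
    print(f"{name}: {cost:.1f} microseconds per particle")
```

The project scene comes out at about 35 microseconds per particle against about 13 in the test scene, so the gap is not explained by particle count alone.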

EDIT 2:
Also, out of general curiosity, which is better for performance: high smoothing or high resolution? And how dependent on the renderer is either?


RE: ObiSolver.Lateupdate() bad performance - josemendez - 16-12-2019

(16-12-2019, 03:33 PM)TheMunk Wrote: Is this expected performance or am I missing something?
[...]
Also, out of general curiosity, which is better for performance? high smoothing or high resolution? and how dependent on the renderer is either?

What version are you using? Obi 4.X or 5.X?

Rope resolution determines the number of particles in your rope. Together with smoothing, this allows you to decouple simulation quality from rendering quality.

At the end of every frame, each particle is used as a control point for a bézier spline (generated using an extremely optimized version of the Chaikin corner-cutting algorithm). This algorithm outputs multiple curve points, depending on the smoothing parameter. At 0 smoothing, one point is created for each particle. As you increase smoothing, additional points in-between particles are added.
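
As an illustration of the corner-cutting idea, here is a textbook Chaikin pass in Python. It is not Obi's optimized implementation: the function names, the choice to keep both endpoints, and the mapping of "smoothing" to a number of iterations are assumptions made for this sketch.

```python
def lerp(p, q, t):
    """Linear interpolation between two points given as tuples."""
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def chaikin(points, iterations):
    """Textbook Chaikin corner cutting on an open polyline.

    Each pass replaces every segment with two points at 1/4 and 3/4 of its
    length and keeps the original endpoints, so the point count doubles.
    """
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # nothing to smooth on a single segment
        out = [pts[0]]
        for p, q in zip(pts, pts[1:]):
            out.append(lerp(p, q, 0.25))
            out.append(lerp(p, q, 0.75))
        out.append(pts[-1])
        pts = out
    return pts

# Four "particles" forming a zig-zag.
particles = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
for level in range(3):
    print(level, len(chaikin(particles, level)))
```

With zero passes the output is the particle positions themselves, which matches the "one point per particle at 0 smoothing" behavior described above; each extra pass in this variant doubles the number of curve points.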

The renderer takes all these points as input and generates a mesh by placing a rope section at each point, generating vertices for each one, and stitching them together with triangles.
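
The mesh generation described above (one section per curve point, stitched with triangles) can be sketched as follows. This is a minimal illustration and not Obi's renderer: the name `extrude_rope`, the circular section, and the fixed reference axis used to orient each ring are assumptions, and the sketch expects consecutive points to be distinct.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def extrude_rope(points, radius=0.05, sides=8):
    """Place a ring of `sides` vertices at each point and stitch rings with triangles."""
    vertices, triangles = [], []
    last = len(points) - 1
    for i, p in enumerate(points):
        # Tangent from the neighbouring points (one-sided at the ends).
        tangent = _normalize(_sub(points[min(i + 1, last)], points[max(i - 1, 0)]))
        # Reference axis that is not parallel to the tangent (simplistic frame choice).
        ref = (0.0, 1.0, 0.0) if abs(tangent[1]) < 0.99 else (1.0, 0.0, 0.0)
        normal = _normalize(_cross(ref, tangent))
        binormal = _cross(tangent, normal)
        for s in range(sides):
            angle = 2.0 * math.pi * s / sides
            c, sn = math.cos(angle), math.sin(angle)
            vertices.append(tuple(
                p[k] + radius * (c * normal[k] + sn * binormal[k]) for k in range(3)))
    # Two triangles per section edge between consecutive rings.
    for i in range(last):
        for s in range(sides):
            a = i * sides + s
            b = i * sides + (s + 1) % sides
            triangles.append((a, a + sides, b))
            triangles.append((b, a + sides, b + sides))
    return vertices, triangles

curve = [(0.0, 0.0, 0.1 * i) for i in range(5)]
verts, tris = extrude_rope(curve, radius=0.05, sides=8)
print(len(verts), len(tris))
```

Both the vertex count and the triangle count grow linearly with the number of curve points, and all of that work is redone every frame on the CPU.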

So the number of points output by rope smoothing is what largely determines the renderer's performance. At 0 smoothing it equals the number of particles in the rope, and it increases with higher smoothing values.

What's better for performance depends a lot on your needs. The rule of thumb is to use as few particles as you can get away with, then increase smoothing if rendering is too "blocky". You should almost never need smoothing > 0 if the rope has high resolution.


RE: ObiSolver.Lateupdate() bad performance - TheMunk - 16-12-2019

(16-12-2019, 04:48 PM)josemendez Wrote: What version are you using? Obi 4.X or 5.X?


Using Obi 4.1.

Yes, I guessed that's how particles and rendering work, but what's bugging me the most is that it isn't the physics simulation that takes time; it seems to be the rendering and updating of the mesh/line. Isn't the line renderer supposed to increase performance "dramatically"? It's weird that I don't seem to gain much more than 1-2 ms on the 3 ropes.
(thanks for the quick answer)


RE: ObiSolver.Lateupdate() bad performance - josemendez - 16-12-2019

(16-12-2019, 04:58 PM)TheMunk Wrote: Using Obi 4.1.

Yes, I guessed that's how particles and rendering work, but what's bugging me the most is that it isn't the physics simulation that takes time; it seems to be the rendering and updating of the mesh/line. Isn't the line renderer supposed to increase performance "dramatically"? It's weird that I don't seem to gain much more than 1-2 ms on the 3 ropes.
(thanks for the quick answer)

Rendering has received quite an overhaul in 5.0, it's 20-30% faster than in 4.X. That's why I asked.

Thing is, simulation is performed using our own multithreaded physics engine, written in C++, with hand-optimized SIMD. Rendering has to be performed in Unity, using Unity's vector math (which is terribly underperformant). So in many situations, rendering is more expensive than simulation (which I know is counterintuitive as the complexity of the simulation is much greater).

The line renderer improves performance because the amount of work done for each curve point is far less (just two vertices instead of the 8 used by the default section in the extruded renderer). Even then, math in Unity is awfully slow compared to what can be done with proper vectorization in unmanaged code.
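
A quick back-of-the-envelope comparison of the two renderers, using the per-point vertex counts given above. The 200 curve points are borrowed from the roughly 200 particles at 0 smoothing mentioned earlier in the thread; the function name is made up for this sketch, and seam duplicates or end caps are ignored.

```python
def vertices_per_frame(curve_points, vertices_per_point):
    """Vertices that have to be rebuilt each frame for one set of ropes."""
    return curve_points * vertices_per_point

curve_points = 200  # about one point per particle at 0 smoothing

extruded = vertices_per_frame(curve_points, 8)  # default section
line = vertices_per_frame(curve_points, 2)      # line renderer

print(f"extruded: {extruded} vertices, line: {line} vertices, "
      f"ratio: {extruded / line:.0f}x")
```

The line renderer touches a quarter of the vertices, but the per-point work (smoothing, frame computation) is still paid in managed code, which is consistent with the modest gains reported.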

We expect this to change once we move to Unity's new mathematics library with support for vectorization and make use of Jobs and the Burst compiler.


RE: ObiSolver.Lateupdate() bad performance - TheMunk - 16-12-2019

(16-12-2019, 05:03 PM)josemendez Wrote: Rendering has received quite an overhaul in 5.0, it's 20-30% faster than in 4.X. That's why I asked.


Ahh OK, I'll give 5.0 a try then. Thanks for the detailed answer!


RE: ObiSolver.Lateupdate() bad performance - josemendez - 16-12-2019

(16-12-2019, 05:06 PM)TheMunk Wrote: Ahh OK, I'll give 5.0 a try then. Thanks for the detailed answer!

Note that in 5.0, a lot of things have changed. Rope editing workflow is completely different (read as 'better'). If you're just starting a project, go ahead, but I'd advise against upgrading mid-project!


RE: ObiSolver.Lateupdate() bad performance - TheMunk - 17-12-2019

(16-12-2019, 05:09 PM)josemendez Wrote: Note that in 5.0, a lot of things have changed. Rope editing workflow is completely different (read as 'better'). If you're just starting a project, go ahead, but I'd advise against upgrading mid-project!

So I've looked at the new 5.0 rope documentation and it looks neat (like the multiple cursors on one rope, which I currently have in a hacked solution ;) ). If I can get a 20-30% performance increase, I think it is worth it (ropes are basically the essential and most performance-heavy part of the project). I'm past the first prototype of the project and already had in mind rebuilding some of the ropes for better performance. I'm currently only relying on handles and the following Obi methods:

ObiRope.GetParticlePosition(i);
ObiRope.RestLength
ObiRopeCursor.normalizedCoord;
ObiRopeCursor.ChangeLength();

Are any of these methods changed (drastically) in 5.0? 


Also a side question: is it better performance-wise to have multiple solvers if the ropes are not supposed to interact with each other?



A few suggestions I've written down during the past month working with Obi Rope:

It would be nice to have a "tether" or "handle" for ropes which does not lock to particles but rather works as a "via" point for the rope curve. To do this now I have to fiddle around with collisions and on an Oculus Quest I can't hold a particle resolution high enough for the collisions to make sense. 

So I'm forced to use handles and either manipulate each side of the handled particle to maintain the correct amount of particles across the whole rope, or split the rope up into two separate ropes and manipulate them separately. 


Furthermore, it would be nice to have different resolutions or smoothing levels across a rope. This way I can make nice looking simulations at critical points (like the issue above with collisions) or at areas closest to the player.


RE: ObiSolver.Lateupdate() bad performance - josemendez - 17-12-2019

(17-12-2019, 10:55 AM)TheMunk Wrote: ObiRope.GetParticlePosition(i);
ObiRope.RestLength
ObiRopeCursor.normalizedCoord;
ObiRopeCursor.ChangeLength();

Are any of these methods changed (drastically) in 5.0? 

GetParticlePosition() now expects a solver index instead of an actor index. Same for all GetParticleX() methods.
ObiRope.RestLength has been renamed restLength.
ObiRopeCursor.normalizedCoord has been renamed cursorMu.
ObiRopeCursor.ChangeLength() stays the same.
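
The two pure renames in this list are mechanical, so they could be applied to a project with a small script. The sketch below is a hypothetical helper (the names `RENAMES` and `migrate` are made up here) that rewrites member accesses in C# source text; it only knows the renames stated above. Calls to GetParticlePosition() are deliberately left alone, since switching from actor to solver indices needs a manual review.

```python
import re

# Only the renames stated in this reply; nothing else about the 5.0 API is assumed.
RENAMES = {
    "RestLength": "restLength",
    "normalizedCoord": "cursorMu",
}

def migrate(source: str) -> str:
    """Rename member accesses such as `.RestLength` in C# source text."""
    for old, new in RENAMES.items():
        # Match the member only when accessed with a dot and as a whole word.
        source = re.sub(rf"\.{old}\b", f".{new}", source)
    return source

before = "float len = rope.RestLength; cursor.normalizedCoord = 0.5f;"
print(migrate(before))
```

A plain text substitution like this can also hit unrelated members that happen to share a name, so the result should be checked before committing it.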

(17-12-2019, 10:55 AM)TheMunk Wrote: Also a side question; is it better performance wise to have multiple solvers if the ropes are not supposed to interact with each other?

It's pretty much the same. Having multiple solvers has a slight performance impact, but independent ropes are simulated in parallel regardless of where they are: same solver, different solver, doesn't matter.

(17-12-2019, 10:55 AM)TheMunk Wrote: It would be nice to have a "tether" or "handle" for ropes which does not lock to particles but rather works as a "via" point for the rope curve. To do this now I have to fiddle around with collisions and on an Oculus Quest I can't hold a particle resolution high enough for the collisions to make sense. 

You mean a "sliding" hole through which rope can freely pass, right? It's on our to-do list, but isn't as easy to implement as it sounds. It needs to be a constraint over all rope particles that dynamically changes its position along the rope. This is extremely useful for medical applications like suturing thread, catheters, etc. (not our main interest, but many users have pointed it out), as you can ensure the rope passes through a point without the need for collision detection.

Quote: Furthermore, it would be nice to have different resolutions or smoothing levels across a rope. This way I can make nice looking simulations at critical points (like the issue above with collisions) or at areas closest to the player.

Doable, but it wouldn't work as you expect. The thing with particle-based representations (or any discrete representation) is that convergence speed (roughly == rope stiffness) varies depending on resolution. So areas with higher resolution would be more elastic than areas with low resolution. To solve that, you're forced to either use a very small timestep (so that the highest-resolution zones behave correctly) or use continuum dynamics, which are largely topology-agnostic. Both are quite expensive, so the point of adaptive resolution is moot if, in order to have consistent behavior, you need to calculate everything as if the entire rope were at the highest resolution.
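
The resolution dependence can be reproduced with a toy experiment. The sketch below is not Obi's solver: it is a one-dimensional hanging chain with equal masses, a generic position-based projection of distance constraints, and a fixed number of Gauss-Seidel iterations per step. The function name, the damping factor, and the default values (timestep of 1/72 s, 3 iterations) are assumptions chosen for this example.

```python
def hanging_rope_stretch(particles, length=1.0, iterations=3,
                         dt=1.0 / 72.0, steps=2000, gravity=9.81, damping=0.99):
    """Relative stretch of a pinned, hanging 1D chain after it settles."""
    rest = length / (particles - 1)
    depth = [i * rest for i in range(particles)]  # particle 0 is pinned at depth 0
    vel = [0.0] * particles
    for _ in range(steps):
        # Predict positions under gravity; the pinned particle never moves.
        pred = [depth[0]] + [d + (v + gravity * dt) * dt
                             for d, v in zip(depth[1:], vel[1:])]
        # Fixed number of Gauss-Seidel sweeps over the distance constraints.
        for _ in range(iterations):
            for i in range(particles - 1):
                error = (pred[i + 1] - pred[i]) - rest
                if i == 0:
                    pred[1] -= error          # pinned end: free particle takes it all
                else:
                    pred[i] += 0.5 * error    # equal masses: split the correction
                    pred[i + 1] -= 0.5 * error
        # Velocities from the position change, with mild damping so the chain settles.
        vel = [damping * (p - d) / dt for p, d in zip(pred, depth)]
        depth = pred
    return (depth[-1] - depth[0]) / length - 1.0

coarse = hanging_rope_stretch(particles=10)
fine = hanging_rope_stretch(particles=40)
print(f"10 particles: {coarse:.1%} stretch, 40 particles: {fine:.1%} stretch")
```

With the same timestep and iteration count, the finer chain settles at a visibly larger stretch than the coarse one, which is the "more elastic at higher resolution" effect described above. A rope mixing both resolutions would therefore behave inconsistently along its length unless the whole solve is made expensive enough for its finest part.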

Thanks a lot for the ideas! :)


RE: ObiSolver.Lateupdate() bad performance - TheMunk - 17-12-2019

(17-12-2019, 11:57 AM)josemendez Wrote: You mean a "sliding" hole trough which rope can freely pass, right?

Yes, I meant a "sliding" hole. Thanks for the detailed explanation! I look forward to playing with the new 5.0! :)


RE: ObiSolver.Lateupdate() bad performance - TheMunk - 04-08-2021

(16-12-2019, 05:03 PM)josemendez Wrote: We expect this to change once we move to Unity's new mathematics library with support for vectorization and make use of Jobs and the Burst compiler.
 
Hi again,
What's the status of this? I ask because I can confirm that rope rendering (extruder or line renderer) is still the heaviest operation when running ropes on mobile devices. We've upgraded to Oculus Quest 2 but are having a hard time with ~300 particles and just the line renderer. I can run roughly twice as many particles when not using any rope renderers.

Also, I noticed significantly lower performance when building with IL2CPP over Mono.
200 particles on two ropes with a line renderer and full smoothing runs around 72 fps on the Quest 2 with Mono, but 31 fps with IL2CPP ARM64.
Any apparent reasons for this?

As a last question, it looks like we used to be able to make bending constraints at the starts/ends of ropes:

https://youtu.be/pe5mROQqPv8?t=86

I would like to mimic this behavior for particle attachments and/or stitches.