Performance issue on specific Windows desktop machine
#1
Hello Obi team,
 
We are experiencing an issue where the same Unity scene runs at a low FPS only on one specific machine, even though its specs suggest it shouldn't.
We are running Windows 10 Pro with Unity version 2021.3.8f.
 
On one machine we have
 
AMD Ryzen 5900X processor (12 cores, 24 threads - overclocked to 4.8GHz)
Nvidia RTX 3080
32GB RAM
 
With this machine our scene runs at 60-70FPS. Disabling the Job Debugger and safety checks doesn’t seem to make any difference.
 
On a new machine we have
 
Intel i9-10980XE Extreme (18 cores, 36 threads - 3.0GHz, no overclocking)
Nvidia RTX 3090 Ti
128GB RAM
 
With this machine the FPS of the scene goes down to about 15-20FPS.
 
We tried benchmarking the new machine with 3D Mark and it seems to work as expected.
 
Is the difference between the two CPUs so vast to justify such a drop in FPS? Or is there another reason such as the increased number of threads?
 
Or can you think of anything else?


Many thanks
#2
Hi!

Short answer: profile.

Long answer: the profiler will tell you *exactly* why your app is slower on one of the CPUs, in far more detail than I or anyone else can. Whenever you have any kind of performance issue, profiling is always the answer.

There are a couple of things I can think of that will make a Burst-based physics simulation slower on the i9. The first is that while it has more threads, it has a lower clock speed. So unless your simulation has room to exploit parallelism, having more but slower threads will naturally slow things down (because there's not enough parallel work to keep the extra threads busy, and the busy ones are slower).
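As a rough way to reason about this, here is a small Amdahl's-law style estimate in Python. It is only a sketch under strong assumptions: the thread counts and clock speeds are the ones quoted in the first post, the function names are made up for illustration, and the model ignores IPC differences, turbo boost, SMT and memory effects entirely.

```python
def relative_step_time(serial_fraction, threads, clock_ghz):
    """Amdahl-style estimate: the serial part runs on one thread, the rest is
    spread evenly over all threads, and everything scales with clock speed."""
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / threads) / clock_ghz


def i9_over_ryzen(serial_fraction):
    """How many times longer one step takes on the i9 in this naive model."""
    ryzen = relative_step_time(serial_fraction, threads=24, clock_ghz=4.8)
    i9 = relative_step_time(serial_fraction, threads=36, clock_ghz=3.0)
    return i9 / ryzen


for serial_fraction in (0.0, 0.1, 0.5, 1.0):
    ratio = i9_over_ryzen(serial_fraction)
    print(f"serial part {serial_fraction:.0%}: i9 takes {ratio:.2f}x as long")
```

With these numbers the model puts the i9 somewhere between roughly 1.07x (perfectly parallel work) and 1.6x (fully serial work) slower per step. That is an order-of-magnitude guide at best, but it suggests raw CPU differences alone are unlikely to account for a 3-4x drop in FPS.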

Another would be death spiraling. Even if the i9 is only slightly slower than the Ryzen, this can trigger the need to call FixedUpdate() more often, which means your simulation is updated more than once per frame. That makes the frame take even longer, which in turn requires even more updates in the next one. If the profiler shows more than 1 FixedUpdate() call per frame then this is most likely the cause. Increasing your fixed timestep and/or reducing your max allowed timestep will improve things (both settings can be found in Unity's Time settings, under Project Settings).
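To make the death spiral concrete, here is a minimal Python model of a fixed-timestep loop. It is a sketch of the generic accumulator scheme, not Unity's actual player loop: the function names, the 5 ms render cost and the 12 ms / 22 ms per-step costs are hypothetical values chosen for illustration. fixed_dt and max_dt play the role of Unity's fixed timestep and max allowed timestep.

```python
def simulate(fixed_dt, max_dt, step_cost, render_cost=0.005, frames=200):
    """Model a fixed-timestep loop; return (frame_time, steps) for each frame.

    All costs are in seconds and are hypothetical, not measured values.
    """
    accumulator = 0.0
    frame_time = render_cost
    history = []
    for _ in range(frames):
        # Time to simulate this frame: the previous frame's duration, clamped to max_dt.
        accumulator += min(frame_time, max_dt)
        steps = 0
        while accumulator >= fixed_dt:
            accumulator -= fixed_dt
            steps += 1
        # Every physics step costs real time, which lengthens this frame.
        frame_time = render_cost + steps * step_cost
        history.append((frame_time, steps))
    return history


def average_fps(history, last=50):
    """Average frames per second over the last few frames of a run."""
    recent = [frame_time for frame_time, _ in history[-last:]]
    return len(recent) / sum(recent)


DEFAULT_MAX_DT = 1.0 / 3.0  # Unity's default max allowed timestep is about 0.333 s

scenarios = {
    "step cheaper than fixed_dt": simulate(0.02, DEFAULT_MAX_DT, step_cost=0.012),
    "step costlier than fixed_dt": simulate(0.02, DEFAULT_MAX_DT, step_cost=0.022),
    "larger fixed_dt": simulate(0.05, DEFAULT_MAX_DT, step_cost=0.022),
    "smaller max_dt": simulate(0.02, 0.04, step_cost=0.022),
}
for name, history in scenarios.items():
    worst = max(steps for _, steps in history)
    print(f"{name}: up to {worst} steps per frame, about {average_fps(history):.0f} FPS")
```

In this model, a step that costs slightly more than the time it simulates makes the number of steps per frame grow until max_dt caps it, while either a larger fixed_dt or a smaller max_dt keeps it bounded. Note that with a smaller max_dt the simulation falls behind real time (it runs in slow motion) instead of dropping the frame rate further.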

Quote:We tried benchmarking the new machine with 3D Mark and it seems to work as expected.

3DMark is primarily a graphics/GPU benchmark, so it won't tell you much about your CPU, which is where the physics simulation is performed. Some of the newest benchmarks do include a CPU-specific test, but you need to focus on that. Also, depending on the time stepping scheme their demos use, there can be a big difference in performance compared with Unity games (Unity uses fixed timestepping, which can at times be more costly than, say, a semi-fixed or variable timestepping scheme).

Let me know if I can be of further help,
kind regards,
#3
Thanks a lot!

I think this is most probably what the issue is. I am not very familiar with the profiler but I can see the FixedUpdate being called multiple times during the PlayerLoop. This is with the default FixedUpdate setting of 0.02. We have 34 Workers (36 threads minus the 2 Unity uses for main thread and rendering).

   

With a FixedUpdate setting of 0.05 things get better and we go up from ~5FPS to about 50FPS

   

I am finding it hard to decipher the profiler window, is there anything I should be looking at specifically?
I can see there is still quite a bit of idle time on each Worker for each frame?

Many thanks

Michele
#4
(18-10-2022, 11:34 AM)michele_lf Wrote: Thanks a lot!

I think this is most probably what the issue is. I am not very familiar with the profiler but I can see the FixedUpdate being called multiple times during the PlayerLoop. This is with the default FixedUpdate setting of 0.02. We have 34 Workers (36 threads minus the 2 Unity uses for main thread and rendering).

With a FixedUpdate setting of 0.05 things get better and we go up from ~5FPS to about 50FPS

Hi Michele,

Yes, that makes sense. This is what is called death spiraling, and it affects all fixed timestepping schemes like the one used by Unity. Reducing the max allowed timestep can also help, by limiting the number of times FixedUpdate() is called per frame in cases where the simulation is too heavy.


(18-10-2022, 11:34 AM)michele_lf Wrote: I am finding it hard to decipher the profiler window, is there anything I should be looking at specifically?
I can see there is still quite a bit of idle time on each Worker for each frame?

Idle time in the worker threads means the main thread is busy doing something. It can be just preparing new work for the workers, or doing some work of its own.

There are very few things in Obi that aren't parallelized. One of them, however, is force areas, which get updated in the solver's BeginStep(). Are you using wind/force areas together with large cloths in your scene?

kind regards,
#5
Ha! That is exactly what we are doing. Fairly large cloth with ~6600 particles (vertices).

We are using a custom version of the ObiAmbientForceZone, where we use 3D Perlin noise and sum it with various different forces to obtain the desired behaviour. We then use actor.solver.wind to apply the result to each particle.

Everything else in the solver is set to parallel with 1 iteration for now. We are using self collisions, surface collisions are prohibitive at the moment.

We were thinking of shifting the force calculations to the Jobs system or to a compute shader to increase performance.

Any thoughts or reasons why this is a bad idea?

By the way this asset is phenomenal!!!

Many thanks

M
#6
(18-10-2022, 11:27 PM)michele_lf Wrote: Ha! That is exactly what we are doing. Fairly large cloth with ~6600 particles (vertices).

We are using a custom version of the ObiAmbientForceZone, where we use 3D Perlin noise and sum it with various different forces to obtain the desired behaviour. We then use actor.solver.wind to apply the result to each particle.

Evaluating 3D Perlin noise on the CPU in the main thread is extremely expensive (unless you only evaluate it at a handful of points). The default implementation only evaluates 2D Perlin noise once per frame, to get time-varying wind intensity turbulence. If you're evaluating noise at each particle's position to get spatially-varying directional noise, you should definitely write a job to parallelize this.

The force zone's ApplyForcesToActor() method is called during the solver's OnBeginStep callback, as you can see in the base class for force zones (ObiExternalForce.cs). So this is where the 15 ms during BeginStep are being spent.
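The reason per-particle noise parallelizes well is that each particle's force depends only on its own position. The Python sketch below illustrates just that data decomposition, under loud assumptions: pseudo_noise is a cheap stand-in and not real Perlin noise, all function names are hypothetical, and none of this is Obi's or Unity's API. In Unity the natural counterpart would be something like an IJobParallelFor over the particle indices, ideally Burst-compiled. Python threads will not actually speed this up (because of the GIL); the point is only that splitting the particles into independent chunks gives the same result as the serial loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pseudo_noise(x, y, z):
    """Deterministic stand-in for 3D Perlin noise, returns a value in [-1, 1)."""
    v = math.sin(x * 12.9898 + y * 78.233 + z * 37.719) * 43758.5453
    return 2.0 * (v - math.floor(v)) - 1.0


def wind_for_particle(position, time, strength):
    """Directional wind force for one particle; depends on no other particle."""
    x, y, z = position
    return (
        strength * pseudo_noise(x, y, z + time),
        strength * pseudo_noise(y, z, x + time),
        strength * pseudo_noise(z, x, y + time),
    )


def wind_serial(positions, time, strength):
    """What a main-thread loop does: one particle after another."""
    return [wind_for_particle(p, time, strength) for p in positions]


def wind_chunked(positions, time, strength, workers=4):
    """Same computation, split into independent chunks handed to workers."""
    chunk = max(1, math.ceil(len(positions) / workers))
    chunks = [positions[i:i + chunk] for i in range(0, len(positions), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: wind_serial(c, time, strength), chunks)
        return [force for part in parts for force in part]


# Roughly the cloth size mentioned in the thread (~6600 particles).
positions = [(i * 0.1, 0.0, (i % 80) * 0.1) for i in range(6600)]
forces = wind_chunked(positions, time=1.5, strength=2.0)
print(len(forces), "forces computed")
```

Because no chunk reads or writes another chunk's data, no locking is needed, which is what makes this kind of work a good fit for a job or a compute shader.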

(18-10-2022, 11:27 PM)michele_lf Wrote: Everything else in the solver is set to parallel with 1 iteration for now. We are using self collisions, surface collisions are prohibitive at the moment.

Note that "parallel" constraint evaluation doesn't have anything to do with multithreading: both parallel and sequential methods are multithreaded. Parallel in this context refers to the order in which multiple constraints are applied to a single particle, which affects convergence speed and force distribution. See "Evaluation mode" at the very end of: http://obi.virtualmethodstudio.com/manua...olver.html
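To illustrate the difference, here is a tiny single-threaded Python sketch with three particles on a line and two distance constraints. It assumes that "sequential" behaves like a Gauss-Seidel pass (each constraint immediately sees the previous one's corrections) and "parallel" like a Jacobi pass (all corrections are computed from the same state, then averaged per particle). This is a generic illustration with made-up function names, not Obi's implementation.

```python
def constraint_errors(x, constraints):
    """Signed stretch of each (i, j, rest_length) distance constraint."""
    return [abs(x[j] - x[i]) - rest for i, j, rest in constraints]


def sequential_iteration(x, constraints):
    """Gauss-Seidel style: apply each correction before evaluating the next."""
    x = list(x)
    for i, j, rest in constraints:
        d = x[j] - x[i]
        direction = 1.0 if d >= 0 else -1.0
        error = abs(d) - rest
        x[i] += 0.5 * error * direction
        x[j] -= 0.5 * error * direction
    return x


def parallel_iteration(x, constraints):
    """Jacobi style: gather all corrections first, then apply their average."""
    delta = [0.0] * len(x)
    count = [0] * len(x)
    for i, j, rest in constraints:
        d = x[j] - x[i]
        direction = 1.0 if d >= 0 else -1.0
        error = abs(d) - rest
        delta[i] += 0.5 * error * direction
        delta[j] -= 0.5 * error * direction
        count[i] += 1
        count[j] += 1
    return [xi + (dx / c if c else 0.0) for xi, dx, c in zip(x, delta, count)]


# Three equal-mass particles, both links stretched to twice their rest length.
start = [0.0, 2.0, 4.0]
links = [(0, 1, 1.0), (1, 2, 1.0)]

print("sequential:", constraint_errors(sequential_iteration(start, links), links))
print("parallel:  ", constraint_errors(parallel_iteration(start, links), links))
```

After one iteration the sequential pass leaves less total error but distributes it unevenly (it depends on constraint order), while the parallel pass leaves more error but spreads it symmetrically. Neither function uses threads, which is the point: the two modes differ in how corrections are combined, not in how the work is scheduled.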

(18-10-2022, 11:27 PM)michele_lf Wrote: By the way this asset is phenomenal!!!

Thank you for your kind words! I'm currently working on making it even more phenomenal :)

kind regards,
#7
Thank you so much! It sounds like we have a strategy :)

We'll let you know how we get on!

Very best wishes

Michele