05-08-2018, 10:18 AM
(This post was last modified: 05-08-2018, 10:35 AM by josemendez.)
(03-08-2018, 09:32 PM)ibbybn Wrote: Sounds indeed like it wouldn't help at all except giving us near the same kind of performance while having access to the whole code.
However, wouldn't porting to C# and optimising this way also give instant Playstation/Xbox compatibility?
Yes it would, but that's only half of the story.
There's tons of stuff that would be impossible to achieve in C# and whose absence would severely hurt performance/functionality:
For starters, the STL is quite a bit faster than C#'s generic List or even plain arrays: std::copy, std::fill and others are just much faster than the available C# counterparts most of the time. Also, math operations run at least 2x faster in C than in C#, without any specific optimizations (we've profiled). We use our own math library internally; in the C# version we used Vector3, Vector4, Matrix4x4, and the like. Imho they're fast enough for simple math, but a huge no-no when it comes to heavy, high-performance stuff.
Our job/task system has a specific kind of task tailored towards many small independent chunks of work. For instance, when updating thousands of fluid particles in parallel, you get the best load balancing by using very small tasks (ideally one per particle). However, many small tasks can clog the thread pool. Instead of pushing thousands of tasks to the task scheduler (which would increase thread contention), we have a single task which contains a list of chunks and an atomic counter. Every time a thread picks up this task, it increments the atomic counter and processes the chunk at the index it got. This repeats until all chunks have been processed, at which point the entire task is removed from the scheduler. This effectively allows for lock-free concurrent processing of many small independent tasks (which is a very common case in Obi), which is unachievable in Unity as of now, even with ECS/Burst.
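To make the idea more concrete, here's a minimal C++ sketch of the "one task, many chunks, one atomic counter" pattern. This is not Obi's actual code: `ChunkedTask`, `process_next_chunk` and `run_on_threads` are made-up names, plain `std::thread` workers stand in for a real task scheduler, and `chunk_size` is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration: a single task that hands out chunks of work
// through an atomic counter instead of enqueuing one task per chunk.
struct ChunkedTask {
    std::size_t item_count;   // total number of items (e.g. particles)
    std::size_t chunk_size;   // items per chunk, assumed > 0
    std::function<void(std::size_t, std::size_t)> body;  // processes [begin, end)
    std::atomic<std::size_t> next_chunk{0};               // next unclaimed chunk

    ChunkedTask(std::size_t items, std::size_t chunk,
                std::function<void(std::size_t, std::size_t)> fn)
        : item_count(items), chunk_size(chunk), body(std::move(fn)) {}

    std::size_t chunk_count() const {
        return (item_count + chunk_size - 1) / chunk_size;  // round up
    }

    // Claims one chunk and processes it. Returns false once every chunk
    // has been claimed, so the caller knows the task is exhausted.
    bool process_next_chunk() {
        // fetch_add gives each caller a unique index without any lock.
        const std::size_t c = next_chunk.fetch_add(1, std::memory_order_relaxed);
        if (c >= chunk_count()) return false;
        const std::size_t begin = c * chunk_size;
        const std::size_t end = std::min(begin + chunk_size, item_count);
        body(begin, end);
        return true;
    }
};

// Stand-in for a scheduler: every worker keeps picking up the same task
// until no chunks are left. join() makes the results visible to the caller.
inline void run_on_threads(ChunkedTask& task, unsigned thread_count) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < thread_count; ++i)
        workers.emplace_back([&task] { while (task.process_next_chunk()) {} });
    for (auto& w : workers) w.join();
}
```

Something like `ChunkedTask task(particle_count, 64, update_range); run_on_threads(task, 4);` would then process all particles, with faster threads simply claiming more chunks. The only shared mutable state is the counter, which is why the chunks have to be independent of each other.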
For character clothing, we completely override Unity's mesh skinning system as it is way too slow. Ours uses the task scheduler and SIMD instructions. Even then, it is extremely math-heavy and relying on C#'s math classes would not yield any performance gains (more likely, the opposite).
We also have different code paths for NEON, SSE and AVX. This allows us to maximize performance for each target platform. Relying on Unity for this would be too risky, as we would lose a lot of control. It is not clear which kinds of SIMD instructions Burst can emit, or how good it is at optimizing certain processing patterns. For instance, calculating distances between pairs of points in groups of 4/8 can be done very efficiently using matrices and row reductions, but would Burst identify this?
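As a rough illustration of what a hand-written SIMD code path looks like, here's a small C++ sketch that computes the distances for 4 pairs of points at once. It is only an assumption-laden example, not Obi's implementation: `Points4` and `distances4` are hypothetical names, the points are assumed to be stored one coordinate per array (which sidesteps the matrix/row-reduction formulation mentioned above), and only an SSE path plus a scalar fallback are shown. A NEON or AVX path would follow the same shape.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical layout: four points stored as one array per coordinate,
// so each SSE register holds the same coordinate of four points.
struct Points4 {
    float x[4];
    float y[4];
    float z[4];
};

// Writes the distance between a[i] and b[i] into out[i], for i = 0..3.
inline void distances4(const Points4& a, const Points4& b, float out[4]) {
#if SKETCH_HAS_SSE
    // SSE path: every instruction works on four pairs at once.
    const __m128 dx = _mm_sub_ps(_mm_loadu_ps(a.x), _mm_loadu_ps(b.x));
    const __m128 dy = _mm_sub_ps(_mm_loadu_ps(a.y), _mm_loadu_ps(b.y));
    const __m128 dz = _mm_sub_ps(_mm_loadu_ps(a.z), _mm_loadu_ps(b.z));
    const __m128 sq = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy)),
        _mm_mul_ps(dz, dz));
    _mm_storeu_ps(out, _mm_sqrt_ps(sq));
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) {
        const float dx = a.x[i] - b.x[i];
        const float dy = a.y[i] - b.y[i];
        const float dz = a.z[i] - b.z[i];
        out[i] = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
#endif
}
```

With explicit paths like this you know exactly which instructions run on each platform, which is the kind of control that is hard to verify when the vectorization is left to a compiler.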
There are many more reasons why we switched from C# to C/C++ (Obi 1.0 was pure C#), so it's not likely we will turn back unless we see proof that performance will at least match that of our current implementation. The multiplatform argument is a good one, but we prefer to have an as-good-as-possible product running on a few platforms rather than a mediocre one running on all of them.