ECS / Jobs / using the Burst Compiler planned?
#1
Hey Obi Team,
I'm wondering if these Unity 2018 features are planned to be used. I guess it would mean porting more code from C++ to C#?
It would be interesting to see the performance gains. It seems like a perfect fit for particle systems running on the CPU.
I ported a simple boid system (fish swarm) to a Jobs/ECS hybrid and got a 20x performance gain so far.
#2
(02-08-2018, 10:28 PM)ibbybn Wrote: I'm wondering if these Unity 2018 features are planned to be used. [...]

Hi there,

Obi already uses something very similar to ECS internally, as it was designed with the same performance goals in mind:

- Particles are laid out in memory in SoA fashion: separate arrays for positions, velocities, etc. Traversal is linear where possible, to maximize cache hits and throughput, and these arrays are 16-byte aligned to make use of SIMD (SSE) instructions (see the sketch after this list). This is the same core idea behind ECS. It is nothing new: CPU particle systems have been designed this way for as long as cache memory has existed.

- A work-stealing job scheduler dynamically distributes jobs across all cores. Jobs (tasks, in Obi terminology) are organized in a graph-like structure to avoid race conditions and make sure things happen in the order they're supposed to. This is the same technology behind Intel's TBB, Cilk, and C#'s TPL; again, nothing new.
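
To illustrate the SoA point above, here is a minimal sketch (not Obi's actual data structures; the type and function names are made up for the example):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical SoA particle layout: one contiguous array per attribute,
// instead of an array of particle structs. Names and the simple integration
// pass are assumptions for illustration, not Obi's actual code.
struct alignas(16) Float4 { float x, y, z, w; };

struct ParticleBuffersSoA
{
    std::vector<Float4> positions;   // traversed linearly -> cache friendly
    std::vector<Float4> velocities;
    std::vector<float>  inverseMasses;
};

// A pass that only needs positions and velocities streams through exactly
// the memory it touches, which is what makes prefetching and SIMD effective.
void integrate(ParticleBuffersSoA& p, float dt)
{
    for (std::size_t i = 0; i < p.positions.size(); ++i)
    {
        p.positions[i].x += p.velocities[i].x * dt;
        p.positions[i].y += p.velocities[i].y * dt;
        p.positions[i].z += p.velocities[i].z * dt;
    }
}
```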

Moreover, all of this is written in C/C++ using hand-optimized SIMD. The Burst compiler is essentially designed to translate C# code into efficient, vector-friendly native code, which is what we already have by default.

Because of this, there are no gains to be reaped from ECS or Burst in our case. It is very unlikely we will make use of them.
#3
(03-08-2018, 03:45 PM)josemendez Wrote: Obi already uses something very similar to ECS internally, as it was designed with the same performance goals in mind. [...]

It does indeed sound like it wouldn't help much, other than giving roughly the same performance while having access to the whole code base.
However, wouldn't porting to C# and optimising this way also give instant PlayStation/Xbox compatibility?
#4
(03-08-2018, 09:32 PM)ibbybn Wrote: It does indeed sound like it wouldn't help much, other than giving roughly the same performance while having access to the whole code base.
However, wouldn't porting to C# and optimising this way also give instant PlayStation/Xbox compatibility?

Yes it would, but that's only half of the story.

There's a lot that would be impossible to achieve in C#, and whose absence would severely hurt performance and functionality:

For starters, using the STL is quite a bit faster than using C#'s generic List or even plain arrays: std::copy, std::fill and friends are much faster than the available C# counterparts most of the time. Also, math operations run at least 2x faster in C than in C#, without any specific optimizations (we've profiled this). We use our own math library internally; in C# we used Vector3, Vector4, Matrix4x4 and the like. In my opinion they're fast enough for simple math, but a huge no-no when it comes to heavy, high-performance work.
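
A minimal sketch of the kind of bulk operations mentioned above (illustrative only; the array names are made up): std::fill and std::copy on contiguous buffers of trivially copyable types typically lower to memset/memmove, which is hard to match from managed code.

```cpp
#include <algorithm>
#include <vector>

// Bulk clear and bulk copy over contiguous float buffers.
// Assumes positions has at least as many elements as source.
void resetAndCopy(std::vector<float>& velocities,
                  const std::vector<float>& source,
                  std::vector<float>& positions)
{
    std::fill(velocities.begin(), velocities.end(), 0.0f);      // -> memset
    std::copy(source.begin(), source.end(), positions.begin()); // -> memmove
}
```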

Our job/task system has a specific kind of task tailored towards many small independent chunks of work. For instance, when updating thousands of fluid particles in parallel, you get the best load balancing by using very small tasks (ideally one per particle). However, many small tasks can potentially clog the thread pool. So instead of pushing thousands of tasks to the task scheduler (which would increase thread contention), we have a single task which contains a list of chunks and an atomic counter. Every time a thread picks up this task, it increments the atomic counter and processes one chunk. This happens until all chunks have been processed, at which point the entire task is removed from the scheduler. This effectively allows lock-free concurrent processing of many small independent work items (a very common case in Obi), which is unachievable in Unity as of now, even with ECS/Burst.
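
Here is a minimal sketch of that idea: a single shared atomic counter hands out chunk indices to whichever worker asks next. This is a simplified stand-in using std::thread and std::atomic, not Obi's actual scheduler.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder per-chunk work (e.g. updating a range of particles).
void processChunk(std::vector<float>& data, std::size_t chunk, std::size_t chunkSize)
{
    std::size_t begin = chunk * chunkSize;
    std::size_t end   = std::min(begin + chunkSize, data.size());
    for (std::size_t i = begin; i < end; ++i)
        data[i] *= 0.99f;
}

// One logical task, many chunks: workers repeatedly grab the next chunk
// index from a lock-free atomic counter until all chunks are consumed.
void runChunkedTask(std::vector<float>& data, std::size_t chunkSize, unsigned workers)
{
    const std::size_t chunkCount = (data.size() + chunkSize - 1) / chunkSize;
    std::atomic<std::size_t> nextChunk{0};

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&] {
            for (std::size_t c = nextChunk.fetch_add(1); c < chunkCount;
                 c = nextChunk.fetch_add(1))
                processChunk(data, c, chunkSize);
        });

    for (auto& t : pool)
        t.join();
}
```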

For character clothing, we completely override Unity's mesh skinning system, as it is way too slow. Ours uses the task scheduler and SIMD instructions. Even then, it is extremely math-heavy, and relying on C#'s math classes would not yield any performance gains (more likely the opposite).
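
For context on why skinning is so math-heavy, here is the scalar core of standard linear blend skinning: every vertex blends up to four bone matrices every frame. This is a textbook illustration only, not Obi's implementation (which runs per-chunk on the task scheduler and uses SIMD).

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[16]; }; // column-major 4x4 bone matrix

// Transform a point by a 4x4 matrix (assuming w = 1).
static Vec3 transformPoint(const Mat4& M, const Vec3& p)
{
    return { M.m[0] * p.x + M.m[4] * p.y + M.m[8]  * p.z + M.m[12],
             M.m[1] * p.x + M.m[5] * p.y + M.m[9]  * p.z + M.m[13],
             M.m[2] * p.x + M.m[6] * p.y + M.m[10] * p.z + M.m[14] };
}

// Linear blend skinning: 4 bone indices and 4 weights per vertex.
void skinVertices(const std::vector<Vec3>& restPositions,
                  const std::vector<int>& boneIndices,
                  const std::vector<float>& boneWeights,
                  const std::vector<Mat4>& boneMatrices,
                  std::vector<Vec3>& skinned)
{
    for (std::size_t v = 0; v < restPositions.size(); ++v)
    {
        Vec3 result{0.0f, 0.0f, 0.0f};
        for (int k = 0; k < 4; ++k)
        {
            float w = boneWeights[v * 4 + k];
            Vec3  p = transformPoint(boneMatrices[boneIndices[v * 4 + k]],
                                     restPositions[v]);
            result.x += w * p.x;
            result.y += w * p.y;
            result.z += w * p.z;
        }
        skinned[v] = result;
    }
}
```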

We also have different code paths for NEON, SSE and AVX, which lets us maximize performance on each target platform. Relying on Unity for this would be too risky, as we would lose a lot of control. It is not clear which kinds of SIMD instructions Burst can emit, or how good it is at optimizing certain processing patterns. For instance, distances between pairs of points can be calculated very efficiently in groups of 4/8 using matrices and row reductions, but would Burst identify this?
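
As a rough illustration of the "groups of 4" idea (not Obi's code, just the general SSE pattern): with positions stored SoA, one register holds the x components of four points, another the y, another the z, and four squared distances fall out of a handful of instructions.

```cpp
#include <cstddef>
#include <immintrin.h> // SSE intrinsics

// Computes squared distances between point pairs (a[i], b[i]), four at a time.
// ax/ay/az and bx/by/bz are SoA coordinate arrays; outSqDist receives the
// results. A scalar tail loop for count % 4 leftovers is omitted for brevity.
void squaredDistances4(const float* ax, const float* ay, const float* az,
                       const float* bx, const float* by, const float* bz,
                       std::size_t count, float* outSqDist)
{
    for (std::size_t i = 0; i + 4 <= count; i += 4)
    {
        __m128 dx = _mm_sub_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i));
        __m128 dy = _mm_sub_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i));
        __m128 dz = _mm_sub_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i));

        __m128 sq = _mm_add_ps(_mm_add_ps(_mm_mul_ps(dx, dx),
                                          _mm_mul_ps(dy, dy)),
                               _mm_mul_ps(dz, dz));
        _mm_storeu_ps(outSqDist + i, sq);
    }
}
```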

There are many more reasons why we switched from C# to C/C++ (Obi 1.0 was pure C#), so it's not likely we will turn back unless we see proof that performance will at least match that of our current implementation. The multiplatform argument is a good one, but we would rather have an as-good-as-possible product running on a few platforms than a mediocre one running on all of them.
#5
This article discusses how Unity is addressing the math and C# container performance issues in Burst

http://lucasmeijer.com/posts/cpp_unity/
#6
Hi,

We've been testing ECS more thoroughly, to determine if it would be beneficial to port Obi.

Unfortunately, the results have been quite underwhelming so far. In very simple tests (multiple chains and grids of distance constraints), ECS performance is invariably around 50%-70% of our current C++ implementation. I'm not quite sure why that is, but my best guess is that our hand-optimized SIMD accounts for the difference. Everything else in ECS conceptually seems to mirror Obi's underlying framework.
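
For context on what was benchmarked: a distance constraint in a position-based solver pulls two particles back toward a rest distance. Below is a textbook PBD-style projection for a single constraint (illustrative only, not Obi's solver code); chains and grids are just many of these solved iteratively.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Projects a single distance constraint between particles p1 and p2.
// Each particle is displaced proportionally to its inverse mass so that
// their separation moves toward restLength.
void projectDistanceConstraint(Vec3& p1, Vec3& p2,
                               float invMass1, float invMass2,
                               float restLength)
{
    Vec3  d{p2.x - p1.x, p2.y - p1.y, p2.z - p1.z};
    float len  = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    float wSum = invMass1 + invMass2;
    if (len < 1e-6f || wSum <= 0.0f)
        return;

    // Signed violation of the constraint, distributed by inverse mass.
    float s = (len - restLength) / (len * wSum);

    p1.x += invMass1 * s * d.x;  p1.y += invMass1 * s * d.y;  p1.z += invMass1 * s * d.z;
    p2.x -= invMass2 * s * d.x;  p2.y -= invMass2 * s * d.y;  p2.z -= invMass2 * s * d.z;
}
```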

Will investigate further though, as I believe ECS has great potential and would open up Obi to a lot of new platforms.
#7
Hi,

As you know, we've been evaluating ECS for use with Obi, and we have now managed to get a basic ECS version of Obi running within 5% of our current performance. This means there is a real possibility that we will consider porting Obi to ECS in the future. It's still too soon to give more details, but there is hope.

This would mean two things: automatic support for all platforms, and full engine source code included.
#8
That would be amazing!
#9
(17-01-2019, 05:19 PM)josemendez Wrote: ...we have now managed to get a basic ECS version of Obi running within 5% of our current performance. [...]

Great news! I think this will make many people reconsider who thought twice before buying because of console support. Unity also wrote that they want Burst to be faster than C++ once it's out of preview, so hopefully there are still some speed gains to come.
Can I ask what exactly made the last jump from 50-70% to this? I'm also converting lots of code to this right now.
#10
(17-01-2019, 10:50 PM)ibbybn Wrote: Can I ask what exactly made the last jump from 50-70% to this? [...]

Using the chunk iteration API instead of mindlessly injecting stuff around made a big difference in performance for us. However, it felt quite dirty to use, so I hope it evolves into something less verbose.