Bug / Crash  Invalid memory access using libOni backend
#1
I recently switched from the Burst backend to libOni due to a crash that would occur late in the gameplay (posted here).
It's been a while since I last tried a long gameplay session, but unfortunately libOni seems to be affected as well. About 35 minutes into gameplay, this "crash to desktop" occurred:


Code:
========== OUTPUTTING STACK TRACE ==================

0x00007FFC8E5A84F0 (libOni) Ordinal0
0x00007FFC8E5EB319 (libOni) GetPointCloudAnisotropy
0x00007FFC8E5F054A (libOni) GetPointCloudAnisotropy
0x00007FFC8E5F13B5 (libOni) GetPointCloudAnisotropy
0x00007FFC8E5A74A6 (libOni) Ordinal0
0x00007FFC8E5C2EDF (libOni) UpdateColliderGrid
0x0000012A65040C6C (Mono JIT Code) (wrapper managed-to-native) Oni:UpdateColliderGrid (single)
0x0000012A65040B8B (Mono JIT Code) [C:\Dev\Wrestle\Assets\Plugins\Obi\Scripts\Common\Backends\Oni\OniColliderWorld.cs:36] Obi.OniColliderWorld:UpdateWorld (single)
0x0000012A3E2C46CF (Mono JIT Code) [C:\Dev\Wrestle\Assets\Plugins\Obi\Scripts\Common\Collisions\ObiColliderWorld.cs:383] Obi.ObiColliderWorld:UpdateWorld (single)
0x0000012A3E2C20FB (Mono JIT Code) [C:\Dev\Wrestle\Assets\Plugins\Obi\Scripts\Common\Updaters\ObiUpdater.cs:58] Obi.ObiUpdater:BeginStep (single)
0x0000012A3E2C180B (Mono JIT Code) [C:\Dev\Wrestle\Assets\Plugins\Obi\Scripts\Common\Updaters\ObiFixedUpdater.cs:48] Obi.ObiFixedUpdater:FixedUpdate ()
0x0000012A122F2448 (Mono JIT Code) (wrapper runtime-invoke) object:runtime_invoke_void__this__ (object,intptr,intptr,intptr)
0x00007FFC655DE034 (mono-2.0-bdwgc) mono_jit_set_domain
0x00007FFC6551E724 (mono-2.0-bdwgc) mono_object_get_virtual_method
0x00007FFC6551E8BC (mono-2.0-bdwgc) mono_runtime_invoke

The GetLastError output that Unity produces for crashes again indicates an invalid memory access, much like for the Burst crash I reported.

This happens in a development build of Unity 2021.2.7 on Windows 10, 64-bit. I've also encountered this same crash in the editor once, which was when I was working on procedural generation that created Obi colliders and rigidbodies.

Again, like in the thread related to the Burst backend, I sadly cannot give a minimal example to reproduce. There's no specific action that seems to lead to a crash, and everything is just fine for over half an hour. The scenario is roughly the same: there are 12 Obi ropes with about 16 particles each. These are there from the beginning and always stay enabled. There are also quite a few Obi rigidbodies and colliders (around 300 seems plausible) that can interact with the ropes and are enabled and disabled at irregular intervals depending on their game states.

Now that both backends seem to be crashing for me at some point, it'd be crucial to have some kind of hint as to what may be going wrong. I'd be ready to try some debug build of libOni, maybe with debug information included, to get a more meaningful stack trace - unless you can already work with the above? The goal has to be that whatever happens, there should never be a crash to desktop, so I also wouldn't mind trying a build with manual assertions and error logging, even if that costs performance.

Many thanks in advance!

I'm also going to post in Made with Obi now that my project is public. The videos are teasers for the game, but they should give you a better idea as to what I'm even doing.
EDIT: See here, and maybe also this video, where there are many more characters in the game, which describes the crashing scenario much better.
#2
Hi there!

Quote:0x00007FFC8E5EB319 (libOni) GetPointCloudAnisotropy

GetPointCloudAnisotropy() is only called when there are fluid particles present in a solver. I assume you're only using ObiRope, so this method should never actually be called. The fact that it appears in the stack trace is a very strong indicator that this issue stems from memory corruption: use of uninitialized memory, out-of-bounds access on unmanaged memory buffers, etc.

You mention this happens in both backends as well, which makes sense since both backends have identical memory access patterns.

The bad news is that despite the diagnosis being straightforward, these issues are extremely hard to debug since they can be caused by *literally anything* in your project, including code that's completely unrelated to Obi. The stack trace points you to where the program crashed, but there's no indication of where memory is being incorrectly handled.

The only ways to debug this are to use a memory debugger (Valgrind) and/or to slowly dissect the project in a binary-search fashion until the issue disappears, then re-add things until the issue is narrowed down.

Ropes themselves do not perform any kind of memory handling at runtime: once instantiated, they do not allocate anything, nor are they moved around in memory. It's safe to assume that ropes are not the culprit. You do mention that there are many rigidbodies/colliders being enabled/disabled at runtime, and that the issue happened in-editor while working on procedural generation of these: that's a good starting point.

Could you share more details about the pattern in which these are enabled/disabled? Ideally, share the script(s) you use to procedurally generate these?

kind regards,
#3
(14-02-2022, 09:03 AM)josemendez Wrote: Hi there!


GetPointCloudAnisotropy() is only called when there are fluid particles present in a solver. I assume you're only using ObiRope, so this method should never actually be called. The fact that it appears in the stack trace is a very strong indicator that this issue stems from memory corruption: use of uninitialized memory, out-of-bounds access on unmanaged memory buffers, etc.

You mention this happens in both backends as well, which makes sense since both backends have identical memory access patterns.

The bad news is that despite the diagnosis being straightforward, these issues are extremely hard to debug since they can be caused by *literally anything* in your project, including code that's completely unrelated to Obi. The stack trace points you to where the program crashed, but there's no indication of where memory is being incorrectly handled.

The only ways to debug this are to use a memory debugger (Valgrind) and/or to slowly dissect the project in a binary-search fashion until the issue disappears, then re-add things until the issue is narrowed down.

I see. I'm not sure what Unity does by itself, but the only unmanaged code I'm using apart from libOni (or the Burst backend) is the Steamworks API, and that wasn't being used (to my knowledge - I'll make sure about it). So whatever memory corruption occurs should be caused by Obi or Steamworks; I'm not deep enough into Unity's Mono to know what managed code could possibly cause this (I am not using any other Burst code either).

But yes, these things are nasty to debug. Valgrind may be an option; I have never used it on Windows, but I'll have a look. Did you build the dll in "RelWithDebInfo" mode or similar? I ask because the stack trace apparently contains unmangled function names.

(14-02-2022, 09:03 AM)josemendez Wrote: Ropes themselves do not perform any kind of memory handling at runtime: once instantiated, they do not allocate anything, nor are they moved around in memory. It's safe to assume that ropes are not the culprit. You do mention that there are many rigidbodies/colliders being enabled/disabled at runtime, and that the issue happened in-editor while working on procedural generation of these: that's a good starting point.

Could you share more details about the pattern in which these are enabled/disabled? Ideally, share the script(s) you use to procedurally generate these?

In the Editor, it happened very sporadically as well. I should probably also note that I recently disabled domain reloading; I made sure my entire project supports this and it works fine, but of course, I don't know about the effect this may have on Obi. In any event, the crashes happened before I did this and also, this has absolutely no effect on builds, so it's very probably not to blame.

The ropes are generated procedurally as well, but only once in the beginning and then they're just there. I use Obi rigidbodies and colliders for the body parts of characters (capsules and boxes that belong to the ragdoll), as well as for the various weapons you see (steel chairs, baseball bats etc.), such that they may interact with the ropes. They are created once when the corresponding character or weapon is instantiated, and destroyed once the character leaves (after being eliminated from the game) or the weapon is destroyed (some can break). I don't have any special handling for destruction; I rely on ObiCollider's and ObiRigidbody's OnDestroy implementations here.

The creation looks roughly like this for characters:
Code:
_obiColliders = new ObiCollider[num];
_obiRigidbodies = new ObiRigidbody[num];
for (var i = 0; i < num; i++)
{
    var c = colliders[i];
    var oc = c.gameObject.AddComponent<ObiCollider>();
    oc.Thickness = Settings.Ragdoll.ObiColliderThickness;
    oc.Filter = ObiUtil.RagdollPartFilter;
    oc.enabled = _obiCollidersEnabled;
    _obiColliders[i] = oc;

    var r = rigidbodies[i];
    _obiRigidbodies[i] = r.gameObject.AddComponent<ObiRigidbody>();
}

Nothing special in my book. It looks about the same for weapons and only happens once in their Start method.
For characters, the colliders are only ever capsules and boxes, and each ObiRigidbody has exactly one ObiCollider; weapons may also have (convex) mesh colliders and furthermore multiple ObiColliders per ObiRigidbody.

Enabling and disabling happens very often, in fact. Basically, whenever I make a rigidbody kinematic in Unity, I disable its ObiColliders.
Characters become kinematic once they become controllable by the player, and non-kinematic when they ragdoll. This happens very often in gameplay. Weapons become kinematic once picked up and "attached" to the character, and they become non-kinematic again when being dropped.

Code:
var kinematic = _rigidbody.isKinematic;
foreach (var c in _colliders) c.obiCollider.enabled = c.collider.enabled && !kinematic;

As you see, I use the "enabled" flag for it - is there a different or preferred way?

After giving it some thought, here are more things that I could imagine causing issues:
  • I'm not enabling and disabling in FixedUpdate necessarily, but pretty much whenever. May this be an issue?
  • I keep the ObiRigidbodies enabled. Should I disable them as well?
  • There may be occasions where I enable or disable an ObiCollider and it is destroyed in the same frame; at least I can't rule it out right now. Could this cause desyncs of some sort?
  • I spotted some code for my characters that synchronizes the ObiRigidbodies' kinematicForParticles flags with the rigidbodies' isKinematic state. That is leftover code which I just removed, because I decided I don't want any rope interaction when rigidbodies are kinematic. This also happened as often as characters were switching between control and ragdoll.
#4
(14-02-2022, 08:59 PM)pdinklag Wrote: I should probably also note that I recently disabled domain reloading; I made sure my entire project supports this and it works fine, but of course, I don't know about the effect this may have on Obi.

Support for disabling domain reload was only recently added, in Obi 6.4. See:
http://obi.virtualmethodstudio.com/faq.html

Quote:Does it support Configurable Enter Play Mode?
Obi 6.4 and above do. Older versions do not support configurable enter play mode, since disabling domain reloading requires special handling of static data which wasn't available back then.

Prior to 6.4, disabling domain reload would result in runtime errors complaining about null collider/rigidbody references. It's unlikely that this is related to the issue at hand though. What Obi version are you using?

(14-02-2022, 08:59 PM)pdinklag Wrote: I see. I'm not sure what Unity does by itself, but the only unmanaged code I'm using apart from libOni (or the Burst backend) is the Steamworks API, and that wasn't being used (to my knowledge - I'll make sure about it). So whatever memory corruption occurs should be caused by Obi or Steamworks, I'm not that deep into Unity's Mono that I'd know what managed code could possibly cause this (I am not using any other Burst code either).

There's a pretty high chance that Obi is the culprit; however, we can't rule out other causes. I once had an especially nasty bug that only happened when installing Obi and Aura 2 together in the same project, and it had to be fixed by the Aura team.


(14-02-2022, 08:59 PM)pdinklag Wrote: I'm not enabling and disabling in FixedUpdate necessarily, but pretty much whenever. May this be an issue?

As long as you're not destroying colliders while Obi is updating the simulation (which happens during the ObiFixedUpdater's FixedUpdate() method), it should be fine. The only way to do that would be from the solver's OnSubstep event. Are you using any of Obi's callback events?

(14-02-2022, 08:59 PM)pdinklag Wrote: I keep the ObiRigidbodies enabled. Should I disable them as well?
No need, colliders reference rigidbodies, not the other way around. A rigidbody that's referenced by no active collider isn't even accessed at runtime.

(14-02-2022, 08:59 PM)pdinklag Wrote: There may be occasions where I enable or disable an ObiCollider and they are destroyed in the same frame, at least I can't rule it out now. Could this cause desyncs of sort?
Not that I'm aware of. Obi updates its internal collider/rigidbody representation once per FixedUpdate(). If you enable and then disable a collider within the same Update, for instance, Obi's physics engine will not even know.

(14-02-2022, 08:59 PM)pdinklag Wrote: I spotted some code for my characters that synchronizes the ObiRigidbodies' kinematicForParticles flags with the rigidbodies' isKinematic state. That is leftover code which I just removed, because I decided I don't want any rope interaction when rigidbodies are kinematic. This also happened as often as characters were switching between control and ragdoll.

The difference between a kinematic rigidbody and a non-kinematic one is just the inertia tensor/mass values. Memory layout does not change so there's a very slim chance that this has anything to do with the issue.

I'm going to try to replicate this in an empty project, by creating a scene that generates lots of colliders and randomly enables/disables them at runtime. It's very unlikely that this will reproduce the issue, so if necessary, would it be possible for you to share the project where this is happening so that I can take a closer look at it?
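
A stress test along those lines could be sketched roughly as follows. This is only a minimal sketch, not a script that was actually used in this thread: the class name ObiColliderStressTest and its fields are hypothetical, the default of 300 colliders mirrors the estimate from the first post, and it assumes that ObiCollider and ObiRigidbody can simply be added with AddComponent, as in the snippet from post #3.

```csharp
using UnityEngine;
using Obi;

// Hypothetical stress test (not part of Obi): creates many colliders that
// carry an ObiCollider and an ObiRigidbody, then randomly toggles the
// ObiColliders' "enabled" flag every frame.
public class ObiColliderStressTest : MonoBehaviour
{
    public int colliderCount = 300;         // assumption: roughly the count from the report
    public float toggleProbability = 0.1f;  // chance per collider per frame to flip "enabled"

    ObiCollider[] obiColliders;

    void Start()
    {
        obiColliders = new ObiCollider[colliderCount];
        for (var i = 0; i < colliderCount; i++)
        {
            // A cube primitive already comes with a BoxCollider.
            var go = GameObject.CreatePrimitive(PrimitiveType.Cube);
            go.transform.position = Random.insideUnitSphere * 5f;

            // Keep the bodies floating in place so they stay near the ropes.
            var rb = go.AddComponent<Rigidbody>();
            rb.useGravity = false;

            go.AddComponent<ObiRigidbody>();
            obiColliders[i] = go.AddComponent<ObiCollider>();
        }
    }

    void Update()
    {
        // Toggle outside of FixedUpdate on purpose, as the game does.
        foreach (var oc in obiColliders)
            if (Random.value < toggleProbability)
                oc.enabled = !oc.enabled;
    }
}
```

To get closer to the reported scenario, the same script could also destroy and re-create some of the objects at random intervals, since the game instantiates and destroys characters and weapons during play.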

kind regards,
#5
Thanks for your insight!


Quote:Prior to 6.4, disabling domain reload would result in runtime errors complaining about null collider/rigidbody references. It's unlikely that this is related to the issue at hand though. What Obi version are you using?

I believe I missed the release of 6.4, but the timing couldn't be better then. I was using 6.3 so far, but didn't experience any problems like that. Will update to 6.4 before doing anything further!




Quote:As long as you're not destroying colliders while Obi is updating the simulation (which happens during the ObiFixedUpdater's FixedUpdate() method), it should be fine. The only way to do that would be from the solver's OnSubstep event. Are you using any of Obi's callback events?

Nope, I really just wanted the rigidbody interaction as a visual effect of sorts; I don't need to react to any events right now.

Quote:Not that I'm aware of. Obi updates its internal collider/rigidbody representation once per FixedUpdate(). If you enable and then disable a collider within the same Update, for instance, Obi's physics engine will not even know.

That's how these things are supposed to work, good! :)

Quote:I'm going to try to replicate this in an empty project, by creating a scene that generates lots of colliders and randomly enables/disables them at runtime. It's very unlikely that this will reproduce the issue, so if necessary, would it be possible for you to share the project where this is happening so that I can take a closer look at it?

Independently of that, I decided to do the same, but rather than simple randomness, I will try to simulate the patterns from the actual project, just much faster, in the hope that it crashes in less than 35 minutes. :D


If I can't find a way to reproduce it this way, I'll think about how I can share the project.
#6
While the empty project with demo scripts hasn't gotten me any results yet, I have successfully provoked the crash in my game and in the Editor quite often today. I added very extensive logging of everything I ever do with ObiColliders (custom-made, so I know it's being flushed properly), and I believe that I am starting to see a pattern.
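
For readers wondering what "flushed properly" can look like: a log only helps with a crash to desktop if each entry reaches the disk before the process dies. The following is a minimal sketch of that idea, not the logger actually used here. The class name CrashSafeLog is hypothetical, and it assumes that flushing after every single line is an acceptable cost while debugging.

```csharp
using System;
using System.IO;

// Hypothetical helper: a log file that forces every line to disk, so the
// last entries survive a native crash that kills the process.
public sealed class CrashSafeLog : IDisposable
{
    readonly FileStream stream;
    readonly StreamWriter writer;

    public CrashSafeLog(string path)
    {
        // Append so that earlier sessions are preserved; allow other tools to read the file.
        stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        writer = new StreamWriter(stream);
    }

    public void Write(string message)
    {
        writer.WriteLine($"{DateTime.UtcNow:O} {message}");
        writer.Flush();      // push the StreamWriter's buffer into the FileStream
        stream.Flush(true);  // ask the OS to write its file buffers to disk
    }

    // Disposing the writer also closes the underlying stream.
    public void Dispose() => writer.Dispose();
}
```

In a Unity script, each call that touches an ObiCollider could then be preceded by something like log.Write($"frame {Time.frameCount}: disabling {name}"), so that the last lines of the file show what happened in the crashing frame.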

In 7 out of 7 cases now, during the frame where the crash occurs, some ObiColliders are disabled and a character is spawned, causing the creation (not just enabling, but instantiation) of 15 new ObiColliders, each with its own ObiRigidbody. I suspect that this causes some kind of data anomaly that results in the bad memory access. I'll try to give it a more direct shot tomorrow.

EDIT: That alone doesn't seem to be it. Would have been too easy. I'll probably switch to Burst and see if I can sneak in some more elaborate debugging there, because attaching Valgrind seems much more complicated and, from experience, it would probably be way too slow to actually get to a crashing scenario.