Bug / Crash  Invalid memory access using Burst backend
#1
On two different occasions now, I have experienced a crash (i.e., handled by Unity's crash handler, exiting the application) in a build with the Burst backend. Since both crashes have the exact same stack trace, it appears to be a systematic issue somewhere. See the attached log.

Note that the output from GetLastError is localized, and my OS is German. "Es wurde versucht, auf eine unzulässige Adresse zuzugreifen" translates to "There was an attempt to access an invalid address".

As can be seen in the log, the crash happens in Obi.BurstColliderWorld.IdentifyMovingColliders. Unfortunately, I cannot give any concrete steps to reproduce it. What I can tell, however, is that both times, the crash occurred when there were "many" Obi colliders and rigidbodies. I cannot tell how many precisely, but picture a fighting game where each character has 15 Obi colliders/rigidbodies (one for each part of a ragdoll) and there are always exactly 12 Obi rods in the scene (ropes of a ring). A rough guess of 16 characters seems plausible; that should give 240 colliders and rigidbodies, but not all of them are active (enabled flag) at all times.

So just to rule this out: Is there any known limit on the number of active Obi colliders / rigidbodies with the Burst backend?

What would speak against that is that the crash doesn't occur right away, but really only after an extended playtime of 10 to 15 minutes. This also makes it a bit hard to reproduce.

My environment:
Built player on Windows 10, 64-bit
Unity 2021.2.4
Burst 1.6.3
Collections 1.1.0
Jobs 0.11.0-preview6
Mathematics 1.2.5

Please let me know if I can provide any further meaningful information. I also have the crash dumps in case you can use them.

EDIT: I just tried a lengthy session using the Oni backend and everything seems fine, so this should be specific to Burst only.


Attached Files
.zip   obi_crash.zip (Size: 5.15 KB / Downloads: 3)
#2
Hi there,

Thanks for the detailed report!

Quote:So just to rule this out: Is there any known limit on the number of active Obi colliders / rigidbodies with the Burst backend?

Nope, there's no hard limit. If you use a ton of active colliders/rigidbodies it will slow things down, but there's no limit to the number of them you can have, or to how many colliders a particle can be in contact with simultaneously, for that matter.

Will investigate this and get back to you!
#3
As mentioned in the corresponding thread regarding libOni, I decided to switch back to Burst in the hope of being able to debug this using the source code.
Luckily, this was as easy as commenting out the BurstCompile attribute for IdentifyMovingColliders and provoking the crash in the Editor (which I still don't know how to do deterministically, but at least I now have the means to make it happen within a few minutes).

I got it down to this (Obi 6.4):
Code:
IndexOutOfRangeException: Index 0 is out of range of '0' Length.
Unity.Collections.NativeArray`1[T].FailOutOfRangeError (System.Int32 index) (at <ad50157ee00e45cdb3c8bd67012f8804>:0)
Unity.Collections.NativeArray`1[T].CheckElementReadAccess (System.Int32 index) (at <ad50157ee00e45cdb3c8bd67012f8804>:0)
Unity.Collections.NativeArray`1[T].get_Item (System.Int32 index) (at <ad50157ee00e45cdb3c8bd67012f8804>:0)
Obi.BurstColliderWorld+IdentifyMovingColliders.Execute (System.Int32 i) (at Assets/Plugins/Obi/Scripts/Common/Backends/Burst/Collisions/BurstColliderWorld.cs:154)
Unity.Jobs.IJobParallelForExtensions+ParallelForJobStruct`1[T].Execute (T& jobData, System.IntPtr additionalPtr, System.IntPtr bufferRangePatchData, Unity.Jobs.LowLevel.Unsafe.JobRanges& ranges, System.Int32 jobIndex) (at <ad50157ee00e45cdb3c8bd67012f8804>:0)
Unity.Jobs.JobHandle:ScheduleBatchedJobsAndComplete(JobHandle&)
Unity.Jobs.JobHandle:Complete()
Obi.BurstColliderWorld:UpdateWorld(Single) (at Assets/Plugins/Obi/Scripts/Common/Backends/Burst/Collisions/BurstColliderWorld.cs:117)
Obi.ObiColliderWorld:UpdateWorld(Single) (at Assets/Plugins/Obi/Scripts/Common/Collisions/ObiColliderWorld.cs:397)
Obi.ObiUpdater:BeginStep(Single) (at Assets/Plugins/Obi/Scripts/Common/Updaters/ObiUpdater.cs:56)
Obi.ObiFixedUpdater:FixedUpdate() (at Assets/Plugins/Obi/Scripts/Common/Updaters/ObiFixedUpdater.cs:46)

Looks like it's trying to access collisionMaterials at index 0 when the array is empty. Indeed, it seems as though it's always empty.

I'm not sure what to make of this, and I can't find any code that constructs a BurstColliderShape so I can't find out what's slipping a 0 as its materialIndex.
But I hope this helps!
#4
Hi,

Thanks for the detailed report and the hints as to where the issue lies! So far I've been unable to reproduce it, but at least now I know where to look. This does narrow down the bug considerably.

When disabled, colliders are moved to the end of the colliders array and the amount of active colliders is reduced by one. This really sounds like somehow an inactive collider is being accessed, but its corresponding material in the materials array is not there. Will try to reason about how this might be the case and hopefully find a fix before next week.
#5
More news:
Code:
var identifyMoving = new IdentifyMovingColliders
{
    movingColliders = movingColliders.AsParallelWriter(),
    shapes = world.colliderShapes.AsNativeArray<BurstColliderShape>(cellSpans.count),
    rigidbodies = world.rigidbodies.AsNativeArray<BurstRigidbody>(),
    collisionMaterials = world.collisionMaterials.AsNativeArray<BurstCollisionMaterial>(),
    bounds = world.colliderAabbs.AsNativeArray<BurstAabb>(cellSpans.count),
    cellIndices = cellSpans.AsNativeArray<BurstCellSpan>(),
    colliderCount = colliderCount,
    dt = deltaTime
};
JobHandle movingHandle = identifyMoving.Schedule(cellSpans.count, 128);

I added the following check right before the job is set up and scheduled:
Code:
if(cellSpans.count > world.colliderShapes.count) {
    Debug.LogErrorFormat("BurstColliderWorld: cellSpans.count={0}, world.colliderShapes.count={1}",
        cellSpans.count, world.colliderShapes.count);
}

And this is logged rather often, not just when crashing, which means that the cell spans array is often larger than the collider shapes array, thus resulting in plenty of bad accesses on the shapes array in the job (because it's a parallel for over cellSpans.count items).

I'm honestly not sure what these cell spans are, but if their number is supposed to be synchronized to the number of colliders, that's where I'd look. It appears as if there's a physics frame (or so) of delay in the updates, because the message is only logged once and then it takes a bit until it's logged the next time (usually with different numbers).

I suppose because I'm removing colliders many times by disabling, the bad locations often still contain realistic information and thus it doesn't crash. Then maybe when I add plenty of colliders and the cell spans have been updated but the colliders haven't (for whatever reason), then of course the bad locations will contain garbage. This is only my guess, of course.
#6
It is normal for the number of cell spans to be larger than the number of active colliders; in fact, this will very often be the case. The collision detection broad phase uses a multilevel grid (a 4-D regular grid, with x, y, z being spatial coordinates and w being the level). Each collider is inserted in the grid, and its cell span is the min and max cells it overlaps.
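
To make the cell span idea concrete, here is a minimal sketch in plain C# (no Unity types). The names CellSpan and GridSketch are hypothetical, and the rule used to pick a level (cells of size 2^level, lowest level whose cell size is at least the box's largest extent) is an assumption for illustration only, not necessarily what Obi does internally.

```csharp
using System;

// Hypothetical types for illustration; not Obi's actual data layout.
public struct CellSpan
{
    public (int x, int y, int z, int w) min; // lowest overlapped cell; w is the grid level
    public (int x, int y, int z, int w) max; // highest overlapped cell on the same level
}

public static class GridSketch
{
    // Assumed rule: level L uses cubic cells of size 2^L, and a box goes into the
    // lowest level whose cell size is at least the box's largest extent.
    public static int LevelFor(float largestExtent)
    {
        double clamped = Math.Max(largestExtent, 1e-6f); // avoid log(0) for degenerate boxes
        return (int)Math.Ceiling(Math.Log(clamped, 2.0));
    }

    // Computes the span for an axis-aligned box given by its min and max corners.
    public static CellSpan SpanFor(float minX, float minY, float minZ,
                                   float maxX, float maxY, float maxZ)
    {
        float extent = Math.Max(maxX - minX, Math.Max(maxY - minY, maxZ - minZ));
        int level = LevelFor(extent);
        double cellSize = Math.Pow(2.0, level);

        // Maps a world-space coordinate to an integer cell coordinate on this level.
        int Cell(float v) => (int)Math.Floor(v / cellSize);

        return new CellSpan
        {
            min = (Cell(minX), Cell(minY), Cell(minZ), level),
            max = (Cell(maxX), Cell(maxY), Cell(maxZ), level)
        };
    }

    // A collider only needs re-inserting into the grid when its span changes.
    public static bool SameSpan(CellSpan a, CellSpan b) => a.min == b.min && a.max == b.max;
}
```

Under this assumed rule a collider overlaps at most two cells per axis, so comparing the old and new span is a cheap way to decide whether the grid needs touching at all.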

When colliders are disabled / removed, they're moved to the end of the colliders list. Then, colliderShapes.count is reduced, but cellSpans.count is not. This is the mechanism used to identify disabled colliders and remove them from the grid: being lists, the backing arrays for both cellSpans and colliderShapes still have the same size. IdentifyMovingColliders will access colliders past colliderShapes.count, and identify that they're disabled precisely because their index is past the active collider count (line 173 of BurstColliderWorld):

Code:
// if the collider is at the tail (removed), we will only remove it from its current cellspan.
// if the new cellspan and the current one are different, we must remove it from its current cellspan and add it to its new one.
if (i >= colliderCount || cellIndices[i] != newSpan)
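
The bookkeeping described above can be sketched roughly as follows. This is a simplified stand-in in plain C#, assuming a single list of names instead of Obi's separate shape and span lists; ColliderListSketch and its members are hypothetical names, not part of Obi.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the collider lists; names are illustrative only.
public sealed class ColliderListSketch
{
    private readonly List<string> colliders;

    public int ActiveCount { get; private set; }  // plays the role of colliderShapes.count
    public int SpanCount => colliders.Count;        // plays the role of cellSpans.count

    public ColliderListSketch(IEnumerable<string> names)
    {
        colliders = new List<string>(names);
        ActiveCount = colliders.Count;
    }

    public string this[int i] => colliders[i];

    // Disabling swaps the collider with the last active one and shrinks the
    // active range; the storage (and SpanCount) stays the same size.
    public void Disable(int index)
    {
        if (index < 0 || index >= ActiveCount)
            throw new ArgumentOutOfRangeException(nameof(index));

        int lastActive = ActiveCount - 1;
        string tmp = colliders[index];
        colliders[index] = colliders[lastActive];
        colliders[lastActive] = tmp;
        ActiveCount--;
    }

    // Mirrors the `i >= colliderCount` test: tail entries are the disabled ones.
    public bool IsInTail(int i) => i >= ActiveCount;
}
```

A loop over SpanCount items then visits both active colliders and the disabled tail, which is why reading past the active count is expected here as long as the index stays within the backing storage.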

So this is not the source of the problem, although it could be related. Something certainly smells fishy around there. Would it be possible for me to take a look at your project, or at least get my hands on a simplified scene that reproduces the issue? That way I can help you debug this more effectively.

kind regards,
#7
(18-02-2022, 09:18 AM)josemendez Wrote: When colliders are disabled / removed, they're moved to the end of the colliders list. Then, colliderShapes.count is reduced, but cellSpans.count is not. This is the mechanism used to identify disabled colliders and remove them from the grid: being lists, the backing arrays for both cellSpans and colliderShapes still have the same size. IdentifyMovingColliders will access colliders past colliderShapes.count, and identify that they're disabled precisely because their index is past the active collider count (line 173 of BurstColliderWorld):

Right, my alarm should trigger only if the cell spans count exceeds the collider shapes' capacity, not their count, and indeed, that never happens. Thanks for the insight!


Quote:Would it be possible for me to take a look at your project, or at least get my hands on a simplified scene that reproduces the issue? This way I can join you more effectively on debugging.

I understand, and I wish I could reproduce this in a simplified scene. I tried to provoke this by deliberately disabling ObiColliders and adding new ones in the same frame, but so far, I never got these issues outside of my project. Just so you understand my side as well: it's an unreleased game that's announced to release late this year, which is why I'm reluctant to share the project. This problem has also awakened my ambition as an algorithm engineer now. :D


In any event, I have a hypothesis now that could explain everything.
When a character is spawned, I create their ObiColliders and instantly disable them, because they start in a non-ragdoll state.

The materialIndex of a collider shape is only ever set by the corresponding shape tracker in UpdateIfNeeded, but that's never called for those colliders because they have been disabled. The shape struct that's left in the backing array will have its materialIndex still at zero. I added another field called "everUpdated" to ColliderShape (and also to the Burst counterpart so there are no alignment issues), which is set to 1 when UpdateIfNeeded is called on a collider. It was also still at zero when the error occurred, which seems to confirm my theory.


I guess that a very specific order of enable/disable/create transactions has to occur so that the cellSpans count happens to be large enough for these newly created and never updated colliders to be considered in the IdentifyMovingColliders job at all. That's probably why this is so hard to reproduce and why it hasn't happened to anybody yet. But I hope this seems plausible?

My use case may seem a bit strange, but it's actually what I want (I wish there was AddComponentDisabled in Unity...). IMO, the real issue here is that the materialIndex of a collider shape is (auto-)initialized as zero - a completely invalid value in my case where I don't use Obi collision materials - when it should be -1 to indicate "none (yet)". This can easily be fixed in ObiColliderHandle.CreateCollider. I just tried this and was unable to provoke the error again, so that may be the simple solution. I'll do a longer playtest in a build later.

Code:
colliderShapes.Add(new ColliderShape() { materialIndex = -1 });
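
For anyone wondering why the default value bites here: C# zero-initializes struct fields, so a freshly created shape silently gets materialIndex 0, which looks like a valid index. Below is a minimal sketch of the sentinel idea in plain C#. ShapeSketch, MaterialLookup and TryGetFriction are hypothetical names for illustration, and the float array is just a stand-in for the materials list; whether Obi's job performs exactly this kind of check is not shown in this thread.

```csharp
using System;

// Hypothetical stand-in for a collider shape; not Obi's actual struct.
public struct ShapeSketch
{
    public int materialIndex; // zero-initialized by default, which looks like a valid index
}

public static class MaterialLookup
{
    // Sentinel meaning "no material assigned (yet)".
    public const int NoMaterial = -1;

    // Creates a shape that is explicitly marked as having no material.
    public static ShapeSketch CreateShape() => new ShapeSketch { materialIndex = NoMaterial };

    // Returns false instead of indexing out of range when the shape has no
    // material or the index does not fit the materials array.
    public static bool TryGetFriction(ShapeSketch shape, float[] frictions, out float friction)
    {
        if (shape.materialIndex >= 0 && shape.materialIndex < frictions.Length)
        {
            friction = frictions[shape.materialIndex];
            return true;
        }

        friction = 0f;
        return false;
    }
}
```

With an empty materials array, a default-constructed shape (index 0) would be rejected only by the length check, whereas a shape from CreateShape is already rejected by the sign check, which is the cheaper and more explicit signal.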
#8
(18-02-2022, 11:41 AM)pdinklag Wrote: IMO, the real issue here is that the materialIndex of a collider shape is (auto-)initialized as zero - a completely invalid value in my case where I don't use Obi collision materials - when it should be -1 to indicate "none (yet)". This can easily be fixed in ObiColliderHandle.CreateCollider. I just tried this and was unable to provoke the error again, so that may be the simple solution. I'll do a longer playtest in a build later.

Code:
colliderShapes.Add(new ColliderShape() { materialIndex = -1 });

It's been quite a while now and I have never experienced this crash again, so this actually seems to have been the problem. It also explains why both Burst and libOni were affected. Luckily, the fix is that easy and you may want to incorporate this into Obi. Thanks for the hints to get to the bottom of this!
#9
(24-04-2022, 08:51 AM)pdinklag Wrote: It's been quite a while now and I have never experienced this crash again, so this actually seems to have been the problem. It also explains why both Burst and libOni were affected. Luckily, the fix is that easy and you may want to incorporate this into Obi. Thanks for the hints to get to the bottom of this!

Hi pdinklag,

I also tested this, and your fix works perfectly. It has already been merged into Obi's development branch, and will ship in all future updates starting with 6.5.

thanks!