Bug / Crash  Obi Solvers are adding HUGE forces to rigidbodies in scenes without any ropes!
#1
I've encountered a consistently reproducible bug that is so critical it renders ObiRope virtually unusable.

I have a scene with the following setup:
  • Spherical rigidbodies with X, Y and Z rotations locked (which produces an inertia tensor of 0 along the locked axes).
  • The spherical rigidbodies are manipulated via script.
  • The spherical rigidbodies have an ObiRigidbody and ObiCollider attached initially.
  • Initially, no ropes, cloth, or any other Obi objects (aside from the aforementioned) are present in the scene.

Reproduction:
  1. On a fresh Unity start after closing it down completely, I run the scene.
  2. Initially everything works correctly.
  3. So long as I do not place a rope in the scene, I can stop and play the scene as many times as I like.
  4. I place the rope in the scene while playing (it isn't in the scene initially), and it works correctly at first. I use a script to attach a rope to two surfaces selected by the player. Also worth noting: no ObiSolvers are present in the scene until a rope is added.
  5. Then after stopping and starting a new play session - WITHOUT placing any ropes in the new play session - the spherical rigidbodies are given such extreme forces that they are often sent straight into NaN-land. It doesn't appear to matter where the ropes are attached in the previous session.
  6. Inspecting the values passed to ObiRigidbody.UpdateVelocities in a debugger shows that they are often insane, with a magnitude greater than 1.0e+20, if not outright NaN.
  7. Even if I add a condition that the provided velocities are only added if their magnitudes are < 1000, the sphere rigidbodies will be accelerated with a huge amount of force along either the X, Y or Z axis in either direction.
  8. Removing the ObiRigidbody (and ObiCollider) components prevents this.
  9. The only way to bring ObiRigidbody into a usable state is to re-start the editor.
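
For illustration, the guard described in step 7 can be sketched as below. This is a language-neutral model in Python, not Obi's code: the function name `is_sane_delta` and the tuple representation of a vector are assumptions made for the example, and the 1000 cap is the one mentioned in step 7. It adds an explicit finiteness check on top of the magnitude check. As step 7 notes, a filter like this only hides the most extreme values and does not address where the junk data comes from.

```python
import math

def is_sane_delta(delta, max_magnitude=1000.0):
    """Return True only if every component is finite and the vector length is below the cap.

    `delta` is a hypothetical (x, y, z) tuple standing in for a velocity delta.
    """
    # Reject NaN and +/-infinity explicitly before doing any arithmetic.
    if not all(math.isfinite(c) for c in delta):
        return False
    # Euclidean length of the vector, compared against the cap.
    magnitude = math.sqrt(sum(c * c for c in delta))
    return magnitude < max_magnitude
```

A delta such as (0.0, 0.0, NaN) or (1006847000000000.0, 0.0, 0.0) is rejected, but junk values that happen to fall below the cap would still pass.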
I previously had 5.1 installed, and immediately after upgrading the bug started to occur.
Reply
#2
Hi hatchling,

Can you share a scene/project that reproduces this (send to support(at)virtualmethodstudio.com)?

I've been testing, but so far I've been unable to get this behavior. What I've done is programmatically add a rope between a static collider and a rigidbody with all rotational axes constrained (using a modified version of the RuntimeRopeGenerator script) at runtime. Then, get out of play mode, and play again. Things behave just like the first time, no ghost forces applied to rigidbodies.

The behavior you describe (or the way I understood it) doesn't make much sense to me, since ObiRigidbody does not store/serialize velocities between play sessions. There's no way that I can think of for data from the previous play session to affect the next one.

Quote: Also worth noting that no ObiSolvers are present in the scene until a rope is added.

With no solvers/updaters in the scene ObiRigidbody.UpdateVelocities shouldn't even be called, so in theory there's no way the rigidbody velocities can be modified by Obi.
Reply
#3
(12-05-2021, 07:50 AM)josemendez Wrote: With no solvers/updaters in the scene ObiRigidbody.UpdateVelocities shouldn't even be called, so in theory there's no way the rigidbody velocities can be modified by Obi.

I know, that's what makes it so weird. Nonetheless, removing the ObiRigidbody components prevents the issue and the values originate from Obi classes. Weird huh?

I'll try to isolate the cause to as much as possible before sending a repro project. It seems to be triggered by specific conditions on another rigidbody, but when it occurs, ALL ObiRigidbody instances are affected and given values like (0.0, 0.0, NaN) for linearDelta and (1006847000000000.0, 0.0, 0.0) for angularDelta.

In the meantime, I'll also try to narrow down the source of the scrambled values coming in. I'm guessing it is interpreting junk data stored in a static class that isn't cleared between sessions, somehow. It doesn't make sense to me either.

I think I might have narrowed down the issue somewhat.

In my case, I am creating a "global" solver, accessible through a static class, which all rope instances assign themselves to. This is to ensure that all actors can mutually collide.

When it is needed (I modified the code to check if a given actor has a solver parent and, if not, to use the global solver), a check occurs to see if a global solver exists already. If not, it will instantiate a solver prefab from the resources folder. The solver's game object is marked as "DontSave" and "DontDestroyOnLoad", so as to prevent it from needlessly regenerating on scene switches. However, if my understanding is correct, it should destroy itself between play sessions, as it only gets created during play sessions.
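
The lazily created global solver described above can be modeled roughly as follows. This is a plain Python sketch under stated assumptions, not the poster's script and not Obi's API: `Solver`, `GlobalSolver`, `get` and `reset` are hypothetical names, and the `alive` flag stands in for "the underlying object has not been destroyed". The point it shows is that a static slot needs both a liveness check on access and an explicit reset when a session ends, otherwise a stale object can be handed out again.

```python
class Solver:
    """Hypothetical stand-in for a solver object (not Obi's ObiSolver)."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        # Models the object being torn down at the end of a session.
        self.alive = False


class GlobalSolver:
    """Holds the single shared solver in a static (class-level) slot."""

    _instance = None

    @classmethod
    def get(cls):
        # Create on first use, or again if the cached solver was destroyed.
        if cls._instance is None or not cls._instance.alive:
            cls._instance = Solver()
        return cls._instance

    @classmethod
    def reset(cls):
        # Assumed to be called when a play session ends, so that nothing
        # from the old session is reused by the next one.
        if cls._instance is not None:
            cls._instance.destroy()
        cls._instance = None
```

Repeated calls to `GlobalSolver.get()` within a session return the same object; after `GlobalSolver.reset()` the next call returns a fresh one.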

It looks like after a session in which a rope is instantiated (and thus, the global solver is created), the first global solver and its updater stick around between play sessions. It seems to remember which ObiRigidbodies it had previously encountered (which would be all of them that existed at the time, I presume) and even retains their associated native arrays for velocities and such. But now, they're pointing to repurposed locations in memory that are now filled with data that gets misinterpreted as insane float values. So the next time this ghost solver re-awakens and encounters rigidbodies it saw before, it'll apply this junk data to them immediately.

This could be an issue with Unity's DontDestroyOnLoad or my interpretation of how it works. Nevertheless, it is strange that so much of its state persists between play sessions if it is erroneously kept alive between play sessions, despite the assemblies being reloaded when the play button is depressed, as documented here: https://docs.unity3d.com/ScriptReference...eload.html

Another thing to note is that, if it is surviving between play sessions, the object is not showing in the hierarchy after exiting a play session. (It is not marked as "Hide".)

While this may be a very specific set of circumstances, it might be worth investigating why these native arrays persist and let you read from - or even worse, write to - these bad memory addresses.

I'll investigate this further tomorrow and get a demo project ready to send.

P.S.: Apologies for my harsh initial statement, it was not meant to imply that ObiRope is virtually unusable outright, just under the circumstances I was presented with. It was a bit of a shock to see such a critical problem occur so consistently immediately after upgrading though, and having a shared global solver seemed like an innocent design choice.
Reply
#4
I've managed to create a demo project that doesn't quite replicate the same issue mentioned here. But it does demonstrate Obi failing quite spectacularly when attempting to use a global solver object as described earlier.

If Burst is used, it spams errors about native stuff not being released and jobs not being scheduled properly. If Oni is used, it outright crashes.

The script used to produce the bug isn't doing anything overly spooky, just creating an ObiSolver marked as "DontDestroyOnLoad".

I'll email you with instructions.

My email provider appears to bug out whenever I try to upload a file that is too large. I'll upload the repro project here instead.

To reproduce:
1. Open Scenes/SampleScene
2. Press play.

You'll see:
- Errors flooding the console relating to jobs and native memory.
- The generated rope net falls apart.

Notes:
- Burst and Jobs should be installed in the project.
- I've modified RopeNet.cs to use the "global solver" defined in TriggerObiBug.cs
- If you uncomment line 20, which causes the ObiSolver to use Oni instead of Burst, it will crash Unity shortly after pressing play.
- These aren't the same symptoms as the bug mentioned earlier but they likely have a common cause.

It appears that I am neither able to email you the project, nor upload it here.

Instead I'll share the two scripts, TriggerObiBug.cs and RopeNet.cs, as well as the scene.

Attempting to add attachment to this thread...

After testing, it appears DontDestroyOnLoad isn't what's important to triggering the bug in the scene provided. If you comment out the line, restart the editor, and re-run, it'll still freak out.

Now it seems like there was a bit of a false alarm with some of this. In the code I was inadvertently adding the same solver twice to the same updater, which caused a ton of errors to appear.

But after I fixed this problem, I tried re-adding the DontDestroyOnLoad() stuff, which then causes different errors to appear after stopping a play session.


Attached Files
.zip   Scenes.zip (Size: 7.35 KB / Downloads: 1)
Reply
#5
Moving back to my original project, I removed the use of "DontDestroyOnLoad" and the problem went away; no more insane values.
Reply
#6
Just for the record, I am using these package versions:

Burst Version 1.4.8 - April 30, 2021
Jobs Version 0.8.0-preview.23 - January 22, 2021
Mathematics 1.2.1 - 2020-08-06
Collections 0.15.0-preview.21 - 2020-11-13
Reply
#7
Hi hatchling,

Thanks a lot for the details and the sample scripts! I will try to reproduce this and understand why it happens, then patch it.

Will get back to you once I have some news.
Reply
#8
While I still haven't reproduced the exact same results in the repro project as I have in the main project, you may still be able to track it down.

One key thing I noticed in the main project while debugging is, during the second play session, the solver and updater from the previous play session (which were marked with DontDestroyOnLoad) were hitting breakpoints, indicating their activity. If you can detect this in the demo project (even though it doesn't always cause problems), you'll probably be very close to the root cause.

DontDestroyOnLoad is definitely a good lead. If you have trouble, I can try doing the opposite approach to reproducing the bug - instead of building up to the situation I think caused the bug in the repro project, I can remove stuff from the main project until it stops. Hopefully I can remove almost everything and still have the exact same symptoms occur.

It might also be a good idea to do something about the same solver being added multiple times to one or more updaters. It REALLY seems to hate this. Burst will puke all over the console, and Oni will outright crash Unity.
Reply
#9
(13-05-2021, 08:19 AM)Hatchling Wrote: While I still haven't reproduced the exact same results in the repro project as I have in the main project, you may still be able to track it down.

One key thing I noticed in the main project while debugging is, during the second play session, the solver and updater from the previous play session (which were marked with DontDestroyOnLoad) were hitting breakpoints, indicating their activity. If you can detect this in the demo project (even though it doesn't always cause problems), you'll probably be very close to the root cause.

DontDestroyOnLoad is definitely a good lead. If you have trouble, I can try doing the opposite approach to reproducing the bug - instead of building up to the situation I think caused the bug in the repro project, I can remove stuff from the main project until it stops. Hopefully I can remove almost everything and still have the exact same symptoms occur.

My approach is usually to get the bad stuff ™ to happen, then step through the code, reason about why it is happening, and write a unit test that triggers it. Once I understand it, I think about how to fix it without breaking anything else, then run the unit tests and verify that it no longer happens. I'll get to it later this morning; depending on how resilient this one is, I think I can have a workaround before the weekend. Thanks for the tips!

(13-05-2021, 08:19 AM)Hatchling Wrote: It might also be a good idea to do something about the same solver being added multiple times to one or more updaters. It REALLY seems to hate this. Burst will puke all over the console, and Oni will outright crash Unity.

The behavior in this case is basically undefined, as it's running the same jobs twice in parallel. Might work fine, might cause a race condition and crash. This is warned about in the manual:
http://obi.virtualmethodstudio.com/tutor...aters.html

Quote:There can be multiple updaters in your scene, however you must avoid having the same solver referenced by multiple updaters as this will update that solver more than once per frame, leading to unpredictable results. Also keep in mind that a solver that is not referenced by any updater will not have its simulation updated.

I agree that I should probably check for this in-editor and place a warning in the updater or just refuse to run the simulation if the user adds the same solver twice, though. Won't catch the case where different updaters update the same solver, but at least it will warn about the single updater case. Will do in the next update.
Reply
#10
(13-05-2021, 08:38 AM)josemendez Wrote: I agree that I should probably check for this in-editor and place a warning in the updater or just refuse to run the simulation if the user adds the same solver twice, though. Won't catch the case where different updaters update the same solver, but at least it will warn about the single updater case. Will do in the next update.

I suppose you could have solvers and updaters mutually keep references. When a solver is told to be added to an updater, the updater will check the solver's stored updater. If it is non-null, it tells you about it and/or refuses to add the solver to the updater, or removes it from the first updater. Not sure if the overhead involved in this check would outweigh the extra robustness to user error.
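
The mutual-reference idea above could look roughly like this. It is a Python model with hypothetical `Solver` and `Updater` classes, not Obi's ObiSolver or its updater components. `add_solver` returning `False` models the "refuses to add" option; moving the solver off its first updater would be an equally valid policy.

```python
class Solver:
    """Hypothetical solver that remembers which updater owns it."""

    def __init__(self, name):
        self.name = name
        self.updater = None  # back-reference, None while unowned


class Updater:
    """Hypothetical updater that steps each of its solvers once per frame."""

    def __init__(self, name):
        self.name = name
        self.solvers = []

    def add_solver(self, solver):
        # Refuse if the solver is already owned by this or any other updater.
        if solver.updater is not None:
            return False
        solver.updater = self
        self.solvers.append(solver)
        return True

    def remove_solver(self, solver):
        # Only the owning updater may release the solver.
        if solver.updater is self:
            self.solvers.remove(solver)
            solver.updater = None
```

In this sketch the check is a single reference comparison made when a solver is registered, not on every frame, and it covers both the same-updater and the cross-updater case.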
Reply