Unity 2019's spinlocks against the World

Unity Against the World

That's the History.

When KSP 1.8 came, using Unity 2019, my old Mac Mini 5.1 (i5, 2 cores, 4 hyperthreads) couldn't handle it. I just could not fire up KSP and keep the machine usable - the whole thing started to stutter. Facebook, YouTube, command line terminals, you name it. Everything was stuttering, it was impossible to watch a video!

Since I was going to buy a new old rig anyway (which by pure chance ended up being a Mac Mini 6.2 - 4 cores, 8 HTs), I didn't give it too much attention. The new old Potato ended up handling KSP >= 1.8.0, and that allowed me to keep using it for development (although KSP 1.7.3 still performs way better, which is the reason I still do my main playing on 1.7).

And then KSP 1.12 came and screwed everything up again: KSP 1.12 did to my Mac Mini 6.2 what KSP 1.8 did to the 5.1. Krap.

Oh, well. Life goes on. I can still run KSP 1.12 for some "quick" and small tests and use 1.11 for the main workload until I figure out a way to buy yet another new old potato. :P

But then I realized I had made a less than ideal decision on a thingy called Refunding from KSP-Recall. At that time, I had pretty little time to fool around, so I did things as fast as I could to work around the problem under my nose, rushed the thingy into the Mainstream and went back to my day job, and by doing this I ended up bullying the GC. I then optimised the memory usage a bit, but the bullying just would not stop. It ended up being (another) bug in KSP that was causing a memory leak, which was triggering the GC a lot, which in turn was being sabotaged by spinlocks on the waiting threads!

Checking the worst processor hogs in the KSP process, it came to my attention that almost 100% of the CPU time was being spent on a system call related to semaphores inside a Unity thread: dispatch_semaphore_wait_slow, which was spinning around os_semaphore_wait, which in turn was calling semaphore_wait_trap:

Process-Thread-View.png

Interesting. Checking the other threads of the KSP process, I was horrified!

Process-ManyThread-View.png

LOTS AND LOTS OF THREADS hogging up 100% of the CPU, a clear indication of busy waits!

It's no surprise Refunding was provoking a memory leak - there are so many threads busy waiting for the GC that there's no CPU left for the GC itself, and so we have a deadlock!!!

Process-All-Threads-Screwed.png

Every single thread, including OSX HID Input (Human Interface Devices), is bogging down the CPU at 100%! It's evident now why the input is so sluggish when your craft gets too big for your rig!!!!

Digging a bit more into the subject around the Internet, I came across a piece of code like this one (C, judging by the syntax):

        if (mutex_sem != 0)
            kr = semaphore_wait_signal_trap(cond_sem, mutex_sem);
        else
            kr = semaphore_wait_trap(cond_sem);

Which explains a lot - semaphore_wait_trap is used without a mutex, and so somebody somewhere is using a spinlock to do the job - you know, we need to synchronize things between threads, right?
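To make the difference between a busy wait and a blocking wait concrete, here is a toy sketch in shell. It is only an analogy: background shell processes stand in for threads, a flag file stands in for a variable that gets checked in a loop, and a FIFO stands in for a blocking primitive. The function names (busy_wait, blocking_wait, demo) are made up for illustration, and nothing here is taken from Unity, Mono or KSP.

```shell
#!/bin/sh
# Toy analogy only: processes play the role of threads.

# Busy wait: poll for a flag file in a tight loop, then print how many
# polls were needed. Every poll burns CPU while "waiting".
busy_wait() {
    polls=0
    while [ ! -e "$1" ]; do
        polls=$((polls + 1))
    done
    echo "$polls"
}

# Blocking wait: reading from a FIFO parks the process in the kernel
# until a writer shows up, so it wakes up exactly once.
blocking_wait() {
    wakeups=0
    read -r line < "$1"
    wakeups=$((wakeups + 1))
    echo "$wakeups"
}

# Start both waiters, make them wait for one second, release them,
# and report how much "work" each one did while waiting.
demo() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo"

    busy_wait "$dir/flag" > "$dir/busy.out" &
    busy_pid=$!
    blocking_wait "$dir/fifo" > "$dir/block.out" &
    block_pid=$!

    sleep 1
    : > "$dir/flag"          # release the busy waiter
    echo go > "$dir/fifo"    # release the blocking waiter
    wait "$busy_pid" "$block_pid"

    echo "busy=$(cat "$dir/busy.out") blocking=$(cat "$dir/block.out")"
    rm -rf "$dir"
}

demo
```

The exact poll count depends on the machine, but the busy waiter keeps polling for that whole second, burning CPU time that something else (say, a GC) could have used, while the blocking waiter wakes up once.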

Remembering a very productive exchange with darthgently, I decided to use one of the tricks he taught me, the MONO_THREADS_PER_CPU environment variable. It tells Mono to, well, limit the number of threads per CPU. :)

By limiting this number, we would have fewer spinlocks in the process, and so the poor CPU would have a better chance to do its job instead of spinning around the same code waiting for something that will never happen because the CPU is being completely screwed up by the waiting threads.

Since I'm on a UNIX machine, this is what I did:

KSP-with-MONO-THREADS-PER-CPU.png

The command MONO_THREADS_PER_CPU=1 ./KSP.app/Contents/MacOS/KSP sets the environment variable for that one invocation and then calls the executable KSP. The variable exists only in the environment of that process and is gone when it exits, so there's no chance for it to linger in your shell and end up screwing up some other process you call later.
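For anyone who wants to try different values without retyping the whole line, the same idea can be wrapped in a small shell function. This is a minimal sketch: run_with_mono_threads is a hypothetical helper name (not part of KSP or Mono), and it uses env instead of the VAR=value prefix, which has the same effect for an external program while guaranteeing that the calling shell is never touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of KSP or Mono): run a command with
# MONO_THREADS_PER_CPU set for that command only.
# Usage: run_with_mono_threads THREADS COMMAND [ARGS...]
run_with_mono_threads() {
    threads=$1
    # Reject anything that is not a plain non-negative integer.
    case $threads in
        ''|*[!0-9]*)
            echo "usage: run_with_mono_threads THREADS COMMAND [ARGS...]" >&2
            return 2
            ;;
    esac
    shift
    # env modifies the environment of the child process only;
    # the calling shell's environment stays as it was.
    env MONO_THREADS_PER_CPU="$threads" "$@"
}

# Stand-in for the game binary: show what the child process sees,
# then show that the calling shell was left alone.
run_with_mono_threads 1 sh -c 'echo "child sees: ${MONO_THREADS_PER_CPU:-unset}"'
echo "shell sees: ${MONO_THREADS_PER_CPU:-unset}"
```

Assuming the variable was not already set in the calling shell, the child reports 1 and the shell reports unset. With the real game, the call would look like run_with_mono_threads 2 ./KSP.app/Contents/MacOS/KSP, which makes it easy to try 1, 2 or 4 in turn.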

And that, my friends, solved the problem for me. My old Mac Potato 5.1 is now able to run KSP 1.8.1. Barely: the game is still sluggish, but the rest of the machine is usable! I can even watch YouTube videos while running KSP 1.8.1, something that was impossible for this machine 17 months ago!

My less old Mac Potato 6.2 managed to withstand some more abuse from KSP 1.8.0 to KSP 1.11.2 because it has twice the cores of my previous rig, but now on KSP 1.12.0 ~~someone increased the default number of threads per CPU again (or something similar)~~ [edit: see below], and it screwed up my i7: there's no point in increasing the number of worker threads with spinlocks, all you will get is more threads waiting for something that will never happen because your threads are preventing everything else from running!

Note: Since the last time I revisited this article, I managed to diagnose the reason KSP 1.12.x screwed up my puny Mac Potato 6.2 so badly: THE TEXTURES. Squad essentially quadrupled the VRAM footprint and this completely wrecked my GPU, as it has a maximum of 1536 MB of VRAM. By manually shrinking the texture sizes to a quarter of the original size (more or less the sizes of the KSP 1.5 era), KSP 1.12.x behaves nicely on my rig!

Aftermath

Right now, I'm able to run KSP >= 1.8 on my oldest rig by using this trick. I'm now trying to figure out the best compromise on the number of threads for my rig (2 appears to be acceptable; I will try 4 in my next time window for this). I'm pretty confident that a similar trick will "solve" the issue for my less old Mac Potato, so soon I'll be able to test drive things (and diagnose problems) on KSP 1.12.x, something that until now I was not able to do properly.

These hacks do not solve the bad performance of KSP itself, although it is getting slightly better (or less worse) too. What would solve the problem for good is using MUTEXES instead of spinlocks, and I currently do not know whether that's doable through options or environment variables.

I will update this article with my findings as they happen.

Addenda

On KSP 1.7.3

Out of curiosity, I fired up KSP 1.7.3 (Unity 2017) and inspected the process the same way I did on KSP 1.8.1:

KSP-with-MONO-THREADS-PER-CPU.png

We still have lots of threads screwing up the cores with semaphore_wait_trap, but we also have some others that don't!

Some unnamed threads are using nanosleep, and the OSX HID Input one is using mach_msg_trap.

We now have good evidence that my thesis has teeth.

On KSP 1.3.1

On KSP 1.3.1 (Unity 5) I got similar results!

KSP-with-MONO-THREADS-PER-CPU.png

It's worth mentioning that KSP 1.3.1 and older are simply the best performing KSP versions on my i5. Period.

I think we have a pattern here. Unity is (ab)using dispatch_semaphore_wait_slow in version 2019, and this is royally screwing KSP on anything not top notch (and probably hindering top notch machines too, as they would probably perform better without this mess).

Conclusions

Although spinlocks are a Bad Move™ (and Unity would have better results in the field without that kind of stunt), what's really screwing things up is not necessarily the spinlocks themselves, but using them where a proper MUTEX is really mandatory. The HID Input thread appears to be one of those places, at least.

I think it's well past time for Unity to start getting their shit together and do things right. For a change.


Lisias T.
2021-0706 T 07:00 Zulu
Revised on:
    2021-0706 T 13:23 Zulu
    2022-1005 T 21:54 Zulu

Also published on Forum.