== A one in a million bug in Switch kernel ==

Nintendo Switch firmware 14.0.0 was released yesterday. It contained many minor changes to the kernel. One of them is that during user-mode cache operations (flush / clean / zero), a secret byte in the thread local storage (TLS) is now set to 1. If an interrupt is received, kernel mode reads that user-mode byte from TLS, and if it's equal to 1, the kernel performs a memory barrier.

Why is this complicated TLS communication scheme necessary between user mode and the kernel? Nintendo would not introduce this out of the blue; there must be some weird hardware phenomenon going on.

This took some time to figure out, but imagine the following sequence of instructions executing:

    dc civac, x8
    add x8, x8, #32
    dc civac, x8
    add x8, x8, #32
    dc civac, x8       <------- what happens if you take an interrupt here?
    add x8, x8, #32
    dc civac, x8
    add x8, x8, #32
    dsb sy             <------- memory barrier
    ret

An interrupt may be received by the CPU at any point during game execution. Interrupts may lead to "core migration", which is when the kernel scheduler moves a thread to a different CPU core. If we imagine a core migration in this code sequence, we can clearly see the problem:

    dc civac, x8       <--- Core 0
    add x8, x8, #32    <--- Core 0
    dc civac, x8       <--- Core 0
    add x8, x8, #32    <--- Core 0
    dc civac, x8       <--- Core 1 [interrupt! core migration]
    add x8, x8, #32    <--- Core 1
    dc civac, x8       <--- Core 1
    add x8, x8, #32    <--- Core 1
    dsb sy             <--- Core 1 [memory barrier]
    ret

Do you see the problem? There was never a memory barrier on core 0! This means that the cache ops issued on core 0 have *not necessarily* all completed by the time the function returns! For a brief time, the physical DRAM, for some of the cache lines, will be incorrect.

So to summarize, if:

    (1) the CPU takes an interrupt inside a function like this (super rare), AND
    (2) the scheduler decides to perform core migration (super rare),

then you'd get some graphical glitches (games mainly use cache operations when talking to the GPU).
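To make the race and the fix concrete, here is a toy C model of the sequence above. It is only a sketch, not Nintendo's code: `pending_ops`, `struct thread`, `tls_cache_op_flag`, `interrupt_and_migrate` and `run_scenario` are all made-up names, and it assumes that user mode sets the TLS byte before the loop and clears it after the barrier (the post only says the byte is set to 1 during the cache ops).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 2

/* Toy model: cache ops issued on each core that no barrier on that
 * core has completed yet. */
static int pending_ops[NUM_CORES];

/* Hypothetical stand-in for a thread and its TLS byte. */
struct thread {
    int core;                   /* core the thread currently runs on */
    uint8_t tls_cache_op_flag;  /* 1 while inside a cache-op loop */
};

/* Models `dc civac`: the op is issued on the thread's current core. */
static void dc_civac(const struct thread *t) { pending_ops[t->core]++; }

/* Models `dsb sy`: only completes ops issued on the given core. */
static void dsb_sy(int core) { pending_ops[core] = 0; }

/* Models the kernel interrupt path followed by a core migration.
 * With use_fix, the kernel checks the TLS byte and issues a barrier
 * on the old core before the thread moves. */
static void interrupt_and_migrate(struct thread *t, int new_core, bool use_fix)
{
    assert(new_core >= 0 && new_core < NUM_CORES);
    if (use_fix && t->tls_cache_op_flag == 1)
        dsb_sy(t->core);
    t->core = new_core;
}

/* Replays the listing: four cache ops, with an interrupt and migration
 * taken just before the third one. Returns how many ops are still
 * incomplete on any core when the function "returns". */
int run_scenario(bool use_fix)
{
    struct thread t = { .core = 0, .tls_cache_op_flag = 0 };
    for (int c = 0; c < NUM_CORES; c++)
        pending_ops[c] = 0;

    t.tls_cache_op_flag = 1;          /* assumed: set by user mode */
    for (int i = 0; i < 4; i++) {
        if (i == 2)
            interrupt_and_migrate(&t, 1, use_fix);
        dc_civac(&t);
    }
    dsb_sy(t.core);                   /* barrier on whatever core we ended on */
    t.tls_cache_op_flag = 0;          /* assumed: cleared after the barrier */

    int left = 0;
    for (int c = 0; c < NUM_CORES; c++)
        left += pending_ops[c];
    return left;
}
```

In this model `run_scenario(false)` returns 2 (the two ops issued on core 0 never saw a barrier), while `run_scenario(true)` returns 0, because the kernel-side barrier runs on core 0 before the thread is moved.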
In this situation, devs would probably blame faulty DRAM chips or CPU errata, but this is purely a software bug! This bug has existed since day zero, which means that it took 5 years (!) for Nintendo to track it down.

Credit to whichever nameless employee at Nintendo found this bug! The attention to detail is incredible. And how do you even find / debug a bug like this?

Makes you think: do Linux, Windows and Mac handle this properly? Honestly, I doubt it!

Thanks to SciresM for discussion / diff.

--plutoo