I am not using Arch these days, but to investigate this, I did some testing on Arch directly, as well as in Ubuntu 22.04 and 24.04 distrobox environments on NixOS, and directly on NixOS. Admittedly, it's a bit of an apples-to-oranges-to-peaches comparison, and not a particularly rigorous one, but here are my observations, for what they're worth:
The Ubuntu 22.04 distrobox with Python 3.10 and ROCm 5.6, under NixOS, seemed to be the most stable. InvokeAI (the version I installed a while ago - not sure about the most up-to-date one) was fine generating images up to 1024x1024 with models up to 7.2GB in size (the largest ones I had that my InvokeAI supported). I was eventually able to crash it, but I would call it almost dependable for text2image inference.
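For anyone putting together a similar distrobox, a quick sanity check that the ROCm build of PyTorch inside it actually sees the GPU looks something like this (just a sketch, assuming a ROCm wheel of torch is installed in the environment; this wasn't part of the testing above):

```python
# Minimal check that the ROCm PyTorch build sees the GPU.
# Assumes a ROCm wheel of torch is installed in this environment.
import torch

print("torch:", torch.__version__)
print("HIP version:", torch.version.hip)          # None on non-ROCm builds
print("GPU visible:", torch.cuda.is_available())  # ROCm devices are exposed via the cuda API
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```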
ComfyUI was easier to crash, but still handled those same cases OK. With larger models, such as Flux and Flux-derived ones, I had mixed success: some crashed on loading, some at the VAE stage, and some successfully produced 1024x1024 images.
Forge was more successful with Flux than ComfyUI, but still crashed on loading the larger models. The error messages on crashes were similar to the ones listed by @Justin_Weiss and in the AMD bug report he linked.
The Ubuntu 24.04 distrobox with ROCm 6.1 and bare NixOS with ROCm 6.0 were both less stable. 6.1 was probably the worst and would crash almost immediately. 6.0 felt more usable: I was able to generate 1024x1024 images with (moderately) large SDXL models in ComfyUI, although not reliably.
On Arch, the same pattern held: 6.0 was more stable than 6.1, and ComfyUI was able to generate larger images with SDXL and other ~6-7GB models, but it would still crash every now and then.
All in all, the state of memory handling throughout the amdgpu/ROCm/SD stack, and the reliability and fit of the different component versions, still leave much room for improvement. Hopefully, the queue-related bug report referenced above will be addressed - although it's not clear whether the problem is indeed in amdgpu or elsewhere in the stack. Newer ROCm versions seem to handle these issues worse than older ones on this platform, but perhaps that's also a matter of configuration - I used everything with default settings - or of version matching with the other components…
I did not have amdgpu.sg_scatter=0 set on either NixOS or Arch: I removed it a couple of kernel versions ago, and adding it back, at least on NixOS, didn't improve stability. My system is a FW13 with an AMD 7840U and 64GB of RAM; the kernel versions were 6.10.7 on NixOS and 6.10.8 on Arch.
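(If you want to double-check which amdgpu options are actually active on the running kernel, a quick generic way - nothing specific to my setup - is to read /proc/cmdline:)

```python
# List amdgpu.* options on the running kernel's command line (Linux only).
# Note: options set via modprobe configs won't show up here.
from pathlib import Path

opts = Path("/proc/cmdline").read_text().split()
amdgpu_opts = [o for o in opts if o.startswith("amdgpu.")]
print("amdgpu options:", amdgpu_opts or "none")
```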
Also, a useful tip: if your inference is constantly crashing, or is suddenly running very slowly regardless of what you do, clean out the ~/.config/miopen folder. I spent several hours trying to understand why my ComfyUI setup, which had previously worked under ROCm 6.0, started running excruciatingly slowly after a crash, before finally tracing it to the MIOpen cache in that folder. Ironically, the presence of the cache files does make loading larger models more stable - so, if you have to delete them, running an inference with a smaller model first might help a larger model load later…
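If you'd rather script that cleanup (and keep the old cache around, in case the warm cache was what let the larger models load), something like this works - just a sketch using the path mentioned above; adjust it if your MIOpen cache lives elsewhere:

```python
# Move the MIOpen cache folder aside instead of deleting it outright,
# so it can be restored if the "warm" cache turns out to be helpful.
# Path is the one from the post (~/.config/miopen); adjust if yours differs.
import shutil
import time
from pathlib import Path

miopen_dir = Path.home() / ".config" / "miopen"
if miopen_dir.exists():
    backup = miopen_dir.with_name(f"miopen.bak-{int(time.time())}")
    shutil.move(str(miopen_dir), str(backup))
    print(f"Moved {miopen_dir} -> {backup}")
else:
    print(f"Nothing to clean: {miopen_dir} does not exist")
```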