@Michael_Edward_Davis one of the new qwen3.5 models seems to be working well for me; it just came out a couple of days ago. My current setup is opencode with oh-my-opencode and 3 different models in docker containers. Each agent in oh-my-opencode gets one of these 3 options: small for easy stuff like explore agents, medium for planning, large for heavy coding
small - qwen3-4b-instruct
medium - qwen3.5-35B-A3B at Q5
large - qwen3-coder-next at Q4
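For anyone wanting to copy the setup, here's a rough sketch of how three models like that could be served side by side, assuming llama.cpp's official server image; the container names, ports, context sizes, and GGUF filenames are placeholders, not my exact config:

```shell
# Hypothetical sketch: one llama.cpp server container per tier.
# Model filenames, ports, and context sizes are placeholders.
docker run -d --name llm-small  -v /models:/models -p 8081:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/qwen3-4b-instruct.gguf -c 16384

docker run -d --name llm-medium -v /models:/models -p 8082:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/qwen3.5-35b-a3b-q5_k_m.gguf -c 32768

docker run -d --name llm-large  -v /models:/models -p 8083:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/qwen3-coder-next-q4_k_m.gguf -c 65536
```

Each oh-my-opencode agent then just points at a different local endpoint (8081/8082/8083).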
In about 3 weeks I was able to recreate an app completely by vibe coding (and I started with 0 knowledge of running local llms). A lot of that time was spent tinkering with settings, trying different models, learning how to use the agents, and even working around a lot of diff errors when these new models came out. If I had it all dialed in like I do now it prob would have taken a week or less. I am quite impressed to be able to have made a working product using almost exclusively local models on one Framework desktop. For the GUI, I created a prototype in about 15 prompts with the local model, then fed it to Claude Opus 4.6 and told it to make me a more professional looking one… it gave me the end result with just 1 prompt
> one of the new qwen3.5 models seems to be working well for me, just came out a couple days ago.
I tried Qwen3.5-122B-A10B at Q6 on Linux using llama.cpp and it works fine for me with 128 GB of RAM after I applied the following tweaks:
The model needs 106 GB at Q6, so I had to increase the ttm.pages_limit kernel parameter from 25165824 pages (96 GiB) to 29360128 pages (112 GiB).
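If you're adapting this to a different RAM budget: ttm.pages_limit counts 4 KiB pages, so the value is just your byte budget divided by the page size. Quick sanity check on the two numbers above:

```shell
# ttm.pages_limit is measured in 4 KiB pages:
# 112 GiB / 4096 bytes per page = 29360128 pages
echo $(( 112 * 1024 * 1024 * 1024 / 4096 ))   # prints 29360128

# and the old value maps back to 96 GiB:
echo $(( 25165824 * 4096 / 1024 / 1024 / 1024 ))   # prints 96
```

To make it persistent you can set it on the kernel command line (ttm.pages_limit=29360128) or in a modprobe.d file (options ttm pages_limit=29360128).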
I use the vk_radv shim from the amd-vulkan-prefixes AUR package. The shim has been broken since February 2026, when vulkan-radeon v26 dropped and renamed /usr/share/vulkan/icd.d/radeon_icd.x86_64.json to radeon_icd.json. The shim doesn't pick up the new file name, so I had to patch it.
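In case anyone else hits the same breakage and doesn't want to patch the shim: an untested alternative might be to recreate the old filename with a compat symlink, or to point the Vulkan loader at the ICD manifest explicitly (paths assume the rename described above):

```shell
# Untested workaround sketch: give the shim back the old ICD filename
# it expects, without touching the shim itself.
sudo ln -s /usr/share/vulkan/icd.d/radeon_icd.json \
           /usr/share/vulkan/icd.d/radeon_icd.x86_64.json

# Or tell the Vulkan loader explicitly which ICD manifest to use:
export VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.json
```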
How fast is token generation for 122B-A10B? I tried UD-Q4_K_XL on llama.cpp but the model must have gotten stuck in a loop, because it still hadn't finished loading after an hour. I'm wondering if it's worth trying again at a different quantization.
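If it does load, llama.cpp's bundled llama-bench tool would give numbers that are directly comparable across quants (the model path below is a placeholder):

```shell
# llama-bench ships with llama.cpp; -p sets the prompt length and -n the
# number of tokens to generate. Model path is a placeholder.
llama-bench -m /models/Qwen3.5-122B-A10B-UD-Q4_K_XL.gguf -p 512 -n 128
```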