Which language models are you using?

@Michael_Edward_Davis one of the new qwen3.5 models seems to be working well for me; it just came out a couple of days ago. My current setup is opencode with oh-my-opencode and 3 different models in docker containers. Each of the agents in oh-my-opencode gets one of these 3 tiers – small for easy stuff like explore agents, medium for planning, large for heavy coding:

small - qwen3-4b-instruct
medium - qwen3.5-35B-A3B at Q5
large - qwen3-coder-next at Q4

In about 3 weeks' worth of time I was able to recreate an app entirely by vibe coding (and I started with zero knowledge of running local LLMs). A lot of that time was spent tinkering with settings, trying different models, learning how to use the agents, and even working around a lot of diff errors when these new models came out. If I'd had it all dialed in like I do now, it probably would have taken a week or less. I am quite impressed to have made a working product using almost exclusively local models on 1 Framework desktop. For the GUI, I created a prototype in about 15 prompts with the local model, then fed it to Claude Opus 4.6 and told it to make me a more professional-looking one… it gave me the end result with just 1 prompt.


one of the new qwen3.5 models seems to be working well for me, just came out a couple days ago.

I tried Qwen3.5-122B-A10B at Q6 on Linux using llama.cpp and it works fine for me with 128 GB of RAM after I applied the following tweaks:

  • The model needs 106 GB at Q6, so I had to increase the ttm.pages_limit kernel parameter from 25165824 (96 GiB) to 29360128 (112 GiB).

  • I use the vk_radv shim from the amd-vulkan-prefixes AUR package. This shim has been broken since February 2026, when vulkan-radeon driver v26 dropped and renamed /usr/share/vulkan/icd.d/radeon_icd.x86_64.json to radeon_icd.json. The shim doesn’t pick up the new file name, so I had to patch it.
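ttm.pages_limit counts 4 KiB pages, which is where the 29360128 figure comes from. A quick sketch of the arithmetic, plus one way to make the setting persistent (the modprobe.d path is an assumption; adjust for your distro):

```shell
# ttm.pages_limit is in 4 KiB pages; compute the value for a 112 GiB cap
GIB=112
PAGES=$(( GIB * 1024 * 1024 * 1024 / 4096 ))
echo "$PAGES"   # 29360128

# To persist across reboots (assumed path; distro-specific):
#   echo "options ttm pages_limit=29360128" > /etc/modprobe.d/ttm.conf
# or append ttm.pages_limit=29360128 to the kernel command line.
```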

The following server command line works for me:

Coding tasks:

vk_radv llama-server \
  -hf unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q6_K_XL \
  -c 16384 -t 16 -cram 0 -np 1 \
  --min-p 0.00 --temp 0.6 --top-k 20

General tasks:

vk_radv llama-server \
  -hf unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q6_K_XL \
  -c 16384 -t 16 -cram 0 -np 1 \
  --min-p 0.00 --temp 1.0 --top-k 20
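Since the two invocations differ only in --temp, a tiny wrapper can pick the sampling settings per task. The `serve` function name and the coding/general labels are my own; the flags are taken from the commands above, and the echo keeps this a dry run:

```shell
# Hypothetical wrapper around the two commands above; it prints the
# command instead of executing it (drop the echo to actually launch).
serve() {
  case "$1" in
    coding) TEMP=0.6 ;;   # coding tasks: lower temperature
    *)      TEMP=1.0 ;;   # general tasks
  esac
  echo vk_radv llama-server \
    -hf unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q6_K_XL \
    -c 16384 -t 16 -cram 0 -np 1 \
    --min-p 0.00 --temp "$TEMP" --top-k 20
}

serve coding
```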

How fast is token generation for 122B-A10B? I tried UD-Q4_K_XL on llama.cpp but the model must have gotten stuck in a loop, because it didn’t finish loading after an hour. I’m wondering if it’s worth trying again at a different quantization.

On Linux using the RADV driver, UD-Q6_K_XL infers at ~ 17 tokens/s for me in llama.cpp-vulkan.


Thanks, I’m going to have to look into using Vulkan. I get ~15.0 t/s using llama.cpp-hip (ROCm) on Linux.

I am using llama.cpp Vulkan on Windows and getting 20-25 t/s … don’t forget to disable thinking ;o)

c:/llama.cpp.vk/llama-server.exe --host 0.0.0.0 --port 8123 ^
  --model C:\Users\Admin\.lmstudio\models\mradermacher\Qwen3.5-122B-A10B-heretic-i1-GGUF\Qwen3.5-122B-A10B-heretic.i1-Q4_K_M.gguf ^
  --chat-template-kwargs "{\"enable_thinking\": false}" ^
  -c 81920 --keep 1024 --no-mmap --flash-attn on ^
  --cache-type-k q8_0 --cache-type-v q5_0 ^
  --context-shift --metrics --ubatch-size 3072 --batch-size 3072 ^
  --mmproj C:\Users\Admin\.lmstudio\models\mradermacher\Qwen3.5-122B-A10B-heretic-i1-GGUF\Qwen3.5-122B-A10B-heretic.mmproj-f16.gguf
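One thing to watch when copy-pasting that command from the forum: the curly “smart quotes” around the kwargs will break parsing, since --chat-template-kwargs expects straight-quoted JSON. A quick sanity check of the string before launching (uses python3 purely as a JSON parser; any validator works):

```shell
# Verify the kwargs string is valid JSON before handing it to llama-server
KWARGS='{"enable_thinking": false}'
python3 -c 'import json,sys; print(json.loads(sys.argv[1])["enable_thinking"])' "$KWARGS"
# prints False
```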