Help Me Make Up My Mind (FW13 Ryzen AI 9 HX 370)

Same Post on Reddit

Hi! I fell in love with Framework from the first moment: the concept is just perfect and should be an example for the industry, so it's clear to me that my next laptop will be a FW. The question is when, so I have a couple of questions I hope you can answer.

My ideal setup would be:

Model: FW13
APU: Ryzen™ AI 9 HX 370
RAM: 2x48GB DDR5-5600

  1. For those who already have this machine with this setup (or something similar): how has it been running local LLMs? I'm not very involved in the LLM world, but I'd like to use them for privacy, for coding, and to speed up some tasks.
  2. For performance reasons I might be interested in using an eGPU to speed up LLM use. The FW13 ports are USB4/TB4, and from what I've read there is a bandwidth limitation, but I understand that even if I can't use the eGPU's full power the result will still improve considerably. Has anyone tried it?
  3. I love the idea of using external storage expansion cards to boot different OSes; my plan is to have W11 and different Linux distros to play with. But I've read that it's not fully recommended, even though FW says, "Both are fast enough to run apps and even boot an operating system from," while Support does not recommend it (per Reddit). Has anyone experienced performance problems? Would you recommend it?
  4. The upgradability that FW offers is incredible: knowing that if in two years they release a new board with TB5 or OCuLink ports, or even a Coreboot-compatible board, I can simply buy the new board and sell the old one on the Marketplace to reduce waste; as they say, "one man's trash is another's treasure." My question is whether any of you have sold old hardware back to Framework for resale on their Marketplace, and whether they gave you a good price. In the end this would make upgrades cheaper and allow a safe sale instead of selling directly to a third party.
  5. If anyone with this setup is using Arch Linux, how is it going so far? Any problems?

If anyone can answer any of these questions I would appreciate it! Thanks in advance.

1 Like

Go for the Ryzen AI series. AMD has released GAIA (on GitHub), which allows you to run local LLMs directly on the chip's NPU (Neural Processing Unit).

2 Likes

GAIA looks very good, I hope it will be ported to Linux soon!

This is the thread that goes over why support does not recommend it.
https://community.frame.work/t/responded-1tb-expansion-card-disconnects-randomly

I love these questions! I can't answer most of them since you asked for community input, but I can confirm that Framework does not have a buy-back policy; however, you can sell your old parts or the full laptop in the community Marketplace.

1 Like

Yes, this is the first thread I found discussing this problem. I did quote it in the Reddit post, but I removed it here because the forum limits posts to a maximum of two links (I don't think it makes a lot of sense to limit it to only two links). TY anyway.

I'd still love to have your answers if you're willing; plus, you have the Framework Team role, so people can decide whether to trust your answer or just take it with a grain of salt.

I think that's a limitation for new users :slight_smile: probably as spam prevention.

1 Like

I'm also checking whether it's possible to install and boot an OS from expansion cards. My understanding is that installing Windows on expansion cards is not supported, but you should be able to try with Linux. Speed will be higher and power consumption lower running the OS from the internal storage drive, but you can install Linux on a Storage Expansion Card. Let us know if you run into any issues if you try it!

2 Likes

Just to clarify, it is definitely possible to install Windows on an external storage expansion card. My question was more about performance/issues, and from what some users have told me, they haven't had any problems. How to install Windows 10 on an expansion card - #11 by Cheese

Note, however, that these are LLMs created by AMD, not the popular LLMs you would expect an “AI” CPU to be able to run.

gfx1150 support (890M) not working · Issue #40 · likelovewant/ollama-for-amd

Interestingly, inference performance stays the same [when using a build of ROCm patched to support the APU], about 7.1-7.8 tok/s for deepseek-r1:14b q4 on this hardware (AMD HX 370, LPDDR5X 7500), when I run all of these variants:

  • Stock Ollama CPU, num_threads=20 (it’s a 12-core + SMT CPU)
  • llama.cpp with Vulkan
  • llama.cpp with AVX512
  • ollama-for-amd with ROCm

So, most likely it's a memory bandwidth limitation. The CPU is free to do other things if the GPU is used, of course.
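
To see why memory bandwidth is the likely ceiling, here is a rough back-of-envelope sketch in Python. The bus width, weight size, and efficiency figures are my own assumptions (128-bit LPDDR5X-7500 and roughly 9 GB of weights for a 14B q4 model), not measurements from that issue:

```python
# Rough back-of-envelope: token rate of a memory-bandwidth-bound decoder.
# Assumptions (mine, not from the linked issue):
#   - LPDDR5X-7500 on a 128-bit bus -> ~120 GB/s theoretical peak
#   - deepseek-r1:14b at q4 is roughly 9 GB of weights
#   - every generated token streams the full weight set once
bus_width_bits = 128
transfer_rate_mts = 7500                                     # LPDDR5X-7500
peak_bw_gbs = bus_width_bits / 8 * transfer_rate_mts / 1000  # ~120 GB/s

model_weights_gb = 9.0                                       # ~14B params at 4-bit
ideal_tok_s = peak_bw_gbs / model_weights_gb                 # ~13 tok/s ceiling

# Real runs typically reach only 50-70% of peak bandwidth, which lands right
# around the 7.1-7.8 tok/s reported above, regardless of the backend used.
for efficiency in (0.5, 0.6, 0.7):
    print(f"{efficiency:.0%} of peak -> {ideal_tok_s * efficiency:.1f} tok/s")
```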

So, I would not advise using the AMD Ryzen AI 300 series to learn about AI.

However, on my own laptop I have been able to use Ollama 0.8.6 for Windows to run the gemma3 4-billion-parameter (4b) LLM, with the open-webui Docker container as a web frontend on Docker Desktop on Windows 11, and I get a response rate of 16 tokens/second. Gemma herself claims that is on the low end of what to expect from a 4090. So, if you just want to play with some LLMs and not use PyTorch directly, this is fine.
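
If you want to reproduce that tokens/second number yourself, Ollama's local HTTP API reports its own token accounting. Here is a small Python sketch, assuming a default Ollama install listening on localhost:11434 and that the gemma3:4b model has already been pulled:

```python
import requests

# Ask the local Ollama instance for one non-streaming generation and
# read back its token accounting.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:4b",   # assumes `ollama pull gemma3:4b` was done beforehand
        "prompt": "What's a reasonable response token rate for a 4B model "
                  "on consumer hardware?",
        "stream": False,
    },
    timeout=600,
).json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds.
tokens = resp["eval_count"]
seconds = resp["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```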

My hardware is

Model: Framework Laptop 13
APU: Ryzen™ AI 9 HX 370
RAM: 2x48GB DDR5-5600

The response to this prompt was generated in about 1 minute and 16 seconds.

Prompt:

What’s a reasonable response token rate for running the latest Gemma model, with 4 billion parameters, on consumer hardware?

Response:

Okay, let’s break down a reasonable response token rate for running the latest Gemma 4B model on consumer hardware. This is a complex topic with a lot of variability, but here’s a realistic assessment, considering current performance as of late 2023/early 2024:

1. Key Factors Affecting Token Rate:

  • Hardware: This is the biggest factor.
    • GPU: A high-end NVIDIA GeForce RTX 4090 (24GB VRAM) is currently the sweet spot for running 4B models reasonably well on consumer hardware. An RTX 3090 (24GB) will also work, but might be slower. Lower-end GPUs (RTX 3060 or lower) will be extremely limited.
    • CPU: A modern CPU (e.g., Ryzen 7 or Intel i7) is needed to handle data transfer and processing.
    • RAM: 16GB is a minimum, 32GB is highly recommended. The model needs space for computations and temporary storage.
    • Storage: An SSD is crucial for fast loading of the model.
  • Quantization: This is the technique that dramatically impacts performance. Gemma 4B supports quantization, allowing you to represent the model’s parameters with fewer bits (e.g., 8-bit or 4-bit).
    • 4-bit Quantization (GPTQ/GGML): This is the most impactful technique for consumer hardware. It significantly reduces memory usage and speeds up inference.
    • 8-bit Quantization: Still a performance improvement over full-precision, but less dramatic than 4-bit.
  • Inference Framework: The software you’re using to run the model matters.
    • llama.cpp/GGML: This is a popular, optimized framework for running LLMs on CPUs and GPUs, particularly well-suited for 4-bit quantization. Generally delivers the best performance on consumer hardware.
    • vLLM: A more sophisticated framework focused on efficient throughput. Might provide slightly better speeds with more optimization, but can be more complex to set up.
    • Transformers (Hugging Face): Can be used, but often less optimized than llama.cpp/vLLM for this model size.
  • Prompt Length & Model Complexity: Longer prompts and more complex tasks will naturally take longer to generate responses.
  • Decoding Strategy: The method used to generate the next token (e.g., greedy decoding, beam search) can affect speed.

2. Realistic Token Rate Estimates (as of Late 2023/Early 2024):

Here’s a breakdown based on the above factors:

  • Optimistic (RTX 4090, 4-bit Quantization, llama.cpp): 15-25 tokens per second. This is achievable with shorter prompts and relatively simple tasks. You could see higher rates with careful prompt engineering.
  • More Typical (RTX 4090, 4-bit Quantization, llama.cpp): 8-15 tokens per second. This is a more realistic expectation for many use cases.
  • RTX 3090 (24GB), 4-bit Quantization, llama.cpp: Likely to be slightly slower than the RTX 4090, maybe 6-12 tokens per second.
  • RTX 3060 (12GB), 4-bit Quantization, llama.cpp: This will be significantly slower, potentially only 3-7 tokens per second, and may require aggressive prompt truncation to avoid out-of-memory errors.

3. Important Notes & Caveats:

  • These are estimates. Actual token rates will vary significantly based on the specific workload.
  • Latency: Don’t just focus on token rate. Latency (the time it takes to receive a single token) is also important.
  • Out-of-Memory Errors: 4B models can still be memory-intensive. Careful prompt engineering (truncating prompts) is often needed to avoid errors.
  • Ongoing Optimization: Inference frameworks are constantly being optimized. Expect future versions to offer improved token rates.

Resources for Further Research:

To give you a more tailored estimate, could you tell me:

  • What GPU do you have (exact model)?
  • What is your RAM?
  • Which inference framework are you planning to use (e.g., llama.cpp, vLLM)?
  • What kind of tasks do you plan to run (e.g., simple chatbot, code generation, document summarization)?

Sorry, I'm just starting to research LLMs, but basically, from what I understand, as of today AMD has not provided official support for the gfx1150 (890M). However, a patched version of ROCm exists (GitHub - likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU: ROCm Library Files for gfx1103 and update with others arches based on AMD GPUs for use in Windows), and it has been shown that the improvement is not considerable compared to using the CPU directly instead of the iGPU, and that it is most likely a bandwidth problem, so there is nothing that can be done about it.

So I guess it will just be a matter of time before AMD supports the gfx1150 (890M) and programs like LM Studio are able to use the iGPU instead of the CPU, right? This won't significantly improve performance because of the bandwidth limitation, but it will free up the CPU for other tasks.

To be honest, I'm just interested in playing with LLMs and, if possible, replacing ChatGPT for my daily use because of privacy concerns, so judging from the prompt you shared, the quality looks pretty decent and viable.

Since you have the model with the configuration I was thinking of, could you also answer this: how are fan noise and battery life at low and moderate workloads on Linux?

Correct. Here is what Task Manager looks like when I run a prompt on Ollama with unpatched ROCm. All CPU use, no GPU use.

AMD has not made any indication that they are going to officially add ROCm support for gfx1150/890M, unfortunately.
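
In the meantime, the usual community workaround on Linux for unsupported ROCm targets is the HSA_OVERRIDE_GFX_VERSION environment variable, which tells ROCm to treat the GPU as a nearby supported architecture. Whether any override value actually works on gfx1150 is something I cannot confirm; this is only a hypothetical sketch, assuming a ROCm-enabled build of llama.cpp's llama-server on PATH and a placeholder model file:

```python
import os
import subprocess

# Hypothetical sketch: try to run a ROCm build of llama.cpp's llama-server on
# the 890M by overriding the reported GPU target. Whether gfx1150 accepts a
# gfx11 override is NOT confirmed; this is the generic community workaround
# for unsupported RDNA targets, not anything official from AMD.
env = dict(os.environ)
env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.2"    # assumption: spoof a supported gfx11 target

subprocess.run(
    [
        "llama-server",                       # assumes a ROCm-enabled build on PATH
        "-m", "deepseek-r1-14b-q4_k_m.gguf",  # placeholder model path
        "-ngl", "99",                         # offload all layers to the iGPU
    ],
    env=env,
    check=True,
)
```

As the earlier numbers suggest, this would not make generation any faster (it stays bandwidth-bound), but it would keep the CPU cores free for other work.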

I'm using Windows as my OS, with Windows Subsystem for Linux (WSL) when I'm doing Linux tasks, so I can't comment on Linux use, but the system stays silent during low workloads. During the LLM prompt the fans were loud but nothing unbearable. I haven't had the system long enough to get a full idea of the battery life, but it's more than enough to get you through at least a few hours of casual use.

1 Like

Well, as you say, it's a shame that AMD is releasing a series with the "AI" branding while not supporting key features such as using the iGPU to run LLMs.

Thank you very much for taking the time to answer in such detail! It has helped me a lot and I’m sure it will help other users as well.

1 Like

Possible and supported are two different things; we are not the OS police, after all :slight_smile:

That’s somewhat disappointing (not the FW Team’s fault). I was rather thinking that an AMD AI chip would be fully supported by ROCm. I, too, ordered one hoping to run offline models taking advantage of the AI chip and iGPU.

Hi, here are my two cents:

I bought my first FW13 with a Ryzen 7 7840U. I ran a dual boot of W10 and Ubuntu LTS. W10 worked with the driver package from FW, and Ubuntu worked out of the box. It was an excellent experience.*

Later I found out I don't need W10 for games, as I discovered Steam compatibility and Proton, so I have not needed to log into W10 in half a year.

I ordered the new Ryzen AI 9 HX 370, swapped the board, and… that's when the nightmare started:**

TL;DR:
After I was confident the system was stable, I put my old board on the local Craigslist equivalent with the idea of linking to it in the FW Marketplace. I'm in the EU, and using mail services across the EU is expensive, so I tried to sell locally first. I just sold it yesterday.

For a new 7840U board, FW asks about 15K in our country including VAT; I offered mine for 12K and settled for 11K.
I'm quite happy so far.

I have not tested LLMs yet.

**Nightmare of upgrading with unsupported Ubuntu:
I set up the BIOS and booted into Ubuntu, and it seemed to work flawlessly, but every now and then the whole system (or maybe just the input and display?) froze, requiring a hard reset. This became more and more frequent. I searched for solutions, found out there is no official support for Ubuntu on the new chip, and tried anyway. I updated to the latest Ubuntu and tried the latest kernels. No dice. The only option left was to install a supported distro. I downloaded Fedora 42, installed it from a USB, created two separate partitions, moved /home, and linked it to Ubuntu and, during the install, to Fedora. All of that with freezes every 5 minutes.
Fedora installed great and has had no issues; I can now multi-boot into almost any kernel/Ubuntu combination, Windows, and Fedora, all on the same 2TB drive.

*Issues with the FW13:
Keyboard: the coating on frequently used keys wears off. Right now about a quarter of the coating on the "W", "A", and space keys has worn off from playing Subnautica last summer. For longer sessions I recommend a wireless keyboard (like one from Logitech) or a controller.
Another issue is the responsiveness of some key groups, especially around the keys "P, 0, -, [, O".
Recently even keys like "D" fail to register when pressed. It's very annoying.
I plan on ordering the new keyboard in hopes this will fix it.
Bezel above the first expansion slot group (nearest to the display):
The aluminum frame part between the keyboard and the expansion slot is super thin, and for some reason it started bending away from the laptop. Also, since it sits over the expansion slot, I cannot grab it easily to try to bend it back to its original shape, even after disassembly. It happened on both sides. A cosmetic issue for sure, but at this price point I would expect this not to happen. Replacing it means buying the whole lower aluminum body part.