Performance parameters for Ollama, newbie question

Hello everyone, I am new to Framework and to Linux as well. I'm having a lot of fun getting to know everything, but it's also taking up a lot of my time ;). Anyway, it's nice to be here now, and so far I like the Framework very much.

I use Fedora 42

Linux 6.15.3-200.fc42.x86_64

My machine is the Framework with the Ryzen™ AI 9 HX 370, 64 GB of RAM, and a 4 TB drive.

I have a question for you: I tried Ollama today with mistral-small3.2:24b. Generation runs fine at about 3 tokens/sec, but sometimes it takes quite a long time before a prompt is read in. Is there anything I can do to improve this? I've looked at the performance data with glances and I think the GPU memory is maxed out, but can someone please explain this to me? I don't quite understand it yet. Does this have something to do with memory bandwidth, and can I improve it?
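To get some concrete numbers, I put together a small check against Ollama's local REST API. This is just a rough sketch, assuming Ollama is listening on the default port 11434 and that the non-streaming `/api/generate` response includes the `prompt_eval_*` and `eval_*` timing fields. It separates the prompt-reading speed from the generation speed:

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral-small3.2:24b",
    "prompt": "Explain memory bandwidth in one paragraph.",
    "stream": False,  # ask for a single JSON response with timing fields
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Durations are reported in nanoseconds.
prompt_tokens = result["prompt_eval_count"]
prompt_secs = result["prompt_eval_duration"] / 1e9
gen_tokens = result["eval_count"]
gen_secs = result["eval_duration"] / 1e9

print(f"prompt:     {prompt_tokens} tokens in {prompt_secs:.1f}s "
      f"({prompt_tokens / prompt_secs:.1f} tok/s)")
print(f"generation: {gen_tokens} tokens in {gen_secs:.1f}s "
      f"({gen_tokens / gen_secs:.1f} tok/s)")
```

If the prompt rate reported there is much lower than the generation rate, that would at least narrow down where the waiting time goes.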

Oh, and by the way: once my whole laptop crashed, locked up, and closed all programs, which was not so nice. The problem report said Signal Desktop quit unexpectedly. But I guess no one can help me with that question so quickly. I hope it doesn't happen again.