DGX Spark vs. Strix Halo - Initial Impressions

Heh, turns out vLLM compiles and runs just fine without the NVIDIA-provided container. I just needed to set an environment variable specifying the arch.


In other news, there is some new activity in the amd-dev branch of the vLLM project, so hopefully some improvements are coming in the 0.11.1 release. But the amdsmi Python package is still crashing, so there is that.


What is strange is that the one from Fedora 42 works.

$ amd-smi version
AMDSMI Tool: 24.7.1+unknown | AMDSMI Library version: 24.7.1.0 | ROCm version: N/A

No, not this one. I mean the amdsmi Python module, and it only crashes on cleanup. The amd-smi command-line tool works.

Look, there is something wrong with amd-smi on ROCm 7+:

  • Fedora 42 / ROCm 6.3:
$ amd-smi list
GPU: 0
    BDF: 0000:c1:00.0
    UUID: 00ff1586-0000-1000-8000-000000000000
    KFD_ID: 29672
    NODE_ID: 1
    PARTITION_ID: 0
  • Fedora 43 / ROCm 6.4:
$ amd-smi list
WARNING: User is missing the following required groups: render, video. Please add user to these groups.
GPU: 0
    BDF: 0000:c1:00.0
    UUID: 00ff1586-0000-1000-8000-000000000000
    KFD_ID: 29672
    NODE_ID: 1
    PARTITION_ID: 0
  • Fedora 44 / ROCm 7.0:
$ amd-smi list
WARNING: User is missing the following required groups: render, video. Please add user to these groups.
GPU: 0
    BDF: N/A
    UUID: N/A
    KFD_ID: 29672
    NODE_ID: 1
    PARTITION_ID: 0

Note: all done in a toolbox running on Silverblue 42 …
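To compare these listings without eyeballing them, here is a minimal Python sketch that parses the plain-text `amd-smi list` output shown above and reports which fields come back as `N/A`. The function names `parse_amd_smi_list` and `missing_fields` are made up for illustration, and the parser only assumes the `KEY: value` layout seen in this thread, not any documented output format.

```python
def parse_amd_smi_list(text):
    """Parse plain-text `amd-smi list` output into one dict per GPU.

    Assumes the "KEY: value" layout pasted in this thread; warning lines
    and the shell prompt line are skipped.
    """
    gpus = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("WARNING:") or ":" not in line:
            continue
        # Split on the first colon only, so "0000:c1:00.0" stays intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "GPU":
            gpus.append({"GPU": value})
        elif gpus:
            gpus[-1][key] = value
    return gpus


def missing_fields(gpu):
    """Return the sorted names of the fields reported as N/A."""
    return sorted(k for k, v in gpu.items() if v == "N/A")


# Output copied from the Fedora 44 / ROCm 7.0 case above.
ROCM7_OUTPUT = """\
$ amd-smi list
WARNING: User is missing the following required groups: render, video. Please add user to these groups.
GPU: 0
    BDF: N/A
    UUID: N/A
    KFD_ID: 29672
    NODE_ID: 1
    PARTITION_ID: 0
"""

if __name__ == "__main__":
    for gpu in parse_amd_smi_list(ROCM7_OUTPUT):
        print("GPU", gpu["GPU"], "missing:", missing_fields(gpu))
```

Feeding it the ROCm 6.3 or 6.4 output instead should give an empty list of missing fields, which makes the regression easy to spot in a script.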

Strix Halo on Framework MB:

  • FA: on
  • mmap: off
  • GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON
  • ngl: 999
  • n_ubatch=4096
  • backend: rocm
| model              | size      | params  | test       | t/s           |
| ------------------ | --------- | ------- | ---------- | ------------- |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp1        | 4.69 ± 0.00   |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp1        | 4.69 ± 0.00   |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp2        | 9.19 ± 0.00   |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp3        | 11.23 ± 0.00  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp4        | 12.86 ± 0.01  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp8        | 25.37 ± 0.04  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp12       | 37.53 ± 0.06  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp16       | 49.17 ± 0.08  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp24       | 70.87 ± 0.15  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp32       | 89.94 ± 0.45  |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp48       | 122.01 ± 0.61 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp64       | 145.84 ± 0.60 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp96       | 207.52 ± 0.55 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp128      | 269.40 ± 0.95 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp192      | 229.28 ± 0.15 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp256      | 291.95 ± 0.70 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp384      | 358.48 ± 0.89 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp512      | 418.56 ± 0.65 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp768      | 401.40 ± 1.40 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp1024     | 438.28 ± 1.35 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp1536     | 439.35 ± 0.80 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp2048     | 438.40 ± 1.04 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp3072     | 432.32 ± 0.48 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp4096     | 423.00 ± 0.47 |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | tg16       | 4.69 ± 0.00   |
| Mistral-Small-2506 | 43.91 GiB | 23.57 B | pp512+tg64 | 38.69 ± 0.01  |
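If anyone wants to crunch these numbers rather than read them off, here is a rough Python sketch that pulls the test name and t/s out of each result row and finds the prompt-processing peak. The helper names `parse_rows` and `peak_prompt_rate` are hypothetical, the regex only assumes that each row ends in `<test> <t/s> ± <err>` (with or without table pipes), and the sample holds just a handful of rows copied from the results above.

```python
import re

# Matches the tail of a result row: "<test> <t/s> ± <err>".
# Optional "|" separators are tolerated so pasted table rows also work.
ROW = re.compile(
    r"(?P<test>[\w+]+)\s*\|?\s*"
    r"(?P<tps>\d+(?:\.\d+)?)\s*±\s*(?P<err>\d+(?:\.\d+)?)\s*\|?\s*$"
)


def parse_rows(text):
    """Return (test, tokens_per_second, error) tuples for each result row."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((m["test"], float(m["tps"]), float(m["err"])))
    return rows


def peak_prompt_rate(rows):
    """Return the (test, t/s) pair with the highest pure prompt-processing rate."""
    pp = [(t, tps) for t, tps, _ in rows if re.fullmatch(r"pp\d+", t)]
    return max(pp, key=lambda r: r[1])


# A few rows copied from the results above.
SAMPLE = """\
Mistral-Small-2506 43.91 GiB 23.57 B pp128 269.40 ± 0.95
Mistral-Small-2506 43.91 GiB 23.57 B pp192 229.28 ± 0.15
Mistral-Small-2506 43.91 GiB 23.57 B pp1024 438.28 ± 1.35
Mistral-Small-2506 43.91 GiB 23.57 B pp1536 439.35 ± 0.80
Mistral-Small-2506 43.91 GiB 23.57 B pp4096 423.00 ± 0.47
Mistral-Small-2506 43.91 GiB 23.57 B tg16 4.69 ± 0.00
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("peak:", peak_prompt_rate(rows))
```

On the full result set the same function makes it easy to see where throughput flattens out, and the dip from pp128 to pp192 shows up as soon as you sort or plot the rows.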

The user in toolbox is not a member of the required groups.

:~$ ll /dev/kfd
crw-rw-rw-. 1 root render 235, 0 25 oct.  11:21 /dev/kfd
:~$ ll /dev/dri/renderD128 
crw-rw-rw-. 1 root render 226, 128 25 oct.  11:21 /dev/dri/renderD128

It is not needed on Fedora: the device nodes are rw for all users, so the check is “wrong”. I have never needed it on this OS (maybe it is needed on server / CoreOS releases?), and ROCm works fine without it.

⬢ [zzzzzz@toolbx ~]$ getfacl /dev/dri/card1 
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card1
# owner: nobody
# group: nobody
user::rw-
user:4294967295:rw-
group::rw-
mask::rw-
other::---

And the user has ACL rights on cardN too.
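The point above is that what matters is whether the process can actually open the device nodes, not which groups the user is in. A small Python sketch of that distinction, assuming the device paths shown in this thread (`/dev/kfd`, `/dev/dri/renderD128`); the helper names `device_access` and `current_group_names` are made up for illustration, and this is not how amd-smi itself performs its check.

```python
import grp
import os


def device_access(paths=("/dev/kfd", "/dev/dri/renderD128")):
    """Map each device path to True if this process may read and write it.

    os.access() asks the kernel, so mode bits and ACLs are both taken
    into account, whatever groups the user happens to be in.
    """
    return {
        path: os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
        for path in paths
    }


def current_group_names():
    """Return the names of the groups this process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # Inside a container a gid may have no name entry.
            names.add(str(gid))
    return names


if __name__ == "__main__":
    groups = current_group_names()
    for name in ("render", "video"):
        print(f"member of {name}: {name in groups}")
    for path, ok in device_access().items():
        print(f"{path}: {'rw' if ok else 'no access'}")
```

If the second loop reports rw while the first reports no membership, you are in the situation described here: the group warning fires even though access works.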

I tried to add the groups:

sudo usermod -a -G video,render $LOGNAME

But it did not work: it did not add the user to the groups.

Edit: I found how to add the user to these groups on the host, but not how to have them in the toolbox. I have to look into what the “good” way to do that is (and then whether it is really needed…)