Sadly, ROCm Remains VaporWare

Background: I own a 16 inch, but I essentially ripped out the GPU (which was easy enough to do in a Framework laptop, though it was definitely not my original intent :slight_smile:). I did this because, frankly, I was not smart enough to figure out how to use the GPU for general-purpose computation (not necessarily LLMs, just any kind of GPU work, à la CUDA).

At the launch yesterday, somebody senior at AMD (responsible for CUDA-equivalent stuff, I think it was @AnushElangovan) mentioned the https://github.com/nod-ai/TheRock repo, which is supposedly the answer to ROCm questions going forward. I have no idea who nod-ai are or how (or if) they actually relate to AMD, but the utter lack of any tutorial or detailed plan for leveraging the GPUs / AI Max makes me seriously doubt that I would be able to take advantage of the Desktop for hobbyist computational work (again, not necessarily LLMs). This is a shame: my original plan was to buy “one of each” Framework product (I already have a 13 inch as well)… I’m a big fan of Framework’s big-picture mission statement.

Punch Line: for the time being, I am really hesitating to order a Desktop… I want to “early adopt” and all that, but the ROCm story is just too flimsy.

Finally: there’s a typo on the product page, https://frame.work/desktop?tab=machine-learning. It should say “inference”, not “interference”…

Firstly, just to be clear, the official ROCm GitHub repository is ROCm/ROCm (AMD ROCm™ Software).

Secondly, it was stunning how out of touch that AMD lead was with how bad the ROCm experience is for anyone using a consumer GPU that is not a 7900XTX. He also talked up some awesome CI and contribution model. Maybe that exists internally or for large enterprise customers. Not for us normal folks though.

Their software development processes really seem lacking, judging by the kinds of bugs and hacks that you see escape into releases.

Seeding 100 Desktops to open source folks is great. Full stop. But realize this costs less than one year of a single FTE hired to help fix things. And they clearly need to hire more than that.

The nod.ai team was acquired by AMD, and they have been talking a good talk. But so far (at least publicly) there are no results.

In conclusion:

Since the success of Desktop now hinges on ROCm, I hope the Framework team can impress upon AMD the need to fix the ROCm SW ecosystem and hire some actually good software architects (and give them the funds + authority) to improve their overall SW development practices.

David

I have had some success with ROCm on a FW16.

@James3 Your thread illustrates the issue exactly. This is so janky.

FYI, the build you used from AMD did not bother to include the chip-specific files you needed for your FW16 GPU. Yes, using “export HSA_OVERRIDE_GFX_VERSION=11.0.0” can fool the code into running, but it will be neither stable (the GPU-specific files work around chip bugs that differ from chip to chip) nor performant (the tuning for your chip’s caches and execution latency is unknown).
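
For anyone who wants to try the override without exporting it into their whole shell session, here is a minimal Python sketch that sets HSA_OVERRIDE_GFX_VERSION only for one child process. The helper names (`env_with_gfx_override`, `run_with_override`) are made up for illustration, and the value 11.0.0 is simply the one quoted above; none of this makes the override any more stable or better tuned.

```python
import os
import subprocess
import sys


def env_with_gfx_override(version, base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    The caller's environment (or base_env, if given) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


def run_with_override(cmd, version):
    """Run cmd in a child process that sees the override; the parent is untouched."""
    return subprocess.run(
        cmd,
        env=env_with_gfx_override(version),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in workload: the child only echoes the variable back.
    # Swap in your real command (a Python script, an inference server, ...).
    child = [sys.executable, "-c",
             "import os; print(os.environ['HSA_OVERRIDE_GFX_VERSION'])"]
    print(run_with_override(child, "11.0.0").stdout.strip())
```

Scoping the variable to one process keeps the hack visible and makes it easy to drop once a build ships proper support for your chip.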

Newer ROCm builds (and third-party builds) may have the .dat files for performance tuning, but it’s questionable how accurate those are, and they do not address the stability issues. Those fixes need to be coded into the bundled internal compiler.

Wow, HSA. That’s quite the blast from the past, still lurking in their code. Heterogeneous System Architecture is what they’ve been trying to get people to use since the Kaveri days of APUs.

And just for clarity, below is a screenshot (as of today 27-FEB-2025) of the officially supported and tested ROCm GPUs.

Note the absence of Strix Halo (Max+ 395) as used by the Desktop. Yes, it’s true the Max+ 395 is new, but NVIDIA somehow manages day-1 support for their hardware despite being closed-source and thus having no community help/testing.

Worse, note the lack of any of the GPU options for FW16. :exploding_head: Neither the integrated nor dedicated options have any official ROCm support or testing. And this hardware has been out for a long time now.

@David6

I agree. If AMD really is supporting Framework with the FW Desktop for AI / ML / LLM models, applications, etc., then they (AMD) do need to add the AMD APUs to the officially supported ROCm column.

I fully agree!
I won’t buy any AMD hardware for working on LLMs until they support all of their AI-based product lines with ROCm. My FW16 with the 7700S is waiting.

If that were the case, I would certainly have been among those preordering the FW Desktop.

Oh, I hadn’t seen that support matrix before… currently using ROCM on a desktop RX6800 and it’s working just fine, from what I can tell.

I am just upgrading to an RX7900 XTX though, so will see if there are any noticeable improvements other than the speed.

Similar topic here: Status of AMD NPU Support - #9 by S.H
I have used ROCm with an RX6700XT and a 7800XT so far and it works, but it needs the HSA… hack. From the posts in my thread, it looks like ONNX is the only way to leverage the NPU.

HSA hack? Do you have any details on this?

If I run rocminfo on my device, it seems it’s working fine.

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4673                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65747792(0x3eb3b50) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65747792(0x3eb3b50) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65747792(0x3eb3b50) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-631267104c9817bf               
  Marketing Name:          AMD Radeon RX 6800                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      131072(0x20000) KB                 
  Chip ID:                 29631(0x73bf)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2475                               
  BDFID:                   2304                               
  Internal Node ID:        1                                  
  Compute Unit:            60                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 120                                
  SDMA engine uCode::      85                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
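
If you just want the gfx target out of a dump like the one above, something along these lines should work. It is a rough sketch that assumes the layout shown in this post (an `Agent N` banner between rows of asterisks, then `Key: value` lines); `gpu_agents` is a made-up helper name, and other rocminfo versions may format things differently.

```python
import re


def gpu_agents(rocminfo_text):
    """Return (gfx_target, marketing_name) for each GPU agent in rocminfo output."""
    agents = []
    # Agents are separated by a banner: asterisks, "Agent N", asterisks.
    # Everything before the first banner (system attributes) is skipped.
    blocks = re.split(r"\*+\s*\n\s*Agent \d+\s*\n\*+", rocminfo_text)[1:]
    for block in blocks:
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            key = key.strip()
            # Keep the first occurrence of each key: the agent-level "Name"
            # comes before the "Name" nested under "ISA Info".
            if sep and key not in fields:
                fields[key] = value.strip()
        if fields.get("Device Type") == "GPU":
            agents.append((fields.get("Name"), fields.get("Marketing Name")))
    return agents


# To feed it live output instead of pasted text (needs rocminfo on PATH):
#   import subprocess
#   text = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
#   print(gpu_agents(text))
```

For the output in this post, the only GPU agent is the RX 6800, reported as gfx1030.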

At the second FW event, AMD was also on stage:
Anush Elangovan - VP of AI software at AMD
Talked about ROCm.
2023 - Focused on day-zero support of models (Llama, DeepSeek)
2024 - Performant day-zero support.
2025 - Focus on accessibility of ROCm
We get PyTorch to work on all of AMD’s AI hardware.
Starting from the laptops, to desktops, to the Instinct GPUs.

So, if ROCm and PyTorch do not work for you today, it sounds like better support might be coming in 2025.

By HSA… hack I mean

I consider this a hack since it fools ROCm into thinking it is using an RX7900XTX.
I think for the RX6700XT it was HSA_OVERRIDE_GFX_VERSION=10.3.0 but I am not sure since it has been a while since I used the RX6700XT.
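
The two override values mentioned in this thread (10.3.0 and 11.0.0) look like the gfx target name written as major.minor.stepping with the stepping forced to 0. Here is a small sketch of that reading. It is my assumption about the naming scheme, not something taken from AMD's documentation; `gfx_to_version` and `base_override` are made-up names, and a value produced this way is no guarantee that the override will actually work on a given chip.

```python
import re


def gfx_to_version(target):
    """Turn a target name like 'gfx1031' into the dotted form '10.3.1'.

    Assumption: the last two characters are minor and stepping (hex digits)
    and whatever precedes them is the major version.
    """
    match = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", target)
    if match is None:
        raise ValueError(f"unrecognised gfx target: {target!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def base_override(target):
    """Same family with stepping 0, i.e. the value people in this thread export."""
    major, minor, _stepping = gfx_to_version(target).split(".")
    return f"{major}.{minor}.0"


if __name__ == "__main__":
    for name in ("gfx1031", "gfx1103"):
        print(name, "->", f"HSA_OVERRIDE_GFX_VERSION={base_override(name)}")
```

Under this reading, the RX6700XT (gfx1031) maps to 10.3.0 and the FW16 (gfx1103) maps to 11.0.0, which matches the values people report using.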

AMD Homepage says:

AMD ROCm™ is an open software stack including drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications. ROCm is optimized for Generative AI and HPC applications, and is easy to migrate existing code into.

I am not sure that the AMD NPU will be included in ROCm.

Hmm, I recently installed Ollama and Invoke.Ai on my Arch-based machine with an RX6800 and I’ve not had to mess with this variable. Things seem to be running just fine, and using radeontop and btop I’ve confirmed that my CPU is largely idle during generation.

There is a touch of instability in the Invoke app… sometimes after I’ve thrown a lot into the render pipeline the UI will crash, but it’s not frequent enough to really bother me and it seems to recover well… mostly, I think it’s just the Invoke UI rather than rocm… as the majority of the time when I relaunch the UI, the GPU is still working on the output.

Your mileage may vary if you’re using Windows with WSL. Is that the case for you?

I’m using Fedora Linux. But the RX6800 is not the same as the RX6700XT in this regard. As far as I remember, the RX6800 is gfx1030 whereas the RX6700XT is gfx1031 (which is/was not included in ROCm). But I may be wrong.

ROCM 6.3.3 from here:

ROCm 6.3.3 ships rocBLAS tuning files for the following GPUs; others need the HSA override:
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1151.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat”
“/opt/rocm-6.3.3/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat”

The FW16 is gfx1103 so needs the HSA override.
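
To run the same check on your own install, a sketch like this lists which gfx targets have a TensileLibrary_lazy_*.dat file and reports whether yours is among them. The default directory is the one from the listing above and will differ for other ROCm versions or distro packages; the function names are made up for illustration. Having a .dat file only means rocBLAS ships tuning data for that chip, not that AMD officially supports or tests it.

```python
import re
from pathlib import Path

# Path taken from the listing in this post; adjust it for your install.
DEFAULT_LIBRARY_DIR = "/opt/rocm-6.3.3/lib/rocblas/library"

_DAT_NAME = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def bundled_gfx_targets(library_dir=DEFAULT_LIBRARY_DIR):
    """Return the sorted gfx targets that have a TensileLibrary_lazy_*.dat file."""
    targets = set()
    for path in Path(library_dir).glob("TensileLibrary_lazy_gfx*.dat"):
        match = _DAT_NAME.match(path.name)
        if match:
            targets.add(match.group(1))
    return sorted(targets)


def needs_override(target, library_dir=DEFAULT_LIBRARY_DIR):
    """True if no tuning file for `target` is bundled in library_dir."""
    return target not in bundled_gfx_targets(library_dir)


if __name__ == "__main__":
    print("bundled:", ", ".join(bundled_gfx_targets()) or "(none found)")
    print("gfx1103 needs override:", needs_override("gfx1103"))
```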

To find out which is which:

Thanks for the info… I had assumed that they’d include all 6x00 and 7x00 series cards… but it seems not. This is definitely something they should expand on.

Agreed 100%. And it’s not just the missing .dat files, which other posters have pointed out are included in newer versions. FYI, on the newer builds you can get away without the HSA override for more RDNA3 chips.

The bottom line is that there is a reason AMD does not list these GPUs on the supported list. They simply have not yet worked out all the bugs/issues, nor are they regularly testing for said issues.

It’s good they’ve released the partial support. It’s bad they’ve not actually finished the job.

FWIW, Fedora builds ROCm for gfx1103.
