AMD ROCm does not support the AMD Ryzen AI 300 Series GPUs

I’ve successfully run ComfyUI with PyTorch on AMD’s ROCm framework on my desktop, using Windows Subsystem for Linux (WSL) with a dedicated AMD Radeon RX 7900 XTX GPU, and I was curious how a laptop APU designed for AI workloads would compare. Sadly, I can’t get PyTorch to work on the Framework Laptop 13 with the AMD Ryzen AI 9 HX 370 with Radeon 890M and 96 GB of system memory.

It turns out that AMD ROCm does not support the Radeon 890M. In fact, when support was requested, AMD pointed users to third-party patches! So, if you were hoping to use your new AMD Ryzen AI 300 Series laptop with PyTorch, it’s not going to work. AMD’s marketing is misleading here: if you are going to call something the “AI series,” it should work with your own AI framework.

rocminfo

WSL environment detected.
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      49152(0xc000) KB
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    48965100(0x2eb25ec) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    48965100(0x2eb25ec) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    48965100(0x2eb25ec) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 4
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    48965100(0x2eb25ec) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*** Done ***

So, ROCm is detecting only the CPU as a CPU agent, not the Radeon 890M as a GPU agent. This means the GPU is not usable in ComfyUI, or in other AI apps that use ROCm (via PyTorch).

python3 -c 'import torch; print(torch.cuda.is_available())'
False
 ~/a/ComfyUI (master)> python main.py
Checkpoint files will always be loaded safely.
Traceback (most recent call last):
  File "/home/sean/ai/ComfyUI/main.py", line 137, in <module>
    import execution
  File "/home/sean/ai/ComfyUI/execution.py", line 13, in <module>
    import nodes
  File "/home/sean/ai/ComfyUI/nodes.py", line 22, in <module>
    import comfy.diffusers_load
  File "/home/sean/ai/ComfyUI/comfy/diffusers_load.py", line 3, in <module>
    import comfy.sd
  File "/home/sean/ai/ComfyUI/comfy/sd.py", line 7, in <module>
    from comfy import model_management
  File "/home/sean/ai/ComfyUI/comfy/model_management.py", line 221, in <module>
    total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
                                  ^^^^^^^^^^^^^^^^^^
  File "/home/sean/ai/ComfyUI/comfy/model_management.py", line 172, in get_torch_device
    return torch.device(torch.cuda.current_device())
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sean/ai/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1026, in current_device
    _lazy_init()
  File "/home/sean/ai/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 372, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available
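Before launching ComfyUI, it can help to see exactly what the HIP backend reports. A minimal probe sketch, assuming the same PyTorch ROCm build is reachable as python3 (on ROCm builds, the torch.cuda.* API maps to HIP devices):

```shell
# Probe PyTorch's ROCm (HIP) backend and report each visible device,
# falling back gracefully when torch is missing or no HIP GPU is found.
status=$(python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch not installed")
else:
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(i, torch.cuda.get_device_name(i))
    else:
        print("no HIP GPUs visible")
EOF
)
echo "$status"
```

On the HX 370 setup described above, this prints "no HIP GPUs visible", matching the RuntimeError from ComfyUI.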
This was a problem with the 780M too, which some people could work around by overriding some environment variables. Have you tried this? [Feature]: ROCm Support for AMD Ryzen 9 7940HS with Radeon 780M Graphics · Issue #3398 · ROCm/ROCm · GitHub

I just tried HSA_OVERRIDE_GFX_VERSION=11.0.2 python main.py and HSA_OVERRIDE_GFX_VERSION=11.0.1 python main.py with ComfyUI and PyTorch 2.7 stable and the PyTorch 2.8 nightly ROCm builds. No change. :cry:

That’s really interesting, since Framework specifically mentions ROCm in their marketing materials for the Ryzen AI 300 mainboards (Framework | Introducing the Framework Laptop 13 powered by AMD Ryzen).

Would really love to hear the Framework team say more about why they mentioned ROCm when they introduced the new mainboards.

Further information would be interesting for me, too. AMD does not provide any timelines, even for their desktop GPUs. There seems to be some progress, but it is unclear whether the very limited support for their current products will only improve with UDNA, which is not very useful for me as a Ryzen AI 300 buyer now.

That blog post specifically calls out ROCm support because of the NPU, but it does not mention the GPU. It’s a subtle but critical distinction.

Because this processor also has a 50 TOPS NPU, it supports Copilot+ and an increasing number of ROCm-compatible open source AI toolkits.

In my original post, you can see that rocminfo detected the CPU as a ROCm agent, but not the GPU. So some ROCm workloads will work and others will not, depending on how the application uses ROCm.
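A quick way to check this without reading the whole dump is to pull out each agent's device type. A sketch assuming rocminfo is installed and on PATH; on an unsupported APU only CPU agents appear, never a GPU agent:

```shell
# Summarize the device type of every HSA agent rocminfo reports.
if command -v rocminfo >/dev/null 2>&1; then
    summary=$(rocminfo | grep -E '^ *Device Type' || echo "no agents parsed")
else
    summary="rocminfo not found"
fi
echo "$summary"
```

On the Radeon 890M under WSL this shows a single "Device Type: CPU" line, while a supported GPU would add a "Device Type: GPU" agent.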

AMD’s fragmented support for ROCm is one of the key reasons NVIDIA has dominated, and continues to dominate, the AI space. CUDA works on almost any NVIDIA GPU going back years.

To illustrate that point: for consumer GPUs, AMD ROCm is currently only fully supported on the AMD Radeon RX 7900 series.

System requirements (Linux) — ROCm installation (Linux)

Meanwhile, the latest release of the NVIDIA CUDA framework (12) supports GPUs going back to the Maxwell architecture, up through the latest RTX 50 series.

CUDA GPU Compute Capability | NVIDIA Developer

It would be nice to have all the ROCm and AI related concerns in one thread, since that makes tracking and troubleshooting easier for everyone who needs it.

I was able to get it working on my HX 370 (see my reply in the other thread), although I found performance to be subpar when testing with Applio. Another caveat: I’m on Arch Linux, not Windows or a Debian-based system, since I’m used to it and find it more logical in normal operation.

I forgot to post it earlier, but here is my output from rocminfo.

rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5157                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32119920(0x1ea1c70) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    32119920(0x1ea1c70) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32119920(0x1ea1c70) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32119920(0x1ea1c70) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1150                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 5390(0x150e)                       
  ASIC Revision:           4(0x4)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   49408                              
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 29                                 
  SDMA engine uCode::      11                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16059960(0xf50e38) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16059960(0xf50e38) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1150         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
Agreed. The Framework mods here have the ability to merge threads. I created a new thread because my issue was on Windows, not Linux.

I think the maintainer of the Arch Linux packages must do some tweaking to get that GPU to show up in rocminfo, because it does not behave that way with AMD’s own packages.

I did it: I found a way to fix inference times, reducing conversion of an 18-second audio file in Applio from 125 seconds to 10 seconds, and “fixing” the MIOpen errors I was getting. I’ve yet to try this with other AI-related tasks like Stable Diffusion or LLMs, so please test and report back.

You’ll need to set these environment variables before running the Python script.

 export MIOPEN_FIND_MODE=FAST
 export MIOPEN_USER_DB_PATH="/home/user/.cache/miopen/3.3.0.a85ca8a54-dirty/"
 export HSA_OVERRIDE_GFX_VERSION=11.0.2

Replace the user’s file path with the one for your machine, since it’s probably different from mine; it should be the folder containing a file like gfx1102_8.ukdb. I’m unsure whether both MIOpen variables are necessary, but this works for me and I don’t want to experiment further in case something breaks.
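The three variables can also be wrapped in a small launcher script so they are set consistently every run. A hypothetical sketch under the same assumptions as the post (the MIOpen cache path, its version hash, and the gfx override value are examples; adjust them for your machine):

```shell
#!/bin/sh
# Hypothetical launcher: apply the MIOpen/HSA overrides, then start the app.
# MIOPEN_USER_DB_PATH must point at your own MIOpen cache directory (the one
# containing a file like gfx1102_8.ukdb); the version hash below is an example.
export MIOPEN_FIND_MODE=FAST
export MIOPEN_USER_DB_PATH="$HOME/.cache/miopen/3.3.0.a85ca8a54-dirty/"
export HSA_OVERRIDE_GFX_VERSION=11.0.2
# Launch only if the entry point is present in the current directory.
if [ -f main.py ]; then
    python main.py "$@"
fi
```

Because the variables are exported, they are inherited by the Python process and anything it spawns, so you don't have to remember to prefix every invocation by hand.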

However, other problems I noticed still occur. I’m unsure how to monitor GTT, because my physical RAM isn’t being used to its max: I have 32 GB of 5600 MHz RAM, but only about 1 GB is used when running Applio, which feels wrong since it should be able to use more.

I’m not on Windows, but from memory and from the research I did to fix my issues on Linux: ROCm on Windows doesn’t support iGPUs, while on Linux you can “force” it by adding the environment variables. This was my experience with the 7840U and seems to be the same with the HX 370.

I ran into these ROCmLibs for the 780M, which are built for Windows. I’m unsure how helpful they would be, but try them; I hope they get it working.

For those interested in RDNA ROCm support, I keep a general set of documents here: https://llm-tracker.info/howto/AMD-GPUs

Recently, I’ve been poking specifically at Strix Halo (gfx1151) and have been able to successfully compile PyTorch, AOTriton, and CK. gfx1150 should be similar. It’s a bit of an adventure, but it is now possible: https://llm-tracker.info/_TOORG/Strix-Halo#building-pytorch

Those interested specifically in the state of PyTorch for AMD APUs should track:
