BIOS Feature Request: Add ability to specify UMA size on AMD APUs

@Wrybill_Plover @the_artist @tom_d Sorry to tag you guys, but just in case: has anyone used efibootmgr before? It seems it would be possible to modify the EFI variables with a script right before boot, to at least skip making the change manually every time the computer boots.

I did use efibootmgr before, but only to manage the UEFI boot entries.

I don't see how it can help here, to be honest. Theoretically, it might be possible to create a boot-time shim that will set the variables and then chain-boot the main system, but it would be that shim that all the work would have to go into. If I understand it correctly, the most value efibootmgr would be able to add is to then set the shim as the active boot entry.

TL;DR
Smokeless seems to add boot entries that include the menus, so you don't need to boot from USB every time.

To my surprise, it seems Smokeless did something like that. I found a bunch of boot entries with nonspecific names after my first use.

efibootmgr output:

BootCurrent: 0008
Timeout: 0 seconds
BootOrder: 0008,0000,2001,2002,2003,0003,0004,0005,0006,0007
Boot0000* Fedora        HD(1,GPT,fe561df9-de7c-49a6-98a7-dcab5c6d27e9,0x800,0x12c000)/\EFI\fedora\shim.efiRC
Boot0003* UEFI Misc Device      VenHw(77e79a1e-e1fb-491f-a7c1-fa1b5412532a){auto_created_boot_option}
Boot0004* UEFI Misc Device 2    VenHw(1c54c333-24ff-4506-a9d6-0a624e09ae7e){auto_created_boot_option}
Boot0005* UEFI Misc Device 3    VenHw(8f1c1ac6-fbc0-4dab-a8be-b412a13c8b45){auto_created_boot_option}
Boot0006* UEFI Misc Device 4    VenHw(7517821f-d9e1-44c1-a75a-d054cef3f8f8){auto_created_boot_option}
Boot0007* UEFI Misc Device 5    VenHw(e0ba9b98-dd2d-4434-bb94-599cc9e4305d){auto_created_boot_option}
Boot0008* Fedora        HD(1,GPT,fe561df9-de7c-49a6-98a7-dcab5c6d27e9,0x800,0x12c000)/\EFI\fedora\shimx64.efi
Boot2001* EFI USB Device        RC
Boot2002* EFI DVD/CDROM RC
Boot2003* EFI Network   RC
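To pick the vendor-defined entries out of a listing like the one above without reading it by eye, a small parser is enough. This is only a sketch: the function name `vendor_entries` and the regular expressions are illustrative, and they assume the `BootXXXX* label path` line layout shown in this output, which may differ between efibootmgr versions.

```python
import re

# Matches lines such as "Boot0003* UEFI Misc Device      VenHw(...)...".
# The "*" marks an active entry and is absent on inactive ones.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")
VENHW_RE = re.compile(r"VenHw\(([0-9a-fA-F-]{36})")


def vendor_entries(efibootmgr_output):
    """Return (boot_number, guid) pairs for entries whose path contains VenHw(...)."""
    found = []
    for line in efibootmgr_output.splitlines():
        entry = ENTRY_RE.match(line.strip())
        if not entry:
            continue  # BootCurrent, Timeout, BootOrder, blank lines
        guid = VENHW_RE.search(entry.group(3))
        if guid:
            found.append((entry.group(1), guid.group(1).lower()))
    return found


# Shortened copy of the output above, used as sample input.
sample = """\
BootCurrent: 0008
BootOrder: 0008,0000,2001
Boot0000* Fedora        HD(1,GPT,fe561df9-de7c-49a6-98a7-dcab5c6d27e9,0x800,0x12c000)/\\EFI\\fedora\\shim.efiRC
Boot0003* UEFI Misc Device      VenHw(77e79a1e-e1fb-491f-a7c1-fa1b5412532a){auto_created_boot_option}
Boot2001* EFI USB Device        RC
"""
print(vendor_entries(sample))
```

On a live system the input could come from running `efibootmgr` through `subprocess.run(..., capture_output=True, text=True)`, which makes it easy to compare the entries before and after a Smokeless session.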

I tried to find where those VenHw GUIDs point to but couldn't, so I went and tried them. They correspond to the Smokeless menu items.

It seems to work. The only issue is that when exiting and saving, the feedback it gives you is nonexistent or inaccurate. You have to hit F10, go back to the boot options, and select your OS, and only then does it finally ask to save. I tried a few other ways, and each time it booted normally without actually changing anything.

I have checked my boot partition and elsewhere. Smokeless does not seem to have added any binaries. My guess is that the VenHw GUIDs point to something already in the firmware. Interestingly, the dp portion of the boot entry contains some hex values that don't make much sense when translated to decimal or ASCII; maybe they are like the key combo used to get to the hidden BIOS settings on other motherboards.

I opened up some of the EFI files in Smokeless to try to figure out what they are doing, but I don't want to trace the functions manually; it would take me too much time, and that is not my strong suit. I tried because I got kind of paranoid seeing the entries added to the boot menu. I watched my outgoing traffic for most of yesterday and saw nothing out of the ordinary. Again, it also seems Smokeless didn't add any binaries.

Your last screenshot comes through very blurry on my end, so I'm not sure what the blue message in the middle of the screen is saying, unfortunately.

I would think the entries you're seeing are just artifacts of the system adding your USB media as boot entries to manage one-time booting from the USB. They are not something Smokeless-specific, nor would they actually load any changes made in Smokeless. They are, essentially, placeholders for those one-time boots. Usually, you can either ignore those, or delete them using, e.g., efibootmgr.

Unless you are seeing that they do change the settings without the Smokeless drive being inserted, in which case I'm totally wrong :smile:

Last prompt is asking me to apply changes.

It does work. My initial idea to avoid having to use a USB drive every time was to put Smokeless on a FAT32 partition and add a boot entry for it. But when I was about to do that, I saw these entries and started playing with them.

It is not the solution I wanted, but it is much better than no solution, or having to use a USB drive every time. If I'm not doing any ML work, I can forget about it. If I am, I just have to hit F12 during boot, make the changes, and I'm done.

I used the Smokeless beta, by the way.

I am always paranoid about unknown stuff running without me knowing, so I will keep testing. I am not sure whether Smokeless changes a variable that makes the menus accessible from the boot entries, but my suspicion is that, as with other BIOSes, it just sends a hex value that makes the BIOS think you entered the correct input for the hidden menu. If it is not that, I am guessing the Smokeless dev had enough insight that it generates the menus and adds the boot entries; if that is the case, I don't even know where they are stored. I have checked my boot partition, and there is nothing extra. I have gone through it with a fine-tooth comb and found nothing, short of using dd and checking the binary data manually. I wish they would just open-source it.

If my guess is correct, then the menus should be accessible just by adding the boot entries, because if we are using the same BIOS they will point to the same place.

I have found a couple of Reddit posts and a Win-Raid post (I think) where people ask about extra boot options with the same GUIDs (after using Smokeless). If not the BIOS, then what do the GUIDs point to? VenHw usually points to a specific hardware vendor ID, and according to various AIs the values in those boot entries correspond not to the NVMe drive but to a 'PCI bridge' (which could still be the NVMe drive). But it is unspecific enough that it might just point to somewhere in the firmware.

I don't know enough about EFI to know or trace where those GUIDs point. I have searched for them on my system and can't find them at all, which also has me believing they point to somewhere in the firmware.

Stable Diffusion works miles better with the extra RAM. 512x512 takes seconds instead of minutes. I can do 1080x1080 in about a minute. Granted, the tests I have done were not benchmarked and used simple prompts, but it uses about 8GB and works with no issues. I stopped using --novram. It's a similar situation with LM Studio or plain llama.cpp (although llama.cpp added support for iGPUs via a flag).


I was able to use Smokeless to successfully set 16GB UMA Frame Buffer as well! The main difference I noticed so far was being able to do some outpainting in InvokeAI, which previously always ran out of memory. I haven't got a chance to try larger models or images yet.

However, what is strange is that I'm not seeing the boot entries you're seeing, @Xal, either on the BIOS Boot Manager screen, or in the efibootmgr. Like you, I was using the Beta version, booting it from a FAT32-formatted USB. Not sure what we're doing differently. :man_shrugging:


More great news: I was able to follow the advice from here, and use force-host-alloction-APU to get StableDiffusion to dynamically allocate VRAM from the GTT, without the need to do anything with the UMA.

SDXL models worked, generating 1024x1024 images didn't run out of memory or crash, and the speed was similar to what I saw after successfully using Smokeless, maybe just a tad slower: about 1.9it/s for SD-1 512x512, and about 3.5s/it for SDXL 1024x1024 and ~1.3it/s for 512x512.
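Note that the figures above mix two units: it/s (iterations per second) and s/it (seconds per iteration), which are reciprocals of each other. A tiny sketch for putting them on the same scale follows; the helper names and the 20-step count are made up for illustration and are not settings taken from this thread.

```python
def to_it_per_s(value, unit):
    """Normalize a sampler speed to iterations per second."""
    if unit == "it/s":
        return value
    if unit == "s/it":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")


def seconds_for(steps, value, unit):
    """Rough time for a run with the given number of sampling steps."""
    return steps / to_it_per_s(value, unit)


# Speeds quoted above, all expressed in it/s.
print(round(to_it_per_s(1.9, "it/s"), 2))   # SD-1 512x512
print(round(to_it_per_s(3.5, "s/it"), 2))   # SDXL 1024x1024
print(round(to_it_per_s(1.3, "it/s"), 2))   # SDXL 512x512

# Hypothetical 20-step SDXL 1024x1024 run at 3.5 s/it.
print(round(seconds_for(20, 3.5, "s/it")))
```

This ignores model loading and VAE decoding, so real wall-clock times will be longer.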

Thanks, this is very useful. It covers most of my needs. I guess the one thing pending would be legacy apps that can't be updated to accommodate iGPUs.

Try adding them and see if something happens:

Boot0003* UEFI Misc Device      VenHw(77e79a1e-e1fb-491f-a7c1-fa1b5412532a){auto_created_boot_option}
      dp: 01 04 14 00 1e 9a e7 77 fb e1 1f 49 a7 c1 fa 1b 54 12 53 2a / 7f ff 04 00
    data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
Boot0004* UEFI Misc Device 2    VenHw(1c54c333-24ff-4506-a9d6-0a624e09ae7e){auto_created_boot_option}
      dp: 01 04 14 00 33 c3 54 1c ff 24 06 45 a9 d6 0a 62 4e 09 ae 7e / 7f ff 04 00
    data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
Boot0005* UEFI Misc Device 3    VenHw(8f1c1ac6-fbc0-4dab-a8be-b412a13c8b45){auto_created_boot_option}
      dp: 01 04 14 00 c6 1a 1c 8f c0 fb ab 4d a8 be b4 12 a1 3c 8b 45 / 7f ff 04 00
    data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
Boot0006* UEFI Misc Device 4    VenHw(7517821f-d9e1-44c1-a75a-d054cef3f8f8){auto_created_boot_option}
      dp: 01 04 14 00 1f 82 17 75 e1 d9 c1 44 a7 5a d0 54 ce f3 f8 f8 / 7f ff 04 00
    data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
Boot0007* UEFI Misc Device 5    VenHw(e0ba9b98-dd2d-4434-bb94-599cc9e4305d){auto_created_boot_option}
      dp: 01 04 14 00 98 9b ba e0 2d dd 34 44 bb 94 59 9c c9 e4 30 5d / 7f ff 04 00
    data: 4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2
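For what it's worth, the dp bytes in this listing can be decoded mechanically. Assuming the standard UEFI device path node layout (one byte type, one byte subtype, a two-byte little-endian length, then the payload), each dp line is a hardware/vendor node that carries the same GUID shown in VenHw(...), followed by an end-of-path node. If that reading is right, the dp bytes are not a separate key combo. The function name below is illustrative.

```python
import uuid


def decode_vendor_device_path(hex_bytes):
    """Decode a single vendor device path node as printed on the 'dp:' lines."""
    raw = bytes.fromhex(hex_bytes.replace("/", " "))
    node_type, subtype = raw[0], raw[1]
    length = int.from_bytes(raw[2:4], "little")
    # EFI GUIDs are stored mixed-endian; uuid's bytes_le handles that layout.
    guid = uuid.UUID(bytes_le=raw[4:20])
    end_node = raw[length:length + 4]
    return node_type, subtype, length, str(guid), end_node.hex()


dp = "01 04 14 00 1e 9a e7 77 fb e1 1f 49 a7 c1 fa 1b 54 12 53 2a / 7f ff 04 00"
print(decode_vendor_device_path(dp))

# The 16 'data:' bytes are identical on every entry and also decode as a GUID.
data = "4e ac 08 81 11 9f 59 4d 85 0e e2 1a 52 2c 59 b2"
print(uuid.UUID(bytes_le=bytes.fromhex(data)))
```

For Boot0003 this yields type 1, subtype 4, length 20, and the GUID 77e79a1e-e1fb-491f-a7c1-fa1b5412532a. The data bytes decode to 8108ac4e-9f11-4d59-850e-e21a522c59b2, which is presumably the value efibootmgr renders as {auto_created_boot_option}; that last part is an assumption, not something confirmed in this thread.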

I booted from Smokeless a few times in a row, and I also spent a few minutes going through the menus. Maybe that was enough time for it to add the entries? I will check my history to link the post where some people saw the same behavior I did.

Adding the entries should work if my assumptions hold: that there are no binaries, and that "data" is the combo for the hidden menu (like on other motherboards).

Did you get a libstdc++ error at some point when compiling? Also, did you ever see a malloc thingy getting ignored by Python?

No. The only thing I needed to compile was forcegttalloc.c, I compiled it with hipcc following the instructions on the GitHub page, and it compiled without any errors or warnings.

However, I did get a libstdc++ loading error at first, when I tried running InvokeAI through the provided invoke.sh script with the LD_PRELOAD variable set:

📦[user@rocm invokeai]$ LD_PRELOAD=../force-host-alloction-APU/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=11.0.0 ./invoke.sh
dirname: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

Instead of troubleshooting the path references in the script, I just activated the venv manually, and executed invokeai-web directly, like this:

(.venv) 📦[user@rocm invokeai]$ LD_PRELOAD=../force-host-alloction-APU/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=11.0.0 invokeai-web

It managed to find the libraries just fine then.
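One possible reason the direct launch behaves differently: LD_PRELOAD is inherited by every child process, so helper commands started by a wrapper script (such as the dirname call in the error above) also try to load the library and its dependencies. A minimal launcher sketch that sets the two variables only for the final program is shown here. The helper names `preload_env` and `launch` are made up for illustration, and the default 11.0.0 override is simply the value used in this thread.

```python
import os
import subprocess


def preload_env(lib_path, gfx_version="11.0.0", base_env=None):
    """Copy the environment and add the two variables used in the commands above."""
    env = dict(os.environ if base_env is None else base_env)
    # An absolute path keeps working even if the launched program changes directory.
    env["LD_PRELOAD"] = os.path.abspath(lib_path)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def launch(command, lib_path):
    """Run the command with the preload library, failing early if it is missing."""
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(lib_path)
    return subprocess.run(command, env=preload_env(lib_path), check=False)
```

With the venv already activated, calling `launch(["invokeai-web"], "../force-host-alloction-APU/libforcegttalloc.so")` would be roughly equivalent to the second command line above.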

Not sure what you mean. I only tried this with InvokeAI so far, not directly from Python, and didn't get any memory-related errors (that I noticed).

I can't get it to compile within the virtual environment. With sudo, it can't find libstdc++, even when I pass the directory with -L. Without sudo, it gives me a permission error that won't go away, even when using a temp folder or changing the permissions to allow everyone.

I can compile outside the environment, but of course that is no good. When I put the result in the environment, it gets ignored, as it can't find the libraries in the right place.

I'm sure I'm drowning in a puddle. I guess I'm too tired; I will retry everything in the next few days.

Are you using a container (anything else aside from the Python venv)?

Yes. I'm using distrobox, which I set up mostly following the guide I mentioned here. I didn't need to add anything to the system to compile the module; I just got the source from the force-host-alloction-APU repo and used the hipcc already installed in the distrobox.

I don't think you need sudo for anything other than the initial installation of ROCm or other system-wide packages. I didn't use sudo for anything at all this time around, since I already had the distrobox set up a few months ago, when I first experimented with InvokeAI on the Framework.

Finally managed it. For whatever reason the linker was not finding the library, so I ended up just creating a symlink to trick it. This solution seems to work better than the Smokeless UMA solution. Even with --highvram (UMA) it kept crashing: as it moved models in and out of memory, it caused the iGPU to reset when VRAM got low (depending on what I was doing), which made the whole thing get stuck.

Yes, I'm pretty happy with it too, so far :slightly_smiling_face:. Really thankful to Carlos Segura for putting the memory allocation module together!

Speaking of crashing, I did have the entire Hyprland session crash on me a few times, as I was running generations on different models and resolutions. But, it well could be just the general instability of the other components involved. In my experience, StableDiffusion implementations tend to be not particularly stable, as a rule...

Even more good news! Automatic native VRAM allocation from GTT in the AMDKFD driver (which is used by ROCm, etc.) on APUs was pushed to the upcoming Linux 6.10, and is already present in 6.10-rc1!

Phoronix reported on it here: https://www.phoronix.com/news/Linux-6.10-AMDKFD-Small-APUs.

I just tried 6.10-rc1, and was able to run SDXL generations out of the box, with no changes to the tool set, or the dynamic library loading tricks mentioned above!
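To check whether a given machine is already on a kernel with this change, comparing the release string against 6.10 is a reasonable first test. This is only a sketch: the function name is illustrative, and a plain version comparison cannot tell whether a distribution backported the patch to an older kernel.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """Compare a kernel release string such as '6.10.3-200.fc40.x86_64' to major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


# Release strings in the usual Linux format; the first one is a made-up example.
for release in ("6.9.12-200.fc40.x86_64", "6.10.0-rc1", "6.10.3-200.fc40.x86_64"):
    print(release, kernel_at_least(release, 6, 10))
```

On a running Linux system, `kernel_at_least(platform.release(), 6, 10)` gives the answer for the booted kernel.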

I'm glad to read this. Thanks for pointing it out.

On SD crashing: yes, some of the components cause that. I have also observed that it will crash if Python thinks there is not enough VRAM (even with the workarounds). Although the RAM used with the module is the system's, the whole thing will just crash and reset the graphics and the session if there is not "enough" VRAM. The most stable setup I have been able to run is with the UMA set to 16GB plus the allocation module. I once ran a batch of 100+ 1024x1024 images this way.

Batch might not be the right word, as I queued a bunch of images. I've been trying to get some art concepts done as I am getting into game dev again.

I can confirm it works! With Fedora 40, which now has kernel 6.10.3, it works out of the box.

If needed, you can change the default GTT size with a kernel parameter (the default is 1/2 of RAM):

# for 16 GB (the value is in MiB):
amdgpu.gttsize=16384
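The parameter takes the size in MiB, so 16 GiB becomes 16 * 1024 = 16384. A short sketch for building the parameter and for reading back what the driver currently reports follows. It assumes the amdgpu sysfs file `mem_info_gtt_total` (reported in bytes) is present on your kernel, and the function names are illustrative.

```python
import glob


def gttsize_param(gib):
    """Build the kernel parameter for a GTT size given in GiB (the value is in MiB)."""
    return f"amdgpu.gttsize={int(gib * 1024)}"


def current_gtt_totals():
    """Return {sysfs path: GTT size in MiB} for every amdgpu device that exposes it."""
    totals = {}
    for path in glob.glob("/sys/class/drm/card*/device/mem_info_gtt_total"):
        with open(path) as handle:
            totals[path] = int(handle.read()) / 2**20
    return totals


print(gttsize_param(16))
print(current_gtt_totals())  # empty dict on machines without an amdgpu device
```

On Fedora, one way to make the parameter persistent is `sudo grubby --update-kernel=ALL --args="amdgpu.gttsize=16384"`, followed by a reboot; comparing `current_gtt_totals()` before and after shows whether it took effect.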

How have you been running this? I've tried ComfyUI and stable-diffusion-webui off and on over the last few months, and have never got it to generate more than one image before crashing my entire session, even using kernel 6.10.7 and ROCm 6.1.

Without specifying HSA_OVERRIDE_GFX_VERSION, I get "HIP error: invalid device function." If I specify HSA_OVERRIDE_GFX_VERSION=11.0.0, it will rarely generate an image but will usually flash the screen black and crash the whole desktop session, needing a restart. Looking through github issues, I thought it was just that there was no combination of kernel / pytorch / ROCm that would work on AMD APUs, but it sounds like it's working for some people?