I had some success with ROCm nightly and ComfyUI in a custom podman container. Be aware that a kernel bug (fixed in 6.18) can cause random crashes under heavy GPU load. Judging by your homepage, my installation is similar to yours, just isolated from the host system.
Containerfile
FROM python
RUN pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx1151/
RUN git clone https://github.com/comfyanonymous/ComfyUI
WORKDIR /ComfyUI
RUN pip install -r requirements.txt
EXPOSE 80
ENTRYPOINT ["python", "main.py", "--listen", "0.0.0.0", "--port", "80"]
justfile
image := "comfyui"
port := "8188"
date := `date --iso-8601`
build:
    podman build -t {{image}}:{{date}} .

rebuild:
    podman build -t {{image}}:{{date}} --no-cache .

run:
    podman run -it --rm -p {{port}}:80 -v "/path/to/project-files:/ComfyUI/user/default" -v "/path/to/models:/ComfyUI/models:ro" -v "/path/to/output:/ComfyUI/output" --device=/dev/kfd --device=/dev/dri {{image}}
I’m using “just” here, but batch files would also do. My workflow right now is to keep date-tagged container images that I update manually; if there is a problem, I just re-tag and run a previous image. The next step might be to integrate ComfyUI Manager, because many workflows require custom nodes/plugins. That’s also why I prefer a container without a shared home folder.
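The re-tag step could look roughly like the helper below. This is only a sketch: the function name `rollback` and the example date are made up for illustration, and it assumes the `run` recipe resolves the untagged image name to `comfyui:latest`, which is podman's default for a name without a tag.

```shell
# Hypothetical helper: point comfyui:latest at an older, date-tagged build.
# Usage: rollback 2025-01-31   (placeholder date, use one of your own tags)
rollback() {
    # "podman tag SOURCE TARGET" adds a second name to an existing image;
    # nothing is rebuilt or deleted.
    podman tag "comfyui:$1" comfyui:latest
}
```

`podman images comfyui` lists the date tags that are available; after `rollback <date>`, the `run` recipe starts that build again.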
I run with a fixed BIOS VRAM setting of 512 MB and these kernel parameters (128 GB model):
amd_iommu=off ttm.pages_limit=33554432
The pages_limit value (33554432 pages of 4 KiB each, i.e. 128 GiB) makes essentially all RAM available as VRAM on demand. So far, all ROCm and Vulkan programs (including games) work as expected.
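For other memory sizes, the arithmetic behind that number can be sketched as follows. It assumes 4 KiB pages (the x86-64 default), and the function name `ttm_pages_limit` is just a label chosen for this example.

```shell
# Sketch: derive a ttm.pages_limit value from a memory size in GiB.
# Assumes 4 KiB pages; ttm_pages_limit is a hypothetical helper name.
ttm_pages_limit() {
    gib=$1
    page_size=4096
    # bytes of memory divided by bytes per page = number of pages
    echo $(( gib * 1024 * 1024 * 1024 / page_size ))
}

ttm_pages_limit 128   # prints 33554432
```

After a reboot, `cat /proc/cmdline` shows whether the parameter is actually on the running kernel's command line.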