[RESPONDED] Hard resets running VMs on AMD 7640U

@jwp can you explain, or link some documentation on, how I can expose those features one by one? I’m not sure how to do that. Basically, using kvm64 I’m losing the AMD virtualization feature (SVM), which makes me unable to boot a Linux VM and then boot Windows inside that Linux VM, for instance.

EDIT:
Running virsh capabilities showed me most of the flags, I think, but I’m still not sure where I should insert them in the XML. Most likely inside <cpu>....</cpu>

Here is the list:

      <feature name='ht'/>
      <feature name='monitor'/>
      <feature name='x2apic'/>
      <feature name='osxsave'/>
      <feature name='erms'/>
      <feature name='invpcid'/>
      <feature name='cmt'/>
      <feature name='avx512f'/>
      <feature name='avx512dq'/>
      <feature name='avx512ifma'/>
      <feature name='avx512cd'/>
      <feature name='avx512bw'/>
      <feature name='avx512vl'/>
      <feature name='avx512vbmi'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='avx512vbmi2'/>
      <feature name='gfni'/>
      <feature name='vaes'/>
      <feature name='vpclmulqdq'/>
      <feature name='avx512vnni'/>
      <feature name='avx512bitalg'/>
      <feature name='avx512-vpopcntdq'/>
      <feature name='flush-l1d'/>
      <feature name='avx512-bf16'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='cmp_legacy'/>
      <feature name='extapic'/>
      <feature name='ibs'/>
      <feature name='skinit'/>
      <feature name='wdt'/>
      <feature name='tce'/>
      <feature name='topoext'/>
      <feature name='perfctr_nb'/>
      <feature name='invtsc'/>
      <feature name='ibrs'/>
      <feature name='stibp-always-on'/>
      <feature name='amd-ssbd'/>
      <feature name='amd-psfd'/>
      <feature name='lbrv'/>
      <feature name='svm-lock'/>
      <feature name='tsc-scale'/>
      <feature name='vmcb-clean'/>
      <feature name='flushbyasid'/>
      <feature name='decodeassists'/>
      <feature name='pause-filter'/>
      <feature name='pfthreshold'/>
      <feature name='v-vmsave-vmload'/>
      <feature name='vgif'/>
      <feature name='vnmi'/>
      <feature name='svme-addr-chk'/>
      <feature name='no-nested-data-bp'/>
      <feature name='lfence-always-serializing'/>
      <feature name='null-sel-clr-base'/>
      <feature name='auto-ibrs'/>
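
For what it’s worth, the entries above are in the host-capabilities form (name only), while features inside a domain <cpu> block usually carry a policy attribute as well. A minimal sketch of converting one form into the other, assuming the input lines look exactly like the ones in this list; to_domain_features is a made-up helper name, not a libvirt tool, and requiring every host flag on top of a basic model is not guaranteed to be accepted:

```shell
#!/bin/bash
# Convert host-capability feature lines (<feature name='x'/>) into
# domain-XML feature lines with an explicit policy attribute.
# to_domain_features is a hypothetical helper; the policy defaults to "require".
to_domain_features() {
  local policy="${1:-require}"
  sed -n "s|.*<feature name='\([^']*\)'/>.*|    <feature policy=\"${policy}\" name=\"\1\"/>|p"
}

# Example: feed it two lines in the format shown above.
printf '%s\n' "<feature name='svm-lock'/>" "<feature name='vgif'/>" | to_domain_features require
```

In practice the input would be the saved output of virsh capabilities piped through the function; lines that are not feature entries are simply skipped.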

EDIT 2: I think I found how to expose features: check the flags in lscpu, then add the ones you wish to expose inside <cpu>:

  <cpu mode="custom" match="exact" check="none">
    <model fallback="allow">kvm64</model>
    <feature policy="require" name="ibpb"/>
    <feature policy="require" name="spec-ctrl"/>
    <feature policy="require" name="ssbd"/>
    <feature policy="require" name="virt-ssbd"/>
    <feature policy="require" name="svm"/>
    <feature policy="require" name="svm-lock"/>
  </cpu>
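
To check which flags a guest actually ends up with, one option is to dump the flag list on the host and inside the guest and compare the two. A minimal sketch, assuming a Linux guest with /proc/cpuinfo; list_cpu_flags is a made-up helper name. Note that, as far as I know, the kernel spells some flags differently from libvirt (for example svm_lock in /proc/cpuinfo versus svm-lock in the XML), so names may need translating by hand:

```shell
#!/bin/bash
# List the CPU flags the kernel reports, one per line, sorted.
# list_cpu_flags is a hypothetical helper; it reads a cpuinfo-style file
# (default: /proc/cpuinfo) and uses only the first "flags" line.
list_cpu_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -m1 '^flags' "$cpuinfo" | cut -d: -f2 | tr -s ' \t' '\n' | sed '/^$/d' | sort
}

# Usage, run once on the host and once inside the guest:
#   list_cpu_flags > host-flags.txt       (on the host)
#   list_cpu_flags > guest-flags.txt      (inside the guest)
#   comm -23 host-flags.txt guest-flags.txt   (flags the guest is missing)
```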

Yup, you figured it out.

Related: nesting VMs with hardware acceleration across disparate OSes/hypervisors can be super fragile. Nesting is generally only supported on specifically tested combinations, and even then usually only with the same hypervisor type at both levels.

Well, it may be broken occasionally, but it should never reset the host. It would be interesting to find out whether this happens on Windows as well. This could help to narrow the problem down to a hypervisor issue or a CPU issue.

So far, testing kvm64 with a lot of features exposed, the nested Windows VM did not boot past the TianoCore firmware screen.

After changing the CPU to qemu64, it is working fine for now (if I hit a reset, I will update here). So far I haven’t had a reset on my host, whereas previously, exposing the whole CPU in passthrough mode did reset my host.

I will most likely try to narrow down the feature flags list to what is explicitly needed, plus the security flags.
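
One way to do that narrowing is a manual bisection: expose only half of the candidate flags, run the VM long enough to be reasonably sure whether the reset still happens, then repeat with whichever half is still suspect. A minimal sketch, assuming a single flag (or a small group) is responsible, which may not be the case; split_flags is a made-up helper name and the flag names are just examples from this thread:

```shell
#!/bin/bash
# split_flags is a hypothetical helper: print the first or second half of
# a list of flag names, one per line, for a manual bisection.
split_flags() {
  local which="$1"; shift
  local mid=$(( ($# + 1) / 2 ))   # the first half gets the extra flag on odd counts
  if [ "$which" = first ]; then
    [ "$mid" -gt 0 ] && printf '%s\n' "${@:1:mid}"
  else
    [ "$#" -gt "$mid" ] && printf '%s\n' "${@:mid+1}"
  fi
  return 0
}

# Example: the first half of four candidate flags.
split_flags first svm svm-lock vgif vnmi
```

Since the resets are random, each round probably needs a long soak before a half can be called clean.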

If you can narrow it down, this would be super valuable to figure out what’s going on! :crossed_fingers: Thank you for putting so much effort into this!

I frequently encounter a hard reset of the host while using the Windows 11 virtual machine.

CPU: 7840U
Host UEFI: 3.05
Distro: Gentoo Linux (23.0 hardened profile)
Kernel: 6.6/6.9/6.10/6.11 (dist-kernel)
RAM: 16GiB*2 DDR5-5600 (JEDEC)

#!/bin/bash
qemu-system-x86_64 \
-machine q35 \
-cpu host \
-smp cores=4 \
-accel kvm \
-m 12G \
-rtc base=localtime,clock=host \
-display vnc=127.0.0.3:1 \
-device virtio-vga,xres=2256,yres=1504 \
-drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
-drive if=pflash,format=raw,file=OVMF_VARS.fd \
-device virtio-balloon-pci \
-object rng-random,filename=/dev/random,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-blk-pci,drive=win11-disk,discard=on,physical_block_size=4096,logical_block_size=4096 \
-drive file=win11-disk.qcow2,if=none,id=win11-disk \
-nic user,model=virtio-net-pci \
-device qemu-xhci \
-device usb-tablet,bus=usb-bus.0
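
Since qemu64 was reported stable earlier in this thread, the plain-QEMU counterpart of the libvirt <cpu> block would be to replace -cpu host with a named model plus individual features. A minimal sketch, assuming QEMU’s model,feature=on syntax for -cpu; build_cpu_arg is a made-up helper name and the feature picked here is only an example, not a tested recommendation:

```shell
#!/bin/bash
# build_cpu_arg is a hypothetical helper: join a base CPU model and a list
# of feature names into a single value for QEMU's -cpu option.
build_cpu_arg() {
  local arg="$1"; shift
  local feat
  for feat in "$@"; do
    arg+=",${feat}=on"
  done
  printf '%s\n' "$arg"
}

# Example: base model qemu64 with SVM exposed for nested virtualization.
build_cpu_arg qemu64 svm   # prints: qemu64,svm=on
```

The result would go in place of host in the script, for example -cpu "$(build_cpu_arg qemu64 svm)".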

I’m not alone. This is extremely frustrating for those who frequently use virtual machines, and the issue has existed for nearly a year without being fixed.

https://www.reddit.com/r/framework/comments/192cm4a/qemu_win11_guest_recommendationsinstability_fw13/

I’ve been facing the same issue for a while now. I could narrow it down this far: it has nothing to do with the VM itself (it runs great on other machines), and probably not much to do with my system either, so I came here!

Host: FW 13 AMD 7840U, 6.10.13-3
VM: Win11 on qemu/libvirt

Random crashes while using the VM, in which the whole host reboots. No logs whatsoever. Also, no observable pattern as to when it happens.

I have an OSX VM that runs with CPU: qemu64, and it doesn’t have these issues. So CPU features are probably the right path to investigate.

There might be a fix here:

A temporary workaround is documented here:
https://bugzilla.kernel.org/show_bug.cgi?id=219009

This is so amazing! Will test this ASAP. It definitely felt like a hardware issue and now we know.

Also, another thing to try is

ectool panicinfo

See if that has any info when the problem happens.