It’s not just this specific implementation.
The whole concept of what ReBAR is used for is extremely latency sensitive. Because it’s all about direct vs. indirect data transfers, which is just a giant trade off.
For this you essentially always need a heuristic when it’s worth it to use the direct transfers that ReBAR enables for classic GPUs. This heuristic is typically in the driver and can easily be wrong if the PCIe bandwidth and latency Is not what is expected for a specific GPU model.
And the less CPU time is available (on the core initiating the transfer), the more any direct transfers will hurt. And the only benefit direct transfers provide is less latency. If the discrepancy between direct and indirect is negligible compared to the transfer latency of the PCIe bus itself, there are no benefits to be gained, only penalties.
Sadly, we do not have measurements for eGPU (better USB4 PCIe) latencies, so we are still learning where that line is.