[TRACKING] HDMI screen black after wake from hibernation, only on left front port

suliblian · May 13, 2023, 12:22pm

Framework 11th Gen Core i7-1165G7, running Manjaro. [edit: BIOS GFW30.03.17]

When the HDMI module sits in the front left port, the HDMI external screen remains black after waking from hibernation, or after boot. This happens irregularly (or I didn’t see the secret correlation to … whatever).

The mouse pointer can still move onto it, and new app windows that open there (invisibly) can be moved to the internal screen with keyboard shortcuts.

Display settings allow a max resolution of 1024x768 (it is capable of 1920x1080).
xrandr does not detect any higher resolution either. The gap between screens may be due to the external display’s smaller size while the internal screen remained at its position.
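For anyone who wants to compare what each output advertises, here is a minimal sketch that condenses `xrandr --query` into one line per connected output with its largest listed mode. The function name `list_max_modes` is made up for illustration, and the parsing assumes the usual `xrandr` text layout (an output line containing ` connected`, followed by indented mode lines).

```sh
# Hypothetical helper: reads `xrandr --query` output on stdin and prints
# "<output> <largest listed mode>" for every connected output.
list_max_modes() {
    awk '
        / connected/ { out = $1; best[out] = ""; area[out] = 0; order[++n] = out; next }
        / disconnected/ || /^Screen/ { out = ""; next }
        out != "" && $1 ~ /^[0-9]+x[0-9]+/ {
            # Compare modes by pixel area (width * height).
            split($1, wh, "x")
            a = wh[1] * wh[2]
            if (a > area[out]) { area[out] = a; best[out] = $1 }
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], (best[order[i]] == "" ? "none" : best[order[i]])
        }
    '
}

# Usage on a running X session:
#   xrandr --query | list_max_modes
```

Running this once with the card in the front left port and once in another port gives a quick side-by-side of the modes the monitor reports in each case.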

Going to another TTY (alt-F2) displays the command line only on the internal screen. startx after logging in there resulted in the icons that are usual on the external screen now being shown on the internal, but the task bar is not moved with them. Desktop and Plasma have different ideas of what is active???

Unplug – replug does not help. HDMI remains black.

After a mainboard reset the external screen temporarily allowed up to 1280x1024, but there was no other change.

Tested with different cables, two external monitors, one HDMI module and on all ports. Only the left front port (or the combination of this port with this module) shows that behavior.

Because it’s only one port, I doubt the OS is the culprit.

Now the laptop is about a year old and has not been used in the first half of that time. I’d claim a warranty case, but unfortunately I have applied the 11Gen rework*, so that is gone. I can sorta live with the status quo, but it feels so … unsatisfying.

*problem appeared before

Any suggestions what to do?

David_Alexander · May 17, 2023, 5:01am

I get very similar behavior also on manjaro. Randomly happens after sleeping or unplugging the monitor. HDMI screen reports (via xrandr) incorrect non-standard resolutions that vary between restarts. For me it happens in multiple ports. The only way I can get it working again is randomly putting the expansion card in different slots and restarting. Often it takes 5+ iterations of this and I can’t find any sort of pattern for when it decides to work again. Very frustrating. Owned my laptop for over a year and only started happening in the last few months.

Matt_Hartley · May 18, 2023, 5:23pm

It very likely is the OS, as it has been reported by others on Manjaro. That said, if you would prefer to stick with Manjaro vs Fedora, we can try to drill this down. To be clear, we vet distros as outlined here - officially supported distros are vetted on a bi-weekly basis to test updates, community distros are only tested occasionally to make sure the basics work.

Verify behavior on Fedora 38 (GNOME) live USB. If it happens there as well, please report back what happened.
I prefer to avoid using any creative boot parameters here as the Live USB of Fedora will tell us pretty quickly if we have a module issue or if this is OS/driver related.

suliblian · May 21, 2023, 9:04am

TL;DR: ~~Faint hope a kernel update has helped.~~ Fedora showed no problems, but does not hibernate.

Thanks for the readiness to help solve this. The testing with Fedora showed no issue, so you were right re OS causing it. I am reluctant to hop distros and would prefer to resolve the Manjaro issue.

Tests with Fedora

Fedora 38 live (booted from image via ventoy)
Fedora 38 install to external usb ssd (clean, except locale, time zone, keyboard)

Several rounds of suspension or reboot with both – no problems re black screen.
However, Fedora ditched hibernation a while ago and on Manjaro I am relying on it heavily. Hibernation may be a starting point to look deeper into.
@David_Alexander : have you by any chance observed if the black screen appeared only after suspension/sleep, hibernation, cold boot, reboot?

Then I read up (and groped in the dark, I am truly out of my depth here) about drivers:

$ inxi -G | grep driver
Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.1 driver: X:

$ inxi -G | grep i915
Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
loaded: modesetting dri: iris gpu: i915 resolution: 1: 1920x1080

$ inxi -G | grep X
Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.1 driver: X:
API: OpenGL v: 4.6 Mesa 23.0.3 renderer: Mesa Intel Xe Graphics (TGL GT2)

$ lsmod | grep i915
i915                 3211264  109
drm_buddy              20480  1 i915
ttm                    94208  1 i915
drm_display_helper    184320  1 i915
cec                    81920  2 drm_display_helper,i915
intel_gtt              28672  1 i915
video                  65536  1 i915

$ systool -m i915 -av
Module = "i915"

Attributes:
coresize            = "3211264"
initsize            = "0"
initstate           = "live"
refcnt              = "112"
srcversion          = "A72273723CFFA38D973B02D"
taint               = ""
uevent              = <store method only>

Parameters:

Sections:

$ modinfo i915
modinfo: ERROR: Module i915 not found.
# # # => ???? # # #

$ mhwd
> 0000:00:02.0 (0300:8086:9a49) Display controller Intel Corporation:
--------------------------------------------------------------------------------
NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-linux            2018.05.04                true            PCI
video-modesetting            2020.01.13                true            PCI
video-vesa            2017.03.12                true            PCI

$ mhwd -li --pci --usb
> Installed PCI configs:
--------------------------------------------------------------------------------
NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-linux            2018.05.04                true            PCI

UPD
Warning: No installed USB configs!

# mhwd-gpu
:: status
warning: could not find '/etc/X11/xorg.conf.d/90-mhwd.conf'!

Kernel parameters look harmless to me (the “ibt=off” can go, I guess):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash cryptdevice=UUID=:... root=/dev/mapper/luks-... resume=/dev/mapper/luks-... udev.log_priority=3 ibt=off lsm=landlock,lockdown,yama,safesetid,integrity,apparmor,bpf"

~~Then I updated to the most recent kernel; from 6.1.26 to 6.1.29. No black screen in the short time so far, but it had happened irregularly – it’s too early for a sigh of relief.~~
Updating to Kernel 6.1.29 did not help.
~~@David_Alexander : Could you see if your situation improves with this kernel update?~~

Matt_Hartley · May 22, 2023, 6:57pm

Fedora is best with suspend. Fedora and I both do not support hibernate officially. That said, you can dig into the power handling for HDMI coming out of hibernate.

Using suspend, I have seen success (other computers) with usbcore.autosuspend=-1 as a boot parameter, but that is for suspend to ram. I have not explored this using hibernation (to disk). This of course, for HDMI expansion cards or other USB based docks, etc.
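To confirm whether a parameter such as usbcore.autosuspend actually made it onto the kernel command line after editing GRUB, something like the sketch below may help. The helper name `cmdline_param` is hypothetical; it only splits a command line on whitespace and looks for `name=value`.

```sh
# Hypothetical helper: print the value of a kernel command-line parameter.
#   $1 = parameter name
#   $2 = command line to inspect (defaults to the running kernel's /proc/cmdline)
cmdline_param() {
    name=$1
    line=${2-$(cat /proc/cmdline)}
    for word in $line; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1   # parameter not present
}

# Usage after a reboot:
#   cmdline_param usbcore.autosuspend || echo "not set on the command line"
```

The value the kernel is currently using can also be read from /sys/module/usbcore/parameters/autosuspend, which shows -1 when autosuspend is disabled globally.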

suliblian · May 23, 2023, 11:43am

I’ll look into autosuspend. Was not on my radar because the black screen appeared irregularly, and the resolutions available were changed.

Another idea, if power is involved: Maybe other usb power sinks lowered voltage too much?

Matt_Hartley · May 23, 2023, 5:30pm

Good plan, it’s something I have used from time to time.

Difficult to say, depends on what is connected perhaps.

suliblian · June 20, 2023, 5:33pm

This proved to be temporary – possibly a coincidence with other changes. But which?

Solved for me.
(I use only the left front port for HDMI – if you find it doesn’t work elsewhere, give us a heads-up!)

~~It’s a rogue autosuspend.~~

~~Autosuspend is in the kernel, and apparently it doesn’t have the decency to leave the HDMI card alone.~~

~~It’s easy to correct, though:~~
~~1. Install TLP and deactivate everything in it if you use other tools (powertop) and want to leave them alone. If you don’t you needn’t.~~
~~2. Activate autosuspend, and make sure that the HDMI card (32ac:0002) is active in the autosuspend denylist.~~

~~Arch users will choose the hard way.~~

~~Its upstream USB root hub (1d6b:0002 for left front) ~~can be suspended, I didn’t see any bad effects.~~ may be suspended too, if nothing else is downstream from it.~~

~~edit: how can I add “[SOLVED]” to the topic?~~

Fraoch · June 20, 2023, 6:32pm

It’s up to @Matt_Hartley or @Loell_Framework to do this, so they can stop slaving away at it trying to find a solution.

Loell_Framework · June 21, 2023, 2:08pm

Hi @suliblian , great to know your issue is resolved, thanks @Fraoch for flagging, have changed it to SOLVED.

suliblian · June 21, 2023, 3:46pm

Thanks!

suliblian · July 5, 2023, 9:14am

It came back. Left front port only.

Black screen and bad display settings as in OP survive warm and cold boots. All USB devices on the TLP USB autosuspend list are denied autosuspend. Only relocating the card into a different port allows resetting to the desired state.
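One way to double-check that autosuspend really is off for a given device, independent of TLP, is to read the kernel's per-device power/control attribute in sysfs ("on" means the device is kept awake, "auto" means runtime autosuspend is allowed). A rough sketch, with the function name `usb_power_control` invented for illustration:

```sh
# Hypothetical helper: list "vendor:product control" for every USB device.
#   $1 = sysfs directory to scan (defaults to /sys/bus/usb/devices)
usb_power_control() {
    root=${1:-/sys/bus/usb/devices}
    for dev in "$root"/*; do
        # Interface entries (e.g. 1-1:1.0) have no idVendor file; skip them.
        [ -r "$dev/idVendor" ] && [ -r "$dev/power/control" ] || continue
        printf '%s:%s %s\n' \
            "$(cat "$dev/idVendor")" \
            "$(cat "$dev/idProduct")" \
            "$(cat "$dev/power/control")"
    done
}

# Usage: show the HDMI card (32ac:0002) and the root hubs (1d6b:...)
#   usb_power_control | grep -E '^(32ac:0002|1d6b:)'
```

If the HDMI card shows "on" here and the screen still stays black, autosuspend of the card itself is an unlikely cause.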

Mod: please remove “[SOLVED]” in title. Thanks!

Matt_Hartley · July 7, 2023, 12:44am

At this stage, it’s worth creating a support ticket and linking to this thread. They’ll ask for logs and so forth.

suliblian · July 7, 2023, 10:44am

Will do. Thanks.

suliblian · August 17, 2023, 8:49pm

After a long back-and-forth with support, and lots of tests and tries (different kernels, various distro ISOs and installations, comparing with other ports – these things multiply), I have given up now.

The front left port is definitely a weak spot in this design.

And now it acts up in Windows:

EDIT: I also have the DIY edition.

Scott_H · August 17, 2023, 9:20pm

I agree with you that there’s some suspicious behavior from the front left USB c port on the motherboard. I have external display flickering issues when the FW HDMI adapter is in this front left port. I also have issues with a USB C to HDMI cable when using this port. The other three ports behave better.

Matt_Hartley · August 18, 2023, 5:18pm

@Scott_H reviewing your ticket history, the latest looks like Support has provided a solution going forward.

@suliblian I have also reviewed your ticket and noted that this appears to be Manjaro related and not affected with installed Fedora 38 or a Fedora 38 live ISO, and in Manjaro live ISO.

Therefore, Scott is on his way to a working solution and suliblian’s issue appears based on data provided to be related to the state of the Manjaro install. Per the notes I am reading and please correct me if I am mis-reading the cx notes here:

“The black screen issue does not persist when using an installed Fedora 38 or a Fedora 38 live ISO, and in Manjaro live ISO.”

I am merely setting expectations on how support came to the conclusion.

suliblian · August 18, 2023, 8:40pm

That about sums it up, but that the issue is Manjaro dependent was clear early on. Yet to make sure I went through the suggestions to test this hibernation triggered issue on ISOs (in hibernation, RAM content is written to disk, which is not possible on an ISO; this does not really test for the problematic behavior), also on an installed Fedora (which does not come with hibernation; ditto).

I’m still convinced that either the firmware treats the left front port in a way that makes it less reliable than the other ports, or that the underlying cause is a hardware issue – maybe some power lead that is too long or thin and on higher load peaks has too much resistance which then causes some logic part to fail, or maybe a flawed chip, or something in that direction.

And I curse myself for modding the mainboard so warranty on this part is void.

EDIT: I have since done a fresh install from latest ISO; issue persists.

suliblian · December 20, 2023, 11:15pm

Brief update: this is very likely a hardware issue. I could demonstrate for support on video(*) that the front left port USB-3 was dead, and a live shutdown of USB-3 of same on front rear. (USB-2 was OK.) Support agreed to send a replacement board that was installed today. Now wait and see…

(*) The recorded tests were set up like this:

Before testing, all Framework modules were removed to exclude them as a possible source of errors.
To check for an existing USB connection, the USB configuration was continually polled in terminal, running 10x/second the command lsusb -vt.
The output of the test device (if found on USB-3) was highlighted in terminal. Each poll output covered about half the screen, so two poll results with their highlights were visible at a time.
For the actual test, a test device was repeatedly plugged in all ports of the Laptop, and the terminal output observed. This was recorded in the videos.
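A polling setup along these lines can be approximated as follows. This is only a sketch and not the exact commands used in the videos: the function names are invented, the regular expression assumes the usual `lsusb -t` tree layout with the negotiated speed at the end of each device line, and the fractional `sleep` relies on GNU coreutils.

```sh
# Hypothetical helper: keep only non-root-hub lines from `lsusb -t` style
# output whose negotiated speed is 5000M or higher (a USB 3.x link).
superspeed_lines() {
    grep -E '\|__ .*, (5000|10000|20000)M'
}

# Poll roughly ten times per second and show only SuperSpeed devices.
# If the test device never shows up here on one port but appears on the
# others, that port did not negotiate a USB 3.x link.
poll_usb3() {
    while :; do
        clear
        lsusb -vt | superspeed_lines
        sleep 0.1   # fractional seconds: GNU coreutils extension
    done
}

# Usage: run `poll_usb3`, then plug the test device into each port in turn.
```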

=============

Below are the conclusions I also sent to support, appended to a very detailed report of observations, tests, and issues. Maybe they help others to hunt down a whack-a-mole issue. If you want you can skip to the last three paragraphs for the conclusion.

Note that the other users mentioned below were found researching the black screen issue. I now think this is an early sign of a failing USB-3 in connection with an HDMI module. I hope I’m wrong.

My former tests (some details can be found in the black screen support conversation) were done to find the cause of a black screen waking from hibernation.

They showed that I could boot without a black screen into an installed Fedora 38 or a Fedora 38 live ISO, and in Manjaro installation and live ISO (ISO boots loaded via Ventoy).

Fedora suspension was also OK. But Fedora has no hibernation OOB, so the actual problem could not be tested. Hacking around to enable hibernation would result in a non-standard configuration, so I did not go there.

Testing for post-hibernation black screen with an ISO is not possible; ISOs have no writable swap file or partition. The black screen originally happened only after hibernation, later rarely on boot also.
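To see quickly whether a given boot environment can hibernate at all, one can look at two things the kernel exposes: whether /sys/power/state offers "disk", and whether /proc/swaps lists any active swap area. The sketch below does just that; `hibernate_ready` is a made-up name, and it deliberately does not check the resume= parameter or swap size.

```sh
# Hypothetical helper: report whether basic hibernation prerequisites exist.
#   $1 = contents of /sys/power/state (defaults to reading the file)
#   $2 = contents of /proc/swaps      (defaults to reading the file)
hibernate_ready() {
    state=${1-$(cat /sys/power/state)}
    swaps=${2-$(cat /proc/swaps)}
    case " $state " in
        *" disk "*) ;;
        *) echo "kernel does not offer hibernation"; return 1 ;;
    esac
    # /proc/swaps has one header line; anything after it is an active swap area.
    if [ "$(printf '%s\n' "$swaps" | wc -l)" -lt 2 ]; then
        echo "no active swap"
        return 1
    fi
    echo "hibernation prerequisites present"
}

# Usage: hibernate_ready
```

On a live ISO without swap this would be expected to report "no active swap", which is why a post-hibernation test is not meaningful there.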

Searching the Community forum for clues I found other users suffering from the same issue. Some of them saw it later than I did – a later start of laptop usage, or varying time from deployment to issue manifestation?
Below are the respective forum posts. I did not search elsewhere; there may be others who posted outside the Community forum.

[TRACKING] HDMI screen black after wake from hibernation, only on left front port

David_Alexander, also on Manjaro. He already saw it at his post date with multiple ports. I had more affected ports later on.

Scott_H had the same issue running Windows.

Robin_Fruytier1, OS not mentioned

HDMI expansion adapter does not work in one slot - #2 by Robin_Fruytier1

Fabrizio on Windows11

HDMI expansion adapter does not work in one slot - #19 by Philip_Lawton

debuser on Debian 12, Debian Trixie, and Ubuntu live on both left ports (right not tested).

Philip_Lawton: “I am having the same issue in the same slot on the AMD 7040 MoBo”

So this issue manifests on machines running

1. Manjaro
2. SUSE Tumbleweed
3. Windows 11
4. Debian 12
5. Debian Trixie
6. Ubuntu live
7. an unspecified Windows
8. an unspecified OS

– not excluding others. OSs 3, 7, 8 might be identical (Windows 11), though.

These are all data points indicating that the issue is not solely caused by an unfortunate choice of distros on my part.

Also, Ubuntu is an officially Framework supported distro and its upstream Debian is not immune either.

Could the RTC rework play a causal role? But AFAIK, none of these users applied the RTC rework.
And if the black screen was related to the rework, it would have manifested immediately after the rework (and I for sure would have seen the connection and not have complained about it in the Community forum two months later!).
For these reasons I think we can safely rule out a mainboard damage from the rework.

Tests with powered hubs for all power-hungry USB devices rule out that too low voltage is the cause.
So do the tests shown in the video – some ports work, others don’t. The only exception would be a mainboard that has voltage issues on some ports and not on others, and that would be a design or manufacturing flaw.
I think that at this point we can skip checking my earlier various USB setups. The issue manifests with a single stick and not even the FW modules plugged in.

Why some distros (Fedora, mostly Ubuntu) appear immune may be due to different driver versions they shipped. But…

A driver that affects a single port and then increasingly more must undergo an evolution – I never heard of something like that.

Also (albeit a really exotic race condition is theoretically conceivable), why would a driver affect the USB intermittently and do so on several rather different operating systems?

And Windows and Linux do not use the same driver (maybe the same code base, with the same compiler, etc. – extremely unlikely that this resulted in the same erratic behavior at runtime on such fundamentally different OSs).

Software is a dead end; thinking turns to hardware. A coincidence of several weak points, all of which have to be severe enough that together they manifest the issue. Possibly, driver or firmware or operating system form part of this causal nexus.

This sounds exotic, but does happen. As an example of such a complex hardware issue which has appeared in an apparently unsystematic pattern, you may have heard about the recent SanDisk SSD defects that appear after various times of use. Bad solder plus too small contact area between parts and board plus heat that caused the solder to form bubbles – this did not allow a clear pattern of breakdowns to emerge.
SanDisk Extreme Pro Failures Result From Design and Manufacturing Flaws, Says Data Recovery Firm | Tom's Hardware

The sample size of Framework users might be too small and/or they might have not yet used their machine long enough to make a clear pattern appear. IMO, the already seen affected boards point into that direction at this time, though. With longer use and more motherboards in use this would become more visible, I guess.

smkuehnhold · April 11, 2024, 1:00am

I have an eerily similar problem on NixOS 23.11 with the same mainboard. With two monitors plugged in (one DP, the other HDMI), I rarely get even one monitor to turn on after the first cold boot. And even after several tries, I am lucky to get even one. Hotswap also appears to be broken. When I manage to get a monitor to turn on, I sometimes do get the wrong resolution reported for the other monitor that’s off, but not always.

I do on occasion see some errors appear in my logs/systemd boot screen(?), but I am not knowledgeable enough to know if these are just noise. Which ones appear (if any) seem to be mostly random. These errors, in no particular order, are as follows:

i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:308:DP-1][ENCODER:307:DDI TC1/PHY TC1][LTTPR 1] Failed to start channel equalization
i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful
usb usb2-port4: Cannot enable. Maybe the USB cable is bad?
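To track how often these messages show up across boots, the kernel log can be filtered for just these three patterns. A minimal sketch, with the function name `display_link_errors` invented here and the patterns taken directly from the messages above:

```sh
# Hypothetical helper: keep only the DP link-training and USB port-enable
# errors quoted above from kernel log text on stdin.
display_link_errors() {
    grep -E 'Link Training Unsuccessful|Failed to start channel equalization|usb[0-9]+-port[0-9]+: Cannot enable'
}

# Usage (current boot):
#   journalctl -k -b | display_link_errors
# or, without systemd:
#   dmesg | display_link_errors
```

Comparing the count after a boot where a monitor came up with one where it stayed dark may show whether the errors are noise or actually correlate with the failure.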

Google searching this issue in various ways leads to bug reports surrounding usb4/thunderbolt docking stations, but I have had no luck with any of the proposed solutions. I am plugging directly into modules.

I have also tried several cables, modules, and monitors with little to no change in behavior. BIOS is 3.17 with the latest firmware available via fwupdmgr