[TRACKING] Framework 16 & Linux 6.9.0-rc4/rc5 - extreme screen flickering - anyone else?

I mean to me it sounds like there is some fundamental issue with Freesync that got introduced in 6.9. It’s better to find and fix the fundamental issue otherwise we’re “fixing” this “just” for Framework 16 and not all the other monitors and panels that support it.

Agreed. I just don’t know yet how to help find the Freesync issue.
Suggestions?

P.S. In the meantime I am performing a clean compilation of 6.9.0-rc5.

I mean the steps I said where you apply the patch to make freesync work for each step of the bisect are probably the way to go. It’s really not any more tedious than a regular bisect.

To confirm, a clean 6.9.0-rc5 with commit 2f14c0c8cae8e9e3b603a3f91909baba66540027 reverted gives at least a working system, probably without Freesync.

I will start the bisection by applying the patch before each build, in essence enabling Freesync for the Framework 16 laptop, starting at tag v6.8 and ending at v6.9-rc4.
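For anyone wanting to follow along, a patched bisect step could look roughly like the sketch below. It is only a sketch: the patch path in `PATCH` is a hypothetical placeholder, and the helper names `apply_patch` and `drop_patch` are made up for illustration. The idea is to apply the patch to the working tree without committing it, build and test, then reverse it again so that git can move on to the next step with a clean tree.

```shell
#!/bin/sh
# Sketch only: keep a local patch applied for each bisect step without
# committing it. PATCH is a hypothetical path; point it at the patch
# that enables Freesync for the Framework 16 panel.
PATCH="${PATCH:-$HOME/fw16-freesync.patch}"

apply_patch() {
    # Check first, so a step where the patch does not apply fails loudly
    # instead of leaving a half-patched tree behind.
    git apply --check "$PATCH" && git apply "$PATCH"
}

drop_patch() {
    # Reverse the patch so the tracked files match the bisect commit again.
    git apply -R "$PATCH"
}

# Typical loop, run by hand inside the kernel tree:
#   git bisect start v6.9-rc4 v6.8      # bad revision first, then good
#   apply_patch && make -j"$(nproc)"    # then install, reboot, test
#   drop_patch
#   git bisect good                     # or: git bisect bad
```

If `apply_patch` fails at some step, `git bisect skip` is one way to move past that commit.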

This is kinda a shot in the dark, but there is another report of flickering with VRR. The dates don’t really line up though as you’re seeing it in 6.9 and they have a revert even to 6.8.7.

7900 XTX flickers on kernels >= 6.5 (#2904) · Issues · drm / amd · GitLab

But if the bisect is a bust you can see if that revert in the last comment helps.

Hey folks. Let me drop my three cents here.

First and foremost, it’s impossible not to appreciate the fact that we’re able to identify a problem and get in touch with developers directly to discuss and fix it. This is what makes this project extremely unique and frankly, this is a reason I decided to pay more for FW16 instead of going with a new ThinkPad as I used to. If this model continues you’re gonna keep me as your client for a long time (and others as well, surely).

In all this awesomeness I find the last exchange of messages rather confusing. Arthur has found a legitimate issue that is likely to bite us directly or indirectly in the near future. It cost him time and pushed him to learn new skills, which, depending on the person, may be a good or a bad thing. The point is that he made the effort for the community. I don’t like the fact that he met slight pushback, with the suggestion that he failed the bisect, when the offending diff not only clearly touches the area of VRR, which absolutely may affect screen flickering, but was also submitted exactly for those laptops.

If I may suggest, the correct course of action shouldn’t be to push Arthur to find the potentially correlated issue (assuming there is one, as we don’t know for sure), but rather to reproduce the issue he has and tackle it professionally. I bet that there are others here who’d be willing to help (me included, though I haven’t received my FW16 yet).

I understand that an unsupported distro may be an issue here, but the kernel in general is distro-agnostic and if there is an issue it will eventually surface. Not to mention that “supported distro” is a very flexible statement when you can simply compile/upgrade the kernel in Fedora or Ubuntu by yourself.

Maybe I’m overstepping the line here. I don’t mean to point fingers or criticize anyone without knowing the broader context, but I would love to see this issue resolved, as I’m also the kind of person who likes to (and must) experiment with -rc kernels from time to time. It would be very reassuring to know that bugs found this way are treated appropriately.

Not at all. Everybody is welcome to join on the subject, including your view. :slight_smile:

If I hadn’t posted this finding here in the Framework community, I would have done it on the Linux Kernel Mailing List, since it seems to be a kernel regression. This is the Linux / open-source mentality, and one of my contributions to the open-source community and hopefully to the rest of the world.

I am fine to do the bisecting, although it is a long… :smiley:
As long as I feel the ambience is good and we all want to move forward, then I am fine.

The thing that surprises me is that nobody else has tested kernel 6.9.0 with a Framework 16 laptop and reported the issue. Community effort is helpful, but I would expect (or hope) that Framework is actively testing their products on existing and upcoming software. I bet it is a capacity issue and there are probably higher priorities. It is impossible to test every combination of hardware and software, so FW has to make choices.

Framework is still a startup and it needs to grow in all directions, including testing for issues like this. I appreciate their concept, mindset and openness. The startup and mindset combined make me willing to help out from the community, including this bisect.

Let’s hope this second bisect will show the root cause of the Freesync/VRR issue. By now I am halfway through the bisect. I expect to have the result within a day.

Sharing love and beer appreciated. :smiley: (I really should stop drinking espressos)

Your reply very well reflects my concerns and hopes :slight_smile: .

I just realized that I do have a 5900HX box with Arch installed and a monitor that supports FreeSync, so I may give it a try. It is not the same GPU/screen, but if there’s a deeper issue with the VRR support it may very well explode there too, which would be a good-quality signal.

Could you provide repro steps? Did you just install and run linux-mainline, without any additional params? Optionally, could you provide the amdgpu parameters (I assume that defaults may differ between GPUs)? They can be read via the sysfs interface, in case you’re not familiar with the concept:

sudo grep -rn . /sys/module/amdgpu/parameters/
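To make that output easier to diff between two machines, the same information could be dumped as sorted `name=value` lines. This is just a sketch: `dump_params` is a made-up helper name, and it takes the directory as an optional argument mainly so it can be tried on a test directory first.

```shell
#!/bin/sh
# Sketch: print module parameters as name=value lines, one per file.
# dump_params is an illustrative helper, not an existing tool.
dump_params() {
    dir="${1:-/sys/module/amdgpu/parameters}"
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file
        # (some parameters may need root to read).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage:
#   dump_params > amdgpu-params-fw16.txt
#   diff amdgpu-params-fw16.txt amdgpu-params-other.txt
```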

Yes.

Install linux-mainline, boot linux-mainline. That’s it.
AFAIK, no custom parameters, all default.

The second bisect is finished.
Below is the offending commit.

Does this make more sense and is it helpful?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.9-rc5&id=5950efe25ee02df4983864b3bc1f460ad5c8d862

arthur@pb450:~/.cache/yay/linux-mainline/src/linux-mainline$ git bisect good
5950efe25ee02df4983864b3bc1f460ad5c8d862 is the first bad commit
commit 5950efe25ee02df4983864b3bc1f460ad5c8d862
Author: Tom Chung <chiahsuan.chung@amd.com>
Date:   Wed Dec 6 22:07:51 2023 +0800

    drm/amd/display: Enable Panel Replay for static screen use case
    
    [Why]
    Enable the Panel Replay if eDP panel and ASIC support.
    (prioritize Panel Replay over PSR)
    
    [How]
    - Setup the Panel Replay config during the device init
      (prioritize Panel Replay over PSR).
    - Separate the Replay init function into two functions
      amdgpu_dm_link_setup_replay() and amdgpu_dm_set_replay_caps()
      to fix the issue in the earlier commit that cause PSR and Replay
      enabled at the same time.
    
    Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
    Acked-by: Alex Hung <alex.hung@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |  42 +++++++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c |  59 +++++++---
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_replay.c   | 119 ++++++++++++---------
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_replay.h   |   4 +-
 drivers/gpu/drm/amd/include/amd_shared.h           |   1 +
 5 files changed, 157 insertions(+), 68 deletions(-)

Reverting commit 5950efe25ee02df4983864b3bc1f460ad5c8d862 on the bisected and patched kernel makes the system boot again.

I will also build another clean 6.9.0-rc5 and revert only the above commit, to verify that this is an offending commit.

I will report back when finished.

I am unable to revert commit 5950efe25ee02df4983864b3bc1f460ad5c8d862 on any of the 6.9-rc kernels. It results in several merge conflicts.
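For reference, one way to check whether a revert applies cleanly on a given tree, without leaving the tree in a conflicted state, is sketched below. `try_revert` is a made-up helper name. It stages the revert without committing and, if there are conflicts, lists the conflicted files and aborts.

```shell
#!/bin/sh
# Sketch: attempt a revert without committing; on conflicts, show the
# affected files and restore the tree. try_revert is an illustrative name.
try_revert() {
    # $1: commit to revert
    if git revert --no-commit "$1"; then
        echo "revert of $1 applied cleanly (staged, not committed)"
    else
        # List the files left in a conflicted (unmerged) state.
        git diff --name-only --diff-filter=U
        # Put the index and working tree back the way they were.
        git revert --abort
        return 1
    fi
}

# Usage, inside the kernel tree on a 6.9-rc checkout:
#   try_revert 5950efe25ee02df4983864b3bc1f460ad5c8d862
```

The list of conflicted files may help whoever picks this up upstream see which later commits depend on the change.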

Yes this is more sensible. Can you please raise a ticket with the details on AMD Gitlab? I’ll ping some people with it.

Also, if you use the amdgpu module parameter to disable PSR on an otherwise broken kernel, does the issue go away?
I think it will also affect Panel Replay.

It’s a blast Arthur, thanks for tracking this one down.

I can’t find an amdgpu boot parameter by keywords psr, panel, or replay.

https://www.kernel.org/doc/html/latest/gpu/amdgpu/module-parameters.html

What is the parameter/value I need to set?

Sorry I should have included that, not everyone knows it.

amdgpu.dcdebugmask=0x10

Nothing to add, but please do keep us updated here and link to the AMD Gitlab once you have it. Thanks

This did not prevent the issue from happening. I have tried it on a clean 6.9.0-rc5 kernel.

Ah, looking at the code I was wrong there; there is a separate debug option for panel replay. Try amdgpu.dcdebugmask=0x400
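Since the value is a bitmask, the two bits mentioned in this thread can be OR-ed together to test with both PSR and Panel Replay disabled at once. A small sketch follows. The variable and function names are illustrative labels only, and the bit values are simply the ones quoted above.

```shell
#!/bin/sh
# Bit values as quoted in this thread; the names are illustrative labels.
DISABLE_PSR=0x10
DISABLE_REPLAY=0x400

# OR two mask values together and print the result in hex.
combine_mask() { printf '0x%x\n' $(( $1 | $2 )); }

# Succeed if any bit of $2 is set in $1 (decimal or hex input both work).
has_bit() { [ $(( $1 & $2 )) -ne 0 ]; }

combine_mask "$DISABLE_PSR" "$DISABLE_REPLAY"   # prints 0x410

# After rebooting with amdgpu.dcdebugmask=<value> on the kernel command
# line, the active value can be checked like this:
#   cur=$(cat /sys/module/amdgpu/parameters/dcdebugmask)
#   has_bit "$cur" "$DISABLE_REPLAY" && echo "Panel Replay disable bit is set"
```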
