Occasional hard reboot in Zoom or YouTube

It happened this morning when running separate from the dock. I wasn’t watching video or anything, just editing a file in a terminal when it suddenly rebooted.

The spreadsheet is up to date with the hours leading up to the reboot: system-monitor - Google Sheets

I added a chart on the second tab. It’s apparent in the chart that something was ramping up prior to the reboot, but I haven’t dug into the data to see what specifically. EDIT: There doesn’t seem to be a smoking gun. CPU and GPU were essentially idle. Temps were rising across the board but nothing dramatic.

The only additional data point at the moment is that I had plugged in the charger a little while before the unexpected reboot. I’m not sure whether that corresponds with the temperature ramp, but it might.

Here’s the average temp for the period of time leading up to the reboot this morning. The start of the rise at the end might correspond with plugging in the power, but I’m not sure.

Hi.
Thank you for the raw data.
When you plug the PSU in, there is normally a short dip in performance for about 1 second.
Then the PSU updates the PMF profiles to allow more Watts.
Plugging in a PSU also increases the PCIe bandwidth.

  1. On PSU: PCIe Gen4
  2. On Battery: PCIe Gen3

It is lower on battery to save power.
So, I think that probably explains the rise in the graph after you plugged in the PSU.
All the RAM temps are below 50 C, so that is good.
It discounts one theory, that it might be RAM chips getting too hot, causing problems.
Too hot is > 85 C, so it is nowhere near that here.
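If you want to check the Gen4 vs Gen3 switch yourself, here is a minimal Python sketch that reads the negotiated link speed of each PCIe device from Linux sysfs. It assumes the kernel exposes the `current_link_speed` and `max_link_speed` attributes under `/sys/bus/pci/devices`; `read_pcie_link_speeds` is just an illustrative helper name. Gen3 normally shows up as 8.0 GT/s and Gen4 as 16.0 GT/s.

```python
from pathlib import Path


def read_pcie_link_speeds(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: (current_speed, max_speed)} for devices that
    expose PCIe link attributes. Devices without them are skipped."""
    speeds = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return speeds  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        try:
            current = (dev / "current_link_speed").read_text().strip()
            maximum = (dev / "max_link_speed").read_text().strip()
        except OSError:
            continue  # attribute missing or unreadable for this device
        speeds[dev.name] = (current, maximum)
    return speeds


# Example use: run once on battery and once on the PSU, then compare.
# for addr, (cur, mx) in read_pcie_link_speeds().items():
#     print(f"{addr}: current={cur} max={mx}")
```

Logging this alongside the temperatures would show whether the link speed really changes at the moment the charger is plugged in.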

Was the reboot instant again? I.e. editing a document and then an immediate reboot, or was there a delay, e.g. a freeze for about 20 seconds and then a reboot? I guess that as you “just stepped away”, you might not be able to answer that.

In this case I didn’t step away. I was in the middle of typing. The reboot was effectively instantaneous… if there was a freeze before rebooting, it was only about a second.

It happened again last evening. This time while I was watching YouTube.

Here is the chart. The only thing I see is that GPU activity was up, which makes sense since I was watching YouTube.

Here is the bottom of the ectool console after rebooting:

PORT80: 3F90
PORT80: 3F94
[12389326.562400 charge_request(16528mV, 0mA)]
PORT80: 3F74
PORT80: 3F70
PORT80: 3F90
[12389326.812900 charge_request(16520mV, 0mA)]
PORT80: 3F94
PORT80: 3F74
PORT80: 3F70
PORT80: 3F90
PORT80: 3F94
PORT80: 3F74
PORT80: 3F70
PORT80: 3F90
[12389327.370700 HC 0x0002]
[12389327.373100 HC 0x000b]
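
For anyone who wants to compare several of these captures, here is a minimal Python sketch that splits console text like the above into timestamped entries and PORT80 codes. It assumes the bracketed number is an EC timestamp in seconds and that every line has one of the two shapes seen in this capture; `parse_ec_console` and `last_timestamped` are hypothetical helper names, not part of ectool.

```python
import re

# "[12389326.562400 charge_request(16528mV, 0mA)]"
TIMESTAMPED = re.compile(r"^\[(\d+\.\d+)\s+(.*)\]$")
# "PORT80: 3F90"
PORT80 = re.compile(r"^PORT80:\s*([0-9A-Fa-f]+)$")


def parse_ec_console(text):
    """Return a list of (kind, timestamp, payload) tuples.

    kind is "ts" for bracketed entries (timestamp is a float) or
    "port80" for PORT80 codes (timestamp is None). Other lines are ignored.
    """
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        match = TIMESTAMPED.match(line)
        if match:
            events.append(("ts", float(match.group(1)), match.group(2)))
            continue
        match = PORT80.match(line)
        if match:
            events.append(("port80", None, match.group(1).upper()))
    return events


def last_timestamped(events):
    """Return the final bracketed entry, or None if there is none."""
    for event in reversed(events):
        if event[0] == "ts":
            return event
    return None
```

The last timestamped entry gives a rough reference point to line up against the final row in the CSV capture.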

Please let me know if there’s anything else I can do to help debug.

Thank you for the data capture.
Some things to check.

  1. How close to the forced reboot (FR) is the data capture? I.e. is there any data that did not get written just before the reboot?
  2. What video resolution were you watching at the time of the FR?

Based on your data.

  1. the cpu and gpu were less than 50% utilised at the FR.
  2. the EC did not reset.
  3. a video was being played
  4. the FR can happen without playing a video, but it happens far less often than when playing a video.
  5. the RAM temp was less than 50 C when the FR happened.

My theory, that I have not been able to disprove yet, is that maybe it is caused by a bit flip when reading from RAM.
Playing video obviously accesses a lot of RAM for all the video frames. We see that more RAM access makes the FR more likely to occur. Maybe the GPU reading from RAM uses a different data path than the CPU reading from RAM, which would explain why tools like memtest86 are not detecting it.
But counter to that, one would expect to see more display artifacts if that was the case.

Another thing to consider is whether a misbehaving PCIe device can cause this FR. I guess only AMD could answer that.
We have the same SSD, but different wifi cards. So maybe the SN850X is causing this. I don’t know how to prove or disprove that.

Summary:
Not really much closer to a cause. Just some new things ruled out.

Thank you, James. I’m worried that my laptop is unstable and we won’t figure it out, but your continued engagement gives me encouragement.

About your questions:

  1. how close to the FR is the data capture?

I think it is probably close. I’m polling every 5 seconds and the process that writes the CSV flushes after every line. I could increase the polling frequency if that would help, but there is a lower limit of approximately 1 second because of the sub-processes that collect the data.
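
One caveat on “flushes after every line”: a plain flush only hands the data to the operating system, and a hard reset can still lose whatever has not yet reached the disk. Here is a minimal Python sketch of a logger that also calls `os.fsync` after each row. It is not the actual logger used for the spreadsheet; `collect` is a hypothetical callable standing in for whatever gathers the sensor values.

```python
import csv
import os
import time


def append_sample(f, writer, row):
    """Write one CSV row and push it as far toward the disk as we can."""
    writer.writerow(row)
    f.flush()               # Python buffer -> OS page cache
    os.fsync(f.fileno())    # ask the OS to commit the file to storage


def log_samples(path, collect, interval_s=5.0, max_samples=None):
    """Append a timestamped row every interval_s seconds.

    collect is a hypothetical callable returning a list of sensor values.
    max_samples=None means run until interrupted.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        written = 0
        while max_samples is None or written < max_samples:
            append_sample(f, writer, [f"{time.time():.3f}", *collect()])
            written += 1
            if max_samples is None or written < max_samples:
                time.sleep(interval_s)
```

With this in place, the last row in the file should be at most one polling interval older than the reboot, so a shorter interval only matters if the interesting change happens within those last few seconds.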

  2. what video resolution were you watching at the time of the FR

I think it was 4K. In this case I had two 4K 60Hz monitors connected to the dock, and I was watching the video on one of them. This is my most common setup. Including the laptop panel, it is three screens.

I’m somewhat encouraged by the fact that the FR happens even when not using this setup, though, since it seems more likely to be solved if it’s not specific to my display configuration.

Maybe I should run memtest86 overnight to make sure it doesn’t detect something.

@Aron_Griffis
I am just another user like you.
The main problem with this Forced Reboot (FR) is that Framework have not been able to reproduce it.
So, anything we can do that makes it more likely to appear will help.
Along the way, we are at least finding out what is not responsible and what does not contribute to the likelihood of the problem.

Hey, I’m coming from an issue I posted: Freezing During GPU Load. I find that this issue is very similar to mine, except that my laptop freezes instead. After you mentioned that the issue might be related to the path the GPU uses to read from memory, I had the idea to swap the slot my memory was in.

I have a 1x16GB 4800 stick from Crucial, and it is normally in slot 0 (left). When it is in that slot, I get freezing under load. However, when I switched the stick to slot 1 (right), the freezing turned into instant rebooting.

Maybe try switching the slot your RAM is in?

Interesting! I have two sticks, but maybe I should take one out and try this experiment. I’ll keep you posted. (Also, my RAM is DDR5-5600.)

I might have had a similar issue, writing this in the hope that it helps you diagnose yours.

I was using a Framework 13 on Fedora GNOME (similar enough to Bluefin as I understand). Rather than an instant reboot, for me it was more an instant shutdown. The screen went black and the machine powered off; it did not try to reboot, so I had to turn it on again. The issue did not feel linked to a specific program for me; you mentioned video but later mentioned that it happened once when not linked to video. For me it just happened if I kept the computer on for a long enough time (~4+ hours). It never happened soon after I turned on the computer.

For me the issue seemed linked to GNOME, as I believe it went away when I switched to a different desktop environment, and it never felt linked to a specific program or even CPU/GPU usage. My guess was memory leak, but I didn’t investigate it closely enough for that to be an educated guess.

If that sounds similar, then temporarily changing desktop environment is something to try to pinpoint the problem; though I know that can be a disruptive change particularly on Bluefin.

How repeatable is the difference?
Slot 0 - freeze FTH
Slot 1 - reboot. FTR
I think 5 times would be conclusive enough.

Before I made my post, my first two issues after placing the RAM in Slot 1 were both instant reboots. However, after seeing your message, I tried 4 more times.

  1. Instant reboot
  2. Freeze (no caps lock flashing)
  3. Freeze (no caps lock flashing)
  4. Freeze (caps lock flashing)

So, I was just “lucky” to get instant reboots on those first few crashes; it seems the slot probably doesn’t matter.

Also, the kern.log and syslog files both show corrupted data right where the freeze happens: long strings of “\00\00\00\00\00\00\00\00\00\00\00\00\00\”.
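
To pin down where those zero bytes sit in the logs, here is a minimal Python sketch that reports the offset and length of every long run of NUL bytes in a file. The 16-byte threshold and the name `find_nul_runs` are arbitrary choices for illustration. One possible explanation, which is an assumption and not confirmed in this thread, is that the file was extended on disk but the log text itself never got written before the crash.

```python
import re


def find_nul_runs(path, min_len=16):
    """Return [(byte_offset, run_length), ...] for each run of at least
    min_len consecutive NUL bytes in the file at path.

    Reads the whole file into memory, which is fine for typical log sizes.
    """
    pattern = re.compile(rb"\x00{" + str(min_len).encode() + rb",}")
    with open(path, "rb") as f:
        data = f.read()
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# Example use (reading /var/log/kern.log may need root):
# for offset, length in find_nul_runs("/var/log/kern.log"):
#     print(f"{length} NUL bytes at offset {offset}")
```

The last readable log line before each run is the closest thing to a timestamp for the freeze, so it is worth comparing that against the time of the crash.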

I followed this link to see what was being reported. I’m not experiencing what is being described on this thread, but in that GitHub tracker someone posted that there were 3 scenarios being reported, and mine was the third:

3. the screen goes blank, but does not power off and does not reboot.

I haven’t been able to repeat it, simply because I’ve avoided using that setup, but I went into an interview some time right after upgrading to the HX 370. My computer froze right after the interview started with a blank screen, or so I thought. He could see and hear me and I could hear him, but I couldn’t see him. We conducted the interview that way. This was using Zoom in Firefox.

@Matt_Hartley Can Framework offer any insight or debugging suggestions? (I just updated the original problem description at the top, in case you haven’t been tracking this.)