Uneven CPU thermals!

Thermal Grizzly PhaseSheet, one-week update.

10 minutes from cold, 16486 pts

single run after the 10 minutes, 16941 pts

10 minutes from hot, 16775 pts

I think that this is the winner for me. There’s no need to replace the pad with something else.
Maybe if they release a better heatsink, I’ll buy that in the future.

I was trying to test this using a Fedora live USB, anticipating the email from support… I now have significant doubts about the accuracy of wattage reporting on Linux. When I run y-cruncher as root, it makes the system pretty much unresponsive by default. At the same time, sensors reports on the order of 10 watts used, while I’m seeing a bump of maybe 40 watts on a hardware USB power meter. If I run y-cruncher as a normal user, sensors reports some 36 watts. But if CPU load affects the power stats… they probably aren’t at their best during heavy multicore workloads.

So all I’ve really got to go on is the Cinebench number, and I don’t really trust that, running it in wine as I have been. I can’t easily get that going in a live environment, either.

Also, anybody know what incantation causes Fedora to give you a copy of turbostat? The kernel devel package doesn’t seem to come with turbostat.c. That would be too easy.

(it’s the kernel-tools package)

I’m not seeing that same lowered reporting of power from turbostat, though for some reason it is telling me that it’s only pulling close to 31 watts now… Maybe that was the sleep bug. Guess I’ll mess with this more later.

So do we know if the RMA boards ship with LM again or PTM?

My 16 runs HOT and the fans are constantly ramping, so I need to run this test tonight.

Running anything as root in Linux essentially bypasses the OS scheduler if I recall, similar to setting the process to “realtime” priority on Windows. That’s likely why the rest of the system has a hard time even updating sensors.

That’s not quite it. Linux also has process priorities, referred to as “niceness”. y-cruncher defaults to maximum priority when you run it as root (the opposite of what its docs say). It gets a niceness of 0 (the default) if you don’t run it as root. You can run

ps ax -o pid,ni,pcpu,cmd

or similar to see niceness. On my machine, though, I only see a single character in that column, not the full value. It’s a -, though, which is telling, as lower numbers mean higher priority.
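
For anyone who wants to try this without locking up the desktop, here is a minimal sketch using coreutils `nice`. The `./y-cruncher` path in the last comment is a placeholder, and whether y-cruncher overrides the niceness it inherits when run as root is not something this sketch establishes.

```shell
# With no arguments, nice prints the niceness of the current shell
# (0 unless something has changed it).
nice

# Start a command with its niceness raised by 10, i.e. at lower priority.
# The inner `nice` stands in for the benchmark and prints the niceness it got.
nice -n 10 nice

# Hypothetical: the same for the benchmark itself (binary path is a placeholder).
#   nice -n 10 ./y-cruncher
```

Raising niceness never needs root; only lowering it below the inherited value does.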

Seems like whatever method sensors uses to calculate wattage is susceptible to being preempted and giving bad values as a result.
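
One possible cross-check that does not depend on how often the reader gets scheduled is to take two readings of a cumulative energy counter and divide the difference by the elapsed time. A rough sketch, assuming the kernel exposes such a counter in microjoules; the sysfs path in the comment is an assumption, may not exist on this machine, and usually needs root. The helper name `avg_watts` is made up for illustration, and counter wraparound is ignored.

```shell
# avg_watts START_UJ END_UJ SECONDS
# Average power in watts from two readings of a cumulative energy counter
# (in microjoules) taken SECONDS apart.
avg_watts() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", (b - a) / 1000000 / t }'
}

# Made-up readings: 360 J consumed over 10 s.
avg_watts 1000000000 1360000000 10   # prints 36.0

# Hypothetical use on a live system (path is an assumption):
#   f=/sys/class/powercap/intel-rapl:0/energy_uj
#   a=$(cat "$f"); sleep 10; b=$(cat "$f"); avg_watts "$a" "$b" 10
```

Because the counter keeps accumulating while the reader is waiting for CPU time, a delayed read stretches the interval but does not lose energy, provided the elapsed time passed in is the real one.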

As seen in the thread, I removed the stock heatspreader from the vapor chamber and am currently running a 20 x 20 x 0.8 mm copper shim between PTM7950 sheets. The change is now about two weeks old, with no degradation so far. Peak TDP is currently 78.6 W after a cool boot, going down to 54 W sustained (PPT limit), without even touching 100 °C on any core under sustained TDP (stock settings, Maximum Performance power profile). I am getting above 16k points in CBR23 when running back-to-back single runs. I touch 100 °C and the thermal limit if I use X86 Universal Tuning Utility and override the PPT limits (premade Extreme/Performance profile). Then it rides the thermal limit at 100 °C, from above 70 W TDP down to above 58 W TDP sustained, and I hit about 16.5k points consistently. Bear in mind that I only run the 7840HS, not the 7940HS.

The way yours is acting with a new shim with ptm compared to how mine acted with the original shim with ptm really makes me think that it’s an issue with how the two pieces are joined together from factory. I am surprised that my laptop being a more recent build didn’t have a new heatsink design with this issue resolved. Surely they’ve figured this out already? Maybe it’s still just a heatsink lottery?

That makes sense. I haven’t run a Linux desktop for at least a few years now and I’m just trying to dabble in it again, so I’m a bit rusty.

I can say that sensors also do not report correctly in windows when running a benchmark with “realtime” priority, though the benchmark numbers seemed to be better in the end once the system started responding again.

Mine was a Batch 20 unit shipped in June 24, and I got a board RMA with a more recent board, with the added keyboard deflection kit, in July. We suspect that:

  1. the shim/heatspreader bonding to the vapor chamber is crap, and
  2. the liquid metal is running off.

Those are the main culprits that cause the degradation.

The shim bonding is an issue Framework has known about since the preproduction and press units, as they said about 20% of the units were underperforming.

So the liquid metal has to be revised, Cooler Master as the heatsink manufacturer has to redo their heatsink, and the QM on the unit assembly has to be revised to find units with higher heat resistance.

Oh wow.

12210 in CB R23.

Core 4 locked at 100 °C, lowest was core 1 at 73 °C, and peak power was 34 W.

That’s RMA time, right?

(edit - raised a support ticket)

Thanks for creating the ticket!
The more we report this, the more they’ll need to consider fixing it.

I hope you’ll be lucky and get a good mainboard and heatsink, but I think the issue is still there.

New record! (but not in a good way)

Seeing this thread made me test my PC. (I had some doubts because MangoHud nearly always reported thermal throttling in games, but I didn’t have the courage to test it.)

Temps while running Cinebench R23 multicore
Framework 16 Batch 1 with GPU module and R7 7840HS
Peak power 36 W

I got an RMA reply from support. A new board is coming soon. Mine was sitting at 100 °C on core 5 and 50 °C on core 1, at 34 W…

For everyone’s info, the first contact with support is as follows:

It’s always different questions, depending on which support operator is working on your ticket. I had a 7-question form with similar and other questions.

This is the list of questions I had when I opened my ticket:

Follow up questions:

Round 2

  1. Kindly remove all the expansion cards and try to test again to see if there will be a different result.
  2. Do you feel the air blowing out of the vents?
  3. Could you also please tell us if you are using the dGPU module for this test?
  4. Kindly test on both the dGPU module and expansion bay shell to see if there will be any difference in the result.

You have been sent to the person who doesn’t have a clue. It’s the script they send out when the issue is in the wrong department. But yeah, you will get sent to the right one after some rounds.

What is the best method to find out if your board is from a defective batch? Are there any tests that can be run to identify whether it’s defective? I am not experiencing any thermal issues with my machine (shipped in September); it is a 16 with the 7940HS. No fan noise or any indication of overheating. I don’t have the eGPU; is that why? I am now paranoid, as I have no way to determine whether my board is defective or not!

I shared a list with some predefined questions from the support.

Try to follow all the steps and write a detailed email when contacting the support team.
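
What people in this thread have been comparing is the gap between the hottest and coolest core during a Cinebench R23 multicore run. A small sketch of that arithmetic, assuming you have already noted one temperature per core in °C; the helper name `core_spread` is made up, and the sample values other than 73 and 100 are invented for illustration.

```shell
# core_spread: read one core temperature per line (°C) on stdin and
# print "min max spread".
core_spread() {
    awk 'NR == 1 { min = max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR) printf "%d %d %d\n", min, max, max - min }'
}

# Example: coolest core at 73, hottest at 100 (other values invented).
printf '%s\n' 73 88 91 100 95 90 84 86 | core_spread   # prints 73 100 27
```

The thread does not state a pass/fail threshold. The two boards reported above as going to RMA showed spreads of 27 °C (73 vs 100) and 50 °C (50 vs 100), both at around 34 W peak.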
