Uneven CPU thermals!

After applying Thermal Grizzly Conductonaut liquid metal, I am seeing scores of 16561 on the 10-minute run and 16785 on a single cold run. Peak package power is 74 W, and it seemed to settle somewhere around 58-59 W on the 10-minute test.
Prime95 was drawing 63 W+ continuously.
I am also seeing much closer core temps, though core 4 is still the hottest, same as before. Instead of an average delta of 8-9 C, it’s now more like 3-4 C.
This is significantly better than I was getting with a brand new laptop with original TIM.

I had already removed the plastic protector around the CPU area while testing previously, so I had a friend redesign it and do a vinyl cutout on his plotter. I can share the file with anyone who may need it if you message me. In theory 100c should be fine for standard vinyl decal material from what I’ve read at least. I’ll have to do some further testing there though.

All that being said, to anyone reading this, please do not use Conductonaut unless you both have experience with it and can afford a new motherboard if you mess up. I guess the second point alone would be fine if you’re at a point where a $1k board death doesn’t matter to you.

So I think I’ve found my solution for now, but I’m also going to continue testing for a while and see if it degrades over time like the original TIM. I’m guessing that PTM7950 may not be working due to a warped shim or inconsistent contact. The only other thing I could think of is poor bonding between the shim and the vapor chamber, which liquid metal may be helping improve by spreading the heat out more effectively.

I’m on Linux. Don’t think I can read per-core temperatures, but, is this bad? Sounds like maybe it’s bad.

The score appears to be pretty low, but it’d take more testing to see if it was thermal throttling or if it’s just configured wrong. I’m not yet familiar with power management on modern Ryzen systems in Linux. (I’ll be fixing that as soon as my new ssd finally arrives)

Pretty sure it was thermal. I was running sensors k10temp-pci-00c3 acpitz-acpi-0, and the Tctl value read 100.1C while one of the ACPI temperatures read 99.9C. As long as those values aren’t completely wrong, something was getting hot.
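For anyone who wants to log these values without parsing the output of sensors, here is a minimal sketch that reads the same kind of data straight from the kernel's hwmon sysfs files. It assumes the usual layout under /sys/class/hwmon (a name file plus tempN_input in millidegrees and an optional tempN_label); the function name read_hwmon_temps is made up for this example. As far as I know, k10temp reports Tctl rather than individual core temperatures, so don't expect per-core values from it.

```python
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {chip_name: {label: degrees_C}} for every hwmon temperature input."""
    temps = {}
    if not os.path.isdir(root):
        return temps
    for entry in sorted(os.listdir(root)):
        chip_dir = os.path.join(root, entry)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            continue  # not a readable hwmon chip directory
        for fname in sorted(os.listdir(chip_dir)):
            if not (fname.startswith("temp") and fname.endswith("_input")):
                continue
            prefix = fname[: -len("_input")]
            label = prefix  # fall back to e.g. "temp1" when there is no label file
            try:
                with open(os.path.join(chip_dir, prefix + "_label")) as f:
                    label = f.read().strip()
            except OSError:
                pass
            try:
                with open(os.path.join(chip_dir, fname)) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            temps.setdefault(chip, {})[label] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for chip, readings in read_hwmon_temps().items():
        for label, deg in readings.items():
            print(f"{chip:12s} {label:12s} {deg:6.1f} C")
```

Chips that share a name (two NVMe drives, say) get merged into one entry here, which is fine for a quick look but worth changing if you log this over time.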

Okay. If you are hitting 100C while only drawing 35-40w (what it appears to be on the screenshot) you definitely have an issue. You can see my results above and they’ve actually improved now that I’ve gone through burn in to let my cooler settle in with the material I used around it to contain the liquid metal. With PTM7950 and the original liquid metal I was limited much more, but still ahead of where you are. (for reference all my testing was done in ~69-72F ambient with decent air movement in the room, sitting flat on a table)

If you can verify you can’t draw 45w continuously without throttling, I would contact FW support honestly, because that’s below their advertised capability. I’m very sure they’re aware of this issue and from people’s responses here it seems to be a heatsink lottery and maybe worse on earlier shipments.

I’ve been running GTKStressTesting on Linux and my PPT seems to range between 35 W and 39 W from sensors. GTKStressTesting even says 33-34W. From sensors, it does seem to occasionally spike over to 40 W for one split second, but the average is way below that.

Is my CPU in the “bad” range? Do you think this is a good benchmark or should I try something else?

Getting very similar results here, with GTKStressTesting and turbostat. PkgWatt seems to be the same thing as PPT, and mine stays in the 30s pretty consistently, too.

I managed to get Cinebench R23 to run on Linux using Lutris.

I am getting 13957. Plugged in with the 180W charger, Balanced power plan.


Could anybody more knowledgeable give me some feedback?

I would try it with the performance power plan. I’m not sure how it will compare with Linux, but on Windows, even though the average wattage and core clocks didn’t increase, for some reason my Cinebench scores did increase when I switched from balanced to best performance. I would give that a shot before coming to any conclusions.

With the performance plan, my scores went up from 13k to 14k. So that gave me a free 1000 points or so!

I am still very confused by the readings, though.
If CorWatt is anything to go by, I reach a CorWatt of 49, which is strictly better than 45 and implies the cooling solution is in great shape.
For whatever reason, PkgWatt stays at 37-38 W even in high performance.

I am puzzled. How is it possible that the total package wattage is lower than the core wattage? It’s almost certainly user error on my part, mind you. I’m not sure I’m reading the correct values here.

Tried again with the balanced profile on a cold boot, and things seem to be better!

From a cold boot, it seems to start off at 42.7 W, and it gradually deteriorates averaging 40 W under load as the CPU (temp4 in sensors) hits 100C and starts thermal throttling.

This is after the test has been running for a while and the CPU hit 100C:

Went from 14149 to 14612 - slight bump from yesterday, and firmly out of the 13xxx’s. Interestingly, 14149 was achieved with the performance plan and 14612 was achieved in balanced mode.

I have then tried running stress-ng --matrix 16 and I got similar behaviour: balanced power plan, started off at 55.93 CorWatt and 41.42 PkgWatt and it finally landed on 54.65 CorWatt and 41.42 PkgWatt, which looks better.

Comparable, if only a touch worse, results with the performance power plan:

It’s not in the 30’s most of the time anymore, at least. Would you say this is finally a good result? There is a known bug with CPU boost and wake from sleep on Linux, so I am wondering if last night’s assumption that my cooler was out of spec was actually, instead, a manifestation of that particular sleep bug.

EDIT: That didn’t last long.
I waited a few minutes and ran another test:

And got a measly 14083:

Which is the lowest score I got under high performance.
Looks slightly better on stress-ng but still worse:

Make sure you don’t run anything besides the benchmarks. Even having Firefox open with a couple pages will ruin the results!

Holy crap, my 7940HS may be a golden sample then. I am able to (and currently do) run a stable undervolt of -30, and I have only ever gotten over 16k once, when I first got my laptop, with a -25 mV undervolt. I think I am going to try out the PTM pad and see if I can bump performance. I boost to 60 watts for a few seconds, sit at 55 watts for about 15 seconds, then settle at 45 watts all-core at 100C in the R23 multi-core test.

Good idea! I did a couple of runs of Cinebench R23 with nothing else open (apart from Lutris of course, which is required to run it). The second one was after a cold boot.

First run, averaging 38 W power draw:

Ended up at 14479 points.

I then powered down the PC, let it cool a bit and performed another test cold-booted:

Similar power draw here, ended up at 14453.

The results I got with matrix also seem to be comparable to before:

sensors here indicate that I am hitting thermal throttling for a power draw of about 40 Watts.

Currently trying to gauge where I stand here. It seems to be strictly better than the folks who cannot get above 30-33W, but I am not hitting the target of 15k and the clocks do not go above 3800 MHz.
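To put the scores I have posted so far side by side, here is a small sketch that computes the run-to-run spread and the distance to the 15k target. The numbers are the six Cinebench R23 results from my posts above. The summarize helper is just something written for this post, and the runs were not all done under the same conditions, so treat the output as a rough guide only.

```python
from statistics import mean

# Cinebench R23 multi-core scores from the posts above, in posting order.
R23_RUNS = [
    13957,  # Balanced plan, 180W charger
    14149,  # performance plan
    14612,  # balanced, cold boot
    14083,  # high performance, a few minutes later
    14479,  # nothing else open
    14453,  # nothing else open, cold boot
]


def summarize(scores, target):
    """Return the mean, the max-min spread and the best run's shortfall (both in %)."""
    avg = mean(scores)
    return {
        "mean": avg,
        "spread_pct": 100 * (max(scores) - min(scores)) / avg,
        "best_shortfall_pct": 100 * (target - max(scores)) / target,
    }


if __name__ == "__main__":
    for key, value in summarize(R23_RUNS, target=15000).items():
        print(f"{key:20s} {value:10.2f}")
```

On these numbers the spread between runs is around 4.6% of the mean, while the best run is about 2.6% short of 15k. In other words, the variation between my own runs is larger than the gap I am trying to explain, so a single run in either direction doesn't say much.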

Thermal Grizzly PhaseSheet, one-week update.

10 minutes from cold, 16486 pts

single run after the 10 minutes, 16941 pts

10 minutes from hot, 16775 pts

I think that this is the winner for me. There’s no need to replace the pad with something else.
Maybe if they release a better heatsink, I’ll buy that in the future.

I was trying to test this using a Fedora live USB, anticipating the email from support… I now have significant doubts about the accuracy of wattage reporting from Linux. I noticed that when I run y-cruncher as root, it renders the system pretty much non-responsive by default. At the same time, sensors reports on the order of 10 watts used, while I’m seeing maybe a ~40 watt bump on a hardware USB power monitoring widget. If I run y-cruncher not as root, sensors reports some 36 watts. But if CPU load affects the power stats… the stats probably aren’t at their best during heavy multicore workloads.

So all I’ve really got to go on is the Cinebench number, and I don’t really trust that, running it in wine as I have been. I can’t easily get that going in a live environment, either.

Also, anybody know what incantation causes Fedora to give you a copy of turbostat? The kernel devel package doesn’t seem to come with turbostat.c. That would be too easy.

(it’s the kernel-tools package)

I’m not seeing that same lowered reporting of power from turbostat, though for some reason it is telling me that it’s only pulling close to 31 watts now… Maybe that was the sleep bug. Guess I’ll mess with this more later.

So do we know if the RMA boards ship with LM again or PTM?

My 16 runs HOT and the fans are constantly ramping, so I need to look at this test tonight.

Running anything as root in Linux essentially bypasses the OS scheduler if I recall, similar to setting the process to “realtime” priority on Windows. That’s likely why the rest of the system has a hard time even updating sensors.

That’s not quite it. Linux also has process priorities, referred to as “niceness”. y-cruncher defaults to maximum priority when you run it as root (the opposite of what its docs say). It gets a niceness of 0 (the default) if you don’t run it as root. You can run

ps ax -o pid,ni,pcpu,cmd

or so to see niceness. On my machine, though, I only see a single column for it, not the full value. It’s a - though, which is telling, with lower numbers being higher priority.
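If the ps column is hard to read, the nice value can also be pulled out of /proc directly. This is a rough sketch, assuming the field layout documented in proc(5), where nice is field 19 of /proc/<pid>/stat. The helper names parse_nice, nice_of and niceness_table are made up for this example. Nice values run from -20 (highest priority) to 19 (lowest).

```python
import os


def parse_nice(stat_line):
    """Extract the nice value (field 19) from the contents of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 19 (nice) is rest[16].
    return int(rest[16])


def nice_of(pid):
    """Read the nice value of one process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_nice(f.read())


def niceness_table():
    """Return [(pid, nice)] for all visible processes, highest priority first."""
    rows = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            rows.append((int(name), nice_of(name)))
        except (OSError, IndexError, ValueError):
            continue  # process exited or stat line was not parseable
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    try:
        for pid, nice in niceness_table()[:10]:
            print(f"{pid:8d} {nice:4d}")
    except OSError:
        print("no /proc here; this only works on Linux")
```

This only looks at niceness. If a program switches itself to a real-time scheduling policy, the nice value is not the whole story, so check the scheduling class as well before drawing conclusions.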

Seems like whatever method sensors uses to calculate wattage is susceptible to being preempted and giving bad values as a result.
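One way to cross-check the wattage without depending on how sensors gets its number is to sample a cumulative energy counter and divide by the time that actually elapsed. A late wake-up then only stretches the measurement window instead of skewing the result. The sketch below assumes the kernel exposes a package energy counter through the powercap framework at /sys/class/powercap/intel-rapl:0. I have not confirmed that this zone exists on this laptop, and reading it normally needs root. The function names are made up for this example.

```python
import time

# Assumption: package-0 powercap zone; adjust or skip if it is not present.
RAPL_ZONE = "/sys/class/powercap/intel-rapl:0"


def average_watts(start_uj, end_uj, seconds, max_range_uj=None):
    """Average power in watts between two cumulative energy readings (microjoules)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around between the two samples.
        if max_range_uj is None:
            raise ValueError("counter wrapped; max_range_uj is required")
        delta += max_range_uj
    return delta / 1_000_000 / seconds


def read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def sample_package_watts(interval=1.0, zone=RAPL_ZONE):
    """Measure average package power over roughly `interval` seconds."""
    max_range = read_int(f"{zone}/max_energy_range_uj")
    e0, t0 = read_int(f"{zone}/energy_uj"), time.monotonic()
    time.sleep(interval)
    e1, t1 = read_int(f"{zone}/energy_uj"), time.monotonic()
    # Divide by the measured elapsed time, not the requested interval.
    return average_watts(e0, e1, t1 - t0, max_range)


if __name__ == "__main__":
    try:
        print(f"{sample_package_watts():.1f} W")
    except OSError as err:
        print(f"could not read the energy counter: {err}")
```

If this number and the sensors value disagree badly while y-cruncher runs as root, that would point at the reporting path rather than at the CPU actually drawing less power.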

As seen in the thread, I removed the stock heat spreader from the vapor chamber and am currently running a 20x20x0.8 mm copper shim between PTM7950 sheets. The change is now about 2 weeks old, with no degradation so far. Peak TDP is currently 78.6 W after a cold boot, going down to 54 W sustained (PPT limit) without even touching 100C on any core under sustained TDP (stock settings, maximum performance power profile). I am getting above 16k points in Cinebench R23 when running back-to-back single runs.

I touch 100C and the thermal limit if I use Universal x86 Tuning Utility and override the PPT limits (premade profile, Extreme/Performance). Then it rides the thermal limit at 100C, going from above 70 W TDP down to above 58 W TDP sustained, and I hit about 16.5k points consistently. Bear in mind I only run the 7840HS, not the 7940HS.
