Uneven CPU thermals!

Hi All,

Just wanted to post my experience with the “shim/ptm sandwich” (cpu->ptm->copper shim->ptm->cooler/vapor chamber) method.

Core 4 is my heater and I haven’t seen it go above 97C after 30 mins of torture. 55W steady.

4 Likes

I love that many followed my lead. I just sacrificed 2 heatsinks :winking_face_with_tongue: My tinkering paved the way for steady performance, as nearly anyone can do the sandwich in about an hour or less.

3 Likes

Yes! And thank you :slight_smile:! Definitely made it into a new machine in my case :100:.

Hello,
My recent venture in switching over to the PTM pad:
Background: Typical scenario like so many have posted: I was getting thermal throttled, with 1 core hitting 100C and the next closest one hitting around 88C. Cinebench R23 scores didn’t go over 12,500. I didn’t check power draw during those tests.
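
A minimal sketch of how that kind of imbalance could be put into a single number, assuming you already have per-core readings from whatever monitoring tool you use. The function name `hottest_gap` is made up for illustration and is not part of any Framework or sensor tool; the only inputs used are the two temperatures quoted above.

```python
def hottest_gap(core_temps_c):
    """Return the gap in degrees C between the hottest core and the next hottest."""
    if len(core_temps_c) < 2:
        raise ValueError("need at least two core readings")
    # Sort from hottest to coolest and compare the top two readings.
    ranked = sorted(core_temps_c, reverse=True)
    return ranked[0] - ranked[1]


# The two readings quoted in the post: one core at 100C, the next closest at 88C.
print(hottest_gap([100, 88]))  # 12
```

A gap that large between the top two cores is the "uneven thermals" pattern this thread is about; after a good repaste the same check would be expected to return a much smaller number.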

Switching over to PTM pad - Notable items worth mentioning:

  1. Watched Allen Dev’s video - A YouTuber showing his venture into using the PTM pad; it was super helpful in giving me a real overview of the task at hand, including some of the trickier aspects to look out for (like removing the heatsink). Video: https://www.youtube.com/watch?v=6uooJGSmWkU&lc
  2. Using a hair dryer - Using the wife’s hair dryer for like 45 secs made the removal of the heatsink very, very easy. I am very surprised that Framework has not put this in their guide!
  3. Beware of extra sticky plastic layer on the CPU heatsink - On the CPU heatsink there is a black sponge that surrounds the central rectangle plate that needs to be removed. However, in my case, when I removed this black sponge I also noticed that there was an additional thin transparent plastic layer that remained on the heatsink, and I didn’t see anyone mentioning this before (maybe for most folks the extra plastic layer was removed when the black sponge was peeled off). Pictures below.

Aftermath:

  1. Sustained power under heavy loads / benchmarking - Laptop is now able to sustain around 54W of power during heavy loads / benchmarking with CB23.
  2. Thermal limits - I am now rarely seeing any of the cores hitting 100C; I seem to be hitting power limits rather than thermal constraints. Either way, all cores are within 5-6 C of each other during these heavy operations.
  3. CB23 score - Able to get around 15,800.
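
To put the before/after scores in perspective, here is a small sketch of the percentage gain. It uses only the two approximate CB23 numbers from this post (12,500 before, about 15,800 after), so the result is a rough figure; `percent_gain` is just an illustrative helper name.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (after - before) / before * 100.0


# Approximate CB23 scores quoted in the post.
print(f"{percent_gain(12_500, 15_800):.1f}%")  # 26.4%
```

Since both scores are rounded, this is best read as "roughly a quarter faster" rather than an exact measurement.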

Pictures:

6 Likes

For me, that black sponge barrier layer stayed on the motherboard itself, surrounding the cpu. I did mine fairly early after getting the machine though.
It could be that over time the heat cycles end up making it stick to the heatsink for some.
If it’s not impeding the cleaning or contact of the shim to cpu die, you don’t really need to remove it.

I’m going through this myself right now. One of the support replies wants me to take the entire laptop apart to drain the power to “reset the mainboard” and it’s like :face_with_spiral_eyes: “…But… that has nothing to do with thermal checks? You had me Live USB Fedora Ubuntu for your psensor and s-tui [Fedora doesn’t have psensor in their repos, BTW] to show you what my Arch (which I use, BTW) install showed you?”

Like, they have a replacement part/“fix” that they offered until two weeks ago. That is why I mentioned it (my machine being part of the batches that might be affected), since my warranty is about to expire and this support ticket chain has been a huge fiasco. My point was: “you guys should have this documented already since you offered a part, and I just wanted to know if I am in the affected batch and, if so, whether a replacement board and/or fix for it is possible before my warranty expires.”

5 Likes

Are you saying they are no longer offering the PTM fix to customers?

Your experience sounds like mine but before they were offering the PTM fix. They were just making me dance like a monkey until my warranty ran out.

2 Likes

I would expect the warranty to apply for an issue that was reported during the warranty period. Was your experience different from that?

Never got that far to know. Support just kept making me send them screenshots proving the cpu temperature issue over and over again over a span of a couple months. Then as soon as my warranty period would have expired, they told me I needed to wait for the PTM send out from Framework to resolve the problem and that was it.

I ordered it, but since they were taking even longer to ship that out, I just ended up buying the stuff with my own money and replacing the liquid metal myself.

That’s not a very good experience. I took the plunge and did the rework with a shim once Framework released their instructions on how to remove it. So far it is working ok.

Until my most recent encounter I have had good experiences with Support. In this case it is not bad, just inefficient and with some errors. I got a 12" and after maybe a half-dozen reboots the fan is not detected on boot, so I have to push the power button to continue booting. The fan does work if I run fw-fanctrl, so I have that going and the machine is usable, fortunately. If the machine were not usable I would be less patient, as I initially contacted them on Monday evening, 6/30. After some back and forth they decided that the fan module needs to be replaced, so they shipped me one. That arrived yesterday. But, it’s the module for the 13" 12th gen board, not the 12" machine. So I’m back to sending pictures of the wrong part that was shipped out so that I can get the right part. Hopefully the fan is in fact the problem, otherwise it will be back into the loop again presumably to get the motherboard replaced.

So, this episode has not been as good, but I am trying to stay patient and work through it.

Following up on this, there was a firmware update for the FW12 that Support pointed me to, which has resolved the issue. Fingers crossed. I am not sure if I missed a post about it or if there was not one, but whatever the case, the system is fully working now. Thank you to the Support team and to the Firmware team!

1 Like

This. I don’t think they’ve sold the PTM on the marketplace themselves, but going by that, they stopped selling the Thermal Pad June 26th-29th.

And I messaged them a day or two after that (30th-31st) and have been given the runaround “do this, document it for us” stuff that is driving me insane. I have stopped responding; I’ll probably respond and go “look, I don’t see why stripping the entire machine, draining it of power and then putting it back together will fix the thermal issues. If you guys do not know what systems are affected: ok. I was simply asking if there is a fix since I was about to be (and am now) out of warranty, so I wanted to check if I was affected and if so get that fixed before the warranty expired.”

My experience with the support as a break/fix person has been 50/50. I get the need for scripts, but outside of ONE tech response, I feel like they’re just following a script without understanding the actual issues I brought up with them.

3 Likes

Hello all, just wanted to add some images of the replacement of my thermal pad. I am glad that I have a full electronics lab at my disposal, as cleaning the processor and cooler would otherwise have been difficult.
It really looks like the original liquid metal pad was burnt. I do not know how this is possible. I also sent a message to Framework support about this; hopefully they can provide some extra information.
Before the replacement my R23 scores were around 11k and the power draw of the processor reached around 30W. After the replacement I reach just over 15k and around 48W (with peaks going to almost 60W).
The first two images are before cleaning


(I can only add 2 images at a time so hopefully I can add the rest to the next posts)

The cooler cleaned up nicely


Some really burnt spots on the processor took quite some time to clean.

(see next post for more)

Some more images of the “burnt” spots


(see next post for more)

Edit: unfortunately I cannot add more posts.

Even after all the cleaning there is still some staining that is impossible to clean. Used quite a lot of IPA.

4 Likes

Thanks for those great macro shots. The burns look pretty severe but shouldn’t be that bad overall. The silicon surface isn’t what’s computing; you could even lap the die without harming anything, which has been done often in the enthusiast scene. Keep it like it is now, great performance improvement.

1 Like

I wanted to share my own experience and crowdsource some thoughts about my switch to PTM from LM as I’m now having driver issues (and I suspect I need to buy a new PTM pad and reinstall).

I was Batch 20 and got the free PTM pad from the request link, and was originally hitting ~13.8k in CB23 at best and ~826 in CB24 while being stuck at 39W at best. All in all, things operated in an acceptable manner for the most part, other than fan noise being higher than expected at some points.

While the replacement operation went well overall, I did need to remove my RAM to get the heat spreader off without bending anything (cleaning the spotty and crusty LM was a chore, but everything went as expected). The pad I had to apply was slightly short horizontally and slightly tall vertically compared to the die.
First boot everything went well, benchmarking was pretty OK:
CB 23 - ~15.2k, +1.3k or so
CB 24 - 877, +51
Wattage was averaging around 42 but could spike to 45.

Played a round of Halls of Torment to test that the built-in graphics were working OK - got some AMD driver warnings, but it played ok.
Booted Monster Hunter Wilds to ensure that worked, seemed to work as intended.

However, upon the next boot the following day, it hung completely (seemingly after the FW logo but before Windows 11 loaded). I hard power cycled it, and it loaded with the built-in GPU drivers semi-working and the DGPU disabled in Device Manager. I re-enabled the DGPU, rebooted, and everything is working as expected, except for the same driver warnings on game boot.

After the hard power cycle shenanigans again, I did more benchmarking to get similar scores, did not boot any games (and therefore got no driver warnings), shut it down, and upon next boot still had the same occurrences, except I got a glitched screen before I hard rebooted (colors are similar to my Windows 11 lockscreen).

Reseating the DGPU did not help and at one point I noticed even the built in drivers fully failed to load instead of partially loading with errors. Gave this a dozen or so heat cycles to see if the PTM would settle before I took the heat spreader off again.

I also reinstalled the FW drivers, which did not resolve things, and then updated to the AMD latest ones which I was running without issues prior to this operation, which did not resolve anything.

I reopened the heat spreader again to nudge the PTM around (thinking I may have needed to redistribute the potential vertical excess to the horizontal), and noticed the (transistors?) got PTM on them on the vertical dimension, but the spread horizontally was rather fine and even spread over the die. The picture below was taken before I tried to cleanup a bit (I don’t have a spare pad and therefore did not reapply IPA to the transistors, but I did use a spudger to cleanup a bit more around them and then recompressed the PTM more evenly across the die - the excess you see on the die is from scraping off the sides and before compression).

Unfortunately this did not resolve the hard boot issues, though it does seem to have fixed the weird graphical issues that were sometimes occurring when the hard boot issue occurs.
In addition, my benchmarking is worse than after the original swap and I’m stuck at around 39W-41W after some settling in, with middling scores (14283 / 14651 and 851).
Now the laptop tends to freeze on initial boot but seems to load the drivers when I hard cycle it.

My Thoughts on the Supplied Process:
Guide needs some adjustment, as is noted already in the comments (removing the middle plate and possibly needing to remove RAM are missing), and the pad I got was slightly the wrong size, to the point where even slight misalignment probably caused my issues, and I can’t seem to redistribute it properly. Given a lot of success in this thread, that last part might have been a slight size difference in my pad or down to an issue with me judging if the pad was centered.

My Theory on my Problems:
I believe what is happening is that on a cold boot the PTM is solid and not making full contact, so the drivers fail to load, but the heat from that attempt makes the PTM more malleable. At that point, on reboot, PTM contact is good enough to allow things to proceed as normal.
This does not explain why I still benchmarked quite a bit better originally, and now middling better on reboot, given my previous thermal throttling issues with LM were not causing driver issues. In addition that doesn’t explain why I get driver warnings when launching games, even if they are not graphically intensive (this also occurs if I’m running on the dgpu or 780M).

My only other thought is that I may have somehow damaged the mainboard or the heatspreader and not realized it, though the initial boost in benchmarking makes me wonder about that.

My Next Plans:
Since I did not file a support ticket before my warranty expired (since I was on the list to get this pad), I believe I don’t have any recourse there to try a different mainboard, heatsink, or PTM pad without cashing out.
Given the only modifications before issue occurrence were the thermal solution swap, I assume I need to buy a new PTM pad, clean up the die and surrounding transistors again, and try the new pad. I still do not believe I damaged any other components such as the heat spreader when operating.

My Questions (if anyone has any opinions):
What is the risk of damage to the mainboard due to PTM contacting the transistors next to the die but within the CPU section where the heatspreader mounts?
I saw in the guide that overhang was acceptable as long as it was even (which was not possible on my pad, but should have been close enough), which leads me to believe this should have been fine (combined with previous PTM research). I ensured even mounting pressure and thought my initial application was OK, so I’m still searching for what the original issue could have been.

With FW and LTT being out of the Honeywell PTM for sale in the small size, are there opinions on the next best source for acceptable results? I’ve seen a recommendation for a Thermal Grizzly pad on Amazon; however, I’m open to alternatives. I believe there have been some other suppliers for Honeywell in the past from lesser-known shops, but if the consensus is it’s all effectively the same I’d rather avoid making new accounts.

Kicking myself for not getting better pictures of the process, but without having another thermal solution on hand I’m loath to remount the heat spreader again for fear of not being able to boot at all.
Would love to hear any alternative theories if anyone has any as well.

3 Likes

PTM is not electrically conductive. Getting it on other parts will do nothing to cause the issues you see.

Poor heatsink contact may cause spiking thermals that lead to a GPU or cpu shutdown, but usually not the graphical artifacting you see. I would check the condition of the dgpu bridge for bent pins or bad connection, and also check the cables for your display connector.

That doesn’t look like a driver failure to me, more likely a hardware issue, potentially related to your comment about the RAM. You shouldn’t need to remove the RAM for the repaste. Were you using the RAM location as a lever point to get the heatsink off? I’m not sure I understand why the RAM sticks needed to be removed.

6 Likes

I’ll reinspect the interposer section for issues with the DGPU but I don’t think that would cause the integrated graphic failures as well, so I’ll more closely examine the display cable as well for sure.
I removed the left RAM stick to lever the heatsink at the arrow depicted in the guide - I could not get the spudger underneath this section to apply any leverage without removing the left RAM stick (otherwise it would apply pressure to the RAM as well, which I sought to avoid).


The RAM has been operating as expected at 5600 and with the capacity expected in the meantime, though I did reseat the stick again when I readjusted the PTM to be safe (the heatsink came off much easier in the second round so I doubt it would have been needed if I didn’t want to reseat it anyway to be safe).

Oh, you remove the motherboard entirely to do the repaste? I guess I can see how that becomes needed. I wonder if the motherboard being outside the chassis causes bending force on it when the heatsink gets pried off. While screwed into the case, I assume there may be less flex on the components when prying off the heatspreader.
My other thought is there may be some solder points that have come loose, hence why you get this weird intermittent artifacting.

2 Likes

@Eric_S
If you have a full electronics lab, I would suggest that an ultrasonic bath for the mainboard might help.

For any others, I would be interested to identify all the chips on the top and bottom of the board, with the heat sink removed. I am looking for the BIOS flash chip and the EC controller flash chip.

1 Like