Framework 16 to MXM Gpu - V0.1 Prototype design

That’s quite baffling actually. Is this EEPROM related though? Feel free to correct my understanding but is what’s happening:

Framework boots on > Reads EEPROM > Thinks it’s an ssd, (or defaults to it?) > Starts publishing PEX clock on SSD?

I just don’t understand why it wouldn’t default to… like… the pin labeled PEX_RST?

Yeah, I’m unsure either actually. It should be confirmed after I move that resistor with the general PCIe device EEPROM config from framework. If I find the time I might do it in a few hours, and if the board still works, it might be that my initial EEPROM config was just wrong. I’ll update you on that, but I recommend doing the same thing as me and just populating a 0 Ohm resistor between PEX_RST and your device’s RST pin, but also having pads available between SSD1_RST and the device’s RST pin, just in case.

1 Like

I’d certainly be interested to try, but I’d love to be able to check properly before sending off another board. Please do update me - this is a really interesting problem!

According to a post from FW. The EEPROM is read by the EC. If then uses the settings from the EEPROM to configure the Expansion Bay.
One setting says to use the SSD1_RST, another setting says to use the PEX_RST, another setting is for the amount of lanes.
That is how the EEPROM controls how the PCIe interface on the expansion bay works.

Yup, that’s how it should be, so I’m leaning towards my config just being wrong. Although I might need that initial behaviour, since I do want my oculink board to be able to do 4x2 bifurcation using an oculink Y cable by just flashing different data onto the EEPROM. For regular x8 use case, this is odd behaviour tho and thats what caused issues for me too.

Looks like that even with the configuration provided by framework, the issue does not go away. I moved my 0 Ohm resistor back to connect PEX_RST to RST pin on the oculink connector and it would not work no matter what. It starts working immediately after going back to SSD1_RST. Will need to clear this up with framework since it really is such odd behavior.

1 Like

Hmm. @nrp or @catastrophic - any way you could clear this up?

To add to this, as I have already mentioned it while replying to Kieran in the OCuLink thread.

I am not using the EC firmware included with v4.02 BIOS, as I have a patch that applies higher SoC power limits using a 98W charger instead of 100W due to my Caldigit TS4 dock only exposing itself as a 4.9A 20V charger and this then results in the CPU not wanting to go full performance. So I am on the BIOS version right before v4.02 with the EC firmware that is currently in the GitHub repo.

I don’t know if there might have been changes that could affect this between the current repo state and the EC firmware bundled with v4.02, but I did want to mention it if it is relevant.

V0.3 boards have just showed up. I admit I’m not super hopeful, but I guess they’ve got slightly shorter traces? Will update after assembling them.

13 Likes

I’m sorry, that new board is downright adorable.

3 Likes

Hmm. Same old same old assembly issues. I was hoping new solder paste, (chipquik SMDLTLFP10T5), might help with some of the solder spreading, but it doesn’t seem to have made much of an improvement. To people with experience soldering pitches this small, (0.5mm): advice?

Photos below show the connector removed because of bridges and lots of tiny solder balls failing to adhere to the pads.

1 Like

Toaster oven reflow can be effective: The Toaster Oven | EZ Make Oven | Adafruit Learning System

1 Like

Hi,

The temperatures are not getting high enough for the solder. Specifically, the temperature of the pcb pads and the connector pads is not high enough.
The SMDLTLFP10T5 has a datasheet with a reflow profile. Are you following that?
I guess you are up against the plastic connector, with its warping when heated too much.
Do you have the datasheet for the connector? One can then match a solder paste to the connector datasheet.

You could try to use solder paste that melts at lower temperature than the SMDLTLFP10T5. There are trade offs thought, as lower temperature solder can be less reliable.
You can also manufacture PCBs with solder resist applied between the pads.
For these difficult solder tasks, you really need a reflow oven, so that it warms the whole thing to just below the point at which the connector will be damaged, and then peaks for a very short time, just enough to melt the solder, but not long enough to damage the connector.

Looking at some datasheets:
The Solder needs to get to 165 C for about 30 seconds for a good solder.
There are MXM connectors available that can take 280 C, so you should be able to find a solution that works.
E.g. Connector: https://www.digikey.com/en/products/detail/aces-connectors/52741-3140A-002/16672070

1 Like

That was the spec that I followed, (heating to 165 degrees), but I’m concerned that the hotplate might not be as well calibrated as it claims. Finding lower temperature solder paste is certainly a possibility, but I think I’d be getting nervous about its potential to loose structural integrity at high temperatures and with the spring load of holding in the mxm card. I’ll try again with a higher temperature, and if that doesn’t work go for the reflow oven.

Slightly changing the topic - has there been any update on why Filip’s solution with SSD_RST worked? This feels like it might be the crux of some of the issues we’re experiencing.

It might be worth getting a thermal camera or borrowing one, to check whether it is getting hot enough or not.

You will need to turn the hotplate up to more than 165 to get the solder to flow properly, probably nearer to 200 or maybe even 250. Don’t forget that the PCB will be radiating heat away from the top surface, so to get the solder pads to a guaranteed solder flow temperature the hot plate will need to be at an elevated temperature.

1 Like

Hey again.

Just tried the new fixed board, (after much faff fixing the mxm connector), and wasn’t met with much success. I’m starting to learn more about the MXM standard though and thought some of it might be useful to be more widely known.

The MXM standard is famously complex and abused by companies who lock down their own specifications, wrecking compatibility with other devices.

This means a few things.

  • Some * MXM GPUs rely on their VBIOS being provided by the motherboard itself. I think this may be the issue with the one I’m using at the moment, (5450 MXM GPU HP 520)

  • It’s often not enough to just have a basic PCIE link with the cards - many need to know that they’re installed with the PRSNT, (present within the machine), pins, which I haven’t hooked up to anything on v3. It should be noted though that in * ALittleWifi’s MXM immobiliser this isn’t connected, so maybe this should be taken with a grain of salt.

  • Some MXM GPUs are apparently just grouchy when installed into non OEM systems, as I fear might be the case with the one I’m testing with at the moment.

All in all a bit of an irritating affair. I’m going to try with the original nvidia card I was testing with before, but I may not have access to it until the 4th.

Very open to ideas and suggestions.

If it doesn’t work with the nvidia card I’ll invest in a V4 version with Filip’s SSD_RST fix just to try something a bit new. Maybe one last call to @catastrophic and @nrp for some conclusion on why the SSD_RST bug is happening?

Thanks everyone for your patience. We’ll get there soon.

Joe

3 Likes

Just a thought.
Maybe aim a bit lower.
What about using this MXM board:

You then just plug a M.2 NVME SSD into it, and see if it is alive.
It won’t need any VBIOS, but will at least be able to test that the PCIe lanes are working and a few other things, just enough to get the SSD working.

Looking through the EC source code, it is not clear if the EC controls the SSD_RST and PEX_RST or not.
It also appears the MXM is not very standard, with different amount of pin counts for different versions.

I’m not sure that’s the problem though. The nvme drive would only check for the PCIE lanes working, and I’m not sure that’s the issue any more. Instead, I think it’s all the auxiliary stuff that mxm GPUs specifically require that’s causing problems.

Edit:

My bad, read too quickly. You mean use the adapter board to confirm that the PCIE lanes are definitely working?

Yes, to check the PCIe lanes are OK, and the various reset/present pins etc. are working ok.
If you can get that as far as “lspci” working to an m.2 SSD on the MXM card, then it should at least get lspci working with the GPU card.
That level has to be working, before VBIOS or anything else GPU specific.

It does make a pretty big assumption that the MXM pinouts match though.
There are so many MXM pinouts.
MXM-I
MXM-II
MXM-III
MXM-III(HE)
MXM-IV
MXM-A
MXM-B
MXM-B+