Framework 16 to MXM Gpu - V0.1 Prototype design

Wow, that’s certainly an option: putting test points on every pin so I can actually access and test them without having to solder to the 0.5mm pins on the MXM connector. The test card isn’t a bad idea, but considering how much trouble getting just one PCIe connector working has been, I admit I’m not enthusiastic about trying to integrate OCuLink. I know it should just work, and it’s a better-understood standard, but adding more complexity doesn’t feel like the right move here.

Unfortunately I’ve already had it on v3.07 and I’ve tried both. I’m certainly interested in what’s going on here though.

Oh, that was my own ectool here:

But you can do the same with:

sudo framework_tool --expansion-bay
Expansion Bay
  Enabled:       false
  No fault:      true
  Door closed:   true
  Board:         UmaFans     <- This should say "Dual Interposer"
  Serial Number: UMA FAN
[ERROR] Response(Unavailable)

The rust source code has this depending on the board id values:

        match (self.board_id_0, self.board_id_1) {
            (BOARD_VERSION_12, BOARD_VERSION_12) => Ok(ExpansionBayBoard::DualInterposer),
            (BOARD_VERSION_13, BOARD_VERSION_15) => Ok(ExpansionBayBoard::UmaFans),
            (BOARD_VERSION_11, BOARD_VERSION_15) => Ok(ExpansionBayBoard::SingleInterposer),
            (BOARD_VERSION_15, BOARD_VERSION_15) => Err(ExpansionBayIssue::NoModule),
            // Invalid board IDs. Something wrong, could be interposer not connected
            _ => Err(ExpansionBayIssue::BadConnection(
                self.board_id_0,
                self.board_id_1,
            )),
        }

The board ID depends on certain resistors you should have placed on the board, with BOARD_VERSION_15 meaning “not connected” or “no resistor present”.
It is useful to check the board IDs, because they tell you whether the interposer is connecting to your PCB correctly.
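
To make the lookup concrete, here is a minimal C sketch of the same mapping as the Rust match above. It assumes each BOARD_VERSION_n constant simply equals n (an assumption, check the framework_tool source), and the names bay_board_t, classify_bay_board and bay_board_name are made up for this illustration.

```c
#include <stdint.h>

/* Assumption: BOARD_VERSION_n == n. Verify against the framework_tool source. */
enum {
    BOARD_VERSION_11 = 11,
    BOARD_VERSION_12 = 12,
    BOARD_VERSION_13 = 13,
    BOARD_VERSION_15 = 15  /* no resistor present / not connected */
};

/* Hypothetical result type mirroring the Rust enum variants. */
typedef enum {
    BAY_DUAL_INTERPOSER,
    BAY_UMA_FANS,
    BAY_SINGLE_INTERPOSER,
    BAY_NO_MODULE,
    BAY_BAD_CONNECTION
} bay_board_t;

/* Classify the expansion bay from the two board ID readings. */
bay_board_t classify_bay_board(uint8_t id0, uint8_t id1) {
    if (id0 == BOARD_VERSION_12 && id1 == BOARD_VERSION_12) return BAY_DUAL_INTERPOSER;
    if (id0 == BOARD_VERSION_13 && id1 == BOARD_VERSION_15) return BAY_UMA_FANS;
    if (id0 == BOARD_VERSION_11 && id1 == BOARD_VERSION_15) return BAY_SINGLE_INTERPOSER;
    if (id0 == BOARD_VERSION_15 && id1 == BOARD_VERSION_15) return BAY_NO_MODULE;
    /* Any other pair: invalid IDs, possibly the interposer is not seated. */
    return BAY_BAD_CONNECTION;
}

/* Human-readable name for a classification result. */
const char *bay_board_name(bay_board_t board) {
    switch (board) {
    case BAY_DUAL_INTERPOSER:   return "DualInterposer";
    case BAY_UMA_FANS:          return "UmaFans";
    case BAY_SINGLE_INTERPOSER: return "SingleInterposer";
    case BAY_NO_MODULE:         return "NoModule";
    default:                    return "BadConnection";
    }
}
```

Under that assumption, a reading of (12, 12) classifies as DualInterposer and (13, 15) as UmaFans, so seeing UmaFans with your board fitted would suggest the ID resistors are not being read as expected.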

Upgraded back to v4.03 to check, and it is indeed detected by the EC like always, but of course the connected GPU is never detected. I compiled your ectool version and also ran framework_tool:

interfaces:0xffffffff
State:      0x00000005:
Module:     Present
Fault:      None
Hatch:      Closed
Board_ID_0: 12 (0x0000000c)
Board_ID_1: 12 (0x0000000c)

And

Expansion Bay
  Enabled:       true
  No fault:      true
  Door closed:   true
  Board:         DualInterposer
  Serial Number: FRAOCULINKTERRAILS
  Config:        Pcie8x1
  Vendor:        PcieAccessory
  Expansion Bay EEPROM
    Valid:       true
    HW Version:  8.0

These firmware issues are driving me nuts.

EDIT: Also did an ectool reboot using ECTool.efi to completely restart the EC but it of course didn’t change a thing.

Reading the register at 0xfed815a0 from Linux userspace:

See example program below:

// mmio_dev_mem.c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s <phys_addr_hex> <offset_hex> <length>\n", argv[0]);
        return 2;
    }

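    unsigned long phys = strtoul(argv[1], NULL, 0);
    unsigned long offset = strtoul(argv[2], NULL, 0);
    size_t length = (size_t)strtoul(argv[3], NULL, 0);

    // The 32-bit access must be 4-byte aligned and lie inside the mapped window
    if (((phys + offset) & 3) != 0 || length < sizeof(uint32_t) ||
        offset > length - sizeof(uint32_t)) {
        fprintf(stderr, "phys + offset must be 4-byte aligned and within <length>\n");
        return 2;
    }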

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        fprintf(stderr, "open /dev/mem failed: %s\n", strerror(errno));
        return 1;
    }

    unsigned long page_size = sysconf(_SC_PAGESIZE);
    unsigned long page_base = phys & ~(page_size - 1);
    unsigned long page_offset = phys - page_base;
    size_t map_len = page_offset + length;

    void *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_base);
    if (map == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    volatile uint32_t *reg = (volatile uint32_t *)((char *)map + page_offset + offset);

    // Read 32-bit register
    uint32_t val = *reg;
    __sync_synchronize(); // full memory barrier (GCC builtin)
    printf("Read 0x%08x from phys 0x%lx + 0x%lx\n", val, phys, offset);

    // Write 32-bit register (example)
    //uint32_t newval = 0xA5A5A5A5;
    //*reg = newval;
    //__sync_synchronize();
    // printf("Wrote 0x%08x\n", newval);

    if (munmap(map, map_len) != 0) {
        fprintf(stderr, "munmap failed: %s\n", strerror(errno));
    }
    close(fd);
    return 0;
}

Compile with:
gcc -O2 -Wall -o mmio_dev_mem mmio_dev_mem.c

Read the MMIO value at 0xfed815a0:

sudo ./mmio_dev_mem 0xFED81000 0x05a0 0x1000
Read 0x00e50000 from phys 0xfed81000 + 0x5a0

Note: You can also write values; see the commented-out write section in the C code.
The program uses “/dev/mem”. Some kernel / Linux configurations block it (for example, kernel lockdown when Secure Boot is enabled), so you might need to unblock it before the program works.

I think it will be difficult to program unless we know what each bit does. My guess is that some bits set the GPIO direction (in/out/tristate), some control whether it triggers an interrupt and whether that interrupt is level or edge triggered, and so on, with only 1 of the 32 bits being the one that actually sets the pin high or low.


I get “0x00e50000” with no GPU there.
Kieran shows it with “0x00a40000”.

So, maybe someone can put a scope on the PCIe reset pin and see if it changes state when 0x00e50000 or 0x00a40000 is written to that register.
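
To narrow down which bits are worth watching on the scope, the two readings can be XORed to see where they differ. This is only a sketch: the helper names reg_diff_mask and print_reg_diff are made up here, and it says nothing about what each bit means in hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns a mask with a 1 in every bit position where a and b differ. */
uint32_t reg_diff_mask(uint32_t a, uint32_t b) {
    return a ^ b;
}

/* Prints every differing bit with its value in a and in b,
 * highest bit first, and returns how many bits differ. */
int print_reg_diff(uint32_t a, uint32_t b) {
    uint32_t mask = reg_diff_mask(a, b);
    int count = 0;

    for (int bit = 31; bit >= 0; bit--) {
        if (mask & (UINT32_C(1) << bit)) {
            printf("bit %2d: %u -> %u\n", bit,
                   (unsigned)((a >> bit) & 1u),
                   (unsigned)((b >> bit) & 1u));
            count++;
        }
    }
    return count;
}
```

Calling print_reg_diff(0x00e50000, 0x00a40000) reports two differing bits, 22 and 16, both set in 0x00e50000 and clear in 0x00a40000. Which of them (if either) drives the pin is still a guess without the register documentation.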


I feel like we should create a separate thread for this issue to stop filling up this project’s own thread. I created one here, “Issues with getting a PCIe device detected using v4 BIOS on the 7940HS”, with another update.

Ok, I thought the problems were with getting it to see the MXM GPU.
I did not think we were discussing OCuLink ports.