Wow, that’s certainly an option. I mean putting test points on every pin, so I can actually access and test them without having to solder to the 0.5mm-pitch pins on the MXM connector. The test card isn’t a bad idea, but considering how much trouble getting just one PCIe connector working has been, I admit I’m not enthusiastic about trying to integrate with Oculink. I know it should just work, and it’s a better-understood standard, but adding more complexity doesn’t feel like the right move here.
Unfortunately I’ve already had it on v3.07 and I’ve tried both. I’m certainly interested in what’s going on here, though.
Oh, that was my own ectool here:
But you can do the same with:
sudo framework_tool --expansion-bay
Expansion Bay
Enabled: false
No fault: true
Door closed: true
Board: UmaFans <- This should say "Dual Interposer"
Serial Number: UMA FAN
[ERROR] Response(Unavailable)
The Rust source maps the board ID values to a board type like this:
match (self.board_id_0, self.board_id_1) {
    (BOARD_VERSION_12, BOARD_VERSION_12) => Ok(ExpansionBayBoard::DualInterposer),
    (BOARD_VERSION_13, BOARD_VERSION_15) => Ok(ExpansionBayBoard::UmaFans),
    (BOARD_VERSION_11, BOARD_VERSION_15) => Ok(ExpansionBayBoard::SingleInterposer),
    (BOARD_VERSION_15, BOARD_VERSION_15) => Err(ExpansionBayIssue::NoModule),
    // Invalid board IDs. Something wrong, could be interposer not connected
    _ => Err(ExpansionBayIssue::BadConnection(
        self.board_id_0,
        self.board_id_1,
    )),
}
The board ID is set by resistors you need to place on your board, with BOARD_VERSION_15 meaning “not connected” or “no resistor present”.
Checking the board IDs is useful because they tell you whether the interposer is connecting to your PCB correctly.
Upgraded back to v4.03 to check, and the board is indeed detected by the EC as always, but of course the connected GPU is never detected. I compiled your ectool version and also ran framework_tool:
interfaces:0xffffffff
State: 0x00000005:
Module: Present
Fault: None
Hatch: Closed
Board_ID_0: 12 (0x0000000c)
Board_ID_1: 12 (0x0000000c)
And
Expansion Bay
Enabled: true
No fault: true
Door closed: true
Board: DualInterposer
Serial Number: FRAOCULINKTERRAILS
Config: Pcie8x1
Vendor: PcieAccessory
Expansion Bay EEPROM
Valid: true
HW Version: 8.0
These firmware issues are driving me nuts.
EDIT: Also did an ectool reboot using ECTool.efi to completely restart the EC but it of course didn’t change a thing.
Reading the register at 0xfed815a0 from Linux userspace:
See the example program below:
// mmio_dev_mem.c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s <phys_addr_hex> <offset_hex> <length>\n", argv[0]);
        return 2;
    }
    unsigned long phys = strtoul(argv[1], NULL, 0);
    unsigned long offset = strtoul(argv[2], NULL, 0);
    size_t length = (size_t)strtoul(argv[3], NULL, 0);
    // The 32-bit register must lie inside the mapped window
    if (offset + sizeof(uint32_t) > length) {
        fprintf(stderr, "offset 0x%lx + 4 does not fit in length 0x%zx\n", offset, length);
        return 2;
    }
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        fprintf(stderr, "open /dev/mem failed: %s\n", strerror(errno));
        return 1;
    }
    // mmap() needs a page-aligned offset, so map from the start of the page
    unsigned long page_size = (unsigned long)sysconf(_SC_PAGESIZE);
    unsigned long page_base = phys & ~(page_size - 1);
    unsigned long page_offset = phys - page_base;
    size_t map_len = page_offset + length;
    void *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)page_base);
    if (map == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    volatile uint32_t *reg = (volatile uint32_t *)((char *)map + page_offset + offset);
    // Read 32-bit register (volatile forces a real load from the device)
    uint32_t val = *reg;
    __sync_synchronize(); // full memory barrier (compiler and CPU)
    printf("Read 0x%08x from phys 0x%lx + 0x%lx\n", val, phys, offset);
    // Write 32-bit register (example)
    //uint32_t newval = 0xA5A5A5A5;
    //*reg = newval;
    //__sync_synchronize();
    // printf("Wrote 0x%08x\n", newval);
    if (munmap(map, map_len) != 0) {
        fprintf(stderr, "munmap failed: %s\n", strerror(errno));
    }
    close(fd);
    return 0;
}
Compile with:
gcc -O2 -Wall -o mmio_dev_mem mmio_dev_mem.c
Read the MMIO value at 0xfed815a0:
sudo ./mmio_dev_mem 0xFED81000 0x05a0 0x1000
Read 0x00e50000 from phys 0xfed81000 + 0x5a0
Note: you can also write values; see the commented-out write section in the C code.
It uses /dev/mem, which some kernel configurations restrict (for example CONFIG_STRICT_DEVMEM / CONFIG_IO_STRICT_DEVMEM, or kernel lockdown under Secure Boot), so you may need to relax that before the program works.
I think it will be difficult to program unless we know what each bit does. My guess is that some bits configure the GPIO as input/output/tri-state, whether it triggers an interrupt, and whether that interrupt is level- or edge-triggered, with only 1 of the 32 bits actually setting the pin high or low.
I get “0x00e50000” with no GPU there.
Kieran shows it with “0x00a40000”.
So maybe someone can put a scope on the PCIe reset pin and see whether it changes state when 0x00e50000 or 0x00a40000 is written to that register.
I feel like we should create a separate thread for this issue to stop filling up this project’s own thread. I created one here: Issues with getting a PCIe device detected using v4 BIOS on the 7940HS, with another update.
OK, I thought the problems were with getting it to see the MXM GPU.
I did not think we were discussing Oculink ports.