[SOLVED] Every attempt to install a Linux distro eventually fails

I am trying to get my Framework running again after my previous Debian installation broke down.

However, I have already tried multiple Debian builds (see my stackexchange post for further information). What they all have in common is that the graphical installer fails or gets stuck at some point (mostly in “Select and Install Software”), and the log on console 4 shows different issues on every try.

I wanted to see whether another OS could be installed, so I tried the officially supported ones. I will add more details for others when I am able to try them. Maybe this has happened to someone else, or somebody has an idea.

A) First up was Kubuntu 22.04, which either didn’t show anything after startup besides the desktop or got stuck after I started the graphical install process. The log console said nothing but get_swap: bad swap ...


Another approach was to choose “Try Kubuntu” first; after startup, the KDE Crash Handler said something like

Executable: drkonqi PID:4518 Signal: Segmentation fault (11) Time:…

I’d like to mention that I have seen a segmentation fault before, when trying to install desktop environment packages with a Debian Bookworm Alpha non-free installer (see stackexchange).

B) In Ubuntu 22.04 I got stuck right at the language settings when going straight for the graphical installation. Eventually the screen went black and showed four lines like:

... pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid

Running the installation through the Try Ubuntu Desktop I got greeted by

Sorry, Ubuntu 22.04 has experienced an internal error.

I don’t think it’s necessary to type the error details here, but I took a photo.



The installation moved on from there and greeted me with an error about a non-matching source copy on the CD/DVD, where neither Retry nor Skip did anything.

C) I tried Fedora and it got stuck as well, even when running it as a live disk and trying to install smartmontools on the storage.

D) Debian live worked for a while today, but eventually switched to a black log screen.


But I managed to run smartctl and memtester, see answer below.

It looks like an underlying hardware issue to me, with the most probable culprit being the storage.
In second position there would be the memory, and in third the CPU/Motherboard, but if the issue came from those I’d suppose everything would often freeze/crash instead of displaying errors.

So I managed to run smartctl on the storage and memtester on the RAM, which might be useful. As far as I understand, the storage had a few unexpected shutdowns but nothing else serious; instead, there seems to be an issue with my RAM?

Smartctl

user@debian:~$ sudo smartctl -a /dev/nvme0n1 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-18-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN850 1TB
Serial Number:                      2140K0442002
Firmware Version:                   614600WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8224
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a49e41554
Local Time is:                      Wed Oct 19 09:29:46 2022 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W    9.00W       -    0  0  0  0        0       0
 1 +     4.10W    4.10W       -    0  0  0  0        0       0
 2 +     3.50W    3.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000   10000
 4 -   0.0050W       -        -    4  4  4  4     3900   45700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    52,131,485 [26.6 TB]
Data Units Written:                 19,234,590 [9.84 TB]
Host Read Commands:                 397,288,347
Host Write Commands:                682,977,048
Controller Busy Time:               1,215
Power Cycles:                       982
Power On Hours:                     770
Unsafe Shutdowns:                   116
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    5

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
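
For anyone reading a similar report, here is a minimal sketch of how the health counters above could be checked mechanically. It assumes the plain-text layout smartctl printed here; the function names `parse_fields` and `health_flags` and the rules for what gets flagged are my own illustration, not part of smartmontools.

```python
import re

# A subset of the "SMART/Health Information" lines quoted above.
SAMPLE = """\
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Unsafe Shutdowns:                   116
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Critical Comp. Temperature Time:    5
"""


def parse_fields(text):
    """Turn 'Name:   value' lines into a dict of name -> int."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Accept a hex value (0x00) or a decimal with optional thousands commas.
        match = re.match(r"\s*(0x[0-9a-fA-F]+|[\d,]+)", value)
        if not match:
            continue
        token = match.group(1)
        if token.lower().startswith("0x"):
            fields[name.strip()] = int(token, 16)
        else:
            fields[name.strip()] = int(token.replace(",", ""))
    return fields


def health_flags(fields):
    """Return human-readable notes for counters that look suspicious.

    The rules below are illustrative assumptions, not official limits.
    """
    flags = []
    if fields.get("Critical Warning", 0) != 0:
        flags.append("critical warning bits set")
    if fields.get("Media and Data Integrity Errors", 0) > 0:
        flags.append("media/data integrity errors")
    if fields.get("Error Information Log Entries", 0) > 0:
        flags.append("error log entries present")
    spare = fields.get("Available Spare", 100)
    if spare <= fields.get("Available Spare Threshold", 0):
        flags.append("available spare at or below threshold")
    if fields.get("Critical Comp. Temperature Time", 0) > 0:
        flags.append("time spent above critical temperature")
    return flags


print(health_flags(parse_fields(SAMPLE)))
```

With the values from this drive, the only note is the non-zero critical temperature time; the warning, media error and error log counters are all zero, which fits the impression that the storage itself reports nothing serious.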

memtester on 10GB

user@debian:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       964Mi        12Gi       535Mi       2.1Gi        13Gi
Swap:             0B          0B          0B
user@debian:~$ sudo memtester 10000 2
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 10000MB (10485760000 bytes)
got  10000MB (10485760000 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x858e0a48.
Skipping to next test...
  Random Value        : /^C
[...]
FAILURE: 0xdded12461d9e4bc7 != 0xdded12461d9e4b7c at offset 0x614bc498.
FAILURE: 0xefe74a9371b37c36 != 0xefe74a9371b37c37 at offset 0x614bc508.
FAILURE: 0x6e57e9267fffe1ce != 0x6e57e9267fffe1ab at offset 0x614bc518.
FAILURE: 0xfffd2d5fc157debb != 0xfffd2d5fc157de10 at offset 0x614bc588.
FAILURE: 0x5e7e647b7ffd9948 != 0x5e7e647b7ffd99af at offset 0x614bc598.
FAILURE: 0xfa17b185f7bf464b != 0xfa17b185f7bf468b at offset 0x614bc608.
FAILURE: 0xbf6ff501ffb6347b != 0xbf6ff501ffb634e4 at offset 0x614bc618.
FAILURE: 0x5fbb59e57ffbe326 != 0x5fbb59e57ffbe359 at offset 0x614bc688.
[...]
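
To see what kind of corruption these lines describe, the two values in each FAILURE line can be XORed to show exactly which bits differ. This is only a minimal sketch over the eight lines quoted above; the regular expression and the helper name `flipped_bits` are my own, not something memtester provides.

```python
import re

# The eight FAILURE lines quoted above, copied verbatim.
FAILURES = """\
FAILURE: 0xdded12461d9e4bc7 != 0xdded12461d9e4b7c at offset 0x614bc498.
FAILURE: 0xefe74a9371b37c36 != 0xefe74a9371b37c37 at offset 0x614bc508.
FAILURE: 0x6e57e9267fffe1ce != 0x6e57e9267fffe1ab at offset 0x614bc518.
FAILURE: 0xfffd2d5fc157debb != 0xfffd2d5fc157de10 at offset 0x614bc588.
FAILURE: 0x5e7e647b7ffd9948 != 0x5e7e647b7ffd99af at offset 0x614bc598.
FAILURE: 0xfa17b185f7bf464b != 0xfa17b185f7bf468b at offset 0x614bc608.
FAILURE: 0xbf6ff501ffb6347b != 0xbf6ff501ffb634e4 at offset 0x614bc618.
FAILURE: 0x5fbb59e57ffbe326 != 0x5fbb59e57ffbe359 at offset 0x614bc688.
"""

PATTERN = re.compile(
    r"FAILURE: (0x[0-9a-f]+) != (0x[0-9a-f]+) at offset (0x[0-9a-f]+)"
)


def flipped_bits(text):
    """Return (offset, xor_mask) for every FAILURE line in text.

    A set bit in xor_mask marks a bit position where the two
    64-bit values disagree.
    """
    results = []
    for first, second, offset in PATTERN.findall(text):
        mask = int(first, 16) ^ int(second, 16)
        results.append((int(offset, 16), mask))
    return results


for offset, mask in flipped_bits(FAILURES):
    print(f"offset {offset:#010x}: differing bits {mask:#018x}")
```

For these eight lines every mask is below 0x100, meaning the two values always agree except in their lowest byte. That is only an observation about this excerpt, not a diagnosis of which component is at fault.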

memtester on 500MB

user@debian:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       964Mi        12Gi       535Mi       2.1Gi        13Gi
Swap:             0B          0B          0B
user@debian:~$ sudo memtester 500 2
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 500MB (524288000 bytes)
got  500MB (524288000 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking zeros       : ok
  8-bit Writes        : ok
  16-Bit-Writes       : ok
  
Loop 2/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking zeros       : ok
  8-bit Writes        : ok
  16-Bit-Writes       : ok
  
Done.
user@debian:~$ echo $?
0

I should mention, though, that on a different try memtester with 500MB, and even with smaller sizes down to just above 50MB, yielded the same failures as the 10GB run.

If you have two sticks of RAM, try removing one and retesting. You can try to reseat the problematic stick to see if that’s it, but if it’s a bad stick it just needs to be replaced.

So the failure indicates that it actually is a bad stick?

I will check whether I have some old ones lying around in my collection box and might test with those first. I guess they are pretty bad performance-wise, but they should do the trick?

Yes, for the purpose of testing, I would recommend:
- Test the stick in another machine to confirm it’s good
- Remove both of your existing sticks, put in the known-good one, retest

I recognize my background is more working with desktops, where swapping RAM is a two-minute ordeal. If other people have better recommendations, you might want to follow theirs first :wink:

Cheers! Currently I only have my Crucial CT16G4SFRA32A available. But I have collected a few sticks from older notebooks that I have never used or checked. I don’t know if that will work, but I will give them a look.

Hello again. While I am investigating a possible firmware update for my storage (advised by Framework support) and another memory stick is on the way, I tried Memtest86 as well and got failures on every test run. See pictures for reference.

I guess those confirm the earlier tests with memtester.



Yuck! Good luck!

Live distros should not touch the disc, except maybe for swap … I really don’t think that’s your issue

The first thing I’d do is remove the RAM stick(s) and reseat them a few times. Many times, the pins scraping against the socket clean off a small bit of dirt or dust, and that solves the issue.

If this is not the case, based on what you posted, I’m 90% certain that it’s either a faulty memory DIMM, a bad motherboard trace or solder joint, or a faulty CPU.

Until you get a full pass running Memtest86, it would be very unwise to attempt firmware upgrades of anything.

Reliable RAM is the foundation of everything else in the system.

Thanks everybody. I suppose you are right. After the Western Digital “Diagnosis Tool” wasn’t able to find a firmware update for my SSD and I showed a video of how the live distros crashed, support decided to send me a new RAM module (as this one was included in my DIY order).

I’ll give an update on whether that helps.

Hi @Pratched, my name is Matt Hartley and I am the (brand new) Linux Support Lead for Framework. Got this post on my radar now. Please keep us posted so we can help.

Sorry for the late response. The issue was the faulty RAM. After a couple of extensive SSD tests, I was able to convince the support team that it is in fact the RAM. They sent me a new one and everything works fine.

If it’s possible, somebody might close this topic, as I can’t figure out how to do that on my own.