During rsync: "BUG: Bad page state in process"

I’m attempting to back up my laptop over wifi to a local network drive with rsync.

After a while, I begin to see these messages in the syslog. journald tries to write them so fast that it pins all the cores and renders the laptop unresponsive, and I have to hold the power button down to force a shutdown. (I can’t even switch to a new text console, move the mouse, etc.)

Has anyone else run into this? I’m just about to try a system update and then do the backup over ethernet (via a USB dongle), but in the meantime I thought I’d ask here! I’ve never run into this before, and I’m not sure whether it has anything to do with the SSD/hardware or is just a bug in rsync or some other software…?

Linux lake 5.14.7-arch1-1 #1 SMP PREEMPT Wed, 22 Sep 2021 21:35:11 +0000 x86_64 GNU/Linux

Oct 01 12:33:53 lake kernel: page:000000004d49cdbd refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x4f1f40
Oct 01 12:33:53 lake kernel: flags: 0x2ffff0000020000(mappedtodisk|node=0|zone=2|lastcpupid=0xffff)
Oct 01 12:33:53 lake kernel: raw: 02ffff0000020000 dead000000000100 dead000000000122 0000000000000000
Oct 01 12:33:53 lake kernel: raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
Oct 01 12:33:53 lake kernel: page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
Oct 01 12:33:53 lake kernel: Modules linked in: 8021q garp mrp stp llc ccm snd_hda_codec_hdmi hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid mousedev btusb hid_multitouch btrtl btbcm btintel iTCO_wdt intel_pmc_bxt joydev mei_hdcp bluetooth mei_wdt iTCO_vendor_support ecdh_generic ecc intel_pmt_telemetry intel_pmt_class intel_rapl_msr usbhid wmi_bmof intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel soundwire_generic_allocation soundwire_cadence kvm snd_sof_intel_hda iwlmvm snd_sof_pci snd_sof_xtensa_dsp irqbypass crct10dif_pclmul snd_sof crc32_pclmul ghash_clmulni_intel snd_soc_hdac_hda snd_hda_ext_core aesni_intel mac80211 snd_soc_acpi_intel_match crypto_simd snd_soc_acpi cryptd soundwire_bus intel_cstate snd_soc_core snd_hda_codec_realtek intel_uncore
Oct 01 12:33:53 lake kernel:  snd_hda_codec_generic snd_compress ac97_bus ledtrig_audio libarc4 snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi pcspkr iwlwifi snd_hda_codec snd_hda_core intel_spi_pci intel_spi snd_hwdep psmouse snd_pcm spi_nor snd_timer cfg80211 snd i2c_i801 mtd soundcore i2c_smbus rfkill mei_me mei intel_lpss_pci intel_ish_ipc vfat intel_lpss idma64 i915 fat thunderbolt intel_ishtp intel_pmt i2c_algo_bit ttm drm_kms_helper cec processor_thermal_device_pci_legacy processor_thermal_device intel_gtt processor_thermal_rfim agpgart processor_thermal_mbox processor_thermal_rapl syscopyarea sysfillrect intel_rapl_common sysimgblt intel_soc_dts_iosf fb_sys_fops igen6_edac tpm_crb ucsi_acpi typec_ucsi typec roles mac_hid wmi i2c_hid_acpi int3403_thermal tpm_tis i2c_hid int340x_thermal_zone tpm_tis_core video tpm rng_core int3400_thermal acpi_thermal_rel acpi_pad drm fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 i8042
Oct 01 12:33:53 lake kernel:  xhci_pci crc32c_intel xhci_pci_renesas serio
Oct 01 12:33:53 lake kernel: CPU: 7 PID: 1352 Comm: rsync Tainted: G     U            5.14.7-arch1-1 #1 0ba4a27bdcf67c80b7c97fb72a96656aafa14b65
Oct 01 12:33:53 lake kernel: Hardware name: Framework Laptop/FRANBMCP08, BIOS 03.02 07/01/2021
Oct 01 12:33:53 lake kernel: Call Trace:
Oct 01 12:33:53 lake kernel:  dump_stack_lvl+0x46/0x5a
Oct 01 12:33:53 lake kernel:  bad_page.cold+0x63/0x94
Oct 01 12:33:53 lake kernel:  rmqueue_bulk+0x743/0x9d0
Oct 01 12:33:53 lake kernel:  get_page_from_freelist+0x102b/0x14c0
Oct 01 12:33:53 lake kernel:  ? __mod_memcg_lruvec_state+0x1f/0xe0
Oct 01 12:33:53 lake kernel:  ? __mod_lruvec_page_state+0x6f/0xb0
Oct 01 12:33:53 lake kernel:  __alloc_pages+0xee/0x230
Oct 01 12:33:53 lake kernel:  page_cache_ra_unbounded+0x112/0x210
Oct 01 12:33:53 lake kernel:  filemap_get_pages+0x250/0x600
Oct 01 12:33:53 lake kernel:  filemap_read+0xb9/0x350
Oct 01 12:33:53 lake kernel:  new_sync_read+0x14f/0x1e0
Oct 01 12:33:53 lake kernel:  vfs_read+0xf3/0x180
Oct 01 12:33:53 lake kernel:  ksys_read+0x67/0xe0
Oct 01 12:33:53 lake kernel:  do_syscall_64+0x59/0x80
Oct 01 12:33:53 lake kernel:  ? sched_clock_cpu+0x9/0xb0
Oct 01 12:33:53 lake kernel:  ? irqtime_account_irq+0x38/0xb0
Oct 01 12:33:53 lake kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Oct 01 12:33:53 lake kernel: RIP: 0033:0x7f8c6ffa8862
Oct 01 12:33:53 lake kernel: Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a 29 0a 00 e8 55 e4 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
Oct 01 12:33:53 lake kernel: RSP: 002b:00007ffca0e55d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Oct 01 12:33:53 lake kernel: RAX: ffffffffffffffda RBX: 000056354f8bac70 RCX: 00007f8c6ffa8862
Oct 01 12:33:53 lake kernel: RDX: 0000000000040000 RSI: 000056354c68c9b0 RDI: 0000000000000003
Oct 01 12:33:53 lake kernel: RBP: 0000000000040000 R08: 0000000093ac0000 R09: 0000000000000000
Oct 01 12:33:53 lake kernel: R10: 0000000000000080 R11: 0000000000000246 R12: 0000000000000000
Oct 01 12:33:53 lake kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000040000
Oct 01 12:33:53 lake kernel: Disabling lock debugging due to kernel taint
Oct 01 12:33:53 lake kernel: BUG: Bad page state in process jbd2/nvme0n1p3-  pfn:4f1f81
Oct 01 12:33:53 lake kernel: page:000000003009ad0a refcount:0 mapcount:0 mapping:000000003e7e113c index:0x1 pfn:0x4f1f81
Oct 01 12:33:53 lake kernel: failed to read mapping contents, not a valid kernel address?
Oct 01 12:33:53 lake kernel: flags: 0x2ffff0000000000(node=0|zone=2|lastcpupid=0xffff)
Oct 01 12:33:53 lake kernel: raw: 02ffff0000000000 dead000000000100 dead000000000122 0000000000200000
Oct 01 12:33:53 lake kernel: raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
Oct 01 12:33:53 lake kernel: page dumped because: non-NULL mapping
Oct 01 12:33:53 lake kernel: Modules linked in: 8021q garp mrp stp llc ccm snd_hda_codec_hdmi hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid mousedev btusb hid_multitouch btrtl btbcm btintel iTCO_wdt intel_pmc_bxt joydev mei_hdcp bluetooth mei_wdt iTCO_vendor_support ecdh_generic ecc intel_pmt_telemetry intel_pmt_class intel_rapl_msr usbhid wmi_bmof intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel soundwire_generic_allocation soundwire_cadence kvm snd_sof_intel_hda iwlmvm snd_sof_pci snd_sof_xtensa_dsp irqbypass crct10dif_pclmul snd_sof crc32_pclmul ghash_clmulni_intel snd_soc_hdac_hda snd_hda_ext_core aesni_intel mac80211 snd_soc_acpi_intel_match crypto_simd snd_soc_acpi cryptd soundwire_bus intel_cstate snd_soc_core snd_hda_codec_realtek intel_uncore
Oct 01 12:33:53 lake kernel:  snd_hda_codec_generic snd_compress ac97_bus ledtrig_audio libarc4 snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi pcspkr iwlwifi snd_hda_codec snd_hda_core intel_spi_pci intel_spi snd_hwdep psmouse snd_pcm spi_nor snd_timer cfg80211 snd i2c_i801 mtd soundcore i2c_smbus rfkill mei_me mei intel_lpss_pci intel_ish_ipc vfat intel_lpss idma64 i915 fat thunderbolt intel_ishtp intel_pmt i2c_algo_bit ttm drm_kms_helper cec processor_thermal_device_pci_legacy processor_thermal_device intel_gtt processor_thermal_rfim agpgart processor_thermal_mbox processor_thermal_rapl syscopyarea sysfillrect intel_rapl_common sysimgblt intel_soc_dts_iosf fb_sys_fops igen6_edac tpm_crb ucsi_acpi typec_ucsi typec roles mac_hid wmi i2c_hid_acpi int3403_thermal tpm_tis i2c_hid int340x_thermal_zone tpm_tis_core video tpm rng_core int3400_thermal acpi_thermal_rel acpi_pad drm fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 i8042
Oct 01 12:33:53 lake kernel:  xhci_pci crc32c_intel xhci_pci_renesas serio
Oct 01 12:33:53 lake kernel: CPU: 2 PID: 199 Comm: jbd2/nvme0n1p3- Tainted: G    BU            5.14.7-arch1-1 #1 0ba4a27bdcf67c80b7c97fb72a96656aafa14b65
Oct 01 12:33:53 lake kernel: Hardware name: Framework Laptop/FRANBMCP08, BIOS 03.02 07/01/2021
Oct 01 12:33:53 lake kernel: Call Trace:
Oct 01 12:33:53 lake kernel:  dump_stack_lvl+0x46/0x5a
Oct 01 12:33:53 lake kernel:  bad_page.cold+0x63/0x94
Oct 01 12:33:53 lake kernel:  rmqueue_bulk+0x743/0x9d0
Oct 01 12:33:53 lake kernel:  get_page_from_freelist+0x102b/0x14c0
Oct 01 12:33:53 lake kernel:  ? ext4_map_blocks+0x452/0x5c0 [ext4 c174ea75cb8b23a87e474c9c8e461501e0c79067]
Oct 01 12:33:53 lake kernel:  ? jbd2_transaction_committed+0x55/0x60 [jbd2 1ad8c788afba6d52e6761cec9dcf60c2cfea5c64]
Oct 01 12:33:53 lake kernel:  __alloc_pages+0xee/0x230
Oct 01 12:33:53 lake kernel:  pagecache_get_page+0x1c9/0x510
Oct 01 12:33:53 lake kernel:  __getblk_gfp+0xdd/0x270
Oct 01 12:33:53 lake kernel:  jbd2_journal_get_descriptor_buffer+0x5e/0x100 [jbd2 1ad8c788afba6d52e6761cec9dcf60c2cfea5c64]
Oct 01 12:33:53 lake kernel:  jbd2_journal_commit_transaction+0xe17/0x1c70 [jbd2 1ad8c788afba6d52e6761cec9dcf60c2cfea5c64]
Oct 01 12:33:53 lake kernel:  ? cpuacct_charge+0x32/0x80
Oct 01 12:33:53 lake kernel:  kjournald2+0xdc/0x2b0 [jbd2 1ad8c788afba6d52e6761cec9dcf60c2cfea5c64]
Oct 01 12:33:53 lake kernel:  ? do_wait_intr_irq+0xa0/0xa0
Oct 01 12:33:53 lake kernel:  ? load_superblock.part.0+0xb0/0xb0 [jbd2 1ad8c788afba6d52e6761cec9dcf60c2cfea5c64]
Oct 01 12:33:53 lake kernel:  kthread+0x12f/0x160
Oct 01 12:33:53 lake kernel:  ? set_kthread_struct+0x40/0x40
Oct 01 12:33:53 lake kernel:  ret_from_fork+0x1f/0x30
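
For anyone wading through a flood of these reports, here is a minimal sketch (not from the original post) that tallies which processes hit "Bad page state" and which reasons the kernel gave. The function and pattern names are my own, and it assumes the log lines look like the ones pasted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: BUG: Bad page state in process jbd2/nvme0n1p3-  pfn:4f1f81"
BUG_RE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")

# Matches lines such as:
#   "... kernel: page dumped because: non-NULL mapping"
REASON_RE = re.compile(r"page dumped because: (.+)$")


def summarize_bad_pages(log_text):
    """Count bad-page reports per process and per stated reason.

    Returns a (processes, reasons) pair of Counter objects.
    """
    processes = Counter()
    reasons = Counter()
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            processes[bug.group(1)] += 1
            continue
        reason = REASON_RE.search(line)
        if reason:
            reasons[reason.group(1).strip()] += 1
    return processes, reasons
```

One way to feed it would be to save the kernel messages from the crashed boot to a file first (for example with `journalctl -k -b -1 > kernel.log`, assuming the journal from that boot survived) and pass the file contents to `summarize_bad_pages`.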

Update: I updated my system, rebooted, and tried to rsync over wifi again. After a few minutes the same thing happened as described above.

So I rebooted again, left the wifi card disabled completely, and tried the rsync again over ethernet (via a USB dongle). This time I didn’t see journald pegging all the CPUs; it froze like this instead:

I’m posting from the framework laptop now. After a reboot everything seems fine otherwise.

The only other thing I can think to try next is to throttle rsync and see if that helps… it always seems to die during the transfer of a large file, but that might be a coincidence.

A bad memory stick might possibly cause this error; I’d run a memory test to rule that out.

Thanks – I’ve put memtest on a USB stick, but while trying to boot from it the laptop is now hanging in the BIOS.

I was just making sure I have usb booting enabled in the bios, and the UI froze, the laptop started getting hot and the fans were running loudly until I held the power button again to kill it.

I’m letting it cool off… (posting from my thinkpad meanwhile.)

If you haven’t already, I would also make sure you have installed the intel-microcode package or the equivalent for your distribution (on Arch, apparently intel-ucode). Something people don’t always realize is that CPUs are sort of like cookies: they don’t all come out with the same number of chocolate chips and raisins, or the chips end up in different spots. So Intel releases these microcode updates that basically mark “danger, thin ice” or reroute some functions to other parts of a CPU if there were issues in a certain batch. Running without the microcode updates means you can hit that thin ice and have your CPU do weird things to whatever code it is trying to execute.

https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/secure-coding/loading-microcode-os.html

https://wiki.debian.org/Microcode

Please install the amd64-microcode package (for systems with AMD AMD64 processors), or the intel-microcode package (for systems with Intel processors). You will have to enable both contrib and non-free in /etc/apt/sources.list.

https://wiki.archlinux.org/title/microcode
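
To see which microcode revision the kernel reports as loaded, one option is to read the `microcode` field that x86 Linux exposes in `/proc/cpuinfo`. This is a rough sketch, not something taken from the pages linked above; the function name is made up for illustration.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in /proc/cpuinfo text.

    Each logical CPU has its own "microcode : 0x..." line; on a healthy
    system they would normally all report the same revision.
    """
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions
```

On the laptop itself you would pass in the contents of `/proc/cpuinfo`, e.g. `microcode_revisions(open("/proc/cpuinfo").read())`, and compare the result before and after installing the microcode package and rebooting.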

Thanks for the note about quick boot. It was enabled, but I hadn’t heard about that issue yet.

The memtest website only has downloads for UEFI boots, from what I can tell. I still haven’t gotten it to boot, though I haven’t tried again yet this morning.

@Ethan_Spoelstra thanks, yes I installed the microcode package during installation. It’s up to date.

Edit: I just tried again this morning to complete the backup. This time I didn’t see any messages in the syslog at all, but the caps lock key started flashing.

FWIW, this always seems to happen after about 10 minutes; throttling rsync doesn’t seem to make a difference. So far I’ve crashed after about 10 minutes with no throttling and with 1GB, 100MB, 60MB, and 20MB limits. I’m progressing through my updates though, and getting closer to finishing the backup, so if the disk is bad, it seems to be failing at rough intervals.
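
For reference, throttling with rsync is normally done with its `--bwlimit` option, which takes a rate in units of 1024 bytes per second when no suffix is given. The sketch below only builds the command line; the helper name, the per-second MiB interpretation of the limits, and the paths in the usage note are assumptions, not what was actually run in this thread.

```python
def throttled_rsync_cmd(src, dest, limit_mib_per_s):
    """Build an rsync argument list capped at roughly limit_mib_per_s MiB/s.

    rsync's --bwlimit value without a suffix is in KiB/s, so the MiB/s
    figure is converted before being passed on.
    """
    kib_per_s = int(limit_mib_per_s * 1024)
    return ["rsync", "-a", f"--bwlimit={kib_per_s}", src, dest]
```

Something like `subprocess.run(throttled_rsync_cmd("/home/", "/mnt/backup/", 20), check=True)` would then run the transfer with a 20 MiB/s cap (both paths are placeholders).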

Edit again: I’ve got a complete backup now! That’s a huge relief. :slight_smile: I’ll try to get memtest to boot this weekend. Thankfully everything else seems normal!

@Erik_Schoster it sounds like there’s something wrong with your laptop (perhaps memory, perhaps something else); I’d contact Framework support.

Perhaps they might do a swap so you can get a working laptop and they can do a post-mortem on your current laptop and find out what went wrong.

Are you using btrfs by any chance? There’s this bug which I believe has been fixed in 5.14.9.
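
A quick way to check a running kernel against that version is to compare the numeric part of the `uname -r` string. This is only a sketch: the 5.14.9 threshold is the version this post believes carries the fix, the function names are made up, and a distribution kernel may backport a fix earlier than the upstream number suggests.

```python
import re


def kernel_version(release):
    """Parse a release string like '5.14.7-arch1-1' into (5, 14, 7)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '5.15-rc1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_fix(release, fixed_in=(5, 14, 9)):
    """Return True if the release is at or above the assumed fixed version."""
    return kernel_version(release) >= fixed_in
```

On a live system, `platform.release()` from the standard library returns the same string as `uname -r`, so `has_fix(platform.release())` would do the check.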

I’ve never heard of it but thank you so much.

Edit: I’m going to wait for the update and see what happens. (For a day.) Thank you so much to everyone helping in this thread!!!

Edit 2: FWIW, nothing has changed technically for me. I’m still trying to find a solution.

If you have two sticks of memory, you could try running them one at a time to see if one is potentially having issues.

Thanks! I just have the one stick right now. I tried a (smaller) backup this morning after updating to 5.14.8 and had no issues. I would still like to get memtest running for peace of mind, but I suspect @dimitris might have found my issue! Thanks so much for everyone’s help figuring this out. I’ll update the thread if I learn anything new.

I’ll delete this thread later on unless anyone thinks it would be useful to keep around. This seems like it was an Arch-specific bug (as noted by @dimitris above) that didn’t last very long in the wild.