Debugging boot failures

I recently obtained an NVME SSD, installed it, and used reform-setup-encrypted-nvme to migrate to it. I was able to boot into the SSD successfully once, and confirmed that I was booting into the correct storage device both by noticing the presence of the disk decryption prompt at startup and by checking which device was mounted at /.

After some time, I powered off the device, and the next time I attempted to power on the device I was met with a blank screen. The system controller is responsive and the keyboard backlight turns on when I power on.

I was able to use a USB serial adapter with a second device to get a startup log. It hangs indefinitely after Starting kernel...:

U-Boot SPL 2022.04-gb0e908b1-dirty (May 02 2024 - 19:18:25 +0000)
config to do 4000 1d training.
config to do 400 1d training.
config to do 100 1d training.
config to do 4000 2d training.
Normal Boot
Trying to boot from BOOTROM
image offset 0x0, pagesize 0x200, ivt offset 0x0
NOTICE:  Do not release JR0 to NS as it can be used by HAB
NOTICE:  BL31: v2.8(release):lf-6.1.22-2.0.0-6-g7e3484cc1
NOTICE:  BL31: Built : 15:05:50, Oct 17 2023


U-Boot 2022.04-gb0e908b1-dirty (May 02 2024 - 19:18:25 +0000)

CPU:   i.MX8MP[8] rev1.1 1800 MHz (running at 1200 MHz)
CPU:   Commercial temperature grade (0C to 95C) at 59C
Reset cause: POR
Model: MNT Pocket Reform with i.MX8MP Module
Board: nitrogen8mp
       Watchdog enabled
DRAM:  8 GiB
No USB device found
Core:  142 devices, 26 uclasses, devicetree: separate
MMC:   FSL_SDHC: 0, FSL_SDHC: 2
Loading Environment from MMC...
*** Warning - bad CRC, using default environment

missing node fb_lvds
missing node ldb/lvds-channel@0
missing node ldb/lvds-channel@1
missing node mipi_dsi
missing node lcdif
Display: hdmi:1920x1080M@60 (1920x1080)
[*]-Video Link 2 (1920 x 1080)
        [0] lcd-controller@32fc6000, video
        [1] hdmi@32fd8000, display
In:    serial
Out:   serial
Err:   serial

 BuildInfo:
  - ATF 7e3484c

missing node fb_lvds
missing node ldb/lvds-channel@0
missing node ldb/lvds-channel@1
missing node mipi_dsi
missing node lcdif
Net:   Could not get PHY for FEC1: mask 0x90
Could not get PHY for FEC1: mask 0x90
Micrel ksz9031
eth0: ethernet@30bf0000 [PRIME]
Hit any key to stop autoboot:  0
MMC: no card present
mmc_init: -123, time 2
switch to partitions #0, OK
mmc2(part 0) is current device
Scanning mmc 2:1...
Found U-Boot script /boot.scr
5166 bytes read in 2 ms (2.5 MiB/s)
## Executing script at 40480000
33499648 bytes read in 187 ms (170.8 MiB/s)
52866 bytes read in 3 ms (16.8 MiB/s)
37155637 bytes read in 208 ms (170.4 MiB/s)
Booting Debian 6.9.7-mnt-reform-arm64 from mmc 2:1...
Moving Image from 0x40480000 to 0x40600000, end=426d0000
## Flattened Device Tree blob at 43000000
   Booting using the fdt blob at 0x43000000
   Using Device Tree in place at 0000000043000000, end 000000004300fe81

Starting kernel ...

I was able to see kernel messages by editing the kernel boot parameters (editenv bootargs at uboot, again over serial) with loglevel=7, so the handoff from bootloader to kernel is working, but I don’t see anything that seems related to the failure.

I also flashed the system image from reform.debian.net to an SD card; however, I don’t get any different behavior on startup with the SD card inserted, and the bootloader proceeds to (attempt to) boot from eMMC. I’ve verified that the SD card isn’t corrupted by comparing the md5 sum with that of pocket-reform-system-imx8mp-bpo.img.
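For anyone repeating the checksum comparison: a card is usually larger than the image, so only the first image-sized bytes of the device should be hashed. A minimal sketch, assuming GNU coreutils; the function names and the /dev/sdX device name are hypothetical:

```shell
# md5 of the first N bytes of a file or block device.
md5_of_prefix() {   # args: path nbytes
    head -c "$2" "$1" | md5sum | cut -d' ' -f1
}

# Succeeds if the start of $2 (e.g. the SD card) is byte-identical to image $1.
card_matches_image() {   # args: image device
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$(md5_of_prefix "$1" "$size")" = "$(md5_of_prefix "$2" "$size")" ]
}

# Example (hypothetical device name, needs read access to the device):
#   card_matches_image pocket-reform-system-imx8mp-bpo.img /dev/sdX && echo identical
```

A match only shows that the write succeeded; it says nothing about whether the image itself is bootable.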

My questions:

  • Is there a series of kernel parameters that will give me a shell over serial so that I can investigate further? I tried both console=ttyS0 and console=ttyS1 with and without init=/bin/sh, but neither works.
  • Is there some way to prompt uboot to boot from the SD card, or how would I debug being unable to boot to the system image on the SD card?
  • What else should I check out?

You already tried most of the things to debug this, which is great! :+1: This sounds like two things to me but neither makes much sense.

One is that your LUKS passphrase prompt is showing up on a tty that, for some reason, you do not see. Note that appending console= to ${bootargs} is likely not effective, because the default boot.scr puts console=tty1 at the end of the command line to avoid a situation where the LUKS passphrase prompt does not show up on the screen: 00reform2_ubootenv: put console=tty1 at the end of the cmdline to make sure... (d9ee804f) · Commits · Reform / MNT Reform Tools · GitLab
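The reason the position matters: when several console= arguments are given, kernel messages go to all of them, but /dev/console (which interactive prompts and an init=/bin/sh shell use) is tied to the last one. A minimal sketch for checking which console wins on a given command line; last_console is a hypothetical helper, not part of reform-tools:

```shell
# Print the value of the last console= argument on a kernel command line.
last_console() {   # args: "<cmdline>"
    last=""
    for arg in $1; do          # deliberate word splitting on whitespace
        case "$arg" in
            console=*) last="${arg#console=}" ;;
        esac
    done
    printf '%s\n' "$last"
}

# On a running system:
#   last_console "$(cat /proc/cmdline)"
```

Note also that the u-boot environment posted in this thread lists console=ttymxc1 and baudrate=115200, so on this board the serial console name is presumably ttymxc1 rather than ttyS0 or ttyS1. I have not verified that on hardware.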

The other is a recent regression we had with kernel 6.10 but I see from your log that you are booting 6.9, so this cannot be it: minute: "PSA MNT Pocket Reform: I have pulled Linux kernel…" - Mastodon

I think booting a working system from SD-card is a good first step. Your u-boot on emmc will automatically load kernel and initrd from the first partition of your SD-card in favour of those files coming from the first partition on emmc. That way, you should always be able to boot a rescue system image to fix things up.

Note, that there are not many users of the images on reform.debian.net and I only have one Reform which is my main machine, so I am unable to test the content on reform.debian.net much. I also do not have a Pocket Reform. It is thus entirely plausible that the system images for the Pocket Reform offered at reform.debian.net do not work. Did you try the official images? There is a warning message at the top of https://reform.debian.net/ which points you to https://mnt.re/system-image

Wow, I’m not sure how I missed this or even how I ended up at that url. I’ll try the proper image, thanks for noticing.

I was told by other people that depending on what you type into some search engines, reform.debian.net comes out above mntre.com urls when you search for reform debian images. This is why I added the warning but maybe it should be even more prominent? @minute what do you think?

I wrote pocket-reform-system-imx8mp.img from Artifacts · build (#5342) · Jobs · Reform / reform-system-image · GitLab (md5 5c4ee76fee3feea75ed7e388e91031b0) to SD card and per serial the pocket reform still attempts to boot from mmc (Booting Debian 6.9.7-mnt-reform-arm64 from mmc 2:1...). I’ve tried both a 32GB SDHC card as well as a 64GB SDXC card.

Do these images work for others / are others able to boot the pocket reform from SD card successfully?

Is it possible my sd card reader is defective, is there a good way to rule that out?

If I write this image to a usb flash drive I get the same results (Booting Debian 6.9.7-mnt-reform-arm64 from mmc 2:1...). I thought I might be able to prompt uboot over serial to boot from usb but I get the following:

=> run bootcmd_usb0
starting USB...
Bus usb@38100000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@38200000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@38100000 for devices... 5 USB Device(s) found
scanning bus usb@38200000 for devices... 2 USB Device(s) found
       scanning usb for storage devices... 1 Storage Device(s) found

Device 0: Vendor: Lexar    Rev: 1100 Prod: USB Flash Drive 
            Type: Removable Hard Disk
            Capacity: 30526.0 MB = 29.8 GB (62517248 x 512)
... is now current device
** No partition table - usb 0 **
Couldn't find partition usb 0:1

I suspect the image at this point. Are there historical/known-working images available anywhere? I saw that the older artifacts in the build system are expired.

Having stared a bit at hexdump -C and binwalk output, I think the partition table gets overwritten somewhere in the build process.

I think UBOOT_OFFSET=0 for SYSIMAGE=pocket-reform-system-imx8mp is wrong and the uboot image overwrites the partition table.
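A quicker check than reading hexdump output by eye is to look for the 0x55 0xAA boot signature in the last two bytes of the first sector, which both MBR and GPT (protective MBR) images carry. A minimal sketch; has_mbr_signature is a hypothetical helper name:

```shell
# Succeeds if the file has the 0x55 0xAA boot signature at bytes 510-511.
has_mbr_signature() {   # args: image-or-device
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example:
#   has_mbr_signature pocket-reform-system-imx8mp.img || echo "no partition table signature"
```

A missing signature is a strong hint that sector 0 was overwritten; a present signature does not by itself prove the partition entries are intact.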

/cc @josch


I think this analysis is spot on. How could it come to this? Remember how an incorrect UBOOT_OFFSET of 33792 just recently trashed the u-boot on eMMC of @andypiper: Updating uboot? - #5 by josch

I fixed that by changing UBOOT_OFFSET to zero, which is the correct offset for the eMMC. For the SD-card, 33792 is the correct offset, but the imx8mplus is unable to load u-boot from the SD-card, so this offset is rather theoretical. The problem is that the script creating the system images does not respect SD_BOOT=false for the imx8mplus and happily writes u-boot with an offset of zero to the image, overwriting the MBR in the process. This commit should fix that situation: mkimage.sh: prevent u-boot from being flashed for platforms that cannot load it from SD-card (075f8603) · Commits · Reform / reform-system-image · GitLab
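To illustrate the failure mode on scratch files only: writing a blob at offset 0 clobbers the signature at bytes 510-511, while writing it at 33792 leaves the first sector alone. This is a sketch under assumptions, not what mkimage.sh actually does: the helper names are hypothetical, the offsets are assumed to be in bytes, and a 1 KiB block of zeros stands in for the u-boot binary.

```shell
# Print bytes 510-511 of file $1 as hex.
read_sig() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Write file $2 into file $1 at byte offset $3 without truncating $1.
write_at() {
    dd if="$2" of="$1" bs=1 seek="$3" conv=notrunc 2>/dev/null
}

# Build a blank 64 KiB image with a fake MBR signature, write the blob
# at byte offset $1, then print what is left of the signature.
offset_demo() {
    img=$(mktemp); blob=$(mktemp)
    dd if=/dev/zero of="$img" bs=1024 count=64 2>/dev/null
    printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
    dd if=/dev/zero of="$blob" bs=1024 count=1 2>/dev/null   # stand-in for u-boot
    write_at "$img" "$blob" "$1"
    read_sig "$img"; echo
    rm -f "$img" "$blob"
}

# offset_demo 0       prints 0000 (signature overwritten)
# offset_demo 33792   prints 55aa (signature intact)
```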

Unfortunately, due to another issue with imx8mplus and kernel 6.10 (Mntre.com repository URL problem - #2 by minute) we are unable to build system images right now…


We already figured out that the current images served by source.mntre.com and reform.debian.net for the pocket reform are broken because of the UBOOT_OFFSET issue. But I also have a merge request open which builds a board-agnostic universal system image: Draft: Create a generic (universal) image that works on all platforms with u-boot on eMMC (!104) · Merge requests · Reform / reform-system-image · GitLab

If you’d like to give that a try, here are the artifacts: Artifacts · build (#5282) · Jobs · Reform / reform-system-image · GitLab

Though beware, that system image is with kernel 6.10, so it might not boot reliably. Maybe you have to try a few times before it manages to boot…


In case the generic no-uboot image fails to work, I prepared a system image based on the reform.debian.net repo which still has the 6.9 kernel here: https://mister-muffin.de/reform/pocket-reform-system-imx8mp.img.xz

It is signed by my GPG key and you can download the detached signature here: https://mister-muffin.de/reform/pocket-reform-system-imx8mp.img.xz.asc

In case you rather want to build the image yourself, you can do so by running this from the reform-system-image git clone:

DIST=bookworm-backports MIRROR=reform.debian.net ./mkimage.sh pocket-reform-system-imx8mp

This all tracks; I was surprised that there was no partition table on that image.

I am able to boot from @josch’s image, but I need to interrupt uboot and run bootcmd_usb0, and there are some issues. Here’s what it says (on the display, so typos if present are mine):

Gave up waiting for root file system device.  Common problems::
- Boot args (cat /proc/cmdline)
  - Check rootdelay= (did the system wait long enough?)
- Missing modules (cat /proc/modules: ls /dev)
ALERT!  Label=reformsdroot does not exist.  Dropping to a shell!

It does indeed drop me into a busybox ash shell but the keyboard doesn’t appear to work correctly. I suspect this is enough to get me somewhere after tinkering with kernel parameters. Thanks!

Here’s what happens if I attempt to boot the generic-no-uboot.img from job 5282 from an SD card

Found U-Boot script /boot.scr
5166 bytes read in 1 ms (4.9 MiB/s)
## Executing script at 40480000
script exited: continuing...
52866 bytes read in 3 ms (16.8 MiB/s)
BootOrder not defined
EFI boot manager: Cannot load any image
starting USB...
Bus usb@38100000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@38200000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@38100000 for devices... 4 USB Device(s) found
scanning bus usb@38200000 for devices... 2 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found

Device 0: unknown device

From a USB flash drive it attempts to boot from mmc again; if I interrupt it and run bootcmd_usb0:

=> run bootcmd_usb0
starting USB...
Bus usb@38100000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@38200000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@38100000 for devices... 5 USB Device(s) found
scanning bus usb@38200000 for devices... 2 USB Device(s) found
       scanning usb for storage devices... 1 Storage Device(s) found
   
Device 0: Vendor: Lexar    Rev: 1100 Prod: USB Flash Drive
            Type: Removable Hard Disk
            Capacity: 30526.0 MB = 29.8 GB (62517248 x 512)
... is now current device
Scanning usb 0:1...
Found U-Boot script /boot.scr
5442 bytes read in 2 ms (2.6 MiB/s)
## Executing script at 40480000
28273152 bytes read in 226 ms (119.3 MiB/s)
Failed to load '/dtb-6.10.3-mnt-reform-arm64'
script exited: continuing...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
MMC: no card present
mmc_init: -123, time 2
Scanning disk mmc@30b40000.blk...
Disk mmc@30b40000.blk not ready
Scanning disk mmc@30b60000.blk...
Scanning disk usb_mass_storage.lun0...
Found 6 disks
No EFI system partition
fail to find output device
ERROR: invalid device tree

It does seem like something is wrong with the boot order on pocket reform; I’ve never had it automatically select booting from anything other than emmc.

Thank you for reporting your findings here. We get to fix interesting bugs!

This is odd. The image you flashed should have a filesystem on the second partition with the label “reformsdroot”. Can you confirm? If it didn’t find that one, it could mean that somehow the USB controller didn’t come up or that linux for some reason was unable to enumerate your USB mass storage device?

Your initramfs wanting to mount the filesystem matching LABEL=reformsdboot is because by default, the /etc/fstab in your initrd contains that entry. Maybe that can be overridden by adding a boot= entry to u-boot’s ${bootargs}? But even if you do, I fear that systemd will insist on having it mounted…

Did you try flashing the image to an SD-card instead of to a usb-stick?

Why would the boot.scr just exit without loading linux?? o0 I’ve not seen this before…

The message BootOrder not defined might come from u-boot’s lib/efi_loader/efi_bootmgr.c, so this is efi stuff which should not be working in the first place.

This should not be happening. Could you share your printenv output? I’m specifically interested in the value of the ${fdtfile} variable.


Yeah! I definitely appreciate your help, and I’m happy to contribute by reporting findings etc.

The label shows up no problem on another device:

$ ls -l /dev/disk/by-label/
total 0
lrwxrwxrwx 1 root root 10 Aug 11 15:02 reformsdboot -> ../../sdb1
lrwxrwxrwx 1 root root 10 Aug 11 15:02 reformsdroot -> ../../sdb2

I did; it attempts to boot from mmc, and I’m not sure if there’s a corresponding way to instruct uboot to attempt to boot from sd card?

It’s blank. I noticed this earlier:

Loading Environment from MMC...
*** Warning - bad CRC, using default environment

maybe the source of my issues? (But I also see this in several other messages on this forum).

Thanks to @amospalla we now know that these lines about a bad CRC are indeed normal: Collecting known-good u-boot output for debugging purposes - #5 by amospalla


This behavior might be explained by the u-boot environment not containing the ${fdtfile} environment variable, on which generic-no-uboot.img relies to figure out on which platform it is being run and which device tree blob to load.

Are you sure it’s blank for you? In the output that @amospalla shared it is not.

Here’s my printenv:

=> printenv
arch=arm   
baudrate=115200
board=nitrogen8mp
board_carrier=-enc
board_name=nitrogen8mp
board_rv=_r20
boot_a_script=setenv disk ${devnum}; setenv dtype ${devtype}; setenv bootpart ${distro_bootpart}; load ${devtype} ${devnum}:${distro_bootpart} ${scriptaddr} ${prefix}${script}; source ${scriptaddr}
boot_efi_binary=load ${devtype} ${devnum}:${distro_bootpart} ${kernel_addr_r} efi/boot/bootaa64.efi; if fdt addr ${fdt_addr_r}; then bootefi ${kernel_addr_r} ${fdt_addr_r};else bootefi ${kernel_addr_r} ${fdtcontroladdr};fi
boot_efi_bootmgr=if fdt addr ${fdt_addr_r}; then bootefi bootmgr ${fdt_addr_r};else bootefi bootmgr;fi
boot_extlinux=sysboot ${devtype} ${devnum}:${distro_bootpart} any ${scriptaddr} ${prefix}${boot_syslinux_conf}
boot_net_usb_start=usb start
boot_prefixes=/boot/ /
boot_script_dhcp=boot.scr.uimg
boot_scripts=boot.scr.uimg boot.scr
boot_syslinux_conf=extlinux/extlinux.conf
boot_targets=mmc0 mmc2 usb0
bootargs=ro no_console_suspend pci=pcie_bus_perf nvme_core.default_ps_max_latency_us=0 console=tty1 fbcon=rotate:3 cma=256MB
bootcmd=run distro_bootcmd
bootcmd_mmc0=devnum=0; run mmc_boot
bootcmd_mmc2=devnum=2; run mmc_boot
bootcmd_usb0=devnum=0; run usb_boot
bootdelay=2
cmd_hdmi=fdt set fb_hdmi status okay;fdt set fb_hdmi mode_str 1920x1080M@60;
cmd_lvds=fdt set fb_lvds status disabled;fdt set ldb/lvds-channel@0 status disabled;
cmd_lvds2=fdt set ldb/lvds-channel@1 status disabled;
cmd_mipi=fdt set mipi_dsi status disabled;fdt set lcdif status disabled;
console=ttymxc1
cpu=armv8  
distro_bootcmd=for target in ${boot_targets}; do run bootcmd_${target}; done
efi_dtb_prefixes=/ /dtb/ /dtb/current/
env_dev=2  
env_part=1 
eth2addr=00:19:b8:00:00:02
ethact=ethernet@30bf0000
ethaddr=[redacted]
ethprime=eth0
fastboot_raw_partition_bootloader=0x0 0x1ff0 mmcpart 1
fastboot_raw_partition_bootloader-env=0x1ff0 0x10 mmcpart 1
fb_hdmi_name=1920x1080M@60
fdt_addr=0x43000000
fdt_addr_r=0x43000000
fdt_high=0xffffffffffffffff
fdtcontroladdr=f4bf08c0
fuse1=1 3
fuse1_val=10002000
fuse_mac1a=9 1
fuse_mac1a_val=00000019
fuse_mac1b=9 0
imx_cpu=8MP[8]
initrd_high=0xffffffffffffffff
kernel_addr_r=0x40480000
load_efi_dtb=load ${devtype} ${devnum}:${distro_bootpart} ${fdt_addr_r} ${prefix}${efi_fdtfile}
loadaddr=0x40480000
m4boot=load ${devtype} ${devnum}:1 ${m4loadaddr} ${m4image}; dcache flush; bootaux ${m4loadaddr}
m4image=m4_fw.bin
m4loadaddr=0x007E0000
mcore_bootargs=clk-imx8mp.mcore_booted
mmc_boot=if mmc dev ${devnum}; then devtype=mmc; run scan_dev_for_boot_part; fi
net_upgradeu=dhcp 40020000 net_upgradeu.scr && source 40020000
netargs=setenv bootargs console=${console},115200 root=/dev/nfs rw ip=dhcp nfsroot=${tftpserverip}:${nfsroot},v3,tcp
netboot=run netargs; if test -z "${fdt_file}" -a -n "${soc}"; then setenv fdt_file ${soc}-${board}${boardver}.dtb; fi; if test ${ip_dyn} = yes; then setenv get_cmd dhcp; else setenv get_cmd tftp; fi; ${get_cmd} ${loadaddr} ${tftpserverip}:Image; if ${get_cmd} ${fdt_addr} ${tftpserverip}:${fdt_file}; then booti ${loadaddr} - ${fdt_addr}; else echo WARN: Cannot load the DT; fi;
otg_upgradeu=run usbnetwork; tftp 40020000 net_upgradeu.scr && source 40020000
ramdisk_addr_r=0x43800000
scan_dev_for_boot=echo Scanning ${devtype} ${devnum}:${distro_bootpart}...; for prefix in ${boot_prefixes}; do run scan_dev_for_extlinux; run scan_dev_for_scripts; done;run scan_dev_for_efi;
scan_dev_for_boot_part=part list ${devtype} ${devnum} -bootable devplist; env exists devplist || setenv devplist 1; for distro_bootpart in ${devplist}; do if fstype ${devtype} ${devnum}:${distro_bootpart} bootfstype; then run scan_dev_for_boot; fi; done; setenv devplist
scan_dev_for_efi=setenv efi_fdtfile ${fdtfile}; for prefix in ${efi_dtb_prefixes}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${efi_fdtfile}; then run load_efi_dtb; fi;done;run boot_efi_bootmgr;if test -e ${devtype} ${devnum}:${distro_bootpart} efi/boot/bootaa64.efi; then echo Found EFI removable media binary efi/boot/bootaa64.efi; run boot_efi_binary; echo EFI LOAD FAILED: continuing...; fi; setenv efi_fdtfile
scan_dev_for_extlinux=if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${boot_syslinux_conf}; then echo Found ${prefix}${boot_syslinux_conf}; run boot_extlinux; echo SCRIPT FAILED: continuing...; fi
scan_dev_for_scripts=for script in ${boot_scripts}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${script}; then echo Found U-Boot script ${prefix}${script}; run boot_a_script; echo script exited: continuing...; fi; done
scriptaddr=0x40480000
serial#=[redacted]
soc=imx8m
uboot_defconfig=nitrogen8mp_8g
uboot_release=2022.04-gb0e908b1-dirty
upgradeu=setenv boot_scripts upgrade.scr; boot;echo Upgrade failed!; setenv boot_scripts boot.scr
usb_boot=usb start; if usb dev ${devnum}; then devtype=usb; run scan_dev_for_boot_part; fi
usbnet_devaddr=00:19:b8:00:00:02
usbnet_hostaddr=00:19:b8:00:00:01
usbnetwork=setenv ethact usb_ether; setenv ipaddr 10.0.0.2; setenv netmask 255.255.255.0; setenv serverip 10.0.0.1;
vendor=boundary
vidconsole=vidconsole

Environment size: 5071/8188 bytes

fdtfile is not defined (which I suppose is different than being blank?); however, the same is true in amospalla’s. The only differences I see between my output and amospalla’s are the ethernet address, the serial number, and pxefile_addr_r=0x40480000 in theirs.
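For comparing two environments without reading them side by side, the dumps can be saved to files and diffed after sorting. A minimal sketch; the function names and file names are hypothetical, and it assumes each variable sits on one line (lines wrapped by the serial terminal would need to be rejoined first):

```shell
# Show variables that differ between two saved `printenv` dumps.
# Returns diff's exit status: 0 if identical, 1 if they differ.
diff_env() {   # args: dump_a dump_b
    a=$(mktemp); b=$(mktemp)
    grep '=' "$1" | sort > "$a"   # keep only name=value lines
    grep '=' "$2" | sort > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Succeeds if variable $2 is defined (even with an empty value) in dump $1.
env_defined() {   # args: dump name
    grep -q "^$2=" "$1"
}

# Example:
#   diff_env my-printenv.txt other-printenv.txt
#   env_defined my-printenv.txt fdtfile || echo "fdtfile is not set"
```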

I just rebooted and it booted successfully from the sd card :face_with_spiral_eyes:. So I guess it works some (maybe even most?) of the time?

I was able to switch back to booting from emmc using reform-boot-config --emmc emmc in the sd card based image and now I can boot from the device again!


Yay nice!! :tada: You really had me worried that something was seriously broken. Glad to hear that it’s okay now!

So is your very initial problem solved now?

Yes! I was even able to use reform-boot-config to switch back to booting from nvme and it works fine now (across multiple reboots). I don’t have an explanation but such is life. Thank you for your help!

Circling back here: after a few more boots I started experiencing issues booting from nvme again. In the process of debugging I realized I was stuck on somewhat old versions of both the kernel (6.9.7-1+reform20240630T062105Z) and reform-tools (1.45):

$ sudo apt-get upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
 ...
Use 'sudo apt autoremove' to remove them.
The following packages have been kept back:
  cpp cpp-aarch64-linux-gnu curl g++ g++-aarch64-linux-gnu gcc gcc-aarch64-linux-gnu gvfs gvfs-backends gvfs-common
  gvfs-daemons gvfs-libs initramfs-tools initramfs-tools-core libadwaita-1-0 libass9 libcurl3t64-gnutls libcurl4t64
  libgd3 libgtk-4-1 libpoppler-glib8t64 libssl3t64 libwlroots12t64 linux-headers-arm64 linux-headers-mnt-reform-arm64
  linux-image-arm64 linux-image-mnt-reform-arm64 minetest minetest-data reform-qcacld2 reform-tools ruby-gettext
The following packages will be upgraded:
  ...
11 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.
Need to get 8,141 kB of archives.
After this operation, 3,072 B of additional disk space will be used.
Do you want to continue? [Y/n]

I’m not sure it’s related, but I upgraded these (with sudo apt-get --with-new-pkgs upgrade ...) and ran reform-setup-encrypted-nvme again a few days ago. It appears to reliably boot from nvme at last :slight_smile:
