AlmaLinux 8.6 - GRUB 2 boot failures and resolution
To give context: I was running cPanel on CentOS 7. With the sunsetting of CentOS for this use case, I decided to migrate (via the cPanel fork of the ELevate tool) to AlmaLinux 8.6. The migration worked relatively well, except that the Linode GRUB 2 boot situation really threw a wrench into it (issue reported).
I thought that all was well and resolved a bit over a month ago and the system was booting fine after modifying the GRUB config:
```
$ nano /etc/default/grub
### set: GRUB_ENABLE_BLSCFG=false
$ grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.18.0-372.9.1.el8.x86_64
Found initrd image: /boot/initramfs-4.18.0-372.9.1.el8.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-4f09fa5fdd3642fa85221d7c11370603
Found initrd image: /boot/initramfs-0-rescue-4f09fa5fdd3642fa85221d7c11370603.img
done
```
Yesterday I figured it was time to check things out after a nice stable month of operation. I created a backup snapshot of the system, jumped in, and issued `yum update`. Thinking that all was well, I rebooted the server.
Upon reboot the server would not start. LISH was giving the same results as prior to my last fix. Via LISH, I issued the GRUB 2 commands that had allowed me to boot before:
```
set root=(hd0)
linux /boot/vmlinuz-4.18.0-372.9.1.el8.x86_64 root=/dev/sda ro crashkernel=auto rhgb console=ttyS0,19200n8 net.ifnames=0
initrd /boot/initramfs-4.18.0-372.9.1.el8.x86_64.img
boot
```
The kernel was still present, but the system would not boot and responded with an error.
This alarmed me, and since I had not planned for a prolonged downtime and troubleshooting stint, I didn't put much more thought into the situation; I proceeded to restore the last saved snapshot backup.
After the restoration finished I tried to boot up… but the system still would not start. Again, this was very alarming, since I had "fixed it" over a month prior and it had been booting fine at that time.
Just for sanity, I went ahead and issued the above commands to GRUB 2, and it proceeded to boot after SELinux had its way with things…
When I got into the system I checked /etc/default/grub to see if anything had changed, and lo and behold, blasted GRUB_ENABLE_BLSCFG was back to true:
```
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb console=ttyS0,19200n8 net.ifnames=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_DISABLE_OS_PROBER=true
GRUB_SERIAL_COMMAND="serial --speed=19200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_DISABLE_LINUX_UUID=true
GRUB_GFXMODE=1024x768x32
GRUB_GFXPAYLOAD_LINUX=text
GRUB_ENABLE_BLSCFG=true
```
Of course this was immensely frustrating. Upon checking the modification date of the file, I noticed that it had - SOMEHOW - been modified the day AFTER I had made my changes, while the system was working and booting sans trouble.
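For anyone trying to pin down the same thing, the rewrite can be narrowed with a couple of read-only checks: the file's modification time, and whether RPM thinks the file differs from the packaged copy. A minimal sketch (the guards are only so it can be pasted anywhere safely):

```shell
#!/bin/sh
# The file that keeps reverting, as discussed above
GRUB_DEFAULTS=/etc/default/grub

# When was it last modified?
if [ -e "$GRUB_DEFAULTS" ]; then
    stat -c 'modified: %y' "$GRUB_DEFAULTS"
fi

# Does the file differ from what its owning RPM package shipped?
# A "5" in the rpm -V output column means the checksum has changed.
if command -v rpm >/dev/null 2>&1; then
    rpm -V "$(rpm -qf "$GRUB_DEFAULTS")"
fi
```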
Does anyone have any idea what may have provoked such a change?
It seems others have seen similar problems, perhaps with Rocky Linux: https://serverfault.com/questions/1079875/how-to-prevent-changes-of-grub-enable-blscfg-in-etc-default-grub
Clearly something that is not readily apparent is happening with GRUB 2 and/or in other scenarios.
A) I do not know whether the problem is Linode's GRUB 2 boot system (Linode Web control panel → the specific Linode → Configurations → the configuration in question: Edit → Boot Settings → Select a Kernel: set to GRUB 2) dynamically and independently making changes to this config file, or whether the Linux distribution (operating system) is making the change (which would mean it happens on any CentOS successor, such as AlmaLinux or Rocky Linux)…?
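One way to answer question A empirically would be an audit watch on the file, so that the next time the setting flips, the log records exactly which process wrote it. A sketch, assuming auditd is installed and running (the rule key `grub-default-change` is an arbitrary name, not anything standard):

```shell
# /etc/audit/rules.d/grub-default.rules
# Watch /etc/default/grub for writes (w) and attribute changes (a),
# tagged with an arbitrary key for later searching.
-w /etc/default/grub -p wa -k grub-default-change
```

After loading the rule (`augenrules --load`, or a reboot) and waiting for the file to change again, `ausearch -k grub-default-change -i` should show the executable responsible.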
B) Furthermore, should I remove Linode's GRUB 2 boot system from the equation here, and is there a way to do so? I thought I could just change over to Direct Disk, but then LISH doesn't even get to a loading screen and fails to function altogether.
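In the meantime, a possible stopgap (not a fix) is to make the file immutable so nothing can rewrite it behind your back - with the caveat that intentional edits, and possibly future grub2/kernel package updates, will fail until the flag is cleared again. Assumes root and a filesystem that supports ext attributes:

```shell
# Lock the file against all modification, even by root-owned scripts
chattr +i /etc/default/grub

# Verify: the 'i' flag should appear in the attribute listing
lsattr /etc/default/grub

# To edit the file yourself later, clear the flag first
chattr -i /etc/default/grub
```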