No GRUB, no reboot: A solution for a non-booting server after HDD replacements in a Software RAID

  • Posted on: 18 March 2015
  • By: Don

The hosting company Hetzner offers powerful servers at pretty nice prices – and pretty shitty HDDs. As a result, broken hard disks have to be replaced on a regular basis. Forget to install a bootloader like GRUB2 on these replacements and you will have a problem: sooner or later, only new HDDs without any bootloader are left, and the server can't reboot.

Hetzner doesn't provide help via customer support or in its manual/wiki, except for this short hint:

"For repairing via the Rescue System, the installed system has to be mounted first, as described here. All GRUB installation steps then have to be performed after chroot."

Example case

Two HDDs (sda and sdb) run in a software RAID 1. sdb was replaced 3 months ago, and sda is now defective. Each partition of the defective sda was removed from its RAID array (mdadm only removes members that are already marked as failed or are spares)

mdadm /dev/md0 -r /dev/sda1
mdadm /dev/md1 -r /dev/sda2
mdadm /dev/md2 -r /dev/sda3
mdadm /dev/md3 -r /dev/sda4

and the replacement HDD was prepared by copying the partition table from the old sdb to the new sda, with GPT

sgdisk -R /dev/sda /dev/sdb
sgdisk -G /dev/sda

or MBR

sfdisk -d /dev/sdb | sfdisk /dev/sda
sfdisk -R /dev/sda

and then integrated into the RAID array

mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3
mdadm /dev/md3 -a /dev/sda4

checked with

cat /proc/mdstat

and finally finished with the bootloader installation on the new HDD

grub-install /dev/sda
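
The `cat /proc/mdstat` check can also be scripted. The following is a minimal sketch, not part of the original procedure: `degraded_arrays` is a hypothetical helper name, and it assumes the usual mdstat layout in which each array's status line carries a member map such as `[UU]` (all members up) or `[_U]` (one member missing or still rebuilding).

```shell
# Print the name of every md array whose member map contains a "_",
# i.e. an array that is degraded or still rebuilding.
# Optional argument: a file in /proc/mdstat format (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/      { name = $1 }    # remember the current array name
        /\[[U_]*_[U_]*\]/  { print name }   # member map with a missing slot
    ' "${1:-/proc/mdstat}"
}
```

Called without an argument it reads `/proc/mdstat`; no output means every array lists all of its members as up.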

In case an error like

grub-install /dev/sda
/usr/sbin/grub-probe: error: cannot find a device for /boot/grub (is /dev mounted?)

occurs, or a reboot just doesn't work, there is an issue with the bootloader. No customer support despite the defective HDD, no instructions, and a lot of non-working posts on the web. You are shit out of luck.
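
To see which disks carry GRUB boot code at all, one quick option is to look at their first sector. This is a rough sketch under assumptions: `has_grub_bootcode` and `report_bootcode` are made-up helper names, and the check relies on GRUB2's BIOS boot image containing the literal string `GRUB` within the first 512 bytes. It says nothing about UEFI setups or about whether the rest of GRUB is intact.

```shell
# Succeeds if the first 512 bytes of the given disk (or image file)
# contain the string "GRUB". Assumption: BIOS boot with GRUB2.
has_grub_bootcode() {
    # Read only the first sector and search it as text (-a) for the marker.
    dd if="$1" bs=512 count=1 2>/dev/null | grep -aq 'GRUB'
}

# Print one status line per disk passed as an argument.
report_bootcode() {
    for disk in "$@"; do
        if has_grub_bootcode "$disk"; then
            echo "$disk: GRUB boot code found"
        else
            echo "$disk: no GRUB boot code found"
        fi
    done
}
```

In the example case this would be called as `report_bootcode /dev/sda /dev/sdb`, as root, from the running system or the Rescue System.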

Solution

Use the Rescue System and log in via SSH. Then do the following:

mount /dev/md2 /mnt
mount /dev/md1 /mnt/boot
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt

grub-mkdevicemap -n
grub-install /dev/sdb
grub-install /dev/sda
update-grub

Done. Reboot now and your server will be back to life.
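
For review before typing anything into a rescue session, the sequence above can be generated as a dry run. This sketch only prints commands and executes nothing. `grub_rescue_plan` is a hypothetical name, and the `/mnt` mount point and the device names come from the example case. It prints `mount --bind` and `chroot /mnt COMMAND` rather than an interactive chroot shell, which is an assumption about what suits a script and not the exact procedure from this post.

```shell
# Print (do not run) the rescue commands.
# Usage: grub_rescue_plan ROOT_MD BOOT_MD DISK...
grub_rescue_plan() {
    plan_root=$1
    plan_boot=$2
    shift 2

    # Mount the installed system and its /boot under /mnt.
    echo "mount $plan_root /mnt"
    echo "mount $plan_boot /mnt/boot"

    # Make the kernel's virtual filesystems visible inside the chroot.
    for plan_fs in dev proc sys; do
        echo "mount --bind /$plan_fs /mnt/$plan_fs"
    done

    # Rebuild the device map, then install GRUB on every listed disk.
    echo "chroot /mnt grub-mkdevicemap -n"
    for plan_disk in "$@"; do
        echo "chroot /mnt grub-install $plan_disk"
    done
    echo "chroot /mnt update-grub"
}
```

For the example case the call would be `grub_rescue_plan /dev/md2 /dev/md1 /dev/sdb /dev/sda`. Listing every disk of the RAID 1 means each one ends up with its own bootloader, which is the whole point of the fix.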

Advice

Consider moving to a cloud hosting provider like AWS.

Comments

Hi Don
Completely agree with you. I had to replace /dev/sda this morning and ran into the problem with installing GRUB.
I really appreciate your hint about mounting the device and changing root into it to install GRUB.
I already feared the server would be down for the whole morning but with your help, I was able to bring it back online just before the maintenance window closed.

Thank you, man.

I am looking forward to the time when we have migrated all our servers back into our datacenter, onto our new server platform based on a Nutanix solution.

Kind regards, Stefan

You're welcome, Stefan! I'm glad I could help with this post.