Rebuild Software RAID 1

I have 1 server from Hetzner with software RAID 1. I have replaced 1 NVMe, and the Hetzner team is asking me to rebuild the RAID. Can anyone help me with how to rebuild software RAID after replacing an NVMe?
 
Can anyone help me with how to rebuild software RAID after replacing an NVMe?
Can you tell us afterwards if all went well? Is it the first or the second NVMe that you had replaced?

We will also get a Hetzner server with NVMe shortly, but with GPT and therefore EFI. If it's the first one, I'm very curious how the boot process will go, because that works differently than with MBR.
Please let us know how it worked out for you.
 
If you are using EFI in a software raid - if /boot/efi is mounted on a RAID1 mirror device, then it needs to use metadata version 1.0 (or possibly 0.90, but not 1.2). Version 1.0 writes the superblock information at the end of the partition, so UEFI boot will find the ESP partition without issue. Note too, this can only be RAID1; it can't be RAID10. It needs an exact 1:1 mirror (and it's small anyway).

To check this, find out what device /boot/efi is mounted on:

Code:
df

Look for /boot/efi and the corresponding filesystem in the first column.
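
For example, the relevant line might look something like this (the md device name and the sizes here are made up for illustration):

Code:
/dev/md0          261868      7252    254616   3% /boot/efi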

Then check the metadata version of this device:

Code:
mdadm --detail %the/dev/device that /boot/efi is mounted from%

Then look for the Version line:

Code:
Version : 1.0
 
If you are using EFI in a software raid
Yes for /boot/efi but....
so UEFI boot will find the ESP partition without issue.
That is what I'm talking about, because the EFI System partition cannot be in RAID and must be a FAT32 partition.
So you need to have that on both disks to be able to boot, right?

If it is only on disk 1 (/dev/nvme0n1, for example) and that drive fails, then you should already have it on /dev/nvme1n1 (the second device) too, otherwise it won't boot. Right?
So that part is what I'm wondering about.

With MBR this was much easier: just run the grub install command on the other disk and you're done. I haven't done this with EFI and GPT yet, which makes me curious too.
 
That is what I'm talking about, because the EFI System partition cannot be in RAID and must be a FAT32 partition.
So you need to have that on both disks to be able to boot, right?
You would format the resulting /dev/mdxxx device as vfat and put the EFI files in there. As long as /dev/mdxxx is a RAID1, which is a 1:1 mirror, it doesn't matter which disk the BIOS boots from; they're identical anyway (unless the RAID is degraded... but this changes so infrequently it doesn't really matter... and if you keep the last 2 or 3 kernels installed then at worst, you'll boot into an older kernel and will have to rectify the issue later).

But the /dev/mdxxx device has to be using mdadm metadata version 1.0 in order for the partition to be recognized by the BIOS when it offloads to EFI.
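
Something along these lines, for illustration only (just a sketch; the device names are examples and would have to match your own layout, and this is something you would only run when first setting up the mirror, not on an existing array):

Code:
# create the RAID1 mirror for the ESP with metadata 1.0 (superblock at the end of the partition)
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/nvme0n1p1 /dev/nvme1n1p1
# format the mirror as FAT32 and mount it as the EFI partition
mkfs.vfat -F32 /dev/md0
mount /dev/md0 /boot/efi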

If the device is RAID10 then half the bits are on one physical drive and half are on another physical drive - that won't work.

Alternatively, you can create a small EFI partition (which is really just a special vfat partition) on each drive and mount each one separately on the system (i.e. /boot/efi, /boot/efi2, /boot/efi3...) and use rsync or some other system utility to ensure that the extra /boot/efiX directories keep a 1:1 copy of what's in /boot/efi. There really shouldn't be any modifications to /boot/efi unless there is a kernel upgrade.
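
For example, something like this would do it (assuming the copy on the second drive is mounted on /boot/efi2):

Code:
# keep /boot/efi2 an exact copy of /boot/efi
rsync -a --delete /boot/efi/ /boot/efi2/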

When the BIOS offloads to EFI it doesn't really make a distinction between /boot/efi, /boot/efi2, or /boot/efiX ... it just looks for an EFI partition.

You can actually use efibootmgr to show the order of the EFI boot.

Code:
efibootmgr -v

Then pay attention to the BootOrder line.

Find the corresponding BootXXXX in the list, it will say something like:

Code:
Boot0000* AlmaLinux     HD(3,GPT,f678152c-d50e-4515-9bbc-e8265b54732d,0x139f000,0x7d800)/File(\EFI\almalinux\shimx64.efi)

Then find the corresponding UUID (usually it's the PARTUUID) for the stated drive:

Code:
blkid | grep "f678152c-d50e-4515-9bbc-e8265b54732d"

This will tell you which drive and partition this is referring to, e.g. /dev/nvme0n1p1.

Then you'll find that /dev/nvme0n1p1 is a member of the mdadm RAID1 device that's mounted on /boot/efi, and the mdadm RAID system is keeping that mirrored. This is essentially the same thing as using an OS-level tool like rsync to keep /boot/efi2 as a mirrored copy of /boot/efi.

In a lot of ways, EFI is actually better than MBR booting because you can actually control this from the underlying OS. You can use the same efibootmgr command to change the boot order if you want.
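
For example (the boot IDs here are just placeholders; use the ones from your own efibootmgr -v output):

Code:
efibootmgr -o 0002,0003,0001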
 
Find the corresponding BootXXXX in the list, it will say something like:
First of all, thank you for taking the time to make this more clear to me/us.

So I tried the command:
Code:
efibootmgr -v
BootCurrent: 0002
Timeout: 1 seconds
BootOrder: 0001,0002,0003
Boot0001* UEFI: PXE IPv4 Realtek PCIe 2.5GBE Family Controller  PciRoot(0x0)/Pci(0x1c,0x2)/Pci(0x0,0x0)/MAC(107c61xxxxxx,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0002* UEFI OS       HD(1,GPT,2d2837e3-xxx etc. )/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0003* UEFI OS       HD(1,GPT,8211994a-xxx etc. )/File(\EFI\BOOT\BOOTX64.EFI)..BO
So as it says "BootCurrent: 0002 then it's first hd starting with 1,GPT,2d2837e3 right?
That is indeed nvme0n1p1.

That /boot/efi was mirrored was clear to me. They are indeed mirrored in RAID 1.

But I had read somewhere (Stack Exchange or something similar) that the EFI System partition should always be FAT32 and could not be part of a RAID system.
I read that last month, so I started to check and found such a partition on both disks with the "fdisk -l" command.

Code:
/dev/nvme0n1p1     4096     528383    524288   256M EFI System
/dev/nvme0n1p2   528384   67637247  67108864    32G Linux RAID
/dev/nvme0n1p3 67637248   69734399   2097152     1G Linux RAID
/dev/nvme0n1p4 69734400 1000215182 930480783 443.7G Linux RAID

Same for /dev/nvme1n1p1, which came up first in the fdisk -l output, by the way.

This is what I meant. If the nvme0n1 drive gets replaced, then when inserting the new NVMe it's not only about rebuilding the RAID: the partitioning must also be correct and contain that EFI System partition, to be able to boot from it again later or if the second device needs replacement later, right? Or am I mistaken?

So this is what I'm wondering about. Suppose that first NVMe is replaced. I create all the partitions anew so the RAID can be rebuilt. But how do I create a correct new EFI System partition on the new SSD? Just some command to copy that partition with all its content, or...? That EFI stuff confuses me a bit.

You can use the same efibootmgr command to change the boot order if you want.
That's indeed a good thing! However, the system must be booted first. But this is a command that could be given before bringing the system down to swap the disks. Nice!
 
This is where you would run the command

Code:
mdadm --detail %the/dev/device that /boot/efi is mounted from%

or

Code:
mdadm --detail /dev/mdXXX

Really depends on what your mdadm device naming convention is. I think some use /dev/md/boot or /dev/md/boot/efi - I'm really not sure about that. I don't use that naming convention. Nothing wrong with it, I just don't use it.

--examine probably shows the same thing, except at the per-physical-disk-partition level. The Version: 1.0 looks to be correct.
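
For example (the partition name is just an example):

Code:
mdadm --examine /dev/nvme0n1p1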
 
Really depends on what your mdadm device naming convention is.
I just use the default. I just checked: /dev/md2 has /boot and /dev/md0 has /boot/efi, and that one says version 1 too.

But what about the question in the post before about the fat32 partition copying?
 
But what about the question in the post before about the fat32 partition copying?
I don't know what this is referring to?

mdadm doesn't care what the partition type is. It operates at the bit level. It couldn't care less what the partition type is. In a RAID1, bit1 on device1 gets copied to bit1 on device2.
 
I don't know what this is referring to?
The ESP, the EFI System partition, as I explained in post #7: the 256M FAT32 partition outside of the RAID, as you can see from my "fdisk -l" output.

You have the EFI System partition and the EFI boot partition. The EFI boot partition (/boot/efi) is in RAID 1. But the FAT32 EFI System partition is not.

That is the part still confusing me, because that is the system partition and it's not in RAID, so it does not get copied automatically.
We do need that too to be able to boot, right? Or what else is it for?
 
That is the part still confusing me, because that is the system partition and it's not in RAID, so it does not get copied automatically.
We do need that too to be able to boot, right? Or what else is it for?
Unfortunately I don't know what you are referring to with this.

Are you thinking that there is something else other than the /boot/efi partition? There's not. Not with EFI.

If you check the mount type for the /boot/efi partition, it will show as vfat. vfat is just the Linux name for any of the FAT filesystems that Microsoft has come up with over the years (not NTFS, and I have no clue what Microsoft is using these days).
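
For example:

Code:
findmnt /boot/efi

The FSTYPE column there should show vfat.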

Your efibootmgr results show that the boot order is:

Code:
BootOrder: 0001,0002,0003

Your Boot IDs show that Boot0001 is a network boot, presumably it tries a network boot first but fails.

Then it tries Boot0002 and then Boot0003

For Boot0002 the UUID of that device is shown as 2d2837e3-xxx

For Boot0003 the UUID of that device is shown as 8211994a-xxx

If you run:

Code:
blkid | grep 2d2837e3-xxx

This is going to return a /dev device (/dev/nvme0xxxx if this is an NVMe device)

And running:

Code:
blkid | grep 8211994a-xxx

is going to return another /dev device

You said /dev/md0 is your /boot/efi device.

If you do a

Code:
mdadm --detail /dev/md0

This is going to show Version: 1.0

And then it's going to show these same /dev devices as being members of /dev/md0. It will also show that /dev/md0 is a RAID1 device. That's how your mirroring is happening.
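
The relevant part of that output will look roughly like this (the numbers and partition names here are only illustrative):

Code:
/dev/md0:
           Version : 1.0
        Raid Level : raid1
      Raid Devices : 2
    Number   Major   Minor   RaidDevice State
       0     259        1        0      active sync   /dev/nvme0n1p1
       1     259        5        1      active sync   /dev/nvme1n1p1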

The EFI system is seeing the devices as 2d2837e3-xxx and 8211994a-xxx and since they are RAID1 mirrors then they are duplicated exactly.

Your EFI is going to try and boot 2d2837e3-xxx first. If it can't, then it's going to boot 8211994a-xxx. All of this presumably after it fails to boot a network boot.

If 2d2837e3-xxx dies then the system will still boot with 8211994a-xxx. You can remove the 2d2837e3-xxx device completely and the system will still boot with the 8211994a-xxx device, but of course you'll have a degraded RAID.

AFTER you replace a failed drive you need to make sure your EFI system is synced up. Presumably the UUID of the replaced 2d2837e3-xxx drive will change. You will want to make sure this UUID is listed in the efibootmgr. I would have to consult the documentation or notes for how to add a new EFI boot id via efibootmgr, but I know it can be done.
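
From memory it's something along these lines, but double-check it against the efibootmgr man page first (the disk, partition number, label and loader path here are only examples; use the values that match your own layout):

Code:
efibootmgr --create --disk /dev/nvme0n1 --part 1 --label "AlmaLinux" --loader '\EFI\almalinux\shimx64.efi'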
 
Unfortunately I don't know what you are referring to with this.
I was just under the impression that there were 2 UEFI partitions, a system partition and a boot partition, and I wrongly understood that the ESP needed to be separate, on a FAT32 partition outside of the RAID. I had read something like that on Stack Exchange, but I found the info was from 2011, so it was too old. That is what confused me.

So I don't need to worry about a separate FAT32 partition for EFI then.
If 2d2837e3-xxx dies then the system will still boot with 8211994a-xxx.
Ah... this was also what I was looking for. Because before, with MBR, this would not happen unless the command "grub2-install /dev/sdb" was given first, so that if /dev/sda was removed, it could still boot from /dev/sdb. And if /dev/sda was replaced and the RAID was rebuilt, you had to do the same with "grub2-install /dev/sda" to get the boot sector back on /dev/sda, to be able to boot from that device again.
That command was very easy.

So I was worried about how it works with EFI, because that command no longer applies with EFI.

But you say it will automatically boot from the second drive. So then it's indeed easier.

AFTER you replace a failed drive you need to make sure your EFI system is synced up.
Yes, exactly that. So that is not automatically done after rebuilding the RAID, but I can do it via efibootmgr. I just have to check that the UUID matches the ones in there. Is that correct?
If so, then I understand. Sorry for being a bit slow on this; how this RAID works with EFI is new to me.
Thank you VERY much for your time and effort! (y) (y)
 