Rebuilding a Removed / Failed RAID 10 Array in CentOS / Rocky Linux

Tuesday, February 22nd, 2022

Replace Hard Drive in a RAID 10 Array and Sync the RAID 10 Array to the New Hard Drive

I had the hardest time rebuilding a RAID 10 array after replacing a hard drive.  I didn't fail the old hard drive before removing it from the array, and sometimes, this may not be an option.  What happened in my case is the data center replaced the hard drive that I had shipped to them directly from an eBay seller.  I was hoping that the RAID array would rebuild itself onto the new drive (as I have seen happen before in some circumstances).  However, that may not happen if the replacement drive still has its old RAID array or partition information present, and then, it might be difficult to actually get the RAID array to sync to the new drive. 

In my case, I run LVM (Logical Volume Manager) for my partitions.  This complicates the RAID setup, and I found that mdadm commands didn't work as expected.  If this situation occurs, it is best to boot Rocky Linux or CentOS in recovery mode using a Rocky Linux ISO or CentOS ISO.  Once the recovery system loads, drop to a shell without mounting any file systems.  Next, you will need to deactivate your LVM volume group:

vgdisplay
vgchange -a n my_volume_group # deactivate

Next, examine your md RAID array by running the following command:

cat /proc/mdstat

After running that command, I identied my RAID devices as md126 and md127.  /dev/md127 is considered the parent even though /dev/md126 is where everything is. 

I can get more information about the RAID array by running the below commands:

mdadm --detail /dev/md126
mdadm --detail /dev/md127

Let's fail and remove any removed (no longer existing) drives using this command:

mdadm /dev/md126 --remove failed
mdadm /dev/md126 --remove detached
mdadm /dev/md127 --remove failed
mdadm /dev/md127 --remove detached

Next, we need to identify the hard drive we want to add / replace the removed drive in the array:

lsblk

From running the above command, I noticed that the new drive was /dev/sde, so I needed to wipe its old RAID configuration (if there is any) and then add it to the RAID array.

wipefs /dev/sde
mdadm --add /dev/md127 /dev/sde

Check to see if the syncing process has started:

cat /proc/mdstat

You may or may not need to run the below command to get the RAID device to start syncing to the new drive:

mdadm --grow /dev/md126 --raid-devices=4

Helpful Links:

https://delightlylinux.wordpress.com/2020/12/22/how-to-remove-a-drive-from-a-raid-array/
https://serverfault.com/questions/554553/how-to-delete-removed-devices-from-a-mdadm-raid1
https://unix.stackexchange.com/questions/53129/dev-md127-refuses-to-stop-no-open-files
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/4/html/cluster_logical_volume_manager/vg_activate
https://serverfault.com/questions/676638/mdadm-drive-replacement-shows-up-as-spare-and-refuses-to-sync
https://serverfault.com/questions/554553/how-to-delete-removed-devices-from-a-mdadm-raid1

CentOS LVM and Software RAID Partitioning Instructions

Sunday, May 30th, 2021

Installing and Configuring CentOS to Host KVM Virtual Machines

GUI

When configuring a fresh install of CentOS for a KVM host machine (the main server that hosts all of the virtual machines), I like to run a GUI to make managing some of the virtual machines easier.  Thus, during install, choose the options for CentOS with Minimal GUI:

RAID 10 LVM Partitions

When configuring the hard drive partitions, set it up to use RAID 10 LVM SOFTWARE RAID:

Create volume group called "vms" without the quotes that is setup as RAID 10 (set volume group space to be as large as possible).

Set the "/" partition to 100GB XFS LVM (RAID10).

Set the "swap" partition to 32GB.

Only setup those two partitions.  The remaining space in the RAID 10 volume group "vms" will be used for KVM containers (and the remaining space does NOT need to be assigned to any mount points).

That's all.