Cloud
https://pve.proxmox.com/wiki/ZFS:_Switch_Legacy-Boot_to_Proxmox_Boot_Tool#Switching_to_proxmox-boot-tool
https://sleeplessbeastie.eu/2017/03/06/how-to-use-hp-command-line-array-configuration-utility/
http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm
=== HP Server, Drive Replacement ===
# Remote into the server: <code>ssh root@192.168.15.2</code>
# Confirm with ZFS that a drive has failed: <code>zpool status rpool</code>. One of the drives should be marked with a state such as <code>FAULTED</code>, <code>OFFLINE</code>, or <code>REMOVED</code>.
# Confirm on the front of the server that a drive has failed: look for the red light. Note the number of the drive that has failed.
# Open up the RAID controller utility: <code>hpacucli</code>.
# You are now in the command line for the RAID controller utility. Type <code>ctrl slot=0 show config</code>. This will take a while, and should confirm that one of your logicaldrives has failed. The logicaldrive number should match the numbered bay on the front of the server.
# Physically swap the drive: push the red button on the '''correct''' drive, pull it out, and push in the new drive until it clicks. '''WARNING: pulling the wrong drive at this moment will result in ZFS redundancy failure and data loss.''' If you're really scared, shut down the computer first, and don't start it back up until you know you've swapped the correct drive.
# Now run <code>ctrl slot=0 ld <NUMBER> modify reenable forced</code>, where <code><NUMBER></code> is the bay number of the (previously) failed drive. This tells the RAID controller that everything's fine and it should just carry on with the new disk.
# Type <code>exit</code> to get out of the array configuration utility.
# Run <code>zpool status rpool</code> again, and now one of the drives should definitely be marked as <code>REMOVED</code>. Note the (very long) name of that drive in the left column.
# Replace the disk in ZFS: <code>zpool replace rpool <NAME> /dev/disk/by-id/<NAME-2></code>, where <code><NAME></code> is the name you noted in the previous step and <code><NAME-2></code> is the same name without the <code>-part3</code> suffix. For example, if <code><NAME></code> is <code>scsi-300320938420934-part3</code>, <code><NAME-2></code> should be <code>scsi-300320938420934</code>. I think the suffix would be <code>-part9</code> if a previously replaced disk failed. The exact number is not important; just drop the suffix.
# ZFS is now rebuilding (resilvering) the failed disk. The server may be slow for the next several hours. You can run <code>zpool status</code> to check on the progress. Its remaining-time estimate is far too low, so don't rely on it.
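
The ZFS side of the steps above (finding the failed device name and dropping its <code>-partN</code> suffix) can be sketched as a few shell helpers. This is only a rough sketch: the function names <code>failed_devices</code>, <code>whole_disk_name</code> and <code>replace_command</code> are made up for illustration and are not part of ZFS or <code>hpacucli</code>. It assumes the pool is called <code>rpool</code> and that device names end in <code>-part</code> followed by a number, as in the example above. It only prints the <code>zpool replace</code> command and does not run it.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of ZFS or hpacucli.

# Read `zpool status` output on stdin and print the name of every device
# whose STATE column is REMOVED, FAULTED, OFFLINE, or UNAVAIL.
# This is a rough filter on the second column, not a full parser.
failed_devices() {
    awk '$2 ~ /^(REMOVED|FAULTED|OFFLINE|UNAVAIL)$/ { print $1 }'
}

# Strip a trailing "-partN" suffix to get the whole-disk name.
# Names without such a suffix are printed unchanged.
whole_disk_name() {
    printf '%s\n' "$1" | sed 's/-part[0-9][0-9]*$//'
}

# Print (do not run) the replace command for a failed device name.
# Assumes the pool is named rpool, as in the steps above.
replace_command() {
    printf 'zpool replace rpool %s /dev/disk/by-id/%s\n' "$1" "$(whole_disk_name "$1")"
}

# Example using the device name from the steps above.
replace_command scsi-300320938420934-part3
```

On the server itself, something like <code>zpool status rpool | failed_devices</code> should list the device to pass to <code>replace_command</code>. Read the printed command and compare it against <code>zpool status</code> by eye before running it; replacing the wrong device is as bad as pulling the wrong drive.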