Replacing a disk on a ZFS pool

This is what happened to me the other day on my RAIDZ-1:

$ sudo zpool status apool -x
  pool: apool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
  Sufficient replicas exist for the pool to continue functioning in a
  degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
  repaired.
  scan: scrub repaired 0 in 3h53m with 0 errors
config:

  NAME        STATE     READ WRITE CKSUM
  apool       DEGRADED     0     0     0
    raidz1-0  DEGRADED     0     0     0
      b       ONLINE       0     0     0
      c       FAULTED      0   140     0  too many errors
      d       ONLINE       0     0    11

errors: No known data errors

It is as bad as it looks. I had a ZFS pool with 3 disks and one of them decided to fail. I had seen some SMART warnings but never gave them the attention they needed ... my bad ... now it has to be taken care of.
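
By the way, if you just want a disk's overall SMART health verdict, smartctl can give you that directly; /dev/sdX below stands for whichever device you want to check:

$ sudo smartctl -H /dev/sdX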

Identifying the device

The first step is to identify the physical device. One method is to get each device's serial number using smartctl. Obviously, if the failing disk no longer answers smartctl, you'll have to collect the healthy ones' serials and work it out by deduction.

Here's the smartctl command to get a device's serial:

$ sudo smartctl -i /dev/sdXXX | grep -i 'Serial Number'
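
If you'd rather dump every disk's serial in one go, a small loop does the trick (this assumes your disks show up as /dev/sda, /dev/sdb, and so on; adapt the glob to your setup):

$ for disk in /dev/sd?; do echo -n "$disk: "; sudo smartctl -i "$disk" | grep -i 'Serial Number'; done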

Another way is to use ledctl from the ledmon package, a small utility that lets you control storage enclosure LEDs and thus physically locate your device.
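
I won't go into details here, but the basic usage looks like this, assuming your enclosure/backplane supports it (again, /dev/sdX is a placeholder):

$ sudo ledctl locate=/dev/sdX       # blink the locate LED of the slot holding that disk
$ sudo ledctl locate_off=/dev/sdX   # stop blinking once you've found it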

Here's what the pool looks like once the failing disk has been removed:

$ sudo zpool status
  pool: apool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
  invalid.  Sufficient replicas exist for the pool to continue
  functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 4h5m with 0 errors
config:

  NAME                      STATE     READ WRITE CKSUM
  apool                     DEGRADED     0     0     0
    raidz1-0                DEGRADED     0     0     0
      b                     ONLINE       0     0     0
      11743263287665849900  FAULTED      0     0     0  was /dev/mapper/c
      d                     ONLINE       0     0     0

errors: No known data errors

It is still working, but in a degraded state, which means I'd better hope no other disk goes sick!

Replacing the disk in the pool

Once you have your new device, here's the process to replace the old disk in the ZFS pool.

$ sudo zpool status
  pool: apool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 3h55m with 0 errors
config:

        NAME                      STATE     READ WRITE CKSUM
        apool                     DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            b                     ONLINE       0     0     0
            11743263287665849900  UNAVAIL      0     0     0  was /dev/mapper/c
            d                     ONLINE       0     0     0

errors: No known data errors

First, take the old device offline:

sudo zpool offline apool 11743263287665849900
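
Side note: if you ever offline the wrong device, the opposite operation is simply zpool online:

$ sudo zpool online apool <device>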

And finally, replace it with the new disk, which here shows up under the same /dev/mapper/c name as the old one:

sudo zpool replace apool /dev/mapper/c
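
This single-argument form works because the new disk took over the old /dev/mapper/c path; if your replacement shows up under a different path, zpool replace also accepts the old and new devices explicitly (the new device name below is just a placeholder):

$ sudo zpool replace apool 11743263287665849900 /dev/mapper/new_disk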

Now the pool is rebuilding onto the new disk (known as resilvering in the ZFS world):

$ sudo zpool status
  pool: apool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress
    116G scanned out of 4.53T at 264M/s, 4h52m to go
    38.5G resilvered, 2.49% done
config:

        NAME                        STATE     READ WRITE CKSUM
        apool                       DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            b                       ONLINE       0     0     0
            replacing-1             OFFLINE      0     0     0
              11743263287665849900  OFFLINE      0     0     0  was /dev/mapper/c/old
              c                     ONLINE       0     0     0  (resilvering)
            d                       ONLINE       0     0     0

errors: No known data errors

With the above command you can see the progress (...38.5G resilvered, 2.49% done...) and the estimated time remaining: 4h52m to go.
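
If you want to keep an eye on the resilver without re-running the command by hand, watch does the job nicely:

$ sudo watch -n 60 zpool status apool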

Hopefully, you will end up with something like this: scan: resilvered 1.51T in 5h2m with 0 errors. My resilvering took a bit more than 5 hours, and after that my pool was back in shape. Thanks, ZFS!
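
One last thing: once the resilver is finished, it doesn't hurt to ask zpool for a health summary and kick off a scrub so the whole pool gets read and verified again:

$ sudo zpool status -x apool
$ sudo zpool scrub apool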