Another hard drive crashed

About six months ago I had my first drive crash on my file server. I use Western Digital Green 1.5 TB disks. They're not the best disks for RAID, but for me it's a matter of cost: I like cheap disks, and so does ZFS. Anyway, yesterday it happened again, or to be precise, the night before yesterday at about 04:00. When I woke up, op5 had sent me both e-mail and SMS alerts about it, so all I had to do was shut down the file server (no hot-swap, it's cheaper) and replace the disk. Since I always assume the worst, I had a spare disk ready in case of a crash. The RAIDZ started rebuilding at 09:00 and was done at around 05:00 today, 20 hours later. I know, it's a looong rebuild time… but at least my data is intact.
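The post doesn't show the commands used for the swap itself. On ZFS the usual sequence is zpool status -x to find the faulted device, zpool replace to start the resilver, and zpool status to follow its progress. The function below is only a sketch of that sequence: the function name and the device names (such as c7t3d0) are hypothetical, and it assumes the new disk went into the same port as the failed one, in which case zpool replace only needs the old device name.

```shell
#!/bin/sh
# Sketch of the disk-replacement step. replace_failed_disk is a hypothetical
# helper; pool and device names are placeholders, not taken from the post.
replace_failed_disk() {
  pool=$1   # e.g. storage
  old=$2    # failed device as listed by "zpool status", e.g. c7t3d0 (hypothetical)
  new=$3    # optional: only needed if the new disk appears under a different name

  zpool status -x "$pool"            # confirm which device ZFS considers faulted
  if [ -n "$new" ]; then
    zpool replace "$pool" "$old" "$new"
  else
    zpool replace "$pool" "$old"     # same slot: resilver onto the new disk
  fi || return 1                     # stop here if the replace was refused
  zpool status "$pool"               # shows the resilver progress
}
```

For a disk swapped in the same slot, the call would be something like replace_failed_disk storage c7t3d0; running zpool status again later shows how far the resilver has come.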

Mar  6 04:32:58 titan ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Mar  6 04:32:58 titan ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Mar  6 04:33:01 titan ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Mar  6 04:33:01 titan ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Mar  6 04:33:01 titan ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Mar  6 04:33:01 titan ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Mar  6 04:33:01 titan ahci: [ID 811322 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 3 reset device
Mar  6 04:33:04 titan ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Mar  6 04:33:04 titan ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery

I got these messages over and over again until the disk finally died. After the replacement, the pool status says this instead:

[Screenshot: ZFS resilvered]

ZFS calls the rebuilding process resilvering, but it's the same thing. The nice thing about ZFS is that, since it's both the volume manager and the file system, it knows which blocks hold live data and doesn't have to rebuild the entire disk. In this case I had 527 GB of data on each disk, roughly a third of the 1.5 TB capacity. A hardware RAID controller has no idea which blocks are in use, so it would have rebuilt the entire disk and taken about three times as long. Talk about waste: rebuilding blocks that don't contain anything.
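To put a number on the "three times as long" estimate, here is a rough linear calculation. It assumes rebuild time scales directly with the amount of data written and that 1.5 TB means 1500 GB; a hardware controller rebuilds sequentially and may run at a different rate than a ZFS resilver, so treat the result as a ballpark figure only. The function name is made up for this example.

```shell
#!/bin/sh
# Rough linear estimate: if resilvering 527 GB took 20 hours, how long would
# rewriting the whole 1500 GB disk take at the same rate?
estimate_full_rebuild_hours() {
  # $1 = observed hours, $2 = data actually resilvered (GB), $3 = disk size (GB)
  awk -v h="$1" -v used="$2" -v cap="$3" 'BEGIN { printf "%.1f\n", h * cap / used }'
}

estimate_full_rebuild_hours 20 527 1500   # prints 56.9
```

So at the same rate a full-disk rebuild would have taken roughly 57 hours instead of 20, about 2.8 times as long.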

Periodic zpool scrubbing

ZFS has a built-in scrub that reads all data in the pool, verifies it against its checksums and repairs errors where it can from the redundant copies. Running it regularly is pretty essential: it finds and fixes errors while they are still correctable, before another failure turns them into errors that can't be corrected. By default ZFS doesn't run scrubs periodically; you have to tell it when to scrub. The easiest way to set up periodic scrubbing is crontab, the standard UNIX facility for scheduling background tasks.
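Before automating it, you can start a scrub by hand with zpool scrub and check on it with zpool status. The sketch below wraps the status check in a small function that prints only the scan line. The name scan_line is made up for this example, and the label it looks for (scrub: on older ZFS releases, scan: on newer ones) is worth checking against your own zpool status output.

```shell
#!/bin/sh
# Sketch: print only the scan/scrub line from "zpool status" for one pool.
# scan_line is a made-up helper name, not part of ZFS.
scan_line() {
  # Older ZFS labels this line "scrub:", newer releases use "scan:".
  # Only the labelled line is printed, not any progress lines below it.
  zpool status "$1" | sed -n -e 's/^ *scan: *//p' -e 's/^ *scrub: *//p'
}

# Manual use, with "storage" as the pool name:
#   zpool scrub storage      # starts the scrub; it runs in the background
#   scan_line storage        # progress, or the result of the last scrub
#   zpool scrub -s storage   # stops a running scrub
```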

To edit root's crontab, run crontab -e as root. Each crontab line consists of five time fields followed by the command to run:

* * * * * command to run
- - - - -
| | | | |
| | | | +----- day of week (0-6) (Sunday is 0)
| | | +------- month (1-12)
| | +--------- day of month (1-31)
| +----------- hour (0-23)
+------------- min (0-59)

For example, I want to scrub my pool called storage on Sundays at 04:00 and my rpool on Mondays at 04:00. The crontab entries for this are:

0 4 * * 0 /usr/sbin/zpool scrub storage
0 4 * * 1 /usr/sbin/zpool scrub rpool

That is: minute 0, hour 4, any day of the month (*), any month (*), and day of week 0 (Sunday) for the first entry and 1 (Monday) for the second.

With this simple technique your pools get scrubbed every week without you having to remember it. Of course, crontab can run any script or program you want, not just zpool scrub.
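As an example of running a script instead of a bare command: a scrub scheduled for Sunday 04:00 could land in the middle of a 20-hour resilver like the one above. This hypothetical wrapper (call it scrub-if-idle.sh; the name and location are made up) checks zpool status first and skips the scrub if a scrub or resilver is already running. It assumes the status text contains "scrub in progress" or "resilver in progress" while a scan is running, which is worth confirming on your own system.

```shell
#!/bin/sh
# scrub-if-idle.sh (hypothetical name): start a scrub on the given pool unless
# a scrub or resilver is already running. Meant to be called from root's crontab.
ZPOOL=${ZPOOL:-/usr/sbin/zpool}   # full path, as in the crontab entries above

pool_is_busy() {
  # Assumption: "zpool status" mentions "scrub in progress" or
  # "resilver in progress" while a scan is running. If zpool status itself
  # fails, the pool is treated as idle and the scrub below reports the error.
  status=$("$ZPOOL" status "$1")
  case $status in
    *"scrub in progress"*|*"resilver in progress"*) return 0 ;;
  esac
  return 1
}

scrub_if_idle() {
  if pool_is_busy "$1"; then
    echo "$1: scrub or resilver already running, not starting a scrub" >&2
    return 0
  fi
  "$ZPOOL" scrub "$1"
}

# Only act when run with a pool name, e.g. "scrub-if-idle.sh storage".
if [ $# -gt 0 ]; then
  scrub_if_idle "$1"
fi
```

The crontab entry would then call the script instead of zpool directly, for example (path hypothetical):

0 4 * * 0 /usr/local/sbin/scrub-if-idle.sh storage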