Tag Archives: hardware

Phasing out the fileserver, replacing it with a NAS from Synology

Inevitably, this day would come: the day I got fed up with my fileserver. To be honest, it has given me nothing but trouble. Here are some of the things about my fileserver that made me lean towards a real NAS:

  • No hot swap
  • No assured compatibility with disks, HBA cards, etc.
  • Memory problems (never got to the bottom of it)
  • Random reboots
  • No web interface
  • etc.

You get it, the fileserver was hangin’ loose. After some research I decided the Synology DS1812+ had what I wanted (except for ZFS, but I didn’t expect that). What really got me interested was SHR, Synology Hybrid RAID. Read on for an explanation of how it works.

SHR – Synology Hybrid RAID

SHR makes sure all available disk space is used, wasting nothing even if you have disks of different sizes. In my case I use 5x 2 TB and 3x 1,5 TB disks. The layout looks like the picture below:

SHR-2

I use SHR-2, which is basically RAID-6: any two disks can fail simultaneously without data loss. Each 2 TB disk is partitioned into one 1,5 TB partition and one 500 GB partition. The 1,5 TB partitions from the five 2 TB disks form a RAID-6 together with the three 1,5 TB disks, which gives me 9 TB of usable space. In addition, a second RAID-6 is created from the five 500 GB partitions on the 2 TB disks, giving another 1,5 TB of usable space. The NAS then uses LVM, a Logical Volume Manager, to combine the 9 + 1,5 TB into a single volume. This gives me 10,5 TB of usable disk space with the protection of RAID-6. Pretty neat, since no space is wasted: with normal RAID, the extra 500 GB on each 2 TB disk would be unusable.
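To double-check the numbers, here’s a small Python sketch of the capacity math. It’s a simplified model of SHR-2 based on the description above, not Synology’s actual implementation: the disks are sliced at every distinct disk size, each slice becomes its own RAID-6 across all disks large enough to hold it, and slices with fewer than four members (the minimum for RAID-6) are simply counted as unusable. The function names are made up for illustration.

```python
def raid6_usable_tb(disks_tb):
    """Classic RAID-6: every disk is treated as the size of the smallest one."""
    return (len(disks_tb) - 2) * min(disks_tb)


def shr2_usable_tb(disks_tb):
    """Simplified SHR-2 model (an assumption, not Synology's code).

    Disks are sliced at every distinct disk size. Each slice ("tier") becomes
    its own RAID-6 across all disks big enough to hold it, and LVM joins the
    tiers into one volume. Tiers with fewer than four members are ignored.
    """
    usable = 0.0
    lower = 0.0
    for size in sorted(set(disks_tb)):
        members = sum(1 for d in disks_tb if d >= size)
        if members >= 4:  # RAID-6 needs at least four members
            usable += (size - lower) * (members - 2)
        lower = size
    return usable


disks = [2.0] * 5 + [1.5] * 3     # 5x 2 TB and 3x 1,5 TB
print(raid6_usable_tb(disks))     # 9.0  (the extra 500 GB per 2 TB disk is unused)
print(shr2_usable_tb(disks))      # 10.5 (9 TB tier + 1,5 TB tier)
```

Plain RAID-6 across the same eight disks treats every disk as 1,5 TB, which is exactly where the missing 1,5 TB goes.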

What do I think so far?

It does what it’s supposed to. I get all the features that made me phase out the fileserver: hot swap, assured compatibility, hopefully no hardware problems and a really good web interface. The unit feels really solid, although of course I wish it had ZFS. To be on the safe side, my plan is to buy a UPS (Uninterruptible Power Supply) to prevent data corruption from spikes in the electric grid.

I’m also able to use NFS or iSCSI to run my virtual machine storage from it. Right now I’m using iSCSI, trying out the NAS’s support for VAAI (vStorage APIs for Array Integration) offloading.

Another thing that really pleases me is the built-in SNMP agent. I can extract all the information I need and monitor it through op5 without having to install NRPE.

DS1812+

Partition alignment, how important is it?

To start with, what is alignment? I’ll illustrate it below.

Misaligned partition

This is the most common problem, and it usually occurs on older OSes like Windows XP. The partition gets offset from the physical sectors because the first 63 hidden sectors (512 bytes each, 32,256 bytes in total) are used by the boot mechanism, and 32,256 is not a multiple of 4,096. The solution is to reserve 1 MB at the beginning of the disk instead of 63 sectors. This is more than enough space for any boot info, and the partition is now aligned: 1 MB is evenly divisible by 4k, so every 4k block of the partition lines up with a physical 4k sector instead of straddling two of them.
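A quick way to see why 63 sectors misaligns the partition and 1 MB doesn’t: a partition is aligned when its starting byte offset is a multiple of the 4k physical sector size. A minimal Python sketch, assuming 512-byte logical and 4096-byte physical sectors (`is_aligned` is just an illustrative name):

```python
LOGICAL_SECTOR = 512    # sector size the disk reports to the OS
PHYSICAL_SECTOR = 4096  # real sector size of an Advanced Format disk


def is_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if a partition starting at start_sector (counted in logical
    sectors) begins exactly on a physical sector boundary."""
    return (start_sector * logical) % physical == 0


# XP-style layout: the partition starts after 63 hidden sectors
print(63 * LOGICAL_SECTOR, is_aligned(63))      # 32256 False
# 1 MB offset: the partition starts at sector 2048
print(2048 * LOGICAL_SECTOR, is_aligned(2048))  # 1048576 True
```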

There’s still a problem, though. The disk uses 4k sectors internally but, in this case, reports 512-byte sectors to the OS. This effectively means that to modify a 512-byte block, the disk has to read 4k, modify 512 bytes and write the whole 4k back again. Quite a waste, and it becomes an irritating issue with ZFS.
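The read-modify-write cost can be sketched the same way. This hypothetical helper counts how many bytes a drive with 4k physical sectors has to read and write to service one logical write; any physical sector that is only partly covered has to be read first so the untouched bytes can be written back. It’s a simplified model that ignores the drive’s cache:

```python
def rmw_cost(offset, length, physical=4096):
    """Return (bytes_read, bytes_written) the drive performs for a logical
    write of `length` bytes starting at byte `offset`."""
    first = offset // physical
    last = (offset + length - 1) // physical
    bytes_written = (last - first + 1) * physical
    bytes_read = 0
    for sector in range(first, last + 1):
        start = sector * physical
        end = start + physical
        fully_covered = offset <= start and offset + length >= end
        if not fully_covered:
            bytes_read += physical  # must read before it can rewrite
    return bytes_read, bytes_written


print(rmw_cost(0, 512))       # (4096, 4096): a 512-byte change costs a 4k read + write
print(rmw_cost(0, 4096))      # (0, 4096): an aligned 4k write needs no read
print(rmw_cost(32256, 4096))  # (8192, 8192): a misaligned 4k write straddles two sectors
```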

What about ZFS?

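ZFS uses a variable block size, and the smallest block it will use is based on the sector size the disks report when the pool is created. If a disk reports 512-byte sectors, ZFS will go down to 512-byte blocks. This can take quite a toll on performance, but there’s a way to force ZFS to use a minimum block size of 4k. I’m using a utility called gnop in FreeBSD, which creates a .nop device on top of each disk that reports 4096-byte sectors. Creating the pool on the .nop devices, like this, forces ZFS to use a minimum block size of 4k: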

# gnop create -S 4096 /dev/ada0
# gnop create -S 4096 /dev/ada1
# gnop create -S 4096 /dev/ada2
# gnop create -S 4096 /dev/ada3

# zpool create pool raidz2 /dev/ada0.nop /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop
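Under the hood, ZFS stores this minimum block size per vdev as `ashift`, the base-2 logarithm of the sector size it sees when the vdev is created, and it stays fixed afterwards. The sketch below just does that arithmetic (`ashift_for` is a made-up name); on a real pool, `zdb -C pool` should list the ashift of each vdev, so you can check that you got 12 rather than 9.

```python
def ashift_for(sector_size):
    """Return the ZFS ashift for a sector size: log2 of the smallest
    block ZFS will allocate on the vdev."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a power of two")
    return sector_size.bit_length() - 1


print(ashift_for(512))   # 9:  512-byte minimum blocks, as the raw disks report
print(ashift_for(4096))  # 12: 4k minimum blocks, as the .nop devices report
```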

What’s the impact on performance? I actually did a test with two striped SSDs, a pair of Corsair F60s that are about two years old and use SATA 2.0 (3 Gb/s).

bonnie++ with misaligned ZFS on mirrored SSDs

bonnie++ with aligned ZFS on mirrored SSDs

As expected, the read performance was the same, but the write performance was considerably higher with alignment. It went from 261 MB/s to 331 MB/s, roughly 27% faster. Quite a bump just from doing things the right way!

The firmware bug that made me crazy

My file server has been a pretty frequent topic on this blog, so it probably won’t come as a surprise that I’m bringing it up again. This time the issue was firmware, although I didn’t realize it until just a few days ago. I’ve always had some minor issues with my file server, most prominently with performance. It used to stall now and then, but from what I read, ZFS seemed to have some problems with “bursty” writes etc., so I didn’t pay much attention to it. Not until about a week ago, that is. I had to replace one of the disks in the RAID, and since I had a WD Green of the same model as the others lying around as a cold spare, I used it and started resilvering. I got about 2-5 MB/s in resilvering speed; something was definitely wrong. But what was it?

My first thought was: “This is probably an alignment problem.” I knew the disks were Advanced Format drives with 4k sectors reporting 512-byte sectors to the OS. This can cause a bit of a problem, so I decided to back up all my important data from the RAIDZ store and align the disks using the gnop utility in FreeBSD. However, this didn’t make a difference. The RAID was still AWFULLY slow.

I started reading blog after blog but couldn’t figure out what the problem was. Then it hit me: I’d heard about problems with WD Green disks a couple of years ago, and some of my disks were from around that time. I decided to test them one by one and check whether there was a difference in performance between the individual disks. My two oldest 1,5 TB disks seemed to be the problem. Now and then they stalled for about 5-20 seconds, during which no data could be read or written. This pointed to disks with the IntelliPark bug: extremely aggressive power saving that made the disks pretty much useless. I tried running a configuration tool from WD that was supposed to sort out the problem, but with no luck. The RAID seemed more or less OK with one disk with the buggy firmware, but with two of them it just went bananas and pretty much stopped working altogether.
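Testing disks one by one for stalls like these can be partly scripted. Here’s a rough Python sketch of how such a test could look (a hypothetical helper, not a tool from this post): it reads a device sequentially in 1 MB chunks and records every read that takes longer than a threshold, 5 seconds by default to match the 5-20 second stalls above. It only reads, but it needs root to open a raw device.

```python
import time


def find_stalls(path, chunk_size=1024 * 1024, max_chunks=1024, threshold_s=5.0):
    """Read up to max_chunks chunks from path and return a list of
    (chunk_index, seconds) for every read slower than threshold_s."""
    stalls = []
    with open(path, "rb", buffering=0) as dev:
        for index in range(max_chunks):
            start = time.monotonic()
            data = dev.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # reached the end of the device or file
            if elapsed >= threshold_s:
                stalls.append((index, elapsed))
    return stalls
```

For example, `find_stalls("/dev/ada1")` on FreeBSD would read the first 1 GB of ada1 (1024 chunks of 1 MB) and list any slow reads. The sketch doesn’t bypass the OS cache, so on systems that cache block devices, re-reading the same region may be served from memory and hide a stall.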

What to do? I replaced the drives, plain and simple, with two fresh WD Green 2 TB disks. OK, the other disks are 1,5 TB, so I end up losing 1 TB, but the RAID works! What about the performance? I used to get stalls and really bad bonnie++ results (on average 25 MB/s write and 100 MB/s read from the RAID).

Screen shot from a round of bonnie++

If you have a look at the screen shot above, I now get 179 MB/s write and 237 MB/s read with bonnie++ on my RAIDZ2. It’s by no means extremely fast, but it’s not bad either.

Conclusion

If you have disks from the WD Green series that perform strangely, make sure bad firmware isn’t the problem. In my case I had problems with WD15EARS-00Z5B1 using firmware 80.00A80, which seems to be the firmware causing the most problems.
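To check a whole set of disks for this model and firmware, something like the Python sketch below could parse the output of `smartctl -i`. It assumes the ATA-style `Device Model:` and `Firmware Version:` lines smartctl prints for SATA disks; the function names and the `BAD_FIRMWARE` set are illustrative, and the set only holds the one combination from this post.

```python
import re

# (model, firmware) combinations to flag; only the one from this post
BAD_FIRMWARE = {("WD15EARS-00Z5B1", "80.00A80")}


def parse_identity(smartctl_info):
    """Return (model, firmware) from `smartctl -i` text, None if missing."""
    model = re.search(r"^Device Model:\s*(.+)$", smartctl_info, re.MULTILINE)
    firmware = re.search(r"^Firmware Version:\s*(.+)$", smartctl_info, re.MULTILINE)
    return (model.group(1).strip() if model else None,
            firmware.group(1).strip() if firmware else None)


def has_known_bad_firmware(smartctl_info):
    """True if the disk matches a flagged model and firmware combination."""
    model, firmware = parse_identity(smartctl_info)
    if model is None or firmware is None:
        return False
    return any(bad_model in model and firmware == bad_fw
               for bad_model, bad_fw in BAD_FIRMWARE)
```

Feed it `subprocess.run(["smartctl", "-i", "/dev/ada0"], capture_output=True, text=True).stdout` for each disk. If a disk is accessed through a SCSI layer (like the `-d scsi` option in the file server temperature plugin), smartctl may print different field names, and the check will simply return False.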

Another reorganization

I always have a new idea on how to improve my home network. I try it out, and sometimes it isn’t as good as I hoped. My last idea, using my old Macbook Pro as a server, was one of those. Mac OS X is a wonderful OS, but with it on my server I kept feeling like my hands were tied behind my back. I had no idea how stuff was configured. Sure, it worked just fine, but I need more. I need control, and I have to know how all the services are configured! This called for a change.

I started by putting together a new ESXi server. This time it’s an AMD Athlon 64 X2 6000+ with 4 GB RAM running ESXi from a USB stick. I’m using ESXi 5.0, which is currently the latest version. All virtual machines are stored on the file server via NFS over a dedicated gigabit network, since I don’t want the storage network to be affected by normal network traffic.

Since I’ve also decommissioned my HTPC (I wrote about it in an earlier post), it has now become my home automation server. It’s used to control my Tellstick via Telldus Live, and it works really well. The Tellstick used to be connected to the Mac server, so moving it over has worked out wonderfully.

The next step is to replace the ESXi server with something that doesn’t need a nuclear plant to run (it draws a lot of power; the Athlon 64 isn’t very energy efficient). I’m thinking about AMD Fusion, or the E-350 to be more precise. CPU performance never seems to be the problem, it’s always a lack of RAM. With an E-350 and 8 GB RAM I should reach a good compromise between power consumption and reasonable performance.

The future ESXi server?

Fileserver crash

Hmm… Yesterday my fileserver crashed and I lost all my data, all because of a faulty SATA controller that corrupted everything. I didn’t realise until today that the card was the problem, so last night I installed FreeBSD instead of Solaris.

I think I’ll keep FreeBSD and run ZFS on that instead. It seems to work really well and, to be honest, FreeBSD is a lot easier to maintain than Solaris. The ports system makes my life a lot easier than the very few precompiled packages available for Solaris.

Monitor the outdoor temperature

My latest project is to monitor the outdoor temperature. Today I decided to get some cabling done and set up the probe.

To start with, I had to mount the RJ12 connector on one end of the cable.

Cable, connectors and tools

I used the wiring scheme found on this blog: http://domotica.ronnkvist.nu/how-to/1-wire/

The next step was to run the cable through the apartment. This was pretty easy and took about an hour. I tried to conceal the cabling as much as possible.

Behind the closet you go, little cable!

The final destination of the cable was my balcony, where I would join the cable to the temperature probe. The probe will then be rainproofed with silicone and placed outside.

Cable on the balcony

The temp probe

Right now the probe reports 2 degrees Celsius to op5. Success! :)

Another (very) simple Nagios plugin

I’ve modified my Nagios plugin that checks hard disk temperatures so that it checks the temperature of the first CPU core on my Mac server(s) instead.

#!/usr/bin/python
# coding: utf-8
# This script checks the temp of the first CPU core
# Requires Temperature Monitor to be installed
# It has no tolerance for errors

import os

# Get the temperature from Temperature Monitor's command-line tool (first field of its output)
sensor = int(os.popen("/Applications/TemperatureMonitor.app/Contents/MacOS/tempmonitor | awk '{print $1}'").read())

# Performance data for op5: value;warning;critical;min;max
perfdata = "|Temperature=" + str(sensor) + ";95;100;60;110"

# Print status and exit with the Nagios return code (0 = OK, 1 = WARNING, 2 = CRITICAL)
if sensor < 95:
    print("OK: Temperature: " + str(sensor) + " C" + perfdata)
    raise SystemExit(0)
elif sensor < 100:
    print("WARNING: Temperature: " + str(sensor) + " C" + perfdata)
    raise SystemExit(1)
else:
    print("CRITICAL: Temperature: " + str(sensor) + " C" + perfdata)
    raise SystemExit(2)

As always, very simple and with no error handling. It still serves its purpose, though!
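If you’d like a little more tolerance for errors, one option is to have the plugin report UNKNOWN (exit code 3 in Nagios) when the reading fails instead of crashing. A minimal sketch of a hypothetical `read_int` helper, assuming Python 3.7 or later for `subprocess.run(..., capture_output=True)`:

```python
import subprocess


def read_int(command, timeout_s=30):
    """Run a shell command and parse its output as an integer.

    Returns None instead of raising if the command times out, exits
    with a non-zero status or prints something that isn't a number.
    """
    try:
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None
```

In the CPU plugin you would then call `read_int("/Applications/TemperatureMonitor.app/Contents/MacOS/tempmonitor | awk '{print $1}'")`, and if the result is `None`, print an `UNKNOWN:` line and `raise SystemExit(3)`. Note that a pipeline’s exit status is that of its last command (awk here), so a failing tempmonitor usually shows up as empty output, which the `ValueError` branch catches.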

My last problem, which I haven’t been able to solve yet, is monitoring the CPU temperature on my file server. It seems pretty much impossible to get temperature readings from off-the-shelf motherboards. It’s a work in progress.

Writing a temperature monitoring plugin for Nagios/op5

My file server uses six disks for main storage. To make sure the drives don’t overheat for some reason, I wrote a simple plugin to monitor the drive temperatures.

#!/usr/bin/python
# coding: utf-8
# This script checks the temp of harddisks
# It has no tolerance for errors

import os

SMARTCTL = "/usr/local/src/smartmontools-5.40/smartctl"
DISKS = ["c10t0d0", "c10t1d0", "c10t2d0", "c10t3d0", "c10t4d0", "c10t5d0"]

# Read each drive's current temperature (degrees C) with smartctl
temps = []
for disk in DISKS:
    cmd = (SMARTCTL + " -d scsi -a /dev/rdsk/" + disk +
           "|grep Current|cut -d: -f 2|cut -d C -f 1|awk '{print $1}'")
    temps.append(int(os.popen(cmd).read()))

# Mean temperature of all drives, in whole degrees
total = sum(temps) // len(temps)

# Performance data for op5: value;warning;critical;min;max
perfdata = "|Temperature=" + str(total) + ";40;45;30;50"

# Print status and exit with the Nagios return code (0 = OK, 1 = WARNING, 2 = CRITICAL)
if total < 40:
    print("OK: Temperature: " + str(total) + " C" + perfdata)
    raise SystemExit(0)
elif total < 45:
    print("WARNING: Temperature: " + str(total) + " C" + perfdata)
    raise SystemExit(1)
else:
    print("CRITICAL: Temperature: " + str(total) + " C" + perfdata)
    raise SystemExit(2)

The plugin is written in Python, and I use remote SSH execution from my op5 server to run it. Here’s a brief explanation of how the plugin works.

I start by reading the temperature of each drive with smartmontools, adding the temperatures up and dividing by the number of drives. This gives me the mean temperature, stored in the variable total.
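The grep/cut/awk pipeline could also be replaced with parsing in Python. This sketch assumes the line the pipeline picks out looks like `Current Drive Temperature:     38 C`, which is what smartctl prints for SCSI-style devices; the regex and function names are my own. Unlike the script, a drive that returns no reading is skipped instead of crashing the plugin:

```python
import re

# Assumed shape of the smartctl line, e.g. "Current Drive Temperature:     38 C"
TEMP_LINE = re.compile(r"Current Drive Temperature:\s*(\d+)\s*C")


def drive_temperature(smartctl_output):
    """Return the drive temperature in whole degrees C, or None if missing."""
    match = TEMP_LINE.search(smartctl_output)
    return int(match.group(1)) if match else None


def mean_temperature(outputs):
    """Mean of all readings that could be parsed, rounded down, or None."""
    temps = [t for t in map(drive_temperature, outputs) if t is not None]
    if not temps:
        return None
    return sum(temps) // len(temps)
```

One design note: with the mean, one hot drive can hide among five cool ones. Using `max()` over the readings instead would alert on the hottest drive.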

If the temperature is below 40 degrees Celsius, everything is OK, and the plugin reports this back to op5 via the printed string and the exit value (in this case 0 for OK). If the temperature is from 40 up to 45, it returns a warning instead, and at 45 or above it returns a critical alert.
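The threshold logic is the same in both plugins, so it could be factored out into a small function. A sketch with hypothetical names that returns the Nagios exit code together with the output line, including the `value;warn;crit;min;max` performance data op5 graphs, and UNKNOWN (3) when there’s no reading:

```python
# Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(value, warn, crit, minimum, maximum):
    """Map a temperature to (exit_code, output_line).

    Below warn is OK, from warn up to (not including) crit is WARNING,
    crit or above is CRITICAL, and a missing value is UNKNOWN.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN: no temperature reading"
    if value < warn:
        state = OK
    elif value < crit:
        state = WARNING
    else:
        state = CRITICAL
    perfdata = "Temperature=%d;%d;%d;%d;%d" % (value, warn, crit, minimum, maximum)
    return state, "%s: Temperature: %d C|%s" % (STATE_NAMES[state], value, perfdata)
```

The disk plugin would call `state, output = check_temperature(total, 40, 45, 30, 50)`, then `print(output)` and `raise SystemExit(state)`; the CPU plugin would pass 95, 100, 60 and 110 instead.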

This script is VERY case specific and only works under the conditions in my file server. Feel free to use or modify the whole script or parts of it.

Temperature graph generated by op5

The result from op5 can be seen above. This is the temperature reported by the plugin over the last month.