
Monitoring display prototype working

Finally I’m home again after a week taking care of my parents’ house and dog while they were on vacation. I’ve continued working on my Arduino project, and today I connected it to op5, my network monitoring software, to see if it worked as intended. It did! Maybe you’re wondering what it’s supposed to do?

It’s basically a two-row LCD display, two LEDs and an Arduino, connected via USB to the computer running op5. A program written in Python runs on the op5 computer. It transmits the current outdoor and indoor temperatures, which are shown on the LCD. In addition, the LEDs (a green and a red one) show me the overall status of my network: if the green LED is lit everything’s OK, and if the red one is lit something’s wrong. Pretty simple.

The monitoring server

Let’s start with the monitoring server. It has a Python program running which sends data to the Arduino.

#!/usr/bin/env python3
# Send outdoor/indoor temperatures and the op5 status to the Arduino over USB serial
import os
import time

import serial  # pyserial

# Serial port (the Arduino's FTDI chip)
serial_name = "/dev/ttyUSB0"
# 1-wire server and SNMP settings
hostname = "jupiter.nickebo.net"
community = "public"
oid1 = "1.3.6.1.4.1.31440.10.5.1.1.0"  # outdoor probe
oid2 = "1.3.6.1.4.1.31440.10.5.1.1.2"  # indoor probe


def read_probe(oid):
    # Fetch one temperature via snmpget; the value is the quoted string in the reply
    cmd = ('/usr/bin/snmpget -c ' + community + ' -v 2c ' + hostname + ' ' + oid
           + ' | cut -d \\" -f 2')
    try:
        with os.popen(cmd) as f:
            return float(f.read())
    except ValueError:
        print("Could not connect to " + hostname + " or error reading probe")
        raise SystemExit(3)


def printLCD(ser, sensor, sensor2):
    # The first character is a header telling the Arduino which LCD row to use.
    # The pauses keep the messages apart so the Arduino reads them one at a time.
    time.sleep(2)
    print("Printing: Out: " + str(round(sensor, 1)))
    ser.write(('0Out: ' + str(round(sensor, 1)) + ' C').encode())
    time.sleep(2)
    print("Printing: In: " + str(round(sensor2, 1)))
    ser.write(('1In: ' + str(round(sensor2, 1)) + ' C').encode())
    time.sleep(2)
    # Fields 5, 7 and 9 of the "Services Ok" line, concatenated;
    # anything above 0 is treated as a problem
    with os.popen("/opt/monitor/bin/monitorstats | grep 'Services Ok' | awk '{print $5 $7 $9}'") as statusfile:
        status = int(statusfile.read())
    print("op5 status: " + str(status))
    if status > 0:
        print("alarm")
        ser.write(b'alarm')
    else:
        print("reset")
        ser.write(b'reset')


# Open the serial port once (the parameters differ depending on the device)
try:
    ser = serial.Serial(serial_name, 9600)
except serial.SerialException:
    print("Could not open serial port " + serial_name)
    raise SystemExit(1)

while True:
    sensor = read_probe(oid1)
    sensor2 = read_probe(oid2)
    printLCD(ser, sensor, sensor2)
    time.sleep(10)

This is pretty much a dirty hack, but it works. I use a while-loop that tries to fetch the temperature data from my 1-wire server via SNMP. If this succeeds, it calls the printLCD function, which sends the data to the Arduino. The function also checks the status of op5, sending “alarm” if something’s wrong and “reset” if everything’s OK. As you can see, the temperature messages start with either 0 or 1. This is a simple header that tells the Arduino whether the data is to be printed on line 0 or line 1 of the LCD. Unfortunately I’ve been too lazy to write proper comments in the code; this will be sorted in the final version.
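The message format described above can be summarised as a small pure function. This is a sketch, not the author’s code: `build_messages` is a hypothetical name, and `problems` stands in for the number the script parses from monitorstats. The caller still has to pause between writes, because the Arduino sketch reads whatever has arrived 100 ms after the first byte.

```python
def build_messages(outdoor, indoor, problems):
    """Return the byte strings for one update cycle, in sending order.

    Temperature messages start with a one-character row header ('0' or '1');
    the LED state is sent as the bare word "alarm" or "reset".
    """
    messages = [
        "0Out: " + str(round(outdoor, 1)) + " C",
        "1In: " + str(round(indoor, 1)) + " C",
        "alarm" if problems > 0 else "reset",
    ]
    return [m.encode("ascii") for m in messages]
```

To use it, loop over the result, calling `ser.write(msg)` followed by `time.sleep(2)` for each message.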

If you’ve managed to decipher my crappy Python code above, you might want to see the other side of the serial line, namely the code for the Arduino.

The Arduino, where the real magic happens

The Arduino is programmed in C++ (well, something VERY similar to C++). I’m using a library called LiquidCrystal to control the HD44780 LCD display, which makes it a whole lot easier. Beyond that, the challenge is handling the incoming serial data and sorting it out. Now, the code.

#include <LiquidCrystal.h>

// initialize the library with the numbers of the interface pins
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

void setup(){
  // set up the LCD's number of columns and rows:
  lcd.begin(16, 2);
  // initialize the serial communications:
  Serial.begin(9600);

  // Green LED
  pinMode(8, OUTPUT);
  // Red LED
  pinMode(9, OUTPUT);

  // Set the green LED high (assume everything is OK at start-up)
  digitalWrite(8, HIGH);
}
void loop()
{
  // when characters arrive over the serial port...
  if (Serial.available()) {
    char inData[18]; // up to 17 characters plus the terminating '\0'
    char cmdVar;     // header: which LCD row to print on
    byte index=0;
    inData[0] = '\0';
    // wait a bit for the entire message to arrive
    delay(100);
    // read all the available characters
    while (Serial.available() > 0) {
        char inChar = Serial.read(); // Always read, so the loop cannot stall
        if(index < 17) // One less than the size of the array
        {
            inData[index] = inChar; // Store it
            index++; // Increment where to write next
            inData[index] = '\0'; // Null terminate the string
        }
        // characters that don't fit in inData are discarded
    }
    if(strcmp(inData,"alarm") == 0)
    {
      // Something is wrong: red LED on, green LED off
      digitalWrite(9, HIGH);
      digitalWrite(8, LOW);
    }
    else if(strcmp(inData,"reset") == 0)
    {
      // Everything is OK: red LED off, green LED on
      digitalWrite(9, LOW);
      digitalWrite(8, HIGH);
    }
    else
    {
      // The first character is the row header, the rest is the text.
      // inData is null terminated, so inData + 1 is a valid string.
      cmdVar = inData[0];
      const char *dispVar = inData + 1;
      Serial.write(cmdVar); // echo the header back (debugging)
      if(cmdVar == '0')
      {
        //lcd.clear();
        lcd.setCursor(0,0);
        lcd.print(dispVar);
      }
      else if(cmdVar == '1')
      {
        //lcd.clear();
        lcd.setCursor(0,1);
        lcd.print(dispVar);
      }
    }
  }
}

At first glance this code is “WTF? What did he do? This is the crappiest code I’ve ever…” and yes, it IS crappy. But it works; I haven’t refined it in any way. So, what does it do? At the top I load the library and initialize the LCD, and set digital pins 8 and 9 as outputs for my LEDs. It assumes everything is OK and sets the green LED high, which means it will be lit. When data is received on the serial port (emulated via an FTDI chip over USB), it is put in inData, which is null terminated with '\0'. This is straightforward: receive the data and put it in an array. I then check whether it matches “alarm” or “reset”; if so, the red or green LED respectively is lit. If not, I place the first char in a separate variable called cmdVar. If it’s equal to 0 or 1, I print the rest of the data on row 0 or row 1. If the first char is neither 0 nor 1, nothing happens; the message isn’t valid since I don’t know which line to print it on.
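To make the receiving logic easier to follow, and to test it without hardware, here is a Python model of what the sketch does with one received chunk. It is a hypothetical illustration rather than code that runs on the Arduino; `decode` and its return values are made up for this example. The 17-character cut-off mirrors the size of the inData buffer.

```python
MAX_CHARS = 17  # inData holds 17 characters plus the terminating '\0'


def decode(chunk):
    """Return the action the Arduino sketch would take for one received chunk."""
    data = chunk[:MAX_CHARS]  # characters beyond the buffer are dropped
    if data == "alarm":
        return ("led", "red")
    if data == "reset":
        return ("led", "green")
    header, text = data[:1], data[1:]
    if header in ("0", "1"):
        return ("lcd", int(header), text)  # print text on LCD row 0 or 1
    return ("ignore",)  # unknown header: no row to print on
```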

The next step is to order everything needed, solder it all together and make a permanent installation. Hopefully I’ll get this done next week.

Arduino prototype display working


Above is the Arduino at present, displaying the current indoor and outdoor temperatures. As you can see, the green LED is lit, so everything is OK in my network. I’ll keep it like this until I’ve soldered the new unit.

Hard drive temperature check script version 0.1

I’ve more or less rewritten my Nagios script for checking hard drive temperatures. It still gives you a mean value of all the hard drives checked, but it’s now a proper script with error handling and arguments.

#!/usr/bin/env python3
# coding: utf-8
#
# Check the temperature of hard drives and calculate a mean value
# Requires smartmontools and privileges to check the disk temperatures
#
# By Marcus Wilhelmsson
# marcus@nickebo.net
# http://www.nickebo.net
# Licence GPLv2
# Version 0.1

import argparse
import os

# Set full directory and name of the smartctl binary, change if needed
smartctlbin = "/usr/local/sbin/smartctl"

# Check if above binary really exists
if not os.path.isfile(smartctlbin):
    print("Binary file for smartctl is faulty: " + smartctlbin)
    raise SystemExit(3)

# Parse arguments
parser = argparse.ArgumentParser(description='Check hard drive temperatures using smartmontools')
parser.add_argument('-w', dest="warn", type=int, help='Warning temperature')
parser.add_argument('-c', dest="crit", type=int, help='Critical temperature')
parser.add_argument('disk', nargs='*', help='Disks to check: /dev/sda /dev/sdb /dev/sdc etc.')
results = parser.parse_args()

# Store parsed arguments in variables and make sure they're not empty
warn = results.warn
crit = results.crit
disks = results.disk
if warn is None or crit is None or not disks:
    parser.print_help()
    raise SystemExit(3)  # UNKNOWN

# Make sure critical is greater than warning before checking any disks
if warn >= crit:
    print("ERROR: Critical must be greater than warning")
    raise SystemExit(3)

# Do the actual disk temperature checks
total = 0
for x in disks:
    if not os.path.exists(x):
        print("Disk " + x + " does not exist. Exiting.")
        raise SystemExit(3)
    try:
        # Field 10 of the line containing "Celsius" (Temperature_Celsius) is the raw value
        total += int(os.popen(smartctlbin + " -a " + x + " | grep Celsius | awk '{print $10}'").read())
    except ValueError:
        print("Error checking " + x + ". Is it a valid hard drive?")
        raise SystemExit(3)

# Mean value (integer division; disks is non-empty at this point)
total = total // len(disks)

# Print status with performance data: value;warn;crit;min;max
perfdata = ("Temperature=" + str(total) + ";" + str(warn) + ";" + str(crit)
            + ";" + str(warn - 5) + ";" + str(crit + 5))
if total < warn:
    print("OK: Temperature: " + str(total) + " C|" + perfdata)
    raise SystemExit(0)
elif total < crit:
    print("WARNING: Temperature: " + str(total) + " C|" + perfdata)
    raise SystemExit(1)
else:
    print("CRITICAL: Temperature: " + str(total) + " C|" + perfdata)
    raise SystemExit(2)
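The `grep Celsius | awk '{print $10}'` pipeline depends on the layout of smartctl’s SMART attribute table, where the tenth column is RAW_VALUE. If you would rather do that parsing in Python, here is a sketch under the same layout assumption; `parse_temperature` and `mean_temperature` are hypothetical helpers, not part of the script. Matching the attribute name exactly, rather than any line containing “Celsius”, keeps other lines from sneaking in.

```python
def parse_temperature(smartctl_output):
    """Return the raw Temperature_Celsius value from `smartctl -a` output.

    Assumes the usual attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    so the raw value is the tenth whitespace-separated field.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    raise ValueError("no Temperature_Celsius attribute found")


def mean_temperature(outputs):
    """Integer mean of the temperatures found in several smartctl outputs."""
    temps = [parse_temperature(out) for out in outputs]
    return sum(temps) // len(temps)
```

The output for one disk could come from, for example, `subprocess.run([smartctlbin, "-a", disk], capture_output=True, text=True).stdout` (Python 3.7 or later).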

I’ve licensed it under GNU GPL version 2.

I’ll post it on Nagios Exchange as soon as I get a confirmation mail for my account…

The script can be downloaded here.

Updated ZFS storage I/O plugin for Nagios/op5

Some time ago I wrote a plugin for op5 to measure the I/O load on a storage pool. The problem was that the script only sampled data over 10-20 seconds (if it takes too long, the script exceeds its execution time limit). The short sample time also gives “bad data”: op5 only executes the plugin once every five minutes, so a lot can happen (I/O can change a lot) during the remaining, unsampled 4 minutes and 50 seconds.

I’ve solved this with two scripts. The first runs every five minutes from cron and samples for five minutes, then writes the read/write I/O load to two temp files in /tmp. The second script reads the files in /tmp and performs the usual checks for OK, WARNING or CRITICAL found in Nagios/op5 plugins.
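One detail worth handling in this cron-plus-plugin design is that the plugin could read a file while the sampler is rewriting it, or keep reading an old value if the cron job stops running. Here is a minimal sketch of both safeguards, assuming Python 3.3 or later. `write_sample`, `read_sample` and the 10-minute `max_age` are hypothetical choices, not part of the original scripts. The sampler writes to a temporary file and renames it into place with `os.replace`, and the plugin treats a file older than `max_age` as missing.

```python
import os
import tempfile
import time


def write_sample(path, value):
    """Write value to path atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(str(value) + "\n")
    # mkstemp creates the file as 0600; make it readable for the plugin's user
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)  # atomic on POSIX: readers see old or new, never partial


def read_sample(path, max_age=600):
    """Return the stored float, or None if the file is missing, unreadable or stale."""
    try:
        if time.time() - os.path.getmtime(path) > max_age:
            return None
        with open(path) as f:
            return float(f.read())
    except (OSError, ValueError):
        return None
```

In the plugin, a `None` from `read_sample("/tmp/zfsread.txt")` would map naturally to UNKNOWN (exit code 3).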

The first script:

#!/usr/bin/env python3
# Sample the read/write bandwidth of the "storage" pool over ~5 minutes
# and store the values (in MB/s) in /tmp for the Nagios/op5 plugin.

import os
import threading

# zpool iostat prints bandwidth with 1024-based suffixes; factors to MB
FACTORS = {'K': 1.0 / 1024, 'M': 1.0, 'G': 1024.0}


def to_mb(value):
    # Convert e.g. "512K", "1.5M" or "0" to MB/s, rounded to two decimals
    value = value.strip()
    if value[-1] in FACTORS:
        return round(float(value[:-1]) * FACTORS[value[-1]], 2)
    return round(float(value) / (1024 * 1024), 2)  # no suffix: bytes/s


def sample(column, outfile):
    # "iostat 299 2" prints two reports; the last one covers the 299 s interval.
    # Column 6 is read bandwidth and column 7 is write bandwidth.
    cmd = ("/usr/sbin/zpool iostat 299 2 | tail -2 | grep storage | awk '{print $"
           + str(column) + "}'")
    with os.popen(cmd) as p:
        mb = to_mb(p.read())
    with open(outfile, "w") as f:
        f.write(str(mb) + "\n")


class readThread(threading.Thread):
    def run(self):
        sample(6, "/tmp/zfsread.txt")


class writeThread(threading.Thread):
    def run(self):
        sample(7, "/tmp/zfswrite.txt")


readThread().start()
writeThread().start()

This script might need some explanation. It starts two threads, one sampling the read I/O load and one the write I/O load, and they run simultaneously. Why use threading? In this script there’s really no point, except that I wanted to see how it’s done in Python. Each thread uses zpool iostat with some grep/awk magic to get the load in MB/s or KB/s. I then convert any KB/s values to MB/s so that everything is in MB/s, and write the result to a file in /tmp.
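Since the threads don’t buy anything here, both values could also come from a single `zpool iostat` run. Here is a sketch that parses one pool line into read and write MB/s. It assumes the default column layout (pool, alloc, free, read/write operations, read/write bandwidth) and 1024-based unit suffixes, and `parse_iostat_line` is a hypothetical name. Multiplying by the suffix’s factor also handles small values such as `12K` (about 0.012 MB/s).

```python
# Bandwidth suffixes used by zpool iostat (assumed 1024-based), as MB factors
FACTORS = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}


def to_mb(value):
    """Convert a value such as '512K', '1.5M' or '0' to MB/s."""
    if value[-1] in FACTORS:
        return float(value[:-1]) * FACTORS[value[-1]]
    return float(value) / (1024 * 1024)  # no suffix: plain bytes/s


def parse_iostat_line(line, pool="storage"):
    """Return (read_mb, write_mb) from one `zpool iostat` line for `pool`."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != pool:
        raise ValueError("not an iostat line for pool " + pool)
    # fields[5] is read bandwidth, fields[6] is write bandwidth
    return round(to_mb(fields[5]), 2), round(to_mb(fields[6]), 2)
```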

Script no. 2 (the script called from Nagios/op5, preferably via NRPE):

#!/usr/bin/env python3
# Nagios/op5 plugin: check the ZFS read/write load sampled by the cron script

# Read the values written by the sampling script; UNKNOWN if they are missing
try:
    with open("/tmp/zfsread.txt") as f:
        zfsread = float(f.read())
    with open("/tmp/zfswrite.txt") as f:
        zfswrite = float(f.read())
except (OSError, ValueError):
    print("UNKNOWN: Could not read /tmp/zfsread.txt or /tmp/zfswrite.txt")
    raise SystemExit(3)

perfdata = ("Read=" + str(zfsread) + ";120;130;0;120 Write=" + str(zfswrite)
            + ";120;130;0;120")
message = "Read: " + str(zfsread) + " MB/s Write: " + str(zfswrite) + " MB/s|" + perfdata

# Print status; test CRITICAL first, since anything above 130 is also above 120
if zfsread > 130 or zfswrite > 130:
    print("CRITICAL: " + message)
    raise SystemExit(2)
elif zfsread > 120 or zfswrite > 120:
    print("WARNING: " + message)
    raise SystemExit(1)
else:
    print("OK: " + message)
    raise SystemExit(0)

The second script reads the data from the temp files and checks it against the WARNING and CRITICAL thresholds.
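Because a value above 130 is also above 120, the CRITICAL test has to come before the WARNING test; otherwise CRITICAL can never be reached. Here is a sketch of that threshold logic as a reusable function, assuming the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). `check_thresholds` is a hypothetical name.

```python
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def check_thresholds(values, warn, crit):
    """Return (exit_code, label) for the worst of `values` against warn/crit."""
    worst = max(values)
    if worst > crit:  # test the stricter threshold first
        code = 2
    elif worst > warn:
        code = 1
    else:
        code = 0
    return code, STATES[code]
```

For this plugin the call would be `check_thresholds([zfsread, zfswrite], 120, 130)`.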

The last step is to add a line to crontab:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/marcus/plugins/zfs_run_bg.py

This gives me a sample window of five minutes, which is much more accurate, and the sampling continues even when the Nagios/op5 plugin isn’t being run.

Storage I/O from op5
