Thursday, 30 May 2013

Get a list of files that exist on a website via curl and strip out HTML code

The following can be used to display a list of the .csv.gz files linked from a web page, with all of the surrounding HTML stripped out:-
 curl --silent http://www.theurl.com/thefiles/ | egrep -o '<a href="[^"]*\.csv\.gz"' |
 sed 's/<a href="\([^"]*\)"/\1/'

The --silent flag in curl suppresses the progress information and any error messages.
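
The same extraction can be done inside a script rather than a shell pipeline. The following is a minimal Ruby sketch, assuming the links are written as <a href="..."> with double quotes, as in the pipeline above. The method name extract_links and its suffix parameter are made up for illustration.

```ruby
# Return the href values of links whose target ends in the given suffix.
# Assumes anchors of the form <a href="..."> with double-quoted values.
def extract_links(html, suffix = '.csv.gz')
  pattern = /<a\s+href="([^"]*#{Regexp.escape(suffix)})"/i
  # scan returns one single-element array per match, so flatten the result
  html.scan(pattern).flatten
end
```

The page itself could be fetched with Net::HTTP.get(URI('http://www.theurl.com/thefiles/')) from the standard net/http library and the result passed to extract_links.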

Ruby - Check if a port is open

This requires the socket library, so the following needs to be included at the top of the program:-
 require 'socket'  
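The code below can be used to check whether a Linux host is listening on a particular port. It keeps retrying for up to timeout seconds before giving up.
 def port_open?(ip, port, timeout)
  start_time = Time.now
  current_time = start_time
  # Keep retrying until the timeout (in seconds) has elapsed
  while (current_time - start_time) <= timeout
   begin
    # Close the socket straight away; we only care that the connect succeeded
    TCPSocket.new(ip, port).close
    return true
   rescue Errno::ECONNREFUSED
    # Nothing is listening yet, so wait briefly before trying again
    sleep 0.1
   end
   current_time = Time.now
  end
  return false
 end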
This can be called with the following:-
 port_open?(Socket.gethostname, 80, 10)  
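
The port_open? method only retries when the connection is refused, and TCPSocket.new can block for a long time if a firewall silently drops packets. The following is a minimal sketch of a variant that uses Socket.tcp with its connect_timeout option so that the attempt is bounded. The name port_reachable? and the decision to treat every connection error as "not open" are assumptions made for illustration.

```ruby
require 'socket'

# Hypothetical variant of port_open?: one attempt, bounded by connect_timeout.
def port_reachable?(host, port, timeout)
  # With a block, Socket.tcp closes the socket once the block returns
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  # Refused, unreachable, timed out or unresolvable: all treated as "not open"
  false
end
```

This makes a single attempt rather than polling, so it suits a quick "is it up right now" test more than waiting for a service to start.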

Creating Directories With Chef

Creating directories using the automation tool Chef is fairly easy to do. Unfortunately, when a directory is created recursively, the requested owner, group and mode are only applied to the last directory in the path. E.g. if you created /tmp/foo/bar, only bar would have the correct permissions. This seems to me to be a bug and it has been logged with Chef, but at the time of writing the version I am using (10) still has it. The bug is logged here:-

https://tickets.opscode.com/browse/CHEF-1327

This could be worked around in multiple ways. One possible solution is to wrap the directory resource in a method and call it once for each level of the directory. For example:-
 def make_dir(dir_path)
  directory dir_path do
   # username and usergroup are assumed to be defined elsewhere in the recipe
   owner username
   group usergroup
   mode "0755"
   recursive true
   action :create
  end
 end

This saves us from writing out the full Chef directory resource for each level of the directory. It could be called like so:-
 make_dir("/tmp/foo")  
 make_dir("/tmp/foo/bar")  

Hopefully this will be fixed in future versions, but in the meantime having the directory resource within a Ruby method similar to the above will save lines of code.
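
For deeper paths, the list of levels can be generated rather than typed out. The following is a minimal plain-Ruby sketch that expands a path into each level below a base directory which is left untouched (so that /tmp itself is not re-owned). The method name directory_levels and its two parameters are made up for illustration.

```ruby
require 'pathname'

# Return every directory level between base (exclusive) and full_path (inclusive).
def directory_levels(base, full_path)
  base_path = Pathname.new(base).cleanpath
  relative = Pathname.new(full_path).cleanpath.relative_path_from(base_path)
  return [] if relative.to_s == '.'
  raise ArgumentError, "#{full_path} is not below #{base}" if relative.to_s.start_with?('..')

  levels = []
  # descend yields "foo", then "foo/bar", and so on
  relative.descend { |part| levels << (base_path + part).to_s }
  levels
end
```

In a recipe this could then be combined with the method above, e.g. directory_levels("/tmp", "/tmp/foo/bar").each { |dir| make_dir(dir) }.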

Source Code Formatting

I am using the following to format source code, Linux commands etc:-

http://codeformatter.blogspot.co.uk

Creating a Custom Nagios/Opsview Plugin

This was done using Nagios with the Opsview front end, version 4.2.3.

Scripts are generally stored in the following location:-
 /usr/local/nagios/libexec 
Write the script in your language of choice and ensure it is executable by the nagios user.

The script must print a text-based message, which Nagios can display, and then exit with a code that signals an OK, WARNING, CRITICAL or UNKNOWN status.

Exit Codes

  • 0 - This tells Nagios that the check has passed and is OK
  • 1 - This tells Nagios that the check has a problem but it is just a WARNING
  • 2 - This tells Nagios that the check has a problem that is CRITICAL
  • 3 - This generally means there has been a problem running the check; it will display as UNKNOWN in Nagios

Bash Script

The script could be written in Bash, in which case you just need to echo the message and then exit with the relevant code. For example, if the check is okay and you wish to exit, do the following:-
 echo "The check has passed"  
 exit 0  
The script could also be written in other languages. For example, the same as above in Python:-
 import sys
 print('The check has passed')
 sys.exit(0)
The same as above in Ruby:-
 puts "The check has passed"  
 exit 0  
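
A real check usually compares a measured value against thresholds before choosing its exit code. The following is a minimal Ruby sketch of that pattern using the four exit codes listed above. The method names evaluate_check and run_check, the threshold parameters and the message wording are all made up for illustration.

```ruby
# Exit codes understood by Nagios
EXIT_CODES = { 'OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3 }.freeze

# Decide the status and message for a value (higher is worse in this sketch).
def evaluate_check(value, warn_at, crit_at)
  return ['UNKNOWN', 'no value could be read'] unless value.is_a?(Numeric)
  return ['CRITICAL', "value is #{value} (critical at #{crit_at})"] if value >= crit_at
  return ['WARNING', "value is #{value} (warning at #{warn_at})"] if value >= warn_at

  ['OK', "value is #{value}"]
end

# Print the message for Nagios to display and return the matching exit code.
def run_check(value, warn_at, crit_at)
  status, message = evaluate_check(value, warn_at, crit_at)
  puts "#{status} - #{message}"
  EXIT_CODES.fetch(status)
end
```

A plugin would finish with something like exit run_check(measured_value, 80, 90), where measured_value is whatever the script has collected.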
Once the script is thoroughly tested it needs to be linked to Nagios. You should be aiming for a script which takes a short amount of time to run, i.e. under 10 seconds, although the timeout can be extended if necessary. Since this is a local check, a config file needs to be put in the nrpe_local directory. The contents of this file could be something like the following:-
 cat /usr/local/nagios/etc/nrpe_local/new_check.sh  
The NRPE agent service will now need restarting so that it picks up the new file.

Opsview Configuration

Go into Settings > Advanced > Service Checks and add a new check. The two fields you are interested in are plugin, which should be check_nrpe, and arguments, which should be:-
 -H $HOSTADDRESS$ -c new_check  
Note that this can be tested on the host itself with the following:-
 /usr/local/nagios/libexec/check_nrpe -H `hostname` -c new_check

If this works, it should work in Opsview too. If there is a timeout problem, -t can be appended to the above with a figure in seconds, e.g. -t 30; the default is 10.
You then need to associate the check with the relevant hosts. To do this, go to Settings > Basic > Hosts, search for the host, double-click to amend it, go to the monitoring tab, expand the relevant service group and select the new check.

After this you need to update the configuration. To do this, go to Settings > Configuration > Apply Changes and click Reload Configuration. Once the configuration has reloaded and the check has run, you should see the result.