crontab monitoring tools

Description

I have recently written two utilities designed to work in crontab in concert with our venerable cronmail. As a package, they can be used to set up very simple monitoring of practically anything. Their key feature is minimizing emails (in the manner of systest), which makes them exactly the opposite of argus.

Note that although they are being used in production, these are preliminary versions, and I plan to add some more features (like regexp predicates for threshold, units other than days for changes, etc.).

A brief tutorial

Here's how I monitor the disk space in luna-archive.

This command gives me the disk usage:

$ du -sk /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
    

I want to know if it goes over roughly 5G. Since I ask du for Kbytes (-k), I use the threshold predicate -gt 5291456. The "sense" of threshold is to complain when a threshold is violated, so "no news is good news": if all is well, it produces no output. This fits with cronmail, so the simplest model invocation is:

      TEST | threshold ... | cronmail LUSER
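
A note on the -gt parameter: du -sk reports Kbytes, so a limit expressed in Gbytes has to be converted before it is handed to threshold. Here is a minimal sketch of that arithmetic; gb_to_kb is a hypothetical helper, not part of the package. Exactly 5G comes to 5242880 Kbytes, so the 5291456 used in this tutorial sits slightly above that.

```shell
# gb_to_kb is a hypothetical helper (not part of threshold/changes/cronmail).
# It converts whole Gbytes to the Kbyte units that du -sk reports.
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

gb_to_kb 5    # prints 5242880
```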
    

Let's pretend we are over the luna limit right now:

$ LUNA=/zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$ ssh t2 du -sk $LUNA |
  threshold -gt 5291456 Outta space

----------------------------------------------------------------
Outta space
----------------------------------------------------------------
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$

Note that in the simplest case, threshold formats the command-line message, followed by whatever was on stdin. The -i (ID) and -s (stamp) options can dress it up a bit:

$ ssh t2 du -sk $LUNA |
  threshold -gt 5291456 -i FEH  -s Outta space
mystique.lib.uchicago.edu | threshold -gt 5291456


-- FEH ---------------------------------------------------------
Outta space
----------------------------------------------------------------
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$

If the stdin (the output of the TEST program) is likely to be more confusing than helpful, you can zap it with -z:

$ ssh t2 du -sk $LUNA |
  threshold -gt 5291456 -i FEH  -s -z Outta space
mystique.lib.uchicago.edu | threshold -gt 5291456

Outta space

Now, if you are running this as a cronjob every 5 minutes, you are going to get way too much mail! The changes program can be used to cut this down to a reasonable amount. It simply manages a state file that contains the previous value of stdin and compares the current stdin to it. If stdin has changed, it copies the differences to stdout (using systest-style changebars); if things are the same as last time, it remains silent, unless it's been "too long" since it last complained. It also manages the boundary conditions. Like threshold, changes assumes that "no news is good news", so when stdin goes quiet (i.e. empty), it assumes all is well (and optionally sends a final reassuring message).
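
To make the state-file idea concrete, here is a rough sketch of the core comparison only. It is an illustration under assumptions, not the real changes: changes_sketch is a hypothetical name, and the sketch leaves out changebars, the "too long" timer, and the reassuring message.

```shell
# changes_sketch STATEFILE
# Hypothetical stand-in for the core idea of changes (NOT the real program):
# print stdin only if it differs from what was seen on the previous run.
changes_sketch() {
    state=$1
    current=$(cat)                          # this run's stdin
    previous=$(cat "$state" 2>/dev/null)    # last run's stdin (empty if no state yet)
    if [ "$current" != "$previous" ]; then
        printf '%s\n' "$current" > "$state" # remember the new value
        # stay quiet when the input has gone empty ("no news is good news")
        [ -n "$current" ] && printf '%s\n' "$current"
    fi
    return 0
}
```

Run twice with the same input and the same state file, the sketch prints the input the first time and nothing the second time.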

Simplest usage is to just give it a state file and stick it in the pipe:

$ ssh t2 du -sk $LUNA |
  threshold -gt 5291456 -i FEH  -s -z Outta space |
  changes -s /tmp/luna-state
mystique.lib.uchicago.edu | threshold -gt 5291456

Outta space
$ ssh t2 du -sk $LUNA |
  threshold -gt 5291456 -i FEH  -s -z Outta space |
  changes -s /tmp/luna-state
$

NB: no output on the second run, because it was the same as the first time.
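
Putting the whole pipe under cron might look like the sketch below. The function name luna_check, the wrapper path in the comment, and the mail subject are made up for illustration; LUSER is the same placeholder recipient as in the model invocation. The crontab line in the comment spells out the minutes explicitly, on the assumption that not every cron understands the */5 shorthand.

```shell
# Hypothetical wrapper for cron. A crontab line for a script containing
# this function (and a final call to it) might be:
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /path/to/luna-check
LUNA=/zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog

luna_check() {
    # TEST | threshold | changes | cronmail: mail goes out only when the
    # limit is exceeded AND the report differs from last time.
    ssh t2 du -sk "$LUNA" |
        threshold -gt 5291456 -i FEH -s -z Outta space |
        changes -s /tmp/luna-state |
        cronmail -s "luna archivelog" LUSER
}
```

A real wrapper script would end with a call to luna_check; it is left out here so that the block only defines the function.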

If you want to be notified again that things are still amiss, even if nothing has changed, add the -t parameter, which is a threshold in days; the default is 1, so everything gets re-reported after one day.
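
This page does not say how changes keeps track of elapsed time. One plausible way, shown here purely as an assumption, is to look at the age of the state file. state_is_stale is a hypothetical helper and expects DAYS to be 1 or more.

```shell
# state_is_stale STATEFILE DAYS
# Hypothetical helper: succeeds if STATEFILE was last modified at least
# DAYS days ago. Assumption: elapsed time is judged by the file's mtime;
# the real changes may track this differently.
state_is_stale() {
    # find's "-mtime +N" matches files whose age in whole days exceeds N,
    # so "+(DAYS-1)" means "at least DAYS full days old".
    [ -n "$(find "$1" -mtime +"$(( $2 - 1 ))" 2>/dev/null)" ]
}
```

With the default of 1 day, a state file touched a minute ago is not stale, while one left untouched since yesterday morning is.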

Note that if threshold is passing on the stdin of the test program, and if it changes "trivially" (like the number of Kbytes in the example du output above), then changes will consider the input different and produce output again. If this isn't what you want then you need to either:

  1. change the output from 5727280 to "TOO BIG" or the like before changes sees it (either in your test, or via sed after threshold), or
  2. use threshold's -z option
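
Option 1 can be as small as a one-line sed filter. The sketch below assumes the du -sk output format shown earlier (a number, then the path); normalize_size is a hypothetical name.

```shell
# normalize_size: hypothetical filter that replaces a leading number on
# each line with the fixed text TOO BIG, so that small changes in the
# byte count no longer look like a change to a downstream comparison.
normalize_size() {
    sed 's/^[0-9][0-9]*/TOO BIG/'
}

echo '5727280 /some/path' | normalize_size    # prints: TOO BIG /some/path
```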

cronmail usage

cronmail: mail stdin to recipients IFF stdin is non-empty

    Usage: cronmail [-H] [-s subject] [address ...]
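
The "IFF stdin is non-empty" behaviour is the whole trick, and it can be sketched in a few lines. This is an illustration only, not the real cronmail: cronmail_sketch and the MAILER variable are hypothetical, the argument order is simplified (subject first, then addresses), and mailx is assumed as the default mailer.

```shell
# cronmail_sketch SUBJECT ADDRESS...
# Hypothetical stand-in (NOT the real cronmail): mail stdin to the given
# addresses, but only if stdin is non-empty.
cronmail_sketch() {
    subject=$1; shift
    body=$(cat)                   # collect everything on stdin
    [ -n "$body" ] || return 0    # nothing to say, so send nothing
    printf '%s\n' "$body" | "${MAILER:-mailx}" -s "$subject" "$@"
}
```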

threshold usage

threshold: test stdin and complain if it exceeds some threshold

    Usage: threshold [-gt|-lt|-eq|-ne PARM]|-empty|-nonempty [-f MSGFILE][-i ID][-s][-z] [MSG ...]; -H for help
    This is threshold 6f2f45c8b2d6+ by Keith Waclena <http://www.lib.uchicago.edu/keith/>

    Each command starts with a predicate; the sense is, if the predicate
    succeeds, there is a problem.  For example, use -empty if an empty
    stdin indicates trouble.

    -empty and -nonempty only care about whether stdin is empty or
    non-empty and don't pay any attention to its structure.

    -empty      report if stdin is empty
    -nonempty   report if stdin is non-empty

    For -gt, -lt, -eq and -ne, line 1 (only) of stdin is parsed into
    whitespace separated words.  Leading whitespace is trimmed.  Only $1
    is used and compared to the parameter PARM.

    -gt     report if stdin is >     PARM (numeric comparison)
    -lt     report if stdin is <     PARM (numeric comparison)
    -eq     report if stdin is =  to PARM (string comparison)
    -ne     report if stdin is <> to PARM (string comparison)

    If the test fails, no output is written to stdout.  If it succeeds, MSG...
    is written, followed by a blank line and the contents of stdin.

    -f MSGFILE  use contents of MSGFILE instead of MSG...
    -i ID   mark test output with identifying ID
    -s      stamp output with a line identifying hostname, cmd line args, timestamp
    -z      don't include (zap) stdin in report; print MSG... only
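
As an illustration of the -gt rule described in the help text above (first word of line 1, compared numerically, report only if the predicate succeeds), here is a rough sketch. gt_sketch is a hypothetical stand-in, not the real threshold: it handles only -gt and none of the -f, -i, -s or -z options, and it follows the help text rather than the ruled-off layout seen in the tutorial output.

```shell
# gt_sketch PARM MSG...
# Hypothetical stand-in for "threshold -gt PARM MSG..." (NOT the real program).
gt_sketch() {
    parm=$1; shift
    input=$(cat)                                           # slurp stdin
    first=$(printf '%s\n' "$input" | awk 'NR==1 {print $1}')  # $1 of line 1
    # Numeric comparison; a non-numeric or empty first word counts as "no problem".
    if [ "$first" -gt "$parm" ] 2>/dev/null; then
        printf '%s\n\n%s\n' "$*" "$input"                  # MSG, blank line, stdin
    fi
}

# Over the limit: prints "Outta space", a blank line, then the du line.
printf '5727280 /some/path\n' | gt_sketch 5291456 Outta space
```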

changes usage

changes: monitor stdin over time, reporting changes

    Usage: changes [-l][-r][-t THRESHOLD][-v] -s STATEFILE; -H for help
    This is changes 65f83b503fdd+ by Keith Waclena <http://www.lib.uchicago.edu/keith/>

    -l      prepend legend to beginning of output (some people don't grok changebars!)
    -r      display reassuring message
    -t THRESHOLD    if no changes, print anyway if THRESHOLD exceeded

Installation

These tools are not yet under jinn. To install manually, for now, do this:

for prog in threshold changes changebars
do
    cp ~keith/src/$prog/$prog /host/bin/
done

Let me know if there are portability problems. Patches would be appreciated.

Author

Keith Waclena

Language

Bourne Shell


This page was last generated on 21 March 2014 at 11:03:02 am by dldc-info
The URL of this page is http://www.lib.uchicago.edu/staffweb/depts/dldc/programming/crontab-monitoring.html