I have recently written two utilities, threshold and changes, designed to work in crontab in concert with our venerable cronmail. As a package, they can be used to set up very simple monitoring of practically anything. The key feature is minimizing emails (in the manner of systest), which makes them exactly the opposite of argus.
Note that while they are being used in production, they are preliminary versions and I plan to add some more features (like regexp predicates for threshold, units other than days for changes, etc).
Here's how I monitor the disk space in luna-archive.
This command gives me the disk usage:
$ du -sk /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
I want to know if it goes over 5G, so (nb: I ask du for Kbytes) I use the threshold predicate -gt 5291456. The "sense" of threshold is to complain when a threshold is violated, so "no news is good news": if all is well, it produces no output. This fits with cronmail; the simplest, model invocation is:
TEST | threshold ... | cronmail LUSER
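Since du -sk reports Kbytes (1024-byte units), the number given to -gt has to be in Kbytes too. A minimal sketch of that conversion follows; gib_to_kb is a made-up helper name, not part of threshold. For 5 it prints 5242880, and the example in this page uses 5291456, which sits a little above that.

```shell
# gib_to_kb N: print N GiB expressed in Kbytes, the unit du -k reports.
# Hypothetical helper for illustration; not part of threshold.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

gib_to_kb 5    # prints 5242880
```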
Let's pretend we are over the luna limit right now:
$ LUNA=/zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$ ssh t2 du -sk $LUNA | threshold -gt 5291456 Outta space
----------------------------------------------------------------
Outta space
----------------------------------------------------------------
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$
Note that in the simplest case, threshold formats up the cmdline message, and whatever was on stdin. -i and -s can sexy it up a bit:
$ ssh t2 du -sk $LUNA | threshold -gt 5291456 -i FEH -s Outta space
mystique.lib.uchicago.edu | threshold -gt 5291456
-- FEH ---------------------------------------------------------
Outta space
----------------------------------------------------------------
5727280 /zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog
$
and if the stdin (the output of the TEST program) is likely to be more confusing than helpful, you can zap it with -z:
$ ssh t2 du -sk $LUNA | threshold -gt 5291456 -i FEH -s -z Outta space
mystique.lib.uchicago.edu | threshold -gt 5291456
Outta space
Now, if you are running this as a cronjob every 5 minutes, you are going to get way too much mail! The changes program can be used to cut this down to a reasonable amount. It simply manages a state file that contains the previous value of stdin and compares the current stdin to it. If stdin has changed, it copies the differences to stdout (using systest-style changebars); if things are the same as last time, it remains silent unless it's been "too long" since it last complained. It also manages the boundary conditions. Like threshold, changes assumes that "no news is good news", so when stdin goes quiet (i.e. empty), it assumes all is well (and optionally sends a final reassuring message).
Simplest usage is to just give it a state file and stick it in the pipe:
$ ssh t2 du -sk $LUNA | threshold -gt 5291456 -i FEH -s -z Outta space | changes -s /tmp/luna-state
mystique.lib.uchicago.edu | threshold -gt 5291456
Outta space
$ ssh t2 du -sk $LUNA | threshold -gt 5291456 -i FEH -s -z Outta space | changes -s /tmp/luna-state
$
NB: no output on the second run, because it was the same as the first time.
If you want to be notified again that things are amiss, even if nothing has changed, add a -t parm, which is a threshold in days; the default is 1, so that everything gets re-reported after one day.
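For cron, the whole pipeline can live in a small wrapper script. The following is only a sketch: the script name check-luna, its run argument, and the crontab line in the comment are made up for illustration. It assumes threshold, changes and cronmail are on cron's PATH and that the cron user can ssh to t2 without a password.

```shell
#!/bin/sh
# check-luna: hypothetical wrapper around the pipeline from the examples.
# Assumes threshold, changes and cronmail are on PATH (e.g. in /host/bin).
#
# Example crontab line, every 5 minutes (minutes spelled out because not
# every cron understands */5):
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /host/bin/check-luna run

LUNA=/zones/luna-zone/local/oracle/flash_recovery_area/INSIGHT/archivelog

check_luna() {
    ssh t2 du -sk "$LUNA" |
        threshold -gt 5291456 -i FEH -s -z Outta space |
        changes -t 1 -s /tmp/luna-state |
        cronmail LUSER
}

# Only run the check when asked to, so the file can also be sourced.
if [ "${1:-}" = run ]; then
    check_luna
fi
```

The -t 1 just spells out the default of one day; replace LUSER with the real recipient.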
Note that if threshold is passing on the stdin of the test program, and if it changes "trivially" (like the number of Kbytes in the example du output above), then changes will consider the input different and produce output again. If this isn't what you want, you need to keep the varying text away from changes, for example by zapping stdin with -z as in the examples above.
cronmail: mail stdin to recipients IFF stdin is non-empty
Usage: cronmail [-H] [-s subject] [address ...]
threshold: test stdin and complain if it exceeds some threshold
Usage: threshold [-gt|-lt|-eq|-ne PARM]|-empty|-nonempty [-f MSGFILE][-i ID][-s][-z] [MSG ...]; -H for help
This is threshold 6f2f45c8b2d6+ by Keith Waclena <http://www.lib.uchicago.edu/keith/>
Each command starts with a predicate; the sense is, if the predicate
succeeds, there is a problem. For example, use -empty if an empty
stdin indicates trouble.
-empty and -nonempty only care about whether stdin is empty or
non-empty and don't pay any attention to its structure.
-empty report if stdin is empty
-nonempty report if stdin is non-empty
For -gt, -lt, -eq and -ne, line 1 (only) of stdin is parsed into
whitespace separated words. Leading whitespace is trimmed. Only $1
is used and compared to the parameter PARM.
-gt report if stdin is > PARM (numeric comparison)
-lt report if stdin is < PARM (numeric comparison)
-eq report if stdin is = to PARM (string comparison)
-ne report if stdin is <> to PARM (string comparison)
If the test fails, no output is written to stdout. If it succeeds, MSG...
is written, followed by a blank line and the contents of stdin.
-f MSGFILE use contents of MSGFILE instead of MSG...
-i ID mark test output with identifying ID
-s stamp output with a line identifying hostname, cmd line args, timestamp
-z don't include (zap) stdin in report; print MSG... only
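To make the -gt rule in the help text concrete, here is a rough shell sketch of the predicate as described: take line 1 of stdin, use only its first whitespace-separated word, and compare it numerically to PARM. The function name first_word_gt is made up, and this is not the actual threshold code, which may well work differently.

```shell
# first_word_gt PARM: sketch of the -gt predicate described in the help text.
# Reads line 1 of stdin, takes its first whitespace-separated word, and
# succeeds (exit 0) when that word is numerically greater than PARM.
# Hypothetical stand-in; the real threshold may be implemented differently.
first_word_gt() {
    # read trims leading whitespace and splits on whitespace; $rest is unused
    read -r first rest || [ -n "$first" ] || return 1
    # a non-numeric first word makes the test fail quietly
    [ "$first" -gt "$1" ] 2>/dev/null
}

# The predicate succeeding means there is a problem to report.
echo "5727280 /some/path" | first_word_gt 5291456 && echo "Outta space"
```

As in threshold itself, success of the predicate is the bad case, so the message is printed only when the first word exceeds the parameter.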
changes: monitor stdin over time, reporting changes
Usage: changes [-l][-r][-t THRESHOLD][-v] -s STATEFILE; -H for help
This is changes 65f83b503fdd+ by Keith Waclena <http://www.lib.uchicago.edu/keith/>
-l prepend legend to beginning of output (some people don't grok changebars!)
-r display reassuring message
-t THRESHOLD if no changes, print anyway if THRESHOLD exceeded
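The state-file idea behind changes can be sketched in a few lines of shell. This is an illustration only, under simplifying assumptions: report_if_changed is a made-up name, it prints the raw new input where the real changes prints changebar-style differences, and it has no equivalent of -t, -l or -r.

```shell
# report_if_changed STATEFILE: sketch of the state-file idea behind changes.
# Copies stdin to stdout only when it differs from what STATEFILE holds,
# then remembers the new input for next time.
# Hypothetical; not the real changes program.
report_if_changed() {
    state=$1
    new=$(mktemp) || return 1
    cat > "$new"
    if [ -f "$state" ] && cmp -s "$new" "$state"; then
        :                   # same as last time: stay silent
    else
        cat "$new"          # first run or changed input: report it
    fi
    mv "$new" "$state"
}
```

Run twice with the same input and the same state file, the second call prints nothing, which is the behavior shown in the luna-state example.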
These tools are not yet under jinn. To install manually, for now, do this:
for prog in threshold changes changebars
do
cp ~keith/src/$prog/$prog /host/bin/
done
Let me know if there are portability problems. Patches would be appreciated.