Nagios: Delay WARNING Notifications
I have a Nagios check that monitors the power being drawn through our PDUs. While setting it up, I had to figure out alert thresholds. The convention for disk capacity is to warn at 80% and go critical at 90%. I also had this gem to work with:
National Electric Code requires that the continuous current drawn from a branch circuit not exceed 80% of the circuit’s maximum rating. “Continuous current” is any load sustained continuously for at least 3 hours.
(Thanks to Mike Pennington, via http://serverfault.com/a/413307/72839)
So, I went with 80% warning, 90% critical.
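As a rough illustration of what those percentages mean in amps, here is a minimal sketch. The function names are hypothetical, and treating a reading exactly at the threshold as a breach is my assumption; the actual logic inside check_sentry is not shown in this post.

```python
def load_thresholds(circuit_rating_amps, warn_percent=80, crit_percent=90):
    """Return (warning, critical) thresholds in amps for a circuit rating."""
    if circuit_rating_amps <= 0:
        raise ValueError("circuit rating must be positive")
    warn = circuit_rating_amps * warn_percent / 100
    crit = circuit_rating_amps * crit_percent / 100
    return warn, crit


def classify_load(load_amps, circuit_rating_amps):
    """Map a measured load to a Nagios-style state name."""
    warn, crit = load_thresholds(circuit_rating_amps)
    # Assumption: a load at or above a threshold counts as breaching it.
    if load_amps >= crit:
        return "CRITICAL"
    if load_amps >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Example: a 30 A circuit with three different measured loads.
    for amps in (20, 25, 28):
        print(amps, classify_load(amps, 30))
```

For a 30 A circuit this puts the warning line at 24 A and the critical line at 27 A.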
Lately I have been getting a lot of warning notifications about circuits exceeding 80%. Ah, but the NEC says that is only a problem if they stay at 80% for more than three hours. So I dug through the Nagios documentation and split my check into two services:
# PDU load at 90% of circuit rating
define service{
    use                         generic-service
    hostgroup_name              pdus
    service_description         Power Load Critical
    notification_options        c,u,r
    check_command               check_sentry
    contact_groups              admins
}

# PDU sustained load at 80% of circuit rating for 3 hours
define service{
    use                         generic-service
    hostgroup_name              pdus
    service_description         Power Load High
    notification_options        w,r
    first_notification_delay    180
    check_command               check_sentry
    contact_groups              admins
}
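The first service limits notifications to critical, unknown, and recovery states. In the second, first_notification_delay should cover the “don’t bug me unless it has been happening for three hours” caveat: the value is in Nagios time units, which are minutes with the default interval_length of 60 seconds, so 180 means three hours. I set that service to notify only on warnings and recoveries.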
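To double-check the arithmetic behind that 180, here is a small sketch that converts a first_notification_delay value into wall-clock time. It assumes the stock interval_length of 60 seconds from the main Nagios configuration; the function name is hypothetical, so adjust the constant if your nagios.cfg overrides it.

```python
from datetime import timedelta

# Assumption: interval_length is left at the Nagios default of 60 seconds.
DEFAULT_INTERVAL_LENGTH = 60


def notification_delay(first_notification_delay,
                       interval_length=DEFAULT_INTERVAL_LENGTH):
    """Convert a delay in Nagios time units into a timedelta."""
    return timedelta(seconds=first_notification_delay * interval_length)


if __name__ == "__main__":
    print(notification_delay(180))
```

If interval_length were changed to 1, the same setting would delay the first notification by only 180 seconds, so it is worth confirming before relying on it.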