IT_Architect
Script to restart dependent processes only if a master process is running. Read the Documentation section to understand how it works.
Example my_watchdog.txt and crontab entries follow the script.
Code:
#!/bin/bash
# ===================================================================
#
# my_watchdog.sh
#
# ===================================================================
# ===================================================================
# Documentation
# ===================================================================
: <<'COMMENT1'
- My only request: If you find an error, respond here so others benefit
- Purpose: Cron this to restart failed processes ONLY if the master process
is running. We do not want to shut off a master process in the
control panel only to have this script start things back up again. If you want
it to run all the time, pick a master process that always runs.
- The input file:
A. Comma separated rows with exactly 5 columns per row. Embedded
commas (the separator), quotes, double-quotes, and anything else that
would prevent the row from being read into a script variable must be escaped (\)
B. Row columns contain:
1. ProcessName - Friendly name for notification emails (SpamAssassin)
2. ProcessTestString - used by ps -ax|grep to test if running (spamd)
3. ProcessMinimumCount - min. instances to be considered running (1,2,3,etc.)
4. ProcessRestartString - command that restarts the process ("/etc/rc.d/blah restart")
5. ProcessStartUpDelay - seconds to wait after restarting before
retesting whether the restart was successful
- DEBUG_LEVEL command line parameter (optional) - Controls verbosity
0 = Never E-Mail (default used if no parameter supplied or out of range)
1 = E-Mail when a process could not be restarted.
2 = E-Mail every time a process had to be restarted
3 = E-Mail with full details when a process could not be restarted
4 = E-Mail and print every time with full details
- ERROR_LEVEL preserves worst error level for reporting
0=No Errors
1=Restarted a process
2=Failed to restart a process
3=Master process not running
- User Defined Parameters
A. InputFileName - Full path name ("/etc/my_watchdog.txt")
B. MasterProcessTestString - used by ps -ax|grep to test if Master
process is running (exim)
C. MasterProcessMinCount - min. instances to be considered running (1,2,3,etc.)
D. FromEmail - Used as sender for notification E-Mails. The domain
should be real, but the mailbox does not have to be.
E. NotificationName - How From will read in recipient's E-Mail Client ("Server Alert my-domain")
F. NotificationEMail - Real E-Mail address where notifications
will be sent
G. OkMessage - when it finds no errors
H. Need2RestartMessage - when a process needs to be restarted
I. RestartedMessage - when a process was restarted
J. FailureMessage - when it has failed to restart a process
K. MasterNotRunningMessage - when the master process is not running
- Concept of Operation
A. Test with ps -ax|grep whether MasterProcessTestString is running and meets
MasterProcessMinCount. If yes, continue; if not, exit.
B. Loop through rows in input file and run ProcessTestString
1. If -lt ProcessMinimumCount
a. Run ProcessRestartString
b. sleep ProcessStartUpDelay
c. Run ProcessTestString command
d. If -lt ProcessMinimumCount
1) If highest ERROR_LEVEL -lt 2, set ERROR_LEVEL to 2 and save ProcessName to StatusProcess
2) Add reporting to MessageBody as determined by DEBUG_LEVEL
ELSE
1) If highest ERROR_LEVEL -lt 1, set ERROR_LEVEL to 1 and save ProcessName to StatusProcess
2) Add reporting to MessageBody as determined by DEBUG_LEVEL
ELSE
a. Read next row from InputFileName
fi
C. After reading all InputFileName rows
1. Calculate MessageSubject
2. Print and E-Mail as directed by DEBUG_LEVEL and ERROR_LEVEL
COMMENT1
# ===================================================================
# User defined parameters
# ===================================================================
InputFileName="/etc/my_watchdog.txt"
MasterProcessTestString=exim
MasterProcessMinCount=1
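FromEmail="watchdog@example.com" # Placeholder: use a sender address on your real domain
NotificationName="Server.Domain.com"
NotificationEMail="admin@example.com" # Placeholder: the real mailbox that receives alerts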
OkMessage="No processes needed to be restarted"
Need2RestartMessage="needs to be restarted"
RestartedMessage="was successfully restarted"
FailureMessage="failed to restart"
MasterNotRunningMessage="Master process not running"
# ============= Make no modifications below this line ===============
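function SendEmailNotification {
# Headers, then the blank line that separates headers from the body.
# %b expands the \n sequences accumulated in MessageBody.
# sendmail is called by full path in case cron's PATH lacks /usr/sbin; adjust if yours lives elsewhere.
(printf '%s\n' "From: $ServerName <$FromEmail>";
printf '%s\n' "To: $NotificationName <$NotificationEMail>";
printf '%s\n' "Reply-To: $FromEmail";
printf '%s\n' "Subject: $MessageSubject";
printf '\n';
printf '%b\n' "$MessageBody") | /usr/sbin/sendmail "$NotificationEMail"
}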
ServerName=$(hostname) # set to actual computer name
OLDIFS="" # Initialize
ERROR_LEVEL=0 # 0 = OK, 1 = restarted, 2 = failed to restart, 3 = master process not running
StatusProcess="" # Name of process causing highest failure
MessageSubject="" # E-Mail Subject line
MessageBody="" # Captures Body text based on verbosity specified in $DEBUG_LEVEL
if [[ "$1" =~ ^[0-4]$ ]] # Accept only a single digit 0-4 as DEBUG_LEVEL
then
DEBUG_LEVEL=$1 # Assign valid DEBUG_LEVEL passed by parameter $1
else
DEBUG_LEVEL=0 # Default when no valid DEBUG_LEVEL is passed
fi
if [ $DEBUG_LEVEL -ge 3 ]; then MessageBody="${MessageBody}DEBUG_LEVEL=$DEBUG_LEVEL\n";fi
if [ $DEBUG_LEVEL -eq 4 ]; then printf "DEBUG_LEVEL=$DEBUG_LEVEL\n";fi
# Initialize Columns
ProcessName="" # (SpamAssassin)
ProcessTestString="" # (spamd)
ProcessMinimumCount=0 # (1, or 2, or 8, etc.)
ProcessRestartString="" # (service exim restart)
ProcessStartUpDelay=0 # (Seconds to wait before testing if restarted)
# Restart downed processes only if MasterProcess is running
count=$(ps -ax | grep -v "grep" | grep -c "$MasterProcessTestString")
if [ "$count" -ge "$MasterProcessMinCount" ] # Do only if master process is running
then
# Read one row at a time, split into the 5 columns on commas. IFS="," applies
# only to this read, so the shell's separator never has to be saved and restored.
# read is used without -r so a backslash-escaped comma (\,) stays in its column.
# Text beyond the 5th column lands in ExtraColumns; the "|| [ -n ... ]" test
# still processes a last row that has no trailing newline.
while IFS="," read ProcessName ProcessTestString ProcessMinimumCount ProcessRestartString ProcessStartUpDelay ExtraColumns || [ -n "$ProcessName" ]
do
if [ -z "$ProcessName" ]; then continue; fi # Skip blank rows
if [ -n "$ExtraColumns" ] || [ -z "$ProcessStartUpDelay" ]
then
echo "Invalid number of columns in row of input file $InputFileName: $ProcessName" >&2
continue # Skip the malformed row
fi
if [ "$DEBUG_LEVEL" -ge 3 ]
then
MessageBody="${MessageBody}ProcessName = $ProcessName\n"
MessageBody="${MessageBody}ProcessTestString = $ProcessTestString\n"
MessageBody="${MessageBody}ProcessMinimumCount = $ProcessMinimumCount\n"
MessageBody="${MessageBody}ProcessRestartString = $ProcessRestartString\n"
MessageBody="${MessageBody}ProcessStartUpDelay = $ProcessStartUpDelay\n"
fi
if [ "$DEBUG_LEVEL" -eq 4 ]
then
printf '%s\n' "ProcessName $ProcessName" # (SpamAssassin)
printf '%s\n' "ProcessTestString $ProcessTestString" # (spamd)
printf '%s\n' "ProcessMinimumCount $ProcessMinimumCount" # (1, or 2, or 3, etc.)
printf '%s\n' "ProcessRestartString $ProcessRestartString" # (service exim restart)
printf '%s\n' "ProcessStartUpDelay $ProcessStartUpDelay" # (Seconds to wait before testing if restarted)
fi
count=$(ps -ax | grep -v "grep" | grep -c "$ProcessTestString")
if [ "$count" -lt "$ProcessMinimumCount" ]
then
if [ "$ERROR_LEVEL" -lt 1 ]; then ERROR_LEVEL=1; StatusProcess=$ProcessName; fi
if [ "$DEBUG_LEVEL" -ge 3 ]; then MessageBody="${MessageBody}$ProcessName $Need2RestartMessage\n"; fi
if [ "$DEBUG_LEVEL" -eq 4 ]; then printf '%s\n' "$ProcessName $Need2RestartMessage"; fi
eval "$ProcessRestartString" </dev/null # </dev/null stops the restart command from reading the rest of the input file
sleep "$ProcessStartUpDelay"
count=$(ps -ax | grep -v "grep" | grep -c "$ProcessTestString")
if [ "$count" -lt "$ProcessMinimumCount" ]
then
if [ "$ERROR_LEVEL" -lt 2 ]; then ERROR_LEVEL=2; StatusProcess=$ProcessName; fi
if [ "$DEBUG_LEVEL" -ge 1 ]; then MessageBody="${MessageBody}$ProcessName $FailureMessage\n"; fi
if [ "$DEBUG_LEVEL" -eq 4 ]; then printf '%s\n' "$ProcessName $FailureMessage"; fi
else
if [ "$DEBUG_LEVEL" -ge 2 ]; then MessageBody="${MessageBody}$ProcessName $RestartedMessage\n"; fi
if [ "$DEBUG_LEVEL" -eq 4 ]; then printf '%s\n' "$ProcessName $RestartedMessage"; fi
fi
else
if [ "$DEBUG_LEVEL" -ge 3 ]; then MessageBody="${MessageBody}$OkMessage\n"; fi
if [ "$DEBUG_LEVEL" -eq 4 ]; then printf '%s\n' "$OkMessage"; fi
fi
done <"$InputFileName"
else
if [ "$DEBUG_LEVEL" -eq 4 ]; then
MessageBody="${MessageBody}$MasterNotRunningMessage\n"
printf '%s\n' "$MasterNotRunningMessage"
ERROR_LEVEL=3 # Master process not running, so no rows were checked
fi
fi
# =========== End of Calculations SendEmail If Necessary ============
if [ $DEBUG_LEVEL -gt 0 ]; then
if [ "$ERROR_LEVEL" -eq 0 ]; then
MessageSubject="$OkMessage"
elif [ "$ERROR_LEVEL" -eq 1 ]; then
MessageSubject="$StatusProcess $RestartedMessage"
elif [ "$ERROR_LEVEL" -eq 2 ]; then
MessageSubject="$StatusProcess $FailureMessage"
elif [ "$ERROR_LEVEL" -eq 3 ]; then
MessageSubject="$StatusProcess $MasterNotRunningMessage"
else
MessageSubject="Error in monitor script. Invalid ERROR_LEVEL of $ERROR_LEVEL"
fi
fi
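# Mail only as the DEBUG_LEVEL table in the documentation describes
SendMail=0
if [ "$DEBUG_LEVEL" -eq 4 ]; then SendMail=1 # 4: always
elif [ "$DEBUG_LEVEL" -ge 1 ] && [ "$ERROR_LEVEL" -eq 2 ]; then SendMail=1 # 1-3: a process could not be restarted
elif [ "$DEBUG_LEVEL" -ge 2 ] && [ "$ERROR_LEVEL" -eq 1 ]; then SendMail=1 # 2-3: a process had to be restarted
fi
if [ "$SendMail" -eq 1 ]; then
SendEmailNotification
fi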
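Steps A and B of the Concept of Operation both come down to counting the lines of `ps -ax` output that match a test string and comparing that count with a minimum. A minimal sketch of that test, using two hypothetical helpers (`count_matches` and `is_running` are not names from the script) fed a made-up `ps`-style listing so it can be tried without the real daemons:

```shell
#!/bin/bash
# Hypothetical helpers (not part of my_watchdog.sh) that mirror the script's
# "ps -ax|grep -v grep|grep -c" test on ps-style text read from stdin.

# Print how many input lines match the pattern, ignoring lines that contain
# "grep" (the grep commands themselves can show up in a live ps listing).
count_matches() {
  local pattern=$1
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps that
  # exit status from aborting callers that run under "set -e".
  grep -v "grep" | grep -c -- "$pattern" || true
}

# Succeed when at least min_count lines match, like the script's
# "count -ge MasterProcessMinCount" check.
is_running() {
  local pattern=$1 min_count=$2 count
  count=$(count_matches "$pattern")
  [ "$count" -ge "$min_count" ]
}

# Made-up sample listing, for illustration only.
sample_ps='  101 ?     Ss   0:00 /usr/sbin/exim -bd -q15m
  202 ?     Ss   0:01 /usr/bin/spamd -d -c -m 15
  203 ?     S    0:00 spamd child
  300 pts/0 S+   0:00 grep -c spamd'

printf '%s\n' "$sample_ps" | count_matches spamd   # prints 2
if printf '%s\n' "$sample_ps" | is_running spamd 3; then
  echo "spamd is running"
else
  echo "spamd is below its minimum count"   # this branch runs: 2 < 3
fi
```

Because the test string is a grep pattern matched anywhere on the line, it also matches any unrelated process whose command line happens to contain it, so a fairly specific ProcessTestString is safer.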
Example my_watchdog.txt entry that restarts SpamAssassin
Code:
SpamAssassin,spamd,3,/usr/bin/spamd -d -c -m 15 --ipv4 --pidfile=/var/run/spamd.pid,5
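To see how a row like the one above maps onto the five documented columns, here is a minimal sketch of splitting it with bash's `read`. The `parse_row` function and its variable names are hypothetical, chosen for illustration; it also shows the documented `\,` escape keeping a comma inside a column.

```shell
#!/bin/bash
# Hypothetical helper (not part of my_watchdog.sh): split one my_watchdog.txt
# row into its five columns, setting the globals name, test_string,
# min_count, restart_cmd and delay.
parse_row() {
  local extra
  # IFS="," applies only to this read. Without -r, read treats "\," as a
  # literal comma inside a column (and drops the backslash), which is how
  # the documentation says embedded commas should be escaped.
  # Anything after the fifth column is collected in "extra".
  IFS="," read name test_string min_count restart_cmd delay extra <<< "$1"
  if [ -n "$extra" ] || [ -z "$delay" ]; then
    echo "Invalid number of columns: $1" >&2
    return 1
  fi
}

row='SpamAssassin,spamd,3,/usr/bin/spamd -d -c -m 15 --ipv4 --pidfile=/var/run/spamd.pid,5'
if parse_row "$row"; then
  printf 'name=%s test=%s min=%s delay=%s\n' "$name" "$test_string" "$min_count" "$delay"
  printf 'restart=%s\n' "$restart_cmd"
fi
# prints:
# name=SpamAssassin test=spamd min=3 delay=5
# restart=/usr/bin/spamd -d -c -m 15 --ipv4 --pidfile=/var/run/spamd.pid

parse_row 'Demo,demo\,d,1,/bin/true,2' && echo "test string: $test_string"   # test string: demo,d
```

Spaces inside a column survive because only the comma is used as the separator, which is why the whole restart command fits in one column.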
Example crontab entry (system crontab format, such as /etc/crontab, which includes the user field) that checks for failed processes every 5 minutes and notifies me only if it cannot restart one.
Code:
*/5 * * * * root /etc/my_watchdog.sh 1
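The trailing `1` on that crontab line is the DEBUG_LEVEL argument. According to the DEBUG_LEVEL table in the documentation, level 1 mails only when a process could not be restarted. A minimal sketch of that decision as a hypothetical `should_send_mail` function (not part of the script), assuming level 3 keeps level 2's mail-on-restart behavior, since the table does not spell that out:

```shell
#!/bin/bash
# Hypothetical helper: decide whether a run should send mail, following the
# DEBUG_LEVEL and ERROR_LEVEL tables in the script's documentation.
# Assumption: level 3 keeps level 2's "mail on restart" behavior and adds detail.
should_send_mail() {
  local debug_level=$1 error_level=$2
  if [ "$debug_level" -eq 4 ]; then
    return 0    # 4: always mail
  elif [ "$debug_level" -ge 1 ] && [ "$error_level" -eq 2 ]; then
    return 0    # 1-3: a process could not be restarted
  elif [ "$debug_level" -ge 2 ] && [ "$error_level" -eq 1 ]; then
    return 0    # 2-3: a process had to be restarted
  fi
  return 1      # 0 never mails; master not running (3) mails only at level 4
}

# What the crontab's DEBUG_LEVEL=1 does for each ERROR_LEVEL
for error_level in 0 1 2 3; do
  if should_send_mail 1 "$error_level"; then result=mail; else result=quiet; fi
  echo "DEBUG_LEVEL=1 ERROR_LEVEL=$error_level -> $result"
done
# prints quiet, quiet, mail, quiet for ERROR_LEVEL 0, 1, 2, 3
```

Running the script by hand with `4` instead of `1` prints every step and always mails, which is a convenient way to check a new my_watchdog.txt row before handing it to cron.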