How-To Script to restart dependent processes if master process is running

IT_Architect

Script to restart dependent processes if master process is running. Read the Documentation section to understand how it works.

Code:
#!/bin/bash
# ===================================================================
#
#							my_watchdog.sh
#
# ===================================================================
# ===================================================================
#     						Documentation
# ===================================================================
: <<'COMMENT1'
- My only request: If you find an error, respond here so others benefit
- Purpose: Cron this to restart failed processes ONLY if master process
  is running.  We do not want to shut off a master process in the 
  control panel only to have this start it back up again.  If you want
  to run it all the time, then pick a master process that always runs.  
- The input file:
  A.  Comma separated rows with exactly 5 columns per row.  Embedded
      commas (the separator), quotes, double-quotes, and anything that
      prevents a value being read into a script variable must be escaped (\)
  B.  Row columns contain:
    1.  ProcessName - Friendly name for notification emails (SpamAssassin)
    2.  ProcessTestString - used by ps -ax|grep to test if running (spamd)
    3.  ProcessMinimumCount - min. instances to be considered running (1,2,3,etc.)
    4.  ProcessRestartString - command that restarts the process ("/etc/rc.d/blah restart")
    5.  ProcessStartUpDelay - Seconds to wait after restarting before
        retesting if restart was successful
- DEBUG_LEVEL command line parameter (optional) - Controls verbosity
	0 = Never E-Mail (default used if no parameter supplied or out of range)
	1 = E-Mail when a process could not be restarted.
	2 = E-Mail every time a process had to be restarted
	3 = E-Mail with full details when a process could not be restarted
	4 = E-Mail and print every time with full details
- ERROR_LEVEL preserves worst error level for reporting 
	0=No Errors
	1=Restarted a process
	2=Failed to restart a process
	3=Master process not running
- User Defined Parameters
	A.  InputFileName - Full path name ("/etc/my_watchdog.txt")
	B.  MasterProcessTestString -  used by ps -ax|grep to test if Master
		process is running (exim)
	C.  MasterProcessMinCount - min. instances to be considered running (1,2,3,etc.)
	D.  FromEmail - Used as sender for notification E-Mails.  Domain
		should be real, but not necessarily the mail box. ([email protected])
	E.  NotificationName - How From will read in recipient's E-Mail Client ("Server Alert my-domain")
	F.  NotificationEMail - Real E-Mail address where notification
		will be sent ([email protected])
	G.  OkMessage - when it finds no errors
	H.  Need2RestartMessage - when a process needs to be restarted
	I.  RestartedMessage - when a process was restarted
	J.  FailureMessage - when it has failed to restart a process
	K.  MasterNotRunningMessage - when the master process is not running
- Concept of Operation
	A.	Test with ps -ax|grep if MasterProcessTestString is running and meets
  		MasterProcessMinCount.  If yes continue, if not, exit. 
	B.  Loop through rows in input file and run ProcessTestString
		1.  If -lt ProcessMinimumCount
			a.  Run ProcessRestartString
			b.	sleep ProcessStartUpDelay
			c.  Run ProcessTestString command
			d.	If -lt ProcessMinimumCount
					1)	If highest ERROR_LEVEL -lt 2, set ERROR_LEVEL to 2 and save ProcessName to StatusProcess
					2)	Add reporting to MessageBody as determined by DEBUG_LEVEL
				ELSE
					1)	If highest ERROR_LEVEL -lt 1, set ERROR_LEVEL to 1 and save ProcessName to StatusProcess
					2)	Add reporting to MessageBody as determined by DEBUG_LEVEL
		ELSE
			a.	Read next row from InputFileName
		fi
	C.	After reading all InputFileName rows
		1.	Calculate MessageSubject
		2.	Print as directed by $DEBUG_LEVEL and ERROR_LEVEL
					
COMMENT1

# ===================================================================
#						User defined parameters
# ===================================================================
InputFileName="/etc/my_watchdog.txt"
MasterProcessTestString=exim
MasterProcessMinCount=1
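FromEmail="[email protected]"		# set to your sender address
NotificationName="Server.Domain.com"
NotificationEMail="[email protected]"	# set to the address that receives alerts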

OkMessage="No processes needed to be restarted"
Need2RestartMessage="needs to be restarted"
RestartedMessage="was successfully restarted"
FailureMessage="failed to restart"
MasterNotRunningMessage="Master process not running"

# ============= Make no modifications below this line ===============
function SendEmailNotification {
	# Headers first, then a blank line, then the body.
	# %b expands the \n sequences collected in MessageBody.
	(printf 'From: %s <%s>\n' "$NotificationName" "$FromEmail";
	printf 'To: %s\n' "$NotificationEMail";
	printf 'Reply-To: %s\n' "$FromEmail";
	printf 'Subject: %s\n' "$MessageSubject";
	printf '\n%b\n' "$MessageBody") | sendmail "$NotificationEMail"
}

ServerName=$(hostname)	# set to actual computer name
OLDIFS=""			# Initialize
ERROR_LEVEL=0		# Everything OK = 0, Restarted = 1, Failed to restart = 2, Master not running = 3
StatusProcess=""	# Name of process causing highest failure
MessageSubject=""	# E-Mail Subject line
MessageBody=""		# Captures Body text based on verbosity specified in $DEBUG_LEVEL
case "$1" in	# Sets DEBUG_LEVEL parameter based on $1
	[0-4])
		DEBUG_LEVEL=$1		# Assign valid DEBUG_LEVEL passed by parameter $1
		;;
	*)
		DEBUG_LEVEL=0		# Set DEBUG_LEVEL to the default when no valid DEBUG_LEVEL passed
		;;
esac
if [ $DEBUG_LEVEL -ge 3 ]; then MessageBody="${MessageBody}DEBUG_LEVEL=$DEBUG_LEVEL\n";fi
if [ $DEBUG_LEVEL -eq 4 ]; then printf "DEBUG_LEVEL=$DEBUG_LEVEL\n";fi

# Initialize Columns
ProcessName="" 			# (SpamAssassin)
ProcessTestString=""	# (spamd)
ProcessMinimumCount=0	# (1, or 2, or 8, etc.)
ProcessRestartString=""	# (service exim restart)
ProcessStartUpDelay=0	# (Seconds to wait before testing if restarted)

# Restart downed processes only if MasterProcess is running
commandline=$(ps -ax|grep -v "grep"|grep -c "$MasterProcessTestString")
count=$commandline
if [ $count -ge $MasterProcessMinCount ]	# Do only if master process is running
then
	OLDIFS=$IFS			# save off separator

	# Read rows.  IFS="," applies to read only, so each row is split into
	# columns on commas; without -r a backslash-escaped comma stays literal.
	while IFS="," read -a flds
	do
		nrofflds=${#flds[@]}
		if [ "$nrofflds" -ne 5 ]
		then
			echo "Invalid number of columns in row of input file $InputFileName"
			continue	# skip the malformed row
		fi

		# Read Columns
		ProcessName=${flds[0]}
		ProcessTestString=${flds[1]}
		ProcessMinimumCount=${flds[2]}
		ProcessRestartString=${flds[3]}
		ProcessStartUpDelay=${flds[4]}
		
		if [ $DEBUG_LEVEL -ge 3 ]
		then 
			MessageBody="${MessageBody}ProcessName = $ProcessName\n"
			MessageBody="${MessageBody}ProcessTestString = $ProcessTestString\n"
			MessageBody="${MessageBody}ProcessMinimumCount = $ProcessMinimumCount\n"
			MessageBody="${MessageBody}ProcessRestartString = $ProcessRestartString\n"
			MessageBody="${MessageBody}ProcessStartUpDelay = $ProcessStartUpDelay\n"
		fi
		
		if [ $DEBUG_LEVEL -eq 4 ]
		then 
			printf "ProcessName			$ProcessName\n" 					# (SpamAssassin)
			printf "ProcessTestString		$ProcessTestString\n"			# (spamd)
			printf "ProcessMinimumCount		$ProcessMinimumCount\n"			# (1, or 2, or 3, etc.)
			printf "ProcessRestartString		$ProcessRestartString\n"	# (service exim restart)
			printf "ProcessStartUpDelay		$ProcessStartUpDelay\n"			# (Seconds to wait before testing if restarted)
		fi
		
		commandline=$(ps -ax|grep -v "grep"|grep -c "$ProcessTestString")
		count=$commandline
		
		if [ $count -lt $ProcessMinimumCount ]
		then
			if [ $ERROR_LEVEL -lt 1 ]; then ERROR_LEVEL=1; StatusProcess=$ProcessName; fi
			if [ $DEBUG_LEVEL -ge 3 ]; then MessageBody="${MessageBody}$ProcessName $Need2RestartMessage\n"; fi
			if [ $DEBUG_LEVEL -eq 4 ]; then printf "$ProcessName $Need2RestartMessage\n"; fi
			# stdin comes from /dev/null so the restart command cannot
			# swallow the remaining rows of the input file
			eval "$ProcessRestartString" </dev/null
			sleep "$ProcessStartUpDelay"
			commandline=$(ps -ax|grep -v "grep"|grep -c "$ProcessTestString")
			count=$commandline
			if [ $count -lt $ProcessMinimumCount ]
			then
				if [ $ERROR_LEVEL -lt 2 ]; then ERROR_LEVEL=2; StatusProcess=$ProcessName; fi
				if [ $DEBUG_LEVEL -ge 1 ]; then MessageBody="${MessageBody}$ProcessName $FailureMessage\n"; fi
				if [ $DEBUG_LEVEL -eq 4 ]; then printf "$ProcessName $FailureMessage\n"; fi
			else
				if [ $DEBUG_LEVEL -ge 2 ]; then MessageBody="${MessageBody}$ProcessName $RestartedMessage\n"; fi
				if [ $DEBUG_LEVEL -eq 4 ]; then printf "$ProcessName $RestartedMessage\n"; fi
			fi
		else
			if [ $DEBUG_LEVEL -ge 3 ]; then MessageBody="${MessageBody}$OkMessage\n"; fi
			if [ $DEBUG_LEVEL -eq 4 ]; then printf "$OkMessage\n"; fi
		fi
	done <"$InputFileName"
	IFS=$OLDIFS			# restore separator
else
	if [ $DEBUG_LEVEL -eq 4 ]; then
		MessageBody="${MessageBody}$MasterNotRunningMessage\n"
		printf "$MasterNotRunningMessage\n"
		ERROR_LEVEL=3
	fi
fi

# =========== End of Calculations SendEmail If Necessary ============

if [ $DEBUG_LEVEL -gt 0 ]; then
	if [ "$ERROR_LEVEL" -eq 0 ]; then
		MessageSubject="$OkMessage"
	elif [ "$ERROR_LEVEL" -eq 1 ]; then
		MessageSubject="$StatusProcess $RestartedMessage"
	elif [ "$ERROR_LEVEL" -eq 2 ]; then
		MessageSubject="$StatusProcess $FailureMessage"
	elif [ "$ERROR_LEVEL" -eq 3 ]; then
		MessageSubject="$MasterNotRunningMessage"
	else
		MessageSubject="Error in monitor script. Invalid ERROR_LEVEL of $ERROR_LEVEL"
	fi
fi

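# Send only when DEBUG_LEVEL asks for it (see Documentation)
case $DEBUG_LEVEL in
	1|3) SendThreshold=2;;	# only when a restart failed
	2)   SendThreshold=1;;	# whenever a process had to be restarted
	4)   SendThreshold=0;;	# every run
	*)   SendThreshold=99;;	# 0 = never E-Mail
esac
if [ "$ERROR_LEVEL" -ge "$SendThreshold" ]; then
	SendEmailNotification
fi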
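
A note on the notification format: sendmail expects the headers, then one blank line, then the body. The sketch below shows that layout in isolation. build_message is a hypothetical helper that is not part of my_watchdog.sh, and the example.com addresses are placeholders. It assumes the body is collected with literal \n sequences, as MessageBody is in the script, and uses printf %b to expand them.

```shell
#!/bin/bash
# build_message is a hypothetical helper, not part of my_watchdog.sh.
# Arguments: display name, sender address, recipient address, subject, body.
build_message() {
	printf 'From: %s <%s>\n' "$1" "$2"
	printf 'To: %s\n' "$3"
	printf 'Subject: %s\n' "$4"
	printf '\n'				# blank line ends the headers
	printf '%b\n' "$5"		# %b turns the collected \n into real newlines
}

# Placeholder values for illustration only.
body='SpamAssassin needs to be restarted\nSpamAssassin failed to restart'
build_message "Server Alert" "watchdog@example.com" "admin@example.com" \
	"SpamAssassin failed to restart" "$body"
```

To actually send it, the output would be piped to sendmail with the recipient address as its argument.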

Example my_watchdog.txt entry that restarts SpamAssassin
Code:
SpamAssassin,spamd,3,/usr/bin/spamd -d -c -m 15 --ipv4 --pidfile=/var/run/spamd.pid,5
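
To check a row before putting it in the input file, a small test like the following may help. It is only a sketch: parse_row is a hypothetical helper that is not part of my_watchdog.sh, and it assumes bash's read -a with IFS set to a comma for that one command.

```shell
#!/bin/bash
# parse_row is a hypothetical helper, not part of my_watchdog.sh.
# It splits one comma-separated row into the five documented columns.
parse_row() {
	local -a flds
	IFS="," read -a flds <<< "$1"		# IFS applies to this read only
	if [ "${#flds[@]}" -ne 5 ]; then
		echo "Invalid number of columns: ${#flds[@]}" >&2
		return 1
	fi
	ProcessName=${flds[0]}
	ProcessTestString=${flds[1]}
	ProcessMinimumCount=${flds[2]}
	ProcessRestartString=${flds[3]}
	ProcessStartUpDelay=${flds[4]}
}

row='SpamAssassin,spamd,3,/usr/bin/spamd -d -c -m 15 --ipv4 --pidfile=/var/run/spamd.pid,5'
if parse_row "$row"; then
	echo "name=$ProcessName test=$ProcessTestString min=$ProcessMinimumCount delay=$ProcessStartUpDelay"
	echo "restart=$ProcessRestartString"
fi
```

Spaces inside the restart command are fine because only commas separate columns.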

Example crontab entry (system crontab format, with the user field) that checks for failed processes every 5 minutes and notifies me only if it cannot restart one.
Code:
*/5	*	*	*	*	root	/etc/my_watchdog.sh 1
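
How the DEBUG_LEVEL parameter and ERROR_LEVEL combine can be sketched as a small decision function. This is my reading of the Documentation section, not code taken from the script: should_send is a hypothetical helper, and it assumes levels 1 and 3 mail only on a failed restart, level 2 on any restart, level 4 on every run, and level 0 never.

```shell
#!/bin/bash
# should_send is a hypothetical helper, not part of my_watchdog.sh.
# Arguments: DEBUG_LEVEL, ERROR_LEVEL.  Returns 0 (true) when an e-mail is due.
should_send() {
	local debug_level=$1 error_level=$2 threshold
	case $debug_level in
		1|3) threshold=2 ;;		# only when a restart failed
		2)   threshold=1 ;;		# whenever a process had to be restarted
		4)   threshold=0 ;;		# every run
		*)   return 1 ;;		# 0 or invalid: never e-mail
	esac
	[ "$error_level" -ge "$threshold" ]
}

# With the crontab entry above (DEBUG_LEVEL 1):
for error_level in 0 1 2; do
	if should_send 1 "$error_level"; then
		echo "DEBUG_LEVEL=1 ERROR_LEVEL=$error_level: send"
	else
		echo "DEBUG_LEVEL=1 ERROR_LEVEL=$error_level: stay quiet"
	fi
done
```

Under these assumptions the loop reports "stay quiet" for ERROR_LEVEL 0 and 1 and "send" for ERROR_LEVEL 2.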