Monday, October 31, 2005

SBS, Healthmon: Article 6, Specific items to monitor for using healthmon

Without giving away our edge here, I wanted to document a few of the healthmon alerts that I like to use and that I alluded to in the previous five SBS healthmon posts (listed below). This is by no means exhaustive or comprehensive, but with a little common sense you can start to see some of the real value you can add by leveraging healthmon in your offerings. And remember, this is all out-of-the-box functionality, and the key is that it provides real business value.

Why did that server go down over the weekend? Are my Backup Exec remote agents running? When was that computer account created, and why? Who created that user account?

In the enterprise, these items would be covered and then documented as part of a change management process. In any environment, these are really key building blocks in establishing a “managed services model”.

So, listed below are a few of the alerts I use, along with the relevant alert configurations. Each one has a justification section explaining why it’s being used. Take a look. If you have something you’re using effectively, why don’t you go ahead and add a comment to that effect?

Type: Core server alert, Data Collector>Service Monitor
Name: SBS Server, BackupExec Remote Agent
Details, Service: BackupExecAgentAccelerator
Details, Properties: Display Name, Started, State, Status
Actions: Send Email>Execution condition, Critical>Reminder:6 hours
Message: standard service monitor message (state and condition), plus server name.

Justification: Alert sysadmin when the Veritas Backup Exec remote agent service on the SBS 2003 server fails. The “backup server” is located on another machine.
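
To make the "Critical > Reminder: 6 hours" setting a little more concrete, here is a minimal Python sketch of the notification logic as I understand it. This is not healthmon's own code; the function name `should_send_email` and its parameters are made up for illustration, and the only value taken from the alert above is the 6-hour reminder interval.

```python
from datetime import datetime, timedelta

# Taken from the alert definition above: "Reminder: 6 hours".
REMINDER_INTERVAL = timedelta(hours=6)


def should_send_email(service_running, now, last_sent):
    """Return True when a critical email (or a reminder) is due.

    service_running: bool, whether the monitored service is started.
    now:             datetime of the current check.
    last_sent:       datetime of the last email, or None if none was sent yet.
    """
    if service_running:
        return False  # service is healthy, nothing to report
    if last_sent is None:
        return True  # first failed check, alert right away
    # Still down: only nag again once the reminder interval has elapsed.
    return now - last_sent >= REMINDER_INTERVAL
```

The point of the reminder is that a stopped agent keeps generating mail until somebody fixes it, without flooding the inbox on every check.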

Type: Core server alert, Data Collector>Windows Event Log monitor
Name: SBS - Computer Account Created (645)
Details: Success audit
Details, log file: Security
Details, Event ID: 645
Actions: Warning, and Critical email
Schedule: all days, all times, every 1 second, 1 sample needed
Message: Computer account created

Justification: Alert sysadmin whenever a computer account gets created. Computer accounts created by non-admins need to be reviewed for security compliance (a/v, patch, OS, etc.).

Type: Core server alert, Data Collector>Windows Event Log monitor
Name: SBS - Computer Account Deleted (647)
Details: Success audit
Details, log file: Security
Details, Event ID: 647
Actions: Warning, and Critical email
Schedule: all days, all times, every 1 second, 1 sample needed
Message: Computer account deleted

Justification: Alert sysadmin whenever a computer account gets deleted. Computer accounts typically shouldn’t be deleted by non-technical owners and sysadmins should be alerted to review the issue.

Type: Core server alert, Data Collector>Windows Event Log monitor
Name: SBS - User Account Created (624)
Details: Success audit
Details, log file: Security
Details, Event ID: 624
Actions: Warning, and Critical email
Schedule: all days, all times, every 1 second, 1 sample needed
Message: User account created

Justification: Alert sysadmin whenever a user account gets created. User accounts typically shouldn’t be created by non-technical owners and sysadmins should be alerted to review the need/assignment of the account and account properties.

Type: Core server alert, Data Collector>Windows Event Log monitor
Name: SBS - User Account Deleted (630)
Details: Success audit
Details, log file: Security
Details, Event ID: 630
Actions: Warning, and Critical email
Schedule: all days, all times, every 1 second, 1 sample needed
Message: User account deleted

Justification: Alert sysadmin whenever a user account gets deleted. User accounts typically shouldn’t be deleted by non-technical owners and sysadmins should be alerted to review the change.

Type: Core server alert, Data Collector>Windows Event Log monitor
Name: SBS - Failed Password Attempt (529)
Details: Success audit, Failure audit
Details, log file: Security
Details, Event ID: 529
Actions: Critical email
Schedule: all days, all times, every 1 second, 1 sample needed
Message: Failed password attempt

Justification: Excessive failed password attempts should be reported to a sysadmin for event follow-up.
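
The five event log monitors above all boil down to the same thing: watch the Security log for a short list of event IDs and attach a message to each. Here is a rough Python sketch of that filtering, just to show the idea. The event IDs come from the alerts above and the messages mirror the alert names; the record shape (a dict with `log` and `event_id` keys) and the function name `alerts_for` are assumptions for illustration, not anything healthmon exposes.

```python
# Event IDs from the alert definitions above (Security log).
WATCHED_EVENTS = {
    645: "Computer account created",
    647: "Computer account deleted",
    624: "User account created",
    630: "User account deleted",
    529: "Failed password attempt",
}


def alerts_for(events):
    """Return an alert message for each Security-log event on the watch list.

    events: iterable of dicts with hypothetical keys "log" and "event_id".
    """
    alerts = []
    for event in events:
        if event.get("log") != "Security":
            continue  # only the Security log is monitored here
        message = WATCHED_EVENTS.get(event.get("event_id"))
        if message is not None:
            alerts.append(f"{message} (event {event['event_id']})")
    return alerts
```

Everything not on the watch list is ignored, which is exactly what keeps the notification volume manageable.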

Name: Member Server – Server-Name (no-ups) Ping (ICMP) Monitor
Details: System, “server-name”, timeout(msec): 1000
Actions: Critical email
Schedule: all days, all times, every 10 minutes, 6 samples needed
Message: Server-name down: Ping failed (power-issue?)

Justification: For servers that do not have a UPS attached, downtime should be recorded and reported to a sysadmin as justification for possible purchase (non-sbsers probably won’t get this one).
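
For the ping monitor, the interesting part is the "6 samples needed" setting. My reading, and it is only an assumption about how healthmon counts samples, is that the alert fires once six consecutive checks have failed, which at one check every 10 minutes means roughly an hour of downtime. A small Python sketch of that counting, with a made-up function name `first_alert_index`:

```python
# From the schedule above: "6 samples needed".
SAMPLES_NEEDED = 6


def first_alert_index(ping_results, samples_needed=SAMPLES_NEEDED):
    """Return the index of the sample at which the alert would fire, or None.

    ping_results: sequence of booleans, True meaning the ping succeeded.
    Assumes the samples must be consecutive failures; a single successful
    ping resets the count.
    """
    consecutive_failures = 0
    for index, ok in enumerate(ping_results):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= samples_needed:
            return index
    return None
```

Requiring several samples keeps a single dropped ping from paging anyone, while a real power outage still gets recorded.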

Previous 5 SBS, Healthmon articles in order of posting:
SBS: "sbsmonacct" and healthmon alerts
SBS Healthmon: Filtering events for notification
SBS, Healthmon: Why would I care about notifications?
SBS, Healthmon: Why would I care when a computer account gets created/deleted?
The managed services model
