Zenoss, VMWare and Critical “is up” alerts

I’m currently working on setting up Zenoss to either replace or supplement our current network/server monitoring systems.

As it’s in test, it was put on our currently little-used “old” VMware environment (ESX 3). After initially going well, it started to go wrong. We kept getting critical alerts that a server was up. Switching zenping to debug didn’t help either: it offered no new information and made the problem worse. As it’s open source, I thought I’d take a look at the source, and hey presto, I found the suspected problem. I stuck in an extra debug line and confirmed it.

The problem: the clock on the server was jumping about, a side effect of running under VMware, which led to negative RTTs (round-trip times) on pings. Funnily enough, Zenoss didn’t like that. I’ll be submitting a bug so that it comes up with a slightly less cryptic error than “ip xxx.xxx.xxx.xx is up”. To fix the problem, I had to specify the clocksource in the kernel options. See VMware KB 1006427 for details.
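
To make the failure mode concrete, here is a minimal sketch of how a clock that steps backwards between sending a ping and receiving the reply produces a negative round-trip time. This is not Zenoss’s actual code; the function names and the numbers are made up for illustration, and times are kept in whole milliseconds so the arithmetic is exact.

```python
def rtt_ms(sent_ms, received_ms):
    """Round-trip time in milliseconds from two clock readings (also in ms)."""
    return received_ms - sent_ms


def simulate_ping(sent_ms, true_latency_ms, clock_step_ms=0):
    """Return the RTT a naive pinger would compute from wall-clock readings.

    clock_step_ms is how far the clock jumped between send and receive
    (a negative value means it stepped backwards). Hypothetical helper.
    """
    received_ms = sent_ms + true_latency_ms + clock_step_ms
    return rtt_ms(sent_ms, received_ms)


# A steady clock reports the real latency.
steady = simulate_ping(100000, 20)
# The same 20 ms reply, but the clock stepped back 500 ms in between.
jumped = simulate_ping(100000, 20, clock_step_ms=-500)

print(steady)  # 20
print(jumped)  # -480
```

Any code that assumes an RTT is never negative can misbehave on a reading like the second one, which is consistent with what the extra debug line showed.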

If you can’t fix this for some reason, or can’t reboot the server, for now you can put an event transform on the /Status/Ping event class to suppress the alerts.

import re

# Dotted-quad IPv4 address: four octets, each 0-255
octet = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
match = re.search(r'ip ' + r'\.'.join([octet] * 4) + r' is up', evt.message)
# Drop only the bogus Critical (severity 5) "is up" events
if match and evt.severity == 5:
    evt._action = 'drop'
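
If you want to check the pattern before pasting it into Zenoss, something along these lines can be run in a plain Python shell. It is only a sketch: FakeEvent is a hypothetical stand-in carrying just the three attributes the transform touches, not the real Zenoss event object, and None is an arbitrary placeholder for “not dropped”.

```python
import re

# Same pattern as the transform: "ip <dotted quad> is up"
OCTET = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
IS_UP = re.compile(r'ip ' + r'\.'.join([OCTET] * 4) + r' is up')


class FakeEvent(object):
    """Hypothetical stand-in for the evt object a transform receives."""

    def __init__(self, message, severity):
        self.message = message
        self.severity = severity
        self._action = None  # placeholder meaning "not dropped"


def transform(evt):
    """Apply the same rule as the event transform and return the event."""
    if IS_UP.search(evt.message) and evt.severity == 5:
        evt._action = 'drop'
    return evt


samples = [
    FakeEvent('ip 10.1.2.3 is up', 5),    # bogus critical: should be dropped
    FakeEvent('ip 10.1.2.3 is up', 3),    # lower severity: left alone
    FakeEvent('ip 10.1.2.3 is down', 5),  # real outage: left alone
]
for evt in samples:
    print(evt.message, evt.severity, transform(evt)._action)
```

Only the first sample should come back with its action set to 'drop'; genuine “is down” events at severity 5 still get through.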

September 6, 2009 · robert
Posted in: Linux, VMWare, Zenoss