kjkoster
03-09-2009, 20:16
Dear All,
I am happy to announce that we have resolved the server problems that started just over a week ago. The problem was an unexpected dependency between our front-end Apache server and its monitoring server. This was brought to light by a defective router at one of our hosting providers. When the Apache monitoring server was unreachable due to a broken router, the Apache daemons would freak out and cause the errors you have seen.
To remedy this problem, my hosting service has replaced the router with a new one. I have broken the dependency between the monitoring server and the Apache daemons, *and* I have broken the dependency between the Java-monitor data collection service and the Apache daemons. Finally, we now have a throttling mechanism in place to keep the probes from swamping the Java-monitor collector after a restart. This means much smoother restarts for us, and fewer false alerts for you.
Interestingly, Java-monitor alerted me to its own impending demise every time. While it was nice to see that my monitoring service worked even for monitoring itself, it is only a small consolation when I know that it means more false positives being sent to the other Java-monitor users.
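For readers curious what throttling of this kind can look like, here is a minimal sketch of a token-bucket limiter on the collector side. It is an illustration only, not the actual Java-monitor implementation: the class name `CollectorThrottle`, the parameters `rate` and `burst`, and the choice of a token bucket are all assumptions made for this example.

```python
import time


class CollectorThrottle:
    """Hypothetical token bucket for incoming probe submissions.

    Admits at most `rate` submissions per second on average,
    allowing short bursts of up to `burst` submissions.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens held at once
        self.clock = clock           # injectable so the logic can be tested
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a submission may be processed right now."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        # Bucket is empty: the caller would tell the probe to retry later.
        return False
```

With something like this in front of the collector, a wave of probes reconnecting right after a restart is admitted gradually instead of all at once, and rejected probes simply try again on their next cycle.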
We have just released a new probe that is robust against the outages we've been seeing over the past few days. I would like all of you to take a moment to download the new probe and replace your current one.
Please accept our apologies for the false alarms.
Kees Jan
PS. To update the probe, simply log in to Java-monitor.com and download the probe again. Then undeploy or delete the old one from your application server and replace it with the new one.
PPS. We have a good record of when we sent out false positive SMS messages. Of course, we are refunding those that were sent in error.