Monitoring helps avoid down time


kjkoster
08-11-2010, 13:16
Dear All,

This weekend we had a nice example of how monitoring averted unplanned downtime. The temperature of a set of disks started rising sharply. Interestingly, the temperature went up on two machines and on none of the others. These two machines are both in the same server room, separate from the rest. Based on past experience, I concluded that the air cooler in that room had gone dead (for the third time in two years). That turned out to be the case: someone had switched it off by accident. They switched it back on and the temperatures are almost back to nominal.
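
The reasoning above (every machine in one room warming up while machines elsewhere stay flat points at the room's cooling, not at a single disk) could be automated roughly as follows. This is only a sketch under assumptions: the host names, room names, sample readings and the 5 degree threshold are all made up for illustration, and it is not how any particular monitoring tool works.

```python
from collections import defaultdict


def rooms_with_rising_temperature(readings, room_of, threshold=5.0):
    """Return the rooms in which every monitored host warmed up by more
    than `threshold` degrees C between its oldest and newest reading.

    readings: host -> list of temperatures in degrees C, oldest first
    room_of:  host -> name of the server room the host is in
    """
    rises_by_room = defaultdict(list)
    for host, temps in readings.items():
        if len(temps) < 2:
            continue  # not enough samples to see a trend
        rises_by_room[room_of[host]].append(temps[-1] - temps[0])
    # One hot host suggests a local fault; all hosts in a room suggest cooling.
    return sorted(
        room
        for room, rises in rises_by_room.items()
        if all(rise > threshold for rise in rises)
    )


# Hypothetical inventory and readings, for illustration only.
room_of = {"test1": "room-b", "test2": "room-b",
           "prod1": "room-a", "prod2": "room-a"}
readings = {"test1": [34, 35, 47], "test2": [33, 36, 45],
            "prod1": [35, 35, 36], "prod2": [34, 35, 34]}

print(rooms_with_rising_temperature(readings, room_of))  # ['room-b']
```

Comparing the newest reading against the oldest one is deliberately crude; a real check would probably look at a fixed time window and alert a human rather than draw the conclusion itself.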

This shows that 1) my ISP does not have any monitoring of their air conditioners in that room, and 2) they are not monitoring the temperature of their hardware.

Kees Jan

kjkoster
09-11-2010, 21:10
Dear All,

This just gets better and better. It turns out the airco broke down the day after this happened. Again I had to inform my ISP, and again they had no idea. Good thing I have no production hardware hosted with them, only test machines.

Kees Jan

kjkoster
09-11-2010, 21:11
Dear All,

Oh, and they have a work-around: the door of the server room is propped open with a box. I don't know whether to laugh or cry.

Kees Jan

cfd
31-12-2010, 11:49
Laugh or cry? Who cares at your ISP... they have insurance for the hardware, I guess ;)

kjkoster
31-12-2010, 11:55
Dear cfd,

*haha* I had not thought of it that way.

And to think that two years ago I actually had my production machine in that rack. What was I thinking?

Well, all my test machines are now in a different location: 24/7 access, remotely accessible power strip, hot/cold aisles, DNS servers on different subnets and locks on the rack doors that actually close.

Kees Jan