Unscheduled websrv01 Downtime

Update: Oct 21 2:24AM
websrv01 is back online. Diagnostic tests are running on the server as we speak (thanks to Greg), but initial reports indicate that one of the two hard drives on websrv01 has died. Luckily we run a RAID 1 (mirror) configuration, so the other drive is picking up the slack (whew). Dell is aware of the issue and will get back to me later in the day to schedule a visit to the data centre to investigate further. I will post more information as it becomes available.

Initial Report: Oct 20, 11:03PM
websrv01 is currently off-line. I am sorry to report that we are experiencing major hardware failures. I am working with Dell and our co-location provider to resolve the issue, but we expect the server to be down for most of the day on Wednesday while we recover service.

Unscheduled websrv01 Downtime

This morning we experienced a short unscheduled service outage on websrv01 due to a spam attack that took place early in the morning. This incident could easily have been avoided if a select few users had not chosen incredibly simple e-mail passwords. If you have a simple e-mail password, please change it immediately. Passwords should be alphanumeric, contain a minimum of 6 characters, and include no dictionary words.
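
For anyone who wants to sanity-check a password against the guidelines above, here is a rough Python sketch. It is only an illustration, not something running on websrv01. It assumes "alphanumeric" means at least one letter and at least one digit, and the function name and tiny word list are made-up stand-ins for a real dictionary file.

```python
import re

# Hypothetical stand-in for a real dictionary file; a real check would
# load a much larger word list.
SAMPLE_WORDS = {"password", "welcome", "dragon", "monkey", "letmein"}


def password_problems(password, words=SAMPLE_WORDS):
    """Return a list of reasons the password fails the guidelines."""
    problems = []

    # Guideline: minimum of 6 characters.
    if len(password) < 6:
        problems.append("shorter than 6 characters")

    # Assumption: "alphanumeric" means at least one letter AND one digit.
    has_letter = re.search(r"[A-Za-z]", password) is not None
    has_digit = re.search(r"[0-9]", password) is not None
    if not (has_letter and has_digit):
        problems.append("needs both letters and digits")

    # Guideline: no dictionary words (case-insensitive substring match).
    lowered = password.lower()
    if any(word in lowered for word in words):
        problems.append("contains a dictionary word")

    return problems
```

An empty list means the password passes all three checks; anything else tells you what to fix.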

Unscheduled websrv01 Downtime

We are currently experiencing an unscheduled service outage on websrv01 due to what we believe may be a hardware issue on the server. In fact, I think this could be the same issue we encountered on April 25th, and I hate to say it but our co-location provider *still* has not fixed the misconfigured power port that our server is plugged into, so I am still unable to reboot the machine.

A technician has been informed of the problem, and someone is going down to the server to reboot it right now. Luckily, I am told there are people in the building today, so it should be back shortly. I will post an update as soon as I know anything.

Update 9:29AM
Data centre technicians are making their way to the server right now to fix the APC switch and restart the machine.

Update 10:26AM
I’m still waiting, and getting angrier by the minute. I apologize for the inconvenience.

Update 10:50AM
websrv01 is back online after the technician finally rebooted the server. I apologize once again for the inconvenience. I am fairly certain that they assigned John to my support ticket:

Data Centre Technician John

Unscheduled websrv01 Downtime

We are currently experiencing an unscheduled service outage on websrv01 due to what we believe may be a hardware issue on the server. Unfortunately our co-location provider misconfigured the power port that our server is plugged into, so we are unable to reboot the machine ourselves. A technician in Montreal has been assigned and is on his way to the data centre to reboot the server and investigate further. We will update this post as more information becomes available.

Update 3:53PM
We are still working with our co-location provider to determine the exact cause of the problem. One theory currently being investigated is that we may be experiencing a distributed denial of service attack on the server. As soon as we have any further information, we will post it.

Update 6:20PM
The problem has now been resolved, and all service has been fully restored. It does in fact appear to have been a distributed denial of service attack, which fortunately ceased on its own. We sustained 1Mbit of HTTP traffic to websrv01 for only a short period of time before the server was unable to handle the requests. The 1Mbit wall continued until just after 6PM, when it stopped just as mysteriously as it began. Further investigation is ongoing, and any new information will be made available.

We apologize for the inconvenience.