On Sunday, October 22nd at approximately 5:30 PM PDT (UTC-7), the network in Los Angeles went offline due to a power supply failure on the core switch. While working with the datacenter, we found that the spare power supply we had on hand was also faulty and would not power on the switch. We were unable to source a third PSU until approximately 12:15 AM on Monday, October 23rd. Once the new PSU was installed, the network came back online immediately.
In addition to this global outage, clients on node ovzla002 experienced slightly longer downtime: when the core switch went offline, the NIC on this host stalled and did not relink when the switch came back online. Reloading the NIC driver on this machine resolved the problem without a reboot, and client services became available approximately 30 minutes after the initial service restoration.

To prevent this from recurring, we have already sourced two new power supplies for the switch, as well as two spares to keep in stock. We also have another sysadmin starting this week who will be able to respond to issues like this when our other staff members are not available. Finally, we will be adding third-party availability monitoring to complement our existing monitoring system.

If you have any questions or concerns, or are still having trouble accessing a service, please contact our helpdesk: https://billing.anynode.net/submitticket.php

Sunday, October 22, 2017
