Monday, 14 September 2009

The Inevitability of Down Time

It's been a bad week for service providers, which in turn means a bad week for the people who support their products in client organisations. Problems happen, but how a company handles them and how it communicates what happened reveal its real quality.

On Monday the 7th of September the online payment processor SagePay (the rebranded Protx) suffered an outage affecting all 25,000 clients and untold end users. Late in the evening of Wednesday the 9th of September the mobile phone company Orange experienced an outage which affected some of its mobile data customers and its landline broadband customers. This coincided with a reportedly unrelated BlackBerry data service outage during the same period, leaving users able to access email only from their computers.


IT Managers are generally a pragmatic bunch and more than anyone else in an organisation will accept that down time happens. This will generally be for one of two reasons:
  1. The complexity of modern systems means that unanticipated problems are only discovered when the exact combination of circumstances needed to trigger them occurs.
  2. Anticipated risks can be avoided, but only at a cost. At a certain point the diminishing returns from spending to avoid risk make accepting the risk the sensible decision.

Problems happen, but the response to them is what matters. SagePay focused on getting its core service back up and running, with the result that payments could be made as normal through the site again within two and a half hours, even though full functionality wasn't restored for nearly a full day. Within a day all the companies using its services had been forwarded a message from the head of SagePay explaining the problem.

Orange, by contrast, experienced major problems that lasted for over a day, yet there was no contact from Orange during the outage and there has been no information after the event from any level. The only way of gathering news was from online forums and industry news websites, despite downright confusing symptoms, including a 50/50 split between BlackBerry devices working normally and being completely inactive.

IT Managers are the suppliers to the rest of their organisation. When there is a problem with a service and no information is supplied as to what went wrong, it is the IT Manager who is hung out to dry. "An external problem occurred, but I don't know what it was" isn't an answer that anyone wants to give, and no supplier in a competitive industry should force people to give it.
