As our U.S.-based readers are likely aware, Wednesday of this week was not a good day for the IT community. In the course of a few hours Wednesday morning, mission-critical systems at the New York Stock Exchange, United Airlines and The Wall Street Journal failed, and their respective customers were not pleased.
As breaking headlines cascaded on Wednesday, just about everyone's initial impression was that this was some form of coordinated cyber-attack. After quick investigations by various federal agencies, that premise was ruled out.
Since Wednesday, the NYSE has communicated to its brokerage customers that the outage was likely caused by a planned software upgrade that was underway. The outage lasted four hours but, remarkably, had little impact on the exchange of stocks because ancillary systems took over the task of transacting trades. According to a report in today's WSJ, the problem started with a new software program, installed the prior evening, that was designed to time-stamp data more precisely.
The WSJ outage was reportedly caused by a volume surge that overwhelmed the publication's home page. It remains unclear at this point why the surge occurred.
The United Airlines outage, which lasted upwards of 90 minutes, caused the cancellation of 60 flights and consequent delays to hundreds of U.S.-based flights; the airline attributed it to a failed router in its computer network. As anyone who has flown lately can attest, airlines like United have cut back on airport customer service agents. Thus, system-wide interruptions cause significant passenger disruption, particularly when backup planning is inconsistent. Given United's long history of computer failures, schedule interruptions and poor customer service, Wednesday's incident was yet another source of disappointment for United's long-standing customers. (This author is among them.)
For a supply chain community that deals with business and mission-critical systems each and every day, Wednesday's litany of IT incidents provides poignant reminders. The first and most obvious reminder for IT teams themselves is that no mission-critical system should have a single point of failure. While that appears to be a simple statement, the existence and complexity of global, outsourced systems and networks have added new vulnerabilities that must be communicated and addressed. A further theme is that complex software upgrades can precipitate outages. It is no wonder that IT and business functional teams remain very concerned about the potential risks of upgrading complex ERP or other business-critical supply chain systems and applications.
For functional supply chain and line-of-business team leaders, the prime takeaway is twofold. First, listen to your IT support teams when they raise concerns about system vulnerabilities or the need to invest in redundancy for specific business-critical systems. Too often, functional business and supply chain teams grow impatient with planned maintenance downtime or with the extra time needed to complete a planned software upgrade. Second, invest that energy instead in preparing consistent contingency and backup plans. Ensure that such plans exist for each and every business-critical system. Take the time to thank and reward both IT and functional teams for their diligence in planning.
A final message relates to senior executive leaders and their zest for cost control. A common theme across Wednesday's concurrent outages is that larger and more complicated business-critical systems require adequate resources for testing, monitoring and reliability. That means resources not only for defenses against hacking and cyber-attacks but for day-to-day operations as well.
Many years ago, I worked for a very insightful CIO who had mastered communicating with senior executive management. Often, when he was pressed on maintenance budgets for mission-critical business systems such as order fulfillment, he would use the analogy of a jet aircraft in flight: "Do you expect the pilots to upgrade or change an engine while flying at 30,000 feet?" Of course not, and that is why diligent, timely maintenance and backup plans exist.
Don’t let your firm be the next headline for a supply chain systems failure.