Don't Blame the Hackers

The numerous online and ATM banking outages of the past few months can't be attributed to hackers or cybercrime. Rather, the raft of interruptions was caused by systems glitches arising from ordinary updates, upgrades and overnight maintenance to old infrastructure, portending more downtime to come if banks don't modernize their legacy systems.

Among the glitches: "system upgrades" that darkened Bank of America's Web channel in March, the "internal systems issue" that brought down 9,500 Wells Fargo ATMs in February, and the "routine database maintenance" that caused Commonwealth Bank of Australia's ATMs to spit out cash in March to customers in amounts far exceeding their balances.

The problem is that big banks' aging, mainframe-based architectures are showing signs of trouble sustaining optimal uptimes, particularly when new systems are added. (While CBA is porting its legacy, batch-run systems to a new real-time core banking solution powered by SAP, the bank continues to rely on mainframes to run it.) Most major outages result from unforeseen interactions between new applications and the old systems they are deployed atop. "It's the structural engineering, which somebody didn't quite understand, that all of a sudden blows up in operations," says Bill Curtis, founding director of the Consortium for IT Software Quality (CISQ).

One of the most prevalent causes of banking outages is "an upgrade with unintended consequences," says Richard Crone, founder of banking IT advisor Crone Consulting.

Big banks' mainframe systems have been customized so heavily that their risk of prolonged outages is now perceived as greater than it is for institutions using modern grids. All agree mainframes entail lengthier lead times for new product development. Most big banks' core systems are composed of millions of lines of computer code, much of it written in COBOL, a programming language first developed more than 50 years ago. While it may be difficult to crack such an antiquated language, it's just as hard to find someone who fully understands it (veteran programmers retire but their programs remain), let alone locate any records that completely track its use in the enterprise down the years.

"It's incredibly complex, it's not documented, and the guy that wrote it is dead," quips Curtis, who is also chief scientist at CAST, which probes interactions of different programming languages.

That means "more outages will occur at banks as their processing demands increase, if banks don't update to service-oriented architectures," Crone says. "Mobile banking and payments in particular portend banking logjams, due to exponential increases in service interactions expected." Disruptions involving modern grids, meanwhile, are "easier to fix in a shorter period of time," he says. Channels will still go dark: "No one can guarantee total prevention of problems," says Zohar Gilad, EVP at Precise Software, a performance monitoring provider. The goal is to trim "MTTR," or "mean time to recovery," the average time it takes to restore service after an outage.

Old code makes that tougher.
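
For readers who want to see how that metric is typically computed, here is a minimal sketch in Python. It assumes MTTR is simply the average duration of a set of outages; the outage log, its timestamps and the function name `mean_time_to_recovery` are hypothetical and are not drawn from any bank mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical outage log of (service_down, service_restored) pairs.
# These timestamps are invented for illustration; they are not real incidents.
outages = [
    (datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 3, 30)),  # 90 minutes
    (datetime(2024, 2, 5, 23, 0), datetime(2024, 2, 6, 0, 0)),    # 60 minutes
    (datetime(2024, 3, 1, 1, 15), datetime(2024, 3, 1, 1, 45)),   # 30 minutes
]


def mean_time_to_recovery(incidents):
    """Return the average outage duration as a timedelta."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the individual durations, starting from a zero-length timedelta.
    total_downtime = sum((end - start for start, end in incidents), timedelta())
    return total_downtime / len(incidents)


mttr = mean_time_to_recovery(outages)
print(mttr)  # 1:00:00
```

In this made-up log, three outages totaling 180 minutes average out to one hour. Trimming MTTR means shortening individual recoveries, not only having fewer outages.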
