Short of a major data breach, it could be a bank chief information officer's worst nightmare: a critical piece of software crashes, disrupting an important and highly visible activity like processing consumer payments or setting net asset values for mutual fund clients.

HSBC and Bank of New York Mellon are still reeling from recent software snafus that brought down major segments of their businesses. Such failures are all too familiar in financial services. In July the New York Stock Exchange suffered a systemwide outage affecting every stock on the exchange – traders basically got a blank screen. In 2012 a problem in Knight Trading's algorithmic trading system that lasted for 30 minutes cost the firm $440 million.

Software problems are inevitable, and the financial industry at the moment is particularly vulnerable. Many banks have old core systems, in some cases more than 40 years old, and all are under pressure to provide state-of-the-art mobile and online services.

"Digital transformation is a real challenge because you're trying to make significant upgrades to your systems quickly, without breaking the bank," said Lev Lesokhin, executive vice president of strategy and analytics at Cast Software. "How do you manage that? Very carefully."

All of this is driving banks to take software risk and software quality more seriously.

"Security gets a lot of attention, but when you look at the cost to the business and [information technology] and the business impact, these software glitches, some of which we see in the papers and many of which we don't, cost a lot more money than security," Lesokhin said. "Security is scary, and this stuff just happens all the time."

There are several things banks can do to improve the reliability and security of their code and minimize the odds of being the victim of a major software outage.

Software Testing
Careful testing of software before deployment is one common precaution. Margo Visitacion, vice president at Forrester, said that while most banks test software, their efforts are sometimes cursory.

"One thing I've seen that's specific to financial applications is that when organizations are testing, they are miscalculating the test coverage and in the need for speed, they're not always encountering some of the regression testing that needs to be performed," she said.

Tests need to take into account related systems that may be affected by software changes, she said. "If you're testing what has been changed in the release or what has been modified in a release and you're not testing the areas that have potential impact, you're not sufficiently covering all the potential tests you could perform," Visitacion said.
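
To make the point about related systems concrete, here is a minimal Python sketch of impact-based test selection. The component names and the dependency map are hypothetical, invented for illustration; a real bank would derive them from its own architecture inventory.

```python
from collections import deque

# Hypothetical map: component -> components that consume its output.
DEPENDENTS = {
    "fx_rates": ["pricing"],
    "pricing": ["nav_calc", "statements"],
    "nav_calc": ["fund_reporting"],
    "statements": [],
    "fund_reporting": [],
    "login": [],
}

def impacted_components(changed, dependents=DEPENDENTS):
    """Return the changed components plus everything downstream of them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        component = queue.popleft()
        for downstream in dependents.get(component, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen
```

In this sketch, a release that touches only "pricing" would still call for regression tests on "nav_calc", "statements" and "fund_reporting", which is the coverage gap Visitacion describes.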

Vetting Vendors
In BNY Mellon's case, it was not the bank's own software, but an accounting program from SunGard called InvestOne that it has been using since the 1990s, that collapsed during an update. That failure left the bank unable to calculate net asset values for more than a thousand mutual funds.

Hundreds of bank executives reportedly spent days manually calculating these values. Such calculations are data intensive and require timely data on all underlying equities and funds. If the data gets corrupted or is not handled correctly, the results prove flawed and the entire process needs to be fixed.

But banks can and should rigorously vet software from outside vendors.

"When it's third-party software you're bringing in-house, that's when quality management becomes even more important," Visitacion said. "You need to be looking at the performance of the package and how that third-party package is going to operate in your environment. There needs to be a quality plan put into place to verify that the bank understands what the known defects are within the application."

And as software updates occur, there needs to be transparency between the vendor and the financial institution to plan for and conduct the appropriate levels of testing, she said.

It is typical for bank executives to hand off the oversight of software to a trusted third party, but they are realizing they need to be more involved.

"What some in the industry have been doing is taking a more proactive approach to that and saying to their vendors, 'I'm going to measure the level of software risk and structural quality you're delivering to me,'" Lesokhin said. That includes ensuring the robustness of the software itself before any problems flare up.

"We're seeing some of the more forward-thinking IT shops putting requirements in their service level agreements on quality of engineering of the software in order to avoid these types of glitches," Lesokhin said.

Precarious Updates
Software updates are often behind major bugs at banks, said Bill Curtis, the executive director of the Consortium for IT Software Quality. In Knight Trading's case, a software update accessed outdated code that suddenly made more than $400 million in bad trades in 30 minutes.

The NYSE outage in July also happened around a system upgrade.

"This is kind of typical," Lesokhin said. "Upgrades are complex, [and] these systems have many components, many points of integration to other components in the environment. You might be making a simple change in one component, but it's hard to predict how that little change might affect the overall system or how that system will interact with other systems."

Often, during the first day or week of a software upgrade the IT staffers wait "with their fingers crossed with a fire squad waiting to fix things if they happen," Lesokhin said.

Even testing of software updates is problematic, he said.

"You can't test these systems enough — there are more paths through one of these systems than there are known stars in the universe," said Lesokhin. "The number of permutations is incredibly high. So you end up doing risk-based testing. You're testing just the paths that are the most likely to be executed. But some of the edge conditions and corner conditions it's hard to run enough regression tests or stress tests to test those."

New Quality Standards
Until recently, there has been no way to objectively measure software quality and eliminate potential points of weakness that could lead to outages. Systems integrators would have security and reliability written into their contracts and service-level agreements, but there were no standards for how they were measured.

That led to the formation of the software-quality consortium, Curtis said. Curtis is also the author of the capability-maturity model, a method of improving software development processes.

The group has come up with a specification for measuring four aspects of software: reliability, security, performance efficiency and maintainability.

Each of these measures is based on detecting and counting violations of good architectural practice. For instance, in security there is a repository maintained by The Mitre Corp. called the Common Weakness Enumeration; it is a collection of more than 800 known weaknesses in computer code that hackers have exploited. So one aspect of the standard is to look for such weaknesses in software.

"Some of these are old weaknesses," Curtis said. "They've been known for 20 years and we still find them in software." For instance, sometimes SQL injection commands pop up that could give a hacker anything he asks for.

The standards, and forthcoming certifications based on these standards, can be applied to third-party software.

"We've seen this work well where the customer will put in place a quality gate, so when the software is brought in, it's run through series of tests: functional testing, static analysis, penetration testing — a number of different techniques to find any unacceptable violations of good practice in the code," Curtis said. Then depending on the contract, the vendor or client fixes the problems.

No method will detect and prevent all software glitches, especially with software as complicated as that used in banks.

"Complex software is like a big block of Jello," said Lesokhin. "If you touch it one place, the whole thing kind of wiggles elsewhere and you have to wait for it to stabilize."