A series of fresh technology shutdowns this spring at banks around the world reveals the financial services industry still has a long way to go toward ensuring full uptime for networks, as well as communicating with the public about why tech glitches have happened and what is being done about them.
In May, Santander, Barclays and HSBC were all hit by digital banking outages. Some customers of Barclays and Santander were unable to access accounts online for a time near the end of the month, an outage blamed largely on end-of-the-month transaction volume. At HSBC, an IT hardware failure temporarily rendered ATMs unable to dispense cash or accept card payments in the U.K. Barclays and Santander both apologized for the outages through statements, while HSBC's approach revealed both the power and peril of social media in such cases.
HSBC's PR office took to social media to communicate updates on the outage, and to also receive criticism about the outage (HSBC, Santander and Barclays did not return queries for comment). After an earlier outage in November, HSBC had set up a social monitoring team to be more proactive about communicating with the public about tech glitches, a move that seemed to have some positive impact, as not all of the Twitter and Facebook postings about the most recent outage were complaints.
The basic task of making sure the rails are working, and smoothing things over with customers when systems inevitably shut down, is an even more pressing matter considering the propensity for outrage to spread quickly among the public via new channels.
"One thing that's true about outages is we're hearing more about them. The prevalence of social media use by irate customers and even employees makes these outages more publicized," says Jacob Jegher, a senior analyst at Celent.
Jegher says the use of social media for outage communication is tough - balancing the need to communicate with customers with internal tech propriety is easier said than done. "While it's certainly not the institution's job nor should it be their job to go into every technical detail, it's helpful to provide some sort of consistent messaging with updates, so customers know that the bank is listening to them," Jegher says.
National Australia Bank, which suffered from a series of periodic online outages about a year ago that left millions of people unable to access paychecks, responded with new due diligence and communications programs. In an email response to BTN, National Australia Bank Chief Information Officer Adam Bennett said the bank has since reduced incident numbers by as much as 40 percent through a project that has aimed to improve testing. He said that if an incident does occur, the bank communicates via social media channels, with regular updates and individual responses to consumers where possible.
The bank also issued an additional statement to BTN, saying "while the transaction and data demands on systems have grown exponentially in recent years led by online and mobile banking, the rate of incidents has steadily declined due to a culture of continuous improvement...The team tests and uses a range of business continuity plans. While we don't disclose the specifics, whenever possible we will evoke these plans to allow the customer experience to continue uninterrupted."
While communicating information about outages is good, it's obviously better to prevent them in the first place.
Coastal Bank & Trust, a $66 million-asset community bank based in Wilmington, N.C., has outsourced its monitoring and recovery, using disaster recovery support from Safe Systems, a business continuity firm, to screen for outage threats, supply backup server support in the event of an outage, and contribute to the bank's preparation for and response to mandatory yearly penetration and vulnerability tests.
"Safe Systems makes sure that the IP addresses are accessible and helps with those scans," says Renee Rhodes, chief compliance and operations officer for Coastal Bank & Trust.
The bank has also outsourced security monitoring to Gladiator, a Jack Henry enterprise security monitoring product that scours the bank's IT network to flag activity that could indicate a potential outage or external attack. The security updates include weekly virus scans and patches.
Coastal Bank & Trust's size - it has only 13 employees - makes digital banking a must for competitive reasons, which increases both the threat of downtime and the burden of maintaining access.
"We do mobile, remote deposit capture, all of the products that the largest banks have. I am a network administrator, and one of my co-workers is a security officer. With that being said, none of us has an IT background," Rhodes says. "I don't know if I could put a number on how important it is to have these systems up and running."
Much of the effort toward managing downtime risk is identifying and thwarting external threats that could render systems inoperable for a period of time.
Troy Bradley, chief technology officer at FIS, says the tech firm has noticed an increase in external denial of service attacks recently, which is putting the entire banking and financial services technology industries on alert for outage and tech issues with online banking and other platforms.
"You'll see a lot of service providers spending time on this. It's not the only continuity requirement to solve, but it's one of the larger ones," he says.
To mitigate downtime risk for its hosted solutions, FIS uses virtualization to backstop the servers that run financial applications, such as web banking or mobile banking. That creates a "copy" of that server for redundancy purposes, and that copy can be moved to another data center if necessary. "We can host the URL (that runs the web enabled service on behalf of the bank) at any data center...if we need to move the service or host it across multiple data centers we can do that...we think we have enough bandwidth across these data centers to [deal with] any kind of denial of service attack that a crook can come up with," Bradley says.
FIS also uses third-party software to monitor activity at its data centers in Brown Deer, Wis.; Little Rock, Ark.; and Phoenix, searching for patterns that can anticipate a denial of service attack early and allow traffic connected to its clients to be routed to one of the other two data centers. For licensed solutions, FIS sells added middleware that performs a similar function, creating a redundant copy of a financial service that can be stored and accessed in the case of an emergency.
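The rerouting idea can be illustrated with a short sketch. It is not FIS's software, whose internals are not described here; the site labels, the traffic figures and the "five times baseline" spike rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteStats:
    baseline_rps: float  # typical requests per second (assumed figure)
    current_rps: float   # requests per second observed right now


def is_suspect(stats: SiteStats, spike_factor: float = 5.0) -> bool:
    """Flag a site whose traffic exceeds spike_factor times its baseline.

    The threshold rule is a hypothetical stand-in for real pattern detection.
    """
    return stats.current_rps > spike_factor * stats.baseline_rps


def route(primary: str, stats: Dict[str, SiteStats]) -> str:
    """Return the data center that should serve traffic meant for `primary`."""
    if not is_suspect(stats[primary]):
        return primary
    healthy = [s for s in stats if s != primary and not is_suspect(stats[s])]
    if not healthy:
        return primary  # no better option; stay put
    # Prefer the healthy site with the lowest load relative to its baseline.
    return min(healthy, key=lambda s: stats[s].current_rps / stats[s].baseline_rps)


# Hypothetical numbers: one site sees a ninefold traffic spike.
stats = {
    "brown_deer": SiteStats(baseline_rps=1000, current_rps=9000),
    "little_rock": SiteStats(baseline_rps=1000, current_rps=1200),
    "phoenix": SiteStats(baseline_rps=1000, current_rps=800),
}
target = route("brown_deer", stats)
```

In this made-up scenario, traffic bound for the spiking site is sent to whichever of the other two has the most headroom.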
Stephanie Balaouras, a vice president and research director for security and risk at Forrester Research, says virtualization is a good way to mitigate both performance issues, such as systems being overwhelmed by the volume of customer transactions, and operational issues such as hardware failure, software failure, or human error.
"If it's [performance], the bank needs to revisit its bandwidth and performance capacity. With technologies like server virtualization, it shouldn't be all that difficult for a large bank to bring additional capacity online in advance of peak periods or specific sales and marketing campaigns that would increase traffic to the site. The same technology would also allow the bank to load-balance performance across all of its servers - non-disruptively. The technology is never really the main challenge, it tends to be the level of maturity and sophistication of the IT processes for capacity planning, performance management, incident management, automation, etc.," she says.
In the case of operational issues, server virtualization is still a great technology, Balaouras says, adding that it allows the bank to restart failed workloads within minutes on alternate physical servers in the environment or even at another data center.
"You can also configure virtual servers in high-availability or fault-tolerant pairs across physical servers so that one hardware failure cannot take down a mission-critical application or service," Balaouras says.
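The pairing Balaouras describes amounts to never placing both halves of a pair on the same physical machine. A minimal sketch of that rule follows; the application and host names are hypothetical, and real virtualization platforms handle placement with their own tooling.

```python
from typing import Dict, List, Tuple


def place_pairs(apps: List[str], hosts: List[str]) -> Dict[str, Tuple[str, str]]:
    """Assign each app a primary and a standby on two different hosts."""
    if len(hosts) < 2:
        raise ValueError("need at least two physical hosts for a pair")
    placement = {}
    for i, app in enumerate(apps):
        primary = hosts[i % len(hosts)]
        standby = hosts[(i + 1) % len(hosts)]  # always differs from primary
        placement[app] = (primary, standby)
    return placement


def survivors(placement: Dict[str, Tuple[str, str]], failed_host: str) -> Dict[str, List[str]]:
    """List, per app, the hosts still running it after failed_host goes down."""
    return {
        app: [h for h in pair if h != failed_host]
        for app, pair in placement.items()
    }


# Hypothetical applications and hosts.
placement = place_pairs(
    ["web_banking", "mobile_banking"],
    ["host_a", "host_b", "host_c"],
)
after_failure = survivors(placement, "host_b")
```

Because the two copies never share a host, losing any single host leaves every application with at least one running copy.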
Balaouras says more significant operational failures, such as a storage area network (SAN) failure, pose a greater challenge to network continuity and backup efforts. "In this case, you would need to recover from a backup. But more than likely a bank should treat this as 'disaster' and failover operations to another data center where there is redundant IT infrastructure," she says.
Outages are still a risk, but monitoring and backup technology is improving. The right social media communications help on the PR side.