Advice On Service So Good It's Invisible: The Backing Up Of Data
IT's attention to backing up data has greatly improved data throughput and expanded data protection. In the context of an everyday credit union, though, ask whether IT wouldn't earn new laurels from faster recovery following failures. Isn't the real aim of backup systems a recovery so fast that CU members would hardly notice a service interruption?
Placing more member services online, including wireless services, has tipped the scales, making it imperative that CUs look into real-time business continuity practices. From your members' perspective, recovery must be a matter of moments, not hours or days.
Keeping data and in-process digital transactions safe and accessible requires substantial new thinking from IT groups if they are to improve on industry statistics that estimate successful backup rates of 40% to 60%. Whether service outages stem from natural disasters or from equipment or network failure, a CU's approach with in-house resources and dedicated technology partners has to evolve with changing times and with members' increasing expectations.
Continuous member service is the most obvious sign to members that a CU has its priorities right and that the technology behind the glass doors is well thought out.
One Credit Union's Model
Reengineering services continuity at Silver State Schools Credit Union (SSSCU) provides a good model of practicality and feasibility. Like many CUs, its network is anchored by IBM nonstop servers, one hosting an Episys core application from Symitar.
"We had the usual tape backup practices for protecting data," said Tracey Brown, Operations Manager. "Yet, the more we discussed continuity of member services, the less satisfied we became. There was exposure to data loss from the time a tape was made until couriers shuttled it between sites and knowing a full restore from the tapes after a major failure would require hours. Potentially, up to 24 hours of data could be lost because an old tape would have to be used."
The credit union sees "continuity" as opportunity, catalyst for change, and technical challenge. Consider that since February 2002 a number of SSSCU's 58,000 members with cell phones or PDAs have had services enabling account access for many types of transactions. These wireless services are supported by SensCom technology integrated with Episys and existing backup processes.
Taking stock, SSSCU's operations group re-enlisted legacy partners Strategic Technology Solutions and Mainline Information Systems to plan new continuity-related enhancements. While aspects of the Episys database and the 90 servers comprising SSSCU's enterprise (connecting 15 branches) prevented using obvious redundancy tools, an alternative proved even more straightforward. Consultants suggested the RealTime recovery application, which is developed by Mendocino Software and sold by Mainline. RealTime is advertised as rapid recovery software, so the plan was to test constant access to databases and recovery from various faults during the summer of 2003.
Setting up two test hosts on a test network replicated the production environment. A B80 supported the Episys application and RealTime client. The RealTime Backup Agent resided on another host (a J-40) dedicated to RealTime processing.
"For three months, we tested various scenarios and the virtual cloning of data and tags that RealTime uses to 'rollback' data to any given point in time, i.e., before any problem," said Brown. "In one major recovery simulation we brought back the Episys host and restored services within 20 minutes." The group also tested creating clones to facilitate the study of new software releases without affecting the production system. The successful test showed that all that's needed is to make a tape of the clone and restore it to the primary host, minimizing the disruption to member services that sometimes accompanies these upgrades.
Impact on the CU's business processes was also tested: first letting RealTime move all the software and data to the backup (a Supertrans), then applying transactions and replicating, running end-of-day routines, and creating a tape backup from the backup host.
The proof-of-concept tests identified some additional items that were finalized before going live with the RealTime solution.
"The Operations Group recognized that attention to numerous levels and facets of the infrastructure linked to their backup reengineering might also be enhanced," said Chris Dedham, Business Continuance Practice Director at Mainline. "For example, they tuned the wireless WAN for better bandwidth to support transaction throughput, increased the disk space on the Episys production host, and upgraded the AIX operating system before loading RealTime into the live environment in October."
New Light On One High Cost
The high availability of today's hardware and database application systems might lead some to think there's less justification for protecting a credit union from unplanned downtime. Even if you could depend on luck to save the CU from host or database failures, CUs gain advantages from recovery solutions in other aspects of the business.
Consider the impact scheduled maintenance can have on services for customers who work and live in 24-hour cities. CUs generally schedule systems to be taken offline for a given number of hours each year for maintenance. Although the ATMs still work during these outages, members are inconvenienced if account balances aren't available for lookup and the ATMs either dispense money and create an overdraft or, alternatively, limit cash dispensing.
SSSCU circumvents problems like these with its RealTime recovery system, which enables maintenance on the replicated host (e.g., purging files, new releases, nightly backups, and other testing) before restoring to the production system in one final step that happens so quickly that members hardly notice a processing delay. Hence, SSSCU meets or exceeds 99% availability of all services on a 24 x 7 basis.
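To put the 99% figure in perspective, the short Python sketch below converts an availability percentage into the downtime it permits over a 365-day year of round-the-clock service. The function name and constant are illustrative only; they are not part of RealTime or Episys.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year of 24 x 7 service


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the hours per year a service may be down at a given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.0, 99.9):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

At 99%, the budget works out to 87.6 hours a year. Every scheduled outage on the production host would draw from that budget, which is one reason moving maintenance to the replicated host matters.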
A brief note on certain technical aspects of the client/server architecture and application is important because they contribute significantly to service continuity and other cost-efficiencies. For one, the architecture offloads almost all of the RealTime replication processing from the Episys production host to a separate host that also serves as a backup host. This backup server, which runs the RealTime Backup Agent, is where replications are created and managed, ready for recovery. Moreover, the production client makes the most of the storage RealTime uses through compression, caching, and virtualization techniques.
Also note that if recovery processes rely on restoring full volumes of data or require application logs to fully reconstruct data as it stood before a system failure, the CU won't have accomplished much toward putting systems back online quickly. The idea advanced at SSSCU was to reduce recovery time with technology that continually captures data, time stamps it (a journaling process), and logs it. Data is then restored simply by returning through time, and tests by the Operations Group showed that a full restore of SSSCU's 10 Gbytes would require less than half an hour.
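The journaling idea can be illustrated with a minimal Python sketch. It is a toy model written for this article, not a description of how RealTime is implemented: the class names, the key/value record format, and the replay-from-the-start strategy are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    timestamp: float       # when the write was captured
    key: str               # hypothetical record identifier, e.g. an account
    value: Optional[str]   # new value; None marks a deletion


@dataclass
class Journal:
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, timestamp: float, key: str, value: Optional[str]) -> None:
        """Append a time-stamped write; entries must arrive in time order."""
        if self.entries and timestamp < self.entries[-1].timestamp:
            raise ValueError("journal entries must be appended in time order")
        self.entries.append(JournalEntry(timestamp, key, value))

    def state_at(self, point_in_time: float) -> Dict[str, str]:
        """Rebuild the data as it stood at point_in_time by replaying the log."""
        state: Dict[str, str] = {}
        for entry in self.entries:
            if entry.timestamp > point_in_time:
                break  # writes after the chosen moment are ignored
            if entry.value is None:
                state.pop(entry.key, None)
            else:
                state[entry.key] = entry.value
        return state


if __name__ == "__main__":
    journal = Journal()
    journal.record(1.0, "acct-1001", "balance=500")
    journal.record(2.0, "acct-1002", "balance=75")
    journal.record(3.0, "acct-1001", "CORRUPTED")  # simulated bad write
    print(journal.state_at(2.5))  # state just before the bad write
```

Asking for the state at time 2.5 returns both accounts with their last good values, because the bad write at time 3.0 is never replayed. A production system would avoid replaying the whole log on every restore, but the principle of returning through time is the same.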
The Value Of The Approach
This approach also provides an important hedge against data corruption. Because it replicates in bursts and does not duplex the data, corruption that disables the primary server may be kept off the backup by the delay in replicating. Also, with the rollback feature, the backup host can be rolled back to a point in time prior to any corruption. Front-line staff would only need to verify the last transaction entered.
With a few months of working experience, we are finding ways this model for CU service continuity might be enhanced. To protect transactions in process we have asked Symitar to work with us to "quiet" the system periodically to ensure that the replication includes no partial transactions. The system would then indicate a synchronization point with the replication so it would be a known good point.
This would greatly streamline the recovery process, but the quality of the replication does not depend on it should Symitar opt not to cooperate.
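If such synchronization points were available, choosing where to roll back to would reduce to finding the most recent one before the failure. The Python sketch below shows that lookup under stated assumptions: sync points are represented as a sorted list of numeric timestamps, and the function name is hypothetical rather than anything offered by Symitar or RealTime.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def latest_sync_point_before(
    sync_points: Sequence[float], failure_time: float
) -> Optional[float]:
    """Return the most recent known-good sync point strictly before failure_time.

    sync_points must be sorted in ascending order. Returns None when no
    sync point precedes the failure.
    """
    # bisect_left gives the index of the first sync point >= failure_time,
    # so the element just before it is the last one earlier than the failure.
    index = bisect_left(sync_points, failure_time)
    return sync_points[index - 1] if index > 0 else None


if __name__ == "__main__":
    quiet_points = [100.0, 200.0, 300.0]  # illustrative timestamps
    print(latest_sync_point_before(quiet_points, 250.0))
```

With sync points at 100, 200, and 300 and a failure at 250, the lookup selects 200. Without sync points, the same rollback is still possible, but staff must confirm by hand that no partial transaction straddles the chosen moment.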
Dan Kinne is Vice President of IT for SSSCU, a member of the CUNA Technology Council, where he has served as vice chairman, and a past president of the Washington DC chapter of the Information Systems Audit and Control Association.