When TopLine Federal Credit Union transitioned from physical servers with tape backup to virtualized servers replicated to a storage-area network at a disaster-recovery site, delays began immediately.
Neither the new virtual server environment (configured using EMC's VMware) nor the storage-area network (a Dell EqualLogic system) was at fault. Both worked properly and did what they are supposed to do: replicate data from the $300 million-asset credit union's headquarters in Maple Grove, Minn., to the institution's disaster-recovery site about 10 miles to the east at a branch in Brooklyn Park. The problem was that the new setup pushed so much data through TopLine's T1 line, and consumed so much bandwidth in the process, that the nightly replication job could not finish on time. "When we put the two SANs in the virtual environment, we did not anticipate replication being an issue," says Colleen Jakes, TopLine's director of information services. "But it became drastically slow."
In such a setup, the SAN typically holds all the data and VMware carries all the server configurations. This promises high availability because VMware allows users to migrate virtual servers to separate physical locations, as long as the same hardware is in place at the disaster-recovery site. However, SANs replicate whole blocks of disk rather than individual files, which can cause unexpected drains on bandwidth for firms deploying such systems fully for the first time.
TopLine's SAN, for instance, contains 15 terabytes' worth of hard drives, and there are typically about 144 blocks per drive. In addition, although the SAN controls where the information is written, it does not necessarily assess changes sequentially; it replicates the most available or "ready to be replicated" block. So the system was essentially reproducing, every night, every block that had any changes written to it.
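The cost of block-level replication can be illustrated with a minimal Python sketch. It assumes a hypothetical 16 MiB replication block and a made-up list of small writes; neither figure comes from TopLine or Dell. The point is only that a few kilobytes of changes can mark many megabytes of blocks as needing replication.

```python
def dirty_blocks(writes, block_size):
    """Return the set of block indices touched by (offset, length) writes."""
    blocks = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        blocks.update(range(first, last + 1))
    return blocks


def replication_cost(writes, block_size):
    """Return (bytes written, bytes replicated when whole blocks are copied)."""
    written = sum(length for _, length in writes if length > 0)
    replicated = len(dirty_blocks(writes, block_size)) * block_size
    return written, replicated


# Hypothetical values for illustration only.
BLOCK_SIZE = 16 * 1024 * 1024
writes = [
    (0, 4096),                    # small write inside block 0
    (BLOCK_SIZE + 100, 512),      # small write inside block 1
    (5 * BLOCK_SIZE - 10, 20),    # write straddling blocks 4 and 5
]

written, replicated = replication_cost(writes, BLOCK_SIZE)
print(f"bytes written:    {written}")
print(f"bytes replicated: {replicated}")
```

In this sketch, fewer than 5 KB of writes dirty four blocks, so 64 MiB would have to cross the WAN.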
Doing so took up nearly all the T1's capacity, slowing the backup process considerably. The result was that TopLine could only replicate a third of its data nightly before having to turn off the backup system in the morning to free up bandwidth to resume normal production.
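A back-of-the-envelope calculation suggests how little a T1 can carry in one night. The sketch below uses the nominal T1 line rate of 1.544 Mbit/s and assumes a 12-hour nightly window; the article does not state the length of that window, so the figure is an assumption, and real throughput would be lower once protocol overhead is counted.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def max_bytes_transferred(hours, bits_per_second=T1_BITS_PER_SECOND, efficiency=1.0):
    """Upper bound on bytes moved over a link in the given number of hours.

    `efficiency` is a hypothetical knob for protocol overhead (1.0 = none).
    """
    seconds = hours * 3600
    return bits_per_second * efficiency / 8 * seconds


nightly = max_bytes_transferred(12)   # assumed 12-hour overnight window
weekend = max_bytes_transferred(36)   # a 36-hour weekend window

print(f"nightly ceiling: {nightly / 1e9:.2f} GB")
print(f"weekend ceiling: {weekend / 1e9:.2f} GB")
```

Under these assumptions the link tops out at roughly 8 GB per night and about 25 GB over 36 hours, a tiny fraction of a 15-terabyte SAN. A 36-hour window carries three times as much as a 12-hour one, which would be consistent with replicating only a third of the data nightly if the window was in fact about 12 hours.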
"It would only ever catch up on the weekends, where it had a solid chunk of time — about 36 hours — to complete the replication," Jakes says. "We couldn't get backup in a short run of time, nor with all the data."
Buying more bandwidth, always an expensive proposition, was cost-prohibitive, Jakes says, and thus out of the question. However, Dell suggests in its troubleshooting materials that customers consider using wide-area network optimizers to speed up replication. Since other tips, like doing backups less often, were inapplicable, tapping a WAN controller became the clear route for TopLine.
Knowing colleagues who use Riverbed's Steelhead WAN optimization appliances in enterprises across several industries gave Jakes confidence to trial the solution. "Within four days, our replications were happening overnight," she says. The system can now be backed up to at least the previous day's data within five to 10 minutes.
F5, Riverbed and Silver Peak are among WAN controller vendors known for focusing on optimizing data center and SAN replication, according to Gartner analysts.
Much of what these solutions do is de-duplicate data. "It doesn't transfer anything it doesn't need to," Jakes says. "It picks up only the change and passes along only the change, because it knows the rest."
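The idea of sending only what the far side does not already have can be sketched in a few lines of Python. This is a simplified illustration of chunk-level de-duplication in general, not a description of how Steelhead or any other product works; the 4 KB chunk size, the `DedupLink` class and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupLink:
    """Toy model of a sender/receiver pair that de-duplicates chunks."""

    def __init__(self):
        self.sender_seen = set()   # fingerprints the sender knows the receiver holds
        self.receiver_store = {}   # fingerprint -> chunk bytes on the receiving side
        self.bytes_on_wire = 0

    def send(self, data: bytes) -> bytes:
        """Transfer `data`, returning what the receiver reconstructs."""
        rebuilt = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.sender_seen:
                # Receiver already has this chunk: send only the fingerprint.
                self.bytes_on_wire += len(digest)
            else:
                # New chunk: send the fingerprint plus the data itself.
                self.sender_seen.add(digest)
                self.receiver_store[digest] = chunk
                self.bytes_on_wire += len(digest) + len(chunk)
            rebuilt.append(self.receiver_store[digest])
        return b"".join(rebuilt)


link = DedupLink()
block = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # four distinct chunks
link.send(block)
first_cost = link.bytes_on_wire

# Change one chunk out of four and replicate the whole block again.
changed = block[:CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + block[2 * CHUNK_SIZE:]
received = link.send(changed)
second_cost = link.bytes_on_wire - first_cost

print(first_cost, second_cost)
```

In this toy model the second pass costs about a quarter of the first, because three of the four chunks travel as 32-byte fingerprints rather than as data.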
Solutions like Steelhead also prioritize bandwidth allocations on the network for tasks like replication. In addition, they tune "chatty" TCP-based applications like e-mail and file transfers to acknowledge receipt of data on the local-area network instead of sending each acknowledgment across the WAN, which can free up bandwidth.
Installing Steelhead appliances — one placed between the router and the switch at the TopLine data center at headquarters, and another at the Brooklyn Park disaster-recovery branch — enabled TopLine to experience an 86% reduction in bandwidth utilization.
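One way to read that figure, assuming it means 86% fewer bytes cross the WAN for the same replication job, is as a multiplier on the link's effective capacity. The short Python sketch below works through that interpretation; the function name is illustrative, and the 1.544 Mbit/s value is the nominal T1 rate rather than a measured number from TopLine.

```python
def effective_multiplier(reduction):
    """Capacity multiplier implied by a fractional reduction in bytes sent."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1 / (1 - reduction)


T1_MBITS_PER_SECOND = 1.544  # nominal T1 line rate

multiplier = effective_multiplier(0.86)
effective_rate = T1_MBITS_PER_SECOND * multiplier

print(f"multiplier: {multiplier:.2f}x")
print(f"effective rate: {effective_rate:.1f} Mbit/s")
```

Under that reading, the same T1 behaves like a link roughly seven times larger for replication traffic.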
Jakes' biggest lesson learned? "If you're planning to replicate to a warm site, WAN optimization needs to be a part of your virtual solution," she says. That's primarily because anticipating bandwidth requirements remains more art than science. "Replication calculators are not even close," Jakes says. "I haven't found anybody able to do that accurately yet."
Executives are also reluctant to allocate dollars until circumstances justify the cost. "It's always an afterthought," Jakes says.










