Redundancy in dedupe

By Juan Orlandini
11/15/2011

Recently, Mike Spindler and I had a conversation, and he brought up an interesting point. When we design storage architectures, it's a foregone conclusion that we're going to build the SAN/NAS solution to be redundant across everything we can: dual HBAs, redundant fabrics, high-availability (HA) storage solutions, etc. When we build backup solutions, however, that's not necessarily the case. Like most insurance policies, we tend to buy just enough. As an industry, we usually size the backup solution based on the volume of data you currently have, plus some growth, and then derive the number of dedupe appliances, tape drives, amount of bandwidth, and the rest that we'll need. What we forget is that, at their core, dedupe devices are just another storage array, yet we don't build our networks to them the way we do to primary storage arrays. We also don't typically consider what would happen if those devices were suddenly unavailable. That's particularly striking, since most of these devices aren't built with HA capabilities.

Consider this scenario. After deciding you no longer want tape (as many of our customers have), you move to an all-dedupe strategy. You do the right thing by replicating from one dedupe box to another at your disaster recovery facility. Things are great. You back up faster and more reliably than you ever have before. You even restore faster than before. And then one night, the dedupe array needs maintenance. The vendor tells you it'll only take two hours, but (there's always a "but") it takes six. Usually not a big deal, right? Only backups were missed. But (remember that "but"?) what if, while you're doing maintenance, you need to restore because a DBA accidentally dropped a table? Hmm... not a problem, right? You have it replicated to the other location. Easy fix: just restore from over there. But wait. The replication was done post-dedupe, so only unique data ever had to cross the WAN. Your restore, on the other hand, will be fully rehydrated data sent over that same WAN. You sized your WAN link to accommodate this, right? Uh oh. And what if the outage window on your dedupe device is longer than six hours? How are you going to get your backups done?
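To put rough numbers on that WAN question, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the helper name `transfer_hours` is hypothetical, and so are the 500 GB database, the 10:1 dedupe ratio, the 100 Mbps WAN link, and the 80% link efficiency. It simply compares moving the deduped copy of the data with pulling back the full, rehydrated copy over the same link.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move data_gb (decimal) gigabytes over a link_mbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved
    (protocol overhead, contention); the 0.8 default is hypothetical.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_gb * 1e9 * 8                  # decimal GB -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # effective bits per second
    return bits / usable_bps / 3600            # seconds -> hours


# Hypothetical example: restoring a 500 GB database from the DR site.
logical_gb = 500.0
dedupe_ratio = 10.0  # assumed; replication ships only the deduped (unique) data
wan_mbps = 100.0     # assumed WAN link speed

replicated = transfer_hours(logical_gb / dedupe_ratio, wan_mbps)
restore = transfer_hours(logical_gb, wan_mbps)  # rehydrated: full logical size
print(f"replicate deduped copy:  {replicated:.1f} h")
print(f"restore rehydrated data: {restore:.1f} h")
```

With these made-up figures, the deduped copy moves in about 1.4 hours, while the rehydrated restore takes roughly 13.9 hours, ten times longer because it scales with the dedupe ratio. Plug in your own data sizes and link speeds; the point is that a link sized for replication is not necessarily sized for recovery.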

Admittedly, this situation is somewhat contrived, but I think you get my point. If budget permits, we shouldn't put all of our eggs in one basket. There are very good dedupe solutions available that are not redundant. If your solution is one of those, consider buying two of them. You can still have a single device for your DR site, but have two at your primary facility. Other vendors offer HA solutions. Don't skimp on that.

Murphy will get you.