It’s all about the primitives (cont.)

By Juan Orlandini
5/4/2011

In my previous post, I discussed two of vSphere 4.1’s "primitives." Today, I’ll dig into hardware-assisted locking. Like the other primitives, this is a specific function that array vendors can choose to implement.

Hardware-assisted locking

VMFS is a very clever distributed cluster file system. It is the underpinning of much of what you do with VMware: vMotion, virtual machine (VM) creation, snapshots, and many other whiz-bang things. To do its magic, VMFS has to coordinate and guarantee the safety of your VMs at all times. Because multiple hosts can access the same file system at the same time, it needs a way of ensuring that only a single host performs certain operations at a time. Imagine if two hosts tried to start the same VM at the same time. That'd be like crossing the streams: you'd get total protonic reversal. OK, not that bad, but you would get your VM, and possibly your entire ESX farm, into a pickle.

To make sure this doesn't happen, VMFS uses SCSI reservations. This control mechanism (a.k.a. Node Fencing) is very coarse. It works at the LUN level, and because of that, if you have multiple VMs on a single datastore, certain operations can only be done to __one__ VM at a time. The list of operations is somewhat surprising. These are the things you can only do one at a time on each block datastore:

  • vMotion migrations
  • Creating a new VM or template
  • Deploying a VM from a template
  • Powering a VM on or off (Really – try starting two VMs on the same _block_ datastore at the same time. No can do.)
  • Creating, deleting, or growing a file
  • Creating, deleting, or growing a snapshot
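
To make the coarseness concrete, here is a minimal Python sketch of a LUN-wide reservation. It is a toy model, not how ESX actually talks to an array: the class `LunReservedDatastore`, the helper `try_metadata_op`, and the host and VM names are all made up for illustration. The point is simply that one reservation covers the whole datastore, so an operation on one VM blocks an unrelated operation on another.

```python
class LunReservedDatastore:
    """Toy model (hypothetical): one reservation covers the whole LUN."""

    def __init__(self):
        self.reserved_by = None  # host currently holding the LUN reservation

    def reserve(self, host):
        # Succeeds only if nobody holds the LUN; there is no per-VM granularity.
        if self.reserved_by is None:
            self.reserved_by = host
            return True
        return False

    def release(self, host):
        # Only the holder can release the reservation.
        if self.reserved_by == host:
            self.reserved_by = None


def try_metadata_op(datastore, host, vm, op):
    """Attempt an operation that needs the reservation; return a status string."""
    if not datastore.reserve(host):
        return f"{host}: {op} {vm} -> conflict, retry later"
    return f"{host}: {op} {vm} -> running"


ds = LunReservedDatastore()
first = try_metadata_op(ds, "esx01", "vm-a", "power-on")   # takes the LUN
second = try_metadata_op(ds, "esx02", "vm-b", "snapshot")  # different VM, still blocked
ds.release("esx01")
third = try_metadata_op(ds, "esx02", "vm-b", "snapshot")   # succeeds after release

for line in (first, second, third):
    print(line)
```

Note that `esx02` is working on a completely different VM and still has to wait. That is the whole problem in a nutshell.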

And that is one of the big reasons that VMware's best practices have been to limit block datastores to 500 GB or less. You just can't put that many VMs in 500 GB. That means your chances of creating a conflict are smaller. Which is a good thing. BUT... that also means you typically end up with gobs of datastores. And that you artificially limit the size of your clusters. And that you waste space because each datastore ends up with unusable free space.

I've been saying "block datastores" a lot though, haven't I? Well, it turns out that a lot of these limitations don't apply to NFS datastores. This is one of the big reasons that people love NFS datastores. You can create really big datastores. You can have multiple operations at the same time. Arrays can handle lots of the file management magic for you (hardware-based snapshots). And there are others. I'll cover those in yet another blog (told you we have lots to cover). As good as NFS is, its problem is that it is a TCP/IP-based protocol. Yup. Now, I don't want to get into the Fibre Channel vs. Ethernet debate. Let's just leave it at this: Fibre Channel+FCP is a much more efficient stack for disk I/O than Ethernet+TCP/IP. If performance is at an absolute premium, typically it's easier to architect an FCP-based solution than a TCP/IP-based one. Notice my wording: typically. OK, Ethernet fans?

Now that we have that out of the way, enter __Hardware-assisted locking__. With this API, VMFS can lock at a much more granular level: the individual block. In practical terms, that means that each VMDK file can be locked individually. This API essentially eliminates or reduces the limits on all of the operations I described above. You can have VMFSs that are larger, host more VMs, and cluster more hosts.
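
Here is a minimal Python sketch of the finer-grained idea. Arrays generally implement this primitive with an atomic compare-and-write style operation (you will often see it called ATS, for atomic test and set). Everything in the sketch is an assumption for illustration: the class `AtsDatastore`, its method names, and the use of a VMDK file name as the lock key are mine, and the `threading.Lock` merely stands in for the atomicity the array would provide.

```python
import threading


class AtsDatastore:
    """Toy model (hypothetical): per-block locks taken with an atomic compare-and-write."""

    def __init__(self):
        self._owners = {}                # lock block id -> owning host
        self._atomic = threading.Lock()  # stands in for the array's atomicity guarantee

    def compare_and_write(self, block, expected, new):
        # Write `new` only if the block still holds `expected`, as one atomic step.
        with self._atomic:
            if self._owners.get(block) != expected:
                return False
            if new is None:
                self._owners.pop(block, None)
            else:
                self._owners[block] = new
            return True

    def lock(self, block, host):
        # Take the lock only if the block is currently unowned.
        return self.compare_and_write(block, None, host)

    def unlock(self, block, host):
        # Release the lock only if this host still owns it.
        return self.compare_and_write(block, host, None)


ds = AtsDatastore()
a = ds.lock("vm-a.vmdk", "esx01")  # esx01 locks one VM's block
b = ds.lock("vm-b.vmdk", "esx02")  # different block, so no conflict
c = ds.lock("vm-a.vmdk", "esx02")  # same block, already held by esx01
ds.unlock("vm-a.vmdk", "esx01")
d = ds.lock("vm-a.vmdk", "esx02")  # free again after the unlock

print(a, b, c, d)
```

The two hosts only collide when they want the same block. Operations on different VMs in the same datastore no longer have to take turns.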

So, what does it mean?

VAAI is a game changer for VMware. So much of what you do on a day-to-day basis with VMware is centered around the storage. With these APIs, much of that is simplified and/or sped up dramatically. If you are deploying vSphere 4.1, you owe it to yourself to make sure that your storage vendor supports VAAI. All of it.

But... because you are going to be able to do more with your storage, that means that your VMware farms, VMs, and storage needs are going to grow. Storage vendors like that. Backup vendors like it even more. And with that I leave you hanging for my next post, where I'll cover the other half of vStorage, VADP.