Taking snapshots of your virtual machines is a useful way to preserve and restore ESX configurations; however, proper management is needed to avoid performance problems. In this tip, we'll explore advanced snapshot management topics. (For snapshot basics or to review how snapshots work, see my previous tip.)
Disk space and deleting multiple snapshots
It's important to plan ahead and allow for ample disk space on your VMware virtual machine file system (VMFS) volumes for snapshot files. A good rule of thumb is to allow for disk space of at least 25% of the virtual machine's (VM's) total disk size. This amount can vary depending on the type of server, how long you keep the snapshots, and whether you plan on using multiple snapshots. If you plan on including the memory state with your snapshots, you'll also need to allow for extra disk space equal to the amount of RAM assigned to the VM.
Deleting, or committing, the snapshot on a VM that has only one snapshot requires no extra disk space. (The term committing is used because the changes saved in the snapshot's delta files are now committed to the original virtual machine disk file, or VMDK.) But if you have multiple snapshots, you will need extra disk space available when deleting all snapshots. This is because of the way they are merged back into the original disk file.
For example, say you were to delete all snapshots on a VM with three snapshots. We'll call them Snap1, Snap2 and Snap3. First, Snap3 will be merged into Snap2, which will cause Snap2 to grow. Next, Snap2 will be merged into Snap1, which will also grow. Finally, Snap1 will be merged into the original disk file, which requires no extra disk space. The snapshot files are deleted when the original disk file is updated at the very end of the operation, rather than being deleted as each is merged into another. So a VM with 20 GB of snapshot files could potentially require an additional 20 GB when committing them. If you have an ESX host that is low on disk space, this can use up all the disk space available on your datastore and prevent you from deleting your snapshots.
An alternate method of deleting multiple snapshots is to delete them one by one, starting with the snapshots farthest down the snapshot tree. This way, each snapshot is merged into the previous one, which grows, and is then deleted before the next merge begins. Although a little more tedious, this method requires far less extra disk space.
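The rule of thumb above can be turned into a quick calculation. The following is only a sketch: the function name and parameters are made up for illustration, and the 25% figure is the starting point suggested above, not a guarantee.

```python
def snapshot_space_estimate_gb(total_disk_gb, ram_gb=0.0,
                               include_memory=False, fraction=0.25):
    """Rough free space (GB) to reserve on a VMFS volume for one VM's snapshots.

    fraction is the rule-of-thumb share of the VM's total disk size (25% by
    default). If snapshots include the memory state, the RAM assigned to the
    VM is added on top.
    """
    if total_disk_gb < 0 or ram_gb < 0:
        raise ValueError("sizes must be non-negative")
    estimate = total_disk_gb * fraction
    if include_memory:
        estimate += ram_gb
    return estimate


if __name__ == "__main__":
    # A VM with 100 GB of virtual disk and 4 GB of RAM, memory state included.
    print(snapshot_space_estimate_gb(100, ram_gb=4, include_memory=True))  # 29.0
```

Raise the fraction for busy servers or for snapshots you expect to keep for a long time.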
Important: Don't run a Windows disk defragmentation while the VM has a snapshot running. Defragment operations change many disk blocks and can cause very rapid growth of snapshot files.
How long does it take to delete a snapshot?
When deleting snapshots through the VMware Infrastructure Client (VI Client), the task status bar can be misleading. Generally, the task status jumps to 95% complete fairly quickly, but it then stays at 95% until the entire commit process is completed. VirtualCenter has a 15-minute timeout for all tasks. Thus, even though your files are still committing, VirtualCenter will report that the operation has timed out.
One way to find out when the task completes is to look at the VM's directory using the Datastore Browser in the VI Client. When the delta files disappear, you know that the snapshot deletion has completed. (Starting with VirtualCenter 2.0.2, you can change the default 15-minute task timeout by manually setting the task timeout value in the vpxd.cfg file; see the VirtualCenter 2.0.2 release notes for information on how to do this.)
Snapshots that have been active for a very long time (thereby becoming extremely large) can take a very long time to commit when deleted. The amount of time the snapshot takes to commit varies depending on the VM's activity level; it will commit faster if the VM is powered off. The amount of activity your ESX host's disk subsystem is handling also affects the time the snapshot takes to commit.
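If you would rather not keep refreshing the Datastore Browser, the same check can be scripted. This is a minimal sketch, not a VMware tool. It assumes the VM's directory is reachable as an ordinary file path from wherever the script runs, and that delta files match the pattern *-delta.vmdk; both are assumptions you should verify in your environment. It is only meaningful when you are deleting all of a VM's snapshots, since any snapshot you keep will leave its delta files in place.

```python
import glob
import os
import time


def delta_files(vm_dir, pattern="*-delta.vmdk"):
    """Return the delta files currently present in the VM's directory.

    The file-name pattern is an assumption; adjust it to match your files.
    """
    return sorted(glob.glob(os.path.join(vm_dir, pattern)))


def wait_for_commit(vm_dir, poll_seconds=60, timeout_seconds=None,
                    pattern="*-delta.vmdk"):
    """Poll vm_dir until no delta files remain.

    Returns True once the delta files are gone, or False if timeout_seconds
    elapses first. With timeout_seconds=None it waits indefinitely.
    """
    start = time.monotonic()
    while True:
        if not delta_files(vm_dir, pattern):
            return True
        if (timeout_seconds is not None
                and time.monotonic() - start >= timeout_seconds):
            return False
        time.sleep(poll_seconds)
```

For example, wait_for_commit("/vmfs/volumes/datastore1/myvm", poll_seconds=120) would check every two minutes; the path shown is a hypothetical one.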
A 100 GB snapshot can take 3-6 hours to merge back into the original disk. With ESX 3.5 it can take even longer because of a change to the consolidation algorithm (see VMware's support article, Consolidation of large or deeply nested snapshots). This can affect performance of both your VMs and ESX hosts. For this reason you should limit the length of time you keep snapshots and delete them as soon as you no longer need them.
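To get a rough feel for how long a commit window to plan for, you can scale the 3-6 hours per 100 GB figure above. The sketch below assumes the time grows linearly with snapshot size, which is my simplification; as noted, VM activity, disk subsystem load and the ESX version all change the real number. The function name is made up for illustration.

```python
def estimate_commit_hours(snapshot_gb, hours_per_100gb=(3.0, 6.0)):
    """Return a (low, high) estimate in hours for committing a snapshot.

    Assumes linear scaling from the 3-6 hours per 100 GB ballpark. Treat the
    result as a planning aid, not a prediction.
    """
    if snapshot_gb < 0:
        raise ValueError("snapshot size must be non-negative")
    low, high = hours_per_100gb
    scale = snapshot_gb / 100.0
    return (scale * low, scale * high)


if __name__ == "__main__":
    print(estimate_commit_hours(50))  # (1.5, 3.0)
```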
Snapshots and metadata locks affect ESX performance
Snapshots have a negative impact on the performance of your ESX host and virtual machines in several ways. When you first create a snapshot, your VM activity will pause briefly; if you ping a VM while creating a snapshot, you will notice a few timeouts. Also, creating a snapshot causes metadata updates, which can cause SCSI reservation conflicts that briefly lock your LUN. As a result, the LUN will be available exclusively to a single ESX Server host for a brief period of time.
If you've created a snapshot of a VM and then run the VM, the snapshot is active. While a snapshot is active, the performance of the VM will be degraded because ESX writes to delta files differently and less efficiently than it does to standard VMDK files. Because there is a lock on the metadata, nothing else can be written to the delta file when a write is made to the disk. Also, the delta file grows in 16 MB increments (discussed in part one of this series), and each increment causes another metadata lock. This can affect your VMs and ESX hosts. How big an impact this has on performance depends on how busy your VM and ESX hosts are.
Finally, deleting/committing a snapshot also creates a metadata lock. In addition, deleting a snapshot can greatly reduce the performance of its VM while the delta files are being committed; this will be more noticeable if the VM is very busy. To avoid this problem, it's better to delete large or numerous snapshots during off-peak hours when the host server is less busy.
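Because every 16 MB of growth means another metadata lock, a little arithmetic shows how quickly those locks add up on a busy VM. A minimal sketch, with a function name chosen only for illustration:

```python
import math


def growth_increments(delta_size_mb, increment_mb=16):
    """Number of growth increments needed for a delta file to reach delta_size_mb.

    Each increment corresponds to one more metadata lock on the volume.
    """
    if delta_size_mb < 0 or increment_mb <= 0:
        raise ValueError("sizes must be non-negative and increment positive")
    return math.ceil(delta_size_mb / increment_mb)


if __name__ == "__main__":
    # A delta file that has grown to 1 GB (1,024 MB).
    print(growth_increments(1024))  # 64
```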
Never expand a disk file with a snapshot running
You should never expand a virtual disk while snapshots are active. With ESX 3.0.x, you can only expand disks using the vmkfstools -X command; however, this command will not warn you that a disk has snapshots when you are trying to expand it. In ESX 3.5, you can also expand virtual disks through the VI Client, which will let you attempt to expand a virtual disk that has snapshots. The VI Client will report that the task completed successfully, but in truth it will not actually expand the disk file.
If you do expand a virtual disk using vmkfstools while a snapshot is active, the VM will no longer start and you will receive an error: "Cannot open the disk ".vmdk" or one of the snapshot disks it depends on. Reason: The parent virtual disk has been modified since the child was created." Fortunately, there is a way to recover from this scenario. It is detailed in the VMworld 2007 presentation IO44: Top support issues and how to solve them – Batch 2.
Excluding virtual disks from using snapshots
If you have a VM with more than one disk and you wish to exclude a disk from snapshots, you must edit the VM's settings and change the disk mode to Independent (make sure you select Persistent). The Independent setting lets you control how each disk functions independently; it makes no difference to the disk file or its structure. Once a disk is Independent, it will not be included in any snapshots.
Additionally, you will not be able to include memory snapshots on a VM that has independent disks. This is done to protect the independent disk in case you revert to a previous snapshot whose memory state includes a running application that was writing to the independent disk. Since the independent disk is not reverted when the other disks are, this could potentially corrupt data on it.
Using snapshots to back up your VMs while they are running
Snapshots provide a great method to back up the raw VMDK files while the VM is powered on. All write operations are stopped on the original disk file, so it is safe to copy it to another storage volume. This is how backup applications like VMware Consolidated Backup and Vizioncore's vRanger function. They snapshot the VM, back up the disk file and then remove the snapshot when completed.
There are also some free user-created scripts, such as VMBK, that provide this functionality. These programs allow you to copy your VMDK files to local storage or to a network share to provide another recovery method for your important VMs.
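The snapshot, copy, remove sequence can be sketched in a few lines. This is an illustration of the workflow only, not how VMware Consolidated Backup or vRanger is implemented. The create_snapshot and remove_snapshot arguments are hypothetical callables that you would supply yourself, using whatever tooling you use to take and delete snapshots.

```python
import shutil


def backup_with_snapshot(create_snapshot, remove_snapshot, vmdk_paths, dest_dir):
    """Copy a VM's base disk files while a snapshot absorbs new writes.

    create_snapshot and remove_snapshot are hypothetical caller-supplied
    callables. The snapshot is removed even if a copy fails, so a failed
    backup does not leave a forgotten snapshot growing on the datastore.
    """
    create_snapshot()
    try:
        # shutil.copy2 copies the file and returns the destination path.
        copied = [shutil.copy2(path, dest_dir) for path in vmdk_paths]
    finally:
        remove_snapshot()
    return copied
```

Removing the snapshot in the finally clause is a deliberate choice: it keeps the snapshot's lifetime as short as possible, in line with the advice earlier in this tip.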