Hyper-V: Cannot Delete a Checkpoint Due To Catastrophic Failure

A VM restore in my environment failed, so I had to rebuild the VM and start backing it up again. Since then, I have had issues with its checkpoints and kept getting this error in my backup logs:

Catastrophic Failure to delete the checkpoint.

So I went into Hyper-V Manager and tried to delete the checkpoint manually. I got the same error:

Virtual machine failed to generate VHD tree: 'Catastrophic failure' ('0x8000FFFF')

So I found a blog post explaining how to manually export the checkpoint to a new VHD and recover the VM in its current state, so that my backups could start again. Here are the steps:

NOTE: This process will merge changes so previous checkpoints will no longer be available for rollback.

Export the last checkpoint of the VM:

Locate the most recent snapshot and select it.
Click Export from the actions menu.
Export the VM to a new location.
Shut down the original VM.
Once the export completes, you will have a new merged VHDX!

Replace the Offending VM with the Exported VM:


Click Import Virtual Machine.
The VM will have the name of the snapshot.
Power the imported VM on and validate it’s working as desired.
Power off the VM.
Once satisfied with the new VM, delete the offending VM and its disks.
Rename the newly imported VM.
Move the virtual disks back to their original locations and reconfigure the new VM to point to them.
Now your VM is updated and fixed!
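If you would rather script this than click through Hyper-V Manager, the steps above can be sketched roughly as follows. This is only a sketch under assumptions, not the method from the referenced post: the VM name, checkpoint name, and export path are hypothetical placeholders, and the helper names are made up for illustration. The script does not touch Hyper-V; it only prints PowerShell commands built from the standard Hyper-V cmdlets (Export-VMSnapshot, Stop-VM, Import-VM, Rename-VM) so you can review them first. It also assumes the export contains a .vmcx configuration file, which may not hold on older hosts.

```python
def ps_quote(value: str) -> str:
    """Quote a string as a PowerShell single-quoted literal."""
    return "'" + value.replace("'", "''") + "'"


def build_recovery_commands(vm_name: str, checkpoint_name: str, export_dir: str) -> list[str]:
    """Return the PowerShell commands for the export/import workflow, in order."""
    vm, cp, path = ps_quote(vm_name), ps_quote(checkpoint_name), ps_quote(export_dir)
    return [
        # Export the most recent checkpoint to a new location.
        f"Export-VMSnapshot -VMName {vm} -Name {cp} -Path {path}",
        # Shut down the original VM.
        f"Stop-VM -Name {vm}",
        # Import the exported configuration as a separate copy with a new ID.
        f"Get-ChildItem -Path {path} -Recurse -Filter *.vmcx | "
        "ForEach-Object { Import-VM -Path $_.FullName -Copy -GenerateNewId }",
        # The imported VM carries the checkpoint's name, so rename it.
        f"Rename-VM -Name {cp} -NewName {vm}",
    ]


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    for command in build_recovery_commands("WSUS01", "Pre-Backup", r"E:\Export"):
        print(command)
```

Validating the imported VM, deleting the offending VM, and moving the disks are deliberately left as manual steps, since those are the points where you want to look before you act.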

Luckily, I had enough disk space on my drive to export the VM, since it is my WSUS server and carries the WSUS repository disk. I probably could have just deleted the WSUS repository disk, but I did not want to chance it since the other was working. Things are back to their normal, POSITIVE state!

POSITIVE THOUGHTS AND ACTIONS STAY!
HAPPY TROUBLESHOOTING!

REFERENCES:
Hyper-V Catastrophic Failure when trying to restore a checkpoint.

Veeam Backup Validation Tool

I ran across an issue where my VM backups were reported as failing validation and not backing up properly, even though each VM showed success when I checked the logs. I was getting a specific error in the backup logs:

Backup files health check has been completed
Failed to perform backup file verification Error: Data error (cyclic redundancy check). Failed to read data from the file [B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk]. Agent failed to process method {Signature.FullRecheckBackup}.

So I did some research and found a little-known tool for manually validating Veeam backup files. It is little known mainly because it is usually run only by technical support staff. It is located in the following folder (version 9.5.0.1922):

C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe

Its main use case is to verify the consistency of a backup created with Veeam Backup & Replication. It is NOT SureBackup, which performs a different kind of check by starting the VM from the backup file and is certainly more reliable. But if you do not want to run a SureBackup job, or if you only have a Standard license that lacks SureBackup, this tool can be a good alternative. You can also use it to check a backup file after it has been moved, or after a consistency problem on the storage holding those files.

The executable accepts a set of command switches; the only one used below is /file:.

Since a specific file was showing an error, I wanted to run the command against that file to validate the backup. Here is the command that I ran:

.\Veeam.Backup.Validator.exe /file:"B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk"
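If you have several files to check, the same call can be wrapped in a small script. The sketch below is an assumption on my part, not something Veeam ships: the function names are hypothetical, the install path is the one from this post (it may differ on your server), and only the /file: switch shown above is used. The script returns the tool's raw output and makes no claim about what its exit code means.

```python
import subprocess

# Install path as given in this post; adjust for your own server.
VALIDATOR = r"C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe"


def validator_args(backup_file: str) -> list[str]:
    """Build the argument list for validating a single backup file."""
    # Passing a list lets subprocess handle the spaces in the path.
    return [VALIDATOR, f"/file:{backup_file}"]


def run_validator(backup_file: str) -> str:
    """Run the validator against one file and return everything it printed."""
    result = subprocess.run(validator_args(backup_file), capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Only show the command here; call run_validator() on the backup server itself.
    print(validator_args(r"B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk"))
```

Call run_validator() once per file from the backup server and save the returned text if you want a record of each check.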

The tool ran against the file and failed, just as the backup job's log had reported. Here is the error message:

Skipping VM ‘B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk ‘: File “LDLNET-VM01-flat.vmdk” is corrupted. Data error (cyclic redundancy check).
Failed to read data from the file [B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk ].
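When the validator output is long, it helps to pull out just the names of the files it flags. Here is a minimal sketch that assumes the output always uses the wording seen in the message above (File "name" is corrupted), which I have only confirmed for this one error. The function name is hypothetical.

```python
import re

# Matches: File "name" is corrupted (straight or curly double quotes).
CORRUPTED = re.compile(r'File\s+["\u201c]([^"\u201d]+)["\u201d]\s+is corrupted')


def corrupted_files(output: str) -> list[str]:
    """Return the file names that the validator output reports as corrupted."""
    return CORRUPTED.findall(output)


if __name__ == "__main__":
    sample = 'Skipping VM: File "LDLNET-VM01-flat.vmdk" is corrupted. Data error (cyclic redundancy check).'
    print(corrupted_files(sample))  # ['LDLNET-VM01-flat.vmdk']
```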

Looking at the backup job, I found that the file listed was the original full backup, created when I changed the job to include all the new VMs. Since that was the case and none of my VMs were in a bad state, I deleted all the backup files from my storage and started another full backup of the VMs in the job.

Using the CLI tool to manually validate the backup file was very helpful in this case. It helped me decide to clear out a backup that would not restore properly, even with the incremental backups, because the base full file was corrupt.
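To see why a corrupt full backup makes the incrementals useless, here is a small illustration. It assumes a simple forward incremental chain, where each restore point needs every earlier file in the chain. The file names and the function are hypothetical, and this is a simplified model rather than how Veeam itself tracks restore points.

```python
def restorable_points(chain: list[str], corrupt: set[str]) -> list[str]:
    """Return the restore points that are still usable.

    chain is ordered oldest first: the full backup, then its incrementals.
    """
    usable = []
    for name in chain:
        if name in corrupt:
            break  # this point and every later one depend on a damaged file
        usable.append(name)
    return usable


if __name__ == "__main__":
    chain = ["full.vbk", "inc1.vib", "inc2.vib"]
    print(restorable_points(chain, {"full.vbk"}))  # []
    print(restorable_points(chain, {"inc2.vib"}))  # ['full.vbk', 'inc1.vib']
```

With the full file damaged, nothing in the chain is usable, which is why starting over with a new full backup was the right call here.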

References:
Veeam Backup Validator: check the consistency of your backup files