Thursday, August 19, 2010

vSphere: VM Stuck during Power down at 95%

Occasionally I've run into a VM that gets stuck at 95% while powering down. The issue isn't unheard of, but I didn't hit it until I was working with a few ESXi 4.0.0 (build 208167) servers. So if you have a virtual machine that hangs while shutting down, you're certain you aren't simply being impatient, and you've already tried "Power Off" from the client but the power-off task is stuck at 95%, you may have to kill the hung VM manually.
1. Log in to the host with the hung machine via SSH (enable SSH if you haven't already).
2. Run /sbin/ restart (or service mgmt-vmware restart on ESX). This is the same as restarting the management agents from the ESXi console.
1. This restarts the agents installed in /etc/init.d/, including hostd, ntpd, sfcbd, sfcbd-watchdog, slpd and wsmand (and HA, if you have it).
2. While the services restart, the VI/vSphere Client will lose connectivity, but running VMs will not be affected.
3. After the services have restarted, you can reconnect with the VI client.
3. Via SSH, go to the VM's folder on the datastore (such as /vmfs/volumes/DatastoreName/VMname) and delete (rm) the *.vswp file (the swap file).
4. If you can't delete it and you get an error to the effect of "can not remove VM: device or resource busy", find the processes associated with the VM:
1. ps auxfww | grep "vmname"
2. kill -9 ProcessIDNumber (using the process ID shown by the previous command)
5. Next, remove the orphaned VM from inventory: right-click the "unknown" VM and select "Remove from Inventory", being careful not to delete it from disk.
6. Then delete the *.log and *.0* files. If you don't, re-adding the VM may cause the interface to hang, and you'll have to go through some of this all over again.
7. Add the VM back to the inventory, and you should be able to start the VM.
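For anyone who would rather script the file-level parts of steps 3, 4 and 6 above, here is a rough POSIX shell sketch. The helper names find_vm_pids and remove_in_dir are hypothetical (they are not VMware tools), DatastoreName/VMname are placeholders, and find_vm_pids assumes procps-style ps auxfww output with the PID in the second column, as on the ESX service console. ESXi's busybox ps may lay its output out differently, so check what it prints before trusting the PIDs. Nothing here kills a process for you; the kill -9 stays a deliberate manual step.

```shell
#!/bin/sh
# Rough sketch of steps 3, 4 and 6. Helper names are hypothetical.
# Assumption: ps output is procps-style (ps auxfww), PID in column 2.

# Step 4 helper: read ps output on stdin and print the PIDs of lines
# that mention the VM name, skipping the grep process itself.
find_vm_pids() {
    vm_name="$1"
    grep -F -- "$vm_name" | grep -v grep | awk '{ print $2 }'
}

# Steps 3 and 6 helper: delete files in directory $1 that match each
# glob passed after it. Stops if rm fails (e.g. a file is still locked).
remove_in_dir() {
    dir="$1"
    shift
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for f in "$dir"/$pattern; do
            # An unmatched glob stays literal; skip it instead of erroring.
            [ -e "$f" ] || continue
            echo "removing $f"
            rm -f -- "$f" || return 1
        done
    done
}

# Example usage on the host (placeholder paths and names):
#   remove_in_dir /vmfs/volumes/DatastoreName/VMname '*.vswp'        # step 3
#   ps auxfww | find_vm_pids VMname                                  # step 4
#   kill -9 ProcessIDNumber            # only after checking the PID above
#   remove_in_dir /vmfs/volumes/DatastoreName/VMname '*.log' '*.0*'  # step 6
```

Running ls in the VM's folder first is a cheap way to confirm that the *.0* glob only matches the files you expect before anything is deleted.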
I have run into one situation where a host reboot was the only way to solve the problem, but other than that, this approach has been quite effective.


Ryan Russell said...

Seems a little overkill. I usually just do step 4 by itself.

Nick said...

Indeed, just do step 4, especially if you already know the other stuff. If you don't, maybe this will be helpful to someone.

Ryan Russell said...

I also got to demonstrate today that it works for vMotion. I had a VM stuck at 78% during a migration for some time. After letting it sit to make sure it wasn't going anywhere on its own, I did the kill -9 on it. It turned out the VM had actually finished moving, and the stuck part was the release from the old host. The task completed instantly, and the VM was still running on the new host.

Scott said...

Had the same problem. Went through all the steps and could not delete the swap file, nor delete the processes. VMware support said we needed to reboot the ESXi host to fix the problem.

Anonymous said...

This sounds like a simple solution and I’m not entirely sure why it worked, but I figure if it helped me, maybe it can help someone else. I had tried to force a reboot of a VM through the vSphere client, and it hung at 95% forever. While researching, I came across your blog. Just as I was about to try your steps, I noticed two VMware-related processes still running on my local machine, even though I had closed the vSphere client. I killed those two processes, and the VM finished its shutdown and rebooted. I'm guessing the processes on my machine were hanging on to some files from the VM. It could be coincidence, but I thought I’d reply.

JonShado said...

The other network admin at our company and I are adding this document to our arsenal of ESX documentation. We had a single guest stuck at 95% migration. We started at step 4, grepped, and killed the process; the guest finished moving, and every other pending task for that host completed as well.