My VM Was Non-Responsive
Today I had an Azure virtual machine go down very unexpectedly.
I received error reports from users and tried the affected service endpoint myself… and sure enough, it didn’t come up. Then I tried to SSH into the VM and couldn’t.
I hopped into the Azure portal, went to the VM, and things actually looked alright… it wasn’t stopped, or deallocated, or anything.
After several minutes of digging around the Azure portal for more information, a new entry suddenly appeared in the “Activity Log”. This was rather disconcerting, as the issue had been reported over half an hour earlier and I had already been in the portal for several minutes.
The activity log said I had a “health event” which was “updated”. Upon expanding it, I could see more events that had been “in progress”. Clicking an “in progress” event gives you its JSON, where you can look into the details. In my case, the bottom of the details said this:
"title": "We're sorry, your virtual machine isn't available because an unexpected failure on the host server",
So, the physical host that was running my VM in Azure died. Azure noticed this automatically and moved the VM to a new physical host, though much more slowly than I would have liked.
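If you save the event JSON from the portal, you can pull the message out without scrolling to the bottom. This is only a rough sketch: I haven't reproduced the real layout of the event JSON here, so the sample payload below is made up (only the "title" text is the one I actually saw), and the helper just walks the whole document looking for a key by name rather than assuming where it lives.

```python
import json
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the key may also appear deeper down.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical payload. The surrounding structure and the "status" field
# are assumptions for illustration; only the title string is from the event.
sample = json.loads('''
{
  "status": "In Progress",
  "properties": {
    "title": "We're sorry, your virtual machine isn't available because an unexpected failure on the host server"
  }
}
''')

titles = list(find_values(sample, "title"))
for title in titles:
    print(title)
```

Searching by key name is deliberately loose, so it should keep working even if the real event nests the title somewhere other than where the sample puts it.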
The VM came up after a few more minutes and all was right with the world. So… the moral of the story is that if your VM is unresponsive, it may be because the host died, and you may have to wait quite a while before any information about that shows up in the activity log. But it does apparently resolve on its own, which is nice.
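For next time, the manual checks I did (hit the service, try SSH) can be roughly scripted. The sketch below only tests whether TCP ports accept connections, using Python's standard library; the function names, the port labels, and the wording of the diagnosis are my own inventions, and "nothing reachable" is a hint rather than proof that the host died.

```python
import socket
from typing import Dict


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def probe(host: str, ports: Dict[str, int], timeout: float = 3.0) -> Dict[str, bool]:
    """Check each named port and return a name -> reachable mapping."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}


def diagnose(results: Dict[str, bool]) -> str:
    """Turn probe results into a short, human-readable hint."""
    if not any(results.values()):
        return "nothing reachable: the VM or its host may be down, check the Activity Log"
    if all(results.values()):
        return "all reachable"
    failing = sorted(name for name, ok in results.items() if not ok)
    return "partially reachable, failing: " + ", ".join(failing)


# Example usage (hypothetical hostname, left commented out on purpose):
# results = probe("my-vm.example.com", {"ssh": 22, "https": 443})
# print(diagnose(results))
```

If both SSH and the service port fail while the portal still shows the VM as running, that matches what I saw, and it is probably worth waiting on the activity log before starting to restart things.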