Tenant VM Startup Failure in Azure Stack TP2 Refresh

Microsoft recently released a refresh build for Azure Stack Technical Preview 2. My colleague, CDM MVP Nirmal Thewarathanthri, and I were eager to deploy the new build from day one. This post is about a known issue in this build that prevents tenant VMs from starting automatically after a host power failure.

Scenario

After we created a tenant VM in the Azure Stack portal and verified that it was running properly, we decided to turn off the host for the day and start load testing the next day. When the host was turned on the next day, the tenant VM was missing from Hyper-V Manager and showed a failed status in the Azure Stack portal. On top of that, the VMs used by the MySQL and SQL PaaS RPs had also disappeared.

1-mas-tp2-vms-running

2-mas-tp2-vms-running-hvm

As you can see below, neither deleting the VM nor deleting its resource group works in the portal. The VM status is also shown as Unknown in the portal.

3-mas-tp2-vms-unkown-portal

4-mas-tp2-rg-deletion-failure

4-mas-tp2-vm-deletion-failure

However, the Azure Stack TP2 management VMs did start automatically after the power failure.

5-mas-tp2-vms-missing-hvm

Solution

We noticed that in Failover Cluster Manager on the MAS TP2 host, all the tenant VMs, including the PaaS RP VMs, are in a saved state after a power failure. Once we start these VMs, they come online in both Hyper-V Manager and the Azure Stack portal. We can then successfully delete the affected resource group or tenant VM.

6-mas-tp2-vms-saved-fcm
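
If you would rather not click through Failover Cluster Manager, the same manual start can be scripted. The following is only a minimal sketch, not something we ran as part of this post. It assumes Python and the FailoverClusters PowerShell module are available on the MAS TP2 host, that the saved VMs appear as clustered roles whose state is anything other than Online, and that all management VMs are already up, so the remaining non-online roles are the tenant and PaaS RP VMs. The helper names (`parse_groups`, `names_to_start`, `run_powershell`, `start_offline_vm_roles`) are hypothetical.

```python
import json
import subprocess

# PowerShell pipeline that lists clustered VM roles as JSON with a string State.
# Assumption: the FailoverClusters module is installed on the host.
PS_LIST = (
    "Get-ClusterGroup | Where-Object { $_.GroupType -eq 'VirtualMachine' } | "
    "Select-Object Name, @{n='State';e={$_.State.ToString()}} | ConvertTo-Json"
)


def parse_groups(raw):
    """Parse ConvertTo-Json output into a list of dicts.

    ConvertTo-Json emits a bare object (no list) when there is one result,
    and nothing at all when there are none.
    """
    if not raw.strip():
        return []
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def names_to_start(groups):
    """Return the names of VM roles whose cluster state is not Online."""
    return [g["Name"] for g in groups if g["State"] != "Online"]


def run_powershell(command):
    """Run a single PowerShell command and return its standard output."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def start_offline_vm_roles():
    """Start every clustered VM role that is not currently Online."""
    groups = parse_groups(run_powershell(PS_LIST))
    for name in names_to_start(groups):
        # Start-ClusterGroup brings the clustered role, and its VM, online.
        # Assumption: role names contain no single quotes.
        run_powershell(f"Start-ClusterGroup -Name '{name}'")
        print(f"started {name}")
```

Calling `start_offline_vm_roles()` from an elevated session on the host would then start whatever is still offline. It is worth printing `names_to_start(...)` first and checking the list before starting anything.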

RCA

This seems to be a known bug in the TP2 refresh: the management VMs start up automatically “after taking some time”, whereas tenant VMs, including the PaaS RP VMs, do not start automatically after a power failure. The workaround is to start them manually after all 14 management VMs are up and running.

You can refer to this link for a list of known issues in Azure Stack TP.