Tenant VM Startup Failure in Azure Stack TP2 Refresh

Microsoft has recently released a refresh build of Azure Stack Technical Preview 2. My colleague CDM MVP Nirmal Thewarathanthri and I were eager to deploy the new build from day one. This post is about a known issue in this build that prevents tenant VMs from starting automatically after a host power failure.

Scenario

After we created a tenant VM in the Azure Stack Portal and verified that it was running properly, we decided to turn off the host for the day and start load testing the next day. When the host was turned on the next day, the tenant VM was missing in Hyper-V Manager and showed a failed status in the Azure Stack Portal. Not only that, the VMs used by the PaaS RPs (MySQL & SQL) had also disappeared.

1-mas-tp2-vms-running

2-mas-tp2-vms-running-hvm

As you can see below, neither deleting the VM nor deleting its resource group works in the portal. The VM status is also set to Unknown in the portal.

3-mas-tp2-vms-unkown-portal

4-mas-tp2-rg-deletion-failure

4-mas-tp2-vm-deletion-failure

However, the Azure Stack TP2 Management VMs started automatically after the power failure.

5-mas-tp2-vms-missing-hvm

Solution

We noticed that in the Failover Cluster Manager on the MAS TP2 host, all the tenant VMs, including the PaaS RP VMs, were in a saved state after the power failure. Once we start these VMs, they come online in both Hyper-V Manager and the Azure Stack Portal. We can then successfully delete the concerned resource group or the tenant VM.

6-mas-tp2-vms-saved-fcm

RCA

This seems to be a known bug in the TP2 refresh: the Management VMs start up automatically “after taking some time”, whereas tenant VMs, including PaaS RP VMs, do not start automatically after a power failure. The workaround is to manually start them after all 14 Management VMs are up and running.
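If you'd rather not start each VM by hand in Failover Cluster Manager, a rough PowerShell sketch like the one below should resume all clustered VM roles from the host (this assumes the FailoverClusters module is available on the MAS TP2 host):

# Bring all clustered VM roles (tenant and PaaS RP VMs) back online
Get-ClusterGroup | Where-Object { $_.GroupType -eq 'VirtualMachine' -and $_.State -ne 'Online' } | Start-ClusterGroup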

You can refer to this link for a list of known issues in Azure Stack TP2.


Storage Spaces Direct | Deploying S2D in Azure

This post explores how to build a Storage Spaces Direct lab in Azure. Bear in mind that S2D in Azure is not yet a supported scenario for production workloads.

Following are the high-level steps that need to be followed in order to provision an S2D lab in Azure. For this lab, I'm using DS1 v2 VMs with Windows Server 2016 Datacenter edition for all the roles and two P20 512 GB Premium SSD disks in each storage node.

Create a VNET

In my Azure tenant I have created a VNET called s2d-vnet with a 10.0.0.0/24 address space and a single subnet as below.

1-s2d-create-vnet
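For reference, here is a minimal PowerShell sketch of the same step (the resource group name s2d-rg and the location are my assumptions; adjust them to your environment):

# Create the VNET with a single default subnet
$Subnet = New-AzureRmVirtualNetworkSubnetConfig -Name 'default' -AddressPrefix '10.0.0.0/24'
New-AzureRmVirtualNetwork -Name 's2d-vnet' -ResourceGroupName 's2d-rg' -Location 'East US' -AddressPrefix '10.0.0.0/24' -Subnet $Subnet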

Create a Domain Controller

I have deployed a domain controller called jcb-dc in a new Windows Active Directory domain, jcb.com, with the DNS role installed. Once the DNS role was installed, I changed the DNS server IP address in s2d-vnet to my domain controller's IP address. You may wonder what the second DNS IP address is. It is actually the default Azure DNS IP address, added as a redundant DNS server in case we lose connectivity to the domain controller. This provides Internet name resolution to the VMs in case the domain controller is no longer functional.

1-s2d-vnet-dns
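The same DNS change can be made with PowerShell; a hedged sketch (the DC's IP 10.0.0.4 is an assumption, while 168.63.129.16 is the Azure-provided DNS address):

$VNET = Get-AzureRmVirtualNetwork -Name 's2d-vnet' -ResourceGroupName 's2d-rg'
# Domain controller first, Azure DNS as the redundant resolver
$VNET.DhcpOptions.DnsServers = @('10.0.0.4', '168.63.129.16')
Set-AzureRmVirtualNetwork -VirtualNetwork $VNET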

Create the Cluster Nodes

Here I have deployed 3 Windows Server VMs, jcb-node01, jcb-node02 and jcb-node03, and joined them to the jcb.com domain. All 3 nodes are deployed in a single availability set.
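If you are building the nodes manually, the domain join itself is a one-liner run inside each node (it will prompt for jcb.com domain credentials):

Add-Computer -DomainName 'jcb.com' -Credential (Get-Credential) -Restart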

Configure Failover Clustering

Now we have to configure the Failover Cluster. I'm installing the Failover Clustering feature on all 3 nodes using the PowerShell snippet below.

$nodes = ("jcb-node01", "jcb-node02", "jcb-node03")

icm $nodes {Install-WindowsFeature Failover-Clustering -IncludeAllSubFeature -IncludeManagementTools}

3-s2d-install-fc

Then I'm going to create the Failover Cluster by executing the below snippet on any of the three nodes. This will create a Failover Cluster called JCB-CLU.

$nodes = ("jcb-node01", "jcb-node02", "jcb-node03")

New-Cluster -Name JCB-CLU -Node $nodes -StaticAddress 10.0.0.10

4-s2d-create-fc

Deploying S2D

When I execute the Enable-ClusterS2D cmdlet, it enables Storage Spaces Direct and starts creating a storage pool automatically as below.
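For reference, the call itself is a one-liner run on any cluster node; Enable-ClusterS2D is an alias for Enable-ClusterStorageSpacesDirect, and -Confirm:$false simply suppresses the confirmation prompt:

Enable-ClusterStorageSpacesDirect -Confirm:$false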

5-s2d-enable-1

5-s2d-enable-2

12-s2d-csv

You can see that the storage pool has been created.

7-s2d-pool-fcm

8-s2d-pool
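You can also verify the pool from PowerShell; a quick check like the one below should work (the S2D* wildcard assumes the default pool name of 'S2D on <cluster name>'):

Get-StoragePool -FriendlyName 'S2D*' | Select-Object FriendlyName, Size, AllocatedSize, HealthStatus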

Creating a Volume

Now we can create a volume in our new S2D setup.

New-Volume -StoragePoolFriendlyName S2D* -FriendlyName JCBVDisk01 -FileSystem CSVFS_REFS -Size 800GB

9-s2d-create-volume

Implementing Scale-out File Server Role

Now we can proceed with the SOFS role installation, followed by adding the SOFS cluster role.

icm $nodes {Install-WindowsFeature FS-FileServer}

Add-ClusterScaleOutFileServerRole -Name jcb-sofs

10-s2d-sofs-install

11-s2d-sofs-enable

Finally I have created an SMB share called Janaka in the newly created CSV.

13-s2d-smb-share
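A rough PowerShell equivalent of that last step would be the following (the CSV path under C:\ClusterStorage\Volume1 and the access list are my assumptions):

# Create a folder on the CSV and share it over SMB
New-Item -Path 'C:\ClusterStorage\Volume1\Shares\Janaka' -ItemType Directory
New-SmbShare -Name 'Janaka' -Path 'C:\ClusterStorage\Volume1\Shares\Janaka' -FullAccess 'jcb\Domain Admins'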

Automating S2D Deployment in Azure with ARM Templates

If you want to automate the entire deployment of the S2D lab, you can use the below ARM template by Keith Mayer, which will create a 2-node S2D cluster.

Create a Storage Spaces Direct (S2D) Scale-Out File Server (SOFS) Cluster with Windows Server 2016 on an existing VNET

This template requires you to have an active VNET and a domain controller deployed first, which you can automate using the ARM template below.

Create 2 new Windows VMs, create a new AD Forest, Domain and 2 DCs in an availability set

We will discuss how to use DISKSPD & VMFleet to perform load and stress testing in an S2D deployment in our next post.

New Security Features in Azure Backup

Recently Microsoft introduced new security capabilities to Azure Backup which allow you to secure your backups against data compromise and attacks. These features are now built into the Recovery Services vault, and you can enable and start using them within a matter of 5 minutes.

Prevention

For critical operations such as deleting backup data or changing the passphrase, Azure Backup now adds an additional authentication layer: you need to provide a Security PIN, which is available only to users with valid Azure credentials who can access the backup vault.

Alerting

You can now configure email notifications to be sent to specified users for operations that have an impact on the availability of the backup data.

Recovery

You can configure Azure Backup to retain deleted backup data for 14 days, allowing you to recover the deleted data using its recovery points. When enabled, this always maintains more than one recovery point, so that there are enough recovery points from which you can recover the deleted data.

How do I enable security features in Azure Backup?

These security features are now built into the Recovery Services vault, where you can enable all of them with a single click.

1-enable-azure-backup-security

Following are the requirements and considerations that you should be aware of when you enable these new security features.

  • The minimum MAB agent version is 2.0.9052; you should upgrade to this agent version immediately after you have enabled these features.
  • If you are using Azure Backup Server, the minimum MAB agent version is 2.0.9052 with Azure Backup Server Upgrade 1.
  • Currently these settings won’t work with Data Protection Manager and will only be enabled with future Update Rollups.
  • Currently these settings won’t work with IaaS VM backups.
  • Enabling these settings is a one-time action and is irreversible.

Testing new security features

In the video below I'm trying to change the passphrase of my Azure Backup agent and save it. Note that I have to provide a Security PIN in order to proceed; otherwise the operation fails.

Next I'm going to set up backup alerts for my Recovery Services vault. Once I create an alert subscription, I'm going to delete my previous backup schedule. I will then have the chance of restoring the data within 14 days of deletion.

Storage Spaces Direct | Architecture

In my last post I explained the basics of Storage Spaces Direct in Windows Server 2016. This post explores the internals of S2D and its architecture in a simpler context.

S2D Architecture & Design

(Image Courtesy) Microsoft TechNet

S2D is designed to provide nearly 600K read IOPS & 1 Tbps of throughput in its ultimate configuration with RDMA adapters & NVMe SSD drives. S2D is all about Software Defined Storage, so let's dissect the pieces that make up the S2D paradigm one by one.

Physical Disks – You can deploy S2D on from 2 to 16 servers with locally-attached SATA, SAS, or NVMe drives. Keep in mind that each server should have at least 2 SSDs, and at least 4 additional drives which can be SAS or SATA HDDs. These commodity SATA and SAS devices should leverage a host-bus adapter (HBA) and SAS expander.

Software Storage Bus – Think of this as the Fibre Channel or shared SAS cabling in your SAN solution. The Software Storage Bus spans the storage cluster to establish a software-defined storage fabric and allows all the servers to see the local drives in each and every host in the cluster.

Failover Cluster & Networking – For server communication, S2D leverages the native clustering feature in Windows Server and uses SMB3, including SMB Direct and SMB Multichannel, over Ethernet. Microsoft recommends using 10+ GbE network cards (e.g. Mellanox) and switches with remote direct memory access (RDMA), either iWARP or RoCE.

Storage Pool & Storage Spaces – The Storage Pool consists of the drives that form the S2D deployment, with a recommendation of one pool per cluster; it is created by automatically discovering and adding all eligible drives. Storage Spaces are your software-defined RAID built on Storage Pools. With S2D, the data can tolerate up to two simultaneous drive or server failures, along with chassis and rack fault tolerance as well.

Storage Bus Layer Cache – The duty of the Software Storage Bus is to dynamically bind the fastest drives present to slower drives (e.g. SSD to HDD), which provides server-side read/write caching to accelerate IO and boost throughput.

Resilient File System (ReFS) & Cluster Shared Volumes – ReFS is a file system built to enhance the server virtualization experience in Windows Server. With the Accelerated VHDX Operations feature, ReFS significantly improves the creation, expansion, and checkpoint merging of virtual disks. Cluster Shared Volumes consolidate all the ReFS volumes into a single namespace which you can access from any server, so it becomes shared storage.

Scale-Out File Server (SOFS) – If your S2D deployment is a Converged solution, you are required to implement SOFS, which provides remote file access over the SMB3 protocol to clients such as a Hyper-V compute cluster. In a Hyper-Converged S2D solution, both storage and compute reside in the same cluster, so there is no need to introduce SOFS.

In my next post I'm going to explore how we can deploy S2D in Azure. This will be a Converged setup, as Azure doesn't allow nested virtualization.

Storage Spaces Direct | Introduction

What is Storage Spaces Direct?

Storage Spaces Direct (S2D) is a new storage feature in Windows Server 2016 which allows you to leverage the locally attached disk drives of the servers in your datacentre to build highly available, highly scalable software-defined storage solutions. S2D helps you save on investments in expensive SAN or NAS solutions by allowing you to combine your existing NVMe, SSD or SAS drives to provide high-performing and simple storage for your datacentre workloads.

S2D Deployment Choices

There are two deployment options available with S2D.

Converged

In a Converged or disaggregated S2D architecture, Scale-Out File Server(s) (SOFS) built on top of S2D provide shared storage via SMB3 file shares. Like your traditional NAS systems, this separates the storage layer from compute, and this option is ideal for large-scale enterprise deployments such as Hyper-V VMs hosted by a service provider.

(Image Courtesy) Microsoft TechNet

Hyper Converged

With Hyper-Converged S2D deployments, both compute and storage layers reside on the same server(s); this further reduces hardware cost and is ideal for SMEs.

(Image Courtesy) Microsoft TechNet

S2D is the successor to Storage Spaces introduced in Windows Server 2012, and it is the underlying storage system for Microsoft Azure & Azure Stack. In my next post I will explain the S2D architecture and the key components of an S2D solution in much more detail.

The following video explains the core concepts of S2D, its internals and use cases.

Fix It | October 2016 Cumulative Windows Updates Crash SCOM Console

In my last post I shared the console crashing issue you face after installing the security updates in MS16-118 and MS16-126. Now Microsoft has published new KB articles to fix the console crashing issue in SCOM after applying these updates.

Individual hotfixes are available for the following list of operating systems, which you can download from here.

  • Windows Vista
  • Windows 7
  • Windows 8.1
  • Windows Server 2008
  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2

For Windows 10 and Server 2016, the fix was applied to the latest cumulative updates.

October 2016 Cumulative Windows Updates Crash SCOM Console

It seems that the October 2016 cumulative Windows updates (KB3194798, KB3192392, KB3185330 & KB3185331) cause the SCOM 2012/2016 consoles to crash regularly on all Windows versions from Windows Server 2008 R2 up to 2016 and Windows 7 up to Windows 10.

According to Microsoft Germany's SCOM PFE Dirk Brinkmann, who has blogged about this issue here, the SCOM team is working on a fix as of now, and no ETA for a resolution has been provided yet.

Once a fix is available you will be able to see it via the SCOM team blog.

Multiple vNICs in Azure ARM VMs

I was recently faced with a challenge: adding multiple vNICs to an ARM-based Azure VM. The requirement was to add a secondary vNIC while keeping the first one intact. This post explores how I achieved this task with PowerShell.

Before you do anything, make sure that the existing vNIC's private IP address has been set to static as below; otherwise, once the VM update operation is completed, you will lose that IP address. Optionally you can set the public IP address to reserved (static) as well. The VM should already have at least two NICs, as it is not supported to move from a single-NIC VM to a multi-NIC VM or vice versa.

1-vnic-private-ip

2-vnic-public-ip
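If you prefer PowerShell over the portal for this step, a sketch like the one below should do it (the existing vNIC name jcb-vnic01 is an assumption):

# Switch the existing vNIC's private IP allocation from Dynamic to Static
$NIC = Get-AzureRmNetworkInterface -Name 'jcb-vnic01' -ResourceGroupName 'vmnic-rg'
$NIC.IpConfigurations[0].PrivateIpAllocationMethod = 'Static'
Set-AzureRmNetworkInterface -NetworkInterface $NIC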

Create a new vNIC

I have defined the properties for the new vNIC as below.

$VNET = Get-AzureRmVirtualNetwork -Name 'vmnic-rg-vnet' -ResourceGroupName 'vmnic-rg'
$SubnetID = (Get-AzureRmVirtualNetworkSubnetConfig -Name 'default' -VirtualNetwork $VNET).Id
$NICName = 'jcb-vnic02'
$NICResourceGroup = 'vmnic-rg'
$Location = 'East US'
$IPAddress = '10.0.0.7'

The next step is to create the new vNIC.

New-AzureRmNetworkInterface -Name $NICName -ResourceGroupName $NICResourceGroup -Location $Location -SubnetId $SubnetID -PrivateIpAddress $IPAddress

3-vnic-create

Adding the new vNIC to an existing VM

I've executed the below PowerShell snippet, which sets the existing vNIC as primary and updates the VM once the new vNIC is in place.

$VMname = 'jcb-nicvm01'
$VMRG = 'vmnic-rg'

$VM = Get-AzureRmVM -Name $VMname -ResourceGroupName $VMRG

$NewNIC = Get-AzureRmNetworkInterface -Name $NICName -ResourceGroupName $NICResourceGroup
$VM = Add-AzureRmVMNetworkInterface -VM $VM -Id $NewNIC.Id

Then I’m listing the attached NICs in the VM and setting the first one as primary.

$VM.NetworkProfile.NetworkInterfaces

$VM.NetworkProfile.NetworkInterfaces.Item(0).Primary = $true
Update-AzureRmVM -VM $VM -ResourceGroupName $VMRG
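
As an optional sanity check after the update, you can pull the VM again and confirm that both NICs are attached and that the first one is marked as primary:

(Get-AzureRmVM -Name $VMname -ResourceGroupName $VMRG).NetworkProfile.NetworkInterfaces | Select-Object Id, Primary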


Bash on Windows 10 | Developer Mode Failure

Recently, with the Windows 10 Anniversary Update, Microsoft introduced the Linux Subsystem as an optional feature. In order to enable Bash on Windows, first you need to enable and install the Developer Mode package through Settings > Update & Security > For Developers > Developer Mode. Surprisingly, on a fresh installation of Windows 10 1607, I encountered the below error.

developer-mode-1

The trick is that even after this error message, I was able to enable the Linux subsystem through Control Panel > Programs > Turn Windows Features on or off as below.

developer-mode-2
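Alternatively, the same optional feature can be enabled from an elevated PowerShell prompt (a reboot is required afterwards):

# Enable the Windows Subsystem for Linux optional feature
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux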

The reason for this behavior is that the extra components required by the OS to enable additional debugging features in Visual Studio or the Windows Device Portal are not being installed automatically. Although you see this error message, your PC will be in Developer Mode, and you can enable Developer Mode packages like Bash without any issue. Therefore you can ignore the 0x80004005 error.

One thing to keep in mind is that the Linux Subsystem is still a beta feature and not complete, so you can expect some things to work while others fail.

Locking Resources with ARM

Sometimes you need to restrict access to an Azure subscription, resource group or resource in order to prevent accidental deletion or modification by other users. With Azure Resource Manager you can lock your resources at two levels.

  • CanNotDelete – Authorized users can read and modify a resource, but they can’t delete it.
  • ReadOnly – Authorized users can read from a resource, but they can’t delete it or perform any actions on it. The permission on the resource is restricted to the Reader role.

The ReadOnly lock can be tricky in certain situations. For example, a ReadOnly lock placed on a storage account prevents all users from listing the keys, because the list keys operation is handled through a POST request (the returned keys are available for write operations). When you apply a lock at a parent scope, all child resources inherit the same lock. For example, if you apply a lock on a resource group, all the resources in it inherit the same lock, and even resources you add later will inherit it.

Locking with PowerShell

The following snippet demonstrates how you can apply a resource lock using PowerShell.

New-AzureRmResourceLock -LockLevel <either CanNotDelete or ReadOnly> -LockName <Lock Name> -ResourceName <resource name> -ResourceType <resource type> -ResourceGroupName <resource group name>

Here you should provide the exact resource type. For a complete list of available Azure resource providers, please refer to this article.
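For example, the sketch below places a delete lock on a hypothetical storage account and a ReadOnly lock on an entire resource group (all names are placeholders):

# Prevent deletion of a single storage account
New-AzureRmResourceLock -LockLevel CanNotDelete -LockName 'LockStorage' -ResourceName 'jcbstorage01' -ResourceType 'Microsoft.Storage/storageAccounts' -ResourceGroupName 'jcb-rg'

# Make an entire resource group (and everything in it) read-only
New-AzureRmResourceLock -LockLevel ReadOnly -LockName 'RGReadOnly' -ResourceGroupName 'jcb-rg'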