Storage Spaces Direct | Architecture

In my last post I explained the basics of Storage Spaces Direct in Windows Server 2016. This post explores the internals of S2D and its architecture in simpler terms.

S2D Architecture & Design

(Image Courtesy) Microsoft Technet

S2D is designed to provide nearly 600K read IOPS and 1 Tbps of throughput in its ultimate configuration with RDMA adapters and NVMe SSD drives. S2D is all about software-defined storage, so let’s dissect the pieces that make up the S2D paradigm one by one.

Physical Disks – You can deploy S2D on 2 to 16 servers with locally attached SATA, SAS, or NVMe drives. Keep in mind that each server should have at least 2 SSDs and at least 4 additional drives, which can be SAS or SATA HDDs. These commodity SATA and SAS devices should be attached through a host-bus adapter (HBA) and SAS expander.

Software Storage Bus – Think of this as the Fibre Channel and shared SAS cabling in your SAN solution. The Software Storage Bus spans the storage cluster to establish a software-defined storage fabric, which allows every server to see all the local drives in every host in the cluster.

Failover Cluster & Networking – For server communication, S2D leverages the native clustering feature in Windows Server and uses SMB3, including SMB Direct and SMB Multichannel, over Ethernet. Microsoft recommends using 10 GbE or faster network cards (such as Mellanox) and switches with remote direct memory access (RDMA), either iWARP or RoCE.

Storage Pool & Storage Spaces – The recommendation is one pool per cluster. The Storage Pool consists of the drives that form the S2D deployment, and it is created by automatically discovering and adding all eligible drives. Storage Spaces are your software-defined RAID built on Storage Pools. With S2D, data can tolerate up to two simultaneous drive or server failures, and chassis and rack fault tolerance is available as well.

Storage Bus Layer Cache – The Software Storage Bus dynamically binds the fastest drives present to the slower drives (e.g. SSD to HDD), which provides server-side read/write caching to accelerate IO and boost throughput.

Resilient File System (ReFS) & Cluster Shared Volumes – ReFS is a file system built to enhance the server virtualization experience in Windows Server. Its Accelerated VHDX Operations feature significantly speeds up the creation, expansion, and checkpoint merging of virtual disks. Cluster Shared Volumes consolidate all the ReFS volumes into a single namespace that you can access from any server, so it becomes shared storage.

Scale-Out File Server (SOFS) – If your S2D deployment is a Converged solution, you need to implement SOFS, which provides remote file access over the SMB3 protocol to clients such as a Hyper-V compute cluster. In a Hyper Converged S2D solution both storage and compute reside in the same cluster, so there is no need to introduce SOFS.
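
To make the components above a little more concrete, here is a minimal PowerShell sketch of how the pool, the cache and a ReFS Cluster Shared Volume might be brought up. It assumes a Windows Server 2016 failover cluster already exists and that the code runs elevated on one of its nodes. The function name, the volume name and the 1 TB size are made up for illustration, so treat this as a starting point rather than a validated deployment script.

```powershell
# Sketch only: wraps the S2D steps in a function so nothing runs until called.
function Initialize-S2DSketch {
    param(
        [string] $VolumeName = 'Volume01',   # hypothetical volume name
        [uint64] $VolumeSize = 1TB           # hypothetical volume size
    )

    # Claims the eligible local drives, creates the storage pool and
    # configures the Storage Bus Layer Cache.
    Enable-ClusterStorageSpacesDirect -Confirm:$false

    # The pool that S2D creates is named "S2D on <cluster name>".
    $Pool = Get-StoragePool -FriendlyName 'S2D*'

    # Creates the virtual disk, formats it with ReFS and adds it to
    # Cluster Shared Volumes in a single step.
    New-Volume -StoragePoolFriendlyName $Pool.FriendlyName `
               -FriendlyName $VolumeName `
               -FileSystem CSVFS_ReFS `
               -Size $VolumeSize
}
```

Calling Initialize-S2DSketch on a cluster node would then run the three steps in order. In a Converged design you would still add the SOFS role and SMB3 shares on top of the resulting volume.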

In my next post I’m going to explore how we can deploy S2D in Azure. This will be a Converged setup as Azure doesn’t allow nested virtualization. 

Storage Spaces Direct | Introduction

What is Storage Spaces Direct?

Storage Spaces Direct (S2D) is a new storage feature in Windows Server 2016 which allows you to leverage the locally attached disk drives of the servers in your datacentre to build highly available, highly scalable software-defined storage solutions. S2D helps you save your investments on expensive SAN or NAS solutions by allowing you to use your existing NVMe, SSD or SAS drives combined together to provide high performing and simple storage solutions for your datacentre workloads.

S2D Deployment Choices

There are two deployment options available with S2D.

Converged

In a Converged or disaggregated S2D architecture, Scale-Out File Servers (SOFS) built on top of S2D provide shared storage on SMB3 file shares. Like a traditional NAS system, this separates the storage layer from compute, which makes the option ideal for large-scale enterprise deployments such as Hyper-V VMs hosted by a service provider.

(Image Courtesy) Microsoft TechNet

Hyper Converged

With Hyper Converged S2D deployments, both the compute and storage layers reside in the same servers. This further reduces the hardware cost and is ideal for SMEs.

(Image Courtesy) Microsoft TechNet

S2D is the successor of Storage Spaces, introduced in Windows Server 2012, and it is the underlying storage system for Microsoft Azure & Azure Stack. In my next post I will explain the S2D architecture and the key components of an S2D solution in more detail.

The following video explains the core concepts of S2D, its internals and its use cases.

Fix It | October 2016 Cumulative Windows Updates Crash SCOM Console

In my last post I shared the console crashing issue you face after installing the security updates in MS16-118 and MS16-126. Microsoft has now published new KBs to fix the console crashing issue in SCOM after applying these updates.

Individual hotfixes are available for the following operating systems, which you can download from here.

  • Windows Vista
  • Windows 7
  • Windows 8.1
  • Windows Server 2008
  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2

For Windows 10 and Server 2016, the fix is included in the latest cumulative updates.

October 2016 Cumulative Windows Updates Crash SCOM Console

It seems that the October 2016 cumulative Windows updates (KB3194798, KB3192392, KB3185330 & KB3185331) cause the SCOM 2012/2016 consoles to crash regularly on all Windows versions from Windows Server 2008 R2 up to 2016 and from Windows 7 up to Windows 10.

According to Microsoft Germany’s SCOM PFE Dirk Brinkmann, who has blogged about this issue here, the SCOM team is working on a fix as of now and no ETA for a resolution has been provided yet.

Once a fix is available you will be able to see it on the SCOM team blog.

Multiple vNICs in Azure ARM VMs

I was recently faced with the challenge of adding multiple vNICs to an ARM-based Azure VM. The requirement was to add a secondary vNIC while keeping the first one intact. This post explores how I achieved this with PowerShell.

Before you do anything, make sure that the private IP address of the existing vNIC has been set to static as below. Otherwise you will lose that IP address once the VM update operation is completed. Optionally you can set the public IP address to reserved (static) as well. The VM should already have at least two NICs, as moving from a single-NIC VM to a multi-NIC VM, or vice versa, is not supported.
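
If you prefer to set the private IP allocation from PowerShell instead of the portal, a minimal sketch could look like the one below. The function name and the NIC name in the example call are hypothetical, and the sketch assumes the AzureRM module is loaded, you are already logged in, and the address you want to keep sits on the first IP configuration of the NIC.

```powershell
# Sketch only: switches the first IP configuration of a NIC to static
# allocation so the current private IP survives the VM update.
function Set-NicPrivateIpStatic {
    param(
        [Parameter(Mandatory)] [string] $NicName,
        [Parameter(Mandatory)] [string] $ResourceGroupName
    )

    $Nic = Get-AzureRmNetworkInterface -Name $NicName -ResourceGroupName $ResourceGroupName

    # Keep the address that is currently assigned, but stop Azure from
    # handing out a different one later.
    $Nic.IpConfigurations[0].PrivateIpAllocationMethod = 'Static'

    # Push the modified NIC object back to Azure.
    Set-AzureRmNetworkInterface -NetworkInterface $Nic
}

# Example call (hypothetical NIC name):
# Set-NicPrivateIpStatic -NicName 'jcb-vnic01' -ResourceGroupName 'vmnic-rg'
```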

1-vnic-private-ip

2-vnic-public-ip

Create a new vNIC

I have defined the properties for the new vNIC as below.

$VNET = Get-AzureRmVirtualNetwork -Name 'vmnic-rg-vnet' -ResourceGroupName 'vmnic-rg'
$SubnetID = (Get-AzureRmVirtualNetworkSubnetConfig -Name 'default' -VirtualNetwork $VNET).Id
$NICName = 'jcb-vnic02'
$NICResourceGroup = 'vmnic-rg'
$Location = 'East US'
$IPAddress = '10.0.0.7'

The next step is to create the new vNIC.

New-AzureRmNetworkInterface -Name $NICName -ResourceGroupName $NICResourceGroup -Location $Location -SubnetId $SubnetID -PrivateIpAddress $IPAddress

3-vnic-create

Adding the new vNIC to an existing VM

I executed the PowerShell snippet below, which attaches the new vNIC to the existing VM object.

$VMname = 'jcb-nicvm01'
$VMRG = 'vmnic-rg'

$VM = Get-AzureRmVM -Name $VMname -ResourceGroupName $VMRG

$NewNIC = Get-AzureRmNetworkInterface -Name $NICName -ResourceGroupName $NICResourceGroup
$VM = Add-AzureRmVMNetworkInterface -VM $VM -Id $NewNIC.Id

Then I list the NICs attached to the VM, set the first one as primary and update the VM.

$VM.NetworkProfile.NetworkInterfaces

$VM.NetworkProfile.NetworkInterfaces.Item(0).Primary = $true
Update-AzureRmVM -VM $VM -ResourceGroupName $VMRG

 

Bash on Windows 10 | Developer Mode Failure

With the Windows 10 Anniversary Update, Microsoft introduced the Linux Subsystem as an optional feature. To enable Bash on Windows, you first need to enable and install the Developer Mode package through Settings > Update & Security > For Developers > Developer Mode. Surprisingly, on a fresh installation of Windows 10 1607 I encountered the error below.

developer-mode-1

The trick is that even after this error message I was able to enable the Linux Subsystem through Control Panel > Programs > Turn Windows features on or off, as below.

developer-mode-2

The reason for this behavior is that the extra components the OS needs to enable additional debugging features in Visual Studio or the Windows Device Portal are not installed automatically. Although you see this error message, your PC will be in developer mode and you can enable features that depend on the Developer Mode package, such as Bash, without any issue. Therefore you can ignore the 0x80004005 error.

One thing to keep in mind is that the Linux Subsystem is still a beta feature and not complete, so you can expect some things to work and others to fail.

Locking Resources with ARM

Sometimes you need to restrict access to an Azure subscription, resource group or resource in order to prevent other users from accidentally deleting or modifying it. With Azure Resource Manager you can lock your resources at two levels.

  • CanNotDelete: Authorized users can read and modify a resource, but they can’t delete it.
  • ReadOnly: Authorized users can read from a resource, but they can’t delete it or perform any actions on it. The permission on the resource is restricted to the Reader role.

The ReadOnly lock can be tricky in certain situations. For example, a ReadOnly lock placed on a storage account prevents all users from listing the keys, because the list keys operation is handled through a POST request, since the returned keys are available for write operations. When you apply a lock at a parent scope, all child resources inherit the same lock. For example, if you apply a lock to a resource group, all the resources in it inherit the lock, and so do resources you add later.

Locking with PowerShell

The following snippet demonstrates how you can apply a resource lock using PowerShell.

New-AzureRmResourceLock -LockLevel <either CanNotDelete or ReadOnly> -LockName <Lock Name> -ResourceName <resource name> -ResourceType <resource type> -ResourceGroupName <resource group name>

Here you should provide the exact resource type. For a complete list of available Azure resource providers, please refer to this article.
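
As a filled-in illustration of the template above, the sketch below locks a storage account against deletion. The lock name, storage account name and resource group name are placeholders I made up; only the resource type string is a real provider type. The parameters are splatted from a hashtable, and the call is guarded so the snippet does nothing on a machine where the AzureRM cmdlets are not available.

```powershell
# Hypothetical values: replace the names with your own resources.
$LockParams = @{
    LockLevel         = 'CanNotDelete'
    LockName          = 'DoNotDeleteStorage'                 # hypothetical lock name
    ResourceName      = 'mystorageaccount'                   # hypothetical resource
    ResourceType      = 'Microsoft.Storage/storageAccounts'  # exact resource type
    ResourceGroupName = 'my-resource-group'                  # hypothetical group
}

# Only talk to Azure when the AzureRM cmdlets are available in this session.
if (Get-Command New-AzureRmResourceLock -ErrorAction SilentlyContinue) {
    # -Force suppresses the confirmation prompt.
    New-AzureRmResourceLock @LockParams -Force

    # List the locks in the resource group to confirm the result.
    Get-AzureRmResourceLock -ResourceGroupName $LockParams.ResourceGroupName
}
```

Swapping LockLevel to ReadOnly applies the stricter lock, with the storage key caveat described earlier.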

Azure Resource Policies | Part 2

In my last post we discussed what Azure Resource Policies are and how they can help you better manage your Azure deployments. Now it’s time to understand how to implement and use resource policies in Azure in practice.

Control the virtual machine SKU in a resource group

In this example we deny the creation of VMs with any SKU other than Standard A1 in a resource group by applying a custom resource policy.

First we create a resource group to apply this policy.

$ResourceGroup = New-AzureRmResourceGroup -Name protectedrg -Location "Southeast Asia"

Next we are going to define our security policy. This policy allows only Standard A1 VMs to be created when applied to a resource group.

$PolicyDefinition = New-AzureRmPolicyDefinition -Name vmLockPolicy -DisplayName vmLockPolicy -Description "Allow only Standard_A1 virtual machines" -Policy '{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Compute/virtualMachines"
      },
      {
        "not": {
          "field": "Microsoft.Compute/virtualMachines/sku.name",
          "in": [ "Standard_A1" ]
        }
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}'

Next we are going to assign the policy to our newly created resource group. First we retrieve the subscription and the name of the resource group to which the custom policy will be assigned.

$Subscription = Get-AzureRmSubscription -SubscriptionName "MVP Personal"
$ResourceGroupName = $ResourceGroup.ResourceGroupName

Now we can assign the policy accordingly.

$AssignPolicy = New-AzureRmPolicyAssignment -Name vmLockPolicyAssignment -PolicyDefinition $PolicyDefinition -Scope "/subscriptions/$($Subscription.SubscriptionId)/resourceGroups/$ResourceGroupName"

Let’s try to create a virtual machine of DS1 v2 sku in the protectedrg resource group as below.

arm-policy-vm-creation-1

However, the deployment activity fails as per our custom resource policy.

arm-policy-vm-creation-2
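
To see why the DS1 v2 request is rejected, it can help to walk through the rule by hand. The sketch below mimics the if/then logic of the policy in plain PowerShell. It is only an illustration with a made-up function name; Azure evaluates policies on the service side, and this is not how its engine is implemented.

```powershell
# Illustration only: reproduces the "allOf" condition of the policy locally.
function Get-PolicyEffectSketch {
    param(
        [Parameter(Mandatory)] [string] $ResourceType,
        [string] $SkuName,
        [string[]] $AllowedSkus = @('Standard_A1')
    )

    $IsVm       = $ResourceType -eq 'Microsoft.Compute/virtualMachines'
    $SkuAllowed = $SkuName -in $AllowedSkus

    # allOf: the resource is a VM AND its SKU is NOT in the allowed list.
    if ($IsVm -and -not $SkuAllowed) { 'deny' } else { 'allow' }
}

# A DS1 v2 VM matches both conditions, so the request is denied.
Get-PolicyEffectSketch -ResourceType 'Microsoft.Compute/virtualMachines' -SkuName 'Standard_DS1_v2'   # deny

# A Standard A1 VM fails the "not in" condition, so it is allowed.
Get-PolicyEffectSketch -ResourceType 'Microsoft.Compute/virtualMachines' -SkuName 'Standard_A1'       # allow

# Other resource types never match the first condition.
Get-PolicyEffectSketch -ResourceType 'Microsoft.Storage/storageAccounts'                              # allow
```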

In the next post let’s discuss resource locking in Azure.

SQL RP Installation Failure in Azure Stack TP1 | Fix It

My good friend CDM MVP Nirmal Thewarathanthri and I have been experimenting with Azure Stack for a while now. Although we tried more than 30 times to install the SQL Resource Provider in our Azure Stack lab, it never quite succeeded. The biggest problem was cleaning up the Azure Stack environment after every failure, as sometimes we had to do a fresh install from scratch.

The Epic Failure

These were the symptoms of the issue.

  • The SQL VM installs just fine.
  • Deployment always fails at DSC configuration in the SQL VM.
  • The URL of the ARM template for SQL VM seems no longer valid as you can see.

Here is the full description of the error that we encountered.

VERBOSE: 8:54:27 AM – Resource Microsoft.Compute/virtualMachines/extensions ‘sqlrp/InstallSqlServer’ provisioning
status is running
New-AzureRmResourceGroupDeployment : 10:29:12 AM – Resource Microsoft.Compute/virtualMachines/extensions
‘sqlrp/InstallSqlServer’ failed with message ‘The resource operation completed with terminal provisioning state
‘Failed’.’
At D:\SQLRP\AzureStack.SqlRP.Deployment.5.11.61.0\Content\Deployment\SqlRPTemplateDeployment.ps1:207 char:5
+     New-AzureRmResourceGroupDeployment -Name “newSqlRPTemplateDeploym …
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureResourceGroupDeploymentCommand

New-AzureRmResourceGroupDeployment : 10:29:12 AM – An internal execution error occurred.
At D:\SQLRP\AzureStack.SqlRP.Deployment.5.11.61.0\Content\Deployment\SqlRPTemplateDeployment.ps1:207 char:5
+     New-AzureRmResourceGroupDeployment -Name “newSqlRPTemplateDeploym …
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureResourceGroupDeploymentCommand

The issue in this case was the unstable Internet connection we had. The ARM template for SQL RP downloads the SQL 2014 ISO first. In our case a timeout during the download stopped the entire process, so although the VM was created, SQL Server 2014 wasn’t installed in it.

To solve this issue we followed the procedure below on a fresh installation of MAS TP1. You can try this on an existing installation with a failed SQL RP deployment, but there’s no guarantee that you will be able to clean up the existing resource group. If you have run the SQL RP installation only once, the cleanup may work, but if you have tried it multiple times there’s a high chance that cleaning up the existing resource groups will fail.

  1. Download the SQL image from here.
  2. Open the default PIR image. This is available on the MAS TP1 host at \\sofs\Share\CRP\PlatformImage\WindowsServer2012R2DatacenterEval\WindowsServer2012R2DatacenterEval.vhd
  3. Once you mount the VHD (simply double-click to mount), create a new folder called SQL2014 on the PIR image under the C:\ drive.
  4. Copy all files from the downloaded ISO into the SQL2014 folder.
  5. Start the deployment script. If you are trying this on an existing failed deployment, re-run the deployment after cleaning up the existing resource groups for SQL RP.

Once all the deployment tasks are completed you can see a successfully deployed SQL Resource Provider in the portal as below.

SQL RP Success (1)

For more information, you can refer to the MSFT guide on how to add a SQL resource provider to a MAS TP1 deployment here.

Azure Resource Policies | Part 1

Any data center should adhere to certain organizational compliance policies, whether it is on-premises or in the cloud. If your organization is using Microsoft Azure and wants its resources to adhere to the conventions and standards that govern its data center policy, how would you do that? For example, you may want to prevent person A from creating VMs larger than Standard A2. The answer is to leverage custom resource policies and assign them at the desired level, be it a subscription, a resource group or an individual resource.

Is it same as RBAC?

No, it isn’t. Role Based Access Control in Azure is about the actions a user or a group can perform, while policies are about actions that can be applied at a resource level. As an example, RBAC sets different access levels in different scopes, while policies can control what types of resources can be provisioned or in which locations those resources can be provisioned in a resource group or subscription. The two work together, because in order to use a policy a user should be authenticated through RBAC.

Why do we need custom policies?

Imagine that you need to calculate chargeback for your Azure resources by team or department. Certain departments will need to have limited consumption imposed, and you need to charge the proper business unit at the end. Your organization may also want to restrict which resources are provisioned in Azure or where they are provisioned. For example, you may want to impose a policy that allows users to create only Standard A2 VMs, and only in the West Europe region. Another good example is restricting the creation of load balancers in Azure for all teams except the network team.

Policy Structure

Like all ARM artifacts, policies are written in JSON and contain a control structure. You specify a condition and what to perform when that condition is met, much like an IF THEN ELSE statement. There are two key components in a custom Azure resource policy.

Condition/Logical operators which contains a set of conditions which can be manipulated through a set of logical operators.

Effect which describes the action that will be performed when the condition is satisfied: either deny, append or audit. If you create an audit effect it will generate a warning event in the event service log. As an example, your policy can trigger an audit if someone creates a VM larger than Standard A2.

  • Deny generates an event in the audit log and fails the request
  • Audit generates an event in the audit log but does not fail the request
  • Append adds the defined set of fields to the request

The following is the simple syntax for creating an Azure Resource policy.

{
  "if" : {
    <condition> | <logical operator>
  },
  "then" : {
    "effect" : "deny | audit | append"
  }
}
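
As a concrete instance of this structure, the sketch below builds a rule that denies any resource created outside West Europe and converts it to JSON with PowerShell. The variable names are my own, and westeurope is just one example location. Building the rule as a hashtable is a convenience I am assuming here so that quoting mistakes in hand-written JSON are avoided; it is not the only way to author a policy.

```powershell
# Sketch: an "if/then" policy rule expressed as nested ordered hashtables.
$PolicyRule = [ordered]@{
    'if'   = [ordered]@{
        # Condition: the resource location is NOT in the allowed list.
        'not' = [ordered]@{
            'field' = 'location'
            'in'    = @('westeurope')
        }
    }
    'then' = [ordered]@{
        # Effect applied when the condition is met.
        'effect' = 'deny'
    }
}

# -Depth must cover the nesting level, otherwise inner objects get flattened.
$PolicyJson = $PolicyRule | ConvertTo-Json -Depth 10
$PolicyJson
```

The resulting JSON string has the same shape as the syntax shown above and could be supplied as the policy rule when creating a policy definition.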

Evaluating policies

A policy is evaluated at the time of resource creation or when a template deployment happens using an HTTP PUT request. If you are deploying a template, the policy is evaluated during the creation of each resource in the template.

In the next post let’s discuss some practical use cases of using Azure resource policies to regulate your resources.