
Storage Spaces Direct | Architecture

In my last post I explained the basics of Storage Spaces Direct in Windows Server 2016. This post explores the internals of S2D and its architecture in simple terms.

S2D Architecture & Design

(Image courtesy of Microsoft TechNet)

S2D is designed to provide nearly 600K IOPS (read) & 1 Tbps of throughput in its ultimate configuration with RDMA adapters & NVMe SSD drives. S2D is all about Software Defined Storage, so let’s dissect the pieces that make up the S2D paradigm one by one.

Physical Disks – You can deploy S2D on from 2 to 16 servers with locally-attached SATA, SAS, or NVMe drives. Keep in mind that each server should have at least 2 SSDs, plus at least 4 additional drives which can be SAS or SATA HDD. These commodity SATA and SAS devices should be attached through a host-bus adapter (HBA) and SAS expander.
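
Before enabling S2D, a quick sanity check along these lines (output will of course vary per environment) lists the drives on each node that are eligible for pooling:

# List local drives that S2D can claim (CanPool = True), grouped by media type
Get-PhysicalDisk -CanPool $true |
    Sort-Object MediaType |
    Format-Table FriendlyName, MediaType, BusType, CanPool, Size -AutoSize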

Software Storage Bus – Think of this as the Fibre Channel or shared SAS cabling in your SAN solution. The Software Storage Bus spans the storage cluster to establish a software-defined storage fabric, so that every server can see all the local drives in each and every host in the cluster.
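
There is nothing to cable or configure separately; the bus is established when S2D is enabled on the cluster. A minimal sketch, assuming the failover cluster has already been created and validated:

# Enabling S2D also brings up the Software Storage Bus across all cluster nodes
Enable-ClusterStorageSpacesDirect -Confirm:$false -Verbose

# Summary health and capacity report for the clustered storage subsystem
Get-StorageSubSystem Cluster* | Get-StorageHealthReport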

Failover Cluster & Networking – For server communication, S2D leverages the native clustering feature in Windows Server ans uses SMB3, including SMB Direct and SMB Multichannel, over Ethernet. Microsoft recommends to use 10+ GbE (Mellanox) network cards and switches with remote-direct memory access (RDMA), either iWARP or RoCE.

Storage Pool & Storage Spaces – With the recommendation of one pool per cluster Storage Pools consists of the drives that forms the S2D and it is created by discovering and adding all eligible drives automatically to the Storage Pool. Storage Spaces are your software-defined RAID based on Storage Pools. With S2D the data can have tolerance up to two simultaneous drive or server failures along with chassis and rack fault tolerance as well.

Storage Bus Layer Cache – The duty of the Software Storage Bus cache is to dynamically bind the fastest drives present to slower drives (e.g. SSD to HDD), which provides server-side read/write caching to accelerate I/O and boost throughput.
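
The cache is configured automatically based on the drive mix in each server; a quick way to inspect the behaviour S2D chose (a sketch, assuming the FailoverClusters PowerShell module is available) is:

# Show the cache state and the caching mode chosen for SSD and HDD capacity drives
Get-ClusterStorageSpacesDirect | Format-List CacheState, CacheModeSSD, CacheModeHDD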

Resilient File System (ReFS) & Cluster Shared Volumes – ReFS is a file system that has been built to enhance server virtualization experience in Windows Server. With Acclerated VHDX Operations feature in ReFS it improves the creation, expansion, and checkpoint merging in Virtual Disks significantly. Cluster Shared Volumes consolidate all the ReFS volumes into a single namespace which you can access from any server so it becomes shared storage.

Scale-Out File Server (SOFS) – If your S2D deployment is a converged solution, you are required to implement SOFS, which provides remote file access over the SMB3 protocol to clients such as a Hyper-V compute cluster. In a hyper-converged S2D solution both storage and compute reside in the same cluster, so there is no need to introduce SOFS.
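
For the converged case, layering SOFS on top is roughly as follows; the role name, folder, share name, and computer group below are all hypothetical:

# Add the Scale-Out File Server role and publish a CSV folder over SMB3
Add-ClusterScaleOutFileServerRole -Name SOFS01
New-Item -Path C:\ClusterStorage\Volume01\VMs -ItemType Directory
New-SmbShare -Name VMStore01 -Path C:\ClusterStorage\Volume01\VMs -FullAccess "DOMAIN\Hyper-V-Hosts"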

In my next post I’m going to explore how we can deploy S2D in Azure. This will be a Converged setup as Azure doesn’t allow nested virtualization. 

CSV Access Redirected in Hyper-V Cluster

I’ve been working with Hyper-V for quite some time. During a recent Hyper-V cluster deployment that my colleague Hasitha Willarachchi (Enterprise Client Management MVP) and I were working on, we came across an issue that was really interesting to troubleshoot.

For some odd reason, one of the three Cluster Disks in a 3-node Hyper-V 2012 R2 cluster was in Redirected Access status.

CSV GFI Filter 1

When we were going through the cluster events, we noticed a bunch of 5125 events complaining about an active system filter driver that is not compatible with CSV. Basically, I/O access to that volume had been redirected through another Hyper-V node.
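
These events can also be pulled from PowerShell; a sketch along these lines, assuming the cluster events are surfaced in the System log (which is where event 5125 normally lands):

# Pull recent 5125 events (CSV redirected due to an incompatible filter driver)
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5125
} -MaxEvents 10 | Format-List TimeCreated, Message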

CSV GFI Filter 2

We tried changing the ownership of the particular CSV to another node, followed by trying to turn off the Redirected Access mode by right-clicking the CSV and selecting that option. Changing the ownership was no help, and to our surprise the operation to turn off redirected access always failed with a Set Operation Failed error.
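
For reference, the ownership change can also be attempted from PowerShell; a sketch, with the CSV and node names as placeholders:

# Move ownership of the affected CSV to another node
Move-ClusterSharedVolume -Name "Cluster Disk 2" -Node HV-NODE2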

After doing some research, we decided to check the CSV state and the active file system filters on that particular volume, so we ran the commands below on the node currently owning the CSV.
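
The screenshot that follows shows the actual output; the commands were roughly along these lines (the CSV name is a placeholder):

# List the filter driver instances attached to each volume; a Frame of "Legacy"
# indicates an old-style filter that is incompatible with CSV direct I/O
fltmc instances

# Show the CSV state and the reason for redirection from each node's point of view
Get-ClusterSharedVolumeState -Name "Cluster Disk 2"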

CSV GFI Filter 3

We noticed that a filter called esecdrv60 had a Frame value of Legacy. The next command confirmed that on all three nodes the CSV access was redirected. We then immediately checked the rest of the nodes with the fltmc instances command and found that the same legacy filter was present there as well.

The Culprit aka GFI EndPoint Security

The esecdrv60 filter actually belongs to the GFI EndPoint Security software, which was installed and running on all three Hyper-V nodes. The software had been pushed through its default policies, and somehow the Hyper-V cluster was not excluded from the deployment list.

CSV GFI Filter 4

Uninstalling GFI was not possible locally, so we worked with the GFI administrator to uninstall the software from all three hosts. Remember that uninstalling GFI requires a reboot, so we had to live migrate all the VMs off each host and reboot one server at a time.
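
The drain-and-reboot cycle per node looks roughly like this (the node name is a placeholder):

# Drain the node (live migrates its VMs), reboot it, then let it rejoin the cluster
Suspend-ClusterNode -Name HV-NODE1 -Drain -Wait
Restart-Computer -ComputerName HV-NODE1 -Wait -For PowerShell -Force
Resume-ClusterNode -Name HV-NODE1 -Failback Immediate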

After uninstalling GFI and rebooting all three hosts, we executed fltmc instances again to see whether the GFI legacy filters were still present. As you can see below, all legacy filters were gone and the CSV was back to normal operation mode without any errors.

CSV GFI Filter 5

The following references were really helpful in identifying and rectifying the issue.

  1. Troubleshooting ‘Redirected Access’ on a Cluster Shared Volume (CSV)
  2. Cluster Shared Volume Diagnostics