Tag Archives: Azure Site Recovery

Process Server fails to communicate after ASR Update Rollup 22

Recently I have been working on an ASR implementation for a customer where we were required to upgrade the MARS version from 9.8 to 9.13. Microsoft has set a deadline of 28th February for this update, because after that date enabling replication with an older version of MARS will no longer work. Here is what you see when you are prompted to perform this upgrade.

Since we were running an incompatible version of MARS (Microsoft states that you need to be on at least an n-4 version of DRA to perform this update successfully), we had to perform a step upgrade from 9.8 to 9.10 and then to 9.13. This link provides a reference for that.
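The n-4 rule can be expressed as a quick check before planning an upgrade. This is only a sketch in Python: the release list is an assumption (consecutive 9.x versions between 9.8 and 9.13, which may not match the real ASR release history), and `can_upgrade_directly` is a hypothetical helper, not part of any ASR tooling.

```python
# Assumed release list, oldest to newest; the real ASR release history may differ.
RELEASES = ["9.8", "9.9", "9.10", "9.11", "9.12", "9.13"]


def can_upgrade_directly(current, target, releases=RELEASES, max_gap=4):
    """Return True if `current` is at most `max_gap` releases behind `target`."""
    gap = releases.index(target) - releases.index(current)
    return 0 < gap <= max_gap


# Under the assumed list, 9.8 is five releases behind 9.13, so a step upgrade is needed.
for start in ("9.8", "9.10"):
    print(start, "->", "9.13", can_upgrade_directly(start, "9.13"))
```

With this assumed list, both hops we actually performed (9.8 to 9.10 and 9.10 to 9.13) fall within the allowed gap, while the direct jump does not.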

The Issue

Both process servers in the environment were successfully upgraded to 9.13 as a step upgrade. But as soon as the latest version was installed, both the config server and the secondary process server lost communication with the ASR vault, and the refresh server connection task kept failing for no apparent reason.

We raised this with Microsoft support, and the support engineer assigned to our case found one alarming entry in the ASR operational log in Windows Event Viewer on the config server.

 

System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.IdentityModel, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.  File name: 'Microsoft.IdentityModel, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'    

at SrsRestApiClientLib.SrsCreds.InitializeServiceUrl()    

at SrsRestApiClientLib.SrsCreds..ctor(String resourceId, String siteId, String draId, X509Certificate2 cert, AcsConfiguration acsConfig, Boolean retryFetch, String apiVersion, Proxy webProxy, String msiVersion, String vmmVersion)    

at SrsRestApiClientLib.ClientHelper.InitializeDra(String resourceId, String siteId, String draId, X509Certificate2 cert, AcsConfiguration acsConfig, String apiVersion, Proxy webProxy, String msiVersion, String vmmVersion)    

at Dra.SrsCommunication.SrsCommunicationClient.InitializeSrsCommunication()    

at Dra.SrsCommunication.SrsCommunicationClient..ctor(IDraFabricAdapter fabricAdapter)    

at Dra.Dra.Initialize(IDraFabricAdapter fabricAdapter)    

at Dra.DraFactory.CreateInstance(IDraFabricAdapter fabricAdapter)    

at Microsoft.VirtualManager.Engine.DRA.Core.Core.Initialize()   

WRN: Assembly binding logging is turned OFF.  To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1.  Note: There is some performance penalty associated with assembly bind failure logging.  To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].  ,{00000000-0000-0000-0000-000000000000}



======================

The Culprit

As you may have already noticed, there is an exception logged stating that the DRA cannot find ‘Microsoft.IdentityModel, Version=3.5.0.0’, which is part of the Windows Identity Foundation 3.5 role in Windows Server. When we checked the installed roles on the config server this role wasn’t present, so we enabled it again to see whether it would make any difference. Voilà! The process servers reestablished communication with the vault soon after the role was installed.
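If you need to scan exported event log text for this kind of failure, the missing assembly can be pulled out of the message with a few lines of Python. This is a minimal sketch that assumes the log entry is available as plain text; `missing_assembly` is a hypothetical helper written for illustration, and the sample string is taken from the exception above.

```python
import re

# Matches the assembly display name inside a "Could not load file or assembly" message.
ASSEMBLY_RE = re.compile(
    r"Could not load file or assembly '(?P<name>[^,']+), Version=(?P<version>[\d.]+)"
)


def missing_assembly(log_text):
    """Return (name, version) of the first assembly that failed to load, or None."""
    match = ASSEMBLY_RE.search(log_text)
    if match is None:
        return None
    return match.group("name"), match.group("version")


sample = (
    "System.IO.FileNotFoundException: Could not load file or assembly "
    "'Microsoft.IdentityModel, Version=3.5.0.0, Culture=neutral, "
    "PublicKeyToken=31bf3856ad364e35' or one of its dependencies."
)
print(missing_assembly(sample))
```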

Aftermath

The ASR engineering team has confirmed that the DRA has a prerequisite check for whether the config server has Microsoft .NET Framework 4.5 installed. Furthermore, with .NET 4.5 the WIF role is fully integrated into the .NET Framework, meaning that it is installed automatically alongside .NET 4.5 and should have been present in this case. They tried to reproduce the issue and verified that no issues are observed in the upgrade from 9.10 to 9.13, as there is no option to disable WIF during the .NET Framework 4.5 installation (DRA runs on a minimum of .NET 4.5, and the latest requirement supports .NET Framework 4.6.2).

My best bet is that this happened when we performed the upgrade from 9.8 to 9.10, as that path has not been reproduced yet. The config server hasn’t had any major change in its configuration, so the only suspect is the 9.8 to 9.10 upgrade. I’m exploring that possibility right now and will publish another post if my theory is confirmed. Until then, it remains a mystery.

Protecting Azure IaaS VMs with Managed Disks with Site Recovery

The Microsoft Azure Site Recovery team has just announced the capability to protect Azure IaaS VMs with managed disks using ASR, as a public preview in all ASR-enabled regions. This is an important step, as a lot of customers were expecting to protect their VM workloads with managed disks, compared to the previous unmanaged disk scenario.

A2A Managed Disk Architecture

The challenge here was how to replicate a disk without requiring a VM object. To overcome this hurdle, ASR uses one storage account in the source region, which caches the disk data on its way to the target region. This enables ASR to create a replica managed disk in the target region for each VM protected in the primary region, and this replica disk is the data store for the source disk in the primary region. One important thing to note is that the initial replication between the source and target happens externally using a snapshot at the VM level, and so do the delta syncs.

Things to Remember

  • VM protection can be enabled via VM settings blade or in the recovery services vault settings.
  • If you have VMs with unmanaged disks that are currently protected by ASR, you need to disable protection and convert the VMs to managed disks first as conversion from unmanaged to managed disks while protected by ASR is not supported.
  • There is an option for you to select the type of the replica disks, either standard or premium. See the screenshot below.
  • Only one cache storage account is needed to store the data changes replicated from the source to the target region, but you can leverage multiple cache accounts per VM as well.

Azure Site Recovery | New Onboarding Experience for VMWare to Azure Workloads

Microsoft has introduced a new onboarding experience with the latest update to the Azure Site Recovery service for VMware to Azure workloads. In this blog post we are going to explore how this new experience saves time when you set up your ASR infrastructure on-premises.

Open Virtualization Format (OVF) template-based configuration server deployment

Previously you had to download the ASR configuration server package and install it into a VMware VM, running a supported OS, which you had created earlier. With this update you use an OVF template, which you can import directly into a VMware host as a guest VM that functions as the configuration server for your ASR setup. All the necessary software, except MySQL Server 5.7.20 and VMware PowerCLI 6.0, is pre-installed in this VM template.

The video below from the Microsoft ASR product team explains how you can leverage the new OVF template setup.

Web Portal for Configuration Management

With this update, Microsoft has introduced a new web portal in the configuration server. All configuration servers deployed using the OVF template will use this portal to modify the following settings.

New Mobility Service Deployment model

If you are familiar with the VMware to Azure scenario in ASR, then you know the difficulties of the mobility service push deployment method for your VMware VMs. Previously this required you to open firewall rules for the WMI and File and Printer Sharing services in Windows on the protected VMs, because the ASR service used those services to push-install the mobility service on the protected VMs. Not every organization allows this firewall exception in production environments.

In the latest ASR release, VMware Tools is used to install or update the mobility service on all protected VMware VMs, removing the need to open the above-mentioned services in firewall rules. One thing to keep in mind is that VMware Tools based mobility service installation is available only if you update your configuration servers to version 9.13.xxxx.x.

Azure Site Recovery Comprehensive Monitoring

One of the challenges I have had with Azure Site Recovery is providing a Business Continuity Dashboard experience for the IT administrators in my customer organizations. Previously I was able to achieve some of this by creating Azure Dashboards that showcase the components of the ASR environment. However, Microsoft has recently introduced an out-of-the-box comprehensive monitoring dashboard experience for Azure Site Recovery. It gives organizations full visibility into whether their business continuity objectives are being met, and includes a failover readiness model that monitors resource availability and suggests configurations based on best practices.

What’s new in Azure Site Recovery Monitoring

Below is the new dashboard you see when you navigate to the Overview section of your recovery services vault in Azure.

  • Enhanced vault overview page – The new vault overview page features a dashboard that presents everything you need to know to understand if your business continuity objectives are being met. In addition to the information needed to understand the current health of your business continuity plan, the dashboard features recommendations based on best practices, and in-built tooling for troubleshooting issues that you may be facing.
  • Replication health model – Continuous, real time monitoring of replication health of servers based on an assessment of a wide range of replication parameters.
  • Failover readiness model – A failover readiness model based on a comprehensive checklist of configuration and disaster recovery best practices, and resource availability monitoring, to help gauge your level of disaster preparedness.
  • Simplified troubleshooting experience – Start at the vault dashboard and dive deeper using an intuitive navigational experience to get in depth visibility into individual components, and additional troubleshooting tools including a brand new dashboard for replicated machines.
  • In-depth anomaly detection tooling to detect error symptoms, and offer prescriptive guidance for remediation.

Azure Site Recovery Updates | Support for Large Disks

Microsoft Azure recently announced the support for large disks up to 4 TB. Now Azure Site Recovery supports protecting on-premises VMs and physical servers with disks up to 4095 GB in size to Azure. Many customers use disks with more than 1 TB in capacity for various reasons. A good example would be SQL databases and file servers. The availability of large disks in Azure allows you to leverage ASR as a DR solution for your datacenter infrastructure. 

Large disks in Azure are available in both the standard and premium tiers. Standard disks offer two sizes, S40 (2 TB) and S50 (4 TB), for both managed and unmanaged disks. If you have IO-intensive workloads that require premium storage, you can use P40 (2 TB) and P50 (4 TB) for both managed and unmanaged disks.
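To illustrate how the sizes above map to a disk choice, here is a small Python sketch. It only knows the four large-disk sizes quoted in this post (2 TB and 4 TB per tier) and ignores the smaller disk sizes; `pick_large_disk_sku` and the dictionary layout are hypothetical and not an Azure API.

```python
# Large-disk sizes as quoted above (in TB); smaller SKUs are deliberately left out.
LARGE_DISK_SKUS = {
    "standard": [("S40", 2), ("S50", 4)],
    "premium": [("P40", 2), ("P50", 4)],
}


def pick_large_disk_sku(size_tb, tier="standard"):
    """Return the smallest listed SKU in `tier` that can hold `size_tb`, or None."""
    for sku, capacity_tb in LARGE_DISK_SKUS[tier]:
        if size_tb <= capacity_tb:
            return sku
    return None  # larger than any disk size mentioned here


print(pick_large_disk_sku(1.5))
print(pick_large_disk_sku(3, tier="premium"))
```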

Pre-requisites for protecting VMs with large disks in ASR

You need to make sure that your on-premises ASR infrastructure components are up to date before you start protecting VMs and/or physical servers with disks greater than 1 TB in size.

  • VMware/Physical servers: Install the latest update on the configuration server, additional process servers, additional master target servers and agents.
  • SCVMM-managed Hyper-V environments: Install the latest Microsoft Azure Site Recovery Provider update on the on-premises VMM server.
  • Standalone Hyper-V servers not managed by SCVMM: Install the latest Microsoft Azure Site Recovery Provider on each Hyper-V server that is registered with Azure Site Recovery.

Note that protecting Azure VMs with large disks is not currently a supported scenario.

Azure Site Recovery Updates | Storage Spaces & Windows Server 2016

Microsoft has recently announced a preview for protecting Azure IaaS VMs with ASR. You can now protect Azure VMs running Windows Server 2016. ASR also now supports protecting Azure IaaS VMs with Storage Spaces. Storage Spaces allow you to improve IO performance by striping disks and to create logical disks larger than 4 TB.

Following is a list of all supported OS versions that can be protected using ASR.

Windows
  • Windows Server 2016 (Server Core and Server with Desktop Experience)
  • Windows Server 2012 R2
  • Windows Server 2012
  • Windows Server 2008 R2 SP1 and above
Linux
  • Red Hat Enterprise Linux 6.7, 6.8, 7.0, 7.1, 7.2, 7.3
  • CentOS 6.5, 6.6, 6.7, 6.8, 7.0, 7.1, 7.2, 7.3
  • Ubuntu 14.04/16.04 LTS Server (only supported kernel versions)
  • SUSE Linux Enterprise Server 11 SP3
  • Oracle Enterprise Linux 6.4, 6.5 running either the Red Hat compatible kernel or Unbreakable Enterprise Kernel Release 3 (UEK3)

Azure Site Recovery updates | Managed Disks & Availability Sets

The Azure Site Recovery team has made some significant improvements to the service during the past couple of months. Recently Microsoft announced support for managed disks and availability sets with ASR.

Managed Disks in ASR

Managed disks simplify disk management for Azure IaaS VMs, and users no longer have to leverage storage accounts to store the VHD files. With ASR, you can attach managed disks to your IaaS VMs during a failover or migration to Azure. Additionally, using managed disks ensures reliability for VMs placed in availability sets by guaranteeing that the failed-over VMs are automatically placed in different storage scale units (stamps) to avoid any single point of failure.

Availability Sets in ASR

Site Recovery now supports configuring VMs into availability sets in the ASR VM settings. Previously users had to leverage a script integrated into the recovery plan to achieve this. Now you can configure availability sets before the failover, so you do not need to rely on any manual intervention.

Below are some considerations to be made when you are using these two features.

  • Managed disks are supported only in the Resource Manager deployment model.
  • VMs with managed disks can only be part of availability sets with the “Use managed disks” property set to Yes.
  • Creation of managed disks will fail if the replication storage account was encrypted with Storage Service Encryption (SSE). If this happens during a failover, you can either set “Use managed disks” to “No” in the Compute and Network settings for the VM and retry the failover, or disable protection for the VM and protect it to a storage account without Storage Service Encryption enabled.
  • For SCVMM-managed or unmanaged Hyper-V VMs, use this option only if you plan to migrate to Azure. Failback from Azure to an on-premises Hyper-V environment is not currently supported for VMs with managed disks.
  • Disaster Recovery of Azure IaaS machines with managed disks is not supported currently.

Savision Free Whitepaper | MVP Peter de Tender

Business Continuity & Disaster Recovery is a critical function in today’s IT business. Microsoft Azure Site Recovery is a premier solution that can reduce your BCDR cost drastically. Planning a DR solution requires a lot of effort and time, and by using Azure Site Recovery enterprises can have a bulletproof BCDR solution within a couple of days, where you only pay when a disaster has actually happened.

In Savision’s newest free whitepaper MVP Peter de Tender explains why you should focus on building an effective Disaster Recovery Plan for your virtualized Data center. This whitepaper explains,

  • How to leverage Microsoft Azure Site Recovery to build a DR solution for Hyper-V
  • Azure Site Recovery for VMware & physical servers
  • Leveraging Azure IaaS for a hybrid data center

You can download the whitepaper from here.

 

Replication Failure in Azure Site Recovery

Azure Site Recovery is a great product for those who want to set up their DR environment at minimal cost. It is based on Hyper-V Replica technology for Hyper-V workloads and supports replicating VMware & physical server workloads to DR as well. Today I’m going to discuss a common issue you can encounter when enabling ASR replication to the cloud.

I’ve been working on an ASR setup for a couple of months and encountered a strange issue when I enabled replication for protected VMs.

The enable protection job fails with the error below.

Job ID: f9f84765-b18c-4002-96a4-d420dfb76ea6-2015-05-14 10:00:29Z

Start Time: 5/14/2015 3:30:29 PM

Duration: 10 MINUTES

Protection couldn’t be enabled for the virtual machine. (Error code: 70094)

Provider error: Unable to complete the request. Operation on the <Hyper-V Node>  timed out.

Try the operation again. (Provider error code: 2924)

Possible causes: Protection can’t be enabled with the virtual machine in its current state. Check the Provider errors for more information.

Recommendation: Fix any issues in the Event Viewer logs (Applications and Service Logs – MicrosoftAzureRecoveryServices) on the Hyper-V host server. If this virtual machine is enabled for replication on the Hyper-V host, disable this setting. Then try to enable protection again.

UTC Time: Thu May 14 2015 10:15:59 GMT+0530 (Sri Lanka Standard Time)

Browser: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36

Language: en-us

Portal Version: 5.4.00298.11 (rd_auxportal_stable.150511-1702)

PageRequestId: a04f08ed-8932-43f2-95bc-2faab60ed958

Email Address: xxxxxx@outlook.com (MSA)

Subscriptions: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx

On that particular Hyper-V host, the following error was logged in the event logs.

Enable replication failed for virtual machine ‘XXXXXX’ due to a network communication failure. (Virtual Machine ID 807780f6-bb7c-48d5-937d-4857a654dec3, Data Source ID 2256321007502018113, Task ID 8c1a5d7d-0693-4d6b-9243-37cc5e96a7d6)

This ASR setup was an on-premises to cloud scenario with a single SCVMM server.

After spending a good number of hours troubleshooting, I finally figured out what went wrong. The Hyper-V hosts themselves need Internet connectivity to replicate the VMs to ASR. If you cannot enable direct Internet connectivity on the Hyper-V hosts, you should provide it via a proxy. You can change the proxy settings in the ASR Provider on the Hyper-V host.

ASR replication requires traffic to be sent over port 443 (SSL), and in my case only the SCVMM server was configured with Internet access. If you are using a proxy server, you may need to allow the URLs below for successful replication.

  • *.hypervrecoverymanager.windowsazure.com
  • *.accesscontrol.windows.net
  • *.backup.windowsazure.com
  • *.blob.core.windows.net
  • *.store.core.windows.net
  • Allow the IP addresses in the Azure Datacenter IP Ranges and the HTTPS (443) protocol. Your IP address whitelist should also contain the IP address ranges of your primary region and of West US.
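When reviewing proxy logs or a proposed allow list, it can help to check hostnames against the wildcard patterns above. The following Python sketch does only that pattern matching (it makes no network calls); `is_allowed` is a hypothetical helper and the example hostnames are made up for illustration.

```python
from fnmatch import fnmatch

# Wildcard URL patterns from the list above.
ALLOWED_PATTERNS = [
    "*.hypervrecoverymanager.windowsazure.com",
    "*.accesscontrol.windows.net",
    "*.backup.windowsazure.com",
    "*.blob.core.windows.net",
    "*.store.core.windows.net",
]


def is_allowed(hostname, patterns=ALLOWED_PATTERNS):
    """Return True if `hostname` matches any of the wildcard patterns."""
    host = hostname.lower()  # hostnames are case-insensitive
    return any(fnmatch(host, pattern) for pattern in patterns)


# Made-up hostnames, for illustration only.
print(is_allowed("mystorage.blob.core.windows.net"))
print(is_allowed("example.com"))
```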

Trick or Treat | Protecting large VHDs with Azure Site Recovery

One of the most annoying problems with ASR is that it cannot protect VMs with VHDs that have a capacity greater than 1 TB. The OS drive limitation has already been addressed by Microsoft (refer to my previous blog post), but the 1 TB cap is still there for data disks. This is a limitation in Azure itself as of now. Let’s see a workaround that we can leverage to overcome this barrier.

Solution

This solution involves creating a new striped volume made up of multiple smaller VHD images, each less than 1 TB in size. We then copy the data from the old VHD to the new striped volume and remove the old VHD.

Prerequisites

  • The VM should be in a shutdown state.
  • Enough 1 TB VHDs should be added to the VM to accommodate the size of the VHD that is greater than 1 TB. Keep in mind that these VHDs are dynamic, not fixed.
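The number of VHDs to add is a simple ceiling division. A minimal Python sketch, assuming each new VHD provides 1 TB (1024 GB) of usable space; `vhds_needed` is a hypothetical helper, so adjust `vhd_size_gb` if you create smaller disks.

```python
import math


def vhds_needed(source_size_gb, vhd_size_gb=1024):
    """Return how many VHDs of `vhd_size_gb` are needed to hold `source_size_gb`."""
    return math.ceil(source_size_gb / vhd_size_gb)


# Example: a 2.5 TB (2560 GB) data disk.
print(vhds_needed(2560))
```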

Procedure

  • Start the VM and stop any application services that are running, e.g. SQL.
  • Go to Computer Management > Disk management tab. If prompted to initialize the new VHDs click OK and proceed.
  • Right-click one of the new unallocated volumes, and then click Create striped volume.
  • Select all the new volumes that are displayed in the wizard to create a striped volume.
  • Assign a temporary drive letter (e.g. F:) to the new drive and format it as NTFS.
  • Now you can copy the data between the two drives. Use the robocopy command below to do so.
robocopy E:\ F:\ /mir
  • Change the drive letter of the new disk to the drive letter of the old disk. (Swap the drive letters)
  • In a PowerShell window (as administrator) run diskpart.
  • Type SAN POLICY=OnlineAll.
  • Shut down the VM and remove the old VHD image from the VM.
  • Start the VM and the services that you’ve stopped earlier, and try to protect the VM with Site Recovery now.