Process Server fails to communicate after ASR Update Rollup 22

Recently I have been working on an ASR implementation for a customer, where it was required to upgrade the MARS version from 9.8 to 9.13. Microsoft has set a deadline of 28th February for this update as after that enabling replication using older version of MARS wouldn’t work. Here is what you see when you are prompted to perform this upgrade.

Since we were running an incompatible version of MARS (Microsoft claims that you need to have a n-4 version of DRA in order to successfully perform this update) we had to perform a step upgrade from 9.8 to 9.10 and then to 9.13. This link provides to reference to that.

The Issue

Both process servers in the environment have been successfully upgraded to 9.13 as a step upgrade. But as soon as the latest version was installed, both the config server and the secondary process server lost communication with the ASR vault and refresh server connection task has been failing for no reason.

We have raised this with Microsoft support and the support engineer assigned to our case found out one alarming issue under ASR operational log under Windows Event viewer in the config server.

 

System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.IdentityModel, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.  File name: 'Microsoft.IdentityModel, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'    

at SrsRestApiClientLib.SrsCreds.InitializeServiceUrl()    

at SrsRestApiClientLib.SrsCreds..ctor(String resourceId, String siteId, String draId, X509Certificate2 cert, AcsConfiguration acsConfig, Boolean retryFetch, String apiVersion, Proxy webProxy, String msiVersion, String vmmVersion)    

at SrsRestApiClientLib.ClientHelper.InitializeDra(String resourceId, String siteId, String draId, X509Certificate2 cert, AcsConfiguration acsConfig, String apiVersion, Proxy webProxy, String msiVersion, String vmmVersion)    

at Dra.SrsCommunication.SrsCommunicationClient.InitializeSrsCommunication()    

at Dra.SrsCommunication.SrsCommunicationClient..ctor(IDraFabricAdapter fabricAdapter)    

at Dra.Dra.Initialize(IDraFabricAdapter fabricAdapter)    

at Dra.DraFactory.CreateInstance(IDraFabricAdapter fabricAdapter)    

at Microsoft.VirtualManager.Engine.DRA.Core.Core.Initialize()   

WRN: Assembly binding logging is turned OFF.  To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1.  Note: There is some performance penalty associated with assembly bind failure logging.  To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].  ,{00000000-0000-0000-0000-000000000000}



======================

The Culprit

As you may have already notice there is an exception logged stating that the DRA cannot find ‘Microsoft.IdentityModel, Version=3.5.0.0’, which is indeed a part of Windows Identity Foundation 3.5 role in Windows Server. When we checked the installed roles in the config server this role wasn’t present and we have enabled it again to see whether it would have made any difference. Viola!, the process servers reestablished the communication to the vault soon after this role has been installed.

Aftermath

ASR engineering team has confirm that the DRA has a pre-requisite to check whether the config server has Microsoft .NET framework 4.5 installed. Furthermore with .NET 4.5, WIF role is fully integrated into the .NET Framework, meaning that it should be automatically installed alongside .NET 4.5 should have been present in this case. They have reproduced the issue and verified that in upgrade from 9.10 to 9.13, no issues are observed as there are no options available to disable WIF along with .NET Framework 4.5 installation (DRA runs ion a minimum of .NET 4.5 and the latest requirement supports Recently .NET framework 4.6.2).

My best bet is that this has happened when we performed the upgrade from 9.8 to 9.10 as we haven’t reproduced that possibility. The config server haven’t had any major change in its configuration so the only suspect is the 9.8 to 9.10 upgrade. Well I’m exploring that possibility right now and will be publishing a another post if my theory is confirmed. But until then, it still remains a mystery.