Debugging VMM Issues with logman

Microsoft support will sometimes ask you for the VMM debug trace logs when you run into issues with your VMM deployment. Today I’m going to walk through the process of collecting debug logs for VMM and converting them to text files.

  • First of all, create a folder to store your VMM log files. I prefer to save them in C:\VMMLogs.
  • Delete any existing VMM trace if present. To do this, open a PowerShell window as an administrator on your VMM server, type logman delete VMM and press Enter. You may see warnings such as “Data Collector Set was not found”, and you can safely ignore them.
  • Create a VMM trace. You can use the following command to do that.

logman create trace VMM -v mmddhhmm -o $env:SystemDrive\VMMlogs\VMMLog_$env:computername.ETL -cnf 01:00:00 -p Microsoft-VirtualMachineManager-Debug -nb 10 250 -bs 16 -max 512

  • Start the VMM trace by entering logman start vmm in the same PowerShell window.
  • Now reproduce the VMM issue that you have faced (e.g. a job failure).
  • Immediately after reproducing the issue, stop the VMM trace by entering logman stop vmm
  • The log files you created are in ETL format. An ETL file is a log file created by Microsoft Tracelog, a program that creates logs using the events from the kernel in Microsoft operating systems, and it is machine readable rather than human readable. So the next step is to convert it to text format.
  • You can convert the collected ETL log by entering netsh trace convert <Path to file name>
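
The steps above can be sketched as a small Python helper that assembles the same command lines in order. This is only a sketch under a few assumptions: the names build_trace_commands and run and their parameters are my own, the -o path is built by hand instead of relying on PowerShell's $env: variables, and etl_path is a placeholder you must fill in yourself, because the -v mmddhhmm option adds a version suffix to the file name that logman actually writes.

```python
import subprocess


def build_trace_commands(log_dir, computer_name, etl_path):
    """Return the logman/netsh command lines from the steps above, in order.

    log_dir, computer_name and etl_path are illustrative parameters; etl_path
    must point at the ETL file that logman actually wrote.
    """
    output = rf"{log_dir}\VMMLog_{computer_name}.ETL"
    return [
        ["logman", "delete", "VMM"],
        ["logman", "create", "trace", "VMM", "-v", "mmddhhmm", "-o", output,
         "-cnf", "01:00:00", "-p", "Microsoft-VirtualMachineManager-Debug",
         "-nb", "10", "250", "-bs", "16", "-max", "512"],
        ["logman", "start", "VMM"],
        ["logman", "stop", "VMM"],
        ["netsh", "trace", "convert", etl_path],
    ]


def run(command):
    """Run one command on the VMM server (Windows, elevated prompt)."""
    # check=False: "logman delete VMM" is expected to fail when no trace exists.
    return subprocess.run(command, check=False).returncode
```

In practice you would run the first three commands, reproduce the issue, and only then run the stop and convert commands.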

I find these logs very useful, especially when the errors in Windows Event Viewer are too generic. A debug trace can provide much more information when you are encountering bizarre issues in your VMM deployment.

Replication Failure in Azure Site Recovery

Azure Site Recovery is a great product for those who want to set up their DR environment at minimal cost. It is based on Hyper-V Replica technology for Hyper-V workloads and also supports replicating VMware and physical server workloads to DR. Today I’m going to discuss a common issue you can encounter when enabling ASR replication to the cloud.

I’ve been working on an ASR setup for the past couple of months and encountered a strange issue when I enabled replication on protected VMs.

The enable protection job fails with the error below.

Job ID: f9f84765-b18c-4002-96a4-d420dfb76ea6-2015-05-14 10:00:29Z

Start Time: 5/14/2015 3:30:29 PM

Duration: 10 MINUTES

Protection couldn’t be enabled for the virtual machine. (Error code: 70094)

Provider error: Unable to complete the request. Operation on the <Hyper-V Node>  timed out.

Try the operation again. (Provider error code: 2924)

Possible causes: Protection can’t be enabled with the virtual machine in its current state. Check the Provider errors for more information.

Recommendation: Fix any issues in the Event Viewer logs (Applications and Service Logs – MicrosoftAzureRecoveryServices) on the Hyper-V host server. If this virtual machine is enabled for replication on the Hyper-V host, disable this setting. Then try to enable protection again.

UTC Time: Thu May 14 2015 10:15:59 GMT+0530 (Sri Lanka Standard Time)

Browser: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36

Language: en-us

Portal Version: 5.4.00298.11 (rd_auxportal_stable.150511-1702)

PageRequestId: a04f08ed-8932-43f2-95bc-2faab60ed958

Email Address: xxxxxx@outlook.com (MSA)

Subscriptions: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx

On the affected Hyper-V host, the following error was logged in the event logs.

Enable replication failed for virtual machine ‘XXXXXX’ due to a network communication failure. (Virtual Machine ID 807780f6-bb7c-48d5-937d-4857a654dec3, Data Source ID 2256321007502018113, Task ID 8c1a5d7d-0693-4d6b-9243-37cc5e96a7d6)

This ASR setup was an on-premise to cloud scenario with a single SCVMM server.

After spending a good number of hours troubleshooting, I finally figured out what went wrong. The Hyper-V hosts themselves need Internet connectivity to replicate the VMs to ASR. If you cannot give the Hyper-V hosts direct Internet connectivity, you should provide it via a proxy. You can change the proxy settings in the ASR Provider on the Hyper-V host.

ASR replication requires traffic to be sent over port 443 (SSL), and in my case only the SCVMM server was configured with Internet access. If you are using a proxy server, you may need to allow the following for replication to succeed.

  • *.hypervrecoverymanager.windowsazure.com
  • *.accesscontrol.windows.net
  • *.backup.windowsazure.com
  • *.blob.core.windows.net
  • *.store.core.windows.net
  • Allow the IP addresses in the Azure Datacenter IP Ranges and the HTTPS (443) protocol. Your IP address whitelist should also include the ranges for your primary region and for West US.
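
If you maintain the proxy rules yourself, a quick way to sanity-check a host name against the wildcard list above is sketched below in Python. The function names are my own, and can_reach_https only tests a direct TCP connection to port 443 from the machine it runs on, so it does not go through a proxy. Treat it as a rough check, not a replacement for the ASR Provider's own connectivity test.

```python
import fnmatch
import socket

# Wildcard host names taken from the list above.
ASR_HOST_PATTERNS = [
    "*.hypervrecoverymanager.windowsazure.com",
    "*.accesscontrol.windows.net",
    "*.backup.windowsazure.com",
    "*.blob.core.windows.net",
    "*.store.core.windows.net",
]


def is_asr_host(hostname, patterns=ASR_HOST_PATTERNS):
    """True if hostname matches one of the wildcard patterns (case-insensitive)."""
    name = hostname.lower()
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


def can_reach_https(hostname, timeout=5.0):
    """True if a direct TCP connection to port 443 succeeds from this machine."""
    try:
        with socket.create_connection((hostname, 443), timeout=timeout):
            return True
    except OSError:
        return False
```

Run can_reach_https on the Hyper-V host itself, not on the SCVMM server, since it is the host that needs the outbound access.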

Internet Access denied in Azure VM

I’ve been working with Microsoft Azure for the past 4 years, and sometimes it’s quite challenging to find answers to the odd but common service misconfigurations that people make. The following is one such scenario, where Internet access from an Azure VM suddenly stopped. I ran into it recently while working with a customer on an Azure PoC with the setup below.

Azure Site

  • One Azure VM
  • Virtual Network with a dynamic VPN Gateway
  • DNS servers are on-premise (only one DNS server)

Issue

Due to an issue with the on-premise VPN device, the VPN disconnects frequently. When that happens, Internet access is lost in the Azure VM. However, RDP access is okay.

Why did I lose Internet when VPN was down?

Since the only DNS server providing name resolution for the Azure virtual network was on-premise, it was also resolving names for Internet access. So once the VPN is down, the VM can no longer reach that DNS server, cannot resolve Internet names, and is effectively orphaned in terms of Internet access.

Solution

When you create a VM associated with an Azure virtual network, it is automatically assigned an Azure DNS server. You can make a note of this once you log in. After you have changed the DNS settings in the Azure virtual network to point to your on-premise DNS server, add this Azure-provided DNS server IP address as a secondary DNS server from the Azure Portal.

Internet Access Issue Azure DNS 1

Remember that once you make any change to the DNS settings of an Azure virtual network, you will have to restart every server in that virtual network from the Azure Portal. Restarting from within the virtual machine won’t have any effect, so make sure you do it from the portal as below.

Internet Access Issue Azure DNS 2

Once the restart is complete, check the network adapter status in the VM. It will display the Azure DNS server as a secondary server. Now you can access the Internet from the VM even when the VPN is down.

Internet Access Issue Azure DNS 3
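
The behaviour before and after the fix can be modelled with a few lines of Python. This is a simplified sketch, not how the Windows DNS client is implemented: it assumes the client simply uses the first configured server it can reach, and the labels "onprem-dns" and "azure-dns" are hypothetical stand-ins for the real IP addresses.

```python
def pick_dns_server(servers, reachable):
    """Return the first configured DNS server that is reachable, or None."""
    for server in servers:
        if server in reachable:
            return server
    return None


# VPN down: only the Azure-provided DNS server can be reached.
vpn_down = {"azure-dns"}

# Original setup: a single on-premise DNS server, so nothing resolves.
before_fix = pick_dns_server(["onprem-dns"], vpn_down)

# After adding the Azure-provided server as secondary, lookups fall back to it.
after_fix = pick_dns_server(["onprem-dns", "azure-dns"], vpn_down)
```

With the VPN up, the on-premise server is still picked first, so name resolution for on-premise resources is unaffected by the change.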

System Center Technical Preview 2 Released

At Microsoft Ignite, the System Center team recently announced the availability of System Center Technical Preview 2. Preview 2 will ultimately be renamed to System Center 2016 when it is released next year, and it already has some great enhancements over the current version.

Improved Linux Management Capabilities – Preview 2 has Desired State Configuration (DSC), Native SSH support and improved LAMP server monitoring support for your Linux workloads.

Software defined Datacenter Management – System Center vNext supports mixed mode cluster upgrades, enhanced Scale-Out File Server (SOFS) management, and deployment of software-defined networking (SDN).

New Workload Monitoring – This version can monitor Azure and Office 365, and the monitoring scenarios for SQL Server and Exchange Server have been improved.

You can download the installation files from here. You can also download pre-configured VHD files for each System Center component below.

System Center Technical Preview 2 Virtual Machine Manager VHD

System Center Technical Preview 2 Data Protection Manager VHD

System Center Technical Preview 2 Orchestrator VHD

System Center Technical Preview 2 Operations Manager VHD

System Center Technical Preview 2 Service Manager VHD

What’s new in Update Rollup 6 for SCOM 2012 R2

Last week Microsoft announced the availability of System Center 2012 R2 Update Rollup 6. This update rollup contains many improvements plus new functionality for most of the products in the suite. Let’s see what’s in the box for SCOM admins with this release. I have to admit that, rather than adding new features, this UR is more focused on fixing existing issues in SCOM 2012 R2.

Top Issue Fixes

  • The “Remove-DisabledClassInstance” Windows PowerShell command times out without completing – In large SCOM deployments this cmdlet sometimes times out without completing. This release addresses that by optimizing the underlying query.
  • Duplicate closed alerts – Due to an update issue in the grid, the Find option in a message view duplicated results from a previous search. It is now fixed to display only the current search results.
  • Topology widget objects lose location when they are opened in a console that has a different locale and decimal format – In previous releases, a widget opened in a console whose locale differed from the one it was created in displayed its objects in the wrong location. In this release, widgets can be created and opened across different locales.
  • WebConsole Details widget does not display anything – This issue is fixed by correcting the XAML page of the Details widget, which was preventing the data from being displayed on the page.
  • Top 10 Performance widgets (WebConsole) are sometimes empty – Due to a performance issue, the Top 10 Performance widgets in the Web Console are sometimes empty. In UR3 we had to manually import an .mpb file to fix this. This release has the fix integrated.
  • Problem with decoding SCOM trace log files – SCOM trace logs do not decode but generate the error “Unknown(X ): GUID=XXX (No Format Information found).” when the latest TMF files that contain formatting information for the related traces are missing. In this version all TMF files are updated.

The following issues have been addressed in the Unix/Linux MP.

  • JEE: WebLogic 12.1.3 servers on Linux or Solaris are not discovered – To solve this issue you need to update the SCOM agent on the target Unix/Linux servers after updating to the latest Unix/Linux MP.
  • In rare cases, lots of omiagent processes may be observed on a UNIX or Linux computer – Requires the Management Pack Update as above.
  • The SSLv3 protocol cannot be disabled in OpenSSL as used by the UNIX and Linux agents – By modifying the NoSSLv3=<true|false> option in omiserver.conf (/etc/opt/microsoft/scx/conf) you can disable SSLv3 in the Unix/Linux monitoring agents. By default SSLv3 is enabled and will be used only if TLS encryption cannot be negotiated. When SSLv3 is disabled, the agent rejects connections that cannot be negotiated by using TLS encryption.
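
If you need to change the NoSSLv3 option on many agents, the edit itself can be sketched in Python as below. This is a minimal sketch that assumes omiserver.conf consists of plain key=value lines with no spaces around the equals sign; the function name set_no_sslv3 is my own, and it only transforms the file contents you pass in, leaving reading and writing the file to you.

```python
def set_no_sslv3(conf_text, disable):
    """Return conf_text with NoSSLv3 set to true (disable=True) or false.

    Assumes plain key=value lines; an existing NoSSLv3 line is replaced,
    otherwise the setting is appended at the end.
    """
    setting = "NoSSLv3=" + ("true" if disable else "false")
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("nosslv3="):
            lines[i] = setting
            replaced = True
    if not replaced:
        lines.append(setting)
    return "\n".join(lines) + "\n"
```

Keep a backup of the original file before writing the result back to /etc/opt/microsoft/scx/conf/omiserver.conf.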

Installation Procedure

If you have installed an Update Rollup for SCOM 2012 R2 before, the procedure is the same. You can use my previous post as a reference. You can download the binaries for a manual installation, which I strongly recommend, from here.

However, for Unix/Linux servers, after updating to UR6 make sure you import the latest MP for each version of Unix/Linux OS. You can download the latest Management Pack from here. Finally, you have to update the monitoring agent on those *nix servers.