Network Discovery Rule Failure in SCOM 2012 R2

Although most of my time is now spent on Azure, I still love and work with SCOM, the best monitoring platform I’ve ever worked with. Some say it’s noisy, but that’s not true if you know how to tune your SCOM deployment. In a recent adventure I came across another SCOM mystery, and today I’m going to tell you how to solve it.

I’ve got a SCOM deployment with two management servers and one database server, all part of the same management group. The second management server was implemented solely for network device monitoring. As some of you may know, Microsoft recommends having a separate management server for that.

First things first: I created a Network Discovery Rule and selected the second management server as the one that actually runs the discovery. If you don’t know how to do that, you can refer to this TechNet article.

The Problem

Although the Network Discovery rule was created successfully, I noticed that its status was always IDLE and it discovered nothing, even though I ran it manually a couple of times. I tried everything I could think of: restarting services and management servers, recreating the rule, and even deleting and re-importing the management pack itself (the unsealed management pack Microsoft.SystemCenter.NetworkDiscovery.Internal, which stores the discovery rule). The weirdest thing was that if I recreated the rule and selected the first management server, I could discover the network devices, but not with the second server. I noticed the error below in the second management server’s event log.

SCOM Network Discovery Failure 1

It seems the management server was having trouble updating the network discovery script, and yes, obviously I tried again after 3600 seconds like they say. 😉

The Solution

A regular Google search led me to two invaluable posts, one from my fellow MVP Daniele Grandini and the other from TechNet, both describing the exact issue I was facing. As Daniele’s post explains nicely, there are a couple of events you can look for that tell you whether a discovery of network devices succeeded or failed. But even after performing the steps in both articles, I was still at square one with no results.

Those of you who are familiar with my friend and fellow MVP Tao Yang, one of the SCOM gurus we have in this part of the world, know how he works his magic with management packs. Tao had come across the same issue in the past when he was helping out a friend, and he suggested a nice little trick that I had missed.

The Trick

Tao suggested flushing the health service state and cache of the ailing management server. This is the last beacon of hope for us SCOM admins, and it performs the following tasks.

  1. Stops the System Center Management service.
  2. Deletes the health service store files.
  3. Resets the state of the agent, including all rules, monitors, outgoing data, and cached management packs.
  4. Starts the System Center Management service.

This task leaves no reference to itself as it deletes the cached data in the health service store files, including the record of this task itself.
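If the console task isn’t available to you, the four steps above can also be sketched as a small script. This is only a rough illustration and not what I actually ran. It assumes the service name is HealthService and that you pass in the path of the health service store folder yourself (on a default 2012 R2 management server I would expect it to be the Health Service State folder under the Operations Manager\Server install directory, but check yours). One deliberate difference: instead of deleting the store, the sketch renames it so there is a copy to fall back on. The function name and the run parameter are made up for this example.

```python
import subprocess
import time
from pathlib import Path

# Assumption: the health service is registered under this service name.
SERVICE_NAME = "HealthService"


def flush_health_service_state(state_dir, run=subprocess.run):
    """Stop the service, set the state folder aside, start the service again.

    state_dir: path to the health service store folder (supplied by the caller).
    run: command runner, injectable so the sketch can be tried without
         touching a real service.
    Returns the backup folder path, or None if state_dir did not exist.
    """
    state_dir = Path(state_dir)
    backup = None

    # Step 1: stop the System Center Management (health) service.
    run(["net", "stop", SERVICE_NAME], check=True)
    try:
        # Steps 2 and 3: take the store files (state and cache) out of the way.
        # Renaming rather than deleting is a choice made for this sketch.
        if state_dir.exists():
            backup = state_dir.with_name(f"{state_dir.name}.old-{int(time.time())}")
            state_dir.rename(backup)
    finally:
        # Step 4: always try to bring the service back up.
        run(["net", "start", SERVICE_NAME], check=True)
    return backup
```

You would call it from an elevated prompt on the affected management server, for example flush_health_service_state(r"C:\path\to\Health Service State") with your real folder path. When the service starts again it should rebuild the store from scratch, and you can delete the renamed folder once everything looks healthy.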

All you have to do is follow steps 1 > 2 > 3 > 4 as shown in the screenshot below.

SCOM Network Discovery Failure 2

With that done, I created a brand new network discovery rule for the second management server, let it run for the first time and waited. It really did work, and to my great joy I could see the discovered devices.

SCOM Network Discovery Failure 4

Looking back at the event log, I could now see the traces of a successful network discovery.

SCOM Network Discovery Failure 3 revised

Now let’s hear a big round of applause for Master Tao Yang, the hero who saved my day.