Everybody wants to see a greener SCOM console where nothing is in critical status. Sometimes, even though your infrastructure is functioning properly, you may still see a critical health status or missing alerts in SCOM. Let’s look at two such scenarios and troubleshoot each.
My SCOM Management Server or Monitored Servers are healthy. So why do I see them in Critical status now and then, even though there are no unhealthy child monitors?
The communication channel between a MOM agent and a management server (MS) is maintained by the Health Service, which tries to connect to the MS every 60 seconds. By default, if 3 consecutive heartbeats are missed, the health service monitor turns Critical, because the MS assumes the connection between the agent and the MS has failed. That is the case with a single MS; if you have a secondary MS, the agent will try to connect to that one after the third failed attempt.
The culprit here is a corrupted Health Service cache. Sometimes, even though the Health Service is OK, a messy cache will still produce the RED alert. This can happen if your MS or agents are generating a lot of alerts.
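To make the heartbeat rule concrete, here is a minimal Python sketch of the "3 consecutive misses" logic. It is only a toy model under the defaults quoted above; the class and function names are made up for illustration and are not part of SCOM.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL_SECONDS = 60  # default interval quoted in the text
MISSED_HEARTBEAT_THRESHOLD = 3   # default number of consecutive misses


@dataclass
class HeartbeatTracker:
    """Hypothetical model of how an MS could judge one agent's heartbeat."""
    threshold: int = MISSED_HEARTBEAT_THRESHOLD
    missed: int = 0

    def record(self, received: bool) -> str:
        # A heartbeat that arrives resets the consecutive-miss counter.
        if received:
            self.missed = 0
        else:
            self.missed += 1
        return "Critical" if self.missed >= self.threshold else "Healthy"


def simulate(heartbeats):
    """Return the state after each heartbeat slot (True = received)."""
    tracker = HeartbeatTracker()
    return [tracker.record(received) for received in heartbeats]


if __name__ == "__main__":
    # Two misses, a recovery, then three misses in a row.
    pattern = [True, False, False, True, False, False, False]
    for tick, state in enumerate(simulate(pattern), start=1):
        print(f"t={tick * HEARTBEAT_INTERVAL_SECONDS}s {state}")
```

Note that only consecutive misses count: the two early misses in the sample pattern do not turn the monitor Critical, because a heartbeat arrives before the third one is missed.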
Clearing Health Service Cache
This is the last troubleshooting step you can take before you uninstall the MOM agent from a monitored server. The interesting thing is that it is kind of like GHOST PROTOCOL (yes, the Tom Cruise movie): after you perform this task you won’t see a task status for it, because once the cache is cleared no record of the task remains. In simple terms, this is where you HIT RESET on the agent, and it does the following.
- Stops the System Center Management service.
- Deletes the health service store files.
- Resets the state of the agent, including all rules, monitors, outgoing data, and cached management packs.
- Starts the System Center Management service.
Clearing the health service cache is pretty straightforward.
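If you prefer to do it by hand on the agent, the stop, delete, start sequence listed above could be sketched roughly like this in Python. Treat it as a sketch under assumptions: the service name HealthService and the Health Service State folder path are what a default 2012 R2 agent install typically uses, so verify both on your server (and run it elevated) before trying it. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: short name of the "System Center Management" service.
SERVICE_NAME = "HealthService"
# Assumption: default agent install path; it differs between SCOM versions.
DEFAULT_STATE_DIR = Path(
    r"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State"
)


def run_command(args):
    """Run a command and raise if it exits with a non-zero code."""
    subprocess.run(args, check=True)


def reset_health_service_cache(state_dir=DEFAULT_STATE_DIR, run=run_command):
    """Stop the service, delete the health service store, start the service."""
    run(["net", "stop", SERVICE_NAME])
    try:
        state_dir = Path(state_dir)
        if state_dir.exists():
            # The agent rebuilds this folder when the service starts again.
            shutil.rmtree(state_dir)
    finally:
        # Start the service again even if the delete failed.
        run(["net", "start", SERVICE_NAME])


if __name__ == "__main__":
    reset_health_service_cache()
```

The run parameter is there so you can dry-run the sequence, for example by passing print instead of the real command runner and pointing state_dir at a scratch folder.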
Missing Alerts in SCOM Console
Sometimes you may encounter an error like “An object of type MonitoringAlert with id xxx was not found” when you click on an alert in the SCOM console. This is caused by a corrupted SCOM console cache: the alerts may already be resolved, but the console still shows them.
To clear the SCOM console cache, type the command below in a Run window and press Enter.
"C:\Program Files\System Center Operations Manager 2012 R2\Console\Microsoft.EnterpriseManagement.Monitoring.Console.exe" /clearcache
If you have an older version of SCOM, this path may be different. You can get the actual path by right-clicking the SCOM console shortcut and copying the path.
This command deletes the momcache.mdb file in the user’s AppData folder, which stores that user’s display preferences for the SCOM console.
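If you need to do this on several admin workstations, the same command can be wrapped in a small script. Below is a rough Python sketch; the default folder is the 2012 R2 path used above, and the helper names are made up for illustration, so adjust the folder to whatever your console shortcut points to.

```python
import subprocess
from pathlib import PureWindowsPath

# Default console folder for SCOM 2012 R2, as used in the command above.
DEFAULT_CONSOLE_DIR = (
    r"C:\Program Files\System Center Operations Manager 2012 R2\Console"
)
CONSOLE_EXE = "Microsoft.EnterpriseManagement.Monitoring.Console.exe"


def build_clearcache_command(console_dir=DEFAULT_CONSOLE_DIR):
    """Return the argument list that starts the console with /clearcache."""
    exe_path = PureWindowsPath(console_dir) / CONSOLE_EXE
    return [str(exe_path), "/clearcache"]


def launch_console_with_clear_cache(console_dir=DEFAULT_CONSOLE_DIR):
    """Start the console without waiting for it to close."""
    return subprocess.Popen(build_clearcache_command(console_dir))


if __name__ == "__main__":
    launch_console_with_clear_cache()
```

Passing the arguments as a list avoids quoting problems with the spaces in the path.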