Way back in the day I wrote about an issue where the SCOM agent can, in some cases, consume more memory, handles, and other resources than is typical. When this occurs, we restart the agent to kill any “runaway” processes. Read about this here:
One thing I have noticed is that on many of my servers these thresholds are breached on a regular basis, mostly because the monitoringhost.exe processes need more than the default of 300 MB of RAM (private bytes).
The issue is that you will likely have NO idea this is happening. We don’t generate any alerts for this by default; we simply “fix the problem” by creating a state change and then running a response script to bounce the agent. The bad part is that you could have agents in a constant restart loop.
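To make the "constant restart loop" concrete, here is a minimal sketch of how you might flag agents that bounce too often once you have their restart timestamps in hand. Everything in it is an assumption for illustration: the function name `find_restart_loops`, the 24-hour window, the three-restart cutoff, and the sample server names are all hypothetical, and the timestamps would have to come from wherever you export state change data. It is not how SCOM itself evaluates anything.

```python
from datetime import datetime, timedelta


def find_restart_loops(restarts, window=timedelta(hours=24), min_restarts=3):
    """Return {agent: count} for agents with at least `min_restarts`
    restarts inside any span of length `window` (hypothetical cutoffs)."""
    flagged = {}
    for agent, times in restarts.items():
        times = sorted(times)
        start = 0   # left edge of the sliding window
        worst = 0   # largest number of restarts seen in one window
        for end, t in enumerate(times):
            # Shrink the window until it spans no more than `window`.
            while t - times[start] > window:
                start += 1
            worst = max(worst, end - start + 1)
        if worst >= min_restarts:
            flagged[agent] = worst
    return flagged


# Hypothetical sample data: agent name -> restart timestamps.
restarts = {
    "sql01.contoso.local": [
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 6, 0),
        datetime(2024, 1, 1, 12, 0),
        datetime(2024, 1, 1, 18, 0),
    ],
    "web01.contoso.local": [
        datetime(2024, 1, 1, 3, 0),
        datetime(2024, 1, 9, 3, 0),
    ],
}

print(find_restart_loops(restarts))
```

With the sample data, only the first agent is flagged, because all four of its restarts fall inside a single 24-hour span.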
In SCOM 2012, I still recommend making the following changes via overrides. Open the “Operations Manager > Agent Details > Agents by Version” view in the console:
Open Health Explorer for one of the agents. Here is an example of an agent that has been bouncing on a regular basis:
On the 4 monitors highlighted above, I recommend enabling alerting and disabling auto-close of the alert, so you can take action on agents that need it:
Then, for any agents that need higher values, make the necessary adjustments via override:
As a refresher, this will be common on any monitored system that discovers a large number of instances, such as Exchange, DNS, SQL servers, SCVMM, etc.
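If you have collected peak private bytes for monitoringhost.exe on a set of agents, a rough sketch like the following can shortlist the ones that exceed the 300 MB default and therefore need an override. The function name, the sample figures, and the rule of rounding up to the next 100 MB step are assumptions made purely for illustration, not a recommendation from the product; pick override values that fit your own environment.

```python
DEFAULT_PRIVATE_BYTES_MB = 300  # default threshold mentioned in this post


def suggest_overrides(peak_mb_by_agent, threshold_mb=DEFAULT_PRIVATE_BYTES_MB, step_mb=100):
    """Return {agent: suggested_threshold_mb} for agents whose observed
    peak exceeds the threshold. The suggestion is simply the next
    multiple of `step_mb` strictly above the peak (illustrative rule)."""
    suggestions = {}
    for agent, peak_mb in peak_mb_by_agent.items():
        if peak_mb > threshold_mb:
            suggestions[agent] = (peak_mb // step_mb + 1) * step_mb
    return suggestions


# Hypothetical observed peaks, in MB of private bytes.
observed_peaks = {"exch01": 450, "dns01": 120, "sql01": 600}

print(suggest_overrides(observed_peaks))
```

Agents at or below the default are left out, so the result is just the list you would go and create overrides for.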