Stop Healthservice restarts in SCOM 2016

This is probably the single biggest issue I find in 100% of customer environments.

YOU ARE IMPACTED. Trust me.

SCOM monitors itself to ensure we aren’t using too much memory, or too many handles for the SCOM processes. If we detect that the SCOM agent is using an unexpected amount of memory or handles, we will forcibly KILL the agent, and restart it.

That sounds good right?

In theory, yes. In reality, however, this is KILLING your SCOM environment, and you probably aren’t even aware that it is happening.

The problem?

1. The default thresholds are WAY out of touch with reality. They were set almost 10 years ago, when systems used a LOT fewer resources than modern operating systems do today. This is MUCH worse if you choose to MULTIHOME. Multi-homed agents can use twice as many resources as non-multi-homed agents, and this restart can be issued from EITHER management group, but will affect BOTH.

2. We don’t generate an alert when this happens, so you are blind to the fact that this is impacting you.

We need to change these in the product. Until we do, a simple override is the solution.

Why is this so bad?

This is bad because of two impacts:

1. You are hurting your monitored systems by restarting the agent over and over, causing the startup scripts to run in loops and actually consume additional resources. You are also going for periods of time without any monitoring, because whenever the agent is killed and restarted, there is a window where the monitoring is unloaded.

2. You are filling SCOM with state change events. Every time all the monitors initialize, they send an updated “new” state change event upon initialization. You are hammering SCOM with useless state data.

What can I do about it?

Well, I am glad you asked! We simply need to override 4 monitors, to give them realistic agent thresholds, and set them to generate an informational alert. I will also include a view for these alerts so we can see if anyone is still generating them. I will wrap all this in a sample management pack for you to download.

In the console, go to Authoring, Monitors, and change scope to “Agent”

We will override each one:

Private bytes monitors should be set to a default threshold of 943718400 (triple the default of 300MB)

Handle Count monitors should be set to 30000 (the default of 6000 is WAY low)

Override Generate Alert to True (to generate alerts)

Override Auto-Resolve to False (even though default is false, this must be set, to keep from auto-closing these so you can see them and their repeat count)

Override Alert severity to Information (to keep from ticketing on these events)

Override EACH monitor, “all objects of class” and choose “Agent” class.

NOTE: It is CRITICAL that we choose the “Agent” class for our overrides, because we do not want to impact thresholds already set on Management Servers or Gateways.

This is a good configuration:

Ok – those are much more reasonable defaults.
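If you want to sanity check the numbers above, or work out a different multiple for a specific agent, the arithmetic is simple. Here is a minimal Python sketch, assuming 1 MB = 1024 * 1024 bytes (which is what makes 300MB x 3 come out to 943718400). The helper name mb_to_bytes is mine, purely for illustration, and is not part of SCOM.

```python
# Sketch of the threshold arithmetic only. Nothing here talks to SCOM.
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to the byte value entered in the override."""
    return megabytes * BYTES_PER_MB


default_private_bytes = mb_to_bytes(300)         # the 300MB default
new_private_bytes = 3 * default_private_bytes    # triple the default

default_handle_count = 6000
new_handle_count = 30000

print(default_private_bytes)                     # 314572800
print(new_private_bytes)                         # 943718400
print(new_handle_count / default_handle_count)   # 5.0
```

So the private bytes override is 900MB expressed in bytes, and the handle count override is five times the default.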

What else should I do?

Create an alert view that shows alerts with name “Microsoft.SystemCenter.Agent.%”

This will show you if you STILL have some agents restarting on a regular basis. You should review the ones with high repeat counts on a weekly basis, and either adjust their agent-specific thresholds or investigate why they are consuming so much, so often. An occasional agent restart (one or fewer per day) is totally fine and probably not worth the time to investigate.
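The weekly review boils down to a simple rule of thumb: more than about one restart per day deserves a look. Here is a rough Python sketch of that triage, assuming you have copied the agent name and repeat count out of the alert view by hand. The agent names and numbers below are made up, and treating the repeat count as the number of restarts over the review window is my approximation.

```python
# Triage sketch with hypothetical data. It does not query SCOM.
from dataclasses import dataclass


@dataclass
class AgentAlert:
    agent: str          # hypothetical agent name
    repeat_count: int   # repeat count shown on the alert
    days_open: float    # days the alert has been accumulating repeats


def needs_review(alert: AgentAlert, max_per_day: float = 1.0) -> bool:
    """True when the agent restarts more often than max_per_day."""
    if alert.days_open <= 0:
        raise ValueError("days_open must be positive")
    return alert.repeat_count / alert.days_open > max_per_day


alerts = [
    AgentAlert("web01", 3, 7),    # well under once a day
    AgentAlert("sql02", 42, 7),   # about six restarts a day
    AgentAlert("app03", 7, 7),    # exactly once a day, still fine
]

# Worst offenders first.
flagged = [
    a.agent
    for a in sorted(alerts, key=lambda a: a.repeat_count, reverse=True)
    if needs_review(a)
]
print(flagged)  # ['sql02']
```

Agents that land in the flagged list are the candidates for an agent-specific threshold override or a closer look at what they are running.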

I am including a management pack with these overrides and the alert view, and you can download it below if you prefer not to make your own.

Download:

https://gallery.technet.microsoft.com/SCOM-Agent-Threshold-b96c4d6a
