Channel: TechNet Blogs

Windows 2012 R2 server fails to establish outbound connections


Hi there,

It's been a very long while since I last blogged here, and it's time to come back and continue sharing our field experiences with the IT community, hoping to shed light on similar problems.

I was tasked with a customer case in which end users were reporting various problems, such as "cannot access the file server, getting authentication prompts", while the IT admins were observing issues such as the server not applying GPOs properly, the Netlogon service complaining about DC access, and so on. At times, they were even able to reproduce the issue manually by running "telnet DC-IP 389" from the affected server.

There can be many reasons behind such symptoms, so I decided to collect a number of logs while the issue was being reproduced:

a) TCPIP ETL trace:

You can collect it with the following commands on a Windows client or server (from an elevated command prompt):

netsh trace start capture=yes scenario=internetclient

<<repro>>

netsh trace stop

b) Network trace:

This can be collected in different ways, for example with the netsh command above (capture=yes also records packets), Wireshark, Network Monitor, Message Analyzer, etc.

c) Handle outputs:

These can be collected as follows:

Note: The Handle tool (Handle v4.1) can be downloaded from the following link: https://technet.microsoft.com/en-us/sysinternals/handle.aspx

handle.exe -a -u >> %computername%_handledetails.txt

handle.exe -s >> %computername%_handlesummary.txt
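The handle details file can be very large, so a small script that tallies the \Device\Afd (socket) handles per process can make the heaviest holder stand out. The following is a minimal Python sketch, assuming the plain-text layout that `handle.exe -a -u` writes (a `name pid: N user` header line per process followed by indented `value: Type (access) path` lines, as in the excerpt shown in the analysis); the function name `count_afd_handles` and the sample data are hypothetical.

```python
import re
from collections import Counter

# Process header line, e.g. "ABC.exe pid: 1148 NT AUTHORITY\SYSTEM"
PROC_RE = re.compile(r"^(?P<name>\S+) pid: (?P<pid>\d+)\b")
# Socket handle line, e.g. "  144: File  (---)   \Device\Afd"
AFD_RE = re.compile(r"^\s*[0-9A-Fa-f]+: File\s+\(.*?\)\s+\\Device\\Afd\b")


def count_afd_handles(lines):
    """Map (process name, pid) -> number of \\Device\\Afd handles it holds."""
    counts = Counter()
    current = None
    for line in lines:
        header = PROC_RE.match(line)
        if header:
            # Every handle line that follows belongs to this process.
            current = (header.group("name"), int(header.group("pid")))
        elif current is not None and AFD_RE.match(line):
            counts[current] += 1
    return counts


# Hypothetical excerpt; in practice, read %computername%_handledetails.txt instead.
SAMPLE = r"""
------------------------------------------------------------------------------
ABC.exe pid: 1148 NT AUTHORITY\SYSTEM
  144: File  (---)   \Device\Afd
  148: File  (---)   \Device\Afd
  1A0: File  (RW-)   C:\Windows\System32
------------------------------------------------------------------------------
svchost.exe pid: 992 NT AUTHORITY\NETWORK SERVICE
  220: File  (---)   \Device\Afd
""".splitlines()

if __name__ == "__main__":
    for (name, pid), n in count_afd_handles(SAMPLE).most_common():
        print(f"{name} (pid {pid}): {n} \\Device\\Afd handle(s)")
```

Comparing two such tallies taken some time apart is a simple way to see whether a process's socket count keeps growing, which is what a leak looks like.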

ANALYSIS:

========

The logs were collected while reproducing the issue with the telnet command on the server. After the logs were shared with us, I checked various things to understand why outbound connections might be failing. (By the way, the file server's failure to authenticate incoming users was also a side effect of this issue, since the file server couldn't verify the client credentials over the Netlogon secure channel.)

1) I first checked the network traces, but there were no outgoing connection attempts (no TCP SYNs sent to the target server), which meant the issue was local to the server itself.

2) Then I checked the TCPIP ETL trace and found the cause of the failures:

Note: You can open the ETL file generated by the netsh command in Network Monitor or Message Analyzer.

[0]03E0.5214::01/04/18-15:07:37.5237622 [Microsoft-Windows-TCPIP/Diagnostic] TCP: endpoint (sockaddr=0.0.0.0) bind failed: port-acquisition status = The transport address could not be opened because all the available addresses are in use..

[0]58F0.4558::01/04/18-15:07:51.8242042 [Microsoft-Windows-TCPIP/Diagnostic] TCP: endpoint (sockaddr=0.0.0.0) bind failed: port-acquisition status = The transport address could not be opened because all the available addresses are in use..

[0]04D8.072C::01/04/18-15:07:52.0110322 [Microsoft-Windows-TCPIP/Diagnostic] TCP: endpoint (sockaddr=0.0.0.0) bind failed: port-acquisition status = The transport address could not be opened because all the available addresses are in use..

...

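That clearly explained why the outbound connections were failing: PORT EXHAUSTION. No free port could be acquired from the dynamic (ephemeral) port range, so new sockets could not even be bound, which is also why no SYNs showed up in the network trace.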
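When a trace contains many of these events, it helps to see which processes hit the failure. Below is a minimal Python sketch, assuming the trace has been exported to text lines in the layout shown above and that the `[0]03E0.5214::` prefix is the usual `[CPU]ProcessID.ThreadID::timestamp` prefix of formatted ETW traces, with the IDs in hexadecimal; `bind_failures_by_pid` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed layout: [CPU]PID.TID::timestamp [provider] message (PID/TID in hex).
BIND_FAIL_RE = re.compile(
    r"^\[(?P<cpu>\d+)\](?P<pid>[0-9A-Fa-f]+)\.(?P<tid>[0-9A-Fa-f]+)::(?P<ts>\S+)\s+"
    r"\[Microsoft-Windows-TCPIP/Diagnostic\]\s+"
    r"TCP: endpoint \(sockaddr=(?P<addr>[^)]*)\) bind failed: port-acquisition status"
)


def bind_failures_by_pid(lines):
    """Count 'bind failed: port-acquisition' events per process ID (as decimal)."""
    counts = Counter()
    for line in lines:
        match = BIND_FAIL_RE.match(line.strip())
        if match:
            counts[int(match.group("pid"), 16)] += 1
    return counts


# The three events from the excerpt above.
MESSAGE = (" [Microsoft-Windows-TCPIP/Diagnostic] TCP: endpoint (sockaddr=0.0.0.0) bind failed:"
           " port-acquisition status = The transport address could not be opened because all"
           " the available addresses are in use..")
TRACE_LINES = [prefix + MESSAGE for prefix in (
    "[0]03E0.5214::01/04/18-15:07:37.5237622",
    "[0]58F0.4558::01/04/18-15:07:51.8242042",
    "[0]04D8.072C::01/04/18-15:07:52.0110322",
)]

if __name__ == "__main__":
    for pid, n in bind_failures_by_pid(TRACE_LINES).most_common():
        print(f"PID {pid} (0x{pid:04X}): {n} bind failure(s)")
```

If the prefix is read this way, the three events come from three different processes (PIDs 992, 22768 and 1240), which is what you would expect when the port shortage is system-wide rather than confined to one application.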

3) The underlying reason for the port exhaustion was a socket leak caused by an outdated 3rd party AV software (from the handle.exe output below; every \Device\Afd handle represents a socket):

Note: The process name was deliberately changed

ABC.exe pid: 1148 NT AUTHORITY\SYSTEM
  144: File  (---)   \Device\Afd
  148: File  (---)   \Device\Afd
  220: File  (---)   \Device\Afd
  224: File  (---)   \Device\Afd
  22C: File  (---)   \Device\Afd
  230: File  (---)   \Device\Afd
  29C: File  (---)   \Device\Afd
  2B4: File  (---)   \Device\Afd
  2B8: File  (---)   \Device\Afd
  2BC: File  (---)   \Device\Afd
  2C0: File  (---)   \Device\Afd
  308: File  (---)   \Device\Afd
  320: File  (---)   \Device\Afd
  32C: File  (---)   \Device\Afd
  338: File  (---)   \Device\Afd
  340: File  (---)   \Device\Afd
  344: File  (---)   \Device\Afd
  350: File  (---)   \Device\Afd
  420: File  (---)   \Device\Afd
  440: File  (---)   \Device\Afd
  444: File  (---)   \Device\Afd
  47C: File  (---)   \Device\Afd
  480: File  (---)   \Device\Afd
  488: File  (---)   \Device\Afd
  48C: File  (---)   \Device\Afd
  498: File  (---)   \Device\Afd
  4E0: File  (---)   \Device\Afd
  500: File  (---)   \Device\Afd
  578: File  (---)   \Device\Afd
  5A0: File  (---)   \Device\Afd
  5A4: File  (---)   \Device\Afd
  5A8: File  (---)   \Device\Afd
  5AC: File  (---)   \Device\Afd
  5C8: File  (---)   \Device\Afd
  5F0: File  (---)   \Device\Afd
  630: File  (---)   \Device\Afd
  658: File  (---)   \Device\Afd
  65C: File  (---)   \Device\Afd
  66C: File  (---)   \Device\Afd
  694: File  (---)   \Device\Afd
  69C: File  (---)   \Device\Afd
  6C0: File  (---)   \Device\Afd
  6C4: File  (---)   \Device\Afd
  6D4: File  (---)   \Device\Afd
  6EC: File  (---)   \Device\Afd
  700: File  (---)   \Device\Afd
  708: File  (---)   \Device\Afd
  720: File  (---)   \Device\Afd
  728: File  (---)   \Device\Afd
  72C: File  (---)   \Device\Afd
  730: File  (---)   \Device\Afd
  734: File  (---)   \Device\Afd
  738: File  (---)   \Device\Afd
  740: File  (---)   \Device\Afd
  744: File  (---)   \Device\Afd
  748: File  (---)   \Device\Afd
  760: File  (---)   \Device\Afd
  764: File  (---)   \Device\Afd
  768: File  (---)   \Device\Afd
  770: File  (---)   \Device\Afd
  774: File  (---)   \Device\Afd
  780: File  (---)   \Device\Afd
  788: File  (---)   \Device\Afd
  790: File  (---)   \Device\Afd
  794: File  (---)   \Device\Afd
  79C: File  (---)   \Device\Afd
  7A0: File  (---)   \Device\Afd
  7A4: File  (---)   \Device\Afd
  7A8: File  (---)   \Device\Afd
  7AC: File  (---)   \Device\Afd
  7B4: File  (---)   \Device\Afd
  7BC: File  (---)   \Device\Afd
  7D4: File  (---)   \Device\Afd
  7D8: File  (---)   \Device\Afd
  7DC: File  (---)   \Device\Afd
  7E0: File  (---)   \Device\Afd
  7E8: File  (---)   \Device\Afd
  7F8: File  (---)   \Device\Afd
  810: File  (---)   \Device\Afd
  81C: File  (---)   \Device\Afd
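Why does a leak in one AV process break Netlogon, Group Policy and telnet alike? They all draw local ports from the same dynamic port range. The toy Python model below illustrates this, assuming the Windows default range of 49152-65535 (16,384 ports) and that every socket holds its own port until it is closed, which is a simplification of what TCPIP.sys actually does. `EphemeralPortPool` and `first_failing_tick` are hypothetical names.

```python
class EphemeralPortPool:
    """Toy dynamic port range: a port handed out by bind() is unavailable until close()."""

    def __init__(self, start=49152, num=16384):  # assumed Windows default range
        self.free = list(range(start, start + num))
        self.in_use = set()

    def bind(self):
        if not self.free:
            raise OSError("all the available addresses are in use")
        port = self.free.pop()
        self.in_use.add(port)
        return port

    def close(self, port):
        self.in_use.remove(port)
        self.free.append(port)


def first_failing_tick(pool, leaks_per_tick, max_ticks):
    """Each tick, a leaky process opens sockets it never closes, then a healthy
    client opens and closes one connection. Return the first tick at which the
    healthy client's bind fails, or None if it never fails."""
    for tick in range(1, max_ticks + 1):
        for _ in range(leaks_per_tick):
            try:
                pool.bind()  # leaked: this port is never returned to the pool
            except OSError:
                break
        try:
            port = pool.bind()  # e.g. Netlogon connecting to a DC
        except OSError:
            return tick
        pool.close(port)
    return None


if __name__ == "__main__":
    # 100 leaked sockets per tick drain 16,384 ports; the healthy bind fails at tick 164.
    print(first_failing_tick(EphemeralPortPool(), leaks_per_tick=100, max_ticks=1000))
```

The healthy client does nothing wrong, yet it is the one that fails, which matches the customer's symptoms appearing in unrelated services.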


RESOLUTION:

===========

So we advised the customer to update the 3rd party AV software. Apart from that, you can take the following actions to avoid or mitigate port exhaustion caused by socket leaks:

a) Please make sure that the Windows OS is running with the latest rollups/security updates.

b) Please make sure that all 3rd party software is up to date (including firewall, AV, backup or any other software that frequently establishes outbound connections).

c) Finally, you may consider extending the dynamic port range on busy servers that are expected to establish many outbound connections very frequently. The commands below configure close to the widest range possible (the start port cannot be lower than 1025 and the range cannot extend past port 65535), but you may prefer to extend the range in phases instead of maxing it out from the very beginning (from an elevated command prompt):

netsh int ipv4 set dynamicport tcp start=1025 num=64500

netsh int ipv4 set dynamicport udp start=1025 num=64500
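Before applying a new start/num pair, for example when widening the range in phases, it can help to check where the range ends. The Python sketch below encodes the constraints as I understand them from Microsoft's documentation on the dynamic port range: the start port must be at least 1025, the range must contain at least 255 ports, and the end port cannot exceed 65535. `dynamic_port_range` and `max_num` are hypothetical helpers, not part of netsh.

```python
MIN_START = 1025   # lowest start port netsh accepts
MIN_NUM = 255      # smallest range size netsh accepts
MAX_PORT = 65535   # highest TCP/UDP port number


def dynamic_port_range(start, num):
    """Return (first, last) port for 'netsh int ipv4 set dynamicport ... start=S num=N'."""
    if start < MIN_START:
        raise ValueError(f"start={start} is below {MIN_START}")
    if num < MIN_NUM:
        raise ValueError(f"num={num} is below {MIN_NUM}")
    last = start + num - 1
    if last > MAX_PORT:
        raise ValueError(f"range {start}-{last} goes past port {MAX_PORT}")
    return start, last


def max_num(start):
    """Largest num that still keeps the range within MAX_PORT for this start port."""
    return MAX_PORT - start + 1


if __name__ == "__main__":
    print(dynamic_port_range(1025, 64500))   # (1025, 65524)
    print(dynamic_port_range(49152, 16384))  # (49152, 65535), the Windows default
    print(max_num(1025))                     # 64511
```

On the server itself, `netsh int ipv4 show dynamicport tcp` displays the range currently in effect, which is worth checking before and after each change.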

You can also decrease the TcpTimedWaitDelay registry value on such servers (you may lower it to 30 seconds):

TcpTimedWaitDelay: https://technet.microsoft.com/en-us/library/cc757512(v=ws.10).aspx

The TcpTimedWaitDelay value determines the length of time that a connection stays in the TIME_WAIT state when being closed. While a connection is in the TIME_WAIT state, the socket pair cannot be reused. This is also known as the 2MSL state because the value should be twice the maximum segment lifetime on the network. To adjust TcpTimedWaitDelay, create or modify the registry value as listed below:


Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value: TcpTimedWaitDelay
Data Type: REG_DWORD
Range: 30-300 (decimal)
Default value: 0x78 (120 decimal)
Recommended value: 30
Value exists by default? No, needs to be added.

Note: This change requires a server reboot
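To get a feel for what the port range and TcpTimedWaitDelay buy you together, a rough ceiling is: available ports divided by TIME_WAIT seconds gives the new outbound connections per second that can be sustained to a single remote IP:port when this server closes each connection first (so its local port sits in TIME_WAIT). This is a back-of-the-envelope model under that assumption; it ignores ports held by long-lived or leaked sockets. `max_connection_rate` is a hypothetical helper.

```python
def max_connection_rate(num_ports, time_wait_seconds):
    """Rough ceiling on new connections/second to one remote endpoint if every
    closed connection keeps its local port in TIME_WAIT for time_wait_seconds."""
    if not 30 <= time_wait_seconds <= 300:
        raise ValueError("TcpTimedWaitDelay is documented with a range of 30-300 seconds")
    return num_ports / time_wait_seconds


if __name__ == "__main__":
    scenarios = [
        ("default range, 120 s delay", 16384, 120),
        ("default range, 30 s delay", 16384, 30),
        ("start=1025 num=64500, 30 s delay", 64500, 30),
    ]
    for label, ports, delay in scenarios:
        print(f"{label}: ~{max_connection_rate(ports, delay):.0f} connections/s")
```

Under these assumptions the three scenarios work out to roughly 137, 546 and 2,150 connections per second, which shows why lowering the delay and widening the range are complementary.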


Please note that the same techniques can be applied to virtually any Windows version from Windows 7/Windows Server 2008 R2 onwards.


Hope this helps

Thanks,

Murat
