KBI 310187 TCP/IP Connections Created Faster Than Expired

Version

All Versions

Date

5 Jun 2010

Summary

Monitoring may appear to stop, and the main engine may be unable to establish any TCP/IP connections to other servers.

Technical Background

This typically happens on an overloaded server that is monitoring too many devices by itself.

Each time a TCP/IP connection is made, a Windows socket is created.

TCP/IP connections can be remote, local, or both.

A socket works like this:

Connect, Open Socket, Perform Task, Close Socket.

When the socket is closed, Windows doesn’t remove the socket immediately. It puts the socket into a state called TIME_WAIT.

In the TIME_WAIT state, the socket is still using up resources, and by default, it is removed after 240 seconds (4 minutes).

The issue seems to be that sockets are created faster than they are released by the Operating System.

This will happen on an overloaded system with very frequent TCP connections.

When the number of sockets in use reaches the maximum (a few thousand sockets), no new sockets can be created.
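The arithmetic behind this can be sketched roughly as follows. This is an illustration only: the pool size of 4000 is an assumed stand-in for "a few thousand sockets", not a measured limit, and the function names are hypothetical. The 30-second figure is the reduced timeout discussed under Resolution.

```python
# Rough model: every closed connection holds its socket for the
# TIME_WAIT period, so sockets held = connection rate * TIME_WAIT.

ASSUMED_POOL_SIZE = 4000  # assumption: stands in for "a few thousand sockets"


def sockets_held(connections_per_second, time_wait_seconds):
    """Sockets sitting in TIME_WAIT at a steady connection rate."""
    return connections_per_second * time_wait_seconds


def max_sustainable_rate(pool_size, time_wait_seconds):
    """Highest steady rate (connections/second) before the pool runs out."""
    return pool_size / time_wait_seconds


if __name__ == "__main__":
    for delay in (240, 30):
        rate = max_sustainable_rate(ASSUMED_POOL_SIZE, delay)
        print(f"TIME_WAIT {delay:>3}s: about {rate:.1f} connections/second")
```

Under these assumptions, a server opening 20 connections per second with the 240-second default would hold 4800 sockets in TIME_WAIT, which already exceeds the assumed pool.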

To test if “sockets” are the root cause, run:

netstat

If there are thousands of connections with the TIME_WAIT state, this is likely the cause.
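Counting the states by eye is tedious when there are thousands of lines. The sketch below tallies TCP states from netstat output. It assumes the output of "netstat -an", where the state is the last column of each TCP line; the function names are hypothetical.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally the state column of every TCP line in netstat text output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines look like: Proto, Local Address, Foreign Address, State
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            counts[fields[-1]] += 1
    return counts


def run_netstat():
    """Return the text output of 'netstat -an' (netstat must be on the PATH)."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, count_tcp_states(run_netstat())["TIME_WAIT"] gives the number of sockets currently in TIME_WAIT on the machine where it is run.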

An alternative test, while the issue is occurring, is to telnet from the “bad” machine to any external machine.

For example:

telnet www.yahoo.com 80
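If telnet is not installed, the same outbound test can be approximated with a short script. This is a minimal sketch: the function name and the 5-second timeout are assumptions, and a False result only means the connection could not be made. It does not prove that socket exhaustion is the reason.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        # create_connection opens a new client socket, which is exactly
        # the step that fails when no more sockets can be created.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(can_connect("www.yahoo.com", 80))
```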

Remote registry and PerfMon are NOT good tests, as they do NOT use TCP connections (they use the Windows API), so no sockets are created.

To reiterate, connecting FROM the “bad” machine to a remote machine via TCP should theoretically always fail when the TIME_WAIT issue occurs.

However, connecting from a “good” machine into the “bad” machine may work, as the sockets for receiving are different from the sockets for initiating.

Resolution

Rebooting the machine or resetting the TCP/IP stack is only a temporary band-aid until the ports become exhausted again.

The root cause must be addressed, which in this case is an overloaded server initiating too many connections before the sockets expire.

Architectural changes, such as reducing the load on the Monitoring Engine, will help to resolve these issues. If this is not an option, Microsoft has highlighted registry entries that can greatly reduce the impact of TCP/IP port exhaustion:

To reduce the client TCP/IP socket connection timeout value (the TIME_WAIT delay) from the default value of 240 seconds (4 minutes):

1. Start Registry Editor.

2. Browse to, and then click the following key in the registry:

   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

3. On the Edit menu, click New, DWORD Value.

4. Name the new value TcpTimedWaitDelay and set it to a decimal value of 30 (the value is in seconds).

5. Reboot the machine for the change to take effect.
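Where the change has to be applied to several servers, the steps above can be scripted. The following is a sketch only, using Python's standard winreg module. It assumes a Windows machine and an account with administrative rights, the function name is hypothetical, and the 30 to 300 second range check reflects the range Microsoft documents for this value as far as we are aware. A reboot is still required afterwards.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "TcpTimedWaitDelay"


def set_timed_wait_delay(seconds=30):
    """Write TcpTimedWaitDelay as a DWORD under HKEY_LOCAL_MACHINE."""
    # Assumption: 30-300 seconds is the accepted range for this value.
    if not 30 <= seconds <= 300:
        raise ValueError("seconds should be between 30 and 300")
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # Windows-only module, so it is imported after the check

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, seconds)
```

Calling set_timed_wait_delay(30) corresponds to steps 1 to 4. Back up the registry before running anything like this on a production server.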

For Microsoft’s explanation of this issue, see Avoiding TCP/IP Exhaustion.