Friday, September 29, 2023

TIME_WAIT: Mitigating TCP/IP Connection Exhaustion

TIME_WAIT is an essential part of the TCP/IP stack: after a connection is closed, the side that closed it first keeps the socket in the TIME_WAIT state for a while so that delayed or retransmitted packets from the old connection are handled correctly. However, when connections are opened and closed at a high rate, or when clients do not close connections properly or efficiently, sockets in the TIME_WAIT state accumulate and persist until the operating system purges them. For a busy system, this can amount to a denial of service on the host, because the available ports get tied up in the TIME_WAIT state and new connections cannot be established.
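To get a feel for how quickly this can bite, here is a rough back-of-the-envelope sketch. The helper name max_conn_rate is made up for illustration, and the port ranges and TIME_WAIT durations are assumed defaults that may differ on your system. It estimates how many new outbound connections per second a host can sustain to a single destination before every ephemeral port is sitting in TIME_WAIT.

```shell
# Rough upper bound on new outbound connections per second to one
# destination (IP and port) before all ephemeral ports are in TIME_WAIT.
# $1 = number of ephemeral ports, $2 = TIME_WAIT duration in seconds.
max_conn_rate() {
  echo $(( $1 / $2 ))
}

# Assumed Linux defaults: ports 32768-60999 (28232 ports), 60 s TIME_WAIT.
max_conn_rate 28232 60    # prints 470
# Assumed Windows defaults: ports 49152-65535 (16384 ports), 120 s TIME_WAIT.
max_conn_rate 16384 120   # prints 136
```

A few hundred connections per second is not much for a busy proxy or application server, which is why shortening TIME_WAIT or reusing those sockets matters.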

Having worked with many large-scale and high-performance systems through the years, I've seen this scenario play out many times. Fortunately, each operating system has its own way of optimizing for this scenario to minimize the impact.

To determine if this is a problem on your host, track the number of connections in the TIME_WAIT state. For example, on UNIX/Linux and macOS systems, you can count these connections with netstat:

$ netstat -an | grep -c TIME_WAIT
45564
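
A single count does not show how TIME_WAIT compares with the other connection states. As a small sketch, the helper below (count_tcp_states is a name I made up) tallies every TCP state from netstat -an output read on stdin. It assumes the state is the last column of each tcp line, which holds on Linux and macOS; Solaris formats its output differently.

```shell
# Tally TCP connection states from `netstat -an` output read on stdin.
# Assumes each TCP line starts with "tcp" and ends with the state name.
count_tcp_states() {
  awk '$1 ~ /^tcp/ { states[$NF]++ }
       END { for (s in states) print states[s], s }' | sort -rn
}
```

Run it as: netstat -an | count_tcp_states. The most common state is printed first, so a host in trouble will show TIME_WAIT at the top with a count far above ESTABLISHED.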

Here are TCP tunings per operating system that I have used to mitigate this issue:

RedHat/Oracle Linux 8: /etc/sysctl.conf
net.netfilter.nf_conntrack_tcp_timeout_time_wait=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=1

RedHat/Oracle Linux 7: /etc/sysctl.conf
net.netfilter.nf_conntrack_tcp_timeout_time_wait=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=1
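
Settings in /etc/sysctl.conf are read at boot; to load them right away, run sysctl -p as root. To confirm that the running kernel actually has the values from the file, something like the following sketch can help. The function name check_sysctl_file is hypothetical, the optional second argument exists only so the function can be pointed at a different directory than /proc/sys, and it handles single-value keys only.

```shell
# Compare the settings in a sysctl.conf-style file with the live values.
# Usage: check_sysctl_file /etc/sysctl.conf [proc_root]
# Handles single-value keys only (enough for the settings shown above).
check_sysctl_file() {
  conf=$1
  root=${2:-/proc/sys}
  # Skip blank lines and comments, then split each line on the first "=".
  grep -Ev '^[[:space:]]*(#|;|$)' "$conf" | while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    want=$(printf '%s' "$want" | tr -d '[:space:]')
    # net.ipv4.tcp_tw_reuse lives at <root>/net/ipv4/tcp_tw_reuse
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ -r "$path" ]; then
      have=$(cat "$path")
    else
      have="unavailable"
    fi
    if [ "$have" = "$want" ]; then status=ok; else status=DIFF; fi
    printf '%s want=%s have=%s %s\n' "$key" "$want" "$have" "$status"
  done
}
```

Run it as: check_sysctl_file /etc/sysctl.conf. A result of "unavailable" for the nf_conntrack key usually means the nf_conntrack module is not loaded, in which case that setting does not exist on the host yet.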

Solaris 10/11 (the value is in milliseconds, so 30000 is 30 seconds):
ndd -set /dev/tcp tcp_time_wait_interval 30000

Windows:
Reduce TcpTimedWaitDelay from the default of 2 minutes (120 seconds) down to 40 seconds. Registry dword values are written in hexadecimal, so 40 seconds is 0x28:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
-> TcpTimedWaitDelay: dword:00000028

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
-> StrictTimeWaitSeqCheck: dword:00000001

Notes for Windows:
 · Changing these values requires a reboot. Plan to do that outside production hours.
 · TcpTimedWaitDelay is 2 minutes by default, even if the value is not present in the registry.
 · You must set StrictTimeWaitSeqCheck to 0x1 or the TcpTimedWaitDelay value will have no effect.
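
Because the dword values are hexadecimal, it is easy to set a different delay than the one you intended. As a quick sanity check, the sketch below converts in both directions using the shell's printf; the function names are made up for illustration.

```shell
# Convert a registry dword written in hex (as in "dword:00000028") to decimal.
dword_to_decimal() {
  printf '%d\n' "0x$1"
}

# Convert a number of seconds to the 8-digit hex form used after "dword:".
seconds_to_dword() {
  printf '%08x\n' "$1"
}

dword_to_decimal 00000028   # prints 40
seconds_to_dword 30         # prints 0000001e
```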

References:
TcpTimedWaitDelay - https://technet.microsoft.com/en-us/library/cc938217.aspx
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc731521(v=ws.10)#BKMK_setdynamicportrange
https://support.microsoft.com/en-in/help/929851/the-default-dynamic-port-range-for-tcp-ip-has-changed-in-windows