The Dangers of Using tcp_tw_recycle in Linux – Strange Intermittent Timeout Issues

Wednesday, January 13th, 2021

Do Not Use tcp_tw_recycle

I had a very strange connectivity issue recently that I was only able to reproduce intermittently on my own LAN network when connecting to a few of my servers hosting websites that process and receive tons of simultaneous connections at any point in time.  Basically, my connection to a specific set of websites that I host would timeout from my home internet connection.  However, I was never able to reproduce this issue when connecting to the same sites from other networks belonging to my family and friends. 

From my home connection, I used TCPView and saw that SYN_SENT packets were supposedly sent to my servers to establish a connection.  Unfortunately, the server never replied to some of these requests.  As such, my connection would timeout at times, and work perfectly fine sometimes.  I looked at DD-WRT's connection table, and it also claimed that the packets had been sent, but that they were in an UNREPLIED state when I experienced issues.  Thus, packets were supposedly being sent, but the server was not responding at times.  After spending nearly a week trying to tackle this issue and buying new cable internet equipment (an officially supported Comcast modem), I tracked down the issue, and it ended up being a TCP configuration setting on my servers rather than my home LAN equipment.

Modem or Router's Fault?

Originally, I thought my issue was caused by the DD-WRT open source firmware I was running on my wireless router.  If I restored the router's settings to DD-WRT's factory defaults, I could always connect to the websites I was having intermittent connection timeout issues on.  I suspected it might be my router after trying an older router which didn't have any problems either.  I even upgraded the DD-WRT firmware to the latest version and rebuilt my complicated network configuration from scratch.  Unfortunately, the issue was still there.  Thus, despite mixed results with different routers, I started to wonder if the issue was on my server's end.

Finally Fixed

I started looking at sysctl TCP settings I could adjust on my router, and I ended up comparing some of these values to the ones used on my servers (that were hosting the problem websites).  Eventually, I came across configuration values I had changed myself several months ago which were supposed to help the server support multiple simultaneous connections.

After reading this StackOverflow thread (https://stackoverflow.com/questions/6426253/tcp-tw-reuse-vs-tcp-tw-recycle-which-to-use-or-both), I decided I would try disabling the tcp_tw_recycle setting.

/proc/sys/net/ipv4/tcp_tw_recycle was set to 1 (enabled) from tweaks I had run that I had found on the internet.  After I disabled it, /proc/sys/net/ipv4/tcp_tw_recycle was set to 0 (disabled).  By default, Linux keeps tcp_tw_recycle disabled.  Again, this is something I had changed for tuning reasons.  After disabling this setting and rebooting the server, I no longer have any issues connecting to the severs in question.  No more connection timeouts, and everything works properly again.

I have no idea why I wasn't able to reproduce this issue on other networks.  I thought it was my network equipment (modem and router), but it ended up being the server.

Lessons Learned

Be careful when applying settings you find online.  Sometimes, they may not work, or their usage may be buggy.  In fact, net.ipv4.tcp_tw_recycle has been removed from Linux in kernel versions newer than 4.12 by default.  I'm guessing this is because it doesn't work, as I experienced.  Do NOT use  net.ipv4.tcp_tw_recycle! I kept tcp_tw_reuse enabled, so you can enable this setting without running into problems.  Just don't for the love of anything use tcp_tw_recycle!  It doesn't work, and it will cause you headaches trying to track down strange intermittent issues!