We've recorded initial handshakes through our load balancer taking up to 20 seconds, seemingly at random. This makes the site extremely slow for users, but it's very inconsistent and hard to reproduce.
We use pound for our load balancer. We have pound set to 1000 threads.
[root@]# grep Threads /etc/pound.cfg
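When you ask the OS how many threads pound is actually running, it shows 3 additional threads, so the count comes out as 1003. The 3 extra threads are most likely pound's own main and housekeeping threads rather than request workers. (Standard in, standard out and standard error are file descriptors, not threads, so they would not appear in this count.)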
(NLWP = Number of Light Weight Processes. Light Weight Process just means "a thread.")
[root@]# ps -upound -onlwp
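To make the "how many threads beyond the limit" check repeatable, here is a minimal sketch. The function name extra_threads is hypothetical, and it assumes the limit is passed by hand and that every numeric line from ps belongs to pound (if several pound processes show up, their thread counts are summed).

```shell
# Hypothetical helper (not part of pound): reads `ps -o nlwp` output on stdin
# and prints how many threads exist beyond the configured limit given as $1.
extra_threads() {
    # Skip the "NLWP" header; add up the numeric lines (one per process).
    awk -v limit="$1" '$1 ~ /^[0-9]+$/ { total += $1 } END { print total - limit }'
}

# Live usage, assuming pound runs as user "pound" with Threads set to 1000:
#   ps -upound -onlwp | extra_threads 1000
```

Fed the 1003 figure from above and a limit of 1000, it prints 3.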
Then we ask the OS for the states of pound's TCP connections:
[root@]# lsof -n -P -a -u pound -c pound -i |egrep -o '\(.*\)' | sort | uniq -c
See the problem?
We set the thread limit for pound to 1000. That would be fine given that we have 708 ESTABLISHED TCP connections to pound (assuming 1 thread per connection). BUT what if the 545 CLOSE_WAIT sockets are holding threads open as well? That would bring the total to 1253, which is greater than our 1000 thread limit. (CLOSE_WAIT means the remote side has already closed the connection and the socket is waiting for pound itself to close it.) And since we set the OS to "unlimited" file descriptors, nothing caps the number of sockets, so new connections are not refused; the socket is opened and then waits until a thread is available. This could explain the extremely slow initial connections: they are waiting for a thread to become free.
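The arithmetic above can be sketched as a small check that reads the state counts and compares them to the thread limit. The function name check_thread_budget is hypothetical, and the check bakes in the unconfirmed assumption from this post that every ESTABLISHED and every CLOSE_WAIT socket holds one thread.

```shell
# Hypothetical helper: reads `uniq -c` style lines such as "708 (ESTABLISHED)"
# on stdin and reports whether ESTABLISHED + CLOSE_WAIT exceeds the limit in $1.
# Assumption (unverified): one pound thread per socket in either state.
check_thread_budget() {
    awk -v limit="$1" '
        /ESTABLISHED/ { est = $1 }   # count column printed by uniq -c
        /CLOSE_WAIT/  { cw  = $1 }
        END {
            total = est + cw
            verdict = (total > limit) ? "OVER" : "OK"
            printf "%d sockets vs %d threads: %s\n", total, limit, verdict
        }'
}

# Live usage, reusing the lsof pipeline from above:
#   lsof -n -P -a -u pound -c pound -i | egrep -o '\(.*\)' | sort | uniq -c \
#       | check_thread_budget 1000
```

With the counts we saw (708 ESTABLISHED, 545 CLOSE_WAIT) and a limit of 1000, it prints "1253 sockets vs 1000 threads: OVER".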
I recommended we double the threads in pound.