Agnus Dei (jackal) wrote,

Debugging issues with pound load balancer threads versus TCP socket states

THE PROBLEM

We've recorded initial handshakes taking up to 20 seconds, seemingly at random, through our load balancer. This makes the user experience extremely slow, but it's very inconsistent and hard to replicate.


THE ANALYSIS

We use pound as our load balancer, and we have it set to 1000 threads.

[root@]# grep Threads /etc/pound.cfg
Threads 1000
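
For context, Threads is a global directive near the top of pound.cfg. A stripped-down config looks roughly like this (the listener and backend addresses below are placeholders, not our real ones):

# global section
Threads 1000

ListenHTTP
    Address 0.0.0.0
    Port    80
    Service
        BackEnd
            Address 192.168.1.10   # placeholder backend
            Port    8080
        End
    End
End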


When you ask the OS how many threads pound is actually running, it shows 3 more than we configured: 1003. Those 3 extra threads are most likely pound's own internal housekeeping threads (such as the main/listener thread) rather than client worker threads.

(NLWP = Number of Light Weight Processes.   Light Weight Process just means "a thread.")

[root@]# ps -upound -onlwp
NLWP
1
1003
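
(The process showing NLWP 1 is presumably pound's parent/supervisor; the one with 1003 threads is the worker.) Another way to cross-check the count is the worker's status file in /proc; <pound_pid> below is a placeholder for the worker's PID:

[root@]# grep ^Threads /proc/<pound_pid>/status
Threads:        1003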


Then we ask the OS for the states of pound's TCP connections:

[root@]# lsof -n -P -a -u pound -c pound -i |egrep -o '\(.*\)' | sort | uniq -c
545 (CLOSE_WAIT)
708 (ESTABLISHED)
2 (LISTEN)
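
ss can give the same breakdown if lsof isn't handy; this one-liner groups pound's sockets by state and should mirror the lsof counts above (it assumes the worker process is literally named "pound" in ss's -p output):

[root@]# ss -tanp | grep '"pound"' | awk '{print $1}' | sort | uniq -c
    545 CLOSE-WAIT
    708 ESTAB
      2 LISTEN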


See the problem?

We set the thread limit for pound to 1000. That would be fine given that we have 708 ESTABLISHED TCP connections to pound (assuming one thread per connection). BUT what if those 545 CLOSE_WAIT sockets are holding threads open as well? That would make the total roughly 1,253, well over our 1000-thread limit. And since we set the OS file descriptor limit to "unlimited", pound has effectively unlimited sockets, so it doesn't refuse the connection; it accepts the socket and the request then waits until a thread becomes available. This could explain why we are seeing such extremely slow initial connections: they are queued, waiting for a free thread.
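
A quick sanity check of this theory is to put the two numbers side by side: the ESTABLISHED + CLOSE_WAIT socket count versus the configured thread count. A rough sketch, reusing the same lsof command from above:

[root@]# echo "$(lsof -n -P -a -u pound -c pound -i | egrep -c 'ESTABLISHED|CLOSE_WAIT') sockets vs $(awk '/Threads/ {print $2}' /etc/pound.cfg) configured threads"
1253 sockets vs 1000 configured threads

If the first number sits above the second for any length of time, new connections have to queue for a worker thread.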

THE SOLUTION

I recommended we double the threads in pound.
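
The change itself is one directive plus a restart (pound picks up config changes on restart). A sketch; the service command assumes a SysV-style init script:

[root@]# sed -i 's/^Threads 1000$/Threads 2000/' /etc/pound.cfg
[root@]# grep Threads /etc/pound.cfg
Threads 2000
[root@]# service pound restart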
