One Request Per Mongrel, Dammit

Much has been written about the fact that Ruby on Rails is single-threaded, and how much that sucks. As a side effect, mongrel (the most popular Ruby webserver, AFAIK) is effectively single-threaded if you've got Rails code on the back end.

For those of you who haven't dealt with this yet, a single-threaded web server Is Really Bad(tm).

The issue is that the time it takes to answer a request depends not only on how long that request takes to process, but also on how long every request queued ahead of it takes to process. So one slow request can mess things up for an arbitrary number of other requests.
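To put numbers on it, here's a toy model in Python (nothing Rails or mongrel actually runs). It assumes every request arrives at the same moment and a single-threaded server handles them strictly first-in, first-out; the times are made-up milliseconds.

```python
def response_times(service_times):
    """Completion time of each request as seen by its client, assuming all
    requests arrive at t=0 and one worker serves them first-in, first-out."""
    finished = 0
    result = []
    for s in service_times:
        finished += s  # this request can't finish until everything ahead of it has
        result.append(finished)
    return result


# One slow 10-second request at the front, four 100 ms requests behind it.
print(response_times([10000, 100, 100, 100, 100]))
```

Every one of the 100 ms requests takes more than 10 seconds from its client's point of view.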

The general "solution" to this is to launch a bunch of mongrels (say, 10, because they're memory hogs) and load balance across them.

This doesn't actually work with most load balancing solutions, because basically nothing else on the planet that gets load balanced is single-threaded, so most LB solutions will happily pile up connections against each and every one of your mongrels. Wheee.
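Here's a toy model of why that hurts, in Python, with invented request times (milliseconds) and everything arriving at once. round_robin hands requests out to backends in turn up front, so each mongrel builds up its own queue; shared_queue keeps one central queue and gives each request to whichever backend frees up first, one request per backend at a time. Both are hypothetical illustrations, not code from any real load balancer.

```python
import heapq


def round_robin(service_times, n_backends):
    """Assign request i to backend i % n_backends immediately (connections
    pile up on every backend); each backend works through its own queue.
    Returns each request's completion time."""
    busy_until = [0] * n_backends
    done = []
    for i, s in enumerate(service_times):
        b = i % n_backends
        busy_until[b] += s
        done.append(busy_until[b])
    return done


def shared_queue(service_times, n_backends):
    """Hold requests in one FIFO queue and give each to the backend that
    becomes free first, never more than one request per backend at a time.
    Returns each request's completion time."""
    free_at = [0] * n_backends  # min-heap: when each backend is next idle
    heapq.heapify(free_at)
    done = []
    for s in service_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + s)
        done.append(start + s)
    return done


requests = [10000, 100, 100, 100, 100, 100]
print(round_robin(requests, 2))
print(shared_queue(requests, 2))
```

With one 10-second request and five 100 ms requests across two backends, round-robin strands two of the fast requests behind the slow one (finishing at 10100 and 10200 ms), while the shared queue has all the fast ones done by 500 ms.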

There's a fair bit of discussion on the web from people dealing with this problem, including not a few proposed solutions, but very little of it explains in a cogent way which solutions don't work, and why; hence this page.

Requirements

For something to sit in front of mongrel to make Rails suck less (I know that's a weird way to phrase it, but the single-threading is Rails' fault, not mongrel's), I have basically three requirements:

  • Must serve static assets, because holy shit is that a waste of Ruby's precious memory hogging... Erm... I mean time.
  • Must deliver decent timing data, because Rails straight-up lies about how long it takes to serve things, which makes it hard to find the pages that need attention.
  • Must only make one connection to any given mongrel at any given time. This is the hard part, because it's a stupid requirement brought on by stupid software written by, in the words of former co-worker Jan Kujawa, retarded twelve year olds.

The Solution

I know how impatient I am when reading this sort of thing, so I'll save you some reading and just tell you what worked. We have nginx out front, but all it really does is serve static assets, so really any webserver would work. It reverse proxies to haproxy, which load balances across our mongrels, one connection each. The timing data that haproxy delivers is wonderful, and it does the one-connection thing properly.

This is all running on each server; a NetScaler load balances at the server level.

For detailed configs, scroll down.

Things That Didn't Work

The NetScaler

I wasn't much involved in this process, as the NetScaler is handled by our networking team, of which I am not a part, but I do know that we (they) never managed to get the one-request-per-mongrel thing working.

At one point we put in settings that should have done that, but the result was visibly worse performance at the user level, which means something was horribly wrong.

What, we have no idea; since this approach doesn't give us timing data or serve static assets anyway, it wasn't worth digging into. It just seemed worth mentioning that we couldn't even get it to do the one thing a for-money LB should be able to do.

Apache mod_proxy

Seriously. No workie. I'm as surprised as you, if not more so. Neither I nor my very technically competent boss could make mod_proxy and/or mod_proxy_balancer do the one-connection-per-mongrel thing. I only spent a day on it, with Apache 2.2 (I stopped when I found out my boss had also tried and failed), but the documentation is very clear on how this should work, and it just didn't work that way. In particular:

  • It wouldn't pull mongrels out of the pool in a timely fashion (i.e. it completely ignored the times I gave it)
  • It wouldn't leave them out of the pool for a reasonable length of time (i.e. it completely ignored the times I gave it)

I didn't actually get as far as testing the one-connection thing thoroughly, because I happened to have a machine lying around where the mongrels could be made to hang easily, so I tested that first. Faced with mongrels on ports 8000, 8001, and 8002, only the last of which was functioning (the others would accept socket connections and then hang forever), and with me telling it to pull bad servers after 5 seconds, Apache mod_proxy_balancer did this:

  1. Deliver the first request to port 8000
  2. Hang on all subsequent requests
  3. Time out something like 3 minutes later
  4. Send the next request to port 8000

I shit you not. I have no idea what's wrong with this thing, and I decided I didn't have time to find out.
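For what it's worth, here's roughly what a health check has to do to catch those hung mongrels, sketched in Python. http_alive is a hypothetical helper, not anything Apache or haproxy ships. A bare TCP connect succeeds against a mongrel that accepts and then hangs, so the check has to send a real HTTP request and insist on an answer within the timeout. The request line is the "OPTIONS / HTTP/1.0" that haproxy's option httpchk sends by default, and the 5-second default matches what I was asking Apache for.

```python
import socket


def http_alive(host, port, timeout=5.0):
    """Return True only if the backend replies to an HTTP request with
    something that starts like a status line ("HTTP/") in time. A hung
    mongrel still completes the TCP handshake, so connecting proves nothing.
    Each blocking call gets its own timeout, so the worst case can run a
    little longer than `timeout` in total."""
    try:
        # create_connection applies `timeout` to the connect and to the
        # send/recv calls made on the returned socket.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"OPTIONS / HTTP/1.0\r\n\r\n")
            buf = b""
            while len(buf) < 5:
                chunk = sock.recv(5 - len(buf))
                if not chunk:  # backend closed the connection without answering
                    break
                buf += chunk
            return buf == b"HTTP/"
    except OSError:  # refused, reset, or timed out (socket.timeout is an OSError)
        return False
```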

nginx, no haproxy

The failure here is simple: it piles up requests in a per-backend-server queue. There's actually discussion in the nginx FAQ about this:

Many people have also requested that Nginx implement a feature in the load balancer to limit the number of requests per backend (usually to one). While support for this is planned, it should be pointed out that the desire for this feature is rooted in a misfeature of the application being proxied to (mostly Rails it seems) rather than a shortcoming in Nginx. Ideally this request should be refactored into a request that the backend be improved to better handle simultaneous requests.

Other than that, it was fine, which is why we're using it in front of haproxy.

The Solution, Reprise

Here are our actual configs for your perusal.

Our Configs

Note that in both cases daemon mode is turned off, because we use DaemonTools to keep things running, and it needs things to not daemonize.

nginx

This is for nginx 0.5.35, which is kind of old as of this writing (Wed Jul 2 13:46:38 PDT 2008), but it's what I found a .deb for, so there you are.

daemon off;

user  www-data;
worker_processes  2;

error_log  /var/www/logs/nginx_error.log notice;

pid        /var/www/logs/nginx.pid;

events {
    worker_connections  1024;
    use epoll;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    # I like big logs and I can not lie.
    log_format timing '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
                      'upstream_addr $upstream_addr '
                      'upstream_response_time $upstream_response_time '
                      'request_time $request_time';

    access_log /var/www/logs/nginx_access.log  timing;

    # no sendfile on OS X; set this to off there
    sendfile on;

    #These are good default values.
    tcp_nopush        on;
    tcp_nodelay       off;
    # output compression saves bandwidth
    gzip            on;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_proxied any;
    # If you get an error around here, try putting it all on one
    # line; I broke it up so the web page wouldn't be so wide.
    gzip_types      text/plain text/html
                    text/css application/x-javascript
                    text/xml application/xml
                    application/xml+rss text/javascript;
    ## This doesn't work for our version; it's supposed to fix an
    ## MSIE6 bug WRT gzip.
    ## gzip_disable "MSIE [1-6]\.(?!.*SV1)";


    upstream backend_haproxy {
        server   localhost:8070;
    }

    root /var/www/public/;


    server {
        listen       8080;

        server_name www.some.com;
        port_in_redirect off;

        # Set the max size for file uploads to 50Mb
        client_max_body_size 50M;

        # needed to forward user's IP address to rails
        proxy_set_header  X-Real-IP  $remote_addr;
        proxy_set_header Host www.some.com;
        proxy_read_timeout 120;
        proxy_connect_timeout 5;
        proxy_buffering off;

        location / {
            if (!-f $request_filename) {
              proxy_pass  http://backend_haproxy;
              break;
            }
        }
    }
}
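Since timing data is half the point, here's a sketch of pulling the slowest proxied requests out of that access log in Python. slowest is a hypothetical helper, not part of nginx. It assumes lines in exactly the timing format above, and that each request hit a single upstream; nginx writes a comma-separated list when it tries several, and this regex simply skips those lines.

```python
import re

# Matches lines written by the "timing" log_format above; adjust the
# pattern if you change that format.
TIMING_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) .*'
    r'upstream_addr (?P<upstream_addr>\S+) '
    r'upstream_response_time (?P<upstream>\S+) '
    r'request_time (?P<total>\S+)$'
)


def _seconds(value):
    """nginx logs '-' when a value doesn't apply (e.g. static files)."""
    return None if value == "-" else float(value)


def slowest(lines, n=10):
    """Return up to n (upstream_seconds, request_line) pairs, slowest first,
    for requests that were actually proxied to the backend."""
    rows = []
    for line in lines:
        m = TIMING_RE.search(line)
        if not m:
            continue  # not in the expected format
        upstream = _seconds(m.group("upstream"))
        if upstream is None:
            continue  # served straight from disk by nginx
        rows.append((upstream, m.group("request")))
    rows.sort(reverse=True)
    return rows[:n]
```

Feed it an open file, e.g. slowest(open("/var/www/logs/nginx_access.log")). Comparing upstream_response_time with request_time for the same line shows how much of the total was spent behind nginx.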

haproxy

This is for haproxy version 1.2.18; it may not work with 1.3. As of this writing (Wed Jul 2 13:46:38 PDT 2008) the 1.3 docs are somewhat behind, so we didn't try it.

The one problem we've had with haproxy is that when all the mongrels are in use, it doesn't wait as long as I'd like for one to be free before throwing a CQ or cQ error (see the docs). These errors imply that it's actually the client that gave up, but I don't really buy it. Call it a hunch. Although now that I think about it, it could be the NetScaler timing out. Ideally, I'd like it to keep things in the queue forever until something is ready to deal with it. In practice, just adding enough mongrels to take the load cleared that issue right up.
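A rough way to see why adding mongrels made those errors go away: the Python simulation below (hypothetical, with invented traffic, times in milliseconds) feeds one central FIFO queue into n single-request backends and drops any request that would wait longer than max_wait. That's only a loose stand-in for whatever haproxy is actually doing when it throws CQ/cQ.

```python
import heapq


def dropped_requests(arrivals, service, n_workers, max_wait):
    """Simulate one FIFO queue feeding n_workers backends that each take one
    request at a time. A request that would wait more than max_wait for a
    free backend gives up and is counted as dropped; it uses no backend time."""
    free_at = [0] * n_workers  # min-heap: when each backend is next idle
    heapq.heapify(free_at)
    dropped = 0
    for t, s in sorted(zip(arrivals, service)):
        start = max(t, free_at[0])  # earliest moment any backend is free
        if start - t > max_wait:
            dropped += 1
            continue
        heapq.heapreplace(free_at, start + s)
    return dropped


arrivals = list(range(0, 1000, 100))  # ten requests, 100 ms apart
service = [500] * 10                  # each takes 500 ms
for n in (1, 5):
    print(n, dropped_requests(arrivals, service, n, max_wait=1000))
```

With a one-second wait limit, one backend drops six of the ten requests; five backends keep up with the arrival rate and drop none.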

To make the logging work (oh, haproxy's sweet, sweet logging) I had to set up syslog-ng to listen on local UDP and redirect local0 to a file, but that was no biggie.

global
  #daemon
  maxconn       512       # total max connections (dependent on ulimit)
  nbproc        1         # number of haproxy processes to run
  log 127.0.0.1 local0


defaults
  mode              http

  clitimeout    120000  # maximum inactivity time on the client side
  srvtimeout    120000  # maximum inactivity time on the server side

  # contimeout also controls how long a request will wait in the queue if all
  # servers are busy, which is all-by-itself an unfortunate situation, but one
  # we should deal with as gracefully as possible
  contimeout    120000  # maximum time to wait for a connection
                        # attempt to a server to succeed

  option            httpclose     # disable keepalive (HAProxy does
                                  # not yet support the HTTP keep-alive mode)
  option            abortonclose  # enable early dropping of aborted
                                  # requests from pending queue
  option            httpchk       # use HTTP to check on servers' health
  option            forwardfor    # insert X-Forwarded-For headers
  option            httplog


  balance roundrobin            # each server is used in turns,
                                # according to assigned weight

listen rails_proxy *:8070
  log global
  maxconn 512

  # - equal weights on all servers
  # - maxconn will queue requests at HAProxy if limit is reached
  # - minconn dynamically scales the connection concurrency (bound
  #   by maxconn) depending on size of HAProxy queue
  # - check health every N milliseconds

  server port8000 127.0.0.1:8000 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8001 127.0.0.1:8001 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8002 127.0.0.1:8002 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8003 127.0.0.1:8003 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8004 127.0.0.1:8004 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8005 127.0.0.1:8005 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8006 127.0.0.1:8006 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8007 127.0.0.1:8007 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8008 127.0.0.1:8008 weight 1 minconn 1 maxconn 1 check inter 30000
  server port8009 127.0.0.1:8009 weight 1 minconn 1 maxconn 1 check inter 30000
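Keeping ten nearly identical server lines in sync by hand gets old. A throwaway Python generator (hypothetical, not part of haproxy) can print them; it assumes mongrels on consecutive ports starting at 8000, as above.

```python
def haproxy_server_lines(first_port=8000, count=10, inter_ms=30000):
    """Return one haproxy 'server' line per mongrel, each limited to a single
    connection (maxconn 1) and health-checked every inter_ms milliseconds."""
    lines = []
    for port in range(first_port, first_port + count):
        lines.append(
            f"  server port{port} 127.0.0.1:{port} weight 1 minconn 1 "
            f"maxconn 1 check inter {inter_ms}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(haproxy_server_lines())
```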

Created by rlpowell. Last Modification: Friday 21 of November, 2008 11:32:40 PST by rlpowell.