Gaining performance boost with Rails, Ruby 1.9, PostgreSQL and concurrency sauce

by serge on July 17th, 2011

Recently I was trying to resolve performance issue in a back-end Rails app. The challenge was to reduce response time of one computational heavy back-end request. Target app is using PostgreSQL 9 for the database and running on Ruby 1.9 MRI. Logic for that particular request naturally breaks down into two parts. The first part is search, which is a combination of different filters applied together in order to reduce available dataset. There are filters that run SQL queries and filters making HTTP requests to other services (3rd party and our own). The second part is all about computing additional data based on search results. The fact that search filters can be made independent of each other was a clear hint for adding concurrency to the search step.

But first, I was trying to squeeze out as much performance as I can from single-threaded code itself. Ruby garbage collector is getting quite bad rap these days, but I didn’t want to go to the dark side right from the beginning and mess with reducing allocations or controlling GC cycles manually. My first targets were SQL queries.

Search filters operate on relatively big dataset, and their results can contain nothing more than just collections of database record ids. There is no need to pull all the stuff or instantiate real models because until all filters are applied, we don’t know whether given record would appear in result. So SQL based filters tend to run rather fast queries that return integer arrays of significant size.

It’s no surprise that ActiveRecord spends fair amount of time on object instantiation. ORM plumbing comes at price. There is well known way to avoid paying that cost by calling connection adapter’s select_values directly. It appears however, that with PostgreSQL we can do better. pg gem has column_values function written in C which returns array of string values for any given column of query result. Here are some benchmarks that compare find with select_values and my column_values wrapper:

  Test sample contains 20000 records
  Benchmarking find
  Time: 34.165s 0.6833s per run
  Benchmarking select_values
  Time: 3.686s 0.0737s per run
  Benchmarking to_int_array
  Time: 2.426s 0.0485s per run

I’ve made an attempt to put some envelope around column_values for accessing it from ActiveRecord in more convenient way. The column_queries gem gives ability to quickly pull id columns and is compatible with scopes and other ActiveRecord native facilities.

Ok, that helped, but still response time wasn’t perfect. No matter how hard I tried to optimize filters that run on app’s own database, there were always 2-3 filters making HTTP calls to other components, and at least one of those going to 3rd party service. So by just summing up time spent on those external calls, it easily approached 800ms, which was still far from the goal. The next logical step was to run filters in parallel, but GIL and poor MRI support for concurrency is a well known story. I was rather skeptical at that point, but still wanted to try.

Though one obvious way to get real concurrency is to switch to JRuby, I sticked to MRI. I may try JRuby on that project some day, however at that moment I was still looking at other options that wouldn’t bring significant infrastructure change to the production environment.

After column_values discovery I was looking into pg gem sources for asynchronous interfaces to be used with threads or fibers. And there were such, but one thing looked even more interesting: async_exec method. It’s interesting because despite its name it is a synchronous call. It blocks and returns only after it gets response from database. However internally, it uses asynchronous libpq interface and uses Ruby rb_thread_select call to release global mutex and wait for data on connection socket. So this method behaves just like regular exec, but releases GIL while blocked on I/O. Moreover, there is already relevant ActiveRecord code in place to work with that, it can be enabled by allow_concurrency: true entry in database.yml:

  development:
    adapter: postgresql
    encoding: unicode
    database: sandbox
    username: postgres
    allow_concurrency: true
    pool: 10

I’ve tested that discovery with this simple test:

  puts "Starting"
 
  threads = []
 
  bm = Benchmark.realtime do
    10.times do
      threads << Thread.new { ActiveRecord::Base.connection.select_values('SELECT pg_sleep(2)') }
    end
 
    threads.each(&:join)
  end
 
  puts "Time: #{"%0.3f" % bm}s"
 
  # with :allow_concurrency => true
  # Starting
  # Time: 2.372s
  #
  # with :allow_concurrency => false
  # Starting
  # Time: 20.337s

That meant I could actually get acceptable level of concurrency in Rails and Ruby 1.9 with pg gem and PostgreSQL database. The only thing left was ActiveResource client or HTTP requests in general.

Can threads with SQL queries run concurrently with threads that make ActiveResource calls? It appears that ActiveResource uses Net::HTTP and native I/O which releases GIL with rb_thread_blocking_region, just like rb_thread_select. So the thing works right out of the box:

  # dummy_server.rb:
  #
  require 'rubygems'
  require 'sinatra'
 
  get '/' do
    sleep 2
    "Hello world!"
  end
 
 
  # test.rb:
  #
  puts "Starting"
 
  threads = []
 
  bm = Benchmark.realtime do
    10.times do
      threads << Thread.new { ActiveRecord::Base.connection.select_values('SELECT pg_sleep(2)') }
    end
 
    10.times do
      threads << Thread.new { Net::HTTP.get('localhost', '/', 4567) }
    end
 
    threads.each(&:join)
  end
 
  puts "Time: #{"%0.3f" % bm}s"
 
  # Starting
  # Time: 2.350s

Note that ActiveRecord connection pool needs to to be tuned carefully: the number of concurrent connections to the database shouldn’t be less than the number of filter threads that run in parallel. Also, each thread should play nicely with pool, so it makes sense to package code in separate Scheduler module:

  module Scheduler
 
    def self.run_group_and_reduce(*workers, &block)
      group = ThreadGroup.new
      workers.each do |w|
        group.add(Thread.new do
          begin
            w.call
          ensure
            ActiveRecord::Base.clear_active_connections!
          end
        end)
      end
      group.list.each(&:join)
      yield
    end
 
  end

The above scheduler method gets an array of callable worker procs and regular do...end block. It first runs all the supplied procs in separate threads, waits for them to complete and then runs its own block. For spawned threads scheduler also takes care of releasing connections to pool. Such semantics of barrier synchronized blocks with dedicated “reduce” step allowed me to experiment with threads, fibers and single-threaded sequential execution. Here’s how scheduler looked like during those experiments:

  class Scheduler
 
    def self.run_group_and_reduce_threaded(*workers, &block)
      group = ThreadGroup.new
      workers.each do |w|
        group.add(Thread.new do
          begin
            w.call
          ensure
            ActiveRecord::Base.clear_active_connections!
          end
        end)
      end
      group.list.each(&:join)
      yield
    end
 
    def self.run_group_and_reduce_sequential(*workers, &block)
      workers.each do |w|
        w.call
      end
      yield
    end
 
    def self.run_group_and_reduce_fibered(*workers, &block)
      counter = 0
      fibers = workers.collect do |w|
        Fiber.new do
          begin
            w.call
            counter -= 1
            EventMachine.stop if counter == 0
          ensure
            ActiveRecord::Base.clear_active_connections!
          end
        end
      end
 
      counter = fibers.size
      EventMachine.run do
        fibers.each(&:resume)
      end
 
      yield
    end
 
    class << self
      # Select desired strategy:
 
      # alias_method :run_group_and_reduce, :run_group_and_reduce_sequential
      # alias_method :run_group_and_reduce, :run_group_and_reduce_fibered
      alias_method :run_group_and_reduce, :run_group_and_reduce_threaded
    end
 
  end

As you may see, I’ve also played with EventMachine and fibers inspired by findings of Ilya Grigorik. That even ended up with a bug report and one line fix to Ruby core. I’ve turned back to threads though, mainly because of the discomfort with the amount of monkey patching in PostgreSQL ActiveRecord connection adapter and ActiveResource. EM Reactor pattern is great at I/O handling but under circumstances that are different from mine. Starting EM event loop and shutting it down just for search phase in serving each request seemed a bit unnatural to me. So by putting all my filters into separate threads I was able to come closer to the desired response time which was about 200ms.

Use of threads in Rails is an intriguing endeavor. Rails is declared thread safe, there is plenty of relevant code that takes care of synchronization, but still the threading practice should be approached with care. While threads aren’t that common in Rails applications, some configuration may be suboptimal for multi-threaded code. Some code can get broken just because regression testing for thread safety is a hard thing to achieve. And some deadlocks and race conditions can only reveal themselves under load or when scheduling becomes more “fair” as MRI progresses. Despite all the dangers, I’ve came across just several minor tweaks and tricks along the way of getting my code into production.

One problem that I’ve stepped into is that BufferedLogger default configuration can result in skipping log messages from the production log. The tricky part is that it is only production mode which is affected:

  # railties/lib/rails/application/bootstrap.rb:
  #
  # Initialize the logger early in the stack in case we need to log some deprecation.
  initializer :initialize_logger do
    Rails.logger ||= config.logger || begin
      path = config.paths.log.to_a.first
      logger = ActiveSupport::BufferedLogger.new(path)
      logger.level = ActiveSupport::BufferedLogger.const_get(config.log_level.to_s.upcase)
      logger.auto_flushing = false if Rails.env.production?
      logger
    rescue StandardError => e
      logger = ActiveSupport::BufferedLogger.new(STDERR)
      logger.level = ActiveSupport::BufferedLogger::WARN
      logger.warn(
        "Rails Error: Unable to access log file. Please ensure that #{path} exists and is chmod 0666. " +
        "The log level has been raised to WARN and the output directed to STDERR until the problem is fixed."
      )
      logger
    end
  end

As you may see, production mode disables buffer auto flushing by default. If threads produce less than MAX_BUFFER_SIZE = 1000 log messages, which is very likely given most logging is off in production, all their legacy remains hidden and that can result in memory leak. The cure is rather simple:

  config.after_initialize { Rails.logger.auto_flushing = true }

Another interesting side note that may be useful is that when you use config.threadsafe! it shuts off ActiveSupport dependency loader. That makes total sense, as it has been test proven that all the magic present in that module can’t cope with multi-threaded environment. So booting app by requiring files in proper order is a good thing and the only option for such applications. The “proper order” part is achieved by ActiveSupport::Dependency still, so Rails does something like this:

  Application.eager_load!
  ActiveSupport::Dependencies.unhook!

That is the reason of problems in rake tasks: with config.threadsafe! you have to manually require all the necessary files in right order. Eager loading doesn’t work there because rails dependency loader is already unhooked. One possible way to boot up app in rake task is this:

  task :some_task => :environment do
    ActiveSupport::Dependencies.hook!
    MyApp::Application.eager_load!
    ActiveSupport::Dependencies.unhook!
 
    # ...
  end

Hope that helps!

2 Comments
  1. Alex permalink

    Nice! Thanks sharing your experience and findings, Serg!

  2. Lars permalink

    Nice article.
    Though one thing puzzles me. The allow_concurrency entry is apparently deprecated from Rails 2.2 and up, and by my testings using Rails 2.3.2, the entry has no effect on the performance using your first thread example above. How come it could have such an impact on your setup ?

    regards
    Lars