Gaining performance boost with Rails, Ruby 1.9, PostgreSQL and concurrency sauce
Recently I was trying to resolve performance issue in a back-end Rails app. The challenge was to reduce response time of one computational heavy back-end request. Target app is using PostgreSQL 9 for the database and running on Ruby 1.9 MRI. Logic for that particular request naturally breaks down into two parts. The first part is search, which is a combination of different filters applied together in order to reduce available dataset. There are filters that run SQL queries and filters making HTTP requests to other services (3rd party and our own). The second part is all about computing additional data based on search results. The fact that search filters can be made independent of each other was a clear hint for adding concurrency to the search step.
But first, I was trying to squeeze out as much performance as I can from single-threaded code itself. Ruby garbage collector is getting quite bad rap these days, but I didn’t want to go to the dark side right from the beginning and mess with reducing allocations or controlling GC cycles manually. My first targets were SQL queries.
Search filters operate on relatively big dataset, and their results can contain nothing more than just collections of database record ids. There is no need to pull all the stuff or instantiate real models because until all filters are applied, we don’t know whether given record would appear in result. So SQL based filters tend to run rather fast queries that return integer arrays of significant size.
It’s no surprise that ActiveRecord spends fair amount of time on object instantiation. ORM plumbing comes at price. There is well known way to avoid paying that cost by calling connection adapter’s select_values directly. It appears however, that with PostgreSQL we can do better. pg gem has column_values function written in C which returns array of string values for any given column of query result. Here are some benchmarks that compare find with select_values and my column_values wrapper:
Test sample contains 20000 records Benchmarking find Time: 34.165s 0.6833s per run Benchmarking select_values Time: 3.686s 0.0737s per run Benchmarking to_int_array Time: 2.426s 0.0485s per run
I’ve made an attempt to put some envelope around column_values for accessing it from ActiveRecord in more convenient way. The column_queries gem gives ability to quickly pull id columns and is compatible with scopes and other ActiveRecord native facilities.
Ok, that helped, but still response time wasn’t perfect. No matter how hard I tried to optimize filters that run on app’s own database, there were always 2-3 filters making HTTP calls to other components, and at least one of those going to 3rd party service. So by just summing up time spent on those external calls, it easily approached 800ms, which was still far from the goal. The next logical step was to run filters in parallel, but GIL and poor MRI support for concurrency is a well known story. I was rather skeptical at that point, but still wanted to try.
Though one obvious way to get real concurrency is to switch to JRuby, I sticked to MRI. I may try JRuby on that project some day, however at that moment I was still looking at other options that wouldn’t bring significant infrastructure change to the production environment.
After column_values discovery I was looking into pg gem sources for asynchronous interfaces to be used with threads or fibers. And there were such, but one thing looked even more interesting: async_exec method. It’s interesting because despite its name it is a synchronous call. It blocks and returns only after it gets response from database. However internally, it uses asynchronous libpq interface and uses Ruby rb_thread_select call to release global mutex and wait for data on connection socket. So this method behaves just like regular exec, but releases GIL while blocked on I/O. Moreover, there is already relevant ActiveRecord code in place to work with that, it can be enabled by allow_concurrency: true entry in database.yml:
development:
adapter: postgresql
encoding: unicode
database: sandbox
username: postgres
allow_concurrency: true
pool: 10I’ve tested that discovery with this simple test:
puts "Starting" threads = [] bm = Benchmark.realtime do 10.times do threads << Thread.new { ActiveRecord::Base.connection.select_values('SELECT pg_sleep(2)') } end threads.each(&:join) end puts "Time: #{"%0.3f" % bm}s" # with :allow_concurrency => true # Starting # Time: 2.372s # # with :allow_concurrency => false # Starting # Time: 20.337s
That meant I could actually get acceptable level of concurrency in Rails and Ruby 1.9 with pg gem and PostgreSQL database. The only thing left was ActiveResource client or HTTP requests in general.
Can threads with SQL queries run concurrently with threads that make ActiveResource calls? It appears that ActiveResource uses Net::HTTP and native I/O which releases GIL with rb_thread_blocking_region, just like rb_thread_select. So the thing works right out of the box:
# dummy_server.rb: # require 'rubygems' require 'sinatra' get '/' do sleep 2 "Hello world!" end # test.rb: # puts "Starting" threads = [] bm = Benchmark.realtime do 10.times do threads << Thread.new { ActiveRecord::Base.connection.select_values('SELECT pg_sleep(2)') } end 10.times do threads << Thread.new { Net::HTTP.get('localhost', '/', 4567) } end threads.each(&:join) end puts "Time: #{"%0.3f" % bm}s" # Starting # Time: 2.350s
Note that ActiveRecord connection pool needs to to be tuned carefully: the number of concurrent connections to the database shouldn’t be less than the number of filter threads that run in parallel. Also, each thread should play nicely with pool, so it makes sense to package code in separate Scheduler module:
module Scheduler def self.run_group_and_reduce(*workers, &block) group = ThreadGroup.new workers.each do |w| group.add(Thread.new do begin w.call ensure ActiveRecord::Base.clear_active_connections! end end) end group.list.each(&:join) yield end end
The above scheduler method gets an array of callable worker procs and regular do...end block. It first runs all the supplied procs in separate threads, waits for them to complete and then runs its own block. For spawned threads scheduler also takes care of releasing connections to pool. Such semantics of barrier synchronized blocks with dedicated “reduce” step allowed me to experiment with threads, fibers and single-threaded sequential execution. Here’s how scheduler looked like during those experiments:
class Scheduler def self.run_group_and_reduce_threaded(*workers, &block) group = ThreadGroup.new workers.each do |w| group.add(Thread.new do begin w.call ensure ActiveRecord::Base.clear_active_connections! end end) end group.list.each(&:join) yield end def self.run_group_and_reduce_sequential(*workers, &block) workers.each do |w| w.call end yield end def self.run_group_and_reduce_fibered(*workers, &block) counter = 0 fibers = workers.collect do |w| Fiber.new do begin w.call counter -= 1 EventMachine.stop if counter == 0 ensure ActiveRecord::Base.clear_active_connections! end end end counter = fibers.size EventMachine.run do fibers.each(&:resume) end yield end class << self # Select desired strategy: # alias_method :run_group_and_reduce, :run_group_and_reduce_sequential # alias_method :run_group_and_reduce, :run_group_and_reduce_fibered alias_method :run_group_and_reduce, :run_group_and_reduce_threaded end end
As you may see, I’ve also played with EventMachine and fibers inspired by findings of Ilya Grigorik. That even ended up with a bug report and one line fix to Ruby core. I’ve turned back to threads though, mainly because of the discomfort with the amount of monkey patching in PostgreSQL ActiveRecord connection adapter and ActiveResource. EM Reactor pattern is great at I/O handling but under circumstances that are different from mine. Starting EM event loop and shutting it down just for search phase in serving each request seemed a bit unnatural to me. So by putting all my filters into separate threads I was able to come closer to the desired response time which was about 200ms.
Use of threads in Rails is an intriguing endeavor. Rails is declared thread safe, there is plenty of relevant code that takes care of synchronization, but still the threading practice should be approached with care. While threads aren’t that common in Rails applications, some configuration may be suboptimal for multi-threaded code. Some code can get broken just because regression testing for thread safety is a hard thing to achieve. And some deadlocks and race conditions can only reveal themselves under load or when scheduling becomes more “fair” as MRI progresses. Despite all the dangers, I’ve came across just several minor tweaks and tricks along the way of getting my code into production.
One problem that I’ve stepped into is that BufferedLogger default configuration can result in skipping log messages from the production log. The tricky part is that it is only production mode which is affected:
# railties/lib/rails/application/bootstrap.rb: # # Initialize the logger early in the stack in case we need to log some deprecation. initializer :initialize_logger do Rails.logger ||= config.logger || begin path = config.paths.log.to_a.first logger = ActiveSupport::BufferedLogger.new(path) logger.level = ActiveSupport::BufferedLogger.const_get(config.log_level.to_s.upcase) logger.auto_flushing = false if Rails.env.production? logger rescue StandardError => e logger = ActiveSupport::BufferedLogger.new(STDERR) logger.level = ActiveSupport::BufferedLogger::WARN logger.warn( "Rails Error: Unable to access log file. Please ensure that #{path} exists and is chmod 0666. " + "The log level has been raised to WARN and the output directed to STDERR until the problem is fixed." ) logger end end
As you may see, production mode disables buffer auto flushing by default. If threads produce less than MAX_BUFFER_SIZE = 1000 log messages, which is very likely given most logging is off in production, all their legacy remains hidden and that can result in memory leak. The cure is rather simple:
config.after_initialize { Rails.logger.auto_flushing = true }
Another interesting side note that may be useful is that when you use config.threadsafe! it shuts off ActiveSupport dependency loader. That makes total sense, as it has been test proven that all the magic present in that module can’t cope with multi-threaded environment. So booting app by requiring files in proper order is a good thing and the only option for such applications. The “proper order” part is achieved by ActiveSupport::Dependency still, so Rails does something like this:
Application.eager_load! ActiveSupport::Dependencies.unhook!
That is the reason of problems in rake tasks: with config.threadsafe! you have to manually require all the necessary files in right order. Eager loading doesn’t work there because rails dependency loader is already unhooked. One possible way to boot up app in rake task is this:
task :some_task => :environment do ActiveSupport::Dependencies.hook! MyApp::Application.eager_load! ActiveSupport::Dependencies.unhook! # ... end
Hope that helps!

Nice! Thanks sharing your experience and findings, Serg!
Nice article.
Though one thing puzzles me. The allow_concurrency entry is apparently deprecated from Rails 2.2 and up, and by my testings using Rails 2.3.2, the entry has no effect on the performance using your first thread example above. How come it could have such an impact on your setup ?
regards
Lars