@tmm1

computer talk by a systems hacker and ruby-core developer

© 2014. All rights reserved.

Ruby 2.1: Profiling Ruby

Ruby 2.1 is shipping with rb_profile_frames(), a new C-api for fetching ruby backtraces. The api performs no allocations and adds minimal cpu overhead making it ideal for profiling, even in production environments.

I've implemented a sampling callstack profiler for 2.1 called stackprof with this API, using techniques popularized by gperftools and @brendangregg. It works remarkably well, and provides incredible insight into the execution of your code.

For example, I recently used StackProf::Middlware on one of our production github.com unicorn workers. The resulting profile is analyzed using bin/stackprof:

$ stackprof data/stackprof-cpu-4120-1384979644.dump --text --limit 4
==================================
  Mode: cpu(1000)
  Samples: 9145 (1.25% miss rate)
  GC: 448 (4.90%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       236   (2.6%)         231   (2.5%)     String#blank?
       546   (6.0%)         216   (2.4%)     ActiveRecord::ConnectionAdapters::Mysql2Adapter#select
       212   (2.3%)         199   (2.2%)     Mysql2::Client#query_with_timing
       190   (2.1%)         155   (1.7%)     ERB::Util#html_escape

Right away, we see that 2.6% of cpu time in the app is spent in String#blank?.

Let's zoom in for a closer look:

$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'String#blank?'
String#blank? (lib/active_support/core_ext/object/blank.rb:80)
  samples:   231 self (2.5%)  /    236 total (2.6%)
  callers:
     112  (   47.5%)  Object#present?
  code:
                                  |    80  |   def blank?
  187    (2.0%) /   187   (2.0%)  |    81  |     self !~ /[^[:space:]]/
                                  |    82  |   end

We see that half the calls into blank? are coming from Object#present?.

As expected, most of the time in the method is spent in the regex on line 81. I noticed the line could be optimized slightly to remove the double negative:

   def blank?
-    self !~ /[^[:space:]]/
+    self =~ /\A[[:space:]]*\z/
   end

That helped, but I was curious where these calls were coming from. Let's follow the stack up and look at the Object#present? callsite:

$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'Object#present?'
Object#present? (lib/active_support/core_ext/object/blank.rb:20)
  samples:    12 self (0.1%)  /    133 total (1.5%)
  callers:
      55  (   41.4%)  RepositoryControllerMethods#owner
      31  (   23.3%)  RepositoryControllerMethods#current_repository
  callees (121 total):
     112  (   92.6%)  String#blank?
       6  (    5.0%)  Object#blank?
       3  (    2.5%)  NilClass#blank?
  code:
                                  |    20  |   def present?
  133    (1.5%) /    12   (0.1%)  |    21  |     !blank?
                                  |    22  |   end

So Object#present? is almost always called on String objects (92.6%), with the majority of these calls coming from two helper methods in RepositoryControllerMethods.

The callers in RepositoryControllerMethods appeared quite simple, but after a few minutes of staring I discovered the fatal mistake causing repeated calls to present?:

    def owner
-    @owner ||= User.find_by_login(params[:user_id]) if params[:user_id].present?
+    @owner ||= (User.find_by_login(params[:user_id]) if params[:user_id].present?)
    end

This simple 2 byte change removed 40% of calls to String#blank? in our app. To further minimize the cost of String#blank? in our app, we switched to @samsaffron's pure C implementation in the fast_blank gem.

The end result of all these optimizations was a dramatic reduction in cpu time (i.e. without idle time) spent processing requests:

With 2.1 and stackprof, it's easier than ever to make your ruby code fast. Try it today!