<p>@tmm1 · Aman Gupta (aman@tmm1.net) · http://tmm1.net · 2022-11-16</p>
<h1>Ruby 2.1: Out-of-Band GC</h1>
<p>2014-01-29 · http://tmm1.net/ruby21-oobgc</p>
<p>Ruby 2.1’s <a href="http://tmm1.net/ruby21-rgengc/">GC is better than ever</a>, but ruby still uses a stop-the-world GC implementation. This means collections triggered during request processing will add latency to your response time. One way to mitigate this is by running GC in-between requests, i.e. “Out-of-Band”.</p>
<p>OOBGC is a popular technique, first introduced by <a href="http://unicorn.bogomips.org/Unicorn/OobGC.html">Unicorn</a> and later integrated into <a href="http://blog.phusion.nl/2013/01/22/phusion-passenger-4-technology-preview-out-of-band-work/">Passenger</a>. Traditionally, these out-of-band collectors force a GC every N requests. While this works well, it requires careful tuning and can add CPU pressure if unnecessary collections occur too often.</p>
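<p>As a rough illustration of the traditional “every N requests” model, here is a minimal sketch. The <code>EveryNRequestsGC</code> class and its <code>interval</code> parameter are hypothetical names made up for this example; this is not how Unicorn or Passenger implement it, and a real server would force the collection only after the response body has been flushed.</p>

```ruby
# Hypothetical Rack-style middleware: forces a GC after every Nth request.
class EveryNRequestsGC
  def initialize(app, interval = 5)
    @app = app
    @interval = interval
    @requests = 0
  end

  def call(env)
    response = @app.call(env)
    @requests += 1
    if @requests >= @interval
      @requests = 0
      GC.start # simplification: a real server would wait until the body is flushed
    end
    response
  end
end

# Usage with a stand-in Rack app
app    = ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
stack  = EveryNRequestsGC.new(app, 3)
before = GC.count
6.times { stack.call({}) }
forced = GC.count - before # collections that ran during the six requests
```

<p>The tuning problem is visible even here: the right <code>interval</code> depends entirely on how much each request allocates.</p>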
<p>In <a href="https://blog.twitter.com/2011/faster-ruby-kiji-update">kiji (twitter’s REE fork)</a>, @evanweaver introduced <a href="https://github.com/twitter-forks/rubyenterpriseedition187-248/commit/951ca6a73e#commitcomment-476298"><code class="language-plaintext highlighter-rouge">GC.preemptive_start</code></a> as an alternative to the <tt>“every N requests”</tt> model. This new method could make more intelligent decisions about OOBGC based on the size of the heap and the number of free slots. We’ve long used a similar trick in our 1.9.3 fork to optimize OOBGC on github.com.</p>
<p>When we upgraded to a <a href="https://gist.github.com/tmm1/8393897">patched 2.1.0 in production</a> earlier this month, I translated these techniques into a <a href="https://github.com/tmm1/gctools">new OOBGC for RGenGC</a>. Powered by 2.1’s new tracepoint GC hooks, it understands both <tt>lazy vs immediate sweeping</tt> and <tt>major vs minor GC</tt> in order to make the best decision about when a collection is required.</p>
<p>Using the new OOBGC is simple:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'gctools/oobgc'</span>
<span class="no">GC</span><span class="o">::</span><span class="no">OOB</span><span class="p">.</span><span class="nf">run</span><span class="p">()</span> <span class="c1"># run this after every request body is flushed</span>
</code></pre></div></div>
<p>or if you’re using Unicorn:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in config.ru</span>
<span class="nb">require</span> <span class="s1">'gctools/oobgc'</span>
<span class="k">if</span> <span class="k">defined?</span><span class="p">(</span><span class="no">Unicorn</span><span class="o">::</span><span class="no">HttpRequest</span><span class="p">)</span>
<span class="n">use</span> <span class="no">GC</span><span class="o">::</span><span class="no">OOB</span><span class="o">::</span><span class="no">UnicornMiddleware</span>
<span class="k">end</span>
</code></pre></div></div>
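<p>The post does not show how gctools makes its decision, so the following is only a sketch of the general idea of collecting when the heap is nearly full rather than on a fixed schedule. The <code>SimpleOOB</code> module and its <code>min_free_ratio</code> threshold are hypothetical, and the <code>GC.stat</code> key names are looked up under two spellings because they differ between Ruby versions.</p>

```ruby
# Hypothetical helper, NOT the gctools implementation.
module SimpleOOB
  # Return the first GC.stat value found under any of the given key names.
  def self.stat(*names)
    stats = GC.stat
    names.each { |n| return stats[n] if stats.key?(n) }
    nil
  end

  # Run a GC only when the share of free slots drops below min_free_ratio.
  # Returns true when a collection was run.
  def self.run(min_free_ratio = 0.2)
    free = stat(:heap_free_slots, :heap_free_slot)
    live = stat(:heap_live_slots, :heap_live_slot)
    return false if free.nil? || live.nil?
    return false if free.to_f / (free + live) >= min_free_ratio
    GC.start
    true
  end
end

ran = SimpleOOB.run # call this between requests
```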
<h3 id="oobgc-results">OOBGC results</h3>
<p><img src="http://cl.ly/image/3q1L2h3w1A1s/graph.png" alt="" /></p>
<p>With ruby 2.1, our average OOBGC pause time (<font color="blue"><code class="language-plaintext highlighter-rouge">oobgc.mean</code></font>) went from 125ms to 50ms thanks to <a href="http://tmm1.net/ruby21-rgengc/">RGenGC</a>. The number of out-of-band collections (<font color="firebrick"><code class="language-plaintext highlighter-rouge">oobgc.count</code></font>) also went down, since the new OOBGC only runs when necessary.</p>
<p><img src="http://cl.ly/image/3c0N1I0p0n2W/graph.png" alt="" /></p>
<p>The overall result is much less CPU time (<font color="purple"><code class="language-plaintext highlighter-rouge">oobgc.sum</code></font>) spent doing GC work between requests.</p>
<h3 id="gc-during-requests">GC during requests</h3>
<p><img src="http://cl.ly/image/0G2P2l0N1z0J/graph.png" alt="" /></p>
<p>After our 2.1 upgrade, we’re performing GC during requests 2-3x more often than before (<font color="firebrick"><code class="language-plaintext highlighter-rouge">gc.time.count</code></font>). However, since all major collections can happen preemptively, only minor GCs happen during requests, making the average GC pause only 25ms (<font color="blue"><code class="language-plaintext highlighter-rouge">gc.time.mean</code></font>).</p>
<p><img src="http://cl.ly/image/0g463K3q0O05/graph.png" alt="" /></p>
<p>The overall result is reduced in-request GC overhead (<font color="purple"><code class="language-plaintext highlighter-rouge">gc.time.sum</code></font>), even though GC happens more often.</p>
<p><strong>Note:</strong> Even with the best OOBGC, collections during requests are inevitable (especially on large requests with lots of allocations). The GC’s job during these requests is to control memory growth, so I <em>do not recommend</em> disabling ruby’s GC during requests.</p>
<h1>Ruby 2.1: RGenGC</h1>
<p>2013-12-30 · http://tmm1.net/ruby21-rgengc</p>
<p>Ruby 2.1 adds a “restricted” generational collector, with minor mark phases that dramatically reduce the cost of GC.</p>
<p>Let’s take a look at the <a href="/ruby21-rgengc/rgengc.png">evolution of Ruby’s GC</a>.</p>
<h2 id="ruby-18-simple-mark-and-sweep">Ruby 1.8: simple mark and sweep</h2>
<center><img src="/ruby21-rgengc/ruby18.png" alt="" /></center>
<p>Classic mark and sweep implementation. The entire world is stopped during both phases.</p>
<ol>
<li>Traverse object graph from roots and mark live objects,
using a <span style="color:blue">bit</span> inside the object structure (FL_MARK).</li>
<li>Iterate over all heap slots and add unmarked slots to the freelist.</li>
</ol>
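<p>The two phases can be modelled in a few lines of Ruby. This is a toy model, not the VM’s C code: the <code>Slot</code> struct, its <code>marked</code> field (standing in for <code>FL_MARK</code>), and the <code>mark</code>/<code>sweep</code> helpers are all made up for illustration.</p>

```ruby
# Toy heap slot: a name, outgoing references, and a mark bit.
Slot = Struct.new(:name, :refs, :marked)

# Phase 1: recursively mark everything reachable from a slot.
def mark(slot)
  return if slot.marked
  slot.marked = true # stand-in for setting FL_MARK on the object
  slot.refs.each { |r| mark(r) }
end

# Phase 2: walk every slot and collect the unmarked ones into a freelist.
def sweep(heap)
  freelist = heap.reject(&:marked)
  heap.each { |s| s.marked = false } # clear marks for the next cycle
  freelist
end

a = Slot.new(:a, [], false)
b = Slot.new(:b, [a], false)
c = Slot.new(:c, [], false) # nothing refers to c
heap  = [a, b, c]
roots = [b]

roots.each { |r| mark(r) }
freed = sweep(heap).map(&:name) # => [:c]
```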
<h2 id="ruby-193-lazy-sweep">Ruby 1.9.3: lazy sweep</h2>
<center><img src="/ruby21-rgengc/ruby19.png" alt="" /></center>
<p><a href="http://www.narihiro.info/index.en.html">@nari3</a> adds <a href="http://bugs.ruby-lang.org/issues/show/3203">LazySweepGC</a>, reducing GC pauses to just the mark phase. The heap is swept incrementally as object slots are required.</p>
<h2 id="ruby-20-bitmaps-for-cow-safety">Ruby 2.0: bitmaps for COW-safety</h2>
<center><img src="/ruby21-rgengc/ruby20.png" alt="" /></center>
<p><a href="http://www.narihiro.info/index.en.html">@nari3</a> adds <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/41916">Bitmap Marking GC</a>, helping Unix systems <a href="http://patshaughnessy.net/2012/3/23/why-you-should-be-excited-about-garbage-collection-in-ruby-2-0">share memory across child processes</a>. The mark phase is also <a href="https://bugs.ruby-lang.org/issues/7095">rewritten to be non-recursive</a>.</p>
<p>Although the memory savings from bitmaps were only modest, the patch freed up a bit (<code class="language-plaintext highlighter-rouge">FL_MARK</code> became <code class="language-plaintext highlighter-rouge">FL_WB_PROTECTED</code> later) and laid the groundwork for a generational collector.</p>
<h2 id="ruby-21-oldgen-and-minor-marking">Ruby 2.1: oldgen and minor marking</h2>
<center><img src="/ruby21-rgengc/ruby21.png" alt="" /></center>
<p><a href="http://www.atdot.net/~ko1/">@ko1</a> designs <a href="https://bugs.ruby-lang.org/issues/8339">RGenGC</a>, a generational collector that can be implemented incrementally and supports C-extensions.</p>
<p>Objects on the heap are divided into two categories:</p>
<ul>
<li>protected by a write-barrier (<code class="language-plaintext highlighter-rouge">FL_WB_PROTECTED</code>)</li>
<li>unprotected (or “shady”)
<ul>
<li>missing write-barrier (<code class="language-plaintext highlighter-rouge">Proc</code>, <code class="language-plaintext highlighter-rouge">RubyVM::Env</code>)</li>
<li>unsafe access from C-extension (<code class="language-plaintext highlighter-rouge">RARRAY_PTR</code>, <code class="language-plaintext highlighter-rouge">RSTRUCT_PTR</code>)</li>
</ul>
</li>
</ul>
<p>Only protected objects can be promoted to oldgen.<br />
(This is the “restricted” in RGenGC.)</p>
<p>Unprotected objects cannot be promoted, but if referenced from oldgen they are added to a remembered set. Minor marks are much faster because they only have to traverse references from the remembered set.</p>
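<p>To make the remembered set concrete, here is a heavily simplified toy model. <code>Obj</code>, <code>ToyHeap</code>, <code>write</code> and <code>minor_mark</code> are hypothetical names, and the model only covers the old-to-young write barrier; it ignores shady objects and everything else the real collector has to handle.</p>

```ruby
# Toy object: a name, outgoing references, an oldgen flag and a mark bit.
Obj = Struct.new(:name, :refs, :old, :marked)

class ToyHeap
  attr_reader :remembered

  def initialize
    @remembered = []
  end

  # Write barrier: remember old objects that gain a reference to a young one.
  def write(parent, child)
    parent.refs << child
    if parent.old && !child.old && @remembered.none? { |o| o.equal?(parent) }
      @remembered << parent
    end
  end

  # Minor mark: start from young roots and the remembered set,
  # and never descend into oldgen objects.
  def minor_mark(roots)
    visited = []
    stack = roots.reject(&:old) + @remembered.flat_map(&:refs)
    until stack.empty?
      o = stack.pop
      next if o.old || o.marked
      o.marked = true
      visited << o.name
      stack.concat(o.refs)
    end
    visited
  end
end

heap    = ToyHeap.new
config  = Obj.new(:old_config, [], true, false)
cache   = Obj.new(:old_cache, [], true, false)
string  = Obj.new(:young_string, [], false, false)
temp    = Obj.new(:temp, [], false, false)

heap.write(config, cache)  # old -> old: nothing to remember
heap.write(config, string) # old -> young: config joins the remembered set
visited = heap.minor_mark([config, temp])
```

<p>Only the two young objects are visited; the oldgen objects are skipped entirely, which is where the time savings come from.</p>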
<h3 id="heap-layout">Heap Layout</h3>
<p>Ruby objects live on the ruby heap, which is split up into pages. Each page is 16KB and holds ~408 object slots.</p>
<p>Every RVALUE slot on a page occupies 40 bytes. Strings <a href="https://bugs.ruby-lang.org/issues/7095">of up to 23 bytes</a> and Arrays <a href="http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters">with fewer than four elements</a> can be embedded directly within this 40-byte slot.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">GC</span><span class="o">::</span><span class="no">INTERNAL_CONSTANTS</span><span class="p">[</span><span class="ss">:RVALUE_SIZE</span><span class="p">]</span> <span class="c1">#=> 40 bytes per object slot (on x86_64)</span>
<span class="no">GC</span><span class="o">::</span><span class="no">INTERNAL_CONSTANTS</span><span class="p">[</span><span class="ss">:HEAP_OBJ_LIMIT</span><span class="p">]</span> <span class="c1">#=> 408 slots per heap page</span>
</code></pre></div></div>
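<p>The slot count follows from simple arithmetic on the numbers quoted above. The sketch below hard-codes those numbers instead of reading <code>GC::INTERNAL_CONSTANTS</code>, since the available keys vary by Ruby version; that the leftover bytes go to per-page bookkeeping is an assumption, not something the post states.</p>

```ruby
PAGE_BYTES  = 16 * 1024 # page size quoted above
RVALUE_SIZE = 40        # bytes per slot on x86_64, as quoted above

upper_bound = PAGE_BYTES / RVALUE_SIZE            # => 409 slots if a page held nothing else
reported    = 408                                 # HEAP_OBJ_LIMIT quoted above
leftover    = PAGE_BYTES - reported * RVALUE_SIZE # => 64 bytes not used for slots
```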
<p>Pages in the heap reside in one of two places: eden or tomb. Eden contains pages that have one or more live objects. The tomb contains empty pages with no objects. The sum of these pages represents the capacity of the ruby heap:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:heap_used</span><span class="p">)</span> <span class="o">==</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:heap_eden_page_length</span><span class="p">)</span> <span class="o">+</span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:heap_tomb_page_length</span><span class="p">)</span>
</code></pre></div></div>
<p>During lazy sweep, eden pages are swept one at a time. Each page provides up to 408 object slots for re-use. By the time the lazy sweep completes, all unmarked slots in eden have been replaced by new objects.</p>
<p>Empty pages found during sweep are moved to the tomb. This reduces fragmentation in eden (by filling in sparse pages first), and allows the tomb to be grown or shrunk before it is used.</p>
<p>Once eden runs out of slots, the empty pages in tomb are moved back to eden to add space. This happens incrementally, one page at a time. When the tomb runs out of pages, a mark is triggered and the cycle begins anew.</p>
<h3 id="major-vs-minor-gc">Major vs Minor GC</h3>
<p>As an example, let’s look at github.com’s large rails app.</p>
<p>First, we’ll measure how many long-lived objects are created after the app is booted:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># preload controllers/models and other code</span>
<span class="no">GitHub</span><span class="p">.</span><span class="nf">preload_all</span>
<span class="c1"># measure heap stats after a major mark and full sweep</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">start</span> <span class="c1"># same as GC.start(full_mark: true, immediate_sweep: true)</span>
<span class="c1"># three ways to measure live slots</span>
<span class="n">count</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">count_objects</span>
<span class="n">count</span><span class="p">[</span><span class="ss">:TOTAL</span><span class="p">]</span> <span class="o">-</span> <span class="n">count</span><span class="p">[</span><span class="ss">:FREE</span><span class="p">]</span> <span class="c1">#=> 565121</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:heap_live_slot</span><span class="p">)</span> <span class="c1">#=> 565121</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:total_allocated_object</span><span class="p">)</span> <span class="o">-</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:total_freed_object</span><span class="p">)</span> <span class="c1">#=> 565121</span>
</code></pre></div></div>
<p>Of these ~565k long-lived bootup objects, ~95% are promoted to oldgen:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span>
<span class="mf">100.0</span> <span class="o">*</span> <span class="n">s</span><span class="p">[</span><span class="ss">:old_object</span><span class="p">]</span> <span class="o">/</span> <span class="n">s</span><span class="p">[</span><span class="ss">:heap_live_slot</span><span class="p">]</span> <span class="c1">#=> 94.90</span>
<span class="mf">100.0</span> <span class="o">*</span> <span class="n">s</span><span class="p">[</span><span class="ss">:remembered_shady_object</span><span class="p">]</span> <span class="o">/</span> <span class="n">s</span><span class="p">[</span><span class="ss">:heap_live_slot</span><span class="p">]</span> <span class="c1">#=> 1.88</span>
</code></pre></div></div>
<p>This means only ~5% of the heap needs to be traversed on minor marks, via references from the ~2% of objects that are remembered.</p>
<p>As expected, this makes minor GC pauses much <em>much</em> shorter: only 7ms in our app compared to 58ms for a major mark.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">time</span><span class="p">{</span> <span class="no">GC</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="ss">full_mark: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">immediate_sweep: </span><span class="kp">false</span><span class="p">)</span> <span class="p">}</span> <span class="c1">#=> 0.058</span>
<span class="n">time</span><span class="p">{</span> <span class="no">GC</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="ss">full_mark: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">immediate_sweep: </span><span class="kp">false</span><span class="p">)</span> <span class="p">}</span> <span class="c1">#=> 0.007</span>
</code></pre></div></div>
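<p>The <code>time</code> helper used above is not defined in the post. A plausible stand-in, assuming it simply returns the elapsed seconds for a block, could look like this:</p>

```ruby
# Hypothetical stand-in for the `time` helper: seconds spent inside the block.
def time
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).round(3)
end

major = time { GC.start(full_mark: true,  immediate_sweep: false) }
minor = time { GC.start(full_mark: false, immediate_sweep: false) }
```

<p>The absolute numbers depend on heap size and hardware, so expect very different values outside a large app.</p>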
<p>The majority of mark phases during code execution will use a minor mark and finish very quickly. However, over time the size of the remembered set and oldgen can grow. If either of these doubles in size, a major mark is used to reset them.</p>
<p>The limits used to trigger a major mark can be monitored via GC.stat:</p>
<div class="language-irb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">>> GC.stat.values_at(:remembered_shady_object, :old_object)
</span><span class="p">=></span> <span class="p">[</span><span class="mi">10647</span><span class="p">,</span> <span class="mi">536785</span><span class="p">]</span>
<span class="o">>></span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">.</span><span class="nf">values_at</span><span class="p">(</span><span class="ss">:remembered_shady_object_limit</span><span class="p">,</span> <span class="ss">:old_object_limit</span><span class="p">)</span>
<span class="o">=></span> <span class="p">[</span><span class="mi">21284</span><span class="p">,</span> <span class="mi">1073030</span><span class="p">]</span>
</code></pre></div></div>
<p>The frequency of major vs minor marks can also be monitored. For example, you might graph <tt>“major GCs per request”</tt>, <tt>“minor GCs per request”</tt> or <tt>“minor GCs per major GC”</tt>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:count</span><span class="p">)</span> <span class="o">==</span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:minor_gc_count</span><span class="p">)</span> <span class="o">+</span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">(</span><span class="ss">:major_gc_count</span><span class="p">)</span>
</code></pre></div></div>
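<p>One way to turn these counters into a per-request metric is to snapshot <code>GC.stat</code> around a unit of work. The <code>gc_counts_during</code> helper below is a hypothetical sketch; in a real app the block would be the request handler, and the results would be sent to whatever metrics system is in use.</p>

```ruby
# Hypothetical helper: how many minor/major GCs ran while the block executed.
def gc_counts_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count],
    total: after[:count] - before[:count]
  }
end

counts = gc_counts_during do
  GC.start(full_mark: true)      # guarantees at least one major GC
  100_000.times { "x" * 50 }     # allocation churn, may trigger minor GCs
end
```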
<h3 id="tuning-variables">Tuning Variables</h3>
<p>In our app above, we use the following GC settings:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">RUBY_GC_HEAP_INIT_SLOTS</span><span class="o">=</span>600000
<span class="nb">export </span><span class="nv">RUBY_GC_HEAP_FREE_SLOTS</span><span class="o">=</span>600000
<span class="nb">export </span><span class="nv">RUBY_GC_HEAP_GROWTH_FACTOR</span><span class="o">=</span>1.25
<span class="nb">export </span><span class="nv">RUBY_GC_HEAP_GROWTH_MAX_SLOTS</span><span class="o">=</span>300000
</code></pre></div></div>
<ul>
<li>
<p><b><code class="language-plaintext highlighter-rouge">RUBY_GC_HEAP_INIT_SLOTS:</code> initial number of slots on the heap <tt>(default: 10000)</tt></b><br />
Our app boots with ~600k long-lived objects, so we use 600k here to reduce GC activity during boot.</p>
</li>
<li>
<p><b><code class="language-plaintext highlighter-rouge">RUBY_GC_HEAP_FREE_SLOTS:</code> minimum free slots reserved for sweep re-use <tt>(4096)</tt></b><br />
Our servers have extra RAM, so we bump this up to trade off memory for time between GCs.
An average request allocates 75k objs, so 600k free slots gives us ~8 requests in between each mark phase.</p>
</li>
<li>
<p><b><code class="language-plaintext highlighter-rouge">RUBY_GC_HEAP_GROWTH_FACTOR:</code> factor used to grow the heap <tt>(1.8x)</tt></b><br />
Since our heap is already quite big with the settings above, we reduce the growth factor (1.25x) to add slots in smaller increments.</p>
</li>
<li>
<p><b><code class="language-plaintext highlighter-rouge">RUBY_GC_HEAP_GROWTH_MAX_SLOTS:</code> maximum new slots to add <tt>(no limit)</tt></b><br />
In addition to reducing the growth factor, we cap it so a maximum of 300k objects can be added to the heap at once.</p>
</li>
</ul>
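<p>The reasoning behind these numbers can be checked with a little arithmetic. The <code>slots_to_add</code> helper is a hypothetical simplification of how the growth factor and the cap interact (grow by the factor, but never add more than the cap at once); the VM’s actual growth logic is more involved.</p>

```ruby
free_slots        = 600_000 # RUBY_GC_HEAP_FREE_SLOTS
objs_per_request  = 75_000  # average allocations per request, from the text
requests_per_mark = free_slots / objs_per_request # => 8

# Assumed rule: grow by `factor`, but add at most `max_slots` in one step.
def slots_to_add(current, factor, max_slots)
  wanted = (current * factor).to_i - current
  [wanted, max_slots].min
end

small_heap = slots_to_add(600_000, 1.25, 300_000)   # => 150000, below the cap
large_heap = slots_to_add(2_000_000, 1.25, 300_000) # => 300000, capped
```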
<h3 id="malloc-limits">malloc() Limits</h3>
<p>So Ruby objects occupy 40 bytes each, inside pages on the eden heap.</p>
<p>When an object needs even more space, it allocates memory on the regular process heap (via a <code class="language-plaintext highlighter-rouge">ruby_xmalloc()</code> wrapper). For instance when a string outgrows 23 bytes, it allocates a separate, larger buffer for itself. The additional memory used by this string (or any object) can be measured with <a href="http://tmm1.net/ruby21-objspace/">ObjectSpace.memsize_of(o) in objspace.so</a>.</p>
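<p>A quick way to see the difference between an embedded string and one with its own buffer is to compare <code>memsize_of</code> for a short and a long string. The exact numbers vary by Ruby version (newer versions also count the slot itself), so this sketch only compares the two values.</p>

```ruby
require 'objspace'

small = "a" * 10      # short enough to be embedded in its slot
large = "a" * 100_000 # needs a separate, malloc'd buffer

small_size = ObjectSpace.memsize_of(small)
large_size = ObjectSpace.memsize_of(large)
extra      = large_size - small_size # roughly the size of the external buffer
```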
<p>Internally the VM keeps track of <code class="language-plaintext highlighter-rouge">malloc_increase</code>, the number of bytes that have been <a href="https://bugs.ruby-lang.org/issues/8985">allocated but not yet freed</a>. This is effectively the memory growth of the process. When more than 16MB is added, a GC is forced even if free slots are still available. The limit starts at 16MB, but adapts to the memory usage patterns in your code.</p>
<p>The initial/max values and dynamic growth factor can also be controlled via environment variables:</p>
<ul>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_MALLOC_LIMIT:</code> <tt>(default: 16MB)</tt></b></li>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_MALLOC_LIMIT_MAX:</code> <tt>(default: 32MB)</tt></b></li>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR:</code> <tt>(default: 1.4x)</tt></b></li>
</ul>
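<p>How the limit “adapts” can be pictured with a simplified model: when a GC finds that <code>malloc_increase</code> exceeded the limit, the limit grows (up to the max); otherwise it slowly shrinks back toward the initial value. The <code>next_malloc_limit</code> function and its <code>decay</code> parameter are assumptions made for illustration; the exact rule inside the VM may differ.</p>

```ruby
MB = 1024 * 1024

# Assumed model of the adaptive limit, using the defaults listed above.
def next_malloc_limit(limit, increase, min: 16 * MB, max: 32 * MB, growth: 1.4, decay: 0.98)
  if increase > limit
    [(increase * growth).to_i, max].min # grow, but never past the max
  else
    [(limit * decay).to_i, min].max     # shrink slowly, but never below the initial value
  end
end

grown  = next_malloc_limit(16 * MB, 20 * MB) # over the limit: grows to about 28MB
capped = next_malloc_limit(grown, 40 * MB)   # would be about 56MB, so it is capped at 32MB
shrunk = next_malloc_limit(capped, 1 * MB)   # under the limit: shrinks slightly
```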
<p>Similarly, the memory growth associated with oldgen is tracked separately in <code class="language-plaintext highlighter-rouge">oldmalloc_increase</code>. When this limit is tripped, a <em>major</em> GC is forced. These limits can be tuned as well:</p>
<ul>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_OLDMALLOC_LIMIT:</code> <tt>(default: 16MB)</tt></b></li>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_OLDMALLOC_LIMIT_MAX:</code> <tt>(default: 128MB)</tt></b></li>
<li><b><code class="language-plaintext highlighter-rouge">RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR:</code> <tt>(default: 1.2x)</tt></b></li>
</ul>
<p>Both malloc increase and limit values can be monitored via GC.stat:</p>
<div class="language-irb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">>> GC.stat.values_at(:malloc_increase, :malloc_limit)
</span><span class="p">=></span> <span class="p">[</span><span class="mi">14224</span><span class="p">,</span> <span class="mi">64000000</span><span class="p">]</span>
<span class="o">>></span> <span class="no">GC</span><span class="p">.</span><span class="nf">stat</span><span class="p">.</span><span class="nf">values_at</span><span class="p">(</span><span class="ss">:oldmalloc_increase</span><span class="p">,</span> <span class="ss">:oldmalloc_limit</span><span class="p">)</span>
<span class="o">=></span> <span class="p">[</span><span class="mi">20464</span><span class="p">,</span> <span class="mi">64000000</span><span class="p">]</span>
</code></pre></div></div>
<p>In our app we’ve increased the initial limit to 64MB, to reduce GC activity during boot and when memory usage peaks.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">RUBY_GC_MALLOC_LIMIT</span><span class="o">=</span>64000000
<span class="nb">export </span><span class="nv">RUBY_GC_OLDMALLOC_LIMIT</span><span class="o">=</span>64000000
</code></pre></div></div>
<h3 id="gc-events">GC Events</h3>
<p>And finally, ruby 2.1 ships with new tracepoints that can be used to monitor the GC at runtime.
These are available from C, via <code class="language-plaintext highlighter-rouge">rb_tracepoint_new()</code>:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">RUBY_INTERNAL_EVENT_GC_START</code></li>
<li><code class="language-plaintext highlighter-rouge">RUBY_INTERNAL_EVENT_GC_END_MARK</code></li>
<li><code class="language-plaintext highlighter-rouge">RUBY_INTERNAL_EVENT_GC_END_SWEEP</code></li>
</ul>
<p>C-extensions using these events can also take advantage of <code class="language-plaintext highlighter-rouge">rb_gc_stat()</code> and <code class="language-plaintext highlighter-rouge">rb_gc_latest_gc_info()</code>, which provide safe access to <code class="language-plaintext highlighter-rouge">GC.stat</code> and <code class="language-plaintext highlighter-rouge">GC.latest_gc_info</code>.</p>
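<p>The tracepoints themselves need C, but the same information that <code>rb_gc_latest_gc_info()</code> exposes can be inspected from Ruby. A small sketch; the set of keys returned depends on the Ruby version, so only two are read here.</p>

```ruby
GC.start(full_mark: true, immediate_sweep: true)

info   = GC.latest_gc_info
reason = info[:gc_by]    # what triggered the most recent GC
major  = info[:major_by] # why it was a major GC, or nil for a minor one
```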
<h2 id="ruby-22-and-beyond">Ruby 2.2 and beyond</h2>
<p>With the introduction of RGenGC, Ruby 2.1 includes a significant upgrade to ruby’s GC. Seven millisecond minor marks and a 95% oldgen promotion rate are remarkable achievements, especially considering <em>not one</em> of our C-extensions had to be modified. Hats off to <a href="http://www.atdot.net/~ko1/">@ko1</a>!</p>
<p>Ruby 2.2 will expand the GC algorithm from two generations to three. (In fact, 2.1 already includes a <code class="language-plaintext highlighter-rouge">RGENGC_THREEGEN</code> compile flag to enable the third generation). @ko1 also plans to implement an <a href="http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Stop-the-world_vs._incremental_vs._concurrent">incremental mark phase</a>, which would remove the need for long major GC pauses.</p>
<h1>Ruby 2.1: objspace.so</h1>
<p>2013-12-26 · http://tmm1.net/ruby21-objspace</p>
<p>ObjectSpace in ruby contains many useful heap debugging utilities.</p>
<p>Since 1.9 ruby has included <code class="language-plaintext highlighter-rouge">objspace.so</code> which adds even more methods to the ObjectSpace module:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">each_object</span><span class="p">{</span> <span class="o">|</span><span class="n">o</span><span class="o">|</span> <span class="o">...</span> <span class="p">}</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">count_objects</span> <span class="c1">#=> {:TOTAL=>55298, :FREE=>10289, :T_OBJECT=>3371, ...}</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">each_object</span><span class="p">.</span><span class="nf">inject</span><span class="p">(</span><span class="no">Hash</span><span class="p">.</span><span class="nf">new</span> <span class="mi">0</span><span class="p">){</span> <span class="o">|</span><span class="n">h</span><span class="p">,</span><span class="n">o</span><span class="o">|</span> <span class="n">h</span><span class="p">[</span><span class="n">o</span><span class="p">.</span><span class="nf">class</span><span class="p">]</span><span class="o">+=</span><span class="mi">1</span><span class="p">;</span> <span class="n">h</span> <span class="p">}</span> <span class="c1">#=> {Class=>416, ...}</span>
<span class="nb">require</span> <span class="s1">'objspace'</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">memsize_of</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> 0 /* additional bytes allocated by object */</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">count_tdata_objects</span> <span class="c1">#=> {Encoding=>100, Time=>87, RubyVM::Env=>17, ...}</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">count_nodes</span> <span class="c1">#=> {:NODE_SCOPE=>2, :NODE_BLOCK=>688, :NODE_IF=>9, ...}</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">reachable_objects_from</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> [referenced, objects, ...]</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">reachable_objects_from_root</span> <span class="c1">#=> {"symbols"=>..., "global_tbl"=>...} /* in 2.1 */</span>
</code></pre></div></div>
<p>In 2.1, we’ve added two big new features: an allocation tracer and a heap dumper.</p>
<h3 id="allocation-tracing">Allocation Tracing</h3>
<p>Tracking down memory growth and object reference leaks is tricky when you don’t know where the objects are coming from.</p>
<p>With 2.1, you can enable allocation tracing to collect metadata about every new object:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'objspace'</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">trace_object_allocations_start</span>
<span class="k">class</span> <span class="nc">MyApp</span>
<span class="k">def</span> <span class="nf">perform</span>
<span class="s2">"foobar"</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">o</span> <span class="o">=</span> <span class="no">MyApp</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">perform</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">allocation_sourcefile</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> "example.rb"</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">allocation_sourceline</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> 6</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">allocation_generation</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> 1</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">allocation_class_path</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> "MyApp"</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">allocation_method_id</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="c1">#=> :perform</span>
</code></pre></div></div>
<p>A block version of the tracer is <a href="http://ruby-doc.org/stdlib-2.1.0/libdoc/objspace/rdoc/ObjectSpace.html#method-c-trace_object_allocations">also available</a>.</p>
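<p>A minimal sketch of the block form, which limits tracing to the code inside the block. The <code>report</code> variable is just a name chosen for this example, and the lookups are done inside the block to stay on the safe side.</p>

```ruby
require 'objspace'

report = nil
ObjectSpace.trace_object_allocations do
  obj = Object.new
  report = {
    file: ObjectSpace.allocation_sourcefile(obj), # path of the file that allocated obj
    line: ObjectSpace.allocation_sourceline(obj)  # line number of the allocation
  }
end
```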
<p>Under the hood, this feature is built on <code class="language-plaintext highlighter-rouge">NEWOBJ</code> and <code class="language-plaintext highlighter-rouge">FREEOBJ</code> tracepoints included in 2.1. These events are only available from C, via <code class="language-plaintext highlighter-rouge">rb_tracepoint_new()</code>.</p>
<h3 id="heap-dumping">Heap Dumping</h3>
<p>To further help debug object reference leaks, you can dump an object (or the entire heap) for offline analysis.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'objspace'</span>
<span class="c1"># enable tracing for file/line/generation data in dumps</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">trace_object_allocations_start</span>
<span class="c1"># dump single object as json string</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="s2">"abc"</span><span class="p">.</span><span class="nf">freeze</span><span class="p">)</span> <span class="c1">#=> "{...}"</span>
<span class="c1"># dump out all live objects to a json file</span>
<span class="no">GC</span><span class="p">.</span><span class="nf">start</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">dump_all</span><span class="p">(</span><span class="ss">output: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s1">'heap.json'</span><span class="p">,</span><span class="s1">'w'</span><span class="p">))</span>
</code></pre></div></div>
<p>Objects are serialized as simple json, and include all relevant details about the object, its source (if allocation tracing was enabled), and outbound references:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"address"</span><span class="p">:</span><span class="s2">"0x007fe9232d5488"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"STRING"</span><span class="p">,</span><span class="w">
</span><span class="nl">"class"</span><span class="p">:</span><span class="s2">"0x007fe923029658"</span><span class="p">,</span><span class="w">
</span><span class="nl">"frozen"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="nl">"embedded"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="nl">"fstring"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="nl">"bytesize"</span><span class="p">:</span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"value"</span><span class="p">:</span><span class="s2">"abc"</span><span class="p">,</span><span class="w">
</span><span class="nl">"encoding"</span><span class="p">:</span><span class="s2">"UTF-8"</span><span class="p">,</span><span class="w">
</span><span class="nl">"references"</span><span class="p">:[],</span><span class="w">
</span><span class="nl">"file"</span><span class="p">:</span><span class="s2">"irb/workspace.rb"</span><span class="p">,</span><span class="w">
</span><span class="nl">"line"</span><span class="p">:</span><span class="mi">86</span><span class="p">,</span><span class="w">
</span><span class="nl">"method"</span><span class="p">:</span><span class="s2">"eval"</span><span class="p">,</span><span class="w">
</span><span class="nl">"generation"</span><span class="p">:</span><span class="mi">15</span><span class="p">,</span><span class="w">
</span><span class="nl">"flags"</span><span class="p">:{</span><span class="nl">"wb_protected"</span><span class="p">:</span><span class="kc">true</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The heap dump produced by <code class="language-plaintext highlighter-rouge">ObjectSpace.dump_all</code> can be processed by the tool of your choice. You might try a <a href="http://stedolan.github.io/jq/">json processor like jq</a> or a <a href="http://www.rethinkdb.com/">json database</a>. Since the dump contains outbound references for each object, a full object graph can be re-created for deep analysis.</p>
<p>For example, here’s a simple ruby/shell script to see which gems/libraries create the most long-lived objects of different types:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">cat </span>heap.json |
<span class="gp"> ruby -rjson -ne ' puts JSON.parse($</span>_<span class="o">)</span>.values_at<span class="o">(</span><span class="s2">"file"</span>,<span class="s2">"line"</span>,<span class="s2">"type"</span><span class="o">)</span>.join<span class="o">(</span><span class="s2">":"</span><span class="o">)</span> <span class="s1">' |
</span><span class="go"> sort |
uniq -c |
sort -n |
tail -4
26289 lib/active_support/dependencies.rb:184:NODE
29972 lib/active_support/dependencies.rb:184:DATA
43100 lib/psych/visitors/to_ruby.rb:324:STRING
47096 lib/active_support/dependencies.rb:184:STRING
</span></code></pre></div></div>
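<p>Since each line of the dump lists outbound references, the object graph can also be inverted to answer "who is holding on to this object?". The following is a minimal sketch, assuming one json object per line with the <code>address</code> and <code>references</code> keys shown in the sample above; the <code>inbound_references</code> helper and the sample addresses are hypothetical.</p>

```ruby
require 'json'

# Build a reverse index: address => addresses of the objects that reference it.
# `lines` is any enumerable of json strings, one object per line.
def inbound_references(lines)
  inbound = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    obj = JSON.parse(line)
    from = obj["address"]
    next unless from # entries without an address are skipped in this sketch
    (obj["references"] || []).each { |to| inbound[to] << from }
  end
  inbound
end

# Hypothetical three-object heap, in the same shape as the dump above.
sample = [
  '{"address":"0x1","type":"ARRAY","references":["0x2","0x3"]}',
  '{"address":"0x2","type":"STRING","references":[]}',
  '{"address":"0x3","type":"HASH","references":["0x2"]}'
]

p inbound_references(sample)["0x2"] # the objects that point at 0x2
```

<p>For a real dump, something like <code>inbound_references(File.foreach('heap.json'))</code> should work the same way, though a large heap may need a more memory-efficient index.</p>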
<p>If you have a ruby application that feels large or bloated, give these new ObjectSpace features a try. And if you end up writing a heap analysis tool <a href="http://arborjs.org">or visualization</a> for these json files, do let me know <a href="https://twitter.com/tmm1">on twitter</a>.</p>
Ruby 2.1: Profiling Ruby2013-12-24T00:00:00+00:00http://tmm1.net/ruby21-profiling<p>Ruby 2.1 is shipping with <code class="language-plaintext highlighter-rouge">rb_profile_frames()</code>, a new C-api for fetching ruby backtraces. The api performs no allocations and adds minimal cpu overhead, making it ideal for profiling, even in production environments.</p>
<p>I’ve implemented a <a href="https://github.com/tmm1/stackprof">sampling callstack profiler for 2.1</a> called stackprof with this API, using techniques popularized by <a href="https://code.google.com/p/gperftools/">gperftools</a> and <a href="http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/">@brendangregg</a>. It works remarkably well, and provides incredible insight into the execution of your code.</p>
<p>For example, I recently used <code class="language-plaintext highlighter-rouge">StackProf::Middleware</code> on one of our production github.com unicorn workers. The resulting profile is analyzed using <code class="language-plaintext highlighter-rouge">bin/stackprof</code>:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>stackprof data/stackprof-cpu-4120-1384979644.dump <span class="nt">--text</span> <span class="nt">--limit</span> 4
<span class="go">==================================
Mode: cpu(1000)
Samples: 9145 (1.25% miss rate)
GC: 448 (4.90%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
</span><span class="gp"> 236 (2.6%) 231 (2.5%) String#</span>blank?
<span class="gp"> 546 (6.0%) 216 (2.4%) ActiveRecord::ConnectionAdapters::Mysql2Adapter#</span><span class="k">select</span>
<span class="gp"> 212 (2.3%) 199 (2.2%) Mysql2::Client#</span>query_with_timing
<span class="gp"> 190 (2.1%) 155 (1.7%) ERB::Util#</span>html_escape
</code></pre></div></div>
<p>Right away, we see that 2.6% of cpu time in the app is spent in <code class="language-plaintext highlighter-rouge">String#blank?</code>.</p>
<p>Let’s zoom in for a closer look:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>stackprof data/stackprof-cpu-4120-1384979644.dump <span class="nt">--method</span> <span class="s1">'String#blank?'</span>
<span class="gp">String#</span>blank? <span class="o">(</span>lib/active_support/core_ext/object/blank.rb:80<span class="o">)</span>
<span class="go"> samples: 231 self (2.5%) / 236 total (2.6%)
callers:
</span><span class="gp"> 112 ( 47.5%) Object#</span>present?
<span class="go"> code:
| 80 | def blank?
187 (2.0%) / 187 (2.0%) | 81 | self !~ /[^[:space:]]/
| 82 | end
</span></code></pre></div></div>
<p>We see that nearly half (47.5%) of the calls into <code class="language-plaintext highlighter-rouge">blank?</code> are coming from <code class="language-plaintext highlighter-rouge">Object#present?</code>.</p>
<p>As expected, most of the time in the method is spent in the regex on line 81. I noticed the line could be optimized slightly to <a href="https://github.com/rails/rails/pull/12976">remove the double negative</a>:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> def blank?
<span class="gd">- self !~ /[^[:space:]]/
</span><span class="gi">+ self =~ /\A[[:space:]]*\z/
</span> end
</code></pre></div></div>
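<p>To sanity-check that the two forms agree, here is a small sketch comparing them on a handful of strings. The helper names and sample strings are made up for illustration; only the two regexes come from the diff above. Note that <code>=~</code> returns an index or <code>nil</code> rather than a boolean, so the sketch normalizes it before comparing.</p>

```ruby
OLD_PATTERN = /[^[:space:]]/      # original: blank when this does NOT match
NEW_PATTERN = /\A[[:space:]]*\z/  # rewrite: blank when this matches

def old_blank?(str)
  str !~ OLD_PATTERN
end

def new_blank?(str)
  # =~ yields an index or nil; !! turns that into true/false for comparison
  !!(str =~ NEW_PATTERN)
end

# Hypothetical inputs: empty, whitespace-only, and strings with content.
samples = ["", "   ", "\t\n", "abc", "  abc  ", " \n x"]
samples.each do |s|
  puts format("%-12s old=%-5s new=%s", s.inspect, old_blank?(s), new_blank?(s))
end
```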
<p>That helped, but I was curious where these calls were coming from. Let’s follow the stack up and look at the <code class="language-plaintext highlighter-rouge">Object#present?</code> callsite:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>stackprof data/stackprof-cpu-4120-1384979644.dump <span class="nt">--method</span> <span class="s1">'Object#present?'</span>
<span class="gp">Object#</span>present? <span class="o">(</span>lib/active_support/core_ext/object/blank.rb:20<span class="o">)</span>
<span class="go"> samples: 12 self (0.1%) / 133 total (1.5%)
callers:
</span><span class="gp"> 55 ( 41.4%) RepositoryControllerMethods#</span>owner
<span class="gp"> 31 ( 23.3%) RepositoryControllerMethods#</span>current_repository
<span class="go"> callees (121 total):
</span><span class="gp"> 112 ( 92.6%) String#</span>blank?
<span class="gp"> 6 ( 5.0%) Object#</span>blank?
<span class="gp"> 3 ( 2.5%) NilClass#</span>blank?
<span class="go"> code:
| 20 | def present?
133 (1.5%) / 12 (0.1%) | 21 | !blank?
| 22 | end
</span></code></pre></div></div>
<p>So <code class="language-plaintext highlighter-rouge">Object#present?</code> is almost always called on String objects (92.6%), with the majority of these calls coming from two helper methods in <code class="language-plaintext highlighter-rouge">RepositoryControllerMethods</code>.</p>
<p>The callers in <code class="language-plaintext highlighter-rouge">RepositoryControllerMethods</code> appeared quite simple, but after a few minutes of staring I discovered the fatal mistake causing repeated calls to <code class="language-plaintext highlighter-rouge">present?</code>:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> def owner
<span class="gd">- @owner ||= User.find_by_login(params[:user_id]) if params[:user_id].present?
</span><span class="gi">+ @owner ||= (User.find_by_login(params[:user_id]) if params[:user_id].present?)
</span> end
</code></pre></div></div>
<p>This simple two-byte change removed 40% of calls to <code class="language-plaintext highlighter-rouge">String#blank?</code> in our app. To further minimize its cost, we switched to @samsaffron’s <a href="https://github.com/SamSaffron/fast_blank">pure C implementation in the fast_blank gem</a>.</p>
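<p>The reason the parentheses matter: a trailing <code>if</code> modifier binds more loosely than <code>||=</code>, so the original line parses as <code>(@owner ||= ...) if ...present?</code> and re-evaluates the condition on every call, even after the value is memoized. The sketch below illustrates this with a stand-in class; <code>RepoRequest</code>, <code>find_owner</code> and the counter are hypothetical and only mimic the shape of the real helper.</p>

```ruby
class RepoRequest
  attr_reader :presence_checks

  def initialize(user_id)
    @user_id = user_id
    @presence_checks = 0
  end

  # Parsed as `(@owner_before ||= find_owner) if user_id_present?`:
  # the condition runs on every call.
  def owner_before
    @owner_before ||= find_owner if user_id_present?
  end

  # The condition is part of the memoized expression, so it only runs
  # while @owner_after is still nil.
  def owner_after
    @owner_after ||= (find_owner if user_id_present?)
  end

  private

  # Stand-in for params[:user_id].present?, counting its invocations.
  def user_id_present?
    @presence_checks += 1
    !@user_id.to_s.strip.empty?
  end

  # Stand-in for User.find_by_login.
  def find_owner
    "user:#{@user_id}"
  end
end

before = RepoRequest.new("tmm1")
3.times { before.owner_before }

after = RepoRequest.new("tmm1")
3.times { after.owner_after }

puts "before: #{before.presence_checks} checks, after: #{after.presence_checks} checks"
# prints "before: 3 checks, after: 1 checks"
```

<p>The parenthesized form still re-checks on every call when the lookup returns <code>nil</code>, since <code>||=</code> cannot memoize a nil result.</p>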
<p>The end result of all these optimizations was a dramatic reduction in cpu time (i.e. <a href="http://tmm1.net/ruby21-process-clock_gettime/">without idle time</a>) spent processing requests:</p>
<center>
![](https://f.cloud.github.com/assets/2567/1807546/e5990042-6cf6-11e3-9599-58f1e662cb0f.png)
</center>
<p>With 2.1 and <a href="https://github.com/tmm1/stackprof">stackprof</a>, it’s easier than ever to make your ruby code fast. Try it today!</p>
Ruby 2.1: Process.setproctitle()2013-12-23T00:00:00+00:00http://tmm1.net/ruby21-process-setproctitle<p>Custom proclines are a great way to add visibility into your code’s execution. The various BSDs have long shipped a <a href="http://www.freebsd.org/cgi/man.cgi?query=setproctitle&sektion=3"><code class="language-plaintext highlighter-rouge">setproctitle(3)</code></a> library call for this purpose. At GitHub, we use this technique extensively in our unicorns, resques and even our git processes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ps ax | grep git:
32326 git: upload-pack AOKP/kernel_samsung_smdk4412 [173.x.x.x over git] clone: create_pack_file: 75.27 MiB @ 102.00 KiB/s
32327 git: pack-objects AOKP/kernel_samsung_smdk4412 [173.x.x.x over git] writing: 9% (203294/2087544)
2671 git: upload-pack CyanogenMod/android_libcore [87.x.x.x over git] clone: create_pack_file: 31.26 MiB @ 31.00 KiB/s
2672 git: pack-objects CyanogenMod/android_libcore [87.x.x.x over git] writing: 93% (90049/96410)
$ ps ax | grep unicorn
2408 unicorn github.com[9013677] worker[01]: 1843 reqs, 2.9 req/s, 23ms avg, 6.9% util
2529 unicorn github.com[9013677] worker[02]: 1715 reqs, 2.9 req/s, 21ms avg, 6.2% util
2640 unicorn github.com[9013677] worker[03]: 1632 reqs, 3.2 req/s, 19ms avg, 6.2% util
$ pstree -s resqued
─┬─ 25012 resqued-0.7.9 master [gen 413] [1 running] config/resqued.rb
└─┬─ 21924 resqued-0.7.9 listener 413 [f48cb15] [running] config/resqued.rb
├─── 22329 resque-1.20.0: [f48cb15] Waiting for low,notifications,services
├─── 22295 resque-1.20.0: [f48cb15] Processing notifications since 1389929817: GitHub::Jobs::DeliverNotifications
</code></pre></div></div>
<p>Linux never gained a specialized syscall, but still supports custom <code class="language-plaintext highlighter-rouge">/proc/pid/cmdline</code> via <a href="https://github.com/torvalds/linux/blob/f5835372ebedf26847c2b9e193284075cc9c1f7f/fs/proc/base.c#L220-L222">one weird trick</a>. The hack relies on the fact that <a href="http://man7.org/linux/man-pages/man2/execve.2.html"><code class="language-plaintext highlighter-rouge">execve(2)</code></a> allocates a contiguous piece of memory to pass <code class="language-plaintext highlighter-rouge">argv</code> and <code class="language-plaintext highlighter-rouge">envp</code> into a new process. When the null byte separating the two values is overwritten, the kernel assumes you’ve customized your cmdline and continues to read into the envp buffer. This means your procline can grow to contain more information as long as there’s space in the envp buffer. (Environment variables are copied out of the original envp buffer, so overwriting this area of memory only affects <code class="language-plaintext highlighter-rouge">/proc/pid/environ</code>).</p>
<p>Starting with Ruby 2.1, <code class="language-plaintext highlighter-rouge">Process.setproctitle</code> is available as a complementary method to <code class="language-plaintext highlighter-rouge">$0=</code>, and can be used to customize proclines without affecting <code class="language-plaintext highlighter-rouge">$0</code>.</p>
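<p>A minimal sketch of how this might look in a worker loop. The title format is invented for illustration, and whether the new title actually shows up in <code>ps</code> depends on the platform.</p>

```ruby
original = $0.dup

# Hypothetical worker loop: update the procline as requests are handled.
3.times do |i|
  Process.setproctitle("worker[01]: #{i + 1} reqs")
end

# $0 is left alone, so code that relies on the script name keeps working.
puts $0 == original
```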
Ruby 2.1: Process.clock_gettime()2013-12-23T00:00:00+00:00http://tmm1.net/ruby21-process-clock_gettime<p>Cpu vs idle time is one of the first things I look at when benchmarking rails requests.</p>
<p>Cpu time consists of number crunching, template rendering, method invocation and any other time spent executing instructions on the CPU. Idle time is everything else: generally time spent waiting on disk or network I/O, which can be highly variable depending on disk activity, remote server load, network conditions, etc.</p>
<p>In the past I’ve used ruby-prof’s <code class="language-plaintext highlighter-rouge">RubyProf::Measure::ProcessTime.measure</code> to measure cpu time, but with Ruby 2.1 we have a <code class="language-plaintext highlighter-rouge">clock_gettime(3)</code> wrapper built-in!</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">time</span>
<span class="n">real</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span>
<span class="n">cpu</span> <span class="o">=</span> <span class="no">Process</span><span class="p">.</span><span class="nf">clock_gettime</span><span class="p">(</span><span class="no">Process</span><span class="o">::</span><span class="no">CLOCK_PROCESS_CPUTIME_ID</span><span class="p">)</span>
<span class="k">yield</span>
<span class="n">cpu</span> <span class="o">=</span> <span class="no">Process</span><span class="p">.</span><span class="nf">clock_gettime</span><span class="p">(</span><span class="no">Process</span><span class="o">::</span><span class="no">CLOCK_PROCESS_CPUTIME_ID</span><span class="p">)</span> <span class="o">-</span> <span class="n">cpu</span>
<span class="n">real</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span> <span class="o">-</span> <span class="n">real</span>
<span class="p">{</span> <span class="ss">real: </span><span class="n">real</span><span class="p">,</span> <span class="ss">cpu: </span><span class="n">cpu</span><span class="p">,</span> <span class="ss">idle: </span><span class="n">real</span><span class="o">-</span><span class="n">cpu</span> <span class="p">}</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-irb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">>> time{ sleep 1 } # all idle time
</span><span class="p">=></span> <span class="p">{</span><span class="ss">:real</span><span class="o">=></span><span class="mf">1.000452</span><span class="p">,</span> <span class="ss">:cpu</span><span class="o">=></span><span class="mf">0.00041599999999997195</span><span class="p">,</span> <span class="ss">:idle</span><span class="o">=></span><span class="mf">1.000036</span><span class="p">}</span>
<span class="o">>></span> <span class="n">time</span><span class="p">{</span> <span class="mi">10000</span><span class="p">.</span><span class="nf">times</span><span class="p">{</span><span class="mi">2</span><span class="o">**</span><span class="mi">65536</span><span class="p">}</span> <span class="p">}</span> <span class="c1"># all cpu time</span>
<span class="o">=></span> <span class="p">{</span><span class="ss">:real</span><span class="o">=></span><span class="mf">0.21192</span><span class="p">,</span> <span class="ss">:cpu</span><span class="o">=></span><span class="mf">0.211714</span><span class="p">,</span> <span class="ss">:idle</span><span class="o">=></span><span class="mf">0.00020599999999998397</span><span class="p">}</span>
<span class="o">>></span> <span class="n">time</span><span class="p">{</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'http://google.com'</span><span class="p">).</span><span class="nf">read</span> <span class="p">}</span> <span class="c1"># mixed, mostly idle</span>
<span class="o">=></span> <span class="p">{</span><span class="ss">:real</span><span class="o">=></span><span class="mf">0.342832</span><span class="p">,</span> <span class="ss">:cpu</span><span class="o">=></span><span class="mf">0.05224400000000001</span><span class="p">,</span> <span class="ss">:idle</span><span class="o">=></span><span class="mf">0.290588</span><span class="p">}</span>
</code></pre></div></div>
<p>The method also takes an optional second argument <code class="language-plaintext highlighter-rouge">unit</code>, which can be <code class="language-plaintext highlighter-rouge">:millisecond</code>, <code class="language-plaintext highlighter-rouge">:microsecond</code>, <code class="language-plaintext highlighter-rouge">:nanosecond</code>, or a <code class="language-plaintext highlighter-rouge">:float_</code> variant thereof. See <a href="http://ruby-doc.org/core-2.1.0/Process.html#method-c-clock_gettime">the documentation</a> for more details.</p>
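<p>As a rough illustration of the <code>unit</code> argument, the sketch below reads the cpu clock as float milliseconds and a second clock as integer nanoseconds. <code>Process::CLOCK_MONOTONIC</code> is not used elsewhere in this post and is brought in here only as an example of another clock id; the <code>cpu_ms</code> helper is hypothetical.</p>

```ruby
# Cpu time consumed by this process, as a Float number of milliseconds.
def cpu_ms
  Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID, :float_millisecond)
end

start_ms = cpu_ms
100_000.times { |i| i * i } # a little cpu-bound work
elapsed_ms = cpu_ms - start_ms

# The non-float units return Integers instead.
monotonic_ns = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)

puts "cpu: #{elapsed_ms.round(3)}ms"
puts "monotonic: #{monotonic_ns}ns (#{monotonic_ns.class})"
```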
Ruby 2.1: Frozen String Literals2013-12-23T00:00:00+00:00http://tmm1.net/ruby21-fstrings<p>In Ruby 2.1, <code class="language-plaintext highlighter-rouge">"str".freeze</code> is optimized by the compiler to return a single shared frozen string on every invocation. An alternative <code class="language-plaintext highlighter-rouge">"str"f</code> syntax was implemented initially, but later reverted.</p>
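<p>One way to observe the optimization is to compare object identity across calls. This is a small sketch with made-up method names; it assumes an MRI of 2.1 or later.</p>

```ruby
def frozen_literal
  "str".freeze # compiled to return the shared frozen string
end

def plain_literal
  "str" # normally allocates a new String on each call
end

first = frozen_literal
second = frozen_literal

puts first.equal?(second)                 # identical object expected on 2.1+
puts first.frozen?
puts plain_literal.equal?(plain_literal)  # depends on frozen-string-literal settings
```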
<p>Although the external scope of this feature is limited, internally it is used extensively to de-duplicate strings in the VM. Previously, every <code class="language-plaintext highlighter-rouge">def method_missing()</code>, the symbol <code class="language-plaintext highlighter-rouge">:method_missing</code> and any literal <code class="language-plaintext highlighter-rouge">"method_missing"</code> strings in the code-base would all create their own String objects. With Ruby 2.1, only one string is <a href="https://bugs.ruby-lang.org/issues/9171">created and shared</a>. Since many strings are commonly re-used in any given code base, this easily adds up. In fact, large applications can expect up to <a href="https://bugs.ruby-lang.org/issues/9159">30% fewer long-lived strings</a> on their heaps in 2.1.</p>
<p>For 2.2, there are plans to <a href="https://bugs.ruby-lang.org/issues/9229">expose this feature via a new <code class="language-plaintext highlighter-rouge">String#f</code></a>. There’s also a proposal for a <a href="https://bugs.ruby-lang.org/issues/9278">magic <code class="language-plaintext highlighter-rouge">immutable: string</code> comment</a> to make frozen strings default for a given file.</p>
Ruby 2.1: Method Cache2013-12-22T00:00:00+00:00http://tmm1.net/ruby21-method-cache<p>For years, MRI cleared the entire VM’s method cache whenever a new method was defined. In fact, method definitions were only one of <a href="https://charlie.bz/blog/things-that-clear-rubys-method-cache">a dozen ways to clear the method cache</a>. Earlier this year, @jamesgolick decided to improve things with a <a href="http://jamesgolick.com/2013/4/14/mris-method-caches.html">patchset for 1.9 implementing hierarchical invalidation</a>. @charliesome subsequently <a href="http://bugs.ruby-lang.org/issues/8426">ported and committed the patchset</a> to trunk. Starting with Ruby 2.1, altering a class will only invalidate the caches for that class and its subclasses.</p>
<p>To provide visibility, we’ve exposed some basic stats about the method cache via a new <code class="language-plaintext highlighter-rouge">RubyVM.stat()</code> method. For instance, an application can measure <tt>“global invalidations per request”</tt> by comparing <code class="language-plaintext highlighter-rouge">RubyVM.stat(:global_method_state)</code> before and after every request. See <a href="https://github.com/simeonwillbanks/busted">simeonwillbanks/busted</a> for some convenience methods around these new stats.</p>
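<p>A before/after comparison could be wrapped in a small helper like the sketch below. The <code>invalidations_during</code> name is hypothetical. The keys reported by <code>RubyVM.stat</code> differ between ruby versions, so the sketch returns <code>nil</code> when <code>:global_method_state</code> is not available instead of raising.</p>

```ruby
# Runs the block and returns how much the given RubyVM.stat counter changed,
# or nil when this ruby does not expose that counter.
def invalidations_during(key = :global_method_state)
  supported = defined?(RubyVM) && RubyVM.respond_to?(:stat) && RubyVM.stat.key?(key)
  before = RubyVM.stat(key) if supported
  yield
  supported ? RubyVM.stat(key) - before : nil
end

# Stand-in for "one request": define a method on a throwaway class.
delta = invalidations_during do
  Class.new { def hello; end }
end

puts delta.inspect
```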
<p>To track down where (global and non-global) invalidations are happening, Ruby 2.1 ships with a new probe: <code class="language-plaintext highlighter-rouge">ruby::method-cache-clear</code>. This can easily be used via <a href="https://github.com/simeonwillbanks/busted/blob/master/dtrace/probes/examples/method-cache-clear.d">dtrace</a> or <a href="http://avsej.net/2012/systemtap-and-ruby-20/">systemtap</a> to find the source of method cache invalidations in your application.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">cat </span>ruby_mcache.stp
<span class="go">probe process("/usr/bin/ruby").mark("method__cache__clear") {
</span><span class="gp"> printf("%s(%d) %s %s:%d cleared `%s'\n", execname(), pid(), $</span><span class="nv">$name</span>, kernel_string<span class="o">(</span><span class="nv">$arg2</span><span class="o">)</span>, <span class="nv">$arg3</span>, kernel_string<span class="o">(</span><span class="nv">$arg1</span><span class="o">))</span>
<span class="go">}
</span><span class="gp">$</span><span class="w"> </span><span class="nb">sudo </span>stap ruby_mcache.stp
<span class="go">ruby(25410) method__cache__clear lib/ruby/2.1.0/ostruct.rb:169 cleared `OpenStruct'
ruby(25410) method__cache__clear lib/ruby/2.1.0/ostruct.rb:170 cleared `OpenStruct'
</span></code></pre></div></div>
<p>Work on improving ruby’s method cache is continuing on ruby-core. Early numbers show that improvements of 5-10% are possible with <a href="http://bugs.ruby-lang.org/issues/9262">a larger and more resilient cache</a>. We hope to land some improvements to ruby-trunk (for 2.2), and maybe backport them into a future 2.1 patchlevel release.</p>