Introduction

Why can a block see local variables defined before it? Why can it change them? What kind of sorcery is this? I will try to answer these questions in this post. We will look at examples of blocks and at the hidden hero behind the magic – the binding object.

In our magical journey, we will look into some Ruby code and some CRuby internals to help us better understand the magic.

Great magicians never reveal their secrets – but thankfully Ruby is open source, so we can lift the veil and see the real truth behind it. And believe me, it’s quite beautiful!

A simple block example

We use blocks a lot in our everyday work. But have you ever asked yourself how they work? Let’s start with a trivial example:

a = 2
10.times do
  a += 1
end
puts a

The value of a after this code runs (and therefore the output) is quite obvious – of course it’s 12. But what if we complicate it a little?

class BlockTest 
  def self.add_method
    local = 2
    define_method :m do 
      local
    end
  end
end

b = BlockTest.new
BlockTest.add_method
b.m

Now things get a little more obscure. The result is 2, but why? Before answering that, we need to ask ourselves one question.

What is a block?

In Ruby a block (i.e., the construction which gave us such a headache in the last example) consists of two parts: the code (lambda – as in lambda calculus) and the environment. We call this combination a closure.
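A minimal sketch of those two parts, assuming a hypothetical make_counter helper (it is not part of the original examples). The lambda body is the code, the local variable count is the environment it carries, and Proc#binding (the binding object is introduced later in this post) lets us peek at that environment:

```ruby
# Hypothetical helper: returns a lambda that closes over the local `count`.
def make_counter
  count = 0
  -> { count += 1 } # code: the lambda body; environment: `count`
end

first  = make_counter
second = make_counter

first.call
first.call
second.call

# Each call to make_counter created its own environment.
p first.binding.local_variables              # => [:count]
p first.binding.local_variable_get(:count)   # => 2
p second.binding.local_variable_get(:count)  # => 1
```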

When it comes to the code stored in a closure, Ruby gives us quite a helpful tool for looking behind the scenes. As you may know, when Ruby code runs on CRuby it is first compiled into YARV instructions, a kind of assembly language. We will use a Ruby-provided tool to see how this compiled code is stored. Maybe that will give us some hints about the code that lives within a closure?

p = proc do
  a = 2
  10.times do
    a += 1
  end
end
puts RubyVM::InstructionSequence.disasm(p)

We get the below output (if you’re running the code along with me, you may get a slightly different result):

== disasm: <RubyVM::InstructionSequence:block in irb_binding@(irb)>=====
0004 putobject        2
0006 setlocal         a, 3
0011 putobject        10
0013 send             <callinfo!mid:times, argc:0, block:block (2 levels) in irb_binding>
0017 leave

== disasm: <RubyVM::InstructionSequence:block (2 levels) in irb_binding@(irb)>
0004 getlocal         a, 4
0007 putobject_OP_INT2FIX_O_1_C_
0008 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>
0010 dup
0011 setlocal         a, 4

Wait a second – getlocal, setlocal? If these are local variables, why can I access them outside of the block?
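To check this on your own Ruby, here is a small sketch that filters the disassembly down to the instructions touching local variables. The helper name local_variable_instructions is made up for this example, and it assumes CRuby (RubyVM is not available on other implementations). Instruction names and operands differ between versions, so no exact output is shown:

```ruby
# Hypothetical helper: returns the disassembly lines that read or write locals.
# Assumes CRuby; instruction names and operands vary between Ruby versions.
def local_variable_instructions(callable)
  listing = RubyVM::InstructionSequence.disasm(callable)
  listing.lines.grep(/\b[gs]etlocal/).map(&:strip)
end

counter_proc = proc do
  a = 2
  10.times do
    a += 1
  end
end

# The listing covers the nested block too, so both the outer assignment
# and the inner read/write of `a` show up here.
puts local_variable_instructions(counter_proc)
```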

Are you trying to cheat me again?

The answer to that magic trick is stored in the second part of the definition. As we said before, a closure consists of both code and environment. But before we dig deeper, we need to look into YARV itself (if you’re not familiar with YARV, you may want to look into the whitepaper describing it).

What is YARV?

YARV is a virtual machine (written in C) designed to run your Ruby code fast (well, as fast as currently possible). It is a fairly classic VM that runs a kind of assembly language (part of which you saw a little earlier). Most importantly for us, it is a virtual machine with a stack and a heap. It holds the current execution state in something (actually a C struct) called a stack frame. As you may guess, that means the local variables are held on the stack.

One fun fact about the stack: when we return from a function, the whole region of memory allocated for that call gets popped off the stack (and is therefore discarded). So we need some way to keep the variables from disappearing when the function returns. If a lambda or proc (a first-class citizen that can outlive the call that created it) uses or changes variables defined outside of it, we need some way to store the environment for later reuse, as a closure requires.

Here comes our secret hero – binding

Let’s take a look at a simple binding object:

2.2.4 :048 > binding
 => #<Binding:0x007ff5f9930fe8>
2.2.4 :049 > binding.class
 => Binding
2.2.4 :050 > binding.class.superclass
 => Object

Maybe its class hierarchy will be a little bit more interesting?

[Figure: class hierarchy of Binding]

When looking at binding only from that perspective – it seems dull. But don’t let that fool you! This small little hero holds the full context of execution for our block/lambda/proc. Unfortunately (or fortunately), to get a grip on how it works (and how it’s created) we need to dive deep into the beautiful world of C – the language in which CRuby is written.

Time for some real meat

So, let’s take a look at the source code of CRuby. Our first trail leads to proc.c:

// proc.c
static VALUE
rb_f_binding(VALUE self)
{
    return rb_binding_new();
}

This function implements Ruby’s Kernel#binding method – therefore when we call binding on any object, it’s this C function that gets called. Let’s look deeper:

// proc.c
VALUE
rb_binding_new(void)
{
    rb_thread_t *th = GET_THREAD();
    return rb_vm_make_binding(th, th->cfp);
}

OK, so this is what we expected. The VM is involved, so what happens there? (The code is abridged to focus on vital parts.)

// vm.c
VALUE
rb_vm_make_binding(rb_thread_t *th, const rb_control_frame_t *src_cfp)
{
    // …
    rb_binding_t *bind;
    // …
    envval = vm_make_env_object(th, cfp);
    // …
    vm_bind_update_env(bind, envval);
}
// …
static VALUE
vm_make_env_object(rb_thread_t *th, rb_control_frame_t *cfp)
{
    VALUE envval = vm_make_env_each(th, cfp);
    // …
}
static VALUE
vm_make_env_each(rb_thread_t *const th, rb_control_frame_t *const cfp)
{
    // …
    /*
     * # local variables on a stack frame (N == local_size)
     * [lvar1, lvar2, ..., lvarN, SPECVAL]
     *                            ^
     *                            ep[0]
     *
     * # moved local variables
     * [lvar1, lvar2, ..., lvarN, SPECVAL, Envval, BlockProcval (if needed)]
     *  ^                         ^
     *  env->env[0]               ep[0]
     */

    env_size = local_size +
           1 /* envval */ +
           (blockprocval ? 1 : 0) /* blockprocval */;
    env_body = ALLOC_N(VALUE, env_size);
    MEMCPY(env_body, ep - (local_size - 1 /* specval */), VALUE, local_size);
    // …
    env_ep = &env_body[local_size - 1 /* specval */];
    // …
    env = vm_env_new(env_ep, env_body, env_size, env_iseq);
    // …
    return (VALUE)env;
}

Ok, so what have we learned from these pieces of C code?

  • When we call binding anywhere, Ruby calls the rb_f_binding C function (creating a lambda/proc captures its environment in a similar way underneath),
  • the VM copies the local variables of the current frame to the heap (so they survive after the frame is popped),
  • from then on, the frame refers to the heap copy, so the running code and the binding see the same variables.

Enough talking, how can I use it?

Binding gives us some exciting methods. For example, we can list all of the local variables for a specific binding:

def get_binding
  a = 2
  b = 3
  binding
end

get_binding.local_variables # => [:a, :b]
get_binding.local_variable_get(:a) # => 2

As said before, creating a binding moves the variables to the heap and updates the references accordingly. Therefore, as long as we’re still in the scope the binding was created in, any change we make to a local variable is also visible through the binding (or lambda) created earlier.

def test_lambda
  a = 10
  l = lambda { a }
  a = "OH HAI"
  l
end
test_lambda.call #=> "OH HAI"
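The same live link can be observed with an explicit binding, and it works in both directions. A small sketch (the method name live_link_demo is made up for this example):

```ruby
# Hypothetical demo method: the binding is a live view of the frame's locals,
# not a snapshot taken when `binding` was called.
def live_link_demo
  x = 1
  b = binding
  x = 2                                    # change made after the binding was created
  seen_through_binding = b.local_variable_get(:x)
  b.local_variable_set(:x, 3)              # change made through the binding
  [seen_through_binding, x]
end

p live_link_demo # => [2, 3]
```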

Now that we know that, it’s time for a quick quiz:

a = 0
add = lambda { a += 1 }
sub = lambda { a -= 1 }  
add.call
sub.call
add.call
sub.call
add.call
a # => ?

What will be the value of a?

Of course, it will be 1 – both lambdas are created in the same scope. Therefore they operate on the same binding (created implicitly with the add lambda).
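To make the sharing visible, here is a sketch that wraps the two lambdas in a hypothetical make_pair method. Lambdas created in the same call share one environment, while a second call gets a fresh one:

```ruby
# Hypothetical helper: both lambdas close over the same local `a`.
def make_pair
  a = 0
  [lambda { a += 1 }, lambda { a -= 1 }]
end

add, sub  = make_pair
other_add = make_pair.first # created by a separate call, so a separate `a`

add.call
add.call
sub.call
3.times { other_add.call }

p add.binding.local_variable_get(:a)       # => 1
p sub.binding.local_variable_get(:a)       # => 1
p other_add.binding.local_variable_get(:a) # => 3
```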

One may ask: if all lambdas operate on the same copy of a stack frame, can we change the binding we’re working in? The answer is… almost.

def test
  set = lambda { new_var = 10 }
  set.call
  new_var
end
test
# => NameError: undefined local variable or method `new_var' for main:Object

We can’t do this because the variable wasn’t present in the original stack frame, so it isn’t present in the method’s lexical scope. It is created in another lexical scope: the scope of the lambda. This is very desirable behavior, as it prevents variables from leaking out of lambdas into other scopes (JavaScript: I’m looking at you…).
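Two related cases are sketched below with made-up method names. If the variable already exists in the enclosing scope, the lambda assigns to it. If we explicitly want the opposite, a block-local variable (declared after the semicolon in the parameter list) keeps the lambda from touching the outer one:

```ruby
# The variable exists before the lambda is created, so the lambda shares it.
def with_predeclared_variable
  new_var = nil
  set = lambda { new_var = 10 }
  set.call
  new_var
end

# `|;value|` declares a block-local variable that shadows the outer `value`.
def with_block_local_variable
  value = 1
  shadow = lambda { |;value| value = 99 }
  shadow.call
  value
end

p with_predeclared_variable # => 10
p with_block_local_variable # => 1
```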

What if we call a binding from inside of an object?

class BindingTest
  def initialize
    @a = 1
  end
  
  def get_binding
    binding
  end
end

b = BindingTest.new.get_binding
b.receiver
# => #<BindingTest:0x007ff5fa285c10 @a=1>

We can get the self of this binding, or – in other words – the default receiver of all messages in the context in which this binding was created.
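Because the binding remembers self, code evaluated in it can also reach the object's instance variables. A short sketch reusing the BindingTest class from above (repeated here so the snippet runs on its own):

```ruby
class BindingTest
  def initialize
    @a = 1
  end

  def get_binding
    binding
  end
end

obj = BindingTest.new
b = obj.get_binding

p b.receiver.equal?(obj)     # => true
p b.eval("@a")               # => 1, instance variables are resolved against self
p b.eval("self").equal?(obj) # => true
```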

What else can we do with a binding?

For example, you can eval some code providing a custom environment:

def get_binding
  a = 2
  b = 3
  binding
end
eval('a + b', get_binding) # => 5

Or even define new variables in a binding:

def get_binding
  a = 2
  b = 3
  binding
end
b = get_binding
eval('c = a + b', b)
b.local_variable_get(:c) # => 5
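The same can be done without eval, using Binding#local_variable_set and Binding#local_variable_defined?. The sketch below also shows that a variable added this way belongs to one particular binding object; a binding from another call of get_binding does not see it:

```ruby
def get_binding
  a = 2
  b = 3
  binding
end

first = get_binding
first.local_variable_set(:c, 5)  # defines a new variable inside this binding
first.local_variable_set(:a, 20) # overwrites an existing one

second = get_binding             # a fresh call, so a fresh environment

p first.local_variable_defined?(:c)  # => true
p first.eval("a + b + c")            # => 28
p second.local_variable_defined?(:c) # => false
p second.local_variable_get(:a)      # => 2
```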

When we think about it, access to local variables is also quite convenient in templates – this is how ERB does its thing:

require 'erb'

template =<<~EOF
  <html>
    <%= variable %>
  </html>
EOF
variable = 10
erb = ERB.new(template)

erb.result
#=> NameError: undefined local variable or method 'variable' for main:Object

b = binding
erb.result(b)

The output is obvious:

<html>
  10
</html>

And what is going on behind the scenes?

# erb.rb:856
def result(b=new_toplevel)
  if @safe_level
    proc {
      $SAFE = @safe_level
      eval(@src, b, (@filename || '(erb)'), @lineno)
    }.call
  else
    eval(@src, b, (@filename || '(erb)'), @lineno)
  end
end

We’re eval-ing the given template in the scope of the provided binding. It’s a typical use case.
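A common way to use this is to render a template with a binding taken inside a method, so the template sees both the object's instance variables and the method's locals. A minimal sketch, assuming a hypothetical Page class:

```ruby
require 'erb'

# Hypothetical class: the template is evaluated in the binding of #render,
# so it can use @title (via self) and the local variable `items`.
class Page
  def initialize(title)
    @title = title
  end

  def render(template)
    items = %w[one two]
    ERB.new(template).result(binding)
  end
end

template = "<h1><%= @title %></h1><p><%= items.join(', ') %></p>"
puts Page.new("Bindings").render(template)
# prints: <h1>Bindings</h1><p>one, two</p>
```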

TracePoint

For our final usage of binding let’s take a quick look at TracePoint.

TracePoint is a relatively new object-oriented API that allows you to hook into some of YARV’s events, thereby providing excellent debugging facilities (and it’s also fun). Let’s take a look at a simple example. We have this (straightforward) code:

class Cat
  def to_s
    a = 1
    b = 2
    puts "CAT"
  end
end

and this TracePoint addition to it:

bindings = []
trace = TracePoint.new(:return) do |tp|
  puts tp.path
  puts tp.lineno
  bindings << tp.binding
end
trace.enable
c = Cat.new
c.to_s
trace.disable
c = nil

It means that whenever any method returns (don’t try running this in IRB, trust me), we print the path of the file the method is defined in and the line number at which the return happened. We also store all bindings for further inspection in an array called bindings. The effect? It’s quite nice:

p bindings.first.local_variables
# [:a, :b]
p bindings.first.local_variable_get(:a) # 1
p bindings.first.local_variable_get(:b) # 2
p bindings.first.receiver
# #<Cat:0x007fdd53854868>

We can quickly inspect the internal state of local variables when the method returns.
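If holding on to whole bindings is more than we need, we can copy out just the values at the moment of return. The sketch below uses a made-up helper, capture_returns, together with the block form of TracePoint#enable, so tracing is active only while the given block runs. Rect is a hypothetical class used only for illustration:

```ruby
# Hypothetical helper: traces :return events while the block runs and records
# the locals and return value of methods with the given name.
def capture_returns(method_name)
  snapshots = []
  trace = TracePoint.new(:return) do |tp|
    next unless tp.method_id == method_name
    b = tp.binding
    locals = b.local_variables.map { |name| [name, b.local_variable_get(name)] }.to_h
    snapshots << { locals: locals, return_value: tp.return_value }
  end
  trace.enable { yield } # tracing is switched off again when the block ends
  snapshots
end

class Rect
  def area(w, h)
    result = w * h
    result
  end
end

snaps = capture_returns(:area) { Rect.new.area(3, 4) }
p snaps.first[:locals]       # the locals at return: w is 3, h is 4, result is 12
p snaps.first[:return_value] # => 12
```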

But what if we want to catch a specific method? Well, we can always combine the TracePoint API with the power of a debugger (such as Pry):

require 'pry'

class Cat
  def traced_m
    a = 1
    b = 2
    puts "CAT"
  end
end

trace = TracePoint.new(:return) do |tp|
  tp.binding.pry if tp.method_id == :traced_m
end
trace.enable
c = Cat.new
c.traced_m

This snippet starts a Pry session when returning from a method, but only one with the specified name (here traced_m). It’s a pretty nice tool for inspecting the state of local variables at a very specific moment. Of course, you can simply put a breakpoint there – but hey, it’s more fun that way!

Conclusion

In this article, I showed you a part of the beautiful world of bindings in Ruby. We’ve learned how blocks work inside, why they work, and what a closure is. Quite a nice ride!

If you want to learn more about similar Ruby internals, I can’t recommend enough a great book by Pat Shaughnessy – Ruby Under a Microscope. It made me fall in love with Ruby internals; I hope you will enjoy it as much as I did!

A small fun fact for the end: a proc and a lambda are the same C struct (the only difference is one flag).
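From the Ruby side, that flag is what Proc#lambda? reports. A quick sketch showing that both objects are instances of Proc and that the flag changes how strictly arguments are checked:

```ruby
plain_proc = proc   { |x| x }
a_lambda   = lambda { |x| x }

p plain_proc.class   # => Proc
p a_lambda.class     # => Proc
p plain_proc.lambda? # => false
p a_lambda.lambda?   # => true

# A proc silently drops extra arguments...
p plain_proc.call(1, 2) # => 1

# ...while a lambda checks its arity like a method does.
begin
  a_lambda.call(1, 2)
rescue ArgumentError => e
  puts "lambda rejected the call: #{e.message}"
end
```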