What is this bitfield thing?

An array of true/false values. Usually a fixed-size array, due to platform constraints. Also a very compact array – taking exactly one bit to store each element. In other words: a boolean vector or set.

So does anyone use it?

Plenty of usage around, many of it inherited from history. Probably the most known bitfield is the POSIX access mode field, used by every filesystem on your UNIX-like OS:

$ LC_ALL=C ls -l /etc/passwd
-rw-r--r-- 1 root root 2473 Sep  1  2016 /etc/passwd

What?

I’m talking about this field where dashes mix with letters. It shows me that this file is readable and writable by its owner (first triplet after the initial dash: rw-), readable by its group (second triplet: r--) and also readable by other users (last triplet: r--). The first blank is used for displaying any special bits. Underneath this representation lies a bitfield:

Flag Octal value Purpose
S_ISUID 04000 Set user ID on execution
S_ISGID 02000 Set group ID on execution
S_ISVTX 01000 Sticky bit
S_IRUSR, S_IREAD 00400 Read by owner
S_IWUSR, S_IWRITE 00200 Write by owner
S_IXUSR, S_IEXEC 00100 Execute/search by owner
S_IRGRP 00040 Read by group
S_IWGRP 00020 Write by group
S_IXGRP 00010 Execute/search by group
S_IROTH 00004 Read by others
S_IWOTH 00002 Write by others
S_IXOTH 00001 Execute/search by others

How do your strange numbers work? Why do you use only 0, 1, 2 and 4?

Let’s look at bits grouped into threes (BTW: conveniently 2³ = 8 and octo is Latin for ‘eight’ – hence octal, also the number of legs on GitHub’s octocat).

000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7

We’re using these digits from 0 to 7 to represent such groups. If we expand 644 to ones and zeros by substituting from the above table, we get 110 100 100 (does it look familiar already?). Also, notice that for 1, 2 and 4 exactly one bit is set, each time in a different location. Adding together numbers which contain only zeros and exactly one of these three digits, at any position, will never carry (increment other columns). This is because 4 + 2 + 1 = 7, the maximum digit at any position. Now, let’s calculate the bitfield again:

field value
S_IRUSR 00400
S_IWUSR 00200
S_IRGRP 00040
S_IROTH 00004
Σ 00644

There, a quick primer on octal numbers.

And how does this translate to -rw-r--r-- again?

Start with the S_IRUSR bit position. Output a dash if bit is unset, otherwise output a letter matching the bit’s intent (r for read – an octal 4, w for write = 2, x for execute = 1). The leading dash is for some additional bits that we’re not going to cover here. Doing that for all eight possible combinations results in this table:

000 = 0 = ---
001 = 1 = --x
010 = 2 = -w-
011 = 3 = -wx
100 = 4 = r--
101 = 5 = r-x
110 = 6 = rw-
111 = 7 = rwx

Other popular values used here are 600, 400 and 777. See if you can figure out what they mean.

You said it’s an array

It’s a compact representation of one. Let’s convert an array to bits, holding on to our filesystem permissions example.

irb> perms = [false, false, false, # special bits
              true,  true,  false, # owner
              true,  false, false, # group
              true,  false, false] # others

irb> perms.inject(0) { |acc, bit| acc << 1 | (bit ? 1 : 0) }
=> 420

That’s not 644

IRb gave us the answer in decimal notation. Let’s ask it for octal again.

irb> 420.to_s(8)
644

Or binary even

irb> 420.to_s(2)
=> "110100100" # Looks familiar?

Okay. And how do I decode the number?

irb> 0o644.to_s(2).rjust(12, "0").each_char.map { |b| b.to_i == 1 }                                        
=> [false, false, false, true, true, false, true, false, false, true, false, false]

Ruby even gives us these bits (in MSB-first-order – starting from the rightmost one) when indexing an integer. Less magic involved.

irb> (0...12).reverse_each.map { |i| 0o644[i] == 1 }
=> [false, false, false, true, true, false, true, false, false, true, false, false]

Prefixing numbers with 0o is a common notation for octals. I’m left padding the string to 12 characters with zeros here to output the unset special bits flags.

And why would I want to use that? Seems complicated

Whenever you have a large number of boolean values, and don’t want to waste storage. This was obviously more relevant in the olden days of tight memory constraints, but still is when programming microcontroller and embedded systems. Not so much on modern hardware, though. If you can use Ruby, you probably don’t have constraints forcing you to use bitfields.

Another use is when you need to compactly represent a set, where allowed values come from a small list. By assigning a single bit to each permitted value, and using its status to signal presence, you can fit up to 64 set members in a single register. (How many members does a set represented by 0644 have?)

It’s also very efficient in terms of memory access – the processor only needs to fetch one location to work on all of these flags, and compilers are very good at optimizing such access. Therefore you will see it used often in kernel code, memory management, network protocols and games.

I have worked on a large legacy application, where the godobject contains a state machine utilizing many booleans. Instead of having a TINYINT(1) field for each of these, resulting in a table with over 50 columns, all of them are stashed into two 32-bit wide bitfields. Given a million records in the table, the overhead of storing each of them separately was negligible when it was designed, and is definitely negligible today. But it does have the advantage of being less cluttered and easier to understand.

Do other languages use these?

Certainly. Rust has a nice crate for that:

#[macro_use]
extern crate bitflags;

bitflags! {
    struct Flags: u32 {
        const A = 0b00000001;
        const B = 0b00000010;
        const C = 0b00000100;
        const ABC = Self::A.bits | Self::B.bits | Self::C.bits;
    }
}

fn main() {
    let e1 = Flags::A | Flags::C;
    let e2 = Flags::B | Flags::C;
    assert_eq!((e1 | e2), Flags::ABC);   // union
    assert_eq!((e1 & e2), Flags::C);     // intersection
    assert_eq!((e1 - e2), Flags::A);     // set difference
    assert_eq!(!e2, Flags::A);           // set complement
}

Historically, C programmers usually defined constants plus accessor macros. An excerpt from Linux code linked above, mind the octal notation again, this time represented with a leading zero:

#define S_IFMT   0170000
#define S_IFLNK  0120000
#define S_IFREG  0100000
#define S_IFDIR  0040000
#define S_IFCHR  0020000
#define S_IFIFO  0010000
#define S_ISUID  0004000
#define S_ISGID  0002000
#define S_ISVTX  0001000

#define S_ISLNK(m)	(((m) & S_IFMT) == S_IFLNK)
#define S_ISREG(m)	(((m) & S_IFMT) == S_IFREG)
#define S_ISDIR(m)	(((m) & S_IFMT) == S_IFDIR)

The first one is a mask for extracting only the relevant bits (bits 13-16 counting from LSB), others are bitfield values.

Modern C has storage specifiers for struct fields. You’d declare a struct type for the permissions field, and for each single permission declare it with a bit length of one. Your compiler will then squish them together as efficiently as possible, preserving order. Then just access it like any other struct member.

struct posix_perm {
  unsigned irusr:1;
  unsigned iwusr:1;
  unsigned ixusr:1;
  unsigned irgrp:1;
  unsigned iwgrp:1;
  unsigned ixgrp:1;
  unsigned iroth:1;
  unsigned iwoth:1;
  unsigned ixoth:1;
}

Rob Pike, designer of Go hates these C bitfields and there is no built-in support, though people have created packages for that.

Erlang and Elixir have a similar concept, and a package to work with them.

Okay, show me how to work with these in a Ruby project. I’m feeling adventurous

Let’s start with Sequel. There’s a battle-tested plugin for it.

class MyModel < Sequel::Model
  plugin :bit_fields, :status_bits, [ :started, :finished, :reviewed ]
end

In migrations you’d use a Bignum (64 bit) column by default, though it will tolerate longer and shorter columns.

Sequel.migration do
  change do
    alter_table :my_models do
      add_column :status_bits, Bignum, :default => 0, :null => false
    end
  end
end

Then you just use the declared bitfield names as if they were regular boolean fields, with some extra features such as scopes.

model = MyModel.create

model.started?        # => false
model.started = true
model.started?        # => true
model.status_bits     # => 1

model.finished?       # => false
model.finished = true
model.finished?       # => true
model.status_bits     # => 3

# let's find all the finished instances
MyModel.finished(true).all

# let's find all unfinished instances
MyModel.finished(false).all

# let's find all the started and the finished instances
MyModel.started.finished.all

These field lookups are translated to bitwise operations. The last line may translate to a query like:

SELECT * from `my_models` WHERE (`status_bits` & 1 == 1) AND (`status_bits` & 2 == 2)

Aha, I see how it’s using these specially bit-positioned numbers. But why the &?

That is bitwise AND. It works by treating the right argument as a mask, and returning only bits from left argument for which 1 was set in the mask. All other bits will be unset. The only two options from x & 2 are either 0 or 2, and by comparing we know whether it’s set or unset.

Clever. How do I set a single bit?

model.finished = true; model.save
UPDATE `my_models` SET `status_bits` = `status_bits` | 2 WHERE id = 424242

This is a bitwise OR, which modifies its left argument by setting bits specified by the right argument; regardless if they were set or not.

And unset?

model.finished = false; model.save
UPDATE `my_models` SET `status_bits` = `status_bits` & ~2 WHERE id = 424242

Actually, this plugin would just calculate the new value and set it directly. The above does a mask again, but this time with a one’s complement of value 2. This is just the value with all bits flipped. This will keep values at all other bits unchanged, but set a zero at this one.

What is this Sequel thing, give me the ActiveRecord version

The most popular analog for ActiveRecord is bitfields. It’s really similar:

class User < ActiveRecord::Base
  include Bitfields
  bitfield :my_bits, 1 => :seller, 2 => :insane, 4 => :sensible
end

And for migrations:

t.integer :my_bits, default: 0, null: false
# OR
add_column :users, :my_bits, :integer, default: 0, null: false

New attributes and scopes work almost like the Sequel version. One difference is that scope methods don’t accept a boolean param to search for that flag set or unset. Rather, the unset version is in a separate scope, prefixed with not_.

So how many of these attributes can I fit in a single bitfield?

Typically up to 64. The Sequel example above does not explicitly assign numeric values to specific attributes (it certainly supports that), while the ActiveRecord one does. Let’s consider how the required numbers work:

bitfield :my_bits,
  1 => :seller,
  2 => :insane,
  4 => :sensible,
  # ...
  128 => :manipulative
  # ...
  1048576 => :closed_on_wednesdays # That's the 21st flag
  2097152 => :closed_on_thursdays
  # ...
  4294967296 => :likes_muffins # 31st
  # ...

What’s the last number needed for all 64 flags? And for 500 flags? Generally this is only feasible for small, easily comprehended numbers of flags.

Is there native database support for this?

PostgreSQL has bit strings, since basically forever. These match what we are doing here very well, with some additional semantics.

Binary strings are different, designed to be more like regular strings but with arbitrary binary data. But you can use get_bit and set_bit functions on them to get more or less the same behavior.

MySQL has BIT as well, matching the PostgreSQL counterpart. It also has SET, which describes a set containing predefined values, but is a bitfield underneath. Presentation is done with a comma-separated string, while any math and aggregation is numeric.

But none of the two plugins above work with any of these. That’s because they target the lowest common denominator, which is bitwise operations on integers – available everywhere.