Introduction

This blog post shows how to build a simple, object-oriented solution for complicated background job workflows using Active Job. Don’t worry if background jobs or Active Job are new to you; both are covered in the introduction.

Background jobs

The request lifecycle of a web application is not particularly friendly to time-consuming computations. Sometimes you need more time than the server will give you before timing out. Users are not particularly friendly either when you keep them waiting. So whenever it makes sense to respond quickly and finish the work outside of the request lifecycle, go for it.

How do you achieve this? A background job mechanism can be seen as a queue (or many parallel queues). A job is not executed within the request lifecycle; it is added to a queue instead and executed whenever the server has resources to do so. A typical example of a background job is sending an email after a user registers. The application only creates the job (which takes almost no time) and responds quickly. The job is then handled by a separate process as soon as resources are available.

A really quick example of such a job:

class WelcomeEmailJob < ActiveJob::Base
  queue_as :default

  def perform(user_id)
    UserMailer.welcome_email(user_id).deliver_now
  end
end

And then in your controller you could write:

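class UsersController < ApplicationController
  def create
    @user = User.create(user_params)
    WelcomeEmailJob.perform_later(@user.id)
    respond_with @user
  end

  private

  # Strong parameters; the permitted attribute is an assumption for this example.
  def user_params
    params.require(:user).permit(:email)
  end
end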

It’s that simple! Depending on the mailer settings we just saved someone up to a few seconds.
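
To make the queue idea concrete without any library, here is a minimal sketch in plain Ruby. It is not how Active Job or any backend is implemented; TinyJobQueue and its methods are hypothetical names, and a thread stands in for the separate worker process.

```ruby
# Plain-Ruby sketch of a background job queue (no Rails, no Active Job).
# TinyJobQueue is a hypothetical name used only for illustration.
class TinyJobQueue
  def initialize
    @jobs = Queue.new                    # thread-safe FIFO from Ruby core
    @worker = Thread.new { work_loop }   # a thread plays the "separate process"
  end

  # Called from the "request": only stores the job and returns immediately.
  def enqueue(&job)
    @jobs << job
    nil
  end

  # Closes the queue and waits until the worker has drained it.
  def shutdown
    @jobs.close
    @worker.join
  end

  private

  def work_loop
    # Queue#pop returns nil once the queue is closed and empty.
    while (job = @jobs.pop)
      job.call
    end
  end
end

sent = []
queue = TinyJobQueue.new
queue.enqueue { sleep 0.05; sent << "welcome email for user 1" }
# The caller could respond here; the slow work happens on the worker thread.
queue.shutdown
```

A real backend adds persistence, retries and multiple workers on top of this, which is exactly why you would not write it yourself.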

ActiveJob

The example I’ve just shown is based on Active Job - a library (part of Rails since 4.2) that provides a common interface for the popular background job libraries such as Resque, Delayed Job and Sidekiq. If you don’t need any library-specific features, Active Job makes the choice of background job library less significant: you can switch easily if needed. If you want to know more about Active Job, please refer to the awesome Rails Guides.

Problem

Let’s say we have a bunch of bicycles which have to be prepared for riding (serviced and cleaned). There is a coordinator who knows which bikes need to be prepared, and as soon as all of them are prepared, a notification needs to be sent.

Simple, isn’t it? Well, let’s make our life harder and add a little twist. Servicing a bike is done by an external service (so we have to assume it can fail from time to time), and cleaning is time-consuming for our server. So the goal is to avoid repeating a step that already succeeded for a bike when something else fails, and especially to avoid preparing any bike twice. It would also be good to have some kind of monitoring.

Solution

It feels natural to divide that process into smaller parts, and a background job processing library seems like the right choice. However, while background job libraries do a great job with independent or simple jobs, batch or multi-step jobs usually require third-party libraries.

After looking into possible solutions I chose to write one myself. The main reasons for my choice are:

I don’t want to be dependent on any library,
I need to handle retries carefully and almost none of the libraries mention if and how they handle retries for complicated workflows,
I want to be able to examine the state of the process easily in case it fails,
I am not too enthusiastic about DSLs used in libraries I found (I am most often not too enthusiastic about DSLs in general, so that may be only my paranoia).

Implementation

Let’s start with a regular Rails app with Bike and Coordinator classes and define their associations. We can run the following commands to create both classes:

rails new bikes
cd bikes
rails g model coordinator state:string
rails g model bike coordinator:references state:string
rake db:migrate

# app/models/bike.rb
class Bike < ActiveRecord::Base
  belongs_to :coordinator
end

# app/models/coordinator.rb
class Coordinator < ActiveRecord::Base
  has_many :bikes
end

You are probably wondering why we need a state column on both models; I promise we will get to that later. Now we can implement the classes that do the actual hard work:

# app/classes/mechanic.rb
class Mechanic
  def service(bike)
    sleep 1
  end
end

# app/classes/cleaner.rb
class Cleaner
  def clean(bike)
    sleep 1
  end
end

Not a really productive mechanic and cleaner, right? You can replace sleep 1 with fancy code of your own, but I will just assume that some heavy computation or an external request is happening there.

Since I decided to process those tasks in the background, let’s put Active Job on top of that! We will need jobs to service and clean the bike first.

rails generate job ServiceBike
rails generate job CleanBike

The generator created job classes with an empty perform method. We will fill in those classes with a real implementation of perform:

# app/jobs/clean_bike_job.rb
class CleanBikeJob < ActiveJob::Base
  queue_as :default

  def perform(bike_id)
    bike = Bike.find(bike_id)
    Cleaner.new.clean(bike)
  end
end

# app/jobs/service_bike_job.rb
class ServiceBikeJob < ActiveJob::Base
  queue_as :default

  def perform(bike_id)
    bike = Bike.find(bike_id)
    Mechanic.new.service(bike)
  end
end

The Active Job setup is finished, but Active Job is only an interface to a background job library, so we need to pick one. I like Sidekiq, so let’s use it. To set it as the Active Job backend we need to add gem 'sidekiq' to the Gemfile and one line to the application config:

# config/application.rb
module Bikes
  class Application < Rails::Application
    config.active_job.queue_adapter = :sidekiq
  end
end

Please refer to the Sidekiq wiki for more information. It is worth reading at least the basics and enabling monitoring so you can see what happens to processed jobs. You can run Sidekiq by executing bundle exec sidekiq on the command line.

Now that we have all the pieces, let’s put them together. The idea is quite simple: each job fires the next job as soon as it finishes. The state column is updated along the way, so we can always check which step we are on and, more importantly, never fire any job twice. I will use the “Acts As State Machine” (AASM) library to define these transitions easily. If you are new to state machines or to AASM you can look at the AASM readme, but the example uses only basic AASM functions. To use AASM, add gem 'aasm' to the Gemfile.

class Bike < ActiveRecord::Base
  include AASM

  belongs_to :coordinator

  aasm column: :state do
    state :new, initial: true
    state :servicing
    state :cleaning
    state :ready

    event :service, after_commit: :schedule_servicing do
      transitions from: :new, to: :servicing
    end

    event :clean, after_commit: :schedule_cleaning do
      transitions from: :servicing, to: :cleaning
    end

    event :finish do
      transitions from: :cleaning, to: :ready
    end
  end

  def schedule_servicing
    ServiceBikeJob.perform_later(self.id)
  end

  def schedule_cleaning
    CleanBikeJob.perform_later(self.id)
  end
end

AASM lets you define states and events (which fire a transition from one state to another) using a simple DSL. I defined four states: new (the initial state, set when the model is created), servicing, cleaning and ready. I also defined three events: service, clean and finish. Each event adds a method named after the event, and calling that method runs the transition defined in the event. So if I call bike.service, the state changes from new to servicing. If I add a bang, i.e. bike.service!, it also saves the model. Calling an event from a state it does not allow raises AASM::InvalidTransition, which is what stops a step from being fired twice. The last thing worth noticing is the after_commit callbacks: whatever is defined in such a callback is executed after the state has changed and the change has been committed.
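
If the DSL feels like magic, the states and events described here can be sketched by hand in a few lines of plain Ruby. This is only an illustration of the idea, not how AASM is implemented; TinyBike is a hypothetical stand-in for the Bike model with no database behind it.

```ruby
# Hand-rolled sketch of the states and events described above.
# NOT AASM's implementation; TinyBike is a hypothetical, database-free stand-in.
class TinyBike
  InvalidTransition = Class.new(StandardError)

  # event name => [state required before, state after]
  EVENTS = {
    service: [:new, :servicing],
    clean:   [:servicing, :cleaning],
    finish:  [:cleaning, :ready]
  }.freeze

  attr_reader :state

  def initialize
    @state = :new # initial state
  end

  # Define one method per event: bike.service, bike.clean, bike.finish.
  EVENTS.each do |event, (from, to)|
    define_method(event) do
      unless state == from
        raise InvalidTransition, "cannot #{event} a bike in state #{state}"
      end
      @state = to
    end
  end
end

bike = TinyBike.new
bike.service
bike.state # => :servicing

begin
  bike.service # firing the same event twice is rejected
rescue TinyBike::InvalidTransition => e
  rejected = e.message
end
```

The rejection of the second bike.service call is the property the rest of the post relies on: a step cannot be started twice for the same bike.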

Now, will the Bike class work as it stands? Almost. Calling bike.service! will indeed fire ServiceBikeJob, but nothing will happen afterwards. Let’s fix that. We need to fire the next event in each background job:

# app/jobs/service_bike_job.rb
class ServiceBikeJob < ActiveJob::Base
  queue_as :default

  def perform(bike_id)
    bike = Bike.find(bike_id)
    Mechanic.new.service(bike)
    bike.clean!
  end
end

# app/jobs/clean_bike_job.rb
class CleanBikeJob < ActiveJob::Base
  queue_as :default

  def perform(bike_id)
    bike = Bike.find(bike_id)
    Cleaner.new.clean(bike)
    bike.finish!
  end
end

Now the code should work as expected. Something is not quite right though. The jobs know too much about the process. What if we want to change the order of execution? Would we have to change the Bike model or the job classes?

The answer is that Bike should be the class that knows the order of transitions, not the job classes. Let’s fix this by defining a few more methods on the Bike class:

class Bike < ActiveRecord::Base
  # ...

  def prepare
    service!
  end

  def finished_servicing
    clean!
  end

  def finished_cleaning
    finish!
  end
end

This way we can call bike.prepare when we want to start the process, bike.finished_servicing in ServiceBikeJob and bike.finished_cleaning in CleanBikeJob. The messages are now clear: a job tells the bike “Hey, I’ve finished what I was supposed to do, now you decide what happens next”.

Calling bike.prepare should fire each job in turn until the bike is in the ready state. We can monitor the process using the Sidekiq dashboard, and if a job fails, e.g. due to a network problem, Sidekiq will automatically retry it, so the process can heal itself (you can change that behaviour in the Sidekiq configuration). We can also examine the state of the bike at any time and easily tell which step the process is on.

There is still one thing left to do though. Remember the Coordinator class? An object of this class needs to be able to prepare many bikes and send a notification afterwards. To achieve that we will use a technique similar to the one we used in the Bike class.
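
The division of responsibilities (jobs report completion, the bike picks the next step) can be tried out without Rails or Sidekiq. The following is a rough sketch under that assumption: PENDING_JOBS stands in for the real queue, and SketchBike, ServiceJob and CleanJob are hypothetical names.

```ruby
# Plain-Ruby sketch: jobs only report completion, the bike picks the next step.
# PENDING_JOBS stands in for the real queue; all names here are hypothetical.
PENDING_JOBS = []

class SketchBike
  attr_reader :state, :log

  def initialize
    @state = :new
    @log = []
  end

  def prepare
    @state = :servicing
    PENDING_JOBS << ServiceJob.new(self)
  end

  def finished_servicing
    @state = :cleaning
    PENDING_JOBS << CleanJob.new(self)
  end

  def finished_cleaning
    @state = :ready
  end
end

# Each job does its work and then tells the bike "I am done", nothing more.
ServiceJob = Struct.new(:bike) do
  def perform
    bike.log << :serviced
    bike.finished_servicing
  end
end

CleanJob = Struct.new(:bike) do
  def perform
    bike.log << :cleaned
    bike.finished_cleaning
  end
end

bike = SketchBike.new
bike.prepare
# A worker would do this in the background; here the queue is drained inline.
PENDING_JOBS.shift.perform until PENDING_JOBS.empty?
```

Swapping the order of the two steps in this sketch means touching only SketchBike, which is the point of moving the ordering out of the jobs.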

First, let’s define a few states for the coordinator model: new, preparing_bikes, sending_notification and done. Same as before, we will define one event for each transition and, wherever the next step has to be scheduled, an after_commit callback.

class Coordinator < ActiveRecord::Base
  include AASM

  has_many :bikes

  aasm column: :state do
    state :new, initial: true
    state :preparing_bikes
    state :sending_notification
    state :done

    event :start, after_commit: :schedule_preparing_bikes do
      transitions from: :new, to: :preparing_bikes
    end

    event :send_notification, after_commit: :schedule_sending_notification do
      transitions from: :preparing_bikes, to: :sending_notification
    end

    event :finish do
      transitions from: :sending_notification, to: :done
    end
  end

  # ...
end

Looks similar, right? There is one big difference though. In the Bike class each background job initiated the next transition, whereas here we have to make sure all bikes are ready before firing the next event. One way to do so would be a job that periodically checks whether all bikes are ready, waiting a defined interval between checks, but that doesn’t seem good enough.

What if each bike notified the Coordinator object? It would be quite annoying for the coordinator, but hey, I am quite sure this object won’t have any hard feelings. Each time a coordinator object gets a message from one of its bikes, it checks whether all bikes are ready and only then fires the next event. Basically, a bike object is telling its coordinator “hey, I am done!”, and the coordinator is asking its bikes relation “are you all ready?”.

Let’s modify Bike class to achieve that:

class Bike < ActiveRecord::Base
  # ...

  aasm column: :state do
    # ...

    event :finish, after_commit: :notify_coordinator do
      transitions from: :cleaning, to: :ready
    end
  end

  # ...

  def notify_coordinator
    coordinator.bike_is_ready
  end
end

Now we need to implement the callbacks and the bike_is_ready method.

class Coordinator < ActiveRecord::Base

  # ...

  def bike_is_ready
    send_notification! if all_bikes_ready?
  end

  def prepare_bikes
    start!
  end

  def notification_was_sent
    finish!
  end

  private

  def schedule_preparing_bikes
    bikes.each do |bike|
      bike.prepare
    end
  end

  def all_bikes_ready?
    bikes.not_ready.empty?
  end

  def schedule_sending_notification
    SendNotificationJob.perform_later(self.id)
  end
end

The code is almost ready. The first method called after prepare_bikes (which simply wraps the start! event) is invoked on a coordinator object is schedule_preparing_bikes. This method is really straightforward: it iterates through all bikes and calls prepare on each. Each bike’s state machine handles the rest of the process, and each time a bike reaches the ready state it sends the bike_is_ready message to the coordinator.

The method bike_is_ready is quite simple as well: it fires the next event if all_bikes_ready? is true, and that is the case when the bikes.not_ready relation is empty. We don’t have that scope defined on the Bike class yet, so let’s fix that:

# app/models/bike.rb

class Bike < ActiveRecord::Base
  # ...

  scope :not_ready, -> { where.not(state: :ready) }

  # ...
end

The last thing to do is to implement SendNotificationJob:

rails generate job SendNotification

# app/jobs/send_notification_job.rb
class SendNotificationJob < ActiveJob::Base
  queue_as :default

  def perform(coordinator_id)
    coordinator = Coordinator.find(coordinator_id)
    sleep 1 # notification will be sent from here
    coordinator.notification_was_sent
  end
end

Again, the job tells the coordinator object that notification_was_sent instead of firing the finish! event directly. If we later want different events to follow once the bikes are prepared, we should only have to change the Coordinator class definition, not the job class.

Now we can see the code in action:

coordinator = Coordinator.create!
10.times { coordinator.bikes.create! }
coordinator.prepare_bikes
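
The fan-in at the heart of this section (every bike reports in, the coordinator moves on only after the last one) can also be checked in isolation. Below is a minimal plain-Ruby sketch; FanInBike and FanInCoordinator are hypothetical names, and the early return in bike_is_ready is my addition, whereas the AASM version would raise on a repeated transition instead. It also ignores concurrency: with several workers two bikes can finish at the same moment, which the sketch does not attempt to handle.

```ruby
# Plain-Ruby sketch of the fan-in: every bike reports in, the coordinator
# moves on only when no bike is left unfinished. Names are hypothetical.
class FanInBike
  attr_reader :state

  def initialize(coordinator)
    @coordinator = coordinator
    @state = :new
  end

  def finish
    @state = :ready
    @coordinator.bike_is_ready # "hey, I am done!"
  end
end

class FanInCoordinator
  attr_reader :state, :bikes

  def initialize(bike_count)
    @state = :preparing_bikes
    @bikes = Array.new(bike_count) { FanInBike.new(self) }
  end

  # Called once per finished bike; only the last call changes the state.
  def bike_is_ready
    return unless state == :preparing_bikes
    @state = :sending_notification if bikes.all? { |bike| bike.state == :ready }
  end
end

coordinator = FanInCoordinator.new(3)
states = coordinator.bikes.map do |bike|
  bike.finish
  coordinator.state
end
states # => [:preparing_bikes, :preparing_bikes, :sending_notification]
```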

So is it done?

It depends ;). When working towards a better design, you always need to know where to stop. There are a few things I would take into consideration.

State machine

You probably noticed that we use all the events internally, but AASM makes all events public. Maybe we should keep the state machine in some internal object and expose only the methods that notify the object that a job is done (like finished_servicing, notification_was_sent etc.).

Also, I used a state machine to focus on the logic, not the mechanism itself (I think the DSL provided by AASM is really easy to read, especially in such a simple case), but maybe it is overkill. Maybe a simple abstraction would be enough.

Bike responsibilities

The Bike class initiates its own preparation and it is also the class that knows the steps of that preparation. Is that OK? For this simple example it seems to be. Again, adding layers of abstraction could distract from the essence of this technique. But in other cases it could make sense to use another class to handle the process and make the bike object its attribute (so we can easily pass in other objects).

Performance

If the jobs in the process are time-consuming and reliability is key, we probably don’t need to worry about the fact that a query is fired each time a preparation finishes. But if for some reason it became a bottleneck, you could think about implementing a counter-based solution or e.g. caching the bike ids first and comparing them with the ids of the bikes that were processed.

Each state change requires an update on a model, which can be a bottleneck as well. You could think about e.g. using Redis to make transitions faster. But I would call that premature optimization until it becomes a problem.
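
As a rough idea of what the counter-based variant mentioned above could look like, here is a plain-Ruby sketch. RemainingCounter is a hypothetical name, and the Mutex merely stands in for whatever atomic decrement the real store (a database row or Redis, for example) would provide.

```ruby
# Sketch of a counter-based completion check. The Mutex stands in for the
# atomic decrement of a real store; RemainingCounter is a hypothetical name.
class RemainingCounter
  def initialize(total)
    @remaining = total
    @lock = Mutex.new
  end

  # Returns true for exactly one caller: the one that finishes the last item.
  def finished_one
    @lock.synchronize do
      @remaining -= 1
      @remaining.zero?
    end
  end
end

counter = RemainingCounter.new(10)
threads = Array.new(10) { Thread.new { counter.finished_one } }
last_flags = threads.map(&:value)

last_flags.count(true) # => 1
```

Because the decrement and the zero check happen as one atomic step, only one finishing bike would ever trigger the notification, and no query over all bikes is needed.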

Error handling

If a job fails, it can be run again without restarting the whole process. This means you can fix the bug and retry the failed job to push the process forward. In particular, Sidekiq retries a job many times, increasing the interval each time (by default 25 times over about 21 days). So when something goes wrong you can fix it and retry the job manually or wait until Sidekiq does it for you.

There is no way to tell whether a process has failed or is merely stalled, though. It might be a good idea to implement code that checks the current job status. This might require writing library-dependent code. E.g. for Sidekiq you could save the job id and add methods that check the job status using the Sidekiq API.
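
To get a feeling for where the “25 times over 21 days” figure comes from, the schedule can be approximated in a few lines. The delay formula below (retry count to the fourth power plus 15 seconds, with Sidekiq’s random jitter left out) is an assumption based on the Sidekiq error handling documentation; check it against the version you run.

```ruby
# Approximate Sidekiq retry schedule. The formula is an assumption based on
# the Sidekiq docs (count**4 + 15 seconds); the random jitter is omitted.
MAX_RETRIES = 25

def retry_delay_seconds(retry_count)
  (retry_count**4) + 15
end

delays = (0...MAX_RETRIES).map { |count| retry_delay_seconds(count) }
total_days = delays.sum / 86_400.0

delays.first(3)     # => [15, 16, 31]
total_days.round(1) # => 20.4
```

The first retries come within seconds, while the last ones are days apart, so a short outage of the external service heals quickly and a real bug leaves plenty of time for a fix.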
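
If you would rather stay library-independent, a cruder alternative to querying the Sidekiq API is a time-based heuristic: flag any process that has sat in a non-final state for too long. Note that this is a different technique from the job-status check suggested above, and it cannot tell a failure from a stall either; it only tells you where to look. ProcessSnapshot, possibly_stalled? and the one-hour timeout are assumptions made for this sketch; in the Rails app the timestamp could come from the model’s updated_at column.

```ruby
# Library-independent heuristic: a process is "possibly stalled" when it has
# been in a non-final state for longer than a timeout. All names and the
# timeout value are assumptions for illustration.
ProcessSnapshot = Struct.new(:state, :last_transition_at)

FINAL_STATES = [:ready, :done].freeze

def possibly_stalled?(snapshot, now:, timeout_seconds: 3600)
  return false if FINAL_STATES.include?(snapshot.state)
  (now - snapshot.last_transition_at) > timeout_seconds
end

now = Time.at(10_000)
fresh    = ProcessSnapshot.new(:cleaning, now - 60)
stuck    = ProcessSnapshot.new(:servicing, now - 7200)
finished = ProcessSnapshot.new(:ready, now - 7200)

possibly_stalled?(fresh, now: now)    # => false
possibly_stalled?(stuck, now: now)    # => true
possibly_stalled?(finished, now: now) # => false
```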

Rebased Team writing about tech we use.

Complicated workflows using active job

Łukasz Sarnacki