Introduction
This blog post will show you how to create a simple, object-oriented solution for complicated background job workflows using Active Job. Don’t worry if you are new to both of these things; they’ll be covered in the introduction.
Background jobs
The request lifecycle of a web application is not particularly friendly to time-consuming computations. Sometimes you need more time than a server can give you before timing out. Users are not particularly friendly when you keep them waiting either. So whenever it makes sense to respond quickly and finish a job outside of the request lifecycle, go for it.
How do you achieve this? A background job mechanism can be seen as a queue (or many parallel queues). A job is not executed within the request lifecycle; it is added to a queue instead and executed whenever the server has resources to do so. A typical example of a background job is code sending an email after a user registers. The application only creates the job (which takes almost no time) and responds quickly. The job is then handled by a separate process as soon as there are resources to do so.
A really quick example of such a job:
class WelcomeEmailJob < ActiveJob::Base
  queue_as :default
  def perform(user_id)
    UserMailer.welcome_email(user_id).deliver_now
  end
end
And then in your controller you could write:
class UsersController < ApplicationController
  def create
    # user_params is assumed to be your strong parameters method
    @user = User.create(user_params)
    WelcomeEmailJob.perform_later(@user.id)
    respond_with @user
  end
end
It’s that simple! Depending on the mailer settings we just saved someone up to a few seconds.
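To make the queue idea concrete, here is a minimal sketch in plain Ruby of what happens conceptually when a job is enqueued. It is only an illustration: ToyJobQueue and its methods are hypothetical names, the worker is a thread inside the same process, and nothing is persisted, whereas a real backend stores jobs outside the web process.

```ruby
# A toy in-process job queue. Real backends (Sidekiq, Resque, ...) persist
# jobs outside the web process; this sketch does not.
class ToyJobQueue
  def initialize
    @jobs = Queue.new      # thread-safe FIFO from Ruby's core library
    @results = Queue.new   # what the worker has processed so far
    @worker = Thread.new { work }
  end

  # Called from the "request": it only enqueues, so it returns immediately.
  def perform_later(job_name, *args)
    @jobs << [job_name, args]
    :enqueued
  end

  # Let the worker drain the queue, stop it and return what it processed.
  def shutdown
    @jobs << :stop
    @worker.join
    processed = []
    processed << @results.pop until @results.empty?
    processed
  end

  private

  def work
    loop do
      job = @jobs.pop      # blocks until a job is available
      break if job == :stop
      name, args = job
      @results << "#{name}(#{args.join(', ')})"
    end
  end
end

queue = ToyJobQueue.new
status = queue.perform_later("WelcomeEmailJob", 42)
processed = queue.shutdown
```

The call to perform_later returns before the work is done, which is the whole point: the caller pays only for putting the job on the queue.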
ActiveJob
The example I’ve just shown is based on Active Job, a library (part of Rails since 4.2) that provides a common interface for almost any background job library, such as Resque, Delayed Job or Sidekiq. If you don’t need any library-specific features, Active Job makes the choice of background job library less significant: you can switch easily if needed. If you want to know more about Active Job, please refer to the awesome Rails guides.
Problem
Let’s say we have a bunch of bicycles which have to be prepared for riding (serviced and cleaned). There is a coordinator who knows which bikes need to be prepared, and as soon as all the bikes are prepared, a notification needs to be sent.
Simple, isn’t it? Well, let’s make our life harder and add a little twist. Servicing of the bike is done by an external service (so we have to assume it can fail from time to time) and cleaning is time-consuming for our server. So the goal is to avoid repeating a step that has already succeeded for a bike when something else fails, and especially to avoid preparing any bike twice. It would also be good to have some kind of monitoring.
Solution
It feels natural to divide that process into smaller parts, and a background job processing library seems like the right choice. However, while background job libraries do a great job when it comes to independent or simple jobs, batch or multi-step jobs usually require a third-party library.
After looking into possible solutions I chose to write one myself. The main reasons for my choice are:
- I don’t want to be dependent on any library,
- I need to handle retries carefully and almost none of the libraries mention if and how they handle retries for complicated workflows,
- I want to be able to examine the state of the process easily in case it fails,
- I am not too enthusiastic about DSLs used in libraries I found (I am most often not too enthusiastic about DSLs in general, so that may be only my paranoia).
Implementation
Let’s start with a regular Rails app with Bike and Coordinator classes and define their associations. We can run the following commands to create both classes:
rails new bikes
cd bikes
rails g model coordinator state:string
rails g model bike coordinator:references state:string
rake db:migrate
# app/models/bike.rb
class Bike < ActiveRecord::Base
  belongs_to :coordinator
end
# app/models/coordinator.rb
class Coordinator < ActiveRecord::Base
  has_many :bikes
end
You are probably wondering why we need a state column in both models. I promise we will get to that later. Now we can implement the classes that actually handle the hard work:
# app/classes/mechanic.rb
class Mechanic
  def service(bike)
    sleep 1
  end
end
# app/classes/cleaner.rb
class Cleaner
  def clean(bike)
    sleep 1
  end
end
Not really productive mechanic and cleaner, right? You can replace sleep 1 with fancy code of yours, but I will just assume that some heavy computation or an external request is happening there.
Since I decided to process those tasks in the background, let’s use Active Job on top of that! We will need jobs to clean and service the bike first.
rails generate job ServiceBike
rails generate job CleanBike
The generator created job classes with an empty perform method. We will overwrite those classes, adding a real implementation of the perform method:
# app/jobs/clean_bike_job.rb
class CleanBikeJob < ActiveJob::Base
  queue_as :default
  def perform(bike_id)
    bike = Bike.find(bike_id)
    Cleaner.new.clean(bike)
  end
end
# app/jobs/service_bike_job.rb
class ServiceBikeJob < ActiveJob::Base
  queue_as :default
  def perform(bike_id)
    bike = Bike.find(bike_id)
    Mechanic.new.service(bike)
  end
end
While the Active Job setup is finished, it is only an interface for a background job library. We need to pick one. I like Sidekiq, so let’s use it. To set it as the Active Job backend we need to add gem 'sidekiq' to the Gemfile and one line to the application config:
# config/application.rb
module Bikes
  class Application < Rails::Application
    config.active_job.queue_adapter = :sidekiq
  end
end
Please refer to the Sidekiq wiki for more information. It would be good to read at least the basics and enable monitoring to be able to see what happens to processed jobs. You can run Sidekiq by simply executing bundle exec sidekiq on the command line.
Now, having all the pieces, let’s put them together. The idea is quite simple: each job will fire the next job as soon as it finishes. The state column will also be changed accordingly, so we can always easily check which step we are on and, more importantly, never fire any job twice. I will use the “Acts As State Machine” (AASM) library to define these transitions easily. If you are new to state machines or to AASM you can look at the AASM readme, but the example will use only basic AASM functions. To be able to use AASM, you need to add gem 'aasm' to the Gemfile.
class Bike < ActiveRecord::Base
  include AASM
  belongs_to :coordinator
  aasm column: :state do
    state :new, initial: true
    state :servicing
    state :cleaning
    state :ready
    event :service, after_commit: :schedule_servicing do
      transitions from: :new, to: :servicing
    end
    event :clean, after_commit: :schedule_cleaning do
      transitions from: :servicing, to: :cleaning
    end
    event :finish do
      transitions from: :cleaning, to: :ready
    end
  end
  def schedule_servicing
    ServiceBikeJob.perform_later(self.id)
  end
  def schedule_cleaning
    CleanBikeJob.perform_later(self.id)
  end
end
AASM lets you define states and events (which fire a transition from one state to another) using a simple DSL. I defined 4 states: new (the initial state, set when the model is created), servicing, cleaning and ready. I also defined 3 events: service, clean and finish. Each event adds a method named after the event. Calling that method runs the transition defined in the event. So if I call bike.service, the state will be changed from new to servicing. If I add a bang to that method, i.e. bike.service!, it will also save the model. The last thing worth noticing is the after_commit callbacks. Whatever is defined in such a callback will be executed after the state is changed and the changes are committed.
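If the DSL feels like magic, the mechanism can be sketched in a few lines of plain Ruby. This is not how AASM is implemented, only a hand-rolled illustration under my own assumptions: TinyStateMachine, fire and scheduled are hypothetical names, and the scheduled list stands in for the callbacks that would enqueue jobs. What it shows is why a state column protects us from firing a job twice: an event is refused unless the object is in the expected state.

```ruby
# A hand-rolled state machine sketch (not AASM's implementation).
class TinyStateMachine
  InvalidTransition = Class.new(StandardError)

  # event name => [state it may fire from, state it leads to]
  TRANSITIONS = {
    service: [:new, :servicing],
    clean:   [:servicing, :cleaning],
    finish:  [:cleaning, :ready]
  }.freeze

  attr_reader :state, :scheduled

  def initialize
    @state = :new
    @scheduled = [] # stand-in for "a job was enqueued for this event"
  end

  # Run the transition for the event, or refuse if the state does not match.
  def fire(event)
    from, to = TRANSITIONS.fetch(event)
    raise InvalidTransition, "cannot #{event} from #{@state}" unless @state == from
    @state = to
    @scheduled << event
    self
  end
end

bike = TinyStateMachine.new
bike.fire(:service)

# Firing the same event again is rejected, so the work is not scheduled twice.
second_attempt =
  begin
    bike.fire(:service)
    :fired_again
  rescue TinyStateMachine::InvalidTransition
    :rejected
  end
```

AASM behaves similarly by default: firing an event from a state it is not allowed in raises an error instead of silently running the callbacks again.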
Now, will this code work? Almost. Calling bike.service! will indeed fire ServiceBikeJob, but nothing will happen afterwards. Let’s fix that. We need to fire the next event in each background job:
# app/jobs/service_bike_job.rb
class ServiceBikeJob < ActiveJob::Base
  queue_as :default
  def perform(bike_id)
    bike = Bike.find(bike_id)
    Mechanic.new.service(bike)
    bike.clean!
  end
end
# app/jobs/clean_bike_job.rb
class CleanBikeJob < ActiveJob::Base
  queue_as :default
  def perform(bike_id)
    bike = Bike.find(bike_id)
    Cleaner.new.clean(bike)
    bike.finish!
  end
end
Now the code should work as expected. Something is not quite right though. Jobs know too much about the process. What if we want to change the order of the execution? Should we have to make changes in the bike model or in the job class?
To answer those questions: Bike should be the class that knows the order of transitions, not the Active Job classes. Let’s fix this by defining a few more methods on the Bike class:
class Bike < ActiveRecord::Base
  # ...
  def prepare
    service!
  end
  def finished_servicing
    clean!
  end
  def finished_cleaning
    finish!
  end
end
This way we can call bike.prepare when we want to start the process, and accordingly bike.finished_cleaning in CleanBikeJob and bike.finished_servicing in ServiceBikeJob. That way messages are clear: the job’s message sent to the bike is “Hey, I’ve finished doing what I was supposed to do, now you decide what to do with it”.
Calling bike.prepare should fire each job until the bike is in the ready state. We can monitor the process using the Sidekiq dashboard, and if a job fails, e.g. due to a network problem, Sidekiq will automatically retry it, so the process has the ability to heal itself (you can change that behaviour in the Sidekiq configuration). We can also examine the state of the bike and easily tell which step the process is on at any time.
There is still one thing left to do though. Remember the Coordinator class? An object of this class needs to be able to prepare many bikes and send a notification afterwards. To achieve that we will use a technique similar to what we did in the Bike class.
First, let’s define a few states for the coordinator model: new, preparing_bikes, sending_notification and done. Same as before, we will define one event for each transition, and for each event that has to schedule work we will define an after_commit callback.
class Coordinator < ActiveRecord::Base
  include AASM
  has_many :bikes
  aasm column: :state do
    state :new, initial: true
    state :preparing_bikes
    state :sending_notification
    state :done
    event :start, after_commit: :schedule_preparing_bikes do
      transitions from: :new, to: :preparing_bikes
    end
    event :send_notification, after_commit: :schedule_sending_notification do
      transitions from: :preparing_bikes, to: :sending_notification
    end
    event :finish do
      transitions from: :sending_notification, to: :done
    end
  end
end
Looks similar, right? There is one big difference though. In the Bike class each background job was the object initiating the next transition, whereas here we have to make sure all bikes are ready before firing the next event after the prepare_bikes method is called. One way to do so would be to run a job that checks whether all bikes are ready, polling at a defined time interval, but that doesn’t seem good enough.
What if each bike notified the Coordinator object? It would be quite annoying for the Coordinator, but hey, I am quite sure this object won’t have any hard feelings. Each time a coordinator object gets a message from one of its bikes, it checks if all bikes are ready and only then fires the next event. Basically, a bike object is telling a coordinator object “hey I am done!”. And the coordinator object is asking its bikes relation: “are you all ready?”.
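Stripped of Rails, jobs and the database, that conversation can be sketched in plain Ruby. SketchBike and SketchCoordinator are hypothetical stand-ins for the real models, and the notifications counter stands in for firing the next event; the only thing the sketch demonstrates is that the notification goes out once, when the last bike reports in.

```ruby
# Plain Ruby stand-ins for the models; no persistence, no background jobs.
class SketchBike
  def initialize(coordinator)
    @coordinator = coordinator
    @state = :new
  end

  def ready?
    @state == :ready
  end

  # "Hey, I am done!"
  def finish
    @state = :ready
    @coordinator.bike_is_ready
  end
end

class SketchCoordinator
  attr_reader :notifications

  def initialize
    @bikes = []
    @notifications = 0
  end

  def add_bike
    bike = SketchBike.new(self)
    @bikes << bike
    bike
  end

  # "Are you all ready?" Only the last bike to finish triggers the next step.
  def bike_is_ready
    @notifications += 1 if @bikes.all?(&:ready?)
  end
end

coordinator = SketchCoordinator.new
first = coordinator.add_bike
second = coordinator.add_bike

first.finish
after_first = coordinator.notifications
second.finish
after_second = coordinator.notifications
```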
Let’s modify the Bike class to achieve that:
class Bike < ActiveRecord::Base
  # ...
  aasm column: :state do
    # ...
    event :finish, after_commit: :notify_coordinator do
      transitions from: :cleaning, to: :ready
    end
  end
  # ...
  def notify_coordinator
    coordinator.bike_is_ready
  end
end
Now we need to implement all the callbacks and the bike_is_ready method.
class Coordinator < ActiveRecord::Base
  # ...
  def bike_is_ready
    send_notification! if all_bikes_ready?
  end
  def prepare_bikes
    start!
  end
  def notification_was_sent
    finish!
  end
  private
  def schedule_preparing_bikes
    bikes.each do |bike|
      bike.prepare
    end
  end
  def all_bikes_ready?
    bikes.not_ready.empty?
  end
  def schedule_sending_notification
    SendNotificationJob.perform_later(self.id)
  end
end
The code is almost ready. The first method called after calling prepare_bikes (which is a wrapper around the start! event) on the coordinator object will be schedule_preparing_bikes. This method is really straightforward. It iterates through all bikes and runs the prepare method for each. The bike’s state machine will handle the whole process, and each time a bike reaches the ready state, it will send the bike_is_ready message to the coordinator.
The method bike_is_ready is quite simple as well: it will fire the next event if the all_bikes_ready? condition is true. And it is true if the bikes.not_ready relation is empty?. We don’t have that scope defined on the Bike class, so let’s fix that:
# app/models/bike.rb
class Bike < ActiveRecord::Base
  # ...
  scope :not_ready, -> { where.not(state: :ready) }
  # ...
end
The last thing to do is implementing SendNotificationJob:
rails generate job SendNotification
# app/jobs/send_notification_job.rb
class SendNotificationJob < ActiveJob::Base
  queue_as :default
  def perform(coordinator_id)
    coordinator = Coordinator.find(coordinator_id)
    sleep 1 # notification will be sent from here
    coordinator.notification_was_sent
  end
end
Again, the job is notifying the coordinator object that notification_was_sent, not specifically firing the finish! event. We might want to add different events after bikes are prepared, and we should change the Coordinator class definition to achieve that, not the job class.
Now we can check that code in action:
coordinator = Coordinator.create!
10.times { coordinator.bikes.create! }
coordinator.prepare_bikes
So is it done?
It depends ;). Going towards better design, you always need to know where to stop. There are a few things I would take into consideration.
State machine
You probably noticed we use all the events internally, but AASM makes all events public. Maybe we should use the state machine in some internal object, exposing only methods for notifying the object that the job is done (like finished_servicing, notification_was_sent etc.).
Also, I used a state machine to focus on the logic, not the mechanism itself (I think the DSL provided by AASM is really easy to read, especially in such a simple case), but maybe it is overkill. Maybe a simple abstraction would be enough.
Bike responsibilities
The Bike class is the one initiating the process of its preparation, and it is the class which knows the steps of the preparation. Is it ok? For this simple example it seems to be. Again, adding layers of abstraction could be a distraction from the essence of this technique. But in other cases it could make sense to use another class to handle the process and make the bike object its attribute (so we can easily pass other objects).
Performance
If the jobs in the process are time-consuming and reliability is key, we probably don’t need to worry about the fact that a query is fired each time a bike’s preparation is finished. But if for some reason it becomes a bottleneck, you can think about implementing a counter-based solution or, e.g., caching the bike ids first and comparing them with the ids of bikes that were processed.
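A counter-based variant could look roughly like the sketch below. It is plain Ruby with a Mutex standing in for whatever would make the decrement atomic in a real application (for example a database column or a Redis counter, both assumptions on my side), and CountingCoordinator is a hypothetical name. Instead of asking all bikes whether they are ready, the coordinator just counts down.

```ruby
# Counter-based completion check: no query over all bikes is needed.
class CountingCoordinator
  attr_reader :remaining

  def initialize(bike_count)
    @remaining = bike_count
    @lock = Mutex.new # stand-in for an atomic decrement in a shared store
  end

  # Returns true only for the call that brings the counter down to zero,
  # i.e. for the last bike that reports in.
  def bike_is_ready
    @lock.synchronize do
      @remaining -= 1
      @remaining.zero?
    end
  end
end

coordinator = CountingCoordinator.new(3)
results = Array.new(3) { coordinator.bike_is_ready }
```

The trade-off is that the counter has to stay in sync with reality: if a job that already decremented the counter is retried, the count would be off, which the state-based check does not suffer from.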
Each state change requires an update on a model, which can be a bottleneck as well. You can think about, e.g., using Redis to make transitions faster. But I would call it premature optimization until it becomes a problem.
Error handling
If a job fails it can be run again without restarting the whole process. This means you can fix the bug and retry the failed job to push the process forward. In particular, Sidekiq will retry a job many times, increasing the interval each time (by default 25 times over about 21 days). So when something goes wrong you can fix it and retry the job manually or wait until Sidekiq does it for you.
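To get a feeling for where the “25 times over about 21 days” comes from, here is a small sketch of the retry schedule. It assumes the backoff formula described in the Sidekiq wiki, roughly (retry_count ** 4) + 15 seconds plus a random jitter; treat the formula as an assumption, since it may differ between Sidekiq versions. The jitter is left out so the numbers are reproducible, and retry_delay_seconds is a hypothetical helper, not a Sidekiq method.

```ruby
# Deterministic part of the delay before retry number `count` (0-based).
# Assumed formula; the random jitter Sidekiq adds on top is left out.
def retry_delay_seconds(count)
  (count**4) + 15
end

delays = (0...25).map { |count| retry_delay_seconds(count) }
total_seconds = delays.sum
total_days = total_seconds / 86_400.0
```

The first retry happens after about 15 seconds, while the last one waits several days, and the delays add up to a bit over 20 days before jitter.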
There is no way to tell whether a process has failed or is just stalled though. It might be a good idea to implement code that checks the current job status. This might require writing library-dependent code. E.g. for Sidekiq you could save the job id and add methods to check the job status using the Sidekiq API.