The Dark Side of Duck Typing

by serge on October 5th, 2009

On the Origin of Species

What walks like duck and quacks like a duck is a duck. That what duck typing is about. It makes development easier. We don’t need interfaces and abstract classes. Many object oriented patterns look much simpler because of such dynamic typing. They come more naturally in ruby code and often we don’t even notice them until we see explicit names like Visitor, Decorator, etc.

After I’ve got used with such a powerful tool, it was too easy to step into dangerous territory without even noticing. Sometimes things do walk and quack like ducks all the time… well, almost all the time. And sometimes in a pack of ducks there are some species that do quack (and that makes you feel safe), but that doesn’t make them ducks.

When all ducks are quacking well

Making real world examples is hard, and the following one is somewhat synthetic. But anyway. Let’s suppose that we have a system with Project model and our system should generate reports by aggregating project instances. We have different types of parametrized reports, and these types share some common features. Let’s see how it fits in code.

 
class Project < ActiveRecord::Base
 
  has_and_belongs_to_many :categories
 
  has_many :items
 
end
 
 
class ReportFilter < ActiveRecord::Base
 
  def filter_project_ids(project_ids)
    # implemented in descendants
  end
 
end
 
class DueDateFilter < ReportFilter
 
  def filter_project_ids(project_ids)
    irrelevant_projects = Project.find(project_ids,
      :conditions => ['due_date > ?', self.cut_off_date])
 
    project_ids - irrelevant_projects.collect(&:id)
  end
 
end
 
class ComplexFilter < ReportFilter
 
  def filter_project_ids(project_ids)
    connection.select_values <<-SQL
      SELECT p.id FROM projects p
        JOIN project_estimates e ON (e.project_id = p.id)
        JOIN metrics m ON (p.id = m.project_id)
        WHERE m.coverage_level < average_coverage_in_group(p.group_id)
          AND e.author_id IN (SELECT id FROM team_members WHERE statisfaction_level > #{desired_level})
        AND p.id IN (#{project_ids.join ','})
    SQL
  end
 
end
 
class CategoryReport < ActiveRecord::Base
 
  belongs_to :category
 
  has_many :report_filters, :order => 'complexity_level'
 
  def projects
    project_ids = category.project_ids
 
    report_items.each do |i|
      project_ids = i.filter_project_ids(project_ids)
    end
 
    Project.find(project_ids)
  end
 
end

Note how CategoryReport generates it’s project set. It iterates through it’s project filters to trim category project list down to some relevant report specific subset. All the report filters implement common interface. Each filter accepts project_ids array and reduces it according to its parameters and semantics. Some filters are fairly easy to calculate, other can be computationally complex. We run simple filters first to reduce set and then refer to complex ones.

Most of our filter implementations should look like DueDateFilter. But we live in real world, and several filters look like ComplexFilter. Don’t look deep into SQL query in it, it was added for dramatic effect only. My point is that due to some (often premature) optimization, complex filters can implement some real black magic under the hood.

And here comes the time for a closer look

Let’s suppose that we always run ComplexFilter as the last filter in our chain. But one day a very good reason for changing that order arises. We place DueDateFilter after ComplexFilter. System works, all the filter rspecs pass, and rspec for Report passes as well. It checks Report behavior with great success by mocking everything except Report (a model’s rspec should be focused on the model it’s testing, right?)

Some time passes and we discover a great surprise. It appears that reports have been working incorrectly for some time now. Some filters that come after ComplexFilter don’t filter projects any more. No exceptions are raised. They’ve just silently became transparent.

Please note how DueDateFilter operates on array of ids that are Fixnums. Then note how ComplexFilter returns an array of strings. They’re still ids, but that doesn’t help when it comes to…

 
project_ids - irrelevant_projects.collect(&:id)

… in other filters.

And a few reflections

A type of bug that is easy to fix, easy to debug, but hard to predict. Who would notice such a thing on code review when all these filters did work previously? And another big question: what can we do to make such bugs harder to sneak into our code?

Well, I don’t know. There’s probably no silver bullet for it. But there’s something we can do however. It’s not new, in fact it’s plain old OOD.

Arrays of fixnums (and arrays of strings) are not domain models for our system. Yes, we don’t want to instantiate full projects for the sake of performance, but passing integers around instead of project models is not the only option that we have.

 
class ProjectSet
 
  def initialize(ids)
    @project_ids = []
    add(ids)
  end
 
  def add(ids)
    @project_ids |= ids.collect(&:to_i)
    self
  end
 
  def exclude(ids)
    @project_ids - ids.collect(&:to_i)
    self
  end
 
  def project_ids
    @project_ids.dup
  end
 
  def projects
    Project.find(@project_ids)
  end
 
  def to_sql_list
    return 'NULL' if @project_ids.empty?
    @project_ids.join ','
  end
 
end

Looks like we’ve avoided the temptation to implement domain specific operations through overriding operators (especially arithmetic). Cool books say that’s not something that grownups do very often. So our methods are named add and exclude. Now we can change our report code to pass an instance of ProjectSet and have a bit more control of operations that are essential to our problem domain.

ProjectSet adds the following benefits:

  • we can encapsulate type checks and explicit conversions in it, and keep filter code clean of such details
  • we can raise an exception if supplied ids are of incorrect type and be informed about problems earlier by QA (500-page is easier to notice than tricky changes in reports behavior)
  • we can add other specific code that otherwise would be scattered across different filters (note how to_sql_list handles empty lists)
  • we can open paths to other caching/performance optimizations that can be hidden under the hood of ProjectSet

Not all .rb files that live under app/models directory have to be AR models. Many applications can benefit from having additional models. And those can be no less domain specific than conventional ActiveRecord::Base descendants.

Comments are closed for this entry.