Ruby design pattern: How to make an extensible factory class?

Ok, suppose I have Ruby program to read version control log files and do something with the data. (I don't, but the situation is analogous, and I have fun with these analogies). Let's suppose right now I want to support Bazaar and Git. Let's suppose the program will be executed with some kind of argument indicating which version control software is being used.

Given this, I want to make a LogFileReaderFactory which given the name of a version control program will return an appropriate log file reader (subclassed from a generic) to read the log file and spit out a canonical internal representation. So, of course, I can make BazaarLogFileReader and GitLogFileReader and hard-code them into the program, but I want it to be set up in such a way that adding support for a new version control program is as simple as plopping a new class file in the directory with the Bazaar and Git readers.

So, right now you can call "do-something-with-the-log --software git" and "do-something-with-the-log --software bazaar" because there are log readers for those. What I want is for it to be possible to simply add a SVNLogFileReader class and file to the same directory and automatically be able to call "do-something-with-the-log --software svn" without ANY changes to the rest of the program. (The files can of course be named with a specific pattern and globbed in the require call.)

I know this can be done in Ruby... I just don't how I should do it... or if I should do it at all.


You don't need a LogFileReaderFactory; just teach your LogFileReader class how to instantiate its subclasses:

class LogFileReader
  def self.create type
    case type 
    when :git
      GitLogFileReader.new
    when :bzr
      BzrLogFileReader.new
    else
      raise "Bad log file type: #{type}"
    end
  end
end

class GitLogFileReader < LogFileReader
  def display
    puts "I'm a git log file reader!"
  end
end

class BzrLogFileReader < LogFileReader
  def display
    puts "A bzr log file reader..."
  end
end

As you can see, the superclass can act as its own factory. Now, how about automatic registration? Well, why don't we just keep a hash of our registered subclasses, and register each one when we define them:

class LogFileReader
  @@subclasses = { }
  def self.create type
    c = @@subclasses[type]
    if c
      c.new
    else
      raise "Bad log file type: #{type}"
    end
  end
  def self.register_reader name
    @@subclasses[name] = self
  end
end

class GitLogFileReader < LogFileReader
  def display
    puts "I'm a git log file reader!"
  end
  register_reader :git
end

class BzrLogFileReader < LogFileReader
  def display
    puts "A bzr log file reader..."
  end
  register_reader :bzr
end

LogFileReader.create(:git).display
LogFileReader.create(:bzr).display

class SvnLogFileReader < LogFileReader
  def display
    puts "Subersion reader, at your service."
  end
  register_reader :svn
end

LogFileReader.create(:svn).display

And there you have it. Just split that up into a few files, and require them appropriately.

You should read Peter Norvig's Design Patterns in Dynamic Languages if you're interested in this sort of thing. He demonstrates how many design patterns are actually working around restrictions or inadequacies in your programming language; and with a sufficiently powerful and flexible language, you don't really need a design pattern, you just implement what you want to do. He uses Dylan and Common Lisp for examples, but many of his points are relevant to Ruby as well.

You might also want to take a look at Why's Poignant Guide to Ruby, particularly chapters 5 and 6, though only if you can deal with surrealist technical writing.

edit: Riffing of off Jörg's answer now; I do like reducing repetition, and so not repeating the name of the version control system in both the class and the registration. Adding the following to my second example will allow you to write much simpler class definitions while still being pretty simple and easy to understand.

def log_file_reader name, superclass=LogFileReader, &block
  Class.new(superclass, &block).register_reader(name)
end

log_file_reader :git do
  def display
    puts "I'm a git log file reader!"
  end
end

log_file_reader :bzr do
  def display
    puts "A bzr log file reader..."
  end
end

Of course, in production code, you may want to actually name those classes, by generating a constant definition based on the name passed in, for better error messages.

def log_file_reader name, superclass=LogFileReader, &block
  c = Class.new(superclass, &block)
  c.register_reader(name)
  Object.const_set("#{name.to_s.capitalize}LogFileReader", c)
end

This is really just riffing off Brian Campbell's solution. If you like this, please upvote his answer, too: he did all the work.

#!/usr/bin/env ruby

class Object; def eigenclass; class << self; self end end end

module LogFileReader
  class LogFileReaderNotFoundError < NameError; end
  class << self
    def create type
      (self[type] ||= const_get("#{type.to_s.capitalize}LogFileReader")).new
    rescue NameError => e
      raise LogFileReaderNotFoundError, "Bad log file type: #{type}" if e.class == NameError && e.message =~ /[^: ]LogFileReader/
      raise
    end

    def []=(type, klass)
      @readers ||= {type => klass}
      def []=(type, klass)
        @readers[type] = klass
      end
      klass
    end

    def [](type)
      @readers ||= {}
      def [](type)
        @readers[type]
      end
      nil
    end

    def included klass
      self[klass.name[/[[:upper:]][[:lower:]]*/].downcase.to_sym] = klass if klass.is_a? Class
    end
  end
end

def LogFileReader type

Here, we create a global method (more like a procedure, actually) called LogFileReader, which is the same name as our module LogFileReader. This is legal in Ruby. The ambiguity is resolved like this: the module will always be preferred, except when it's obviously a method call, i.e. you either put parentheses at the end (Foo()) or pass an argument (Foo :bar).

This is a trick that is used in a few places in the stdlib, and also in Camping and other frameworks. Because things like include or extend aren't actually keywords, but ordinary methods that take ordinary parameters, you don't have to pass them an actual Module as an argument, you can also pass anything that evaluates to a Module. In fact, this even works for inheritance, it is perfectly legal to write class Foo < some_method_that_returns_a_class(:some, :params).

With this trick, you can make it look like you are inheriting from a generic class, even though Ruby doesn't have generics. It's used for example in the delegation library, where you do something like class MyFoo < SimpleDelegator(Foo), and what happens, is that the SimpleDelegator method dynamically creates and returns an anonymous subclass of the SimpleDelegator class, which delegates all method calls to an instance of the Foo class.

We use a similar trick here: we are going to dynamically create a Module, which, when it is mixed into a class, will automatically register that class with the LogFileReader registry.

  LogFileReader.const_set type.to_s.capitalize, Module.new {

There's a lot going on in just this line. Let's start from the right: Module.new creates a new anonymous module. The block passed to it, becomes the body of the module – it's basically the same as using the module keyword.

Now, on to const_set. It's a method for setting a constant. So, it's the same as saying FOO = :bar, except that we can pass in the name of the constant as a parameter, instead of having to know it in advance. Since we are calling the method on the LogFileReader module, the constant will be defined inside that namespace, IOW it will be named LogFileReader::Something.

So, what is the name of the constant? Well, it's the type argument passed into the method, capitalized. So, when I pass in :cvs, the resulting constant will be LogFileParser::Cvs.

And what do we set the constant to? To our newly created anonymous module, which is now no longer anonymous!

All of this is really just a longwinded way of saying module LogFileReader::Cvs, except that we didn't know the "Cvs" part in advance, and thus couldn't have written it that way.

    eigenclass.send :define_method, :included do |klass|

This is the body of our module. Here, we use define_method to dynamically define a method called included. And we don't actually define the method on the module itself, but on the module's eigenclass (via a small helper method that we defined above), which means that the method will not become an instance method, but rather a "static" method (in Java/.NET terms).

included is actually a special hook method, that gets called by the Ruby runtime, everytime a module gets included into a class, and the class gets passed in as an argument. So, our newly created module now has a hook method that will inform it whenever it gets included somewhere.

      LogFileReader[type] = klass

And this is what our hook method does: it registers the class that gets passed into the hook method into the LogFileReader registry. And the key that it registers it under, is the type argument from the LogFileReader method way above, which, thanks to the magic of closures, is actually accessible inside the included method.

    end
    include LogFileReader

And last but not least, we include the LogFileReader module in the anonymous module. [Note: I forgot this line in the original example.]

  }
end

class GitLogFileReader
  def display
    puts "I'm a git log file reader!"
  end
end

class BzrFrobnicator
  include LogFileReader
  def display
    puts "A bzr log file reader..."
  end
end

LogFileReader.create(:git).display
LogFileReader.create(:bzr).display

class NameThatDoesntFitThePattern
  include LogFileReader(:darcs)
  def display
    puts "Darcs reader, lazily evaluating your pure functions."
  end
end

LogFileReader.create(:darcs).display

puts 'Here you can see, how the LogFileReader::Darcs module ended up in the inheritance chain:'
p LogFileReader.create(:darcs).class.ancestors

puts 'Here you can see, how all the lookups ended up getting cached in the registry:'
p LogFileReader.send :instance_variable_get, :@readers

puts 'And this is what happens, when you try instantiating a non-existent reader:'
LogFileReader.create(:gobbledigook)

This new expanded version allows three different ways of defining LogFileReaders:

  1. All classes whose name matches the pattern <Name>LogFileReader will automatically be found and registered as a LogFileReader for :name (see: GitLogFileReader),
  2. All classes that mix in the LogFileReader module and whose name matches the pattern <Name>Whatever will be registered for the :name handler (see: BzrFrobnicator) and
  3. All classes that mix in the LogFileReader(:name) module, will be registered for the :name handler, regardless of their name (see: NameThatDoesntFitThePattern).

Please note that this is just a very contrived demonstration. It is, for example, definitely not thread-safe. It might also leak memory. Use with caution!


One more minor suggestion for Brian Cambell's answer -

In you can actually auto-register the subclasses with an inherited callback. I.e.

class LogFileReader

  cattr_accessor :subclasses; self.subclasses = {}

  def self.inherited(klass)
    # turns SvnLogFileReader in to :svn
    key = klass.to_s.gsub(Regexp.new(Regexp.new(self.to_s)),'').underscore.to_sym

    # self in this context is always LogFileReader
    self.subclasses[key] = klass
  end

  def self.create(type)
    return self.subclasses[type.to_sym].new if self.subclasses[type.to_sym]
    raise "No such type #{type}"
  end
end

Now we have

class SvnLogFileReader < LogFileReader
  def display
    # do stuff here
  end
end

With no need to register it


This should work too, without the need for registering class names

class LogFileReader
  def self.create(name)
    classified_name = name.to_s.split('_').collect!{ |w| w.capitalize }.join
    Object.const_get(classified_name).new
  end
end

class GitLogFileReader < LogFileReader
  def display
    puts "I'm a git log file reader!"
  end
end

and now

LogFileReader.create(:git_log_file_reader).display