Factory classes

Personally, I've never understood the idea of factory classes, because it seems a whole lot more useful to just instantiate an object directly. My question is simple: in what situations is the factory class pattern the best option, for what reason, and what does a good factory class look like?


Here is a real, live factory from my code base. It's used to generate a sampler class that knows how to sample data from some dataset (it's originally in C#, so excuse any Java faux pas):

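import java.util.EnumMap;
import java.util.Map;

class SamplerFactory
{
  // One shared sampler instance per SamplingType, created once when the class is loaded
  private static final Map<SamplingType, ISampler> samplers =
      new EnumMap<SamplingType, ISampler>(SamplingType.class);

  static
  {
    samplers.put(SamplingType.Scalar, new ScalarSampler());
    samplers.put(SamplingType.Vector, new VectorSampler());
    samplers.put(SamplingType.Array, new ArraySampler());
  }

  // Callers only ever get back the ISampler interface, never the concrete class
  public static ISampler GetSampler(SamplingType samplingType)
  {
    ISampler sampler = samplers.get(samplingType);  // EnumMap returns null for unknown or null keys
    if (sampler == null)
      throw new IllegalArgumentException("Invalid sampling type or sampler not initialized");
    return sampler;
  }
}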

and here is an example usage:

ISampler sampler = SamplerFactory.GetSampler(SamplingType.Array);
dataSet = sampler.Sample(dataSet);

As you can see, it's not much code, and it might even be shorter and faster to just write

ArraySampler sampler = new ArraySampler();
dataSet = sampler.Sample(dataSet);

than to use the factory. So why do I even bother? Well, there are two basic reasons that build on each other:

  1. First, simplicity and maintainability of the code. Suppose the calling code receives the enum as a parameter. For example, if I have a method that needs to process the data, including sampling, I can write:

    void ProcessData(Object dataSet, SamplingType sampling)
    {
      // do something with the data
      ISampler sampler = SamplerFactory.GetSampler(sampling);
      dataSet = sampler.Sample(dataSet);
      // do something else with the data
    }
    

    instead of a more cumbersome construct, like this:

    void ProcessData(Object dataSet, SamplingType sampling)
    {
      // do something with the data
      ISampler sampler;
      // Case labels use the bare enum constant (Scalar); Java before 21 rejects SamplingType.Scalar here
      switch (sampling) {
        case Scalar:
          sampler = new ScalarSampler();
          break;
        case Vector:
          sampler = new VectorSampler();
          break;
        case Array:
          sampler = new ArraySampler();
          break;
        default:
          throw new IllegalArgumentException("Invalid sampling type");
      }
      dataSet = sampler.Sample(dataSet);
      // do something else with the data
    }
    

    Note that this monstrosity would have to be written every single time I need some sampling. And you can imagine how much fun it would be to change if, say, I added a parameter to the ScalarSampler constructor or added a new SamplingType. This factory has only three options now; imagine one with 20 implementations.

  2. Second, decoupling. When I use a factory, the calling code does not know, or need to know, that a class called ArraySampler even exists. The class could even be resolved at run time, and the call site would be none the wiser. Consequently, I am free to change the ArraySampler class as much as I want, including, but not limited to, deleting it outright if, e.g., I decide that the ScalarSampler should be used for array data as well. I would just need to change the line

    samplers.put(SamplingType.Array, new ArraySampler());
    

    to

    samplers.put(SamplingType.Array, new ScalarSampler());
    

    and it would just work. I would not have to change a single line of code in the calling classes, which could number in the hundreds. Effectively, the factory puts me in control of which sampling happens and how, and any change to sampling stays encapsulated in a single factory class, behind the interface the rest of the system uses.
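
To try both points end to end, here is a minimal self-contained sketch in Java. The bodies of `SamplingType`, `ISampler` and the three samplers are hypothetical stand-ins (each sampler just wraps the data in a tagged string), `DataProcessor` is an illustrative class name, and the C#-style method names `GetSampler`, `Sample` and `ProcessData` are kept to match the snippets in this answer. The caller depends only on `ISampler` and `SamplingType`; which concrete class serves each type is decided in one place.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins: each sampler tags the data with its own name
// so the effect of the factory's mapping is visible.
enum SamplingType { Scalar, Vector, Array }

interface ISampler {
    Object Sample(Object dataSet);
}

class ScalarSampler implements ISampler {
    public Object Sample(Object dataSet) { return "scalar(" + dataSet + ")"; }
}

class VectorSampler implements ISampler {
    public Object Sample(Object dataSet) { return "vector(" + dataSet + ")"; }
}

class ArraySampler implements ISampler {
    public Object Sample(Object dataSet) { return "array(" + dataSet + ")"; }
}

class SamplerFactory {
    // The only place that knows which concrete class serves which SamplingType.
    private static final Map<SamplingType, ISampler> samplers =
        new EnumMap<>(SamplingType.class);

    static {
        samplers.put(SamplingType.Scalar, new ScalarSampler());
        samplers.put(SamplingType.Vector, new VectorSampler());
        // Registering new ScalarSampler() here instead would change every caller at once.
        samplers.put(SamplingType.Array, new ArraySampler());
    }

    static ISampler GetSampler(SamplingType samplingType) {
        ISampler sampler = samplers.get(samplingType); // null for a null or unregistered key
        if (sampler == null)
            throw new IllegalArgumentException("Invalid sampling type or sampler not initialized");
        return sampler;
    }
}

class DataProcessor {
    // Calling code: it never mentions ScalarSampler, VectorSampler or ArraySampler.
    static Object ProcessData(Object dataSet, SamplingType sampling) {
        ISampler sampler = SamplerFactory.GetSampler(sampling);
        return sampler.Sample(dataSet);
    }
}
```

With this in place, `DataProcessor.ProcessData("data", SamplingType.Array)` returns `"array(data)"`; registering `new ScalarSampler()` for `SamplingType.Array` in the static block would make the same call return `"scalar(data)"` without touching `DataProcessor`. One design choice worth noting: the map holds a single shared instance per type, which suits stateless samplers. If a sampler kept per-call state, registering a `java.util.function.Supplier<ISampler>` (for example `ArraySampler::new`) and calling `get()` inside `GetSampler` would give each caller a fresh instance.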


The idea here is separation of concerns: if the code that uses the object also has enough information to instantiate it, you don't need a factory. However, if there is some logic or configuration involved that you don't want the API user to think about (or mess with), you can hide all of that (and encapsulate it for reuse) in a factory.
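
As a small illustration of hiding configuration behind a factory, the sketch below assumes a hypothetical `PriceFormatter` service whose currency symbol and number of decimal places come from a `java.util.Properties` object. The property keys `price.symbol` and `price.decimals`, and all class names, are made up for the example. Callers only ask the factory for a formatter; parsing, defaults and validation live in one place.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical service interface: callers only need to format amounts.
interface PriceFormatter {
    String format(double amount);
}

// Concrete implementation; its constructor arguments are a configuration detail.
class FixedDecimalFormatter implements PriceFormatter {
    private final String symbol;
    private final int decimals;

    FixedDecimalFormatter(String symbol, int decimals) {
        this.symbol = symbol;
        this.decimals = decimals;
    }

    @Override
    public String format(double amount) {
        // Builds a pattern such as "%.2f"; Locale.ROOT keeps '.' as the decimal separator.
        return symbol + String.format(Locale.ROOT, "%." + decimals + "f", amount);
    }
}

class PriceFormatterFactory {
    // Hypothetical property keys; defaults and validation stay out of calling code.
    static PriceFormatter create(Properties config) {
        String symbol = config.getProperty("price.symbol", "$");
        int decimals = Integer.parseInt(config.getProperty("price.decimals", "2"));
        if (decimals < 0)
            throw new IllegalArgumentException("price.decimals must not be negative");
        return new FixedDecimalFormatter(symbol, decimals);
    }
}
```

A caller then writes `PriceFormatter f = PriceFormatterFactory.create(config);` and never needs to know that `FixedDecimalFormatter` exists; with an empty `Properties`, `f.format(3.5)` yields `"$3.50"`.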

Here is an example: you want to access one of the services provided by Google App Engine. The same code should work both in the production environment (of which there are two versions, master-slave and high-availability) and in the completely different local development environment. Google does not want to tell you about the inner workings of its internal infrastructure, and you don't really want to know. So what they do is provide interfaces and factories (plus several implementations of those interfaces, for the factories to choose from, that you don't even need to know about).
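
The same pattern can be sketched generically. None of the names below come from the App Engine SDK: `KeyValueStore`, `InMemoryStore`, `RemoteStoreStub`, `Environment`, `KeyValueStoreFactory` and the `app.environment` system property are all hypothetical, and the "remote" store is only a stand-in backed by a map so the sketch runs. The point is that application code asks the factory for the interface and never names an implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical public interface: the only type application code refers to.
interface KeyValueStore {
    void put(String key, String value);
    String get(String key);
}

// Local development implementation: data lives in this process only.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

// Stand-in for a production implementation. A real one would call a remote
// service; this stub delegates to a map so the example is runnable.
class RemoteStoreStub implements KeyValueStore {
    private final Map<String, String> backend = new HashMap<>();
    public void put(String key, String value) { backend.put(key, value); }
    public String get(String key) { return backend.get(key); }
}

enum Environment { LOCAL, PRODUCTION }

class KeyValueStoreFactory {
    // Hypothetical detection: reads the "app.environment" system property,
    // defaulting to local development when it is absent.
    static KeyValueStore create() {
        String env = System.getProperty("app.environment", "local");
        return create("production".equals(env) ? Environment.PRODUCTION : Environment.LOCAL);
    }

    // The single place that maps an environment to a concrete class.
    static KeyValueStore create(Environment env) {
        switch (env) {
            case PRODUCTION:
                return new RemoteStoreStub();
            case LOCAL:
            default:
                return new InMemoryStore();
        }
    }
}
```

Moving from local development to production then becomes a deployment setting rather than a code change in every caller, and the concrete classes could be made package-private so callers cannot depend on them directly.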