How do you find a needle in a haystack?

When implementing a needle search of a haystack in an object-oriented way, you essentially have three alternatives:

1. needle.find(haystack)

2. haystack.find(needle)

3. searcher.find(needle, haystack)

Which do you prefer, and why?

I know some people prefer the second alternative because it avoids introducing a third object. However, I can't help feeling that the third approach is more conceptually "correct", at least if your goal is to model "the real world".

In which cases do you think it is justified to introduce helper objects, such as the searcher in this example, and when should they be avoided?

Of the three, I prefer option #3.

The Single Responsibility Principle makes me not want to put searching capabilities on my DTOs or models. Their responsibility is to be data, not to find themselves, nor should needles need to know about haystacks, nor haystacks know about needles.
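To make that concrete, here is a minimal C++ sketch of option #3. The names `Needle`, `Haystack`, and `Searcher` are hypothetical, and matching on an `id` string is just an assumption for illustration. The point is only that the two data types know nothing about each other or about searching.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Plain data types: neither knows about the other, or about searching.
struct Needle {
    std::string id;
};

struct Haystack {
    std::vector<Needle> contents;
};

// Hypothetical helper object that owns the search logic (option #3).
class Searcher {
public:
    // Returns a pointer to the matching needle, or nullptr if it is absent.
    const Needle* find(const Needle& needle, const Haystack& haystack) const {
        auto it = std::find_if(
            haystack.contents.begin(), haystack.contents.end(),
            [&](const Needle& candidate) { return candidate.id == needle.id; });
        return it != haystack.contents.end() ? &*it : nullptr;
    }
};
```

Changing how the search works (or what counts as a match) then touches only `Searcher`, never the data types.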

For what it's worth, I think it takes most OO practitioners a LONG time to understand why #3 is the best choice. I did OO for a decade, probably, before I really grokked it.

@wilhelmtell, C++ is one of the very few languages with template specialization that make such a system actually work. For most languages, a general purpose "find" method would be a HORRIBLE idea.

Usually actions should be applied to what you are doing the action on... in this case the haystack, so I think option 2 is the most appropriate.

You also have a fourth alternative that I think would be better than alternative 3:

haystack.find(needle, searcher)

In this case, it allows you to provide the manner in which you want to search as part of the action, and so you can keep the action with the object that is being operated on.
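A rough C++ sketch of what that fourth alternative might look like. The `Searcher` alias, the `linearSearch` function, and the index-or-minus-one return convention are all assumptions made for the example, not part of the original suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Needle {
    std::string id;
};

class Haystack {
public:
    // A search strategy: given the contents and a target, return its index or -1.
    using Searcher =
        std::function<int(const std::vector<Needle>&, const Needle&)>;

    explicit Haystack(std::vector<Needle> contents)
        : contents_(std::move(contents)) {}

    // The action stays on the haystack; the caller chooses how to search.
    int find(const Needle& needle, const Searcher& searcher) const {
        return searcher(contents_, needle);
    }

private:
    std::vector<Needle> contents_;
};

// One possible strategy: a front-to-back linear scan.
int linearSearch(const std::vector<Needle>& items, const Needle& target) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i].id == target.id) return static_cast<int>(i);
    }
    return -1;
}
```

A call then reads `haystack.find(needle, linearSearch)`, and a different strategy can be swapped in without changing `Haystack`.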

There is another alternative, which is the approach utilized by the STL of C++:

find(haystack.begin(), haystack.end(), needle)

I think it's a great example of C++ shouting "in your face!" to OOP. The idea is that OOP is not a silver bullet of any kind; sometimes things are best described in terms of actions, sometimes in terms of objects, sometimes neither and sometimes both.

Bjarne Stroustrup said in TC++PL that when you design a system you should strive to reflect reality under the constraints of effective and efficient code. For me, this means you should never follow anything blindly. Think about the things at hand (haystack, needle) and the context we're in (searching, that's what the expression is about).

If the emphasis is on the searching, then use an algorithm (action) that emphasizes searching (i.e. one flexible enough to fit haystacks, oceans, deserts, linked lists). If the emphasis is on the haystack, encapsulate the find method inside the haystack object, and so on.
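As a small illustration of the "algorithm first" style, here is a sketch built on `std::find`. The `contains` wrapper is a hypothetical helper, not something the STL provides under that name. It works on any container that exposes `begin()`/`end()` and whose elements compare with `==`.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical wrapper around std::find: true if needle occurs in haystack.
template <typename Container, typename T>
bool contains(const Container& haystack, const T& needle) {
    return std::find(haystack.begin(), haystack.end(), needle) != haystack.end();
}
```

The same function serves a `std::vector<std::string>` and a `std::list<int>` alike, which is the flexibility the free-function approach buys you.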

That said, sometimes you're in doubt and have a hard time making a choice. In this case, be object oriented. If you change your mind later, I think it is easier to extract an action from an object than to split an action into objects and classes.

Follow these guidelines, and your code will be clearer and, well, more beautiful.

I would say that option 1 is completely out. The code should read in a way that tells you what it does. Option 1 makes me think that this needle is going to go find me a haystack.

Option 2 looks good if a haystack is meant to contain needles. ListCollections are always going to contain ListItems, so doing collection.find(item) is natural and expressive.
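For comparison, a minimal C++ sketch of option 2 for that kind of container. `ListCollection` and `ListItem` here are hypothetical stand-ins, and returning an index (or -1) is just one possible convention.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct ListItem {
    std::string label;
    bool operator==(const ListItem& other) const { return label == other.label; }
};

// Hypothetical collection that owns its items and searches them itself.
class ListCollection {
public:
    void add(ListItem item) { items_.push_back(std::move(item)); }

    // Returns the position of the item, or -1 if the collection does not hold it.
    int find(const ListItem& item) const {
        auto it = std::find(items_.begin(), items_.end(), item);
        return it == items_.end() ? -1 : static_cast<int>(it - items_.begin());
    }

private:
    std::vector<ListItem> items_;
};
```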

I think the introduction of a helper object is appropriate when:

You don't control the implementation of the objects in question
e.g. search.find(ObscureOSObject, file)
There isn't a regular or sensible relationship between the objects
e.g. nameMatcher.find(houses, trees.name)

I am with Brad on this one. The more I work on immensely complex systems, the more I see the need to truly decouple objects. He's right. It's obvious that a needle shouldn't know anything about a haystack, so 1 is definitely out. But a haystack should know nothing about a needle either.

If I were modeling a haystack, I might implement it as a collection -- but as a collection of hay or straw -- not a collection of needles! However, I would take into consideration that stuff does get lost in a haystack, even though I know nothing about what exactly that stuff is. I think it's better not to make the haystack look for items in itself (how smart is a haystack anyway?). The right approach to me is to have the haystack present a collection of the things that are in it but are not straw or hay or whatever gives a haystack its essence.

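using System.Collections.Generic;

// Marker types, assumed here so the sketch compiles.
interface ISearchableThingsOnAFarm { }
interface IStuffSmallEnoughToBeLostInAHaystack { }
class Hay { }

class Haystack : ISearchableThingsOnAFarm {
   ICollection<Hay> myHay = new List<Hay>();
   ICollection<IStuffSmallEnoughToBeLostInAHaystack> stuffLostInMe =
      new List<IStuffSmallEnoughToBeLostInAHaystack>();

   // The hay itself, which is what makes this a haystack.
   public ICollection<Hay> Hay {
      get {
         return myHay;
      }
   }

   // Everything that is in the stack but is not hay.
   public ICollection<IStuffSmallEnoughToBeLostInAHaystack> LostAndFound {
      get {
         return stuffLostInMe;
      }
   }
}

class Needle : IStuffSmallEnoughToBeLostInAHaystack {
}

class Farmer {
   // The farmer searches; the haystack only exposes what is in it.
   // The body is an assumed minimal implementation: a reference-equality lookup.
   public bool Search(Haystack haystack,
                      IStuffSmallEnoughToBeLostInAHaystack itemToFind) {
      return haystack.LostAndFound.Contains(itemToFind);
   }
}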

There's actually more I was going to type and abstract into interfaces and then I realized how crazy I was getting. Felt like I was in a CS class in college... :P

You get the idea. I think going as loosely coupled as possible is a good thing, but maybe I was getting a bit carried away! :)
