Your troubleshooting rules, approach to troubleshooting? [closed]

Solution 1:

Just a list of points I wrote down for myself after fighting with a problem for a while:

  1. What is your primary goal? State it clearly and concisely, preferably in one sentence. The goal should be specific, not general.
  2. What is your problem?
  3. Is there just one problem or many? If there are many, solve them one at a time.
  4. Try to reproduce the problem under different conditions. Can it be reproduced under all conditions or only some? Does that say anything about the nature of the problem?
  5. If it is an urgent problem, is there a workaround? Try to find as many workarounds as possible.
  6. Try to make as many guesses as possible about what is causing your problem.
  7. Try to prove or disprove your guesses by experimenting with the system.
  8. Be consistent in what you're trying to do. Do one thing at a time.
  9. Keep track of what you're doing, what you've already tried.
  10. Do not deviate from your primary goal. Constantly check that you're still solving your main problem, not a different one.
  11. Do not fixate either.
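
As one way to follow points 1 and 9, here is a minimal sketch of a troubleshooting log in Python. The class names, fields, and the sample entries are all made up for illustration; a text file or a ticket comment would do the same job.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Attempt:
    """One thing that was tried, and what happened."""
    action: str
    result: str
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class TroubleshootingLog:
    goal: str  # point 1: the primary goal, in one sentence
    attempts: list = field(default_factory=list)

    def record(self, action: str, result: str) -> None:
        """Point 9: keep track of what has been tried."""
        self.attempts.append(Attempt(action, result))

    def already_tried(self, action: str) -> bool:
        """Check (case-insensitively) whether an action was logged before."""
        wanted = action.strip().lower()
        return any(a.action.strip().lower() == wanted for a in self.attempts)

    def summary(self) -> str:
        """Render the goal followed by a numbered list of attempts."""
        lines = [f"Goal: {self.goal}"]
        for number, attempt in enumerate(self.attempts, start=1):
            lines.append(f"{number}. {attempt.action} -> {attempt.result}")
        return "\n".join(lines)


# Hypothetical example entries.
log = TroubleshootingLog(goal="Restore mail delivery to example.com")
log.record("Restarted the mail service", "no change")
log.record("Checked disk space on the mail spool", "92% full, not the cause")
print(log.summary())
```

Keeping the goal at the top of the summary is a cheap reminder for point 10: every time you read the log, you see what you were actually trying to solve.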

There was also a great list of debugging rules in PDF form, with examples and an explanation for each rule. I couldn't quickly find the PDF, but I think this is a poster of the list:

[image: poster of the debugging rules]

Solution 2:

  • If the problem is Internet-related, it's probably the DNS.

  • If the problem is hard to diagnose, it's probably the RAM.

  • If the problem is with a Windows workstation, it's probably quickest to reimage it.

  • If the problem is on a Friday, it's probably something serious.
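
If you want to test the DNS rule rather than just assume it, here is a rough sketch in Python that separates "the name does not resolve" from "the name resolves but the connection fails". The function name `diagnose`, its return labels, and the default port are assumptions made for this example; the only library calls used are `socket.getaddrinfo` and `socket.create_connection`, passed in as parameters so they can be swapped out.

```python
import socket


def diagnose(host, port=443, timeout=3.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Return 'dns', 'connect', or 'ok' depending on which step fails."""
    # Step 1: can the name be resolved at all?
    try:
        resolve(host, port)
    except socket.gaierror:
        return "dns"

    # Step 2: the name resolves, so try to open a TCP connection.
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "connect"

    conn.close()
    return "ok"
```

Calling `diagnose("example.com")` on the affected machine gives a first hint: a result of `dns` points at name resolution, while `connect` points at routing, a firewall, or the remote service.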

Solution 3:

I like to fall back to the scientific method.

From Wikipedia (http://en.wikipedia.org/wiki/Scientific_method):

  1. Define the question
  2. Gather information and resources (observe)
  3. Form hypothesis
  4. Perform experiment and collect data
  5. Analyze data
  6. Interpret data and draw conclusions that serve as a starting point for new hypotheses
  7. Document results

As a general rule, I always like to double-check my basic assumptions. Does it have power? Is it plugged in? Is the wiring good? It is very annoying to spend hours looking at a software issue when you have a loose cable.

I find it very important during the hypothesis creation phase to actually come up with as many possible causes of the problem as I can. Then I choose which ideas to test first based on how easy each one is to test and how probable it is.
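
The ordering by ease and probability can be sketched in Python. This is only one possible scoring, assumed for illustration: probability divided by the estimated minutes needed to test. The hypotheses and the numbers are invented, and in practice they would be rough guesses.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    cause: str
    probability: float      # rough guess between 0 and 1
    minutes_to_test: float  # rough estimate of the effort to check it


def score(h: Hypothesis) -> float:
    """Higher is better: likely causes that are cheap to test."""
    # Clamp the effort so a "zero minute" test cannot divide by zero.
    return h.probability / max(h.minutes_to_test, 0.1)


def rank(hypotheses):
    """Return the hypotheses ordered from first-to-test to last-to-test."""
    return sorted(hypotheses, key=score, reverse=True)


# Hypothetical candidates for a "server unreachable" problem.
candidates = [
    Hypothesis("Failing RAM module", 0.05, 120),
    Hypothesis("Misconfigured firewall rule", 0.40, 15),
    Hypothesis("Loose network cable", 0.10, 1),
]

for h in rank(candidates):
    print(f"{h.cause}: score {score(h):.4f}")
```

With these numbers the loose cable is checked first even though it is not the most probable cause, because checking it takes a minute. That matches the advice above about double-checking basic assumptions before digging into software.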

It is also important to get help. If you can, consult your coworkers, your vendor, or whoever is most knowledgeable about the systems in question. Don't spend lots of time spinning your wheels on a problem if someone is available who can help you solve it.

O'Reilly has a good book, Network Troubleshooting Tools, with a set of steps that is very similar to the scientific method. I found the book very useful and strongly recommend it. It goes into a lot more detail and suggests many useful tools.

From Network Troubleshooting Tools

  1. State your goal
  2. Define the system
  3. Identify possible outcomes
  4. Identify and select what you will measure
  5. If appropriate, identify test parameters and factors
  6. Select tools
  7. Establish measurement constraints
  8. Review experimental design
  9. Collect data
  10. Analyze data

See Also:

  • 3COM has a troubleshooting guide
  • Murphy's law - Anything that can possibly go wrong, does.
  • Occam's razor - All other things being equal, the simplest solution is the best.