Strategy to find your best route via Public Transportation only?

Finding routes for a car is pretty easy: you store a weighted graph of all the roads and you could use Djikstra's algorithm [1]. A bus route is less obvious. With a bus you have to represent things like "wait 10 minutes for the next bus" or "walk one block to another bus stop" and feed those into your pathfinding algorithm.

It's not even always simple for cars. In some cities, some roads are one-way-only into the city in the morning, and one-way-only out of the city in the evening. Some advanced GPSs know how to avoid busy routes during rush hour.

How would you efficiently represent this kind of time-dependent graph and find a route? There is no need for a provably optimal solution; if the traveler wanted to be on time, they would buy a car. ;-)

[1] A wonderful algorithm to mention in an example because everyone's heard of it, though A* is a likelier choice for this application.


I have been involved in development of one journy planner system for Stockholm Public Transportation in Sweden. It was based on Djikstra's algorithm but with termination before every node was visited in the system. Today when there are reliable coordinates available for each stop, I guess the A* algorithm would be the choise.

Data about upcoming trafic was extracted from several databases regularly and compiled into large tables loaded into memory of our search server cluster.

One key to a sucessfull algorith was using a path cost function based on travel and waiting time multiplied by diffrent weightes. Known in Swedish as “kresu"-time these weighted times reflect the fact that, for example, one minute’s waiting time is typically equivalent in “inconvenience” to two minutes of travelling time.

KRESU Weight table

  • x1 - Travel time
  • x2 - Walking between stops
  • x2 - Waiting at a stop during the journey. Stops under roof, with shops, etc can get a slightly lower weight and crowded stations a higher to tune the algorithm.
  • The weight for the waiting time at the first stop is a function of trafic intensity and can be between 0.5 to 3.

Data structure

Area A named area where you journey can start or end. A Bus Stop could be an area with two Stops. A larger Station with several platforms could be one area with one stop for each platform. Data: Name, Stops in area

Stops An array with all bus stops, train and underground stations. Note that you usually need two stops, one for each direction, because it takes some time to cross the street or walk to the other platform. Data: Name, Links, Nodes

Links A list with other stops you can reach by walking from this stop. Data: Other Stop, Time to walk to other Stop

Lines/Tours You have a number on the bus and a destination. The bus starts at one stop and passes several stops on its way to the destination. Data: Number, Name, Destination

Nodes Usually you have a timetable with the least the time for when it should be at the first and last stop in a Tour. Each time a bus/train passes a stop you add a new node to the array. This table can have millions of values per day. Data: Line/Tour, Stop, Arrival Time, Departure Time, Error margin, Next Node in Tour

Search Array with the same size as the Nodes array used to store how you got there and the path cost. Data: Back-link with Previous Node (not set if Node is unvisited), Path Cost (infinit for unvisited)


What you're talking about is more complicated than something like the mathematical models that can be described with simple data structures like graphs and with "simple" algorithms like Djikstra's. What you are asking for is a more complex problem like those encountered in the world of automated logistics management.

One way to think about it is that you are asking a multi-dimensional problem, you need to be able to calculate:

  1. Distance optimization
  2. Time optimization
  3. Route optimization
  4. "Time horizon" optimization (if it's 5:25 and the bus only shows up at 7:00, pick another route.)

Given all of these circumstances you can attempt to do deterministic modeling using complex multi-layered data structures. For example, you could still use a weighted di-graph to represent the existing potential routes, wherein each node also contained a finite state automata which added a weight bias to a route depending on time values (so by crossing a node at 5:25 you get a different value than if your simulation crossed it at 7:00.)

However, I think that at this point you are going to find yourself with a simulation that is more and more complex, which most likely does not provide "great" approximation of optimal routes when the advice is transfered into the real world. It turns out that software and mathematical modeling and simulation is at best a weak tool when encountering real world chaotic behaviors and dynamism.

My suggestion would go to use an alternate strategy. I would attempt to use a genetic algorithm in which the DNA for an individual calculated a potential route, I would then create a fitness function which encoded costs and weights in a more "easy to maintain" lookup table fashion. Then I would let the Genetic Algorithm attempt to converge on a near optimal solution for a public transport route finder. On modern computers a GA such as this is probably going to perform reasonably well, and it should be at least relatively robust to real world dynamism.

I think that most systems that do this sort of thing take the "easy way out" and simply do something like an A* search algorithm, or something similar to a greedy costed weighted digraph walk. The thing to remember is that the users of the public transport don't themselves know what the optimal route would be, so a 90% optimal solution is still going to be a great solution for the average case.


Some data points to be aware of from the public transportation arena:

  1. Each transfer incurs a 10 minute penalty (unless it is a timed transfer) in the riders mind. That is to say mentally a trip involving a single bus that takes 40 minutes is roughly equivalent to a 30minute trip that requires a transfer.
  2. Maximum distance that most people are willing to walk to a bus stop is 1/4 mile. Train station / Light rail about 1/2 mile.
  3. Distance is irrelevant to the public transportation rider. (Only time is important)
  4. Frequency matters (if a connection is missed how long until the next bus). Riders will prefer more frequent service options if the alternative is being stranded for an hour for the next express.
  5. Rail has a higher preference than bus ( more confidence that the train will come and be going in the right direction)
  6. Having to pay a new fare is a big hit. (add about a 15-20min penalty)
  7. Total trip time matters as well (with above penalties)
  8. How seamless is the connect? Does the rider have to exist a train station cross a busy street? Or is it just step off a train and walk 4 steps to a bus?
  9. Crossing busy streets -- another big penalty on transfers -- may miss connection because can't get across street fast enough.

if the cost of each leg of the trip is measured in time, then the only complication is factoring in the schedule - which just changes the cost at each node to a function of the current time t, where t is just the total trip time so far (assuming schedules are normalized to start at t=0).

so instead of Node A having a cost of 10 minutes, it has a cost of f(t) defined as:

  • t1 = nextScheduledStop(t); //to get the next stop time at or after time t
  • baseTime for leg = 10 //for example, a 10-minute trip
  • return (t1-t)+baseTime

wait-time is thus included dynamically in the cost of each leg, and walks between bus stops are just arcs with a constant time cost

with this representation you should be able to apply A* or Dijkstra's algorithm directly