How can fast and slow pointers find the first node of loop in a linkedlist?

When using the method of fast and slow, in a linkedlist which has loop. The two pointers will meet at a node. Then take one of the two pointers to the head node, the other pointer stay. Next, make them all take one stap every time. At the end, they will meet at the first node of the loop. I want to know why. Thanks.what I mean the "first node in the loop"


You will understand this by solving a few examples by hand:

1 -> 2 -> 3 -> 4 (suppose 4 is connected to 2)
- (fast)
+ (slow)
[start moving]

1 -> 2 -> 3 -> 4
     -
          +
1 -> 2 -> 3 -> 4
          -
     +
1 -> 2 -> 3 -> 4
               -
               +
Now the have met. Let's declare 2 pointers @ and # pointing to the head and the node where they met respectively
1 -> 2 -> 3 -> 4
@
               #
1 -> 2 -> 3 -> 4
     @
     #
This is obviously the start of the cycle.

First observation: If there is a cycle slow and fast will meet. This is a result of the length of steps they take. Each and every multiple of 2 is divisible by one.

Second observation: In case of a cycle, the point slow and fast meet is within the cycle. Reason is obvious, fast won't exit the cycle so that's where they meet.

Now, let's say the distance from head to cycle start is A and from cycle start to the node where slow and fast meet is B. We can say that slow has travelled A+B to meet fast. How much has fast traveled in this period? 2A+2B, because it moves at double speed.

Now, let's find another formulation for the distance fast has traveled that may help us solve this equation. We can say that fast has traveled from head to start of the cycle A plus from start of the cycle to the meeting point B plus from meeting point to the end of the linked list, let's call this C, and again from the start of the cycle to the meeting point B. That's A + B + C + B which equals A+2B+C, where C is the distance from meeting point to the end of the linked list. Hope that makes sense without illustration.

To sum it up to this point:

A: dist from start of the list to the start of the cycle
B: dist from start of the cycle to the meeting point of slow and fast
C: dist from meeting point to the end of the list

slow has traveled: A + B
fast has travelled: 2A + 2B (because it moves at double speed)
We can also say fast has traveled: A + B + C + B = A+2B+C

Let's equate these 2 expressions of the distance that fast has moved and hopefully solve for A, the distance from head to start of the cycle.

A+2B+C = 2A+2B
A = C

This effectively means that the distance from head to start of the cycle, namely A, is equal to the distance of the meeting point to the end of the linked list, namely C.

If that's true, we can run 2 pointers, a and b, one from head and one from meeting point. When b hits the end of the linked list, a is at the start of the cycle. But since there is a cycle b will never hit the end. We create a condition that when b == a we have found the start of the cycle. This works because the next node after the last node is the start of the cycle.

Note about my explanation: The equations are not entirely accurate. In case of a long linked list and a very small cycle, fast may move multiples of B and C until slow meets it. So, A+2B+C becomes A+iB+iC+B. But this doesn't change the end result.

This is by the way Floyd's Hare and Turtoise algorithm if you like to further read about it.

If this isn't clear enough, please ask.