Interview: Remove Loop in linked list - Java
There are two parts to this problem:
- Detect if there is a loop in the list
- Identify the start of the loop
Once you know where the loop starts, it's easy to identify the last element in the list since it's the element in the list following the start of the loop that ends up pointing back to the start of the loop. It is then trivial to set the next pointer/reference of this element to null
to correct the cyclic link list (not circular linked list which is where the last elements points back to the first - this would be a specific instance of cyclic lists).
Floyd's cycle detect algorithm, also called the tortoise and hare algorithm as it involves using two pointers/references that move at different speeds, is one way of detecting the cycle. If there is a cycle, the two pointers (say
p1
andp2
) will end up pointing to the same element after a finite number of steps. Interestingly, it can be proved that the element at which they meet will be the same distance to the start of the loop (continuing to traverse the list in the same, forward direction) as the start of the loop is to the head of the list. That is, if the linear part of the list hask
elements, the two pointers will meet inside the loop of lengthm
at a pointm-k
from the start of the loop ork
elements to the 'end' of the loop (of course, it's a loop so it has no 'end' - it's just the 'start' once again). And that gives us a way to find the start of the loop:Once a cycle has been detected, let
p2
remain pointing to the element where the loop for the step above terminated but resetp1
so that it's pointing back to the head of the list. Now, move each pointer one element at a time. Sincep2
began inside the loop, it will continue looping. Afterk
steps (equal to the distance of the start of the loop from the head of the list),p1
andp2
will meet again. This will give you a reference to the start of the loop.It is now easy to set
p1
(orp2
) to point to the element starting the loop and traverse the loop untilp1
ends up pointing back to the starting element. At this pointp1
is referencing the 'last' element list and it's next pointer can be set tonull
.
Here's some quick and dirty Java code assuming a linked list of Node
s where a Node
has a next
reference. This could be optimized but it should give you the basic idea:
Node slow, fast, start;
fast = slow = head;
//PART I - Detect if a loop exists
while (true)
{
// fast will always fall off the end of the list if it is linear
if (fast == null || fast.next == null)
{
// no loop
return;
}
else if (fast == slow || fast.next == slow)
{
// detected a loop
break;
}
else
{
fast = fast.next.next; // move 2 nodes at at time
slow = slow.next; // move 1 node at a time
}
}
//PART II - Identify the node that is the start of the loop
fast = head; //reset one of the references to head of list
//until both the references are one short of the common element which is the start of the loop
while(fast.next != slow.next)
{
fast = fast.next;
slow = slow.next;
}
start = fast.next;
//PART III - Eliminate the loop by setting the 'next' pointer
//of the last element to null
fast = start;
while(fast.next != start)
{
fast = fast.next;
}
fast.next = null; //break the loop
This explanation might help the why behind Part II:
Assume the length of the cycle is M, and the length of the rest of the linked list is L. Let's figure out what is the position in the cycle when t1/t2 first meet?
Define the first node in the cycle is position 0, following the links we have position 1, 2,..., up to M-1. ( when we walk in the cycle, our current position is (walk_length) mod M, right?) Suppose t1/t2 first meet at position p, then their travel time are the same, (L+k1*M+p)/v = (L+k2*M+p)/2v for some k1
So it concludes that if t1 start from p, t2 start from head and move at the same speed, then will grantee to meet at position 0, the first node of the cycle. QED.
More references:
- http://www.quora.com/How-does-Floyds-cycle-finding-algorithm-work
- Explain how finding cycle start node in cycle linked list work?
- Proof of detecting the start of cycle in linked list
- Hristo's answer to this question on this page also quotes an explanation from an interview book
Solution 1 - courtesy of Career Cup and "Cracking the Coding Interview" book:
public static LinkedListNode findStartOfLoop(LinkedListNode head) {
LinkedListNode n1 = head;
LinkedListNode n2 = head;
// find meeting point using Tortoise and Hare algorithm
// this is just Floyd's cycle detection algorithm
while (n2.next != null) {
n1 = n1.next;
n2 = n2.next.next;
if (n1 == n2) {
break;
}
}
// Error check - there is no meeting point, and therefore no loop
if (n2.next == null) {
return null;
}
/* Move n1 to Head. Keep n2 at Meeting Point. Each are k steps
/* from the Loop Start. If they move at the same pace, they must
* meet at Loop Start. */
n1 = head;
while (n1 != n2) {
n1 = n1.next;
n2 = n2.next;
}
// Now n2 points to the start of the loop.
return n2;
}
The explanation for this solution is straight from the book:
If we move two pointers, one with speed 1 and another with speed 2, they will end up meeting if the linked list has a loop. Why? Think about two cars driving on a track; the faster car will always pass the slower one!
The tricky part here is finding the start of the loop. Imagine, as an analogy, two people racing around a track, one running twice as fast as the other. If they start off at the same place, when will they next meet? They will next meet at the start of the next lap.
Now, let’s suppose Fast Runner had a head start of k meters on an n step lap. When will they next meet? They will meet k meters before the start of the next lap. (Why? Fast Runner would have made k + 2(n - k) steps, including its head start, and Slow Runner would have made n - k steps Both will be k steps before the start of the loop ).
Now, going back to the problem, when Fast Runner (n2) and Slow Runner (n1) are moving around our circular linked list, n2 will have a head start on the loop when n1 enters. Specifically, it will have a head start of k, where k is the number of nodes before the loop. Since n2 has a head start of k nodes, n1 and n2 will meet k nodes before the start of the loop.
So, we now know the following:
- Head is k nodes from LoopStart (by definition)
- MeetingPoint for n1 and n2 is k nodes from LoopStart (as shown above)
Thus, if we move n1 back to Head and keep n2 at MeetingPoint, and move them both at the same pace, they will meet at LoopStart
Solution 2 - courtesy of me :)
public static LinkedListNode findHeadOfLoop(LinkedListNode head) {
int indexer = 0;
Map<LinkedListNode, Integer> map = new IdentityHashMap<LinkedListNode, Integer>();
map.put(head, indexer);
indexer++;
// start walking along the list while putting each node in the HashMap
// if we come to a node that is already in the list,
// then that node is the start of the cycle
LinkedListNode curr = head;
while (curr != null) {
if (map.containsKey(curr.next)) {
curr = curr.next;
break;
}
curr = curr.next;
map.put(curr, indexer);
indexer++;
}
return curr;
}
I hope this helps.
Hristo
This response is not meant to compete for the answer, but rather to explain a little more on the meeting of the two nodes in the tortoise and the hare algorithm.
Both nodes will eventually enter the loop. Because one moves faster (F) than the other (S), (F) will eventually lap (S).
If the loop's start is at the list's head, (F) must meet (S) back at the list's head. This is ONLY because (F)'s speed is 2X (S)'s; if it was 3X this then would not be true. This is true because (F) completes one lap when (S) completes half a lap, so when (S) completes its first lap, (F) has completed two laps - and is back at the start of the loop with (S).
-
If the loop's start is NOT at the list's head, then by the time (S) enters the loop, (F) has had a head start of (k) nodes in the loop. Because (S)'s speed is only one node at a time, it will meet (F) at (k) nodes from the loop's start - as in, (k) more steps before reaching the start, NOT (k) steps AFTER the start. This would NOT be true if (S)'s speed was not one and the speed ratio was not 2:1 between (F) and (S).
3.1. This is where it gets a little tricky to explain. We can agree that (F) will continue lapping (S) until they eventually meet (see 1 above), but why at (k) nodes from the loop's start? Consider the following equation where M is the number of nodes or distance of the loop and k is the head start (F) had; the equation represents distance traveled by (F) given time t on the left in terms of distance traveled by (S) on the right:
d_F(t) = 2 * d_S(t) + k
So when (S) enters the loop and has traveled 0 distance in the loop, (F) has traveled only the (k) distance. By the time d_S = M - k, d_F = 2M - k. Because we also have to use modular math in consideration that M represents the total distance of a single lap in the loop, the POSITION of (F) and (S) at any whole M (no remainder) is 0. So then in terms of POSITION (or the differential), this leaves k (or rather, -k).
And so finally, (S) and (F) will meet at position (0 - k), or (k) nodes away from the start of the loop.
Given [3] above, as (k) represents the head start (F) had, and as (F) had traveled 2X the distance (S) traveled to enter the loop from the head of the list, (k) also represents the distance from the list's start, which then represents the start of the loop.
It's a bit late here so I hope I've articulated effectively. Let me know otherwise and I'll try and update my response.
If using an identity hash map (such as IdentityHashMap) is permissible this is terribly easy to solve. It does use more space, though.
Traverse the nodes list. For each node encountered, add it to the identity map. If the node already existed in the identity map then the list has a circular link and the node which was prior to this conflict is known (either check before moving or keep the "last node") -- simply set "next" as appropriate to break the cycle.
Following this simple approach should be a fun exercise: code is left as an exercise for the reader.
Happy coding.