Find longest substring without repeating characters

Given a string S of length N find longest substring without repeating characters.

Example:

Input: "stackoverflow"

Output: "stackoverfl"

If there are two such candidates, return first from left. I need linear time and constant space algorithm.


  1. You are going to need a start and an end locator(/pointer) for the string and an array where you store information for each character: did it occour at least once?

  2. Start at the beginning of the string, both locators point to the start of the string.

  3. Move the end locator to the right till you find a repetition (or reach the end of the string). For each processed character, store it in the array. When stopped store the position if this is the largest substring. Also remember the repeated character.

  4. Now do the same thing with the start locator, when processing each character, remove its flags from the array. Move the locator till you find the earlier occurrence of the repeated character.

  5. Go back to step 3 if you haven't reached the end of string.

Overall: O(N)


import java.util.HashSet;

public class SubString {
    public static String subString(String input){

        HashSet<Character> set = new HashSet<Character>();

        String longestOverAll = "";
        String longestTillNow = "";

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);

            if (set.contains(c)) {
                longestTillNow = "";
                set.clear();
            }
            longestTillNow += c;
            set.add(c);
            if (longestTillNow.length() > longestOverAll.length()) {
                longestOverAll = longestTillNow;
            }
        }

        return longestOverAll;
    }

    public static void main(String[] args) {
        String input = "substringfindout";
        System.out.println(subString(input));
    }
}

You keep an array indicating the position at which a certain character occurred last. For convenience all characters occurred at position -1. You iterate on the string keeping a window, if a character is repeated in that window, you chop off the prefix that ends with the first occurrence of this character. Throughout, you maintain the longest length. Here's a python implementation:

def longest_unique_substr(S):
  # This should be replaced by an array (size = alphabet size).
  last_occurrence = {} 
  longest_len_so_far = 0
  longest_pos_so_far = 0
  curr_starting_pos = 0
  curr_length = 0

  for k, c in enumerate(S):
    l = last_occurrence.get(c, -1)
    # If no repetition within window, no problems.
    if l < curr_starting_pos: 
        curr_length += 1
    else:
        # Check if it is the longest so far
        if curr_length > longest_len_so_far: 
            longest_pos_so_far = curr_starting_pos
            longest_len_so_far = curr_length
        # Cut the prefix that has repetition
        curr_length -= l - curr_starting_pos
        curr_starting_pos = l + 1
    # In any case, update last_occurrence
    last_occurrence[c] = k

  # Maybe the longest substring is a suffix
  if curr_length > longest_len_so_far:
    longest_pos_so_far = curr_starting_pos
    longest_len_so_far = curr_length

  return S[longest_pos_so_far:longest_pos_so_far + longest_len_so_far]