Programming Interview Questions 25: Remove Duplicate Characters in String

Remove duplicate characters in a given string keeping only the first occurrences. For example, if the input is ‘tree traversal’ the output will be ‘tre avsl’.

We need a data structure to keep track of the characters we have seen so far, which can perform efficient find operation. If the input is guaranteed to be in standard ASCII form, we can just create a boolean array of size 128 and perform lookups by accessing the index of the character’s ASCII value in constant time. But if the string is Unicode then we would need a much larger array of size more than 100K, which will be a waste since most of it would generally be unused.

Set data structure perfectly suits our purpose. It stores keys and provides constant time search for key existence. So, we’ll loop over the characters of the string, and at each iteration we’ll check whether we have seen the current character before by searching the set. If it’s in the set then it means we’ve seen it before, so we ignore it. Otherwise, we include it in the result and add it to the set to keep track for future reference. The code is easier to understand:

def removeDuplicates(string):
    for char in string:
        if char not in seen:
    return ''.join(result)

The time complexity of the algorithm is O(N) where N is the number of characters in the input string, because set supports O(1) insert and find. This is an optimal solution to one of the most common string interview questions.

Note: Complexity of insert and find for set is language and implementation dependent. Average case is mostly constant, but worst case can be logarithmic or even linear. But in an interview setting I think it’s safe to assume constant time insert and find for set.

VN:F [1.9.22_1171]
Rating: 7.8/10 (29 votes cast)
Programming Interview Questions 25: Remove Duplicate Characters in String, 7.8 out of 10 based on 29 ratings
This entry was posted in Programming Interview. Bookmark the permalink.
  • ahmet alp balkan

    How are the python sets are implemented? I’m wondering its corresponding in Java and my guess is TreeSet, however it could also be HashSet.

    In both cases o(1) operations are not guaranteed. TreeSet guarantees logn and HashSet depends on initially allocated capacity and load factor (hardcoded I assume).

    • Arden

      Yes the worst case complexity can be as bad as O(N) in python: But as I noted in the end, I think we may assume constant time in an interview setting. It didn’t hurt me at least..

      • Tony

        if we can use set() on the string, set will return a set of unique elements so the solution would need only 2 lines:

        def removeDuplicates(s):
        return ”.join(set(s))

  • Ashot Madatyan

    If implemented in C/C++, you can even do without O(N) additional storage as is done here in Python. For sake of brevity, I have created that code at the link below:

    • bpin

      Doesn’t the set class you used in C++ require O(N) space?

  • Anonymous

    // this can be done in inline on c++/java
    void remdupwithoutbuff(string& str) {
    int len = str.length();
    int tail = 1;
    int i,j;
    for (i=1; i<len; i++) {
    for (j=0; j<tail; j++) {
    if (str[i] == str[j]) {
    if (tail == j) {
    str[tail++] = str[i];

  • sonesh

    Now if i have “sonesh kumar iit”
    this should become “sonesh kumarit” , Am i right ?

    • Anonymous

      no it should be soneh kumarit


      No ..Since the second occurrence of s is at 5th(4th index) position.the output would be soneh kumar it

      • tree

        I think sonesh means the algo will also remove the 2nd space, according to the code, it surely does

  • djlewis

    I would modify this slightly to take the ‘space’ character into account:

    def removeDuplicates(string,ignoreSpaces=True):
    result = []
    seen = set()
    for char in string:
    if char not in seen:
    elif char==’ ‘ and ignoreSpaces:
    return ”.join(result)

  • anchal dhiman

    Nice Examples… visit more java examples

  • Architect

    //Java 8 Streams & Lambdas solution by the Architect
    String text = "tree traversal";

    IntStream.range(0, text.length())
    .filter(s -> !text.substring(0, s).contains(text.substring(s, s+1)))
    .forEach(s -> System.out.print(text.substring(s, s+1)));