One of the most common string interview questions: Find the first non-repeated (unique) character in a given string.
This question demonstrates efficient use of hashtable. We scan the string from left to right counting the number occurrences of each character in a hashtable. Then we perform a second pass and check the counts of every character. Whenever we hit a count of 1 we return that character, that’s the first unique letter. If we can’t find any unique characters, then we don’t return anything (None in python). Here’s the code:
def firstUnique(text): counts=collections.defaultdict(int) for letter in text: counts[letter]+=1 for letter in text: if counts[letter]==1: return letter
As you can see it’s pretty straightforward once we use a hashtable. It’s an optimal solution, the complexity is O(N). Hashtable is generally the key data structure to achieve optimal linear time solutions in questions that involve counting.
I think we can traverse the string only once by keeping two trees, one is just a binary search tree to keep nodes of characters and their counts, one is a minheap to keep those nodes sorted and while traversing we’ll maintain the heap and then after the traversal is finished, answer is at the root of the heap. That comes with a memory paste and we can maintain the heap in O(N) time, I think. That’s not a better solution, just wanted to leave it here.
Good alternative solution. But if we replace the binary search tree with hashtable, updating the counts will be slightly faster I guess?
Heres a twist to the same problem that I encountered in one of my recent interviews. What if the given string is extremely huge and the character set is lowercase letters (‘a’ to ‘z’)?
Answer: Firstly, we can replace the hash table with an array of pairs of size 26 (each representing a lowercase letter):
– The first element in each pair stores the location where the character occured for the first time in the string (default value is -1 indicating non-occurence).
– The second element in each pair is a boolean that indicates if the character’s occurence is unique. (default value is false and it also indicates repetition. true indicates uniqueness.)
With this data structure ready, it is enough if we parse through the original string only once to populate the array of pairs. Then we do a scan on the array to figure out the first unique character.
You’re totally right. I have a post ready to publish discussing exactly this solution. It’ll be number 25: remove duplicate characters in string. Thanks for the comment..
An optimization which a friend of mine pointed out to me: Instead of an array of pairs, we can use just use an array of integers, where -1 (default value) represents non-occurence of the character and -2 represents duplication.