Programming Interview Questions 16: Anagram Strings

Given two strings, check if they’re anagrams or not. Two strings are anagrams if they are written using the same exact letters, ignoring space, punctuation and capitalization. Each letter should have the same count in both strings. For example, ‘Eleven plus two’ and ‘Twelve plus one’ are meaningful anagrams of each other.

First we should extract only the letters from both strings and convert to lowercase, excluding punctuation and whitespaces. Then we can compare these to check whether two strings are anagrams of each other. From now on when I refer to a string, I assume this transformation is performed and it only contains lowercase letters in original order.

If two strings contain every letter same number of times, then they are anagrams. One way to perform this check is to sort both strings and check whether they’re the same or not. The complexity is O(NlogN) where N is the number of characters in the string. Here’s the code:

def isAnagram1(str1, str2):
    return sorted(getLetters(str1))==sorted(getLetters(str2))
 
def getLetters(text):
    return [char.lower() for char in text if char in string.letters]

Sorting approach is elegant but not optimal. We would prefer a linear time solution. Since the problem involves counting, hashtable would be a suitable data structure. We can store the counts of each character in string1 in a hashtable. Then we scan string2 from left to right decreasing the count of each letter. Once the count becomes negative (string2 contains more of that character) or if the letter doesn’t exist in the hashtable (string1 doesn’t contain that character), then the strings are not anagrams. Finally we check whether all the counts in the hashtable are 0, otherwise string1 contains extra characters. Or we can check the lengths of the strings in the beginning and avoid this count check. This also allows early termination of the program if the strings are of different lengths, because they can’t be anagrams. The code is the following:

def isAnagram2(str1, str2):
    str1, str2 = getLetters(str1), getLetters(str2)
    if len(str1)!=len(str2):
        return False
    counts=collections.defaultdict(int)
    for letter in str1:
        counts[letter]+=1
    for letter in str2:
        counts[letter]-=1
        if counts[letter]<0:
            return False
    return True

I use python’s defaultdict as hashtable. If a letter doesn’t exist in the dictionary it produces the value of 0. The complexity of this solution is O(N), which is optimal. The use of hashtables in storing counts once again proves its advantage.

VN:F [1.9.22_1171]
Rating: 8.7/10 (21 votes cast)
Programming Interview Questions 16: Anagram Strings, 8.7 out of 10 based on 21 ratings
This entry was posted in Programming Interview. Bookmark the permalink.
  • Sameer

    Hey Arden,

    Why the last “sum(counts.values())==0” check is required? If it reaches there, wouldn’t it always be 0?

    • Arden

      Because string1 may contain extra characters. For example if string1=’abc’ and string2=’ab’, the loops would complete just fine but we should check for that extra character ‘c’ in the end. To be sure that each letter has exact same count in both strings.

      • http://sameerbandal@gmail.com Sameer

        But you already compare the lengths of the strings first.

        I mean not a big deal to compare again, I was just making sure that I am not missing any case..

        • Arden

          Ok now I see, you’re totally right. Thanks a lot for the notice, I’ll update the code.

  • Volkan

    how about using XOR, it reduces memory to O(1).

    • Arden

      That’s a great idea, thanks for the comment. Will update the post..

      • Raj

        xor does’nt work.consider the case of string1=”aaaa” and string2=”bbbb”.for this xor of both string is zero.

        • Arden

          You’re right, now I see it. Thanks a lot for the correction!

          • Sai Talasila

            XOR might work if we can add a dummy character to one of the strings to address the above cases. If Str1= ‘aaaa’ and Str2 = ‘bbbb’, then I would XOR ‘aaaa0’ and ‘bbbb’. We should only make sure that we are not using a character that the string might contain.

          • http://javabypatel.blogspot.in Jayesh

            If string contains only unique chars, still XOR solution will not work, consider following case:
            str1 = “BE”
            str2 = “DC”

            xor of these two strings is 0 but they are not anagram of each other.

  • Parameswaran Raman

    Hi Arden,

    First of all, great work in maintaining the blog.

    There is another great solution to this anagram problem (of-course, it will only work with strings having non-repeated characters). Since, its based on a cool great math property, I wanted to mention it here.

    The trick is to assign prime nos to alphabets from a-z (2, 3, 5, 7, .. so on). Now, if we have two strings to be evaluated, say “abc” and “bac”. Compute products of each of the numbers assigned to these words and check if they are equal! Beauty of prime factorization :)

    – Params

    p.s. I am personally also a machine learning and ir guy. looking forward to more interesting posts from you..

    • Aditya

      We might run into integer overflow, but very interesting solution nevertheless!

      • Can

        Why wouldn’t it work with repeated characters? It should work with repeated characters as well.

        • Architect

          You are correct.

      • Architect

        Overflow is not a problem, because if they are anagrams, then both will overflow to the same number (see my solution above).

        • Galax

          Overflow would be a problem with some languages (e.g. C++, C#, Java) as you could get the same value from two different strings, where one had overflowed and the other hadn’t, and erroneously think that they were equal. Python doesn’t overflow however, so this method will work in Python (although doesn’t necessarily beat the hash table approach for large strings).

    • Architect


      //Java 8 Streams & Lambdas solution by the Architect https://disqus.com/by/op8rv315/
      String str1 = "Eleven plus two".toLowerCase().replaceAll("(\s)?(\W)?","");
      String str2 = "Twelve plus one".toLowerCase().replaceAll("(\s)?(\W)?","");
      List primes = Arrays.asList(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101);

      Integer result = Arrays.asList(str1, str2)
      .stream()
      .map(s -> s.chars().map(i -> i - 97).map(i -> primes.get(i)).reduce(1, (x, y) -> x * y))
      .reduce(0, (t, u) -> u - t);

      System.out.println(result == 0 ? "True - Anagrams" : "False - Not anagrams");

  • adilansari

    Good solution!

    I have written a code in JAVA using HashSet.

    static boolean isAnagram(String s1, String s2) {
    if (s1.length() != s2.length())
    return false;
    if (s1.length() == 0)
    return true;

    HashSet set= new HashSet();
    for(int i=0; i < s1.length(); i++) {
    if(set.contains(s1.charAt(i)))
    set.remove(s1.charAt(i));
    else
    set.add(s1.charAt(i));

    if(set.contains(s2.charAt(i)))
    set.remove(s2.charAt(i));
    else
    set.add(s2.charAt(i));
    }

    return (set.size() == 0);
    }

    • adilansari

      Oh i just figured it out, it will fail for:
      String s1= “aapp”;
      String s2= “sskk”;

  • Jeyrs

    Using bit vectors:

    public boolean anagram(String a, String b){

    int v = 0, v2=0;//bit vectors

    if(a.length() != b.length()) return false;

    for(int i = 0; i < a.length() ; i++)

    v |= (1 << (a.charAt(i) – 'a'));

    for(int i = 0; i < b.length(); i++)

    v2 |= (1 << (b.charAt(i) – 'a'));

    return v == v2;
    }

    • Galax

      You’re making the assumption there that your strings only consist of lower case letters. If that’s true then your approach is very efficient.

  • john thomas

    Sorting two strings does not take nLogn because they can be bucket sorted. This brings the complexity to O(n) with the trivial solution.

    • Galax

      You’re going to bucket sort all 1,114,112 code points that are assigned in Unicode 8.0? The hash table approach is suddenly looking much better.

  • kinshuk chandra

    Good solution. I also like the counting method as it is simple and efficient. Here is my post on same – http://k2code.blogspot.com/2011/09/anagram-checkerto-check-if-2-strings.html

  • Jangul Aslam

    Java Script way:

    // remove white-spaces (or any more characters u want to skip), sort the characters and compare for inputs
    var a = ‘Eleven plus two’.replace(/s+/g, ”).toLowerCase().split(”).sort().join(”);
    var b = ‘Twelve plus one’.replace(/s+/g, ”).toLowerCase().split(”).sort().join(”);
    a === b;

  • wen


    def anagram(str1, str2):
    str1, str2 = getLetters(str1), getLetters(str2)
    return collections.Counter(str1) == collections.Counter(str2)

  • Ahed Saed

    below solution will work:

    public boolean isAnagram(String a,String b){
    if(a.length() != b.length()) return false;

    int[] str = new int[a.length()];
    int sum=0;
    for(int i=0; i < a.length() ; i++){

    if ( a.toLowerCase().indexOf(b.toLowerCase().charAt(i)) == -1 ) {
    return false;
    }else{
    str[a.toLowerCase().indexOf(b.toLowerCase().charAt(i))] +=1;
    sum += str[a.toLowerCase().indexOf(b.toLowerCase().charAt(i))];
    }
    if ( b.toLowerCase().indexOf(a.toLowerCase().charAt(i)) == -1 ){
    return false;
    }
    }

    for(int i=0; i < a.length() ; i++){
    sum-= str[a.indexOf(a.charAt(i))];
    str[a.indexOf(a.charAt(i))]-=1;
    }
    System.out.println("SUM : " + sum);
    return ((sum == 0 )?true:false);
    }

  • jeffb

    Just hacking out loud, trying to use timeit module to measure the performance. It appears that the linear time solution takes longer. If this is true, maybe it’s because the strings are relatively short?

    {code}

    #!/usr/bin/env python

    import string

    import collections

    import timeit

    # http://www.ardendertat.com/2011/11/17/programming-interview-questions-16-anagram-strings/

    # solution #1

    def isAnagram1(str1, str2):

    return sorted(getLetters(str1))==sorted(getLetters(str2))

    def getLetters(text):

    return [char.lower() for char in text if char in string.letters]

    str1 = ‘Eleven plus two’

    str2 = ‘Twelve plus one’

    isAnagram1(str1, str2)

    # solution #2: linear time

    def isAnagram2(str1, str2):

    str1, str2 = getLetters(str1), getLetters(str2)

    if len(str1)!=len(str2):

    #print False

    return

    counts=collections.defaultdict(int)

    for letter in str1:

    counts[letter]+=1

    for letter in str2:

    counts[letter]-=1

    if counts[letter]<0:

    #print False

    return

    #print True

    return

    if __name__ == '__main__':

    str1 = 'Eleven plus two'

    str2 = 'Twelve plus one'

    print 'string1: ', str1

    print 'string2: ', str2

    print "nisAnagram1:"

    print(timeit.timeit("isAnagram1('Eleven plus two', 'Twelve plus one')", setup="from __main__ import isAnagram1"))

    print "nisAnagram2:"

    print(timeit.timeit("isAnagram2('Eleven plus two', 'Twelve plus one')", setup="from __main__ import isAnagram2"))

    # output:

    #isAnagram1:

    #12.7596487999

    #isAnagram2:

    #16.2050049305
    {code}

  • Anand

    this is not the right solution..
    try with:
    str1=”abcacbd 123″
    str2=”321 abcdcba”
    it will return True
    The code is not checking the order, so solution is not correct