Programming Interview Questions 6: Combine Two Strings

We are given 3 strings: str1, str2, and str3. Str3 is said to be a shuffle of str1 and str2 if it can be formed by interleaving the characters of str1 and str2 in a way that maintains the left to right ordering of the characters from each string. For example, given str1=”abc” and str2=”def”, str3=”dabecf” is a valid shuffle since it preserves the character ordering of the two strings. So, given these 3 strings write a function that detects whether str3 is a valid shuffle of str1 and str2.

We will use recursion to solve the problem, but first we check whether the length of str1 plus str2 equals to the length of str3. If not, then str3 can’t be a valid shuffle since it contains extra characters, so we return false immediately. Recursion happens as follows. If the first characters of str1 and str3 are the same, then we’ll recurse with new str1 and str3 being all but first characters of the strings, and str2 will stay the same. If first characters of str2 and str3 are the same, then we’ll do the same thing with new str2 and str3 being all but first characters, and str1 the same. Now this is the same problem with shorter strings, so it’s very suitable for recursion. If neither str1′s nor str2′s first character equals str3′s first character, we return false. The base case of the recursion is, if any of the strings is empty then the concatenation of str1 and str2 should be equal to str3. Here is the python code:

def isShuffle(str1, str2, str3):
    if len(str1)+len(str2)!=len(str3):
        return False
 
    if not str1 or not str2 or not str3:
        if str1+str2==str3:
            return True
        else:
            return False
 
    if str1[0]!=str3[0] and str2[0]!=str3[0]:
        return False
 
    if str1[0]==str3[0] and isShuffle(str1[1:], str2, str3[1:]):
            return True
    if str2[0]==str3[0] and isShuffle(str1, str2[1:], str3[1:]):
            return True
 
    return False

The algorithm effectively uses recursion to solve a smaller instance of the same problem. However, the complexity is exponential. Since we don’t cache the evaluated results, we may try to evaluate the same input strings again and again. It becomes clear when we draw the recursion tree. We can also look at the recurrence relation and verify.

F_n = F_{n-1} + F_{n-1} = 2F_{n-1}

n is the number of characters in str3. There are two terms in the relation, one for each recursion described above. The recurrence relation is similar to Fibonacci numbers, and it’s exponential. So, we need to perform dynamic programming and cache the already evaluated results to avoid precomputation. Once we see that two input strings don’t produce a valid shuffle, we cache this information (if they do produce a valid shuffle then we’re done and return True, so no need to cache). In the beginning of the function we will check whether we evaluated the given input strings before trying to compute again.  If we did, then we won’t try again and immediately return False. Here is the code with caching:

def isShuffle2(str1, str2, str3, cache=set()):
    if (str1, str2) in cache:
        return False
 
    if len(str1)+len(str2)!=len(str3):
        return False
 
    if not str1 or not str2 or not str3:
        if str1+str2==str3:
            return True
        else:
            return False
 
    if str1[0]!=str3[0] and str2[0]!=str3[0]:
        return False
 
    if str1[0]==str3[0] and isShuffle2(str1[1:], str2, str3[1:], cache):
            return True
    if str2[0]==str3[0] and isShuffle2(str1, str2[1:], str3[1:], cache):
            return True
 
    cache.add( (str1, str2) )
 
    return False

The cache is a set where the key is the tuple of str1 and str2. We cache the values we already know that can’t produce a valid shuffle and check before trying again. The complexity of this solution is O(NM) where the N and M are the lengths of str1 and str2 respectively. So, from exponential we reduced the complexity to quadratic by using dynamic programming. This is the worst case complexity though, average case would be better.

This question demonstrates effective use of recursion and dynamic programming to achieve an efficient solution.

VN:F [1.9.11_1134]
Rating: 9.5/10 (2 votes cast)
Programming Interview Questions 6: Combine Two Strings, 9.5 out of 10 based on 2 ratings

This entry was posted in Programming Interview. Bookmark the permalink.

6 Responses to Programming Interview Questions 6: Combine Two Strings

  1. Mehmet says:

    Again a nice question and a good solution but I guess there is a better solution. We’ll use 3 indices, like merge sort, initially pointing the beginning of each strings. Lets say them i1,i2 and i3 respectively. We will iterate i3 from 0 to len(str3). In each iteration increment i1 if str1[i1]==str3[i3], or increment i2 if str2[i2] == str[i3]. If neither these two conditions satisfied then str3 is not a valid shuffling. We don’t need to check other characters in str1 and str2 because we shouldn’t see str1[i1+j] before str1[i1] or str2[i2+k] before str2[i2]. If iteration finishes then this is a valid shuffle. This gives us O(N+M) time complexity and O(1) space complexity. Also we didn’t use heap stack as well. In your solution, if strings are too long we may fill up heap stack.

    I like your interview question series. keep up writing :))

    • Arden says:

      What if both str1[i1]==str3[i3] and str2[i2] == str[i3]? Then we don’t know from which one to proceed because any of them may lead to a solution, so you should recurse from both, isn’t it?

  2. John Kings says:

    str3.remove(character set from str1)
    str2 == str3

    • Arden says:

      What if str1 and str2 contain similar or same characters. Then you’ll remove them from str3 as well, which is incorrect. For example, str1=”abc” str2=”bcd” str3=”abbccd”. After removing characters of str1 from str3 we’ll have str3=”d” which is not what we wanted.

  3. hero says:

    Hi Arden,

    I wonder if it would be possible to keep only the index number of failing string compared to original one in the cache instead of whole substring. For your example, it would cache the string tuple as {bc,f}, for some time rigth? Instead, would it be possible to cache {1,2} meaning that 1 is the substring of abc from the index 1 and so on.

    • Arden says:

      That’s a great idea, we can definitely do that. We will save so much space by caching the indexes instead of the whole substring itself. And we can easily implement it by slightly modifying the above algorithm. Thanks for the idea..

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">