Programming Interview Questions 6: Combine Two Strings

We are given 3 strings: str1, str2, and str3. Str3 is said to be a shuffle of str1 and str2 if it can be formed by interleaving the characters of str1 and str2 in a way that maintains the left to right ordering of the characters from each string. For example, given str1=”abc” and str2=”def”, str3=”dabecf” is a valid shuffle since it preserves the character ordering of the two strings. So, given these 3 strings write a function that detects whether str3 is a valid shuffle of str1 and str2.

We will use recursion to solve the problem, but first we check whether the length of str1 plus str2 equals to the length of str3. If not, then str3 can’t be a valid shuffle since it contains extra characters, so we return false immediately. Recursion happens as follows. If the first characters of str1 and str3 are the same, then we’ll recurse with new str1 and str3 being all but first characters of the strings, and str2 will stay the same. If first characters of str2 and str3 are the same, then we’ll do the same thing with new str2 and str3 being all but first characters, and str1 the same. Now this is the same problem with shorter strings, so it’s very suitable for recursion. If neither str1′s nor str2′s first character equals str3′s first character, we return false. The base case of the recursion is, if any of the strings is empty then the concatenation of str1 and str2 should be equal to str3. Here is the python code:

def isShuffle(str1, str2, str3):
    if len(str1)+len(str2)!=len(str3):
        return False
 
    if not str1 or not str2 or not str3:
        if str1+str2==str3:
            return True
        else:
            return False
 
    if str1[0]!=str3[0] and str2[0]!=str3[0]:
        return False
 
    if str1[0]==str3[0] and isShuffle(str1[1:], str2, str3[1:]):
            return True
    if str2[0]==str3[0] and isShuffle(str1, str2[1:], str3[1:]):
            return True
 
    return False

The algorithm effectively uses recursion to solve a smaller instance of the same problem. However, the complexity is exponential. Since we don’t cache the evaluated results, we may try to evaluate the same input strings again and again. It becomes clear when we draw the recursion tree. We can also look at the recurrence relation and verify.

F_n = F_{n-1} + F_{n-1} = 2F_{n-1}

n is the number of characters in str3. There are two terms in the relation, one for each recursion described above. The recurrence relation is similar to Fibonacci numbers, and it’s exponential. So, we need to perform dynamic programming and cache the already evaluated results to avoid precomputation. Once we see that two input strings don’t produce a valid shuffle, we cache this information (if they do produce a valid shuffle then we’re done and return True, so no need to cache). In the beginning of the function we will check whether we evaluated the given input strings before trying to compute again.  If we did, then we won’t try again and immediately return False. Here is the code with caching:

def isShuffle2(str1, str2, str3, cache=set()):
    if (str1, str2) in cache:
        return False
 
    if len(str1)+len(str2)!=len(str3):
        return False
 
    if not str1 or not str2 or not str3:
        if str1+str2==str3:
            return True
        else:
            return False
 
    if str1[0]!=str3[0] and str2[0]!=str3[0]:
        return False
 
    if str1[0]==str3[0] and isShuffle2(str1[1:], str2, str3[1:], cache):
            return True
    if str2[0]==str3[0] and isShuffle2(str1, str2[1:], str3[1:], cache):
            return True
 
    cache.add( (str1, str2) )
 
    return False

The cache is a set where the key is the tuple of str1 and str2. We cache the values we already know that can’t produce a valid shuffle and check before trying again. The complexity of this solution is O(NM) where the N and M are the lengths of str1 and str2 respectively. So, from exponential we reduced the complexity to quadratic by using dynamic programming. This is the worst case complexity though, average case would be better.

This question demonstrates effective use of recursion and dynamic programming to achieve an efficient solution.

VN:F [1.9.22_1171]
Rating: 7.3/10 (12 votes cast)
Programming Interview Questions 6: Combine Two Strings, 7.3 out of 10 based on 12 ratings

This entry was posted in Programming Interview. Bookmark the permalink.
  • Mehmet

    Again a nice question and a good solution but I guess there is a better solution. We’ll use 3 indices, like merge sort, initially pointing the beginning of each strings. Lets say them i1,i2 and i3 respectively. We will iterate i3 from 0 to len(str3). In each iteration increment i1 if str1[i1]==str3[i3], or increment i2 if str2[i2] == str[i3]. If neither these two conditions satisfied then str3 is not a valid shuffling. We don’t need to check other characters in str1 and str2 because we shouldn’t see str1[i1+j] before str1[i1] or str2[i2+k] before str2[i2]. If iteration finishes then this is a valid shuffle. This gives us O(N+M) time complexity and O(1) space complexity. Also we didn’t use heap stack as well. In your solution, if strings are too long we may fill up heap stack.

    I like your interview question series. keep up writing :))

    • Arden

      What if both str1[i1]==str3[i3] and str2[i2] == str[i3]? Then we don’t know from which one to proceed because any of them may lead to a solution, so you should recurse from both, isn’t it?

      • Navin

        Why need recursion ??
        This problem can be solved if we start matching the characters from the end of each string .
        Am I right??
        Please check .

        • melih

          Let me explain the situation instead of Arden. If the strings do use different characters the solution would be easy and linear with a time complexity of O(N+M) and space complexity of O(1). But this is the simple solution. You have to pass the test with the strings below and the simple algorithm will fail on that.

          str1 = “aa”
          str2 = “aaf”
          str3 = “aafaa”

          In that case if we select the first two characters from str1 it will fail. That’s why we have to use recursion. Actually it is like a Non-Deterministic Finite State Machine problem. So the simple solution that you mentioned below will fail for corner cases.

          def isCorrectShuffle(str1,str2,str3):
          iter1,iter2,iter3 = (0,0,0)
          for iter3 in xrange(len(str3)):
          if iter1 < len(str1) and str3[iter3] == str1[iter1]:
          iter1 += 1
          elif iter2 < len(str2) and str3[iter3] == str2[iter2]:
          iter2 += 1
          else:
          return False;
          return (iter1 == len(str1)) and (iter2 == len(str2))

      • anon

        Even you choose str1 over str2 when (str1[i1]==str3[i3] and str2[i2] == str[i3]). Also it does not have exponential properties, as both str2 and str1 calls will never happen. One or the other will be chosen (both paths are never followed as with fib). I think either way it is linear with the recursive being a bit heavier as usual (but more elegant of course.)

        Great series. Thanks.

      • sonesh

        we don’t have to worry about that case, because as both the current char. are equal , choose any one of them, and proceed…!!!

  • John Kings

    str3.remove(character set from str1)
    str2 == str3

    • Arden

      What if str1 and str2 contain similar or same characters. Then you’ll remove them from str3 as well, which is incorrect. For example, str1=”abc” str2=”bcd” str3=”abbccd”. After removing characters of str1 from str3 we’ll have str3=”d” which is not what we wanted.

      • Chandra

        What if we remove first occurance of each char in str1 in str3?

        • Tung

          Hi Arden, thanks for an interesting question. I also have the same idea as Chandra’s. Let’s follow your example:
          str1=”abc”, str2=”bcd”, str3=”abbccd”
          The idea is we will remove str1:”abc” from str3 and what’s left after that should be the same as str2( after trimming the spaces)

          int currentLookUp=0;
          //remove str1 from str3
          for(int i=0;i<str3.length();i++)
          {
          if(str3[i]==str1[currentLookUp])
          {
          str3[i]='';
          if (currentLookUp==str1.length()-1) break;//done remove
          else
          {currentLookUp++;//point to the next char in str1}
          }
          // if there's still character in str1 that are not found in str3
          if (currentLookUp <str1.length()-1) return false;
          if (str3.trim()!=str2) return false;
          else return true;

  • hero

    Hi Arden,

    I wonder if it would be possible to keep only the index number of failing string compared to original one in the cache instead of whole substring. For your example, it would cache the string tuple as {bc,f}, for some time rigth? Instead, would it be possible to cache {1,2} meaning that 1 is the substring of abc from the index 1 and so on.

    • Arden

      That’s a great idea, we can definitely do that. We will save so much space by caching the indexes instead of the whole substring itself. And we can easily implement it by slightly modifying the above algorithm. Thanks for the idea..

  • Ashot Madatyan

    I think the s/c below solves the problem in O(n) time with no additional space requirements:
    ==========================================================
    /** Is the @s3 a valid shuffle of @s1 and @s2 with the preserved order. */
    bool valid_shufle(char *s1, char *s2, char *s3)
    {
    int l1, l2, l3;

    l1 = strlen(s1);
    l2 = strlen(s2);
    l3 = strlen(s3);

    if (!l1 || !l2 || !l3 || (l1 + l2 != l3))
    return false;

    /* Iterate over the @s3 and check whether the current char is either
    of the current char of any of the @s1 or @s2 strings */

    while(*s3){
    if (*s3 == *s1) {
    s1++;
    continue;
    }
    if (*s3 == *s2) {
    s2++;
    continue;
    }
    s3++;
    }

    /* If we are not at the end of both @s1 and @s2 then it was not a valid shuffle */
    if (! (*s1 == 0 && *s2 == 0))
    return false;

    return true;
    }

    • Arden

      Sorry for the late response. What if both *s1==*s3 and *s2==*s3? Then it’s not clear which pointer to increment (s1++ or s2++) because either one can lead to a solution. So we should use recursion and check both cases.

      • anon

        It doesnt matter if they are equal, one set will be exhausted and then the next will be used.

      • Ashot Madatyan

        Just increment whatever pointer matches, you do not need to worry about which one to use for comparison. The whole idea is whether you arrive at the end of both strings. I hope the inlined comments are quite self-descriptive.

        • Anonymous

          Ashot,

          The author didn’t come up with an elaborate solution to solve a corner case that didn’t exist.

          Careful that arrogance and/or vanity does not trump common sense. In other words, assume your wrong until you have overwhelming evidence to the contrary.

          So yes, it matters if the characters are the same. A simple case proves this.

          Suppose s1 = “dude” and s2 = “dog” and the merged string is “dodudeg”

          If you increment the first d in s1, then the next ‘o’ to match in s3 won’t match in s1 since it is pointing to ‘u’ nor s2 since it is still pointing to ‘d’. So it thinks it will fail.

          Thus you won’t end up at the end of both strings, but “dodudeg” is valid shuffle, so your simplified algorithm won’t work.

          Like Einstein said, everything should be as made as simple as possible, but not simpler :)

          • yev

            like :)

  • santa

    Nice solution :)

  • Jim

    Hi Arden, nice site!
    I don’t agree with this solution though. There is no need for recursion or DP.
    It can be done in O(N).
    Just scan the str3 comparing with either str1 or str2 and return false immediatelly on inequality. Just use diferent pointers to index str1 and str2.
    I think the @Ashot Madatyan is on the spot.
    Additionally your solution also doesn’t care if *s1==*s3 and *s2==*s3 as I understand it.
    Am I missunderstanding a requirement?

  • Kewal

    String 1 : bcde
    String 2 : acfg
    String 3 : abcfcdeg
    String 3 is a valid string

    in case of a tie let the algorithm pic the character from the first string in other words string1++
    so the algorithm goes picks the character in this manner:
    string 2(a) , string 1(b) , string 1(c)….
    now the next character to be matched is ‘f ‘
    but it is cannot be since string1 is now pointing to d
    and string2 is pointing to c !

    same explanation can be used to prove this algorithm wrong : delete the characters in string 3 contained in string 1 and then match string 3 to string 2

    so the correct solution is the one arden posted !
    and here is where he checks for the both possibilities *s1==*s3 and *s2==*s3

    if str1[0]==str3[0] and isShuffle2(str1[1:], str2, str3[1:], cache):
    return True
    if str2[0]==str3[0] and isShuffle2(str1, str2[1:], str3[1:], cache):
    return True

  • Kewal

    hope it helps and correct me if i am wrong !

  • PythonGuy

    Note that the code here
    def isShuffle2(str1, str2, str3, cache=set()):

    is a bit dangerous. The default cache value, when modified by any call saves its default value for subsequent calls.

    So, the first time you call this function with two arguments (defaulting the cache), it will work as expected. If you call it again with two arguments, the value of cache is the value from the previous call. Almost never what you want. An accepted Python metaphore is to do

    def isShuffle2(str1, str2, str3, cache=None):
    if cache is None: cache = set()

  • Ani

    Why dont we try permutation of concatof(str1,str2) and check with the str3?
    I am not sure about it.I am just sharing.

  • vesadi

    I am not aware how the cache concept works in C\C++. Can someone please tell me\post the equivalent way for the above soln in C\C++. Appreciate your help!.

    • Sriks

      vesadi,

      My code is in c. Just make sure that the 3 indexes for the 3 strings are initialized to 0 before running the program.

  • Harsh

    Hi Arden,
    I do not understand why the algorithm only caches str1 and str2. How can you say that two strings do not produce ANY valid shuffles? Shouldnt the cache actually be (str1,str2,str3) since what it means is that str3 is NOT a valid shuffle of str1 and str2?

    Thanks,
    H

    • Harsh

      Nvm. Figured it out. Thanks!

  • Anonymous

    I think there is a linear solution

    def check_shuffle(s1, s2, s3):
    if len(s3) != len(s1) + len(s2):
    return False
    s1_pos = 0
    remains = ”
    for index, s in enumerate(s3):
    if s1_pos == len(s1):
    remains += s3[index:]
    break
    elif s1[s1_pos] == s:
    s1_pos += 1
    else:
    remains += s
    if s1_pos != len(s1):
    return False
    return remains == s2

  • Navin

    Hi Arden,
    the main problem comes when both the (str1[i]==str3[i] ) && (str2[i]==str3[i] )
    this can be the only exceptional case.
    for this we can have a special exceptional handler function.
    it will move to the first character which is different in both the strings.
    and then it will match that character and the corresponding index will be increased.

    void excepionHandler(int& i1,int& i2,int& i3)
    {
    int a = i1, b = i2, c = i3;
    while(str1[a] == str2[b])
    a++,b++,c++;

    if( str1[a] ==” || str2[b]==” )
    return; // any of the indexes can be increased in the case of exception

    if(str1[a] ==str3[c])
    {
    i1 = ++a;
    i3 =++c;
    }
    else if(str2[b] ==str3[c])
    {
    i2 = ++b;
    i3 =++c;
    }

    }

    }

    • Assaf

      Try s1=”aaf”, s2 = “aae”, s3 = “aaaeaf”

      a,b & c will each increment twice, at which point

      s1[a]=’f', s1[b]=’e', and s3[c] = ‘a’.

      I think your handler won’t do anything in that case (both ifs return false), and i don’t see what it could do. Cool idea :) though i suspect backtracking is unavoidable

  • Sriks

    This worked for me. Logic is more or less similar to many others who’ve
    discussed this already. I tried to keep it as simple as I could, without recursion & without needing to invoke a new function every time. Handles duplicate chars.

    // str1 & str2 are input strings str3 is the concatenated string
    int str1Len = strlen(str1);
    int str2Len = strlen(str2);

    while((str1Index < str1Len) || (str2Index < str2Len))
    {
    if (str1[str1Index] == str3[str3Index])
    {
    str1Index++;
    str3Index++;
    }
    else if(str2[str2Index] == str3[str3Index])
    {
    str2Index++;
    str3Index++;
    }
    else
    {
    result = MISMATCH;
    break;
    }
    }

    • http://twitter.com/0007gaurav Gaurav Aggarwal

      str1=”ade”
      str2=”abf”
      str3=”abadfe”

      your program will fail in this case.

      • Sri

        I believe there is an assumption that the strings are unique and different

  • sarah

    What about using regular expressions ?
    In used example , RE for string one is a[any char]b[any char]c[any char]
    and for string two d[any char]e[any char]f[any char] and match every string with string 3

  • Santosh

    Hi Arden,

    I had one question about the DP solution. If you are adding to local copy of cache, and returning function, it will not be reflected in the callee function. The cache is always empty. Can you check this?

    Thanks.

  • Ilya

    Here is my O(n) solution, am I right?

  • http://www.facebook.com/tabakozan Ozan Tabak

    Here’s my iterative solution written java which handles the case of (str1[i] == str2[i] == shuffledStr[i]) with O(N+M) time complexity and O(N) extra space complexity,

    public boolean isCorrectlyShuffled(String str1, String str2,String shuffledStr) {

    //handle null strings by converting them to empty strings
    str1 = (str1 == null) ? “” : str1;
    str2 = (str2 == null) ? “” : str2;
    shuffledStr = (shuffledStr == null) ? “” : shuffledStr;

    //eliminate some cases by checking string lengths
    //and empty-nonempty str1-str2 inputs

    if ((str1.length() + str2.length() != shuffledStr.length())
    || ((str1.isEmpty() || str2.isEmpty()) && !(str1 + str2).equals(shuffledStr)))
    return false;

    //use two index values for two strings
    int ind1 = 0;
    int ind2 = 0;

    //keep a hashmap for the case of (str1[i] == str2[i] == shuffledStr[i])
    Map charBackup = new HashMap();

    for (int i = 0; i < shuffledStr.length(); i++) {
    //count occurences of a char at current indexes of three strings
    int occurence = 0;
    char currChar = shuffledStr.charAt(i);

    if (ind1 < str1.length() && currChar == str1.charAt(ind1)) {
    ind1++;
    occurence++;
    }

    if (ind2 0) {
    occurence++;
    charBackup.put(currChar, –backupLimit);
    }

    //if a char occured more than once, add the char to the hashmap
    //to be used later as a backup
    if (occurence == 2) charBackup.put(currChar, backupLimit + 1);

    else if (occurence == 0) return false;
    }

    return true;
    }

  • Lifeng Yin

    Why don’t you use the XOR algorithm in Find Missing Element? It can be solved in O(n)

    • İbrahim Aygül

      Order of the letters should be preserved in the third string. XOR cannot detect the order. Although, you can detect if given third string is a permutation of the given two strings with XOR algorithm.

  • İbrahim Aygül
    • Melih KARAMAN

      This one includes both:


      def isCorrectShuffle(str1,str2,str3):
      iter1,iter2,iter3 = (0,0,0)
      for iter3 in xrange(len(str3)):
      if iter1 < len(str1) and iter2 < len(str2) and
      str3[iter3] == str1[iter1] and str3[iter3] == str2[iter2]:
      return isCorrectShuffle(str1[iter1+1:],str2[iter2:],str3[iter3+1:])
      or
      isCorrectShuffle(str1[iter1:],str2[iter2+1:],str3[iter3+1:])
      if iter1 < len(str1) and str3[iter3] == str1[iter1]:
      iter1 += 1
      elif iter2 < len(str2) and str3[iter3] == str2[iter2]:
      iter2 += 1
      else:
      return False;
      return (iter1 == len(str1)) and (iter2 == len(str2))

  • Utsav Pandey

    Here is a simple O(n) solution. Recursion is overkill for this problem, unless explicitly asked for.

    • Assaf

      Ustav-
      Try Your algorithm with input-
      s1 = “ab”
      s2 = “ae”
      s3 = “aeab”.

      s3 is a reshuffle of s1 and s2.
      but your algorithm will return false, because with the first char (‘a’) you increment s1′s index instead of s2, and in the next iteration you have -
      s1[i]=’b’
      s2[i]=’a’
      s3[i]=’e’

      In such a case you’d need to backtrack

    • Ram

      Brilliant solution ! Thanks !

    • Ronak

      char *a = “ab”;

      char *b = “ae”;

      char *c = “aeab”;

    • vickychijwani

      Ideally, an interviewer would handle this question in this way: “Hmm, let’s see, oh you’ve got an O(N) solution! That’s great! But wait, it won’t work if A and B share common characters, right? How would you solve that?”, which should prompt you to jump to the recursive / DP solution.

  • Jeremy

    When your being tested do you need to add comments?

  • Gaurav Nanda

    I have another intelligent solution, which recurses only if required.

    def checkShuffle(s1, s2, s3):
    if len(s1) == len(s2) == len(s3) == 0:
    return True

    x = y = 0
    for ch in s3:
    #print ch
    if (x < len(s1) and y < len(s2) and s1[x] == s2[y] == ch):
    r1 = checkShuffle(s1[1:], s2, s3[1:])
    if r1 == True:
    return True
    r2 = checkShuffle(s1, s2[1:], s3[1:])
    if r2 == True:
    return True
    return False

    elif (x < len(s1) and s1[x] == ch):
    x += 1
    elif (y < len(s2) and s2[y] == ch):
    y += 1
    else:
    return False

  • http://ahmetalpbalkan.com/ Ahmet Alp Balkan

    if str1[0]!=str3[0] and str2[0]!=str3[0]:
    return False

    is actually redundant since recursion never happens, the return False at the end provides same functionality.

  • Evren

    How about this?

    boolean isValidShuffle(String str1, String str2, String str3) {

    for (char ch : str3.toCharArray()) {
    int index1 = str1.indexOf(ch);
    int index2 = str2.indexOf(ch);
    if (index1 != -1) {
    str1 = str1.substring(0, index1)
    + ((index1 == str1.length()) ? “”
    : str1.substring(index1 + 1, str1.length()));
    } else if (index2 != -1) {
    str2 = str2.substring(0, index2)
    + ((index2 == str2.length()) ? “”
    : str2.substring(index2 + 1, str2.length()));
    } else {
    return false;
    }
    }
    return str1.length() == 0 && str2.length() == 0;
    }

  • Sorrowfull Blinger

    Can be done with O(N) … Scan all 3 strings , at any point if the character at str3 does not match with either str1 ,str2 return false , else keep incrementing the counter corresponding to the matching string . Since it is told the ordering of characters in individual strings is maintained this should work … Let me know if it fails for any case