Programming Interview Questions 1: Array Pair Sum

Once again it’s the college recruiting season of the year and tech companies started the interview process for full time and internship positions. I had many interviews last year these days for a summer internship. Eventually I was an intern at Microsoft Bing, and will be joining there full time next summer. I won’t have any interviews this year, but since most of my friends are actively preparing for them nowadays, I thought it would be useful to share some good quality interview questions and provide my solutions. I come across this particular question pretty often recently: Given an integer array, output all pairs that sum up to a specific value k.

Let’s say the array is of size N. The naive way to solve the problem, for each element checking whether k-element is present in the array, which is O(N^2). This is of course far from optimal and you might not want to mention it during an interview as well. A more efficient solution would be to sort the array and having two pointers to scan the array from the beginning and the end at the same time. If the sum of the values in left and right pointers equals to k, we output the pair. If the sum is less than k then we advance the left pointer, else if the sum is greater than k we decrement the right pointer, until both pointers meet at some part of the array. The complexity of this solution is O(NlogN) due to sorting. Here is the Python code:

def pairSum1(arr, k):
    if len(arr)<2:
        return
    arr.sort()
    left, right = (0, len(arr)-1)
    while left<right:
        currentSum=arr[left]+arr[right]
        if currentSum==k:
            print arr[left], arr[right]
            left+=1 #or right-=1
        elif currentSum<k:
            left+=1
        else:
            right-=1

Most of the array based interview questions can be solved in O(NlogN) once we sort the input array. However, interviewers would generally be expecting linear time solutions. So let’s find a more optimal O(N) solution. But first we should clarify a detail with the interviewer, what if there is more than one copy of the same pair, do we output it twice? For example the array is [1, 1, 2, 3, 4] and the desired sum is 4. Should we output the pair (1, 3) twice or just once? Also do we output the reverse of a pair, meaning both (3, 1) and (1, 3)? Let’s keep the output as short as possible and print each pair only once. So, we will output only one copy of (1, 3). Also note that we shouldn’t output (2, 2) because it’s not a pair of two distinct elements.

The O(N) algorithm uses the set data structure. We perform a linear pass from the beginning and for each element we check whether k-element is in the set of seen numbers. If it is, then we found a pair of sum k and add it to the output. If not, this element doesn’t belong to a pair yet, and we add it to the set of seen elements. The algorithm is really simple once we figure out using a set. The complexity is O(N) because we do a single linear scan of the array, and for each element we just check whether the corresponding number to form a pair is in the set or add the current element to the set. Insert and find operations of a set are both average O(1), so the algorithm is O(N) in total. Here is the code in full detail:

def pairSum2(arr, k):
    if len(arr)<2:
        return
    seen=set()
    output=set()
    for num in arr:
        target=k-num
        if target not in seen:
            seen.add(num)
        else:
            output.add( (min(num, target), max(num, target)) )
    print '\n'.join( map(str, list(output)) )
VN:F [1.9.22_1171]
Rating: 9.2/10 (73 votes cast)
Programming Interview Questions 1: Array Pair Sum, 9.2 out of 10 based on 73 ratings

This entry was posted in Programming Interview. Bookmark the permalink.
  • George

    I came up to htable solution in the first place. The complexity is O(n+m) where n is input size, m amount of htable keys. Which is linear.
    Keep your work going. Nice start.

  • vs

    Arden,

    A great solution to a problem that’s seen on many interview routes! Well done! Again, I appreciate the way you present the least optimal solutions first and slowly lead towards the one that’s optimal. This is a great interview strategy too. Very nice!

  • http://ahmetalpbalkan.com Ahmet Alp Balkan

    @George I think complexity is O(n * n/m) where m is # keys and n is # elements. Assume m=1, now you have n^2 right?

    @Arden, I think Python “set” is not always O(1) on find and insert as documented here. http://wiki.python.org/moin/TimeComplexity If you can somehow instantiate the set with specifying number of keys then you can choose m=n and achieve worst case O(1).

    However if you just instantiate it as set() and do not avoid duplicate pairs and assume (x,y)!=(y,x) then underlying Python “s”et implementation needs to do bucketing which can lead to O(n) for find in worst case, as documented.

    I think currently not possible to specify # of keys in hash table that is staying under set implementation. Python can be problematic, however Java also maintains hash table with “load factor”. We should definitely implement our own hash table… :) What do you think?

    • Arden

      You’re totally right Ahmet. As the load factor of the set increases, the worst case complexity of a single operation becomes linear. But I would assume that after a certain load factor python would resize the set by doubling its size. So, the average time for an operation would still be amortized O(1), but still for some elements it can be O(N) in the worst case as you said. However, during an interview I suppose it’s safe to assume O(1) for operations on sets and hashtables.

      Implementing our own hashtable is a great idea. In my web search course last semester, I remember searching for a hashtable implementation where you can give the size as hint during construction, so that it would perform less resize operations. Because I already knew that I’ll insert millions of elements while implementing a search engine. I think the default size of a dictionary in python is 8, and the load factor threshold for resizing is 2/3. The size is multiplied by 4 during resizing unless the hashtable already big (50,000), otherwise it doubles the size.

      • Achal

        @Arden
        If I am using C++ STL Set ,
        1. Insert takes logarithmic time , but it is amortized constant.
        2. Find takes logarithmic time.

        So I think it would be O(nlogn) rather than O(n)

        • Arden

          The worst case complexity of find in set can be as bad as O(N) as Ahmet mentioned above. But I think it’s safe to use the average case constant complexity for sets and hashtables during an interview, by mentioning the worst case behavior. To be technically precise I should write Omega(N) since big-O is the worst case bound, but these articles are intended to focus more on common interview practices. But you’re right, the very worst case complexity using C++ STL set is O(NlogN). But I don’t think interviewers will object to O(N) as long as you mention the worst case, that’s my experience at least.

  • Say

    Hi Arden,
    In the first solution, how do we decide to use left++ or right–. It might fail if there are duplicates in the array.
    if we use left++ and input array is a = {-48,96,96,96} and k is 48, then it fails
    But if we use right– for same input array it works fine.

    • Arden

      That’s a good point Say, thanks for mentioning. Right– will also fail for some cases, for example input array is a = {-48,-48,-48,96}. So we need additional mechanisms to deal with duplicates, but I omitted them to keep the code simple in the first solution. I address that issue in the second algorithm.

      • Jinesh

        Hi Arden,

        I would like to suggest a slight modification for the first approach.

        Instead of commenting the right-=1 in the first if condition (currentSum==k), keep it as a part of the code over there and change the last else condition to elseif currentSum>k: right-=1

        I believe this should address the issues.
        Please let me know if I am wrong. Thank you.

  • Mahmoud

    Hi Arden,

    using the second algorithm, if you have an array of {1, 2, 5,3} and k=4
    at 1: target = 4-1=3 (add 1 to set) set={1}
    at 2: target = 4-2=2 (add 2 to set) set={1,2}
    at 5: target = 4-5=-1 (add 5 to set) set={1,2,5}
    at 3: target = 4-3=1 (add 3 to set 1) set={1,2,5,3}

    how the logic will output (1,3) ??

    • Arden

      At the last iteration since target=1 is in the set we execute the else part. So, we don’t add 3 to the set, we append (1, 3) to the output.

  • http://yssay.wordpress.com YS

    I have one doubt about this statement “target=k-num;”
    in this case you can only create pair whose come is equal to value k. but the
    question is finding the pair whose sum is upto K.
    example suppose k=4
    and our pair is (1,2) so 1+2=3 but this pair will not come in this algo.

    • Moyed

      Yes this statement needs modification. Target = K -num

  • anon

    Was asked this question in Microsoft interview.
    Things to keep in mind :-
    - duplicates possible
    - overflow, underflow cases

    Approach was to sort the array and then iterate over the array and for each item, do a binary search for k-arr[idx] on the remaining right part of the array.

    • hita

      how do you handle overflow, underflow ?

    • Vishal Hemnani

      Yes, binary search approach is a constant space but O(n logn) time.
      HashTable solution is O(n) space and time.

  • http://dotnetdevblog.blogspot.com Yash

    void Main()
    {
    int[] inputArray = { 10,12,13,15,4,3,1,8,8};
    Hashtable visitedHash = new Hashtable();
    int desiredSum = 16;

    foreach(var number in inputArray)
    {
    if(visitedHash.Contains(desiredSum – number))
    {
    String.Format(“Pair Found {0}-{1}”,number,desiredSum-number).Dump();
    }
    else
    {
    // Add it to the hash, Key will be the number, value will be { desiredSum – number }
    visitedHash.Add(number,(desiredSum – number));
    }
    }
    }

    • Vikram N

      Good one!

      By the way, what is .Dump(); ?

      • Vikram N

        Also, once we have found that visitedHash contains the ‘desiredSum – number’, I think, we need to remove that record from the visitedHash, don’t we?

        Otherwise following will happen (with the existing code):
        If the original array is {10, 6, 6, 6} and desiredSum = 16, then it will print following pairs: {10,6}, {10, 6}, {10, 6} instead of printing {10, 6} just once (I believe, in this case printing {10, 6} only once will be desired).

  • Zeus

    //finding no. of pairs of numbers accounting to a sum = k in O(n) and triads in O(n^2)
    #include

    using namespace std;

    void countpair(int *array,int i,int len,int &count2 , int k)

    {

    int j=len;

    count2=0;

    while(i<j)

    {

    if(array[i]+array[j] == k)

    {

    if(i==0 && j==len )

    {

    count2++;

    cout<<array[i]<<" "<<array[j]<<endl;

    i++;

    j–;

    continue;

    }

    else

    {

    if(array[i]==array[i-1] && array[j]==array[j+1])

    {

    i++;

    j–;

    continue;

    }

    count2++;

    cout<<array[i]<<" "<<array[j]< k)

    {

    j–;

    continue;

    }

    if(array[i]+array[j] < k)

    {

    i++;

    continue;

    }

    }

    }

    void counttriad(int *array,int i,int len,int &count3,int k)

    {

    int count2=0;

    while(i<len)

    {

    if(i!=0)

    {

    if(array[i]==array[i-1])

    {

    i++;

    continue;

    }

    }

    count2=0;

    countpair(array,i+1,len,count2,k-array[i]);//fixing that the value at that index

    //and searching for the concerned pair in th e right part of the array

    count3+=count2;

    i++;

    if(count2!=0)

    cout<<array[i]<<endl;

    }

    }

    int main()

    {

    int count2=0,count3=0;

    int array[]={1,1,2,2,3,3,6,7,7,8,8,9,9,10,10,11,11};

    int len=sizeof(array)/sizeof(int);

    int k=12;

    countpair(array,0,len-1,count2,k);

    //cout<<endl<<endl<<count2<<endl<<endl;

    counttriad(array,0,len-1,count3,k);

    cout<<endl<<endl<<count3<<endl<<endl;

    return 0;

    }

  • Fangfei

    operation find() with set is not O(1), please considering the underlining design of set — RB tree, how can find() takes O(1) ?

    • anon

      For Java, the HashSet is not underlined by a RB Tree. That’ll be a TreeSet.

    • vickychijwani

      sets with O(1) average-case lookups can be implemented using hashtables. See http://stackoverflow.com/a/3949795/504611 for details.

  • Anon

    Your analysis of pairSum2 seems to assume that checking if a number is ‘in seen’ can be done in constant time.

    • vickychijwani

      That’s what I thought too. I think the author means to use a *hashtable* (indeed, Python sets are powered by hashtables: http://stackoverflow.com/a/3949795/504611) for membership checks, so the assumption boils down to this: the data will be non-pathological for the hash function used in this case.