# Programming Interview Questions 1: Array Pair Sum

Once again it’s the college recruiting season of the year and tech companies started the interview process for full time and internship positions. I had many interviews last year these days for a summer internship. Eventually I was an intern at Microsoft Bing, and will be joining there full time next summer. I won’t have any interviews this year, but since most of my friends are actively preparing for them nowadays, I thought it would be useful to share some good quality interview questions and provide my solutions. I come across this particular question pretty often recently: Given an integer array, output all pairs that sum up to a specific value k.

Let’s say the array is of size N. The naive way to solve the problem, for each element checking whether k-element is present in the array, which is O(N^2). This is of course far from optimal and you might not want to mention it during an interview as well. A more efficient solution would be to sort the array and having two pointers to scan the array from the beginning and the end at the same time. If the sum of the values in left and right pointers equals to k, we output the pair. If the sum is less than k then we advance the left pointer, else if the sum is greater than k we decrement the right pointer, until both pointers meet at some part of the array. The complexity of this solution is O(NlogN) due to sorting. Here is the Python code:

```def pairSum1(arr, k): if len(arr)<2: return arr.sort() left, right = (0, len(arr)-1) while left<right: currentSum=arr[left]+arr[right] if currentSum==k: print arr[left], arr[right] left+=1 #or right-=1 elif currentSum<k: left+=1 else: right-=1```

Most of the array based interview questions can be solved in O(NlogN) once we sort the input array. However, interviewers would generally be expecting linear time solutions. So let’s find a more optimal O(N) solution. But first we should clarify a detail with the interviewer, what if there is more than one copy of the same pair, do we output it twice? For example the array is [1, 1, 2, 3, 4] and the desired sum is 4. Should we output the pair (1, 3) twice or just once? Also do we output the reverse of a pair, meaning both (3, 1) and (1, 3)? Let’s keep the output as short as possible and print each pair only once. So, we will output only one copy of (1, 3). Also note that we shouldn’t output (2, 2) because it’s not a pair of two distinct elements.

The O(N) algorithm uses the set data structure. We perform a linear pass from the beginning and for each element we check whether k-element is in the set of seen numbers. If it is, then we found a pair of sum k and add it to the output. If not, this element doesn’t belong to a pair yet, and we add it to the set of seen elements. The algorithm is really simple once we figure out using a set. The complexity is O(N) because we do a single linear scan of the array, and for each element we just check whether the corresponding number to form a pair is in the set or add the current element to the set. Insert and find operations of a set are both average O(1), so the algorithm is O(N) in total. Here is the code in full detail:

```def pairSum2(arr, k): if len(arr)<2: return seen=set() output=set() for num in arr: target=k-num if target not in seen: seen.add(num) else: output.add( (min(num, target), max(num, target)) ) print '\n'.join( map(str, list(output)) )```
VN:F [1.9.22_1171]
Programming Interview Questions 1: Array Pair Sum, 9.0 out of 10 based on 126 ratings
This entry was posted in Programming Interview. Bookmark the permalink.
• George

I came up to htable solution in the first place. The complexity is O(n+m) where n is input size, m amount of htable keys. Which is linear.
Keep your work going. Nice start.

• vs

Arden,

A great solution to a problem that’s seen on many interview routes! Well done! Again, I appreciate the way you present the least optimal solutions first and slowly lead towards the one that’s optimal. This is a great interview strategy too. Very nice!

• http://ahmetalpbalkan.com Ahmet Alp Balkan

@George I think complexity is `O(n * n/m)` where m is # keys and n is # elements. Assume m=1, now you have n^2 right?

@Arden, I think Python “set” is not always O(1) on find and insert as documented here. http://wiki.python.org/moin/TimeComplexity If you can somehow instantiate the set with specifying number of keys then you can choose m=n and achieve worst case O(1).

However if you just instantiate it as `set()` and do not avoid duplicate pairs and assume (x,y)!=(y,x) then underlying Python “s”et implementation needs to do bucketing which can lead to O(n) for find in worst case, as documented.

I think currently not possible to specify # of keys in hash table that is staying under set implementation. Python can be problematic, however Java also maintains hash table with “load factor”. We should definitely implement our own hash table… :) What do you think?

• Arden

You’re totally right Ahmet. As the load factor of the set increases, the worst case complexity of a single operation becomes linear. But I would assume that after a certain load factor python would resize the set by doubling its size. So, the average time for an operation would still be amortized O(1), but still for some elements it can be O(N) in the worst case as you said. However, during an interview I suppose it’s safe to assume O(1) for operations on sets and hashtables.

Implementing our own hashtable is a great idea. In my web search course last semester, I remember searching for a hashtable implementation where you can give the size as hint during construction, so that it would perform less resize operations. Because I already knew that I’ll insert millions of elements while implementing a search engine. I think the default size of a dictionary in python is 8, and the load factor threshold for resizing is 2/3. The size is multiplied by 4 during resizing unless the hashtable already big (50,000), otherwise it doubles the size.

• Achal

@Arden
If I am using C++ STL Set ,
1. Insert takes logarithmic time , but it is amortized constant.
2. Find takes logarithmic time.

So I think it would be O(nlogn) rather than O(n)

• Arden

The worst case complexity of find in set can be as bad as O(N) as Ahmet mentioned above. But I think it’s safe to use the average case constant complexity for sets and hashtables during an interview, by mentioning the worst case behavior. To be technically precise I should write Omega(N) since big-O is the worst case bound, but these articles are intended to focus more on common interview practices. But you’re right, the very worst case complexity using C++ STL set is O(NlogN). But I don’t think interviewers will object to O(N) as long as you mention the worst case, that’s my experience at least.

• Robin Gonzalez Valero

I was thinking… shouldn’t set always evaluate to a constant size? Since a set will always be made of unique characters and therefore would always be in big o of a constant. So it wouldn’t be the same as the input size (n),.

• Say

Hi Arden,
In the first solution, how do we decide to use left++ or right–. It might fail if there are duplicates in the array.
if we use left++ and input array is a = {-48,96,96,96} and k is 48, then it fails
But if we use right– for same input array it works fine.

• Arden

That’s a good point Say, thanks for mentioning. Right– will also fail for some cases, for example input array is a = {-48,-48,-48,96}. So we need additional mechanisms to deal with duplicates, but I omitted them to keep the code simple in the first solution. I address that issue in the second algorithm.

• Jinesh

Hi Arden,

I would like to suggest a slight modification for the first approach.

Instead of commenting the right-=1 in the first if condition (currentSum==k), keep it as a part of the code over there and change the last else condition to elseif currentSum>k: right-=1

I believe this should address the issues.
Please let me know if I am wrong. Thank you.

• Mahmoud

Hi Arden,

using the second algorithm, if you have an array of {1, 2, 5,3} and k=4
at 1: target = 4-1=3 (add 1 to set) set={1}
at 2: target = 4-2=2 (add 2 to set) set={1,2}
at 5: target = 4-5=-1 (add 5 to set) set={1,2,5}
at 3: target = 4-3=1 (add 3 to set 1) set={1,2,5,3}

how the logic will output (1,3) ??

• Arden

At the last iteration since target=1 is in the set we execute the else part. So, we don’t add 3 to the set, we append (1, 3) to the output.

• http://yssay.wordpress.com YS

in this case you can only create pair whose come is equal to value k. but the
question is finding the pair whose sum is upto K.
example suppose k=4
and our pair is (1,2) so 1+2=3 but this pair will not come in this algo.

• Moyed

Yes this statement needs modification. Target = K -num

• anon

Was asked this question in Microsoft interview.
Things to keep in mind :-
– duplicates possible
– overflow, underflow cases

Approach was to sort the array and then iterate over the array and for each item, do a binary search for k-arr[idx] on the remaining right part of the array.

• hita

how do you handle overflow, underflow ?

• Vishal Hemnani

Yes, binary search approach is a constant space but O(n logn) time.
HashTable solution is O(n) space and time.

• http://dotnetdevblog.blogspot.com Yash

void Main()
{
int[] inputArray = { 10,12,13,15,4,3,1,8,8};
Hashtable visitedHash = new Hashtable();
int desiredSum = 16;

foreach(var number in inputArray)
{
if(visitedHash.Contains(desiredSum – number))
{
String.Format(“Pair Found {0}-{1}”,number,desiredSum-number).Dump();
}
else
{
// Add it to the hash, Key will be the number, value will be { desiredSum – number }
}
}
}

• Vikram N

Good one!

By the way, what is .Dump(); ?

• Vikram N

Also, once we have found that visitedHash contains the ‘desiredSum – number’, I think, we need to remove that record from the visitedHash, don’t we?

Otherwise following will happen (with the existing code):
If the original array is {10, 6, 6, 6} and desiredSum = 16, then it will print following pairs: {10,6}, {10, 6}, {10, 6} instead of printing {10, 6} just once (I believe, in this case printing {10, 6} only once will be desired).

• Selçuk İlhan Aydı

This code does not work if you have 2, 2. You need to check it again.

• Zeus

//finding no. of pairs of numbers accounting to a sum = k in O(n) and triads in O(n^2)
#include

using namespace std;

void countpair(int *array,int i,int len,int &count2 , int k)

{

int j=len;

count2=0;

while(i<j)

{

if(array[i]+array[j] == k)

{

if(i==0 && j==len )

{

count2++;

cout<<array[i]<<" "<<array[j]<<endl;

i++;

j–;

continue;

}

else

{

if(array[i]==array[i-1] && array[j]==array[j+1])

{

i++;

j–;

continue;

}

count2++;

cout<<array[i]<<" "<<array[j]< k)

{

j–;

continue;

}

if(array[i]+array[j] < k)

{

i++;

continue;

}

}

}

void counttriad(int *array,int i,int len,int &count3,int k)

{

int count2=0;

while(i<len)

{

if(i!=0)

{

if(array[i]==array[i-1])

{

i++;

continue;

}

}

count2=0;

countpair(array,i+1,len,count2,k-array[i]);//fixing that the value at that index

//and searching for the concerned pair in th e right part of the array

count3+=count2;

i++;

if(count2!=0)

cout<<array[i]<<endl;

}

}

int main()

{

int count2=0,count3=0;

int array[]={1,1,2,2,3,3,6,7,7,8,8,9,9,10,10,11,11};

int len=sizeof(array)/sizeof(int);

int k=12;

countpair(array,0,len-1,count2,k);

//cout<<endl<<endl<<count2<<endl<<endl;

cout<<endl<<endl<<count3<<endl<<endl;

return 0;

}

• Fangfei

operation find() with set is not O(1), please considering the underlining design of set — RB tree, how can find() takes O(1) ?

• anon

For Java, the HashSet is not underlined by a RB Tree. That’ll be a TreeSet.

• vickychijwani

sets with O(1) average-case lookups can be implemented using hashtables. See http://stackoverflow.com/a/3949795/504611 for details.

• Anon

Your analysis of pairSum2 seems to assume that checking if a number is ‘in seen’ can be done in constant time.

• vickychijwani

That’s what I thought too. I think the author means to use a *hashtable* (indeed, Python sets are powered by hashtables: http://stackoverflow.com/a/3949795/504611) for membership checks, so the assumption boils down to this: the data will be non-pathological for the hash function used in this case.

• Arthur_Camara

Sets, usualy, are implemented as hashtables. If you are using C++, make sure to use unordered_set from C++11, since set is actually a tree, with lookup time of logn.

• Saurabh

Arden,
Thanks for taking out time to post all these immensely helpful posts.

Speaking about the last solution presented, I think the expectation is to see how you can come up with the answers yourself instead of using any ready-made solutions, the Java API provides plethora of Data Structures and if we use powers of these sophisticated Data Structures then there is nothing left in the question. I think you’re essentially diluting the problem by using a Set. How different is the most effective solution from simply putting all
the numbers into a HashSet and then checking if K-Num is present in the Set in O(1)
time for every Num in the array in O(n) time, hence resulting into O(n)
in the end? I as an interviewer won’t be impressed with this answer. Using two pointers, one at the start and one at the end showcases more intelligence and I would appreciate that answer more. Just my thoughts and perspectives. Awesome blog though, thank you again.

Thanks,
Saurabh

• ThunderWiring

I got a suggestion: the 2 numbers in the pair are ofcourse less than K, so make an array of K elements, use bucket sort and find your pairs like in your solution with n*logn, except it would be O(n) now.

• andyrekhi

arr = [5, 6, 4, 7, 3, 4, 5, 6, 3, 1, 2, 4]
k = 6

pairSum2 doesn’t work for the mentioned array

• Juliana Louback

A suggestion that runs in O(n + m) time for n = size of vector, m = the number of the target sum. It does however assume no negative integers:
1. For a target number m, allocate a boolean vector Pairs of size m+1. (all false)
2. Iterate through array of numbers; if the current number is less or equal to m, set Pairs[current number] = true; for example, say m=7, if you get 6, set Pairs[6] to be true. (This takes time n)
3. for i=0 to half of the Pairs array: if Pairs[i] AND Pairs[m-i] are true, output (i, m-i).
A benefit is that this works with an unordered set. Does that make sense?

• Oumar Karaga

def sum_pairs(array, l):

new_list = []

for i in range(0, len(array)):

for j in range(i+1, len(array)):

if array[i] + array[j] == l:

t = (array[i], array[j])

new_list.append(t)

return new_list

• Oumar Karaga

hey guys I believe this code is also valid, any comments?

• Šimon

A good to deal with duplicates (in case you are following the hashtable approach) is not to insert them at all and just check whether the duplicate number is not k/2 (which is the only doplicate that matters).

• Architect

``` //Java 8 Streams & Lambdas solution: List numberList = Arrays.asList(1,1,2,3,4); int k = 4;```

``` numberList.stream() .map(s -> k - s) .filter(s -> numberList.contains(s)) .map(s -> new String("(" + Math.min(s, k-s) + "," + Math.max(s, k-s) + ")")) .distinct() .forEach(s -> System.out.println(s)); ```

• magicsign

“Also note that we shouldn’t output (2, 2) because it’s not a pair of two distinct elements.” – You didn’t specify at the beginning that this is a requirement