Facebook Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
If there are 'n' papers in total, this problem can be solved in O(n) with space complexity of O(n). Note that, h-index can be between 0 to n. Say if the h-index is 10, this means, there has to be 10 papers with citation count >= 10. So if we can find out the number of papers with citations >=X for every X (and store it in an array C) where X ranges between 0 to n, then by scanning the count array C from the right to left, we can find the h-index at index i where i == C[i].
Pseudocode:
input array A of length n.
- init array C[0] to C[n] with 0
- foreach p in A, if p >= n, c[n]++; else c[p] +=1
- for i=n-1 to 0, c[i]=c[i]+c[i+1]
- for i=n to 0, if c[i] == i return i
Binary search for the first part as mentioned by @gnahzy:
public static void main(String[ ] args) {
int [ ] sortedCitation = {21, 17, 12, 2, 2, 1};
System.out.println("The h-index value is: " + getHindexFromSorted(sortedCitation));
}
private static int getHindexFromSorted(int[ ] citation) {
int low = 0; int high = citation.length - 1;
int idx = (low+high)/2;
while(low <= high) {
if(citation[idx] >= idx + 1) {
low = idx + 1;
}
else {
high = idx - 1;
}
idx = (low+high)/2;
}
return idx + 1;
}
@Cole's idea in Java:
private static int getHindexFromUnsorted(int[] citation) {
int[] hindex = new int[citation.length + 1];
for (int i = 0; i < citation.length; i++) {
if (citation[i] >= citation.length) {
hindex[citation.length]++;
} else {
hindex[citation[i]]++;
}
}
int count = 0;
for (int i = hindex.length - 1; i > 0; i--) {
if (i >= count) {
count += hindex[i];
}
}
return count;
}
1. we can solve this use binary search, which is O(logN).
2. implement @Cole 's idea in python, which is O(n)
def get_hindex_unsorted2(array):
# the trick is the h index can not be greater than either the size of the array
# or the max citation in the paper list.
max_hindex = min(len(array), max(array))
# citations[i] = the number of paper with citation being i
# for citation > max_hindex, we will put it into citations[max_hindex]
citations = [0] * (max_hindex + 1)
for x in array:
citations[min(x, max_hindex)] += 1
# paper count = the number of papers with citations greater than i
paper_count = 0
for i in range(max_hindex, -1, -1):
paper_count += citations[i]
if paper_count >= i:
# we find more than i papers with citations greater than i
return i
def get_hindex_sorted(array):
return get_hindex_helper(array, 0, len(array) - 1)
def get_hindex_helper(array, left, right):
assert left <= right
if left == right:
return min(len(array) - left, array[left])
mid = (left + right)//2
n = len(array) - mid
m = array[mid]
#there are n papers whose citations are at least m.
if n == m:
return n
elif m > n:
return get_hindex_helper(array, left, mid)
else: # n > m
return get_hindex_helper(array, mid + 1, right)
1. binary-search (O(log(n)). If citations[i] >= i then h >= i (if array's in descending order).
2. Here's a O(n) time & space solution in ruby. The trick is you can ignore citation-counts larger than n.
- Cole September 07, 2012