Google Interview Question
Principal Software EngineersCountry: United States
Interview Type: Phone Interview
Use a hash to count frequency of each unique number, store them in pairs like (unique_num, frequency).
Then sort the item-frequency pair list, in order of decreasing the frequency, using counting sort.
Complexity:
O(N) time for hashing + O(N+ maxFrequency) time for sorting. Note maxFrequency is <= N.
O(N) space for hashing and storing the frequency pair list.
Thus, overall is O(N) time, O(N) space.
Alternatively, we can consider this problem is to find K most frequent items. It can be done in O(N) as well, see this post for detailed discussion: capacode.com/?p=598
I don't understand how could counting sort work here. The link that you have shared talks about radix sort. I don't understand how radix sort would be a solution to the frequency item.
Hi guys:
When you have a list of pairs <item, freq>, the freq is an integer with value at most N.
So, you can sort the list based on freq, using counting sort, with time complexity of O(N+max Frequency).
Pseudo code for sorting the list of pairs <item, freq> is following:
Input: List[]
//counting/distributing:
for i = 1..N
f = List[i].freq;
maxF = max(maxF, f);
Counter[f].push_back(i); // put the index i into the counter f
//collecting:
for f = 1..maxF // for each freq f in increasing order from 1 to maxF
S = Counter[f].size()
for j = 0..S-1
index = Counter[f][j];
NewList.push_back(List[index]); // collect back the items in increasing order of freq
return NewList; //Note: this counting sort is not stable. To make it stable, one more array is needed.
//Complexity: O(N) space, O(N+maxF) time
Radix sort works almost the same, except that it works on each digit/character separately, from the least significant position to the most significant position.
(Each iteration of radix sort is a counting sort)
--ninhnn
Using the counting sort is quite a hack here, since the N (along with frequency) is going to infinity and therefore the memory for counting sort also needs to go. Not sure about this answer, I don't think it is practical to program it this way.
@demo: N can be big, but the frequency is less than N.
Use hash to compute frequency first. Then use counting sort on frequency.
Counting sort sorts the frequencies of the items, not the value of items. Thus, it takes O(N+maxFreq) time and space.
Note: we can't do better than linear time O(N), since we need to check for each and every item.
#include <iostream>
#include <cstdlib>
using namespace std;
struct Node {
Node *next;
int data, count;
Node(int d, int c = 1) : data(d), count(c) {}
};
Node* addNode(Node *list, int data) {
if (!list) {
return new Node(data);
}
Node *cur = list, *last = 0, *l=0;
while (cur) {
if (cur->data == data) {
cur->count++;
if (last && last->count < cur->count) {
int v = last->data; last->data = cur->data; cur->data = v;
v = last->count; last->count = cur->count; cur->count = v;
}
return list;
}
if (!last || last->count != cur->count) {
last = cur;
}
l = cur;
cur = cur->next;
}
l->next = new Node(data);
return list;
}
int main() {
int k = 2;
int a[] = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
Node *list = 0;
for (int i : a) {
list = addNode(list, i);
}
while (--k) {
cout << list->data << ", ";
list = list->next;
}
cout << list->data;
}
For each element, place into a max-heap and a hash map (from element value to location in heap). If the element already exists in the map, perform increase-key operation on the element's location in the heap.
This means updating 2 datastructures as you iterate over the elements, but at the gain of constant access time to find an element in the heap.
Iterating over all elements: O(n)
- Inserting to hash: O(1)
- Inserting/updating in heap: O(1) amortized over all elements
Getting k most frequent elements from heap: O(klgn)
Overall: O(n)
What about 4 ? 4 also comes up two times in the initial array.
Here you have a shorter and prettier code:
#include <iostream>
#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;
void read(int a[],int &n,int &k){
cin>>n>>k;
for(int i=1;i<=n;++i)
cin>>a[i];
}
bool cmp(pair<int,int> a,pair<int,int> b){
return a.first > b.first;
}
void solve(int a[],int n,int k){
unordered_map <int,int> mp;
unordered_map <int,int>::iterator it;
vector< pair<int,int> > tmp;
for(int i=1;i<=n;++i)
++mp[ a[i] ];
for(it=mp.begin();it!=mp.end();++it)
tmp.push_back(make_pair(it->second,it->first));
sort(tmp.begin(),tmp.end(),cmp);
for(int i=0;i<k;++i)
cout<<tmp[i].second<<endl;
}
int main() {
int a[100],n,k;
read(a,n,k);
solve(a,n,k);
return 0;
}
public class GetKFrequentElements {
private int[] _numbers;
/**
* @param _numbers
* @param _k
*/
public GetKFrequentElements(int[] numbers_) {
this._numbers = numbers_;
}
public int[] getKFrequentNumbers(int k_)
{
Preconditions.checkArgument(k_ <= _numbers.length, "k %s is larger or equal than number of list items",
k_);
// iterate and count into map
Map<Integer, Integer> numFreqMap = Maps.newHashMap();
for(int number : _numbers)
{
Integer freq = numFreqMap.get(number);
if(freq == null)
{
numFreqMap.put(number, 1);
}
else{
numFreqMap.put(number, freq.intValue() + 1); // can we use freq++ ?
}
}
Preconditions.checkArgument(k_ <= numFreqMap.keySet().size(), "k %s is larger or equal than number of distinct items",
k_);
// build kth min map
MinMaxPriorityQueue.Builder<Entry<Integer,Integer>> maxHeapBuilder = MinMaxPriorityQueue.orderedBy(
new Comparator<Entry<Integer, Integer>>(){
public int compare(Entry<Integer, Integer> o1, Entry<Integer, Integer> o2)
{
return ComparisonChain.start()
.compare(o2.getValue(), o1.getValue())
.result();
}
});
MinMaxPriorityQueue<Entry<Integer, Integer>> maxHeap = maxHeapBuilder.maximumSize(k_)
.create(numFreqMap.entrySet());
int[] ret = new int[k_];
for(int i = 0 ; i < k_; ++i)
{
Entry<Integer, Integer> last = maxHeap.poll();
if (last != null) {
ret[i] = last.getKey();
}
}
return ret;
}
/**
* @param args
*/
public static void main(String[] args) {
int a[] = { 0, 0, 100, 3, 5, 4, 6, 2, 100, 2, 100 };
GetKFrequentElements getk = new GetKFrequentElements(a);
System.out.println(Arrays.toString(getk.getKFrequentNumbers(5)));
}
}
public class GetKFrequentElements {
private int[] _numbers;
/**
* @param _numbers
* @param _k
*/
public GetKFrequentElements(int[] numbers_) {
this._numbers = numbers_;
}
public int[] getKFrequentNumbers(int k_)
{
Preconditions.checkArgument(k_ <= _numbers.length, "k %s is larger or equal than number of list items",
k_);
// iterate and count into map
Map<Integer, Integer> numFreqMap = Maps.newHashMap();
for(int number : _numbers)
{
Integer freq = numFreqMap.get(number);
if(freq == null)
{
numFreqMap.put(number, 1);
}
else{
numFreqMap.put(number, freq.intValue() + 1); // can we use freq++ ?
}
}
Preconditions.checkArgument(k_ <= numFreqMap.keySet().size(), "k %s is larger or equal than number of distinct items",
k_);
// build kth min map
MinMaxPriorityQueue.Builder<Entry<Integer,Integer>> maxHeapBuilder = MinMaxPriorityQueue.orderedBy(
new Comparator<Entry<Integer, Integer>>(){
public int compare(Entry<Integer, Integer> o1, Entry<Integer, Integer> o2)
{
return ComparisonChain.start()
.compare(o2.getValue(), o1.getValue())
.result();
}
});
MinMaxPriorityQueue<Entry<Integer, Integer>> maxHeap = maxHeapBuilder.maximumSize(k_)
.create(numFreqMap.entrySet());
int[] ret = new int[k_];
for(int i = 0 ; i < k_; ++i)
{
Entry<Integer, Integer> last = maxHeap.poll();
if (last != null) {
ret[i] = last.getKey();
}
}
return ret;
}
/**
* @param args
*/
public static void main(String[] args) {
int a[] = { 0, 0, 100, 3, 5, 4, 6, 2, 100, 2, 100 };
GetKFrequentElements getk = new GetKFrequentElements(a);
System.out.println(Arrays.toString(getk.getKFrequentNumbers(5)));
}
}
How about this? Wrote this in C#...
public static void OrderOfFrequency(int[] arr, int n)
{
Dictionary<int, int> dic = new Dictionary<int, int>();
for (int i = 0; i < arr.Length; i++)
{
if (dic.ContainsKey(arr[i]))
{
dic[arr[i]]++;
}
else
{
dic.Add(arr[i], 1);
}
}
for (int i = 0; i < n; i++)
{
var x = (from c in dic
select c).OrderByDescending(w => w.Value).ElementAt(i);
Console.Write(x.Key + " ");
if (dic.Count >= n)
continue;
else
break;
}
}
private static Node root;
private static Node[] max = new Node[2];
public static void main(String[] args) {
int[] arr = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int len = arr.length;
for (int i=0;i<len;i++) {
push(arr[i]);
}
System.out.println("["+max[0].key+","+max[1].key+"]");
}
private static void push(int i) {
root = push(root, i);
}
private static Node push(Node n, int i) {
if (n == null) return new Node(i, 1);
if (n.key > i || n.key < i) n.next = push(n.next, i);
else if (n.key == i) n.count++;
makeMaxArray(n);
return n;
}
private static void makeMaxArray(Node n) {
if (max[0] == null) max[0] = n;
if (n.count >= max[0].count) {
Node tmp = max[0];
max[0] = n;
max[1] = tmp;
}
}
private static class Node {
int key;
int count = 0;
Node next;
private Node(int i, int cnt) {
this.key = i;
this.count += cnt;
}
}
Try This:
a) Lets say we want to track top n numbers, init all top n to 0 with freq 0;
b) Do a linear Pass on array:
1) for a number if its >= the top n numbers you know so far , start matching it with your top n numbers,
1.a) it it matches any increase frq count and continue;
1.b) if its greater than a top n, insert it and shift other top n down
2) Print in required format.
package com.anmon.careercup;
import java.util.ArrayList;
import java.util.List;
public class TrackTopNNumbers {
class num
{
public int num;
public int freq;
};
public static void main(String[] args)
{
TrackTopNNumbers n = new TrackTopNNumbers();
int[] in = {22,22,33,33,2,1,3,4,5,6,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8};
List<num> top = n.getTopNumbers(in,5);
if(top != null)
{
for(num nn : top)
{
System.out.println("DEBUG: Num: " + nn.num + ", Freq:" + nn.freq);
}
}
}
private List<num> getTopNumbers(int[] a, int n)
{
List<num> top = new ArrayList<TrackTopNNumbers.num>();
for(int i = 0; i < n ; i++)
{
num nn = new num();
nn.num = 0;
nn.freq = 0;
top.add(nn);
}
for(int i = 0; i <a.length; i++)
{
for(int j = 0; j < n; j++)
{
if(a[i] > top.get(j).num)
{
num nnn = new num();
nnn.num = a[i];
nnn.freq =1;
num temp = top.get(j);
top.set(j, nnn);
moveDown(temp, top, j+1);
break;
}
else if(a[i] == top.get(j).num)
{
top.get(j).freq += 1;
break;
}
}
}
return top;
}
private void moveDown(num num, List<num> top, int j)
{
if(j < top.size() && num.num > top.get(j).num)
{
num temp = top.get(j);
num nnn = new num();
nnn.num = num.num;
nnn.freq = num.freq;
top.set(j ,nnn);
moveDown(temp, top, j+1);
}
}
}
import java.lang.* ;
import java.util.* ;
public class Int_frequency {
public static void frequency(int arr[] , int n)
{
HashMap<Integer , Integer> hm = new HashMap <Integer , Integer>() ;
for(int i = 0 ; i<arr.length ; i++)
{
if(hm.containsKey(arr[i]))
{
int k = hm.get(arr[i]);
hm.put(arr[i] , k+1);
}
else
hm.put(arr[i], 1);
}
Set<Map.Entry<Integer, Integer>> s = hm.entrySet();
ArrayList<Map.Entry<Integer , Integer>> al = new ArrayList(s);
Collections.sort(al , new Comparator<Map.Entry<Integer , Integer>>()
{
public int compare(Map.Entry<Integer , Integer> a , Map.Entry<Integer , Integer> b)
{
return b.getValue()- a.getValue();
}
}
);
if(n>al.size())
{
System.out.println("total unique numbers itself are less than the entered number");
return ;}
ArrayList<Map.Entry<Integer , Integer>> ll = new ArrayList(al.subList(0, n));
Iterator it = ll.iterator();
while(it.hasNext())
System.out.println(it.next());
}
public static void main(String args[])
{
int [] a = {1 , 2 , 3 , 3 , 4 , 4 , 3 , 6 , 7 , 8 , 3 , 5 , 3 };
frequency(a , 3);
}
}
import java.lang.* ;
import java.util.* ;
public class Int_frequency {
public static void frequency(int arr[] , int n)
{
HashMap<Integer , Integer> hm = new HashMap <Integer , Integer>() ;
for(int i = 0 ; i<arr.length ; i++)
{
if(hm.containsKey(arr[i]))
{
int k = hm.get(arr[i]);
hm.put(arr[i] , k+1);
}
else
hm.put(arr[i], 1);
}
Set<Map.Entry<Integer, Integer>> s = hm.entrySet();
ArrayList<Map.Entry<Integer , Integer>> al = new ArrayList(s);
Collections.sort(al , new Comparator<Map.Entry<Integer , Integer>>()
{
public int compare(Map.Entry<Integer , Integer> a , Map.Entry<Integer , Integer> b)
{
return b.getValue()- a.getValue();
}
}
);
if(n>al.size())
{
System.out.println("total unique numbers itself are less than the entered number");
return ;}
ArrayList<Map.Entry<Integer , Integer>> ll = new ArrayList(al.subList(0, n));
Iterator it = ll.iterator();
while(it.hasNext())
System.out.println(it.next());
}
public static void main(String args[])
{
int [] a = {1 , 2 , 3 , 3 , 4 , 4 , 3 , 6 , 7 , 8 , 3 , 5 , 3 };
frequency(a , 3);
}
}
import java.util.Vector;
public static void main(String[] args) {
int[]elements={2,7,7,2,9,9,7,9,9,9};
Vector v=frecuency(elements, 2);
System.out.println(v.toString());
}
public static Vector frecuency(int[] elements,int n)
{
//assume that the array is not empty
int[] array=elements;
quickSort(0, array.length-1, array);
int number=array[0];
int major=1;
int count=1;
int first=-1;
int second=-1;
for(int i=1;i<array.length;i++)
{
if(number==array[i])
count++;
else
{
if(count>=n)
{
if(major<count)
{
major=count;
second=first;
first=number;
number=array[i];
count=1;
}
else
{
second=number;
number=array[i];
count=1;
}
}
else
{
count=1;
number=array[i];
}
}
}
if(major<count)
{
second=first;
first=number;
}
Vector vector=new Vector(2);
vector.add(first);
vector.add(second);
return vector;
}
import java.util.Vector;
public static void main(String[] args)
{
int[]elements={2,7,7,2,9,9,7,9,9,9};
Vector v=frecuency(elements, 2);
System.out.println(v.toString());
}
public static Vector frecuency(int[] elements,int n)
{
//assume that the array is not empty
int[] array=elements;
quickSort(0, array.length-1, array);
int number=array[0];
int major=1;
int count=1;
int first=-1;
int second=-1;
for(int i=1;i<array.length;i++)
{
if(number==array[i])
count++;
else
{
if(count>=n)
{
if(major<count)
{
major=count;
second=first;
first=number;
number=array[i];
count=1;
}
else
{
second=number;
number=array[i];
count=1;
}
}
else
{
count=1;
number=array[i];
}
}
}
if(major<count)
{
second=first;
first=number;
}
Vector vector=new Vector(2);
vector.add(first);
vector.add(second);
return vector;
}
Java 8 Stream API allows us write awesome one-liner:
public static int[] getMostFrequent(int[] values, int limit) {
return Arrays.stream(values).mapToObj(Integer::valueOf)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream().sorted(
Comparator.<Map.Entry<Integer, Long>>comparingLong(Map.Entry::getValue).reversed()
.thenComparing(Map.Entry::getKey)
).limit(limit).map(Map.Entry::getKey).mapToInt(Integer::intValue).toArray();
}
Step-by-step:
1. Convert int[] to Stream<Integer>;
2. Construct Map<Integer, Long> with distinct input numbers as keys (Integer) and their frequency (Long) as values;
3. Convert Map<Integer, Long> to Stream<Map.Entry<Integer, Long>;
4. Sort stream members by frequency then by actual number (for total order);
5. Set limit from our input data;
6. Convert Stream<Integer> to int[] through IntStream;
### using Python 2.7
def sortFrequency (array, n):
dicti = {}
for i in array:
if (i in dicti):
dicti[i] += 1
else:
dicti [i] = 1
result = [(v,k) for k,v in dicti.items()]
result.sort (key = lambda tup: tup[0],reverse = True)
print ( [k for v,k in result[0:2]])
array = [0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100]
n = 2
sortFrequency (array, n)
# using Python 2.7
def sortFrequency (array, n):
dicti = {}
for i in array:
if (i in dicti):
dicti[i] += 1
else:
dicti [i] = 1
result = list( dicti.items())
result.sort (key = lambda tup: tup[1],reverse = True)
return [ k for k,v in result [0:n]]
array = [0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100]
n = 2
result = sortFrequency (array, n)
print (result)
Here is my Java Solution:
public class freqArray {
public static void main(String[] args) {
int[] lst = {0,100,2,3,5,0,100,2,6,2};
int count = 3;
HashMap<Integer, Integer> tbl = new HashMap<Integer, Integer>();
for(int i = 0; i< lst.length ; i++)
{
int f = 0;
if(tbl.containsKey(lst[i]))
{
f= tbl.get(lst[i]);
}
f++;
tbl.put(lst[i],f);
}
List<Integer> values = new ArrayList( tbl.values());
Collections.sort(values, Collections.reverseOrder());
Set mySet = new HashSet<Integer>();
for(int i = 0; i<count ;i++)
{
for(int key : tbl.keySet())
{
if(tbl.get(key)== values.get(i) && !mySet.contains(key))
{
mySet.add(key);
System.out.println(key);
break;
}
}
}
}
}
vector<int> firstKthNumbers(int A[], int n, int k)
{
assert(A != NULL && n > 0 && k > 0);
unordered_map<int, int> st;
for (int i = 0; i < n; ++i) ++st[A[i]];
int *cnt = new int[n + 1]; memset(cnt, -1, sizeof(int) * (n + 1));
for (auto &p : st) cnt[p.second] = p.first;
vector<int> ans;
for (int i = n; i >= 0 && k > 0; --i)
{
if (cnt[i] >= 0) { ans.push_back(cnt[i]); --k; }
}
return move(ans);
}
void id5186461135536128(int[] array, int n){
Map<Integer,Integer> map = new HashMap<Integer, Integer>();
int range = 0;
for(int i:array){
if(!map.containsKey(i))
map.put(i, 1);
else
map.put(i, map.get(i)+1);
range = Math.max(range, map.get(i));
}
int[] count = new int[range+1];
for(Map.Entry<Integer, Integer> entry:map.entrySet()){
count[entry.getValue()]++;
}
for(int i=1;i<count.length;i++){
count[i] += count[i-1];
}
Entry[] res = new Entry[map.size()];
for(Map.Entry<Integer, Integer> entry:map.entrySet()){
res[--count[entry.getValue()]] = entry;
}
for(int i=res.length-1;i>res.length-1-n;i--){
System.out.println(res[i].getKey());
}
}
public static void displayResult(Integer[] intArray, Integer N) {
Collections.sort(Arrays.asList(intArray));
Map<Integer, Integer> intMap = new HashMap<Integer, Integer>();
int prevElement = intArray[0];
int counter = 0;
int size = intArray.length;
for (int i=0; i< size; i++) {
if(intArray[i] == prevElement){
counter += 1;
} else {
intMap.put(intArray[i-1], counter);
prevElement = intArray[i];
counter = 1;
}
}
intMap.put(intArray[size-1], counter);
Set<Entry<Integer, Integer>> set = intMap.entrySet();
List<Entry<Integer, Integer>> list = new ArrayList<Entry<Integer, Integer>>(set);
Collections.sort(list, new Comparator<Map.Entry<Integer, Integer>>() {
public int compare( Map.Entry<Integer, Integer> o1, Map.Entry<Integer, Integer> o2 ) {
return (o2.getValue()).compareTo( o1.getValue() );
}
});
List<Integer> resultList = new ArrayList<Integer>();
for(Map.Entry<Integer, Integer> entry:list) {
resultList.add(entry.getKey());
}
if(N > resultList.size() ) {
System.out.println("Invalid value entered! ");
} else {
System.out.println("Frequency Map= "+intMap);
System.out.println("Result List: "+resultList.subList(0, N));
}
}
/**
Author : Tom Choi
Date : 09/10/2016
You are given an array of integers. With a given integer n,
print out n most frequency elements in the array.
Strategy
1. Read the array and map (number, frequency) pairs O(n)
2. Create a heap and insert each pair based on frequency O(nlogn)
3. Remove top n items from the heap O(n)
Overall runtime = O(nlogn)
*/
import java.util.*;
public class OrderOfFrequency{
public static void main(String[] args){
int[] arr = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int n = 2;
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
for(int i = 0; i < arr.length; i++){
if(map.containsKey(arr[i])){
map.put(arr[i], map.get(arr[i])+1);
}else{
map.put(arr[i], 1);
}
}
EntryHeap heap = new EntryHeap();
for(int key : map.keySet()){
Entry entry = new Entry(key, map.get(key));
heap.insert(entry);
}
ArrayList<Integer> result = new ArrayList<Integer>();
for(int i = 0; i < n; i++){
Entry removed = heap.remove();
if(removed == null){
break;
}else{
result.add(removed.data);
}
}
System.out.println(result);
}
}
class EntryHeap{
Entry[] heap;
int DEFAULT_SIZE = 100;
int size;
EntryHeap(){
heap = new Entry[DEFAULT_SIZE];
size = 0;
}
Entry remove(){
if(size == 0){
return null;
}
Entry removed = heap[0];
swap(0, --size);
heapDown();
return removed;
}
void heapDown(){
int parent = 0;
while(true){
int left = 2*parent+1;
if(left >= size){
break;
}
int max = left;
int right = left+1;
if(right < size){
if(heap[right].freq > heap[left].freq){
max = right;
}
}
if(heap[parent].freq < heap[max].freq){
swap(parent, max);
parent = max;
}else{
break;
}
}
}
void insert(Entry e){
if(size >= heap.length){
ensureCapacity();
}
if(size == 0){
heap[0] = e;
}else{
heap[size] = e;
}
heapUp();
size++;
}
void heapUp(){
int child = size;
int parent = (child-1)/2;
while(parent >= 0 && heap[child].freq > heap[parent].freq){
swap(child, parent);
child = parent;
parent = (child-1)/2;
}
}
/**
* double the size of the heap
*/
void ensureCapacity(){
Entry[] old = heap;
heap = new Entry[old.length * 2 + 1];
for(int i = 0; i < old.length; i++){
heap[i] = old[i];
}
}
/**
* print the heap
*/
void print(){
for(int i = 0; i < size; i++){
System.out.print("[" + heap[i].data + " - " + heap[i].freq + "] ");
}System.out.println();
}
/**
* swap two entries
*/
void swap(int i, int j){
Entry temp = heap[i];
heap[i] = heap[j];
heap[j] = temp;
}
}
class Entry{
int data;
int freq;
Entry(int data, int freq){
this.data = data;
this.freq = freq;
}
}
use a hashmap to count occurrences of a number and then do quickselect to pick K biggest counts.
O(n * k) time, O(n) space
public class Solution {
public static void main(String[] args) {
int a[] = { 0, 0, 100, 3, 5, 4, 6, 2, 100, 2, 7, 7, 7, 100 };
printArray(findNumHighestKFrequent(a, 8));
}
public static int[] findNumHighestKFrequent(int[] a, int k) {
if (a == null || a.length == 0 || k <= 0 || k > a.length) {
return null;
}
Map<Integer, Integer> numCount = new HashMap<>();
int num;
int count;
for (int i = 0; i < a.length; i++) {
num = a[i];
if (!numCount.containsKey(num)) {
numCount.put(num, 1);
} else {
count = numCount.get(num);
count++;
numCount.put(num, count);
}
}
NumCount[] numCountArr = new NumCount[numCount.size()];
int i = 0;
for (Map.Entry<Integer, Integer> entry : numCount.entrySet()) {
numCountArr[i] = new NumCount(entry.getKey(), entry.getValue());
i++;
}
HashSet<Integer> resultSet = new HashSet<Integer>();
int[] result = new int[k];
int number = -1, counted = 0;
for (int j = numCountArr.length - 1; j >= 0 && counted < k; j--) {
number = quickSelect(numCountArr, j, 0, j).num;
if (!resultSet.contains(number)) {
resultSet.add(number);
result[counted] = number;
counted++;
}
}
return result;
}
private static <E extends Comparable<? super E>> int partition(E[] arr,
int left, int right, int pivot) {
E pivotVal = arr[pivot];
swap(arr, pivot, right);
int storeIndex = left;
for (int i = left; i < right; i++) {
if (arr[i].compareTo(pivotVal) < 0) {
swap(arr, i, storeIndex);
storeIndex++;
}
}
swap(arr, right, storeIndex);
return storeIndex;
}
private static <E extends Comparable<? super E>> E quickSelect(E[] arr,
int n, int left, int right) {
Random rand = new Random();
while (right >= left) {
int pivotIndex = partition(arr, left, right,
rand.nextInt(right - left + 1) + left);
if (pivotIndex == n) {
return arr[pivotIndex];
} else if (pivotIndex < n) {
left = pivotIndex + 1;
} else {
right = pivotIndex - 1;
}
}
return null;
}
private static void swap(Object[] arr, int i1, int i2) {
if (i1 != i2) {
Object temp = arr[i1];
arr[i1] = arr[i2];
arr[i2] = temp;
}
}
private static void printArray(int[] ar) {
for (int n : ar) {
System.out.print(n + " ");
}
System.out.println("");
}
}
class NumCount implements Comparable<NumCount> {
int num;
int count;
public NumCount(int num, int count) {
this.num = num;
this.count = count;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
NumCount other = (NumCount) obj;
if (num != other.num)
return false;
return true;
}
@Override
public int compareTo(NumCount o) {
if (this.count < o.count) {
return -1;
} else if (this.count > o.count) {
return 1;
}
return 0;
}
}
You don't need to sort. Use quickselect O(n) to find Kth frequency. When pass one more time - O(n) - to get all larger frequencies.
Thanks for pointing it out, I've updated my solution using quickSelect instead of sorting the whole array.
I don't see how is this O(n) time. You call quickSelect k times, Each call is O(j) {j:n..n-k}, so the average time complexity is O(k*n), which is not bad for small k-s, but when k ~ n, then it's close to O(n^2), in which case sorting in O(n*log(n)) might be a better choice.
I might miss something though, maybe because we repeat calling quickSelect on the same array, that get's more and more "sorted" the number of swaps is decreasing in each call, though the number of comparsions don't IMHO.
Since you're selecting K elements (but not just the Kth element), the quicksort will have a complexity of O(n*k) as titok said. Is it beter than O(n*logn), well that depends on the value of k.
These are valid points, I agree the complexity is O(n*k), and if k is ~ n then it's better to just sort the array, perhaps if the K is >= n/2 | n/4 would be a limit to decide to sort or not? Notice also that for each round of quickselect it partially sorts the array and once we pick one max K we can reduce the size of the next partition to be searched by one, so each time it will target (n-1), (n-2), (n-3), (n-k) elements.
Since the frequency is at most N, we can use counting sort with O(N) time.
If we are required to output the items in order of their appearance, stable radix sort with O(N) time will do.
I have a question about this solution with QuickSelect algorithm. What if this question asks for top 3 frequency numbers, it might return [100, 0, 0], isn't it?
It won's return repeated values?
Why? CompareTo() only compares NumCount.count. There is no way for QuickSelect to tell the difference between [4], and [0]
I updated the solution to use an aux. HashSet which stores values already calculated, so it won't keep repeated values.
Using HashSet will just solve one issue by creating another. You may end up having [100, 0] for top 3 frequency numbers. The issue is that there is no way for QuickSelect to return a different value every time when the frequencies are the same.
One solution I can think of is to not only compare the count but also the num in NumCount.compareTo method.
Here is a solution using Java 8.
It has the minimal runtime of O(n).
First the input array is parsed and sorted into a Map (Number -> Frequency),
which is O(n), where n is the size of the input array.
Next the Map is copied to a MaxHeap. Building a MaxHeap just needs O(n), as we are sorting just to achieve a partial order. I hope Java's PriorityQueue.addAll does this!
n is in this case the size of the Map, which is smaller then the initial input array (now the numbers are shrinked down to the set of distinct numbers).
Next, we copy n numbers into the result array, where n is the n most frequent numbers.
This happens in O(log n).
Asymptotically O(n = inputArray.length) + O(n = map.size()) + O(log n = resulting numbers) is O(n).
Here is the code:
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.PriorityQueue;
/*
Given an unsorted array of natural numbers. Where numbers repeat in array. Out put numbers in the order of frequency. Where number of out put is passed as parameter.
For Ex:
Array -> [0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100]
n -> 2
Out put -> [100, 0] or [100, 2]
Since 100 is repeated 3 times and 0, 2 is repeated 2 times, out put up to 2 most frequent elements, program should out put either 100, 0 or 100, 2
*/
public class FreqNumbHeap{
public static int[] mostFrequent(int[] input, int n){
if(n<=0 || input.length == 0){
return null;
}
if(n > input.length){
throw new IllegalArgumentException("less numbers then n!");
}
//convert to map of Numbers->Frequency ( O(input.length) )
Map<Integer, Integer> numberFrequencyMap = new HashMap<Integer, Integer>(n);
for(int number : input){
numberFrequencyMap.compute(number, (k,v) -> v==null?1:v+1);
}
PriorityQueue<Entry<Integer,Integer>> maxHeap = new PriorityQueue<Entry<Integer,Integer>>(numberFrequencyMap.size(),
(e1, e2) -> {
int order = e1.getValue().compareTo(e2.getValue());
order = order*(-1);
return order;
});
//should only need O(f), where f is the count of distinct numbers
maxHeap.addAll(numberFrequencyMap.entrySet());
//build the result ( O (n) )
int[] result = maxHeap.stream()
.limit(n)
.mapToInt( entity -> entity.getKey())
.toArray();
return result;
}
public static void main(String[] args){
int[] input = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int n = 2;
int[] result = FreqNumbHeap.mostFrequent(input, n);
System.out.println(n+" most frequent Numbers are: ["+result[0]+","+result[1]+"]");
}
}
All other solutions use complete sorting, which needs O(n log n).
Using a TreeHashMap is probably the next best solution, as counting the frequencies and sorting happens in the same datastructure at the same time, thus needing lesser space.
Additionally the code would have even less lines and might be better to understand.
Before I came up with the above solution I relied on normal sorting, as in the code below, which makes use of Java 8's new features:
package gPrep.misc.FreqNumb;
import java.util.HashMap;
import java.util.Map;
public class FreqNumb{
public static int[] mostFrequent(int[] input, int n){
if(n<=0 || input.length == 0){
return null;
}
if(n > input.length){
throw new IllegalArgumentException("less numbers then n!");
}
//convert to map of Numbers->Frequency ( O(n) )
Map<Integer, Integer> numberFrequencyMap = new HashMap<Integer, Integer>(n);
for(int number : input){
numberFrequencyMap.compute(number, (k,v) -> v==null?1:v+1);
}
//sort reversed ( O (f log f) (f = #Frequencies)
int[] result = numberFrequencyMap.keySet().stream()
.sorted((Integer n1, Integer n2) -> {
int order = numberFrequencyMap.get(n1).compareTo(numberFrequencyMap.get(n2));
order = order*(-1);
return order;
})
.limit(n)
.mapToInt( number -> number)
.toArray();
return result;
}
public static void main(String[] args){
int[] input = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int n = 2;
int[] result = FreqNumb.mostFrequent(input, n);
System.out.println(n+" most frequent Numbers are: ["+result[0]+","+result[1]+"]");
}
}
Whenever you are adding to the PriorityQueue, it does sorting behind the scenes, probably at O(n*logn). It is just a sorted linked list, which is the same solution I've posted yesterday
Here is a solution using Java 8.
It has the minimal runtime of O(n).
First the input array is parsed and sorted into a Map (Number -> Frequency),
which is O(n), where n is the size of the input array.
Next the Map is copied to a MaxHeap. Building a MaxHeap just needs O(n), as we are sorting just to achieve a partial order. I hope Java's PriorityQueue.addAll does this!
n is in this case the size of the Map, which is smaller then the initial input array (now the numbers are shrinked down to the set of distinct numbers).
Next, we copy n numbers into the result array, where n is the n most frequent numbers.
This happens in O(log n).
Asymptotically O(n = inputArray.length) + O(n = map.size()) + O(log n = resulting numbers) is O(n).
Here is the code:
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.PriorityQueue;
/*
Given an unsorted array of natural numbers. Where numbers repeat in array. Out put numbers in the order of frequency. Where number of out put is passed as parameter.
For Ex:
Array -> [0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100]
n -> 2
Out put -> [100, 0] or [100, 2]
Since 100 is repeated 3 times and 0, 2 is repeated 2 times, out put up to 2 most frequent elements, program should out put either 100, 0 or 100, 2
*/
public class FreqNumbHeap{
public static int[] mostFrequent(int[] input, int n){
if(n<=0 || input.length == 0){
return null;
}
if(n > input.length){
throw new IllegalArgumentException("less numbers then n!");
}
//convert to map of Numbers->Frequency ( O(input.length) )
Map<Integer, Integer> numberFrequencyMap = new HashMap<Integer, Integer>(n);
for(int number : input){
numberFrequencyMap.compute(number, (k,v) -> v==null?1:v+1);
}
PriorityQueue<Entry<Integer,Integer>> maxHeap = new PriorityQueue<Entry<Integer,Integer>>(numberFrequencyMap.size(),
(e1, e2) -> {
int order = e1.getValue().compareTo(e2.getValue());
order = order*(-1);
return order;
});
//should only need O(f), where f is the count of distinct numbers
maxHeap.addAll(numberFrequencyMap.entrySet());
//build the result ( O (n) )
int[] result = maxHeap.stream()
.limit(n)
.mapToInt( entity -> entity.getKey())
.toArray();
return result;
}
public static void main(String[] args){
int[] input = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int n = 2;
int[] result = FreqNumbHeap.mostFrequent(input, n);
System.out.println(n+" most frequent Numbers are: ["+result[0]+","+result[1]+"]");
}
}
All other solutions use complete sorting, which needs O(n log n).
Using a TreeHashMap is probably the next best solution, as counting the frequencies and sorting happens in the same datastructure at the same time, thus needing lesser space.
Additionally the code would have even less lines and might be better to understand.
Before I came up with the above solution I relied on normal sorting, as in the code below, which makes use of Java 8's new features:
package gPrep.misc.FreqNumb;
import java.util.HashMap;
import java.util.Map;
public class FreqNumb{
public static int[] mostFrequent(int[] input, int n){
if(n<=0 || input.length == 0){
return null;
}
if(n > input.length){
throw new IllegalArgumentException("less numbers then n!");
}
//convert to map of Numbers->Frequency ( O(n) )
Map<Integer, Integer> numberFrequencyMap = new HashMap<Integer, Integer>(n);
for(int number : input){
numberFrequencyMap.compute(number, (k,v) -> v==null?1:v+1);
}
//sort reversed ( O (f log f) (f = #Frequencies)
int[] result = numberFrequencyMap.keySet().stream()
.sorted((Integer n1, Integer n2) -> {
int order = numberFrequencyMap.get(n1).compareTo(numberFrequencyMap.get(n2));
order = order*(-1);
return order;
})
.limit(n)
.mapToInt( number -> number)
.toArray();
return result;
}
public static void main(String[] args){
int[] input = {0, 0, 100, 3, 5, 4, 6, 4, 2, 100, 2, 100};
int n = 2;
int[] result = FreqNumb.mostFrequent(input, n);
System.out.println(n+" most frequent Numbers are: ["+result[0]+","+result[1]+"]");
}
}
public int[] GetTopDuplicateFromArray( int[] arr, int n)
{
for(int val : arr)
{
if(map.containsKey(val))
{
int count = 0;
count = map.get(val);
map.put(val, count + 1);
}
else
{
map.put(val, 1);
}
}
//Sort by values in descending order, get top n
Stream<Entry<Integer, Integer>> sortedByValueMap = map
.entrySet()
.stream()
.sorted(Collections.reverseOrder(Entry.comparingByValue()))
.limit(n);
int[] topN = sortedByValueMap.mapToInt(w -> w.getKey()).toArray();
return topN;
}
Construct a hashmap with the numbers and frequencies, then build a maxheap of k items based on the frequency.
- Anonymous March 25, 2015