## Google Interview Question

Software Engineer / Developer **Country:** India

**Interview Type:** Phone Interview

That depends on the question; there are two scenarios.

Case 1: if the inserted data is a duplicate, simply discard it.

Case 2: add one extra field, "count", to each hash-table entry to keep a count for each value. When a value is inserted, increment its count. When a value is deleted, decrement its count, and only when the count reaches zero delete the hash entry and the corresponding slot in the index array.
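Case 2 could be sketched like this; `CountedSet` and its method names are illustrative, and the index-array bookkeeping from the main solution is omitted for brevity:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of case 2: keep a count per value, so repeated inserts are
// tracked and the entry is only dropped when the count reaches zero.
class CountedSet<T> {
    private final Map<T, Integer> counts = new HashMap<>();

    public void insert(T value) {
        counts.merge(value, 1, Integer::sum); // increment, or start at 1
    }

    public void remove(T value) {
        Integer c = counts.get(value);
        if (c == null) return;            // not present: nothing to do
        if (c == 1) counts.remove(value); // last copy: drop the entry
        else counts.put(value, c - 1);    // otherwise just decrement
    }

    public boolean contains(T value) {
        return counts.containsKey(value);
    }
}
```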

I suppose it just depends on how you interpret the question. I would think this is being used for some sort of random-bag data structure, where it would probably be more useful to return elements with probability proportional to their frequency. But we really don't know what the requirements are, so we can't say.

@eugene.yarovoi: Since the question asks you to implement a set, doesn't that simply mean there should be no duplicates in whatever you use to implement it, as is the case with the standard C++ sets in the STL?

So, we don't ever need to worry about duplicates in this case.

Hi Epic coder,

After reading the question, the first doubt that comes to my mind is: what would the arguments to insert(), remove(), and get_random() be?

If there are no arguments, then we can assume we operate on the topmost element, as in your case.

But suppose an index is given: even though the operation can still be done in O(1), isn't there a need to rearrange the elements in our array after every remove()?

First, epic coder, when you resize the array, it's still O(1) amortized, so that's no problem.

To remove an arbitrary element in O(1), keep the array as a stack. Then keep two hash tables that give you O(1) access: one hash table maps key->stack_index, and the other maps stack_index->key.

Then remove is O(1). The algorithm is as follows:

Look up the key in the key->index hash table; that gives you the element's slot in the stack.

Then you swap the top of the stack into this position so the stack is not sparse: take the last element's index, look it up in the index->key hash table, and now you can update all of the state you need.

It's all O(1) if your hash table is O(1).
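A minimal sketch of that scheme, assuming integer keys. Here the index->key table is simply the dense array itself, since `stack.get(i)` already maps an index back to its key (`TwoTableSet` is an illustrative name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the key->index / index->key scheme described above.
// Remove swaps the last stack element into the freed slot so the
// stack stays dense; the hash table is patched to reflect the move.
class TwoTableSet {
    private final Map<Integer, Integer> keyToIndex = new HashMap<>();
    private final List<Integer> stack = new ArrayList<>(); // index -> key

    public void insert(int key) {
        if (keyToIndex.containsKey(key)) return; // set semantics
        keyToIndex.put(key, stack.size());
        stack.add(key);
    }

    public void remove(int key) {
        Integer idx = keyToIndex.remove(key);
        if (idx == null) return;
        int lastIdx = stack.size() - 1;
        if (idx != lastIdx) {
            int movedKey = stack.get(lastIdx);
            stack.set(idx, movedKey);      // swap top into the hole
            keyToIndex.put(movedKey, idx); // fix its index entry
        }
        stack.remove(lastIdx);
    }

    public int size() { return stack.size(); }
}
```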

Hi Ajit,

In this problem, it doesn't matter where you insert/delete the element because the only read operation defined for this data structure is get_random().

If there were an operation defined where you have to look up a particular element by index (say, get the element at the ith position), only then would I worry about inserting/deleting at a particular position.

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandHashAccess<T>
{
    private Map<T, Integer> elementMap;
    private List<T> elementArr;
    private Random rand = new Random();

    public RandHashAccess()
    {
        elementMap = new HashMap<T, Integer>();
        elementArr = new ArrayList<T>();
    }

    public void Insert(T element)
    {
        if (elementMap.containsKey(element))
            return; // set semantics: ignore duplicates
        int arrLength = elementArr.size();
        elementArr.add(element);
        elementMap.put(element, arrLength);
    }

    public void Remove(T element)
    {
        Integer eleArrIndex = elementMap.get(element);
        if (eleArrIndex != null)
        {
            /* swap the element at the current position with the nth (last)
               element and remove the last element from the array */
            int arrSize = elementArr.size();
            T eleNth = elementArr.get(arrSize - 1);
            elementArr.set(eleArrIndex, eleNth);
            elementArr.remove(arrSize - 1);
            /* adjust the nth element's index in the map, since it was swapped
               into the current element's position; put must come before
               remove, in case element and eleNth are the same element */
            elementMap.put(eleNth, eleArrIndex);
            elementMap.remove(element);
        }
    }

    public T get_random()
    {
        // Math.random(int, int) does not exist; use Random.nextInt instead
        int randElePos = rand.nextInt(elementArr.size());
        return elementArr.get(randElePos);
    }
}
```

+1: Looks right. Of course, instead of a generic list you probably need a vector-like structure, to make get_random fast. Also, you seem to be using inconsistent naming conventions. Any reason for that?

Also, curiously, interchange the order of put and remove in the Remove method (i.e. the last two lines), and you have a bug!

So the answer here is a hash table, because you can insert and remove in O(1). The tough part is the get_random() function: if you just generate random numbers until you hit an index that actually has a value, you will not get constant time.

Why can't you have one int variable that contains the number of elements in the hash table?

@als365: Could you explain your solution a bit more? We also have to think about how to choose a random element when our key hashes to a location that has multiple values chained there.

@Vikas: I think what als365 meant was to use a hashtable/map along with an array/arraylist. The map is used to store the elements.

The array is used to store the elements in insertion order.

In the map, we can store the element as key and its insertion index in the array as value.

During insert, add to the end of the array and note the index; insert the element into the map with that index as the value.

During delete, look up the element's index in the array, then swap that slot with the nth (last) position. Also update the stored index of the moved element in the map to point to the current slot; the element to be removed is now in the last position. Remove it from the array and the map.

During get_random, choose a random index between 0 and array.length - 1 and return that element.

```
// Skeleton of the structure described above; the complete
// implementation appears earlier in the thread.
public class RandHashAccess<T>
{
    private HashMap<T, Integer> elemMap;
    private ArrayList<T> eleArray;

    public void Insert(T element) { /* add to eleArray, record index in elemMap */ }
}
```

@eugene.yarovoi: duplicates can be ignored. If the value is already present in the map during insertion, ignore it.

Thanks Aparna.

In this solution, we are not keeping the ordering of elements: the nth element can be moved into the middle of the array, so it becomes the ith element. Ideally, if we remove the ith element, the (i+1)th element should become the ith element.

```
// Sketch: direct-address hash table plus a dense array of occupied slots.
// Note: collisions are not handled here; assume a perfect hash for the sketch.
public class HashTable {
    private DataItem[] data;  // the table itself, indexed by hash
    private int size;
    private int[] keys;       // dense array of occupied table slots
    private int numKeys;

    private class DataItem {
        public int keyIndex;  // position of this slot's key in keys[]
        public Object data;
        public DataItem(int k, Object d) {
            keyIndex = k;
            data = d;
        }
    }

    public HashTable() {
        size = 100;
        data = new DataItem[size];
        keys = new int[size];
        numKeys = 0;
    }

    public void Insert(Object newData) {
        int key = hash(newData);
        data[key] = new DataItem(numKeys, newData);
        keys[numKeys] = key;
        numKeys++;
    }

    public void Remove(Object removeData) {
        int key = hash(removeData);
        int keyIndex = data[key].keyIndex;
        // overwrite the removed slot's entry in keys[] with the last entry
        keys[keyIndex] = keys[numKeys - 1];
        numKeys--;
        data[keys[keyIndex]].keyIndex = keyIndex;
        data[key] = null;
    }

    public Object GetRandom() {
        int r = rand.nextInt(numKeys);
        return data[keys[r]].data;
    }

    private final java.util.Random rand = new java.util.Random();

    private int hash(Object o) {
        return Math.abs(o.hashCode()) % size; // placeholder hash
    }
}
```

I never like the idea of suggesting a hash table for such problems, because implementing a hash table that gives you O(1) insertion and deletion is a big problem in itself. In this problem we are not required to look up an element in O(1) time, and we should exploit that fact.

My idea is to use a very large array and implement a stack on top of it. So:

1. you can insert only at the top: O(1) time;

2. you can delete only from the top: again O(1) time; and

3. to find a random element, suppose we have n elements in the array at the instant we must generate a random number. The answer is the rand(n)th element of the array, where rand(n) generates a random number between 0 and n-1. Again O(1) time.

Note that there is always a possibility of running out of space, in which case we'll have to expand the array (which can be seen as the equivalent of rehashing in hash tables).
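That stack-only proposal could be sketched as follows (`StackBag` is a hypothetical name). Note that remove() takes no argument and can only pop the top, which is exactly the limitation raised in the replies:

```java
import java.util.Random;

// Sketch of the array-as-stack idea: push/pop at the top only,
// random access by index. Works only if remove() need not target
// a specific element.
class StackBag {
    private int[] arr = new int[16];
    private int n = 0;                 // number of elements
    private final Random rand = new Random();

    public void insert(int value) {
        if (n == arr.length) {
            // double on overflow: O(1) amortized
            int[] bigger = new int[2 * arr.length];
            System.arraycopy(arr, 0, bigger, 0, n);
            arr = bigger;
        }
        arr[n++] = value;
    }

    public int remove() {              // pops the top element only
        return arr[--n];
    }

    public int getRandom() {
        return arr[rand.nextInt(n)];
    }

    public int size() { return n; }
}
```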

How do you delete an element from the array in O(1) time? You are deleting the element at the top, but not the required element...

For this problem, hash table + array is the perfect solution.

What are you talking about? The problem description never specified that remove() took a parameter...

You don't need a very large array; just increase the size of the array dynamically when it runs out of space. For instance, with a vector<int> arr you could use:

```
if (array_size == n)
{
    arr.resize(2 * n);
}
```

I chose a large array because, with a large array, the number of resize operations is minimized. How large? I can't tell without knowing anything about the dataset this data structure is going to support.

Cost of resizing? O(1) amortized.

(Theoretically speaking, you could choose an array of any size and, when you run out of memory, allocate twice as much memory. It would still be O(1) amortized.)
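For reference, the usual doubling argument behind that claim: starting from capacity 1 and doubling on each overflow, the element copies performed over $n$ insertions total

$$\sum_{k=0}^{\lfloor \log_2 n \rfloor} 2^k = 2^{\lfloor \log_2 n \rfloor + 1} - 1 < 2n,$$

so all $n$ insertions together cost $O(n)$, i.e. $O(1)$ amortized per insertion.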

Once you allocate more memory, you may have to move the data from one place to another, which can involve many invocations of the copy constructor and destructor. Then the cost of resizing rises.

@Peter: yes, the O(1) runtime will be amortized runtime in this solution as it is currently stated. I would argue that this is probably acceptable, but should be checked with the interviewer.

@Epic_coder: I second algos's opinion that remove() was probably intended to take a parameter. Otherwise, what is its contract? To just remove any element whatsoever? That doesn't seem tremendously useful. I suppose its contract could be to remove the last element inserted, but there's no particularly good evidence to suggest that was the intention.

Hashtable + a vector for quickly getting random elements seems an appropriate solution, with the following additions:

* keep each element's array index in the hash table, for fast deletions from the array

* delete from the array by swapping with the last element and then deleting the last element, to make this operation O(1)

C++ solution is below.

* Assuming the elements you are storing are integers.

* Variable names could have been better.

* Handle collisions by "chaining" in a vector (easier to implement than linked list).

```
#include <cstdlib>
#include <utility>
#include <vector>
using namespace std;

class HT
{
    // pair<int, int> - first is the element inserted in the HT,
    // second is its index in the locations vector
    vector< vector< pair<int, int> > > ht;
    // use this to store elements in compact form and get a random one
    vector<int> locations;
    int buckets;

    int hash(int n)
    {
        return (n % buckets);
    }

    int get_bucket_index(int bucket, int n)
    {
        for(int i = 0; i < (int)ht[bucket].size(); i++)
        {
            if(ht[bucket][i].first == n)
                return i;
        }
        return -1;
    }

public:
    HT(int buckets = 9973)
    {
        this->buckets = buckets;
        ht.resize(buckets);
    }

    void insert(int n)
    {
        int hashed_n = hash(n);
        if(get_bucket_index(hashed_n, n) != -1)
            return; // already present
        ht[hashed_n].push_back(pair<int,int>(n, locations.size()));
        locations.push_back(n);
    }

    void remove(int n)
    {
        int hashed_n = hash(n);
        int i = get_bucket_index(hashed_n, n);
        if(i == -1)
            return;
        // remove from locations by swapping with the last element
        int loc = ht[hashed_n][i].second;
        swap(locations[loc], locations[locations.size()-1]);
        locations.pop_back();
        // if an element was actually moved into slot loc, fix its index in ht
        // (without this check, removing the last element reads out of range)
        if(loc < (int)locations.size())
        {
            int hashed_sw = hash(locations[loc]);
            int j = get_bucket_index(hashed_sw, locations[loc]);
            ht[hashed_sw][j].second = loc;
        }
        // swap the removed entry with the last one in its bucket and pop
        int last = ht[hashed_n].size()-1;
        swap(ht[hashed_n][i], ht[hashed_n][last]);
        ht[hashed_n].pop_back();
    }

    int get_random()
    {
        return locations[rand() % locations.size()];
    }
};
```

```
#include <stdlib.h>
#include <string.h>

struct link_node
{
    struct link_node *next;
    int index;
    int data;
};

class RequiredDataStructure
{
private:
    static const int hash_bits = 14;
    struct link_node **hash_table;
    int arr_ind;
    int arr_size;
    int *arr;

    int fibonacci_hash(int a, int bits)
    {
        return (int)(((unsigned long long)a * 11400714819323198485ULL) >> (64 - bits));
    }

public:
    RequiredDataStructure()
    {
        hash_table = (struct link_node **)malloc(sizeof(struct link_node *) * (1 << hash_bits));
        memset(hash_table, 0, sizeof(struct link_node *) * (1 << hash_bits));
        arr_ind = 0;    /* must be initialized: nothing zeroes these fields */
        arr_size = 16;
        arr = (int *)malloc(sizeof(int) * arr_size);
    }

    void Insert(int data)
    {
        struct link_node *node;
        if (arr_ind == arr_size) {
            /* realloc is O(n) but it occurs rarely: O(1) amortized */
            arr = (int *)realloc(arr, sizeof(int) * arr_size * 2);
            arr_size <<= 1;
        }
        arr[arr_ind] = data;
        node = (struct link_node *)malloc(sizeof(struct link_node));
        node->data = data;
        node->index = arr_ind++;
        int hash = fibonacci_hash(data, hash_bits);
        node->next = hash_table[hash];
        hash_table[hash] = node;
    }

    void Remove(int data)
    {
        int hash = fibonacci_hash(data, hash_bits);
        int index;
        struct link_node *curr, *prev = NULL;
        for (curr = hash_table[hash]; curr != NULL; prev = curr, curr = curr->next) {
            if (curr->data == data)
                break;
        }
        /* only delete if it exists */
        if (curr != NULL) {
            if (prev != NULL)
                prev->next = curr->next;
            else
                hash_table[hash] = curr->next;
            /* remove the element at curr->index: move the last array
               element into its slot and patch that node's index */
            --arr_ind;
            if (curr->index != arr_ind) {
                index = curr->index;
                arr[index] = arr[arr_ind];
                data = arr[index];
                hash = fibonacci_hash(data, hash_bits);
                /* use a separate pointer so we still free the removed node */
                for (struct link_node *p = hash_table[hash]; p != NULL; p = p->next) {
                    if (p->data == data && p->index == arr_ind) {
                        p->index = index;
                        break;
                    }
                }
            }
            free(curr);
        }
    }

    bool GetRandom(int *ran) {
        if (arr_ind != 0) {
            *ran = arr[rand() % arr_ind];
            return true;
        }
        return false;
    }
};
```

Seconding vincent, and expanding on the same idea.

DS: hash map (unordered) + doubly linked list (maintaining a head pointer).

generate_random:

map.size() gives us the number of elements at any given instant. Exploit this fact and access the (rand() % size)th node from the head pointer.

The DLL provides O(1) insertion and O(1) deletion. Further, for O(1) random node selection, cache the last node accessed and apply rand() to pick a new node relative to it, such as:

first time:

return head_; // store this pointer.

second time:

return lastAccess +/- rand()%5, based on the pointer location.

```
import java.util.ArrayList;
import java.util.Random;

class Set<T> {
    ArrayList<T> al;
    Random r;

    public Set() {
        al = new ArrayList<T>();
        r = new Random();
    }

    public void add(T element) {
        if (!al.contains(element)) {
            al.add(element);
        }
    }

    public void remove(T element) {
        if (al.contains(element)) {
            al.remove(element);
        }
    }

    public T getRandomElement() {
        int temp = r.nextInt(al.size());
        return al.get(temp);
    }
}
```

In this solution, the insert and remove functions have O(N) time complexity. Can we optimize them to, say, O(1)?

Yes, we can implement it with a hash table, so that insert and delete are O(1).

For getRandomElement() we can use a doubly linked list, in which we store the numbers present in the set.

The hash table should have <key, value> as <number, pointer to the node in the DLL>, so insert() and remove() also insert and remove the node from the DLL.

To get a random element, we just generate a random number R between 1 and num_elements and fetch the Rth element from the DLL.

```
/* pseudocode: HashTable, DLL and Node are assumed helper types;
   the DLL is circular, with start and end pointers */
struct set
{
    HashTable ht;
    DLL *dll;
    int num_elements;
};

void insert (set &st, int n)
{
    if (st.ht.hasItem(n))
    {
        return; // already present
    }
    Node *node = new Node(n);
    st.ht.insert(n, node);
    if (!st.dll)
    {
        // first node: a circular list of one
        st.dll = new DLL();
        st.dll->start = st.dll->end = node;
        node->next = node->prev = node;
    }
    else
    {
        // splice the node in at the end, keeping the list circular
        node->prev = st.dll->end;
        node->next = st.dll->start;
        st.dll->end->next = node;
        st.dll->start->prev = node;
        st.dll->end = node;
    }
    st.num_elements++;
}

void remove (set &st, int n)
{
    if (st.ht.hasItem(n) == false)
    {
        return; // item not found
    }
    Node *ptr = st.ht.getItem(n);
    st.ht.removeItem(n);   // remove from hash table
    // now unlink this item from the linked list
    ptr->prev->next = ptr->next;
    ptr->next->prev = ptr->prev;
    delete ptr;
    st.num_elements--;     // decrement count of elements
}

int getRandom (set &st)
{
    int randVal = rand() % st.num_elements;
    // walk to the randVal-th element of the linked list
    int count = 0;
    Node *tmp = st.dll->start;
    while (count < randVal)
    {
        tmp = tmp->next;
        count++;           // the original loop forgot to advance count
    }
    return tmp->data;
}
```

algos's solution is good. But to optimize insert and remove to O(1), we have to implement our own hash table. It would be an array of some size (let's say 65536): for any element, the array index K is the key, and the value is array[K].

In that case we need our own hash function, for which we can use MD5. So add and remove remain O(1), and getRandom can be purely random by fetching the value at rand(size of array).

We can have an array of size 65536.

hash function => decimal value of (the last 4 hex digits of the MD5); the maximum is FFFF, which is 65535.

For hash collisions, we can use chaining, so each array element is a linked list of data. If the array index chosen by the random function has more than one element, we can again choose rand(size of linked list) to get the Nth element in that list.
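A sketch of that bucket computation using Java's standard MessageDigest; taking the last two digest bytes (four hex digits) yields an index in 0..65535 (`Md5Bucket` is an illustrative name):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Maps a key to a bucket index in [0, 65535] using the last two bytes
// (four hex digits) of its MD5 digest, as described above.
class Md5Bucket {
    static int bucketOf(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            int hi = digest[digest.length - 2] & 0xFF;
            int lo = digest[digest.length - 1] & 0xFF;
            return (hi << 8) | lo;  // 0 .. 65535
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is guaranteed by the JDK", e);
        }
    }
}
```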

```
package cc.goog;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class SpecialSet<T> {
    private Map<T, Integer> elementVsIndex = new HashMap<T, Integer>();
    private Map<Integer, T> indexVsElement = new HashMap<Integer, T>();
    private Random r = new Random();

    public void insert(T t) {
        if (elementVsIndex.containsKey(t))
            return;
        int maxIndex = elementVsIndex.size();
        elementVsIndex.put(t, maxIndex);
        indexVsElement.put(maxIndex, t);
    }

    public void remove(T t) {
        Integer removedIndex = elementVsIndex.remove(t);
        if (removedIndex != null) {
            // after removal, the last occupied index equals the new size
            int maxIndex = elementVsIndex.size();
            T e = indexVsElement.remove(maxIndex);
            if (e != null && !e.equals(t)) {
                // move the former last element into the freed index; without
                // the !e.equals(t) check, removing the element at maxIndex
                // would re-insert it
                elementVsIndex.put(e, removedIndex);
                indexVsElement.put(removedIndex, e);
            }
        }
    }

    public T getRandomElement() {
        int maxIndex = elementVsIndex.size();
        int randomIndex = r.nextInt(maxIndex);
        return indexVsElement.get(randomIndex);
    }
}
```

Java implementation of the algorithm outlined by algos

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

public class RandomSet<T> {
    public RandomSet(){
    }

    public boolean add(T obj){
        if(data_.containsKey(obj)) return false;
        data_.put(obj, elems_.size());
        elems_.add(obj);
        return true;
    }

    public boolean remove(T obj){
        Integer idx = data_.remove(obj);
        if(idx == null) return false;
        int lastIdx = elems_.size()-1;
        if(lastIdx != idx.intValue()){
            T moved = elems_.get(lastIdx);
            elems_.set(idx, moved);
            data_.put(moved, idx);
        }
        elems_.remove(lastIdx);
        return true;
    }

    public T getRandomObject(){
        return elems_.get(rand_.nextInt(elems_.size()));
    }

    private Random rand_ = new Random(0);
    private final ArrayList<T> elems_ = new ArrayList<>();
    private HashMap<T,Integer> data_ = new HashMap<>();

    @Override
    public String toString() {
        return data_.toString();
    }

    public static void main(String[] args) {
        RandomSet<Character> rset = new RandomSet<>();
        for(char ch = 'A'; ch <= 'Z'; ch++){
            System.out.println("adding " + ch);
            rset.add(ch);
        }
        System.out.println(rset);
        System.out.println(rset.getRandomObject());
        for(char ch = 'D'; ch <= 'Z'; ch++){
            rset.remove(ch);
        }
        System.out.println(rset);
        System.out.println(rset.getRandomObject());
    }
}
```


This can be solved using a hash table + array implementation.

- algos October 21, 2012

hash table: data ---- index into array

40 -------------- 0

10 -------------- 1

20 -------------- 2

30 -------------- 3

array index -- 0 1 2 3 4

data --------- 40 10 20 30

initially index = 0;

insert(data)

1. insert the data into the hash table if it is not already present

2. store the current index in the hash entry associated with the key (data)

3. store the data at array[index]

4. increment index by 1, i.e. index++;

delete(data)

1. verify the data is present in the hash table; if yes, get its array index: i = get_index(data)

2. move the data at array location index-1 into location i: a[i] = a[index-1];

3. the data at a[i] has changed, so update the index stored for a[i] in the hash table to i

4. decrement index by 1, i.e. index--;

5. delete the data from the hash table: delete_hash(data)

get_random()

1. generate a random number between 0 and index-1: r = rand() % index;

2. return array[r]