I assume that, when asked follow-up questions, the interviewer would want logic where an IP can appear either in star form (10.0.0.*) or in CIDR form (10.0.0.0/8).
One possible solution involves a Trie whose root points to all possible octet values from 0 to 255. The trie has 4 levels, one for each of the four 8-bit fields separated by periods.
When we first scan the list of blocked IPs (or IP prefixes), we create trie nodes for each section of the address as it appears in the block list.
When a request comes in with IP address information, if the trie matches either all 4 sections of the IP address or ends at a * node, we block the IP.
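As a rough sketch of that trie (names and structure are my own assumptions, not from the question), each level stores one octet and a "*" marks a wildcard that blocks the whole subtree:

import java.util.HashMap;
import java.util.Map;

public class BlockListTrie {
    private final Map<String, BlockListTrie> children = new HashMap<>();
    private boolean wildcard; // a "*" was seen at this level (e.g. 10.0.0.*)
    private boolean terminal; // a full 4-octet entry ends here

    public void add(String ip) {
        BlockListTrie node = this;
        for (String part : ip.split("\\.")) {
            if (part.equals("*")) { node.wildcard = true; return; }
            node = node.children.computeIfAbsent(part, k -> new BlockListTrie());
        }
        node.terminal = true;
    }

    public boolean isBlocked(String ip) {
        BlockListTrie node = this;
        for (String part : ip.split("\\.")) {
            if (node.wildcard) return true; // e.g. 10.0.0.* blocks 10.0.0.5
            node = node.children.get(part);
            if (node == null) return false;
        }
        return node.terminal || node.wildcard;
    }
}

A CIDR entry like 10.0.0.0/8 could be normalized into the same wildcard form (10.*) before insertion.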
The question's opening on probability and its restriction on space both point to a possible solution using a Bloom filter.
A Bloom filter is a probabilistic data structure that uses a relatively small fixed-size array to record hashes of the elements it has seen before. It may give you a false positive with some probability P but never a false negative. (So it may say that an element exists in the set when it really does not.)
Usually, a Bloom filter has "add" and "contains" operations, both very fast (O(1) time). There is also an extension called the Counting Bloom filter that supports a "remove" operation, at the cost of additional space overhead for the counters.
I would assume the first part of the question, where smaller numbers are more likely to appear early in the stream, suggests that once larger numbers start arriving, you can remove the hashes generated from smaller numbers to further reduce the false-positive rate.
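For illustration, here is a minimal Bloom filter sketch. The two hash functions and the array size are arbitrary placeholders; a real filter would derive the number of hashes and the bit-array size from the target false-positive rate:

import java.util.BitSet;

public class BloomFilter {
    private final BitSet bits;
    private final int m; // number of bits

    public BloomFilter(int m) { this.m = m; this.bits = new BitSet(m); }

    // Two stand-in hash functions; floorMod keeps the index non-negative
    private int h1(int x) { return Math.floorMod(x * 31 + 17, m); }
    private int h2(int x) { return Math.floorMod(x * 101 + 7, m); }

    public void add(int x) { bits.set(h1(x)); bits.set(h2(x)); } // O(1)

    // May return a false positive, never a false negative
    public boolean contains(int x) { return bits.get(h1(x)) && bits.get(h2(x)); }
}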
As you said, the O(N^2) answer is pretty straightforward: for each element, we check the rest of the elements to see if their difference is within K.
One way to get the time complexity down to O(N) or O(NlogN) is to use a sliding window.
The time complexity is O(N) if the input is sorted; otherwise we sort the input first, which gives O(NlogN) overall.
The sliding window will work like below
input: {1,2,5,10} k = 5
We start with the first 2 elements.
i = 0, j = 1
If the difference between the first and last elements of the window (arr[j] - arr[i]) is <= k, we extend the window to the right (j++)
When the difference becomes > k, we increment i (i++)
Every time a new element gets added to the window, it adds (j - i) additional groups to the total.
Example:
input: {1,2,5,10} K = 5
sliding window: [1,2]      total groups: 1 [[1,2]]
since 2 - 1 <= 5 ==> j++
sliding window: [1,2,5]    total groups: 3 [[1,2],[2,5],[1,5]]
since 5 - 1 <= 5 ==> j++
sliding window: [1,2,5,10]  10 - 1 > k, so we increment i
sliding window: [2,5,10]    10 - 2 > k, so we increment i
sliding window: [5,10]     total groups: 4 [[1,2],[2,5],[1,5],[5,10]]
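A compact sketch of this window logic (the input is sorted first, so the whole thing is O(N log N)):

import java.util.Arrays;

public class PairsWithinK {
    public static long countPairs(int[] a, int k) {
        Arrays.sort(a);
        long total = 0;
        int i = 0;
        for (int j = 1; j < a.length; j++) {
            while (a[j] - a[i] > k) i++; // shrink the window from the left
            total += j - i;              // every element in the window pairs with a[j]
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(new int[]{1, 2, 5, 10}, 5)); // 4
    }
}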
A few signals from the requirements:
1. A stream of uninterrupted data suggests a big data problem.
2. A real-time dashboard suggests a heavy-hitters style solution.
3. Data displayed for 10 mins could suggest a single 10-minute window, or 1-minute windows refreshed every minute to show data for the last 10 minutes.
The overall design could include exposing endpoints over WebSocket connections to accept the stream of real-time data. This data is then processed by a Spark streaming job that writes to an analytics store (like Apache HBase / Apache Kudu). A dashboard then pulls data from the analytics store and displays the top hitters on screen.
For approximate counting, we can also talk about a faster route that uses a Count-Min Sketch (CMS) data structure on a single node, plus a min priority queue holding at most 20 IP addresses that are the heavy hitters. The advantage of CMS over the Spark streaming route is that CMS needs only sublinear space (the sketch size is independent of the number of distinct IPs), so it requires far less computing power and storage.
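As a rough illustration of the CMS route (hash choices and sizes below are illustrative, not tuned, and IPs are reduced to int hashes for brevity):

public class CountMinSketch {
    private final int depth = 4, width = 1024;
    private final long[][] table = new long[depth][width];
    private final int[] seeds = {17, 31, 101, 257};

    private int bucket(int row, int item) {
        return Math.floorMod(item * seeds[row] + row, width);
    }

    public void add(int item) {
        for (int r = 0; r < depth; r++) table[r][bucket(r, item)]++;
    }

    // The minimum over the rows may over-count but never under-counts,
    // which is acceptable for picking heavy hitters
    public long estimate(int item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) min = Math.min(min, table[r][bucket(r, item)]);
        return min;
    }
}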
Since the overall design is large, I have created a Google doc to show the overall architecture.
docs.google.com/document/d/1uDbXZ_RxJyYNXovAUIED1yN7FJc8_T3XiIsmGykL86g/edit?usp=sharing
Typically, an antivirus searches for malware or viruses on a machine by looking for their signatures. These signatures are stored as patterns that the antivirus searches for across all files.
This use case falls into the same category: the required service will search for malware patterns across photos and videos to see if they encode any malicious information.
For simplicity, we will assume that photos and videos are converted from their binary stream format to Base64 encoding, which gives us an ASCII (text) representation of their contents.
We will keep the signature patterns of all known malware in a map <signature, malware name>. There can be multiple signatures for the same malware.
The problem then reduces to substring search against a pattern. We can use the Rabin-Karp or KMP algorithm to search for all the signature substrings across the stream of Base64-encoded data as it comes our way.
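A minimal Rabin-Karp sketch of that substring search (single pattern; the base and modulus are illustrative). A production scanner would more likely stream the data and match all signatures at once with something like Aho-Corasick:

public class SignatureSearch {
    public static int indexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return -1;
        final long BASE = 256, MOD = 1_000_000_007L;
        long patHash = 0, winHash = 0, pow = 1; // pow = BASE^(m-1) % MOD
        for (int i = 0; i < m; i++) {
            patHash = (patHash * BASE + pattern.charAt(i)) % MOD;
            winHash = (winHash * BASE + text.charAt(i)) % MOD;
            if (i < m - 1) pow = pow * BASE % MOD;
        }
        for (int i = 0; i + m <= n; i++) {
            // Verify on hash match to rule out (rare) collisions
            if (patHash == winHash && text.regionMatches(i, pattern, 0, m)) return i;
            if (i + m < n) { // roll the window: drop text[i], append text[i+m]
                long lead = text.charAt(i) * pow % MOD;
                winHash = ((winHash - lead + MOD) * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("aGVsbG8gd29ybGQ=", "d29y")); // 8
    }
}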
I assume the question is to sort the entire array using ONLY the given reverse method.
Input: arr = [2,3,1,5,4]
Output: arr = [1,2,3,4,5]
Available method = void reverse(int[] arr, int k)
Restrictions:
1. not allowed to modify the reverse method
Possible solution:
1. Since the method can only reverse elements, one possible solution is to sort the array bubble-sort style in O(N^2), reversing out-of-order adjacent pairs (a sketch follows the walkthrough below).
For example:
Input: [2,3,1,5,4]
Step1: Check index 0 and 1. They are in increasing order, so no reverse call required
Step2: Check index 1 and 2. They are out of order, so call reverse method and pass only 2 elements from the array with k = 2
and so on
Input: [2,3,1,5,4]
Step 1: Check 2 and 3. They are in order, so no change required ==> [2,3,1,5,4]
Step 2: Check 3 and 1. They are out of order. Call reverse([3,1], 2) ==> [2,1,3,5,4]
Step 3: Check 2 and 1. They are out of order. Call reverse([2,1], 2) ==> [1,2,3,5,4]
Step 4: Check 1 and 2. They are in order, so no change required ==> [1,2,3,5,4]
Step 5: Check 2 and 3. They are in order, so no change required ==> [1,2,3,5,4]
Step 6: Check 3 and 5. They are in order, so no change required ==> [1,2,3,5,4]
Step 7: Check 5 and 4. They are out of order. Call reverse([5,4], 2) ==> [1,2,3,4,5]
Entire array is sorted.
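Here is a small sketch of that walkthrough. The exact semantics of the provided reverse(int[] arr, int k) are not given, so I assume it reverses the first k elements, and each out-of-order pair is handed to it as its own 2-element array, exactly as in the steps above:

import java.util.Arrays;

public class SortWithReverse {
    // Stand-in for the provided method: reverses the first k elements of arr
    static void reverse(int[] arr, int k) {
        for (int i = 0, j = k - 1; i < j; i++, j--) {
            int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
        }
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 1, 5, 4};
        for (int pass = 0; pass < arr.length - 1; pass++) {
            for (int i = 0; i < arr.length - 1; i++) {
                if (arr[i] > arr[i + 1]) {
                    int[] pair = {arr[i], arr[i + 1]}; // 2-element view of the pair
                    reverse(pair, 2);                  // only reverse() changes the order
                    arr[i] = pair[0];
                    arr[i + 1] = pair[1];
                }
            }
        }
        System.out.println(Arrays.toString(arr)); // [1, 2, 3, 4, 5]
    }
}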
The idea would be to keep track of which "Iterator" in the list was last accessed. If we reach the end of the list, we wrap around to the beginning again.
The wrapping can be done using count % (size of the list).
The next method checks whether the given iterator has any data left; if not, it removes that iterator from the list so that it is not accessed again.
Here is a sample implementation with the result.
Time Complexity:
next: O(N) --> as we may end up scanning the entire list of N iterators to find a value
hasNext: O(N)
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorRoundRobin {
    public static void main(String[] args) {
        List<Integer> l1 = new ArrayList<>();
        l1.add(1);
        List<Integer> l2 = new ArrayList<>();
        l2.add(10); l2.add(11);
        List<Integer> l3 = new ArrayList<>();
        l3.add(20); l3.add(21); l3.add(22); l3.add(23); l3.add(24);
        List<Iterator<Integer>> input = new ArrayList<>();
        input.add(l3.iterator());
        input.add(l2.iterator());
        input.add(l1.iterator());
        RoundRobinIterator rri = new RoundRobinIterator(input);
        System.out.println("rri.hasNext() = " + rri.hasNext()); //Expected value: true
        System.out.println("rri.next() = " + rri.next()); //20
        System.out.println("rri.hasNext() = " + rri.hasNext()); //true
        System.out.println("rri.next() = " + rri.next()); //10
        System.out.println("rri.next() = " + rri.next()); //1
        System.out.println("rri.next() = " + rri.next()); //21
        System.out.println("rri.next() = " + rri.next()); //11
        System.out.println("rri.next() = " + rri.next()); //22
        System.out.println("rri.next() = " + rri.next()); //23
        System.out.println("rri.hasNext() = " + rri.hasNext()); //true
        System.out.println("rri.next() = " + rri.next()); //24
        System.out.println("rri.hasNext() = " + rri.hasNext()); //false
    }
}

class RoundRobinIterator {
    List<Iterator<Integer>> list;
    int count = 0;

    public RoundRobinIterator(List<Iterator<Integer>> itrs) {
        list = itrs;
    }

    public Integer next() {
        Integer result = null;
        while (!list.isEmpty()) {
            Iterator<Integer> itr = list.get(count % list.size());
            if (itr.hasNext()) {
                result = itr.next();
                count++;
                break;
            } else {
                list.remove(count % list.size()); //drop the exhausted iterator
            }
        }
        return result;
    }

    public boolean hasNext() {
        while (!list.isEmpty()) {
            Iterator<Integer> itr = list.get(count % list.size());
            if (itr.hasNext()) {
                return true;
            }
            list.remove(count % list.size()); //drop the exhausted iterator
        }
        return false;
    }
}
Result:
======
rri.hasNext() = true
rri.next() = 20
rri.hasNext() = true
rri.next() = 10
rri.next() = 1
rri.next() = 21
rri.next() = 11
rri.next() = 22
rri.next() = 23
rri.hasNext() = true
rri.next() = 24
rri.hasNext() = false
@rahul, you make a valid point. My code does not handle the case where the ranges do not overlap.
The immediate solution that comes to mind is to use additional space, such as a bit/boolean array, where bits are flipped to 1 when they fall within a range. Once all the bits covering [0,1) are flipped, the program ends.
I will update the code to cover your use case.
Based on the updates to the question, and assuming the Drop() function randomly generates location and radius float values, here is one possible implementation.
Logic:
=====
We keep track, using 2 variables, of the lowest and highest points the Drop() function has covered so far. When lower reaches 0 and higher reaches 0.99, we exit the method.
For example
First drop method generated location = 0.20 and radius = 0.17
Lower = location - radius (0.20 - 0.17) = 0.03
Higher = location + radius (0.20 + 0.17) = 0.37
Since lower has not yet reached 0 and higher has not yet covered up to 0.99, we make another call to the function.
Say the 2nd call to the drop method generated location = 0.15 and radius = 0.30
Lower = location - radius (0.15 - 0.30) = -0.15
Higher = location + radius (0.15 + 0.30) = 0.45
We set the final lower value to the minimum of the existing lower and the lower from the last call to Drop(). The same (with maximum) goes for the higher.
When both lower and higher reach the expected values, we quit the program and report the number of calls.
Program:
=======
import java.util.Random;
public class IntervalCoverage {
public static void main(String args[]) {
System.out.println("Number of calls made: " + numCalls());
}
private static int numCalls(){
Drop firstDrop = Drop();
int numOfCalls = 1;
float lower = firstDrop.location - firstDrop.radius;
float higher = firstDrop.location + firstDrop.radius;
System.out.println("first lower: " + lower + " higher: " + higher);
Drop current = null;
while(true) {
current = Drop();
System.out.println("Received location: " + current.location + " radius: " + current.radius);
numOfCalls++;
float currentHigher = current.location + current.radius;
System.out.println("Calculated lower: " + (current.location - current.radius)
+ " higher: " + (current.location + current.radius));
if(currentHigher < 1) {
lower = Math.min(lower, current.location - current.radius);
higher = Math.max(higher, currentHigher);
} else {
System.out.println("Ignoring this drop as Higher is above 0.9");
}
System.out.println("lower: " + lower + " higher: " + higher);
if((lower <= 0 && higher > 0.99) || numOfCalls > 1000)
break;
}
return numOfCalls;
}
private static Drop Drop(){
//Generate random values of location and radius
Random r = new Random();
return new Drop(r.nextFloat(), r.nextFloat());
}
}
class Drop {
float location;
float radius;
Drop(float location, float radius){
this.location = location;
this.radius = radius;
}
}
Output:
=======
first lower: -0.8709638 higher: 0.9397204
Received location: 0.822552 radius: 0.55689883
Calculated lower: 0.2656532 higher: 1.3794508
Ignoring this drop as higher is at or above 1
lower: -0.8709638 higher: 0.9397204
Received location: 0.17664707 radius: 0.13852155
Calculated lower: 0.038125515 higher: 0.31516862
lower: -0.8709638 higher: 0.9397204
.......
Calculated lower: 0.9019634 higher: 0.9935021
lower: -0.8709638 higher: 0.9935021
Number of calls made: 33
The trickiest part of this question is the constant-time requirement for the last() method. I would expect the interviewer to press for constant time for finding the last-accessed element.
The idea is similar to an LFU (least frequently used) cache, where we keep track of element accesses; but here, instead of tracking how many times an element was accessed, we track when it was last accessed.
We can implement this using the following data structure
1. values map - HashMap<Integer, Integer> --> this keeps track of actual keys and values
2. whenAccessed map - HashMap<Integer, Long> --> keeps track of each element and when it was last accessed (System.nanoTime())
3. lastAccessed map - TreeMap<Long, Integer> --> keeps track of access time against the element; this structure answers the last() call
Time Complexity:
put(key, value) --> O(logN) --> internally a TreeMap put is used
get(key) --> O(logN) --> the TreeMap entry is re-keyed on every access
delete(key) --> O(logN) --> a TreeMap remove is needed
last() --> O(logN) --> TreeMap lastEntry(); a truly O(1) last() would need a doubly linked list ordered by access time, as in an LRU cache
Here is the code
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class DictionaryWithLast {
Map<Integer, Integer> values = new HashMap<>();
Map<Integer, Long> whenAccessed = new HashMap<>();
TreeMap<Long, Integer> lastAccessed = new TreeMap<>();
public static void main(String args[]) {
DictionaryWithLast cache = new DictionaryWithLast();
cache.set(1, 1);
System.out.println("cache.get(2) = " + cache.get(2));
System.out.println("cache.get(1) = " + cache.get(1));
cache.set(2, 2);
cache.set(3, 3);
System.out.println("cache.get(1) = " + cache.get(1));
System.out.println("cache.last() = " + cache.last());
cache.delete(2);
System.out.println("cache.last() = " + cache.last());
cache.delete(1);
System.out.println("cache.last() = " + cache.last());
}
public void set(int key, int value) {
values.put(key, value);
long currentTime = System.nanoTime();
Long lastValue = whenAccessed.put(key, currentTime);
if(lastValue != null) {
lastAccessed.remove(lastValue);
}
lastAccessed.put(currentTime, key);
}
public int get(int key){
if(! values.containsKey(key)) {
return -1;
} else {
long time = whenAccessed.get(key);
lastAccessed.remove(time);
long currentTime = System.nanoTime();
lastAccessed.put(currentTime, key);
whenAccessed.put(key, currentTime);
}
return values.get(key);
}
public void delete(int key) {
Integer previous = values.remove(key);
if(previous != null) {
Long time = whenAccessed.get(key);
whenAccessed.remove(key);
lastAccessed.remove(time);
}
}
public int last(){
Map.Entry<Long, Integer> entry = lastAccessed.lastEntry();
return entry.getValue();
}
}
Output:
=======
cache.get(2) = -1
cache.get(1) = 1
cache.get(1) = 1
cache.last() = 1
cache.last() = 1
cache.last() = 3
Though the code to calculate a vertex's degree depends on the implementation of the Graph data structure, here is a simple method to compute the degree of a vertex.
We return TRUE if any of the vertices has a degree equal to N.
public static int degree(Graph G, int v)
{
int degree = 0;
for (int w : G.adj(v)) degree++;
return degree;
}
The logic is the same as suggested in other answers:
Use a max priority queue (an implementation of a heap) to identify which seat is farthest from the occupied seats.
Logic: Every time a call to find a seat is made, it performs 2 operations:
1. Get the max element from the heap; it represents the longest stretch between currently occupied seats. Return the middle seat of that stretch.
2. Add 2 more entries to the heap, representing the two stretches created by the seat just occupied by the caller.
Example:
If the occupied seats are currently 0 and 9:
the next call to findSeat gets [0,9] from the heap, divides it in the middle, and adds 2 entries to the heap representing [0,4] and [4,9].
Here is the code
import java.util.Comparator;
import java.util.PriorityQueue;
public class FarthestSeatOnBench {
private static class Seat {
int leftEdge, rightEdge;
Seat(int leftEdge, int rightEdge) {
this.leftEdge = leftEdge;
this.rightEdge = rightEdge;
}
public String toString() {
return "["+leftEdge +" " + rightEdge+"]";
}
}
public static void main(String[] args) {
int[] a = {0,0,0,0,0,0,0,0,0,0};
PriorityQueue<Seat> seatingSpace = new PriorityQueue<>(new SeatComparator());
for(int i = 0; i < 10; i ++) {
System.out.println("Next Seat Option: " + getSeat(a, seatingSpace));
}
}
private static int getSeat(int[] input, PriorityQueue<Seat> seatingSpace) {
if(input[0] == 0) {input[0] = 1; return 0;}
if(input[input.length-1] == 0) {
seatingSpace.add(new Seat(0, input.length -1));
input[input.length-1] = 1;
return input.length - 1;
}
//get the max Seat
Seat currentSeat = seatingSpace.poll();
seatingSpace.add(new Seat(currentSeat.leftEdge, (currentSeat.leftEdge + currentSeat.rightEdge) / 2));
seatingSpace.add(new Seat((currentSeat.leftEdge + currentSeat.rightEdge) / 2, currentSeat.rightEdge));
return (currentSeat.leftEdge + currentSeat.rightEdge) / 2;
}
static class SeatComparator implements Comparator<Seat> {
public int compare(Seat one, Seat two) {
return Integer.compare(two.rightEdge - two.leftEdge, one.rightEdge - one.leftEdge);
}
}
}
Output:
========
Next Seat Option: 0
Next Seat Option: 9
Next Seat Option: 4
Next Seat Option: 6
Next Seat Option: 2
Next Seat Option: 7
Next Seat Option: 3
Next Seat Option: 8
Next Seat Option: 5
Next Seat Option: 1
Time complexity:
1. Build Heap: O(N)
2. Return next seat: O(1) --> getMax operation
3. Add entries to heap: O(logN)
Your logic looks correct.
The only change needed is that the returned seat number be calculated as either
1. (end_seat + start_seat) / 2, or
2. start_seat + (end_seat - start_seat) / 2
Your logic of (end_seat - start_seat) / 2 will lead to a wrong seat index due to the missing offset.
Example: finding a seat between indexes 5 and 9 should return 7 (5 + (9-5)/2). Your logic will return 2 instead ((9-5)/2).
Also run time as per my understanding should be:
Forming heap: O(N)
Each Query (Get Max): O(1)
Insert: O(logN)
As suggested, Euclidean distance can be used to find the nearest point in a given plane.
The distance between points (a,b) and (x,y) is dist((x, y), (a, b)) = √((x - a)² + (y - b)²)
public class EuclideanDistance {
public static void main(String[] args) {
int[][] points = {{1,2}, {4,5}, {-1,3}, {0,1}};
nearestTo00(points);
}
private static void nearestTo00(int[][] points) {
//Euclidean distance: dist((x, y), (a, b)) = √((x - a)² + (y - b)²). Comparing squared distances would work too, since sqrt is monotonic.
double dist = Double.MAX_VALUE;
int x = 0, y = 0;
for(int[] point: points) {
double currentDist = Math.sqrt(Math.pow(0-point[0],2) + Math.pow(0-point[1],2));
if(currentDist < dist) {
dist = currentDist;
x = point[0];
y = point[1];
}
}
System.out.println("Nearest Number: {" + x + " " + y + "}");
}
}
Output:
======
Nearest Number: {0 1}
The number of combinations that can arise due to the size of the numbers is huge, so I think it's a good idea to ask the interviewer how large the numbers can be.
In the example below, I have handled numbers up to the thousands. (The logic remains the same for larger numbers as well.)
There are a couple of special cases that we need to handle:
1. Numbers that end in 0.
--> For such numbers, we don't call out the final 0. For example, 510 becomes "five hundred ten"; we don't say "five hundred ten zero".
2. Numbers with 1 in the tens position.
--> For such numbers we write **teen. For example, 513 becomes "five hundred thirteen"; we don't say "five hundred ten three".
3. Single-digit numbers.
Here is the code that handles these cases.
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;
public class NumberToEnglishPhrase {
static Map<String, String> tenToNinteen = new HashMap<>();
static Map<String, String> tens = new HashMap<>();
static Map<String, String> digit = new HashMap<>();
public static void main(String[] args) {
int[] n = {512,5123, 10, 9, 340};
populateData();
for(int x: n)
System.out.println(englishPhrase(x));
}
private static String englishPhrase(int n) {
//special handling for smaller than 10
if(n < 10) {
return digit.get(n+"");
}
//special handling for numbers with 1 in tenth position
Stack<String> stack = new Stack<>();
String s = n+"";
char[] digits = s.toCharArray();
if(digits.length > 1) {
if(digits[digits.length-2] == '1') {
String lastTwo = new String(digits[digits.length-2]+"") + new String(digits[digits.length-1]+"");
stack.push(tenToNinteen.get(lastTwo));
} else {
String last = digits[digits.length-1]+"";
if(! last.equals("0")) {
stack.push(digit.get(last));
}
String secondLast = digits[digits.length-2]+"";
int num = Integer.valueOf(secondLast)*10;
if(num != 0) { //guard: skip the tens word for numbers like 506
stack.push(tens.get( num + ""));
}
}
}
int x = 2;
for(int i = digits.length - 3; i >=0; i--) {
if(x == 2) stack.push("hundred and");
if(x == 3) stack.push("thousand");
if(digits[i] != '0') { //guard: skip zero digits so we never push null
stack.push(digit.get(digits[i]+""));
}
x++;
}
StringBuilder sb = new StringBuilder();
while(!stack.isEmpty())
sb.append(stack.pop() + " ");
return sb.toString();
}
private static void populateData() {
tenToNinteen.put("10", "ten");
tenToNinteen.put("11", "eleven");
tenToNinteen.put("12", "twelve");
tenToNinteen.put("13", "thrteen");
tenToNinteen.put("14", "forteen");
tenToNinteen.put("15", "fifteen");
tenToNinteen.put("16", "sixteen");
tenToNinteen.put("17", "seventeen");
tenToNinteen.put("18", "eighteen");
tenToNinteen.put("19", "nineteen");
tens.put("20", "twenty");
tens.put("30", "thirty");
tens.put("40", "forty");
tens.put("50", "fifty");
tens.put("60", "sixty");
tens.put("70", "seventy");
tens.put("80", "eighty");
tens.put("90", "ninety");
digit.put("1", "one");
digit.put("2", "two");
digit.put("3", "three");
digit.put("4", "four");
digit.put("5", "five");
digit.put("6", "six");
digit.put("7", "seven");
digit.put("8", "eight");
digit.put("9", "nine");
}
}
Output:
=======
five hundred and twelve
five thousand one hundred and twenty three
ten
nine
three hundred and forty
We can achieve this using 1 Map and 2 Sets.
Time Complexity O(N * K) --> N is the total number of contacts and K is the number of email IDs per contact.
Space Complexity O(N * K) --> the emailToContact map can end up holding every email.
Logic: Consider the input as a Map<Contact_ID, Set<Email_IDs>>.
1. Create an emailToContact map. This map keeps track of each email as key and its contactID as value.
2. Create uniqueContactSet and duplicateContactSet. These sets hold the contactIDs that are unique and duplicate respectively.
3. Loop through the input map and, for each contact, get its set of email IDs.
4. Add each email ID to the emailToContact map. If a key with that email is already present, this email is shared with some other contact.
5. If a duplicate email is found, populate duplicateContactSet with both contactIDs.
6. At the end, return the non-duplicate contacts only.
Here is the code
private static Set<String> uniqueContacts(Map<String, Set<String>> input) {
Map<String, String> emailToContact = new HashMap<>();
Set<String> duplicateSet = new HashSet<>();
Set<String> uniqueSet = new HashSet<>();
for(String contactId: input.keySet()) {
uniqueSet.add(contactId);
Set<String> emails = input.get(contactId);
for(String email: emails) {
if(emailToContact.containsKey(email)) {
duplicateSet.add(emailToContact.get(email));
duplicateSet.add(contactId);
}else {
emailToContact.put(email, contactId);
}
}
}
uniqueSet.removeAll(duplicateSet);
return uniqueSet;
}
ConcurrentHashMap has 2 primary properties:
1. It keeps a fixed number of buckets to store elements. For every put operation, the hash identifies which bucket the data goes to, and if that bucket is already occupied, a collision resolution technique like linear probing or separate chaining finds the next available bucket.
2. The "concurrent" property signifies that put and get operations are thread safe: they acquire a lock before writing or reading an element, which ensures the most recent write is always visible.
Below is one possible implementation of a ConcurrentHashMap. I am using linear probing for conflict resolution: if the hashed bucket is occupied, I keep searching linearly for the next empty bucket until the end of the bucket array.
Instead of locking a single object across all put/get calls, I lock at the bucket level using another object array of the same size as the bucket array.
public class ConcurrentHashTableUsingArray {
private static final int HASH_SIZE = 128; // number of buckets
HashEntry[] table;
Object[] locks;
ConcurrentHashTableUsingArray(){
table = new HashEntry[HASH_SIZE];
locks = new Object[HASH_SIZE];
for(int i = 0; i < HASH_SIZE; i++) locks[i] = new Object(); //locks must be initialized before we can synchronize on them
}
public void set(int key, int value) {
//Step1: get the hash based on the key
int hash = key % (HASH_SIZE) ;
if(table[hash] == null) { // bucket is empty. Data can be inserted
//Before setting the table with the key, lock the appropriate lock array element
synchronized(locks[hash]) {
System.out.println("set(): No Collision for key = " + key + " hash = " + hash);
table[hash] = new HashEntry(key, value);
locks[hash].notifyAll(); //notify all waiting threads and release the lock
}
}
//Else is the collision issue. In the linear probing, Keep checking next array locations
else {
boolean locationFound = false;
for(int i=hash+1; i<HASH_SIZE; i++) {
if(table[i] == null) {
synchronized(locks[i]) {
table[i] = new HashEntry(key, value);
locationFound = true;
locks[i].notifyAll();
}
}
}
if(!locationFound) {
//There was no space from the hash location to the end of the array. Check
// the begining of the array instead
for(int i=0; i<hash; i++) {
synchronized(locks[i]) {
table[i] = new HashEntry(key, value);
locationFound = true;
locks[i].notifyAll();
}
}
if(!locationFound) {
System.out.println("HashMap is full, we can not store any more data thowing HashMapOverFlowException");
}
}
}
}
public int get(int key) {
//Step1: Generate the Hash from the key
int hash = key % (HASH_SIZE) ;
if(table[hash] != null) {
//Check if this key matches the key stored in the object at this location
HashEntry entry = table[hash];
if(entry.getKey() == key) {
synchronized(locks[hash]) {
//We can again check here to see the key is still same
//correct entry found return this object
System.out.println("get(): No Collision for key = " + key + " hash = " + hash);
return entry.getValue();
}
}
/*
* else we have a collision and wrong value is placed in this location due to linear probing.
* So keep searching the array until we find the object with the matching key
*/
else {
System.out.println("get(): Collision detected for key = " + key + " hash = " + hash);
for(int i=hash+1; i<HASH_SIZE; i++) {
if(table[i] != null) {
entry = table[i];
if(entry.getKey() == key) {
System.out.println("get(): Linear probling key found at location = " + i + " for hash = " + hash);
//correct entry found return this object
return entry.getValue();
}
}
}
//We reached here that means, the key is not found from the hash location till the end of the array
for(int i=0; i<hash; i++) {
if(table[i] != null) {
entry = table[i];
if(entry.getKey() == key) {
synchronized(locks[i]) {
System.out.println("get(): Linear probling key found at location = " + i + " for hash = " + hash);
//correct entry found return this object
return entry.getValue();
}
}
}
}
//Now if we reached here that means the element was never added to the hashmap. So just return the lowest integer
return Integer.MIN_VALUE;
}
}
else {
//If the entry at that location is null that means there is no object in the array matching that key
return Integer.MIN_VALUE;
}
}
public static void main(String[] args) {
ConcurrentHashTableUsingArray map = new ConcurrentHashTableUsingArray();
map.set(5, 5);
System.out.println("map.get(5) = " + map.get(5));
map.set(128, 128);
System.out.println("map.get(128) = " + map.get(128));
map.set(0, 0);
System.out.println("map.get(0) = " + map.get(0));
}
}
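The HashEntry class referenced above is not shown in the answer; a minimal version it could rely on is:

class HashEntry {
    private final int key;
    private final int value;
    HashEntry(int key, int value) { this.key = key; this.value = value; }
    int getKey()   { return key; }
    int getValue() { return value; }
}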
Result
=========
set(): No Collision for key = 5 hash = 5
get(): No Collision for key = 5 hash = 5
map.get(5) = 5
set(): No Collision for key = 128 hash = 0
get(): No Collision for key = 128 hash = 0
map.get(128) = 128
get(): Collision detected for key = 0 hash = 0
get(): Linear probling key found at location = 1 for hash = 0
map.get(0) = 0
I think it's worth mentioning to the interviewer that, since prime numbers do not change, it's always a good idea to pre-calculate them using an algorithm like the Sieve of Eratosthenes and just use them whenever required.
For example, to get the first N prime numbers, just look them up in the pre-calculated list and take the first N from there.
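A minimal sketch of that Sieve of Eratosthenes pre-calculation (the upper bound would be chosen based on how many primes the service must serve):

import java.util.ArrayList;
import java.util.List;

public class PrimeSieve {
    public static List<Integer> primesUpTo(int limit) {
        boolean[] composite = new boolean[limit + 1];
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                primes.add(i);
                // Mark multiples starting at i*i; smaller ones were already marked
                for (long j = (long) i * i; j <= limit; j += i) composite[(int) j] = true;
            }
        }
        return primes;
    }
}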
Having said that, if the requirement is to calculate the first N prime numbers on the fly, here is simple code for that.
public class FirstNPrimeNumbers {
public static void main(String[] args) {
int n = 10;
firstNPrime(n);
}
private static void firstNPrime(int n) {
int x = 1;
int i = 3;
System.out.print(2 + " ");
while(x < n) {
boolean prime = true;
//Check if there exists any number that divides i. If yes, it's not a prime
for(int j = 2; j <= Math.sqrt(i); j++) {
if(i % j == 0) {
prime = false;
break;
}
}
if(prime) {
System.out.print(i + " ");
x++;
}
i++;
}
}
}
Output:
======
2 3 5 7 11 13 17 19 23 29
A max-heap or a deque would be a good solution if the requirement were to track only the single top trending product. Since we need to keep track of the top K products, we must track multiple top elements at the same time.
A sliding window is one of the solutions. I have explained the approach in another answer.
There are 2 parts to this question:
1. Keeping track of the top K products at all times.
2. Data comes as a stream, so we do not have access to all the data at once, and sorting everything is not possible.
Based on these requirements, a sliding window is a good option: we keep track of the top K products each time new data comes in from the stream.
Note: Trending of products could be on any basis like "quantity sold" or "user ratings" etc.
Idea: We use a sliding window to keep track of the top K products. Within the window, we keep the items sorted by their trend value.
Example: assuming a stream of products with "quantity sold" values arriving as [10,5,13,2,21,8,3,40,12,15,19], what are the top 3 products?
Sliding window = 3
Stream: 10 comes in
Window: [10]
Stream: 5 comes in
Window[10, 5]
Stream: 13 comes in
Window: [13, 10, 5]
Stream: 2 comes in
Window: [13, 10, 5] --> 2 is ignored as its not in Top K
etc.
For the sliding window we can use a TreeMap, which keeps the elements within the window sorted internally. To keep it a fixed size, we can have our own extension of TreeMap, as follows:
Time Complexity: O(N logK) for the whole stream
Note: Each insert into the TreeMap is O(logK), effectively constant, since regardless of the stream size the map only ever holds K items.
Space Complexity: O(K) where K is the size of the window.
import java.util.Collections;
import java.util.TreeMap;
public class TopKTrending {
public static void main(String[] args) {
int[] a = {10,5,13,2,21,8,3,40,12,15,19};
topKTrending(a, 3);
}
private static void topKTrending(int[] a, int k) {
FixedMap<Integer, String> fMap = new FixedMap<Integer, String>(k);
for(int i = 0; i < a.length; i++) {
fMap.put(a[i], "Prod"+i);
printMap(fMap);
}
}
private static void printMap(FixedMap<Integer, String> fMap) {
for(Integer key: fMap.keySet()) {
System.out.print("{" + fMap.get(key) + "|" + key + "}");
}
System.out.println();
}
}
/*Custom implementation of TreeMap with Fixed Size*/
class FixedMap<K,V> extends TreeMap<K,V> {
int size = 0;
FixedMap(int size) {
super(Collections.reverseOrder()); //descending order elements inside TM starting highest
this.size = size;
}
public V put(K key, V value) {
V old = super.put(key, value);
if (size() > size) {
super.remove(lastKey());
}
return old;
}
}
Output:
======
{Prod0|10}
{Prod0|10}{Prod1|5}
{Prod2|13}{Prod0|10}{Prod1|5}
{Prod2|13}{Prod0|10}{Prod1|5}
{Prod4|21}{Prod2|13}{Prod0|10}
{Prod4|21}{Prod2|13}{Prod0|10}
{Prod4|21}{Prod2|13}{Prod0|10}
{Prod7|40}{Prod4|21}{Prod2|13}
{Prod7|40}{Prod4|21}{Prod2|13}
{Prod7|40}{Prod4|21}{Prod9|15}
{Prod7|40}{Prod4|21}{Prod10|19}
Modulo distributes over multiplication:
(a * b) % n is equivalent to ((a % n) * (b % n)) % n
pow(a,b) basically multiplies a by itself b times. So instead of calculating pow(a,b) first, which would be a huge number since b is beyond the long range, we first calculate a % n and then multiply that value by itself b times, reducing mod n as we go.
Every time the running product risks going above the long range, we take the % again ... here is a simplified example for clarity
a = 5 b = 4 and n = 3
pow(5,4) % 3
which is the same as
= (5 * 5 * 5 * 5) % 3
= 625 % 3
= 1
assuming pow(5,4) goes above the range, we can also write pow(5,4) % 3 using the 5 % 3 approach like below
= (5 * 5 * 5 * 5) % 3
= (5%3 * 5%3 * 5%3 * 5%3) % 3
since 5%3 = 2, equation becomes
= (2 * 2 * 2 * 2) % 3
= (4 * 4) % 3 --> here, assuming 4 * 4 goes above the long range, we take the mod again
= (4 % 3 * 4 % 3) % 3
= (1 * 1) % 3
= 1
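Putting the property to work, here is a standard square-and-multiply sketch (it assumes n is small enough that a * a fits in a long):

public class ModPow {
    public static long modPow(long a, long b, long n) {
        long result = 1;
        a %= n;
        while (b > 0) {
            if ((b & 1) == 1) result = result * a % n; // include this factor of a
            a = a * a % n;                             // square the base
            b >>= 1;                                   // halve the exponent
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(modPow(5, 4, 3)); // 1, matching the worked example above
    }
}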
I assume information is intentionally held back in the requirements. So let's say we ask more questions and come up with the following requirements:
Requirements:
===========
1. The algorithm will identify people with similar interests based on the answers they provide to the set of questions.
Questions can be like: on a scale from 1 to 5, how important are outdoor sports to you? Or: write briefly about your favorite historical event.
2. The score between people would be their similarity index (cosine similarity or Jaccard index).
3. Rank matching would be a predictive matching score given to a new person when matched against existing profiles.
With the above requirements, we are essentially solving a "K nearest neighbors" (KNN) problem. It can be solved using an advanced data structure like a quadtree / k-d tree, or by using locality-sensitive hashing (LSH).
We can use LSH to compute a MinHash over the answers people provided. As a very simplistic example, below are the 3 historic events written by 3 people:
1. American war of 1775
2. World war II war
3. American independence war
Here, based on the Jaccard index (an intersection-over-union measure: we count the number of common elements in two sets/sentences and divide by the number of elements that belong to either set), persons 1 and 3 are nearer to each other in historical interest than 1 and 2, or 2 and 3.
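A tiny sketch of that computation over word sets (the tokenization here is naive whitespace splitting, just for illustration):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Jaccard {
    public static double index(String s1, String s2) {
        Set<String> a = new HashSet<>(Arrays.asList(s1.toLowerCase().split("\\s+")));
        Set<String> b = new HashSet<>(Arrays.asList(s2.toLowerCase().split("\\s+")));
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Persons 1 and 3 share more words than 1 and 2, as argued above
        System.out.println(index("American war of 1775", "American independence war")); // 2/5 = 0.4
        System.out.println(index("American war of 1775", "World war II war"));          // 1/6 ~ 0.17
    }
}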
(We would have to provide a much more detailed example in the actual interview discussion.)
The above index will be the answer to requirement #2.
For requirement #3, we keep the group of people on a 2D scale, and when an unknown person appears, we calculate their distance from the K nearest neighbors. For example, if we have 2 sets of people, one with a liking for sports and one with a liking for arts, then when a new user is plotted on the scale we compute their K nearest neighbors and, based on the result (say, close to 4 sports people and 1 arts person), decide whether this new user is more likely to be a good match with people from the sports category or the arts category.
There are 3 primary requirements as I see it
1. Implementing Cache - LRU / MRU etc
2. Multi-Level Cache
3. Read Operation: constant time O(1) and Write operation: linear time O(N)
An LRU cache can be quickly implemented using a LinkedHashMap, and a multi-level cache can be implemented using 2 levels of LRU cache (an LRU cache whose values are themselves LRU caches).
The idea will be as follows:
=======
A multi-level cache will be an LRU cache with the level id (10, 20, 30, etc.) as key, and the value will be another instance of an LRU cache. The value LRU cache stores the user-provided keys and values.
Every time an add operation on the cache comes in, we check which levels this key should go to. For example, say we have 3 levels: 10, 20 and 30. We take any key that comes from a user mod 30 and decide which levels the key-value pair goes to.
Multi-level Cache is set up as follows
<key, Value>
<10, LRU Cache> --> any values between 0 to 9
<20, LRU Cache> --> any values between 10 to 19
<30, LRU Cache> --> any values between 20 to 29
If the input comes as key = 29 and value = 29:
level = 29 % 30 = 29, so we add this key-value under primary cache keys 10, 20 and 30.
If the input comes as key = 35 and value = 35:
level = 35 % 30 = 5, so we add this key-value under primary cache key 10 only.
Read Operation:
---------------
Read operation will always read from the lowest bucket as all the keys will eventually go to the lowest level.
Time complexity: 2 map get operations (one to get the LRU cache for the lowest level, and a 2nd to get the actual value from that value cache), so it is 2 * O(1) = O(1)
Write operation:
----------------
At worst, it writes to all levels; based on the number of levels, this operation can be O(N)
Here are the code examples:
LRU Cache:
==========
This is a LRU cache implementation
public class LRUCache<K,V> extends LinkedHashMap<K,V> {
int size;
LRUCache(int capacity){
super(16, 0.75f, true); //true is for access order instead of insertion order
this.size = capacity;
}
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > size;
}
}
MultiLevel Cache:
=============
This is a multi level cache implementation which internally uses the above LRU cache.
import java.util.ArrayList;
import java.util.List;
public class MLCache {
static MLCache instance;
private static LRUCache<Integer, LRUCache<Integer, Integer>> multiCache = null;
private MLCache() {}
public static synchronized MLCache getInstance(){
if(instance == null) {
instance = new MLCache();
multiCache = new LRUCache<Integer, LRUCache<Integer, Integer>>(3);
multiCache.put(10, new LRUCache<Integer, Integer>(30)); //this size needs to be more so that all keys can go here without getting removed
multiCache.put(20, new LRUCache<Integer, Integer>(3));
multiCache.put(30, new LRUCache<Integer, Integer>(3));
}
return instance;
}
public int get(int key) {
return multiCache.get(10).get(key); //we need to make sure LRU size for the lowest cache is really large so that all keys can be stored there
}
public void add(int key, int value) {
List<Integer> levelList = getLevel(key);
for(Integer keyList: levelList) {
multiCache.get(keyList).put(key, value);
}
}
private static List<Integer> getLevel(int key) {
int modValue = key % 30;
List<Integer> a = new ArrayList<>();
if(modValue < 10) {a.add(10);}
if(modValue >= 10 && modValue < 20) {a.add(20); a.add(10);}
if(modValue >= 20 && modValue < 30) {a.add(30); a.add(20); a.add(10);}
return a;
}
}
Test Class:
=========
public class CacheTest {
public static void main(String[] args) {
MLCache cache = MLCache.getInstance();
cache.add(1,1);
cache.add(2,2);
cache.add(3,3);
System.out.println(cache.get(2));
cache.add(4,4);
cache.add(29,29);
cache.add(35,35);
System.out.println(cache.get(35));
}
}
I agree with adr's suggestion of using a priority queue, but since the interviewer's main focus is on optimizing space complexity, storing word information in both the PQ and a hashmap is something that can be improved further.
We can implement the priority queue using a TreeMap itself. This TreeMap accepts a TimeComparator to keep elements sorted as they are iterated and added. The key of the map is the Word and the value is its timestamp.
A couple of key points:
----------
1. The add method on the priority queue (implemented as a TreeMap) first looks up the element. If the element does not exist, or it exists but its timestamp is older than 10 seconds, we add the element. Else we ignore it.
How to remove elements from PQ:
----------
I see 2 ways to do it.
1. A separate thread can poll the first entry of the TreeMap and, if it is older than 10 sec, drop it (pollFirstEntry already removes it from the TreeMap).
2. Other map methods like contains, when called, can first check whether the key exists; if yes, check whether its timestamp is older than 10 seconds, and if so, remove the key and return false.
Time Complexity: O(logN) per operation (the TreeMap operations are logarithmic)
Space complexity: O(N)
Here are some key classes. I haven't shown the implementation of the iterator method that adds elements to the PQ.
PQ implementation:
=======
import java.util.Map;
import java.util.TreeMap;
public class MinPQ {
private static TreeMap<Word, Long> pq = new TreeMap<Word, Long>(new TimeComparator());
public static void add(Word w) {
Long oldTime = pq.get(w);
if(oldTime == null || System.nanoTime() - oldTime > 10_000_000_000L) //10s in nanos; must be a long literal to avoid int overflow
pq.put(w, w.getTime());
else
System.out.println("Word: " + w.getText() + " already exists. Ignoring ...");
}
public static boolean contains(Word w) {
Long oldTime = pq.get(w);
if(oldTime == null)
return false;
if(System.nanoTime() - oldTime > 10_000_000_000L) { //10s in nanos; must be a long literal to avoid int overflow
System.out.println("Time stamp old. Removing element: " + w.getText());
pq.remove(w);
return false;
}
return true;
}
public static Word top(){
Map.Entry<Word, Long> entry = pq.pollFirstEntry();
return entry.getKey();
}
}
Word and comparator implementations:
======
public class Word {
String text;
long time;
Word(String text, long time) { this.text = text; this.time = time; }
String getText() { return text; }
long getTime() { return time; }
//Note: the TreeMap compares keys via TimeComparator (equal text ==> same key);
//equals/hashCode are overridden for consistency with that notion of identity
@Override
public boolean equals(Object obj) {
Word other = (Word) obj;
return text.equals(other.text);
}
@Override
public int hashCode() { return text.hashCode(); }
}
import java.util.Comparator;
public class TimeComparator implements Comparator<Word> {
public int compare(Word w1, Word w2) {
if(w1.getText().equals(w2.getText())) return 0;
return Long.compare(w1.getTime(), w2.getTime());
}
}
Sample Test method:
================
public class TestCode {
public static void main(String[] args) {
MinPQ.add(new Word("word1", System.nanoTime()));
System.out.println("word1 exists = " + MinPQ.contains(new Word("word1", System.nanoTime())));
MinPQ.add(new Word("word3", System.nanoTime()));
System.out.println("word3 exists = " + MinPQ.contains(new Word("word3", System.nanoTime())));
try {
Thread.sleep(15_000); //wait 15 secs and the word is removed from the PQ
}catch(Exception e){}
System.out.println("word3 exists = " + MinPQ.contains(new Word("word3", System.nanoTime())));
}
}
Output:
======
word1 exists = true
word3 exists = true
Time stamp old. Removing element: word3
word3 exists = false
Note: Since this answer will be long, I have created a google doc instead of writing everything in the text box here.
docs.google.com/document/d/190Ik3yauub4spoSFRldclBuwjyXuk7RLyxf1GFpPk_U/edit?usp=sharing
I will approach this problem using the following steps:
Step 1: Requirements and assumptions
Step 2: Interface / API design
Step 3: Back of the envelope estimation (if any)
Step 4: Design data model (classes etc)
Step 5: High-level architecture
Step 6: Detailed architecture
Step 7: Performance improvements (remove bottlenecks)
The parts of the description that give a good indication of the design are:
1. Job_arrived() --> this can be an exposed web service/method that accepts a job
2. Job_run() --> this can be a separate job processor that reads a job (from somewhere) and starts processing; in this case, it calls the external black-box service
3. External black-box service --> the application needs a way to talk to external interfaces over REST/SOAP and load balancers
4. The system is to be large scale --> the system needs to be distributed in nature
5. More than one job can be accepted each second --> the system should be scalable, so the job receivers should preferably be separate from the job processors so that each can be scaled independently
So the overall design could be as follows (more details in the google doc):
==============
The application exposes a job-receiver web service. This service accepts the job information and puts the job on a queue (preferably a min priority queue). Job processors watch for jobs on the queue; as soon as a job appears, they pick it up from the head of the queue and call the external black-box system.
If the response comes back as a failure, they wait 2 seconds and put the job back on the priority queue. Each job maintains a failure count so that if it fails a second time, the processors do not re-queue it.
The reason for separating job receivers and processors is scalability: based on the load on the system (i.e., the job queue size), the number of processors can be scaled up or down.
Please refer to the Google doc for the rest of the design (link at the top). A minimal sketch of the receiver/processor split is below.
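For illustration only, here is a minimal single-process sketch of that split, with a BlockingQueue standing in for the distributed queue and a stub for the black-box call (all names are mine):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class JobPipeline {
    static class Job {
        final String id;
        final int failures;
        Job(String id, int failures) { this.id = id; this.failures = failures; }
    }

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();

    // Job receiver: accepts a job and enqueues it
    public void jobArrived(String id) throws InterruptedException {
        queue.put(new Job(id, 0));
    }

    // Job processor loop: take a job, call the black box, retry once on failure
    public void jobRun() throws InterruptedException {
        while (true) {
            Job job = queue.take();
            if (!callBlackBox(job) && job.failures == 0) {
                Thread.sleep(2000);            // wait 2 seconds before the single retry
                queue.put(new Job(job.id, 1)); // a second failure drops the job
            }
        }
    }

    private boolean callBlackBox(Job job) { return Math.random() > 0.1; } // stub for the external service
}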
Even though the delete operation for a singly or doubly linked list is O(1) once we have access to the current and previous pointers, finding the element to delete is still O(N). Therefore I don't think the overall delete operation is O(1) when finding that element for removal is necessary.
- Saurabh July 01, 2018
There are 3 restrictions in the question, and the 3 array-based solutions above fail to satisfy at least one of them:
1. The additional space complexity should be less than O(n)
--> This means we cannot use an array, as that would require O(n) space to hold all the elements.
2. The input is coming as a stream
--> This also rules out an array, since we don't have all the data at the same time. We could read each input and append it to an array, but that would violate restriction 1.
3. Time complexity is linear, O(n)
--> So we cannot run a loop inside a loop or go back and forth over the data.
Solution:
=======
Since the input is coming as a stream of integers, the logical solution is to build a binary search tree (BST) as the integers come in.
For each input, insert the integer into the BST.
If the integer is already present, you have found the repeating element, which is the answer.
If the integer is not present in the BST, add it.
Space complexity < O(N)
This approach would take less than the max number of elements in the input as in the worst case last element will match the existing element and we wont have to store that last element
Time complexity is O(N)
Worst-case time complexity for searching a number:
As pointed out in the comment, constructing the whole BST is O(nlogn). But the question asks for linear time complexity for searching a number, which is O(n). A sketch of this approach follows.
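A small sketch of this approach using TreeSet (a red-black BST), where add() doubles as the insert-and-detect step:

import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public class FirstRepeatInStream {
    public static int firstRepeat(Iterator<Integer> stream) {
        TreeSet<Integer> seen = new TreeSet<>();
        while (stream.hasNext()) {
            int x = stream.next();
            if (!seen.add(x)) return x; // insertion failed: x was seen before
        }
        return -1; // no repeat in the stream
    }

    public static void main(String[] args) {
        System.out.println(firstRepeat(List.of(3, 1, 4, 1, 5).iterator())); // 1
    }
}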
Shortest path, in almost all cases, is a breadth-first search (BFS) problem.
The idea is as follows:
1. Start from the source location.
2. Find all the possible locations Knight can move to from the given location.
3. As long as the possible locations are inside the chessboard, add all such locations to a queue (to process one after the other)
4. Perform steps 1 to 3 until queue is exhausted.
5. If destination location is found in Step 2, report the location and the path.
Note:
1. To find all possible locations a knight can jump to:
I have used the Pythagorean theorem: a knight's move forms a right triangle with one side of length 1 and the other of length 2, so the sum of the squares of the two sides always equals 5 (1^2 + 2^2 = 5), the square of the move's distance.
2. To trace back the shortest path:
The chessboard is built from Position objects consisting of X and Y coordinates and a depth variable that says how deep the location is from the source.
The X and Y stored at each cell are the coordinates of the location from which the jump was made (the parent).
Below is the complete code which gives the number of jumps required for Knight to reach the destination location and also the actual path.
import java.util.LinkedList;
import java.util.Queue;
import java.util.Stack;
public class KnightShortestPathToDest {
static final Pos[][] chessboard = new Pos[8][8];
static Queue<Pos> q = new LinkedList<Pos>();
public static void main(String[] args) {
//Populate the chessboard with position values as unreachable
populateChessBoard();
//Assume the position for simplicity. In real world, accept the values using Scanner.
Pos start = new Pos(0, 1, 0); // Position 0, 1 on the chessboard
Pos end = new Pos(4, 4, Integer.MAX_VALUE);
//Assign starting depth for the source as 0 (as this position is reachable in 0 moves)
chessboard[0][1] = new Pos(start.x, start.y, 0);
//Add start position to queue
q.add(start);
while (q.size() != 0) // While queue is not empty
{
Pos pos = q.poll(); //read and remove element from the queue
//If this position is same as the end position, you found the destination
if (end.equals(pos)) {
// We found the Position. Now trace back from this position to get the actual shortest path
Iterable<Pos> path = getShortestPath(start, end);
System.out.println("Minimum jumps required: " + pos.depth );
System.out.println("Actual Path");
System.out.println("("+pos.x + " " + pos.y+")");
for(Pos position: path) {
System.out.println("("+position.x + " " + position.y+")");
}
return;
}
else {
// perform BFS on this Pos if it is not already visited
bfs(pos, ++pos.depth);
}
}
//This code is reached when the queue is empty and we still did not find the location.
System.out.println("End position is not reachable for the knight");
}
//Breadth First Search
private static void bfs(Pos current, int depth) {
// Start from -2 to +2 range and start marking each location on the board
for (int i = -2; i <= 2; i++) {
for (int j = -2; j <= 2; j++) {
Pos next = new Pos(current.x + i, current.y + j, depth);
if(inRange(next.x, next.y)) {
//Skip if next location is same as the location you came from in previous run
if(current.equals(next)) continue;
if (isValid(current, next)) {
Pos position = chessboard[next.x][next.y] ;
/*
* Get the current position object at this location on chessboard.
* If this location was reachable with a costlier depth, this iteration has given a shorter way to reach
*/
if (position.depth > depth) {
chessboard[current.x + i][current.y + j] = new Pos(current.x, current.y, depth);
q.add(next);
}
}
}
}
}
}
private static boolean inRange(int x, int y) {
return 0 <= x && x < 8 && 0 <= y && y < 8;
}
/*Check if this is a valid jump or position for Knight based on its current location */
public static boolean isValid(Pos current, Pos next) {
// Use Pythagoras theorem to ensure that a move makes a right-angled triangle with sides of 1 and 2. 1-squared + 2-squared is 5.
int deltaR = next.x - current.x;
int deltaC = next.y - current.y;
return 5 == deltaR * deltaR + deltaC * deltaC;
}
/*Populate initial chessboard values*/
private static void populateChessBoard() {
for (int i = 0; i < chessboard.length; i++) {
for (int j = 0; j < chessboard[0].length; j++) {
chessboard[i][j] = new Pos(Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE);
}
}
}
/*Get the shortest Path and return iterable object */
private static Iterable<Pos> getShortestPath(Pos start, Pos end) {
Stack<Pos> path = new Stack<Pos>();
Pos current = chessboard[end.x][end.y];
while(! current.equals(start)) {
path.add(current);
current = chessboard[current.x][current.y];
}
path.add(new Pos(start.x, start.y, 0));
return path;
}
}
class Pos {
public int x;
public int y;
public int depth;
Pos(int x, int y, int depth) {
this.x = x;
this.y = y;
this.depth = depth;
}
//Note: this overloads equals for Pos (it does not override Object.equals)
public boolean equals(Pos that) {
return this.x == that.x && this.y == that.y;
}
public String toString() {
return "("+this.x + " " + this.y+ " " + this.depth +")";
}
}
The output of this program:
Minimum jumps required: 3
Actual Path
(4 4)
(2 5)
(1 3)
(0 1)
Time complexity O(V+E) and space complexity of O(V) for storing vertices on queue.
In a real-world scenario, a constructor would pre-generate the chessboard with all possible positions the knight can move to.
The time complexity would then be:
1. API call for the number of jumps: O(1)
2. Producing the actual shortest path: O(V)
Forgot to mention: the same logic can apply to IPv6 addresses.
- Saurabh March 18, 2021