um01
0 of 0 votes
Design a system to:
- um01 in United States
- The system will consume data points streamed in from multiple streams.
- There can be multiple streams (~100).
- Each stream can have multiple data points per second, making it a big data use case. (Scalability required.)
- The data points should be stored in a database for data analysis and searching. (Storage DB consideration.)
- What indexing technique would you use to support searching and analysis?
- All data points have timestamps and interesting fields.
Design a system to ingest the incoming data streams in real time while also making the data available for searching/analysis after storage.
Software Engineer System Design - 4 of 4 votes
You are given 2 documents. We want to know how similar they are through N-grams.
Given an input n (n = number of words in the N-gram) and two documents (strings), find the intersection of the N-grams of the two documents.
E.g. doc1 = 'This is a dog'
doc2 = 'This is a cat'
n = 3
N-grams for doc1 = 'This is a', 'is a dog'
N-grams for doc2 = 'This is a', 'is a cat'
Output: 'This is a'
Find an efficient way and give its complexity.
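One hedged sketch of a hashing approach (the class and method names are my own): build the set of N-grams of each document in a single pass over its words, then intersect the two sets. This is O(W1 + W2) expected set operations, where W is the word count of a document; each gram additionally costs O(n) characters to build and hash.

```java
import java.util.*;

public class NGramIntersection {
    // Builds the set of n-word grams of a document.
    static Set<String> ngrams(String doc, int n) {
        String[] words = doc.split("\\s+");
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= words.length; i++) {
            // join n consecutive words into one gram
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    // Intersection via hashing: O(W1 + W2) expected time.
    public static Set<String> common(String doc1, String doc2, int n) {
        Set<String> g1 = ngrams(doc1, n);
        g1.retainAll(ngrams(doc2, n));
        return g1;
    }
}
```

For the example above, `common("This is a dog", "This is a cat", 3)` yields the single gram `This is a`.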
Google Software Engineer - 7 of 7 votes
You are given a list of words. Find if two words can be joined together to form a palindrome. E.g. consider the list {bat, tab, cat}. Then bat and tab can be joined together to form a palindrome.
Expecting an O(nk) solution, where n = number of words and k is the word length.
There can be multiple pairs.
Google SDE1 Algorithm - 3 of 3 votes
Given a string, find the first non-repeating character in a single pass of the string.
Assume a big character set like UTF-8 (eliminating the use of char[256]).
Avoid any subloop, to get a very optimal solution.
Yahoo SDE1 Algorithm - 1 of 1 vote
This is a design question. It starts with a small data size and then moves to bigger data sizes.
Design an application to retrieve the value for a key as fast as possible. The data given to you doesn't change.
You are given a data file of 1 GB. It's key-value data, with keys being byte[]. What would your application design be?
Now assume the data set is 1 TB or greater. How would you change your application design? (Distributed is a possibility.)
Now assume the keys are very small. What are your optimizations?
LiveRamp Senior Software Development Engineer System Design
This can be accomplished using 2 stacks + DFS + a visited flag array. The idea is that as you go down the graph you build the memberOf groups, and as you trace back up you build the members groups for the group you are traversing.
a. Iterate over the hashmap.
b. For each entry do the following.
The invariant is: for groups in the parent stack you have already computed their memberOf groups. When you pop groups from the parent stack you will be going back up and computing their members groups.
1. Initialize 2 stacks, a parent stack and a child stack. Initialize the parent stack with G4 and the child stack with its children, i.e. G1. The memberOfs of parent G4 are none, so there is nothing to do here.
2. When you pop an element from the child stack, create a new MyGroups object. Get the parent of the child (G4; maintain 2 hashmaps, for the parent-to-child and child-to-parent mappings). Mark the child as visited and add its users to the MyGroups object's user list.
3. Add the parent (G4) to the memberOf of the child (G1). Add the memberOfs of the parent (G4) to the memberOf of the child as well. Push this child onto the parent stack.
4. Update the parent-child mapping and the child-parent mapping.
5. Get the children of G1 from the input dictionary and push them onto the child stack.
6. Continue steps 2-5 till the child stack is empty.
7. The child stack is now empty.
8. Pop an element from the parent stack. Note the leaves of the graph will be at the top now.
9. Get its children from the parent-child mapping we built while working on the child stack.
10. Add each child to the members of this group's MyGroups object. Get the child's members list and add it to this group's MyGroups object as well.
Note: if it is a leaf then there won't be any entry in the parent-child mapping, unless it is part of a cycle.
11. Continue popping elements from the parent stack and repeat steps 9-10 till the parent stack is empty.
c. Once we have executed step b for all entries in the input dictionary, return our built list of MyGroups.
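A minimal sketch of the same closure computation, with two simplifications that are my own: recursion replaces the explicit parent/child stacks (the call stack plays the parent-stack role, and the unwind fills `members` just as step b describes), and groups are plain strings with no user lists. It assumes the group graph is acyclic; the names `children`, `members`, and `memberOf` are assumptions, not the interviewer's API.

```java
import java.util.*;

public class GroupClosure {
    // children: group -> its direct child groups (the input dictionary).
    // After closure(): members.get(g) = all direct and indirect children of g,
    // and memberOf.get(c) = all direct and indirect parents of c.
    public static void closure(Map<String, List<String>> children,
                               Map<String, Set<String>> members,
                               Map<String, Set<String>> memberOf) {
        for (String g : children.keySet()) dfs(g, children, members, memberOf);
    }

    private static Set<String> dfs(String g, Map<String, List<String>> children,
                                   Map<String, Set<String>> members,
                                   Map<String, Set<String>> memberOf) {
        if (members.containsKey(g)) return members.get(g); // visited flag
        Set<String> mine = new HashSet<>();
        members.put(g, mine); // mark visited before recursing
        for (String c : children.getOrDefault(g, Collections.emptyList())) {
            mine.add(c);
            mine.addAll(dfs(c, children, members, memberOf)); // unwind builds members
        }
        // going back up: every collected descendant gains g as an ancestor
        for (String c : mine) {
            memberOf.computeIfAbsent(c, k -> new HashSet<>()).add(g);
        }
        return mine;
    }
}
```

With the G4 -> G1 -> {G2, G3} example above, `members.get("G4")` comes out as `{G1, G2, G3}` and `memberOf.get("G2")` as `{G1, G4}`.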
Were there any restrictions on memory complexity? If not, this looks like a good candidate for hashing.
public class Anagram {
	HashMap<Character, Integer> ana;

	public Anagram(char[] word) {
		ana = new HashMap<Character, Integer>();
		for (char c : word) {
			// populate HashMap with char frequencies
			ana.put(c, ana.getOrDefault(c, 0) + 1);
		}
	}

	public int hashCode() {
		// a hashcode for the Anagram object based on the char frequencies;
		// equals() must be overridden consistently with it
		return ana.hashCode();
	}
}
1. Create an empty HashSet<Anagram> anagrams.
2. Iterate over each word in the file. (You can read line by line and split on spaces for simplicity.)
3. Create an Anagram object for the word.
4. Check if the word is already in the anagrams HashSet; if it is, don't add it, else add it.
(This check is just for clarity of explanation. You can simply add it to the anagrams HashSet;
the HashSet will keep only unique entries based on our hashCode() for the Anagram object.)
5. At the end of the file, the size of the HashSet anagrams is the number of distinct anagram classes in the file.
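A runnable sketch of steps 1-5. As an assumption of mine, it uses a sorted-string canonical form instead of the frequency-map hashCode() above: two words are anagrams iff their sorted characters are equal, which lets a plain HashSet<String> do the dedup without having to keep equals() and hashCode() consistent by hand.

```java
import java.util.*;

public class AnagramCount {
    // Canonical form of a word: its characters in sorted order.
    static String canonical(String word) {
        char[] cs = word.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // Steps 1-5: one pass over the words, HashSet keeps unique classes.
    public static int countAnagramClasses(List<String> words) {
        Set<String> classes = new HashSet<>();
        for (String w : words) classes.add(canonical(w));
        return classes.size();
    }
}
```

E.g. for {listen, silent, enlist, cat} the count is 2 (one class for the three anagrams, one for "cat").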
When we create a hashtable of info -> Node, the info could be the same phone number belonging to two different entries, or email ids belonging to two different entries.
The previous hashtable entry will be overwritten. In that case, how will DFS reach that node if we are using a hashtable to store the graph?
You can extend the hashmap idea. Preprocess the input to create a HashMap<id(manager), ArrayList<id(subordinate)>>.
To print all subordinates of a manager you will first go through all direct subordinates of the manager, i.e. the ArrayList, mark them visited, and then treat them as managers and go through their subordinates. It's similar to BFS in a graph.
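The BFS just described can be sketched as follows (class and method names are mine; `reports` is the preprocessed manager-to-subordinates map):

```java
import java.util.*;

public class OrgChart {
    // Returns every direct and indirect subordinate of a manager,
    // level by level, via BFS over the manager -> subordinates map.
    public static List<String> allSubordinates(Map<String, List<String>> reports,
                                               String manager) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(manager);
        visited.add(manager);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String sub : reports.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(sub)) {   // add() returns false if already visited
                    result.add(sub);
                    queue.add(sub);       // the subordinate becomes a "manager" next
                }
            }
        }
        return result;
    }
}
```

The visited set guards against an id reachable through two managers being printed twice.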
If we use X.pop in step 3 or use heap-maxify, then it will introduce O(log k) extra complexity for each element we insert into X.
If k in the worst case is n, it's going to be more than O(n), i.e. O(n log n).
Am I understanding this right?
Also, to use median of medians or randomized quickselect we would need to partition and partially sort elements in the new array, which doesn't feel right when the interviewer says don't modify the original heap.
Hi JSDUDE. Can you please post your solution?
I believe a more optimized way than what you suggested might be a SWAR algorithm. Check out these posts:
http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer
http://aggregate.ee.engr.uky.edu/MAGIC/#Population%20Count%20%28Ones%20Count%29
Also, as KP mentioned, if we can use some extra memory, a lookup table caching the counts for 8-bit words can be precomputed.
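Both ideas in Java (the SWAR constants follow the classic bit-trick from the pages linked above; the table variant is the extra-memory option KP mentioned):

```java
public class PopCount {
    // SWAR ("SIMD within a register") population count for a 32-bit int.
    public static int swar(int v) {
        v = v - ((v >>> 1) & 0x55555555);                 // count bits in pairs
        v = (v & 0x33333333) + ((v >>> 2) & 0x33333333);  // sums per nibble
        v = (v + (v >>> 4)) & 0x0F0F0F0F;                 // sums per byte
        return (v * 0x01010101) >>> 24;                   // add the four byte sums
    }

    // Lookup-table variant: precompute counts for all 8-bit words once.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 1; i < 256; i++) TABLE[i] = TABLE[i >> 1] + (i & 1);
    }

    public static int byTable(int v) {
        return TABLE[v & 0xFF] + TABLE[(v >>> 8) & 0xFF]
             + TABLE[(v >>> 16) & 0xFF] + TABLE[(v >>> 24) & 0xFF];
    }
}
```

Both run in a constant number of operations per word; in practice `Integer.bitCount` in the JDK uses the same SWAR sequence.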
We can expand on aleksey.khishin's solution.
The complete algorithm steps would be:
1. Add each word to a trie.
2. Each time we pick a word, reverse it and search for it in the trie, character by character.
3a. If the reversed word is fully matched but we still haven't reached a sentinel, e.g. the input word is "h" and we already have "hellolle" in the trie: here we match "h" and then have "ellolle" to go. In such a situation we list all the words from "e" onwards, which is just "ellolle" in this case, and check whether they are palindromes. If so, we have the desired pair ("hellolle" + "h" = "hellolleh").
3b. If a part of the word matches and we reach a sentinel in the trie, but a suffix of the word still remains, e.g. "leh" is in the trie and we have "hellol" as a new word from the input: then "leh" read in reverse matches the prefix "hel", but "lol" still remains. Here we just check whether the remaining portion is a palindrome. If so, we have the desired pair ("hellol" + "leh" = "hellolleh").
Keep doing this till we reach the end of the list of input words.
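A compact sketch of the same two-case analysis, with one substitution of mine: a HashSet of words stands in for the trie (the trie avoids re-hashing shared prefixes, but the cases are identical: split each word, require the reverse of one side to be another word, and require the other side to be a palindrome).

```java
import java.util.*;

public class PalindromePairs {
    static boolean isPal(String s, int i, int j) {
        while (i < j) if (s.charAt(i++) != s.charAt(j--)) return false;
        return true;
    }

    // Returns ordered pairs (a, b) from the list such that a + b is a palindrome.
    public static List<String[]> findPairs(List<String> words) {
        Set<String> set = new HashSet<>(words);
        List<String[]> res = new ArrayList<>();
        for (String w : words) {
            for (int cut = 0; cut <= w.length(); cut++) {
                String left = w.substring(0, cut), right = w.substring(cut);
                // Case a: left is a palindrome and reverse(right) is a word
                //         -> reverse(right) + w is a palindrome
                if (isPal(w, 0, cut - 1)) {
                    String rev = new StringBuilder(right).reverse().toString();
                    if (!rev.equals(w) && set.contains(rev)) res.add(new String[]{rev, w});
                }
                // Case b: right is a palindrome and reverse(left) is a word
                //         -> w + reverse(left) is a palindrome
                if (cut < w.length() && isPal(w, cut, w.length() - 1)) {
                    String rev = new StringBuilder(left).reverse().toString();
                    if (!rev.equals(w) && set.contains(rev)) res.add(new String[]{w, rev});
                }
            }
        }
        return res;
    }
}
```

For {bat, tab, cat} this reports the two ordered pairs tab+bat and bat+tab. Note the splitting makes this O(n·k²) rather than the asked-for O(nk); the trie walk amortizes the inner character comparisons.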
I came up with the following solution:
It's single pass and uses optimal data structures to avoid subloops.
The idea is:
Maintain a HashMap<Character, Integer> which stores how many times a character has appeared as we traverse.
The LinkedHashSet maintains a linked list of characters in the order they have appeared.
If any character re-appears (its count in the HashMap > 1) while scanning, remove it from the LinkedHashSet, which is an O(1) operation.
The motivation for using it is that the final lookup operation to get the first non-repeating character is O(1).
If the character appeared for the first time, add it to the LinkedHashSet (the data structure will add it to the end of the underlying linked list) and add it to the HashMap with count 1.
Thus this LinkedHashSet stores only the characters which have appeared once, in the order they appear in the string.
At the end of the scan of the string, print the first character from the LinkedHashSet, as it is the element which has appeared only once and also appeared first. Thus it is the first non-repeating character.
public static void findFirstNonRepeating() {
	String s = "abbaad";
	LinkedHashSet<Character> lhs = new LinkedHashSet<Character>();
	HashMap<Character, Integer> hm = new HashMap<Character, Integer>();
	for (char c : s.toCharArray()) {
		if (hm.get(c) != null) {
			hm.put(c, hm.get(c) + 1);
			// repeated character: O(1) removal
			lhs.remove(c);
		} else {
			// first occurrence: O(1) insertion
			hm.put(c, 1);
			lhs.add(c);
		}
	}
	// O(1) lookup: the first element of the LinkedHashSet
	Iterator<Character> itr = lhs.iterator();
	if (itr.hasNext()) {
		System.out.println(itr.next());
	}
}
Hi scorpion king, thank you for your response. This is a good solution.
If I understand correctly, this is 2 scans: one for your main for loop over the length of the string, the other is the ArrayList remove operation
temp.remove((Character) str.charAt(i));
Removing an element from an ArrayList is linear time.
The interviewer (as mentioned in the question) wanted a single-scan solution, without any subloops.
Hi Sk,
Thank you for your response. This is a good solution.
But the interviewer wanted me to optimize more. You have a loop where we advance the pointer to the index with value 0. I had proposed something similar and he wanted me to remove that loop as well.
I ended up suggesting using a LinkedList as a queue instead of the secondary array (I also maintained a hashmap) and inserting unique chars at the end as and when they come. He wanted the removal (setting the ptr pointer in your solution) to be O(1), so I suggested maintaining a reference to the node and using a doubly linked list.
I feel this isn't a clean solution but I could not think of anything better.
package numbers;

import java.util.ArrayList;

public class FibPairLSD {

	public static void main(String[] args) {
		System.out.println(dpGenFib(8));
	}

	public static int dpGenFib(int n) {
		if (n <= 0)
			throw new IllegalArgumentException("n must be positive");
		// size at least 2 so the base-case writes below are safe for n = 1
		int[] fibNums = new int[Math.max(n, 2)];
		fibNums[0] = 1;
		fibNums[1] = 1;
		if (n <= 2)
			return fibNums[n - 1];
		Pairs p = new Pairs(1, 1);
		ArrayList<Pairs> list = new ArrayList<Pairs>();
		list.add(p);
		for (int i = 2; i < n; i++) {
			fibNums[i] = fibNums[i - 1] + fibNums[i - 2];
			// keep only the last significant digit of each neighbor
			int s = fibNums[i] % 10;
			int f = fibNums[i - 1] % 10;
			p = new Pairs(f, s);
			list.add(p);
		}
		System.out.println(list.toString());
		return fibNums[n - 1];
	}

	public static class Pairs {
		int f;
		int s;

		public Pairs(int f, int s) {
			this.f = f;
			this.s = s;
		}

		public String toString() {
			return "(f:" + this.f + " s:" + this.s + ")";
		}
	}
}
You are using extra space with 3 String references (lowers, space, uppers) and many String instances.
- um01 August 14, 2015