int findPathLength(String start, String end){
    // BFS from the start word: enqueue it at level 0
    Queue<ObjectLevel> q = new LinkedList<ObjectLevel>();
    q.add(new ObjectLevel(start, 0));
    // words we have already enqueued, so we never revisit them
    Set<String> seenWords = new HashSet<String>();
    seenWords.add(start);
    while (!q.isEmpty()){
        ObjectLevel a = q.poll();
        // collect the valid children (words one letter away from a.text)
        Set<ObjectLevel> s = new HashSet<ObjectLevel>();
        boolean isEnd = getValidChildren(a, end, s);
        if (isEnd) {
            return a.level + 1;
        }
        // enqueue only the children we have not seen before
        for (ObjectLevel child : s) {
            if (seenWords.add(child.text)) {
                q.add(child);
            }
        }
    }
    return -1; // no transformation sequence exists
}
boolean getValidChildren(ObjectLevel a, String end, Set<ObjectLevel> s){
    // Generate all valid one-letter variations of a.text, check each against
    // isValidWord, and add the valid ones to s with level a.level + 1
    …
    // check if the generated children contain the end word
    if (children.contains(end))
        return true;
    return false;
}
class ObjectLevel {
    String text;
    int level;
    ObjectLevel(String t, int l){
        this.text = t;
        this.level = l;
    }
}
Let's start with all possible child nodes for "cat" and build a tree out of them, maintaining a parent pointer for each node.
Keep track of the parent words that already exist in the tree and don't repeat them as children; maintain this set in a HashSet.
For example:
cat - {(bat, eat, fat, hat, mat, oat, pat, rat, sat, tat), (cot, cut), (cab, cad, cam, can, cap, car)}
Now, for each of these words, get all possible valid words we have not seen before and add them to the HashSet.
So you will be building a big tree whose nodes are all the unique 3-letter words reachable so far.
As soon as you see "dog", stop, trace the parent pointers back to the root, and count the number of levels. Alternatively, maintain the level in each node, as the code above does. A sketch of the child-generation step is below.
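As a rough sketch, the elided child-generation step could look like the following; generateChildren is an illustrative name and isValidWord is assumed to be a dictionary lookup that is not defined in the original answer:

Set<ObjectLevel> generateChildren(ObjectLevel a) {
    Set<ObjectLevel> children = new HashSet<ObjectLevel>();
    char[] chars = a.text.toCharArray();
    for (int pos = 0; pos < chars.length; pos++) {
        char original = chars[pos];
        for (char c = 'a'; c <= 'z'; c++) {
            if (c == original) continue;           // skip the unchanged word
            chars[pos] = c;
            String candidate = new String(chars);
            if (isValidWord(candidate)) {          // assumed dictionary lookup
                children.add(new ObjectLevel(candidate, a.level + 1));
            }
        }
        chars[pos] = original;                     // restore before the next position
    }
    return children;
}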
List<Integer> unionLists(List<Integer> a, List<Integer> b){
    List<Integer> c = new ArrayList<Integer>();
    int i = 0;
    int j = 0;
    int lena = a.size();
    int lenb = b.size();
    while (i < lena && j < lenb) {
        if (a.get(i) < b.get(j)) {
            c.add(a.get(i));
            i++;
        } else if (a.get(i) > b.get(j)) {
            c.add(b.get(j));
            j++;
        } else {
            // equal elements: add the value once and advance both lists
            c.add(a.get(i));
            i++;
            j++;
        }
    }
    // Now either a is done or b is done; copy the remainder of the other list
    while (i < lena) {
        c.add(a.get(i));
        i++;
    }
    while (j < lenb) {
        c.add(b.get(j));
        j++;
    }
    return c;
}
List<Integer> intersectList(List<Integer> a, List<Integer> b){
    List<Integer> c = new ArrayList<Integer>();
    int i = 0;
    int j = 0;
    int lena = a.size();
    int lenb = b.size();
    while (i < lena && j < lenb){
        // store common elements in c
        if (a.get(i).equals(b.get(j))) {
            c.add(a.get(i));
            i++;
            j++;
        } else if (a.get(i) < b.get(j)) {
            // advance whichever list has the smaller head, otherwise the loop never ends
            i++;
        } else {
            j++;
        }
    }
    return c;
}
Have a circular queue of 600 slots per symbol, so each slot stores the aggregate for one second of the last 10 minutes (or we can go much finer: 5000 * 60 * 10 = 3 million slots per symbol, roughly one per message).
Also keep, for each symbol, a small buffer that accumulates the incoming messages for the current second.
Every time a message comes for a particular symbol, we just keep adding up the prices and counting how many messages arrived in this second, and store both. So store a tuple:
(sum_price: sum(prices),
num_times: N)
At the end of each second, push this tuple into the circular queue for that symbol, overwriting the oldest bucket (the one from 10 minutes + 1 second ago).
When asked for the average price, just return the sum of all the sum_price values divided by the sum of all the num_times values. That gives the average price over the window.
If we want better accuracy, make the circular queue 3 million slots instead of 600. A sketch of the per-symbol structure follows.
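A minimal sketch of that per-symbol structure, assuming the 600-slot (10-minute) variant; the class and method names are illustrative only:

class SymbolWindow {
    static final int SLOTS = 600;              // last 10 minutes, one slot per second
    final double[] sumPrice = new double[SLOTS];
    final long[] numTimes = new long[SLOTS];
    double currentSum = 0;                     // accumulator for the current second
    long currentCount = 0;

    // called for every incoming price message of this symbol
    void onMessage(double price) {
        currentSum += price;
        currentCount++;
    }

    // called once per second by a timer
    void rollSecond(long epochSecond) {
        int slot = (int) (epochSecond % SLOTS); // overwrites the bucket from 10 minutes ago
        sumPrice[slot] = currentSum;
        numTimes[slot] = currentCount;
        currentSum = 0;
        currentCount = 0;
    }

    // average trade price over the last 10 minutes
    double averagePrice() {
        double sum = 0;
        long count = 0;
        for (int i = 0; i < SLOTS; i++) {
            sum += sumPrice[i];
            count += numTimes[i];
        }
        return count == 0 ? 0 : sum / count;
    }
}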
Design a Cassandra database for this.
Query: Select original_posts where shared_time > current_time - (1 hr/24hrs/5months) order by numshares desc
So design the database around this query.
CREATE TABLE posts_by_time_numshares (
    original_post_id uuid,
    original_post_title text,
    original_post_creator frozen<user>,
    shared_by frozen<user>,
    time_shared_at timestamp,
    numshares counter,
    PRIMARY KEY ((original_post_id), time_shared_at, numshares, shared_by)
) WITH CLUSTERING ORDER BY (time_shared_at DESC, numshares DESC, shared_by ASC);
So when a write happens it goes to the Cassandra nodes, let's say with RF=3. We can then read with CONSISTENCY QUORUM, so a majority of the nodes agree on the value of the numshares counter.
- dhirendra.sinha January 18, 2016
The web is like a graph for a domain.
A web crawler is essentially traversing that graph.
Now, usually when we traverse a graph, we mark a node visited (or create a HashSet and maintain the set of visited nodes).
In this case we can maintain a HashMap of url -> count and increment the count every time the URL is encountered, so when the traversal finishes we will have a map of counts.
At the end, just iterate through all the keys and find the URL with the maximum value, as sketched below.
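A rough sketch of that idea; fetchLinks is a hypothetical helper that downloads a page and returns the URLs it links to:

Map<String, Integer> crawlAndCount(String seedUrl) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    Queue<String> toVisit = new LinkedList<String>();
    Set<String> visited = new HashSet<String>();
    toVisit.add(seedUrl);
    while (!toVisit.isEmpty()) {
        String url = toVisit.poll();
        // count the URL every time it is encountered
        counts.put(url, counts.getOrDefault(url, 0) + 1);
        if (visited.add(url)) {                // crawl each page only once
            toVisit.addAll(fetchLinks(url));   // assumed helper: links on the page
        }
    }
    return counts;
}

// the most frequently encountered URL is the key with the maximum count
String mostFrequentUrl(Map<String, Integer> counts) {
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
        if (e.getValue() > bestCount) {
            best = e.getKey();
            bestCount = e.getValue();
        }
    }
    return best;
}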
boolean hasPalindrome(String str){
    // check null
    if (str == null)
        return false;
    str = str.toLowerCase();
    int len = str.length();
    if (len == 1)
        return true;
    // a substring str[i..j] can only be a palindrome if its two end characters
    // match, so only those candidates are tested
    for (int i = 0; i < len; i++) {
        for (int j = len - 1; j > i; j--) {
            if (str.charAt(i) == str.charAt(j)) {
                String subStr = str.substring(i, j + 1);
                if (isPalindrome(subStr))
                    return true;
            }
        }
    }
    return false;
}

boolean isPalindrome(String subStr){
    if (subStr == null) return false;
    String rev = new StringBuilder(subStr).reverse().toString();
    return subStr.equals(rev);
}
Assume the input comes as one file per page, each containing words.
Mapper code:
Each Mapper reads one file. Use a separator, let's say ($).
output: word, fileName$1
Reducer receives the input as:
word, list of values
values contain fileName$1
for example:
file1$1, file1$1, file2$1, file3$1..
Now we can just process this list and build a structure Map<String, Map<String, Integer>> (a sketch follows the example below),
which we can write to a text file in the format:
word -> file -> count
example:
abc -> file1 -> 2
abc -> file2 -> 1
abc -> file3 -> 1 ...
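A minimal reducer-side sketch of that aggregation in plain Java; the fileName$1 parsing is shown, while the surrounding Hadoop boilerplate (job setup, Writable types) is assumed:

// Builds file -> count for one word, from the reducer's input:
// the word (the key) and the list of "fileName$1" values emitted by the mappers.
Map<String, Integer> countPerFile(String word, List<String> values) {
    Map<String, Integer> fileCounts = new HashMap<String, Integer>();
    for (String value : values) {
        String fileName = value.substring(0, value.lastIndexOf('$'));
        int count = fileCounts.getOrDefault(fileName, 0);
        fileCounts.put(fileName, count + 1);   // each value contributes a count of 1
    }
    return fileCounts;                         // e.g. for "abc": {file1=2, file2=1, file3=1}
}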
Assuming the phone number is of the format:
+xxx-xxx-xxx-xxxx (13 digits). So theoretically there can be 10^13 numbers, which is 10 trillion possibilities; since we are given only 1 million numbers, the space will be very sparse.
Choose a hash function, something like: add up all the digits (call the sum "s"), take the last 5 digits of the number, and add "s" to that value.
Now, based on this hash function, we can map each number to a location in a 100,000-slot array.
When we have a collision, we can store the numbers in a linked list at that slot, which can keep growing with the load factor.
So search, insertion and deletion would be O(load factor). A sketch of the hash is below.
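A small sketch of the described hash, assuming a table of 100,000 buckets; the final modulo (to keep the index inside the array) is an added assumption, since the digit sum can push the value past the last bucket:

int hash(String phoneNumber) {
    // "s": the sum of all digits in the number
    int digitSum = 0;
    for (char ch : phoneNumber.toCharArray()) {
        if (Character.isDigit(ch)) {
            digitSum += ch - '0';
        }
    }
    // last 5 digits of the number
    String digitsOnly = phoneNumber.replaceAll("\\D", "");
    int lastFive = Integer.parseInt(digitsOnly.substring(digitsOnly.length() - 5));
    // add "s" and wrap into the 100,000-slot table (assumption)
    return (lastFive + digitSum) % 100000;
}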
Use a NoSQL solution (MongoDB kind).
1. Have two document stores, one called Posts and the other called Liker.
2. Posts would have post_id, post_title, post_desc, post_owner, likers as an array of objects, comments as an array of objects, and so on.
3. Liker would have liker_id, post_id, and so on.
4. Every time a post is posted, we create a document and store it in the Posts document store.
5. Every time a post is liked, we retrieve the post and add the liker to its list.
This is a distributed solution, so it may not be consistent right away, but it is eventually consistent, because availability is more important here.
Likes and posts need not be super fast and consistent all the time.
If there is one DB table with an auto-increment ID for each new URL, it will become a bottleneck when millions of URLs come in simultaneously.
We could have N tables, each with its own auto-increment ID, and partition the incoming requests, maybe on the number of characters or some other quick metric, mapping each request to one of the N tables.
These N tables would start their auto-increment sequences from different start numbers.
This way we can parallelize ID creation, as sketched below.
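An in-memory sketch of the idea (a real system would use N database sequences); stepping each shard's counter by N, rather than only using different start values, is an added assumption that keeps the ID ranges disjoint forever:

class ShardedIdGenerator {
    private final int numShards;
    private final long[] nextId;

    ShardedIdGenerator(int numShards) {
        this.numShards = numShards;
        this.nextId = new long[numShards];
        for (int k = 0; k < numShards; k++) {
            nextId[k] = k;                      // shard k starts at k
        }
    }

    long nextIdFor(String url) {
        // quick partitioning metric, e.g. a hash of the URL or its length
        int shard = (url.hashCode() & 0x7fffffff) % numShards;
        long id = nextId[shard];
        nextId[shard] += numShards;             // shard k issues k, k+N, k+2N, ...
        return id;
    }
}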
Have a MapReduce job:
1. Make the line number the key and the line the value, and split the input file into several smaller files.
2. The Mapper just swaps the key and value, so now we have the line as the key and the line number as the value.
3. The Reducer receives a line and a list of line numbers as values (the shuffle, sort, and combine steps bring all identical lines together).
4. The Reducer takes min(line numbers) and outputs the line and minLineNumber.
5. The output will contain all the distinct lines; merge it sorted by line number so the original order is maintained (see the sketch below).
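The same logic in a single-machine Java sketch; the real version would run steps 2-5 as a MapReduce job over file splits:

List<String> dedupePreservingOrder(List<String> lines) {
    // line -> smallest line number where it appears (what the reducer computes)
    Map<String, Integer> firstSeenAt = new HashMap<String, Integer>();
    for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
        firstSeenAt.putIfAbsent(lines.get(lineNo), lineNo);
    }
    // merge the reducer output back in line-number order (the final step)
    List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(firstSeenAt.entrySet());
    entries.sort(Map.Entry.comparingByValue());
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : entries) {
        result.add(entry.getKey());
    }
    return result;
}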
1. Construct a HashMap<String, HashSet<String>> (let's say productMap) and store all the Amazon products as keys.
2. As and when a new product is launched, add an entry to the above map.
3. When a customer buys a list of products:
- for each product (let's say p1) in that list, convert the rest of the products into a set (let's say s1)
- look up p1 in productMap and get the HashSet corresponding to it (let's say s2)
- merge s1 and s2 and write the result back to productMap
4. Now any time a customer shops for anything, we can just pull the corresponding HashSet of products from productMap and display it (see the sketch below).
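A minimal sketch of steps 3 and 4; the method names are illustrative only:

Map<String, Set<String>> productMap = new HashMap<String, Set<String>>();

// step 3: update the map whenever a customer buys a basket of products
void recordPurchase(List<String> basket) {
    for (String p1 : basket) {
        Set<String> s1 = new HashSet<String>(basket);
        s1.remove(p1);                                  // the rest of the basket
        Set<String> s2 = productMap.computeIfAbsent(p1, k -> new HashSet<String>());
        s2.addAll(s1);                                  // merge and write back
    }
}

// step 4: products to show alongside the product being bought or viewed
Set<String> alsoBoughtWith(String product) {
    return productMap.getOrDefault(product, Collections.emptySet());
}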
- dhirendra.sinha January 20, 2016