keshi
BAN USERA simple way to do this is to maintain a hashMap of Map<String,Set<String>> where the key is the band name and the set of strings is basically the number of employees who have requested this band. This way we can easily invalidate those bands whose value count is lesser than 2 as nobody else likes this band.
Now with this reduced subset, we can convert it to a graph problem where the verticies are the music bands who have more than 1 recommendations and the edges are the employees. Now this is reduced to a spanning tree problem which can be efficiently solved
Space wise we have just a hashmap with the number of entries equal to the number of bands in this case the worst case scenario would be the case where all the 10000 bands requested by each employee will be different in which case we'd have n * 10000 entries where n is the number of employees. On the flip side there would be no processing required here as the reference count for each value would just be one.
There are many efficient ways to solve the graph problem when we need to process the data.
Seems to be an ideal situation to use indexes that you can store in memory. Build indexes using B+ trees. Like each index points to blocks of 100 nodes in the linked list each. Save these indexes in memory. So your actual disk access would eventually be minimal for what you want to search. B+ trees are designed to save you time spent on sequential access. So if the indexing is done smartly you can even save the entire resultant block from the core dump file in memory and do the search which would take just one disk access.
- keshi July 05, 2012Couple of optimizations possible:
1. Compare only those strings of equal lengths. For anagrams the lengths have to be the same. (surprised why none of the above answers did not use this for their hash table approach and unnecessary sorting).
2. Summation of ascii values of the letters in the anagrams will be the same (since they are made up of the same letters).
Unless there is something else I am missing, there is no need for a hash table or sorting for this use case.
This can be optimized using a doubly linked list.
1. Traverse from head and tail at the same time from node to node and check if value is the same. (head ++ and tail --)
2. continue as long as head == tail or head > tail pointer.
3. If there was no change observed in the values as head and tail in each jump then its a palindrome, else its not.
This needs to test out all combinations of 2 numbers to be considered. A set can be used to do deduplication.
- keshi September 25, 2016