Google Interview Question
Software Engineer / Developer

Area-wise breakup of the files may help here. For example, if all phone numbers starting with area code 408 are stored in the California files, check for those numbers only in the California files (which I presume would be far fewer than a billion, probably close to a million entries stored in 1000 files). Those files can then be checked against the next 3 digits of the phone number, which will probably narrow it down to 2-3 files.
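The area-code breakup above could be sketched as a two-level prefix index. This is only a toy in-memory illustration; the file IDs and bucket layout are assumptions, not anything from the thread:

```python
from collections import defaultdict

# Hypothetical two-level prefix index: area code (first 3 digits) ->
# next 3 digits -> set of file IDs containing numbers with that prefix.
index = defaultdict(lambda: defaultdict(set))

def add_number(number: str, file_id: int) -> None:
    """Record that a 10-digit number appears in the given file."""
    index[number[:3]][number[3:6]].add(file_id)

def candidate_files(number: str) -> set:
    """Return only the files that could contain the number."""
    return index[number[:3]][number[3:6]]

# Toy data standing in for the billion-file corpus.
add_number("4085551234", 17)   # a "California" (408) file
add_number("4089990000", 17)
add_number("2125551234", 42)   # a "New York" (212) file

print(candidate_files("4085551234"))  # → {17}: only file 17 needs a full scan
```

Equal prefixes always land in the same bucket, so a lookup touches a handful of files instead of all of them.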
Create a trie of all the words across the files that consist only of digits; this set should be pretty small. The leaf nodes contain the indexes of the files where that word occurs. Once the trie is built, a given phone number can be looked up in O(n), where n is the length of the phone number.
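A minimal sketch of that trie idea (the file IDs here are made up for illustration):

```python
class TrieNode:
    """One node per digit; terminal nodes hold IDs of files containing the number."""
    def __init__(self):
        self.children = {}    # digit -> TrieNode
        self.file_ids = set()

def insert(root: TrieNode, number: str, file_id: int) -> None:
    node = root
    for digit in number:
        node = node.children.setdefault(digit, TrieNode())
    node.file_ids.add(file_id)

def lookup(root: TrieNode, number: str) -> set:
    """O(n) in the length of the phone number."""
    node = root
    for digit in number:
        node = node.children.get(digit)
        if node is None:
            return set()
    return node.file_ids

root = TrieNode()
insert(root, "4085551234", 17)
insert(root, "4085551234", 99)
print(lookup(root, "4085551234"))  # → {17, 99}
```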
hahaha ..
The number of nodes in the tree would be small, but not the space occupied by the pointers. I don't think any data structure can make this fit into primary memory.
Say, 1000 records per file:
1 billion files = 10^9 files = 10^12 records
Node labels in the trie: the digits [0, 1, 2, ..., 9]
Pointers: around 10^6 on average
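A rough back-of-envelope check of that memory claim. The constants here (10-digit numbers, 8-byte pointers, a fully populated 10-ary trie as the worst case) are my assumptions, not figures from the thread:

```python
# Worst case: a complete 10-ary trie over all 10-digit phone numbers.
DIGITS = 10          # branching factor, and digits per number
POINTER_BYTES = 8    # assumed 64-bit pointers

# Total nodes below the root: 10 + 10^2 + ... + 10^10
nodes = sum(DIGITS ** level for level in range(1, DIGITS + 1))
bytes_for_pointers = nodes * DIGITS * POINTER_BYTES

print(f"{nodes:.2e} nodes")                    # ~1.11e10 nodes
print(f"{bytes_for_pointers / 2**30:.0f} GiB") # ~828 GiB for pointers alone
```

Real tries over real data are far sparser than this, but even a few percent of that worst case won't fit comfortably in primary memory, which supports the point above.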
My suggestion is:
1. Sort the files by type, e.g. media files vs. textual files (txt, doc, pdf, etc.), because audio and video files may also be among those billions of files.
2. Before the phone number is searched, we need to know one thing: the period of usage of that number. If the person got the number only a few months ago, then we can sort files by upload time and ignore the older files, as they cannot contain the phone number's details.
3. Then we can apply hashing or any other searching technique. This helps a lot in saving memory and time.
I hope you find this useful. Thank you.
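Step 3's hashing idea could be sketched like this: hash each phone number to one of a fixed set of bucket files, so a lookup only ever scans a single file. The bucket count and file-naming scheme are hypothetical:

```python
import hashlib

NUM_BUCKETS = 1000  # assumed bucket count, not from the thread

def bucket_for(number: str) -> str:
    """Map a phone number to the one bucket file that could contain it."""
    digest = hashlib.md5(number.encode()).hexdigest()
    return f"bucket_{int(digest, 16) % NUM_BUCKETS:04d}.txt"

# The same number always hashes to the same bucket, so both the
# initial write and the later lookup agree on which file to touch.
print(bucket_for("4085551234"))
```

With 10^12 records spread over 1000 buckets, each bucket still holds ~10^9 records, so in practice you'd want far more buckets or a second-level index inside each bucket.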
grep
- effectivec February 26, 2010