Google Interview Question
Software Engineer / Developer

Area-wise breakup of the files may help here. For example, if all phone numbers starting with area code 408 are stored in the California files, check for those numbers only in the California files (which I presume would be far fewer than a billion, probably close to a million entries stored in 1000 files). Those files can then be checked against the next 3 digits of the phone number, which will probably narrow it down to 2-3 files.
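The area-code breakup above could be sketched as a two-level prefix index. This is only a toy in-memory illustration; the file IDs and bucket layout are assumptions, not anything from the thread:

```python
from collections import defaultdict

# Hypothetical two-level prefix index: area code (first 3 digits) ->
# next 3 digits -> set of file IDs containing numbers with that prefix.
index = defaultdict(lambda: defaultdict(set))

def add_number(number: str, file_id: int) -> None:
    """Record that a 10-digit number appears in the given file."""
    index[number[:3]][number[3:6]].add(file_id)

def candidate_files(number: str) -> set:
    """Return only the files that could contain the number."""
    return index[number[:3]][number[3:6]]

# Toy data standing in for the billion-file corpus.
add_number("4085551234", 17)   # a "California" (408) file
add_number("4089990000", 17)
add_number("2125551234", 42)   # a "New York" (212) file

print(candidate_files("4085551234"))  # → {17}: only file 17 needs a full scan
```

Equal prefixes always land in the same bucket, so a lookup touches a handful of files instead of all of them.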
Create a trie of all the words across the files that consist only of digits; this set should be pretty small. The leaf nodes contain the indexes of the files where that word occurs. Once the trie is built, a given phone number can be looked up in O(n), where n is the length of the phone number.
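A minimal sketch of that trie idea (the file IDs here are made up for illustration):

```python
class TrieNode:
    """One node per digit; terminal nodes hold IDs of files containing the number."""
    def __init__(self):
        self.children = {}    # digit -> TrieNode
        self.file_ids = set()

def insert(root: TrieNode, number: str, file_id: int) -> None:
    node = root
    for digit in number:
        node = node.children.setdefault(digit, TrieNode())
    node.file_ids.add(file_id)

def lookup(root: TrieNode, number: str) -> set:
    """O(n) in the length of the phone number."""
    node = root
    for digit in number:
        node = node.children.get(digit)
        if node is None:
            return set()
    return node.file_ids

root = TrieNode()
insert(root, "4085551234", 17)
insert(root, "4085551234", 99)
print(lookup(root, "4085551234"))  # → {17, 99}
```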
hahaha ..
The number of nodes in the tree would be small, but not the space occupied by the pointers. I don't think any data structure can make this fit into primary memory.
Say, 1000 records per file:
1 billion files = 10^9 files = 10^12 records
Node labels in the trie: the digits [0, 1, 2, ..., 9]
Pointers: around 10^6 on average
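A rough back-of-envelope check of that memory claim. The constants here (10-digit numbers, 8-byte pointers, a fully populated 10-ary trie as the worst case) are my assumptions, not figures from the thread:

```python
# Worst case: a complete 10-ary trie over all 10-digit phone numbers.
DIGITS = 10          # branching factor, and digits per number
POINTER_BYTES = 8    # assumed 64-bit pointers

# Total nodes below the root: 10 + 10^2 + ... + 10^10
nodes = sum(DIGITS ** level for level in range(1, DIGITS + 1))
bytes_for_pointers = nodes * DIGITS * POINTER_BYTES

print(f"{nodes:.2e} nodes")                    # ~1.11e10 nodes
print(f"{bytes_for_pointers / 2**30:.0f} GiB") # ~828 GiB for pointers alone
```

Real tries over real data are far sparser than this, but even a few percent of that worst case won't fit comfortably in primary memory, which supports the point above.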
My suggestion is:
1. Sort the files by type, e.g. media files vs. textual files (txt, doc, pdf, etc.), because audio and video files may also be among those billions of files.
2. Before the phone number is searched, we need to know one thing: the period of usage of that number. If the person got the number only a few months ago, then we can sort files by upload time and ignore the older files, as they cannot contain the phone number's details.
3. Then we can apply hashing or any other searching technique. This helps a lot in saving memory and time.
I hope you find this useful. Thank you.
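Step 3's hashing idea could be sketched like this: hash each phone number to one of a fixed set of bucket files, so a lookup only ever scans a single file. The bucket count and file-naming scheme are hypothetical:

```python
import hashlib

NUM_BUCKETS = 1000  # assumed bucket count, not from the thread

def bucket_for(number: str) -> str:
    """Map a phone number to the one bucket file that could contain it."""
    digest = hashlib.md5(number.encode()).hexdigest()
    return f"bucket_{int(digest, 16) % NUM_BUCKETS:04d}.txt"

# The same number always hashes to the same bucket, so both the
# initial write and the later lookup agree on which file to touch.
print(bucket_for("4085551234"))
```

With 10^12 records spread over 1000 buckets, each bucket still holds ~10^9 records, so in practice you'd want far more buckets or a second-level index inside each bucket.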
grep
- effectivec February 26, 2010