How does trie handle scalabili

Google Interview Question for Software Engineer / Developers

-3

of 5 votes

17
Answers
How does trie handle scalability as opposed to hashtable? Assuming it is used for a dictionary. Sclability here should cover large size of input, running out of memory, or even running out of memory on multiple machines if distributed system is used.
- Guy February 06, 2014 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer System Design

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

You can easily distribute hash table into hundreds/thousands machines by assigning each machine a key range. Each machine will know what other machine is responsible for the give key. So any key can be looked up with at most two machine forwards.

You cannot easily distribute Trie, since there is no easy way of splitting data and looking up efficiently. If you want to specify range by first letter, then you have only 26 options (assuming only letters) and it doesn't handle cases when data fits only in thousands machines. You may have to use first 10 letters to locate a machine, then each machine has to keep this big machine table in memory which further reduces memory availability.

- AK February 09, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 votes

Cant agree with you on the distribution of tries, you can have like aa to machine 1, ab to machine 2 and so on. That gives you 26*26 machines and each can know abt the other just like hashmaps. (Ofcourse, the special case of the word 'a' etc that can be stored specifically in say the first machine aa)

- Anshul February 14, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 3 votes

With trie distribution of words across m/cs will be non-uniform if you split by prefixes.

For hashtables, the quality of your hash function determines the distribution of data.

- AlgChamp April 29, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

A trie stores only one 'instance' of a common prefix of several strings. Thus it saves memory that is consumed by 'common' prefixes of strings.

- Murali Mohan February 06, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

What do you mean by 'instance'? For instance, for input strings abc and abcde, common prefix a->b->c, this sequence of nodes will only store once, is that what you mean? But if the input size is extremely large, it will eventually run out of memory, won't it?

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Redirection can always be done across a network of nodes if the data set is expected to be large.

- Nil February 08, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Implicit in your question is the premise that a hash table somehow handles running out of memory. A trie handles this the same way a hash table does -- it doesn't.

- eugene.yarovoi February 06, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Then how should a trie handle those cases?

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 5 votes

@GZ: Let's review what it means to run out of memory. It means that we need additional space to store some data we need to store, but we just don't have that extra memory. We're out of physical resources.

Your question is like asking "I have a warehouse and I need to put another box in it, but the warehouse is completely full already. How can I handle this case?" You're physically out of space. Your options when you run out of RAM are analogous to the options you have here:

1. Get a bigger warehouse (buy new RAM) or annex a new section to your warehouse (add a stick of RAM). Then you can store your box (data) and possibly many more boxes.

2. Compress the boxes (data) already in the warehouse (memory), if they are compressible. Just like physical goods, not all data is very compressible. Doing this means you might spend extra time compressing and decompressing boxes (data) when putting them in and when retrieving them. Depending on how you compress, it might also make your warehouse less organized, making it harder and/or slower to locate items.

3. Discard some of the old items. This only works if you have something that can be thrown away. If there's a requirement to keep all the old stuff around, there's nothing you can do here.

4. Put whatever you can't fit into your warehouse (RAM) into secondary storage (hard disk). The secondary location may be far away, so placing goods there might take much longer than it would to place them in your warehouse. Retrieving items from secondary storage might also take longer for the same reason. Disk is very slow compared to RAM.

- eugene.yarovoi February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

When is your interview KevinLane aka Karin-Answer?

- Anonymous February 06, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-2

of 2 votes

what? last month.

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

You got asked so many questions? Congrats, you surely must have gotten the job.

- Anonymous February 07, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

lmao, he is just copy pasting glassdoor interview questions

- Anonymous February 07, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

@lmoa: You have missed the sarcasm in those comments completely ;-)

- Anonymous February 07, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Guy is a posting questions from all random places ... I see Guy's question every page at-least 8-10 of them from him.

Can't a GUY attend so many questions and still not get a job in google ;)

- PUdayakumar July 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

actually I am not so understand the question.
does this question is ask how to make trie tree become scalability if there has so much long string?
trie structure can handle huge string input with short length. but if the string length is large, which will make trie become buge,
so in this case, we can build sub trie on sub machine,

- suwei19870312 June 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

First, if the dictionary needs to support Range Query, like: "fetch all words starts from a*", the HashTable is not sufficient.
So, let's say the dictionary only needs to support Exist or Not.

Second,
Case 1: The index can be kept in main memory.
Trie: Average Time Cost: O(n), n is the average length of words. Space: O(N * n), N is the number of words.
HashTable: Average Time Cost: O(1). Space: O(N), N is the number of words.
(Both supports Parallel Query)

Case 2: The index cannot be kept in main memory, but can kept in 1 PC's Disc.
Then, Disc read is the major source of time cost.
Trie: O(n) - k, (k is the level of trie that can keep in memory)
HashTable: O(1)

Case 3: The index cannot be kept in main memory, but can kept in a distributed memory on K machines. (assume that there is 1 master machine)
Trie: O(n)
HashTable: O(1) + O(1)

Case 3: The index cannot be kept in main memory, but can kept in a distributed file system on K machines. (assume that there is 1 master machine)
Trie: O(n)
HashTable: O(1)

In summary, in all the cases, HashTable is more scalable than Trie, if only Exist() should be supported.

- xuyan.nus June 26, 2014 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews