Google Interview Question for Software Engineer / Developers






2 of 4 votes

The answer partly depends on whether they mean a classic hashtable implementation (like Hashtable / HashMap in Java) or something more sophisticated. In the end, 30 GB of memory is still quite big for a single machine/VM by today's standards.

So think about what's going on underneath:

- It has to read and write at an arbitrary position in some massive array.
- It has to grow if it fills up beyond some threshold; see 'load factor' in the Java implementation.
- In a garbage-collected language/implementation, all the objects stored in the hash table need to be inspected by the garbage collector.
This leads to the following problems:

- It is not clear that even today's operating systems deal well with allocating single chunks of memory in the tens of GB.
- For simplicity, say that half of the 30 GB is taken up by the table's internal array itself (not the key and value objects). That is a 15 GB array, so every time the table grows you need to allocate at least another 15 GB.
- Even if an array of tens of GB could be allocated, the OS would page out some of that memory. Since we are assuming a good hash function, accesses are spread across the whole array, which defeats page caching: there will be a lot of page faults.
- Let's say we don't use all the data evenly: some keys are used frequently and others are not. To illustrate, say each key-value entry is tiny -- 128 bytes -- and for simplicity that everything is stored in the hash table as values. Then 30 GB / 128 bytes = ~250M entries. Now say there are 25K commonly used keys (25K / 250M = 0.01%). With a good hash function, those keys are evenly scattered across the massive array, so even with a small page size -- say 4 KB -- the 25K entries * 128 bytes = ~3 MB of commonly used data cost us 25K entries * 4 KB = ~100 MB of memory that has to be kept paged in... a whopping ~3% efficiency! (The sketch after this list works through the same arithmetic.)
- In the Java world, practitioners don't recommend heap sizes larger than about 4-8 GB. Sure, there's stuff like Azul, but that only proves the point -- a typical garbage collector doesn't scale to these sizes very well.
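
A minimal sketch of the paging arithmetic from the fourth item above, in Java. All of the constants (128-byte entries, 4 KB pages, 25K hot keys) are illustrative assumptions carried over from that example, not anything given in the question:

{{{
// Back-of-the-envelope arithmetic for the paging argument above.
// All constants are illustrative assumptions, not part of the question.
public class PagingCost {
    public static void main(String[] args) {
        long tableBytes = 30L * 1024 * 1024 * 1024;  // 30 GB hash table
        long entryBytes = 128;                       // assumed size of one key+value
        long pageBytes  = 4 * 1024;                  // typical OS page size
        long hotKeys    = 25_000;                    // assumed "frequently used" keys

        long totalEntries  = tableBytes / entryBytes;          // ~250M entries
        double hotFraction = (double) hotKeys / totalEntries;  // ~0.01%

        // A good hash function scatters the hot keys across the array,
        // so in the worst case each hot entry sits on its own page.
        long hotDataBytes  = hotKeys * entryBytes;   // ~3 MB of useful data
        long residentBytes = hotKeys * pageBytes;    // ~100 MB kept paged in

        System.out.printf("entries=%d, hot fraction=%.4f%%%n",
                totalEntries, hotFraction * 100);
        System.out.printf("useful=%.1f MB, resident=%.1f MB, efficiency=%.1f%%%n",
                hotDataBytes / 1e6, residentBytes / 1e6,
                100.0 * hotDataBytes / residentBytes);
    }
}
}}}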
I agree with other posters that Google is looking for "distributed" as the answer. But I think that, at heart, a simple hashtable stops scaling beyond a certain point. Given the above:

- You would have to distribute if all the entries are accessed relatively evenly.
- If a few keys are accessed a majority of the time, using two maps (a small one for the most frequently used keys) can buy you a lot; see the sketch after this list.
- In the Java world, using specialized maps that store data off the heap can buy you performance too; see Peter Lawrey's work, for example.
- Even simply striping the underlying array of the hashtable into segments (as Java's ConcurrentHashMap does) can buy you major improvements when the table has to grow.
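
Here is a minimal sketch of the two-map idea. HotColdMap, HOT_CAPACITY and the naive promotion rule are all made up for illustration; a real implementation would use a proper cache policy (LRU/LFU) and likely an off-heap or sharded backing map:

{{{
import java.util.HashMap;
import java.util.Map;

// Sketch of the "two maps" idea: a small map for hot keys in front of the big map.
public class HotColdMap<K, V> {
    private static final int HOT_CAPACITY = 25_000;   // assumed hot working set size
    private final Map<K, V> hot  = new HashMap<>(HOT_CAPACITY * 2);
    private final Map<K, V> cold = new HashMap<>();   // stand-in for the huge backing map

    public V get(K key) {
        V value = hot.get(key);
        if (value != null) {
            return value;                              // served from the small, page-friendly map
        }
        value = cold.get(key);
        if (value != null && hot.size() < HOT_CAPACITY) {
            hot.put(key, value);                       // naive promotion; real code would evict with LRU/LFU
        }
        return value;
    }

    public void put(K key, V value) {
        if (hot.containsKey(key)) {
            hot.put(key, value);                       // keep the hot copy consistent
        }
        cold.put(key, value);
    }
}
}}}

The point is only that the hot map stays small enough (a few MB) to remain resident, so the cold map's page faults are paid far less often.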

- Anonymous May 22, 2012
1 of 3 votes

I think the question is asking what the possible problems are, not for a solution...

- Tracy February 08, 2011
1 of 1 vote

Your hash table cannot go beyond 3 GB on a 32-bit system; you will get an out-of-memory (OOM) error. Normally 2 GB of the 4 GB address space is given to user space (the other 2 GB is for kernel address space), but you can change the split to give 3 GB to user space, leaving 1 GB for the kernel.

On 64-bit you can calculate accordingly.
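
Since the thread is mostly about Java, a quick way to see the limit a JVM actually applies on your system (the -Xmx value in the usage line is just an example):

{{{
// Prints how much heap the JVM is willing to use; on a 32-bit JVM this
// will be well under the 2-3 GB user address space discussed above.
public class HeapLimit {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();  // -Xmx, or a platform default
        System.out.printf("max heap: %.2f GB%n", maxBytes / (1024.0 * 1024 * 1024));
    }
}
}}}

For example, java -Xmx2g HeapLimit reports roughly 2 GB.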

- Anonymous February 12, 2011
0 of 0 votes

Why specifically 30 GB?
Maybe we need external hashing?

- GekkoGordan February 03, 2011
0 of 2 votes

Hash collisions might be one problem. And if the table grows to 30 GB it has to be stored on disk, and frequent disk reads are inefficient when the data is laid out on disk as a hash table.

- neo February 04, 2011
0 of 0 votes

A 30 GB hash table in memory is very bad, so in one way the cost of swapping blocks in and out will dominate. So I think you need to suggest an idea like a page table with multiple levels.
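
One way to read the multi-level suggestion is a directory of smaller sub-maps, analogous in spirit to a multi-level page table. This is just an interpretation; TwoLevelMap and LEVEL_BITS are made up for illustration:

{{{
import java.util.HashMap;
import java.util.Map;

// A small top-level directory indexes many smaller sub-maps, so each sub-map
// stays small enough to grow (or be paged out) independently of the others.
public class TwoLevelMap<K, V> {
    private static final int LEVEL_BITS = 8;            // 256 sub-maps

    private final Map<K, V>[] directory;

    @SuppressWarnings("unchecked")
    public TwoLevelMap() {
        directory = (Map<K, V>[]) new Map[1 << LEVEL_BITS];
        for (int i = 0; i < directory.length; i++) {
            directory[i] = new HashMap<>();
        }
    }

    private Map<K, V> subMap(Object key) {
        return directory[key.hashCode() >>> (32 - LEVEL_BITS)];  // top bits pick the sub-map
    }

    public V get(K key)           { return subMap(key).get(key); }
    public void put(K key, V val) { subMap(key).put(key, val); }
}
}}}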

- Anonymous February 06, 2011
0 of 0 votes

And you might need to say how many levels as well.

- Anonymous February 06, 2011
0 of 0 votes

Why not use a B-tree then?
You only load part of the tree into memory while searching; the rest is kept on disk. You can also index segments of the tree using a hash, which makes the search faster.

- ronin February 06, 2011
0 of 0 votes

Simple: it throws an OutOfMemoryError... You need to use a disk cache explicitly and implement the hash set yourself (extending the original HashSet).

- Lohith Ravi March 26, 2011
0 of 0 votes

-> If the hash table is this big, then we are definitely talking about disk usage here, so the overall performance of key lookups will suffer.
-> If more than 50% of the hash table is empty, then something is wrong with the hash function.
(Ignore this; it was mentioned as part of the problem.)
-> We need to choose a proper file format to store it, so that performance can be improved.

- Anonymous February 07, 2012
0 of 0 votes

I think one problem might be hash table re-allocation. If we want to increase the number of slots because we reached some threshold, we have to copy all the data into a newly allocated table and re-hash every entry, which is time-consuming.

We don't have this problem with a tree, for example.
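
One mitigation, if the expected size is known up front, is to pre-size the map so the internal array is allocated once and never rehashed. The entry count below is an illustrative assumption (roughly 30 GB / 128 bytes per entry):

{{{
import java.util.HashMap;
import java.util.Map;

// Pre-sizing avoids the repeated "allocate a bigger array and move everything" step.
public class PresizedMap {
    public static void main(String[] args) {
        long expectedEntries = 250_000_000L;   // assumed entry count
        float loadFactor = 0.75f;              // HashMap's default load factor
        int initialCapacity = (int) Math.min(Integer.MAX_VALUE,
                (long) Math.ceil(expectedEntries / loadFactor));

        // Allocated once, never resized -- though note that HashMap caps its
        // capacity at 2^30 buckets, which is itself another scaling limit here.
        // (Running this for real needs a heap of several GB, e.g. -Xmx8g.)
        Map<Long, byte[]> table = new HashMap<>(initialCapacity, loadFactor);
        table.put(42L, new byte[120]);
    }
}
}}}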

- Grindaizer June 25, 2013

