Write a Program for Dictionary

Amazon Interview Question for Software Engineer / Developers

5

of 9 votes

11
Answers
Write a Program for Dictionary which has functionality of lookup and insert . This program should be able to add words on the fly

I wrote simple code using HashTable

follow up

1) Now we are getting too many words what happens
me: Hashtable will dynamically resize resulting into performance hit . Also they might get hashed to same location as well as we might run out of main memory

2) Okay you are out of main memory , How will you scale this program
me: I will create buckets of HashTable lets say 26 buckets for one for each alphabet and would put them on different machines

3) Lets say you are out of memory on those machines too
me: Okay I need to put them on secondary storage . Here we can have fileSystem or Database . I chose database . I will create simple DB schema of BucketNumber and word .

I will use buckets on main memory as cache , if we are not able to find a word in the bucket then query databse with bucket number and words then remove the least number times looked up word (every time we lookup a word we increament the count i.e value in key,value pair on hashtable) from that bucket and add this word .

I mentioned that bottleneck in this case will be every time a word is not present we need to query DB which usually has high latency which will result into performance hit

4) Lets say we are okay with latency but what if we are getting inserting words between that are only between only in two buckets ex. words starting from a and b only.

Now that I think about it, is it better to do this in a trie? What do you guys think?
- Guy February 05, 2014 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer System Design

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 9 vote

Trie is better from HashTable with respect to space complexity. If you have to words with common prefix e.g abc abcde you still have to create to separate keys in HT whereas in trie the words share common prefix part.
But if access time is the most important factor the HT is better. So I think you should have explained both approaches and compare their pros and cons.

- thelineofcode February 05, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How does Trie solve the follow-up issues? Like how does it handle sclability and memory issues? Also, regarding Trie's space complexity, for words share common prefix like abc and abcde, even though they share common prefix in trie, doesn't it still store two values for both abc and abcde?

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Tries requires a lot more memory because of indexing arrays in every node. It is not so if a trie is really dense and every node has close to the alphabet links. In that case we just use all redundant space.

- Anonymous February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Well, each trie node just keeps an array of size 26 (one for each letter) or a hashtable<Character, TrieNode> whose size is 26 or less. I don't think it will require any additional space? In the case of hashtable implement, for input abc, abcd, abcde, abcdef, abcdefg, the common prefix abc is stored five times, whereas Trie will only store abc once.

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

You can use a function like crc32("abc") which will give you an integer value for your word.
You may chose a server to store or lookup your word as crc32("abc") % (number of your servers)

However, there is a problem with this approach. If you add new servers to your app, you have to rearrange your words by servers again...

- darik June 11, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Store dictionary using B+ tree in a file (same as B+ tree index built in Database). Each page of a file will represent a node of a B+ tree. This will have high fan out of the Tree (for example if page size of a file is 4 KB and if we use 32 bit number to represent page number( pointer ) each node will have 4KB/ 4 B = 1024 fan out ). and hence very low depth of the tree.

B+ tree can be preferred over HashTable when size of hashtable is larger than RAM size. If size of hashtable is larger than RAM size it will increase threshing due to random access pattern of hashtable.
In this case B+ tree gives benefit of locality of reference by storing nodes which are at lower level in RAM (Cache)

- jigs August 30, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I believe it would be solve by Trie search + B+ tree. In your load balancer you can write an algo while search or insert that if some string appears start from a or b then which machine it should go. Now on particular machine, we can use B+ tree to store and retrieve words.

- Phoenix September 01, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I think he questioned about memory since you are storing everything in the hash table. Here is the implementation with using B-tree.

public class Dictionary
{
    public Node head = new Node();
        
    private class Node{
        
        public Node parent;
        public char index;
        public String definition;
        public HashMap<Character, Node> children;
        
        public Node(){
            parent = null;
            index = '\u0000';
            definition = null;
            children = new HashMap<>();
        }
        
        public Node(Node parent, char index, String definition){
            this.parent = parent;
            this.index = index;
            this.definition = definition;
            children = new HashMap<>();
        }
        
        public Node(Node parent, char index){
            this.parent = parent;
            this.index = index;
            this.definition = null;
            children = new HashMap<>();

        }
        
        private boolean hasDefinition(){
            if(definition != null)
                return true;
            else
                return false;
        }        
    }
    
    public void insert(String word, String definition){

        Node currentNode = head;

        for(int i = 0; i < word.length(); i++){
            char index = word.charAt(i);
            if(currentNode.children.containsKey(index)){
                currentNode = currentNode.children.get(index);     
            }
            else{
                Node newChild = new Node(currentNode, index); 
                currentNode.children.put(index, newChild);
                currentNode = newChild;                
            }
        }
        currentNode.definition = definition;        
    }
    
    public String lookup(String word){
        
        Node currentNode = head;
        String returnDef = "";
        
        for(int i = 0; i < word.length(); i++){
            char index = word.charAt(i);
            if(currentNode.children.containsKey(index)){
                currentNode = currentNode.children.get(index);
            }
            else{
                return "";
            }
        }
        if(currentNode.hasDefinition()){
            returnDef = currentNode.definition;    
        }
        return returnDef;
    }
}

- HurricaneCheetah February 08, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-2

of 2 vote

You could use MD5 function to generate string for each input and use the resulting string as key in your HashMap, for example:
- md5("abc") = "900150983cd24fb0d6963f7d28e17f72"
- md5("abcd") = "e2fc714c4727ee9395f324cd2e7f331f"

you would have less probability of key clashes on this case.

- guilhebl February 06, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How would this help? Could you please explain? Thank.

- Guy February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 2 votes

Since the interviewer sated on 2) you are out of memory, I believe Distributed HashTable would solve the issue, you would use a Hash function of the key String and with the resulting String you could index a range of machines that could answer the request, using consistent hashing would avoid the bucket collision issues pointed above, and you would be able to scale to several machines.

- guilhebl February 06, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

which bucket collision issues were you referring to? Even thought we use Distributed Hashtable, it would be still possible that all machines run out of memory.

- Guy February 06, 2014 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews