In gmail, while composing an e

Google Interview Question for Software Engineer / Developers

0

of 0 votes

18
Answers
In gmail, while composing an email, upon adding a contact, related contacts are displayed. How would you implement that feature?

- Write an algorithm for that.

- What data structure would you use to store the weights?

- In what format would you persist this data?
- pqrabcd November 14, 2014 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer

Email me when people comment.

An error occurred in subscribing you.

Country: United States

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

Better formatting

Root	N1	N2	N3	N4
Root		1	1	1	1	1
N1		1	0	0	0	0
N2		1	0	0	0	0
N3		1	0	0	0	0
N4		1	0	0	0	0

Sends email to N1; Sends email to N3 (2 different emails)
		Root	N1	N2	N3	N4
Root		1	2	1	2	1
N1		2	0	0	0	0
N2		1	0	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2 (same email)

		Root	N1	N2	N3	N4
Root		1	3	2	2	1
N1		3	0	1	0	0
N2		2	1	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2, N3

	 	Root	N1	N2	N3	N4
Root		1	4	3	3	1
N1		4	0	2	1	0
N2		3	2	0	1	0
N3		3	1	1	0	0
N4		1	0	0	0	0

- Vijay November 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How do we persist this data to disk?

- pqrabcd November 15, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If there are 10000 contacts, we need to store 10^8 values. Is there a better way?

- pqrabcd November 15, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

we can use sparse matrix to contain the data.

- kaili.zkl November 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

We can either use a sparce matrix or use adjacency list instead of adjacency matrix.

- Vijay November 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

What about a Trie as the data structure ?
We can display all the nodes reachable from a traversed ancestor.

- Anonymous November 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Can you elaborate?

- pqrabcd November 15, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

When we compose a new mail in gmail, it simply displays the emails with matching prefixes. For this purpose trie is a good solution. i too think that was the intention of the question

- Vib December 28, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Here's a simple approach. For each email sent, we note the recipients. Suppose the recipients are A,B,C. We then consider all the pairs among them (order matters). In this case the pairs are (A, B) (B, A) (A, C) (C, A) (B, C) and (C, B). We then use a hashtable (call it ht) to record these pairs. For pair (A, B), we first set ht[A][B] to 0 if this value doesn't already exist. Then we increment ht[A][B] by one. Similarly for all the other pairs.

Now suppose the user is composing an email to recipients A & B. We then examine all the values in ht[A] and ht[B]. Each value will basically be a (recipient, count) pair such as (C, 3) or (D, 5). We then combine pairs with the same recipients, like (C, 3) and (C, 2) into (C, 5). Finally, we sort these pairs by the count and simply suggest say the top 5 recipients.

This is quite inefficient and also treats an email with recipients A, B, C the same as 3 emails with recipients A & B and A & C and B & C, so it discounts the significance of A, B, C occurring together in that email.

- Sunny November 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Weighted connected graph.

Suppose there are n emails, the graph will have (n+1) nodes (+1 is for the sender's email). Initially all the edges have weight 0 except the ones connected to the root (which is the sender's email).

First we need to "train" the data (like in machine learning) or in simple words build the graph. For the first few emails, we keep incrementing the weights and after the training is done we will be suggesting the emails to add.

For example: Suppose we have 4 emails then the connected graph will have 5 nodes with the following weights

Root	N1	N2	N3	N4
Root		1	1	1	1	1
N1		1	0	0	0	0
N2		1	0	0	0	0
N3		1	0	0	0	0
N4		1	0	0	0	0

Sends email to N1; Sends email to N3 (2 different emails)
	  Root	N1	N2	N3	N4
Root		1	2	1	2	1
N1		2	0	0	0	0
N2		1	0	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2 (same email)

	  Root	N1	N2	N3	N4
Root		1	3	2	2	1
N1		3	0	1	0	0
N2		2	1	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2, N3

	 	Root	N1	N2	N3	N4
Root		1	4	3	3	1
N1		4	0	2	1	0
N2		3	2	0	1	0
N3		3	1	1	0	0
N4		1	0	0	0	0

Coming to predicting/training. When the sender wants to send email to N1. You will see what N1 is connected to and suggest the highest weighted Node (in this case N2) and then N3 until there are no more nodes. And again you will add weights to the corresponding nodes.

- Vijay November 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

won't there be n^2 nodes. in which every node is connected to every other node. unless we represent it as a matrix.

- adi November 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

My approach would like this: Each E-mail is represented as a node in a graph. When a new E-mail is encountered, it is added to the graph. After an E-mail is submitted to be sent, all of the recipients addresses are connected to each other in the graph via edges.

It is a good idea to assign a weight to these edges - let's say for each iteration of the above process the edges are strengthened by 1. Every time an E-mail is sent to a node (recipient), we will also check which connected nodes were NOT E-mailed, and decrement the strength of those edges by say 0.5 - this prevents situations where we keep recommending E-mails to A + B when the user rarely E-mails A+B together. We should also have a maximum limit to the edge weight, say 3.

Now, for the situations where we have A+B+C and we want to remember and suggest this association - when the user enters A+B, we can see that A,C are connected and B,C are connected. For each connection of every recipient, put the associated nodes in a bucket (say, a hash map) and aggregate the weights of the edges (in this case C gets stronger since it occurs twice). Now we iterate over the map and suggest the nodes with the highest value.

- RG November 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think the simplies solution for this is a dictionary of dictionaries. We need to quickly find relations for a given key (email), so dictionary with key by email. We might have selveral relations and we need to store weight for each of them, so another dictionary.
I've created a small sample in C#:

public class RelatedContacts
    {
        private readonly Dictionary<string, Dictionary<string, int>> _storage;

        public RelatedContacts (IEnumerable<string> emailRecipients)
	    {
            _storage = new Dictionary<string,Dictionary<string,int>>(StringComparer.InvariantCultureIgnoreCase);

            foreach (var recipients in emailRecipients)
            {
                var emails = recipients.Split(new[] { ';', ',' }, StringSplitOptions.RemoveEmptyEntries);
                if (emails.Length < 2) continue;

                for (var i = 0; i < emails.Length; i++)
                {
                    for (var j = i + 1; j < emails.Length; j++)
                    {
                        relate(_storage, emails[i].Trim(), emails[j].Trim());
                        relate(_storage, emails[j].Trim(), emails[i].Trim());
                    }
                }
            }
	    }

        private static void relate(Dictionary<string, Dictionary<string, int>> relations, string emailA, string emailB)
        {
            Dictionary<string, int> contacts;
            if (!relations.TryGetValue(emailA, out contacts)) contacts = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
            int weight = 0;
            contacts.TryGetValue(emailB, out weight);
            contacts[emailB] = weight + 1;
            relations[emailA] = contacts;
        }

        public IEnumerable<string> Get(string email)
        {
            Dictionary<string, int> contacts;
            if (!_storage.TryGetValue(email, out contacts)) return Enumerable.Empty<string>();
            return contacts.OrderBy((_) => _.Value).Select(_ => _.Key);
        }

        public IEnumerable<string> Get(IEnumerable<string> emails)
        {
            Dictionary<string, int> result = new Dictionary<string, int>();
            Dictionary<string, int> contacts;
            foreach (var email in emails)
            {
                if (!_storage.TryGetValue(email, out contacts)) continue;
                foreach (var pair in contacts)
                {
                    int weight = 0;
                    result.TryGetValue(pair.Key, out weight);
                    result[pair.Key] = weight + pair.Value;
                }
            }

            return result.OrderBy((_) => _.Value).Select(_ => _.Key);
        }
    }

- des.shulepov November 17, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Ok, probably it's too much code in my solution without explanation.
So, it's a directed weighted graph, represented as sparse matrix where vertex is email and for each vertex we store a list of adjacent vertexes with weights.
Data may be persisted as 2 files or tables:
1. list of vertexes: (email, int key) pairs
2. one to many relation of int vertex key to pair of another vertex key and weight

Some improvements might be suggested to my initial solution:
1. If the total number of distinct emails is big it's better to create a help dictionary to associate email with int key and use it to represent vertex in a graph (i.e. dictionary in the code above). Because it's cheaper to store int than pointer to string.
2. If average number of related contacts is not big, we could use a list to store adjacent vertexes. This trade off will be slower on adding related contacts, but will consume less memory.
3. Initial solution is done as immutable class, but it's not hard to add method for adding relations.

- des.shulepov November 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

HashMap<String,Histogram> contact2contacts.

Histograms interface:
void countContact(String contact);
List<String> getContacts() // in the order from higher count

On every send and receive email, for each pair of contacts in the same email, get Histogram form HashMap for first and count the second and vice versa.

- CT November 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think for each person, the contact list will not so large, so I think brute string match can handle this problem.

- suwei19870312 November 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

import java.util.*;
public class TrieNode {
	public static final int TOTALLINKS=26;
	public char Char;
	public HashMap<Integer,TrieNode> links;
	public List<String> texts;
}

import java.util.*;
public class Trie {
	TrieNode head;
	public Trie()
	{
		if ( head == null ) {
			head = constructNode(' ');
		}
	}
	public TrieNode constructNode(char character)
	{
		TrieNode aNode = new TrieNode();
		aNode.Char = character;
		aNode.links = new HashMap<>();
		aNode.texts = new ArrayList<>();
		return aNode;
	}
	public void dumpString(String name)	{
		String[] names = name.split(" ");
		for(String aname : names){
			injectIntoTrie(aname.toLowerCase(),name.toLowerCase());
		}
	}
	public void injectIntoTrie(String element,String text)	{
		TrieNode start = head;
		for(int i=0;i<element.length();i++){
			char character = element.charAt(i);
			if(start.links.get(character-'a') == null){
				System.out.println("Cannot find char - creating new " + character);
				start.links.put(character - 'a',constructNode(character));
			}
			else
			{
				System.out.println("Found char - moving on " + character);
			}
			start = start.links.get(character - 'a');
		}
		start.texts.add(text);
		System.out.println(String.format("%s added to %c",text,start.Char));
	}
	public void findTextSimilar(String prefix) {
		TrieNode start = head;
		for(int i=0;i<prefix.length();i++){
			char character = prefix.toLowerCase().charAt(i);
			start = start.links.get(character - 'a');
			if ( start == null ){
				break;
			}			
			System.out.println("Found link "+ start.Char);
		}
		for( String text : start.texts){
			System.out.println("Prefix String " + text);
		}
		traverseNode(start);
	}
	public void traverseNode(TrieNode start) {
		for(Integer i : start.links.keySet() ){
			System.out.println(String.format("%c",'a'+i));
			traverseNode(start.links.get(i));
		}
		for(String text : start.texts){
			System.out.println("Prefix String " + text);
		}
	}
	public static void main(String[] args)
	{
		Trie t = new Trie();
		t.dumpString("Naveen Venkatesana");
		t.dumpString("Naveeen Benkatesan");
		t.dumpString("Naven");
		t.dumpString("Venkatesan");
		t.findTextSimilar("ben");
	}
}

- Deepan Prabhu Babu November 24, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Not sure it's the best solution for storage, but what you could do is have a table of the form EmailID, From, To <-- this is only one person.

So say A sends first email to P1, P2 and P3,

then you store three records:
1, A, P1
1, A, P2
1, A, P3

as A sends more emails:

1, A, P1
1, A, P2
1, A, P3
2,A,P1
2,A,P3
3,A,P2
3,A,P4
3,A,P3
4,A,P2
4,A,P3

It would then be easy to get all the weights for each edge.

As for the Data Structure, I'd probably use an Adjacency List, since using a Matrix could be a lot more expensive memory-wise (suppose a 1000 contact list, you'd have 1M fields).

- Some Guy December 04, 2014 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews