Google Interview Question
Software DevelopersCountry: United States
Interview Type: Phone Interview
public static void main(String[] args) {
Trie root = new Trie("");
add(root, "abc.pqr.google.com");
add(root, "pqr.google.com");
add(root, "pqr.google.net");
add(root, "yahoo.com");
add(root, "abc.yahoo.com");
display(root, "");
}
public static void display(Trie node, String str) {
if (node == null)
return;
if (node.next.isEmpty())
return;
List<Trie> next = node.next;
for (Trie trie : next) {
display(trie, str + "." +trie.word);
System.out.println(str + "." +trie.word + " - " + trie.count);
}
}
public static void add(Trie root, String str) {
String[] arr = str.split("\\.");
int n = arr.length;
int k = 0;
while (k < n) {
Trie node = root;
for (int i = k; i < n; i++) {
Trie t = null;
for (Trie tr : node.next){
if (tr.word.equals(arr[i])) {
t = tr;
break;
}
}
if (t == null) {
t = new Trie(arr[i]);
node.next.add(t);
}
t.count += 1;
node = t;
}
k++;
}
}
static class Trie {
String word;
List<Trie> next = new ArrayList<Trie>();
int count;
public Trie(String word) {
this.word = word;
}
}
Maybe this question is inspired by Google big table paper. But I am not sure what the question really is. Do you want all the urls ending in .com or ending in Google.com?
The question asks for substring, which could be "e.c" ... But then the fact those strings are urls doesn't matter.
Let's assume the list of URLs is huge, is updated rarely whereas there are many queries. Let's further assume the query asks for a common suffix like give me all urls ending in .com or give me all urls ending in "e.com"
In this case, we could revert the urls "com.google..." and sort them once. Then we could binary search into the sorted lists. That can be parallelized neatly (all steps).
That's basically what @Alex does, just he inverts the string when constructing the Trie and constructing the Trie is favorable in O-notation. Just keep in mind, Tries are difficult to scale for big data...
Set<String> set = new HashSet<String>();
set.add("abc.pqr.google.com");
set.add("pqr.google.com");
set.add("pqr.google.net");
set.add("yahoo.com");
set.add("abc.yahoo.com");
Iterator<String> itr = set.iterator();
String[] arr = new String[100];
List<String> ar = new ArrayList<String>();
Map<String, Integer> map = new HashMap<String, Integer>();
while (itr.hasNext()) {
arr = itr.next().split("\\.");
for (int i = 0; i < arr.length; i++) {
if(map.containsKey(arr[i]))
{
map.put(arr[i], map.get(arr[i])+1);
}
else{
map.put(arr[i], 1);
}
}
}
for(Entry<String, Integer> es: map.entrySet()){
System.out.println(es.getKey() +" " + es.getValue());
}
In java:
private void method(){
Set<String> set = new HashSet<String>();
set.add("abc.pqr.google.com");
set.add("pqr.google.com");
set.add("pqr.google.net");
set.add("yahoo.com");
set.add("abc.yahoo.com");
Iterator<String> itr = set.iterator();
String[] arr = new String[100];
List<String> ar = new ArrayList<String>();
Map<String, Integer> map = new HashMap<String, Integer>();
while (itr.hasNext()) {
arr = itr.next().split("\\.");
for (int i = 0; i < arr.length; i++) {
if(map.containsKey(arr[i]))
{
map.put(arr[i], map.get(arr[i])+1);
}
else{
map.put(arr[i], 1);
}
}
}
for(Entry<String, Integer> es: map.entrySet()){
System.out.println(es.getKey() +" " + es.getValue());
}
}
Hi ChrisK,
Thank you for your response.
Your points are pretty good; I realized that I should have asked him if "e.c" is a valid substring as well (in this context). From his example though, I feel as if he is splitting the strings on the basis of the dots.
As for the follow-up, I think we cannot leverage the existing trie structure from step 1:
root
/ \
.com .net
/ \ \
Google Yahoo Google
/ | |
pqr abc pqr
|
abc
As per my understanding, what he meant was, we would have to modify the tree if `abc` now became a parent of `pqr` (as per the follow up). This is what I could understand as per my discussion with him. That is when I thought if a hash table might be useful.
Not sure why "abc.pqr.google.com" and "pqr.abc.google.com" are not valid as URLs.
As far as I understand, in the first part we are looking for the number of times a particular suffix is present in the list of strings. For this, a trie should work (the code below).
And in the second part, it looks like we should find how many times an arbitrary string appears as a substring int the list of strings. Like we can query 'qr.goog'. In this case, for each input string, we can find all of the substrings it can have, and add all of those substrings into hash table: substring => count.
- Alex October 28, 2017