Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
I think T = O(N*M) is the optimal we could achieve.
N = No. of strings in the list.
M = Length of word.
We only need to compare the word against the beginning of each item in the list, keeping track of the longest match. The result is max + 1, unless the entire prefix matched some word, in which case we can exit early and there is no solution (return null).
The prefix "" can never be a result since it matches everything.
Complexity is O(N*m) since we iterate through the list once, comparing m characters (m is the length of word). This is the best we can do since we must compare the prefix against every word.
public class ShortestPrefix {
private static String shortestPrefix(String[] list, String word) {
int max = 0;
// Iterate through the word list once
for(String item : list) {
if(item.length() < word.length())
// Skip this item, it's shorter than word
continue;
int i = 0;
for(i = 0; i < word.length(); i++) {
// Compare char-by-char until there isn't a match
if(item.charAt(i) != word.charAt(i))
break;
}
if(i > max)
// Update max to the longest prefix that matched at least one word
max = i;
if(max == word.length())
// The entire prefix matched at least one word. Therefore there is
// no prefix that isn't a prefix of any word
return null;
}
// Return the length of the longest prefix that matched, plus 1
return word.substring(0, max+1);
}
public static void main(String[] args) {
// Test cases
String list[] = new String[] {"alpha", "beta", "cotton", "delta", "camera"};
// String list[] = new String[] {"alpha", "beta", "cotton", "delta"};
// String list[] = new String[] {"alpha", "beta", "delta"};
// String list[] = new String[] {"catalog", "category", "catacomb"};
String word = "cat";
String result = shortestPrefix(list, word);
}
}
I think it shouldn't simply pass any word that has shorter length than the "word". For example, if there is "ca" in the "list", the answer should still return "cat" as the "shortest prefix that is NOT the prefix of any word". I use a while loop here to replace the inner for and remove the "continue" part:
while(item.charAt[i] == word.charAt[i]) {
i++;
if (i == item.length() && i < word.length())
break;
else if (i <= item.length() && i == word.length())
return null;
}
Let me know if I were wrong. Thanks!
I agree on the previous command on not skipping words, also
I think we can improve the speed slight by doing something similar to BM string search.
Instead of
if(item.length() < word.length())
// Skip this item, it's shorter than word
continue;
int i = 0;
for(i = 0; i < word.length(); i++) {
// Compare char-by-char until there isn't a match
if(item.charAt(i) != word.charAt(i))
break;
}
if(i > max)
// Update max to the longest prefix that matched at least one word
max = i;
if(max == word.length())
// The entire prefix matched at least one word. Therefore there is
// no prefix that isn't a prefix of any word
return null;
We can do
int i = 0; int max = 0;
if(item.length() < max)
// Skip this item, it's shorter than the current shortest prefix
continue; // continue 1
// Compare char at max position
if(item.charAt(max) != word.charAt(max))
continue; // continue 2
}
// if it does not match at position max
for(i = 0; i < word.length(); i++) {
// Compare char-by-char until there isn't a match
if(item.charAt(i) != word.charAt(i))
break; //break2
}
if(i > max)
// Update max to the longest prefix that matched at least one word
max = i;
if(max == word.length())
// The entire prefix matched at least one word. Therefore there is
// no prefix that isn't a prefix of any word
return null;
The reason why this will be (most likely significantly) faster is:
E.G: word = "catfood"
list = list: alpha, beta, cat, cotton, delta, camera
After alpha, max = 0, (it exit on continue 2) // 2 comp.
After beta, max = 0, (it exit on continue 2) // 2+2 comp.
After cat, max = 3, (it exit on break 2 and updates max) // 2+2+4 comp.
The rest are all 2 comparison.
The benefit of this small modification is that we don't have to find the number of character matches in front.
Let say after 3 words, the prefix is "abcdefg" and there is a long list left (most of the times, it won't match all 7 chars), we only need to compare once for most of them.
I think this should do it.
Iterate through the input string character by character. Maintain a set which stores the indices of all the strings in the array of strings to check against. For every character, check if the corresponding character in all the remaining strings is the same or not. If it is not the same, or if the string has ended, remove that string's index from the set. Otherwise keep it. If at the end of checking through the array there are no more elements in the set, it means we are past all the string-prefixes and can hence return a substring containing the chars from 0 to charAt. If we run through all characters and have at least one string that always matches, after the loop we return null because no substring that is not a substring exists. Code below.
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class FindShortestPrefix {
//
// Return a shortest prefix of <code>word</code> that is <em>not</em> a prefix of any word in the <code>list</code>
//
// e.g.
// word: cat, it has 4 prefixes: “”, “c”, “ca”, “cat”
// list: alpha, beta, cotton, delta, camera
// Result is “cat”
public static void main(String[] args)
{
{
String str = "cat";
String [] strings = {"alpha", "beta", "cotton", "delta", "camera"};
System.out.println(findShortestPrefix(str, strings));
}
{
String str = "abcdefghijk";
String [] strings = {"abcxxxx", "xxxxxx", "abcdxxxxx", "abxxxxx"};
System.out.println(findShortestPrefix(str, strings));
}
}
public static String findShortestPrefix(String str, String [] args)
{
Set <Integer> stringsLeft = new HashSet();
for(int i = 0; i < args.length; i++)
{
stringsLeft.add(i);
}
for(int charAt = 0; charAt < str.length(); charAt++)
{
Iterator<Integer> it = stringsLeft.iterator();
while(it.hasNext())
{
int strAt = it.next();
if(args[strAt].length() <= charAt)
{
it.remove();
continue;
}
else
{
// System.out.print("Testing " + args[strAt].charAt(charAt) + " != " + str.charAt(charAt) + " ");
// System.out.println(args[strAt].charAt(charAt) != str.charAt(charAt));
if(args[strAt].charAt(charAt) != str.charAt(charAt))
{
it.remove();
continue;
}
}
}
//System.out.println("Strings left )
if(stringsLeft.size() == 0)
{
return str.substring(0, charAt+1);
}
}
return null;
}
}
Complexity should be O(N) where N all the characters in the strings added up, with the worst case being where the first string is the same string as all the strings in the array. I don't think you can do any better than that complexity wise.
package org.com;
import java.util.ArrayList;
public class SuffixChecking {
private String first;
private ArrayList<String> list = null;
public SuffixChecking(String a, ArrayList<String> l){
first = a;
list = l;
}
public void generateSuffix(){
String suffix = null;
boolean found = false;
for(int i=0; i<first.length(); i++){
System.out.println(first.substring(0, i+1));
suffix = first.substring(0, i+1);
found = false;
for(String a : list){
if(a.startsWith(suffix)){
System.out.println("suffix " + suffix + " is in " + a);
found = true;
break;
}
}
if(!found){
System.out.println("Shortes prefix " + suffix);
break;
}
}
if(found){
System.out.println("No suffix ");
}
}
}
This O(N2) in worst case only when for each suffix., you have to search entire list before match found. Otherwise pruning will be veyr much effective..
public class ShortestPrefix {
String[] dict;
public ShortestPrefix(String[] dict) {
this.dict = dict;
}
public String unusedShortestPrefix(String prefix) {
if (prefix == null || prefix.isEmpty()) {
return null;
}
// add +1 more to the size of array to
// allow returning null when we have used
// all the prefixes
String [] prefixes = new String[prefix.length()+1];
// marker to keep track of next available prefixes in the
// array as we may use this to skip discarded prefixes
//
int prefixStart = 0;
// prepare all the possible prefixes
for (int i = 0; i < prefix.length(); i++) {
prefixes[i] = prefix.substring(0, i+1);
}
for (int i = 0; i < dict.length; i++) {
String word = dict[i];
for(int j = prefixStart; j < prefixes.length-1; j++) {
if (!word.startsWith(prefixes[j])) {
// ignore
break;
}
// current prefix matched
// discard the prefix and update the prefix start index
prefixStart++;
// continue to check with the next prefix
}
// all prefixes are used - no unmatched prefix
// abort
if (prefixStart == prefixes.length) {
break;
}
}
// the first unmatched prefix or no unmatched prefix [ null ]
return (prefixes[prefixStart]);
}
/**
* @param args
*/
public static void main(String[] args) {
String[] dict = { "alpha", "beta", "cotton", "delta", "camera" };
ShortestPrefix sp = new ShortestPrefix(dict);
String prefixString = "cat";
String shortestPrefix = sp.unusedShortestPrefix(prefixString);
System.out.println(shortestPrefix);
}
}
@stevenh47 Your solution may work but it will more complex than O(n*m) unless m is very large. Also, sorting a string list takes longer than O(n log n) because string comparisons take longer than O(1) .
Can increase prefix length on-the-go as we iterate over the dictionary ...
import java.util.Arrays;
import java.util.List;
public class ShorestPrefix {
public static String getShortestPrefix(List<String> dict, String word){
int wlen = word.length();
if(dict.isEmpty()) return "";
if(wlen == 0) return null;
String curPrefix = word.substring(0,1);
for(String dword: dict){
while(dword.startsWith(curPrefix)){
if(curPrefix.length() >= wlen) return null;
curPrefix += word.charAt(curPrefix.length());
}
}
return curPrefix;
}
public static void main(String[] args) {
String [] dictArr = { "alpha", "beta", "cotton", "delta", "camera" };
String word = "cat";
List<String> dict = Arrays.asList(dictArr);
System.out.println(getShortestPrefix(dict, word));
}
}
1> build a prefix tree of the lists.
- chenlc626 March 07, 20132> Go though the tree using the each character of the word in order until if hit a dead end,then it is the shortest prefix you wants. Otherwise the word is the prefix of some word in the list.