Return a shortest prefix of <c

Google Interview Question for Software Engineer / Developers

3

of 3 votes

16
Answers
Return a shortest prefix of <code>word</code> that is <em>not</em> a prefix of any word in the <code>list</code>

e.g.
word: cat, it has 4 prefixes: “”, “c”, “ca”, “cat”
list: alpha, beta, cotton, delta, camera
Result is “cat”
- pk March 07, 2013 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 12 vote

1> build a prefix tree of the lists.
2> Go though the tree using the each character of the word in order until if hit a dead end,then it is the shortest prefix you wants. Otherwise the word is the prefix of some word in the list.

- chenlc626 March 07, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think T = O(N*M) is the optimal we could achieve.

N = No. of strings in the list.
M = Length of word.

- Anonymous March 31, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Check the Barry Fruitman's answer, it is much easier to code.

- chetan.j9 April 03, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

since we are comparing the word list against one single word, the prefix tree is not that straightforward as the string representation.

- aimjwizards April 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

We only need to compare the word against the beginning of each item in the list, keeping track of the longest match. The result is max + 1, unless the entire prefix matched some word, in which case we can exit early and there is no solution (return null).

The prefix "" can never be a result since it matches everything.

Complexity is O(N*m) since we iterate through the list once, comparing m characters (m is the length of word). This is the best we can do since we must compare the prefix against every word.

public class ShortestPrefix {
	
	private static String shortestPrefix(String[] list, String word) {
	
		int max = 0;
	
		// Iterate through the word list once
		for(String item : list) {

			if(item.length() < word.length())
				// Skip this item, it's shorter than word
				continue;
			
			int i = 0;
			for(i = 0; i < word.length(); i++) {
				
				// Compare char-by-char until there isn't a match
				if(item.charAt(i) != word.charAt(i))
					break;
			}
			if(i > max)
				// Update max to the longest prefix that matched at least one word
				max = i;
			
			if(max == word.length())
				// The entire prefix matched at least one word. Therefore there is
				// no prefix that isn't a prefix of any word
				return null;
		}
	
		// Return the length of the longest prefix that matched, plus 1
		return word.substring(0, max+1);	
	}



	public static void main(String[] args) {
		// Test cases
		String list[] = new String[] {"alpha", "beta", "cotton", "delta", "camera"};
//		String list[] = new String[] {"alpha", "beta", "cotton", "delta"};
//		String list[] = new String[] {"alpha", "beta", "delta"};
//		String list[] = new String[] {"catalog", "category", "catacomb"};
		String word = "cat";
	
		String result = shortestPrefix(list, word);
	}
}

- Barry Fruitman March 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I think it shouldn't simply pass any word that has shorter length than the "word". For example, if there is "ca" in the "list", the answer should still return "cat" as the "shortest prefix that is NOT the prefix of any word". I use a while loop here to replace the inner for and remove the "continue" part:

while(item.charAt[i] == word.charAt[i]) {
    i++;
    if (i == item.length() && i < word.length())
        break;
    else if (i <= item.length() && i == word.length())
        return null;
    }

Let me know if I were wrong. Thanks!

- Anonymous April 08, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I agree on the previous command on not skipping words, also

I think we can improve the speed slight by doing something similar to BM string search.

Instead of

if(item.length() < word.length())
				// Skip this item, it's shorter than word
				continue;

			int i = 0;
			for(i = 0; i < word.length(); i++) {
				
				// Compare char-by-char until there isn't a match
				if(item.charAt(i) != word.charAt(i))
					break;
			}
			if(i > max)
				// Update max to the longest prefix that matched at least one word
				max = i;
			
			if(max == word.length())
				// The entire prefix matched at least one word. Therefore there is
				// no prefix that isn't a prefix of any word
				return null;

We can do

int i = 0; int max = 0;
			if(item.length() < max)
				// Skip this item, it's shorter than the current shortest prefix
				continue; // continue 1
				// Compare char at max position
			if(item.charAt(max) != word.charAt(max))
				continue; // continue 2
			}
			// if it does not match at position max
			for(i = 0; i < word.length(); i++) {
				
				// Compare char-by-char until there isn't a match
				if(item.charAt(i) != word.charAt(i))
					break; //break2
			}
			if(i > max)
				// Update max to the longest prefix that matched at least one word
				max = i;
			
			if(max == word.length())
				// The entire prefix matched at least one word. Therefore there is
				// no prefix that isn't a prefix of any word
				return null;

The reason why this will be (most likely significantly) faster is:

E.G: word = "catfood"
list = list: alpha, beta, cat, cotton, delta, camera
After alpha, max = 0, (it exit on continue 2) // 2 comp.
After beta, max = 0, (it exit on continue 2) // 2+2 comp.
After cat, max = 3, (it exit on break 2 and updates max) // 2+2+4 comp.
The rest are all 2 comparison.

The benefit of this small modification is that we don't have to find the number of character matches in front.
Let say after 3 words, the prefix is "abcdefg" and there is a long list left (most of the times, it won't match all 7 chars), we only need to compare once for most of them.

- Mo Lam April 23, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Why not insert the word(e.g. CCD) to that list, and sort it, then compare the one before CCD and the one after CCD? O(nlgn)

- stevenh47 April 03, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think this should do it.
Iterate through the input string character by character. Maintain a set which stores the indices of all the strings in the array of strings to check against. For every character, check if the corresponding character in all the remaining strings is the same or not. If it is not the same, or if the string has ended, remove that string's index from the set. Otherwise keep it. If at the end of checking through the array there are no more elements in the set, it means we are past all the string-prefixes and can hence return a substring containing the chars from 0 to charAt. If we run through all characters and have at least one string that always matches, after the loop we return null because no substring that is not a substring exists. Code below.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;


public class FindShortestPrefix {
//
//	Return a shortest prefix of <code>word</code> that is <em>not</em> a prefix of any word in the <code>list</code> 
//
//	e.g. 
//	word: cat, it has 4 prefixes: “”, “c”, “ca”, “cat” 
//	list: alpha, beta, cotton, delta, camera 
//	Result is “cat”
	
	public static void main(String[] args) 
	{
		{
			String str = "cat";
			String [] strings = {"alpha", "beta", "cotton", "delta", "camera"};
			System.out.println(findShortestPrefix(str, strings));
		}		
		
		{
			String str = "abcdefghijk";
			String [] strings = {"abcxxxx", "xxxxxx", "abcdxxxxx", "abxxxxx"};
			System.out.println(findShortestPrefix(str, strings));
		}
	}

	public static String findShortestPrefix(String str, String [] args)
	{
		Set <Integer> stringsLeft = new HashSet();
		for(int i = 0; i  < args.length; i++)
		{
			stringsLeft.add(i);
		}
		for(int charAt = 0; charAt < str.length(); charAt++)
		{
			Iterator<Integer> it = stringsLeft.iterator();
			while(it.hasNext())
			{
				int strAt = it.next();
				if(args[strAt].length() <= charAt)
				{
					it.remove();
					continue;
				}
				else
				{
//					System.out.print("Testing " + args[strAt].charAt(charAt) + " != " + str.charAt(charAt) + " ");
//					System.out.println(args[strAt].charAt(charAt) != str.charAt(charAt));
					if(args[strAt].charAt(charAt) != str.charAt(charAt))
					{
						it.remove();
						continue;						
					}
				}				
			}
			//System.out.println("Strings left )
			if(stringsLeft.size() == 0)
			{
				return str.substring(0, charAt+1);
			}
		}
		return null;
	}
}

- Rax March 08, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Complexity should be O(N) where N all the characters in the strings added up, with the worst case being where the first string is the same string as all the strings in the array. I don't think you can do any better than that complexity wise.

- Rax March 08, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

wrong!

- Anonymous March 31, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

package org.com;

import java.util.ArrayList;

public class SuffixChecking {
	
	private String first;
	private ArrayList<String> list = null;
	
	public SuffixChecking(String a, ArrayList<String> l){
		first = a;
		list = l;
	}

	public void generateSuffix(){
		String suffix = null;
		boolean found = false;
		for(int i=0; i<first.length(); i++){
			System.out.println(first.substring(0, i+1));
			suffix = first.substring(0, i+1);
			found = false;
			for(String a : list){
				if(a.startsWith(suffix)){
					System.out.println("suffix " + suffix + " is in " + a);
					found = true;
					break;
				}				
			}
			
			if(!found){
				System.out.println("Shortes prefix " + suffix);
				break;
			}
		}
		
		if(found){
			System.out.println("No suffix ");
		}
	}
}

This O(N2) in worst case only when for each suffix., you have to search entire list before match found. Otherwise pruning will be veyr much effective..

- nil_dream March 08, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class ShortestPrefix {

	String[] dict;

	public ShortestPrefix(String[] dict) {
		this.dict = dict;
	}

	public String unusedShortestPrefix(String prefix) {
		if (prefix == null || prefix.isEmpty()) {
			return null;
		}

		// add +1 more to the size of array to
		// allow returning null when we have used
		// all the prefixes
		
		String [] prefixes = new String[prefix.length()+1];
		
		// marker to keep track of next available prefixes in the
		// array as we may use this to skip discarded prefixes
		//
		int prefixStart = 0;
		
		
		// prepare all the possible prefixes
		for (int i = 0; i < prefix.length(); i++) {
			prefixes[i] = prefix.substring(0, i+1);
		}
		
		for (int i = 0; i < dict.length; i++) {
			String word = dict[i];
	
			for(int j = prefixStart; j < prefixes.length-1; j++) {
				if (!word.startsWith(prefixes[j])) {
					// ignore
					break;
				}
				
				// current prefix matched
				// discard the prefix and update the prefix start index
				prefixStart++;
				
				
				// continue to check with the next prefix
			}
			
			// all prefixes are used - no unmatched prefix
			// abort
			if (prefixStart == prefixes.length) {
				break;
			}
		}

		// the first unmatched prefix or no unmatched prefix [ null ]
		return (prefixes[prefixStart]); 
	}
	
	

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		String[] dict = { "alpha",  "beta", "cotton", "delta", "camera" };

		ShortestPrefix sp = new ShortestPrefix(dict);

		String prefixString = "cat";

		String shortestPrefix = sp.unusedShortestPrefix(prefixString);

		System.out.println(shortestPrefix);
	}

}

- Raj Srinivasan March 18, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@stevenh47 Your solution may work but it will more complex than O(n*m) unless m is very large. Also, sorting a string list takes longer than O(n log n) because string comparisons take longer than O(1) .

- Barry Fruitman April 03, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 votes

yeah, so if it was me, I would chat with interviewer about this.

- stevenh47 April 04, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Can increase prefix length on-the-go as we iterate over the dictionary ...

import java.util.Arrays;
import java.util.List;


public class ShorestPrefix {

	public static String getShortestPrefix(List<String> dict, String word){
		int wlen = word.length();
		if(dict.isEmpty()) return "";
		if(wlen == 0) return null;
		String curPrefix = word.substring(0,1);
		for(String dword: dict){
			while(dword.startsWith(curPrefix)){
				if(curPrefix.length() >= wlen) return null;
				curPrefix += word.charAt(curPrefix.length());
			}			
		}
		return curPrefix;
	}
	
	public static void main(String[] args) {
		String [] dictArr = { "alpha", "beta", "cotton", "delta", "camera" };
		String word = "cat";
		List<String> dict = Arrays.asList(dictArr);
		System.out.println(getShortestPrefix(dict, word));
		
		
	}
	
}

- konst May 12, 2014 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews