Amazon Google Interview Question
SDE1s Software Engineer / Developers SDE-2sCountry: India
Interview Type: Phone Interview
Example for Alphabet C, A, B:
Imagine this is your dictionary below.
1. C
2. CAC
3. CB
4. BCC
5. BA
These are the steps based on the example above.
1. Compare line 1 and line 2. Create vertices C and A
2. Compare line 2 and 3. Create vertex B. Create Edge E = A->B, meaning A comes before B
3. Compare line 3 and 4. Add edge C->B meaning C comes before B
4. Compare line 4 and line 5: Add edge C->A
This yields to a DAG below.
C->B
C->A
A->B
C->A->B
| ^
|---------|
and the topological sort will give you:
C,B,A
As you can see, you cannot determine the correct order of the alphabet, if there is a directed cycle in the graph. So the dictionary given must generate a DAG to find a solution.
@oOZz - is there a way to determine that you've processed enough of the dictionary to generate the alphabet order? In an English dictionary, it may be enough to just process the a's to get enough info for a proper ordering. Not that this is a process that needs to be completely optimized since you will probably run it once per language.
How would your solution handle the case where there isn't enough info? Say the letter f only appears in three words in this strange language and those words may or may not give us any ordering info at all.
@oOZz - I know this is a pretty old post, but if anyone stumbles across this (like I did), it should be known that the resulting alphabet of the example is incorrect. It should be C, A, B. As lines 2 and 3 have A coming before B and in lines 4 and 5, C is said to come before A. Therefore, the sequence would have to be C, A, B.
I go on building pairs of characters(defining order) from each pair of strings, until I get 25 pairs with different first character(as 26 characters in total), according to this algo -- the first non-matching character from the given two strings defines an order between those unmatching characters. Ex: if str1=KPLLKDE, str2=KPLJKDE, in dictionary str1 appears before str2, L<J. So we add a pair L-J.
At this point, I update two counts. If this is the first time L is found on left side of a pair, numOfOutgoings++.
If this is the first time J is on right side of a pair, numOfIncomings++;
Continue this process till we have numOfIncomings == 25 && numOfOutgoings==25.
Now I build a graph as below -
First 26 nodes with 1 for each char. And if my pairs are {sk, kp, pl, lm etc} in my graph these edges exist s->k, k->p, p->l, l->m etc.
At this point I have a graph with 26 nodes, with at least 25 edges defining the order between some 2 chcaracters covering enough orders to define complete ordering.
Now do the topological sort on these nodes, and you have the order of the new alphabet.
Correction: "until I get 25 pairs", should be "until I get numOfIncomings=25 & numOfOutgoings=25" as explained in later statements.
Also, it is true that graph has enough info to give required info. But any topological sort may not necessarily give the answer.
Hi,
I dont know whether i am clear with the question. My Answer is,
you just want to find out the order of Alphabets.
-> Take the first string and in that take out the first Alphabet, this represents the first alphabet.
-> Select second string, such that 1st alphabet should be different, than the previous, which represents second alphabet in the series.
EX:
Let dictionary be
Gun
God
Graveyard
Money
Maintain
-> here 1st alphabet is g, in the second string check whether 1st alphabet == g, if yes, skip else store, which represents Second alphabet.
-> Hope this solution can work. Now once u get all the alphabets u can store in a graph or linked list.
Seems to be right.. but in case if there is no word starting from a particular letter, you might miss it in the sequence...
In Dictionary is there any word, starting with special characters means #, ?.. etc. If it starts like that also, it will store the first alphabet, as i am checking with letters which we got previously. here checking 'm' with 'g'.
@prashanthreddy: question clearly states that characters will be of english alphabet, so we can assume, you wont get special characters.
@sgarg: this solution should be extended to resolve this issue.
1) We prepare a new ordering list of alphabets based on 1st character of new dictionary. Also will keep count for number of characters found.
2) If number of characters are less than 25, then we should continue the same process for second alphabet. This time for only those string which has same matching first character.
3) During step 2, we may found new characters which must be in between of existing list, so list should be updated accordingly.
4) If after 2 & 3 steps, we still lacks required number of characters, then same 2 & 3 steps should be repeated but this time for 3 character of strings whose 1st two characters are same.
Repeatedly doing this until we get all characters.
class AlphabetOrderFinder
{
private char[] find(String[] orderStrings)
{
Digraph g = new Digraph(26);
int j = 0;
for (int i = 0; i < orderStrings.Length - 1; i++)
{
string high = orderStrings[i + 1];
string low = orderStrings[i];
int minLength = Math.Min(low.Length, high.Length);
for (j = 0; j < minLength; j++)
{
if (low[j] == high[j])
continue;
if (low[j] != high[j])
g.addEdge(get(low[j]), get(high[j]));
}
}
DirectedCycle cycle = new DirectedCycle(g);
if (cycle.hasCycle()) {
foreach (var c in cycle.getCycle())
Console.Write(c + " -> ");
Console.WriteLine();
return null;
}
List<char> orders = new List<char>();
DFSDiagraph dfs = new DFSDiagraph(g);
foreach (var v in dfs.sort()) {
orders.Add(get(v));
}
return orders.ToArray();
}
private int get(char c) {
if (!Char.IsUpper(c))
c = Char.ToUpper(c);
return c - 'A';
}
public char get(int c) {
return (char)(c + 'A');
}
public void test()
{
const string NEW_ORDER = "HIJKLMNOPQRSTUVWXYZABCDEFG";
String[] newlyOrdered = {
"HACD",
"HIEF",
"IBEA",
"ID",
"JAC",
"K",
"KB",
"KC",
"KD",
"KE",
"KF",
"KG",
"L",
"LL",
"LM",
"LN",
"LO",
"LP",
"LQ",
"LR",
"LS",
"LT",
"LU",
"LV",
"LW",
"LX",
"LY",
"LZ",
"Z",
"A",
"B"
};
char[] orders = find(newlyOrdered);
String printRes = String.Join<char>(" -> ", orders);
String res = String.Join<char>("", orders);
Console.WriteLine(printRes);
AssertHelper.areEqual(NEW_ORDER, res);
}
}
There is a cycle in here and hence it is an invalid DAG.
"HACD",
"HIEF",
"ID",
"JAC",
"K",
"KG",
"L",
Order of the strings above suggest that a -> i -> j -> k -> l
"LZ",
"Z",
"A"
This order suggests that l -> z -> a which is contradictory to the order deduced above. This is leading to a loop - a -> i -> j -> k -> l -> z -> a
Build a directed graph according to the relationship of sorted strings.
Also add a directed cycle detect logic to do the input detection in case the input is not correct.
Then use a topological sort to get the sorted chars, it is basically a depth-first-search, then use a stack to reverse, which will get the ordered output.
What about another approach? What if we just sort the dictionary of new alphabet words by length and hopefully find a very small set of words with a very long length. For instance, en.wikipedia.org/wiki/Longest_word_in_English has the longest words. If we can find just one word that is unique in length, then we can easily map it back from the scrambled to the original alphabet. If we have a few, well, then we could then go and try each of them out exhaustively.
import java.util.HashMap;
import java.util.Map;
public class AlphabetOrder {
static class Pos {
public Pos(int start, int end) {
this.start = start;
this.end = end;
}
private int start, end;
public void setStart(int start) {
this.start = start;
}
public void setEnd(int end) {
this.end = end;
}
public int getStart() {
return start;
}
public int getEnd() {
return end;
}
}
private static String getChar(String word) {
return word.substring(0, 1);
}
private static boolean put(Map<String, Pos> alp, String ch, int pos) {
boolean insert = false;
if (alp.containsKey(ch)) {
Pos p = alp.get(ch);
if (p.getStart() > pos) {
p.setStart(pos);
}
if (p.getEnd() < pos) {
p.setEnd(pos);
}
} else {
alp.put(ch, new Pos(pos, pos));
insert = true;
}
return insert;
}
private static void calculate(String[] dictionary, int start, int end,
Map<String, Pos> alp) {
if (end - start == 1)
return;
int mid = (start + end) / 2;
String ch = getChar(dictionary[mid]);
boolean insertFlag = put(alp, ch, mid);
if (insertFlag) {
calculate(dictionary, start, mid, alp);
calculate(dictionary, mid, end, alp);
} else {
if (start < alp.get(ch).getStart()) {
calculate(dictionary, start, alp.get(ch).getStart(), alp);
}
if (end > alp.get(ch).getEnd()) {
calculate(dictionary, alp.get(ch).getEnd(), end, alp);
}
}
}
public static void main(String... args) {
String[] dictionary = { "plenty", "poem", "pink", "pink", "pink",
"pink", "pink", "pink", "pink", "pink", "pink", "pink", "pink",
"pink", "pink", "about", "and", "computer", "college", "cat",
"june", "eight" };
// String[]
// dictionary={"plenty","poem","pink","mood","mango","mike","is","ink","ilets","in","table","type","toy","tv","apple","about","and","computer","college","cat","june","eight"};
Map<String, Pos> alp = new HashMap<String, Pos>();
int i = 0;
int j = dictionary.length - 1;
put(alp, getChar(dictionary[i]), i);
put(alp, getChar(dictionary[j]), j);
calculate(dictionary, i, j, alp);
System.out.println(alp.keySet());
System.out.println(alp.keySet().size());
}
}
Another approach for this is you maintain one bool array of 26 elements(assuming new language alphabet can be represented with ASCII char set). You traverse the first word's characters and set the right position in bool, if its not set, it means it's a unique char, Add it to one List collection. If its already set, don't add it. Once you list collection size is 26, it means you discovered all character sequences.
Why not recursively:
Take the dictionary, first word, ok starting letter is [a].
Last world, starting letter is [z].
World at half the dictionary, starting letter is [X].
([X]!=[a] && [X]!=[z]) then call again the function for 2 separate dictionaries: [a-x] and [x-z].
Of course the dictionary could split in one half only not 2, if [X]=[a] or [X]=[z]. And so on.
Yeah, exactly. The best way I guess.
(I proposed the recursive version), forgot to put the name :)
But it took me longer than 15 minutes, 'cause I tried an approach with tree sets, to split the array of word node by node... but that is lot more complex. And useless.
An array and 2 indexes. Easy and straightforward.
sadly, no link allowed, so no solution from ideone. :(
//the recursive part
public void split(int a, int z) {
iterations++;
int offset = a;
int split = z - a;
if (vocabulary[a].toLowerCase().charAt(0) == vocabulary[z].toLowerCase().charAt(0)) {
if (letterOrder.indexOf(vocabulary[a].toLowerCase().charAt(0))==-1) {
letterOrder = letterOrder + vocabulary[a].toLowerCase().charAt(0);
}
return;
} else {
split(a, offset + split / 2);
split(offset + split / 2 + 1, z);
}
}
How about this,
Take first word of dictionary. Add its first letter in a linked list. Lets call this linked list Alphabets.
Take next word of dictionary. Find the index where this word is differnent from the last word. If the letter of difference is not in list already, insert it in the list after the position of letter in last word at the same index.
For ex: let the last word be 'uiodl' and the current word is 'uiydk' or 'uiydl'. then the new letter would be 'y'. Now as it replaces the 'o' in last word. 'y' should come after 'o' in the Alphabet. Now insert letter 'y' in Alphabet linked list after 'o'.
Repeat this for rest or the words.
That seems to be slower than the recursive solution.
Still, the cost-efficient recursive version has one problem: if the dictionary does not contain any letter starting with one letter, said that "h", for example.
If you consider this particular condition, you will have to consider also the possibility of a letter contained in one word only.
If you include that as a possibility, there is no other way to deal with the problem than to scan it completely. Every word of it. In that case, which kind of storing you will use (string, linked list etc) doesn't do any difference, 'cause the expensive part of the problem will be the search.
1. Take every two consecutive words and find first difference letters in them
hat
hello
here we have the letters order a->e
For every letter of alphabet create an array where check what letters are before it.
It can be bit array. 4 bytes integer is enough for 26 letters alphabet.
26 letters * 4 bytes = 104 bytes memory.
2. Next step for every letter of alphabet count how much letters are before it. Here use array of ranks and recursion algorithm for counting not counted letters before.
int rank[26] = {-1,-1, ... -1};
int letters_before[26]; // created on 1st step
int GetRank(int letter) {
if (rank[letter] != -1)
return rank[letter];
int curr_rank = 0;
for (int i = 0; i < 26; i++) {
if (letters_before[letter] & (1 << i))
curr_rank = max(curr_rank, GetRank(i));
}
rank[letter] = curr_rank+1;
return rank[letter];
}
3. Last step sort 26 letters by rank.
big O( size_of_dictionary * size_of_word) and just 208 bytes memory!
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
class DictionaryCopmerator implements Comparator{
Map map = new HashMap();
DictionaryCopmerator()
{
map.put("c", 1);
map.put("b", 2);
map.put("a", 3);
map.put("t", 4);
}
@Override
public int compare(Object o1, Object o2) {
// TODO Auto-generated method stub
String s1 = (String)o1;
String s2 = (String)o2;
boolean b = true;
int i=0,j=0;
while(b)
{
String c1 = ""+s1.charAt(i);
String c2 = ""+s2.charAt(j);
Integer order1 = (Integer)map.get(c1);
Integer order2 = (Integer)map.get(c2);
if(order1>order2)
return 1;
if(order1==order2)
{
if(s1.length()>s2.length())
return 1;
else if(s1.length()<s2.length())
return -1;
return 0;
}
if(order1<order2)
return -1;
i++;
j++;
}
return 0;
}
}
public class NewDictionary {
public static void main(String[] args) {
String s[] ={"a","cat","c","bat","c"};
for (int i = 0; i < s.length; i++) {
System.out.println(s[i]);
}
System.out.println("\n");
Arrays.sort(s, new DictionaryCopmerator());
for (int i = 0; i < s.length; i++) {
System.out.println(s[i]);
}
}
}
r u sure your "sorted words" correct? " ntr " comes before "act" means, "n" is ahead of "a", but words " tan" shows "a" comes before "n".
Will someone please tell me how to infer "n" is head of "r"? In my opinion, the order of "n" and "r" can not be determined based on the given example.
Yeah seems like you can only conclude "r" comes before "c" because "arc" comes before "act". So I guess in this case both "n r c a t" and "r n c a t" are valid solutions, and I returning any one of them is fine? The question should indeed specify which solution to return when there are multiple...
if the order of some character can be determined, this character is not in final result.
public static ArrayList<Character> charOrder(String []words)
{
if(words == null || words.length == 0)
return null;
HashMap<Character, Character> relations =
new HashMap<Character, Character>();
for(int i=1; i<words.length; i++) {
String CurrRelat = geneRelation(words[i-1], words[i]);
if(CurrRelat != null) {
relations.put(CurrRelat.charAt(0), CurrRelat.charAt(1));
}
}
ArrayList<Character> order = new ArrayList<Character>();
Character front = words[0].charAt(0);
Character back = null;
order.add(front);
while((back = relations.get(front)) != null) {
relations.remove(front);
order.add(back);
front = back;
}
//may have undermined relation
return order;
}
public static String geneRelation(String w1, String w2)
{
if(w1 == null || w1.length() == 0 ||
w2 == null || w2.length() == 0)
return null;
int wordLen = w1.length() > w2.length() ? w2.length() : w1.length();
for(int charIdx=0; charIdx<wordLen; charIdx++) {
char front = w1.charAt(charIdx);
char back = w2.charAt(charIdx);
if(front != back)
return front+""+back;
}
return null;
}
Question about this code:
for(int i=1; i<words.length; i++) {
String CurrRelat = geneRelation(words[i-1], words[i]);
if(CurrRelat != null) {
relations.put(CurrRelat.charAt(0), CurrRelat.charAt(1));
}
}
Does it mean the new relation will replace the existing one if their keys (the first characters) are the the same? Ex: you have (a,b) relation in the map already. And then a new relation (a,c) coming up. Does (a,c) replace (a,b)?
How about you have 26 trees, one for each alphabet,
Properties of the tree: It has a root, and n children
Root occurs before all its children.
In English tree for A has 25 children, B has 24 Children. Similarly keep building the trees and stop when the number of children in all the trees is 25*(25+1)/2=325. Now order the roots based on the number of children you have.
This is naive implementation -- will not perform efficiently:
1. Check 2 words of alien dictionary at a time (eg: check word1 & word2, word2 & word3 and so on..)
2. Iterate each letter in both words, find differing letters. Eg: if the words are:
zxlihbrr
zxlialkdj
Then we know a comes after h.
3. Continue to store this 'comes after' information in a list datastructure until all word in the dictionary is visited
Edit: instead of a list, use a tree data structure. If the 'comes after' information we receieve is transitive, eg: x->u , u->q then we can simply create a tree x->u->q. But if it's branching, eg: x->u x->w, we don't know yet what's the precedence of u vs w. Temporarily add both as children of x until more information is found. The memory address of tree nodes can be indexed using a map to improve performance
We can start with the first character of the first word of the Alien's dictionary and go through all the words until the entire dictionary is exhausted. In each step we can search for the next word that starts with an new character (hence the new alphabet of this particular language) in 2^k and locate its exact location within the list of the words via searching between 2^(k-1) and 2^k.
The code is somewhat like this:
index=0;
for(i=0;i<n;i++)
{
c1=word[i].at(0);
NewLanguageCharList[index++]=c1;
k=0;
while(true)
{
c2=s[(1<<k)-1].at(0);
if(c1==c2)
continue;
else
{
b=1<<(k-1);
e=(1<<k)-1;
while(b<=e)
{
m=b+(e-b)/2;
if(s[m].at(0)==c1)
{
i=m;
break;
}
else if(s[m].at(0)<=c1)
b=m+1;
else
e=m-1;
}
}
}
}
there o(lg n) to find k, and O(lg n) to find index m. The for loop will take maximum of O(n). Hence, the complexity is O(n log n). Please correct me if there is a mistake in my interpretation.
Hi peeps, read through a few answers and couldn't find someone thinking the same way I did, could someone point out whether this is correct
Approach:
Avoid searching the entire dictionary (could be huge and not fit in memory
We could store dict info as Letter { char c; int first; int last; Letter * next}, where first and last is the indexes of the first and last words in the dict begins with the character.
Keep a linked list of Letter(s), and a HashMap<char c, Letter * node> to the letter of char c
Look at the first and last string in the dict, update Letters and Hashmap
Recursively look at (first, mid-point) and (mid-point, last) to update the linked list and hashmap
Pseudo code: will add later
I personally don't like this kind of interview questions. The reason is: even if you can resolve this problem through preparation, it doesn't mean you are a good software engineer. It is just NOT practical at all. Once you finished school for years, how would you remember topological sorting...prepare for it every time you look for a new job? what a waste of time! Oh well...
You can create a directed graph where vertices represent the characters in the language and the edges represent the order relationship, i.e., if there is an edge Eij, then character Vi comes before character Vj in the alphabet.
- oOZz June 10, 2013You need to process two consecutive words in the dictionary at a time to determine the precedence relationship of the characters, so it will take O(N * M) time to create the graph, where N is the number of words and M is the length of the words.
Then use topological sorting to get characters in the correct order in O(|V| + |E|) time