Google Interview Question
Software Engineer in Tests
You need to know the user's geography.
Based on geography, there is a banned list of words that the engine should not suggest, such as Naz* in Germany.
As the user types, the engine knows the history of the user's previously visited sites.
Keep a dictionary of those and show every entry that begins with the characters typed so far. As the user types more text, keep moving down this dictionary and show the narrowing subset of choices.
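A minimal sketch of that idea, assuming the history fits in memory as a sorted list and that banned words are matched as word prefixes (the way Naz* reads). The names `HistoryCompleter`, `BANNED_PREFIXES`, and the region code "DE" are made up for illustration, not part of any real engine.

```python
from bisect import bisect_left

# Hypothetical per-region banned word prefixes; the entry is a placeholder.
BANNED_PREFIXES = {"DE": ("naz",)}


class HistoryCompleter:
    """Prefix lookup over a user's history, filtered by a regional ban list."""

    def __init__(self, history, region):
        # Sorting keeps all entries that share a prefix next to each other.
        self.entries = sorted({h.lower() for h in history})
        self.banned = BANNED_PREFIXES.get(region, ())

    def _is_banned(self, entry):
        # An entry is dropped if any of its words starts with a banned prefix.
        return any(word.startswith(b) for word in entry.split() for b in self.banned)

    def suggest(self, typed, limit=5):
        prefix = typed.lower()
        if not prefix:
            return []
        out = []
        # Binary search for the first entry that is >= the typed prefix.
        start = bisect_left(self.entries, prefix)
        for entry in self.entries[start:]:
            if not entry.startswith(prefix):
                break  # left the block of matching entries
            if self._is_banned(entry):
                continue
            out.append(entry)
            if len(out) == limit:
                break
        return out
```

Calling `suggest` again with each extra character returns a subset of the previous result, which is the "moving down the dictionary" behaviour described above. A trie would do the same job; the sorted list is just shorter to write.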
We need to test how well the ranking predicts which query a user will make. A good input data set would be the N most common past search queries from Google users (where N depends on system constraints and the queries come from the same region/language), with the frequency of each query stored in a key-value database. The actual test would take each query in turn, add its characters one at a time, and obtain the current ranking of the actual query by calling the query completion module with the current substring, until the highest-ranking completed query matches the actual query or there are no more characters to add. The result for each character can be quantified by checking whether the substring ending with that character produced any completed queries, then awarding a score in proportion to how highly the correct query ranks for that substring. Next, the per-character scores can be summed to get the score for a particular query, and that score multiplied by the query's frequency to get a weighted value. The module passes the test if the sum of all weighted values is above a certain threshold. My motivation for this test is that the weighted values should correspond to the average reduction in characters the user would have to type, assuming they always use the completion suggestions.
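Here is a rough sketch of that test loop. It assumes the completion module can be called as a function `complete(prefix)` that returns a ranked list of queries, and it uses 1/rank as the per-character score; both are my assumptions, since the description only says the score is proportional to the ranking. `make_toy_completer` is a stand-in for the real module so the example runs on its own.

```python
def score_query(query, complete):
    """Sum per-character scores for one query, stopping once it ranks first."""
    total = 0.0
    for end in range(1, len(query) + 1):
        ranking = complete(query[:end])
        if query in ranking:
            rank = ranking.index(query) + 1
            total += 1.0 / rank  # assumed scoring rule: reciprocal of the rank
            if rank == 1:
                break  # top suggestion matches the actual query
    return total


def weighted_total(frequencies, complete):
    """Weight each query's score by how often that query was issued."""
    return sum(freq * score_query(q, complete) for q, freq in frequencies.items())


def passes(frequencies, complete, threshold):
    return weighted_total(frequencies, complete) > threshold


def make_toy_completer(frequencies, limit=3):
    """Stand-in completion module: ranks matching queries by past frequency."""
    def complete(prefix):
        matches = [q for q in frequencies if q.startswith(prefix)]
        return sorted(matches, key=lambda q: -frequencies[q])[:limit]
    return complete


if __name__ == "__main__":
    frequencies = {"weather": 5, "web": 3}  # made-up counts
    complete = make_toy_completer(frequencies)
    print(weighted_total(frequencies, complete))
```

The threshold and the exact scoring rule would have to be tuned against real data; the sketch only shows how the pieces fit together.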
Run the same search in other search engines and check whether the results make sense :-)
- Messi April 18, 2010