Google Interview Question
Country: United States
Hashing. The long URL would be the input to your hashing function. Maintain multiple buckets (partitioning) to achieve scalability; an attribute of the URL, such as its length, could be used to decide which partition it goes to.
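The partitioning idea above could be sketched roughly as follows: hash the long URL and use the hash to pick one of a fixed number of buckets. The class and method names and the bucket count here are illustrative, not part of the original answer.

```java
// Minimal sketch of hash-based partitioning: the long URL's hash
// selects one of NUM_BUCKETS storage shards. Names are illustrative.
public class Partitioner {
    static final int NUM_BUCKETS = 16; // assumed shard count

    // Map a long URL to a bucket index in [0, NUM_BUCKETS).
    static int bucketFor(String longUrl) {
        // floorMod guards against negative hashCode() values
        return Math.floorMod(longUrl.hashCode(), NUM_BUCKETS);
    }

    public static void main(String[] args) {
        int b = bucketFor("https://example.com/some/very/long/path");
        System.out.println(b >= 0 && b < NUM_BUCKETS); // prints true
    }
}
```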
Posting a possible solution to the problem to get comments and feedback. The solution uses a HashMap to store the real URL and the tiny URL, where the tiny URL is the key and the real URL is the value. The tiny URL is generated by counting through characters the same way you count through digits. A single character can give you 256 entries.
static final String fixedURLPart = "fixedURLPart";
static HashMap<String, String> urlMap = new HashMap<String, String>();

// Generates the tiny URL
String generateTinyURL(String inURL) {
    String tinyURL = fixedURLPart + getNextString();
    urlMap.put(tinyURL, inURL);
    return tinyURL;
}

// Returns the real URL
String getRealURL(String tinyURL) {
    return urlMap.get(tinyURL);
}
// Each character holds 256 values, so a counter of k characters gives
// 256^k distinct strings; the array grows by one character whenever
// every position overflows, so the scheme never runs out of ids.
static char[] charList = null;
static int i = 0;

// Generates the unique string for each tiny URL
String getNextString() {
    if (charList == null) {
        charList = new char[1];
        charList[0] = (char) 0;
    } else {
        i = 0;
        while (i < charList.length) {
            if (charList[i] < 255) {
                charList[i] = (char) (charList[i] + 1);
                break;
            } else {
                charList[i] = (char) 0; // overflow: carry into the next position
                i++;
            }
        }
        if (i == charList.length) {
            // Every position overflowed: grow the counter by one character
            // (all entries of the new array default to (char) 0).
            charList = new char[charList.length + 1];
        }
    }
    return new String(charList);
}
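For context, here is a self-contained sketch of how the map-based scheme above would be used end to end; it substitutes a plain numeric counter for the char-array counter, and the class name `TinyUrlStore` is illustrative, not from the original post.

```java
import java.util.HashMap;

// Self-contained sketch of the map-based scheme: a counter-derived key
// maps to the original URL. TinyUrlStore is an illustrative name.
public class TinyUrlStore {
    static final String fixedURLPart = "fixedURLPart";
    static final HashMap<String, String> urlMap = new HashMap<>();
    static long counter = 0; // simple numeric counter for illustration

    static String generateTinyURL(String inURL) {
        // Base-36 keeps the key short; counter++ guarantees uniqueness
        String tinyURL = fixedURLPart + Long.toString(counter++, 36);
        urlMap.put(tinyURL, inURL);
        return tinyURL;
    }

    static String getRealURL(String tinyURL) {
        return urlMap.get(tinyURL);
    }

    public static void main(String[] args) {
        String tiny = generateTinyURL("https://www.example.com/long/path");
        System.out.println(getRealURL(tiny)); // prints the original URL
    }
}
```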
That is not a critic's job. But, anyway, since you asked...
Try answering the following questions:
How will the user actually use the tiny url?
How many users can you sustain at a time?
How easily can you scale with an increasing user base?
What happens if the machine running the above piece of code crashes?
What happens if you reboot the machine, and a user then tries to use a tiny URL?
Would you restart generation of the tiny URLs after the process is restarted?
This is a scalable system design question (multiple machines, failure scenarios, persisting data reliably, etc.).
You cannot just answer it with code like this.
Thanks for the feedback. This is good to know. My solution only focuses on implementing the core URL-generation mechanism, which was my initial interpretation of the question. I agree it does not answer the scalability question.
Even without considering scalability, you are doing nothing about persistence (what if your machine reboots, etc.) and not considering possible multi-threading issues.
Also, the ids you generate are completely predictable, which may or may not be a good thing.
And once scalability comes into picture, your code will probably have to be thrown out, as you cannot expect to run that code on all machines.
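Two of the points above — predictable ids and multi-threading — could be addressed with random keys and a concurrent map. This is only a sketch of one option, not the commenter's prescription; all names are illustrative, and persistence is still out of scope.

```java
import java.security.SecureRandom;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: unpredictable ids via SecureRandom, thread safety via
// ConcurrentHashMap. Names are illustrative.
public class RandomKeyStore {
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    static final SecureRandom rng = new SecureRandom();
    static final ConcurrentHashMap<String, String> urlMap = new ConcurrentHashMap<>();

    static String randomKey(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    static String shorten(String longUrl) {
        while (true) {
            String key = randomKey(7); // 62^7 ~ 3.5e12 possible keys
            // putIfAbsent makes the check-and-insert atomic across threads;
            // on the rare collision we simply draw another key
            if (urlMap.putIfAbsent(key, longUrl) == null) {
                return key;
            }
        }
    }
}
```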
This is more of a design problem than a coding problem. I guess the key points to cover are:
1. web servers to handle URL-shortening requests and URL-access requests
2. a database to store the mapping from tiny URL to original URL
3. caching to reduce response latency and the load on the database
4. capacity: how much traffic is supposed to be served on average, and at peak?
5. scalability: how to survive a huge number of requests?
6. availability: how to survive any system failure: server crash, database crash, cache crash
7. security: how to handle malicious requests?
Seems a lot to discuss here.
- nixfish October 05, 2012
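One common approach tying the database point to key generation: let the database assign each URL an auto-increment numeric id, and use its base-62 encoding as the tiny path. This is a sketch under that assumption; the alphabet and method names are illustrative.

```java
// Sketch: if the database assigns each URL an auto-increment id, the
// tiny path can be that id written in base 62. Names are illustrative.
public class Base62 {
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encode a non-negative id as a base-62 string.
    static String encode(long id) {
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }

    // Decode a base-62 string back to the numeric id.
    static long decode(String s) {
        long id = 0;
        for (char c : s.toCharArray()) {
            id = id * 62 + ALPHABET.indexOf(c);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(125)); // prints "21" (2*62 + 1 = 125)
        System.out.println(decode(encode(9876543210L)) == 9876543210L); // prints true
    }
}
```

Encoding a short id keeps the tiny URL short, and the redirect handler just decodes the path back to the database key.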