Google Interview Question
Software Engineer / DevelopersCountry: Israel
Interview Type: In-Person
something like that:
std::string encode( int num )
{
static char alphabet[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
std::vector<char> sb;
const size_t base = sizeof(alphabet);
while ( num > 0 )
{
size_t index = num % base;
sb.push_back( alphabet[ index ] );
num /= base;
}
std::string result = alphabet;
std::reverse_copy( sb.begin(), sb.end(), result.begin() );
result[ sb.size() ] = '\0';
return result;
}
I am thinking about two big hashmaps.
One maps the long URL to the short one. The other maps the short one to the long one.
For each long URL, do the hash with SHA or something, if found in the map A, return the short URL. If not found, generate a new seq number (the latest + 1), convert this seq number to a string containing a-z0-9. For 6 char short URL, it supports 36^6 = 2.1 billion URL address. If adding chars like "_-", 38^6=3 billion
If there is one DB table with auto-increment ID for each new url, then it will become bottleneck, if millions of urls come as input simultaneously.
We could have N tables each having auto-increment ID and then partition the request coming may be on the num of characters or some such quick metric and map these to one of the N db tables.
These N tables would start their auto-increment from different start numbers.
This way we can parallelize the ID creation
While shortening the URL:
Store the Long URL in a DB table which has an auto-generated identity column (increments by 1 each record)
Next write an algorithm to convert the integer ID of this database record in to a base 62 number and then represent the number in base62:
where a-z are mapped to 0-25
A-Z are mapped to 26-51
and 0-9 are mapped to 52-61
essentially we want to say that the shortened URL would contain a-z; A-Z and/or 0-9 characters only
While retrieving the complete URL from Short URL; we need to decode the shortURL to get the DB ID out of it.
Example: if shortened URL was kkDN
k maps to 10 in our APPLICABLE_CHARS String
D maps to 29
and N maps to 39
then DB ID = 10 X 62^ 3 + 10 X 62^2 + 29 X 62 ^1 + 39 X 62 ^0
= 2383280 + 38440 + 1798 + 39
= 2423557
- lazycoder87 January 04, 2014