Nikhil
BAN USERIf one string maps to one hash value (kept as one bit), we need 4 billion bits to represent 4 billion strings. Compressing 8 bits in one byte, means 4 billion bits can be compressed to 0.5 billion bytes (i.e. 512 MB). This assumes that hashing scheme is ideal i.e. one string has exactly one hash in 0 to 2^32-1 range.
- Nikhil November 16, 2012You are right, can't use a single counter here. The problem with storing time stamps is memory consumption. A 27-bit tick value (each value corresponding to a millisecond during the day) for 1 billion entries alone will lead to ~3.5GB memory consumption. Another problem would be trying to keep the values sorted by time stamp. Insertions are expensive in data structures that provide this capability. If we don't maintain the sort order "upfront" on each insert, then 1bth search query becomes "expensive" since 1b elements need to be sorted. Secondly, how would we know that this condition has been reached?
One question that I would clarify from the interviewer is how reliable the system needs to be. For e.g. what happens if the server goes down? Do we still need to be able to tell the "real" 1bth user? Or do we restart? Or is "estimation" okay in these cases?
I like this solution. Assuming good hashing scheme, this can do 4 billion strings in 512MB memory (ideal case). This means ~64 billion strings in 8GB. If N becomes two large, a distributed hash table can be used to span the data across multiple servers. Another alternative could be to store the data on disk and maintain indexes for faster seeks and updates.
- Nikhil November 01, 2012This can be done using two linked lists. One denoting the longest increasing sequence "so far" (currentSeq). And another denoting the "next potential" increasing sequence (nextSeq). In the beginning, currentSeq = firstElementOnly and nextSeq = firstElementOnly. The moment nextSeq becomes longer than currentSeq, we do currentSeq = nextSeq. If an input element is encountered that is less than nextSeq.Tail, we do nextSeq = null. When the input ends, currentSeq will denote the longest sequence, and we just need to print it starting from currentSeq.Head.
- Nikhil November 01, 2012
Access can be time-limited to expire on a pre-set date
- Nikhil November 16, 2012