## Amazon Interview Question

**Position:** Software Engineer / Developer

**Country:** United States

**Interview Type:** In-Person

The maximum number of distinct timestamps is 24*60*60 (86400). Can we use an array of size 86400, converting each entry and exit time to a number of seconds and incrementing the array values over that range?

Anyway, with n people whose entry time is 00:00:01 and exit time is 23:59:59, that means updating/traversing nearly all 86400 slots n times.

As the file has lots of entry and exit times, N is large. What if N is 100k? Then you have to perform 86400*N updates. Not a good option!

Dude, I was only trying to optimize the space for the solution posted by Manan... an array of size 235959+1 is not necessary...

We have to parse the data at least once, so no choice there. The whole point of pre-processing is to reduce lookup time, and that's what we achieved.

I am sure that was the answer the interviewer was expecting. That's why he specifically mentioned that "he could call the function every second with input 11:05:20, 11:05:21, 11:05:22, 11:05:23 ... 14:30:30".

@Manan: I'm actually pretty sure that was not the solution the interviewer was expecting. He probably said he could call it every second to draw attention to the fact that it gets called a lot, so it should run fast (say, logN time)

@Anon: Ben is right. To find the number of people present at any time xx:yy:zz, just check the value of the array at index (xx*3600 + yy*60 + zz); looking up a value by index in an array is a constant-time operation.

Good solution, but the time complexity to set up this lookup table is O(n*86400) in the worst case, i.e. when all entries are 00:00:00 -> 23:59:59. It's expensive to set up but, as you said, returns results in constant time. Sorting takes O(n log n), and it would take an astronomically large n for log(n) to exceed 86400, so sorting is actually cheaper in some cases.

@chris: the interviewer explicitly states that the method could get called very often. So, sorting is not cheaper than pre-processing and constant lookup. To be more precise, assume that we will call this function m times a day.

1) This algorithm: O(n*86400) + m* O(1)

2) Using sorting: O(n log n) to sort once, plus O(n) per call, i.e. O(n log n + m*n)

Obviously, for even moderately large values of n and m, the first algorithm works better.

Do a merge sort... it will take O(n log n).

Then traverse along...

Before sorting, tag each time as an entry or an exit.

While traversing, add 1 for each entry and subtract 1 for each exit. You will get your answer.

example:

| entry time | exit time |
| --- | --- |
| 09:12:23 | 10:14:35 |
| 10:34:01 | 13:23:40 |
| 10:34:31 | 11:20:10 |

After Sorting:

09:12:23(entry)

10:14:35(exit)

10:34:01(entry)

10:34:31(entry)

11:20:10(exit)

13:23:40(exit)

So at any time, e.g. 11:00:00, the number of persons will be:

+1 -1 +1 +1 = 2
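The tagged-event sweep described above can be sketched in Java (a minimal sketch; the sample times are the ones from the worked example, and the class/method names are mine):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SweepLine {
    // Count people present at queryTime, given parallel entry/exit lists ("HH:MM:SS").
    static int countAt(List<String> entries, List<String> exits, String queryTime) {
        // Tag each time with +1 (entry) or -1 (exit), then sort all events together.
        List<String[]> events = new ArrayList<>();
        for (String e : entries) events.add(new String[]{e, "1"});
        for (String x : exits)   events.add(new String[]{x, "-1"});
        events.sort((x, y) -> x[0].compareTo(y[0])); // zero-padded HH:MM:SS sorts lexicographically
        int count = 0;
        for (String[] ev : events) {
            if (ev[0].compareTo(queryTime) > 0) break; // events after the query don't matter
            count += Integer.parseInt(ev[1]);
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> in  = Arrays.asList("09:12:23", "10:34:01", "10:34:31");
        List<String> out = Arrays.asList("10:14:35", "13:23:40", "11:20:10");
        System.out.println(countAt(in, out, "11:00:00")); // prints 2
    }
}
```

Note that each query re-scans the event list, which is exactly the O(n)-per-call cost the next comment objects to.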

For every call to this function, to find the number of people present at that time you need to traverse the full list, right? So the time complexity per call is O(n)... I don't think this is acceptable; O(log n) or O(1) would be better.

Definitely not acceptable. Why bother sorting if you're going to scan the entire data anyway?

We can keep two sorted arrays, built in O(n log n): one sorted by entry time and one by exit time.

For the queried time, find the index in each array using binary search: O(log n) per array.

The answer is the number of people who entered minus the number who exited.

Yeah, this is a possible solution, but since the file is huge, arrays of this size may not fit in memory.

Dates and times sort well with radix sort, whose time complexity is O(K*n), where K is the maximum length of an input string. Since K is small and fixed, we can sort the given array in essentially O(n) time rather than using merge sort. The rest of the solution is the same as above.

Actually, if the data is in a text log, we need to parse it to get the values.

So while parsing we can perform an insertion sort and carry on with the process.

We can use a self-balancing BST (AVL or similar):

* Build the tree keyed on each employee's entry time.

* Once you have the tree, find the subtree in which everyone entered the office before the input time (say this subtree has K entries).

* Visit every node of that subtree and count who left the office before the input time (say this number is L).

* Your answer is K - L.

I don't see how this is a good answer, let alone a perfect one.

First, building such a tree takes roughly O(n log n) time, which is the same as sorting.

Then it takes log(n) + n = O(n) time to search; e.g. searching for who's in the office at 23:59:59 makes you traverse the whole tree.

Finally, finding a subtree takes far longer than the O(1) time to find a sub-array or sub-list. Using a tree doesn't add any value; it actually adds overhead.

Use an augmented red-black tree to build an interval tree. Each node maintains an interval and a max endpoint value in addition to its pointers.

A search for a particular time can be performed in O(log n) time.

If there are, say, K intervals that overlap the query, reporting them takes O(K log n) time.

Finding one node whose interval overlaps takes O(log n) time; since there can be multiple such nodes, finding all of them takes O(K log n), where K is the number of overlapping nodes.

Procedure to find the nodes:

1. Build the red-black tree

2. Search for an overlapping node

3. If found, delete that node and search again for another

This is an output-sensitive algorithm.

Getting all the overlapping intervals in an augmented red-black tree is O(min(n, K log n)) without modification. You can't delete an element; otherwise the next query would not be able to find it. Please check the question, where it clearly says the data can be queried many times.

```
CalculatePresentPersons(N):          // N is the queried time, in seconds of day
  (1) Take an array D of size 86400 (60*60*24)
  (2) Initialize the whole array to zero
  (3) For each record a in the file:
        temp  = a->start->hour*3600 + a->start->min*60 + a->start->second
        temp2 = a->end->hour*3600   + a->end->min*60   + a->end->second
        for i = temp to temp2:
            D[i] = D[i] + 1
  (4) Return D[N]
  (5) Finish
```

P.S. This procedure runs once, and afterwards we can calculate the number of people present in the hall in O(1) time.

I know the initial complexity is high, but searching is O(1).

Read the file.

Store the data in an interval tree.

The interval is stored in seconds and represents the time spent in the room

( so need a function for converting HH:MM:SS to just S )

To compute the # of people in the room at time HH:MM:SS:

Convert the time to check to S

Get a list of all intervals that overlap the requested time.

the number of people in the room is equal to the size of this list.

IntervalTree provides a good balance between space and time here.

Can also be solved with a hashTable

Key is the time in seconds

Value is the number of people in the room at that time.

Time to populate the table is more expensive, as is the memory usage, but the lookup would be faster.

Use bucket sort at two levels.

Preprocessing:

1. Create one bucket per hour, so you have 24 buckets for login time.

2. For each login bucket created above, create another 24 buckets for logout time.

You have 24 (login) + 24*24 (logout) buckets in total.

Suppose the given time is 2:15.

You can forget about the 24 login buckets and look into the 576 logout buckets:

1. Start looking from bucket 2*24 + 3 = 51 (all logout times greater than 2:15), plus all items in buckets 2*24 + 4, 5, 6, ..., 24.

Suppose the given time is 6:20:

1. Start looking from bucket 6*24 + 7 = 151 (all logout times greater than 6:20), plus all items in buckets 6*24 + 8, 9, 10, ..., 24.

If you ignore the one-time preprocessing step, which is O(n) + O(n), the actual time taken is directly proportional to the number of hits.

The entry times will be in ascending order. Assume the exit times can be in any order, i.e. the first person to enter is not necessarily the first to exit.

Iterate over the records until entryTime > inputTime, increasing the count whenever entryTime < inputTime < exitTime.

```java
int CountPeople(Calendar inputTime) {
    int peopleInside = 0;
    try {
        BufferedReader br = new BufferedReader(new FileReader("filePath"));
        String line;
        while ((line = br.readLine()) != null) {
            Calendar entryTime = Calendar.getInstance();
            String entryTimeInString = line.substring(0, line.indexOf(" "));
            // fill the time in entryTime using entryTimeInString
            if (entryTime.after(inputTime)) {
                break; // entries are sorted, so no later record can match
            }
            Calendar exitTime = Calendar.getInstance();
            String exitTimeInString = line.substring(line.indexOf(" ") + 1);
            // fill the time in exitTime using exitTimeInString
            if (exitTime.after(inputTime)) {
                peopleInside++;
            }
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return peopleInside;
}
```

What if we call the CountPeople() function multiple times? You would be iterating over the huge file multiple times. Is there any way to optimize this?

If we were to call the function multiple times, it would be better to create a BST keyed on exit time (or entry time). Each node would also keep the other value.

For the input time, we can search for it and then count the number of people inside in the resulting subtree.

Since the file is large, reading it each time would be a problem. We could store the number of people at each given time in a hash table. Once the table is built, retrieving the number of people at a given time is O(1).

The size of the table would be 60*60*24. Generating such a hash table would be very costly. We can reduce the cost by maintaining a hash table of entries bucketed by 15 or 30 minutes. When the number of people is needed, the values in the relevant bucket can be iterated to find the exact number.

Sort the file by exit time, then apply binary search on the exit times to find:

the starting point, i.e. (exit time >= target time), and

the end point, i.e. (exit time <= target time).

All the intervals that lie between the start and end points are the answer.

Entry time is not required at all. Read in the exit times. Now create a map (it can be implemented using a 2D array) with (exit time, number of people) as the key/value pair. Once this map is created, sort it by key (exit time). Now the problem reduces to adding the values of all keys greater than the input time, which gives you the total number of people in the office at that time.

Entry time will be required.

Reason: let's say a person enters at 11:15 AM and exits at 11:20 AM. Now, we call our function at 11:10 AM.

According to your algorithm this person will be counted, which is wrong.

We can insert a check to see if the entry time was greater than the input time and, if so, not add the value. This would require two maps: one with (entry time, exit time) and the other as mentioned above.

What will the search complexity be?

And let's say this same function is called consecutively with the same arguments; then it will result in recomputing everything.

My answer is just above your post. Please go through it, and if you like it, let me know. Thanks.

Search complexity will be the same as binary search. I'm sorry, I can't comment on your solution; I'm not following it.

At storage time, partition the log file's times into blocks, say a 1pm-to-2pm block or even smaller. Multiple hash tables can be used to speed up access to these blocks.

Then at retrieval time, only the entries inside the relevant blocks (at most two) need to be scanned instead of all the entries.

I think the interviewer wants to check how you solve the problem in different ways. A few approaches come to mind.

1. First, add all elements into a map keyed on entry time. Then, each time you search for the people present at a particular time, find those who entered before that time and check that their exit time is not before it.

2. As the interviewer says, it is a huge file, so the above approach may not work because memory will not be sufficient to hold that much data. In that case, you can create multiple temporary files partitioned by entry-time slots (30 mins/1 hour). When a user queries a particular time, open the files whose entry times fall before that time, build maps, and search them. An LRU cache of n maps can be maintained, sized to available memory: search the cached maps first, then open other files and add their maps to the LRU.

Drawback: if the requested time misses the LRU every time, this is less effective, since the time spent building each map is wasted.

3. Create multiple files with 30 min/1 hr time slots, i.e. 11:00-11:30, 11:30-12:00, etc., and add the entries that fall into those slots. Then, whenever there is a request, open the file, build the map, and search it, maintaining an LRU as above.

Drawback: same as above; if the requested time misses the LRU every time, the map-building time is wasted.

Create two arrays: one based on entry time, say A, and one based on exit time, say B. Sort both arrays.

To find out how many people are present, perform a binary search in each array (the kind that returns the index of the first element greater than or equal to the given value). The index returned from A minus the index returned from B is the answer.

I think an insertion sort using a linked list will give better performance than merge sort and the other methods.

I assume the data is already sorted by entry time.

1. Partition the data by entry time, creating hourly partitions (the partition size can be configurable; as an extra optimization, night-time hours can be grouped into a single partition). Ideally a partition should fit in memory.

2. For each partition, create an index of exit times: for each hour keep a) a list of references that have an exit time in this hour, b) a bitmap index representing items in the main list, and c) an interval-tree index.

3. When a request comes, query only partitions <= the current time; inside a partition, query buckets >= the current time and do more precise filtering by entry time.

4. If resources allow, distribute partitions to different machines and execute the query in parallel.

1. Read the file and populate two arrays: long[] personIn and long[] personOut. personIn represents the first column of the data file and personOut the second. Each time must be converted to a long!

2. Sort personOut in ascending order. Assume personIn is already sorted ascending (the log file is written in ascending order).

3. Write a function that efficiently finds the position of a long element in a sorted array using binary search: positionInAscSortedArray(long[] arr, long element).

4. Calculate the result:

positionInAscSortedArray(personIn, timeAsLong) - positionInAscSortedArray(personOut, timeAsLong)
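Steps 3-4 can be sketched as below. Here `positionInAscSortedArray` is taken to return the number of elements less than or equal to the given value (the insertion point); the sample arrays are the worked example's times converted to seconds:

```java
public class BinarySearchCount {
    // Number of elements in sorted arr that are <= element (the insertion point).
    static int positionInAscSortedArray(long[] arr, long element) {
        int lo = 0, hi = arr.length; // search in the half-open range [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (arr[mid] <= element) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] personIn  = {33143, 38041, 38071}; // 09:12:23, 10:34:01, 10:34:31 (sorted)
        long[] personOut = {36875, 40810, 48220}; // 10:14:35, 11:20:10, 13:23:40 (sorted)
        long t = 39600;                           // query 11:00:00
        int present = positionInAscSortedArray(personIn, t)
                    - positionInAscSortedArray(personOut, t);
        System.out.println(present); // prints 2
    }
}
```

With these semantics, a person whose exit time equals the query time counts as already gone; swap `<=` for `<` if inclusive behavior is wanted.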

One solution I can think of is to keep 24 nodes (one per hour). Each of these 24 nodes keeps a list of 60 child nodes (one per minute in that hour), and each minute node in turn keeps 60 nodes under it (one per second in that minute). So the internal representation has three levels: hours, minutes, and seconds. Each of these 24 + 24*60 + 24*60*60 nodes has a counter variable indicating how many people are in the office during that span.

Insertion:

========

The technique is that we increment an hour node's counter only if the person is in the office for that complete hour. Similarly, we increment a minute node of a particular hour only if the person is in the office for that complete minute (but not the whole hour). And we increment a second counter of a minute only if the person is not inside for the complete minute, just for some seconds within it.

.

Example: 09:55:50 - 15:05:05

in this case the data structure is updated as follows:

9-10: we will not increment the counter for hour 9, as the person is not inside for the complete hour. We increment the minute counters 56, 57, 58, 59 under hour 9, and similarly the second counters 50-59 under minute 55. No other counters need updating for this hour.

10-11 We will increment hour 10 node only as we are inside for the whole hour.

11-12 We will increment hour 10 node only as we are inside for the whole hour.

12-13 We will increment hour 10 node only as we are inside for the whole hour.

13-14 We will increment hour 10 node only as we are inside for the whole hour.

14-15 We will increment hour 10 node only as we are inside for the whole hour.

15-16 Again we are not inside for the whole hour, so we do not increment the hour node, only the minute and second nodes, as in the 9-10 case.

So using this technique we change at most a constant number of nodes (at most 22 hour nodes, plus up to 2*59 minute nodes and 2*59 second nodes) in the worst case per employee, which is not bad considering that we build the data structure only once and it makes the search operation constant time.

Search

=======

Now, when searching for the number of employees at, say, 10:20:30, we go to node 10 at the first level and read its counter: it counts all employees who were inside for the whole hour from 10 to 11. Next we go to minute node 20 at the second level, which counts the employees present for that whole minute during that hour. Then we go to second node 30 at the third level, whose counter counts the employees inside during just that second of that minute. Adding these three counters gives the result.

So it just took us O(3), i.e. constant, time.

Space complexity:

==============

We need to store 24*60*60 second nodes, plus the 24 hour nodes and 24*60 minute nodes. Each node stores a 4-byte counter (assuming an unsigned int on a 32-bit machine). We can avoid using pointers to child nodes through some clever array arithmetic, since the number of nodes is constant and known.

So the total requirement is roughly 24*60*60*4 bytes = 337.5 KB, which is not large at all considering the search complexity we achieve with this solution.
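The insert and search rules above can be sketched with flat arrays standing in for the three node levels (a sketch under the stated assumptions; the class and method names are mine):

```java
public class HierarchicalCounters {
    // One counter per full hour, per full minute within an hour, and per second.
    private final int[] hour = new int[24];
    private final int[][] minute = new int[24][60];
    private final int[][][] second = new int[24][60][60];

    // Record a stay covering seconds [from, to], given as absolute seconds of day.
    void insert(int from, int to) {
        for (int s = from; s <= to; ) {
            if (s % 3600 == 0 && s + 3599 <= to) {     // a whole hour is covered
                hour[s / 3600]++;
                s += 3600;
            } else if (s % 60 == 0 && s + 59 <= to) {  // a whole minute is covered
                minute[s / 3600][(s / 60) % 60]++;
                s += 60;
            } else {                                    // a lone second at a ragged edge
                second[s / 3600][(s / 60) % 60][s % 60]++;
                s++;
            }
        }
    }

    // People present at second t: sum the counters along the hour/minute/second path.
    int query(int t) {
        int h = t / 3600, m = (t / 60) % 60, s = t % 60;
        return hour[h] + minute[h][m] + second[h][m][s];
    }
}
```

For the 09:55:50 - 15:05:05 example, `insert(35750, 54305)` bumps seconds 50-59 of minute 9:55, minutes 9:56-9:59, hours 10-14, minutes 15:00-15:04, and seconds 0-5 of minute 15:05, matching the scheme described above.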

Nice algorithm, but one point is not clear: while inserting data into the data structure as shown in the example, why is the counter for hour 10 always incremented? As far as I can tell, it should be:

10-11: We will increment hour node 10, as we are inside for the whole hour.

11-12: We will increment hour node 11, as we are inside for the whole hour.

12-13: We will increment hour node 12, as we are inside for the whole hour.

13-14: We will increment hour node 13, as we are inside for the whole hour.

Please suggest and clarify.

You are correct, it's a copy-paste error :). What you describe is exactly what should happen here.

How is this solution different from the array-based solution proposed by @Manan, where he stores the info in a 24*60*60 = 86400 array?

The difference is in how we store the information. In my solution you store it in a tree-like data structure and represent that tree as an array in memory. So if a person is in the office for 5 hours from 09:00:00 to 14:00:00, my solution updates only 5 hour-node values, whereas @Manan's solution updates 5*60*60 = 18000 values, one for every second in that span. So insertion is almost constant time in my solution.

What I can think of is not much different from a few of the above solutions, but I do think lookups should not require any calculation. In problems that involve querying many times, we generally talk about the cost of pre-processing and the cost of querying.

Suppose a solution takes f[n] time to pre-process (a one-time job) and g[n] time per query; then we say the complexity is <O(f[n]), O(g[n])>. In such solutions our main aim is to optimize g[n], since that part will be called an unlimited number of times.

So, let's start with the pre-processing steps:

1) Allocate an array of size 24 * 60 * 60 and initialize it with -1. (This is the basic requirement for making g[n] constant, ideally 1.)

2) Store 0 at the first index of the array.

3) Traverse the entry column of the log file, converting each time encountered into seconds, say s. Find the last non -1 entry in the array before s, say at index k. Fill the indices between k and s with A[k], then set A[s] = A[k] + 1.

4) Now traverse the exit column, converting each time into seconds, and maintain a counter for the number of exits seen so far, initially 0. For an exit at second p, increment the counter and subtract it from every index from p up to the next exit's index, and so on. The whole thing is done in two passes, so O(n) plus the array size.

Now you can look up any time by converting it into seconds and returning A[seconds].

As per the question, the minimum time resolution is 1 second, so this conversion method suffices:

```java
private static int[] a = new int[24 * 60 * 60]; // lookup table, pre-filled with -1

private static int convertIntoSeconds(String s) {
    int hour = Integer.parseInt(s.substring(0, 2));
    int minute = Integer.parseInt(s.substring(3, 5));
    int second = Integer.parseInt(s.substring(6, 8));
    return 3600 * hour + 60 * minute + second;
}

private static void preProcessEntry(String[] entry, String[] exit) {
    int k = 0;
    int lastCount = 0;
    int j = 0;
    // Pass 1: carry the previous running count forward across each gap,
    // then record one more person as of each entry second.
    for (String timeEntry : entry) {
        k = convertIntoSeconds(timeEntry);
        for (j = k - 1; j > 0; j--) {  // find the last filled slot before k
            if (a[j] != -1) {
                lastCount = a[j];
                break;
            }
        }
        for (; j < k; j++) {           // fill the gap with the carried count
            a[j] = lastCount;
        }
        a[k] = ++lastCount;            // one more person as of second k
    }
    // Pass 2: between consecutive exits, subtract the cumulative number of exits.
    int k1 = convertIntoSeconds(exit[0]);
    lastCount = 0;
    for (int z = 1; z < exit.length; z++) {
        k = convertIntoSeconds(exit[z]);
        lastCount++;                   // exits seen so far
        for (j = k1; j < k; j++) {
            a[j] -= lastCount;
        }
        k1 = k;
    }
    // Note: seconds at or after the last exit are not adjusted here.
}
```

Assume we have already converted the entry and exit times to seconds.

Take two arrays, intime[] and outtime[], and an answer array of size 86400. Assuming intime is already sorted, sort outtime in ascending order.

Check the first outtime value, say x. Count the number of intime entries before x and fill the first x-1 slots of the answer array with that count; the count is the number of people inside at any of those seconds.

Set the xth entry to count - 1.

Repeat: check the second outtime entry, count the intime entries from second x+1 up to (second_outtime_entry - 1), and fill the slots from x+1 to (second_outtime_entry - 1) with the count at second x plus the newly counted entries, and so on.
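As a side note (this variant is not from the thread): the same per-second table can be built in O(n + 86400) with a standard difference array, adding +1 at each entry second and -1 just past each exit second, then taking a running sum:

```java
public class DifferenceArray {
    static final int DAY = 24 * 60 * 60; // 86400 seconds

    // entry[i] and exit[i] are the i-th person's times in seconds of day, inclusive.
    static int[] buildCounts(int[] entry, int[] exit) {
        int[] diff = new int[DAY + 1];   // one extra slot for exits at 23:59:59
        for (int i = 0; i < entry.length; i++) {
            diff[entry[i]]++;            // person appears at the entry second
            diff[exit[i] + 1]--;         // and is gone the second after the exit
        }
        int[] counts = new int[DAY];
        int running = 0;
        for (int s = 0; s < DAY; s++) {
            running += diff[s];
            counts[s] = running;         // people present during second s
        }
        return counts;
    }
}
```

After this one-time build, each lookup is a single array read, regardless of the order of the input times.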

The data is static. The file is huge, but how huge?

There are only ~86k discrete seconds possible.

Is there a sign-in/sign-out pattern? Are the times localized? Would we achieve any space optimization by using an interval tree?

If so, should we consider memoization on the fly instead of preprocessing everything?

Any optimization will have to consider the context and decide.

This problem is similar to counting the vacancies at a parking station. In the real world, you build the running total in real time rather than computing it at the end of the day, because 1. it spreads out the CPU cost (making it almost negligible), and 2. it's more useful: wouldn't you want to know how many people are in the office at any time of day, rather than reading the report the next day, knowing that the running-total approach can generate the same report?

PAPP. If you can change the problem, you can save a lot of time in algorithm, program and process.

```java
import java.io.*;
import java.util.*;

public class EntryExitCode {

    int entryPointer = -1;
    int employeeCount = 0;                      // people currently inside
    TreeMap<String, Integer> employeeCountAtAnyTime = null;
    ArrayList<String> entryList = null;
    PriorityQueue<String> exitQueue = null;     // zero-padded HH:MM:SS orders correctly as String

    void createEntries(String file) {
        employeeCountAtAnyTime = new TreeMap<String, Integer>();
        entryList = new ArrayList<String>();
        exitQueue = new PriorityQueue<String>();
        String temp;
        String[] tempArray;
        FileReader reader = null;
        BufferedReader brReader = null;
        try {
            reader = new FileReader(new File(file));
            brReader = new BufferedReader(reader);
            temp = brReader.readLine();
            while (temp != null) {
                tempArray = temp.split(" ");
                entryList.add(tempArray[0]);
                entryPointer++;
                employeeCount++;
                exitQueue.add(tempArray[1]);
                checkExitStatus();
                temp = brReader.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (brReader != null) brReader.close();
                if (reader != null) reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    // Pop every exit that happened no later than the latest entry,
    // recording the head count after each departure.
    private void checkExitStatus() {
        String latestEntry = entryList.get(entryPointer);
        while (!exitQueue.isEmpty() && latestEntry.compareTo(exitQueue.peek()) >= 0) {
            String exitTime = exitQueue.poll();
            employeeCount--;
            employeeCountAtAnyTime.put(exitTime, Integer.valueOf(employeeCount));
        }
    }

    public static void main(String[] args) {
        EntryExitCode eec = new EntryExitCode();
        eec.createEntries("EntryExitFile.txt");
        // System.out.println(eec.employeeCountAtAnyTime);
        // Query: take the count recorded at the next exit at or after the given time
        // (ceilingKey returns null if no exit follows the query time).
        String ceilKey = eec.employeeCountAtAnyTime.ceilingKey("11:19:20");
        System.out.println(eec.employeeCountAtAnyTime.get(ceilKey));
    }
}
```

Let me give a completely different strategy, assuming N is very large (millions or billions).

- loveCoding June 06, 2012

Assume time is between 00:00:00 and 23:59:59.

Since we want to optimize lookup time, we need to preprocess the data.

Below is the strategy to achieve constant-time lookup:

1. Initialize an array with 86400 elements, all set to 0.

2. Now, for every second between the start time and end time, increment the array entry by 1; e.g. for 00:00:00 (start) and 00:00:59 (end), array indices 0 to 59 are all incremented by 1.

3. Do this for all entries.

4. Now, to find the number of people present in the office, just refer to the array index (CONSTANT TIME).
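Steps 1-4 can be sketched as follows (a minimal sketch; as discussed earlier in the thread, each per-person update costs time proportional to the length of the stay, which is the O(n*86400) worst-case preprocessing cost, while each lookup is O(1)):

```java
public class PresenceTable {
    static final int DAY = 86400; // 24 * 60 * 60 seconds

    private final int[] present = new int[DAY]; // people present during each second

    // Steps 2-3: for every second between start and end (inclusive), add one person.
    void add(int startSec, int endSec) {
        for (int s = startSec; s <= endSec; s++) {
            present[s]++;
        }
    }

    // Step 4: constant-time lookup.
    int countAt(int sec) {
        return present[sec];
    }
}
```

One practical refinement: replacing the inner loop with a +1 at the start and a -1 just past the end, followed by a single running-sum pass, would bring preprocessing down to O(n + 86400) without changing the O(1) lookup.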