Amazon Interview Question
Software Engineer / Developers
Team: Aggregation
Country: India
Interview Type: Phone Interview
We can keep an int array sized to the number of Unicode characters. Since an upper limit of 1 million characters is given, an int array is sufficient. Also keep a running count, incremented as each character comes in. If the array entry for the character is already > 0, set it to zero; if it is zero, set it to the current count. Once the stream is finished, iterate through the int array to find the lowest nonzero value. That index gives the character.
@mani: if all the characters in the stream are unique, the insertions into the heap cost O(n log n) by the end of the stream.
@bharat: Technically, in this example the characters are known to be of type "char", which by definition (and assuming the full range of the data type is used) has only 256 possible values. Even assuming Unicode, the number of distinct values is (relatively) small compared with the total in the stream.
Still, for this case it is reasonable to argue that a better data structure than a heap exists (even a priority queue under the 8-bit constraint). There is not enough computation or memory usage here (in most cases) to be overly worried about performance.
Why a heap? A queue is sufficient to keep track of the first occurrence of each letter. Keep a count map as well; at the end, just dequeue and check in the count map whether each character's count is 1.
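A minimal sketch of the queue-plus-count-map idea, assuming single-byte characters. The function name `firstUniqueWithQueue` and the use of a `std::string` in place of the real stream are illustrative assumptions, not part of the question.

```cpp
#include <array>
#include <queue>
#include <string>

// Returns the first non-repeated byte of `stream`, or -1 if there is none.
// `stream` is a std::string standing in for the real character stream (assumption).
int firstUniqueWithQueue(const std::string& stream) {
    std::array<int, 256> count{};      // the "count map": one slot per byte value
    std::queue<unsigned char> order;   // each distinct byte, in order of first arrival

    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (count[ch]++ == 0) {
            order.push(ch);            // enqueue only on the first occurrence
        }
    }

    // Dequeue until the front is a byte that was seen exactly once.
    while (!order.empty() && count[order.front()] != 1) {
        order.pop();
    }
    return order.empty() ? -1 : order.front();
}
```

Since each distinct byte is enqueued once, the queue never holds more than 256 entries, and the whole pass is O(n).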
OK, I see an optimization possible with the heap. When you see that a character is repeated, you can delete it from the heap. In the end, you can simply return the root.
Deletion will take O(n) time in this case. Instead, you can use a circular array, where the head points to the first character that came in the stream, and each character seen for the first time is added at the tail. When a character is repeated, dirty its position in the circular array by replacing it with -1; if that character is the one pointed to by the head, advance the head pointer until it reaches a non -1 value.
So: one array (hash) holding the current status of each character, and one circular array whose head points to the first non-repeated character.
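The head/tail scheme above might look roughly like this for single-byte characters. All names here (`FirstUniqueTracker`, `feed`, `current`, `firstUniqueWithHead`) are hypothetical. One simplification is assumed: each distinct byte is appended at most once, so a plain 256-slot array never needs to wrap around.

```cpp
#include <array>
#include <string>

constexpr int kUnseen = -1;    // byte has not appeared yet
constexpr int kRepeated = -2;  // byte has appeared more than once

struct FirstUniqueTracker {
    std::array<int, 256> slotOf;  // per byte: its slot in `order`, or kUnseen / kRepeated
    std::array<int, 256> order;   // bytes in first-arrival order; -1 marks a dirtied slot
    int head = 0;                 // first slot that is not dirtied (or == tail)
    int tail = 0;                 // next free slot

    FirstUniqueTracker() {
        slotOf.fill(kUnseen);
        order.fill(-1);
    }

    void feed(char raw) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (slotOf[ch] == kUnseen) {
            slotOf[ch] = tail;          // first occurrence: append at the tail
            order[tail++] = ch;
        } else if (slotOf[ch] != kRepeated) {
            order[slotOf[ch]] = -1;     // second occurrence: dirty its slot
            slotOf[ch] = kRepeated;
            while (head < tail && order[head] == -1) {
                ++head;                 // skip dirtied slots
            }
        }
    }

    // First unique byte seen so far, or -1 if there is none.
    int current() const { return head < tail ? order[head] : -1; }
};

// Convenience wrapper; a std::string stands in for the stream (assumption).
int firstUniqueWithHead(const std::string& stream) {
    FirstUniqueTracker tracker;
    for (char c : stream) tracker.feed(c);
    return tracker.current();
}
```

A side benefit of this layout is that `current()` can be asked at any point in the stream, not only at the end.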
Like the other solutions, keep a HashMap. As each character is encountered, add it to the HashMap with key = character and value = index where found (keep an int counter as you iterate over the stream). Once you find that a character is a duplicate, set its value in the HashMap to -1. At the end, simply iterate over the HashMap (which has at most as many entries as the character set) and select the character with the lowest non-negative value.
Assuming a single-byte encoded stream, you can solve this problem in linear time
and constant space by using a counting table, in the style of counting sort: an
array of records that store the count and the first position of each encountered character.
As soon as the stream has no more characters in it, we simply exit the while loop,
which has already built the stream statistics for us: the count of each character
and its first occurrence position in the stream.
So the problem is reduced to finding the character that has occurred only once
and that has the smallest first-occurrence position (the first unique).
================================================================================
#include <cstdio>

/* Stream interface assumed by the question; supplied elsewhere. */
bool hasNextChar();
char getNextChar();

static const int STAT_SIZE = 256;
static const int INVALID_POS = -1;

struct char_pos {
    int count;
    int position;
    char_pos() : count(0), position(INVALID_POS) {}
};

int main()
{
    char_pos stream[STAT_SIZE];
    int pos = 0;

    /* Accumulate the stream statistics */
    while (hasNextChar()) {
        /* Cast to unsigned char so bytes >= 0x80 do not index negatively */
        unsigned char ch = (unsigned char) getNextChar();
        char_pos *chstat = &stream[ch];
        chstat->count++;
        if (chstat->position == INVALID_POS)
            chstat->position = pos;
        ++pos;
    }

    /* Iterate over the statistics and find the first unique character */
    int chunique = 0;
    pos = INVALID_POS;
    for (int i = 0; i < STAT_SIZE; ++i) {
        if (stream[i].count == 1) {
            int curpos = stream[i].position;
            /* Update the character together with its position, so that
               chunique always matches the smallest position found so far */
            if (INVALID_POS == pos || curpos < pos) {
                pos = curpos;
                chunique = i;
            }
        }
    }

    if (INVALID_POS == pos)
        printf("No unique character found in the stream\n");
    else
        printf("The first unique character: 0x%02x found at position %d\n", chunique, pos);
    return 0;
}
Maintain two bits of state and the first position for each char (there are at most 256 chars). Since we expect 1M characters, a position needs only 20 bits, so a 256-element array of 32-bit ints is enough: use the last two bits of each entry for the count (0, 1, or >1) and the remaining bits for the position.
Once the array is filled, finding the required char is easy: scan for the entry with count 1 and the smallest position.
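A rough sketch of the packed layout, assuming the low two bits hold the state and the upper 30 bits hold the first position. The name `firstUniquePacked` and the `std::string` stand-in for the stream are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Returns the first unique byte of `stream`, or -1 if there is none.
// Each table entry is (first position << 2) | state, where
// state is 0 = unseen, 1 = seen once, 2 = seen more than once.
int firstUniquePacked(const std::string& stream) {
    std::array<std::uint32_t, 256> packed{};   // all entries start as "unseen"
    std::uint32_t pos = 0;

    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        std::uint32_t state = packed[ch] & 3u;
        if (state == 0) {
            packed[ch] = (pos << 2) | 1u;             // record position, mark "once"
        } else if (state == 1) {
            packed[ch] = (packed[ch] & ~3u) | 2u;     // keep position, mark "repeated"
        }
        ++pos;
    }

    int best = -1;
    std::uint32_t bestPos = 0;
    for (int c = 0; c < 256; ++c) {
        if ((packed[c] & 3u) == 1u) {
            std::uint32_t p = packed[c] >> 2;
            if (best < 0 || p < bestPos) {
                best = c;
                bestPos = p;
            }
        }
    }
    return best;
}
```

With 30 bits left for the position, this layout has plenty of headroom for a 1M-character stream.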
1. Take two integer arrays A and B of size 256 (or as many characters as are possible).
2. For each character in the stream,
a. Increment its count in array A, then do one of the following:
b. If the count is 1, store the position of the character in B.
c. If the count is 2, reset the position of the character in B (e.g. to -1).
d. If the count is > 2, do nothing.
At the end of the stream, get the index of B with the minimum valid position value.
Instead, take two arrays: one boolean, one int.
If count > 1, make boolean[ch] = false and position[ch] = -1,
then at the end print the min of position[ch] (>= 0).
Where do you store the count? You cannot start with a default of false in the boolean array, as one cannot differentiate between count = 0 and count > 1.
The boolean starts as false. The first time a char comes, set position = current_pos and the boolean to true. The next time it comes, the boolean is already true, so set position = -1 and keep the boolean true for subsequent visits.
We can actually solve this with one integer array A, initialized to -1.
If we are seeing a char for the first time (A[char] == -1), set A[char] = position.
If we are seeing a char for the second time (A[char] > -1), set A[char] = -2.
If we have already seen a char more than once (A[char] == -2), do nothing.
At the end, take the min position, ignoring all elements < 0.
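The single-array rules above translate almost line for line into code. This is a sketch assuming single-byte characters; `firstUniqueSingleArray` is a made-up name, and a `std::string` stands in for the stream.

```cpp
#include <array>
#include <string>

// Returns the first unique byte of `stream`, or -1 if there is none.
int firstUniqueSingleArray(const std::string& stream) {
    std::array<int, 256> A;
    A.fill(-1);                       // -1 = never seen

    int position = 0;
    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (A[ch] == -1) {
            A[ch] = position;         // first time: remember where
        } else if (A[ch] > -1) {
            A[ch] = -2;               // second time: mark as repeated
        }                             // already -2: do nothing
        ++position;
    }

    // Take the byte with the minimum position, ignoring all entries < 0.
    int best = -1;
    for (int c = 0; c < 256; ++c) {
        if (A[c] >= 0 && (best < 0 || A[c] < A[best])) {
            best = c;
        }
    }
    return best;
}
```

Using two distinct negative values is what keeps a third occurrence from being mistaken for a first one.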
I think we should first pin down the encoding of the string (i.e. the number of bits that represent one character).
If the encoding is ASCII, we can use two bitmaps (each with 256 bits) to solve this problem. Bitmap A remembers whether the character has appeared once, and bitmap B remembers whether the character has appeared more than once. We need another integer array (capacity 256) to remember the last time a character appeared, and then the problem is solved. Time complexity is O(n), space complexity is O(1) (256 bits * 2 + sizeof(int) * 256).
If the encoding needs many bits to represent a character, we can use a multi-level bitmap structure to reduce the memory cost.
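For the single-byte case, the two-bitmap idea could be sketched with `std::bitset`. The name `firstUniqueWithBitmaps` and the `std::string` stand-in for the stream are assumptions; the multi-level bitmap for wider encodings is not shown.

```cpp
#include <array>
#include <bitset>
#include <string>

// Returns the first unique byte of `stream`, or -1 if there is none.
int firstUniqueWithBitmaps(const std::string& stream) {
    std::bitset<256> seenOnce;        // bitmap A: byte has appeared at least once
    std::bitset<256> seenAgain;       // bitmap B: byte has appeared more than once
    std::array<int, 256> lastPos{};   // most recent position of each byte

    int pos = 0;
    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (seenOnce[ch]) {
            seenAgain[ch] = true;
        } else {
            seenOnce[ch] = true;
        }
        lastPos[ch] = pos++;
    }

    // For a byte that appeared exactly once, the last position is also its first.
    int best = -1;
    for (int c = 0; c < 256; ++c) {
        if (seenOnce[c] && !seenAgain[c] &&
            (best < 0 || lastPos[c] < lastPos[best])) {
            best = c;
        }
    }
    return best;
}
```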
1) First we need to store the index of all unique characters in an array of fixed size.
2) Now find the smallest index from this array.
Here is the code
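#include <climits>

/* Stream interface assumed by the question; supplied elsewhere. */
bool hasNextChar();
char getNextChar();

int find_first_unique_index()
{
    int index, small_idx = INT_MAX, temp;
    int array[256];   /* int, not char: indices run up to 1 million */
    unsigned char c;  /* unsigned so bytes >= 0x80 index correctly */

    for (index = 0; index < 256; index++) {
        array[index] = 0;
    }
    index = 1;
    while (hasNextChar()) {
        c = (unsigned char) getNextChar();
        if (array[c] > 0) {
            /* Invalidate the index */
            array[c] = -array[c];
        } else if (array[c] == 0) {
            array[c] = index;
        }
        index++;
    }
    /* Now find the smallest valid index */
    for (index = 0; index < 256; index++) {
        temp = array[index];
        if (temp > 0 && temp < small_idx)
            small_idx = temp;
    }
    /* 1-based index of the first unique character, or INT_MAX if none */
    return small_idx;
}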
// array[] holds one state per character: 0 = unseen, 1 = seen once, 2 = repeated
while (stream.hasNextChar())
{
    char current_char = stream.readChar();
    if (array[current_char] == 0)
    {
        array[current_char] = 1;
        Linked_Hash_Set.add(current_char);
    }
    else if (array[current_char] == 1)
    {
        array[current_char] = 2;
        Linked_Hash_Set.remove(current_char);
    }
}
return the first element of the Linked_Hash_Set
Since the number of characters is constant (ASCII: 128, extended ASCII: 256, 16-bit Unicode: 65,536), the storage space required for the linked hash set and the array is constant.
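C++ has no built-in linked hash set, but for single-byte characters the same behaviour can be approximated with a `std::list` plus a table of iterators into it. This is only a sketch under that assumption; `firstUniqueWithLinkedSet` is a hypothetical name and a `std::string` stands in for the stream.

```cpp
#include <array>
#include <list>
#include <string>

// Returns the first unique byte of `stream`, or -1 if there is none.
int firstUniqueWithLinkedSet(const std::string& stream) {
    std::array<int, 256> state{};      // 0 = unseen, 1 = seen once, 2 = repeated
    std::list<unsigned char> unique;   // currently-unique bytes, in arrival order
    // where[ch] is the list node of ch; only meaningful while state[ch] == 1.
    std::array<std::list<unsigned char>::iterator, 256> where;

    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (state[ch] == 0) {
            state[ch] = 1;
            where[ch] = unique.insert(unique.end(), ch);   // append, remember the node
        } else if (state[ch] == 1) {
            state[ch] = 2;
            unique.erase(where[ch]);                       // O(1) removal by iterator
        }
    }
    return unique.empty() ? -1 : unique.front();
}
```

Both the append and the removal are constant time, so the front of the list is always the current first unique character.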
I have a solution for this.
We can maintain an array (of size 256) into which an element is inserted if not present, and from which it is deleted if already present (shifting the remaining elements down after deletion).
When the 1M chars have been scanned, the element at index 0 of the array will be the first unique element.
This method is more expensive, since every element in the stream needs up to 256 comparisons in the worst case.
There can be 3 states for each character:
0 - Not present in the stream at all
1 - Present once in the stream (this is the one we are interested in)
2 - Present more than once in the stream
To represent the above 3 values, 2 bits are enough,
so allocate an array of (256*2) bits if characters are represented by ASCII values.
For any character 'x' in the stream, bits 2*asciivalueof('x') and 2*asciivalueof('x')+1 of the array represent its appearance state.
1. Scan the full stream one character at a time, setting these states in the array.
2. Then scan the array to find the value 1 and return the index.
Space needed: constant, O(1)
Time needed: O(n)
Could you please explain how your method ensures that the index is that of the "first unique" element in the stream?
Using a single array can only tell whether a character was repeated or not. It won't help us find which is the first unique character.
So I suggested that, in addition to this array, we build an ascending heap (a binary tree with the lowest value at the root). As you read the stream, when a character is encountered, insert that character's position (its index in the stream, which according to the problem is 0, 1, 2, ... up to a million).
struct Node
{
    unsigned long int index; // This is the key
    char c;
};
When the same character is encountered again in the stream, do not insert it into the heap. To do this, we can keep an array of bools recording whether the character has already been added.
After all the characters in the stream have been read, take the node at the root of the heap and check whether that character was unique or repeated. If it was repeated, delete that node from the heap and take the next smallest node. Again check in the array whether it was repeated. Continue until you find a character that is unique. That is your answer.
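One way the heap approach might be sketched, using `std::priority_queue` as the min-heap and a `std::pair` in place of the `Node` struct. The name `firstUniqueWithHeap` is hypothetical, a `std::string` stands in for the stream, and a count array replaces the separate bool array (count > 0 means "already added").

```cpp
#include <array>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns the first unique byte of `stream`, or -1 if there is none.
int firstUniqueWithHeap(const std::string& stream) {
    using Node = std::pair<unsigned long, unsigned char>;   // (index, character); index is the key
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> heap;   // smallest index on top
    std::array<int, 256> count{};                           // occurrences of each byte

    unsigned long index = 0;
    for (char raw : stream) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (count[ch]++ == 0) {
            heap.push({index, ch});    // insert only the first occurrence
        }
        ++index;
    }

    // Pop from the root until the character there occurred exactly once.
    while (!heap.empty() && count[heap.top().second] != 1) {
        heap.pop();
    }
    return heap.empty() ? -1 : heap.top().second;
}
```

Because only first occurrences are pushed, the heap holds at most 256 nodes for single-byte input, so the heap work stays bounded regardless of stream length.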
If you want to use an array, then instead of storing the count for each character, store the index as the value, and if you find a character repeating (i.e. the corresponding value in the array is already non-negative), set it to -1. After the entire stream is over, the minimum value >= 0 in the array is the index of the first non-repeating character.
We should make use of a hash. There can be 3 states for each character:
0 - Not present in the stream at all
1 - Present once in the stream
2 - Present more than once in the stream
Finally, whichever characters have state 1 are unique. But we want to know the first unique character. For this we can keep an ascending heap. The key to this heap should be the index of the first occurrence of the character in the stream. There should not be any duplicate characters in the heap; to ensure this, keep another array recording whether the character has already been added.
After all the characters in the stream have been read, take nodes from the root and check whether the state of the character is 1 in the hash array we have kept. If not, go to the next min node in the heap and repeat.
Time complexity:
O(n) for hashing into the array.
Insertion into the heap happens only for distinct characters.
O(n) should be the answer.
- Mani November 15, 2012