Algorithm: khanji has a LOT of

Gold-Tier Interview Question for Software Engineer / Developers

0

of 0 votes

21
Answers
Algorithm: khanji has a LOT of characters in their alphabet and are using a way of compressing it so that the most common characters are 1 byte and the less common are 2 bytes. If it's a '1 byte' character, it starts with a 0, If it's a 2 byte character, it starts with a 1. If you're at a particular character (at the beginning), how do you find where the previous character begins?
- Gayle L McDowell April 04, 2005 | Report Duplicate | Flag | PURGE
Gold-Tier Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

Hi all,
I am pretty new to this forum.
My solution to the above problem would be:
We already know that current pointer points to beginning of a character.
Our target is know where does the previous character begin. The previous character can begin at 1 byte left or 2 bytes left. We don't bother about other characters.
two/three/four previous bytes inspection
pattern 00 -- previous common character starts with immediate left
pattern 11 -- previous uncommon character starts with two bytes left
pattern 01 -- This is not an allowed pattern. Note that we will have found a character when we have moved 2 bytes to the left. here 1 means it is not a1 byte character and 0 means it is not a 2 byte character.
patttern 10 -- This is ambiguous and we need to explore furthur.
needs to look the third byte to the left and we derive 110
or 010 .
If it is 010,then it is (0)(1 0), i.e. the previous character is a 2 byte character, note that if we took
(0 1)(0), we would encounter a (0 1) pattern which is not an allowed pattern.
if it is 110, then it can be (..1) (1 0) or it can be
(1 1)(0).So we need to explore furthur one byte.
So we can have 1110 or 0110.
In case of 0110, we can have (..0)(1 1)(0) again because (0 1) is not an allowed pattern.
In case of 1110, we have ambiguity, so explore furthur.say we get 01110, so the pattern is (1 1)(1 0).( see that (1 1)(0) is not allowed because we will again have 0 1 , which is not an allowed pattern.
lesson: we get ambiguity for 11111...0 series, and once we get 01111...0, we stop.
So once we get a zero on left, we stop. Now for finding the answer, we can look at our lesson, we can count the no of 1's , if we get an even no of 1's,(like 0110), we can say that last character begins at the 1st byte from the current position.
if we get an odd no of 1's(like 01110), we cna say that last character begins at 2 bytes from the current character.
Hope the problem is solved.

- sudhanshu22 June 09, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

& the previous two bytes with 0x80. If they both return a 0 then the previous character begins at the previous byte, otherwise it starts 2 bytes behind this current pointer. Not sure if I understood/solved this correctly!!

- Sam May 15, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Actually you need to check only the value returned after anding with the previous two bytes.

- Srik May 29, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Suppose you read the following two previous bytes (I'm only listing the first bit of each byte):
... ? 1 0 [current byte].
This could yield several scenarios (I'll parenthesis the whole characters and read back one more byte)
... (1 1) (0) [current byte].
... 1) (1 0) [current byte].
... (0) (1 0) [current byte].
... 0) (1 0) [current byte].

So, Sam and Srik... it appears reading two previous bytes back is not necessarily sufficient.

- Gayle L McDowell May 30, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

So as Gayle hints, we need to look 4 bytes back since even three bytes is ambiguous.

- Kevin June 02, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Is four bytes even sufficient?
What if you had:
... 1 1 1 0 [current byte]
There are at least these two scenarios:
... (1 1) (1 0) [current byte]
... 1)(1 1) (0) [current byte]

- Gayle L McDowell June 02, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I'm only describing the tricky case (...1 1 1 0). When the end byte is 1 it's obvious we're dealing with a 2 byte character.
We need too look back until we hit a 0, that's the end of a character for sure. Then you count the number of 1's in between the two zeros.

0 1 1 ...1 1 1 0 - if the number of ones is even it's single byte, else double byte.

PS: I asked this question in an interview the other day :)

- Petru November 29, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

need to keep looking until you have two consecutive zero's? (or you have reached the beginning)

- astha June 04, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

If previous two are two zero the previous character is a single byte character. Otherwise keep looking until you reach a sequence 01. For example if we have two pointers prev, curr and the sequence 11001111 you stop when prev reaches in position 4 and curr reaches in position 5. The reason we stop at 01 is that this seq. gives us the information that the 1 means that the character is double byte character hence we can backtrack from there to the position provided as the input and ascertain each character size on the way.

- sb July 18, 2005 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

It needs tracking for the first byte that starts with 0 and is not the immediate previous byte.
.[0]..[1][current]... or ..[0]..[0][current]
so, the byte after the first [0] shown above will tell us that the next byte will be a new char. Then, we can tracking down. Or, it will reach the beginning char.

- skinner March 28, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Gayle, if its a 2byte character can the second byte in the 2byte character start with 0?

- chinta March 28, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

two to three previous bytes inspection
pattern 00 -- previous common character starts with immediate left
pattern 11 -- previous uncommon character starts with two bytes left
pattern 01 -- previous uncommon character starts with one byte left
patttern 10 -- needs to look the third byte to the left and we derive 110
or 010 which is answered above

- xyz March 28, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

hi xyz,
for pattern 11 -- if you look one more byte, it can be result of 11 1[current] or 0 11 [current], so you can not make sure if you can delete one or two bytes. So you must find another 0 to make sure the answer.

- skinner March 28, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

skinner,

The problem says you're at the beginning of a character so I don't think you to take account of the case when it could be 11 1[current]

- Michael April 14, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

So, can anyone please post the solution?
I am fairly confused by the discussions :)

- algoMan January 19, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

No, it can’t be found unless you restart from the beginning.

Since khanji uses variable bit length to encode its characters, it is a case of Prefix coding, which is designed to get the benefit of less space compared to fixed bit length encoding like ASCII. In such encoding the bit patterns of characters are so designed that any character can be identified from its unique prefix bits (certain bits from left if you write bits of a character in little Endian style) but the characters can’t be identified from its prefix bits. This is why, when a set of such characters are encoded in binary form, each characters can be identified by reading from beginning (left to right) but moving in reverse direction will fail to identify the characters.
One such example is the Network IP address class scheme. Class A has prefix 0, class B prefix 10, Class C 110 and Class D with prefix 1110. Looking at any IP address’ prefix you can tell its class but not from its suffix. In this particular example, all IP addresses are fixed length of 32 bits so a suffix of 32 bit (eventually all the bits) can do the task, but in case of variable IP address lengths it would have not been possible to identify their class from their suffix.

- kg February 16, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Errata of previous post:
..In such encoding the bit patterns of characters are so designed that any character can be identified from its unique prefix bits (certain bits from left if you write bits of a character in little Endian style) but the characters can’t be identified from its suffix bits...

- kg March 09, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Ok, The whole point here is to reach an index that one is sure to conclude is the start of a character. How?? Here is a solution: (The dots and the bits in brackets represent the MSB of that index)
-Case (0)(0)......(Given Index).... Here definitely, the second zero is the start of a character (ASCII) where as nothing can be said of the first zero
-Case (0)(1)......(Given Index)... Here, the (1) is definitely the start of a character and nothing can be said about 0
-Case (1)(0)....(Given index).... Here, 10 can be a character or 0 can be a character , so no use
-Case (1)(1)....(Given index)..... Here 11 can be a character or second 1 can be start of a character etc., so ambiguous.

From above 4 cases, the patterns that are useful are 00 and 01. Now, the solution is simple:

1) Start
2) cur=given index
3) q=MSB of cur-1 index
4) p=MSB of q-1 index
5) if(p is 0) {index associated with q is the start of a character}
6)else{cur=q}
7) repeat from 3 until if(p is 0) is satisfied

Now you have index associated with q as the start of character. Depending on whether q is a 0 or 1 increment until you reach the given index each time storing the previous index and thats the answer.

So, initially, u are going back until you reach the pattern 00 or 01.

- Krrish March 25, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

if previous byte's first bit is one then done.
otherwise go back untill u find one of
1. a byte with first bit zero
2. start of the string

- Ravi Kant Pandey April 09, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

going 3 byte back is suffi:

....... (1/0).......(1/0)......(1/0).......X(current)
B1 B2 B3
Byte1 Byte2 Byte3 starts with with
- 0 0 ands (1 byte char )
1 1 0 0 start point
0 1 0 1 start point
1 0 0 0 start point
- 0 1 not possible
- 1 1 2 byte char

- Vishnu May 19, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Some observations:
1.if the previous byte begins with 1, it can only be a part of two bytes, so the previous character begins two bytes ahead
2.if the previous byte begins with 0, it can be two situations:
1) it is the beginning of the previous character
2) it is part of the previous character
3. so keep looking forward to the previous previous bytes:
if it begins with 0, then the previous byte is the beginning
if it begins with 1, then keep looking forward.
4. for the previous previous previous previous byte:
if it begins with 1, then the previous previous byte is the beginning
if it begins with 0, then the previous byte is the begining

So we shall look into the previous four bytes.From the beginning,mkeep looking forward until we meet one byte that begins with 1

- MD February 28, 2010 | Flag Reply

CareerCup

Gold-Tier Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews