Google Interview Question
Software Engineer / Developers
"How do you escape characters in a string?"
He was probably looking for a compression/coding technique and a serialization technique. Not sure. You should have asked more questions; I usually do, because sometimes the answer evolves out of the details of the question.
He was particular about the ability to reproduce the string on the other side. I guess the answer would be to use Serializable in Java, as it takes care of everything. The code would involve output and input streams. Any more suggestions are welcome.
He must have asked "how do you treat escape characters in the string", e.g. newline, tab, etc. Search for serialization in the C++ FAQ; a generic method is given there (Google "C++ FAQ" + serialization).
Google normally doesn't ask API-based questions!
In the context of socket programming, I would say he added the hint about escape characters because we stop reading when we encounter one. I am not sure how to send such information over the network, though.
Take the case of passing a string to printf(). The string is enclosed in " (double quotes). If the string to be printed itself contains a double quote, you escape it with a backslash: \".
In a similar way, you can pick a character/byte to represent the start of the string. If that character appears in the string, then escape it (insert the escape character) and send it over.
However, both sides need to know the escape character beforehand.
I would use bytes too. In the context of serialization, your protocol would have to agree on:
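A minimal sketch of the delimiter-plus-escape idea described above, assuming both sides have agreed on `"` as the delimiter and `\` as the escape character. The names `quote`, `unquote`, `DELIM` and `ESCAPE` are made up for illustration, not part of any library.

```python
ESCAPE = "\\"   # assumed escape character, agreed on by both sides
DELIM = '"'     # assumed delimiter marking the start and end of a string


def quote(text):
    """Wrap text in delimiters, escaping any delimiter or escape character."""
    out = [DELIM]
    for ch in text:
        if ch in (ESCAPE, DELIM):
            out.append(ESCAPE)  # insert the escape character first
        out.append(ch)
    out.append(DELIM)
    return "".join(out)


def unquote(quoted):
    """Reverse quote(): drop the delimiters and undo the escaping."""
    if len(quoted) < 2 or quoted[0] != DELIM or quoted[-1] != DELIM:
        raise ValueError("missing delimiter")
    out = []
    i = 1
    end = len(quoted) - 1  # index of the closing delimiter
    while i < end:
        ch = quoted[i]
        if ch == ESCAPE:
            i += 1  # skip the escape, take the next character literally
            if i >= end:
                raise ValueError("dangling escape character")
            ch = quoted[i]
        elif ch == DELIM:
            raise ValueError("unescaped delimiter inside string")
        out.append(ch)
        i += 1
    return "".join(out)
```

Because the escape character is itself escaped, a string that happens to contain a backslash or a quote still survives the round trip unchanged.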
1) How many bytes to read in (either a byte-count or a sentinel is common)
2) What character each unique byte maps to
Since all 256 byte values are valid data, a plain (unescaped) sentinel would need a 9th bit per character: with only 8 bits we couldn't tell the sentinel apart from the data itself. Might as well use a byte count instead.
On the serialization end of things you would send a number indicating the number of bytes (characters) to read in, followed by each character mapped to its byte representation. On the deserialization end, keep reading bytes and mapping them to characters up to the number given.
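The byte-count scheme above could look roughly like this. It is only a sketch under a few assumptions the comment does not state: the count is a 4-byte big-endian unsigned integer, and characters map to bytes via Latin-1 (one byte per character, all 256 values usable). `write_string`, `read_exact` and `read_string` are hypothetical helper names.

```python
import io
import struct


def write_string(stream, text):
    """Write a 4-byte big-endian byte count, then the bytes themselves."""
    data = text.encode("latin-1")  # assumption: one byte per character
    stream.write(struct.pack(">I", len(data)))
    stream.write(data)


def read_exact(stream, n):
    """Read exactly n bytes, or raise if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)  # may return fewer bytes than asked
        if not chunk:
            raise EOFError("stream ended before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_string(stream):
    """Read the byte count, then exactly that many bytes, and decode them."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, length).decode("latin-1")


# Usage: newlines, tabs and NUL bytes need no escaping at all.
buf = io.BytesIO()
write_string(buf, "line1\nline2\x00\t")
write_string(buf, "")
buf.seek(0)
first = read_string(buf)
second = read_string(buf)
```

The loop in `read_exact` is there because a socket-like stream may hand back fewer bytes than requested in a single read.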
Here's your answer: The basic idea is that you need to be able to represent any of the characters in the data while also using those same characters to mark the end of a string. You do this by using an escape sequence. For example, the stream sees "\0" and knows that the string is terminated and a new one comes next. The final problem is then how to represent a backslash in the actual string, which can simply be "\\".
The reason this is important in serialization is that a stream of data is one continuous flow of bytes. There is no notion of new lines or boundaries between objects until you explicitly code such boundaries in there.
Here is an example Pythonic pseudocode implementation:
def write(text, stream):
    for char in text:
        # A backslash in the data is doubled, so the reader can tell it
        # apart from the start of the end-of-string marker. Note that '\\'
        # is a single backslash here, since Python has escape sequences too.
        if char == '\\':
            stream.write(char)  # write an extra backslash
        stream.write(char)
    # write the escape sequence (backslash, then '0') to indicate the end
    stream.write('\\')
    stream.write('0')
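For the receiving side, here is one possible reader for the same scheme. It is a sketch, assuming a text stream such as `io.StringIO` and exactly the two escape sequences described above (a doubled backslash for a literal backslash, and backslash followed by `0` for end of string). The writer is repeated so the example runs on its own; `read` is a hypothetical name.

```python
import io


def write(text, stream):
    """Write text with backslashes doubled, followed by the end marker."""
    for char in text:
        if char == "\\":
            stream.write(char)  # extra backslash for a literal backslash
        stream.write(char)
    stream.write("\\0")  # two characters: backslash, then '0'


def read(stream):
    """Read one string written by write(); return None at end of stream."""
    out = []
    while True:
        char = stream.read(1)
        if char == "":
            if out:
                raise EOFError("stream ended inside a string")
            return None  # clean end: no more strings
        if char != "\\":
            out.append(char)
            continue
        nxt = stream.read(1)  # the character after a backslash decides
        if nxt == "\\":
            out.append("\\")  # escaped literal backslash
        elif nxt == "0":
            return "".join(out)  # end-of-string marker
        else:
            # also reached if the stream ends right after a backslash
            raise ValueError("unknown escape sequence")


# Usage: several strings share one continuous stream.
stream = io.StringIO()
for item in ["a\\0b", "", "tab\there"]:
    write(item, stream)
stream.seek(0)
results = [read(stream), read(stream), read(stream), read(stream)]
```

The first test string deliberately contains a backslash followed by `0` in the data; since that backslash is doubled on the wire, the reader does not mistake it for the terminator.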
I think a good answer is to send the length of the string in bytes (encoded, for example, as a 4-byte unsigned integer) followed by the bytes themselves, one per character.
The receiver reads the first 4 bytes to learn the string length (say L), then reads the following L bytes and builds the string.
Here we are assuming that the string is ASCII encoded, so we don't need any other information.
For completeness, we can also transmit the string's encoding (ASCII, UTF8, UTF16, ISO*, etc.), for instance as an extra byte after the length field. In this case the reader reads 4 bytes for the length, 1 byte for the encoding, and L bytes for the content. Knowing the encoding, the receiver can interpret the string content correctly.
- claudio.corsi January 04, 2011
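The length-plus-encoding layout could be sketched as follows. The tag values in `ENCODINGS` are invented for this example (the comment does not say which byte means which encoding), and big-endian byte order is likewise an assumption; `pack_string` and `unpack_string` are hypothetical names.

```python
import struct

# Hypothetical one-byte encoding tags; both sides must share this table.
ENCODINGS = {0: "ascii", 1: "utf-8", 2: "utf-16-le"}
TAGS = {name: tag for tag, name in ENCODINGS.items()}


def pack_string(text, encoding="utf-8"):
    """Return length (4 bytes) + encoding tag (1 byte) + encoded content."""
    data = text.encode(encoding)
    # ">IB": 4-byte big-endian unsigned length, then a 1-byte tag.
    return struct.pack(">IB", len(data), TAGS[encoding]) + data


def unpack_string(message):
    """Parse one message produced by pack_string()."""
    length, tag = struct.unpack_from(">IB", message, 0)
    data = message[5:5 + length]  # the header is 5 bytes long
    if len(data) != length:
        raise ValueError("message shorter than its declared length")
    return data.decode(ENCODINGS[tag])
```

Note that the length field counts bytes, not characters, so with UTF-8 or UTF-16 it can be larger than the number of characters in the string.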