Google Interview Question for Developer Program Engineers


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
3
of 3 vote

if you forget about delimiters for strings and try to think about something else, you probably will find that it is possible to use length:

arr = ["aaa", "b", "ccccc"]

serialized string: 3,1,5-aaabccccc

to deserialize just find the first "-", take this prefix, split the prefix by "," and read strings based on the length

- Anonymous December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

1) Serialize function: you just go through an array and form 1 big string inserting some delimiter string between items of the array. The delimiter string must be exotic in order not to repeat some part of array's items, e.g. "~`*&#]^". The key problem here is to choose the delimiter string.
(Another solution for creating the delimiter string is the following: you scan an array and get the longest possible length of all array's items. Then in loop you create a delimiter string of random characters, which will be longer by 1 then the longest string from array. In this case the total complexity will be O(n), 'cause of 2 loops only). By the way, here goes the hint from the interviewer, if your delimiter string occurs to be quite long as well as with large initial array, you can face string overflow problems, but interviewer knows about that and hints you not to take this into consideration.
That's all.
2) Deserialize function: you take this big string and in loop restore the array, you just take the sequences of chars in between the delimiter strings.O(n).
(The implementation is quite straightforward. If you want, I can provide it.)

- zr.roman December 08, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

That's basically exactly what i said, but the interviewer I think was looking for a better solution

- TheShocker1999 December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

let's see.
In your solution you do "Check(string[i])". - this will cause O(n^2) complexity. And in case of fault (i.e. string contains your delimiter), you will have to start ab ovo: choose another delimiter and start again. And so on. This is not good, not to say bad.
There are no such problems in my solution: my delimiter is by 1 longer than any string of the array => so automatically I do not do any checks, I just go through array, and incrementally create the output. This is O(n) against your O(n^2).
And I use the hint of the interviewer - does not matter how long the delimiter is (as well as the array), I do not worry about the string overflow.
So, I offer O(n) solution. Is it possible to find better solution?

- zr.roman December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

No, maybe i wrote down my explanation wrong. the deilimiter would be a character that was pre determined, a rare exotic character, and hard coded in (ex. !).

1. Loop through array of strings one by one.
2. Checkstring() would take that string, merge sort it, then check each index of the string until it find either the escape, or an unicode character value GREATER than the escape, and delimit it.
3. The the entire string would be returned, and escaped (ex Like: string1!string2)

- TheShocker1999 December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

"strings can contain any unicode character" - so, there are no exotic characters, because the task says ANY character is possible to be in string. It is the same as: ALL the characters are there among the strings of the array (quite possible variant). So, rare exotic hard coded character will not work.
Frankly, I don't understand your Checkstring() function, looks like, after it you will not be able to restore the array in Deserialize function.

- zr.roman December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Ok, the checkstring was meant to see if the exotic character that I used to delimit was there in the string before we serialized it. But I see what you are saying, if all characters are possible, how can i use one to delimit, something I would have figured out, if i had a unicode chart, and more time than 9 minutes to think.

All characters are possible in Json also though, but json manages to be serialized. So I was using a similar method.

- TheShocker1999 December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@zr.roman your solution is perfect. However I wouldn't say the implementation is straight forward. Below is what I coded in Java.

* @author shivam.maharshi
 */
public class ArraySerializerDeserializer {

	public static String serialize(String[] a) {
		StringBuilder output = new StringBuilder();
		int maxLenght = 0;
		for (String s : a)
			if (s.length() > maxLenght)
				maxLenght = s.length();
		maxLenght++;
		output.append(maxLenght).append(":");
		String delimiter = generateRandString(maxLenght);
		for (String s : a)
			output.append(delimiter).append(s.length()).append(":").append(s);
		System.out.println(output.toString());
		return output.toString();
	}

	public static String[] deserialize(String s, int size) {
		String[] output = new String[size];
		StringBuilder sb = new StringBuilder();
		StringBuilder num = new StringBuilder();
		int i = 0;
		while (s.charAt(i) != ':') {
			num.append(s.charAt(i));
			i++;
		}
		i++;

		int maxWordSize = Integer.valueOf(num.toString());
		num = new StringBuilder();

		boolean parsingNum = false;
		boolean parsingDelimiter = true;
		int charCount = 0;
		int nextWordLenght = 0;
		int wordCount = 0;
		while (i < s.length()) {
			if (parsingDelimiter) {
				while (charCount < maxWordSize) {
					i++;
					charCount++;
				}
				parsingDelimiter = false;
				parsingNum = true;
				charCount = 0;
			} else if (parsingNum) {
				while (s.charAt(i) != ':') {
					num.append(s.charAt(i));
					i++;
				}
				parsingNum = false;
				nextWordLenght = Integer.valueOf(num.toString());
				num = new StringBuilder(); // Emptying.
				i++;
			} else {
				while (nextWordLenght > 0) {
					sb.append(s.charAt(i));
					i++;
					nextWordLenght--;
				}
				parsingDelimiter = true;
				output[wordCount] = sb.toString();
				wordCount++;
				sb = new StringBuilder(); // Emptying.
			}
		}
		return output;
	}

	private static String generateRandString(int size) {
		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < size; i++) {
			sb.append((char) (65 + (26 * Math.random())));
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		String[] a = { "this", "is", "very", "nice", "I", "like" };
		String s = serialize(a);
		String[] output = deserialize(s, a.length);
		for (String out : output)
			System.out.print(out + " ");
	}

}

- Shivam Maharshi January 20, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Why can't we just add length add the beginning of each string. Then all these we don't have to find the delimiter.

Suppose, first string is of 10 characters (fellowship), second is of 5 characters(sandy), third is of 7(brownie) characters. So, we will serialize the objects like this 10-fellowship5-sandy7-brownie. Here '-' character will serve as delimiter as the we need to distinguish between characters and length numbers.

While deserializing, we know that the length of the first string (characters before '-'), then read that many numbers of characters to retrieve the string. Repeat this process until we read the entire serialized string.

- Psycho January 24, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Why not chose a delimiter like dash (-) and escape any other dash in the string by adding another dash to it. this way you know that you've reached a delimiter if you've reached a single dash.
after you finished extracting strings into array, you can replace '--' with '-'.
this seems like an O(n) solution to me. unless I am missing something....

- Yossi January 25, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Yossi, what would you do for "--"? If you wanna replace it with "---", consider this: how would you distinguish ["goog-", "le"] from ["goog--le"]?

- harshjain30 January 31, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

I only had 9 minutes, when he asked me this question. I got as far as whats below, then when I had only 2 minutes left, he asked me to explain the rest of my thinking, which I did(see comments as to what I said).

string serialize(string input[], int size)
{
string Output;
for (int i = 0; i < size, i++)
{
Check(string[i]); //Check string for escape character, use a character high on the unicode chart, sort each string indexes by unicode value using merge sort then check first character of each string, if it is greater than the character you chose, then you know the string does not contain your escape character

//Add escape character to each string to separate values for deserialization
Output +=string[i];
}
return Output;
}

string deserialize(string input[], int size)
{
//didnt get to explain this function

}

I didnt have enough time to optimize or even think of a solution that would be better than O(n2). I know there are better solutions, but nothing came to mind during the interview.

- TheShocker1999 December 08, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

deserialize function has wrong signature, it should be:

string[] deserialize(string input, int size)

- zr.roman December 08, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

It makes sense to me to keep use the length of each string as part of the serialization. E.g. "test" becomes "4:test".

#!/usr/bin/env python
# -*- coding: utf-8 -*-

NUMERIC = xrange(ord('0'), ord('9') + 1)


def serialize(list_of_strings):
    serialized_string = ''

    for blob in list_of_strings:
        n = len(blob)
        serialized_string += str(n) + ":" + blob

    return serialized_string


def get_next_length(a):
    end_index = 0
    for i in xrange(0, len(a)):
        if ord(a[i]) not in NUMERIC:
            end_index = i
            break

    if end_index == 0:
        return None

    return int(a[0:end_index])


def int_width(n):
    width = 0
    while n:
        n //= 10
        width += 1

    return width


def deserialize(a):

    next_length = get_next_length(a)
    stop_idx = 0
    result = []
    while next_length:
        start_idx = stop_idx + int_width(next_length) + 1  # extra colon character
        stop_idx = start_idx + next_length
        next_length = get_next_length(a[stop_idx:])
        result.append(a[start_idx:stop_idx])

    return result


if __name__ == '__main__':
    some_unicode_strings = []
    some_unicode_strings.append('générales okie dokie')
    some_unicode_strings.append('555é okie dokie";')

    serialized = serialize(some_unicode_strings)
    print serialized
    deserialized = deserialize(serialized)
    for blob in deserialized:
        print blob

- bork December 09, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes, also a good O(n) solution.

- zr.roman December 09, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Is it UTF-8?

- EPavlova December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

да: "Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16" (wikipedia).
но суть не в этом, похоже, это задачка на смекалку: как подобрать раздлитель, когда набор символов один и тот же и для разделителя и для самих строк.

- zr.roman December 09, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Determine the delimiter (for example, max used character+1) and write it at the beginning of the file. You can read it and use while deserialization.

- NotImplemented December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

zr.roman , I see your point ( unfortunately my russian is not very good to answer). Why should we use delimiter to split string array? Coudn't we use byte length followed by string encoding for the dictionary? Then if some string repeat in the array we could encode string index in dictionary, not the whole string. Just ponderiong over it.

- EPavlova December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

sorry, I thought you were Russian, 'cause of surname.
I don't quite catch your idea, if possible, provide some solution.

- zr.roman December 09, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

if you forget about delimiters for strings and try to think about something else, you probably will find that it is possible to use length:

arr = ["aaa", "b", "ccccc"]

serialized string: 3,1,5-aaabccccc

to deserialize just find the first "-", take this prefix, split the prefix by "," and read strings based on the length

- Anonymous December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

test

- Anonymous December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

test

- Anonymous December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

If you forget about delimiters and try to think about something else, you will probably find that you can use lengths.

arr = ["aaa", "b", "ccccc"]

serialized string: 3,1,5-aaabccccc

to deserialize just find the first "-", take this prefix, split the prefix by "," and based on lengths read the strings.

- Den December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

If you forget about delimiters and try to think about something else, you will probably find that you can use lengths.

arr = ["aaa", "b", "ccccc"]

serialized string: 3,1,5-aaabccccc

to deserialize just find the first "-", take this prefix, split the prefix by "," and based on lengths read the strings.

- Den December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes, @bork already provided similar solution above.

- zr.roman December 09, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

C# simple implementation that runs in O (N) time

public static string Serialize (IEnumerable<string> strings)
        {
            StringBuilder sb = new StringBuilder();
            
            foreach (var str in strings)
            {
                sb.Append(str.Length); // how many characters
                sb.Append("|");
                sb.Append(str);
            }

            return sb.ToString();
        }

        public static IEnumerable<string> Deserialize (string s)
        {
            if (String.IsNullOrEmpty(s))
                yield break;

            var readingData = false;

            int fwd =0;
            for (var i = 0; i < s.Length; i++)
            {
                
                if (readingData == false )
                {
                    fwd = ReadNumber(ref i, s);
                    readingData = true;
                    continue;
                }
                else
                {
                    yield return ReadString(ref i, s, fwd);
                    readingData = false;
                }
            }
        }

        private static int ReadNumber (ref int index, string s)
        {
            StringBuilder sb = new StringBuilder(10); // maximum 10 digits can have an int32 in C#
            while (s[index] != '|')
                sb.Append(s[index++]);
                
            return int.Parse(sb.ToString());
        }

        private static string ReadString (ref int index, string s, int howMany)
        {
            StringBuilder sb = new StringBuilder(howMany);
            var cnt = 0;
            while (cnt < howMany)
            {
                sb.Append(s[index++]);
                cnt++;
            }
            index--;
            
            return sb.ToString();
        }

- aurelian.lica December 09, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Inorder to serialize these kind of strings we need to place the string length along with delimiter. let's take our delimiter as ";<num>;" So, String "Hello" will be serialized as ";5;Hello"

Since, we know our delimiter string format, we can always deserialize it with ease.

- Psycho December 10, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

I dont agree with that, please explain. If this was the optimal way, why isnt this used widely? Json is more widely used than this format, how is this more efficient space wise than using a exotic character as a separator and delimiting the character?

You are adding 2 bytes to every single string. Please explain this method

- TheShocker1999 December 10, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Write size of string in bytes(hopefully its only a byte width size), then write the string in bytes. Do this till the end. We need some methods to get size in bytes.

- Joshi of Vatsa December 17, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Write size of string in bytes(hopefully its only a byte width size), then write the string in bytes. Do this till the end. We need some methods to get size in bytes.

- Joshi of Vatsa December 17, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def serialize(arr):
    s1 = ''
    for i in xrange(len(arr)):
        if not isinstance(arr[i],unicode):
            text  = arr[i].decode('utf-8')
        else:
            text = arr[i]
        s1 += text
        arr[i]=len(text)

    return s1,arr

def deserialize(str,arr):
    start=0
    l=0
    for i in xrange(len(arr)):
        l += arr[i]
        text = str[start:l]
        if isinstance(text,unicode):
            text = str[start:l].encode('utf-8')
        arr[i] = text
        start=l
    return arr


#Test:
a=['serialize','deserialize','YćY','ℙƴ☂ℌøἤ','2016']
print a
first = serialize(a)
print first[0],first[1]
second = deserialize(first[0],first[1])
print second

- himalayas October 07, 2016 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More