1. You have a set of 10000 asc

Interview Question

1. You have a set of 10000 ascii strings (such as perhaps loaded from a file)
2. A string is input from stdin.
3. Write pseudocode that returns (to stdout) a subset of strings in (1) that contain the same distinct characters (regardless of order) as input in (2). Optimize for time.
4. Assume that this function will need to be invoked repeatedly. Initializing the string array once and storing it in memory is okay. Please avoid solutions that require looping through all 10000 strings.

For example, if you have strings in (1): mary, brad, pitt, yygr
and the user types in: ry --> the output should be "mary" and "yygr"
or if the user types in: dd --> brad
- huangyingw November 20, 2012 in United States
Algorithm


Country: United States
Interview Type: Phone Interview

Take a hit up front and build a hash table mapping each character to an array of the dictionary strings that contain that character, e.g. 'r' ==> "mary", "yygr".

When you search for a string, you take the intersection of the dictionary strings for each character in the input string. You are looking for strings from the dictionary that show up for *every* character in the input string.

Python example (doesn't read from file or stdin):

def dictionary_to_database(words):
    """Map each character to the set of dictionary strings that contain it."""
    database = {}
    for line in words.split("\n"):
        for char in set(line):
            database.setdefault(char, set()).add(line)
    return database

def find_entries(database, string):
    """Return the dictionary strings containing every distinct character of string."""
    chars = set(string)
    if not chars:
        return []
    # A character missing from the database matches nothing, so use an empty set.
    matches = set.intersection(*(database.get(char, set()) for char in chars))
    return sorted(matches)

d = dictionary_to_database("""mary
brad
pitt
yygr""")
print(find_entries(d, "ry"))  # ['mary', 'yygr']
print(find_entries(d, "dd"))  # ['brad']

- mrmekon November 20, 2012

I was thinking the same thing, but with the following differences:
. Put all the strings in an array ARR.
. Create a HashMap whose key-value pairs map each character to the list of indexes in ARR of the strings that contain it.
. Whenever the second string is entered, check each of its characters and look up the list of indexes for that character.
. Take the intersection of these index lists across all the characters.
. Since the intersection is between lists of indexes (integers), it should be cheaper than intersecting lists of strings.
. As an optimization, remove duplicate characters from the input string first.

- Mario November 20, 2012
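
The index-based variant described in the comment above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names build_index and find_matches are hypothetical, and it assumes, as in the example in the question, that a string matches when it contains every distinct character of the input.

```python
def build_index(strings):
    """Map each character to the set of positions in `strings` that contain it."""
    index = {}
    for position, text in enumerate(strings):
        for char in set(text):
            index.setdefault(char, set()).add(position)
    return index


def find_matches(strings, index, query):
    """Return the strings that contain every distinct character of `query`."""
    chars = set(query)  # drop duplicate characters from the input first
    if not chars:
        return []
    # Start from the smallest position set to keep intermediate results small.
    position_sets = sorted((index.get(char, set()) for char in chars), key=len)
    common = set.intersection(*position_sets)
    return [strings[position] for position in sorted(common)]


ARR = ["mary", "brad", "pitt", "yygr"]
INDEX = build_index(ARR)  # built once, reused for every query
print(find_matches(ARR, INDEX, "ry"))  # ['mary', 'yygr']
print(find_matches(ARR, INDEX, "dd"))  # ['brad']
```

The index is built once, so each query only touches the position sets of its own characters and never scans all of ARR.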

@mario

The difference between our solutions just comes down to specifics of the language. In Python, every list entry for a given dictionary string refers to the same string object, so those entries all share one object ID. I'm not sure how the set's "intersection" function does comparisons, but if it just compares object IDs then my solution is already doing integer comparison instead of string comparison. But it might not be -- definitely something to keep in mind.

- mrmekon November 20, 2012
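
On the comparison question raised above: Python sets locate members by hash and then confirm with equality, so an intersection matches strings by value and does not depend on object IDs. A small sketch (the variable names are illustrative) that builds two equal strings separately:

```python
# Build two equal strings at runtime so they need not be the same object.
first = "".join(["ma", "ry"])
second = "".join(["m", "ary"])

# Sets match members by hash and equality, so equal strings intersect
# whether or not they share an object ID.
common = {first} & {second}
print(common == {"mary"})  # True
print(len(common))         # 1
```

Because the hash of a string is cached after it is first computed, repeated lookups of the same string object avoid rehashing its characters.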

If storage is not a concern, then we can use a suffix tree.
It should give the best search performance in such cases.

- ashu November 20, 2012
