Given an array of strings, fin

Google Interview Question for Software Developers

0

of 0 votes

20
Answers
Given an array of strings, find out in how many cases is any of the anagrams of the string at location i, a substring of the string at location i+1.

Test Case I: ["ab", "ab", "abc", "bca"]
Answer: 3

Test Case II: ["ab","aqb"]
Answer: 0
- lkjhgfdsa September 30, 2017 in United States | Report Duplicate | Flag | PURGE
Google Software Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

@ChrisK. That's correct. It's definitely not Rabin-Karp algorithm, it's something similar, in sense that it's using a hash and a sliding window.
Regarding time complexity, we are both right, because the complexity is really O(n*f), but since f is a constant, it's actually O(n).

- Alex October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

the problem is finding at least one anagram of a string as a sub-string in an other string.

an anagram can be any permutations of a string S. Obviously we could search any
permutations in the next string N. This is the brute force approach and maybe codeable
in 15 minutes (probably with errors) if we accept creating repeating permutations.
Then it's O(|S|!*|L|*|S|) if |S|>=|L|

(|X| denotes the length of the string X, a larger string is never an anagram of a shorter)

Improved and easier to code is by sorting characters of |S| string and extracting
all sub-strings of the size of |S| from |L|, sorting the character of this sub-strings
and compare the sub-string against the sorted S. This is better to code and faster,
but still not optimal.

Time complexity: O(|S|*lg(|S|)+(|L|-|S|)*|S|*lg(|S|)+|S|) = O((|L|-|S|)*|S|*lg(|S|))

Best is:
1) Anagram is the multi-set S of a string P.
note: A set or multi-set ignores the order.

2) Sub-string B is a sequence or interval of a string L with length |P|
for all such sub-string B in L, if multi-set(B) = multi-set(B) then at least one sub-string
of L is a anagram of P

How to code it:
1) build a frequency table F for P (how many times does each character occure)

2) initialize a sub-string B = empty string

3) for each character c in L
append to sub-string, decrement occurrence of c in F
if sub-string is now larger than P, remove first character from sub-string and increment
occurrence of c in F
check if all frequencies are 0 in F, if so, it's a sub-string

Time complexity for a single pair of strings would be O(m+(n-m)) = O(n) if n >= m, that's
better. In the given problem we do this at most |A|-1 times where A is the given array of
strings.

#include <string>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <iostream>

using namespace std;

// solution with sorting O((|b|-|a|)*|a|*lg(|a|))
bool isAnagramSubstring_sort(const string& a, const string& b)
{
	if (a.size() > b.size()) return false;
	vector<char> av(a.begin(), a.end());
	sort(av.begin(), av.end());
	auto bb = b.begin();
	auto be = next(b.begin(), a.size());
	while(true) {
		vector<char> sbv(bb, be);
		sort(sbv.begin(), sbv.end());
		if (equal(av.begin(), av.end(), sbv.begin())) return true;
		if (be == b.end()) break;
		++bb;
		++be;
	} 
	return false;
}

// O(|b|) solution
bool isAnagramSubstring(const string& a, const string& b)
{
	if (a.size() > b.size()) return false;	
	unordered_map<char, int> am;
	for_each(a.begin(), a.end(), [&am](char v) { am[v]++; });
	int mismatch_ct = a.size();
	auto be = b.begin();
	auto bb = b.begin();
	while (be != b.end()) {		
		int c = am[*be]; // *be: element entering the window
		am[*be] = c - 1;
		mismatch_ct += abs(c - 1) - abs(c);		
		if (be - bb >= a.size()) { // *bb: is an element leaving?
			c = am[*bb];
			am[*bb] = c + 1;
			mismatch_ct += abs(c + 1) - abs(c);
			++bb;
		}
		if (mismatch_ct == 0) return true;
		++be;
	}
	return false;
}

// driver
int countNoNextAnagramOfPrev(const vector<string>& arr) 
{
	int count = 0;
	for (size_t i = 1; i < arr.size(); ++i) {
		if (isAnagramSubstring(arr[i - 1], arr[i])) count++; // or isAnagramSubstring_sort 
	}
	return count;
}

int main()
{
	cout << countNoNextAnagramOfPrev({ "ab", "ab", "abc", "bca" }) << endl;
	cout << countNoNextAnagramOfPrev({ "ab", "aqb" }) << endl;
}

- Chris September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

We can solve this by something similar to Rabin-Karp algorithm. Since size of the alphabet is constant, worst case time complexity is O(N), where N is total number of characters in the input strings.

#include <vector>
#include <iostream>
#include <string>

using namespace std;

class Hash {
	public:
		Hash()
		{
			char_counts_.resize(256, 0);
		}
		void AddChar(char c)
		{
			++char_counts_[c];
		}
		void RemoveChar(char c)
		{
			--char_counts_[c];
		}
		bool operator==(Hash const &other)
		{
			for (int i = 0; i < char_counts_.size(); ++i) {
				if (other.char_counts_[i] != char_counts_[i]) {
					return false;
				}
			}
			return true;
		}

	private:
		vector<int> char_counts_;
};

bool ContainsAnagram(string const s, string const sub_s)
{
	if (!sub_s.empty() &&
		sub_s.size() <= s.size())
	{
		Hash sub_s_hash;
		for (char c : sub_s) {
			sub_s_hash.AddChar(c);
		}
		Hash s_window_hash;
		for (int i = 0; i < s.size(); ++i) {
			s_window_hash.AddChar(s[i]);
			if (i >= sub_s.size()) {
				s_window_hash.RemoveChar(s[i - sub_s.size()]);
			}
			if (s_window_hash == sub_s_hash) {
				return true;
			}
		}
	}
	return false;
}

int AnagramCasesCount(vector<string> const &strings)
{
	int count = 0;
	for (int i = 0; i + 1 < strings.size(); ++i) {
		if (ContainsAnagram(strings[i + 1], strings[i])) {
			++count;
		}
	}
	return count;
}

int main()
{
	cout << AnagramCasesCount({"ab", "ab", "abc", "bca"}) << "\n";
	cout << AnagramCasesCount({"ab","aqb"}) << "\n";
}

- Alex September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I was asked to solve this in 15 minutes.

- lkjhgfdsa September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Why is the answer to the first test case 3? By definition of substring abc isn't a sub string of bca?

- SomeoneOnTheInternet September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@SomeOneOnTheInternet:
"ab" (at i=0) has two anagrams "ab" and "ba" of which, "ab" is a substring of "ab"(at i=1). Thus, `counter` is `1`. The string "ab"(at i=1) which is an anagram of itself is the substring of string "abc" (at i=2). Thus `counter` is now 2. One of the anagrams of "abc"(at i=2) is "bca" which is a substring of "bca" at (i=3). This makes the counter `3`, which is now our answer!

- lkjhgfdsa September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Hi Alex, thank you for your answer. Could you please explain your code? I am finding it difficult to understand it. Appreciate your help!

Regards,
lkjhgfdsa

- lkjhgfdsa September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@ lkjhgfdsa. You are welcome. To decide if any anagram of string sub_s is present in string s, we create a hash of sub_s, and then, we compare the hash with the hash of the sliding window (of size sub_s.size()) in string s. If the hashes are equal, then we've found a window in s which has the same characters (and the same amount of those characters) as in sub_s. The hash is basically an array, containing character counts (frequencies).

- Alex September 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex, Hi, sorry to pinch you, but I consider this page as a learning platform ... anyways your solution is not O(n), because within your "n loop" you compare the "hash" which is a loop over the number of distinct character, so it becomes O(n*f) where f is the number of distinct characters.
Rabin-Karp maintains a hash (some integer, comparable in O(1)) which it does using modulo arithmetic ... you could do the same within your hash class so it would become close to O(n) expected average case, but Rabin-Karp has the problem, that if the hash matches it will have to deep compare, so one needs to distinguish between worst and average case behavior.
For pattern matchin R-K is bad, when the patterns occurs often in the search string (e.g. "a" in "aaaaaaaaaaa"). which is not the case here. So Rabin-Karp is a cool idea, but it will not work exactly, because it takes the order into account. So you will need to come up with a "flying" hash that will ignore order (e.g. by applying a prime to each character ...). It's a cool idea!

- Chris October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Since finding all anagrams is costly as it is equivalent to finding all permutations, why don't we instead go through all the substrings in arr[i+1] of length arr[i]?

let isAnagram(s1, s2) return True if the two strings are anagrams of each other, by creating a dictionary of counts for each of their letters and checking equality.
let counter = 0
let x = len(arr[i]) (our current string)
there are O(n) substrings in arr[i+1]
We can go through each substring, and call our isAnagram(s1, s2) function. If true, increment our counter.

Time complexity:
O(n^2m). We are going through O(n) substrings for each string, which is O(nm) where m=len(arr). Then for each substring, we are checking if it is an anagram, which is O(n).

This is a costly solution, but probably a quick one given the time constraint?

- noname October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex I took your code and improved it a bit. I added a "m_totalChars" counter to Hash class. There is no need to "operator==" - you simply check if "m_totalChars" is equal to 0

So the logic for "ContainsAnagram" is now like this:

* Construct a Hash class for the substring
* Loop over the second string, for each char call sub_s_hash.RemoveChar (which reduces the total counter by 1)
* We return sub_s_hash.isEmpty() if its true this means its an anagram

#include <vector>
#include <iostream>
#include <string>
#include <algorithm>

class Hash
{
private:
    std::vector<int> m_charCounts;
    size_t m_totalChars;

private:
    void AddChar(char c)
    {
        ++m_charCounts[c];
        ++m_totalChars;
    }

public:
    Hash(const std::string& str)
        : m_totalChars(0)
    {
        m_charCounts.resize(256, 0);
        std::for_each(str.begin(), str.end(), [&](char ch) { AddChar(ch); });
    }
    void RemoveChar(char c)
    {
        if(m_totalChars == 0) return;
        if(m_charCounts[c]) {
            --m_charCounts[c];
            --m_totalChars;
        }
    }
    bool isEmpty() const { return m_totalChars == 0; }
};

bool ContainsAnagram(const std::string& s, const std::string& sub_s)
{
    if(sub_s.empty()) return false;
    if(sub_s.size() > s.size()) return false;
    Hash h(sub_s);
    for(size_t i = 0; i < s.size(); ++i) {
        h.RemoveChar(s[i]);
        if(h.isEmpty()) break;
    }
    // If we "consumed" all the characters from the string itself, than "sub_s is" contained in "s"
    return h.isEmpty();
}

int AnagramCasesCount(const std::vector<std::string>& strings)
{
    int count = 0;
    for(size_t i = 0; i + 1 < strings.size(); ++i) {
        if(ContainsAnagram(strings[i + 1], strings[i])) {
            ++count;
        }
    }
    return count;
}

int main()
{
    std::cout << AnagramCasesCount({ "ab", "ab", "abc", "bca" }) << std::endl;
    std::cout << AnagramCasesCount({ "ab", "aqb" }) << std::endl;
    return 0;
}

- PenChief October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@PenChief. Cool! The solution looks like ChrisK's algorithm applied to my code structure :)
Though, only half of ChrisK's algorithm is applied. There should also be a pieces of code for cases when we add a char from the long string, and this char makes the difference count greater. A test case that will work wrong with your solution: "ab", "acb".

- Alex October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

ChrisK's algorithm with improved readability (in my opinion):

bool ContainsAnagram(string const s, string const sub_s)
{
	if (!sub_s.empty() &&
		sub_s.size() <= s.size())
	{
		vector<int> char_counts;
		char_counts.resize(256, 0);
		int diffs_count = 0;
		for (char c : sub_s) {
			++char_counts[c];
			++diffs_count;
		}
		for (int i = 0; i < s.size(); ++i) {
			if (char_counts[s[i]] > 0) {
				--char_counts[s[i]];
				diffs_count--;
			} else {
				--char_counts[s[i]];
				diffs_count++;
			}
			if (i >= sub_s.size()) {
				if (char_counts[s[i - sub_s.size()]] < 0) {
					++char_counts[s[i - sub_s.size()]];
					diffs_count--;
				} else {
					++char_counts[s[i - sub_s.size()]];
					diffs_count++;
				}
			}
			if (diffs_count == 0) {
				return true;
			}
		}
	}
	return false;
}

- Alex October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Alex: you're right, f is constant, the alphabet size - I missed that, thanks! cool code, you're last one, definitely better to read and with less constant cost ;-)

- Chris October 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Based on hashing -

public static void main(String args[]) {
		String[] arr = {"ab", "ab", "abc", "bca"};
		int n = count(arr);
		System.out.println(n);
	}

	public static int count(String[] arr) {
		int n = arr.length;
		Map<String, Integer> map = new HashMap<>();
		int count = 0;
		String p = arr[0];
		for (int i = 1; i < n; i++) {
			int k = p.length();
			int l = arr[i].length();
			for (int j = 0; j < l; j += k) {
				if (arr[i].length() >= (j + k) && getHash(p, map) == getHash(arr[i].substring(j, j + k), map)) {
					count += 1;
					break;
				}
			}
			p = arr[i];
		}
		return count;
	}

	public static int getHash(String str, Map<String, Integer> map) {
		if(map.containsKey(str))
			return map.get(str);
		
		char[] c = str.toCharArray();
		int h = 0;
		double cons = 2.35;

		for (int i = 0; i < c.length; i++) {
			h += ((int) c[i] * cons);
		}
		map.put(str, h);
		return h;
	}

- sudip.innovates October 04, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

A simple solution without hashing, but has complexity of O(n*m) :

import sys

def anagram( src, dst ):
    try:
        dst = list(dst)
        for char in src:
            del dst[dst.index( char )]
    except:
        return False

    return True

if __name__ == "__main__":
    input = sys.argv[1:]
    #input = [ 'ab', 'ab', 'abc', 'bca' ]
    count = 0
    for i in range( len( input ) - 1 ):
        if len( input[i] ) > len( input[i+1] ):
            continue
        for j in range( len( input[i+1] ) - len(input[i]) + 1 ):
            if anagram( input[i], input[i+1][j:j + len(input) - 1] ):
                count += 1

    print count

- Tgoofnik October 06, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

And with a simple sum hash:

import sys

def hash( wholeString ):
    sum = 0;
    for char in wholeString:
        sum += ord( char )
    return sum

def rollhash( prev, next , current ):
    return current - ord(prev) + ord(next)

def anagram( src, dst ):
    try:
        dst = list(dst)
        for char in src:
            del dst[dst.index( char )]
    except:
        return False

    return True

if __name__ == "__main__":
    input = sys.argv[1:]
    count = 0
    for i in range( len( input ) - 1 ):
        if len( input[i] ) > len( input[i+1] ):
            continue
        patternHash = hash( input[i] )
        subhash = 0
        for j in range( len( input[i+1] ) - len(input[i]) + 1 ):
            substr = input[i+1][j:j + len(input[i])]
            if j == 0:
                subhash = hash( substr )
            else:
                subhash = rollhash( input[i+1][j-1], input[i+1][j + len(input[i]) - 1], subhash )
            if patternHash == subhash and \
               anagram( input[i], input[i+1][j:j + len(input[i])] ):
                count += 1

    print count

- TGoofnik October 06, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@TGoofnik Could you kindly explain your logic?

- lkjhgfdsa October 06, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well, I just noticed that my code only handles permutations of the i'th string as substrings in the ( i+1) string. permutations are only a subset of anagrams.
I used the karp-rabin like approach, by using a 'sum of characters' hash, because a string and its permutations have the same hash value with this hash. I roll this hash when the i+1 string is larger then the i'th string.
Finally, because a sum of chars hash can have lots of multiple hits, if the hashes are equal, I verify that it is a permutation, by taking every character in the original string, and removing it in the candidate substring. If I found all the characters and removed them in the candidate, it is a permutation.

- TGoofnik October 07, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public static int findAnagrams(String[] s)
	{
		int[] c = new int[26];
		String prev;
		String next;
		int res=0;
		for(int i=0; i<s.length-1; i++)
		{
			Arrays.fill(c, 0);
			prev = s[i];
			next = s[i+1];
			int j=0;
			if(prev.length() > next.length())
				continue;
			while(j < next.length())
			{
				if(j<prev.length())
					c[prev.charAt(j) - 'a']++;
					
				if(j<prev.length())
				{
					c[next.charAt(j) - 'a']--;
				}else
				{
					c[next.charAt(j-prev.length()) - 'a']++;
					c[next.charAt(j) - 'a']--;
				}
				if(j >= prev.length()-1)
					if(isAnagram(c))
						res++;
				j++;
			}
		}
		return res;
	}
	private static boolean isAnagram(int[] c)
	{
		int n=c.length;
		for(int i=0; i<n; i++)
			if(c[i] != 0)
				return false;
		return true;
	}

public static void main(String[] args) {

		String[] s = {"ab", "ab", "abc", "bca"};
		System.out.println(findAnagrams(s));
}

- clk October 13, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Using a frequency table and finding all substrings of element i+1 with length of element i.
Implemented in C++98.

#include <vector>
#include <string>
#include <map>
#include <iostream>
using namespace std;

vector<string> getAllSubstringsOfLength(string s, int l){
	vector<string> subs = vector<string>();
	for(int i = 0; i < s.size() - l + 1; i++){
		string sub = s.substr(i, l);
		subs.push_back(sub);
	}
	return subs;
}

bool isAnagram(vector<string> subs, string w, int l){
	for(vector<string>::iterator it = subs.begin(); it != subs.end(); it++){
		bool check = false;
		map<char, int> lookup = map<char, int>();
		string sub = (*it);
		for(int i = 0; i < l; i++){
			if(lookup.find(w[i]) != lookup.end())
				lookup[w[i]]++;
			else
				lookup[w[i]] = 1;
		}	
		for(int i = 0; i < l; i++){
			if(lookup.find(sub[i]) != lookup.end()){
				lookup[sub[i]]--;
				if(lookup[sub[i]] < 0){
					check = true;
					break;
				}
			}
			else{
				check = true;
				break;
			}
		}
		if(!check){
			for(map<char, int>::iterator it1 = lookup.begin(); it1 != lookup.end(); it1++){
				char key = (*it1).first;
				int val = (*it1).second;
				if(val != 0){
					check = true;
					break;
				}
			}
			if(!check)
				return true;
		}
	}
	return false;
}

int main(){
	int count = 0;
	vector<string> list = vector<string>();
	list.push_back("ab");
	list.push_back("aqb");
	//list.push_back("abc");
	//list.push_back("bca");
	for(int i = 0; i < list.size() - 1; i++){
		string word2 = list[i+1];
		string word1 = list[i];
		vector<string> subs = getAllSubstringsOfLength(word2, word1.size());
		count += (isAnagram(subs, word1, word1.size())) ? 1 : 0;
	}
	cout << count << endl;
}

- Alireza October 15, 2017 | Flag Reply

CareerCup

Google Interview Question for Software Developers

Books

Videos

Resume Review

Mock Interviews