mcmillhj
BAN USERYou can do this in O(n log n) time using a modified mergesort.
Thought behind this:
Mismatches in the two strings can actually be thought of as array inversions (elements out of order in an ordered sequence)
let s1 = kamal
let s2 = amalk
If we assume that s1 is the "correct" ordering, we can create an array P that holds a mapping from a character in s2 to its position in s1.
# correct ordering
0 1 2 3 4
k a m a l
# ordering of s2
1 2 3 4 0
a m a l k
So, P = [1, 2, 3, 4, 0]
We can see the inversions of P by drawing intersection lines between the elements of P and the correct ordering:
0 1 2 3 4
\/_/_ /_/
/ / / / |
1 2 3 4 0
0 ------ 1
1----/ ------ 2
2----/ ------ 3
3----/ ------ 4
4----/ 0
(assume a line between 0 and 0, ascii art is not the best medium)
Counting the number of intersections will tell use the number of elements "out of order", noting that "out of order" in this case means distance of s2 from s1 in swaps.
To implement inversion counting, you can modify the merge_sort and merge steps of the standard merge_sort algorithm:
#include <stdio.h>
int merge_sort(int[],int,int);
int merge(int[],int,int,int);
int main(int argc, char ** argv) {
int array[] = { 1, 2, 3, 4, 0 };
int array_size = sizeof(array)/sizeof(array[0]);
int inversions = merge_sort(array, 0, array_size - 1);
printf("Found %d inversions\n", inversions);
return 0;
}
int merge_sort(int a[], int start, int end) {
if ( end > start ) {
int mid = start + (end - start) / 2;
int x = merge_sort(a, start, mid);
int y = merge_sort(a, mid + 1, end);
int z = merge(a, start, mid, end);
return x + y + z;
}
return 0;
}
int merge(int a[], int start, int mid, int end) {
int l = start, r = mid + 1;
int i = 0;
int temp[end - start + 1];
int splitInversionCount = 0;
while ( l <= mid && r <= end ) {
if ( a[l] < a[r] ) {
temp[i++] = a[l++];
}
else {
splitInversionCount += mid - l + 1;
temp[i++] = a[r++];
}
}
// copy whichever half of the array remains
while ( l <= mid ) {
temp[i++] = a[l++];
}
while ( r <= end ) {
temp[i++] = a[r++];
}
// copy temp back into a
int k;
for(k = 0; k < i; k++) {
a[k + start] = temp[k];
}
return splitInversionCount;
}
There might be a few other optimizations, none that come to mind though.
- mcmillhj May 14, 2014Perl solution, because why not? Added a cache so that sorting only happens a single time for duplicate words. I also did some other sanitizing e.g removed all whitespace and punctuation:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub group {
my @words = @_;
my %anagram_map;
my %sort_cache;
foreach my $word ( @words ) {
# strip all whitespace
$word =~ s/[^0-9A-Za-z]+//g;
# force to lowercase
$word = lc $word;
# sort word unless already sorted
my $sorted_word = $sort_cache{$word} || join '', sort +split //, $word;
# add to cache unless it already has an entry
$sort_cache{$word} = $sorted_word
unless $sort_cache{$word};
# group by anagram (sorted string)
push @{$anagram_map{$sorted_word}},
$word;
}
# values of the anagram map are the grouped solutions
return [values %anagram_map];
}
print Dumper group(qw(cat bat act tab));
print Dumper group('the END of the world is nigh', 'down this hole frightened', 'b', 'rome wasn\'t built in a day', 'but laid in two years man', 'a man a plan a canal panama');
__DATA__
# results
[
[
'cat',
'act'
],
[
'bat',
'tab'
]
];
[
[
'theendoftheworldisnigh',
'downthisholefrightened'
],
[
'romewasntbuiltinaday',
'butlaidintwoyearsman'
],
[
'b'
],
[
'amanaplanacanalpanama'
]
];
You can do this in O(n * log n) time without sorting. Given the problem statement,
the interviewer probably doesn't want you to sort anyway since half of the 2-tuples will
already be sorted. What you *can* do instead is use binary search to find the correct
index to insert the unsorted 2-tuples into the list of sorted 2-tuples.
ex:
(a, 1, 5), (b, 2, 4), (c, 7, 8)
(a, 1), (b, 2), (c, 7) (sorted)
(a, 5), (b, 4), (c, 8) (not sorted)
You might be thinking to yourself O(n*log n) is no different from just using mergesort
in the first place, and you would be right, but half of the 2-tuples are already sorted.
If you sorted again you would be doing repeat work, likely what the interviewer doesn't
want you to do.
- mcmillhj September 12, 2014