How do you remove the duplicat

Microsoft Interview Question

-1

of 1 vote

57
Answers
How do you remove the duplicate characters in a given string without using any additional buffer. Give the Algo., write code and list the test cases.
- Sadineni February 26, 2007 | Report Duplicate | Flag | PURGE
Microsoft Coding Algorithm

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

using bitset which can be stored on the origianl string.
suppose the string only contains 26 letters(A-Z,a-z), whose ascii is from 01100101 to 01111010, the first 3 msb are useless, so if the length of string is no smaller than ceil(26/3), we can use these useless bits to store a 26 bits bitset. no extra space at all.

- winia October 30, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Can you explain the little bit how to use bitset for this scenario..

- Poster October 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

If you can sort the string alphabetically, you can use a pointer and do straight lookup-remove till the end of string.

- TodaysVisitor February 27, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Righto!

- Vishal Goswami March 04, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Or you can alternately use 3 pointers and scan the string.

- visitor March 05, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Algo
1. Scan through the string looking for the occurrence of one character at a time.If the character is found more than once replace the duplicate occurrences with a speacial character say '$'. Repeat this process for all the characters.

2. Now shift the non-duplicate characters over writing the '$' characters.

Remarks-

Complexity of this algo is O(n^2). Please let me know if there is any efficient solution.

TestCases-

Try with different inputs for string

1. Null

2. Empty string

3. String having all caharacters same. Say: "aaaaa aaaaa"

4. String with huge length: What is the upper limit on string length??

5. String with duplicate characters at alternate locations. Say: “abab cdcd ecec”

- Sadineni.Venkat March 09, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

how abt aaaaaa$$$

- Vel November 01, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Instead of using $, you could use "\0". Of course you would need to find the string length at the beginning (otherwise you would be lost in C about the end of the string). With the original length though, remove all the duplicates replacing them with "\0", and them shift them as the last step.

- Anonymous June 15, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think it is the best solution.

- russoue February 16, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

How do you use 3 pointers for this?
Sorting the string I'm sure is not an option. You're just supposed remove duplicate characters, not rearrange them into an unintelligible string.

- Anonymous March 12, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

OK. Then in that case you can use two pointers. You'll be looping like crazy though... have the first pointer start from the beginning of the string and second pointer from index right after the first one. Then compare... uhh... here's the code, that's easier!
---------------------------------
char str[] = "Hello Worlld!!";
char *ptr1 = str;
char *ptr2 = NULL, *ptr3 = NULL;

for( ; *ptr1 != NULL; ++ptr1)
{
for(ptr2 = ptr1 + 1, ptr3 = ptr2; *ptr3 != NULL; )
{
if(*ptr2 == *ptr1)
{
++ptr3;
}
else
{
++ptr2;
++ptr3;
}
*ptr2 = *ptr3;
}
*ptr2 = NULL;
}
---------------------------------

- TodaysVisitor March 12, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I mean "three" not "two" in the first sentence: "OK. Then in that case you can use two pointers"

- Anonymous March 12, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

It seem there are some mistakes in TodaysVisitor's code. It seem *ptr2 = *ptr3 should be in (*ptr2 != *ptr1) condition. Correct me if I am wrong.

if(*ptr2 == *ptr1)
{
++ptr3;
}
else
{
++ptr2;
++ptr3;
*ptr2 = *ptr3;
}

- Anonymous February 18, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

in your case, the condition would have to be if(*ptr3 == *ptr1). See what I mean?

- Anonymous March 01, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Try 121, your code doesn't work when the last letter is a duplicate one.

- Anonymous July 09, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I couldnt manage to come up with the better solution either without additional space

- Noname June 27, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This can be done in O(n) with some memory allocation for UTF-16 and very little for ASCII.

1. Use a bitmap to represent the charset
   for instance the ASCII charset could be represented in 32bits (DWORD),
   or 16-bit UNICODE could be presented in 65536 bits(8KB).
   for each bit we will assign one char. For instance A=65th bit
2. Zero the bitmap
3. have two markers one for the read position one for the write position
4. for each char in the string 
   4.a read char from reader marker 
   4.b check the char is marked in the bitmap
       yes : move the reading marker ahead
             keep writer marker at the same place
       no  : mark the bitmap for the char, 
             copy the char from the reader to writer 
             move the reader and writer marker ahead
   4.c mark the bitmap corresponding to the character

- uxa August 23, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

copy the char from the reader to writer...
did not understand that, isn't that some extra memory use?

- Anonymous November 12, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

other solution is to sort the string first in O(nlogn) time and using similar method given by uxa (4.a and 4.b) we remove the duplicates in O(n) time. So total O(n+nlogn) time

- Ramu September 03, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

you need extra memory to sort !

- Anonymous November 01, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

sorting will remove the order of the string...
wotz the point if u totally destroy the original string

- Anonymous January 27, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

We dont need extra memory to sort and there was no requirement to retain the order in the string.... All we need is to remove the duplicates

- chinni September 16, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

"there was no requirement to retain the order in the string"??

you must be seriously kidding

- qi October 07, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

uxa's method is good but needs additional memory....

- Anonymous October 10, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

without any buffer?
eh...even a pointer will take 4 bytes memory.
We can use an integer (32 bits) as a bitmap to store the appeared letters. Then it's a piece of cake.

- likexx October 23, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

hmm, that's basically a good question to ask the interviewer: if no additional buffer means O(1) memory consumption we can use a hash of constant size, i.e. 256 bits, to solve the problem in O(n) time..

- pavel.em June 16, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

It does not matter if the array is char[] or int[]. This can be done in O(n) time with no additional overhead. The only overheads required are the index on the left, index on the right & the position where there is a next redundancy. If we call this next redundancy index as hole or gap, then

1) Initialize the left index to 0th element of the array.
2) Initialize the right index to 1st element of the array.
3) Iterate your array & try to find out the next position of hole. Hole represents the index at which next 'clean' element will be copied.
You can think of hole similar to pivot index of Quick sort where you move all elements less than pivot on left & all elements greater than pivot on right. So hole here repesents an index before which all elements of the array are unique and all elements after the hole can be redundant.
4) Maintain a bool or int variable toCopy. This variable represents the condition under which is it required to copy the element from the right to the index pointed by hole.
5) After all elements from the right are compared & copied to index on the left of hole i.iteration has reached the length of array. Then, inside a new loop iterate from hole to the end of array length & set arr[hole] = 0 or -1 etc.

- The Hercules October 30, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Actually i can give my code here. I dont know how does the indentation works on this site. so bear with me, if the indentation is not good.

void main(){
int array[] = {1,1,1,1,2,2,3,3,4,6,6,7,11,11,13,15,17,17};
int iSize = sizeof(array)
int i;

int removed = RemoveDuplicate(array,iSize);
if(!removed)
{// Handle your error here
}

for(i=0;i<iSize;i++)
{
cout<<array[i]<<" ";
}
}

int RemoveDuplicate(int array[], int iSize)
{
if( 1 > iSize )
{
return 0;
}
if(array == NULL)
{
return 0;
}

if(iSize == 1)
{
return 1;
}

int left = 0;
int toCopy = 0;
int hole = iSize;

for(int right = 1; right < iSize; right++)
{
if(array[left] == array[right])
{
if( 0 == toCopy )
{
toCopy = 1;
hole = right;
}
}
else if( array[left] != array[right] )
{
if(toCopy)
{
array[hole] = array[right];
hole++;
}
}
left++;
}

for(;hole<iSize;hole++)
{
array[hole] = -1;
}
return 1;
}

- The Hercules October 30, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

So aren't you making the assumption that the input array (or string) is sorted?

- Anonymous November 27, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

yes, this works only for sorted arrays,

as a counterexample, try {1, 2, 1}

- asm November 24, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Needless to say for a string i.e char * after the 1st loop put a terminating null i.e '\0' character at the index corresponding to the hole.

for(;hole<iSize;hole++)
{
array[hole] = -1;
}
In case of int array the above code is good & 2nd loop is required.

In case of char [], just do this
array[hole] = '\0'; // i.e there is no need of 2nd loop

- The Hercules October 30, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Sort the array using Quick Sort or Merge Sort and then traverse down the string O(nlogn) + O(n) = O(n)

- Vamsi December 05, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

how the hell in this world can the above be true? Dont you have atleast a small brain to make the above claim? just think atleast once before you post.Dont just type. If some folk preparing for interview sees this answer, it just adds confusion to hiim. Please consider your answer before posting!
O(nlogn) + O(n) = O(nlogn) not O(n)!!
u stupid..

- ghost rider May 01, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

great comment. Also, why are people assuming array is sorteed.

- eName October 06, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

moreover merge sort requires extra space

- thinker June 07, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

If there is not additional space then Do it brut force...
Example STRUCTURES --- result-- STRUCE

once we see S... traverse through the entire list and update whever thers is a S with '~' .... simillarly do it for T and whenver we see a '~' we do northing ... once we are done till the end ..update the string by removing '~' ---

I cogitate this works

- Vamsi December 06, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think Uxa method is one of the most efficient method.
No other method will give more efficient solution than that.

- Vandna December 28, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can someone please put up the code for Uxa's solution using a bitmap ?

- Rakesh February 04, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

void RemoveDuplicates(char *str)
{
if(str == NULL)
return;

int arr[256] = {0};

int rptr = 0;
int wptr = 0;

while(str[rptr]!='\0')
{
//First time occured
if(arr[str[rptr] == 0)
{

arr[str[rptr]] = 1;
str[wptr++] = str[rptr];
}

rptr++;
}
}

- Swati February 07, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Sorry there was one bug in code so reposting it
void RemoveDuplicates(char *str)
{
if(str == NULL)
return;

int arr[256] = {0};

int rptr = 0;
int wptr = 0;

while(str[rptr]!='\0')
{
//First time occured
if(arr[str[rptr] == 0)
{

arr[str[rptr]] = 1;
str[wptr++] = str[rptr];
}

rptr++;
}

str[wptr] = '\0';
}

- Swati February 07, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Read the question, if no extra buffer is allowed, why do you think extra array is allowed?

- J May 04, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Sort the characters using inplace heap sort O(nlogn). Then traverse the list and compare adjacent elements.

- Temp February 08, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

With two pointers.

char* remdupchar(char* str){
	if(strlen(str)<2)
		return str;
	char* a = str;
	char* b = str+1;
	
	while(*a != '\0'){
		if(*a != *b){
			a++;
			*a = *b;
		}
		b++;	
	}
	return str;
}

- Carbon April 01, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Carbon

good one..

just minor change:

char* remdupchar(char* str){
if(strlen(str)<2)
return str;
char* a = str;
char* b = str+1;

while(*b != '\0'){
if(*a != *b){
a++;
*a = *b;
}
b++;
}
a++;
*a='\0';
return str;
}

- changchunmingwei July 09, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

so you assumed the array is sorted?

- snowgoose July 12, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think so, otherwise his algo cannot handle cases such as "Hellowwo", where 2 'o' is separated. However, in IMHO, such an assumption is wrong.
TodaysVisitor's algo is the best solution so far. Comprehensive iteration is unavoidable.

- XYZ September 15, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

for each char c in the string s
ptr = strstr(s, c)
if(ptr != NULL)
overwrite by$

scan string from left:
if $ shift the next char

- any October 27, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

http://en.wikipedia.org/wiki/Sorting_algorithm
heapsort is O(nlogn) and O(1) memory

- Anonymous November 06, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

What if the question is without using extra memory and without sorting?

- Anonymous December 29, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

for those who have the book, note that both solutions to this problem in the book have a bug in the code and don't do what they claim to do.
way to go author and editors.

- tester November 09, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

int main(){
char a[] = "aaaaaaabbcdcdddeedd"; //String to process.
n = strlen(a) - 1;
for(int i = 0; i < n; i++){
int k = i;
for(int j = i+1; j < n; j++){
if(a[i] == a[j]){
k = j;
while(a[k] == a[j] && j < n){ j++; }
while(a[k] == a[j] && j == n && k <= j){
a[j--] = '\0';
n--;
}
while(j <= n && k < j){
a[j-1] = a[j];
j--;
}
}//end of if
}
}

- Suren September 29, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

forgot to handle special cases.
if(n < 2)
printf("%s\n", a);
etc.

- Suren September 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

actually if we solve it without using additional buffer then it will take more than o(n)
1)sort the array and then find remove duplicay (ex merge sore O(nlogn)+O(n).
2)or O(n^2)

- prakash November 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

but the actual question was :
Given a string, you have remove duplicates from it in O(n) time and O(1) space.
String can have ASCII characters. Try to use minimum extra space.

we can do this in o(n)
1)crate a bit array for 256 lenght
2) traverse through string set the cooresponding bit if not set ,
if set then skip that character.

please let me know your comment

- kumar November 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

//c# O(n)
        static char[] removeDuplicateChars(char[] _str)
        {
            for (int i = 0; i < _str.Length; i++)
            {
                for (int j = i+1; j < _str.Length; j++)
                {
                    if (_str[i] == _str[j])
                    {
                         _str[j] = ' ';
                    }
                }
                    
            }
            return _str;
        }

- anup.h.nair December 22, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

{

int main()
{
	int i,j,c;
	string a;
	cout<<"\n\n\nenter:String:";
	cin>>a;
	
	for(i=0;i<a.length();i++)
	{
		if(a[i+1]=='\0')
		break;
	for(j=i+1;j<a.length();)
	{	
		c=0;
		if(a[i]==a[j])
		{	
			c=j;
			while(1)
			{
				a[c]=a[c+1];
				if(a[c]=='\0')
				break;
				c=c+1;
			}
		continue;
		}
		j++;
	}
		
	cout<<"\n\n"<<a<<endl;	
		
	}
	
	
}

Test case #1
enter:String:rttoookklll
rttoookklll
rtoookklll
rtokklll
rtoklll
rtokl

enter:String:tyyhjkrrr
tyyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkr

- karthikeyan.yuvraj August 26, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

{

int main()
{
	int i,j,c;
	string a;
	cout<<"\n\n\nenter:String:";
	cin>>a;
	
	for(i=0;i<a.length();i++)
	{
		if(a[i+1]=='\0')
		break;
	for(j=i+1;j<a.length();)
	{	
		c=0;
		if(a[i]==a[j])
		{	
			c=j;
			while(1)
			{
				a[c]=a[c+1];
				if(a[c]=='\0')
				break;
				c=c+1;
			}
		continue;
		}
		j++;
	}
		
	cout<<"\n\n"<<a<<endl;	
		
	}
	
	
}

Test case #1
enter:String:rttoookklll
rttoookklll
rtoookklll
rtokklll
rtoklll
rtokl

enter:String:tyyhjkrrr
tyyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkrrr
tyhjkr

- karthikeyan.yuvraj August 26, 2014 | Flag Reply

CareerCup

Microsoft Interview Question

Books

Videos

Resume Review

Mock Interviews