ImmortalRat
BAN USERThat's because you lose the sequence. Even though it will be fast
- ImmortalRat November 16, 2012GC - garbage collection. every concatenation of two strings creates a new string. but previous two do not disappear from memory - memory for string if always allocated in heap, and not in stack. which means that when you exit the method scope, all the strings that you have just allocated are all sitting in dynamic memory. They will get removed from memory during next garbage collection (unless you were doing all that on a large input string, which is 8kb or 4k characters - then it goes into the large objects heap and stays there longer), but they go create a huge fragmentation of the heap, and garbage collection will probably be forced even before you finish running the method - that blocks entire process (before .NET 4) and all threads have to wait until garbage collector reclaims all unused objects (which get created in a huge volume). Even if you use StringBuilder - you are still doing calls to Substring - that method creates a new string every time, and while you do reduce number of strings that get created - you are still creating a huge amount of work for GC. Try to run the following test - run your method for 10000 times while giving it really large string as an input (usual stuff for evaluating performance) - and check hom much memory the app uses (you can also check out counters for GC runs on generation 1).
- ImmortalRat November 15, 2012you probably have not noticed my text right before the code :) No one ever mentioned in a task whether duplicates are treated as duplicates only if they are in sequence or not, so I specifically mentioned which kind of problem my code solves.
Imagine that you are on interview, and you have been asked to solve the problem. My solution with along my comment would get me way more credits that any other solution without a comment or clarification, as the task is intentionally unclear :)
this is probably the slowest and the most GC-intensive method to remove duplicates
- ImmortalRat November 15, 2012If duplicates mean same character multiple times in a row, then there are two options here:
private string RemoveDubs( string str )
{
int length = str.Length;
if (length < 2)
return str;
StringBuilder builder = new StringBuilder(length);
char previous = (char)0;
for (int i = 0; i < length; i++)
{
char current = str[i];
if (current != previous)
{
builder.Append(current);
previous = current;
}
}
return builder.ToString();
}
(Use pointer-base implementation if extra 30% of performance is critical)
- ImmortalRat November 15, 2012
Will work great on average, but you still need to code that part that actually builds a output string - that's what most of people do wrong :)
- ImmortalRat November 16, 2012BTW, speed-wise it would be event better to create an array of booleans (65535 items) - then it takes just few cpu instructions to check whether you have that letter already or not :)