Improving String Manipulation in .NET

Vanderlei Adriano Morais - Sep 25 - Dev Community

Something as trivial as transforming a string can have a considerable impact on performance, depending on how it is implemented. Let's look at an example of a simple requirement I had to work on some time ago.

The Requirement

Given a "CPF" (Brazilian National ID) stored as a string containing only numeric digits, add the standard separators (“.” and “-”) to display it in a more user-friendly format. Pretty straightforward:

> Input: “12345678909”  
> Output: “123.456.789-09”

Initial Solution: Stack Overflow

Like a good developer, I googled it for a quick solution and, of course, ended up on Stack Overflow. In the top-voted answer, I found this implementation:

public string Format(string cpf)
{
    return Convert.ToUInt64(cpf).ToString(@"000\.000\.000\-00");
}

Looks like a good solution, right? The string is converted to UInt64 so that a custom numeric format string, which defines a mask for the number, can be passed to its ToString method.
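To see why the mask works: in a custom numeric format string, each `0` is a digit placeholder that always emits a digit (padding with zeros), and the backslash makes `.` a literal character instead of the decimal point. The minimal sketch below wraps the same expression in a hypothetical `CpfMask` class to show two consequences: a CPF starting with zero keeps its leading digit, and the conversion fails on input that is not a plain integer.

```csharp
using System;

// Hypothetical helper class; Format reproduces the Stack Overflow approach.
public static class CpfMask
{
    // Each '0' is a digit placeholder that always emits a digit (zero-padded);
    // "\." prints a literal dot instead of being read as the decimal point.
    private const string Mask = @"000\.000\.000\-00";

    // Parses the digits into a ulong, then lets the mask re-insert the separators.
    // Throws FormatException if cpf is not a plain integer.
    public static string Format(string cpf) =>
        Convert.ToUInt64(cpf).ToString(Mask);
}
```

For example, `CpfMask.Format("01234567890")` returns `"012.345.678-90"`: the leading zero is lost in the `ulong` and restored by the mask. Passing an already formatted value such as `"123.456.789-09"` makes `Convert.ToUInt64` throw a `FormatException`.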

Pretty clever, huh? Not quite...

My Solution

I had doubts about the performance of this solution, especially because of the conversion step, so I implemented my own in a very simple way:

public string Format(string cpf)
{      
    return $"{cpf.Substring(0, 3)}.{cpf.Substring(3, 3)}.{cpf.Substring(6, 3)}-{cpf.Substring(9, 2)}";
}

Basically, I used Substring to split the CPF into its four groups and inserted the separators between them.

Then, to compare the approaches, I used BenchmarkDotNet. Here are the results:

| Method | Mean | Ratio |
|---|---|---|
| FormatCpfConvert | 301.97 ns | baseline |
| FormatCpfSubstring | 127.54 ns | -56% |

My solution took 56% less time than the Stack Overflow one!

Final Solution

Even though my solution was acceptable, I felt it could still be improved: using Substring to extract the CPF sections allocates a new string in memory for each part, and those intermediate strings are discarded as soon as the final result is built.

To make this process more efficient, instead of creating new substrings we could ideally just take slices of the original input and add the required separators. This is where Span<T> and its Slice method come in.

The Span<T> type represents a contiguous region of memory, and its Slice method returns a view over a specific part of that region without copying it. For string manipulation, this is very helpful whenever parts of an existing string can be reused to produce the desired result. (Since strings are immutable, calling AsSpan on a string gives a ReadOnlySpan<char>.)
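To make the "view, not a copy" idea concrete, here is a minimal sketch; the `SliceDemo` class and its method names are hypothetical. It slices the CPF string without copying and, using a mutable `char[]` buffer, shows that writing through a `Span<char>` slice changes the underlying array.

```csharp
using System;

// Hypothetical demo class: Slice creates a view over existing memory, not a copy.
public static class SliceDemo
{
    // Returns the second group of digits (indices 3 to 5) as a read-only view
    // over the original string; no characters are copied here.
    public static ReadOnlySpan<char> SecondGroup(string cpf) =>
        cpf.AsSpan().Slice(3, 3);

    // With a mutable buffer the view is observable: writing through the
    // slice modifies the array it points into.
    public static string OverwriteFirstGroup(char[] buffer)
    {
        Span<char> firstGroup = buffer.AsSpan().Slice(0, 3);
        firstGroup.Fill('*');        // writes into buffer[0], buffer[1], buffer[2]
        return new string(buffer);   // reflects the change made through the slice
    }
}
```

Calling `SliceDemo.SecondGroup("12345678909").ToString()` is the point where `"456"` is actually materialized as a new string. Until then, the slice only refers to characters of the original.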

So, by calling the AsSpan extension method on the string and then using Slice, I implemented this new solution:

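public string Format(string cpf)
{
    // AsSpan returns a ReadOnlySpan<char> over the original string; nothing is copied
    var cpfAsSpan = cpf.AsSpan();

    // Requires C# 10 / .NET 6+: the interpolated string handler appends each
    // ReadOnlySpan<char> slice directly, without intermediate substrings
    return $"{cpfAsSpan.Slice(0, 3)}.{cpfAsSpan.Slice(3, 3)}.{cpfAsSpan.Slice(6, 3)}-{cpfAsSpan.Slice(9, 2)}";
}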

After running the benchmark again, we have:

| Method | Mean | Ratio |
|---|---|---|
| FormatCpfConvert | 301.97 ns | baseline |
| FormatCpfSubstring | 127.54 ns | -56% |
| FormatCpfAsSpan | 75.94 ns | -74% |

The final solution takes almost 75% less time than the initial one!
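The results table only reports time. Here is a rough sketch for checking the allocation argument more directly. It assumes .NET 6+ (C# 10 or later) and uses `GC.GetAllocatedBytesForCurrentThread`. The `AllocationCheck` class and its method names are hypothetical. This is a sanity check, not a replacement for BenchmarkDotNet, whose `[MemoryDiagnoser]` attribute reports allocations alongside the timings.

```csharp
using System;

// Rough allocation check (not a benchmark) for the Substring and AsSpan versions.
// Requires C# 10 / .NET 6+ so the interpolated string can append spans directly.
public static class AllocationCheck
{
    // Allocates four intermediate substrings plus the final string.
    public static string FormatSubstring(string cpf) =>
        $"{cpf.Substring(0, 3)}.{cpf.Substring(3, 3)}.{cpf.Substring(6, 3)}-{cpf.Substring(9, 2)}";

    // Slices are views over cpf; the interpolation handler builds the result
    // in a pooled buffer, so steady-state calls allocate only the final string.
    public static string FormatAsSpan(string cpf)
    {
        var s = cpf.AsSpan();
        return $"{s.Slice(0, 3)}.{s.Slice(3, 3)}.{s.Slice(6, 3)}-{s.Slice(9, 2)}";
    }

    // Average bytes allocated on the current thread per call. One warm-up call
    // runs first so one-time costs (JIT, first pooled-buffer rental) are excluded.
    public static double BytesPerCall(Func<string, string> format, string cpf, int iterations = 10_000)
    {
        format(cpf); // warm-up
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            format(cpf);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        return (double)(after - before) / iterations;
    }
}
```

You would call `AllocationCheck.BytesPerCall(AllocationCheck.FormatSubstring, "12345678909")` and the same with `FormatAsSpan`. The Substring version should report more bytes per call because it allocates four intermediate strings that the span version avoids. Exact numbers depend on the runtime and platform, so none are quoted here.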

Conclusion

I hope this gives you an idea of the Span<T> type and its performance benefits. Although it was introduced a few years ago, it is still not common to see it used or explained.

Also, this experience reinforced practices that I recommend and try to follow every day:

  • Stay up to date with the features of the programming language/framework you are working with; you never know when you will find something useful to apply in your day-to-day work
  • Make it work, then make it better/faster
  • Don't blindly rely on solutions from the internet
