The most performant way to count substrings in .NET

I benchmarked 14 common approaches to counting substrings in .NET. The approaches differed by up to 60–70x in execution time, with memory allocations ranging from zero to over 160 KB.

Setup

Tests searched for the substring “the” in two strings on an Apple M3, using .NET 8 with BenchmarkDotNet’s ShortRun configuration. The large string was the first chapter of The Hobbit; the small string was its first 100 characters.
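The benchmark harness isn’t reproduced above; a minimal BenchmarkDotNet setup along these lines would match the stated configuration (class names, field names, and the file path are illustrative assumptions, not the author’s actual code):

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// [ShortRunJob] runs 1 launch with 3 warmup + 3 measurement iterations;
// [MemoryDiagnoser] produces the Allocated column.
[ShortRunJob]
[MemoryDiagnoser]
public class SubstringCountBenchmarks
{
    private string _large = null!; // first chapter of The Hobbit
    private string _small = null!; // its first 100 characters

    [GlobalSetup]
    public void Setup()
    {
        _large = File.ReadAllText("hobbit-chapter-1.txt"); // path is an assumption
        _small = _large[..100];
    }

    [Benchmark]
    public int SplitLarge() => _large.Split("the").Length - 1;

    [Benchmark]
    public int SpanLarge()
    {
        var text = _large.AsSpan();
        int count = 0, index;
        while ((index = text.IndexOf("the")) >= 0)
        {
            count++;
            text = text[(index + 3)..];
        }
        return count;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SubstringCountBenchmarks>();
}
```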

Results

| Approach | Small (ns) | Large (ns) | Allocated |
| --- | --- | --- | --- |
| Span | 17.71 | 8,227 | 0 B / 0 B |
| IndexOf (Ordinal) | 18.93 | 8,662 | 0 B / 0 B |
| IndexOf (OrdinalIgnoreCase) | 20.47 | 10,463 | 0 B / 0 B |
| String.Replace | 37.33 | 24,645 | 216 B / 87,963 B |
| Cached, compiled Regex | 127.17 | 40,968 | 560 B / 162,880 B |
| Instantiating a Regex inline | 416.44 | 49,698 | 2,528 B / 164,848 B |
| Static Regex (Regex.Match) | 154.42 | 50,996 | 560 B / 162,880 B |
| String.Split | 145.47 | 70,195 | 304 B / 111,058 B |
| IndexOf (InvariantCulture) | 1,216.64 | 523,154 | 0 B / 1 B |
| IndexOf (InvariantCultureIgnoreCase) | 1,314.57 | 534,426 | 0 B / 1 B |
| IndexOf (CurrentCultureIgnoreCase) | 1,329.19 | 536,436 | 0 B / 1 B |
| IndexOf (CurrentCulture – default) | 1,224.49 | 553,913 | 0 B / 1 B |

The Allocated column shows allocations for the small / large text, respectively.

Key Findings

  • Ordinal string operations are roughly 60x faster than culture-aware operations.
  • Span and IndexOf with StringComparison.Ordinal both achieve zero allocations and the best performance.
  • Regex approaches allocate over 160 KB for large texts despite reasonable performance.
  • Split creates an array of all segments, explaining its 111 KB allocation.
    • With larger strings, that result can end up on the Large Object Heap (objects of 85,000 bytes or more), which has different garbage-collection characteristics and should be avoided.
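The zero-allocation span approach from the table can be sketched as follows (a sketch, not the benchmarked source; span IndexOf is an ordinal search, and overlapping matches are not counted):

```csharp
using System;

// Count non-overlapping occurrences of needle in text without allocating:
// repeatedly find the next match and slice past it.
static int CountOccurrences(ReadOnlySpan<char> text, ReadOnlySpan<char> needle)
{
    int count = 0;
    int index;
    while ((index = text.IndexOf(needle)) >= 0)
    {
        count++;
        text = text.Slice(index + needle.Length);
    }
    return count;
}
```

Strings convert implicitly to `ReadOnlySpan<char>`, so callers can pass them directly: `CountOccurrences(chapter, "the")`.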

Recommendation

If you’re a backend or line-of-business developer modeling your domain, you probably want IndexOf with StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, depending on your domain semantics.
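A string-based counter following that recommendation might look like this (a sketch under the same non-overlapping-match assumption as above):

```csharp
using System;

// Count non-overlapping occurrences of needle in text, with the comparison
// made explicit so the fast ordinal path is an intentional choice.
static int CountOccurrences(string text, string needle, StringComparison comparison)
{
    int count = 0;
    int index = 0;
    while ((index = text.IndexOf(needle, index, comparison)) >= 0)
    {
        count++;
        index += needle.Length;
    }
    return count;
}
```

For example, `CountOccurrences(chapter, "the", StringComparison.OrdinalIgnoreCase)` also counts “The” at sentence starts, while `StringComparison.Ordinal` does not.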
