Efficient natural sorting using magnitude prefixing
Everyone prefers a nicely sorted list, no?

test1.txt

test10.txt

test11.txt

test2.txt

test3.txt

test4.txt

test5.txt

test6.txt

test7.txt

test8.txt

test9.txt
This list is what you get if you use a "default" string sorting function in any programming language. It’s technically correctly/lexically sorted, but humans typically prefer if the numbers inside the strings are sorted numerically (also known as "natural sorting").
There’s a rather elegant technique for implementing natural sorting which I call "magnitude prefixing". I can’t remember where I first read about this, but it doesn’t seem to be so wellknown.
Prelude: Leftpadding numbers
Let’s start by talking about a suboptimal way of implementing natural sorting:
Storing the "sort key" as a separate field turns out to be very convenient: Your database doesn’t need to support custom sort ordering and a regular index makes querying efficient. It also makes it possible to add other types of normalization (e.g. convert to lower case).
There’s however two problems with this approach:
First of all, you need to decide how many digits you want to support in advance.
You want to naturally sort numbers up with 9 digits?
Well, then you need to pad all numbers with up to 9 zeros, and any number with more digits will not be corrected sorted.
In addition, the sorting key becomes extremely long when the string contains many smaller numbers.
A timestamp such as 20240204 19:26:34.867641 +0100
would be stored as 000002024000000002000000004 000000019:000000026:000000034.000867641 +000000100
.
It’s maybe not a big problem, but definitely adds a lot of unnecessary overhead and suddenly you’ll reach a limit in a surprising way.
Better: Magnitude prefixed numbers!
Turns out there’s a slight variant which is easy to implement and much better:
A few examples:

In
test105.txt
we have a number with three digits so we end up withtest3105.txt
. 
In
test20.txt
we have a number of two digits so we end up withtest220.txt
. 
In
test2024.txt
we have a number of four digits so we end up withtest42024.txt
.
This works because it ensures that all 1 digit numbers become "grouped" together before any 2 digit numbers and so forth.
Notice that we’re only using a single extra byte per number. The worst case is for single digit numbers and these strings become 100% longer. Therefore we know that the sorting string is maximum 2x larger than our initial string. This is way better than the 10x increase we saw when we used leftpadded strings.
Handling more than 9 digits
What if we have a number with, let’s say, 14 digits?
We clearly can’t prefix it with 14
since this makes to ordered together with the 1 digit numbers.
Inspired by the UTF8 encoding, our solution to this problem is to let "9" act as a "continuation" byte: We first place a "9" and then we add another byte representing how many additional digits we have. This might go on multiple times if it’s a really huge number.
A few examples:

1 digit numbers are prefixed with "1"

5 digit numbers are prefixed with "5"

8 digit numbers are prefixed with "8"

9 digit numbers are prefixed with "90"

10 digit numbers are prefixed with "91"

11 digit numbers are prefixed with "92"

17 digit numbers are prefixed with "98"

18 digit numbers are prefixed with "990"

19 digit numbers are prefixed with "991"
This works for any number of digits: We can just keep adding "9"s.
Bonus: There’s no 0 digit numbers
There’s one property which lets us use one less byte for some numbers: Since every number has at least one digit we can start counting at 0 instead of 1. This means that 9 digit numbers are prefixed with "8" instead of "90" (and 18 digit numbers are prefixed with "98" instead of "990"). It doesn’t make a huge difference, but there’s no reason to not use 0.
This gives us the following rules:

1 digit numbers are prefixed with "0"

5 digit numbers are prefixed with "4"

8 digit numbers are prefixed with "7"

9 digit numbers are prefixed with "8"

10 digit numbers are prefixed with "90"

11 digit numbers are prefixed with "91"

17 digit numbers are prefixed with "97"

18 digit numbers are prefixed with "98"

19 digit numbers are prefixed with "990"
A JavaScript implementation
Okay, okay, I know you’re probably here for copypasting code. Here you go. This should be easily adjustable to other languages, but note that the integer division needs to round down.
const NINES = "999999999999999999999999999"
function magnitudePrefixed(s) {
return s.replace(/\d+/g, (num) => {
const digits = num.length  1
return NINES.slice(0, digits / 9) + (digits % 9) + num
})
}
Bonus: It’s reversible!
Another cool thing about this approach is that it’s trivial to recover the original string: Just remove the prefix from any number:
function unmagnitudePrefixed(s) {
return s.replace(/9*\d(\d+)/g, '$1')
}