Why does this implementation of strlen() work?

Solution 1:

Although this is technically undefined behavior, in practice no native architecture checks for out-of-bounds memory access at a finer granularity than the size of a word. So while garbage past the terminator may end up being read, the result will not be a crash.

Solution 2:

I don't see at all why alignment would be any relevant if the array is not long enough and we are reading past its end.

The routine starts with aligning to a word boundary for two reasons: first, reading words from an aligned address is faster on most architectures (and it's also mandatory on a few CPUs). The speed increase is enough to use the same trick in a host of similar operations: memcpy, strcpy, memmove, memchr, etc.
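
As a rough illustration of that alignment step (a minimal sketch, not the code from the question): the helper names `bytes_to_word_boundary` and `scan_to_alignment` are made up here, and "word" is assumed to mean `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to step forward from p to reach the
   next word boundary (0 if p is already aligned). */
static size_t bytes_to_word_boundary(const char *p)
{
    size_t misalign = (uintptr_t)p % sizeof(unsigned long);
    return misalign ? sizeof(unsigned long) - misalign : 0;
}

/* Hypothetical prologue: walk byte by byte until either the terminator is
   found or the pointer is word-aligned, and return the position reached. */
static const char *scan_to_alignment(const char *s)
{
    size_t n = bytes_to_word_boundary(s);

    while (n-- > 0 && *s != '\0')
        s++;

    /* Postcondition: we stopped on the terminator or on a word boundary. */
    assert(*s == '\0' || (uintptr_t)s % sizeof(unsigned long) == 0);
    return s;
}
```

If the returned pointer points at a zero byte, the length is already known; otherwise the word-sized loop can start from it.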

Second: if you continue reading words starting at a word boundary, you are assured that every word you read lies in a memory page the string itself occupies. The page size is a multiple of the word size, so an aligned word cannot straddle a memory page boundary, and a word that begins inside the string (terminating zero included) cannot spill over into the next page. (1)
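
The page arithmetic can be checked with a few lines of C. This is only a sketch: `ASSUMED_PAGE_SIZE` is an assumed value of 4096 (a real program would ask the OS), and both function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages. This is a common value, not queried from the OS. */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns 1 if a word-sized read starting at addr touches only one page. */
static int word_read_stays_in_page(uintptr_t addr)
{
    uintptr_t first_page = addr / ASSUMED_PAGE_SIZE;
    uintptr_t last_page  = (addr + sizeof(unsigned long) - 1) / ASSUMED_PAGE_SIZE;
    return first_page == last_page;
}

/* Checks every word-aligned address in the first two pages. */
static void check_aligned_reads(void)
{
    for (uintptr_t a = 0; a < 2 * ASSUMED_PAGE_SIZE; a += sizeof(unsigned long))
        assert(word_read_stays_in_page(a));
}
```

An unaligned read is a different story: a word starting one byte before a page boundary, `word_read_stays_in_page(ASSUMED_PAGE_SIZE - 1)`, does cross into the next page.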

So this is both the fastest and the safest approach, and it would stay safe even if the memory page granularity were as small as a single word, i.e. sizeof(long) (which it isn't).

Picking up an entire word near the end of a string may pick up additional bytes after the final zero, but reading undefined bytes from valid memory is not UB; only acting upon their contents is (2). If the word contains a zero terminator anywhere inside, the individual bytes are inspected with test_byte, and this, as the original source shows, never acts on bytes after the terminator.
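
To make the word test and the byte inspection concrete, here is a minimal sketch. It is not the source from the question: `has_zero_byte` and `sketch_strlen` are hypothetical names, the zero test is the common exact bit trick rather than the original mask check, 8-bit bytes are assumed, and the word is loaded with `memcpy` to sidestep aliasing issues. In standard C the over-read is still formally out of bounds unless the buffer extends to the next word boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  (~0UL / 0xFF)   /* 0x0101...01, assuming 8-bit bytes */
#define HIGHS (ONES * 0x80)   /* 0x8080...80 */

/* Nonzero exactly when at least one byte of w is zero. */
static int has_zero_byte(unsigned long w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

static size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Prologue: byte by byte until p is word-aligned. */
    while ((uintptr_t)p % sizeof(unsigned long) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        unsigned long w;

        /* May pick up bytes past the terminator; they are only tested
           as part of the whole word, never interpreted individually. */
        memcpy(&w, p, sizeof w);

        if (has_zero_byte(w)) {
            /* Inspect bytes in address order and stop at the first zero,
               so nothing after the terminator is ever acted upon. */
            for (size_t i = 0; i < sizeof w; i++)
                if (p[i] == '\0')
                    return (size_t)(p + i - s);
            assert(0 && "has_zero_byte is exact, so a zero must be found");
        }
        p += sizeof w;
    }
}
```

Note that the inner loop returns as soon as it sees the terminator, which is the property the paragraph above relies on.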

(1) Obviously a string can span several pages; what I meant is that an aligned word read never reaches into a page the string does not already occupy (a locked or unmapped page, say).

(2) Under discussion. See the comments (sorry about that!) under Sneftel's answer.