Why are PHP strlen() and JavaScript xxx.length not equal?

I have the following text:

Ankylosaurus was an armored dinosaur that lived roughly 67 million years ago, at the very end of the Cretaceous Period. This genus was among the last of the non-avian dinosaurs, living alongside Tyrannosaurus, Triceratops, and Edmontosaurus. Its name means 'fused lizard'; bones in its skull and other parts of its body were fused, increasing their strength. Ankylosaurus was up to 6.25 m (20.5 feet) long and 1.7 m (5.6 feet) tall, weighing about 4.8–8 tonnes (11,000–18,000 lb). It had a broad, robust body with a wide, low skull. The front parts of the jaws were covered in a beak, with rows of small, leaf-shaped teeth behind it, adapted for a herbivorous diet. It was covered in armor plates for protection against predators, with bony half-rings covering the neck, and had a large club on the end of its tail which may have been used as a weapon. Fossils from a few specimens of Ankylosaurus have been found in various geological formations in western North America, but a complete skeleton has

Now I run the PHP and JS code below:

echo strlen(trim($text));

and

var text = "above text";
alert( text.length);

PHP shows 1004 and JS shows 1000 characters. Why?


Your two versions are unlikely to print the same output because they do different things.

JavaScript's String.length property returns a character count (though based on an early and outdated definition of character: it counts UTF-16 code units, so characters outside the Basic Multilingual Plane count as two):

console.log(`–`.length); // 1
console.log(`💩`.length); // 2 (one character, two UTF-16 code units)

PHP's strlen() function returns a byte count, and you're possibly using a multi-byte encoding like UTF-8 (and if you aren't, you should be). Compare:

var_dump(strlen('–'), mb_strlen('–'));
var_dump(strlen('💩'), mb_strlen('💩'));

This prints:

int(3)
int(1)
int(4)
int(1)

You're also removing leading and trailing whitespace only in the PHP version (trim() is applied there but not in the JavaScript code), and spaces are people too.


To build a reliable cross-language character count:

  • PHP: mb_strlen() should work fine out of the box, as long as you configure your application to tell PHP which encoding is in use (or specify the encoding manually every time) and you feed it properly encoded data. In 2018 there's normally no reason to use anything other than UTF-8.

    var_dump(mb_strlen('–💩', 'UTF-8'));
    
  • JavaScript: String.length may seem to work for you if you think you don't need to account for emojis but, to be on the safe side, you can check the article "JavaScript has a Unicode problem" for some workarounds (it is worth reading even purely for learning purposes).
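
As a rough illustration of one such workaround, the sketch below counts Unicode code points instead of UTF-16 code units, which should line up with what mb_strlen() reports for UTF-8 input. The helper name countCodePoints is made up for this example, and it still counts combined sequences (such as emojis joined with modifiers) as several characters.

```javascript
// Hypothetical helper (not a built-in): counts Unicode code points
// rather than UTF-16 code units.
function countCodePoints(text) {
  // The string iterator walks code points, so a surrogate pair yields one item.
  return Array.from(text).length;
}

const sample = "–💩";
console.log(sample.length);           // 3 (1 + 2 UTF-16 code units)
console.log(countCodePoints(sample)); // 2
```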


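It's because the dashes in 4.8–8 tonnes (11,000–18,000 lb) are not the normal hyphen (-) but en dashes (–). In UTF-8 this character takes 3 bytes, and you used it twice, so strlen() counts 6 bytes where JavaScript counts 2 characters. That accounts for the difference of 4 (1004 vs. 1000).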
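
The arithmetic can be checked from the JavaScript side as well. This is only a sketch: utf8ByteLength is a hypothetical helper name, and it assumes an environment where TextEncoder is available (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: UTF-8 byte length of a string, comparable to what
// PHP's strlen() returns for UTF-8 encoded input.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

const fragment = "4.8–8 tonnes (11,000–18,000 lb)";

// Each en dash is 1 UTF-16 code unit but 3 UTF-8 bytes, so two of them add 4.
const extraBytes = utf8ByteLength(fragment) - fragment.length;
console.log(extraBytes); // 4
```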

To prevent that, you can use mb_strlen($string) or replace the – with a -.

I would recommend using the mb_ variant: it is safer for the future, and you don't strip out typographically correct characters (if this "dash" is actually the correct dash; there are many kinds, and https://typefacts.com will help you a lot if you are interested).