Optimal font for Tesseract? (specifically the .NET wrapper)
Solution 1:
I've done an experiment to answer this question.
- Generate a document of 6000 random characters drawn from the Base64 character set (basically all upper- and lower-case letters plus digits).
- For each font on my system (a Linux box), render an image with that same content.
- Run Tesseract on each image.
- Measure the error rate / accuracy of the recognized text against the original.
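The first and last steps can be sketched in plain Python. This is only an illustration, not the script used for the experiment: the function names (`random_text`, `levenshtein`, `char_error_rate`), the 60-character line width, and the choice of edit distance as the error metric are all my assumptions. Rendering the image and calling Tesseract are left out, since they depend on your toolchain.

```python
import random
import string

# Assumption: "letters + digits" as described above (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def random_text(n_chars=6000, line_width=60, seed=0):
    """Return n_chars random symbols, wrapped into fixed-width lines."""
    rng = random.Random(seed)
    chars = [rng.choice(ALPHABET) for _ in range(n_chars)]
    lines = ("".join(chars[i:i + line_width])
             for i in range(0, n_chars, line_width))
    return "\n".join(lines)


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def char_error_rate(truth, ocr_output):
    """Edit distance divided by the ground-truth length, ignoring whitespace."""
    t = "".join(truth.split())
    o = "".join(ocr_output.split())
    if not t:
        raise ValueError("ground truth is empty")
    return levenshtein(t, o) / len(t)
```

You would feed `random_text()` to whatever renders the image for each font, then pass the original text and Tesseract's output to `char_error_rate`. Note that this pure-Python edit distance is quadratic, so on the full 6000 characters it takes a while; a dedicated library would be faster.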
Here are the top-performing fonts for Tesseract v4.1.1:
- mitra
- TeX_Gyre_Bonum
- DejaVu_Serif
- Roboto
- Cantarell
See also this wrap-up: https://www.monperrus.net/martin/perfect-ocr-digital-data