OCR Tesseract, Empty page error?
Try the psm (page segmentation mode) option.
-psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
Reference: http://hilojack.sinaapp.com/?p=866
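If you are not sure which mode fits your image, it may be easiest to just try a few and compare. Below is a rough Python sketch that shells out to the tesseract binary once per mode. The helper names (build_cmd, try_modes), the file name page.tif and the default choice of modes are my own assumptions for illustration. Also note that Tesseract 4.x and later spell the flag --psm instead of -psm, which is why the flag is a parameter here.

```python
import subprocess

def build_cmd(image, out_base, psm, flag="-psm"):
    """Return the tesseract argument list for one page segmentation mode.

    Pass flag="--psm" for Tesseract 4.x and later.
    """
    return ["tesseract", image, out_base, flag, str(psm)]

def try_modes(image, modes=(3, 6, 7, 8), flag="-psm"):
    """Run tesseract once per mode and collect the recognized text.

    Modes 0 and 2 are left out of the defaults because they do not run OCR.
    """
    results = {}
    for psm in modes:
        out_base = "out_psm%d" % psm  # tesseract appends .txt itself
        proc = subprocess.run(build_cmd(image, out_base, psm, flag),
                              capture_output=True, text=True)
        if proc.returncode != 0:
            results[psm] = None  # tesseract reported an error for this mode
            continue
        with open(out_base + ".txt", encoding="utf-8") as fh:
            results[psm] = fh.read().strip()
    return results
```

Calling try_modes("page.tif") should give back a dict mapping each mode number to whatever text Tesseract produced, so an empty string for a mode hints that this mode still sees an empty page.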
Preprocess your image first: threshold out the background color, turn the text black for better recognition (once you have thresholded the background, changing the color values should be fairly simple), and convert the image to grayscale. Then save it in .tif format.
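For what it's worth, the preprocessing steps could be sketched in Python roughly as follows. This assumes the Pillow library is installed; the function names, the threshold of 128 and the light_text switch are hypothetical choices of mine, not the exact steps used to produce the linked image. It converts to grayscale first and then thresholds, which should end up in the same place.

```python
def binarize_value(p, threshold=128, light_text=False):
    """Map one grayscale value (0-255) to pure black (0) or white (255).

    With light_text=True the result is inverted, so light text on a
    dark background comes out as black text on white.
    """
    is_light = p >= threshold
    if light_text:
        return 0 if is_light else 255
    return 255 if is_light else 0

def preprocess(src, dst, threshold=128, light_text=False):
    """Grayscale, threshold and save an image; dst should end in .tif."""
    # Pillow is imported here so binarize_value has no dependencies.
    from PIL import Image

    img = Image.open(src).convert("L")  # "L" = 8-bit grayscale
    # point() applies the function to every possible pixel value.
    img = img.point(lambda p: binarize_value(p, threshold, light_text))
    img.save(dst)  # Pillow picks the TIFF format from the extension
```

Something like preprocess("input.png", "test.tif", light_text=True) would then be the call for light text on a darker background; the threshold will probably need tuning per image.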
Now you might have a chance at processing that image (Super User doesn't let me post images, so I link them):
Processed Image
Now run the following command:
tesseract test.tif test_output -psm 7
and the result was:
Tist
Which is pretty good, given that I have not used any training data besides the standard eng.
Screenshot of the result
Tesseract is not trained to recognize handwriting. I don't know what it does with those colours either.
You could try to train Tesseract with that handwriting...