How does reCAPTCHA know you aren't entering bogus transcriptions of the pictures?
From what I understand, CAPTCHAs are text that has been distorted by filters, noise, and other miscellaneous algorithms. To find out whether the person answering is actually a human, you compare their answer with the known answer.
Now, reading up on reCAPTCHA, it says that the words it displays are ones that OCR could not recognize. In addition, reCAPTCHA is being used to transcribe those images. How can it tell whether your reading is correct or whether you are just making stuff up?
If it knew what the text said, it wouldn't be using it in reCAPTCHA as transcription material. If it doesn't know what the text says, then how does it validate your answer?
I'm guessing it relies on some probability-based analysis over huge sample sizes before it flags anything as transcribed.
Does anyone know where I can find the answer to this?
Book pages are photographically scanned and then converted into text using optical character recognition (OCR). The results are fed to the web as an image containing two words: one that is already known to the computer program behind reCAPTCHA and one that is not yet known.
The user then types both words, and if they solve the one whose answer is known, the system assumes their answer for the new one is also correct. The system then shows the new image to a number of other people to determine, with higher confidence, whether the original answer was correct. In this way, the system is a self-improving service that gets better with time.
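The grading step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual implementation: the `Challenge` class, `make_challenge`, `grade`, the image ids, the case-insensitive comparison, and the shuffled display order (so the user cannot tell which word is the control) are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Challenge:
    """One two-word challenge (hypothetical structure for illustration)."""
    control_image: str   # image whose transcription is already known
    unknown_image: str   # image the OCR could not read
    shown_order: tuple   # order in which the two images are displayed


def make_challenge(control_image, unknown_image, rng=random):
    # Assumption: the display order is shuffled so the user cannot tell
    # which of the two words is the control word.
    order = [control_image, unknown_image]
    rng.shuffle(order)
    return Challenge(control_image, unknown_image, tuple(order))


def normalize(text):
    # Hypothetical normalization: ignore case and surrounding whitespace.
    return text.strip().lower()


def grade(challenge, answers, known_answers, pending_guesses):
    """Return True if the user passes.

    answers:         image id -> what the user typed
    known_answers:   control image id -> accepted transcription
    pending_guesses: unknown image id -> list of guesses collected so far
    """
    expected = known_answers[challenge.control_image]
    passed = normalize(answers[challenge.control_image]) == normalize(expected)
    if passed:
        # Pass/fail depends only on the control word; a user who got it
        # right is trusted enough to contribute a guess for the unknown word.
        guess = normalize(answers[challenge.unknown_image])
        pending_guesses.setdefault(challenge.unknown_image, []).append(guess)
    return passed
```

For example, with `known = {"img-1": "upon"}`, answering `"Upon"` for `img-1` and `"morrow"` for `img-2` passes and records `"morrow"` as one guess for `img-2`; a wrong control answer fails and records nothing.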
http://www.google.com/recaptcha/learnmore
This is why reCAPTCHA has you enter two words. One of the words is already known, and the other is not. Whether you pass or fail depends only on your answer for the known word. Your answer for the unknown word is used, along with other responses to the same word, to turn it into a known word as well.
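Turning an unknown word into a known word from many responses could look something like the sketch below, which uses a simple majority vote. The thresholds (`min_votes`, `min_agreement`) and the function names are hypothetical values picked for illustration; the real system's aggregation rules are not described here.

```python
from collections import Counter


def try_promote(guesses, min_votes=3, min_agreement=0.75):
    """Return the consensus transcription, or None if agreement is not yet high enough.

    min_votes and min_agreement are hypothetical thresholds for illustration.
    """
    if len(guesses) < min_votes:
        return None  # not enough independent responses yet
    text, count = Counter(guesses).most_common(1)[0]
    if count / len(guesses) >= min_agreement:
        return text
    return None


def promote_all(pending_guesses, known_answers, **thresholds):
    """Move every unknown word that has reached consensus into the known set.

    In this sketch, a promoted word could then be used as a control word
    in later challenges.
    """
    promoted = {}
    for image_id, guesses in list(pending_guesses.items()):
        result = try_promote(guesses, **thresholds)
        if result is not None:
            known_answers[image_id] = result
            promoted[image_id] = result
            del pending_guesses[image_id]
    return promoted
```

Requiring both a minimum number of votes and a minimum agreement ratio means a single bogus answer cannot decide a word, and a word with widely split guesses stays unknown until more people have read it.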