Friday, October 5, 2012
Sneaky CAPTCHA images contribute to the digitization effort
Recognize this image? Of course you do.
These goofy CAPTCHA images serve as gatekeepers on many websites, such as Facebook, Amazon, and Ticketmaster. Part of their function is to prove that the user is a human rather than a spamming computer, but thanks to the work of Luis von Ahn, a computer scientist at Carnegie Mellon, the CAPTCHA is also contributing to the digitization of books and periodicals. Somehow I hadn't heard about this additional function, although I've filled out perhaps hundreds of these little boxes in the past several years... I blame my dissertator tunnel vision for this.
Despite advances in Optical Character Recognition (OCR), computers are not yet able to match the human mind's amazing ability to recognize symbols such as text, even when they are inconsistent, distorted, or poorly reproduced. Von Ahn has developed a version of the CAPTCHA program, called reCAPTCHA, in which the user is asked to type in two words instead of one. One of these words, whose answer the program already knows, serves to confirm that the user is human; the other is an image of a word from a scanned book or periodical, and our response helps to translate that image into text. Several users are given the same image, and if they consistently interpret it as the same word, it is considered successfully converted to text.
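For the curious, here is a rough Python sketch of how that two-word check and the agreement among users might fit together. It is only an illustration of the idea described above, not the actual reCAPTCHA code: the function names, the minimum of three votes, and the 75% agreement threshold are all my own assumptions.

```python
from collections import Counter


def normalize(word):
    # Ignore case and stray spaces when comparing typed words.
    return word.strip().lower()


def submit(control_answer, typed_control, typed_unknown, votes):
    """Check the known word; record the guess for the unknown word only if it passes.

    Returns True when the user is accepted as human.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False  # failed the human check, so the other guess is discarded
    votes.append(normalize(typed_unknown))
    return True


def resolve(votes, min_votes=3, min_share=0.75):
    """Return the agreed-upon word, or None if users have not yet agreed.

    min_votes and min_share are hypothetical thresholds chosen for this sketch.
    """
    if len(votes) < min_votes:
        return None
    word, count = Counter(votes).most_common(1)[0]
    return word if count / len(votes) >= min_share else None


if __name__ == "__main__":
    votes = []
    submit("morning", "Morning", "upon", votes)
    submit("morning", "morning", "upon", votes)
    submit("morning", "morning", "Upon ", votes)
    print(resolve(votes))
```

The point of the known word is that only answers from people who pass the human check count as votes on the scanned word, so a spamming computer can't pollute the digitized text.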
Here's an article from the NPR website with more information about this program: http://www.npr.org/templates/story/story.php?storyId=93605988
and also an article in <i>Science</i>:
from the quill of Scott Prinster at 10/05/2012 11:41:00 AM