History of Science, Medicine, and Technology at the University of Wisconsin: Sneaky CAPTCHA images contribute to the digitization effort

Friday, October 5, 2012

Sneaky CAPTCHA images contribute to the digitization effort

Recognize this image? Of course you do.

These goofy CAPTCHA images serve as gatekeepers on many websites, such as Facebook, Amazon, and Ticketmaster. Part of their function is to prove that the user is a human rather than a spamming computer, but thanks to the work of Luis von Ahn, a computer scientist at Carnegie Mellon, the CAPTCHA is also contributing to the digitization of books and periodicals. Somehow I hadn't heard about this additional function, although I've filled out perhaps hundreds of these little boxes in the past several years... I blame my dissertator tunnel vision for this.

Despite advances in Optical Character Recognition (OCR), computers are not yet able to match the human mind's amazing ability to recognize symbols such as text, even when they are inconsistent, distorted, or poorly reproduced. Von Ahn has developed a version of the CAPTCHA program, called reCAPTCHA, in which the user is asked to type in two words instead of one. One of these words serves to confirm that the user is human, but the other is an image from a book or periodical, and our response helps to translate the image into text. Several users are given the same image, and if they consistently interpret the image as the same word, it is considered successfully converted to text.
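For the curious, the two-word idea can be sketched in a few lines of Python. This is only my own illustration of the logic described above, not reCAPTCHA's actual code: the function names, the sample words, and the agreement thresholds (`min_votes`, `min_share`) are all assumptions made up for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalize(word: str) -> str:
    """Lowercase and trim so trivially different answers count as the same."""
    return word.strip().lower()


def is_human(control_answer: str, control_truth: str) -> bool:
    """The control word has a known answer; matching it suggests a human typist."""
    return normalize(control_answer) == normalize(control_truth)


def consensus(answers: List[str], min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed-upon reading of the unknown word, or None if unsettled.

    The thresholds are hypothetical; the real service's rules may differ.
    """
    votes = Counter(normalize(a) for a in answers)
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    if count >= min_votes and count / len(answers) >= min_share:
        return word
    return None


def transcribe(responses: List[Tuple[str, str]], control_truth: str) -> Optional[str]:
    """Each response is (answer to control word, answer to unknown word).

    Only answers from users who got the control word right are counted.
    """
    trusted = [unknown for control, unknown in responses
               if is_human(control, control_truth)]
    return consensus(trusted)


# Four users see the same pair of images; one fails the control word.
responses = [
    ("morning", "upon"),
    ("Morning", "Upon "),
    ("mornin", "open"),   # wrong control word, so this vote is ignored
    ("morning", "upon"),
]
result = transcribe(responses, control_truth="morning")
```

Here the three users who typed the control word correctly all read the scanned word the same way, so `result` is the agreed text; with fewer matching votes the function returns `None` and the image would simply be shown to more people.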

Here's an article from the NPR website with more information about this program: http://www.npr.org/templates/story/story.php?storyId=93605988

and also an article in Science:
http://www.sciencemag.org/content/321/5895/1465.full?sid=e16c1bda-edda-462d-9198-baa2096672f9

Comments:

Megan, October 5, 2012 at 12:53 PM
Neat stuff! If only they would speed up so I can OCR handwritten letters... But wait, even my typewritten sources come out as gobbledygook! Oh well.
Meridith Beck Sayre, October 8, 2012 at 7:37 AM
This is cool, Scott! I didn't realize that the second image had that function.

Megan ~ I'm surprised that your typewritten letters are not working. Have you considered scanning them and uploading them into Adobe Acrobat Pro (if you have it) and then using Adobe's recognize text function? I've done that with several 19th-century typewritten letters to good effect.

As a side note, many of my docs from the Wisconsin Historical Society are copies of typewritten letters on carbon paper with very characteristic blue ink that were produced through the hectograph process. I should write a blog post about that :)
Scott Prinster, October 8, 2012 at 7:43 AM
Megan, I know that you're really wishing for an OCR that recognizes handwriting, but the program I use for converting type is ABBYY FineReader. Since part of what I'm doing is textual analysis, I convert an average of about one book or article a day, and I'm generally pleased with it. It really sucks up the processing power of a computer while it's running, though, so I do OCR conversion on my laptop while I'm doing other things on my desktop.

Welcome to HSMT at Wisconsin

Welcome to the grad student blog for Wisconsin's program in the History of Science, Medicine, and Technology. This is an informal forum for our grad students and guest bloggers to share ideas, announcements, and discussions about our research and interests. We hope that this cross-pollination will bring more collaboration and fun to our work together.

The UW-Madison Program in the History of Science, Medicine, and Technology is one of the largest and oldest academic programs of its kind in the United States. We are administered by the Department of the History of Science, and staffed by faculty from the departments of History of Science and Medical History & Bioethics. We explore all historical aspects of science, medicine, and technology -- from their internal development to their broader social contexts, including their relationships with institutions, philosophy, religion, and literature.

If you have questions about Wisconsin's HSMT program, please follow the link below to the department's official website.