Tuesday, December 16, 2008

Naughty software

Today I read some documents that had been converted from PDF to HTML using character-recognition software. As was to be expected, there were a few character-recognition errors -- places where the software saw one character (such as "I") but thought it saw another (such as "1"). These weren't much of a problem, since they didn't happen very often and didn't affect the reader's understanding of the document.

Until, that is, I came across a place where the software mistook "P" for "F". This error led to a list that read as follows:

Part 1
Part 2
Part 3
Fart 4

Who knew that character-recognition software had such a juvenile sense of humor?

