When the Data hits the Fan!: Text Capture and Optical Character Recognition 101

This blog post introduces text capture by describing its different methods, with a focus on historical documents. I will cover the basics of optical character recognition (OCR) and rekeying, and discuss handwriting and voice recognition.
What is text capture?
Text capture is a process rather than a single technology. It is the means by which the textual content of physical artefacts (such as books, manuscripts, journals, reports and correspondence) is transferred into a machine-readable format. My focus here is on capturing text from digital images of those physical artefacts. Such images may be produced with scanners or digital cameras and stored as digital page images for later access and use.