Handwritten Text Recognition (HTR) Being Used to Digitize Cultural Heritage Materials in Sweden
When the university library digitises printed books from heritage collections, it uses Optical Character Recognition (OCR) software to convert the page images into digital text. The software interprets the printed characters and makes the text searchable. Handwriting requires a different technology: handwritten text recognition (HTR). It is the development of this technology that has sparked something of a race among researchers worldwide.
‘You want to be the first to find a program that works. If someone today had an algorithm to carry out large-scale digital searches of things like the collection of manuscripts in the Vatican Library, it would be worth a fortune. Whilst the market value is enormous, so is the scale of the task’, says Anders Brun, project manager at the Department of Information Technology.
The core of the work is text decoding: developing a method by which the computer interprets the digital image of the text. The researchers are trying to avoid interpreting the text manually, because handwriting can look very different depending on who was holding the pen. Instead, they want to teach the computer to interpret the material itself.
‘Using expert knowledge, we try to give the computer the right answer for a small portion of the material and then automate this’, says Fredrik Wahlberg.
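The idea of labelling a small portion of the material and automating the rest can be illustrated with a toy Python sketch. This is not the researchers' method. It assumes each word image has already been reduced to a short numeric feature vector (a hypothetical preprocessing step), and it uses simple nearest-neighbour matching as a stand-in for the learned models that real HTR systems use. The names `transcribe` and `max_distance` are illustrative only. A word that is not close enough to any expert-labelled example is returned as `None`, meaning it goes back to a human for checking.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical feature vector for one word image. In a real system these
# would be computed from the scanned page; here they are toy numbers.
Vector = Tuple[float, ...]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transcribe(
    labelled: Dict[Vector, str],
    unlabelled: List[Vector],
    max_distance: float,
) -> List[Optional[str]]:
    """Give each unlabelled word the transcription of its nearest
    expert-labelled example. Words farther than max_distance from every
    labelled example get None, i.e. they are sent back to the expert."""
    if not labelled:
        raise ValueError("at least one expert-labelled example is needed")
    results: List[Optional[str]] = []
    for vec in unlabelled:
        # Find the closest labelled example and its transcription.
        nearest, text = min(
            ((distance(vec, ref), label) for ref, label in labelled.items()),
            key=lambda pair: pair[0],
        )
        results.append(text if nearest <= max_distance else None)
    return results


# A small expert-labelled sample (hypothetical values).
expert_labels = {
    (0.9, 0.1, 0.4): "kung",
    (0.2, 0.8, 0.5): "stad",
}
# Unlabelled words from the rest of the collection.
pages = [(0.85, 0.15, 0.4), (0.25, 0.75, 0.55), (0.5, 0.5, 0.0)]

print(transcribe(expert_labels, pages, max_distance=0.2))
# prints ['kung', 'stad', None]
```

The threshold is the design choice that keeps the expert in the loop: the first two words closely match labelled examples and are transcribed automatically, while the third resembles neither and is flagged for a human rather than guessed.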