10 February 2006

Google to put ancient text on the internet

Irish Independent

Katherine Donnelly

FANCY googling the Book of Kells or all 65,000 pages of first US president George Washington's diaries?

A new collaboration between Irish scientists and internet wizards Google may allow for web-based searches of handwritten documents in a way that has never been possible.

A joint team based at Dublin City University (DCU) has applied its internationally recognised expertise in video analysis to making images of handwriting searchable.

Research leader Professor Alan Smeaton said they "stumbled across" the possibility in the course of other research and approached Google, which has come up with funding for a year.

His colleague Dr Noel O'Connor said that handwriting is at present not searchable, but that they were getting very good detection using the shape of a word, even though a writer forms the same word slightly differently each time.

The George Washington diaries and memoirs have, in fact, been made available for the research, which is being carried out in conjunction with two US universities.

"We've applied the approach to hundreds of pages of George Washington's diaries and memoirs, getting very good results. For example, you can select the word 'battle' and find all the references to that word in Washington's writings," said Dr O'Connor.

Up to now, the kind of material they are hoping to open up on the web has been kept behind closed doors, or has been accessible in digital libraries only at a slow and cumbersome one page at a time.

Prof Smeaton believes that the techniques being developed in this project could lead to handwritten manuscripts being available for searching in the giant Google index within a couple of years.

"As a company, Google moves very fast and if the techniques we are developing in this project are as good as early results indicate, we can expect to see Google take up the outputs," he said.

Prof Smeaton said that libraries around the world were in the process of digitising their rare and historical manuscripts, so in the future, using this technology, Google search engines could make these manuscripts available and searchable worldwide.

The research is being carried out at the Adaptive Information Cluster (AIC), a Science Foundation Ireland multi-disciplinary research group involving DCU and UCD working in sensor science, software engineering, electronic engineering and computer science.

The team is also involved with the Dublin Institute for Advanced Studies in the Irish Script on Screen (ISOS) project, which is digitising old manuscripts written in the Irish language.

Thousands of images have been scanned, also with the intention of making them searchable on the internet.

The system is based on "object detection" in video - detecting and identifying images of people or other objects in different video frames, even though there may be altered positions or angles, and applying this to differing slants or shapes of words in handwriting.
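
The article does not describe the team's algorithm in any detail, so the following is only a rough sketch of the general idea of finding a word by its shape rather than by recognising its letters. It assumes each word has already been cut out of the page as a small black-and-white image, summarises each image by how much ink falls in each column, and compares two summaries with dynamic time warping, a standard technique that tolerates stretching and squeezing. The function names, the toy images and the threshold are all hypothetical, and this is not presented as the method used by the DCU researchers.

```python
from typing import List, Sequence

# Assumption: a word image is a grid of 0/1 pixels, where 1 means ink.
Image = Sequence[Sequence[int]]


def column_profile(image: Image) -> List[float]:
    """Return the fraction of inked pixels in each column of the image."""
    height = len(image)
    width = len(image[0])
    return [sum(row[x] for row in image) / height for x in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two profiles.

    The result is divided by the combined length so that long and
    short words are scored on a comparable scale.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m] / (n + m)


def find_matches(query: Image, candidates: Sequence[Image],
                 threshold: float = 0.05) -> List[int]:
    """Return the indices of candidate images whose shape is close to the query."""
    query_profile = column_profile(query)
    return [
        index
        for index, candidate in enumerate(candidates)
        if dtw_distance(query_profile, column_profile(candidate)) <= threshold
    ]


# Toy data: the same "word" written twice as wide, and an unrelated shape.
query = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
stretched = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
other = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

print(find_matches(query, [stretched, other]))  # [0]
```

In this toy case the wider copy of the word is found and the unrelated shape is rejected, which is the behaviour Dr O'Connor describes when the same word is written differently each time. Real manuscripts would need far richer shape features than a single ink count per column.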
