Google Desktop Plug-in Enables PDF Search by Don Fluckinger

ScanSoft provides a plug-in that can scan text PDFs as well as index and search image-based documents such as faxes and paper scans.

The Google Desktop utility went live Monday, after about a six-month beta cycle. The 1.0 release supports PDF search for the first time. Moreover, ScanSoft has brought to market a beta plug-in called OmniPage Search Indexer that not only supports PDFs containing text, but can also OCR and index image-based PDFs with scanned text and return the results on a locally served Google-type page.

"We're very pleased to be one of the first developers to work with Google and their new API to enable this," said Robert Weideman, senior vice president of marketing and product strategy for ScanSoft's Productivity Applications Division.

Weideman added that OmniPage Search Indexer also handles other image file formats such as BMP, MAX and TIFF. "We see this as an important event [for ScanSoft] and one we're evaluating, should we decide to support OmniPage Search Indexer with other desktop search products."


That's likely to happen, Weideman said, because there's a need. Most companies that offer desktop search utilities—like Google, Yahoo!, Ask Jeeves and Microsoft Corp.'s MSN—live in the Internet search space, where there is little call for image-based document search, as most companies don't post faxes and scans of paper documents to the Web. So search vendors don't develop tools to search them.

 

When entering the desktop arena—where users need to tap into archives of image documents stored on hard drives or on company intranets—an image-based document search tool suddenly becomes important.

"While it's rare that there's [scanned text documents] on the Web, it's actually quite common that it's on a person's PC," Weideman said. "There's a big gap right now when the companies traditionally participating in the public Web search arena came to the desktop. ... If you're a lawyer, you're not posting contracts you've scanned in on your public Web site, but you definitely have them on your PC and in your network environment."

Neither a spokesperson for Google—expressing a desire to display equal enthusiasm for all third-party plug-in developers—nor ScanSoft, bound by non-disclosure, offered much information about how the two companies came together.

Weideman did say, however, that the companies have a "mutual interest" in seeing scanned paper publications and speech content made visible to Internet search engines beyond the Google Desktop.

He also said that the two companies have worked together for some time to bring the OmniPage Google Desktop plug-in to market.

"We worked with Google on their definition of the API; we were [involved] very early in the process and provided them feedback on how they can make the API better," Weideman said.

Currently, only an English version of OmniPage Search Indexer is available. ScanSoft says it plans to make Dutch, French, German, Italian, Portuguese and Spanish localized versions available at the same download sites within 30 days.

The current beta version of OmniPage Search is free. ScanSoft may charge for the commercial release version—which will come after what Weideman estimates will be a 30- to 60-day beta cycle—but he also added that a free, time-limited demo will remain available for download from the Google site.

In addition, he pointed out that while ScanSoft may be first to market with a PDF search tool for the Google Desktop, the arrangement is not exclusive. In time there could be competing tools. ScanSoft was able to customize a search tool the fastest, Weideman said, in part because the company owns six different OCR engines. Each involves different trade-offs among file size, accuracy and speed; the one the company settled on turned into about a 5MB runtime download.

"I would expect to see some of our competitors come to the party with their own offerings," Weideman said. "Their challenge is ... getting their runtimes down to 5MB or smaller."

For more information on OmniPage Search, go here.

 
2011 - Centaur Academic Media - design by Joshua Arciniega