A new Chrome extension called
Project Naptha allows users to copy and delete text from images. The
first thing to say is that this functionality does exist elsewhere. Certain
pieces of software, such as Microsoft OneNote, Google Drive and Google Street
View use optical character recognition (OCR) to identify text within images.
Project Naptha, on the other hand, uses a method called Stroke Width Transform
(SWT) that was developed by Microsoft Research. Unsatisfied with the open-source
OCR algorithms that were available, developer Kevin Kwok spent time trying to
find a solution. He tells Gizmag that he spent weeks treating letters as "cryptogram
puzzles" and recognizing text with an advanced language model, and
further weeks "trying to build a kind of brute force text recognizer."
Ultimately, he decided to use SWT. This approach uses the width of the lines that
make up letters as a means of identifying elements that could potentially be
text, rather than trying to spot predetermined separate features as a marker of
text. This gives it certain advantages over OCR.
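The stroke-width idea can be illustrated with a rough sketch. This is not Kwok's code, nor the published SWT algorithm, which traces rays between detected edges. It is a simplified stand-in that assumes a binary image and estimates each ink pixel's stroke width as the shorter of its horizontal and vertical runs. The function names, the one-pixel tolerance and the 75 percent threshold are all hypothetical choices made for illustration.

```python
from statistics import median


def run_lengths(cells):
    """For each cell, return the length of the run of 1s it belongs to (0 for background)."""
    out = [0] * len(cells)
    i = 0
    while i < len(cells):
        if not cells[i]:
            i += 1
            continue
        j = i
        while j < len(cells) and cells[j]:
            j += 1
        for k in range(i, j):
            out[k] = j - i
        i = j
    return out


def stroke_widths(image):
    """Approximate each ink pixel's stroke width as the shorter of its
    horizontal and vertical runs (a crude stand-in for SWT's edge-to-edge rays)."""
    horizontal = [run_lengths(row) for row in image]
    per_column = [run_lengths(list(col)) for col in zip(*image)]
    vertical = [list(row) for row in zip(*per_column)]
    return [[min(h, v) for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def looks_like_text(image, tolerance=1, min_share=0.75):
    """True when most ink pixels have a stroke width close to the median."""
    widths = [w for row in stroke_widths(image) for w in row if w > 0]
    if not widths:
        return False
    mid = median(widths)
    close = sum(1 for w in widths if abs(w - mid) <= tolerance)
    return close / len(widths) >= min_share


# A letter "L" drawn with 2-pixel strokes, and a filled triangle of the same size.
letter_l = [[1, 1, 0, 0, 0, 0, 0, 0] for _ in range(6)] + [[1] * 8 for _ in range(2)]
triangle = [[1] * (i + 1) + [0] * (7 - i) for i in range(8)]

print(looks_like_text(letter_l))  # True: nearly every pixel is 2 wide
print(looks_like_text(triangle))  # False: widths range from 1 to 8
```

The letter passes because its strokes have a nearly constant width, whatever alphabet it belongs to, while the solid shape fails because its width varies across the region. No knowledge of specific characters is involved, which is the sense in which such a test is language-agnostic.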
"[Stroke Width Transform] is capable of identifying regions of text in a
language-agnostic manner," explains Kwok. "In a sense
that’s kind of like what a human can do; we can recognize that a sign bears
written language without knowing what language it's written in, never mind what ..."
SWT is also able to detect angled text and text in photos,
and indeed was actually designed for the purpose of the latter. This means it
isn't limited to making out text in scans of printed letters or screenshots from
the Web, where text tends to be more
similar to that produced by computers and therefore easier to pick out.
Kwok explains to Gizmag that Project Naptha was something he initially worked on as
part of a hackathon at MIT (at which he won 2nd place).
"Selecting text in pictures was something which was quite doable on a technical
level, that is, the technology that it requires to function exists, and has done
so for quite some time," he explains. "But for some kind of inexplicable reason,
it hadn't been done before. Everything else, the transcription, translation,
text erasure, and modification just came as an obvious and trivial addition once
the first, kind of useless, part of the idea was accomplished."
Kwok gives a number of example sources with which Project Naptha can be used,
including scans, photos containing text, diagrams with labels, screenshots and
images with text overlays. He also demonstrates the ability for text overlays to
be deleted from images and the image backfilled, as well as for highlighted text
within images to be translated. To provide a seamless experience for the user,
Naptha tracks the movement of the cursor and continuously extrapolates a second
ahead based on its position and velocity, so it can begin processing any
potential text that the user might want to pick out from an image.
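A minimal sketch of that kind of look-ahead, assuming simple linear extrapolation from the two most recent cursor samples. The article does not describe Naptha's actual implementation, so the `Sample` type, the function names and the region rectangles below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float  # time of the sample, in seconds
    x: float  # cursor position, in pixels
    y: float


def predict_position(prev, curr, lookahead=1.0):
    """Extrapolate the cursor `lookahead` seconds ahead from its
    current position and the velocity between the last two samples."""
    dt = curr.t - prev.t
    if dt <= 0:
        return (curr.x, curr.y)  # no usable velocity; assume the cursor stays put
    vx = (curr.x - prev.x) / dt
    vy = (curr.y - prev.y) / dt
    return (curr.x + vx * lookahead, curr.y + vy * lookahead)


def regions_to_prefetch(point, regions):
    """Return the names of regions (x0, y0, x1, y1) that contain the point."""
    px, py = point
    return [name for name, (x0, y0, x1, y1) in regions.items()
            if x0 <= px <= x1 and y0 <= py <= y1]


prev = Sample(t=0.0, x=100, y=100)
curr = Sample(t=0.5, x=110, y=105)
predicted = predict_position(prev, curr)  # (130.0, 115.0)

regions = {"caption": (120, 100, 200, 130), "logo": (0, 0, 50, 50)}
print(regions_to_prefetch(predicted, regions))  # ['caption']
```

Here the cursor has not yet reached the caption, but its predicted position a second from now falls inside it, so processing of that region could begin early.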
Kwok acknowledges that much of the functionality in Project Naptha needs to be
improved and suggests that, over time, text recognition, translation and
deletion can all be developed further (he actually says in a tweet that the
reason he has launched now is to make use of some credit he has with Google that
was due to run out). Nevertheless, the basic functionality is very usable and
the potential for the more advanced technology is exciting.
"I think the real value that Naptha provides is the experience, which as far as I
am aware, is unprecedented," muses Kwok. "In terms of its various subcomponents
and algorithms, it's probably quite a few years behind the state of the art, and
one of the exciting things would be the possibility of a team to bridge that gap
between research and consumer use."
In case you were wondering, the name Naptha is derived from the use of a substance
called naphtha in lighter fuels and the process of highlighting text.
You can find out more about Project Naptha and test drive a demo at the Project
Naptha website.
Source: Gizmag