We’re thrilled to announce another major Jatheon Cloud update – OCR. Read on for more information on what it is, a how-to guide, and setup details.
What Is Optical Character Recognition?
Optical Character Recognition (OCR) is a technology that converts scanned images, digital images, or PDF files into machine-readable text. It uses machine learning and other algorithms to analyze the patterns of text characters, “extract” them from scanned files, and make the text available to be indexed, edited, searched, and processed.
Today, OCR is used in a wide range of applications, from document management and data entry to image search and language translation. Some examples include automatic number plate recognition, extracting information from passports and other scanned documents, or converting historical hard-copy documents into searchable digital formats.
Why Is OCR Important?
Business workflows involve sending and receiving information that was originally in paper form (e.g. invoices, scanned attachments, documents, or contracts). Once the document is digitized, it becomes an image that has lots of text hidden in it, but text in images cannot be processed like regular text documents.
OCR solves this problem because it can extract this text from images.
Without OCR, traditional email archiving solutions cannot guarantee that the information you retrieve for a request is 100% complete, since some of the evidence contained in images could have been missed.
The Benefits of the OCR Feature in Data Archiving
These are the main benefits of having the OCR feature in your data archiving solution:
- Increased searchability
- Better accuracy
- More speed
- Cost savings
Increased searchability: With OCR, you can quickly and easily search through non-textual email attachments, which saves time and effort that would otherwise be spent manually typing out the text from the documents.
You can now pinpoint specific information even if it’s contained in an attachment that’s an image or a scanned document.
Before this update, we only supported searching through text-only attachments (Word and .doc files).
Better accuracy: OCR technology is highly accurate, with some software able to recognize text characters with up to 99% accuracy.
This means that you can rely on OCR to comb through your archive without the risk of errors that are common during manual search and review.
More speed: Imagine having to go through 100+ email attachments, reading scanned documents to find a single keyword or phrase. OCR will save you tons of time.
Cost savings: Getting accurate results for a request automatically saves the money you’d otherwise spend on manual extraction and retyping.
OCR on Jatheon Cloud: Setup and How-To
The Optical Character Recognition (OCR) system needs to be set up at the Client level, so you’ll need to contact your Account Admin to be able to use OCR for your emails.
Once that’s done, your searches will start returning results from image and scanned attachments that match your search terms.
In this example, we used Advanced Search and picked the following criteria: Attachment, Contains Phrase, and added “sample page” as the phrase in the keyword bar.
And this is the result we got:
You can see that the attachment is a scanned PDF file, and since it contains the phrase we specified, the snippet is shown in a single result pane, with the phrase highlighted in orange.
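To make the idea behind a "Contains Phrase" search more concrete, here is a minimal Python sketch of how a phrase could be located and highlighted in text that OCR has already extracted from an attachment. This is an illustration only, not how Jatheon Cloud is implemented: the function name `find_phrase`, the `context` parameter, the sample text, and the `**` markers (standing in for the orange highlight) are all assumptions made for the example.

```python
import re
from typing import Optional


def find_phrase(extracted_text: str, phrase: str, context: int = 30) -> Optional[str]:
    """Return a snippet around the first case-insensitive match, or None.

    `extracted_text` is assumed to be the plain text an OCR step produced
    from a scanned attachment (hypothetical input for this sketch).
    """
    match = re.search(re.escape(phrase), extracted_text, flags=re.IGNORECASE)
    if match is None:
        return None

    # Keep up to `context` characters on either side of the hit.
    start = max(0, match.start() - context)
    end = min(len(extracted_text), match.end() + context)

    # Wrap the hit in ** ** as a stand-in for visual highlighting.
    return (
        extracted_text[start:match.start()]
        + "**" + match.group(0) + "**"
        + extracted_text[match.end():end]
    )


# Hypothetical OCR output from a scanned PDF attachment.
ocr_text = "This is a Sample Page from a scanned contract."
print(find_phrase(ocr_text, "sample page"))
```

The point of the sketch is simply that once the text has been pulled out of the image, a phrase search over an attachment works the same way as a search over any regular text document.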
The formats we support OCR for are PNG, JPEG, TIFF, JPEG 2000, GIF, WebP, and BMP.
OCR is available to all Jatheon Cloud customers, free of charge.
We welcome customer feedback, so if you need more help or have an idea about a feature that would make your life easier, you can always ping our Customer Success Manager, Vlad, at firstname.lastname@example.org.