1. What is OCR (Optical Character Recognition)?

OCR (Optical Character Recognition) is a technology that converts different types of documents, such as scanned paper documents, PDF files, or digital photos, into editable, searchable data.

To create a digital copy of a paper document, OCR needs capture hardware (such as a scanner, digital camera, or mobile device) that passes the captured image to software with OCR capabilities.

2. Difference between a scanner and OCR technology

A scanner is not sufficient to extract the relevant information from a document and make it editable.

In fact, all a scanner can do is create an image of the document: nothing more than a grid of black-and-white or colored dots (a raster image).
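
To make the idea of a raster image concrete, here is a minimal sketch in Python. The 5x5 grid and the helper names (render, count_dark) are invented for illustration. The point is that the scanner's output contains dots only, with no notion of letters.

```python
# A tiny 5x5 raster of the letter "T": 1 = dark dot, 0 = light dot.
# To a scanner this is only a grid of dots; nothing says it is a "T".
raster = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def render(image):
    """Draw the grid with '#' for dark dots and '.' for light dots."""
    return "\n".join("".join("#" if dot else "." for dot in row) for row in image)

def count_dark(image):
    """Count the dark dots in the grid."""
    return sum(sum(row) for row in image)

print(render(raster))
```

A person sees a "T" in the printed grid at once, but the data holds only dot values. Turning those dots into the character "T" is the job of OCR software.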

OCR software is required to extract and reuse the information contained in a scanned document, digital photograph, or image-only PDF.

3. How does OCR software work?

The program analyzes the image structure of the document and divides the page into elements (such as blocks of text, tables, and images).

Text blocks are divided into lines, lines into words, and words into characters. Once the characters are isolated, the software compares each one with a set of sample images and forms several hypotheses about which letter it might be.
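
The comparison with sample images can be sketched as simple template matching. This is a toy illustration, not how any particular OCR product works. The three 3x5 sample images, the scoring rule (share of matching dots), and all function names are assumptions made for the example.

```python
# Hypothetical sample images: 3 columns x 5 rows, '#' = dark dot, '.' = light dot.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# One scanned text line reading "LIT", with one noisy dot in the middle letter.
LINE = [
    "#...###.###",
    "#....#...#.",
    "#....##..#.",
    "#....#...#.",
    "###.###..#.",
]

def split_characters(line):
    """Cut a text line into character bitmaps at fully blank columns."""
    width = len(line[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in line)
        if not blank and start is None:
            start = x                      # a character begins here
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in line])
            start = None                   # the character ended
    return glyphs

def hypotheses(glyph, templates=TEMPLATES):
    """Rank candidate letters by the share of dots matching each sample.

    Assumes the glyph has the same size as the samples (3x5 here).
    """
    def score(sample):
        pairs = [(a, b) for g_row, s_row in zip(glyph, sample)
                 for a, b in zip(g_row, s_row)]
        return sum(a == b for a, b in pairs) / len(pairs)
    ranked = [(letter, score(sample)) for letter, sample in templates.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

def recognize(line):
    """Keep the top-ranked hypothesis for every character in the line."""
    return "".join(hypotheses(glyph)[0][0] for glyph in split_characters(line))

print(recognize(LINE))
```

For the noisy middle character, hypotheses() ranks "I" first, then "T", then "L". The recognizer keeps several candidates with scores rather than a single answer, which is what makes the later decision step possible.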

Based on these hypotheses, it then analyzes the different ways lines can be divided into words and words into characters. After weighing a large number of such hypotheses, the OCR program makes a final decision and outputs the recognized text.
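
The weighing of alternative divisions can be illustrated with a small search over candidate slices. The following is a sketch under stated assumptions. The candidate table is invented, and the scoring rule (match score weighted by slice width, summed over the word) is one plausible choice rather than a documented OCR algorithm.

```python
from functools import lru_cache

# Hypothetical recognizer output for a word image 9 columns wide:
# (start_column, end_column) -> (best letter for that slice, match score 0..1).
# Columns 0..6 can be read as one wide "d" or as a narrow "c" followed by "l".
CANDIDATES = {
    (0, 3): ("c", 0.90),
    (3, 6): ("l", 0.60),
    (0, 6): ("d", 0.85),
    (6, 9): ("o", 0.95),
}

def best_reading(candidates, width):
    """Return (text, total_score) for the best chain of slices covering
    columns 0..width, or None if no chain covers the whole word."""
    @lru_cache(maxsize=None)
    def solve(start):
        if start == width:
            return ("", 0.0)
        best = None
        for (s, e), (letter, score) in candidates.items():
            if s != start:
                continue
            rest = solve(e)
            if rest is None:
                continue                      # dead end: cannot reach the word's end
            total = score * (e - s) + rest[1]  # assumption: weight score by width
            if best is None or total > best[1]:
                best = (letter + rest[0], total)
        return best
    return solve(0)

print(best_reading(CANDIDATES, 9)[0])
```

Here the reading "do" beats "clo" because the single wide slice matches its sample better overall than the two narrow ones do. If the "d" candidate were missing, the same search would fall back to "clo".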

4. OCR technology for reading resumes

OCR technology allows Skillskan to read candidates’ resumes and extract their professional experience and acquired skills, regardless of the file format in which the resumes are sent.

Identifying and organizing this data is essential for building an efficient candidate database, which in turn makes it possible to rank profiles and match them with job offers.
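
As a rough illustration of the step from recognized text to matching, the sketch below extracts skills from OCR output and scores them against a job offer. It does not describe Skillskan's actual method. The skill vocabulary, the sample resume text, and the function names are all hypothetical.

```python
import re

# Hypothetical skill vocabulary; a real system would use a far larger one.
KNOWN_SKILLS = {"python", "sql", "excel", "project management"}

def extract_skills(ocr_text, vocabulary=KNOWN_SKILLS):
    """Return the vocabulary entries that appear as whole words in the text."""
    # OCR output often contains stray line breaks and repeated spaces.
    text = re.sub(r"\s+", " ", ocr_text.lower())
    return {
        skill for skill in vocabulary
        if re.search(r"(?<!\w)" + re.escape(skill) + r"(?!\w)", text)
    }

def match_score(candidate_skills, required_skills):
    """Share of the job offer's required skills that the candidate covers."""
    if not required_skills:
        return 0.0
    return len(candidate_skills & required_skills) / len(required_skills)

# Invented OCR output for one resume.
resume = "Experience: 3 years as data analyst.\nSkills: Python, SQL,\nproject   management"
skills = extract_skills(resume)
score = match_score(skills, {"python", "sql", "excel"})
print(sorted(skills), round(score, 2))
```

Sorting candidates by a score like this is one simple way to rank profiles against an offer. Exact keyword matching misses synonyms and OCR misreadings, so it should be read as a starting point only.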