When a CV turns into a crime scene

Software Developers on the trail of Sherlock Holmes

At some point, most of our customers face an awkward situation: our CV parser CVlizer fails to extract information that is clearly contained in a CV. The CV in question may look like any other, and yet the extraction result is a patchwork of letters or at least looks quite mysterious. How can that happen?

Software Developers as Detectives

Whenever this occurs, the analysts on our development team are asked for advice and, in their search for a sound solution, have to take on the role of a “digital detective”.

Most cases are not that tricky, and a solution is found quickly: often, a badly scanned document is the reason why text recognition produces mediocre results. Sometimes the errors are caused by second-rate PDF rendering software that does not comply with common standards or simply does not work properly.

But there are trickier cases to solve: vanished or hidden photos of candidates, invisible text, or hieroglyphs instead of letters. How is that possible when the CV in question looks so normal? Often, the devil is in the details. The difference between what is displayed on the screen when a document is opened and what our extraction software “sees” when it automatically processes the document is sometimes enormous.

As humans, we only perceive the final result: an image of a CV rendered pixel by pixel, which hopefully looks exactly as the author intended. Our software, however, “sees” the file structure beneath the surface: the letters and symbols that were used, the positioning and formatting information of specific elements, and so on. In rare cases, the structure and symbol information is so confusing that it leads to a partial or complete failure of the extraction process, whose aim is to decode information, not to display it. Moreover, a software system cannot always tell decorative text features apart from the actual content.

For a human being, for example, it is immediately obvious that an embedded telephone symbol indicates that a telephone number appears next to it in the document. A computer system is not yet able to identify the telephone symbol as such. And while humans find nothing disconcerting about the vowel “o” being used as a bullet point, software has difficulty categorizing it properly.
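To make the two examples concrete, here is a minimal sketch of how decorative glyphs could be normalized before extraction. It is an illustration only, not a description of how CVlizer works: the `SYMBOL_LABELS` table, the `normalize_line` function and the rule "a lone `o` at the start of a line is a bullet" are all assumptions made for this example.

```python
import re

# Hypothetical mapping from pictographic glyphs to the label they stand for.
SYMBOL_LABELS = {
    "\u260E": "Phone:",      # BLACK TELEPHONE
    "\u2706": "Phone:",      # TELEPHONE LOCATION SIGN
    "\U0001F4DE": "Phone:",  # TELEPHONE RECEIVER
    "\u2709": "Email:",      # ENVELOPE
}

# Assumption: a lone "o" at the start of a line, followed by whitespace
# and more text, is a bullet point rather than a word.
O_BULLET = re.compile(r"^(\s*)o\s+(?=\S)")


def normalize_line(line: str) -> str:
    """Replace known symbols by text labels and 'o' bullets by '- '."""
    for symbol, label in SYMBOL_LABELS.items():
        # Swallow whitespace after the symbol so the label is followed by one space.
        line = re.sub(re.escape(symbol) + r"\s*", label + " ", line)
    line = O_BULLET.sub(r"\1- ", line)
    return line.rstrip()
```

Under these assumptions, a line consisting of a telephone symbol and a number becomes `Phone: ` followed by the number, and `o Java developer` becomes `- Java developer`, while a word such as `on` at the start of a line is left alone. A real parser would need far more context than a single regular expression to avoid false positives.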

The aforementioned problems, however, can be dealt with quite easily. Things become really difficult if text is “hidden”, whether intentionally or not, for example when white text appears on a white background. As our software also processes this “invisible” text, seemingly inexplicable extraction errors may occur at first.
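The white-on-white case can be illustrated with a small sketch that flags text whose colour is almost identical to its background. The `TextSpan` structure, the `visible_spans` function and the threshold of 1.1 are hypothetical choices for this example; the contrast formula is the relative-luminance contrast ratio from the WCAG 2.x accessibility guidelines, swapped in here as one plausible measure rather than anything the article prescribes.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class TextSpan:
    """Hypothetical piece of text with its colours, as a parser might see it."""
    text: str
    color: RGB
    background: RGB


def _linear(channel: int) -> float:
    # sRGB channel (0-255) to linear light, as defined in WCAG 2.x.
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio: 1.0 for identical colours, 21.0 for black on white."""
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def visible_spans(spans: List[TextSpan], min_contrast: float = 1.1) -> List[TextSpan]:
    """Keep only spans a human reader could plausibly see (threshold is an assumption)."""
    return [s for s in spans if contrast_ratio(s.color, s.background) >= min_contrast]
```

This only covers colour. Text can also be hidden by a tiny font size, by being placed outside the page, or by an image lying on top of it, none of which this sketch detects.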

In most cases, such problems can be spotted quickly by selecting the whole document and copying it to the clipboard. The representation obtained this way almost always corresponds to the image that the software “sees”. Only if this view shows no irregularities either has the time come for JoinVision’s Sherlock Holmes: then “our detectives” have to track down the reason for the erroneous extraction results, so that another “mysterious” case can be solved.
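The clipboard check can also be roughly automated once the copied text is available as a string. The sketch below rests on an assumption that is not taken from the article: that "hieroglyphs" often show up in the copied text as private-use, unassigned or control characters, or as the Unicode replacement character. The function names and the threshold of 0.2 are hypothetical.

```python
import unicodedata

# Unicode general categories that rarely occur in ordinary CV text:
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc"}


def suspicious_ratio(text: str) -> float:
    """Share of non-whitespace characters that look like encoding debris."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for ch in chars
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """Heuristic flag; the threshold is an arbitrary value for illustration."""
    return suspicious_ratio(text) > threshold
```

A check like this would only catch one family of problems. Text that is made of perfectly valid letters in the wrong order, or mapped to the wrong letters, would pass it unnoticed and still needs a human detective.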

JoinVision is a leading provider of multilingual semantic recruiting technology. With the two parsers CVlizer and JOBolizer, applicant documents and job advertisements are automatically recorded, analyzed and coded. Modules, such as HRclassifier, HRcapture and HRmerger, expand the possibilities to have all information immediately available as a standardized, structured candidate or job profile in XML format. At the end of 2019 JoinVision took over the commercialization of Joveo, a US-based technology provider for programmatic job advertising, in the German-speaking countries.
