Parsing Resumes to Identify Skills Data – Certain Limitations
Parsers are widely used to pull information from resumes and job descriptions, but they have limitations. Creating structured information on skills is a better alternative.
The recruitment industry invests a lot of time and effort in parsing and pulling data from resumes and job descriptions. Once the data has been extracted, it is used for activities such as matching resumes to job descriptions. The whole process is tedious, which creates an urgent need to standardize skills data and present it in a structured format.
Several techniques have been applied to this matching task. Natural Language Processing (NLP), which equips computers to understand natural human language, has been used to compare the extracted data more efficiently. Pattern matching, which scans a document for particular phrases, words, and other signifiers and matches them against tokens from another document, has also been used. Unfortunately, none of these attempts has succeeded.
However, continuing research has narrowed the approaches to producing structured skills data down to the following two:
- Create Structured Data From Output
  - Extract unstructured output from texts like resumes and job descriptions.
  - Convert the unstructured output into structured output.
- Create Structured Data At Input
  - Create structured data at the point of input.
Dealing With Data
To create structured data at the point of input, the available types of data need to be slotted into sub-categories. When classifying the data, two major sub-categories emerge:
- Unambiguous Information
This includes basic tags such as name, email, contact number, location, names of previous employers, job titles, gender, etc. Parsers work well at picking such information out of resumes and job descriptions because it follows discernible patterns, such as the presence of ‘@’ in email IDs.
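As a minimal sketch of why such fields are easy to parse, the snippet below pulls an email address and a phone number out of free text with regular expressions. The patterns, the function name `extract_contact_fields`, and the sample text are illustrative assumptions, not a production-grade parser.

```python
import re

# Deliberately simple, illustrative patterns; real-world formats vary far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def extract_contact_fields(text: str) -> dict:
    """Return the first email and phone number found in the text, or None."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


# Hypothetical one-line resume header.
sample = "Jane Doe | jane.doe@example.com | +1 555-010-1234 | Pune"
fields = extract_contact_fields(sample)
print(fields)
```

The ‘@’ sign and the run of digits give the patterns something fixed to anchor on, which is exactly what free-text skill descriptions lack.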
- Description of Skills
This is the more descriptive form of information, and parsers have a hard time analysing it. Several issues arise when dealing with information on skills. One is that the same skill can go by different names or be described by different phrases. For example, Digital Marketing can also be called Online Marketing, Web Marketing, or Internet Marketing. The use of abbreviations causes further problems in the analysis.
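One common way to cope with naming variants is an alias table that maps every known variant to a single canonical skill name. The sketch below assumes a small hand-written table; `SKILL_ALIASES` and `normalize_skill` are hypothetical names, and any phrase missing from the table simply goes unrecognised.

```python
from typing import Optional

# Hypothetical alias table: every variant maps to one canonical skill name.
SKILL_ALIASES = {
    "digital marketing": "Digital Marketing",
    "online marketing": "Digital Marketing",
    "web marketing": "Digital Marketing",
    "internet marketing": "Digital Marketing",
    "oop": "Object-Oriented Programming",
    "object oriented programming": "Object-Oriented Programming",
}


def normalize_skill(raw: str) -> Optional[str]:
    """Map a free-text skill to its canonical name, or None if it is unknown."""
    # Lowercase, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(raw.lower().replace("-", " ").split())
    return SKILL_ALIASES.get(key)


print(normalize_skill("Web  Marketing"))
print(normalize_skill("OOP"))
print(normalize_skill("Growth Hacking"))
```

The table only helps for variants someone has already listed, which is why synonyms and abbreviations remain a standing problem for parsers.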
Another problem is that skills in isolation do not make much sense. Skills therefore need to be analysed in a way that does not distort their meaning.
In a typical scenario, running a parser over a person’s resume to look for Java programming skills can be helpful if you are looking for candidates who know Java. But if the search is much more specific, such as a Java Developer who knows server-side programming in a particular development environment and has a good understanding of object-oriented programming for developing software systems in the banking domain, the parsing process fails.
Unfortunately, parsing does not produce quality structured information from unstructured sources such as resumes and job descriptions.
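The limitation can be illustrated with a deliberately naive token matcher. This is a sketch under stated assumptions: the job text, both resumes, and the scoring function `keyword_overlap` are invented for illustration and do not represent any particular parser.

```python
import re


def tokens(text: str) -> set:
    """Split text into a set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(job: str, resume: str) -> float:
    """Fraction of the job's tokens that also appear in the resume."""
    job_tokens = tokens(job)
    if not job_tokens:
        return 0.0
    return len(job_tokens & tokens(resume)) / len(job_tokens)


# Hypothetical texts: resume_a describes the relevant experience,
# resume_b merely happens to contain the same words in unrelated contexts.
job = "Java developer, server-side programming, banking domain"
resume_a = "Built server-side Java services for a retail banking platform."
resume_b = ("Java trainer and hobby developer. Wrote a banking quiz, blogs about "
            "programming, server rooms, side projects and domain names.")

score_a = keyword_overlap(job, resume_a)
score_b = keyword_overlap(job, resume_b)
print(score_a, score_b)
```

Here the second resume scores higher than the first even though only the first describes the experience being sought, because the matcher sees the words but not the context that gives them meaning.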
- The Simple Solution
The simplest solution is to provide autosuggestions at the point where resumes and job descriptions are created. Although this is a much more straightforward approach, it constrains data entry, and any data entered that does not correspond to one of the suggestions will still face all the issues discussed above.
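A minimal sketch of such an autosuggestion, assuming a small in-memory list of known skills and simple prefix matching (`SKILL_POOL` and `suggest` are hypothetical names):

```python
# Hypothetical pool of known skills; a real system would draw on a much larger list.
SKILL_POOL = [
    "Java",
    "JavaScript",
    "Digital Marketing",
    "Data Analysis",
    "Server-Side Programming",
]


def suggest(prefix: str, limit: int = 5) -> list:
    """Return known skills that start with what the user has typed so far."""
    typed = prefix.strip().lower()
    if not typed:
        return []
    return [skill for skill in SKILL_POOL if skill.lower().startswith(typed)][:limit]


print(suggest("jav"))
print(suggest("growth"))
```

When nothing in the pool matches, the list comes back empty and whatever the user types stays free text, which is the weakness described above.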
- The Effective Solution
The most effective way to handle the problem is to avoid unstructured information altogether. The major cause of unstructured information is not the entry of information but the documents into which it is entered, i.e. resumes and job descriptions. The most logical thing to do is to remove resumes and job descriptions from the equation completely.
Replacing these documents with structures solves the structuring problem at its root. Applications like LinkedIn are built on the concept of structures, but they cannot be taken as an ideal example because their performance falls short when it comes to skills.
The best way to deal with this issue would be to create a structure known as a Skill Profile, to which skills drawn from a universally known pool of skills can be added.
Crunched information on skills can be stored in these structures, which makes matching jobs to people a much faster and more efficient task.
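To make the idea concrete, here is one possible shape for such a structure. It is a sketch only: the pool contents, the `SkillProfile` class, and the `match_score` function are assumptions made for illustration, not a description of any existing product.

```python
from dataclasses import dataclass, field

# Hypothetical shared pool of skills, keyed by a stable identifier.
SKILL_POOL = {
    "java": "Java",
    "server-side": "Server-Side Programming",
    "oop": "Object-Oriented Programming",
    "banking": "Banking Domain",
    "digital-marketing": "Digital Marketing",
}


@dataclass
class SkillProfile:
    """A person's or a job's skills, restricted to identifiers from the pool."""
    owner: str
    skills: set = field(default_factory=set)

    def add(self, skill_id: str) -> None:
        # Reject anything outside the shared pool so profiles stay comparable.
        if skill_id not in SKILL_POOL:
            raise KeyError(f"unknown skill id: {skill_id}")
        self.skills.add(skill_id)


def match_score(job: SkillProfile, person: SkillProfile) -> float:
    """Fraction of the job's skills that the person's profile covers."""
    if not job.skills:
        return 0.0
    return len(job.skills & person.skills) / len(job.skills)


job = SkillProfile("Java Developer (banking)")
for skill_id in ("java", "server-side", "oop", "banking"):
    job.add(skill_id)

candidate = SkillProfile("Candidate A")
for skill_id in ("java", "oop", "digital-marketing"):
    candidate.add(skill_id)

print(match_score(job, candidate))
```

Because both profiles draw on the same pool, matching reduces to a set intersection and no text needs to be parsed at match time.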
Information from such a system can be used by other industries too, including Talent Development, Market Analytics, Resource Deployment, etc.
It’s Your Skills
It’s Your Skills offers a mechanism for creating a structured skills profile. At the back end, this consists of a constantly updated database of skills, or Skills Ontology. At the front end, there is a simple user interface. The resulting Skills Profile applies to both jobs and people.
Adopting the technology is just the first step. A bigger challenge lies in adapting to the behavioral changes that the technology requires. This is a vital step towards making the talent landscape more efficient.