How I used NLP to help efficiently place prospective candidates
I always jump at the opportunity to tackle new and interesting challenges using Machine Learning and Advanced Analytics. In the constantly changing field of AI, this is how I discover and push the bounds of what is truly possible.
In the fast-paced world of job recruitment, staying ahead of the curve requires innovative solutions. Recently, I was handed what is probably one of my toughest challenges yet by a client who operates a job recruitment firm in South Africa. I set out to revolutionize the client’s recruitment process by harnessing the power of advanced Natural Language Processing (NLP) techniques.
In this blog post, I’ll take you through the project in detail, covering the architecture, preprocessing pipelines, data storage, APIs, trained models, user interface, and the AI logic that powers the system.
Understanding the client’s needs was the crucial first step in the journey. The client’s existing recruitment process was time-consuming and inefficient, relying heavily on its recruiters to manually review each resume and match it to relevant job openings. This resulted in slower placement times and potential missed opportunities for both job seekers and employers.
To address these challenges, the client sought a solution that could not only streamline their current processes by automating the matching of resumes to job specifications but also bring innovation to the forefront of their operations. They wanted a system that could accurately identify relevant resumes for each open position (and vice versa), eliminating the need for manual screening.
Before developing the solution, I spent time understanding the specific needs of the client. I met with the company’s recruiters to learn their key pain points and how they currently matched resumes to job openings. I also analyzed the company’s data to identify any patterns or trends that could be used to improve the matching process. Through this collaborative process, I gained crucial insights into their current recruitment workflow, enabling me to tailor my solution to their specific needs. As a result, I developed a set of requirements for the proposed solution:
The success of the project relied heavily on creating a well-designed solution architecture. The system’s core comprises multiple components collaborating to provide a strong and intelligent recruitment solution. Each stage, from data ingestion to storing results in Elasticsearch, was crafted with emphasis on scalability, efficiency, and smooth integration into the client’s current infrastructure. Figure 1 depicts the interconnected components, showing a rough architecture of the solution.
To kickstart the process, the system ingests data from two primary sources: the client’s FTP server and Dropbox account. The FTP server contains approximately 35 000 unlabelled documents, while Dropbox contributes around 3 000 labelled resumes associated with job specifications. This data undergoes an initialization step, whereby raw text is extracted from the documents, cleaned (using the clean-text package), and stored as separate sentences in an Elasticsearch database. Additionally, five metadata extraction steps are executed at this stage. These update information related to file locations and industry and job title labels, and construct sets of named entities.
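The initialization step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the index name `resume-sentences`, the document fields, and the helper names are all hypothetical, and the regex-based `clean` and `split_sentences` functions are simple stand-ins for the clean-text package and whatever sentence splitter was really used.

```python
import re
from typing import Dict, Iterator, List

# Hypothetical index name; the post does not name its indices.
SENTENCE_INDEX = "resume-sentences"


def clean(raw: str) -> str:
    """Stand-in for the cleaning step: drop control characters, collapse whitespace."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", " ", raw)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def sentence_actions(doc_id: str, raw: str, source: str) -> Iterator[Dict]:
    """Yield one bulk-index action per cleaned sentence of a document."""
    for position, sentence in enumerate(split_sentences(clean(raw))):
        yield {
            "_index": SENTENCE_INDEX,
            "_id": f"{doc_id}-{position}",
            "_source": {
                "doc_id": doc_id,
                "source": source,  # e.g. "ftp" or "dropbox"
                "position": position,
                "text": sentence,
            },
        }


actions = list(
    sentence_actions("cv-001", "Senior  data engineer.\nBuilt ETL pipelines in Python! ", "ftp")
)
```

Dictionaries of this shape are what `elasticsearch.helpers.bulk(client, actions)` expects, so a generator like this could feed the sentence index directly once a client is configured.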
The system uses various trained models to perform specific tasks associated with document preprocessing and unsupervised ranking. Two Bidirectional Encoder Representations from Transformers (BERT) models are fine-tuned to predict, given the raw resume text, a candidate’s industry and associated job title, giving rise to the IndustryBERT and JobTitleBERT models, respectively. The fine-tuned models ensure accurate categorization, forming the foundation for subsequent stages in the process.
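To make the classification setup more concrete, here is a hedged sketch of how a BERT checkpoint could be prepared for industry prediction with the Hugging Face `transformers` library. The post does not say which library or checkpoint was used, so that choice is an assumption, as are the helper names and the example industry labels. The training loop itself is omitted.

```python
from typing import Dict, List, Tuple


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct label to a stable integer id, and back."""
    unique = sorted(set(labels))
    label2id = {label: i for i, label in enumerate(unique)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def load_classifier(checkpoint: str, labels: List[str]):
    """Load a tokenizer and a sequence classifier sized for the given labels.

    `checkpoint` is any BERT-style model name; this is an assumed setup,
    not the configuration used in the project.
    """
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


# Hypothetical industry labels taken from the labelled resumes.
industry_labels = ["Finance", "Mining", "Finance", "IT"]
label2id, id2label = build_label_maps(industry_labels)
```

The same two helpers would serve a job title classifier by passing job title labels instead, which mirrors the IndustryBERT and JobTitleBERT split described above.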
A transformer-based spaCy Named Entity Recognition (NER) model is also trained at this stage to extract key information such as skills, location, degrees, and college names, which may be used for downstream filtering. Additionally, Uniform Manifold Approximation and Projection (UMAP) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) models are fitted to facilitate the document vector representations that are essential to the subsequent unsupervised ranking step.
All sentences comprising the raw resume text are then fed through various preprocessing steps, the outputs of which are each stored in separate indices in Elasticsearch. During this phase, the fine-tuned industry and job title BERT models as well as the custom NER model are leveraged, together with two pre-trained BERT-based models. More specifically:
Elasticsearch serves as the central repository for storing preprocessed document data. Furthermore, its ability to score query results by cosine similarity makes it well suited to efficient document retrieval. A dedicated AWS EC2 instance hosts the Elasticsearch instance.
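As an illustration of the cosine similarity retrieval mentioned above, the sketch below builds an Elasticsearch `script_score` query body around the built-in `cosineSimilarity` function. It assumes the document vectors live in a `dense_vector` field; the field name `embedding` and the helper name are hypothetical, since the post does not describe its index mappings.

```python
from typing import Dict, List


def cosine_query(query_vector: List[float], size: int = 10, field: str = "embedding") -> Dict:
    """Build a search body that scores every document by cosine similarity.

    Elasticsearch scores must be non-negative, so 1.0 is added to shift the
    similarity range from [-1, 1] to [0, 2].
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = cosine_query([0.1, 0.2, 0.3], size=5)
```

Replacing the inner `match_all` with a filter on, say, a predicted industry would restrict scoring to the relevant subset of documents before ranking.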
The AI logic is the heart of the system, orchestrating the matchmaking process. By considering predicted industry and job titles, user-specified keyword entities, and leveraging unsupervised ranking techniques, the system intelligently filters and ranks documents, providing personalized recommendations to users. This is accomplished through five sequential steps facilitated by a Streamlit user interface, shown in Figure 2.
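The filter-then-rank idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions and does not reproduce the five steps of the actual system: the `Resume` structure, the `match` function, and the example data are hypothetical, and ranking is reduced to cosine similarity between a query vector and each resume vector.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass
class Resume:
    doc_id: str
    industry: str          # predicted industry label
    job_title: str         # predicted job title label
    entities: Set[str]     # named entities extracted from the resume
    vector: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(
    resumes: List[Resume],
    query_vector: Sequence[float],
    industry: str,
    job_title: str,
    required_entities: Set[str] = frozenset(),
    top_k: int = 5,
) -> List[str]:
    """Filter on predicted labels and required entities, then rank by similarity."""
    candidates = [
        r for r in resumes
        if r.industry == industry
        and r.job_title == job_title
        and required_entities <= r.entities
    ]
    ranked = sorted(candidates, key=lambda r: cosine(r.vector, query_vector), reverse=True)
    return [r.doc_id for r in ranked[:top_k]]


resumes = [
    Resume("a", "IT", "Data Engineer", {"python", "sql"}, [1.0, 0.0]),
    Resume("b", "IT", "Data Engineer", {"python"}, [0.6, 0.8]),
    Resume("c", "Finance", "Accountant", {"excel"}, [1.0, 0.0]),
    Resume("d", "IT", "Data Engineer", {"java"}, [1.0, 0.0]),
]
result = match(resumes, [0.6, 0.8], "IT", "Data Engineer", {"python"})
```

Filtering before ranking keeps the similarity computation confined to resumes that already satisfy the hard constraints, so the ranking only has to order plausible candidates.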
The effectiveness of the resume matching system was extensively evaluated by the client using the Streamlit user interface.
After a thorough evaluation of the system, the client was impressed with the overall quality of the recommendations, and I proceeded to productionise the solution.
By leveraging advanced NLP techniques, I was able to create a system that significantly improved the client’s efficiency and effectiveness in finding the right candidates for their open positions, as confirmed by the positive client feedback in Figure 3. The client was so impressed with the solution that they decided to reposition their business offering around this new AI-based recruitment matching system, as shown in the accompanying video below.
As NLP continues to evolve, I expect even more innovative applications in the field of recruitment, further transforming the way companies find and hire top talent. It’s projects like this that further cement my belief that Data Science and Machine Learning have the power to revolutionise almost every industry.