Resume Parsing Across Multiple Job Domains Using a BERT-Based NER Model
Published in Irish Conference on Artificial Intelligence and Cognitive Science (AICS), 2023
Recommended citation: Srivastava, M., Greaney, P. (2023) ‘Resume Parsing Across Multiple Job Domains Using a BERT-Based NER Model’, Proceedings of the 31st Irish Conference on Artificial Intelligence and Cognitive Science, Letterkenny, Ireland, 7-8 December, Washington D.C.: IEEE Computer Society Press. https://ieeexplore.ieee.org/abstract/document/10470917/
This study presents a resume information extraction system using named entity recognition (NER) techniques. By harnessing the power of BERT, a state-of-the-art transfer learning model, in conjunction with NER, we develop a model which accurately extracts relevant information from resumes.
Our approach involves fine-tuning the pre-trained BERT base model on a customised NER resume dataset, which comprises a limited volume of annotated resume data from across four diverse job domains: information technology, human resources, consultancy, and engineering. To achieve this, we utilised the NLP capabilities of spaCy pipelines. Our results show that even with a constrained training dataset and minimal fine-tuning, transfer learning can be successfully leveraged to extract named entities from resumes, achieving respectable accuracy tailored to our specific application.
Our findings underscore the pivotal role of data size and annotation quality in custom NER training. The model’s generalisation and contextual comprehension heavily depend on these factors, reinforcing the need for carefully selected training data. This paper sheds light on the relationship between transfer learning, NER, and data quality in developing a sophisticated resume information extraction system.