Resume screening: comparing AI models for fair and accurate CV parsing
How to benchmark AI models for resume extraction with a focus on accuracy, speed, and demographic fairness.
For every open position, recruiters receive hundreds of applications. Reading each thoroughly is impossible. But skimming leads to missed candidates and unconscious bias.
AI-powered resume screening can help—but the model you choose matters more than you might think. HR document extraction isn’t just about accuracy. It’s about fairness.
The resume extraction challenge
Resumes are uniquely challenging documents to parse: layouts range from minimalist to highly designed, work histories and dates follow no standard format, and names and institutions span every origin and country.
Example scenario
Sample input
A software engineer resume containing:
- Document type: PDF resume
- Key fields to extract:
- Contact information
- Work experience with dates
- Education and certifications
- Skills and technologies
- Languages spoken
Sample output
```json
{
  "contact": {
    "name": "Priya Krishnamurthy",
    "email": "priya.k@email.com",
    "phone": "+1-555-0123",
    "location": "San Francisco, CA",
    "linkedin": "linkedin.com/in/priyak"
  },
  "experience": [
    {
      "title": "Senior Software Engineer",
      "company": "TechCorp Inc.",
      "location": "San Francisco, CA",
      "start_date": "2021-03",
      "end_date": null,
      "current": true,
      "highlights": [
        "Led migration to microservices architecture",
        "Reduced API latency by 40%",
        "Mentored 3 junior engineers"
      ]
    },
    {
      "title": "Software Engineer",
      "company": "StartupXYZ",
      "location": "Palo Alto, CA",
      "start_date": "2018-06",
      "end_date": "2021-02",
      "current": false,
      "highlights": [
        "Built real-time data pipeline processing 1M events/day",
        "Implemented CI/CD reducing deploy time by 60%"
      ]
    }
  ],
  "education": [
    {
      "degree": "M.S. Computer Science",
      "institution": "Stanford University",
      "graduation_year": 2018
    },
    {
      "degree": "B.Tech Computer Science",
      "institution": "IIT Bombay",
      "graduation_year": 2016
    }
  ],
  "skills": {
    "languages": ["Python", "Go", "TypeScript", "SQL"],
    "frameworks": ["React", "FastAPI", "Kubernetes"],
    "tools": ["AWS", "Docker", "Terraform"]
  }
}
```
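Whatever model produces this output, a downstream sanity check can catch broken extractions before they affect a candidate. A minimal sketch against the schema above; the `validate_resume` helper is hypothetical, not part of any model's API:

```python
# Sanity-check an extracted resume dict before it enters the screening
# pipeline. Field names follow the sample schema above.

REQUIRED_CONTACT = ("name", "email")

def validate_resume(resume: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    contact = resume.get("contact", {})
    for field in REQUIRED_CONTACT:
        if not contact.get(field):
            problems.append(f"missing contact.{field}")
    for i, job in enumerate(resume.get("experience", [])):
        if not job.get("start_date"):
            problems.append(f"experience[{i}] missing start_date")
        if job.get("end_date") is None and not job.get("current"):
            problems.append(f"experience[{i}] has no end_date and is not marked current")
    if not resume.get("education"):
        problems.append("no education entries extracted")
    return problems

# A record missing an email and all education entries:
issues = validate_resume({"contact": {"name": "Priya Krishnamurthy"}})
```

Records that fail validation can be routed to human review instead of silently dropping the candidate.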
Model comparison
The fairness dimension
Standard accuracy metrics can hide demographic disparities. Analyzing extraction accuracy across name origin categories reveals significant differences:
Lower variance = more equitable performance
GPT-4o shows only 3.8% variance across name origins, while budget models exceed 9%, creating a systematic disadvantage for candidates with non-Western names.
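The variance check described here is straightforward to run yourself: group test resumes by name origin, compute field-level accuracy per group, and report the spread between the best- and worst-served groups. A sketch with illustrative numbers (not benchmark data); the group labels are assumptions:

```python
# Compute extraction accuracy per name-origin group, then the spread
# (max - min) in percentage points. Lower spread = more equitable.
from collections import defaultdict

def accuracy_by_group(records):
    """records: (group, fields_correct, fields_total) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, ok, n in records:
        correct[group] += ok
        total[group] += n
    return {g: correct[g] / total[g] for g in total}

def variance_spread(accuracies: dict) -> float:
    """Gap in percentage points between best- and worst-served groups."""
    return (max(accuracies.values()) - min(accuracies.values())) * 100

# Illustrative placeholder counts, not measured results:
sample = [
    ("western", 95, 100),
    ("east_asian", 93, 100),
    ("south_asian", 91, 100),
]
acc = accuracy_by_group(sample)
spread = variance_spread(acc)  # 4.0 percentage points
```

Run the same computation per model and the fairest choice falls out of the comparison directly.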
Why fairness matters for extraction
Poor extraction on a candidate’s name or education institution doesn’t just affect accuracy metrics—it affects their chances:
- Misspelled names make candidates harder to find and contact
- Missing education means qualifications aren’t matched correctly
- Incorrect dates can disqualify candidates on experience requirements
A model that performs 5% worse on non-Western names systematically disadvantages those candidates.
Transformation metrics
What does fair, accurate resume screening deliver?
By ensuring consistent extraction quality across demographics, you avoid inadvertently filtering qualified candidates out of the pipeline.
Key insights for HR document processing
1. Measure demographic variance, not just accuracy
Overall accuracy can hide systematic biases. Test extraction performance across name origins and education backgrounds.
2. Budget models have higher variance
Cost savings on per-document processing may come at the cost of fairness. Calculate the true cost including missed candidates.
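A back-of-envelope version of that true-cost calculation: processing cost plus the expected cost of candidates lost to extraction errors. All numbers below are placeholders; plug in your own volumes and hiring economics.

```python
# True cost = per-document processing cost + expected cost of
# qualified candidates missed due to extraction errors.
def true_cost(resumes: int, per_doc_cost: float,
              miss_rate: float, cost_per_missed_candidate: float) -> float:
    return resumes * per_doc_cost + resumes * miss_rate * cost_per_missed_candidate

# Hypothetical comparison: a cheap model with a higher miss rate vs.
# a pricier model with a lower one, over 10,000 resumes.
cheap = true_cost(10_000, per_doc_cost=0.002, miss_rate=0.03,
                  cost_per_missed_candidate=50.0)
premium = true_cost(10_000, per_doc_cost=0.02, miss_rate=0.005,
                    cost_per_missed_candidate=50.0)
```

Under these placeholder assumptions the "cheap" model is the more expensive one once missed candidates are priced in, which is the point of the exercise.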
3. Resume format diversity matters
Test on your actual resume formats—creative designs, ATS-formatted, international CVs. Performance varies significantly.
4. Speed enables better candidate experience
Faster screening means faster responses to candidates. Top talent won’t wait.
Try it yourself
LLMCompare helps HR teams evaluate models for resume extraction with a focus on both accuracy and fairness. Upload your resumes, define your extraction schema, and measure performance across demographic categories.
Because fair hiring starts with fair data extraction.