
Property documents: automating due diligence with AI extraction

How to benchmark AI models for extracting data from energy certificates, appraisals, and property documents across multiple languages.


Every real estate transaction generates a mountain of documents. Energy Performance Certificates (EPCs), property appraisals, cadastral extracts, and building permits each contain critical data points that inform investment decisions.

For portfolio analysis, this paperwork bottleneck can kill deals. A portfolio of 500 properties with 8-12 documents each means 4,000-6,000 documents to process, and larger portfolios scale accordingly. Manual extraction takes weeks and is error-prone.

This guide explores how to benchmark AI models for property documents, with a focus on multilingual performance across European markets.

The document diversity challenge

Real estate documents are notoriously varied:

| Variation | Example |
| --- | --- |
| Country | Belgian EPC vs. Dutch Energielabel vs. German Energieausweis |
| Region | Flemish and Walloon EPCs have different formats |
| Age | Certificate formats change every few years |
| Quality | Original PDFs, scanned copies, photos of documents |

Example scenario

Sample input

A Flemish Energy Performance Certificate (EPC) containing:

Document type: PDF certificate
Language: Dutch
Key fields to extract:
- Certificate number and validity
- Energy rating (A-G scale)
- Primary energy consumption
- Building characteristics
- Recommended improvements

Sample output

{
  "certificate": {
    "number": "20240315-0001234567",
    "issue_date": "2024-03-15",
    "valid_until": "2034-03-15"
  },
  "property": {
    "address": "Meir 1, 2000 Antwerpen",
    "type": "Apartment",
    "construction_year": 1985,
    "total_floor_area": 95.5
  },
  "energy": {
    "rating": "C",
    "primary_energy": 245,
    "unit": "kWh/m²/year"
  },
  "building_envelope": {
    "walls_u_value": 1.2,
    "roof_u_value": 0.8,
    "windows_u_value": 2.8,
    "floor_u_value": 1.1
  },
  "recommendations": [
    {
      "measure": "Roof insulation",
      "estimated_savings": "15-20%",
      "priority": "High"
    },
    {
      "measure": "Window replacement",
      "estimated_savings": "10-15%",
      "priority": "Medium"
    }
  ]
}
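To turn an extraction like the sample output above into a benchmark score, one option is to flatten both the model's JSON and a hand-labeled reference into dotted field paths and count matches. The sketch below assumes exact-match scoring with a small numeric tolerance. The helper names (`flatten`, `values_match`, `field_accuracy`) and the tolerance value are illustrative assumptions, not a description of how any particular benchmarking tool scores results.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: leaf_value}."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

def values_match(expected, actual, tolerance=0.01):
    """Exact match, except two numbers may differ by a small tolerance."""
    numeric = (int, float)
    if isinstance(expected, bool) or isinstance(actual, bool):
        return expected == actual
    if isinstance(expected, numeric) and isinstance(actual, numeric):
        return abs(expected - actual) <= tolerance
    return expected == actual

def field_accuracy(expected, predicted):
    """Return (share of reference fields matched, list of mismatched paths)."""
    flat_expected = flatten(expected)
    if not flat_expected:
        raise ValueError("reference has no fields to score")
    flat_predicted = flatten(predicted)
    missing = object()  # sentinel for fields the model did not return
    mismatches = [
        path for path, value in flat_expected.items()
        if not values_match(value, flat_predicted.get(path, missing))
    ]
    correct = len(flat_expected) - len(mismatches)
    return correct / len(flat_expected), mismatches

# Hypothetical reference and prediction: the model transposed two digits.
reference = json.loads("""{"energy": {"rating": "C", "primary_energy": 245},
                           "property": {"construction_year": 1985}}""")
prediction = json.loads("""{"energy": {"rating": "C", "primary_energy": 254},
                            "property": {"construction_year": 1985}}""")

accuracy, wrong = field_accuracy(reference, prediction)
print(f"{accuracy:.1%} correct; mismatched: {wrong}")
```

List entries such as recommendations are compared by position here, which penalizes a correct item returned in a different order. Free-text fields would likely need a looser comparison than exact match.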

Model comparison by language

| # | Model | Dutch | French | German | Overall |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4o | 93.8% | 91.4% | 89.2% | 91.8% |
| 2 | Gemini 2.0 Flash | 91.2% | 88.6% | 86.4% | 89.2% |
| 3 | GPT-4o-mini | 88.4% | 85.2% | 82.6% | 85.8% |
| 4 | Claude 3.5 Haiku | 86.2% | 82.8% | 79.4% | 83.4% |

Best overall: GPT-4o (91.8%)

Best Dutch: 93.8% (GPT-4o)

Note: German documents showed the highest variance across models.
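The note about German can be checked directly from the comparison figures. The sketch below uses the gap between the best and worst model per language as a simple stand-in for variance. That choice of measure and the function name `spread_by_language` are assumptions made for illustration.

```python
# Per-language accuracy (%) copied from the model comparison figures.
scores = {
    "GPT-4o":           {"Dutch": 93.8, "French": 91.4, "German": 89.2},
    "Gemini 2.0 Flash": {"Dutch": 91.2, "French": 88.6, "German": 86.4},
    "GPT-4o-mini":      {"Dutch": 88.4, "French": 85.2, "German": 82.6},
    "Claude 3.5 Haiku": {"Dutch": 86.2, "French": 82.8, "German": 79.4},
}

def spread_by_language(scores):
    """Best-minus-worst accuracy across models, per language, in points."""
    languages = next(iter(scores.values())).keys()
    spreads = {}
    for language in languages:
        column = [per_model[language] for per_model in scores.values()]
        spreads[language] = round(max(column) - min(column), 1)
    return spreads

spreads = spread_by_language(scores)
for language, spread in sorted(spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language}: {spread} points between best and worst model")
```

On these figures the gap is 7.6 points for Dutch, 8.6 for French, and 9.8 for German.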

Cost analysis at scale

For high-volume property document processing, cost matters:

| # | Model | Cost/doc | Monthly (2,500 docs) | Accuracy |
| --- | --- | --- | --- | --- |
| 1 | Gemini 2.0 Flash | $0.003 | $8 | 89.2% |
| 2 | GPT-4o-mini | $0.004 | $10 | 85.8% |
| 3 | Claude 3.5 Haiku | $0.012 | $30 | 83.4% |
| 4 | GPT-4o | $0.032 | $80 | 91.8% |

Best value: Gemini 2.0 Flash

Lowest cost: $8/mo

With an accuracy threshold of 88% or higher, Gemini 2.0 Flash offers the best value: 89.2% overall accuracy at $0.003 per document, with a Dutch-to-German gap (4.8 points) close to GPT-4o's (4.6 points).
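The best-value pick follows from two steps: multiply cost per document by monthly volume, then take the cheapest model that clears the accuracy bar. A minimal sketch using the cost and accuracy figures above follows. The `ModelResult` class and function names are hypothetical, and the 88% threshold is the one used in this article rather than a general recommendation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelResult:
    name: str
    cost_per_doc: float  # USD per document
    accuracy: float      # overall accuracy in percent

RESULTS = [
    ModelResult("Gemini 2.0 Flash", 0.003, 89.2),
    ModelResult("GPT-4o-mini", 0.004, 85.8),
    ModelResult("Claude 3.5 Haiku", 0.012, 83.4),
    ModelResult("GPT-4o", 0.032, 91.8),
]

def monthly_cost(result, docs_per_month):
    """Projected monthly spend in USD at a given document volume."""
    return result.cost_per_doc * docs_per_month

def cheapest_above(results, min_accuracy):
    """Cheapest model whose accuracy meets the threshold, or None."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    return min(eligible, key=lambda r: r.cost_per_doc, default=None)

best = cheapest_above(RESULTS, min_accuracy=88.0)
for r in RESULTS:
    print(f"{r.name}: ${monthly_cost(r, 2500):.2f}/month at {r.accuracy}%")
print("Best value at 88%+:", best.name)
```

The unrounded monthly figure for Gemini 2.0 Flash is $7.50, which the comparison shows rounded to $8. Raising the threshold above 89.2% leaves GPT-4o as the only eligible model, at roughly ten times the cost.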

Field-level accuracy reveals workflow design

Understanding which fields models handle well shapes your extraction workflow:

| Field | Accuracy | Confidence |
| --- | --- | --- |
| Certificate number | 94.8% | High |
| Energy rating (A-G) | 93.2% | High |
| Total floor area | 89.6% | Medium |
| Construction year | 87.4% | Medium |
| U-values | 84.2% | Medium |
| Renovation estimates | 72.6% | Low |
| Recommendations | 68.4% | Low |

Identifiers and core metrics (certificate numbers, ratings, areas, years) extract at roughly 87-95% accuracy, while renovation estimates and recommendations fall to about 68-73% and need human review. Design your workflow accordingly.
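One way to act on these numbers is to route each field to a review queue based on its benchmark accuracy. The sketch below is an illustration under assumed cut-offs: the 90% and 80% thresholds are chosen only because they reproduce the High, Medium, and Low tiers shown above, and the queue names and the idea of spot-checking the middle tier are hypothetical.

```python
# Field-level accuracy (%) from the benchmark figures above.
FIELD_ACCURACY = {
    "Certificate number": 94.8,
    "Energy rating (A-G)": 93.2,
    "Total floor area": 89.6,
    "Construction year": 87.4,
    "U-values": 84.2,
    "Renovation estimates": 72.6,
    "Recommendations": 68.4,
}

def route_fields(field_accuracy, high=90.0, medium=80.0):
    """Group fields into review queues by benchmark accuracy.

    The thresholds are assumptions; tune them to your own error tolerance.
    """
    queues = {"auto_accept": [], "spot_check": [], "human_review": []}
    for field, accuracy in field_accuracy.items():
        if accuracy >= high:
            queues["auto_accept"].append(field)
        elif accuracy >= medium:
            queues["spot_check"].append(field)
        else:
            queues["human_review"].append(field)
    return queues

queues = route_fields(FIELD_ACCURACY)
for queue, fields in queues.items():
    print(f"{queue}: {', '.join(fields)}")
```

With these cut-offs, only renovation estimates and recommendations land in the human review queue, which matches the two Low-confidence fields.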

Transformation metrics

What does automated extraction deliver?

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Time per portfolio | 72 hours | 4 hours | -94% |
| Documents/month | 850 | 4,200 | +394% |
| Data entry errors | 3.2% | 0.4% | -87% |
| Analyst productivity | 12 docs/day | 85 docs/day | +608% |
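The Change figures are relative changes, computed as (after - before) / before. A short sketch for reproducing them from the before and after values follows. The function name `percent_change` is an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

# (before, after) pairs taken from the transformation metrics.
metrics = {
    "Time per portfolio (hours)": (72, 4),
    "Documents/month": (850, 4200),
    "Data entry errors (%)": (3.2, 0.4),
    "Analyst productivity (docs/day)": (12, 85),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```

To one decimal place the results are -94.4%, +394.1%, -87.5%, and +608.3%, in line with the whole-number figures reported above.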

Key insights for real estate documents

1. Multilingual consistency matters more than peak performance

If your documents span multiple languages, test specifically for cross-language consistency. A model that’s 97% accurate in Dutch but 85% in German is a poor fit for a European portfolio, because its errors concentrate in one market.

2. Document quality impacts model choice

Scanned documents and photos show higher variance across models, and some models handle lower-quality inputs better than others. Include scans and photos in your test set, not just original PDFs.

3. Not all fields need AI extraction

The benchmark identifies which fields AI handles well and which need human oversight. This hybrid approach delivers better results than pursuing 100% automation.

4. Processing speed is a competitive advantage

In real estate, timing matters. Faster due diligence means more deals and a better negotiating position.
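The first insight can be made concrete by screening models on their weakest language instead of their best. The sketch below compares the hypothetical 97% Dutch / 85% German model from insight 1 with two models from the language comparison. The 86% floor and the function names are assumptions chosen for illustration.

```python
def weakest_language(per_language):
    """Return (language, accuracy) for the lowest-scoring language."""
    language = min(per_language, key=per_language.get)
    return language, per_language[language]

def meets_floor(per_language, floor):
    """True if every language scores at or above the floor."""
    return weakest_language(per_language)[1] >= floor

candidates = {
    # Hypothetical model using the figures from insight 1.
    "Peaky model": {"Dutch": 97.0, "German": 85.0},
    # Figures from the model comparison by language.
    "GPT-4o": {"Dutch": 93.8, "French": 91.4, "German": 89.2},
    "Gemini 2.0 Flash": {"Dutch": 91.2, "French": 88.6, "German": 86.4},
}

FLOOR = 86.0  # assumed minimum acceptable accuracy in any language
passing = [name for name, s in candidates.items() if meets_floor(s, FLOOR)]

for name, per_language in candidates.items():
    language, accuracy = weakest_language(per_language)
    print(f"{name}: weakest in {language} at {accuracy}%")
print("Meets floor:", passing)
```

Under this floor the model with the highest Dutch score is the only one rejected, because its German accuracy falls below the minimum.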

Try it yourself

Whether you’re processing EPCs, appraisals, or building permits, LLMCompare helps you find the right model for your specific document mix.

Upload your documents. Define your schema. Compare 50+ models. Get results in minutes, not weeks.