# CI/CD Integration

Automate LLM benchmarks in your development workflow: prevent regressions, validate prompt changes, and monitor model performance directly from GitHub Actions and GitLab CI/CD.

- **Zero-setup integration**: add benchmarks in minutes with a single API call.
- **Automatic regression detection**: block PRs that cause accuracy degradation.
- **Cost-effective**: use cheaper models for PR checks and premium models for releases.
- **Persistent tracking**: every benchmark run is visible in the dashboard.
- **Flexible triggers**: PR checks, scheduled runs, manual dispatch, or release tags (all four are sketched below).
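As a rough sketch of how those triggers and per-event model choices can combine in a single GitHub Actions workflow. The model IDs and the `models` step output are illustrative assumptions for this example, not a required configuration:

```yaml
# Sketch: all four trigger types, plus cheaper models on PR checks
# and the premium set everywhere else. Model IDs are illustrative.
name: LLM Benchmark Triggers

on:
  pull_request:                  # PR checks
    paths: ["prompts/**"]
  schedule:
    - cron: "0 6 * * 1"          # scheduled runs (Mondays, 06:00 UTC)
  workflow_dispatch:             # manual dispatch
  push:
    tags: ["v*"]                 # release tags

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Pick models by event
        id: models
        run: |
          # Cheaper model for PR checks, premium models otherwise.
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo 'models=["openai/gpt-4o-mini"]' >> "$GITHUB_OUTPUT"
          else
            echo 'models=["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]' >> "$GITHUB_OUTPUT"
          fi
```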

## Supported platforms

Choose your CI/CD platform for seamless integration

### GitHub Actions

Integrate benchmarks directly into your GitHub workflows, with automatic checks on pull requests.

- Pull request validation
- Matrix builds for multiple document sets (sketched below)
- Reusable workflows
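A minimal sketch of the matrix-build idea, fanning one job out across several document sets. The set names are hypothetical placeholders, and the step body is a stub for the API call shown in the full example below:

```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical document sets; replace with your own.
        document_set: [invoices, contracts, receipts]
    steps:
      - uses: actions/checkout@v4
      - name: Benchmark ${{ matrix.document_set }}
        run: |
          # One benchmark API call per document set goes here
          # (see the full workflow under "Code examples").
          echo "Benchmarking document set: ${{ matrix.document_set }}"
```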

### GitLab CI/CD

Native integration with GitLab pipelines, including scheduled benchmarks and parallel job execution (sketched below).

- Merge request validation
- Parallel matrix jobs
- Scheduled pipelines (cron)
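For GitLab, a minimal `.gitlab-ci.yml` sketch covering the three bullets above. The endpoint and auth header mirror the GitHub example in the next section; the trimmed request payload and the document sets are placeholders you should adapt to the full payload shown there:

```yaml
# Sketch: MR validation, parallel matrix jobs, and scheduled
# pipelines. LLMCOMPARE_API_KEY is a CI/CD variable (see quick start).
benchmark:
  image: alpine:3.20
  before_script:
    - apk add --no-cache curl jq
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"  # MR validation
    - if: $CI_PIPELINE_SOURCE == "schedule"             # scheduled pipelines (cron)
  parallel:
    matrix:
      - DOCUMENT_SET: [invoices, contracts]             # parallel matrix jobs
  script:
    - |
      curl -s -X POST "https://api.llmcompare.com/api/v1/benchmark" \
        -H "Authorization: Bearer $LLMCOMPARE_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"project\": {\"name\": \"CI-$CI_PROJECT_PATH\"}, \"idempotencyKey\": \"$CI_PIPELINE_ID-$DOCUMENT_SET\"}"
```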

## Code examples

Copy these examples directly to your repository

`.github/workflows/llm-benchmark.yml`:

```yaml
name: LLM Benchmark

on:
  pull_request:
    paths:
      - "prompts/**"
      - "extraction-schema.json"
  workflow_dispatch:

env:
  API_BASE_URL: https://api.llmcompare.com

jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Read prompts
        id: prompts
        run: |
          # JSON-encode the prompt files so they can be embedded safely
          # in a request payload later in the workflow.
          SYSTEM_PROMPT=$(jq -Rs . < prompts/system.txt)
          USER_PROMPT=$(jq -Rs . < prompts/user.txt)
          echo "system=$SYSTEM_PROMPT" >> "$GITHUB_OUTPUT"
          echo "user=$USER_PROMPT" >> "$GITHUB_OUTPUT"

      - name: Start Benchmark
        id: start
        run: |
          # Build the payload with jq: command substitution does not run
          # inside a single-quoted string, so $(cat extraction-schema.json)
          # cannot be inlined there. --slurpfile embeds the file as JSON.
          PAYLOAD=$(jq -n \
            --arg name "CI-${{ github.repository }}" \
            --arg key "${{ github.run_id }}-${{ github.run_attempt }}" \
            --slurpfile schema extraction-schema.json \
            '{
              project: {name: $name, extractionSchema: $schema[0]},
              benchmark: {models: ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]},
              idempotencyKey: $key
            }')

          RESPONSE=$(curl -s -X POST "${{ env.API_BASE_URL }}/api/v1/benchmark" \
            -H "Authorization: Bearer ${{ secrets.LLMCOMPARE_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d "$PAYLOAD")

          JOB_ID=$(echo "$RESPONSE" | jq -r '.jobId')
          echo "job_id=$JOB_ID" >> "$GITHUB_OUTPUT"
```

## Quick start (5 minutes)

Add benchmarks to your CI/CD pipeline in four steps:

1. **Create an API key.** Go to dashboard → Settings → API Keys → Create New Key. Copy the API key (`llmc_...`).
2. **Configure the secret.**
   - GitHub: Settings → Secrets → Actions → New repository secret
   - GitLab: Settings → CI/CD → Variables → Add variable
3. **Add the workflow.** Copy the example above to your repository. Adjust the document URLs and schema to match your use case.
4. **Test.** Push a PR with prompt changes. The benchmark runs automatically! Check the workflow run for results.

Ready to automate your benchmarks?

Add LLM benchmarks to your CI/CD pipeline and prevent regressions before they reach production.