# CI/CD Integration

Automate LLM benchmarks in your development workflow: prevent regressions, validate prompt changes, and monitor model performance directly from GitHub Actions and GitLab CI/CD.

- **Zero-setup integration**: add benchmarks in minutes with a single API call.
- **Automatic regression detection**: block PRs that cause accuracy degradation.
- **Cost-effective**: use cheaper models for PR checks and premium models for releases.
- **Persistent tracking**: every benchmark run is visible in the dashboard.
- **Flexible triggers**: PR checks, scheduled runs, manual dispatch, or release tags (all four are sketched below).
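As a rough sketch of how those triggers and per-event model choices can combine in a single GitHub Actions workflow. The model IDs and the `models` step output are illustrative assumptions for this example, not a required configuration:

```yaml
# Sketch: all four trigger types, plus cheaper models on PR checks
# and the premium set everywhere else. Model IDs are illustrative.
name: LLM Benchmark Triggers

on:
  pull_request:                  # PR checks
    paths: ["prompts/**"]
  schedule:
    - cron: "0 6 * * 1"          # scheduled runs (Mondays, 06:00 UTC)
  workflow_dispatch:             # manual dispatch
  push:
    tags: ["v*"]                 # release tags

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Pick models by event
        id: models
        run: |
          # Cheaper model for PR checks, premium models otherwise.
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo 'models=["openai/gpt-4o-mini"]' >> "$GITHUB_OUTPUT"
          else
            echo 'models=["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]' >> "$GITHUB_OUTPUT"
          fi
```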

## Supported platforms

Choose your CI/CD platform for seamless integration

### GitHub Actions

Integrate benchmarks directly into your GitHub workflows, with automatic checks on pull requests.

- Pull request validation
- Matrix builds for multiple document sets (sketched below)
- Reusable workflows
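A minimal sketch of the matrix-build idea, fanning one job out across several document sets. The set names are hypothetical placeholders, and the step body is a stub for the API call shown in the full example below:

```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical document sets; replace with your own.
        document_set: [invoices, contracts, receipts]
    steps:
      - uses: actions/checkout@v4
      - name: Benchmark ${{ matrix.document_set }}
        run: |
          # One benchmark API call per document set goes here
          # (see the full workflow under "Code examples").
          echo "Benchmarking document set: ${{ matrix.document_set }}"
```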

### GitLab CI/CD

Native integration with GitLab pipelines, including scheduled benchmarks and parallel job execution (sketched below).

- Merge request validation
- Parallel matrix jobs
- Scheduled pipelines (cron)
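For GitLab, a minimal `.gitlab-ci.yml` sketch covering the three bullets above. The endpoint and auth header mirror the GitHub example in the next section; the trimmed request payload and the document sets are placeholders you should adapt to the full payload shown there:

```yaml
# Sketch: MR validation, parallel matrix jobs, and scheduled
# pipelines. LLMCOMPARE_API_KEY is a CI/CD variable (see quick start).
benchmark:
  image: alpine:3.20
  before_script:
    - apk add --no-cache curl jq
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"  # MR validation
    - if: $CI_PIPELINE_SOURCE == "schedule"             # scheduled pipelines (cron)
  parallel:
    matrix:
      - DOCUMENT_SET: [invoices, contracts]             # parallel matrix jobs
  script:
    - |
      curl -s -X POST "https://api.llmcompare.com/api/v1/benchmark" \
        -H "Authorization: Bearer $LLMCOMPARE_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"project\": {\"name\": \"CI-$CI_PROJECT_PATH\"}, \"idempotencyKey\": \"$CI_PIPELINE_ID-$DOCUMENT_SET\"}"
```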

## Code examples

Copy these examples directly to your repository

`.github/workflows/llm-benchmark.yml`:

```yaml
name: LLM Benchmark

on:
  pull_request:
    paths:
      - "prompts/**"
      - "extraction-schema.json"
  workflow_dispatch:

env:
  API_BASE_URL: https://api.llmcompare.com

jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Read prompts
        id: prompts
        run: |
          # JSON-encode the prompt files so they can be embedded safely
          # in a request payload later in the workflow.
          SYSTEM_PROMPT=$(jq -Rs . < prompts/system.txt)
          USER_PROMPT=$(jq -Rs . < prompts/user.txt)
          echo "system=$SYSTEM_PROMPT" >> "$GITHUB_OUTPUT"
          echo "user=$USER_PROMPT" >> "$GITHUB_OUTPUT"

      - name: Start Benchmark
        id: start
        run: |
          # Build the payload with jq: command substitution does not run
          # inside a single-quoted string, so $(cat extraction-schema.json)
          # cannot be inlined there. --slurpfile embeds the file as JSON.
          PAYLOAD=$(jq -n \
            --arg name "CI-${{ github.repository }}" \
            --arg key "${{ github.run_id }}-${{ github.run_attempt }}" \
            --slurpfile schema extraction-schema.json \
            '{
              project: {name: $name, extractionSchema: $schema[0]},
              benchmark: {models: ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]},
              idempotencyKey: $key
            }')

          RESPONSE=$(curl -s -X POST "${{ env.API_BASE_URL }}/api/v1/benchmark" \
            -H "Authorization: Bearer ${{ secrets.LLMCOMPARE_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d "$PAYLOAD")

          JOB_ID=$(echo "$RESPONSE" | jq -r '.jobId')
          echo "job_id=$JOB_ID" >> "$GITHUB_OUTPUT"
```

## Quick start (5 minutes)

Add benchmarks to your CI/CD pipeline in four steps:

1. **Create an API key.** Go to dashboard → Settings → API Keys → Create New Key. Copy the API key (`llmc_...`).
2. **Configure the secret.**
   - GitHub: Settings → Secrets → Actions → New repository secret
   - GitLab: Settings → CI/CD → Variables → Add variable
3. **Add the workflow.** Copy the example above to your repository. Adjust the document URLs and schema to match your use case.
4. **Test.** Push a PR with prompt changes. The benchmark runs automatically! Check the workflow run for results.

Ready to automate your benchmarks?

Add LLM benchmarks to your CI/CD pipeline and prevent regressions before they reach production.