# Talk to your benchmarks
Connect LLMCompare to Claude, ChatGPT, Cursor, and other AI assistants. Run benchmarks, query results, and automate workflows using natural language.
## What is MCP?
The open standard for AI tool integrations
Model Context Protocol (MCP) is an open standard that lets AI assistants connect to external tools and data sources. Our MCP server exposes LLMCompare's full capabilities—so you can benchmark models, analyze results, and manage projects entirely through conversation.
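Under the hood, MCP clients and servers speak JSON-RPC. As a minimal sketch of the protocol (the response below is illustrative; real tool names, descriptions, and schemas come from the server), a client discovers a server's tools like this:

```json
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }
```

and receives something like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "run_benchmark",
        "description": "Trigger a document extraction benchmark"
      }
    ]
  }
}
```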
**Natural language interface**
Ask "Run a benchmark comparing GPT-4o vs Claude on my invoices" and watch it happen. No clicking through menus or writing code.
**Works with your AI tools**
Claude Desktop, Claude Code, Cursor, Windsurf, and any MCP-compatible client. Use whichever AI assistant you prefer.
**Workflow automation**
Let AI assistants orchestrate entire benchmark workflows. Upload documents, run tests, analyze results, all in one conversation.
**Secure API access**
Uses your existing LLMCompare API key. All operations are authenticated and scoped to your workspace.
## Available tools
Actions the AI can perform on your behalf
- `run_benchmark`: Trigger a document extraction benchmark with specified models and parameters.
- `get_benchmark_status`: Check the progress of a running benchmark job in real time.
- `cancel_benchmark`: Stop a running benchmark job if you need to abort.
- `create_project`: Create a new benchmark project with a custom extraction schema.
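An MCP client invokes these via the protocol's standard `tools/call` method. Here's a sketch of what a `run_benchmark` call could look like; the argument names (`project_id`, `models`) are illustrative assumptions, not the server's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "run_benchmark",
    "arguments": {
      "project_id": "proj_your_project",
      "models": ["gpt-4o", "claude-3-5-sonnet"]
    }
  }
}
```

In practice you never write this yourself: your AI assistant constructs the call from your natural-language request.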
## Available resources
Data the AI can read and query
- `llmcompare://models`: List of 50+ vision-capable LLM models available for benchmarking.
- `llmcompare://projects`: Your benchmark projects with documents and prompts.
- `llmcompare://projects/:id`: Detailed project info with extraction schema and documents.
- `llmcompare://runs`: Recent benchmark runs with summary metrics.
- `llmcompare://runs/:id`: Full benchmark results with per-model accuracy, cost, and extractions.
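Clients read these with the protocol's standard `resources/read` method, addressing a resource by its URI. A sketch with an illustrative run ID (the response shape follows the MCP spec; the `application/json` content type and the payload are assumptions):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "resources/read",
  "params": { "uri": "llmcompare://runs/run_abc123" }
}
```

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "contents": [
      {
        "uri": "llmcompare://runs/run_abc123",
        "mimeType": "application/json",
        "text": "{ \"models\": [...], \"accuracy\": ... }"
      }
    ]
  }
}
```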
## Example conversations
See what's possible with natural language
The assistant's side of a few sample exchanges:

> "You can benchmark 47 vision-capable models including Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, and many more. Would you like me to filter by provider or capability?"

> "Started benchmark job `job_abc123`. Processing 5 documents across 2 models."

> "Claude performed better on address extraction (98% vs 89%), while GPT-4o was stronger on numeric fields."
## Supported platforms
Works with your favorite AI tools: Claude Desktop, Claude Code, Cursor, Windsurf, or any other MCP-compatible client.
## Quick setup
Get started in under a minute
**1. Get your API key**
Log into LLMCompare and create an API key from your account settings. Keys start with `llmc_`.
**2. Configure your client**
Add the MCP server to your Claude Desktop or IDE configuration:
```json
{
  "mcpServers": {
    "llmcompare": {
      "command": "npx",
      "args": ["@llmcompare/mcp"],
      "env": {
        "LLMCOMPARE_API_KEY": "llmc_your_key_here"
      }
    }
  }
}
```
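Where this JSON lives depends on the client; as a hedged pointer (these are the clients' own documented defaults, not LLMCompare-specific), Claude Desktop reads `claude_desktop_config.json` and Cursor looks for `.cursor/mcp.json`, both using this same `mcpServers` shape. Check your client's MCP documentation for the exact path.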
**3. Start talking to your benchmarks**
Open a conversation and ask Claude about your models, projects, or results. Try "What benchmark projects do I have?"
Ready to talk to your benchmarks?
Install the MCP server and start using natural language to manage your LLM benchmarks.