Gauntlet
How does quantization change what your model actually does?
240 behavioral probes. 19 modules. Community-aggregated across every hardware tier. Not perplexity. Not MMLU. Behavior under pressure.
Every test result feeds the public leaderboard. More contributors means more accurate, hardware-specific rankings.

Run tests on your machine. Results automatically contribute to the community leaderboard with your hardware fingerprint. Compare models, see how they perform on setups like yours.
Run probes against any model. Quick (5 min) or full suite (30 min). Local hardware metadata captured automatically.
Results submit to the community dataset with your hardware tier. No account needed. Every test makes the data richer.
See how models perform on hardware like yours. Confidence intervals, degradation curves, and performance predictions.
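The page does not say how the leaderboard computes its confidence intervals. As a minimal sketch of the idea, assuming each probe is scored pass/fail, a Wilson score interval gives a range for a model's true pass rate that narrows as more results come in. The function name and the 1.96 default (a 95% interval) are illustrative choices, not Gauntlet's API.

```python
import math


def wilson_interval(passes, total, z=1.96):
    """Wilson score interval for a pass rate (illustrative, not Gauntlet's method).

    passes: number of probes passed
    total:  number of probes run
    z:      normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passes <= total:
        raise ValueError("passes must be between 0 and total")

    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Same observed pass rate, different sample sizes: more probes, tighter interval.
small = wilson_interval(20, 24)
large = wilson_interval(200, 240)
```

Under this assumption, a quick run with few probes yields a wide interval, and pooling community results for the same model and hardware tier tightens it.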
# Install
pip install gauntlet-cli
# Run the full gauntlet
gauntlet run --model ollama/qwen3.5:4b
# Launch the web dashboard
gauntlet dashboard
# Compare models head-to-head
gauntlet run --model ollama/qwen3.5:4b --model openai/gpt-4o

The AI you connect is the test subject. Results feed a separate MCP leaderboard (kept apart from community hardware data).
Server URL
For clients that accept a server URL directly.
Client Configuration
{
  "mcpServers": {
    "gauntlet": {
      "url": "https://gauntlet.basaltlabs.app/mcp"
    }
  }
}
Paste into your MCP client's configuration file.
Works with Claude Code, Cursor, Windsurf, and any MCP-compatible client.
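If the configuration file already lists other servers, the gauntlet entry has to be merged in rather than pasted over them. The sketch below shows that merge on an in-memory dictionary. The helper name is hypothetical, and where the file lives and whether a client accepts a "url" entry varies by client, so check your client's documentation.

```python
import copy

# URL taken from the client configuration shown on this page.
GAUNTLET_URL = "https://gauntlet.basaltlabs.app/mcp"


def add_gauntlet_server(config):
    """Return a copy of an MCP client config with the gauntlet server added.

    Hypothetical helper: existing entries under "mcpServers" are preserved,
    and the input dictionary is left unmodified.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["gauntlet"] = {"url": GAUNTLET_URL}
    return merged


# Example: a config that already has another (made-up) server entry.
existing = {"mcpServers": {"other": {"url": "https://example.com/mcp"}}}
merged = add_gauntlet_server(existing)
```

To apply this to a real file, load it with json.load, pass the result through the helper, and write it back with json.dump.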
Then tell your AI: “Run the gauntlet on yourself”
Trust works like the real world: a single critical failure damages trust disproportionately. One dangerous hallucination outweighs ten correct answers.
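One way to picture this asymmetry is a score in which a critical failure costs more than ten passes earn. The sketch below is an illustration only: the weights, the ProbeResult type, and the trust_score function are assumptions made up for this example, and the page does not document Gauntlet's actual scoring formula.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only so that one critical failure
# outweighs ten correct answers. Not Gauntlet's documented values.
PASS_REWARD = 1.0
MINOR_FAIL_PENALTY = 1.0
CRITICAL_FAIL_PENALTY = 11.0


@dataclass
class ProbeResult:
    passed: bool
    critical: bool = False  # True if failing this probe would be dangerous


def trust_score(results):
    """Return a score in [0, 1] where critical failures are penalized heavily."""
    if not results:
        return 0.0
    raw = 0.0
    for r in results:
        if r.passed:
            raw += PASS_REWARD
        elif r.critical:
            raw -= CRITICAL_FAIL_PENALTY
        else:
            raw -= MINOR_FAIL_PENALTY
    best_possible = PASS_REWARD * len(results)
    # Floor at zero so a run of critical failures cannot go negative.
    return max(0.0, raw / best_possible)


ten_passes = [ProbeResult(passed=True)] * 10
with_minor = ten_passes + [ProbeResult(passed=False)]
with_critical = ten_passes + [ProbeResult(passed=False, critical=True)]
```

With these assumed weights, ten passes plus one minor failure still scores about 0.82, while ten passes plus one critical failure drops to 0.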