# LiteBench
LLM benchmark studio with multi-criteria scoring, 5 pre-built test suites, SSE streaming, 12 chart types, and support for any OpenAI-compatible endpoint.
## Overview
LiteBench is a Python-based benchmark studio for running standardized evaluation suites against large language models. It supports both locally hosted models and API-based providers, letting you compare outputs, latency, and cost side-by-side. Results are exportable and can be visualized directly in the app.
## Features
- Multi-criteria scoring engine — evaluate models across multiple dimensions in a single run
- 5 pre-built test suites with 72 tests covering reasoning, coding, instruction following, creativity, and factual accuracy — plus a custom suite builder for your own evaluations
- SSE streaming for real-time benchmark progress updates as tests execute
- 12 chart types for result visualization — radar, bar, heatmap, and more
- Side-by-side model comparison across multiple models in a single run
- Result export (JSON, CSV) and built-in visualization
- Works with any OpenAI-compatible endpoint — local (Ollama, LMStudio) or cloud (OpenAI, Anthropic, etc.)
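The last point is what makes the backend-agnostic design possible: every OpenAI-compatible server, local or cloud, accepts the same chat-completions request body, so only the base URL and API key change. A minimal sketch of that shared request shape (the model name and helper function are illustrative, not LiteBench internals):

```python
import json

def build_chat_request(model: str, prompt: str) -> str:
    """Serialize the JSON body for an OpenAI-compatible
    /v1/chat/completions call. The same body works for Ollama,
    LM Studio, and cloud providers alike."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(body)

payload = build_chat_request("llama3", "What is 2 + 2?")
```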
## Installation
LiteBench is installed via the LiteAISuite installer. Select LiteBench during the install step.
Python 3.10 or higher is required. The installer will create a virtual environment and install dependencies automatically.
To install manually:
```shell
cd litebench
pip install -r requirements.txt
```
## Usage

### Running a benchmark
- Launch LiteBench from the LiteAISuite launcher, or run `python main.py` from the install directory.
- In the Models tab, add the models you want to benchmark. For API models, enter the provider and model name. For local models, enter the endpoint URL.
- In the Suites tab, select an existing benchmark suite or create a custom one by adding prompt/expected-output pairs with a scoring method.
- Click Run Benchmark. LiteBench sends each prompt to each model and collects responses, latency, and token counts.
- View results in the Results tab. Switch between the table view and charts.
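Conceptually, each prompt/model pair in the run is one timed request. A rough sketch of what gets recorded per prompt (`send_fn` stands in for the real API call, and the field names are illustrative, not LiteBench's actual schema):

```python
import time

def run_one(send_fn, model: str, prompt: str) -> dict:
    """Time one prompt against one model and record the basics LiteBench
    collects: response text, wall-clock latency, and a rough token count.
    `send_fn` stands in for the real API call."""
    start = time.perf_counter()
    response = send_fn(model, prompt)
    latency = time.perf_counter() - start
    return {
        "model": model,
        "prompt": prompt,
        "response": response,
        "latency_s": round(latency, 3),
        "tokens": len(response.split()),  # crude whitespace-token proxy
    }

# Stub backend for illustration -- always answers "4".
result = run_one(lambda m, p: "4", "demo-model", "What is 2 + 2?")
```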
### Exporting results

Click Export in the Results tab and choose JSON or CSV format. Files are saved to the configured output directory.
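The CSV export is a flat table, one row per result. The idea can be sketched with Python's `csv` module (the record fields here are hypothetical; the actual export schema may differ):

```python
import csv
import io

# Hypothetical result records -- the real export schema may differ.
results = [
    {"model": "gpt-4o-mini", "suite": "reasoning", "score": 0.85, "latency_s": 1.2},
    {"model": "llama3", "suite": "reasoning", "score": 0.71, "latency_s": 0.4},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(results[0].keys()))
writer.writeheader()          # one header row...
writer.writerows(results)     # ...then one row per result
csv_text = buf.getvalue()
```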
## Configuration
Configuration is stored in `config.yaml` in the LiteBench install directory.
Key configuration options:
| Key | Description |
|---|---|
| `output_dir` | Directory where result exports are saved |
| `timeout` | Per-request timeout in seconds |
| `concurrency` | Number of parallel requests per model |
| `default_suite` | Suite loaded on startup |
| `providers` | API keys and endpoint URLs for each provider |
Provider API keys can also be set as environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) rather than stored in the config file.
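Putting the keys above together, a `config.yaml` might look like this (values are illustrative, and the exact layout under `providers` is an assumption):

```yaml
# Illustrative values only -- adjust to your setup.
output_dir: ./results        # where JSON/CSV exports land
timeout: 60                  # per-request timeout, seconds
concurrency: 4               # parallel requests per model
default_suite: reasoning
providers:                   # assumed layout: one entry per provider
  openai:
    api_key: "<your-key>"    # or set OPENAI_API_KEY instead
  ollama:
    endpoint: http://localhost:11434
```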
## Requirements
- Python 3.10 or higher
- Internet connection for API-based model providers
- A running local inference server (Ollama, LMStudio, etc.) for local model benchmarking
- No GPU required — LiteBench itself only sends requests and processes responses
## Troubleshooting
### API requests fail or time out

Verify your API keys are correct in `config.yaml` or set as environment variables. Increase the `timeout` value in `config.yaml` if models are slow to respond.
### Local model endpoint not reachable

Ensure your local inference server is running before starting a benchmark. Confirm the endpoint URL in the Models tab matches the server's address and port (the Ollama default is `http://localhost:11434`).
### Scores all show as 0 or N/A

Custom benchmark suites require a scoring method for each prompt. Open the suite editor and assign an exact-match, keyword, or LLM-as-judge scoring method to each one.
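For reference, the first two scoring methods can be sketched in a few lines (illustrative implementations, not LiteBench's actual code):

```python
def exact_match(expected: str, actual: str) -> float:
    """Score 1.0 only when the response matches exactly (whitespace-trimmed)."""
    return 1.0 if actual.strip() == expected.strip() else 0.0

def keyword_score(keywords: list[str], actual: str) -> float:
    """Fraction of required keywords present in the response (case-insensitive)."""
    hits = sum(1 for k in keywords if k.lower() in actual.lower())
    return hits / len(keywords) if keywords else 0.0
```

LLM-as-judge scoring instead sends the prompt, response, and a grading rubric to a judge model and parses its verdict, so it needs a configured provider to work.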
