One Gateway. Every LLM.
A single door in front of every LLM you use.
Tollgate is a self-hosted API gateway that sits in front of your LLM calls. It authenticates users, routes to the right model, caches repeated queries, and tracks analytics — so you spend less and ship faster.
Everything you need in one gateway.
Batteries included. Configure once, ship forever.
Smart Routing
Say fast, cheap, or smart. A local 0.5B model classifies your request and picks the optimal provider automatically.
Semantic Cache
Vector embeddings + exact-match cache powered by Redis and Qdrant. Identical and similar queries return instantly.
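To illustrate the two-tier idea, here is a minimal in-memory sketch: an exact-match lookup first, then a nearest-neighbor check on embeddings. It is not Tollgate's implementation. The `TwoTierCache` class, the `threshold` value, and the plain dict and list standing in for Redis and Qdrant are all assumptions, and the embeddings are supplied by the caller rather than computed.

```python
import hashlib
import math


class TwoTierCache:
    """Exact-match lookup first, then cosine similarity over stored embeddings."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.exact = {}             # stand-in for a key-value store
        self.vectors = []           # stand-in for a vector store: (embedding, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, prompt, embedding):
        # Tier 1: identical prompt text.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit
        # Tier 2: most similar stored embedding, if it clears the threshold.
        best_score, best = 0.0, None
        for vec, response in self.vectors:
            score = self._cosine(embedding, vec)
            if score > best_score:
                best_score, best = score, response
        return best if best_score >= self.threshold else None

    def put(self, prompt, embedding, response):
        self.exact[self._key(prompt)] = response
        self.vectors.append((list(embedding), response))
```

A linear scan is fine for a sketch; a real vector store would use an approximate nearest-neighbor index instead.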
Rate Limiting
Token bucket rate limiting per user. Configure requests-per-second and burst capacity via the setup wizard.
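For readers new to token buckets, a minimal sketch of the general algorithm follows. It is illustrative only and not Tollgate's code: the `TokenBucket` class, the `check` helper, and the default `rate` and `burst` values are assumptions.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: Optional[float] = None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token and return True if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time elapsed, capped at the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per user, keyed by a hypothetical user id.
buckets: Dict[str, TokenBucket] = {}


def check(user_id: str, rate: float = 5.0, burst: int = 10) -> bool:
    if user_id not in buckets:
        buckets[user_id] = TokenBucket(rate, burst)
    return buckets[user_id].allow()
```

Here `rate` plays the role of requests-per-second and `burst` the burst capacity: a user can spend up to `burst` requests at once, then is held to `rate` per second.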
Multi-Provider
OpenAI, Anthropic, and Gemini behind a single unified API endpoint. Add providers without changing your app.
Auth & Tokens
Bearer token authentication out of the box. Generate scoped tokens for each client or team.
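As a rough illustration of what bearer-token checking involves, the sketch below parses an `Authorization` header and compares the token against a known set. The function names and the `tg_test_123` token are hypothetical, and token scoping is not modeled here.

```python
import hmac
from typing import Iterable, Optional


def extract_bearer(header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(header: Optional[str], valid_tokens: Iterable[str]) -> bool:
    """Check the presented token against the known tokens."""
    token = extract_bearer(header)
    if token is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(token.encode("utf-8"), known.encode("utf-8"))
        for known in valid_tokens
    )
```

For example, `is_authorized("Bearer tg_test_123", ["tg_test_123"])` passes, while a `Basic` scheme or an unknown token is rejected.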
Analytics
Every request tracked — latency, token usage, cost per model. EMA-based model scoring keeps routing accurate over time.
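An exponential moving average (EMA) weights recent requests more heavily than old ones. The sketch below shows the general technique applied to latency. The `ModelScore` class, the `alpha` value, and the `fastest` helper are assumptions for illustration, not Tollgate's actual scoring formula.

```python
class ModelScore:
    """Tracks an exponential moving average of observed latency for one model."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # hypothetical smoothing factor, 0 < alpha <= 1
        self.latency_ms = None  # no observations yet

    def update(self, observed_ms):
        if self.latency_ms is None:
            # The first observation seeds the average.
            self.latency_ms = float(observed_ms)
        else:
            self.latency_ms = (
                self.alpha * observed_ms + (1 - self.alpha) * self.latency_ms
            )
        return self.latency_ms


def fastest(scores):
    """Return the name of the model with the lowest EMA latency, or None."""
    seen = {name: s.latency_ms for name, s in scores.items() if s.latency_ms is not None}
    return min(seen, key=seen.get) if seen else None
```

A larger `alpha` reacts faster to a provider slowing down; a smaller one smooths out one-off spikes.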
Never pay for the same answer twice.
Smart routing, on your hardware.
The routing model runs entirely on your hardware.
No extra API calls. No added latency. No data leaves your server.
Get running in minutes.
Step 1 — Install
Step 2 — Configure
Step 3 — Start
$ tollgate start
✓ Loaded config from ~/.tollgate/config.yaml
✓ Connected: redis · qdrant · classifier (0.5B)
✓ Providers ready: openai, anthropic, gemini
Tollgate running on http://localhost:8000
Then call it like any OpenAI-compatible API:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="tg_live_...",
)

resp = client.chat.completions.create(
    model="smart",  # or "fast" / "cheap"
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

Your keys. Your data. Your infrastructure.
Tollgate runs anywhere you can run Python or Docker. Nothing phones home.