Smart routing powered by local AI

One Gateway. Every LLM.

Your App → Tollgate gateway → OpenAI · Anthropic · Gemini
OpenAI · Anthropic · Gemini · More coming soon
// what is tollgate

A single door in front of every LLM you use.

Tollgate is a self-hosted API gateway that sits in front of your LLM calls. It authenticates users, routes to the right model, caches repeated queries, and tracks analytics — so you spend less and ship faster.

auth · routing · cache · rate-limits · analytics
// capabilities

Everything you need in one gateway.

Batteries included. Configure once, ship forever.

Smart Routing

Say fast, cheap, or smart. A local 0.5B model classifies your request and picks the optimal provider automatically.

Semantic Cache

Vector embeddings + exact-match cache powered by Redis and Qdrant. Identical and similar queries return instantly.

Rate Limiting

Token bucket rate limiting per user. Configure requests-per-second and burst capacity via the setup wizard.
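
To make the token bucket idea concrete, here is a minimal sketch of a per-user limiter. It is not Tollgate's actual implementation: the names `TokenBucket` and `allow_request` are hypothetical, and the defaults simply mirror the `10` rps / `30` burst values shown in the setup wizard.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: Optional[float] = None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so a client can burst immediately
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token if available; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-user registry: one bucket per user ID.
_buckets: Dict[str, TokenBucket] = {}


def allow_request(user_id: str, rate: float = 10.0, burst: int = 30,
                  now: Optional[float] = None) -> bool:
    if user_id not in _buckets:
        _buckets[user_id] = TokenBucket(rate, burst, now)
    return _buckets[user_id].allow(now)
```

With these defaults a user can send 30 requests at once, then sustain roughly 10 per second as the bucket refills.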

Multi-Provider

OpenAI, Anthropic, and Gemini behind a single unified API endpoint. Add providers without changing your app.

Auth & Tokens

Bearer token authentication out of the box. Generate scoped tokens for each client or team.
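
As an illustration of scoped bearer tokens, the sketch below issues a token, stores only its hash, and checks an `Authorization` header against a required scope. Everything here is an assumption for illustration (the function names, the scope strings, the in-memory store); only the `tg_live_` prefix comes from the quickstart.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Iterable, Optional

TOKEN_PREFIX = "tg_live_"  # prefix shown in the quickstart; the rest is hypothetical

# Hypothetical in-memory store: SHA-256 digest of the token -> owner and scopes.
_token_store: Dict[str, dict] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(owner: str, scopes: Iterable[str]) -> str:
    """Create a random token and keep only its hash, never the raw value."""
    token = TOKEN_PREFIX + secrets.token_urlsafe(32)
    _token_store[_digest(token)] = {"owner": owner, "scopes": frozenset(scopes)}
    return token


def authenticate(authorization_header: str, required_scope: str) -> Optional[str]:
    """Return the token owner if the header is valid and carries the scope."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    candidate = _digest(token.strip())
    for stored, record in _token_store.items():
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(stored, candidate):
            return record["owner"] if required_scope in record["scopes"] else None
    return None
```

Storing digests rather than raw tokens means a leaked store does not directly expose usable credentials.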

Analytics

Every request tracked — latency, token usage, cost per model. EMA-based model scoring keeps routing accurate over time.
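
An exponential moving average (EMA) weights recent observations more heavily than old ones, which is why it suits scoring models whose latency drifts over time. The sketch below shows the general technique only; the smoothing factor `alpha=0.2` and the `LatencyTracker` class are assumptions, not Tollgate's actual scoring code.

```python
from typing import Dict, Optional


def ema_update(previous: Optional[float], observation: float,
               alpha: float = 0.2) -> float:
    """Blend a new observation into the running average.

    alpha near 1 reacts quickly to change; alpha near 0 smooths heavily.
    """
    if previous is None:
        return observation  # the first observation seeds the average
    return alpha * observation + (1.0 - alpha) * previous


class LatencyTracker:
    """Hypothetical per-model tracker keeping one EMA of latency in ms."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.latency_ms: Dict[str, float] = {}

    def record(self, model: str, latency_ms: float) -> float:
        updated = ema_update(self.latency_ms.get(model), latency_ms, self.alpha)
        self.latency_ms[model] = updated
        return updated

    def fastest(self) -> Optional[str]:
        """Model with the lowest smoothed latency, or None if nothing is recorded."""
        if not self.latency_ms:
            return None
        return min(self.latency_ms, key=self.latency_ms.get)
```

The same update rule could be applied to cost or error rate; a single slow request nudges the score instead of overturning it.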

// semantic cache

Never pay for the same answer twice.

incoming
"Summarize this novel in three sentences…"
MISS · routed to gpt-4o · 432ms
Response stored for next time.
Cost saved this session
$0.000
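
The two-tier lookup described here (exact match first, then vector similarity) can be sketched in a few lines. This is an in-memory stand-in, not the real Redis and Qdrant integration: `toy_embed` is a deliberately crude bag-of-words embedding, and the `SemanticCache` class and its `0.8` threshold are assumptions for illustration.

```python
import hashlib
import math
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Crude stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.exact = {}                                # stands in for Redis
        self.vectors: List[Tuple[Counter, str]] = []   # stands in for Qdrant

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Tuple[str, Optional[str]]:
        # Tier 1: exact match on a hash of the prompt.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return "exact", hit
        # Tier 2: nearest stored prompt by cosine similarity.
        query = toy_embed(prompt)
        best = max(self.vectors, key=lambda item: cosine(query, item[0]),
                   default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return "semantic", best[1]
        return "miss", None

    def put(self, prompt: str, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.vectors.append((toy_embed(prompt), response))
```

On a miss, the gateway would call the provider and then `put` the response, so the next identical or near-identical prompt is served from the cache.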
// how it works

Smart routing, on your hardware.

  1. Your Request: OpenAI-compatible
  2. Tollgate Receives: auth + parse
  3. Local AI Classifies: SIMPLE · CODE · REASONING · CREATIVE
  4. Routing Engine: picks optimal model
  5. Response: returned instantly

The routing model runs entirely on your hardware. No extra API calls. No added latency. No data leaves your server.
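
The classify-then-route flow can be sketched as follows. The keyword-based `classify` function is only a stand-in for the local 0.5B model, and the `Candidate` fields, model names, and selection rules are hypothetical; the four categories and the fast / cheap / smart preferences are the ones named on this page.

```python
from dataclasses import dataclass
from typing import Dict, List

CATEGORIES = ("SIMPLE", "CODE", "REASONING", "CREATIVE")


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical model identifier
    latency_ms: float          # typical response latency
    cost_per_1k_tokens: float  # relative cost
    quality: Dict[str, float]  # category -> quality score in [0, 1]


def classify(prompt: str) -> str:
    """Keyword heuristic standing in for the local classifier model."""
    text = prompt.lower()
    if any(w in text for w in ("function", "bug", "compile", "stack trace")):
        return "CODE"
    if any(w in text for w in ("prove", "derive", "step by step", "why")):
        return "REASONING"
    if any(w in text for w in ("story", "poem", "imagine", "lyrics")):
        return "CREATIVE"
    return "SIMPLE"


def pick_model(prompt: str, preference: str, candidates: List[Candidate]) -> str:
    """Map a fast / cheap / smart preference to a concrete model name."""
    category = classify(prompt)
    if preference == "fast":
        return min(candidates, key=lambda c: c.latency_ms).name
    if preference == "cheap":
        return min(candidates, key=lambda c: c.cost_per_1k_tokens).name
    if preference == "smart":
        # Only "smart" uses the category: pick the best model for this task type.
        return max(candidates, key=lambda c: c.quality[category]).name
    raise ValueError(f"unknown preference: {preference!r}")


small = Candidate("model-small", 200.0, 0.1, {c: 0.6 for c in CATEGORIES})
large = Candidate("model-large", 900.0, 2.0, {c: 0.9 for c in CATEGORIES})
print(pick_model("Write a poem about rain", "smart", [small, large]))
```

Because classification happens in-process, the only network call on the request path is the one to the chosen provider.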

// quickstart

Get running in minutes.

  1. Install
  2. Configure
  3. Start

Step 1 — Install

Step 2 — Configure

$ tollgate init
┌──────────────────────────────────────────┐
│ T O L L G A T E · setup wizard │
└──────────────────────────────────────────┘
› Select providers: [x] OpenAI [x] Anthropic [x] Gemini
› Rate limit (rps): 10 burst: 30
› Cache backend: redis://localhost:6379
› Vector store: qdrant://localhost:6333
✓ Config written to ~/.tollgate/config.yaml

Step 3 — Start

$ tollgate start

  ✓ Loaded config from ~/.tollgate/config.yaml
  ✓ Connected: redis · qdrant · classifier (0.5B)
  ✓ Providers ready: openai, anthropic, gemini

  Tollgate running on http://localhost:8000

Then call it like any OpenAI-compatible API:

from openai import OpenAI

# Point the standard OpenAI client at the local Tollgate endpoint.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="tg_live_...",
)

resp = client.chat.completions.create(
    model="smart",  # or "fast" / "cheap"
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

Your keys. Your data. Your infrastructure.

Tollgate runs anywhere you can run Python or Docker. Nothing phones home.

No vendor lock-in
Runs on any server
Encrypted config