Smart routing powered by local AI

One Gateway. Every LLM.

OpenAI · Anthropic · Gemini · More coming soon

// what is tollgate

A single door in front of every LLM you use.

Tollgate is a self-hosted API gateway that sits in front of your LLM calls. It authenticates users, routes to the right model, caches repeated queries, and tracks analytics — so you spend less and ship faster.

auth · routing · cache · rate-limits · analytics


// capabilities

Everything you need in one gateway.

Batteries included. Configure once, ship forever.

Smart Routing

Ask for "fast", "cheap", or "smart". A local 0.5B-parameter model classifies your request and picks the optimal provider automatically.

Semantic Cache

Vector embeddings + exact-match cache powered by Redis and Qdrant. Identical and similar queries are served from cache, with no provider call.

Rate Limiting

Token bucket rate limiting per user. Configure requests-per-second and burst capacity via the setup wizard.

Multi-Provider

OpenAI, Anthropic, and Gemini behind a single unified API endpoint. Add providers without changing your app.

Auth & Tokens

Bearer token authentication out of the box. Generate scoped tokens for each client or team.

Analytics

Every request tracked — latency, token usage, cost per model. EMA-based model scoring keeps routing accurate over time.
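To make the EMA idea concrete, here is a minimal sketch of how a per-model latency score could be kept up to date. The class name `EmaScores`, the smoothing factor `alpha = 0.2`, and scoring on latency alone are assumptions for illustration, not Tollgate's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EmaScores:
    """Hypothetical per-model latency tracker using an exponential moving average."""

    alpha: float = 0.2  # assumed smoothing factor; higher values react faster
    latency_ms: Dict[str, float] = field(default_factory=dict)

    def record(self, model: str, observed_ms: float) -> float:
        """Fold one observed latency into the running average for `model`."""
        previous = self.latency_ms.get(model)
        if previous is None:
            updated = observed_ms  # first observation seeds the average
        else:
            updated = self.alpha * observed_ms + (1 - self.alpha) * previous
        self.latency_ms[model] = updated
        return updated

    def fastest(self) -> str:
        """Return the model with the lowest smoothed latency."""
        return min(self.latency_ms, key=self.latency_ms.get)
```

Because each new sample only nudges the average, a single slow response does not immediately demote a model, while a sustained slowdown does.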

// semantic cache

Never pay for the same answer twice.

incoming

"Summarize this novel in three sentences…"

MISS · routed to gpt-4o · 432ms

Response stored for next time.
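The MISS-then-store flow above can be sketched as a two-tier lookup: an exact-match check first, then a similarity search. This is an in-memory illustration only. The toy bag-of-words `embed` function, the `TwoTierCache` class, and the 0.8 threshold are assumptions standing in for a real embedding model, Redis, and Qdrant.

```python
import hashlib
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TwoTierCache:
    """Hypothetical cache: exact-match dict first, then nearest stored vector."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed similarity cutoff
        self.exact = {}             # sha256(prompt) -> response
        self.vectors = []           # list of (vector, response)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        """Return (status, response) where status is EXACT, SEMANTIC, or MISS."""
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return "EXACT", hit
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.vectors:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return "SEMANTIC", best_response
        return "MISS", None

    def put(self, prompt: str, response: str) -> None:
        """Store a response under both the exact key and its vector."""
        self.exact[self._key(prompt)] = response
        self.vectors.append((embed(prompt), response))
```

On a MISS the caller would forward the request to a provider and then call `put`, so the next identical or near-identical prompt is answered locally.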


// how it works

Smart routing, on your hardware.

Your Request

OpenAI-compatible

Tollgate Receives

auth + parse

Local AI Classifies

SIMPLE · CODE · REASONING · CREATIVE

Routing Engine

picks optimal model

Response

returned instantly


The routing model runs entirely on your hardware.
Classification makes no extra API calls, adds no network round trip, and sends no data off your server.
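The classify-then-route flow can be sketched in a few lines. Everything here is illustrative: the keyword heuristic stands in for the local classifier model, and the model names, latencies, costs, and quality scores in `CATALOGUE` are made-up placeholders, not real providers or measurements. Only the four categories and the three preferences come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SIMPLE = "SIMPLE"
    CODE = "CODE"
    REASONING = "REASONING"
    CREATIVE = "CREATIVE"


@dataclass(frozen=True)
class Model:
    name: str           # hypothetical identifier, not a real model name
    latency_ms: float   # placeholder figure
    cost_per_1k: float  # placeholder figure
    quality: float      # placeholder score in [0, 1]
    good_for: frozenset


CATALOGUE = [
    Model("provider-a/small", 200, 0.1, 0.60,
          frozenset({Category.SIMPLE, Category.CREATIVE})),
    Model("provider-b/medium", 450, 0.5, 0.80,
          frozenset({Category.SIMPLE, Category.CODE, Category.CREATIVE})),
    Model("provider-c/large", 900, 2.0, 0.95, frozenset(Category)),
]


def classify(prompt: str) -> Category:
    """Keyword heuristic standing in for the local classifier model."""
    text = prompt.lower()
    if any(k in text for k in ("def ", "function", "stack trace", "compile")):
        return Category.CODE
    if any(k in text for k in ("prove", "step by step", "why")):
        return Category.REASONING
    if any(k in text for k in ("poem", "story", "lyrics")):
        return Category.CREATIVE
    return Category.SIMPLE


def route(preference: str, prompt: str) -> Model:
    """Pick a model suited to the prompt's category, ranked by preference."""
    category = classify(prompt)
    candidates = [m for m in CATALOGUE if category in m.good_for]
    if preference == "fast":
        return min(candidates, key=lambda m: m.latency_ms)
    if preference == "cheap":
        return min(candidates, key=lambda m: m.cost_per_1k)
    if preference == "smart":
        return max(candidates, key=lambda m: m.quality)
    raise ValueError(f"unknown preference: {preference!r}")
```

Splitting classification from ranking keeps the two concerns independent: the classifier can be swapped without touching the catalogue, and the catalogue can be re-scored from live analytics without retraining anything.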

providers · unified API · routing overhead · self-hosted

// quickstart

Get running in minutes.

1. Install
2. Configure
3. Start

Step 1 — Install

Step 2 — Configure

$ tollgate init

┌──────────────────────────────────────────┐
│ T O L L G A T E · setup wizard │
└──────────────────────────────────────────┘
› Select providers: [x] OpenAI [x] Anthropic [x] Gemini
› Rate limit (rps): 10 burst: 30
› Cache backend: redis://localhost:6379
› Vector store: qdrant://localhost:6333

✓ Config written to ~/.tollgate/config.yaml
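For a sense of what the rate-limit values in the wizard mean, here is a minimal token bucket sketch using the same numbers (10 requests per second, burst of 30). The `TokenBucket` class and its injectable `now` argument are assumptions for illustration; Tollgate's internal implementation may differ.

```python
import time
from typing import Optional


class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens/s up to `burst`."""

    def __init__(self, rate: float, burst: int, now: Optional[float] = None):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Per-user limiting would keep one bucket per token or user id, for example:
buckets = {}


def allow_request(user_id: str, rate: float = 10, burst: int = 30) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(rate, burst))
    return bucket.allow()
```

With these settings a client can send 30 requests back to back, after which it is held to roughly one request every 100 ms until the bucket refills.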

Step 3 — Start

$ tollgate start

  ✓ Loaded config from ~/.tollgate/config.yaml
  ✓ Connected: redis · qdrant · classifier (0.5B)
  ✓ Providers ready: openai, anthropic, gemini

  Tollgate running on http://localhost:8000

Then call it like any OpenAI-compatible API:

python

from openai import OpenAI
 
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="tg_live_...",
)
 
resp = client.chat.completions.create(
    model="smart",           # or "fast" / "cheap"
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)
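If you would rather not depend on the OpenAI client, the same call can be sketched with the standard library. With `base_url` set as above, the OpenAI client posts to `/chat/completions` under that base, so this builds the equivalent request by hand. The helper name `build_chat_request` and the token value `example-token` are placeholders, not part of Tollgate.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, token: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with a bearer token."""
    body = {
        "model": model,  # "smart", "fast", or "cheap"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "example-token" is a placeholder; use a token issued by your gateway.
request = build_chat_request(
    "http://localhost:8000/v1", "example-token", "smart", "Hi"
)
```

Sending it is then a matter of `urllib.request.urlopen(request)` against a running gateway and decoding the JSON response body.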

Your keys. Your data. Your infrastructure.

Tollgate runs anywhere you can run Python or Docker. Nothing phones home.

No vendor lock-in

Runs on any server

Encrypted config