Self-hosted · v2 · works with the SDK you already use

Your prompts carry more tokens than they need to. We trim them on the way out.

TokenOptimizer is a proxy that sits in front of your LLM. It compresses each prompt before forwarding it to the real provider, then logs exactly what you saved. You change one line: the base URL.

No card required · Runs on your infrastructure · Five-minute setup
client.py
# point the client at the proxy; nothing else changes
from openai import OpenAI

client = OpenAI(
-    base_url="https://api.openai.com/v1",
+    base_url="https://proxy.internal/v1/proxy/to-prod-9f2c",
     api_key="sk-...",  # your real provider key, untouched
)
request: 1,840 → 540 tokens
this call: $0.018 → $0.005
logged to ledger ✓
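The figures in the sample output can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `reduction` is ours for illustration and is not part of TokenOptimizer.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute saving, percentage saving) for a before/after pair."""
    saved = before - after
    return saved, 100 * saved / before

# Numbers taken from the sample call shown in the client.py output.
tokens_saved, tokens_pct = reduction(1840, 540)
usd_saved, usd_pct = reduction(0.018, 0.005)

print(f"tokens: {tokens_saved:.0f} saved ({tokens_pct:.1f}%)")  # tokens: 1300 saved (70.7%)
print(f"cost:   ${usd_saved:.3f} saved ({usd_pct:.1f}%)")       # cost:   $0.013 saved (72.2%)
```

For that single call, roughly 71% of the tokens and 72% of the cost were removed. Your own ratio will depend on how much filler your prompts carry.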
Routes and optimizes for
OpenAI · Anthropic · Google Gemini · DeepSeek · Groq · Mistral
What's in the box

One optimization layer, a handful of moving parts

Compression, caching, media handling and cost tracking, all behind a single proxy endpoint.

Prompt compression

Thirty-odd rules strip filler, compact system prompts and trim stale context without changing what the model is actually asked.
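To make "rules" concrete, here is a minimal sketch of what rule-based compression could look like. The three rules below are hypothetical illustrations, not TokenOptimizer's actual rule set. A production rule would need to be far more careful about preserving what the model is asked.

```python
import re

# Hypothetical rules for illustration only: (pattern, replacement) pairs.
RULES = [
    (re.compile(r"\b(please|kindly)\s+", re.IGNORECASE), ""),  # drop politeness filler
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),     # shorten a wordy phrase
    (re.compile(r"[ \t]+"), " "),                              # collapse runs of spaces
]

def compress(prompt: str) -> str:
    """Apply each rule in order, then trim surrounding whitespace."""
    for pattern, replacement in RULES:
        prompt = pattern.sub(replacement, prompt)
    return prompt.strip()

original = "Please   summarize the report in order to  brief the team."
print(compress(original))  # summarize the report to brief the team.
```

The request is unchanged in meaning, but the prompt is shorter. Applying the rules in a fixed order keeps the result deterministic, which matters if you want the output to be auditable.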

Drop-in proxy

Any OpenAI-compatible client works. Swap the base URL for your proxy endpoint and keep your SDK, keys and code as they are.

Media handling

Images get resized and stripped of EXIF; videos are reduced to the keyframes that matter. Far smaller multimodal payloads.

Similarity cache

A Qdrant-backed vector cache returns a stored answer when a near-identical prompt comes through, so you don't pay twice.
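The idea behind a similarity cache can be sketched in plain Python. Everything here is an assumption made for illustration: the in-memory list stands in for Qdrant, the bag-of-words `embed` function stands in for a real embedding model, and the 0.9 threshold is an arbitrary value.

```python
import math
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real deployment would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SimilarityCache:
    """In-memory stand-in for a vector store such as Qdrant."""

    def __init__(self, threshold: float = 0.9):  # assumed threshold, tune for your data
        self.threshold = threshold
        self.entries: list = []  # (vector, answer) pairs

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        """Return the stored answer for the closest prompt, if it is close enough."""
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

cache = SimilarityCache()
cache.put("What is the capital of France?", "Paris")
hit = cache.get("what is the capital of france?")   # near-identical prompt
miss = cache.get("How do I reset my password?")     # unrelated prompt
```

On a hit, the provider is never called, so the request costs nothing. The threshold is the main trade-off: set it too low and users get answers to questions they did not ask.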

Cost ledger

Every request records real USD cost, tokens saved and dollars saved, using current per-model pricing across providers.
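A ledger entry of this kind might be modelled as follows. This is a sketch under stated assumptions: the `LedgerEntry` class, the `example-model` name and the $10 per million tokens price are all hypothetical, and only input tokens are counted.

```python
from dataclasses import dataclass

# Hypothetical input price in USD per 1M tokens; real prices vary by model and change over time.
PRICE_PER_MILLION = {"example-model": 10.00}

@dataclass
class LedgerEntry:
    model: str
    tokens_before: int  # prompt size as sent by the client
    tokens_after: int   # prompt size after compression

    def cost(self, tokens: int) -> float:
        """USD cost of sending `tokens` input tokens to this entry's model."""
        return tokens * PRICE_PER_MILLION[self.model] / 1_000_000

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def usd_saved(self) -> float:
        return self.cost(self.tokens_saved)

entry = LedgerEntry("example-model", tokens_before=1840, tokens_after=540)
print(entry.tokens_saved)                          # 1300
print(f"{entry.cost(entry.tokens_after):.4f}")     # 0.0054
print(f"{entry.usd_saved:.4f}")                    # 0.0130
```

Keeping the price table separate from the entries means a price change only touches one place, while each entry still records the raw token counts it was computed from.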

Payload inspector

See the before-and-after of each request, token counts included, so the compression is auditable rather than a black box.

Getting started

Three steps, then you watch the numbers

  1. Create a workspace

    Sign up and generate a TokenOptimizer key for your team. Nothing to install on your side.

  2. Repoint the base URL

    Send requests through /v1/proxy/<key> instead of the provider directly. Keep using your real provider key.

  3. Read the ledger

    Each call is compressed, forwarded and logged. Tokens and dollars saved show up per request in the dashboard.
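Step 2 comes down to building one URL. A minimal sketch, assuming a placeholder proxy origin and key; the helper name `proxy_base_url` is ours and not part of any SDK.

```python
from urllib.parse import quote

def proxy_base_url(proxy_origin: str, workspace_key: str) -> str:
    """Build the base URL for the /v1/proxy/<key> route."""
    return f"{proxy_origin.rstrip('/')}/v1/proxy/{quote(workspace_key, safe='')}"

# Placeholder values; substitute your own proxy origin and TokenOptimizer key.
base_url = proxy_base_url("https://proxy.example.com/", "my-workspace-key")
print(base_url)  # https://proxy.example.com/v1/proxy/my-workspace-key
```

Pass the result as `base_url` when constructing your client, alongside your real provider key, exactly as in the client.py sample. Reading the origin and key from configuration rather than hard-coding them makes it easy to switch the proxy off again.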

Why bother

The bill goes down, the product doesn't change

Less waste per call

Compression and caching cut tokens on every request automatically, with no prompt rewriting on your end.

Nothing to migrate

It's a URL change, not a refactor. No new SDK, no client library, no lock-in if you decide to remove it.

Spend you can see

Per-request cost, an audit trail and team key management mean you can actually attribute LLM spend.

Stays in your perimeter

Self-host it and your prompts never leave your network. Useful when the data is the sensitive part.

From teams running it

What people noticed first

“The base-URL swap really was the whole integration. We had it in staging before lunch and saw the bill drop the same week.”

Priya Krishnan, Staff Engineer, Acme

“The inspector is the part that stuck. Once everyone could see token cost per request, people started writing tighter prompts on their own.”

Marcus Reed, CTO, Loop Labs

“We needed it self-hosted for compliance reasons. Getting gateway-style savings without data leaving our VPC was the deciding factor.”

Sara Lindqvist, Platform Lead, Northwind
Pricing

Start free, pay when it's earning its keep

Flat tiers, no per-seat counting. Self-host the whole thing if you'd rather not pay at all.

Starter

Side projects and trying it out.
$0 /mo
  • Up to 1M tokens optimized / mo
  • Rule-based compression
  • One workspace and key
  • Basic cost ledger
Get Started

Team

Most picked
Production workloads with more than one engineer.
$49 /mo
  • Up to 50M tokens optimized / mo
  • Media compression + similarity cache
  • Team keys and audit log
  • Full analytics and CSV export
Start free trial

Enterprise

Scale, SSO and a contract.
Let's talk
  • Unlimited tokens
  • Air-gapped self-hosting
  • SSO, roles and SLAs
  • A human to call
Contact us
Questions first?

Tell us about your workload

Volume pricing, self-hosting, a tricky provider setup — send a note and we'll get back to you, usually within a day.


See your first saving on the next request

Spin up a workspace, repoint one URL, and check the ledger.

Get Started