TokenOptimizer is a proxy that sits in front of your LLM. It compresses each prompt before forwarding it to the real provider, then logs exactly what you saved. You change one line: the base URL.
Compression, caching, media handling and cost tracking, all behind a single proxy endpoint.
Thirty-odd rules strip filler, compact system prompts and trim stale context without changing what the model is actually asked.
Any OpenAI-compatible client works. Swap the base URL for your proxy endpoint and keep your SDK, keys and code as they are.
Images get resized and stripped of EXIF; videos are reduced to the keyframes that matter. Far smaller multimodal payloads.
A Qdrant-backed vector cache returns a stored answer when a near-identical prompt comes through, so you don't pay twice.
Every request records real USD cost, tokens saved and dollars saved, using current per-model pricing across providers.
See the before-and-after of each request, token counts included, so the compression is auditable rather than a black box.
Sign up and generate a TokenOptimizer key for your team. Nothing to install on your side.
Send requests through /v1/proxy/<key> instead of the provider directly. Keep using your real provider key.
Each call is compressed, forwarded and logged. Tokens and dollars saved show up per request in the dashboard.
Compression and caching cut tokens on every request automatically, with no prompt rewriting on your end.
It's a URL change, not a refactor. No new SDK, no client library, no lock-in if you decide to remove it.
Per-request cost, an audit trail and team key management mean you can actually attribute LLM spend.
Self-host it and your prompts never leave your network. Useful when the data is the sensitive part.
“The base-URL swap really was the whole integration. We had it in staging before lunch and saw the bill drop the same week.”
“The inspector is the part that stuck. Once everyone could see token cost per request, people started writing tighter prompts on their own.”
“We needed it self-hosted for compliance reasons. Getting gateway-style savings without data leaving our VPC was the deciding factor.”
Flat tiers, no per-seat counting. Self-host the whole thing if you'd rather not pay at all.
Volume pricing, self-hosting, a tricky provider setup — send a note and we'll get back to you, usually within a day.
Spin up a workspace, repoint one URL, and check the ledger.
Get Started