Self-hosted · v2 · works with the SDK you already use

Your prompts carry more tokens than they need to. We trim them on the way out.

TokenOptimizer is a proxy that sits in front of your LLM. It compresses each prompt before forwarding it to the real provider, then logs exactly what you saved. You change one line: the base URL.

Get Started · See how it works

No card required · Runs on your infrastructure · Five-minute setup

client.py

# point the client at the proxy; nothing else changes
from openai import OpenAI

client = OpenAI(
-   base_url="https://api.openai.com/v1",
+   base_url="https://proxy.internal/v1/proxy/to-prod-9f2c",
    api_key="sk-...",  # your real provider key, untouched
)

request: 1,840 → 540 tokens
this call: $0.018 → $0.005
logged to ledger ✓
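
For reference, the figures in the sample output above work out as follows. This is plain arithmetic on those two before/after pairs; the helper name `savings` is illustrative and not part of TokenOptimizer.

```python
def savings(before: float, after: float) -> tuple[float, float]:
    """Return (absolute saving, percentage saving) for a before/after pair."""
    saved = before - after
    return saved, 100.0 * saved / before

# Figures taken from the sample output: tokens and USD for one call.
tokens_saved, tokens_pct = savings(1840, 540)
usd_saved, usd_pct = savings(0.018, 0.005)

print(f"tokens saved: {tokens_saved:.0f} ({tokens_pct:.1f}%)")   # tokens saved: 1300 (70.7%)
print(f"dollars saved: ${usd_saved:.3f} ({usd_pct:.1f}%)")       # dollars saved: $0.013 (72.2%)
```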

What's in the box

One optimization layer, a handful of moving parts

Compression, caching, media handling and cost tracking, all behind a single proxy endpoint.

Prompt compression

Thirty-odd rules strip filler, compact system prompts and trim stale context without changing what the model is actually asked.
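
To give a feel for what a rule-based pass could look like, here is a minimal sketch. The four rules below are hypothetical examples chosen for illustration; they are not TokenOptimizer's actual rule set, which is not listed on this page.

```python
import re

# Hypothetical rules for illustration only: (pattern, replacement) pairs applied in order.
RULES = [
    (re.compile(r"\b(?:please|kindly)\s+", re.IGNORECASE), ""),  # politeness filler
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),       # wordy phrase
    (re.compile(r"[ \t]+"), " "),                                # runs of spaces or tabs
    (re.compile(r"\n{3,}"), "\n\n"),                             # excess blank lines
]

def compress(prompt: str) -> str:
    """Apply each rule in turn and trim surrounding whitespace."""
    for pattern, replacement in RULES:
        prompt = pattern.sub(replacement, prompt)
    return prompt.strip()

before = "Please   summarize the report in order to   brief the team."
after = compress(before)
print(after)  # summarize the report to brief the team.
```

Ordered regex pairs keep each rule small and individually testable, which matters if every rewrite has to preserve the meaning of the request.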

Drop-in proxy

Any OpenAI-compatible client works. Swap the base URL for your proxy endpoint and keep your SDK, keys and code as they are.

Media handling

Images get resized and stripped of EXIF; videos are reduced to the keyframes that matter. The result is a much smaller multimodal payload.

Similarity cache

A Qdrant-backed vector cache returns a stored answer when a near-identical prompt comes through, so you don't pay twice.
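
The idea behind a similarity cache can be sketched in a few lines. This is an in-memory stand-in, not the Qdrant-backed implementation: the bag-of-words `embed` function, the `SimilarityCache` class and the 0.9 threshold are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real deployment would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SimilarityCache:
    def __init__(self, threshold: float = 0.9):  # assumed threshold, tune per workload
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        """Return the stored answer for the closest prompt if it clears the threshold."""
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SimilarityCache()
cache.put("What is the capital of France?", "Paris")
hit = cache.get("what is the capital of france?")   # near-identical prompt
miss = cache.get("How do I reset my password?")     # unrelated prompt
```

A vector database replaces the linear scan with an indexed nearest-neighbour search, but the hit-or-miss decision is the same threshold comparison.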

Cost ledger

Every request records real USD cost, tokens saved and dollars saved, using current per-model pricing across providers.
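
As a rough sketch of what one ledger row might hold: the `LedgerEntry` class, the `example-model` name and its price are hypothetical, and the calculation covers input tokens only.

```python
from dataclasses import dataclass

# Hypothetical USD price per 1M input tokens; substitute your provider's current rates.
PRICE_PER_MILLION = {"example-model": 10.00}

@dataclass
class LedgerEntry:
    model: str
    tokens_before: int  # prompt tokens before compression
    tokens_after: int   # prompt tokens actually sent to the provider

    def cost(self, tokens: int) -> float:
        """USD cost of a given number of input tokens for this entry's model."""
        return tokens * PRICE_PER_MILLION[self.model] / 1_000_000

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def usd_cost(self) -> float:
        return self.cost(self.tokens_after)

    @property
    def usd_saved(self) -> float:
        return self.cost(self.tokens_saved)

entry = LedgerEntry("example-model", tokens_before=1840, tokens_after=540)
print(entry.tokens_saved, round(entry.usd_cost, 4), round(entry.usd_saved, 4))
# 1300 0.0054 0.013
```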

Payload inspector

See the before-and-after of each request, token counts included, so the compression is auditable rather than a black box.
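
A before-and-after view of this kind can be approximated with the standard library. In this sketch the `inspect_payload` helper is hypothetical, and token counts come from a whitespace split rather than a real tokenizer, so treat them as approximate.

```python
import difflib

def inspect_payload(before: str, after: str) -> str:
    """Word-level unified diff with a rough token count header."""
    b, a = before.split(), after.split()
    diff = difflib.unified_diff(b, a, fromfile="before", tofile="after", lineterm="")
    header = f"tokens (approx.): {len(b)} -> {len(a)}"
    return "\n".join([header, *diff])

report = inspect_payload("Please kindly summarize the report", "summarize the report")
print(report)
```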

Getting started

Three steps, then you watch the numbers

1. Create a workspace
   Sign up and generate a TokenOptimizer key for your team. Nothing to install on your side.
2. Repoint the base URL
   Send requests through /v1/proxy/<key> instead of the provider directly. Keep using your real provider key.
3. Read the ledger
   Each call is compressed, forwarded and logged. Tokens and dollars saved show up per request in the dashboard.
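
The base-URL step can be kept out of code by reading the proxy host and key from the environment. A minimal sketch: the variable names `TOKENOPTIMIZER_HOST` and `TOKENOPTIMIZER_KEY` are assumptions rather than documented settings, and the defaults reuse the example values from client.py.

```python
import os

def proxy_base_url(host: str, workspace_key: str) -> str:
    """Build the base URL following the /v1/proxy/<key> pattern."""
    return f"{host.rstrip('/')}/v1/proxy/{workspace_key}"

# Assumed environment variable names; defaults mirror the client.py example.
host = os.environ.get("TOKENOPTIMIZER_HOST", "https://proxy.internal")
key = os.environ.get("TOKENOPTIMIZER_KEY", "to-prod-9f2c")

base_url = proxy_base_url(host, key)
print(base_url)
# Pass it to the client as usual, e.g. OpenAI(base_url=base_url, api_key="sk-...")
```

Removing the proxy later is then a matter of unsetting the variables and falling back to the provider's own URL.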

Why bother

The bill goes down, the product doesn't change

Less waste per call

Compression and caching cut tokens on every request automatically, with no prompt rewriting on your end.

Nothing to migrate

It's a URL change, not a refactor. No new SDK, no client library, no lock-in if you decide to remove it.

Spend you can see

Per-request cost, an audit trail and team key management mean you can actually attribute LLM spend.

Stays in your perimeter

Self-host it and your prompts never leave your network. Useful when the data is the sensitive part.

From teams running it

What people noticed first

“The base-URL swap really was the whole integration. We had it in staging before lunch and saw the bill drop the same week.”

Priya Krishnan, Staff Engineer, Acme

“The inspector is the part that stuck. Once everyone could see token cost per request, people started writing tighter prompts on their own.”

Marcus Reed, CTO, Loop Labs

“We needed it self-hosted for compliance reasons. Getting gateway-style savings without data leaving our VPC was the deciding factor.”

Sara Lindqvist, Platform Lead, Northwind

Pricing

Start free, pay when it's earning its keep

Flat tiers, no per-seat counting. Self-host the whole thing if you'd rather not pay at all.

Starter

Side projects and trying it out.

$0 /mo

Up to 1M tokens optimized / mo
Rule-based compression
One workspace and key
Basic cost ledger

Get Started

Team

Most picked

Production workloads with more than one engineer.

$49 /mo

Up to 50M tokens optimized / mo
Media compression + similarity cache
Team keys and audit log
Full analytics and CSV export

Start free trial

Enterprise

Scale, SSO and a contract.

Let's talk

Unlimited tokens
Air-gapped self-hosting
SSO, roles and SLAs
A human to call

Questions first?

Tell us about your workload

Volume pricing, self-hosting, a tricky provider setup: send a note and we'll get back to you, usually within a day.

hello@tokenoptimizer.io: Sales and general questions

github.com/tokenoptimizer: Docs, issues and the self-host guide

Community chat: Talk to the team and other users