Why We're Building Bandito
If you're building AI-powered features, you've probably been here:
- You pick a model. It works great.
- A new model drops. Benchmarks look amazing.
- You switch. Users complain.
- You switch back. Your bill doubled somehow.
This is the loop. And it's exhausting.
The spreadsheet era
Most teams "optimize" their LLM stack with a spreadsheet. They run a batch of test prompts, eyeball the results, pick a winner, and ship it. Maybe they revisit in a month. Maybe they don't.
This worked when there were three models. It doesn't work when there are thirty, with new ones every week, each with different cost/latency/quality tradeoffs across different types of requests.
What if your app could figure it out?
That's what Bandito does. Instead of you deciding which model or prompt is best, Bandito continuously evaluates every variant against real production traffic and routes each request to whichever one is currently delivering the best outcomes.
- Your chatbot handles customer support and code generation? Different variants can win for different task types.
- A model gets quietly worse after a provider update? Bandito notices and shifts traffic away — before your users do.
- A cheaper model handles 80% of requests just as well? Bandito finds that and saves you money automatically.
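The name hints at the mechanism: routing like this is naturally framed as a multi-armed bandit problem, where each model-plus-prompt variant is an arm, each request is a pull, and outcome signals (user feedback, evals, cost) are the reward. Here's a minimal sketch of that idea using a simple epsilon-greedy strategy. Everything in it is illustrative; the names and API are not Bandito's actual SDK.

```typescript
// Illustrative sketch only: a tiny epsilon-greedy bandit over model variants.
// None of these names come from Bandito's SDK.

interface Arm {
  name: string;
  pulls: number;
  totalReward: number;
}

class VariantRouter {
  private arms: Arm[];

  constructor(variantNames: string[], private epsilon = 0.1) {
    this.arms = variantNames.map((name) => ({ name, pulls: 0, totalReward: 0 }));
  }

  // Mean observed reward; untried arms rank first so every variant gets sampled.
  private mean(arm: Arm): number {
    return arm.pulls === 0 ? Number.POSITIVE_INFINITY : arm.totalReward / arm.pulls;
  }

  // Explore a random variant with probability epsilon; otherwise exploit the best one.
  choose(): Arm {
    if (Math.random() < this.epsilon) {
      return this.arms[Math.floor(Math.random() * this.arms.length)];
    }
    return this.arms.reduce((best, arm) => (this.mean(arm) > this.mean(best) ? arm : best));
  }

  // Feed an outcome back in: 1 for a good response, 0 for a bad one
  // (in practice this could blend quality, latency, and cost).
  record(arm: Arm, reward: number): void {
    arm.pulls += 1;
    arm.totalReward += reward;
  }
}

// Usage: route a request, observe the outcome, report it.
const router = new VariantRouter(["big-model", "small-model", "new-model"]);
const pick = router.choose();
// ...call the chosen model, collect a quality signal from the user or an eval...
router.record(pick, 1);
```

A production system has to go further than this sketch: per-task-type context (a contextual bandit, which is how different variants can win for support versus code generation), handling of rewards that drift over time (so a quietly degrading model gets caught), and delayed or noisy feedback. But the explore/exploit loop above is the core of why continuous evaluation beats a one-time spreadsheet comparison.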
What's next
We're in early development. The SDK, the cloud control plane, and the local dashboard are all being built in the open. If this resonates, join the waitlist — we'd love to build this with you.