The Secret Sauce for Good AI Software

If you've built AI software, you know the start is magical. It sends an email for you, organizes the pages of a document, and so on. Then... the cracks start to show: hallucinations, poor quality, runaway costs.

How do some products keep "the magic" going? Their builders look at their data.

Human Review

You are the magician, not the AI. Your power is judgement: "Was the response good or bad?"
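To make that judgement concrete, here is a minimal Python sketch of a review log. Each entry records the question, the AI's response, a human's good/bad verdict, and a short note on why. The `Review` and `ReviewLog` names and their fields are hypothetical and not taken from any particular tool. The point is only that every verdict gets written down so you can read the failures later.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical review record: one human judgement on one AI response.
Verdict = Literal["good", "bad"]


@dataclass
class Review:
    question: str      # what the user asked
    response: str      # what the AI produced (here, generated SQL)
    verdict: Verdict   # the human's call: "good" or "bad"
    note: str = ""     # why, in the reviewer's own words


@dataclass
class ReviewLog:
    reviews: list[Review] = field(default_factory=list)

    def add(self, question: str, response: str, verdict: Verdict, note: str = "") -> None:
        self.reviews.append(Review(question, response, verdict, note))

    def pass_rate(self) -> float:
        """Fraction of reviewed responses judged "good" (0.0 if nothing is reviewed yet)."""
        if not self.reviews:
            return 0.0
        good = sum(1 for r in self.reviews if r.verdict == "good")
        return good / len(self.reviews)

    def failures(self) -> list[Review]:
        """The "bad" responses: the data worth reading closely."""
        return [r for r in self.reviews if r.verdict == "bad"]


log = ReviewLog()
log.add(
    "How many survey responses did we get?",
    "SELECT COUNT(*) FROM survey_responses",
    "good",
)
log.add(
    "What is our NPS?",
    "SELECT AVG(score) FROM survey_responses",
    "bad",
    "NPS is % promoters (9-10) minus % detractors (0-6), not an average score",
)
print(log.pass_rate())  # 0.5
for r in log.failures():
    print(r.question, "->", r.note)
```

The pass rate is a headline number. The notes on the failures are where you learn what to fix.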

Let's compare two implementations of the same task: having an AI agent write SQL (text-to-SQL).

The anatomy of a bad process

Pick a model, dump your database metadata into a prompt, and start asking questions. You run the generated SQL, and it returns results. Heck, you even try a few different questions and check the SQL results.

Awesome! Let's give it to our internal teams. A week goes by, and people seem happy.

Then...

The head of marketing cites performance numbers that are completely wrong.
The customer success team mentions their NPS jumped 120 points.

You ask where they got that information... "Your new AI query tool." ...$h!t.

Back at your computer, you start asking the tool questions and checking the SQL... wrong, wrong, wrong. Disaster.
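The bad process hid behind "it returned results." A query that runs is not necessarily a query that is right. One way to catch this is to compare the tool's SQL against a reference query that a human wrote and checked. Below is a minimal Python sketch using the standard-library `sqlite3` module. The `survey_responses` table, its data, and the helper names `run` and `matches_reference` are hypothetical, chosen to echo the NPS mix-up.

```python
import sqlite3
from collections import Counter


def run(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query and return every row."""
    return conn.execute(sql).fetchall()


def matches_reference(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True only if the generated query returns the same rows as a human-written
    reference query. Rows are compared as a multiset (order-insensitive);
    floats are compared exactly, so round in SQL if that matters."""
    return Counter(run(conn, generated_sql)) == Counter(run(conn, reference_sql))


# Toy data (hypothetical): five survey scores on the usual 0-10 NPS scale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_responses (customer_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO survey_responses VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 9), (4, 7), (5, 3)],
)

# A plausible-looking model answer to "What is our NPS?": it runs and returns a row...
generated = "SELECT AVG(score) FROM survey_responses"

# ...but a human who knows the metric writes: % promoters (9-10) minus % detractors (0-6).
reference = """
SELECT 100.0 * (SUM(CASE WHEN score >= 9 THEN 1 ELSE 0 END)
              - SUM(CASE WHEN score <= 6 THEN 1 ELSE 0 END)) / COUNT(*)
FROM survey_responses
"""

print(run(conn, generated))  # rows came back, so it "works"
print(run(conn, reference))
print(matches_reference(conn, generated, reference))  # False
```

Exact row matching is a blunt instrument. A human still has to write the reference queries and judge cases where two different answers are both acceptable, but it turns "it returned results" into "it returned the right results."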

The anatomy of a good process

A sales pitch for human evaluation via tweets (X posts?)