Agentic Engineering

Build apps that hit aha.

Loss functions exist for models. I built them for products. Autonomous agents iterate against three metrics. The app improves the way a model trains.

Platform access when it ships. No spam.

DocBench (live, 4.80): Delegate your document work
Annotate (live, 4.70): AI annotation platform
Launch Autopilot (soon): AI launch operator for founders
ClawTrade (soon, 4.65): Trading automation with receipts
VidCraft (soon, 4.50): AI video editor

Loss Functions for Products

Model training has a loss function that drives improvement automatically. Product development had no equivalent. Until now. The three metrics below are the product's loss function. Autonomous agents iterate against them, round after round.

Each round, a swarm of synthetic users tests the app. Three scores come out. Agents iterate until they converge on the target, or the app gets killed.
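The stopping rule can be sketched in a few lines. This is a minimal illustration, not the platform's actual logic: the 4.5 target matches the figure shown on this page, but the `patience` and `max_rounds` thresholds, the function name, and the idea of killing an app when its best score stops improving are all assumptions made for the example.

```python
from typing import Callable, Tuple


def iterate_until_done(score_round: Callable[[int], float],
                       target: float = 4.5,
                       patience: int = 5,
                       max_rounds: int = 100) -> Tuple[str, int, float]:
    """Run rounds until the score reaches the target or progress stalls.

    score_round is a hypothetical callback that builds, tests, and scores
    one round, returning that round's score.
    """
    best = float("-inf")
    stalled = 0  # consecutive rounds without a new best score
    for round_no in range(1, max_rounds + 1):
        score = score_round(round_no)
        if score >= target:
            return ("converged", round_no, score)
        if score > best:
            best, stalled = score, 0
        else:
            stalled += 1
        if stalled >= patience:  # assumed kill rule: no improvement for a while
            return ("killed", round_no, best)
    return ("killed", max_rounds, best)
```

Passing a callback that returns a rising score ends with "converged"; one that returns a flat score ends with "killed" once `patience` rounds pass without improvement.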

Activation ROI: 4.80 (DocBench, round 54)

First value, how fast?

Seconds to first useful result. Every dead end chips away at the score. DocBench started at 2.1, and agents fixed it over 10 rounds.

f = activated · e^{-time} · e^{-friction}
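One way to read the activation formula is as a product of two decaying exponentials, gated by whether the user activated at all. The sketch below assumes that reading. The units for `time_cost` and `friction_cost`, and how the result maps onto the score scale shown above, are not stated on this page, so both inputs are treated as hypothetical, already-normalized, non-negative costs.

```python
import math


def activation(activated: bool, time_cost: float, friction_cost: float) -> float:
    """Activation term: zero if the user never activated, otherwise
    decays exponentially with time and friction costs.

    time_cost and friction_cost are assumed to be normalized,
    non-negative numbers; their real units are not given in the source.
    """
    if time_cost < 0 or friction_cost < 0:
        raise ValueError("costs must be non-negative")
    return float(activated) * math.exp(-time_cost) * math.exp(-friction_cost)
```

Under this reading, a user who never activates scores 0 regardless of speed, and every extra unit of delay or friction multiplies the score down rather than subtracting from it.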
Payoff: 0.93 (completion × quality)

Did they finish the job?

The aha is not enough. The persona keeps going: real documents, harder questions, exports. Payoff scores the whole session.

f = completion · quality
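A short sketch of the payoff product, assuming both factors are fractions between 0 and 1 (the page does not say how either is measured, so the range check is an assumption):

```python
def payoff(completion: float, quality: float) -> float:
    """Payoff as completion times quality, both assumed to lie in [0, 1]."""
    for name, value in (("completion", completion), ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return completion * quality
```

Because the factors multiply, a session that finishes the job with poor output and a session that produces good output but never finishes both score low.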
Retention Signal: 0.67 (return rate)

Would they come back?

Same persona, next day, with memory. Did the app earn a second visit? Most apps fail here silently.

f = Σ returns / revisits
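The return rate can be sketched as returns divided by revisit opportunities. This assumes one opportunity per persona per next-day check, which the page does not spell out; the function name and input shape are hypothetical.

```python
from typing import Iterable


def retention_signal(returned: Iterable[bool]) -> float:
    """Fraction of revisit opportunities where the persona came back.

    Each item is one next-day opportunity: True if the persona returned.
    """
    outcomes = list(returned)
    if not outcomes:
        raise ValueError("need at least one revisit opportunity")
    return sum(outcomes) / len(outcomes)
```

For illustration only, two returns out of three opportunities gives `round(retention_signal([True, True, False]), 2) == 0.67`.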
[Loop diagram: Build (forward pass) → QA (validate) → Personas (compute loss) → Score (gradient) → Ship (done), with an "update weights" arrow back to Build. Chart labels: 4.5, rounds, score convergence.]

The loop

Build. Score. Trace friction to features. Fix. Repeat.

Same principle as gradient descent, applied to the product itself.
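The "trace friction to features" step can be sketched as a simple count. This assumes each friction event is logged as a (persona, feature) pair, which is a hypothetical log shape, not the platform's real one. The feature names in the usage note are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_friction(events: Iterable[Tuple[str, str]],
                  top_n: int = 3) -> List[Tuple[str, int]]:
    """Count friction events per feature and return the worst offenders.

    events holds (persona, feature) pairs, one per friction event.
    """
    counts = Counter(feature for _persona, feature in events)
    return counts.most_common(top_n)
```

With events `[("diana", "upload"), ("rachel", "upload"), ("diana", "export")]`, the top entry is `("upload", 2)`, so the next round's fix would target upload first.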

View the full conveyor diagram →
Sample conveyor: DocBench
Open status page →
Round: 54 · Score: 4.80 / 4.50 · Deploy: docbench.roibench.com

Discovery (Complete): Value contract, personas, and fixture-backed user jobs are defined.
Build (Round 54): Latest round fixed question-before-upload race and SSE display fallback for long agent runs.
Evaluate (Above Target): Current scored state is 4.80, with Diana/Rachel validated in the latest scored round.
Ship (Live): The app is deployed and reachable on its production URL.
Promote (Started): Homepage shows the next-channel plan and links to the full status detail.

Autonomous agents.
Measured outcomes.

"When a simulated user says they got value, does a real person agree?"

"Can you tell early that an app has hit its ceiling?"

"What drives retention? How does session one shape whether someone returns?"

05 Apps through the pipeline
52 Autonomous rounds on a single app
01 Killed early (low ROI)

Want to apply this to your product or idea?

Your idea, defined as a value contract, built by agents, scored by synthetic users.