If you don’t have a queryable history of your conversations with agents, you lack a critical tool for improving their effectiveness.
I used to think that I wanted my agent to learn and remember facts about me. It felt cute and personalized. It learned (some of) my preferences over time. But it was a coat of paint over the same problem: it never got better at working the way I wanted it to work.
So, I had the agent build a scraper that streamed every session, prompt, invocation, etc. into Loki and Prometheus. I made scheduled and on-demand query scripts that emit dated markdown audit reports.
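For a sense of what the streaming side can look like, here is a minimal sketch rather than the actual scraper. It shapes session events into the JSON body that Loki's push endpoint (`/loki/api/v1/push`) accepts. The event fields (`ts_ns`, `model`, `role`, `session`, `text`), the `agent-sessions` job label, and the `localhost:3100` address are all assumptions for illustration.

```python
import json
import urllib.request


def build_loki_payload(events, job="agent-sessions"):
    """Group events into Loki streams keyed by their label set.

    Each event is a dict with hypothetical keys:
    ts_ns (int, nanoseconds since epoch), model, role, session, text.
    """
    streams = {}
    for event in events:
        labels = {"job": job, "model": event["model"], "role": event["role"]}
        key = tuple(sorted(labels.items()))
        line = json.dumps({"session": event["session"], "text": event["text"]})
        stream = streams.setdefault(key, {"stream": labels, "values": []})
        # Loki expects [timestamp-in-nanoseconds-as-string, log line] pairs.
        stream["values"].append([str(event["ts_ns"]), line])
    for stream in streams.values():
        # Keep entries in timestamp order within each stream.
        stream["values"].sort(key=lambda pair: int(pair[0]))
    return {"streams": list(streams.values())}


def push_to_loki(payload, url="http://localhost:3100/loki/api/v1/push"):
    """POST the payload to a Loki instance (the URL is an assumed local default)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Putting `model` and `role` in the labels keeps later queries cheap to slice by model, while the session id and message text stay in the log line to avoid high-cardinality labels.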
I now chat with those reports and continually improve my configuration.
Case in point: Last week I refactored my AGENT.md based on such a report; the audit told me which AGENT.md rules were dead weight, so I deleted them.
This week I noticed the agent was punting work to follow-ups instead of finishing tasks. Worse: features could never be “done,” and it kept pushing scope downstream, including bugs in its own unmerged code.
Instead of guessing, I went back to the data:
- Assistant “follow-up” mentions: 2.37× higher post-refactor (per 1k messages)
- “Leave for later” / “defer”: 2.52×
- User frustration words: 3.18× (If you don’t curse at your agent, I question how much you’re really trying to use it)
All three increases were concentrated in a single model; this was not a global shift in behavior.
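The ratios above boil down to a rate per 1,000 messages, compared before and after the refactor and split by model. A rough sketch of that arithmetic, assuming messages are dicts with hypothetical `model` and `text` keys and that a simple regex is a good enough detector for deferral language:

```python
import re
from collections import defaultdict

# Hypothetical pattern; a real audit would likely use a longer phrase list.
DEFER_PATTERN = re.compile(
    r"\b(?:follow-ups?|leave for later|defer(?:red|ring|s)?)\b", re.IGNORECASE
)


def rate_per_1k(messages, pattern=DEFER_PATTERN):
    """Pattern mentions per 1,000 messages, keyed by model."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for message in messages:
        totals[message["model"]] += 1
        hits[message["model"]] += len(pattern.findall(message["text"]))
    return {model: 1000 * hits[model] / totals[model] for model in totals}


def ratio_by_model(before, after, pattern=DEFER_PATTERN):
    """Post/pre rate ratio per model; models with a zero or missing pre rate are skipped."""
    pre = rate_per_1k(before, pattern)
    post = rate_per_1k(after, pattern)
    return {model: post[model] / pre[model] for model in post if pre.get(model)}
```

Splitting by model is what separates "one model got worse" from "everything got worse": a ratio near 1.0 for most models and well above 1.0 for one points at how that model reads the rules.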
Root cause: the trimmed rules read as “minimum code, nothing speculative,” and the model took that as license to defer … everything.
Fix: a surgical 3-line edit clarifying that “no new features” doesn’t mean “stop finishing the task.” Then I wrote a script to re-measure on a 7-day cadence so I’d actually know whether it worked.
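The re-measurement can be as simple as bucketing messages into consecutive 7-day windows starting at the edit and tracking the rate in each. A sketch under assumptions: the function name, the message keys (`ts` as a `datetime`, `text`), and the plain substring match are illustrative, not the script described above.

```python
from datetime import datetime, timedelta


def weekly_rates(messages, start, keyword, weeks):
    """Keyword mentions per 1,000 messages for consecutive 7-day windows.

    Counts messages containing the keyword (case-insensitive substring match).
    Returns one entry per window; None marks a window with no messages.
    """
    rates = []
    for index in range(weeks):
        window_start = start + timedelta(days=7 * index)
        window_end = window_start + timedelta(days=7)
        window = [m for m in messages if window_start <= m["ts"] < window_end]
        hits = sum(keyword in m["text"].lower() for m in window)
        rates.append(1000 * hits / len(window) if window else None)
    return rates
```

Returning `None` for an empty window, instead of 0, keeps "no data yet" distinct from "the problem went away."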
It did. And I have the data to prove it, including a significant drop in user frustration words.
Treating agent configuration as a measured system, not vibes, turned a frustrating week into a diagnosable, fixable, verifiable loop.
Measure. Manage. Done …
… And I didn’t have to wait for a new model version.
Originally shared on LinkedIn, May 16, 2026.
