This article is written by an engineer, first and foremost.
Many of the APIs or LLM extensions provided by AI companies are written by ML engineers who do not have Phil's decades of experience in distributed systems, databases, and networking. That is evident after reading this article; it is the first time I've seen a coherent discussion of the tools and tradeoffs involved in building agentic systems.
I've struggled to actually build something useful with the "agentic" systems and tools out there (and I've tried a lot). Deep down I've felt intimidated by the dozens of new terms the docs use, and on reflection, those tech marketing pieces give the vibe that they were written primarily by AI that was told to be colorful rather than clear and precise. These solutions from billion-dollar companies must present "brand new" ideas to justify their valuations. We should know better: everything builds on the shoulders of decades of research and discovery. If you see something flying high in the clouds (and not standing on the shoulders of giants), it is sure to fall back to earth soon.
A great read. I'm very excited about Outropy.
Yeah, it's really clear it's an engineer: what the software does is never even mentioned. The purposes and tasks that it performs: what are they? And how is hallucination managed? This reads to me like a complexity soup, where they just started without a clear idea of purpose or goal. Perhaps if the article mentioned what the software does, its purpose, it might be clearer. It sounds like a replacement for the entire management layer of a company...
Agreed. This could be a very intelligent implementation, or it could be an over-engineered mess. It certainly seems like overkill for my experiences with agents, but problem applications can vary wildly. It is impossible to tell how to evaluate these design choices without more concrete details.
Managers need to replace engineers with AI faster than engineers use AI to replace their managers. In the end, nobody wins except OpenAI.
> durable workflows
This is what the long-running transactions of the past became, and they are slowly covering all of that ground (initially Cadence by Uber, then Temporal). Zillions of little flows that can go through their FSMs at any speed: (milli)seconds, days, months, or whenever.
I wonder, though, how much further developments like Cloudflare's Durable Objects, or the similar, recently announced Rivet actions [1], would simplify (or complicate) matters, esp. in this "agentic" case?
> Agents are not Microservices
> Agents naturally align with OOP principles: they maintain encapsulated state (their memory), expose methods (their tools and decision-making capabilities via inference pipelines), and communicate through message passing
It does sound like a service (memory = db, methods + messages = API). The difference is just the level of isolation/deployment you need.
UPD: also, how come your services share a database layer (?), maybe the scaling problems are not due to Agents at all? Do you have scaling issues even without agents? I would not be surprised! Classic rule from Amazon's 2002 API mandate by Bezos: "no shared db between services. all communication happens over exposed interfaces and over network".
What I read is that the cuts for microservices and for agents do not align.
This does not mean that agents cannot run in microservices, just that there is no 1:1 mapping between an agent and a microservice.
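To make the quoted OOP framing concrete, here is a minimal sketch of an agent as an object with encapsulated memory, named tools, and message passing. It is only an illustration: `Message`, `Agent`, `receive`, and the `shout` tool are hypothetical names, not anything from the article or from Outropy, and a real agent would call an inference pipeline where this one calls a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A message passed between agents (hypothetical shape)."""
    sender: str
    tool: str
    payload: str


@dataclass
class Agent:
    name: str
    # Encapsulated state: only the agent itself appends to its memory.
    memory: List[str] = field(default_factory=list)
    # Exposed "methods": named tools the agent is willing to run.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def receive(self, msg: Message) -> Message:
        """Record the incoming message, run the requested tool, and reply."""
        self.memory.append(f"{msg.sender}:{msg.tool}:{msg.payload}")
        handler = self.tools.get(msg.tool)
        result = handler(msg.payload) if handler else f"unknown tool {msg.tool!r}"
        return Message(sender=self.name, tool=msg.tool, payload=result)


# Assumption: a trivial string function stands in for an LLM-backed tool.
summarizer = Agent(name="summarizer", tools={"shout": str.upper})
reply = summarizer.receive(Message(sender="user", tool="shout", payload="standup at 10"))
print(reply.payload)
```

Whether such an object lives in its own process or shares one with other agents is a deployment decision, which is the point both replies above are circling.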
Great breakdown of the "architectural decision log" for the evolution of this system.
> This model broke down when we added backpressure and resilience patterns to our agents. We faced new challenges: what happens when the third of five LLM calls fails during an agent’s decision process? Should we retry everything? Save partial results and retry just the failed call? When do we give up and error out?
> We first looked at ETL tools like Apache Airflow. While great for data engineering, Airflow’s focus on stateless, scheduled tasks wasn’t a good fit for our agents’ stateful, event-driven operations.
> I’d heard great things about Temporal from my previous teams at DigitalOcean. It’s built for long-running, stateful workflows, offering the durability and resilience we needed out of the box.
I would also have reached for workflow engines here. But I wonder if Actor frameworks might actually be the sweet spot; something like Erlang's distributed actor model could be a good fit. I'm not familiar with a good distributed Actor framework for Python but there's of course Elixir, Actix, Akka in other stacks.
Coming from the other direction, I'm not surprised that Airflow isn't fit for this purpose, but I wonder if one of the newer generation of ETL engines like Dagster would work? Maybe the workflow here just involves too many pipelines (one per customer per Agent, I suppose), and too many Sensor events (each Slack message would get materialized, not sure if that's excessive). Could be a fairly substantial overhaul to the architecture vs. Temporal, but I'd be interested to know if anyone has experimented with this option for AI workflows.
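For readers who want the quoted failure question in concrete terms (the third of five LLM calls fails), here is a rough sketch of the "save partial results and retry just the failed call" option. It is plain Python, not Temporal's API or the article's implementation; `run_with_checkpoints`, `StepFailed`, and the in-memory `checkpoint` dict are hypothetical, and a durable workflow engine would persist that state for you.

```python
from typing import Callable, Dict, List


class StepFailed(Exception):
    """Raised when a step exhausts its retry budget."""


def run_with_checkpoints(
    steps: List[Callable[[], str]],
    checkpoint: Dict[int, str],
    max_attempts: int = 3,
) -> List[str]:
    """Run steps in order, skipping any whose result is already saved."""
    for index, step in enumerate(steps):
        if index in checkpoint:
            continue  # partial result survived an earlier run; do not redo it
        for attempt in range(1, max_attempts + 1):
            try:
                checkpoint[index] = step()
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up, but keep the earlier results in the checkpoint.
                    raise StepFailed(
                        f"step {index} failed after {attempt} attempts"
                    ) from exc
    return [checkpoint[i] for i in range(len(steps))]


# Simulated LLM calls: the third one (index 2) fails twice, then succeeds.
calls = [0] * 5


def make_step(i: int) -> Callable[[], str]:
    def step() -> str:
        calls[i] += 1
        if i == 2 and calls[i] <= 2:
            raise RuntimeError("simulated LLM timeout")
        return f"result-{i}"
    return step


saved: Dict[int, str] = {}
results = run_with_checkpoints([make_step(i) for i in range(5)], saved)
print(results, calls)
```

Only the failing call is repeated; the first two calls run once each. If the retry budget runs out, the checkpoint still holds the completed steps, so a later run can resume from the failed step instead of starting over.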
I don't see AI systems as too special in terms of back-end engineering, except maybe that agentic systems are inherently stateful.
But considering how limited RPM/TPM are for mainstream LLMs, saving/loading state is hardly the bottleneck, I feel.
Great article, really enjoyed how they described what they initially tried, where it struggled, and why their current solution works better.
Color me impressed. These guys get it right because they treat LLMs like what they are -- tools with a specific use, not anthropomorphized pets. (although I did groan a bit at the "AI Chief of Staff" moniker).
It's extremely refreshing to hear an actual engineering conversation around LLMs that doesn't sound like it came out of the pages of an undergraduate alchemy notebook.
Well written. It's a rare pleasure to hear a discussion about LLMs grounded in real engineering, free from the fanciful notions often found in all the other spam out there.
I came to the same conclusion about Temporal for these types of things. Interactive stuff that touches 1 DB? Do it in the API. Needs to coordinate >1 thing? Temporal.
Orchestrating a bunch of LLM calls is a perfect fit for Temporal.
Thanks for the excellent article. It's hard to find these step-by-step architecture evolution retrospectives. A great reference for other startups going through a similar journey!
Crazy idea: could quantum-entangled communication help soften CAP (e.g. by allowing limited communication between partitions)?