We focused so much on how that we didn’t stop to ask why.
We already have an interface for agents; we call it an API. Why do we need AI to click buttons that eventually call an API? Just skip the middle layer and go straight to an API that exposes everything the system is capable of doing.
I think everyone agrees with you. The issue is: who is building API-first interfaces? The value an agent provides here is that it moves control from the publisher to the consumer (in the technical sense).
“Twitter doesn’t want to let me access it via an API? That’s fine, my browser/agent will do what I want for me in the UI.” That is a major shift in how we interface with other services.
It comes in especially handy for working around increasingly user-hostile captchas, ironically. I expect this arms race to spiral down quickly into some form of mandatory biometric check, the dream come true of our security-obsessed, insecure rulers.
Altman already spiraled down there: https://en.m.wikipedia.org/wiki/World_(blockchain)
> World Network, originally Worldcoin, is a for-profit cryptocurrency project that uses iris biometrics developed by San Francisco- and Berlin-based Tools for Humanity. Founded in 2019 by OpenAI chief executive Sam Altman, Max Novendstern, and Alex Blania, it has accumulated $250 million in funding from venture capital firms Andreessen Horowitz and Khosla Ventures, as well as from Reid Hoffman.
And of course every user interface designed with accessibility in mind will automatically become an API endpoint, or at the very least an interface that is much easier for machines to use.
This goes for ARIA-labelled forms, but also for button-operated doors, tactile surfaces, and information printed in plain, clear English.
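To make the point concrete, here is a minimal sketch in Python of a machine reading an ARIA-labelled form as a crude interface description, using only the standard library's `html.parser`. The markup, the set of tags treated as controls, and the output shape are assumptions for illustration; real pages also label controls via `<label for="...">`, `aria-labelledby`, and roles, which this sketch ignores.

```python
from html.parser import HTMLParser


class LabelledControlParser(HTMLParser):
    """Collects form controls that carry an aria-label, in document order."""

    # Assumption: these are the only tags we treat as interactive controls.
    CONTROL_TAGS = {"input", "button", "select", "textarea"}

    def __init__(self) -> None:
        super().__init__()
        self.controls: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs is a list of (name, value) tuples
        label = attributes.get("aria-label")
        if tag in self.CONTROL_TAGS and label:
            self.controls.append({
                "tag": tag,
                "label": label,
                # Boolean attributes have value None, so fall back to "".
                "name": attributes.get("name") or "",
            })


def describe_controls(html: str) -> list[dict[str, str]]:
    parser = LabelledControlParser()
    parser.feed(html)
    parser.close()
    return parser.controls


# Hypothetical form markup; the unlabelled "nickname" input is skipped.
FORM = """
<form action="/subscribe">
  <input type="email" name="email" aria-label="Email address">
  <input type="text" name="nickname">
  <button type="submit" aria-label="Subscribe to newsletter">Go</button>
</form>
"""

for control in describe_controls(FORM):
    print(control)
```

The labels that exist for screen readers double as field descriptions an agent can act on, which is exactly why accessible UIs are easier for machines too.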
Robots must love public buildings! Many doors are labelled with a bright green sign using the human word for door (“exit”), most signs and switches are labelled in human binary (braille) and the designers of the building go out of their way to replace lots of pesky steps with nice ramps instead.
Because money.
End-user interfaces can be served ads, are intrinsically rate limited, and are often the main money source.
APIs make no ad revenue, can easily rack up expensive server costs, and are often used by a single third-party service that offers services to the rest of the customer base (bonus points if it makes you look bad by implementing a feature better).
It's trivial to insert ads into an API response. The fact that people don't is simply because we were never really API-first.
But it's also why ad-blockers generally work.
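A toy sketch of both claims in this exchange: an ad is trivial to insert into a structured response, and because it lands in a predictable field, it is just as trivial for a client to strip. The `sponsored` key and the payload shape are hypothetical.

```python
import json

AD_KEY = "sponsored"  # hypothetical field name for the injected ad


def with_ad(payload: dict, ad: dict) -> dict:
    """Server side: attach an ad to an otherwise unchanged JSON payload."""
    return {**payload, AD_KEY: ad}


def strip_ads(payload: dict) -> dict:
    """Client side: drop the ad field, which is as easy as adding it."""
    return {key: value for key, value in payload.items() if key != AD_KEY}


response = with_ad(
    {"user": "alice", "posts": [1, 2, 3]},
    {"text": "Try our premium plan", "url": "https://example.com/ad"},
)
print(json.dumps(response))
print(json.dumps(strip_ads(response)))
```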
People know this. But most actually useful software built for humans doesn't have a good programmatic API. So computer use is the only way to connect AI agents to that software. And that's where a lot of the true (medium-term) financial gain will be - automating tasks on tools that are poorly built or just old but are still valuable.
Unfortunately, LOB (line-of-business) apps often don’t have a useful API, nor a vendor/support team capable of supporting you in a timely manner even if they do.
RPA (robotic process automation) bridged the gap a bit for us.
I would think that a lot of APIs are designed for human users, and we may need new APIs designed for usage patterns more natural to bots.
The Accenture report this is based on looks like science fiction to me.
https://www.accenture.com/us-en/insights/technology/technolo... (links to a PDF which is horrible to read on mobile; I guess Accenture doesn't think any of their target audience spends work time away from a laptop)
It talks about "trust" a LOT, which indicates they expect a robust solution to LLM security and reliability (see prompt injection) soon.
I do not share their optimism.
Remember when blockchain was going to be used in everything?
Now we have AI + the ever favorite avatars.
Except AI is kinda already in everything, from iPhones to Windows to Google? I don't think that point holds up well.
Except it has been pushed into places where it makes no sense, just for the sake of riding the AI race. It’s like Clippy but integrated everywhere, e.g. in the Chrome developer tools; nobody wanted it. On iPhone? I won’t be surprised if most users don’t use any generative AI feature at all. And other useful features that were already in the system, but aren't “generative”, are being marketed as such or wrapped in a useless natural-language prompt.
I agree about the general silliness of most of the "generative" features. I was only contesting the comparison: to me, there's a clear difference in the rate of institutional belief/backing and adoption.
Sure, but are LLMs? This is like saying "hashes are already used in everything from databases to network routers, so surely blockchain will succeed too!"
Not that I disagree that LLMs _will_ be used in a lot of places, many more than blockchain, but I want to point out that there is a qualitative difference between the two.
Don't forget The Metaverse.
They have a lot of faith in billable hours.
If Accenture says it then it is safe to assume you should do the opposite.
The programmable web has been a thing for a long time, and I have spent a large portion of my career building infrastructure in that space. Agents are another iteration of the digital transformation of services, and I am all in for it. That doesn't mean they will replace humans as users. Agents will just operate on another entry point. Maybe compare it to unlit tunnels for autonomous vehicles without passengers.
It feels like that already, most of the accounts that try to interact with me are bots. Dead internet, etc.
Plus a huge chunk of internet traffic is already bots
So far it is confined to the Internet. Just wait until they train robots. It will be gradual and then all at once, especially coordination and swarming behavior between drones etc. We are cooked, as they say.
I’m more optimistic about robots/drones. At least we could actively sabotage them with low-tech tools. The Internet already seems unrecoverable, and so does the way we apply for jobs.
Three fronts are mentioned:
1. Agentic systems
2. Digital core
3. Generative UI
Generative UI seems interesting to note as it's presented here as a way to create personalized UIs. Which means... that Generative AI creates a UI for each user? What are the implications of such a world? How does support work with this? How does a knowledge base get created to educate customers? How do you have a standardized way of discussing the UI?
Yes!
The simplest, most direct, and most useful implementation of generative UI is to have a settings page or form react to the context, e.g. an email form or a bunch of sliders. I have a zod-based form schema and a set of forms and schemas. If there's a new schema (e.g. if I want to add name and comments), I can just have the AI generate and save a new schema, and the UI will know how to handle the changes. This makes updating interfaces really easy (this is personalized to the dev).
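The setup described uses zod in TypeScript; as a rough, language-neutral sketch (in Python, and not zod's actual API), the key idea is that the renderer is generic over a declarative schema, so a newly generated schema changes the form without touching the rendering code. The `Field` type, the field kinds, and the HTML output are assumptions for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Field:
    name: str
    kind: str            # one of the keys in INPUT_TYPES
    label: str
    minimum: int = 0     # only meaningful for sliders
    maximum: int = 100


# Maps schema field kinds to HTML input types. Adding a field to a schema
# needs no renderer change; adding a new *kind* needs one new entry here.
INPUT_TYPES = {"text": "text", "email": "email", "slider": "range"}


def render_field(field: Field) -> str:
    attrs = {
        "type": INPUT_TYPES[field.kind],
        "name": field.name,
        "aria-label": field.label,
    }
    if field.kind == "slider":
        attrs["min"] = str(field.minimum)
        attrs["max"] = str(field.maximum)
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs.items())
    return f"<input {rendered}>"


def render_form(schema: list[Field]) -> str:
    return "\n".join(["<form>", *(render_field(f) for f in schema), "</form>"])


base_schema = [Field("email", "email", "Email address")]
# Imagine this list came back from the model and was saved as a new schema.
extended_schema = base_schema + [
    Field("name", "text", "Your name"),
    Field("comments", "text", "Comments"),
]
print(render_form(extended_schema))
```

In this sketch the model only ever produces data (a list of fields); the code that turns data into UI stays fixed and reviewable.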
For UIs generated per user, ideally you'd have a lot of guardrails around what it's able to generate and what the underlying data and use case are. I haven't built any of this in production yet, but I think accessibility may get a boost. Some video games already have a "verbose mode" for new players, and a "bare bones" mode for players who already get the mechanisms. One might consider building an "explanation" layer where some of the concepts are explained to the user dynamically, and some more complex options are revealed either under an "advanced mode" or when the UI determines the user "understands" the underlying concepts, so as not to overwhelm them. I'm currently experimenting with, and designing/thinking through, some bioinformatics (gene annotation) tools with this approach, but it's early days.
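A minimal sketch of the two ideas in this comment, under stated assumptions: guardrails as an allowlist check on whatever schema the model proposes, and an "advanced mode" that reveals extra fields. The allowed kinds, the `MAX_FIELDS` limit, and the gene-annotation field names are hypothetical, not taken from the commenter's tools.

```python
from dataclasses import dataclass

# Hypothetical guardrail settings: what the model is allowed to generate.
ALLOWED_KINDS = {"text", "email", "slider", "checkbox"}
MAX_FIELDS = 12


@dataclass(frozen=True)
class GeneratedField:
    name: str
    kind: str
    advanced: bool = False  # hidden unless the user is in "advanced mode"


def check_guardrails(fields: list[GeneratedField]) -> list[str]:
    """Return the problems found; an empty list means the schema is accepted."""
    problems = []
    if len(fields) > MAX_FIELDS:
        problems.append(f"too many fields: {len(fields)} > {MAX_FIELDS}")
    seen = set()
    for field in fields:
        if field.kind not in ALLOWED_KINDS:
            problems.append(f"{field.name}: kind {field.kind!r} is not allowed")
        if field.name in seen:
            problems.append(f"{field.name}: duplicate field name")
        seen.add(field.name)
    return problems


def visible_fields(fields: list[GeneratedField], advanced_mode: bool) -> list[GeneratedField]:
    """Progressive disclosure: advanced fields appear only in advanced mode."""
    return [f for f in fields if advanced_mode or not f.advanced]


# A made-up model proposal for a gene-annotation form (field names are hypothetical).
proposal = [
    GeneratedField("gene_id", "text"),
    GeneratedField("e_value_cutoff", "slider", advanced=True),
    GeneratedField("run_script", "code"),  # a kind the guardrails should reject
]
print(check_guardrails(proposal))  # ["run_script: kind 'code' is not allowed"]

accepted = [f for f in proposal if f.kind in ALLOWED_KINDS]
print([f.name for f in visible_fields(accepted, advanced_mode=False)])  # ['gene_id']
print([f.name for f in visible_fields(accepted, advanced_mode=True)])   # ['gene_id', 'e_value_cutoff']
```

Whether to reject a whole proposal or just drop the offending fields is a design choice; the sketch reports problems and leaves that decision to the caller.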
This sounds exceedingly unintuitive. Users hate having settings change / appear and disappear. It sounds like it would make interfaces broadly unteachable and difficult to document.
sounds like a bike shedder's dream.
While I think it's possible that contextualized UX has value, it seems way too easy for builders to go ham doing what they love: building more things.
I'm a Cursor LLM editor convert. Pretty amazing. Thing is, being capable of building N versions of a thing doesn't mean it's a good idea. Technical debt has gone up dramatically on a team of 2.
And it's not realistic that higher iteration velocity will yield the greatest version. Easy to say, but you need data and users to test all of them. That's not happening in real life.
> "bare bones" mode for players who already get the mechanisms.
I've wanted this in Monster Hunter for years. Spare me the cutscenes and dialog and just give me a menu full of quests so I can decide which monster I want to fight without having to listen to a damn audiobook.
you are not taking the concept far enough
Gen UI could look like a completely custom interface integrating hundreds of apps into one. That is, if an agentic system is fully multi-modal, it can collect all the digital information the user owns (in a safe way, of course).
But that means when an agent generates a UI, it is JUST to make decisions on the data it has access to. E.g. a new email arrives asking for a file and a small change; the agent finds the file and makes the change, then a UI is generated confirming the correct file and changes.
This is what generative UI looks like to me: far off, but definitely in the future.
that's not surprising, my server is already getting slammed by AI agent traffic
(and to be fair, I'm building intelligent automation tools myself, so I'm part of the problem)
At what point do we not just write apis then?
The Accenture article reads more like PR and doesn't sound very practical.
It claims the current best agent is Anthropic's Sonnet.
"Company that hopes to make a lot of consulting revenue from selling AI agents says AI agents are going to be the next big thing."
In my experience, users look for a familiar UI they know how to navigate. Even if it is boring or similar to other websites, a familiar, standardized UI can only be a benefit. That’s why projects such as Bootstrap had huge success, imo, and still do despite the growing use of utility-first frameworks like Tailwind CSS. A custom UI for each user seems preposterous.
I'm surprised I'm not seeing more 'Are you human?' captchas to view content.
Cyberpunk 2077 becomes reality in 2024
It wasn't until 2044 that the Black Wall was implemented to separate the rogue AIs from the corporate nets. As is often postulated though, how would a network of rogue AIs power themselves? LLMs are quite energy-intensive, and it's impossible to know what technology would actually create AGI or whether that would be energy-intensive as well, but basically all intelligence to date, including humans, has proven quite demanding in resources.
> As is often postulated though, how would a network of rogue AIs power themselves?
Smaller and smaller models are reaching higher and higher benchmarks; the 35B I can run on consumer hardware today is better than a 605B from years ago. I bet there are also advancements reducing the power usage at each model size.
I think the answer boils down to the same answer as any other rogue software "roaming" the internet: hacked consumer devices.
My guess is 2026 to 2030...
- ChatGPT releases a phone where the lock screen UI is like a FaceTime call with your AI agent (you can skin your agent/assistant however you want). The movie Her in your pocket: an assistant that does everything for you.
- All businesses will have their own agents, and your personal assistant/agent will interface with them to get things done for you. I guess your agent could also interface with your friend's agent when they're not available.
The thing I don’t really get is: who the heck is going to create services for all these agents? Other agents? That won’t work. And what should they get done for you? The boring work of what? So you can concentrate on your hobbies? AIs already play guitar better than a hobbyist. Agents or not, widespread AI or not, if you want to make something significant and deep in your life, you still need to go deeper yourself. Agents won’t do that for you; you’d just get lazier. And in your example, what’s the purpose of agents interacting with your friends’ agents? It makes no sense to me; it sounds like totally wasted compute.
If your friends are away or busy, they might want their agent to schedule meetup times or provide info like your birthday... but that was a tail-end thought. What I see as valuable is each business, doctor's office, or other organization buying its own AI agent from GPT or whoever jumps on this market.
This thought popped into my head recently. I use ChatGPT while driving: I talk to it (have full conversations with it) to get things done and as a knowledge base. I recently asked it for steps to junk my old car in Pennsylvania, which it gave me along with local businesses. I wish it had just interfaced with the businesses and set up the day/time without me having to call them.
I think we'll have HUDs like Meta's Orion, with always monitoring AI agents, before long.
HUDs?
Love my Meta Ray-Bans, but sorry Zuckerberg, they won't replace phones, only complement them. You cannot take selfies with glasses, whereas your GPT phone/AI agent can take the right selfies for you (guiding you to the best lighting).
AI agents will be the death of the AI hype. Nobody knows what an agent is. Is running a scraping script an agent? Is something that writes code by itself based on plain-English requirements an agent?
What comes after agents? Nobody knows. It's like the hype around chatbot LLMs circa '21-'23.
An AI agent is an LLM attached to some external tool.
That's it.
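Taking that definition literally, a minimal sketch in Python: the model's output is parsed as a structured tool call and dispatched to real code. `fake_llm` is a hard-coded stub standing in for a real model API, and `get_weather` is a hypothetical tool.

```python
import json
from typing import Callable


def get_weather(city: str) -> str:
    """Hypothetical external tool."""
    return f"Sunny in {city}"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model that replies with a JSON tool call."""
    return json.dumps({"tool": "get_weather", "args": {"city": "Berlin"}})


def call_tool_from_model(prompt: str) -> str:
    call = json.loads(fake_llm(prompt))
    # Only dispatch to tools we explicitly registered.
    if call["tool"] not in TOOLS:
        raise ValueError(f"model requested unknown tool {call['tool']!r}")
    return TOOLS[call["tool"]](**call["args"])


print(call_tool_from_model("What's the weather in Berlin?"))  # Sunny in Berlin
```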
What people are confused about is what groups of agents are, how they are managed and how they interact with each other.
They are confused because no one is building real multi-agent systems in the open which work.
Trying to build multi-step logical LLMs with the paltry memory we have today is about as effective as trying to play 8K video on an Intel 8008.
All the attention is on the LLMs themselves, which is the same thing as being excited about a CPU in 1970. Sure they are amazing engineering, but you will need another layer on top to make them actually useful.
Historically, it was Apple that first got the software right. We are now waiting for the first company to get a multi-agent system right, but again, thanks to the tiny memories we have because of Nvidia, it will have to run in the data center, or it will be a decade away.
A few weeks ago, Anthropic published what seemed to me to be a level-headed discussion of agents:
https://www.anthropic.com/research/building-effective-agents
Simon Willison had some valuable comments about that article:
https://simonwillison.net/2024/Dec/20/building-effective-age...
What is the difference between an Agent and Function Calls?
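One common way to draw the distinction (a framing, not a settled definition): a function call is a single round trip where the model emits one structured call and your code runs it; an agent wraps that in a loop where the model sees each result and picks the next step until it decides it is done. A sketch with a stub model and two hypothetical tools, echoing the find-the-file-then-edit-it example from earlier in the thread:

```python
from typing import Callable


def search_files(query: str) -> str:
    """Hypothetical tool: pretend we found the requested file."""
    return "report.docx"


def edit_file(path: str) -> str:
    """Hypothetical tool: pretend we applied the requested change."""
    return f"edited {path}"


TOOLS: dict[str, Callable[[str], str]] = {"search_files": search_files, "edit_file": edit_file}


def fake_llm(history: list[str]) -> dict:
    """Stub model: chooses the next step based on the tool results so far."""
    if not history:
        return {"tool": "search_files", "arg": "quarterly report"}
    if len(history) == 1:
        return {"tool": "edit_file", "arg": history[-1]}
    return {"done": f"Done: {history[-1]}"}


def run_agent(max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        step = fake_llm(history)
        if "done" in step:  # the model, not our code, decides when to stop
            return step["done"]
        history.append(TOOLS[step["tool"]](step["arg"]))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent())  # Done: edited report.docx
```

A single function call corresponds to one iteration of that loop; the `max_steps` cap is there because a real model might never say it is done.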