Insider’s View
TCC Group's Chief Technology Officer, Kit Ruparel, has been a long-standing advocate of selecting the appropriate AI for the job in hand: Predictive AI for uncovering and interpreting content, and Generative AI for creating new content.
As AI Agent technologies ride the crest of the hype wave, Kit fears that the ease of entry into Agentic AI is leading companies to look less closely than they should at the AI and other methods being used underneath those agentic covers. This article seeks to demystify this 'new' technology and to help clarify how it is most appropriately developed and used.
The Big Secret
Shhh… AI Agents are not new.
Software Agents, being 'autonomous' units of software dedicated to a purpose, but that react and respond to the requests of users or other programs, were first conceptualised in the 1950s, and became mainstream in the 1980s with distributed-systems patterns, and later with microservice architectures. Whilst such program modules were designed to perform tasks autonomously, they were typically based on predefined rules ("code") – so they were Software Agents rather than AI Agents… unless engineers put AI inside them.
AI, the fashionable moniker that mostly covers machine learning technologies, which can also trace their roots back to the 1950s, has increasingly been used inside those agentic software modules. That means there is a very good chance you've been using applications that make use of AI Agents for many years.
Way back in 1996, when you sat down in front of Microsoft Word to type up your grandmother's spaghetti carbonara recipe and Clippy appeared, offering to help you prepare your 'formal business letter', you began your love-or-loathe relationship with an AI Agent.
Then, in 2015, when Amazon launched the Alexa Skills Kit, so you could ask the Echo in your kitchen how to make Boeuf Bourguignon or to play Capital FM, they had effectively released an Agentic AI framework. It allowed third-party developers to build Skills (Agents) that would register their capabilities with your Echo, which would parse your request and route it to the Recipe or Radio skill that seemed most appropriate.
AI has obviously evolved over the years, as has the implied meaning of the term. Today's hype stems from the Generative AI (or GenAI) community co-opting the Agentic AI terminology to mean 'an end-to-end application ecosystem comprising Generative LLMs (Large Language Models) everywhere' – and the confusion this has caused is not necessarily a good thing.
That’s not to say that GenAI hasn’t brought a lot of new toys to the Agentic playground, but let’s come back to that later.
The Secret Agent Quiz Night
The induction event at your new secret agent job is a quiz night. You’ve been selected as a team captain, whilst Mo, Sally and Tabby have joined your team.
Of course, the recruitment process was all very secretive, and so you don’t know any of your team from Adam.
There’s not much time for introductions, but as the quizmaster gives his microphone the one-two one-twos, Mo quickly tells you his specialist subject is music, Sally that she’s a real soccer fan, and Tabby that she’s a tech freak.
The first question is about the pre-game eating habits of some Arsenal player, and whilst you’re not sporty enough to have a clue what the answer might be, you at least know that Arsenal is a UK football team, defer to Soccer Sally, and she nails it. First point in the bag, and everyone happily takes a deep chug of their beer.
In the agentic technology world, you've just played the roles of a 'Parser', interpreting the question; a 'Reasoner', deciding that you couldn't answer the question directly based on your own limited knowledge but that another might; and an 'Orchestrator', deferring the question to one of your agents – Sally, because she 'registered her skill' with you concerning UK football knowledge. You asked Sally the question using a common 'communications contract' between you (the English language), taking her response via the same contracted language. Then, finally, your 'Reasoning' decided to trust her response based on your 'confidence' in it, rather than simply answering "we don't know" or making up a different answer yourself.
Question two arrives and asks about the naming origins of the UK electronic band ‘808 State’. You’re far too young to have a scooby, but hearing the word ‘band’, you figure this must be one for Music Mo.
Mo thinks about it, then comes back with: “808 is the US police penal code for ‘disturbing the peace’, and social disruption was all the music rage back in the 80s. It must be the answer.”
This makes sense to you, but you also heard the word ‘electronic’ in the question, so you also decide to give Tech Tabitha a shot. She comes back with “I think there was a popular electronic drum machine in the 80s called the Roland TR-808 – I think maybe that was it?”
Now you’ve got two options on the table, but given the question was about a band called “808 State”, not “TR-808 State”, and given Mohammed registered himself as the music expert and this question seems to be about music, and he seemed more confident in his answer than Timid Tabby, you again use your Reasoning capabilities and unfortunately elect to go with “disturbing the peace”. The next sip of beer is much more reserved.
Agentic Framework Roles
There are various technological elements to an Agentic framework and, depending on what you read, these can vary and are often described in quite complex ways.
To keep things simple, the key components that matter most are described here in terms that, hopefully, the non-technorati can understand.
The Agentic Client, Parser, Reasoner and Orchestrator
Some form of request or requirement needs to be fulfilled, so as with most software applications, there is a 'Client' – something that makes a request, usually on behalf of a human user. Today's most well-known Agentic Clients are ChatGPT for consumers, and Microsoft Copilot for enterprise users. However, as you will have experienced, every CRM, communications tool, website and other piece of software out there is getting into the Agentic Client game – something about as welcome as Microsoft Clippy for many users.
Then something needs to interpret that request (the ‘Parser’) and once the request is ‘understood’, it needs to be responded to directly or broken down into sub-tasks (by a ‘Reasoner’), before each sub-task is routed (by an ‘Orchestrator’) to one or more Agents.
Technically, these can be four separate technological components. In reality, however, today the Parser, Reasoner and Orchestrator are an integrated part of the Client – the fourth role you were performing as team captain at the quiz night.
The Agentic Server
The Client-Server paradigm is the most prevalent software pattern in the industry, and as such, Agentic AI frameworks refer to the AI Agents as Servers – as they are responding to the requests of the Clients.
As you were the Agentic Client on quiz night, Mo, Sally and Tabby were the Agentic Servers.
Agentic Servers need to tell the Client what they [believe they] are capable of, described in the quiz night example as ‘registering a skill’, whilst the Client and Server need to be able to exchange requests and responses using a ‘communications contract’.
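Pulling those roles together, here is a minimal sketch of a Client playing Parser, Reasoner and Orchestrator, with Servers registering their skills with it. Every class, method and skill name is invented purely for illustration – this is not any real framework's API:

```python
# A minimal sketch of the roles described above, with all names invented
# for illustration - this is not any real framework's API.

class AgentServer:
    """An Agentic Server: registers a skill and answers matching requests."""
    def __init__(self, name, skill, answer_fn):
        self.name = name
        self.skill = skill          # what this agent claims to be good at
        self.answer_fn = answer_fn  # how it actually produces an answer

    def ask(self, question):
        return self.answer_fn(question)


class AgenticClient:
    """Plays Parser, Reasoner and Orchestrator in one, as most Clients do today."""
    def __init__(self):
        self.registry = {}  # skill -> agent, filled by 'skill registration'

    def register(self, agent):
        self.registry[agent.skill] = agent

    def parse(self, question):
        # Crude keyword matching stands in for real parsing here
        for skill in self.registry:
            if skill in question.lower():
                return skill
        return None

    def handle(self, question):
        skill = self.parse(question)        # Parser: interpret the request
        agent = self.registry.get(skill)    # Reasoner: who might know this?
        if agent is None:
            return "I don't know"           # opt out rather than guess
        return agent.ask(question)          # Orchestrator: route and relay


client = AgenticClient()
client.register(AgentServer("Sally", "football",
                            lambda q: "Pasta, three hours before kick-off."))
print(client.handle("What does the Arsenal football captain eat before a game?"))
```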
Before LLMs introduced themselves to the Agentic AI world, communications contracts between software components were very formal and very fragmented across the industry, with every vendor creating and publishing their own ‘API’ (Application Programming Interface).
Standards such as SOAP (the Simple Object Access Protocol), CORBA (the Common Object Request Broker Architecture) and OAS (the OpenAPI Specification) emerged to try to standardise how APIs operated across different software components and network architectures. But frankly, in most industries, we never really got our act together to create a single communications contract that would allow any piece of software to talk to any other.
The biggest thing that’s changed with the advent of LLMs is that software can now talk to other software using English (or many other human languages) as their communications protocol.
However, deciding what to say to each other was a bit of a free-for-all in the first year of the modern Agentic AI era – until Anthropic proposed and open-sourced the now widely adopted MCP (the Model Context Protocol), which provides a bit of groundwork in terms of how to both register an agent skill and then communicate the follow-up requests and responses using English – essentially lending a bit of structure in the same way SOAP did for API services back in 1998.
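To give a feel for what MCP brings, here are simplified sketches of the two message shapes involved – a Server advertising a skill, and a Client calling it – written as Python dicts. Real MCP messages are JSON-RPC 2.0 and carry more fields than shown here, so treat this as an illustration of the idea rather than the specification:

```python
# Simplified, illustrative shapes of an MCP-style exchange. Real MCP messages
# are JSON-RPC 2.0 with more fields; this is a sketch, not the specification.

# 1. The Server advertises a skill when the Client asks what it can do:
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "answer_music_question",
            "description": "Answers quiz questions about music history.",
            "inputSchema": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        }],
    },
}

# 2. The Client later routes a request to that registered skill:
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "answer_music_question",
        "arguments": {"question": "How did the band 808 State get its name?"},
    },
}
```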
Then, whilst MCP is focused on Servers registering and communicating with Clients, it looks quite likely that it will be supplemented by the similar-but-different A2A (Agent-to-Agent) protocol from Google, intended to semi-standardise how the various Agentic Servers can talk to other Servers.
English being at the core of these new Client-to-Server and Server-to-Server communications has truly opened up the opportunity for any software to talk to any other software, whether the Agentic system otherwise uses AI or not. However, it also means that, rather than the robust communications contracts we used to have with APIs, we are inheriting the biggest problem with LLMs: their propensity to be unknowingly, but confidently, wrong.
Amidst the Agentic hype, the Generative AI community would also have you believe that all Server Agents should be built entirely from LLMs – especially given they need to Parse, Reason, and potentially Orchestrate other Agents to provide a response. Yet again, this misconception can increase the likelihood of them, and therefore of the Agentic system as a whole, being confidently wrong.
Being Confidently Wrong
By now, everyone reading this blog post will have experienced both correct and incorrect AI summaries at the top of their Google search results, leaving you, the end-user, to use your own judgement as to whether to believe or ignore them.
Kit's previous blog articles have drilled into why LLMs are so often wrong, whilst the Hugging Face leaderboards demonstrate there is still a long way to go before LLMs provide trustworthy answers or reasoning.
Those who have heard Kit Ruparel speak on AI also know that, as important as providing a correct result is, it is even more important that an AI system provides a probability score indicating how confident it believes it is in its answer's correctness – and that this is a fundamental weakness of Generative AI LLMs. They don't provide an answer along with a confidence score; they only provide a sequence of statistically likely text.
Imagine how useful it would be if that AI answer at the top of your Google search results were accompanied by a single additional statement: "This AI generated answer is likely 94% correct." You could then decide whether that 94% falls within your acceptable risk tolerance, based upon your research use-case and your historic knowledge of how reliable Google has been at formulating that percentage.
Today, only Predictive AI (or 'traditional' machine learning, if you prefer) provides both an answer and a probability score – one whose trustworthiness you can learn over time and bake into your application's risk-tolerance threshold.
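To make that concrete, here is a minimal scikit-learn sketch of a Predictive AI model returning both an answer and the probability behind it. The training data is invented purely for illustration:

```python
# A minimal Predictive AI sketch: the model returns both an answer and a
# probability score for it. The data here is invented for illustration.
from sklearn.linear_model import LogisticRegression

X = [[0.10], [0.35], [0.40], [0.75], [0.80], [0.90]]  # a single input feature
y = [0, 0, 0, 1, 1, 1]                                # known correct labels

model = LogisticRegression().fit(X, y)

probabilities = model.predict_proba([[0.7]])[0]  # one probability per class
answer = probabilities.argmax()
confidence = probabilities.max()

print(f"Answer: {answer}, confidence: {confidence:.0%}")
```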
Playground Whispers
Returning to the agent quiz night example above, let’s look at everywhere an LLM might have been used in today’s Agentic AI world: for each of the agents to register their skills; to parse and reason over a quiz question; to decide which agent(s) to route the question to; for one or more agents to parse and reason over the question and provide an answer; for the client to reason over the answers provided and to formulate an ultimate response.
That's five different uses of LLMs, and if each use of an LLM only has a 90% chance of being accurate, then mathematically there is only a 59% chance (0.9 compounded over five calls) that every step is right – a 41% chance that the final answer will be incorrect.
If a more complex agentic mesh were in play with 10 agents involved, each with a 90% accuracy level, then, statistically, two out of three final answers will be wrong. Confidently wrong.
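The arithmetic is simple compounding, as this little sketch shows:

```python
# If every LLM call in the chain is independently right 90% of the time, the
# chance the whole chain is right decays geometrically with each extra call.
def chance_of_wrong_answer(llm_calls, per_call_accuracy=0.9):
    return 1 - per_call_accuracy ** llm_calls

print(f"{chance_of_wrong_answer(5):.0%}")   # ~41% for the five-call quiz example
print(f"{chance_of_wrong_answer(10):.0%}")  # ~65% for the ten-agent mesh
```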
Using LLMs for every element of the Agentic AI ecosystem is akin to the old ‘Whispers’ game that you may have enjoyed in the playground as children, perhaps known by a less politically correct moniker at the time, or, if you were in the USA, known as the ‘Telephone Game’. According to the rules of that game, each child in the process had to pass on what they thought they heard. A more sensible game outcome might have been achieved if each child had the option of saying “I’m 94% confident that I heard ‘Grandma wears blue shoes’”, and then each next child in the chain had the option of taking this information and potentially reasoning towards an “I don’t know” answer. But that would have been far less fun.
Successful Agentic AI in the Enterprise
Thanks to Generative LLMs, Agentic AI is coming for your business, so it is your organisation's responsibility to ensure that any implementation is safe and that it actually makes your employees more productive.
Productivity is obviously the purpose of these technologies. However, recent company surveys by respected institutions such as MIT and McKinsey demonstrate that, today, only around 10% of Generative AI projects make it out of pilot purgatory, with only 5% or so actually providing a return on investment.
Users instead find their time lost either to the 'AI Verification Tax' – i.e. spending more time checking and correcting those confidently wrong results than the time initially saved in having answers or content created for them – or to adding the meat to 'AI Workslop', defined by Harvard as "AI generated content that masquerades as good work, but lacks the substance to meaningfully advance a given task."
This will change over time as AI scientists and developers build increasingly performant tools, so early adopters really need to start setting out their stalls for safe and responsible Agentic AI.
Such a process must start with careful selection of agents and other AI components. The Agentic AI ecosystem will only be as good as the sum of each of its parts, each potentially having been built by a different development team, each of whom must be trusted to have ‘done the right thing’.
Where componentry is developed in house, try to avoid LLMs unless text generation is truly required – most especially where an organisation owns both sides of the Client-Server or Server-Server equation. If you can define the communication contract between your services, then a restrictive and purposeful API will guarantee reliable consistency in a way that the ambiguities of natural language never will.
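As a sketch of the difference, a typed contract like the one below leaves nothing to linguistic interpretation. All names and fields here are invented for illustration:

```python
# A sketch of a restrictive, purposeful contract between two services you own.
# The names and fields are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseSummaryRequest:
    case_id: str
    max_words: int          # the caller states exactly what it wants

@dataclass(frozen=True)
class CaseSummaryResponse:
    summary: str
    confidence: float       # 0.0 to 1.0, so callers can apply their own threshold

def summarise_case(request: CaseSummaryRequest) -> CaseSummaryResponse:
    # The signature is the contract: neither side can send or receive
    # anything the types do not allow, unlike free-form English.
    ...
```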
Then, within agentic services themselves, build, or prefer to use, components that minimise Generative AI technologies when either Predictive AI or even straightforward procedural code will suffice.
Importantly, ensure each component in the framework is designed to opt out – to provide that "I don't know" answer whenever there is uncertainty. For AI services using traditional machine learning, make sure you are in control of the confidence thresholds that determine whether the AI returns an answer, takes an action, or opts out.
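Here is a sketch of that opt-out pattern, assuming a component that returns an answer together with its confidence. The threshold value is illustrative and should be tuned and tested for the use-case:

```python
# Wrap any component that reports a confidence score so that low-confidence
# answers become an explicit "I don't know" rather than a confident guess.
# The default threshold here is illustrative, not a recommendation.
def with_opt_out(component, threshold=0.94):
    def guarded(request):
        answer, confidence = component(request)
        if confidence < threshold:
            return "I don't know"
        return answer
    return guarded
```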
Next, consider security, guardrails and traceability. This is where most organisations must surely turn to Microsoft, as Copilot and the Office suite will most commonly be the entry-point Client for business users, and Office documents will often be the desired output format.
More pertinently, organisations using MS Azure will turn to the security of the Azure Agent ID technologies, the Agentic monitoring and AI Content Safety guardrails, and the Purview AI data governance capabilities that Microsoft is rapidly evolving. Agentic AI components developed outside the Azure ecosystem will sensibly be turned down by most information security teams.
Finally, ensure your Agentic AI system has an appropriate amount of human-in-the-loop: both for consumption, with users well trained in responsible use and best practice, and for monitoring, quality control and continuous improvement throughout the AI componentry mesh.
Recordsure AI and Agents
Recordsure develops and consumes AI responsibly, with our products proven to improve productivity, yet with an appropriate amount of human-in-the-loop application to reduce rather than increase operational and conduct risk.
Recordsure has used software agents (microservices) since our first product developments over a decade ago, with many of these agents often providing or consuming AI capabilities – initially predictive machine learning technologies, and more recently, carefully designed Generative AI LLM usage where text actually needs to be generated.
Traditional, robust APIs are used wherever we own both halves of a communications contract between services, or where trusted suppliers are in the mix. Every AI element is designed with configurable and well-tested AI confidence thresholds, guardrails for LLM human input and responses, and monitoring and improvement methods throughout.
As our clients look to integrate our products within their own Agentic ecosystems, we are working with the various emerging Microsoft AI and other Azure services to ensure that InfoSec teams can be confident of a natural fit into the security and supervision frameworks, which will increasingly become a part of their daily Agentic AI operations.