From the wheel to the windmill, from basic machinery to computers, humanity has always sought ways to make life easier through helpers. Whether those helpers are humans or machines, the goal has always been the same: to get someone (or something) to carry the burden of the work. If you have enough money, you hire a butler. If you are not wealthy but rich in imagination, you might fantasize about a robot helper that cleans, folds your laundry and does your daily chores while you lean back, watch and enjoy life.
Today, AI agents are the digital version of your personal Jeeves. They will not make you tea, but they can sort your mail, plan trips, run web searches, bring you the news(paper) and much more. No wonder agents are THE hot topic in the AI world. In this post, we’ll pull back the marketing curtain and look at how an exemplary agent is built.
To cut through the hype, let us briefly discuss what an agent is, and also what it is not. An AI agent is a system that completes tasks on your behalf, ideally with little to no guidance. Think of a human travel agent. You tell the agent where and when you want to go, and they come back with flight, hotel and transfer fully booked. You do not tell them which airline to use or what the hotel's check-in time is. You delegate.
In the AI world, a large language model (LLM) on its own is restricted to its training. It has no information beyond what was in its training data, has no connection to the outside world and only remembers the most recent communication. With tools, you expand the LLM beyond these original limitations.
Leaving aside all the marketing, as promised: AI Agent = LLM + tools.
That is, a model that takes the user's instructions, decides on its own which of the tools at its disposal to use, executes the tool, processes the result, often stores input and output in a persistent memory, and informs the user of the overall outcome.
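To make that loop tangible, here is a minimal sketch of it without any real LLM involved. Everything in it is an assumption made for illustration: `fake_llm_decide` is a hypothetical stand-in for the model's tool choice (a plain keyword rule), the two toy tools `word_count` and `shout` are invented, and the "persistent memory" is just a Python list.

```python
from typing import Callable

# Hypothetical toy tools; real tools would call APIs, databases, etc.
def word_count(text: str) -> str:
    return str(len(text.split()))

def shout(text: str) -> str:
    return text.upper()

# The tool registry the agent can choose from.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": word_count,
    "shout": shout,
}

def fake_llm_decide(instruction: str) -> tuple[str, str]:
    """Stand-in for the LLM: returns (tool_name, tool_argument).

    A real agent would ask the model which tool to use; this sketch
    uses a simple keyword rule instead (an assumption, not a real API).
    """
    if instruction.lower().startswith("count words:"):
        return "word_count", instruction.split(":", 1)[1].strip()
    return "shout", instruction

def run_agent(instruction: str, memory: list[dict]) -> str:
    tool_name, argument = fake_llm_decide(instruction)  # decide on a tool
    result = TOOLS[tool_name](argument)                 # execute the tool
    memory.append(                                      # store input and output
        {"input": instruction, "tool": tool_name, "output": result}
    )
    return f"[{tool_name}] {result}"                    # inform the user

memory: list[dict] = []
print(run_agent("count words: agents are just LLMs with tools", memory))
```

Swapping the keyword rule for an actual model call is what turns this skeleton into an agent; the surrounding loop stays essentially the same.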
Tools are what make an AI agent. They can be anything from web search to a weather service call, from access to flight databases to code interpreters. Tools can be as simple as an API call or as complex as a mechanism that orchestrates other agents with its own internal logic.
For demonstration purposes, we focus on the overall picture and stick to simple tools. In this post, we’ll implement two demo tools: a calculator and a weather tool.
Why those? You might have seen experiments where LLMs fail horribly at even the easiest calculations. It makes sense: LLMs specialize in language, and math is not their strength. You don’t ask your favorite barista for tax advice just because they make the best coffee in town, right?
However, give the LLM a calculator and you get a math expert. Whether that would work with your barista is another question.
A very simple calculator tool will be enough to overcome the inherent math issues of the LLM. The weather tool will be even less than simple: it will be a dummy. We only include it so that the agent has a choice between two tools, and to demonstrate that it picks the calculator for a math question rather than something else.
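As a rough preview, independent of any LLM API, the two tools could look something like the sketch below. The function names `calculator` and `get_weather`, the supported operators and the canned weather reply are all assumptions made for illustration. The calculator walks Python's `ast` instead of calling `eval`, so that only plain arithmetic is accepted.

```python
import ast
import operator

# Arithmetic operators the sketch supports; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def _eval(node: ast.AST):
    # Plain numbers (ints and floats only, no booleans or strings).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    # Binary operations such as 2 + 3 or 4 * 5.
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    # Unary minus such as -7.
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result as text."""
    tree = ast.parse(expression, mode="eval")
    return str(_eval(tree.body))

def get_weather(city: str) -> str:
    """Dummy tool: always returns the same made-up forecast."""
    return f"The weather in {city} is sunny."

print(calculator("2 + 3 * 4"))
print(get_weather("Paris"))
```

Both tools take a string and return a string, which keeps them easy to hand to a model later on.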
Let us start with a pure-vanilla approach and build the agent in Python using OpenAI’s API as the only external module.