From the wheel to the windmill, from basic machinery to computers, humanity has always sought ways to make life easier through helpers. Whether those helpers are human or machine, the goal has always been the same: to have someone – or something – carry the burden of the work. If you have enough money, you hire a butler. If you are not wealthy but rich in imagination, you might fantasize about a robot helper that cleans, folds your laundry and does your daily chores while you lean back, watch and enjoy life.
Today, AI agents are the digital counterpart of your personal Jeeves. They will not make you tea, but they can sort your mail, plan trips, run web searches, bring you the news(paper) and much more. No wonder agents are THE hot topic in the AI world. In this post, we’ll pull back the marketing curtain and look at how an exemplary agent is built.
What is an AI agent?
To cut through the hype, let us briefly discuss what an agent is – and what it is not. An AI agent is a system that completes tasks on your behalf, ideally with little to no guidance. Think of a human travel agent: you tell them where and when you want to go, and they come back with flight, hotel and transfer fully booked. You do not tell them which airline to use or when to check in at the hotel. You delegate.
In the AI world, models are restricted to their training. They have no information beyond what was in their training data, no connection to the outside world, and they remember only the most recent conversation. With tools, you expand the LLM beyond these original limitations.
Leaving aside all the marketing, as promised: AI Agent = LLM + tools.
That is: a model that takes the user’s instructions, decides on its own which of the tools at its disposal to use, executes the tool, processes the result, often stores input and output in a persistent memory, and informs the user of the overall outcome.
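Stripped to its core, that decide → execute → report cycle fits in a dozen lines of Python. In the toy sketch below, all names are our own (not from any library), and a trivial decide() function stands in for the LLM; a real agent would send the task plus the tool descriptions to the model and let it choose:

```python
def decide(task: str):
    # Stub standing in for the LLM: a real agent asks the model,
    # here we just guess "math" whenever the task contains digits.
    if any(ch.isdigit() for ch in task):
        return ("calculator", task)
    return ("none", task)

def run_agent(task: str) -> str:
    tool_name, argument = decide(task)   # 1. the "LLM" picks a tool
    if tool_name == "calculator":
        result = eval(argument)          # 2. execute the tool (demo only)
    else:
        result = "no tool needed"
    return f"Result: {result}"           # 3. report the outcome to the user

print(run_agent("38237 * 42"))  # Result: 1605954
```

The real versions of decide() and the tool call are exactly what the rest of this post fills in.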
Tools
Tools are what make an AI agent. They can be anything from a web search to a weather service call, from access to flight databases to code interpreters. A tool can be as simple as an API call or as complex as a mechanism orchestrating other agents with its own internal logic.
Hands-on
For demonstration purposes, we focus on the overall picture and stick to simple tools. In this post, we’ll implement two demo tools:
- Calculator tool to handle math,
- Weather tool to return a weather forecast.
Why those? You might have seen experiments where LLMs horribly fail at even the easiest calculations. It makes sense: LLMs specialize in language; math is not their strength. You don’t ask your favorite barista for tax advice just because it’s the place to get the best coffee in town, right?
However, give the LLM a calculator and you get a math expert. Whether that works with your barista is another question.
A very simple calculator tool is enough to overcome the LLM’s inherent math issues. The weather tool will be even less than simple: a dummy. It only exists so the agent has a choice between two tools, letting us demonstrate that it picks the calculator for a math question and not something else.
Implementation: Pure Python
Let us start with a pure-vanilla approach and build the agent in Python using OpenAI’s API as the only external module.
from openai import OpenAI

client = OpenAI()

def calculator_tool(expression):
    # Note: eval() is fine for a demo, but unsafe for untrusted input.
    return eval(expression)

def weather_tool(location):
    match location:
        case "Berlin":
            output = "It is cloudy and rainy in Berlin"
        case "Miami":
            output = "Sunny and warm in Miami"
        case _:
            output = "No weather data available"
    return output

def call_openai(messages):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return response.choices[0].message.content
One import, two tools and a function to call the LLM. The calculator simply evaluates its argument and returns the result. The weather tool does even less, returning predefined answers if it is ever called.
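A word of caution on that one-liner: eval() executes arbitrary Python, so a hostile "expression" could do real damage. For a demo that is acceptable; in anything serious you would restrict evaluation to arithmetic. One common approach (a sketch, not part of the agent above) is to walk the expression’s syntax tree and allow only number literals and whitelisted operators:

```python
import ast
import operator

# Whitelist of operators the calculator may use.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str):
    def eval_node(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](eval_node(node.operand))
        raise ValueError(f"Disallowed expression: {expression!r}")
    return eval_node(ast.parse(expression, mode="eval").body)

print(safe_calculate("3 * (5 + 2)"))  # 21
```

Anything outside the whitelist – function calls, attribute access, imports – raises a ValueError instead of being executed.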
prompt = f"""You are an intelligent AI agent with access to the following tools:
1. calculator_tool(expression):
- Use this tool to perform arithmetic operations like addition, subtraction, multiplication, or division.
- Example: calculator_tool(3 * (5 + 2))
2. weather_tool(location):
- Use this tool to get the current weather in a specific city.
- Example: weather_tool("Miami")
Instructions:
- If you want to use a tool, respond **only** with the tool call in the exact format shown above.
- Do not write any explanation, just the tool call.
- If no tool is needed, respond normally with your answer.
"""
system_message = {
    "role": "system",
    "content": prompt
}
messages = [system_message]
This is where the magic happens. Let’s take a closer look at the prompt that makes our agent:
We inform the model of the tools it has by listing and describing them and showing an example of how each can be called. Next, we tell the LLM how to use the tools: by responding in a structured way. This point is crucial, and we will come back to it later in this post. As you might know, LLMs tend to be not only very polite but also overly wordy.
To catch the tool call, we need a direct statement, not lines of filler to wade through that bury the call, e.g. “Certainly, please let me use the tool calculator. If there is anything else I can help you with, let me know”.
Anything besides a crisp instruction makes it hard to automatically process the answer and hence the tool call.
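Extracting the call from the reply can be made less brittle than a raw string comparison with a small parser. Here is a hedged sketch using a regular expression matched against the call format our prompt asks for (a production agent would validate more strictly):

```python
import re

# Matches replies of the form tool_name(arguments), per our prompt's format.
TOOL_CALL = re.compile(r"^\s*(calculator_tool|weather_tool)\((.*)\)\s*$", re.DOTALL)

def parse_tool_call(response: str):
    """Return (tool_name, argument) if the reply is a tool call, else None."""
    match = TOOL_CALL.match(response)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_tool_call("calculator_tool(38237 * 42)"))  # ('calculator_tool', '38237 * 42')
print(parse_tool_call("Sure, happy to help!"))         # None
```

A reply that does not look like a tool call yields None, which the agent can treat as a normal answer for the user.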
user_message = "What is 38237 x 42?"
messages.append({"role": "user", "content": user_message})
response = call_openai(messages)
messages.append({"role": "assistant", "content": response})
After appending the system prompt and the user request, we call the LLM. The answer is also appended to our message history. That way we keep a full log of the conversation, which we send to the LLM with each call.
A closer look at the response reveals calculator_tool(38237 * 42). The model answered in the expected way, i.e. with the method call and the required arguments.
All there is left to do is to execute the call and return its output to the model:
if response.startswith("calculator_tool("):
    expr = response[len("calculator_tool("):-1]
    result = calculator_tool(expr)
    user_message = f"The tool returned {result}. Please answer in a lengthy, detailed and wordy way"
    messages.append({"role": "user", "content": user_message})
    response = call_openai(messages)
    print(response)
And there we have it, the final answer:
Certainly! The result of multiplying 38,237 by 42 is **1,605,954**. To arrive
at this figure, you take the number 38,237 and add it together with itself a total of 42 times… […more…] …
To elaborate further, in the traditional method of long multiplication, you would… […even more…] …
Here, using either mental math, written calculation… […still more…]
Okay, we overdid it by asking for a “lengthy, detailed and wordy” reply.
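One more refinement before we move on: the if branch above only handles the calculator. With a dispatch table, the same idea covers both tools. The sketch below uses our own variable names and redefines the two demo tools so it runs standalone:

```python
def calculator_tool(expression):
    return eval(expression)  # demo only; unsafe for untrusted input

def weather_tool(location):
    forecasts = {"Berlin": "It is cloudy and rainy in Berlin",
                 "Miami": "Sunny and warm in Miami"}
    return forecasts.get(location, "No weather data available")

# Dispatch table: tool name as emitted by the model -> Python function.
tool_map = {"calculator_tool": calculator_tool, "weather_tool": weather_tool}

def dispatch(response: str):
    name, _, rest = response.partition("(")
    if name in tool_map and rest.endswith(")"):
        return tool_map[name](rest[:-1].strip().strip('"'))
    return None  # not a tool call; treat as a normal answer

print(dispatch("calculator_tool(38237 * 42)"))  # 1605954
print(dispatch('weather_tool("Miami")'))        # Sunny and warm in Miami
```

Adding a third tool then means adding one function and one dictionary entry, not another if branch. Keep this tool_map pattern in mind; it reappears in the LangChain version below.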
Implementation: LangChain
Instead of building everything manually, frameworks can take a lot of the work off your hands and make building agents much easier. Next, we will build the same agent with LangChain [1].
Again, let’s start with the tools:
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def calculator_tool(expression: str):
    """Use this tool to perform arithmetic operations like addition, subtraction, multiplication, or division."""
    return eval(expression)

@tool
def weather_tool(location: str):
    """Use this tool to get the current weather in a specific city."""
    match location:
        case "Berlin":
            output = "It is cloudy and rainy in Berlin"
        case "Miami":
            output = "Sunny and warm in Miami"
        case _:
            output = "No weather data available"
    return output

tool_map = {
    "calculator_tool": calculator_tool,
    "weather_tool": weather_tool
}

llm = ChatOpenAI(model="gpt-4o")
We declare the same tools as before. Note the main difference: the tool descriptions live in the docstrings (where they should be anyway, right?), but we omit the examples telling the model how to call each method. We no longer need them.
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
chat_template = ChatPromptTemplate([
    SystemMessagePromptTemplate.from_template("You are a friendly AI assistant that speaks English."),
    HumanMessagePromptTemplate.from_template("What is {math_question}?")
])
Here we define the chat template, setting the LLM’s role and preparing the user’s question. As you can see, we no longer need a lengthy prompt.
prompt = chat_template.invoke({"math_question": "38237 x 42"})
llm_with_tools = llm.bind_tools([calculator_tool, weather_tool])
reply = llm_with_tools.invoke(prompt)
for tool_call in reply.tool_calls:
    result = tool_map[tool_call["name"]].invoke(tool_call["args"])
    print(result)
Putting everything together and calling the model. Did you notice how little is needed to bind the tools to the model? Instead of a prompt, we simply call bind_tools() on the model, chaining the LLM and the tools. The model call is then made on the chain. Hence: LangChain.
We can see that everything works as expected by checking with print(result): it is 1605954, the result of the tool call. The result can be sent back to the LLM to generate another lengthy answer if we like.
Wrapping it up
We briefly explored the fundamentals of AI agents and two approaches to implementing them: one using plain Python for a more transparent, low-level view, and another using LangChain, which abstracts away many complexities.
So why choose one over the other?
When working with just a couple of tools, the Python-only method is still manageable. Crafting and maintaining the prompt is relatively straightforward. But as the number of tools increases—say, to a few dozen—the prompt can quickly become unwieldy, even longer than the responses from the LLM. Maintaining such a prompt becomes tedious and error-prone.
With LangChain, adding a tool is as simple as appending it to the list passed to bind_tools(). The framework handles prompt construction and maintenance for you. Additionally, switching LLMs in LangChain often requires just a few lines of code, whereas the Python-only approach may require significant prompt re-engineering to align with the new model’s expectations.
A key limitation of the plain Python approach is reliability. There’s no guarantee the LLM will always respond with a clean, standardized tool call. Even minor deviations—like extra words—can break the code. To mitigate this, you’d need to build your own parsers and error handlers to sanitize responses.
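Such a mitigation can be sketched as a small retry loop: validate the reply, and if it is not a clean tool call, send a corrective message and ask again. The stub below fakes an LLM that answers sloppily on the first try and cleanly on the second (all names and behavior are illustrative, not any real API):

```python
def make_llm_stub():
    # Fakes an LLM: a sloppy reply first, then a clean tool call on retry.
    replies = iter([
        "Certainly! I would use calculator_tool(38237 * 42) for that.",
        "calculator_tool(38237 * 42)",
    ])
    return lambda messages: next(replies)

def get_tool_call(call_llm, messages, max_retries=3):
    for _ in range(max_retries):
        reply = call_llm(messages)
        # Accept only a clean, machine-readable tool call.
        if reply.startswith("calculator_tool(") and reply.endswith(")"):
            return reply
        # Otherwise feed the problem back to the model and try again.
        messages.append({"role": "user",
                         "content": "Respond ONLY with the tool call, nothing else."})
    raise RuntimeError("Model never produced a clean tool call")

print(get_tool_call(make_llm_stub(), []))  # calculator_tool(38237 * 42)
```

In the pure-Python agent, call_llm would be the call_openai() function from earlier; the loop, validation and corrective message are exactly the plumbing you would otherwise have to write yourself.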
LangChain handles these issues out of the box. It includes built-in parsers, validators, and control loops to ensure consistent behavior. The trade-off? You give up some transparency and control over the agent’s internal workings.
-----
[1] https://python.langchain.com/docs/introduction/