Building AI Agents from Scratch with Python: A Practical Guide (2026)

Building AI Agents from Scratch with Python: A Practical Guide (2026)

TL;DR

AI agents go beyond chatbots by taking autonomous actions — calling APIs, executing code, and making decisions in a loop. This guide builds agents from scratch in Python, starting with function calling fundamentals and progressing to a complete working agent with memory, multi-tool orchestration, and the ReAct pattern. All code is production-ready, using the OpenAI Python SDK with any OpenAI-compatible endpoint. No frameworks required.

What Is an AI Agent?

The term “agent” gets applied to everything from simple chatbots to complex autonomous systems, so let us define it precisely.

An AI agent is a program that uses a language model to decide what actions to take in pursuit of a goal. The key word is “decide.” A chatbot generates text. An agent generates text and takes actions — calling functions, querying databases, hitting APIs, reading files — and then uses the results of those actions to decide what to do next.

Chatbot vs AI Agent — a chatbot produces a single text response, while an agent loops through tool calls, observes results, and acts autonomously

The simplest mental model:

Chatbot:  User message → LLM → Text response
Agent:    User message → LLM → Action → Result → LLM → Action → Result → ... → Final answer

What makes this powerful is the loop. The agent does not need a pre-programmed sequence of steps. It observes results, reasons about them, and picks the next action dynamically. This means it can handle tasks it has never seen before, recover from errors, and adapt to unexpected situations.

What Agents Are Not

Agents are not magic. They are constrained by:

  • The tools you give them: An agent can only take actions you define. No tool for sending email means no email.
  • The model’s reasoning ability: Weaker models make worse decisions about which tools to use and how.
  • The context window: Long-running agents accumulate history that can exceed the model’s context limit.
  • Cost: Every tool call adds tokens. Complex tasks can burn through significant API credits.

Understanding these constraints is essential for building agents that work reliably in production rather than just in demos.

The Agent Loop: Observe, Think, Act

Every AI agent, regardless of implementation, follows the same core loop:

┌─────────────────────────────────┐
│                                 │
│   ┌─────────┐                  │
│   │ OBSERVE │ ← tool results   │
│   └────┬────┘                  │
│        │                       │
│   ┌────▼────┐                  │
│   │  THINK  │ ← LLM reasoning │
│   └────┬────┘                  │
│        │                       │
│   ┌────▼────┐                  │
│   │   ACT   │ → call tool OR   │
│   └────┬────┘   return answer  │
│        │                       │
│        └───────────────────────┘
  1. Observe: The agent receives input — either the user’s initial message or the result of a previous action.
  2. Think: The language model processes all context (system prompt, conversation history, tool definitions, previous results) and decides what to do.
  3. Act: The model either calls a tool (returning structured arguments) or produces a final text response.

If the model calls a tool, the result is fed back into step 1, and the loop continues. If the model produces a text response, the loop ends.

This is the entire architecture. Everything else — memory, guardrails, multi-tool orchestration — is built on top of this loop.

Foundation: Function Calling

Function calling (also called tool use) is the mechanism that lets language models trigger actions in your code. Without function calling, an agent would have to output something like “I want to call the weather API for London” as plain text, and you’d need to parse that text to figure out what it wants. Fragile, unreliable, and error-prone.

Modern function calling is structured. You define tools as JSON schemas, and the model returns structured tool_call objects with exact function names and typed arguments.

Defining Tools

Here is how you define a tool for the OpenAI-compatible API:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London' or 'San Francisco'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

The schema tells the model: “You have a tool called get_weather. It takes a required city string and an optional units string.” The description fields are critical — they are the model’s only documentation for understanding when and how to use the tool.

Making a Tool Call

When you send a message with tools defined, the model may respond with a tool call:

import openai
import json

client = openai.OpenAI(
    base_url="https://api.openai.com/v1",  # or your custom endpoint
    api_key="your-api-key"
)

messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
    # Output:
    # Function: get_weather
    # Arguments: {"city": "Tokyo", "units": "celsius"}

Returning Tool Results

After executing the function, you return the result to the model:

# Execute the actual function
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, call a real weather API
    return {"temperature": 22, "condition": "sunny", "city": city, "units": units}

# Parse the arguments and call the function
args = json.loads(tool_call.function.arguments)
result = get_weather(**args)

# Add the assistant's tool call and the result to the conversation
messages.append(message.model_dump())
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})

# Get the final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

print(final_response.choices[0].message.content)
# Output: "The current weather in Tokyo is 22°C and sunny."

This two-step process — model returns tool call, you execute it, model processes the result — is the building block for everything that follows.

Building a Minimal Agent

Let us build a complete, working agent. This agent can check weather and perform calculations — simple enough to understand fully, complex enough to demonstrate all the core patterns.

Complete Working Code

"""
Minimal AI Agent with Weather and Calculator Tools
Works with any OpenAI-compatible API endpoint.
"""

import json
import math
import openai

# --- Configuration ---
client = openai.OpenAI(
    base_url="https://api.openai.com/v1",  # Replace with your endpoint
    api_key="your-api-key"                  # Replace with your key
)
MODEL = "gpt-4o"  # Replace with your preferred model
MAX_ITERATIONS = 10

# --- Tool Definitions ---
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Returns temperature, condition, and humidity.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., 'London', 'New York')"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression. Supports +, -, *, /, ** (power), and common math functions like sqrt, sin, cos, log.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression to evaluate (e.g., '2 + 3 * 4', '144 ** 0.5', '2**10')"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# --- Tool Implementations ---
WEATHER_DATA = {
    "london": {"temperature": 14, "condition": "cloudy", "humidity": 78},
    "new york": {"temperature": 25, "condition": "sunny", "humidity": 45},
    "tokyo": {"temperature": 22, "condition": "partly cloudy", "humidity": 60},
    "sydney": {"temperature": 19, "condition": "rainy", "humidity": 85},
    "paris": {"temperature": 17, "condition": "overcast", "humidity": 70},
}


def get_weather(city: str) -> dict:
    """Look up weather data for a city."""
    city_lower = city.lower().strip()
    if city_lower in WEATHER_DATA:
        return {
            "city": city,
            "temperature_celsius": WEATHER_DATA[city_lower]["temperature"],
            "condition": WEATHER_DATA[city_lower]["condition"],
            "humidity_percent": WEATHER_DATA[city_lower]["humidity"],
        }
    return {"error": f"Weather data not available for '{city}'"}


def calculate(expression: str) -> dict:
    """
    Safely evaluate a mathematical expression using AST parsing.
    Only allows numeric literals and basic math operations.
    """
    import ast
    import operator

    # Supported binary operators
    ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.Mod: operator.mod,
        ast.FloorDiv: operator.floordiv,
    }

    # Supported unary operators
    unary_ops = {
        ast.UAdd: operator.pos,
        ast.USub: operator.neg,
    }

    # Supported function names mapped to math module
    safe_functions = {
        "sqrt": math.sqrt,
        "abs": abs,
        "round": round,
        "pow": pow,
        "log": math.log,
        "log10": math.log10,
        "sin": math.sin,
        "cos": math.cos,
        "tan": math.tan,
        "pi": math.pi,
        "e": math.e,
    }

    def safe_eval_node(node):
        """Recursively evaluate an AST node safely."""
        if isinstance(node, ast.Expression):
            return safe_eval_node(node.body)
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            left = safe_eval_node(node.left)
            right = safe_eval_node(node.right)
            return ops[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp) and type(node.op) in unary_ops:
            operand = safe_eval_node(node.operand)
            return unary_ops[type(node.op)](operand)
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in safe_functions:
                func = safe_functions[node.func.id]
                if callable(func):
                    args = [safe_eval_node(arg) for arg in node.args]
                    return func(*args)
            raise ValueError(f"Unsupported function: {ast.dump(node.func)}")
        elif isinstance(node, ast.Name) and node.id in safe_functions:
            val = safe_functions[node.id]
            if not callable(val):
                return val  # constants like pi, e
            raise ValueError(f"'{node.id}' is a function and must be called with arguments")
        else:
            raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    try:
        tree = ast.parse(expression, mode="eval")
        result = safe_eval_node(tree)
        return {"expression": expression, "result": result}
    except Exception as e:
        return {"expression": expression, "error": str(e)}


# --- Tool Dispatcher ---
TOOL_FUNCTIONS = {
    "get_weather": get_weather,
    "calculate": calculate,
}


def execute_tool(name: str, arguments: str) -> str:
    """Execute a tool and return the result as a JSON string."""
    if name not in TOOL_FUNCTIONS:
        return json.dumps({"error": f"Unknown tool: {name}"})

    try:
        args = json.loads(arguments)
        result = TOOL_FUNCTIONS[name](**args)
        return json.dumps(result)
    except Exception as e:
        return json.dumps({"error": f"Tool execution failed: {str(e)}"})


# --- Agent Loop ---
def run_agent(user_message: str) -> str:
    """Run the agent loop until it produces a final text response."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to weather data and a calculator. "
                "Use the available tools when needed to answer questions accurately. "
                "Always show your work when doing calculations."
            ),
        },
        {"role": "user", "content": user_message},
    ]

    for iteration in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
        )

        message = response.choices[0].message

        # If no tool calls, the agent is done
        if not message.tool_calls:
            return message.content

        # Add the assistant message (with tool calls) to history
        messages.append(message.model_dump())

        # Execute each tool call
        for tool_call in message.tool_calls:
            print(f"  [Tool] {tool_call.function.name}({tool_call.function.arguments})")
            result = execute_tool(tool_call.function.name, tool_call.function.arguments)
            print(f"  [Result] {result}")

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Agent reached maximum iterations without producing a final answer."


# --- Main ---
if __name__ == "__main__":
    test_queries = [
        "What's the weather like in London and Tokyo?",
        "If it's 14 degrees in London and 22 degrees in Tokyo, what's the average temperature?",
        "What's the square root of the humidity in Sydney?",
    ]

    for query in test_queries:
        print(f"\nUser: {query}")
        print(f"Agent: {run_agent(query)}")
        print("-" * 60)

How It Works

Walk through the code from bottom to top:

  1. run_agent() is the core loop. It sends messages to the model, checks if the response contains tool calls, executes them, and loops until the model returns a plain text response.
  2. execute_tool() is the dispatcher. It maps tool names to Python functions, parses JSON arguments, and handles errors.
  3. Tool functions (get_weather, calculate) are ordinary Python functions. In production, get_weather would call a real API. The calculator uses safe AST-based parsing that only allows numeric literals and whitelisted math operations.
  4. Tool definitions tell the model what tools exist. The descriptions and parameter schemas are what the model reads to decide when and how to use each tool.

The agent handles multi-step tasks naturally. If asked “What’s the average temperature in London and Tokyo?”, it will:

  1. Call get_weather("London") and get_weather("Tokyo") (possibly in parallel)
  2. Receive both results
  3. Call calculate("(14 + 22) / 2")
  4. Receive the result
  5. Formulate a natural language answer

No explicit orchestration code. The model decides the sequence.

Adding Memory and Context

The agent above is stateless — each call to run_agent() starts fresh. Real agents need memory.

Short-Term Memory: Conversation History

The simplest form of memory is keeping the message list between calls:

class Agent:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]
        self.client = openai.OpenAI(
            base_url="https://api.openai.com/v1",
            api_key="your-api-key"
        )

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        for _ in range(MAX_ITERATIONS):
            response = self.client.chat.completions.create(
                model=MODEL,
                messages=self.messages,
                tools=tools,
            )

            message = response.choices[0].message

            if not message.tool_calls:
                self.messages.append({"role": "assistant", "content": message.content})
                return message.content

            self.messages.append(message.model_dump())

            for tool_call in message.tool_calls:
                result = execute_tool(tool_call.function.name, tool_call.function.arguments)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })

        return "Max iterations reached."

Now the agent remembers previous exchanges:

agent = Agent("You are a helpful weather assistant.")
print(agent.chat("What's the weather in London?"))
# -> "It's 14 degrees C and cloudy in London."

print(agent.chat("How about Tokyo?"))
# -> "Tokyo is 22 degrees C and partly cloudy." (remembers context)

print(agent.chat("Which city is warmer?"))
# -> "Tokyo is warmer at 22 degrees C compared to London's 14 degrees C."

Long-Term Memory: Persistent Storage

For memory that survives across sessions, store facts externally:

import json
from pathlib import Path


class PersistentMemory:
    def __init__(self, filepath: str = "agent_memory.json"):
        self.filepath = Path(filepath)
        self.data = self._load()

    def _load(self) -> dict:
        if self.filepath.exists():
            return json.loads(self.filepath.read_text())
        return {"facts": [], "preferences": {}}

    def save(self):
        self.filepath.write_text(json.dumps(self.data, indent=2))

    def add_fact(self, fact: str):
        if fact not in self.data["facts"]:
            self.data["facts"].append(fact)
            self.save()

    def set_preference(self, key: str, value: str):
        self.data["preferences"][key] = value
        self.save()

    def get_context(self) -> str:
        lines = []
        if self.data["facts"]:
            lines.append("Known facts: " + "; ".join(self.data["facts"]))
        if self.data["preferences"]:
            prefs = [f"{k}: {v}" for k, v in self.data["preferences"].items()]
            lines.append("User preferences: " + "; ".join(prefs))
        return "\n".join(lines)

Inject the memory context into the system prompt:

memory = PersistentMemory()
memory.set_preference("temperature_units", "fahrenheit")
memory.add_fact("User lives in Berlin")

system_prompt = f"""You are a helpful assistant.

{memory.get_context()}

Use this information to personalize your responses."""

Context Window Management

Long conversations will exceed the model’s context window. Implement a sliding window or summarization strategy:

def trim_messages(messages: list, max_tokens: int = 100000) -> list:
    """Keep the system prompt and most recent messages within token budget."""
    # Always keep the system message
    system_msg = messages[0] if messages[0]["role"] == "system" else None
    other_msgs = messages[1:] if system_msg else messages

    # Rough token estimation (4 chars ~ 1 token)
    total_chars = sum(len(json.dumps(m)) for m in other_msgs)
    max_chars = max_tokens * 4

    while total_chars > max_chars and len(other_msgs) > 2:
        removed = other_msgs.pop(0)
        total_chars -= len(json.dumps(removed))

    if system_msg:
        return [system_msg] + other_msgs
    return other_msgs

Multi-Tool Orchestration

Real agents need more than two tools. Here is a pattern for cleanly managing many tools:

Multi-tool orchestration architecture — an agent controller manages rate limiting and token budgets while the agent loop dispatches calls to a registry of specialized tools in parallel

from typing import Callable
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    function: Callable


class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, parameters: dict):
        """Decorator to register a function as a tool."""
        def decorator(func: Callable):
            self._tools[name] = Tool(
                name=name,
                description=description,
                parameters=parameters,
                function=func,
            )
            return func
        return decorator

    def get_openai_tools(self) -> list[dict]:
        """Return tool definitions in OpenAI format."""
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters,
                },
            }
            for tool in self._tools.values()
        ]

    def execute(self, name: str, arguments: str) -> str:
        """Execute a tool by name with JSON arguments."""
        if name not in self._tools:
            return json.dumps({"error": f"Unknown tool: {name}"})
        try:
            args = json.loads(arguments)
            result = self._tools[name].function(**args)
            return json.dumps(result)
        except Exception as e:
            return json.dumps({"error": str(e)})


# Usage
registry = ToolRegistry()


@registry.register(
    name="search_web",
    description="Search the web for current information",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"],
    },
)
def search_web(query: str) -> dict:
    # Implement with a real search API
    return {"results": [f"Result for: {query}"]}


@registry.register(
    name="read_file",
    description="Read the contents of a file",
    parameters={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to read"}
        },
        "required": ["path"],
    },
)
def read_file(path: str) -> dict:
    try:
        content = Path(path).read_text()
        return {"path": path, "content": content}
    except Exception as e:
        return {"error": str(e)}

This registry pattern scales to dozens of tools. The agent code itself does not change — it just passes registry.get_openai_tools() to the API and calls registry.execute() for each tool call.

The ReAct Pattern

ReAct (Reasoning + Acting) structures the agent’s behavior so its reasoning is explicit. Instead of the model silently deciding to call a tool, it first outputs its thought process.

Implementation

The key is in the system prompt:

REACT_SYSTEM_PROMPT = """You are an AI assistant that solves problems step by step.

For each step, follow this exact format:

Thought: [Your reasoning about what to do next]
Action: [The tool to call, or "finish" if you have the answer]

When you have enough information to answer the user's question, respond with:

Thought: [Your final reasoning]
Answer: [Your final response to the user]

Always think before acting. Never call a tool without explaining why first."""

With this prompt, the model’s responses become transparent:

User: Is it warmer in London or Tokyo right now?

Thought: I need to check the weather in both cities to compare temperatures.
I'll start by getting London's weather.
Action: get_weather({"city": "London"})

[Result: {"temperature_celsius": 14, ...}]

Thought: London is 14 degrees C. Now I need Tokyo's temperature.
Action: get_weather({"city": "Tokyo"})

[Result: {"temperature_celsius": 22, ...}]

Thought: London is 14 degrees C and Tokyo is 22 degrees C. Tokyo is clearly warmer.
Answer: Tokyo is warmer right now at 22 degrees C, compared to London's 14 degrees C
 — a difference of 8 degrees.

Why ReAct Matters

  1. Debuggability: You can see exactly why the agent made each decision.
  2. Reliability: Forcing the model to think before acting reduces impulsive or incorrect tool calls.
  3. Auditability: For production systems, the thought chain provides a log of the agent’s reasoning.
  4. Correction: If the model’s reasoning is wrong, you can detect it before the action executes.

The tradeoff is token usage — the explicit reasoning adds output tokens. For cost-sensitive applications, you may want to use ReAct during development and switch to a more concise format in production.

Error Handling and Guardrails

Production agents need robust error handling. An unguarded agent will happily loop forever, exhaust your API budget, or return nonsensical results.

Iteration Limits and Token Budgets

MAX_ITERATIONS = 15


def run_agent_safe(user_message: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

    total_tokens = 0

    for iteration in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
        )

        # Track token usage
        if response.usage:
            total_tokens += response.usage.total_tokens

        # Budget guard
        if total_tokens > 50000:
            return (
                "I've used a large number of tokens on this task. "
                "Here's what I've found so far: "
                + summarize_findings(messages)
            )

        message = response.choices[0].message

        if not message.tool_calls:
            return message.content

        messages.append(message.model_dump())

        for tool_call in message.tool_calls:
            result = execute_tool_with_timeout(
                tool_call.function.name,
                tool_call.function.arguments,
                timeout_seconds=30,
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "I wasn't able to complete this task within the allowed steps."

Tool Execution Timeouts

import signal


class ToolTimeout(Exception):
    pass


def execute_tool_with_timeout(name: str, arguments: str, timeout_seconds: int = 30) -> str:
    """Execute a tool with a timeout guard."""
    def handler(signum, frame):
        raise ToolTimeout(f"Tool {name} timed out after {timeout_seconds}s")

    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout_seconds)

    try:
        result = execute_tool(name, arguments)
        return result
    except ToolTimeout:
        return json.dumps({"error": f"Tool execution timed out after {timeout_seconds} seconds"})
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

Input Validation

Always validate tool arguments before execution:

def execute_tool_validated(name: str, arguments: str) -> str:
    if name not in TOOL_FUNCTIONS:
        return json.dumps({"error": f"Unknown tool: {name}"})

    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "Invalid JSON arguments"})

    # Type checking and sanitization per tool
    if name == "read_file" and "path" in args:
        path = args["path"]
        # Block path traversal attacks
        if ".." in path or path.startswith("/etc") or path.startswith("/root"):
            return json.dumps({"error": "Path not allowed"})

    try:
        result = TOOL_FUNCTIONS[name](**args)
        return json.dumps(result)
    except TypeError as e:
        return json.dumps({"error": f"Invalid arguments: {str(e)}"})
    except Exception as e:
        return json.dumps({"error": f"Execution error: {str(e)}"})

Retry Logic

When a tool call fails, give the model a chance to self-correct:

# The tool returns an error
result = execute_tool(tool_call.function.name, tool_call.function.arguments)

# Add the error result to messages
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result,  # Contains {"error": "..."}
})

# The model sees the error and can try a different approach
# No special retry code needed — the agent loop handles it naturally

This is one of the elegant properties of the agent loop: errors are just tool results. The model reads the error, adjusts its approach, and tries again. The iteration limit prevents infinite retries.

Production Deployment Considerations

Moving from a script to a production agent involves several additional concerns.

Async Execution

Production agents should use async I/O to handle concurrent requests:

import asyncio
import openai


async def run_agent_async(user_message: str) -> str:
    client = openai.AsyncOpenAI(
        base_url="https://api.openai.com/v1",
        api_key="your-api-key"
    )

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

    for _ in range(MAX_ITERATIONS):
        response = await client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
        )

        message = response.choices[0].message

        if not message.tool_calls:
            return message.content

        messages.append(message.model_dump())

        # Execute tool calls concurrently
        async def execute_and_format(tc):
            result = await asyncio.to_thread(
                execute_tool, tc.function.name, tc.function.arguments
            )
            return {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            }

        tool_results = await asyncio.gather(
            *[execute_and_format(tc) for tc in message.tool_calls]
        )
        messages.extend(tool_results)

    return "Max iterations reached."

Structured Logging

Log every step of the agent loop for debugging and monitoring:

import logging
import time

logger = logging.getLogger("agent")


def run_agent_logged(user_message: str, request_id: str) -> str:
    start_time = time.time()
    total_tokens = 0

    logger.info("Agent started", extra={
        "request_id": request_id,
        "user_message": user_message[:200],
    })

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

    for iteration in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
        )

        if response.usage:
            total_tokens += response.usage.total_tokens

        message = response.choices[0].message

        if not message.tool_calls:
            elapsed = time.time() - start_time
            logger.info("Agent completed", extra={
                "request_id": request_id,
                "iterations": iteration + 1,
                "total_tokens": total_tokens,
                "elapsed_seconds": round(elapsed, 2),
            })
            return message.content

        for tool_call in message.tool_calls:
            logger.info("Tool call", extra={
                "request_id": request_id,
                "iteration": iteration + 1,
                "tool": tool_call.function.name,
                "arguments": tool_call.function.arguments,
            })

        messages.append(message.model_dump())
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call.function.name, tool_call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Max iterations reached."

API Endpoint Considerations

For production agent deployments, your choice of API endpoint matters more than in simple chat applications. Agents make multiple sequential API calls, so latency compounds. They generate more tokens, so pricing differences are amplified. And they depend on reliable tool calling, which varies by provider.

If you are running agents that need access to multiple model families — say, Claude for complex reasoning steps and GPT-4o-mini for simple classification sub-tasks — an aggregation platform simplifies the architecture. Rather than managing multiple API clients with different authentication, you use a single endpoint. Platforms like Ofox.ai provide this unified access through an OpenAI-compatible interface, which means your agent code works without modification regardless of which underlying model handles each request.

Rate Limiting and Queuing

Agents can generate bursts of API calls. Implement client-side rate limiting:

import asyncio
from asyncio import Semaphore


class RateLimiter:
    def __init__(self, max_concurrent: int = 5, requests_per_minute: int = 60):
        self.semaphore = Semaphore(max_concurrent)
        self.rpm = requests_per_minute
        self.interval = 60.0 / requests_per_minute
        self._last_request = 0.0

    async def acquire(self):
        await self.semaphore.acquire()
        now = asyncio.get_event_loop().time()
        wait_time = self._last_request + self.interval - now
        if wait_time > 0:
            await asyncio.sleep(wait_time)
        self._last_request = asyncio.get_event_loop().time()

    def release(self):
        self.semaphore.release()

Framework Comparison

You do not need a framework to build agents — this entire guide uses only the OpenAI Python SDK. But frameworks can accelerate development for complex use cases. Here is an honest comparison.

Raw API (What This Guide Uses)

Pros: Full control, no dependencies beyond the SDK, easy to debug, no abstraction overhead, works with any OpenAI-compatible endpoint.

Cons: You build everything yourself — memory, tool management, error handling, observability.

Best for: Teams that want to understand exactly what their agent does, simple to moderate complexity agents, situations where framework abstractions get in the way.

LangChain / LangGraph

Pros: Massive ecosystem of integrations (vector stores, document loaders, output parsers), LangGraph handles complex stateful workflows, large community.

Cons: Heavy abstraction layers make debugging difficult, frequent breaking changes between versions, “framework lock-in” where your code becomes deeply coupled to LangChain patterns, excessive boilerplate for simple tasks.

Best for: Complex multi-step workflows with branching logic, teams that need lots of pre-built integrations, RAG applications.

CrewAI

Pros: Multi-agent orchestration is first-class, role-based agent design is intuitive, good for simulating team workflows.

Cons: Opinionated architecture that does not fit all use cases, less mature than LangChain, debugging multi-agent interactions is inherently complex.

Best for: Multi-agent systems where agents have distinct roles (researcher, writer, reviewer), complex tasks that benefit from decomposition into agent teams.

Claude Agent SDK (Anthropic)

Pros: Native integration with Claude models, designed for Anthropic’s tool use patterns, clean API design.

Cons: Anthropic-only (no OpenAI or Gemini models), newer and less battle-tested, smaller ecosystem.

Best for: Teams committed to Claude, projects that need deep integration with Anthropic’s specific features like extended thinking.

OpenAI Agents SDK

Pros: First-party support from OpenAI, well-integrated with GPT models, supports handoffs between agents.

Cons: OpenAI-centric, ties you to their model ecosystem, newer framework still evolving.

Best for: Teams using GPT models, applications that need multi-agent handoffs with guaranteed format compatibility.

Comparison Table

FeatureRaw APILangChainCrewAIClaude SDKOpenAI SDK
Learning curveLowHighMediumLowLow
DebuggingEasyHardMediumEasyEasy
Multi-model supportAnyAnyAnyClaude onlyGPT only
Multi-agentManualLangGraphNativeManualNative
DependenciesMinimalHeavyModerateLightLight
Production readinessYou build itMatureGrowingGrowingGrowing

The Pragmatic Recommendation

Start with the raw API approach from this guide. Build your first agent without any framework. You will understand every line of code, every API call, and every decision point. When (and if) you hit a wall — say, you need complex multi-agent orchestration or dozens of pre-built tool integrations — evaluate frameworks based on the specific capability you need, not on marketing promises.

Many production agents at serious companies run on fewer than 200 lines of code, using nothing more than the OpenAI SDK and standard Python libraries. Complexity is not a feature.

Putting It All Together

Here is the architecture of a production-ready agent, incorporating everything from this guide:

User Request
    |
    v
Agent Controller
  - Rate limiting
  - Token budgeting
  - Iteration limits
  - Structured logging
    |
    v
Agent Loop (Observe -> Think -> Act)
    |
    v
Tool Registry
  [Weather] [Search] [Files]
  [Math]    [DB]     [API]
    |
    v
Memory Layer
  - Conversation history (short-term)
  - Persistent storage (long-term)
  - Context window management

The agent controller handles cross-cutting concerns (limits, logging, rate limiting). The agent loop is the core observe-think-act cycle. The tool registry manages available tools. The memory layer handles state across turns and sessions.

This architecture scales from a hobby project to a production service. The only thing that changes between them is the robustness of each layer — more comprehensive logging, stricter guardrails, better monitoring, and more reliable tool implementations.

Next Steps

Building your first agent is the beginning, not the end. Here are directions to explore:

  1. Add real tools: Replace mock data with actual API integrations — weather APIs, search engines, databases, file systems.
  2. Implement streaming: Stream agent responses to users as they are generated, rather than waiting for the full loop to complete.
  3. Build a web interface: Wrap your agent in a FastAPI or Flask server and connect it to a frontend.
  4. Add evaluation: Build a test suite that runs your agent against known tasks and measures success rate, token usage, and latency.
  5. Explore multi-agent systems: Create specialized agents that hand off to each other — a router agent that delegates to a researcher agent, a writer agent, and a reviewer agent.

The code in this guide is complete, working, and production-oriented. Copy it, modify it, break it, and rebuild it. That is the fastest path to understanding AI agents deeply enough to build them with confidence.