Contents
- How to Build an AI Agent
- Core Agent Architecture
- Framework Comparison
- Building with LangChain
- CrewAI for Multi-Agent Systems
- AutoGen: Conversation Patterns
- Claude Agent SDK
- Tool Integration and Function Calling
- Memory and State Management
- Cost Optimization for Agent Inference
- FAQ
- Sources
How to Build an AI Agent
AI agents have three layers: an LLM that reasons, a tool execution environment, and orchestration logic. Most agents follow the same cycle: observe state → reason → pick tools → execute → loop.
Minimum requirements:
- LLM with function calling (Claude, GPT-4, Gemini)
- Tool definitions
- Memory for context
- State machine for loops
- Cost tracking
Core Agent Architecture
The Agent Loop
All agents share a common execution pattern. The loop repeats until termination conditions are met. For detailed framework comparisons, see agentic AI frameworks.
Observe → Reason → Plan → Execute → Reflect → Loop
Observe means reading the current state. This includes task description, previous outputs, tool results, and available actions.
Reason is the LLM generating thoughts and decisions based on observations. Quality of reasoning depends heavily on context window size and model capabilities.
Plan involves determining which tools to call next, with what parameters. Function calling syntax varies by framework.
Execute runs the selected tool in a sandboxed environment. Results become inputs for the next observation phase.
Reflect allows the agent to evaluate whether progress is happening. Some frameworks include explicit reflection steps.
The loop continues until the agent outputs a final answer or hits iteration limits.
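This loop can be sketched in plain Python. Here `fake_llm`, the `TOOLS` registry, and `run_loop` are illustrative stand-ins, not part of any framework:

```python
# Illustrative agent loop: observe -> reason/plan -> execute -> loop.
# fake_llm stands in for a real model call; TOOLS is a toy registry.

def fake_llm(observations):
    """Stub reasoner: call the search tool once, then finish."""
    if not any(o.startswith("tool:") for o in observations):
        return {"action": "search", "input": "weather"}
    return {"action": "final", "answer": "done"}

TOOLS = {"search": lambda q: f"results for {q}"}

def run_loop(task, max_iterations=5):
    observations = [task]                      # Observe: initial state
    for _ in range(max_iterations):
        decision = fake_llm(observations)      # Reason + Plan
        if decision["action"] == "final":
            return decision["answer"]          # Termination condition
        result = TOOLS[decision["action"]](decision["input"])  # Execute
        observations.append(f"tool:{result}")  # Observe: tool result
    return "iteration limit reached"
```

A real agent replaces `fake_llm` with a model call and `TOOLS` with function-calling definitions; the control flow stays the same.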
Context Window Implications
Context window size directly impacts agent effectiveness. Longer conversations accumulate tool results, previous reasoning traces, and intermediate outputs.
Claude Sonnet 4.6 provides 200k context (as of March 2026), supporting extended reasoning chains. Smaller models like GPT-4.1 still offer 128k context. For long-running agents processing many tool results, larger context windows reduce information loss between iterations.
Cost scales with context usage. An agent processing 100 tool calls and accumulating 150k tokens of conversation costs more than the same task structured to minimize context. Strategic context management becomes critical for production deployments.
Memory Architecture
Agents need multiple memory layers. Working memory holds the current task and recent interactions. This refreshes each iteration. Long-term memory stores facts learned from previous tool results, applicable across runs.
Simple implementations use system prompts to inject context. Sophisticated systems maintain vector databases of past interactions, retrieving relevant memories when needed.
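A sketch of the system-prompt approach; `build_system_prompt` is a hypothetical helper, and the retrieved memories would come from whatever store the system uses:

```python
def build_system_prompt(base: str, memories: list) -> str:
    """Inject retrieved long-term memories into the system prompt."""
    if not memories:
        return base
    facts = "\n".join(f"- {m}" for m in memories)
    return f"{base}\n\nRelevant facts from past sessions:\n{facts}"
```

The same injection point works whether memories come from a flat file or a vector database; only the retrieval step changes.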
Framework Comparison
Four major frameworks dominate the agent space.
| Framework | Best For | Learning Curve | Tool Integration | Memory Support |
|---|---|---|---|---|
| LangChain | Prototyping, chains | Low-Medium | Excellent | Basic |
| CrewAI | Multi-agent collab | Medium | Good | Built-in |
| AutoGen | Conversation flows | Medium-High | Very Good | External |
| Claude SDK | Advanced agents | High | Native | Custom |
LangChain dominates for tutorials and quick prototypes. It abstracts common patterns into composable objects. Tool calling works well. The framework feels natural for developers coming from traditional software engineering.
CrewAI emphasizes role-based agents. Each agent has a persona, goal, and backstory. This structure works well for systems where multiple agents cooperate toward shared objectives. The framework handles inter-agent communication.
AutoGen treats agents as conversation participants. Agents exchange messages until consensus is reached or a termination condition triggers. This works well for reasoning tasks where dialogue between perspectives improves outcomes.
Claude Agent SDK provides low-level control. It exposes the raw patterns without heavy abstraction. This appeals to teams building novel architectures or needing precise behavior control.
Building with LangChain
Setup and Basic Structure
```python
from langchain.agents import initialize_agent, Tool
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4.1", temperature=0)
memory = ConversationBufferMemory()

tools = [
    Tool(
        name="Search",
        func=web_search,  # your search function
        description="Search the web for current information"
    ),
    Tool(
        name="Calculator",
        func=calculate,  # your math function
        description="Perform mathematical operations"
    )
]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    memory=memory,
    verbose=True
)
```
The agent type determines reasoning strategy. "zero-shot-react-description" uses the ReAct pattern: Reasoning, Action, Observation. It works without training examples.
Tool Definition
Tools map natural language to functions. Quality descriptions matter significantly. Vague descriptions lead to incorrect tool selection.
```python
def search_financial_data(query: str) -> str:
    """
    Search financial databases for stock prices, earnings, and market data.

    Args:
        query: Specific question about financial data, e.g., 'AAPL earnings 2024'

    Returns:
        Relevant financial information with dates and sources
    """
    # Implementation
    pass

tool = Tool(
    name="FinancialSearch",
    func=search_financial_data,
    description="Search financial databases for stock prices, earnings, and market data. Use this for investment research questions."
)
```
Good descriptions specify use cases and expected input formats. This reduces hallucination and incorrect selections.
Agent Execution
```python
response = agent.run("What is the market cap of Apple?")
```
During execution, the framework logs each decision. Intermediate reasoning steps appear in verbose output. Tool results flow back into context for the next decision cycle.
Cost tracking requires monitoring API calls, and LangChain provides callback hooks for this.

```python
from langchain.callbacks.base import BaseCallbackHandler

class CostTracker(BaseCallbackHandler):
    def __init__(self):
        self.total_tokens = 0

    def on_llm_end(self, response, **kwargs):
        usage = response.llm_output.get("token_usage", {})
        self.total_tokens += usage.get("total_tokens", 0)
```
CrewAI for Multi-Agent Systems
Agent Definition
CrewAI structures agents with explicit roles.
```python
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Research Analyst",
    goal="Provide accurate, well-researched information",
    backstory="Expert analyst with 10 years of experience",
    tools=[web_search_tool, database_tool]
)

writer = Agent(
    role="Technical Writer",
    goal="Explain complex concepts clearly",
    backstory="Former software engineer who writes documentation",
    tools=[formatting_tool, template_tool]
)
```
Roles shape reasoning behavior. The agent considers its persona when generating responses.
Task Creation and Orchestration
```python
research_task = Task(
    description="Research best practices for API design",
    agent=researcher,
    expected_output="Comprehensive report with citations"
)

writing_task = Task(
    description="Write a blog post about API design best practices",
    agent=writer,
    expected_output="2000-word blog post in markdown"
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential
)

result = crew.kickoff()
```
Sequential processing ensures the researcher completes before the writer begins. Hierarchical processing assigns a manager agent to coordinate other agents.
Multi-Agent Reasoning
When multiple agents collaborate, each brings different expertise. The framework handles context passing between agents. Earlier agents' outputs become inputs for downstream agents.
This pattern works well for complex tasks decomposing into specialized subtasks. A research agent gathers information. An analyst agent evaluates findings. A writer agent produces final output.
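The research → analyze → write handoff can be sketched as a plain function pipeline, with hypothetical `researcher` and `writer` stages standing in for real agents:

```python
def researcher(task: str) -> str:
    # Stand-in for an agent that gathers information
    return f"notes on {task}"

def writer(notes: str) -> str:
    # Stand-in for an agent that drafts the final output
    return f"draft based on {notes}"

def run_pipeline(task, stages):
    """Each stage's output becomes the next stage's input."""
    output = task
    for stage in stages:
        output = stage(output)
    return output
```

Sequential crews are exactly this shape, with each stage backed by an LLM call instead of a string format.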
AutoGen: Conversation Patterns
Agent Registration
```python
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4.1", "api_key": "your-key"}]

assistant = AssistantAgent(
    name="Assistant",
    llm_config={"config_list": config_list}
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER"
)
```
AutoGen distinguishes between assistant agents (LLM-powered) and user proxy agents (represent users). Proxy agents can approve actions or provide human feedback.
Conversation Loop
```python
user_proxy.initiate_chat(
    assistant,
    message="Analyze this dataset and provide insights"
)
```
The framework manages message exchange until a termination condition is met. Termination conditions include reply limits, iteration counts, or an explicit termination phrase (such as "TERMINATE") in a message.
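The exchange-until-termination pattern can be sketched without the framework. Both agents here are stub functions, and "TERMINATE" is an illustrative stop phrase:

```python
def chat(opening: str, agents, max_turns: int = 6):
    """Alternate between agents until one emits TERMINATE or turns run out."""
    transcript = [opening]
    message = opening
    for turn in range(max_turns):
        message = agents[turn % len(agents)](message)
        transcript.append(message)
        if "TERMINATE" in message:
            break
    return transcript

def assistant(msg: str) -> str:
    # Stub assistant: finishes once it sees data, otherwise asks for more
    return "analysis complete TERMINATE" if "data" in msg else "need more detail"

def user_proxy(msg: str) -> str:
    # Stub proxy: always supplies the data
    return "here is the data"
```

Without an explicit stop phrase, the `max_turns` cap is what prevents two agents from politely thanking each other forever.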
Tool Use in AutoGen
```python
from autogen import register_function

def analyze_data(file_path: str) -> str:
    # Analysis implementation
    return "Dataset contains X records..."

register_function(
    analyze_data,
    caller=assistant,      # the LLM agent that proposes the call
    executor=user_proxy,   # the agent that actually runs the function
    description="Analyze a CSV file and return statistical summaries"
)
```
Tools are functions registered on specific agents. When an agent decides to use a tool, it calls the function and receives results.
Claude Agent SDK
The Claude Agent SDK from Anthropic provides native support for agentic patterns. Unlike wrapper frameworks, it exposes the core agent logic directly. Learn more about AI agent frameworks and MCP server integration.
Basic Implementation
```python
from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state"
                }
            },
            "required": ["location"]
        }
    }
]

messages = []

def run_agent(user_message: str):
    messages.append({"role": "user", "content": user_message})
    while True:
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        if response.stop_reason == "tool_use":
            # Echo the assistant turn once, then return one result per tool call
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for content_block in response.content:
                if content_block.type == "tool_use":
                    result = execute_tool(content_block.name, content_block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": content_block.id,
                        "content": result
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
            # Agent reached a conclusion
            return response.content[-1].text
```
The SDK returns structured tool use objects. Developers explicitly handle tool results and continue the conversation loop.
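The loop above assumes an `execute_tool` helper. A minimal dispatcher might look like this; the weather handler is a placeholder, not a real integration:

```python
# Map tool names (as declared in the tools list) to handler functions.
TOOL_HANDLERS = {
    "get_weather": lambda args: f"Sunny in {args['location']}",  # placeholder
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Route a tool_use block to its handler; report unknown tools as errors."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    return handler(tool_input)
```

Returning an error string rather than raising lets the model see the failure and recover on the next iteration.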
Advantages of Low-Level Control
Direct control enables custom termination logic, specialized caching strategies, and precise cost monitoring. Developers see exactly what happens at each iteration.
The trade-off is verbosity. Simple agents require more code than with LangChain.
Tool Integration and Function Calling
Designing Tool Functions
Effective tools have clear boundaries. A weather tool should return weather. A calculator tool should do math. Mixing responsibilities confuses the agent.
```python
# Good: one clear responsibility
def calculate_compound_interest(principal: float, rate: float, years: int) -> str:
    """Calculate compound interest using the standard formula."""
    result = principal * (1 + rate) ** years
    return f"${result:.2f}"

# Bad: mixed responsibilities behind an ambiguous "action" flag
def financial_analysis(principal: float, rate: float, years: int, action: str) -> str:
    """Do financial stuff."""
    if action == "compound":
        ...  # calculate compound interest
    elif action == "simple":
        ...  # calculate simple interest
    # etc.
```
Function Calling Patterns
Different models have different function calling syntax. Claude uses tool_use blocks. GPT models use function_call format. Gemini uses function_declaration syntax.
Most frameworks abstract these differences. Developers define tools once, and the framework handles model-specific formatting.
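A sketch of that abstraction: one neutral tool spec converted into OpenAI-style and Claude-style shapes. The field names follow the public function-calling formats, but treat this as illustrative rather than exhaustive:

```python
def to_openai(tool: dict) -> dict:
    """OpenAI-style tools entry: nested under a 'function' key."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["schema"],
        },
    }

def to_claude(tool: dict) -> dict:
    """Claude-style tool entry: flat, with 'input_schema'."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["schema"],
    }
```

Frameworks keep a neutral representation like `tool` internally and apply the right converter per provider.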
Tool Result Integration
After a tool executes, results must flow back into the agent context. Different frameworks handle this differently.
LangChain's agent automatically incorporates results into the next observation. AutoGen requires explicit result messages. The Claude SDK requires manual message construction.
This difference affects code structure. With LangChain, the framework handles orchestration. With the Claude SDK, developers orchestrate explicitly.
Streaming Tool Results
For long-running tools (database queries, API calls), streaming provides feedback before completion.
```python
import time

def search_large_dataset(query: str):
    for result in fetch_results(query):  # fetch_results: your data source
        yield f"Found: {result}"
        time.sleep(0.1)
```
Some frameworks buffer entire results. Others support streaming. Check the framework documentation for streaming support.
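When a framework only accepts complete strings, a small adapter can drain the generator first. `collect_stream` here is a hypothetical helper, not a framework API:

```python
def collect_stream(stream, limit=None):
    """Drain a streaming tool into one string, optionally capping chunk count."""
    chunks = []
    for chunk in stream:
        chunks.append(chunk)
        if limit is not None and len(chunks) >= limit:
            break
    return "\n".join(chunks)
```

The `limit` cap also doubles as crude tool-result pruning for very chatty sources.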
Memory and State Management
Conversation History
The simplest memory is conversation history. Every message since task start gets included in context. This works for brief interactions but becomes expensive as conversations grow.
Claude Sonnet 4.6 supports 200k token context. GPT-4.1 offers 128k. This is substantial, but 100 tool calls with detailed results can consume half the context window.
Hierarchical Memory
Sophisticated agents use multiple memory tiers:
- Working Memory: Current task, recent interactions, immediate context (last 5-10 turns)
- Session Memory: Full conversation history for the current session
- Long-term Memory: Facts and learnings persisted across sessions
Working memory appears in the prompt. Session memory is available for retrieval if needed. Long-term memory lives in external storage.
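A sketch of the working-memory tier, with overflow spilling into a session archive; the class and field names are illustrative:

```python
from collections import deque

class WorkingMemory:
    """Keep the last `capacity` turns in the prompt; archive the rest."""
    def __init__(self, capacity: int = 10):
        self.turns = deque(maxlen=capacity)
        self.archive = []  # stands in for session memory

    def add(self, turn: str):
        if len(self.turns) == self.turns.maxlen:
            self.archive.append(self.turns[0])  # spill oldest turn
        self.turns.append(turn)

    def prompt_window(self):
        return list(self.turns)
```

Only `prompt_window()` is sent to the model; archived turns remain available for retrieval.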
Vector Database Integration
For long-term memory, vector databases (Pinecone, Weaviate, Milvus) store semantic representations of past interactions.
When starting a new task, the agent retrieves relevant memories using semantic search:
```python
previous_insights = vector_db.search(
    query=current_task,
    top_k=5
)
```
This pattern works well for multi-session agents that learn over time.
State Serialization
Production agents need to pause and resume. Serializing state requires capturing:
- Agent reasoning trace
- Tool results
- Memory snapshots
- Current task
JSON provides a simple serialization format:
```json
{
  "task_id": "analyze_sales_data_2026-03",
  "state": "awaiting_tool_result",
  "pending_tool": "database_query",
  "context_length": 45000,
  "messages": [...],
  "session_memory": {...}
}
```
On resume, deserialize state and continue the agent loop from where it paused.
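For simple cases a round-trippable state container is enough. This dataclass is a sketch mirroring that layout, not any framework's API:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class AgentState:
    task_id: str
    state: str
    pending_tool: Optional[str] = None
    messages: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AgentState":
        return cls(**json.loads(raw))
```

Anything in `messages` must itself be JSON-serializable; SDK response objects usually need converting to plain dicts before saving.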
Cost Optimization for Agent Inference
Token Accounting
Each agent iteration consumes tokens for:
- Prompt tokens (initial instruction + conversation history)
- Completion tokens (agent reasoning)
- Tool results (added to context)
As conversations grow, prompt tokens dominate. A 100-turn conversation with 150k context is expensive regardless of completion length.
Example cost for agents using Claude Sonnet 4.6 (as of March 2026):
- Input: $3 per 1M tokens
- Output: $15 per 1M tokens
A 150k-token conversation with 5k output tokens costs (150,000 × $3 + 5,000 × $15) / 1,000,000 ≈ $0.53 per iteration.
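The arithmetic generalizes into a small helper; the default rates are the example prices quoted above:

```python
def iteration_cost(input_tokens: int, output_tokens: int,
                   input_rate: float = 3.0, output_rate: float = 15.0) -> float:
    """Dollar cost of one call; rates are dollars per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

Summing this per iteration makes context growth visible: as history accumulates, `input_tokens` climbs while `output_tokens` stays roughly flat.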
Cost Reduction Strategies
Strategy 1: Summarization. After every 10 turns, summarize conversation into a single paragraph. Replace the full history with the summary. This keeps context bounded.
Strategy 2: Selective History. Include only the last N messages in context. Older messages get archived.
Strategy 3: Model Routing. Use a smaller, cheaper model (GPT-5 Mini at $0.25/$2) for routine tasks. Escalate to Claude Sonnet 4.6 only when reasoning complexity increases.
Strategy 4: Batch Processing. If multiple similar tasks exist, batch them. A single agent iteration processing 10 similar queries amortizes fixed costs.
Strategy 5: Tool Result Pruning. Tool results often contain noise. Extract only relevant fields before adding to context.
```python
full_result = {
    "status": 200,
    "data": [...],
    "metadata": {...},
    "debug_info": {...}
}

# Keep only the payload the agent actually needs
relevant_result = json.dumps(full_result["data"])
```
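Strategy 1 can be sketched as a compaction step run between iterations. `summarize` is a stand-in for an LLM summarization call:

```python
def compact_history(messages, summarize, threshold=10):
    """Replace a long history with a single summary message (Strategy 1)."""
    if len(messages) < threshold:
        return messages
    summary = summarize(messages)
    return [{"role": "system", "content": f"Conversation so far: {summary}"}]
```

The threshold trades cost against fidelity: compact too eagerly and the agent forgets details it still needs.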
Comparing Framework Costs
All frameworks eventually call the same LLM API. Cost differences come from:
- Token counting accuracy (some frameworks overestimate)
- Internal retries (failed requests)
- Middleware overhead (negligible for most)
For cost-sensitive applications, prefer frameworks with transparent token accounting. The Claude SDK exposes token usage directly. LangChain requires callbacks.
FAQ
Q: What model should I use for agentic applications?
Claude Sonnet 4.6 and GPT-4.1 are the current standards. Both support 128k+ context and reliable tool use. For simple agents, GPT-5 Mini ($0.25/$2) handles basic tasks. Larger models improve reasoning quality but increase cost.
Q: How do I prevent infinite loops?
Set hard iteration limits. Most frameworks support a max_iterations parameter. Also track the iteration count yourself and exit if the agent is stuck.
```python
if iteration_count > max_iterations:
    return "Task exceeded maximum iterations"
```
Q: Should I use an agent or a chain for this task?
Chains execute predetermined sequences. Agents reason about next steps. If the task has a known, fixed sequence, use chains. If the task requires dynamic decision-making, use agents. Many real-world tasks use both: an agent coordinates multiple chains.
Q: How does multi-agent reasoning compare to single-agent?
Multiple agents with different perspectives often reach better conclusions through debate. The overhead is higher (more API calls, longer execution). Use multi-agent systems when reasoning quality justifies the cost.
Q: Can agents access private data?
Yes. Tools can call internal APIs, databases, or file systems. Ensure proper authentication and access controls. Never expose credentials in tool definitions.
Q: What about latency and real-time requirements?
Agents introduce latency. Each reasoning step requires an LLM call. For low-latency applications, consider pre-computed responses or caching tool results. Streaming support in agent SDKs also helps with perceived latency.
Sources
- Anthropic Claude Agent SDK Documentation
- LangChain Documentation
- CrewAI Documentation
- Microsoft AutoGen Documentation
- OpenAI Function Calling Guide