Custom Tools
The built-in tools cover filesystem operations, but real agents need domain-specific capabilities. In this module, you’ll learn how to create custom tools that extend your agent’s abilities with specialized functions tailored to your application’s needs.
Exercise 1: Simple Tool
The @tool decorator from langchain_core makes it easy to turn any Python function into a tool the agent can use. Let’s start with a basic example.
Create a simple word-counting tool:

```bash
cat > simple_tool.py << 'EOF'
import os
from langchain_core.tools import tool
from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

@tool
def word_count(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

agent = create_deep_agent(
    model=MODEL,
    tools=[word_count],
)

result = agent.invoke({"messages": [("user",
    "How many words are in this sentence: "
    "'The quick brown fox jumps over the lazy dog'"
)]})
print(agent_response(result))
EOF
```
Run it to see the agent use your custom tool:
```bash
uv run simple_tool.py
```

Sample output (your results may vary):

```
There are 9 words in that sentence.
```
Notice how custom tools are passed via the `tools=[...]` parameter and are added alongside (not replacing) the built-in tools.
Your docstring IS your schema. The @tool decorator uses Pydantic under the hood to convert your function’s type hints and docstring into a JSON schema that the LLM reads to understand how to call your tool. A clear, specific docstring means better tool routing. A vague or missing docstring means the LLM will guess — and guess wrong. Write docstrings as if you’re explaining the tool to a colleague who’s never seen it before.
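To make the "docstring IS your schema" point concrete, here is a simplified, hypothetical sketch of the kind of conversion `@tool` performs. The real implementation uses Pydantic and handles many more cases; `function_to_schema` is an illustration, not the library's code.

```python
import inspect
from typing import get_type_hints

def function_to_schema(fn) -> dict:
    """Build a minimal JSON-schema-like description of a function.

    Simplified illustration of what @tool does under the hood: the
    docstring becomes the tool description, and type hints become
    parameter types.
    """
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not part of the call schema
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            name: {"type": type_names.get(tp, "object")} for name, tp in hints.items()
        },
    }

def word_count(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

schema = function_to_schema(word_count)
print(schema)
# {'name': 'word_count',
#  'description': 'Count the number of words in the given text.',
#  'parameters': {'text': {'type': 'string'}}}
```

Everything the LLM knows about your tool comes from a structure like this, which is why a vague docstring directly degrades tool routing.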
Exercise 2: Tool with Structured Input
For more complex tools, you can use Pydantic V2 models to define structured input schemas with rich field descriptions.
Create a code analysis tool with multiple parameters:

```bash
cat > structured_tool.py << 'EOF'
import os
from pydantic import BaseModel, Field
from langchain_core.tools import tool
from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

class CodeAnalysisInput(BaseModel):
    code: str = Field(description="The Python code to analyze")
    check_types: bool = Field(default=True, description="Whether to check for type hints")
    check_docstrings: bool = Field(default=True, description="Whether to check for docstrings")

@tool(args_schema=CodeAnalysisInput)
def analyze_code(code: str, check_types: bool = True, check_docstrings: bool = True) -> str:
    """Analyze Python code for quality issues like missing type hints and docstrings."""
    issues = []
    # Rough heuristic: annotated functions contain a "->" return annotation
    if check_types and "->" not in code:
        issues.append("No type hints found")
    if check_docstrings and '"""' not in code and "'''" not in code:
        issues.append("No docstrings found")
    if not issues:
        return "Code looks good!"
    return "Issues found: " + ", ".join(issues)

agent = create_deep_agent(
    model=MODEL,
    tools=[analyze_code],
)

result = agent.invoke({"messages": [("user",
    "Analyze this code: def add(a, b): return a + b"
)]})
print(agent_response(result))
EOF
```
Run it to see structured input in action:
```bash
uv run structured_tool.py
```

Sample output (your results may vary):

```
The code has a couple of quality issues:

1. No type hints found - the function parameters and return value lack type annotations
2. No docstrings found - there's no documentation explaining what the function does

Consider adding type hints like `def add(a: int, b: int) -> int:` and a docstring
describing the function's purpose.
```
Pydantic V2 models give the LLM a rich schema with field descriptions, making it easier for the model to understand how to call your tool correctly with all the right parameters.
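Those Field descriptions end up in the JSON schema the model receives. For the `CodeAnalysisInput` model above, the schema is shaped roughly like the hand-written sketch below (trimmed to the relevant keys; the actual output of Pydantic's `model_json_schema()` includes a few more):

```python
# Hand-written sketch of the schema shape; not generated by Pydantic here
code_analysis_schema = {
    "title": "CodeAnalysisInput",
    "type": "object",
    "properties": {
        "code": {
            "type": "string",
            "description": "The Python code to analyze",
        },
        "check_types": {
            "type": "boolean",
            "default": True,
            "description": "Whether to check for type hints",
        },
        "check_docstrings": {
            "type": "boolean",
            "default": True,
            "description": "Whether to check for docstrings",
        },
    },
    "required": ["code"],
}

# Only `code` is required; the booleans have defaults the model may omit
print(sorted(code_analysis_schema["required"]))
```

Each per-field description lands next to its type, which is why Pydantic input schemas route better than a single free-form string argument.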
Exercise 3: Structured Output with Pydantic
You’ve seen structured input (telling the LLM what to pass to a tool). Now let’s get structured output — forcing the agent to return data in a specific Pydantic schema. This is essential when agent responses feed into downstream systems that need predictable formats.
Create a script that gets a structured code review from the agent:

```bash
cat > structured_output.py << 'EOF'
import os
from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

class CodeIssue(BaseModel):
    severity: str = Field(description="CRITICAL, WARNING, or INFO")
    line: str = Field(description="Approximate location in the code")
    issue: str = Field(description="One-sentence description of the problem")
    fix: str = Field(description="One-sentence suggested fix")

class CodeReview(BaseModel):
    summary: str = Field(description="One-sentence overall assessment")
    issues: list[CodeIssue] = Field(description="List of issues found")
    score: int = Field(description="Quality score from 1-10")

# The LangChain way: bind structured output to the model directly,
# bypassing the agent's tool loop
model = init_chat_model(MODEL)
structured_model = model.with_structured_output(CodeReview)

code_to_review = """
def get_user(id):
    data = eval(open(f"users/{id}.json").read())
    return data
"""

review = structured_model.invoke(
    f"Review this Python code and identify all issues:\n\n{code_to_review}"
)

# review is now a CodeReview object, not a string
print(f"Summary: {review.summary}")
print(f"Score: {review.score}/10")
print(f"\nIssues ({len(review.issues)}):")
for issue in review.issues:
    print(f"  [{issue.severity}] {issue.line}: {issue.issue}")
    print(f"  Fix: {issue.fix}")
EOF
```
Run it:
```bash
uv run structured_output.py
```

Sample output (your results may vary):

```
Summary: Critical security vulnerability with multiple code quality issues
Score: 2/10

Issues (3):
  [CRITICAL] eval() call: Using eval() on file contents allows arbitrary code execution
  Fix: Replace eval() with json.load() for safe JSON parsing
  [WARNING] open() without context manager: File handle may not be closed properly
  Fix: Use 'with open(...) as f:' context manager
  [WARNING] No input validation: The id parameter is used directly in a file path
  Fix: Validate and sanitize the id parameter before constructing the path
```
The response is a CodeReview Pydantic object, not a string. You can access review.score, review.issues[0].severity, etc. programmatically. This is how you build agent pipelines where one agent’s output feeds into another’s input with guaranteed structure.
with_structured_output() is a LangChain model feature, not a Deep Agents feature. It works on the model directly (init_chat_model(MODEL).with_structured_output(Schema)), bypassing the agent’s tool loop. Use it when you need guaranteed structured responses — for example, a subagent that returns findings in a specific format for the orchestrator to parse.
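Because the result is typed, downstream code can branch on fields instead of parsing prose. The sketch below uses hypothetical dataclass stand-ins for the Pydantic models above so it runs without a model call; in real pipeline code, `should_block_merge` would receive the actual `CodeReview` object.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Pydantic models, so this sketch runs offline
@dataclass
class CodeIssue:
    severity: str
    line: str
    issue: str
    fix: str

@dataclass
class CodeReview:
    summary: str
    issues: list
    score: int

def should_block_merge(review: CodeReview) -> bool:
    """Block the merge if any issue is CRITICAL or the score is too low."""
    has_critical = any(i.severity == "CRITICAL" for i in review.issues)
    return has_critical or review.score < 5

review = CodeReview(
    summary="Critical security vulnerability",
    issues=[CodeIssue("CRITICAL", "eval() call",
                      "Arbitrary code execution", "Use json.load()")],
    score=2,
)
print(should_block_merge(review))  # True: a CRITICAL issue is present
```

This kind of gate is impossible to write reliably against free-text responses; with structured output it is three lines of ordinary Python.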
Exercise 4: Tool Design Patterns
Before writing more tools, consider these design patterns that make tools more effective:
Tool descriptions are the primary routing signal
The LLM decides which tool to use based primarily on the description in your docstring. Make it count.
Keep descriptions specific and action-oriented
Good: "Calculate the sum of two numbers and return the result"
Bad: "A tool for math"
When to use a tool vs. system prompt instruction
Use tools for:

- Actions with side effects (API calls, file writes)
- Deterministic computations
- External data access

Use system prompts for:

- Formatting preferences
- Tone and style
- General behavioral guidelines
One tool per concern
Don’t create a single "do_everything" tool. Instead, create focused tools that each do one thing well. This makes it easier for the LLM to choose the right tool and for you to maintain the code.
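As an illustration of the one-tool-per-concern rule (hypothetical tools, not from the exercises), compare a catch-all function with focused replacements. In real code each focused function would get the `@tool` decorator; plain functions are used here so the contrast stands alone.

```python
# Anti-pattern: one vague catch-all. The docstring gives the LLM nothing
# to route on, and valid `action` values are invisible in the schema.
def do_everything(action: str, payload: str) -> str:
    """A tool for text stuff."""
    if action == "count":
        return str(len(payload.split()))
    if action == "upper":
        return payload.upper()
    return "unknown action"

# Better: one tool per concern, each with a specific,
# action-oriented docstring the LLM can match against the request.
def count_words(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

def uppercase_text(text: str) -> str:
    """Convert the given text to uppercase."""
    return text.upper()

print(count_words("split tools route better"))  # 4
print(uppercase_text("ok"))                     # OK
```

The focused versions also fail loudly at the type level (a missing argument is a schema error) instead of silently returning "unknown action".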
Exercise 5: Multiple Tools
Let’s add several tools and observe how the agent routes between them based on user questions.
Create an agent with multiple specialized tools:

```bash
cat > multiple_tools.py << 'EOF'
import os
from langchain_core.tools import tool
from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

@tool
def word_count(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

@tool
def get_timestamp() -> str:
    """Get the current date and time in ISO format."""
    from datetime import datetime
    return datetime.now().isoformat()

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely. Only supports basic arithmetic."""
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return "Error: only basic arithmetic is supported"
    return str(eval(expression))

agent = create_deep_agent(
    model=MODEL,
    tools=[word_count, get_timestamp, calculate],
)

# Try questions that route to different tools
questions = [
    "What time is it?",
    "How many words: 'Hello world from Python'",
    "What is 15 * 23 + 7?",
]

for question in questions:
    print(f"\nQuestion: {question}")
    result = agent.invoke({"messages": [("user", question)]})
    print(f"Answer: {agent_response(result)}")
EOF
```
Run it and watch the agent select the appropriate tool for each question:
```bash
uv run multiple_tools.py
```

Sample output (your results may vary):

```
Question: What time is it?
Answer: The current date and time is 2026-03-30T14:32:15.891234.

Question: How many words: 'Hello world from Python'
Answer: There are 4 words in that text.

Question: What is 15 * 23 + 7?
Answer: The result is 352.
```
The agent automatically chooses the right tool based on the question and the tool descriptions you provided.
Exercise 6: Interrupt on Tool Call
For tools with side effects or high costs, you can add human-in-the-loop approval using interrupts.
Create an agent that pauses before executing certain tools:

```bash
cat > interrupt_tool.py << 'EOF'
import os
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command
from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

@tool
def word_count(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely. Only supports basic arithmetic."""
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return "Error: only basic arithmetic is supported"
    return str(eval(expression))

# Interrupts require a checkpointer to save state between pause and resume
checkpointer = MemorySaver()

agent = create_deep_agent(
    model=MODEL,
    tools=[word_count, calculate],
    interrupt_on={"calculate": True},
    checkpointer=checkpointer,
)

# The agent will pause before executing the calculate tool
config = {"configurable": {"thread_id": "interrupt-demo"}}

print("Sending: 'What is 100 * 50?'")
result = agent.invoke(
    {"messages": [("user", "What is 100 * 50?")]},
    config=config,
)

# Check if we hit an interrupt
state = agent.get_state(config)
if state.next:
    print(f"\n*** INTERRUPT: Agent paused at {state.next} ***")
    # Show what the agent wants to do
    last_msg = state.values["messages"][-1]
    if hasattr(last_msg, "tool_calls"):
        for tc in last_msg.tool_calls:
            print(f"  Tool: {tc['name']}")
            print(f"  Args: {tc['args']}")
    # Auto-approve for demo purposes
    print("\n[Auto-approving for demo...]")
    # Resume by providing approval decisions
    decisions = {"decisions": [{"type": "approve"}]}
    result = agent.invoke(Command(resume=decisions), config=config)
    print(f"\nResult: {agent_response(result)}")
else:
    print(f"Result: {agent_response(result)}")
EOF
```
Run it to see the interrupt mechanism:
```bash
uv run interrupt_tool.py
```

Sample output (your results may vary):

```
Sending: 'What is 100 * 50?'

*** INTERRUPT: Agent paused at ('HumanInTheLoopMiddleware.after_model',) ***
  Tool: calculate
  Args: {'expression': '100 * 50'}

[Auto-approving for demo...]

Result: The result of 100 * 50 is 5000.
```
The agent pauses before executing calculate, shows you exactly what it wants to do, and waits for approval. Notice three requirements:
- `interrupt_on={"calculate": True}` tells the agent which tools need approval
- `checkpointer=MemorySaver()` saves the agent's state when it pauses so it can resume; without a checkpointer, there's nowhere to store the paused state
- `Command(resume=decisions)` resumes execution by passing the approval decision back to the agent
This pattern is critical for tools with side effects like sending emails, making purchases, or deleting files.
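Stripped of the framework, the pause/approve/resume flow is just "propose, wait for a decision, then execute". A minimal stdlib sketch of that control flow using a generator (purely illustrative; this is not how deepagents implements interrupts):

```python
def approval_gate(tool_name: str, args: dict):
    """Generator that proposes a tool call, pauses, and resumes on a decision."""
    # First yield: surface the proposed call and suspend until a decision arrives
    decision = yield {"proposed_tool": tool_name, "args": args}
    if decision == "approve":
        yield f"executed {tool_name}({args})"
    else:
        yield f"rejected {tool_name}"

gate = approval_gate("calculate", {"expression": "100 * 50"})
proposal = next(gate)          # agent pauses; show the human what it wants to do
print(proposal)
result = gate.send("approve")  # resume by sending the decision back in
print(result)
```

The checkpointer in the real version plays the role the suspended generator frame plays here: it holds the in-flight state so execution can pick up exactly where it paused, potentially in a different process hours later.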
Beyond custom tools: MCP integration
Deep Agents can also consume MCP (Model Context Protocol) servers as tools. MCP is an open standard that lets you connect to community-built integrations — databases, APIs, file systems, and more — without writing custom tool code. If an MCP server exists for a service you want to integrate, you can plug it into your agent directly.
See Bonus: MCP — Connecting Agents to External Services for a hands-on walkthrough where you build a status checker MCP server and wire it into a Deep Agent with subagents.
Module Summary
You’ve learned how to extend agents with custom tools and structured data:
- Basic tools using the `@tool` decorator: docstrings power the schema
- Structured input with Pydantic V2: rich field descriptions for complex tool parameters
- Structured output with `with_structured_output()`: typed Pydantic responses for agent pipelines
- Tool design patterns: action-oriented descriptions, tool vs. prompt decisions
- Multiple tools: the agent routes between tools based on descriptions
- Human-in-the-loop: `interrupt_on` with `MemorySaver` and `Command(resume=)` for approval gates
- MCP integration: connect to external tool servers instead of writing everything in Python
Custom tools are where agents become truly useful in real applications. The combination of structured input (telling the LLM how to call tools), structured output (getting typed data back), and interrupt gates (human approval for side effects) gives you full control over how agents interact with your systems.