Memory & AGENTS.md
Skills give agents on-demand capabilities, but memory gives them persistent context that shapes every conversation. This module shows you how to use the AGENTS.md convention to provide agents with project knowledge, coding standards, and other context that should always be present.
Exercise 1: The AGENTS.md spec
The AGENTS.md spec defines a convention for persistent project context. It’s like a README, but written specifically for agents rather than humans. The file lives in your project root and contains:
- Project overview and technology stack
- Coding standards and conventions
- Architecture and directory structure
- Known issues, TODOs, and important context
- Anything else the agent should always know
Unlike skills (which are loaded on demand), memory from AGENTS.md is always present in the agent’s context.
- Create an AGENTS.md file for a Python REST API project:

```bash
cat > AGENTS.md << 'EOF'
# Project Context

This is a Python REST API project using FastAPI and SQLAlchemy.

## Coding Standards

- Use type hints on all functions
- Write docstrings for public functions
- Use async/await for database operations
- Follow PEP 8 naming conventions

## Architecture

- `src/` contains application code
- `tests/` contains pytest test files
- `alembic/` contains database migrations

## Known Issues

- The user authentication module needs rate limiting (TODO)
- Database connection pooling is not yet configured
EOF
```
Exercise 2: MemoryMiddleware
The MemoryMiddleware injects the contents of AGENTS.md into the system prompt, wrapped in `<agent_memory>` tags. As a result, the agent references the coding standards from AGENTS.md without you explicitly providing them in the conversation.
The key difference from skills:

- Skills: Loaded on demand when relevant to the task
- Memory: Always present in the system prompt
Use memory for context that should influence every interaction: coding standards, architecture principles, project constraints, team preferences.
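Conceptually, the injection step boils down to reading each memory file and prepending it to the base prompt. Here is a minimal sketch of that pattern; the function name and the `demo_agents.md` file are made up for illustration, and this is not the actual MemoryMiddleware source:

```python
# Minimal sketch of the memory-injection pattern. Illustrative only --
# the real MemoryMiddleware lives inside deepagents.
from pathlib import Path

def inject_memory(system_prompt: str, memory_paths: list[str]) -> str:
    """Prepend each memory file's contents, wrapped in <agent_memory> tags."""
    blocks = []
    for path in memory_paths:
        text = Path(path).expanduser().read_text()
        blocks.append(f"<agent_memory>\n{text}\n</agent_memory>")
    return "\n".join(blocks + [system_prompt])

# Hypothetical file name, so we don't clobber a real AGENTS.md:
Path("demo_agents.md").write_text("# Project Context\nUse type hints.\n")
prompt = inject_memory("You are a helpful coding agent.", ["./demo_agents.md"])
```

Because the memory blocks land ahead of the base system prompt, every single request carries the project context, whether or not the user mentions it.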
- Create a script to configure an agent with memory and observe how it uses the context:

```bash
cat > test_memory.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
)

result = agent.invoke({"messages": [("user",
    "What coding standards should I follow in this project?"
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run test_memory.py
```

Sample output (your results may vary):

```
Based on the project's coding standards, you should follow these practices:

1. Use type hints on all functions
2. Write docstrings for public functions
3. Use async/await for database operations
4. Follow PEP 8 naming conventions
...
```
Exercise 3: Self-updating memory
Agents can modify their own AGENTS.md to learn and persist knowledge across conversations. In this exercise, the agent uses its edit_file tool to update its own memory: the next time you start a conversation with this agent, the Pydantic V2 standard will be part of its context.
- Create a script to ask the agent to update its own AGENTS.md:

```bash
cat > update_memory.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
)

result = agent.invoke({"messages": [("user",
    "I just decided we should also use Pydantic V2 for all data validation. "
    "Update the project's AGENTS.md to include this coding standard."
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run update_memory.py
```

Sample output (your results may vary):

```
I've updated the AGENTS.md file to include the Pydantic V2 coding standard.
The change has been added to the Coding Standards section.
```
- Verify the change:

```bash
cat AGENTS.md
```

Sample output (your results may vary):

```
# Project Context

This is a Python REST API project using FastAPI and SQLAlchemy.

## Coding Standards

- Use type hints on all functions
- Write docstrings for public functions
- Use async/await for database operations
- Follow PEP 8 naming conventions
- Use Pydantic V2 for all data validation

## Architecture
...
```
Benefits of self-updating memory:

- Agents remember decisions and context from previous sessions
- Knowledge compounds over time
- No need to repeat information across conversations
- The project’s AGENTS.md becomes a living document
Exercise 4: Memory layering
When you pass multiple paths, both files are loaded and injected into the system prompt. The agent then has:

- Project context from `./AGENTS.md` (coding standards, architecture)
- User preferences from `~/.deepagents/AGENTS.md` (response style, tool choices)
This layering pattern is useful for:

- Separating project-specific context from personal preferences
- Sharing user preferences across all projects
- Overriding global defaults with project-specific context
- Managing team-wide conventions in a shared file
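Under the hood, layering amounts to reading each source in the order given and concatenating whatever exists. A rough sketch of that behavior, with hypothetical file names; the real loading logic is inside deepagents:

```python
# Sketch of memory layering: sources are read in the order given, so
# project context comes first and user preferences follow. Illustrative
# only -- not the actual deepagents loading code.
from pathlib import Path

def load_memory_layers(paths: list[str]) -> str:
    parts = []
    for p in paths:
        f = Path(p).expanduser()  # resolves paths like ~/.deepagents/AGENTS.md
        if f.exists():            # a missing layer is simply skipped
            parts.append(f.read_text().strip())
    return "\n\n".join(parts)

# Hypothetical file names for the demonstration:
Path("project_agents.md").write_text("# Project Context\nUse async/await.\n")
Path("user_agents.md").write_text("# User Preferences\nAlways use uv.\n")
memory = load_memory_layers(
    ["./project_agents.md", "./user_agents.md", "./missing.md"]
)
```

Reading in list order is what makes layering predictable: earlier sources appear earlier in the prompt, and a layer that doesn't exist on a given machine just drops out.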
- Create a user-level AGENTS.md with personal preferences:

```bash
mkdir -p ~/.deepagents
cat > ~/.deepagents/AGENTS.md << 'EOF'
# User Preferences

- I prefer concise responses
- Show code examples rather than lengthy explanations
- Always use uv, never pip
EOF
```
Create a script to load memory from multiple sources — project-level and user-level:
-
Run
-
Code Preview
cat > layered_memory.py << 'EOF' import os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md", "~/.deepagents/AGENTS.md"], ) result = agent.invoke({"messages": [("user", "How should I install dependencies for this project?" )]}) print(agent_response(result)) EOFimport os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md", "~/.deepagents/AGENTS.md"], ) result = agent.invoke({"messages": [("user", "How should I install dependencies for this project?" )]}) print(agent_response(result)) -
- Run the script:

```bash
uv run layered_memory.py
```

Sample output (your results may vary):

```
Use uv to install dependencies:

    uv pip install -r requirements.txt
...
```
Exercise 5: Memory + skills + subagents
What happens:
- Memory: The agent sees project context (FastAPI/SQLAlchemy, coding standards)
- Subagent: The agent delegates research to the researcher subagent
- Skills: The agent loads the code-review skill to structure the review
- Memory again: The agent applies project-specific coding standards to the review
Full middleware stack order:

1. TodoList: Manages task tracking and progress
2. Skills: Discovers and loads skill definitions on demand
3. Filesystem: Provides file read/write tools
4. SubAgent: Enables task delegation to specialized agents
5. Summarization: Condenses long conversations to manage context
6. PatchToolCalls: Enhances tool calling reliability
7. AnthropicPromptCaching: Reduces costs by caching system prompts
8. Memory: Injects AGENTS.md content into the system prompt
9. HumanInTheLoop: Allows human approval for sensitive operations
Memory sits near the bottom of the stack because it shapes the system prompt that all other middleware builds upon.
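Why does position in the stack matter? Each middleware wraps the one below it, so each layer only sees what the layers above it have already produced. A toy sketch of the composition pattern; the names here are illustrative, not the actual deepagents classes:

```python
# Toy sketch of middleware composition: each middleware wraps the handler
# below it, so the first in the list touches the prompt first on the way
# in. Illustrative only -- not the deepagents middleware implementation.
from typing import Callable

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]

def compose(middlewares: list[Middleware], base: Handler) -> Handler:
    handler = base
    for mw in reversed(middlewares):  # wrap inside-out
        handler = mw(handler)
    return handler

def tag_middleware(name: str) -> Middleware:
    # Appends its name, recording the order each layer touches the prompt.
    return lambda inner: lambda prompt: inner(f"{prompt}>{name}")

base: Handler = lambda prompt: prompt
stack = compose([tag_middleware("todo"), tag_middleware("memory")], base)
print(stack("start"))  # start>todo>memory
```

The output shows that list position determines processing order, which is why a layer that rewrites the system prompt has to sit at a specific place in the stack.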
- Create a script to combine all three patterns (memory, skills, and subagents) to build a fully equipped agent:

```bash
cat > full_stack.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")
FAST_MODEL = os.environ.get(
    "DEEPAGENTS_FAST_MODEL", "anthropic:claude-haiku-4-5-20251001"
)

subagents = [
    {
        "name": "researcher",
        "description": "Research topics and gather information.",
        "system_prompt": "You are a research assistant.",
        "model": FAST_MODEL,
    }
]

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
    skills=["./skills/"],
    subagents=subagents,
)

result = agent.invoke({"messages": [("user",
    "Research the latest FastAPI best practices for async database operations, "
    "then review our authentication.py file against those best practices. "
    "Make sure your review follows our project's coding standards."
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run full_stack.py
```

Sample output (your results may vary):

```
I'll research FastAPI best practices and review authentication.py.

[Delegating to researcher subagent...]
[Loading code-review skill...]
...
Review complete. The authentication.py file follows most best practices but needs:

1. Type hints on async functions
2. Docstrings for public methods
3. Connection pooling configuration
...
```
Exercise 6: Introducing a second model provider (OPTIONAL)
The "provider:model" format works with any LiteLLM-supported provider:
- `anthropic:claude-sonnet-4-6`
- `openai:gpt-4o`
- `openai:gpt-4o-mini`
- `google:gemini-pro`
- `cohere:command-r-plus`
This flexibility allows you to:

- Use specialized models for specific tasks (e.g., cheaper models for simple research)
- Experiment with different providers without changing your code structure
- Optimize cost and performance by matching models to workloads
- Take advantage of unique capabilities from different providers
- Set up your OpenAI API key:

```bash
export OPENAI_API_KEY=your-key-here
```
Create a script to use different model providers for different subagents:
-
Run
-
Code Preview
cat > multi_provider.py << 'EOF' import os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") subagents = [ { "name": "researcher", "description": "Research topics using web search.", "system_prompt": "You are a research assistant.", "model": "openai:gpt-4o", }, { "name": "analyst", "description": "Deep analysis requiring careful reasoning.", "system_prompt": "You are an analytical expert.", "model": MODEL, }, ] agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md"], subagents=subagents, ) result = agent.invoke({"messages": [("user", "Research the top 3 Python async frameworks and have the analyst compare them." )]}) print(agent_response(result)) EOFimport os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") subagents = [ { "name": "researcher", "description": "Research topics using web search.", "system_prompt": "You are a research assistant.", "model": "openai:gpt-4o", }, { "name": "analyst", "description": "Deep analysis requiring careful reasoning.", "system_prompt": "You are an analytical expert.", "model": MODEL, }, ] agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md"], subagents=subagents, ) result = agent.invoke({"messages": [("user", "Research the top 3 Python async frameworks and have the analyst compare them." )]}) print(agent_response(result)) -
- Run the script:

```bash
uv run multi_provider.py
```

Sample output (your results may vary):

```
[Delegating research to researcher subagent using GPT-4o...]
Top 3 frameworks: FastAPI, Tornado, aiohttp

[Delegating analysis to analyst subagent using Claude Sonnet...]
Comparative analysis:
1. FastAPI: Best for REST APIs, modern features, automatic validation
2. Tornado: Best for WebSockets, long-lived connections
3. aiohttp: Best for HTTP clients and servers, flexible
...
```
Note: The main agent and each subagent can use different models and providers. The framework handles the provider-specific details (authentication, API format, tool calling conventions) automatically.
Module summary
You’ve learned how memory gives agents persistent context:
- AGENTS.md spec: Convention for project knowledge and coding standards
- MemoryMiddleware: Injects memory into the system prompt with `<agent_memory>` tags
- Self-updating memory: Agents can edit their own AGENTS.md to learn over time
- Memory layering: Load multiple sources (project + user preferences)
- Full stack: Memory works alongside skills and subagents
- Multi-provider support: Use any LiteLLM-supported model provider
Memory is what transforms an agent from a stateless tool into a persistent collaborator that remembers your project’s context, standards, and evolution over time.