Memory & AGENTS.md
Skills give agents on-demand capabilities, but memory gives them persistent context that shapes every conversation. This module shows you how to use the AGENTS.md convention to provide agents with project knowledge, coding standards, and other context that should always be present.
Exercise 1: The AGENTS.md spec
The AGENTS.md spec defines a convention for persistent project context. It’s like a README, but written specifically for agents rather than humans. The file lives in your project root and contains:
- Project overview and technology stack
- Coding standards and conventions
- Architecture and directory structure
- Known issues, TODOs, and important context
- Anything else the agent should always know
Unlike skills (which are loaded on demand), memory from AGENTS.md is always present in the agent’s context.
- Create an AGENTS.md file for a Python REST API project:

```bash
cat > AGENTS.md << 'EOF'
# Project Context

This is a Python REST API project using FastAPI and SQLAlchemy.

## Coding Standards

- Use type hints on all functions
- Write docstrings for public functions
- Use async/await for database operations
- Follow PEP 8 naming conventions

## Architecture

- `src/` contains application code
- `tests/` contains pytest test files
- `alembic/` contains database migrations

## Known Issues

- The user authentication module needs rate limiting (TODO)
- Database connection pooling is not yet configured
EOF
```
Exercise 2: MemoryMiddleware
The MemoryMiddleware injects the contents of AGENTS.md into the system prompt, wrapped in `<agent_memory>` tags. As a result, the agent references the coding standards from AGENTS.md without you explicitly providing them in the conversation.
The key difference from skills:

- Skills: Loaded on demand when relevant to the task
- Memory: Always present in the system prompt
Use memory for context that should influence every interaction: coding standards, architecture principles, project constraints, team preferences.
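Conceptually, the injection step boils down to reading each memory file and prepending it to the base prompt. Here is a minimal sketch of that pattern; the function name and the `demo_agents.md` file are made up for illustration, and this is not the actual MemoryMiddleware source:

```python
# Minimal sketch of the memory-injection pattern. Illustrative only --
# the real MemoryMiddleware lives inside deepagents.
from pathlib import Path

def inject_memory(system_prompt: str, memory_paths: list[str]) -> str:
    """Prepend each memory file's contents, wrapped in <agent_memory> tags."""
    blocks = []
    for path in memory_paths:
        text = Path(path).expanduser().read_text()
        blocks.append(f"<agent_memory>\n{text}\n</agent_memory>")
    return "\n".join(blocks + [system_prompt])

# Hypothetical file name, so we don't clobber a real AGENTS.md:
Path("demo_agents.md").write_text("# Project Context\nUse type hints.\n")
prompt = inject_memory("You are a helpful coding agent.", ["./demo_agents.md"])
```

Because the memory blocks land ahead of the base system prompt, every single request carries the project context, whether or not the user mentions it.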
- Create a script to configure an agent with memory and observe how it uses the context:

```bash
cat > test_memory.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
)

result = agent.invoke({"messages": [("user",
    "What coding standards should I follow in this project?"
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run test_memory.py
```

Sample output (your results may vary):

```
Based on the project's coding standards, you should follow these practices:

1. Use type hints on all functions
2. Write docstrings for public functions
3. Use async/await for database operations
4. Follow PEP 8 naming conventions
...
```
Exercise 3: Self-updating memory
Agents can modify their own AGENTS.md to learn and persist knowledge across conversations. In this exercise, the agent uses its edit_file tool to update its own memory: the next time you start a conversation with this agent, the Pydantic V2 standard will be part of its context.
- Create a script to ask the agent to update its own AGENTS.md:

```bash
cat > update_memory.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
)

result = agent.invoke({"messages": [("user",
    "I just decided we should also use Pydantic V2 for all data validation. "
    "Update the project's AGENTS.md to include this coding standard."
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run update_memory.py
```

Sample output (your results may vary):

```
I've updated the AGENTS.md file to include the Pydantic V2 coding standard.
The change has been added to the Coding Standards section.
```
- Verify the change:

```bash
cat AGENTS.md
```

Sample output (your results may vary):

```
# Project Context

This is a Python REST API project using FastAPI and SQLAlchemy.

## Coding Standards

- Use type hints on all functions
- Write docstrings for public functions
- Use async/await for database operations
- Follow PEP 8 naming conventions
- Use Pydantic V2 for all data validation

## Architecture
...
```
Benefits of self-updating memory:

- Agents remember decisions and context from previous sessions
- Knowledge compounds over time
- No need to repeat information across conversations
- The project’s AGENTS.md becomes a living document
Exercise 4: Memory layering
When you pass multiple paths, both files are loaded and injected into the system prompt. The agent then has:

- Project context from `./AGENTS.md` (coding standards, architecture)
- User preferences from `~/.deepagents/AGENTS.md` (response style, tool choices)
This layering pattern is useful for:

- Separating project-specific context from personal preferences
- Sharing user preferences across all projects
- Overriding global defaults with project-specific context
- Managing team-wide conventions in a shared file
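Under the hood, layering amounts to reading each source in the order given and concatenating whatever exists. A rough sketch of that behavior, with hypothetical file names; the real loading logic is inside deepagents:

```python
# Sketch of memory layering: sources are read in the order given, so
# project context comes first and user preferences follow. Illustrative
# only -- not the actual deepagents loading code.
from pathlib import Path

def load_memory_layers(paths: list[str]) -> str:
    parts = []
    for p in paths:
        f = Path(p).expanduser()  # resolves paths like ~/.deepagents/AGENTS.md
        if f.exists():            # a missing layer is simply skipped
            parts.append(f.read_text().strip())
    return "\n\n".join(parts)

# Hypothetical file names for the demonstration:
Path("project_agents.md").write_text("# Project Context\nUse async/await.\n")
Path("user_agents.md").write_text("# User Preferences\nAlways use uv.\n")
memory = load_memory_layers(
    ["./project_agents.md", "./user_agents.md", "./missing.md"]
)
```

Reading in list order is what makes layering predictable: earlier sources appear earlier in the prompt, and a layer that doesn't exist on a given machine just drops out.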
- Create a user-level AGENTS.md with personal preferences:

```bash
mkdir -p ~/.deepagents
cat > ~/.deepagents/AGENTS.md << 'EOF'
# User Preferences

- I prefer concise responses
- Show code examples rather than lengthy explanations
- Always use uv, never pip
EOF
```
Create a script to load memory from multiple sources — project-level and user-level:
-
Run
-
Code Preview
cat > layered_memory.py << 'EOF' import os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md", "~/.deepagents/AGENTS.md"], ) result = agent.invoke({"messages": [("user", "How should I install dependencies for this project?" )]}) print(agent_response(result)) EOFimport os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md", "~/.deepagents/AGENTS.md"], ) result = agent.invoke({"messages": [("user", "How should I install dependencies for this project?" )]}) print(agent_response(result)) -
- Run the script:

```bash
uv run layered_memory.py
```

Sample output (your results may vary):

```
Use uv to install dependencies:

    uv pip install -r requirements.txt
...
```
Exercise 5: Memory + skills + subagents
What happens:
- Memory: The agent sees project context (FastAPI/SQLAlchemy, coding standards)
- Subagent: The agent delegates research to the researcher subagent
- Skills: The agent loads the code-review skill to structure the review
- Memory again: The agent applies project-specific coding standards to the review
Full middleware stack order:

1. TodoList: Manages task tracking and progress
2. Skills: Discovers and loads skill definitions on demand
3. Filesystem: Provides file read/write tools
4. SubAgent: Enables task delegation to specialized agents
5. Summarization: Condenses long conversations to manage context
6. PatchToolCalls: Enhances tool calling reliability
7. AnthropicPromptCaching: Reduces costs by caching system prompts
8. Memory: Injects AGENTS.md content into the system prompt
9. HumanInTheLoop: Allows human approval for sensitive operations
Memory sits near the bottom of the stack because it shapes the system prompt that all other middleware builds upon.
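Why does position in the stack matter? Each middleware wraps the one below it, so each layer only sees what the layers above it have already produced. A toy sketch of the composition pattern; the names here are illustrative, not the actual deepagents classes:

```python
# Toy sketch of middleware composition: each middleware wraps the handler
# below it, so the first in the list touches the prompt first on the way
# in. Illustrative only -- not the deepagents middleware implementation.
from typing import Callable

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]

def compose(middlewares: list[Middleware], base: Handler) -> Handler:
    handler = base
    for mw in reversed(middlewares):  # wrap inside-out
        handler = mw(handler)
    return handler

def tag_middleware(name: str) -> Middleware:
    # Appends its name, recording the order each layer touches the prompt.
    return lambda inner: lambda prompt: inner(f"{prompt}>{name}")

base: Handler = lambda prompt: prompt
stack = compose([tag_middleware("todo"), tag_middleware("memory")], base)
print(stack("start"))  # start>todo>memory
```

The output shows that list position determines processing order, which is why a layer that rewrites the system prompt has to sit at a specific place in the stack.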
- Create a script to combine all three patterns (memory, skills, and subagents) to build a fully equipped agent:

```bash
cat > full_stack.py << 'EOF'
import os

from deepagents import create_deep_agent
from utils import agent_response

MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6")
FAST_MODEL = os.environ.get(
    "DEEPAGENTS_FAST_MODEL", "anthropic:claude-haiku-4-5-20251001"
)

subagents = [
    {
        "name": "researcher",
        "description": "Research topics and gather information.",
        "system_prompt": "You are a research assistant.",
        "model": FAST_MODEL,
    }
]

agent = create_deep_agent(
    model=MODEL,
    memory=["./AGENTS.md"],
    skills=["./skills/"],
    subagents=subagents,
)

result = agent.invoke({"messages": [("user",
    "Research the latest FastAPI best practices for async database operations, "
    "then review our authentication.py file against those best practices. "
    "Make sure your review follows our project's coding standards."
)]})
print(agent_response(result))
EOF
```
- Run the script:

```bash
uv run full_stack.py
```

Sample output (your results may vary):

```
I'll research FastAPI best practices and review authentication.py.

[Delegating to researcher subagent...]
[Loading code-review skill...]
...
Review complete. The authentication.py file follows most best practices but needs:

1. Type hints on async functions
2. Docstrings for public methods
3. Connection pooling configuration
...
```
Exercise 6: Introducing a second model provider (OPTIONAL)
The "provider:model" format works with any LiteLLM-supported provider:
- `anthropic:claude-sonnet-4-6`
- `openai:gpt-4o`
- `openai:gpt-4o-mini`
- `google:gemini-pro`
- `cohere:command-r-plus`
This flexibility allows you to:

- Use specialized models for specific tasks (e.g., cheaper models for simple research)
- Experiment with different providers without changing your code structure
- Optimize cost and performance by matching models to workloads
- Take advantage of unique capabilities from different providers
- Set up your OpenAI API key:

```bash
export OPENAI_API_KEY=your-key-here
```
Create a script to use different model providers for different subagents:
-
Run
-
Code Preview
cat > multi_provider.py << 'EOF' import os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") subagents = [ { "name": "researcher", "description": "Research topics using web search.", "system_prompt": "You are a research assistant.", "model": "openai:gpt-4o", }, { "name": "analyst", "description": "Deep analysis requiring careful reasoning.", "system_prompt": "You are an analytical expert.", "model": MODEL, }, ] agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md"], subagents=subagents, ) result = agent.invoke({"messages": [("user", "Research the top 3 Python async frameworks and have the analyst compare them." )]}) print(agent_response(result)) EOFimport os from deepagents import create_deep_agent from utils import agent_response MODEL = os.environ.get("DEEPAGENTS_MODEL", "anthropic:claude-sonnet-4-6") subagents = [ { "name": "researcher", "description": "Research topics using web search.", "system_prompt": "You are a research assistant.", "model": "openai:gpt-4o", }, { "name": "analyst", "description": "Deep analysis requiring careful reasoning.", "system_prompt": "You are an analytical expert.", "model": MODEL, }, ] agent = create_deep_agent( model=MODEL, memory=["./AGENTS.md"], subagents=subagents, ) result = agent.invoke({"messages": [("user", "Research the top 3 Python async frameworks and have the analyst compare them." )]}) print(agent_response(result)) -
- Run the script:

```bash
uv run multi_provider.py
```

Sample output (your results may vary):

```
[Delegating research to researcher subagent using GPT-4o...]
Top 3 frameworks: FastAPI, Tornado, aiohttp

[Delegating analysis to analyst subagent using Claude Sonnet...]
Comparative analysis:
1. FastAPI: Best for REST APIs, modern features, automatic validation
2. Tornado: Best for WebSockets, long-lived connections
3. aiohttp: Best for HTTP clients and servers, flexible
...
```
Note: The main agent and each subagent can use different models and providers. The framework handles the provider-specific details (authentication, API format, tool calling conventions) automatically.
Module summary
You’ve learned how memory gives agents persistent context:
- AGENTS.md spec: Convention for project knowledge and coding standards
- MemoryMiddleware: Injects memory into the system prompt with `<agent_memory>` tags
- Self-updating memory: Agents can edit their own AGENTS.md to learn over time
- Memory layering: Load multiple sources (project + user preferences)
- Full stack: Memory works alongside skills and subagents
- Multi-provider support: Use any LiteLLM-supported model provider
Memory is what transforms an agent from a stateless tool into a persistent collaborator that remembers your project’s context, standards, and evolution over time.