Context Inspector

Real-time visibility into what your AI agent is thinking: tokens, context window, caching, and session uptime.

What the Context Inspector Shows

The Context Inspector gives you a live view of your agent's internal state:

📊
Token Usage

How many tokens are in the current conversation, broken down by input, output, and cached tokens.

🧠
Context Window

Total capacity vs. used capacity. See how close you are to the model's context limit (e.g., 200K tokens for Claude).

⚡
Cache Status

Which parts of the conversation are cached. Cached tokens are faster and cheaper to process.

⏱️
Session Uptime

How long the current session has been active. Long sessions may benefit from a restart to clear memory.

Token Breakdown Chart

The Context Inspector shows a visual breakdown of where tokens are being used:

Token Distribution

  • System Prompt & Instructions: 12,450 tokens
  • Conversation History: 8,300 tokens
  • File Context (MEMORY.md, daily files): 18,600 tokens
  • Tool Output & Results: 6,200 tokens
  • Cached (System + Memory): 30,900 tokens ⚡

Total Context Used: 45,550 / 200,000 tokens (23%)

Hover over any bar to see more details about what's consuming tokens in that category.
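The totals in the chart are simple arithmetic over the per-category counts. A quick sketch reproducing the example figures above (the dictionary keys are just labels for this illustration):

```python
# Per-category token counts from the example chart above
breakdown = {
    "System Prompt & Instructions": 12_450,
    "Conversation History": 8_300,
    "File Context (MEMORY.md, daily files)": 18_600,
    "Tool Output & Results": 6_200,
}
context_window = 200_000  # e.g. Claude's 200K-token window

used = sum(breakdown.values())
pct = round(100 * used / context_window)
print(f"{used:,} / {context_window:,} tokens ({pct}%)")  # 45,550 / 200,000 tokens (23%)
```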

Understanding Token Costs

Tokens are the units LLM APIs use for billing. The Context Inspector helps you understand costs:

💬
Input Tokens

Everything your agent reads: your messages, files, memory, system prompt. These are cheaper than output tokens.

🤖
Output Tokens

Everything your agent writes: responses, code, tool calls. These cost more than input tokens (typically 3-5x).

⚡
Cached Tokens

Tokens that don't need reprocessing (like your system prompt and memory). These are ~10x cheaper than regular input tokens and much faster.

💡

Pro tip: If token costs are high, check the Context Inspector to see what's using the most tokens. Long daily files or verbose tool outputs can add up quickly.
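Those ratios make it easy to ballpark a request's cost. The sketch below uses placeholder per-million-token rates (not real pricing), chosen only so the ratios match the rules of thumb above: output about 5x input, cached about 10x cheaper than regular input.

```python
# Placeholder rates for illustration only -- not actual provider pricing.
RATE_INPUT = 3.00    # $ per 1M regular input tokens
RATE_OUTPUT = 15.00  # $ per 1M output tokens (~5x input)
RATE_CACHED = 0.30   # $ per 1M cached input tokens (~10x cheaper than input)

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    """Estimate request cost in dollars; cached tokens bill at the cheaper rate."""
    uncached = input_tokens - cached_tokens
    return (uncached * RATE_INPUT
            + cached_tokens * RATE_CACHED
            + output_tokens * RATE_OUTPUT) / 1_000_000

# Using the example chart's numbers: 45,550 input tokens, 30,900 of them cached
print(f"${estimate_cost(45_550, 1_200, 30_900):.4f}")
```

Note how the cached portion dominates the input but contributes little to the cost, which is why caching long system prompts and memory files pays off.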

Auto-Refresh Behavior

The Context Inspector updates automatically as your conversation progresses:

  • Token counts refresh after every agent response
  • Cache status updates when prompt caching occurs (typically on long system messages)
  • Session uptime ticks up continuously while the agent is active
  • Visual breakdown chart animates to reflect changes

You don't need to manually refresh — the inspector stays in sync with your conversation automatically.

Model Information

The Context Inspector also shows which LLM model your agent is using:

🤖
Active Model
Model: claude-sonnet-4.5-20250929
Context Window: 200,000 tokens
Provider: Anthropic
Caching Enabled: Yes ⚡

If you switch models mid-conversation (e.g., from Claude to GPT-4), the inspector updates to show the new model's specs.

When to Check the Context Inspector

Use the Context Inspector to troubleshoot or optimize:

🐢 Agent seems slow

Check if you're near the context limit. If so, start a new session or archive old daily files.

💸 High API costs

Look at token breakdown to see what's consuming the most. Trim verbose memory or tool outputs.

❓ Agent forgets things

If context is full, old conversation history gets pruned. Check if important info was lost.

🔍 Debugging weird behavior

See exactly what context the agent has access to — helps identify missing or incorrect information.
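The pruning behavior mentioned above is worth understanding: once the window fills, the oldest turns are typically dropped first, which is why early details can disappear. A generic sketch of oldest-first pruning (illustrative only, not Pinchr's actual algorithm):

```python
def prune_history(messages: list[str], token_counts: list[int], budget: int) -> list[str]:
    """Drop the oldest messages until the total token count fits the budget.

    Generic oldest-first pruning for illustration; real agents usually
    protect the system prompt and may summarize instead of dropping.
    """
    total = sum(token_counts)
    start = 0
    while total > budget and start < len(messages):
        total -= token_counts[start]  # evict the oldest remaining turn
        start += 1
    return messages[start:]

turns = ["turn1", "turn2", "turn3", "turn4"]
print(prune_history(turns, [10, 50, 30, 20], budget=60))  # ['turn3', 'turn4']
```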

Opening the Context Inspector

Access the Context Inspector from anywhere in Pinchr:

Keyboard Shortcut

Press ⌘⇧I to toggle the inspector panel.

You can also open it from the top toolbar by clicking the "Context" button.

Questions about Context Inspector?

Join our community or reach out — we're here to help.