AI / Protocol · Teaching Reference

Model Context Protocol

How MCP gives AI models a standardized way to connect to external tools, data sources, and services — turning a chatbot into a capable, grounded agent.

Layer 1 · Architecture — The Three Components

Protocol roles
🧠
MCP Host
Claude Desktop, Claude Code, your app
🔌
MCP Client
Built into host, manages connections
📡
MCP Protocol
JSON-RPC 2.0 over stdio or HTTP+SSE
⚙️
MCP Server
Exposes tools, resources, prompts
🗃️
External Systems
APIs, databases, filesystems
MCP Host: The AI application the user interacts with — Claude Desktop, Claude Code, or any custom app built with an MCP SDK. The host controls which MCP servers to connect to and which capabilities the LLM is permitted to use.
MCP Client: Lives inside the host. Maintains a 1-to-1 connection with each MCP server, speaks the protocol, and mediates between the LLM's tool-call requests and the server's responses. One host can manage many clients simultaneously.
MCP Server: A lightweight process — any language, any machine — that advertises a set of capabilities and executes them on demand. Servers can be local child processes (stdio) or remote services (HTTP+SSE). They are fully isolated from the host.
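
For concreteness, here is a minimal server sketch using the FastMCP helper from the official Python SDK. The server name, tool, and logic are illustrative, and the exact import path may differ between SDK versions.

    # Minimal MCP server sketch (illustrative tool; check your installed SDK version).
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("demo-files")          # server name advertised during initialize

    @server.tool()                          # registered and returned by tools/list
    def read_file(path: str) -> str:
        """Return the contents of a text file on the server's machine."""
        with open(path, "r", encoding="utf-8") as f:
            return f.read()

    if __name__ == "__main__":
        server.run()                        # stdio transport by default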

Layer 2 · Tool Call Flow — One Round Trip

Per tool invocation: 💬 User Prompt (natural language request) → 🧠 LLM Decides (selects tool + builds arguments) → 📤 Tool Request (JSON-RPC call over protocol) → Server Executes (calls API, DB, filesystem) → 📥 Tool Result (returned to LLM context) → Final Answer (grounded in live data)

Capability discovery: On connect, the client calls tools/list, resources/list, and prompts/list. The server's capability schemas are injected into the LLM's system prompt. The LLM never calls tools directly — it emits a structured tool_use block which the host intercepts.
Tool execution: The host routes the LLM's tool-call output to the right MCP client, which sends a tools/call JSON-RPC request to the server. The server runs the logic in isolation and returns a content result. Human-in-the-loop approval can be inserted here before execution.
Result injection: The tool result is appended to the conversation as a tool_result message. The LLM resumes generation with real, live data now in its context window. Multiple tools can be called sequentially or in parallel within a single turn.
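
Stitched together, the host side amounts to a small loop. The sketch below is illustrative only: call_llm, client_for, and the exact message shapes are hypothetical placeholders standing in for whichever LLM API and MCP client library the host actually uses.

    # Host-side tool-call loop (hypothetical helpers, illustrative message shapes).
    def run_turn(conversation, clients, tool_schemas):
        while True:
            reply = call_llm(conversation, tools=tool_schemas)      # schemas came from tools/list
            calls = [b for b in reply["content"] if b["type"] == "tool_use"]
            if not calls:
                return reply                                        # final, grounded answer
            conversation.append({"role": "assistant", "content": reply["content"]})
            for call in calls:                                      # sequential here; could be parallel
                client = client_for(call["name"], clients)          # route to the owning MCP client
                result = client.call_tool(call["name"], call["input"])  # JSON-RPC tools/call
                conversation.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": call["id"],
                        "content": result,
                    }],
                })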

The Three Primitives

🔧 Tools

Executable functions the LLM can invoke to take action or fetch dynamic data. Each tool has a name, description, and a JSON schema for its parameters. The LLM reads the schema and decides when and how to call it.

search_web · run_query · read_file · call_api
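
A tool advertised via tools/list looks roughly like this, shown as a Python dict standing in for the JSON. The read_file tool and its fields are illustrative.

    # One entry from a tools/list response (illustrative tool).
    read_file_tool = {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "inputSchema": {                      # standard JSON Schema for the parameters
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"},
            },
            "required": ["path"],
        },
    }
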
📂 Resources

Read-only data the host can load into context — files, records, configs. Unlike tools, resources are pulled by the host at its discretion, not called by the LLM. Think of them as structured context injection.

file://app.log · db://users/42 · config://env
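
Reading a resource is a plain request/response by URI. A sketch of the exchange, with illustrative values:

    # resources/read request and (trimmed) result, as Python dicts standing in for the JSON.
    request = {
        "jsonrpc": "2.0",
        "id": 7,
        "method": "resources/read",
        "params": {"uri": "file://app.log"},
    }
    result = {
        "contents": [
            {"uri": "file://app.log", "mimeType": "text/plain", "text": "…log lines…"},
        ],
    }
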
💡 Prompts

Reusable prompt templates with optional arguments, exposed as slash commands or workflow starters. Server authors pre-package instructions to guide the LLM toward specific tasks without the user typing them each time.

/summarize · /code-review · /debug
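
Fetching a prompt expands its arguments into ready-to-use messages. A sketch with an illustrative prompt name and argument:

    # prompts/get request and result, as Python dicts standing in for the JSON.
    request = {
        "jsonrpc": "2.0",
        "id": 8,
        "method": "prompts/get",
        "params": {"name": "code-review", "arguments": {"language": "python"}},
    }
    result = {
        "description": "Review code for bugs and style issues",
        "messages": [
            {"role": "user",
             "content": {"type": "text", "text": "Review the following Python code…"}},
        ],
    }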

❌ Without MCP

Every AI application builds its own bespoke integrations. A coding assistant has custom Git code; a support bot has custom Zendesk code — none of it transfers. Context is stale: the LLM can only know what was baked into training or manually pasted into the prompt. Adding a new data source means rewriting the app.

✅ With MCP

One MCP server plugs into any MCP-compatible host without modification. A filesystem server works in Claude Desktop, Claude Code, and your custom app alike. The LLM gains live access to real-world data on demand, turning static completions into grounded, up-to-date, actionable responses.

How It Works — Step by Step

1 · Startup — Client connects and discovers capabilities

When the host launches, it connects to each configured MCP server and sends an initialize handshake. It then calls tools/list, resources/list, and prompts/list to ask "what can you do?" The server responds with JSON schemas for everything it exposes. These schemas are automatically injected into the LLM's system prompt so it knows every available tool and how to call it.
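
The handshake and discovery messages look roughly like this, shown as Python dicts standing in for the JSON-RPC payloads. The protocol version string and client name are illustrative; use whatever your SDK negotiates.

    # Startup handshake and discovery (illustrative values).
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "my-host", "version": "1.0.0"},
        },
    }
    list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
    # The tools/list result carries one schema per tool (see the Tools primitive above).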

2 · Routing — The LLM outputs a structured tool call

The LLM doesn't execute anything directly. When it decides a tool is needed, it emits a tool_use block in its response — containing the tool name, arguments as JSON, and a unique call ID. The host intercepts this before showing any output to the user. This interception point is where human-in-the-loop approval can be inserted: the host can pause and ask the user to confirm before the tool runs.
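
As one concrete example, the Anthropic Messages API shapes the block like this; other model APIs use similar but not identical formats. The ID and arguments are illustrative.

    # A tool_use content block as emitted by the model (illustrative values).
    tool_use = {
        "type": "tool_use",
        "id": "toolu_01ABC…",               # unique call ID, echoed back in the tool_result
        "name": "read_file",
        "input": {"path": "app.log"},
    }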

3 · Execution — Server runs the tool in isolation

The MCP client sends a tools/call JSON-RPC message to the server with the tool name and arguments. The server executes — querying a database, calling an external API, reading a file, running a subprocess — and returns a content block containing text, structured data, or even images. This runs inside the server's isolated process, completely separate from the LLM and host. The server has no access to the conversation history.
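
The request and result for a single execution, with illustrative values:

    # tools/call exchange, as Python dicts standing in for the JSON-RPC messages.
    call = {
        "jsonrpc": "2.0",
        "id": 3,
        "method": "tools/call",
        "params": {"name": "read_file", "arguments": {"path": "app.log"}},
    }
    result = {
        "jsonrpc": "2.0",
        "id": 3,
        "result": {
            "content": [{"type": "text", "text": "…file contents…"}],
            "isError": False,
        },
    }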

4 · Grounding — The result enters the LLM's context

The tool result is appended to the conversation thread as a tool_result message. The LLM resumes generation with fresh, real-world data now in its context window. It can chain additional tool calls — searching, then reading, then summarizing — or synthesize a final answer that directly cites the retrieved data. The output is grounded in live information, not training-time guesses.
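
In the Anthropic Messages API shape, the appended message looks like this, tied back to the call ID from step 2 (values illustrative):

    # The tool result re-enters the conversation as a tool_result block.
    tool_result_message = {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": "toolu_01ABC…",
            "content": "…file contents…",
        }],
    }
    # The LLM's next generation pass sees this data and can answer or chain more tools.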

Related Reading: RAG Pipeline – Teaching Diagram (how AI models answer questions grounded in your documents)