How the Model Context Protocol (MCP) gives AI models a standardized way to connect to external tools, data sources, and services, turning a chatbot into a capable, grounded agent.
At startup, the host calls tools/list, resources/list, and prompts/list. The server's capability schemas are injected into the LLM's system prompt. The LLM never calls tools directly; it emits a structured tool_use block, which the host intercepts.
The MCP client then sends a tools/call JSON-RPC request to the server. The server runs the logic in isolation and returns a content result. Human-in-the-loop approval can be inserted here, before execution.
The result is appended to the conversation as a tool_result message. The LLM resumes generation with real, live data now in its context window. Multiple tools can be called sequentially or in parallel within a single turn.
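Put together, the host side of this flow is a short loop. The sketch below is a minimal, illustrative version, not any particular SDK's API: callModel, askUserApproval, and mcpClient are hypothetical stand-ins for the LLM API, an approval prompt, and an MCP client session, and the message shapes loosely follow Anthropic-style tool_use / tool_result blocks.

```typescript
// Hypothetical host-side agentic loop (illustrative shapes, not a specific SDK).
type TextBlock = { type: "text"; text: string };
type ToolUse = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: unknown };
type Block = TextBlock | ToolUse | ToolResult;
type Message = { role: "user" | "assistant"; content: Block[] };

// Assumed stand-ins: the LLM API call, the approval UI, and an MCP client session.
declare function callModel(messages: Message[]): Promise<{ content: Block[] }>;
declare function askUserApproval(call: ToolUse): Promise<boolean>;
declare const mcpClient: {
  callTool(name: string, args: Record<string, unknown>): Promise<{ content: unknown }>;
};

async function runTurn(messages: Message[]): Promise<Block[]> {
  for (;;) {
    // 1. Send the conversation (plus tool schemas) to the LLM.
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content });

    // 2. Intercept any tool_use blocks; if there are none, the turn is finished.
    const calls = reply.content.filter((b): b is ToolUse => b.type === "tool_use");
    if (calls.length === 0) return reply.content;

    // 3. For each call: human-in-the-loop gate, then tools/call via the MCP client.
    const results: Block[] = [];
    for (const call of calls) {
      const approved = await askUserApproval(call);
      const content = approved
        ? (await mcpClient.callTool(call.name, call.input)).content
        : "The user declined this tool call.";
      results.push({ type: "tool_result", tool_use_id: call.id, content });
    }

    // 4. Feed results back and let the LLM continue with live data in context.
    messages.push({ role: "user", content: results });
  }
}
```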
Tools are executable functions the LLM can invoke to take action or fetch dynamic data. Each tool has a name, a description, and a JSON schema for its parameters. The LLM reads the schema and decides when and how to call it.
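For a concrete sense of the shape, here is one entry as it might appear in a tools/list result; the get_weather tool itself is made up for illustration.

```typescript
// An illustrative tools/list entry: name, description, and a JSON Schema for inputs.
// "get_weather" is a hypothetical example tool, not part of any real server.
const weatherTool = {
  name: "get_weather",
  description: "Get the current weather for a city",
  inputSchema: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name, e.g. 'Berlin'" },
      units: { type: "string", enum: ["celsius", "fahrenheit"] },
    },
    required: ["city"],
  },
};
```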
Resources are read-only data the host can load into context: files, records, configs. Unlike tools, resources are pulled in by the host at its discretion, not called by the LLM. Think of them as structured context injection.
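A resource is addressed by URI. The sketch below shows an illustrative resources/list entry and the resources/read request the host would use to pull it in; the URI and field values are examples, not a real server's.

```typescript
// An illustrative resources/list entry; the URI and fields are examples.
const configResource = {
  uri: "file:///app/config.yaml",
  name: "Application config",
  mimeType: "application/x-yaml",
};

// When the host decides this belongs in context, it reads it explicitly:
const readRequest = {
  jsonrpc: "2.0",
  id: 4,
  method: "resources/read",
  params: { uri: configResource.uri },
};
// result shape: { contents: [{ uri, mimeType, text }] }
```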
Prompts are reusable templates with optional arguments, exposed as slash commands or workflow starters. Server authors pre-package instructions that guide the LLM toward specific tasks without the user typing them each time.
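A prompt entry bundles a name, a description, and its arguments. The hypothetical summarize_ticket example below sketches what a server might expose via prompts/list and how a host would resolve it with prompts/get; the names and values are invented for illustration.

```typescript
// An illustrative prompts/list entry; "summarize_ticket" is a made-up example.
const summarizePrompt = {
  name: "summarize_ticket",
  description: "Summarize a support ticket for handoff",
  arguments: [{ name: "ticket_id", description: "Ticket to summarize", required: true }],
};

// Selecting it (e.g. as a slash command) resolves to a prompts/get request:
const getPromptRequest = {
  jsonrpc: "2.0",
  id: 5,
  method: "prompts/get",
  params: { name: "summarize_ticket", arguments: { ticket_id: "1234" } },
};
// result shape: { description, messages: [{ role, content: { type: "text", text } }] }
```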
Without MCP, every AI application builds its own bespoke integrations. A coding assistant has custom Git code, a support bot has custom Zendesk code, and none of it transfers. Context is stale: the LLM can only know what was baked into training or manually pasted into the prompt. Adding a new data source means rewriting the app.
One MCP server plugs into any MCP-compatible host without modification. A filesystem server works in Claude Desktop, Claude Code, and your custom app alike. The LLM gains live access to real-world data on demand, turning static completions into grounded, up-to-date, actionable responses.
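For instance, registering the reference filesystem server with Claude Desktop takes only a few lines in claude_desktop_config.json; the entry below is a sketch and the directory path is illustrative.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/Documents"]
    }
  }
}
```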
When the host launches, it connects to each configured MCP server and sends an initialize handshake. It then calls tools/list, resources/list, and prompts/list to ask "what can you do?" The server responds with JSON schemas for everything it exposes. These schemas are automatically injected into the LLM's system prompt so it knows every available tool and how to call it.
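On the wire, these are ordinary JSON-RPC 2.0 exchanges. The sketch below shows the approximate shape of the handshake and the first discovery call; the ids, protocol version string, and capabilities are illustrative.

```typescript
// Approximate startup exchange (illustrative values; shapes follow the MCP spec).
const initializeRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "example-host", version: "1.0.0" },
  },
};

// After the server's response and a notifications/initialized notification,
// the host asks what the server exposes:
const listToolsRequest = { jsonrpc: "2.0", id: 2, method: "tools/list" };
// result shape: { tools: [{ name, description, inputSchema }, ...] };
// resources/list and prompts/list return analogous lists.
```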
The LLM doesn't execute anything directly. When it decides a tool is needed, it emits a tool_use block in its response — containing the tool name, arguments as JSON, and a unique call ID. The host intercepts this before showing any output to the user. This interception point is where human-in-the-loop approval can be inserted: the host can pause and ask the user to confirm before the tool runs.
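With an Anthropic-style Messages API, for example, the block the host intercepts looks roughly like this; the id and arguments continue the hypothetical get_weather example from above.

```typescript
// The assistant content block the host intercepts before anything runs.
// (Anthropic-style shape; values are illustrative.)
const toolUseBlock = {
  type: "tool_use",
  id: "toolu_01A9", // unique call ID, echoed back with the result
  name: "get_weather",
  input: { city: "Berlin", units: "celsius" },
};
```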
The MCP client sends a tools/call JSON-RPC message to the server with the tool name and arguments. The server executes — querying a database, calling an external API, reading a file, running a subprocess — and returns a content block containing text, structured data, or even images. This runs inside the server's isolated process, completely separate from the LLM and host. The server has no access to the conversation history.
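The corresponding wire messages might look like the following sketch; the values are illustrative and continue the same example.

```typescript
// The intercepted call, forwarded to the server as a tools/call request...
const callToolRequest = {
  jsonrpc: "2.0",
  id: 3,
  method: "tools/call",
  params: { name: "get_weather", arguments: { city: "Berlin", units: "celsius" } },
};

// ...and the server's reply: one or more content blocks (text here; illustrative values).
const callToolResult = {
  content: [{ type: "text", text: "12°C, light rain, wind 18 km/h" }],
  isError: false,
};
```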
The tool result is appended to the conversation thread as a tool_result message. The LLM resumes generation with fresh, real-world data now in its context window. It can chain additional tool calls — searching, then reading, then summarizing — or synthesize a final answer that directly cites the retrieved data. The output is grounded in live information, not training-time guesses.
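Assuming Anthropic-style message formatting on the host side, the appended message would look roughly like this; the tool_use_id ties the result back to the intercepted block.

```typescript
// The message appended after the tool runs (Anthropic-style shape; values illustrative).
const toolResultMessage = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_01A9", content: "12°C, light rain, wind 18 km/h" },
  ],
};
// The LLM now continues the turn, either chaining another tool_use
// or writing a final answer grounded in this data.
```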