In the previous episodes, I talked about my RAG server, why it exists and how it helps me provide context to AIs.
But as my experiments progressed, I realized an important point:

a RAG server, no matter how well built, is not sufficient for a smooth, robust integration with modern AIs.
There is a missing building block to connect coding agents to my RAG server.

In this article, I will explain:

  • what a RAG server is,
  • what an MCP server is,
  • why RAG alone is not enough,
  • and how I plan to evolve my RAG server so it can be used via MCP.

🧠 RAG: what is it?

A RAG server (Retrieval-Augmented Generation) has a simple and powerful goal:

augment an AI model with relevant external context.

Concretely, it combines:

  • a knowledge base (documents, notes, .md files),
  • a semantic search engine (vectors, embeddings),
  • optionally, AI services that build enriched prompts.

When an AI receives a question, the RAG will:

  1. search the knowledge base for the most relevant text fragments,
  2. return those fragments to be included in the prompt,
  3. improve the final quality of the answer.
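The three steps above can be sketched in a few lines of Python. Everything here is illustrative: the `embed()` function is a toy stand-in for a real embedding model, and the fragment list is invented, not taken from my server.

```python
# Minimal sketch of the three RAG steps above, with toy embeddings.

def embed(text: str) -> list[float]:
    # Toy "embedding": normalized character-frequency vector.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(c) for c in alphabet]
    norm = sum(counts) or 1
    return [c / norm for c in counts]

def similarity(a: list[float], b: list[float]) -> float:
    # Dot product as a crude semantic-similarity measure.
    return sum(x * y for x, y in zip(a, b))

fragments = [
    "MCP exposes tools with typed schemas",
    "RAG retrieves relevant text fragments",
    "Embeddings map text to vectors",
]

def search(query: str, top_k: int = 2) -> list[str]:
    # Step 1: rank fragments by similarity to the query.
    q = embed(query)
    ranked = sorted(fragments, key=lambda f: similarity(q, embed(f)), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str) -> str:
    # Steps 2 and 3: include the best fragments in an enriched prompt.
    context = "\n".join(search(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How does retrieval work?"))
```

A real server replaces the toy parts with actual embeddings and a vector index, but the overall flow (rank, select, inject into the prompt) stays the same.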

A RAG server is therefore essentially an augmented search backend, able to answer requests such as:

POST /api/search

with a semantic query, tags, and a list of results.
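To make this concrete, here is a hedged sketch of what such a request and response might look like. The field names (`query`, `tags`, `limit`, `results`, `score`, `source`) are illustrative assumptions, not the server's documented contract.

```python
import json

# Hypothetical payload shapes for POST /api/search.
request_body = {
    "query": "how do embeddings work?",
    "tags": ["notes", "ml"],
    "limit": 5,
}

# A plausible response: ranked fragments with scores and sources.
response_body = {
    "results": [
        {
            "fragment": "Embeddings map text to vectors...",
            "score": 0.91,
            "source": "notes/embeddings.md",
        },
    ],
}

print(json.dumps(request_body, indent=2))
```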

👉 The goal is to provide textual context, not to manage the AI itself.


🔌 MCP: it is not an engine, it is an interface

An MCP server (Model Context Protocol) is not another way of doing document search.
It is a standard interface that allows an AI, or a software agent, to discover and call external capabilities in a uniform way.

Concretely, with MCP a server exposes:

  • tools
  • resources
  • prompts

and each element is described with a clear schema (input/output), so that an AI can:

  • discover what is available,
  • know how to call it,
  • know which inputs to provide and which outputs to expect.
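For example, a tool listing in MCP describes each tool with a name, a description, and a JSON Schema for its inputs (the `name`/`description`/`inputSchema` fields follow the MCP spec's tool shape, but this particular tool is invented for illustration):

```python
# Hypothetical MCP tool descriptor for a semantic-search tool.
search_tool = {
    "name": "semantic_search",
    "description": "Search the knowledge base for relevant fragments.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language query",
            },
            "tags": {
                "type": "array",
                "items": {"type": "string"},
            },
        },
        "required": ["query"],
    },
}
```

With this descriptor, any MCP client knows without custom glue code that the tool exists, that `query` is mandatory, and that `tags` is an optional list of strings.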

👉 MCP is a semantic abstraction layer on top of classic APIs. It is a standard way of telling an AI:

Here is what I can do, and how you can use it.

Without MCP, an AI does not know how to call your RAG API, which routes exist, or how to format the expected JSON payloads.


❗ Why a RAG server alone is not enough

During my experiments, especially when testing Augment or other AI agents, I noticed that:

  • AIs hallucinate or reinvent the wheel if they are not given explicit context,
  • each AI or agent has its own way of consuming APIs,
  • without a standard protocol, you end up writing specific code bridges for each client,
  • you lose portability and maintainability.

In other words:

The RAG provides context.

The MCP tells the AI how to find and use that context.

Without MCP, even the best RAG is almost useless to intelligent agents that want to use it automatically.


🛠️ What I am going to do

Rather than creating a dedicated MCP server, I chose a more pragmatic approach:

add MCP endpoints directly to my existing RAG server.

The idea is simple:

  1. my RAG server continues to exist as before (embeddings, semantic search, ingestion),
  2. I add a standard MCP interface on top of it,
  3. the AI (Claude, Cursor, Augment or others) can discover the tools and call them automatically.

Concretely, this means adding at least:

  • a GET /mcp route that describes the server,
  • a GET /mcp/tools route that lists the available tools,
  • POST /mcp/tools/{toolName} routes to execute semantic searches from any MCP client,
  • MCP resources (documents, tags, ingestion status),
  • potentially prompts to guide the AI (for example an “expert assistant” role).
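The routes above can be sketched in a framework-agnostic way: handlers that return plain dicts, which any web framework could then serialize as JSON. Names and payload shapes here are my working assumptions, not a finished implementation.

```python
# Sketch of the planned MCP routes as plain Python handlers.

def get_mcp() -> dict:
    # GET /mcp: describe the server.
    return {
        "name": "rag-server",
        "version": "0.1",
        "capabilities": ["tools", "resources"],
    }

# Tool registry: name -> callable taking a dict of arguments.
TOOLS = {
    "semantic_search": lambda args: {
        "results": [f"fragment matching {args['query']!r}"]
    },
}

def get_mcp_tools() -> dict:
    # GET /mcp/tools: list the available tools.
    return {"tools": list(TOOLS)}

def post_mcp_tool(tool_name: str, args: dict) -> dict:
    # POST /mcp/tools/{toolName}: execute one tool with its arguments.
    if tool_name not in TOOLS:
        return {"error": f"unknown tool {tool_name!r}"}
    return TOOLS[tool_name](args)

print(post_mcp_tool("semantic_search", {"query": "embeddings"}))
```

Keeping the handlers as plain functions means the existing RAG routes stay untouched; the MCP layer is just a thin, discoverable wrapper over them.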

This evolution does not break anything that already exists, but opens the server to generalized automated usage.


🧩 Summary

  • RAG: provides relevant textual context to an AI.
  • MCP: provides a standard interface so AIs can easily use that context.

👉 A RAG without MCP is useful but limited.
👉 An MCP without RAG is an empty interface.
👉 The combination of both is what makes a server truly relevant for modern AI systems.


In the coming days, I will document the MCP implementation of my server, the added endpoints, and how to test everything with different AI agents.

Stay tuned! 🚀