You know that feeling? You're an SRE, it's 3 AM, a service is down, and you need to find THE rollback procedure in a 500-page doc... You scroll, you search, you curse the colleague who wrote "see previous section" without saying which one. 😩
Plot twist: what if I told you we can turn this nightmare into a fluid conversation with an AI documentation chatbot that knows your entire documentation by heart? At our company, with 50 SREs juggling incidents and maintenance, implementing a RAG system cut critical information search time by 3x.
In this RAG tutorial, you'll learn how to implement a complete Retrieval-Augmented Generation system on MkDocs Material documentation using LangChain, ChromaDB, and FastAPI. This step-by-step guide shows you how to build an intelligent documentation assistant that delivers accurate answers with source citations.
Picture this: 50 SREs, 7 teams, and one large MkDocs Material documentation site.
Our teams' daily routine:
# Classic scenario at 3 AM
1. Incident detected → Service X down
2. Search procedure → 15 minutes of navigation
3. "Oh no, this isn't the right version"
4. Re-search → 10 more minutes
5. Procedure found → FINALLY!
The painful stats:
Key insight: The problem wasn't our doc quality, but its cognitive accessibility!
While MkDocs Material is excellent, traditional keyword search has significant limitations that a RAG implementation can solve:
❌ Keyword-only search (no semantic understanding)
❌ No context understanding across documents
❌ Results sometimes too numerous or off-topic
❌ Can't ask questions in natural language
❌ No cross-document info aggregation
❌ Search limited to titles and first paragraphs
❌ No notion of priority or urgency

The community has long requested semantic search improvements, as evidenced by this GitHub issue that has been open for several years.
Comparison with other solutions:
| Solution | Semantic Search | Conversational AI | Integrates with Existing Docs |
|---|---|---|---|
| Native MkDocs Material | ❌ | ❌ | ✅ |
| Algolia DocSearch | ⚠️ Limited | ❌ | ⚠️ Complex setup |
| RAG + LLM | ✅ | ✅ | ✅ |
| GitBook | ✅ | ⚠️ Basic | ❌ Migration required |
Concrete example:
Here's how we built our AI-powered documentation assistant using a complete RAG stack:
# Complete RAG tech stack for documentation search
TECH_STACK = {
    "backend": "FastAPI",                           # Fast REST API for RAG endpoints
    "embeddings": "OpenAI text-embedding-3-small",  # Vector embeddings (512 dimensions, $0.02/1M tokens)
    "vector_db": "ChromaDB",                        # Vector database for semantic search (alternatives: Pinecone, Weaviate)
    "llm": "GPT-4o-mini",                           # LLM for response generation ($0.15/1M input tokens)
    "framework": "LangChain",                       # RAG orchestration framework
    "docs_source": "MkDocs Material",
    "deployment": "Docker + K8s",
    "monitoring": "Prometheus + Grafana",           # RAG metrics tracking
    "cache": "Redis",                               # Semantic cache for performance
}

RAG Workflow:
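At a high level, the flow is: index the docs into ChromaDB, retrieve the most relevant chunks for a question, and let GPT-4o-mini answer from those chunks. Here is a minimal, illustrative sketch of that retrieve → augment → generate loop; the `vector_db` path and the prompt wording are assumptions, not our production code.

# Minimal RAG workflow sketch: retrieve relevant chunks, then generate a grounded answer
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=512)
vector_store = Chroma(persist_directory="vector_db", embedding_function=embeddings)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def answer(question: str) -> str:
    # 1. Retrieve: semantic search over the indexed MkDocs pages
    docs = vector_store.as_retriever(search_kwargs={"k": 8}).invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    # 2. Augment: inject the retrieved chunks into the prompt
    prompt = (
        "You are a specialized SRE assistant. Answer using ONLY the context below "
        f"and cite the source pages.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    # 3. Generate: the LLM answers grounded in the documentation
    return llm.invoke(prompt).content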


The genius of our approach: no need to modify MkDocs! This RAG tutorial shows you how to scrape existing content and build a vector database index in parallel, enabling semantic search without changing your current documentation setup.
# Base configuration for MkDocs indexing
MKDOCS_CONFIG = {
    "docs_path": "/app/docs",
    "base_url": "https://docs.company.com",
    "chunk_size": 1000,      # Optimal for runbooks
    "chunk_overlap": 200,    # Maintains coherence
    "file_types": [".md"],
    "exclude_patterns": ["temp/", "drafts/"],
}

Here's the complete FastAPI implementation for our RAG system with OpenAI embeddings and streaming:
@app.post("/ask")
def ask_question_stream(request: QuestionRequest):
    question = request.question
    model = rag.llm

    # Base URL configuration
    BASE_DOCS_URL = "https://docs.company.com"

    # Optimized retriever for technical docs
    retriever = rag.vector_store.as_retriever(
        search_type="mmr",        # Maximum Marginal Relevance
        search_kwargs={
            "k": 8,               # 8 chunks for rich context
            "fetch_k": 20,        # Larger initial pool
            "lambda_mult": 0.7,   # Balance relevance/diversity
        },
    )
    retrieved_docs = retriever.invoke(question)
The key improvements we added:
🎯 Enhanced retrieval: MMR (Maximum Marginal Relevance) instead of plain similarity search, a wider candidate pool (fetch_k=20), and a relevance/diversity balance (lambda_mult=0.7) suited to technical docs.
💡 Pro tip: These parameters were adjusted after 2 weeks of testing with our SRE teams! The sketch below shows how the retrieved chunks then feed prompt building and streaming.
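To complete the picture, here is a hedged sketch of what the rest of the endpoint can look like: building a source-cited prompt from the retrieved chunks and streaming the answer back with FastAPI's StreamingResponse. The helper names and the prompt wording are illustrative assumptions, not the exact production code.

from fastapi.responses import StreamingResponse

def build_prompt(question: str, docs, base_url: str) -> str:
    # Assumption: each chunk's metadata carries its relative source path
    context = "\n\n".join(
        f"[Source: {base_url}/{doc.metadata.get('source', '')}]\n{doc.page_content}"
        for doc in docs
    )
    return (
        "You are a specialized SRE assistant. Answer from the context only "
        f"and cite your sources.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

def stream_answer(llm, prompt: str):
    # LangChain chat models expose .stream(), which yields partial message chunks
    for chunk in llm.stream(prompt):
        yield chunk.content

# Inside ask_question_stream, after retrieval:
#   prompt = build_prompt(question, retrieved_docs, BASE_DOCS_URL)
#   return StreamingResponse(stream_answer(model, prompt), media_type="text/plain")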
import os
import yaml
from pathlib import Path

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# LangChain-based document indexer for RAG implementation
class MkDocsIndexer:
    def __init__(self, docs_path: str, base_url: str):
        self.docs_path = Path(docs_path)
        self.base_url = base_url
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n## ", "\n### ", "\n\n", "\n", " ", ""],
        )
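To show how this indexer could feed ChromaDB, here is a sketch of an index_documents method consistent with the config above; the `url` metadata field and the `vector_db` persist directory are assumptions, not the exact production code.

    # (continuing the MkDocsIndexer class)
    def index_documents(self, persist_directory: str = "vector_db"):
        """Load every Markdown page, split it, and persist the chunks to ChromaDB."""
        from langchain_community.document_loaders import TextLoader
        from langchain_community.vectorstores import Chroma
        from langchain_openai import OpenAIEmbeddings

        # Load all .md files under docs_path as plain text
        loader = DirectoryLoader(str(self.docs_path), glob="**/*.md", loader_cls=TextLoader)
        documents = loader.load()

        # Split into overlapping chunks and keep each page's URL for citations
        chunks = self.text_splitter.split_documents(documents)
        for chunk in chunks:
            relative = Path(chunk.metadata["source"]).relative_to(self.docs_path)
            chunk.metadata["url"] = f"{self.base_url}/{relative.with_suffix('')}/"

        # Embed and persist so the API can reload the index without re-scraping
        return Chroma.from_documents(
            chunks,
            OpenAIEmbeddings(model="text-embedding-3-small", dimensions=512),
            persist_directory=persist_directory,
        )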
# Automatic classification by content type
DOC_TYPES_CONFIG = {
    "runbook": {
        "weight": 1.5,   # High priority for incidents
        "keywords": ["incident", "rollback", "emergency", "critical"],
    },
    "api_doc": {
        "weight": 1.2,
        "keywords": ["endpoint", "authentication", "request", "response"],
    },
    "troubleshooting": {
        "weight": 1.4,   # High priority for debugging
        "keywords": ["error", "debug", "logs", "diagnostic"],
    },
    "general": {
        "weight": 1.0,   # Default weight for everything else
        "keywords": [],
    },
}
# docker-compose.yml for local dev
version: '3.8'

services:
  rag-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DOCS_PATH=/app/docs
      - BASE_DOCS_URL=https://docs.company.com
    volumes:
      - ./docs:/app/docs:ro
      - ./vector_db:/app/vector_db
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Simple frontend for testing
  rag-frontend:
    build:
// Simple but effective React component
function RAGChat() {
  const [question, setQuestion] = useState('');
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const askQuestion = async () => {
    setLoading(true);
    setResponse('');
    try {
      const res = await fetch('/api/ask', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question }),
      });
      setResponse(await res.text());
    } finally {
      setLoading(false);
    }
  };
  // ... render the input, button and response here
}
🎯 Chunking and indexing
# ✅ DO: Respect logical doc structure
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Optimal for technical docs
    chunk_overlap=200,    # Maintains context
    separators=[
        "\n## ",          # Main sections first
        "\n### ",         # Then subsections
        "\n\n",           # Paragraphs
        "\n", " ", "",    # Finally words/characters
    ],
)

# ✅ DO: Enrich metadata
metadata = {
    "doc_type": "runbook",     # Classification
    "urgency": "critical",     # Urgency level used to prioritize retrieval
}
🔍 Intelligent retrieval configuration
# ✅ DO: Adjust according to question type
def get_retriever_config(question: str) -> dict:
    if "emergency" in question.lower() or "incident" in question.lower():
        return {"k": 12, "doc_types": ["runbook", "troubleshooting"]}
    elif "api" in question.lower():
        return {"k": 6, "doc_types": ["api_doc"]}
    else:
        return {"k": 8, "doc_types": "all"}

🧠 Adaptive prompt engineering
# ✅ DO: Adapt prompt according to SRE context
def build_system_prompt(urgency_level, doc_types):
    base_prompt = "You are a specialized SRE assistant."

    if urgency_level == "critical":
        return base_prompt + """
        🚨 CRITICAL INCIDENT MODE:
        - Prioritize immediate actionable steps
        - Include rollback procedures when relevant
        - Mention escalation contacts if available
        - Be concise but complete
        """
    elif "api" in doc_types:
        return base_prompt + """
        📡 API DOCUMENTATION MODE:
        - Provide exact endpoint syntax
        - Include authentication details
        - Show request/response examples
        - Mention rate limits and error codes
        """

    return base_prompt + " Standard documentation assistance mode."

📈 Monitoring and metrics
# ✅ DO: Track important metrics
METRICS_TO_TRACK = {
    "usage": ["questions_per_day", "unique_users", "peak_hours"],
    "quality": ["avg_response_time", "user_satisfaction", "sources_clicked"],
    "content": ["most_asked_topics", "unused_docs", "missing_answers"],
    "performance": ["search_latency", "llm_response_time", "error_rate"],
}

# ✅ DO: Structured logs for analytics
logger.info("rag_query", extra={
    "question": hash(question),        # Privacy-safe
    "doc_count": len(retrieved_docs),
    "response_time": response_time,
    "user_id": user_id,
})
🔒 Security and privacy
# ✅ DO: Implement guardrails
def validate_question(question: str) -> bool:
    """Verify the question is appropriate"""
    # No sensitive data in logs
    if any(pattern in question.lower() for pattern in
           ["password", "secret", "token", "key"]):
        return False

    # Size limit to prevent abuse
    if len(question) > 500:
        return False

    return True

# ✅ DO: Anonymize logs
def sanitize_for_logs(text: str) -> str:
    """Remove sensitive info from logs"""
    import re
    # Mask anything that looks like a credential assignment (password=..., token: ...)
    patterns = [r"(password|secret|token|key)\s*[:=]\s*\S+"]
    for pattern in patterns:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text
❌ DON'T: Neglect data freshness
# ❌ DON'T: Static index without updates
# Problem: Outdated docs = bad advice during incidents!

# ✅ DO: Automatic update system
def schedule_index_updates():
    """Update index when docs change"""

    # Webhook from Git for real-time triggers
    @app.post("/webhook/docs-updated")
    async def handle_docs_update():
        asyncio.create_task(reindex_documents())

    # Backup: periodic modification scan
    scheduler.add_job(
        func=check_for_updates,
        trigger="interval",
        minutes=30,
        id='docs_freshness_check',
    )

❌ DON'T: Ignore user context
# ❌ DON'T: Identical response for everyone
# Problem: Junior vs Senior SRE = different needs

# ✅ DO: Adapt according to user
def personalize_response(user_profile, question, base_answer):
    if user_profile.experience_level == "junior":
        return add_explanatory_context(base_answer)
    elif user_profile.team == "security":
        return emphasize_security_aspects(base_answer)
    elif user_profile.on_call_status:
        return prioritize_quick_actions(base_answer)
    return base_answer

❌ DON'T: Blindly trust the LLM
# ❌ DON'T: No validation of critical responses
# Problem: Hallucination = aggravated incident!

# ✅ DO: Validation for critical procedures
def validate_critical_response(question, response, doc_sources):
    """Validate responses for sensitive procedures"""
    critical_keywords = ["delete", "drop", "destroy", "remove", "rollback"]

    if any(keyword in question.lower() for keyword in critical_keywords):
        # Require explicit and recent source
        if not doc_sources or not has_recent_source(doc_sources):
            return add_validation_warning(response)

        # Double-check with pattern matching
        if not validate_procedure_steps(response):
            return add_validation_warning(response)

    return response
❌ DON'T: Forget production performance
# ❌ DON'T: No intelligent caching
# Problem: Repetitive questions = exploded OpenAI costs

# ✅ DO: Semantic cache with adaptive TTL
from functools import lru_cache
import hashlib

class SemanticCache:
    def __init__(self):
        self.cache = {}
        self.similarity_threshold = 0.92

    def get_cache_key(self, question: str) -> str:
        """Key based on question embedding"""
        embedding = get_question_embedding(question)
        return hashlib.md5(str(embedding).encode()).hexdigest()

    def should_cache_response(self, question: str) -> bool:
        # Placeholder heuristic: skip caching for time-sensitive questions (adjust to your needs)
        volatile_terms = ["current", "today", "right now", "latest"]
        return not any(term in question.lower() for term in volatile_terms)
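Note that the md5 key above only catches exact repeats; to actually use similarity_threshold, a lookup could compare embeddings with cosine similarity. A sketch, assuming get_question_embedding returns a list of floats and that cache.cache stores (embedding, answer) pairs:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def semantic_lookup(cache: SemanticCache, question: str):
    """Return a cached answer whose stored question is similar enough, else None."""
    query_embedding = get_question_embedding(question)
    for cached_embedding, cached_answer in cache.cache.values():
        if cosine_similarity(query_embedding, cached_embedding) >= cache.similarity_threshold:
            return cached_answer
    return None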
❌ DON'T: Neglect user experience
# ❌ DON'T: Too technical responses for everyone
# Problem: Manager asking question = unreadable response

# ✅ DO: Automatic level adaptation
def adjust_technical_level(response: str, user_role: str) -> str:
    """Adapt technical level according to user"""
    if user_role in ["manager", "product", "business"]:
        return simplify_technical_terms(response)
    elif user_role in ["intern", "junior"]:
        return add_educational_context(response)
    elif user_role in ["senior", "staff", "principal"]:
        return add_advanced_details(response)
    return response
# Before/after RAG metrics
RESULTS = {
    "average_search_time": {
        "before": "18 minutes/day/SRE",
        "after": "6 minutes/day/SRE",
        "improvement": "-67%",
    },
    "incident_resolution": {
        "before": "MTTR = 23 minutes",
        "after": "MTTR = 16 minutes",
        "improvement": "-30%",
    },
    "team_satisfaction": {
        "before": "6.2/10",
        "after": "8.7/10",
        "improvement": "+40%",
    },
}

Top 5 most asked questions to the RAG:
Deploying a RAG system on MkDocs Material documentation is like hiring a senior SRE who knows all procedures by heart, never sleeps, and responds instantly during emergencies. This semantic search solution transforms how teams access knowledge.
Concrete benefits of our RAG implementation:
The best part? The system keeps getting more useful over time: the more questions your team asks, the clearer the analytics become about which docs are missing, stale, or never read, and feeding those fixes back into the index steadily improves retrieval quality.
Ready to implement RAG for your documentation? This tutorial gives you everything needed to build an AI documentation assistant with FastAPI, ChromaDB, and OpenAI. Your "3 AM future self" will thank you! 😄
🔥 Bonus challenge: Measure the time your teams spend searching for info this week. Then re-measure in a month after implementing your RAG. The results will surprise you!
Thank you for following me on this adventure! 🚀
This article was written with ❤️ for the DevOps community.