Building a RAG Pipeline From Scratch

𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗥𝗔𝗚 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗙𝗿𝗼𝗺 𝗦𝗰𝗿𝗮𝘁𝗰𝗵

I wanted to add an AI assistant to SmartQueue.

SmartQueue is a task queue I built in Go for IT support tickets. I did not want a generic AI. A generic model does not know your specific password reset rules or your outage runbooks.

I needed Retrieval-Augmented Generation (RAG). This pulls facts from your documents first. Then it gives those facts to the model as context.
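As a rough illustration of that "retrieve first, then give the facts to the model" flow, here is a minimal sketch of the prompt-assembly step. The function name `build_prompt` and the instruction wording are my own assumptions for illustration, not SmartQueue's actual code, and the retrieval step itself is left to the caller.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Place retrieved passages ahead of the question so the model answers from them.

    Hypothetical helper: the name and the prompt wording are illustrative only.
    """
    # Number each passage so the model (and a human reader) can tell them apart.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string is what gets sent to the model, so the model sees your runbook text before it sees the user's question.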

Here is what I learned while building this pipeline.

𝗧𝗵𝗲 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗙𝗮𝗶𝗹𝘂𝗿𝗲

My first version used ChromaDB for vector search. It worked locally. It failed during deployment.

I ran everything in one container on Hugging Face Spaces. This included Redis, a Go API, workers, a FastAPI service, and ChromaDB. Five processes competed for limited memory and CPU. ChromaDB caused startup races and silent failures.

I made a choice. I ripped out the vector database and replaced it with a simple BM25 search.

𝗧𝗵𝗲 𝗦𝗶𝗺𝗽𝗹𝗲 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻

The replacement is 50 lines of Python. It runs inside the service itself, with no external process and no network calls. It uses the Okapi BM25 formula to match keywords in memory.
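To make the idea concrete, here is a minimal sketch of an in-memory Okapi BM25 index. It is not the SmartQueue code. The class name, the tokenizer, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` are all assumptions chosen because they are common, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Tiny in-memory Okapi BM25 index (illustrative sketch, assumed defaults)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]   # term counts per doc
        self.lens = [sum(tf.values()) for tf in self.tfs]  # doc lengths in tokens
        # Average doc length; fall back to 1.0 to avoid dividing by zero.
        self.avgdl = sum(self.lens) / max(len(self.lens), 1) or 1.0
        n = len(docs)
        # Document frequency: in how many docs each term appears.
        df = Counter(term for tf in self.tfs for term in tf)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        # Length normalization: longer-than-average docs are penalized.
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 4) -> list[int]:
        # Return indices of the top-k docs that matched at least one term.
        scored = [(self.score(query, i), i) for i in range(len(self.tfs))]
        hits = sorted((h for h in scored if h[0] > 0), key=lambda h: (-h[0], h[1]))
        return [i for _, i in hits[:k]]


# Made-up runbook snippets for demonstration only.
docs = [
    "Reset your password from the self-service portal.",
    "VPN connection drops: restart the VPN client.",
    "Printer queue stuck: clear the spooler.",
]
index = BM25(docs)
print(index.search("vpn keeps dropping"))  # [1]
print(index.search("password reset"))      # [0]
print(index.search("credentials"))         # [] (no exact word match)
```

The last query shows the limitation directly: "credentials" never appears in the documents, so nothing is returned even though a human would connect it to the password runbook.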

The trade-off is clear:

BM25 relies on exact word matches. It misses synonyms.
For 10 short IT runbooks, this does not matter. Users use specific terms like "VPN" or "password reset."
The service now starts reliably every time.

𝗧𝘂𝗻𝗶𝗻𝗴 𝘁𝗵𝗲 𝗦𝘆𝘀𝘁𝗲𝗺

I tuned several settings to keep the system stable:

• Retrieved docs (k): 4. This provides enough context without hitting token limits.
• Bot temperature: 0.2. Troubleshooting needs literal answers, not creativity.
• Classifier temperature: 0.1. This ensures the JSON output is predictable.
• Session history: Last 10 turns. This provides continuity without using too much memory.
• Rate limits: 30 requests per minute. This protects my API quota.

𝗧𝗵𝗲 𝗕𝗲𝘀𝘁 𝗗𝗲𝘀𝗶𝗴𝗻 𝗶𝘀 𝗮 𝗗𝗲𝗴𝗿𝗮𝗱𝗶𝗻𝗴 𝗗𝗲𝘀𝗶𝗴𝗻

I built every endpoint with a non-AI fallback. If the AI service goes down, the system uses keyword matching or rule-based logic. The system degrades instead of failing.
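A degrading endpoint might look something like the sketch below. Everything named here is hypothetical: `answer`, `keyword_fallback`, the `FALLBACK_RULES` table, and its canned replies are invented for illustration, and `ask_model` stands in for whatever function calls the AI service.

```python
from typing import Callable

# Made-up rules for illustration; real ones would come from the runbooks.
FALLBACK_RULES = {
    "vpn": "See the VPN runbook.",
    "password": "See the password reset runbook.",
}
DEFAULT_REPLY = "The assistant is unavailable. A technician will review your ticket."


def keyword_fallback(question: str) -> str:
    """Rule-based reply used when the AI service cannot answer."""
    text = question.lower()
    for keyword, reply in FALLBACK_RULES.items():
        if keyword in text:
            return reply
    return DEFAULT_REPLY


def answer(question: str, ask_model: Callable[[str], str]) -> tuple[str, str]:
    """Try the AI path first; degrade to keyword rules on any failure.

    Returns the reply plus a label saying which path produced it.
    """
    try:
        return ask_model(question), "ai"
    except Exception:
        # Catching broadly is deliberate in this sketch: timeouts, quota
        # errors, and bad responses should all degrade, not crash the endpoint.
        return keyword_fallback(question), "fallback"
```

Returning the path label alongside the reply makes it easy to log how often the fallback is actually being used.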

This is not a complex RAG setup. It has no re-ranking or hybrid search. It is a small, smart tool built for a specific scale.

Lessons learned:

Optimize for your constraints, not for a textbook setup.
Reliability often beats theoretical accuracy.
Use vector search when you have hundreds of documents. Use keyword search when you have ten.

Source: https://dev.to/ambarish_0221/building-a-rag-pipeline-from-scratch-what-smartqueue-taught-me-about-retrieval-4gdb

