Building a RAG (Retrieval Augmented Generation) system usually means:
- Setting up a vector database
- Configuring embeddings
- Chunking documents
- Managing API keys
- Writing update logic
What if all of that disappeared into three lines?
from piragi import Ragi
kb = Ragi(["./docs", "./code/**/*.py", "https://api.example.com/docs"])
answer = kb.ask("How do I deploy this?")
That’s piragi (pronounced “pee-rah-gee”) - Retrieval Augmented Generation Interface. Built-in vector store, embeddings, citations, and auto-updates. Free and local by default.
The Problem: RAG is Too Complex
Every RAG tutorial starts the same way:
“First, install ChromaDB/Pinecone/Weaviate. Then configure OpenAI embeddings. Now chunk your documents. Set up a background worker for updates…”
By the time you’re done, you’ve written 200+ lines of boilerplate just to ask questions about your docs.
Most developers don’t need distributed vector databases or fancy reranking algorithms. They just want to:
- Point at some documents
- Ask questions
- Get answers with sources
piragi: Zero to RAG in 3 Lines
from piragi import Ragi
kb = Ragi("./docs")
answer = kb("What is this?")
That’s it. No configuration. No API keys. No vector database setup.
What just happened:
- Documents auto-chunked with markdown-aware splitting
- Embeddings generated locally (all-mpnet-base-v2, ~420MB)
- Vector database created (LanceDB, stored in .piragi/)
- LLM queries routed to local Ollama (llama3.2)
- Citations included automatically
All free. All local. All automatic.
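For the curious, here is a rough sketch of the same pipeline built directly on sentence-transformers, LanceDB, and Ollama. It illustrates the moving parts only; it is not piragi's actual code, and the file path, table name, and prompt are placeholders.
# What piragi roughly wraps. Assumes: pip install sentence-transformers lancedb ollama,
# plus a running Ollama server with llama3.2 pulled.
import lancedb
from sentence_transformers import SentenceTransformer

# 1. Chunk (naive paragraph split; piragi uses markdown-aware splitting)
text = open("docs/readme.md").read()
chunks = [c.strip() for c in text.split("\n\n") if c.strip()]

# 2. Embed locally
model = SentenceTransformer("all-mpnet-base-v2")
vectors = model.encode(chunks)

# 3. Store vectors + text in LanceDB
db = lancedb.connect(".piragi-demo")
table = db.create_table(
    "chunks",
    data=[{"vector": v.tolist(), "text": c} for v, c in zip(vectors, chunks)],
    mode="overwrite",
)

# 4. Retrieve the closest chunks and hand them to a local LLM
import ollama

question = "What is this?"
hits = table.search(model.encode(question).tolist()).limit(3).to_list()
context = "\n\n".join(h["text"] for h in hits)
reply = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(reply["message"]["content"])     # the answer
print([h["text"][:60] for h in hits])  # crude citations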
Universal Document Support
piragi handles everything:
kb = Ragi([
"./docs/**/*.md", # Markdown
"./specs/*.pdf", # PDFs
"./data/*.xlsx", # Excel
"./code/**/*.py", # Code
"https://api.docs.com", # URLs
"./audio/*.mp3", # Audio transcription
"./images/*.png" # Image OCR
])
answer = kb("Summarize all the technical specs")
Supported formats:
- Documents: PDF, Word, Excel, Markdown, Text
- Code: Python, JavaScript, Java, Go, Rust, etc.
- Web: URLs (auto-crawls and extracts text)
- Media: Images (OCR), Audio (transcription)
piragi figures out the format and processes it correctly. No manual loaders.
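The dispatch behind "no manual loaders" can be pictured as extension-based routing. A minimal sketch with hypothetical loader functions (piragi's real loader registry is not shown here):
from pathlib import Path

# Hypothetical loaders keyed by extension -- an illustration, not piragi's internals.
LOADERS = {
    ".md":  lambda p: Path(p).read_text(),
    ".py":  lambda p: Path(p).read_text(),
    ".pdf": lambda p: f"<text extracted from {p} by a PDF parser>",
    ".png": lambda p: f"<text produced by OCR on {p}>",
}

def load(path: str) -> str:
    loader = LOADERS.get(Path(path).suffix.lower())
    if loader is None:
        raise ValueError(f"No loader for {path}")
    return loader(path)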
Auto-Updates: Never Stale
Traditional RAG systems get stale. Documents change, but your vector database doesn’t know.
piragi watches your sources in the background:
kb = Ragi(["./docs", "https://api.docs.com"])
# Auto-updates enabled by default
# Edit a file in ./docs
# piragi detects the change via mtime + content hash
# Affected chunks are reprocessed in the background
# Queries continue uninterrupted
How it works:
- Files: Tracks modification time + content hash
- URLs: Periodic HTTP HEAD checks
- Background workers: Updates happen async
- Zero query latency: Queries never block on updates
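As a rough illustration of the file-side check above, change detection with mtime plus a content hash can look like this (a sketch of the idea, not piragi's implementation):
import hashlib
import os

_seen: dict[str, tuple[float, str]] = {}  # path -> (mtime, content hash)

def has_changed(path: str) -> bool:
    """Cheap mtime pre-check first, content hash to confirm a real change."""
    mtime = os.path.getmtime(path)
    prev = _seen.get(path)
    if prev is not None and prev[0] == mtime:
        return False  # mtime unchanged: skip hashing entirely
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    changed = prev is None or prev[1] != digest
    _seen[path] = (mtime, digest)
    return changed
A background worker can poll sources with a check like this and re-embed only the files that actually changed, which is why queries never have to wait on updates.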
Disable if you don’t need it:
kb = Ragi("./docs", config={"auto_update": {"enabled": False}})
Smart Citations
Every answer includes ranked sources:
kb = Ragi("./docs")
result = kb.ask("How does authentication work?")
print(result["answer"])
# "The system uses OAuth2 with JWT tokens. Users authenticate via..."
print(result["sources"])
# [
# {"file": "auth.md", "chunk": "OAuth2 implementation...", "score": 0.89},
# {"file": "security.pdf", "chunk": "JWT token format...", "score": 0.82},
# {"file": "api.md", "chunk": "Authentication endpoints...", "score": 0.76}
# ]
Citations show:
- Source file/URL
- Relevant chunk (the actual text used)
- Similarity score (0-1, higher = more relevant)
You know exactly where the answer came from.
Query Expansion & Reranking
Version 0.1.4 added advanced retrieval:
# Your query: "deployment steps"
# Expanded to: ["deployment steps", "how to deploy", "deployment process", "release workflow"]
# Results retrieved for all variations
# Reranked by combining vector similarity + keyword matching
# Top results used for answer generation
Why this matters:
- Better recall: Catches relevant docs even with different phrasing
- Better precision: Reranking filters noise
- Better citations: Sources are actually relevant
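To make the reranking step concrete, here is a toy version of "pool, then rerank". The 0.7/0.3 weighting and the word-overlap measure are illustrative assumptions, not piragi's actual scoring:
def keyword_overlap(query: str, chunk: str) -> float:
    """Fraction of query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    return len(q_words & set(chunk.lower().split())) / max(len(q_words), 1)

def rerank(query: str, hits: list[dict], alpha: float = 0.7) -> list[dict]:
    """Blend vector similarity ('score') with keyword overlap, best first."""
    for h in hits:
        h["combined"] = alpha * h["score"] + (1 - alpha) * keyword_overlap(query, h["chunk"])
    return sorted(hits, key=lambda h: h["combined"], reverse=True)

# Results pooled from every expanded query variant, deduplicated, then reranked once
pooled = [
    {"chunk": "Run the release workflow, then deploy to staging.", "score": 0.71},
    {"chunk": "Deployment steps: build, test, push the image.", "score": 0.69},
    {"chunk": "The logo uses the brand colour palette.", "score": 0.55},
]
for h in rerank("deployment steps", pooled):
    print(round(h["combined"], 2), h["chunk"])
The keyword term pulls the chunk that literally says "deployment steps" above the one that merely scored well on vector similarity, which is the precision gain described above.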
Configure if needed:
kb = Ragi("./docs", config={
"llm": {
"enable_reranking": True,
"enable_query_expansion": True,
"temperature": 0.1 # Lower = more focused
}
})
Metadata Filtering
Filter results by document properties:
kb = Ragi("./docs")
# Only search PDFs
kb.filter(file_type="pdf").ask("What's in the PDFs?")
# Only recent documents
kb.filter(modified_after="2025-01-01").ask("What changed recently?")
# Only code files
kb.filter(file_extension=".py").ask("How is this implemented?")
Filtering happens at the vector search level, so results are fast and relevant.
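One way to picture "filtering at the vector search level": the metadata predicate restricts the candidate set before similarity scoring, rather than discarding results afterwards. A minimal sketch with a hypothetical record layout and toy 2-dimensional vectors:
records = [
    {"file": "spec.pdf", "file_type": "pdf", "vector": [0.1, 0.9], "text": "Spec details"},
    {"file": "auth.py",  "file_type": "py",  "vector": [0.8, 0.2], "text": "Auth code"},
]

def search(query_vec, records, file_type=None, top_k=5):
    # Apply the metadata filter first, then score only the survivors
    pool = [r for r in records if file_type is None or r["file_type"] == file_type]
    return sorted(
        pool,
        key=lambda r: sum(q * v for q, v in zip(query_vec, r["vector"])),  # dot product
        reverse=True,
    )[:top_k]

print(search([0.2, 0.8], records, file_type="pdf"))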
OpenAI Compatibility
Use GPT-4 or any OpenAI-compatible API:
# OpenAI (LLM only, local embeddings)
kb = Ragi("./docs", config={
"llm": {
"model": "gpt-4o-mini",
"api_key": "sk-...",
"base_url": "https://api.openai.com/v1"
}
})
# OpenAI for both LLM and embeddings
kb = Ragi("./docs", config={
"llm": {
"model": "gpt-4o-mini",
"api_key": "sk-..."
},
"embedding": {
"model": "text-embedding-3-small",
"base_url": "https://api.openai.com/v1",
"api_key": "sk-..."
}
})
Works with:
- OpenAI
- Azure OpenAI
- Together AI
- Anyscale
- Any OpenAI-compatible endpoint
Embedding Model Options
The default embedding model is all-mpnet-base-v2 (~420MB, good quality).
Want smaller?
kb = Ragi("./docs", config={
"embedding": {"model": "all-MiniLM-L6-v2"} # ~90MB, decent quality
})
Want maximum quality?
kb = Ragi("./docs", config={
"embedding": {"model": "nvidia/llama-embed-nemotron-8b"} # ~8GB, best quality
})
Want remote embeddings?
kb = Ragi("./docs", config={
"embedding": {
"model": "text-embedding-3-small",
"base_url": "https://api.openai.com/v1",
"api_key": "sk-..."
}
})
piragi auto-detects whether embeddings should be local (sentence-transformers) or remote (API call).
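One plausible routing rule (an assumption about the behavior, not documented piragi internals) is to go remote whenever the embedding config points at an API:
def embedding_backend(embedding_cfg: dict) -> str:
    # Assumed heuristic: an api_key or base_url means "call an OpenAI-compatible
    # /embeddings endpoint"; otherwise load the model locally with sentence-transformers.
    if embedding_cfg.get("api_key") or embedding_cfg.get("base_url"):
        return "remote"
    return "local"

print(embedding_backend({"model": "all-MiniLM-L6-v2"}))          # local
print(embedding_backend({"model": "text-embedding-3-small",
                         "api_key": "sk-..."}))                  # remote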
Real-World Example: Code Q&A
from piragi import Ragi
# Index entire codebase
kb = Ragi([
"./src/**/*.py",
"./docs/*.md",
"./README.md",
"./CHANGELOG.md"
])
# Ask about implementation
result = kb.ask("How does the auto-update system work?")
print(result["answer"])
# "The auto-update system uses background workers to detect changes..."
# Get specific citations
for source in result["sources"][:3]:
print(f"\nSource: {source['file']}")
print(f"Chunk: {source['chunk'][:100]}...")
print(f"Score: {source['score']:.2f}")
Output:
Source: src/auto_update.py
Chunk: The UpdateWorker class runs in a separate thread, checking for file changes every 300 seconds...
Score: 0.91
Source: docs/architecture.md
Chunk: Auto-updates are implemented using a background worker pool that monitors source files...
Score: 0.84
Source: CHANGELOG.md
Chunk: Version 0.1.0 added auto-updates with background workers and change detection...
Score: 0.78
You know exactly where the answer came from and how confident the system is.
API Overview
# Initialize
kb = Ragi(sources, persist_dir=".piragi", config=None)
# Add more sources
kb.add("./more-docs")
# Ask questions
result = kb.ask(query, top_k=5)
# Shorthand
result = kb(query)
# Filter + ask
result = kb.filter(file_type="pdf").ask(query)
# Count documents
count = kb.count()
# Clear database
kb.clear()
Full API docs: API.md
Version History
v0.1.5 (Latest) - Fixed chunking bugs, added minimum chunk length filter
v0.1.4 - Added query expansion and reranking
v0.1.3 - Fixed schema mismatches for mixed sources
v0.1.2 - Changed default embedding to all-mpnet-base-v2 (prevents memory crashes)
v0.1.1 - Renamed module from ragi to piragi
v0.1.0 - Initial release with auto-updates, citations, universal format support
Installation
pip install piragi
# Optional: Install Ollama for local LLM
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
Design Philosophy
piragi was built on three principles:
- Zero Configuration: Works out of the box with free local models
- Universal Support: Any document format, any LLM/embedding provider
- Always Fresh: Auto-updates keep your knowledge base current
You shouldn’t need a PhD in vector search to ask questions about your docs.
When to Use piragi
piragi is perfect for:
- Internal documentation search (company wikis, API docs)
- Code Q&A (ask questions about codebases)
- Research assistants (query papers, articles, reports)
- Customer support (search knowledge bases)
It’s not for:
- Web-scale search (millions of documents)
- Real-time streaming updates
- Complex multi-modal reasoning
For those use cases, you need LangChain, LlamaIndex, or custom infrastructure.
But if you just want to ask questions about your docs? piragi is the fastest way from zero to working RAG.
Conclusion
RAG doesn’t have to be complex. You don’t need distributed databases, custom chunking algorithms, or manual update scripts.
You just need:
kb = Ragi("./docs")
answer = kb("What is this?")
Three lines. Free. Local. With citations.
Try it: github.com/hemanth/ragi
The best RAG interface yet.
About Hemanth HM
Hemanth HM is a Sr. Staff Engineer at PayPal, a Google Developer Expert, a TC39 delegate, a FOSS advocate, and a community leader with a passion for programming, AI, and open-source contributions.