piragi: The Best RAG Interface Yet

Hemanth HM

Building a RAG (Retrieval Augmented Generation) system usually means:

  1. Setting up a vector database
  2. Configuring embeddings
  3. Chunking documents
  4. Managing API keys
  5. Writing update logic

What if all of that disappeared into three lines?

from piragi import Ragi

kb = Ragi(["./docs", "./code/**/*.py", "https://api.example.com/docs"])
answer = kb.ask("How do I deploy this?")

That’s piragi (pronounced “pee-rah-gee”), short for Retrieval Augmented Generation Interface. Built-in vector store, embeddings, citations, and auto-updates. Free and local by default.

The Problem: RAG is Too Complex

Every RAG tutorial starts the same way:

“First, install ChromaDB/Pinecone/Weaviate. Then configure OpenAI embeddings. Now chunk your documents. Set up a background worker for updates…”

By the time you’re done, you’ve written 200+ lines of boilerplate just to ask questions about your docs.

Most developers don’t need distributed vector databases or fancy reranking algorithms. They just want to:

  • Point at some documents
  • Ask questions
  • Get answers with sources

piragi: Zero to RAG in 3 Lines

from piragi import Ragi

kb = Ragi("./docs")
answer = kb("What is this?")

That’s it. No configuration. No API keys. No vector database setup.

What just happened:

  • Documents auto-chunked with markdown-aware splitting
  • Embeddings generated locally (all-mpnet-base-v2, ~420MB)
  • Vector database created (LanceDB, stored in .piragi/)
  • LLM queries routed to local Ollama (llama3.2)
  • Citations included automatically

All free. All local. All automatic.
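
If you’d rather see those defaults spelled out, the zero-config call is roughly equivalent to the sketch below. The values come from the descriptions above (model names, the .piragi/ directory); piragi may organize its config keys slightly differently, so treat this as illustrative rather than canonical.

from piragi import Ragi

# Roughly what the zero-config call does, with the defaults made explicit.
# Key names mirror the config examples later in this post; values are the
# defaults described above, not an authoritative dump of piragi's config.
kb = Ragi(
    "./docs",
    persist_dir=".piragi",                            # LanceDB store location
    config={
        "embedding": {"model": "all-mpnet-base-v2"},  # local sentence-transformers
        "llm": {"model": "llama3.2"},                 # served by local Ollama
        "auto_update": {"enabled": True},             # background freshness checks
    },
)
answer = kb("What is this?")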

Universal Document Support

piragi handles everything:

kb = Ragi([
    "./docs/**/*.md",           # Markdown
    "./specs/*.pdf",            # PDFs
    "./data/*.xlsx",            # Excel
    "./code/**/*.py",           # Code
    "https://api.docs.com",     # URLs
    "./audio/*.mp3",            # Audio transcription
    "./images/*.png"            # Image OCR
])

answer = kb("Summarize all the technical specs")

Supported formats:

  • Documents: PDF, Word, Excel, Markdown, Text
  • Code: Python, JavaScript, Java, Go, Rust, etc.
  • Web: URLs (auto-crawls and extracts text)
  • Media: Images (OCR), Audio (transcription)

piragi figures out the format and processes it correctly. No manual loaders.

Auto-Updates: Never Stale

Traditional RAG systems get stale. Documents change, but your vector database doesn’t know.

piragi watches your sources in the background:

kb = Ragi(["./docs", "https://api.docs.com"])
# Auto-updates enabled by default

# Edit a file in ./docs
# piragi detects change via mtime + hash
# Reprocesses the affected chunks in the background
# Queries continue uninterrupted

How it works:

  • Files: Tracks modification time + content hash (see the sketch after this list)
  • URLs: Periodic HTTP HEAD checks
  • Background workers: Updates happen async
  • Zero query latency: Queries never block on updates
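
As a rough illustration of the file-side check (not piragi’s actual code), mtime-plus-hash change detection is a small amount of Python: compare the cheap modification time first, and only hash the contents when the mtime has moved.

import hashlib
import os

# Minimal sketch of mtime + content-hash change detection.
# The real worker adds URL HEAD checks, batching, and background threads.
def has_changed(path, seen):
    mtime = os.path.getmtime(path)
    prev = seen.get(path)
    if prev is not None and prev[0] == mtime:
        return False  # mtime unchanged: skip the expensive hash
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    changed = prev is None or prev[1] != digest
    seen[path] = (mtime, digest)
    return changed

seen = {}
print(has_changed("README.md", seen))  # True on first sight, False until edited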

Disable if you don’t need it:

kb = Ragi("./docs", config={"auto_update": {"enabled": False}})

Smart Citations

Every answer includes ranked sources:

kb = Ragi("./docs")
result = kb.ask("How does authentication work?")

print(result["answer"])
# "The system uses OAuth2 with JWT tokens. Users authenticate via..."

print(result["sources"])
# [
#   {"file": "auth.md", "chunk": "OAuth2 implementation...", "score": 0.89},
#   {"file": "security.pdf", "chunk": "JWT token format...", "score": 0.82},
#   {"file": "api.md", "chunk": "Authentication endpoints...", "score": 0.76}
# ]

Citations show:

  • Source file/URL
  • Relevant chunk (the actual text used)
  • Similarity score (0-1, higher = more relevant)

You know exactly where the answer came from.
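
One practical pattern built on that result shape: keep only the sources above a similarity cutoff before showing them to users. The 0.75 threshold below is an arbitrary illustration, not a piragi default.

from piragi import Ragi

kb = Ragi("./docs")
result = kb.ask("How does authentication work?")

# Surface only reasonably confident citations; tune the cutoff for your corpus.
confident = [s for s in result["sources"] if s["score"] >= 0.75]
for source in confident:
    print(f'{source["file"]} (score {source["score"]:.2f})')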

Query Expansion & Reranking

Version 0.1.4 added advanced retrieval:

# Your query: "deployment steps"
# Expanded to: ["deployment steps", "how to deploy", "deployment process", "release workflow"]
# Results retrieved for all variations
# Reranked by combining vector similarity + keyword matching
# Top results used for answer generation
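
For intuition, here is a toy version of that blend, scoring each chunk as a weighted sum of vector similarity and keyword overlap. It only illustrates the general technique; the field names (text, vector_score) and the 0.7 weight are made up, and piragi’s reranker may work differently.

# Toy hybrid rerank: weighted blend of vector similarity and keyword overlap.
# Illustrative only; "text" and "vector_score" are hypothetical field names.
def rerank(chunks, query, alpha=0.7):
    terms = set(query.lower().split())

    def combined(chunk):
        words = set(chunk["text"].lower().split())
        keyword_score = len(terms & words) / max(len(terms), 1)
        return alpha * chunk["vector_score"] + (1 - alpha) * keyword_score

    return sorted(chunks, key=combined, reverse=True)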

Why this matters:

  • Better recall: Catches relevant docs even with different phrasing
  • Better precision: Reranking filters noise
  • Better citations: Sources are actually relevant

Configure if needed:

kb = Ragi("./docs", config={
    "llm": {
        "enable_reranking": True,
        "enable_query_expansion": True,
        "temperature": 0.1  # Lower = more focused
    }
})

Metadata Filtering

Filter results by document properties:

kb = Ragi("./docs")

# Only search PDFs
kb.filter(file_type="pdf").ask("What's in the PDFs?")

# Only recent documents
kb.filter(modified_after="2025-01-01").ask("What changed recently?")

# Only code files
kb.filter(file_extension=".py").ask("How is this implemented?")

Filtering happens at the vector search level, so results are fast and relevant.
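
To make “filtering at the vector search level” concrete, this is roughly what a pre-filtered query looks like against LanceDB, the bundled store. The table and column names are invented for illustration; piragi manages its own schema inside .piragi/.

import lancedb
from sentence_transformers import SentenceTransformer

# Illustration of metadata-filtered vector search with LanceDB.
# "chunks" and "file_type" are hypothetical names, not piragi's schema.
model = SentenceTransformer("all-mpnet-base-v2")
query_vector = model.encode("What's in the PDFs?")

db = lancedb.connect(".piragi")
table = db.open_table("chunks")
hits = (
    table.search(query_vector)
    .where("file_type = 'pdf'")  # metadata filter combined with the vector search
    .limit(5)
    .to_list()
)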

OpenAI Compatibility

Use GPT-4 or any OpenAI-compatible API:

# OpenAI (LLM only, local embeddings)
kb = Ragi("./docs", config={
    "llm": {
        "model": "gpt-4o-mini",
        "api_key": "sk-...",
        "base_url": "https://api.openai.com/v1"
    }
})

# OpenAI for both LLM and embeddings
kb = Ragi("./docs", config={
    "llm": {
        "model": "gpt-4o-mini",
        "api_key": "sk-..."
    },
    "embedding": {
        "model": "text-embedding-3-small",
        "base_url": "https://api.openai.com/v1",
        "api_key": "sk-..."
    }
})

Works with:

  • OpenAI
  • Azure OpenAI
  • Together AI
  • Anyscale
  • Any OpenAI-compatible endpoint

Embedding Model Options

The default embedding model is all-mpnet-base-v2 (~420MB, good quality).

Want smaller?

kb = Ragi("./docs", config={
    "embedding": {"model": "all-MiniLM-L6-v2"}  # ~90MB, decent quality
})

Want maximum quality?

kb = Ragi("./docs", config={
    "embedding": {"model": "nvidia/llama-embed-nemotron-8b"}  # ~8GB, best quality
})

Want remote embeddings?

kb = Ragi("./docs", config={
    "embedding": {
        "model": "text-embedding-3-small",
        "base_url": "https://api.openai.com/v1",
        "api_key": "sk-..."
    }
})

piragi auto-detects whether embeddings should be local (sentence-transformers) or remote (API call).
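
A plausible version of that detection is just a config check: if the embedding config carries an api_key or base_url, call an OpenAI-compatible embeddings endpoint; otherwise load a local sentence-transformers model. This is a guess at the logic, not piragi’s source.

from sentence_transformers import SentenceTransformer

# Hedged sketch of local-vs-remote embedding selection; not piragi's actual code.
def build_embedder(embedding_cfg):
    if embedding_cfg.get("api_key") or embedding_cfg.get("base_url"):
        # Remote: OpenAI-compatible embeddings API.
        from openai import OpenAI

        client = OpenAI(
            api_key=embedding_cfg.get("api_key"),
            base_url=embedding_cfg.get("base_url"),
        )
        return lambda texts: [
            item.embedding
            for item in client.embeddings.create(
                model=embedding_cfg["model"], input=texts
            ).data
        ]
    # Local: run a sentence-transformers model on this machine.
    model = SentenceTransformer(embedding_cfg.get("model", "all-mpnet-base-v2"))
    return lambda texts: model.encode(texts).tolist()

embed = build_embedder({"model": "all-MiniLM-L6-v2"})
vectors = embed(["hello world"])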

Real-World Example: Code Q&A

from piragi import Ragi

# Index entire codebase
kb = Ragi([
    "./src/**/*.py",
    "./docs/*.md",
    "./README.md",
    "./CHANGELOG.md"
])

# Ask about implementation
result = kb.ask("How does the auto-update system work?")
print(result["answer"])
# "The auto-update system uses background workers to detect changes..."

# Get specific citations
for source in result["sources"][:3]:
    print(f"\nSource: {source['file']}")
    print(f"Chunk: {source['chunk'][:100]}...")
    print(f"Score: {source['score']:.2f}")

Output:

Source: src/auto_update.py
Chunk: The UpdateWorker class runs in a separate thread, checking for file changes every 300 seconds...
Score: 0.91

Source: docs/architecture.md
Chunk: Auto-updates are implemented using a background worker pool that monitors source files...
Score: 0.84

Source: CHANGELOG.md
Chunk: Version 0.1.0 added auto-updates with background workers and change detection...
Score: 0.78

You know exactly where the answer came from and how confident the system is.

API Overview

# Initialize
kb = Ragi(sources, persist_dir=".piragi", config=None)

# Add more sources
kb.add("./more-docs")

# Ask questions
result = kb.ask(query, top_k=5)

# Shorthand
result = kb(query)

# Filter + ask
result = kb.filter(file_type="pdf").ask(query)

# Count documents
count = kb.count()

# Clear database
kb.clear()

Full API docs: API.md

Version History

  • v0.1.5 (Latest) - Fixed chunking bugs, added minimum chunk length filter
  • v0.1.4 - Added query expansion and reranking
  • v0.1.3 - Fixed schema mismatches for mixed sources
  • v0.1.2 - Changed default embedding to all-mpnet-base-v2 (prevents memory crashes)
  • v0.1.1 - Renamed module from ragi to piragi
  • v0.1.0 - Initial release with auto-updates, citations, universal format support

Installation

pip install piragi

# Optional: Install Ollama for local LLM
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

Design Philosophy

piragi was built on three principles:

  1. Zero Configuration: Works out of the box with free local models
  2. Universal Support: Any document format, any LLM/embedding provider
  3. Always Fresh: Auto-updates keep your knowledge base current

You shouldn’t need a PhD in vector search to ask questions about your docs.

When to Use piragi

piragi is perfect for:

  • Internal documentation search (company wikis, API docs)
  • Code Q&A (ask questions about codebases)
  • Research assistants (query papers, articles, reports)
  • Customer support (search knowledge bases)

It’s not for:

  • Web-scale search (millions of documents)
  • Real-time streaming updates
  • Complex multi-modal reasoning

For those use cases, you need LangChain, LlamaIndex, or custom infrastructure.

But if you just want to ask questions about your docs? piragi is the fastest way from zero to working RAG.

Conclusion

RAG doesn’t have to be complex. You don’t need distributed databases, custom chunking algorithms, or manual update scripts.

You just need:

kb = Ragi("./docs")
answer = kb("What is this?")

Three lines. Free. Local. With citations.

Try it: github.com/hemanth/ragi

The best RAG interface yet.

#python #rag #ai #llm #embeddings #vector-search #opensource

About Hemanth HM

Hemanth HM is a Sr. Staff Engineer at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.