Z User 7a6b6f1086 Add local tool selector: keyword parser picks relevant tools, no LLM
_select_tools() parses the user message with keyword matching:
- News keywords → news_aggregate, news_get_top_stories, news_get_reddit
- Finance/stock keywords → finance_get_stock_info/history (extracts ticker)
- Crypto keywords → finance_get_crypto_price (extracts coin name), finance_get_top_cryptos
- Weather keywords → weather_get_current/forecast/air_quality (extracts location)
- Medical keywords → pubmed, fda, disease data, health topics
- Science keywords → science_aggregate_search
- Wikipedia keywords → wikipedia_search
- Always: web_search + web_instant_answer as general fallback
- URL in message → web_get_page_content
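The routing rules above could be sketched roughly as follows. This is an illustration only: the keyword sets and the `select_tools` name are assumptions, and only the tool names are taken from the list above. The actual `_select_tools()` in main.py may work differently.

```python
import re

# Keyword sets are illustrative assumptions; tool names come from the commit message.
CATEGORY_RULES = [
    ({"news", "headlines"}, ["news_aggregate", "news_get_top_stories", "news_get_reddit"]),
    ({"stock", "shares", "ticker"}, ["finance_get_stock_info"]),
    ({"crypto", "bitcoin", "coin"}, ["finance_get_crypto_price", "finance_get_top_cryptos"]),
    ({"weather", "temperature"}, ["weather_get_current"]),
    ({"wikipedia", "wiki"}, ["wikipedia_search"]),
]
FALLBACK_TOOLS = ["web_search", "web_instant_answer"]
URL_RE = re.compile(r"https?://\S+")


def select_tools(message: str) -> list[str]:
    """Pick tool names by keyword matching, with no LLM involved."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    selected: list[str] = []
    for keywords, tools in CATEGORY_RULES:
        if words & keywords:  # at least one keyword appears in the message
            selected.extend(tools)
    selected.extend(FALLBACK_TOOLS)  # general fallback is always included
    if URL_RE.search(message):
        selected.append("web_get_page_content")
    return selected
```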

Entity extractors:
- _extract_ticker: maps known company names, handles $TICKER format
- _extract_crypto: maps known crypto names to CoinGecko IDs
- _extract_location: preposition-based + known locations (prefers longest match)
- _extract_subject: strips question patterns, leading articles, trailing punctuation
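Two of these extractors might look something like the sketch below. The lookup tables, the preposition list, and the function names without the leading underscore are all assumptions made for illustration; the real tables in main.py are not shown here.

```python
import re
from typing import Optional

# Hypothetical lookup tables (assumptions, not the project's real data).
COMPANY_TICKERS = {"apple": "AAPL", "microsoft": "MSFT", "tesla": "TSLA"}
KNOWN_LOCATIONS = ["york", "new york", "new york city", "paris"]


def extract_ticker(message: str) -> Optional[str]:
    """Return a ticker from $TICKER notation or a known company name."""
    match = re.search(r"\$([A-Za-z]{1,5})\b", message)
    if match:
        return match.group(1).upper()
    lowered = message.lower()
    for name, ticker in COMPANY_TICKERS.items():
        if re.search(rf"\b{re.escape(name)}\b", lowered):
            return ticker
    return None


def extract_location(message: str) -> Optional[str]:
    """Prefer the longest known location, else fall back to a preposition pattern."""
    lowered = message.lower()
    known = [loc for loc in KNOWN_LOCATIONS if re.search(rf"\b{re.escape(loc)}\b", lowered)]
    if known:
        return max(known, key=len)  # "new york city" wins over "new york"
    match = re.search(r"\b(?:in|at|for|near)\s+([A-Za-z][A-Za-z\s]*?)(?:[?.!,]|$)", message)
    return match.group(1).strip() if match else None
```

Preferring the longest match avoids returning a fragment such as "york" when the message names a longer known location.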

Flow remains: request → select tools → run in parallel → results into system prompt → 1 LLM call
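The "run in parallel, then fold results into the system prompt" step could be sketched with `asyncio.gather`. Here `run_tool` is a hypothetical stand-in for a real tool call, and the prompt layout is an assumption.

```python
import asyncio


async def run_tool(name: str, query: str) -> str:
    """Hypothetical stand-in for a real tool invocation."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{name} result for {query!r}"


async def build_system_prompt(query: str, tool_names: list[str]) -> str:
    """Run all selected tools concurrently and merge their output into one prompt."""
    results = await asyncio.gather(
        *(run_tool(name, query) for name in tool_names),
        return_exceptions=True,  # one failing tool should not sink the whole request
    )
    sections = []
    for name, result in zip(tool_names, results):
        if isinstance(result, Exception):
            continue  # skip tools that failed
        sections.append(f"[{name}]\n{result}")
    return "Use the following tool results when answering.\n\n" + "\n\n".join(sections)
```

The resulting string would then be sent as the system message of the single LLM call.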
2026-03-29 18:44:14 +00:00
Name                        Last commit                                                           Date
rag                         Fix tool call parsing, improve embeddings, and fix async issues       2026-03-29 17:49:32 +00:00
tools                       Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00
.gitignore                  Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00
main.py                     Add local tool selector: keyword parser picks relevant tools, no LLM  2026-03-29 18:44:14 +00:00
README.md                   Fix tool call parsing, improve embeddings, and fix async issues       2026-03-29 17:49:32 +00:00
requirements.txt            Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00
tools.md                    Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00
website_downloader_tool.py  Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00
website_downloader.py       Implement tool calling loop for LLM                                   2026-03-29 16:07:56 +00:00

DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that appears as a standard OpenAI API server to clients like Open WebUI. Behind the scenes, it:

  1. Processes user queries through a RAG system
  2. Retrieves relevant context from a knowledge base
  3. Passes the enriched context to GLM-4.7-Flash for response generation
  4. Optionally uses tools like website_downloader for enhanced capabilities

Users see an ordinary chat interface; retrieval, context assembly, and any tool calls run on the server, out of the user's view.

Features

  • OpenAI-Compatible API: Works with any OpenAI client (Open WebUI, custom apps, etc.)
  • RAG Integration: Automatic context retrieval for enhanced responses
  • Document Management: Upload and manage documents in the knowledge base
  • Tool Support: Built-in tools like website_downloader for extended capabilities
  • Streaming Support: Real-time streaming responses
  • Easy Configuration: Environment-based configuration

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env and add your ZAI_API_KEY

3. Run the Server

python main.py

By default the server listens on 0.0.0.0:8000 (all network interfaces). From the same machine, reach it at http://localhost:8000.

4. Use with Open WebUI

  1. Open Open WebUI settings
  2. Add a new OpenAI-compatible connection
  3. Set the base URL to http://your-server:8000/v1
  4. Leave the API key empty or use any value (not validated)
  5. Select the "DocRAG-GLM-4.7" model

API Endpoints

OpenAI-Compatible Endpoints

Endpoint               Method  Description
/v1/chat/completions   POST    Chat completions (streaming supported)
/v1/models             GET     List available models
/v1/models/{model_id}  GET     Get model information

Document Management Endpoints

Endpoint                Method  Description
/v1/documents           GET     List documents in knowledge base
/v1/documents/upload    POST    Upload a document
/v1/documents/url       POST    Add document from URL
/v1/documents/{doc_id}  DELETE  Delete a document

Health & Status

Endpoint  Method  Description
/health   GET     Health check
/         GET     API information

Usage Examples

Chat Completion

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'

Upload Document

curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"

Add Document from URL

curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
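The same request can be issued from Python with only the standard library. This is a sketch: `build_add_url_request` and `add_document_from_url` are hypothetical helper names, the base URL assumes the default port, and treating the response body as JSON is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default PORT


def build_add_url_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /v1/documents/url without sending it."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/documents/url",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_document_from_url(url: str) -> dict:
    """Send the request; assumes the server answers with a JSON body."""
    with urllib.request.urlopen(build_add_url_request(url), timeout=30) as resp:
        return json.load(resp)
```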

Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Configuration

Configure via environment variables or .env file:

Variable                          Default                 Description
HOST                              0.0.0.0                 Server host
PORT                              8000                    Server port
DEBUG                             false                   Enable debug mode
MODEL_NAME                        DocRAG-GLM-4.7          Display model name
UPSTREAM_MODEL                    glm-4.7                 Upstream model to use
ZAI_API_KEY / OPENROUTER_API_KEY  (required)              API key for upstream LLM (OpenRouter)
EMBEDDING_MODEL                   text-embedding-3-small  Embedding model
VECTOR_STORE_PATH                 ./data/vectors          Vector store location
DOCUMENTS_PATH                    ./data/documents        Document storage
CHUNK_SIZE                        1000                    Document chunk size
CHUNK_OVERLAP                     200                     Chunk overlap
TOP_K_RESULTS                     5                       Number of context results
ENABLE_TOOLS                      true                    Enable tool support
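As a rough illustration of how these variables could be read, and how CHUNK_SIZE and CHUNK_OVERLAP interact, consider the sketch below. The `Settings` class, `load_settings`, and `chunk_text` are hypothetical names, and chunking by characters (rather than tokens) is an assumption; the project's own loader and chunker may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the variables above, using the documented defaults."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=_as_bool(env.get("DEBUG", "false")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        enable_tools=_as_bool(env.get("ENABLE_TOOLS", "true")),
    )


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into character windows; each window starts size - overlap after the last."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

With the defaults, each chunk starts 800 characters after the previous one, so neighbouring chunks share 200 characters.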

Project Structure

docrag/
├── main.py                    # FastAPI application entry point
├── rag/
│   ├── __init__.py           # RAG system main class
│   ├── document_processor.py # Document parsing and chunking
│   ├── vector_store.py       # Vector storage and search
│   └── retriever.py          # Context retrieval logic
├── tools/
│   └── __init__.py           # Tool management (website_downloader, etc.)
├── website_downloader.py     # CLI website downloader
├── website_downloader_tool.py # Tool wrapper for GLM-4.7-Flash
├── requirements.txt          # Python dependencies
├── .env.example              # Configuration template
└── README.md                 # This file

How It Works

Request Flow

  1. User sends message → OpenAI-compatible endpoint receives request
  2. RAG Retrieval → Query is processed and relevant context is retrieved
  3. Context Enhancement → Retrieved context is added to the prompt
  4. Tool Execution → If needed, tools are invoked (e.g., website_downloader)
  5. LLM Generation → GLM-4.7-Flash generates response with context
  6. Response → User receives response (streaming supported)
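Step 3 (context enhancement) can be pictured as prepending a system message built from the retrieved chunks. A minimal sketch, assuming OpenAI-style message dicts; `build_messages` and the prompt wording are hypothetical, not the project's actual code.

```python
def build_messages(user_messages: list[dict], context_chunks: list[str]) -> list[dict]:
    """Prepend retrieved context as a system message; pass through when nothing was retrieved."""
    if not context_chunks:
        return list(user_messages)
    # Number the chunks so the model (and a reader of the logs) can tell them apart.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1))
    system = {
        "role": "system",
        "content": "Answer using the context below when it is relevant.\n\n" + context,
    }
    return [system, *user_messages]
```

The returned list is what would be forwarded to the upstream model in step 5.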

RAG Pipeline

User Query
    │
    ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Vector Search   │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
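The Vector Search stage can be illustrated with a brute-force top-k search. This sketch assumes cosine similarity over in-memory embeddings; whether rag/vector_store.py scores results this way is not stated here, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 5):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The default `k=5` mirrors the TOP_K_RESULTS default from the configuration section.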

Supported Document Formats

  • Text: .txt, .md, .rst, .log
  • Documents: .pdf, .docx
  • Web: .html, .htm
  • Data: .json, .yaml, .yml, .xml, .toml, .csv, .tsv
  • Code: .py, .js, .ts, .java, .cpp, .c, .go, .rs, .rb, .php, etc.
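An upload handler could use a lookup like the one below to decide how to parse a file. This is a sketch only: the `categorize` helper is hypothetical, and the code category contains just the extensions spelled out above.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the list above; categories beyond that are not assumed.
FORMAT_CATEGORIES = {
    "text": {".txt", ".md", ".rst", ".log"},
    "document": {".pdf", ".docx"},
    "web": {".html", ".htm"},
    "data": {".json", ".yaml", ".yml", ".xml", ".toml", ".csv", ".tsv"},
    "code": {".py", ".js", ".ts", ".java", ".cpp", ".c", ".go", ".rs", ".rb", ".php"},
}


def categorize(filename: str) -> Optional[str]:
    """Map a filename to its format category, or None if the extension is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, suffixes in FORMAT_CATEGORIES.items():
        if suffix in suffixes:
            return category
    return None
```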

Extending

Adding New Tools

# In tools/__init__.py

def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}

# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
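For orientation, a manager that accepts the `register_tool(name, function, schema)` call shown above might be structured like this. Only that call signature comes from the example; the `ToolManager` class body, `schemas`, and `call` are assumptions and may not match tools/__init__.py.

```python
import json
from typing import Any, Callable


class ToolManager:
    """Hypothetical registry that maps tool names to callables and schemas."""

    def __init__(self) -> None:
        self._functions: dict[str, Callable[..., Any]] = {}
        self._schemas: dict[str, dict] = {}

    def register_tool(self, name: str, function: Callable[..., Any], schema: dict) -> None:
        self._functions[name] = function
        self._schemas[name] = schema

    def schemas(self) -> list[dict]:
        """Schemas in the shape an OpenAI-style `tools` parameter expects."""
        return list(self._schemas.values())

    def call(self, name: str, arguments_json: str) -> Any:
        """Dispatch a tool call whose arguments arrive as a JSON string."""
        if name not in self._functions:
            raise KeyError(f"unknown tool: {name}")
        kwargs = json.loads(arguments_json or "{}")
        return self._functions[name](**kwargs)
```

Keeping the schema next to the callable means the list sent to the model and the dispatch table cannot drift apart.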

Using Different Vector Stores

The default implementation uses a simple file-based store. To use ChromaDB:

  1. Install: pip install chromadb
  2. Modify rag/vector_store.py to use ChromaDB client
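A ChromaDB-backed store might be sketched as below. The `ChromaVectorStore` class, its method names, and the collection name are hypothetical, since the interface the rest of the RAG code expects from rag/vector_store.py is not documented here. The ChromaDB calls used are `PersistentClient`, `get_or_create_collection`, `Collection.add`, and `Collection.query`.

```python
class ChromaVectorStore:
    """Hypothetical ChromaDB-backed store; adapt method names to the existing interface."""

    def __init__(self, path: str = "./data/vectors", collection: str = "docrag") -> None:
        import chromadb  # imported lazily so this module loads without chromadb installed

        self._client = chromadb.PersistentClient(path=path)
        self._collection = self._client.get_or_create_collection(name=collection)

    def add(self, ids: list[str], embeddings: list[list[float]], documents: list[str]) -> None:
        self._collection.add(ids=ids, embeddings=embeddings, documents=documents)

    def search(self, query_embedding: list[float], k: int = 5) -> list[tuple[str, str, float]]:
        result = self._collection.query(query_embeddings=[query_embedding], n_results=k)
        return flatten_query_result(result)


def flatten_query_result(result: dict) -> list[tuple[str, str, float]]:
    """Chroma returns one inner list per query; a single query was sent, so take index 0."""
    ids = result["ids"][0]
    documents = result["documents"][0]
    distances = result["distances"][0]
    return list(zip(ids, documents, distances))
```

Note that ChromaDB reports distances, where smaller means closer, so any code that expects similarity scores would need to convert them.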

Development

Running in Development Mode

DEBUG=true python main.py

Running Tests

pip install pytest pytest-asyncio
pytest tests/

License

Private repository - All rights reserved.