Z User 7a6b6f1086 Add local tool selector: keyword parser picks relevant tools, no LLM		2026-03-29 18:44:14 +00:00

_select_tools() parses the user message with keyword matching:
- News keywords → news_aggregate, news_get_top_stories, news_get_reddit
- Finance/stock keywords → finance_get_stock_info/history (extracts ticker)
- Crypto keywords → finance_get_crypto_price (extracts coin name), finance_get_top_cryptos
- Weather keywords → weather_get_current/forecast/air_quality (extracts location)
- Medical keywords → pubmed, fda, disease data, health topics
- Science keywords → science_aggregate_search
- Wikipedia keywords → wikipedia_search
- Always: web_search + web_instant_answer as general fallback
- URL in message → web_get_page_content

Entity extractors:
- _extract_ticker: maps known company names, handles $TICKER format
- _extract_crypto: maps known crypto names to CoinGecko IDs
- _extract_location: preposition-based + known locations (prefers longest match)
- _extract_subject: strips question patterns, leading articles, trailing punctuation

Flow remains: request → select tools → run in parallel → results into system prompt → 1 LLM call

rag	Fix tool call parsing, improve embeddings, and fix async issues	2026-03-29 17:49:32 +00:00
tools	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
.gitignore	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
main.py	Add local tool selector: keyword parser picks relevant tools, no LLM	2026-03-29 18:44:14 +00:00
README.md	Fix tool call parsing, improve embeddings, and fix async issues	2026-03-29 17:49:32 +00:00
requirements.txt	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
tools.md	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
website_downloader_tool.py	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
website_downloader.py	Implement tool calling loop for LLM	2026-03-29 16:07:56 +00:00
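
The latest commit message describes a keyword-based tool selector that runs without an LLM call. The sketch below shows roughly how such a selector could look; it is not the code in main.py. The tool names are taken from the commit message, while the keyword lists, the `select_tools` and `extract_ticker` names, and the regular expressions are assumptions made for illustration.

```python
import re
from typing import Optional

# Tool names come from the commit message; the keyword tuples are
# illustrative assumptions, not the repository's actual lists.
KEYWORD_TOOLS = [
    (("news", "headline"), ["news_aggregate", "news_get_top_stories", "news_get_reddit"]),
    (("stock", "share price"), ["finance_get_stock_info"]),
    (("weather", "forecast"), ["weather_get_current"]),
    (("wikipedia",), ["wikipedia_search"]),
]

# General-purpose tools that are always included as a fallback.
ALWAYS = ["web_search", "web_instant_answer"]

URL_RE = re.compile(r"https?://\S+")
TICKER_RE = re.compile(r"\$([A-Z]{1,5})\b")


def select_tools(message: str) -> list:
    """Pick tool names by plain substring matching on the lowercased message."""
    text = message.lower()
    selected = []
    for keywords, tools in KEYWORD_TOOLS:
        if any(keyword in text for keyword in keywords):
            selected.extend(tools)
    if URL_RE.search(message):
        selected.append("web_get_page_content")
    selected.extend(ALWAYS)
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(selected))


def extract_ticker(message: str) -> Optional[str]:
    """Return the symbol from a $TICKER mention, or None if there is none."""
    match = TICKER_RE.search(message)
    return match.group(1) if match else None
```

Substring matching is cheap but imprecise ("news" also matches "newspaper"), so a real selector would likely need word-boundary checks or a curated keyword list.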

README.md

DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that appears as a standard OpenAI API server to clients like Open WebUI. Behind the scenes, it:

Processes user queries through a RAG system
Retrieves relevant context from a knowledge base
Passes the enriched context to GLM-4.7-Flash for response generation
Optionally uses tools like website_downloader for enhanced capabilities

Users see an ordinary chat interface; retrieval, context injection, and any tool calls happen server-side before the model responds.

Features

OpenAI-Compatible API: Works with any OpenAI client (Open WebUI, custom apps, etc.)
RAG Integration: Automatic context retrieval for enhanced responses
Document Management: Upload and manage documents in the knowledge base
Tool Support: Built-in tools like website_downloader for extended capabilities
Streaming Support: Real-time streaming responses
Easy Configuration: Environment-based configuration

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env and add your ZAI_API_KEY

3. Run the Server

python main.py

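The server binds to 0.0.0.0:8000 by default, so it is reachable at http://localhost:8000 on the same machine (see HOST and PORT under Configuration).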

4. Use with Open WebUI

Open the Open WebUI settings
Add a new OpenAI-compatible connection
Set the base URL to http://your-server:8000/v1
Leave the API key empty or use any value (not validated)
Select the "DocRAG-GLM-4.7" model

API Endpoints

OpenAI-Compatible Endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (streaming supported)
`/v1/models`	GET	List available models
`/v1/models/{model_id}`	GET	Get model information

Document Management Endpoints

Endpoint	Method	Description
`/v1/documents`	GET	List documents in knowledge base
`/v1/documents/upload`	POST	Upload a document
`/v1/documents/url`	POST	Add document from URL
`/v1/documents/{doc_id}`	DELETE	Delete a document

Health & Status

Endpoint	Method	Description
`/health`	GET	Health check
`/`	GET	API information

Usage Examples

Chat Completion

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'

Upload Document

curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"

Add Document from URL

curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
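
For scripts that avoid shelling out to curl, the same request can be built with the Python standard library. This is a minimal sketch of the request shown above; `build_add_url_request` is a hypothetical helper name, and the response format is not documented here, so the example stops short of parsing it.

```python
import json
import urllib.request


def build_add_url_request(base_url: str, doc_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/documents/url."""
    body = json.dumps({"url": doc_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/documents/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_url_request(
    "http://localhost:8000", "https://example.com/article.html"
)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```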

Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True
)

for chunk in response:
    # Some stream chunks carry no choices or an empty delta; skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Configuration

Configure via environment variables or a .env file:

Variable	Default	Description
`HOST`	`0.0.0.0`	Server host
`PORT`	`8000`	Server port
`DEBUG`	`false`	Enable debug mode
`MODEL_NAME`	`DocRAG-GLM-4.7`	Display model name
`UPSTREAM_MODEL`	`glm-4.7`	Upstream model to use
`ZAI_API_KEY` / `OPENROUTER_API_KEY`	(required)	API key for upstream LLM (OpenRouter)
`EMBEDDING_MODEL`	`text-embedding-3-small`	Embedding model
`VECTOR_STORE_PATH`	`./data/vectors`	Vector store location
`DOCUMENTS_PATH`	`./data/documents`	Document storage
`CHUNK_SIZE`	`1000`	Document chunk size
`CHUNK_OVERLAP`	`200`	Chunk overlap
`TOP_K_RESULTS`	`5`	Number of context results
`ENABLE_TOOLS`	`true`	Enable tool support
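
To illustrate how these variables and defaults might be read, here is a minimal sketch using only the standard library. The `Settings` class, `load_settings`, and the overlap check are assumptions for illustration; the server's actual configuration code may differ, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=_as_bool(env.get("DEBUG", "false")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        enable_tools=_as_bool(env.get("ENABLE_TOOLS", "true")),
    )
    # An overlap as large as the chunk itself would never advance through the text.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the process environment.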

Project Structure

docrag/
├── main.py                    # FastAPI application entry point
├── rag/
│   ├── __init__.py           # RAG system main class
│   ├── document_processor.py # Document parsing and chunking
│   ├── vector_store.py       # Vector storage and search
│   └── retriever.py          # Context retrieval logic
├── tools/
│   └── __init__.py           # Tool management (website_downloader, etc.)
├── website_downloader.py     # CLI website downloader
├── website_downloader_tool.py # Tool wrapper for GLM-4.7-Flash
├── requirements.txt          # Python dependencies
├── .env.example              # Configuration template
└── README.md                 # This file

How It Works

Request Flow

User sends message → OpenAI-compatible endpoint receives request
RAG Retrieval → Query is processed and relevant context is retrieved
Context Enhancement → Retrieved context is added to the prompt
Tool Execution → If needed, tools are invoked (e.g., website_downloader)
LLM Generation → GLM-4.7-Flash generates response with context
Response → User receives response (streaming supported)
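
The context-enhancement and tool steps above both end in the same place: extra text placed in front of the user's message before the model is called. The sketch below shows one way that assembly could work. `build_messages`, the section headings, and the numbering format are assumptions, not the server's actual prompt layout.

```python
def build_messages(user_message, context_chunks, tool_results):
    """Fold retrieved context and tool output into one system message.

    context_chunks: list of text snippets from the knowledge base.
    tool_results:   dict mapping tool name to its (string) output.
    """
    sections = []
    if context_chunks:
        numbered = "\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1)
        )
        sections.append("Context from the knowledge base:\n" + numbered)
    if tool_results:
        lines = "\n".join(f"{name}: {out}" for name, out in tool_results.items())
        sections.append("Tool results:\n" + lines)

    messages = []
    if sections:
        # Only add a system message when there is something to inject.
        messages.append({"role": "system", "content": "\n\n".join(sections)})
    messages.append({"role": "user", "content": user_message})
    return messages
```

The resulting list has the same shape as the `messages` field in the chat completion examples, so it can be forwarded to the upstream model in a single call.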

RAG Pipeline

User Query
    │
    ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Vector Search   │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
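
The vector search stage in the diagram can be pictured as ranking stored chunks by similarity to the query embedding. The following is a minimal pure-Python sketch that assumes cosine similarity and an in-memory list of `(text, vector)` pairs; the actual scoring and storage in rag/vector_store.py are not specified here, and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=5):
    """Return the texts of the k stored chunks most similar to query_vec.

    store is a list of (text, vector) pairs; k mirrors TOP_K_RESULTS.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

A linear scan like this is fine for small knowledge bases; larger ones usually call for an approximate nearest-neighbor index.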

Supported Document Formats

Text: .txt, .md, .rst, .log
Documents: .pdf, .docx
Web: .html, .htm
Data: .json, .yaml, .yml, .xml, .toml, .csv, .tsv
Code: .py, .js, .ts, .java, .cpp, .c, .go, .rs, .rb, .php, etc.
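
Whatever the input format, the extracted text is split into overlapping chunks governed by CHUNK_SIZE and CHUNK_OVERLAP. The sketch below assumes those values are measured in characters and that chunks advance by a fixed step, which may not match rag/document_processor.py exactly; `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters, so each new chunk
    starts chunk_size - overlap characters after the previous one.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600.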

Extending

Adding New Tools

# In tools/__init__.py

def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}

# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
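
For orientation, here is a minimal sketch of what a manager behind `register_tool` could look like. Only the `register_tool(name, function, schema)` call is taken from the example above; the `ToolManager` class, its `schemas` and `execute` methods, and the error format are assumptions and may not match tools/__init__.py.

```python
import json


class ToolManager:
    """Hypothetical registry mapping tool names to callables and schemas."""

    def __init__(self):
        self._functions = {}
        self._schemas = {}

    def register_tool(self, name, function, schema):
        self._functions[name] = function
        self._schemas[name] = schema

    def schemas(self):
        """Schemas in the shape expected by the `tools` field of a chat request."""
        return list(self._schemas.values())

    def execute(self, name, arguments):
        """Run a tool given its name and a JSON-encoded argument string."""
        if name not in self._functions:
            return {"error": f"unknown tool: {name}"}
        try:
            kwargs = json.loads(arguments) if arguments else {}
            return self._functions[name](**kwargs)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or mismatched parameters are returned to the
            # caller as data instead of crashing the request.
            return {"error": str(exc)}
```

Returning errors as dictionaries keeps a bad tool call from aborting the whole chat request; the caller can pass the error text back to the model or log it.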

Using Different Vector Stores

The default implementation uses a simple file-based store. To use ChromaDB:

Install: pip install chromadb
Modify rag/vector_store.py to use the ChromaDB client

Development

Running in Development Mode

DEBUG=true python main.py

Running Tests

pip install pytest pytest-asyncio
pytest tests/