DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that appears as a standard OpenAI API server to clients like Open WebUI. Behind the scenes, it:

  1. Processes user queries through a RAG system
  2. Retrieves relevant context from a knowledge base
  3. Passes the enriched context to GLM-4.7-Flash for response generation
  4. Optionally uses tools like website_downloader for enhanced capabilities

Users interact with what looks like a normal chat experience, while retrieval and tool calls happen transparently on the server.

Features

  • OpenAI-Compatible API: Works with any OpenAI client (Open WebUI, custom apps, etc.)
  • RAG Integration: Automatic context retrieval for enhanced responses
  • Document Management: Upload and manage documents in the knowledge base
  • Tool Support: Built-in tools like website_downloader for extended capabilities
  • Streaming Support: Real-time streaming responses
  • Easy Configuration: Settings via environment variables or a .env file

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env and set ZAI_API_KEY (or OPENROUTER_API_KEY)

3. Run the Server

python main.py

By default the server listens on http://0.0.0.0:8000 (see HOST and PORT under Configuration).

4. Use with Open WebUI

  1. Open Open WebUI settings
  2. Add a new OpenAI-compatible connection
  3. Set the base URL to http://your-server:8000/v1
  4. Leave the API key empty or use any value (not validated)
  5. Select the "DocRAG-GLM-4.7" model

API Endpoints

OpenAI-Compatible Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/models/{model_id} | GET | Get model information |
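OpenAI clients such as Open WebUI call /v1/models to populate their model picker, so the payload has to follow the OpenAI "list models" shape. The sketch below shows that shape, assuming the display name comes from MODEL_NAME; the helper name build_models_response and the owned_by value are hypothetical, not taken from main.py.

```python
import time


def build_models_response(model_name: str = "DocRAG-GLM-4.7") -> dict:
    """Build an OpenAI-style "list models" payload advertising one model."""
    return {
        "object": "list",
        "data": [
            {
                "id": model_name,            # the name clients select, e.g. in Open WebUI
                "object": "model",
                "created": int(time.time()),  # Unix timestamp
                "owned_by": "docrag",         # hypothetical owner label
            }
        ],
    }


if __name__ == "__main__":
    print(build_models_response()["data"][0]["id"])
```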

Document Management Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/documents | GET | List documents in knowledge base |
| /v1/documents/upload | POST | Upload a document |
| /v1/documents/url | POST | Add document from URL |
| /v1/documents/{doc_id} | DELETE | Delete a document |

Health & Status

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| / | GET | API information |

Usage Examples

Chat Completion

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'

Upload Document

curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"

Add Document from URL

curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'

Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True
)

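for chunk in response:
    # Skip chunks with an empty choices list (e.g. usage-only chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()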
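With stream=True, the client above consumes OpenAI-style server-sent events: each event is a "data: <json>" line holding a chat.completion.chunk object, followed by a blank line, and the stream ends with "data: [DONE]". The sketch below shows one way a server could frame generated text in that format; the function names are hypothetical and the exact framing in main.py is not shown in this README.

```python
import json
import time
import uuid
from typing import Iterable, Iterator, Optional


def _sse_event(completion_id: str, created: int, model: str,
               delta: dict, finish_reason: Optional[str]) -> str:
    """Serialize one chat.completion.chunk as a server-sent event."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    # An SSE event is a "data:" line terminated by a blank line
    return f"data: {json.dumps(payload)}\n\n"


def stream_sse(pieces: Iterable[str], model: str = "DocRAG-GLM-4.7") -> Iterator[str]:
    """Yield SSE frames for a sequence of text pieces produced by the LLM."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())
    # First chunk announces the assistant role
    yield _sse_event(completion_id, created, model, {"role": "assistant"}, None)
    for piece in pieces:
        yield _sse_event(completion_id, created, model, {"content": piece}, None)
    # Final chunk has an empty delta and the finish reason
    yield _sse_event(completion_id, created, model, {}, "stop")
    yield "data: [DONE]\n\n"
```

In a FastAPI app, a generator like this is typically wrapped in a StreamingResponse with media_type="text/event-stream".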

Configuration

Configure via environment variables or .env file:

| Variable | Default | Description |
|---|---|---|
| HOST | 0.0.0.0 | Server host |
| PORT | 8000 | Server port |
| DEBUG | false | Enable debug mode |
| MODEL_NAME | DocRAG-GLM-4.7 | Display model name |
| UPSTREAM_MODEL | glm-4.7 | Upstream model to use |
| ZAI_API_KEY / OPENROUTER_API_KEY | (required) | API key for upstream LLM (OpenRouter) |
| EMBEDDING_MODEL | text-embedding-3-small | Embedding model |
| VECTOR_STORE_PATH | ./data/vectors | Vector store location |
| DOCUMENTS_PATH | ./data/documents | Document storage |
| CHUNK_SIZE | 1000 | Document chunk size |
| CHUNK_OVERLAP | 200 | Chunk overlap |
| TOP_K_RESULTS | 5 | Number of context results |
| ENABLE_TOOLS | true | Enable tool support |
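The variables above can be read into a typed settings object with the documented defaults. This is a sketch only, assuming plain os.environ lookups; the Settings class and load_settings helper are hypothetical, and the rule that ZAI_API_KEY wins when both keys are set is an assumption. Loading the .env file itself (for example with python-dotenv's load_dotenv()) would happen before this runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    model_name: str
    upstream_model: str
    api_key: Optional[str]
    embedding_model: str
    vector_store_path: str
    documents_path: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), applying the table's defaults."""
    env = os.environ if env is None else env

    def flag(name: str, default: str) -> bool:
        return env.get(name, default).strip().lower() in TRUTHY

    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=flag("DEBUG", "false"),
        model_name=env.get("MODEL_NAME", "DocRAG-GLM-4.7"),
        upstream_model=env.get("UPSTREAM_MODEL", "glm-4.7"),
        # Assumption: ZAI_API_KEY takes precedence over OPENROUTER_API_KEY
        api_key=env.get("ZAI_API_KEY") or env.get("OPENROUTER_API_KEY"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        vector_store_path=env.get("VECTOR_STORE_PATH", "./data/vectors"),
        documents_path=env.get("DOCUMENTS_PATH", "./data/documents"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        enable_tools=flag("ENABLE_TOOLS", "true"),
    )
```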

Project Structure

docrag/
├── main.py                    # FastAPI application entry point
├── rag/
│   ├── __init__.py           # RAG system main class
│   ├── document_processor.py # Document parsing and chunking
│   ├── vector_store.py       # Vector storage and search
│   └── retriever.py          # Context retrieval logic
├── tools/
│   └── __init__.py           # Tool management (website_downloader, etc.)
├── website_downloader.py     # CLI website downloader
├── website_downloader_tool.py # Tool wrapper for GLM-4.7-Flash
├── requirements.txt          # Python dependencies
├── .env.example              # Configuration template
└── README.md                 # This file
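rag/document_processor.py splits documents into chunks controlled by CHUNK_SIZE and CHUNK_OVERLAP. A minimal sketch of overlapping fixed-size windows follows, assuming sizes are counted in characters (the project may count tokens instead); chunk_text is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800 and 1600; the overlap keeps a sentence that straddles a boundary intact in at least one chunk.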

How It Works

Request Flow

  1. User sends message → OpenAI-compatible endpoint receives request
  2. RAG Retrieval → Query is processed and relevant context is retrieved
  3. Context Enhancement → Retrieved context is added to the prompt
  4. LLM Generation → GLM-4.7-Flash generates a response from the enriched prompt
  5. Tool Execution → If the model requests a tool (e.g., website_downloader), the tool runs and its result is passed back to the model before the final answer
  6. Response → User receives response (streaming supported)
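The Context Enhancement step can be pictured as prepending the retrieved chunks to the conversation before it is sent upstream. The sketch below assumes the context goes into a system message with numbered chunks; build_messages and the prompt wording are hypothetical, not the project's actual prompt.

```python
from typing import Dict, List, Optional


def build_messages(
    user_query: str,
    context_chunks: List[str],
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Return an OpenAI-style message list with retrieved context in a system message."""
    if context_chunks:
        numbered = "\n\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
        )
        system = (
            "Use the following context to answer when it is relevant. "
            "If it does not contain the answer, say so.\n\nContext:\n" + numbered
        )
    else:
        # Nothing retrieved: fall back to a plain assistant prompt
        system = "You are a helpful assistant."
    messages: List[Dict[str, str]] = [{"role": "system", "content": system}]
    messages.extend(history or [])  # earlier turns, if any
    messages.append({"role": "user", "content": user_query})
    return messages
```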

RAG Pipeline

User Query
    │
    ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Vector Search   │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
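The Vector Search box ranks stored chunk embeddings by similarity to the query embedding and keeps the TOP_K_RESULTS best. A brute-force sketch with cosine similarity follows, assuming embeddings are plain float lists; the real store in rag/vector_store.py may use a different metric or index, and the function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, chunks: List[Tuple[str, Vector]], k: int = 5) -> List[Tuple[str, float]]:
    """Score every (text, embedding) pair against the query; return the k best, highest first."""
    scored = ((text, cosine_similarity(query, vec)) for text, vec in chunks)
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

A linear scan is fine for small knowledge bases; at larger scale an approximate nearest-neighbour index (such as the ChromaDB option mentioned under Extending) avoids scoring every chunk.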

Supported Document Formats

  • Text: .txt, .md, .rst, .log
  • Documents: .pdf, .docx
  • Web: .html, .htm
  • Data: .json, .yaml, .yml, .xml, .toml, .csv, .tsv
  • Code: .py, .js, .ts, .java, .cpp, .c, .go, .rs, .rb, .php, etc.
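Routing an upload to the right parser usually starts from the file extension. The sketch below maps the extensions listed above to their categories; because the README's code list ends with "etc.", the mapping covers only the extensions named here, and classify_file is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the list above; the code list there is not exhaustive
FORMAT_CATEGORIES = {
    "text": {".txt", ".md", ".rst", ".log"},
    "document": {".pdf", ".docx"},
    "web": {".html", ".htm"},
    "data": {".json", ".yaml", ".yml", ".xml", ".toml", ".csv", ".tsv"},
    "code": {".py", ".js", ".ts", ".java", ".cpp", ".c", ".go", ".rs", ".rb", ".php"},
}


def classify_file(filename: str) -> Optional[str]:
    """Return the format category for filename, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    for category, suffixes in FORMAT_CATEGORIES.items():
        if suffix in suffixes:
            return category
    return None
```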

Extending

Adding New Tools

# In tools/__init__.py

def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}

# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
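The registration call above implies a registry that stores each function next to its JSON schema, hands the schemas to the LLM, and dispatches tool calls by name. A minimal sketch of such a registry follows; only register_tool(name=, function=, schema=) comes from the example above, while the ToolManager class body, get_schemas and execute are hypothetical, and arguments are assumed to arrive as a JSON string as in OpenAI tool calls.

```python
import json
from typing import Any, Callable, Dict, List


class ToolManager:
    """Hypothetical registry mapping tool names to callables and their schemas."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._schemas: Dict[str, Dict[str, Any]] = {}

    def register_tool(self, name: str, function: Callable[..., Any], schema: Dict[str, Any]) -> None:
        self._functions[name] = function
        self._schemas[name] = schema

    def get_schemas(self) -> List[Dict[str, Any]]:
        """Schemas in the "tools" format expected by OpenAI-compatible chat APIs."""
        return list(self._schemas.values())

    def execute(self, name: str, arguments: str) -> Any:
        """Run a tool call; errors are returned as data so the LLM can see them."""
        if name not in self._functions:
            return {"error": f"unknown tool: {name}"}
        try:
            kwargs = json.loads(arguments) if arguments else {}
        except json.JSONDecodeError as exc:
            return {"error": f"invalid arguments: {exc}"}
        if not isinstance(kwargs, dict):
            return {"error": "arguments must be a JSON object"}
        return self._functions[name](**kwargs)
```

Returning errors as dictionaries rather than raising lets the tool-calling loop pass the failure back to the model, which can then correct its call.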

Using Different Vector Stores

The default implementation uses a simple file-based store. To use ChromaDB:

  1. Install: pip install chromadb
  2. Modify rag/vector_store.py to use the ChromaDB client

Development

Running in Development Mode

DEBUG=true python main.py

Running Tests

pip install pytest pytest-asyncio
pytest tests/

License

Private repository - All rights reserved.