DocRAG - OpenAI-Compatible RAG Server
A custom RAG (Retrieval-Augmented Generation) system that appears as a standard OpenAI API server to clients like Open WebUI. Behind the scenes, it:
- Processes user queries through a RAG system
- Retrieves relevant context from a knowledge base
- Passes the enriched context to GLM-4.7-Flash for response generation
- Optionally uses tools like website_downloader for enhanced capabilities
Users interact with what appears to be a normal chat experience, while sophisticated RAG operations happen transparently in the background.
Features
- OpenAI-Compatible API: Works with any OpenAI client (Open WebUI, custom apps, etc.)
- RAG Integration: Automatic context retrieval for enhanced responses
- Document Management: Upload and manage documents in the knowledge base
- Tool Support: Built-in tools like website_downloader for extended capabilities
- Streaming Support: Real-time streaming responses
- Easy Configuration: Environment-based configuration
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Configure Environment
cp .env.example .env
# Edit .env and add your ZAI_API_KEY
3. Run the Server
python main.py
The server will start on http://0.0.0.0:8000
4. Use with Open WebUI
- Open the Open WebUI settings
- Add a new OpenAI-compatible connection
- Set the base URL to http://your-server:8000/v1
- Leave the API key empty or use any value (it is not validated)
- Select the "DocRAG-GLM-4.7" model
API Endpoints
OpenAI-Compatible Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/models/{model_id}` | GET | Get model information |
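The chat completions endpoint is listed as supporting streaming. OpenAI-compatible clients expect streamed responses as server-sent events, each carrying a `chat.completion.chunk` JSON object and ending with a `[DONE]` sentinel. The sketch below shows that wire format only; the `sse_chunks` helper is hypothetical and is not taken from `main.py`.

```python
import json
import time
import uuid
from typing import Iterable, Iterator, Optional


def sse_chunks(tokens: Iterable[str], model: str = "DocRAG-GLM-4.7") -> Iterator[str]:
    """Yield OpenAI-style streaming events for a sequence of text tokens.

    Hypothetical helper: illustrates the event format, not DocRAG's own code.
    """
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    def event(delta: dict, finish_reason: Optional[str] = None) -> str:
        payload = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        # Each server-sent event is a "data:" line followed by a blank line.
        return f"data: {json.dumps(payload)}\n\n"

    yield event({"role": "assistant"})        # first chunk announces the role
    for token in tokens:
        yield event({"content": token})       # one chunk per piece of text
    yield event({}, finish_reason="stop")     # final chunk closes the choice
    yield "data: [DONE]\n\n"                  # sentinel that ends the stream
```

A generator like this could be handed to a streaming HTTP response with the `text/event-stream` media type.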
Document Management Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v1/documents` | GET | List documents in knowledge base |
| `/v1/documents/upload` | POST | Upload a document |
| `/v1/documents/url` | POST | Add document from URL |
| `/v1/documents/{doc_id}` | DELETE | Delete a document |
Health & Status
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/` | GET | API information |
Usage Examples
Chat Completion
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
Upload Document
curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"
Add Document from URL
curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
Python Client
from openai import OpenAI

# Point the official OpenAI client at the DocRAG server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True,
)

# Print text as it arrives; skip chunks that carry no content.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Configuration
Configure via environment variables or .env file:
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `DEBUG` | `false` | Enable debug mode |
| `MODEL_NAME` | `DocRAG-GLM-4.7` | Display model name |
| `UPSTREAM_MODEL` | `glm-4.7` | Upstream model to use |
| `ZAI_API_KEY` / `OPENROUTER_API_KEY` | (required) | API key for upstream LLM (OpenRouter) |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `VECTOR_STORE_PATH` | `./data/vectors` | Vector store location |
| `DOCUMENTS_PATH` | `./data/documents` | Document storage |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `CHUNK_OVERLAP` | `200` | Chunk overlap |
| `TOP_K_RESULTS` | `5` | Number of context results |
| `ENABLE_TOOLS` | `true` | Enable tool support |
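As an illustration of environment-based configuration, the sketch below reads a subset of the variables above with the same defaults. The `Settings` class and `load_settings` function are hypothetical names, and the overlap check is an added assumption; the actual loading code in `main.py` may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings ("true", "1", "yes", "on")."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the documented defaults."""
    settings = Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=_as_bool(env.get("DEBUG", "false")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        enable_tools=_as_bool(env.get("ENABLE_TOOLS", "true")),
    )
    # Assumed sanity check: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real process environment.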
Project Structure
docrag/
├── main.py # FastAPI application entry point
├── rag/
│ ├── __init__.py # RAG system main class
│ ├── document_processor.py # Document parsing and chunking
│ ├── vector_store.py # Vector storage and search
│ └── retriever.py # Context retrieval logic
├── tools/
│ └── __init__.py # Tool management (website_downloader, etc.)
├── website_downloader.py # CLI website downloader
├── website_downloader_tool.py # Tool wrapper for GLM-4.7-Flash
├── requirements.txt # Python dependencies
├── .env.example # Configuration template
└── README.md # This file
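The tree lists `rag/document_processor.py` as responsible for chunking, and the configuration exposes `CHUNK_SIZE` and `CHUNK_OVERLAP`. The following is a minimal sketch of overlapping chunking, assuming sizes are counted in characters (the README does not say whether the unit is characters or tokens). `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing duplicate.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the default values, each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.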
How It Works
Request Flow
- User sends message → OpenAI-compatible endpoint receives request
- RAG Retrieval → Query is processed and relevant context is retrieved
- Context Enhancement → Retrieved context is added to the prompt
- Tool Execution → If needed, tools are invoked (e.g., website_downloader)
- LLM Generation → GLM-4.7-Flash generates response with context
- Response → User receives response (streaming supported)
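The context enhancement step in the flow above can be sketched as follows. Placing the retrieved chunks in a leading system message is an assumption, as are the `build_messages` name and the instruction wording; the README only states that the context is added to the prompt.

```python
def build_messages(user_messages: list[dict], context_chunks: list[str]) -> list[dict]:
    """Prepend retrieved context to an OpenAI-style message list."""
    if not context_chunks:
        # Nothing retrieved: forward the conversation unchanged.
        return list(user_messages)

    # Number the chunks so the model can refer to individual sources.
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    system_message = {
        "role": "system",
        "content": "Use the following context to answer the user.\n\n" + numbered,
    }
    return [system_message, *user_messages]
```

The resulting list would then be sent to the upstream model in place of the original messages.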
RAG Pipeline
User Query
│
▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Vector Search │ ← Knowledge Base
└────────┬────────┘
│
▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
│
▼
┌─────────────────┐
│ GLM-4.7-Flash │
└────────┬────────┘
│
▼
Response
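The vector search stage of the pipeline can be illustrated with a brute-force cosine similarity ranking. This is a sketch under assumptions: the README does not state which similarity metric `rag/vector_store.py` uses, and `cosine_similarity` and `top_k` are hypothetical names. The default `k` mirrors `TOP_K_RESULTS`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    indexed: Sequence[tuple[str, Sequence[float]]],
    k: int = 5,
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is adequate for small knowledge bases; larger ones usually call for an approximate nearest-neighbor index.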
Supported Document Formats
- Text: `.txt`, `.md`, `.rst`, `.log`
- Documents: `.pdf`, `.docx`
- Web: `.html`, `.htm`
- Data: `.json`, `.yaml`, `.yml`, `.xml`, `.toml`, `.csv`, `.tsv`
- Code: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.c`, `.go`, `.rs`, `.rb`, `.php`, etc.
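One way to route an uploaded file to a parser is to classify it by extension. The sketch below encodes only the extensions named in the list above (the code category ends with "etc.", so the real set is presumably larger). `SUPPORTED_FORMATS` and `classify` are hypothetical names, not part of the project.

```python
from pathlib import Path
from typing import Optional

# Only the extensions explicitly listed in this README.
SUPPORTED_FORMATS = {
    "text": {".txt", ".md", ".rst", ".log"},
    "document": {".pdf", ".docx"},
    "web": {".html", ".htm"},
    "data": {".json", ".yaml", ".yml", ".xml", ".toml", ".csv", ".tsv"},
    "code": {".py", ".js", ".ts", ".java", ".cpp", ".c", ".go", ".rs", ".rb", ".php"},
}


def classify(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on last suffix
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```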
Extending
Adding New Tools
# In tools/__init__.py
def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}


# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
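The registration call above relies on a `tool_manager` object whose implementation is not shown in this README. To make the contract concrete, here is a minimal sketch of a registry that accepts the same `register_tool(name, function, schema)` call. The `ToolManager` class and its `schemas` and `call` methods are assumptions for illustration, not the code in `tools/__init__.py`.

```python
from typing import Any, Callable


class ToolManager:
    """Hypothetical registry mapping tool names to callables and schemas."""

    def __init__(self) -> None:
        self._functions: dict[str, Callable[..., Any]] = {}
        self._schemas: dict[str, dict] = {}

    def register_tool(self, name: str, function: Callable[..., Any], schema: dict) -> None:
        if name in self._functions:
            raise ValueError(f"tool already registered: {name}")
        self._functions[name] = function
        self._schemas[name] = schema

    def schemas(self) -> list[dict]:
        """Schemas in registration order, e.g. for the `tools` field of a request."""
        return list(self._schemas.values())

    def call(self, name: str, arguments: dict) -> Any:
        """Invoke a registered tool with keyword arguments parsed from the model."""
        if name not in self._functions:
            raise KeyError(f"unknown tool: {name}")
        return self._functions[name](**arguments)
```

Keeping the schema next to the callable means the list advertised to the model and the set of tools that can actually run cannot drift apart.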
Using Different Vector Stores
The default implementation uses a simple file-based store. To use ChromaDB:
- Install: `pip install chromadb`
- Modify `rag/vector_store.py` to use the ChromaDB client
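When swapping backends, it helps to pin down the small interface a replacement has to provide. The sketch below is a hypothetical file-based store with `add`, `delete`, and `search`; it assumes a single JSON file and cosine similarity, and it is not the actual contents of `rag/vector_store.py`. A ChromaDB-backed class could expose the same three methods.

```python
import json
import math
from pathlib import Path
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FileVectorStore:
    """Hypothetical store that keeps all records in one JSON file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load existing records so the store survives restarts.
        self.records = json.loads(self.path.read_text()) if self.path.exists() else []

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.records))

    def add(self, doc_id: str, text: str, embedding: Sequence[float]) -> None:
        self.records.append({"id": doc_id, "text": text, "embedding": list(embedding)})
        self._save()

    def delete(self, doc_id: str) -> int:
        """Remove every record for doc_id and return how many were removed."""
        before = len(self.records)
        self.records = [r for r in self.records if r["id"] != doc_id]
        self._save()
        return before - len(self.records)

    def search(self, query_embedding: Sequence[float], k: int = 5) -> list[dict]:
        """Return the k most similar records as dicts with id, text, and score."""
        hits = [
            {"id": r["id"], "text": r["text"], "score": _cosine(query_embedding, r["embedding"])}
            for r in self.records
        ]
        hits.sort(key=lambda hit: hit["score"], reverse=True)
        return hits[:k]
```

Rewriting the whole file on every change is simple but scales poorly, which is one reason to move to a dedicated vector database for larger knowledge bases.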
Development
Running in Development Mode
DEBUG=true python main.py
Running Tests
pip install pytest pytest-asyncio
pytest tests/
License
Private repository - All rights reserved.