Z User eabdadfb62 Implement full DocRAG server with OpenAI-compatible API
Features:
- FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models)
- RAG system with document processing and vector storage
- Support for multiple document formats (PDF, DOCX, HTML, text, code)
- Streaming response support
- Tool integration with website_downloader
- Document management API endpoints
- GLM-4.7-Flash integration via z-ai-web-dev-sdk
- Works transparently with Open WebUI and other OpenAI clients

Components:
- main.py: FastAPI application with OpenAI-compatible API
- rag/: RAG system (document processor, vector store, retriever)
- tools/: Tool manager with website_downloader integration
- .env.example: Configuration template
2026-03-29 00:57:37 +00:00


# DocRAG - OpenAI-Compatible RAG Server
A custom RAG (Retrieval-Augmented Generation) system that **appears as a standard OpenAI API server** to clients like Open WebUI. Behind the scenes, it:
1. Processes user queries through a RAG system
2. Retrieves relevant context from a knowledge base
3. Passes the enriched context to GLM-4.7-Flash for response generation
4. Optionally uses tools like website_downloader for enhanced capabilities
Users get what looks like a normal chat experience, while retrieval, context injection, and any tool calls happen transparently in the background.
## Features
- **OpenAI-Compatible API**: Works with any OpenAI client (Open WebUI, custom apps, etc.)
- **RAG Integration**: Automatic context retrieval for enhanced responses
- **Document Management**: Upload and manage documents in the knowledge base
- **Tool Support**: Built-in tools like website_downloader for extended capabilities
- **Streaming Support**: Real-time streaming responses
- **Easy Configuration**: Settings are read from environment variables or a `.env` file
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment
```bash
cp .env.example .env
# Edit .env and add your ZAI_API_KEY
```
### 3. Run the Server
```bash
python main.py
```
By default the server listens on `0.0.0.0:8000` (all network interfaces); from the same machine, reach it at `http://localhost:8000`.
### 4. Use with Open WebUI
1. Open the Open WebUI settings
2. Add a new OpenAI-compatible connection
3. Set the base URL to `http://your-server:8000/v1`
4. Leave the API key empty or use any value (not validated)
5. Select the "DocRAG-GLM-4.7" model
## API Endpoints
### OpenAI-Compatible Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/models/{model_id}` | GET | Get model information |
### Document Management Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/documents` | GET | List documents in knowledge base |
| `/v1/documents/upload` | POST | Upload a document |
| `/v1/documents/url` | POST | Add document from URL |
| `/v1/documents/{doc_id}` | DELETE | Delete a document |
### Health & Status
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/` | GET | API information |
## Usage Examples
### Chat Completion
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
```
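If `curl` is not available, the same non-streaming request can be sent with only the Python standard library. This is a minimal sketch, assuming the server returns the standard OpenAI `chat.completion` body (reply text at `choices[0].message.content`); `build_chat_request` and `extract_reply` are hypothetical helper names, not part of DocRAG:

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, model: str = "DocRAG-GLM-4.7") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat.completion body (assumed shape)."""
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# req = build_chat_request("http://localhost:8000", "What is machine learning?")
# with urllib.request.urlopen(req) as resp:
#     print(extract_reply(json.load(resp)))
```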
### Upload Document
```bash
curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"
```
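The `-F "file=@..."` flag sends a `multipart/form-data` body with the file in a field named `file`. A standard-library sketch that builds the same kind of request is below; `build_upload_request` is a hypothetical helper, and sending the part as `application/octet-stream` is an assumption (this README does not say whether the server inspects the part's content type or relies on the filename extension):

```python
import urllib.request
import uuid


def build_upload_request(base_url: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying `content` in a field named "file"."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to appear inside the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")  # closing delimiter ends the multipart body
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/documents/upload",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires a running server):
# from pathlib import Path
# req = build_upload_request("http://localhost:8000", "document.pdf", Path("document.pdf").read_bytes())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```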
### Add Document from URL
```bash
curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
```
### Python Client
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # Not validated by DocRAG, but the client requires a value
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True,
)

# Print tokens as they arrive; some chunks carry no choices or an empty delta
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
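The `openai` client decodes the stream for you. Clients that read the raw HTTP response have to do it themselves; the sketch below assumes DocRAG follows OpenAI's server-sent-event framing (`data: {json}` lines, terminated by `data: [DONE]`). `iter_stream_text` is a hypothetical helper:

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content fragments from OpenAI-style SSE lines (assumed framing)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only or empty deltas carry no text
                yield content
```

Joining the yielded fragments reproduces the full reply text.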
## Configuration
Configure the server via environment variables or a `.env` file:
| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `DEBUG` | `false` | Enable debug mode |
| `MODEL_NAME` | `DocRAG-GLM-4.7` | Display model name |
| `UPSTREAM_MODEL` | `glm-4.7` | Upstream model to use |
| `ZAI_API_KEY` | (required) | API key for ZAI SDK |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `VECTOR_STORE_PATH` | `./data/vectors` | Vector store location |
| `DOCUMENTS_PATH` | `./data/documents` | Document storage |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `CHUNK_OVERLAP` | `200` | Chunk overlap |
| `TOP_K_RESULTS` | `5` | Number of context results |
| `ENABLE_TOOLS` | `true` | Enable tool support |
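The table maps naturally onto a typed settings object. The sketch below reads the variables with the defaults listed above; the `Settings` class, `from_env`, the truthy-string parsing, and the overlap check are assumptions for illustration, not the code in `main.py`. Parsing the `.env` file itself is left out (a loader library would normally populate the environment first):

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true", "1", "yes"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    model_name: str
    upstream_model: str
    zai_api_key: str
    embedding_model: str
    vector_store_path: str
    documents_path: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings from a mapping (the process environment by default)."""
        api_key = env.get("ZAI_API_KEY", "")
        if not api_key:
            raise RuntimeError("ZAI_API_KEY is required")
        settings = cls(
            host=env.get("HOST", "0.0.0.0"),
            port=int(env.get("PORT", "8000")),
            debug=_as_bool(env.get("DEBUG", "false")),
            model_name=env.get("MODEL_NAME", "DocRAG-GLM-4.7"),
            upstream_model=env.get("UPSTREAM_MODEL", "glm-4.7"),
            zai_api_key=api_key,
            embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
            vector_store_path=env.get("VECTOR_STORE_PATH", "./data/vectors"),
            documents_path=env.get("DOCUMENTS_PATH", "./data/documents"),
            chunk_size=int(env.get("CHUNK_SIZE", "1000")),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
            top_k_results=int(env.get("TOP_K_RESULTS", "5")),
            enable_tools=_as_bool(env.get("ENABLE_TOOLS", "true")),
        )
        # Extra sanity check (assumption): overlapping windows only make sense below the chunk size
        if not 0 <= settings.chunk_overlap < settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
        return settings
```

Passing an explicit mapping to `from_env` keeps the loader easy to test without touching the real environment.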
## Project Structure
```
docrag/
├── main.py                      # FastAPI application entry point
├── rag/
│   ├── __init__.py              # RAG system main class
│   ├── document_processor.py    # Document parsing and chunking
│   ├── vector_store.py          # Vector storage and search
│   └── retriever.py             # Context retrieval logic
├── tools/
│   └── __init__.py              # Tool management (website_downloader, etc.)
├── website-downloader.py        # CLI website downloader
├── website_downloader_tool.py   # Tool wrapper for GLM-4.7-Flash
├── requirements.txt             # Python dependencies
├── .env.example                 # Configuration template
└── README.md                    # This file
```
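`rag/document_processor.py` handles parsing and chunking. To illustrate what `CHUNK_SIZE` and `CHUNK_OVERLAP` control, here is a minimal character-based sliding-window chunker. It is a sketch only: `chunk_text` is a hypothetical name, and the real processor may measure chunks in tokens or split on sentence or paragraph boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters; each window repeats
    the last chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end; avoid a tiny trailing chunk
    return chunks
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1,600; the overlap keeps a sentence that straddles a boundary intact in at least one chunk.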
## How It Works
### Request Flow
1. **User sends message** → OpenAI-compatible endpoint receives request
2. **RAG Retrieval** → Query is processed and relevant context is retrieved
3. **Context Enhancement** → Retrieved context is added to the prompt
4. **Tool Execution** → If needed, tools are invoked (e.g., website_downloader)
5. **LLM Generation** → GLM-4.7-Flash generates response with context
6. **Response** → User receives response (streaming supported)
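Step 3 (context enhancement) amounts to rewriting the message list before it is forwarded to the upstream model. A minimal sketch, assuming retrieved chunks arrive as plain strings and are injected as a leading system message; the function name and prompt wording are hypothetical, and DocRAG's actual template may differ:

```python
def build_augmented_messages(messages: list[dict], chunks: list[str]) -> list[dict]:
    """Return a new message list with retrieved context prepended as a system message."""
    if not chunks:
        return list(messages)  # nothing retrieved: forward the conversation unchanged
    # Number the chunks so the model (and a reader of logs) can refer to them
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    system = {
        "role": "system",
        "content": (
            "Answer using the context below when it is relevant. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}"
        ),
    }
    # Any system prompt the client sent is kept, after this one
    return [system, *messages]
```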
### RAG Pipeline
```
    User Query
         │
         ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Vector Search  │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
```
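The Vector Search stage ranks stored chunk embeddings by similarity to the query embedding and keeps the best `TOP_K_RESULTS`. A brute-force cosine-similarity sketch, assuming embeddings are plain float lists keyed by chunk id; `top_k` is a hypothetical name, and the real `rag/vector_store.py` may use another metric or an index:

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return up to k (chunk_id, score) pairs most similar to the query, best first."""
    scored = ((chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

A linear scan like this is fine for small knowledge bases; larger ones are where a dedicated vector database earns its keep.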
## Supported Document Formats
- **Text**: `.txt`, `.md`, `.rst`, `.log`
- **Documents**: `.pdf`, `.docx`
- **Web**: `.html`, `.htm`
- **Data**: `.json`, `.yaml`, `.yml`, `.xml`, `.toml`, `.csv`, `.tsv`
- **Code**: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.c`, `.go`, `.rs`, `.rb`, `.php`, etc.
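A document processor typically chooses a parser from the file extension. A minimal dispatch sketch, assuming extension-based detection; `FORMAT_GROUPS` and `detect_format` are hypothetical, and the code group lists only the extensions named above (the README's "etc." is not filled in):

```python
from pathlib import Path
from typing import Optional

# Extension groups taken from the list above; the code group is deliberately incomplete
FORMAT_GROUPS = {
    "text": {".txt", ".md", ".rst", ".log"},
    "document": {".pdf", ".docx"},
    "web": {".html", ".htm"},
    "data": {".json", ".yaml", ".yml", ".xml", ".toml", ".csv", ".tsv"},
    "code": {".py", ".js", ".ts", ".java", ".cpp", ".c", ".go", ".rs", ".rb", ".php"},
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format group for a filename, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    for group, suffixes in FORMAT_GROUPS.items():
        if suffix in suffixes:
            return group
    return None
```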
## Extending
### Adding New Tools
```python
# In tools/__init__.py; assumes the module-level `tool_manager` instance is in scope
def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}


# Register the tool with an OpenAI-style function schema
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10},
                },
                "required": ["param1"],
            },
        },
    },
)
```
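Once registered, a tool runs when the model answers with a tool call. The sketch below shows one plausible shape for that dispatch, assuming OpenAI-style tool calls whose `function.arguments` field is a JSON string; `MiniToolManager`, `schemas`, and `execute` are hypothetical stand-ins, not the actual `tool_manager` API:

```python
import json
from typing import Any, Callable


class MiniToolManager:
    """Toy registry mirroring the register_tool(name, function, schema) call above."""

    def __init__(self) -> None:
        self._functions: dict[str, Callable[..., Any]] = {}
        self._schemas: list[dict] = []

    def register_tool(self, name: str, function: Callable[..., Any], schema: dict) -> None:
        self._functions[name] = function
        self._schemas.append(schema)

    def schemas(self) -> list[dict]:
        """Tool definitions to send upstream in the request's `tools` field."""
        return list(self._schemas)

    def execute(self, tool_call: dict) -> dict:
        """Run one OpenAI-style tool call and build the `tool` message to send back."""
        name = tool_call["function"]["name"]
        args = json.loads(tool_call["function"].get("arguments") or "{}")
        if name in self._functions:
            result: Any = self._functions[name](**args)
        else:
            result = {"error": f"unknown tool: {name}"}  # report instead of crashing the request
        return {
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": json.dumps(result),
        }
```

The returned message is appended to the conversation and the model is called again, so it can use the tool result in its final answer.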
### Using Different Vector Stores
The default implementation uses a simple file-based store. To use ChromaDB:
1. Install: `pip install chromadb`
2. Modify `rag/vector_store.py` to use the ChromaDB client
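Swapping backends is easier when the rest of the RAG code depends on a small interface, so only `rag/vector_store.py` changes. A sketch, assuming a two-method interface; `VectorStore`, `JsonFileStore`, and their methods are hypothetical and may not match DocRAG's current class. The JSON-backed store only illustrates what a simple file-based store can look like; a ChromaDB-backed class would implement the same two methods with its own client:

```python
import json
import math
from pathlib import Path
from typing import Protocol


class VectorStore(Protocol):
    """Interface the retriever relies on; any backend providing these methods fits."""

    def add(self, chunk_id: str, embedding: list[float], text: str) -> None: ...

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]: ...


class JsonFileStore:
    """Keeps all vectors in memory and persists the whole set to one JSON file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.items: dict[str, dict] = {}
        if self.path.exists():
            self.items = json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, chunk_id: str, embedding: list[float], text: str) -> None:
        self.items[chunk_id] = {"embedding": embedding, "text": text}
        # Rewrites the whole file on every add: simple, but O(n) per insert
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.items), encoding="utf-8")

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        """Brute-force cosine similarity over every stored vector, best first."""
        qn = math.sqrt(sum(x * x for x in query)) or 1.0
        scored = []
        for chunk_id, item in self.items.items():
            vec = item["embedding"]
            vn = math.sqrt(sum(x * x for x in vec)) or 1.0
            scored.append((chunk_id, sum(a * b for a, b in zip(query, vec)) / (qn * vn)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```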
## Development
### Running in Development Mode
```bash
DEBUG=true python main.py
```
### Running Tests
```bash
pip install pytest pytest-asyncio
pytest tests/
```
## License
Private repository - All rights reserved.