# DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that **appears as a standard OpenAI API server** to clients like Open WebUI. Behind the scenes, it:

1. Processes user queries through a RAG system
2. Retrieves relevant context from a knowledge base
3. Passes the enriched context to GLM-4.7-Flash for response generation
4. Optionally uses tools like website_downloader for enhanced capabilities

Users get what looks like a normal chat experience, while retrieval and tool calls happen transparently on the server.

## Features

- **OpenAI-Compatible API**: Works with any OpenAI client (Open WebUI, custom apps, etc.)
- **RAG Integration**: Automatic context retrieval for enhanced responses
- **Document Management**: Upload and manage documents in the knowledge base
- **Tool Support**: Built-in tools like website_downloader for extended capabilities
- **Streaming Support**: Real-time streaming responses
- **Easy Configuration**: Environment-based configuration

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env and add your ZAI_API_KEY
```

### 3. Run the Server

```bash
python main.py
```

By default the server listens on `http://0.0.0.0:8000` (configurable via `HOST` and `PORT`).

### 4. Use with Open WebUI

1. Open the Open WebUI settings
2. Add a new OpenAI-compatible connection
3. Set the base URL to `http://your-server:8000/v1`
4. Leave the API key empty or use any value (it is not validated)
5. Select the "DocRAG-GLM-4.7" model
## API Endpoints

### OpenAI-Compatible Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/models/{model_id}` | GET | Get model information |

### Document Management Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/documents` | GET | List documents in the knowledge base |
| `/v1/documents/upload` | POST | Upload a document |
| `/v1/documents/url` | POST | Add a document from a URL |
| `/v1/documents/{doc_id}` | DELETE | Delete a document |

### Health & Status

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/` | GET | API information |

## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
```

### Upload Document

```bash
curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"
```

### Add Document from URL

```bash
curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
```

### Python Client

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # API key is not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True,
)

# Print tokens as they stream in; some chunks carry no choices or no content
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

## Configuration

Configure via environment variables or a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `DEBUG` | `false` | Enable debug mode |
| `MODEL_NAME` | `DocRAG-GLM-4.7` | Model name shown to clients |
| `UPSTREAM_MODEL` | `glm-4.7` | Upstream model to use |
| `ZAI_API_KEY` / `OPENROUTER_API_KEY` | (required) | API key for the upstream LLM (OpenRouter) |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `VECTOR_STORE_PATH` | `./data/vectors` | Vector store location |
| `DOCUMENTS_PATH` | `./data/documents` | Document storage location |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks |
| `TOP_K_RESULTS` | `5` | Number of context results to retrieve |
| `ENABLE_TOOLS` | `true` | Enable tool support |
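
The `CHUNK_SIZE` and `CHUNK_OVERLAP` settings describe a sliding window over each document. The sketch below is a minimal illustration, not the code in `rag/document_processor.py`: it assumes sizes are counted in characters (the real processor may count tokens or respect sentence boundaries), and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one.
    Hypothetical helper; character-based sizing is an assumption."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    doc = "".join(str(i % 10) for i in range(2500))
    print([len(c) for c in chunk_text(doc)])
```

With the defaults, a 2,500-character document yields chunks of 1,000, 1,000 and 900 characters, and each chunk repeats the last 200 characters of the one before it, so text near a boundary can be retrieved from either side.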
## Project Structure

```
docrag/
├── main.py                     # FastAPI application entry point
├── rag/
│   ├── __init__.py             # RAG system main class
│   ├── document_processor.py   # Document parsing and chunking
│   ├── vector_store.py         # Vector storage and search
│   └── retriever.py            # Context retrieval logic
├── tools/
│   └── __init__.py             # Tool management (website_downloader, etc.)
├── website-downloader.py       # CLI website downloader
├── website_downloader_tool.py  # Tool wrapper for GLM-4.7-Flash
├── requirements.txt            # Python dependencies
├── .env.example                # Configuration template
└── README.md                   # This file
```

## How It Works

### Request Flow

1. **User sends message** → OpenAI-compatible endpoint receives request
2. **RAG Retrieval** → Query is processed and relevant context is retrieved
3. **Context Enhancement** → Retrieved context is added to the prompt
4. **Tool Execution** → If needed, tools are invoked (e.g., website_downloader)
5. **LLM Generation** → GLM-4.7-Flash generates response with context
6. **Response** → User receives response (streaming supported)
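
Steps 2 and 3 of the request flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `rag/retriever.py`: it swaps the configured `EMBEDDING_MODEL` for a toy bag-of-words vector, ranks chunks by cosine similarity, keeps the top `TOP_K_RESULTS`, and prepends them as a system message. The helper names (`embed`, `cosine`, `retrieve`, `build_messages`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for the real EMBEDDING_MODEL (assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Step 2: rank stored chunks by similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:top_k]


def build_messages(messages: list[dict], context: list[str]) -> list[dict]:
    """Step 3: prepend retrieved chunks as a system message (hypothetical prompt wording)."""
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context, start=1))
    system = {
        "role": "system",
        "content": "Use the following context when it is relevant:\n\n" + numbered,
    }
    return [system, *messages]


if __name__ == "__main__":
    chunks = [
        "Machine learning is a field of AI that learns patterns from data.",
        "The office is closed on public holidays.",
        "Supervised machine learning trains a model on labeled data.",
    ]
    messages = [{"role": "user", "content": "What is machine learning?"}]
    top = retrieve(messages[-1]["content"], chunks, top_k=2)
    for m in build_messages(messages, top):
        print(m["role"], "->", m["content"])
```

In this example the off-topic chunk about the office scores lowest and is dropped with `top_k=2`. A real embedding model scores meaning rather than word overlap, but the rank, truncate, and prepend shape would be the same.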
### RAG Pipeline

```
     User Query
         │
         ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Vector Search  │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
      Response
```

## Supported Document Formats

- **Text**: `.txt`, `.md`, `.rst`, `.log`
- **Documents**: `.pdf`, `.docx`
- **Web**: `.html`, `.htm`
- **Data**: `.json`, `.yaml`, `.yml`, `.xml`, `.toml`, `.csv`, `.tsv`
- **Code**: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.c`, `.go`, `.rs`, `.rb`, `.php`, etc.

## Extending

### Adding New Tools

```python
# In tools/__init__.py
def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}


# Register the tool. `tool_manager` is assumed to be the tool manager
# instance already defined in this module.
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10},
                },
                "required": ["param1"],
            },
        },
    },
)
```

### Using Different Vector Stores

The default implementation uses a simple file-based store. To use ChromaDB:

1. Install it: `pip install chromadb`
2. Modify `rag/vector_store.py` to use the ChromaDB client

## Development

### Running in Development Mode

```bash
DEBUG=true python main.py
```

### Running Tests

```bash
pip install pytest pytest-asyncio
pytest tests/
```

## License

Private repository - All rights reserved.