# DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that **appears as a standard OpenAI API server** to clients like Open WebUI. Behind the scenes, it:

1. Processes user queries through a RAG system
2. Retrieves relevant context from a knowledge base
3. Passes the enriched context to GLM-4.7-Flash for response generation
4. Optionally uses tools like website_downloader for enhanced capabilities

Users interact with what appears to be a normal chat experience, while the RAG operations happen transparently in the background.

## Features

- **OpenAI-Compatible API**: Works with any OpenAI client (Open WebUI, custom apps, etc.)
- **RAG Integration**: Automatic context retrieval for enhanced responses
- **Document Management**: Upload and manage documents in the knowledge base
- **Tool Support**: Built-in tools like website_downloader for extended capabilities
- **Streaming Support**: Real-time streaming responses
- **Easy Configuration**: Environment-based configuration

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env and add your ZAI_API_KEY
```

### 3. Run the Server

```bash
python main.py
```

By default the server listens on `http://0.0.0.0:8000`, that is, on all network interfaces. From the same machine, reach it at `http://localhost:8000`.

### 4. Use with Open WebUI

1. Open the Open WebUI settings
2. Add a new OpenAI-compatible connection
3. Set the base URL to `http://your-server:8000/v1`
4. Leave the API key empty or use any value (it is not validated)
5. Select the "DocRAG-GLM-4.7" model

## API Endpoints

### OpenAI-Compatible Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/models/{model_id}` | GET | Get model information |

### Document Management Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/documents` | GET | List documents in knowledge base |
| `/v1/documents/upload` | POST | Upload a document |
| `/v1/documents/url` | POST | Add document from URL |
| `/v1/documents/{doc_id}` | DELETE | Delete a document |

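For scripting against the document endpoints without extra dependencies, the requests can be built with the standard library. This is a minimal sketch: the helper names (`list_documents_request`, `add_url_request`, `delete_document_request`, `send`) are hypothetical, the base URL assumes a local server as in the Quick Start, and the response format is not documented here, so `send` returns the raw body as text.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server from the Quick Start is reachable at this address.
BASE_URL = "http://localhost:8000"


def list_documents_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /v1/documents."""
    return urllib.request.Request(f"{base_url}/v1/documents", method="GET")


def add_url_request(page_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/documents/url with a JSON body."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/documents/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_document_request(doc_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a DELETE request for /v1/documents/{doc_id}."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(f"{base_url}/v1/documents/{safe_id}", method="DELETE")


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(list_documents_request())` would return whatever the list endpoint responds with. Building and sending are kept separate so the requests can be inspected or logged before any network call is made.
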
### Health & Status

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/` | GET | API information |

## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
```

### Upload Document

```bash
curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"
```

### Add Document from URL

```bash
curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
```

### Python Client

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True
)

# Print each streamed text fragment as it arrives
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Configuration

Configure via environment variables or a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `DEBUG` | `false` | Enable debug mode |
| `MODEL_NAME` | `DocRAG-GLM-4.7` | Display model name |
| `UPSTREAM_MODEL` | `glm-4.7` | Upstream model to use |
| `ZAI_API_KEY` | (required) | API key for ZAI SDK |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `VECTOR_STORE_PATH` | `./data/vectors` | Vector store location |
| `DOCUMENTS_PATH` | `./data/documents` | Document storage |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `CHUNK_OVERLAP` | `200` | Chunk overlap |
| `TOP_K_RESULTS` | `5` | Number of context results |
| `ENABLE_TOOLS` | `true` | Enable tool support |

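To illustrate how the variables and defaults in this table could be read, here is a minimal sketch using only the standard library. The `Settings` dataclass and `load_settings` function are hypothetical names, not the project's actual loader; the sketch also does not parse `.env` files, which the real server may do through a separate library.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    model_name: str
    upstream_model: str
    zai_api_key: str
    embedding_model: str
    vector_store_path: str
    documents_path: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    enable_tools: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), using the table's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ZAI_API_KEY", "")
    if not api_key:
        # The table marks this variable as required, so fail early.
        raise ValueError("ZAI_API_KEY is required")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=_as_bool(env.get("DEBUG"), False),
        model_name=env.get("MODEL_NAME", "DocRAG-GLM-4.7"),
        upstream_model=env.get("UPSTREAM_MODEL", "glm-4.7"),
        zai_api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        vector_store_path=env.get("VECTOR_STORE_PATH", "./data/vectors"),
        documents_path=env.get("DOCUMENTS_PATH", "./data/documents"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        enable_tools=_as_bool(env.get("ENABLE_TOOLS"), True),
    )
```

Passing an explicit mapping, for example `load_settings({"ZAI_API_KEY": "test", "PORT": "9000"})`, makes the loader easy to exercise in tests without touching the real environment.
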
## Project Structure

```
docrag/
├── main.py                     # FastAPI application entry point
├── rag/
│   ├── __init__.py             # RAG system main class
│   ├── document_processor.py   # Document parsing and chunking
│   ├── vector_store.py         # Vector storage and search
│   └── retriever.py            # Context retrieval logic
├── tools/
│   └── __init__.py             # Tool management (website_downloader, etc.)
├── website-downloader.py       # CLI website downloader
├── website_downloader_tool.py  # Tool wrapper for GLM-4.7-Flash
├── requirements.txt            # Python dependencies
├── .env.example                # Configuration template
└── README.md                   # This file
```

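The chunking done in `rag/document_processor.py` is governed by `CHUNK_SIZE` and `CHUNK_OVERLAP`. As a rough illustration of what overlapping chunking means, the sketch below splits text with a sliding window. It assumes both values are measured in characters, which this README does not state, and `chunk_text` is a hypothetical name rather than the project's own function.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the default values, each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.
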
## How It Works

### Request Flow

1. **User sends message** → OpenAI-compatible endpoint receives request
2. **RAG Retrieval** → Query is processed and relevant context is retrieved
3. **Context Enhancement** → Retrieved context is added to the prompt
4. **Tool Execution** → If needed, tools are invoked (e.g., website_downloader)
5. **LLM Generation** → GLM-4.7-Flash generates response with context
6. **Response** → User receives response (streaming supported)

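The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the server's actual code: `answer`, `fake_retrieve`, `fake_llm`, and `fake_downloader` are hypothetical stand-ins, the iteration cap of 5 is an assumed default, and the message shapes follow the OpenAI chat-completions tool-calling format.

```python
import json
from typing import Callable

MAX_TOOL_ITERATIONS = 5  # assumed cap so a misbehaving model cannot loop forever


def answer(
    messages: list[dict],
    retrieve: Callable[[str], list[str]],
    call_llm: Callable[[list[dict]], dict],
    tools: dict[str, Callable[..., dict]],
    max_iterations: int = MAX_TOOL_ITERATIONS,
) -> str:
    # Steps 2-3: retrieve context for the latest user message and prepend it.
    context = retrieve(messages[-1]["content"])
    system = {"role": "system", "content": "Use this context:\n" + "\n".join(context)}
    convo = [system, *messages]

    for _ in range(max_iterations):
        reply = call_llm(convo)  # step 5: ask the model
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # step 6: final answer
        convo.append(reply)
        for call in tool_calls:  # step 4: run each requested tool
            function = tools[call["function"]["name"]]
            arguments = json.loads(call["function"]["arguments"])
            result = function(**arguments)
            convo.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    return "Stopped after reaching the tool-call limit."


# --- Hypothetical stand-ins so the sketch runs without a model or network ---
def fake_retrieve(query: str) -> list[str]:
    return ["DocRAG exposes an OpenAI-compatible API."]


def fake_downloader(url: str) -> dict:
    return {"url": url, "pages": 1}


def fake_llm(convo: list[dict]) -> dict:
    tool_messages = [m for m in convo if m["role"] == "tool"]
    if not tool_messages:
        # First turn: request a tool call instead of answering.
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "website_downloader",
                    "arguments": json.dumps({"url": "https://example.com"}),
                },
            }],
        }
    pages = json.loads(tool_messages[-1]["content"])["pages"]
    return {"role": "assistant", "content": f"Fetched {pages} page(s)."}


result = answer(
    [{"role": "user", "content": "Summarise example.com"}],
    fake_retrieve,
    fake_llm,
    {"website_downloader": fake_downloader},
)
```

Feeding tool results back as `tool` messages and calling the model again lets it decide whether it needs another tool or can produce the final answer, which is why tool execution and generation form a loop rather than a strict sequence.
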
### RAG Pipeline

```
    User Query
         │
         ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Vector Search  │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
```

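The vector search and context builder stages can be illustrated with a small, dependency-free sketch. It assumes cosine similarity over precomputed embeddings, which is a common choice but is not confirmed by this README; `cosine_similarity`, `top_k`, and `build_context` are hypothetical names, and the two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_context(chunks: list[str]) -> str:
    """Number the retrieved chunks and join them into one prompt section."""
    return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))


# Toy knowledge base: (chunk text, embedding) pairs.
store = [
    ("Cats are small felines.", [1.0, 0.0]),
    ("Dogs are loyal companions.", [0.0, 1.0]),
    ("Pets need regular care.", [1.0, 1.0]),
]
context = build_context(top_k([1.0, 0.1], store, k=2))
```

Here `k` plays the role of `TOP_K_RESULTS`: a larger value gives the model more context at the cost of a longer prompt.
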
## Supported Document Formats

- **Text**: `.txt`, `.md`, `.rst`, `.log`
- **Documents**: `.pdf`, `.docx`
- **Web**: `.html`, `.htm`
- **Data**: `.json`, `.yaml`, `.yml`, `.xml`, `.toml`, `.csv`, `.tsv`
- **Code**: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.c`, `.go`, `.rs`, `.rb`, `.php`, etc.

## Extending

### Adding New Tools

```python
# In tools/__init__.py

def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}

# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
```

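To make the registration call above easier to reason about, here is a minimal sketch of what a tool manager with a compatible `register_tool(name, function, schema)` signature might look like. The `ToolManager` class, its `get_schemas` and `execute` methods, and the error format are assumptions for illustration; the real implementation in `tools/__init__.py` may differ.

```python
import json
from typing import Any, Callable


class ToolManager:
    """Hypothetical registry mapping tool names to functions and schemas."""

    def __init__(self) -> None:
        self._functions: dict[str, Callable[..., Any]] = {}
        self._schemas: dict[str, dict] = {}

    def register_tool(self, name: str, function: Callable[..., Any], schema: dict) -> None:
        self._functions[name] = function
        self._schemas[name] = schema

    def get_schemas(self) -> list[dict]:
        """Schemas in the shape expected by the `tools` field of a chat request."""
        return list(self._schemas.values())

    def execute(self, name: str, arguments_json: str) -> Any:
        """Run a tool from the name and JSON argument string of a tool call."""
        if name not in self._functions:
            return {"error": f"unknown tool: {name}"}
        try:
            kwargs = json.loads(arguments_json or "{}")
            return self._functions[name](**kwargs)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or wrong arguments (and any TypeError raised by the
            # tool itself): report the problem instead of crashing the request.
            return {"error": str(exc)}


def my_custom_tool(param1: str, param2: int = 10) -> dict:
    return {"result": "success"}


tool_manager = ToolManager()
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={"type": "function", "function": {"name": "my_custom_tool"}},
)
outcome = tool_manager.execute("my_custom_tool", '{"param1": "hello"}')
```

Returning errors as data rather than raising lets the caller pass the failure back to the model as a tool result, so the model can retry with corrected arguments.
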
### Using Different Vector Stores

The default implementation uses a simple file-based store. To use ChromaDB instead:

1. Install: `pip install chromadb`
2. Modify `rag/vector_store.py` to use the ChromaDB client

## Development

### Running in Development Mode

```bash
DEBUG=true python main.py
```

### Running Tests

```bash
pip install pytest pytest-asyncio
pytest tests/
```

## License

Private repository - All rights reserved.