# DocRAG - OpenAI-Compatible RAG Server

A custom RAG (Retrieval-Augmented Generation) system that **appears as a standard OpenAI API server** to clients like Open WebUI. Behind the scenes, it:

1. Processes user queries through a RAG system
2. Retrieves relevant context from a knowledge base
3. Passes the enriched context to GLM-4.7-Flash for response generation
4. Optionally uses tools like website_downloader for enhanced capabilities

Users see an ordinary chat interface; retrieval and prompt augmentation run server-side on each request.

## Features

- **OpenAI-Compatible API**: Works with any OpenAI client (Open WebUI, custom apps, etc.)
- **RAG Integration**: Automatic context retrieval for enhanced responses
- **Document Management**: Upload and manage documents in the knowledge base
- **Tool Support**: Built-in tools like website_downloader for extended capabilities
- **Streaming Support**: Real-time streaming responses
- **Easy Configuration**: Environment-based configuration

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env and add your ZAI_API_KEY
```

### 3. Run the Server

```bash
python main.py
```

By default the server binds to `0.0.0.0:8000` (all network interfaces). From the same machine, reach it at `http://localhost:8000`.

### 4. Use with Open WebUI

1. Open the Open WebUI settings
2. Add a new OpenAI-compatible connection
3. Set the base URL to `http://your-server:8000/v1`
4. Leave the API key empty or enter any value (the server does not validate it)
5. Select the "DocRAG-GLM-4.7" model

## API Endpoints

### OpenAI-Compatible Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/models` | GET | List available models |
| `/v1/models/{model_id}` | GET | Get model information |

### Document Management Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/documents` | GET | List documents in knowledge base |
| `/v1/documents/upload` | POST | Upload a document |
| `/v1/documents/url` | POST | Add document from URL |
| `/v1/documents/{doc_id}` | DELETE | Delete a document |

### Health & Status

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/` | GET | API information |

## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DocRAG-GLM-4.7",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
```

### Upload Document

```bash
curl -X POST http://localhost:8000/v1/documents/upload \
  -F "file=@document.pdf"
```

### Add Document from URL

```bash
curl -X POST http://localhost:8000/v1/documents/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article.html"}'
```

### Python Client

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not validated
)

response = client.chat.completions.create(
    model="DocRAG-GLM-4.7",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    stream=True
)

for chunk in response:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
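
When no OpenAI client library is available, the stream can be read directly. The sketch below assumes DocRAG follows the OpenAI streaming wire format: server-sent events with one `data: {json}` line per chunk, ending with `data: [DONE]`. `iter_stream_text` is a hypothetical helper, not part of DocRAG, and the sample lines are illustrative data.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Sample lines as they might arrive over HTTP (illustrative only).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

Any HTTP client that can iterate over response lines can feed this function.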

## Configuration

Configure via environment variables or a `.env` file:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `DEBUG` | `false` | Enable debug mode |
| `MODEL_NAME` | `DocRAG-GLM-4.7` | Model name shown to clients |
| `UPSTREAM_MODEL` | `glm-4.7` | Upstream model to use |
| `ZAI_API_KEY` | (required) | API key for ZAI SDK |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `VECTOR_STORE_PATH` | `./data/vectors` | Vector store location |
| `DOCUMENTS_PATH` | `./data/documents` | Document storage |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `CHUNK_OVERLAP` | `200` | Chunk overlap |
| `TOP_K_RESULTS` | `5` | Number of context results |
| `ENABLE_TOOLS` | `true` | Enable tool support |
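
To illustrate how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of overlapping chunking. It assumes both values are measured in characters, which this README does not confirm; `load_chunk_settings` and `chunk_text` are hypothetical names, and the actual logic in `rag/document_processor.py` may differ.

```python
import os


def load_chunk_settings(env=os.environ) -> tuple[int, int]:
    """Read CHUNK_SIZE and CHUNK_OVERLAP, falling back to the documented defaults."""
    size = int(env.get("CHUNK_SIZE", "1000"))
    overlap = int(env.get("CHUNK_OVERLAP", "200"))
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need CHUNK_SIZE > 0 and 0 <= CHUNK_OVERLAP < CHUNK_SIZE")
    return size, overlap


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    step = size - overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


size, overlap = load_chunk_settings({})  # empty mapping, so the defaults apply
document = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(document, size, overlap)
print(len(chunks), [len(c) for c in chunks])
```

A larger overlap keeps sentences that straddle a boundary retrievable from both neighbouring chunks, at the cost of storing more vectors.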

## Project Structure

```
docrag/
├── main.py                    # FastAPI application entry point
├── rag/
│   ├── __init__.py           # RAG system main class
│   ├── document_processor.py # Document parsing and chunking
│   ├── vector_store.py       # Vector storage and search
│   └── retriever.py          # Context retrieval logic
├── tools/
│   └── __init__.py           # Tool management (website_downloader, etc.)
├── website-downloader.py     # CLI website downloader
├── website_downloader_tool.py # Tool wrapper for GLM-4.7-Flash
├── requirements.txt          # Python dependencies
├── .env.example              # Configuration template
└── README.md                 # This file
```

## How It Works

### Request Flow

1. **User sends message** → OpenAI-compatible endpoint receives request
2. **RAG Retrieval** → Query is processed and relevant context is retrieved
3. **Context Enhancement** → Retrieved context is added to the prompt
4. **Tool Execution** → If needed, tools are invoked (e.g., website_downloader)
5. **LLM Generation** → GLM-4.7-Flash generates response with context
6. **Response** → User receives response (streaming supported)
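
The context enhancement step (step 3) can be sketched as follows. This is an illustration under assumptions, not the code in `main.py`: `build_augmented_messages` is a hypothetical name, and injecting the context as a leading system message is only one of several reasonable prompt layouts.

```python
def build_augmented_messages(messages, context_chunks):
    """Return a new message list with retrieved context prepended as a system message."""
    if not context_chunks:
        return list(messages)  # nothing retrieved: forward the request unchanged
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    system = {
        "role": "system",
        "content": "Use the following context to answer the question.\n\n" + numbered,
    }
    return [system, *messages]


user_messages = [{"role": "user", "content": "What is machine learning?"}]
retrieved = ["Machine learning is a subfield of AI.", "Models learn from data."]
augmented = build_augmented_messages(user_messages, retrieved)
for message in augmented:
    print(message["role"], "->", message["content"][:60])
```

The original message list is left untouched, so the client's conversation history is not altered by the injected context.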

### RAG Pipeline

```
User Query
    │
    ▼
┌─────────────────┐
│ Query Processor │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Vector Search   │ ← Knowledge Base
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Context Builder │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GLM-4.7-Flash  │
└────────┬────────┘
         │
         ▼
     Response
```
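
The vector search stage can be illustrated with a brute-force cosine-similarity ranking. This is a sketch under assumptions: the README does not state which similarity metric `rag/vector_store.py` uses, the function names are hypothetical, and the two-dimensional vectors are toy data standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=5):
    """Rank (text, vector) pairs by similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy knowledge base: each entry is (chunk text, embedding vector).
store = [
    ("cats", [1.0, 0.0]),
    ("dogs", [0.0, 1.0]),
    ("pets", [1.0, 1.0]),
]
results = top_k([1.0, 0.1], store, k=2)
print([text for _, text in results])
```

Here `k` plays the role of `TOP_K_RESULTS`. A linear scan like this is adequate for small knowledge bases; larger ones usually call for an indexed store.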

## Supported Document Formats

- **Text**: `.txt`, `.md`, `.rst`, `.log`
- **Documents**: `.pdf`, `.docx`
- **Web**: `.html`, `.htm`
- **Data**: `.json`, `.yaml`, `.yml`, `.xml`, `.toml`, `.csv`, `.tsv`
- **Code**: `.py`, `.js`, `.ts`, `.java`, `.cpp`, `.c`, `.go`, `.rs`, `.rb`, `.php`, etc.

## Extending

### Adding New Tools

```python
# In tools/__init__.py

def my_custom_tool(param1: str, param2: int = 10) -> dict:
    """Your tool implementation."""
    return {"result": "success"}

# Register the tool
tool_manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={
        "type": "function",
        "function": {
            "name": "my_custom_tool",
            "description": "Description of your tool",
            "parameters": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "..."},
                    "param2": {"type": "integer", "description": "...", "default": 10}
                },
                "required": ["param1"]
            }
        }
    }
)
```
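
For orientation, a manager with the `register_tool` signature used above could look roughly like this. The class is a hypothetical stand-in, not the code in `tools/__init__.py`; the `schemas` and `execute` methods are assumptions added for illustration. It relies on the OpenAI convention that tool-call arguments arrive as a JSON string.

```python
import json


class ToolManager:
    """Hypothetical stand-in for the manager in tools/__init__.py."""

    def __init__(self):
        self._tools = {}

    def register_tool(self, name, function, schema):
        """Store the callable together with its JSON schema."""
        self._tools[name] = (function, schema)

    def schemas(self):
        """Schemas to send to the model in the `tools` field of a request."""
        return [schema for _, schema in self._tools.values()]

    def execute(self, name, arguments_json):
        """Run a tool call; return an error dict instead of raising on bad input."""
        if name not in self._tools:
            return {"error": f"unknown tool: {name}"}
        function, _ = self._tools[name]
        try:
            return function(**json.loads(arguments_json or "{}"))
        except (TypeError, ValueError) as exc:  # bad arguments or malformed JSON
            return {"error": str(exc)}


def my_custom_tool(param1: str, param2: int = 10) -> dict:
    return {"result": "success", "param1": param1, "param2": param2}


manager = ToolManager()
manager.register_tool(
    name="my_custom_tool",
    function=my_custom_tool,
    schema={"type": "function", "function": {"name": "my_custom_tool"}},
)
print(manager.execute("my_custom_tool", '{"param1": "hello"}'))
```

Returning errors as data rather than raising lets the failure be passed back to the model as a tool result.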

### Using Different Vector Stores

The default implementation uses a simple file-based store. To switch to ChromaDB:

1. Install it: `pip install chromadb`
2. Modify `rag/vector_store.py` to use the ChromaDB client

## Development

### Running in Development Mode

```bash
DEBUG=true python main.py
```

### Running Tests

```bash
pip install pytest pytest-asyncio
pytest tests/
```

## License

Private repository - All rights reserved.