docrag

Author	SHA1	Message	Date
Z User	a2285d3a48	Switch to mega-tool-call approach for unlimited tool calls The upstream LLM only supports 2 native tool calls per response, but the user needs to fire many tools at once. Solution: content-based 'mega tool call' where the LLM bundles ALL tool calls into a single JSON array in its response text. Key changes: - System prompt: tells LLM to output {tool_calls: [...]} array with ALL needed tools in one block (no native tools param) - _parse_tool_calls: parses the tool_calls array format (with legacy tool_call single-object fallback) - generate_response: NO tools/tool_choice params to API, pure content-based parsing - generate_response: executes ALL tools concurrently via asyncio.gather - generate_response: feeds ALL results back in one consolidated message - _clean_tool_syntax: strips both tool_calls and tool_call blocks	2026-03-29 18:06:39 +00:00
Z User	57228625fc	Fix tool calling: switch to native OpenAI tools parameter Problems fixed: - 'Mega tool call': LLM outputting multiple tool calls that got bundled into one. Now uses native OpenAI tools parameter which handles multiple tool calls properly via message.tool_calls array. - 'Returning nothing': _clean_tool_syntax was too aggressive, stripping the entire response. Now only strips code-fence-wrapped blocks. - Tool results were appended to system message growing it unboundedly; now uses proper 'tool' role messages in conversation history. Key changes: - generate_response: passes tools/tool_choice to OpenAI API (native tool calling), with retry without tool_choice for unsupported models - generate_response: handles multiple tool_calls per response natively - generate_response: uses proper 'tool' role for results instead of appending to system message - _parse_tool_calls (was _parse_tool_call): now returns a list, supports multiple tool calls, used as fallback for models without native tools - _clean_tool_syntax: much less aggressive, only strips code-fence blocks, no longer removes bare JSON (was eating valid responses) - System prompt: removed JSON format instructions (native tools handles format), simplified rules	2026-03-29 17:57:26 +00:00
Z User	c03bde8023	Fix tool call parsing, improve embeddings, and fix async issues - main.py: Rewrote _parse_tool_call with brace-counting for robust JSON extraction - main.py: Improved _clean_tool_syntax with brace-aware removal of tool_call JSON - main.py: Fixed dict key mismatches (chunks_ingested, pages_downloaded) - main.py: Run tool execution in asyncio.to_thread to avoid blocking event loop - main.py: Always clean tool syntax from responses (handles edge cases) - rag/__init__.py: Wrap blocking website_downloader in run_in_executor - rag/__init__.py: Replace deprecated datetime.utcnow() with datetime.now(timezone.utc) - rag/__init__.py: Add add_document_from_url method - rag/vector_store.py: Replace hash-based embeddings with TF-IDF inspired embeddings - rag/vector_store.py: Add embedding dimension mismatch handling in search - README.md: Update API key config documentation	2026-03-29 17:49:32 +00:00
Z User	6eb18ce7f3	Switch to context-based tool calling (no API tool limit) Instead of passing tools to the OpenRouter API (limited to 10 tools): - Tool descriptions are now embedded in the system prompt - LLM outputs tool calls as JSON: {"tool_call": {"name": "...", "arguments": {...}}} - We parse the response, execute tools, and feed results back - Supports all 33 tools without hitting the API limit Changes: - Added _build_tool_descriptions() for tool docs in prompt - Added _parse_tool_call() to extract tool requests from LLM output - Added _clean_tool_syntax() to remove tool JSON from responses - Rewrote generate_response() for context-based approach - Updated system prompt with tool usage instructions	2026-03-29 17:02:02 +00:00
Z User	ac0eff1cdd	Fix: Prevent website re-downloads and skip automated tasks - Skip website download for Open WebUI automated tasks (title, tags, follow-ups) - Check if site already downloaded before re-downloading - Return cached site info if previously downloaded - Reduces unnecessary network calls and processing time	2026-03-29 16:54:38 +00:00
Z User	d966f8ea5d	Add detailed logging for debugging tool calling issues - Log full LLM response object - Log message content and tool calls - Log request start/end with request_id - Add traceback logging for errors	2026-03-29 16:25:44 +00:00
Z User	b811162f78	Implement tool calling loop for LLM - Pass all registered tools to LLM during chat completion - Handle tool_calls from LLM response - Execute tools and feed results back to LLM - Loop until LLM returns final response - Updated system prompt to encourage tool use - Updated streaming to handle tool calls - Increased MAX_TOOL_ITERATIONS to 5	2026-03-29 16:07:56 +00:00
Z User	973bf5ab88	Fix AsyncOpenAI proxy compatibility issue - Create custom httpx.AsyncClient to avoid proxy argument error - This fixes 'AsyncClient.__init__() got an unexpected keyword argument proxies'	2026-03-29 04:51:23 +00:00
Z User	5ec2ef5911	Fix .env loading and add debug logging for API key - Load .env from script directory explicitly - Add logging to show .env file location and existence - Show API key preview on startup for debugging	2026-03-29 04:47:54 +00:00
Z User	b23964b35a	Switch from ZAI SDK to OpenRouter with openrouter/free model - Replace z-ai-web-dev-sdk with openai SDK - Add OPENROUTER_API_KEY and OPENROUTER_BASE_URL config - Update AsyncOpenAI client for OpenRouter - Update generate_response and stream_chat_completion - Update .env.example with OpenRouter settings	2026-03-29 04:35:54 +00:00
Z User	10e61dd2f1	Fix: Auto-download websites BEFORE RAG retrieval Key changes: - Add URL extraction and detection functions - Download websites BEFORE RAG retrieval (not after) - Expand trigger keywords to include common phrases like 'go to', 'headlines', etc. - Update system prompt to tell LLM it CAN access websites - Improve streaming response handling Now when user asks 'go to orovillemr.com and give me the headlines': 1. System detects URL and access intent 2. Downloads and ingests website content 3. RAG retrieves relevant content 4. LLM generates response with actual website content	2026-03-29 03:58:39 +00:00
Z User	6aecc4b231	Integrate website_downloader_tool into RAG system Features: - RAG system now uses website_downloader_tool as primary content ingestion method - download_and_ingest_website() method for complete website processing - Stores page pointers (source_url, page_url, local_path) in vector store - Site registry tracks all downloaded websites with metadata - New API endpoints for website management: - POST /v1/documents/website - Download and ingest a website - GET /v1/documents/sites - List all downloaded sites - GET /v1/documents/sites/{url} - Get site info - DELETE /v1/documents/sites/{url} - Delete a site and its content Changes: - rag/__init__.py: Added download_and_ingest_website(), site registry - rag/document_processor.py: Added extract_text_from_html() public method - rag/vector_store.py: Added delete_by_source_url(), get_stats() - main.py: New website endpoints, integrated tool with RAG system	2026-03-29 02:36:59 +00:00
Z User	eabdadfb62	Implement full DocRAG server with OpenAI-compatible API Features: - FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models) - RAG system with document processing and vector storage - Support for multiple document formats (PDF, DOCX, HTML, text, code) - Streaming response support - Tool integration with website_downloader - Document management API endpoints - GLM-4.7-Flash integration via z-ai-web-dev-sdk - Works transparently with Open WebUI and other OpenAI clients Components: - main.py: FastAPI application with OpenAI-compatible API - rag/: RAG system (document processor, vector store, retriever) - tools/: Tool manager with website_downloader integration - .env.example: Configuration template	2026-03-29 00:57:37 +00:00
turtle89431	e3681949e2	add main	2026-03-28 17:46:13 -07:00

14 Commits
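
The content-based tool calling described in commits c03bde8023 and a2285d3a48 (brace-counting JSON extraction, a `tool_calls` array with a legacy `tool_call` fallback, concurrent execution via `asyncio.gather`) could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code. The function names (`extract_json_objects`, `parse_tool_calls`, `run_tool`, `run_all`), the `registry` dict of plain callables, and the shape of the result dicts are all hypothetical. Only the two JSON envelope shapes and the use of `asyncio.to_thread` and `asyncio.gather` come from the commit messages.

```python
import asyncio
import json
from typing import Any, Callable


def extract_json_objects(text: str) -> list[dict[str, Any]]:
    """Find top-level {...} spans by counting braces, then try to parse each."""
    objects: list[dict[str, Any]] = []
    depth = 0
    start = -1
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Braces inside a JSON string must not affect the depth count.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    parsed = json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    continue  # balanced braces, but not valid JSON
                if isinstance(parsed, dict):
                    objects.append(parsed)
    return objects


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Collect calls from {"tool_calls": [...]} or legacy {"tool_call": {...}}."""
    calls: list[dict[str, Any]] = []
    for obj in extract_json_objects(text):
        bundled = obj.get("tool_calls")
        single = obj.get("tool_call")
        if isinstance(bundled, list):
            calls.extend(c for c in bundled if isinstance(c, dict) and "name" in c)
        elif isinstance(single, dict) and "name" in single:
            calls.append(single)
    return calls


async def run_tool(call: dict[str, Any],
                   registry: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Run one (assumed blocking) tool in a worker thread; report errors as data."""
    func = registry.get(call["name"])
    if func is None:
        return {"name": call["name"], "error": "unknown tool"}
    try:
        result = await asyncio.to_thread(func, **call.get("arguments", {}))
        return {"name": call["name"], "result": result}
    except Exception as exc:
        return {"name": call["name"], "error": str(exc)}


async def run_all(calls: list[dict[str, Any]],
                  registry: dict[str, Callable[..., Any]]) -> list[dict[str, Any]]:
    """Execute every parsed call concurrently; results keep the input order."""
    return await asyncio.gather(*(run_tool(c, registry) for c in calls))
```

Given a reply string from the model, `asyncio.run(run_all(parse_tool_calls(reply), registry))` returns one result dict per call, which can then be serialized into a single consolidated message for the next model turn. Because errors are returned as data, one failing tool does not cancel the others.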