The upstream LLM only supports 2 native tool calls per response, but
the user needs to fire many tools at once. Solution: content-based
'mega tool call' where the LLM bundles ALL tool calls into a single
JSON array in its response text.
Key changes:
- System prompt: tells LLM to output {tool_calls: [...]} array
with ALL needed tools in one block (no native tools param)
- _parse_tool_calls: parses the tool_calls array format (with legacy
tool_call single-object fallback)
- generate_response: NO tools/tool_choice params to API, pure
content-based parsing
- generate_response: executes ALL tools concurrently via asyncio.gather
- generate_response: feeds ALL results back in one consolidated message
- _clean_tool_syntax: strips both tool_calls and tool_call blocks
Problems fixed:
- 'Mega tool call': LLM outputting multiple tool calls that got bundled
into one. Now uses native OpenAI tools parameter which handles multiple
tool calls properly via message.tool_calls array.
- 'Returning nothing': _clean_tool_syntax was too aggressive, stripping
the entire response. Now only strips code-fence-wrapped blocks.
- Tool results were appended to system message growing it unboundedly;
now uses proper 'tool' role messages in conversation history.
Key changes:
- generate_response: passes tools/tool_choice to OpenAI API (native
tool calling), with retry without tool_choice for unsupported models
- generate_response: handles multiple tool_calls per response natively
- generate_response: uses proper 'tool' role for results instead of
appending to system message
- _parse_tool_calls (was _parse_tool_call): now returns a list, supports
multiple tool calls, used as fallback for models without native tools
- _clean_tool_syntax: much less aggressive, only strips code-fence
blocks, no longer removes bare JSON (was eating valid responses)
- System prompt: removed JSON format instructions (native tools handles
format), simplified rules
Instead of passing tools to the OpenRouter API (limited to 10 tools):
- Tool descriptions are now embedded in the system prompt
- LLM outputs tool calls as JSON: {"tool_call": {"name": "...", "arguments": {...}}}
- We parse the response, execute tools, and feed results back
- Supports all 33 tools without hitting the API limit
Changes:
- Added _build_tool_descriptions() for tool docs in prompt
- Added _parse_tool_call() to extract tool requests from LLM output
- Added _clean_tool_syntax() to remove tool JSON from responses
- Rewrote generate_response() for context-based approach
- Updated system prompt with tool usage instructions
- Skip website download for Open WebUI automated tasks (title, tags, follow-ups)
- Check if site already downloaded before re-downloading
- Return cached site info if previously downloaded
- Reduces unnecessary network calls and processing time
- Pass all registered tools to LLM during chat completion
- Handle tool_calls from LLM response
- Execute tools and feed results back to LLM
- Loop until LLM returns final response
- Updated system prompt to encourage tool use
- Updated streaming to handle tool calls
- Increased MAX_TOOL_ITERATIONS to 5
Key changes:
- Add URL extraction and detection functions
- Download websites BEFORE RAG retrieval (not after)
- Expand trigger keywords to include common phrases like 'go to', 'headlines', etc.
- Update system prompt to tell LLM it CAN access websites
- Improve streaming response handling
Now when user asks 'go to orovillemr.com and give me the headlines':
1. System detects URL and access intent
2. Downloads and ingests website content
3. RAG retrieves relevant content
4. LLM generates response with actual website content
Features:
- RAG system now uses website_downloader_tool as primary content ingestion method
- download_and_ingest_website() method for complete website processing
- Stores page pointers (source_url, page_url, local_path) in vector store
- Site registry tracks all downloaded websites with metadata
- New API endpoints for website management:
- POST /v1/documents/website - Download and ingest a website
- GET /v1/documents/sites - List all downloaded sites
- GET /v1/documents/sites/{url} - Get site info
- DELETE /v1/documents/sites/{url} - Delete a site and its content
Changes:
- rag/__init__.py: Added download_and_ingest_website(), site registry
- rag/document_processor.py: Added extract_text_from_html() public method
- rag/vector_store.py: Added delete_by_source_url(), get_stats()
- main.py: New website endpoints, integrated tool with RAG system
Features:
- FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models)
- RAG system with document processing and vector storage
- Support for multiple document formats (PDF, DOCX, HTML, text, code)
- Streaming response support
- Tool integration with website_downloader
- Document management API endpoints
- GLM-4.7-Flash integration via z-ai-web-dev-sdk
- Works transparently with Open WebUI and other OpenAI clients
Components:
- main.py: FastAPI application with OpenAI-compatible API
- rag/: RAG system (document processor, vector store, retriever)
- tools/: Tool manager with website_downloader integration
- .env.example: Configuration template