Commit Graph

17 Commits

Author SHA1 Message Date
Z User
7a6b6f1086 Add local tool selector: keyword parser picks relevant tools, no LLM
_select_tools() parses the user message with keyword matching:
- News keywords → news_aggregate, news_get_top_stories, news_get_reddit
- Finance/stock keywords → finance_get_stock_info/history (extracts ticker)
- Crypto keywords → finance_get_crypto_price (extracts coin name), finance_get_top_cryptos
- Weather keywords → weather_get_current/forecast/air_quality (extracts location)
- Medical keywords → pubmed, fda, disease data, health topics
- Science keywords → science_aggregate_search
- Wikipedia keywords → wikipedia_search
- Always: web_search + web_instant_answer as general fallback
- URL in message → web_get_page_content

Entity extractors:
- _extract_ticker: maps known company names, handles $TICKER format
- _extract_crypto: maps known crypto names to CoinGecko IDs
- _extract_location: preposition-based + known locations (prefers longest match)
- _extract_subject: strips question patterns, leading articles, trailing punctuation

Flow remains: request → select tools → run in parallel → results into system prompt → 1 LLM call
2026-03-29 18:44:14 +00:00
Z User
70109d6889 Rewrite: firehose all tools in parallel, then single LLM call
No LLM needed for tool selection. Flow is now:
  Request → run ALL tools in parallel → results into system prompt → 1 LLM call

- _run_all_tools: fires every tool concurrently (30s timeout each)
  - No required args: run with schema defaults
  - Query-like required args (query, topic, title, etc): use user message
  - Specific args (symbol, url, pmid): skip (can't guess)
- _build_tool_results_text: formats all results into system prompt
- build_enhanced_messages: system prompt now has real-time data section
- call_llm: dead simple, just prompt → response (replaces generate_response)
- Removed: generate_response, _parse_tool_calls, _clean_tool_syntax,
  _build_tool_descriptions (all dead code now)
- Streaming path: same flow, runs tools then streams the LLM response
- Both streaming and non-streaming use identical tool pipeline
2026-03-29 18:36:37 +00:00
Z User
8a46a78a4e Fix: add robust parsing, logging, and safety net for empty responses
Three fixes for the 'I apologize, couldnt generate a response' bug:

1. Safety net: if _clean_tool_syntax strips ALL content (e.g. the LLM
   output only the JSON tool call block and nothing else), return the
   original content instead of the useless error message.

2. Detailed logging: now logs the first 300 chars of every LLM response
   so we can see exactly what the model outputs. Also logs which parse
   pattern matched and which tool names were found.

3. Desperate fallback parser (Pattern 4): if none of the regex/brace
   patterns match, tries to json.loads() the entire content and looks
   for known tool names. Catches LLMs that output the array directly
   or use slightly different formatting.
2026-03-29 18:11:43 +00:00
Z User
a2285d3a48 Switch to mega-tool-call approach for unlimited tool calls
The upstream LLM only supports 2 native tool calls per response, but
the user needs to fire many tools at once. Solution: content-based
'mega tool call' where the LLM bundles ALL tool calls into a single
JSON array in its response text.

Key changes:
- System prompt: tells LLM to output {tool_calls: [...]} array
  with ALL needed tools in one block (no native tools param)
- _parse_tool_calls: parses the tool_calls array format (with legacy
  tool_call single-object fallback)
- generate_response: NO tools/tool_choice params to API, pure
  content-based parsing
- generate_response: executes ALL tools concurrently via asyncio.gather
- generate_response: feeds ALL results back in one consolidated message
- _clean_tool_syntax: strips both tool_calls and tool_call blocks
2026-03-29 18:06:39 +00:00
Z User
57228625fc Fix tool calling: switch to native OpenAI tools parameter
Problems fixed:
- 'Mega tool call': LLM outputting multiple tool calls that got bundled
  into one. Now uses native OpenAI tools parameter which handles multiple
  tool calls properly via message.tool_calls array.
- 'Returning nothing': _clean_tool_syntax was too aggressive, stripping
  the entire response. Now only strips code-fence-wrapped blocks.
- Tool results were appended to system message growing it unboundedly;
  now uses proper 'tool' role messages in conversation history.

Key changes:
- generate_response: passes tools/tool_choice to OpenAI API (native
  tool calling), with retry without tool_choice for unsupported models
- generate_response: handles multiple tool_calls per response natively
- generate_response: uses proper 'tool' role for results instead of
  appending to system message
- _parse_tool_calls (was _parse_tool_call): now returns a list, supports
  multiple tool calls, used as fallback for models without native tools
- _clean_tool_syntax: much less aggressive, only strips code-fence
  blocks, no longer removes bare JSON (was eating valid responses)
- System prompt: removed JSON format instructions (native tools handles
  format), simplified rules
2026-03-29 17:57:26 +00:00
Z User
c03bde8023 Fix tool call parsing, improve embeddings, and fix async issues
- main.py: Rewrote _parse_tool_call with brace-counting for robust JSON extraction
- main.py: Improved _clean_tool_syntax with brace-aware removal of tool_call JSON
- main.py: Fixed dict key mismatches (chunks_ingested, pages_downloaded)
- main.py: Run tool execution in asyncio.to_thread to avoid blocking event loop
- main.py: Always clean tool syntax from responses (handles edge cases)
- rag/__init__.py: Wrap blocking website_downloader in run_in_executor
- rag/__init__.py: Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
- rag/__init__.py: Add add_document_from_url method
- rag/vector_store.py: Replace hash-based embeddings with TF-IDF inspired embeddings
- rag/vector_store.py: Add embedding dimension mismatch handling in search
- README.md: Update API key config documentation
2026-03-29 17:49:32 +00:00
Z User
6eb18ce7f3 Switch to context-based tool calling (no API tool limit)
Instead of passing tools to the OpenRouter API (limited to 10 tools):
- Tool descriptions are now embedded in the system prompt
- LLM outputs tool calls as JSON: {"tool_call": {"name": "...", "arguments": {...}}}
- We parse the response, execute tools, and feed results back
- Supports all 33 tools without hitting the API limit

Changes:
- Added _build_tool_descriptions() for tool docs in prompt
- Added _parse_tool_call() to extract tool requests from LLM output
- Added _clean_tool_syntax() to remove tool JSON from responses
- Rewrote generate_response() for context-based approach
- Updated system prompt with tool usage instructions
2026-03-29 17:02:02 +00:00
Z User
ac0eff1cdd Fix: Prevent website re-downloads and skip automated tasks
- Skip website download for Open WebUI automated tasks (title, tags, follow-ups)
- Check if site already downloaded before re-downloading
- Return cached site info if previously downloaded
- Reduces unnecessary network calls and processing time
2026-03-29 16:54:38 +00:00
Z User
d966f8ea5d Add detailed logging for debugging tool calling issues
- Log full LLM response object
- Log message content and tool calls
- Log request start/end with request_id
- Add traceback logging for errors
2026-03-29 16:25:44 +00:00
Z User
b811162f78 Implement tool calling loop for LLM
- Pass all registered tools to LLM during chat completion
- Handle tool_calls from LLM response
- Execute tools and feed results back to LLM
- Loop until LLM returns final response
- Updated system prompt to encourage tool use
- Updated streaming to handle tool calls
- Increased MAX_TOOL_ITERATIONS to 5
2026-03-29 16:07:56 +00:00
Z User
973bf5ab88 Fix AsyncOpenAI proxy compatibility issue
- Create custom httpx.AsyncClient to avoid proxy argument error
- This fixes 'AsyncClient.__init__() got an unexpected keyword argument proxies'
2026-03-29 04:51:23 +00:00
Z User
5ec2ef5911 Fix .env loading and add debug logging for API key
- Load .env from script directory explicitly
- Add logging to show .env file location and existence
- Show API key preview on startup for debugging
2026-03-29 04:47:54 +00:00
Z User
b23964b35a Switch from ZAI SDK to OpenRouter with openrouter/free model
- Replace z-ai-web-dev-sdk with openai SDK
- Add OPENROUTER_API_KEY and OPENROUTER_BASE_URL config
- Update AsyncOpenAI client for OpenRouter
- Update generate_response and stream_chat_completion
- Update .env.example with OpenRouter settings
2026-03-29 04:35:54 +00:00
Z User
10e61dd2f1 Fix: Auto-download websites BEFORE RAG retrieval
Key changes:
- Add URL extraction and detection functions
- Download websites BEFORE RAG retrieval (not after)
- Expand trigger keywords to include common phrases like 'go to', 'headlines', etc.
- Update system prompt to tell LLM it CAN access websites
- Improve streaming response handling

Now when user asks 'go to orovillemr.com and give me the headlines':
1. System detects URL and access intent
2. Downloads and ingests website content
3. RAG retrieves relevant content
4. LLM generates response with actual website content
2026-03-29 03:58:39 +00:00
Z User
6aecc4b231 Integrate website_downloader_tool into RAG system
Features:
- RAG system now uses website_downloader_tool as primary content ingestion method
- download_and_ingest_website() method for complete website processing
- Stores page pointers (source_url, page_url, local_path) in vector store
- Site registry tracks all downloaded websites with metadata
- New API endpoints for website management:
  - POST /v1/documents/website - Download and ingest a website
  - GET /v1/documents/sites - List all downloaded sites
  - GET /v1/documents/sites/{url} - Get site info
  - DELETE /v1/documents/sites/{url} - Delete a site and its content

Changes:
- rag/__init__.py: Added download_and_ingest_website(), site registry
- rag/document_processor.py: Added extract_text_from_html() public method
- rag/vector_store.py: Added delete_by_source_url(), get_stats()
- main.py: New website endpoints, integrated tool with RAG system
2026-03-29 02:36:59 +00:00
Z User
eabdadfb62 Implement full DocRAG server with OpenAI-compatible API
Features:
- FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models)
- RAG system with document processing and vector storage
- Support for multiple document formats (PDF, DOCX, HTML, text, code)
- Streaming response support
- Tool integration with website_downloader
- Document management API endpoints
- GLM-4.7-Flash integration via z-ai-web-dev-sdk
- Works transparently with Open WebUI and other OpenAI clients

Components:
- main.py: FastAPI application with OpenAI-compatible API
- rag/: RAG system (document processor, vector store, retriever)
- tools/: Tool manager with website_downloader integration
- .env.example: Configuration template
2026-03-29 00:57:37 +00:00
e3681949e2 add main 2026-03-28 17:46:13 -07:00