docrag/rag
Z User 6aecc4b231 Integrate website_downloader_tool into RAG system
Features:
- RAG system now uses website_downloader_tool as its primary content ingestion method
- download_and_ingest_website() method downloads a website and ingests its pages in one call
- Stores page pointers (source_url, page_url, local_path) in the vector store
- Site registry tracks all downloaded websites along with their metadata
- New API endpoints for website management:
  - POST /v1/documents/website - Download and ingest a website
  - GET /v1/documents/sites - List all downloaded sites
  - GET /v1/documents/sites/{url} - Get site info
  - DELETE /v1/documents/sites/{url} - Delete a site and its content
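
The `{url}` segment in the site endpoints carries a full website URL, which itself contains `:` and `/`. A minimal sketch of how a client might build these paths follows. It assumes the server expects the site URL percent-encoded as a single path segment, which the commit message does not state. `site_path` and `website_requests` are hypothetical helper names, not part of the repository.

```python
from urllib.parse import quote

API_PREFIX = "/v1/documents"  # prefix taken from the endpoints listed above


def site_path(site_url: str) -> str:
    """Build the path for a single site (hypothetical helper).

    Assumption: the server accepts the site URL percent-encoded as one
    path segment. safe="" makes quote() encode "/" and ":" as well.
    """
    return f"{API_PREFIX}/sites/{quote(site_url, safe='')}"


def website_requests(site_url: str) -> list[tuple[str, str]]:
    """Return (method, path) pairs for the four website endpoints."""
    return [
        ("POST", f"{API_PREFIX}/website"),   # download and ingest a website
        ("GET", f"{API_PREFIX}/sites"),      # list all downloaded sites
        ("GET", site_path(site_url)),        # get info for one site
        ("DELETE", site_path(site_url)),     # delete a site and its content
    ]


if __name__ == "__main__":
    for method, path in website_requests("https://example.com/docs"):
        print(method, path)
```

If the server instead takes the URL as a query parameter or a catch-all path, only `site_path` would need to change.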

Changes:
- rag/__init__.py: Added download_and_ingest_website(), site registry
- rag/document_processor.py: Added extract_text_from_html() public method
- rag/vector_store.py: Added delete_by_source_url(), get_stats()
- main.py: New website endpoints, integrated tool with RAG system
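
To illustrate how page pointers make per-site deletion possible, here is a small in-memory sketch. It is not the code in rag/vector_store.py: `PagePointer` and `InMemoryStore` are hypothetical names, embeddings are omitted, and the return values of `delete_by_source_url()` and `get_stats()` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PagePointer:
    """The three pointer fields named in the commit message."""
    source_url: str   # website the page was downloaded from
    page_url: str     # URL of the individual page
    local_path: str   # where the downloaded copy lives on disk


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector store (no embeddings)."""
    entries: list = field(default_factory=list)

    def add(self, text: str, pointer: PagePointer) -> None:
        self.entries.append((text, pointer))

    def delete_by_source_url(self, source_url: str) -> int:
        """Drop every chunk from one website; return how many were removed."""
        before = len(self.entries)
        self.entries = [
            (text, ptr) for text, ptr in self.entries
            if ptr.source_url != source_url
        ]
        return before - len(self.entries)

    def get_stats(self) -> dict:
        """Assumed shape: total chunk count and number of distinct sites."""
        return {
            "chunks": len(self.entries),
            "sites": len({ptr.source_url for _, ptr in self.entries}),
        }


if __name__ == "__main__":
    store = InMemoryStore()
    site = "https://example.com"
    store.add("intro", PagePointer(site, site + "/a", "/tmp/site/a.html"))
    store.add("guide", PagePointer(site, site + "/b", "/tmp/site/b.html"))
    print(store.get_stats())
    print(store.delete_by_source_url(site))
```

Keeping `source_url` on every stored chunk is what lets a DELETE on a site remove all of its content without consulting a separate index.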
2026-03-29 02:36:59 +00:00
__init__.py Integrate website_downloader_tool into RAG system 2026-03-29 02:36:59 +00:00
document_processor.py Integrate website_downloader_tool into RAG system 2026-03-29 02:36:59 +00:00
retriever.py Implement full DocRAG server with OpenAI-compatible API 2026-03-29 00:57:37 +00:00
vector_store.py Integrate website_downloader_tool into RAG system 2026-03-29 02:36:59 +00:00