From 1468097c1b6be2c87e50c5a27a3aa2d991e1b345 Mon Sep 17 00:00:00 2001
From: Butterfly Dev
Date: Tue, 7 Apr 2026 04:39:10 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20update=20progress.md=20=E2=80=94=20Phas?=
 =?UTF-8?q?e=203=20VM=20Agent=20complete=20with=20full=20remote=20control?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 progress.md | 64 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/progress.md b/progress.md
index 14b2e6f..018bc86 100644
--- a/progress.md
+++ b/progress.md
@@ -1,14 +1,17 @@
 # Butterfly Desktop Environment — Progress
 
 ## Overview
-A remote desktop environment with a Rust (Actix) backend and Angular 21 frontend. The system mimics a traditional Windows-like desktop in the browser, receiving display/audio from VM agents with minimal lag.
+A remote desktop environment with a Rust (Actix) backend, Angular 21 frontend, and Rust VM agent. The system mimics a traditional Windows-like desktop in the browser, receiving display/audio from VM agents with minimal lag. **Full remote control** — viewers can move the mouse, click, type, and scroll on the remote machine in real time.
 
 ## Architecture
 ```
 ┌─────────────┐     WebSocket      ┌──────────────────┐     WebSocket      ┌─────────────┐
 │ Angular 21  │◄──────────────────►│ Rust Actix Server│◄──────────────────►│ VM Agent exe│
-│ (Browser)   │   display/audio    │ (REST + WS)      │   display/audio    │ (Rust)      │
-│             │   HUD commands     │                  │   HUD commands     │             │
+│ (Browser)   │   display frames   │ (REST + WS)      │   display frames   │ (Rust)      │
+│ Viewer      │   HUD commands ▸   │ relay hub        │   HUD commands ▸   │ captures    │
+│ controls    │                    │                  │                    │ screen +    │
+│ remote      │                    │                  │                    │ injects     │
+│ desktop     │                    │                  │                    │ input       │
 └─────────────┘                    └──────────────────┘                    └─────────────┘
 ```
 
@@ -61,34 +64,47 @@ A remote desktop environment with a Rust (Actix) backend and Angular 21 frontend
 - [x] Removed unused imports (RouterModule, ViewChild, etc.)
 - [x] HealthInfo interface updated with connected_viewers
 
-### Phase 3: VM Agent Executable 🔲 (next)
-- [ ] Rust desktop agent that captures display and audio
-- [ ] Streams display frames (JPEG/PNG) and audio (Opus/PCM) via WebSocket
-- [ ] Receives HUD commands (mouse, keyboard, resize)
-- [ ] Low-latency design for LAN usage
+### Phase 3: VM Agent Executable ✅
+- [x] `agent/Cargo.toml` — Dependencies: scrap, enigo, tokio-tungstenite, image, base64, clap, reqwest, serde
+- [x] `agent/src/protocol.rs` — AgentWsMessage enum matching server WsMessage, builder helpers
+- [x] `agent/src/config.rs` — CLI args: --server, --session, --fps, --quality, --display, --audio, --heartbeat, --reconnect
+- [x] `agent/src/capture.rs` — Screen capture via `scrap` (DXGI/X11/CoreGraphics), BGRA→RGB, JPEG encoding, base64, frame stats
+- [x] `agent/src/input.rs` — **Full remote control**: mouse move/click/dblclick/scroll, keyboard with 60+ key mappings (browser code→enigo Key), modifier handling, key_type for strings
+- [x] `agent/src/main.rs` — Entry point: auto session creation via REST, WebSocket connect, tokio::select! loop (capture + receive + heartbeat), auto-reconnect, graceful shutdown
 
-### Phase 4: Integration & Polish 🔲
+#### Remote Control Commands Supported
+| Command | Description | Params |
+|---------|-------------|--------|
+| `mouse_move` | Move cursor | `x`, `y` |
+| `mouse_down` | Press button | `button` (0=left, 1=mid, 2=right) |
+| `mouse_up` | Release button | `button` |
+| `mouse_click` | Click button | `button` |
+| `mouse_dblclick` | Double-click | `button` |
+| `scroll` | Scroll wheel | `deltaX`, `deltaY` |
+| `key_down` | Press key | `key`, `code`, `ctrl`, `shift`, `alt`, `meta` |
+| `key_up` | Release key | `key`, `code`, `ctrl`, `shift`, `alt`, `meta` |
+| `key_click` | Type key | `key`, `code` |
+| `key_type` | Type string | `text` |
+
+### Phase 4: Integration & Polish 🔲 (next)
 - [ ] End-to-end testing (agent → server → browser)
-- [ ] Session management UI
-- [ ] Multi-session support (frontend done, backend ready)
-- [ ] Authentication (JWT)
-- [ ] Performance optimization
-- [ ] Audio playback (frontend plumbing exists, needs Web Audio integration)
+- [ ] Audio capture and playback (cpal + Web Audio API)
+- [ ] Authentication (JWT / API keys)
+- [ ] Performance optimization (binary WS frames, delta encoding)
 - [ ] Start menu search filtering
 - [ ] Window snap/edge-docking
+- [ ] Touch support for mobile viewers
+- [ ] Clipboard forwarding
+- [ ] Multi-monitor support
 
 ## Recent Commits
+- `0961634` agent: main.rs — entry point, WS client, capture loop, input dispatch, auto-reconnect
+- `e1e6442` agent: input.rs — full remote control with 60+ key mappings
+- `4c93b47` agent: capture.rs — screen capture, BGRA→RGB, JPEG encoding, base64
+- `5a26c7c` agent: config.rs — CLI args and configuration
+- `56f6e88` agent: protocol.rs — WsMessage types matching server
+- `50e5df0` agent: Cargo.toml — project dependencies
 - `dd70696` api: add connected_viewers to HealthInfo interface
-- `fb2323f` fix: remove signal type annotation for safeUrl (TS2749)
-- `6f07a1b` fix: move sanitizer call to constructor (was used before init)
-- `cecca9f` fix: browser uses DomSanitizer for iframe, key-based reload
-- `ef1b8df` fix: remote display now uses per-instance WebSocket
-- `b7d6c95` fix: terminal now auto-scrolls on output
-- `f3da41d` fix: taskbar clock now updates every second
-- `d096d20` cleanup: remove unused currentTime computed, RouterModule
-- `5ea4cdf` cargo: remove unused rand and tokio-stream dependencies
-- `4bc5e09` api: wire up real agent channel for HUD commands
 - `2344060` ws/handler: implement real bidirectional relay
 - `29eda76` state: add viewer/agent channel registries
 - `dcfaceb` desktop: production build works (328KB, 85KB gzip)
-- `bae17ae` server: fix pong bytes — backend compiles
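
Reviewer note: the "Remote Control Commands Supported" table added by this patch could be modelled on the agent side roughly as below. This is a minimal, std-only sketch and not the contents of `agent/src/input.rs`. The names `RemoteCommand`, `MouseButton`, `Modifiers`, and `button_from_code` are hypothetical, and the field types (`i32` coordinates, `f64` scroll deltas) are assumptions. Only the command names, parameters, and the button codes (0=left, 1=mid, 2=right) come from the table.

```rust
// Hypothetical types for illustration only; the real agent code may differ.

/// Mouse buttons, keyed by the numeric codes listed in the command table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MouseButton {
    Left,
    Middle,
    Right,
}

/// Maps a numeric button code (0 = left, 1 = middle, 2 = right) to a button.
/// Unknown codes yield `None` so the caller can ignore the command.
pub fn button_from_code(code: u8) -> Option<MouseButton> {
    match code {
        0 => Some(MouseButton::Left),
        1 => Some(MouseButton::Middle),
        2 => Some(MouseButton::Right),
        _ => None,
    }
}

/// Modifier flags carried by `key_down` and `key_up`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Modifiers {
    pub ctrl: bool,
    pub shift: bool,
    pub alt: bool,
    pub meta: bool,
}

/// One variant per row of the command table.
/// Coordinate and delta types are assumptions, not taken from the patch.
#[derive(Debug, Clone, PartialEq)]
pub enum RemoteCommand {
    MouseMove { x: i32, y: i32 },
    MouseDown { button: MouseButton },
    MouseUp { button: MouseButton },
    MouseClick { button: MouseButton },
    MouseDblClick { button: MouseButton },
    Scroll { delta_x: f64, delta_y: f64 },
    KeyDown { key: String, code: String, mods: Modifiers },
    KeyUp { key: String, code: String, mods: Modifiers },
    KeyClick { key: String, code: String },
    KeyType { text: String },
}

impl RemoteCommand {
    /// Returns the command name exactly as it appears in the table.
    pub fn name(&self) -> &'static str {
        match self {
            RemoteCommand::MouseMove { .. } => "mouse_move",
            RemoteCommand::MouseDown { .. } => "mouse_down",
            RemoteCommand::MouseUp { .. } => "mouse_up",
            RemoteCommand::MouseClick { .. } => "mouse_click",
            RemoteCommand::MouseDblClick { .. } => "mouse_dblclick",
            RemoteCommand::Scroll { .. } => "scroll",
            RemoteCommand::KeyDown { .. } => "key_down",
            RemoteCommand::KeyUp { .. } => "key_up",
            RemoteCommand::KeyClick { .. } => "key_click",
            RemoteCommand::KeyType { .. } => "key_type",
        }
    }
}
```

Rejecting unknown button codes with `None`, rather than defaulting to the left button, keeps a malformed viewer message from turning into an unintended click on the remote machine.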