# Butterfly Desktop Environment — Progress

## Overview

A remote desktop environment with a Rust (Actix) backend, an Angular 21 frontend, and a Rust VM agent. The system mimics a traditional Windows-like desktop in the browser, receiving display and audio from VM agents with minimal lag.

**Full remote control** — viewers can move the mouse, click, type, and scroll on the remote machine in real time.

**Low-latency pipeline**: H.264 encoding via openh264, binary WebSocket frames (no JSON/base64 overhead), and WebCodecs GPU-accelerated decoding in the browser. Target: **5-15ms end-to-end latency on LAN** for gaming.

## Architecture

```
┌─────────────┐   Binary WS frames   ┌──────────────────┐   Binary WS frames   ┌──────────────┐
│ Angular 21  │◄────────────────────►│ Rust Actix Server│◄────────────────────►│ VM Agent exe │
│ (Browser)   │   H.264/JPEG data    │ (dumb pipe)      │   H.264/JPEG data    │ (Rust)       │
│ WebCodecs   │   JSON text only ◄──►│ zero-copy relay  │   JSON text only ◄──►│ openh264     │
│ GPU decode  │   (HUD, heartbeat)   │                  │   (HUD, heartbeat)   │ BGRA→YUV420  │
└─────────────┘                      └──────────────────┘                      └──────────────┘

Wire protocol:
  Binary WS frame = [1B type][4B timestamp][4B width][4B height][payload...]
  Text WS frame   = {"msg_type": "...", ...}  (JSON control messages)
```

## Checklist

### Phase 1: Rust Backend ✅

- [x] Actix HTTP server, REST API, WebSocket handler, frame relay

### Phase 2: Angular 21 Frontend ✅

- [x] Windows-like desktop shell, taskbar, start menu, window manager
- [x] Built-in apps: File Explorer, Terminal, Text Editor, Settings, Browser
- [x] Session picker, API/WebSocket services, dark theme

### Phase 3: VM Agent Executable ✅

- [x] Screen capture (scrap), input injection (enigo), auto-reconnect

### Phase 3.5: Low-Latency Video Pipeline ✅

#### Agent (H.264 + binary frames)

- [x] `agent/src/protocol.rs` — Binary frame format (13-byte header: type + timestamp + width + height, followed by the payload)
- [x] `agent/src/encoder.rs` — H.264 encoder (openh264, optional feature), JPEG fallback, BGRA→I420 conversion
- [x] `agent/src/capture.rs` — Raw BGRA output (encoding moved to encoder)
- [x] `agent/src/config.rs` — `--encoder h264|jpeg` flag, default 60fps
- [x] `agent/src/main.rs` — Binary WS frames for video, JSON text for control, capture+encode loop
- [x] `agent/Cargo.toml` — openh264 optional dep, cfg_if, release optimizations (LTO, codegen-units=1)

#### Server (zero-copy binary relay)

- [x] `server/src/state.rs` — Binary FrameBuffer (Vec>), WsOutMessage enum (Binary|Text), broadcast_binary_frame
- [x] `server/src/ws/handler.rs` — Binary frames from agent → broadcast to viewers (zero-copy); text frames for JSON control; viewer catch-up with latest binary frame

#### Frontend (WebCodecs H.264 + JPEG fallback)

- [x] WebCodecs VideoDecoder for H.264 GPU-accelerated decoding
- [x] Binary WebSocket frame parsing (13-byte header)
- [x] Annex-B NAL unit parsing, SPS/PPS extraction, AVCC description builder
- [x] Automatic codec detection from SPS (profile/level guessing)
- [x] JPEG fallback when H.264 unavailable
- [x] HUD input forwarding unchanged (JSON text frames)

#### Latency Comparison

| Stage | Old (JPEG+JSON) | New (H.264+Binary) |
|-------|-----------------|--------------------|
| Encode | 15-30ms (CPU) | 1-5ms (openh264) |
| Frame size | 200-500KB | 10-50KB |
| Network | 2-5ms | 0.5-1ms |
| Decode | 3-5ms | 1-2ms (GPU) |
| **Total** | **25-45ms** | **~5-15ms** |

## Recent Commits

- `60b23bc` fix: Uint8Array to Blob cast for TS compatibility
- `63e4513` frontend: WebCodes H.264 decoder, binary WS frames, AVCC description builder
- `05cfe9e` server: binary frame relay (zero-copy), text JSON for control
- `31a862b` server: binary FrameBuffer, WsOutMessage enum
- `081cb0d` agent: Cargo.toml v0.2.0 — openh264 optional feature
- `86f0e4e` agent: main.rs — binary WS frames, encoder pipeline
- `b7c254a` agent: encoder.rs — H.264 + JPEG encoder abstraction
- `cf617d0` agent: capture.rs — raw BGRA output
- `b690b07` agent: config.rs — --encoder h264|jpeg flag
- `a97ebed` agent: protocol.rs — binary video frame format
- `1468097` docs: Phase 3 VM Agent complete
- `e1e6442` agent: input.rs — full remote control
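
## Appendix: Binary Frame Header Sketch

The wire protocol in the Architecture section gives the binary frame layout as `[1B type][4B timestamp][4B width][4B height][payload...]`, which adds up to the 13-byte header mentioned in the checklist. The sketch below shows one way that layout could be packed and parsed in Rust. It is not taken from `agent/src/protocol.rs`: the `FrameHeader` struct, the `encode_frame`/`decode_frame` functions, and the `FRAME_TYPE_*` values are hypothetical names chosen for illustration, and little-endian byte order is an assumption, since this document does not state the byte order, the type codes, or the timestamp unit.

```rust
/// Fixed header size: 1 (type) + 4 (timestamp) + 4 (width) + 4 (height).
const HEADER_LEN: usize = 13;

// Hypothetical type codes; the real values are not given in this document.
const FRAME_TYPE_H264: u8 = 1;
const FRAME_TYPE_JPEG: u8 = 2;

#[derive(Debug, Clone, PartialEq)]
struct FrameHeader {
    frame_type: u8,
    timestamp: u32, // unit is not specified in the source document
    width: u32,
    height: u32,
}

/// Packs header and payload into one buffer suitable for a binary WS frame.
/// Assumption: multi-byte fields are little-endian.
fn encode_frame(header: &FrameHeader, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(HEADER_LEN + payload.len());
    buf.push(header.frame_type);
    buf.extend_from_slice(&header.timestamp.to_le_bytes());
    buf.extend_from_slice(&header.width.to_le_bytes());
    buf.extend_from_slice(&header.height.to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Splits a received binary frame into header and payload.
/// Returns None when the buffer is shorter than the 13-byte header.
fn decode_frame(buf: &[u8]) -> Option<(FrameHeader, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    // Reads a little-endian u32 starting at byte offset `i`.
    let u32_at = |i: usize| u32::from_le_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
    let header = FrameHeader {
        frame_type: buf[0],
        timestamp: u32_at(1),
        width: u32_at(5),
        height: u32_at(9),
    };
    // The payload is borrowed from the input, so no copy is made here.
    Some((header, &buf[HEADER_LEN..]))
}
```

Because `decode_frame` borrows the payload instead of copying it, a relay that only needs to forward frames can pass the original buffer along untouched and inspect the header only when it has to, which fits the "dumb pipe" role the server plays in the diagram.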