Butterfly Desktop Environment — Progress
Overview
A remote desktop environment with a Rust (Actix) backend, Angular 21 frontend, and Rust VM agent. The system mimics a traditional Windows-like desktop in the browser, receiving display/audio from VM agents with minimal lag. Full remote control — viewers can move the mouse, click, type, and scroll on the remote machine in real time.
Low-latency pipeline: H.264 encoding with openh264 (a software encoder), binary WebSocket frames (no JSON/base64 overhead), and WebCodecs GPU-accelerated decoding in the browser. Target: 5-15ms end-to-end latency on a LAN, low enough for gaming.
Architecture
┌─────────────┐   Binary WS frames   ┌───────────────────┐   Binary WS frames   ┌──────────────┐
│ Angular 21  │◄────────────────────►│ Rust Actix Server │◄────────────────────►│ VM Agent exe │
│ (Browser)   │   H.264/JPEG data    │ (dumb pipe)       │   H.264/JPEG data    │ (Rust)       │
│ WebCodecs   │  JSON text only ◄──► │ zero-copy relay   │  JSON text only ◄──► │ openh264     │
│ GPU decode  │  (HUD, heartbeat)    │                   │  (HUD, heartbeat)    │ BGRA→YUV420  │
└─────────────┘                      └───────────────────┘                      └──────────────┘
Wire protocol:
Binary WS frame = [1B type][4B timestamp][4B width][4B height][payload...]
Text WS frame = {"msg_type": "...", ...} (JSON control messages)
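As a rough illustration of the binary frame layout above, the following Rust sketch packs and unpacks the 13-byte header. The byte order (big-endian), the type codes `FRAME_H264` and `FRAME_JPEG`, and the names `FrameHeader`, `encode_frame`, and `decode_frame` are assumptions made for this example; the real definitions live in agent/src/protocol.rs and may differ.

```rust
/// Size of the fixed header: 1 (type) + 4 (timestamp) + 4 (width) + 4 (height).
pub const HEADER_LEN: usize = 13;

/// Hypothetical type codes, chosen only for this example.
pub const FRAME_H264: u8 = 1;
pub const FRAME_JPEG: u8 = 2;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct FrameHeader {
    pub frame_type: u8,
    pub timestamp: u32,
    pub width: u32,
    pub height: u32,
}

/// Serialize header + payload into one binary WebSocket message.
/// Assumes big-endian integers (the byte order is not stated in this document).
pub fn encode_frame(h: &FrameHeader, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(h.frame_type);
    out.extend_from_slice(&h.timestamp.to_be_bytes());
    out.extend_from_slice(&h.width.to_be_bytes());
    out.extend_from_slice(&h.height.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Split a received message into header and payload.
/// Returns None if the message is shorter than the header.
pub fn decode_frame(buf: &[u8]) -> Option<(FrameHeader, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let u32_at = |i: usize| u32::from_be_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
    let header = FrameHeader {
        frame_type: buf[0],
        timestamp: u32_at(1),
        width: u32_at(5),
        height: u32_at(9),
    };
    Some((header, &buf[HEADER_LEN..]))
}
```

Because the header has a fixed size, the receiver can read the payload as a slice of the incoming message without copying or parsing it.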
Checklist
Phase 1: Rust Backend ✅
- Actix HTTP server, REST API, WebSocket handler, frame relay
Phase 2: Angular 21 Frontend ✅
- Windows-like desktop shell, taskbar, start menu, window manager
- Built-in apps: File Explorer, Terminal, Text Editor, Settings, Browser
- Session picker, API/WebSocket services, dark theme
Phase 3: VM Agent Executable ✅
- Screen capture (scrap), input injection (enigo), auto-reconnect
Phase 3.5: Low-Latency Video Pipeline ✅
Agent (H.264 + binary frames)
- agent/src/protocol.rs: Binary frame format (13-byte header: type + timestamp + width + height, followed by the payload)
- agent/src/encoder.rs: H.264 encoder (openh264, optional feature), JPEG fallback, BGRA→I420 conversion
- agent/src/capture.rs: Raw BGRA output (encoding moved to encoder)
- agent/src/config.rs: --encoder h264|jpeg flag, default 60fps
- agent/src/main.rs: Binary WS frames for video, JSON text for control, capture+encode loop
- agent/Cargo.toml: openh264 optional dep, cfg_if, release optimizations (LTO, codegen-units=1)
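The BGRA→I420 step is the part of the agent pipeline that is easiest to get subtly wrong, so here is a minimal sketch of one common way to do it. It assumes BT.601 limited-range integer coefficients, a tightly packed buffer (no row padding), even dimensions, and chroma taken from the average of each 2x2 block; the struct `I420` and the function `bgra_to_i420` are hypothetical names, and the conversion in agent/src/encoder.rs may use a different matrix or sampling.

```rust
/// Planar I420 image: full-resolution Y, quarter-resolution U and V.
pub struct I420 {
    pub y: Vec<u8>,
    pub u: Vec<u8>,
    pub v: Vec<u8>,
}

/// Convert tightly packed BGRA (4 bytes per pixel) to I420 using BT.601
/// limited-range integer coefficients. Width and height must be even.
pub fn bgra_to_i420(bgra: &[u8], width: usize, height: usize) -> I420 {
    assert!(width % 2 == 0 && height % 2 == 0, "dimensions must be even");
    assert_eq!(bgra.len(), width * height * 4, "unexpected buffer size");

    let mut y = vec![0u8; width * height];
    let mut u = vec![0u8; (width / 2) * (height / 2)];
    let mut v = vec![0u8; (width / 2) * (height / 2)];

    // Read one pixel as (r, g, b); the byte order in memory is B, G, R, A.
    let rgb = |px: usize, py: usize| -> (i32, i32, i32) {
        let i = (py * width + px) * 4;
        (bgra[i + 2] as i32, bgra[i + 1] as i32, bgra[i] as i32)
    };

    // Luma: one sample per pixel.
    for py in 0..height {
        for px in 0..width {
            let (r, g, b) = rgb(px, py);
            y[py * width + px] = (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16) as u8;
        }
    }

    // Chroma: one sample per 2x2 block, computed from the block's average colour.
    for cy in 0..height / 2 {
        for cx in 0..width / 2 {
            let (mut r, mut g, mut b) = (0, 0, 0);
            for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)] {
                let (pr, pg, pb) = rgb(cx * 2 + dx, cy * 2 + dy);
                r += pr;
                g += pg;
                b += pb;
            }
            let (r, g, b) = (r / 4, g / 4, b / 4);
            let i = cy * (width / 2) + cx;
            u[i] = (((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128) as u8;
            v[i] = (((112 * r - 94 * g - 18 * b + 128) >> 8) + 128) as u8;
        }
    }
    I420 { y, u, v }
}
```

The output holds 1.5 bytes per pixel instead of 4, which is the planar layout H.264 encoders typically consume.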
Server (zero-copy binary relay)
- server/src/state.rs: Binary FrameBuffer (Vec<Vec<u8>>), WsOutMessage enum (Binary|Text), broadcast_binary_frame
- server/src/ws/handler.rs: Binary frames from agent → broadcast to viewers (zero-copy); text frames for JSON control; viewer catch-up with latest binary frame
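The relay and catch-up behaviour described above can be sketched with the standard library alone. This is not the server's actual code: it uses `std::sync::mpsc` channels in place of Actix WebSocket sessions, the `Relay` type and `add_viewer` method are hypothetical, and sharing frames through `Arc<[u8]>` is an assumption about how per-viewer copies might be avoided. Only the names `WsOutMessage` and `broadcast_binary_frame` come from this document.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Outgoing message kinds, mirroring the Binary|Text split.
/// The payload types are assumptions made for this sketch.
#[derive(Clone, Debug)]
pub enum WsOutMessage {
    Binary(Arc<[u8]>),
    Text(String),
}

/// Hypothetical per-session relay state.
#[derive(Default)]
pub struct Relay {
    viewers: Vec<Sender<WsOutMessage>>,
    latest_frame: Option<Arc<[u8]>>,
}

impl Relay {
    /// Register a viewer and immediately send it the latest frame, if any,
    /// so a late joiner does not wait for the next frame from the agent.
    pub fn add_viewer(&mut self) -> Receiver<WsOutMessage> {
        let (tx, rx) = channel();
        if let Some(frame) = &self.latest_frame {
            let _ = tx.send(WsOutMessage::Binary(Arc::clone(frame)));
        }
        self.viewers.push(tx);
        rx
    }

    /// Forward one agent frame to every viewer without inspecting it.
    /// The bytes move into one shared allocation; each viewer then receives
    /// a reference-counted handle rather than its own copy.
    /// Returns the number of viewers still connected.
    pub fn broadcast_binary_frame(&mut self, frame: Vec<u8>) -> usize {
        let shared: Arc<[u8]> = Arc::from(frame);
        self.latest_frame = Some(Arc::clone(&shared));
        // Drop viewers whose receiving end has gone away.
        self.viewers
            .retain(|tx| tx.send(WsOutMessage::Binary(Arc::clone(&shared))).is_ok());
        self.viewers.len()
    }
}
```

The server never decodes the payload, which is what keeps it a "dumb pipe": the cost per frame is one send per viewer regardless of codec.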
Frontend (WebCodecs H.264 + JPEG fallback)
- WebCodecs VideoDecoder for H.264 GPU-accelerated decoding
- Binary WebSocket frame parsing (13-byte header)
- Annex-B NAL unit parsing, SPS/PPS extraction, AVCC description builder
- Automatic codec detection from SPS (profile/level guessing)
- JPEG fallback when H.264 unavailable
- HUD input forwarding unchanged (JSON text frames)
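The Annex-B parsing, SPS/PPS extraction, and AVCC description steps listed above are byte-level operations that look the same in any language. The frontend implements them in TypeScript; the sketch below uses Rust only to stay consistent with the agent code. All function names are hypothetical, and the record built here is the basic form of the decoder configuration: it assumes one SPS and one PPS, 4-byte NAL length prefixes, and does not write the extra fields defined for High-profile streams.

```rust
/// Split an Annex-B byte stream into NAL units (start codes removed).
/// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
pub fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Record (start_code_position, payload_start) for every 00 00 01 pattern.
    let mut marks = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            marks.push((i, i + 3));
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (k, &(_, start)) in marks.iter().enumerate() {
        let mut end = marks.get(k + 1).map_or(data.len(), |m| m.0);
        // Trailing zeros are the lead-in of a 4-byte start code or padding.
        while end > start && data[end - 1] == 0 {
            end -= 1;
        }
        if end > start {
            nals.push(&data[start..end]);
        }
    }
    nals
}

/// NAL unit type from the low 5 bits of the first byte (7 = SPS, 8 = PPS).
pub fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// Codec string in the "avc1.PPCCLL" form: profile, constraint flags, and
/// level are bytes 1..=3 of the SPS NAL unit, written as hex.
pub fn codec_string(sps: &[u8]) -> Option<String> {
    if sps.len() < 4 {
        return None;
    }
    Some(format!("avc1.{:02X}{:02X}{:02X}", sps[1], sps[2], sps[3]))
}

/// Build a basic AVC decoder configuration record from one SPS and one PPS.
pub fn build_avcc(sps: &[u8], pps: &[u8]) -> Option<Vec<u8>> {
    let max = u16::MAX as usize;
    if sps.len() < 4 || sps.len() > max || pps.is_empty() || pps.len() > max {
        return None;
    }
    // version, profile, compatibility, level, 4-byte lengths, one SPS
    let mut out = vec![1, sps[1], sps[2], sps[3], 0xFF, 0xE1];
    out.extend_from_slice(&(sps.len() as u16).to_be_bytes());
    out.extend_from_slice(sps);
    out.push(1); // one PPS
    out.extend_from_slice(&(pps.len() as u16).to_be_bytes());
    out.extend_from_slice(pps);
    Some(out)
}
```

One design point to keep in mind: when a description is supplied to a WebCodecs decoder, the video chunks are expected in length-prefixed form rather than with Annex-B start codes, so the prefix size encoded in the record must match how the NAL units are repackaged.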
Latency Comparison
| Stage | Old (JPEG+JSON) | New (H.264+Binary) |
|---|---|---|
| Encode | 15-30ms (CPU) | 1-5ms (openh264) |
| Frame size | 200-500KB | 10-50KB |
| Network | 2-5ms | 0.5-1ms |
| Decode | 3-5ms | 1-2ms (GPU) |
| Total | 25-45ms | ~5-15ms |
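The frame-size row also implies a large difference in sustained bandwidth. The small Rust sketch below works that out, assuming every frame falls inside the quoted size range, 1 KB = 1000 bytes, and the agent's default of 60fps; the helper names are hypothetical. Real H.264 streams mix large keyframes with much smaller delta frames, so treat the results as rough bounds rather than measurements.

```rust
/// Sustained bandwidth in megabits per second for a given frame size and rate.
/// Assumes 1 KB = 1000 bytes and that every frame has the same size.
pub fn mbit_per_sec(frame_kb: f64, fps: f64) -> f64 {
    frame_kb * 1000.0 * 8.0 * fps / 1_000_000.0
}

/// Time available per frame, in milliseconds, at a given frame rate.
pub fn frame_interval_ms(fps: f64) -> f64 {
    1000.0 / fps
}
```

With these assumptions, `mbit_per_sec(200.0, 60.0)` gives 96 and `mbit_per_sec(500.0, 60.0)` gives 240 for the old JPEG path, against 4.8 to 24 for 10-50KB H.264 frames. `frame_interval_ms(60.0)` is roughly 16.7, so the ~5-15ms total fits within one frame interval at 60fps.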
Recent Commits
- 60b23bc fix: Uint8Array to Blob cast for TS compatibility
- 63e4513 frontend: WebCodes H.264 decoder, binary WS frames, AVCC description builder
- 05cfe9e server: binary frame relay (zero-copy), text JSON for control
- 31a862b server: binary FrameBuffer, WsOutMessage enum
- 081cb0d agent: Cargo.toml v0.2.0 — openh264 optional feature
- 86f0e4e agent: main.rs — binary WS frames, encoder pipeline
- b7c254a agent: encoder.rs — H.264 + JPEG encoder abstraction
- cf617d0 agent: capture.rs — raw BGRA output
- b690b07 agent: config.rs — --encoder h264|jpeg flag
- a97ebed agent: protocol.rs — binary video frame format
- 1468097 docs: Phase 3 VM Agent complete
- e1e6442 agent: input.rs — full remote control