Butterfly Desktop Environment — Progress

Overview

A remote desktop environment with a Rust (Actix) backend, an Angular 21 frontend, and a Rust VM agent. The system presents a traditional Windows-like desktop in the browser and receives display and audio streams from VM agents with minimal lag. Viewers have full remote control: they can move the mouse, click, type, and scroll on the remote machine in real time.

Low-latency pipeline: H.264 encoding with openh264 (a software encoder that runs on the CPU), binary WebSocket frames (no JSON/base64 overhead), and WebCodecs GPU-accelerated decoding in the browser. Target: 5-15 ms end-to-end latency on a LAN, low enough for gaming.

Architecture

┌─────────────┐   Binary WS frames   ┌───────────────────┐   Binary WS frames   ┌──────────────┐
│  Angular 21 │◄────────────────────►│  Rust Actix Server│◄────────────────────►│ VM Agent exe │
│  (Browser)  │   H.264/JPEG data    │  (dumb pipe)      │   H.264/JPEG data    │  (Rust)      │
│  WebCodecs  │   JSON text only ◄──►│  zero-copy relay  │   JSON text only ◄──►│  openh264    │
│  GPU decode │   (HUD, heartbeat)   │                   │   (HUD, heartbeat)   │  BGRA→YUV420 │
└─────────────┘                      └───────────────────┘                      └──────────────┘

Wire protocol:
  Binary WS frame = [1B type][4B timestamp][4B width][4B height][payload...]
  Text WS frame   = {"msg_type": "...", ...}  (JSON control messages)
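A minimal sketch of how the 13-byte binary header could be packed and unpacked. The document gives the field sizes but not the byte order or the meaning of the type byte, so big-endian integers are an assumption here, and `FrameHeader`, `encode_frame`, and `decode_frame` are hypothetical names rather than the contents of agent/src/protocol.rs.

```rust
/// Fixed header size: 1 (type) + 4 (timestamp) + 4 (width) + 4 (height).
pub const HEADER_LEN: usize = 13;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameHeader {
    /// Codec tag; the actual values are not specified in this document.
    pub frame_type: u8,
    pub timestamp: u32,
    pub width: u32,
    pub height: u32,
}

/// Serialize header and payload into one binary WebSocket message.
/// Assumption: multi-byte fields are big-endian.
pub fn encode_frame(header: &FrameHeader, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(header.frame_type);
    out.extend_from_slice(&header.timestamp.to_be_bytes());
    out.extend_from_slice(&header.width.to_be_bytes());
    out.extend_from_slice(&header.height.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Split a received message into header and payload.
/// Returns None when the message is shorter than the header.
pub fn decode_frame(msg: &[u8]) -> Option<(FrameHeader, &[u8])> {
    if msg.len() < HEADER_LEN {
        return None;
    }
    let be_u32 = |at: usize| u32::from_be_bytes([msg[at], msg[at + 1], msg[at + 2], msg[at + 3]]);
    let header = FrameHeader {
        frame_type: msg[0],
        timestamp: be_u32(1),
        width: be_u32(5),
        height: be_u32(9),
    };
    Some((header, &msg[HEADER_LEN..]))
}
```

The payload is returned as a borrowed slice of the original message, so a relay can inspect the header without copying the video data.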

Checklist

Phase 1: Rust Backend

  • Actix HTTP server, REST API, WebSocket handler, frame relay

Phase 2: Angular 21 Frontend

  • Windows-like desktop shell, taskbar, start menu, window manager
  • Built-in apps: File Explorer, Terminal, Text Editor, Settings, Browser
  • Session picker, API/WebSocket services, dark theme

Phase 3: VM Agent Executable

  • Screen capture (scrap), input injection (enigo), auto-reconnect

Phase 3.5: Low-Latency Video Pipeline

Agent (H.264 + binary frames)

  • agent/src/protocol.rs — Binary frame format (13-byte header: type + timestamp + width + height, followed by the payload)
  • agent/src/encoder.rs — H.264 encoder (openh264, optional feature), JPEG fallback, BGRA→I420 conversion
  • agent/src/capture.rs — Raw BGRA output (encoding moved to encoder)
  • agent/src/config.rs — --encoder h264|jpeg flag, default 60fps
  • agent/src/main.rs — Binary WS frames for video, JSON text for control, capture+encode loop
  • agent/Cargo.toml — openh264 optional dep, cfg_if, release optimizations (LTO, codegen-units=1)
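The BGRA→I420 step listed above can be sketched as follows. This is not the code in agent/src/encoder.rs: it assumes tightly packed rows (no stride padding), even dimensions, and the widely used BT.601 limited-range integer approximation, and `I420` and `bgra_to_i420` are hypothetical names.

```rust
/// Planar YUV 4:2:0 image: full-resolution luma, quarter-resolution chroma.
pub struct I420 {
    pub y: Vec<u8>,
    pub u: Vec<u8>,
    pub v: Vec<u8>,
}

/// Convert tightly packed BGRA (4 bytes per pixel) to I420.
/// Returns None for zero or odd dimensions, or a buffer of the wrong length.
pub fn bgra_to_i420(bgra: &[u8], width: usize, height: usize) -> Option<I420> {
    if width == 0 || height == 0 || width % 2 != 0 || height % 2 != 0 {
        return None;
    }
    if bgra.len() != width * height * 4 {
        return None;
    }
    let mut y = vec![0u8; width * height];
    let mut u = vec![0u8; width * height / 4];
    let mut v = vec![0u8; width * height / 4];

    // Luma: one sample per pixel (BT.601, limited range 16..=235).
    for (i, px) in bgra.chunks_exact(4).enumerate() {
        let (b, g, r) = (px[0] as i32, px[1] as i32, px[2] as i32);
        y[i] = (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16) as u8;
    }

    // Chroma: one sample per 2x2 block, computed from the block's mean colour.
    let cw = width / 2;
    for cy in 0..height / 2 {
        for cx in 0..cw {
            let (mut b, mut g, mut r) = (0i32, 0i32, 0i32);
            for (dy, dx) in [(0, 0), (0, 1), (1, 0), (1, 1)] {
                let p = ((cy * 2 + dy) * width + cx * 2 + dx) * 4;
                b += bgra[p] as i32;
                g += bgra[p + 1] as i32;
                r += bgra[p + 2] as i32;
            }
            let (b, g, r) = (b / 4, g / 4, r / 4);
            u[cy * cw + cx] = (((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128) as u8;
            v[cy * cw + cx] = (((112 * r - 94 * g - 18 * b + 128) >> 8) + 128) as u8;
        }
    }
    Some(I420 { y, u, v })
}
```

The alpha byte is ignored, and the output holds 1.5 bytes per pixel instead of 4, which is the layout an H.264 encoder typically expects as input.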

Server (zero-copy binary relay)

  • server/src/state.rs — Binary FrameBuffer (Vec<Vec<u8>>), WsOutMessage enum (Binary|Text), broadcast_binary_frame
  • server/src/ws/handler.rs — Binary frames from agent → broadcast to viewers (zero-copy); text frames for JSON control; viewer catch-up with latest binary frame
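The relay and catch-up behaviour described above can be illustrated with the standard library alone. This sketch is an assumption about the shape of the idea, not the Actix code in server/src: it uses `std::sync::mpsc` channels in place of WebSocket sessions and `Arc<[u8]>` for sharing, and `Session`, `add_viewer`, and `viewer_count` are hypothetical names. Only `WsOutMessage` and `broadcast_binary_frame` echo names from the list, and their real definitions may differ.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Stand-in for the server's outgoing message type (Binary | Text).
#[derive(Clone, Debug)]
pub enum WsOutMessage {
    Binary(Arc<[u8]>),
    Text(String),
}

/// One agent session with its attached viewers.
#[derive(Default)]
pub struct Session {
    viewers: Vec<Sender<WsOutMessage>>,
    latest_frame: Option<Arc<[u8]>>,
}

impl Session {
    /// Register a viewer and send it the latest frame right away (catch-up).
    pub fn add_viewer(&mut self) -> Receiver<WsOutMessage> {
        let (tx, rx) = channel();
        if let Some(frame) = &self.latest_frame {
            let _ = tx.send(WsOutMessage::Binary(Arc::clone(frame)));
        }
        self.viewers.push(tx);
        rx
    }

    /// Relay one agent frame to every viewer.
    /// The Vec is moved into shared storage once; each viewer then receives
    /// a reference-counted handle, so the payload is not copied per viewer.
    pub fn broadcast_binary_frame(&mut self, frame: Vec<u8>) {
        let shared: Arc<[u8]> = Arc::from(frame);
        // Drop viewers whose receiving end has gone away.
        self.viewers
            .retain(|tx| tx.send(WsOutMessage::Binary(Arc::clone(&shared))).is_ok());
        self.latest_frame = Some(shared);
    }

    pub fn viewer_count(&self) -> usize {
        self.viewers.len()
    }
}
```

One design point worth noting: an H.264 decoder can only start at a keyframe, so catch-up with an arbitrary latest frame works for JPEG but would likely need to cache the most recent keyframe for H.264.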

Frontend (WebCodecs H.264 + JPEG fallback)

  • WebCodecs VideoDecoder for H.264 GPU-accelerated decoding
  • Binary WebSocket frame parsing (13-byte header)
  • Annex-B NAL unit parsing, SPS/PPS extraction, AVCC description builder
  • Automatic codec detection from SPS (profile/level guessing)
  • JPEG fallback when H.264 unavailable
  • HUD input forwarding unchanged (JSON text frames)
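The Annex-B parsing, codec detection, and AVCC description steps above follow a well-known pattern, sketched here in Rust for consistency with the rest of the project even though the frontend does this in the browser. All function names are hypothetical. The sketch assumes a single SPS and a single PPS, and it omits the extra trailing fields that the configuration record carries for High profile and above.

```rust
/// Split an Annex-B byte stream into NAL units, with start codes removed.
/// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
pub fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Record (start code position, first payload byte) for every start code.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push((i, i + 3));
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (k, &(_, begin)) in starts.iter().enumerate() {
        let mut end = starts.get(k + 1).map_or(data.len(), |&(pos, _)| pos);
        // A NAL unit never ends in 0x00, so trailing zeros are padding or
        // the leading zero of the next 4-byte start code.
        while end > begin && data[end - 1] == 0 {
            end -= 1;
        }
        if end > begin {
            nals.push(&data[begin..end]);
        }
    }
    nals
}

/// NAL unit type from the low 5 bits of the header byte (7 = SPS, 8 = PPS, 5 = IDR).
/// The caller must pass a non-empty NAL unit.
pub fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// Build an "avc1.PPCCLL" codec string from the profile, constraint, and
/// level bytes that follow the SPS header byte.
pub fn codec_string_from_sps(sps: &[u8]) -> Option<String> {
    if sps.len() < 4 || (sps[0] & 0x1F) != 7 {
        return None;
    }
    Some(format!("avc1.{:02X}{:02X}{:02X}", sps[1], sps[2], sps[3]))
}

/// Build a minimal AVCDecoderConfigurationRecord with one SPS and one PPS
/// and 4-byte NAL length prefixes.
pub fn build_avcc(sps: &[u8], pps: &[u8]) -> Option<Vec<u8>> {
    let max = u16::MAX as usize;
    if sps.len() < 4 || sps.len() > max || pps.is_empty() || pps.len() > max {
        return None;
    }
    // version 1, profile, compatibility, level, lengthSizeMinusOne = 3, one SPS
    let mut out = vec![1, sps[1], sps[2], sps[3], 0xFF, 0xE1];
    out.extend_from_slice(&(sps.len() as u16).to_be_bytes());
    out.extend_from_slice(sps);
    out.push(1); // one PPS
    out.extend_from_slice(&(pps.len() as u16).to_be_bytes());
    out.extend_from_slice(pps);
    Some(out)
}
```

In a browser the resulting codec string and record would be passed to the decoder configuration as the codec and description values.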

Latency Comparison

Stage        Old (JPEG+JSON)   New (H.264+Binary)
Encode       15-30 ms (CPU)    1-5 ms (openh264)
Frame size   200-500 KB        10-50 KB
Network      2-5 ms            0.5-1 ms
Decode       3-5 ms            1-2 ms (GPU)
Total        25-45 ms          ~5-15 ms
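To see why frame size dominates the network row, the wire time of a single frame can be estimated from its size and the link speed. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 1 Gbit/s link is an assumption, 1 KB is taken as 1000 bytes, protocol overhead is ignored, and both function names are hypothetical.

```rust
/// Time to put one frame on the wire, in milliseconds.
pub fn transmit_ms(frame_bytes: u64, link_bits_per_s: u64) -> f64 {
    (frame_bytes * 8) as f64 / link_bits_per_s as f64 * 1000.0
}

/// Sustained bandwidth in Mbit/s if every frame had this size.
pub fn stream_mbit_per_s(frame_bytes: u64, fps: u32) -> f64 {
    (frame_bytes * 8 * fps as u64) as f64 / 1_000_000.0
}
```

Under these assumptions a 500 KB JPEG frame needs about 4 ms on the wire and a 50 KB H.264 frame about 0.4 ms. At the agent's default 60 fps, 50 KB frames add up to 24 Mbit/s, while 500 KB frames would need 240 Mbit/s.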

Recent Commits

  • 60b23bc fix: Uint8Array to Blob cast for TS compatibility
  • 63e4513 frontend: WebCodecs H.264 decoder, binary WS frames, AVCC description builder
  • 05cfe9e server: binary frame relay (zero-copy), text JSON for control
  • 31a862b server: binary FrameBuffer, WsOutMessage enum
  • 081cb0d agent: Cargo.toml v0.2.0 — openh264 optional feature
  • 86f0e4e agent: main.rs — binary WS frames, encoder pipeline
  • b7c254a agent: encoder.rs — H.264 + JPEG encoder abstraction
  • cf617d0 agent: capture.rs — raw BGRA output
  • b690b07 agent: config.rs — --encoder h264|jpeg flag
  • a97ebed agent: protocol.rs — binary video frame format
  • 1468097 docs: Phase 3 VM Agent complete
  • e1e6442 agent: input.rs — full remote control