Video Use

🚀 Description

browser-use/video-use: edit videos with Claude Code (or Codex, Hermes, Openclaw). Drop raw takes in a folder, describe the edit in chat, get final.mp4 next door. 100% open source. Works for any content: talking heads, montages, tutorials, travel.

Same “structured surface, not pixel dump” philosophy as Browser Use. The LLM never watches the video; it reads it through a packed transcript + on-demand visual filmstrips.

🧩 Features

  • Cuts fillers: umm, uh, false starts, dead space between takes
  • Auto color grades: warm cinematic, neutral punch, or custom ffmpeg chain
  • 30 ms audio fades at every cut
  • Burns subtitles: 2-word UPPERCASE chunks by default, fully customizable
  • Generates overlays via HyperFrames, Remotion, Manim, or PIL, using parallel sub-agents
  • Self-evaluates the rendered output at every cut boundary before showing you
  • Persists session memory in project.md so next week’s session resumes
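
Two of the features above are mechanical enough to sketch in a few lines: the default 2-word UPPERCASE subtitle chunking and the 30 ms fade at each cut. This is a minimal illustration, not video-use's actual code. The word dictionary shape (`text`, `start`, `end`) and the helper names `chunk_subtitles` and `cut_fades` are assumptions made for the example; `afade` is ffmpeg's standard audio fade filter.

```python
from typing import Dict, List


def chunk_subtitles(words: List[Dict], size: int = 2) -> List[Dict]:
    """Group timed words into UPPERCASE subtitle chunks of `size` words."""
    chunks = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        chunks.append({
            "text": " ".join(w["text"] for w in group).upper(),
            "start": group[0]["start"],   # chunk appears with its first word
            "end": group[-1]["end"],      # and disappears after its last word
        })
    return chunks


def cut_fades(clip_duration: float, fade: float = 0.030) -> str:
    """Build an ffmpeg audio filter string that fades a clip in and out."""
    out_start = max(clip_duration - fade, 0.0)
    return (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={out_start:.3f}:d={fade:.3f}"
    )


# Hypothetical word-level timestamps, in seconds.
words = [
    {"text": "drop", "start": 0.00, "end": 0.25},
    {"text": "raw", "start": 0.25, "end": 0.50},
    {"text": "takes", "start": 0.50, "end": 0.90},
]
subtitle_chunks = chunk_subtitles(words)
fade_filter = cut_fades(2.5)
```

The filter string would be passed to ffmpeg's `-af` option for each clip; a short fade on both sides of a cut is a common way to avoid audible clicks where two waveforms are joined.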

🎨 How it works (the key insight)

Naive: 30,000 frames × 1,500 tokens = 45M tokens of noise. Video Use: 12KB text + a handful of PNGs.
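
The arithmetic behind that comparison can be checked directly. The frame count and per-frame token cost come from the text; the 4-bytes-per-token figure is only a rough rule of thumb assumed here to turn 12KB into a token estimate, and the handful of PNGs is left out of the count.

```python
FRAMES = 30_000            # from the text (about 17 minutes at 30 fps)
TOKENS_PER_FRAME = 1_500   # from the text
naive_tokens = FRAMES * TOKENS_PER_FRAME

PACKED_BYTES = 12 * 1024   # ~12KB packed transcript
BYTES_PER_TOKEN = 4        # assumption: rough average for English text
packed_tokens = PACKED_BYTES // BYTES_PER_TOKEN

ratio = naive_tokens / packed_tokens
print(f"naive:  {naive_tokens:,} tokens")
print(f"packed: ~{packed_tokens:,} tokens")
print(f"ratio:  ~{ratio:,.0f}x")
```

Under these assumptions the packed transcript is roughly four orders of magnitude smaller than the frame dump, which is why the text layer can stay loaded for the whole session.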

  • Layer 1: audio transcript (always loaded). ElevenLabs Scribe gives word-level timestamps, speaker diarization, audio events. All takes pack into takes_packed.md (~12KB).
  • Layer 2: visual composite (on demand). timeline_view renders a filmstrip + waveform + word-label PNG only at decision points.
  • Pipeline: Transcribe → Pack → LLM reasons → EDL → Render → Self-eval (max 3 retries).
  • 12 hard rules for production correctness, artistic freedom elsewhere.
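
The Pack and Self-eval steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the packed line format, the pause threshold, the EDL shape (a list of `(start, end)` tuples), and every function name below are hypothetical. The `render`, `evaluate`, and `fix` callables stand in for ffmpeg rendering, the boundary check, and the LLM's EDL revision.

```python
from typing import Callable, Dict, List, Tuple


def _format(phrase: List[Dict]) -> str:
    """Render one phrase as '[start-end] text'."""
    text = " ".join(w["text"] for w in phrase)
    return f"[{phrase[0]['start']:.2f}-{phrase[-1]['end']:.2f}] {text}"


def pack_words(words: List[Dict], gap: float = 0.5) -> str:
    """Collapse word-level timestamps into one line per phrase,
    starting a new phrase after any pause longer than `gap` seconds."""
    lines, phrase = [], []
    for w in words:
        if phrase and w["start"] - phrase[-1]["end"] > gap:
            lines.append(_format(phrase))
            phrase = []
        phrase.append(w)
    if phrase:
        lines.append(_format(phrase))
    return "\n".join(lines)


def render_with_self_eval(edl, render: Callable, evaluate: Callable,
                          fix: Callable, max_retries: int = 3) -> Tuple:
    """Render, check the result, and revise the EDL up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        output = render(edl)
        issues = evaluate(output)
        if not issues:
            return output, attempt       # attempt == number of retries used
        if attempt < max_retries:
            edl = fix(edl, issues)
    raise RuntimeError(f"still failing after {max_retries} retries: {issues}")


# Stand-in stages for demonstration only.
def render(edl):
    return {"cuts": list(edl)}

def evaluate(output):
    return [c for c in output["cuts"] if c[1] - c[0] < 0.1]  # flag tiny cuts

def fix(edl, issues):
    return [c for c in edl if c not in issues]

words = [
    {"text": "hello", "start": 0.00, "end": 0.40},
    {"text": "world", "start": 0.45, "end": 0.90},
    {"text": "umm", "start": 2.00, "end": 2.30},
]
packed = pack_words(words)
final, retries = render_with_self_eval([(0.0, 0.9), (2.0, 2.05)],
                                       render, evaluate, fix)
```

Bounding the retries keeps a bad edit from looping forever; raising once they are exhausted hands the decision back to the caller rather than silently returning a render that failed its own check.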

Reasoning for

Talking-head edits, montages, travel cuts: anything where speech boundaries drive the cuts. Beats hand-editing in DaVinci Resolve or Premiere when the cuts are dictated by speech rather than visuals. Requires ffmpeg and an ElevenLabs API key.
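
A quick preflight check for those two requirements might look like the sketch below. The environment variable name `ELEVENLABS_API_KEY` is an assumption made for illustration (check the project's own setup instructions for where it expects the key), and `missing_prerequisites` is a hypothetical helper, not part of video-use.

```python
import os
import shutil
from typing import List, Mapping, Optional


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    key_name: str = "ELEVENLABS_API_KEY",  # assumed variable name
) -> List[str]:
    """Return human-readable descriptions of any missing prerequisites."""
    env = os.environ if env is None else env
    missing = []
    if shutil.which("ffmpeg") is None:       # ffmpeg must be on PATH
        missing.append("ffmpeg (not found on PATH)")
    if not env.get(key_name):                # key must be set and non-empty
        missing.append(f"{key_name} (environment variable not set)")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing:", "; ".join(problems))
    else:
        print("ffmpeg and API key found.")
```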

Alternatives considered

  • HyperFrames: for synthesized video from HTML; video-use composes them in
  • Remotion: React-based, source-available license; HyperFrames offers Apache 2.0
  • Manual NLE: better taste control, no automation
  • Browser Use: sibling project, same “structured surface” philosophy
  • Browser Harness: sibling CDP harness, self-healing pattern
  • HyperFrames: overlay/animation engine integrated via parallel sub-agents
  • Claude Code: the primary host agent

Template: tool