About — One-Click Policy Deployment

How it works

The short version and the long version.

The short version

You paste a HuggingFace URL. You get a video of a robot moving. That's it.

No setup. No code. No environment configuration. You don't need to know what framework it uses, what dependencies it needs, or how to load the weights. The system figures all of that out.

The long version

When you click Deploy, six things happen in sequence:

Fetch

The backend pulls the repo metadata from HuggingFace — the README, file list, tags, framework info. This tells us what we're dealing with.

Analyze

Claude reads everything and determines: what RL framework was used (Stable Baselines 3, CleanRL, custom PyTorch), what MuJoCo environment it targets (Humanoid, Hopper, Ant), and what pip packages are needed. Then it writes a self-contained Python script that downloads the weights, creates the environment, runs inference, and captures video frames.

Install

The script installs its own dependencies at runtime. Every execution is self-contained. If the policy needs dm_control, it installs dm_control. If it needs a package we've never seen before, it installs that too. The Docker image provides Python, MuJoCo, and ffmpeg. Everything else is self-provisioned.

Run

The GPU worker executes the script. The policy loads, the MuJoCo environment spins up, and the trained neural network controls the robot for 10 seconds. Every frame is captured as an RGB image.

Render

The frames are piped through ffmpeg into an H.264 MP4. Ultrafast encoding, typically under 2 seconds for 300 frames.

Serve

The video URL is returned. You watch a robot walk, hop, run, swim, or fall over — depending on how well the policy was trained.

When it fails

Claude's generated script won't always work on the first try. Wrong environment version, missing dependency, incompatible checkpoint format.

When the script fails, the system captures stderr and sends it back to Claude: “Here's what you wrote. Here's the error. Fix it.” Claude reads the traceback, diagnoses the issue, and rewrites the script. Up to two retries.

In our tests, this three-layer approach — initial generation, auto-retry, self-provisioning deps — handles about 80% of arbitrary HuggingFace RL policies without any human intervention.

What it doesn't do

—It doesn't train policies. It deploys pre-trained ones.
—It doesn't work with language models, image models, or datasets. Only RL policies with MuJoCo-compatible environments.
—It doesn't guarantee every policy will work. Custom checkpoint formats, non-standard frameworks, and private repos may fail.
—It doesn't deploy to physical robots. Simulation only, for now.

Architecture

Two Docker containers on one A100 GPU machine. The backend handles Claude API calls and job orchestration. The GPU worker handles policy execution and video rendering. A small frontend VM runs Next.js behind Cloudflare.

User  →  Cloudflare  →  Frontend VM (nginx + Next.js)
                              ↓
                         A100 GPU VM
                         ├── Backend  (:8200)  Claude API + job queue
                         └── GPU Worker (:8300)  SB3 + MuJoCo + FFmpeg

Test suite

We maintain a test set of 100 HuggingFace RL policies across four frameworks: Stable Baselines 3, CleanRL, Sample Factory, and custom PyTorch. The automated test runner deploys each one end-to-end — same flow as clicking Deploy in the browser — and reports success rates by framework.

Each test produces a real video. Not a mock, not a dry run. A robot moving in a physics simulation, rendered to MP4.

Key insight

Every script installs its own dependencies at runtime. New framework = Claude adds it to the package list = just works. The Docker image only needs Python, MuJoCo system libs, and ffmpeg. Everything else is self-provisioned. We never rebuild the infrastructure to support a new type of policy.

Try it now

Paste any HuggingFace RL policy URL. Or pick from the catalog.