Fast, lightweight browser automation for LLM agents.

Launch a headless browser bound to a local port, then drive it over a plain HTTP API — with an interactive Swagger UI console at the root.

Go + CDP · Single static binary · Text-first · No Playwright · No MCP

What it is

webrudder <url> starts a local daemon: it launches a headless Chromium at that URL and serves it on a localhost port. Agents and scripts drive it through an HTTP API; visiting the root serves Swagger UI — an interactive list of every endpoint with a try-it-out console. Navigation happens by interacting (clicking links and buttons); the daemon is a state machine that tracks the current URL, DOM, and element map. Close the terminal and the browser dies with it.

No bundled browser bloat. No per-step screenshots. No MCP layer. One static binary talking to Chromium over CDP.

Why it's fast

Driving a browser through an LLM is slow when every step round-trips a screenshot: the model waits on inference, parses ~1.5k image tokens, acts, and then repeats. Browser engine speed was never the bottleneck; the screenshot-per-step protocol is. webrudder shortens the loop in two ways.

Text-first state

GET /scan returns a compact list of actionable elements, one short entry each (for example, e1 button "Login"). The model acts by ref-id with no vision input, using ~50 tokens instead of ~1500.
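As a rough illustration of the text-first idea, the sketch below flattens a /scan response into one short line per element, the form a model prompt might carry. The `render_scan` helper is hypothetical, and the `{"elements": [...]}` wrapper is assumed from the Quickstart sample rather than from a published schema.

```python
import json

def render_scan(scan: dict) -> str:
    """Turn a /scan response into one short line per actionable element."""
    lines = []
    for el in scan.get("elements", []):
        # ref, role and name are the fields listed for GET /scan in the API section.
        lines.append(f'{el["ref"]} {el["role"]} "{el["name"]}"')
    return "\n".join(lines)

# Sample shaped like the Quickstart response (href left out for brevity).
sample = json.loads(
    '{"elements":[{"ref":"e1","role":"link","name":"More information"}]}'
)
print(render_scan(sample))
```

The model then only needs to answer with a ref-id such as e1, which the caller passes to POST /click.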

Batchable

Many actions in one request (POST /batch) collapse N round-trips into one — a whole form fills and submits in a single call.
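A minimal sketch of assembling a batch body follows. The source only documents the outer `{actions:[…]}` envelope, so the shape of each action here (a hypothetical "action" key naming the endpoint, plus that endpoint's usual body fields) is an assumption; check the daemon's OpenAPI spec in Swagger UI for the real schema.

```python
import json

def build_batch(actions: list) -> bytes:
    """Wrap a list of actions in the {actions: [...]} envelope for POST /batch."""
    if not actions:
        raise ValueError("a batch needs at least one action")
    return json.dumps({"actions": actions}).encode("utf-8")

# ASSUMPTION: the per-action shape below is illustrative, not the documented schema.
# The ref-ids and field values are made-up placeholders.
login = build_batch([
    {"action": "fill", "ref": "e1", "text": "alice"},
    {"action": "fill", "ref": "e2", "text": "placeholder-password"},
    {"action": "click", "ref": "e3"},
])
print(login.decode("utf-8"))
```

Three separate HTTP round-trips become one request body, which is the saving the paragraph above describes.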

How it works

Terminal:  ./webrudder https://example.com
                 ↓
   Daemon launches headless Chromium (over CDP)
   and serves http://localhost:10000
                 ↓
   Humans → open localhost:10000 → Swagger UI (endpoint list + try-it-out)
   Agents → GET /scan · POST /click · GET /read ...
                 ↓
   Daemon = state machine: current URL + DOM + element map.
   Clicking navigates; re-scan for the new page's elements.
                 ↓
   Close terminal → daemon + Chromium die cleanly

The URL passed at launch is just the entry point. After that you move around by interacting, so you normally never re-feed URLs to navigate; POST /goto is available when an explicit jump is needed.
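The scan, click, re-scan cycle described in the diagram can be sketched as a small helper. The `transport` callable is a hypothetical stand-in for whatever HTTP client you use, and `click_by_name` is an illustrative name, not part of webrudder.

```python
from typing import Callable, Optional

# transport(method, path, body) returns the parsed JSON response as a dict.
Transport = Callable[[str, str, Optional[dict]], dict]

def click_by_name(transport: Transport, name: str) -> dict:
    """Scan, click the first element whose name matches, re-scan after navigation."""
    elements = transport("GET", "/scan", None)["elements"]
    target = next((el for el in elements if el["name"] == name), None)
    if target is None:
        raise LookupError(f"no actionable element named {name!r}")
    result = transport("POST", "/click", {"ref": target["ref"]})
    if result.get("navigated"):
        # The click moved to a new page, so fetch that page's elements.
        elements = transport("GET", "/scan", None)["elements"]
    return {"click": result, "elements": elements}
```

Passing the transport in as a parameter keeps the loop testable without a running daemon: a fake function returning canned dictionaries is enough to exercise it.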

Two surfaces, one browser

| Surface | URL | For | What it does |
| --- | --- | --- | --- |
| Swagger UI | localhost:10000/ | humans | interactive endpoint list + try-it-out console |
| HTTP API | localhost:10000/scan, /click, … | agents / scripts | programmatic control, JSON in and out |

Both hit the same state machine — a try-it-out call and an agent's /click act on one live browser. Swagger UI is generated from the API's OpenAPI spec, the single source of truth for every endpoint.

Quickstart

Start it in one terminal:

$ ./webrudder https://example.com
webrudder · http://localhost:10000 · chromium pid 4821 · ctrl-c to quit

Drive it via the API — curl, an agent, or a script:

$ curl localhost:10000/scan
{"elements":[{"ref":"e1","role":"link","name":"More information","href":"..."}]}

$ curl -X POST localhost:10000/click -d '{"ref":"e1"}'
{"ok":true,"navigated":true,"url":"https://www.iana.org/help/example-domains"}

Or open http://localhost:10000/ for Swagger UI — browse every endpoint and fire requests live.
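For scripts, the same two calls can be made from Python's standard library. This is a sketch under assumptions: the port 10000 comes from the banner above and may differ on your machine, and sending an application/json Content-Type header is a guess about what the daemon accepts, not something the source states.

```python
import json
import urllib.request
from typing import Optional

BASE = "http://localhost:10000"  # taken from the Quickstart banner; adjust as needed

def make_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the daemon."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # ASSUMPTION: JSON bodies are tagged as application/json.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)

def call(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(make_request(method, path, body), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

def demo() -> None:
    """Mirror the curl session; needs a running webrudder daemon."""
    print(call("GET", "/scan"))
    print(call("POST", "/click", {"ref": "e1"}))
```

Call `demo()` only while the daemon is running in another terminal; `make_request` on its own touches no network.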

API

Base URL: http://localhost:<port>

| Method & Path | Body | Returns |
| --- | --- | --- |
| GET /scan | none | actionable elements [{ref, role, name, kind?}] |
| GET /read | none | {url, title, text} |
| GET /snap | none | full-page PNG; ?full=false for viewport only |
| GET /status | none | {url, title, port} |
| POST /click | {ref} | {ok, navigated?, downloaded?, needs_file?} |
| POST /fill | {ref, text} | {ok} |
| POST /goto | {url} | {ok, url} |
| POST /upload | {ref, file} | clicks, intercepts the file chooser, injects the file |
| POST /download | {ref, dir?} | clicks, waits, returns the saved path |
| POST /batch | {actions:[…]} | many actions, one request |
| POST /shutdown | none | stops the daemon and browser |
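The optional flags in the POST /click response suggest how a caller might branch after each click. The sketch below is one possible reading, not documented behavior: it assumes needs_file means the click opened a file chooser (so POST /upload should follow), downloaded means a file was saved, and navigated means a re-scan is due. The `next_step` helper and its return labels are hypothetical.

```python
def next_step(click_result: dict) -> str:
    """Map a /click response to a suggested follow-up action."""
    if not click_result.get("ok"):
        return "abort"
    if click_result.get("needs_file"):
        # ASSUMPTION: a file chooser is waiting, so follow up with POST /upload.
        return "upload"
    if click_result.get("downloaded"):
        return "collect-download"
    if click_result.get("navigated"):
        # New page: fetch its elements with GET /scan.
        return "rescan"
    return "continue"

print(next_step({"ok": True, "navigated": True}))
```

Absent flags are treated as false, which matches the `?` markers on those fields.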

Design principles

Browses, doesn't judge

It navigates, interacts, and extracts. It does not assert — the caller reads back text / url / elements and decides pass/fail.
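Because the daemon does not assert, the pass/fail logic lives in the caller. A minimal sketch of such a check over a GET /read result follows; the `judge` helper is hypothetical, and the sample page's title and text are placeholders rather than real output.

```python
def judge(page: dict, url_prefix: str, must_contain: str) -> tuple:
    """Decide pass/fail from a /read result of the form {url, title, text}."""
    if not page["url"].startswith(url_prefix):
        return False, f"unexpected url: {page['url']}"
    if must_contain not in page["text"]:
        return False, f"text missing: {must_contain!r}"
    return True, "ok"

# Placeholder page data shaped like a /read response.
page = {
    "url": "https://www.iana.org/help/example-domains",
    "title": "placeholder title",
    "text": "placeholder body that mentions example domains",
}
print(judge(page, "https://www.iana.org/", "example domains"))
```

Keeping the verdict outside the browser layer means the same daemon can serve test suites, scrapers, and agents with different success criteria.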

Text-first

scan + read return cheap text; screenshots only on demand. No vision tokens per step.

Lightweight

No bundled browser, no driver layer — a thin CDP client and a single static binary. Chromium is fetched once, not embedded.

Stateful

A long-running daemon holds the live browser, so requests operate on evolving page state. Close the terminal and everything dies cleanly.