pi extension · v… · MIT

Let your LLM crawl & see the web.

pi-websearch-crawl4ai is a pi extension that gives your coding agent a set of web_fetch / web_crawl / web_screenshot tools, backed by a self-hosted Crawl4AI server. Clean markdown, sanitized HTML, and full-page screenshots — even after you disable pi's bash tool and take curl away.

✓ 6 web tools registered · ✓ 0 extra npm deps · ✓ self-hosted — your Crawl4AI server · ✓ works with --tools read,write,edit,web_fetch,…
~ pi — bash disabled · web_fetch enabled
// User: summarize https://docs.crawl4ai.com/core/markdown-generation/
 web_fetch(url="https://docs.crawl4ai.com/core/markdown-generation/",
                   filter="fit")
  # Markdown Generation
  Crawl4AI transforms HTML into clean, LLM-friendly Markdown …
  ✓ 200 OK · 4,812 chars · filter=fit

 web_crawl(urls=["https://a.example", "https://b.example"],
                   crawler_config={"type":"CrawlerRunConfig",
                                   "params":{"cache_mode":"bypass"}})
  ## https://a.example    success=true status=200
  ## https://b.example    success=true status=200

 bash(command="curl https://example.com")
  ✗ tool bash is disabled in this session — use web_fetch instead

 web_screenshot(url="https://example.com")
  ✓ png 118 KB · returned inline to the model

Why this exists

The safest pi sessions run with the bash tool turned off — no shell, no curl, no wget, nothing to rm -rf. But the moment you do that, your agent loses the ability to read the web. No docs lookups. No changelog peeks. No “grab that URL the user pasted.”

pi-websearch-crawl4ai gives the web back, but through a narrow door: a handful of purpose-built tools that talk HTTP to a Crawl4AI server you run. No outbound shell. No arbitrary subprocesses. No third-party SaaS to leak your prompts to. Just fetch() to http://localhost:11235.

6 · web tools exposed to the agent
0 · runtime npm dependencies — uses global fetch
localhost · Crawl4AI runs on your box; nothing leaves your network
1 file · single-file extension, hot-loadable by pi

The six tools

Each tool is a thin, typed wrapper over a Crawl4AI REST endpoint, registered with pi so the model can call it directly. Every tool ships with promptSnippet and promptGuidelines so the agent knows to prefer web_fetch over bash + curl.
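The wrapper pattern can be sketched roughly as below. This is illustrative, not the extension's actual source: the helper names are invented, and the `/md` body fields (`f` for filter) follow the session demo above but should be treated as assumptions rather than Crawl4AI's verbatim schema.

```typescript
// Sketch of the thin-wrapper pattern. `buildRequest` and `webFetch` are
// illustrative names; the `/md` body fields are assumptions, not a verbatim schema.
type Crawl4aiConfig = { baseUrl: string; token?: string };

function buildRequest(
  cfg: Crawl4aiConfig,
  path: string,
  body: Record<string, unknown>,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Bearer token only when the server has JWT auth enabled
  if (cfg.token) headers["Authorization"] = `Bearer ${cfg.token}`;
  return {
    url: new URL(path, cfg.baseUrl).toString(),
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}

// Each tool then reduces to "pick an endpoint, forward typed params":
async function webFetch(cfg: Crawl4aiConfig, url: string, filter = "fit") {
  const { url: endpoint, init } = buildRequest(cfg, "/md", { url, f: filter });
  const res = await fetch(endpoint, init); // global fetch — zero npm deps
  if (!res.ok) throw new Error(`crawl4ai returned ${res.status}`);
  return res.json();
}
```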

Tool · Endpoint · Purpose
web_fetch POST /md URL → clean Markdown. Filters: fit (readability), raw, bm25, llm. Supports query, max_chars.
web_fetch_html POST /html Sanitized, preprocessed HTML when you need DOM structure instead of prose.
web_crawl POST /crawl Batch-crawl many URLs with typed BrowserConfig / CrawlerRunConfig passed straight through.
web_execute_js POST /execute_js Run JS snippets on a page and read the return values. For stuff behind dynamic interactions.
web_screenshot POST /screenshot Full-page PNG screenshot, returned inline as an image so vision-capable models can look at it.
web_ask GET /ask Query Crawl4AI's own library docs — useful only when the model needs to configure web_crawl itself.
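For instance, the web_crawl row forwards its typed config straight through as a {type, params} envelope, mirroring the session demo at the top of the page. A sketch — the helper name is invented, while the envelope shape and the 100-URL cap come from this page:

```typescript
// Hypothetical helper showing the payload web_crawl sends to POST /crawl.
// Typed config objects are forwarded verbatim as {type, params} envelopes.
function buildCrawlBody(urls: string[], cacheMode = "bypass") {
  if (urls.length > 100) throw new Error("web_crawl accepts at most 100 URLs");
  return {
    urls,
    crawler_config: {
      type: "CrawlerRunConfig",
      params: { cache_mode: cacheMode },
    },
  };
}
```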

Plus three slash commands: /crawl4ai-status (pings /health), /crawl4ai-url <url>, and /crawl4ai-token <tok> — for switching servers or toggling JWT without a restart.

What you get

The small set of things that matter for an LLM's web I/O.

LLM-grade Markdown

Crawl4AI's readability-style fit filter strips chrome, ads, and nav; what reaches the model is what a human would read — tokens well spent.

Query-aware fetching

filter="bm25" + a query lets the agent pull only the most relevant chunks of a long page, keeping context windows honest.

Batch crawling

web_crawl takes up to 100 URLs at once with a shared typed config, returning per-URL markdown, links, metadata, and status.

JS execution

web_execute_js runs arbitrary snippets against the live DOM (return document.title, hrefs, JSON state) for pages plain fetching can't crack.

Inline screenshots

web_screenshot returns PNG bytes as an ImageContent block — vision-capable models see the page exactly as a browser would render it.

Self-hosted by default

Crawl4AI runs in Docker on localhost:11235. Your URLs, prompts, and cookies never leave your network, unlike managed scraping APIs.

Bounded output

Every tool honors a max_chars / max_chars_per_result cap. No agent can nuke its context by fetching a 4 MB blog post.

Hot config

Change the Crawl4AI endpoint or JWT on the fly with /crawl4ai-url / /crawl4ai-token — no reload, no restart.

Pure pi extension

Single-file TypeScript, zero runtime deps. Installs via pi install. No daemons, no extra services — just tools the agent can call.
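The bounded-output guard can be pictured as a one-line truncation applied before anything reaches the model — a sketch with an invented helper name, not the extension's actual code:

```typescript
// Hypothetical guard illustrating the max_chars cap: trim oversized tool
// output and flag the truncation, so a huge page can't flood the context.
function capOutput(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + `\n… [truncated at ${maxChars} chars]`;
}
```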

Install

You need two things: a running Crawl4AI server, and the extension loaded into pi. Neither requires npm on your PATH; Docker runs the server, pi runs the extension.

1 · start crawl4ai
# one-liner Docker install; exposes http://localhost:11235
docker run -d \
  -p 11235:11235 \
  --name crawl4ai \
  --shm-size=1g \
  unclecode/crawl4ai:latest

# sanity check
curl http://localhost:11235/health
2 · install the extension
# from npm (recommended)
pi install npm:@codingcoffee/pi-websearch-crawl4ai

# or from git
pi install git:github.com/codingcoffee/pi-websearch-crawl4ai

# try without installing
pi -e npm:@codingcoffee/pi-websearch-crawl4ai

Then lock pi down to just the tools you trust, and add web_fetch / web_crawl to the allowlist — your agent can now read the web without ever seeing a shell:

example: bash-free pi session
pi --tools read,write,edit,web_fetch,web_crawl,web_screenshot

# inside pi
 /crawl4ai-status    # confirm the server is up
 "Read https://example.com and summarize it."

Configuration

Three sources, in priority order: CLI flag > environment variable > default. All three are per-session; nothing is written to disk.

Setting · Env var · CLI flag · Default
Base URL CRAWL4AI_BASE_URL --crawl4ai-url <url> http://localhost:11235
Bearer token CRAWL4AI_TOKEN --crawl4ai-token <tok> (none)
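The precedence rule amounts to a nullish-coalescing chain — a sketch with an invented function name, shown here for the base URL:

```typescript
// CLI flag > environment variable > default, resolved fresh per session;
// nothing is written to disk.
function resolveBaseUrl(
  cliUrl: string | undefined,
  env: Record<string, string | undefined> = process.env,
): string {
  return cliUrl ?? env["CRAWL4AI_BASE_URL"] ?? "http://localhost:11235";
}
```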

At runtime you can retarget the extension without restarting pi:

in pi
/crawl4ai-status                         # show current config + /health
/crawl4ai-url http://10.0.0.9:11235      # switch to a LAN server
/crawl4ai-token eyJhbGciOi…              # set JWT for authenticated servers
/crawl4ai-token                          # clear the token
Production note. Crawl4AI supports rate limiting, JWT auth, allow-lists, and a managed config.yml out of the box. Run it behind an internal reverse proxy, point the extension at it with CRAWL4AI_BASE_URL, and you get a single place to audit every URL your fleet of agents fetches. See the Crawl4AI Docker guide for the knobs.

Web access for agents, without the shell.

Disable bash. Install this. Let the model read.