WebTaskBench
LIVE OBSERVATORY · UPDATED WEEKLY

How Well Does Your Agent Read the Web?

An open benchmark measuring token efficiency across 44 real websites. Fewer tokens means faster agents, lower costs, and more context for reasoning.

v0.5.1 · 37 sites · Run May 18, 2026 · Δ avg -0.1x vs 0.5.1
Top Compression Leaders

Teal = SOM tokens · Dark = raw HTML tokens · Lower SOM is better

Full Benchmark Results

All 37 sites from this run, sorted by compression ratio

SITE | CATEGORY | HTML TOKENS | SOM TOKENS | COMPRESSION
cloud.google.com | SaaS & Cloud | 862K | 6K | 133.7x
arstechnica.com | News & Media | 141K | 1K | 108.9x
kubernetes.io/docs | Dev Tools | 125K | 1K | 100.8x
techcrunch.com | News & Media | 139K | 1K | 97.5x
nytimes.com | News & Media | 438K | 5K | 97.3x
linear.app | SaaS & Cloud | 918K | 11K | 84.3x
stripe.com/docs | SaaS & Cloud | 356K | 7K | 53.3x
docker.com | SaaS & Cloud | 127K | 3K | 48.7x
tailwindcss.com | SaaS & Cloud | 396K | 9K | 46.2x
httpbin.org | General | 3K | 79 | 37.6x
nodejs.org | General | 185K | 5K | 36.9x
wired.com | News & Media | 460K | 15K | 30.5x
vercel.com | SaaS & Cloud | 348K | 12K | 30.0x
typescriptlang.org | Dev Tools | 103K | 4K | 23.4x
nextjs.org | Dev Tools | 123K | 6K | 21.3x
aws.amazon.com | SaaS & Cloud | 108K | 5K | 20.1x
theguardian.com | News & Media | 443K | 27K | 16.3x
azure.microsoft.com | News & Media | 158K | 15K | 10.5x
github.com/plasmate-labs/plasmate | Dev Tools | 180K | 19K | 9.3x
angular.dev | Dev Tools | 32K | 4K | 7.4x
en.wikipedia.org/wiki/Rust_(programming_language) | General | 189K | 28K | 6.9x
vuejs.org | Dev Tools | 34K | 9K | 3.9x
getbootstrap.com | Dev Tools | 29K | 10K | 3.0x
developer.mozilla.org/en-US/docs/Web/JavaScript | Dev Tools | 53K | 22K | 2.4x
svelte.dev | Dev Tools | 38K | 18K | 2.1x
lobste.rs | General | 18K | 9K | 1.9x
medium.com | News & Media | 3K | 1K | 1.8x
docs.rs | Dev Tools | 5K | 4K | 1.2x
rust-lang.org | Dev Tools | 5K | 5K | 1.0x
pypi.org | Dev Tools | 6K | 7K | 0.9x
news.ycombinator.com | General | 12K | 15K | 0.8x
jsonplaceholder.typicode.com | General | 2K | 3K | 0.8x
python.org | General | 9K | 15K | 0.7x
postgresql.org | Dev Tools | 6K | 9K | 0.7x
example.com | General | 152 | 331 | 0.5x
crates.io | General | 70 | 348 | 0.2x
producthunt.com | General | 3K | 26K | 0.1x
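
The compression column appears to be raw HTML tokens divided by SOM tokens. That is an assumption, but it matches the rows that report exact counts. A minimal sketch of the arithmetic follows; `parse_tokens` and `compression` are hypothetical helper names, not part of the benchmark code. Rows shown with a rounded "K" value will not reproduce the published ratio exactly, since the page presumably computes it from unrounded counts.

```python
def parse_tokens(label: str) -> int:
    """Convert a table label such as '862K' or '79' to an integer token count."""
    label = label.strip()
    if label.upper().endswith("K"):
        return int(float(label[:-1]) * 1000)
    return int(label)


def compression(html_tokens: int, som_tokens: int) -> float:
    """Assumed definition: raw HTML tokens divided by SOM tokens."""
    return html_tokens / som_tokens


# Two rows from the table that list exact (unrounded) token counts.
rows = {"example.com": ("152", "331"), "crates.io": ("70", "348")}
for site, (html, som) in rows.items():
    ratio = compression(parse_tokens(html), parse_tokens(som))
    print(f"{site}: {ratio:.1f}x")
```

A ratio below 1.0x means the SOM output used more tokens than the raw HTML, which is what the table shows for very small pages.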

Category Breakdown

Average compression ratio by site category

SaaS & Cloud: ~47x (cloud.google.com, linear.app, figma.com, vercel.com, stripe.com, tailwindcss.com)
News & Media: ~41x (nytimes.com, wired.com, bbc.com, guardian.com)
Dev Tools & Docs: ~15x (nodejs.org, typescriptlang.org, react.dev, nextjs.org)
Static & Minimal: ~0.7x (example.com, crates.io, pypi.org, Hacker News)

Cost at Scale

Estimate daily savings switching from raw HTML to SOM

HTML daily cost: $99.54 (33,181,000 tokens)
SOM daily cost: $5.69 (1,895,000 tokens)
You save: $93.86/day (31,286,000 tokens saved @ $3/MTok)
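
The figures above can be reproduced with a short calculation. This is a sketch that assumes the flat $3 per million tokens quoted above and half-up rounding to cents; `daily_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MTOK = Decimal("3")  # USD per million tokens, as quoted above


def daily_cost(tokens: int, price_per_mtok: Decimal = PRICE_PER_MTOK) -> Decimal:
    """Return the cost in USD for a daily token volume, rounded to cents."""
    cost = Decimal(tokens) * price_per_mtok / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


html_tokens = 33_181_000
som_tokens = 1_895_000
saved_tokens = html_tokens - som_tokens

print(f"HTML: ${daily_cost(html_tokens)}")
print(f"SOM:  ${daily_cost(som_tokens)}")
print(f"Saved: ${daily_cost(saved_tokens)}/day ({saved_tokens:,} tokens)")
```

Decimal arithmetic is used here so that values ending in half a cent round the same way on every run, which binary floats do not guarantee.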

Methodology

Plasmate version: 0.5.1
HTML baseline: curl -sL (raw HTTP, no rendering)
Token counter: tiktoken cl100k_base (GPT-4 tokenizer)
Date: May 18, 2026
Platform: Linux x86_64
Sites: 37 attempted, 37 successful, 0 failed (anti-bot)
Source: github.com/plasmate-labs/plasmate-benchmarks
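
The measurement described above can be approximated as follows. This is a sketch, not the benchmark's actual script: it fetches the baseline with `curl -sL` and counts tokens with tiktoken's `cl100k_base` encoding, as the methodology states. How Plasmate produces the SOM text is outside this example, so the SOM document is taken as an input string. The function names are hypothetical, and it assumes `curl` and the `tiktoken` package are installed.

```python
import subprocess
from typing import Callable


def fetch_raw_html(url: str) -> str:
    """Fetch a page the way the baseline does: curl -sL, no rendering."""
    result = subprocess.run(["curl", "-sL", url], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def cl100k_counter() -> Callable[[str], int]:
    """Build a token counter using tiktoken's cl100k_base encoding."""
    import tiktoken  # third-party: pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats special-token strings in a page as plain text.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def compression_ratio(html: str, som: str, count_tokens: Callable[[str], int]) -> float:
    """Assumed definition: HTML token count divided by SOM token count."""
    som_tokens = count_tokens(som)
    if som_tokens == 0:
        raise ValueError("SOM document produced zero tokens")
    return count_tokens(html) / som_tokens
```

In use, one would call `fetch_raw_html(url)`, obtain the SOM text for the same URL from Plasmate, and pass both to `compression_ratio` along with `cl100k_counter()`. The counter is injected so that another tokenizer can be swapped in for comparison.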

SOM is defined by the open SOMspec specification.

Contribute

Add your own sites to the benchmark:

git clone https://github.com/plasmate-labs/plasmate-benchmarks
# Add your URL to urls.txt
# Run: ./run-benchmark.sh
# Submit a PR with your results

Observatory Vision

The benchmark is re-run weekly against the latest Plasmate release; watch the GitHub repo for update notifications. The goal is to track how the web is changing for AI agents: which sites are improving their agent-friendliness and which are getting worse. Results follow the WebTaskBench Protocol v1.0, a reproducible methodology open to third-party submissions.

Badges & Certifications

For SOM compliance scoring, badges, and certifications, see somordom.com, the community's SOM compliance tool.