WEBTASKBENCH / v1.0 / APR 2026
How Well Does Your Agent Read the Web?
An open benchmark measuring token efficiency across 44 real websites. Fewer tokens means faster agents, lower costs, and more context for reasoning.
Top Compression Leaders
[Bar chart: SOM tokens (teal) vs. raw HTML tokens (dark) per site; lower SOM is better]
Full Benchmark Results
All 44 sites, sorted by compression (highest first)
| SITE | HTML TOKENS | SOM TOKENS | COMPRESSION |
|---|---|---|---|
| cloud.google.com | 759K | 6K | 117.9x |
| nytimes.com | 532K | 5K | 110.3x |
| linear.app | 885K | 11K | 81.5x |
| stripe.com/docs | 346K | 6K | 54.3x |
| figma.com | 537K | 11K | 49.0x |
| tailwindcss.com | 418K | 9K | 47.3x |
| nodejs.org | 188K | 5K | 39.1x |
| vercel.com | 376K | 11K | 33.4x |
| wired.com | 455K | 18K | 25.6x |
| typescriptlang.org | 103K | 4K | 23.9x |
| mongodb.com | 300K | 13K | 23.8x |
| aws.amazon.com | 106K | 5K | 21.9x |
| nextjs.org | 123K | 6K | 21.3x |
| theguardian.com | 459K | 26K | 17.4x |
| bbc.com/news | 131K | 10K | 13.2x |
| azure.microsoft.com | 164K | 14K | 11.4x |
| react.dev | 107K | 10K | 11.1x |
| arstechnica.com | 149K | 15K | 9.7x |
| docker.com | 115K | 12K | 9.2x |
| ycombinator.com | 96K | 11K | 8.8x |
| github.com/plasmate-labs/plasmate | 157K | 18K | 8.5x |
| notion.so | 75K | 9K | 7.9x |
| angular.dev | 32K | 4K | 7.4x |
| en.wikipedia.org/wiki/Rust_(programming_language) | 189K | 27K | 7.0x |
| httpbin.org | 3K | 686 | 4.3x |
| docs.github.com | 29K | 7K | 4.1x |
| vuejs.org | 34K | 9K | 3.9x |
| getbootstrap.com | 29K | 10K | 3.0x |
| medium.com | 4K | 1K | 2.9x |
| kubernetes.io/docs | 123K | 48K | 2.5x |
| svelte.dev | 38K | 18K | 2.1x |
| lobste.rs | 19K | 9K | 2.1x |
| developer.mozilla.org/en-US/docs/Web/JavaScript | 52K | 27K | 1.9x |
| npmjs.com | 4K | 3K | 1.3x |
| docs.rs | 5K | 4K | 1.2x |
| rust-lang.org | 5K | 5K | 1.0x |
| pypi.org | 6K | 6K | 0.9x |
| news.ycombinator.com | 12K | 14K | 0.8x |
| jsonplaceholder.typicode.com | 2K | 3K | 0.7x |
| python.org | 9K | 14K | 0.6x |
| postgresql.org | 6K | 9K | 0.6x |
| example.com | 152 | 331 | 0.4x |
| crates.io | 71 | 372 | 0.1x |
| producthunt.com | 2K | 29K | 0.0x |
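The compression column appears to be raw HTML tokens divided by SOM tokens, so a value below 1.0x means the SOM output was larger than the page it came from. A minimal sketch of that arithmetic, using the two rows whose counts are shown unrounded (example.com and crates.io). The function name and the 1.0 threshold are assumptions for illustration and are not taken from the benchmark code.

```python
def compression_ratio(html_tokens: int, som_tokens: int) -> float:
    """Raw HTML tokens divided by SOM tokens (higher favors SOM)."""
    if som_tokens <= 0:
        raise ValueError("som_tokens must be positive")
    return html_tokens / som_tokens


# Counts copied from the results table (the only rows shown unrounded).
rows = {"example.com": (152, 331), "crates.io": (71, 372)}

for site, (html_tokens, som_tokens) in rows.items():
    ratio = compression_ratio(html_tokens, som_tokens)
    # Assumption: SOM is the smaller representation when the ratio exceeds 1.0.
    verdict = "SOM smaller" if ratio > 1.0 else "HTML smaller"
    print(f"{site}: {ratio:.2f}x ({verdict})")
```

Both computed ratios (about 0.46 and 0.19) sit just above the 0.4x and 0.1x shown in the table, which suggests the published figures are truncated to one decimal place rather than rounded.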
Category Breakdown
Average compression ratio by site category
| CATEGORY | AVG COMPRESSION | SITES |
|---|---|---|
| SaaS & Cloud | ~47x | cloud.google.com, linear.app, figma.com, vercel.com, stripe.com/docs, tailwindcss.com |
| News & Media | ~41x | nytimes.com, wired.com, bbc.com/news, theguardian.com |
| Dev Tools & Docs | ~15x | nodejs.org, typescriptlang.org, react.dev, nextjs.org |
| Static & Minimal | ~0.7x | example.com, crates.io, pypi.org, news.ycombinator.com |
Cost at Scale
Estimated daily savings from switching from raw HTML to SOM, priced at $3/MTok
| | DAILY TOKENS | DAILY COST |
|---|---|---|
| HTML | 33,181,000 | $99.54 |
| SOM | 1,895,000 | $5.69 |
| You save | 31,286,000 | $93.86/day |
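The cost figures above follow from multiplying daily tokens by the $3/MTok price. A small sketch that reproduces them; the function name is hypothetical, and rounding half-up to cents is an assumption chosen because it matches the published dollar amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MTOK = Decimal("3")  # USD per million tokens, as quoted above


def daily_cost(tokens: int, price_per_mtok: Decimal = PRICE_PER_MTOK) -> Decimal:
    """Cost in USD, rounded half-up to cents (rounding mode is an assumption)."""
    cost = Decimal(tokens) * price_per_mtok / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


html_tokens = 33_181_000   # daily raw HTML tokens from the figures above
som_tokens = 1_895_000     # daily SOM tokens from the figures above
saved_tokens = html_tokens - som_tokens

print(daily_cost(html_tokens))                 # 99.54
print(daily_cost(som_tokens))                  # 5.69
print(saved_tokens, daily_cost(saved_tokens))  # 31286000 93.86
```

Decimal arithmetic is used instead of floats so that a value such as 5.685 rounds predictably to cents.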
Methodology
| PARAMETER | VALUE |
|---|---|
| Plasmate version | 0.3.0 |
| HTML baseline | curl -sL (raw HTTP, no rendering) |
| Token counter | tiktoken cl100k_base (GPT-4 tokenizer) |
| Date | April 1, 2026 |
| Platform | Linux x86_64 |
| Sites | 51 attempted, 44 successful, 7 failed (anti-bot) |
| Source | github.com/plasmate-labs/plasmate-benchmarks |
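The methodology names the tokenizer but not the counting step itself. A rough sketch of how one page's two representations might be counted, assuming the third-party tiktoken package for cl100k_base. The helper names, the pluggable `count` parameter, and the sample strings are hypothetical; none of them come from the plasmate-benchmarks repository, and the second sample string is a placeholder rather than real SOM output.

```python
from typing import Callable, Tuple


def cl100k_counter() -> Callable[[str], int]:
    """Return a token counter backed by tiktoken's cl100k_base encoding."""
    import tiktoken  # third-party; imported lazily so the rest runs without it

    enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() makes text such as "<|endoftext|>" encode as
    # ordinary characters instead of raising an error.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def count_pair(html: str, som: str, count: Callable[[str], int]) -> Tuple[int, int]:
    """Count tokens in the raw HTML and in the SOM output with one counter."""
    return count(html), count(som)


# Stand-in counter (whitespace split) so the sketch runs without tiktoken;
# pass cl100k_counter() instead to use the tokenizer named in the methodology.
def whitespace_count(text: str) -> int:
    return len(text.split())


html_tokens, som_tokens = count_pair(
    "<div> <p> hello world </p> </div>",  # placeholder HTML
    "p: hello world",                      # placeholder, not real SOM output
    count=whitespace_count,
)
print(html_tokens, som_tokens)
```

Passing the counter in as a parameter keeps the comparison fair: both representations are always measured with the same tokenizer.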
Observatory Vision
This benchmark will be re-run weekly to track how the web is changing for AI agents: which sites are becoming more agent-friendly, and which are getting worse.
Contribute
Add your own sites to the benchmark:
git clone https://github.com/plasmate-labs/plasmate-benchmarks
cd plasmate-benchmarks
# Add your URL to urls.txt
./run-benchmark.sh
# Submit a PR with your results
Badges & Certifications
For SOM compliance scoring, badges, and certifications, see somordom.com — the community's SOM compliance tool.