Engineering

How We Built a Website Benchmark Tool with Playwright and GPT-4o

April 1, 2026 · 7 min read

BenchmarkHQ analyzes thousands of websites using AI. Here's the technical stack, from browser automation to API deployment — all running on a $5/month infrastructure.

The Pipeline

  1. Research — Custom feature checklist per industry (45 for e-commerce, 50 for SaaS, 151 for news)
  2. Crawl — Playwright (headless Chromium) visits each site, takes 3-6 screenshots
  3. Analyze — GPT-4o Vision examines screenshots against the checklist, marks each feature Y/N/P
  4. Aggregate — Frequency analysis classifies features as CRITICAL/REQUIRED/RECOMMENDED/OPTIONAL

Tech Stack

ComponentTechnologyCost
Browser automationPlaywright (Python)Free
AI analysisGPT-4o Vision~$0.01/site
API serverFastAPIFree
HostingRailway$5/month
Landing pageGitHub PagesFree

Why GPT-4o Vision Instead of HTML Parsing

We tested three approaches: HTML parsing (too fragile — every site has different markup), Lighthouse-style audits (can't detect product features like wishlists), and GPT-4o Vision (looks at the page like a human). Vision-based analysis correctly identified 90%+ of features from screenshots alone, regardless of underlying technology.

Everything is open source: github.com/abdur-rehman10/benchmarkhq

Try the API yourself

42 industries. No signup required. Free and open source.

Explore API Docs →
← All postsStar on GitHub →