Engineering
How We Built a Website Benchmark Tool with Playwright and GPT-4o
BenchmarkHQ analyzes thousands of websites using AI. Here's the technical stack, from browser automation to API deployment — all running on a $5/month infrastructure.
The Pipeline
- Research — Custom feature checklist per industry (45 for e-commerce, 50 for SaaS, 151 for news)
- Crawl — Playwright (headless Chromium) visits each site, takes 3-6 screenshots
- Analyze — GPT-4o Vision examines screenshots against the checklist, marks each feature Y/N/P
- Aggregate — Frequency analysis classifies features as CRITICAL/REQUIRED/RECOMMENDED/OPTIONAL
Tech Stack
| Component | Technology | Cost |
|---|---|---|
| Browser automation | Playwright (Python) | Free |
| AI analysis | GPT-4o Vision | ~$0.01/site |
| API server | FastAPI | Free |
| Hosting | Railway | $5/month |
| Landing page | GitHub Pages | Free |
Why GPT-4o Vision Instead of HTML Parsing
We tested three approaches: HTML parsing (too fragile — every site has different markup), Lighthouse-style audits (can't detect product features like wishlists), and GPT-4o Vision (looks at the page like a human). Vision-based analysis correctly identified 90%+ of features from screenshots alone, regardless of underlying technology.
Everything is open source: github.com/abdur-rehman10/benchmarkhq