Cookie scanner
How the Playwright crawler works and how it classifies what it finds.
Crawl strategy
The scanner launches headless Chromium with our user-agent, visits the homepage with waitUntil: networkidle, and records every cookie, third-party script, and storage key. Optionally, it pulls the sitemap and visits up to N additional URLs.
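As a rough illustration of the optional sitemap step and the third-party check, here is a minimal TypeScript sketch. The function names pickSitemapUrls and isThirdParty are hypothetical, and the rules they apply (same-origin URLs only, homepage skipped, hostname comparison) are assumptions rather than the scanner's actual behavior.

```typescript
// Hypothetical helpers: names and rules are assumptions, not the scanner's real code.

/** Extract <loc> URLs from sitemap XML, keep same-origin ones, and stop at `max`. */
function pickSitemapUrls(sitemapXml: string, homepage: string, max: number): string[] {
  const home = new URL(homepage);
  const picked: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;

  while (picked.length < max && (match = locPattern.exec(sitemapXml)) !== null) {
    let url: URL;
    try {
      url = new URL(match[1]);
    } catch (_err) {
      continue; // ignore malformed sitemap entries
    }
    if (url.origin !== home.origin) continue;       // stay on the scanned site
    if (url.href === home.href) continue;           // the homepage is visited anyway
    if (picked.indexOf(url.href) !== -1) continue;  // drop duplicates
    picked.push(url.href);
  }
  return picked;
}

/** Treat a script as third-party when its hostname differs from the page's hostname. */
function isThirdParty(scriptUrl: string, pageUrl: string): boolean {
  // Relative script URLs are resolved against the page URL first.
  return new URL(scriptUrl, pageUrl).hostname !== new URL(pageUrl).hostname;
}
```

Each selected URL could then be opened with Playwright's page.goto(url, { waitUntil: "networkidle" }) and its cookies read through context.cookies(). Comparing hostnames is a simplification: it counts a subdomain such as a site's own CDN as third-party.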
Classification
Each cookie name is matched against a regex database covering GA, GA4, Google Ads, Meta Pixel, TikTok, LinkedIn Insight, Microsoft UET, Hotjar, Clarity, Shopify, WooCommerce, Stripe, PayPal, Klaviyo and Mailchimp. Unknowns get confidence 0 and surface on the My cookies page for manual review.
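The matching step can be pictured with a minimal TypeScript sketch. The CookieRule shape, the handful of patterns, and the 0 to 1 confidence scale are illustrative assumptions; the real regex database is not reproduced here and covers many more vendors.

```typescript
// Hypothetical rule shape and sample patterns; not the scanner's actual database.
interface CookieRule {
  vendor: string;
  pattern: RegExp;
  confidence: number; // assumed scale: 0 (unknown) to 1 (certain)
}

interface Classification {
  cookieName: string;
  vendor: string | null;
  confidence: number;
}

const SAMPLE_RULES: CookieRule[] = [
  { vendor: "Google Analytics", pattern: /^_ga$|^_gid$|^_gat/, confidence: 1 },
  { vendor: "GA4", pattern: /^_ga_[A-Z0-9]+$/, confidence: 1 },
  { vendor: "Meta Pixel", pattern: /^_fbp$|^_fbc$/, confidence: 1 },
  { vendor: "Hotjar", pattern: /^_hj/, confidence: 0.9 },
  { vendor: "Clarity", pattern: /^_clck$|^_clsk$/, confidence: 1 },
];

/** Return the first matching rule's vendor, or confidence 0 when nothing matches. */
function classifyCookie(cookieName: string, rules: CookieRule[] = SAMPLE_RULES): Classification {
  for (const rule of rules) {
    if (rule.pattern.test(cookieName)) {
      return { cookieName, vendor: rule.vendor, confidence: rule.confidence };
    }
  }
  // Unknown cookies get confidence 0 so they can be queued for manual review.
  return { cookieName, vendor: null, confidence: 0 };
}
```

In this sketch the first matching rule wins, so more specific patterns should be listed before broader ones if they could overlap.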
Scheduled scans
For self-hosted setups, run the CLI from a cron job:
pnpm --filter @ach/scanner scan <websiteId> --max=10 --sitemap