Cookie scanner

How the Playwright crawler works and how to classify what it finds.

Crawl strategy

The scanner launches headless Chromium with our user-agent, visits the homepage with waitUntil: networkidle, and records every cookie, third-party script, and storage key it finds. Optionally, it also fetches the sitemap and visits up to N additional URLs.
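
To make the optional sitemap step concrete, here is a minimal sketch of how a crawl plan could be assembled: the homepage first, then at most N URLs taken from the sitemap. The function name buildCrawlPlan, its parameters, the same-origin filter, and the deduplication are all assumptions made for illustration; the scanner's actual selection logic is not described here and may differ.

```typescript
// Hypothetical helper, not the scanner's real implementation.
// Returns the homepage followed by at most `maxExtra` additional
// same-origin URLs taken from the <loc> entries of a sitemap.
function buildCrawlPlan(
  homepage: string,
  sitemapXml: string,
  maxExtra: number,
): string[] {
  const home = new URL(homepage);
  const plan: string[] = [home.href];
  const seen = new Set<string>(plan);

  // A regex keeps the sketch short; an XML parser is the safer choice
  // for real sitemaps (CDATA, namespaces, sitemap index files).
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;

  while ((match = locPattern.exec(sitemapXml)) !== null) {
    // Stop once the cap on additional URLs is reached.
    if (plan.length - 1 >= maxExtra) break;

    let candidate: URL;
    try {
      candidate = new URL(match[1]);
    } catch {
      continue; // skip malformed entries
    }

    // Assumption: stay on the scanned site and skip duplicates,
    // including the homepage itself.
    if (candidate.origin !== home.origin || seen.has(candidate.href)) continue;

    seen.add(candidate.href);
    plan.push(candidate.href);
  }
  return plan;
}
```

Each URL in the returned list would then be visited in turn, with the homepage always first so that a scan without a sitemap still produces a result.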

Classification

Each cookie name is matched against a regex database covering GA, GA4, Google Ads, Meta Pixel, TikTok, LinkedIn Insight, Microsoft UET, Hotjar, Clarity, Shopify, WooCommerce, Stripe, PayPal, Klaviyo and Mailchimp. Cookies that match no pattern get a confidence of 0 and appear on the My cookies page for manual review.
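
The matching step can be sketched as a first-match lookup over a list of patterns. The rule shape, the handful of patterns, and the confidence value of 1 for a match are illustrative assumptions; the real regex database is larger and its schema is not shown in this document. Only the fallback to confidence 0 for unknown names comes from the description above.

```typescript
// Hypothetical rule shape; the scanner's real database schema may differ.
interface CookieRule {
  pattern: RegExp;
  vendor: string;
  confidence: number; // assumption: 1 for an exact known name
}

interface Classification {
  name: string;
  vendor: string | null; // null when no rule matched
  confidence: number;
}

// A tiny illustrative subset of well-known cookie names.
const RULES: CookieRule[] = [
  { pattern: /^_ga$/, vendor: "GA", confidence: 1 },
  { pattern: /^_ga_[A-Z0-9]+$/, vendor: "GA4", confidence: 1 },
  { pattern: /^_gcl_au$/, vendor: "Google Ads", confidence: 1 },
  { pattern: /^_fbp$/, vendor: "Meta Pixel", confidence: 1 },
  { pattern: /^_hjSession(User)?_\d+$/, vendor: "Hotjar", confidence: 1 },
  { pattern: /^_cl(ck|sk)$/, vendor: "Clarity", confidence: 1 },
];

// Returns the first matching rule's vendor, or confidence 0 for unknowns
// so they can be queued for manual review.
function classifyCookie(name: string): Classification {
  for (const rule of RULES) {
    if (rule.pattern.test(name)) {
      return { name, vendor: rule.vendor, confidence: rule.confidence };
    }
  }
  return { name, vendor: null, confidence: 0 };
}
```

Anchoring every pattern with ^ and $ avoids accidental matches on first-party cookies that merely contain a known name as a substring.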

Scheduled scans

For self-hosted setups, run the CLI from a cron job:

pnpm --filter @ach/scanner scan <websiteId> --max=10 --sitemap