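Server access logs are the most reliable, lowest-friction way to capture AI crawler activity if you have direct access to your server. GA4 misses this traffic almost entirely: it relies on a JavaScript tag that most crawlers never execute, and it automatically excludes known bots. Access logs, by contrast, record every request that reaches your web server, including every AI bot hit. One caveat: requests a CDN serves from its cache never reach your origin, so if you run behind a CDN you may also need its logs.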
What you need
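Nginx writes access logs in the "combined" format by default, and many Apache installations (the Debian and Ubuntu packages, for example) are configured the same way. DarkTraffiK accepts logs from either server directly. If your Apache config still uses the "common" format, switch it to "combined", because "common" omits the Referrer and User-Agent fields. In the combined format, the fields we need are already present: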
- User-Agent: identifies the crawler
- Request path: identifies the page crawled
- Timestamp: when the crawl happened
- Referrer: occasionally useful for tracing crawl origin
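The sketch below shows where those fields sit in a raw line. It splits combined-format lines on the double quotes that wrap the request, referrer and User-Agent. The function name `log_fields` is hypothetical. The sketch also assumes that no quoted field contains an embedded double quote.

```shell
# Hypothetical helper: split combined-format log lines on double quotes and
# print timestamp, request path, referrer and User-Agent, tab-separated.
# Assumes no quoted field contains an embedded double quote: nginx escapes
# one as \x22, which is safe, but Apache's \" escaping would shift the fields.
log_fields() {
  awk -F'"' '{
    # Field 1: client IP, ident, user and the [timestamp]
    s = index($1, "["); e = index($1, "]")
    ts = (s > 0 && e > s) ? substr($1, s + 1, e - s - 1) : ""
    # Field 2: the request line, e.g. GET /pricing HTTP/1.1
    n = split($2, req, " ")
    path = (n >= 2) ? req[2] : ""
    # Fields 4 and 6: referrer and User-Agent
    printf "%s\t%s\t%s\t%s\n", ts, path, $4, $6
  }' "$@"
}
```

For example, `log_fields /var/log/nginx/access.log | head` prints the first few parsed lines.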
The AI crawler registry
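User-Agent strings for the major AI crawlers are listed below. Vendors revise these periodically, and both version numbers and surrounding text change. Match on the product token, such as GPTBot, rather than on the full string:
- GPTBot: Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)
- OAI-SearchBot: OAI-SearchBot (search; +https://openai.com/searchbot)
- ClaudeBot: Mozilla/5.0 (compatible; ClaudeBot/1.0; +https://anthropic.com/en/claude-bot)
- PerplexityBot: Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://docs.perplexity.ai/docs/perplexitybot)
- Google-Extended: has no User-Agent string of its own. It is a robots.txt token that controls whether Google may use your content for Gemini. The crawling itself happens under Google's existing user agents such as Googlebot, so Google-Extended never appears in access logs.
- GrokBot: Mozilla/5.0 (compatible; Grok/1.0; +https://x.ai/grok)
- CCBot: CCBot/2.0 (https://commoncrawl.org/faq/)
- Bytespider: Bytespider (compatible; +https://zhanzhang.toutiao.com/)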
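Because the exact strings drift, a classifier keyed on product tokens is sturdier than exact string matching. Here is a minimal POSIX shell sketch. `classify_ua` is a hypothetical helper, and the Grok pattern is based only on the registry string above.

```shell
# Hypothetical helper: print a crawler name for one User-Agent string by
# looking for the vendor's product token rather than the full string, so a
# version bump still matches. Matching is case-sensitive. Google-Extended is
# absent because it never appears in a User-Agent header.
classify_ua() {
  case "$1" in
    *GPTBot*)        echo "GPTBot" ;;
    *OAI-SearchBot*) echo "OAI-SearchBot" ;;
    *ClaudeBot*)     echo "ClaudeBot" ;;
    *PerplexityBot*) echo "PerplexityBot" ;;
    *Grok*)          echo "GrokBot" ;;   # assumption: token taken from the registry string above
    *CCBot*)         echo "CCBot" ;;
    *Bytespider*)    echo "Bytespider" ;;
    *)               echo "other" ;;
  esac
}

# Usage: classify_ua 'CCBot/2.0 (https://commoncrawl.org/faq/)'   # prints CCBot
```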
Filtering access logs for AI crawlers
A simple shell or SQL filter against your log file does most of the work:
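```shell
grep -iE "GPTBot|OAI-SearchBot|ClaudeBot|PerplexityBot|Grok|CCBot|Bytespider" /var/log/nginx/access.log
```
Google-Extended is left out of the pattern on purpose: it is a robots.txt token, not a User-Agent, so it never appears in access logs.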
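A plain grep matches the pattern anywhere in the line, so a human visit to a URL like /blog/gptbot-guide would be counted as a crawler hit. The sketch below tests only the User-Agent field instead, under the same combined-format assumption. `ai_hits` is a hypothetical name.

```shell
# Hypothetical helper: print only the combined-format lines whose User-Agent
# (the sixth field when the line is split on double quotes) contains an AI
# crawler token. Lower-casing mirrors grep -i. Assumes no embedded quotes.
ai_hits() {
  awk -F'"' 'tolower($6) ~ /gptbot|oai-searchbot|claudebot|perplexitybot|grok|ccbot|bytespider/' "$@"
}

# Usage:
#   ai_hits /var/log/nginx/access.log
#   # include rotated, gzipped logs (GNU gzip -f passes plain files through)
#   gzip -dcf /var/log/nginx/access.log* | ai_hits
```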
In DarkTraffiK you can drag and drop the log file directly — the parser handles deduplication and normalises the output.
What the data tells you
Once you have AI crawler data from server logs, three questions are immediately answerable:
- Which engines are crawling you? Some sites find that ClaudeBot is their most active AI crawler; others see more GPTBot. The mix affects which citation opportunities are most available.
- Which pages are they reading? High crawl volume on a specific page is a useful, if indirect, signal that the engine considers that content relevant to user queries in that topic area.
- How has crawl volume changed month over month? Rising crawl volume on a topic can precede rising citations, while falling volume may mean the content has become stale relative to newer sources.
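The sketch below answers all three questions from the shell, whether before loading the log into DarkTraffiK or instead of it. The function names and the token list are assumptions chosen for illustration. Each function reads combined-format lines from files or stdin, and counts are raw requests with no deduplication.

```shell
# Hypothetical helpers for the three questions. Each keeps lines whose
# User-Agent field (the sixth field when split on double quotes) contains a
# crawler token, then prints tab-separated raw request counts.
# Token matching is case-sensitive; assumes no embedded quotes in fields.
AI_TOKENS='GPTBot OAI-SearchBot ClaudeBot PerplexityBot Grok CCBot Bytespider'

# 1. Which engines are crawling you? Prints: count, crawler (busiest first)
hits_by_crawler() {
  awk -F'"' -v tokens="$AI_TOKENS" '
    BEGIN { n = split(tokens, tok, " ") }
    {
      for (i = 1; i <= n; i++)
        if (index($6, tok[i])) { count[tok[i]]++; break }
    }
    END { for (b in count) printf "%d\t%s\n", count[b], b }
  ' "$@" | sort -k1,1nr -k2,2
}

# 2. Which pages are they reading? Prints: count, crawler, path
top_pages() {
  awk -F'"' -v tokens="$AI_TOKENS" '
    BEGIN { n = split(tokens, tok, " ") }
    {
      for (i = 1; i <= n; i++)
        if (index($6, tok[i])) {
          split($2, req, " ")   # request line: METHOD PATH PROTOCOL
          count[tok[i] "\t" req[2]]++
          break
        }
    }
    END { for (k in count) printf "%d\t%s\n", count[k], k }
  ' "$@" | sort -k1,1nr
}

# 3. How has volume changed month over month? Prints: YYYY-MM, crawler, count
hits_by_month() {
  awk -F'"' -v tokens="$AI_TOKENS" '
    BEGIN {
      n = split(tokens, tok, " ")
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (m = 1; m <= 12; m++) mon[names[m]] = sprintf("%02d", m)
    }
    {
      for (i = 1; i <= n; i++)
        if (index($6, tok[i])) {
          # Timestamp in field 1 looks like [10/Oct/2024:13:55:36 +0000]
          split(substr($1, index($1, "[") + 1), d, "[/:]")
          count[d[3] "-" mon[d[2]] "\t" tok[i]]++
          break
        }
    }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' "$@" | sort
}
```

For example, `top_pages /var/log/nginx/access.log | head -n 20` lists the 20 most-requested crawler and page pairs.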
Server log data is one path to Signal 1 in the DarkTraffiK three-signal framework. Paired with citation checks (Signal 2) and GA4 referral data (Signal 3), it gives you a view of your whole AI funnel that no single analytics tool provides on its own.