AI Crawler User Agents

What are AI crawler user agents?

An AI crawler user agent is the identifier an AI-related bot sends in the HTTP User-Agent header when it requests a page. Website owners use these names to report, allow, block, or rate-limit crawler traffic.

User-agent strings are helpful, but they are not proof of identity. Any client can send a fake header. Treat the user agent as a claim to investigate, then verify it against the source network, TLS and HTTP fingerprints, the routes requested, and request cadence.
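One way to check a claimed identity against source infrastructure is to compare the client IP with the address ranges the operator publishes. The following is a minimal sketch of that idea, not a definitive implementation. The bot name `ExampleBot`, the `PUBLISHED_RANGES` table, and the function name are hypothetical, and the networks are RFC 5737 documentation ranges used as placeholders; a real deployment would load each operator's current published list.

```python
import ipaddress

# Hypothetical published ranges, keyed by claimed bot name.
# The CIDRs below are RFC 5737 documentation networks (placeholders only).
PUBLISHED_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "198.51.100.0/25"],
}


def claim_matches_source(claimed_bot: str, remote_ip: str) -> bool:
    """Return True only if remote_ip is inside a range published for claimed_bot."""
    address = ipaddress.ip_address(remote_ip)
    networks = PUBLISHED_RANGES.get(claimed_bot, [])
    # An unknown bot name has no ranges, so the claim cannot be confirmed.
    return any(address in ipaddress.ip_network(cidr) for cidr in networks)
```

A request that claims a known name but fails this check is a candidate for the spoofing case described later in this article; fingerprints and request cadence still need to be checked separately.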

Common AI crawler and agent names

| User agent or bot name | Operator | Common purpose | Peakhour guidance |
|---|---|---|---|
| GPTBot | OpenAI | Training crawler for model improvement | Block or require explicit approval unless training use is acceptable |
| ChatGPT-User | OpenAI | Live retrieval for user requests | Often allow on public pages with route and rate controls |
| OAI-SearchBot | OpenAI | AI search and index crawling | Decide based on AI search visibility goals; monitor route depth |
| ChatGPT Operator | OpenAI | Agentic browsing or delegated actions | Treat as agent traffic; verify intent and restrict sensitive workflows |
| anthropic-ai | Anthropic | AI crawler associated with Anthropic systems | Block or control as training/retrieval traffic depending on route |
| ClaudeBot | Anthropic | Claude-related crawling | Block, rate-limit, or allow only after policy review |
| Claude-Web | Anthropic | Claude-related web access | Treat as AI retrieval or crawler traffic; verify behavior |
| PerplexityBot | Perplexity AI | AI search and answer retrieval | Monitor closely; allow or rate-limit only when it creates value |
| Google-Extended | Google | Signal for Google AI training/use controls | Manage separately from Googlebot search crawling |
| Google-CloudVertex | Google | Google AI or cloud-related crawling | Review route access and source before allowlisting |
| Applebot-Extended | Apple | Apple AI data-use control signal | Manage separately from Applebot search/discovery traffic |
| DuckAssistBot | DuckDuckGo | AI-assisted answer or search feature crawling | Allow, block, or rate-limit based on visibility goals |
| CCBot | Common Crawl | Public web dataset collection used by many AI projects | Often block or rate-limit unless dataset inclusion is desired |
| Bytespider | ByteDance | Search and AI-adjacent crawling | Rate-limit or block if aggressive or low-value |
| Meta-ExternalAgent | Meta | Meta external agent or crawler traffic | Review as AI-adjacent traffic; enforce route policy |
| Amazonbot | Amazon | Search, assistant, or commerce-related crawling | Separate useful discovery from price/catalogue extraction |
| MistralAI-User | Mistral AI | AI assistant or user-driven retrieval | Treat like live retrieval; allow only on approved public routes |
| LinerBot | LINER | AI or content-assistant crawling | Monitor and control by route and cadence |
| QualifiedBot | Qualified.com | AI or sales-assistant crawler traffic | Allow only if expected by the business |
| ICC Crawler | NICT | AI or research crawler traffic | Review value before allowing broad access |

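Two entries, Google-Extended and Applebot-Extended, are robots.txt control tokens rather than strings sent in the User-Agent header. They will not appear in request logs, so manage them in robots.txt.

Review this list regularly. AI providers may introduce new crawler names, split training and retrieval into separate bots, or change how their systems fetch pages.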
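As an illustration of how these names might be used for reporting, the sketch below maps a raw User-Agent header to a token and operator from the table above. It assumes a case-insensitive substring match is good enough for a first pass; `KNOWN_AI_AGENTS` holds only a subset of the table, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Subset of the table above: token -> operator.
KNOWN_AI_AGENTS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity AI",
    "CCBot": "Common Crawl",
    "Bytespider": "ByteDance",
    "Amazonbot": "Amazon",
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, operator) for the first known token found, else None."""
    lowered = user_agent.lower()
    for token, operator in KNOWN_AI_AGENTS.items():
        # Substring match: the token usually sits inside a longer UA string.
        if token.lower() in lowered:
            return token, operator
    return None
```

A match only records what the client claims to be. It says nothing about whether the claim is true.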

Which AI crawlers should be blocked?

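There is no universal answer. The right policy depends on the value exchange: what the crawler takes in content and origin load, compared with what it returns in visibility or other business value.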

Block or require approval when the crawler:

  • Collects content for model training without a commercial agreement
  • Ignores robots.txt
  • Requests expensive search, listing, API, or catalogue routes
  • Rotates through proxies to preserve crawl volume
  • Spoofs another crawler or a normal browser
  • Creates origin load or analytics distortion

Allow or rate-limit when the crawler:

  • Is verified and transparent
  • Supports search visibility or user-driven retrieval
  • Requests a small number of public pages
  • Follows robots.txt and crawl-rate expectations
  • Can be tied to business value
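Several of the criteria above depend on robots.txt, so it can help to confirm that a draft file says what you intend before publishing it. The sketch below uses Python's standard `urllib.robotparser` for that check. The rules and paths in `DRAFT_ROBOTS_TXT` are hypothetical examples, not recommended policy, and robots.txt remains advisory: a crawler that ignores it still has to be handled by enforcement controls.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft policy: block a training crawler entirely,
# keep a retrieval agent off expensive routes, allow everything else.
DRAFT_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /search
Disallow: /api/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(DRAFT_ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the draft robots.txt permits `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)
```

Calling `allowed("GPTBot", "/pricing")` against this draft returns `False`, while `allowed("ChatGPT-User", "/blog/post")` returns `True`.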

How should user-agent lists be used?

Use this list for reporting and first-pass policy, not as the only enforcement control.
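To make the distinction concrete, here is a minimal sketch in which the user-agent token supplies only a default action, and source verification and the requested route can override it. Everything named here is an assumption for illustration: the action labels, the `DEFAULT_ACTION` table, the `SENSITIVE_PREFIXES` list, and the `source_verified` flag, which is assumed to come from a separate infrastructure check.

```python
from typing import Optional

# Hypothetical first-pass defaults keyed by user-agent token.
DEFAULT_ACTION = {
    "GPTBot": "block",
    "ChatGPT-User": "allow",
    "PerplexityBot": "rate_limit",
}

# Hypothetical route prefixes treated as expensive or sensitive.
SENSITIVE_PREFIXES = ("/search", "/api/", "/checkout")


def first_pass_action(token: Optional[str], path: str, source_verified: bool) -> str:
    """Pick an action from the claimed token, then let other signals override it."""
    if token is None:
        return "allow"  # no AI crawler name claimed; other controls apply
    if not source_verified:
        return "challenge"  # claimed name without matching infrastructure
    if path.startswith(SENSITIVE_PREFIXES):
        return "block"  # verified, but the route is off-limits to crawlers
    # Known tokens use their default; unlisted tokens are rate-limited.
    return DEFAULT_ACTION.get(token, "rate_limit")
```

The ordering is the design choice: an unverified claim is challenged before the per-token default is consulted, so a spoofed name never inherits an allow rule.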

A practical workflow is:

  1. Detect known names in request logs.
  2. Group traffic by route, source network, status code, and request rate.
  3. Verify trusted crawler infrastructure where possible.
  4. Compare claimed identity with TLS, HTTP/2, and browser fingerprints.
  5. Apply route-aware allow, rate-limit, challenge, or block policies.
  6. Review outcomes against origin load, analytics, SEO, and AI visibility goals.
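The first two steps of this workflow can be sketched as a small aggregation over parsed log records. This is an illustrative outline only: the record layout `(ip, path, status, user_agent)`, the sample data, the three-token subset, and the choice to group IPv4 sources by /24 and IPv6 sources by /48 are all assumptions, and real logs would need their own parser.

```python
import ipaddress
from collections import Counter

# Subset of known tokens, for brevity.
AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Hypothetical, already-parsed records: (ip, path, status, user_agent).
SAMPLE_RECORDS = [
    ("192.0.2.10", "/search?q=a", 200, "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("192.0.2.77", "/search?q=b", 200, "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("198.51.100.5", "/products/42", 429, "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    ("203.0.113.9", "/", 200, "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"),
]


def match_token(user_agent):
    """Return the first known token found in the header, or None."""
    lowered = user_agent.lower()
    return next((t for t in AI_TOKENS if t.lower() in lowered), None)


def route_prefix(path):
    """Reduce a path to its first segment: '/search?q=a' becomes '/search'."""
    first = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return "/" + first


def source_network(ip):
    """Collapse an address to its /24 (IPv4) or /48 (IPv6) network."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def summarize(records):
    """Count requests per (token, route, source network, status)."""
    counts = Counter()
    for ip, path, status, user_agent in records:
        token = match_token(user_agent)
        if token is None:
            continue  # not a claimed AI crawler
        counts[(token, route_prefix(path), source_network(ip), status)] += 1
    return counts
```

Dividing each count by the length of the log window gives a rough request rate per group, which is the input the later verification and policy steps work from.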

For the detection process, see how to detect AI crawlers. For enforcement options, see how to block AI crawlers.

© PEAKHOUR.IO PTY LTD 2025   ABN 76 619 930 826    All rights reserved.