AI Crawler User Agents

What are AI crawler user agents?

An AI crawler user agent is the identifier an AI-related bot sends in the HTTP User-Agent header when it requests a page. Website owners use these names to report, allow, block, or rate-limit crawler traffic.

User-agent strings are helpful, but they are not proof of identity. Any client can send a fake header. Treat the user agent as a label to investigate, then verify it against the source network, TLS and HTTP fingerprints, route behavior, and request cadence.
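
One way to check a claimed identity against source infrastructure is to compare the client IP with the ranges the operator publishes. The sketch below is illustrative only: `PUBLISHED_RANGES`, `claimed_crawler`, and `verify_source` are hypothetical names, and the networks are reserved documentation ranges, not real operator ranges. In practice the ranges would be loaded from each operator's published list.

```python
import ipaddress

# Hypothetical ranges keyed by crawler name. These are reserved
# documentation networks used as placeholders, not real operator ranges.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def claimed_crawler(user_agent):
    """Return the first known crawler name found in the User-Agent, or None."""
    ua = user_agent.lower()
    for name in PUBLISHED_RANGES:
        if name.lower() in ua:
            return name
    return None


def verify_source(user_agent, remote_ip):
    """Return 'verified', 'mismatch', or 'unclaimed' for a request."""
    name = claimed_crawler(user_agent)
    if name is None:
        return "unclaimed"  # no known crawler name in the header
    ip = ipaddress.ip_address(remote_ip)
    for cidr in PUBLISHED_RANGES[name]:
        net = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if ip.version == net.version and ip in net:
            return "verified"
    return "mismatch"  # claims a crawler name but comes from another network
```

A `mismatch` result is a reason to investigate further, not a verdict on its own, since published ranges change and may be incomplete.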

Common AI crawler and agent names

User agent or bot name	Operator	Common purpose	Peakhour guidance
`GPTBot`	OpenAI	Training crawler for model improvement	Block or require explicit approval unless training use is acceptable
`ChatGPT-User`	OpenAI	Live retrieval for user requests	Often allow on public pages with route and rate controls
`OAI-SearchBot`	OpenAI	AI search and index crawling	Decide based on AI search visibility goals; monitor route depth
`ChatGPT Operator`	OpenAI	Agentic browsing or delegated actions	Treat as agent traffic; verify intent and restrict sensitive workflows
`anthropic-ai`	Anthropic	AI crawler associated with Anthropic systems	Block or control as training/retrieval traffic depending on route
`ClaudeBot`	Anthropic	Claude-related crawling	Block, rate-limit, or allow only after policy review
`Claude-Web`	Anthropic	Claude-related web access	Treat as AI retrieval or crawler traffic; verify behavior
`PerplexityBot`	Perplexity AI	AI search and answer retrieval	Monitor closely; allow or rate-limit only when it creates value
`Google-Extended`	Google	robots.txt control token for Google AI training and use; not sent as a separate request user agent	Manage separately from Googlebot search crawling
`Google-CloudVertexBot`	Google	Google AI or cloud-related crawling	Review route access and source before allowlisting
`Applebot-Extended`	Apple	robots.txt control token for Apple AI data use; does not crawl pages itself	Manage separately from Applebot search/discovery traffic
`DuckAssistBot`	DuckDuckGo	AI-assisted answer or search feature crawling	Allow, block, or rate-limit based on visibility goals
`CCBot`	Common Crawl	Public web dataset collection used by many AI projects	Often block or rate-limit unless dataset inclusion is desired
`Bytespider`	ByteDance	Search and AI-adjacent crawling	Rate-limit or block if aggressive or low-value
`Meta-ExternalAgent`	Meta	Meta external agent or crawler traffic	Review as AI-adjacent traffic; enforce route policy
`Amazonbot`	Amazon	Search, assistant, or commerce-related crawling	Separate useful discovery from price/catalogue extraction
`MistralAI-User`	Mistral AI	AI assistant or user-driven retrieval	Treat like live retrieval; allow only on approved public routes
`LinerBot`	LINER	AI or content-assistant crawling	Monitor and control by route and cadence
`QualifiedBot`	Qualified.com	AI or sales-assistant crawler traffic	Allow only if expected by the business
`ICC-Crawler`	NICT	AI or research crawler traffic	Review value before allowing broad access

This list should be reviewed regularly. AI providers may introduce new crawler names, split training and retrieval into separate bots, or change how their systems fetch pages.
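
Because the list changes, it helps to keep the names in one data structure that reporting code reads from. The following is a minimal sketch using a subset of the table above; `KNOWN_AI_AGENTS`, `label_user_agent`, and the short purpose labels are illustrative assumptions, not an official taxonomy.

```python
# Subset of the table above. The purpose labels are simplified
# assumptions for reporting, not definitions from the operators.
KNOWN_AI_AGENTS = {
    "GPTBot": ("OpenAI", "training"),
    "ChatGPT-User": ("OpenAI", "retrieval"),
    "OAI-SearchBot": ("OpenAI", "search"),
    "ClaudeBot": ("Anthropic", "crawler"),
    "PerplexityBot": ("Perplexity AI", "search"),
    "CCBot": ("Common Crawl", "dataset"),
    "Bytespider": ("ByteDance", "crawler"),
}


def label_user_agent(user_agent):
    """Return a label dict for a known AI crawler name, or None.

    Matching is a case-insensitive substring test, so this only
    reports what the client claims to be. It does not verify it.
    """
    ua = user_agent.lower()
    for token, (operator, purpose) in KNOWN_AI_AGENTS.items():
        if token.lower() in ua:
            return {"token": token, "operator": operator, "purpose": purpose}
    return None
```

Adding or retiring a crawler name then means editing one dictionary entry rather than changing matching logic.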

Which AI crawlers should be blocked?

There is no universal answer. The right policy depends on the value exchange.

Block or require approval when the crawler:

Collects content for model training without a commercial agreement
Ignores robots.txt
Requests expensive search, listing, API, or catalogue routes
Rotates through proxies to evade rate limits and sustain crawl volume
Spoofs another crawler or a normal browser
Creates origin load or analytics distortion

Allow or rate-limit when the crawler:

Is verified and transparent
Supports search visibility or user-driven retrieval
Requests a small number of public pages
Follows robots.txt and crawl-rate expectations
Can be tied to business value
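
A robots.txt policy is the usual first step for expressing these choices to crawlers that honor it. The sketch below uses Python's standard `urllib.robotparser` to check what a sample policy permits. The policy text, the paths, and the `allowed` helper are assumptions chosen for illustration, not recommended rules.

```python
from urllib.robotparser import RobotFileParser

# Sample policy (an assumption for illustration): block two
# training/dataset crawlers, keep live retrieval off expensive routes.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /search
Disallow: /api/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the sample policy lets `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "Mozilla/5.0"):
        print(agent, allowed(agent, "/blog/post"), allowed(agent, "/api/items"))
```

robots.txt is advisory. A crawler that ignores it still needs enforcement at the edge or origin, which is why the block criteria above include ignoring robots.txt.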

How should user-agent lists be used?

Use this list for reporting and first-pass policy, not as the only enforcement control.

A practical workflow is:

Detect known names in request logs.
Group traffic by route, source network, status code, and request rate.
Verify trusted crawler infrastructure where possible.
Compare claimed identity with TLS, HTTP/2, and browser fingerprints.
Apply route-aware allow, rate-limit, challenge, or block policies.
Review outcomes against origin load, analytics, SEO, and AI visibility goals.
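
The detection and grouping steps above can be sketched as a small log summary. The record schema (`agent`, `path`, `status`, `ts` in seconds) and the function names are hypothetical; real logs would need parsing into this shape first.

```python
from collections import defaultdict


def route_group(path):
    """Collapse a path to its first segment, e.g. /products/42 -> /products."""
    first = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return "/" + first


def summarize(records):
    """Group records by (agent, route group) and compute simple metrics.

    Each record is a dict with hypothetical keys: agent, path,
    status (int), and ts (timestamp in seconds).
    """
    stats = defaultdict(
        lambda: {"requests": 0, "errors": 0, "first": None, "last": None}
    )
    for rec in records:
        s = stats[(rec["agent"], route_group(rec["path"]))]
        s["requests"] += 1
        if rec["status"] >= 400:
            s["errors"] += 1
        ts = rec["ts"]
        s["first"] = ts if s["first"] is None else min(s["first"], ts)
        s["last"] = ts if s["last"] is None else max(s["last"], ts)

    summary = {}
    for key, s in stats.items():
        # Treat windows shorter than one minute as one minute so that
        # a single request does not produce an inflated rate.
        minutes = max((s["last"] - s["first"]) / 60.0, 1.0)
        summary[key] = {
            "requests": s["requests"],
            "errors": s["errors"],
            "per_minute": s["requests"] / minutes,
        }
    return summary
```

A summary like this shows which claimed crawlers concentrate on expensive routes or trigger many errors, which is the input a route-aware policy needs.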

For the detection process, see how to detect AI crawlers. For enforcement options, see how to block AI crawlers.
