How to defend against Account Takeovers
Learn about account takeover threats, protection strategies, and detection methods to secure your digital accounts and prevent unauthorised access.
Blocking AI crawlers means preventing or controlling automated requests from AI training crawlers, AI search crawlers, live retrieval bots, and agentic workflows. The right action is not always a hard block. Some AI traffic may help visibility or represent a user request; other traffic may copy content, overload the origin, or ignore site rules.
The practical approach is to define policy by crawler type, then enforce it with multiple controls.
Start with the business decision:
| Traffic type | Typical action |
|---|---|
| Approved search crawlers | Verify and allow |
| AI training crawlers | Block unless approved |
| AI search crawlers | Allow, block, or rate-limit based on visibility goals |
| Live retrieval crawlers | Often allow on public pages with rate controls |
| Unknown AI-like traffic | Challenge or rate-limit until verified |
| Spoofed or evasive scrapers | Block |
This prevents accidental overblocking. For example, a publisher may block training crawlers but allow live retrieval crawlers that bring citations or user-driven discovery.
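The policy table can also be kept as data, so that every enforcement layer reads the same decisions. The following is a minimal sketch in Python. The category keys, the `Action` enum, and the `POLICY` map are hypothetical names chosen for illustration, not a Peakhour API, and where the table offers a choice (for example "allow, block, or rate-limit") the sketch simply picks one option.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    RATE_LIMIT = "rate_limit"
    CHALLENGE = "challenge"


# Hypothetical policy map mirroring the table; adjust to the business decision.
POLICY = {
    "approved_search": Action.ALLOW,       # after verification
    "ai_training": Action.BLOCK,           # unless approved
    "ai_search": Action.RATE_LIMIT,        # one of allow/block/rate-limit
    "live_retrieval": Action.ALLOW,        # public pages, with rate controls
    "unknown_ai_like": Action.CHALLENGE,   # until verified
    "spoofed_or_evasive": Action.BLOCK,
}


def action_for(traffic_type: str) -> Action:
    """Return the configured action; unclassified traffic is challenged."""
    return POLICY.get(traffic_type, Action.CHALLENGE)
```

Defaulting unclassified traffic to a challenge is an assumption that follows the "Unknown AI-like traffic" row; a site could equally choose to rate-limit it.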
The robots.txt file is the most visible place to state crawler preferences. A simple block looks like this:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /
```
This is useful, but it is not a security control. robots.txt is advisory. Well-behaved crawlers may follow it; malicious or evasive crawlers can ignore it.
Use robots.txt to express preference, then enforce important policies at the edge, WAF, reverse proxy, or application layer.
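Before relying on a robots.txt file to express preference, it can help to check that the rules parse the way you intend. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate a shortened copy of the rules. The `is_allowed` helper, the `example.com` URL, and `SomeOtherBot` are illustrative assumptions. The check only shows what a compliant crawler would conclude; it enforces nothing.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A listed crawler is disallowed everywhere.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing"))
# A crawler that is not listed has no matching rule, so it is allowed.
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/pricing"))
```

The second call illustrates the limit of a name-based list: any crawler not named in the file is unaffected by it.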
Server-side user-agent rules can stop simple crawlers. For example, an Nginx rule can reject known names:
```nginx
if ($http_user_agent ~* "(GPTBot|ClaudeBot|anthropic-ai|PerplexityBot|CCBot|Bytespider)") {
    return 403;
}
```
This is easy to deploy and easy to bypass. A crawler can change its user-agent string, so user-agent blocks should be treated as a baseline control rather than the final defense.
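The bypass is easy to demonstrate. The sketch below reproduces the same case-insensitive name match in Python and applies it to two header values. Both user-agent strings are made up for illustration and are not the exact strings any operator publishes; `status_for` is a hypothetical helper.

```python
import re

# Same crawler names as the Nginx rule, matched case-insensitively.
BLOCKED_UA = re.compile(
    r"(GPTBot|ClaudeBot|anthropic-ai|PerplexityBot|CCBot|Bytespider)",
    re.IGNORECASE,
)


def status_for(user_agent: str) -> int:
    """Return 403 for a listed crawler name, 200 otherwise."""
    return 403 if BLOCKED_UA.search(user_agent) else 200


# Illustrative header values, not exact published strings.
honest = "Mozilla/5.0 (compatible; GPTBot/1.0)"
spoofed = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(status_for(honest))   # the self-identifying crawler is rejected
print(status_for(spoofed))  # the same client with a changed header passes
```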
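A stronger policy combines the user-agent with request evidence, such as whether the crawler is verified, its request rate, the route it targets, browser consistency, and proxy rotation. With that evidence, the response can be more precise: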
| Evidence | Response |
|---|---|
| Verified crawler, low rate, approved route | Allow |
| Known training crawler, no approval | Block |
| AI crawler on expensive search/API routes | Rate-limit |
| Unknown crawler with browser inconsistencies | Challenge |
| Spoofed crawler using proxy rotation | Block |
| User-driven retrieval on public pages | Allow with monitoring |
This avoids blanket rules that block valuable traffic and miss spoofed traffic at the same time.
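The evidence table can be read as an ordered set of rules. The sketch below encodes it in Python under stated assumptions: the `Evidence` fields, the rule order (evasion signals first, allow last), and the fallback to a challenge are all illustrative choices, not a documented Peakhour decision engine.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical per-request signals; all default to 'not observed'."""
    verified: bool = False
    low_rate: bool = False
    approved_route: bool = False
    training_crawler: bool = False
    approved_crawler: bool = False
    expensive_route: bool = False
    browser_inconsistent: bool = False
    spoofed: bool = False
    proxy_rotation: bool = False
    user_driven: bool = False
    public_page: bool = False


def decide(e: Evidence) -> str:
    """Map evidence to a response, checking the riskiest signals first."""
    if e.spoofed and e.proxy_rotation:
        return "block"
    if e.training_crawler and not e.approved_crawler:
        return "block"
    if e.expensive_route:
        return "rate_limit"
    if not e.verified and e.browser_inconsistent:
        return "challenge"
    if e.user_driven and e.public_page:
        return "allow_with_monitoring"
    if e.verified and e.low_rate and e.approved_route:
        return "allow"
    # Assumed fallback: anything unclassified is challenged until verified.
    return "challenge"
```

Checking block conditions before allow conditions means a claimed identity never outranks evidence of evasion, which is the point of combining signals.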
Rate limiting is useful when a crawler has some value but is requesting too much. Good rate limits can be route-aware, so that expensive routes get tighter limits than cheap ones.
Simple IP-based limits are not enough against residential proxy rotation. Better limits use session, fingerprint, route, and behavior signals as well as source IP.
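One way to picture this is a sliding-window limiter whose counters are keyed by a client identity and a route class rather than by IP alone. The sketch below is an in-memory illustration only. The route classes, the budgets in `ROUTE_LIMITS`, and the `client_key` preference order are assumptions, and a production limiter would need shared state and richer behavior signals.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-route budgets: (max requests, window in seconds).
ROUTE_LIMITS = {"search": (5, 60.0), "default": (60, 60.0)}


def client_key(session_id, fingerprint, ip):
    """Prefer signals that survive IP rotation; fall back to the source IP."""
    return session_id or fingerprint or ip


class RouteAwareLimiter:
    def __init__(self, limits=ROUTE_LIMITS):
        self.limits = limits
        # (client key, route class) -> timestamps of accepted requests
        self.hits = defaultdict(deque)

    def allow(self, key, route_class, now=None):
        """Return True if the request fits the route's budget."""
        now = time.monotonic() if now is None else now
        max_requests, window = self.limits.get(
            route_class, self.limits["default"]
        )
        q = self.hits[(key, route_class)]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= window:
            q.popleft()
        if len(q) >= max_requests:
            return False
        q.append(now)
        return True
```

Because the key is built from the session or fingerprint when one is available, a client that rotates through proxy IPs still draws from a single budget.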
AI crawlers often target high-value routes.
Apply stricter controls to routes where scraping creates commercial harm or origin cost. Some sites allow AI crawlers to see marketing pages while blocking product, price, and search routes.
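A route policy like this can be expressed as a prefix table. The sketch below picks the action for the longest matching path prefix. The prefixes and actions in `ROUTE_POLICY` are hypothetical examples of the "marketing pages allowed, product, price, and search routes blocked" pattern, and simple prefix matching is an assumption; real routes may need pattern or segment-aware matching.

```python
# Hypothetical path prefixes and actions for AI crawler traffic.
ROUTE_POLICY = [
    ("/products", "block"),
    ("/pricing", "block"),
    ("/search", "block"),
    ("/", "allow"),  # marketing and other public pages
]


def ai_crawler_action(path: str) -> str:
    """Return the action for the longest matching prefix."""
    matches = [
        (prefix, action)
        for prefix, action in ROUTE_POLICY
        if path.startswith(prefix)
    ]
    # With no match at all (for example an empty path), fail closed.
    _, action = max(matches, key=lambda m: len(m[0]), default=("", "block"))
    return action
```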
AI crawler names, operators, and behavior change quickly, so review robots.txt and the enforcement rules behind it regularly. Blocking AI crawlers is not a one-time rule. It is an ongoing governance and security process.
Peakhour's recommended model is to layer these controls rather than rely on robots.txt alone.
For the detection workflow, see how to detect AI crawlers. For names to watch, see AI crawler user agents.
© PEAKHOUR.IO PTY LTD 2025 ABN 76 619 930 826 All rights reserved.