Adam Cassar

Co-Founder

The word "bot" is often used as shorthand for unwanted automation: scripts trying to break into accounts, scrape content, or overwhelm websites. A large share of internet traffic does come from bad bots, but automated traffic is not automatically harmful. Some bots are part of how the web is discovered, monitored, and kept usable.

Effective bot management is not about blocking every automated request. It depends on accurate classification: separating good bots from bad bots, and recognising the "grey" bots that sit between them. That classification lets you apply controls that reduce risk without cutting off traffic that helps your site operate.

Good Bots: The Essential Workers of the Web

Good bots are automated programs that perform useful or necessary tasks. They usually identify themselves openly, typically through a recognisable User-Agent string, and respect the rules you set in your robots.txt file. Blocking them can damage search visibility, monitoring, or other business workflows.
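
To illustrate how a well-behaved bot interprets those rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the "ExampleBot" name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone stays out of /admin/ and waits 5 seconds
# between requests; a bot calling itself "ExampleBot" is excluded entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("Googlebot", "https://example.com/products/"))
    print(may_crawl("Googlebot", "https://example.com/admin/users"))
    print(may_crawl("ExampleBot", "https://example.com/products/"))
    print(parser.crawl_delay("Googlebot"))
```

Note that robots.txt is advisory. Good bots follow it voluntarily, which is one reason compliance is a useful signal when classifying traffic.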

Examples of Good Bots:

  • Search Engine Crawlers: Bots like Googlebot and Bingbot are the best-known good bots. They crawl and index your website's content, which is how your pages appear in search engine results. Blocking a crawler removes your pages from that search engine's results.
  • Performance Monitoring Bots: Monitoring services use these bots to check your website's uptime and performance from different locations around the world, and to alert you if your site goes down.
  • Copyright Bots: These bots scan the web for plagiarised content, helping to protect your intellectual property.

Management Strategy: Good bots should be identified and allowed to access your site freely. Because a User-Agent header is trivial to spoof, verify the claim first: a reverse DNS lookup on the source IP address, confirmed by a forward lookup of the returned hostname, shows whether a bot claiming to be Googlebot is actually coming from Google.
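
One common form of this check is forward-confirmed reverse DNS: resolve the IP address to a hostname, check that the hostname belongs to the expected domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is a minimal illustration, not a production implementation. The function name verify_crawler_ip and the suffix list are assumptions made for the example, and the suffixes should be checked against the search engine's current documentation. The default forward lookup, socket.gethostbyname_ex, only handles IPv4.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; confirm against Google's
# current documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_crawler_ip(ip, allowed_suffixes=GOOGLE_SUFFIXES,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    The lookup functions are injectable so the logic can be exercised
    without live DNS queries.
    """
    # Step 1: reverse lookup (IP -> hostname).
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must sit under one of the expected domains.
    if not hostname.lower().rstrip(".").endswith(allowed_suffixes):
        return False

    # Step 3: forward lookup (hostname -> IPs) must include the original IP.
    try:
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips
```

Calling verify_crawler_ip with only an IP address performs live DNS lookups, so in practice the result would normally be cached per IP rather than recomputed on every request.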

Bad Bots: The Malicious Actors

Bad bots are designed for malicious activity. They are a major reason bot management exists as a security function. These bots are deceptive, often hiding their identity and purpose, and they can be responsible for a wide range of costly and damaging activity.

Examples of Bad Bots:

  • Credential Stuffers: These bots test lists of stolen username and password pairs against login forms to carry out account takeover attacks.
  • Content and Price Scrapers: These bots steal your valuable content, product listings, and pricing data, often for use by competitors.
  • Spam Bots: These bots flood comment sections, forums, and contact forms with unwanted ads or malicious links.
  • Distributed Denial of Service (DDoS) Bots: These bots are part of a botnet used to overwhelm a website with traffic, causing it to slow down or crash.
  • Inventory Hoarding Bots: Common in e-commerce, these bots automatically add limited-edition products to shopping carts, keeping stock away from legitimate customers, often so it can be resold at a higher price (scalping).

Management Strategy: Bad bots need to be accurately identified and blocked as quickly as possible, ideally at the network edge before they consume your server resources.

Grey Bots: The Nuanced Category

Grey bots are not inherently malicious, but their behaviour can still cause problems. They often serve a legitimate purpose, but become an issue when they crawl too aggressively, consume excessive bandwidth or server resources, and slow the site down for real users.

Examples of Grey Bots:

  • Aggressive SEO Tools: Bots from marketing tools like Ahrefs, SEMrush, and Majestic crawl websites to gather data for backlink analysis and competitive research. They can be useful, but their crawling can also be heavy.
  • Partner and Aggregator Bots: These could be bots from partner companies or price comparison websites that need to access your data. The activity may be legitimate, but it still needs to be managed.
  • Feed Fetchers: These bots retrieve feeds and similar data for news aggregators and other applications. The purpose is usually legitimate, but frequent polling can add up to significant load.

Management Strategy: Grey bots require more than a simple allow or block rule. The best strategy is often to rate-limit or tarpit them.

  • Rate-Limiting: This caps how many requests a bot can make in a given time window. The bot can still access your site, but requests beyond the limit are rejected or deferred (for example with an HTTP 429 response), so it cannot overwhelm your servers.
  • Tarpitting: This deliberately delays responses to a specific bot, increasing the cost and time required to crawl your site and discouraging overly aggressive behaviour.
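
The two controls can be sketched as follows. The article does not prescribe an algorithm, so this example uses a token bucket, one common way to implement rate limiting, plus a simple linear delay for tarpitting. The class name, function name, and all parameter values are illustrative assumptions.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so behaviour can be tested
        self.last = clock()

    def allow(self):
        """Spend one token if available; return False when the bot is over its limit."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def tarpit_delay(recent_requests, base=0.5, cap=10.0):
    """Seconds to hold a response: grows with recent request volume, up to `cap`."""
    return min(cap, base * recent_requests)
```

A server would typically keep one bucket per bot identity (for example per verified operator or per IP range), answer with HTTP 429 when allow() returns False, and, for tarpitted bots, wait tarpit_delay(...) seconds before responding.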

By classifying incoming bot traffic and applying the right control for each category, organisations can block threats, manage resource consumption, and allow the useful automation the modern web depends on.
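
As a rough illustration of that classify-then-control flow, the sketch below maps each category to the control described in this article. The signal names and the decision rules are simplified assumptions for the example. Real classification draws on many more signals.

```python
from dataclasses import dataclass
from enum import Enum


class BotClass(Enum):
    GOOD = "good"
    GREY = "grey"
    BAD = "bad"


# Control per category, following the management strategies described above.
CONTROLS = {
    BotClass.GOOD: "allow",
    BotClass.GREY: "rate_limit",
    BotClass.BAD: "block",
}


@dataclass
class BotSignals:
    """Hypothetical inputs a bot management layer might collect per client."""
    identity_verified: bool    # e.g. passed a DNS verification check
    known_good_operator: bool  # operator is on your allowlist (search, monitoring)
    abusive_behaviour: bool    # e.g. credential stuffing or spam detected


def classify(signals: BotSignals) -> BotClass:
    # Assumption: abuse, or an identity that fails verification, means "bad".
    if signals.abusive_behaviour or not signals.identity_verified:
        return BotClass.BAD
    if signals.known_good_operator:
        return BotClass.GOOD
    # Verified and not abusive, but not essential: manage rather than block.
    return BotClass.GREY


def control_for(signals: BotSignals) -> str:
    """Return the control to apply: "allow", "rate_limit", or "block"."""
    return CONTROLS[classify(signals)]
```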