Adam Cassar

Co-Founder

Modern sophisticated bad bots often work around traditional security controls. They disrupt websites, mobile applications, and APIs. Malicious bot tactics include scraping user and pricing data, creating fake accounts, running advertising click fraud, exhausting online inventories, and taking websites offline with automated DDoS attacks.

About one-quarter of all website traffic in 2019 originated from bad bots, an increase of 18 percent over 2018. Advanced persistent bots (APBs) made up 75 percent of that bad bot traffic. They attempt to evade detection by cycling through random IP addresses, routing through anonymous or residential proxies, and changing the identity they present (their user agent). The industries hit hardest by bad bots in 2019 included financial services, education, ecommerce, government, media, and airlines.

“Bot attack campaigns have become big business for threat actors, and major organizations are now fighting to support legitimate users and prospects while keeping attackers out of online applications and services,” says Paula Musich, Research Director, Enterprise Management Associates.

Bots have moved from simple scripts to distributed networks of automated agents that can mimic human interactions with machine learning techniques. They can avoid detection by network security technologies that have not kept pace with the way automated agents now operate.

Reducing the damage from bad bots means using security countermeasures that detect automated traffic and make attacks uneconomic, not just visible.

Bot Countermeasure Best Practices:

The following bad bot countermeasure practices cover network security, machine learning, and behavioural analysis. The aim is to reduce the economic harm that malicious bots inflict on businesses and end-users.

Web Application Firewalls

Web Application Firewalls (WAFs) are a common first line of defence. A WAF filters out harmful Layer 7 web application (HTTP) traffic using rules or policies that protect organisations against Distributed Denial of Service (DDoS) bot attacks. WAFs also protect against cross-site request forgery (CSRF), cross-site scripting (XSS), file inclusion, and SQL injection attacks. A WAF typically acts as a reverse proxy in front of the servers it protects. It can be deployed as an appliance, server plug-in, or filter, and customised by application type or use case. WAF rules can be updated or changed based on the type of bot attack.
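
As a rough illustration of rule-based filtering, the sketch below checks a request against a handful of signatures. The rule names and regular expressions are hypothetical and deliberately simplistic; a production WAF decodes and normalises the request first and uses a far larger, maintained rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern


# Hypothetical signatures for illustration only, not a real rule set.
RULES = [
    Rule("sql_injection", re.compile(r"(?i)\bunion\b.+\bselect\b|\bor\b\s+1\s*=\s*1")),
    Rule("xss", re.compile(r"(?i)<\s*script\b")),
    Rule("path_traversal", re.compile(r"\.\./")),
]


def inspect_request(path: str, query: str, body: str = "") -> list[str]:
    """Return the name of every rule that the request matches."""
    payload = " ".join((path, query, body))
    return [rule.name for rule in RULES if rule.pattern.search(payload)]


if __name__ == "__main__":
    print(inspect_request("/search", "q=1 UNION SELECT password FROM users"))
    print(inspect_request("/products", "page=2"))
```

A request that matches any rule would then be blocked, logged, or challenged, depending on the policy attached to that rule.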

IP Tracking and Reputation

Sophisticated bots can be detected with network forensics by inspecting web traffic and assessing whether requests come from actual users or bad bots. Requests can be analysed using data sources including known Tor and proxy IP addresses, IP geo-location information, ISP information, and IP ownership records. Additional sources of real-time and near-real-time malicious IP threat data include network data, CERTs, MITRE and cooperating competitors.
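
A minimal sketch of an IP reputation lookup follows. The networks listed are reserved documentation ranges standing in for a real threat feed, and the score weights are arbitrary assumptions chosen only to show the idea of combining several data sources.

```python
import ipaddress

# Hypothetical feed entries; these are reserved documentation ranges.
BLOCKLIST = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.7/32")]
PROXY_EXITS = [ipaddress.ip_network("203.0.113.0/24")]


def reputation_score(ip: str) -> int:
    """Return a risk score for an IP address; higher means more suspicious."""
    addr = ipaddress.ip_address(ip)
    score = 0
    if any(addr in net for net in BLOCKLIST):
        score += 70  # assumed weight for a known-malicious source
    if any(addr in net for net in PROXY_EXITS):
        score += 30  # assumed weight for a known proxy or Tor exit
    return score
```

In practice the score would be one signal among several, since legitimate users also share proxies and recycled IP addresses.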

Client/Device Fingerprinting

Fingerprinting attempts to identify devices, including PCs, Internet of Things (IoT) devices, mobile devices and servers, using data attributes that create real-time risk profiles to stop bot attacks. Using web page access data, a bot detection fingerprinting engine generates unique fingerprints for each end-user device and checks them against bad bots that use evasion techniques, including dynamic IP addresses and anonymous web proxies.
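
The idea can be sketched as hashing a set of device attributes into a stable identifier and then watching how that identifier behaves across IP addresses. The attribute names, the `FingerprintTracker` class and the threshold of three IP addresses are all assumptions made for this example, not part of any particular product.

```python
import hashlib
import json
from collections import defaultdict


def device_fingerprint(attributes: dict[str, str]) -> str:
    """Hash device attributes into a fingerprint that ignores key order."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FingerprintTracker:
    """Flag a fingerprint once it has been seen from too many IP addresses."""

    def __init__(self, max_ips: int = 3):
        self.max_ips = max_ips
        self._ips = defaultdict(set)

    def observe(self, fingerprint: str, ip: str) -> bool:
        """Record a sighting; return True if the device now looks evasive."""
        self._ips[fingerprint].add(ip)
        return len(self._ips[fingerprint]) > self.max_ips
```

Because the fingerprint does not depend on the IP address, a bot that rotates through proxies still maps to the same identifier as long as its device attributes stay the same.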

Machine Learning

Artificial Intelligence (AI) and machine learning algorithms are increasingly used to analyse malicious bot activity and make mitigation recommendations using data from sources such as user activity history, behavioural patterns and metadata. Machine learning can use custom-tailored algorithms to target bots and iteratively process user data and identities to discern emerging bot attack patterns from very large amounts of real-time information.

Tarpitting

Tarpitting is a bot countermeasure that deliberately delays responses to traffic from suspect connections. It raises the financial and resource costs of a bot attack in order to discourage malicious actors. A tarpit can slow its responses to bot requests or cut off the attacking source IP address completely. More innovative tarpitting techniques require bad bots to solve computationally expensive maths challenges before they can access a resource or website, which slows down or stops bot activity.
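
One way to picture a computational challenge is a hash-based proof of work, sketched below. This is an assumption about how such a challenge could be built rather than a description of any specific product: the server issues a random nonce, the client must find a counter whose SHA-256 digest starts with a given number of zero bits, and the server verifies the answer with a single hash.

```python
import hashlib
import itertools


def verify_solution(nonce: str, counter: int, difficulty_bits: int) -> bool:
    """Check that sha256(nonce:counter) starts with difficulty_bits zero bits."""
    digest = hashlib.sha256(f"{nonce}:{counter}".encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - difficulty_bits) == 0


def solve_challenge(nonce: str, difficulty_bits: int) -> int:
    """Brute-force the smallest counter that satisfies the challenge."""
    for counter in itertools.count():
        if verify_solution(nonce, counter, difficulty_bits):
            return counter


if __name__ == "__main__":
    # Each extra difficulty bit roughly doubles the client's expected work.
    answer = solve_challenge("example-nonce", 12)
    print(answer, verify_solution("example-nonce", answer, 12))
```

The cost is asymmetric: verification is one hash for the server, while solving takes on the order of 2 to the power of `difficulty_bits` hashes for the client, which is negligible for one human visitor but adds up across millions of automated requests.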

User Behavior Analysis

User interaction behaviour and identifying characteristics on a web page or mobile app differ from the behaviour of an automated malicious bot. Factors such as number of pages visited per session, time spent on each web page or within a mobile app and repeat visit frequency all help differentiate authentic users from bad bots. Defeating bad bots using Behavior Analysis involves creating a user model for individual sites with historical visitor data, then checking for anomalies that may indicate bad bot activity.
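
A very small version of this idea is a z-score check against a site's historical baseline, sketched below. The metric (pages per session), the sample values and the threshold of three standard deviations are assumptions for illustration; real user models combine many features.

```python
from statistics import mean, stdev


def anomaly_score(history: list[float], observed: float) -> float:
    """Return how many standard deviations observed lies from the baseline."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical data points
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def looks_automated(history: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag a session whose metric deviates strongly from historical visitors."""
    return anomaly_score(history, observed) > threshold


if __name__ == "__main__":
    pages_per_session = [4, 5, 6, 5, 4, 6, 5]  # hypothetical historical data
    print(looks_automated(pages_per_session, 5))
    print(looks_automated(pages_per_session, 60))
```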

Intent-based Deep Behavior Analysis (IDBA)

Intent-based Deep Behavior Analysis (IDBA) analyses behaviour at the level of user intent, rather than at the level of individual interactions as conventional behaviour analysis does. IDBA consists of intent encoding, intent analysis, and adaptive learning. It also employs machine learning techniques to detect bad bots that emulate on-site human interactions. Bad bot mitigation techniques include limiting attempts on login pages, web authentication pages and API call authentication pages.

Rate Limiting

Rate Limiting mitigates bad bots and DDoS attacks by restricting the amount of incoming traffic that specific applications and API endpoints accept, using pre-defined policies. Requests to web applications, API endpoints, and login pages can be blocked, with separate limits for GET and POST requests where needed, when a client, an IP address, or an IP and user-agent pair violates a Rate Limiting rule. Rate Limiting policies that restrict repeated image or digital downloads can also protect intellectual property from scraping.
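
The sketch below shows one common way to implement such a rule, a sliding-window counter keyed by IP address and user agent. The class name and the limits are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow; a real deployment would usually keep the counters in a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, ip: str, user_agent: str, now: float) -> bool:
        """Return True if the request is within the limit, else False."""
        hits = self._hits[(ip, user_agent)]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A rejected request would typically receive an HTTP 429 (Too Many Requests) response, and different limiter instances can be attached to different endpoints, for example a stricter one for login pages.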

JavaScript Injection

JavaScript Injection techniques can help mitigate bad bot attacks in several ways. Scripts can be placed into web applications that “fingerprint” a user’s browser to distinguish humans from bad bots emulating “human-like” mouse movements, keystrokes or clicks. Fingerprinting detection may also involve user agent identification, HTML5 canvas and audio fingerprinting, and protocol-level fingerprinting with TLS and HTTP/2. JavaScript combined with browser cookies can also be used to identify anomalous behaviour from unwanted traffic or bad bots as it trends over time.

Anycast DDoS Mitigation

Anycast is an IP addressing method that routes incoming traffic requests to the nearest location or “node.” Using Anycast for selective routing makes a network more resilient against DDoS attacks by spreading high traffic volumes across multiple servers and data centres. This prevents network resources from becoming overwhelmed with malicious or irrelevant traffic.

Alternative Content Serving

Serving Alternate and Cached Content when a bad bot is detected gives organisations a way to mislead bots without blocking them altogether. For instance, e-commerce sites may fool price scraping bots by serving alternative web pages that look like legitimate pages but with higher prices. Serving Cached Content when a bot is detected also minimises load on servers without affecting site performance.

Challenges

Requests from suspected bots can be redirected to Challenges or puzzles such as a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to help distinguish a bad bot from a human. Online puzzles, such as letter matching, are easy for humans to solve but difficult for automated bots. reCAPTCHA, offered free by Google, is an advanced version of the CAPTCHA puzzle that requires users to identify text from real-world images such as street address signs, printed books or paper newspapers.

Final Thoughts

Bad bots hijack user accounts, create fake accounts, scrape websites for data and personal information, flood websites with traffic through automated distributed denial of service attacks and attack public-facing APIs using constantly changing techniques. They hide behind dynamic IP addresses, change their attack signatures, mimic human behaviours, and take over vast networks of hosts and IoT devices, creating zombie machines that distribute malware across the internet. Countermeasures ranging from Web Application Firewalls to sophisticated Machine Learning algorithms form an organisation's primary line of defence against bad bots.