<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Peakhour.IO - Anomaly Detection</title><link href="https://www.peakhour.io/" rel="alternate"></link><link href="https://www.peakhour.io/feeds/tag/anomaly-detection.atom.xml" rel="self"></link><id>https://www.peakhour.io/</id><updated>2023-11-10T00:00:00+11:00</updated><entry><title>Dive into CVSS Scores</title><link href="https://www.peakhour.io/blog/confluence-cvss-vectors/" rel="alternate"></link><published>2023-11-10T00:00:00+11:00</published><updated>2023-11-10T00:00:00+11:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-11-10:/blog/confluence-cvss-vectors/</id><summary type="html">&lt;p&gt;Understand CVSS by examining the Atlassian CVE-2023-22515 and CVE-2023-22518.&lt;/p&gt;</summary><content type="html">&lt;h3&gt;Understanding CVSS through Atlassian Confluence Vulnerabilities&lt;/h3&gt;
&lt;p&gt;The Common Vulnerability Scoring System (CVSS) gives security teams a shared way to rate the severity of software vulnerabilities. It does not predict risk on its own; it describes the characteristics of a specific security flaw. CVSS uses three metric groups: Base, Temporal, and Environmental. The result is a score from 0.0 to 10.0, accompanied by a vector string that records the metric values behind the score.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Base Metrics&lt;/strong&gt; describe the inherent aspects of a vulnerability, including how it can be exploited and its potential system impact.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temporal Metrics&lt;/strong&gt; change over time, reflecting current exploitability and available mitigations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environmental Metrics&lt;/strong&gt; account for the specific environment where the vulnerability exists, tailoring the score to the affected organisation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href="https://nvd.nist.gov/vuln-metrics/cvss"&gt;National Vulnerability Database (NVD)&lt;/a&gt; utilises CVSS to assign base scores and provides tools for calculating Temporal and Environmental scores.&lt;/p&gt;
&lt;h4&gt;Atlassian Confluence Vulnerability Analysis&lt;/h4&gt;
&lt;p&gt;Two Atlassian Confluence vulnerabilities show why the vector matters as much as the headline score:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CVE-2023-22515&lt;/strong&gt; is a critical flaw with a base score of 10.0. It is exploitable over the network with low complexity, requires no privileges, and needs no user interaction, so exposure is not limited to attackers with local access. Its changed scope and high impact on confidentiality, integrity, and availability make it a vulnerability that needs immediate attention.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CVE-2023-22518&lt;/strong&gt; shares many similarities with CVE-2023-22515, including a critical base score of 10.0. It can also be exploited remotely without privileges or user interaction, and with low complexity. Its impact on the system's confidentiality, integrity, and availability is high, allowing attackers to gain complete control and shut down the affected resources.&lt;/p&gt;
&lt;p&gt;Both CVE-2023-22515 and CVE-2023-22518 are critical vulnerabilities that demand urgent remediation. Understanding their CVSS vectors helps prioritise the security response and the mitigations needed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CVE-2023-22515&lt;/strong&gt; carries a CVSS score of 10 because it is remotely exploitable, easy to execute, and does not require privileges or user interaction.&lt;/p&gt;
&lt;h5&gt;CVSS Vector for CVE-2023-22515&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Base Score:&lt;/strong&gt; 10.0 (Critical)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vector:&lt;/strong&gt; CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This vector indicates:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Attack Vector (AV): Network (N)&lt;/strong&gt; - The vulnerability is remotely exploitable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Attack Complexity (AC): Low (L)&lt;/strong&gt; - It is easy to exploit without major obstacles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Privileges Required (PR): None (N)&lt;/strong&gt; - No special access is needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Interaction (UI): None (N)&lt;/strong&gt; - It can be exploited without user involvement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scope (S): Changed (C)&lt;/strong&gt; - The impact extends beyond the initial target.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confidentiality, Integrity, Availability (C/I/A): High (H)&lt;/strong&gt; - There is a complete loss of confidentiality, integrity, and availability.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Atlassian's maximum rating of 10.0 for CVE-2023-22515 reflects its critical nature and the need for immediate action.&lt;/p&gt;
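&lt;p&gt;The breakdown above can be reproduced mechanically, because a CVSS v3.x vector is a version prefix followed by slash-separated metric:value pairs. The Python sketch below is illustrative rather than an official parser: the METRICS table and the parse_cvss_vector name are our own, and it only understands the eight base metrics, so a vector with temporal or environmental metrics appended would raise a KeyError.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Base metric abbreviations and value names for CVSS v3.x vectors.
METRICS = {
    "AV": ("Attack Vector", {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"}),
    "AC": ("Attack Complexity", {"L": "Low", "H": "High"}),
    "PR": ("Privileges Required", {"N": "None", "L": "Low", "H": "High"}),
    "UI": ("User Interaction", {"N": "None", "R": "Required"}),
    "S": ("Scope", {"U": "Unchanged", "C": "Changed"}),
    "C": ("Confidentiality", {"N": "None", "L": "Low", "H": "High"}),
    "I": ("Integrity", {"N": "None", "L": "Low", "H": "High"}),
    "A": ("Availability", {"N": "None", "L": "Low", "H": "High"}),
}


def parse_cvss_vector(vector):
    """Return {metric name: value name} for the base metrics in a CVSS v3.x vector."""
    prefix, _, body = vector.partition("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError("expected a CVSS v3.x vector, got " + repr(prefix))
    parsed = {}
    for part in body.split("/"):
        key, _, value = part.partition(":")
        name, values = METRICS[key]  # KeyError for metrics this sketch does not cover
        parsed[name] = values[value]
    return parsed


vector = "CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"
for name, value in parse_cvss_vector(vector).items():
    print(f"{name}: {value}")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running it on the CVE-2023-22515 vector prints one line per metric, from "Attack Vector: Network" through "Availability: High", matching the list above.&lt;/p&gt;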
&lt;p&gt;&lt;strong&gt;CVE-2023-22518&lt;/strong&gt; has the same CVSS score of 10, with similar impact across confidentiality, integrity, and availability.&lt;/p&gt;
&lt;h5&gt;CVSS Vector for CVE-2023-22518&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Base Score:&lt;/strong&gt; 10.0 (Critical)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vector:&lt;/strong&gt; CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This vector means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Attack Vector (AV): Network (N)&lt;/strong&gt; - Exploitable remotely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Attack Complexity (AC): Low (L)&lt;/strong&gt; - Easy to exploit with minimal barriers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Privileges Required (PR): None (N)&lt;/strong&gt; - No user privileges required.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Interaction (UI): None (N)&lt;/strong&gt; - No need for user action.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scope (S): Changed (C)&lt;/strong&gt; - Broad impact beyond the initial system.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confidentiality, Integrity, Availability (C/I/A): High (H)&lt;/strong&gt; - Complete compromise of the system's security.&lt;/li&gt;
&lt;/ol&gt;
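&lt;p&gt;Both vectors produce the same 10.0 because the base score is a fixed function of these eight metrics. The Python sketch below implements the CVSS v3.1 base score equations as a way to check that arithmetic; the weights come from the v3.1 specification, and the function names are hypothetical. The vectors above are labelled CVSS:3.0, which uses the same base equations; v3.1 mainly redefined the Roundup function to avoid floating-point errors, and that change does not affect these vectors.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math

# Base metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """CVSS v3.1 Roundup: the smallest one-decimal number equal to or above value."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def base_score(vector):
    """Return the CVSS v3.1 base score for a vector string such as the ones above."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score, from the confidentiality/integrity/availability weights.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if 0 >= impact:  # no impact means a base score of 0
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))  # 10.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With scope changed and high confidentiality, integrity, and availability impact, the raw result is about 10.7 before it is capped at 10, which is why neither CVE can score any higher.&lt;/p&gt;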
&lt;p&gt;Understanding the CVSS scores for these vulnerabilities helps teams prioritise their security response. For a full breakdown and history of CVSS, see &lt;a href="https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System"&gt;Wikipedia&lt;/a&gt;. More detailed information on CVSS can also be found in &lt;a href="https://www.first.org/cvss/"&gt;FIRST's official CVSS documentation&lt;/a&gt;.&lt;/p&gt;</content><category term="Interest"></category><category term="Threat Detection"></category><category term="DevSecOps"></category><category term="Application Security"></category><category term="Anomaly Detection"></category><category term="Credential Stuffing"></category><category term="Core Web Vitals"></category></entry><entry><title>A Risk Based Approach To Vulnerability Scoring</title><link href="https://www.peakhour.io/blog/epss-explained/" rel="alternate"></link><published>2023-11-10T00:00:00+11:00</published><updated>2023-11-10T00:00:00+11:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-11-10:/blog/epss-explained/</id><summary type="html">&lt;p&gt;An in-depth exploration of EPSS, its data-driven approach to assessing cybersecurity threats, and how it complements CVSS.&lt;/p&gt;</summary><content type="html">&lt;p&gt;The Exploit Prediction Scoring System (EPSS) estimates the likelihood that a published CVE will be exploited in the wild. Its value is that it brings several signals into one risk score, instead of treating every vulnerability with the same CVSS severity as equally urgent. The main inputs are:&lt;/p&gt;
&lt;h3&gt;Data Sources of EPSS&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;MITRE’s CVE List&lt;/strong&gt;: EPSS scores only vulnerabilities that are "published" on this list.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Text-based “Tags”&lt;/strong&gt;: Extracted from CVE descriptions and related discussions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Publication Duration&lt;/strong&gt;: The time period since the CVE was published.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reference Count&lt;/strong&gt;: The number of references in the CVE entry.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Published Exploit Code&lt;/strong&gt;: Code from platforms such as Metasploit, ExploitDB, or GitHub.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security Scanners&lt;/strong&gt;: Data from security tools such as Jaeles and Nuclei.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CVSS v3 Vectors&lt;/strong&gt;: The CVSS v3 base metrics published in the National Vulnerability Database (NVD).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPE (vendor) Information&lt;/strong&gt;: Details about the vendors of the products involved, also from NVD.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ground Truth Data&lt;/strong&gt;: Real-world exploitation data from sources such as AlienVault.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;EPSS Model and Tools&lt;/h3&gt;
&lt;p&gt;The EPSS model version 2022.01.01 uses 1,164 variables and is based on gradient boosting, a machine learning technique. For a visual and interactive view of EPSS scores, the &lt;a href="https://holisticinfosec.shinyapps.io/epsscall/"&gt;EPSScall&lt;/a&gt; tool is useful. It provides historical data and graphs that make score movement easier to inspect.&lt;/p&gt;
&lt;h3&gt;The Drivers of EPSS Scores&lt;/h3&gt;
&lt;p&gt;To understand EPSS, it helps to look at which inputs carry the most weight. The variable importance graph shows the strongest contributors to the EPSS score.&lt;/p&gt;
&lt;p&gt;&lt;img alt="EPSS Variable Importance Graph" src="/static/images/blog/epss_variable_importance.png"&gt;&lt;/p&gt;
&lt;p&gt;Vendor data plays an outsized role in the scoring process. The graph shows how much weight each component has when estimating whether a vulnerability is likely to be exploited.&lt;/p&gt;
&lt;h2&gt;Why Does This Matter?&lt;/h2&gt;
&lt;p&gt;EPSS uses these data sources to predict exploit likelihood more directly than severity-only methods. By considering factors from the age of the CVE to real-world exploit instances, EPSS gives defenders a clearer view of which vulnerabilities are more likely to matter operationally. That makes patching and mitigation decisions easier to prioritise when resources are limited.&lt;/p&gt;
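&lt;p&gt;One way to act on this is to sort findings by EPSS first and use CVSS only to break ties. The Python sketch below is a hypothetical triage helper, not part of EPSS itself: the Finding type, the triage_order function, and the 0.1 threshold are illustrative assumptions that each team would tune, and the example scores are made up.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Finding:
    cve: str
    epss: float  # EPSS probability of exploitation in the next 30 days (0.0 to 1.0)
    cvss: float  # CVSS base score (0.0 to 10.0)


def triage_order(findings, epss_threshold=0.1):
    """Split findings into an urgent queue and a later queue.

    Both queues are ordered by EPSS (highest first), with CVSS breaking ties.
    epss_threshold is an illustrative cut-off, not an official EPSS value.
    """
    ranked = sorted(findings, key=lambda f: (f.epss, f.cvss), reverse=True)
    urgent = [f for f in ranked if f.epss >= epss_threshold]
    later = [f for f in ranked if epss_threshold > f.epss]
    return urgent, later


# Hypothetical findings with made-up scores, for illustration only.
findings = [
    Finding("CVE-A", epss=0.02, cvss=9.8),
    Finding("CVE-B", epss=0.45, cvss=7.5),
    Finding("CVE-C", epss=0.30, cvss=9.1),
]
urgent, later = triage_order(findings)
print([f.cve for f in urgent])  # ['CVE-B', 'CVE-C']
print([f.cve for f in later])   # ['CVE-A']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here CVE-A has the highest CVSS score but lands in the later queue because its exploitation probability is low, which is the shift from severity to likelihood that EPSS is designed to support.&lt;/p&gt;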
&lt;p&gt;Understanding the components of EPSS also makes the score easier to interpret. It is not a single severity metric; it is a blend of several data points, each with different weight. Tools like EPSScall make those inputs easier to inspect when tuning a vulnerability management process.&lt;/p&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;EPSS is useful because it shifts vulnerability triage away from severity alone and towards exploit likelihood. Its use of multiple data sources and machine learning makes it a practical tool for defenders who need to decide what to fix first. Prioritising vulnerabilities this way does not replace judgement, but it gives teams a stronger starting point than CVSS alone.&lt;/p&gt;</content><category term="Interest"></category><category term="Threat Detection"></category><category term="Application Security"></category><category term="DevSecOps"></category><category term="Anomaly Detection"></category><category term="DDoS"></category><category term="Credential Stuffing"></category></entry><entry><title>When Bots Break Bad</title><link href="https://www.peakhour.io/blog/when-good-bots-break-bad/" rel="alternate"></link><published>2023-05-16T13:00:00+10:00</published><updated>2023-05-16T13:00:00+10:00</updated><author><name>Dan</name></author><id>tag:www.peakhour.io,2023-05-16:/blog/when-good-bots-break-bad/</id><summary type="html">&lt;p&gt;Even 'good' bots can end up abusing your site and impacting performance, learn why and how to stop it.&lt;/p&gt;</summary><content type="html">&lt;p&gt;Bots account for a large share of web traffic. Recent studies put automated traffic at nearly 50% of all internet
requests. Some bots are useful, such as search engine crawlers that index your site. Some are clearly harmful, such
as scrapers and sneaker bots. Others sit in a grey area, including backlink and marketing bots from services such as
Ahrefs and SEMrush. Even useful bots can create problems when they crawl too hard. This article looks at the main bot
types and how to manage them with robots.txt and &lt;a href="/learning/bots/bot-management/"&gt;bot management&lt;/a&gt; tools.&lt;/p&gt;
&lt;h2&gt;Understanding the Different Types of Bots&lt;/h2&gt;
&lt;h3&gt;'Good Bots'&lt;/h3&gt;
&lt;p&gt;Good bots perform legitimate work. Search engine crawlers like Googlebot and Bingbot index webpages so search results
can stay current and relevant. Other examples include uptime and performance monitoring bots.&lt;/p&gt;
&lt;h3&gt;'Bad Bots'&lt;/h3&gt;
&lt;p&gt;Bad bots harm websites, users, or both. Common examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Scraping content&lt;/strong&gt;, copying and repurposing data from websites.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sneaker bots&lt;/strong&gt;, automatically purchasing limited-edition products (like sneakers) before human users can.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spam bots&lt;/strong&gt;, posting unsolicited messages and advertisements in comment sections or forums.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vulnerability scanners&lt;/strong&gt;, requesting thousands of website URLs to find security vulnerabilities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Account takeover bots&lt;/strong&gt;, attempting to gain access to existing user or admin accounts through
  credential stuffing or brute-force attacks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;'Grey Bots'&lt;/h3&gt;
&lt;p&gt;Grey bots sit between good and bad. They often serve a useful purpose and may follow crawling directives in robots.txt,
but they can still cause problems when they crawl too aggressively. Common examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AhrefsBot: A backlink analysis bot used by Ahrefs, an SEO tool.&lt;/li&gt;
&lt;li&gt;SEMrushBot: A bot used by SEMrush, another popular SEO and digital marketing tool.&lt;/li&gt;
&lt;li&gt;MJ12bot: A bot used by Majestic, a service that provides backlink data and analysis.&lt;/li&gt;
&lt;li&gt;ScreamingFrog: An SEO analyser run from a local desktop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;When Grey Bots (and Even Good Bots) Go Bad&lt;/h2&gt;
&lt;p&gt;Left unattended, grey bots can create practical problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slow page loading times, which affect user experience.&lt;/li&gt;
&lt;li&gt;Strain on server resources, potentially causing crashes, downtime, and higher costs.&lt;/li&gt;
&lt;li&gt;Distorted website analytics, when bot traffic is mistaken for human traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Managing Grey Bots with Robots.txt&lt;/h2&gt;
&lt;p&gt;The robots.txt file is a simple text file that tells web crawlers which parts of your site they can or cannot access.
You can use it to manage bot behaviour and protect &lt;a href="/learning/performance/how-to-pass-core-web-vitals/"&gt;your website&lt;/a&gt; from aggressive crawling. Useful controls
include:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disallowing specific bots:&lt;/strong&gt; You can block specific bots from accessing your site by adding a "User-agent" and
"Disallow" directive to your robots.txt file. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;User-agent: AhrefsBot
Disallow: /
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Limiting crawl rate:&lt;/strong&gt; You can ask bots to slow down their crawling by adding a "Crawl-delay" directive:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;User-agent: SEMrushBot
Crawl-delay: 10
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
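&lt;p&gt;Before deploying rules like these, it is worth checking how a compliant crawler will read them. Python's standard-library urllib.robotparser can parse the two groups above and answer the same questions a well-behaved bot asks. This only models compliant crawlers; it cannot make a bot that ignores robots.txt obey it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;from urllib.robotparser import RobotFileParser

# The two robots.txt groups shown above, combined into one file.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SEMrushBot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AhrefsBot", "https://example.com/products"))   # False
print(parser.can_fetch("SEMrushBot", "https://example.com/products"))  # True
print(parser.crawl_delay("SEMrushBot"))                                # 10
print(parser.crawl_delay("Googlebot"))                                 # None
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Bots not named in any group, such as Googlebot here, fall through to the default of being allowed with no crawl delay, so add a "User-agent: *" group if you want a site-wide rule.&lt;/p&gt;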

&lt;p&gt;Not all bots will follow robots.txt. ScreamingFrog, for example, can be instructed to ignore robots.txt and crawl a
site as quickly as possible. You would not want a competitor doing this to your site.&lt;/p&gt;
&lt;h2&gt;Bot Management Tools&lt;/h2&gt;
&lt;p&gt;In addition to robots.txt, bot management tools (like those provided by Peakhour) can protect your website from
abusive bots. Good bot management tools automatically block most unwanted traffic using a combination of
&lt;a href="/blog/ip-threat-intelligence/"&gt;Threat Intelligence&lt;/a&gt;, &lt;a href="/blog/tls-fingerprinting/"&gt;Fingerprinting techniques&lt;/a&gt;, Reverse DNS
verification, and Header Inspection.&lt;/p&gt;
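&lt;p&gt;Reverse DNS verification catches bots that fake a search engine user agent. A request claiming to be Googlebot is only trusted if its IP resolves to a hostname under a domain the search engine documents, and that hostname resolves back to the same IP. The Python sketch below shows this forward-confirmed reverse DNS check with the standard socket module; the suffix list reflects Google's published guidance for Googlebot but should be treated as an assumption to re-check, and a production system would cache results rather than resolve on every request.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import socket

# Hostname suffixes Google documents for Googlebot. Treat this list as an
# assumption and check each search engine's current verification guidance.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, allowed_suffixes=GOOGLEBOT_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: the IP must map to an allowed hostname
    and that hostname must map back to the same IP."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().endswith(allowed_suffixes):
        return False
    try:
        # gethostbyname_ex only returns IPv4 addresses (third tuple element).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The resolver functions are parameters so the check can be tested without network access; in production the defaults call the system resolver.&lt;/p&gt;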
&lt;p&gt;Advanced techniques like rate limiting and machine learning can help identify more sophisticated bad bots.&lt;/p&gt;
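&lt;p&gt;Rate limiting is the simplest of these to illustrate. The sketch below is a hypothetical in-memory sliding-window limiter in Python: each client may make at most limit requests in any window-second span, and older timestamps are discarded with a binary search. Real bot management applies limits across many edge servers and keys them on more than an IP address, which this example does not attempt.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import bisect
import time
from collections import defaultdict


class SlidingWindowLimiter:
    """Allow at most limit requests per client in any window-second span.

    A minimal in-memory sketch; a real deployment would share state across
    servers and expire idle clients.
    """

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.history = defaultdict(list)  # client id mapped to sorted timestamps

    def allow(self, client):
        now = self.clock()
        stamps = self.history[client]
        # Drop timestamps that have fallen out of the window.
        expired = bisect.bisect_right(stamps, now - self.window)
        del stamps[:expired]
        # Timestamps are only appended when allowed, so the list never exceeds limit.
        if len(stamps) == self.limit:
            return False
        stamps.append(now)
        return True
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Requests rejected by allow() would typically receive an HTTP 429 (Too Many Requests) response.&lt;/p&gt;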
&lt;h2&gt;Search Bots and Double Crawling&lt;/h2&gt;
&lt;p&gt;Search bots like Bingbot can sometimes blindly follow links and crawl the same page multiple times due to different
URL parameters. This double, triple, or worse crawling can increase server load and make indexing less efficient.
eCommerce sites are especially exposed because product catalogues often have several filtering paths. We've seen Bing
go haywire on a number of sites. Most recently, it was issuing around 50,000 requests per day to the search function
of a Magento 2 store while cycling through parameters. Once fixed, this dropped to 2-3k requests per day. On a busy
OpenCart store, Bing was responsible for nearly half of all page requests (around 40k per day). Configuring it to
ignore parameters dropped this to around 4k per day.&lt;/p&gt;
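&lt;p&gt;The effect of telling a crawler to ignore parameters can be sketched by normalising URLs: drop parameters that do not change page content and sort the rest, so that variants of the same page collapse to one key. The Python sketch below does this with urllib.parse and counts how many crawled URLs map to distinct pages. The IGNORED_PARAMS set is hypothetical; which parameters are safe to ignore depends entirely on the site.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only reorder results or track campaigns on this site.
IGNORED_PARAMS = {"sort", "dir", "utm_source", "utm_medium"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest so URL variants collapse."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    )
    return urlunsplit(parts._replace(query=urlencode(kept)))


def crawl_waste(urls):
    """Return (total requests, distinct canonical pages) for a list of crawled URLs."""
    return len(urls), len({canonical_url(u) for u in urls})


crawled = [
    "/catalog/shoes?sort=price",
    "/catalog/shoes?sort=name",
    "/catalog/shoes?colour=red",
    "/catalog/shoes",
]
print(crawl_waste(crawled))  # (4, 2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running crawl_waste over a day of crawler requests from your access logs gives a quick estimate of how much of a bot's traffic is duplicate crawling.&lt;/p&gt;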
&lt;h3&gt;Configuring Search Bots to Ignore Query Parameters&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Note: Since this article was published, both Google and Bing have removed the ability to ignore parameters
when crawling via their webmaster/search console tools. See &lt;a href="/blog/how-to-exclude-query-string-parameters-from-search-engines-using-robots-txt/"&gt;using robots.txt to instruct search engines to ignore query string parameters&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To help search bots crawl your site efficiently, you can configure them to ignore specific query parameters. Use these
methods:&lt;/p&gt;
&lt;h4&gt;Configuring Bing Webmaster Tools&lt;/h4&gt;
&lt;p&gt;Bing Webmaster Tools provides an option to specify URL parameters that should be ignored during the crawling process.
To configure this setting, follow these steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sign in to your Bing Webmaster Tools account and select the website you want to manage.&lt;/li&gt;
&lt;li&gt;Navigate to the "Configure My Site" section and click on "URL Parameters."&lt;/li&gt;
&lt;li&gt;Click on "Add Parameter" and enter the parameter name you want Bingbot to ignore.&lt;/li&gt;
&lt;li&gt;Select "Ignore this parameter" from the dropdown menu and click on "Save."&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Configuring Bing Webmaster Tools this way helps stop Bingbot double crawling pages with specific URL parameters, reducing server load and improving indexing efficiency.&lt;/p&gt;
&lt;h4&gt;Managing Other Search Bots&lt;/h4&gt;
&lt;p&gt;For other search engines like Google, use the relevant webmaster tools to manage URL parameters. In Google Search
Console, follow these steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sign in to your Google Search Console account and select the property you want to manage.&lt;/li&gt;
&lt;li&gt;Navigate to the "Crawl" section and click on "URL Parameters."&lt;/li&gt;
&lt;li&gt;Click on "Add Parameter" and enter the parameter name you want Googlebot to ignore.&lt;/li&gt;
&lt;li&gt;Choose "No URLs" from the "Does this parameter change page content seen by the user?" dropdown menu.&lt;/li&gt;
&lt;li&gt;Click on "Save."&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Specifying the parameters you want search bots to ignore can prevent double crawling and make indexing more efficient.&lt;/p&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;When good or grey bots crawl too aggressively, they can cause the same operational problems as malicious bots:
overloaded servers, slower pages, and worse user experience. Monitor website traffic and server load, set clear
robots.txt rules, and use the major search engines' webmaster tools to control inefficient crawling. Done properly,
this improves website performance and can lower infrastructure costs.&lt;/p&gt;</content><category term="Bots"></category><category term="Bot Management"></category><category term="SEO"></category><category term="Residential Proxies"></category><category term="DNS"></category><category term="Web Performance"></category><category term="Anomaly Detection"></category></entry><entry><title>Advanced Anomaly Detection</title><link href="https://www.peakhour.io/blog/advanced-anomaly-detection-rrcf-application-security/" rel="alternate"></link><published>2023-05-15T13:00:00+10:00</published><updated>2023-05-15T13:00:00+10:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-05-15:/blog/advanced-anomaly-detection-rrcf-application-security/</id><summary type="html">&lt;p&gt;Deep dive into Robust Random Cut Forest (RRCF) implementation for real-time anomaly detection in Application Security Platforms. Learn how advanced machine learning algorithms enhance threat detection and automated response capabilities.&lt;/p&gt;</summary><content type="html">&lt;p&gt;Modern Application Security Platforms need reliable &lt;a href="/learning/threat-detection/what-is-anomaly-detection/"&gt;anomaly detection&lt;/a&gt; to identify and respond to emerging threats in real-time. For DevOps, SRE, and DevSecOps teams, machine learning algorithms such as Robust Random Cut Forest (RRCF) provide the foundation for automated threat detection and response systems that can operate at the scale and speed contemporary applications require.&lt;/p&gt;
&lt;h2&gt;Strategic Importance of Anomaly Detection in Application Security&lt;/h2&gt;
&lt;p&gt;Real-time anomaly detection is a core Application Security Platform capability. It helps identify threats before attacks affect application performance or security posture:&lt;/p&gt;
&lt;h3&gt;Enterprise Threat Landscape&lt;/h3&gt;
&lt;p&gt;Modern applications face attack vectors that traditional signature-based detection struggles to address:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Adaptive Bot Networks&lt;/strong&gt;: AI-powered bots that modify behaviour based on defensive responses&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero-Day Exploits&lt;/strong&gt;: Previously unknown attack patterns that bypass traditional security rules&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Volumetric Attacks&lt;/strong&gt;: DDoS attacks that scale dynamically to evade rate limiting&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Insider Threats&lt;/strong&gt;: Subtle anomalies in user behaviour that indicate account compromise&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Application Security Platform Requirements&lt;/h3&gt;
&lt;p&gt;Effective anomaly detection needs to integrate cleanly with broader security capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Threat identification within milliseconds of a request arriving&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Architecture&lt;/strong&gt;: Analysis of millions of requests without performance degradation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Awareness&lt;/strong&gt;: Integration with application metadata and user behaviour profiles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Response&lt;/strong&gt;: Immediate threat mitigation through dynamic rule deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Advanced Machine Learning for Security&lt;/h2&gt;
&lt;p&gt;Robust Random Cut Forest provides anomaly detection capabilities designed for streaming data environments common in Application Security Platforms:&lt;/p&gt;
&lt;h3&gt;Algorithmic Advantages for Security Applications&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Streaming Data Processing&lt;/strong&gt;: Real-time analysis that updates incrementally instead of retraining on historical data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dimensionality Handling&lt;/strong&gt;: Effective analysis of high-dimensional security feature vectors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adaptive Learning&lt;/strong&gt;: Continuous model updates based on evolving traffic patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Computational Efficiency&lt;/strong&gt;: Linear scaling suitable for high-throughput security processing&lt;/li&gt;
&lt;/ul&gt;
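&lt;p&gt;To make the mechanics concrete, the Python sketch below is a minimal batch version of the two core ideas in the RRCF paper (Guha et al., 2016). Each tree splits the data with random cuts, choosing the cut dimension with probability proportional to its range and the cut value uniformly within it. A point's anomaly score is its collusive displacement (CoDisp): the largest ratio of sibling subtree size to its own subtree size along its path, averaged over the forest. It is not Peakhour's Rust implementation, and it rebuilds every tree from scratch; a streaming RRCF instead inserts and deletes points in existing trees as the window moves. The feature vectors in the example are hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import random


def _cut(points, rng):
    """Choose a random cut: dimension weighted by its span, value uniform within it."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    spans = [hi - lo for hi, lo in zip(highs, lows)]
    if not any(spans):
        return None  # all points identical: this node becomes a leaf
    dim = rng.choices(range(dims), weights=spans)[0]
    return dim, rng.uniform(lows[dim], highs[dim])


def _score_tree(indices, data, rng, path, scores):
    """Recursively split one tree and record each point's CoDisp in scores.

    path holds |sibling| / |own subtree| for every split above this node;
    a point's CoDisp in this tree is the largest of those ratios.
    """
    points = [data[i] for i in indices]
    while True:
        cut = _cut(points, rng)
        if cut is None:
            for i in indices:
                scores[i] = max(path, default=0.0)
            return
        dim, value = cut
        left = [i for i in indices if value >= data[i][dim]]
        right = [i for i in indices if data[i][dim] > value]
        if left and right:  # retry the rare cut that lands exactly on the maximum
            break
    _score_tree(left, data, rng, path + [len(right) / len(left)], scores)
    _score_tree(right, data, rng, path + [len(left) / len(right)], scores)


def codisp_scores(data, num_trees=100, seed=0):
    """Average CoDisp of every point over num_trees random cut trees."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    for _ in range(num_trees):
        scores = [0.0] * len(data)
        _score_tree(list(range(len(data))), data, rng, [], scores)
        totals = [t + s for t, s in zip(totals, scores)]
    return [t / num_trees for t in totals]


# Hypothetical per-client features: (requests per minute, distinct URLs per minute).
window = [(30, 12), (28, 10), (35, 14), (31, 11), (29, 13), (33, 12), (400, 380)]
scores = codisp_scores(window)
print(max(range(len(window)), key=scores.__getitem__))  # 6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The client sending 400 requests per minute is isolated by almost every first cut, so it receives the highest average CoDisp, while points inside the cluster only occasionally end up alone in a subtree.&lt;/p&gt;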
&lt;h3&gt;Implementation in Application Security Platforms&lt;/h3&gt;
&lt;p&gt;RRCF enables threat detection across multiple security dimensions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traffic Pattern Analysis&lt;/strong&gt;: Identification of unusual request volumes, frequencies, and distributions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Behavioural Anomalies&lt;/strong&gt;: Detection of user actions that deviate from established profiles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Network Fingerprinting&lt;/strong&gt;: Recognition of abnormal connection patterns and protocol usage&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Analysis&lt;/strong&gt;: Identification of malicious payloads and injection attempts&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;RRCF Advantages for Application Security Platforms&lt;/h2&gt;
&lt;p&gt;Traditional batch-processing anomaly detection systems are a poor fit for Application Security Platforms that must respond to threats in real time. RRCF's streaming approach provides practical advantages:&lt;/p&gt;
&lt;h3&gt;Real-Time Threat Detection&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Immediate Analysis&lt;/strong&gt;: Process and analyse security events as they occur, without waiting for batch processing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adaptive Baselines&lt;/strong&gt;: Continuously update normal behaviour models based on current traffic patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Efficiency&lt;/strong&gt;: Maintain configurable rolling windows of security data for optimal performance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable Processing&lt;/strong&gt;: Handle millions of security events per second without degradation&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Security-Optimised Implementation&lt;/h3&gt;
&lt;p&gt;RRCF's forest-based approach is useful for security applications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multi-Dimensional Analysis&lt;/strong&gt;: Analyse request patterns, user behaviour, and network characteristics at the same time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shape-Sensitive Detection&lt;/strong&gt;: Identify subtle changes in attack patterns that signature-based systems miss&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Positive Reduction&lt;/strong&gt;: Leverage ensemble methods to reduce noise in security alerting&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Contextual Awareness&lt;/strong&gt;: Understand normal application behaviour patterns for more accurate threat detection&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Application Security Platform Integration&lt;/h2&gt;
&lt;h3&gt;Enterprise Deployment Architecture&lt;/h3&gt;
&lt;p&gt;Peakhour's Application &lt;a href="/solutions/use-case/prevent-account-takeovers/"&gt;Security Platform&lt;/a&gt; implements RRCF through high-performance Rust-based processing:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Edge Processing Capabilities&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Global Deployment&lt;/strong&gt;: RRCF analysis deployed across CDN edge locations for minimal latency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Distributed Learning&lt;/strong&gt;: Aggregated threat intelligence from multiple geographic regions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local Response&lt;/strong&gt;: Immediate threat mitigation at the edge without central processing delays&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bandwidth Optimisation&lt;/strong&gt;: Process security events locally to reduce data transmission requirements&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Platform Integration Benefits&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unified Threat Detection&lt;/strong&gt;: RRCF analysis integrated with WAF/WAAP, bot management, and DDoS protection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated Response&lt;/strong&gt;: Dynamic security rule generation based on anomaly detection results&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DevSecOps Workflow&lt;/strong&gt;: API-first architecture enabling integration with security automation tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compliance Reporting&lt;/strong&gt;: Detailed anomaly detection logs for security audits and regulatory requirements&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Advanced Security Use Cases&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Credential Stuffing Detection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Behavioural Analysis&lt;/strong&gt;: Identify unusual login patterns that indicate automated credential testing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Geographic Anomalies&lt;/strong&gt;: Detect impossible travel scenarios and location-based attack patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Volume Analysis&lt;/strong&gt;: Recognise subtle increases in authentication attempts that indicate coordinated attacks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Success Rate Monitoring&lt;/strong&gt;: Identify campaigns through abnormal authentication success/failure ratios&lt;/li&gt;
&lt;/ul&gt;
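&lt;p&gt;Impossible travel detection reduces to simple arithmetic: compute the great-circle distance between the geolocated sources of two consecutive logins for the same account and divide by the time between them. The Python sketch below uses the haversine formula on a spherical Earth; the Login type, the 1,000 km/h speed limit, and the 100 km tolerance for IP geolocation error are all illustrative assumptions rather than recommended values.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import math
from dataclasses import dataclass


@dataclass
class Login:
    account: str
    timestamp: float  # seconds since the Unix epoch
    lat: float        # latitude of the geolocated source IP, in degrees
    lon: float        # longitude of the geolocated source IP, in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding error


def impossible_travel(prev, curr, max_kmh=1000.0, min_km=100.0):
    """Flag a pair of logins whose implied travel speed exceeds max_kmh.

    Both thresholds are illustrative: min_km absorbs IP geolocation error and
    max_kmh is roughly the speed of a commercial airliner.
    """
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if min_km >= distance:
        return False
    hours = abs(curr.timestamp - prev.timestamp) / 3600
    if hours == 0:
        return True  # the same account in two distant places at once
    return distance / hours > max_kmh


# Hypothetical logins for one account, one hour apart.
sydney = Login("alice", 1_700_000_000, -33.87, 151.21)
london = Login("alice", 1_700_003_600, 51.51, -0.13)
print(impossible_travel(sydney, london))  # True
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Sydney to London is roughly 17,000 km, far beyond what anyone can travel in an hour, so the second login is flagged. A signal like this works best as one feature among many, since VPNs and mobile carriers can also move a user's apparent location.&lt;/p&gt;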
&lt;p&gt;&lt;strong&gt;API Threat Detection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Endpoint Anomalies&lt;/strong&gt;: Detect unusual API usage patterns that indicate reconnaissance or exploitation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate Pattern Analysis&lt;/strong&gt;: Identify sophisticated rate limiting evasion techniques&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Response Time Analysis&lt;/strong&gt;: Detect performance impacts from malicious API usage&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authentication Anomalies&lt;/strong&gt;: Recognise token abuse and API key misuse patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Zero-Day Threat Identification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traffic Pattern Deviations&lt;/strong&gt;: Identify new attack vectors through unusual request characteristics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Response Pattern Analysis&lt;/strong&gt;: Detect exploitation attempts through server response anomalies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Protocol Anomalies&lt;/strong&gt;: Recognise malformed requests that indicate exploit attempts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Payload Analysis&lt;/strong&gt;: Identify suspicious content patterns in request bodies and parameters&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Operational Excellence Through Advanced Anomaly Detection&lt;/h2&gt;
&lt;h3&gt;Performance and Security Integration&lt;/h3&gt;
&lt;p&gt;RRCF implementation delivers measurable improvements across security and performance metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Threat Detection Speed&lt;/strong&gt;: Sub-millisecond anomaly identification for real-time response&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Positive Reduction&lt;/strong&gt;: Ensemble methods reduce security alert fatigue&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System Performance&lt;/strong&gt;: Efficient processing maintains CDN performance whilst enhancing security&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adaptive Learning&lt;/strong&gt;: Continuous improvement in threat detection accuracy over time&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;DevSecOps Enablement&lt;/h3&gt;
&lt;p&gt;Modern Application Security Platforms provide APIs and automation capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Security Automation&lt;/strong&gt;: Programmatic access to anomaly detection results for automated response&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI/CD Integration&lt;/strong&gt;: Security testing and validation integrated into development workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitoring Integration&lt;/strong&gt;: SIEM and SOC platform integration for security operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom Rule Development&lt;/strong&gt;: Framework for developing application-specific anomaly detection rules&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Advanced anomaly detection through RRCF is a fundamental capability for modern Application Security Platforms. By implementing machine learning algorithms at the edge, organisations can achieve real-time threat detection that adapts to evolving attack patterns whilst maintaining application performance.&lt;/p&gt;
&lt;p&gt;The integration of RRCF with security capabilities including WAAP, bot management, and DDoS protection creates a unified platform that addresses the security requirements of contemporary applications and APIs. For DevSecOps teams, this approach enables automated &lt;a href="/learning/threat-detection/what-is-real-time-threat-response/"&gt;threat response&lt;/a&gt; whilst providing the visibility and control needed for effective security operations.&lt;/p&gt;</content><category term="Security"></category><category term="Threat Detection"></category><category term="Anomaly Detection"></category><category term="DDoS"></category><category term="DevSecOps"></category><category term="Bot Management"></category><category term="Application Security"></category></entry><entry><title>Double MAD?</title><link href="https://www.peakhour.io/blog/double-mad/" rel="alternate"></link><published>2023-05-15T13:00:00+10:00</published><updated>2023-05-15T13:00:00+10:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-05-15:/blog/double-mad/</id><summary type="html">&lt;p&gt;This article explores the use of Double Median Absolute Deviation (Double MAD) for anomaly detection in time series data, particularly in skewed or non-symmetric distributions.&lt;/p&gt;</summary><content type="html">&lt;p&gt;This article explores the use of Double Median Absolute Deviation (Double MAD) for &lt;a href="/learning/threat-detection/what-is-anomaly-detection/"&gt;anomaly detection&lt;/a&gt; in time series
data, particularly in skewed or non-symmetric distributions. Double MAD calculates two median absolute deviations, one for
data below the median and one for data above, which gives a more nuanced approach than traditional MAD and allows
accurate detection of anomalies even in skewed data distributions. We also look at its application in identifying slow
abuse, like bots, by catching lower-range anomalies. Double MAD has limitations, however, such as not capturing the shape
of seasonal data or trends over time. A comparison is also drawn with the Z-score method, highlighting that the choice
between the two depends on the nature of your data. The article provides insights into the practical implementation of
Double MAD and its potential to improve your data analysis toolkit.&lt;/p&gt;

&lt;p&gt;Operational systems increasingly rely on time-series data for decisions. Anomaly detection is one practical use: by
identifying patterns that deviate from the norm, businesses can investigate potential issues early or understand
unexpected opportunities.&lt;/p&gt;
&lt;p&gt;One useful technique for anomaly detection is the Median Absolute Deviation (MAD) and, more specifically, its extension,
the Double MAD. This article explains where Double MAD fits in time-series anomaly detection and how it can help identify
anomalous clients.&lt;/p&gt;
&lt;h2&gt;Understanding MAD and Double MAD&lt;/h2&gt;
&lt;p&gt;MAD, a robust measure of variability, is less susceptible to outliers than standard deviation. It calculates the median
of absolute deviations from the data's median, often providing a better representation of 'normal' behaviour in
datasets with skewed distributions or outliers.&lt;/p&gt;
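&lt;p&gt;To make the robustness point concrete, here is a minimal Python sketch (Python is used for brevity, and the sample values are made up). Adding a single extreme value leaves the MAD at 1 but raises the sample standard deviation from about 1.41 to about 71.07.&lt;/p&gt;

```python
import statistics


def mad(values):
    """Median Absolute Deviation: the median of |x - median(values)|."""
    m = statistics.median(values)
    return statistics.median([abs(x - m) for x in values])


clean = [10, 11, 12, 12, 13, 14]
with_outlier = clean + [200]

# One extreme value barely affects the MAD but inflates the standard deviation.
print(mad(clean), round(statistics.stdev(clean), 2))                # 1.0 1.41
print(mad(with_outlier), round(statistics.stdev(with_outlier), 2))  # 1 71.07
```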
&lt;p&gt;Double MAD is an extension of MAD, where two MADs are calculated — one for the data below the median and another for the
data above. This split gives the detection process a better fit for asymmetric data, which is common in real-world time
series data.&lt;/p&gt;
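&lt;p&gt;A minimal Python sketch of the split follows. It assumes one common convention, that values equal to the median count towards both halves; other implementations may treat them differently. The sample is a made-up right-skewed dataset.&lt;/p&gt;

```python
import statistics


def double_mad(values):
    """Return (median, lower MAD, upper MAD) for a list of numbers.

    Assumed convention: values equal to the median count in both halves.
    """
    m = statistics.median(values)
    low_side = [m - x for x in values if m >= x]   # distances below the median
    high_side = [x - m for x in values if x >= m]  # distances above the median
    return m, statistics.median(low_side), statistics.median(high_side)


# Made-up right-skewed sample: compact low side, long high side.
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12, 20]
print(double_mad(data))  # (3, 0.5, 1.5)
```

&lt;p&gt;Here the upper MAD is three times the lower MAD. A single MAD over the same data is 1, which sits between them: too wide for the low side and too narrow for the high side.&lt;/p&gt;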
&lt;h2&gt;Why Double MAD?&lt;/h2&gt;
&lt;p&gt;While MAD provides a robust way to understand the 'normal' range of a dataset, a threshold built from a single MAD
places the same distance on both sides of the median, which implicitly assumes the data are spread symmetrically around
it. That assumption does not always hold. Double MAD is useful where it breaks down, offering improved anomaly detection
for skewed or asymmetric datasets.&lt;/p&gt;
&lt;p&gt;In time-series analysis, especially with 24-hour cycles like web traffic or server usage, patterns can exhibit
seasonality and trend components, and the distribution of values is often asymmetric. Because Double MAD measures the
spread on each side of the median separately, it captures this variability better than a single MAD.&lt;/p&gt;
&lt;h2&gt;Using Double MAD in Anomaly Detection&lt;/h2&gt;
&lt;p&gt;The Double MAD implementation discussed here uses Rust, a systems programming language known for speed and memory
safety. It calculates the lower and upper MAD values, along with their respective thresholds. Anomalies can then be
detected by comparing each data point to these thresholds.&lt;/p&gt;
&lt;p&gt;An anomaly is defined as a data point that deviates significantly from the expected range. If a data point falls below
the lower MAD threshold or above the upper one, it can be flagged as an anomaly. This approach is especially effective
when handling datasets with high variability or extreme values.&lt;/p&gt;
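&lt;p&gt;The thresholding step can be sketched in Python as follows. This is an illustration, not the Rust implementation the article refers to; the multiplier &lt;code&gt;k = 3.0&lt;/code&gt; and the sample data are assumptions chosen for the example.&lt;/p&gt;

```python
import statistics


def double_mad_thresholds(values, k=3.0):
    """Return (lower, upper): the median minus/plus k times that side's MAD."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    mad_upper = statistics.median([x - m for x in values if x >= m])
    return m - k * mad_lower, m + k * mad_upper


def flag_anomalies(values, k=3.0):
    """Return the indices of points below the lower or above the upper threshold."""
    lower, upper = double_mad_thresholds(values, k)
    return [i for i, x in enumerate(values) if lower > x or x > upper]


data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12, 20]
print(double_mad_thresholds(data))  # (1.5, 7.5)
print(flag_anomalies(data))         # [0, 8, 9, 10]
```

&lt;p&gt;One edge case to handle in practice: if more than half of the values on one side equal the median, that side's MAD is 0 and its threshold collapses onto the median, so any deviation on that side gets flagged.&lt;/p&gt;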
&lt;h2&gt;Double MAD for Anomalous Client Detection&lt;/h2&gt;
&lt;p&gt;Beyond time-series data, Double MAD can also help identify anomalous behaviour among clients. By comparing each
client's behaviour, such as its request count, against thresholds derived from the Double MAD of the time-series data,
teams can pinpoint clients that deviate from the norm.&lt;/p&gt;
&lt;p&gt;For instance, in the context of web service usage, an anomalous client might be one that is sending an unusually high or
low number of requests. By using Double MAD, you can flag such outliers and take appropriate action, such as
investigating potential misuse or reaching out to understand and address any issues they may be facing.&lt;/p&gt;
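&lt;p&gt;As a sketch, the same thresholds can be applied to per-client request counts for one time window. The client IDs and counts below are hypothetical, and &lt;code&gt;k = 3.0&lt;/code&gt; is an assumed multiplier.&lt;/p&gt;

```python
import statistics


def double_mad_thresholds(values, k=3.0):
    """Return (lower, upper): the median minus/plus k times that side's MAD."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    mad_upper = statistics.median([x - m for x in values if x >= m])
    return m - k * mad_lower, m + k * mad_upper


def anomalous_clients(requests_per_client, k=3.0):
    """Map each client outside the thresholds to "high" or "low"."""
    lower, upper = double_mad_thresholds(list(requests_per_client.values()), k)
    flagged = {}
    for client, count in requests_per_client.items():
        if count > upper:
            flagged[client] = "high"
        elif lower > count:
            flagged[client] = "low"
    return flagged


# Hypothetical requests per client in one hour.
counts = {
    "client-a": 40, "client-b": 42, "client-c": 45,
    "client-d": 38, "client-e": 50, "client-f": 44,
    "client-g": 400, "client-h": 5, "client-i": 41,
}
print(anomalous_clients(counts))  # {'client-g': 'high', 'client-h': 'low'}
```

&lt;p&gt;Here the thresholds work out to 36 and 51 requests, so only the very busy client and the nearly idle one are flagged.&lt;/p&gt;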
&lt;h2&gt;Detecting Lower-Range Anomalies: A Case of Slow Abuse&lt;/h2&gt;
&lt;p&gt;An interesting application of Double MAD is in detecting lower-range anomalies, a pattern often associated with slow
abuse such as bots or low-and-slow Distributed Denial of Service (DDoS) attacks. These abuses are characterised by an
unusually low frequency of activity that is consistent over a prolonged period. This consistent, low-level activity can
go unnoticed by detection that only looks for unusually high values.&lt;/p&gt;
&lt;p&gt;Because Double MAD sets the lower threshold from the spread of the data below the median only, a long upper tail does
not push that threshold down, so these lower-range anomalies can still be flagged, giving early warning of slow abuse.
Its ability to detect both high and low anomalies makes Double MAD a flexible tool for anomaly detection.&lt;/p&gt;
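&lt;p&gt;The sketch below compares lower thresholds from a single MAD and from Double MAD on hypothetical per-minute request counts that have a compact low side, a bursty high side, and one unusually quiet minute (35). With &lt;code&gt;k = 3.0&lt;/code&gt; (an assumed multiplier), the spread of the bursts pushes the single-MAD lower threshold down to 25, so the quiet minute passes unnoticed; the Double MAD lower threshold is 47.5 and flags it.&lt;/p&gt;

```python
import statistics


def symmetric_mad_lower(values, k=3.0):
    """Lower threshold from one MAD computed over all values."""
    m = statistics.median(values)
    mad = statistics.median([abs(x - m) for x in values])
    return m - k * mad


def double_mad_lower(values, k=3.0):
    """Lower threshold from the MAD of values at or below the median."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    return m - k * mad_lower


# Hypothetical requests per minute, with one unusually quiet minute (35).
rpm = [35, 50, 52, 53, 54, 55, 65, 80, 100, 130, 170]
print(symmetric_mad_lower(rpm))  # 25.0 -> 35 is not flagged
print(double_mad_lower(rpm))     # 47.5 -> 35 is flagged
```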
&lt;h2&gt;The Math Behind Double MAD&lt;/h2&gt;
&lt;p&gt;To see why Double MAD helps, consider a dataset drawn from a right-skewed distribution. Applying the conventional MAD
approach can lead to false positives, where normal values in the long right tail are marked as outliers. This is because
a single MAD defines a symmetric interval around the median, which doesn't account for the skew in the data.&lt;/p&gt;
&lt;p&gt;With Double MAD, we instead calculate two MADs — one for the data below the median (MAD-lower) and another for the data
above (MAD-upper). Outlier thresholds are then defined using these two MADs. The lower threshold is calculated as the
median minus a multiplier (k) times MAD-lower. The upper threshold is the median plus k times MAD-upper.&lt;/p&gt;
&lt;p&gt;This approach takes the asymmetry of the data into account. In a right-skewed distribution, MAD-upper is typically
larger than MAD-lower, so the upper threshold sits further from the median and only the extreme right-tail values are
flagged, while the tighter lower threshold still catches values that are unusually low for the compact left side of the
data.&lt;/p&gt;
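&lt;p&gt;A small Python sketch of the right-tail case, using hypothetical per-minute request counts and an assumed multiplier &lt;code&gt;k = 3.0&lt;/code&gt;: the single-MAD upper threshold (85) flags three of the eleven values, while the Double MAD upper threshold (160) flags only the most extreme one.&lt;/p&gt;

```python
import statistics


def symmetric_high_flags(values, k=3.0):
    """Values above median + k * MAD, using one MAD for both sides."""
    m = statistics.median(values)
    mad = statistics.median([abs(x - m) for x in values])
    return [x for x in values if x > m + k * mad]


def double_mad_high_flags(values, k=3.0):
    """Values above median + k * MAD-upper (MAD of values at or above the median)."""
    m = statistics.median(values)
    mad_upper = statistics.median([x - m for x in values if x >= m])
    return [x for x in values if x > m + k * mad_upper]


rpm = [35, 50, 52, 53, 54, 55, 65, 80, 100, 130, 170]
print(symmetric_high_flags(rpm))   # [100, 130, 170]
print(double_mad_high_flags(rpm))  # [170]
```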
&lt;h2&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;Accurate anomaly detection matters when teams rely on time-series data to operate and investigate systems. The Double
MAD approach provides a robust method for this, allowing businesses to better understand their data, spot potential
issues early, and make more informed decisions.&lt;/p&gt;
&lt;p&gt;Whether you're monitoring web traffic, server usage, or client behaviour, Double MAD can surface issues that a
symmetric threshold would miss or misreport. The ability to detect both high and low anomalies makes it especially
useful for spotting threats like slow abuse.&lt;/p&gt;
&lt;p&gt;Understanding and implementing Double MAD gives your data analysis toolkit a more complete view of asymmetric data and
helps you detect potential anomalies earlier.&lt;/p&gt;</content><category term="Technical"></category><category term="Anomaly Detection"></category><category term="Threat Detection"></category><category term="Bot Management"></category><category term="Residential Proxies"></category><category term="DDoS"></category></entry><entry><title>Double MAD vs the Rest</title><link href="https://www.peakhour.io/blog/double-mad-vs-zscore/" rel="alternate"></link><published>2023-05-15T13:00:00+10:00</published><updated>2023-05-15T13:00:00+10:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-05-15:/blog/double-mad-vs-zscore/</id><summary type="html">&lt;p&gt;A look at the limitations of Double MAD for anomaly detection, and a comparison with the Z-score method, to help you choose the right approach for your data.&lt;/p&gt;</summary><content type="html">&lt;h2&gt;Limitations of Double MAD and Comparison with Z-Score&lt;/h2&gt;
&lt;p&gt;Double MAD is useful for anomaly detection, but it has clear limits. One is that it does not account for the shape of
seasonal data. Time series data often show cyclical patterns by time of day, week, or year. For instance, web traffic to
an e-commerce site might spike during holidays and dip on off-peak days.&lt;/p&gt;
&lt;p&gt;Double MAD can capture shifts in the median of these data, but it does not consider the shape or pattern within these
cycles. It might therefore miss anomalies that occur within a specific season, or flag normal seasonal variations as
anomalies.&lt;/p&gt;
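&lt;p&gt;A minimal Python sketch of the seasonal problem, using one hypothetical day of hourly request counts as the baseline and an assumed multiplier &lt;code&gt;k = 3.0&lt;/code&gt;. Because night-time and daytime values are pooled, the thresholds are very wide, and 80 requests at 03:00, roughly eight times the usual overnight level, is not flagged.&lt;/p&gt;

```python
import statistics


def double_mad_thresholds(values, k=3.0):
    """Return (lower, upper): the median minus/plus k times that side's MAD."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    mad_upper = statistics.median([x - m for x in values if x >= m])
    return m - k * mad_lower, m + k * mad_upper


# Hypothetical hourly request counts for one day.
night = [10, 12, 9, 11, 10, 8, 12, 11]                        # 00:00-07:00
day = [90, 95, 100, 105, 98, 102, 97, 101, 99, 96, 104, 100]  # 08:00-19:00
evening = [60, 50, 40, 30]                                    # 20:00-23:00
baseline = night + day + evening

lower, upper = double_mad_thresholds(baseline)
print(lower, upper)              # -115.5 148.5
print(lower > 80 or 80 > upper)  # False: the 03:00 spike is not flagged
```

&lt;p&gt;The negative lower threshold also means that, with these thresholds, even a complete outage would not be flagged.&lt;/p&gt;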
&lt;p&gt;Another limitation is that Double MAD does not account for trends over time. If your time series data shows a consistent
increase or decrease, &lt;a href="/blog/double-mad/"&gt;Double MAD&lt;/a&gt; might misinterpret this trend as a series of anomalies.&lt;/p&gt;
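&lt;p&gt;A minimal Python sketch of the trend problem, using hypothetical daily counts that grow by 2 per day and an assumed multiplier &lt;code&gt;k = 3.0&lt;/code&gt;. Thresholds computed from the first ten days are exceeded on the last two of the next five days simply because the growth continues.&lt;/p&gt;

```python
import statistics


def double_mad_thresholds(values, k=3.0):
    """Return (lower, upper): the median minus/plus k times that side's MAD."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    mad_upper = statistics.median([x - m for x in values if x >= m])
    return m - k * mad_lower, m + k * mad_upper


# Hypothetical daily request counts that grow steadily by 2 per day.
history = [100 + 2 * day for day in range(10)]       # days 1-10: 100 .. 118
upcoming = [100 + 2 * day for day in range(10, 15)]  # days 11-15: 120 .. 128

lower, upper = double_mad_thresholds(history)
print(lower, upper)                        # 94.0 124.0
print([x for x in upcoming if x > upper])  # [126, 128]
```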
&lt;h3&gt;Double MAD vs. Z-Score&lt;/h3&gt;
&lt;p&gt;In anomaly detection, Double MAD is often compared with the more traditional Z-score method. A Z-score measures how many
standard deviations a data point is from the mean. It assumes that the data follows a Gaussian (or normal) distribution,
which often does not hold true for real-world data.&lt;/p&gt;
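&lt;p&gt;The sketch below computes Z-scores in Python for a hypothetical sample of eight values with one obvious outlier. The outlier inflates the standard deviation so much that its own Z-score is only about 2.47, so a common rule of flagging |z| above 3 does not catch it. In fact, with n points no sample Z-score can exceed (n - 1)/sqrt(n), which is about 2.47 for n = 8.&lt;/p&gt;

```python
import statistics


def z_scores(values):
    """Distance from the mean in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


data = [9, 10, 11, 12, 13, 14, 15, 200]
scores = z_scores(data)
print(round(scores[-1], 2))                             # 2.47
print([x for x, z in zip(data, scores) if abs(z) > 3])  # []
```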
&lt;p&gt;Double MAD, on the other hand, is a non-parametric method that does not make assumptions about the distribution of data.
This makes it more robust to outliers and skewed distributions.&lt;/p&gt;
&lt;p&gt;However, Z-score has an advantage when data follows a Gaussian distribution, or when the values being scored are
themselves averages over many observations, since the Central Limit Theorem applies to sample means rather than to
individual data points. When the data are close to normal, the mean and standard deviation are informative summaries,
which gives Z-score an edge.&lt;/p&gt;
&lt;p&gt;In contrast, Double MAD is more robust for datasets with outliers or skewed distributions, as it uses the median and
absolute deviations from the median, which are less sensitive to extreme values.&lt;/p&gt;
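&lt;p&gt;A Double MAD sketch on a similar hypothetical sample (eight values with one outlier, and an assumed multiplier &lt;code&gt;k = 3.0&lt;/code&gt;) shows the difference: the median and the per-side MADs are barely affected by the extreme value, giving thresholds of 6.5 and 18.5 that flag only 200.&lt;/p&gt;

```python
import statistics


def double_mad_flags(values, k=3.0):
    """Return (lower threshold, upper threshold, flagged values)."""
    m = statistics.median(values)
    mad_lower = statistics.median([m - x for x in values if m >= x])
    mad_upper = statistics.median([x - m for x in values if x >= m])
    lower, upper = m - k * mad_lower, m + k * mad_upper
    return lower, upper, [x for x in values if lower > x or x > upper]


data = [9, 10, 11, 12, 13, 14, 15, 200]
print(double_mad_flags(data))  # (6.5, 18.5, [200])
```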
&lt;p&gt;Both Double MAD and Z-score have strengths, and the choice between them should be guided by the nature of your data.
Understanding these nuances helps you choose the method that fits your specific use case.&lt;/p&gt;</content><category term="Technical"></category><category term="Anomaly Detection"></category></entry><entry><title>Scaling anomaly detection with RRCF</title><link href="https://www.peakhour.io/blog/rrcf-scaling/" rel="alternate"></link><published>2023-05-15T13:00:00+10:00</published><updated>2023-05-15T13:00:00+10:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-05-15:/blog/rrcf-scaling/</id><summary type="html">&lt;p&gt;Discusses strategies for scaling the Robust Random Cut Forest (RRCF) algorithm for large-scale anomaly detection, including using summary statistics, buffering input, and parallelisation.&lt;/p&gt;</summary><content type="html">&lt;p&gt;As data volumes grow, the &lt;a href="/learning/threat-detection/what-is-anomaly-detection/"&gt;anomaly detection&lt;/a&gt; process has to scale with them. RRCF is
efficient, but large, high-dimensional datasets can still create performance challenges. The following strategies focus
on reducing dimensionality, smoothing bursts of input, and distributing independent work.&lt;/p&gt;
&lt;h2&gt;Compute Summary Statistics Instead of Shingling&lt;/h2&gt;
&lt;p&gt;Shingling transforms a single time series into a multivariate one by stacking lagged versions of the data. This can help
capture temporal dependencies, but it also increases the dimensionality of the points inserted into each tree, which can
slow the algorithm down.&lt;/p&gt;
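&lt;p&gt;A minimal Python sketch of shingling (the helper name &lt;code&gt;shingle&lt;/code&gt; is illustrative, not a specific library's API). Each window becomes one point, so the dimension of every point inserted into a tree equals the shingle size.&lt;/p&gt;

```python
def shingle(series, size):
    """Turn a 1-D series into overlapping windows of `size` consecutive values."""
    return [tuple(series[i:i + size]) for i in range(len(series) - size + 1)]


points = shingle([1, 2, 3, 4, 5, 6], size=4)
print(points)          # [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6)]
print(len(points[0]))  # 4: each point has one dimension per lag
```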
&lt;p&gt;An alternative is to compute summary statistics that capture the types of anomalies you are looking for. For instance,
if you are detecting spikes, the data points could consist of second central differences. If you are looking for
long-term trends, the data points could consist of rolling means at different window sizes. This reduces the dimension
of the points inserted into each tree, improving performance.&lt;/p&gt;
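&lt;p&gt;The two summary statistics described above can be sketched in Python as follows (the function names, sample series, and window sizes are illustrative assumptions). A spike produces a large second central difference, and each rolling-mean point has one dimension per window size rather than one per lagged value.&lt;/p&gt;

```python
import statistics


def second_differences(series):
    """Second central difference x[t+1] - 2*x[t] + x[t-1]; spikes give large magnitudes."""
    return [series[t + 1] - 2 * series[t] + series[t - 1] for t in range(1, len(series) - 1)]


def rolling_means(series, windows):
    """One point per time step: the trailing mean over each window size.

    Points start once the largest window has filled.
    """
    start = max(windows)
    return [
        tuple(statistics.fmean(series[t - w:t]) for w in windows)
        for t in range(start, len(series) + 1)
    ]


series = [10, 10, 11, 10, 30, 10, 11, 10]
print(second_differences(series))  # [1, -2, 21, -40, 21, -2]
print(rolling_means(series, (2, 4)))
# [(10.5, 10.25), (20.0, 15.25), (20.0, 15.25), (10.5, 15.25), (10.5, 15.25)]
```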
&lt;h2&gt;Buffer Input and Compute Rolling Summary Statistics&lt;/h2&gt;
&lt;p&gt;When data arrives too quickly to be inserted into the trees directly, buffer the input and compute rolling summary
statistics (mean, median, max, etc.). This reduces the number of points that need to be inserted into the trees and
helps the algorithm keep up with the streaming data.&lt;/p&gt;
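&lt;p&gt;One way to sketch this in Python, assuming non-overlapping batches of a fixed size (the class name and &lt;code&gt;batch_size&lt;/code&gt; are illustrative): every four raw values collapse into a single (mean, median, max) point, so only one point per batch needs to be inserted into the trees.&lt;/p&gt;

```python
import statistics


class BatchSummariser:
    """Buffer raw values and emit one summary point per full batch.

    batch_size is an assumed tuning knob: larger batches mean fewer tree
    insertions but coarser time resolution.
    """

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []

    def add(self, value):
        """Buffer a value; return (mean, median, max) when a batch fills, else None."""
        self.buffer.append(value)
        if len(self.buffer) == self.batch_size:
            batch, self.buffer = self.buffer, []
            return statistics.fmean(batch), statistics.median(batch), max(batch)
        return None


summariser = BatchSummariser(batch_size=4)
raw = [5, 7, 6, 50, 5, 6, 7, 6]
summaries = [s for s in map(summariser.add, raw) if s is not None]
print(summaries)  # [(17.0, 6.5, 50), (6.0, 6.0, 7)]
```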
&lt;h2&gt;Parallelisation&lt;/h2&gt;
&lt;p&gt;RRCF can be parallelised, which is particularly useful when dealing with multiple independent time series. Different
RRCF instances can be run for each time series, using separate processes or server instances. This distributes the
computational load and can improve performance.&lt;/p&gt;
&lt;p&gt;For instance, if you have 10 independent time series, you can run 10 instances of RRCF in parallel, each focusing on one
time series. This scales the anomaly detection process to handle larger volumes of data.&lt;/p&gt;
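&lt;p&gt;A minimal Python sketch of the per-series pattern using a process pool. &lt;code&gt;score_series&lt;/code&gt; is a hypothetical stand-in for one RRCF instance (it only reports the point furthest from the median), the series names are made up, and the "fork" start method assumes a POSIX system.&lt;/p&gt;

```python
import multiprocessing
import statistics
from concurrent.futures import ProcessPoolExecutor


def score_series(item):
    """Hypothetical stand-in for one RRCF instance working on one series.

    A real deployment would keep a forest per series; this only returns
    the index of the point furthest from the median to show the pattern.
    """
    name, values = item
    m = statistics.median(values)
    worst = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return name, worst


def score_all(series_by_name, workers=4):
    """Score independent series in separate worker processes."""
    # Assumption: POSIX "fork" start method, so workers inherit this module.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return dict(pool.map(score_series, series_by_name.items()))


# Hypothetical per-edge request series.
series_by_name = {
    "edge-a": [5, 6, 5, 40, 6],
    "edge-b": [9, 1, 9, 9, 8],
}
print(score_all(series_by_name))  # {'edge-a': 3, 'edge-b': 1}
```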
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Scaling RRCF for large datasets usually means reducing the work each tree has to do, controlling input volume, and
parallelising where the data allows it. Summary statistics, input buffering, and independent RRCF instances can help
manage high-dimensional data and high data velocities without changing the underlying anomaly detection goal.&lt;/p&gt;</content><category term="Technical"></category><category term="Anomaly Detection"></category><category term="Threat Detection"></category></entry><entry><title>Applied RRCF - thresholding techniques.</title><link href="https://www.peakhour.io/blog/rrcf-thresholding/" rel="alternate"></link><published>2023-05-15T13:00:00+10:00</published><updated>2023-05-15T13:00:00+10:00</updated><author><name>AC</name></author><id>tag:www.peakhour.io,2023-05-15:/blog/rrcf-thresholding/</id><summary type="html">&lt;p&gt;Explores various thresholding techniques like Median Absolute Deviation (MAD), Min/Max, and Z-Score for interpreting Robust Random Cut Forest (RRCF) anomaly scores, crucial for classifying data points as normal or anomalous.&lt;/p&gt;</summary><content type="html">&lt;p&gt;Once we've applied the RRCF algorithm to our streaming data, the resulting scores measure how anomalous each data point
is. To classify data points as "normal" or "anomalous", we still need to set a threshold. This defines the level of
deviation considered anomalous and controls how often anomalies are over-identified or missed.&lt;/p&gt;
&lt;h2&gt;Why is Thresholding Needed?&lt;/h2&gt;
&lt;p&gt;Thresholding matters in anomaly detection because it separates normal and anomalous behaviour. Without a threshold, the
scores still indicate relative degrees of anomalousness, but they do not provide a clear dividing line between normal
points and anomalies.&lt;/p&gt;
&lt;p&gt;Set the threshold too low and normal data points may be misclassified as anomalies, increasing false positives. Set it
too high and actual anomalies may be missed, increasing false negatives.&lt;/p&gt;
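&lt;p&gt;The trade-off can be sketched with made-up RRCF scores and hypothetical ground-truth labels. Raising the threshold from 2.0 to 4.0 removes the one false positive but misses one anomaly; at 7.0, two anomalies are missed.&lt;/p&gt;

```python
def error_counts(scores, labels, threshold):
    """Return (false positives, false negatives) for a score threshold.

    labels[i] is True when point i is a known anomaly (hypothetical ground truth).
    """
    fp = sum(1 for s, a in zip(scores, labels) if s > threshold and not a)
    fn = sum(1 for s, a in zip(scores, labels) if a and not s > threshold)
    return fp, fn


scores = [1.0, 1.2, 0.9, 2.5, 1.1, 6.0, 1.3, 3.8, 1.0, 7.5]
labels = [False, False, False, False, False, True, False, True, False, True]
for threshold in (2.0, 4.0, 7.0):
    print(threshold, error_counts(scores, labels, threshold))
# 2.0 (1, 0)
# 4.0 (0, 1)
# 7.0 (0, 2)
```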
&lt;h2&gt;How to Set the Threshold?&lt;/h2&gt;
&lt;p&gt;There are several ways to set a threshold for RRCF scores, including the Median Absolute Deviation (MAD), Min/Max, and
others. The right method depends on the characteristics of the data and the specific use case.&lt;/p&gt;
&lt;h3&gt;Median Absolute Deviation (MAD)&lt;/h3&gt;
&lt;p&gt;The Median Absolute Deviation is a robust measure of variability in a data set. For RRCF scores, MAD can be used to set
a threshold. A typical approach is to set the threshold as some multiple of the MAD above the median. This approach is
robust to outliers and can be useful when the data has heavy-tailed distributions.&lt;/p&gt;
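&lt;p&gt;A minimal Python sketch of a MAD-based threshold on made-up RRCF scores, with an assumed multiplier &lt;code&gt;k = 3.0&lt;/code&gt;. The median (1.25) and MAD (0.3) come from the bulk of ordinary scores, so the threshold lands at about 2.15 however large the extreme scores are.&lt;/p&gt;

```python
import statistics


def mad_threshold(scores, k=3.0):
    """Threshold = median + k * MAD of the scores."""
    m = statistics.median(scores)
    mad = statistics.median([abs(s - m) for s in scores])
    return m + k * mad


scores = [1.0, 1.2, 0.9, 2.5, 1.1, 6.0, 1.3, 3.8, 1.0, 7.5]
t = mad_threshold(scores)
print(round(t, 2))                   # 2.15
print([s for s in scores if s > t])  # [2.5, 6.0, 3.8, 7.5]
```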
&lt;h3&gt;Min/Max&lt;/h3&gt;
&lt;p&gt;Another approach is to use the minimum and maximum RRCF scores to set the threshold. This could mean setting the
threshold as a percentage of the range between the minimum and maximum scores. The method is straightforward, but it can
be sensitive to extreme score values.&lt;/p&gt;
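&lt;p&gt;A sketch of a Min/Max threshold set halfway between the lowest and highest score (the 0.5 fraction and the scores are assumptions for illustration). A single extreme score of 40 moves the threshold from 4.2 to about 20.45, and the scores of 6.0 and 7.5 are no longer flagged.&lt;/p&gt;

```python
def min_max_threshold(scores, fraction=0.5):
    """Threshold placed `fraction` of the way from the lowest to the highest score."""
    low, high = min(scores), max(scores)
    return low + fraction * (high - low)


scores = [1.0, 1.2, 0.9, 2.5, 1.1, 6.0, 1.3, 3.8, 1.0, 7.5]
for batch in (scores, scores + [40.0]):
    t = min_max_threshold(batch)
    print(round(t, 2), [s for s in batch if s > t])
# 4.2 [6.0, 7.5]
# 20.45 [40.0]
```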
&lt;h3&gt;Z-Score&lt;/h3&gt;
&lt;p&gt;A Z-score threshold flags scores that lie more than a chosen number of standard deviations above the mean score.
Because large scores also raise the mean and standard deviation, this works best when anomalies are rare. Depending on
the data, other options include thresholds based on quartiles of the scores, or machine learning techniques that
dynamically adjust the threshold based on observed data.&lt;/p&gt;
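&lt;p&gt;Both the Z-score rule and a quartile rule can be sketched in Python on made-up RRCF scores with one clear anomaly. The multipliers (3 standard deviations, and 1.5 times the interquartile range above the third quartile, the usual box-plot rule) are common defaults, not recommendations for any particular dataset.&lt;/p&gt;

```python
import statistics


def zscore_threshold(scores, k=3.0):
    """Mean plus k sample standard deviations."""
    return statistics.fmean(scores) + k * statistics.stdev(scores)


def iqr_threshold(scores, k=1.5):
    """Third quartile plus k times the interquartile range."""
    # statistics.quantiles uses its default "exclusive" method here.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q3 + k * (q3 - q1)


scores = [1.0, 1.2, 0.9, 1.4, 1.1, 6.0, 1.3, 1.0, 1.2, 0.8, 1.1, 1.0]
for rule in (zscore_threshold, iqr_threshold):
    t = rule(scores)
    print(rule.__name__, round(t, 2), [s for s in scores if s > t])
# zscore_threshold 5.78 [6.0]
# iqr_threshold 1.69 [6.0]
```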
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Thresholding gives anomaly detection a clear boundary between normal and anomalous scores, which helps identify
potential issues such as cyber threats or system errors. The choice of thresholding method depends on the use case and
the characteristics of the data. Whatever method is used, the threshold needs to balance anomaly detection against the
risk of false positives and false negatives.&lt;/p&gt;</content><category term="Technical"></category><category term="Anomaly Detection"></category><category term="Threat Detection"></category></entry></feed>