Double MAD?

Mon 15 May 2023

Co-Founder

5 min read

This article explores the use of Double Median Absolute Deviation (Double MAD) for [anomaly detection](/learning/threat-detection/what-is-anomaly-detection/) in time-series data, particularly when the distribution is skewed or otherwise asymmetric. Double MAD calculates two median absolute deviations, one for the data below the median and one for the data above it. This gives a more nuanced picture than traditional MAD and keeps anomaly detection accurate even on skewed data. We also look at how it helps identify slow abuse, such as bots, by catching lower-range anomalies. Double MAD has limitations, however: it does not capture the seasonal shape of the data or trends over time. A comparison is also drawn with the Z-score method; which of the two to choose depends on the nature of your data. Finally, the article covers the practical implementation of Double MAD and how it can strengthen your data analysis toolkit.

Operational systems increasingly rely on time-series data for decisions. Anomaly detection is one practical use: by identifying patterns that deviate from the norm, businesses can investigate potential issues early or understand unexpected opportunities.

One useful technique for anomaly detection is the Median Absolute Deviation (MAD) and, more specifically, its extension, the Double MAD. This article explains where Double MAD fits in time-series anomaly detection and how it can help identify anomalous clients.

Understanding MAD and Double MAD

MAD is a robust measure of variability that is far less sensitive to outliers than the standard deviation. It is the median of the absolute deviations from the data's median, so a few extreme values barely move it. This often makes it a better description of 'normal' behaviour in datasets that contain outliers.
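To make the definition concrete, here is a minimal Rust sketch (Rust because the article's own implementation is written in it). `median`, `mad`, and `std_dev` are illustrative helpers, not from any library. The MAD is left unscaled, i.e. without the 1.4826 factor sometimes applied to make it comparable to a standard deviation on normally distributed data.

```rust
/// Median of a slice, or `None` if it is empty.
/// Sorts a copy so the caller's data is left untouched; assumes no NaN values.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Raw median absolute deviation: median(|x_i - median(x)|).
/// No 1.4826 normal-consistency scaling is applied.
fn mad(values: &[f64]) -> Option<f64> {
    let m = median(values)?;
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    median(&deviations)
}

/// Population standard deviation, included only for comparison.
/// Returns NaN for an empty slice.
fn std_dev(values: &[f64]) -> f64 {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    (values.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt()
}

fn main() {
    // One extreme value (100.0) among otherwise small numbers.
    let data = [1.0, 2.0, 3.0, 4.0, 100.0];
    println!("median = {:?}", median(&data)); // Some(3.0)
    println!("MAD    = {:?}", mad(&data)); // Some(1.0)
    println!("stddev = {:.2}", std_dev(&data));
}
```

On the sample `[1, 2, 3, 4, 100]` the median is 3 and the MAD is 1, while the standard deviation is pulled up to roughly 39 by the single extreme value.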

Double MAD extends MAD by calculating two MADs: one for the data below the median and another for the data above it. This split lets the detection process fit asymmetric data, which is common in real-world time-series data.
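A minimal sketch of the split, assuming one common convention in which points equal to the median are counted in both halves (other conventions exist). The `double_mad` helper and the 11-value sample are made up for illustration.

```rust
/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Returns `(mad_lower, mad_upper)`.
/// Convention assumed here: points equal to the median belong to both halves.
fn double_mad(values: &[f64]) -> Option<(f64, f64)> {
    let m = median(values)?;
    // Distances of points at or below the median.
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
    // Distances of points at or above the median.
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|&x| x - m).collect();
    Some((median(&lower)?, median(&upper)?))
}

fn main() {
    // A small right-skewed sample: values above the median are spread further out.
    let data = [3.0, 8.0, 9.0, 10.0, 10.0, 11.0, 14.0, 20.0, 30.0, 45.0, 70.0];
    if let Some((lo, hi)) = double_mad(&data) {
        println!("MAD-lower = {lo}, MAD-upper = {hi}"); // MAD-lower = 1.5, MAD-upper = 14
    }
}
```

The upper MAD is almost ten times the lower one, which a single symmetric MAD would average away.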

Why Double MAD?

While MAD provides a robust way to understand the 'normal' range of a dataset, it assumes a symmetric distribution of data around the median, which may not always hold true. Double MAD is useful where that assumption breaks down, offering an improved anomaly detection process for skewed or asymmetric datasets.

In time-series analysis, especially with 24-hour cycles such as web traffic or server usage, the data can show seasonality and trend. The spread of values around the median is often asymmetric as well; request counts, for example, can spike far above their typical level but can never drop below zero. This makes Double MAD a valuable tool for capturing the variability on each side of the data.

Using Double MAD in Anomaly Detection

The Double MAD implementation discussed here is written in Rust, a systems programming language known for speed and memory safety. It calculates the lower and upper MAD values along with their respective thresholds, and anomalies are then detected by comparing each data point to those thresholds.
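The original Rust code is not reproduced here. As a rough sketch of what calculating the lower and upper MAD values along with their thresholds might look like, the hypothetical `DoubleMad::fit` below stores both MADs and both thresholds. The multiplier `k`, the sample data, and the convention of counting median-valued points in both halves are assumptions chosen for illustration.

```rust
/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Fitted Double MAD statistics and thresholds (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct DoubleMad {
    median: f64,
    mad_lower: f64,
    mad_upper: f64,
    lower_threshold: f64,
    upper_threshold: f64,
}

impl DoubleMad {
    /// Fits the statistics to `values`; `k` is the threshold multiplier.
    /// Returns `None` for empty input.
    fn fit(values: &[f64], k: f64) -> Option<Self> {
        let m = median(values)?;
        let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
        let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|&x| x - m).collect();
        let mad_lower = median(&lower)?;
        let mad_upper = median(&upper)?;
        Some(Self {
            median: m,
            mad_lower,
            mad_upper,
            lower_threshold: m - k * mad_lower,
            upper_threshold: m + k * mad_upper,
        })
    }
}

fn main() {
    let data = [3.0, 8.0, 9.0, 10.0, 10.0, 11.0, 14.0, 20.0, 30.0, 45.0, 70.0];
    println!("{:?}", DoubleMad::fit(&data, 3.0));
}
```

On this sample with `k = 3.0`, the median is 11, so the thresholds come out at 6.5 and 53. Larger values of `k` widen both bands and flag fewer points.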

An anomaly is a data point that deviates significantly from the expected range: if it falls below the lower threshold or above the upper one, it is flagged. Because both thresholds are built from medians, a handful of extreme values does not drag them outward, so the approach holds up on datasets with high variability or extreme values.
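Flagging then reduces to checking each point against the two thresholds. In this sketch the `Anomaly` enum and `detect` function are hypothetical names, median-valued points count toward both halves, and a point exactly on a threshold is treated as normal (strict comparisons), which is one reasonable reading of "falls below" and "above".

```rust
/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Which side of the Double MAD band a point falls on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Anomaly {
    Low,
    High,
}

/// Returns (index, side) for every point strictly outside
/// [median - k * MAD-lower, median + k * MAD-upper].
fn detect(values: &[f64], k: f64) -> Vec<(usize, Anomaly)> {
    let m = match median(values) {
        Some(m) => m,
        None => return Vec::new(),
    };
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|&x| x - m).collect();
    // Both halves are non-empty whenever `values` is (they hold the min and max).
    let lower_threshold = m - k * median(&lower).unwrap();
    let upper_threshold = m + k * median(&upper).unwrap();
    values
        .iter()
        .enumerate()
        .filter_map(|(i, &x)| {
            if x < lower_threshold {
                Some((i, Anomaly::Low))
            } else if x > upper_threshold {
                Some((i, Anomaly::High))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let data = [3.0, 8.0, 9.0, 10.0, 10.0, 11.0, 14.0, 20.0, 30.0, 45.0, 70.0];
    println!("{:?}", detect(&data, 3.0)); // [(0, Low), (10, High)]
}
```

One edge case worth guarding against: if more than half of the points on one side sit exactly at the median, that side's MAD is zero and its threshold collapses onto the median, so any deviation in that direction is flagged.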

Double MAD for Anomalous Client Detection

Beyond a single time series, Double MAD can also help identify anomalous behaviour among clients. By computing Double MAD thresholds over a per-client metric, such as each client's request count in a given window, and comparing every client against them, teams can pinpoint the clients that deviate from the norm.

For instance, in the context of web service usage, an anomalous client might be one that is sending an unusually high or low number of requests. By using Double MAD, you can flag such outliers and take appropriate action, such as investigating potential misuse or reaching out to understand and address any issues they may be facing.
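A sketch of the per-client case, assuming each client is summarised by a single number (here, a hypothetical request count over one window) and the thresholds are computed across all clients. The client ids, counts, and `flag_clients` are invented for illustration; a `BTreeMap` keeps the output in a stable order.

```rust
use std::collections::BTreeMap;

/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Anomaly {
    Low,
    High,
}

/// Flags clients whose metric falls outside the Double MAD band
/// computed across all clients' values.
fn flag_clients<'a>(counts: &BTreeMap<&'a str, f64>, k: f64) -> Vec<(&'a str, Anomaly)> {
    let values: Vec<f64> = counts.values().copied().collect();
    let m = match median(&values) {
        Some(m) => m,
        None => return Vec::new(),
    };
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|&x| x - m).collect();
    let lower_threshold = m - k * median(&lower).unwrap();
    let upper_threshold = m + k * median(&upper).unwrap();
    counts
        .iter()
        .filter_map(|(&id, &count)| {
            if count < lower_threshold {
                Some((id, Anomaly::Low))
            } else if count > upper_threshold {
                Some((id, Anomaly::High))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Hypothetical request counts per client over the same one-hour window.
    let counts = BTreeMap::from([
        ("client-a", 5.0),
        ("client-b", 100.0),
        ("client-c", 105.0),
        ("client-d", 110.0),
        ("client-e", 115.0),
        ("client-f", 120.0),
        ("client-g", 130.0),
        ("client-h", 150.0),
        ("client-i", 900.0),
    ]);
    for (id, side) in flag_clients(&counts, 3.0) {
        println!("{id}: {side:?}");
    }
}
```

With `k = 3.0` the thresholds are 85 and 160, so `client-a` is reported as unusually quiet and `client-i` as unusually busy.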

Detecting Lower-Range Anomalies: A Case of Slow Abuse

An interesting application of Double MAD is detecting lower-range anomalies, a pattern often associated with slow abuse such as bots or low-and-slow Distributed Denial of Service (DDoS) attacks. This kind of abuse is characterised by an unusually low but steady level of activity sustained over a long period, which can fly under the radar of detection systems tuned to look for spikes.

By setting a lower MAD threshold, Double MAD can effectively detect these lower-range anomalies, providing early warning of slow abuse. Its ability to detect both high and low anomalies makes Double MAD a flexible tool for anomaly detection.
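To focus on slow abuse, only the lower threshold needs checking. This sketch assumes each value is one client's request count over the same long window and that median-valued points count toward the lower half; `lower_range_anomalies` and the data are hypothetical.

```rust
/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Indices of points strictly below median - k * MAD-lower.
/// Only the lower side is checked, so spikes above the median are ignored.
fn lower_range_anomalies(values: &[f64], k: f64) -> Vec<usize> {
    let m = match median(values) {
        Some(m) => m,
        None => return Vec::new(),
    };
    // Distances of points at or below the median (never empty for non-empty input).
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
    let lower_threshold = m - k * median(&lower).unwrap();
    values
        .iter()
        .enumerate()
        .filter(|&(_, &x)| x < lower_threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Hypothetical per-client request counts over the same long window.
    // Index 0 is a client that stays quiet; index 8 is a high spike.
    let counts = [5.0, 100.0, 105.0, 110.0, 115.0, 120.0, 130.0, 150.0, 900.0];
    println!("{:?}", lower_range_anomalies(&counts, 3.0)); // [0]
}
```

The 900-request spike is deliberately not reported: a spike detector would already catch it, while this check surfaces only the unusually quiet client.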

The Math Behind Double MAD

To illustrate the power of Double MAD, let's consider a dataset from a right-skewed distribution. Applying the conventional MAD approach can produce false positives, marking normal points in the long right tail as outliers. This is because MAD uses a symmetric interval around the median, which doesn't account for the skew in our data.

With Double MAD, we instead calculate two MADs — one for the data below the median (MAD-lower) and another for the data above (MAD-upper). Outlier thresholds are then defined using these two MADs. The lower threshold is calculated as the median minus a multiplier (k) times MAD-lower. The upper threshold is the median plus k times MAD-upper.
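Written out, and assuming the common convention that points equal to the median count toward both halves, the statistics and thresholds are:

```latex
\begin{aligned}
m &= \operatorname{median}(x_1, \dots, x_n) \\
\mathrm{MAD}_{\text{lower}} &= \operatorname{median}\{\, |x_i - m| : x_i \le m \,\} \\
\mathrm{MAD}_{\text{upper}} &= \operatorname{median}\{\, |x_i - m| : x_i \ge m \,\} \\
T_{\text{lower}} &= m - k \cdot \mathrm{MAD}_{\text{lower}} \\
T_{\text{upper}} &= m + k \cdot \mathrm{MAD}_{\text{upper}}
\end{aligned}
```

A point x_i is flagged when x_i < T_lower or x_i > T_upper. For example, with m = 11, MAD-lower = 1.5, MAD-upper = 14 and k = 3, the thresholds are 11 - 4.5 = 6.5 and 11 + 42 = 53.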

This approach takes the asymmetry of the data into account. In a right-skewed distribution, for example, Double MAD widens the upper threshold to match the longer right tail, so only the extreme right-tail values are flagged, while the tighter lower threshold still catches unusually low values that a symmetric interval would miss.
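The comparison can be sketched on a small, made-up right-skewed sample. `symmetric_flags` applies conventional MAD and `double_mad_flags` applies Double MAD, both with an illustrative `k = 3.0`; points equal to the median count toward both halves, as one common convention.

```rust
/// Median of a slice, or `None` if it is empty (assumes no NaN values).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Indices of points strictly below `lo` or strictly above `hi`.
fn outside(values: &[f64], lo: f64, hi: f64) -> Vec<usize> {
    values
        .iter()
        .enumerate()
        .filter(|&(_, &x)| x < lo || x > hi)
        .map(|(i, _)| i)
        .collect()
}

/// Conventional MAD: one symmetric band, median +/- k * MAD.
fn symmetric_flags(values: &[f64], k: f64) -> Vec<usize> {
    let m = match median(values) {
        Some(m) => m,
        None => return Vec::new(),
    };
    let deviations: Vec<f64> = values.iter().map(|&x| (x - m).abs()).collect();
    let mad = median(&deviations).unwrap();
    outside(values, m - k * mad, m + k * mad)
}

/// Double MAD: separate spreads below and above the median.
fn double_mad_flags(values: &[f64], k: f64) -> Vec<usize> {
    let m = match median(values) {
        Some(m) => m,
        None => return Vec::new(),
    };
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|&x| m - x).collect();
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|&x| x - m).collect();
    // Both halves are non-empty whenever `values` is (they hold the min and max).
    let mad_lower = median(&lower).unwrap();
    let mad_upper = median(&upper).unwrap();
    outside(values, m - k * mad_lower, m + k * mad_upper)
}

fn main() {
    // Made-up right-skewed sample: tight below the median, long tail above it.
    let data = [3.0, 8.0, 9.0, 10.0, 10.0, 11.0, 14.0, 20.0, 30.0, 45.0, 70.0];
    println!("symmetric MAD flags: {:?}", symmetric_flags(&data, 3.0)); // [8, 9, 10]
    println!("Double MAD flags:    {:?}", double_mad_flags(&data, 3.0)); // [0, 10]
}
```

With a single MAD of 3, the symmetric band [2, 20] flags 30, 45 and 70, two of which are just the long right tail, and it misses the low value 3. Double MAD's band [6.5, 53] flags only 3 and 70.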

Wrapping Up

Accurate anomaly detection matters when teams rely on time-series data to operate and investigate systems. The Double MAD approach provides a robust method for this, allowing businesses to better understand their data, spot potential issues early, and make more informed decisions.

Whether you're monitoring web traffic, server usage, or client behaviour, Double MAD can offer valuable insights and help keep your operations running smoothly. Because it detects both high and low anomalies, it can also give early warning of threats such as slow abuse.

Understanding and implementing Double MAD gives your data analysis toolkit a more accurate view of asymmetric data and helps you catch anomalies earlier.

#Anomaly Detection #Threat Detection #Bot Management #Residential Proxies #DDoS