Adam Cassar

Co-Founder


Limitations of Double MAD and Comparison with Z-Score

Double MAD is useful for anomaly detection, but it has clear limits. One is that it does not account for the shape of seasonal data. Time series data often show cyclical patterns by time of day, week, or year. For instance, web traffic to an e-commerce site might spike during holidays and dip on off-peak days.

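Double MAD scores every point against one median and one pair of MADs (one for values below the median, one for values above), all computed over the whole window. It can capture a shift in that overall median, but it has no notion of where in the cycle a point falls. A value that is ordinary at the daily peak but highly unusual at 3 a.m. can therefore go unnoticed, while routine seasonal peaks may be flagged as anomalies.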
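To make the seasonal blind spot concrete, here is a minimal sketch in Python using only the standard library. The helper `double_mad_score`, the hourly traffic profile, the per-day offsets, and the cutoff of 3.5 are all hypothetical values chosen for illustration. The constant 1.4826 is the usual factor that makes a MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median

def double_mad_score(x, reference, scale=1.4826):
    """Score one value against a reference sample using Double MAD."""
    m = median(reference)
    left_mad = median([m - v for v in reference if v <= m])
    right_mad = median([v - m for v in reference if v >= m])
    mad = left_mad if x < m else right_mad
    if mad == 0:
        raise ValueError("MAD is zero on this side; the score is undefined")
    return abs(x - m) / (scale * mad)

# Hypothetical hourly traffic profile (index = hour of day), repeated for
# seven days with a small made-up offset per day.
hourly_profile = [20, 15, 12, 10, 10, 15, 30, 60, 90, 110, 120, 125,
                  130, 125, 120, 115, 110, 105, 100, 90, 70, 50, 35, 25]
day_offsets = [0, 2, -1, 1, -2, 3, -3]
traffic = [[base + off for base in hourly_profile] for off in day_offsets]

# Inject a daytime-sized value at 03:00 on the fourth day.
traffic[3][3] = 120

all_values = [v for day in traffic for v in day]
same_hour = [day[3] for day in traffic]

THRESHOLD = 3.5  # assumed cutoff
global_score = double_mad_score(120, all_values)   # ignores hour of day
seasonal_score = double_mad_score(120, same_hour)  # compares 03:00 with 03:00

print(global_score > THRESHOLD, seasonal_score > THRESHOLD)
```

Scored against the whole week, the injected value looks like any other daytime reading. Scored only against other 03:00 readings, it stands out. Grouping by position in the cycle is one possible workaround, not something Double MAD does by itself.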

Another limitation is that Double MAD does not account for trends over time. If your time series shows a consistent increase or decrease, the median and MADs computed from older data stop describing newer data, so Double MAD might flag points that simply follow the trend as a series of anomalies.
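The trend problem can be sketched the same way. The example below assumes a setup in which new points are scored against a fixed baseline window of older data. The series, the window sizes, and the cutoff of 3.5 are made up for illustration.

```python
from statistics import median

def double_mad_score(x, baseline, scale=1.4826):
    """Score one value against a baseline window using Double MAD."""
    m = median(baseline)
    left_mad = median([m - v for v in baseline if v <= m])
    right_mad = median([v - m for v in baseline if v >= m])
    mad = left_mad if x < m else right_mad
    if mad == 0:
        raise ValueError("MAD is zero on this side; the score is undefined")
    return abs(x - m) / (scale * mad)

# Hypothetical metric that grows by 2 units per step with a small wobble.
wobble = [0, 1, -1, 2, -2]
series = [100 + 2 * t + wobble[t % 5] for t in range(60)]

THRESHOLD = 3.5  # assumed cutoff

# Score later points against the first 20 points of the raw series.
baseline = series[:20]
later = series[40:]
flagged_raw = [x for x in later if double_mad_score(x, baseline) > THRESHOLD]

# Same procedure on step-to-step differences, which removes a linear trend.
diffs = [b - a for a, b in zip(series, series[1:])]
flagged_diff = [d for d in diffs[40:]
                if double_mad_score(d, diffs[:20]) > THRESHOLD]

print(len(flagged_raw), "of", len(later), "raw points flagged")
print(len(flagged_diff), "differenced points flagged")
```

On the raw series every later point is flagged, although none of them deviates from the trend. Differencing is shown only as one simple way to remove a steady trend before scoring. Other detrending approaches may suit your data better.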

Double MAD vs. Z-Score

In anomaly detection, Double MAD is often compared with the more traditional Z-score method. A Z-score measures how many standard deviations a data point is from the mean. It assumes that the data follows a Gaussian (or normal) distribution, which often does not hold true for real-world data.
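As a minimal sketch of the Z-score method, the following Python uses only the standard library. The function name `z_scores`, the sample readings, and the cutoff of 3 are illustrative assumptions.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(v - mu) / sigma for v in values]

# Hypothetical metric: 19 ordinary readings and one spike.
readings = [9, 10, 11] * 6 + [10, 30]
THRESHOLD = 3  # assumed cutoff; tune for your data

flagged = [v for v, z in zip(readings, z_scores(readings))
           if abs(z) > THRESHOLD]
print(flagged)
```

Note that the spike itself contributes to both the mean and the standard deviation it is measured against.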

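Double MAD, on the other hand, is a non-parametric method that makes no assumption about the shape of the distribution. It measures spread with two median absolute deviations, one for points below the median and one for points above, so each tail of a skewed distribution gets its own scale. This makes it more robust to outliers and skewed distributions.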
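A minimal sketch of Double MAD scoring, again in standard-library Python. The function name `double_mad_scores`, the sample data, and the cutoff of 3.5 are assumptions for illustration. The factor 1.4826 is the usual constant that puts a MAD on the same scale as a standard deviation for normal data.

```python
from statistics import median

def double_mad_scores(values, scale=1.4826):
    """Return each value's distance from the median, in units of the
    MAD computed on its own side of the median."""
    m = median(values)
    left_mad = median([m - v for v in values if v <= m])
    right_mad = median([v - m for v in values if v >= m])
    if left_mad == 0 or right_mad == 0:
        raise ValueError("MAD is zero on one side; scores are undefined")
    scores = []
    for v in values:
        mad = left_mad if v < m else right_mad
        scores.append(abs(v - m) / (scale * mad))
    return scores

# Hypothetical right-skewed sample with a long upper tail.
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12, 40]
THRESHOLD = 3.5  # assumed cutoff; tune for your data

scores = double_mad_scores(data)
flagged = [v for v, s in zip(data, scores) if s > THRESHOLD]
print(flagged)
```

In this sample the values 1 and 5 are both two units from the median, yet 1 gets the higher score, because the lower side of the data is tighter than the upper side.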

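However, Z-score has an advantage when the data is approximately Gaussian. In that case the mean and standard deviation summarize the distribution well, and a threshold such as 3 corresponds to a known tail probability. Note that a large sample does not by itself make the data Gaussian: the Central Limit Theorem describes the distribution of sample means and sums, not of individual observations. It does help when each data point is itself an average or total of many independent events, such as requests per minute.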

In contrast, Double MAD is the safer choice when the data contains outliers or is skewed. The median and the absolute deviations from it barely move when a few extreme values appear, whereas the mean and standard deviation can shift enough to hide the very points you are trying to detect.
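The difference in robustness can be seen by running both methods on the same small sample. This is a sketch under assumed values: the data, the function names, and the shared cutoff of 3 are hypothetical.

```python
from statistics import mean, median, pstdev

def z_scores(values):
    """Absolute Z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma for v in values]

def double_mad_scores(values, scale=1.4826):
    """Absolute Double MAD scores, one MAD per side of the median."""
    m = median(values)
    left_mad = median([m - v for v in values if v <= m])
    right_mad = median([v - m for v in values if v >= m])
    return [abs(v - m) / (scale * (left_mad if v < m else right_mad))
            for v in values]

# Hypothetical sample: nine ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
THRESHOLD = 3  # assumed cutoff applied to both methods

z_flagged = [v for v, s in zip(data, z_scores(data)) if s > THRESHOLD]
mad_flagged = [v for v, s in zip(data, double_mad_scores(data))
               if s > THRESHOLD]

print("Z-score flags:", z_flagged)
print("Double MAD flags:", mad_flagged)
```

Here the extreme value inflates the standard deviation enough to keep its own Z-score just under the cutoff, while the median and MADs are essentially unaffected by it.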

Both Double MAD and Z-score have strengths, and the choice between them should be guided by the nature of your data: Z-score suits data that is roughly symmetric, bell-shaped, and free of extreme values, while Double MAD suits data that is skewed or contaminated with outliers.